We all know image compression from common formats like JPEG or PNG. Textures used for rendering can be compressed as well, but the formats differ from the ones we are familiar with. The reason is that texture compression has some special requirements:
- Small texture size
- Fast decompression
- Fast random access
Number 1, small texture size, is the same goal as for any other compression scheme: the texture should take up less texture memory. This also reduces the memory bandwidth needed, as less data has to be read from the texture while rendering. This is why texture compression can increase performance even when there is sufficient RAM on the graphics card (if the application is limited by memory bandwidth).
Number 2, fast decompression, is also understandable: as the texture resides compressed on the graphics card, it has to be decompressed on the fly each time it is accessed. To be exact, decompressed data is cached uncompressed on the GPU die, but as this cache is relatively small, the texture has to be decompressed at least once per frame in a normal application.
Compare this to a JPEG file in a browser: the file is downloaded in compressed form, decompressed once for display and stays decompressed from then on!
Number 3, random access, is probably the most important difference to regular image compression. When rendering a mesh that is covered with one texture, only one side of the mesh can be seen, so only about half of the texture has to be decompressed (think of a textured globe). In fact, the visible fraction of a texture is often even smaller, as the object can be concave, hidden by other objects or partially outside the view frustum, and only some levels of the mipmap pyramid are needed…
So given a texture coordinate UV, the memory location of that texel has to be easy to calculate. This can be implemented by compressing blocks of texels: N*M texels are compressed to C bits. If all blocks cover the same number of texels but their compressed sizes differ, we can create a list of pointers, one for each block, that point to the memory location of the compressed data for that block. To make decompression simpler and remove this indirection while reading, one can require every compressed block to have the same size C. In this case the index pointers are not needed at all. From each UV coordinate, the N*M block that contains that coordinate can be calculated. Say this is block number B; the data to decompress can then be found at bit offset B*C.
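The address calculation described above could look roughly like the following sketch. It assumes UV coordinates in [0,1], a simple row-major order of the blocks in memory (real GPUs often use tiled or swizzled layouts instead) and BC1-sized blocks of 8 bytes; the function name and parameters are made up for illustration.

```python
def block_byte_offset(u, v, width, height, block_w=4, block_h=4, block_bytes=8):
    """Return the byte offset of the compressed block that contains texel (u, v).

    Assumes a linear, row-major block layout; this is an illustration only.
    """
    # Convert normalized coordinates to integer texel coordinates and clamp
    # them to the valid range.
    x = max(0, min(int(u * width), width - 1))
    y = max(0, min(int(v * height), height - 1))

    # Number of blocks per row, rounded up for widths that are not a
    # multiple of the block width.
    blocks_per_row = (width + block_w - 1) // block_w

    # Block number B, counted row by row.
    block_index = (y // block_h) * blocks_per_row + (x // block_w)

    # Every block has the same compressed size, so no pointer table is needed.
    return block_index * block_bytes


if __name__ == "__main__":
    print(block_byte_offset(0.5, 0.5, 256, 256))
```

No lookup table is involved: one multiplication and one addition lead from the block coordinates to the memory location.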
This automatically results in lossy compression – why? Each block originally stored U bits of information (e.g. N*M*3*8 for 8 bit RGB colors) and will be represented in C bits (C < U, otherwise it wouldn’t be a compression ;-) ). Each combination of U bits describes a different color pattern, but there are only 2^C distinct compressed blocks, so several different uncompressed blocks must get the same compressed block assigned – information is lost.
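To put numbers on this counting argument, here is a small sketch for a 4*4 block of 8 bit RGB texels squeezed into 64 bits (the block size BC1 uses, as described further down). The function name is just a placeholder.

```python
def compression_stats(block_w=4, block_h=4, channels=3, bits_per_channel=8,
                      compressed_bits=64):
    """Return (U, compression ratio, log2 of the average collisions per code)."""
    # U: bits needed to store the block uncompressed.
    uncompressed_bits = block_w * block_h * channels * bits_per_channel
    ratio = uncompressed_bits / compressed_bits
    # 2**U possible inputs share at most 2**C outputs, so on average at least
    # 2**(U - C) different input blocks end up with the same compressed block.
    collisions_log2 = uncompressed_bits - compressed_bits
    return uncompressed_bits, ratio, collisions_log2


if __name__ == "__main__":
    u_bits, ratio, log2_collisions = compression_stats()
    print(u_bits, ratio, log2_collisions)
```

For these values U is 384 bits, the ratio is 6:1, and on average at least 2^320 different uncompressed blocks share each compressed block.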
On the PC, texture compression was introduced by S3 Graphics in 1998. Now, 14 years later, the same technique is still used together with some variants of the same ideas (ok, Direct3D 11 / OpenGL 4 added some more complex compressions). As they originate from S3, the initial algorithms are referred to as S3 Texture Compression (S3TC) on OpenGL but were introduced as DXT on Direct3D (for DirectX Texture compression), each variant with its own number: DXT1 – DXT5. With Direct3D 10, these and newer compressions were renamed to BC1 – BC7 (for Block Compression). The following table gives an overview of which is which:
| old D3D | new D3D | OpenGL | D3D support (min. version) | OpenGL support (min. version) |
|---|---|---|---|---|
| ATI1 | BC4 | RGTC1 | 10.0 | 3.0 (or via extension) |
| ATI2 | BC5 | RGTC2 | 10.0 | 3.0 (or via extension) |
| – | BC6H | BPTC_FLOAT | 11.0 | 4.2 (or via extension) |
| – | BC7 | BPTC | 11.0 | 4.2 (or via extension) |
From now on I will stick with the Direct3D 10/11 terminology (BC1..BC7) as it is the easiest way to distinguish the formats. Let’s first look at how BC1 works, as most of the other formats are just variants of BC1.
Each block of 4 by 4 texels is compressed individually. In the case of BC1, we are talking about RGB data (there is a BC1 variant with 1 bit alpha, but just ignore that for now). For each block, just two color values are stored in 5-6-5 encoding (5 bits each for red and blue, 6 bits for green, as the human eye is most sensitive to green). For each texel, a 2 bit value indicates whether the first color, the second color or one of two mixtures of both (2/3*color1 + 1/3*color2 or 1/3*color1 + 2/3*color2) should be used. Yes, this means each block consists of a maximum of four colors. Encoding works like this: each RGB color can be seen as a coordinate in a 3D space:
Imagine this with 16 color values instead of four. What a BC1 compressor now has to do is to find a line through the values which has the smallest distance to all points, e.g. this one:
Now the endpoints of the line will get saved and the additional two mixed colors will be defined automatically:
As we can see, the blue ‘outlier’ color can’t be represented, or, if we choose a line through that value, the other colors would be more ‘wrong’ than they are now.
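The line search can be sketched in a few lines of Python. This is only one simple way to do it and not how any particular compressor works: it approximates the principal axis of the colors with power iteration and takes the extreme projections as endpoints. A real encoder would also quantize the endpoints to 5-6-5 and refine them. All function names here are invented for this example.

```python
def fit_line_endpoints(colors, iterations=16):
    """Approximate the best-fit line through RGB points; return its endpoints."""
    n = len(colors)
    mean = [sum(c[i] for c in colors) / n for i in range(3)]
    centered = [[c[i] - mean[i] for i in range(3)] for c in colors]

    # 3x3 covariance matrix (unnormalized) of the centered points.
    cov = [[sum(p[i] * p[j] for p in centered) for j in range(3)]
           for i in range(3)]

    # Start with the covariance row of the channel that varies most, then
    # use power iteration to converge towards the principal axis.
    start = max(range(3), key=lambda i: cov[i][i])
    axis = list(cov[start])
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3)]
        length = sum(x * x for x in nxt) ** 0.5
        if length == 0.0:  # all colors are identical
            break
        axis = [x / length for x in nxt]

    # Project all points onto the axis; the extremes become the endpoints.
    proj = [sum(p[i] * axis[i] for i in range(3)) for p in centered]
    lo, hi = min(proj), max(proj)

    def clamp(x):
        return min(255.0, max(0.0, x))

    end0 = tuple(clamp(mean[i] + lo * axis[i]) for i in range(3))
    end1 = tuple(clamp(mean[i] + hi * axis[i]) for i in range(3))
    return end0, end1


def assign_indices(colors, end0, end1):
    """Pick, for every color, the nearest of the four colors on the line."""
    palette = [
        end0,
        end1,
        tuple((2 * a + b) / 3 for a, b in zip(end0, end1)),
        tuple((a + 2 * b) / 3 for a, b in zip(end0, end1)),
    ]

    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    return [min(range(4), key=lambda k: dist2(c, palette[k])) for c in colors]


if __name__ == "__main__":
    ramp = [(0, 0, 0), (85, 85, 85), (170, 170, 170), (255, 255, 255)]
    e0, e1 = fit_line_endpoints(ramp)
    print(e0, e1, assign_indices(ramp, e0, e1))
```

An outlier like the blue one pulls the fitted line only slightly, so it ends up being replaced by whichever of the four colors on the line is closest.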
Note that the colors above are not accurate to the color cube, so let’s look at a real example:
The original block is shown on the left, the BC1 block on the right. As this block contains pure red, green and blue, which don’t lie on one line in the 3D color space, its quality decreases dramatically! So let’s look at a realistic example: the following image was BC1 compressed and then converted back to PNG (to prevent additional artifacts I didn’t use JPEG):
Above: the original, below: the BC1 compressed version.
Here are some of the blocks:
Two 16 bit colors plus 16 two-bit indices (one per texel) add up to 64 bit per block. But it’s also possible to squeeze one alpha bit in there: define one of the four two-bit index values as transparent black and only use the two colors and a 50:50 mixture for the RGB values. These two modes can even be mixed within the same texture: the order of the two stored color values distinguishes them (is the numerically larger or the smaller 16 bit color stored first?)!
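Put together, decoding one BC1 block could be sketched as below. The layout assumed here is the usual one: two little-endian 16 bit colors followed by 32 bits of indices, texel 0 in the lowest two bits, and the four-color mode selected when the first color is numerically larger than the second. The expansion from 5-6-5 to 8 bit and the rounding of the mixed colors are simplifications; hardware may round slightly differently.

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a 5-6-5 color to 8 bit per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 (r, g, b, a) tuples, row by row."""
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)

    if c0 > c1:
        # Four-color mode: two stored colors plus two mixtures, all opaque.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-color mode: a 50:50 mixture and transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]

    # Each texel owns two bits of the index word, starting at the lowest bits.
    return [palette[(bits >> (2 * i)) & 3] for i in range(16)]


if __name__ == "__main__":
    # Red and blue endpoints, texel 0 uses the 2/3 red + 1/3 blue mixture.
    block = struct.pack("<HHI", 0xF800, 0x001F, 0b10)
    print(decode_bc1_block(block)[:2])
```

The decoder needs nothing but a handful of shifts, one comparison and a small table lookup, which is what makes the format cheap to decompress in hardware.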
Now, let’s look at the other modes:
- BC1 stores RGB or RGBA color values of 4*4 texels in 64 bit (4 bit / texel). The alpha channel can only store one bit (opaque or transparent) as seen above.
- BC2 stores RGB exactly like BC1 and in addition stores a 4 bit alpha per texel. Each block is stored in 128 bit, resulting in 8 bit / texel.
- BC3 stores RGB exactly like BC1 and in addition compresses an alpha channel with a similar technique to BC1 (even simpler, as one channel is one-dimensional). Each block is stored in 128 bit, resulting in 8 bit / texel.
- BC4 stores the red channel in 4 bit per texel similar to BC1, in the same way the alpha is stored in BC3. There are two sub-formats: one interprets the resulting value as a float in the interval [-1,1], the other as [0,1].
- BC5 stores red and green independently and is thus just twice the data stored for BC4. It is a good choice for normal maps with a resulting 8 bit / texel.
- BC6H has 14 different encodings which can be chosen on a per-block level. The H stands for half and it is well suited for half float RGB textures like high dynamic range textures with a resulting 8 bit / texel.
- BC7 can compress RGB and RGBA in 8 modes selectable per block (like BC6H) at 8 bit / texel. With higher compression cost and the need for newer hardware it is kind of a modern replacement for BC1 – BC3.
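The bit rates listed above translate directly into memory sizes. Here is a small sketch that computes the size of a single mip level; the table and function names are mine, and partial blocks at the border are assumed to be padded to full 4*4 blocks.

```python
# Bits per 4x4 block, taken from the list above (4 bit / texel = 64 bit per
# block, 8 bit / texel = 128 bit per block).
BITS_PER_BLOCK = {
    "BC1": 64,
    "BC2": 128,
    "BC3": 128,
    "BC4": 64,
    "BC5": 128,
    "BC6H": 128,
    "BC7": 128,
}


def compressed_size_bytes(fmt, width, height):
    """Return the size in bytes of one compressed mip level."""
    # Textures whose size is not a multiple of 4 still need whole blocks.
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BITS_PER_BLOCK[fmt] // 8


if __name__ == "__main__":
    for fmt in BITS_PER_BLOCK:
        print(fmt, compressed_size_bytes(fmt, 1024, 1024))
```

For comparison, an uncompressed 1024*1024 RGBA8 texture takes 4 MiB, while the BC1 version needs 512 KiB and the 8 bit / texel formats need 1 MiB.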
If you want to upload already compressed textures to OpenGL, you can pass the enums below as the internal format to glCompressedTexImage2D. Using these values just as the internal format of glTexImage2D with uncompressed source data will make the driver compress the texture on-the-fly. This, however, is slower and results in worse quality than offline compressed textures, as the search for the optimal ‘line’ in the 3D color space takes some time.
| D3D | OpenGL internal format enum | note |
|---|---|---|
| BC1 | GL_COMPRESSED_RGBA_S3TC_DXT1_EXT | with 1 bit alpha |
| BC7 | GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM | sRGB color space |
Besides BC1-BC7, there are also some other formats, like FXT1, which is still supported by Intel in hardware but not used anywhere else, and special formats on mobile GPUs.
Detailed information about the internals of the formats can be found here:
- BC1 – BC7: Intel Ivy Bridge GPU Documentation (page 68 – 94)
- BC1: MSDN | EXT_texture_compression_s3tc
- BC2: MSDN | EXT_texture_compression_s3tc
- BC3: MSDN | EXT_texture_compression_s3tc
- BC4: MSDN | EXT_texture_compression_rgtc
- BC5: MSDN | EXT_texture_compression_rgtc
- BC6H: MSDN | ARB_texture_compression_BPTC
- BC7: MSDN | ARB_texture_compression_BPTC
- FXT1: Intel Ivy Bridge GPU Documentation (page 58 – 68) | FXT1 White paper