RenderingPipeline

from geometry to pixels

Texture Compression

We all know image compression from common formats like JPEG or PNG. Textures for rendering can be compressed as well, but the formats used are different from the ones we are familiar with. The reason is that texture compression has some special requirements:

  1. Small texture size
  2. Fast decompression
  3. Fast random access

Number 1, small texture size, is the same goal as for any other compression scheme: the size needed in texture memory should be reduced. This also reduces the required memory bandwidth, as less data has to be read when the texture is accessed during rendering. This is the reason why texture compression can increase performance even when there is sufficient RAM on the graphics card (if the application is limited by memory bandwidth).

Number 2, fast decompression, is also understandable: as the texture resides compressed on the graphics card, it has to be decompressed on the fly each time it is accessed. To be exact, decompressed data gets cached on the die of the GPU, but as this cache is relatively small, the texture has to be decompressed at least once per frame in a normal application.

Compare this to a JPEG file in a browser: the file gets downloaded in a compressed form and gets then decompressed for display and stays decompressed from then on!

Number 3, random access, is probably the most important difference to regular image compression. When rendering a mesh which is covered with one texture, only one side of the mesh can be seen, so only one half of the texture has to be decompressed (think of a textured globe). In fact, the fraction of a texture that is visible is often even smaller, as the object can be concave, hidden by other objects or partially outside the view frustum, and only some levels of the mipmap pyramid are needed…

So given a texture coordinate UV, the memory location of that texel has to be easy to calculate. This can simply be implemented by compressing blocks of texels: N*M texels are compressed to C bits. If all blocks cover the same number of texels, we could create a list of pointers, one for each block, that points to the memory location of the compressed data for this block. To make decompression simpler and to remove this indirection while reading, one can require the compressed data of every block to have the same size as well. In this case the index pointers are not needed at all. From each UV coordinate, the N*M block that includes that coordinate can be calculated. Say this is block number B, then the data to decompress can be found at location B*C (in bits, relative to the start of the texture data).
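The address calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `block_offset` and its parameters are made up for this example, the block size is given in bytes instead of bits, and the blocks are assumed to be stored row by row (real GPUs may arrange them differently in memory).

```python
def block_offset(u, v, width, height, block_w=4, block_h=4, block_bytes=8):
    """Return (block number B, byte offset) of the block containing the texel at (u, v).

    u and v are normalized texture coordinates in [0, 1].
    Assumption for this sketch: blocks are laid out linearly, row by row.
    """
    # Convert normalized coordinates to integer texel coordinates.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    # Blocks per row, rounded up for widths that are not a multiple of the block width.
    blocks_per_row = (width + block_w - 1) // block_w
    # Block number B, counted row by row.
    b = (y // block_h) * blocks_per_row + (x // block_w)
    # Every block has the same compressed size, so no pointer table is needed.
    return b, b * block_bytes
```

For a 256 by 256 texture with 4 by 4 blocks of 8 bytes, `block_offset(0.5, 0.5, 256, 256)` lands in block 2080 at byte offset 16640.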

This automatically results in lossy compression – why? Each block originally stored U bits of information (e.g. N*M*3*8 for 8 bit RGB colors) and will be represented in C bits (C < U, otherwise it wouldn’t be a compression ;-) ). Each combination of U bits is a different color pattern, but there are fewer combinations of C bits than of U bits, so some different uncompressed color blocks have to get the same compressed block assigned: information is lost.
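To put numbers on this, here is a small Python calculation. It assumes the 4 by 4 block and the 64 bit block size of BC1, which is described further down; the variable names are only for illustration.

```python
N, M = 4, 4          # block dimensions in texels (as in BC1)
U = N * M * 3 * 8    # uncompressed bits per block for 8 bit RGB
C = 64               # compressed bits per block (BC1)

# Fixed compression ratio of the format.
ratio = U / C

# There are 2**U possible uncompressed blocks but only 2**C compressed ones,
# so on average this many different inputs share one compressed block.
inputs_per_code = 2 ** (U - C)

print(U, C, ratio)
```

With these values U is 384 bits, the ratio is 6:1, and on average 2^320 different uncompressed blocks map to each compressed block.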

On the PC, texture compression was introduced by S3 Graphics in 1998. Now, 14 years later, the same technique is still used together with some variants of the same ideas (ok, Direct3D 11 / OpenGL 4 added some more complex compressions). As they originate from S3, the initial algorithms are referred to as S3 Texture Compression (S3TC) in OpenGL but were introduced as DXT in Direct3D (for DirectX Texture compression), each variant with its own number: DXT1 – DXT5. With Direct3D 10 these and newer compressions were renamed to BC1 – BC7 (for Block Compression). The following table should give us an overview of which is which:

old D3D   new D3D   OpenGL       D3D support   OpenGL support
DXT1      BC1       S3TC         6.0           EXT_texture_compression_s3tc
DXT3      BC2       S3TC         6.0           EXT_texture_compression_s3tc
DXT5      BC3       S3TC         6.0           EXT_texture_compression_s3tc
ATI1      BC4       RGTC1        10.0          3.0 (or via extension)
ATI2      BC5       RGTC2        10.0          3.0 (or via extension)
-         BC6H      BPTC_FLOAT   11.0          4.2 (or via extension)
-         BC7       BPTC         11.0          4.2 (or via extension)

From now on I will stick with the Direct3D 10/11 terminology (BC1..BC7) as it is the easiest way to distinguish the formats. Let’s first look at how BC1 works, as most of the other formats are just variants of BC1.

Each block of 4 by 4 texels gets compressed individually. In the case of BC1, we are talking about RGB data (there is a BC1 variant with 1 bit alpha, but just ignore that for now). For each block, just two color values are stored in 5-6-5 encoding (5 bits for red and blue, 6 bits for green as the human eye is most sensitive to green). For each texel, a 2 bit value indicates whether the first color, the second color or a mixture of both (1/3*color1 + 2/3*color2 or 2/3*color1 + 1/3*color2) should be used. Yes, this means each block consists of a maximum of four colors. Encoding works like this: each RGB color can be seen as a coordinate in a 3D space:

four colors in a 3D space

Imagine this with 16 color values instead of four. What a BC1 compressor now has to do is to find a line through the values which has the smallest distance to all points, e.g. this one:

a line roughly representing the values

Now the endpoints of the line will get saved and the additional two mixed colors will be defined automatically:

four representable colors

As we can see, the blue ‘outlier’ color can’t be represented, or, if we choose a line through that value, the other colors would be more ‘wrong’ than they are now.
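To make the encoding step more concrete, here is a deliberately naive encoder sketch in Python. It is not how a production compressor works: instead of fitting the best line through all 16 colors, it simply takes the two texel colors that are farthest apart as endpoints (a heuristic chosen for this example), quantizes them to 5-6-5 and assigns each texel the nearest of the four resulting colors. All function names are made up for this sketch, and the integer rounding of the mixed colors is an assumption.

```python
from itertools import combinations


def to565(rgb):
    """Quantize an 8 bit (r, g, b) tuple to a 16 bit 5-6-5 value."""
    r, g, b = rgb
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def from565(c):
    """Expand a 16 bit 5-6-5 value back to an 8 bit (r, g, b) tuple."""
    r5, g6, b5 = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def dist2(a, b):
    """Squared distance between two colors in RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_block_naive(texels):
    """texels: list of 16 (r, g, b) tuples. Returns (color0, color1, indices)."""
    # Heuristic stand-in for the line fit: the two colors that are farthest apart.
    a, b = max(combinations(texels, 2), key=lambda pair: dist2(*pair))
    color0, color1 = to565(a), to565(b)
    # Store the larger value first, which selects the opaque four-color mode.
    if color0 < color1:
        color0, color1 = color1, color0
    c0, c1 = from565(color0), from565(color1)
    palette = [
        c0,
        c1,
        tuple((2 * p + q) // 3 for p, q in zip(c0, c1)),  # 2/3 c0 + 1/3 c1
        tuple((p + 2 * q) // 3 for p, q in zip(c0, c1)),  # 1/3 c0 + 2/3 c1
    ]
    # Each texel gets the index of the closest representable color.
    indices = [min(range(4), key=lambda i: dist2(palette[i], t)) for t in texels]
    return color0, color1, indices
```

Feeding this a block of red and green texels with a single blue one shows the outlier problem: the line runs from red to green, and the blue texel is forced onto one of the red-green mixtures.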

Note that the colors above are not accurate to the color-cube, so let’s look at a real example:

left: original, right: BC1 compressed

The original block is shown on the left, the BC1 block on the right. As this block contains pure red, green and blue, and these don’t lie on one line in the 3D color space, the quality of this block decreases dramatically! So let’s look at a realistic example: the following image was BC1 compressed and then converted back to PNG (to prevent additional artifacts I didn’t use JPEG):

uncompressed haekelschwein

Above: the original, below: the BC1 compressed version.

BC1 compressed haekelschwein

Here are some of the blocks:

some blocks uncompressed (left) and compressed (right)

Two 16 bit colors and 16 two-bit indices (one per texel) add up to 64 bit per block. But it’s also possible to squeeze one alpha bit in there: define one of the four index values as transparent black and only use the two colors and a 50:50 mixture for the RGB values. These two modes can even be mixed within the same texture: to distinguish them, the order of the two stored color values is used (is the first of the two 16 bit colors the larger or the smaller one?)!
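The block layout and the two modes can be sketched as a small decoder in Python. The sketch assumes the usual S3TC layout: both colors are stored as little-endian 16 bit values, followed by 32 bits of indices with the first texel in the lowest two bits, and the four-color mode is selected when the first color is numerically larger than the second. The function names are made up here, and the exact rounding of the mixed colors can differ between implementations.

```python
import struct


def expand565(c):
    """Expand a 16 bit 5-6-5 color to an 8 bit (r, g, b) tuple."""
    r5, g6, b5 = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def decode_bc1_block(block):
    """Decode one 8 byte BC1 block into 16 (r, g, b, a) tuples, row by row."""
    # 2 + 2 + 4 bytes: two endpoint colors and sixteen 2 bit indices.
    color0, color1, bits = struct.unpack("<HHI", block)
    c0, c1 = expand565(color0), expand565(color1)
    if color0 > color1:
        # Four-color mode: both endpoints plus two mixtures, everything opaque.
        palette = [
            c0 + (255,),
            c1 + (255,),
            tuple((2 * p + q) // 3 for p, q in zip(c0, c1)) + (255,),
            tuple((p + 2 * q) // 3 for p, q in zip(c0, c1)) + (255,),
        ]
    else:
        # Three-color mode: both endpoints, a 50:50 mixture and transparent black.
        palette = [
            c0 + (255,),
            c1 + (255,),
            tuple((p + q) // 2 for p, q in zip(c0, c1)) + (255,),
            (0, 0, 0, 0),
        ]
    # Look up the 2 bit index of each of the 16 texels.
    return [palette[(bits >> (2 * i)) & 0x3] for i in range(16)]
```

Swapping the two stored colors of a block is therefore enough to switch it between the opaque and the 1 bit alpha interpretation.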

Now, let’s look at the other modes:

  • BC1 stores RGB or RGBA color values of 4*4 texels in 64 bit (4 bit / texel). The alpha channel can only store one bit (opaque or transparent) as seen above.
  • BC2 stores RGB exactly like BC1 and in addition stores a 4 bit alpha per texel. Each block is stored in 128 bit, resulting in 8 bit / texel.
  • BC3 stores RGB exactly like BC1 and in addition compresses an alpha channel with a similar technique to BC1 (even simpler, as one channel is one-dimensional). Each block is stored in 128 bit, resulting in 8 bit / texel.
  • BC4 stores the red channel in 4 bit / texel similar to BC1, in the same way the alpha is stored in BC3. There are two sub-formats: one interprets the resulting value as a float in the interval [-1,1], the other as a float in [0,1].
  • BC5 stores red and green independently and is thus just twice the data stored for BC4. It is a good choice for normal maps with a resulting 8 bit / texel.
  • BC6H has 14 different encodings which can be chosen on a per-block level. The H stands for half, and it is well suited for half float RGB textures like high dynamic range textures, with a resulting 8 bit / texel.
  • BC7 can compress RGB and RGBA in 8 modes selectable per block (like BC6H) at 8 bit / texel. With higher compression cost and the need for newer hardware, it is kind of a modern replacement for BC1 – BC3.
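The block sizes listed above translate directly into memory footprints. The following Python sketch computes the size of one mipmap level and of a full mipmap chain; the names `BLOCK_BYTES`, `compressed_size` and `mip_chain_size` are made up for this example, and it assumes that every level is padded up to whole 4 by 4 blocks.

```python
# Bytes per 4x4 block, derived from the bit / texel values in the list above.
BLOCK_BYTES = {
    "BC1": 8, "BC2": 16, "BC3": 16, "BC4": 8,
    "BC5": 16, "BC6H": 16, "BC7": 16,
}


def compressed_size(fmt, width, height):
    """Size in bytes of one mipmap level, padded to whole 4x4 blocks."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]


def mip_chain_size(fmt, width, height):
    """Size in bytes of all mipmap levels down to 1x1."""
    total = 0
    while True:
        total += compressed_size(fmt, width, height)
        if width == 1 and height == 1:
            break
        # Each level halves both dimensions, but never goes below one texel.
        width, height = max(1, width // 2), max(1, height // 2)
    return total


print(compressed_size("BC1", 1024, 1024))  # 524288 bytes instead of 3 MiB for 8 bit RGB
```

Note that even a 1 by 1 level still occupies a full block, so the smallest mipmap levels compress worse than the nominal bit / texel rate suggests. The per-level value computed here should also correspond to the data size passed along when uploading an already compressed level to OpenGL.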

If you want to upload already compressed textures to OpenGL, you can use the enums below as the format of the compressed data (e.g. with glCompressedTexImage2D). Using these values just as the internal format while uploading uncompressed data will compress the texture on the fly. This however is slower and will result in worse quality than offline compressed textures, as the search for the optimal ‘line’ in the 3D color space takes some time.

D3D    OpenGL internal format enum               note
BC1    GL_COMPRESSED_RGB_S3TC_DXT1_EXT           without alpha
BC1    GL_COMPRESSED_RGBA_S3TC_DXT1_EXT          with 1 bit alpha
BC2    GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
BC3    GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
BC4    GL_COMPRESSED_RED_RGTC1                   [0..1]
BC4    GL_COMPRESSED_SIGNED_RED_RGTC1            [-1..1]
BC5    GL_COMPRESSED_RG_RGTC2                    [0..1]
BC5    GL_COMPRESSED_SIGNED_RG_RGTC2             [-1..1]
BC6H   GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT
BC6H   GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT
BC7    GL_COMPRESSED_RGBA_BPTC_UNORM
BC7    GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM       sRGB color space

Besides BC1-BC7, there are also some other formats like FXT1, which is still supported by Intel in hardware but not used anywhere else, and special formats on mobile GPUs.

Detailed information about the internals of the formats can be found here:

 


3 thoughts on “Texture Compression”
  • Ellis Mu says:

    I am confused on D3D support in this table, what does 6.0-10.0-11.0 mean? Does it mean DirectX6.0 start to support DXT and BC1, and then DX10 start to support BC4? thanks.

    • Robert says:

      That’s correct, DirectX/Direct3D 6.0 supported BC1-BC3 (but they were called DXT back then), BC4 and BC5 came with DirectX 10 and the newest formats BC6H and BC7 are only supported by DirectX 11 hardware.

  • Bart de Boer says:

    Very well explained, even for people with few if any knowledge about this. I was looking for a good picture showing the RGB colorspace, but found a great article instead.
