Vertices – Packing and More

If you’ve delved into 3D graphics for more than 5 seconds, you’ll know what a vertex is.  It’s a position, plus a bunch of other data.  Today I’d like to go over two topics related to vertices, which will be relevant to a post I plan to make in the future: how vertex data is stored, and how that data is arranged in memory.

First, let’s look at the data itself.  For the sake of this article, we’ll be talking about 3-dimensional vertices as you would encounter during real-time (i.e., game) programming.

At bare minimum, a vertex requires a Position component, consisting of an X, a Y, and a Z co-ordinate.  These are typically stored as floating-point values.  I’ve also seen them stored as half-floats, though you have to beware of precision problems.  Another interesting technique scales and translates the model so that all of its vertices lie within the [-1, 1] range on all three axes (reversing this transformation in the model’s matrix), then stores each position as a normalized 16-bit integer (treating the lowest possible value as -1 and the highest possible value as +1, with all other values lying somewhere in between).  I think this technique has the most potential, as any precision issues will depend only on the overall size of the original mesh.
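A minimal sketch of that idea, assuming the mesh has already been rescaled so every co-ordinate sits in [-1, 1] (the function name is mine):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack a coordinate in [-1, 1] into a signed, normalized 16-bit integer.
// +32767 reads back as +1.0 and -32767 as -1.0 (the usual snorm convention).
int16_t packSnorm16(float v)
{
    v = std::clamp(v, -1.0f, 1.0f);
    return static_cast<int16_t>(std::round(v * 32767.0f));
}
```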

Immediately, however, we run into the issue of memory alignment.  If you store a 3-component value using 16-bit values, you’ve got a vertex attribute which takes up 6 bytes.  Graphics cards vastly prefer vertex attributes to be multiples of 4 bytes, and in some cases will outright stop you from doing otherwise.  This is important, and we’ll keep it in mind when we look at how to pack other data.  For position data, this means we need to add an extra 2 bytes of padding, resulting in a reduction from 12 bytes to 8, rather than the 6 we might expect from halving the size of the data values.
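Concretely, the padded attribute might look like this (the pad field sits idle for now, though spare bytes like it can be put to work, as we’ll see later):

```cpp
#include <cstdint>

// One packed position attribute: 6 bytes of data plus 2 bytes of padding.
struct PackedPosition
{
    int16_t x, y, z;  // snorm16 coordinates in [-1, 1]
    int16_t pad;      // unused; keeps the attribute 4-byte aligned
};
static_assert(sizeof(PackedPosition) == 8, "expected 8 bytes");
```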

Next up is texture co-ordinates, again typically stored as floats, though I personally prefer half-floats here.  Some people use normalized 16-bit integers and only worry about the [0, 1] range, but I like having texture co-ordinates beyond this range when working with tiling textures.  Half-floats give the best balance for this case, in my opinion; in most cases they provide sufficient precision within the [-1, 1] range, while allowing you to go beyond that range with gradually dropping precision.  Precision matters less outside the standard range, so this is an acceptable trade-off.  And since texture co-ordinates only have 2 components, we don’t need to introduce any padding when reducing them to shorts or half-floats; win-win!
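Declaring such an attribute is straightforward in OpenGL, since GL_HALF_FLOAT has been a core vertex format since 3.0.  A sketch, where the attribute index, stride, and offset are all placeholders that depend on your final layout:

```cpp
// Two half-float texture co-ordinates: 4 bytes, no padding required.
// Assumes a VAO and VBO are already bound.
const GLuint   texCoordIndex = 1;   // placeholder attribute index
const GLsizei  stride        = 28;  // total size of one packed vertex
const GLintptr uvOffset      = 8;   // UVs start after the padded position
glVertexAttribPointer(texCoordIndex, 2, GL_HALF_FLOAT, GL_FALSE,
                      stride, reinterpret_cast<const void*>(uvOffset));
glEnableVertexAttribArray(texCoordIndex);
```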

Next up, normals.  A “normal” is a 3-dimensional vector pointing outwards from the surface.  Normals are unit vectors, meaning they have a length of exactly 1, and are therefore “normalized”; this is important.  Because the length of the vector cannot exceed 1, each component cannot leave the familiar [-1, 1] range.  As we’ve seen already, that means we can pack the value into a normalized integer.  How much precision do we need?  8 bits per component is sufficient for normal maps, so that’s what I prefer to use.  Some programmers prefer the added precision of half-floats or a fancy 10_10_10_2 integer format, but for simplicity I just use 8-bit.  We only need 1 byte of padding to reach 4-byte alignment.
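The conversion is the same snorm idea from earlier, just at 8 bits, with 127 playing the role 32767 did above (again, names are mine):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack one component of a unit vector into a signed, normalized 8-bit integer.
int8_t packSnorm8(float v)
{
    v = std::clamp(v, -1.0f, 1.0f);
    return static_cast<int8_t>(std::round(v * 127.0f));
}

// 3 bytes of normal plus 1 byte of padding -- which we'll put to use shortly.
struct PackedNormal
{
    int8_t x, y, z;
    int8_t pad;
};
```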

Tangents, like normals, are 3-dimensional normalized vectors.  They are perpendicular to the normal, in a direction which corresponds to the texture co-ordinates.  It might sound a little confusing, but what’s important is that they’re necessary for tangent-space normal mapping (the most common kind).  For the purposes of this discussion, we treat them like normals: 3 8-bit normalized integers, with an extra byte for padding (more on that below).

A bitangent, or binormal, is a vector created by the cross product of the normal and the tangent.  Together with the normal and the tangent, it forms the matrix which is necessary for normal mapping.  We don’t actually need to store it, however; because it’s just the cross product of the normal and the tangent, we can calculate it in the vertex shader from those two vectors.  We do need to know which way it points (whether it’s N × T or T × N).  Ideally this could just be a 1-bit flag, but it’s far easier to store it as an 8-bit integer (either -128 for -1, or 127 for +1) and stick it alongside the tangent, as the fourth component.
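Reconstruction is then a one-liner.  This is the same math the vertex shader would run; here it’s written with glm for illustration:

```cpp
#include <glm/glm.hpp>

// Rebuild the bitangent from the normal and tangent; tangent.w carries the
// handedness sign (+1 or -1), selecting N x T or T x N as appropriate.
glm::vec3 computeBitangent(const glm::vec3& normal, const glm::vec4& tangent)
{
    return glm::cross(normal, glm::vec3(tangent)) * tangent.w;
}
```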

Qtangents are a fascinating thing, first popularized by a Crytek paper (which doesn’t appear to exist anymore except behind a paywall, yay).  The idea is to take the rotation matrix created by a vertex’s Normal, Tangent, and Bitangent, convert it into a quaternion, and then convert it back in the vertex shader.  I don’t plan to implement qtangents at this time, but I might in the future; in the meantime, you can check out this blog post here on the topic.

Colors are actually pretty straightforward.  24-bit color plus 8 bits for alpha is the norm.  You could easily just store this as a set of 4 8-bit integers.  Since we’re here, however, I’m going to raise the question: do your vertices actually need 24-bit color?  In my experience, vertex colors only really show up when the model isn’t textured (not a very common case in modern games) or to fake shading and ambient occlusion when the model uses generic/tileable textures.  If you’re faking shading or ambient occlusion, you really don’t need 24-bit color, and you certainly don’t need an alpha channel; an 8-bit grayscale value is enough, and it’s something you could pack into the w component of another vertex attribute.  Normals, for instance, currently have an unused 8 bits of padding.  Just a suggestion, though; you might prefer to have the full 32 bits of color, or maybe you just don’t care to deal with vertex colors at all.  Up to you; this one, more so than my other suggestions here, will depend heavily on the needs of the application.
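If you do go the grayscale route, here’s one way it could look, assuming the snorm8 normal layout from earlier.  One wrinkle: since the whole attribute is signed-normalized, the shade only gets the [0, 127] half of the range, i.e. 7 effective bits:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack an 8-bit grayscale shade into the normal's padding byte.
// The attribute is snorm8, so storing [0, 127] reads back as [0, 1] in the
// shader -- effectively 7 bits of precision, which is plenty for fake AO.
int8_t packShade(float shade)  // shade in [0, 1]
{
    return static_cast<int8_t>(std::round(std::clamp(shade, 0.0f, 1.0f) * 127.0f));
}
```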

Finally, we have skinning information.  Most animated meshes will allow up to 4 bones from the skeleton to affect a given vertex.  For each relevant bone, the vertex needs to store an index into a list of bone matrices (an integer), and a weight for that matrix’s influence (a floating-point value).  The weight is a [0, 1] value and doesn’t need a ton of precision, so an 8-bit unsigned integer sounds like the perfect type to use for it.  Bone indices need to be integers, and gain nothing from being negative, so unsigned works best for them too.  The question is: 8-bit indices, or 16-bit?  If you can confidently say that no model will have more than 256 bones, then 8-bit is more than enough.  A rig for a human body can have anywhere from 30 to 70 bones in it, more if cloth or hair is rigged as well.  It’s also worth noting that Uniform Buffer Objects (how you’d store the final skeleton matrices in OpenGL) are only guaranteed to have 16 KB of storage; big enough to store exactly 256 4×4 matrices.  I personally think it’s best to hard-limit the number of bones to 256 and split up problematic models.
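A sketch of the packed skinning data under those assumptions (4 bones, 8-bit indices, 8-bit normalized weights):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Per-vertex skinning data: 4 bone slots, 8 bytes total.
struct PackedSkinning
{
    uint8_t boneIndex[4];   // indices into the bone matrix array (max 256 bones)
    uint8_t boneWeight[4];  // unorm8 weights; 255 reads back as 1.0
};

// Pack a weight in [0, 1] into an unsigned, normalized 8-bit integer.
uint8_t packUnorm8(float w)
{
    return static_cast<uint8_t>(std::round(std::clamp(w, 0.0f, 1.0f) * 255.0f));
}
```

One caveat worth knowing: after quantization the four weights may no longer sum exactly to 1, so it’s common to renormalize in the shader (or nudge the largest weight at pack time).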

Before performing this packing, we were using 12 bytes per vertex for positions, 8 for texture co-ordinates, 12 for normals, 12 for tangents, 12 for bitangents, 16 for colors, 16 for bone indices, and 16 for bone weights; a total of 104 bytes.  Now, we’re down to 6 bytes for position (plus 2 bytes of padding), 4 for texture co-ordinates, 3 for normals (plus 1 byte, where we’ll store vertex shading), 3 for tangents (plus 1 for the bitangent sign), 4 for bone indices, and 4 for bone weights; a total of 28 bytes.  We could save another 4 bytes by switching to qtangents (and storing shading in position.w, I guess).
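Putting the whole tally together, the packed vertex might look like this (the field names are mine, but the layout matches the 28-byte figure above):

```cpp
#include <cstdint>

// The full 28-byte packed vertex from the tally above.
struct PackedVertex
{
    int16_t  position[3];    // snorm16 position                 (6 bytes)
    int16_t  positionPad;    // alignment padding                (2 bytes)
    uint16_t texCoord[2];    // half-floats, stored as raw bits  (4 bytes)
    int8_t   normal[3];      // snorm8 normal                    (3 bytes)
    int8_t   shade;          // grayscale shading/AO             (1 byte)
    int8_t   tangent[3];     // snorm8 tangent                   (3 bytes)
    int8_t   bitangentSign;  // handedness flag, -128 or 127     (1 byte)
    uint8_t  boneIndex[4];   // bone indices                     (4 bytes)
    uint8_t  boneWeight[4];  // unorm8 bone weights              (4 bytes)
};
static_assert(sizeof(PackedVertex) == 28, "expected 28 bytes per vertex");
```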

Alright, we have the data we’re going to use!  Now what?  Now we need to pack it into a vertex buffer.  Theoretically you can use a different vertex buffer for every vertex attribute.  Do not do this.  Generally speaking, you want as much data as possible to be inside a single buffer, and you can go pretty far with that train of thought; going from a buffer for each attribute, to a buffer for each mesh, to a buffer for each model, to a single buffer for the whole scene.  In the future I’ll go over why that’s such a big deal, but for now just take my word that you want as much data as possible to be in a single buffer.
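One way to read “a single buffer” in OpenGL terms: allocate one large VBO up front and copy each mesh in at its own byte offset.  A sketch, where Mesh, meshes, and totalBytes are stand-ins for your own scene data:

```cpp
// Allocate one large buffer, then upload each mesh's vertices at an offset.
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, totalBytes, nullptr, GL_STATIC_DRAW);

GLintptr offset = 0;
for (const Mesh& mesh : meshes)
{
    glBufferSubData(GL_ARRAY_BUFFER, offset, mesh.byteSize, mesh.vertexData);
    offset += mesh.byteSize;  // remember each mesh's offset for drawing later
}
```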

To interleave, or not to interleave, that is the question.  If you aren’t familiar with the term, “interleaved” means to use an “array of structs” pattern, where you have a struct which holds all of the data for a single vertex, and you create an array of those structs.  “Non-interleaved” means to use a “struct of arrays” pattern, where you have a series of arrays, each one containing all of the data for a single attribute.  Which is “best” will depend on the hardware; for example, Intel and AMD recommend non-interleaved vertices, while Apple and (I believe) NVIDIA recommend interleaved vertices.  When working with vertices on the CPU, some tasks are easier with interleaved data and some are easier with non-interleaved.  This is a bit of a no-win situation, I’m afraid.  If you’re developing for specific hardware, such as consoles, you can profile to determine which runs best in practice.  Everyone else will have to flip a coin, or just go with whatever’s more comfortable for them.
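In code, the difference is just where the arrays live.  A minimal sketch of both layouts, reusing the packed types from earlier:

```cpp
#include <cstdint>
#include <vector>

// Interleaved: one struct holds everything for a single vertex.
std::vector<PackedVertex> interleaved;

// Non-interleaved: one tightly packed array per attribute, indexed in parallel.
struct VertexArrays
{
    std::vector<PackedPosition> positions;
    std::vector<uint16_t>       texCoords;  // 2 half-floats per vertex
    std::vector<PackedNormal>   normals;
    std::vector<PackedSkinning> skinning;
    // ...and so on, one array per remaining attribute
};
```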

That’s all I have to say on the topic of vertices for now.  I’m currently in the process of hooking up the FBX SDK to my engine, and once I’ve got that wrapped up I’ll be posting an in-depth tutorial on the subject.  Until then, if you disagree with any of the assumptions or conclusions I’ve made, please leave a reply.
