Structs as Keys

I was working with the key-value database RocksDB 1 for a project and came up with a method of using structs as keys that works surprisingly well. Why would you want to do this? The main benefit is avoiding creating functions to convert your struct keys to strings and back again. Especially when there are many types of keys. There are some definite drawbacks to this approach, but also some significant benefits:

Pros:

  • No tedious serialization/deserialization code
  • Faster performance
  • Smaller key sizes

Cons:

  • Not human readable
  • Less type safe
  • Less separation between code and the database

While certainly not the best approach in all situations, for performance sensitive, short lived keys it may be very useful.

As an example of how this could be set up, let’s say we are building a service in C++ that will serve generated map tiles with different styles and types. We want to use a key-value store for caching the tiles, but want to avoid serializing and deserializing the keys.

Our key struct might look something like this:


enum class MapType : uint8_t {
    ROAD,
    TOPOGRAPHICAL,
    SATTELITE
};

enum class MapStyle : uint8_t {
    DARK,
    LIGHT
};

struct TileKey {
    uint32_t x; // 4 bytes
    uint32_t y; // 4 bytes;
    uint8_t z; // 1 byte

    MapType mapType; // 1 byte
    MapStyle style; // 1 byte
};   

While not too bad to serialize TileKey to something like road/dark/x/y/z, it can quickly become a pain if there are many types of keys. Also, if we take a look at the memory usage of a string key vs the struct key, there is a sizable difference. The best case for a string key is something like road/dark/0/0/0, 15 bytes. If it is storing larger x, y, and z values though, like road/dark/2147483647/2147483647/255, it can be 35 bytes. The struct on the other hand is a constant 12 bytes. Why 12 bytes and not 11? Therein lies the problem…

If we try to go ahead and use TileKey as a key in something like RocksDB, we will quickly run into this problem. The issue is the padding 2 in the struct, which will be uninitialized when the key is used. Since the database will use all the bytes for comparing the keys, two keys that have the same values in all their members may not be equal. By grouping the struct members we can see the padding situation:


struct TileKey {
//4 bytes:
    uint32_t x;

//4 bytes:
    uint32_t y;

//4 bytes:
    uint8_t z; // 1 byte
    MapType mapType; // 1 byte
    MapStyle style; // 1 byte
    // 1 byte of padding
};   

To make sure the comparisons in the database work correctly, we need to make sure the padding is set to the same value every time we give the key struct to the database. You could set it to whatever value you like, but let’s stick with zero for this example. To set the value of the padding, we can make the padding a member of the struct. Then, when we go to use the key, our database access functions (put/get/delete) can zero it out before using it:


struct TileKey {
    uint32_t x; // 4 bytes
    uint32_t y; // 4 bytes;
    uint8_t z; // 1 byte

    MapType mapType; // 1 byte
    MapStyle style; // 1 byte


    char _padding[1]; // 1 byte
};   

void storeTile(TileKey& key, const Tile& value) {
    memset((char*)key._padding, 0, sizeof(key._padding));

    db->put(&key, sizeof(key), &value, sizeof(value));
}

This works fairly well, but it still requires a function for each type of key. Using templates we can do a bit better:

struct TileKey {
    uint32_t x; // 4 bytes
    uint32_t y; // 4 bytes;
    uint8_t z; // 1 byte

    MapType mapType; // 1 byte
    MapStyle style; // 1 byte

    char _padding[1]; // 1 byte
};

struct SimpleTileKey {
    uint32_t x;
    uint32_t y;
    uint8_t z;
    char _padding[3];
};

template<typename T, typename V>
void store(T& key, const V& value) {
    memset((char*)key._padding, 0, sizeof(key._padding));
    
    db->put(&key, sizeof(key), &value, sizeof(value));
}

There is one more case that could cause an issue, a key struct with no padding. Since there is no way to add a zero-sized struct member, there can’t be a padding member without increasing the size of the struct. Without the padding member, the template replacement will fail. To handle those cases, we can do some template trickery and use SFINAE (Substitution Failure Is Not An Error):

struct NoPaddingKey {
    uint32_t x; // 4 bytes
    uint32_t y; // 4 bytes;
};  

//This pad function will be used if there is a _padding member of T
//The use of std::void_t requires C++17, but there are ways to do it with C++11 and possibly even before.
template<typename T, typename = std::void_t<decltype(&T::_padding)>>
void pad(T& key) {
    memset((char*)key._padding, 0, sizeof(key._padding));
}

//This pad function will be used otherwise
void pad(...) {
    return;
}

template<typename T, typename V>
void store(T& key, const V& value) {
    pad(key);
    
    db->put(&key, sizeof(key), &value, sizeof(value));
}

One downside of this method is that it will happily take a struct with padding and no _padding member. I have not found a way to check for padding in templates or during runtime, but if you know a way to do that I would love to hear it!


  1. http://rocksdb.org/ ↩︎

  2. If you aren’t familiar with C struct padding, this is a good guide to it: http://www.catb.org/esr/structure-packing/ ↩︎