Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stb_ds: feature request & RFC: variable-sized binary data for hash map #1741

Open
darkuranium opened this issue Feb 5, 2025 · 4 comments
Open

Comments

@darkuranium
Copy link

darkuranium commented Feb 5, 2025

Is your feature request related to a problem? Please describe.
I quite often need to store a hash table keyed by dynamically-sized binary data (including, but not limited to, binary-safe strings); currently, I need to encode it into something that doesn't use \0 bytes (e.g. escaping with [say] "\\0" [and \ with "\\\\"], base64, et cetera).
I also have the same requirement for when I store non-0-terminated strings (e.g. slices of some input text). At the moment, that requires copies of data.

Describe the solution you'd like
I have two different ideas on how to resolve this; both rely on a new "family" of functions (here called bh_*, which stands for "binary hash" --- similar to sh_* for strings).
I've been thinking of implementing this myself (and then submitting a PR), but I'm not sure which approach is best. Without further ado, the two alternatives I had in mind are:

  1. "Flat" struct: "key" remains the actual data (this time as a pointer [or pointer-like, i.e. uint8_t key[];] instead of a direct value); key_len is added, which specifies the length of the key:
struct MapEntry
{
    // needs to be an integer type (e.g. `size_t`, `uint32_t`, `unsigned char`, `int`, ...)
    // length is in number of elements, *not* bytes! (for ergonomics reasons)
    size_t key_len;
    // needs to be a non-void pointer type (why not void? So that we can compute `sizeof(*key)`)
    uint8_t* key;

    // above is enough for a "set" type; as usual, we can add other fields
    int value;
};
  1. Nested struct: "key" is changed into a struct with two fields, len and ptr:
struct MapEntry
{
    // `len` & `ptr` have the same constraints as `key_len` & `key` above, respectively
    struct { size_t len; uint8_t* ptr; } key;

    // ... other fields
    int value;
};

I believe it would also be useful to expose a "destructor" callback, similar to how sh_* family handles arena vs strdup allocation internally; YMMV.

Describe alternatives you've considered
Currently, I'm forced to serialize the binary data to strings, which although not the worst idea ever for debugging, is still a bit of a waste; especially if I need to iterate over the keys (as opposed to just lookups), which means I then need to deserialize back into binary data.
Any non-0-terminated strings (e.g. slices of some larger input) must also be copied in order to be able to 0-terminate them.

Additional context
As stated above, I'm 100% happy to implement this myself, but I wanted to bounce this off of other people first, since I wasn't sure which approach was the best, API-wise (struct-in-struct vs "flat", and whether to expose destructors).

So, opinions?

@nothings
Copy link
Owner

nothings commented Feb 5, 2025

That all sounds perfectly reasonable, but I don't think I want to add this complexity to stb_ds, so you should just do it for yourself, so as to the design choice: whatever makes you happy.

@nothings
Copy link
Owner

nothings commented Feb 5, 2025

If you wanted to do it with stb_ds without modification, you could use content hashes--like a use a 256-bit strong hash of your binary keys, then use that hash as a fixed-size key. But probably better to just implement it properly.

@darkuranium
Copy link
Author

That all sounds perfectly reasonable, but I don't think I want to add this complexity to stb_ds, so you should just do it for yourself, so as to the design choice: whatever makes you happy.

Just to 100% be on the same page (make sure if I understand you correctly): Even if I implement something for stb_ds, it won't make it upstream?

Very understandable and fair, though it is obviously a shame (as it would cover the last major "hole" in the stb_ds featureset IMO).

@nothings
Copy link
Owner

nothings commented Feb 5, 2025

Correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants