You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
I quite often need to store a hash table keyed by dynamically-sized binary data (including, but not limited to, binary-safe strings); currently, I need to encode it into something that doesn't use \0 bytes (e.g. escaping with [say] "\\0" [and \ with "\\\\"], base64, et cetera).
I also have the same requirement for when I store non-0-terminated strings (e.g. slices of some input text). At the moment, that requires copies of data.
Describe the solution you'd like
I have two different ideas on how to resolve this; both rely on a new "family" of functions (here called bh_*, which stands for "binary hash" --- similar to sh_* for strings).
I've been thinking of implementing this myself (and then submitting a PR), but I'm not sure which approach is best. Without further ado, the two alternatives I had in mind are:
"Flat" struct: "key" remains the actual data (this time as a pointer [or pointer-like, i.e. uint8_t key[];] instead of a direct value); key_len is added, which specifies the length of the key:
structMapEntry
{
// needs to be an integer type (e.g. `size_t`, `uint32_t`, `unsigned char`, `int`, ...)// length is in number of elements, *not* bytes! (for ergonomics reasons)size_tkey_len;
// needs to be a non-void pointer type (why not void? So that we can compute `sizeof(*key)`)uint8_t*key;
// above is enough for a "set" type; as usual, we can add other fieldsintvalue;
};
Nested struct: "key" is changed into a struct with two fields, len and ptr:
structMapEntry
{
// `len` & `ptr` have the same constraints as `key_len` & `key` above, respectivelystruct { size_tlen; uint8_t*ptr; } key;
// ... other fieldsintvalue;
};
I believe it would also be useful to expose a "destructor" callback, similar to how sh_* family handles arena vs strdup allocation internally; YMMV.
Describe alternatives you've considered
Currently, I'm forced to serialize the binary data to strings, which although not the worst idea ever for debugging, is still a bit of a waste; especially if I need to iterate over the keys (as opposed to just lookups), which means I then need to deserialize back into binary data.
Any non-0-terminated strings (e.g. slices of some larger input) must also be copied in order to be able to 0-terminate them.
Additional context
As stated above, I'm 100% happy to implement this myself, but I wanted to bounce this off of other people first, since I wasn't sure which approach was the best, API-wise (struct-in-struct vs "flat", and whether to expose destructors).
So, opinions?
The text was updated successfully, but these errors were encountered:
That all sounds perfectly reasonable, but I don't think I want to add this complexity to stb_ds, so you should just do it for yourself, so as to the design choice: whatever makes you happy.
If you wanted to do it with stb_ds without modification, you could use content hashes--like a use a 256-bit strong hash of your binary keys, then use that hash as a fixed-size key. But probably better to just implement it properly.
That all sounds perfectly reasonable, but I don't think I want to add this complexity to stb_ds, so you should just do it for yourself, so as to the design choice: whatever makes you happy.
Just to 100% be on the same page (make sure if I understand you correctly): Even if I implement something for stb_ds, it won't make it upstream?
Very understandable and fair, though it is obviously a shame (as it would cover the last major "hole" in the stb_ds featureset IMO).
Is your feature request related to a problem? Please describe.
I quite often need to store a hash table keyed by dynamically-sized binary data (including, but not limited to, binary-safe strings); currently, I need to encode it into something that doesn't use
\0
bytes (e.g. escaping with [say]"\\0"
[and\
with"\\\\"
], base64, et cetera).I also have the same requirement for when I store non-0-terminated strings (e.g. slices of some input text). At the moment, that requires copies of data.
Describe the solution you'd like
I have two different ideas on how to resolve this; both rely on a new "family" of functions (here called
bh_*
, which stands for "binary hash" --- similar tosh_*
for strings).I've been thinking of implementing this myself (and then submitting a PR), but I'm not sure which approach is best. Without further ado, the two alternatives I had in mind are:
uint8_t key[];
] instead of a direct value);key_len
is added, which specifies the length of the key:len
andptr
:I believe it would also be useful to expose a "destructor" callback, similar to how
sh_*
family handles arena vs strdup allocation internally; YMMV.Describe alternatives you've considered
Currently, I'm forced to serialize the binary data to strings, which although not the worst idea ever for debugging, is still a bit of a waste; especially if I need to iterate over the keys (as opposed to just lookups), which means I then need to deserialize back into binary data.
Any non-0-terminated strings (e.g. slices of some larger input) must also be copied in order to be able to 0-terminate them.
Additional context
As stated above, I'm 100% happy to implement this myself, but I wanted to bounce this off of other people first, since I wasn't sure which approach was the best, API-wise (struct-in-struct vs "flat", and whether to expose destructors).
So, opinions?
The text was updated successfully, but these errors were encountered: