Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash algorithm (for hashmap, strings, etc.) #3521

Closed
dumblob opened this issue Jan 21, 2020 · 10 comments
Closed

Hash algorithm (for hashmap, strings, etc.) #3521

dumblob opened this issue Jan 21, 2020 · 10 comments
Labels
Feature/Enhancement Request This issue is made to request a feature or an enhancement to an existing one.

Comments

@dumblob
Copy link
Contributor

dumblob commented Jan 21, 2020

There was a request to use a better hash function for strings. I don't think it's necessary, but for a hashmap, it already matters a lot. There are also other possible use cases in V as well as just having a high quality (passes SMHasher and other tests) hash function standalone in V lib (maybe a template for strings, ints, floats, and arrays of all these primitive types).

I was waiting with this request for half a year to let the best known hash function (which passes all the quality criteria incl. speed) mature. I think time came, so let me introduce wyhash v4. I'm not sure though how the V variant will perform as if there won't be any intrinsics used (e.g. through conditional compilation $if ! js { /* let's use intrinsics */ }), the speed will unnecessarily degrade.

@dumblob dumblob added the Feature/Enhancement Request This issue is made to request a feature or an enhancement to an existing one. label Jan 21, 2020
@medvednikov
Copy link
Member

It uses Java's hash function, which is pretty battle tested :)

@dumblob
Copy link
Contributor Author

dumblob commented Jan 21, 2020

It uses Java's hash function, which is pretty battle tested :)

I suppose you're referring to string. I'm OK with string using the simple hash (even though it's slower than the proposed wyhash and has way lower quality). My proposal is about a universal hash in V stdlib. Such hash shall be crossplatform-stable (i.e. on little/big endian, on x86/ARM, etc. always the same results) and very fast (both for very short inputs as well as for very long inputs - though short inputs are slightly more important).

As I wrote, I'd also like to see wyhash in the V hashmap. Speaking about Java, it uses it's built-in hash() function which is JRE-specific and has very bad quality (see how it's implemented e.g. in OpenJDK) - therefore also proposal JEP draft: Better hash codes proposing actually still worse hash functions than wyhash (but way better than the current hash() and than the hash used for string hashing: for i := 0; i < s.len; i++ { hash = hash * int(31) + int(s.str[i]) }).

@ka-weihe
Copy link
Member

ka-weihe commented Jan 25, 2020

We have been working on putting wyhash in the hashmap. It currently uses FNV1a, but if wyhash provides an improvement it will naturally be the hash-function in the hashmap. And I agree, dbj2 hash (the current string hash) should not used. It has a lot of collisions and biased hashes.

@ka-weihe
Copy link
Member

The hashmap is now using wyhash4 as the default hash-function a14a5fb.

This issue can be closed.

@joe-conigliaro
Copy link
Member

a14a5fb

@dumblob
Copy link
Contributor Author

dumblob commented Feb 3, 2020

One small change from the hash author making it few percent faster: rurban/smhasher#96 (comment)

@dumblob
Copy link
Contributor Author

dumblob commented Mar 5, 2020

Feel free to update wyhash as the newest version solves one potential security issue and is even a tiny bit faster than the previous versions 😉.

@ka-weihe
Copy link
Member

ka-weihe commented Mar 5, 2020

We are working on it 🙂

@dumblob
Copy link
Contributor Author

dumblob commented Mar 6, 2020

No need to hurry - it's just a reminder to not forget about it before next release.

@dumblob
Copy link
Contributor Author

dumblob commented Nov 2, 2022

New version of wyhash got released (it shall fix the bad seeds issue and couple of others).

See also #9553 (comment) and rurban/smhasher#243 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature/Enhancement Request This issue is made to request a feature or an enhancement to an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants