-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Differantial encoding for name offsets #1069
Conversation
Do the tests pass locally? |
there are some internal errors in the test infrastructure related to log files. i fixed this in on of the pending PR's. maybe i can port it to the develop branch. |
Would be great to get these fixes into develop. |
cool stuff. as you mention in the commit message, there's a performance penalty since sizes needs to be summed. since the memory used (and thus saved) for storing names is small, i'm wondering if it's worth it? |
would you mind rebasing it on develop? i made the cucumber log file handling more robust, to fix the "undefined method `body' for nil:NilClass (NoMethodError)" errors on Travis. |
@emiltin rebased on develop, but locally I still get the same error. Might be I broke something while merging, will investigate that later. Name lookups are virtually for free (you can still do about 8000 lookups per millisecond). In practice you won't see any performance difference. The index for the whole planet is about 21MB, this PR reduces that to 6MB. |
The aim for this change is beyond the pure reduction of a couple of MB. Further along the road we will use this to much greater significance. |
@DennisOSRM updated PR, tests should pass now. Also see #1070 that fixes the crash when a routability test fails. |
Cool Thanks. |
The PR looks great. I was thinking to skip the multiple #ifdefs and just go for the hand-rolled code that is most portable. |
@DennisOSRM you mean just the simple loop? But I agree, SSE is generally a headache to maintain and probably not worth the speedup in this case. Other C++ Projects like Eigen or BulletPhysics basically created their own abstraction layer above SSE (and similar) to get something that works across compilers. Also for this to work with shared memory we would need to take some care how the blocks are aligned etc. Which reminds me, I wanted to test how the clang optimizer fairs against the gcc one in this case. |
Ok, let's hold here until you have finished the experiments. |
Just checked: gcc 4.9 wins against clang 3.4.1 with a difference of about 1000 lookups/msec. Anyway, it is clear that we can just drop the SSE code, since for both clang and gcc the speedup is rather low (100 and 400 lookups / msec respectively). |
Ok, cool. Let's roll with the simple code |
This time Travis is really showing a compile error :) |
how come travis carries on with the cucumber tests, even though the build failed? |
Travis is borked. |
Hm, compiles just fine locally using gcc 4.9, can someone retrigger the travis build? |
There is gcc 4.7 on Travis . Please do not forget about old compilers (and Windows ones) :-) |
retriggered, but it's probably due to GCC 4.7 on Travis |
Here is the build error on my FreeBSD-VM clang 3.3: |
This helped (clang compilation passed too) :) |
A great tool to figure out includes, forward decls and other stuff is iwyu. |
Each name is represented as an integer range in a vector of chars. Instead of storing the absolute offset inside this array, we can store only the offset to the previous entry (the string size). By doing this we reduce the number of bytes need to store an offset from 4 to 1 bytes (if we set a maximum string length of 255). This is however slower, since the absolute offset must be computed on each querry by summing up all previous lengths. To limit the performance inpact we only do this for blocks of a certain size (16).
SharedDataLayout was refactored to include canary values at the boundaries of each memory block. This makes it easy to detect overruns and block-size mismatches between osrm-datastore and the SharedDataFacade.
#include <vector> | ||
#include <array> | ||
|
||
#include "SharedMemoryFactory.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the local includes to the top. the order is local, libraries, c-includes, STL includes each block lexicographically sorted.
This PR implements differential encoding for the offsets of street names in the name vector.
This approach only used about 1/3 of the previous memory (20 bytes vs. 64 bytes for 17 offsets). However since the memory needed to store this information is only a few MB, this won't be too noticeable.
Hopefully RangeTable can be reused later for some other parts of OSRM to bring further memory reductions.