Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differantial encoding for name offsets #1069

Merged
merged 11 commits into from
Jun 15, 2014
Merged

Differantial encoding for name offsets #1069

merged 11 commits into from
Jun 15, 2014

Conversation

TheMarex
Copy link
Member

@TheMarex TheMarex commented Jun 7, 2014

This PR implements differential encoding for the offsets of street names in the name vector.

This approach only used about 1/3 of the previous memory (20 bytes vs. 64 bytes for 17 offsets). However since the memory needed to store this information is only a few MB, this won't be too noticeable.

Hopefully RangeTable can be reused later for some other parts of OSRM to bring further memory reductions.

@DennisOSRM
Copy link
Collaborator

Do the tests pass locally?

@emiltin
Copy link
Contributor

emiltin commented Jun 7, 2014

there are some internal errors in the test infrastructure related to log files. i fixed this in on of the pending PR's. maybe i can port it to the develop branch.

@DennisOSRM
Copy link
Collaborator

Would be great to get these fixes into develop.

@emiltin
Copy link
Contributor

emiltin commented Jun 8, 2014

cool stuff. as you mention in the commit message, there's a performance penalty since sizes needs to be summed. since the memory used (and thus saved) for storing names is small, i'm wondering if it's worth it?

@emiltin
Copy link
Contributor

emiltin commented Jun 8, 2014

would you mind rebasing it on develop? i made the cucumber log file handling more robust, to fix the "undefined method `body' for nil:NilClass (NoMethodError)" errors on Travis.

@TheMarex
Copy link
Member Author

TheMarex commented Jun 8, 2014

@emiltin rebased on develop, but locally I still get the same error.

Might be I broke something while merging, will investigate that later.

Name lookups are virtually for free (you can still do about 8000 lookups per millisecond). In practice you won't see any performance difference. The index for the whole planet is about 21MB, this PR reduces that to 6MB.

@DennisOSRM
Copy link
Collaborator

The aim for this change is beyond the pure reduction of a couple of MB. Further along the road we will use this to much greater significance.

@TheMarex
Copy link
Member Author

TheMarex commented Jun 9, 2014

@DennisOSRM updated PR, tests should pass now. Also see #1070 that fixes the crash when a routability test fails.

@DennisOSRM
Copy link
Collaborator

Cool Thanks.

@DennisOSRM
Copy link
Collaborator

The PR looks great. I was thinking to skip the multiple #ifdefs and just go for the hand-rolled code that is most portable.

@TheMarex
Copy link
Member Author

TheMarex commented Jun 9, 2014

@DennisOSRM you mean just the simple loop? But I agree, SSE is generally a headache to maintain and probably not worth the speedup in this case. Other C++ Projects like Eigen or BulletPhysics basically created their own abstraction layer above SSE (and similar) to get something that works across compilers. Also for this to work with shared memory we would need to take some care how the blocks are aligned etc.

Which reminds me, I wanted to test how the clang optimizer fairs against the gcc one in this case.

@DennisOSRM
Copy link
Collaborator

Ok, let's hold here until you have finished the experiments.

@TheMarex
Copy link
Member Author

Just checked: gcc 4.9 wins against clang 3.4.1 with a difference of about 1000 lookups/msec.
But we are still talking about 7600 lookups/msec and clang will also emit SSE code for the simple loop (looking at the assembler code, clang doesn't unroll the loop, so that's probably it)

Anyway, it is clear that we can just drop the SSE code, since for both clang and gcc the speedup is rather low (100 and 400 lookups / msec respectively).

@DennisOSRM
Copy link
Collaborator

Ok, cool. Let's roll with the simple code

@alex85k
Copy link
Contributor

alex85k commented Jun 11, 2014

This time Travis is really showing a compile error :)

@emiltin
Copy link
Contributor

emiltin commented Jun 11, 2014

how come travis carries on with the cucumber tests, even though the build failed?

@DennisOSRM
Copy link
Collaborator

Travis is borked.

@TheMarex
Copy link
Member Author

Hm, compiles just fine locally using gcc 4.9, can someone retrigger the travis build?

@alex85k
Copy link
Contributor

alex85k commented Jun 11, 2014

There is gcc 4.7 on Travis . Please do not forget about old compilers (and Windows ones) :-)

@DennisOSRM
Copy link
Collaborator

retriggered, but it's probably due to GCC 4.7 on Travis

@alex85k
Copy link
Contributor

alex85k commented Jun 11, 2014

Here is the build error on my FreeBSD-VM clang 3.3:
https://gist.github.com/alex85k/f937346acc12b8abeff0#file-clang-log-L14
(I am just afraid I'll be unable to adopt this super-advanced-templating to MSVC)

@alex85k
Copy link
Contributor

alex85k commented Jun 11, 2014

This helped (clang compilation passed too) :)

@DennisOSRM
Copy link
Collaborator

A great tool to figure out includes, forward decls and other stuff is iwyu.

TheMarex added 6 commits June 12, 2014 22:00
Each name is represented as an integer range in a vector of chars.
Instead of storing the absolute offset inside this array, we can store
only the offset to the previous entry (the string size). By doing this we reduce
the number of bytes need to store an offset from 4 to 1 bytes (if we
set a maximum string length of 255).
This is however slower, since the absolute offset must be computed on
each querry by summing up all previous lengths. To limit the
performance inpact we only do this for blocks of a certain size (16).
SharedDataLayout was refactored to include canary values at the
boundaries of each memory block. This makes it easy to detect overruns
and block-size mismatches between osrm-datastore and the
SharedDataFacade.
#include <vector>
#include <array>

#include "SharedMemoryFactory.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the local includes to the top. the order is local, libraries, c-includes, STL includes each block lexicographically sorted.

@DennisOSRM DennisOSRM merged commit c009dce into Project-OSRM:develop Jun 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants