-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] EPR wavelettree #75
Conversation
6b73bf9
to
4784365
Compare
Thank you for working on this first of all! These are my small benchmark results. It should be noted that none of these use the official search interface and instead include code that avoids allocations inside the search (see seqan/seqan3#1991 ). Otherwise, additional slowdown occurs. Also, this uses SeqAn3 from around February since later changes broke the interfaces I added to avoid said allocations. 12 Threads
Reading of database and pre-processing is not included in the wall clock time [deserialisation is a lot slower than SeqAn2's btw, around 5x, but this is somewhat expected]. CPU-time is counted within the program so not 100% accurate, but it's a good indication and reflects total wall clock times. 6 Threads
ConclusionThe new branch significantly improves performance for Lambda. But it is still notably slower than lambda2/SeqAn2. I know that lazy-loading and online-processing of sequences in lambda3 adds about 10% run-time, but currently we are still at +50% run-time. All other steps that happen as part of the wall-clock time (alignment, output) are using the same SeqAn2-based code in all versions of Lambda. In summary: This PR is a significant improvement and I wholeheartedly support merging it in the current form. But the search as a whole still hasn't caught up to SeqAn2 and more research in the area needs to be done. P.S: The serialised index of the new EPR is much bigger than in SeqAn2, although this is not reflected in memory usage. Any clue why? |
include/sdsl/rank_support_int_v.hpp
Outdated
//!\brief Does nothing for the rank_support_int structure. | ||
void set_vector(const int_vector<>* /*other_text*/ ){} // TODO: Check where this interface is needed, since it is dangerous? | ||
// I would be able to reset the text without recomputing the rank support structure which is in general a | ||
// bad design. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need it since you store the text somewhere else. (Except some other class expects the function to be available)
Usually you need this interface for serialization since you have to set the supported vector after deserializing the rank support vector.
da84588
to
9af1d0c
Compare
From what I've seen, we cannot use the |
66ea712
to
6dafa61
Compare
@eseiler many thanks for all your efforts!!! That totally 🤘 |
…into 64-bit words
…ntation of epr rank structure.
both text and sdsl::memory_manager might deallocate memory on program exit
6dafa61
to
a47c364
Compare
I think it should be fine for now, since it can be tested thoroughly before the next release. |
Thank you both for your work on this! |
Finalises and improves the original PR for the EPR dictionary.
Supersedes #70
resolves seqan/product_backlog#21
This is a more efficient re-implementation of the rank_support_int_v which is roughly 20% faster than SeqAn2
And the new epr rank (using interleaved superblocks):
Old SDSL epr branch (using aligned int_vector with wasting bits):