
[FEATURE] EPR wavelettree #75

Merged · 22 commits · Nov 16, 2021
Conversation

@rrahn rrahn commented Jul 15, 2020

Finalises and improves the original PR for the EPR dictionary.
Supersedes #70
resolves seqan/product_backlog#21

This is a more efficient re-implementation of rank_support_int_v, roughly 20% faster than SeqAn2. For comparison, the SeqAn2 numbers:

Run on (8 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 6291K (x1)
Load Average: 2.72, 2.72, 2.78
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
rank<seqan::Dna>/100                    36.9 ns         36.7 ns     18915954
rank<seqan::Dna>/10000                  39.2 ns         38.8 ns     18693086
rank<seqan::Dna>/1000000                38.2 ns         38.0 ns     18565079
rank<seqan::Dna>/100000000              96.3 ns         96.0 ns      7416747
rank<seqan::Dna>/1000000000              118 ns          117 ns      5913111
rank<seqan::AminoAcid>/100              33.6 ns         33.5 ns     20678980
rank<seqan::AminoAcid>/10000            33.3 ns         33.2 ns     20833581
rank<seqan::AminoAcid>/1000000          80.8 ns         80.5 ns      8452371
rank<seqan::AminoAcid>/100000000         171 ns          170 ns      4500421
rank<seqan::AminoAcid>/1000000000        224 ns          222 ns      3202225

And the new EPR rank (using interleaved superblocks):

Run on (8 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 6291K (x1)
Load Average: 3.51, 2.90, 2.72
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
rank<4>/100               28.5 ns         28.3 ns     24970214
rank<4>/10000             28.2 ns         28.1 ns     24002853
rank<4>/1000000           28.8 ns         28.7 ns     24594369
rank<4>/100000000         73.5 ns         73.2 ns      9967818
rank<4>/1000000000        87.3 ns         87.1 ns      8899059
rank<27>/100              26.0 ns         25.9 ns     27164528
rank<27>/10000            27.1 ns         26.9 ns     26758205
rank<27>/1000000          60.9 ns         60.7 ns     11156445
rank<27>/100000000         113 ns          113 ns      6447750
rank<27>/1000000000        183 ns          181 ns      3406459

Old SDSL EPR branch (using an aligned int_vector, which wastes bits):

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
rank<4>/100               28.2 ns         28.0 ns     22793433
rank<4>/10000             27.8 ns         27.7 ns     24657611
rank<4>/1000000           30.9 ns         30.3 ns     22921885
rank<4>/100000000          110 ns          109 ns      6638280
rank<4>/1000000000         142 ns          141 ns      4901103
rank<27>/100              26.3 ns         26.1 ns     25351021
rank<27>/10000            27.3 ns         27.1 ns     26736436
rank<27>/1000000          59.7 ns         59.4 ns      9280866
rank<27>/100000000         178 ns          177 ns      4100545
rank<27>/1000000000        234 ns          230 ns      2543253
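
For readers unfamiliar with the structure, here is a minimal sketch of the core idea behind rank over a small alphabet, with the superblock counts for all symbols stored next to each other ("interleaved") so that one lookup touches a single contiguous entry. This is illustrative only, not the sdsl-lite implementation: the real EPR dictionary uses different block geometry and interleaves counts with the text; all names and block sizes below are made up.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Count occurrences of the 2-bit symbol c among the first n <= 32 symbols
// packed into the 64-bit word w (symbol j occupies bits 2j..2j+1).
inline uint64_t count_symbol(uint64_t w, uint8_t c, unsigned n)
{
    uint64_t const pat = 0x5555555555555555ULL * c; // c replicated into all fields
    uint64_t x = ~(w ^ pat);        // fields equal to c become 0b11
    x &= x >> 1;                    // low bit of a field set iff the field matched
    x &= 0x5555555555555555ULL;
    if (n < 32)
        x &= (1ULL << (2 * n)) - 1; // keep only the first n symbols
    return static_cast<uint64_t>(__builtin_popcountll(x)); // GCC/Clang builtin
}

// Toy rank dictionary over a 2-bit alphabet (sigma = 4, e.g. DNA).
struct rank_2bit
{
    static constexpr size_t words_per_sb = 4; // 128 symbols per superblock

    std::vector<uint64_t> text;               // 32 symbols per word
    std::vector<std::array<uint64_t, 4>> sb;  // interleaved cumulative counts:
                                              // all 4 symbol counts of one
                                              // superblock are stored adjacently

    explicit rank_2bit(std::vector<uint8_t> const & syms)
    {
        text.assign((syms.size() + 31) / 32, 0);
        for (size_t i = 0; i < syms.size(); ++i)
            text[i / 32] |= uint64_t(syms[i] & 3u) << (2 * (i % 32));

        std::array<uint64_t, 4> acc{};
        for (size_t w = 0; w < text.size(); ++w)
        {
            if (w % words_per_sb == 0)
                sb.push_back(acc);                       // counts before this superblock
            for (uint8_t c = 0; c < 4; ++c)
                acc[c] += count_symbol(text[w], c, 32);
        }
        if (sb.empty())
            sb.push_back(acc); // empty text: one all-zero entry
    }

    // Occurrences of symbol c in positions [0, i).
    uint64_t rank(uint8_t c, size_t i) const
    {
        size_t const w = i / 32;
        size_t const s = std::min(w / words_per_sb, sb.size() - 1);
        uint64_t r = sb[s][c];                           // one superblock lookup
        for (size_t j = s * words_per_sb; j < w; ++j)
            r += count_symbol(text[j], c, 32);           // full words in between
        if (i % 32)
            r += count_symbol(text[w], c, i % 32);       // partial last word
        return r;
    }
};
```

The point of the interleaving is cache behaviour: a query for any symbol reads the same superblock entry, so one cache line serves all symbols, whereas storing one count array per symbol (as in the aligned int_vector variant above) spreads a query's data across distant memory locations.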

@rrahn force-pushed the feature/fast_epr_wavelettree branch from 6b73bf9 to 4784365 on July 15, 2020, 16:44

h-2 commented Jul 21, 2020

Thank you for working on this first of all!

These are my small benchmark results. Note that none of these use the official search interface; instead they include code that avoids allocations inside the search (see seqan/seqan3#1991). Otherwise, additional slowdown occurs.

Also, this uses a SeqAn3 version from around February, since later changes broke the interfaces I added to avoid said allocations.

12 Threads

                                  lambda2   lambda3 WT   lambda3 EPR Seiler   lambda3 EPR Rahn
Wall-clock time of search+extend      22s          81s                  43s                32s
CPU-time of search-step              170s         789s                 389s               249s
Index size (on-disk)               397 MB       361 MB               711 MB             736 MB
Max. RSS                          1018 MB       608 MB               998 MB            1052 MB

Reading of the database and pre-processing are not included in the wall-clock time. [Deserialisation is a lot slower than SeqAn2's, by the way, around 5x, but this is somewhat expected.]

CPU time is measured within the program, so it is not 100% accurate, but it is a good indication and is consistent with the total wall-clock times.

6 Threads

                                  lambda2   lambda3 WT   lambda3 EPR Seiler   lambda3 EPR Rahn
Wall-clock time of search+extend      30s         114s                  58s                45s
CPU-time of search-step              126s         595s                 277s               193s
Index size (on-disk)               397 MB       361 MB               711 MB             736 MB
Max. RSS                           900 MB       568 MB               932 MB             985 MB

Conclusion

The new branch significantly improves performance for Lambda, but it is still notably slower than lambda2/SeqAn2. I know that lazy-loading and online processing of sequences in lambda3 adds about 10% run-time, but currently we are still at +50% run-time. All other steps that happen as part of the wall-clock time (alignment, output) use the same SeqAn2-based code in all versions of Lambda.
So even if we attribute an additional +10% to unknown sources, it seems that the search itself incurs +30% run-time. I have done no profiling, so I don't want to guess where this comes from, but it could very well be something outside the rank computation, which you obviously improved quite a bit!

In summary: This PR is a significant improvement and I wholeheartedly support merging it in the current form. But the search as a whole still hasn't caught up to SeqAn2 and more research in the area needs to be done.

P.S.: The serialised index of the new EPR is much bigger than in SeqAn2, although this is not reflected in memory usage. Any clue why?

@eseiler self-requested a review on September 10, 2020, 11:32
Comment on lines 383 to 386
//!\brief Does nothing for the rank_support_int structure.
void set_vector(const int_vector<>* /*other_text*/ ){} // TODO: Check where this interface is needed, since it is dangerous?
// I would be able to reset the text without recomputing the rank support structure which is in general a
// bad design.
You don't need it, since you store the text somewhere else (unless some other class expects the function to be available).
Usually you need this interface for serialisation, since you have to set the supported vector after deserialising the rank support vector.

@eseiler force-pushed the feature/fast_epr_wavelettree branch 2 times, most recently from da84588 to 9af1d0c on November 11, 2021, 14:57

eseiler commented Nov 11, 2021

From what I've seen, we cannot use wt_byte_test directly, as we do not return the original ranks.
E.g. T=mississippi (example01.txt):
Both the EPR and the other WTs have sigma=5, but the EPR returns ranks 0-4, while the other WTs return the original character ranks.
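
The mismatch can be made concrete with a small sketch (illustrative code, not the test suite's actual helpers): a byte wavelet tree answers rank for the original character values, whereas a structure over the effective alphabet first remaps the occurring characters to dense values 0..sigma-1 (in the example above, sigma=5 presumably because a sentinel character is counted alongside i, m, p, s).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Map each character occurring in `text` to its dense rank in the effective
// alphabet, i.e. to a value in 0..sigma-1, ordered by character value.
std::array<uint8_t, 256> effective_alphabet(std::string const & text)
{
    std::array<bool, 256> seen{};
    for (unsigned char c : text)
        seen[c] = true;

    std::array<uint8_t, 256> map{};
    uint8_t next = 0;
    for (int c = 0; c < 256; ++c)
        if (seen[c])
            map[static_cast<size_t>(c)] = next++;
    return map;
}
// For "mississippi" this yields i -> 0, m -> 1, p -> 2, s -> 3: a query on a
// structure over the effective alphabet must use the mapped value, while a
// byte wavelet tree is queried with the original character, so the two cannot
// share the same test verbatim.
```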

@eseiler force-pushed the feature/fast_epr_wavelettree branch 3 times, most recently from 66ea712 to 6dafa61 on November 11, 2021, 18:32

rrahn commented Nov 12, 2021

@eseiler many thanks for all your efforts!!! That totally 🤘
Please let me know if there is anything you'd like to discuss regarding the design.

@eseiler force-pushed the feature/fast_epr_wavelettree branch from 6dafa61 to a47c364 on November 12, 2021, 20:07

eseiler commented Nov 15, 2021

@rrahn @h-2

I'll try to have a closer look once again this week (last week I just focused on getting CI to work).
If I don't discover major things, would there be any objections to merging this PR?


rrahn commented Nov 15, 2021

I think it should be fine for now, since it can be tested thoroughly before the next release.
It seems there is still room for improvement regarding the space used by the serialised index, as well as performance relative to SeqAn2.

@eseiler changed the title from Feature/fast epr wavelettree to [FEATURE] EPR wavelettree on Nov 16, 2021
@eseiler merged commit 317f63c into xxsds:master on Nov 16, 2021

h-2 commented Nov 16, 2021

Thank you both for your work on this!

@eseiler mentioned this pull request on Oct 20, 2022
Successfully merging this pull request may close these issues.

Implement the EPR dictionary