[WIP] Store and retrieve hash abundances in LCA DBs. #1015

Closed
wants to merge 24 commits into from

Conversation

ctb
Contributor

@ctb ctb commented Jun 7, 2020

NOTE: this is a PR into #1013.

Currently, the LCA/revindex databases cannot store or retrieve hashval abundances, which means that signatures created with track_abundance are not represented faithfully. As suggested in #581 (comment), this PR enables abundance storage in LCA_Database objects by storing them in hashval_to_idx values as tuples (idx, abund).

After #1013, I think this is the last change to LCA_Database to bring it into full compliance as a signature storage database, i.e. all inserted signatures can be fully reconstructed from the database. w00t!
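
As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of a reverse index whose hashval_to_idx values hold (idx, abund) tuples, so that each inserted signature can be rebuilt with its abundances. The class name TinyAbundIndex and its insert/reconstruct methods are hypothetical, and the real LCA_Database is considerably more involved.

```python
from collections import defaultdict


class TinyAbundIndex:
    """Toy stand-in for an LCA-style reverse index; NOT sourmash's LCA_Database."""

    def __init__(self):
        self.names = []  # idx -> signature name
        # hashval -> [(idx, abund), ...]; the tuple carries the abundance
        self.hashval_to_idx = defaultdict(list)

    def insert(self, name, hash_abunds):
        """Add one signature, given as {hashval: abundance}; return its idx."""
        idx = len(self.names)
        self.names.append(name)
        for hashval, abund in hash_abunds.items():
            self.hashval_to_idx[hashval].append((idx, abund))
        return idx

    def reconstruct(self, idx):
        """Rebuild {hashval: abundance} for one stored signature."""
        return {
            hashval: abund
            for hashval, pairs in self.hashval_to_idx.items()
            for i, abund in pairs
            if i == idx
        }


db = TinyAbundIndex()
a = db.insert("sig_a", {101: 3, 202: 1})
b = db.insert("sig_b", {202: 7, 303: 2})

# A shared hashval (202) keeps a separate abundance per signature.
assert db.reconstruct(a) == {101: 3, 202: 1}
assert db.reconstruct(b) == {202: 7, 303: 2}
```

Storing the abundance next to the idx means a shared hashval can carry a different count for each signature, which is what makes the round trip lossless in this toy version.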

Also see #634 and #1013 for motivation.

TODO:

  • start filling in abundances
  • test reconstruction of input sequences
  • add ignore abundance parameters and flags
  • test command line build

PR checklist:

  • Is it mergeable?
  • make test: Did it pass the tests?
  • make coverage: Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov

codecov bot commented Jun 7, 2020

Codecov Report

Merging #1015 into protein_lca_db will decrease coverage by 0.29%.
The diff coverage is 100.00%.

@@                Coverage Diff                 @@
##           protein_lca_db    #1015      +/-   ##
==================================================
- Coverage           92.33%   92.03%   -0.30%     
==================================================
  Files                  72       72              
  Lines                5413     5439      +26     
==================================================
+ Hits                 4998     5006       +8     
- Misses                415      433      +18     

Impacted Files                      Coverage Δ
sourmash/lca/command_gather.py      84.47% <100.00%> (+0.60%) ⬆️
sourmash/lca/command_rankinfo.py    89.13% <100.00%> (ø)
sourmash/lca/lca_db.py              94.94% <100.00%> (+0.32%) ⬆️
sourmash/nodegraph.py               77.67% <0.00%> (-16.08%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed36010...4c5faba.

Base automatically changed from protein_lca_db to master June 12, 2020 14:22
@ctb
Contributor Author

ctb commented Jul 2, 2020

I'm going to close this for now, as I don't think we actually need it for anything useful :)

@ctb ctb closed this Jul 2, 2020
@ctb ctb deleted the lca_db_abund branch August 20, 2022 14:58