-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option for SBT creation to be localized #925
Add option for SBT creation to be localized #925
Conversation
Getting this error locally when trying to run tests: ==================================== ERRORS ====================================
_____________________ ERROR collecting tests/test_sbtmh.py _____________________
ImportError while importing test module '/Users/olgabot/code/sourmash/tests/test_sbtmh.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../anaconda/envs/sourmash/lib/python3.6/site-packages/_pytest/python.py:513: in _importtestmodule
mod = self.fspath.pyimport(ensuresyspath=importmode)
../../../anaconda/envs/sourmash/lib/python3.6/site-packages/py/_path/local.py:701: in pyimport
__import__(modname)
../../../anaconda/envs/sourmash/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
test_sbtmh.py:1: in <module>
from sourmash import MinHash, SourmashSignature
../../../anaconda/envs/sourmash/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/__init__.py:33: in <module>
from .signature import (
../../../anaconda/envs/sourmash/lib/python3.6/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/signature.py:13: in <module>
from ._minhash import to_bytes
E ImportError: cannot import name 'to_bytes'
ERROR: not found: /Users/olgabot/code/sourmash/tests/test_sbtmh.py::test_localized_add_node
(no name '/Users/olgabot/code/sourmash/tests/test_sbtmh.py::test_localized_add_node' in any of [<Module test_sbtmh.py>]) These don't seem to be happening on Travis, so probably my own environment's fault. Time to make a new conda environment! |
hi olga, you need to run a clean rust build, I think.
|
Thanks @ctb -- Did New environment creationOverview:
Error messageStill getting the same error, from PyCharm's test runner: Testing started at 7:23 AM ...
/anaconda3/envs/sourmash-dev/bin/python "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pycharm/_jb_pytest_runner.py" --target test_sbtmh.py::test_localized_add_node
Launching pytest with arguments test_sbtmh.py::test_localized_add_node in /Users/olgabot/code/sourmash/tests
============================= test session starts ==============================
platform darwin -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.12.0 -- /anaconda3/envs/sourmash-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/olgabot/code/sourmash/tests/.hypothesis/examples')
rootdir: /Users/olgabot/code/sourmash, inifile: pytest.ini
plugins: hypothesis-5.8.0
collecting ...
tests/test_sbtmh.py:None (tests/test_sbtmh.py)
ImportError while importing test module '/Users/olgabot/code/sourmash/tests/test_sbtmh.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/python.py:513: in _importtestmodule
mod = self.fspath.pyimport(ensuresyspath=importmode)
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/py/_path/local.py:701: in pyimport
__import__(modname)
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
test_sbtmh.py:1: in <module>
from sourmash import MinHash, SourmashSignature
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/__init__.py:33: in <module>
from .signature import (
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/signature.py:13: in <module>
from ._minhash import to_bytes
E ImportError: cannot import name 'to_bytes' from 'sourmash._minhash' (/Users/olgabot/code/sourmash/sourmash/_minhash.cpython-37m-darwin.so)
Assertion failed
Assertion failed
collected 0 items / 1 error
==================================== ERRORS ====================================
_____________________ ERROR collecting tests/test_sbtmh.py _____________________
ImportError while importing test module '/Users/olgabot/code/sourmash/tests/test_sbtmh.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/python.py:513: in _importtestmodule
mod = self.fspath.pyimport(ensuresyspath=importmode)
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/py/_path/local.py:701: in pyimport
__import__(modname)
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
test_sbtmh.py:1: in <module>
from sourmash import MinHash, SourmashSignature
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/__init__.py:33: in <module>
from .signature import (
/anaconda3/envs/sourmash-dev/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
exec(co, module.__dict__)
../sourmash/signature.py:13: in <module>
from ._minhash import to_bytes
E ImportError: cannot import name 'to_bytes' from 'sourmash._minhash' (/Users/olgabot/code/sourmash/sourmash/_minhash.cpython-37m-darwin.so)
=========================== short test summary info ============================
ERROR test_sbtmh.py
=============================== 1 error in 0.18s ===============================
ERROR: not found: /Users/olgabot/code/sourmash/tests/test_sbtmh.py::test_localized_add_node
(no name '/Users/olgabot/code/sourmash/tests/test_sbtmh.py::test_localized_add_node' in any of [<Module test_sbtmh.py>])
Process finished with exit code 0
Assertion failed
Assertion failed
Assertion failed
Assertion failed EDIT: Put error message into "details" div tag |
What is the reasoning behind the sourmash bioconda recipe not specifying a Rust version? That seems like a good place to link to conda-forge built binaries. |
Right, I think there are stale build files, that wouldn't be affected by
new environment etc. Try `make clean` first?
|
Ohh okay I didn't realize "clean rust build" == |
On Tue, Mar 31, 2020 at 07:33:15AM -0700, Olga Botvinnik wrote:
Ohh okay I didn't realize "clean rust build" == `rustup update && make clean`
sorry - I didn't have the commands handy myself :)
`make clean` is a good default thing to do and it has worked for me
in the past with similar errors.
|
THANK YOU!! My tests are correctly not passing now :) |
On Tue, Mar 31, 2020 at 07:34:55AM -0700, Olga Botvinnik wrote:
THANK YOU!! My tests are *correctly* not passing now :)
lol!
please don't close this issue - we should add this error message and
`make clean` to the docs.
|
No worries, these tests still have to get to passing. Can open up a separate issue for the doc changes |
sourmash has a build-time dependency on Rust (to build the shared library), but not during runtime (for Python it looks like any C extension). It makes a lot of sense for a dev environment, but that's not the primary use case, nor something that conda supports... |
Hmm the tests all worked on my computer.. any suggestions for replicating the Travis environment? Using docker maybe? |
I'm also very confused because "fixing" the tests was mostly changing a bunch of output results for |
huh! I'll take a look.
|
notify('loaded {} sigs; saving SBT under "{}"', n, args.sbt_name) | ||
tree.save(args.sbt_name, sparseness=args.sparseness) | ||
|
||
def check_signature_compatibilty_to_tree(ksizes, moltypes, nums, scaleds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in sourmash_args, we have check_tree_is_compatible
; they do slightly different things and I think it's fine to use this new function, but we should name it something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(maybe prefix it with an _ to indicate that it's a helper function that shouldn't be used outside this module?)
hi @olgabot, I took at look at I ran into two problems which suggests to me that there's a bug in this PR somewhere -- first, I compared the results above with not using an SBT by running:
and I got 12 matches total, matching what was in the original test. Since the results of running any search on an index should be the same as running a search on the signatures, the original test is correct. (whew :) second, I stumbled across some order dependence on the input signatures. If I do
I get different 11 results; if I index the signatures in a different order,
I get 8 results. Interestingly these results all seem to hold if I specify |
I did some digging! tl;dr at least one obvious problem, not sure what the cause is. code for SBT print stuff is here: https://github.com/czbiohub/sourmash/pull/11 preparation:build test1 and test2 SBTs:
Verify that they produce different numbers of results with same gather query (note: gather should return 12 results :)
save the unassigned bits from the test2 search:
dig into the test2 tree with the unassigned bitsIn the below tree, the match score at the end of each line is the containment of the unassigned bits. I've hand annotated "things that shouldn't happen" with X1 and X2:
Details: here you can see that internal.2 and internal.1 are both children of internal.0, and both have matches; but internal.0 doesn't match! This should never happen in an SBT. next, dig into the test1 tree with its unassigned bitFirst, get the unassigned bits remaining after gather against test1 --
Note that a search against signatures, rather than a tree,
yields a match to
When I look at the test1 tree:
You can see that again the root note has no matches, despite both internal nodes having matches (presumably the Y1 is a false positive? that's at least plausible :) conclusionthe observed problem is that the root node is not getting properly updated with contents beneath it. I don't know if that's the whole problem, though! HTH. |
Wow, thanks for diving into this! Gives a lot of context for where to look next. I'll look into it next week. Thanks! |
Looked into this a bit and in addition to the internal nodes not getting updated properly, the tree construction for these samples is happening incorrectly in the first place. I still don't know how to fix this but at least I know a bit more about what is happening that shouldn't be! For reference, here is the hierarchically clustered signatures under So the localized SBT should be "pre-clustered" and contain all the Salmonella samples (green) under one node, and all the Thermotoga and Campylobacter samples separately. But the construction of the trees doesn't happen properly. I've drawn out the trees below. Salmonella samples are highlighted in green, Thermotoga and Campylobacter are highlighted in pink. The samples that are most similar to one another and should share a parent are boxed in purple. Sorry about the bad handwriting. Here's Tree1: And Tree 2: For Tree 1, the construction happens fine for the first four samples: But it's the addition of Is this the right way to be thinking about this? |
To maybe articulate more clearly: When a single closest signature is not found, then the internal nodes should be checked for matches, and if there is a match to an internal node that has dissimilar children, then those children should be pushed out to a tree section, and that more similar new signature should be added in their place |
makes sense! semi-random thought: I wonder if the problem is that the bloom filters don't get updated when the tree structure is changed? I was looking through the code and it looked to me like most of the time if a Nodegraph existed it was no longer updated again. But I couldn't nail that down before I ran out of time and had to go look at something else. |
(this could lead to both observed problems, because once the intermediate node bloom filters are out of date, the placement of new nodes becomes wrong too) |
Yep, I think it's both the bloom filters not being updated properly, AND not looking for the "next-best" match by going up a level in the tree, using the k-mers in the bloom filters for finding matches. |
Adds a new
LocalizedSBT
class insbtmh.py
that adds newSourmashSignature
leaves into their optimal position, sharing a parent with the leaf with which it shares the highest similarity. This enables building a Nearest Neighbor graph (#710) directly from theLocalizedSBT
, which can then be used for graph-based/community clustering and UMAP.See also #756
make test
Did it pass the tests?make coverage
Is the new code covered?without a major version increment. Changing file formats also requires a
major version number increment.
changes were made?