"No markers for contigs in table. Unable to assess binning quality" for split contigs by taxa #203
Comments
We currently only have marker sets to assess binning quality for bacteria and archaea, so binning any of the other kingdoms, as well as unclassified contigs, is not possible with our approach. I see two ways forward here on the python side,
Although, setting e.g.
python side
try:
    main_out = binning(
        main=main_df,
        markers=markers_df,
        taxonomy=taxa_present,
        starting_rank=args.starting_rank,
        reverse_ranks=args.reverse_ranks,
        domain=args.domain,
        completeness=args.completeness,
        purity=args.purity,
        coverage_stddev=args.cov_stddev_limit,
        gc_content_stddev=args.gc_stddev_limit,
        method=args.clustering_method,
        verbose=args.verbose,
    )
except TableFormatError as err:
    # logger.warn is deprecated; logger.warning is the supported spelling.
    logger.warning(err)
    # probably need to look a little deeper into appropriate error codes, but you get the idea.
    # not sure if this should be a return sys.exit(...)
    return 1234
    # or
    sys.exit(1234)
nextflow side
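For the python side, the exit-code idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `NO_MARKERS_EXIT_CODE`, the stub `binning` function and the stand-in `TableFormatError` class are hypothetical and not Autometa's actual code. One thing worth noting is that only the low 8 bits of an exit status reach the parent process on POSIX systems, so a value like 1234 would not arrive intact; a code in the 1-255 range is assumed here.

```python
import logging

logger = logging.getLogger("binning_sketch")

# Hypothetical code reserved for "no markers in this split" (must fit in 0-255).
NO_MARKERS_EXIT_CODE = 204


class TableFormatError(Exception):
    """Stand-in for the exception caught in the snippet above."""


def binning(markers):
    """Stub for the real binning call: raises when there are no markers."""
    if not markers:
        raise TableFormatError(
            "No markers for contigs in table. Unable to assess binning quality"
        )
    return {"n_markers": len(markers)}


def main(markers):
    """Return 0 on success, or the reserved code when markers are missing."""
    try:
        binning(markers)
    except TableFormatError as err:
        logger.warning(err)
        return NO_MARKERS_EXIT_CODE
    return 0

# A script entry point would then call: sys.exit(main(markers))
```

A distinct, documented code like this would let the calling workflow tell the expected "no markers" case apart from a genuine failure.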
I was leaning towards "return an empty binning table" or no output at all (so we could "optional emit" from Nextflow if
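The "return an empty binning table" option could look something like the sketch below, which writes a header-only TSV when there are no rows. The column names in `BINNING_COLUMNS` and the helper `write_binning_table` are assumptions made for illustration; the real binning table may use different columns.

```python
import csv
import io

# Hypothetical column names; the real binning table may differ.
BINNING_COLUMNS = ["contig", "cluster", "completeness", "purity"]


def write_binning_table(rows, handle, columns=BINNING_COLUMNS):
    """Write rows as TSV. With an empty list, only the header line is written."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Example: a split with no usable markers yields a header-only table.
buffer = io.StringIO()
n_written = write_binning_table([], buffer)
empty_table = buffer.getvalue()
```

The trade-off is that downstream steps always receive a file with the expected header, but they then need to tolerate a table with zero rows.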
I guess on the python side it would be easier to just error out so the
Just leaving some breadcrumbs for me... Same for
@WiscEvan @jason-c-kwan
Thought I was running into a bug on issue185-chase-andrew but turns out not exactly.
When contigs are split by taxa, the Nextflow binning step fails the pipeline with "No markers for contigs in table. Unable to assess binning quality". This is caused by the unclassified contig split; the bacteria split finishes fine (the 78-dataset).
Not sure how we'd like to proceed. I think it's not necessarily correct to fail the pipeline on an expected result that may only occur within one of the taxonomic splits.
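One way to treat this as an expected result rather than an error would be a pre-check before binning is attempted for a split. The sketch below is hypothetical: `can_assess_binning` and its arguments are illustrative names, not part of Autometa, and the set of kingdoms simply reflects the marker sets mentioned in this thread.

```python
# Kingdoms with marker sets, per the discussion in this issue.
KINGDOMS_WITH_MARKERS = {"bacteria", "archaea"}


def can_assess_binning(kingdom, contigs, marker_contigs):
    """Return (ok, reason) for whether binning quality can be assessed."""
    if kingdom.lower() not in KINGDOMS_WITH_MARKERS:
        return False, f"no marker set for kingdom '{kingdom}'"
    # At least one contig in the split must carry a marker annotation.
    if not set(contigs) & set(marker_contigs):
        return False, "no markers for contigs in table"
    return True, ""


# Example: the unclassified split is skipped, the bacteria split proceeds.
skip_ok, skip_reason = can_assess_binning("unclassified", ["c1", "c2"], ["c1"])
run_ok, run_reason = can_assess_binning("bacteria", ["c1", "c2"], ["c1"])
```

A caller could log the reason and skip the split instead of raising, which keeps the other taxonomic splits running.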
I think adding errorStrategy 'ignore' to the Nextflow process BIN_CONTIGS seems overly blunt but may work if it's transparent enough about failed jobs.
Relevant python code:
Autometa/autometa/binning/recursive_dbscan.py
Lines 721 to 726 in 36d52b1
Relevant Nextflow code:
https://github.com/KwanLab/Autometa/blob/dev/modules/local/bin_contigs.nf