"No markers for contigs in table. Unable to assess binning quality" for split contigs by taxa #203

Closed
chasemc opened this issue Dec 9, 2021 · 4 comments · Fixed by #229 or #231
Labels
nextflow Nextflow related issues/code

Comments

Member

chasemc commented Dec 9, 2021

@WiscEvan @jason-c-kwan
Thought I was running into a bug on issue185-chase-andrew but turns out not exactly.

When contigs are split by taxa, the Nextflow binning step fails the pipeline with "No markers for contigs in table. Unable to assess binning quality". The failure comes from the unclassified contig split; the bacteria split finishes fine (this is the 78-dataset).

Not sure how we'd like to proceed. I think it's not necessarily correct to fail the pipeline on an expected result that may only occur within one of the taxonomic splits.

I think adding errorStrategy 'ignore' to the Nextflow process BIN_CONTIGS seems overly blunt, but it may work if it's transparent enough about failed jobs.

Relevant python code:

if main.loc[main.index.isin(markers.index)].empty:
    raise TableFormatError(
        "No markers for contigs in table. Unable to assess binning quality"
    )
if main.shape[0] <= 1:
    raise BinningError("Not enough contigs in table for binning")

Relevant Nextflow code:
https://github.com/KwanLab/Autometa/blob/dev/modules/local/bin_contigs.nf

chasemc added the nextflow label Dec 9, 2021
chasemc linked a pull request Dec 10, 2021 that will close this issue
Collaborator

evanroyrees commented Dec 10, 2021

We currently only have marker sets to assess binning quality for bacteria and archaea, so binning any of the other kingdoms, or unclassified contigs, is not possible with our approach. I see two ways forward on the Python side:

  1. return an empty binning table
  2. raise a table format error letting the user know that no markers are available to perform bin QA

That said, setting errorStrategy 'ignore' seems (to me) closer to the appropriate approach. Perhaps the best way forward would be to raise the TableFormatError and couple it with a unique exit code that Nextflow can handle explicitly? This would require tweaking on both ends.

e.g.

python side

try:
    main_out = binning(
        main=main_df,
        markers=markers_df,
        taxonomy=taxa_present,
        starting_rank=args.starting_rank,
        reverse_ranks=args.reverse_ranks,
        domain=args.domain,
        completeness=args.completeness,
        purity=args.purity,
        coverage_stddev=args.cov_stddev_limit,
        gc_content_stddev=args.gc_stddev_limit,
        method=args.clustering_method,
        verbose=args.verbose,
    )
except TableFormatError as err:
    logger.warning(err)  # logger.warn is a deprecated alias
    # Probably need to look a little deeper into appropriate error codes, but you get the idea.
    # POSIX exit statuses are 8 bits wide, so 1234 would be reported as 210 (1234 % 256);
    # 204 is only a placeholder inside the valid 0-255 range.
    # Alternatively `return 204`, if main()'s return value is passed to sys.exit(...).
    sys.exit(204)

nextflow side

errorStrategy { task.exitStatus == 204 ? 'ignore' : 'terminate' }
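
To make the exit-code idea concrete, here is a minimal, self-contained sketch of the Python half. It is not Autometa's code: the `binning` function is a stand-in, `TableFormatError` is redefined locally, and the value 204 is an arbitrary placeholder rather than a code the project defines (on POSIX the status has to fit in 0-255). The sketch runs the failing call in a fresh interpreter and reads back the exit status, which is roughly what a workflow engine sees.

```python
import subprocess
import sys
import textwrap

# Hypothetical exit code reserved for "no markers"; must fit in 0-255 on POSIX.
NO_MARKERS_EXIT_CODE = 204

# Child program: a stand-in for the binning entry point (not Autometa's code).
CHILD_SCRIPT = textwrap.dedent(
    f"""
    import logging
    import sys

    logging.basicConfig(level=logging.WARNING)
    logger = logging.getLogger("binning-sketch")


    class TableFormatError(Exception):
        pass


    def binning(contigs, markers):
        # Fail when none of the contigs carries a marker annotation.
        if not set(contigs) & set(markers):
            raise TableFormatError(
                "No markers for contigs in table. Unable to assess binning quality"
            )
        return sorted(contigs)


    try:
        binning(contigs=["contig_1", "contig_2"], markers=[])
    except TableFormatError as err:
        logger.warning(err)
        sys.exit({NO_MARKERS_EXIT_CODE})
    """
)


def run_child() -> subprocess.CompletedProcess:
    """Run the sketch in a fresh interpreter and capture its exit status."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    result = run_child()
    print("exit status:", result.returncode)
    print("stderr:", result.stderr.strip())
```

The warning still reaches stderr, so the skipped split stays visible in the task log, while any other failure keeps its own exit status and can still terminate the run.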

Member Author

chasemc commented Dec 10, 2021

I was leaning towards "return an empty binning table" or no output at all (so we could "optional emit" from Nextflow if UNCLUSTERED_RECRUITMENT would fail on an empty table), plus logging/printing a warning.
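
For comparison, a rough sketch of the "empty binning table plus a warning" option, using only the standard library. The column names, file name, and both helper functions are hypothetical and only illustrate the shape of the idea; the real binning table may look different. A downstream step could use something like `has_binned_contigs` to decide whether there is anything to work on.

```python
import csv
import logging
import tempfile
from pathlib import Path
from typing import Iterable, Mapping, Sequence

logger = logging.getLogger("binning-sketch")

# Hypothetical output columns; the real binning table may differ.
BINNING_COLUMNS = ("contig", "cluster", "completeness", "purity")


def write_binning_table(
    rows: Iterable[Mapping[str, object]],
    out: Path,
    columns: Sequence[str] = BINNING_COLUMNS,
) -> int:
    """Write rows as TSV and return the row count.

    An empty input still produces a header-only file and logs a warning.
    """
    n_rows = 0
    with out.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            n_rows += 1
    if n_rows == 0:
        logger.warning(
            "No markers for contigs in table; wrote empty binning table to %s", out
        )
    return n_rows


def has_binned_contigs(table: Path) -> bool:
    """Return True when the table has at least one data row after the header."""
    with table.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        return next(reader, None) is not None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "unclassified.binning.tsv"
        write_binning_table([], out)
        print("has binned contigs:", has_binned_contigs(out))
```

The trade-off against the exit-code route is that the task always succeeds, so every consumer of the table has to tolerate a header-only file.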

Member Author

chasemc commented Dec 10, 2021

I guess on the Python side it would be easier to just error out, so the TableFormatError approach could work. I don't think I really have a preference.

Member Author

chasemc commented Dec 16, 2021

Just leaving some breadcrumbs for me...

Same for autometa-markers:

Command error:
  [12/16/2021 08:20:42 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 1 --tblout fake_spades.hmmscan.tsv /scratch/dbs/markers/bacteria.single_copy.hmm fake_spades.faa
  Traceback (most recent call last):
    File "/opt/conda/bin/autometa-markers", line 33, in <module>
      sys.exit(load_entry_point('Autometa==2.0.0', 'console_scripts', 'autometa-markers')())
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/markers.py", line 282, in main
      get(
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/markers.py", line 196, in get
      out = hmmscan.filter_markers(
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/external/hmmscan.py", line 278, in filter_markers
      raise AssertionError(f"No markers in {infpath} pass cutoff thresholds")
  AssertionError: No markers in fake_spades.hmmscan.tsv pass cutoff thresholds
