"No markers for contigs in table. Unable to assess binning quality" for split contigs by taxa #203

chasemc · 2021-12-09T16:28:39Z

@WiscEvan @jason-c-kwan
Thought I was running into a bug on issue185-chase-andrew but turns out not exactly.

When contigs are split by taxa, the Nextflow binning step fails the pipeline with "No markers for contigs in table. Unable to assess binning quality". The failure comes from the unclassified contig split; the bacteria split finishes fine. (This is with the 78-dataset.)

Not sure how we'd like to proceed. I think it's not necessarily correct to fail the pipeline on an expected result that may only occur within one of the taxonomic splits.

I think adding errorStrategy 'ignore' to the Nextflow process BIN_CONTIGS seems overly-blunt but may work if it's transparent enough about failed jobs.

Relevant python code:

Autometa/autometa/binning/recursive_dbscan.py

Lines 721 to 726 in 36d52b1

    
    if main.loc[main.index.isin(markers.index)].empty:
        raise TableFormatError(
            "No markers for contigs in table. Unable to assess binning quality"
        )
    if main.shape[0] <= 1:
        raise BinningError("Not enough contigs in table for binning")
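For anyone following along, here is a minimal sketch of what that first check evaluates, using made-up toy tables. The contig names, column names, and the `has_marker_contigs` helper are hypothetical and not part of Autometa; only the `main.loc[main.index.isin(markers.index)].empty` expression comes from the snippet above.

```python
import pandas as pd

# Hypothetical toy tables: `main` is indexed by contig ID, and `markers`
# is indexed by the contigs that carry at least one marker hit.
main = pd.DataFrame(
    {"coverage": [10.0, 12.5], "gc_content": [0.41, 0.39]},
    index=pd.Index(["contig_1", "contig_2"], name="contig"),
)
markers = pd.DataFrame(
    {"PF00001": [1]},
    index=pd.Index(["contig_9"], name="contig"),
)


def has_marker_contigs(main: pd.DataFrame, markers: pd.DataFrame) -> bool:
    """Return True if at least one contig in `main` also appears in `markers`."""
    return not main.loc[main.index.isin(markers.index)].empty


# No contig in `main` has a marker, which mirrors the unclassified split.
unclassified_ok = has_marker_contigs(main, markers)

# Re-label the marker contig so it overlaps with `main`, mirroring a split
# where markers are present.
bacteria_ok = has_marker_contigs(main, markers.rename(index={"contig_9": "contig_1"}))
```

With no overlap between the two indexes the filtered frame is empty, which is exactly the condition that raises `TableFormatError` in the real code.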

Relevant Nextflow code:
https://github.com/KwanLab/Autometa/blob/dev/modules/local/bin_contigs.nf


evanroyrees · 2021-12-10T16:35:43Z

We currently only have marker sets to assess binning quality for bacteria and archaea, so binning any of the other kingdoms, or the unclassified contigs, is not possible with our approach. I see two ways forward on the Python side:

- return an empty binning table
- raise a table format error letting the user know that no markers are available to perform bin QA

That said, setting errorStrategy 'ignore' seems (to me) closer to the appropriate approach. Perhaps the best way forward would be to raise the TableFormatError and couple it with a unique exit code that Nextflow can handle explicitly? This would require tweaks on both ends.

e.g.

python side

import sys

try:
    main_out = binning(
        main=main_df,
        markers=markers_df,
        taxonomy=taxa_present,
        starting_rank=args.starting_rank,
        reverse_ranks=args.reverse_ranks,
        domain=args.domain,
        completeness=args.completeness,
        purity=args.purity,
        coverage_stddev=args.cov_stddev_limit,
        gc_content_stddev=args.gc_stddev_limit,
        method=args.clustering_method,
        verbose=args.verbose,
    )
except TableFormatError as err:
    logger.warning(err)
    # probably need to look a little deeper into appropriate error codes, but you get the idea.
    # (1234 is only a placeholder: POSIX exit statuses are limited to 0-255.)
    # not sure if this should be `return 1234` or sys.exit(...)
    sys.exit(1234)

nextflow side

errorStrategy { task.exitStatus == 1234 ? 'ignore' : 'terminate' }

See the Nextflow documentation on dynamic directives.
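One caveat on the placeholder code: on POSIX systems a process exit status is a single byte, so a value like 1234 would not arrive at the parent process unchanged. The sketch below spawns a child Python process to show this. The `run_child` helper and the value `204` are hypothetical choices for illustration, not anything defined by Autometa or Nextflow.

```python
import subprocess
import sys

# Hypothetical dedicated exit code for "no markers available"; any otherwise
# unused value in the 0-255 range would serve the same purpose.
NO_MARKERS_EXIT_CODE = 204


def run_child(code: int) -> int:
    """Run a child Python process that exits with `code`; return what the parent sees."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"],
        check=False,
    )
    return proc.returncode


# On POSIX only the low 8 bits survive, so 1234 is reported as 1234 % 256.
observed_large = run_child(1234)

# A code within 0-255 is passed through unchanged.
observed_small = run_child(NO_MARKERS_EXIT_CODE)
```

If this approach is taken, the value compared against `task.exitStatus` on the Nextflow side would need to be the same in-range code that the Python side exits with.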

chasemc · 2021-12-10T17:11:31Z

I was leaning towards "return an empty binning table", or no output at all (so we could use an optional emit from Nextflow if UNCLUSTERED_RECRUITMENT would fail on an empty table), plus logging/printing a warning.
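A rough sketch of the "empty binning table plus a warning" option, under stated assumptions: the column names in `BINNING_COLUMNS`, the `bin_or_empty` wrapper, and the `binning_func` argument are all hypothetical stand-ins, and the real binning output may have a different schema.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Hypothetical output schema; the real binning table's columns may differ.
BINNING_COLUMNS = ["cluster", "completeness", "purity"]


def bin_or_empty(main: pd.DataFrame, markers: pd.DataFrame, binning_func) -> pd.DataFrame:
    """Run `binning_func`, or return an empty table when no contig has markers."""
    if main.loc[main.index.isin(markers.index)].empty:
        logger.warning(
            "No markers for contigs in table. Returning an empty binning table"
        )
        # Keep the expected columns and index name so downstream code can
        # still read the table, even though it has no rows.
        return pd.DataFrame(
            columns=BINNING_COLUMNS,
            index=pd.Index([], name=main.index.name),
        )
    return binning_func(main, markers)


# Toy input with no overlap between contigs and marker-bearing contigs.
main = pd.DataFrame({"coverage": [10.0]}, index=pd.Index(["contig_1"], name="contig"))
markers = pd.DataFrame({"PF00001": [1]}, index=pd.Index(["contig_9"], name="contig"))

# The lambda stands in for the real binning call and is never reached here.
result = bin_or_empty(main, markers, binning_func=lambda m, k: m)
```

Whether downstream steps such as UNCLUSTERED_RECRUITMENT tolerate a zero-row table is exactly the open question above, so this only covers the Python half.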

chasemc · 2021-12-10T17:13:24Z

I guess on the python side it would be easier to just error out so the TableFormatError could work. I don't think I really have a preference.

chasemc · 2021-12-16T20:22:53Z

Just leaving some breadcrumbs for me...

Same for autometa-markers:

Command error:
  [12/16/2021 08:20:42 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 1 --tblout fake_spades.hmmscan.tsv /scratch/dbs/markers/bacteria.single_copy.hmm fake_spades.faa
  Traceback (most recent call last):
    File "/opt/conda/bin/autometa-markers", line 33, in <module>
      sys.exit(load_entry_point('Autometa==2.0.0', 'console_scripts', 'autometa-markers')())
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/markers.py", line 282, in main
      get(
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/markers.py", line 196, in get
      out = hmmscan.filter_markers(
    File "/opt/conda/lib/python3.9/site-packages/Autometa-2.0.0-py3.9.egg/autometa/common/external/hmmscan.py", line 278, in filter_markers
      raise AssertionError(f"No markers in {infpath} pass cutoff thresholds")
  AssertionError: No markers in fake_spades.hmmscan.tsv pass cutoff thresholds
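Since the same situation shows up in more than one entry point, one option would be a small shared wrapper that turns an expected "no markers" exception into a dedicated exit status. This is only a sketch: the `exit_with_code_on` decorator, the `fake_markers_main` function, and the value `204` are hypothetical, and the `AssertionError` message is copied from the traceback above for illustration.

```python
import functools
import logging
import sys

logger = logging.getLogger(__name__)

# Hypothetical dedicated exit code (must be within 0-255 on POSIX).
NO_MARKERS_EXIT_CODE = 204


def exit_with_code_on(exc_types, code):
    """Decorator: log any of `exc_types` as a warning and exit with `code`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_types as err:
                logger.warning(err)
                sys.exit(code)

        return wrapper

    return decorator


@exit_with_code_on((AssertionError,), NO_MARKERS_EXIT_CODE)
def fake_markers_main():
    # Stand-in for an entry point that fails because no markers pass cutoffs.
    raise AssertionError("No markers in fake_spades.hmmscan.tsv pass cutoff thresholds")
```

Calling `fake_markers_main()` raises `SystemExit` with the dedicated code instead of an uncaught traceback, which is what a Nextflow `errorStrategy` closure could then key on.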

chasemc added the nextflow Nextflow related issues/code label Dec 9, 2021

chasemc linked a pull request Dec 10, 2021 that will close this issue

Issue185 chase andrew #204

Closed

evanroyrees linked a pull request Jan 30, 2022 that will close this issue

🐛 🎨 🍏 Fix kingdom-handling and mounting NCBI databases into docker container #229

Merged

evanroyrees closed this as completed Jan 30, 2022

evanroyrees linked a pull request Feb 1, 2022 that will close this issue

Add error handling strategies for nextflow processes #231

Merged
