tigmint - Error #84

Open
Kactaceapengu opened this issue Feb 5, 2025 · 2 comments

Dear LongStitch team!

I want to scaffold a hifiasm HiFi assembly with raw PacBio CLR reads for a plant genome with an estimated size of 1.2 Gb.

However, when running LongStitch (installed via conda), I always encounter the following error, which I can't seem to solve:

out-log:

tigmint-make tigmint-long draft=snd2.asm.bp.p_ctg reads=sandw cut=250 t=64 G=1200000000 span=auto dist=auto longmap=pb
make[1]: Entering directory '/lisc/scratch/diospyrosH/scaffolding/scaffold_26_2025-02-05'
/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint_estimate_dist.py sandw.fasta.gz -n 1000000 -o sandw.tigmint-long.params.tsv
sh -c '/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/../src/long-to-linked-pe -l 250 -m2000 -g1200000000 -s -b sandw.barcode-multiplicity.tsv --bx -t64 --fasta -f sandw.tigmint-long.params.tsv sandw.fasta.gz |
minimap2 -y -t64 -x map-pb --secondary=no snd2.asm.bp.p_ctg.fa - |
/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint_molecule_paf.py -q0 -s2000 -p sandw.tigmint-long.params.tsv - | sort -k1,1 -k2,2n -k3,3n > snd2.asm.bp.p_ctg.sandw.cut250.molecule.size2000.distauto.bed'
/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint-cut -p64 -w1000 -t0 -m3000 -f sandw.tigmint-long.params.tsv -o snd2.asm.bp.p_ctg.sandw.cut250.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa snd2.asm.bp.p_ctg.fa snd2.asm.bp.p_ctg.sandw.cut250.molecule.size2000.distauto.bed
Started at: 2025-02-05 18:25:46.797952
make[1]: Leaving directory '/lisc/scratch/diospyrosH/scaffolding/scaffold_26_2025-02-05'

error-log:

minimap2: unrecognized option: secondary=no
long-to-linked-pe v1.2.10: Using more than 6 threads does not scale, reverting to 6.
[M::mm_idx_gen::15.319*1.14] collected minimizers
[M::mm_idx_gen::16.152*1.86] sorted minimizers
[M::main::16.153*1.86] loaded/built the index for 417 target sequence(s)
[M::mm_mapopt_update::17.288*1.81] mid_occ = 804; max_occ = 5665
[M::mm_idx_stat] kmer size: 19; skip: 13; is_HPC: 1; #seq: 417
[M::mm_idx_stat::17.659*1.79] distinct minimizers: 34416970 (57.34% are singletons); average occurrences: 3.521; average spacing: 9.970
Traceback (most recent call last):
  File "/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint_molecule_paf.py", line 141, in <module>
    main()
  File "/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint_molecule_paf.py", line 138, in main
    MolecIdentifierPaf().run()
  File "/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint_molecule_paf.py", line 85, in run
    paf_entry[18]
IndexError: list index out of range
tigmint-cut: error: calculated span parameter not found in parameter file 'sandw.tigmint-long.params.tsv'
make[1]: *** [/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/tigmint-1.2.10-4/bin/tigmint-make:343: snd2.asm.bp.p_ctg.sandw.cut250.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa] Error 1
make: *** [/lisc/user/trinh/.conda/envs/longstitch_env/bin/share/longstitch-1.0.5-1/longstitch:215: snd2.asm.bp.p_ctg.cut250.tigmint.fa] Error 2

I've run this script:

#!/bin/bash

#SBATCH --job-name=scaffold_genome
#SBATCH --output=scaffold_26_genome_%j.out
#SBATCH --error=scaffold_26_genome_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=300G
#SBATCH --time=2:00:00

scaffold_name="scaffold_26"

HIFI_GENOME="/lisc/scratch/diospyrosH/HiFibasedGenome/snd2.asm.bp.p_ctg.fa"
CLR_READS="/lisc/project/diospyrosH/sandwicensis/CLRreads/RawCLRreads/sandw.fasta.gz"

OUTPUT_DIR="/lisc/scratch/diospyrosH/scaffolding/${scaffold_name}_$(date +%Y-%m-%d)"

mkdir -p "$OUTPUT_DIR"
cd "$OUTPUT_DIR"

# Copy input files to working directory
cp "$HIFI_GENOME" ./snd2.asm.bp.p_ctg.fa
cp "$CLR_READS" ./sandw.fasta.gz

# General Options
THREADS=64
MIN_CONTIG_SIZE=1000
OUT_PREFIX="scaffold"

# Tigmint Options
TIGMINT_SPAN="auto"
TIGMINT_DIST="auto"
GENOME_SIZE=1200000000
LONGMAP_TECH="pb"

# ntLink Options
KMER_SIZE_NTLINK=20
WINDOW_SIZE_NTLINK=100
GAP_FILL=true
NTLINK_ROUNDS=3

# ARCS+LINKS Options
J_MIN_KMER_MATCH=0.05
KMER_SIZE_ARKS=20
C_MIN_READ_PAIRS=4
L_MIN_LINKS=4
A_MAX_LINK_RATIO=0.3

# Load modules
module load conda
conda activate longstitch_env

# Run longstitch
longstitch run draft=snd2.asm.bp.p_ctg reads=sandw G="$GENOME_SIZE" t="$THREADS" \
  out_prefix="$OUT_PREFIX" \
  longmap="$LONGMAP_TECH" \
  gap_fill="$GAP_FILL" 
  #k_ntLink="$KMER_SIZE_NTLINK" w="$WINDOW_SIZE_NTLINK"  rounds="$NTLINK_ROUNDS" \
  #j="$J_MIN_KMER_MATCH" k_arks="$KMER_SIZE_ARKS" c="$C_MIN_READ_PAIRS" l="$L_MIN_LINKS" a="$A_MAX_LINK_RATIO"

# Save parameters to param.txt
cat <<EOL > "$OUTPUT_DIR/param.txt"
scaffold_name=$scaffold_name
BASE_DIR=$BASE_DIR
HIFI_GENOME=$HIFI_GENOME
CLR_READS=$CLR_READS
LRScaf_JAR=$LRScaf_JAR
CLR_BAM_FILE=$CLR_BAM_FILE
OUTPUT_BASE_DIR=$OUTPUT_BASE_DIR
OUTPUT_DIR=$OUTPUT_DIR

# General Options
THREADS=$THREADS
MIN_CONTIG_SIZE=$MIN_CONTIG_SIZE
OUT_PREFIX=$OUT_PREFIX

# Tigmint Options
TIGMINT_SPAN=$TIGMINT_SPAN
TIGMINT_DIST=$TIGMINT_DIST
GENOME_SIZE=$GENOME_SIZE
LONGMAP_TECH=$LONGMAP_TECH

# ntLink Options
KMER_SIZE_NTLINK=$KMER_SIZE_NTLINK
WINDOW_SIZE_NTLINK=$WINDOW_SIZE_NTLINK
GAP_FILL=$GAP_FILL
NTLINK_ROUNDS=$NTLINK_ROUNDS

# ARCS+LINKS Options
J_MIN_KMER_MATCH=$J_MIN_KMER_MATCH
KMER_SIZE_ARKS=$KMER_SIZE_ARKS
C_MIN_READ_PAIRS=$C_MIN_READ_PAIRS
L_MIN_LINKS=$L_MIN_LINKS
A_MAX_LINK_RATIO=$A_MAX_LINK_RATIO
EOL

lcoombe (Member) commented Feb 6, 2025

Hello @Kactaceapengu,

Could you try updating your minimap2 installation to the most recent version, or at least to one from the last year? I see the error "minimap2: unrecognized option: secondary=no" in your log, which suggests that you are using an older version of the tool that predates the introduction of that option.
You could also test your updated installation with our demo - that can be really useful for sorting out these sorts of kinks more quickly.

Thank you for your interest in LongStitch!
Lauren
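
For anyone hitting the same message, a quick way to see which minimap2 the active conda environment resolves to is sketched below. This is only a rough check, not part of LongStitch: the `version_ge` helper and the `MIN_VERSION` threshold are hypothetical placeholders (the threshold is NOT the release in which `--secondary` was introduced), and the comparison assumes a `sort` that supports `-V`, as GNU coreutils does.

```shell
#!/bin/bash
# Sketch: report the minimap2 version found on PATH and compare it to a threshold.

# Succeeds if version $1 >= version $2 (assumes `sort -V` is available).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical placeholder threshold; adjust to whatever your pipeline needs.
MIN_VERSION="2.24"

if command -v minimap2 >/dev/null 2>&1; then
    # `minimap2 --version` prints something like "2.28-r1209";
    # keep only the part before the first "-".
    installed="$(minimap2 --version | cut -d- -f1)"
    echo "minimap2 found at: $(command -v minimap2)"
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "minimap2 $installed is at least $MIN_VERSION"
    else
        echo "minimap2 $installed is older than $MIN_VERSION; consider updating, e.g.:"
        echo "  conda install -c conda-forge -c bioconda 'minimap2>=$MIN_VERSION'"
    fi
else
    echo "minimap2 not found on PATH (is the conda environment activated?)"
fi
```

Running this right after `conda activate longstitch_env` shows whether the environment picks up its own minimap2 or an older copy from elsewhere on PATH.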


github-actions bot commented Mar 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your interest in LongStitch!

github-actions bot added the Stale label Mar 9, 2025