Commit 6c1da1a

Merge pull request #104 from Joon-Klaps/docs-continue
Continue making docs
2 parents 4855778 + 079d699 commit 6c1da1a

75 files changed, +2499 -775 lines changed

.github/ISSUE_TEMPLATE/bug_report.yml

+1-1
@@ -8,7 +8,7 @@ body:
 Before you post this issue, please check the documentation:

 - [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
-- [nf-core/viralgenie pipeline documentation NOT YET IMPLEMENTED](https://nf-co.re/viralgenie/usage)
+- [Viralgenie pipeline documentation](https://joon-klaps.github.io/viralgenie/latest/usage)

 - type: textarea
 id: description

.github/ISSUE_TEMPLATE/config.yml

+4-4
@@ -1,7 +1,7 @@
 contact_links:
 - name: Join nf-core
 url: https://nf-co.re/join
-about: Please join the nf-core community here
-- name: "Slack #viralgenie channel"
-url: https://nfcore.slack.com/channels/viralgenie
-about: Discussion about the nf-core/viralgenie pipeline
+about: Please join the nf-core community here for general nextflow pipeline discussions.
+- name: "Find me on the nf-core slack"
+url: https://nfcore.slack.com/team/U043Y6FQR6J
+about: Ask me anything about viralgenie on slack.

.github/ISSUE_TEMPLATE/feature_request.yml

+1-1
@@ -1,5 +1,5 @@
 name: Feature request
-description: Suggest an idea for the nf-core/viralgenie pipeline
+description: Suggest an idea for the viralgenie pipeline
 labels: enhancement
 body:
 - type: textarea
@@ -0,0 +1,11 @@
+name: Website feedback
+description: Report an issue or suggest an improvement for the viralgenie website
+labels: ["website", "documentation"]
+body:
+- type: textarea
+id: feedback
+attributes:
+label: Feedback
+description: Please describe the issue or suggestion you have for the website.
+validations:
+required: true

.github/workflows/build-docs.yml

+3-1
@@ -36,7 +36,9 @@ jobs:
 restore-keys: |
 mkdocs-material-
 - name: Install dependencies
-run: pip install mkdocs-material pymdown-extensions pillow cairosvg mike
+run: pip install mkdocs-material pymdown-extensions pillow cairosvg mike nf-core
+- name: Build parameter docs
+run: nf-core schema docs --format markdown --columns parameter,description,default --output docs/parameters.md --force
 - name: Build docs
 run: mike deploy --push --update-aliases ${{ env.plugin_version }} latest
 - name: Set default docs

CHANGELOG.md

+1-1
@@ -1,4 +1,4 @@
-# nf-core/viralgenie: Changelog
+# Viralgenie: Changelog

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

CITATIONS.md

+155
@@ -0,0 +1,155 @@
+# Citations
+
+## [Viralgenie](https://github.com/Joon-Klaps/viralgenie)
+
+!!! warning
+Viralgenie is currently not Published. Please cite as:
+
+- Klaps J, Lemey P, Kafetzopoulou L. Viralgenie: A metagenomics analysis pipeline for eukaryotic viruses. __Github__ https://github.com/Joon-Klaps/viralgenie
+
+## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)
+
+> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
+
+## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
+
+> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
+
+## Pipeline tools
+
+- [Bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/)
+
+> Bushnell B. (2022) BBMap, URL: http://sourceforge.net/projects/bbmap/
+
+- [BCFtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)
+
+> Danecek, Petr et al. “Twelve years of SAMtools and BCFtools.” GigaScience vol. 10,2 (2021): giab008. doi:10.1093/gigascience/giab008
+
+- [blast](https://pubmed.ncbi.nlm.nih.gov/20003500/)
+
+>Camacho, Christiam et al. “BLAST+: architecture and applications.” BMC bioinformatics vol. 10 421. 15 Dec. 2009, doi:10.1186/1471-2105-10-421
+
+- [Bowtie2](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
+
+> Langmead, Ben, and Steven L Salzberg. “Fast gapped-read alignment with Bowtie 2.” Nature methods vol. 9,4 357-9. 4 Mar. 2012, doi:10.1038/nmeth.1923
+
+- [BWA-MEM](https://github.com/lh3/bwa)
+
+> Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2.
+
+- [BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)
+
+> M. Vasimuddin, S. Misra, H. Li and S. Aluru, "Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems," 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 2019, pp. 314-324, doi: 10.1109/IPDPS.2019.00041.
+
+- [cdhit](https://pubmed.ncbi.nlm.nih.gov/23060610/)
+
+> Fu, Limin et al. “CD-HIT: accelerated for clustering the next-generation sequencing data.” Bioinformatics (Oxford, England) vol. 28,23 (2012): 3150-2. doi:10.1093/bioinformatics/bts565
+
+- [checkv](https://pubmed.ncbi.nlm.nih.gov/33349699/)
+
+> Nayfach, Stephen et al. “CheckV assesses the quality and completeness of metagenome-assembled viral genomes.” Nature biotechnology vol. 39,5 (2021): 578-585. doi:10.1038/s41587-020-00774-7
+
+- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
+
+> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
+
+- [fastp](https://github.com/OpenGene/fastp)
+
+> Chen, Shifu et al. “fastp: an ultra-fast all-in-one FASTQ preprocessor.” Bioinformatics (Oxford, England) vol. 34,17 (2018): i884-i890. doi:10.1093/bioinformatics/bty560
+
+- [iVar](https://www.ncbi.nlm.nih.gov/pubmed/30621750/)
+
+> Grubaugh, Nathan D et al. “An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar.” Genome biology vol. 20,1 8. 8 Jan. 2019, doi:10.1186/s13059-018-1618-7
+
+- [Kaiju](https://pubmed.ncbi.nlm.nih.gov/27071849/)
+
+> Menzel, Peter et al. “Fast and sensitive taxonomic classification for metagenomics with Kaiju.” Nature communications vol. 7 11257. 13 Apr. 2016, doi:10.1038/ncomms11257
+
+- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)
+
+> Wood, Derrick E., Jennifer Lu, and Ben Langmead. 2019. Improved Metagenomic Analysis with Kraken 2. Genome Biology 20 (1): 257. doi: 10.1186/s13059-019-1891-0.
+
+- [leiden-algorithm](https://pubmed.ncbi.nlm.nih.gov/30914743/)
+
+> Traag, V A et al. “From Louvain to Leiden: guaranteeing well-connected communities.” Scientific reports vol. 9,1 5233. 26 Mar. 2019, doi:10.1038/s41598-019-41695-z
+
+- [Mash](https://pubmed.ncbi.nlm.nih.gov/27323842/)
+
+> Ondov, Brian D et al. “Mash: fast genome and metagenome distance estimation using MinHash.” Genome biology vol. 17,1 132. 20 Jun. 2016, doi:10.1186/s13059-016-0997-x
+
+- [Megahit](https://pubmed.ncbi.nlm.nih.gov/27012178/)
+
+> Li, Dinghua et al. “MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.” Methods (San Diego, Calif.) vol. 102 (2016): 3-11. doi:10.1016/j.ymeth.2016.02.020
+
+- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)
+
+> Li, Heng. “Minimap2: pairwise alignment for nucleotide sequences.” Bioinformatics (Oxford, England) vol. 34,18 (2018): 3094-3100. doi:10.1093/bioinformatics/bty191
+
+- [MMseqs2](https://pubmed.ncbi.nlm.nih.gov/29035372/)
+
+> Steinegger, Martin, and Johannes Söding. “MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets.” Nature biotechnology vol. 35,11 (2017): 1026-1028. doi:10.1038/nbt.3988
+
+- [Mosdepth](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030888/)
+
+> Pedersen, Brent S, and Aaron R Quinlan. “Mosdepth: quick coverage calculation for genomes and exomes.” Bioinformatics (Oxford, England) vol. 34,5 (2018): 867-868. doi:10.1093/bioinformatics/btx699
+
+- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+
+> Ewels, Philip et al. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics (Oxford, England) vol. 32,19 (2016): 3047-8. doi:10.1093/bioinformatics/btw354
+
+- [picard-tools](http://broadinstitute.github.io/picard)
+
+- [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)
+
+> Gurevich, Alexey et al. “QUAST: quality assessment tool for genome assemblies.” Bioinformatics (Oxford, England) vol. 29,8 (2013): 1072-5. doi:10.1093/bioinformatics/btt086
+
+- [SAMtools](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198575/)
+
+> Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. Epub 2011 Sep 8. PMID: 21903627; PMCID: PMC3198575.
+
+- [SPAdes](https://www.ncbi.nlm.nih.gov/pubmed/24093227/)
+
+> Bankevich, Anton et al. “SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.” Journal of computational biology : a journal of computational molecular cell biology vol. 19,5 (2012): 455-77. doi:10.1089/cmb.2012.0021
+
+- [Trimmomatic](https://pubmed.ncbi.nlm.nih.gov/24695404/)
+
+> Bolger, Anthony M et al. “Trimmomatic: a flexible trimmer for Illumina sequence data.” Bioinformatics (Oxford, England) vol. 30,15 (2014): 2114-20. doi:10.1093/bioinformatics/btu170
+
+- [Trinity](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571712/)
+
+> Haas, Brian J et al. “De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.” Nature protocols vol. 8,8 (2013): 1494-512. doi:10.1038/nprot.2013.084
+
+- [UMI-tools](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340976/)
+
+> Smith, Tom et al. “UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.” Genome research vol. 27,3 (2017): 491-499. doi:10.1101/gr.209601.116
+
+- [vRhyme](https://pubmed.ncbi.nlm.nih.gov/35544285/)
+
+> Kieft, Kristopher et al. “vRhyme enables binning of viral genomes from metagenomes.” Nucleic acids research vol. 50,14 (2022): e83. doi:10.1093/nar/gkac341
+
+- [VSEARCH](https://pubmed.ncbi.nlm.nih.gov/27521926/)
+
+> Rognes, Torbjørn et al. “VSEARCH: a versatile open source tool for metagenomics.” PeerJ vol. 4 e2584. 18 Oct. 2016, doi:10.7717/peerj.2584
+
+
+## Software packaging/containerisation tools
+
+- [Anaconda](https://anaconda.com)
+
+> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
+
+- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
+
+> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
+
+- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
+
+> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
+
+- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
+
+> Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
+
+- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
+
+> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

README.md

+3-4
@@ -22,7 +22,7 @@

 ## Introduction

-**Joon-Klaps/viralgenie** is a bioinformatics best-practice analysis pipeline for reconstructing consensus genomes and to identify intra-host variants from metagenomic sequencing data or enriched based sequencing data like hybrid capture.
+**Viralgenie** is a bioinformatics best-practice analysis pipeline for reconstructing consensus genomes and to identify intra-host variants from metagenomic sequencing data or enriched based sequencing data like hybrid capture.

 ## Pipeline summary

@@ -59,8 +59,8 @@
 12. Variant calling and filtering ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html),[`iVar`](https://andersen-lab.github.io/ivar/html/manualpage.html))
 13. Create consensus genome ([`BCFTools`](http://samtools.github.io/bcftools/bcftools.html),[`iVar`](https://andersen-lab.github.io/ivar/html/manualpage.html))
 14. Repeat step 10-13 multiple times for the denovo contig route
-15. Contig evaluation and annotation ([`QUAST`](http://quast.sourceforge.net/quast),[`CheckV`](https://bitbucket.org/berkeleylab/checkv/src/master/),[`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi), [`mmseqs-search`](https://github.com/soedinglab/MMseqs2/wiki#batch-sequence-searching-using-mmseqs-search))
-16. Present QC and visualisation for raw read, alignment, assembly, variant calling and consensus calling results ([`MultiQC`](http://multiqc.info/))
+15. Consensus evaluation and annotation ([`QUAST`](http://quast.sourceforge.net/quast),[`CheckV`](https://bitbucket.org/berkeleylab/checkv/src/master/),[`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi), [`mmseqs-search`](https://github.com/soedinglab/MMseqs2/wiki#batch-sequence-searching-using-mmseqs-search))
+16. Result summary visualisation for raw read, alignment, assembly, variant calling and consensus calling results ([`MultiQC`](http://multiqc.info/))

 ## Usage

@@ -92,7 +92,6 @@ We thank the following people for their extensive assistance in the development
 - [`Liana Kafetzopoulou`](https://github.com/LianaKafetzopoulou)
 - [`nf-core community`](https://nf-co.re/)

-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

 ## Contributions and Support

assets/multiqc_config.yml

+6-4
@@ -95,16 +95,16 @@ module_order:
 anchor: "quast_trinity"
 path_filters:
 - "*_trinity.tsv"
-- "samtools":
-name: "CLUSTER: Samtools Stats"
-- "bcftools":
-name: "CLUSTER: Bcftools"
 - "picard":
 name: "CLUSTER: Picard"
 - "umitools":
 name: "CLUSTER: UMI-tools"
+- "samtools":
+name: "CLUSTER: Samtools Stats"
 - "mosdepth":
 name: "CLUSTER: mosdepth"
+- "bcftools":
+name: "CLUSTER: Bcftools"
 - "custom_content"

 # Summary table names
@@ -237,3 +237,5 @@ extra_fn_clean_exts:
 - ".norm"
 - ".consensus_bcftools"
 - ".consensus_ivar"
+- "-CONSTRAIN_constrain"
+- "-CONSTRAIN_itvariant_calling"
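
The two suffixes added to `extra_fn_clean_exts` change how sample names are derived from file names in the report. As a rough illustration only, assuming MultiQC's default truncation behaviour for plain string entries (cut the name at the first match and drop everything to its right), the effect can be sketched in plain Python. The function `clean_sample_name` and the example file names are hypothetical and this is not MultiQC's own code.

```python
# Hypothetical sketch of suffix-based sample name cleaning.
# Assumption: each entry truncates the name at its first occurrence.
CLEAN_EXTS = [
    ".norm",
    ".consensus_bcftools",
    ".consensus_ivar",
    "-CONSTRAIN_constrain",
    "-CONSTRAIN_itvariant_calling",
]


def clean_sample_name(name, clean_exts):
    """Truncate `name` at the first occurrence of each listed string."""
    for ext in clean_exts:
        idx = name.find(ext)
        if idx != -1:
            # Keep only the part to the left of the match.
            name = name[:idx]
    return name


# Made-up file names, used only to show the truncation.
for fname in ["sampleA-CONSTRAIN_constrain.stats", "sampleB.consensus_ivar.tsv"]:
    print(fname, "->", clean_sample_name(fname, CLEAN_EXTS))
```

With entries like these, outputs from the constrain route would collapse onto the same sample name as the other outputs of that sample, which is presumably why the suffixes were added.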

bin/custom_multiqc_tables.py

+2-2
@@ -80,8 +80,8 @@ def file_choices(choices, fname):
 parser.add_argument(
 "--sample_metadata",
 metavar="SAMPLE METADATA",
-help="Sample metadata file containing information on the samples, supported formats: '.csv', '.tsv'",
-type=lambda s: file_choices(("csv", "tsv"), s),
+help="Sample metadata file containing information on the samples, supported formats: '.csv', '.tsv', '.yaml', '.yml'",
+type=lambda s: file_choices(("csv", "tsv", "yaml","yml"), s),
 )

 parser.add_argument(
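
The body of `file_choices` is not part of this hunk, so the following is only a minimal sketch of how an extension check of this shape is commonly written as an `argparse` type callback. The function body, the error message, and the file name `samples.yml` are assumptions, not the script's actual implementation.

```python
import argparse
from pathlib import Path


def file_choices(choices, fname):
    """Return `fname` as a Path if its extension is in `choices` (assumed behaviour)."""
    ext = Path(fname).suffix.lstrip(".").lower()
    if ext not in choices:
        # argparse turns this exception into a normal usage error.
        raise argparse.ArgumentTypeError(
            f"File '{fname}' must end with one of: {', '.join(choices)}"
        )
    return Path(fname)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--sample_metadata",
    metavar="SAMPLE METADATA",
    type=lambda s: file_choices(("csv", "tsv", "yaml", "yml"), s),
)

# Hypothetical invocation with a YAML metadata file.
args = parser.parse_args(["--sample_metadata", "samples.yml"])
print(args.sample_metadata)
```

Checking the extension inside the `type` callback rejects an unsupported file at argument-parsing time, before any reading is attempted.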

bin/ivar_variants_to_vcf.py

+1-3
@@ -1,4 +1,5 @@
 #!/usr/bin/env python
+# Taken from https://github.com/nf-core/viralrecon/blob/master/bin/ivar_varaints_to_vcf.py

 import argparse
 import errno
@@ -11,9 +12,6 @@
 from Bio import SeqIO
 from scipy.stats import fisher_exact

-# Taken from https://github.com/nf-core/viralrecon/blob/master/bin/ivar_varaints_to_vcf.py
-
-
 def parse_args(args=None):
 Description = "Convert iVar variants TSV file to VCF format."
 Epilog = """Example usage: python ivar_variants_to_vcf.py <file_in> <file_out>"""

bin/lowcov_to_reference.py

+4-3
@@ -42,7 +42,7 @@ def parse_args(argv=None):
 "--mpileup",
 metavar="MPILEUP FILE",
 type=Path,
-help="Mpileup file in (default) tsv format.",
+help="Mpileup file in (default) tsv format, typically from iVar consensus.",
 )

 parser.add_argument(
@@ -143,14 +143,15 @@ def alignment_replacement(reference_record, consensus_record, regions):
 alignments = aligner.align(str(reference_record.seq), str(consensus_record.seq))
 alignment = alignments[0]

-target_locations = alignment.aligned[0]
-query_locations = alignment.aligned[1]
+target_locations = alignment.aligned[0] # Reference locations
+query_locations = alignment.aligned[1] # Consensus locations

 logger.debug(alignment.aligned)

 with open("alignment.txt", "w") as f:
 f.write(str(alignment))

+# Account for the gaps in the alignment, by updating the consensus indexes
 logger.info("> Finding target tuples")
 indexes_differences = find_target_tuples_sorted(regions, target_locations)
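
The new comments describe `alignment.aligned[0]` and `alignment.aligned[1]` as paired lists of gap-free blocks on the reference and on the consensus. The sketch below shows, in plain Python, one way such paired `(start, end)` blocks can translate a reference interval into consensus coordinates once gaps shift the indexes. The helper `map_reference_region` and the example coordinates are hypothetical, and a forward-strand alignment is assumed; this is not the script's `find_target_tuples_sorted`.

```python
def map_reference_region(region, target_blocks, query_blocks):
    """Map a half-open reference interval onto consensus coordinates.

    Only the parts of `region` that fall inside aligned (gap-free)
    blocks are mapped; positions inside gaps have no counterpart.
    """
    start, end = region
    mapped = []
    for (t_start, t_end), (q_start, _q_end) in zip(target_blocks, query_blocks):
        overlap_start = max(start, t_start)
        overlap_end = min(end, t_end)
        if overlap_start >= overlap_end:
            continue  # region does not touch this block
        # Within a gap-free block the shift between sequences is constant.
        offset = q_start - t_start
        mapped.append((overlap_start + offset, overlap_end + offset))
    return mapped


# Made-up blocks: reference 0-10 aligns to consensus 0-10, reference 10-15
# is missing from the consensus, reference 15-30 aligns to consensus 10-25.
target_blocks = [(0, 10), (15, 30)]
query_blocks = [(0, 10), (10, 25)]

print(map_reference_region((8, 20), target_blocks, query_blocks))
# prints [(8, 10), (10, 15)]
```

In this example, the five reference bases absent from the consensus shift every later consensus index by five, which is the kind of adjustment the added comment refers to.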

bin/network_cluster.py

+1-1
@@ -25,7 +25,7 @@ def parse_args(argv=None):
 "file_in",
 metavar="FILE_IN",
 type=Path,
-help="cluster file from chdit or vsearch containing cluster information.",
+help="Matrix with distance values of genomes.",
 )

 parser.add_argument(
