Commit 936fa9a

Merge pull request #139 from Joon-Klaps/reffurbish-mqc-implementation
Reffurbish mqc implementation
2 parents 0f8c0b9 + a4727fe commit 936fa9a

87 files changed, +2390 -2423 lines changed

.gitignore

+1
@@ -8,3 +8,4 @@ testing*
  *.pyc
  .vscode/
  null/
+ __pycache__/

.nf-core.yml

-4
@@ -34,10 +34,6 @@ lint:
  - CITATIONS.md
  - conf/test.config
  - conf/test_full.config
- - lib/Utils.groovy
- - lib/WorkflowMain.groovy
- - lib/NfcoreTemplate.groovy
- - lib/WorkflowViralgenie.groovy
  actions_ci: false
  template:
  name: viralgenie

CHANGELOG.md

+3
@@ -11,13 +11,16 @@ Initial release of Joon-Klaps/viralgenie, created with the [nf-core](https://nf-

  - Set default umitools dedup strategy to cluster ([#126](https://github.com/Joon-Klaps/viralgenie/pull/126))
  - Include both krakenreport &nodes.dmp in taxonomy filtering ([#128](https://github.com/Joon-Klaps/viralgenie/pull/128))
+ - Include coverage plot & subset contig results in mqc report ([#129](https://github.com/Joon-Klaps/viralgenie/pull/129))
  - Add Sspace indiv to each assembler seperatly ([#132](https://github.com/Joon-Klaps/viralgenie/pull/132))
  - Add read & contig decomplexification using prinseq++ ([#133](https://github.com/Joon-Klaps/viralgenie/pull/133))
  - Add option to filter contig clusters based on cumulative read coverage ([#138](https://github.com/Joon-Klaps/viralgenie/pull/138))
+ - Reffurbish mqc implementation ([#139](https://github.com/Joon-Klaps/viralgenie/pull/139))
  - Adding mash-screen output to result table ([#140](https://github.com/Joon-Klaps/viralgenie/pull/140))
  - Add logic to allow samples with no reference hits to be analysed ([#141](https://github.com/Joon-Klaps/viralgenie/pull/141))
  - Add visualisation for hybrid scaffold ([#143](https://github.com/Joon-Klaps/viralgenie/pull/143))

+
  ### `Fixed`

  - OOM with longer contigs for lowcov_to_reference, uses more RAM now ([#125](https://github.com/Joon-Klaps/viralgenie/pull/125))

README.md

+6 -3
@@ -127,15 +127,18 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

  ## Citations

- <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
  <!-- If you use nf-core/viralgenie for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
+ >[!WARNING]
+ > Viralgenie is currently not Published. Please cite as:
+ > Klaps J, Lemey P, Kafetzopoulou L. Viralgenie: A metagenomics analysis pipeline for eukaryotic viruses. __Github__ https://github.com/Joon-Klaps/viralgenie
+

  An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](https://joon-klaps.github.io/viralgenie/latest/CITATIONS) file.

- You can cite the `nf-core` publication as follows:
+ <!-- You can cite the `nf-core` publication as follows:

  > **The nf-core framework for community-curated bioinformatics pipelines.**
  >
  > Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
  >
- > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
+ > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). -->

assets/custom_table_headers.yml

+71 -62
@@ -1,60 +1,65 @@
- umitools_dedup:
- - input_reads: "mapped reads"
- - removed_reads
- - output_reads: "deduplicated reads"
- - positions_deduplicated
- - max_umi_per_pos
- - mean_umi_per_pos
- - total_umis
- - percent_passing_dedup: "% passing dedup"
- - unique_umis
- picard_dups:
- - UNPAIRED_READS_EXAMINED: "unpaired reads examined"
- - READ_PAIR_OPTICAL_DUPLICATES: "read pair optical duplicates"
- - UNMAPPED_READS: "unmapped reads"
- - ESTIMATED_LIBRARY_SIZE: "estimated library size"
- - UNPAIRED_READ_DUPLICATES: "unpaired read duplicates"
- - SECONDARY_OR_SUPPLEMENTARY_RDS: "secondary or supplementary rds"
- - READ_PAIRS_EXAMINED: "read pairs examined"
- - READ_PAIR_DUPLICATES: "read pair duplicates"
- - PERCENT_DUPLICATION: "% duplication"
- - LIBRARY: "library"
- samtools_stats:
- - sequences
- - reads_paired_percent: "reads paired %"
- - average_length
- - is_sorted
- - bases_mapped_(cigar)
- - reads_QC_failed_percent: "reads QC failed %"
- - reads_unmapped
- - reads_unmapped_percent: "reads unmapped %"
- - reads_properly_paired_percent: "reads properly paired %"
- - average_quality
- - reads_paired
- - non-primary_alignments
- - supplementary_alignments
- - reads_mapped
- - reads_mapped_percent: "reads mapped %"
- - bases_trimmed
- - bases_duplicated
- - reads_properly_paired
- - outward_oriented_pairs
- - reads_duplicated
- - reads_duplicated_percent: "reads duplicated %"
- - bases_mapped
- - insert_size_average
- - insert_size_standard_deviation
- - inward_oriented_pairs
- - error_rate
- - mismatches
- - reads_MQ0
- - total_length
- - reads_QC_failed
+ failed_mapped:
+ - mapped reads
+ umitools:
+ - multiqc_umitools_dedup:
+ - input_reads: "mapped reads"
+ - removed_reads
+ - output_reads: "deduplicated reads"
+ - positions_deduplicated
+ - max_umi_per_pos
+ - mean_umi_per_pos
+ - total_umis
+ - percent_passing_dedup: "% passing dedup"
+ - unique_umis
+ picard:
+ - multiqc_picard_dups:
+ - UNPAIRED_READS_EXAMINED: "unpaired reads examined"
+ - READ_PAIR_OPTICAL_DUPLICATES: "read pair optical duplicates"
+ - UNMAPPED_READS: "unmapped reads"
+ - ESTIMATED_LIBRARY_SIZE: "estimated library size"
+ - UNPAIRED_READ_DUPLICATES: "unpaired read duplicates"
+ - SECONDARY_OR_SUPPLEMENTARY_RDS: "secondary or supplementary rds"
+ - READ_PAIRS_EXAMINED: "read pairs examined"
+ - READ_PAIR_DUPLICATES: "read pair duplicates"
+ - PERCENT_DUPLICATION: "% duplication"
+ - LIBRARY: "library"
+ samtools:
+ - multiqc_samtools_stats:
+ - sequences: "reads mapped"
+ - reads_paired_percent: "reads paired %"
+ - average_length
+ - is_sorted
+ - bases_mapped_(cigar)
+ - reads_QC_failed_percent: "reads QC failed %"
+ - reads_unmapped
+ - reads_unmapped_percent: "reads unmapped %"
+ - reads_properly_paired_percent: "reads properly paired %"
+ - average_quality
+ - reads_paired
+ - non-primary_alignments
+ - supplementary_alignments
+ - reads_mapped
+ - reads_mapped_percent: "reads mapped %"
+ - bases_trimmed
+ - bases_duplicated
+ - reads_properly_paired
+ - outward_oriented_pairs
+ - reads_duplicated
+ - reads_duplicated_percent: "reads duplicated %"
+ - bases_mapped
+ - insert_size_average
+ - insert_size_standard_deviation
+ - inward_oriented_pairs
+ - error_rate
+ - mismatches
+ - reads_MQ0
+ - total_length
+ - reads_QC_failed
  ivar_variants:
  - INS: "raw inserts"
  - SNP: "raw SNPs"
  - DEL: "raw deletions"
- bcftools_stats:
+ bcftools:
  - number_of_indels: "number of indels"
  - number_of_samples: "number of samples"
  - number_of_SNPs: "number of SNPs"
@@ -83,12 +88,16 @@ bcftools_stats:
  - substitution_type_G>A: "substitution G->A"
  - substitution_type_A>G: "substitution A->G"
  general_stats:
- - mosdepth-1_x_pc: "mosdepth 1X coverage"
- - mosdepth-5_x_pc: "mosdepth 5X coverage"
- - mosdepth-10_x_pc: "mosdepth 10X coverage"
- - mosdepth-30_x_pc: "mosdepth 30X coverage"
- - mosdepth-30_x_pc: "mosdepth 50X coverage"
- - mosdepth-median_coverage: "mosdepth Median read depth"
- - mosdepth-mean_coverage: "mosdepth Mean read depth"
- - mosdepth-min_coverage: "mosdepth Min read depth"
- - mosdepth-max_coverage: "mosdepth Max read depth"
+ - "CLUSTER: mosdepth.mean_coverage": "mosdepth Mean read depth"
+ - "CLUSTER: mosdepth.min_coverage": "mosdepth Min read depth"
+ - "CLUSTER: mosdepth.max_coverage": "mosdepth Max read depth"
+ - "CLUSTER: mosdepth.median_coverage": "mosdepth Median read depth"
+ - "CLUSTER: mosdepth.1_x_pc": "mosdepth 1X coverage"
+ - "CLUSTER: mosdepth.5_x_pc": "mosdepth 5X coverage"
+ - "CLUSTER: mosdepth.10_x_pc": "mosdepth 10X coverage"
+ - "CLUSTER: mosdepth.50_x_pc": "mosdepth 50X coverage"
+ - "CLUSTER: mosdepth.100_x_pc": "mosdepth 100X coverage"
+ - "CLUSTER: mosdepth.200_x_pc": "mosdepth 200X coverage"
+ - "CLUSTER: mosdepth.500_x_pc": "mosdepth 500X coverage"
+ - "CLUSTER: mosdepth.750_x_pc": "mosdepth 750X coverage"
+ - "CLUSTER: mosdepth.1000_x_pc": "mosdepth 1000X coverage"
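
Note on the new layout: the header file now appears to nest some sections one level deeper (for example `umitools` containing `multiqc_umitools_dedup`), while others such as `ivar_variants` stay flat. A minimal Python sketch of how such a mixed structure could be flattened into a column-rename mapping follows. The nesting is an assumption, since this diff view drops indentation, and `flatten_headers` is a hypothetical helper, not code from the pipeline.

```python
# Hypothetical, hand-written stand-in for part of custom_table_headers.yml.
# The nesting shown here is an assumption: the diff view drops indentation.
HEADERS = {
    "umitools": [
        {"multiqc_umitools_dedup": [
            {"input_reads": "mapped reads"},
            "removed_reads",
            {"output_reads": "deduplicated reads"},
        ]},
    ],
    "ivar_variants": [
        {"INS": "raw inserts"},
        {"SNP": "raw SNPs"},
    ],
}


def flatten_headers(entries):
    """Return {column_key: display_name} for one section's entries.

    A bare string keeps its own name, a {key: str} item is a rename,
    and a {key: list} item is a nested sub-table that is flattened.
    """
    renames = {}
    for entry in entries:
        if isinstance(entry, str):
            renames[entry] = entry
            continue
        for key, value in entry.items():
            if isinstance(value, list):
                renames.update(flatten_headers(value))
            else:
                renames[key] = value
    return renames


umitools_columns = flatten_headers(HEADERS["umitools"])
ivar_columns = flatten_headers(HEADERS["ivar_variants"])
```

Treating bare strings as "keep the original name" is one possible reading of entries such as `removed_reads`; the pipeline may handle them differently.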

assets/methods_description_template.yml

-5
@@ -3,7 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag
  section_name: "Joon-Klaps/viralgenie Methods Description"
  section_href: "https://github.com/Joon-Klaps/viralgenie"
  plot_type: "html"
- ## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  ## You inject any metadata in the Nextflow '${workflow}' object
  data: |
  <h4>Methods</h4>
@@ -13,10 +12,6 @@ data: |
  <p>${tool_citations}</p>
  <h4>References</h4>
  <ul>
- <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
- <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
- <li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
- <li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
  ${tool_bibliography}
  </ul>
  <div class="alert alert-info">

assets/mqc_comment/blast_mqc.txt

-5
This file was deleted.

assets/mqc_comment/checkv_mqc.txt

-5
This file was deleted.

assets/mqc_comment/clusters_summary_mqc.txt

-18
This file was deleted.

assets/mqc_comment/contig_overview_mqc.txt

-5
This file was deleted.

assets/mqc_comment/mapping_constrains_mqc.txt

-5
This file was deleted.

assets/mqc_comment/mapping_constrains_summary_mqc.txt

-5
This file was deleted.

assets/mqc_comment/quast_mqc.txt

-4
This file was deleted.

assets/mqc_comment/sample_metadata_mqc.txt

-5
This file was deleted.

assets/multiqc_config.yml

+21 -26
@@ -4,7 +4,7 @@ report_comment: >
  <a href="https://joon-klaps.github.io/viralgenie/latest/dev/usage/" target="_blank">documentation</a>.

  export_plots: true
-
+ data_format: "yaml"
  max_table_rows: 100000

  report_section_order:
@@ -18,36 +18,18 @@ report_section_order:
  before: summary_contigs
  failed_mapped:
  before: summary_contigs
- contig_overview:
+ cluster_summary:
  before: samtools_stats
  "viralgenie-methods-description":
  order: -1000
- software_versions:
+ "Software-versions":
  order: -1001
  "Joon-Klaps-viralgenie-summary":
  order: -1002

  use_filename_as_sample_name:
  - fastp

- run_modules:
- - custom_content
- - fastqc
- - fastp
- - trimmomatic
- - humid
- - bbduk
- - umitools
- - bowtie2
- - kaiju
- - mosdepth
- - kraken
- - bracken
- - quast
- - samtools
- - bcftools
- - picard
-
  module_order:
  - "fastqc":
  name: "SAMPLE: FastQC (Raw)"
@@ -98,6 +80,8 @@ module_order:
  anchor: "quast_trinity"
  path_filters:
  - "*_trinity.tsv"
+ - "cluster-summary":
+ name: "SAMPLE: Contig clustering"
  - "picard":
  name: "CLUSTER: Picard"
  - "umitools":
@@ -110,23 +94,35 @@ module_order:
  name: "CLUSTER: Bcftools"
  - "custom_content"

+ mosdepth_config:
+ general_stats_coverage:
+ - 1
+ - 5
+ - 10
+ - 50
+ - 100
+ - 200
+ - 500
+ - 750
+ - 1000
+
  # Summary table names
  table_columns_name:
  "SAMPLE: FastQC (raw)":
  total_sequences: "Nr. Input Reads"
- avg_sequence_length: "Length Input Reads"
+ avg_sequence_length: "Average Length Input Reads"
  percent_gc: "% GC Input Reads"
  percent_duplicates: "% Dups Input Reads"
  percent_fails: "% Failed Input Reads"
  "SAMPLE: FastQC (post-Trimming)":
  total_sequences: "Nr. reads post Trimming"
- avg_sequence_length: "Length reads post Trimming"
+ avg_sequence_length: "Average Length reads post Trimming"
  percent_gc: "% GC reads post Trimming"
  percent_duplicates: "% Dups reads post Trimming"
  percent_fails: "% Failed reads post Trimming"
  "SAMPLE: FastQC (post-Host-removal)":
  total_sequences: "Nr. Processed Reads"
- avg_sequence_length: "Length Processed Reads"
+ avg_sequence_length: "Average Length Processed Reads"
  percent_gc: "% GC Processed Reads"
  percent_duplicates: "% Dups Processed Reads"
  percent_fails: "% Failed Processed Reads"
@@ -150,6 +146,7 @@ table_columns_name:
  pct_unclassified: "% non-host reads"
  "SAMPLE: Kraken2 (Diversity)":
  pct_top_n: "% reads top 5 Species (Kraken2)"
+ pct_unclassified: "% reads unclassified (Kraken2)"
  "SAMPLE: KAIJU (Diversity)":
  "% Assigned": "% Reads assigned (Kaiju)"
  assigned: "M reads assigned (Kaiju)"
@@ -205,8 +202,6 @@ table_columns_visible:
  Largest contig: True
  Total length: False

- skip_versions_section: true
-
  # Viral reads are not so big
  read_count_multiplier: 1
  read_count_prefix: ""
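
Note: the nine `general_stats_coverage` thresholds added here match the nine `CLUSTER: mosdepth.<N>_x_pc` keys added to `assets/custom_table_headers.yml` in the same commit. A minimal Python sketch of deriving one list from the other, so the two files could be checked for drift, follows. `coverage_header_entries` is a hypothetical helper; the key pattern is copied from the header file, and whether MultiQC emits exactly these keys is an assumption.

```python
# Thresholds copied from the mosdepth_config.general_stats_coverage list in this commit.
COVERAGE_THRESHOLDS = [1, 5, 10, 50, 100, 200, 500, 750, 1000]


def coverage_header_entries(thresholds, module_name="CLUSTER: mosdepth"):
    """Build {general-stats key: display name} pairs, one per threshold.

    The key pattern mirrors the entries visible in custom_table_headers.yml;
    it is an assumption that MultiQC produces keys of exactly this form.
    """
    return {
        f"{module_name}.{depth}_x_pc": f"mosdepth {depth}X coverage"
        for depth in thresholds
    }


entries = coverage_header_entries(COVERAGE_THRESHOLDS)
```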
-76.6 KB
Binary file not shown.

assets/schemas/mapping_constrains.json

+1 -1
@@ -55,5 +55,5 @@
  "segment": ["species"]
  }
  },
- "allOf": [{ "uniqueEntries": ["id","species", "segment"] }, { "uniqueEntries": ["id"] }]
+ "allOf": [{ "uniqueEntries": ["id", "species", "segment"] }, { "uniqueEntries": ["id"] }]
  }
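
Note: the change to this line is whitespace only. For readers unfamiliar with the constraint, a minimal Python sketch of what the two `uniqueEntries` rules would check follows, assuming the keyword means that the listed fields, taken together, must be unique across rows. `find_duplicate_entries` and the sample rows are hypothetical and are not the validator's actual implementation.

```python
def find_duplicate_entries(rows, keys):
    """Return the key tuples that occur more than once across rows."""
    seen, duplicates = set(), []
    for row in rows:
        combo = tuple(row.get(key) for key in keys)
        if combo in seen and combo not in duplicates:
            duplicates.append(combo)
        seen.add(combo)
    return duplicates


# Hypothetical samplesheet rows, invented for illustration.
rows = [
    {"id": "ref_a", "species": "virus_x", "segment": "L"},
    {"id": "ref_b", "species": "virus_x", "segment": "S"},
    {"id": "ref_a", "species": "virus_y", "segment": "L"},
]

triple_duplicates = find_duplicate_entries(rows, ["id", "species", "segment"])
id_duplicates = find_duplicate_entries(rows, ["id"])
```

In this example the triple check passes while the `id` check reports `ref_a`. Under the stated assumption, a unique `id` already implies a unique triple, so the second rule is the stricter of the two.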
