Commit 25c70fb

Merge branch 'dev' into refactor-arguments
Signed-off-by: Joon Klaps <joon.klaps@kuleuven.be>
2 parents 215fdff + 0fb6e45 commit 25c70fb

7 files changed (+33 -9 lines)

.nf-core.yml (-1)

@@ -23,7 +23,6 @@ lint:
  - manifest.name
  - manifest.homePage
  - config_defaults:
- - params.multiqc_comment_headers
  - params.custom_table_headers
  multiqc_config: false
  files_exist:

CHANGELOG.md (+1 -1)

@@ -26,7 +26,7 @@ Initial release of Joon-Klaps/viralgenie, created with the [nf-core](https://nf-
  - Constrain -> Constraint & further python script debugging ([#161](https://github.com/Joon-Klaps/viralgenie/pull/161))
  - include empty samples in multiqc sample overview ([#162](https://github.com/Joon-Klaps/viralgenie/pull/162))
  - Include samtools stats pre dedup & post dedup in overview tables ([#163](https://github.com/Joon-Klaps/viralgenie/pull/163))
- - adding prodigal with fitler setup for selecting final result genomes ([#165](https://github.com/Joon-Klaps/viralgenie/pull/165))
+ - adding prokka for gene detection & annotation ([#165](https://github.com/Joon-Klaps/viralgenie/pull/165))
  - Refactor module arguments to pipeline arguments ([#166](https://github.com/Joon-Klaps/viralgenie/pull/166))

  ### `Fixed`

CITATIONS.md (+4)

@@ -103,6 +103,10 @@

  - [picard-tools](http://broadinstitute.github.io/picard)

+ - [prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/)
+
+ > Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics (Oxford, England) vol. 30,14 (2014): 2068-9. doi:10.1093/bioinformatics/btu153
+
  - [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)

  > Gurevich, Alexey et al. “QUAST: quality assessment tool for genome assemblies.” Bioinformatics (Oxford, England) vol. 29,8 (2013): 1072-5. doi:10.1093/bioinformatics/btt086

docs/output.md (+15 -1)

@@ -700,6 +700,16 @@ Consensus quality control is done with multiple tools, the results are stored in
  - `<sample-id>/<sample-id>_<cl# | constraint-id>/contamination.tsv`: A detailed overview of how contamination was estimated.
  - `<sample-id>/<sample-id>_<cl# | constraint-id>/complete_genomes.tsv`: A detailed overview of putative genomes identified.

+
+ ### Prokka
+
+ [`Prokka`](https://github.com/tseemann/prokka) is a whole genome annotation pipeline for identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes.)
+
+ ???- abstract "Output files"
+
+ - `consensus/quality_control/prokka/`
+ - `<sample-id>/<iteration>/* directories containing the prokka output files.
+
  ### BLASTn

  [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. The output from the BLAST run is stored in the directory `consensus/quality_control/blast/`. Final consensus genomes are searched against the `--reference_pool`.
@@ -790,11 +800,15 @@ Furthermore, viralgenie runs MultiQC 2 times, as it uses the output from multiqc

  ???- abstract "Output files"
  - `multiqc/`
- - `overview-tables/`: a directory with a set of commented TSV (comments taken from `--multiqc_comment_headers`) that summarize aspects of the pipeline runs.
  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
  - `multiqc_dataprep/`: preparation files for the generated custom tables.
  - `multiqc_plots/`: directory containing static images from the report in various formats.
+ - `overview-tables/`: a directory with a set of summary TSV files.
+ - `contigs_overview_with_iterations.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome and their intermediate iterations.
+ - `contigs_overview.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome.
+ - `mapping_overview.tsv`: A tabular file containing the mapping information of the final __mapped consensus__ genome, from the argument `--mapping_constraints`.
+ - `samples_overview.tsv`: A tabular file containing the sample information combining information from both `contigs_overview.tsv` & `mapping_overview.tsv`.

  ## Pipeline information

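With the comment headers gone, `overview-tables/` is documented as a set of plain summary TSV files. A minimal Python sketch for loading one of them into a list of dictionaries, assuming one header row of column names followed by tab-separated records. The helper `read_overview` is hypothetical and not part of viralgenie; lines starting with `#` are skipped on the unverified assumption that tables written before this change carried their comment headers that way.

```python
import csv


def read_overview(path):
    """Return the rows of an overview TSV as a list of dicts.

    Assumption: one header row, then tab-separated records. Lines that
    start with '#' are treated as comment headers and ignored.
    """
    with open(path, newline="") as handle:
        # Filter comment lines before the csv module sees them.
        lines = (line for line in handle if not line.startswith("#"))
        # Consume the reader inside the 'with' block so the file is still open.
        return list(csv.DictReader(lines, delimiter="\t"))
```

For example, `read_overview("multiqc/overview-tables/samples_overview.tsv")` would return one dictionary per row, keyed by whatever column names the table actually contains. The path assumes `overview-tables/` sits under `multiqc/`, as the output listing suggests.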

docs/workflow/consensus_qc.md (+13 -1)

@@ -30,7 +30,19 @@ Within the MultiQC report, Viralgenie provides a number of custom tables based o

  > CheckV can be skipped with `--skip_checkv`.

- ## BLASTn
+
+ ## Prokka
+ [Prokka](https://github.com/tseemann/prokka) is a whole genome annotation pipeline for identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes.
+
+ !!! Tip "Suboptimal annotation"
+ Prokka was initially designed for bacterial and archaeal genomes, and may not be optimal for viral genomes. [VIGOR4](https://github.com/JCVenterInstitute/VIGOR4) is a good alternative but is species specific.
+
+ !!! Tip "Custom protein database"
+ Prokka can be given a custom protein database to annotate your genomes with, have a look at [prot-RVDB](https://rvdb-prot.pasteur.fr/) for viral protein databases. Supply the database using `--prokka_db`.
+
+ > Prokka can be skipped with `--skip_prokka`.
+
+ ## BLAST

  [blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. It calculates the similarity between the consensus genome and the reference genome. The similarity is calculated based on the number of identical bases between the two sequences. Viralgenie uses blastn to compare the sequences against the supplied `--reference_pool` dataset.

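The blastn paragraph describes similarity in terms of identical bases. BLAST reports this as percent identity (`pident` in tabular output): the number of identical positions divided by the alignment length, times 100. A minimal Python sketch of that arithmetic for two sequences that are already aligned (equal length, `-` as the gap character). The function `percent_identity` is illustrative only; it does not compute an alignment and is not how blastn or viralgenie is implemented.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences share the same base.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Gap columns count towards the alignment length but never
    as identities, in line with how BLAST defines pident.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# 7 identical columns out of 9 (one gap column, one mismatch).
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```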

modules/local/custom_multiqc/main.nf (-3)

@@ -18,7 +18,6 @@ process CUSTOM_MULTIQC {
  path anno_files, stageAs: "?/annotation/*"
  path clusters_tsv, stageAs: "?/clusters/*"
  path screen_files, stageAs: "?/screen/*"
- path comment_headers
  path custom_table_headers

  output:
@@ -48,7 +47,6 @@ process CUSTOM_MULTIQC {
  def clusters_files = clusters_tsv ? "--clusters_files ${clusters_tsv}" : ''
  def mapping_constraints_command = mapping_constraints ? "--mapping_constraints ${mapping_constraints}" : ''
  def screen_files_command = screen_files ? "--screen_files ${screen_files}" : ''
- def comment_headers_command = comment_headers ? "--comment_dir ${comment_headers}" : ''
  def custom_table_headers_command = custom_table_headers ? "--table_headers ${custom_table_headers}" : ''

  """
@@ -65,7 +63,6 @@ process CUSTOM_MULTIQC {
  $clusters_files \\
  $mapping_constraints_command \\
  $screen_files_command \\
- $comment_headers_command \\
  $custom_table_headers_command \\


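Each `def ... = x ? "--flag ${x}" : ''` line in `CUSTOM_MULTIQC` emits a command-line flag only when the matching input was provided, which is why dropping the `comment_headers` input also drops `--comment_dir` from the script. The same idiom as a minimal Python sketch; `build_optional_args` is a hypothetical helper, and only the flag names are taken from the diff (the file names in the example are made up).

```python
def build_optional_args(inputs):
    """Turn {flag: value} into CLI tokens, skipping empty or missing values.

    Mirrors the Groovy idiom `value ? "--flag ${value}" : ''`: a value that
    is None, an empty string or an empty list produces no tokens at all.
    A list of files is expanded after its flag, one token per file.
    """
    args = []
    for flag, value in inputs.items():
        if not value:
            continue  # falsy input: the flag is omitted entirely
        if isinstance(value, (list, tuple)):
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            args.extend([flag, str(value)])
    return args


print(build_optional_args({
    "--clusters_files": ["1/clusters/x.tsv"],
    "--screen_files": [],
    "--mapping_constraints": None,
    "--table_headers": "headers.yml",
}))
# → ['--clusters_files', '1/clusters/x.tsv', '--table_headers', 'headers.yml']
```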

workflows/viralgenie.nf (-2)

@@ -91,7 +91,6 @@ workflow VIRALGENIE {
  ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath( params.multiqc_config, checkIfExists: true ) : Channel.empty()
  ch_multiqc_logo = params.multiqc_logo ? Channel.fromPath( params.multiqc_logo, checkIfExists: true ) : Channel.empty()
  ch_multiqc_custom_methods_description = params.multiqc_methods_description ? file(params.multiqc_methods_description, checkIfExists: true) : file("$projectDir/assets/methods_description_template.yml", checkIfExists: true)
- ch_multiqc_comment_headers = params.multiqc_comment_headers ? Channel.fromPath(params.multiqc_comment_headers, checkIfExists:true ) : Channel.empty()
  ch_multiqc_custom_table_headers = params.custom_table_headers ? Channel.fromPath(params.custom_table_headers, checkIfExists:true ) : Channel.empty()


@@ -492,7 +491,6 @@ workflow VIRALGENIE {
  ch_annotation_summary.ifEmpty([]),
  ch_clusters_tsv.ifEmpty([]),
  ch_mash_screen.ifEmpty([]),
- ch_multiqc_comment_headers.ifEmpty([]),
  ch_multiqc_custom_table_headers.ifEmpty([])
  )
  ch_versions = ch_versions.mix(CUSTOM_MULTIQC.out.versions)
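In `workflows/viralgenie.nf`, each optional file parameter becomes either `Channel.fromPath(param, checkIfExists: true)` or `Channel.empty()`, and `.ifEmpty([])` then substitutes an empty list so `CUSTOM_MULTIQC` still receives a value for every input slot. A minimal Python sketch of that resolve-or-placeholder logic; `resolve_optional` is a hypothetical helper, not a Nextflow API, and unlike `Channel.fromPath` it does not expand glob patterns.

```python
from pathlib import Path


def resolve_optional(param):
    """Return [Path] for a supplied parameter, or [] when it is unset.

    Like `checkIfExists: true`, a parameter that is set but points to a
    missing file raises an error instead of being silently dropped.
    """
    if not param:
        return []  # plays the role of Channel.empty().ifEmpty([])
    path = Path(param)
    if not path.exists():
        raise FileNotFoundError(f"parameter points to a missing file: {param}")
    return [path]
```

The design keeps "not supplied" and "supplied but wrong" apart: the first is a normal, empty input, the second fails early rather than producing a report that silently lacks the custom headers.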
