docs/output.md (+15 -1)
@@ -700,6 +700,16 @@ Consensus quality control is done with multiple tools, the results are stored in
         - `<sample-id>/<sample-id>_<cl# | constraint-id>/contamination.tsv`: A detailed overview of how contamination was estimated.
         - `<sample-id>/<sample-id>_<cl# | constraint-id>/complete_genomes.tsv`: A detailed overview of putative genomes identified.

+
+### Prokka
+
+[`Prokka`](https://github.com/tseemann/prokka) is a whole genome annotation pipeline that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It annotates bacterial, archaeal and viral genomes.
+
+???- abstract "Output files"
+
+    - `consensus/quality_control/prokka/`
+        - `<sample-id>/<iteration>/*`: directories containing the Prokka output files.
+
 ### BLASTn

 [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. The output from the BLAST run is stored in the directory `consensus/quality_control/blast/`. Final consensus genomes are searched against the `--reference_pool`.
@@ -790,11 +800,15 @@ Furthermore, viralgenie runs MultiQC 2 times, as it uses the output from multiqc

 ???- abstract "Output files"
     - `multiqc/`
-        - `overview-tables/`: a directory with a set of commented TSV (comments taken from `--multiqc_comment_headers`) that summarize aspects of the pipeline runs.
         - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
         - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
         - `multiqc_dataprep/`: preparation files for the generated custom tables.
         - `multiqc_plots/`: directory containing static images from the report in various formats.
+        - `overview-tables/`: a directory with a set of summary TSV files.
+            - `contigs_overview_with_iterations.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome and its intermediate iterations.
+            - `contigs_overview.tsv`: A tabular file containing the contig information of the final __contig consensus__ genome.
+            - `mapping_overview.tsv`: A tabular file containing the mapping information of the final __mapped consensus__ genome (from the `--mapping_constraints` argument).
+            - `samples_overview.tsv`: A tabular file containing per-sample information, combining `contigs_overview.tsv` and `mapping_overview.tsv`.
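
As an illustration of how the overview tables listed in this hunk might be consumed downstream, here is a minimal Python sketch. It assumes the files are plain tab-separated text with a header row, and that they may still begin with `#` comment lines (as the removed description of `overview-tables/` suggested). The function name `read_overview_tsv` and the column names in the usage example are hypothetical and are not taken from the pipeline.

```python
import csv
import io


def read_overview_tsv(text: str, comment_prefix: str = "#") -> list[dict[str, str]]:
    """Parse TSV text into a list of row dictionaries.

    Blank lines and lines starting with `comment_prefix` are skipped;
    the first remaining line is treated as the header row.
    """
    data_lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith(comment_prefix)
    ]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return list(reader)


if __name__ == "__main__":
    # Hypothetical content; real column names depend on the pipeline run.
    example = "# a comment line\nsample\tcontigs\nS1\t3\nS2\t5\n"
    for row in read_overview_tsv(example):
        print(row["sample"], row["contigs"])
```

To use it on a real run, one would pass the text of a file such as `multiqc/overview-tables/samples_overview.tsv`, for example via `pathlib.Path(...).read_text()`.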
docs/workflow/consensus_qc.md (+13 -1)
@@ -30,7 +30,19 @@ Within the MultiQC report, Viralgenie provides a number of custom tables based o

 > CheckV can be skipped with `--skip_checkv`.

-## BLASTn
+
+## Prokka
+[Prokka](https://github.com/tseemann/prokka) is a whole genome annotation pipeline that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It annotates bacterial, archaeal and viral genomes.
+
+!!! Tip "Suboptimal annotation"
+    Prokka was initially designed for bacterial and archaeal genomes, and may not be optimal for viral genomes. [VIGOR4](https://github.com/JCVenterInstitute/VIGOR4) is a good alternative but is species-specific.
+
+!!! Tip "Custom protein database"
+    Prokka can be given a custom protein database to annotate your genomes with; have a look at [prot-RVDB](https://rvdb-prot.pasteur.fr/) for viral protein databases. Supply the database using `--prokka_db`.
+
+> Prokka can be skipped with `--skip_prokka`.
+
+## BLAST

 [blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. It calculates the similarity between the consensus genome and the reference genome. The similarity is calculated based on the number of identical bases between the two sequences. Viralgenie uses blastn to compare the sequences against the supplied `--reference_pool` dataset.
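
To make the idea of similarity based on identical bases concrete, the following Python sketch computes a percent identity over two already-aligned sequences: the number of identical columns divided by the alignment length, times 100. This only illustrates the metric; it is not how blastn finds or scores alignments, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must be the same length; '-' marks a gap. Columns with a gap
    never count as a match, but they do count towards the alignment length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Five identical columns out of six aligned columns.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

A consensus genome that matches its reference at every aligned position would score 100, and each mismatch or gap lowers the value proportionally.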