adding docs

Joon-Klaps · Joon-Klaps · commit a6f6fda926b8 · 2025-02-25T10:17:05.000Z
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -103,6 +103,10 @@
 
 - [picard-tools](http://broadinstitute.github.io/picard)
 
+- [prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/)
+
+    > Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics (Oxford, England) vol. 30,14 (2014): 2068-9. doi:10.1093/bioinformatics/btu153
+
 - [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)
 
     > Gurevich, Alexey et al. “QUAST: quality assessment tool for genome assemblies.” Bioinformatics (Oxford, England) vol. 29,8 (2013): 1072-5. doi:10.1093/bioinformatics/btt086
diff --git a/docs/output.md b/docs/output.md
@@ -700,6 +700,16 @@ Consensus quality control is done with multiple tools, the results are stored in
         - `<sample-id>/<sample-id>_<cl# | constraint-id>/contamination.tsv`: A detailed overview of how contamination was estimated.
         - `<sample-id>/<sample-id>_<cl# | constraint-id>/complete_genomes.tsv`: A detailed overview of putative genomes identified.
 
+
+### Prokka
+
+[`Prokka`](https://github.com/tseemann/prokka) is a whole-genome annotation tool that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It can annotate bacterial, archaeal and viral genomes.
+
+???- abstract "Output files"
+
+    - `consensus/quality_control/prokka/`
+        - `<sample-id>/<iteration>/*`: directories containing the Prokka output files.
+
 ### BLASTn
 
 [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. The output from the BLAST run is stored in the directory `consensus/quality_control/blast/`. Final consensus genomes are searched against the `--reference_pool`.
diff --git a/docs/workflow/consensus_qc.md b/docs/workflow/consensus_qc.md
@@ -30,7 +30,19 @@ Within the MultiQC report, Viralgenie provides a number of custom tables based o
 
 > CheckV can be skipped with `--skip_checkv`.
 
-## BLASTn
+
+## Prokka
+[Prokka](https://github.com/tseemann/prokka) is a whole-genome annotation tool that identifies features of interest in a set of genomic DNA sequences and labels them with useful information. It can annotate bacterial, archaeal and viral genomes.
+
+!!! Tip "Suboptimal annotation"
+    Prokka was initially designed for bacterial and archaeal genomes and may not be optimal for viral genomes. [VIGOR4](https://github.com/JCVenterInstitute/VIGOR4) is a good alternative, but it is species-specific.
+
+!!! Tip "Custom protein database"
+    Prokka can annotate your genomes against a custom protein database. For viral protein databases, have a look at [prot-RVDB](https://rvdb-prot.pasteur.fr/). Supply the database using `--prokka_db`.
+
+> Prokka can be skipped with `--skip_prokka`.
+
+## BLAST
 
 [blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. It calculates the similarity between the consensus genome and the reference genome. The similarity is calculated based on the number of identical bases between the two sequences. Viralgenie uses blastn to compare the sequences against the supplied `--reference_pool` dataset.