Commit a6f6fda

adding docs
1 parent 9b8c7e9 commit a6f6fda

3 files changed: +27 -1 lines changed

CITATIONS.md

+4
@@ -103,6 +103,10 @@
 
 - [picard-tools](http://broadinstitute.github.io/picard)
 
+- [prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/)
+
+> Seemann, Torsten. “Prokka: rapid prokaryotic genome annotation.” Bioinformatics (Oxford, England) vol. 30,14 (2014): 2068-9. doi:10.1093/bioinformatics/btu153
+
 - [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)
 
 > Gurevich, Alexey et al. “QUAST: quality assessment tool for genome assemblies.” Bioinformatics (Oxford, England) vol. 29,8 (2013): 1072-5. doi:10.1093/bioinformatics/btt086

docs/output.md

+10
@@ -700,6 +700,16 @@ Consensus quality control is done with multiple tools, the results are stored in
 - `<sample-id>/<sample-id>_<cl# | constraint-id>/contamination.tsv`: A detailed overview of how contamination was estimated.
 - `<sample-id>/<sample-id>_<cl# | constraint-id>/complete_genomes.tsv`: A detailed overview of putative genomes identified.
 
+
+### Prokka
+
+[`Prokka`](https://github.com/tseemann/prokka) is a whole genome annotation pipeline for identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes.
+
+???- abstract "Output files"
+
+- `consensus/quality_control/prokka/`
+- `<sample-id>/<iteration>/*`: directories containing the Prokka output files.
+
 ### BLASTn
 
 [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. The output from the BLAST run is stored in the directory `consensus/quality_control/blast/`. Final consensus genomes are searched against the `--reference_pool`.
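
Reviewer note: the output layout documented in this hunk (`consensus/quality_control/prokka/<sample-id>/<iteration>/*`) can be checked with a small script. The following is a minimal sketch, not part of the commit; the function name `collect_prokka_outputs` and the idea of passing the pipeline's results directory as an argument are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def collect_prokka_outputs(results_dir):
    """Group Prokka output file names by (sample-id, iteration).

    Assumes the layout described in docs/output.md:
    <results_dir>/consensus/quality_control/prokka/<sample-id>/<iteration>/*
    """
    prokka_root = Path(results_dir) / "consensus" / "quality_control" / "prokka"
    if not prokka_root.is_dir():
        # Prokka may have been skipped, or the run has not finished yet.
        return {}

    grouped = defaultdict(list)
    # Exactly three levels below the Prokka root: sample, iteration, file.
    for path in sorted(prokka_root.glob("*/*/*")):
        if path.is_file():
            sample_id, iteration = path.relative_to(prokka_root).parts[:2]
            grouped[(sample_id, iteration)].append(path.name)
    return dict(grouped)
```

An empty result would suggest either that Prokka was skipped or that the documented path does not match the actual output directory.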

docs/workflow/consensus_qc.md

+13 -1
@@ -30,7 +30,19 @@ Within the MultiQC report, Viralgenie provides a number of custom tables based o
 
 > CheckV can be skipped with `--skip_checkv`.
 
-## BLASTn
+
+## Prokka
+[Prokka](https://github.com/tseemann/prokka) is a whole genome annotation pipeline for identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes.
+
+!!! Tip "Suboptimal annotation"
+Prokka was initially designed for bacterial and archaeal genomes, and may not be optimal for viral genomes. [VIGOR4](https://github.com/JCVenterInstitute/VIGOR4) is a good alternative but is species-specific.
+
+!!! Tip "Custom protein database"
+Prokka can be given a custom protein database to annotate your genomes with; have a look at [prot-RVDB](https://rvdb-prot.pasteur.fr/) for viral protein databases. Supply the database using `--prokka_db`.
+
+> Prokka can be skipped with `--skip_prokka`.
+
+## BLAST
 
 [blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a tool for comparing primary biological sequence information. It calculates the similarity between the consensus genome and the reference genome. The similarity is calculated based on the number of identical bases between the two sequences. Viralgenie uses blastn to compare the sequences against the supplied `--reference_pool` dataset.
 
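
Reviewer note: the two parameters introduced in this hunk, `--skip_prokka` and `--prokka_db`, interact in an obvious way (a database is irrelevant when the step is skipped). The sketch below shows one way a wrapper script might assemble them; the helper `prokka_cli_args` is hypothetical, and how the resulting arguments are appended to the pipeline's launch command is an assumption, not something this commit documents.

```python
def prokka_cli_args(skip_prokka=False, prokka_db=None):
    """Return the Prokka-related pipeline arguments as a list of strings.

    Only the flag names `--skip_prokka` and `--prokka_db` come from the
    documentation in this diff; everything else is illustrative.
    """
    if skip_prokka:
        # A custom database has no effect when annotation is skipped.
        return ["--skip_prokka"]
    if prokka_db is not None:
        return ["--prokka_db", str(prokka_db)]
    # Neither option given: nothing is added to the command line.
    return []
```

For example, `prokka_cli_args(prokka_db="my_proteins.fasta")` yields the two-element list to append to the launch command, where `my_proteins.fasta` is a placeholder file name.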
