Whole genome SNP based identification of members of the Mycobacterium tuberculosis complex. Based on code originally written by Samuel Lipworth and turned into a package by Philip Fowler and Michael Hall.
snpit allows rapid Mycobacterial speciation of VCF files aligned to NC000962 (H37Rv) and FAST(A/Q) files.
For more information, please see the article:
Lipworth S, Jajou R, de Neeling A, et al. SNP-IT Tool for Identifying Subspecies and Associated Lineages of Mycobacterium tuberculosis Complex. Emerging Infectious Diseases. 2019;25(3):482-488. doi:10.3201/eid2503.180894.
Please email samuel.lipworth@medsci.ox.ac.uk with any queries.
snpit requires Python version 3.5 or greater.
# not yet setup
# not yet setup
There are two ways to install snpit locally: into your local Python packages, or in a virtual environment (recommended).
First clone the repository on your local machine and move into the directory.
git clone https://github.com/philipwfowler/snpit.git
cd snpit
Virtual environment [recommended]
# install snpit and dependencies
make install
# make sure it is working
make test
# get the command to activate the environment
make activate
# activate the environment with the output from the above command
# start using snpit
snpit --help
# when you are done, exit the environment
deactivate
Without virtual environment
Note: We strongly encourage using a virtual environment if you are installing locally.
python3 setup.py install --user
# make sure it is working
python3 setup.py test
snpit --input in.vcf
Note: You do not need to specify anything special if your file is multi-sample.
snpit --input in.fa --output out.tsv
snpit -i in.vcf --filter -o out.tsv
STATUS is a custom field that has been used in some CRyPTIC pipelines. It acts as a more fine-grained FILTER column: at a given position, some samples may pass while others do not.
snpit -i in.vcf --status -o out.tsv
snpit -i in.vcf --threshold 95
The threshold is the percentage of the positions known to identify this lineage that are found in your sample.
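The threshold calculation described above can be sketched roughly as follows. This is an illustration only, not snpit's actual implementation: the function names, the dictionary-based data layout, and the example positions are all hypothetical, and the strict "greater than" comparison is an assumption based on the help text wording ("above which").

```python
def lineage_percentage(sample_calls, lineage_markers):
    """Return the percentage of a lineage's marker positions matched by the sample.

    sample_calls:    dict mapping genome position -> base called in the sample
    lineage_markers: dict mapping genome position -> base that identifies the lineage
    (Both layouts are hypothetical, chosen for illustration.)
    """
    if not lineage_markers:
        return 0.0
    matches = sum(
        1 for pos, base in lineage_markers.items() if sample_calls.get(pos) == base
    )
    return 100.0 * matches / len(lineage_markers)


def passes_threshold(percentage, threshold=10.0):
    """Assumed rule: a lineage is reported only if its percentage exceeds the threshold."""
    return percentage > threshold


# Made-up marker set and sample calls: 3 of the 4 marker positions match.
markers = {100: "A", 200: "C", 300: "G", 400: "T"}
sample = {100: "A", 200: "C", 300: "G", 400: "A"}

pct = lineage_percentage(sample, markers)
print(pct, passes_threshold(pct, threshold=95))
```

With three of four markers matched, the sample scores 75%, which would pass the default threshold of 10 but not a threshold of 95.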
To get the full usage/help menu for snpit, just run
snpit --help
usage: snpit [-h] -i INPUT [-o OUTPUT] [--threshold THRESHOLD] [--filter]
             [--status] [-v]

Whole genome SNP based identification of members of the Mycobacterium
tuberculosis complex. SNP-IT allows rapid Mycobacterial speciation of VCF
files aligned to NC000962 (H37Rv).

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to the VCF or FAST(A/Q) file to read and
                        classify. File can be multi-sample and/or compressed.
  -o OUTPUT, --output OUTPUT
                        Path to output results to. Default is STDOUT (-).
  --threshold THRESHOLD
                        The percentage of snps above which a sample is
                        considered to belong to a lineage. [10.0]
  --filter              Whether to adhere to the FILTER column.
  --status              Whether to adhere to the STATUS column. This is a
                        custom field that gives more fine-grained control over
                        whether a sample passes a user-defined filtering
                        criterion, even if the record has PASS in FILTER.
  -v, --version         Show the program's version number and exit.
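If you want to call snpit from a Python script, the options above can be assembled into a command line along these lines. This is a minimal sketch: `build_snpit_command` and `run_snpit` are hypothetical helper names, not part of the snpit package, and `run_snpit` assumes the `snpit` executable is on your PATH.

```python
import subprocess


def build_snpit_command(input_path, output_path=None, threshold=None,
                        use_filter=False, use_status=False):
    """Build an argument list using only the flags shown in the help menu."""
    cmd = ["snpit", "--input", str(input_path)]
    if output_path is not None:
        cmd += ["--output", str(output_path)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if use_filter:
        cmd.append("--filter")
    if use_status:
        cmd.append("--status")
    return cmd


def run_snpit(**kwargs):
    """Run snpit and return its standard output (requires snpit to be installed)."""
    result = subprocess.run(
        build_snpit_command(**kwargs),
        check=True,                # raise CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,
        universal_newlines=True,   # decode output as text (works on Python 3.5+)
    )
    return result.stdout


# Only builds and prints the command; nothing is executed here.
print(build_snpit_command("in.vcf", threshold=95, use_filter=True))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.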
The output file is a tab-delimited file (containing a header).
Sample   Species          Lineage    Sublineage  Name     Percentage
sample1  M. tuberculosis  Lineage 2  N/A         beijing  91.78
sample2  M. tuberculosis  Lineage 2  N/A         beijing  97.37
sample3  M. tuberculosis  Lineage 4  Haarlem     haarlem  100.0
From left to right, the columns are:
- Sample - the name of the sample. This is taken from the sample column heading in the VCF or the FAST(A/Q) header.
- Species - Species of the call.
- Lineage - Lineage of the call (if Mtb.).
- Sublineage - Sublineage of the call (if applicable).
- Name - name of file in the lib/ directory where the marker variants for this call were taken from. This also relates to the common name for the lineage in some cases.
- Percentage - Percentage of the call's variants found in the sample.
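The columns listed above can be read back into Python with the standard `csv` module. The sketch below is one possible approach, not part of snpit: `read_snpit_results` is a hypothetical helper, the example rows are copied from the sample output above, and treating a non-numeric Percentage as `None` is an assumption about how missing calls might appear.

```python
import csv
import io


def read_snpit_results(handle):
    """Parse snpit's tab-delimited output into a list of dicts, one per sample."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            row["Percentage"] = float(row["Percentage"])
        except ValueError:
            # Assumption: a non-numeric value means no percentage was reported.
            row["Percentage"] = None
        rows.append(row)
    return rows


# Two rows taken from the example output, with real tab separators.
example = (
    "Sample\tSpecies\tLineage\tSublineage\tName\tPercentage\n"
    "sample1\tM. tuberculosis\tLineage 2\tN/A\tbeijing\t91.78\n"
    "sample3\tM. tuberculosis\tLineage 4\tHaarlem\thaarlem\t100.0\n"
)

results = read_snpit_results(io.StringIO(example))
for r in results:
    print(r["Sample"], r["Lineage"], r["Percentage"])
```

To read a real results file, pass an open file handle instead, for example `open("out.tsv", newline="")`.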
We welcome any contributions. Firstly, fork this repository and clone it locally.
Next, set up pipenv for the project:
make init
make install
make test
If you wish to put in a pull request to the main repository, please write a thorough description of the changes you have made.
This project uses the black code formatter. Please ensure any code you wish to merge has been formatted accordingly using
make lint