GitHub - cytham/variantbreak: Structural variant analyzer for data visualization on VariantMap

VariantBreak - Structural variant analyzer for data visualization on VariantMap

VariantBreak is a Python package that integrates all structural variants (SVs) from a cohort of NanoVar VCF files or variant BED files, either for visualization on VariantMap or for summary in a CSV file. It also annotates and filters all SVs across all samples according to user-supplied GTF/GFF/BED files. Gene annotation files can be found here.

Basic capabilities

Intersects and merges all SV breakends from a sample cohort using NanoVar VCF files (NanoVar-v1.3.6 or above) or variant BED files.
Annotates each SV according to input GTF/GFF files or BED annotation files.
Filters SVs by adding a "HIT" or "MISS" label according to input BED filter files.
Creates a master pandas dataframe to store all data.
Creates an HDF5 file containing the master dataframe and some metadata, which can be graphically visualized on VariantMap within Dash Bio.
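To make the "HIT"/"MISS" filtering idea concrete, here is a minimal sketch in plain Python. It is not VariantBreak's implementation: it assumes that "HIT" means a breakend overlaps at least one region in a BED filter file, and the names `overlaps` and `label_breakends`, as well as the coordinates, are hypothetical.

```python
# Illustrative only: not VariantBreak's actual code.
# Assumption: "HIT" = breakend overlaps any filter region, "MISS" otherwise.
from typing import List, Tuple

# (chromosome, start, end) using BED-style half-open coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def label_breakends(breakends: List[Interval],
                    filter_regions: List[Interval]) -> List[str]:
    """Label each breakend "HIT" if it overlaps a filter region, else "MISS"."""
    return [
        "HIT" if any(overlaps(bk, region) for region in filter_regions) else "MISS"
        for bk in breakends
    ]


# Hypothetical coordinates, made up for this example
breakends = [("chr1", 100, 101), ("chr1", 5000, 5001), ("chr2", 100, 101)]
filter_regions = [("chr1", 50, 200)]
labels = label_breakends(breakends, filter_regions)
print(labels)
```

The pairwise scan is quadratic and only meant for illustration; the package depends on bedtools/pybedtools, which are built for interval intersection at scale.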

Getting Started

Quick run

Command-line usage:

variantbreak [Options] -a annotation.gff3 -f filter.bed variant_path working_dir

Parameter	Argument	Comment
`-a`	annotation.gff3	path to single annotation file or directory containing annotation files of GTF/GFF or BED formats
`-f`	filter.bed	path to single filter file or directory containing filter files of BED format
-	variant_path	path to single variant file or directory containing variant files of VCF or BED formats
-	working_dir	path to working directory
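When the command is launched from a script, it can help to assemble the arguments in one place. The sketch below builds the argument list in the order shown in the usage line above; the helper name `build_variantbreak_command` and the example paths are hypothetical, and only the `-a` and `-f` flags documented in the table are used.

```python
import shlex
from typing import List


def build_variantbreak_command(annotation: str, filter_path: str,
                               variant_path: str, working_dir: str) -> List[str]:
    """Assemble argv following the documented usage:
    variantbreak -a <annotation> -f <filter> variant_path working_dir
    """
    return ["variantbreak", "-a", annotation, "-f", filter_path,
            variant_path, working_dir]


# Hypothetical paths, for illustration only
cmd = build_variantbreak_command("annotation.gff3", "filter.bed",
                                 "samples/", "work/")
command_line = shlex.join(cmd)  # shell-safe string for logging
print(command_line)

# To actually run it (requires variantbreak to be installed and in PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems with paths that contain spaces.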

Python console usage:

# Import variantbreak function from variantbreak package
from variantbreak import variantbreak

# Run variantbreak on your samples with annotation and filter files
df = variantbreak("/path/to/sample_dir/",
                  "/path/to/annotation_dir/",
                  "/path/to/filter_dir/")


# To save data to files
# Import write_to_files function from variantbreak package
from variantbreak import write_to_files

# Specify dataframe variable, output file path and prefix, and delimiter of choice
write_to_files(df,
               "/path/to/output_prefix",
               sep="\t")

Output

Output file	Comment
output.h5	HDF5 file required for data visualization by VariantMap
output.csv	CSV file for data viewing, separated by the delimiter set by user
legend.txt	File containing the legend of the sample labels used in analysis
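Because the delimiter of output.csv is whatever was passed as `sep`, a reader has to use the same delimiter. The following sketch reads such a file with the Python standard library. It assumes the first row holds column names; the helper `read_delimited` and the column names in the demo are hypothetical stand-ins, not VariantBreak's actual columns.

```python
import csv
import os
import tempfile
from typing import Dict, List


def read_delimited(path: str, sep: str = "\t") -> List[Dict[str, str]]:
    """Read a delimited text file into a list of row dictionaries.

    Assumption: the first row is a header line.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=sep))


# Demo on a stand-in file with made-up columns (not real VariantBreak output)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.csv")
    with open(demo_path, "w", newline="") as handle:
        handle.write("col_a\tcol_b\n1\tx\n2\ty\n")
    rows = read_delimited(demo_path, sep="\t")

print(rows)
```

For real output, pass the path to your own output.csv and the same `sep` value that was given to `write_to_files`.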

For more information, see wiki.

Operating system:

Linux (x86_64 architecture, tested in Ubuntu 16.04)

Installation:

There are three ways to install VariantBreak:

Option 1: Conda (Recommended)

# Installing from bioconda automatically installs all dependencies 
conda install -c bioconda variantbreak

Option 2: Pip (See dependencies below)

# Installing from PyPI requires installing the dependencies yourself, see below
pip install variantbreak

Option 3: GitHub (See dependencies below)

# Installing from GitHub requires installing the dependencies yourself, see below
git clone https://github.com/cytham/variantbreak.git 
cd variantbreak
pip install .

Installation of dependencies

bedtools >=2.26.0 (required to be in PATH by pybedtools)
pybedtools >=0.8.1
pandas >=1.0.3
tables >=3.6.1
fastcluster >=1.1.26
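After a pip or GitHub install, the requirements listed above can be checked with a short script. This is a rough sketch rather than part of VariantBreak: the helper names are hypothetical, the version parser only understands plain dotted numbers, and bedtools is checked only for presence in PATH, not for its version.

```python
import shutil
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions copied from the dependency list above
MIN_VERSIONS = {
    "pybedtools": "0.8.1",
    "pandas": "1.0.3",
    "tables": "3.6.1",
    "fastcluster": "1.1.26",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.0.3' into (1, 0, 3); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def check_dependencies() -> Dict[str, str]:
    """Return a status string for bedtools and each Python dependency."""
    report = {"bedtools": "found" if shutil.which("bedtools") else "not in PATH"}
    for name, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


report = check_dependencies()
for name, status in report.items():
    print(f"{name}: {status}")
```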

1. bedtools

Please visit here for instructions to install.

2. pybedtools

Please visit here for instructions to install.

3. pandas

Please visit here for instructions to install.

4. tables

pip install tables

or

conda install -c conda-forge pytables

5. fastcluster

pip install fastcluster

or

conda install -c conda-forge fastcluster

Documentation

See wiki for more information.

Versioning

See CHANGELOG

Citation

Not available

Author

Tham Cheng Yong - cytham

License

VariantBreak is licensed under the GNU General Public License - see LICENSE.txt for details.

Limitations

The current version accepts only VCF files generated by NanoVar. A format adaptor for VCF files generated by other SV callers is planned for future versions.
Processing speed on large sample cohorts has not been tested. Currently, processing about 100,000 merged SVs takes about 30 minutes.
