Releases · bakerccm/entrez_qiime

This is an updated release of a Python script (entrez_qiime.py) and accompanying guidelines (entrez_qiime.pdf). The workflow takes an input FASTA file generated from the NCBI database (e.g. through an Entrez/gquery search) and generates the id-to-taxonomy mapping file needed to BLAST metabarcode data against those sequences using the QIIME script assign_taxonomy.py.

The original version of this workflow script and documentation has been available from the author Chris Baker since at least October 2012. It was uploaded to this GitHub repository and formally released as v1.0 essentially unchanged in September 2016.

This updated release in October 2016 includes a major change to the operation of the script. Instead of taking FASTA files with GI numbers as the sequence identifiers, it now takes FASTA files with NCBI accession.version numbers as the sequence identifiers. This change is intended to allow this workflow to continue being used when the NCBI phases out GI numbers in the GenBank, GenPept, and FASTA formats supported by NCBI for sequence records (https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/).
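
To illustrate the difference between the two identifier styles, here is a minimal sketch (not the code in entrez_qiime.py) that pulls an accession.version identifier out of a FASTA header line. The function name `extract_accession_version` and the example headers are hypothetical, and the pipe-delimited `gi|...|gb|...|` layout is assumed from NCBI's older FASTA header convention.

```python
def extract_accession_version(header):
    """Return the accession.version identifier from a FASTA header line.

    Hypothetical helper for illustration only; not taken from entrez_qiime.py.
    """
    # The identifier is the first whitespace-delimited token after ">".
    first_token = header.lstrip(">").split(None, 1)[0]
    fields = first_token.split("|")
    if fields[0] == "gi" and len(fields) >= 4:
        # Assumed legacy layout, e.g. gi|12345|gb|AB123456.1|
        # The accession.version is the fourth pipe-delimited field.
        return fields[3]
    # Assumed current layout: the first token is already accession.version.
    return first_token
```

With this sketch, a legacy header such as `>gi|12345|gb|AB123456.1| Some organism` and a newer header such as `>AB123456.1 Some organism` would both yield `AB123456.1`.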

In addition to this change: (i) the script now takes either a FASTA file, as before, or a list of accession numbers as input; (ii) the script now outputs only two files, namely the id-to-taxonomy mapping file required by QIIME and a log file; (iii) sequences in the FASTA file (or list file) that do not appear in the taxonomy database are now included in the output, but with "NA;NA;NA;..." as their taxonomy string.
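
The output behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual implementation: the function `write_id_to_taxonomy`, the in-memory dictionary `taxonomy_by_id`, and the `n_levels` parameter are all hypothetical. In particular, the number of "NA" levels is not specified in these notes, so it is left as a parameter. The tab-separated layout (identifier, then a semicolon-delimited taxonomy string) follows the id-to-taxonomy format that assign_taxonomy.py expects.

```python
import io


def write_id_to_taxonomy(seq_ids, taxonomy_by_id, out_handle, n_levels=7):
    """Write one tab-separated line per identifier: <id>\t<taxonomy string>.

    Identifiers absent from taxonomy_by_id still get a line, with an
    all-"NA" placeholder taxonomy. Returns the number of such identifiers.
    n_levels is an assumption; the real script may use a different depth.
    """
    placeholder = ";".join(["NA"] * n_levels)
    n_missing = 0
    for seq_id in seq_ids:
        taxonomy = taxonomy_by_id.get(seq_id)
        if taxonomy is None:
            # Keep the sequence in the output rather than dropping it.
            taxonomy = placeholder
            n_missing += 1
        out_handle.write("{}\t{}\n".format(seq_id, taxonomy))
    return n_missing


# Usage with made-up identifiers and a made-up taxonomy string.
example_output = io.StringIO()
example_missing = write_id_to_taxonomy(
    ["AB123456.1", "XX000000.1"],
    {"AB123456.1": "Eukaryota;Fungi;Ascomycota"},
    example_output,
    n_levels=3,
)
```

Returning the count of unmatched identifiers is one way such a tool could feed a summary into its log file; whether entrez_qiime.py logs this is not stated here.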

The PDF documentation has also been updated to reflect these changes.

This is a release of a Python script (entrez_qiime.py) and accompanying guidelines (entrez_qiime.pdf). The workflow takes an input FASTA file generated from the NCBI database (e.g. through an Entrez search) and generates the files needed to BLAST metabarcode data against those sequences using the QIIME script assign_taxonomy.py.

These files have been available from the author Chris Baker essentially unchanged since at least October 2012. This release in September 2016 does not mark any change to the operation of the script, or to the content of the workflow, but rather is intended to provide a stable version for citation and development purposes. No attempt has been made to update the script or accompanying guidelines since 2012.

The Python script included in this release (v1.0) relies on the NCBI's use of GI numbers, which will shortly be phased out in favour of accession numbers (https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/). When this change takes place, v1.0 will no longer function correctly with new files downloaded from the NCBI (but of course will continue to work fine with any files downloaded previously). A revised version of the code and workflow guidelines will be developed and made available in due course.

entrez_qiime v2.0

entrez_qiime v1.0