Explore how to get a manifest of sequence files into Galaxy from ENA #310
Comments
@nekrut to refine and assign.
ENA has a pretty nice REST API. The Search & Discovery documentation can be found at ENA-Portal-API-doc, and the list of fields one can filter on is available at https://www.ebi.ac.uk/ena/portal/api/searchFields?result=read_run. As with most providers, there are some usage limits.
1. Where would the user go to search ENA for sequences?
There are a few GUI options:
The API that can be used to access the data programmatically can be explored at https://www.ebi.ac.uk/ena/portal/api/swagger-ui/index.html#/Search%20%26%20Discovery/search

2. Does ENA have the ability to exclude files used in a given assembly, or to include files from a given time period or geographic region?
It's possible to filter on many fields, such as scientific_name, date or country. A filter on scientific_name,
which translates into the following URL:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=scientific_name%3D%22Taeniopygia%20guttata%22&fields=run_accession%2Cscientific_name%2Cfastq_ftp%2Cread_count&limit=10&format=tsv

A filter on date and sample accession,
which translates into the following URL:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=%28sample_accession%3DSAMN02981239%20OR%20sample_accession%3DSAMN37045233%20OR%20sample_accession%3DSAMN12623621%29%20AND%20%28last_updated%3E%3D2023-08-23%20AND%20last_updated%3C%3D2023-08-25%20country%3Dchile%29&fields=run_accession%2Clast_updated%2Cscientific_name%2Ccountry%2Cinstrument_platform%2Cinstrument_model%2Cread_count%2Ctax_id&limit=0&format=tsv&download=false

A filter on countries and sample accession,
which translates into the following URL:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=%28sample_accession%3DSAMN02981239%20OR%20sample_accession%3DSAMN37045233%20OR%20sample_accession%3DSAMN12623621%29%20AND%20%28country%3Dusa%20OR%20country%3Dchile%29&fields=run_accession%2Cscientific_name%2Ccountry%2Cinstrument_platform%2Cinstrument_model%2Cread_count%2Ctax_id&limit=0&format=tsv&download=false

3. How does the user export a manifest of the search results to their local file system, or possibly directly to Galaxy if that connection exists?
With both the GUI and the REST API we can export JSON/TSV files with SRR IDs and FASTQ URLs. I haven't found a forwarding button, like the one in the SRA Run Selector, that can send a manifest to Galaxy.
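Because the search URLs above are plain percent-encoded query strings, the REST API export can be scripted instead of assembled by hand. Below is a minimal sketch in Python. It assumes only the query syntax visible in the URLs above: `field=value`, `AND`/`OR`, `>=`/`<=` and parentheses. The helper names `any_of`, `between` and `ena_search_url` are hypothetical, not part of any ENA client library.

```python
from urllib.parse import quote, urlencode

# ENA Portal API search endpoint used in the URLs above.
ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def any_of(field: str, values: list[str]) -> str:
    """OR together equality clauses on one field, wrapped in parentheses."""
    return "(" + " OR ".join(f"{field}={v}" for v in values) + ")"


def between(field: str, start: str, end: str) -> str:
    """Inclusive range clause on one field, e.g. last_updated."""
    return f"({field}>={start} AND {field}<={end})"


def ena_search_url(query: str, fields: list[str], limit: int = 10,
                   fmt: str = "tsv", result: str = "read_run", **extra: str) -> str:
    """Build an ENA Portal API search URL (hypothetical helper)."""
    params = {"result": result, "query": query, "fields": ",".join(fields),
              "limit": str(limit), "format": fmt}
    params.update(extra)  # extra parameters, e.g. download="false"
    # quote_via=quote encodes spaces as %20 rather than '+', as in the URLs above.
    return f"{ENA_SEARCH}?{urlencode(params, quote_via=quote)}"


by_name = ena_search_url('scientific_name="Taeniopygia guttata"',
                         ["run_accession", "scientific_name", "fastq_ftp", "read_count"])

samples = any_of("sample_accession", ["SAMN02981239", "SAMN37045233", "SAMN12623621"])
by_country = ena_search_url(
    f"{samples} AND {any_of('country', ['usa', 'chile'])}",
    ["run_accession", "scientific_name", "country", "instrument_platform",
     "instrument_model", "read_count", "tax_id"],
    limit=0, download="false",
)
by_date = ena_search_url(
    f"{samples} AND {between('last_updated', '2023-08-23', '2023-08-25')}",
    ["run_accession", "last_updated", "fastq_ftp"],
    limit=0,
)
print(by_country)
```

With `quote_via=quote`, the first two calls should reproduce the scientific_name and country URLs from this comment character for character. That makes it easy to check the helpers against a URL copied from the GUI before relying on them.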
The returned result is a TSV or JSON file that can contain both the actual file path and the run_accession; fastq_md5 can also be included for verification.

| run_accession | last_updated | scientific_name | fastq_ftp | country | instrument_platform | instrument_model | read_count | tax_id |
|---|---|---|---|---|---|---|---|---|
| SRR25728136 | 2023-08-23 | Aplochiton taeniatus | ftp.sra.ebi.ac.uk/vol1/fastq/SRR257/036/SRR25728136/SRR25728136.fastq.gz | Chile: Santo Domingo River, Valdivia, Los Rios district | PACBIO_SMRT | Sequel II | 749636 | 946358 |
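Since fastq_md5 can be requested alongside fastq_ftp, downloaded files can be checked locally before they go into a workflow. Below is a minimal sketch. For multi-file runs it assumes that the `;`-separated checksums in fastq_md5 are listed in the same order as the URLs in fastq_ftp, and that the local paths are passed in that same order. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large FASTQ files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_run_files(paths: list[Path], fastq_md5: str) -> bool:
    """Compare downloaded files with the fastq_md5 value of one run.

    Assumption: checksums are ';'-separated and ordered like the fastq_ftp
    URLs, and `paths` follows the same order.
    """
    expected = [m for m in fastq_md5.split(";") if m]
    if len(expected) != len(paths):
        return False  # file count does not match the number of checksums
    return all(md5_of(p) == e.lower() for p, e in zip(paths, expected))
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte FASTQ files.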
4. How could the manifest be uploaded to a given Galaxy history from the local file system?
I must say that I don't like the solution of going to a separate page, downloading a file and then uploading it to Galaxy (it doesn't feel very smooth). I would prefer an integrated component.
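Until an integrated component exists, one lower-friction route is Galaxy's upload dialog, whose "Paste/Fetch data" option accepts a list of URLs, one per line. Below is a minimal sketch that turns the TSV export into such a list. It makes three assumptions. First, fastq_ftp values are scheme-less paths, as in the example result above. Second, multi-file runs separate their paths with `;`. Third, the target Galaxy server can fetch `ftp://` URLs. `galaxy_url_list` is a hypothetical name.

```python
import csv
import io


def galaxy_url_list(tsv_text: str, scheme: str = "ftp://") -> str:
    """Return one download URL per line from an ENA read_run TSV export."""
    urls = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        # fastq_ftp may hold several ';'-separated paths (e.g. paired-end runs).
        for path in filter(None, (row.get("fastq_ftp") or "").split(";")):
            # Assumed: ENA paths come without a scheme, so prepend one.
            urls.append(path if "://" in path else scheme + path)
    return "\n".join(urls)


example_tsv = (
    "run_accession\tfastq_ftp\n"
    "SRR25728136\tftp.sra.ebi.ac.uk/vol1/fastq/SRR257/036/SRR25728136/SRR25728136.fastq.gz\n"
)
print(galaxy_url_list(example_tsv))
```

The printed text could then be pasted into the upload dialog. This still involves a copy step, so it is a stopgap rather than the integrated component asked for here.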
See #309 (comment)
Need
Similar to #309, we need to explore importing a file manifest from ENA into a Galaxy workspace and using it to supply the sequencing files for workflows such as Paired-end variant calling in haploid system.
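Paired-end workflows need forward and reverse reads matched per run. Below is a minimal sketch that assumes ENA's usual naming, where a paired-end run lists `<run>_1.fastq.gz` and `<run>_2.fastq.gz` in fastq_ftp, sometimes alongside an unpaired `<run>.fastq.gz`. It emits tab-separated run/forward/reverse rows that could serve as input when building a `list:paired` collection in Galaxy. `pair_files`, `sample_sheet` and the accessions in the example are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed ENA mate suffix: <run>_1.fastq.gz (forward) / <run>_2.fastq.gz (reverse).
MATE_SUFFIX = re.compile(r"_([12])\.fastq\.gz$")


def pair_files(run_accession: str, fastq_ftp: str) -> Optional[Tuple[str, str, str]]:
    """Return (run, forward, reverse), or None if both mates are not present."""
    mates = {}
    for path in filter(None, fastq_ftp.split(";")):
        match = MATE_SUFFIX.search(path)
        if match:
            mates[match.group(1)] = path  # unpaired <run>.fastq.gz is ignored
    if "1" in mates and "2" in mates:
        return run_accession, mates["1"], mates["2"]
    return None  # single-end run, or unexpected naming


def sample_sheet(runs: List[Tuple[str, str]]) -> str:
    """Tab-separated run/forward/reverse rows; runs without both mates are skipped."""
    rows = [pair_files(acc, ftp) for acc, ftp in runs]
    return "\n".join("\t".join(r) for r in rows if r is not None)


runs = [
    # Hypothetical accessions and paths, for illustration only.
    ("SRR0000001", "host/SRR0000001.fastq.gz;host/SRR0000001_1.fastq.gz;host/SRR0000001_2.fastq.gz"),
    ("SRR0000002", "host/SRR0000002.fastq.gz"),
]
print(sample_sheet(runs))
```

Skipping runs that lack both mates keeps single-end data from silently entering a paired-end workflow. Those runs could be reported separately instead.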
Questions