Because of the behavior described in nextstrain/ingest#18, the ingest pipeline does not include sequences in its fetch from NCBI Virus. As a result, every record is dropped in the pipeline and the final outputs at s3://nextstrain-data/files/workflows/monkeypox/ are empty. This was first flagged internally by downstream CZI consumers on Slack.
We don't have insight into the undocumented NCBI Virus API and whether this new behavior is intentional, so the best thing might be to just switch to the NCBI Datasets CLI to fetch data.
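To illustrate the failure mode, here is a minimal Python sketch of how a fetch without sequences can silently empty the outputs, plus a guard that would fail loudly instead. This is not the actual ingest code: `join_sequences`, `require_non_empty`, and the `accession`/`sequence` field names are hypothetical and only stand in for whatever the pipeline really does.

```python
from typing import Dict, List


def join_sequences(
    metadata: List[Dict[str, str]], sequences: Dict[str, str]
) -> List[Dict[str, str]]:
    """Keep only metadata records that have a matching, non-empty sequence.

    Hypothetical stand-in for the pipeline step that drops records
    lacking a sequence.
    """
    joined = []
    for record in metadata:
        seq = sequences.get(record["accession"], "")
        if seq:
            joined.append({**record, "sequence": seq})
    return joined


def require_non_empty(records: List[Dict[str, str]], step: str) -> List[Dict[str, str]]:
    """Raise instead of letting an empty result flow on to the upload step."""
    if not records:
        raise RuntimeError(f"{step} produced 0 records; refusing to upload empty outputs")
    return records


if __name__ == "__main__":
    metadata = [{"accession": "A1"}, {"accession": "A2"}]
    # If the fetch returns no sequences, every record is dropped.
    print(len(join_sequences(metadata, {})))
```

A check like `require_non_empty` placed before the upload would have surfaced the problem in the workflow run rather than in downstream consumers' files.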
I just hit the same problem because I basically copied your undocumented API query format for my non-SARS-CoV-2 pipelines 😆.
FWIW for SARS-CoV-2 I've been using NCBI's datasets command. I've been using some SARS-CoV-2-only features in datasets, but it seems to provide basic info for other virus genomes now too. For example, to get only FASTA and BioSample .jsonl for norovirus (metadata API query still works), I can run this command: