Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nextstrain run: unmanaged pathogen support? #420

Open
Tracked by #1
tsibley opened this issue Mar 12, 2025 · 7 comments
Open
Tracked by #1

nextstrain run: unmanaged pathogen support? #420

tsibley opened this issue Mar 12, 2025 · 7 comments

Comments

@tsibley
Copy link
Member

tsibley commented Mar 12, 2025

Support nextstrain run for unmanaged pathogen sources, i.e. working copies of our repos, to aid development.

Two avenues to consider for such support.

  1. "Editable" setups: nextstrain setup pointed at local directory (like Python's "editable installs")
  2. No setup required: nextstrain run pointed at a local directory (like nextstrain build)

"Editable" setups

Accept a local file path/URL when setting up a version.

nextstrain setup avian-flu@dev=file://$PWD/avian-flu
nextstrain setup avian-flu@dev=$PWD/avian-flu
nextstrain setup avian-flu@dev=./avian-flu

cd avian-flu
nextstrain setup avian-flu@dev=.

Invoked the same as any other version.

nextstrain run avian-flu@dev segment-focused /tmp/analysis

Can be approximated currently by:

$ ln -sv ~/nextstrain/measles ~/.nextstrain/pathogens/measles/local="$(echo -n local | base32)"
$ nextstrain run measles@local …

No setup required

Distinguish managed pathogens vs. non-managed pathogens by
absence/presence of a path separator.

# Managed (current)
nextstrain run avian-flu segment-focused /tmp/analysis
nextstrain run avian-flu@v2 segment-focused /tmp/analysis

# Unmanaged (new)
nextstrain run ./avian-flu segment-focused /tmp/analyis

cd avian-flu
nextstrain run ./ segment-focused /tmp/analysis

and maybe special case . too for convenience?

nextstrain run . segment-focused /tmp/analysis

Note that we'd still keep pathogen and workflow as separate arguments (i.e. rather than combining them the way nextstrain build does) so that
workflows are still names and can be resolved via pathogen registration
metadata in the future.

History

I'd long anticipated some sort of "editable install" feature for nextstrain setup to support pathogen development, but several review comments spurred me to think about supporting directory paths directly in nextstrain run.

@tsibley
Copy link
Member Author

tsibley commented Mar 12, 2025

@jameshadfield @joverlee521 You're implicated in previous discussion/commentary here and so I'd definitely appreciate any thoughts you have.

@joverlee521
Copy link
Contributor

Huh, I was not expecting to use nextstrain run for development purposes, I thought that will remain as nextstrain build...the unmanaged pathogen support blurs the lines between the two commands that confuses me.

Am I understanding correctly that the main feature from nextstrain run that is helpful for development is the ability to use an external analysis directory? Could we just add this functionality to nextstrain build and keep nextstrain run simple?

Or should we port all of nextstrain build functionality to nextstrain run and deprecate the nextstrain build command?

@tsibley tsibley changed the title nextstrain run: unmanaged pathogen support nextstrain run: unmanaged pathogen support? Mar 13, 2025
@tsibley
Copy link
Member Author

tsibley commented Mar 13, 2025

It definitely blurs the lines a bit. There's still a line though: nextstrain build gives you direct access to Snakemake options and conceptually is oriented as a Snakemake wrapper. nextstrain run does not and conceptually is more removed from Snakemake.

I think external analysis directory is the main thing at the moment, yeah. This was driven by @jameshadfield's desired use cases. Adding support for external analysis directories to nextstrain build means adding an option, e.g. --analysis-dir or --workdir, and either a) doing some more assuming than previously about Snakemake setups¹ (e.g. fixed Snakefile instead of Snakemake's default behaviour of searching for the alternates snakefile, workflow/Snakefile, and workflow/snakefile) or b) doing a fair bit of reworking of assumptions on the AWS Batch entrypoint. All doable, but it seemed maybe better (and certainly easier) to bring unmanaged pathogen support to nextstrain run instead of bringing analysis dirs to nextstrain build. Maybe not though! (Or maybe we should do both?)

I don't think we should port all of nextstrain build to nextstrain run and deprecate build. Different interfaces for different purposes are ok, in my mind, and I don't see a reason to force anyone to change away from nextstrain build if it's working for them.

@tsibley
Copy link
Member Author

tsibley commented Mar 13, 2025

As long as the two commands are separate, another reason this support seems useful is that it means nextstrain run can be tested during development (alongside nextstrain build use) without the slow, frustrating iteration loop of pushing stuff to GitHub and running nextstrain setup or nextstrain update again.

@joverlee521
Copy link
Contributor

Different interfaces for different purposes are ok, in my mind, and I don't see a reason to force anyone to change away from nextstrain build if it's working for them.

True! After thinking on this more, I see the need for supporting development with nextstrain run. If we eventually want all pathogen repos to work with nextstrain run, it makes more sense to test with it during development!

Personally, I do reach for the various Snakemake options when testing workflow changes, so I will have to figure out what's the best development cycle here...

it means nextstrain run can be tested during development (alongside nextstrain build use) without the slow, frustrating iteration loop of pushing stuff to GitHub and running nextstrain setup or nextstrain update again.

That sounds like good reason to go with your proposed "No setup required" version.

@victorlin
Copy link
Member

doing some more assuming than previously about Snakemake setups¹

@tsibley what does the ¹ refer to?

@victorlin
Copy link
Member

If nextstrain build is intended to be "dev mode" for all scenarios, including external analysis directories, it seems necessary for that command to support external analysis directories. Otherwise, users would need to copy/link config files back to the pathogen repo to run something like nextstrain build . segment-focused --forceall, which would be cumbersome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants