configuration file and user-provided lineages #24
I see from the
So the lineages CSV contains the lineages that will be tested for presence/contamination in the MAG? Should
On Wed, May 13, 2020 at 08:41:19AM -0700, Taylor Reiter wrote:
The current test data configuration file has an input for lineages:
```
# lineages CSV (see `sourmash lca index`) for signatures in query databases
lineages_csv: test-data/podar-lineage.csv
```
From the config file alone, it's unclear whether the user needs to provide lineages and, if they don't, what will happen and how the config file should be filled out.
These are lineages for the database(s), not the genomes, e.g.
GTDB. Suggested wording welcome! `provided_lineages` is the set of
optional overrides on input genomes.
Hmm, one good addition might actually be to provide a separate ~system-wide
config file that lists the query databases and lineages. That way you
only have to specify them there, and they flow through to the rest of the
projects.
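A minimal sketch of how a shared config could flow into per-project configs, assuming both files have already been parsed into plain dicts and that any key set per-project simply overrides the system-wide value. `lineages_csv` is the key from the test config above; the `databases` and `genome_list` keys, the paths, and the `merge_configs` helper are hypothetical and not part of charcoal.

```python
from typing import Any, Dict


def merge_configs(system_cfg: Dict[str, Any], project_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which project settings override system-wide ones.

    Hypothetical helper for illustration; charcoal does not define it.
    """
    merged = dict(system_cfg)   # start from the shared, system-wide defaults
    merged.update(project_cfg)  # any key set per-project wins
    return merged


# Hypothetical system-wide config: query databases and their lineages, set up once.
system_cfg = {
    "databases": ["/opt/charcoal_db/gtdb.sbt.zip"],
    "lineages_csv": "/opt/charcoal_db/gtdb-lineages.csv",
}

# A project config that only lists its own genomes inherits the database settings.
project_cfg = {"genome_list": "my-genomes.txt"}

config = merge_configs(system_cfg, project_cfg)
print(config["lineages_csv"])  # /opt/charcoal_db/gtdb-lineages.csv
```

A shallow `dict.update` is the simplest choice here; nested settings would need a recursive merge.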
Yes, I'm partial to only having to worry about databases once. So, somehow set it up so that charcoal automatically downloads and configures databases for the user the first time the tool is used, and the paths of those databases are then propagated to all charcoal runs unless the user overrides them or wants to switch databases. As for wording, I think
Although that's kind of bad English and still not totally clear.
Exactly so.
I think that's a pretty reasonable approach, yes! Maybe we can provide some commands like --
The trickiest bit(s) here are that we need to figure out good default locations for downloaded databases and so on. Luckily, with `sbt.zip` support they're small enough that doing it on a per-install basis is probably OK, and we can support central installs if needed.
(Side note: `sourmash lca index` consumes such files but does not produce them. That's more of a `sourmash taxonomy` kind of thing, though that doesn't exist yet.)
I love this idea!
On Wed, May 13, 2020 at 10:04:23AM -0700, Taylor Reiter wrote:
>The trickiest bit(s) here are that we need to figure out good default locations for downloaded databases and so on. luckily w/sbt.zip support they're small enough that doing it on a per-install basis is probably ok, and we can support central installs if needed.
Could we do something like `charcoal download_db -p /home/tereiter/charcoal_db`, where the user provides a path after `-p` and somehow charcoal then knows about that path?
Yep. I think we would need to (try to) write to the central charcoal config
file, which (in conda) would be user-writable if it's in the package.
Whether or not this is a good idea... less sure :). But we could provide a
user-override environment variable, too.
Yay complexity.
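One way to sketch the lookup order being discussed: a `charcoal download_db -p PATH` step records PATH in the central config, and a user-override environment variable takes precedence whenever it is set. The environment variable name `CHARCOAL_DB_DIR`, the `database_dir` key, and both helper functions are hypothetical; the central config is modeled as a plain dict rather than a real file.

```python
import os
from typing import Mapping, MutableMapping, Optional

# Hypothetical names: neither the variable nor the config key exists in charcoal.
ENV_VAR = "CHARCOAL_DB_DIR"
CONFIG_KEY = "database_dir"


def record_db_dir(central_cfg: MutableMapping[str, str], path: str) -> None:
    """What a proposed `download_db -p PATH` step might do: remember PATH centrally."""
    central_cfg[CONFIG_KEY] = os.path.abspath(path)


def resolve_db_dir(central_cfg: Mapping[str, str],
                   environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the database directory: env var override first, then central config."""
    override = environ.get(ENV_VAR)
    if override:
        return override
    return central_cfg.get(CONFIG_KEY)  # None if nothing has been recorded yet


central_cfg: dict = {}
record_db_dir(central_cfg, "/home/tereiter/charcoal_db")
print(resolve_db_dir(central_cfg, environ={}))
print(resolve_db_dir(central_cfg, environ={ENV_VAR: "/scratch/db"}))
```

Passing `environ` explicitly keeps the lookup easy to test; in normal use it falls back to `os.environ`. This sketch does not address whether the central file is actually writable inside a conda package.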
Remaining bits transferred to #61.