configuration file and user-provided lineages #24

Closed
taylorreiter opened this issue May 13, 2020 · 8 comments

Comments

@taylorreiter
Member

The current test data configuration file has an input for lineages:

# lineages CSV (see `sourmash lca index`) for signatures in query databases
lineages_csv: test-data/podar-lineage.csv

From the config file alone, it's unclear if it is necessary for the user to provide lineages, and if they do not provide lineages, what will happen/how the config file should be filled out.

@taylorreiter
Member Author

I see from the tara-delmont conf file that lineages can be specified as:

# lineages CSV (see `sourmash lca index`) for signatures in query databases
lineages_csv: /home/ctbrown/sourmash_databases/gtdb/gtdb-lineages.csv

So the lineages csv contains the lineages that will be tested for presence/contamination in the MAG? Should gtdb-lineages.csv be the default? Can this db be downloaded by charcoal so the user doesn't have to think about it?

@ctb
Member

ctb commented May 13, 2020 via email

@taylorreiter
Member Author

Yes, I'm partial to only having to worry about databases once. So ideally charcoal would automatically download and configure databases for the user the first time the tool is used, and the paths of those databases would then be propagated to all charcoal runs unless the user overrides them or wants to switch databases.
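
To make the "default unless overridden" idea concrete, here is a minimal sketch of how the lookup could work. The default directory, the `gtdb-lineages.csv` default file name, and the `resolve_lineages_csv` helper are all hypothetical; nothing like this exists in charcoal as far as this thread shows.

```python
from pathlib import Path

# Hypothetical default location for downloaded databases (an assumption,
# not something charcoal defines).
DEFAULT_DB_DIR = Path.home() / ".charcoal" / "db"
# Assumed default lineages file, following the GTDB suggestion in this thread.
DEFAULT_LINEAGES = "gtdb-lineages.csv"


def resolve_lineages_csv(project_config, default_dir=DEFAULT_DB_DIR):
    """Return the lineages CSV path, preferring a per-project override."""
    override = project_config.get("lineages_csv")
    if override:
        # The user set lineages_csv in their config file, so honor it.
        return Path(override)
    # Otherwise fall back to the shared, auto-downloaded copy.
    return Path(default_dir) / DEFAULT_LINEAGES
```

With something like this, a config file that omits `lineages_csv` would still work, and one that sets it (as the test data config does) would keep its current behavior.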

As for wording, I think lineages_csv is fine, but maybe we could add something like this to the comment above:

# lineages CSV containing reference lineages to test for contamination.
# Must correspond to signatures in query databases (e.g. gtdb.csv).
# See `sourmash lca index` to generate your own.

Although that's kind of bad English and still not totally clear.

@ctb
Member

ctb commented May 13, 2020

> I see from the tara-delmont conf file that lineages can be specified as:
>
> # lineages CSV (see `sourmash lca index`) for signatures in query databases
> lineages_csv: /home/ctbrown/sourmash_databases/gtdb/gtdb-lineages.csv
>
> So the lineages csv contains the lineages that will be tested for presence/contamination in the MAG?

Exactly so.

> Should gtdb-lineages.csv be the default? Can this db be downloaded by charcoal so the user doesn't have to think about it?

I think that's a pretty reasonable approach, yes!

Maybe we can provide some commands like:

  • charcoal download_db - download databases
  • charcoal config check - check location etc of databases
  • charcoal config generate - generate a new config file

The trickiest bit here is that we need to figure out good default locations for downloaded databases and so on. Luckily, with sbt.zip support they're small enough that doing it on a per-install basis is probably OK, and we can support central installs if needed.
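
As a rough illustration of how the three proposed commands could be laid out, here is a sketch using `argparse` subcommands. It only parses arguments and does nothing else; the command names come from the list above, and everything else (`build_parser`, the `dest` names) is made up for the example rather than taken from charcoal.

```python
import argparse


def build_parser():
    """Build a parser for the proposed commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="charcoal")
    commands = parser.add_subparsers(dest="command", required=True)

    # charcoal download_db
    commands.add_parser("download_db", help="download databases")

    # charcoal config check / charcoal config generate
    config = commands.add_parser("config", help="work with config files")
    actions = config.add_subparsers(dest="action", required=True)
    actions.add_parser("check", help="check location etc of databases")
    actions.add_parser("generate", help="generate a new config file")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Running this as `charcoal config check` would yield `command="config"` and `action="check"`, which a dispatcher could then route to the real implementation.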

@ctb
Member

ctb commented May 13, 2020

(Side note: sourmash lca index consumes such files but does not produce them. That's more of a sourmash taxonomy kind of thing, though that doesn't exist yet.)
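
For anyone unsure what "such files" look like, here is a small sketch of reading a lineages CSV. It assumes the layout I understand `sourmash lca index` to expect: an identifier in the first column, followed by one column per taxonomic rank. The exact column names below and the `load_lineages` helper are assumptions for illustration, so check them against your own file.

```python
import csv
import io

# Assumed rank columns, in order from broadest to most specific.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def load_lineages(fp):
    """Map each identifier to its (rank, name) pairs, stopping at the first blank rank."""
    reader = csv.DictReader(fp)
    id_column = reader.fieldnames[0]  # first column holds the identifier
    lineages = {}
    for row in reader:
        lineage = []
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                break  # lineage is only known down to the previous rank
            lineage.append((rank, name))
        lineages[row[id_column]] = tuple(lineage)
    return lineages


# Tiny made-up example; the accession and names are placeholders.
example = io.StringIO(
    "accession,superkingdom,phylum,class,order,family,genus,species\n"
    "example_1,Bacteria,Proteobacteria,,,,,\n"
)
print(load_lineages(example))
```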

@taylorreiter
Member Author

> charcoal download_db - download databases
> charcoal config check - check location etc of databases
> charcoal config generate - generate a new config file

I love this idea!

> The trickiest bit(s) here are that we need to figure out good default locations for downloaded databases and so on. luckily w/sbt.zip support they're small enough that doing it on a per-install basis is probably ok, and we can support central installs if needed.

Could we do something like charcoal download_db -p /home/tereiter/charcoal_db, where the user provides a path after -p and somehow charcoal then knows about that path?
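
One way the "somehow charcoal then knows about that path" part could work is to record the chosen directory in a small per-user settings file that later runs read back. This is only a sketch: the settings file location, the `db_path` key, and both helper functions are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical per-user settings file (an assumption, not a charcoal convention).
SETTINGS_FILE = Path.home() / ".charcoal" / "settings.json"


def remember_db_path(db_path, settings_file=SETTINGS_FILE):
    """Record the directory passed via -p so later runs can find it."""
    settings_file = Path(settings_file)
    settings_file.parent.mkdir(parents=True, exist_ok=True)
    settings = {"db_path": str(Path(db_path).expanduser())}
    settings_file.write_text(json.dumps(settings, indent=2))


def recall_db_path(settings_file=SETTINGS_FILE):
    """Return the recorded database directory, or None if none was saved."""
    settings_file = Path(settings_file)
    if not settings_file.exists():
        return None
    return Path(json.loads(settings_file.read_text())["db_path"])
```

Here `download_db -p /home/tereiter/charcoal_db` would call `remember_db_path` once, and every other command would call `recall_db_path` before falling back to a default.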

@ctb
Member

ctb commented May 13, 2020 via email

@ctb
Member

ctb commented May 18, 2020

remaining bits transferred to #61

@ctb ctb closed this as completed May 18, 2020