How to best call variants with ska lo? #93
Comments
Hi Ryan,
Thanks for your message. The command 'ska lo' calls variants in reference-free mode and only afterwards attempts to position SNPs on a reference genome; it was mostly designed for recombination analyses. So, as opposed to other methods that call variants directly from the reference genome, its SNP positioning is prone to errors (which does not necessarily imply that the SNPs themselves are incorrect). More details here: https://www.biorxiv.org/content/10.1101/2024.10.02.616334v2
The current variant positioning is clearly limited, mostly due to a lack of time, and I or someone else will have to improve it at some point. You are correct regarding some of its limitations:
Regarding quick fixes:
All that said, ska map might be sufficient to address the question of sequencing reads vs assemblies (I really enjoyed reading your preprint!).
best,
We should fix the multi-contig reference handling. The code in ska map does this, and we should be able to port it over to ska lo.
Hello, and thank you for developing SKA and the new ska lo subcommand!
I'm interested in using SKA to call variants against a reference using a single assembly. This is what I've done previously (in this preprint) using ska map:
This works well for most SNPs but not for closely-spaced SNPs or indels. But then @johnlees let me know about ska lo, which sounds like it could solve these shortcomings! But I can't quite figure out how to get the VCF I need using ska lo. These commands almost work:
    ska build -o ska.skf -k 31 assembly.fasta reference.fasta
    ska lo ska.skf test -r reference.fasta
Except I have these two issues:
ska lo only seems to allow a single sequence in the reference, so it errors out when reference.fasta contains multiple sequences (e.g. a chromosome and plasmids).
Based on the documentation, the first issue seems to be an inherent limitation of ska lo, is that right?
For the second issue, the best solution I've found is to run ska lo separately on each reference sequence and then merge the results together. This is what I've come up with:
It's clunky, but it seems to work. Is there a better way?
Thanks!
Ryan