Context
The workflow uses a custom Python script to perform codon-aware alignment, to resolve issues with variable sites in mafft's alignments. Since @rneher wrote this script, he and @ivan-aksamentov developed nextalign. Even though nextalign was developed with SARS-CoV-2 in mind, it works well for H3N2 HA sequences, too, and is way faster.
Description / proposed solution
We should try using nextalign for our flu builds. We'll need to set up the corresponding FASTA reference files for all of the lineages and segments, or maybe implement GenBank file support in nextalign (whichever is easier... I can guess which, though!). Then we can take advantage of the codon-aware alignment functionality and also run alignments with multiple threads to speed up that step of our builds.
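To make the per-lineage, per-segment setup concrete, here is a rough sketch of how a build could assemble one nextalign call per reference. Everything about the layout is hypothetical: the lineage and segment names, the `config/reference_<lineage>_<segment>.fasta` naming, and the helper functions are made up for illustration. The flag names (`--input-ref`, `--genemap`, `--output-all`, `--jobs`) are what I recall from the nextalign v2 CLI and are an assumption; check `nextalign run --help` for the installed version before relying on them.

```python
from pathlib import Path

# Hypothetical lineages and segments; adjust to the builds actually run.
LINEAGES = ["h3n2", "h1n1pdm", "vic", "yam"]
SEGMENTS = ["ha", "na"]


def reference_path(lineage, segment, root="config"):
    """Return the reference FASTA path under an assumed naming scheme."""
    return (Path(root) / f"reference_{lineage}_{segment}.fasta").as_posix()


def nextalign_command(lineage, segment, sequences, output_dir, jobs=4, genemap=None):
    """Build an argument list for one alignment run.

    Flag names are assumed from the nextalign v2 CLI, not verified here.
    """
    cmd = [
        "nextalign", "run",
        "--input-ref", reference_path(lineage, segment),
        "--output-all", str(output_dir),
        "--jobs", str(jobs),  # number of threads to use for alignment
    ]
    if genemap is not None:
        # A gene map (GFF) is what enables codon-aware alignment and translation.
        cmd += ["--genemap", str(genemap)]
    cmd.append(str(sequences))  # input sequences go last as a positional argument
    return cmd


def all_commands(jobs=4):
    """Build one command per lineage/segment pair under the assumed layout."""
    return [
        nextalign_command(
            lineage, segment,
            sequences=f"data/{lineage}_{segment}.fasta",
            output_dir=f"results/aligned_{lineage}_{segment}",
            jobs=jobs,
        )
        for lineage in LINEAGES
        for segment in SEGMENTS
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` or translated into a Snakemake rule; building the list separately just makes it easy to inspect the calls before running them.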
Once we have nextalign in place, we could start to do analyses that previously would have taken too long, like creating multiple sequence alignments of all amino acid sequences for HA and running the titer substitution model on all available sequences and titers.
Now that Nextclade has seasonal flu datasets, we should consider using Nextclade for alignment and clade annotations in our standard builds. This approach would quickly produce codon-aware nucleotide alignments, amino acid translations, and clade annotations for every sequence in the database.
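As a sketch of how the clade annotations could feed into a build, the snippet below reads a Nextclade results table into a strain-to-clade mapping. It assumes that Nextclade has already been run and has written a tab-separated results file containing `seqName` and `clade` columns; the function name and the idea of joining on strain name are my own illustration, not something the existing workflow does.

```python
import csv


def read_clades(handle):
    """Map each sequence name to its clade from a Nextclade TSV.

    Assumes the table has `seqName` and `clade` columns; rows with an
    empty clade (for example, sequences that failed to align) are skipped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    clades = {}
    for row in reader:
        clade = (row.get("clade") or "").strip()
        if clade:
            clades[row["seqName"]] = clade
    return clades
```

Usage would look like `with open(path_to_tsv, newline="") as handle: clades = read_clades(handle)`, after which the mapping can be merged into the build's metadata by strain name.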
We'll need to setup the corresponding FASTA reference files for all of the lineages and segments
For the Nextalign part, there are many input files in the Nextclade repo already. You'll probably need something even more sophisticated than that. But this might be a partial solution or at least a starting point:
We should consider using Nextclade for alignment and clade annotations in our standard builds
Nextclade would require more files to run than Nextalign, including a reference Auspice tree for every variation, every root sequence, etc. So it is much more involved in terms of science things, unless it all can piggyback on the existing trees somehow: