premature sequence truncations #82

Open
brandynwest opened this issue Jan 28, 2020 · 0 comments

I'm really enjoying the abstar program, and I love how easy it is to access the various parameters in the JSON output file using Python. However, I have come across an oddity that makes abstar output difficult to use in downstream applications: often the N-terminus, the C-terminus, or both are truncated. This occurs when running a standard command on the test data (abstar -o ./output -t ./temp --use-test-file). Here's a specific example using the VRC26.15 heavy chain from the test_hiv_bnab_hcs.fasta test set (VRC26.15 is just one of many sequences where this seems to occur):

For VRC26.15, the N-terminal residue should be a glutamate (E) encoded by the first 3 nucleotides of the raw query sequence (GAG). Instead, the alignment appears to begin with AG..., meaning that the first (5') G nucleotide does not contribute to the alignment. This results in an amino acid sequence that starts with "VQLV..." instead of the expected "EVQLV...". I'm wondering if this is somehow connected to 0-based vs 1-based indexing in Python slicing (maybe the query start parameter needs to be 0 instead of 1?).
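
To make the off-by-one hypothesis concrete, here is a minimal sketch of how using a 1-based start coordinate directly as a Python slice index could drop the first residue. This is not abstar's code: the function names are made up for illustration, and the sequence is just one possible encoding of EVQLV that starts with GAG, not the actual VRC26.15 sequence. It also assumes that translation skips a leading partial codon to stay in frame, which is a guess.

```python
# Illustration only: NOT abstar's code; sequence and names are hypothetical.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the canonical TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def translate_in_frame(raw, start_index):
    """Translate raw[start_index:] in the reading frame defined by raw[0].

    Any leading partial codon is skipped (an assumed behaviour).
    """
    skip = (3 - start_index % 3) % 3
    return translate(raw[start_index + skip:])


raw_query = "GAGGTGCAGCTGGTG"  # hypothetical: one encoding of EVQLV
query_start = 1                # 1-based start, as alignment tools often report

# Converting to a 0-based index keeps the first codon (GAG).
correct = translate_in_frame(raw_query, query_start - 1)
# Using the 1-based value as-is starts at "AG..." and loses the first codon.
off_by_one = translate_in_frame(raw_query, query_start)

print(correct)
print(off_by_one)
```

If something like this is happening, the fix would be a single `- 1` when converting the reported start position into a slice index, but I have not been able to confirm where (or whether) that conversion happens in abstar.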

VRC26.15 is a heavy chain variable domain, so I would expect the sequence to end with ...TVSS. However, the vdj_aa sequence is one S short and reads ...TVS. The program does identify the correct ending ("J-GENE AA SEQUENCE: IWGQGTMVTVSS"), but in the VDJ assembly the coding region appears to have been truncated, so the vdj_aa sequence ends at ...TVS.
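
For anyone who wants to check how widespread this is, a small helper along these lines could report how many residues are missing at each end. It is only a sketch: `missing_ends` is a hypothetical name, the sequences below are shortened stand-ins rather than real VRC26.15 data, and loading the vdj_aa value from the JSON output is left out.

```python
def missing_ends(expected_aa, observed_aa):
    """Return (n_term_missing, c_term_missing) residue counts.

    Returns None if observed_aa is not an exact substring of expected_aa,
    for example because of mismatches rather than simple truncation.
    """
    offset = expected_aa.find(observed_aa)
    if offset == -1:
        return None
    c_missing = len(expected_aa) - offset - len(observed_aa)
    return offset, c_missing


# Hypothetical, shortened sequences for illustration only.
expected = "EVQLVESGGGTVSS"   # translation of the full coding region
observed = "VQLVESGGGTVS"     # what a truncated vdj_aa value might look like

print(missing_ends(expected, observed))
```

A result of (1, 1) would correspond to the pattern described above: one residue lost at the N-terminus and one at the C-terminus.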

I'm not very experienced with Python or coding in general, and although I have spent several days looking through the code, I can't figure out how this truncation is occurring. Is this an issue with how abstar decides where the coding region is, or is the trimming of ends inherent to blastn alignments in general?

Thanks!
