Does not correctly decode old Illumina Phred score encoding #18

Closed
shigita opened this issue Sep 8, 2021 · 4 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

shigita commented Sep 8, 2021

I ran Falco 0.2.4 and FastQC on old Illumina data and noticed that the two programs detect the Phred score encoding differently, as follows:
Falco detects the encoding as Illumina 1.9, and the "Per base sequence quality" and "Per sequence quality scores" plots look incorrect.
FastQC detects the encoding as Illumina 1.5, and everything looks fine.

Attached are a small example of the old FASTQ file and the output from each program.

Here are the commands I ran:

# Falco
falco --nogroup read_1.fastq.gz --outdir falco_result

# FastQC
fastqc --nogroup read_1.fastq.gz --outdir fastqc_result

I hope these help reproduce the problem.
Thanks in advance.

@guilhermesena1 guilhermesena1 added bug Something isn't working enhancement New feature or request labels Sep 8, 2021
@guilhermesena1
Collaborator

Hello,

Thank you for providing the data to reproduce the issue! We have some issues open regarding quality distributions on both nanopore and legacy phred32 from Illumina. Currently the program assumes everything is Illumina, and we will address this problem in the upcoming release. I sincerely apologize for the inconvenience!
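
For readers unfamiliar with the two encodings: older Illumina pipelines (1.3 to 1.7) store each quality score as the ASCII character with code score + 64, while Sanger and Illumina 1.8+ use score + 33. The sketch below shows one common way to guess the offset from the lowest quality character in a file. It is an illustration only, not the code used by Falco or FastQC; the function names `guess_phred_offset` and `decode_quality` are hypothetical, and the heuristic ignores the Solexa scale.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the lowest quality character seen.

    Assumption: any character below '@' (code 64) rules out Phred+64.
    """
    lowest = min((ord(ch) for line in quality_lines for ch in line), default=None)
    if lowest is None:
        raise ValueError("no quality characters to inspect")
    if lowest < 33:
        raise ValueError(f"character code {lowest} is below any known Phred offset")
    return 33 if lowest < 64 else 64


def decode_quality(line, offset):
    """Convert one FASTQ quality line to a list of integer Phred scores."""
    return [ord(ch) - offset for ch in line]


# Hypothetical quality lines in the style of old Illumina (Phred+64) data.
old_style = ["hhhhgffB", "ggfeedBB"]
offset = guess_phred_offset(old_style)
print(offset, decode_quality(old_style[0], offset))
```

Decoding the same lines with an offset of 33 instead of 64 would shift every score up by 31, which is the kind of distortion that can make quality plots look wrong. Note that the guess can fail for a Phred+33 file in which every base has a score of 31 or higher, so real tools may apply extra checks.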

@guilhermesena1
Collaborator

With your data I was able to push a fix for the problem at baad210. I haven't tested it thoroughly, but for your dataset at least it now gives the same plots as FastQC for per base sequence quality, basic statistics and per sequence quality scores. Thank you once again for communicating the issue!

@shigita
Author

shigita commented Sep 9, 2021

I have confirmed that it also works on my full dataset!
Thank you very much for your prompt response 😊

@guilhermesena1
Collaborator

Glad to hear it! I'll close this for now, just to keep track of the current issues that still need addressing, but feel free to reopen if you run into any problems with the current push.
