
Memory leak or stall? #40

Closed
schorlton opened this issue Sep 15, 2022 · 16 comments
@schorlton

Thanks for falco!

Running v1.2.0 installed from bioconda on a nanopore FASTQ

falco --format fastq -skip-report -t 1 -skip-summary nanopore.fastq.gz 
[limits]	using default limit cutoffs (no file specified)
[adapters]	using default adapters (no file specified)
[contaminants]	using default contaminant list (no file specified)
[Thu Sep 15 10:26:59 2022] Started reading file nanopore.fastq.gz
[Thu Sep 15 10:27:00 2022] reading file as gzipped FASTQ format
[running falco|===================================================|100%]
[Thu Sep 15 10:27:13 2022] Finished reading file
[Thu Sep 15 10:27:13 2022] Writing text report to ./fastqc_data.txt

It looks like it generates the fastqc_data.txt properly, but then after that, it consumes over 32GB of RAM over several minutes until I kill it...
Is this expected? For comparison, FastQC processes the file in 18s within 1GB RAM.

Here are the stats of the file:

seqkit stats nanopore.fastq.gz 
file              format  type  num_seqs      sum_len  min_len  avg_len  max_len
nanopore.fastq.gz  FASTQ   DNA     30,720  204,534,138      100    6,658   35,768
@guilhermesena1
Collaborator

Hi

This may be a strange thing to ask, but would you be able to try running the same command without --format fastq (letting falco figure out the format) and let us know if you hit the same problem? It may help us zero in on the cause.

thank you!

@schorlton
Author

Thanks for your quick response! Running without --format fastq hit the same problem. I also tried running on the ungzipped file without --format fastq and that also hit the same issue. Thanks!

@guilhermesena1
Collaborator

guilhermesena1 commented Sep 15, 2022

Hm that's very strange indeed. I had a suspicion that --format fastq was forcing the program to interpret the gzipped file as uncompressed, thus leading to problems.

Would it be too much to ask if you could share either the entire dataset or a small subset that causes the problem on your machine? I'd be very interested in seeing if I can reproduce it too; this may be more complicated than I thought. I understand it's tricky to share private data, so no worries if you cannot. If it's more convenient, you can always send it to me at desenabr[at]usc[dot]edu.

Finally, and this is admittedly a hail mary: does the program also fail if you don't use any flags at all, i.e. just falco nanopore.fastq.gz?

Thank you!

@andrewdavidsmith
Collaborator

@schorlton If you are only able to share part of the dataset, please include something with the same structure for the read names unless they are really short and contiguous. Also, try to include any special characters that would be in the original file, for example by obtaining the smaller part with command line tools. Also, the most helpful part would have similar variability in lengths of reads. Thanks!

@schorlton
Author

Running falco nanopore.fastq.gz hits the same issue.
I've managed to reproduce on the first random public nanopore sequencing file I pulled from SRA:
SRR21529942.fastq.gz
Looking forward to your analysis.

@andrewdavidsmith
Collaborator

andrewdavidsmith commented Sep 15, 2022

I can report that, using the provided file:

  • The most recent release worked for me (1.2.0)
  • The most recent source worked for me
  • The most recent version on conda (1.2.0) did not work, nor did 1.1.0 from conda

If @guilhermesena1 notices the same, we will need to find the problem and update conda. In the meantime, @schorlton if you are in an environment where you can build falco from source, maybe you could try that using the source here. If you want to do that and encounter problems we can try to help. Otherwise we hope you'll wait for the fix on conda, which we will try to get done asap.

@schorlton
Author

Glad I'm not going crazy 😝
Thanks for your investigation - I'll stay tuned for a conda fix as I can use FastQC in the interim.
-Sam

@andrewdavidsmith
Collaborator

Problem identified. Hopefully we can have a fix soon on conda. If you need it before we can do that, you'll have to try building from source -- preferably the release I linked above.

@guilhermesena1
Collaborator

The problem can also be reproduced on a clone/release if we delete/rename the files in the Configuration directory, so it may have something to do with the defaults.

@schorlton if you need a quick fix, you can point falco at the configuration files inside your conda installation as follows:

$ falco -c /path/to/your/conda/opt/falco-1.2.0/Configuration/contaminant_list.txt \
        -a /path/to/your/conda/opt/falco-1.2.0/Configuration/adapter_list.txt \
        -l /path/to/your/conda/opt/falco-1.2.0/Configuration/limits.txt \
        [the rest of your parameters here]

I think when falco is compiled internally by conda (since only the binary is shipped), it still isn't finding the Configuration folder under /opt. The way fastqc handles this is to compile everything at the conda root directory, move everything to opt, and then make a symlink under the bin directory that points to the binary under opt. I didn't like that approach very much and was trying to avoid it, but I guess if we want to emulate their behavior, we will emulate their behavior. We'll try to fix this, but it may take a bit longer since it goes through a pull request.

@guilhermesena1
Collaborator

I think I found the source of the problem and pushed a possible fix at bca5f11

When the default values were used, the shortest adapter size wasn't being set properly, and subtracting 1 from it led to an unsigned integer underflow. You should also be able to run the code using just the -a flag from the command above. In any case, we'll roll this into a new release asap and update conda.

Sorry for the inconvenience and thank you again for sharing the issue with us!

@schorlton
Author

Thanks for the quick fix! Will test out the conda update once released.

@schorlton
Author

Unfortunately now getting a new error on 1.2.1 from bioconda:

falco -skip-summary -skip-report nanopore.fastq.gz 
[limitst]	using file /opt/conda/opt/falco-1.2.1/Configuration/limits.txt
instruction for limit duplication not found in file /opt/conda/opt/falco-1.2.1

@guilhermesena1
Collaborator

Yep, trying to figure this one out.
What's weird is that it is reading the correct file. Even more so: if you do

falco -l /opt/conda/opt/falco-1.2.1/Configuration/limits.txt [etc etc]

it runs OK, but not if you don't specify the file, even when the path is correct.

@guilhermesena1
Collaborator

guilhermesena1 commented Sep 19, 2022

Alright, I know better than to get my hopes up, but we made some changes to the most recent release and merged them into conda to address the issue.

As far as I can tell, a conda download of falco puts the config files at opt/falco/Configuration and the falco binary in conda correctly uses these files to process the data. @schorlton if you are still having issues and are still interested in testing the most recent changes, we would love to know if things are running correctly on your end too. Just bear in mind you may have to run conda remove falco prior to updating.

@schorlton
Author

It works - thank you! The only remaining issue is specifying the file format. Specifying --format fastq when the file is compressed FASTQ causes a crash. Omitting this works.

@guilhermesena1
Collaborator

We will create another issue to address this. Ideally fq and fq.gz should be treated equally, but the reading process differs between them, and technically passing --format fastq reports on the output that the file is being read as uncompressed, so failing makes sense since the input is then technically malformed. Regardless, this does warrant clarification.

I'll close this issue since the initial problem has been resolved, but any further problems with the stalling can be reported by reopening it.
