Issue with '--keep_ambiguous' Option and Possible Bug #346

Closed
ijorge24 opened this issue Jul 31, 2024 · 3 comments · Fixed by #349
Labels
bug Something isn't working user-query User queries & requests

Comments

@ijorge24

ijorge24 commented Jul 31, 2024

Description of the bug

Hi,
I was testing the code with the PGS000872 dataset using a VCF file with a few variants. I noticed that transversions were not matched by default. To include these SNVs, I had to add the '--keep_ambiguous true' option. However, when I executed the script with this option, I encountered an error in the following line: https://github.com/PGScatalog/pygscatalog/blob/main/pgscatalog.match/src/pgscatalog/match/lib/_match/label.py#L262
It seems that the method 'with_column' should be corrected to 'with_columns' with an 's' at the end.
Additionally, I have a question: Why is the '--keep_ambiguous' option set to false by default rather than true?
Thank you!

Command used and terminal output

~/Software/pgsc_calc$ sudo nextflow run pgscatalog/pgsc_calc -profile docker --input assets/examples/PGS000872_samplesheet.csv --pgs_id PGS000872 --target_build GRCh37 --min_overlap 0.5
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [69c467ecc3]
Launching `https://github.com/pgscatalog/pgsc_calc` [boring_lamarr] DSL2 - revision: 0f33b4cff0 [main]


------------------------------------------------------
  pgscatalog/pgsc_calc v2.0.0-beta.1-g0f33b4c
------------------------------------------------------
Core Nextflow options
  revision          : main
  runName           : boring_lamarr
  containerEngine   : docker
  launchDir         : /home/Software/pgsc_calc
  workDir           : /home/Software/pgsc_calc/work
  projectDir        : /root/.nextflow/assets/pgscatalog/pgsc_calc
  userName          : root
  profile           : docker
  configFiles       : 

Input/output options
  input             : assets/examples/PGS000872_samplesheet.csv
  pgs_id            : PGS000872
  outdir            : /home/Software/pgsc_calc/results

Reference options
  ref_samplesheet   : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
  ld_grch37         : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
  ld_grch38         : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
  ancestry_checksums: /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt

Compatibility options
  target_build      : GRCh37

Matching options
  min_overlap       : 0.5

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use pgscatalog/pgsc_calc for your analysis please cite:

* The Polygenic Score Catalog
  https://doi.org/10.1101/2024.05.29.24307783
  https://doi.org/10.1038/s41588-021-00783-5

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/pgscatalog/pgsc_calc/blob/main/CITATIONS.md

executor >  local (11)
[0e/30e9be] process > PGSCATALOG_PGSCCALC:PGSCCALC:DOWNLOAD_SCOREFILES ([pgs_id:PGS000872, pgp_id:, trait_efo:])         [100%] 1 of 1 ✔
[a2/962254] process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)                                    [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM                                     -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR                                    -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (TEST2 chromosome 2)                       [100%] 3 of 3, stored: 3 ✔
[cc/15a7a8] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (TEST2 chromosome 19)                            [100%] 3 of 3 ✔
[a4/c8bfd7] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (TEST2)                                           [100%] 1 of 1 ✔
[20/f7ab25] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (TEST2 chromosome 19 effect type additive 0) [100%] 2 of 2 ✔
[7e/562fa4] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (TEST2)                                   [100%] 1 of 1 ✔
[70/0df9d2] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (TEST2)                                           [100%] 1 of 1 ✔
[b7/827928] process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS (1)                                              [100%] 1 of 1 ✔
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (2)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (3)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (1)
-[pgscatalog/pgsc_calc] Pipeline completed successfully-

~/Software/pgsc_calc$ sudo nextflow run pgscatalog/pgsc_calc -profile docker --input assets/examples/PGS000872_samplesheet.csv --pgs_id PGS000872 --target_build GRCh37 --min_overlap 0.5 --keep_ambiguous true
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [69c467ecc3]
Launching `https://github.com/pgscatalog/pgsc_calc` [magical_mirzakhani] DSL2 - revision: 0f33b4cff0 [main]


------------------------------------------------------
  pgscatalog/pgsc_calc v2.0.0-beta.1-g0f33b4c
------------------------------------------------------
Core Nextflow options
  revision          : main
  runName           : magical_mirzakhani
  containerEngine   : docker
  launchDir         : /home/Software/pgsc_calc
  workDir           : /home/Software/pgsc_calc/work
  projectDir        : /root/.nextflow/assets/pgscatalog/pgsc_calc
  userName          : root
  profile           : docker
  configFiles       : 

Input/output options
  input             : assets/examples/PGS000872_samplesheet.csv
  pgs_id            : PGS000872
  outdir            : /home/Software/pgsc_calc/results

Reference options
  ref_samplesheet   : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
  ld_grch37         : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
  ld_grch38         : /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
  ancestry_checksums: /root/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt

Compatibility options
  target_build      : GRCh37

Matching options
  keep_ambiguous    : true
  min_overlap       : 0.5

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use pgscatalog/pgsc_calc for your analysis please cite:

* The Polygenic Score Catalog
  https://doi.org/10.1101/2024.05.29.24307783
  https://doi.org/10.1038/s41588-021-00783-5

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/pgscatalog/pgsc_calc/blob/main/CITATIONS.md

executor >  local (6)
[2e/641b12] process > PGSCATALOG_PGSCCALC:PGSCCALC:DOWNLOAD_SCOREFILES ([pgs_id:PGS000872, pgp_id:, trait_efo:]) [100%] 1 of 1 ✔
[69/acb639] process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)                            [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM                             -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR                            -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (TEST2 chromosome 22)              [100%] 3 of 3, stored: 3 ✔
[6a/c62863] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (TEST2 chromosome 22)                    [100%] 3 of 3 ✔
[5a/df9347] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (TEST2)                                   [  0%] 0 of 1
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE                                      -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE                                   -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT                                           -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS                                          -
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (2)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (1)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (3)
executor >  local (6)
[2e/641b12] process > PGSCATALOG_PGSCCALC:PGSCCALC:DOWNLOAD_SCOREFILES ([pgs_id:PGS000872, pgp_id:, trait_efo:]) [100%] 1 of 1 ✔
[69/acb639] process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)                            [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM                             -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR                            -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (TEST2 chromosome 22)              [100%] 3 of 3, stored: 3 ✔
[6a/c62863] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (TEST2 chromosome 22)                    [100%] 3 of 3 ✔
[5a/df9347] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (TEST2)                                   [100%] 1 of 1, failed: 1 ✘
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE                                      -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE                                   -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT                                           -
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS                                          -
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (2)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (1)
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (3)
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (TEST2)'

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (TEST2)` terminated with an error exit status (1)

Command executed:

  export POLARS_MAX_THREADS=2
  
  pgscatalog-matchmerge                          --dataset TEST2             --scorefile scorefiles.txt.gz             --matches *.ipc.zst             --min_overlap 0.5             --keep_ambiguous                          --outdir $PWD             --split                          -v
  
  cat <<-END_VERSIONS > versions.yml
  MATCH_COMBINE:
      pgscatalog.match: $(echo $(python -c 'import pgscatalog.match; print(pgscatalog.match.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  pgscatalog.match.cli.merge_cli: 2024-07-31 12:34:46 DEBUG    Verbose logging enabled
  pgscatalog.match.cli.merge_cli: 2024-07-31 12:34:46 INFO     --cleanup set (default), temporary files will be deleted
  pgscatalog.match.lib.scoringfileframe: 2024-07-31 12:34:46 DEBUG    Converting ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) to feather format
  pgscatalog.match.lib.scoringfileframe: 2024-07-31 12:34:46 DEBUG    ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) feather conversion complete
  pgscatalog.match.lib._match.preprocess: 2024-07-31 12:34:46 DEBUG    Complementing column effect_allele
  pgscatalog.match.lib._match.preprocess: 2024-07-31 12:34:46 DEBUG    Complementing column other_allele
  pgscatalog.match.lib._match.label: 2024-07-31 12:34:46 DEBUG    Labelling best match type (refalt > altref > ...)
  pgscatalog.match.lib._match.label: 2024-07-31 12:34:46 DEBUG    Labelling duplicated best match: keeping first instance as best_match = True
  pgscatalog.match.lib._match.label: 2024-07-31 12:34:46 DEBUG    Labelling multiple scoring file lines (accession/row_nr) that best_match to the same variant
  pgscatalog.match.lib._match.label: 2024-07-31 12:34:46 DEBUG    Labelling all duplicates with exclude flag
  pgscatalog.match.lib._match.label: 2024-07-31 12:34:46 DEBUG    Labelling ambiguous variants
  pgscatalog.match.lib._match.preprocess: 2024-07-31 12:34:46 DEBUG    Complementing column REF
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-matchmerge", line 8, in <module>
      sys.exit(run_merge())
               ^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/merge_cli.py", line 70, in run_merge
      matchdf = write_matches(matchresults=matchresults, score_df=score_df)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/_write.py", line 33, in write_matches
      _ = matchresults.write_scorefiles(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/matchresult.py", line 267, in write_scorefiles
      _ = self.label(**kwargs)  # self.df gets updated
          ^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/matchresult.py", line 223, in label
      df = self.df.pipe(
           ^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 678, in pipe
      return function(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_match/label.py", line 42, in label_matches
      .pipe(_label_biallelic_ambiguous, remove_ambiguous)
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 678, in pipe
      return function(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_match/label.py", line 262, in _label_biallelic_ambiguous
      ambig.with_column(pl.lit(False).alias("exclude_ambiguous"))
      ^^^^^^^^^^^^^^^^^
  AttributeError: 'LazyFrame' object has no attribute 'with_column'. Did you mean: 'with_columns'?

Work dir:
  /home/Software/pgsc_calc/work/5a/df93479909843e85d6b6510e0192f2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: Matching subworkflow failed

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No scores calculated!

 -- Check '.nextflow.log' file for details

Relevant files

pgsc_calc.zip

System information

pgscatalog/pgsc_calc v2.0.0-beta.1-g0f33b4c

@smlmbrt
Member

smlmbrt commented Jul 31, 2024

Hi @ijorge24, thanks for the bug report! We'll make a fix.

RE:

Additionally, I have a question: Why is the '--keep_ambiguous' option set to false by default rather than true?

The reason we don't include ambiguous variants (A/T & C/G SNPs) by default is that the scores and the genotypes will often be on different builds, and including these SNPs would involve tallying incorrect dosages if there has been a strand flip across builds. Even on the same build you can get improperly strand-oriented data. To some extent this can be fixed if you know the allele frequencies (but most scores don't come with this information, so we have omitted it). Basically, we set the default to exclude ambiguous matches as a conservative measure, but still allow users to customise the matching based on their own judgment.
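
To make the ambiguity concrete, here is a minimal sketch in plain Python. It is not the pgscatalog-match implementation, and the names `COMPLEMENT`, `is_strand_ambiguous` and `flip_strand` are hypothetical, chosen only for illustration. It shows that flipping the strand of an A/T or C/G SNP gives back the same pair of alleles, so a strand flip cannot be told apart from a simple ref/alt swap, whereas for other SNPs the flip is visible.

```python
# Complement of each base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(ref: str, alt: str) -> bool:
    """Return True for biallelic SNPs whose two alleles are complements (A/T, C/G)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        return False  # indels and multi-base alleles are out of scope for this sketch
    return COMPLEMENT.get(ref) == alt


def flip_strand(ref: str, alt: str) -> tuple[str, str]:
    """Report the same SNP as it would be read from the opposite strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


# A/T and C/G keep the same allele set after a flip; A/G and C/T do not.
for ref, alt in [("A", "T"), ("C", "G"), ("A", "G"), ("C", "T")]:
    flipped = flip_strand(ref, alt)
    same_alleles = set(flipped) == {ref, alt}
    print(ref, alt, is_strand_ambiguous(ref, alt), flipped, same_alleles)
```

For an A/G SNP the flipped alleles are T/C, so a mismatch in strand can be detected and corrected. For an A/T SNP the flipped alleles are T/A, which looks like a valid match with the effect allele swapped, and that is where incorrect dosages could be tallied without allele frequency information.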

@smlmbrt smlmbrt added the user-query User queries & requests label Jul 31, 2024
@smlmbrt smlmbrt added this to the v.2.0.0-beta.3 milestone Aug 2, 2024
@ijorge24
Author

ijorge24 commented Aug 5, 2024

Hi again! I wanted to notify you about an issue I encountered while running the code with the '--keep_ambiguous' option: nextflow run pgscatalog/pgsc_calc -profile docker --input assets/examples/PGS000872_samplesheet.csv --pgs_id PGS000872 --target_build GRCh37 --min_overlap 0.5
I observed that the error is still there. After examining the code, I noticed that the version of the 'pgscatalog-utils' image being used is outdated. I resolved the issue by updating the files
https://github.com/PGScatalog/pgsc_calc/blob/main/conf/modules.config https://github.com/PGScatalog/pgsc_calc/blob/main/environments/pgscatalog_utils/environment.yml
changing 'pgscatalog-utils=1.2.0' to 'pgscatalog-utils=1.3.0'.
Thanks for updating the code so quickly!

@nebfield
Member

nebfield commented Aug 5, 2024

Thanks for letting us know! Sometimes the pgscatalog python packages and pgsc_calc releases don't synchronise totally 😄

We were doing some integration tests with the calculator and spotted a problem with pgscatalog-match, so we need to do another patch release before updating it. You might experience some problems using pgscatalog-utils=1.3.0.

When the problems are fixed and everything is released we'll let you know and close this issue 😅

@nebfield nebfield linked a pull request Aug 6, 2024 that will close this issue
@nebfield nebfield mentioned this issue Aug 6, 2024