| 1 | +How to combine scoring files from the PGS Catalog |
| 2 | +================================================= |
| 3 | + |
| 4 | +``pgscatalog-combine`` is a CLI application that makes it easy to combine scoring files into a standardised output. |
| 5 | + |
| 6 | +The process involves: |
| 7 | + |
| 8 | +* extracting important fields from scoring files |
| 9 | +* doing some quality control checks |
| 10 | +* optionally lifting over variants to a consistent genome build |
| 11 | +* writing a long format / melted output file |
| 12 | + |
| 13 | +Input scoring files must follow PGS Catalog standards. The output file is useful for
| 14 | +data science tasks, such as matching variants between a scoring file and a target
| 15 | +genome.
| 16 | + |
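To illustrate what a long format (melted) table means, here is a minimal Python sketch. The column names (``variant_id``, ``accession``, ``effect_weight``) and all values are hypothetical, chosen for illustration only; they are not the actual output schema of ``pgscatalog-combine``.

```python
# Hypothetical wide table: one row per variant, one column per score.
# Names and values are made up for illustration.
wide = [
    {"variant_id": "1:100:A:G", "PGS000001": 0.12, "PGS001229": None},
    {"variant_id": "2:200:C:T", "PGS000001": -0.05, "PGS001229": 0.30},
]


def melt(rows, id_column="variant_id"):
    """Turn a wide table into long format: one row per (variant, score) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value is None:
                continue  # skip the identifier column and missing weights
            long_rows.append(
                {id_column: row[id_column], "accession": column, "effect_weight": value}
            )
    return long_rows


long_table = melt(wide)
for record in long_table:
    print(record)
```

In long format every row describes a single variant in a single score, so adding more scores adds rows rather than columns.
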
| 17 | +Installation |
| 18 | +------------ |
| 19 | + |
| 20 | +:: |
| 21 | + |
| 22 | + $ pip install pgscatalog-combine |
| 23 | + |
| 24 | +Usage |
| 25 | +----- |
| 26 | + |
| 27 | +Combining PGS Catalog scoring files |
| 28 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 29 | + |
| 30 | +.. tip:: It's easiest to get started by downloading scoring files in the same genome build: :doc:`download` |
| 31 | + |
| 32 | +:: |
| 33 | + |
| 34 | + $ pgscatalog-combine -s PGS000001_hmPOS_GRCh38.txt.gz PGS001229_hmPOS_GRCh38.txt.gz -t GRCh38 -o combined.txt
| 35 | + |
| 36 | +.. note:: If you're combining lots of files, you can compress the output automatically by giving the output file a ``.gz`` extension: ``-o combined.txt.gz``
| 37 | + |
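If the output is compressed, downstream scripts need to open it accordingly. The following is a minimal Python sketch that reads a file whether or not it ends in ``.gz``. It assumes the file is tab-delimited with a header row, which is not stated here; the helper name ``read_rows`` and the demo columns are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_rows(path):
    """Yield rows as dicts from a tab-delimited file, gzip-compressed or not."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Assumption: tab-delimited text with a header row.
        yield from csv.DictReader(handle, delimiter="\t")


# Demo with a tiny stand-in file (not real pgscatalog-combine output).
demo_path = os.path.join(tempfile.mkdtemp(), "combined.txt.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    handle.write("variant_id\teffect_weight\n1:100:A:G\t0.12\n")

rows = list(read_rows(demo_path))
print(rows)
```

Because ``read_rows`` is a generator, large files can be processed row by row instead of being loaded into memory at once.
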
| 38 | +Lifting over scoring files |
| 39 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 40 | + |
| 41 | +It's possible to combine scoring files with different genome builds using liftover. |
| 42 | + |
| 43 | +.. danger:: Only use liftover when combining PGS Catalog scoring files with custom scoring files. The PGS Catalog already provides harmonised scoring files, so liftover is unnecessary when every input comes from the Catalog.
| 44 | + |
| 45 | +First, download chain files from UCSC: |
| 46 | + |
| 47 | +* `hg19ToHg38.over.chain.gz`_ |
| 48 | +* `hg38ToHg19.over.chain.gz`_ |
| 49 | + |
| 50 | +.. _hg19ToHg38.over.chain.gz: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/
| 51 | +.. _hg38ToHg19.over.chain.gz: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/
| 52 | + |
| 53 | +Then copy them into a directory (e.g. ``my_chain_dir/``).
| 54 | + |
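Before running a liftover, it can help to confirm that the chain directory is complete. The following is a minimal Python sketch that checks for the two chain files named above. The helper ``missing_chain_files`` is hypothetical, and whether the tool needs both files for a given run is not stated here.

```python
from pathlib import Path

# Chain file names taken from the UCSC links above.
EXPECTED_CHAIN_FILES = ("hg19ToHg38.over.chain.gz", "hg38ToHg19.over.chain.gz")


def missing_chain_files(chain_dir):
    """Return the expected chain files that are absent from chain_dir."""
    directory = Path(chain_dir)
    return [name for name in EXPECTED_CHAIN_FILES if not (directory / name).is_file()]


missing = missing_chain_files("my_chain_dir")
if missing:
    print("Missing chain files:", ", ".join(missing))
else:
    print("All expected chain files are present.")
```
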
| 55 | +Suppose you have a custom scoring file in GRCh37 (``my_scorefile_grch37.txt.gz``) and want to combine it with a PGS Catalog scoring file in GRCh38:
| 56 | + |
| 57 | +:: |
| 58 | + |
| 59 | + $ pgscatalog-combine -s PGS000001_hmPOS_GRCh38.txt.gz my_scorefile_grch37.txt.gz \ |
| 60 | + --chain_dir my_chain_dir/ \ |
| 61 | + -t GRCh38 \ |
| 62 | + -o combined.txt |
| 63 | + |
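When the same call needs to be scripted, the command can be assembled as an argument list. The following is a minimal Python sketch that uses only the flags shown above (``-s``, ``--chain_dir``, ``-t``, ``-o``). The helper ``build_combine_command`` is hypothetical and not part of ``pgscatalog-combine``.

```python
import shlex


def build_combine_command(scorefiles, target_build, out_path, chain_dir=None):
    """Assemble the argument list for a pgscatalog-combine call."""
    command = ["pgscatalog-combine", "-s", *scorefiles]
    if chain_dir is not None:
        # Only relevant when the inputs are in different genome builds.
        command += ["--chain_dir", chain_dir]
    command += ["-t", target_build, "-o", out_path]
    return command


command = build_combine_command(
    ["PGS000001_hmPOS_GRCh38.txt.gz", "my_scorefile_grch37.txt.gz"],
    target_build="GRCh38",
    out_path="combined.txt",
    chain_dir="my_chain_dir/",
)
print(shlex.join(command))
```

On a machine where the tool is installed, the list can be passed to ``subprocess.run(command, check=True)``. Passing a list avoids shell quoting problems with unusual file names.
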
| 64 | +Help |
| 65 | +---- |
| 66 | + |
| 67 | +:: |
| 68 | + |
| 69 | + $ pgscatalog-combine --help |