input files format #47

afadda91 · 2017-11-15T11:46:21Z

Hi,
It's not clear to me what exact format the software needs. I followed the format given in the examples, but I get this error when I try to run:
./bin/pypop.py -c /Users/afadda/Desktop/QG128.ini /Users/afadda/Desktop/QG128.pop
Traceback (most recent call last):
  File "./bin/pypop.py", line 316, in <module>
    config = getConfigInstance(configFilename, altpath, usage_message)
  File "/Applications/pypop/bin/../PyPop/Main.py", line 64, in getConfigInstance
    config.read(configFilename)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 305, in read
    self._read(fp, filename)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 546, in _read
    raise e
ConfigParser.ParsingError: File contains parsing errors: /Users/afadda/Desktop/QG128.ini
[line 15]: '*A_1\n'
[line 16]: '*A_2\n'
[line 17]: '*B_1\n'
[line 18]: '*B_2\n'
[line 19]: '*C_1\n'
[line 20]: '*C_2\n'
[line 21]: '*DQA1_1\n'
[line 22]: '*DQA1_2\n'
[line 23]: '*DQB1_1\n'
[line 24]: '*DQB1_2\n'
[line 25]: '*DRB1_1\n'
[line 26]: '*DRB1_2\n'
[line 27]: '*DPA1_1\n'
[line 28]: '*DPA1_2\n'
[line 29]: '*DPB1_1\n'
[line 30]: '*DPB1_2\n'
[line 31]: '*DRB3_1\n'
[line 32]: '*DRB3_2\n'
[line 33]: '*DRB4_1\n'
[line 34]: '*DRB4_2\n'

My .ini file looks like this:
;; comment out or change as desired
;; 1 = true, 0 = false
;; see config.ini in main distribution for detailed explanation of options

[General]
debug=0
outFilePrefixType=filename

[ParseGenotypeFile]
alleleDesignator=*
untypedAllele=****
;;fieldPairDesignator=_1:_2

validSampleFields=+id
*A_1
*A_2
*B_1
*B_2
*C_1
*C_2
*DQA1_1
*DQA1_2
*DQB1_1
*DQB1_2
*DRB1_1
*DRB1_2
*DPA1_1
*DPA1_2
*DPB1_1
*DPB1_2
*DRB3_1
*DRB3_2
*DRB4_1
*DRB4_2

[HardyWeinberg]
lumpBelow=5

[HardyWeinbergGuoThompson]
dememorizationSteps=2000
samplingNum=1000
samplingSize=1000

[HomozygosityEWSlatkinExact]
numReplicates=10000

[Emhaplofreq]
allPairwiseLD=1
allPairwiseLDWithPermu=0

Thanks,

sjmack · 2017-11-15T15:26:15Z

Try adding a leading space to every line after the first in the validSampleFields data block, as below.

validSampleFields=+populat
 id
 *a_1
 *a_2
 *c_1
 *c_2
 *b_1
 *b_2
 *dra_1
 *dra_2
 *drb1_1
 *drb1_2
 *dqa1_1
 *dqa1_2
 *dqb1_1
 *dqb1_2
 *dpa1_1
 *dpa1_2
 *dpb1_1
 *dpb1_2
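Why the leading space matters: the traceback above shows PyPop reading the .ini with Python's ConfigParser, which treats a line that begins with whitespace as a continuation of the previous option's value, and rejects an unindented line that has neither "=" nor ":". The sketch below reproduces both cases with Python 3's configparser on a trimmed-down excerpt (the issue itself ran Python 2.7's ConfigParser, which handles these lines the same way); the load and parses_ok helpers are illustrative only, not part of PyPop.

```python
import configparser

# Trimmed-down excerpts of the .ini in this issue (not the full file).
BROKEN = """\
[ParseGenotypeFile]
validSampleFields=+id
*A_1
*A_2
"""

FIXED = """\
[ParseGenotypeFile]
validSampleFields=+id
 *A_1
 *A_2
"""


def load(text):
    """Parse INI text with the standard-library ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def parses_ok(text):
    """Return True if the text parses, False on a ParsingError."""
    try:
        load(text)
    except configparser.ParsingError:
        return False
    return True


# An unindented "*A_1" has no "=" or ":", so it is neither an option nor a
# continuation line, and the parser rejects it (the error in this issue).
print(parses_ok(BROKEN))  # False

# Indented lines are folded into the previous value, one entry per line.
fields = load(FIXED).get("ParseGenotypeFile", "validSampleFields").split()
print(fields)  # ['+id', '*A_1', '*A_2']
```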

sjmack · 2017-11-15T15:30:44Z

The same applies to the validPopFields data block.

validPopFields=labcode
 method
 ethnic
 contin
 collect
 latit
 longit
 complex
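When a field list is long, generating the block instead of typing it can avoid a missed indent. A minimal sketch using a hypothetical format_multiline_option helper (not part of PyPop) that puts the first value on the "name=" line and indents the rest; the round-trip check assumes validPopFields sits in [ParseGenotypeFile] next to validSampleFields.

```python
import configparser


def format_multiline_option(name, values, indent=" "):
    """Render an INI option whose value spans several lines.

    values must be non-empty. The first value goes on the "name=" line and
    every later value is indented, so ConfigParser reads it as a
    continuation line.
    """
    first, *rest = values
    lines = [f"{name}={first}"]
    lines.extend(f"{indent}{value}" for value in rest)
    return "\n".join(lines)


POP_FIELDS = ["labcode", "method", "ethnic", "contin",
              "collect", "latit", "longit", "complex"]

block = format_multiline_option("validPopFields", POP_FIELDS)
print(block)

# Round trip: parse the block back and recover the same field names.
# Assumption: the option lives in [ParseGenotypeFile].
parser = configparser.ConfigParser()
parser.read_string("[ParseGenotypeFile]\n" + block + "\n")
assert parser.get("ParseGenotypeFile", "validPopFields").split() == POP_FIELDS
```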

afadda91 · 2017-11-16T11:28:54Z

Thanks. It worked!

afadda91 · 2017-11-16T12:53:40Z

Dear Alex, according to the results I got, allPairwiseLD yields LDs among alleles of the same gene, not between alleles of different genes. Is there a way to calculate LD between all alleles? Thanks, Abeer

sjmack · 2017-11-16T19:57:58Z

Hi @afadda91. Can you provide a copy of your .ini file, and possibly an example of the results you are getting? Thanks.

afadda91 · 2017-11-22T07:19:06Z

Hi Alex, I was wrong: with allPairwiseLDWithPermu=0 it does indeed perform an all-pairwise analysis. But now I'm getting "LOG: estimating haplotype frequencies for all two locus haplotypes" followed by "Abort trap: 6" when I try to estimate many haplotypes. It works with fewer. I'm attaching the .ini file. Thank you for your help. Abeer

sjmack · 2017-11-22T16:49:08Z

@afadda91, I don't see an attached file here, but the current build can estimate multi-locus haplotypes for at most 20 loci, particularly if you are using the [Emhaplofreq] settings, and the practical limit also depends on the value of 2N. Even then, the number of subjects can affect whether the EM method converges; see issue #26.

Can you paste the text of your .ini into your next reply, and let us know how many loci 'many' and 'fewer' represent?

afadda91 · 2017-11-23T07:49:32Z

Here is the .ini:

;; comment out or change as desired
;; 1 = true, 0 = false
;; see config.ini in main distribution for detailed explanation of options

[General]
debug=0
outFilePrefixType=filename

[ParseGenotypeFile]
alleleDesignator=*
untypedAllele=****
;;fieldPairDesignator=_1:_2

validSampleFields=+id
 *A_1
 *A_2
 *B_1
 *B_2
 *C_1
 *C_2
 *DQA1_1
 *DQA1_2
 *DQB1_1
 *DQB1_2
 *DRB1_1
 *DRB1_2
 *DPA1_1
 *DPA1_2
 *DPB1_1
 *DPB1_2
 *DRB3_1
 *DRB3_2
 *DRB4_1
 *DRB4_2

[HardyWeinberg]
lumpBelow=5

[HardyWeinbergGuoThompson]
dememorizationSteps=2000
samplingNum=1000
samplingSize=1000

[HomozygosityEWSlatkinExact]
numReplicates=10000

[Emhaplofreq]
lociToEstHaplo=a:b:c,a:b:drb1,a:b:drb3,a:b:drb4,drb1:dqa1:dpb1,drb3:dqb1:dpb1,drb4:dqb1:dpb1,a:b:c:drb1:dqa1,a:b:c:drb1:dqb1,a:b:c:drb3:dqa1,a:b:c:drb3:dqb1,a:b:c,a:b:c:drb4:dqb1
;;lociToEstHaplo=a:b:c,a:b:drb1,a:b:drb3,a:b:drb4
;;lociToEstHaplo=a:b:drb1,a:b:c,drb1:dqa1:dpb1,drb1:dqb1:dpb1
allPairwiseLD=1
allPairwiseLDWithPermu=0
;;to calculate significance of LD:
;;allPairwiseLDWithPermu=1000

Regards,
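To relate this configuration to the 20-locus limit mentioned above, one can count the loci in each lociToEstHaplo group, plus the two-locus haplotypes that allPairwiseLD=1 adds (the step named in the "estimating haplotype frequencies for all two locus haplotypes" log line). A rough sketch: the 10-locus list is read off validSampleFields, haplotype_groups is a hypothetical helper, and treating 20 as a hard cap is an assumption.

```python
from collections import Counter
from itertools import combinations

# lociToEstHaplo value from the .ini above: groups are separated by commas,
# loci within a group by colons.
LOCI_TO_EST_HAPLO = (
    "a:b:c,a:b:drb1,a:b:drb3,a:b:drb4,drb1:dqa1:dpb1,drb3:dqb1:dpb1,"
    "drb4:dqb1:dpb1,a:b:c:drb1:dqa1,a:b:c:drb1:dqb1,a:b:c:drb3:dqa1,"
    "a:b:c:drb3:dqb1,a:b:c,a:b:c:drb4:dqb1"
)
# The 10 loci declared in validSampleFields.
LOCI = ["a", "b", "c", "dqa1", "dqb1", "drb1", "dpa1", "dpb1", "drb3", "drb4"]
# Limit quoted in this thread; treating it as a hard cap is an assumption.
MAX_LOCI = 20


def haplotype_groups(spec):
    """Split a lociToEstHaplo value into one tuple of loci per group.

    Empty entries (e.g. from a stray comma) are skipped by this helper.
    """
    return [tuple(group.split(":")) for group in spec.split(",") if group]


groups = haplotype_groups(LOCI_TO_EST_HAPLO)
largest = max(len(group) for group in groups)
over_limit = [group for group in groups if len(group) > MAX_LOCI]
repeated = [group for group, n in Counter(groups).items() if n > 1]
# allPairwiseLD=1 adds every two-locus haplotype: C(10, 2) pairs.
pairs = list(combinations(LOCI, 2))

print(len(groups), "groups; largest has", largest, "loci")  # 13 groups; largest has 5 loci
print("over the limit:", over_limit)                        # over the limit: []
print("repeated groups:", repeated)                         # repeated groups: [('a', 'b', 'c')]
print(len(pairs), "two-locus haplotypes")                   # 45 two-locus haplotypes
```

For this file every requested group has at most 5 loci, and a:b:c is listed twice.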

afadda91 · 2017-12-04T09:27:57Z

Hi, I used the [Haplostats] function with lociToEstHaplo=*. It only reports haplotypes for at most 4 genes, and when I try to use lociToEstHaplo=,a:b:c:drb1 it fails. What should I do? Abeer

sjmack · 2017-12-05T20:14:20Z

@afadda91, I'd recommend against using [Haplostats] for now. The Haplostats functions are still under development, and have some issues (e.g., #38, #39, #41). If you don't have any missing data (****), Haplostats is useful for calculating the ALD measures; just include

[Haplostats]
allPairwise=1

in your .ini.

For your earlier query, I don't have any problems using your .ini and a sample file of my own creation. I suspect that there may be an issue with sample size; can you let me know how many subjects are in your .pop file? Attached are the .ini and .pop files and the resulting *out.txt and *out.xml files (remove the .txt suffix from the .ini, .pop and .xml files).

afadda.ini
sample_afadda.pop
sample_afadda-out.txt
sample_afadda-out.xml

If you are comfortable sharing your .pop file with me (@sjmack, not @alexlancaster) directly, I will email you and can take a look.

sjmack · 2017-12-07T00:30:21Z

@afadda91, I was able to fix your datafile with two sets of changes.

First, you should make sure that you do not include locus name prefixes (e.g., "A*") in your HLA data. So "DRB1*03:01:01" should be recorded as "03:01:01". See section 2.2 of the PyPop user guide for more examples.
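If the data come from a tool that writes full allele names, the prefixes can be removed programmatically. A sketch with a hypothetical strip_locus_prefix helper (not part of PyPop); it assumes each allele cell looks like LOCUS*name, for example DRB1*03:01:01, and deliberately leaves the untyped placeholder **** unchanged.

```python
import re

# Matches a leading locus name ending in "*", e.g. "A*", "DRB1*" or
# "HLA-DRB1*". At least one non-"*" character must precede the "*", so an
# untyped placeholder such as "****" is not touched.
_LOCUS_PREFIX = re.compile(r"^[^*\s]+\*")


def strip_locus_prefix(allele):
    """Return the allele name without its "LOCUS*" prefix."""
    return _LOCUS_PREFIX.sub("", allele.strip(), count=1)


for raw in ["DRB1*03:01:01", "A*02:01", "03:01:01", "****"]:
    print(repr(raw), "->", repr(strip_locus_prefix(raw)))
```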

Second, your .pop file must not contain any blank lines at the end of the file. See "current limitations of PyPop" in section 2.2.2.
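A small sketch for the second fix, assuming the .pop file is plain text with tab-delimited rows; strip_trailing_blank_lines and clean_pop_file are hypothetical helpers, not PyPop functions. It removes whole blank lines only, so tabs at the end of the last data row (which may mark empty fields) survive, and it keeps a single newline after the last row, assuming that newline is not itself treated as a blank line.

```python
from pathlib import Path


def strip_trailing_blank_lines(text):
    """Drop empty or whitespace-only lines from the end of the text.

    Only whole lines are removed; the last data row is kept exactly as it
    is and terminated by a single newline.
    """
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def clean_pop_file(path):
    """Rewrite a .pop file in place without trailing blank lines."""
    p = Path(path)
    p.write_text(strip_trailing_blank_lines(p.read_text()))


print(repr(strip_trailing_blank_lines("id\tA_1\n1\t01:01\n\n\n")))
```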

Once I removed the locus prefixes and the trailing blank line, the analysis ran fine.

I'm not sure how you are managing your data, but if you are using MS Excel, be careful to keep your HLA data without locus prefixes in Text-formatted cells; if the cells are in General format, Excel will interpret HLA allele names as times.

For example, "03:01:01" will be converted to "3:01:01 AM", and "02:61:01" will be converted to "0.125706018518519", when entered into General format cells. In the latter case, Excel interprets "02:61:01" to mean "two hours, 61 minutes, and 1 second", which equals 0.125706018518519 days.
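The arithmetic behind that number: Excel stores a time as a fraction of a 24-hour day. Two hours, 61 minutes and 1 second is 7,200 + 3,660 + 1 = 10,861 seconds, and 10,861 / 86,400 ≈ 0.125706018518519. The same 10,861 seconds is also 3 hours, 1 minute and 1 second, so "02:61:01" and "03:01:01" denote the same duration. A quick check in Python, where as_excel_day_fraction is an illustrative helper:

```python
from datetime import timedelta


def as_excel_day_fraction(text):
    """Read "hh:mm:ss" as a duration and express it as a fraction of a day."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    # timedelta normalises 61 minutes to 1 hour 1 minute.
    duration = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return duration / timedelta(days=1)


print(round(as_excel_day_fraction("02:61:01"), 15))  # 0.125706018518519
print(as_excel_day_fraction("02:61:01") == as_excel_day_fraction("03:01:01"))  # True
```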

If an allele has an expression variant suffix, or a G or P group suffix, Excel will automatically treat it as text.

Good luck. I think this addresses the issues raised in this thread, so I am going to close it. If you have additional problems or questions, you can open a new issue.

sjmack self-assigned this Nov 22, 2017

Repository owner deleted a comment from afadda91 Dec 5, 2017

sjmack closed this as completed Dec 7, 2017

alexlancaster added the notabug label (no underlying issue needs fixing: e.g., followed out-of-date instructions, wrong package, etc.) Jul 30, 2023
