input files format #47
Comments
Try adding leading whitespace (a space or a tab) to every line after the first in the validSampleFields data block, so that the parser reads those lines as continuations of the value.
The same applies to the validPopFields data block.
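For illustration, here is a minimal sketch of why the indentation matters. It uses Python 3's standard `configparser` module rather than PyPop itself (the traceback in this thread comes from the Python 2 `ConfigParser`, which handles continuation lines the same way). The shortened field list and the helper names `read_fields` and `parses_cleanly` are assumptions made for the example.

```python
import configparser

# Continuation lines start in column 0, so the parser sees "*A_1" as a
# malformed option line (no "=" or ":") and reports a parsing error.
BROKEN = """\
[ParseGenotypeFile]
validSampleFields=+id
*A_1
*A_2
"""

# Same content, but every line after the first is indented, which marks
# it as a continuation of the validSampleFields value.
FIXED = """\
[ParseGenotypeFile]
validSampleFields=+id
    *A_1
    *A_2
"""


def read_fields(ini_text):
    """Parse ini_text and return validSampleFields as a list of names."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    value = parser.get("ParseGenotypeFile", "validSampleFields")
    return value.split()


def parses_cleanly(ini_text):
    """Return True if ini_text parses without a ParsingError."""
    try:
        read_fields(ini_text)
    except configparser.ParsingError:
        return False
    return True


if __name__ == "__main__":
    print(parses_cleanly(BROKEN))
    print(read_fields(FIXED))
```

Running the same check on your own .ini before invoking PyPop should point at the same line numbers that the traceback reports.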
|
Thanks. It worked!
|
Dear Alex,
According to the results I got, allPairwiseLD yields LD values among alleles of the same gene, not between alleles of different genes.
Is there a way to calculate LD between all alleles?
Thanks,
Abeer
|
Hi @afadda91. Can you provide a copy of your .ini file, and possibly an example of the results you are getting? Thanks. |
Hi Alex,
I was wrong. allPairwiseLDWithPermu=0 does indeed perform an all-pairwise analysis.
But now I’m getting "LOG: estimating haplotype frequencies for all two locus haplotypes,Abort trap: 6"
when I try to estimate many haplotypes. It works with fewer.
I’m attaching the .ini file.
Thank you for your help.
Abeer
|
@afadda91, I don't see an attached file here, but the current build is limited to 20 loci for multi-locus haplotype estimation (depending on the value of 2N), particularly if you are using the [Emhaplofreq] settings. Even then, the number of subjects can affect whether the EM method converges; see issue #26. Can you paste the text of your .ini into your next reply, and let us know how many loci 'many' and 'fewer' represent? |
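One quick way to answer the "how many loci" question is to count the loci in each haplotype group of a lociToEstHaplo value. The sketch below assumes the format used in PyPop .ini files (comma-separated groups of colon-separated locus names); the function name `summarize_loci_groups` and the sample string are hypothetical and are not part of PyPop.

```python
def summarize_loci_groups(loci_to_est_haplo):
    """Count loci per group and list groups that are given more than once.

    Returns (counts, duplicates): counts is a list of (group, n_loci)
    pairs in input order, duplicates is a list of repeated groups.
    A wildcard entry such as "*" is counted as a single item here.
    """
    groups = [g.strip() for g in loci_to_est_haplo.split(",") if g.strip()]
    counts = [(g, len(g.split(":"))) for g in groups]

    seen = set()
    duplicates = []
    for g in groups:
        if g in seen and g not in duplicates:
            duplicates.append(g)
        seen.add(g)
    return counts, duplicates


if __name__ == "__main__":
    # Illustrative value only; paste the real one from your .ini.
    example = "a:b:c,a:b:drb1,a:b:c:drb1:dqa1,a:b:c"
    counts, duplicates = summarize_loci_groups(example)
    for group, n_loci in counts:
        print(group, n_loci)
    print("repeated groups:", duplicates)
```

Reporting the largest count together with the number of subjects in the .pop file gives the maintainers the two figures asked for above.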
Here is the .ini
;; comment out or change as desired
;; 1 = true, 0 = false
;; see config.ini in main distribution for detailed explanation of options
[General]
debug=0
outFilePrefixType=filename
[ParseGenotypeFile]
alleleDesignator=*
untypedAllele=****
;;fieldPairDesignator=_1:_2
validSampleFields=+id
*A_1
*A_2
*B_1
*B_2
*C_1
*C_2
*DQA1_1
*DQA1_2
*DQB1_1
*DQB1_2
*DRB1_1
*DRB1_2
*DPA1_1
*DPA1_2
*DPB1_1
*DPB1_2
*DRB3_1
*DRB3_2
*DRB4_1
*DRB4_2
[HardyWeinberg]
lumpBelow=5
[HardyWeinbergGuoThompson]
dememorizationSteps=2000
samplingNum=1000
samplingSize=1000
[HomozygosityEWSlatkinExact]
numReplicates=10000
[Emhaplofreq]
lociToEstHaplo=a:b:c,a:b:drb1,a:b:drb3,a:b:drb4,drb1:dqa1:dpb1,drb3:dqb1:dpb1,drb4:dqb1:dpb1,a:b:c:drb1:dqa1,a:b:c:drb1:dqb1,a:b:c:drb3:dqa1,a:b:c:drb3:dqb1,a:b:c,a:b:c:drb4:dqb1
;;lociToEstHaplo=a:b:c,a:b:drb1,a:b:drb3,a:b:drb4
;;lociToEstHaplo=a:b:drb1,a:b:c,drb1:dqa1:dpb1,drb1:dqb1:dpb1
allPairwiseLD=1
allPairwiseLDWithPermu=0
;;to calculate significance of LD:
;;allPairwiseLDWithPermu=1000
Regards,
|
Hi,
I used the [Haplostats] function with lociToEstHaplo=*, but it only reports haplotypes for a maximum of 4 genes.
When I try to use lociToEstHaplo=,a:b:c:drb1 it fails.
What should I do?
Abeer
|
@afadda91, I'd recommend against using [Haplostats] for now. The Haplostats functions are still under development, and have some issues (e.g., #38, #39, #41). If you don't have any missing data (
in your .ini. For your earlier query, I don't have any problems using your .ini and a sample file of my own creation. I suspect that there may be an issue with sample size; can you let me know how many subjects are in your .pop file? Attached are the .ini and .pop files and the resulting *out.txt and *out.xml files (remove the .txt suffix for the .ini, .pop and .xml files). afadda.ini If you feel comfortable sharing your .pop file with me (@sjmack, not @alexlancaster) directly I will email you, and I can take a look. |
@afadda91, I was able to fix your datafile with two sets of changes.
First, you should make sure that you do not include locus name prefixes (e.g., "A*") in your HLA data. So "DRB1*03:01:01" should be recorded as "03:01:01". See section 2.2 of the PyPop user guide for more examples.
Second, your .pop file must not contain any trailing blank lines at the end of the file. See "current limitations of PyPop" in section 2.2.2.
Once I removed the locus prefixes and the trailing blank line, the analysis ran fine.
I'm not sure how you are managing your data, but please be aware that if you are using MS Excel, you will need to be careful to make sure that your HLA data without locus prefixes are kept in text formatted cells; if the cells are in General format, Excel will interpret HLA allele names as times. For example, "03:01:01" will be converted to "3:01:01 AM", and "02:61:01" will be converted to "0.125706018518519", when entered into General format cells. In the latter case, Excel interprets "02:61:01" to mean "two hours, 61 minutes, and 1 second", which equals 0.125706018518519 days. If an allele has an expression variant suffix, or a G or P group suffix, Excel will automatically treat it as text.
Good luck. I think that this addresses your issues with this thread, so I am going to close it. If you have additional problems or questions, you can open a new issue. |
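As a rough sketch of the two points above, the snippet below strips a locus prefix from an allele name and reproduces the day-fraction arithmetic behind the Excel example. The helper names `strip_locus_prefix` and `excel_day_fraction` are hypothetical and not part of PyPop, and the prefix pattern (letters and digits followed by "*") is an assumption based on the names quoted in this thread.

```python
import re

# Assumed prefix shape: a locus name such as "A" or "DRB1" followed by "*".
# The untyped-allele marker "****" has no locus name, so it is left alone.
LOCUS_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9]*\*")


def strip_locus_prefix(allele):
    """Return the allele name without a leading locus prefix."""
    return LOCUS_PREFIX.sub("", allele)


def excel_day_fraction(text):
    """Read "HH:MM:SS" as a duration and return it as a fraction of a day.

    This only mirrors the arithmetic described above:
    (hours * 3600 + minutes * 60 + seconds) / 86400 seconds per day.
    """
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / 86400


if __name__ == "__main__":
    print(strip_locus_prefix("DRB1*03:01:01"))
    print(strip_locus_prefix("****"))
    # 2 h + 61 min + 1 s = 10861 s, and 10861 / 86400 is about 0.1257 days.
    print(excel_day_fraction("02:61:01"))
```

Cleaning the allele columns in a script like this, rather than in a spreadsheet, avoids the General-format conversion altogether.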
Hi,
It's not clear to me what exact format the software requires. I followed the format given in the examples, but I get this error when I try to run:
./bin/pypop.py -c /Users/afadda/Desktop/QG128.ini /Users/afadda/Desktop/QG128.pop
Traceback (most recent call last):
File "./bin/pypop.py", line 316, in <module>
config = getConfigInstance(configFilename, altpath, usage_message)
File "/Applications/pypop/bin/../PyPop/Main.py", line 64, in getConfigInstance
config.read(configFilename)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 305, in read
self._read(fp, filename)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 546, in _read
raise e
ConfigParser.ParsingError: File contains parsing errors: /Users/afadda/Desktop/QG128.ini
[line 15]: '*A_1\n'
[line 16]: '*A_2\n'
[line 17]: '*B_1\n'
[line 18]: '*B_2\n'
[line 19]: '*C_1\n'
[line 20]: '*C_2\n'
[line 21]: '*DQA1_1\n'
[line 22]: '*DQA1_2\n'
[line 23]: '*DQB1_1\n'
[line 24]: '*DQB1_2\n'
[line 25]: '*DRB1_1\n'
[line 26]: '*DRB1_2\n'
[line 27]: '*DPA1_1\n'
[line 28]: '*DPA1_2\n'
[line 29]: '*DPB1_1\n'
[line 30]: '*DPB1_2\n'
[line 31]: '*DRB3_1\n'
[line 32]: '*DRB3_2\n'
[line 33]: '*DRB4_1\n'
[line 34]: '*DRB4_2\n'
My .ini file looks like this:
;; comment out or change as desired
;; 1 = true, 0 = false
;; see config.ini in main distribution for detailed explanation of options
[General]
debug=0
outFilePrefixType=filename
[ParseGenotypeFile]
alleleDesignator=*
untypedAllele=****
;;fieldPairDesignator=_1:_2
validSampleFields=+id
*A_1
*A_2
*B_1
*B_2
*C_1
*C_2
*DQA1_1
*DQA1_2
*DQB1_1
*DQB1_2
*DRB1_1
*DRB1_2
*DPA1_1
*DPA1_2
*DPB1_1
*DPB1_2
*DRB3_1
*DRB3_2
*DRB4_1
*DRB4_2
[HardyWeinberg]
lumpBelow=5
[HardyWeinbergGuoThompson]
dememorizationSteps=2000
samplingNum=1000
samplingSize=1000
[HomozygosityEWSlatkinExact]
numReplicates=10000
[Emhaplofreq]
allPairwiseLD=1
allPairwiseLDWithPermu=0
Thanks,