input files format #47

Closed
afadda91 opened this issue Nov 15, 2017 · 11 comments

afadda91 commented Nov 15, 2017

Hi,
It's not clear to me what exact format the software requires. I followed the format used in the examples, but I get this error when I try to run:
./bin/pypop.py -c /Users/afadda/Desktop/QG128.ini /Users/afadda/Desktop/QG128.pop
Traceback (most recent call last):
  File "./bin/pypop.py", line 316, in <module>
    config = getConfigInstance(configFilename, altpath, usage_message)
  File "/Applications/pypop/bin/../PyPop/Main.py", line 64, in getConfigInstance
    config.read(configFilename)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 305, in read
    self._read(fp, filename)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 546, in _read
    raise e
ConfigParser.ParsingError: File contains parsing errors: /Users/afadda/Desktop/QG128.ini
[line 15]: '*A_1\n'
[line 16]: '*A_2\n'
[line 17]: '*B_1\n'
[line 18]: '*B_2\n'
[line 19]: '*C_1\n'
[line 20]: '*C_2\n'
[line 21]: '*DQA1_1\n'
[line 22]: '*DQA1_2\n'
[line 23]: '*DQB1_1\n'
[line 24]: '*DQB1_2\n'
[line 25]: '*DRB1_1\n'
[line 26]: '*DRB1_2\n'
[line 27]: '*DPA1_1\n'
[line 28]: '*DPA1_2\n'
[line 29]: '*DPB1_1\n'
[line 30]: '*DPB1_2\n'
[line 31]: '*DRB3_1\n'
[line 32]: '*DRB3_2\n'
[line 33]: '*DRB4_1\n'
[line 34]: '*DRB4_2\n'

My .ini file looks like this:
;; comment out or change as desired
;; 1 = true, 0 = false
;; see config.ini in main distribution for detailed explanation of options

[General]
debug=0
outFilePrefixType=filename

[ParseGenotypeFile]
alleleDesignator=*
untypedAllele=****
;;fieldPairDesignator=_1:_2

validSampleFields=+id
*A_1
*A_2
*B_1
*B_2
*C_1
*C_2
*DQA1_1
*DQA1_2
*DQB1_1
*DQB1_2
*DRB1_1
*DRB1_2
*DPA1_1
*DPA1_2
*DPB1_1
*DPB1_2
*DRB3_1
*DRB3_2
*DRB4_1
*DRB4_2

[HardyWeinberg]
lumpBelow=5

[HardyWeinbergGuoThompson]
dememorizationSteps=2000
samplingNum=1000
samplingSize=1000

[HomozygosityEWSlatkinExact]
numReplicates=10000

[Emhaplofreq]
allPairwiseLD=1
allPairwiseLDWithPermu=0

thanks,

Collaborator

sjmack commented Nov 15, 2017

Try adding leading whitespace to every line after the first in the validSampleFields data block, as below.

validSampleFields=+populat
 id
 *a_1
 *a_2
 *c_1
 *c_2
 *b_1
 *b_2
 *dra_1
 *dra_2
 *drb1_1
 *drb1_2
 *dqa1_1
 *dqa1_2
 *dqb1_1
 *dqb1_2
 *dpa1_1
 *dpa1_2
 *dpb1_1
 *dpb1_2

Collaborator

sjmack commented Nov 15, 2017

The same applies to the validPopFields data block.

validPopFields=labcode
 method
 ethnic
 contin
 collect
 latit
 longit
 complex
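
The reason the indentation matters can be checked against Python's standard config parser, which the traceback shows PyPop uses to read the .ini file: a line that starts with whitespace is read as a continuation of the previous option's value, while an unindented line without `=` or `:` is rejected as a parsing error. The sketch below uses Python 3's `configparser` (the traceback comes from the Python 2.7 `ConfigParser` module, which follows the same continuation rule). The shortened config strings and the `read_fields` helper are illustrative assumptions, not part of PyPop.

```python
import configparser

# Field lines start in column 0, as in the failing .ini file.
BROKEN = """[ParseGenotypeFile]
alleleDesignator=*
validSampleFields=+id
*A_1
*A_2
"""

# Same content, but each continuation line is indented by one space.
FIXED = """[ParseGenotypeFile]
alleleDesignator=*
validSampleFields=+id
 *A_1
 *A_2
"""


def read_fields(text):
    """Hypothetical helper: parse the config text and return the field names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # A multi-line value comes back as one string with embedded newlines.
    return parser.get("ParseGenotypeFile", "validSampleFields").split()


print(read_fields(FIXED))

try:
    read_fields(BROKEN)
except configparser.ParsingError as err:
    print("parsing failed:", err)
```

Running a quick check like this against your own .ini before starting PyPop should show whether every field line is being picked up as part of the intended option.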

Author

afadda91 commented Nov 16, 2017 via email

Author

afadda91 commented Nov 16, 2017 via email

Collaborator

sjmack commented Nov 16, 2017

Hi @afadda91. Can you provide a copy of your .ini file, and possibly an example of the results you are getting? Thanks.

Author

afadda91 commented Nov 22, 2017 via email

sjmack self-assigned this Nov 22, 2017
Collaborator

sjmack commented Nov 22, 2017

@afadda91, I don't see an attached file here. Note that the current build can estimate multi-locus haplotypes for at most 20 loci (the practical limit depends on the value of 2N), particularly if you are using the [Emhaplofreq] settings. Even within that limit, the number of subjects can affect whether the EM method converges; see issue #26.

Can you paste the text of your .ini into your next reply, and let us know how many loci 'many' and 'fewer' represent?

Author

afadda91 commented Nov 23, 2017 via email

Author

afadda91 commented Dec 4, 2017 via email

Repository owner deleted a comment from afadda91 Dec 5, 2017
Collaborator

sjmack commented Dec 5, 2017

@afadda91, I'd recommend against using [Haplostats] for now. The Haplostats functions are still under development, and have some issues (e.g., #38, #39, #41). If you don't have any missing data (****), Haplostats is useful for calculating the ALD measures; just include

[Haplostats]
allPairwise=1

in your .ini.

For your earlier query, I don't have any problems using your .ini with a sample file of my own creation. I suspect that there may be an issue with sample size; can you let me know how many subjects are in your .pop file? Attached are the .ini and .pop files and the resulting *out.txt and *out.xml files (remove the .txt suffix for the .ini, .pop and .xml files).

afadda.ini
sample_afadda.pop
sample_afadda-out.txt
sample_afadda-out.xml

If you feel comfortable sharing your .pop file with me (@sjmack, not @alexlancaster) directly I will email you, and I can take a look.

Collaborator

sjmack commented Dec 7, 2017

@afadda91, I was able to fix your datafile with two sets of changes.

First, you should make sure that you do not include locus name prefixes (e.g., "A*") in your HLA data. So "DRB1*03:01:01" should be recorded as "03:01:01". See section 2.2 of the PyPop user guide for more examples.

Second, your .pop file must not contain any trailing blank lines at the end of the file. See "current limitations of PyPop" in section 2.2.2.

Once I removed the locus prefixes and the trailing blank line, the analysis ran fine.
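
As a rough illustration of those two fixes, the following sketch strips locus prefixes from allele names and drops trailing blank lines from the text of a .pop file. Both helper names are hypothetical and not part of PyPop. The sketch assumes that prefixes have the form `LOCUS*` (letters and digits followed by an asterisk) and leaves the untyped-allele marker `****` untouched.

```python
import re

# Assumed prefix shape: a locus name made of letters/digits, then "*".
LOCUS_PREFIX = re.compile(r"^[A-Za-z0-9]+\*")


def strip_locus_prefix(allele):
    """Return the allele name without a leading 'LOCUS*' prefix, if present."""
    return LOCUS_PREFIX.sub("", allele)


def drop_trailing_blank_lines(text):
    """Remove empty or whitespace-only lines from the end of the file text."""
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


print(strip_locus_prefix("DRB1*03:01:01"))
print(repr(drop_trailing_blank_lines("id\t03:01:01\t****\n\n\n")))
```

Only the final blank lines are removed, so a data row that ends in an empty tab-separated field is left as it is.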

I'm not sure how you are managing your data, but if you are using MS Excel, make sure that HLA data without locus prefixes are kept in text-formatted cells; in General-format cells, Excel will interpret HLA allele names as times.

For example, "03:01:01" will be converted to "3:01:01 AM", and "02:61:01" will be converted to "0.125706018518519", when entered into General format cells. In the latter case, Excel interprets "02:61:01" to mean "two hours, 61 minutes, and 1 second", which equals 0.125706018518519 days.
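
The arithmetic behind that second conversion can be reproduced in a few lines. This is only a sketch of the hours/minutes/seconds calculation described above, not a model of Excel itself, and the function name is made up for the example.

```python
def time_as_fraction_of_day(text):
    """Read 'hh:mm:ss' as a duration and return it as a fraction of one day."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / 86400  # 86400 seconds in one day


# 2 h + 61 min + 1 s = 10861 s; shown to 15 significant digits
print(format(time_as_fraction_of_day("02:61:01"), ".15g"))
```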

If an allele has an expression variant suffix, or a G or P group suffix, Excel will automatically treat it as text.

Good luck. I think that this addresses the issues raised in this thread, so I am going to close it. If you have additional problems or questions, you can open a new issue.

sjmack closed this as completed Dec 7, 2017
alexlancaster added the notabug label (no underlying issue needs fixing: e.g. followed out-of-date instructions, wrong package, etc.) Jul 30, 2023