missing files which are used in the code #23

Open
kyika opened this issue Jun 9, 2018 · 3 comments
Comments

@kyika commented Jun 9, 2018

Dear Professor Parks,
I found that you use the paths '../data/gensample_hdf5_files/' and '../data/gensample' in your code, but I didn't find them in the directory downloaded from GitHub. Could you tell me what files they contain?
----- added -----
Actually, I am trying to reproduce your training. I have succeeded in running training_set.py and am now working on localize_model.py. Specifically, I am interested in how the choice of hyperparameters influences the model's performance, so I want to see the output of localize_model.py. However, I don't have its input files: ../data/gensample/train_*.npz and ../data/gensample/test_mix_23559.npz on lines 249 and 250. I have searched around for some time and haven't found any code that generates these files. I guess I should manually create the directory ../data/gensample_hdf5_files/, move the training_set.py output there, and then perhaps run dla_cnn/preprocess.sh to convert the files to .npz?
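
In case it clarifies what I mean, the conversion I'm picturing is roughly the sketch below. This is only a guess on my part: the input file name is made up, and I'm assuming the HDF5 files hold plain top-level datasets.

```python
# Hypothetical sketch: convert one HDF5 file produced by training_set.py
# into an .npz archive like the ones localize_model.py reads. No dataset
# names are assumed; every top-level dataset is copied as-is.
import h5py
import numpy as np

def hdf5_to_npz(h5_path, npz_path):
    with h5py.File(h5_path, "r") as f:
        # read every top-level dataset into memory
        arrays = {name: f[name][...] for name in f.keys()}
    # write all arrays into a single compressed .npz archive
    np.savez_compressed(npz_path, **arrays)

hdf5_to_npz("../data/gensample_hdf5_files/train_sample.h5",  # made-up file name
            "../data/gensample/train_sample.npz")
```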

Many thanks!

@davidparks21 (Collaborator) commented Jun 28, 2018

Hi @kyika, I'm sorry for the delay; I will update the README with the location shortly. The files are all online in a Dropbox, I just need to track down the link.

@davidparks21 (Collaborator) commented Jul 1, 2018

Arg, I need to read my own README. The link to the data files is already on the README page, though it's not particularly clear; I'll work on improving the overall documentation. The data files, catalogs, and pre-trained models are all stored on Google Drive here: https://tinyurl.com/cnn-dlas

I'm certain you will have questions about which data files were used during training; I encourage you to ask. I'd be interested in updating the documentation to make this codebase a bit more usable.

The DLA data in the Google Drive come in a number of forms: standard DLAs, high-NHI DLAs (large DLAs), SLLS DLAs (small DLAs, or sub-DLAs), and samples of dual DLAs (two DLAs very near each other).

Those are the raw data samples; they are preprocessed into a single dataset used for training. The distribution of the various DLA types mentioned above is documented in code in the shell script preprocess.sh:

https://github.com/davidparks21/qso_lya_detection_pipeline/blob/master/dla_cnn/preprocess.sh

That shell script executes the various preprocessing commands that generate the final training dataset from the raw files.
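
Once preprocess.sh has finished, you can sanity-check the resulting .npz files with plain NumPy. This snippet makes no assumptions about which arrays are inside; it just lists them:

```python
# List the arrays stored in one of the generated training archives and
# report their shapes; useful for verifying that preprocess.sh succeeded.
import numpy as np

data = np.load("../data/gensample/test_mix_23559.npz")
print(data.files)                    # names of all arrays in the archive
for name in data.files:
    print(name, data[name].shape)    # shape of each stored array
```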

If I had to do it over again, I would make the preprocessing in-line with training rather than a fully separate step, but as it stands this codebase requires the extra preprocessing pass.
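
By "in-line" I mean something like the following minimal sketch, where each raw sample is preprocessed as the training loop consumes it instead of being materialized to disk first. The loader and transform here are placeholder stubs, not functions from this codebase:

```python
# Minimal sketch of in-line preprocessing: no intermediate .npz files;
# each raw sample is transformed on the fly as batches are assembled.
# load_raw_sightlines() and preprocess() are placeholders, not code
# from this repository.
import numpy as np

def load_raw_sightlines(raw_files):
    # placeholder loader: yield one raw spectrum array per file
    for path in raw_files:
        yield np.load(path)

def preprocess(sample):
    # placeholder transform: normalize to zero mean, unit variance
    return (sample - sample.mean()) / (sample.std() + 1e-8)

def training_batches(raw_files, batch_size=32):
    # assemble preprocessed batches on the fly for the training loop
    batch = []
    for sample in load_raw_sightlines(raw_files):
        batch.append(preprocess(sample))
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
```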

Let me know if I can be of further assistance.

David

p.s. Lowly grad student here, not the professor (yet!!) :) Prochaska and Dong are professors in Astronomy, and Cai is a post-doc in Astronomy. I'm the CS/deep-learning side of the project.

@kyika (Author) commented Jul 1, 2018

Hi David,
Sorry for the careless mistake I made about your identity, and many thanks for your reply.

In the weeks since I sent you the email, we generated part of the raw data needed for training and downloaded the rest from your Google Drive (many thanks for providing it!). We then used preprocess.sh to generate the files needed by localize_model.py, ran that script successfully, and finished our project by obtaining training results for different hyperparameter sets. We still appreciate your detailed reply and believe it will benefit others interested in your work.
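
For anyone curious, our hyperparameter sweep was essentially a grid loop like the sketch below. Note that train_and_evaluate() is only a stand-in for the actual training call in localize_model.py; it returns a dummy metric here so the loop runs on its own.

```python
# Sketch of the hyperparameter sweep we ran: try every combination on a
# small grid and keep the configuration with the best validation metric.
# train_and_evaluate() is a stand-in for the real call into
# localize_model.py.
from itertools import product

def train_and_evaluate(learning_rate, batch_size):
    # placeholder: the real version would train the model and
    # return a validation loss
    return learning_rate * batch_size

learning_rates = [1e-3, 1e-4]
batch_sizes = [64, 128]

results = {(lr, bs): train_and_evaluate(lr, bs)
           for lr, bs in product(learning_rates, batch_sizes)}
best = min(results, key=results.get)   # config with the lowest (dummy) loss
print("best hyperparameters:", best)
```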

If you are wondering why we are interested in training the model: we are undergraduate astronomy students taking a course on statistics in astronomy, which requires that we reproduce a data-analysis process as our course project.

Kyika

p.s. We think you did a good job of showing how tightly machine learning and astronomy can combine, bringing totally new energy to this thousand-year-old subject. Thank you all, and good luck!
