missing files which are used in the code #23

Open
kyika opened this issue Jun 9, 2018 · 3 comments
Comments

@kyika commented Jun 9, 2018

Dear Professor Parks,
I found that you use the paths '../data/gensample_hdf5_files/' and '../data/gensample' in your code, but I didn't find them in the directory downloaded from GitHub. Could you tell me what files they contain?
----- added -----
Actually, I am trying to reproduce your training. I have succeeded in running training_set.py and am now working on localize_model.py. Specifically, I am interested in how the choice of hyperparameters influences the model's performance, so I want to see the output of localize_model.py. However, I don't have its input files: ../data/gensample/train_*.npz and ../data/gensample/test_mix_23559.npz on lines 249 and 250. I have searched around for some time and haven't found any code that generates these files. I guess I should manually create the directory ../data/gensample_hdf5_files/, move the training_set.py output there, and then perhaps run dla_cnn/preprocess.sh to convert the files to .npz?
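
In case it clarifies what I mean, the conversion I'm picturing is roughly the sketch below. This is only a guess on my part: the input file name is made up, and I'm assuming the HDF5 files hold plain top-level datasets.

```python
# Hypothetical sketch: convert one HDF5 file produced by training_set.py
# into an .npz archive like the ones localize_model.py reads. No dataset
# names are assumed; every top-level dataset is copied as-is.
import h5py
import numpy as np

def hdf5_to_npz(h5_path, npz_path):
    with h5py.File(h5_path, "r") as f:
        # read every top-level dataset into memory
        arrays = {name: f[name][...] for name in f.keys()}
    # write all arrays into a single compressed .npz archive
    np.savez_compressed(npz_path, **arrays)

hdf5_to_npz("../data/gensample_hdf5_files/train_sample.h5",  # made-up file name
            "../data/gensample/train_sample.npz")
```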

Many thanks!

@davidparks21 (Collaborator) commented Jun 28, 2018

Hi @kyika, I'm sorry for the delay; I will update the README with the location shortly. The files are all online in a Dropbox, I just need to track down the link.

@davidparks21 (Collaborator) commented Jul 1, 2018

Arg, I need to read my own README. The link to the data files is already on the README page, though it's not particularly clear; I'll work on improving the overall documentation. The data files, catalogs, and pre-trained models are all stored on Google Drive here: https://tinyurl.com/cnn-dlas

I'm certain you will have questions about which data files were used during training; I encourage you to ask. I'd be interested in updating the documentation to make this codebase a bit more usable.

The DLA data in the Google Drive come in a number of forms: standard DLAs, high-NHI DLAs (large DLAs), SLLS DLAs (small DLAs, or sub-DLAs), and samples of dual DLAs (two DLAs very near each other).

Those are the raw data samples; they are preprocessed into a single dataset used for training. The distribution of the various DLA types mentioned above is documented in code in the shell script preprocess.sh:

https://github.com/davidparks21/qso_lya_detection_pipeline/blob/master/dla_cnn/preprocess.sh

That shell script executes the various preprocessing commands that generate the final training dataset from the raw files.
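
Once preprocess.sh has finished, you can sanity-check the resulting .npz files with plain NumPy. This snippet makes no assumptions about which arrays are inside; it just lists them:

```python
# List the arrays stored in one of the generated training archives and
# report their shapes; useful for verifying that preprocess.sh succeeded.
import numpy as np

data = np.load("../data/gensample/test_mix_23559.npz")
print(data.files)                    # names of all arrays in the archive
for name in data.files:
    print(name, data[name].shape)    # shape of each stored array
```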

If I had to do it over again, I would make the preprocessing in-line with training rather than a fully separate step, but as it stands this codebase requires the extra preprocessing pass.
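
By "in-line" I mean something like the following minimal sketch, where each raw sample is preprocessed as the training loop consumes it instead of being materialized to disk first. The loader and transform here are placeholder stubs, not functions from this codebase:

```python
# Minimal sketch of in-line preprocessing: no intermediate .npz files;
# each raw sample is transformed on the fly as batches are assembled.
# load_raw_sightlines() and preprocess() are placeholders, not code
# from this repository.
import numpy as np

def load_raw_sightlines(raw_files):
    # placeholder loader: yield one raw spectrum array per file
    for path in raw_files:
        yield np.load(path)

def preprocess(sample):
    # placeholder transform: normalize to zero mean, unit variance
    return (sample - sample.mean()) / (sample.std() + 1e-8)

def training_batches(raw_files, batch_size=32):
    # assemble preprocessed batches on the fly for the training loop
    batch = []
    for sample in load_raw_sightlines(raw_files):
        batch.append(preprocess(sample))
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
```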

Let me know if I can be of further assistance.

David

p.s. Lowly grad student here, not the professor (yet!!) :) Prochaska and Dong are professors in Astronomy, and Cai is a post-doc in Astronomy. I'm the CS/deep-learning side of the project.

@kyika (Author) commented Jul 1, 2018

Hi David,
Sorry for the careless mistake I made about your identity, and many thanks for your reply.

In the weeks since I sent you the email, we generated part of the raw data needed for training and downloaded the rest from your Google Drive (many thanks for providing it!). We then used preprocess.sh to generate the files needed by localize_model.py, ran that script successfully, and finished our project by obtaining training results for different hyperparameter sets. We still appreciate your detailed reply and believe it will benefit others interested in your work.
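
For anyone curious, our hyperparameter sweep was essentially a grid loop like the sketch below. Note that train_and_evaluate() is only a stand-in for the actual training call in localize_model.py; it returns a dummy metric here so the loop runs on its own.

```python
# Sketch of the hyperparameter sweep we ran: try every combination on a
# small grid and keep the configuration with the best validation metric.
# train_and_evaluate() is a stand-in for the real call into
# localize_model.py.
from itertools import product

def train_and_evaluate(learning_rate, batch_size):
    # placeholder: the real version would train the model and
    # return a validation loss
    return learning_rate * batch_size

learning_rates = [1e-3, 1e-4]
batch_sizes = [64, 128]

results = {(lr, bs): train_and_evaluate(lr, bs)
           for lr, bs in product(learning_rates, batch_sizes)}
best = min(results, key=results.get)   # config with the lowest (dummy) loss
print("best hyperparameters:", best)
```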

If you are wondering why we are interested in training the model: we are undergraduate astronomy students taking a course on statistics in astronomy, which requires that we reproduce a data-analysis process as our course project.

Kyika

p.s. We think you did a good job of showing how tightly machine learning and astronomy can combine, bringing totally new energy to this thousand-year-old subject. Thank you all, and good luck!
