Missing files which are used in the code #23
Hi @kyika, I'm sorry for the delay; I will update the README with the location shortly. The files are all online in a Dropbox, I just need to track down the link.
Arg, I need to read my own README. The link to the data files is on the README page already, though it's not particularly clear; I'll work on improving the overall documentation. The data files, catalogs, and pre-trained models are all stored on Google Drive here:

I'm certain you will have questions about which data files were used during training. I encourage you to ask; I'd be interested in updating the documentation to make this codebase a bit more usable.

The DLA data in the Google Drive come in a number of forms: standard DLAs, high-NHI DLAs (large DLAs), SLLS DLAs (small DLAs, or sub-DLAs), and samples of dual DLAs (two DLAs very near each other). Those are the raw data samples; they are preprocessed into one dataset used for training. The distribution of the various DLA types mentioned above is documented in code in the shell script https://github.com/davidparks21/qso_lya_detection_pipeline/blob/master/dla_cnn/preprocess.sh. That shell script executes the various preprocessing commands that generate the final training dataset from the raw files. If I had to do it over again I would make the preprocessing step in-line with training rather than a fully separate step, but this codebase requires the extra preprocessing step.

Let me know if I can be of further assistance.

David

p.s. Lowly grad student here, not the professor (yet!!) :) Prochaska and Dong are professors in Astronomy and Cai is a post-doc in Astronomy. I'm the CS/deep learning side of the project.
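To make the merge step concrete, here is a minimal sketch of the kind of combination the preprocessing performs. The file names and array keys below are hypothetical illustrations, not the real ones; the actual commands and inputs are defined by preprocess.sh and the files on the Google Drive.

```python
import numpy as np

# Hypothetical raw sample files; the real names and array keys come from
# preprocess.sh and the archives on the Google Drive.
raw_files = [
    "standard_dlas.npz",   # standard DLAs
    "high_nhi_dlas.npz",   # large (high-NHI) DLAs
    "slls_dlas.npz",       # sub-DLAs (SLLS)
    "dual_dlas.npz",       # pairs of DLAs very near each other
]

fluxes, labels = [], []
for path in raw_files:
    with np.load(path) as data:
        fluxes.append(data["flux"])    # hypothetical key: spectrum flux
        labels.append(data["labels"])  # hypothetical key: per-pixel DLA labels

# One combined dataset, analogous to the train_*.npz files the
# training code consumes.
np.savez("train_combined.npz",
         flux=np.concatenate(fluxes),
         labels=np.concatenate(labels))
```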
Hi David,

In the weeks since I sent you that email, we generated part of the raw data needed for training and downloaded the rest from your Google Drive (many thanks for providing it!). We then used preprocess.sh to generate the files needed for localize_model.py, ran that script successfully, and finished our project by obtaining training results on different hyperparameter sets. We still appreciate your detailed reply, and we believe it will benefit others interested in your work.

If you're wondering why we were interested in training the model: we are undergraduate astronomy students taking a course on statistics in astronomy, and it requires that we reproduce a data-analysis process as the result of our project.

Kyika

p.s. We think you did a good job of showing how machine learning and astronomy combine tightly and bring totally new energy to this thousand-year-old subject. Thank you all, and good luck!
Dear Professor Parks,
I found that you use the paths '../data/gensample_hdf5_files/' and '../data/gensample' in your code, but I didn't find them in the directory downloaded from GitHub. Would you tell me what files they contain?
-----------------------------added----------------------------------------------
Actually, I am trying to reproduce your training. I have succeeded in running training_set.py and am now working on localize_model.py. Specifically, I am interested in how the choice of hyperparameters influences the model's performance, so I want to see the output of localize_model.py. However, I don't have the input files it expects: ../data/gensample/train_*.npz and ../data/gensample/test_mix_23559.npz at lines 249 and 250. I have searched around for some time and haven't found any code that generates these files. My guess is that I should manually create the directory ../data/gensample_hdf5_files/, move the training_set.py output files there, and then run dla_cnn/preprocess.sh to convert them to .npz files?
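For anyone checking whether their generated files match what localize_model.py expects, a generic way to inspect the arrays inside an .npz archive is sketched below (the path is just the one referenced above; substitute whichever file you want to check):

```python
import numpy as np

# Example path; substitute whichever generated file you want to check.
with np.load("../data/gensample/test_mix_23559.npz") as data:
    # List the arrays in the archive with their shapes and dtypes,
    # to compare against what localize_model.py loads at lines 249-250.
    for name in data.files:
        arr = data[name]
        print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")
```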
Many thanks!