Use a cache to store PySM3 data files #147

Merged
merged 11 commits into from
Feb 17, 2022

ziotom78

PySM3 data files are often unavailable because of network outages, so this PR uses https://github.com/actions/cache to cache them. It should be quite efficient, as cache files are compressed with Zstandard, which showed very good compression ratios for PySM3 maps in my tests.

Here I use the trick explained in the PySM3 User's Manual.

@ziotom78

This is not relevant for the PR, but I want to keep a record of my tests and of how much storage different compression schemes can save. It's also a testament to Zstandard's awesomeness!

I downloaded the PySM3 data archive and created a tarball:

$ git clone https://github.com/galsci/pysm-data pysm3-data
$ tar cf archive.tar pysm3-data

and then I compressed archive.tar using gzip, bzip2, and zstd. Here are the file sizes:

| File         | Size |
|--------------|------|
| Uncompressed | 754M |
| BZip2        | 641M |
| Gzip         | 632M |
| Zstandard    | 615M |

Zstandard is the winner here. However, the most impressive result comes from compression speed:

| File      | Compression time |
|-----------|------------------|
| BZip2     | 100 s            |
| Gzip      | 34 s             |
| Zstandard | 3 s              |

Not only did Zstandard achieve the best compression ratio, but it also compressed the archive in a fraction of the time required by the other two algorithms.
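
For anyone who wants to repeat this kind of size-and-time comparison from Python, here is a minimal sketch. It is not the procedure used for the numbers above, which came from the command-line tools. It covers only gzip and bzip2 from the standard library; Zstandard is left out because on most Python versions it needs a third-party package. The `benchmark` function and the synthetic sample data are hypothetical names introduced for illustration.

```python
import bz2
import gzip
import time


def benchmark(data: bytes) -> dict:
    """Return {codec_name: (compressed_size_in_bytes, elapsed_seconds)}."""
    # Only standard-library codecs are included in this sketch.
    codecs = {"gzip": gzip.compress, "bzip2": bz2.compress}
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        results[name] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Synthetic, highly compressible placeholder; real PySM3 maps behave differently.
    sample = b"PySM3 map placeholder " * 100_000
    for name, (size, seconds) in benchmark(sample).items():
        print(f"{name}: {size} bytes in {seconds:.3f} s")
```

To measure the real archive, read `archive.tar` in binary mode and pass its contents to `benchmark`; timings depend on the machine and on the compression levels in use.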

@ziotom78

There is a problem here, because the cache gets saved and restored as desired, but it is never accessed when running the Mbs module.

After a few hours of debugging, I discovered that this happens because the environment variable PYSM_LOCAL_DATA (used to implement caching) is being overwritten in mbs/mbs.py.

@NicolettaK, if I understand correctly, you are using this feature so that we can pass our own CMB realizations to PySM3, is this correct?

@ziotom78

ziotom78 commented Dec 7, 2021

@NicolettaK, could you please have a look at mbs.py, line 568? I would like to remove the line where the code changes the value of the environment variable PYSM_LOCAL_DATA, but I am unsure how to do so in a way that lets Mbs keep working as expected.
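
One possible direction, sketched here under assumptions rather than taken from mbs.py, is to override the variable only for the duration of the code that needs the custom value and then restore whatever was there before, so that the cache location set by the CI workflow survives. The `temporary_env` helper is a hypothetical name, and whether this fits the way Mbs passes its own data to PySM3 still needs to be checked.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set os.environ[name] to value inside the block, then restore it."""
    previous = os.environ.get(name)  # None if the variable was unset
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable did not exist before: remove it again.
            os.environ.pop(name, None)
        else:
            # Put back the value that was there before (e.g. the cache path).
            os.environ[name] = previous
```

Code that needs a custom data directory would then run inside `with temporary_env("PYSM_LOCAL_DATA", some_path):`, where `some_path` is a placeholder. The restore step only helps if PySM3 reads the variable at the time the data are requested, which is an assumption here.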

@ziotom78

Testing the use of Path.absolute() after a suggestion in PySM issue #102.
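
For reference, a minimal sketch of what `Path.absolute()` does: it anchors a relative path at the current working directory without touching the file system. The likely motivation, which is an assumption since the content of issue #102 is not reproduced here, is that a relative path stored in an environment variable would otherwise be interpreted against whatever directory is current when it is read. The `as_absolute` helper and the `pysm-data` directory name are hypothetical.

```python
import os
from pathlib import Path


def as_absolute(path_str: str) -> str:
    """Return path_str anchored at the current working directory."""
    # Path.absolute() does not resolve symlinks or require the path to exist.
    return str(Path(path_str).absolute())


if __name__ == "__main__":
    # "pysm-data" is a placeholder directory name used for illustration.
    print(as_absolute("pysm-data"))
```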

@ziotom78 ziotom78 merged commit 381cde9 into master Feb 17, 2022
@ziotom78 ziotom78 deleted the cache_pysm3 branch February 17, 2022 13:21