
Dimension order sorcery #162

Merged
Merged 14 commits into master from dim-order-sorcery on Feb 12, 2025

Conversation

@nicholas512 (Contributor) commented on Jan 30, 2025

Change how GlobSim reads netCDF files so that the dimension order of the source file can easily be changed.

Reading data ordered as (lat, lon, time) is significantly faster than the default order, (time, lat, lon).

Ideally, it will be possible to switch flexibly between the default and "optimized" dimension behaviour.
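
The speedup comes from the access pattern used during interpolation: pulling the full time series at one or a few grid cells. A minimal illustration of that pattern with netCDF4 is below; the file names and the variable name t2m are placeholders, not actual GlobSim paths.

import netCDF4 as nc

# With (time, lat, lon), the time series for one grid cell is scattered
# through the file (one value per time step), so the read touches many blocks.
with nc.Dataset("era5_default.nc") as ds:        # dims: (time, lat, lon)
    series_slow = ds.variables["t2m"][:, 10, 20]

# With (lat, lon, time), the same values sit contiguously on disk,
# so the read touches only a few blocks.
with nc.Dataset("era5_optimized.nc") as ds:      # dims: (lat, lon, time)
    series_fast = ds.variables["t2m"][10, 20, :]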

Script for optimizing netCDF files

The script globsim/interpolate/optimize.sh transforms downloaded netCDF files so that reading the full time series is much faster (a sketch of the equivalent transformation appears after the list below):

  • Dimension order: time as the fastest-varying dimension
  • Chunking: store data in chunks of size (5, 5, max, max) for (latitude, longitude, level, time). I'm not certain that 5 is the right chunk size for latitude and longitude, but it seems to work.
  • Example usage:
globsim/interpolate/optimize.sh /data/globsim/n60/era5 /data/globsim/n60_optim/era5 --year 2001  # processes files for 2001
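
The script itself is not reproduced here; the sketch below shows an equivalent transformation in Python with xarray, just to make the reordering and rechunking concrete. The file paths, dimension names, and the use of xarray are assumptions for illustration, not how optimize.sh is actually implemented.

import xarray as xr

# Illustrative sketch only; optimize.sh may be implemented differently.
# Assumes the file has latitude, longitude, level, and time dimensions.
src = "/data/globsim/n60/era5/era5_sa_2001.nc"        # hypothetical input file
dst = "/data/globsim/n60_optim/era5/era5_sa_2001.nc"  # hypothetical output file

ds = xr.open_dataset(src)

# Reorder so that time is the fastest-varying (last) dimension.
ds = ds.transpose("latitude", "longitude", "level", "time")

# Chunks of (5, 5, max, max) for (latitude, longitude, level, time):
# 5x5 in space, and the full extent of the level and time dimensions.
encoding = {
    name: {
        "chunksizes": tuple(
            ds.sizes[d] if d in ("level", "time") else min(5, ds.sizes[d])
            for d in var.dims
        )
    }
    for name, var in ds.data_vars.items()
}

ds.to_netcdf(dst, encoding=encoding)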

Flexible dimension reading

  • Add a function reorder_and_slice_array that accepts an array, the dimension ordering of that array, and the desired output ordering (a minimal sketch follows this list)
  • You can also provide named slice boundaries to extract a subset of the array
    • e.g. data for a range of times: reorder_and_slice_array(slice_boundaries={'time': slice(1, 100)})
    • e.g. a spatial subset: reorder_and_slice_array(slice_boundaries={'latitude': slice(1, 5), 'longitude': slice(1, 5)})
  • This could eventually be made into a Python utility to integrate more cleanly into GlobSim
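
A minimal sketch of what such a helper could look like, assuming plain numpy arrays; the keyword names order_in and order_out are placeholders for the two ordering arguments, and the actual function in this PR may have a different signature.

import numpy as np

def reorder_and_slice_array(arr, order_in, order_out, slice_boundaries=None):
    """Transpose arr from dimension order order_in to order_out,
    optionally slicing named dimensions first. Sketch only."""
    slice_boundaries = slice_boundaries or {}

    # Build a slicer in the array's current dimension order.
    slicer = tuple(slice_boundaries.get(dim, slice(None)) for dim in order_in)
    arr = arr[slicer]

    # Transpose to the requested output order.
    axes = [order_in.index(dim) for dim in order_out]
    return np.transpose(arr, axes)

# e.g. a spatial subset, reordered from (lat, lon, time) to (time, lat, lon)
data = np.random.rand(45, 205, 24)   # (latitude, longitude, time)
subset = reorder_and_slice_array(
    data,
    order_in=("latitude", "longitude", "time"),
    order_out=("time", "latitude", "longitude"),
    slice_boundaries={"latitude": slice(1, 5), "longitude": slice(1, 5)},
)
print(subset.shape)  # (24, 4, 4)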

Keyword toggle

  • In the TOML config file, set optimized = true to use the optimized files
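
For reference, the toggle would look something like this in the config file; the [interpolate] section name is an assumption about where the key belongs, and only the optimized key itself comes from this PR.

[interpolate]
# read from the reordered/rechunked files produced by optimize.sh
optimized = true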

Benchmarks:

  • Interpolation of 2 nearby sites for 6 years of ERA5 took about 5 minutes.
    • Interpolation of the same sites for 11 years of ERA5 took about 17 minutes.
  • The optimization script takes about 1 hour for 12 months of ERA5 data (45 × 205 grid cells).

Closes #127
Closes #154

@nicholas512 merged commit 5028d1e into master on Feb 12, 2025
2 checks passed
@nicholas512 deleted the dim-order-sorcery branch on February 12, 2025
Successfully merging this pull request may close these issues:

  • Improve read times with smarter dimensions
  • Try using chunks to speed up data access