Try using chunks to speed up data access #127
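Script for era5 pressure-level:
# Rechunk each ERA5 pressure-level file so that a full time series sits in one chunk
mkdir -p ./era5_chunked
for f in `ls ./era5 | grep "era5_pl_.*"`
do
    fullpath=./era5/$f
    newpath=./era5_chunked/$f
    if [[ -f $newpath ]]
    then
        echo "$f already chunked"
    else
        # Read the dimension lengths from the file header
        NTIME=`ncdump -h $fullpath | grep "time = .*" | grep -oEi "[0-9]+"`
        NLEV=`ncdump -h $fullpath | grep "level = .*" | grep -oEi "[0-9]+"`
        echo "Chunking $f with $NTIME times and $NLEV levels"
        # -k 4: netCDF-4 classic model, -d 0: no compression, -w: build output in memory, -h 100M: chunk cache size
        nccopy -k 4 -d 0 -w -h 100M -c "longitude/4,latitude/4,level/$NLEV,time/$NTIME" $fullpath $newpath
    fi
done
Speedup with MFDataset: reading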
more generic script
DIRECTORY="era5"
LEV_VAR="level"
TIME_VAR="time"
mkdir -p "./${DIRECTORY}_chunked"
## Process each nc file found
for f in `ls ./$DIRECTORY | grep ".*nc$"`
do
    fullpath="./$DIRECTORY/$f"
    newpath="./${DIRECTORY}_chunked/$f"
    if [[ -f $newpath ]]
    then
        echo "${f} already chunked"
    else
        NTIME=`ncdump -h $fullpath | grep "${TIME_VAR} = .*" | grep -oEi "[0-9]+"`
        NLEV=`ncdump -h $fullpath | grep "${LEV_VAR} = .*" | grep -oEi "[0-9]+"`
        if [ -n "$NLEV" ] # does the level dimension exist?
        then
            echo "Chunking $f with ${NTIME} times and ${NLEV} levels"
            nccopy -k 4 -d 0 -w -h 100M -c "longitude/4,latitude/4,${LEV_VAR}/${NLEV},${TIME_VAR}/${NTIME}" $fullpath $newpath
        else
            echo "Chunking $f with $NTIME times"
            nccopy -k 4 -d 0 -w -h 100M -c "longitude/4,latitude/4,${TIME_VAR}/${NTIME}" $fullpath $newpath
        fi
    fi
done
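The scripts above read the dimension lengths by grepping the `ncdump -h` header for digits. A slightly stricter alternative, sketched here as an assumption rather than something used in this project, is a small helper that only looks inside the `dimensions:` section and handles both fixed and UNLIMITED dimensions. The function name `dim_len` is hypothetical.

```shell
# Print the length of dimension $1 from `ncdump -h` output read on stdin.
# Handles both "time = 744 ;" and "time = UNLIMITED ; // (744 currently)".
# Prints nothing if the dimension is not present.
dim_len() {
    awk -v dim="$1" '
        /^dimensions:/ { in_dims = 1; next }
        /^variables:/  { in_dims = 0 }
        in_dims && $1 == dim && $2 == "=" {
            if ($3 == "UNLIMITED") { gsub(/[^0-9]/, "", $6); print $6 }
            else                   { print $3 }
            exit
        }
    '
}
```

It could replace the two grep pipelines, for example `NTIME=$(ncdump -h "$fullpath" | dim_len "$TIME_VAR")`, and an empty result for the level dimension signals a file without levels.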
Reading data from the netCDF files takes a very long time and is probably what slows down the interpolate step.
I believe the access pattern (long time series) is extremely non-contiguous relative to the shape of the array (p. 11 of this report).
For ERA5 at least, the downloaded data appears to be in netCDF-3 format and is not chunked.
It may be possible to speed up access by using chunking:
make copies of the data using nccopy, changing the format to netCDF-4 and adding chunking. For ERA5, try chunk sizes:
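As a rough sanity check on a candidate chunk shape, the size of one chunk in bytes is the product of the chunk lengths and the size of one value. The sketch below is only illustrative: `chunk_bytes` is a hypothetical helper, and the numbers in the example call (37 levels, 744 hourly steps, 2-byte packed values) are assumptions about a typical monthly ERA5 pressure-level file, not values taken from this issue.

```shell
# Bytes in one chunk = bytes per value * product of the chunk lengths.
# Usage: chunk_bytes BYTES_PER_VALUE LEN [LEN ...]
# The body runs in a subshell so its variables do not leak into the caller.
chunk_bytes() (
    total=$1
    shift
    for n in "$@"; do
        total=$((total * n))
    done
    echo "$total"
)

# Assumed example: 4 x 4 grid points, 37 levels, 744 time steps, 2 bytes per value
chunk_bytes 2 4 4 37 744
```

Under those assumptions a chunk is a little under 1 MB, so it fits comfortably in a 100M chunk cache.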