-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop PPDataProxy accessing the file when no data is needed. #3659
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pp-mo, I follow the logic of all this. There are a couple of changes to address 🙂
Hi @trexfeathers sorry for the delay on this -- I forgot to push it up ! Following offline discussion, I've now reduced the scope of the behaviour, in favour of simpler code that only supports the exact current need (e.g. it no longer catches "[1:1]" as an empty slice case) However, I've also rather increased the complexity of the testing. As usual, it's tough to be comprehensive because numpy has so many indexing styles.
I'm just not considering those as possibilities. |
Note: when we have agreed on an implementation here, we should also adopt it in iris-grib. |
docs/iris/src/whatsnew/contributions_2.4.1/bugfix_2020-Feb-14_pp_emptyslices.txt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments @pp-mo, although as we discussed you will be abstracting this code to be outside the PPDataProxy specifically, so that it can also serve for GRIB files too
Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
Thanks @trexfeathers |
All looks great, great idea with the abstraction into util 👏 |
* Stop PPDataProxy accessing the file when no data is needed. (#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
* Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (#3655) * remove redundant tests (#3650) * Remove obsolete test. (#3662) * Remove grib-specific test. (#3663) * Removed ununused skipIf. (#3632) * Remove test_grib_save_rules.py which has been moved to iris-grib (#3666) * 2v4 mergeback picks (#3668) * Stop PPDataProxy accessing the file when no data is needed. (#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Remove uri callback test which is moved to iris-grib (#3665) * Remove test_grib2 integration tests (#3664) * Remove test_grib_save.py (#3669) * Remove cube iter (#3656) * Fixed asv project name to 'scitools-iris'. (#3660) * Removed grib-specific test to iris-grib. (#3671) * Removed iris.tests.integration.test_grib_load and related CML files. (#3670) * Remove TestGribMessage (#3672) * Switched use of datetime.weekday() to datetime.dayofwk. (#3687) * New image hashes for mpl 3x2 (#3682) * New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0. * Further images changes for mpl3x2. * Yet more updated image results. * Correct and improve dev-guide section on fixing graphics-tests. (#3683) * Correct and improve dev-guide section on fixing graphics-tests. * Review changes + general rethink. * Reduce duplication between 'graphics-tests' and general 'tests' page. * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> Co-authored-by: Bill Little <bill.james.little@gmail.com> Co-authored-by: lbdreyer <lbdreyer@users.noreply.github.com> Co-authored-by: Jon Seddon <17068361+jonseddon@users.noreply.github.com>
* Stop PPDataProxy accessing the file when no data is needed. (SciTools#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
We just found that PP and Fieldsfile loading has been slow and heavy on memory, apparently since Dask 2.0.
In these cases, the PPDataProxy does not actually use the file data, so this fix allows it to return a suitable empty array object of the expected type and shape, without reading the file.
The reason appears to be that
dask.array.from_array(arraylike)
now extracts an empty slice of the argument, likearraylike[0:0, 0:0...]
.Note, this is even worse than it sounds :
See this line in
meta_from_array
, called from here inArray.__new__
, as used infrom_array
.Speculation : I think dask is doing this to capture the metadata of a slice from the argument, which it presumably believes can be different from that of the argument itself. I'm guessing this is needed for generalised array types e.g. dask, CUDA