Document cube/cubelist pickling #2262
Conversation
Caching cubes and cube lists to pickle files
--------------------------------------------

It should always be possible to create a temporary cache file containing a cube or cube list using the Python `Pickle <https://docs.python.org/2/library/pickle.html>`_ functionality. This can be useful when the cube or cube list has been lazily loaded, so that the pickle file itself contains only a reference to the data in the original files. In this state, writing and subsequently reading a pickle file is very fast.
cube and cube list are both class objects that Sphinx can link to the documentation for. They also have standardised names within the Iris documentation, being Cube and CubeList respectively. As such, you should in all cases refer to them as follows:

:class:`~iris.cube.Cube`
:class:`~iris.cube.CubeList`
A quick example of saving and reading a pickle file is:

.. code-block:: python

    # import pickle: in python 2.7 cPickle is faster
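(The quoted hunk is truncated at the commented line discussed below. For context, a minimal sketch of the kind of save/read round trip the proposed section describes, not necessarily the PR's exact example; the filenames are illustrative:)

```python
import pickle

import iris

# Lazily load a cube list, then cache it to a pickle file.
cubes = iris.load('my_data.nc')  # illustrative filename
with open('cubes.pkl', 'wb') as fh:
    pickle.dump(cubes, fh)

# Later, restore the cube list from the cache file.
with open('cubes.pkl', 'rb') as fh:
    cubes = pickle.load(fh)
```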
What about py35?

In general it is best not to have entire commented lines of code within a code example. In this case it would be better to discuss pickle vs cPickle in a text section before this code example.
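(For reference, a common pattern for handling pickle vs cPickle in code that must run on both Python 2 and Python 3 is sketched below; on Python 3 the plain pickle module already uses the C accelerator automatically.)

```python
# A compatible import: cPickle is the faster C implementation on
# Python 2; on Python 3 it no longer exists and plain pickle is
# already accelerated.
try:
    import cPickle as pickle
except ImportError:
    import pickle
```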
Thanks Peter, those changes make sense.
Please test pickling with the latest Python netCDF library for both deferred and loaded data. We just ran into issues where netCDF variables are no longer picklable in the latest point release. I don't know if it affects Iris, but it is worth checking.
Thanks @ajdawson, but I think that would be better handled as a separate issue from this documentation change.
@malcolmbrooks - my point was that if pickling doesn't work as expected in Iris when using the latest netCDF version, then it would be a mistake to document it at all... So I think this does need to be checked before we can merge this.
In fact, I'm not in favour of including this information in the Iris documentation at all. Given the inherent problems that may be encountered (lazy data may break if you don't unpickle in exactly the same environment you pickled in, etc.), it seems like a bad idea to promote it as a method of saving cubes. The new documentation doesn't cover anything Iris-specific either; it is just information about how to pickle an object that could be found in the Python documentation, coupled with some reasons why you really wouldn't want to do it. You include one reason why you might want to do it as a developer, but I'm not sure that is a good enough reason to include it in the cube-saving documentation, given the harm it may do if we unintentionally promote this method of saving data to users who don't really need it.
@malcolmbrooks I have to say that I agree with @ajdawson on this. While the content is undoubtedly useful in some situations, it is highly specific, which is something that all Iris documentation tries to stay clear of, not least because specificity in one area opens the door to specificity in other areas. This can lead to documentation that is excessively long, and long documentation is hard to use because it makes it very hard to find the one thing of value that the reader is looking for.
…arallel processing. An unpickleable cube is a problem which should be raised as an issue.
Okay, I'll close this, as there appears to be a consensus that it's not required. Cube pickleability is getting to be really important, though. Applications using Iris that do parallel processing, or have a similar parent-child process way of working, depend on it, and that means a lot of applications using Iris at a larger scale. Having documentation that made it clear that it should always be possible to pickle a cube would have been a useful statement, and there isn't anywhere appropriate in the developer guide for this either. Looking at lib/iris/tests/integration/test_pickle.py, it is not very extensive: it tests only a single cube loaded from GRIB, which doesn't catch the netCDF issue from @ajdawson. This makes me nervous that future versions of Iris could break pickleability and it wouldn't be seen as a big issue. An alternative to this documentation change would be a much more extensive set of pickle tests, with an explanation that makes it very clear why cube pickling is important. That's getting a bit too deep into Iris for me to do though, so I would just raise it as an issue and leave it to those more involved in development. Would that be okay?
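(For illustration of what such an extended round-trip test might look like; this is a sketch only, the file names and formats are assumptions rather than actual Iris test assets, and a real test would use proper Iris test data for each supported format.)

```python
import pickle
import unittest

import iris


class TestCubeListPickle(unittest.TestCase):
    # Illustrative file names only; a real test would cover each
    # supported file format (netCDF, GRIB, PP, ...) using Iris test data.
    filenames = ['example.nc', 'example.grib2', 'example.pp']

    def test_round_trip(self):
        for filename in self.filenames:
            cubes = iris.load(filename)
            # Pickle and unpickle, then check the result still compares
            # equal to the original cube list.
            restored = pickle.loads(
                pickle.dumps(cubes, pickle.HIGHEST_PROTOCOL))
            self.assertEqual(restored, cubes)


if __name__ == '__main__':
    unittest.main()
```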
I'll happily look at this. Our library which uses Iris is very much dependent on cubes being pickleable, as we use multiprocessing. If this testing is missing for the various file formats, we don't want to find out after a new Iris release is tagged. Thanks for bringing this to our attention.

@ajdawson @dkillick regarding the documentation: quite a while ago we (AVD at that time) looked into writing a pickle helper function, driven by user requests. If I recall correctly, on finding that NetCDF turned out to be generally faster for those cases, and given the requirements of those requests at the time, the motivation quickly diminished. Instead the outcome was #787, I think (@pp-mo can correct me if I'm not recalling correctly). That is, there is some Iris functionality specifically targeted at helping users 'cache' data using pickle.
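(As background on why pickleability matters for multiprocessing: multiprocessing.Pool sends arguments and results between processes by pickling them, so any cube passed to or returned from a worker must be picklable. A minimal sketch; the filename and worker function are illustrative:)

```python
import multiprocessing

import iris


def summarise(cube):
    # The cube is pickled on its way to the worker process, and the
    # result is pickled on the way back to the parent.
    return cube.name(), float(cube.data.mean())


if __name__ == '__main__':
    cubes = iris.load('my_data.nc')  # illustrative filename
    pool = multiprocessing.Pool()
    try:
        results = pool.map(summarise, cubes)
    finally:
        pool.close()
        pool.join()
    print(results)
```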
Thanks @cpelley!
I agree. I'm not against the documentation proposed here, just its proposed location. I actually mentioned to @malcolmbrooks offline that I think we could do with another Iris documentation whitepaper that's all about saving, as we've had a number of queries about the save process recently. I reckon something like this could fit into such a page.
Pickling cubes and cube lists can be very useful in certain situations. This change adds a short section on pickling cubes to the user guide section on saving.
The limitations of a pickle file are clearly stated: it is not a portable format, and it will break if anything changes in the environment it was created in.
On the other hand, pickling a cube or cube list can be very fast, and there are situations where that is useful.