Document cube/cubelist pickling #2262
Conversation
Caching cubes and cube lists to pickle files
--------------------------------------------

It should always be possible to create a temporary cache file containing a cube or cube list using the Python `Pickle <https://docs.python.org/2/library/pickle.html>`_ functionality. This can be useful when the cube or cube list has been lazily loaded, so that the pickle file itself contains only a reference to the data in the original files. In this state, writing and subsequently reading a pickle file is very fast.
cube and cube list are both class objects that Sphinx can link to the documentation for. They also have standardised names within the Iris documentation, being Cube and CubeList respectively. As such, you should in all cases refer to them as follows:

:class:`~iris.cube.Cube`
:class:`~iris.cube.CubeList`
A quick example of saving and reading a pickle file is:

.. code-block:: python

    # import pickle: in python 2.7 cPickle is faster
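(The quoted hunk is truncated at the commented line discussed below. For context, a minimal sketch of the kind of save/read round trip the proposed section describes, not necessarily the PR's exact example; the filenames are illustrative:)

```python
import pickle

import iris

# Lazily load a cube list, then cache it to a pickle file.
cubes = iris.load('my_data.nc')  # illustrative filename
with open('cubes.pkl', 'wb') as fh:
    pickle.dump(cubes, fh)

# Later, restore the cube list from the cache file.
with open('cubes.pkl', 'rb') as fh:
    cubes = pickle.load(fh)
```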
What about py35?

In general it is best not to have entire commented lines of code within a code example. In this case it would be better to discuss pickle vs cPickle in a text section before this code example.
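(For reference, a common pattern for handling pickle vs cPickle in code that must run on both Python 2 and Python 3 is sketched below; on Python 3 the plain pickle module already uses the C accelerator automatically.)

```python
# A compatible import: cPickle is the faster C implementation on
# Python 2; on Python 3 it no longer exists and plain pickle is
# already accelerated.
try:
    import cPickle as pickle
except ImportError:
    import pickle
```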
Thanks Peter, those changes make sense.
Please test pickling with the latest Python netCDF library for both deferred and loaded data. We just ran into issues where netCDF variables are no longer picklable in the latest point release. I don't know if it affects Iris, but it is worth checking.
Thanks @ajdawson, but I think that would be better handled as a separate issue from this documentation change.
@malcolmbrooks - my point was that if pickling doesn't work as expected in Iris when using the latest netCDF version, then it would be a mistake to document it at all... So I think this does need to be checked before we can merge this.
In fact, I'm not in favour of including this information in the Iris documentation at all. Given the inherent problems that may be encountered (lazy data may break if you don't unpickle in exactly the same environment you pickled in, etc.), it seems like a bad idea to promote it as a method of saving cubes. The new documentation doesn't cover anything Iris-specific either; it is just information about how to pickle an object that could be found in the Python documentation, coupled with some reasons why you really wouldn't want to do it. You include one reason why you might want to do it as a developer, but I'm not sure that is a good enough reason to include it in the cube-saving documentation, given the harm it may do if we unintentionally promote this method of saving data to users who don't really need it.
@malcolmbrooks I have to say that I agree with @ajdawson on this. While the content is undoubtedly useful in some situations, it is highly specific, which is something that all Iris documentation tries to stay clear of, not least because specificity in one area opens the door to specificity in other areas. This can lead to documentation that is excessively long, and long documentation is hard to use because it makes it very hard to find the one thing of value that the reader is looking for.
…arallel processing. An unpickleable cube is a problem which should be raised as an issue.
Okay, I'll close this, as there appears to be a consensus that it's not required. Cube pickleability is getting to be really important, though. Applications using Iris that do parallel processing, or have a similar parent-child process way of working, depend on it, and that means a lot of applications using Iris at a larger scale. Having documentation that made it clear that it should always be possible to pickle a cube would have been a useful statement, and there isn't anywhere appropriate in the developer guide for this either. Looking at lib/iris/tests/integration/test_pickle.py, it is not very extensive: it tests only a single cube loaded from GRIB, which doesn't catch the netCDF issue from @ajdawson. This makes me nervous that future versions of Iris could break pickleability and it wouldn't be seen as a big issue. An alternative to this documentation change would be a much more extensive set of pickle tests, with an explanation that makes it very clear why cube pickling is important. That's getting a bit too deep into Iris for me to do though, so I would just raise it as an issue and leave it to those more involved in development. Would that be okay?
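(For illustration of what such an extended round-trip test might look like; this is a sketch only, the file names and formats are assumptions rather than actual Iris test assets, and a real test would use proper Iris test data for each supported format.)

```python
import pickle
import unittest

import iris


class TestCubeListPickle(unittest.TestCase):
    # Illustrative file names only; a real test would cover each
    # supported file format (netCDF, GRIB, PP, ...) using Iris test data.
    filenames = ['example.nc', 'example.grib2', 'example.pp']

    def test_round_trip(self):
        for filename in self.filenames:
            cubes = iris.load(filename)
            # Pickle and unpickle, then check the result still compares
            # equal to the original cube list.
            restored = pickle.loads(
                pickle.dumps(cubes, pickle.HIGHEST_PROTOCOL))
            self.assertEqual(restored, cubes)


if __name__ == '__main__':
    unittest.main()
```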
I'll happily look at this. Our library which uses Iris is very much dependent on cubes being pickleable, as we use multiprocessing. If this testing is missing for the various file formats, we don't want to find out after a new Iris release is tagged. Thanks for bringing this to our attention.

@ajdawson @dkillick regarding the documentation: quite a while ago we (AVD at that time) looked into writing a pickle helper function, driven by user requests. If I recall correctly, on finding that NetCDF turned out to be generally faster for those cases, and given the requirements of those requests at the time, the motivation quickly diminished. Instead the outcome was #787, I think (@pp-mo can correct me if I'm not recalling correctly). That is, there is some Iris functionality specifically targeted at helping users 'cache' data using pickle.
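(As background on why pickleability matters for multiprocessing: multiprocessing.Pool sends arguments and results between processes by pickling them, so any cube passed to or returned from a worker must be picklable. A minimal sketch; the filename and worker function are illustrative:)

```python
import multiprocessing

import iris


def summarise(cube):
    # The cube is pickled on its way to the worker process, and the
    # result is pickled on the way back to the parent.
    return cube.name(), float(cube.data.mean())


if __name__ == '__main__':
    cubes = iris.load('my_data.nc')  # illustrative filename
    pool = multiprocessing.Pool()
    try:
        results = pool.map(summarise, cubes)
    finally:
        pool.close()
        pool.join()
    print(results)
```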
Thanks @cpelley!
I agree. I'm not against the documentation proposed here, just its proposed location. I actually mentioned to @malcolmbrooks offline that I think we could do with another Iris documentation whitepaper that's all about saving, as we've had a number of queries about the save process recently. I reckon something like this could fit into such a page.
Pickling cubes and cube lists can be very useful in certain situations. This change adds a short section on pickling cubes to the user guide section on saving.
The limitations of a pickle file are clearly stated: it is not a portable format, and it will break if anything changes in the environment it was created in.
On the other hand, pickling a cube or cube list can be very fast, and there are situations where that is useful.