Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NameConstraint with relaxed name loading #3463

Merged

Conversation

bjlittle
Copy link
Member

@bjlittle bjlittle commented Oct 14, 2019

This PR introduces the names property to both cubes and coordinates - a property that returns a tuple of the standard_name, long_name, var_name and STASH.

This property is needed as a means to support more flexible name loading i.e., a iris._constraints.Constraint._coordless_match that will succeed for a match on either of the names property values, rather than depending on the behaviour of the iris._cube_coord_common.CFVariableMixin.name method. Note that, this also applies to iris.cube.Cube.extract and iris.cube.CubeList.extract et al.

This PR also introduces the iris.NameConstraint for more specific control of names related matching.


Closes #3407

@bjlittle
Copy link
Member Author

Requires a whatsnew entry...

Copy link
Member

@rcomer rcomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m curious about the inclusion of STASH in the names property. It’s a code, rather than a name. Also we have advice to use an AttributeConstraint on STASH, further down that same section of user guide.

iris.NameConstraint(standard_name='air_temperature',
STASH=lambda stash: stash.item == 203

.. note:: Name constraint names are case sensitive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name constraint names are case sensitive. Is it worth having an example where the lambda uses e.g. long_name.lower(), to show how you can make a non case sensitive constraint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcomer Just to be clear (and perhaps this is what requires to be communicated more clearly) what I mean here is that it's:

  • STASH=lambda stash: stash.item == 203, and
  • not stash=lambda stash: stash.item == 203

It's the keyword argument key that is case sensitive, and the user can't change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcomer I've updated the doc-string in an attempt to clarify, HTH 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, thanks 😀

@pp-mo
Copy link
Member

pp-mo commented Oct 15, 2019

I haven't time to look at this in detail yet.

But I want to say right away that I'm strongly opposed to the specific support for the STASH attribute.
Not just because it is format-specific, which seems pretty questionable anyway, but also because supporting UM files as a special case is highly MetOffice-centric, which I think is unhelpful to the project as a whole.
I recently proposed a similar load/save control attribute for GRIB : SciTools/iris-grib#156. So should that maybe be added in here too ? I really don't think so.

@rcomer
Copy link
Member

rcomer commented Oct 15, 2019

format-specific, which seems pretty questionable anyway

I was thinking that, but then I remembered that so is var_name....


"""
self._names = names
super(NameConstraint, self).__init__(cube_func=self._cube_func)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're in Python 3 now, I believe this can be written as super().__init__(cube_func=self._cube_func).

@bjlittle
Copy link
Member Author

bjlittle commented Oct 15, 2019

@pp-mo This isn't anything new. This PR is just highlighting the existing behaviour of iris that was previously implicit i.e., the iris._cube_coord_common.CFVariableMixin.name method will return the STASH if the cube doesn't have a standard_name, long_name or var_name. Therefore you can filter on STASH for loading and extracting:

>>> print(cube)
air_temperature / (K)               (latitude: 73; longitude: 96)
     Dimension coordinates:
          latitude                           x              -
          longitude                          -              x
     Scalar coordinates:
          forecast_period: 6477 hours, bound=(-28083.0, 6477.0) hours
          forecast_reference_time: 1998-03-01 03:00:00
          pressure: 1000.0 hPa
          time: 1998-12-01 00:00:00, bound=(1994-12-01 00:00:00, 1998-12-01 00:00:00)
     Attributes:
          STASH: m01s16i203
          source: Data from Met Office Unified Model
     Cell methods:
          mean within years: time
          mean over years: time
>>> cube.standard_name = None
>>> cube.extract('m01s16i203')
<iris 'Cube' of m01s16i203 / (K) (latitude: 73; longitude: 96)>

The above example is using iris from master (prior to this PR) highlights this.

So you seem to be opposed to something that already exists. I'm not quite sure how that can be addressed here in this PR?

@stephenworsley
Copy link
Contributor

As it stands, I don't think the iris.NameConstraint is able to precisely emulate the past behaviour of _coordless_match in constraints. Comparing against any one of standard_name, long_name, var_name or STASH is not the same as comparing against cube.name(). Would it make sense to add name to names or to compare against it in iris.NameConstraint?

@bjlittle
Copy link
Member Author

bjlittle commented Oct 15, 2019

@stephenworsley The iris.NameConstraint is not emulating any past behaviour, it's new behaviour.

It's allowing you to perform a conjunction of names for a constraint match against a cube. It has no association whatsoever with the _coordless_match. It's targeting a use case where the standard_name is overloaded and cubes can only by discriminated by long_name, hence the need to easily, within our constraints framework, constrain against standard_name and long_name.

@pp-mo
Copy link
Member

pp-mo commented Oct 15, 2019

cube.extract('m01s16i203')
So you seem to be opposed to something that already exists. I'm not quite sure how that can be addressed here in this PR?

Well, I didn't actually know that !

Mind you, even if it does that already, I don't have to like it.
As for changing behaviour, you are certainly doing that anyway. Isn't that exactly why we're doing this now??

@bjlittle bjlittle force-pushed the relax-name-loading-and-add-name-contraint branch from 48e7e76 to a759170 Compare October 15, 2019 10:24
@pp-mo
Copy link
Member

pp-mo commented Oct 15, 2019

format-specific, which seems pretty questionable anyway

I was thinking that, but then I remembered that so is var_name....

I'm not too convinced by that. var_name is a part of the Iris data model, following CF, not just a product of the format loader : Any other file-format loader could set it, if that was appropriate.

I guess In a sense that "could" be true of STASH too, but I don't see how it could have an unambiguous value for any data not from a UM source.

@stephenworsley
Copy link
Contributor

@bjlittle I was considering the acceptance criteria of #3407, specifically:

  • Include an example of how to load a cube in a manner that replicates the current behaviour, so users can easily retain our original functionality should they needed to.

Is this backwards compatibility being addressed? I would have thought there would still need to be some way to compare against Cube.name() using a Constraint in order to be fully backwards compatible. There doesn't seem to be anywhere in iris._constraints which calls Cube.name() any more.

(On a related note, this line in the docstring probably needs changing too).

@bjlittle
Copy link
Member Author

bjlittle commented Oct 15, 2019

@stephenworsley Thanks. We don't need to be fully backwards compatible here, regardless of the demands of #3407, hence the major release. This PR is addressing several use cases, not just #3407.

Given the relaxed load/extract behaviour of this PR my expectation is that most users will see no change in behaviour. It'll just work. For those that find the relaxed behaviour results in additional cubes being returned, then iris.NameConstraint is their new best friend. It seems a very reasonable compromise to me.

If you think this isn't the case, then I'd love to know, with a concrete example to support it, if possible.

@stephenworsley
Copy link
Contributor

stephenworsley commented Oct 15, 2019

@bjlittle Have you considered extracting cubes with no names whos name() is 'unknown'? Its not unreasonable to imagine a situation where you might even expect some cubes to have no names and some to have the name 'unknown' and to want to treat those two cases similarly. It may well be possible to get the same results using NameConstraint, but probably not as elegantly as with the current behaviour.

@@ -454,7 +454,7 @@ def __init__(self, **attributes):

"""
self._attributes = attributes
Constraint.__init__(self, cube_func=self._cube_func)
super().__init__(cube_func=self._cube_func)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one comes for free, whilst I'm here...

@bjlittle
Copy link
Member Author

I'm going to add a convenience to NameConstraint where it will be possible to use positional arguments or/and keyword arguments e.g., where the following constraints are all equivalent:

NameConstraint(standard_name='air_temperature', long_name='air temp')
NameConstraint('air_temperature', 'air temp')
NameConstraint('air_temperature', long_name='air temp')

etc... as I can imaging users typically performing a NameConstraint(None, 'air temp') type of constraint.

@bjlittle bjlittle force-pushed the relax-name-loading-and-add-name-contraint branch from bb75f43 to 258b3ab Compare October 17, 2019 00:57
@bjlittle
Copy link
Member Author

@lbdreyer I'm now satisfied with this PR. It meets a thorny brief in an area of iris that I would consider heavy with technical debt, and a slightly dubious/not-fully-considered strategy for name constraint matching.

The changes in this PR provide the user with more explicit and flexible control to load and extract cubes.

On reflection, we have been relying on a very weak contract to identify cubes through the iris.cube.Cube.name() method. A richer form of this is the iris.cube.Cube.names property, which comes closer to identifying a cube through metadata - it's not perfect, but it's certainly better. The iris.NameConstraint embodies this within the context of our constraints framework.

Anyways, over to you. I'm bias, but I'd vote for banking this PR, moving forward, and supplement it with further documentation and/or doc-strings if you deem that further clarity is required.

@lbdreyer lbdreyer removed their assignment Oct 17, 2019
stash_name = self.attributes.get('STASH')
if stash_name is not None:
stash_name = str(stash_name)
return (standard_name, long_name, var_name, stash_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the addition of this names method on CFVariableMixin.

stash_name makes sense for a cube but not a coordinate (or cell measure or ancillary variable etc )

Copy link
Member Author

@bjlittle bjlittle Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of the names property is in total alignment with the existing name method and its behaviour.

The name method also doesn't make sense for anything other than a cube, and yet we have it applied to coordinates and else where. So your argument also applies to that, and yet it still stands. This PR is highlighting existing behaviour already available in iris - to the surprise of @pp-mo.

So either we honour this relationship, or we remove the use of STASH from both the names property and the name method. If we did, then that would be a breaking change for the API, rather than an additive change.

Note that, the names property is only public as it's referenced from within iris.Constraint._coordless_match method.


"""
def __init__(self, standard_name='none', long_name='none',
var_name='none', STASH='none'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble is that this does introduce another way of constraining on a stash code, which could now be done either by NameConstraint or AttributeConstraint, which goes against the zen of Python

Copy link
Member Author

@bjlittle bjlittle Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is in alignment with the name method. As it stands, it's completely possible to perform a stash constraint at least four other ways that I know of, excluding this PR, without thinking about it to much. IMHO iris is so anti-zen here already, that really the whole interface needs a re-design.

That said, the NameConstraint gives explicit control back to the user, so it does have value, and it also rights some wrongs to boot. The question that really needs answering here is whether stash should be part of the name method, names property and NameConstraint?

I know what my answer is, but I'm keen to know what others think... and in formulating an answer I suspect the conflict will come between a purist zen of Python philosophy and user convenience to avoid an ugly iris constraints framework.

Copy link
Member

@pp-mo pp-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of small suggestions, nothing earthshaking 📳

@pp-mo
Copy link
Member

pp-mo commented Nov 15, 2019

Thanks @bjlittle.
Over to you 🎾

@bjlittle bjlittle force-pushed the relax-name-loading-and-add-name-contraint branch from 7ba58bb to 1e10bda Compare November 20, 2019 09:40
@bjlittle
Copy link
Member Author

@pp-mo Serviced requested review actions or commented on why not.

Back to you 👍

@pp-mo
Copy link
Member

pp-mo commented Nov 20, 2019

Nice, definitely improved.
Thanks @bjlittle, back to you again.

Just a couple of outstanding queries. And one brand new one...
Well, we must be close ! 🤞

@bjlittle
Copy link
Member Author

bjlittle commented Nov 20, 2019

@pp-mo Okay, this PR is nearing saturation point for discovering what new actions you want me to take...there is a lot of noise.

Okay, I've tackled a couple of the issues that you raised. If you've raised more and I've not addressed them, then that's simply because I can't find them, sorry.

BTW I've just nuked the additonal note in the doc-string - on reflection, I really don't think it's necessary nor appropriate.

Please let me know if there are more outstanding actions to perform or if I've missed something.

Thanks!

@pp-mo pp-mo merged commit 2d75d7f into SciTools:master Nov 20, 2019
@bjlittle
Copy link
Member Author

@pp-mo Sweet 🎉

Thanks for perservering... above and beyond! 👏

@bjlittle bjlittle deleted the relax-name-loading-and-add-name-contraint branch December 6, 2019 14:48
trexfeathers pushed a commit to trexfeathers/iris that referenced this pull request Jan 14, 2020
pp-mo pushed a commit to pp-mo/iris that referenced this pull request Jan 14, 2020
pp-mo pushed a commit to pp-mo/iris that referenced this pull request Jan 14, 2020
trexfeathers added a commit that referenced this pull request Jan 16, 2020
* Bump version to 2.4.0rc0.

* unpin mpl (#3468)

* Merge pull request #3301 from bayliffe/fastpercentilemethod_mask_test

Analysis percentile method - update applicability test for fast_percentile_method

* Have Travis test with iris-grib, remove problem tests (#3469)

* Have Travis test with iris-grib, remove problem tests

* mock GribInternalError correctly

* Update license headers

* account for changes in handling of grib message defaults

* Test against the latest version of python-eccodes

* Moved irir-grib skip to iris.tests

* Merge pull request #2608 from cpelley/PICKLEABLE_FORMATS

TEST: Extends #2569 to unpickle

* _regrid_area_weighted_array: Move axes creation over which weights are calculated to before loop (#3519)

* Purge iris.experimental.regrid np<1.7 support (#3539)

* Add NameConstraint with relaxed name loading (#3463)

* _regrid_area_weighted_array:  move indices variable nearer to use (#3564)

* _regrid_area_weighted_array: Tweak variable order to near other use in code (#3571)

* Fix problems with export and echo command. (#3577)

* Pushdocs fix2 (#3580)

* Revert to single-line command for doctr invocation.

* Added script comment, partly to force Github respin.

* Bracketed six.moves and __future__ imports.

* Fixes required due to the release of iris-grib v0.15.0 (#3582)

* Fix python-eccodes pin in travis (#3593)

* PI-2472: Optimise the area weighted regridding routine (#3598)

* PI-2472: Tweak area weighting regrid move xdim and ydim axes (#3594)

* _regrid_area_weighted_array: Set axis order to y_dim, x_dim last dimensions

* _regrid_area_weighted_array: Extra tests for axes ordering

* PI-2472: Tweak area weighting regrid enforce xdim ydim (#3595)

* _regrid_area_weighted_array: Set axis order to y_dim, x_dim last dimensions

* _regrid_area_weighted_array: Extra tests for axes ordering

* _regrid_area_weighted_array: Ensure x_dim and y_dim

* PI-2472: Tweak area weighting regrid move averaging out of loop (#3596)

* _regrid_area_weighted_array: Refactor weights and move averaging outside loop

* Pin pillow to make graphics tests work again. (#3630)

* PI-2472: Area weighted regridding (#3623)

* The Area-weights routine is refactored into the "__prepare" and "__perform" structure, in-line with our other regridding methods.
* The area-weights are now computed in the "__prepare", so are calculated once.
* The information required for checking the regridding weights and target grid info are now cached on the "regridder" object.
* This is inline with the general use already described in the documentation.

* pep8 conformance.

* pep8 compliance.

* Allow some 'key=None' args in Geostationary creation. (#3628)

LGTM

* Allow some 'key=None' args in Geostationary creation.

* Integration test loading netcdf geostationary without offset properties.

* pep8 conformance.

* pep8 conformance.

* pep8 conformance.

* test_NameConstraint get mock from iris.tests.

* Remove use of super() in _constraints.py for Py2 compatibility.

* Updated license headers.

* Updated iris-grib reference in extensions.txt.

* Py2 support for iris-grib in Travis.

* Updates for auto docs for Iris 2.4 release.

* What's new entry to unpinning mpl.

* Edited Py2 support for iris-grib in Travis.

* Renamed whatsnew contributions folder for v2.4.

* Hacked tests.integration.test_grib2 to avoid import error from iris-grib version < 0.15.

* Only test grib with python 3.

* Compiled v2.4 whatsnew.

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
Co-authored-by: Bill Little <bill.little@metoffice.gov.uk>
Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com>
Co-authored-by: abooton <anna.booton@metoffice.gov.uk>
Co-authored-by: lbdreyer <lbdreyer@users.noreply.github.com>
Co-authored-by: Emma Hogan <ehogan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve loading with "name" string to return all available data
7 participants