Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change constructs access API to facilitate performance improvements #130

Closed
davidhassell opened this issue Mar 31, 2021 · 3 comments · Fixed by #138
Closed

Change constructs access API to facilitate performance improvements #130

davidhassell opened this issue Mar 31, 2021 · 3 comments · Fixed by #138
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@davidhassell
Copy link
Contributor

davidhassell commented Mar 31, 2021

Current situation

Access to named constructs is currently either by the "filter_by_*" Constructs methods, or by Field attributes:

>>> f = cdm.example_field(0)
>>> f.dimension_coordinates
<Constructs: dimension_coordinate(3)>
>>> f.constructs.filter_by_type('dimension_coordinate')
<Constructs: dimension_coordinate(3)>

The reason for the "dimension_coordinates" alias being an attribute rather than method is that it was expected that it would often be followed by more options:

>>> f.dimension_coordinates('time')
<Constructs: dimension_coordinate(1)>
>>> f.dimension_coordinates.filter_by_property(units='degrees_east')
<Constructs: dimension_coordinate(1)>

This is fine, but comes with the slight confusion that the result of the attribute is a callable Constructs instance, which obfuscates the help:

>>> help(f.dimension_coordinates)
Help on Constructs in module cfdm.constructs object:

class Constructs(cfdm.mixin.container.Container, cfdm.core.constructs.Constructs)
 |  Constructs(auxiliary_coordinate=None, dimension_coordinate=None, domain_ancillary=None, field_ancillary=None, cell_measure=None, coordinate_reference=None, domain_axis=None, cell_method=None, source=None, copy=True, _use_data=True, _view=False, _ignore=())
...

Performance

It turns out that the code for accessing constructs is very slow, and must be improved.

A key part of this improvement relies on being able to work with python dictionaries rather than dictionary-like Constructs objects.

To facilitate this, whilst retaining the intuitive nature of the API (both in cfdm library code and in downstream applications) the least intrusive change is to make what were attributes properties: i.e. f.dimension_coordinates becomes f.dimension_coordinates(). This then allows keyword parameters that can change the behaviour when speed is an issue.

Code breaking

This change will break any code that uses bare construct access attributes:

>>> # These won't work any more
>>> f.auxiliary_coordinates  
>>> f.coordinate_references
>>> f.coordinates  
>>> f.cell_measures
>>> f.dimension_coordinates
>>> f.domain_ancillaries
>>> f.domain_axes
>>> f.cell_methods
>>> f.field_ancillaries
>>> # These will work as before
>>> f.auxiliary_coordinates()
>>> f.coordinate_references()
>>> f.coordinates()
>>> f.cell_measures()
>>> f.dimension_coordinates()
>>> f.domain_ancillaries()
>>> f.domain_axes()
>>> f.cell_methods()
>>> f.field_ancillaries()
>>> # These will also work as before
>>> f.auxiliary_coordinates(x)
>>> f.coordinate_references(x)
>>> f.coordinates(x)
>>> f.cell_measures(x)
>>> f.dimension_coordinates(x)
>>> f.domain_ancillaries(x)
>>> f.domain_axes(x)
>>> f.cell_methods(x)
>>> f.field_ancillaries(x)

This is the only backwards incompatible change to the API. All other changes will not break existing code.

With the new API, reading a file is, in one reproducible test, ~10 times faster:

>>> import cfdm, timeit
>>> f = cfdm.example_field(1)
>>> cfdm.write(f, 'tmp.nc')
>>> cfdm.__version__
1.8.8.0
>>> sum(timeit.repeat("cfdm.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.25303671994010074
>>> import cfdm, timeit
>>> cfdm.__version__
1.8.9.0b1
>>> sum(timeit.repeat("cfdm.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
0.026102047999302158

Note that much of the improvement comes from unrelated changes (such as removing unnecessary __repr__ calls and unnecessary deep copies).

PR to follow ...

@davidhassell davidhassell added the enhancement New feature or request label Mar 31, 2021
@davidhassell davidhassell self-assigned this Mar 31, 2021
@davidhassell
Copy link
Contributor Author

These changes will also make a non-backwards compatible change to the mode parameter of filter_by_axis

@davidhassell
Copy link
Contributor Author

These changes will also make a non-backwards compatible change to the mode parameter of filter_by_axis

Whilst I would still like to do this, this change has the potential for existing to not fail but return wrong answers. Perhaps it needs its issue for maximum visibility?

@sadielbartholomew
Copy link
Member

Whilst I would still like to do this, this change has the potential for existing to not fail but return wrong answers. Perhaps it needs its issue for maximum visibility?

That sounds like a good idea to me. If it could lead to wrong answers with an update it should be highlighted in the Changelog and this would help with that too. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants