Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to open netcdf file from remote HTTPS server #1083

Merged

Conversation

konstntokas
Copy link
Contributor

@konstntokas konstntokas commented Nov 5, 2024

This PR adjusts the DatasetNetcdfFsDataAccessor class, so that NetCDF files can be now opened from a remote HTTPS server. It adjusts the file uri to http(s)://*#mode=bytes, which allows to simply use xarray.open_dataset to open the data remotely. This also supports lazy when setting chunks={} or to a specific tile size, as used before.

Checklist:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/source/*
  • Changes documented in CHANGES.md
  • GitHub CI passes
  • AppVeyor CI passes
  • Test coverage remains or increases (target 100%)

"""
fs_path = "download/10.5880.GFZ.1.4.2023.006-VEnuo/GAMIv2-0_2010-2020_100m.nc"
store = new_data_store("https", root="datapub.gfz-potsdam.de")
ds = store.open_data(fs_path, chunks={})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loads the data from remote source which is not wanted in a unittest. I would suggest using pytest-recording, like in xcube-stac. This would mean that we add a new dependency pytest-recording to the test suite. Would that be okay? Or do you have a better suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ok. Unit tests should not depend on external resources. Replace by memory filesystem in that you first write the un-chunked netcdf file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I misunderstood the intent of this PR. It is just about HTTP access. Yes, fine, use pytest-recording.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forman pytest-recording does not recognize the request when opening the data with xr.open_dataset. Therefore no cassette is written and it just uses the actual https request. I don't think we can use pytest-recording in this case. Probably we still need to build a mock, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mocked the result of xarray.open_dataset now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is fine!

@forman forman self-requested a review November 5, 2024 09:40
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggest yourself, please use pytest-recording.

"""
fs_path = "download/10.5880.GFZ.1.4.2023.006-VEnuo/GAMIv2-0_2010-2020_100m.nc"
store = new_data_store("https", root="datapub.gfz-potsdam.de")
ds = store.open_data(fs_path, chunks={})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not ok. Unit tests should not depend on external resources. Replace by memory filesystem in that you first write the un-chunked netcdf file.

"""
fs_path = "download/10.5880.GFZ.1.4.2023.006-VEnuo/GAMIv2-0_2010-2020_100m.nc"
store = new_data_store("https", root="datapub.gfz-potsdam.de")
ds = store.open_data(fs_path, chunks={})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I misunderstood the intent of this PR. It is just about HTTP access. Yes, fine, use pytest-recording.

@konstntokas konstntokas marked this pull request as ready for review November 5, 2024 13:50
@konstntokas konstntokas requested a review from forman November 5, 2024 13:59
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good - but just see my suggestion.

You can merge otherwise.

CHANGES.md Outdated
Comment on lines 5 to 6
* The `DatasetNetcdfFsDataAccessor` class has been adjusted, so that NetCDF files can
be now opened from a remote HTTPS server using the `"https"` data store.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take the perspective of a user. The user facing API in this case is the data store framework. How is it improved?

@forman forman self-requested a review November 5, 2024 14:14
konstntokas and others added 2 commits November 5, 2024 15:23
Co-authored-by: Norman Fomferra <norman.fomferra@brockmann-consult.de>
@konstntokas konstntokas merged commit 6faa358 into main Nov 5, 2024
3 checks passed
@konstntokas konstntokas deleted the konstntokas-xxx-allow_remote_https_lazy_access_netcdf branch November 5, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants