Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add S3 (with Minio) netCDF open/read and processing and exploratory tests #89

Merged
merged 61 commits into from
Jun 15, 2023

Conversation

valeriupredoi
Copy link
Collaborator

@valeriupredoi valeriupredoi commented Jun 9, 2023

First instance of a functional, end-to-end test of running PyActiveStorage on netCDF files on S3: I used Minio as S3-type storage with @sd109 having confirmed that that thing is identical to AWS S3 (cheers!). The workflow does this:

  • Active now needs a new kwarg storage_type which could be None or could be s3 at the mo
  • if it's s3 then it goes about and does the two file loads that are currently done in its bellows (ie open file, to get metadata/headers, NOT data) via a dedicated S3 mechanism that uses s3fs (note that fsspec does that too, fsspec calls s3fs when it recognized the FS to be S3, so no point using fsspec); then it goes about and uses h5netcdf to put the open file (which is nothing more than a memory view of the netCDF file) into an hdf5/netCDF-like object format
  • chunks access and indexing is done as per normal via the s3_reduce_chunk() func that @markgoddard put in the previous PR, and from there on, Active works as per normal

Lazy transfer from s3fs

@bnlawrence has investigated the play between fsspec and s3fs via h5netcdf and found out thatwhen pulling the file from S3 via s3fs and convert-opening it with h5netcdf things get done lazily ie no actual data transfer, but only file's metadata headers. Cheers, Bryan!

@valeriupredoi valeriupredoi added the testing testing duh label Jun 9, 2023
@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Patch coverage: 74.35% and project coverage change: -1.17 ⚠️

Comparison is base (5becf10) 87.21% compared to head (8bf1454) 86.04%.

❗ Current head 8bf1454 differs from pull request most recent head f401deb. Consider uploading reports for the commit f401deb to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
- Coverage   87.21%   86.04%   -1.17%     
==========================================
  Files           8        8              
  Lines         485      516      +31     
==========================================
+ Hits          423      444      +21     
- Misses         62       72      +10     
Impacted Files Coverage Δ
activestorage/netcdf_to_zarr.py 85.71% <69.23%> (-6.60%) ⬇️
activestorage/active.py 92.13% <76.92%> (-2.77%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@valeriupredoi valeriupredoi added the enhancement New feature or request label Jun 12, 2023
@valeriupredoi valeriupredoi marked this pull request as ready for review June 12, 2023 15:57
@valeriupredoi valeriupredoi changed the title Add S3/Minio exploratory tests Add S3 (with Minio) netCDF open/read and processing and exploratory tests Jun 12, 2023
@valeriupredoi
Copy link
Collaborator Author

gonna merge this since it's getting a bit too big for a regular PR and commit fluff keeps accruing - the only outstanding issue about it is the storage_type kwarg to Active that should/could be picked up from file name as @davidhassell suggested offline - but we can change that later (am not 100% sure we should, though, since future possible storage types may not necessarily be guessed from file name)

@valeriupredoi valeriupredoi merged commit 0c2bbfd into main Jun 15, 2023
@valeriupredoi valeriupredoi deleted the real_world_s3_tests branch June 15, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request testing testing duh
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant