Complete dask demo #26

Merged: 4 commits into main from dask-demo, Oct 19, 2022

davidhassell
Collaborator

Hi, not much to say here, other than that I have addressed the commenting points discussed on Thursday. Also, cfdm is no longer a dependency, as I now instantiate things in demo.py as follows:

    # ----------------------------------------------------------------
    # Get the data as a lazy array with active capabilities
    # ----------------------------------------------------------------
    f = NetCDFArray(filename="file.nc", ncvar="q")

    # ----------------------------------------------------------------
    # Get the same data as an in-memory numpy array (with no active
    # capabilities)
    # ----------------------------------------------------------------
    nc = netCDF4.Dataset("file.nc", "r")
    x = nc.variables["q"][...]
    nc.close()

    # ----------------------------------------------------------------
    # Instantiate dask arrays from 'f' and 'x', each with the same
    # arbitrary distribution of dask chunks.
    # ----------------------------------------------------------------
    dask_chunks = (3, 4)
    df = da.from_array(f, chunks=dask_chunks)
    dx = da.from_array(x, chunks=dask_chunks)

@valeriupredoi
Collaborator

boss level, @davidhassell 🍺 Did you want to add a test that runs the demo or some unit/integration tests? No probs if not, we can do that later 👍

@davidhassell
Collaborator Author

Cheers, @valeriupredoi. Don't mind either way on the tests question.

self.ncvar = ncvar

nc = netCDF4.Dataset(self.filename, "r")
v = nc.variables[self.ncvar]
Collaborator

What happens if the keyword ncvar is not present? Would it be better to pass a NetCDF dataset (already instantiated) in along with a variable name (both required)?

Collaborator

(There would be some consequences. I would work through them, but you will have seen I have a git problem right now.)

Collaborator Author

You'll get an exception with no ncvar or no filename. I don't think that it makes sense to pass in a netCDF4.Dataset instance, as on principle we don't want those hanging around as open file handles, but we could make the __init__ args positional-only. I'll change that.
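
For illustration only, here is a minimal sketch of what positional-only `__init__` arguments might look like. The class name `LazyVariableSketch` and the helper `try_construct` are hypothetical and are not the project's code; the sketch assumes Python 3.8+ (for the `/` marker) and does not open any file.

```python
class LazyVariableSketch:
    """Hypothetical stand-in for the PR's NetCDFArray; not the real class."""

    def __init__(self, filename, ncvar, /):
        # Neither parameter has a default, so omitting one raises TypeError.
        # The "/" marks both as positional-only, so keywords are rejected too.
        self.filename = filename
        self.ncvar = ncvar


def try_construct(*args, **kwargs):
    """Return the name of the exception raised on construction, or None."""
    try:
        LazyVariableSketch(*args, **kwargs)
    except TypeError as error:
        return type(error).__name__
    return None


ok = try_construct("file.nc", "q")                      # both given positionally
missing = try_construct("file.nc")                      # ncvar omitted
keyword = try_construct(filename="file.nc", ncvar="q")  # passed by keyword
```

Note that with a signature like this, a keyword-style call such as the one in the demo snippet would have to become positional, so whether that trade-off is wanted is a design choice for the maintainers.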

Collaborator

@bnlawrence bnlawrence left a comment

I'm happy with this. @valeriupredoi, can you please merge?

@valeriupredoi
Collaborator

cheers, gents! 🍺

@valeriupredoi valeriupredoi merged commit 3a6bc7a into main Oct 19, 2022
@valeriupredoi valeriupredoi deleted the dask-demo branch October 19, 2022 14:34
@valeriupredoi valeriupredoi added documentation Improvements or additions to documentation Dask labels Oct 24, 2022