Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directly allow zarr arrays as "out" parameter of ufuncs (if possible) #566

Open
s-m-e opened this issue May 20, 2020 · 4 comments
Open

Directly allow zarr arrays as "out" parameter of ufuncs (if possible) #566

s-m-e opened this issue May 20, 2020 · 4 comments
Labels
enhancement New features or improvements

Comments

@s-m-e
Copy link

s-m-e commented May 20, 2020

I just ran into the following "problem":

import numpy as np
import zarr
a = np.arange(0, 16, dtype = 'float32').reshape(4, 4)
b = zarr.open('test.zarr', mode = 'w', shape = (4, 4), chunks = (2, 2), dtype = 'f4')
np.multiply(a, 2, out = b[:, :]) # does not change file(s) on disk
np.multiply(a, 2, out = b) # TypeError: return arrays must be of ArrayType

Some ufuncs (but not those from numpy itself) allow zarr arrays as out-parameters. I.e. the result of the computation is directly written to disk. numpy in fact rejects zarr arrays because they are not of "ArrayType". Could this be changed? I am not entirely sure given the nature of numpy's memory management.

@alimanfoo alimanfoo transferred this issue from zarr-developers/zarr-specs May 20, 2020
@alimanfoo
Copy link
Member

Thanks @s-m-e, this might be one to raise with the numpy folks, I don't know how easy it would be for numpy to allow zarr arrays as out parameters.

@jakirkham
Copy link
Member

Maybe @rgommers would know? 🙂

@rgommers
Copy link

Some ufuncs (but not those from numpy itself) allow zarr arrays as out-parameters

Interesting. Can you point to an example @s-m-e? Other ufuncs, like in scipy.special, typically use the same numpy machinery as numpy itself to create ufuncs, so this is surprising to me.

ufuncs aren't going to be writing to disk, I think it requires ndarray or a subclass of it.

@s-m-e
Copy link
Author

s-m-e commented May 23, 2020

@rgommers I have been wondering about this, too. I have been digging through old code and so far I believe I have been "a victim" of Python functions "pretending" to be a (C-) ufunc and clever duck typing - i.e. something along those lines:

a = np.arange(10, 18, dtype = 'f4')
b = zarr.zeros((8,), chunks = (2,), dtype = 'f4')

isarray = lambda var: all((hasattr(var, attr) for attr in ('dtype', 'shape', 'ndim', 'astype')))
isscalar = lambda var: any((isinstance(var, number) for number in (int, float, np.number)))

def multiply_py(x, y, out = None, order = 'K', dtype = None):
    assert isarray(x)
    assert x.ndim == 1
    assert isscalar(y)
    assert out is None or isarray(out)
    if out is None:
        out = np.zeros(x.shape, order = order, dtype = dtype if dtype is not None else x.dtype)
    for i in range(x.shape[0]):
        out[i] = x[i] * y
    return out

multiply_py(a, 2, out = b)

I am trying to locate an example in an open source library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New features or improvements
Projects
None yet
Development

No branches or pull requests

5 participants