Storing Sparse arrays to Zarr #222

Open
jakirkham opened this issue Dec 15, 2018 · 6 comments
Labels
enhancement Indicates new feature requests

Comments

@jakirkham

Periodically, users have asked for a way to store sparse arrays with Zarr. TBH, this is already pretty doable today, as demonstrated by this comment, and the same strategy works nicely with Sparse. Admittedly, those examples use in-memory Zarr Arrays, though the approach would work just as well with any MutableMapping-derived store.

Given there seems to be a fair bit of interest in working with more general N-D sparse arrays and in having a flexible way to store them, I am wondering if it makes sense to provide some functionality in Sparse to store data in Zarr Arrays. Happy to answer any questions. I would also be interested to hear thoughts on this proposal. :)
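
A minimal sketch of the general idea, not taken from the linked comment: a COO-style sparse array is fully described by its coordinates, its stored values, and its shape, so each piece can be written out as an ordinary dense array. The helper names `store_coo`, `load_coo`, and `to_dense` are hypothetical, and a plain `dict` stands in for the container. The assumption is that a Zarr group would take its place and accept the same kind of item assignment.

```python
import numpy as np


def store_coo(container, coords, data, shape):
    """Write the three pieces of a COO array into a mapping-like container."""
    container["coords"] = np.asarray(coords)   # (ndim, nnz) integer indices
    container["data"] = np.asarray(data)       # (nnz,) stored values
    container["shape"] = np.asarray(shape)     # (ndim,) dense shape


def load_coo(container):
    """Read the pieces back as NumPy arrays plus a shape tuple."""
    coords = np.asarray(container["coords"][:])
    data = np.asarray(container["data"][:])
    shape = tuple(int(n) for n in container["shape"][:])
    return coords, data, shape


def to_dense(coords, data, shape):
    """Rebuild the dense array; used here only to check the round trip."""
    dense = np.zeros(shape, dtype=data.dtype)
    dense[tuple(coords)] = data
    return dense


# Build a small dense array and extract its COO pieces.
dense = np.zeros((3, 4), dtype=np.int64)
dense[0, 1] = 5
dense[2, 3] = 7
coords = np.stack(np.nonzero(dense))   # shape (2, nnz)
data = dense[tuple(coords)]

container = {}                         # stand-in for a Zarr group (assumption)
store_coo(container, coords, data, dense.shape)
restored = to_dense(*load_coo(container))
```

With Sparse, the three inputs would correspond to `COO.coords`, `COO.data`, and `COO.shape`.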

@hameerabbasi
Collaborator

Just so I understand: are you proposing making this library a back-end for Zarr, or making Zarr a back-end for this library?

The issue with the latter is that we use Numba to perform a number of operations (including arithmetic and indexing). Does Numba work with Zarr?

The other issue is that at this point in time, DOK is mutable but COO isn't.

@hameerabbasi
Collaborator

It would be nice if you could enumerate what API or changes you would need in this library to make this possible -- I'm definitely willing to work with the Zarr team.

@jakirkham
Author

Short-term, having a way to load a Zarr Array into a Sparse array and to store a Sparse array into a Zarr Array would be pretty good. These could be similar to the from_numpy and todense methods. The latter case is pretty much solved; it would just benefit from having a convenience method. The former should be solvable in any number of ways. As Zarr seems kind of similar to DOK, maybe that would be the easiest way to load it in.

Long-term, it would be interesting to have a Zarr-backed Sparse array. The main benefits here would be working with larger-than-memory sparse arrays and/or working with other storage backends. However, this will take some more thought, as you have noted.
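
One way the DOK-style load could look, as a rough sketch under stated assumptions: walk the array chunk by chunk and keep only the entries that differ from the fill value, keyed by their full index. The function name `chunks_to_dok` is hypothetical, and a NumPy array stands in for the Zarr Array. The assumption is that a Zarr Array would be sliced the same way, with `chunk_shape` and `fill_value` taken from its `chunks` and `fill_value` attributes.

```python
import itertools

import numpy as np


def chunks_to_dok(array, chunk_shape, fill_value=0):
    """Collect non-fill entries into a {index tuple: value} dictionary."""
    dok = {}
    # Origins of every chunk along each axis.
    starts = [range(0, n, c) for n, c in zip(array.shape, chunk_shape)]
    for origin in itertools.product(*starts):
        window = tuple(slice(o, o + c) for o, c in zip(origin, chunk_shape))
        block = np.asarray(array[window])      # one chunk in memory at a time
        for local in zip(*np.nonzero(block != fill_value)):
            # Shift the chunk-local index back to a global index.
            index = tuple(int(o + i) for o, i in zip(origin, local))
            dok[index] = block[local].item()
    return dok


dense = np.zeros((4, 6))
dense[0, 5] = 1.5
dense[3, 2] = -2.0
dok = chunks_to_dok(dense, chunk_shape=(2, 4))
```

Only one chunk is held in memory at a time, which is the property that would matter for larger-than-memory arrays.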

@hameerabbasi
Collaborator

Okay, I just started thinking about this... Since the long-term goal of this project is to have SciPy depend on it, a dependency (even if optional) on Zarr wouldn't be so nice.

Of course, feel free to duck-patch COO on import so that sparse.COO.to_zarr and sparse.COO.from_zarr exist and do the right thing. 😄

As long as it doesn't rely on fringe functionality of sparse it should be fine.
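
A minimal sketch of what such a patch might look like, relying only on `coords`, `data`, and `shape`. Everything here is illustrative: `TinyCOO` is a hypothetical stand-in class, `patch`, `_to_zarr`, and `_from_zarr` are made-up names, and a plain `dict` stands in for a Zarr group on the assumption that a group accepts the same item assignment.

```python
import numpy as np


class TinyCOO:
    """Hypothetical stand-in exposing only coords, data, and shape."""

    def __init__(self, coords, data, shape):
        self.coords = np.asarray(coords)
        self.data = np.asarray(data)
        self.shape = tuple(shape)


def _to_zarr(self, group):
    """Write this array's pieces into a mapping-like group."""
    group["coords"] = self.coords
    group["data"] = self.data
    group["shape"] = np.asarray(self.shape)


def _from_zarr(cls, group):
    """Rebuild an instance of cls from a mapping-like group."""
    shape = tuple(int(n) for n in group["shape"][:])
    return cls(group["coords"][:], group["data"][:], shape)


def patch(cls):
    """Attach to_zarr and from_zarr to cls without editing its source."""
    cls.to_zarr = _to_zarr
    cls.from_zarr = classmethod(_from_zarr)
    return cls


patch(TinyCOO)

group = {}  # stand-in for a Zarr group (assumption)
original = TinyCOO([[0, 2], [1, 3]], [5, 7], (3, 4))
original.to_zarr(group)
loaded = TinyCOO.from_zarr(group)
```

In the scenario described above, the `patch` call would run when the Zarr-side helper module is imported, with the real sparse class passed in place of `TinyCOO`.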

@hameerabbasi
Collaborator

Of course, I'd recommend patching SparseArray.? instead, and then doing .toformat(COO) or cls(coo_arr).

@hameerabbasi added the enhancement label on Mar 15, 2019
@daletovar
Contributor

I recently raised a Zarr issue on this: zarr-developers/zarr-python#424. I'm not sure what will come of it. Regardless, I like the idea of having the saving functionality live in Zarr.
