Feature variable names and the road to a combined dataset #80
Comments
Hi Eric, I'm definitely supportive of this. My thoughts on your two points, in reverse order:
Agreed with @w-k-jones. A good way to ease the transition from v1.x to v2.x might be to let users request xarray output. In a v1.x version of this function, we could also automatically convert from Iris (segmentation) and Pandas (feature detection/tracking) to xarray. We already have xarray as a requirement, so it shouldn't be that painful. I also agree in principle that we should allow users to specify the names of the variables, but I will note that it could cause some confusion as/if users share data. It would also cause us a compatibility headache as we pass data back and forth between functions, especially as the various pieces of tobac grow. I'm not sure what precedent other similar libraries set here. For the descriptions, I completely agree with @deeplycloudy: these metadata should absolutely be included in our output.
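As a minimal sketch of the "request xarray output" idea, assuming a v1.x-style Pandas feature table with a unique integer feature column, the conversion plus descriptive metadata could look like the following. The helper features_to_xarray, the long_name text, and the other column names are hypothetical and illustrative only, not tobac's API.

```python
import pandas as pd
import xarray as xr


def features_to_xarray(features, descriptions=None):
    """Convert a one-row-per-feature table into an xarray Dataset.

    Hypothetical helper: assumes a unique integer "feature" column, which
    becomes the Dataset's "feature" dimension coordinate.
    """
    ds = features.set_index("feature").to_xarray()
    # Attach human-readable descriptions as CF-style long_name attributes.
    for name, text in (descriptions or {}).items():
        if name in ds.data_vars:
            ds[name].attrs["long_name"] = text
    return ds


# Illustrative feature table; column names are assumptions, not tobac's spec.
features = pd.DataFrame(
    {
        "feature": [1, 2, 3],
        "frame": [0, 0, 1],
        "hdim_1": [10.0, 42.5, 11.0],
        "hdim_2": [5.0, 80.0, 6.5],
    }
)
ds = features_to_xarray(
    features, {"hdim_1": "feature position along first horizontal array dimension"}
)
print(ds.sizes["feature"])  # 3
```

For the segmentation side, xarray also provides DataArray.from_iris for converting an Iris cube, so both kinds of v1.x output could be routed into xarray without touching the underlying algorithms.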
Regarding the ability to request xarray output: definitely. This would pull some clutter out of our scripts and would nudge users toward xarray, giving them a path to adopt it as we look toward v2.0. I think breaking compatibility at this time probably goes too far. However, if we add any features to tobac for cells and tracks (i.e., new feature parents in the hierarchy), we could end up with duplicate names. Maybe we keep the current names for features and then adopt a prefix-style convention (as above) for any higher levels that we add? I also concur with @freemansw1 that we need something consistent internally so that data structures work with any function. If we do move to a user-defined naming convention, we could add a …
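The "keep feature names, prefix higher levels" proposal can be sketched as a simple rename mapping. This is a minimal illustration under stated assumptions: the helper prefix_names and the cell-level variable names ("area", "lifetime") are hypothetical, not existing tobac variables.

```python
def prefix_names(names, level, keep=()):
    """Build a rename mapping that tags variables with their hierarchy level.

    Hypothetical helper: "area" at the cell level becomes "cell_area". Names in
    `keep`, and names that already carry the prefix, map to themselves.
    """
    prefix = f"{level}_"
    return {
        name: name if name in keep or name.startswith(prefix) else prefix + name
        for name in names
    }


# Feature-level names stay as they are today; only the new cell level is prefixed.
feature_vars = ["feature", "frame", "area", "threshold_value"]
cell_vars = ["cell", "area", "lifetime"]
cell_map = prefix_names(cell_vars, "cell", keep=("cell",))
print(cell_map)  # {'cell': 'cell', 'area': 'cell_area', 'lifetime': 'cell_lifetime'}

# After prefixing, the feature-level and cell-level "area" no longer collide.
combined = set(feature_vars) | set(cell_map.values())
assert len(combined) == len(feature_vars) + len(cell_vars)
```

In an xarray workflow, the non-identity entries of such a mapping could be passed to Dataset.rename. Because the convention is fixed rather than user-defined, every tobac function could rely on the same names internally.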
I believe that this has actually been resolved with the merge of #136, at least on an experimental basis. @deeplycloudy or @kelcyno, any thoughts?
I'm not sure if this fully addresses what @deeplycloudy needed, but a combined xarray dataset of feature detection, segmentation, and tracking output is now available in the utils (standardize_track_dataset) as of the merge of #136.
One of tobac's advantages is that it keeps each step in the tracking process separate. Right now, each step using tobac's various tracking functions produces a separate dataset or array.
However, as I think about producing datasets for sharing among work groups and for long-term archival, I'd like to be able to create a combined data file containing the feature data table, the feature mask, and the tracked-feature cell IDs, along with the projection data for the feature mask.
With some judicious renaming, and keeping in mind the parent-child relationship ideas in the CF-tree proposal, it is in fact possible to combine everything into one dataset. Here's a function that does both jobs. It uses the xarray data structures returned by the v2.0-dev branch.
There are really two issues here:
Note that step (2) is really only a few lines once the tedious step (1) has been accomplished.
The output data structure looks like this, with attributes expanded to be visible for a few variables:

What do we think about including a function like this in tobac? Perhaps this dual-purpose function could still fit in v1.x without causing breakage, and it would establish the idea of combining datasets. Then v2.0 could aim to rename variables more fully by default throughout the library, simplifying the dataset-combining function.
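A minimal sketch of the two jobs (renaming, then combining), assuming v1.x-style inputs rather than the v2.0-dev structures the original function used: a Pandas feature table with feature, frame, time, and cell columns, and an integer segmentation mask labelled with feature IDs. The helper combine_tracking_output and every output variable name are hypothetical, and the parent-ID name only gestures at the CF-tree parent-child idea rather than following that proposal.

```python
import numpy as np
import pandas as pd
import xarray as xr


def combine_tracking_output(features, mask):
    """Merge a feature table and a segmentation mask into one Dataset.

    Hypothetical helper; all output variable names are illustrative.
    """
    # Step 1 (renaming): a per-feature "time" column would clash with the
    # mask's "time" dimension coordinate, so give feature-level variables
    # unambiguous names, including the parent cell ID from tracking.
    renamed = features.set_index("feature").rename(
        columns={
            "time": "feature_time",
            "frame": "feature_time_index",
            "cell": "feature_parent_cell_id",
        }
    )
    feature_ds = renamed.to_xarray()
    # Step 2 (combining): no names are shared any more, so merge needs
    # no conflict resolution.
    return xr.merge([feature_ds, mask.rename("segmentation_mask")])


times = pd.date_range("2024-01-01", periods=2, freq="5min")
features = pd.DataFrame(
    {
        "feature": [1, 2, 3],
        "frame": [0, 0, 1],
        "time": [times[0], times[0], times[1]],
        "hdim_1": [1.0, 2.0, 1.0],
        "hdim_2": [1.0, 3.0, 1.0],
        "cell": [1, 2, 1],  # features 1 and 3 belong to the same tracked cell
    }
)
# Mask on (time, y, x): each segmented pixel holds its feature ID, 0 elsewhere.
mask_values = np.zeros((2, 4, 5), dtype=int)
mask_values[0, 1, 1] = 1
mask_values[0, 2, 3] = 2
mask_values[1, 1, 1] = 3
mask = xr.DataArray(mask_values, dims=("time", "y", "x"), coords={"time": times})

combined = combine_tracking_output(features, mask)
print(combined)
```

As the sketch suggests, once step 1 is done the merge itself is a single call. Writing the result with Dataset.to_netcdf would produce one archival file. The projection (grid-mapping) information for the mask is left out of this sketch.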