Hopefully this is the right thing to do. I wanted to write out my entire plan before I start making too much of a final push so that people can review it beforehand. This is a large change that touches many different parts of the code but should have fairly minimal changes to the overall working/syntax. It should also clean up/speed up some functionality that has been lacking.
Describe the functionality you would like to see.
I apologize for the large number of issues and pull requests that I have created over the last couple of months. Admittedly there was some discovery in many parts of this; #3076 and #3075 as well as #3055 and #3031 are all relevant. This started from a desire to rework the diffraction spot finding in pyxem (pyxem/pyxem#872). Many of the features there are broken or unusable with large datasets. This is because of how the hs.signals.Signal2D.find_peaks function is written, as well as how the marker class handles plotting multiple different artists. Additionally, the lack of native support for column-labeled signals becomes a large problem when trying to produce an end-to-end workflow for this type of analysis while maintaining the high standards hyperspy has set for metadata, axes management and strict definition of data. (Not that this is a bad thing.)
The desired workflow would be:
1. Use interactive tools to determine the location of important features for a single diffraction pattern.
2. Find important features in all of the images.
3. Plot those features on the original dataset.
4. Iterate steps 1-3 until proper convergence/fitting occurs.
5. Refine and manipulate the important features, create figures, analyze columns, etc.
For large datasets, streamlining the iteration is very important, and lazy workflows are often extremely beneficial because small parts of the data can be analyzed and inspected without repeating the entire calculation.
In a more simplified context the features I would like to add are:
- Support for labeled columns in some Axes object. This includes the ability to slice signals using the axes label. This is very similar to how pandas or xarray allow for labeled column values.
- Defining markers as a ragged array, with columns defining marker attributes and a variable number of rows defining the number of points.
- Support for ragged datasets with dtype = hs.BaseSignal.
- Add an as_type parameter to the map function for when the signal should be cast to a different signal type.
Describe the context:
Describing each of the features that I would like to add in more detail:
1. Reworking the Axes class to support labeled columns.
There is already a bunch of good discussion in #3055 as well as #3031 but, to formally state my objectives:
- Allow for adding labels to the BaseAxis.axis property so that signals can be sliced using non-numerical values. An example of this is s.isig["x"].
- Allow for slicing using multiple values, such as s.isig[["x", "y"]]. This requires that we are stricter about following numpy and its established advanced indexing rules. In short, this means that tuples and lists/arrays are treated differently. This also allows for slicing using one-dimensional boolean arrays (or lists).
- Possible Inclusion: Allow slicing with a multidimensional boolean array. This is much more difficult to handle with regard to maintaining axes values. We can create a new axis, but we would lose the axes information, which might be of interest. I would suggest against allowing multidimensional boolean arrays, in favor of maybe adding a generic boolean ROI.
- Possible Inclusion: I also want to add a special kind of label which includes the offset/scale for some index. The idea is that both the real and pixel values for some point are saved. This is very useful when you want to use the pixel value to further analyze the pixelated image and also use the calibrated value.
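As a rough illustration of the indexing rules described above, here is a minimal numpy-only sketch. The helper `resolve_labels` is hypothetical and not part of HyperSpy; it only shows how a label key could be translated into integer indices while keeping numpy's distinction between a scalar key, a list key and a 1-D boolean mask.

```python
import numpy as np


def resolve_labels(labels, key):
    """Translate a label-based key into integer indices.

    Hypothetical helper for illustration only; not a HyperSpy API.
    """
    labels = list(labels)
    if isinstance(key, str):
        # A single label behaves like a scalar index: the dimension is dropped.
        return labels.index(key)
    key_arr = np.asarray(key)
    if key_arr.dtype == bool:
        # Only 1-D boolean masks are accepted, matching the proposal above.
        if key_arr.ndim != 1 or key_arr.shape[0] != len(labels):
            raise IndexError("boolean mask must be 1-D and match the axis length")
        return np.flatnonzero(key_arr)
    # A list of labels behaves like numpy advanced indexing: the dimension is kept.
    return np.array([labels.index(k) for k in key], dtype=int)


labels = ["x", "y", "intensity"]
data = np.arange(12).reshape(4, 3)  # 4 points, 3 labeled columns

single = data[:, resolve_labels(labels, "x")]                  # shape (4,)
pair = data[:, resolve_labels(labels, ["y", "x"])]             # shape (4, 2)
masked = data[:, resolve_labels(labels, [True, False, True])]  # shape (4, 2)

# numpy itself already treats tuples and lists differently:
element = data[(0, 1)]  # tuple: one index per dimension, returns a scalar
rows = data[[0, 1]]     # list: advanced indexing along the first dimension
```

The tuple/list distinction at the end is the numpy behavior the proposal wants s.isig to follow; how the labels would actually be stored on the axis is left open here.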
2. Defining markers as a ragged array
This speeds up plotting of markers, as it reduces the number of artists necessary to plot some marker. It also allows for markers to be initialized from lazy or non-lazy signals without reshaping the data into a completely different form.
- This will redefine markers as either a ragged array (if the markers change for each navigation position) or as a non-ragged array of markers to be applied to a signal.
- If the marker.data attribute is ragged, then at each n-dimensional index there will be an array with `dtype = [('x1', (float, size)), ('y1', (float, size)), ('x2', (float, size)), ('y2', (float, size)), ('text', (string, size)), ('size', (float, size))]`.
- The marker.data attribute can be a dask array. If the data attribute is lazy, then the values are cached, similar to how plotting lazy signals caches the data.
- Markers can be initialized using existing methods/workflows, but additionally have the option to be initialized in bulk.
- Markers are sliced whenever signal.inav is used to slice the data. A copy of the marker is created and passed to the new signal.
- Possible Inclusion: Originally I wanted the marker class to extend the BaseSignal class. I am still not entirely convinced that isn't the best thing to do.
  - It simplifies the slicing, makes the navigation axes consistent and makes it easier to change/adjust markers in the plot.
  - Things like shifting markers when necessary are easier, as the markers can be adjusted in place.
  - Using the addition in 1, the labels can be more easily identified, adjusted, etc.
  - It majorly simplifies the workflow of finding some feature --> plotting the feature, as markers are just extensions of signals. It is also much easier to create lazy markers, etc.
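To make the ragged-marker layout more concrete, here is a small numpy sketch. The field names and the helper `make_points` are assumptions made for illustration, loosely following the dtype listed above; this is not the existing marker implementation.

```python
import numpy as np

# Hypothetical per-point record; a reduced version of the dtype in the proposal.
point_dtype = np.dtype([("x1", float), ("y1", float), ("size", float)])


def make_points(xy, size=10.0):
    """Build a structured array with one row per marker point (illustrative helper)."""
    pts = np.zeros(len(xy), dtype=point_dtype)
    pts["x1"] = [p[0] for p in xy]
    pts["y1"] = [p[1] for p in xy]
    pts["size"] = size
    return pts


# One object-array cell per navigation position; each cell holds a variable
# number of rows, which is what makes the array ragged.
markers = np.empty((2, 2), dtype=object)
markers[0, 0] = make_points([(1.0, 2.0), (3.0, 4.0)])
markers[0, 1] = make_points([(5.0, 6.0)])
markers[1, 0] = make_points([])
markers[1, 1] = make_points([(7.0, 8.0), (9.0, 1.0), (2.0, 3.0)])

# Number of points at each navigation position.
counts = np.array([[len(markers[i, j]) for j in range(2)] for i in range(2)])

# All points of one navigation position as a single (N, 2) array, which could
# be drawn by one artist instead of N separate ones.
offsets = np.column_stack([markers[1, 1]["x1"], markers[1, 1]["y1"]])

# Slicing the navigation dimensions slices the markers with ordinary indexing.
subset = markers[0:1, :]
```

An (N, 2) array like `offsets` is the shape a single matplotlib collection accepts through `set_offsets`, which is one way the number of artists could be reduced.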
3. Support for ragged datasets with dtype= hs.BaseSignal
Finally, I propose that ragged signals support dtype = BaseSignal, and that this be treated as a special type of ragged signal.
This allows for:
- Initialization of ragged arrays with underlying information about their axes (i.e. vectors associated with diffraction spots or interesting features).
- Saving interesting features (i.e. some slice of an image) as signals with all the necessary data.
Additional Aside:
This is an extension of a couple of discussions that I have had with @ericpre over the last year or so. I would like for a ragged signal to return a non-ragged signal when the dtype is equal to some subclass of BaseSignal. This is the default numpy behavior and it allows a couple of very useful things.
For example, the following:
import numpy as np

x = np.empty(2, dtype=object)
x[0] = np.ones(10)
x[1] = np.ones(15)
x[1][5] = 2  # access and modify an element inside the ragged array
It also allows for operating on each index individually in a non-ragged fashion and plotting single indexes of some ragged signal using the syntax s.inav[2,2].plot().
My proposal is to allow ragged signals to return non-ragged signals if the s.data object has dtype = BaseSignal. In this case it no longer makes sense to maintain the ragged array when a signal can be returned instead.
Implementing this requires some fairly basic changes to the handling of ragged signals and some small changes regarding saving. I would like only one copy of the metadata/original metadata to be saved and loaded, to reduce the requirements for saving the dataset, and then passed on when the data is sliced. That should be fairly easy to manage. The map function will also have to be changed in order to just pass the data through without the Signal.
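The behavior described above can be sketched with plain numpy object arrays. `MiniSignal` and `RaggedOfSignals` are hypothetical stand-ins, not HyperSpy classes; the sketch only shows that fully indexing an object array returns the stored element itself, and that one shared metadata object can be passed on when slicing.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MiniSignal:
    """Hypothetical stand-in for a BaseSignal subclass."""
    data: np.ndarray
    labels: tuple


class RaggedOfSignals:
    """Hypothetical container: object array of MiniSignal plus one metadata dict."""

    def __init__(self, cells, metadata):
        self.cells = cells
        self.metadata = metadata  # a single copy, shared by every slice

    def __getitem__(self, key):
        out = self.cells[key]
        if isinstance(out, MiniSignal):
            # Fully indexed: return the non-ragged signal stored in the cell.
            return out
        # Partially indexed: still ragged, metadata passed on by reference.
        return RaggedOfSignals(out, self.metadata)


cells = np.empty((2, 2), dtype=object)
for i in range(2):
    for j in range(2):
        n_vectors = i + j + 1  # a different number of vectors per position
        cells[i, j] = MiniSignal(np.zeros((n_vectors, 2)), ("kx", "ky"))

ragged = RaggedOfSignals(cells, {"title": "peaks"})

one = ragged[1, 1]  # a MiniSignal, not a ragged container
row = ragged[0]     # still a RaggedOfSignals over a 1-D object array
```

How the shared metadata would be written to and read from a file is not covered here; the sketch only illustrates the indexing side of the proposal.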
Additional information
Putting this all together (at least for my purposes) these changes should allow us to:
s  # Diffraction Signal2D
peaks = s.find_peaks(method="template_matching", template=disk_template, threshold=0.1, as_vector=True)
peaks  # Ragged signal, sub-array of Signal2D
peaks.inav[2, 2]  # Signal2D
peaks.inav[2, 2].axes_manager[1].labels  # ["kx", "ky"]
peaks.inav[2, 2].isig["kx"]  # Returns all of the "kx" vectors
marker = peaks.to_marker()
marker.navigation_shape == s.axes_manager.navigation_shape  # True
s.add_marker(marker, permanent=True)
slic = s.inav[0:20, 0:20]
slic.markers[0].navigation_shape == slic.axes_manager.navigation_shape
slic.plot()  # plot a subset of the markers
For a more generic workflow