Proposal--HSEP: 4 Adding support for Labeled Columns/Advanced Slicing, Ragged Signal of Signals and Advanced Markers #3078

Open
CSSFrancis opened this issue Dec 28, 2022 · 0 comments


Hopefully this is the right thing to do. I wanted to write out my entire plan before I start making too much of a final push so that people can review it beforehand. This is a large change that touches many different parts of the code but should have fairly minimal changes to the overall working/syntax. It should also clean up/speed up some functionality that has been lacking.

Describe the functionality you would like to see.

I apologize for the large number of issues and pull requests that I have created over the last couple of months. Admittedly there was some discovery involved in many parts of this; #3076 and #3075 as well as #3055 and #3031 are all relevant. This started from a desire to rework the diffraction spot finding in pyxem (pyxem/pyxem#872). Many of the features there are broken or unusable with large datasets. This is because of how the hs.signals.Signal2D.find_peaks function is written, as well as how the marker class handles plotting multiple different artists. Additionally, the lack of native support for column-labeled signals becomes a large problem when trying to produce an end-to-end workflow for this type of analysis while maintaining the high standards hyperspy has set for metadata, axes management and strict definition of data. (Not that this is a bad thing.)

The desired workflow would be:

  1. Use interactive tools to determine the location of important features for a single diffraction pattern.
  2. Find important features in all of the images.
  3. Plot those features on the original dataset
  4. Iterate 1-3 until proper convergence/ fitting occurs.
  5. Refine and manipulate the important features, create figures, analyze columns etc.

For large datasets, streamlining the iteration is very important, and lazy workflows are often extremely beneficial because small parts of the data can be analyzed and observed without requiring the entire calculation to be repeated.

In a more simplified context the features I would like to add are:

  1. Support for labeled columns in some Axes object. This includes the ability to slice signals using the axes label. This is very similar to how pandas or xarray allow for labeled column values.
  2. Defining markers as a ragged array with columns defining marker attributes and a variable number of rows defining the number of points.
  3. Support for ragged datasets with dtype= hs.BaseSignal
  4. Add an as_type parameter to the map function if the signal should be cast to a different signal type.

Describe the context:

Describing each of the features that I would like to add in more detail:

1. Reworking the Axes class to support labeled columns.

There is already a good deal of discussion in #3055 as well as #3031, but to formally state my objectives:

  • Allow for adding labels to the BaseAxis.axis property so that signals can be sliced using non-numerical values. An example of this is s.isig["x"].

  • Allow for slicing using multiple values such as s.isig[["x", "y"]]. This requires that we are more strict about how we follow numpy and its established advanced indexing rules. In short, this means that tuples and lists/arrays are treated differently. This also allows for slicing using one-dimensional boolean arrays (or lists).

  • Possible Inclusion: Allow slicing with a multidimensional boolean array: This is much more difficult to handle in regard to maintaining axes values. We can create a new axis but we would lose the axes information which might be of interest. I would suggest against allowing multidimensional boolean arrays in favor of maybe adding a generic boolean ROI.

  • Possible Inclusion: I also want to add a special kind of label which includes the offset/scale for some index. The idea is that both the real and the pixel values for some point are saved. This is very useful when you are trying to use the pixel value to further analyze the pixelated image while also using the calibrated value.
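The slicing rules in the list above can be sketched with plain NumPy. This is only an illustration under stated assumptions: `LabeledAxis` and its `indices` method are hypothetical names invented here and are not part of the HyperSpy API, and the example data is made up. It shows a single label dropping the axis, a list of labels or a 1D boolean mask keeping it, the tuple versus list distinction that NumPy already makes, and why a multidimensional boolean mask loses the axes information.

```python
import numpy as np


class LabeledAxis:
    """Hypothetical axis that maps string labels to integer positions."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._lookup = {label: i for i, label in enumerate(self.labels)}

    def indices(self, key):
        """Translate a label, a list of labels, or a 1D boolean mask to indices."""
        if isinstance(key, str):
            return self._lookup[key]  # scalar index: the axis is dropped
        key = np.asarray(key)
        if key.dtype == bool:
            return np.flatnonzero(key)  # 1D boolean mask: the axis is kept
        return np.array([self._lookup[str(k)] for k in key])  # list of labels


# Three vectors with columns labeled "x", "y" and "intensity" (made-up values).
data = np.array([[1.0, 2.0, 10.0],
                 [3.0, 4.0, 20.0],
                 [5.0, 6.0, 30.0]])
axis = LabeledAxis(["x", "y", "intensity"])

x_column = data[:, axis.indices("x")]                 # 1D: one column
xy_columns = data[:, axis.indices(["x", "y"])]        # 2D: two columns
masked = data[:, axis.indices([True, False, True])]   # 2D: columns 0 and 2

# NumPy treats tuples and lists differently.
as_tuple = data[(0, 1)]  # a single element: row 0, column 1
as_list = data[[0, 1]]   # rows 0 and 1

# A multidimensional boolean mask flattens the result to 1D,
# so there is no longer a per-axis structure to attach axes to.
flattened = data[data > 3.0]
```

In a real implementation the translation from labels to indices would presumably live inside the axis object, so that the existing isig/inav slicing machinery only ever sees integer indices.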

2. Defining markers as a ragged array

This speeds up plotting of markers as it reduces the number of artists necessary to plot some marker. It also allows for markers to be initialized from lazy or non-lazy signals without reshaping the data into a completely different form.

  • This will redefine markers as either a ragged array (if the markers change for each navigation position) or as a non ragged array of markers to be applied to a signal
  • If the marker.data attribute is ragged then at each n-dimensional index there will be an array with `dtype = [('x1', (float, size)), ('y1', (float, size)), ('x2', (float, size)), ('y2', (float, size)), ('text', (string, size)), ('size', (float, size))]`
  • The marker.data attribute can be a dask array. If the data attribute is lazy then the values are cached similar to how plotting lazy signals caches the data.
  • Markers can be initialized using existing methods/workflows but additionally have the option to be initialized in bulk.
  • Markers are sliced whenever signal.inav is used to slice the data. A copy of the marker is created and passed to the new signals.
  • Possible Inclusion: Originally I wanted the marker class to extend the BaseSignal class, and I am still not entirely convinced that isn't the best thing to do.
    • It simplifies the slicing, makes the navigation axes consistent and makes it easier to change/adjust markers in the plot.
    • Things like shifting markers when necessary are easier as the markers can be adjusted in place.
    • Using the addition in 1, the labels can be more easily identified, adjusted, etc.
    • It majorly simplifies the workflow of finding some feature --> plotting the feature, as markers are just extensions of signals. It is also much easier to create lazy markers etc.
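A ragged marker array along these lines can be sketched with plain NumPy. This is an assumption-laden illustration rather than the proposed implementation: the record layout below uses one structured record per marker (rows are markers, fields are attributes) and only three of the fields listed above, and the navigation shape and random values are made up.

```python
import numpy as np

# Hypothetical per-marker record, trimmed to three of the fields listed above.
marker_dtype = np.dtype([("x1", float), ("y1", float), ("size", float)])

nav_shape = (2, 3)  # assumed navigation shape of the parent signal
rng = np.random.default_rng(0)

# One object-array cell per navigation position; each cell holds a
# variable number of markers, which is what makes the array ragged.
markers = np.empty(nav_shape, dtype=object)
for index in np.ndindex(nav_shape):
    n_points = int(rng.integers(1, 5))
    cell = np.zeros(n_points, dtype=marker_dtype)
    cell["x1"] = rng.uniform(0, 64, n_points)
    cell["y1"] = rng.uniform(0, 64, n_points)
    cell["size"] = 10.0
    markers[index] = cell

# Slicing the navigation dimensions slices the markers in the same way.
sliced = markers[0:1, 0:2]

# All x positions at one navigation position come out as a single column,
# so they could be handed to one collection-style artist instead of
# creating one artist per point.
x_here = markers[1, 2]["x1"]
```

Because the outer array has the navigation shape, the same index that selects a navigation position in the signal selects the matching set of markers, which is what makes slicing with inav straightforward.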

3. Support for ragged datasets with dtype= hs.BaseSignal

Finally I propose support for ragged signals to allow for a dtype = BaseSignal and for that to be a special type of ragged signal.

This allows for:

  • Initialization of ragged arrays with underlying information about their axes (i.e. Vectors associated with diffraction spots or interesting features)
  • Saving interesting features (i.e. some slice of an image) as signals with all the necessary data.

Additional Aside:
This is an extension of a couple of discussions that I have had with @ericpre over the last year or so. I would like for a ragged signal to return a non-ragged signal when the dtype is equal to some subclass of BaseSignal. This is the default numpy behavior and it allows a couple of very useful things.

For example, the following works in numpy:

import numpy as np

x = np.empty(2, dtype=object)

x[0] = np.ones(10)
x[1] = np.ones(15)
x[1][5] = 2  # access and modify the inner array of the ragged array

It also allows for operating on each index individually in a non ragged fashion and plotting single indexes of some ragged signal using the syntax s.inav[2,2].plot().

My proposal is to allow ragged signals to return non ragged signals if the s.data object has a dtype= BaseSignal. In this case it no longer makes sense to maintain the ragged array when instead a signal can be returned.
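The indexing behavior proposed here can be sketched as follows. `MiniSignal` and `RaggedContainer` are hypothetical stand-ins written for this illustration; they are not HyperSpy classes, and `inav` is written as a method rather than a slicer object to keep the sketch short.

```python
import numpy as np


class MiniSignal:
    """Hypothetical stand-in for a BaseSignal: data plus column labels."""

    def __init__(self, data, labels):
        self.data = np.asarray(data)
        self.labels = list(labels)

    def column(self, label):
        """Return the column of data that carries the given label."""
        return self.data[:, self.labels.index(label)]


class RaggedContainer:
    """Hypothetical ragged signal whose cells are MiniSignal objects."""

    def __init__(self, cells):
        self.data = cells  # object array, one MiniSignal per navigation position

    def inav(self, *index):
        item = self.data[index]
        # A single cell is returned as the inner signal itself,
        # not as a ragged wrapper around one element.
        if isinstance(item, MiniSignal):
            return item
        return RaggedContainer(item)


# Build a 2x2 navigation grid with a different number of vectors per position.
cells = np.empty((2, 2), dtype=object)
for i, j in np.ndindex(2, 2):
    n = i + j + 1
    vectors = np.arange(2 * n, dtype=float).reshape(n, 2)
    cells[i, j] = MiniSignal(vectors, ["kx", "ky"])

peaks = RaggedContainer(cells)
single = peaks.inav(1, 1)                       # a MiniSignal, no longer ragged
kx = single.column("kx")                        # all "kx" values at that position
subset = peaks.inav(slice(0, 1), slice(None))   # still a ragged container
```

Indexing down to one navigation position hands back the inner signal with its own axes and labels, while any slice that still spans several positions stays ragged.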

Implementing this requires some fairly basic changes to the handling of ragged signals and some small changes regarding saving. I would like only one copy of the metadata/original metadata to be saved and loaded, to reduce the requirements for saving the dataset, and then passed on when the data is sliced. That should be fairly easy to manage. The map function will also have to be changed so that it passes just the data through, without the Signal.
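As a rough sketch of what passing just the data through map could look like, combined with the as_type idea from the feature list: `map_cells` and `Vectors` below are hypothetical names for this illustration only, and the real map function in HyperSpy has a different signature and handles lazy data, which this sketch ignores.

```python
import numpy as np


class Vectors:
    """Hypothetical result type used to illustrate the as_type argument."""

    def __init__(self, data):
        self.data = np.asarray(data)


def map_cells(func, cells, as_type=None):
    """Apply func to the raw data in every cell of an object array.

    If as_type is given, each result is wrapped in that type.
    """
    out = np.empty(cells.shape, dtype=object)
    for index in np.ndindex(cells.shape):
        result = func(cells[index])
        out[index] = as_type(result) if as_type is not None else result
    return out


cells = np.empty(2, dtype=object)
cells[0] = np.ones(10)
cells[1] = np.ones(15)

sums = map_cells(np.sum, cells)                               # plain per-cell results
heads = map_cells(lambda a: a[:3], cells, as_type=Vectors)    # results cast to Vectors
```

The function only ever sees the array stored in each cell, so per-signal metadata does not need to be copied for every navigation position.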

Additional information

Putting this all together (at least for my purposes) these changes should allow us to:

s # Diffraction Signal 2d

peaks = s.find_peaks(method="template_matching", template=disk_template, threshold=0.1, as_vector=True) 
peaks # Ragged Signal, sub array of signal2D

peaks.inav[2, 2] # Signal2D

peaks.inav[2,2].axes_manager[1].labels  # ["kx", "ky"]
peaks.inav[2,2].isig["kx"] # Returns all of the "kx" vectors


marker = peaks.to_marker()
marker.navigation_shape == s.axes_manager.navigation_shape # True
s.add_marker(marker, permanent=True)

slic = s.inav[0:20,0:20]

slic.markers[0].navigation_shape == slic.axes_manager.navigation_shape

slic.plot() # plot a subset of the markers

For a more generic workflow

peaks = s.find_peaks(method="template_matching", template=disk_template, threshold=0.1, return_indexes=False)

markers = hs.plot.markers.PointMarker(data=peaks.data) 

s.add_marker(markers)
s.plot()