Requirements
A dynamic field of research. The goal of the project is to enable applications that are as yet unknown!
- Software and data formats are modular, extensible, reusable, documented, and suitable for parallel, distributed development processes.
Some research groups have existing code and file formats for pixelated STEM. TODO overview
https://fast_pixelated_detectors.gitlab.io/fpd_live_imaging/
https://fast_pixelated_detectors.gitlab.io/merlin_interface/
https://github.com/pycrystem/pycrystem
https://github.com/pycroscopy/pycroscopy
http://bio3d.colorado.edu/SerialEM/
Data management: http://diamondlightsource.github.io/SynchWeb/
https://docs.google.com/document/d/1cFLMqETci3MMzU8NDCySzRNZrL_IXm5l51Rz3kqzW-Q/edit
Integrates with Australian cloud platform: MyTardis, Galaxy, CVL https://www.massive.org.au/cvl
File format stuff
EMD file format - a specific flavour of HDF5, designed to store experimental and simulated data. A viewer written in C++/Qt exists. Some code to read and write EMD in Python and Matlab is available, but since all EMD files are valid HDF5, any modern programming language can already read these files. Primarily maintained by Colin, Phil, Florian. This format is primarily for tomography, 4D-STEM and multi-dimensional STEM simulations.
EMD file format - EMD website
Dieter: This looks great! I guess you are aware of the Hyperspy work in this area? http://hyperspy.org/hyperspy-doc/current/user_guide/io.html
http://hyperspy.org/hyperspy-doc/current/user_guide/metadata_structure.html
EMD file format - EMD github repo
HDF5: check out the h5py module in Python
HDF5: HDF5 group website
HDF5: The HDFView program is very useful for looking at HDF5 and EMD datasets with a graphical user interface. In the Binary Distributions section download the HDFView+Object 2.13 installer for your platform (Windows, Mac, etc.). For Windows pay attention to the note in the yellow box or the program might not work correctly.
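Since every EMD file is a valid HDF5 file, a few lines of h5py are enough to inspect one. A minimal sketch, assuming h5py is installed; the file name and group layout below are invented for illustration and do not follow any particular EMD flavour:

```python
import h5py  # third-party; pip install h5py

# Write a tiny 4D-STEM-shaped dataset to HDF5, then read it back.
# "example.emd" and the group path are hypothetical examples.
with h5py.File("example.emd", "w") as f:
    f.create_dataset("data/stem_4d/data", shape=(4, 4, 16, 16), dtype="uint16")

with h5py.File("example.emd", "r") as f:
    f.visit(print)                      # list every group and dataset
    ds = f["data/stem_4d/data"]
    shape, dtype = ds.shape, str(ds.dtype)
print(shape, dtype)
```

Because any HDF5 reader works, the same file opens unchanged in HDFView, Matlab, or C.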
Example for Open Data: http://ammrf-dev-space.intersect.org.au/ --> support similar workflows for 4D STEM data
Data preprocessing
C code to convert .dm4 4D-STEM files into hdf5 - Yifei Meng
Matlab electron counting / clustering code - Colin Ophus
Gatan K2 extraction routine, 4D-STEM package called STEMx
(available at http://www.gatan.com/installation-instructions)
Data processing
Question Dieter: What is the output data and output format of these programs? Is there more detailed information online?
Ptychography, Wigner deconvolution reconstruction in Matlab - Hao Yang (developed at Univ. Oxford and will probably be licensed and open-sourced)
Matlab DPC, MIDI-STEM, STEM holography reconstruction - Colin Ophus
Digital Micrograph nanobeam electron diffraction (NBED) plug-in - Christoph Gammer
Gatan STEMx software package for NBED - Available as a 90-day free demo upon request to Gatan
Basic 4D-STEM analysis using Scikit-image - Yifei Meng
Image and diffraction simulations
Question Dieter: What input formats do these programs accept for descriptions of the sample and the incident beam? What are the output formats? Do they use some common formats or are these home-grown format definitions? It should be possible to feed the output of these programs into the analysis code, right? Ideally in a closed loop: Analysis gives model of sample, model of sample gives simulated data
Multem - Ivan Lobato, C++/Qt, nice user interface, very extensive, fast GPU code if available, large systems possible, includes inelastic scattering, has a Matlab interface for full flexibility, actively maintained. GPL, on GitHub: https://github.com/Ivanlh20/MULTEM, see Ultramicroscopy paper http://doi.org/10.1016/j.ultramic.2016.06.003
Multislice simulation in Matlab - Colin Ophus
PRISM, fast STEM simulation - Algorithm paper
PRISM matlab code available upon request - Colin Ophus
GPU accelerated version by AJ Pryor in C++, Github repo - repo is currently private, but will be open sourced in ~1 month.
Earl Kirkland multislice code in C: compuTEM - SourceForge repo
Dieter: The documentation of that program is rather minimalistic -- is there at least something like a man page somewhere?
muSTEM simulation code, presently supported only for Windows and running exclusively on Nvidia GPUs - http://tcmp.ph.unimelb.edu.au/mustem/muSTEM.html
Code and file formats for X-ray crystallography might be useful for pixelated STEM. TODO list
Other useful SW?
https://github.com/heeres/qtlab
https://github.com/nion-software https://github.com/nion-software/nionswift http://nionswift.readthedocs.io/en/stable/
http://pandas.pydata.org/pandas-docs/stable/
The software controls scan generators, cameras, other detectors and the microscope when it is used for acquisition. TODO list of hardware to support.
The software should allow connection to other, independent systems that, for example, handle a specific detector. Comment: For example through a network protocol. Set detector and scan parameters, configure triggers, collect results, …
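Such a connection could be as simple as a line-based command protocol over TCP. A runnable sketch with an invented command set (SET, ARM, QUIT are not any real detector's API):

```python
import socket
import threading

# Hypothetical line-based protocol to control an independent detector
# system over the network: set parameters, then arm for acquisition.
# The commands SET/ARM/QUIT are invented for illustration.
def detector_server(sock):
    conn, _ = sock.accept()
    with conn:
        settings = {}
        for line in conn.makefile("r"):
            cmd, *args = line.split()
            if cmd == "SET":                    # SET <key> <value>
                settings[args[0]] = args[1]
                conn.sendall(b"OK\n")
            elif cmd == "ARM":                  # arm, echo configured frames
                conn.sendall(f"ARMED {settings.get('frames', '0')}\n".encode())
            elif cmd == "QUIT":
                break

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=detector_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
replies = cli.makefile("r")
cli.sendall(b"SET frames 1024\n")
resp_set = replies.readline().strip()           # expect "OK"
cli.sendall(b"ARM\n")
resp_arm = replies.readline().strip()           # expect "ARMED 1024"
cli.sendall(b"QUIT\n")
cli.close()
print(resp_set, resp_arm)
```

A text protocol like this keeps the detector subsystem independent of the main software's language and platform; bulk data would travel over a separate high-bandwidth channel.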
Platforms: Windows, Mac (data analysis only), Linux (x64). Issue: What is the driver situation on the vendor side?
After initial development phase: Stable APIs and file format definitions with well-managed roadmap and backwards compatibility. Challenge: Combine rapid innovations in some parts of the project with stability in others --> Manage it like web standards?
Run on stock workstations, PCs, laptops.
The software is necessary to operate the microscopy system when it is used for acquisition. Interactive use tolerates occasional problems while it is in beta. Automated use and scripting need a reliable system, depending on how long it must run unattended.
The system notifies operators of problems in appropriate ways. Automated use: Send notifications to e-mail or SMS for important events (configurable), logging in a text file, generate output signals for indicator lights (green, yellow, blue, red "traffic light" for industrial machines). Interface with general error handling infrastructure of the microscope? Option to activate detailed tracing to backtrack issues.
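The logging side of this can be sketched with Python's standard logging module: everything goes to a text file for tracing, while a separate handler routes important events to whatever notification channel is configured (e-mail via logging.handlers.SMTPHandler, SMS, indicator lights). The list-collecting handler below is a stand-in for such a channel; all names are illustrative:

```python
import logging

# Sketch of the notification scheme: log everything to a text file,
# and route important events (ERROR and above) to a separate handler
# that could be an SMTPHandler for e-mail or a driver for indicator
# lights. Names here are illustrative.
log = logging.getLogger("acquisition")
log.setLevel(logging.DEBUG)

file_handler = logging.FileHandler("acquisition.log")
file_handler.setLevel(logging.DEBUG)           # detailed tracing
log.addHandler(file_handler)

alerts = []                                    # stand-in for e-mail/SMS/lights
class AlertHandler(logging.Handler):
    def emit(self, record):
        alerts.append(self.format(record))

log.addHandler(AlertHandler(level=logging.ERROR))

log.info("scan started")                       # goes to the log file only
log.error("detector buffer overflow")          # also triggers an alert
print(alerts)
```

The per-handler level is what makes the scheme configurable: operators choose which severities reach which channel without touching the acquisition code.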
UNDETECTED data corruption is the biggest risk, i.e. undetected mismatch between actual instrument settings and settings saved as meta data, or corrupted data points. DETECTED data corruption can be compensated or measurements can be repeated.
The device control and acquisition bring soft real-time requirements (fill and empty device buffers while a scan is running; coordinate device responses and error states).
Scan rates of 2000 fps (absolute minimum), target 100,000 fps (already in development, TODO at which resolution?), optimal 1,000,000 fps (Strong binning? Preprocessing on the device? Conventional detector data) should be supported by the architecture, at variable resolution or binning. TODO what do scan systems allow in terms of scan rate, resolution and such? FASTER is always an obvious direction! Physical limits around 1e9 fps or even faster?
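A back-of-the-envelope calculation shows why these rates are demanding. Assuming 256×256 pixel frames at 16 bits per pixel (both numbers are illustrative assumptions, not a specification):

```python
# Raw data rates at the scan rates above, assuming 256x256 pixel frames
# at 16 bits per pixel (illustrative assumptions, not a specification).
frame_bytes = 256 * 256 * 2          # 131072 bytes per frame

rates = {}
for fps in (2_000, 100_000, 1_000_000):
    rates[fps] = fps * frame_bytes
    print(f"{fps:>9,} fps -> {rates[fps] / 1e9:7.2f} GB/s")
```

At full resolution the 100,000 fps target already lands around 13 GB/s, well beyond a single NVMe drive, which is why binning, on-device preprocessing and DMA appear below.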
-
API compatible with DMA, moving large amounts of data around very quickly.
- Configure device, transfer memory addresses
- Execute command, device reads or writes memory
- Block until done or callback?
- Use the data, move it to another device or write to disk
- Error handling
- What does a userspace API for such a system look like?
- How to make sure such amounts of data are written to disk in time? Work like Gatan with a whole acquisition framework?
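As a thought experiment, the steps above might map onto an API shaped like the mock below (pure Python, no real driver binding; every name is invented):

```python
# Hypothetical shape of a userspace acquisition API following the steps
# above: configure, execute, block or callback, consume, error handling.
# This is a mock for illustration, not a real driver binding.
class AcquisitionDevice:
    def __init__(self):
        self.buffers = []
        self.error = None

    def configure(self, n_buffers, buffer_size):
        # Step 1: allocate DMA-capable buffers, hand addresses to the device.
        self.buffers = [bytearray(buffer_size) for _ in range(n_buffers)]

    def execute(self, on_done=None):
        # Step 2: start the transfer; here we just fill the buffers.
        try:
            for buf in self.buffers:
                buf[:] = b"\x01" * len(buf)   # device "writes" into memory
        except Exception as exc:              # error handling step
            self.error = exc
            raise
        if on_done is not None:               # callback variant of "block or callback"
            on_done(self.buffers)

dev = AcquisitionDevice()
dev.configure(n_buffers=2, buffer_size=8)
results = []
dev.execute(on_done=results.append)           # consume: hand buffers onward
print(len(results[0]), bytes(results[0][0][:4]))
```

Preallocated buffers handed to the device by address are the key point: the userspace side only recycles them, so no copies happen on the hot path.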
-
Is it viable to keep the raw data in all circumstances or should we immediately preprocess?
- Both options should be possible!
- Feed data into hardware accelerators, GPUs and such and pull back out
Data processing and visualization should give an immediate visual feedback.
- Keep up with data rates!
- Use DMA, GPU, OpenGL, parallel processing as well.
- Python is only the glue language; the real work is done at a low level in a compiled language with parallel processing
- GUI probably C++ and not Python? Python as an internal scripting language?
- All parts of the software are optimized for high throughput. Use high-performance libraries, be careful with Python!
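The intended division of labour can be illustrated with a streaming reduction: Python orchestrates, and the per-pixel loop is exactly the part that would be handed to compiled code. A stdlib-only sketch with a fake frame source:

```python
from array import array

# Pattern sketch for immediate visual feedback: maintain a running sum
# image while frames stream in. In production the inner loop would be
# replaced by compiled code (C/C++ or similar); this shows the data flow.
H = W = 4                                   # tiny frame size for illustration

def frame_stream(n_frames):
    # Stand-in for a detector delivering frames one by one.
    for i in range(n_frames):
        yield array("H", [i] * (H * W))

sum_image = array("Q", [0] * (H * W))
for frame in frame_stream(3):
    for px in range(H * W):                 # the part to push into native code
        sum_image[px] += frame[px]

print(sum_image[0])                         # frames contribute 0 + 1 + 2
```

Because the reduction is incremental, a GUI can redraw the sum image after every frame (or every chunk) without waiting for the scan to finish.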
The software should be designed for high efficiency so that it runs on older, cheaper hardware, if possible.
It should make full use of the capabilities of multicore processors, large memory, GPUs and other accelerators, but run on a plain run-of-the-mill laptop, i.e. no hard dependency on specific hardware.
https://dask.pydata.org/en/latest/
https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html
https://support.hdfgroup.org/HDF5/PHDF5/
https://mathema.tician.de/software/pycuda/
https://scikit-cuda.readthedocs.io/en/latest/
http://mpi4py.scipy.org/docs/
http://eprints.maynoothuniversity.ie/8851/
http://iopscience.iop.org/article/10.1088/1748-0221/11/12/C12023/meta
http://iopscience.iop.org/article/10.1088/1748-0221/11/02/P02007/meta
Or OpenCL analogs to make it more portable? Der
Agile, dynamic, distributed development --> needs automated test framework and continuous integration etc.
- Easy update and installation from the source tree; professional build and installation process from source
- Automated tests on all supported platforms before changes are integrated into main repository
- Coding conventions, docstrings and such to make the code readable
- Patch management / change management like Hyperspy or Linux kernel
- Development branches for new features
- Nightly builds with platform-specific installers
- Changelogs
- Beta releases with platform-specific installers
- Stable releases with maintenance with platform-specific installers
- Issue tracker and communication platform with all bells and whistles
- Communication of changes, give community time to respond. Predictable, responsible behaviour towards users. Don't break userspace!
Upgrade path from older to newer releases
- Import settings
- File converters
- User support
- …?
Human and instrument safety should not be handled by the system because it is designed as a flexible development platform, not as a safety-relevant subsystem. Interlocks, emergency stop etc. handled by hardware or dedicated embedded software, NEVER by this user-space system! --> Safety-relevant SW requires a different development model, must be compartmentalized.
Remaining risks include
- Undetected data corruption
- Mismatch between actual instrument settings and stored metadata
- verify after setting states;
- make sure drivers etc. are well-behaved;
- implement automated test and calibration routines
- Detected data corruption, loss of data or unusable data
- Data loss upon loss of power (large amounts of data stored in volatile memory)
- keep syncing to disk, ideally in a format that supports transactions, i.e. without invalid intermediate states, or with easy repair of partially written files
- Use transaction capabilities of modern journaling file systems?
- Keep a record of what has been done, in particular for automated, scripted measurements
- Separate low-level data transfer and storage from resource-intensive processing, use real time processing frameworks for low-level stuff where appropriate
- Overwriting, deleting, changing data during analysis. Issue: The volume of data might require data reduction during or after acquisition. How to make sure that accidental errors can still be undone?
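The "no invalid intermediate states" point can be approximated even without a transactional file format: write to a temporary file in the same directory, fsync, then rename atomically, so a crash leaves either the old file or the complete new one. A sketch using only the standard library (the file name is an example):

```python
import os
import tempfile

# Crash-safe saving as discussed above: write to a temporary file in
# the same directory, flush and fsync, then atomically rename. A
# reader never observes a partially written file.
def atomic_write(path, data: bytes):
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())          # force data out of volatile memory
        os.replace(tmp, path)             # atomic rename, also on Windows
    except BaseException:
        os.unlink(tmp)                    # clean up the half-written temp file
        raise

atomic_write("frame_0001.raw", b"\x00" * 64)
print(os.path.getsize("frame_0001.raw"))
```

The temp file must live in the same directory as the target, because rename is only atomic within one file system.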
Are you aware of any applicable standards or regulations for the solution? Examples would be x-ray safety, electrical safety, machinery directive, SEMI norms, and so on.
Native file formats (HDF5, MRC, NetCDF and the like?)
- See File format for specific links
Container formats (folder structures, ZIP, file systems, VMDK, VDI, VHD, …)
Import and export formats and filters (TODO list formats to support)
Device driver APIs?
Hardware interfaces and APIs – Ethernet, USB, PCIe, …
Network protocols?
Quasi-standard processing and visualization libraries and APIs? Hyperspy, Scipy, Numpy, Boost, OpenGL, OpenCL, DirectX, SDL, … TODO collect appropriate libraries to consider