Requirements
A dynamic field of research. The goal of the project is to enable applications that are as yet unknown!
- Software and data formats are modular, extensible, reusable, documented, and suitable for parallel, distributed development processes.
Some research groups have existing code and file formats for pixelated STEM. TODO overview
https://fast_pixelated_detectors.gitlab.io/fpd_live_imaging/
https://fast_pixelated_detectors.gitlab.io/merlin_interface/
https://github.com/pycrystem/pycrystem
https://github.com/pycroscopy/pycroscopy
http://bio3d.colorado.edu/SerialEM/
Data management: http://diamondlightsource.github.io/SynchWeb/
https://docs.google.com/document/d/1cFLMqETci3MMzU8NDCySzRNZrL_IXm5l51Rz3kqzW-Q/edit
Integrates with Australian cloud platform: MyTardis, Galaxy, CVL https://www.massive.org.au/cvl
File format stuff
EMD file format - a specific flavour of HDF5, designed to store experimental and simulated data. A viewer written in C++/Qt exists. Some code to read and write EMD in Python and Matlab is available, but since all EMD files are valid HDF5, any modern programming language can already read these files. Primarily maintained by Colin, Phil, Florian. This format is primarily for tomography, 4D-STEM and multi-dimensional STEM simulations.
EMD file format - EMD website
Dieter: This looks great! I guess you are aware of the Hyperspy work in this area? http://hyperspy.org/hyperspy-doc/current/user_guide/io.html
http://hyperspy.org/hyperspy-doc/current/user_guide/metadata_structure.html
EMD file format - EMD github repo
HDF5: check out the h5py module in Python
HDF5: HDF5 group website
HDF5: The HDFView program is very useful for looking at HDF5 and EMD datasets with a graphical user interface. In the Binary Distributions section download the HDFView+Object 2.13 installer for your platform (Windows, Mac, etc.). For Windows pay attention to the note in the yellow box or the program might not work correctly.
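Since every EMD file is a valid HDF5 file, a few lines of h5py are enough to inspect one. A minimal sketch, assuming h5py is installed; the file name and group layout below are invented for illustration and do not follow any particular EMD flavour:

```python
import h5py  # third-party; pip install h5py

# Write a tiny 4D-STEM-shaped dataset to HDF5, then read it back.
# "example.emd" and the group path are hypothetical examples.
with h5py.File("example.emd", "w") as f:
    f.create_dataset("data/stem_4d/data", shape=(4, 4, 16, 16), dtype="uint16")

with h5py.File("example.emd", "r") as f:
    f.visit(print)                      # list every group and dataset
    ds = f["data/stem_4d/data"]
    shape, dtype = ds.shape, str(ds.dtype)
print(shape, dtype)
```

Because any HDF5 reader works, the same file opens unchanged in HDFView, Matlab, or C.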
Example for Open Data: http://ammrf-dev-space.intersect.org.au/ --> support similar workflows for 4D STEM data
Data preprocessing
C code to convert .dm4 4D-STEM files into hdf5 - Yifei Meng
Matlab electron counting / clustering code - Colin Ophus
Gatan K2 extraction routine, 4D-STEM package called STEMx
(available at http://www.gatan.com/installation-instructions)
Data processing
Question Dieter: What is the output data and output format of these programs? Is there more detailed information online?
Ptychography, Wigner deconvolution reconstruction in Matlab - Hao Yang (developed at Univ. Oxford and will probably be licensed and open-sourced)
Matlab DPC, MIDI-STEM, STEM holography reconstruction - Colin Ophus
Digital Micrograph nanobeam electron diffraction (NBED) plug-in - Christoph Gammer
Gatan STEMx software package for NBED - Available as a 90-day free demo upon request to Gatan
Basic 4D-STEM analysis using Scikit-image - Yifei Meng
Image and diffraction simulations
Question Dieter: What input formats do these programs accept for descriptions of the sample and the incident beam? What are the output formats? Do they use some common formats or are these home-grown format definitions? It should be possible to feed the output of these programs into the analysis code, right? Ideally in a closed loop: Analysis gives model of sample, model of sample gives simulated data
Multem - Ivan Lobato, C++/Qt, nice user interface, very extensive, fast GPU code if available, large systems possible, includes inelastic scattering, has a Matlab interface for full flexibility, actively maintained. GPL, on GitHub: https://github.com/Ivanlh20/MULTEM, see Ultramicroscopy paper http://doi.org/10.1016/j.ultramic.2016.06.003
Multislice simulation in Matlab - Colin Ophus
PRISM, fast STEM simulation - Algorithm paper
PRISM matlab code available upon request - Colin Ophus
GPU accelerated version by AJ Pryor in C++, Github repo - repo is currently private, but will be open sourced in ~1 month.
Earl Kirkland multislice code in C: compuTEM - SourceForge repo
Dieter: The documentation of that program is rather minimalistic -- is there at least something like a man page somewhere?
muSTEM simulation code, presently supported only for Windows and running exclusively on Nvidia GPUs - http://tcmp.ph.unimelb.edu.au/mustem/muSTEM.html
Code and file formats for X-ray crystallography might be useful for pixelated STEM. TODO list
Other useful SW?
https://github.com/heeres/qtlab
https://github.com/nion-software https://github.com/nion-software/nionswift http://nionswift.readthedocs.io/en/stable/
http://pandas.pydata.org/pandas-docs/stable/
The software controls scan generators, cameras, other detectors and the microscope when it is used for acquisition. TODO list of hardware to support.
The software should allow connection to other, independent systems that, for example, handle a specific detector. Comment: For example through a network protocol. Set detector and scan parameters, configure triggers, collect results, …
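Such a connection could be as simple as a line-based command protocol over TCP. A runnable sketch with an invented command set (SET, ARM, QUIT are not any real detector's API):

```python
import socket
import threading

# Hypothetical line-based protocol to control an independent detector
# system over the network: set parameters, then arm for acquisition.
# The commands SET/ARM/QUIT are invented for illustration.
def detector_server(sock):
    conn, _ = sock.accept()
    with conn:
        settings = {}
        for line in conn.makefile("r"):
            cmd, *args = line.split()
            if cmd == "SET":                    # SET <key> <value>
                settings[args[0]] = args[1]
                conn.sendall(b"OK\n")
            elif cmd == "ARM":                  # arm, echo configured frames
                conn.sendall(f"ARMED {settings.get('frames', '0')}\n".encode())
            elif cmd == "QUIT":
                break

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=detector_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
replies = cli.makefile("r")
cli.sendall(b"SET frames 1024\n")
resp_set = replies.readline().strip()           # expect "OK"
cli.sendall(b"ARM\n")
resp_arm = replies.readline().strip()           # expect "ARMED 1024"
cli.sendall(b"QUIT\n")
cli.close()
print(resp_set, resp_arm)
```

A text protocol like this keeps the detector subsystem independent of the main software's language and platform; bulk data would travel over a separate high-bandwidth channel.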
Platforms: Windows, Mac (data analysis only), Linux (x64). Issue: What is the driver situation on the vendor side?
After initial development phase: Stable APIs and file format definitions with well-managed roadmap and backwards compatibility. Challenge: Combine rapid innovations in some parts of the project with stability in others --> Manage it like web standards?
Run on stock workstations, PCs, laptops.
The software is necessary to operate the microscopy system when it is used for acquisition. Interactive use tolerates occasional problems while it is in beta. Automated use and scripting need a reliable system, depending on how long it must run unattended.
The system notifies operators of problems in appropriate ways. Automated use: Send notifications to e-mail or SMS for important events (configurable), logging in a text file, generate output signals for indicator lights (green, yellow, blue, red "traffic light" for industrial machines). Interface with general error handling infrastructure of the microscope? Option to activate detailed tracing to backtrack issues.
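The logging side of this can be sketched with Python's standard logging module: everything goes to a text file for tracing, while a separate handler routes important events to whatever notification channel is configured (e-mail via logging.handlers.SMTPHandler, SMS, indicator lights). The list-collecting handler below is a stand-in for such a channel; all names are illustrative:

```python
import logging

# Sketch of the notification scheme: log everything to a text file,
# and route important events (ERROR and above) to a separate handler
# that could be an SMTPHandler for e-mail or a driver for indicator
# lights. Names here are illustrative.
log = logging.getLogger("acquisition")
log.setLevel(logging.DEBUG)

file_handler = logging.FileHandler("acquisition.log")
file_handler.setLevel(logging.DEBUG)           # detailed tracing
log.addHandler(file_handler)

alerts = []                                    # stand-in for e-mail/SMS/lights
class AlertHandler(logging.Handler):
    def emit(self, record):
        alerts.append(self.format(record))

log.addHandler(AlertHandler(level=logging.ERROR))

log.info("scan started")                       # goes to the log file only
log.error("detector buffer overflow")          # also triggers an alert
print(alerts)
```

The per-handler level is what makes the scheme configurable: operators choose which severities reach which channel without touching the acquisition code.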
UNDETECTED data corruption is the biggest risk, i.e. undetected mismatch between actual instrument settings and settings saved as meta data, or corrupted data points. DETECTED data corruption can be compensated or measurements can be repeated.
The device control and acquisition bring soft real-time requirements (fill and empty device buffers while a scan is running; coordinate device responses and error states).
Scan rates of 2000 fps (absolute minimum), target 100,000 fps (already in development, TODO at which resolution?), optimal 1,000,000 fps (Strong binning? Preprocessing on the device? Conventional detector data) should be supported by the architecture, at variable resolution or binning. TODO what do scan systems allow in terms of scan rate, resolution and such? FASTER is always an obvious direction! Physical limits around 1e9 fps or even faster?
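A back-of-the-envelope calculation shows why these rates are demanding. Assuming 256×256 pixel frames at 16 bits per pixel (both numbers are illustrative assumptions, not a specification):

```python
# Raw data rates at the scan rates above, assuming 256x256 pixel frames
# at 16 bits per pixel (illustrative assumptions, not a specification).
frame_bytes = 256 * 256 * 2          # 131072 bytes per frame

rates = {}
for fps in (2_000, 100_000, 1_000_000):
    rates[fps] = fps * frame_bytes
    print(f"{fps:>9,} fps -> {rates[fps] / 1e9:7.2f} GB/s")
```

At full resolution the 100,000 fps target already lands around 13 GB/s, well beyond a single NVMe drive, which is why binning, on-device preprocessing and DMA appear below.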
-
API compatible with DMA, moving large amounts of data around very quickly.
- Configure device, transfer memory addresses
- Execute command, device reads or writes memory
- Block until done or callback?
- Use the data, move it to another device or write to disk
- Error handling
- What does a userspace API for such a system look like?
- How to make sure such amounts of data are written to disk in time? Work like Gatan with a whole acquisition framework?
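As a thought experiment, the steps above might map onto an API shaped like the mock below (pure Python, no real driver binding; every name is invented):

```python
# Hypothetical shape of a userspace acquisition API following the steps
# above: configure, execute, block or callback, consume, error handling.
# This is a mock for illustration, not a real driver binding.
class AcquisitionDevice:
    def __init__(self):
        self.buffers = []
        self.error = None

    def configure(self, n_buffers, buffer_size):
        # Step 1: allocate DMA-capable buffers, hand addresses to the device.
        self.buffers = [bytearray(buffer_size) for _ in range(n_buffers)]

    def execute(self, on_done=None):
        # Step 2: start the transfer; here we just fill the buffers.
        try:
            for buf in self.buffers:
                buf[:] = b"\x01" * len(buf)   # device "writes" into memory
        except Exception as exc:              # error handling step
            self.error = exc
            raise
        if on_done is not None:               # callback variant of "block or callback"
            on_done(self.buffers)

dev = AcquisitionDevice()
dev.configure(n_buffers=2, buffer_size=8)
results = []
dev.execute(on_done=results.append)           # consume: hand buffers onward
print(len(results[0]), bytes(results[0][0][:4]))
```

Preallocated buffers handed to the device by address are the key point: the userspace side only recycles them, so no copies happen on the hot path.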
-
Is it viable to keep the raw data in all circumstances or should we immediately preprocess?
- Both options should be possible!
- Feed data into hardware accelerators, GPUs and such and pull back out
Data processing and visualization should give an immediate visual feedback.
- Keep up with data rates!
- Use DMA, GPU, OpenGL, parallel processing as well.
- Python is only the glue language; the real work is done at a low level in a compiled language with parallel processing
- GUI probably C++ and not Python? Python as an internal scripting language?
- All parts of the software are optimized for high throughput. Use high-performance libraries, be careful with Python!
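The intended division of labour can be illustrated with a streaming reduction: Python orchestrates, and the per-pixel loop is exactly the part that would be handed to compiled code. A stdlib-only sketch with a fake frame source:

```python
from array import array

# Pattern sketch for immediate visual feedback: maintain a running sum
# image while frames stream in. In production the inner loop would be
# replaced by compiled code (C/C++ or similar); this shows the data flow.
H = W = 4                                   # tiny frame size for illustration

def frame_stream(n_frames):
    # Stand-in for a detector delivering frames one by one.
    for i in range(n_frames):
        yield array("H", [i] * (H * W))

sum_image = array("Q", [0] * (H * W))
for frame in frame_stream(3):
    for px in range(H * W):                 # the part to push into native code
        sum_image[px] += frame[px]

print(sum_image[0])                         # frames contribute 0 + 1 + 2
```

Because the reduction is incremental, a GUI can redraw the sum image after every frame (or every chunk) without waiting for the scan to finish.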
The software should be designed for high efficiency so that it runs on older, cheaper hardware, if possible.
It should make full use of the capabilities of multicore processors, large memory, GPUs and other accelerators, but run on a plain run-of-the-mill laptop, i.e. no hard dependency on specific hardware.
https://dask.pydata.org/en/latest/
https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html
https://support.hdfgroup.org/HDF5/PHDF5/
https://mathema.tician.de/software/pycuda/
https://scikit-cuda.readthedocs.io/en/latest/
http://mpi4py.scipy.org/docs/
http://eprints.maynoothuniversity.ie/8851/
http://iopscience.iop.org/article/10.1088/1748-0221/11/12/C12023/meta
http://iopscience.iop.org/article/10.1088/1748-0221/11/02/P02007/meta
Or OpenCL analogs to make it more portable? Der
Agile, dynamic, distributed development --> needs automated test framework and continuous integration etc.
- Easy update and installation from the source tree; professional build and installation process from source
- Automated tests on all supported platforms before changes are integrated into main repository
- Coding conventions, docstrings and such to make the code readable
- Patch management / change management like Hyperspy or Linux kernel
- Development branches for new features
- Nightly builds with platform-specific installers
- Changelogs
- Beta releases with platform-specific installers
- Stable releases with maintenance with platform-specific installers
- Issue tracker and communication platform with all bells and whistles
- Communication of changes, give community time to respond. Predictable, responsible behaviour towards users. Don't break userspace!
Upgrade path from older to newer releases
- Import settings
- File converters
- User support
- …?
Human and instrument safety should not be handled by the system because it is designed as a flexible development platform, not as a safety-relevant subsystem. Interlocks, emergency stop etc. handled by hardware or dedicated embedded software, NEVER by this user-space system! --> Safety-relevant SW requires a different development model, must be compartmentalized.
Remaining risks include
- Undetected data corruption
- Mismatch between actual instrument settings and stored metadata
- verify after setting states;
- make sure drivers etc. are well-behaved;
- implement automated test and calibration routines
- Detected data corruption, loss of data or unusable data
- Data loss upon loss of power (large amounts of data stored in volatile memory)
- keep syncing to disk, ideally in a format that supports transactions, i.e. without invalid intermediate states, or with easy repair of partially written files
- Use transaction capabilities of modern journaling file systems?
- Keep a record of what has been done, in particular for automated, scripted measurements
- Separate low-level data transfer and storage from resource-intensive processing, use real time processing frameworks for low-level stuff where appropriate
- Overwriting, deleting, changing data during analysis. Issue: The volume of data might require data reduction during or after acquisition. How to make sure that accidental errors can still be undone?
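The "no invalid intermediate states" point can be approximated even without a transactional file format: write to a temporary file in the same directory, fsync, then rename atomically, so a crash leaves either the old file or the complete new one. A sketch using only the standard library (the file name is an example):

```python
import os
import tempfile

# Crash-safe saving as discussed above: write to a temporary file in
# the same directory, flush and fsync, then atomically rename. A
# reader never observes a partially written file.
def atomic_write(path, data: bytes):
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())          # force data out of volatile memory
        os.replace(tmp, path)             # atomic rename, also on Windows
    except BaseException:
        os.unlink(tmp)                    # clean up the half-written temp file
        raise

atomic_write("frame_0001.raw", b"\x00" * 64)
print(os.path.getsize("frame_0001.raw"))
```

The temp file must live in the same directory as the target, because rename is only atomic within one file system.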
Are you aware of any applicable standards or regulations for the solution? Examples would be x-ray safety, electrical safety, machinery directive, SEMI norms, and so on.
Native file formats (HDF5, MRC, NetCDF and the like?)
- See File format for specific links
Container formats (folder structures, ZIP, file systems, VMDK, VDI, VHD, …)
Import and export formats and filters (TODO list formats to support)
Device driver APIs?
Hardware interfaces and APIs – Ethernet, USB, PCIe, …
Network protocols?
Quasi-standard processing and visualization libraries and APIs? Hyperspy, Scipy, Numpy, Boost, OpenGL, OpenCL, DirectX, SDL, … TODO collect appropriate libraries to consider