Speed up simulations #232

viljarjf · 2024-12-10T15:43:59Z

Description of the change

The scope of this has expanded a little. Now fixes a bug with the direct beam being added twice, implements #234, as well as speeding up computations.
This is done in three main ways:

Use a new RotatedPhase class to keep track of rotated hkls, which lets us freely break orix.Phase's assumption of axis alignment and avoid some computation
Rotate Ewald's sphere (a single vector) instead of rotating all the reciprocal vectors, before checking for intersections
Compute excitation error and Ewald's sphere intersection using numba

Some other optimizations:

Compute intensities for all vectors once, instead of re-computing every time a reflection is intersected
(this can increase runtime when requesting only a few rotations, but massively speed up calculations if the structure has many atoms)
Use symmetry when computing intensities, to avoid re-computing known intensities
( I haven't actually checked if calculating structure factors is more expensive than determining unique reflections...)

Profiling now indicates around equal runtime is given to finding intersections, rotating the intersected vectors, and everything else.
"Everything else" includes mostly DiffractingVector.__getitem__, calculating all the intensities, and finding all reflections within the given d_min.

Progress of the PR

Avoid deepcopy when rotating basis in DiffractingVector.rotate_with_basis
Avoid expensive deepcopy when rotating Miller orix#533
Look into rotating Ewald's sphere instead of the lattice
Add function to expand asymmetric unit cell of Phase orix#544
Add option to NOT discard zero in Object3d.unique orix#545
Docstrings for all functions
Unit tests with pytest for all lines - we could add some test where we store the result from the main branch and compare with this and future outputs
Increase coverage
Clean code style by running black

For reviewers

The PR title is short, concise, and will make sense 1 year later.
New functions are imported in corresponding __init__.py.
New features, API changes, and deprecations are mentioned in the
unreleased section in CHANGELOG.rst.
Contributor(s) are listed correctly in credits in diffsims/release_info.py and
in .zenodo.json.

viljarjf · 2024-12-13T09:25:39Z

Performance testing

Using the following script to test before/after, testing different crystals (hexagonal, cubic and triclinic) and both small and large unit cells (to decrease/increase the number of reflections - which should be where the greatest performance gain is realized):

from diffsims.generators.simulation_generator import SimulationGenerator
from orix.crystal_map import Phase
from orix.quaternion import Rotation
from diffpy.structure import Lattice, Atom, Structure

import numpy as np
import timeit

gold = [Atom("Au", [0, 0, 0])]
hexagonal = Phase("test", 161, structure=Structure(gold,
        Lattice(4, 4, 5, 90, 90, 120)
))
large_hexagonal = Phase("test", 161, structure=Structure(gold,
        Lattice(20, 20, 25, 90, 90, 120)
))
cubic = Phase("test", 221, structure=Structure(gold,
        Lattice(4, 4, 4, 90, 90, 90)
))
large_cubic = Phase("test", 221, structure=Structure(gold,
        Lattice(20, 20, 20, 90, 90, 90)
))
triclinic = Phase("test", 1, structure=Structure(gold,
        Lattice(4, 5, 6, 80, 90, 130)
))
large_triclinic = Phase("test", 1, structure=Structure(gold,
        Lattice(20, 25, 30, 80, 90, 130)
))

gen = SimulationGenerator()
from numpy import random
random.seed(0)
rot = Rotation.random(1000)
kwargs = {
    "rotation": rot,
    "with_direct_beam": False,
    "reciprocal_radius": 5,
}

res = timeit.repeat(
    "gen.calculate_diffraction2d(hexagonal, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'hexagonal' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

res = timeit.repeat(
    "gen.calculate_diffraction2d(cubic, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'cubic' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

res = timeit.repeat(
    "gen.calculate_diffraction2d(triclinic, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'triclinic' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

kwargs["reciprocal_radius"] = 2 # we still get a huge number of reflections
res = timeit.repeat(
    "gen.calculate_diffraction2d(large_hexagonal, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'large_hexagonal' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

res = timeit.repeat(
    "gen.calculate_diffraction2d(large_cubic, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'large_cubic' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

res = timeit.repeat(
    "gen.calculate_diffraction2d(large_triclinic, **kwargs)",
    globals=globals(),
    number=2,
)
print(f"{'large_triclinic' :<18}: {np.mean(res) :.2f} ± {np.std(res) :.2f} s")

Running on current main branch of diffsims and orix:

hexagonal         : 7.16 ± 0.36 s
cubic             : 7.29 ± 0.48 s
triclinic         : 9.94 ± 0.12 s
large_hexagonal   : 80.11 ± 2.59 s
large_cubic       : 79.15 ± 5.94 s
large_triclinic   : 94.56 ± 3.78 s

And with the full changes in this branch + Orix:

hexagonal         : 3.32 ± 0.15 s
cubic             : 3.16 ± 0.11 s
triclinic         : 3.53 ± 0.13 s
large_hexagonal   : 29.00 ± 0.07 s
large_cubic       : 28.82 ± 0.23 s
large_triclinic   : 35.59 ± 0.12 s

So roughly 2-3x speedup. Not sure what I did to get the 5-6x speedup I saw before, probably some mistake.

When adding precession, the speedup is negligible. It might even be slower. Below is the script run with 1 degree precession, run on a different computer than above, so with/without precession times are not comparable.
main branch:

hexagonal         : 4.71 ± 0.12 s
cubic             : 4.84 ± 0.08 s
triclinic         : 5.08 ± 0.03 s
large_hexagonal   : 70.00 ± 5.09 s
large_cubic       : 67.83 ± 3.98 s
large_triclinic   : 82.24 ± 7.92 s

This branch:

hexagonal         : 4.13 ± 0.06 s
cubic             : 3.97 ± 0.02 s
triclinic         : 4.93 ± 0.20 s
large_hexagonal   : 85.45 ± 3.43 s
large_cubic       : 82.18 ± 3.69 s
large_triclinic   : 86.97 ± 13.73 s

CSSFrancis · 2025-01-24T20:57:44Z

@viljarjf I've thought about parallelizing this a couple of times. I profiled the simulation as well but I can't remember what exactly was slow.

I think that there are a couple of places we could use numba and be quite a bit faster. I'd be hesistant to add something like dask as that's a bit of a larger dependency.

Most of the reason that I haven't gotten around to parallelizing this is because in most of the cases the crystal strucutures I've been using have been pretty simple so the simulation cost is fairly small. I'm all for the idea and can help especially if there is a good reason to do so.

… missing)

viljarjf · 2025-01-27T16:03:56Z

Readthedocs build fails since the two orix PRs are not added, I just made a local branch and merged both. All tests pass with that, at least on python 3.10. I'll update the orix version once it's published as a proper release. There will probably be a merge conflict with #233 too. Other than that, I'm happy to call this finished. Leaving as draft untill the mentioned hangups are ready.

The runtime depends a lot on the number of rotations, the number of atoms in the structure, the resolution/number of reflections in each pattern ect. From some simple testing, the runtime is spent on rotating vectors, finding intersections with Ewald's sphere, and finding symmetrically unique reflections. I cannot personally think of much beyond parallelization which would help much with runtime reduction now, at least without a lot of refactoring, but I'll gladly be proved wrong!

@CSSFrancis I refactored the function responsible for calculating excitation error so it, at least, is now using numba. Parallelization can come later, I think it will need some more refactoring to leverage numba when dealing with ragged data.
As for a reason for optimizing this, I'm trying to simulate patterns for a large crystal (therefore many reflections) with many atoms, to test some processing workflows. That's why I made #235, I had some issues when spots were moved to the nearest pixel

viljarjf added 7 commits December 10, 2024 16:20

Subclass Phase for faster initialization

302ecfa

Rotate Ewald's sphere instead of lattice

a06f2e6

Support precession

ebe79c6

Working precession, need to test for correctness

ce28dc0

Support direct beam

1780e38

Fix formatting

f931d73

Fix bug where direct beam is added twice

bccd856

viljarjf added 2 commits December 13, 2024 10:54

Use rotate_with_basis

909964c

Return self.__class__ instead of explicit ReciprocalLatticeVector

607cc44

viljarjf marked this pull request as ready for review December 13, 2024 14:58

viljarjf mentioned this pull request Jan 23, 2025

Add option to have spot centers off pixel centers #235

Open

7 tasks

Adress pyxem#234

16509b7

viljarjf marked this pull request as draft January 23, 2025 16:19

Getting closer, still some tests left

8a7b314

viljarjf added 6 commits January 25, 2025 09:36

Ensure compatible ordering of unique vectors

6ef265f

Fix test by using symmetrically unique reflections (4 degenerate were…

54b77f6

… missing)

Use Phase.expand_asymmetric_unit

a55784d

Simplify DiffractingVector.__getitem__

e3b09d8

Use numba for speedup

7e307fb

Formatting

8d2c0f3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up simulations #232

Speed up simulations #232

viljarjf commented Dec 10, 2024 •

edited

Loading

viljarjf commented Dec 13, 2024 •

edited

Loading

CSSFrancis commented Jan 24, 2025

viljarjf commented Jan 27, 2025

Speed up simulations #232

Are you sure you want to change the base?

Speed up simulations #232

Conversation

viljarjf commented Dec 10, 2024 • edited Loading

Description of the change

Progress of the PR

For reviewers

viljarjf commented Dec 13, 2024 • edited Loading

Performance testing

CSSFrancis commented Jan 24, 2025

viljarjf commented Jan 27, 2025

viljarjf commented Dec 10, 2024 •

edited

Loading

viljarjf commented Dec 13, 2024 •

edited

Loading