Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Structure.get_symmetry_dataset convenience method #4268

Merged
merged 5 commits into from
Jan 28, 2025

Conversation

janosh
Copy link
Member

@janosh janosh commented Jan 27, 2025

moyopy is a successor to spglib that's faster and written in Rust. in my usage so far for analyzing MLFF-relaxed WBM structures for https://matbench-discovery.materialsproject.org/tasks/geo-opt, it seems about 4x faster than spglib and gives identical results in nearly all 257k cases. a few exceptions were reported in CompRhys/aviary#96 and resolution tracked in spglib/moyo#54

this PR adds a convenience method for directly accessing moyopy symmetry analysis results on pymatgen Structures. it also defines a new optional dependency set symmetry in pyproject.toml

symmetry = ["moyopy>=0.3", "spglib>=2.5"]

pinging @lan496

related ASE issue: https://gitlab.com/ase/ase/-/issues/1612

@janosh
Copy link
Member Author

janosh commented Jan 27, 2025

it occurs to me i forgot to add a @due.dcite decorator to this method. @lan496 do you have something you'd like users of this method to cite? maybe just a Zenodo release or a citation.cff file for now? here's a similar example:

@due.dcite(
Doi("10.1016/j.commatsci.2017.01.017"),
description="The AFLOW library of crystallographic prototypes: part 1.",
)
class AflowPrototypeMatcher:

Copy link
Member

@shyuep shyuep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Please see comments.

@@ -48,6 +48,7 @@
from collections.abc import Callable, Iterable, Iterator, Sequence
from typing import Any, ClassVar, SupportsIndex, TypeAlias

import moyopy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes moyopy an actual dependency. Not an optional one. This needs to be in a try catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's inside if TYPE_CHECKING:, so only imported at type-checking time

@@ -3338,6 +3339,29 @@ def to_conventional(self, **kwargs) -> Structure:
"""
return self.to_cell("conventional", **kwargs)

def get_moyo_dataset(self, **kwargs) -> moyopy.MoyoDataset:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the method name, it is unclear it is a symmetry related function. I would name it get_moyo_symmetry_dataset. Alternatively, why not just put the same functionality in get_symmetry_dataset and have a new kwarg called use_moyo=False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_moyo_symmetry_dataset sounds good.

Alternatively, why not just put the same functionality in get_symmetry_dataset and have a new kwarg called use_moyo=False.

i'm reluctant to do that as you could no longer have a single return type and so you would give up on auto-complete suggestions in your IDE:

Screenshot 2025-01-27 at 12 13 12

Copy link
Member

@shyuep shyuep Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we do a Structure.get_symmetry_dataset that uses either spglib (still default) or moyopy. In either case, the return type should be a dict with consistent keys and formatting. We are getting symmetry information. Users should not need to have to figure out the format of the moyopy dataset and spglib dataset. The underlying tool is of secondary importance. Also, in future, we can easily change the default tool to moyopy (when it is more mature) without causing backwards incompatibility,.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides the better auto-complete UX, directly returning the Moyo/SpglibDataset also has the benefit of reducing maintenance burden. when new keys and symmetry results are added to moyopy in the future, they don't have to be translated from MoyoDataset to dict format

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have one method name for both backends, we would need to make sure the return value has some kind of useful type (eg some minimal dataclass) and not just a bare dict, otherwise I agree the maintenance burden would be too high when the return values inevitably become inconsistent between the different backends.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could include converting the spglib dict into the MoyoDataset format if it seems appropriate, but we really should just have one type returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that maintenance burden is on us, but it is important one to do. That's what the tests and CI are for. I would rather not pass on the burden to the user. Say if one day one of the codes change the dict key from "spacegroup" to "space_group", do we want our users (which includes MP) to have to worry about that?

For users who want the full moyopy / spglib dataset and the latest changes, they can call moyopy directly. I am not concerned about such advanced users. I am concerned about the 90% of regular users who are looking for a basic symmetry dataset without having to worry about changes in the backend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we take on the maintenance burden. We should still have a good idea of exactly what is being returned here especially if there are multiple backends.

As a proposal, how about we have get_symmetry_dataset -> moyopy.MoyoDataset | spglib.SpglibDataset as suggested, but we use typing.overload to ensure everything is typed nicely?

Then we get the benefit of the general method name, the multiple backends, and the typing being correct.

@lan496
Copy link
Contributor

lan496 commented Jan 28, 2025

do you have something you'd like users of this method to cite? maybe just a Zenodo release or a citation.cff file for now?

@janosh For now, I think the citation for spglib is enough.

@article{spglib,
  author = {Atsushi Togo, Kohei Shinohara and Isao Tanaka},
  title = {Spglib: a software library for crystal symmetry search},
  journal = {Sci. Technol. Adv. Mater., Meth.},
  volume = {4},
  number = {1},
  pages = {2384822--2384836},
  year = {2024},
  doi = {10.1080/27660400.2024.2384822},
  url = {https://doi.org/10.1080/27660400.2024.2384822},
}

…port, currently moyopy and spglib

- remove get_moyo_dataset method
- implement method overloading for type hints
- update tests to cover both backends and error cases
@janosh janosh changed the title Add Structure.get_moyo_dataset convenience method Add Structure.get_symmetry_dataset convenience method Jan 28, 2025
@janosh
Copy link
Member Author

janosh commented Jan 28, 2025

remaining question after 8974a61: what should the default symprec and angle_tolerance across moyopy and spglib be. currently each code uses different defaults which can result in different symmetries returned by each backend. i believe the current pymatgen default symprec=0.1 is far too loose. maybe the spglib default of 1e-5 is a bit too strict. i advocate we adopt moyopy's default of 1e-4. curious to hear your thoughts @lan496. also, the angle_tolerance in moyopy is in radians whereas spglib uses degrees, so we need to take care of unit conversion and also pick a consistent default

- also include citation details in method doc string
- update pyproject.toml to install symmetry optional deps in CI
@mkhorton
Copy link
Member

Also curious to hear a perspective on that, concur the 0.1 default is probably far too loose. I think maybe fine for purposes of loosely grouping structures, for rough relaxations etc. We should probably add a note to the docstring to give a hint to the user what symprecs might be appropriate for what use cases.

@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

I don't have a strong opinion on symprec, but I will give some context. The general idea of all structure / symmetry comparisons in pymatgen is this: if I take two structures that have the "same" (either in structure matcher or in symmetry) and relax them in DFT, would I end up in the same place? In that context, 0.1A is good enough. It is highly unlikely that two structures that differ by 0.1A would relax to two different final structures or symmetries.

For spglib, I have found that sometimes there are some paradoxical behavior. A structure would fail with either too low or too high symmprec, but is ok in some intermediate range of symprec.

In the end, it really depends on your philosophy. Basically, if I take a structure and just moves one of the atoms by 0.11A, do you want that to break symmetry?

@shyuep shyuep merged commit 4375e34 into materialsproject:master Jan 28, 2025
43 checks passed
shyuep added a commit that referenced this pull request Jan 28, 2025
@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

Sorry, I accidentally merged. Feel free to submit a separate PR to change the default symprec.

@janosh
Copy link
Member Author

janosh commented Jan 28, 2025

Feel free to submit a separate PR to change the default symprec.

happy to but need some consensus on what the default should be. unless i hear otherwise, it'll be 1e-4. @lan496 what would you recommend for angle_tolerance?

@janosh janosh deleted the struct-get-moyo-dataset branch January 28, 2025 22:49
@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

@janosh I propose 0.01A. I think 1e-4 is far too strict. 0.01 is already an order of magnitude smaller than the shortest chemical bond.

@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

I can just make this change myself.

@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

One thing to note, there are severe inconsistencies between moyopy and spglib. E.g., spglib returns the international symbol (which any sane symmetry analysis code would do). moyopy just returns the number. This makes it even more imperative that the default return type needs to be a consistent dict / named tuple. Otherwise the end user is going to be confused.

@janosh
Copy link
Member Author

janosh commented Jan 28, 2025

You can get both from moyopy. The reason I think 0.1 is a bad default symprec is you shouldn't make the assumption for the user that they want the symmetry of the structure that their input might relax to. By default, you should assume they want the symmetry for the actual structure they pass.

@shyuep
Copy link
Member

shyuep commented Jan 28, 2025

I implemented some basic changes. Feel free to make further mods and submit a PR.

I don't necessarily disagree that we should return the symmetry of the structure that they pass. But that is a slippery argument. Symmetry is always based on tolerances. One can argue that even if an atom moves by 1e-100 A from the correct position, that already breaks symmetry in the strictest sense of the word. Surely we can all agree that a 1e-100 tolerance is too strict and will cause too many structures to be rendered P1.

The logical determination of symmetry tolerances should be based on real-world perturbations with an actual impact on properties. If a 0.01A shift has negligible impact on properties and is in fact 10 times smaller than the shortest chemical bond, it is good enough to say that the structures are symmetrically equivalent. We do not need a stricter cutoff based on some mathematical argument.

@lan496
Copy link
Contributor

lan496 commented Jan 29, 2025

From an implementation perspective, a symmetry finder (spglib and moyopy) struggles with too small or big symprec.
For too small symprec (say ~1e-10), the numerical accuracy for floating-point number may result in inconsistency: for example, detected symmetry operations with the large symprec may not form a group.
For too large symprec (~0.1), I sometimes observed weird behaviors as well. I think it comes from the inconsistency in the detected translation subgroup and symmetry operations.

Recently, I've performed a benchmark of these symmetry finders with several symprec choices to Materials Project's entries (link). Let me check how symprec affects detected space-group types.

shyuep added a commit that referenced this pull request Jan 29, 2025
@shyuep
Copy link
Member

shyuep commented Jan 29, 2025

You cannot determine the necessary symprec simply from just running on all MP structures. If symprec = 0.1 gives cubic and symprec = 0.00001 gives orthorhombic, which is correct?

At room temperature (300K), the typical atomic vibration amplitude is around 0.05 - 0.1 A. We can agree that thermal vibrations shouldn't break symmetry. So a tolerance of 0.01A is fine. (This is already the default for the Structure.get_space_group method).

@lan496
Copy link
Contributor

lan496 commented Jan 30, 2025

I do not intend to suggest determining a default symprec from statistics.

The following figures show how detected crystal systems are changed by symprec. Low-symmetry systems (triclinic and monoclinic) are especially affected by symprec choice as expected.

image

image

Interactive HTML for these figures:
sankey.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants