-
Notifications
You must be signed in to change notification settings - Fork 882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Structure.get_symmetry_dataset
convenience method
#4268
Add Structure.get_symmetry_dataset
convenience method
#4268
Conversation
it occurs to me i forgot to add a pymatgen/src/pymatgen/analysis/prototypes.py Lines 31 to 35 in 296a0ec
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Please see comments.
@@ -48,6 +48,7 @@ | |||
from collections.abc import Callable, Iterable, Iterator, Sequence | |||
from typing import Any, ClassVar, SupportsIndex, TypeAlias | |||
|
|||
import moyopy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes moyopy an actual dependency. Not an optional one. This needs to be in a try catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's inside if TYPE_CHECKING:
, so only imported at type-checking time
src/pymatgen/core/structure.py
Outdated
@@ -3338,6 +3339,29 @@ def to_conventional(self, **kwargs) -> Structure: | |||
""" | |||
return self.to_cell("conventional", **kwargs) | |||
|
|||
def get_moyo_dataset(self, **kwargs) -> moyopy.MoyoDataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the method name, it is unclear it is a symmetry related function. I would name it get_moyo_symmetry_dataset. Alternatively, why not just put the same functionality in get_symmetry_dataset and have a new kwarg called use_moyo=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_moyo_symmetry_dataset
sounds good.
Alternatively, why not just put the same functionality in get_symmetry_dataset and have a new kwarg called use_moyo=False.
i'm reluctant to do that as you could no longer have a single return type and so you would give up on auto-complete suggestions in your IDE:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest we do a Structure.get_symmetry_dataset
that uses either spglib (still default) or moyopy. In either case, the return type should be a dict with consistent keys and formatting. We are getting symmetry information. Users should not need to have to figure out the format of the moyopy dataset and spglib dataset. The underlying tool is of secondary importance. Also, in future, we can easily change the default tool to moyopy (when it is more mature) without causing backwards incompatibility,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides the better auto-complete UX, directly returning the Moyo/SpglibDataset
also has the benefit of reducing maintenance burden. when new keys and symmetry results are added to moyopy
in the future, they don't have to be translated from MoyoDataset
to dict
format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have one method name for both backends, we would need to make sure the return value has some kind of useful type (eg some minimal dataclass) and not just a bare dict, otherwise I agree the maintenance burden would be too high when the return values inevitably become inconsistent between the different backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could include converting the spglib dict into the MoyoDataset
format if it seems appropriate, but we really should just have one type returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that maintenance burden is on us, but it is important one to do. That's what the tests and CI are for. I would rather not pass on the burden to the user. Say if one day one of the codes change the dict key from "spacegroup" to "space_group", do we want our users (which includes MP) to have to worry about that?
For users who want the full moyopy / spglib dataset and the latest changes, they can call moyopy directly. I am not concerned about such advanced users. I am concerned about the 90% of regular users who are looking for a basic symmetry dataset without having to worry about changes in the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we take on the maintenance burden. We should still have a good idea of exactly what is being returned here especially if there are multiple backends.
As a proposal, how about we have get_symmetry_dataset -> moyopy.MoyoDataset | spglib.SpglibDataset
as suggested, but we use typing.overload
to ensure everything is typed nicely?
Then we get the benefit of the general method name, the multiple backends, and the typing being correct.
@janosh For now, I think the citation for spglib is enough.
|
…port, currently moyopy and spglib - remove get_moyo_dataset method - implement method overloading for type hints - update tests to cover both backends and error cases
Structure.get_moyo_dataset
convenience methodStructure.get_symmetry_dataset
convenience method
remaining question after 8974a61: what should the default |
- also include citation details in method doc string
- update pyproject.toml to install symmetry optional deps in CI
… moyopy[interface]`
Also curious to hear a perspective on that, concur the 0.1 default is probably far too loose. I think maybe fine for purposes of loosely grouping structures, for rough relaxations etc. We should probably add a note to the docstring to give a hint to the user what symprecs might be appropriate for what use cases. |
I don't have a strong opinion on symprec, but I will give some context. The general idea of all structure / symmetry comparisons in pymatgen is this: if I take two structures that have the "same" (either in structure matcher or in symmetry) and relax them in DFT, would I end up in the same place? In that context, 0.1A is good enough. It is highly unlikely that two structures that differ by 0.1A would relax to two different final structures or symmetries. For spglib, I have found that sometimes there are some paradoxical behavior. A structure would fail with either too low or too high symmprec, but is ok in some intermediate range of symprec. In the end, it really depends on your philosophy. Basically, if I take a structure and just moves one of the atoms by 0.11A, do you want that to break symmetry? |
Sorry, I accidentally merged. Feel free to submit a separate PR to change the default symprec. |
happy to but need some consensus on what the default should be. unless i hear otherwise, it'll be |
@janosh I propose 0.01A. I think 1e-4 is far too strict. 0.01 is already an order of magnitude smaller than the shortest chemical bond. |
I can just make this change myself. |
One thing to note, there are severe inconsistencies between moyopy and spglib. E.g., spglib returns the international symbol (which any sane symmetry analysis code would do). moyopy just returns the number. This makes it even more imperative that the default return type needs to be a consistent dict / named tuple. Otherwise the end user is going to be confused. |
You can get both from moyopy. The reason I think 0.1 is a bad default symprec is you shouldn't make the assumption for the user that they want the symmetry of the structure that their input might relax to. By default, you should assume they want the symmetry for the actual structure they pass. |
I implemented some basic changes. Feel free to make further mods and submit a PR. I don't necessarily disagree that we should return the symmetry of the structure that they pass. But that is a slippery argument. Symmetry is always based on tolerances. One can argue that even if an atom moves by 1e-100 A from the correct position, that already breaks symmetry in the strictest sense of the word. Surely we can all agree that a 1e-100 tolerance is too strict and will cause too many structures to be rendered P1. The logical determination of symmetry tolerances should be based on real-world perturbations with an actual impact on properties. If a 0.01A shift has negligible impact on properties and is in fact 10 times smaller than the shortest chemical bond, it is good enough to say that the structures are symmetrically equivalent. We do not need a stricter cutoff based on some mathematical argument. |
From an implementation perspective, a symmetry finder (spglib and moyopy) struggles with too small or big symprec. Recently, I've performed a benchmark of these symmetry finders with several symprec choices to Materials Project's entries (link). Let me check how symprec affects detected space-group types. |
You cannot determine the necessary symprec simply from just running on all MP structures. If symprec = 0.1 gives cubic and symprec = 0.00001 gives orthorhombic, which is correct? At room temperature (300K), the typical atomic vibration amplitude is around 0.05 - 0.1 A. We can agree that thermal vibrations shouldn't break symmetry. So a tolerance of 0.01A is fine. (This is already the default for the Structure.get_space_group method). |
I do not intend to suggest determining a default symprec from statistics. The following figures show how detected crystal systems are changed by symprec. Low-symmetry systems (triclinic and monoclinic) are especially affected by symprec choice as expected. Interactive HTML for these figures: |
moyopy
is a successor tospglib
that's faster and written in Rust. in my usage so far for analyzing MLFF-relaxed WBM structures for https://matbench-discovery.materialsproject.org/tasks/geo-opt, it seems about 4x faster thanspglib
and gives identical results in nearly all 257k cases. a few exceptions were reported in CompRhys/aviary#96 and resolution tracked in spglib/moyo#54this PR adds a convenience method for directly accessing
moyopy
symmetry analysis results on pymatgenStructures
. it also defines a new optional dependency setsymmetry
inpyproject.toml
pinging @lan496
related ASE issue: https://gitlab.com/ase/ase/-/issues/1612