v3.4.0: Plotting class labels, RELION 3.1 support, and phase-randomization for FSCs #399
I've added some more commits to address #113 and to fix a bug in the FSC correction, which now detects mask tightness as expected. Will be merging shortly!
In this minor release we are adding several new features and commands, as well as expanding a few existing ones and introducing some key refactorings to the codebase to make these changes easier to implement.
New features
- full support for RELION 3.1 `.star` files with optics values stored in a separate grouped table before or after the main table (Support parsing multiple optics groups #241, Automatically parse CTF parameters in RELION 3.1 star files #40, parsing relion 3.1 star file doesn't work #10)
  - `Starfile` class now has properties `.apix` and `.resolution` that return particle-wise optics values for commonly used parameters, as well as methods `.get_optics_values()` and `.set_optics_values()` for any parameter
  - `cryodrgn parse_ctf_star` can now load all particle-wise optics values from the .star file itself instead of the current behavior of relying upon user input for parameters such as A/px, resolution, voltage, spherical aberration, etc., or just taking the first value found in the file
- `backproject_voxel` now computes FSC threshold values corrected for mask overfitting using high resolution phase randomization as done in cryoSPARC, as well as showing FSC curves and threshold values for various types of masks
- `cryodrgn_utils plot_classes` for creating plots of cryoDRGN results colored by a given set of particle class labels
  - for now, only creates 2D kernel density plots of the latent space embeddings clustered using UMAP and PCA, but more plots will be added in the future (figure: `analyze.9/umap_kde_classes.png`)
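The phase-randomization correction mentioned above can be illustrated with a short, dependency-free sketch. It assumes the commonly used noise-substitution formula, FSC_true = (FSC_masked - FSC_rand) / (1 - FSC_rand), applied only to shells at or beyond the randomization cutoff. The function names, the `cutoff_index` parameter, and the 0.143 default threshold are illustrative choices, not cryoDRGN's actual API, and the `backproject_voxel` implementation may differ in detail.

```python
from typing import List, Optional, Sequence


def correct_fsc(
    fsc_masked: Sequence[float],
    fsc_randomized: Sequence[float],
    cutoff_index: int,
) -> List[float]:
    """Correct a masked FSC curve using a phase-randomized FSC curve.

    Shells before `cutoff_index` were not randomized, so they are kept as-is.
    """
    if len(fsc_masked) != len(fsc_randomized):
        raise ValueError("FSC curves must have the same number of shells")

    corrected = []
    for i, (f_t, f_n) in enumerate(zip(fsc_masked, fsc_randomized)):
        if i < cutoff_index or f_n >= 1.0:
            # Not randomized (or degenerate denominator): keep the masked value.
            corrected.append(f_t)
        else:
            corrected.append((f_t - f_n) / (1.0 - f_n))
    return corrected


def first_crossing(
    freqs: Sequence[float],
    fsc: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Return the first frequency at which the curve drops below `threshold`."""
    for freq, value in zip(freqs, fsc):
        if value < threshold:
            return freq
    return None


if __name__ == "__main__":
    # Made-up curves, for illustration only.
    freqs = [0.0, 0.1, 0.2, 0.3]
    masked = [1.0, 0.9, 0.6, 0.3]
    randomized = [1.0, 0.9, 0.2, 0.2]
    corrected = correct_fsc(masked, randomized, cutoff_index=2)
    print(corrected, first_crossing(freqs, corrected))
```

When the masked curve stays high mainly because of the mask itself, the randomized curve also stays high and the corrected curve falls below the threshold earlier, which is how an overly tight mask shows up.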
Improvements to existing features
- `backproject_voxel` also now creates a new directory using `-o/--outdir` into which it places output files, instead of naming all files after the output reconstructed volume `-o/--outfile`
  - `backproject.mrc` the full reconstructed volume
  - `half_map_a.mrc`, `half_map_b.mrc` reconstructed half-maps using an odd/even particle split
  - `fsc-vals.txt` all five FSC curves in space-delimited format
  - `fsc-plot.png` a plot of these five FSC curves as shown above
- `downsample` can now downsample each of the individual files in a stack referenced by a .star or .txt file, returning a new .star file or .txt file referencing the new downsampled stack
  - `-o/--outfile` when using a .star or .txt file as input
- `cryodrgn_utils fsc` can now take three volumes as input, in which case the first volume will be used to generate masks to produce cryoSPARC-style FSC curve plots including phase randomization for the “tight” mask (see New features above)
- `cryodrgn_utils plot_fsc` is now more flexible with the types of input files it can accept for plotting, including `.txt` files with the new type of cryoSPARC-style FSC curve output from `backproject_voxel`
- `cryodrgn filter --force` for less interactivity after the selection has been made
- `filter_mrcs` prints both original and new number of particles; generates output file name automatically if not given
- `cryodrgn abinit_het` saves `configs` alongside model weights in `weights.pkl` for easier access and output checkpoint identification

Addressing bugs and other issues
- `backproject_voxel` (backproject_voxel FSC plot should automatically detect the pixel size #385)
- `cryodrgn filter` doesn’t show particle indices in hover text anymore, as this proved visually distracting; we now show these indices in a text box in the corner of the plot
- `cryodrgn filter` saves chosen indices as a `np.array` instead of Python standard `list` to prevent type issues in downstream analyses
- `commands_utils.translate_mrcs` was not working (was assuming `particles.images()` returned a numpy array instead of a torch Tensor); this has been fixed and tests added for translations of image stacks
- `cryodrgn` and `cryodrgn_utils` command line interfaces explicitly, as Python will sometimes install older modules into the corresponding folders which confuses automated scanning for command modules

Refactoring classes that parse input files
There were some updates we wanted to make to the `ImageSource` class and its children, which were introduced in a refactoring of the processes used to load and parse input datasets in v3.0.0. We also sought to simplify and clean up the code in the methods used to parse .star file and .mrcs file data in `cryodrgn.starfile` and `cryodrgn.mrc` respectively.

- the code for the `ImageSource` base class and its children classes in `cryodrgn.source` has been cleaned up to improve code style, remove redundancies, and support the `Starfile` and `mrcfile` refactorings described below
  - `datadir` for `_MRCDataFrameSource` classes such as `TxtFileSource` and `StarfileSource` (`downsample` can't process `star` file with particles in multiple folders. #386)
  - `_MRCDataFrameSource.parse_filename` which is applied in `__init__`:
    1. if `filename` by itself points to a file that exists, use `filename`.
    2. if `os.path.join(datadir, newname)` exists, use that.
    3. otherwise, use `os.path.join(datadir, os.path.basename(newname))`.
  - `ImageSource.orig_n` attribute which is often useful for accessing the original number of particles in the stack before filtering was applied
  - `ImageSource.write_mrc()`, to avoid having to use `MRCFile.write()` for `ImageSource` objects; `MRCFile.write()` use case for arrays has been replaced by `mrcfile.write_mrc` (see below)
    - `cryodrgn downsample` for batch writing to `.mrc` output
  - `MRCFileSource.write()`, a wrapper for `mrcfile.write_mrc()`
  - `MRCFileSource.apix` property for convenient access to header metadata
  - `ArraySource`, whose behavior can be subsumed into `ImageSource` with `lazy=False`
  - `ImageSource.from_file()`, `._convert_to_ndarray()`, `images()`
  - `ImageSource.lazy` is now a property, not an attribute, and is dynamically dependent on whether `self.data` has actually been loaded or not
  - `_MRCDataFrameSource.sources` convenience iterator property
- `StarfileSource` now inherits directly from the `Starfile` class (as well as `_MRCDataFrameSource`) for better access to .star utilities than using a `Starfile` object as an attribute (`.df` in the old v3.3.3 class)
- .star file methods have been refactored to establish three clear ways of accessing and manipulating .star data for different levels of features, with RELION3.1 operations now implemented in `Starfile` class methods:
  - `cryodrgn.starfile.parse_star` and `write_star` to get and perform simple operations on the main data table and/or the optics table (e.g. in `filter_star`)
  - `cryodrgn.starfile.Starfile` for access to .star file utilities like generating optics values for each particle in the main data table using parameters saved in the optics table (e.g. in `parse_ctf_star`)
  - `cryodrgn.source.StarfileSource` for access to .star file utilities along with access to the images themselves using `ImageSource` methods like `.images()`
- see our more detailed write-up for more information: Starfile Refactor
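To make the idea of particle-wise optics values concrete, here is a minimal plain-Python sketch of the lookup: each particle row carries an `_rlnOpticsGroup` id, and the value of a given parameter is read from the matching row of the optics table. The tables are modelled as lists of dicts and `particle_optics_values` is a hypothetical helper. This is not how `Starfile.get_optics_values()` is implemented, only an illustration of the behavior described above; the column labels follow RELION 3.1 naming.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def particle_optics_values(
    particles: Sequence[Row], optics: Sequence[Row], field: str
) -> List[Any]:
    """Return one value of `field` per particle, looked up by optics group."""
    # Index the (small) optics table by its group id.
    by_group = {row["_rlnOpticsGroup"]: row for row in optics}

    values = []
    for particle in particles:
        group = particle["_rlnOpticsGroup"]
        if group not in by_group:
            raise ValueError(f"optics group {group!r} not found in optics table")
        values.append(by_group[group][field])
    return values


if __name__ == "__main__":
    # Made-up tables, for illustration only.
    optics_table = [
        {"_rlnOpticsGroup": 1, "_rlnImagePixelSize": 1.2, "_rlnVoltage": 300.0},
        {"_rlnOpticsGroup": 2, "_rlnImagePixelSize": 0.8, "_rlnVoltage": 200.0},
    ]
    particle_table = [
        {"_rlnImageName": "1@a.mrcs", "_rlnOpticsGroup": 2},
        {"_rlnImageName": "2@a.mrcs", "_rlnOpticsGroup": 1},
        {"_rlnImageName": "1@b.mrcs", "_rlnOpticsGroup": 2},
    ]
    print(particle_optics_values(particle_table, optics_table, "_rlnImagePixelSize"))
```

Resolving values per particle in this way avoids taking only the first value found in the file when a dataset mixes several optics groups.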
- for .mrc files, we removed `MRCFile` as there are no analogues presently for the kinds of methods supported by `Starfile`; the operations on the image array requiring data from the image header are presently contained within `MRCFileSource`, reflecting the fact that .mrcs files are the image data themselves and not pointers to other files containing the data
  - `MRCFile`, which consisted solely of static `parse` and `write` methods, has been replaced by the old names of these methods (`parse_mrc` and `write_mrc`)
    - `MRCFile.write(out_mrc, vol)` → `write_mrc(out_mrc, vol)`
    - if `vol` is an `ImageSource` object, we now do `ImageSource.write_mrc()`
  - in general, `parse_mrc` and `write_mrc` are for using the entire image stack as an array, while `MRCFileSource` is for accessing batches of images as tensors
  - `mrc` module is now named `mrcfile` for better verbosity and to match the `starfile` module, which is its parallel for processing input files
  - examples from across the codebase:
    - `commands_utils.add_psize` (old and new versions)
    - `commands_utils.flip_hand` (old and new versions); note that the awkward combination of `MRCFileSource` and `MRCFile` in the old version meant having to cast the images from tensors to arrays after they were loaded!
- also made some updates to `MRCHeader` for ease of use:
  - `mrc` module variables like `DTYPE_FOR_MODE`
  - header class attributes `apix` and `origin` with `.getter` and `.setter` methods, simplifying retrieval of these values
    - `header.origin = (0, -1, 0)` instead of `header.update_origin(0, -1, 0)`, with `header.origin` instead of `header.get_origin()` to get values

Code Quality Control
- `\` in `cryodrgn.command_line` when producing help messages for `-h`
- `cryodrgn.dataset`, `cryodrgn.starfile`, `cryodrgn.source`, `cryodrgn filter`, `cryodrgn filter_mrcs`
- `3.9 + 1.12`, `3.10 + 2.1`, and `3.11 + 2.4` in terms of Python version + PyTorch version, instead of doing all pairs of `{3.9, 3.10}` and `{1.12, 2.1, 2.3}`, allowing for CI testing to be expanded into Python 3.11 without running too many test jobs
- `cryodrgn.pose` and `cryodrgn.ctf` when inputs don’t match in dimension or have an unexpected format
- `cryodrgn.masking`, moving e.g. `utils.window_mask()` to `masking.spherical_window_mask()`
- `unittest.sh`, a set of smoke tests for reconstruction commands that can be run outside of `pytest` and regular automated CI testing, by replacing outdated commands (testing/unittest.sh doesn't work #267)
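The notes above mention `masking.spherical_window_mask()` without showing what such a mask looks like. As a rough, dependency-free illustration of the general idea (a window that is 1 inside an inner radius and falls off to 0 at an outer radius), here is a 2D sketch. The function name `soft_circular_mask`, its parameters, and the linear falloff are assumptions made for this example; the real cryoDRGN function may use a different signature and profile.

```python
import math
from typing import List


def soft_circular_mask(size: int, in_rad: float, out_rad: float) -> List[List[float]]:
    """Build a size x size mask on a grid spanning [-1, 1) in both axes.

    Values are 1.0 for radius <= in_rad, 0.0 for radius >= out_rad,
    and decrease linearly in between.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0.0 <= in_rad < out_rad:
        raise ValueError("expected 0 <= in_rad < out_rad")

    # Grid coordinates, with the origin at index size // 2 for even sizes.
    coords = [-1.0 + 2.0 * i / size for i in range(size)]

    mask = []
    for y in coords:
        row = []
        for x in coords:
            radius = math.hypot(x, y)
            value = 1.0 - (radius - in_rad) / (out_rad - in_rad)
            row.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in soft_circular_mask(8, in_rad=0.5, out_rad=1.0):
        print(" ".join(f"{v:.2f}" for v in row))
```

A soft edge like this is generally preferred over a hard cutoff because sharp mask boundaries introduce artificial high-frequency correlations.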