Commit f1c8626

CLN/ENH: Rename and refactor datapipes, add datasets; fix #574 #724 #754 (#755)
* Rename vak/datasets -> vak/datapipes
* Rename frame_classification.window_dataset.WindowDataset -> TrainDatapipe
* Rename frame_classification/window_dataset.py -> train_datapipe.py
* Fix WindowDataset -> TrainDatapipe in docstrings
* Rename frame_classification.frames_dataset.FramesDataset -> infer_datapipe.InferDatapipe
* Rename transforms.StandardizeSpect -> FramesStandardizer
* Import FramesStandardizer in datapipes/frame_classification/infer_datapipe.py
* Add module-level docstring in vak/datapipes/__init__.py
* Rewrite transforms.defaults.frame_classification.EvalItemTransform and PredictItemTransform as a single class, InferItemTransform, and rename spect_standardizer -> frames_standardizer in that module
* Fix bug in view_as_window_batch so it works on 1-D arrays, add type hinting in src/vak/transforms/functional.py
* Change frame_labels_transform in InferItemTransform to be a torchvision.transforms.Compose, so we get back a windowed batch
* Remove TODO in src/vak/models/frame_classification_model.py
* Rewrite TrainDatapipe to always use TrainItemTransform, add parameters that get passed to TrainItemTransform when instantiating it inside TrainDatapipe.__init__
* Rewrite frame_classification.InferDatapipe to always use transforms.defaults.frame_classification.InferItemTransform, add parameters that get passed to InferItemTransform when instantiating it inside InferDatapipe.__init__
* Rewrite train.frame_classification to pass kwargs into datapipes that now use default transforms, and no longer call transforms.defaults.get
* Rewrite predict.frame_classification to pass kwargs into datapipes that now use default transforms, and no longer call transforms.defaults.get
* Rewrite eval.frame_classification to pass kwargs into datapipes that now use default transforms, and no longer call transforms.defaults.get
* Rename 'spect_scaler_path' -> 'frames_standardizer_path'
* Rename 'normalize_spectrograms' -> 'standardize_frames'
* Fix 'SpectScaler' -> 'FramesStandardizer', 'normalize spectrogram' -> 'standardize (normalize) frames'
* Fix 'SpectScaler' -> 'FramesStandardizer' in tests/
* Fix key names in doc/toml
* Add missing comma in src/vak/train/frame_classification.py
* Rename config/valid-version-1.1.toml -> valid-version-1.2.toml
* Fix normalize spectrograms -> standardize frames in more places in docs
* Fix datapipes.frame_classification.InferDatapipe to have needed parameters for item transform
* Fix datapipes.frame_classification.TrainDatapipe to have needed parameters for item transform
* Fix arg name 'spect_standardizer' -> 'frames_standardizer' in src/vak/train/frame_classification.py
* fixup fix TrainDatapipe parameters
* Fix variable name in src/vak/datapipes/frame_classification/train_datapipe.py
* Add missing arg return_padding_mask in src/vak/train/frame_classification.py
* Fix transforms.defaults.frame_classification.InferItemTransform to not window frame labels, just convert them to LongTensor
* Revise docstring in eval/frame_classification
* Remove item_transform from docstring in datapipes/frame_classification/train_datapipe.py
* Add return_padding_mask arg in vak/predict/frame_classification.py
* Remove src/vak/transforms/defaults/parametric_umap.py
* Rename/rewrite Datapipe class for ParametricUMAP, hard-code in transform
* Remove transforms/defaults/get.py, remove related imports in transforms/defaults/__init__.py
* Finish removing transform fetching for ParametricUMAP
* Fix typo in src/vak/eval/frame_classification.py
* Fix "StandardizeSpect" -> "FramesStandardizer" in src/vak/learncurve/frame_classification.py
* Apply changes from nox lint session
* Make flake8 fixes, remove unused function get_default_frame_classification_transform
* Fix "StandardizeSpect" -> "FramesStandardizer" in tests/scripts/vaktestdata/configs.py
* WIP: Add datasets/ with biosoundsegbench
* Rename tests/test_datasets -> test_datapipes, fix tests
* Fix 'StandardizeSpect' -> 'FramesStandardizer' in two tests
* Remove two uses of vak.transforms.defaults.get_default_transform from tests
* Fix datapipe used in tests/test_models/test_parametric_umap_model.py
* Use TYPE_CHECKING to avoid circular import in src/vak/datapipes/frame_classification/infer_datapipe.py
* Add method 'fit_inputs_targets_csv_path' to FramesStandardizer, rewrite 'fit_dataset_path' method to just call this new method
* fixup add method
* Add unit test for FramesStandardizer.fit_inputs_targets_csv_path
* Remove unused import from src/vak/transforms/transforms.py
* Remove unused import in src/vak/transforms/defaults/frame_classification.py
* Pep8 fix in src/vak/datasets/__init__.py
* Apply linting to src/vak/transforms/transforms.py
* Correct docstring in src/vak/transforms/defaults/frame_classification.py
* Import datasets in src/vak/__init__.py
* Rename datapipes/frame_classification/constants.FRAME_LABELS_EXT -> MULTI_FRAME_LABELS_EXT, and change value to 'multi-frame-labels.npy', and change value of FRAME_LABELS_NPY_PATH_COL_NAME to 'multi_frame_labels_npy_path'
* Rename vak.datapipes.frame_classification.constants.FRAME_LABELS_NPY_PATH_COL_NAME -> MULTI_FRAME_LABELS_PATH_COL_NAME
* Rename key in item returned by frame_classification.TrainItemTransform and InferItemTransform; 'frame_labels' -> 'multi_frame_labels'
* WIP: Get BioSoundSegBench class working
* Rewrite FrameClassificationModel to handle different target types
* Add VALID_SPLITS to common.constants
* In datasets/biosoundsegbench.py: change VALID_TARGET_TYPES to be the ones we're using for experiments right now, fix TrainItemTransform to handle target types, clean up __init__ method validation
* Add initial unit tests for BioSoundSegBench dataset
* Add helper function vak.datasets.get
* Clean up how we validate target_type in datasets.BioSoundSegBench.__init__
* Add tests/test_datasets/__init__.py (to make a sub-package)
* Add initial unit tests for vak.datasets.get
* Modify BioSoundSegBench.__init__ so we can write splits_path as just the filename
* Use expanded_user_path converter on path and splits_path attributes of DatasetConfig
* Rename BOUNDARY_ONEHOT_PATH_COL_NAME -> BOUNDARY_FRAME_LABELS_PATH_COL_NAME in datasets/biosoundsegbench.py
* Modify datasets.BioSoundSegBench to compute metadata from splits_json path
* Fix mock_biosoundsegbench_dataset fixture so mocked files follow naming conventions of dataset
* Modify mock_biosoundsegbench_dataset fixture to save labelmaps.json
* Change BioSoundSegBench.__init__ so we have training_replicate_metadata attribute, frame_dur attribute, and labelmap attribute
* Add DATASETS dict in dataset/__init__.py, used by vak.datasets.get to look up class (value) by name (key)
* Use vak.datasets.DATASETS in vak.datasets.get to get class
* Rewrite BioSoundSegBench.__init__ so we can either pass in a FramesStandardizer instance or tell it to fit a new one to the specified split, that then gets added to the transform
* Import DATASETS inside vak.datasets.get to avoid circular import
* Make fixes in datasets/biosoundsegbench.py: import FramesStandardizer inside TrainItemTransform.__init__, fix tmp_splits_path -> splits-jsons (plural), add needed __len__ method to class
* Rename BioSoundSegBench property 'input_shape' -> 'shape' for consistency with frame_classification datapipes
* Get vak/train/frame_classification.py to the point where it runs
* Add missing self in BioSoundSegBench._getitemval
* Rewrite src/vak/eval/frame_classification.py to work with built-in datasets, and remove 'split' parameter from eval_frame_classification_model function -- check if 'split' is in dataset_config and if not, default to 'test'
* Remove split argument in call to eval_frame_classification_model inside src/vak/learncurve/frame_classification.py
* Remove split parameter from eval._eval.eval -- it's not an attribute of EvalConfig and we can now pass in a 'split' through dataset_config
* Remove 'split' parameter from eval_parametric_umap_model, check if 'split' in dataset_config and if not default to 'test'
* Rewrite src/vak/predict/frame_classification.py to work with built-in datasets; check if 'split' is in dataset_config and if not, default to 'predict'
* Add comments to structure src/vak/train/frame_classification.py
* Fix how we check for key in src/vak/predict/frame_classification.py
* Fix how we check for key in dict in src/vak/eval/parametric_umap.py
* Fix how we check for key in dict in src/vak/eval/frame_classification.py
* Fix unit tests in test_dataset.py: assert that path attributes are vak.converters.expanded_user_path(value from config), not pathlib.Path
* Fix how we parametrize tests/test_dataset/test_get.py
* In BioSoundSegBench.__init__, fix how we calculate frame_dur and how we set labelmap attribute for binary/boundary frame labels
* In FrameClassificationModel.validation_step, convert Levenshtein distance to float to squelch warning from Lightning
* Fix FrameClassificationModel so train/val with multi-class + boundary labels works
* Fix vak.cli.predict to not assume that config has a prep attribute
* Fix how we override default split with a split from dataset_config['params'] in predict/frame_classification and eval/frame_classification
* Change BioSoundSegBench so __getitem__ can return 'frames_path' in 'item' for eval/predict
* In predict.frame_classification, set 'return_frames_path' to True in dataset_config['params'] since we need this for predictions
* Add constant DEFAULT_SPECT_FORMAT in common.constants
* Fix SPECT_KEY -> TIMEBINS_KEY in cli.prep
* Fix how we determine input_type and spect_format for built-in datasets in predict/frame_classification
* Add nn/loss/crossentropy.py, wraps torch.nn.CrossEntropyLoss, but converts weight arg as list to tensor
* Fixup add loss
* Use nn.loss.CrossEntropy with TweetyNet model
* Clean up prediction_step in FrameClassificationModel
* Get predict working for multi_frame_labels and boundary_frame_labels, still need to test binary_frame_labels and (boundary, multi)
* Rename 'unlabeled_label' -> 'background_label' in transforms/frame_labels
* Rename 'unlabeled_label' -> 'background_label' in tests/test_transforms/test_frame_labels
* Rewrite transforms/frame_labels/functional.py to handle boundary labels
  - Add `boundary_labels_to_segment_inds_list` that finds segment indexing arrays from a list of boundary labels
  - Rename `to_segment_inds` -> `frame_labels_to_segment_inds_list`
  - Have `preprocess` optionally take `boundary_labels` and use it to find segments, instead of frame labels
  - Fix type annotations to use npt.NDArray instead of np.ndarray
* Change how FrameClassificationModel calls loss for multi-class + boundary targets -- assume we pass to an instance of a loss function, and get back either a scalar loss or a dict mapping loss names to scalar values
* Change arg name 'unlabeled_label' -> 'background_label' in prep/frame_classification/make_splits.py
* Fix predict.frame_classification for multi-class, and add logic for multi-class frame labels with boundary frame labels
* Add DEFAULT_BACKGROUND_LABEL to common.constants
* Use DEFAULT_BACKGROUND_LABEL in transforms.frame_labels.functional
* Rename unlabeled -> background_label in common.labels
* Add background_label in docstring in common/labels.py
* Add 'background_label' to FrameClassificationModel, defaults to common.constants.DEFAULT_BACKGROUND_LABEL, used to validate length of string labels in labelmap
* Fix 'unlabeled' -> common.constants.DEFAULT_BACKGROUND_LABEL in another place in common/labels.py
* Fix unlabeled -> background label in docstrings in transforms
* Use 'background_label' argument in place of magic string 'unlabeled' in prep/frame_classification/learncurve.py
* Fix unlabeled -> background label in docstrings in transforms/frame_labels/functional.py
* Add background_label to docstring in src/vak/prep/frame_classification/learncurve.py
* Add background_label to function in src/vak/prep/frame_classification/make_splits.py
* Add background_label parameter to src/vak/predict/frame_classification.py and add type annotations to function signature
* Fix unlabeled -> background / vak.common.constants.DEFAULT_BACKGROUND_LABEL in tests
* Fix 'map_unlabeled' -> 'map_background' in tests/
* Fix 'constants' -> 'common' in src/vak/models/frame_classification_model.py
* Fix arg name map_unlabeled -> map_background
* Fix arg name map_unlabeled -> map_background in prep/parametric_umap
* Fix 'unlabeled' -> vak.common.constants.DEFAULT_BACKGROUND_LABEL in tests/
* Fix name `to_inds_list` -> `segment_inds_list_from_class_labels` in test_transforms/test_frame_labels/test_functional.py
1 parent 2c6e469 commit f1c8626

118 files changed, +3060 -1363 lines changed

doc/api/index.rst

+14
@@ -154,6 +154,20 @@ The :mod:`vak.datasets` module contains datasets built into vak.
    datasets.frame_classification
    datasets.parametric_umap
 
+Datapipes
+---------
+
+The :mod:`vak.datapipes` module contains datapipes for loading dataset
+generated by :func:`vak.prep.prep`.
+
+.. autosummary::
+   :toctree: generated
+   :template: module.rst
+   :recursive:
+
+   datapipes.frame_classification
+   datapipes.parametric_umap
+
 Metrics
 -------
 The :mod:`vak.metrics` module contains metrics used

doc/toml/gy6or6_eval.toml

+2-2
@@ -33,9 +33,9 @@ checkpoint_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/TweetyNet/che
 # labelmap_path: path to file that maps from outputs of model (integers) to text labels in annotations;
 # this is used when generating predictions
 labelmap_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/labelmap.json"
-# spect_scaler_path: path to file containing SpectScaler that was fit to training set
+# frames_standardizer_path: path to file containing SpectScaler that was fit to training set
 # We want to transform the data we predict on in the exact same way
-spect_scaler_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/StandardizeSpect"
+frames_standardizer_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/StandardizeSpect"
 # batch_size
 # for predictions with a frame classification model, this should always be 1
 # and will be ignored if it's not

doc/toml/gy6or6_predict.toml

+2-2
@@ -29,9 +29,9 @@ checkpoint_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/TweetyNet/che
 # labelmap_path: path to file that maps from outputs of model (integers) to text labels in annotations;
 # this is used when generating predictions
 labelmap_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/labelmap.json"
-# spect_scaler_path: path to file containing SpectScaler that was fit to training set
+# frames_standardizer_path: path to file containing SpectScaler that was fit to training set
 # We want to transform the data we predict on in the exact same way
-spect_scaler_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/StandardizeSpect"
+frames_standardizer_path = "/PATH/TO/FOLDER/results/train/RESULTS_TIMESTAMP/StandardizeSpect"
 # batch_size
 # for predictions with a frame classification model, this should always be 1
 # and will be ignored if it's not

doc/toml/gy6or6_train.toml

+2-2
@@ -37,9 +37,9 @@ root_results_dir = "/PATH/TO/FOLDER/results/train"
 batch_size = 8
 # num_epochs: number of training epochs, where an epoch is one iteration through all samples in training split
 num_epochs = 2
-# normalize_spectrograms: if true, normalize spectrograms per frequency bin, so mean of each is 0.0 and std is 1.0
+# standardize_frames: if true, standardize (normalize) frames (input to neural network) per frequency bin, so mean of each is 0.0 and std is 1.0
 # across the entire training split
-normalize_spectrograms = true
+standardize_frames = true
 # val_step: step number on which to compute metrics with validation set, every time step % val_step == 0
 # (a step is one batch fed through the network)
 # saves a checkpoint if the monitored evaluation metric improves (which is model specific)
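
The `standardize_frames` comment above describes standardizing each frequency bin to mean 0.0 and std 1.0 using statistics from the training split. A minimal sketch of that arithmetic follows, assuming frames are stored as one row per frequency bin; `ToyStandardizer` is a hypothetical name for illustration and is not vak's `FramesStandardizer`.

```python
import statistics


class ToyStandardizer:
    """Hypothetical stand-in for illustration; not vak's FramesStandardizer."""

    def fit(self, frames):
        # frames: list of rows, one row per frequency bin,
        # each row holding that bin's values across all time bins.
        self.means = [statistics.fmean(row) for row in frames]
        # Population standard deviation per frequency bin.
        # Assumption: no bin is constant, so no std is zero.
        self.stds = [statistics.pstdev(row) for row in frames]
        return self

    def transform(self, frames):
        # Subtract each bin's training mean, then divide by its training std.
        return [
            [(x - m) / s for x in row]
            for row, m, s in zip(frames, self.means, self.stds)
        ]


train = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
standardizer = ToyStandardizer().fit(train)
standardized = standardizer.transform(train)
```

Because the means and stds come from the training split only, the same fitted object has to be reused at evaluation and prediction time, which is presumably why the eval and predict configs carry a `frames_standardizer_path`.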

src/vak/__init__.py

+2
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
     cli,
     common,
     config,
+    datapipes,
     datasets,
     eval,
     learncurve,
@@ -42,6 +43,7 @@
     "cli",
     "common",
     "config",
+    "datapipes",
     "datasets",
     "eval",
     "learncurve",

src/vak/cli/eval.py

+1-1
@@ -58,6 +58,6 @@ def eval(toml_path: str | pathlib.Path) -> None:
         output_dir=cfg.eval.output_dir,
         num_workers=cfg.eval.num_workers,
         batch_size=cfg.eval.batch_size,
-        spect_scaler_path=cfg.eval.spect_scaler_path,
+        frames_standardizer_path=cfg.eval.frames_standardizer_path,
         post_tfm_kwargs=cfg.eval.post_tfm_kwargs,
     )

src/vak/cli/learncurve.py

+1-1
@@ -61,7 +61,7 @@ def learning_curve(toml_path):
         num_workers=cfg.learncurve.num_workers,
         results_path=results_path,
         post_tfm_kwargs=cfg.learncurve.post_tfm_kwargs,
-        normalize_spectrograms=cfg.learncurve.normalize_spectrograms,
+        standardize_frames=cfg.learncurve.standardize_frames,
         shuffle=cfg.learncurve.shuffle,
         val_step=cfg.learncurve.val_step,
         ckpt_step=cfg.learncurve.ckpt_step,

src/vak/cli/predict.py

+4-4
@@ -1,7 +1,7 @@
 import logging
 from pathlib import Path
 
-from .. import config
+from .. import common, config
 from .. import predict as predict_module
 from ..common.logging import config_logging_for_cli, log_version
 
@@ -33,7 +33,7 @@ def predict(toml_path):
         force=True,
     )
     log_version(logger)
-    logger.info("Logging results to {}".format(cfg.prep.output_dir))
+    logger.info("Logging results to {}".format(cfg.predict.output_dir))
 
     if cfg.predict.dataset.path is None:
         raise ValueError(
@@ -49,8 +49,8 @@ def predict(toml_path):
         checkpoint_path=cfg.predict.checkpoint_path,
         labelmap_path=cfg.predict.labelmap_path,
         num_workers=cfg.predict.num_workers,
-        timebins_key=cfg.prep.spect_params.timebins_key,
-        spect_scaler_path=cfg.predict.spect_scaler_path,
+        timebins_key=cfg.prep.spect_params.timebins_key if cfg.prep else common.constants.TIMEBINS_KEY,
+        frames_standardizer_path=cfg.predict.frames_standardizer_path,
         annot_csv_filename=cfg.predict.annot_csv_filename,
         output_dir=cfg.predict.output_dir,
         min_segment_dur=cfg.predict.min_segment_dur,
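
The `timebins_key` change above stops `predict` from assuming the config has a `prep` table. The fallback can be sketched in isolation as below; `resolve_timebins_key` is a hypothetical helper, the `SimpleNamespace` objects stand in for vak's config classes, and the value of `TIMEBINS_KEY` is a placeholder rather than the value defined in `vak.common.constants`.

```python
from types import SimpleNamespace

# Placeholder value; the real constant lives in vak.common.constants.
TIMEBINS_KEY = "t"


def resolve_timebins_key(cfg):
    """Hypothetical helper mirroring the conditional expression in the diff."""
    # Assumption: cfg.prep is falsy (e.g. None) when the config has no prep table.
    return cfg.prep.spect_params.timebins_key if cfg.prep else TIMEBINS_KEY


with_prep = SimpleNamespace(
    prep=SimpleNamespace(spect_params=SimpleNamespace(timebins_key="time_bins"))
)
without_prep = SimpleNamespace(prep=None)

key_from_prep = resolve_timebins_key(with_prep)
key_from_default = resolve_timebins_key(without_prep)
```

Before the change, the second case would have raised an `AttributeError` when accessing `spect_params` on a missing `prep` section.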

src/vak/cli/train.py

+2-2
@@ -60,9 +60,9 @@ def train(toml_path):
         num_epochs=cfg.train.num_epochs,
         num_workers=cfg.train.num_workers,
         checkpoint_path=cfg.train.checkpoint_path,
-        spect_scaler_path=cfg.train.spect_scaler_path,
+        frames_standardizer_path=cfg.train.frames_standardizer_path,
         results_path=results_path,
-        normalize_spectrograms=cfg.train.normalize_spectrograms,
+        standardize_frames=cfg.train.standardize_frames,
         shuffle=cfg.train.shuffle,
         val_step=cfg.train.val_step,
         ckpt_step=cfg.train.ckpt_step,

src/vak/common/__init__.py

+1-1
@@ -5,7 +5,7 @@
 If a helper/utility function is only used in one module,
 it should live either in that module or another at the same level.
 See for example :mod:`vak.prep.prep_helper` or
-:mod:`vak.datsets.window_dataset._helper`.
+:mod:`vak.datsets.train_datapipe._helper`.
 """
 
 from . import (

src/vak/common/constants.py

+6-1
@@ -1,4 +1,4 @@
-"""constants used by multiple modules.
+"""Constants used by multiple modules.
 Defined here to avoid circular imports.
 """
 
@@ -26,6 +26,7 @@
     "npz": np.load,
 }
 VALID_SPECT_FORMATS = list(SPECT_FORMAT_LOAD_FUNCTION_MAP.keys())
+DEFAULT_SPECT_FORMAT = "npz"
 
 # ---- valid types of training data, the $x$ that goes into a network
 VALID_X_SOURCES = {"audio", "spect"}
@@ -57,3 +58,7 @@
     "npz": SPECT_NPZ_EXTENSION,
     "mat": ".mat",
 }
+
+VALID_SPLITS = ("predict", "test", "train", "val")
+
+DEFAULT_BACKGROUND_LABEL = "background"
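
The commit message says the eval and predict functions now look for a `'split'` in `dataset_config['params']` and otherwise fall back to a default (`'test'` for eval, `'predict'` for predict). A rough sketch of that lookup, validated against `VALID_SPLITS`, might look like the following; `resolve_split` is a hypothetical helper and the validation step is an assumption, not code from this commit.

```python
VALID_SPLITS = ("predict", "test", "train", "val")


def resolve_split(dataset_config: dict, default: str = "test") -> str:
    """Hypothetical helper: pick a split name from a dataset config dict."""
    # Assumption: the split, if given, is stored under dataset_config["params"].
    split = dataset_config.get("params", {}).get("split", default)
    if split not in VALID_SPLITS:
        raise ValueError(
            f"Invalid split: {split!r}. Valid splits are: {VALID_SPLITS}"
        )
    return split


eval_split = resolve_split({"params": {}})
overridden_split = resolve_split({"params": {"split": "val"}})
predict_split = resolve_split({}, default="predict")
```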

src/vak/common/labels.py

+28-15
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,12 @@
55
import numpy as np
66
import pandas as pd
77

8-
from . import annotation
8+
from . import annotation, constants
99

1010

11-
def to_map(labelset: set, map_unlabeled: bool = True) -> dict:
11+
def to_map(
12+
labelset: set, map_background: bool = True, background_label: str = constants.DEFAULT_BACKGROUND_LABEL
13+
) -> dict:
1214
"""Convert set of labels to `dict`
1315
mapping those labels to a series of consecutive integers
1416
from 0 to n inclusive,
@@ -18,21 +20,31 @@ def to_map(labelset: set, map_unlabeled: bool = True) -> dict:
1820
from annotations of a vocalization into
1921
a label for every time bin in a spectrogram of that vocalization.
2022
21-
If ``map_unlabeled`` is True, then the label 'unlabeled'
22-
will be added to labelset, and will map to 0,
23+
If ``map_background`` is True, then a label
24+
will be added to labelset representing a background class
25+
(any segment that is not labeled).
26+
The default for this label is
27+
:const:`vak.common.constants.DEFAULT_BACKGROUND_LABEL`.
28+
This string label will map to class index 0,
2329
so the total number of classes is n + 1.
2430
2531
Parameters
2632
----------
2733
labelset : set
2834
Set of labels used to annotate a dataset.
29-
map_unlabeled : bool
30-
If True, include key 'unlabeled' in mapping.
35+
map_background : bool
36+
If True, include key specified by
37+
``background_label`` in mapping.
3138
Any time bins in a spectrogram
3239
that do not have a label associated with them,
3340
e.g. a silent gap between vocalizations,
3441
will be assigned the integer
35-
that the 'unlabeled' key maps to.
42+
that the background key maps to.
43+
background_label: str, optional
44+
The string label applied to segments belonging to the
45+
background class.
46+
Default is
47+
:const:`vak.common.constants.DEFAULT_BACKGROUND_LABEL`.
3648
3749
Returns
3850
-------
@@ -45,11 +57,12 @@ def to_map(labelset: set, map_unlabeled: bool = True) -> dict:
4557
)
4658

4759
labellist = []
48-
if map_unlabeled is True:
49-
labellist.append("unlabeled")
50-
60+
if map_background is True:
61+
# NOTE we append background label *first*
62+
labellist.append(background_label)
63+
# **then** extend with the rest of the labels
5164
labellist.extend(sorted(list(labelset)))
52-
65+
# so that background_label maps to class index 0 by default in next line
5366
labelmap = dict(zip(labellist, range(len(labellist))))
5467
return labelmap
5568

@@ -124,7 +137,7 @@ def from_df(
124137

125138
# added to fix https://github.com/NickleDave/vak/issues/373
126139
def multi_char_labels_to_single_char(
127-
labelmap: dict, skip: tuple[str] = ("unlabeled",)
140+
labelmap: dict, skip: tuple[str] = (constants.DEFAULT_BACKGROUND_LABEL,)
128141
) -> dict:
129142
"""Return a copy of a ``labelmap`` where any
130143
labels that are strings with multiple characters
@@ -146,9 +159,9 @@ def multi_char_labels_to_single_char(
146159
to integers. As returned by
147160
``vak.labels.to_map``.
148161
skip : tuple
149-
Of strings, labels to leave
150-
as multiple characters.
151-
Default is ('unlabeled',).
162+
A tuple of labels to leave as multiple characters.
163+
Default is a tuple containing just
164+
:const:`vak.common.constants.DEFAULT_BACKGROUND_LABEL`.
152165
153166
Returns
154167
-------
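
To make the ordering in `to_map` concrete, here is a standalone copy of the core logic shown in the hunks above, with the input validation from the real function omitted. It is a sketch for illustration, not a replacement for `vak.common.labels.to_map`.

```python
DEFAULT_BACKGROUND_LABEL = "background"


def to_map(labelset, map_background=True, background_label=DEFAULT_BACKGROUND_LABEL):
    """Standalone copy of the core logic in the diff (validation omitted)."""
    labellist = []
    if map_background is True:
        # Background label goes in first, so it gets class index 0.
        labellist.append(background_label)
    # The remaining labels follow in sorted order.
    labellist.extend(sorted(labelset))
    return dict(zip(labellist, range(len(labellist))))


labelmap = to_map({"b", "a", "c"})
no_background = to_map({"b", "a", "c"}, map_background=False)
custom = to_map({"x"}, background_label="silence")
```

With three labels and `map_background=True` the map has four classes, matching the "n + 1" wording in the docstring.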

src/vak/config/__init__.py

-1
@@ -25,7 +25,6 @@
 from .train import TrainConfig
 from .trainer import TrainerConfig
 
-
 __all__ = [
     "config",
     "dataset",

src/vak/config/dataset.py

+4-2
@@ -7,6 +7,8 @@
 import attr.validators
 from attr import asdict, define, field
 
+from ..common.converters import expanded_user_path
+
 
 @define
 class DatasetConfig:
@@ -31,9 +33,9 @@ class DatasetConfig:
         Default is None.
     """
 
-    path: pathlib.Path = field(converter=pathlib.Path)
+    path: pathlib.Path = field(converter=expanded_user_path)
     splits_path: pathlib.Path | None = field(
-        converter=attr.converters.optional(pathlib.Path), default=None
+        converter=attr.converters.optional(expanded_user_path), default=None
     )
     name: str | None = field(
         converter=attr.converters.optional(str), default=None
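
Swapping the converter from `pathlib.Path` to `expanded_user_path` matters for paths that start with `~`. The sketch below assumes the converter does what its name suggests (build a `Path`, then expand the user's home directory); the body shown is an assumption, not copied from `vak.common.converters`.

```python
import pathlib


def expanded_user_path(value):
    """Assumed behavior of the converter: build a Path and expand a leading '~'."""
    return pathlib.Path(value).expanduser()


# pathlib.Path alone keeps the tilde as a literal path component.
plain = pathlib.Path("~/data/prep")
# The converter resolves it against the current user's home directory.
expanded = expanded_user_path("~/data/prep")
```

With the old converter, a `path = "~/data/prep"` entry in a config file would have been treated as a directory literally named `~` relative to the working directory.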

src/vak/config/eval.py

+3-3
@@ -110,8 +110,8 @@ class EvalConfig:
         Argument to torch.DataLoader. Default is 2.
     labelmap_path : str
         path to 'labelmap.json' file.
-    spect_scaler_path : str
-        path to a saved SpectScaler object used to normalize spectrograms.
+    frames_standardizer_path : str
+        path to a saved :class:`vak.transforms.FramesStandardizer` object used to standardize (normalize) frames.
         If spectrograms were normalized and this is not provided, will give
         incorrect results.
     post_tfm_kwargs : dict
@@ -152,7 +152,7 @@ class EvalConfig:
         converter=converters.optional(expanded_user_path), default=None
     )
     # optional, transform
-    spect_scaler_path = field(
+    frames_standardizer_path = field(
         converter=converters.optional(expanded_user_path),
         default=None,
     )

src/vak/config/learncurve.py

+12-5
@@ -10,7 +10,12 @@
 from .train import TrainConfig
 from .trainer import TrainerConfig
 
-REQUIRED_KEYS = ("dataset", "model", "root_results_dir", "trainer",)
+REQUIRED_KEYS = (
+    "dataset",
+    "model",
+    "root_results_dir",
+    "trainer",
+)
 
 
 @define
@@ -45,9 +50,9 @@ class LearncurveConfig(TrainConfig):
         Argument to torch.DataLoader.
     shuffle: bool
         if True, shuffle training data before each epoch. Default is True.
-    normalize_spectrograms : bool
-        if True, use spect.utils.data.SpectScaler to normalize the spectrograms.
-        Normalization is done by subtracting off the mean for each frequency bin
+    standardize_frames : bool
+        if True, use :class:`vak.transforms.FramesStandardizer` to standardize the frames.
+        Normalization is done by subtracting off the mean for each row
         of the training set and then dividing by the std for that frequency bin.
         This same normalization is then applied to validation + test data.
     val_step : int
@@ -75,6 +80,7 @@ class LearncurveConfig(TrainConfig):
         See the docstring of the transform for more details on
         these arguments and how they work.
     """
+
     post_tfm_kwargs = field(
         validator=validators.optional(are_valid_post_tfm_kwargs),
         converter=converters.optional(convert_post_tfm_kwargs),
@@ -91,7 +97,8 @@ def from_config_dict(cls, config_dict: dict) -> LearncurveConfig:
         by loading a valid configuration toml file with
         :func:`vak.config.parse.from_toml_path`,
         and then using key ``learncurve``,
-        i.e., ``LearncurveConfig.from_config_dict(config_dict['learncurve'])``."""
+        i.e., ``LearncurveConfig.from_config_dict(config_dict['learncurve'])``.
+        """
         for required_key in REQUIRED_KEYS:
             if required_key not in config_dict:
                 raise KeyError(
src/vak/config/predict.py

+3-4
@@ -14,7 +14,6 @@
 from .model import ModelConfig
 from .trainer import TrainerConfig
 
-
 REQUIRED_KEYS = (
     "checkpoint_path",
     "dataset",
@@ -50,8 +49,8 @@ class PredictConfig:
     num_workers : int
         Number of processes to use for parallel loading of data.
         Argument to torch.DataLoader. Default is 2.
-    spect_scaler_path : str
-        path to a saved SpectScaler object used to normalize spectrograms.
+    frames_standardizer_path : str
+        path to a saved :class:`vak.transforms.FramesStandardizer` object used to standardize (normalize) frames.
         If spectrograms were normalized and this is not provided, will give
         incorrect results.
     annot_csv_filename : str
@@ -104,7 +103,7 @@ class PredictConfig:
     )
 
     # optional, transform
-    spect_scaler_path = field(
+    frames_standardizer_path = field(
         converter=converters.optional(expanded_user_path),
         default=None,
     )
