Storage schema evolution support #906

Merged
merged 34 commits into from
Jan 4, 2025
34 commits
082d841
add initial alembic configs
bpkroth Jan 3, 2025
7cb1fa3
tweaks necessary for alembic
bpkroth Jan 3, 2025
a40cfd1
wip for allowing a create/update schema on demand
bpkroth Jan 3, 2025
94d7cbe
formatting
bpkroth Jan 3, 2025
92796ef
add a new column
bpkroth Jan 3, 2025
309d530
linting
bpkroth Jan 3, 2025
f15da23
implement schema updates
bpkroth Jan 3, 2025
96b87f8
tweaks
bpkroth Jan 3, 2025
036a80b
format
bpkroth Jan 3, 2025
158f9cd
Merge branch 'main' into schema-updates
bpkroth Jan 3, 2025
b44f205
syntax tweaks for python 3.10
bpkroth Jan 3, 2025
f8d8217
more python 3.10 changes
bpkroth Jan 3, 2025
3be7c81
fixup
bpkroth Jan 3, 2025
780d986
move those changes to a new PR
bpkroth Jan 3, 2025
a49a194
Merge branch 'main' into schema-updates
motus Jan 3, 2025
320af9b
fixups
bpkroth Jan 4, 2025
7494ed4
lint
bpkroth Jan 4, 2025
071d148
fixps
bpkroth Jan 4, 2025
ffe7ce1
fixups
bpkroth Jan 4, 2025
de3da24
stop overriding our log level
bpkroth Jan 4, 2025
ba4fb35
another fixup
bpkroth Jan 4, 2025
f49c3e1
fixups
bpkroth Jan 4, 2025
1e981bc
disable a check in devcontainer
bpkroth Jan 4, 2025
ddd95f2
check for files
bpkroth Jan 4, 2025
86b4ede
fixups
bpkroth Jan 4, 2025
eb31326
Merge branch 'main' into schema-updates
bpkroth Jan 4, 2025
220a495
fixups
bpkroth Jan 4, 2025
8357c3a
fixups
bpkroth Jan 4, 2025
908b5db
missing package for pylint
bpkroth Jan 4, 2025
90e34ad
doc fixups
bpkroth Jan 4, 2025
2caf76c
fixups
bpkroth Jan 4, 2025
059fccf
fixup links
bpkroth Jan 4, 2025
41221a4
improved links
bpkroth Jan 4, 2025
8b2fe15
more docstring fixes
bpkroth Jan 4, 2025
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-executables-have-shebangs
# - id: check-executables-have-shebangs (issues in devcontainer)
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
4 changes: 4 additions & 0 deletions Makefile
@@ -253,6 +253,10 @@ mlos_viz/dist/tmp/mlos_viz-latest.tar.gz: PACKAGE_NAME := mlos_viz
! ( tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 tests/ )
# Make sure the py.typed marker file exists.
tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 /py.typed
# Make sure the alembic scripts are included
[ "$(MODULE_NAME)" != "mlos_bench" ] || tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 /storage/sql/alembic.ini
[ "$(MODULE_NAME)" != "mlos_bench" ] || tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 /storage/sql/alembic/env.py
[ "$(MODULE_NAME)" != "mlos_bench" ] || tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 /storage/sql/alembic/versions/.*py
# Check to make sure the mlos_bench module has the config directory.
[ "$(MODULE_NAME)" != "mlos_bench" ] || tar tzf $(MODULE_NAME)/dist/$(PACKAGE_NAME)-*.tar.gz | grep -m1 mlos_bench/config/
cd $(MODULE_NAME)/dist/tmp && ln -s ../$(PACKAGE_NAME)-*.tar.gz $(PACKAGE_NAME)-latest.tar.gz
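
The three added packaging checks can also be expressed outside `make`. The following is a minimal Python sketch, not part of this PR: it builds a tiny in-memory archive with hypothetical member names and reports which of the expected alembic files are missing. The helper names are made up for illustration, and the regular expressions are anchored with `$`, so they are slightly stricter than the `grep` patterns in the Makefile.

```python
import io
import re
import tarfile

# Patterns loosely mirroring the three grep checks added to the Makefile.
REQUIRED_PATTERNS = [
    r"/storage/sql/alembic\.ini$",
    r"/storage/sql/alembic/env\.py$",
    r"/storage/sql/alembic/versions/.*\.py$",
]


def missing_alembic_files(names):
    """Return the patterns that match none of the given archive member names."""
    return [p for p in REQUIRED_PATTERNS if not any(re.search(p, n) for n in names)]


def build_demo_sdist(member_names):
    """Build an in-memory .tar.gz of empty files (hypothetical names, illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in member_names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return buf


demo_members = [
    "mlos_bench-0.0.0/mlos_bench/storage/sql/alembic.ini",
    "mlos_bench-0.0.0/mlos_bench/storage/sql/alembic/env.py",
    "mlos_bench-0.0.0/mlos_bench/storage/sql/alembic/versions/0001_example.py",
]
with tarfile.open(fileobj=build_demo_sdist(demo_members), mode="r:gz") as tar:
    names = tar.getnames()

# An empty list means every expected file was found in the archive.
print(missing_alembic_files(names))
```

Returning the list of unmatched patterns, rather than a boolean, makes it easier to see which file was left out of the source distribution.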
1 change: 1 addition & 0 deletions conda-envs/mlos-3.10.yml
@@ -28,6 +28,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
1 change: 1 addition & 0 deletions conda-envs/mlos-3.11.yml
@@ -28,6 +28,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
1 change: 1 addition & 0 deletions conda-envs/mlos-3.12.yml
@@ -30,6 +30,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
1 change: 1 addition & 0 deletions conda-envs/mlos-3.13.yml
@@ -30,6 +30,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
1 change: 1 addition & 0 deletions conda-envs/mlos-windows.yml
@@ -31,6 +31,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
1 change: 1 addition & 0 deletions conda-envs/mlos.yml
@@ -26,6 +26,7 @@ dependencies:
- pre-commit==4.0.1
- pycodestyle==2.12.1
- pylint==3.3.3
- tomlkit
- mypy==1.14.1
- pandas-stubs
- types-beautifulsoup4
5 changes: 5 additions & 0 deletions doc/source/conf.py
@@ -146,6 +146,7 @@ def is_on_github_actions():
)
intersphinx_mapping.update(
{
"alembic": ("https://alembic.sqlalchemy.org/en/latest/", None),
"dabl": ("https://dabl.github.io/stable/", None),
}
)
@@ -216,6 +217,7 @@ def setup(app: SphinxApp) -> None:
# External classes that refuse to resolve:
("py:class", "contextlib.nullcontext"),
("py:class", "sqlalchemy.engine.Engine"),
("py:class", "sqlalchemy.MetaData"),
("py:exc", "jsonschema.exceptions.SchemaError"),
("py:exc", "jsonschema.exceptions.ValidationError"),
]
@@ -253,6 +255,9 @@ def setup(app: SphinxApp) -> None:
# Don't document internal environment scripts that aren't part of a module.
"*/mlos_bench/config/environments/*/*.py",
"*/mlos_bench/config/services/*/*.py",
# Don't document schema evolution scripts.
"*/mlos_bench/storage/sql/alembic/*.py",
"*/mlos_bench/storage/sql/alembic/versions/*.py",
]
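
A quick way to sanity-check the two new ignore patterns is to run them against sample paths. This is a minimal sketch that assumes the ignore list is matched with `fnmatch`-style globbing; the paths and the `is_ignored` helper are hypothetical and not taken from this PR.

```python
from fnmatch import fnmatch

# The two patterns added in this hunk.
IGNORE_PATTERNS = [
    "*/mlos_bench/storage/sql/alembic/*.py",
    "*/mlos_bench/storage/sql/alembic/versions/*.py",
]


def is_ignored(path):
    """Return True if any ignore pattern matches the path (fnmatch semantics assumed)."""
    return any(fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)


# Hypothetical checkout paths, for illustration only.
for sample in [
    "/repo/mlos_bench/mlos_bench/storage/sql/alembic/env.py",
    "/repo/mlos_bench/mlos_bench/storage/sql/alembic/versions/0001_example.py",
    "/repo/mlos_bench/mlos_bench/storage/sql/schema.py",
]:
    print(sample, is_ignored(sample))
```

Under `fnmatch` semantics `*` also matches path separators, so the first pattern alone would already cover the `versions/` scripts; listing both keeps the intent explicit.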
autoapi_options = [
"members",
34 changes: 30 additions & 4 deletions mlos_bench/mlos_bench/launcher.py
@@ -125,6 +125,17 @@ def __init__(self, description: str, long_text: str = "", argv: list[str] | None
self.global_config = DictTemplater(self.global_config).expand_vars(use_os_env=True)
assert isinstance(self.global_config, dict)

self.storage = self._load_storage(
args.storage or config.get("storage"),
lazy_schema_create=False if args.create_update_storage_schema_only else None,
)
_LOG.info("Init storage: %s", self.storage)

if args.create_update_storage_schema_only:
_LOG.info("Create/update storage schema only.")
self.storage.update_schema()
sys.exit(0)

# --service cli args should override the config file values.
service_files: list[str] = config.get("services", []) + (args.service or [])
assert isinstance(self._parent_service, SupportsConfigLoading)
@@ -159,9 +170,6 @@ def __init__(self, description: str, long_text: str = "", argv: list[str] | None
self.optimizer = self._load_optimizer(args.optimizer or config.get("optimizer"))
_LOG.info("Init optimizer: %s", self.optimizer)

self.storage = self._load_storage(args.storage or config.get("storage"))
_LOG.info("Init storage: %s", self.storage)

self.teardown: bool = (
bool(args.teardown)
if args.teardown is not None
@@ -366,6 +374,18 @@ def add_argument(self, *args: Any, **kwargs: Any) -> None:
""",
)

parser.add_argument(
"--create-update-storage-schema-only",
required=False,
default=False,
dest="create_update_storage_schema_only",
action="store_true",
help=(
"Makes sure that the storage schema is up to date "
"for the current version of mlos_bench."
),
)

# By default we use the command line arguments, but allow the caller to
# provide some explicitly for testing purposes.
if argv is None:
@@ -483,7 +503,11 @@ def _load_optimizer(self, args_optimizer: str | None) -> Optimizer:
)
return optimizer

def _load_storage(self, args_storage: str | None) -> Storage:
def _load_storage(
self,
args_storage: str | None,
lazy_schema_create: bool | None = None,
) -> Storage:
"""
Instantiate the Storage object from JSON file provided in the --storage command
line parameter.
@@ -504,6 +528,8 @@ def _load_storage(self, args_storage: str | None) -> Storage:
)
class_config = self._config_loader.load_config(args_storage, ConfigSchema.STORAGE)
assert isinstance(class_config, dict)
if lazy_schema_create is not None:
class_config["lazy_schema_create"] = lazy_schema_create
storage = self._config_loader.build_storage(
service=self._parent_service,
config=class_config,
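
The launcher change uses a tri-state override: `None` means "leave the storage config alone", while an explicit `False` forces eager schema creation when `--create-update-storage-schema-only` is given. The following is a standalone sketch of that pattern only, not the real `Launcher`. The `parse_args` and `storage_config_for` helpers are hypothetical, and unlike the PR code the sketch copies the config instead of updating it in place.

```python
import argparse


def parse_args(argv):
    """Parse a reduced set of arguments modeled on the flag added in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--storage")
    parser.add_argument(
        "--create-update-storage-schema-only",
        dest="create_update_storage_schema_only",
        action="store_true",
        default=False,
    )
    return parser.parse_args(argv)


def storage_config_for(args, class_config):
    """Return a copy of the storage config with the tri-state override applied."""
    # None = keep whatever the config file says; False = create the schema eagerly.
    lazy_schema_create = False if args.create_update_storage_schema_only else None
    config = dict(class_config)
    if lazy_schema_create is not None:
        config["lazy_schema_create"] = lazy_schema_create
    return config


schema_only = parse_args(["--storage", "storage/sqlite.jsonc", "--create-update-storage-schema-only"])
print(storage_config_for(schema_only, {"lazy_schema_create": True}))

normal_run = parse_args(["--storage", "storage/sqlite.jsonc"])
print(storage_config_for(normal_run, {"lazy_schema_create": True}))
```

Using `None` rather than `True` as the default keeps a normal run from silently overriding a value the user set in the storage config file.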
4 changes: 4 additions & 0 deletions mlos_bench/mlos_bench/storage/base_storage.py
@@ -65,6 +65,10 @@ def __init__(
self._config = config.copy()
self._global_config = global_config or {}

@abstractmethod
def update_schema(self) -> None:
"""Update the schema of the storage backend if needed."""

def _validate_json_config(self, config: dict) -> None:
"""Reconstructs a basic json config that this class might have been instantiated
from in order to validate configs provided outside the file loading
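
Declaring `update_schema` as abstract means every concrete storage backend has to provide it. A minimal sketch of that contract, assuming the base class relies on Python's standard `abc` machinery; the class names below are hypothetical stand-ins, not the actual `mlos_bench` classes.

```python
from abc import ABC, abstractmethod


class StorageSketch(ABC):
    """Hypothetical stand-in for the storage base class."""

    @abstractmethod
    def update_schema(self) -> None:
        """Update the schema of the storage backend if needed."""


class InMemoryStorageSketch(StorageSketch):
    """Hypothetical backend that only counts how often it was asked to update."""

    def __init__(self) -> None:
        self.updates = 0

    def update_schema(self) -> None:
        self.updates += 1


class IncompleteStorageSketch(StorageSketch):
    """Hypothetical backend that forgot to implement update_schema()."""


storage = InMemoryStorageSketch()
storage.update_schema()
print(storage.updates)

try:
    IncompleteStorageSketch()
except TypeError as err:
    # Instantiating a class with unimplemented abstract methods fails here.
    print("cannot instantiate:", err)
```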
120 changes: 120 additions & 0 deletions mlos_bench/mlos_bench/storage/sql/alembic.ini
@@ -0,0 +1,120 @@
# A generic, single database configuration.

[alembic]
# path to migration scripts
# Use forward slashes (/) also on windows to provide an os agnostic path
script_location = mlos_bench.storage.sql:alembic


# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python>=3.9 or backports.zoneinfo library.
# Any required deps can be installed by adding `alembic[tz]` to the pip requirements
# string value is passed to ZoneInfo()
# leave blank for localtime
timezone = UTC

# max length of characters to apply to the "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
# version_path_separator = newline
version_path_separator = os # Use os.pathsep. Default configuration used for new projects.

# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

# See README.md for details.
sqlalchemy.url = sqlite:///mlos_bench.sqlite


[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# lint with attempts to fix using "ruff" - use the exec runner, execute a binary
# hooks = ruff
# ruff.type = exec
# ruff.executable = %(here)s/.venv/bin/ruff
# ruff.options = --fix REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
# Don't override the root logger's level, so that we can control it from mlos_bench configs.
#level = WARNING
handlers =
qualname =

[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
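
The commented-out root `level` matters if this file is loaded with `logging.config.fileConfig`, as alembic's default `env.py` template does; whether this PR's `env.py` does so is an assumption here, since that file is not shown. The sketch below feeds only the logging sections of the file to `fileConfig` and shows that a root level set beforehand is left untouched, while the `alembic` and `sqlalchemy.engine` loggers get their configured levels.

```python
import io
import logging
import logging.config

# Only the logging sections of alembic.ini, copied from the diff above.
LOGGING_INI = """
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
#level = WARNING
handlers =
qualname =

[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
"""

# Pretend the application already picked a root log level.
logging.getLogger().setLevel(logging.DEBUG)

# fileConfig accepts a file-like object; existing loggers are kept enabled here.
logging.config.fileConfig(io.StringIO(LOGGING_INI), disable_existing_loggers=False)

print(logging.getLevelName(logging.getLogger().level))
print(logging.getLevelName(logging.getLogger("alembic").level))
print(logging.getLevelName(logging.getLogger("sqlalchemy.engine").level))
```

Note that `fileConfig` still replaces the root logger's handlers, so this only demonstrates the level behavior.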
43 changes: 43 additions & 0 deletions mlos_bench/mlos_bench/storage/sql/alembic/README.md
@@ -0,0 +1,43 @@
# Schema Evolution with Alembic

This document contains some notes on how to use [`alembic`](https://alembic.sqlalchemy.org/en/latest/) for schema evolution in the `mlos_bench` project.

## Overview

1. Create a blank `mlos_bench.sqlite` database file in the [`mlos_bench/storage/sql`](../) directory with the current schema using the following command:

```sh
cd mlos_bench/storage/sql
rm mlos_bench.sqlite
mlos_bench --storage storage/sqlite.jsonc --create-update-storage-schema-only
```

> This allows `alembic` to automatically generate a migration script from the current schema.

2. Adjust the [`mlos_bench/storage/sql/schema.py`](../schema.py) file to reflect the new desired schema.

> Keep each change small and atomic.
> For example, if you want to add a new column, do that in one change.
> If you want to rename a column, do that in another change.

3. Generate a new migration script with the following command:

```sh
alembic revision --autogenerate -m "Descriptive text about the change."
```

4. Review the generated migration script in the [`mlos_bench/storage/sql/alembic/versions`](./versions/) directory.

5. Verify that the migration script works by running the following command:

```sh
mlos_bench --storage storage/sqlite.jsonc --create-update-storage-schema-only
```

> Normally this would be done with `alembic upgrade head`, but this command is a convenient way to check that the migration also works through the `mlos_bench` command line interface.

6. If the migration script works, commit the changes to the [`mlos_bench/storage/sql/schema.py`](../schema.py) and [`mlos_bench/storage/sql/alembic/versions`](./versions/) files.

7. Merge that to the `main` branch.

8. Might be good to cut a new `mlos_bench` release at this point as well.
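
For readers new to schema migrations, the idea behind steps 3 to 5 is an ordered list of revisions plus a record of the last one applied. The sketch below illustrates that idea with the standard library `sqlite3` module only. It is not how `alembic` is implemented, and the table names, revision ids, and SQL statements are all hypothetical.

```python
import sqlite3

# Hypothetical ordered revisions: (revision id, SQL statement to apply).
MIGRATIONS = [
    ("rev_001", "CREATE TABLE experiment (exp_id TEXT PRIMARY KEY)"),
    ("rev_002", "ALTER TABLE experiment ADD COLUMN note TEXT"),
]


def upgrade_to_head(conn):
    """Apply every revision after the recorded one; return the ids that were applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version_num TEXT NOT NULL)")
    row = conn.execute("SELECT version_num FROM schema_version").fetchone()
    current = row[0] if row else None
    revision_ids = [rev for rev, _ in MIGRATIONS]
    start = revision_ids.index(current) + 1 if current is not None else 0
    applied = []
    for rev, sql in MIGRATIONS[start:]:
        conn.execute(sql)
        # Keep exactly one row recording the latest applied revision.
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version (version_num) VALUES (?)", (rev,))
        applied.append(rev)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = upgrade_to_head(conn)   # applies both revisions on a fresh database
second_run = upgrade_to_head(conn)  # nothing left to apply
print(first_run, second_run)
```

Running the upgrade twice is the same kind of check as step 5: a second invocation against an up-to-date database should be a no-op.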