Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema migration system to Defra #1448

Closed
Tracked by #1002
AndrewSisley opened this issue May 3, 2023 · 2 comments · Fixed by #1564
Closed
Tracked by #1002

Add schema migration system to Defra #1448

AndrewSisley opened this issue May 3, 2023 · 2 comments · Fixed by #1564
Assignees
Labels
area/schema Related to the schema system epic Includes sub-issues priority/high
Milestone

Comments

@AndrewSisley
Copy link
Contributor

AndrewSisley commented May 3, 2023

Part of #1002

Using Lens.

Copy-pasted from discord thread dev-db.LensVM on read(https://discord.com/channels/427944769851752448/1102730749007904798):

  • Lens on local read, and on update (from anywhere, local or P2P), transform outcome to be stored (cached) in datastore
  • Can tolerate discarding P2P updates/creates for unknown schema within release cycle, but must not discard stuff by the time release is cut. By that point unknown schema version updates need to be held in a hidden (from normal queries) branch
  • Current datastore schema version tracked per document initially, long term we may wish to offer this per field, but not right now

From discord discussion RE unique secondary indexes:

  • The loss of datastore uniqueness can be tolerated in the short term.
  • Long term, this will likely use the same mechanics as the secondary index backfill stuff, as the problem also exists when defining new unique indexes on existing fields. It could be that if a Lens migration is for an object/field with a unique index it is eagerly evaluated (probably in the background).
  • Something worth noting, that I had forgotten is that non-unique secondary indexes are being implemented first, it could be that this will not even be a problem within 0.6, depending on when unique secondary indexes are implemented

From discord discussion RE wasm module persistence:

We could just rely on the user/admins preserving wherever they exist at time of migration registration (1). But we could also copy them somewhere defra specific (2), or store them as blobs within a datastore (3).

  • (1) is fine short term, and is likely the easiest, it is not a blocker for release
  • (2) is preferred short term if low cost
  • going from (1) to (2) should be fairly cheap, so if (1) is implemented first and everyone hates it, it shouldnt block merge and moving to (2) can likely be done immediately after within the release if required
  • (3) is Andy's and John's preferred long term solution, but:
  • (2) and (3) actually share the same public interfaces, and so the differences between the two can really be seen as implementation details

Personal/wip thoughts:

  • wasmer.NewModule(store, content) is slow (40-50ms), and should not be done in user/query time. This will mean they need to be cached somewhere.
  • Lens module instances are not threadsafe, and will probably need some worker-like channel stuff to ensure that db actions on different threads do not concurrently access the same lens module.
  • A Lens module instance should not be shared between multiple defra requests/actions/contexts at the same time. This is to protect against the leaking of state between contexts. This is particularly important the proposed Lens enumerable stuff, but is otherwise just a very good thing to have anyway.
  • The client package func(s) to manage migrations should not be coupled to mutating schema, although it should be possible to rely on the same transaction to commit/discard the changes together. I.e. it should be possible to modify the registered migrations at any point in time (e.g. for security updates to migration modules).
  • The migration system should still work following a restart. This means that the modules need to be persisted locally and the config (re)loaded on db start. The Lens engine itself can be referenced directly, only the wasm modules need to be dynamic.
  • The Lens related code can be nice and isolated/contained, within a defra lens sub-package.
  • If PR ready before Lens is open-sourced, we can vendor the dependency in the short term: https://stackoverflow.com/questions/68544611/what-is-the-purpose-of-go-mod-vendor-command
  • The Defra load-Lens code should avoid doing anything too Defra specific, and should instead be used to find deficiencies in the Lens config/interfaces.
  • The update code (including via planner) uses docFetcher, so by handling the Lens magic in the fetcher-space our lens update requirements should be met auto-magically. It still needs tests though.

Tasks

Preview Give feedback
No tasks being tracked yet.
@AndrewSisley AndrewSisley added epic Includes sub-issues area/schema Related to the schema system labels May 3, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.6 milestone May 3, 2023
@AndrewSisley AndrewSisley self-assigned this May 3, 2023
@AndrewSisley
Copy link
Contributor Author

I've added #1082 as a prerequisite, as we should ensure that migrations are not lost on database restart, and the linked ticket can provide a mechanism for testing this.

@AndrewSisley
Copy link
Contributor Author

WIP branch is here: https://github.com/AndrewSisley/defradb/pull/new/1448-lens-in-defra

I may be pulling bits out into smaller PRs before the main body of work is put into PR.

AndrewSisley added a commit that referenced this issue Jul 13, 2023
## Relevant issue(s)

Resolves #1448 #1556

Adds lens as a migration engine to defra.

Migrations are run lazily, on fetch. This includes on updates (locally,
and via P2P). The migration result is cached within the datastore. The
DAG is never updated to reflect the migration result, including when the
migrations are executed during an update. The current schema version of
items in the datastore is tracked at the document level.

Commits should be clean, and contain some hopefully handy documentation,
although the bulk of the work is in the last, large commit.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this issue Feb 23, 2024
## Relevant issue(s)

Resolves sourcenetwork#1448 sourcenetwork#1556

Adds lens as a migration engine to defra.

Migrations are run lazily, on fetch. This includes on updates (locally,
and via P2P). The migration result is cached within the datastore. The
DAG is never updated to reflect the migration result, including when the
migrations are executed during an update. The current schema version of
items in the datastore is tracked at the document level.

Commits should be clean, and contain some hopefully handy documentation,
although the bulk of the work is in the last, large commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/schema Related to the schema system epic Includes sub-issues priority/high
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant