Add schema versioning for derived tables #11446
- derived_table.version is reported by Info api
- derived_table.version pom value should match db table info value (for backend function)
- derived_table.version does not affect importers currently (importers only import into MySQL, not clickhouse)
This looks good! Thanks! Few questions:
Currently the entire content of the MySQL database lives inside the ClickHouse database. Putting this value into the MySQL database therefore also puts it into the ClickHouse database. It is also currently true that nothing "first class" lives inside the ClickHouse database: all data in ClickHouse is either a direct copy from MySQL or is derived from MySQL through joins of the copied data.

Introducing a first class table or record into ClickHouse (making ClickHouse the authoritative source for that content) would make the API more difficult to program. If a query comes to /api/info, for example, the handler would need to know to gather some of the requested state information from MySQL and some from ClickHouse. In a system where ClickHouse is not configured, it would need to skip the ClickHouse query in order to avoid producing an error. So we would need somewhat divergent copies of the handler code ... one form for ClickHouse-enabled installations and a slightly different form for MySQL-only installations.

This could be done, but my thought was that programming a distinction between the two types of installations would be wasted effort, given that we hope/plan to eliminate the MySQL database. As long as that is the direction we are moving, I think we can instead focus on making all data in the ClickHouse database first class and handling the loss of constraint satisfaction. By not splitting the location of schema version values now, we do not need to re-integrate them later.

As for not collecting the history of ClickHouse schema versions in a subdirectory, I think that change makes sense. I debated about this, and I do recognize that this would be a departure from prior practice. So I'll do this:
I believe we will need to maintain a record as the version number is incremented. Potentially we could have a policy of only taking a snapshot of the derived_table_version at points where we have a defined release of cBioPortal. So for each cBioPortal release number, we could know what derived_table_schema was used. I believe that the derived_table_schema is going to change much less frequently than the rate of cBioPortal releases. But potentially we may have 2 or more increments to the derived_table_schema during a single development increment between cBioPortal releases. So some of these "in-between" release increments may not get captured. But that is ok if we only expect deployers to be deploying a tagged, identified version of cBioPortal. So the version mapping documentation may be a table with four recorded fields:
Fix # (see https://help.github.com/en/articles/closing-issues-using-keywords)
Describe changes proposed in this pull request:
In order to ensure that the proper ClickHouse derived table construction logic is applied (based on the cBioPortal backend version that the installer/deployer is running), a version label (in the same pattern as db.version / DB_SCHEMA_VERSION) is added to the info table in the database and to the pom.xml file. The initial schema version is '1.0.0'.

A 'versions' subdirectory is added to src/main/resource/db_scripts/clickhouse. This is intended to store a copy of every version of the derived table construction scripts. This is to avoid introducing another dimension of repo tagging or documentation maintenance to enable finding old versions of the derived table schema.
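
The version match described above could be checked along the following lines. This is only a minimal sketch, not the code in this pull request: the column name `derived_table_schema_version`, the helper names, and the use of an in-memory SQLite database as a stand-in for the real database are all assumptions made for illustration.

```python
import sqlite3

# Assumed to mirror the derived_table.version value from pom.xml.
EXPECTED_DERIVED_TABLE_VERSION = "1.0.0"


def read_derived_table_version(conn):
    """Return the version stored in the info table, or None if the table is empty.

    The column name is hypothetical; the real column may be named differently.
    """
    row = conn.execute(
        "SELECT derived_table_schema_version FROM info"
    ).fetchone()
    return row[0] if row else None


def check_derived_table_version(conn, expected=EXPECTED_DERIVED_TABLE_VERSION):
    """Raise RuntimeError if the stored version differs from the expected one."""
    actual = read_derived_table_version(conn)
    if actual != expected:
        raise RuntimeError(
            f"derived table schema version mismatch: "
            f"backend expects {expected!r}, database reports {actual!r}"
        )
    return actual


def demo_connection(version):
    """Build an in-memory stand-in for the info table (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info ("
        "db_schema_version TEXT, derived_table_schema_version TEXT)"
    )
    # "0.0.0" is a placeholder, not a real DB_SCHEMA_VERSION value.
    conn.execute("INSERT INTO info VALUES (?, ?)", ("0.0.0", version))
    return conn


if __name__ == "__main__":
    conn = demo_connection("1.0.0")
    print(check_derived_table_version(conn))
```

Because the value lives in the same info table as the existing schema version, a check like this would need only one data source, whether or not ClickHouse is configured.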
Checks
Any screenshots or GIFs?
If this is a new visual feature please add a before/after screenshot or gif here with e.g. Giphy CAPTURE or Peek
Notify reviewers
Read our Pull request merging policy. It can help to figure out who worked on the file before you. Please use git blame <filename> to determine that and notify them either through Slack or by assigning them as a reviewer on the PR.