Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create jobs database tables without init container #4942

Merged
merged 21 commits into from
Jul 26, 2021

Conversation

tuliren
Copy link
Contributor

@tuliren tuliren commented Jul 23, 2021

What

How

  • Move jobs database table creation from sql init script to Java upon server launch.
    • JobsDatabaseInstance takes in a jobs database connection, and prepares it by creating tables and indices if necessary.
    • ConfigsDatabaseInstance does the same for the configs database.
  • This implementation assumes:
    • The relevant database (airbyte by default) exists in the db server.
    • The user has permissions to read from and write to the database.
    • @davinchia, @jrhizor, are the above assumptions valid? Should this PR also take care of database creation and granting users the permissions? These operations should probably be put into terraform though.

TODOs

  • Refactor DatabaseSchema for more than just the jobs database.
    • Probably create an interface for it.
  • Create a DatabaseSchema for the configs database.
  • Use the constants defined in DatabaseSchema for the database instance classes.

Recommended reading order

  1. JobsDatabaseInstance.java
  2. job_tables/schema.sql: table schema for the jobs database to be executed in Java.
  3. init.sql: new initialization sql script that just creates the database, and grant user permission.

@github-actions github-actions bot added the area/platform issues related to the platform label Jul 23, 2021
@tuliren tuliren force-pushed the liren/auto-create-jobs-database-tables branch 3 times, most recently from 0078452 to 50416ba Compare July 23, 2021 10:30
@github-actions github-actions bot added area/documentation Improvements or additions to documentation area/connectors Connector related issues labels Jul 23, 2021
@tuliren tuliren marked this pull request as ready for review July 23, 2021 12:24
@jrhizor
Copy link
Contributor

jrhizor commented Jul 23, 2021

The assumptions sound reasonable to me. I did a quick skim of this and it looks good, I'll do a more in-depth review Monday.

@davinchia
Copy link
Contributor

davinchia commented Jul 24, 2021

@tuliren why not also create the airbyte database and grant as part of the server start up?

I'd prefer that since we might not be able to use the init script with Cloud Dbs (CloudSql doesn't support it, not sure about RDS) and it'll be nice if we don't need to add additional hooks/do manual work to spin things up.

@tuliren
Copy link
Contributor Author

tuliren commented Jul 24, 2021

@tuliren why not also create the airbyte database and grant as part of the server start up?

I'd prefer that since we might not be able to use the init script with Cloud Dbs (CloudSql doesn't support it, not sure about RDS) and it'll be nice if we don't need to add additional hooks/do manual work to spin things up.

Mainly because to create a database, the user must be a superuser or have the CREATEDB privilege (reference). If a customer is using an external database, I think usually they don't want to use a user account with those privileges?

Also the process can be complicated, because we need to first connect to the database server without specifying the database, create it, and then connect to the created database.

(By the way, the grant statement is unnecessary. If the user can create a database, that user will become the owner of the database, and naturally have all the privileges on the database.)

The new minimized init script is only needed for acceptance test. For CloudSQL, we don't need that. Since this is a database is a resource, I think terraform can take care of it. Is it not the case? I can still look into creating the database in Java if it is not an easy thing in terraform.

Copy link
Contributor

@davinchia davinchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I double checked and looks like it's possible to create db tables in CloudSql via terraform. So should be ok. I also agree with ya those steps are too much trouble for what it's worth - we'll figure something else out.

I like the approach! It's much cleaner. My comments were around what I think are some missing schemas in DatabaseSchemas and quite a bit of duplicated code in the setup of the XDatabaseInstances classes. I think it's worth DRY-ing these since Cloud will have it's own tables and will be able to reuse some of this logic. Want to hear ya thoughts.

The new minimized init script is only needed for acceptance test.

Regarding this, looks like the init script is loaded in the airbyte-db pod, so is used outside of acceptance test when a prior DB isn't provided. Am I misunderstanding this?

@tuliren tuliren force-pushed the liren/auto-create-jobs-database-tables branch from 96de674 to a0503dc Compare July 24, 2021 09:32
@github-actions github-actions bot removed the area/connectors Connector related issues label Jul 24, 2021
@tuliren
Copy link
Contributor Author

tuliren commented Jul 24, 2021

The new minimized init script is only needed for acceptance test.

Regarding this, looks like the init script is loaded in the airbyte-db pod, so is used outside of acceptance test when a prior DB isn't provided. Am I misunderstanding this?

Oh, right. It is still used for acceptance test, or local deployment that uses internal database. For cloud deployment or any deployment that uses an external database, we don't need it.

@tuliren
Copy link
Contributor Author

tuliren commented Jul 24, 2021

@davinchia, your comments are really helpful. I will tag you for another round of review once I finish the TODOs.

@davinchia
Copy link
Contributor

Oh, right. It is still used for acceptance test, or local deployment that uses internal database. For cloud deployment or any deployment that uses an external database, we don't need it.

Cool! Wanted to make sure I understood correctly.

@tuliren I'm glad! I always feel like a tool when I bug people about DRY + code grouping, especially since you are so kind to pick this up. Happy to review a second time!

@tuliren tuliren requested a review from davinchia July 25, 2021 22:42
Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! I really like how the BaseDatabaseInstance class turned out.

Copy link
Contributor

@davinchia davinchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Very fast turnaround!

@tuliren
Copy link
Contributor Author

tuliren commented Jul 26, 2021

When I tested the jobs importing and exporting locally, it looks like the service_uuid metadata will be lost after the import.

For example, this is the metadata before the import:

                 key                 |                value
-------------------------------------+--------------------------------------
 server_uuid                         | e895a584-7dbf-48ce-ace6-0bc9ea570c34
 deployment_id                       | 55b923a9-42e5-4b0d-a5d2-b8d0316bfb2b
 airbyte_version                     | dev
 2021-07-26T03:34:31.996221Z_init_db | dev
(4 rows)

If I import the same data, here is the metadata table after the import:

                  key                  |                value
---------------------------------------+--------------------------------------
 deployment_id                         | 55b923a9-42e5-4b0d-a5d2-b8d0316bfb2b
 2021-07-26T03:46:52.882388Z_import_db | dev
 airbyte_version                       | dev
 2021-07-26T03:46:52.958634Z_init_db   | dev
(4 rows)

I don't think this bug is caused by this PR. Will look into it.

@tuliren
Copy link
Contributor Author

tuliren commented Jul 26, 2021

The bug is probably related to another PR (see comment here) that should be fixed separately (PR #4977).

I have changed the jobs db readiness check to not depend on the server uuid, which can minimize the impact of this bug.

@tuliren tuliren merged commit e8f20b2 into master Jul 26, 2021
@tuliren tuliren deleted the liren/auto-create-jobs-database-tables branch July 26, 2021 05:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation area/platform issues related to the platform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Figure out an init script to set up the database.
4 participants