From 00c3b51be03d9ef064da5877abed95aa02810797 Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Mon, 3 Jan 2022 18:16:52 +0800 Subject: [PATCH 1/7] Checkpoint: Initial configuration documentation. --- docs/SUMMARY.md | 1 + docs/operator-guides/scaling-airbyte.md | 4 ++-- .../configuring-airbyte.md | 20 +++++++++++++++++++ 3 files changed, 23 insertions(+), 2 deletions(-) create mode 100644 docs/understanding-airbyte/configuring-airbyte.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 66334fcb70e36..74f9daae815a7 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -256,6 +256,7 @@ * [Change Data Capture (CDC)](understanding-airbyte/cdc.md) * [Namespaces](understanding-airbyte/namespaces.md) * [Json to Avro Conversion](understanding-airbyte/json-avro-conversion.md) + * [Configuring Airbyte](understanding-airbyte/configuring-airbyte.md) * [Glossary of Terms](understanding-airbyte/glossary.md) * [API documentation](api-documentation.md) * [Project Overview](project-overview/README.md) diff --git a/docs/operator-guides/scaling-airbyte.md b/docs/operator-guides/scaling-airbyte.md index 9bfa0eade16dc..aeefa8da6fd13 100644 --- a/docs/operator-guides/scaling-airbyte.md +++ b/docs/operator-guides/scaling-airbyte.md @@ -53,9 +53,9 @@ This is a **non-issue** for users running Airbyte Docker. ### Temporal DB -Temporal maintains multiple idle connexions. By the default value is `20` and you may want to lower or increase this number. One issue we noticed is +Temporal maintains multiple idle connections. By default, the value is `20`, and you may want to lower or increase this number. One issue we noticed is that temporal creates multiple pools and the number specified in the `SQL_MAX_IDLE_CONNS` environment variable of the `docker.compose.yaml` file -might end up allowing 4-5 times more connexions than expected. +might end up allowing 4-5 times more connections than expected.
If you want tho increase the amount of allowed idle connexion, you will also need to increase `SQL_MAX_CONNS` as well because `SQL_MAX_IDLE_CONNS` is capped by `SQL_MAX_CONNS`. diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md new file mode 100644 index 0000000000000..9679ff3e125b0 --- /dev/null +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -0,0 +1,20 @@ +# Configuring Airbyte + +This section covers how to configure Airbyte and the various configuration options Airbyte accepts. + +Configuration is currently via environment variables. See the below section on how to modify these variables. + +## Docker Deployments + +The recommended way to run an Airbyte Docker deployment is via the Airbyte repo's `docker-compose.yaml` and `.env` file. + +Modifying the `.env` file is all that is needed. The `docker-compose.yaml` file injects appropriate variables into the containers. + +## Kubernetes Deployments + +The recommended way to run an Airbyte Kubernetes deployment is via the + + +## Reference + +The following From afb2e15d79108d05fb0a6d98e5679035f041ae5b Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Mon, 3 Jan 2022 22:44:29 +0800 Subject: [PATCH 2/7] Checkpoint: switching away to another branch. --- docs/understanding-airbyte/configuring-airbyte.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index 9679ff3e125b0..117f0de2d1df6 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -8,7 +8,9 @@ Configuration is currently via environment variables. See the below section on h The recommended way to run an Airbyte Docker deployment is via the Airbyte repo's `docker-compose.yaml` and `.env` file. -Modifying the `.env` file is all that is needed.
The `docker-compose.yaml` file injects appropriate variables into the containers. +In this manner, modifying the `.env` file is all that is needed. The `docker-compose.yaml` file injects appropriate variables into the containers. + +If you want to manage your own docker files, please look at the existing docker file to ensure applications get the correct variables. ## Kubernetes Deployments From 87edcfbbd09b7ce4ea6cb162fc165f3fb13a0d5a Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Tue, 4 Jan 2022 01:25:41 +0800 Subject: [PATCH 3/7] Add doc strings to env vars defined in Configs.java. Move docstring to configuring airbyte doc. --- .../main/java/io/airbyte/config/Configs.java | 198 +++++++++++++++++- .../java/io/airbyte/config/EnvConfigs.java | 3 +- .../configuring-airbyte.md | 95 ++++++++- 3 files changed, 282 insertions(+), 14 deletions(-) diff --git a/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java b/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java index 7ee283c1821aa..baea466a4ca61 100644 --- a/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java +++ b/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java @@ -12,125 +12,305 @@ import java.util.Map; import java.util.Set; +/** + * This interface defines the general variables for configuring Airbyte. + * + * Please also update the configuring-airbyte.md document when modifying this file. + */ public interface Configs { // CORE // General + /** + * Distinguishes internal Airbyte deployments. Internal-use only. + */ String getAirbyteRole(); + /** + * Defines the Airbyte deployment version. + */ AirbyteVersion getAirbyteVersion(); String getAirbyteVersionOrWarning(); + /** + * Defines the bucket for caching specs. This immensely speeds up spec operations. This is updated + * when new versions are published. + */ String getSpecCacheBucket(); + /** + * Distinguishes internal Airbyte deployments. Internal-use only. 
+ */ DeploymentMode getDeploymentMode(); + /** + * Defines if the deployment is Docker or Kubernetes. Airbyte behaves accordingly. + */ WorkerEnvironment getWorkerEnvironment(); + /** + * Defines the configs directory. Applies only to Docker, and is present in Kubernetes for backward + * compatibility. + */ Path getConfigRoot(); + /** + * Defines the Airbyte workspace directory. Applies only to Docker, and is present in Kubernetes for + * backward compatibility. + */ Path getWorkspaceRoot(); // Docker Only + /** + * Defines the name of the Airbyte docker volume. + */ String getWorkspaceDockerMount(); String getLocalDockerMount(); + /** + * Defines the docker network jobs are launched on with the new scheduler. + */ String getDockerNetwork(); Path getLocalRoot(); // Secrets + /** + * Defines the GCP Project to store secrets in. + */ String getSecretStoreGcpProjectId(); + /** + * Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. + * These credentials must have Secret Manager Read/Write access. + */ String getSecretStoreGcpCredentials(); + /** + * Defines the Secret Persistence type. None by default. Set to GOOGLE_SECRET_MANAGER to use Google + * Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. + */ SecretPersistenceType getSecretPersistenceType(); // Database + /** + * Define the Jobs Database user. + */ String getDatabaseUser(); + /** + * Define the Jobs Database password. + */ String getDatabasePassword(); + /** + * Define the Jobs Database url in the form of + * jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Do not include username or + * password. + */ String getDatabaseUrl(); + /** + * Define the minimum flyway migration version the Jobs Database must be at. If this is not + * satisfied, applications will not successfully connect. + */ String getJobsDatabaseMinimumFlywayMigrationVersion(); + /** + * Define the total time to wait for the Jobs Database to be initialized.
This includes migrations. + */ long getJobsDatabaseInitializationTimeoutMs(); + /** + * Define the Configs Database user. Defaults to the Jobs Database user if empty. + */ String getConfigDatabaseUser(); + /** + * Define the Configs Database password. Defaults to the Jobs Database password if empty. + */ String getConfigDatabasePassword(); + /** + * Define the Configs Database url in the form of + * jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Defaults to the Jobs Database + * url if empty. + */ String getConfigDatabaseUrl(); + /** + * Define the minimum flyway migration version the Configs Database must be at. If this is not + * satisfied, applications will not successfully connect. + */ String getConfigsDatabaseMinimumFlywayMigrationVersion(); + /** + * Define the total time to wait for the Configs Database to be initialized. This includes + * migrations. + */ long getConfigsDatabaseInitializationTimeoutMs(); + /** + * Define if the Bootloader should run migrations on start up. + */ boolean runDatabaseMigrationOnStartup(); // Airbyte Services + /** + * Define the url where Temporal is hosted at. Please include the port. Airbyte services use this + * information. + */ String getTemporalHost(); + /** + * Define the url where the Airbyte Server is hosted at. Airbyte services use this information. + * Manipulates the `INTERNAL_API_HOST` variable. + */ String getAirbyteApiHost(); + /** + * Define the port where the Airbyte Server is hosted at. Airbyte services use this information. + * Manipulates the `INTERNAL_API_HOST` variable. + */ int getAirbyteApiPort(); + /** + * Define the url the Airbyte Webapp is hosted at. Airbyte services use this information. + */ String getWebappUrl(); // Jobs + /** + * Define the maximum number of attempts a sync makes before failing. + */ int getSyncJobMaxAttempts(); + /** + * Define the number of days a sync job will execute for before timing out.
+ */ int getSyncJobMaxTimeoutDays(); + /** + * Define the job container's minimum CPU usage. Units follow either Docker or Kubernetes, depending + * on the deployment. Defaults to none. + */ + String getJobMainContainerCpuRequest(); + + /** + * Define the job container's maximum CPU usage. Units follow either Docker or Kubernetes, depending + * on the deployment. Defaults to none. + */ + String getJobMainContainerCpuLimit(); + + /** + * Define the job container's minimum RAM usage. Units follow either Docker or Kubernetes, depending + * on the deployment. Defaults to none. + */ + String getJobMainContainerMemoryRequest(); + + /** + * Define the job container's maximum RAM usage. Units follow either Docker or Kubernetes, depending + * on the deployment. Defaults to none. + */ + String getJobMainContainerMemoryLimit(); + + // Jobs - Kube only + /** + * Define one or more Job pod tolerations. Tolerations are separated by ';'. Each toleration + * contains k=v pairs mentioning some/all of key, effect, operator and value and separated by `,`. + */ List getJobKubeTolerations(); + /** + * Define one or more Job pod node selectors. Each kv-pair is separated by a `,`. + */ Map getJobKubeNodeSelectors(); + /** + * Define the Job pod connector image pull policy. + */ String getJobKubeMainContainerImagePullPolicy(); + /** + * Define the Job pod connector image pull secret. Useful when hosting private images. + */ String getJobKubeMainContainerImagePullSecret(); + /** + * Define the Job pod socat image. + */ String getJobKubeSocatImage(); + /** + * Define the Job pod busybox image. + */ String getJobKubeBusyboxImage(); + /** + * Define the Job pod curl image. + */ String getJobKubeCurlImage(); + /** + * Define the Kubernetes namespace Job pods are created in.
+ */ String getJobKubeNamespace(); - String getJobMainContainerCpuRequest(); - - String getJobMainContainerCpuLimit(); - - String getJobMainContainerMemoryRequest(); - - String getJobMainContainerMemoryLimit(); - // Logging/Monitoring/Tracking + /** + * Define either S3, Minio or GCS as a logging backend. Kubernetes only. Multiple variables are + * involved here. Please see {@link CloudStorageConfigs} for more info. + */ LogConfigs getLogConfigs(); + /** + * Define either S3, Minio or GCS as a state storage backend. Multiple variables are involved here. + * Please see {@link CloudStorageConfigs} for more info. + */ CloudStorageConfigs getStateStorageCloudConfigs(); + /** + * Determine if Datadog tracking events should be published. Mainly for Airbyte internal use. + */ boolean getPublishMetrics(); + /** + * Define whether to publish tracking events to Segment or log-only. Airbyte internal use. + */ TrackingStrategy getTrackingStrategy(); // APPLICATIONS // Worker + /** + * Define the maximum number of workers each Airbyte Worker container supports. Multiple variables + * are involved here. Please see {@link MaxWorkersConfig} for more info. + */ MaxWorkersConfig getMaxWorkers(); + // Worker - Kube only + /** + * Define the local ports the Airbyte Worker pod uses to connect to the various Job pods. + */ Set getTemporalWorkerPorts(); // Scheduler + /** + * Define how and how often the Scheduler sweeps its local disk for old configs. Multiple variables + * are involved here. Please see {@link WorkspaceRetentionConfig} for more info. + */ WorkspaceRetentionConfig getWorkspaceRetentionConfig(); + /** + * Define the maximum number of concurrent jobs the Scheduler schedules. Defaults to 5. + */ String getSubmitterNumThreads(); // Container Orchestrator - + /** + * Define if Airbyte should use Scheduler V2. 
+ */ boolean getContainerOrchestratorEnabled(); enum TrackingStrategy { diff --git a/airbyte-config/models/src/main/java/io/airbyte/config/EnvConfigs.java b/airbyte-config/models/src/main/java/io/airbyte/config/EnvConfigs.java index 4ac1df231abce..5dcd2496a933b 100644 --- a/airbyte-config/models/src/main/java/io/airbyte/config/EnvConfigs.java +++ b/airbyte-config/models/src/main/java/io/airbyte/config/EnvConfigs.java @@ -86,6 +86,7 @@ public class EnvConfigs implements Configs { private static final String CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS = "CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS"; private static final String JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION = "JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION"; private static final String JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS = "JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS"; + private static final String CONTAINER_ORCHESTRATOR_ENABLED = "CONTAINER_ORCHESTRATOR_ENABLED"; private static final String STATE_STORAGE_S3_BUCKET_NAME = "STATE_STORAGE_S3_BUCKET_NAME"; private static final String STATE_STORAGE_S3_REGION = "STATE_STORAGE_S3_REGION"; @@ -546,7 +547,7 @@ public String getSubmitterNumThreads() { @Override public boolean getContainerOrchestratorEnabled() { - return getEnvOrDefault("CONTAINER_ORCHESTRATOR_ENABLED", false, Boolean::valueOf); + return getEnvOrDefault(CONTAINER_ORCHESTRATOR_ENABLED, false, Boolean::valueOf); } // Helpers diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index 117f0de2d1df6..a91f32d3ed553 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -8,15 +8,102 @@ Configuration is currently via environment variables. See the below section on h The recommended way to run an Airbyte Docker deployment is via the Airbyte repo's `docker-compose.yaml` and `.env` file. -In this manner, modifying the `.env` file is all that is needed. 
The `docker-compose.yaml` file injects appropriate variables into the containers. +To configure the default Airbyte Docker deployment, modify the bundled `.env` file. The `docker-compose.yaml` file injects appropriate variables into +the containers. -If you want to manage your own docker files, please look at the existing docker file to ensure applications get the correct variables. +If you want to manage your own Docker files, please refer to Airbyte's `docker-compose.yaml` to ensure applications get the correct variables. ## Kubernetes Deployments -The recommended way to run an Airbyte Kubernetes deployment is via the +The recommended way to run an Airbyte Kubernetes deployment is via the `Kustomize` overlays. +We recommend using the overlays in the `stable` directory as these have preset resource limits. + +To configure the default Airbyte Kubernetes deployment, modify the `.env` in the respective directory. Each application will consume the appropriate +env var from a generated configmap. + +If you want to manage your own Kube manifest, please refer to the various `Kustomize` overlays as an example. ## Reference -The following +The following are the possible configuration options organised by deployment type and services. + +Internal-onlu variables have been omitted for clarity. See `Configs.java` for a full list. + +### Shared + +The following variables are relevant to both Docker and Kubernetes. + +#### Core +1. `AIRBYTE_VERSION` - Defines the Airbyte deployment version. +2. `SPEC_CACHE_BUCKET` - Defines the bucket for caching specs. This immensely speeds up spec operations. This is updated when new versions are published. +3. `WORKER_ENVIRONMENT` - Defines if the deployment is Docker or Kubernetes. Airbyte behaves accordingly. +4. `CONFIG_ROOT` - Defines the configs directory. Applies only to Docker, and is present in Kubernetes for backward compatibility. +5. `WORKSPACE_ROOT` - Defines the Airbyte workspace directory.
Applies only to Docker, and is present in Kubernetes for backward compatibility. + +#### Secrets +1. `SECRET_STORE_GCP_PROJECT_ID` - Defines the GCP Project to store secrets in. +2. `SECRET_STORE_GCP_CREDENTIALS` - Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. These credentials must have Secret Manager Read/Write access. +3. `SECRET_PERSISTENCE_TYPE` - Defines the Secret Persistence type. Defaults to NONE. Set to GOOGLE_SECRET_MANAGER to use Google Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. + +#### Database +1. `DATABASE_USER` - Define the Jobs Database user. +2. `DATABASE_PASSWORD` - Define the Jobs Database password. +3. `DATABASE_URL` - Define the Jobs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}. Do not include username or password. +4. `JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` - Define the minimum flyway migration version the Jobs Database must be at. If this is not satisfied, applications will not successfully connect. +5. `JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Jobs Database to be initialized. This includes migrations. +6. `CONFIG_DATABASE_USER` - Define the Configs Database user. Defaults to the Jobs Database user if empty. +7. `CONFIG_DATABASE_PASSWORD` - Define the Configs Database password. Defaults to the Jobs Database password if empty. +8. `CONFIG_DATABASE_URL` - Define the Configs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}. Defaults to the Jobs Database url if empty. +9. `CONFIG_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` - Define the minimum flyway migration version the Configs Database must be at. If this is not satisfied, applications will not successfully connect. +10. `CONFIG_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Configs Database to be initialized. This includes migrations. +11. 
`RUN_DATABASE_MIGRATION_ON_STARTUP` - Define if the Bootloader should run migrations on start up. + +#### Airbyte Services +1. `TEMPORAL_HOST` - Define the url where Temporal is hosted at. Please include the port. Airbyte services use this information. +2. `INTERNAL_API_HOST` - Define the url where the Airbyte Server is hosted at. Please include the port. Airbyte services use this information. +3. `WEBAPP_URL` - Define the url the Airbyte Webapp is hosted at. Please include the port. Airbyte services use this information. + +#### Jobs +1. `SYNC_JOB_MAX_ATTEMPTS` - Define the maximum number of attempts a sync makes before failing. +2. `SYNC_JOB_MAX_TIMEOUT_DAYS` - Define the number of days a sync job will execute for before timing out. +3. `JOB_MAIN_CONTAINER_CPU_REQUEST` - Define the job container's minimum CPU usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none. +4. `JOB_MAIN_CONTAINER_CPU_LIMIT` - Define the job container's maximum CPU usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none. +5. `JOB_MAIN_CONTAINER_MEMORY_REQUEST` - Define the job container's minimum RAM usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none. +6. `JOB_MAIN_CONTAINER_MEMORY_LIMIT` - Define the job container's maximum RAM usage. Units follow either Docker or Kubernetes, depending on the deployment. Defaults to none. + +#### Logging +1. `LOG_LEVEL` - Define log levels. Defaults to INFO. This value is expected to be one of the various Log4J log levels. + +#### Worker +1. `MAX_SPEC_WORKERS` - Define the maximum number of Spec workers each Airbyte Worker container can support. Defaults to 5. +2. `MAX_CHECK_WORKERS` - Define the maximum number of Check workers each Airbyte Worker container can support. Defaults to 5. +3. `MAX_SYNC_WORKERS` - Define the maximum number of Sync workers each Airbyte Worker container can support. Defaults to 5. +4.
`MAX_DISCOVER_WORKERS` - Define the maximum number of Discover workers each Airbyte Worker container can support. Defaults to 5. + +#### Scheduler +1. `SUBMITTER_NUM_THREADS` - Define the maximum number of concurrent jobs the Scheduler schedules. Defaults to 5. +2. `MINIMUM_WORKSPACE_RETENTION_DAYS` - Defines the minimum configuration file age for sweeping. The Scheduler will do its best not to sweep files younger than this. Defaults to 1 day. +3. `MAXIMUM_WORKSPACE_RETENTION_DAYS` - Defines the oldest un-swept configuration file age. Files older than this will definitely be swept. Defaults to 60 days. +4. `MAXIMUM_WORKSPACE_SIZE_MB` - Defines the workspace size sweeping will continue until. Defaults to 5GB. + +#### Container Orchestrator +1. `CONTAINER_ORCHESTRATOR_ENABLED` - Define if Airbyte should use Scheduler V2. + +### Docker-Only +1. `WORKSPACE_DOCKER_MOUNT` - Defines the name of the Airbyte docker volume. +2. `DOCKER_NETWORK` - Defines the docker network the new Scheduler launches jobs on. + +### Kubernetes-Only +#### Jobs +1. `JOB_KUBE_TOLERATIONS` - Define one or more Job pod tolerations. Tolerations are separated by ';'. Each toleration contains k=v pairs mentioning some/all of key, effect, operator and value and separated by `,`. +2. `JOB_KUBE_NODE_SELECTORS` - Define one or more Job pod node selectors. Each kv-pair is separated by a `,`. +3. `JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_POLICY` - Define the Job pod connector image pull policy. +4. `JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET` - Define the Job pod connector image pull secret. Useful when hosting private images. +5. `JOB_KUBE_SOCAT_IMAGE` - Define the Job pod socat image. +6. `JOB_KUBE_BUSYBOX_IMAGE` - Define the Job pod busybox image. +7. `JOB_KUBE_CURL_IMAGE` - Define the Job pod curl image. +8. `JOB_KUBE_NAMESPACE` - Define the Kubernetes namespace Job pods are created in. + +#### Worker +1.
`TEMPORAL_WORKER_PORTS` - Define the local ports the Airbyte Worker pod uses to connect to the various Job pods. Port 9001 - 9040 are exposed by default in the Kustomize deployments. From d25b1d3a86f226aa2ac6ed447ce023bb8e9c66fa Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Tue, 4 Jan 2022 01:28:59 +0800 Subject: [PATCH 4/7] English!. --- docs/understanding-airbyte/configuring-airbyte.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index a91f32d3ed553..89c31a89a8860 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -22,7 +22,7 @@ We recommend using the overlays in the `stable` directory as these have preset r To configure the default Airbyte Kubernetes deployment, modify the `.env` in the respective directory. Each application will consume the appropriate env var from a generated configmap. -If you want to manage your own Kube manifest, please refer to the various `Kustomize` overlays as an example. +If you want to manage your own Kube manifest, please refer to the various `Kustomize` overlays for examples. ## Reference From 11c93b790497d1b6dc53dd5c54f01202f01422bc Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Tue, 4 Jan 2022 15:34:40 +0800 Subject: [PATCH 5/7] Update docs/understanding-airbyte/configuring-airbyte.md Co-authored-by: Jared Rhizor --- docs/understanding-airbyte/configuring-airbyte.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index 89c31a89a8860..fafef4ae4f75b 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -49,7 +49,7 @@ The following variables are relevant to both Docker and Kubernetes. #### Database 1. `DATABASE_USER` - Define the Jobs Database user. 2. 
`DATABASE_PASSWORD` - Define the Jobs Database password. -3. `DATABASE_URL` - Define the Jobs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}. Do not include username or password. +3. `DATABASE_URL` - Define the Jobs Database url in the form of `jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}`. Do not include username or password. 4. `JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` - Define the minimum flyway migration version the Jobs Database must be at. If this is not satisfied, applications will not successfully connect. 5. `JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Jobs Database to be initialized. This includes migrations. 6. `CONFIG_DATABASE_USER` - Define the Configs Database user. Defaults to the Jobs Database user if empty. From d45f591054694ad90bef33a29715ccf0ff71738c Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Tue, 4 Jan 2022 16:02:07 +0800 Subject: [PATCH 6/7] Respond to PR feedback. --- .../main/java/io/airbyte/config/Configs.java | 28 ++++++++++++----- .../configuring-airbyte.md | 30 +++++++++---------- 2 files changed, 34 insertions(+), 24 deletions(-) diff --git a/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java b/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java index baea466a4ca61..2a7e10cf26a78 100644 --- a/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java +++ b/airbyte-config/models/src/main/java/io/airbyte/config/Configs.java @@ -14,8 +14,15 @@ /** * This interface defines the general variables for configuring Airbyte. - * - * Please also update the configuring-airbyte.md document when modifying this file. + *

+ * Please update the configuring-airbyte.md document when modifying this file. + *

+ * Please also add one of the following tags to the env var accordingly: + *

+ * 1. 'Internal-use only' if a var is mainly for Airbyte-only configuration, e.g. tracking, testing or + * Cloud-related settings. + *

+ * 2. 'Alpha support' if a var does not have proper support and should be used with care. */ public interface Configs { @@ -67,6 +74,10 @@ public interface Configs { */ String getWorkspaceDockerMount(); + /** + * Defines the name of the docker mount that is used for local file handling. On Docker, this allows + * connector pods to interact with a volume for "local file" operations. + */ String getLocalDockerMount(); /** @@ -78,19 +89,20 @@ public interface Configs { // Secrets /** - * Defines the GCP Project to store secrets in. + * Defines the GCP Project to store secrets in. Alpha support. */ String getSecretStoreGcpProjectId(); /** * Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. - * These credentials must have Secret Manager Read/Write access. + * These credentials must have Secret Manager Read/Write access. Alpha support. */ String getSecretStoreGcpCredentials(); /** * Defines the Secret Persistence type. None by default. Set to GOOGLE_SECRET_MANAGER to use Google - * Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. + * Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. Alpha support. + * Undefined behavior will result if this is turned on and then off. */ SecretPersistenceType getSecretPersistenceType(); @@ -114,7 +126,7 @@ public interface Configs { /** * Define the minimum flyway migration version the Jobs Database must be at. If this is not - * satisfied, applications will not successfully connect. + * satisfied, applications will not successfully connect. Internal-use only. */ String getJobsDatabaseMinimumFlywayMigrationVersion(); @@ -142,7 +154,7 @@ public interface Configs { /** * Define the minimum flyway migration version the Configs Database must be at. If this is not - * satisfied, applications will not successfully connect. + * satisfied, applications will not successfully connect. Internal-use only. 
*/ String getConfigsDatabaseMinimumFlywayMigrationVersion(); @@ -309,7 +321,7 @@ public interface Configs { // Container Orchestrator /** - * Define if Airbyte should use Scheduler V2. + * Define if Airbyte should use Scheduler V2. Internal-use only. */ boolean getContainerOrchestratorEnabled(); diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index 89c31a89a8860..803aabd9cb452 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -22,13 +22,15 @@ We recommend using the overlays in the `stable` directory as these have preset r To configure the default Airbyte Kubernetes deployment, modify the `.env` in the respective directory. Each application will consume the appropriate env var from a generated configmap. -If you want to manage your own Kube manifest, please refer to the various `Kustomize` overlays for examples. +If you want to manage your own Kube manifests, please refer to the various `Kustomize` overlays for examples. ## Reference The following are the possible configuration options organised by deployment type and services. -Internal-onlu variables have been omitted for clarity. See `Configs.java` for a full list. +Internal-only variables have been omitted for clarity. See `Configs.java` for a full list. + +Be careful using variables marked `Alpha support`, as they do not yet have proper support and are not meant for general use. ### Shared @@ -42,22 +44,20 @@ The following variables are relevant to both Docker and Kubernetes. 5. `WORKSPACE_ROOT` - Defines the Airbyte workspace directory. Applies only to Docker, and is present in Kubernetes for backward compatibility. #### Secrets -1. `SECRET_STORE_GCP_PROJECT_ID` - Defines the GCP Project to store secrets in. -2. `SECRET_STORE_GCP_CREDENTIALS` - Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. These credentials must have Secret Manager Read/Write access. -3.
`SECRET_PERSISTENCE_TYPE` - Defines the Secret Persistence type. Defaults to NONE. Set to GOOGLE_SECRET_MANAGER to use Google Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. +1. `SECRET_STORE_GCP_PROJECT_ID` - Defines the GCP Project to store secrets in. Alpha support. +2. `SECRET_STORE_GCP_CREDENTIALS` - Define the JSON credentials used to read/write Airbyte Configuration to Google Secret Manager. These credentials must have Secret Manager Read/Write access. Alpha support. +3. `SECRET_PERSISTENCE_TYPE` - Defines the Secret Persistence type. Defaults to NONE. Set to GOOGLE_SECRET_MANAGER to use Google Secret Manager. Set to TESTING_CONFIG_DB_TABLE to use the database as a test. Alpha support. Undefined behavior will result if this is turned on and then off. #### Database 1. `DATABASE_USER` - Define the Jobs Database user. 2. `DATABASE_PASSWORD` - Define the Jobs Database password. 3. `DATABASE_URL` - Define the Jobs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}. Do not include username or password. -4. `JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` - Define the minimum flyway migration version the Jobs Database must be at. If this is not satisfied, applications will not successfully connect. -5. `JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Jobs Database to be initialized. This includes migrations. -6. `CONFIG_DATABASE_USER` - Define the Configs Database user. Defaults to the Jobs Database user if empty. -7. `CONFIG_DATABASE_PASSWORD` - Define the Configs Database password. Defaults to the Jobs Database password if empty. -8. `CONFIG_DATABASE_URL` - Define the Configs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}. Defaults to the Jobs Database url if empty. -9. `CONFIG_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` - Define the minimum flyway migration version the Configs Database must be at. 
If this is not satisfied, applications will not successfully connect. -10. `CONFIG_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Configs Database to be initialized. This includes migrations. -11. `RUN_DATABASE_MIGRATION_ON_STARTUP` - Define if the Bootloader should run migrations on start up. +4. `JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Jobs Database to be initialized. This includes migrations. +5. `CONFIG_DATABASE_USER` - Define the Configs Database user. Defaults to the Jobs Database user if empty. +6. `CONFIG_DATABASE_PASSWORD` - Define the Configs Database password. Defaults to the Jobs Database password if empty. +7. `CONFIG_DATABASE_URL` - Define the Configs Database url in the form of jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}. Defaults to the Jobs Database url if empty. +8. `CONFIG_DATABASE_INITIALIZATION_TIMEOUT_MS` - Define the total time to wait for the Configs Database to be initialized. This includes migrations. +9. `RUN_DATABASE_MIGRATION_ON_STARTUP` - Define if the Bootloader should run migrations on start up. #### Airbyte Services 1. `TEMPORAL_HOST` - Define the url where Temporal is hosted at. Please include the port. Airbyte services use this information. @@ -87,12 +87,10 @@ The following variables are relevant to both Docker and Kubernetes. 3. `MAXIMUM_WORKSPACE_RETENTION_DAYS` - Defines the oldest un-swept configuration file age. Files older than this will definitely be swept. Defaults to 60 days. 4. `MAXIMUM_WORKSPACE_SIZE_MB` - Defines the workspace size sweeping will continue until. Defaults to 5GB. -#### Container Orchestrator -1. `CONTAINER_ORCHESTRATOR_ENABLED` - Define if Airbyte should use Scheduler V2. - ### Docker-Only 1. `WORKSPACE_DOCKER_MOUNT` - Defines the name of the Airbyte docker volume. 2. `DOCKER_NETWORK` - Defines the docker network the new Scheduler launches jobs on. +3.
`LOCAL_DOCKER_MOUNT` - Defines the name of the docker mount that is used for local file handling. On Docker, this allows connector pods to interact with a volume for "local file" operations. ### Kubernetes-Only #### Jobs From 74eff51aaccaf119a51de92ba080ff01c19c00b4 Mon Sep 17 00:00:00 2001 From: Davin Chia Date: Tue, 4 Jan 2022 17:24:38 +0800 Subject: [PATCH 7/7] Add logging variables. --- airbyte-commons/src/main/resources/log4j2.xml | 2 ++ docs/understanding-airbyte/configuring-airbyte.md | 13 +++++++++++++ 2 files changed, 15 insertions(+) diff --git a/airbyte-commons/src/main/resources/log4j2.xml b/airbyte-commons/src/main/resources/log4j2.xml index 8ea2c1a4de455..ed578354eb798 100644 --- a/airbyte-commons/src/main/resources/log4j2.xml +++ b/airbyte-commons/src/main/resources/log4j2.xml @@ -14,6 +14,8 @@ This is useful if you want to override the environment variables at runtime (or if you don't have access to the necessary information at the point where you are setting environment variables). + + Please update configuring-airbyte.md if the names of any of the below variables change. --> diff --git a/docs/understanding-airbyte/configuring-airbyte.md b/docs/understanding-airbyte/configuring-airbyte.md index 241670a27d68a..45fbc6090f158 100644 --- a/docs/understanding-airbyte/configuring-airbyte.md +++ b/docs/understanding-airbyte/configuring-airbyte.md @@ -105,3 +105,16 @@ The following variables are relevant to both Docker and Kubernetes. #### Worker 1. `TEMPORAL_WORKER_PORTS` - Define the local ports the Airbyte Worker pod uses to connect to the various Job pods. Port 9001 - 9040 are exposed by default in the Kustomize deployments. + +#### Logging +Note that Airbyte does not support logging to separate Cloud Storage providers. + +Please see [here](https://docs.airbyte.com/deploying-airbyte/on-kubernetes#configure-logs) for more information on configuring Kubernetes logging. + +1. `GCS_LOG_BUCKET` - Define the GCS bucket to store logs. +2.
`S3_BUCKET` - Define the S3 bucket to store logs. +3. `S3_RREGION` - Define the S3 region the S3 log bucket is in. +4. `S3_AWS_KEY` - Define the key used to access the S3 log bucket. +5. `S3_AWS_SECRET` - Define the secret used to access the S3 log bucket. +6. `S3_MINIO_ENDPOINT` - Define the url Minio is hosted at so Airbyte can use Minio to store logs. +7. `S3_PATH_STYLE_ACCESS` - Set to `true` if using Minio to store logs. Empty otherwise.
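The Logging variables above imply two rules. First, logs go to a single Cloud Storage provider, so `GCS_LOG_BUCKET` and `S3_BUCKET` should not both be set. Second, `S3_PATH_STYLE_ACCESS` must be `true` when Minio is used. The minimal Java sketch below checks an environment against those two rules before deploying. It is a hypothetical helper, not Airbyte's implementation. The class `LogStorageCheck`, the `LogStorage` enum, the `NONE` outcome, and the assumption that Minio reuses `S3_BUCKET` alongside `S3_MINIO_ENDPOINT` are all illustrative choices, not taken from the Airbyte codebase.

```java
import java.util.Map;

// Hypothetical pre-flight check, NOT Airbyte's implementation. It encodes two rules
// read from the Logging reference: logs go to only one Cloud Storage provider, and
// S3_PATH_STYLE_ACCESS must be "true" when Minio is used.
public final class LogStorageCheck {

  // Assumption: NONE means no cloud log bucket variable is set.
  enum LogStorage { NONE, GCS, S3, MINIO }

  // Treats missing and blank values the same way ("Empty otherwise" in the reference).
  private static boolean isSet(Map<String, String> env, String key) {
    String value = env.get(key);
    return value != null && !value.isBlank();
  }

  static LogStorage resolve(Map<String, String> env) {
    boolean gcs = isSet(env, "GCS_LOG_BUCKET");
    boolean s3 = isSet(env, "S3_BUCKET");
    if (gcs && s3) {
      throw new IllegalStateException(
          "Logs can go to only one Cloud Storage provider; set GCS_LOG_BUCKET or S3_BUCKET, not both.");
    }
    if (gcs) {
      return LogStorage.GCS;
    }
    if (!s3) {
      return LogStorage.NONE;
    }
    // Assumption: Minio is addressed through the S3 variables plus S3_MINIO_ENDPOINT.
    if (isSet(env, "S3_MINIO_ENDPOINT")) {
      if (!"true".equals(env.get("S3_PATH_STYLE_ACCESS"))) {
        throw new IllegalStateException(
            "S3_PATH_STYLE_ACCESS must be true when S3_MINIO_ENDPOINT is set.");
      }
      return LogStorage.MINIO;
    }
    return LogStorage.S3;
  }

  public static void main(String[] args) {
    // Prints the backend selected by the current process environment.
    System.out.println(resolve(System.getenv()));
  }
}
```

On Java 11 or later, `java LogStorageCheck.java` runs the file directly. It prints which backend the current environment selects, or fails fast on a conflicting combination. Failing early is the point of the design: a misconfiguration surfaces before any job runs instead of as missing logs later.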