
[helm] worker pod is crashing after upgrading to 0.49.6+ (latest update: missing env variables. see reply for detail) #31988

Closed
sc-yan opened this issue Oct 31, 2023 · 22 comments
Labels
area/platform · community · helm · team/deployments · type/bug

Comments

@sc-yan

sc-yan commented Oct 31, 2023

What method are you using to run Airbyte?

Kubernetes

Platform Version or Helm Chart Version

helm 0.49.9

What step the error happened?

Upgrading the Platform or Helm Chart

Relevant information

When upgrading the Helm chart from 0.49.6 to 0.49.8/0.49.9, the worker pod keeps crashing, but if I revert back to 0.49.6 it's fine.

Relevant log output

2023-10-31 00:57:57 ERROR i.m.r.Micronaut(handleStartupException):338 - Error starting Micronaut server: Error instantiating bean of type  [io.airbyte.workers.orchestrator.KubeOrchestratorHandleFactory]

Path Taken: new ApplicationInitializer() --> ApplicationInitializer.syncActivities --> List.syncActivities([ReplicationActivity replicationActivity],NormalizationActivity normalizationActivity,DbtTransformationActivity dbtTransformationActivity,NormalizationSummaryC
io.micronaut.context.exceptions.BeanInstantiationException: Error instantiating bean of type  [io.airbyte.workers.orchestrator.KubeOrchestratorHandleFactory]

Path Taken: new ApplicationInitializer() --> ApplicationInitializer.syncActivities --> List.syncActivities([ReplicationActivity replicationActivity],NormalizationActivity normalizationActivity,DbtTransformationActivity dbtTransformationActivity,NormalizationSummaryC
    at io.micronaut.context.DefaultBeanContext.resolveByBeanFactory(DefaultBeanContext.java:2367) ~[micronaut-inject-3.10.1.jar:3.10.1]
    at io.micronaut.context.DefaultBeanContext.doCreateBean(DefaultBeanContext.java:2305) ~[micronaut-inject-3.10.1.jar:3.10.1]
    at io.micronaut.context.DefaultBeanContext.doCreateBean(DefaultBeanContext.java:2251) ~[micronaut-inject-3.10.1.jar:3.10.1]
    at io.micronaut.context.DefaultBeanContext.createRegistration(DefaultBeanContext.java:3016) ~[micronaut-inject-3.10.1.jar:3.10.1]
    at io.micronaut.context.SingletonScope.getOrCreate(SingletonScope.java:80) ~[micronaut-inject-3.10.1.jar:3.10.1]
    at io.airbyte.workers.Application.main(Application.java:15) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
Caused by: java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131) ~[guava-31.1-jre.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.validateBase(DefaultS3ClientFactory.java:36) ~[io.airbyte.airbyte-config-config-models-0.50.33.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.validate(DefaultS3ClientFactory.java:31) ~[io.airbyte.airbyte-config-config-models-0.50.33.jar:?]
    at io.airbyte.config.storage.DefaultS3ClientFactory.<init>(DefaultS3ClientFactory.java:24) ~[io.airbyte.airbyte-config-config-models-0.50.33.jar:?]
    at io.airbyte.workers.storage.S3DocumentStoreClient.s3(S3DocumentStoreClient.java:59) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
    at io.airbyte.workers.storage.StateClients.create(StateClients.java:27) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
    at io.airbyte.workers.config.ContainerOrchestratorConfigBeanFactory.kubernetesContainerOrchestratorConfig(ContainerOrchestratorConfigBeanFactory.java:91) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
    at io.airbyte.workers.config.$ContainerOrchestratorConfigBeanFactory$KubernetesContainerOrchestratorConfig0$Definition.build(Unknown Source) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
    at io.micronaut.context.DefaultBeanContext.resolveByBeanFactory(DefaultBeanContext.java:2354) ~[micronaut-inject-3.10.1.jar:3.10.1]
    ... 81 more
@msaffitz

We are seeing this error as well. Downgrading to 0.49.6 fixed the issue for us.

@adamstrawson

We're also experiencing this within GKE on the latest version 0.49.10

@lydialimsetel

Experiencing this same issue on the latest, v0.49.18. Downgraded to Helm chart v0.49.5 and it works fine now.

@cappadona

We're also running into this if we upgrade the chart past 0.49.6

@joeybenamy

Same issue with all charts after 0.49.6

@sc-yan sc-yan changed the title [helm] when upgrading to 0.49.9, worker pod is crashing [helm] worker pod is crashing after 0.49.6 (latest update: 0.49.18 still not working) Nov 8, 2023
@szemek
Contributor

szemek commented Nov 10, 2023

After providing some environment variables in values.yaml I got it working. Here's what the relevant part of my configuration looks like:

  ##  worker.extraEnv [array] Additional env vars for worker pod(s).
  ## Example:
  ##
  ## extraEnv:
  ## - name: JOB_KUBE_TOLERATIONS
  ##   value: "key=airbyte-server,operator=Equals,value=true,effect=NoSchedule"
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          key: AWS_ACCESS_KEY_ID
          name: airbyte-airbyte-secrets
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: AWS_SECRET_ACCESS_KEY
          name: airbyte-airbyte-secrets
    - name: STATE_STORAGE_S3_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: AWS_ACCESS_KEY_ID
          name: airbyte-airbyte-secrets
    - name: STATE_STORAGE_S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: AWS_SECRET_ACCESS_KEY
          name: airbyte-airbyte-secrets
    - name: STATE_STORAGE_S3_BUCKET_NAME
      value: ${STATE_STORAGE_S3_BUCKET_NAME}
    - name: STATE_STORAGE_S3_REGION
      value: ${STATE_STORAGE_S3_REGION}

Check here to see which environment variables you might be missing:
https://github.com/airbytehq/airbyte-platform/blob/9ffa4e9f44f06e65fe3b138204367d5da8c98f2c/airbyte-config/config-models/src/main/java/io/airbyte/config/EnvConfigs.java#L133-L142
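
Note that Helm itself won't expand ${...} placeholders like the last two values in the config above; they have to be substituted by whatever templating you use before install, or replaced with literal strings. A minimal hand-written equivalent could look like this (bucket name and region are made-up examples):

  extraEnv:
    - name: STATE_STORAGE_S3_BUCKET_NAME
      value: "my-airbyte-state-bucket"   # example value, use your own bucket
    - name: STATE_STORAGE_S3_REGION
      value: "eu-west-1"                 # example value, use your own region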

@sc-yan sc-yan changed the title [helm] worker pod is crashing after 0.49.6 (latest update: 0.49.18 still not working) [helm] worker pod is crashing after 0.49.6+ (latest update: missing env variables. see reply for detail) Nov 15, 2023
@sc-yan sc-yan changed the title [helm] worker pod is crashing after 0.49.6+ (latest update: missing env variables. see reply for detail) [helm charts] worker pod is crashing after upgrading to 0.49.6+ (latest update: missing env variables. see reply for detail) Nov 15, 2023
@sc-yan
Author

sc-yan commented Nov 15, 2023

@szemek thank you so much for the info! I followed your approach and it worked!
Running 0.49.23 now. Anyone who still has issues, please try the approach above. I'm going to keep the issue open in case someone is looking for an answer, but please feel free to close it if you think no further action is needed.

@prafulauto1

prafulauto1 commented Nov 15, 2023

For maintaining state with S3, I was able to resolve it by simply adding these two environment variables in the worker section of the values file:

extraEnv:
  - name: STATE_STORAGE_S3_BUCKET_NAME
    value: ${airbyte_log_bucket}
  - name: STATE_STORAGE_S3_REGION
    value: ${airbyte_log_bucket_region}

I could find this here

@HatemLar

Any idea on how to fix it on an EC2 deployment?

@sc-yan
Author

sc-yan commented Nov 16, 2023

@HatemLar the Helm charts are meant to be used in k8s. I assume you are deploying Airbyte with Docker etc. on EC2? Try setting the same env variables as above, following this guide:
https://docs.airbyte.com/deploying-airbyte/on-aws-ec2

@HatemLar

@sc-yan thank you for your help!
Yes, deployed with docker on EC2, and we did follow that guide.
Do you think we should declare these variables on the instance or in the docker-compose file?

@sc-yan
Author

sc-yan commented Nov 16, 2023

@HatemLar it really depends on how you want to manage your infra/deployment. Generally, a Docker container acts like a VM, so the app is not supposed to read values from the host machine (which is EC2 in your case) unless you mount a volume into the container. It's common to set these env variables in the docker-compose file, but if you do have special cases, feel free to adjust accordingly.
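
For illustration, a rough sketch of what that could look like in the compose file. The service name worker and the example values are assumptions on my part, not taken from the Airbyte guide, so adjust them to match your actual docker-compose.yaml / .env layout:

    # sketch only: service name and values are assumptions
    services:
      worker:
        environment:
          - STATE_STORAGE_S3_BUCKET_NAME=my-airbyte-state-bucket   # hypothetical bucket
          - STATE_STORAGE_S3_REGION=eu-west-1                      # hypothetical region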

@PurseChicken

PurseChicken commented Nov 20, 2023

Using the below works (for GCS) since the values should likely already be in your configMap if you specified them in global.gcs:

    extraEnv:
      - name: STATE_STORAGE_GCS_BUCKET_NAME
        valueFrom:
          configMapKeyRef:
            key: GCS_LOG_BUCKET
            name: airbyte-airbyte-env
      - name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
        valueFrom:
          configMapKeyRef:
            key: GOOGLE_APPLICATION_CREDENTIALS
            name: airbyte-airbyte-env

That fixes the worker pod issue; however, I then ran into the following with the replication orchestrator. This is not seen until you try to sync a connection.

#32203

@lucasfcnunes

global.gcs.extraEnv doesn't affect the templates.

@PurseChicken

What I wrote is specific to the worker key in values:

worker.extraEnv

@ilyasemenov84

ilyasemenov84 commented Nov 29, 2023

Is there a way to make it work with IRSA authentication (service account + IAM role)?

@marcosmarxm marcosmarxm changed the title [helm charts] worker pod is crashing after upgrading to 0.49.6+ (latest update: missing env variables. see reply for detail) [helm] worker pod is crashing after upgrading to 0.49.6+ (latest update: missing env variables. see reply for detail) Dec 6, 2023
@marcosmarxm
Member

Hello all 👋 sorry for the missing update here. I shared this with the engineering team and will post any updates back here.

@booleanbetrayal

Just to note that this appears to be the same solution to remediate #18016.

@raphaelauv

raphaelauv commented Feb 5, 2024

This works with airbyte/worker:0.50.47 and helm chart 0.53.52:

minio:
  enabled: false

worker:
  extraEnv:
    - name: STATE_STORAGE_S3_BUCKET_NAME
      value: "XXYYZZ"
    - name: STATE_STORAGE_S3_REGION
      value: "eu-west-3"
    - name: S3_MINIO_ENDPOINT
      value: ""
      
global:

  log4jConfig: "log4j2-no-minio.xml"
  state:
    storage:
      type: "S3"
  logs:
    storage:
      type: "S3"
    minio:
      enabled: false
    s3:
      enabled: true
      bucket: "XXYYZZ"
      bucketRegion: "eu-west-3"
    accessKey:
      existingSecret: "airbyte-aws-creds"
      existingSecretKey: "AWS_ACCESS_KEY_ID"
    secretKey:
      existingSecret: "airbyte-aws-creds"
      existingSecretKey: "AWS_SECRET_ACCESS_KEY"

@StefanTUI

StefanTUI commented Feb 9, 2024

I can confirm, the settings from @raphaelauv helped me to start the worker pods again.

I'm using helm chart 0.53.120 with airbyte/server:0.50.48

The server pod runs, but had error messages like those in the worker logs.

Adding the following to my yml helped to mitigate this:

server:
  extraEnv:
    - name: LOG4J_CONFIGURATION_FILE
      valueFrom:
        configMapKeyRef:
          name: airbyte-env
          key: LOG4J_CONFIGURATION_FILE

@sg-danl

sg-danl commented Feb 28, 2024

(Duplicate comment as previous issue is closed)
I've been pinning version 0.49.6 to get around this for the past month and a half.
(Running Airbyte OSS on AWS EKS cluster, default values.yaml for ease of replication while trying to fix.)

The fix suggested by @marcosmarxm doesn't work for me. I've been attempting to upgrade from 0.49.6 -> latest since mid-January (so 0.50.22+), and it has never fixed the issue.

Running the minio config in bash returns:


helm % kubectl exec -it airbyte-minio-0 bash -n default
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-5.1# mc alias set myminio http://localhost:9000 minio minio123
mc: Configuration written to `/tmp/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/tmp/.mc/share`.
mc: Initialized share uploads `/tmp/.mc/share/uploads.json` file.
mc: Initialized share downloads `/tmp/.mc/share/downloads.json` file.
Added `myminio` successfully.
bash-5.1# mc mb myminio/state-storage
mc: <ERROR> Unable to make bucket `myminio/state-storage`. Your previous request to create the named bucket succeeded and you already own it.

Not an expert in any of this at all, but it looks like the creation of the bucket isn't entirely the issue. Just wanted to provide additional info as this has been a long-open issue!

Edited to add:
Force-removing the bucket (on 0.54.15) seems to show that the bucket gets forcefully recreated almost instantaneously.


bash-5.1# mc rb myminio/state-storage
mc: <ERROR> `myminio/state-storage` is not empty. Retry this command with ‘--force’ flag if you want to remove `myminio/state-storage` and all its contents 
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.
bash-5.1# mc mb myminio/state-storage
mc: <ERROR> Unable to make bucket `myminio/state-storage`. Your previous request to create the named bucket succeeded and you already own it.
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.
bash-5.1# mc rb myminio/state-storage --force
Removed `myminio/state-storage` successfully.

Edit again:
This only occurs with the PostgreSQL source connection. Our S3->S3 jobs can run as expected in versions beyond 0.49.6.

@marcosmarxm
Member

@davinchia the worker pod was deprecated, right? Should we close this?
