Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨Destination S3: Support IRSA for AWS Authentication #244

Closed
wants to merge 13 commits into from

Conversation

don-code
Copy link
Contributor

What

This change introduces support for using IAM Roles for Service Accounts (IRSA) within S3 destinations in Airbyte. It is based on the changes in #177 (formerly airbytehq/airbyte#19010), but does not attempt to add the same support for logging. The change should also work for S3 sources, but is not tested thoroughly with sources, only with destinations.

How

As it stands today, destinations require an IAM access key and secret key to be passed. Even though the fields are defined as optional in the UI, not setting them in a Kubernetes cluster using IRSA results in an unhelpful null error message being thrown. This is because, although Airbyte's ServiceAccount may be configured properly for IRSA, job pods are not associated with that ServiceAccount:

[don@work ~] kubectl get pod stination-s3-check-19cf6be1-2770-4173-8b1a-42d1ec1436aa-0-vafrd -o yaml | grep serviceAccount
  serviceAccount: default
  serviceAccountName: default

This change makes it so that job pods are associated with the Airbyte install's ServiceAccount:

[don@work ~] k get pod stination-s3-check-fe45588c-857c-4be4-9359-b36a76455873-0-bctar -o yaml | grep serviceAccount
  serviceAccount: airbyte-admin
  serviceAccountName: airbyte-admin

Recommended reading order

  1. .gitignore - I ran the unit tests locally, and this file was produced. Unrelated to this PR, but seemingly still helpful.
  2. charts/airbyte-worker/templates/deployment.yaml - where the ServiceAccount name is brought into the application's scope, so that it is not assuming airbyte-admin.
  3. airbyte-workers/src/main/resources/application.yml - where the environment variable from 2 is brought into application config.
  4. airbyte-config/config-models/src/main/java/io/airbyte/config/EnvConfigs.java - where the environment variable from 2 is brought into application code.
  5. airbyte-config/config-models/src/main/java/io/airbyte/config/Configs.java - an interface where the service account name will be held.
  6. airbyte-config/config-models/src/main/java/io/airbyte/config/EnvConfigs.java - where this interface is implemented.
  7. airbyte-container-orchestrator/src/main/java/io/airbyte/container_orchestrator/config/ContainerOrchestratorFactory.java - where the value from the implementing class is used.
  8. airbyte-commons-worker/src/main/java/io/airbyte/workers/process/KubeProcessFactory.java - where the implemented class itself is used.
  9. airbyte-workers/src/main/java/io/airbyte/workers/config/ProcessFactoryBeanFactory.java - where the implemented class is instantiated.
  10. airbyte-config/config-models/src/main/java/io/airbyte/config/storage/DefaultS3ClientFactory.java - where the actual check with S3 credentials is performed.
  11. airbyte-workers/src/test-integration/java/io/airbyte/workers/process/KubePodProcessIntegrationTest.java - the integration tests for KubePodProcess, which do not strictly need to change, but have changed for consistency's sake.

Can this PR be safely reverted / rolled back?

If unsure, leave it blank.

  • YES 💚
  • NO ❌

🚨 User Impact 🚨

I've tested that destinations still work as expected, both the IRSA case where the access key and secret key are left blank...
Screen Shot 2023-05-26 at 11 53 26 PM

...and the non-IRSA case, with an IAM user - which I additionally tested with a user with access to a bucket which the IRSA role did not:

Screen Shot 2023-05-27 at 12 55 47 AM

...so I believe that there should be no user impact, save for the added ability to use IRSA roles.

gilandose and others added 11 commits February 28, 2023 12:55
…its post-refactor home ContainerOrchestratorFactory.java

Co-authored by: Richard Gilmore <richard.gilmore@gmail.com>
This change introduces support for using [IAM Roles for Service
Accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-enable-IAM.html)
within S3 destinations in Airbyte. It is based on the changes
in airbytehq#177, but does not attempt to add the same support for logging.
The change should also work for S3 sources, but is not tested
thoroughly with sources, only with destinations.
@moreiramarti
Copy link

I'm waiting for this since last year
Hope it will be merged asap

thank you

@pmossman pmossman self-assigned this Jun 6, 2023
@pmossman
Copy link
Contributor

pmossman commented Jun 6, 2023

Hey @don-code ! We spoke face to face about this issue a couple weeks ago, sorry that I didn't see that you had opened a PR until now! I will make time to review this change this week, I'm excited to get it merged!

@pmossman
Copy link
Contributor

pmossman commented Jun 7, 2023

/create-oss-pr

Copy link
Contributor

@pmossman pmossman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@don-code thanks again for the contribution! I left one small change request for favoring direct Micronaut injection over the Configs interface approach. I've also requested review from members of an adjacent Platform team, who generally own this area of the codebase and should be looped in. I've also opened this PR against our internal repo where we can run a more thorough suite of acceptance tests. I'll check back in soon and I'm aiming to get this merged ideally by the end of this week

@pmossman
Copy link
Contributor

pmossman commented Jun 8, 2023

Hey @don-code, wanted to give you a status update. I opened a PR off of your branch in our internal repo where we can perform more extensive testing.

While doing this, I realized that we likely want to adjust our async orchestrator pod creation to read the same service account config that you've introduced here. Today, we hardcode the airbyte-admin service account, but if someone were to override the default with a new service account name, I think we would want the orchestrator pod to use the same service account as the source and destination job pods.

I can make that change on the internal repo either today or tomorrow, depending on bandwidth, and once merged internally, any adjustments I end up making will be mirrored back to this public repo.

@pmossman
Copy link
Contributor

pmossman commented Jun 8, 2023

@don-code one more update here. In our internal repo, I made the change to the async orchestrator pod creation that I mentioned in my previous comment. I also went ahead and removed the Configs/EnvConfigs changes in favor of using the Micronaut injection everywhere. All of our internal builds are passing with these changes, and I am performing one more test in our cloud dev environment to make sure this change doesn't break anything with the Airbyte Cloud product.

Should be close to merging! Will update again here soon, likely tomorrow.

@pmossman
Copy link
Contributor

pmossman commented Jun 9, 2023

All tests on our Cloud environment were successful, and the internal PR is ready to merge. We tend to avoid merging on Fridays so I will get this change out on Monday morning. It will then be mirrored back to this public repo, at which point we can close out this PR. Thanks again @don-code for the contribution!

@pmossman
Copy link
Contributor

This change is now merged into airbyte-platform main as of 37ba07b. This will go out in our next OSS release, which I'll aim to get through this week. Thanks again @don-code!

@pmossman pmossman closed this Jun 12, 2023
@don-code
Copy link
Contributor Author

@pmossman just want to say thank you for all of the extra work you guys did here! I just pulled in 0.50.4 and dropped our local fork - everything seems to be working great.

@pmossman
Copy link
Contributor

Glad to hear it @don-code!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants