Allow authentication to AWS using roles #5942
Comments
Thanks for the suggestion @i-yordanova. Is it something critical, or does it block your use of Airbyte?
Thanks for submitting this @i-yordanova! @marcosmarxm we are looking at deploying Airbyte at Firebolt and this is a potential blocker for us deploying Airbyte while following our security best practices.
@marcosmarxm Yes, in a way it is, for security concerns similar to those mentioned by @miguel-firebolt.
What we want: Affects S3 (source and destination) & Redshift in both Docker and K8s. Need to allow falling back on Airbyte instance-level roles.
Current workaround: Use a service account. @i-yordanova would using a service account work for your use case?
This seems similar to the work provided for the BigQuery case here: #3947
@cgardens I was thinking it's probably as simple as making the
Yeah. So I have two open questions here that I have not had time to look into:
@cgardens the 2nd point seems more like a strategic/design decision than anything. We are getting to a point where this is a blocker, and I was hoping to get some approximation of how much work and time this would require, and it seems more complicated than expected. Just noticed you mentioned
FYI I did some testing on my SQS Source to see if EC2 assumed roles would work - this is with Docker CE & I did a quick bodge, simply to ignore the keys set via UI:
And it correctly picks up the AWS role from the EC2 instance - provably so, I can add & revoke permissions on the EC2 role and cause the sync to succeed/fail. So, at least in the case of Docker on EC2, this is already possible - it's just that the way the s3-source builds the S3 session with boto isn't picking up instance metadata. This of course does not mean the same is true in K8s.
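The bodge described above comes down to only passing explicit keys to boto when the user actually supplied them. A minimal sketch of that idea (the helper name and shape are hypothetical, not Airbyte's actual code): when no credential kwargs are passed, boto3 falls back to its default credential chain, which includes EC2 instance metadata.

```python
def s3_client_kwargs(access_key=None, secret_key=None):
    """Build credential kwargs for boto3.client("s3").

    Hypothetical helper: when the user left the keys blank, return no
    credential kwargs at all, so boto3 falls back to its default chain
    (env vars, shared config/credentials files, then instance metadata).
    """
    if access_key and secret_key:
        return {
            "aws_access_key_id": access_key,
            "aws_secret_access_key": secret_key,
        }
    return {}

# usage sketch: boto3.client("s3", **s3_client_kwargs(cfg_key, cfg_secret))
```

When both keys are blank this returns `{}`, so the client is built with no explicit credentials and the default chain takes over.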
Just leaving a note that we ran into this too this week and this could be a blocker for us adopting Airbyte. From @sdairs's post, it looks like Airbyte on EC2 would work if the access_key and secret_key were made optional. I have no idea how to solve this for K8s, but here's an idea: What do the maintainers think about having a new S3 destination connector that makes the access_key and secret_key optional? It'd target only EC2 hosted environments. I don't think it'd be a breaking change to go from required to optional, but I'm new to the repo and Airbyte, so I could be missing something :)
Realistically it should be supported in the same S3 source/dest, and I don't think making creds optional instead of required should be considered a breaking change, as existing configs wouldn't be affected in any way; it would however be a behavioural change for new configs.

For both the Java and Python SDKs, it should be enough to just build the service client without any creds when they are not specified by the user, and it will search for creds to use as per:

In theory, this could work for both EKS and EC2. (It would also enable creds to be picked up from profiles on any infra - VMs, Azure VM, GCE, GKE.) This could be improved to allow naming a profile, so that it's not just picking the first creds it finds (as there could be multiple). However, I think there were concerns about how that is presented to users, as it wouldn't be immediately obvious how it works under the hood - but, IMO, clear documentation should be enough to let the feature be enabled, leaving further polish as an improvement issue.
I just tried doing this locally on an EC2 instance. I changed

```java
if (endpoint.isEmpty()) {
  return AmazonS3ClientBuilder.standard()
      .withCredentials(new InstanceProfileCredentialsProvider(true))
      .withRegion(s3Config.getRegion())
      .build();
} else {
...
```

in S3DestinationConfig.java to

```java
if (endpoint == null || endpoint.isEmpty()) {
  return AmazonS3ClientBuilder.standard()
      .withCredentials(new InstanceProfileCredentialsProvider(true))
      .withRegion(bucketRegion)
      .build();
}
...
```

and S3Destination.java to:

```java
@Override
public AirbyteConnectionStatus check(final JsonNode config) {
  try {
    S3StreamCopier.attemptS3WriteAndDelete(S3Config.getS3Config(config), config.get("s3_bucket_path").asText());
    return new AirbyteConnectionStatus().withStatus(Status.SUCCEEDED);
  } catch (final Exception e) {
    LOGGER.error("A CHANGE: Exception attempting to access the S3 bucket: ", e);
    return new AirbyteConnectionStatus()
        .withStatus(AirbyteConnectionStatus.Status.FAILED)
        .withMessage("Could not connect to the S3 bucket with the provided configuration. \n" + e.getMessage());
  }
}
```

I had to comment out a few failing tests (could be unrelated, couldn't quite tell from the errors) but I eventually got a build. However, I don't think the built bits are being used at all. I'm following these instructions from https://docs.airbyte.io/contributing-to-airbyte/developing-locally:
When I attempt to add a new S3 destination, I get an error saying
It should be saying
You should be able to build just the connector with something like … You can also manually do things without the UI: https://docs.airbyte.io/connector-development/tutorials/building-a-java-destination#directly-running-the-destination-using-docker
Nice! @sdairs is right, you should be able to point to the dev version of the connector locally. I do not know how / if AWS creds propagate into docker. It is possible that you'll need to do something additional there as well.
Just to add my knowledge around role propagation on EKS clusters: a mapping between a K8s Service Account and an AWS role is possible by using OpenID Connect (OIDC) identity providers. AWS calls this IRSA (IAM Roles for Service Accounts). How to set this up is well documented here. Users could add the
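The IRSA wiring described above comes down to annotating a Kubernetes service account with a role ARN and running the pods under it. A sketch with placeholder names and ARN (not Airbyte's actual chart output):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airbyte-s3-writer   # placeholder name
  annotations:
    # placeholder ARN; the role's trust policy must allow the cluster's OIDC provider
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/airbyte-s3
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  serviceAccountName: airbyte-s3-writer
  containers:
    - name: app
      image: example/image:latest
```

With this in place, EKS's mutating webhook injects AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN into the pod, and any SDK whose credential chain includes the web-identity provider picks them up automatically.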
This issue can be closed, #9399 was merged.
Unfortunately #9399 doesn't implement support for IRSA, which is the official way to assume a role in EKS. What I've seen in other software projects is that they implement DefaultAWSCredentialsProviderChain (instead of the InstanceProfileCredentialsProvider used in #9399), which supports all the usual ways of providing access to the AWS SDK.
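The practical difference is that InstanceProfileCredentialsProvider consults only EC2 instance metadata, while DefaultAWSCredentialsProviderChain also tries the web-identity (IRSA) token among other sources. A toy simulation of why the narrower provider fails under IRSA (the providers here are stand-ins, not the AWS SDK):

```python
def first_creds(chain):
    """Return the result of the first provider that yields credentials,
    mimicking how AWS SDK credential chains are walked in order."""
    for provider in chain:
        creds = provider()
        if creds:
            return creds
    return None

# On an EKS pod using IRSA there is no instance-profile role to find,
# but a web-identity token is mounted into the pod.
instance_profile = lambda: None                # EC2 metadata: nothing usable
web_identity = lambda: "role-from-irsa-token"  # token file -> assumed role

instance_only_chain = [instance_profile]           # the narrower approach
default_chain = [web_identity, instance_profile]   # simplified default chain
```

Under IRSA, the instance-only chain comes up empty while the default chain resolves credentials from the web-identity token, which is the gap being described.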
@casperbiering I was able to use the S3 destination connector on EKS using the instance profile of the nodes.
@alvaroqueiroz true, but by using the instance profile you give all containers/pods/software running on that node access, which does not follow the Principle of Least Privilege. By using IRSA you only give access to a specific pod, making it the more secure solution.
When using s3 destination with format type
@tuliren Is my thinking correct that the S3 parquet output doesn't support using IAM roles? If so, should I create a separate issue on that?
I don't see it noted yet so I'll call out that this can also be applicable to RDS databases if IAM authentication is in use.
@bleonard adding this to the DB team backlog, though this may involve the platform team or connector ops, since it's likely to be cross-cutting across any AWS-related connector.
I need IAM EC2 profile support for the DynamoDB source connector. Do you think I should create a new issue?
I think the AWS Secrets Manager integration has a similar problem. I tried granting my EC2 instance permissions via a role / instance profile, but the airbyte server container failed to start since the access / secret key wasn't explicitly set. For context, I am running the standard EC2 configuration via docker-compose.
Hey all, we merged a contribution that enables IRSA support for source/destination pods: airbytehq/airbyte-platform@37ba07b This is available in the following release: https://github.com/airbytehq/airbyte-platform/releases/tag/v0.50.2 I believe there is still some work that would need to be done to support IRSA for secret management and log storage.
@pmossman tracking this contribution that was merged. Are there any docs on how to deploy this to EKS in 0.50.2 or higher? I'm not sure where the IAM role needs to come from in order for the S3 destination to get access. Appreciate any clarification on how to leverage this new contribution.
Hi @pmossman, tracking this contribution that was merged. Is there any timeline to create a Helm chart for this release?
Airbyte not working with AWS OIDC (tried eks_referal_identity and eks_pod_identity).
Got it finally working with IRSA. Helm chart config:

```yaml
global:
  log4jConfig: "log4j2-no-minio.xml"
  state:
    storage:
      type: "S3"
  logs:
    storage:
      type: "S3"
    minio:
      enabled: false
    s3:
      enabled: true
      bucket: S3_BUCKET_NAME
      bucketRegion: S3_BUCKET_REGION
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: IAM_ROLE_ARN
server:
  enabled: true
  extraEnv:
    - name: LOG4J_CONFIGURATION_FILE
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: LOG4J_CONFIGURATION_FILE
    - name: STATE_STORAGE_S3_BUCKET_NAME
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: S3_LOG_BUCKET
    - name: STATE_STORAGE_S3_REGION
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: S3_LOG_BUCKET_REGION
worker:
  enabled: true
  extraEnv:
    - name: LOG4J_CONFIGURATION_FILE
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: LOG4J_CONFIGURATION_FILE
    - name: STATE_STORAGE_S3_BUCKET_NAME
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: S3_LOG_BUCKET
    - name: STATE_STORAGE_S3_REGION
      valueFrom:
        configMapKeyRef:
          name: {{ .Release.Name }}-airbyte-env
          key: S3_LOG_BUCKET_REGION
minio:
  enabled: false
```
IRSA Role is functional with chart version 0.58.26. Below is the updated Helm Config:
However, it appears that the values in the Helm repository have not been updated to reflect these changes. |
@bgroff is this something already implemented?
Tested, working. Thank you for this. This makes my group's life easier!
wow, thanks @DeepakRai94! Works for us on chart version 0.63.18 as well, though … Related storage PRs here and here; env config map
Has anyone tried IRSA to authenticate with RDS to replace the helm chart postgres? |
Was there a regression on this?
Alternatively, is there some dependency on the database? This stopped working when I started using an external database. |
I've been running into S3 authentication issues using IRSA. Airbyte is able to use IRSA to write to the S3 logs bucket just fine, but when I try to add a source connector, the connection test fails with this error message on the server:
If I then switch back to using a static AWS access key ID/secret, it works.
We have a similar issue. For us it seems like the IRSA role for writing logs to S3 is being used by the S3 connector even when a different role is set in the connector config.
Tell us about the problem you're trying to solve
The current version of Airbyte I tested (v0.29.15-alpha) requires providing an access key ID and secret access key in all cases where authentication for AWS is required.
Describe the solution you’d like
It would be good to allow using roles instead, i.e. provide the option to use either, whichever works for the user.
Describe the alternative you’ve considered or used
N/A
Additional context
N/A