-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source S3: Add IAM Instance Profile Support #10676
🎉 Source S3: Add IAM Instance Profile Support #10676
Conversation
c8c4f73
to
1d6a0a7
Compare
Hey @sidartha I think the code changes here are straightforward and solid 👍 and thank you for adding a test One thought I have is to change the Also, you'll need to update a couple more places around the repo such as the Dockerfile image tags, a STANDARD_SOURCE_DEFINTION.yaml file, and the spec.json file in the connector directory. See this PR for an example of the other places to update: https://github.com/airbytehq/airbyte/pull/9623/files#diff-0a23a52edd477f81527cdead182adda4e4d84ec23e10139bdbfecfd33ef6f071L5 (Note that this PR updates a destination and a source together, you only need to worry about the source files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sidartha for this contribution!
I totally agree with @ntkawasaki about oneOf field to pick authentication method from.
I made a refactoring suggestion, it'd be great if you could find a way of not declaring multiple if/else statements on the authentication method.
I think we shall also add more acceptance test using a config that have use_aws_default_credential_provider_chain = true
.
Finally, could you please bump the connector version (in the Dockerfile and in airbyte-config/init/src/main/resources/seed/source_definitions.yaml
), and update it's documentation (docs/integrations/sources/s3.md
) . Please explain what is the default chain of credentials that Boto will try to use, and update the changelog.
Let me know what you think of my suggestions!
elif self.use_aws_default_credential_provider_chain: | ||
self._boto_session = boto3session.Session() | ||
self._boto_s3_resource = make_s3_resource(self._provider, config=Config(), session=self._boto_session) | ||
else: | ||
self._boto_session = boto3session.Session() | ||
self._boto_s3_resource = make_s3_resource(self._provider, config=Config(signature_version=UNSIGNED), session=self._boto_session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the difference between the else
and the elif
block.
@@ -43,6 +46,10 @@ def use_aws_account(provider: Mapping[str, str]) -> bool: | |||
aws_secret_access_key = provider.get("aws_secret_access_key") | |||
return True if (aws_access_key_id is not None and aws_secret_access_key is not None) else False | |||
|
|||
@staticmethod | |||
def use_aws_default_credential_provider_chain(provider: Mapping[str, str]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of refactoring this a bit:
- create a
AuthenticationMethod
enum - create a method that returns the authentication method to use (
get_authentication_method
) - use
get_authentication_method
in_setup_boto_session
to create session according to the authentication method to use
eg:
class AuthenticationMethod(Enum):
ACCESS_KEY_SECRET_ACCESS_KEY = 1
DEFAULT = 2
class S3File:
def __init__(self, *args: Any, **kwargs: Any):
super().__init__(*args, **kwargs)
self.authentication_method = self.get_authentication_method()
self._setup_boto_session()
def get_authentication_method(self):
if self._provider.get("aws_access_key_id") and self._provider.get("aws_secret_access_key"):
return AuthenticationMethod.ACCESS_KEY_SECRET_ACCESS_KEY
elif self._provider.get("use_aws_default_credential_provider_chain"):
return AuthenticationMethod.DEFAULT
else:
raise Exception("Could not determine an authentication method according to current provider.")
def _setup_boto_session(self):
if self.authentication_method == AuthenticationMethod.ACCESS_KEY_SECRET_ACCESS_KEY:
...
if self.authentication_method == AuthenticationMethod.DEFAULT:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be great if we could define at a single spot the specifics related to an authentication method (sessions and params definition) to make the rest of the implementation independent from the authentication method used. My point is to avoid several if / else statements on use_aws_account
or use_aws_default_credential_provider_chain
. It will improve maintainability and eventual iterations on authentication methods.
Thanks for the feedback @ntkawasaki and @alafanechere! I have updated the PR based on your suggestions. Since both For the acceptance test, do you have any suggestions on how to add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @sidartha for the change! I've still one remaining blocking change in the spec.json
.
I'll try to run the acceptance test with our current config using access key/secret key. If they pass I'll try to add a new one with use_aws_default_credential_provider_chain
.
@@ -177,6 +177,11 @@ | |||
"default": "", | |||
"type": "string" | |||
}, | |||
"use_aws_default_credential_provider_chain": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please create an authentication_method
oneOf field which for the user to chose between Default credential provider and Access Key/Secret Key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere I am a little stuck here on how to implement this without causing a breaking change.
We have Default Credential Provider, Access Key/Secret Key and no authentication. The current behaviour defaults to Access Key/Secret Key if those values are included and falls back to no authentication if not included.
When introducing oneOf field authentication_method
, I'm not sure what the default value can be set to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of removing use_aws_default_credential_provider_chain
field and changing the description of Access Key/Secret Key to explain that if not set it will fall back to the default boto credential chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to support those who don't use any credentials (e.g. those who use it with non-AWS storage or those who don't need any auth since they are accessing public buckets), I don't think this will work. In this case, the default boto credential chain won't be able to locate any credentials.
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
|
/test connector=connectors/source-s3
|
Hey @sidartha I run the acceptance tests to make sure your PR does not introduce any breaking change. |
Hi @alafanechere, I'd appreciate the help on editing the |
Question: What issues do you see could be encountered if an additional method like
And those credentials could establish temporary STS credentials to pass to the S3 client. My particular use case is cross-account role access. My org has a third party pulling data from our S3 bucket and if they could just assume a role in our account, then we wouldn't have to worry about rotating secret keys. This is further useful if you have a single Airbyte instance that has sources/destinations in multiple foreign accounts. The default EC2 instance profile can only be tied to a single role ARN in the Maybe additional profiles could be added in [default]
credential_source = Ec2InstanceMetadata
[profile accountb]
role_arn = arn:aws:iam::111111111111:role/ROLENAME
external_id = bbbbbbbb
source_profile = default
[profile accountc]
role_arn = arn:aws:iam::222222222222:role/ROLENAME
external_id = cccccccc
source_profile = default ...and then the UI could take a profile as input. Would your implementation need to be refactored to add my use case at a future time? Thanks! |
@matthewhembree thank you for this feedback! This is definitely a valid use case and goes in the direction that we shall implement a dropdown field to select authentication methods. I've been really swamped in the past week to suggest the change in |
@sidartha, to implement the |
I'm closing this PR as I don't have the bandwidth to iterate on the specs. @sidartha feel free to re-open if you want to tackle this. |
Just FYI: I'm working on finishing this up. I'll open a new PR soon. |
What
This PR adds IAM instance profiles support for the S3 source connector to allow use of the instance profile instead of relying on Access Keys and Secrets (as described in #5282 and #5942). A similar feature was implemented for destination-s3 in #9399.
How
A new config option
use_aws_default_credential_provider_chain
is added and when set to true, we rely on Boto3 to identify the credentials (this is done by removingsignature_version=UNSIGNED
in the client config object).The other option considered was to fall-back to the default credential provider chain if no access key & secret are provided but this would cause issues for those who are attempting to read a publicly available S3 file without using any credentials and also for those who are using this connector with non-AWS S3 storage.
I'm open to other ideas for implementing this too - this was just the simplest thing I could think of!
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.