Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source S3: support IAM EC2 profile #17334

Closed
wants to merge 15 commits into from

Conversation

mc51
Copy link

@mc51 mc51 commented Sep 28, 2022

What

Among the previously existing AWS Access Key and Secret and no Credentials support, this allows the use of the default credentials provider chain of boto3 (e.g. use the instance profile on ec2).
This is supposed to finish the work of sidartha #10676
Refers to this issue.
It is useful as it allows to access buckets via role only (i.e. in the case of instance profile on ec2). It makes the use of credentials obsolete in that case. In some business environments (like ours) this is crucial, as role only access is considered best practice.

How

Add Authentication Method Dropdown (oneOf field) to UI. Per default, it is set to use AWS Access Key ID and Secret Access Key. Those must be specified as before further down. However, if those values are empty, then the select Authentication Method can also use the AWS Default Credential Provider Chain or No Credentials (unsigned requests, e.g. for public Buckets or non AWS buckets).

This shouldn't break previous versions because aws_access_key_id and aws_secret_access_key are still "leading" in defining the Authentication Method. However, the downside is, that the usage of the Authentication Method field might not be straight forward: When aws_access_key_id and aws_secret_access_key are set, this always overrides the setting on Authentication Method.

Also, this implementation should be extensible. For example, one could add Role Switching to it, which might also be very helpful in many settings.

image
image

Recommended reading order

🚨 User Impact 🚨

This should not be breaking because it replicates the previous behavior. Default Credential Provider Chain is only used when selected by the user and aws_access_key_id and aws_secret_access_key are empty.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit

Results (64.85s):
170 passed

Integration

Some of the tests I can't get running as I don't have some config files (or I misunderstood something) e.g. E FileNotFoundError: [Errno 2] No such file or directory: '/home/mc/dev/airbyte/airbyte-integrations/connectors/source-s3/secrets/parquet_config.json'
Other than that, TestDiscovery.test_backward_compatibility fails. I guess this is to be expected, as the changed schema.json is not compatible with the old FileStream class.

@mc51 mc51 force-pushed the source-s3-support-iam-ec2-profile branch from b5bafe5 to 4af0e07 Compare September 28, 2022 16:34
@CLAassistant
Copy link

CLAassistant commented Sep 28, 2022

CLA assistant check
All committers have signed the CLA.

@mc51
Copy link
Author

mc51 commented Sep 28, 2022

I'll need to clean this up a bit. Will re-open.

@mc51 mc51 closed this Sep 28, 2022
@mc51 mc51 reopened this Sep 28, 2022
@mc51
Copy link
Author

mc51 commented Oct 10, 2022

Anyone willing to take a look?

@marcosmarxm
Copy link
Member

Sorry the team missed this @mc51 do you mind resolving conflicts? I'll review this after

@mc51
Copy link
Author

mc51 commented Oct 24, 2022

@IAL32 thanks for the contribution. If you could please sign the CLA, hopefully we can continue

/:e @marcosmarxm: all ready

@mc51
Copy link
Author

mc51 commented Nov 22, 2022

Any chance someone is going to look at this after all? (@marcosmarxm)

@sherifnada sherifnada requested a review from evantahler March 11, 2023 00:17
@sherifnada sherifnada removed their assignment Mar 11, 2023
@evantahler
Copy link
Contributor

@airbytehq/community-assistance-dri can you help with this review?
@mc51 are you able to update this PR and merge in the latest changes from the master branch?

@blarghmatey
Copy link
Contributor

It would be great if we can get this merged. Thanks for doing the work @mc51 !

@mc51
Copy link
Author

mc51 commented Apr 14, 2023

@airbytehq/community-assistance-dri can you help with this review? @mc51 are you able to update this PR and merge in the latest changes from the master branch?

I'm not willing to put in any more effort towards this, as the whole process has been painfully slow.
Hopefully, someone else can pick this up.

@mc51 mc51 closed this Apr 14, 2023
@marcosmarxm marcosmarxm reopened this Apr 14, 2023
@marcosmarxm
Copy link
Member

@mc51 sorry the slow feedback for this amazing contribution. Our team is returning the priority to review PR contirbutions. This is in my to-do list and I hope to resolve the conflicts and get it merged soon.

@blarghmatey
Copy link
Contributor

@marcosmarxm @mc51 here is my first pass of resolving the conflicts between this PR and the current master. I'm happy to open it as is, or you can adopt the changes since I haven't done anything other than run a rebase. https://github.com/airbytehq/airbyte/compare/master...mitodl:airbyte:source-s3-support-iam-ec2-profile?expand=1

@hansh0801
Copy link

track

@marcosmarxm marcosmarxm removed their assignment May 12, 2023
@evantahler evantahler removed their request for review May 19, 2023 18:16
@joeystevens00
Copy link

🙏 this functionality is necessary when can we expect some progress to be made on merging this?

@amankesharwani7
Copy link

@mc51 I have created a custom docker image and deployed in the AWS with the changes, I am able to setup and test source S3 but when I am trying to create connection between custom source S3 and any destination I get the same error of Access denied

Internal message: unable to access bucket: 'xxxx--xxx' key: 'airbyte/corpository/live/crawler_output_rss/stg_entity_company_mapping/stg_entity_company_mapping_2023_06_08_1686250027886_0.parquet' version: None error: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Failure type: system_error

@dangeReis
Copy link

Any update on this? How are people using this software without it having the very basic support of default authentication. How is this PR stuck without being merged for almost a year now?

@IAL32
Copy link

IAL32 commented Jul 13, 2023

Any update on this? How are people using this software without it having the very basic support of default authentication. How is this PR stuck without being merged for almost a year now?

My guess is that people are either:

  • using unsecure, long-lived credentials by creating a dedicated user on AWS
  • generating temporary credentials on the fly somewhere else and changing the credentials stored in Airbyte's db (there are APIs for that)

Both options feel so hacky and unsecure, and have stopped me from switching to using Airbyte altogether. This is only one of the many Airbyte AWS connectors lacking this functionality.

@sherifnada
Copy link
Contributor

Hi everyone, apologies for the holdup. We'll push this to the finish line soon. Thanks for the contribution @mc51 !

@marcosmarxm
Copy link
Member

Closed because the Source S3 was moved to the File Based CDK.

@marcosmarxm
Copy link
Member

Hello @mc51 and @sidartha, sorry for closing the contribution you both spend time working on. The basic structure of the connector changed a lot since the contribution. I'd like to have a conversation with you about the contribution and your experience, if possible please reach me in the Airbyte Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.