
clarify staging setup guide for bq destination #9255

Merged · 5 commits · Jan 6, 2022
10 changes: 4 additions & 6 deletions docs/integrations/destinations/bigquery.md
@@ -111,15 +111,13 @@ This is the recommended configuration for uploading data to BigQuery. It works b
* **GCS Bucket Path**
* **Block Size (MB) for GCS multipart upload**
* **GCS Bucket Keep files after migration**
* See [this](https://cloud.google.com/storage/docs/creating-buckets) for instructions on how to create a GCS bucket.
* See [this](https://cloud.google.com/storage/docs/creating-buckets) for instructions on how to create a GCS bucket. The bucket cannot have a retention policy; set "Protection Tools" to either "None" or "Object versioning".
* **HMAC Key Access ID**
* See [this](https://cloud.google.com/storage/docs/authentication/hmackeys) on how to generate an access key.
* We recommend creating an Airbyte-specific user or service account. This user or account will require read and write permissions to objects in the bucket.
* See [this](https://cloud.google.com/storage/docs/authentication/managing-hmackeys) on how to generate an access key. For more information on HMAC keys, please reference the [GCP docs](https://cloud.google.com/storage/docs/authentication/hmackeys).
* We recommend creating an Airbyte-specific user or service account. This user or account will require the following permissions for the bucket: `Storage Object Admin` and `Storage Admin`. You can set these by going to the permissions tab of the GCS bucket, adding the email address of the service account or user, and granting the aforementioned permissions.
Contributor Author:

@tuliren I updated this line after you reviewed it. In practice I found Storage Object Admin wasn't enough permission and needed to add Storage Admin. Relates to this bug issue. Are you okay with me publishing this as is or is there a more limited permission that you know works?

Contributor:

Hi ✋ what was the error you saw after adding the storage.multipartUploads.create permission but it still didn't work? I'm going to guess it was something about not seeing a bucket, and the actual permission you needed was storage.buckets.list. Storage Admin probably wouldn't be the best thing to suggest people give to this SA.

I've always felt that the default BigQuery/Storage Roles GCP creates for users don't include some basic permissions I think they should. And then people end up granting an Admin role because of this. I usually create a custom role "Storage Viewer" with this:

[Screenshot: permission list for the custom "Storage Viewer" role]

And use it in combination with one of the `Storage Object *` roles.
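
To make the permission debate above easier to settle, here is a minimal sketch (an illustration, not part of this thread) that probes which of the discussed permissions a service account actually holds on the staging bucket, using the `google-cloud-storage` client's `test_iam_permissions` call. The bucket name and permission list are assumptions for the example.

```python
# Hedged sketch: check which bucket-scoped permissions the active
# credentials hold, to diagnose "Storage Object Admin wasn't enough".
# Assumes GOOGLE_APPLICATION_CREDENTIALS points at the service account
# key and that the google-cloud-storage package is installed.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-airbyte-staging-bucket")  # hypothetical name

wanted = [
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.delete",
    "storage.multipartUploads.create",  # the permission discussed above
]
granted = bucket.test_iam_permissions(wanted)
print("granted:", granted)
print("missing:", sorted(set(wanted) - set(granted)))
```

Note that storage.buckets.list is a project-level permission, so it cannot be probed against a single bucket this way — consistent with the guess above that a bucket-scoped role alone would not cover it.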

Contributor @tuliren (Jan 4, 2022):

> Are you okay with me publishing this as is or is there a more limited permission that you know works?

I am fine with merging this PR as is. I don't know a more limited permission that works.


Update: Actually, I second Noah; it would be helpful to post the error message after adding the multipart upload permission. I guess many users won't be comfortable with giving out admin permission just for a staging bucket. It's totally fine if you don't have bandwidth to do that. In that case, we can merge this PR as is, but we should create a follow-up ticket to look into this problem.

* **Secret Access Key**
* Corresponding key to the above access ID.
* Make sure your GCS bucket is accessible from the machine running Airbyte.
* This depends on your networking setup.
* The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI.
* Make sure your GCS bucket is accessible from the machine running Airbyte. This depends on your networking setup. The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI; a standalone verification sketch follows this list.
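
For readers who want to verify the HMAC key and bucket outside of Airbyte, here is a minimal sketch (an illustration, not part of the PR) using the S3-compatible XML API endpoint that GCS exposes for HMAC credentials. The use of boto3 and all names and key values are placeholders for the example.

```python
# Hedged sketch: round-trip a test object through the staging bucket
# using the HMAC pair, via GCS's S3-compatible XML API endpoint.
# Assumes boto3 is installed; BUCKET, ACCESS_ID, and SECRET are placeholders.
import boto3

BUCKET = "my-airbyte-staging-bucket"   # hypothetical bucket name
ACCESS_ID = "GOOG1E..."                # HMAC Key Access ID from the GCP console
SECRET = "..."                         # corresponding Secret Access Key

client = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",  # GCS XML API endpoint
    aws_access_key_id=ACCESS_ID,
    aws_secret_access_key=SECRET,
)

# Write, read back, and delete a small test object.
client.put_object(Bucket=BUCKET, Key="airbyte-test/hello.txt", Body=b"hello")
obj = client.get_object(Bucket=BUCKET, Key="airbyte-test/hello.txt")
print(obj["Body"].read())
client.delete_object(Bucket=BUCKET, Key="airbyte-test/hello.txt")
```

If the round trip succeeds, the HMAC pair and bucket-level object permissions are in place; connectivity problems will surface here the same way they would in the UI's check connection tool.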

### `Standard` uploads
This uploads data directly from your source to BigQuery. While this is faster to set up initially, **we strongly recommend that you do not use this option for anything other than a quick demo**. It is more than 10x slower than the GCS uploading option and will fail for many datasets. Please be aware you may see some failures for big datasets and slow sources, e.g. if reading from the source takes more than 10-12 hours. This is caused by Google BigQuery SDK client limitations. For more details, please check [https://github.com/airbytehq/airbyte/issues/3549](https://github.com/airbytehq/airbyte/issues/3549).
10 changes: 4 additions & 6 deletions docs/integrations/destinations/gcs.md
@@ -207,16 +207,14 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A

* Fill up GCS info
* **GCS Bucket Name**
* See [this](https://cloud.google.com/storage/docs/creating-buckets) to create an S3 bucket.
* See [this](https://cloud.google.com/storage/docs/creating-buckets) for instructions on how to create a GCS bucket. The bucket cannot have a retention policy; set "Protection Tools" to either "None" or "Object versioning" (a creation sketch follows this list).
* **GCS Bucket Region**
* **HMAC Key Access ID**
* See [this](https://cloud.google.com/storage/docs/authentication/hmackeys) on how to generate an access key.
* We recommend creating an Airbyte-specific user or service account. This user or account will require read and write permissions to objects in the bucket.
* See [this](https://cloud.google.com/storage/docs/authentication/managing-hmackeys) on how to generate an access key. For more information on HMAC keys, please reference the [GCP docs](https://cloud.google.com/storage/docs/authentication/hmackeys).
* We recommend creating an Airbyte-specific user or service account. This user or account will require the following permissions for the bucket: `Storage Object Admin` and `Storage Admin`. You can set these by going to the permissions tab of the GCS bucket, adding the email address of the service account or user, and granting the aforementioned permissions.
* **Secret Access Key**
* Corresponding key to the above access ID.
* Make sure your GCS bucket is accessible from the machine running Airbyte.
* This depends on your networking setup.
* The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI.
* Make sure your GCS bucket is accessible from the machine running Airbyte. This depends on your networking setup. The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI.
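
As a companion to the bucket-creation note above, here is a hedged sketch (assumptions: the `google-cloud-storage` package, default application credentials, and a placeholder bucket name) that creates a bucket with no retention policy and enables object versioning as the alternative protection tool:

```python
# Hedged sketch: create a staging bucket without a retention policy and
# turn on object versioning, per the requirement above. The bucket name
# is a placeholder; requires the google-cloud-storage package and
# default application credentials.
from google.cloud import storage

client = storage.Client()
bucket = client.create_bucket("my-airbyte-staging-bucket", location="US")

# A freshly created bucket has no retention policy; enable object
# versioning instead of a retention policy.
bucket.versioning_enabled = True
bucket.patch()
print(f"created {bucket.name}, versioning={bucket.versioning_enabled}")
```

Versioning is optional per the docs text ("None" is also acceptable); the key point is simply that no retention policy is attached, since a retention-locked bucket blocks the cleanup of staged files.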

## CHANGELOG
