Fix events and workflow_runs datetimes in source-github (#19299)
* Fix events and workflow_runs datetimes in `source-github`

* add PR number

* whitespace

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
evantahler and octavia-squidington-iii authored Nov 10, 2022
1 parent b16f28f commit bfdba6c
Showing 6 changed files with 85 additions and 76 deletions.

@@ -468,7 +468,7 @@
- name: GitHub
sourceDefinitionId: ef69ef6e-aa7f-4af1-a01d-ef775033524e
dockerRepository: airbyte/source-github
-dockerImageTag: 0.3.7
+dockerImageTag: 0.3.8
documentationUrl: https://docs.airbyte.com/integrations/sources/github
icon: github.svg
sourceType: api

@@ -4094,7 +4094,7 @@
supportsNormalization: false
supportsDBT: false
supported_destination_sync_modes: []
-- dockerImage: "airbyte/source-github:0.3.7"
+- dockerImage: "airbyte/source-github:0.3.8"
spec:
documentationUrl: "https://docs.airbyte.com/integrations/sources/github"
connectionSpecification:

airbyte-integrations/connectors/source-github/Dockerfile (1 addition, 1 deletion)
@@ -12,5 +12,5 @@ RUN pip install .
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.3.7
+LABEL io.airbyte.version=0.3.8
LABEL io.airbyte.name=airbyte/source-github

@@ -53,7 +53,8 @@
}
},
"created_at": {
-"type": ["null", "string"]
+"type": ["null", "string"],
+"format": "date-time"
},
"id": {
"type": ["null", "string"]

@@ -53,10 +53,12 @@
}
},
"created_at": {
-"type": ["null", "string"]
+"type": ["null", "string"],
+"format": "date-time"
},
"updated_at": {
-"type": ["null", "string"]
+"type": ["null", "string"],
+"format": "date-time"
},
"run_attempt": {
"type": ["null", "integer"]
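
The two JSON schema hunks above add `"format": "date-time"` to fields that were previously plain strings. In the first hunk that is `created_at`. In the hunk that also contains `run_attempt`, it is both `created_at` and `updated_at`. Per the commit title, these are the events and workflow_runs schemas. Presumably the change lets destinations type these fields as timestamps. The helper below is hypothetical and not part of the connector. It is a minimal sketch of how a consumer might parse such nullable values into timezone-aware UTC datetimes, assuming GitHub's usual `YYYY-MM-DDTHH:MM:SSZ` form:

```python
from datetime import datetime, timezone
from typing import Optional


def parse_github_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse a nullable GitHub timestamp such as "2022-11-10T18:05:12Z".

    The schemas declare "type": ["null", "string"], so None is passed through.
    """
    if value is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"expected a timezone-aware timestamp, got {value!r}")
    return parsed.astimezone(timezone.utc)


print(parse_github_datetime("2022-11-10T18:05:12Z"))  # 2022-11-10 18:05:12+00:00
print(parse_github_datetime(None))  # None
```

Rejecting naive timestamps is a design choice of this sketch. The point of the schema change is that these values denote exact instants.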
146 changes: 76 additions & 70 deletions docs/integrations/sources/github.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,25 @@
# GitHub

This page contains the setup guide and reference information for the GitHub source connector.

## Prerequisites
* Start date
* GitHub Repositories
* Branch (Optional)
* Page size for large streams (Optional)

**For Airbyte Cloud:**
- Start date
- GitHub Repositories
- Branch (Optional)
- Page size for large streams (Optional)

* Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))
* OAuth
**For Airbyte Cloud:**

- Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))
- OAuth

**For Airbyte Open Source:**

* Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))

- Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))

## Setup guide

### Step 1: Set up GitHub

Create a [GitHub Account](https://github.com).
@@ -28,19 +29,21 @@ Create a [GitHub Account](https://github.com).
Log in to [GitHub](https://github.com) and generate a [personal access token](https://github.com/settings/tokens). To spread API quota consumption across several tokens, enter multiple tokens separated by commas (`,`).
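
This page does not describe how the connector balances load across tokens. The sketch below assumes simple round-robin rotation. Its hypothetical helpers split the comma-separated value and hand out one `Authorization` header per request:

```python
from itertools import cycle
from typing import Iterator


def parse_tokens(raw: str) -> list[str]:
    """Split the comma-separated token field, ignoring whitespace and empty entries."""
    tokens = [token.strip() for token in raw.split(",")]
    tokens = [token for token in tokens if token]
    if not tokens:
        raise ValueError("at least one personal access token is required")
    return tokens


def auth_headers(tokens: list[str]) -> Iterator[dict[str, str]]:
    """Yield one Authorization header per request, rotating through tokens round-robin."""
    for token in cycle(tokens):
        yield {"Authorization": f"token {token}"}


headers = auth_headers(parse_tokens("ghp_aaa, ghp_bbb"))
for _ in range(3):
    print(next(headers)["Authorization"])
# token ghp_aaa
# token ghp_bbb
# token ghp_aaa
```

Round-robin spreads requests evenly but ignores how much quota each token has left. An alternative would be to pick the token with the most remaining requests.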

### Step 2: Set up the GitHub connector in Airbyte

**For Airbyte Cloud:**

1. [Log into your Airbyte Cloud](https://cloud.airbyte.io/workspaces) account.
2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ new source**.
3. On the source setup page, select **GitHub** from the Source type dropdown and enter a name for this connector.
4. Click `Authenticate your GitHub account` and choose OAuth or Personal Access Token as the authentication method.
5. Log in and authorize access to your GitHub account.
6. **Start date** - The date from which you'd like to replicate data for streams: `comments`, `commit_comment_reactions`, `commit_comments`, `commits`, `deployments`, `events`, `issue_comment_reactions`, `issue_events`, `issue_milestones`, `issue_reactions`, `issues`, `project_cards`, `project_columns`, `projects`, `pull_request_comment_reactions`, `pull_requests`, `pull_requeststats`, `releases`, `review_comments`, `reviews`, `stargazers`, `workflow_runs`, `workflows`.
7. **GitHub Repositories** - Space-delimited list of GitHub organizations/repositories, e.g. `airbytehq/airbyte` for a single repository or `airbytehq/airbyte airbytehq/another-repo` for multiple repositories. To sync all repositories of an organization, use the `*` wildcard, e.g. `airbytehq/*`.
8. **Branch (Optional)** - Space-delimited list of GitHub repository branches to pull commits from, in the form `owner/repo/branch`, e.g. `airbytehq/airbyte/master` or `airbytehq/airbyte/master airbytehq/airbyte/my-branch`. If no branches are specified for a repository, its default branch is pulled.
9. **Page size for large streams (Optional)** - Several streams in the GitHub connector can return large amounts of data, and a suitable page size for them depends on the size of your repository. We recommend values between 10 and 30.
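
The repository and branch options are both space-delimited strings. The hypothetical parser below separates organization wildcards from single repositories and groups branches by repository. Its names and error handling are assumptions, not the connector's own code:

```python
def parse_repositories(raw: str) -> tuple[list[str], list[str]]:
    """Split the space-delimited "GitHub Repositories" value.

    Returns (organizations, repositories): "owner/*" selects every repository of
    an organization, while "owner/repo" selects a single repository.
    """
    organizations: list[str] = []
    repositories: list[str] = []
    for entry in raw.split():
        owner, _, name = entry.partition("/")
        if not owner or not name or "/" in name:
            raise ValueError(f"expected 'owner/repo' or 'owner/*', got {entry!r}")
        if name == "*":
            organizations.append(owner)
        else:
            repositories.append(entry)
    return organizations, repositories


def parse_branches(raw: str) -> dict[str, list[str]]:
    """Group space-delimited "owner/repo/branch" entries by repository."""
    branches: dict[str, list[str]] = {}
    for entry in raw.split():
        # maxsplit=2 keeps branch names that contain "/" (e.g. "feature/x") intact.
        parts = entry.split("/", 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected 'owner/repo/branch', got {entry!r}")
        branches.setdefault(f"{parts[0]}/{parts[1]}", []).append(parts[2])
    return branches


print(parse_repositories("airbytehq/* airbytehq/airbyte"))
# (['airbytehq'], ['airbytehq/airbyte'])
print(parse_branches("airbytehq/airbyte/master airbytehq/airbyte/my-branch"))
# {'airbytehq/airbyte': ['master', 'my-branch']}
```

Any repository missing from the branch mapping would fall back to its default branch, as described in step 8.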

**For Airbyte Open Source:**

1. Authenticate with a **Personal Access Token**.

## Supported sync modes
@@ -59,85 +62,88 @@ The GitHub source connector supports the following [sync modes](https://docs.air

This connector outputs the following full refresh streams:

* [Assignees](https://docs.github.com/en/rest/reference/issues#list-assignees)
* [Branches](https://docs.github.com/en/rest/reference/repos#list-branches)
* [Collaborators](https://docs.github.com/en/rest/reference/repos#list-repository-collaborators)
* [Issue labels](https://docs.github.com/en/rest/issues/labels#list-labels-for-a-repository)
* [Organizations](https://docs.github.com/en/rest/reference/orgs#get-an-organization)
* [Pull request commits](https://docs.github.com/en/rest/reference/pulls#list-commits-on-a-pull-request)
* [Tags](https://docs.github.com/en/rest/reference/repos#list-repository-tags)
* [TeamMembers](https://docs.github.com/en/rest/teams/members#list-team-members)
* [TeamMemberships](https://docs.github.com/en/rest/reference/teams#get-team-membership-for-a-user)
* [Teams](https://docs.github.com/en/rest/reference/teams#list-teams)
* [Users](https://docs.github.com/en/rest/reference/orgs#list-organization-members)
- [Assignees](https://docs.github.com/en/rest/reference/issues#list-assignees)
- [Branches](https://docs.github.com/en/rest/reference/repos#list-branches)
- [Collaborators](https://docs.github.com/en/rest/reference/repos#list-repository-collaborators)
- [Issue labels](https://docs.github.com/en/rest/issues/labels#list-labels-for-a-repository)
- [Organizations](https://docs.github.com/en/rest/reference/orgs#get-an-organization)
- [Pull request commits](https://docs.github.com/en/rest/reference/pulls#list-commits-on-a-pull-request)
- [Tags](https://docs.github.com/en/rest/reference/repos#list-repository-tags)
- [TeamMembers](https://docs.github.com/en/rest/teams/members#list-team-members)
- [TeamMemberships](https://docs.github.com/en/rest/reference/teams#get-team-membership-for-a-user)
- [Teams](https://docs.github.com/en/rest/reference/teams#list-teams)
- [Users](https://docs.github.com/en/rest/reference/orgs#list-organization-members)

This connector outputs the following incremental streams:

* [Comments](https://docs.github.com/en/rest/reference/issues#list-issue-comments-for-a-repository)
* [Commit comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-a-commit-comment)
* [Commit comments](https://docs.github.com/en/rest/reference/repos#list-commit-comments-for-a-repository)
* [Commits](https://docs.github.com/en/rest/reference/repos#list-commits)
* [Deployments](https://docs.github.com/en/rest/reference/deployments#list-deployments)
* [Events](https://docs.github.com/en/rest/reference/activity#list-repository-events)
* [Issue comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-an-issue-comment)
* [Issue events](https://docs.github.com/en/rest/reference/issues#list-issue-events-for-a-repository)
* [Issue milestones](https://docs.github.com/en/rest/reference/issues#list-milestones)
* [Issue reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-an-issue)
* [Issues](https://docs.github.com/en/rest/reference/issues#list-repository-issues)
* [Project cards](https://docs.github.com/en/rest/reference/projects#list-project-cards)
* [Project columns](https://docs.github.com/en/rest/reference/projects#list-project-columns)
* [Projects](https://docs.github.com/en/rest/reference/projects#list-repository-projects)
* [Pull request comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-a-pull-request-review-comment)
* [Pull request stats](https://docs.github.com/en/rest/reference/pulls#get-a-pull-request)
* [Pull requests](https://docs.github.com/en/rest/reference/pulls#list-pull-requests)
* [Releases](https://docs.github.com/en/rest/reference/repos#list-releases)
* [Repositories](https://docs.github.com/en/rest/reference/repos#list-organization-repositories)
* [Review comments](https://docs.github.com/en/rest/reference/pulls#list-review-comments-in-a-repository)
* [Reviews](https://docs.github.com/en/rest/reference/pulls#list-reviews-for-a-pull-request)
* [Stargazers](https://docs.github.com/en/rest/reference/activity#list-stargazers)
* [WorkflowRuns](https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository)
* [Workflows](https://docs.github.com/en/rest/reference/actions#workflows)
- [Comments](https://docs.github.com/en/rest/reference/issues#list-issue-comments-for-a-repository)
- [Commit comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-a-commit-comment)
- [Commit comments](https://docs.github.com/en/rest/reference/repos#list-commit-comments-for-a-repository)
- [Commits](https://docs.github.com/en/rest/reference/repos#list-commits)
- [Deployments](https://docs.github.com/en/rest/reference/deployments#list-deployments)
- [Events](https://docs.github.com/en/rest/reference/activity#list-repository-events)
- [Issue comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-an-issue-comment)
- [Issue events](https://docs.github.com/en/rest/reference/issues#list-issue-events-for-a-repository)
- [Issue milestones](https://docs.github.com/en/rest/reference/issues#list-milestones)
- [Issue reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-an-issue)
- [Issues](https://docs.github.com/en/rest/reference/issues#list-repository-issues)
- [Project cards](https://docs.github.com/en/rest/reference/projects#list-project-cards)
- [Project columns](https://docs.github.com/en/rest/reference/projects#list-project-columns)
- [Projects](https://docs.github.com/en/rest/reference/projects#list-repository-projects)
- [Pull request comment reactions](https://docs.github.com/en/rest/reference/reactions#list-reactions-for-a-pull-request-review-comment)
- [Pull request stats](https://docs.github.com/en/rest/reference/pulls#get-a-pull-request)
- [Pull requests](https://docs.github.com/en/rest/reference/pulls#list-pull-requests)
- [Releases](https://docs.github.com/en/rest/reference/repos#list-releases)
- [Repositories](https://docs.github.com/en/rest/reference/repos#list-organization-repositories)
- [Review comments](https://docs.github.com/en/rest/reference/pulls#list-review-comments-in-a-repository)
- [Reviews](https://docs.github.com/en/rest/reference/pulls#list-reviews-for-a-pull-request)
- [Stargazers](https://docs.github.com/en/rest/reference/activity#list-stargazers)
- [WorkflowRuns](https://docs.github.com/en/rest/actions/workflow-runs#list-workflow-runs-for-a-repository)
- [Workflows](https://docs.github.com/en/rest/reference/actions#workflows)

### Notes

1. Only 4 of the 24 incremental streams above \(`comments`, `commits`, `issues` and `review comments`\) are pure incremental, meaning that they:
* read only new records;
* output only new records.

- read only new records;
- output only new records.

2. Streams `workflow_runs` and `workflow_jobs` are almost pure incremental:
* read new records and a portion of old records (from the past 30 days) [docs](https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs);
* `workflow_jobs` depends on `workflow_runs` to read its data, so both streams follow the same logic [docs](https://docs.github.com/pt/rest/actions/workflow-jobs#list-jobs-for-a-workflow-run);
* output only new records.

- read new records and a portion of old records (from the past 30 days) [docs](https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs);
- `workflow_jobs` depends on `workflow_runs` to read its data, so both streams follow the same logic [docs](https://docs.github.com/pt/rest/actions/workflow-jobs#list-jobs-for-a-workflow-run);
- output only new records.

3. The other 19 incremental streams are also incremental but differ in that they:
* read all records;
* output only new records.
Please consider this behaviour when using those 19 streams, because it may affect your API call limits.

- read all records;
- output only new records.
Please consider this behaviour when using those 19 streams, because it may affect your API call limits.

4. We pass a few parameters \(`since`, `sort` and `direction`\) to GitHub to filter records. For large streams, a `start_date` far in the past can sometimes cause GitHub to keep returning errors instead of records \(a corresponding `WARN` log message is output\). In this case, specifying a more recent `start_date` may help.
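
The first three notes differ in how much is read from GitHub, but every stream outputs only records newer than the saved state. The rough sketch below uses hypothetical helper names. It assumes `updated_at` as the cursor field, although the connector's real cursor fields may differ per stream:

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional


def parse_ts(value: str) -> datetime:
    """Parse a GitHub UTC timestamp such as "2022-11-10T12:00:00Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def emit_new_records(
    records: Iterable[dict], cursor_field: str, state: Optional[str]
) -> Iterator[dict]:
    """Yield only records whose cursor value is newer than the saved state.

    For streams that read all records, every record is still fetched from the
    API (and counts against the rate limit) before this filter drops old ones.
    """
    threshold = parse_ts(state) if state else None
    for record in records:
        if threshold is None or parse_ts(record[cursor_field]) > threshold:
            yield record


def workflow_runs_read_start(state: str, lookback_days: int = 30) -> datetime:
    """Start reading some days before the saved state so that runs re-run
    within that window are fetched again (30 days, as in note 2)."""
    return parse_ts(state) - timedelta(days=lookback_days)


records = [
    {"id": 1, "updated_at": "2022-11-01T00:00:00Z"},
    {"id": 2, "updated_at": "2022-11-09T00:00:00Z"},
]
print([r["id"] for r in emit_new_records(records, "updated_at", "2022-11-05T00:00:00Z")])  # [2]
print(workflow_runs_read_start("2022-11-10T00:00:00Z"))  # 2022-10-11 00:00:00+00:00
```

For pure incremental streams, the filter rarely drops anything because GitHub has already filtered the request with `since`. For the other 19 streams, it does all of the work.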
**The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:**

* `assignees`
* `branches`
* `collaborators`
* `issue_labels`
* `organizations`
* `pull_request_commits`
* `pull_request_stats`
* `repositories`
* `tags`
* `teams`
* `users`
**The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:**

- `assignees`
- `branches`
- `collaborators`
- `issue_labels`
- `organizations`
- `pull_request_commits`
- `pull_request_stats`
- `repositories`
- `tags`
- `teams`
- `users`
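
Note 4 names the `since`, `sort` and `direction` parameters, and the list above names the streams that have no usable date. The sketch below shows how the two could combine when building a request. It makes several assumptions:

- `request_params` is a hypothetical helper.
- The parameter values are illustrative.
- `page_size` stands in for the "Page size for large streams" option and is assumed to map to GitHub's `per_page`.

The connector itself chooses these parameters per stream.

```python
from typing import Optional

# Streams listed above for which the "Start date" option does not apply.
NO_START_DATE_STREAMS = frozenset({
    "assignees", "branches", "collaborators", "issue_labels", "organizations",
    "pull_request_commits", "pull_request_stats", "repositories", "tags",
    "teams", "users",
})


def request_params(stream: str, start_date: Optional[str], page_size: int = 100) -> dict[str, str]:
    """Build query parameters for one list request (hypothetical helper).

    GitHub caps per_page at 100. The sort/direction values are assumptions:
    endpoints such as "List repository issues" accept them, others do not.
    """
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    params = {"per_page": str(page_size)}
    if start_date and stream not in NO_START_DATE_STREAMS:
        # Ask GitHub to filter server-side and return the oldest changes first.
        params.update({"since": start_date, "sort": "updated", "direction": "asc"})
    return params


print(request_params("tags", "2022-01-01T00:00:00Z"))
# {'per_page': '100'}
print(request_params("issues", "2022-01-01T00:00:00Z", page_size=30))
# {'per_page': '30', 'since': '2022-01-01T00:00:00Z', 'sort': 'updated', 'direction': 'asc'}
```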

### Permissions and scopes

If you use the OAuth authentication method, the OAuth 2.0 application requests the following [scopes](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes): **repo**, **read:org**, **read:repo_hook**, **read:user**, **read:discussion**, **workflow**. For a [personal access token](https://github.com/settings/tokens), you need to select the required scopes manually.

Your token should have at least the `repo` scope. Depending on which streams you want to sync, the user generating the token needs more permissions:

* For syncing Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions [here](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github).
* Syncing [Teams](https://docs.github.com/en/organizations/organizing-members-into-teams/about-teams) is only available to authenticated members of a team's [organization](https://docs.github.com/en/rest/orgs). [Personal user accounts](https://docs.github.com/en/get-started/learning-about-github/types-of-github-accounts) and repositories belonging to them don't have access to Teams features. In this case, no records will be synced.
* To sync the Projects stream, the repository must have the Projects feature enabled.
- For syncing Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions [here](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github).
- Syncing [Teams](https://docs.github.com/en/organizations/organizing-members-into-teams/about-teams) is only available to authenticated members of a team's [organization](https://docs.github.com/en/rest/orgs). [Personal user accounts](https://docs.github.com/en/get-started/learning-about-github/types-of-github-accounts) and repositories belonging to them don't have access to Teams features. In this case, no records will be synced.
- To sync the Projects stream, the repository must have the Projects feature enabled.
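
One way to check a token before a sync is to inspect the `X-OAuth-Scopes` header. GitHub returns it for OAuth and classic personal access tokens, with a value such as `repo, read:org`. Fine-grained tokens may not report their permissions this way. The hypothetical helper below only parses the header value, so the example runs offline. How you fetch the header is up to you, for example from a request to `https://api.github.com/user`.

```python
REQUIRED_SCOPES: frozenset[str] = frozenset({"repo"})


def missing_scopes(x_oauth_scopes: str, required: frozenset[str] = REQUIRED_SCOPES) -> set[str]:
    """Return required scopes absent from an X-OAuth-Scopes header value.

    Implied scopes are not expanded here: a token with "admin:org" also
    grants "read:org", but this check would still report "read:org" missing.
    """
    granted = {scope.strip() for scope in x_oauth_scopes.split(",") if scope.strip()}
    return set(required) - granted


print(missing_scopes("repo, read:org"))  # set()
print(sorted(missing_scopes("read:org", frozenset({"repo", "read:org", "read:user"}))))
# ['read:user', 'repo']
```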

### Performance considerations

@@ -147,6 +153,7 @@ The GitHub connector should not run into GitHub API limitations under normal usa

| Version | Date | Pull Request | Subject |
| :------ | :--------- | :---------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 0.3.8 | 2022-11-10 | [19299](https://github.com/airbytehq/airbyte/pull/19299) | Fix events and workflow_runs datetimes |
| 0.3.7 | 2022-10-20 | [18213](https://github.com/airbytehq/airbyte/pull/18213) | Skip retry on HTTP 200 |
| 0.3.6 | 2022-10-11 | [17852](https://github.com/airbytehq/airbyte/pull/17852) | Use default behaviour, retry on 429 and all 5XX errors |
| 0.3.5 | 2022-10-07 | [17715](https://github.com/airbytehq/airbyte/pull/17715) | Improve 502 handling for `comments` stream |
@@ -213,4 +220,3 @@ The GitHub connector should not run into GitHub API limitations under normal usa
| 0.1.2 | 2021-07-13 | [4708](https://github.com/airbytehq/airbyte/pull/4708) | Fix bug with IssueEvents stream and add handling for rate limiting |
| 0.1.1 | 2021-07-07 | [4590](https://github.com/airbytehq/airbyte/pull/4590) | Fix schema in the `pull_request` stream |
| 0.1.0 | 2021-07-06 | [4174](https://github.com/airbytehq/airbyte/pull/4174) | New Source: GitHub |
