GitHub source connector fails when /contributors endpoint returns 404 #10594

Closed
jvstein opened this issue Feb 23, 2022 · 0 comments · Fixed by #10878
Labels: community, type/bug (Something isn't working)

jvstein commented Feb 23, 2022

Environment

  • Airbyte version: 0.35.27-alpha
  • OS Version / Instance: Ubuntu 20.04
  • Deployment: Kubernetes
  • Source Connector and version: GitHub 0.2.19
  • Destination Connector and version: N/A
  • Severity: Medium
  • Step where error happened: Sync job

Current Behavior

The GitHub connector ran into a 404 error while attempting to sync a mostly empty repo. The repo contained only a single commit (e.g. README.md, template code, etc. from a scaffolding tool). The commit was made using an email address that is not associated with any GitHub account.

[Screenshot: screenshot-20220223-183137]

The connector hit a 404 error on the /collaborators?per_page=100 request and aborted all further actions.

The API docs aren't particularly clear about when a 404 would occur. I assumed it would only happen for a missing repository.

Annoyingly, I get the same response from this scaffolded repo as I get from a genuinely missing repo. When I run the same request against a completely empty repo, I successfully get a list of collaborators. This may ultimately be a GitHub API bug. 😕

$ curl -u "my_user:my_token" -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/my_org/my_repo/collaborators
{
  "message": "Not Found",
  "documentation_url": "https://docs.github.com/rest/reference/repos#list-repository-collaborators"
}

Expected Behavior

The 404 error should be handled, and the connector should continue as if the repository has no collaborators.
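
The expected behavior could be sketched roughly as follows. This is a minimal, standalone illustration and not the connector's actual code or the fix that was eventually merged: `read_collaborators` and the injected `fetch` callable are hypothetical names, and the sketch assumes the HTTP layer raises `urllib.error.HTTPError` on a non-2xx response (the connector itself uses `requests`, whose `HTTPError` would be handled analogously).

```python
from typing import Callable, Iterable, Iterator
from urllib.error import HTTPError


def read_collaborators(
    fetch: Callable[[str], Iterable[dict]],
    repo: str,
) -> Iterator[dict]:
    """Yield collaborator records for `repo`, treating a 404 as "no records".

    `fetch` is a hypothetical callable that performs the HTTP request and
    raises urllib.error.HTTPError when the response status is not 2xx.
    """
    url = f"https://api.github.com/repos/{repo}/collaborators?per_page=100"
    try:
        yield from fetch(url)
    except HTTPError as err:
        if err.code != 404:
            # Other failures (401, 500, ...) should still abort the sync.
            raise
        # 404: report it and skip this repository instead of failing the job.
        print(f"Skipping collaborators for {repo}: {err.code} {err.reason}")
```

With this shape, a repository that answers 404 contributes zero records to the stream, while any other HTTP error still propagates to the caller.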

Logs

2022-02-23 00:19:00 source > SourceGithub runtimes:

2022-02-23 00:19:00 source > Syncing stream: collaborators 
2022-02-23 00:19:00 source > Undefined error while reading records: Not Found
2022-02-23 00:19:00 source > Encountered an exception while reading stream SourceGithub
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 108, in read
    internal_config=internal_config,
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 141, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 213, in _read_full_refresh
    for record in records:
  File "/airbyte/integration_code/source_github/streams.py", line 150, in read_records
    raise e
  File "/airbyte/integration_code/source_github/streams.py", line 97, in read_records
    yield from super().read_records(stream_slice=stream_slice, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 366, in read_records
    response = self._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 333, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 297, in _send
    response.raise_for_status()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/my_org/my_repo/collaborators?per_page=100
2022-02-23 00:19:00 source > Finished syncing SourceGithub
2022-02-23 00:19:00 source > SourceGithub runtimes:

2022-02-23 00:19:00 source > 404 Client Error: Not Found for url: https://api.github.com/repos/my_org/my_repo/collaborators?per_page=100
Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 13, in <module>
    launch(source, sys.argv[1:])
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/entrypoint.py", line 127, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/entrypoint.py", line 118, in run
    for message in generator:
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 112, in read
    raise e
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 108, in read
    internal_config=internal_config,
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 141, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 213, in _read_full_refresh
    for record in records:
  File "/airbyte/integration_code/source_github/streams.py", line 150, in read_records
    raise e
  File "/airbyte/integration_code/source_github/streams.py", line 97, in read_records
    yield from super().read_records(stream_slice=stream_slice, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 366, in read_records
    response = self._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 333, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 297, in _send
    response.raise_for_status()
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/repos/my_org/my_repo/collaborators?per_page=100
2022-02-23 00:19:31 destination > 2022-02-23 00:19:31 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):65 - Airbyte message consumer: succeeded.

Steps to Reproduce

  1. Create a new GitHub repo with a single commit using a dummy email address not associated with any GitHub account. (maybe?)
  2. Set up the connector with a destination and include the collaborators table.
  3. Run it.

Are you willing to submit a PR?

Yes. I'm currently disabling the collaborators stream so I can do an initial load of all our data, and I can come back to this later.

@jvstein jvstein added needs-triage type/bug Something isn't working labels Feb 23, 2022
@grubberr grubberr self-assigned this Mar 10, 2022
@grubberr grubberr moved this to In review (Airbyte) in GL Roadmap Mar 10, 2022
@grubberr grubberr linked a pull request Mar 10, 2022 that will close this issue
@bazarnov bazarnov moved this from In review (Airbyte) to Done in GL Roadmap Mar 11, 2022