-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tutorial and documentation for config-based connectors #15027
Conversation
Let's start by cloning the Airbyte repository | ||
|
||
``` | ||
git clone git@github.com:airbytehq/airbyte.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python -m pytest integration_tests -p integration_tests.acceptance | ||
``` | ||
|
||
1 test should be failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are currently 2 tests failing. the second one will be fixed by #15067
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm I have the following two failures:
- airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py:183 TestConnection.test_check[inputs1]
- airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_incremental.py:207 TestIncremental.test_state_with_abnormally_large
1. `streams`: list of streams that are part of the source | ||
2. `check`: component describing how to check the connection. | ||
|
||
The configuration will be validated against this JSON Schema, which defines the set of valid properties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
soon (tm).
we don't have the json schema yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few suggestions for grammar fixes but looks good to me!
4. [Data retriever](overview.md#data-retriever): Describes how to retrieve the data from the API | ||
5. [Cursor field](../cdk-python/incremental-stream.md) (Optional): Field to use used as stream cursor. Can either be a string, or a list of strings if the cursor is a nested field. | ||
6. [Transformations](./record-selector.md#transformations) (Optional): A set of transformations to be applied on the records read from the source before emitting them to the destination | ||
7. [Checkpoint interval](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing) (Optional): Defines the interval at which incremental syncs should be checkpointed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth providing an example of what the integer indicates. For example, 5
could mean a lot of different things, items, hours, days, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. done
|
||
This cursor value can be used to request the next page of record. | ||
|
||
In this example, the next page of record is defined by setting the `from` request parameter to the id of the last record read: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth adding a sub header to break apart the two approaches to pagination called "Cursor Pagination in Body" and "Cursor Pagination in Headers", just to differentiate them from a reading perspective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
- stream_slice | ||
- stream_state | ||
- next_page_token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think what the current use case for using these in the language is now, that the preferred method is to let other components handle things more declaratively. My worry is that we highlight this instead of the preferred way. But happy to leave it in if it provides needed flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. let's leave this as an undocumented behavior for now. we can always add it back with some examples if needed
|
||
will create one slice per day for the interval `2021-02-01` - `2021-03-01`. | ||
|
||
The `DatetimeStreamSlicer` also supports an optional lookback window, specifying how many days before the start_datetime to read data for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per slack discussion as something we should fix, we should also include that in the case of incremental syncs, the lookback window is also applied for the cursor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue #15628
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks sane. I would give a heads up to @Amruta-Ranade especially as this will probably induce unexpected behavior for what docs show up for her and her team
the docs have been updated and reviewed since your last pass
* 5-step tutorial * move * tiny bit of editing * Update tutorial * update docs * reset * move files * record selector, request options, and more links * update * update * connector definition * link * links * update example * footnote * typo * document string interpolation * note on string interpolation * update * fix code sample * fix * update sample * fix * use the actual config * Update as per comments * write as yaml * typo * Clarify options overloading * clarify that docker must be running * remove extra footnote * use venv directly * Apply suggestions from code review Co-authored-by: Sherif A. Nada <snadalive@gmail.com> * signup instructions * update * clarify that both dot and bracket notations are interchangeable * Clarify how check works * create spec and config before updating connector definition * clarify what now_local() is * rename to yaml structure * Go through tutorial and update end of section code samples * fix link * update * update code samples * Update code samples * Update to bracket notation * remove superfluous comments * Update docs/connector-development/config-based/tutorial/2-install-dependencies.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * Update docs/connector-development/config-based/tutorial/4-reading-data.md Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * fix path * update * motivation blurp * warning * warning * fix code block * update code samples * update code sample * update code samples * small updates * update yaml structure * custom class example * language annotations * update warning * Update tutorial to use dpath extractor * Update record selector docs * unit test * link to contributing * tiny update * $ in front of commands * $ in front of commands * More readings * link to existing config-based connectors * index * update * delete broken link * supported features * update * Add some links * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> * Update docs/connector-development/config-based/record-selector.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> * mention the unit * headers * remove mentions of interpolating on stream slice, etc. * update * exclude config-based docs Co-authored-by: Sherif A. Nada <snadalive@gmail.com> Co-authored-by: Augustin <augustin.lafanechere@gmail.com> Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
What
Add a tutorial and some documentation to help developers implement config-based connectors
How
docs/connector-development/config-based/
since I'm not sure we want everyone to start using config-based right away. Happy to move the files to wherever makes the most sense though.Recommended reading order
docs/connector-development/config-based/overview.md
docs/connector-development/config-based/tutorial/
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.