-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 New Source: SurveyMonkey #4097
🎉 New Source: SurveyMonkey #4097
Conversation
…source_surveymonkey
/test connector=connectors/source-surveymonkey
|
/test connector=connectors/source-surveymonkey
|
airbyte-integrations/connectors/source-surveymonkey/.dockerignore
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Show resolved
Hide resolved
#incremental: | ||
# - config_path: "secrets/config.json" | ||
# configured_catalog_path: "integration_tests/configured_catalog.json" | ||
# # future_state_path: "integration_tests/abnormal_state.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this test is commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is commented for reason as surveymonkey - parent child
The state of this object is complex and should be like my state.json. There is an opened issue for this - #3777 - currently this format of state is not supported and test will fail due to this reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then could you please add a comment like here which will describe why this test is skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's OK but in this case can you add unit tests to verify get_updated_state
works as expected?
airbyte-integrations/connectors/source-surveymonkey/integration_tests/abnormal_state.json
Outdated
Show resolved
Hide resolved
"default_cursor_field": [] | ||
}, | ||
"sync_mode": "incremental", | ||
"cursor_field": [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cursor field is empty?
airbyte-integrations/connectors/source-surveymonkey/sample_files/configured_catalog.json
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,23 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add $schema field to schema.
|
||
def __init__(self, start_date: str, **kwargs): | ||
if isinstance(start_date, str): | ||
start_date = pendulum.parse(start_date) # convert to YYYY-MM-DDTHH:MM:SS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move converting from this place.
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
for record in result: | ||
substream = SurveyDetails(survey_id=record["id"], start_date=self._start_date, authenticator=self.authenticator) | ||
child_record = substream.read_records(sync_mode=SyncMode.full_refresh) | ||
yield from child_record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you sync child stream in parse_response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parent stream contains title, href, link only, details have expanded response including cursor_field. Parent is useless for data, it requires for enumeration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vovavovavovavova I got it. But why did you put read in parse_response method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/surveys
contains the following schema:
{
"title": "My Survey",
"nickname": "",
"category":"",
"language": "en",
"question_count": 10,
"page_count": 10,
"date_created": "2015-10-06T12:56:55",
"date_modified": "2015-10-06T12:56:55",
"id": "1234",
"href": "https://api.surveymonkey.com/v3/surveys/1234",
"buttons_text": {
"done_button": "Done",
"prev_button": "Prev",
"exit_button": "Exit",
"next_button": "Next"
},
"custom_variables": {
"name": "label"
},
"footer": true,
"folder_id":"1234",
"preview": "https://www.surveymonkey.com/r/Preview/",
"edit_url": "https://www.surveymonkey.com/create/",
"collect_url": "https://www.surveymonkey.com/collect/list",
"analyze_url": "https://www.surveymonkey.com/analyze/",
"summary_url": "https://www.surveymonkey.com/summary/"
}
it has the id
field which seems to be the only thing we use for slices, and the date_modified
parameter. Why are we using the SurveyDetails endpoint? See https://developer.surveymonkey.com/api/v3/#surveys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the surveys
itself return the small response, that contains array items with only the fields [id, title, nickname, href]
- as well real response as well documentation says this (central part of the page). The example at right side is inactual or outdated . It returns the data wroted above ONLY when hitting https://developer.surveymonkey.com/api/v3/#surveys-id either https://developer.surveymonkey.com/api/v3/#surveys-id-details .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That totally makes sense - thanks for the clarification.
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/source.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/build.gradle
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/requirements.txt
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/source_surveymonkey/streams.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-surveymonkey/unit_tests/unit_test.py
Outdated
Show resolved
Hide resolved
/test connector=connectors/source-surveymonkey
|
|
||
|
||
def test_get_updated_state(): | ||
source = SourceSurveymonkey() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test is currently an integration test because it reads data from survey monkey directly. Instead it should just be a unit test. Can't we create an instance of the streams and then directly test the get_updated_state
method e.g:
stream = IncrementalSurveyMonkeyStream()
record = {'field1':"value", "cursor_field": "2021-01-01"}
current_state = {"cursor_field": "2021-02-01"}
expected_state = current_state
assert stream.get_updated_state(record, current) == expected_state
then record
, current_state
, and expected_state
can all be parametrized via pytest to repeat this test for many input values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this (date comparison test), but instantiate from Surveys
stream that completely inherits get_updated_state
method. Because IncrementalSurveyMonkeyStream has undefined abstract properties path
and cursor_field
and cannot be instantiated. Also I kept previous test in the file
/test connector=connectors/source-surveymonkey
|
|
||
new_state = surveys_instance.get_updated_state(current_stream_state={}, latest_record=latest_record) | ||
new_response_state = survey_responses_instance.get_updated_state(current_stream_state={}, latest_record=responses_latest_record) | ||
assert new_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add text to each of this asserts explaining what you expect, like this:
assert new_state, "get_updated_state should return non empty state"
streams = source.streams(config=config) | ||
surveys_instance = [i for i in streams if i.__class__.__name__ == "Surveys"][0] | ||
survey_responses_instance = [i for i in streams if i.__class__.__name__ == "SurveyResponses"][0] | ||
with vcr.use_cassette("pretty_name.yaml", record_mode="new_episodes"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it a unit test you need to either:
- mock requests or other "network" calls inside
get_updated_state
- or, like you did here, use vcr, but you should not record any new episodes but rather assert on all network calls that is not in pretty_name.yaml file (btw, the name is not pretty, it is confusing). Of course you need to commit yaml file as well if you use vcr.
I vote for option 1, as it is easier to maintain
source = SourceSurveymonkey() | ||
config = json.load(open("secrets/config.json")) | ||
streams = source.streams(config=config) | ||
stream = [i for i in streams if i.__class__.__name__ == "Surveys"][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very convoluted way to create a Surveys
stream. Why not do:
stream = Surveys(<args>)
?
Also we should parametrize this test, and test for empty state
/test connector=connectors/source-surveymonkey
|
config = json.load(open("secrets/config.json")) | ||
streams = source.streams(config=config) | ||
stream = [i for i in streams if i.__class__.__name__ == "Surveys"][0] | ||
config = json.load(open("secrets/config.json")) # not used but required as __init__ arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this, just pass anythinh there
@@ -0,0 +1,63 @@ | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename this file to test_streams.py
or something.
it is always good to name your test as action, also unit
in the name is too broad
/test connector=connectors/source-surveymonkey
|
( | ||
{"date_modified": "2021-06-10T11:02:01"}, | ||
{"title": "test", "id": 100500, "date_modified": "2021-06-09T11:02:01"}, | ||
# {"title": "test", "id": 100500, "date_modified": "2021-06-10T11:02:02"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it commented out? delete commented out lines if not necessary
""" | ||
We need to cache all requests to all endpoints during iteration. | ||
This API is very very rate limited, we need to reuse everything possible. | ||
"Cause an error to be raised for new requests if there is a cassette file" is not that I need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cause an error to be raised for new requests if there is a cassette file" is not that I need. | |
We use the "new_episodes" record mode to save and reuse all requests in slices, details, etc.. |
We need to cache all requests to all endpoints during iteration. | ||
This API is very very rate limited, we need to reuse everything possible. | ||
"Cause an error to be raised for new requests if there is a cassette file" is not that I need. | ||
I need to save all of them and reuse all of them. In stream slices, in details, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to save all of them and reuse all of them. In stream slices, in details, etc. |
This API is very very rate limited, we need to reuse everything possible. | ||
"Cause an error to be raised for new requests if there is a cassette file" is not that I need. | ||
I need to save all of them and reuse all of them. In stream slices, in details, etc. | ||
Thats why `new_episodes` record mode is using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats why `new_episodes` record mode is using |
), | ||
], | ||
) | ||
def test_get_updated_state_lesser(self, current_state, lesser_date_record, expected_state): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful
/test connector=connectors/source-surveymonkey
|
/publish connector=connectors/source-surveymonkey
|
/test connector=connectors/source-surveymonkey
|
/publish connector=connectors/source-surveymonkey
|
/publish connector=connectors/source-surveymonkey
|
What
Closes #2968
How
Describe the solution
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the checklist which is relevant for this PR.
Connector checklist
airbyte_secret
in output spec./gradlew :airbyte-integrations:connectors:<name>:integrationTest
./test connector=connectors/<name>
command as documented here is passing.docs/integrations/
directory./publish
command described hereConnector Generator checklist
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes