Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: SurveyMonkey #4097

Merged
merged 39 commits into from
Jul 5, 2021

Conversation

vovavovavovavova
Copy link
Contributor

@vovavovavovavova vovavovavovavova commented Jun 14, 2021

What

Closes #2968

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in output spec
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Credentials added to Github CI if needed and not already present. instructions for injecting secrets into CI.
  • Documentation updated
    • README
    • CHANGELOG.md
    • Reference docs in the docs/integrations/ directory.
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jun 14, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/937117365
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/937117365

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jun 15, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/938283374
✅ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/938283374

#incremental:
# - config_path: "secrets/config.json"
# configured_catalog_path: "integration_tests/configured_catalog.json"
# # future_state_path: "integration_tests/abnormal_state.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this test is commented out?

Copy link
Contributor Author

@vovavovavovavova vovavovavovavova Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is commented for reason as surveymonkey - parent child
The state of this object is complex and should be like my state.json. There is an opened issue for this - #3777 - currently this format of state is not supported and test will fail due to this reason

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then could you please add a comment like here which will describe why this test is skipped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's OK but in this case can you add unit tests to verify get_updated_state works as expected?

"default_cursor_field": []
},
"sync_mode": "incremental",
"cursor_field": [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cursor field is empty?

@@ -0,0 +1,23 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add $schema field to schema.


def __init__(self, start_date: str, **kwargs):
if isinstance(start_date, str):
start_date = pendulum.parse(start_date) # convert to YYYY-MM-DDTHH:MM:SS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move converting from this place.

for record in result:
substream = SurveyDetails(survey_id=record["id"], start_date=self._start_date, authenticator=self.authenticator)
child_record = substream.read_records(sync_mode=SyncMode.full_refresh)
yield from child_record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you sync child stream in parse_response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parent stream contains title, href, link only, details have expanded response including cursor_field. Parent is useless for data, it requires for enumeration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vovavovavovavova I got it. But why did you put read in parse_response method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/surveys contains the following schema:

{
  "title": "My Survey",
  "nickname": "",
  "category":"",
  "language": "en",
  "question_count": 10,
  "page_count": 10,
  "date_created": "2015-10-06T12:56:55",
  "date_modified": "2015-10-06T12:56:55",
  "id": "1234",
  "href": "https://api.surveymonkey.com/v3/surveys/1234",
  "buttons_text": {
    "done_button": "Done",
    "prev_button": "Prev",
    "exit_button": "Exit",
    "next_button": "Next"
  },
  "custom_variables": {
    "name": "label"
  },
  "footer": true,
  "folder_id":"1234",
  "preview": "https://www.surveymonkey.com/r/Preview/",
  "edit_url": "https://www.surveymonkey.com/create/",
  "collect_url": "https://www.surveymonkey.com/collect/list",
  "analyze_url": "https://www.surveymonkey.com/analyze/",
  "summary_url": "https://www.surveymonkey.com/summary/"
}

it has the id field which seems to be the only thing we use for slices, and the date_modified parameter. Why are we using the SurveyDetails endpoint? See https://developer.surveymonkey.com/api/v3/#surveys

Copy link
Contributor Author

@vovavovavovavova vovavovavovavova Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the surveys itself return the small response, that contains array items with only the fields [id, title, nickname, href] - as well real response as well documentation says this (central part of the page). The example at right side is inactual or outdated . It returns the data wroted above ONLY when hitting https://developer.surveymonkey.com/api/v3/#surveys-id either https://developer.surveymonkey.com/api/v3/#surveys-id-details .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That totally makes sense - thanks for the clarification.

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jun 30, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/986714145
✅ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/986714145



def test_get_updated_state():
source = SourceSurveymonkey()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is currently an integration test because it reads data from survey monkey directly. Instead it should just be a unit test. Can't we create an instance of the streams and then directly test the get_updated_state method e.g:

stream = IncrementalSurveyMonkeyStream()

record = {'field1':"value", "cursor_field": "2021-01-01"}
current_state = {"cursor_field": "2021-02-01"}
expected_state = current_state

assert stream.get_updated_state(record, current) == expected_state

then record, current_state, and expected_state can all be parametrized via pytest to repeat this test for many input values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this (date comparison test), but instantiate from Surveys stream that completely inherits get_updated_state method. Because IncrementalSurveyMonkeyStream has undefined abstract properties path and cursor_field and cannot be instantiated. Also I kept previous test in the file

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jun 30, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/987686229
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/987686229

@keu keu changed the title Surveymonkey: New Source 🎉 New Source: SurveyMonkey Jun 30, 2021

new_state = surveys_instance.get_updated_state(current_stream_state={}, latest_record=latest_record)
new_response_state = survey_responses_instance.get_updated_state(current_stream_state={}, latest_record=responses_latest_record)
assert new_state
Copy link
Contributor

@keu keu Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add text to each of this asserts explaining what you expect, like this:

assert new_state, "get_updated_state should return non empty state"

streams = source.streams(config=config)
surveys_instance = [i for i in streams if i.__class__.__name__ == "Surveys"][0]
survey_responses_instance = [i for i in streams if i.__class__.__name__ == "SurveyResponses"][0]
with vcr.use_cassette("pretty_name.yaml", record_mode="new_episodes"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make it a unit test you need to either:

  1. mock requests or other "network" calls inside get_updated_state
  2. or, like you did here, use vcr, but you should not record any new episodes but rather assert on all network calls that is not in pretty_name.yaml file (btw, the name is not pretty, it is confusing). Of course you need to commit yaml file as well if you use vcr.

I vote for option 1, as it is easier to maintain

source = SourceSurveymonkey()
config = json.load(open("secrets/config.json"))
streams = source.streams(config=config)
stream = [i for i in streams if i.__class__.__name__ == "Surveys"][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very convoluted way to create a Surveys stream. Why not do:

stream = Surveys(<args>) 

?

Also we should parametrize this test, and test for empty state

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 1, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/989780116
✅ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/989780116

config = json.load(open("secrets/config.json"))
streams = source.streams(config=config)
stream = [i for i in streams if i.__class__.__name__ == "Surveys"][0]
config = json.load(open("secrets/config.json")) # not used but required as __init__ arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this, just pass anythinh there

@@ -0,0 +1,63 @@
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename this file to test_streams.py or something.
it is always good to name your test as action, also unit in the name is too broad

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 2, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/993578027
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/993578027

(
{"date_modified": "2021-06-10T11:02:01"},
{"title": "test", "id": 100500, "date_modified": "2021-06-09T11:02:01"},
# {"title": "test", "id": 100500, "date_modified": "2021-06-10T11:02:02"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it commented out? delete commented out lines if not necessary

"""
We need to cache all requests to all endpoints during iteration.
This API is very very rate limited, we need to reuse everything possible.
"Cause an error to be raised for new requests if there is a cassette file" is not that I need.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Cause an error to be raised for new requests if there is a cassette file" is not that I need.
We use the "new_episodes" record mode to save and reuse all requests in slices, details, etc..

We need to cache all requests to all endpoints during iteration.
This API is very very rate limited, we need to reuse everything possible.
"Cause an error to be raised for new requests if there is a cassette file" is not that I need.
I need to save all of them and reuse all of them. In stream slices, in details, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
I need to save all of them and reuse all of them. In stream slices, in details, etc.

This API is very very rate limited, we need to reuse everything possible.
"Cause an error to be raised for new requests if there is a cassette file" is not that I need.
I need to save all of them and reuse all of them. In stream slices, in details, etc.
Thats why `new_episodes` record mode is using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Thats why `new_episodes` record mode is using

),
],
)
def test_get_updated_state_lesser(self, current_state, lesser_date_record, expected_state):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 5, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000524418
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000524418

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 5, 2021

/publish connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000556023
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000556023

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 5, 2021

/test connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000595743
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000595743

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 5, 2021

/publish connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000615186
❌ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1000615186

@vovavovavovavova
Copy link
Contributor Author

vovavovavovavova commented Jul 5, 2021

/publish connector=connectors/source-surveymonkey

🕑 connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1001651770
✅ connectors/source-surveymonkey https://github.com/airbytehq/airbyte/actions/runs/1001651770

@vovavovavovavova vovavovavovavova merged commit 08b32d1 into master Jul 5, 2021
@vovavovavovavova vovavovavovavova deleted the valdemar/#2968_new_source_surveymonkey branch July 5, 2021 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Source: Survey Monkey
6 participants