Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Netsuite #13086

Closed
wants to merge 37 commits into from

Conversation

wjwatkinson
Copy link
Contributor

@wjwatkinson wjwatkinson commented May 23, 2022

What

Netsuite Connector

How

Add Netsuite Connector using python cdk

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit
pipenv run pytest -s unit_tests/
Test session starts (platform: darwin, Python 3.9.12, pytest 6.2.5, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /Users/willwatkinson/src/github.com/airbytehq/airbyte, configfile: pytest.ini
plugins: sugar-0.9.4, requests-mock-1.9.3, mock-3.6.1, timeout-1.4.2
collecting ...
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_cursor_field ✓5% ▌
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_get_updated_state ✓10% ▉
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_supports_incremental ✓14% █▌
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_source_defined_cursor ✓19% █▉
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_request_params ✓24% ██▍
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_stream_slices_start_datetime ✓29% ██▉
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py::test_stream_slices_stream_state ✓33% ███▍
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_source.py::test_check_connection ✓  38% ███▊
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_source.py::test_check_connection_record_types ✓43% ████▍
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_source.py::test_streams ✓           48% ████▊
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_request_params ✓   52% █████▎
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_next_page_token ✓  57% █████▊
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_format_date ✓      62% ██████▎
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_fetch_record ✓     67% ██████▋
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_parse_response ✓   71% ███████▎
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_http_method ✓      76% ███████▋
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_should_retry[HTTPStatus.OK-False] ✓81% ████████▏
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_should_retry[HTTPStatus.BAD_REQUEST-False] ✓86% ████████▋
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_should_retry[HTTPStatus.TOO_MANY_REQUESTS-True] ✓90% █████████▏
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_should_retry[HTTPStatus.INTERNAL_SERVER_ERROR-True] ✓95% █████████▌
 airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py::test_backoff_time ✓    100% ██████████
================================================ warnings summary =================================================
airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py: 7 warnings
airbyte-integrations/connectors/source-netsuite/unit_tests/test_source.py: 2 warnings
airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py: 11 warnings
  /Users/willwatkinson/.local/share/virtualenvs/source-netsuite-AEapHDjm/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py:43: DeprecationWarning: Call to deprecated class NoAuth. (Set `authenticator=None` instead) -- Deprecated since version 0.1.20.
    self._authenticator: HttpAuthenticator = NoAuth()

airbyte-integrations/connectors/source-netsuite/unit_tests/test_incremental_streams.py: 7 warnings
airbyte-integrations/connectors/source-netsuite/unit_tests/test_source.py: 2 warnings
airbyte-integrations/connectors/source-netsuite/unit_tests/test_streams.py: 11 warnings
  /Users/willwatkinson/.local/share/virtualenvs/source-netsuite-AEapHDjm/lib/python3.9/site-packages/deprecated/classic.py:173: DeprecationWarning: Call to deprecated class HttpAuthenticator. (Use requests.auth.AuthBase instead) -- Deprecated since version 0.1.20.
    return old_new1(cls, *args, **kwargs)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

Results (0.13s):
      21 passed
Integration

Put your integration tests output here.

Acceptance
docker  build . -t airbyte/source-netsuite:dev && pipenv run python -m pytest -p integration_tests.acceptance
[+] Building 0.9s (16/16) FINISHED
 => [internal] load build definition from Dockerfile                                                          0.0s
 => => transferring dockerfile: 37B                                                                           0.0s
 => [internal] load .dockerignore                                                                             0.0s
 => => transferring context: 34B                                                                              0.0s
 => [internal] load metadata for docker.io/library/python:3.9.11-alpine3.15                                   0.7s
 => [base 1/1] FROM docker.io/library/python:3.9.11-alpine3.15@sha256:45ddd216e6b4efee0617e15d541e9148ffd689  0.0s
 => [internal] load build context                                                                             0.0s
 => => transferring context: 9.42kB                                                                           0.0s
 => CACHED [builder 1/4] WORKDIR /airbyte/integration_code                                                    0.0s
 => CACHED [builder 2/4] RUN apk --no-cache upgrade     && pip install --upgrade pip     && apk --no-cache a  0.0s
 => CACHED [builder 3/4] COPY setup.py ./                                                                     0.0s
 => CACHED [builder 4/4] RUN pip install --prefix=/install .                                                  0.0s
 => CACHED [stage-2 2/7] COPY --from=builder /install /usr/local                                              0.0s
 => CACHED [stage-2 3/7] COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime                       0.0s
 => CACHED [stage-2 4/7] RUN echo "Etc/UTC" > /etc/timezone                                                   0.0s
 => CACHED [stage-2 5/7] RUN apk --no-cache add bash                                                          0.0s
 => CACHED [stage-2 6/7] COPY main.py ./                                                                      0.0s
 => CACHED [stage-2 7/7] COPY source_netsuite ./source_netsuite                                               0.0s
 => exporting to image                                                                                        0.0s
 => => exporting layers                                                                                       0.0s
 => => writing image sha256:64c471617ac2adc40d4c89da006bb7a2f61d1a7f7337640d3cc61d9e9911a78d                  0.0s
 => => naming to docker.io/airbyte/source-netsuite:dev                                                        0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
Test session starts (platform: darwin, Python 3.9.12, pytest 6.2.5, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /Users/willwatkinson/src/github.com/airbytehq/airbyte, configfile: pytest.ini
plugins: sugar-0.9.4, requests-mock-1.9.3, mock-3.6.1, timeout-1.4.2
collecting ...
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_config_match_spec[inputs0] ✓5% ▌
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_match_expected[inputs0] ✓9% ▉
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_docker_env[inputs0] ✓14% █▍
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_oneof_usage[inputs0] ✓18% █▊
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_required[inputs0] ✓23% ██▍
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_optional[inputs0] ✓27% ██▊
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_has_secret[inputs0] ✓32% ███▎
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_secret_never_in_the_output[inputs0] ✓36% ███▋
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_defined_refs_exist_in_json_spec_file[inputs0] ✓41% ████▏
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestSpec.test_oauth_flow_parameters[inputs0] ✓45% ████▋
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestConnection.test_check[inputs0] ✓50% █████
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestConnection.test_check[inputs1] ✓55% █████▌
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_discover[inputs0] ✓59% █████▉
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_defined_cursors_exist_in_schema[inputs0] ✓64% ██████▍
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_defined_refs_exist_in_schema[inputs0] ✓68% ██████▊
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_defined_keyword_exist_in_schema[inputs0-allOf] ✓73% ███████▍
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_defined_keyword_exist_in_schema[inputs0-not] ✓77% ███████▊
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestDiscovery.test_primary_keys_exist_in_schema[inputs0] ✓82% ████████▎
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py::TestBasicRead.test_read[inputs0] ✓86% ████████▋
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_full_refresh.py::TestFullRefresh.test_sequential_reads[inputs0] ✓91% █████████▏
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_incremental.py::TestIncremental.test_two_sequential_reads[inputs0] ✓95% █████████▋
 airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_incremental.py::TestIncremental.test_state_with_abnormally_large_values[inputs0] ✓100% ██████████
{"type": "LOG", "log": {"level": "INFO", "message": "/Users/willwatkinson/src/github.com/airbytehq/airbyte/airbyte-integrations/connectors/source-netsuite - SAT run - 31fb32ad5689700f182a1703c832903a27d91347 - PASSED"}}

================================================ warnings summary =================================================
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py: 16 warnings
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_full_refresh.py: 1 warning
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_incremental.py: 2 warnings
  /Users/willwatkinson/.local/share/virtualenvs/source-netsuite-AEapHDjm/lib/python3.9/site-packages/docker/utils/utils.py:52: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    s1 = StrictVersion(v1)

airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py: 16 warnings
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_full_refresh.py: 1 warning
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_incremental.py: 2 warnings
  /Users/willwatkinson/.local/share/virtualenvs/source-netsuite-AEapHDjm/lib/python3.9/site-packages/docker/utils/utils.py:53: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    s2 = StrictVersion(v2)

-- Docs: https://docs.pytest.org/en/stable/warnings.html

Results (567.88s):
      22 passed

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels May 23, 2022
@wjwatkinson
Copy link
Contributor Author

The acceptance tests are failing because of timeouts. I think this is due to the connector dynamically pulling all of the objects and schemas for Netsuite, which takes a long time. This is necessary since the schemas are Netsuite instance specific. Netsuite allows the addition of custom fields and record types, which are captured in the dynamic schemas generated by Netsuite, but couldn't be with hard coded schemas.

Any ideas on how to fix the timeout issues while still pulling the schemas dynamically?

@wjwatkinson
Copy link
Contributor Author

I added an array property where users can specify the record types they want. This leads to a better user experience as it speeds up the source retrieval process dramatically and makes navigating the available streams much simpler. Setting this before running the integration tests resolves the timeout issue.

Closing this PR for now, though, as there are some other issues with the source that need to be addressed.

@wjwatkinson
Copy link
Contributor Author

All the issues with the acceptance tests have been addressed.

@wjwatkinson wjwatkinson reopened this May 25, 2022
@marcosmarxm
Copy link
Member

Thanks for this amazing contributon @wjwatkinson ! I'll ask the team to review it.

Comment on lines +42 to +46
record_types:
type: array
items:
type: string
description: The API names of the Netsuite objects you want to sync. Setting this speeds up the connection setup process by limiting the number of schemas that need to be retrieved from Netsuite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you need to specify what schema/streams do you want to sync? maybe give the option to sync aditional streams BUT the connector need the default ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not specify anything the connector will pull all objects present in the user's Netsuite instance. This takes a really long time, though, which is not a great experience and caused acceptance tests to fail.

@wjwatkinson wjwatkinson force-pushed the source-netsuite branch 2 times, most recently from 35e1b1c to 4974465 Compare June 22, 2022 15:30
@wjwatkinson wjwatkinson marked this pull request as ready for review July 1, 2022 19:33
@wjwatkinson
Copy link
Contributor Author

FWIW I realized recently that there is an Airbyte JDBC source framework (I missed it because it's not in the CDK documentatin). This is likely a cleaner and faster way to retrieve Netsuite data.

I'm open to renaming this connector to Netsuite REST, or something like that to leave the Netsuite name open for a JDBC connector in the future.

@marcosmarxm
Copy link
Member

Hello @wjwatkinson sorry the delay in return to you. I'll ask the connector team to review and validate your contribution.

@marcosmarxm marcosmarxm changed the title New Source: Netsuite 🎉 New Source: Netsuite Aug 8, 2022
@misteryeo
Copy link
Contributor

Thanks @marcosmarxm!

Hi @wjwatkinson, thank you so much for your contribution. Our team was actually also looking at creating this connector so I'm tagging in @drrest who was looking at it so he can not only review your contribution but can also make suggested updates based on the research we've done on our end.

@swatosh
Copy link

swatosh commented Sep 21, 2022

Has this been overtaken by #16093?

@bazarnov
Copy link
Collaborator

Closing this, as of #16093 was merged already.

@bazarnov bazarnov closed this Sep 22, 2022
@wjwatkinson
Copy link
Contributor Author

Unfortunate that #16093 has the same architecture as I used here and therefore the same limitations in performance and schema retrieval.

@misteryeo
Copy link
Contributor

Are you able to specify the performance and schema retrieval limitations you ran into? This is helpful feedback for us to keep improving the connector during the alpha period @wjwatkinson

cc: @YowanR

@wjwatkinson
Copy link
Contributor Author

I mentioned this to @YowanR and others in a slack thread where I was helping with Netsuite auth. The Netsuite REST API query returns only record ids, so you have to make 1 request per record to Netsuite to get the data. This results in very slow extraction.

There are further performance issues mentioned in this PR to do with retrieving schemas. I resolved these by passing in an array of object names, so fewer schemas are retrieved, but it's annoying. It looks like this same approach was taken in the merged source.

Netsuite offers a JDBC connector and it looks like Airbyte has a JDBC source framework, which I mentioned in a comment here, that seems like it would have better performance, but I am not very familiar with the technology.

@bazarnov
Copy link
Collaborator

I agree the SOAP or REST approaches are weak in terms of performance in general, REST allows us to fetch all possible objects and use them as streams of data. Probably the JDBC approach will give us the opportunity to work with Netsuite as Database, if so - the schema retrieval should be much faster, not sure about the streaming performance, but definitely worth trying.

@wjwatkinson
Copy link
Contributor Author

Generally I have found the performance of JSON APIs performant enough for us, but they return 100+ records per call making them 100+x more efficient than the Netsuite API.

@bazarnov
Copy link
Collaborator

JSON APIs

@wjwatkinson
Could you please give more context about JSON APIs ?

@wjwatkinson
Copy link
Contributor Author

Netsuite's queries only return record ids, so you have to make an additional request per record. 1000 records = 1001 requests.

All of the other APIs I work with through Airbyte return the data directly in the query, so one request returns 100 records. 1000 records = 10 requests.

If you're asking what I mean by JSON APIs I just mean APIs that return JSON data. REST has a specific definition JSON APIs generally don't meet, although I think Netsuite's REST API does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community connectors/source/netsuite internal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants