🎉 New Source: New York Times [low-code cdk] #18746

Merged
merged 12 commits into from
Nov 10, 2022

Conversation

Xabilahu
Contributor

@Xabilahu Xabilahu commented Nov 1, 2022

What

New Source: New York Times. https://developer.nytimes.com

Screenshot from 2022-11-01 01-59-50

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the connector is published, connector added to connector index as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Screenshot from 2022-11-01 02-00-06

Member

@marcosmarxm marcosmarxm left a comment

Hello @Xabilahu, Marcos from Airbyte here 👋. We received more than 25 new contributions over the weekend, and one of them is yours 🎉 Thank you so much! Our team is small, so the review process may take longer than expected. As described in Airbyte's Hacktoberfest, your contribution was submitted before November 2nd and is eligible to win the prize. The review process will validate the other requirements. Please be patient until someone from the team reviews it.

Having reviewed several Hacktoberfest contributions so far, I have seen some common patterns you can check in advance:

  • Make sure you have added connector documentation to /docs/integrations/
  • Remove the file catalog from /integration_tests
  • Edit the sample_config.json inside /integration_tests
  • For the configured_catalog you can use only json_schema: {}
  • Add title to all properties in the spec.yaml
  • Make sure the documentationUrl in the spec.yaml points to Airbyte's future connector page, e.g. for the Airtable connector the documentationUrl is https://docs.airbyte.com/integrations/sources/airtable
  • Review missing newlines at EOF (end-of-file) in all files.

If possible, send me a DM in Slack with the test credentials; this will make it easier for us to run integration tests and publish your connector. If you only have production keys, make sure to create a bootstrap.md explaining how to get the keys.

@marcosmarxm marcosmarxm changed the title from "🎉 New Source: New York Times" to "🎉 New Source: New York Times [low-code cdk]" Nov 1, 2022
@Xabilahu
Contributor Author

Xabilahu commented Nov 1, 2022

The secrets/config.json used for the integration tests is:

{
  "api_key": "YOUR API KEY",
  "year": 2022,
  "month": 6,
  "period": 7,
  "shared_type": "facebook"
}

Instructions on how to get the API key are in docs/integrations/sources/nytimes.md.
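
As a rough illustration of how these config values might map onto requests, the sketch below builds URLs for the public NYT Archive and Most Popular endpoints. The helper names (`archive_url`, `most_popular_url`) are hypothetical, and the URL layout is an assumption based on the NYT developer portal rather than on this connector's code.

```python
from urllib.parse import urlencode

# Assumption: base URL of the public NYT APIs, not taken from this PR.
BASE_URL = "https://api.nytimes.com/svc"


def archive_url(config: dict) -> str:
    """Archive API: one request covers a single (year, month) pair."""
    query = urlencode({"api-key": config["api_key"]})
    return f"{BASE_URL}/archive/v1/{config['year']}/{config['month']}.json?{query}"


def most_popular_url(config: dict, kind: str) -> str:
    """Most Popular API: kind is 'emailed', 'shared' or 'viewed'."""
    path = f"{BASE_URL}/mostpopular/v2/{kind}/{config['period']}"
    # Assumption: shared_type only applies to the 'shared' stream.
    if kind == "shared" and config.get("shared_type"):
        path += f"/{config['shared_type']}"
    query = urlencode({"api-key": config["api_key"]})
    return f"{path}.json?{query}"


config = {
    "api_key": "YOUR API KEY",
    "year": 2022,
    "month": 6,
    "period": 7,
    "shared_type": "facebook",
}
print(archive_url(config))
print(most_popular_url(config, "shared"))
```

Because the Archive endpoint is keyed by year and month, a full history would need one request per month.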

@Xabilahu
Contributor Author

Xabilahu commented Nov 1, 2022

Hi @marcosmarxm, thanks for the update! Below is the status of the checklist:

  • Make sure you have added connector documentation to /docs/integrations/
  • Remove the file catalog from /integration_tests
  • Edit the sample_config.json inside /integration_tests
  • For the configured_catalog you can use only json_schema: {}
  • Add title to all properties in the spec.yaml
  • Make sure the documentationUrl in the spec.yaml points to Airbyte's future connector page, e.g. for the Airtable connector the documentationUrl is https://docs.airbyte.com/integrations/sources/airtable
  • Review missing newlines at EOF (end-of-file) in all files.

Member

@marcosmarxm marcosmarxm left a comment

Some comments.

Comment on lines 16 to 28
year:
  type: integer
  title: Year
  description: Year
  minimum: 1851
  order: 1
month:
  type: integer
  title: Month
  description: Month
  minimum: 1
  maximum: 12
  order: 2
Member

I think these two fields should be replaced with a start_date, and the connector should implement incremental syncs. Let me know if you need help here.

Contributor Author

I agree that this would be a good idea. I have been looking at datetime_stream_slicer.py, and it does not seem to support monthly increments, which this use case needs. I think I can contribute that change as well.
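
To make the idea of monthly increments concrete, here is a minimal sketch that expands a start and end month into one slice per month. It is plain Python, not the CDK's datetime_stream_slicer.py; the function name `month_slices` and the "YYYY-MM" string format are assumptions for illustration.

```python
def month_slices(start: str, end: str) -> list:
    """Return every 'YYYY-MM' month from start to end, inclusive."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    # Count months from year 0 so that year rollovers need no special casing.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    return [f"{i // 12:04d}-{i % 12 + 1:02d}" for i in range(first, last + 1)]


print(month_slices("2021-11", "2022-02"))
```

Each returned month could then drive one request to a month-keyed endpoint such as the archive stream.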

@Xabilahu
Contributor Author

Xabilahu commented Nov 2, 2022

Acceptance tests are failing because they depend on the changes in #18861, where I add support for monthly and yearly incremental updates, and that PR has not been merged yet.

That said, the implementation seems to work when I run the following command under a custom installation that includes the changes in #18861:

python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json

The secrets/config.json used has now changed:

{
  "api_key": "API-KEY",
  "start_date": "2021-06",
  "end_date": "2021-08",
  "period": 7,
  "shared_type": "facebook"
}
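
For readers checking their own config against this new shape, the following is a small validation sketch. It is not part of the connector: `validate_config` is a hypothetical helper, and the allowed period values (1, 7, 30) are an assumption based on the NYT Most Popular API rather than on this PR's spec.yaml.

```python
import re

# 'YYYY-MM' with a month between 01 and 12.
MONTH_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")
# Assumption: periods accepted by the NYT Most Popular API.
ALLOWED_PERIODS = {1, 7, 30}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means the config looks fine."""
    errors = []
    for key in ("start_date", "end_date"):
        value = config.get(key)
        if value is not None and not MONTH_PATTERN.match(str(value)):
            errors.append(f"{key} must look like YYYY-MM, got {value!r}")
    start, end = config.get("start_date"), config.get("end_date")
    # Zero-padded 'YYYY-MM' strings sort chronologically, so plain comparison works.
    if not errors and start and end and start > end:
        errors.append("start_date must not be after end_date")
    if config.get("period") not in ALLOWED_PERIODS:
        errors.append("period must be one of 1, 7 or 30")
    return errors


print(validate_config({"start_date": "2021-06", "end_date": "2021-08", "period": 7}))
```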

@marcosmarxm
Member

Thanks @Xabilahu, this is amazing. Let's wait for #18861.

@marcosmarxm
Member

Hello! I'm going to be out of the office this Friday and won't be able to review your contribution again today; I return next Monday. So far, most contributions look solid and are almost ready to be approved. As said in Chris' comment, all contributions made before November 2nd are eligible to receive the prize and have 2 weeks to be merged. I assure you that we're going to have your contribution merged next week. If you have questions about the implementation, you can send them in #hacktoberfest-2022 in Airbyte's Slack.

Sorry for the inconvenience and see you again next week. Thank you so much for your contribution!

@marcosmarxm
Member

@Xabilahu I'll review this again later today. The tests are failing on my side.

@Xabilahu
Contributor Author

Xabilahu commented Nov 8, 2022

@marcosmarxm Tests should be passing now. I removed the most_popular streams from expected_records.txt, as the items they return change constantly.
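
A change like this could be scripted roughly as follows. This is a sketch under assumptions: it treats expected_records.txt as one JSON object per line with a "stream" field, and it assumes the volatile streams share a "most_popular" name prefix; `drop_streams` is a hypothetical helper, not something from this PR.

```python
import json


def drop_streams(lines, excluded_prefix="most_popular"):
    """Keep only record lines whose stream name does not start with the prefix."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not record.get("stream", "").startswith(excluded_prefix):
            kept.append(line.rstrip("\n"))
    return kept


sample = [
    '{"stream": "archive", "data": {"word_count": 861}}\n',
    '{"stream": "most_popular_viewed", "data": {}}\n',
]
print(drop_streams(sample))
```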

@Xabilahu Xabilahu requested a review from marcosmarxm November 8, 2022 19:10
@marcosmarxm
Member

marcosmarxm commented Nov 9, 2022

/test connector=connectors/source-nytimes

🕑 connectors/source-nytimes https://github.com/airbytehq/airbyte/actions/runs/3431984305
❌ connectors/source-nytimes https://github.com/airbytehq/airbyte/actions/runs/3431984305
🐛

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream archi...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:65: The previous connector image could not be retrieved.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:243: The previous connector image could not be retrieved.
============= 1 failed, 26 passed, 2 skipped in 235.10s (0:03:55) ==============

@Xabilahu
Contributor Author

Xabilahu commented Nov 9, 2022

@marcosmarxm The acceptance tests need to be run with this config:

{
  "api_key": "API-KEY-HERE",
  "start_date": "2021-06",
  "end_date": "2021-06",
  "period": 7,
  "shared_type": "facebook"
}

@marcosmarxm
Member

@Xabilahu I'm running with:

{
  "api_key": "api_key",
  "start_date": "2022-10",
  "end_date": "2022-11",
  "period": 7
}

@marcosmarxm
Member

The problem is with the archive endpoint, which has a null primary key.
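
One quick way to test a hypothesis like this locally is to scan the records a stream returns for a missing or null key. The sketch below is illustrative only: `records_missing_key` is a hypothetical helper, and the `_id` field name is an assumption, since the thread does not say which field is the archive stream's primary key.

```python
def records_missing_key(records, primary_key="_id"):
    """Return the records whose primary key is absent or null."""
    return [record for record in records if record.get(primary_key) is None]


sample = [
    {"_id": "nyt://article/example", "word_count": 861},
    {"_id": None, "word_count": 0},
    {"word_count": 12},
]
print(len(records_missing_key(sample)))
```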

@Xabilahu
Contributor Author

Xabilahu commented Nov 9, 2022

@Xabilahu I'm running with:

{
  "api_key": "api_key",
  "start_date": "2022-10",
  "end_date": "2022-11",
  "period": 7
}

expected_records.txt will need to be changed accordingly then

@Xabilahu
Contributor Author

Xabilahu commented Nov 9, 2022

The problem is with the archive endpoint, which has a null primary key.

@marcosmarxm From the logs, I see that it failed at the expected-record validation step for the archive stream:

2022-11-09T21:46:51.8622548Z         if expected_records_by_stream:
2022-11-09T21:46:51.8623286Z >           self._validate_expected_records(
2022-11-09T21:46:51.8624123Z                 records=records,
2022-11-09T21:46:51.8624911Z                 expected_records_by_stream=expected_records_by_stream,
2022-11-09T21:46:51.8625729Z                 flags=expect_records_config,
2022-11-09T21:46:51.8626472Z                 detailed_logger=detailed_logger,
2022-11-09T21:46:51.8627111Z             )
2022-11-09T21:46:51.8627467Z 
2022-11-09T21:46:51.8628454Z /usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:555: 
2022-11-09T21:46:51.8629280Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-11-09T21:46:51.8630522Z /usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:469: in _validate_expected_records
2022-11-09T21:46:51.8631403Z     self.compare_records(
2022-11-09T21:46:51.8632059Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-11-09T21:46:51.8632434Z 
2022-11-09T21:46:51.8632837Z stream_name = 'archive'
2022-11-09T21:46:51.8634615Z actual = ***'abstract': 'Houston and Philadelphia will face off in a matchup of teams that have caught fire in the postseason. G...6-2181-5a95-9b42-4d33a378cfd0', 'word_count': 0, 'uri': 'nyt://interactive/f0731f66-2181-5a95-9b42-4d33a378cfd0'***, ...***
2022-11-09T21:46:51.8637023Z expected = ***'abstract': 'A laborer discovered the fossil and hid it in a well for 85 years. Scientists say it could help sort ou...9bb-72b7-5cf6-a981-f9e987dde7d6', 'word_count': 861, 'uri': 'nyt://article/fb9549bb-72b7-5cf6-a981-f9e987dde7d6'***, ...***

Could you try to run it with the provided config?

@marcosmarxm
Member

Can you remove the expected records? They are mostly used for local/dev validation.

@marcosmarxm
Member

marcosmarxm commented Nov 10, 2022

/test connector=connectors/source-nytimes

🕑 connectors/source-nytimes https://github.com/airbytehq/airbyte/actions/runs/3433129390
✅ connectors/source-nytimes https://github.com/airbytehq/airbyte/actions/runs/3433129390
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       133      3    98%   87, 93, 230
	 source_acceptance_test/conftest.py                     196     92    53%   35, 41-43, 48, 54, 60, 66, 72-74, 93, 98-100, 106-108, 114-115, 120-121, 126, 132, 141-150, 156-161, 176, 200, 231, 237, 243-248, 256-261, 269-282, 287-293, 300-311, 318-334
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              345    110    68%   53, 64-72, 77-84, 88-89, 93-94, 178, 216-233, 242-250, 254-259, 265, 298-303, 341-348, 391-393, 396, 461-469, 481-484, 489, 545-546, 552, 555, 591-601, 614-639
	 source_acceptance_test/tests/test_incremental.py       145     20    86%   21-23, 29-31, 36-43, 48-61, 224
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/config_migration.py        23     23     0%   5-37
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-68, 71-73, 76-78, 81-83, 86-88, 91-93, 96-114, 148-150
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1512    375    75%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:65: The previous connector image could not be retrieved.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/tests/test_core.py:243: The previous connector image could not be retrieved.
================== 27 passed, 2 skipped in 182.02s (0:03:02) ===================

@marcosmarxm
Member

marcosmarxm commented Nov 10, 2022

/publish connector=connectors/source-nytimes

🕑 Publishing the following connectors:
connectors/source-nytimes
https://github.com/airbytehq/airbyte/actions/runs/3435486255


Connector Did it publish? Were definitions generated?
connectors/source-nytimes

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@marcosmarxm marcosmarxm merged commit b70b6ec into airbytehq:master Nov 10, 2022
@Xabilahu Xabilahu deleted the ny-times branch November 10, 2022 10:11
akashkulk pushed a commit that referenced this pull request Dec 2, 2022
* Initial implementation: Support for `archive` stream

* Added support for `most_popular` streams (emailed, shared, viewed)

* Add `expected_records.txt` for acceptance tests

* Added Documentation

* Updated changelog with PR id

* Add support for incremental syncs

* Reduce size and remove most_popular streams from expected_records.txt

* Remove `expected_records.txt`

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
@sajarin sajarin added internal and removed bounty labels Dec 5, 2022