-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAT: check that previous config schema validates against current connector spec #15367
SAT: check that previous config schema validates against current connector spec #15367
Conversation
@@ -21,8 +21,10 @@ | |||
"jsonschema~=3.2.0", | |||
"jsonref==0.2", | |||
"deepdiff~=5.8.0", | |||
"requests-mock", | |||
"requests-mock~=1.9.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to freeze the version and avoid accidental breaking updates.
return DeepDiff( | ||
previous_connector_spec.dict()["connectionSpecification"], | ||
actual_connector_spec.dict()["connectionSpecification"], | ||
view="tree", | ||
ignore_order=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pedroslopez I changed this to only compute the diff on the connectionSpecification
field of the spec. I think this is more robust as this is in this field that we want to perform the validation, not the other technical field that exists and might be changed.
spec_diff = t.compute_spec_diff(actual_connector_spec, previous_connector_spec) | ||
with expectation: | ||
t.test_new_required_property(spec_diff) | ||
FAILING_TRANSITIONS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pedroslopez let me know if you find this structure more readable.
I did not add a lot of new tests cases and actually removed some whose json schema was not valid (making hypothesis-jsonschema fail). (e.g I changed "type": []
to "type": ["string", "string"]
etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel kind of neutral about it... The transitions vs specifying the params don't seem much different to me - e.g . they both define a before and after spec, a description and in some way indicate if it should fail or not, so I'm not sure what we're gaining with the Transitions.
I suppose one thing that was kind of useful in the previous implementation is that the test cases were separated by what they were testing rather than having them all together in a single list. For example, cases related to verifying if enums have been introduced were grouped together, cases related to required fields were grouped together, etc. Which may be a bit easier to think about but also makes sure we have sufficient tests for a particular problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel kind of neutral about it... The transitions vs specifying the params don't seem much different to me - e.g . they both define a before and after spec, a description and in some way indicate if it should fail or not, so I'm not sure what we're gaining with the Transitions.
You're right there are not much difference, but the two advantages I found are:
- You can use named arguments on data class instantiation; you can't with pytest params. E.G: We can't do pytest.param(previous_spec,currrent_spec, should_fail=True). This is minor, but I found it increases test case readability when you have to scroll through many test cases.
- If we want to serialize the spec tests case to JSON (or any other format) in the future, we could easily add a serialization function to this data class.
Let me know if you think it's overkilled.
I suppose one thing that was kind of useful in the previous implementation is that the test cases were separated by what they were testing rather than having them all together in a single list. For example, cases related to verifying if enums have been introduced were grouped together, cases related to required fields were grouped together, etc. Which may be a bit easier to think about but also makes sure we have sufficient tests for a particular problem.
Yes, you are right. I could split the big list into smaller scoped chunks, but on the other hand, I find it convenient for future maintenance to keep it like this. Let's say you wonder if a transition you consider failing is covered: simply add this transition to the FAILING_TRANSITIONS
list, and you won't have to wonder to what list it should be added or create a new dedicated list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice to see that adding this additional tests ended up being pretty straightforward 😄
I'm not sure I entirely see the benefit of introducing the Transitions class vs using parametrized tests like elsewhere, but I left more details on this in the comments.
Running the unit tests on SAT takes a bit more time as hypothesis generates a lot of fake data for each backward compatibility test case.
How big is the impact here?
previous_connector_spec: ConnectorSpecification, | ||
) -> DeepDiff: | ||
assert isinstance(actual_connector_spec, ConnectorSpecification) and isinstance(previous_connector_spec, ConnectorSpecification) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this fixture no longer return the spec diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I did not clean this very well.
assert isinstance(actual_connector_spec, ConnectorSpecification) and isinstance(previous_connector_spec, ConnectorSpecification) | ||
spec_diff = self.compute_spec_diff(actual_connector_spec, previous_connector_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this here instead of using the spec_diff
fixture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to discard spec_diff
fixture for the moment because:
- No other tests are using it.
- The signature of
test_backward_compatibility
was changed to useactual_connector_spec
andprevious_connector_spec
(required forvalidate_previous_configs
). Using aspec_diff
fixture was convenient to have a lighter signature ontest_backward_compatibility
that only requiredspec_diff
fixture, but now that I addedvalidate_previous_configs
, I find it overkill to add a dedicatedspec_diff
fixture that takes the same inputs as the actual test.
checker = SpecDiffChecker(spec_diff) | ||
checker.check() | ||
validate_previous_configs(previous_connector_spec, actual_connector_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious on why the move to a single test case - is this to prevent further checks from running if a previous check fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it for three reasons:
- Simplifying unit testing: we can create many backward compatible/incompatible specs and throw them against the same function.
- Limiting the growth of the
TestSpec
class: I preferred to limit the number of lines added totest_core
for code readability and defer the actual backward compatibility check logic to a separate module/class. - Logs readability: Make test outputs more readable by grouping all the
backward_compatibility
checks in a single test run: in the logs, you'll have something liketest_backward_compatibility failed
.
Let me know if you think it's not the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when consolidating many unit tests, I think the most important feature to retain is the granularity of the error messages on each assertion, which this class does. So it makes sense with your reasoning. I like it
with expectation: | ||
t.test_additional_properties_is_true(connector_spec) | ||
@dataclass | ||
class Transition: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transition
feels a bit ambiguous, maybe something like SpecTransition
or similar would be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed in b8be912
spec_diff = t.compute_spec_diff(actual_connector_spec, previous_connector_spec) | ||
with expectation: | ||
t.test_new_required_property(spec_diff) | ||
FAILING_TRANSITIONS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel kind of neutral about it... The transitions vs specifying the params don't seem much different to me - e.g . they both define a before and after spec, a description and in some way indicate if it should fail or not, so I'm not sure what we're gaining with the Transitions.
I suppose one thing that was kind of useful in the previous implementation is that the test cases were separated by what they were testing rather than having them all together in a single list. For example, cases related to verifying if enums have been introduced were grouped together, cases related to required fields were grouped together, etc. Which may be a bit easier to think about but also makes sure we have sufficient tests for a particular problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, it seems like the connectors build is failing on a bunch of tests - we should totally fix those:
Results (18.52s):
211 passed
39 failed
- test_backward_compatibility.py:860 test_backward_compatibility[Top level: declaring the required field should fail.]
- test_backward_compatibility.py:860 test_backward_compatibility[Nested level: adding the required field should fail.]
- test_backward_compatibility.py:860 test_backward_compatibility[Top level: adding a new required property should fail.]
- test_backward_compatibility.py:860 test_backward_compatibility[Nested level: adding a new required property should fail.]
- test_backward_compatibility.py:860 test_backward_compatibility[Nullable: Making a field not nullable should fail (not in a list).]
.....
https://github.com/airbytehq/airbyte/runs/7694178244?check_suite_focus=true#step:12:8003
…om:airbytehq/airbyte into augustin/sat/test-old-config-with-new-spec
Should be fixed by f42aa45 🙏 |
I locally ran the unit tests on SAT:
This will make the Connector Base build take a bit longer in the CI. This duration can be reduced by limiting the number of generated fake configs (50 fake config: 21seconds, 10 fake configs: 4 seconds). If this is a problem we could decide to reduce the number of fake configs only for unit tests but keep 100 for the normal SAT run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice!
...grations/bases/source-acceptance-test/source_acceptance_test/utils/backward_compatibility.py
Outdated
Show resolved
Hide resolved
checker = SpecDiffChecker(spec_diff) | ||
checker.check() | ||
validate_previous_configs(previous_connector_spec, actual_connector_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when consolidating many unit tests, I think the most important feature to retain is the granularity of the error messages on each assertion, which this class does. So it makes sense with your reasoning. I like it
/publish connector=bases/source-acceptance-test auto-bump-version=false
if you have connectors that successfully published but failed definition generation, follow step 4 here |
The previous publish failed because the generation of fake config took too much time. I reduced the number of generated fake config in 3c5d24d for unit tests. |
/publish connector=bases/source-acceptance-test auto-bump-version=false
if you have connectors that successfully published but failed definition generation, follow step 4 here |
commit 10fb1dc137175d09826cdfcf419698e3000cd418 Author: pmossman <parker@airbyte.io> Date: Wed Aug 10 13:37:00 2022 -0700 format and pmd commit 7c223ec2e0abfec864c11395d7310d679706a49c Author: pmossman <parker@airbyte.io> Date: Wed Aug 10 13:08:09 2022 -0700 update peristStateActivity test commit 763e9e2c5ca5f998ab49ffd0dcafb7ae81201b2b Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 15:24:10 2022 -0700 format commit c176e63c3841a1f08c7a43359d293b12297e03e4 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 15:18:03 2022 -0700 move converters to module that worker can access, convert statePersistence calls to API calls, convert statePersistence helper to local private method commit 1b583487b4ea7dd058944cdbce4de6197f967523 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 10:37:00 2022 -0700 add createOrUpdateState API endpoint commit d87eed6215ce451a3e126d433991967317839876 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 13:42:16 2022 -0700 add AirbyteApiClient to WorkerApp for data plane workers to use commit a65524a Author: Teal Larson <LARSON.TEAL@GMAIL.COM> Date: Wed Aug 10 16:03:59 2022 -0400 🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426) * wip diff modal test setup * starting storybook add * storybook working now * cleanup * aria labels * test syncmode string commit 2f17e99 Author: Liren Tu <tuliren.git@outlook.com> Date: Wed Aug 10 13:02:01 2022 -0700 🐞 Postgres source: fix bug in intermediate state emission (#15496) * Rename record counter * Rename method * Emit intermediate state after all cursor records * Emit intermediate state only when it is ready * Merge two checks * Add a testing message * Fix unit tests * Add one more testing record and add comments * Add test case for multiple records with the same cursor value * Revert irrelevant change * Add explanation in javadoc * Format code * Rename testing methods * Fix comment * Bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit f540499 Author: Alexandre Girard <alexandre@airbyte.io> Date: Wed Aug 10 11:37:07 2022 -0700 [low-code connectors]: Assert there are no custom top-level fields (#15489) * move components to definitions field * Also update the references * validate the top level fields and add version * raise exception on unknown fields * newline * unit tests * set version to 0.1.0 * newline commit f52bfb6 Author: Xiaohan Song <xiaohan@airbyte.io> Date: Wed Aug 10 11:16:17 2022 -0700 change query frequency to 1hour (#15499) commit f143c8f Author: midavadim <midavadim@yahoo.com> Date: Wed Aug 10 21:13:51 2022 +0300 :tada: Source File - add support for custom encoding (#15293) * added support for custom encoding * fixed unit test for utf16 * updated docs * bumped connector version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit bbf3584 Author: Alexandre Girard <alexandre@airbyte.io> Date: Wed Aug 10 10:58:22 2022 -0700 Remove unused field from JsonSchema (#15425) * few fixes from working with sendgrid * reset to master * only update the docstring * reset commit a280113 Author: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com> Date: Wed Aug 10 20:44:51 2022 +0300 Destination S3: add LZO compression support (#15394) * Fixed bucket naming for S3 * Destination S3: add LZO compression support for parquet files * Destination S3: add LZO compression support for parquet files * implemented logic for aarch64 * removed redundant logging * updated changelog * moved intstall of native-lzo lib to Dockerfile * removed redundant logging * add unit test for aarch64 * bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit 29c3426 Author: sivankumar86 <sivankumar86@users.noreply.github.com> Date: Thu Aug 11 02:34:47 2022 +1000 Source MSSQL: special character support in dbname for CDC method (#15430) * information schema included * special character handle * Revert "information schema included" This reverts commit f0aee6a. * version change * doc update * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit 959d862 Author: Baz <oleksandr.bazarnov@globallogic.com> Date: Wed Aug 10 19:20:22 2022 +0300 🐛 Source SalesForce: changed `DEFAULT_WAIT_TIMEOUT_SECONDS` to 24-hour limit (#15444) commit 0092712 Author: Subodh Kant Chaturvedi <subodh1810@gmail.com> Date: Wed Aug 10 21:48:57 2022 +0530 fix postgres data handling from WAL logs in CDC mode (#15481) * fix postgres data handling from WAL logs in CDC mode * format * use formatter for dates also (#15485) * format * change test structure * change log to debug Co-authored-by: Edward Gao <edward.gao@airbyte.io> commit fdb5eb9 Author: Evan Tahler <evan@airbyte.io> Date: Wed Aug 10 09:03:02 2022 -0700 Simplify the `MigrationAcceptanceTest` (#15497) * disable `testAutomaticMigration` * empty commit to retry tests * Simplify the MigrationAcceptanceTest * lint * Fix PMD. Reorder some calls to make clear what is happening. Co-authored-by: Davin Chia <davinchia@gmail.com> commit fd70913 Author: Augustin <augustin.lafanechere@gmail.com> Date: Wed Aug 10 17:42:07 2022 +0200 SAT: compatibility tests for catalogs (#15486) commit 1ad5152 Author: Evan Tahler <evan@airbyte.io> Date: Wed Aug 10 08:21:52 2022 -0700 Disable automaticMigrationAcceptanceTest (#15492) * disable `testAutomaticMigration` * empty commit to retry tests commit 1228451 Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Date: Wed Aug 10 11:11:45 2022 -0400 Fix styles impacting global ul, li in FieldSection component (#15484) commit 6aa08e0 Author: Jonathan Pearlin <jonathan@airbyte.io> Date: Wed Aug 10 10:55:46 2022 -0400 Add micronaut dependencies and bundles (#15459) * Add micronaut dependencies and bundles * Update Micronaut core commit 7662956 Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Date: Wed Aug 10 10:51:43 2022 -0400 🪟 🧹 Cleanup documentation panel components (#15455) * Add docs/ to frontend workspace * Migrate Markdown components to scss and cleanup when not found is rendered * Add white-space: break-spaces rule to markdown code blocks commit 1258ab4 Author: Topher Lubaway <asimplechris@gmail.com> Date: Wed Aug 10 09:05:14 2022 -0500 Revert "Adds PAT check to shared pr check (#15453)" (#15511) This reverts commit 06a18d4. commit 853b88a Author: Kyryl Skobylko <xpuska513@gmail.com> Date: Wed Aug 10 16:48:20 2022 +0300 fix: fix gcs-log creds secret name, add externaldb configuration for temporal, fix webapp ingress (#15510) commit c782303 Author: Yatsuk Bogdan <yatsukbogdan@gmail.com> Date: Wed Aug 10 15:57:26 2022 +0300 :window: :art: Increases GroupTitle followed divs width from 180px to 250px (#13956) * Increases GroupControls followed divs width from 180px to 250px * Increases min-width for GroupTitle * Change layout to flexbox Co-authored-by: Tim Roes <tim@airbyte.io> commit e28bc3a Author: Serhii Chvaliuk <grubberr@gmail.com> Date: Wed Aug 10 13:55:29 2022 +0300 🎉Source Harvest: Added `parent_id` for all streams which have parent stream (#15221) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> commit aaa3aae Author: Tuhai Maksym <kimerinn@gmail.com> Date: Wed Aug 10 12:43:55 2022 +0300 15310: Destination Scylla: Handle per-stream state (#15399) * 15310: Destination Scylla: Handle per-stream state * 15399: test fix * 15318: test fix * 15318: updating version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit c724630 Author: Yurii Bidiuk <35812734+yurii-bidiuk@users.noreply.github.com> Date: Wed Aug 10 10:17:23 2022 +0300 Add test case for new fields appearing in data (#15372) * add test case for new field(s) appearing in data * rework test to verify that sync at least not failed if new fields are present commit 6e1a76f Author: Serhii Chvaliuk <grubberr@gmail.com> Date: Wed Aug 10 09:24:40 2022 +0300 🐛 Source Amazon Ads: define primary_key for all report streams (#15469) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> commit c1a0cbc Author: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com> Date: Wed Aug 10 04:20:12 2022 +0200 Bump Airbyte version from 0.39.42-alpha to 0.40.0-alpha (#15493) Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com> commit f6766ee Author: Benoit Moriceau <benoit@airbyte.io> Date: Wed Aug 10 07:50:41 2022 +0800 Revert "Revert "Release per stream to the OSS project (#15008)" (#15177)" (#15401) This reverts commit 362fc4e. commit eab0013 Author: Edward Gao <edward.gao@airbyte.io> Date: Tue Aug 9 16:13:09 2022 -0700 🐛 Source snowflake: int columns should be discovered as ints (#15314) * snowflake discovers ints as ints * version bump+changelog * bump version+changelog * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit f506c60 Author: Anne <102554163+alovew@users.noreply.github.com> Date: Tue Aug 9 16:07:35 2022 -0700 Track number of streams in syncs (#15478) * Add number_of_streams to job sync tracking commit 6c5d1ff Author: Augustin <augustin.lafanechere@gmail.com> Date: Wed Aug 10 00:33:58 2022 +0200 SAT: measure unit test coverage (#15443) commit e9afa9b Author: Anne <102554163+alovew@users.noreply.github.com> Date: Tue Aug 9 15:30:48 2022 -0700 Error Prone PMD rules (#15010) * Implement ErrorProne PMD rules: AssignmentInOperand AvoidAccessibilityAlteration AvoidBranchingStatementAsLastInLoop AvoidCatchingNPE AvoidCatchingThrowable AvoidDuplicateLiterals rule commit c536e51 Author: Tim Roes <tim@airbyte.io> Date: Wed Aug 10 00:11:12 2022 +0200 Fix copy link to logs functionality (#15368) * Fix copy link to logs functionality * Update airbyte-webapp/src/components/JobItem/JobItem.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Fix scrolling * Remove smooth scrolling * Improve effect for better return statements * Better scroll Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> commit 62303a8 Author: Augustin <augustin.lafanechere@gmail.com> Date: Tue Aug 9 23:07:13 2022 +0200 SAT: check that previous config schema validates against current connector spec (#15367) commit 123705c Author: Stephen Wentling <stephen@swentling.com> Date: Tue Aug 9 21:30:14 2022 +0100 Source Jira: Added updates to include issue components and fixes to README files (#15135) * solve readme conflict * updated jira sources with open PR details * correct additionalProperties test discover Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> commit 9e691d8 Author: Alex <109167606+alex-gron@users.noreply.github.com> Date: Tue Aug 9 14:28:38 2022 -0500 fix broken link (#15379) commit 36ed6ce Author: Denys Davydov <davydov.den18@gmail.com> Date: Tue Aug 9 21:58:52 2022 +0300 #15445 source typeform: integration tests (#15446) commit 06a18d4 Author: Topher Lubaway <asimplechris@gmail.com> Date: Tue Aug 9 13:33:20 2022 -0500 Adds PAT check to shared pr check (#15453) * Adds PAT check to shared pr check * Name change * Removes "safe_to_push" string * Adds OCTAVIA_PAT and uses the found PAT found PAT was not used in all locales, so this could have still failed on an expired OCTAVIA_PAT before this change
What
Closes #14595
In addition to syntactic checks hardcoded in #15194 we also want to perform classic jsonschema validation against fake previous configuration. This will make sure a configuration created with the previous version of a connector is still compatible with the current version.
How
utils.backward_compatibility.SpecDiffChecker.check()
airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.pyutils.backward_compatibility.validate_previous_configs
to:🚨 User Impact 🚨
Running the unit tests on SAT takes a bit more time as
hypothesis
generates a lot of fake data for each backward compatibility test case.Test failure example on
SpecDiffChecker.check()
Test failure example on
validate_previous_configs()