[low code connectors] generate complete json schema from classes #15647

brianjlai · 2022-08-15T08:39:14Z

What

Adds a new method that generates the complete JSON schema for every component of the low-code declarative language, starting from the top-level CheckStream and DeclarativeStream classes. How this is ultimately surfaced to the user is an open question; for now the schema is returned as a flat string, which can be pasted into an editor such as Sublime and formatted there.

How

Performs a complete traversal of the object model, starting at the top of the declarative framework. The traversal follows each component relationship and expands every field that is typed as an interface into its concrete class implementations.

This uses a recursive algorithm that iterates over each class's underlying fields. For each field, we unpack its type if it is wrapped in a generic data structure and translate any interface into the class types that implement it. We then perform the same process on the underlying declarative components (all other types and primitives are ignored).

After the traversal has translated the interfaces into classes, the dataclasses-jsonschema library is used to generate a schema from the data model, which is then converted into JSON.
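The traversal described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this PR: `unpack_leaf_types`, `concrete_implementations`, `collect_components`, and the example classes are all hypothetical names, and the sketch assumes that a "concrete implementation" is any dataclass subclass of an interface.

```python
from abc import ABC
from dataclasses import dataclass, fields, is_dataclass
from typing import List, Optional, Union, get_args


def unpack_leaf_types(field_type):
    """Flatten generics such as Optional[List[X]] into the leaf types they wrap."""
    args = get_args(field_type)
    if not args:
        return [field_type]
    leaves = []
    for arg in args:
        leaves.extend(unpack_leaf_types(arg))
    return leaves


def concrete_implementations(cls):
    """Return cls (if it is a dataclass) plus every dataclass below it in the hierarchy."""
    found = [cls] if is_dataclass(cls) else []
    for sub in cls.__subclasses__():
        found.extend(concrete_implementations(sub))
    return found


def collect_components(root, seen=None):
    """Recursively gather every component class reachable from root, keyed by name."""
    seen = {} if seen is None else seen
    for cls in concrete_implementations(root):
        if cls.__name__ in seen:
            continue  # already expanded; also protects against cycles
        seen[cls.__name__] = cls
        for field in fields(cls):
            for leaf in unpack_leaf_types(field.type):
                # Assumption: builtins and typing constructs are not components.
                if isinstance(leaf, type) and leaf.__module__ != "builtins":
                    collect_components(leaf, seen)
    return seen


# Hypothetical stand-ins for declarative components.
class Paginator(ABC):
    """Interface with no fields of its own."""


@dataclass
class LimitPaginator(Paginator):
    page_size: int


@dataclass
class NoPagination(Paginator):
    pass


@dataclass
class Retriever:
    name: str
    paginator: Optional[Paginator] = None


@dataclass
class Stream:
    retriever: Retriever
    primary_key: Optional[Union[str, List[str]]] = None


components = collect_components(Stream)
print(sorted(components))  # ['LimitPaginator', 'NoPagination', 'Retriever', 'Stream']
```

Note that the interface itself (`Paginator`) is not collected here because it is not a dataclass; only the classes that implement it are.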

Recommended reading order

yaml_declarative_source.py


brianjlai · 2022-08-15T22:23:24Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/yaml_declarative_source.py

+        next_classes = []
+        copy_cls = type(expand_class.__name__, expand_class.__bases__, dict(expand_class.__dict__))
+        class_fields = fields(copy_cls)
+        for field in class_fields:


This part of the code follows a similar structure to the original schema validation in #15543, where we unpack the underlying types and replace the annotations.
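As a rough illustration of the copy-and-replace idea, the sketch below copies a class with `type(...)` and swaps an interface annotation for a `Union` of its implementations. It is an assumption-laden simplification, not the code in this PR: `concrete_subclasses`, `copy_with_expanded_annotations`, and the example classes are hypothetical, only direct (non-generic) annotations are handled, and `__dict__`/`__weakref__` are dropped from the copied namespace as a precaution.

```python
from abc import ABC
from dataclasses import dataclass, is_dataclass
from typing import Union, get_type_hints


class Paginator(ABC):
    """Hypothetical interface with no fields of its own."""


@dataclass
class LimitPaginator(Paginator):
    page_size: int


@dataclass
class OffsetPaginator(Paginator):
    offset: int


@dataclass
class Retriever:
    name: str
    paginator: Paginator


def concrete_subclasses(interface):
    """Collect every dataclass below the interface in the class hierarchy."""
    found = []
    for sub in interface.__subclasses__():
        if is_dataclass(sub):
            found.append(sub)
        found.extend(concrete_subclasses(sub))
    return found


def copy_with_expanded_annotations(cls):
    """Return a copy of cls whose interface annotations list the concrete classes."""
    namespace = dict(cls.__dict__)
    # These descriptors belong to the original class and should not be copied.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    copy_cls = type(cls.__name__, cls.__bases__, namespace)

    expanded = {}
    for name, annotation in get_type_hints(cls).items():
        # Assumption: an interface is a non-builtin class that is not a dataclass.
        is_interface = (
            isinstance(annotation, type)
            and annotation.__module__ != "builtins"
            and not is_dataclass(annotation)
        )
        implementations = concrete_subclasses(annotation) if is_interface else []
        expanded[name] = Union[tuple(implementations)] if implementations else annotation
    # Assign a fresh dict so the original class's annotations stay untouched.
    copy_cls.__annotations__ = expanded
    return copy_cls


expanded_retriever = copy_with_expanded_annotations(Retriever)
print(expanded_retriever.__annotations__["paginator"])
```

Working on a copy means the original class keeps its interface-typed annotation, so only the schema-generation path sees the expanded types.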

brianjlai · 2022-08-15T22:25:06Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/yaml_declarative_source.py

+        return copy_cls
+
+    @staticmethod
+    def _get_next_expand_classes(field_type) -> list[type]:


We can just return a flat list of next components that need to be traversed because we are relying on unpack to properly resolve interfaces and retain the existing structure. This function is solely responsible for figuring out the next set of components that need to be expanded as part of the next recursive call.

girarda · 2022-08-16T01:01:29Z

airbyte-cdk/python/unit_tests/sources/declarative/test_factory.py

@@ -807,3 +808,8 @@ def test_validate_types_nested_in_list():
 def test_unpack(test_name, input_type, expected_unpacked_types):
    actual_unpacked_types = DeclarativeComponentFactory.unpack(input_type)
    assert actual_unpacked_types == expected_unpacked_types
+
+
+def test_complete_schema():


do we want to keep this test?

nope, this is just for my convenience testing, i'll remove it if we're just gonna generate the json file and put it into the repo

girarda · 2022-08-16T01:02:33Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/yaml_declarative_source.py

+    def default(self, obj):
+        if isinstance(obj, property):
+            return str(obj)
+        elif isinstance(obj, HttpMethod):


is this because HttpMethod is an enum? Is RequestOptionType somehow not a problem?

Yep this provides a serialization implementation for the HttpMethod enum. The reason why we need it for HttpMethod is because we assign the enum value HttpMethod.GET as the default so the enum (not just a string) shows up in the schema here.

But for RequestOptionType, since we don't assign an enum value as a default anywhere, the enum class shows up in the schema, but not an enum value. That's why we didn't run into the serialization problem.

can we check if the type is an enum instead so we don't have to add another case to this method when we add another enum with a default value?

good point, will change this
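A generic check along the lines suggested here could look like the sketch below. It is only an illustration: `SchemaEncoder` is a hypothetical name, `HttpMethod` is a local stand-in for the real enum, and serializing an enum as its `.value` is an assumption rather than what the PR necessarily does.

```python
import json
from enum import Enum


class HttpMethod(Enum):
    """Local stand-in for the real HttpMethod enum."""

    GET = "GET"
    POST = "POST"


class SchemaEncoder(json.JSONEncoder):
    """Hypothetical encoder that handles any Enum member, not one specific enum class."""

    def default(self, obj):
        if isinstance(obj, property):
            return str(obj)
        if isinstance(obj, Enum):
            # Assumption: the enum's underlying value is what belongs in the schema.
            return obj.value
        return super().default(obj)


schema_fragment = {"http_method": {"default": HttpMethod.GET}}
print(json.dumps(schema_fragment, cls=SchemaEncoder))
# {"http_method": {"default": "GET"}}
```

With `isinstance(obj, Enum)`, adding another enum with a default value requires no new branch in `default`.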

girarda · 2022-08-16T14:53:47Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/yaml_declarative_source.py

+            module = field_type.__module__
+            # We can only continue parsing declarative components since we explicitly inherit from the JsonSchemaMixin class which is
+            # used to generate the final json schema
+            if "airbyte_cdk.sources.declarative" in module and not isinstance(field_type, EnumMeta):


should we just check that the class inherits from JsonSchemaMixin if that's the condition that matters?

That should work too. It does have an interesting side effect, though. When interfaces like PaginationStrategy inherit JsonSchemaMixin, they show up in the schema. That might be what we want since they are part of the language, but it can also be a little confusing because they don't have any fields or schemas to validate. They also turn every type into a union of the interface and the concrete types.

Basically the readability of the schema is worse, but it is technically more accurate.

do they show up as a union of the subtypes or as an empty object? I think showing them is better if they're defined as the union, but worse otherwise

It'll look something like this when you view the ListStreamSlicer in the definitions. It is kind of a union, where the first entry refers to the empty StreamSlicer object, and the second is the concrete definition which will actually get used in the validation.

ListStreamSlicer: [
  { "$ref": "#/definitions/StreamSlicer" },
  {
    "type": "object",
    "required": [ "slice_values", "cursor_field", "config" ],
    "properties": {
      "slice_values": {
        "anyOf": [
          { "type": "array", "items": { "type": "string" } },
          { "type": "string" }
        ]
      },
      "cursor_field": {
        "anyOf": [
          { "$ref": "#/definitions/InterpolatedString" },
          { "type": "string" }
        ]
      },
      "config": { "type": "object" },
      "request_option": { "$ref": "#/definitions/RequestOption" }
    }
  }
]

discussed this on slack a bit, a slightly more verbose schema is probably acceptable vs module path assertions
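The inheritance-based check that replaces the module path assertion might look roughly like this. It is a dependency-free sketch: `JsonSchemaMixin` here is an empty stand-in for the class from dataclasses-jsonschema, `should_expand` is a hypothetical helper name, and the sketch assumes the field type has already been unpacked from any generic wrapper.

```python
from dataclasses import dataclass
from enum import Enum


class JsonSchemaMixin:
    """Empty stand-in for dataclasses_jsonschema.JsonSchemaMixin."""


class HttpMethod(Enum):
    GET = "GET"


class StreamSlicer(JsonSchemaMixin):
    """Interface: it inherits the mixin, so it is expanded and appears in the schema."""


@dataclass
class ListStreamSlicer(StreamSlicer):
    cursor_field: str


def should_expand(field_type) -> bool:
    """Expand only classes that can produce a JSON schema, wherever they are defined."""
    return isinstance(field_type, type) and issubclass(field_type, JsonSchemaMixin)


for candidate in (ListStreamSlicer, StreamSlicer, HttpMethod, str):
    print(candidate.__name__, should_expand(candidate))
```

Enums and primitives are excluded because they do not inherit the mixin, so no separate `EnumMeta` check is needed in this version.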

girarda

approved pending the enum type check on https://github.com/airbytehq/airbyte/pull/15647/files#diff-f065e5627510a5d82633b9ac35383b6065f9b09ab4381f667763ad322a13dccaR156

* draft: first pass at complete schema language generation and factory validator
* actually a working validator and fixes to the schema that went uncaught
* remove extra spike file
* fix formatting file
* Add method to generate the complete JSON schema of the low code declarative language
* add testing of a few components during schema gen
* pr feedback and a little bit of refactoring
* test for schema version
* fix some types that were erroneously marked as invalid schema
* some comments
* add jsonschemamixin to interfaces
* update tests now that interfaces are jsonschemamixin
* accidentally removed a mixin
* remove unneeded test
* make comment a little more clear
* update changelog
* bump version
* generic enum not enum class
* Add method to generate the complete JSON schema of the low code declarative language
* add testing of a few components during schema gen
* test for schema version
* update tests now that interfaces are jsonschemamixin
* accidentally removed a mixin
* remove unneeded test
* make comment a little more clear
* generic enum not enum class
* add generated json file and update docs to reference it
* verbage

brianjlai added 7 commits August 11, 2022 02:35

draft: first pass at complete schema language generation and factory validator

1efe2f0

Merge branch 'master' into brian/low_code_validate_schema

30a73ae

actually a working validator and fixes to the schema that went uncaught

5ca8e9e

remove extra spike file

385a593

fix formatting file

a29c048

Add method to generate the complete JSON schema of the low code declarative language

4be31d8

add testing of a few components during schema gen

5a53476

github-actions bot added the CDK Connector Development Kit label Aug 15, 2022

brianjlai changed the base branch from master to brian/low_code_validate_schema August 15, 2022 08:42

brianjlai commented Aug 15, 2022

brianjlai marked this pull request as ready for review August 15, 2022 22:25

brianjlai requested review from a team and girarda August 15, 2022 22:25

girarda reviewed Aug 16, 2022

brianjlai added 4 commits August 15, 2022 22:29

pr feedback and a little bit of refactoring

41eea30

Merge branch 'master' into brian/low_code_validate_schema

e232f09

Merge branch 'brian/low_code_validate_schema' into brian/generate_complete_schema

9ee6b68

test for schema version

f55828f

girarda reviewed Aug 16, 2022

brianjlai added 9 commits August 16, 2022 10:22

Merge branch 'master' into brian/low_code_validate_schema

1111da5

fix some types that were erroneously marked as invalid schema

2d71d85

some comments

65a7bd0

Merge branch 'master' into brian/low_code_validate_schema

8d3d1fc

Merge branch 'brian/low_code_validate_schema' into brian/generate_complete_schema

3fb3732

add jsonschemamixin to interfaces

561a285

Merge branch 'master' into brian/low_code_validate_schema

5d2e722

Merge branch 'brian/low_code_validate_schema' into brian/generate_complete_schema

29e6e3f

update tests now that interfaces are jsonschemamixin

019165d

update changelog

6ef9e10

girarda approved these changes Aug 18, 2022

bump version

3612789

Base automatically changed from brian/low_code_validate_schema to master August 18, 2022 19:29

brianjlai added 11 commits August 18, 2022 12:31

Merge branch 'brian/low_code_validate_schema' into brian/generate_complete_schema

687281d

generic enum not enum class

74e0e51

Add method to generate the complete JSON schema of the low code declarative language

4cdccb4

add testing of a few components during schema gen

c60ce3a

test for schema version

743fbaf

update tests now that interfaces are jsonschemamixin

4eb1cc9

accidentally removed a mixin

23d9dd1

remove unneeded test

9476b5c

make comment a little more clear

b1da2c2

generic enum not enum class

364d2ca

Merge branch 'brian/generate_complete_schema' of github.com:airbytehq/airbyte into brian/generate_complete_schema

d687e70

octavia-squidington-iii added the connectors/source/intercom label Aug 18, 2022

github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 18, 2022

Merge branch 'master' into brian/generate_complete_schema

522aa67

github-actions bot removed area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 18, 2022

add generated json file and update docs to reference it

89105fc

github-actions bot added the area/documentation Improvements or additions to documentation label Aug 18, 2022

verbage

63d395d

brianjlai merged commit 7e158ef into master Aug 18, 2022

brianjlai deleted the brian/generate_complete_schema branch August 18, 2022 22:53

octavia-squidington-iii mentioned this pull request Aug 19, 2022

Bump Airbyte version from 0.40.0-alpha to 0.40.1-alpha #15775

Closed

evantahler mentioned this pull request Aug 22, 2022

Bump Airbyte version from 0.40.0-alpha to 0.40.1 #15857

Merged

brianjlai mentioned this pull request Aug 24, 2022

[low-code connectors] add config-based connectors json schema to repo to docs for reference #14510

Closed
