Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[low code connectors] generate complete json schema from classes #15647

Merged
merged 40 commits into from
Aug 18, 2022

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Aug 15, 2022

What

Adds a new method for generating the complete json schema of all the components of the low code declarative language starting from the top level CheckStream and DeclarativeStreams. How this is ultimately surfaced to the user is an open question, but for now it's returned as a flat string which can be formatted on other apps like Sublime.

How

Performs a complete traversal of the object model starting at the top of the declarative framework by traversing through each component relationship and expanding fields that are represented by an interface into their concrete class implementations.

This uses a recursive algorithm to perform the traversal by iterating over a class' underlying fields. For each field we unpack them if they're stored in a generic data structure and translate them into the class types that implement the interface. Then we perform the same process on the underlying declarative components (all other types and primitives are ignored)

After the traversal and translating the interfaces into classes, the dataclasses-jsonschema is used to generate a schema from the data model which is then transformed into a json-readable format

Recommended reading order

  1. yaml_declarative_source.py

@github-actions github-actions bot added the CDK Connector Development Kit label Aug 15, 2022
@brianjlai brianjlai changed the base branch from master to brian/low_code_validate_schema August 15, 2022 08:42
next_classes = []
copy_cls = type(expand_class.__name__, expand_class.__bases__, dict(expand_class.__dict__))
class_fields = fields(copy_cls)
for field in class_fields:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code follows a similar structure to the original schema validation in #15543 where we unpack the underlying types and replace the annotations

return copy_cls

@staticmethod
def _get_next_expand_classes(field_type) -> list[type]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just return a flat list of next components that need to be traversed because we are relying on unpack to properly resolve interfaces and retain the existing structure. This function is solely responsible for figuring out the next set of components that need to be expanded as part of the next recursive call.

@brianjlai brianjlai marked this pull request as ready for review August 15, 2022 22:25
@brianjlai brianjlai requested review from a team and girarda August 15, 2022 22:25
@@ -807,3 +808,8 @@ def test_validate_types_nested_in_list():
def test_unpack(test_name, input_type, expected_unpacked_types):
actual_unpacked_types = DeclarativeComponentFactory.unpack(input_type)
assert actual_unpacked_types == expected_unpacked_types


def test_complete_schema():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, this is just for my convenience testing, i'll remove it if we're just gonna generate the json file and put it into the repo

def default(self, obj):
if isinstance(obj, property):
return str(obj)
elif isinstance(obj, HttpMethod):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because HttpMethod is an enum? Is RequestOptionType somehow not a problem?

Copy link
Contributor Author

@brianjlai brianjlai Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this provides a serialization implementation for the HttpMethod enum. The reason why we need it for HttpMethod is because we assign the enum value HttpMethod.GET as the default so the enum (not just a string) shows up in the schema here.

But for RequestOptionType since we don't assign the enum anywhere, enum class shows up in the schema, but not the enum value. That's why we didn't run into the serialization problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check if the type is an enum instead so we don't have to add another case to this method when we add another enum with a default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will change this

module = field_type.__module__
# We can only continue parsing declarative components since we explicitly inherit from the JsonSchemaMixin class which is
# used to generate the final json schema
if "airbyte_cdk.sources.declarative" in module and not isinstance(field_type, EnumMeta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just check that the class inherits from JsonSchemaMixin if that's the condition that matters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work too. It does have an interesting side effect. Maybe it's what we want, but when interfaces like PaginationStrategy inherit JsonSchemaMixin, they show up in the schema. It might be what we want since they are part of the language, but also can be a little confusing because they don't have any fields or schemas to validate. They also turn every type into unions of the interface and concrete types

Basically the readability of the schema is worse, but it is technically more accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do they show up as a union of the subtypes or as an empty object? It think showing them is better if they're defined as the union, but worse otherwise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll look something like this when you view the ListStreamSlicer in the definitions. It is kind of the union, where the first referes the empty StreamSlicer object, and the second is the concrete definition which will actually get used in the validation.

ListStreamSlicer: [
    {
        "$ref": "#/definitions/StreamSlicer"
    },
    {
        "type": "object",
        "required":
        [
            "slice_values",
            "cursor_field",
            "config"
        ],
        "properties":
        {
            "slice_values":
            {
                "anyOf":
                [
                    {
                        "type": "array",
                        "items":
                        {
                            "type": "string"
                        }
                    },
                    {
                        "type": "string"
                    }
                ]
            },
            "cursor_field":
            {
                "anyOf":
                [
                    {
                        "$ref": "#/definitions/InterpolatedString"
                    },
                    {
                        "type": "string"
                    }
                ]
            },
            "config":
            {
                "type": "object"
            },
            "request_option":
            {
                "$ref": "#/definitions/RequestOption"
            }
        }
    }
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed this on slack a bit, a slightly more verbose schema is probably acceptable vs module path assertions

Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base automatically changed from brian/low_code_validate_schema to master August 18, 2022 19:29
@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 18, 2022
@github-actions github-actions bot removed area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 18, 2022
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Aug 18, 2022
@brianjlai brianjlai merged commit 7e158ef into master Aug 18, 2022
@brianjlai brianjlai deleted the brian/generate_complete_schema branch August 18, 2022 22:53
rodireich pushed a commit that referenced this pull request Aug 25, 2022
)

* draft: first pass at complete schema language generation and factory validator

* actually a working validator and fixes to the schema that went uncaught

* remove extra spike file

* fix formatting file

* Add method to generate the complete JSON schema of the low code declarative language

* add testing of a few components during schema gen

* pr feedback and a little bit of refactoring

* test for schema version

* fix some types that were erroneously marked as invalid schema

* some comments

* add jsonschemamixin to interfaces

* update tests now that interfaces are jsonschemamixin

* accidentally removed a mixin

* remove unneeded test

* make comment a little more clear

* update changelog

* bump version

* generic enum not enum class

* Add method to generate the complete JSON schema of the low code declarative language

* add testing of a few components during schema gen

* test for schema version

* update tests now that interfaces are jsonschemamixin

* accidentally removed a mixin

* remove unneeded test

* make comment a little more clear

* generic enum not enum class

* add generated json file and update docs to reference it

* verbage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation CDK Connector Development Kit connectors/source/intercom
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants