-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[low code connectors] generate complete json schema from classes #15647
Conversation
next_classes = [] | ||
copy_cls = type(expand_class.__name__, expand_class.__bases__, dict(expand_class.__dict__)) | ||
class_fields = fields(copy_cls) | ||
for field in class_fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the code follows a similar structure to the original schema validation in #15543 where we unpack the underlying types and replace the annotations
return copy_cls | ||
|
||
@staticmethod | ||
def _get_next_expand_classes(field_type) -> list[type]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just return a flat list of next components that need to be traversed because we are relying on unpack to properly resolve interfaces and retain the existing structure. This function is solely responsible for figuring out the next set of components that need to be expanded as part of the next recursive call.
@@ -807,3 +808,8 @@ def test_validate_types_nested_in_list(): | |||
def test_unpack(test_name, input_type, expected_unpacked_types): | |||
actual_unpacked_types = DeclarativeComponentFactory.unpack(input_type) | |||
assert actual_unpacked_types == expected_unpacked_types | |||
|
|||
|
|||
def test_complete_schema(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to keep this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, this is just for my convenience testing, i'll remove it if we're just gonna generate the json file and put it into the repo
def default(self, obj): | ||
if isinstance(obj, property): | ||
return str(obj) | ||
elif isinstance(obj, HttpMethod): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this because HttpMethod
is an enum? Is RequestOptionType
somehow not a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this provides a serialization implementation for the HttpMethod
enum. The reason why we need it for HttpMethod
is because we assign the enum value HttpMethod.GET
as the default so the enum (not just a string) shows up in the schema here.
But for RequestOptionType
since we don't assign the enum anywhere, enum class shows up in the schema, but not the enum value. That's why we didn't run into the serialization problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check if the type is an enum instead so we don't have to add another case to this method when we add another enum with a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, will change this
module = field_type.__module__ | ||
# We can only continue parsing declarative components since we explicitly inherit from the JsonSchemaMixin class which is | ||
# used to generate the final json schema | ||
if "airbyte_cdk.sources.declarative" in module and not isinstance(field_type, EnumMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just check that the class inherits from JsonSchemaMixin
if that's the condition that matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should work too. It does have an interesting side effect. Maybe it's what we want, but when interfaces like PaginationStrategy
inherit JsonSchemaMixin
, they show up in the schema. It might be what we want since they are part of the language, but also can be a little confusing because they don't have any fields or schemas to validate. They also turn every type into unions of the interface and concrete types
Basically the readability of the schema is worse, but it is technically more accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do they show up as a union of the subtypes or as an empty object? It think showing them is better if they're defined as the union, but worse otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll look something like this when you view the ListStreamSlicer in the definitions
. It is kind of the union, where the first referes the empty StreamSlicer object, and the second is the concrete definition which will actually get used in the validation.
ListStreamSlicer: [
{
"$ref": "#/definitions/StreamSlicer"
},
{
"type": "object",
"required":
[
"slice_values",
"cursor_field",
"config"
],
"properties":
{
"slice_values":
{
"anyOf":
[
{
"type": "array",
"items":
{
"type": "string"
}
},
{
"type": "string"
}
]
},
"cursor_field":
{
"anyOf":
[
{
"$ref": "#/definitions/InterpolatedString"
},
{
"type": "string"
}
]
},
"config":
{
"type": "object"
},
"request_option":
{
"$ref": "#/definitions/RequestOption"
}
}
}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed this on slack a bit, a slightly more verbose schema is probably acceptable vs module path assertions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved pending the enum type check on https://github.com/airbytehq/airbyte/pull/15647/files#diff-f065e5627510a5d82633b9ac35383b6065f9b09ab4381f667763ad322a13dccaR156
…/airbyte into brian/generate_complete_schema
) * draft: first pass at complete schema language generation and factory validator * actually a working validator and fixes to the schema that went uncaught * remove extra spike file * fix formatting file * Add method to generate the complete JSON schema of the low code declarative language * add testing of a few components during schema gen * pr feedback and a little bit of refactoring * test for schema version * fix some types that were erroneously marked as invalid schema * some comments * add jsonschemamixin to interfaces * update tests now that interfaces are jsonschemamixin * accidentally removed a mixin * remove unneeded test * make comment a little more clear * update changelog * bump version * generic enum not enum class * Add method to generate the complete JSON schema of the low code declarative language * add testing of a few components during schema gen * test for schema version * update tests now that interfaces are jsonschemamixin * accidentally removed a mixin * remove unneeded test * make comment a little more clear * generic enum not enum class * add generated json file and update docs to reference it * verbage
What
Adds a new method for generating the complete json schema of all the components of the low code declarative language starting from the top level
CheckStream
andDeclarativeStream
s. How this is ultimately surfaced to the user is an open question, but for now it's returned as a flat string which can be formatted on other apps like Sublime.How
Performs a complete traversal of the object model starting at the top of the declarative framework by traversing through each component relationship and expanding fields that are represented by an interface into their concrete class implementations.
This uses a recursive algorithm to perform the traversal by iterating over a class' underlying fields. For each field we unpack them if they're stored in a generic data structure and translate them into the class types that implement the interface. Then we perform the same process on the underlying declarative components (all other types and primitives are ignored)
After the traversal and translating the interfaces into classes, the
dataclasses-jsonschema
is used to generate a schema from the data model which is then transformed into a json-readable formatRecommended reading order
yaml_declarative_source.py