-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python source jsonschema support #520
Conversation
… exchangeratesapi
@@ -30,7 +30,7 @@ definitions: | |||
additionalProperties: false | |||
required: | |||
- data | |||
- extracted_at | |||
- emitted_at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't align with the actual name of the field
@@ -2,7 +2,7 @@ | |||
"$schema": http://json-schema.org/draft-07/schema# | |||
"$id": https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml | |||
title: AirbyteMessage | |||
description: | |||
description: Airbyte message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python-jsonschema-objects
fails on None
descriptions
@@ -31,7 +33,7 @@ def discover(shell_command, transform=(lambda x: AirbyteSchema(x))) -> AirbyteSc | |||
return transform(completed_process.stdout) | |||
|
|||
@staticmethod | |||
def read(shell_command, is_message=(lambda x: True), transform=(lambda x: AirbyteMessage(x))) -> Generator[ | |||
def read(shell_command, is_message=(lambda x: True), transform=(lambda x: AirbyteMessage(type="RECORD", record=AirbyteRecordMessage(data={'value': x}, emitted_at=str(datetime.now()))))) -> Generator[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably change to just outputting an object that will convert to json, which will be placed into the data field. Then base.py
will be responsible for setting the timestamp and constructing the objects.
def _load_classes(yaml_path: str): | ||
data = yaml.load(pkgutil.get_data(__name__, yaml_path), Loader=yaml.FullLoader) | ||
builder = pjs.ObjectBuilder(data) | ||
return builder.build_classes(standardize_names=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to prefer dynamic bindings over static generation? Might make sense to retain the context in a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only static builders for Python I found were https://github.com/pearmaster/json-schema-codegen and https://github.com/microsoft/jschema-to-python, which aren't as full-featured or widely used.
I can add a todo #530
@@ -0,0 +1 @@ | |||
types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being ignored if it is being used in init.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only ignored on git because it is copied to a local dir for Python packaging and Docker copying. Since it's a duplicate and it's effectively "generated" during the build because it's copied, I thought it would make sense to exclude.
builder = pjs.ObjectBuilder(data) | ||
return builder.build_classes(standardize_names=False) | ||
|
||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this because the underlying libraries output lots of noise? Are they something we'd want to retain/emit anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one warning about the JSON schema version we're using. All of the features we're using here are supported, but the version itself isn't. Added a comment.
@@ -86,7 +86,7 @@ def start(self): | |||
# todo: pass in state | |||
generator = source.read(logging, rendered_config_path) | |||
for message in generator: | |||
print(message.message_string.strip()) | |||
print(message.serialize(sort_keys=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we sort keys? is this needed or just a nice to have? could you mention this context in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
} | ||
|
||
task copyProtocolDefinitions(type: Copy) { | ||
from file("$projectDir/../../airbyte-protocol/models/src/main/resources/airbyte_protocol").absolutePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use rootDir so you don't have to ../../
?
For config:
The output is:
I'll add schema emission and moving stream to a higher level in the message in separate PRs. |
Tested with spec/discover/check/read for the exchangerate v2 api.
Logging via message records will be in a different PR.