Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python source jsonschema support #520

Merged
merged 9 commits into from
Oct 9, 2020
Merged

Conversation

jrhizor
Copy link
Contributor

@jrhizor jrhizor commented Oct 8, 2020

Tested with spec/discover/check/read for the exchangerate v2 api.

Logging via message records will be in a different PR.

@jrhizor jrhizor marked this pull request as ready for review October 8, 2020 18:29
@@ -30,7 +30,7 @@ definitions:
additionalProperties: false
required:
- data
- extracted_at
- emitted_at
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't align with the actual name of the field

@@ -2,7 +2,7 @@
"$schema": http://json-schema.org/draft-07/schema#
"$id": https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
title: AirbyteMessage
description:
description: Airbyte message
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python-jsonschema-objects fails on None descriptions

@@ -31,7 +33,7 @@ def discover(shell_command, transform=(lambda x: AirbyteSchema(x))) -> AirbyteSc
return transform(completed_process.stdout)

@staticmethod
def read(shell_command, is_message=(lambda x: True), transform=(lambda x: AirbyteMessage(x))) -> Generator[
def read(shell_command, is_message=(lambda x: True), transform=(lambda x: AirbyteMessage(type="RECORD", record=AirbyteRecordMessage(data={'value': x}, emitted_at=str(datetime.now()))))) -> Generator[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably change to just outputting an object that will convert to json, which will be placed into the data field. Then base.py will be responsible for setting the timestamp and constructing the objects.

def _load_classes(yaml_path: str):
data = yaml.load(pkgutil.get_data(__name__, yaml_path), Loader=yaml.FullLoader)
builder = pjs.ObjectBuilder(data)
return builder.build_classes(standardize_names=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to prefer dynamic bindings over static generation? Might make sense to retain the context in a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only static builders for Python I found were https://github.com/pearmaster/json-schema-codegen and https://github.com/microsoft/jschema-to-python, which aren't as full-featured or widely used.

I can add a todo #530

@@ -0,0 +1 @@
types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this being ignored if it is being used in init.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only ignored on git because it is copied to a local dir for Python packaging and Docker copying. Since it's a duplicate and it's effectively "generated" during the build because it's copied, I thought it would make sense to exclude.

builder = pjs.ObjectBuilder(data)
return builder.build_classes(standardize_names=False)

with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because the underlying libraries output lots of noise? Are they something we'd want to retain/emit anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one warning about the JSON schema version we're using. All of the features we're using here are supported, but the version itself isn't. Added a comment.

@@ -86,7 +86,7 @@ def start(self):
# todo: pass in state
generator = source.read(logging, rendered_config_path)
for message in generator:
print(message.message_string.strip())
print(message.serialize(sort_keys=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we sort keys? is this needed or just a nice to have? could you mention this context in a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}

task copyProtocolDefinitions(type: Copy) {
from file("$projectDir/../../airbyte-protocol/models/src/main/resources/airbyte_protocol").absolutePath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use rootDir so you don't have to ../../?

@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 9, 2020

For config:

{
  "base": "AUD",
  "start_date": "2020-10-02"
}

The output is:

{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-02 using base AUD"}}
{"type": "RECORD", "record": {"data": {"stream": "exchange_rate", "record": {"CAD": 0.9530295627, "HKD": 5.5527730271, "ISK": 99.071585634, "PHP": 34.7324700709, "DKK": 4.5451991204, "HUF": 219.2035182018, "CZK": 16.5037869533, "GBP": 0.5538297093, "RON": 2.9755069631, "SEK": 6.3675787931, "IDR": 10655.5888101637, "INR": 52.5516125092, "BRL": 4.0336550208, "RUB": 56.2438920108, "HRK": 4.6188614708, "JPY": 75.3725873442, "THB": 22.5940630344, "CHF": 0.6590520401, "EUR": 0.610798925, "MYR": 2.9837527486, "BGN": 1.1946005375, "TRY": 5.5452602003, "CNY": 4.8653799169, "NOK": 6.6601514781, "NZD": 1.07952602, "ZAR": 11.8496213047, "USD": 0.716467139, "MXN": 15.6720009773, "SGD": 0.9765453213, "AUD": 1.0, "ILS": 2.4580991937, "KRW": 833.5450769607, "PLN": 2.7451746885, "date": "2020-10-02T00:00:00Z"}}, "emitted_at": "2020-10-09 14:34:23.994642"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-03 using base AUD"}}
{"type": "RECORD", "record": {"data": {"stream": "exchange_rate", "record": {"CAD": 0.9532938519, "HKD": 5.5682886623, "ISK": 99.1513523414, "PHP": 34.7768484034, "DKK": 4.542401856, "HUF": 218.6763538678, "CZK": 16.5431345015, "GBP": 0.5544294523, "RON": 2.975761646, "SEK": 6.3889736858, "IDR": 10577.3307283717, "INR": 52.5593748092, "BRL": 4.0642285854, "RUB": 56.4963062458, "HRK": 4.6205507052, "JPY": 75.8593320716, "THB": 22.4958788693, "CHF": 0.6582208926, "EUR": 0.6105378839, "MYR": 2.9835154771, "BGN": 1.1940899933, "TRY": 5.5933817693, "CNY": 4.8921179559, "NOK": 6.6460101349, "NZD": 1.0808352158, "ZAR": 11.8106722022, "USD": 0.7184809817, "MXN": 15.4191953111, "SGD": 0.9772879907, "AUD": 1.0, "ILS": 2.4569876061, "KRW": 832.602722999, "PLN": 2.7461994017, "date": "2020-10-05T00:00:00Z"}}, "emitted_at": "2020-10-09 14:34:24.265894"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-04 using base AUD"}}
{"type": "RECORD", "record": {"data": {"stream": "exchange_rate", "record": {"CAD": 0.9488829529, "HKD": 5.549538611, "ISK": 98.8343856241, "PHP": 34.6873482273, "DKK": 4.5172413793, "HUF": 218.3827100534, "CZK": 16.41270034, "GBP": 0.5528047596, "RON": 2.9601748422, "SEK": 6.3828314716, "IDR": 10551.195968917, "INR": 52.5567629917, "BRL": 3.9681277319, "RUB": 55.7139388052, "HRK": 4.5949490044, "JPY": 75.6313744536, "THB": 22.3379067508, "CHF": 0.6545046139, "EUR": 0.6070908208, "MYR": 2.9741986401, "BGN": 1.1873482273, "TRY": 5.5624696455, "CNY": 4.8634652744, "NOK": 6.593188441, "NZD": 1.0765541525, "ZAR": 11.8097377368, "USD": 0.7160636231, "MXN": 15.2762870325, "SGD": 0.9732880039, "AUD": 1.0, "ILS": 2.440080136, "KRW": 830.7977173385, "PLN": 2.724137931, "date": "2020-10-06T00:00:00Z"}}, "emitted_at": "2020-10-09 14:34:24.345054"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-05 using base AUD"}}
{"type": "RECORD", "record": {"data": {"stream": "exchange_rate", "record": {"CAD": 0.9478787879, "HKD": 5.5284242424, "ISK": 98.6666666667, "PHP": 34.5327272727, "DKK": 4.5103636364, "HUF": 217.7272727273, "CZK": 16.4024242424, "GBP": 0.5540181818, "RON": 2.9544242424, "SEK": 6.3533333333, "IDR": 10520.9151515152, "INR": 52.2806060606, "BRL": 3.9665454545, "RUB": 55.7403030303, "HRK": 4.586969697, "JPY": 75.5939393939, "THB": 22.2775757576, "CHF": 0.6537575758, "EUR": 0.6060606061, "MYR": 2.9646666667, "BGN": 1.1853333333, "TRY": 5.6195151515, "CNY": 4.8444848485, "NOK": 6.625030303, "NZD": 1.0825454545, "ZAR": 11.8662424242, "USD": 0.7133333333, "MXN": 15.3096363636, "SGD": 0.9693939394, "AUD": 1.0, "ILS": 2.4269090909, "KRW": 825.8787878788, "PLN": 2.7189090909, "date": "2020-10-07T00:00:00Z"}}, "emitted_at": "2020-10-09 14:34:24.427067"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-06 using base AUD"}}
{"type": "RECORD", "record": {"data": {"stream": "exchange_rate", "record": {"CAD": 0.9489677852, "HKD": 5.5527068997, "ISK": 99.1413434017, "PHP": 34.7140856221, "DKK": 4.532184398, "HUF": 217.6968515925, "CZK": 16.4983862128, "GBP": 0.5543815846, "RON": 2.9692466963, "SEK": 6.3618537239, "IDR": 10573.1685037452, "INR": 52.4852323245, "BRL": 4.0090128494, "RUB": 55.4579501857, "HRK": 4.6087327203, "JPY": 75.9332561963, "THB": 22.3640460386, "CHF": 0.6576335181, "EUR": 0.6089763108, "MYR": 2.9750928689, "BGN": 1.1910358687, "TRY": 5.6819925705, "CNY": 4.8653553377, "NOK": 6.6401558979, "NZD": 1.0873881006, "ZAR": 11.8747335729, "USD": 0.7164606297, "MXN": 15.3107606114, "SGD": 0.9730832471, "AUD": 1.0, "ILS": 2.4303635589, "KRW": 825.4552097923, "PLN": 2.7308324706, "date": "2020-10-08T00:00:00Z"}}, "emitted_at": "2020-10-09 14:34:24.525636"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-07 using base AUD"}}
{"type": "STATE", "state": {"data": {"value": {"start_date": "2020-10-09"}}}}
{"type": "LOG", "log": {"level": "INFO", "message": "Replicating exchange rate data from 2020-10-08 using base AUD"}}

I'll add schema emission and moving stream to a higher level in the message in separate PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants