[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

brianjlai · 2022-08-20T08:27:15Z

What

Similar to the bug that we can't read files from the running Docker container. We need to read them using pkgutils when they're accessible at runtime using package_data in setup.py . The same applies for schema files.

This change should not require any changes to the existing yaml configs and I tested this locally for each of the integrations currently running in prod.

Example format of the json schema path:

schema_loader:
    file_path: "./source_greenhouse/schemas/{{ options['name'] }}.json"

How

Same concept as reading yaml configs. However, for slightly better DX, we should accept file structure that supports . prefix, / prefix, and the start of resource all the same so we do some string manipulation. All the existing declarative configs in prod at the moment shouldn't be affected by this change

Note: I also added a small fix to filter out schema gen warnings for some builtin types that would get flagged by the dataclasses_json helper library. The annoying part about these is they get flagged by the platform as ERROR which is confusing to see when debugging and could lead to a lot of red herrings during OC. We should hide these. An example of the warnings:

/Users/brian.lai/dev/airbyte/airbyte-integrations/connectors/source-greenhouse/.venv/lib/python3.9/site-packages/dataclasses_jsonschema/__init__.py:732: UserWarning: Unable to create schema for 'Any'
  warnings.warn(f"Unable to create schema for '{field_type_name}'")
/Users/brian.lai/dev/airbyte/airbyte-integrations/connectors/source-greenhouse/.venv/lib/python3.9/site-packages/dataclasses_jsonschema/__init__.py:732: UserWarning: Unable to create schema for 'DateTime'
  warnings.warn(f"Unable to create schema for '{field_type_name}'")

brianjlai · 2022-08-20T08:32:42Z

airbyte-cdk/python/airbyte_cdk/sources/declarative/schema/json_schema.py

@@ -33,8 +34,38 @@ def __post_init__(self, options: Mapping[str, Any]):

    def get_json_schema(self) -> Mapping[str, Any]:
        json_schema_path = self._get_json_filepath()
-        with open(json_schema_path, "r") as f:
-            return json.loads(f.read())
+        resource, schema_path = self.extract_resource_and_schema_path(json_schema_path)


I don't love this solution of splitting the URL. As mentioned in the below comment, this is a slight hack for us to figure out what resource the schemas are mounted under. Because setup.py mounts them in the source_sentry resource for example, we need to figure out from the airbyte_cdk module what the original source name was. All existing configs already prefix the source name so we can extract it from the first part of the path.

An alternative that I thought about was adding a new field to the JsonSchema dataclass called resource. However, I didn't like that because it will in 100% of cases always end up being the same as the name of the connector. It's also a weak abstraction and exposes our underlying file parsing mechanism by breaking it into two parts.

…schemas

sherifnada

Can you create an issue with the context form our Slack DM and link it as a TODO comment in the code?

brianjlai · 2022-08-22T22:01:48Z

yup will do!

brianjlai · 2022-08-23T00:18:46Z

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/2907989553
✅ https://github.com/airbytehq/airbyte/actions/runs/2907989553

brianjlai · 2022-08-23T00:30:06Z

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/2908034809
✅ https://github.com/airbytehq/airbyte/actions/runs/2908034809

…ing schema files (#15814) * replace file retrieval with pkgutil to fix getting schema files * slightly better error handling on missing files * filter our schema gen warnings for some classes that cannot generate schemas * add comment for todo * add changelog and setup before publish

brianjlai added 2 commits August 20, 2022 01:18

replace file retrieval with pkgutil to fix getting schema files

ca8d7ba

slightly better error handling on missing files

396f57f

brianjlai requested a review from a team August 20, 2022 08:27

github-actions bot added the CDK Connector Development Kit label Aug 20, 2022

brianjlai commented Aug 20, 2022

View reviewed changes

brianjlai mentioned this pull request Aug 22, 2022

[source-sentry] [source-sendgrid] [source-greenhouse] add .yaml to list of file imports in setup.py #15800

Merged

filter our schema gen warnings for some classes that cannot generate …

092f5e0

…schemas

sherifnada approved these changes Aug 22, 2022

View reviewed changes

brianjlai added 2 commits August 22, 2022 15:03

add comment for todo

b553160

add changelog and setup before publish

06c4e59

brianjlai merged commit 09cddff into master Aug 23, 2022

brianjlai deleted the brian/declarative_fix_read_schema branch August 23, 2022 07:06

octavia-squidington-iii mentioned this pull request Aug 23, 2022

Bump Airbyte version from 0.40.1 to 0.40.2 #15909

Closed

brianjlai mentioned this pull request Aug 23, 2022

[source-sendgrid] [source-sentry] use latest cdk w/ file parse fixes and fix sendgrid messages stream bug #15911

Merged

octavia-squidington-iii mentioned this pull request Aug 24, 2022

Bump Airbyte version from 0.40.1 to 0.40.2 #15931

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

brianjlai commented Aug 20, 2022 •

edited

Loading

brianjlai Aug 20, 2022

sherifnada left a comment

brianjlai commented Aug 22, 2022

brianjlai commented Aug 23, 2022 •

edited by github-actions bot

Loading

brianjlai commented Aug 23, 2022 •

edited by github-actions bot

Loading

[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

Conversation

brianjlai commented Aug 20, 2022 • edited Loading

What

How

brianjlai Aug 20, 2022

Choose a reason for hiding this comment

sherifnada left a comment

Choose a reason for hiding this comment

brianjlai commented Aug 22, 2022

brianjlai commented Aug 23, 2022 • edited by github-actions bot Loading

brianjlai commented Aug 23, 2022 • edited by github-actions bot Loading

brianjlai commented Aug 20, 2022 •

edited

Loading

brianjlai commented Aug 23, 2022 •

edited by github-actions bot

Loading

brianjlai commented Aug 23, 2022 •

edited by github-actions bot

Loading