Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[low code connectors] replace file retrieval with pkgutil to fix getting schema files #15814

Merged
merged 5 commits into from
Aug 23, 2022

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Aug 20, 2022

What

Similar to the bug that we can't read files from the running Docker container. We need to read them using pkgutils when they're accessible at runtime using package_data in setup.py . The same applies for schema files.

This change should not require any changes to the existing yaml configs and I tested this locally for each of the integrations currently running in prod.

Example format of the json schema path:

schema_loader:
    file_path: "./source_greenhouse/schemas/{{ options['name'] }}.json"

How

Same concept as reading yaml configs. However, for slightly better DX, we should accept file structure that supports . prefix, / prefix, and the start of resource all the same so we do some string manipulation. All the existing declarative configs in prod at the moment shouldn't be affected by this change

Note: I also added a small fix to filter out schema gen warnings for some builtin types that would get flagged by the dataclasses_json helper library. The annoying part about these is they get flagged by the platform as ERROR which is confusing to see when debugging and could lead to a lot of red herrings during OC. We should hide these. An example of the warnings:

/Users/brian.lai/dev/airbyte/airbyte-integrations/connectors/source-greenhouse/.venv/lib/python3.9/site-packages/dataclasses_jsonschema/__init__.py:732: UserWarning: Unable to create schema for 'Any'
  warnings.warn(f"Unable to create schema for '{field_type_name}'")
/Users/brian.lai/dev/airbyte/airbyte-integrations/connectors/source-greenhouse/.venv/lib/python3.9/site-packages/dataclasses_jsonschema/__init__.py:732: UserWarning: Unable to create schema for 'DateTime'
  warnings.warn(f"Unable to create schema for '{field_type_name}'")

@brianjlai brianjlai requested a review from a team August 20, 2022 08:27
@github-actions github-actions bot added the CDK Connector Development Kit label Aug 20, 2022
@@ -33,8 +34,38 @@ def __post_init__(self, options: Mapping[str, Any]):

def get_json_schema(self) -> Mapping[str, Any]:
json_schema_path = self._get_json_filepath()
with open(json_schema_path, "r") as f:
return json.loads(f.read())
resource, schema_path = self.extract_resource_and_schema_path(json_schema_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this solution of splitting the URL. As mentioned in the below comment, this is a slight hack for us to figure out what resource the schemas are mounted under. Because setup.py mounts them in the source_sentry resource for example, we need to figure out from the airbyte_cdk module what the original source name was. All existing configs already prefix the source name so we can extract it from the first part of the path.

An alternative that I thought about was adding a new field to the JsonSchema dataclass called resource. However, I didn't like that because it will in 100% of cases always end up being the same as the name of the connector. It's also a weak abstraction and exposes our underlying file parsing mechanism by breaking it into two parts.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create an issue with the context form our Slack DM and link it as a TODO comment in the code?

@brianjlai
Copy link
Contributor Author

yup will do!

@brianjlai
Copy link
Contributor Author

brianjlai commented Aug 23, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/2907989553
https://github.com/airbytehq/airbyte/actions/runs/2907989553

@brianjlai
Copy link
Contributor Author

brianjlai commented Aug 23, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/2908034809
https://github.com/airbytehq/airbyte/actions/runs/2908034809

@brianjlai brianjlai merged commit 09cddff into master Aug 23, 2022
@brianjlai brianjlai deleted the brian/declarative_fix_read_schema branch August 23, 2022 07:06
rodireich pushed a commit that referenced this pull request Aug 25, 2022
…ing schema files (#15814)

* replace file retrieval with pkgutil to fix getting schema files

* slightly better error handling on missing files

* filter our schema gen warnings for some classes that cannot generate schemas

* add comment for todo

* add changelog and setup before publish
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants