Newline normalisation #7538

IamTheLime · 2021-11-01T20:59:54Z

What

This fixes 'Unclosed string literal' errors when extracting a JSON scalar. Currently the normalised JSON path does not substitute \n, which causes problems in calls like the following:

json_extract = jinja_call(f"json_extract_array({json_column_name}, {json_path}, {normalized_json_path})")

The generated dbt template then breaks across lines, producing an unclosed string literal error.
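A minimal sketch of the failure mode. `render_json_extract` is a hypothetical, simplified stand-in for the normalization code path (the real call passes a third argument and may quote paths differently); it only shows what happens when a property name containing a newline is spliced into a single-quoted literal in the generated template text.

```python
# Hypothetical helper: NOT the normalization module's code. It assumes the JSON
# path is rendered as ['<property name>'] inside a jinja call.
def render_json_extract(json_column_name: str, property_name: str) -> str:
    json_path = f"['{property_name}']"
    return "{{ json_extract_array(" + json_column_name + ", " + json_path + ") }}"


rendered = render_json_extract("_airbyte_data", "weird\nname")
print(rendered)
# The first printed line ends with ['weird : the quote opened there is never
# closed on that line, which is the "unclosed string literal" symptom.
```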

How

This solution "escapes" \n when normalising the JSON path by replacing it with an underscore.
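The change can be sketched as follows. This is reconstructed only from the two `sub` calls visible in the diff under review; the real `transform_json_naming` may contain further steps, so treat this as an illustration rather than the module's actual implementation.

```python
from re import sub


def transform_json_naming(input_name: str) -> str:
    # Existing behavior shown in the diff: quotes and backticks become underscores.
    result = sub(r"['\"`]", "_", input_name)
    # Proposed addition: newlines also become underscores, so the name can no
    # longer split a string literal across lines in the generated template.
    result = sub(r"\n", "_", result)
    return result


print(transform_json_naming('line one\nline "two"'))
```

Note that this replaces the character rather than escaping it, so two distinct names such as `a\nb` and `a_b` map to the same normalized name.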

CLAassistant · 2021-11-01T20:59:58Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Tiago Lima seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

marcosmarxm · 2021-11-02T03:18:13Z

Can you sign the CLA, @IamTheLime?

@IamTheLime I think this doesn't solve the problem with the Stripe connector.

@ChristopheDuong do you think this is useful for the normalization module?

ChristopheDuong · 2021-11-02T09:11:48Z

...ons/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py

 def transform_json_naming(input_name: str) -> str:
    result = sub(r"['\"`]", "_", input_name)
+    result = sub(r"\n", "_", result)


This function reproduces the behavior that certain destination connectors apply to special property names, following the code here:

airbyte/airbyte-integrations/bases/base-java/src/main/java/io/airbyte/integrations/destination/StandardNameTransformer.java

Line 45 in 85381bd

public static JsonNode formatJsonPath(final JsonNode root) {

So adding this line here is not enough. I am also not sure what core issue is being solved here, or why this would be the proper solution.

Do we really expect column names with \n characters btw?


This change may "fix" the SQL syntax error, but is the data properly parsed from the JSON blob, or does it produce empty NULL values for that column? The answer probably comes down to whether \n is an important character in the column name.
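The NULL concern can be illustrated with a small Python simulation. It is not the generated SQL: a dictionary lookup stands in for a JSON extraction function, and the record is hypothetical. If the raw record keeps the original key while normalization looks up the rewritten name, the lookup finds nothing.

```python
import json

# Hypothetical raw record whose key contains a literal newline.
raw_record = json.dumps({"weird\nname": 42})


def extract_scalar(blob: str, key: str):
    # Stand-in for a JSON extract: returns None (like SQL NULL) when the key is absent.
    return json.loads(blob).get(key)


original = "weird\nname"
normalized = original.replace("\n", "_")  # what the proposed change would look up

print(extract_scalar(raw_record, original))    # the value is found under the original key
print(extract_scalar(raw_record, normalized))  # None: the rewritten key does not exist
```

So the syntax error disappears, but the column would silently be empty unless the destination side renames the key in the same way.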

So we'd need more context to dig deeper:

an example of record data in the raw table

the catalog.json file, to see how the \n appears in the property name
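To gather the second item, a catalog could be scanned for property names containing a newline. This sketch assumes the catalog has a top-level `streams` list whose entries wrap a `stream` object with `name` and `json_schema` fields, and that nested objects declare `properties`; the helper names and the sample catalog are hypothetical.

```python
from typing import Iterator


def find_newline_properties(schema: dict, path: str = "") -> Iterator[str]:
    """Yield dotted paths of properties whose names contain a newline."""
    for name, sub_schema in schema.get("properties", {}).items():
        current = f"{path}.{name}" if path else name
        if "\n" in name:
            yield current
        if isinstance(sub_schema, dict):
            # Recurse into nested object schemas.
            yield from find_newline_properties(sub_schema, current)


def scan_catalog(catalog: dict) -> dict:
    """Map stream name -> list of offending property paths."""
    hits = {}
    for entry in catalog.get("streams", []):
        stream = entry.get("stream", entry)  # tolerate unwrapped stream entries
        found = list(find_newline_properties(stream.get("json_schema", {})))
        if found:
            hits[stream.get("name")] = found
    return hits


# Hypothetical catalog fragment for illustration only.
catalog = {
    "streams": [
        {
            "stream": {
                "name": "example_stream",
                "json_schema": {
                    "properties": {
                        "id": {"type": "string"},
                        "meta": {"type": "object", "properties": {"a\nb": {"type": "string"}}},
                    }
                },
            }
        }
    ]
}
print(scan_catalog(catalog))
```

A real catalog.json file could be loaded with `json.load` and passed to `scan_catalog` the same way.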

marcosmarxm · 2021-11-08T09:33:02Z

I'll close this because #7729 solves the problem in the schema object. Introducing the \n handler in normalization won't solve the problem and could cause confusion in the future.

Tiago Lima added 3 commits November 1, 2021 20:10

Added \n to transform of nammings

deae1d9

Changed bad variable assignment

368fb6f

added unit tests for transform json naming

fe1b7ba

github-actions bot added the normalization label Nov 1, 2021

octavia-squidington-iii added the community label Nov 1, 2021

removed bad autoimport

39a3d48

ChristopheDuong suggested changes Nov 2, 2021


marcosmarxm closed this Nov 8, 2021
