Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newline normalisation #7538

Closed
wants to merge 4 commits into from

Conversation

IamTheLime
Copy link

@IamTheLime IamTheLime commented Nov 1, 2021

What

This fixes 'Unclosed string literals' when extracting a json scalar. Currently the normalised json path is not subbing \n, as such in calls like the following:

json_extract = jinja_call(f"json_extract_array({json_column_name}, {json_path}, {normalized_json_path})")

The generated dbt template will break line and generate an unclosed string literal error

How

This solution "escapes" \n when normalising the json path

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Tiago Lima seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@marcosmarxm
Copy link
Member

can you sign the CLA @IamTheLime ?

@IamTheLime I think this doesn't solve the problem with Stripe connector.

@ChristopheDuong do you think this is useful for normalization module?

Comment on lines 232 to +234
def transform_json_naming(input_name: str) -> str:
result = sub(r"['\"`]", "_", input_name)
result = sub(r"\n", "_", result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function reproduces behavior that certain destination connectors do on special property names following the code here:

So adding this line here is not enough...
and I am not sure about the core issue that is being solved here and why this would be the proper solution?

Do we really expect column names with \n characters btw?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really expect column names with \n characters?

This change may "fix" the SQL syntax error, but is the data properly parsed from the json blob or is it producing empty NULL values for that column? The answer probably boils down to if \n is an important character in the column name or not

So we'd need more context to dig deeper:

  • example of record data in the raw table
  • catalog.json file to see how the \n is appearing there in the property name

@marcosmarxm
Copy link
Member

I'll close this because #7729 solve the problem in the schema object. Introducing the \n handler in normalization wont solve the problem and can cause confusion in the future.

@marcosmarxm marcosmarxm closed this Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants