-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Newline normalisation #7538
Newline normalisation #7538
Conversation
Tiago Lima seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
can you sign the CLA @IamTheLime ? @IamTheLime I think this doesn't solve the problem with Stripe connector. @ChristopheDuong do you think this is useful for normalization module? |
def transform_json_naming(input_name: str) -> str: | ||
result = sub(r"['\"`]", "_", input_name) | ||
result = sub(r"\n", "_", result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function reproduces behavior that certain destination connectors do on special property names following the code here:
Line 45 in 85381bd
public static JsonNode formatJsonPath(final JsonNode root) { |
So adding this line here is not enough...
and I am not sure about the core issue that is being solved here and why this would be the proper solution?
Do we really expect column names with \n
characters btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really expect column names with \n characters?
This change may "fix" the SQL syntax error, but is the data properly parsed from the json blob or is it producing empty NULL values for that column? The answer probably boils down to if \n
is an important character in the column name or not
So we'd need more context to dig deeper:
- example of record data in the raw table
- catalog.json file to see how the
\n
is appearing there in the property name
I'll close this because #7729 solve the problem in the schema object. Introducing the \n handler in normalization wont solve the problem and can cause confusion in the future. |
What
This fixes 'Unclosed string literals' when extracting a json scalar. Currently the normalised json path is not subbing \n, as such in calls like the following:
json_extract = jinja_call(f"json_extract_array({json_column_name}, {json_path}, {normalized_json_path})")
The generated dbt template will break line and generate an unclosed string literal error
How
This solution "escapes" \n when normalising the json path