🐛 Fix normalization issue with quoted & case sensitive columns #9317

ChristopheDuong · 2022-01-05T17:13:21Z

What

Relates to https://github.com/airbytehq/oncall/issues/84

How

It seems MSSQL does not distinguish between column names that differ only by case, even when they are quoted.
(This may also depend on the server's "collation" settings.)

This PR therefore always resolves column name conflicts by lowercasing all identifiers.

Adding the new test case surfaced similar behavior/errors on other destinations (bigquery/mysql), so I'm fixing them there too.


ChristopheDuong · 2022-01-05T17:22:13Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1659526585
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1659526585
🐛

ChristopheDuong · 2022-01-05T17:45:54Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1659612772
❌ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1659612772
🐛

tuliren · 2022-01-05T22:27:55Z

...st_output/airbyte_incremental/scd/test_normalization/multiple_column_names_conflicts_scd.sql

+      "User id",
+      "user id",


Sorry, I don't understand why the first "User id" is not converted to lower case. I think the main change in the Python file is this one:

- if not is_quoted and not self.needs_quotes(input_name):
-     result = input_name.lower()
+ result = input_name.lower()

which seems to convert all names to lower case universally. Can you elaborate?

The file you commented on is the output for Postgres, not MSSQL (I think the MSSQL one is not versioned in git).

In Postgres, the case of a quoted identifier matters (so "User Id", "User id" and "user id" are different column names). So nothing has changed here.

But as the integration test and the reported On-Call issue both show, MSSQL does not distinguish between these three column names with different cases and considers them all equal, and thus conflicting.

So, to mirror this MSSQL behavior in the normalization code, I universally lowercase all column identifiers for MSSQL only, even if they are quoted. As a consequence, normalization now detects column name conflicts and resolves them by renaming the conflicting names with an extra '_1', '_2', etc.

Actually, I'm going to change the PR and isolate this so that casing is normalized only when looking for column name conflicts...

I see. Thanks for the explanation.
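
For readers following the thread, here is a minimal sketch of the idea discussed above: names are compared in lower case to detect conflicts, but the original casing is kept in the result. The function `resolve_column_conflicts` and its suffix strategy are hypothetical illustrations, not the actual code in `destination_name_transformer.py` or `table_name_registry.py`.

```python
from typing import Dict, List


def resolve_column_conflicts(names: List[str]) -> Dict[str, str]:
    """Map each original column name to one that is unique ignoring case.

    Hypothetical helper: casing is normalized only for the comparison,
    the returned names keep the casing of the input.
    """
    taken = set()  # lowercased names that are already assigned
    resolved = {}
    for name in names:
        candidate = name
        suffix = 0
        # Append _1, _2, ... until the lowercased candidate is unused.
        while candidate.lower() in taken:
            suffix += 1
            candidate = f"{name}_{suffix}"
        taken.add(candidate.lower())
        resolved[name] = candidate
    return resolved


columns = ["User Id", "user_id", "User id", "user id", "UserId"]
for original, final in resolve_column_conflicts(columns).items():
    print(f"{original!r} -> {final!r}")
```

With the five column names from the new test case, this keeps "User Id", "user_id" and "UserId" unchanged and renames "User id" to "User id_1" and "user id" to "user id_2".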

ChristopheDuong · 2022-01-06T12:17:21Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1662948830
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1662948830
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            89     64    28%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/cdk/utils/event_timing.py         47      3    94%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     15    55%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        713    397    44%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     77    46%
	 normalization/transform_catalog/destination_name_transformer.py     148      7    95%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 518    331    36%
	 normalization/transform_catalog/table_name_registry.py              174     34    80%
	 normalization/transform_catalog/transform.py                         45     26    42%
	 normalization/transform_catalog/utils.py                             33      7    79%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     32    78%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1247    520    58%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 main_dev_transform_catalog.py                                         3      3     0%
	 main_dev_transform_config.py                                          3      3     0%
	 normalization/__init__.py                                             4      0   100%
	 normalization/destination_type.py                                    13      0   100%
	 normalization/transform_catalog/__init__.py                           2      0   100%
	 normalization/transform_catalog/catalog_processor.py                143     12    92%
	 normalization/transform_catalog/destination_name_transformer.py     148      5    97%
	 normalization/transform_catalog/reserved_keywords.py                 13      0   100%
	 normalization/transform_catalog/stream_processor.py                 518     39    92%
	 normalization/transform_catalog/table_name_registry.py              174     51    71%
	 normalization/transform_catalog/transform.py                         45     30    33%
	 normalization/transform_catalog/utils.py                             33      0   100%
	 normalization/transform_config/__init__.py                            2      0   100%
	 normalization/transform_config/transform.py                         146     45    69%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                              1247    188    85%

ChristopheDuong · 2022-01-06T12:53:56Z

...treams/first_output/airbyte_views/test_normalization/multiple_column_names_conflicts_stg.sql

+    json_value(_airbyte_data, ''$."User Id"'') as "User Id",
+    json_value(_airbyte_data, ''$."user_id"'') as user_id,
+    json_value(_airbyte_data, ''$."User id"'') as "User id_1",
+    json_value(_airbyte_data, ''$."user id"'') as "user id_2",
+    json_value(_airbyte_data, ''$."UserId"'') as userid,


@tuliren here is the output of the new test for MSSQL
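
As a rough illustration of how lines like the ones above could be rendered once the column names are resolved, here is a small sketch. The quoting rule (simple identifiers are lowercased and left unquoted, everything else is double-quoted with its casing kept) is an assumption inferred from this output only, and `render_alias` / `render_select_lines` are hypothetical names, not functions from the normalization code. The doubled single quotes are copied from the output as shown.

```python
import re
from typing import List, Tuple

# Assumption: a "simple" identifier needs no quoting.
SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_alias(name: str) -> str:
    # Simple identifiers are emitted unquoted and lowercased (assumed rule);
    # anything else is double-quoted with its casing preserved.
    if SIMPLE_IDENTIFIER.match(name):
        return name.lower()
    return '"' + name.replace('"', '""') + '"'


def render_select_lines(columns: List[Tuple[str, str]]) -> List[str]:
    """Render one json_value(...) line per (json_key, resolved_name) pair."""
    lines = []
    for json_key, column_name in columns:
        # JSON path with the doubled single quotes seen in the output.
        path = "''$.\"" + json_key + "\"''"
        alias = render_alias(column_name)
        lines.append(f"json_value(_airbyte_data, {path}) as {alias},")
    return lines


pairs = [
    ("User Id", "User Id"),
    ("user_id", "user_id"),
    ("User id", "User id_1"),
    ("user id", "user id_2"),
    ("UserId", "UserId"),
]
print("\n".join(render_select_lines(pairs)))
```

The pairs are the JSON keys and resolved column names taken from the output above, so the printed lines should match it.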

jzcruiser and others added 21 commits December 22, 2021 10:19

add normalization-clickhouse docker build step

c6ae621

Merge branch 'patch-4' of github.com:jzcruiser/airbyte into marcos/test-pr-9029

b88225e

bump normalization version

cc499c2

small changes gradle

a2517a3

Merge branch 'master' into marcos/test-pr-9029

4a51799

fix settings gradle

a57495c

fix eof file

c56ef54

correct clickhouse normalization

f8ccfd6

add test case

82b77d3

Cast cursor to string

df3faf6

Refactor jinja template for scd

a76ba27

Merge branch 'test-pr-9029' into fix-normalization

34fa54d

Fix normalization scd in oracle

5e48f8a

Regen sql

fff86cc

Regen clickhouse

1d9dfb6

Merge remote-tracking branch 'origin/master' into fix-normalization

138bba3

Regen file

f536e48

Re-indent columns

7b303bb

Indent clickhouse files

4aed065

Add failing test case user id

2f3ff7a

Lower case column identifiers for MSSQL

3bc7217

github-actions bot added the normalization label Jan 5, 2022

ChristopheDuong changed the base branch from master to chris/fix-bq-normalization-scd-float January 5, 2022 17:13

ChristopheDuong requested review from tuliren and Phlair January 5, 2022 17:16

jrhizor temporarily deployed to more-secrets January 5, 2022 17:24 Inactive

Fix unit tests

963800e

ChristopheDuong temporarily deployed to more-secrets January 5, 2022 17:47 Inactive

jrhizor temporarily deployed to more-secrets January 5, 2022 17:48 Inactive

tuliren approved these changes Jan 5, 2022

Keep casing and fix tests for mysql/bigquery

277c61f

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 12:12 Inactive

jrhizor temporarily deployed to more-secrets January 6, 2022 12:13 Inactive

Fix unit tests

be75274

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 12:18 Inactive

jrhizor temporarily deployed to more-secrets January 6, 2022 12:19 Inactive

ChristopheDuong added 2 commits January 6, 2022 13:49

Add more models to git versioning

efaca48

regen clickhouse outputs

aaa0013

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 12:52 Inactive

ChristopheDuong commented Jan 6, 2022

add test

54d18fa

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 13:58 Inactive

ChristopheDuong added 2 commits January 6, 2022 15:02

Fix when no cursor

0771ffe

Merge branch 'fix-normalization' into mssql-normalization

0f6c9ac

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 14:07 Inactive

ChristopheDuong changed the title ~~Fix mssql normalization issue with case sensitive columns~~ 🐛 Fix normalization issue with quoted & case sensitive columns Jan 6, 2022

ChristopheDuong added 2 commits January 6, 2022 16:35

format code

7eaf0b7

Include tcp port in Clickhouse destination configuration for normalization

c661811

Base automatically changed from chris/fix-bq-normalization-scd-float to master January 6, 2022 17:49

Merge remote-tracking branch 'origin/master' into mssql-normalization

e49f762

github-actions bot added the area/connectors Connector related issues label Jan 6, 2022

ChristopheDuong temporarily deployed to more-secrets January 6, 2022 17:57 Inactive

ChristopheDuong merged commit c5d4a97 into master Jan 6, 2022

ChristopheDuong deleted the chris/fix-mssql-normalization branch January 6, 2022 17:59

jrhizor mentioned this pull request Jan 7, 2022

Bump Airbyte version from 0.35.3-alpha to 0.35.4-alpha #9353

Merged
