Publish PR 9029: clickhouse normalization #9072
Changes from all commits
c6ae621
b88225e
cc499c2
a2517a3
4a51799
a57495c
c56ef54
f8ccfd6
f24ea4b
a6e4c31
4f9f8ae
@@ -12,11 +12,12 @@

 as (

+-- depends_on: ref('dedup_cdc_excluded_stg')
 with

 input_data as (
 select *
-from _airbyte_test_normalization.dedup_cdc_excluded_ab3
+from _airbyte_test_normalization.dedup_cdc_excluded_stg
 -- dedup_cdc_excluded from test_normalization._airbyte_raw_dedup_cdc_excluded
 ),

@@ -45,15 +46,15 @@ scd_data as (
 _ab_cdc_updated_at,
 _ab_cdc_deleted_at,
 _airbyte_emitted_at as _airbyte_start_at,
-case when _airbyte_active_row_num = 1 and _ab_cdc_deleted_at is null then 1 else 0 end as _airbyte_active_row,
 anyOrNull(_airbyte_emitted_at) over (
 partition by id
 order by
 _airbyte_emitted_at is null asc,
-_airbyte_emitted_at desc,
+_airbyte_emitted_at desc, _ab_cdc_updated_at desc
-ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
+ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
 ) as _airbyte_end_at,
+case when _airbyte_active_row_num = 1 and _ab_cdc_deleted_at is null then 1 else 0 end as _airbyte_active_row,
 _airbyte_ab_id,
 _airbyte_emitted_at,
 _airbyte_dedup_cdc_excluded_hashid
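The window frame `ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING` in the hunk above makes `anyOrNull` read exactly one row: the one immediately before the current row in the partition's sort order. The Python sketch below imitates that behaviour on plain dictionaries to show how `_airbyte_end_at` is derived. It is only an illustration, not the normalization code: the helper name `add_end_at` is hypothetical, and it assumes `_airbyte_emitted_at` is never null, so the `is null asc` sort key is left out.

```python
from itertools import groupby
from operator import itemgetter


def add_end_at(rows):
    """Give each row the _airbyte_emitted_at of the row just before it
    in its partition (partition by id, newest rows first)."""
    out = []
    by_id = sorted(rows, key=itemgetter("id"))
    for _, group in groupby(by_id, key=itemgetter("id")):
        # Newest first; ties on _airbyte_emitted_at are broken by
        # _ab_cdc_updated_at, mirroring the sort key added in the diff.
        ordered = sorted(
            group,
            key=lambda r: (r["_airbyte_emitted_at"], r["_ab_cdc_updated_at"]),
            reverse=True,
        )
        prev = None
        for row in ordered:
            # The first row of a partition has no preceding row, so it gets None.
            end_at = prev["_airbyte_emitted_at"] if prev is not None else None
            out.append({**row, "_airbyte_end_at": end_at})
            prev = row
    return out
```

Under these assumptions the newest row of each `id` keeps a null `_airbyte_end_at`, and every older row ends at the emission time of the row that superseded it.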
@@ -65,7 +66,7 @@ dedup_data as (
 -- additionally, we generate a unique key for the scd table
 row_number() over (
 partition by _airbyte_unique_key, _airbyte_start_at, _airbyte_emitted_at, accurateCastOrNull(_ab_cdc_deleted_at, 'String'), accurateCastOrNull(_ab_cdc_updated_at, 'String')
-order by _airbyte_ab_id
+order by _airbyte_active_row desc, _airbyte_ab_id
 ) as _airbyte_row_num,
 assumeNotNull(hex(MD5(

Review thread on the changed order by line:
is it expected that we're now sorting on
Adding this prevents cases having multiple
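To make the effect of the new sort key concrete, here is a small Python sketch of how putting `_airbyte_active_row desc` ahead of `_airbyte_ab_id` changes which row receives row number 1 within a partition. The sample rows, the `partition` field, and the helper `first_per_partition` are all made up for illustration, and the sketch does not claim to reproduce the exact case the reply refers to.

```python
def first_per_partition(rows, order_key):
    """Return, per partition, the row that would get row number 1."""
    winners = {}
    for row in sorted(rows, key=order_key):
        # setdefault keeps only the first row seen for each partition.
        winners.setdefault(row["partition"], row)
    return winners


# Two hypothetical rows that fall into the same partition.
rows = [
    {"partition": "k1", "_airbyte_active_row": 0, "_airbyte_ab_id": "a"},
    {"partition": "k1", "_airbyte_active_row": 1, "_airbyte_ab_id": "b"},
]

# Old ordering: order by _airbyte_ab_id
old = first_per_partition(rows, lambda r: r["_airbyte_ab_id"])

# New ordering: order by _airbyte_active_row desc, _airbyte_ab_id
new = first_per_partition(
    rows, lambda r: (-r["_airbyte_active_row"], r["_airbyte_ab_id"])
)
```

With the old ordering the inactive row `a` comes first; with the new ordering the active row `b` does, so a filter that keeps row number 1 retains the active row.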
This file was deleted.
For my education: what do ab3 and stg mean?
When you use the incremental + dedup sync mode, dbt creates the stg tables for you. For the other sync modes, data extraction, data type conversion, and hashing are done in sub-queries/tables with the suffixes ab1, ab2, and ab3.
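As a rough illustration of the three intermediate steps named in the reply, the Python sketch below chains an extraction step, a type-conversion step, and a hashing step. It assumes the suffixes map to those steps in the order listed (ab1, ab2, ab3). The function names, the `id` and `name` fields, and the `_airbyte_hashid` column are hypothetical stand-ins, not the models dbt actually generates.

```python
import hashlib
import json


def ab1_extract(raw_json):
    """ab1 (assumed): pull the fields out of the raw JSON blob as-is."""
    data = json.loads(raw_json)
    return {"id": data.get("id"), "name": data.get("name")}


def ab2_cast(record):
    """ab2 (assumed): convert the extracted values to their target types."""
    return {"id": int(record["id"]), "name": str(record["name"])}


def ab3_hash(record):
    """ab3 (assumed): add an MD5 hash computed over the typed columns."""
    joined = "-".join(str(record[key]) for key in sorted(record))
    hashid = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return {**record, "_airbyte_hashid": hashid}


# Hypothetical raw record passed through the three steps in order.
row = ab3_hash(ab2_cast(ab1_extract('{"id": "7", "name": "x"}')))
```

In the incremental + dedup case, the stg table mentioned in the reply takes the place of the final step's output as the input to the SCD model, which matches the change from `dedup_cdc_excluded_ab3` to `dedup_cdc_excluded_stg` in the first hunk.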