Commit d10124a

Add new macros for diff calculation, and unit tests (#99) (#101)
* Add new macros for diff calculation, and unit tests (#99)
* Add macro for new hash-based comparison strategy
* split out SF-focused version of macro
* Fix change to complex object
* Fix overuse of star
* switch from compare rels to compare queries
* provide wrapping parens
* switch to array of columns for PK
* split unit tests into own files, change unit tests to array pk
* tidy up get_comp_bounds
* fix arg rename
* add quick_are_queries_identical and unit tests
* Move data tests into own directory
* Add test for multiple PKs
* fix incorrect unit test configs
* make data types for id and id_2 big enough nums
* Mock event_time response
* fix hardcoded value in quick_are_qs_identical
* Add unit tests for null handling (still broken)
* Rename columns to be more unique
* Steal surrogate key macro from utils
* Use generated surrogate key across the board in place of PK
* rm my profile reference
* Update quick_are_queries_identical.sql
* Add diagram explaining comparison bounds
* Add comments explaining warehouse-specific optimisations
* cross-db support
* subq
* no postgres or redshift for a sec
* add default var values for compare wrappers
* avoid lateral alias reference for BQ
* BQ doesn't support count(arg1, arg2)
* re-enable redshift
* Alias subq for redshift
* remove extra comma
* add row status of nonunique_pk
* remove redundant test and wrapper model
* Create json-y tests for snowflake
* Add workaround for redshift to support count num rows in status
* skip incompatible tests
* Fix redshift lack of bool_or support in window funcs
* add skip exclusions for everything else
* fix incorrect skip tag application
* Move user configs to project.yml from profiles
* Temporarily disable unpassable redshift tests
* add temp skip to circle's config.yml
* forgot tag: method
* Temporarily skip reworked_compare_all_statuses_different_column_set
* Skip another test redshift
* disable unsupported tests BQ
* postgres too?
* Fixes for postgres
* namespace macros
* It's a postgres problem, not a redshift problem
* Handle postgres 63 char limit
* Add databricks
* Rename tests to data_tests
* Found a better workaround for missing count distinct window
* actually call the macro
* disable syntax-failing tests on dbx
* try to install core from main to get sorting fix
* Revert "try to install core from main to get sorting fix" This reverts commit d28f3e1.
* Audit helper code review changes
* add BQ support for quick are queries identical
* explain why using dense_rank
* remove the compile step to avoid compilation error
* Don't throw incompatible quick compare error during parse
* add where clause to check we're not assuming its absence
* enable first basic struct tests
* Skip raising exception during parsing
* json_build_object doesn't work on rs
* changed behaviour redshift
* skip complex structs on rs for now
* temp disable all complex structs
* skip some currently failing bq tests
* Properly exclude tests to skip, add comments
* dbx too
* rename reworked_compare to compare_and_classify_query_results
* Rename files
* rename macro file
* Add relation_focused macros
* Add BQ-specific generate_set_results for hashes, enable json tests
* Implement hash comparisons for BQ and DBX (#103)
* disable tests for unrelated adapters
* Avoid lateral column aliasing
* First cross-db complex struct fixture
* Add final fixtures
* Initial work on dbx compatibility
* remove lateral column alias dbx
* cast everything as string before hashing
* add comment, enable all tests again
* rename to dbt_audit_in_a instead of in_a
* Protect against missing PK columns
* gitignore package-lock.yml
* add dbx variant of simple structs
* Rename private macros to have _ prefix
* Fix get comparison bounds (#104)
* change to getting comparison bounds for queries not relations
* add test for introspective queries
* Make compare query columns multi pk (#105)
* rm packagelock.yml
1 parent 8473293 commit d10124a

64 files changed, +1752 -106 lines changed

.circleci/config.yml

+20 -13
@@ -33,7 +33,7 @@ jobs:
 . dbt_venv/bin/activate

 python -m pip install --upgrade pip setuptools
-python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery
+python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery dbt-databricks

 mkdir -p ~/.dbt
 cp integration_tests/ci/sample.profiles.yml ~/.dbt/profiles.yml
@@ -51,9 +51,8 @@ jobs:
 cd integration_tests
 dbt deps --target postgres
 dbt seed --target postgres --full-refresh
-dbt compile --target postgres
-dbt run --target postgres
-dbt test --target postgres
+dbt run --target postgres --exclude tag:skip+ tag:temporary_skip+
+dbt test --target postgres --exclude tag:skip+ tag:temporary_skip+

 - run:
 name: "Run Tests - Redshift"
@@ -63,9 +62,8 @@ jobs:
 cd integration_tests
 dbt deps --target redshift
 dbt seed --target redshift --full-refresh
-dbt compile --target redshift
-dbt run --target redshift
-dbt test --target redshift
+dbt run --target redshift --exclude tag:skip+ tag:temporary_skip+
+dbt test --target redshift --exclude tag:skip+ tag:temporary_skip+

 - run:
 name: "Run Tests - Snowflake"
@@ -75,9 +73,8 @@ jobs:
 cd integration_tests
 dbt deps --target snowflake
 dbt seed --target snowflake --full-refresh
-dbt compile --target snowflake
-dbt run --target snowflake
-dbt test --target snowflake
+dbt run --target snowflake --exclude tag:skip+ tag:temporary_skip+
+dbt test --target snowflake --exclude tag:skip+ tag:temporary_skip+

 - run:
 name: "Run Tests - BigQuery"
@@ -90,10 +87,19 @@ jobs:
 cd integration_tests
 dbt deps --target bigquery
 dbt seed --target bigquery --full-refresh
-dbt compile --target bigquery
-dbt run --target bigquery --full-refresh
-dbt test --target bigquery
+dbt run --target bigquery --full-refresh --exclude tag:skip+ tag:temporary_skip+
+dbt test --target bigquery --exclude tag:skip+ tag:temporary_skip+

+- run:
+name: "Run Tests - Databricks"
+command: |
+. dbt_venv/bin/activate
+echo `pwd`
+cd integration_tests
+dbt deps --target databricks
+dbt seed --target databricks --full-refresh
+dbt run --target databricks --exclude tag:skip+ tag:temporary_skip+
+dbt test --target databricks --exclude tag:skip+ tag:temporary_skip+

 - save_cache:
 key: deps1-{{ .Branch }}
@@ -115,3 +121,4 @@ workflows:
 - profile-redshift
 - profile-snowflake
 - profile-bigquery
+- profile-databricks
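
The new `--exclude tag:skip+ tag:temporary_skip+` arguments rely on dbt's graph operator: a trailing `+` selects a node together with everything downstream of it. The sketch below illustrates that idea in plain Python. It is not dbt's implementation, and the node names, the `graph` mapping, and the `apply_exclude` helper are all hypothetical.

```python
from collections import deque


def descendants(graph, start):
    """Return `start` plus every node reachable from it."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def apply_exclude(graph, tags, excluded_tags):
    """Roughly mimic `--exclude tag:<t>+`: drop tagged nodes and their descendants."""
    all_nodes = set(graph) | {c for children in graph.values() for c in children}
    dropped = set()
    for node in all_nodes:
        if tags.get(node, set()) & excluded_tags:
            dropped |= descendants(graph, node)
    return sorted(all_nodes - dropped)


# Hypothetical project: each key maps to the nodes directly downstream of it.
graph = {
    "seed_a": ["model_ok", "model_skipped"],
    "model_skipped": ["test_on_skipped"],
    "model_ok": ["test_on_ok"],
}
tags = {"model_skipped": {"skip"}}

selected = apply_exclude(graph, tags, {"skip", "temporary_skip"})
print(selected)
```

In this toy graph, the test attached to the skipped model is dropped along with the model itself, which is presumably why the CI commands use `tag:skip+` rather than `tag:skip`.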

.gitignore

+4 -1
@@ -1,4 +1,7 @@
 target/
 dbt_packages/
 logs/
-logfile
+logfile
+.DS_Store
+package-lock.yml
+integration_tests/package-lock.yml

.vscode/settings.json

+21
@@ -0,0 +1,21 @@
+{
+    "yaml.schemas": {
+        "https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_yml_files-latest.json": [
+            "/**/*.yml",
+            "!profiles.yml",
+            "!dbt_project.yml",
+            "!packages.yml",
+            "!selectors.yml",
+            "!profile_template.yml"
+        ],
+        "https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_project-latest.json": [
+            "dbt_project.yml"
+        ],
+        "https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/selectors-latest.json": [
+            "selectors.yml"
+        ],
+        "https://raw.githubusercontent.com/dbt-labs/dbt-jsonschema/main/schemas/latest/packages-latest.json": [
+            "packages.yml"
+        ]
+    },
+}

integration_tests/ci/sample.profiles.yml

+11 -7
@@ -2,10 +2,6 @@
 # HEY! This file is used in the dbt-audit-helper integrations tests with CircleCI.
 # You should __NEVER__ check credentials into version control. Thanks for reading :)

-config:
-send_anonymous_usage_stats: False
-use_colors: True
-
 integration_tests:
 target: postgres
 outputs:
@@ -27,15 +23,15 @@ integration_tests:
 dbname: "{{ env_var('REDSHIFT_TEST_DBNAME') }}"
 port: "{{ env_var('REDSHIFT_TEST_PORT') | as_number }}"
 schema: audit_helper_integration_tests_redshift
-threads: 1
+threads: 8

 bigquery:
 type: bigquery
 method: service-account
 keyfile: "{{ env_var('BIGQUERY_SERVICE_KEY_PATH') }}"
 project: "{{ env_var('BIGQUERY_TEST_DATABASE') }}"
 schema: audit_helper_integration_tests_bigquery
-threads: 1
+threads: 8

 snowflake:
 type: snowflake
@@ -46,4 +42,12 @@ integration_tests:
 database: "{{ env_var('SNOWFLAKE_TEST_DATABASE') }}"
 warehouse: "{{ env_var('SNOWFLAKE_TEST_WAREHOUSE') }}"
 schema: audit_helper_integration_tests_snowflake
-threads: 1
+threads: 8
+
+databricks:
+type: databricks
+schema: dbt_project_evaluator_integration_tests_databricks
+host: "{{ env_var('DATABRICKS_TEST_HOST') }}"
+http_path: "{{ env_var('DATABRICKS_TEST_HTTP_PATH') }}"
+token: "{{ env_var('DATABRICKS_TEST_ACCESS_TOKEN') }}"
+threads: 10

integration_tests/dbt_project.yml

+11
@@ -17,3 +17,14 @@ clean-targets: # directories to be removed by `dbt clean`

 seeds:
 +quote_columns: false
+
+vars:
+compare_queries_summarize: true
+primary_key_columns_var: ['col1']
+columns_var: ['col1']
+event_time_var:
+quick_are_queries_identical_cols: ['col1']
+
+flags:
+send_anonymous_usage_stats: False
+use_colors: True
@@ -0,0 +1,26 @@
+{%- macro _basic_json_function() -%}
+  {%- if target.type == 'snowflake' -%}
+    object_construct
+  {%- elif target.type == 'bigquery' -%}
+    json_object
+  {%- elif target.type == 'databricks' -%}
+    map
+  {%- elif execute -%}
+    {# Only raise exception if it's actually being called, not during parsing #}
+    {%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}
+  {%- endif -%}
+{%- endmacro -%}
+
+{% macro _complex_json_function(json) %}
+
+  {% if target.type == 'redshift' %}
+    json_parse({{ json }})
+  {% elif target.type == 'databricks' %}
+    from_json({{ json }}, schema_of_json({{ json }}))
+  {% elif target.type in ['snowflake', 'bigquery'] %}
+    parse_json({{ json }})
+  {% elif execute %}
+    {# Only raise exception if it's actually being called, not during parsing #}
+    {%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}
+  {% endif %}
+{% endmacro %}
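
The two macros above pick a warehouse-specific JSON function from `target.type` and raise for an unknown adapter only when `execute` is true, so that parsing the project never fails. A minimal Python analogue of that control flow is sketched below. The function names, the lookup table, and the use of `ValueError` in place of `exceptions.raise_compiler_error` are illustrative assumptions, not part of the package.

```python
# Sketch only: a Python analogue of the Jinja dispatch, not the package's code.
BASIC_JSON_FUNCTIONS = {
    "snowflake": "object_construct",
    "bigquery": "json_object",
    "databricks": "map",
}


def basic_json_function(target_type: str, execute: bool) -> str:
    """Return the SQL function name used to build a simple JSON-like object."""
    name = BASIC_JSON_FUNCTIONS.get(target_type)
    if name is not None:
        return name
    if execute:
        # Stand-in for raise_compiler_error: fail only when actually running.
        raise ValueError(f"Unknown adapter '{target_type}'")
    return ""  # parse phase: render nothing and raise nothing


def complex_json_function(target_type: str, json_sql: str, execute: bool) -> str:
    """Wrap a SQL expression holding JSON text in the adapter's parsing function."""
    if target_type == "redshift":
        return f"json_parse({json_sql})"
    if target_type == "databricks":
        return f"from_json({json_sql}, schema_of_json({json_sql}))"
    if target_type in ("snowflake", "bigquery"):
        return f"parse_json({json_sql})"
    if execute:
        raise ValueError(f"Unknown adapter '{target_type}'")
    return ""


print(basic_json_function("bigquery", execute=True))
print(complex_json_function("databricks", "'{}'", execute=True))
```

Known adapters are matched before `execute` is consulted, so they render the same text at parse time and at run time. Only the fallback branch depends on the phase.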

integration_tests/models/compare_which_columns_differ_exclude_cols.sql

-18
This file was deleted.
@@ -0,0 +1,11 @@
+-- this has no tests, it's just making sure that the introspecive queries for event_time actually run
+
+{{
+    audit_helper.compare_and_classify_query_results(
+        a_query="select * from " ~ ref('unit_test_model_a') ~ " where 1=1",
+        b_query="select * from " ~ ref('unit_test_model_b') ~ " where 1=1",
+        primary_key_columns=['id'],
+        columns=['id', 'col1', 'col2'],
+        event_time='created_at'
+    )
+}}

integration_tests/models/compare_which_columns_differ.sql → integration_tests/models/data_tests/compare_which_columns_differ.sql

+2 -2
@@ -9,9 +9,9 @@ select
 has_difference
 from (

-{{ audit_helper.compare_which_columns_differ(
+{{ audit_helper.compare_which_relation_columns_differ(
 a_relation=a_relation,
 b_relation=b_relation,
-primary_key="id"
+primary_key_columns=["id"]
 ) }}
 ) as macro_output
@@ -0,0 +1,25 @@
+{% set a_relation=ref('data_compare_which_columns_differ_a')%}
+
+{% set b_relation=ref('data_compare_which_columns_differ_b') %}
+
+{% set pk_cols = ['id'] %}
+{% set cols = ['id','value_changes','becomes_not_null','does_not_change'] %}
+
+{% if target.type == 'snowflake' %}
+    {% set pk_cols = pk_cols | map("upper") | list %}
+    {% set cols = cols | map("upper") | list %}
+{% endif %}
+
+select
+    lower(column_name) as column_name,
+    has_difference
+from (
+
+    {{ audit_helper.compare_which_relation_columns_differ(
+        a_relation=a_relation,
+        b_relation=b_relation,
+        primary_key_columns=pk_cols,
+        columns=cols
+    ) }}
+
+) as macro_output
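
The model above upper-cases the column lists on Snowflake, which resolves unquoted identifiers as uppercase, and then applies `lower(column_name)` so the output matches the expected results on every adapter. The Python sketch below shows the same normalisation round trip. The `normalise_columns` helper is hypothetical and only mirrors the `map("upper")` filter chain.

```python
def normalise_columns(columns, target_type):
    """Upper-case column names for Snowflake, leave them unchanged elsewhere."""
    if target_type == "snowflake":
        return [c.upper() for c in columns]
    return list(columns)


cols = ["id", "value_changes", "becomes_not_null", "does_not_change"]

snowflake_cols = normalise_columns(cols, "snowflake")
postgres_cols = normalise_columns(cols, "postgres")

# Lower-casing the reported names makes the result adapter-independent,
# as lower(column_name) does in the model.
reported = [c.lower() for c in snowflake_cols]
print(snowflake_cols)
print(reported == postgres_cols)
```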

integration_tests/models/schema.yml → integration_tests/models/data_tests/schema.yml

+19 -19
@@ -2,96 +2,96 @@ version: 2

 models:
 - name: compare_queries
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_relations_without_exclude')

 - name: compare_queries_concat_pk_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_without_summary')

 - name: compare_queries_with_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_with_summary')

 - name: compare_queries_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_without_summary')

 - name: compare_relations_with_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_with_summary')

 - name: compare_relations_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_without_summary')

 - name: compare_relations_with_exclude
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_relations_with_exclude')

 - name: compare_relations_without_exclude
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_relations_without_exclude')

 - name: compare_all_columns_with_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_with_summary')

 - name: compare_all_columns_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_without_summary')

 - name: compare_all_columns_concat_pk_with_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_concat_pk_with_summary')

 - name: compare_all_columns_concat_pk_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_concat_pk_without_summary')

 - name: compare_all_columns_with_summary_and_exclude
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_with_summary_and_exclude')

 - name: compare_all_columns_where_clause
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_all_columns_where_clause')

 - name: compare_relation_columns
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_relation_columns')

 - name: compare_relations_concat_pk_without_summary
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_without_summary')

 - name: compare_which_columns_differ
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_which_columns_differ')

 - name: compare_which_columns_differ_exclude_cols
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_which_columns_differ_exclude_cols')

 - name: compare_row_counts
-tests:
+data_tests:
 - dbt_utils.equality:
 compare_model: ref('expected_results__compare_row_counts')
@@ -0,0 +1 @@
+select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at

@@ -0,0 +1 @@
+select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at
