
Commit 1be6f26

Merge pull request #91 from dbt-labs/dave/small-tweaks
Dave/small tweaks
2 parents 907845d + 3576f1d commit 1be6f26

5 files changed, +53 -33 lines changed

README.md

+30-17
@@ -3,12 +3,19 @@
 Useful macros when performing data audits
 
 # Contents
-* [compare_relations](#compare_relations-source)
-* [compare_queries](#compare_queries-source)
-* [compare_column_values](#compare_column_values-source)
-* [compare_relation_columns](#compare_relation_columns-source)
-* [compare_all_columns](#compare_all_columns-source)
-* [compare_column_values_verbose](#compare_column_values_verbose-source)
+- [dbt-audit-helper](#dbt-audit-helper)
+- [Contents](#contents)
+- [Installation instructions](#installation-instructions)
+- [Macros](#macros)
+- [compare\_relations (source)](#compare_relations-source)
+- [compare\_queries (source)](#compare_queries-source)
+- [compare\_column\_values (source)](#compare_column_values-source)
+- [Usage:](#usage)
+- [Advanced usage - dbt Cloud:](#advanced-usage---dbt-cloud)
+- [compare\_relation\_columns (source)](#compare_relation_columns-source)
+- [compare\_all\_columns (source)](#compare_all_columns-source)
+- [Usage:](#usage-1)
+- [Arguments:](#arguments)
 
 # Installation instructions
 New to dbt packages? Read more about them [here](https://docs.getdbt.com/docs/building-a-dbt-project/package-management/).
@@ -71,6 +78,7 @@ Arguments:
 results for row-by-row validation.
 * `summarize` (optional): Allows you to switch between a summary or detailed view
 of the compared data. Accepts `true` or `false` values. Defaults to `true`.
+* `limit` (optional): Allows you to limit the number of rows returned when summarize=False. Defaults to `None` (no limit).
 
 ## compare_queries ([source](macros/compare_queries.sql))
 Super similar to `compare_relations`, except it takes two select statements. This macro is useful when:
@@ -107,8 +115,13 @@ Super similar to `compare_relations`, except it takes two select statements. Thi
 ```
 
 Arguments:
-* `summarize` (optional): Allows you to switch between a summary or detaied view
-of the compared data. Accepts `true` or `false` vaules. Defaults to `true`.
+
+* `a_query` and `b_query`: The queries you want to compare.
+* `exclude_columns` (optional): Any columns you wish to exclude from the
+validation.
+* `summarize` (optional): Allows you to switch between a summary or detailed view
+of the compared data. Accepts `true` or `false` values. Defaults to `true`.
+* `limit` (optional): Allows you to limit the number of rows returned when summarize=False. Defaults to `null` (no limit).
 
 ## compare_column_values ([source](macros/compare_column_values.sql))
 This macro will return a query, that, when executed, compares a column across
@@ -159,6 +172,7 @@ number of your records don't match.
 **Usage notes:**
 * `primary_key` must be a unique key in both tables, otherwise the join won't
 work as expected.
+* `emojis` is a boolean argument that defaults to `true` and displays ✅, 🤷 and ❌ for easier visual scanning. If you don't want to include emojis in the output, set it to `false`.
 
 
 ### Advanced usage - dbt Cloud:
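As an illustration of what the new `emojis` argument changes, the Python sketch below rebuilds the `match_status` labels that `compare_column_values` renders, in the order of its CASE branches. `match_status_labels` is a hypothetical helper, not part of audit_helper; it only imitates the Jinja string templating (`{% if emojis %}✅: {% endif %}...` plus the `a_relation_name`/`b_relation_name` substitutions) and runs no SQL.

```python
# Hypothetical helper (not part of audit_helper): imitates how the updated
# compare_column_values macro templates its match_status labels.

def match_status_labels(emojis=True, a_relation_name="a", b_relation_name="b"):
    """Return the rendered labels in the same order as the macro's CASE branches."""
    branches = [
        ("✅", "perfect match"),
        ("✅", "both are null"),
        ("🤷", f"missing from {a_relation_name}"),
        ("🤷", f"missing from {b_relation_name}"),
        ("🤷", f"value is null in {a_relation_name} only"),
        ("🤷", f"value is null in {b_relation_name} only"),
        ("❌", "values do not match"),
    ]
    # '{% if emojis %}<emoji>: {% endif %}' only adds the prefix when emojis is truthy
    return [(f"{emoji}: " if emojis else "") + text for emoji, text in branches]


if __name__ == "__main__":
    for label in match_status_labels(emojis=False, a_relation_name="legacy", b_relation_name="dbt"):
        print(label)
```

For example, `match_status_labels(emojis=False, a_relation_name="legacy", b_relation_name="dbt")` yields plain labels such as `missing from legacy` with no emoji prefix.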
@@ -202,19 +216,18 @@ The ``.print_table()`` function is not compatible with dbt Cloud so an adjustmen
 This macro will return a query, that, when executed, compares the ordinal_position
 and data_types of columns in two [Relations](https://docs.getdbt.com/docs/api-variable#section-relation).
 
-| column_name | a_ordinal_position | b_ordinal_position | a_data_type | b_data_type |
-|-------------|--------------------|--------------------|-------------------|-------------------|
-| order_id | 1 | 1 | integer | integer |
-| customer_id | 2 | 2 | integer | integer |
-| order_date | 3 | 3 | timestamp | date |
-| status | 4 | 5 | character varying | character varying |
-| amount | 5 | 4 | bigint | bigint |
-
+| column_name | a_ordinal_position | b_ordinal_position | a_data_type | b_data_type | has_ordinal_position_match | has_data_type_match | in_a_only | in_b_only | in_both |
+|-------------|--------------------|--------------------|-------------------|-------------------| -------------------------- | ------------------- | --------- | --------- | ------- |
+| order_id | 1 | 1 | integer | integer | True | True | False | False | True |
+| customer_id | 2 | 2 | integer | integer | True | True | False | False | True |
+| order_date | 3 | 3 | timestamp | date | True | False | False | False | True |
+| status | 4 | 5 | character varying | character varying | False | True | False | False | True |
+| amount | 5 | 4 | bigint | bigint | False | True | False | False | True |
 
 This is especially useful in two situations:
 1. Comparing a new version of a relation with an old one, to make sure that the
 structure is the same
-2. Helping figure out why a `union` of two relations won't work (often because
+1. Helping figure out why a `union` of two relations won't work (often because
 the data types are different)
 
 For example, in the above result set, we can see that `status` and `amount` have

macros/compare_column_values.sql

+10-10
@@ -1,8 +1,8 @@
-{% macro compare_column_values(a_query, b_query, primary_key, column_to_compare) -%}
-{{ return(adapter.dispatch('compare_column_values', 'audit_helper')(a_query, b_query, primary_key, column_to_compare)) }}
+{% macro compare_column_values(a_query, b_query, primary_key, column_to_compare, emojis=True, a_relation_name='a', b_relation_name='b') -%}
+{{ return(adapter.dispatch('compare_column_values', 'audit_helper')(a_query, b_query, primary_key, column_to_compare, emojis, a_relation_name, b_relation_name)) }}
 {%- endmacro %}
 
-{% macro default__compare_column_values(a_query, b_query, primary_key, column_to_compare) -%}
+{% macro default__compare_column_values(a_query, b_query, primary_key, column_to_compare, emojis, a_relation_name, b_relation_name) -%}
 with a_query as (
 {{ a_query }}
 ),
@@ -17,13 +17,13 @@ joined as (
 a_query.{{ column_to_compare }} as a_query_value,
 b_query.{{ column_to_compare }} as b_query_value,
 case
-when a_query.{{ column_to_compare }} = b_query.{{ column_to_compare }} then '✅: perfect match'
-when a_query.{{ column_to_compare }} is null and b_query.{{ column_to_compare }} is null then '✅: both are null'
-when a_query.{{ primary_key }} is null then '🤷: missing from a'
-when b_query.{{ primary_key }} is null then '🤷: missing from b'
-when a_query.{{ column_to_compare }} is null then '🤷: value is null in a only'
-when b_query.{{ column_to_compare }} is null then '🤷: value is null in b only'
-when a_query.{{ column_to_compare }} != b_query.{{ column_to_compare }} then '🙅: ‍values do not match'
+when a_query.{{ column_to_compare }} = b_query.{{ column_to_compare }} then '{% if emojis %}✅: {% endif %}perfect match'
+when a_query.{{ column_to_compare }} is null and b_query.{{ column_to_compare }} is null then '{% if emojis %}✅: {% endif %}both are null'
+when a_query.{{ primary_key }} is null then '{% if emojis %}🤷: {% endif %}missing from {{ a_relation_name }}'
+when b_query.{{ primary_key }} is null then '{% if emojis %}🤷: {% endif %}missing from {{ b_relation_name }}'
+when a_query.{{ column_to_compare }} is null then '{% if emojis %}🤷: {% endif %}value is null in {{ a_relation_name }} only'
+when b_query.{{ column_to_compare }} is null then '{% if emojis %}🤷: {% endif %}value is null in {{ b_relation_name }} only'
+when a_query.{{ column_to_compare }} != b_query.{{ column_to_compare }} then '{% if emojis %}❌: {% endif %}‍values do not match'
 else 'unknown' -- this should never happen
 end as match_status,
 case
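The order of the CASE branches in `default__compare_column_values` matters, because SQL's `=` and `!=` evaluate to NULL (not true) when either side is NULL. The Python function below is a hypothetical model of that CASE expression: `None` stands in for SQL NULL, and a row missing from one side is represented by a `None` primary key and value, as it would look after an outer join (the join itself is not shown in this diff). Emoji prefixes are left out; this is a sketch for reasoning about the labels, not the macro itself.

```python
# Hypothetical model of the CASE expression in default__compare_column_values.
# None stands in for SQL NULL; a row absent from one side has a None primary key
# and a None value. Emoji prefixes are omitted.

def sql_eq(x, y):
    """SQL '=': NULL (None) if either side is NULL, otherwise a boolean."""
    if x is None or y is None:
        return None
    return x == y


def match_status(a_pk, a_value, b_pk, b_value, a_name="a", b_name="b"):
    # CASE tests branches top to bottom; a NULL condition counts as "not matched".
    if sql_eq(a_value, b_value) is True:
        return "perfect match"
    if a_value is None and b_value is None:
        return "both are null"
    if a_pk is None:
        return f"missing from {a_name}"
    if b_pk is None:
        return f"missing from {b_name}"
    if a_value is None:
        return f"value is null in {a_name} only"
    if b_value is None:
        return f"value is null in {b_name} only"
    if sql_eq(a_value, b_value) is False:  # SQL '!=' is true exactly when '=' is false
        return "values do not match"
    return "unknown"  # mirrors the macro's "this should never happen" else branch
```

One consequence of evaluating top to bottom: a row that is missing from `a` and whose value in `b` is also null reaches `both are null` before the `missing from a` branch is tested.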

macros/compare_queries.sql

+7-3
@@ -1,8 +1,8 @@
-{% macro compare_queries(a_query, b_query, primary_key=None, summarize=true) -%}
-{{ return(adapter.dispatch('compare_queries', 'audit_helper')(a_query, b_query, primary_key, summarize)) }}
+{% macro compare_queries(a_query, b_query, primary_key=None, summarize=true, limit=None) -%}
+{{ return(adapter.dispatch('compare_queries', 'audit_helper')(a_query, b_query, primary_key, summarize, limit)) }}
 {%- endmacro %}
 
-{% macro default__compare_queries(a_query, b_query, primary_key=None, summarize=true) %}
+{% macro default__compare_queries(a_query, b_query, primary_key=None, summarize=true, limit=None) %}
 
 with a as (
 
@@ -106,5 +106,9 @@ final as (
 {%- endif %}
 
 select * from final
+{%- if limit and not summarize %}
+limit {{ limit }}
+{%- endif %}
+
 
 {% endmacro %}
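The three template lines added at the end of `default__compare_queries` decide when a `limit` clause is appended. The hypothetical `render_tail` function below mirrors that condition in Python; whitespace is approximate, since Jinja's `{%-` trimming is not modeled exactly. Because the test relies on truthiness, `limit=0` behaves like no limit, and any `limit` is ignored while `summarize` is true.

```python
# Hypothetical model of the tail of default__compare_queries after this change:
#   select * from final
#   {%- if limit and not summarize %}
#   limit {{ limit }}
#   {%- endif %}
# Whitespace is approximate; the condition is what this sketch demonstrates.

def render_tail(summarize=True, limit=None):
    sql = "select * from final"
    # Truthiness as in Jinja: None and 0 are both falsy, so neither adds a clause.
    if limit and not summarize:
        sql += f"\nlimit {limit}"
    return sql


if __name__ == "__main__":
    print(render_tail(summarize=False, limit=100))
```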

macros/compare_relation_columns.sql

+4-1
@@ -19,7 +19,10 @@ select
 a_cols.data_type as a_data_type,
 b_cols.data_type as b_data_type,
 coalesce(a_cols.ordinal_position = b_cols.ordinal_position, false) as has_ordinal_position_match,
-coalesce(a_cols.data_type = b_cols.data_type, false) as has_data_type_match
+coalesce(a_cols.data_type = b_cols.data_type, false) as has_data_type_match,
+a_cols.data_type is not null and b_cols.data_type is null as in_a_only,
+b_cols.data_type is not null and a_cols.data_type is null as in_b_only,
+b_cols.data_type is not null and a_cols.data_type is not null as in_both
 from a_cols
 full outer join b_cols using (column_name)
 order by coalesce(a_cols.ordinal_position, b_cols.ordinal_position)
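To check the new boolean columns against the example table in the README, the Python sketch below models `compare_relation_columns` with plain dictionaries: the `full outer join ... using (column_name)` becomes a union of keys, a missing side becomes `None` (SQL NULL), and `coalesce(x = y, false)` becomes a comparison that is false whenever a side is missing. The input data is copied from the README example; `compare_columns` itself is a hypothetical helper, not audit_helper code.

```python
# Hypothetical model of compare_relation_columns, using the README example data.
# Each relation maps column_name -> (ordinal_position, data_type).
A_COLS = {
    "order_id": (1, "integer"),
    "customer_id": (2, "integer"),
    "order_date": (3, "timestamp"),
    "status": (4, "character varying"),
    "amount": (5, "bigint"),
}
B_COLS = {
    "order_id": (1, "integer"),
    "customer_id": (2, "integer"),
    "order_date": (3, "date"),
    "status": (5, "character varying"),
    "amount": (4, "bigint"),
}


def compare_columns(a_cols, b_cols):
    rows = []
    # full outer join ... using (column_name): every column name from either side
    for name in set(a_cols) | set(b_cols):
        a_pos, a_type = a_cols.get(name, (None, None))
        b_pos, b_type = b_cols.get(name, (None, None))
        rows.append({
            "column_name": name,
            "a_ordinal_position": a_pos,
            "b_ordinal_position": b_pos,
            # coalesce(a = b, false): a comparison against a missing side is false
            "has_ordinal_position_match": a_pos is not None and a_pos == b_pos,
            "has_data_type_match": a_type is not None and a_type == b_type,
            "in_a_only": a_type is not None and b_type is None,
            "in_b_only": b_type is not None and a_type is None,
            "in_both": a_type is not None and b_type is not None,
        })

    # order by coalesce(a_cols.ordinal_position, b_cols.ordinal_position)
    def sort_key(row):
        a_pos = row["a_ordinal_position"]
        return a_pos if a_pos is not None else row["b_ordinal_position"]

    rows.sort(key=sort_key)
    return rows
```

Run on the README data, every column is `in_both`, `order_date` fails only the data-type check, and `status` and `amount` fail only the ordinal-position check, matching the example table.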

macros/compare_relations.sql

+2-2
@@ -1,4 +1,4 @@
-{% macro compare_relations(a_relation, b_relation, exclude_columns=[], primary_key=None, summarize=true) %}
+{% macro compare_relations(a_relation, b_relation, exclude_columns=[], primary_key=None, summarize=true, limit=None) %}
 
 {% set column_names = dbt_utils.get_filtered_columns_in_relation(from=a_relation, except=exclude_columns) %}
 
@@ -29,6 +29,6 @@ select
 from {{ b_relation }}
 {% endset %}
 
-{{ audit_helper.compare_queries(a_query, b_query, primary_key, summarize) }}
+{{ audit_helper.compare_queries(a_query, b_query, primary_key, summarize, limit) }}
 
 {% endmacro %}
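The change to `compare_relations` is a pass-through: it accepts `limit` and forwards it to `compare_queries`. The hypothetical sketch below shows that flow in Python under stated assumptions: the column list is passed in as a plain list and comes from `a_relation` alone (as in the `get_filtered_columns_in_relation(from=a_relation, ...)` call), excluded columns are matched by exact name (an assumption; the real `dbt_utils` helper may treat case differently), and the select statements are simplified, since the macro body between these two hunks is not shown in this diff.

```python
# Hypothetical sketch of how compare_relations prepares arguments for
# compare_queries. Assumptions: exact-name exclusion and a simplified
# select list; the macro body between the two hunks is not shown in the diff.

def filtered_columns(a_columns, exclude_columns):
    # Columns come from a_relation only, mirroring get_filtered_columns_in_relation(from=a_relation, ...)
    return [c for c in a_columns if c not in exclude_columns]


def build_select(relation, column_names):
    return "select " + ", ".join(column_names) + " from " + relation


def compare_relations_args(a_relation, b_relation, a_columns, exclude_columns=(),
                           primary_key=None, summarize=True, limit=None):
    column_names = filtered_columns(a_columns, exclude_columns)
    return {
        # The same column list is selected from both relations
        "a_query": build_select(a_relation, column_names),
        "b_query": build_select(b_relation, column_names),
        # limit is forwarded unchanged alongside the existing arguments
        "primary_key": primary_key,
        "summarize": summarize,
        "limit": limit,
    }
```

Because the column list is taken from `a_relation` only, every selected column must also exist in `b_relation` for the generated `b_query` to run.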
