|
3 | 3 | Useful macros when performing data audits
|
4 | 4 |
|
5 | 5 | # Contents
|
6 |
| -* [compare_relations](#compare_relations-source) |
7 |
| -* [compare_queries](#compare_queries-source) |
8 |
| -* [compare_column_values](#compare_column_values-source) |
9 |
| -* [compare_relation_columns](#compare_relation_columns-source) |
10 |
| -* [compare_all_columns](#compare_all_columns-source) |
11 |
| -* [compare_column_values_verbose](#compare_column_values_verbose-source) |
| 6 | +- [dbt-audit-helper](#dbt-audit-helper) |
| 7 | +- [Contents](#contents) |
| 8 | +- [Installation instructions](#installation-instructions) |
| 9 | +- [Macros](#macros) |
| 10 | + - [compare\_relations (source)](#compare_relations-source) |
| 11 | + - [compare\_queries (source)](#compare_queries-source) |
| 12 | + - [compare\_column\_values (source)](#compare_column_values-source) |
| 13 | + - [Usage:](#usage) |
| 14 | + - [Advanced usage - dbt Cloud:](#advanced-usage---dbt-cloud) |
| 15 | + - [compare\_relation\_columns (source)](#compare_relation_columns-source) |
| 16 | + - [compare\_all\_columns (source)](#compare_all_columns-source) |
| 17 | + - [Usage:](#usage-1) |
| 18 | + - [Arguments:](#arguments) |
12 | 19 |
|
13 | 20 | # Installation instructions
|
14 | 21 | New to dbt packages? Read more about them [here](https://docs.getdbt.com/docs/building-a-dbt-project/package-management/).
|
@@ -71,6 +78,7 @@ Arguments:
|
71 | 78 | results for row-by-row validation.
|
72 | 79 | * `summarize` (optional): Allows you to switch between a summary or detailed view
|
73 | 80 | of the compared data. Accepts `true` or `false` values. Defaults to `true`.
|
| 81 | +* `limit` (optional): Allows you to limit the number of rows returned when summarize=False. Defaults to `None` (no limit). |
74 | 82 |
|
75 | 83 | ## compare_queries ([source](macros/compare_queries.sql))
|
76 | 84 | Super similar to `compare_relations`, except it takes two select statements. This macro is useful when:
|
@@ -107,8 +115,13 @@ Super similar to `compare_relations`, except it takes two select statements. Thi
|
107 | 115 | ```
|
108 | 116 |
|
109 | 117 | Arguments:
|
110 |
| -* `summarize` (optional): Allows you to switch between a summary or detaied view |
111 |
| - of the compared data. Accepts `true` or `false` vaules. Defaults to `true`. |
| 118 | + |
| 119 | +* `a_query` and `b_query`: The queries you want to compare. |
| 120 | +* `exclude_columns` (optional): Any columns you wish to exclude from the |
| 121 | + validation. |
| 122 | +* `summarize` (optional): Allows you to switch between a summary or detailed view |
| 123 | + of the compared data. Accepts `true` or `false` values. Defaults to `true`. |
| 124 | +* `limit` (optional): Allows you to limit the number of rows returned when summarize=False. Defaults to `null` (no limit). |
112 | 125 |
|
113 | 126 | ## compare_column_values ([source](macros/compare_column_values.sql))
|
114 | 127 | This macro will return a query, that, when executed, compares a column across
|
@@ -159,6 +172,7 @@ number of your records don't match.
|
159 | 172 | **Usage notes:**
|
160 | 173 | * `primary_key` must be a unique key in both tables, otherwise the join won't
|
161 | 174 | work as expected.
|
| 175 | +* `emojis` is a boolean argument that defaults to `true` and displays ✅, 🤷 and ❌ for easier visual scanning. If you don't want to include emojis in the output, set it to `false`. |
162 | 176 |
|
163 | 177 |
|
164 | 178 | ### Advanced usage - dbt Cloud:
|
@@ -202,19 +216,18 @@ The ``.print_table()`` function is not compatible with dbt Cloud so an adjustmen
|
202 | 216 | This macro will return a query, that, when executed, compares the ordinal_position
|
203 | 217 | and data_types of columns in two [Relations](https://docs.getdbt.com/docs/api-variable#section-relation).
|
204 | 218 |
|
205 |
| -| column_name | a_ordinal_position | b_ordinal_position | a_data_type | b_data_type | |
206 |
| -|-------------|--------------------|--------------------|-------------------|-------------------| |
207 |
| -| order_id | 1 | 1 | integer | integer | |
208 |
| -| customer_id | 2 | 2 | integer | integer | |
209 |
| -| order_date | 3 | 3 | timestamp | date | |
210 |
| -| status | 4 | 5 | character varying | character varying | |
211 |
| -| amount | 5 | 4 | bigint | bigint | |
212 |
| - |
| 219 | +| column_name | a_ordinal_position | b_ordinal_position | a_data_type | b_data_type | has_ordinal_position_match | has_data_type_match | in_a_only | in_b_only | in_both | |
| 220 | +|-------------|--------------------|--------------------|-------------------|-------------------| -------------------------- | ------------------- | --------- | --------- | ------- | |
| 221 | +| order_id | 1 | 1 | integer | integer | True | True | False | False | True | |
| 222 | +| customer_id | 2 | 2 | integer | integer | True | True | False | False | True | |
| 223 | +| order_date | 3 | 3 | timestamp | date | True | False | False | False | True | |
| 224 | +| status | 4 | 5 | character varying | character varying | False | True | False | False | True | |
| 225 | +| amount | 5 | 4 | bigint | bigint | False | True | False | False | True | |
213 | 226 |
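The meaning of the new boolean columns in the table above can be illustrated with a small Python sketch. This is not the macro's implementation (the macro generates SQL); the `Column` type and the `compare_columns` function are hypothetical names used only for illustration. The sketch also assumes that a column missing from one relation counts as a non-match for both `has_ordinal_position_match` and `has_data_type_match`, which the source does not state.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for one row of a relation's column metadata."""
    name: str
    ordinal_position: int
    data_type: str


def compare_columns(a_cols, b_cols):
    """Return one dict per column name found in either relation.

    Assumption: a column missing from one side is reported as a
    non-match (False) for both of the has_*_match flags.
    """
    a_by_name = {c.name: c for c in a_cols}
    b_by_name = {c.name: c for c in b_cols}
    # Keep first-seen order: columns of a first, then any extras from b.
    names = list(dict.fromkeys([c.name for c in a_cols] + [c.name for c in b_cols]))

    rows = []
    for name in names:
        a = a_by_name.get(name)
        b = b_by_name.get(name)
        in_both = a is not None and b is not None
        rows.append({
            "column_name": name,
            "a_ordinal_position": a.ordinal_position if a else None,
            "b_ordinal_position": b.ordinal_position if b else None,
            "a_data_type": a.data_type if a else None,
            "b_data_type": b.data_type if b else None,
            "has_ordinal_position_match": in_both and a.ordinal_position == b.ordinal_position,
            "has_data_type_match": in_both and a.data_type == b.data_type,
            "in_a_only": a is not None and b is None,
            "in_b_only": a is None and b is not None,
            "in_both": in_both,
        })
    return rows


# The two relations from the example table.
a_relation = [
    Column("order_id", 1, "integer"),
    Column("customer_id", 2, "integer"),
    Column("order_date", 3, "timestamp"),
    Column("status", 4, "character varying"),
    Column("amount", 5, "bigint"),
]
b_relation = [
    Column("order_id", 1, "integer"),
    Column("customer_id", 2, "integer"),
    Column("order_date", 3, "date"),
    Column("status", 5, "character varying"),
    Column("amount", 4, "bigint"),
]

for row in compare_columns(a_relation, b_relation):
    print(row)
```

Filtering the result on `has_data_type_match` being false is a quick way to find the columns that would break a `union`.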
|
214 | 227 | This is especially useful in two situations:
|
215 | 228 | 1. Comparing a new version of a relation with an old one, to make sure that the
|
216 | 229 | structure is the same
|
217 |
| -2. Helping figure out why a `union` of two relations won't work (often because |
| 230 | +1. Helping figure out why a `union` of two relations won't work (often because |
218 | 231 | the data types are different)
|
219 | 232 |
|
220 | 233 | For example, in the above result set, we can see that `status` and `amount` have
|