Skip to content

Commit b564211

Browse files
jasnonazstumelius
andauthored
Cherry Pick: not null proportion schema test (#411)
* Add not_null_proportion schema test and related integration tests * Update CHANGELOG * Fix csv formatting and numeric typecasting Co-authored-by: Simo Tumelius <simo.tumelius@gmail.com>
1 parent 5abb160 commit b564211

File tree

5 files changed

+66
-0
lines changed

5 files changed

+66
-0
lines changed

CHANGELOG.md

+1
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ If you were relying on the position to match up your optional arguments, this ma
5656
## Features
5757
* Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/))
5858
* Add `slugify` macro, and use it in the pivot macro. :rotating_light: This macro uses the `re` module, which is only available in dbt v0.19.0+. As a result, this feature introduces a breaking change. ([#314](https://github.com/fishtown-analytics/dbt-utils/pull/314))
59+
* Add `not_null_proportion` schema test that allows the user to specify the minimum (`at_least`) tolerated proportion (e.g., `0.95`) of non-null values
5960

6061
## Under the hood
6162
* Update the default implementation of concat macro to use `||` operator ([#373](https://github.com/fishtown-analytics/dbt-utils/pull/314) from [@ChristopheDuong](https://github.com/ChristopheDuong)). Note this may be a breaking change for adapters that support `concat()` but not `||`, such as Apache Spark.

README.md

+17
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ Check [dbt Hub](https://hub.getdbt.com/fishtown-analytics/dbt_utils/latest/) for
1616
- [cardinality_equality](#cardinality_equality-source)
1717
- [unique_where](#unique_where-source)
1818
- [not_null_where](#not_null_where-source)
19+
- [not_null_proportion](#not_null_proportion-source)
1920
- [relationships_where](#relationships_where-source)
2021
- [mutually_exclusive_ranges](#mutually_exclusive_ranges-source)
2122
- [unique_combination_of_columns](#unique_combination_of_columns-source)
@@ -252,6 +253,22 @@ models:
252253
where: "_deleted = false"
253254
```
254255

256+
#### not_null_proportion ([source](macros/schema_tests/not_null_proportion.sql))
257+
This test validates that the proportion of non-null values present in a column is between a specified range [`at_least`, `at_most`] where `at_most` is an optional argument (default: `1.0`).
258+
259+
**Usage:**
260+
```yaml
261+
version: 2
262+
263+
models:
264+
- name: my_model
265+
columns:
266+
- name: id
267+
tests:
268+
- dbt_utils.not_null_proportion:
269+
at_least: 0.95
270+
```
271+
255272
#### not_accepted_values ([source](macros/schema_tests/not_accepted_values.sql))
256273
This test validates that there are no rows that match the given values.
257274

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
point_5,point_9
2+
1,1
3+
,2
4+
,3
5+
4,4
6+
5,5
7+
6,6
8+
,7
9+
,8
10+
,
11+
10,10

integration_tests/models/schema_tests/schema.yml

+11
Original file line numberDiff line numberDiff line change
@@ -157,3 +157,14 @@ models:
157157
inclusive: true
158158
where: "id <> -1"
159159

160+
- name: data_not_null_proportion
161+
columns:
162+
- name: point_5
163+
tests:
164+
- dbt_utils.not_null_proportion:
165+
at_least: 0.5
166+
at_most: 0.5
167+
- name: point_9
168+
tests:
169+
- dbt_utils.not_null_proportion:
170+
at_least: 0.9
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
{% macro test_not_null_proportion(model) %}
2+
{{ return(adapter.dispatch('test_not_null_proportion', packages = dbt_utils._get_utils_namespaces())(model, **kwargs)) }}
3+
{% endmacro %}
4+
5+
{% macro default__test_not_null_proportion(model) %}
6+
7+
{% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}
8+
{% set at_least = kwargs.get('at_least', kwargs.get('arg')) %}
9+
{% set at_most = kwargs.get('at_most', kwargs.get('arg', 1)) %}
10+
11+
with validation as (
12+
select
13+
sum(case when {{ column_name }} is null then 0 else 1 end) / cast(count(*) as numeric) as not_null_proportion
14+
from {{ model }}
15+
),
16+
validation_errors as (
17+
select
18+
not_null_proportion
19+
from validation
20+
where not_null_proportion < {{ at_least }} or not_null_proportion > {{ at_most }}
21+
)
22+
select
23+
count(*)
24+
from validation_errors
25+
26+
{% endmacro %}

0 commit comments

Comments
 (0)