✨ Show property dbt DQ checks in table details as a RAG score #1414
Comments
Most catalogues use the following colours and content for ratings:

Idea 1
Idea 2
Core data quality dimensions

This section describes the six data quality dimensions as defined by DAMA UK, and provides examples of their application. These examples are taken (and sometimes adapted) from the DAMA UK Working Group “Defining Data Quality Dimensions” paper.

Completeness

Completeness describes the degree to which records are present. For a data set to be complete, all records are included, and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated. It is important not to confuse the completeness of data with its accuracy. A complete data set may have incorrect values in fields, making it less accurate.

Example of application

A school collects forms from parents on emergency contact telephone numbers. There are 300 students, but 294 responses are collected and recorded. 294/300 x 100 = 98%. The emergency contact telephone number field is therefore 98% complete. However, these phone numbers may not all be correct, so the telephone number field is not necessarily accurate.

Uniqueness

Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once. Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.

Example of application

A school has 120 current students and 380 former students (i.e. 500 in total). The student database shows 501 different student records. This includes Fred Smith and Freddy Smith as separate records, despite only one student at the school named Fred Smith. This shows that the data set has a uniqueness across all records of 500/501 x 100 = 99.8%.

Consistency

Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s. Data is consistent if it doesn’t contradict data in another data set. For example, if the date of birth recorded for the same person in two different data sets is the same.

Example of application

In a school, a student’s date of birth has the same value and format in the school register as that stored within the student database.

Timeliness

Timeliness describes the degree to which the data is an accurate reflection of the period that they represent, and that the data and its values are up to date. Some data, such as date of birth, may stay the same whereas some, such as income, may not. Data is timely if the time lag between collection and availability is appropriate for the intended use.

Example of application

A school has a service level agreement that a change to an emergency contact will occur within 2 days. A parent gives an updated emergency contact number on 1 June. It is entered into the student database on the 4 June. It has taken 3 days to update the system which breaches the agreed data quality rule.

Validity

Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range. Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.

Example of application

Primary and Junior School applications capture the age of a child. This age is entered into the database and the age checked to ensure it is between 4 and 11. Any values outside of this range are rejected as invalid.

Accuracy

Accuracy describes the degree to which data matches reality. Bias in data may impact accuracy. When data is biased it means that it is not representative of the entire population. Account for bias in your measurements if possible, and make sure that data bias is communicated to your users. In a data set, individual records can be measured for accuracy, or the whole data set can be measured. Which you choose to do should depend on the purpose of the data and your business needs.

Example of application

A school receives applications for its annual September intake and requires students to be aged 5 before 31 August of the intake year. A parent from the USA completes the Date of Birth (D.O.B) on the application in the US date format, MM/DD/YYYY rather than DD/MM/YYYY format, with the days and months reversed. The student is accepted in error as the date of birth given is 09/08/YYYY rather than 08/09/YYYY. The representation of the student’s D.O.B. – whilst valid in its US context – means that in the UK the age was not derived correctly, and the value recorded was consequently not accurate.
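As a rough illustration, the measurable examples above (completeness, uniqueness, validity and timeliness) can be sketched in Python. The function names are made up for this sketch, and the year in the timeliness check is an assumption, since the example only gives the day and month.

```python
from datetime import date


def completeness_pct(populated: int, expected: int) -> float:
    """Percentage of expected records that have the value populated."""
    return populated / expected * 100


def uniqueness_pct(real_entities: int, stored_records: int) -> float:
    """Percentage of stored records that map to a distinct real entity."""
    return real_entities / stored_records * 100


def is_valid_age(age: int, low: int = 4, high: int = 11) -> bool:
    """Validity rule from the school example: age must be between 4 and 11."""
    return low <= age <= high


def breaches_sla(received: date, entered: date, sla_days: int = 2) -> bool:
    """Timeliness rule: the update must be entered within sla_days."""
    return (entered - received).days > sla_days


# Figures from the examples above; the year 2024 is an assumption.
print(round(completeness_pct(294, 300), 1))               # 98.0
print(round(uniqueness_pct(500, 501), 1))                 # 99.8
print(is_valid_age(12))                                   # False
print(breaches_sla(date(2024, 6, 1), date(2024, 6, 4)))   # True
```

Consistency and accuracy are left out because they need a second data set or a real-world reference to compare against, which a single count cannot express.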
Describe the feature request.
As part of a POC to show data quality on the catalogue, show the DQ checks for the property table as a RAG rating. These checks are currently available in DataHub.

This is so DIP can show other teams what DQ in the catalogue would look like.
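One possible way to turn a set of check results into a RAG rating is sketched below. This is not taken from the catalogue or DataHub code: the `"pass"` status string, the thresholds (95% for green, 80% for amber) and the `UNKNOWN` fallback are all assumptions chosen for illustration, and the real POC may use different rules.

```python
from typing import Iterable


def rag_rating(
    statuses: Iterable[str],
    green_at: float = 0.95,   # assumed threshold, not from the issue
    amber_at: float = 0.80,   # assumed threshold, not from the issue
) -> str:
    """Map check statuses ("pass" or anything else) to a RAG rating."""
    results = list(statuses)
    if not results:
        # No checks have run, so no rating can be claimed.
        return "UNKNOWN"
    pass_rate = sum(s == "pass" for s in results) / len(results)
    if pass_rate >= green_at:
        return "GREEN"
    if pass_rate >= amber_at:
        return "AMBER"
    return "RED"


# Hypothetical check results for one table.
print(rag_rating(["pass"] * 9 + ["fail"]))  # AMBER
```

Keeping the thresholds as parameters makes it easy to try the different colour schemes discussed in the comments without changing the logic.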