✨ Show property dbt DQ checks in table details as a RAG score #1414

Open
murdo-moj opened this issue Feb 21, 2025 · 5 comments

@murdo-moj
Contributor

murdo-moj commented Feb 21, 2025

Describe the feature request.

As part of a POC to show data quality on the catalogue, show the DQ checks for the property table, which are currently available in DataHub, as a RAG rating.

This is so DIP can show other teams what DQ in the catalogue would look like.

@murdo-moj
Contributor Author

murdo-moj commented Feb 21, 2025

Rough sketch pending Joe's better one
Image

@joe-horton-moj

@murdo-moj

2 min loom video

most catalogues use the following colours and content for ratings
• Green: Good / High / Sufficient
• Amber: Medium / Fair / Acceptable / Needs Attention
• Red: Poor / Low / Critical Issues
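A minimal sketch of how a check pass rate could be bucketed into these three ratings. The function name `rag_rating` and the 0.95 / 0.80 cut-offs are assumptions made for illustration; the thread does not define any thresholds.

```python
def rag_rating(pass_rate: float, green_at: float = 0.95, amber_at: float = 0.80) -> str:
    """Map a pass rate in [0, 1] to a RAG label.

    The default cut-offs are illustrative assumptions, not agreed thresholds.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if pass_rate >= green_at:
        return "Green"
    if pass_rate >= amber_at:
        return "Amber"
    return "Red"


for rate in (0.98, 0.85, 0.50):
    print(rate, rag_rating(rate))
```

Keeping the cut-offs as parameters means the wording and thresholds can be reviewed later without touching the mapping logic.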

Idea 1
Add two additional columns without making any other changes.

Image

Idea 2
Add two additional columns, wrap the table in the MoJ scrollable pane, and set a min-width of 1280px on the table.

Image

@joe-horton-moj

joe-horton-moj commented Feb 24, 2025

Final changes:

  • With 5 additional columns now, I think a max-width: 1400px for the table
  • Using scrollable pane MoJ div component
  • There is some empty padding / margin being added below each paragraph of the descriptions, which can be removed 👍

Obvious iterations to be made next

  • A section / details component that explains the definition of each category rating
  • Review of wording / colour of tags and table presentation
Image

@murdo-moj
Contributor Author

murdo-moj commented Feb 26, 2025

<a href="https://github.com/moj-analytical-services/create-a-derived-table/tree/main/mojap_derived_tables/models/property/property_int/quality" class="govuk-link" rel="noreferrer noopener" target="_blank">Learn how these data quality metrics were calculated (opens in new tab)</a>

@hjribeiro-moj hjribeiro-moj moved this from Todo 📝 to In Progress 🚀 in Data Catalogue Feb 27, 2025
@hjribeiro-moj hjribeiro-moj self-assigned this Feb 27, 2025
@murdo-moj
Contributor Author

Core data quality dimensions

This section describes the six data quality dimensions as defined by DAMA UK, and provides examples of their application. These examples are taken (and sometimes adapted) from the DAMA UK Working Group “Defining Data Quality Dimensions” paper.

Completeness

Completeness describes the degree to which records are present.

For a data set to be complete, all records are included, and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated.

It is important not to confuse the completeness of data with its accuracy. A complete data set may have incorrect values in fields, making it less accurate.

Example of application

A school collects forms from parents on emergency contact telephone numbers.

There are 300 students, but 294 responses are collected and recorded.

294/300 x 100 = 98%.

The emergency contact telephone number field is therefore 98% complete. However, these phone numbers may not all be correct, so the telephone number field is not necessarily accurate.
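The arithmetic in this example can be sketched as a small helper. The name `completeness_pct` is hypothetical, and the function only counts populated values; it says nothing about whether those values are correct.

```python
def completeness_pct(populated: int, expected: int) -> float:
    """Percentage of expected records that have a populated value."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100 * populated / expected


# 294 responses recorded out of 300 students
print(completeness_pct(294, 300))
```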

Uniqueness

Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once.

Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.

Example of application

A school has 120 current students and 380 former students (i.e. 500 in total).

The student database shows 501 different student records.

This includes Fred Smith and Freddy Smith as separate records, despite only one student at the school named Fred Smith.

This shows that the data set has a uniqueness across all records of 500/501 x 100 = 99.8%.
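A rough sketch of the same calculation, assuming each record already carries a resolved entity identifier (here a hypothetical `student_id`). Working out that "Fred Smith" and "Freddy Smith" are the same student is a separate matching problem that this sketch does not attempt.

```python
def uniqueness_pct(entity_ids: list[int]) -> float:
    """Distinct entities as a percentage of the total number of records."""
    if not entity_ids:
        raise ValueError("entity_ids must not be empty")
    return 100 * len(set(entity_ids)) / len(entity_ids)


# 500 real students, one of whom appears twice: 501 records in total
records = list(range(1, 501)) + [1]
print(round(uniqueness_pct(records), 1))
```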

Consistency

Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s.

Data is consistent if it doesn’t contradict data in another data set. For example, if the date of birth recorded for the same person in two different data sets is the same.

Example of application

In a school, a student’s date of birth has the same value and format in the school register as that stored within the student database.
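One way to check this is to compare the stored value for each student across the two sources. The sketch below assumes both sources are available as dictionaries keyed by a hypothetical student ID; the sample IDs and dates are invented for illustration.

```python
def mismatched_students(register: dict[str, str], database: dict[str, str]) -> list[str]:
    """Return IDs present in both sources whose date-of-birth strings differ."""
    shared_ids = register.keys() & database.keys()
    return sorted(sid for sid in shared_ids if register[sid] != database[sid])


# Invented sample data: S002 holds the same date in a different format
register = {"S001": "2014-03-05", "S002": "2013-11-20"}
database = {"S001": "2014-03-05", "S002": "20/11/2013"}
print(mismatched_students(register, database))
```

Comparing the raw strings flags a difference in format as well as a difference in value, which matches the "same value and format" wording of the example.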

Timeliness

Timeliness describes the degree to which the data accurately reflects the period it represents, and the degree to which its values are up to date.

Some data, such as date of birth, may stay the same whereas some, such as income, may not.

Data is timely if the time lag between collection and availability is appropriate for the intended use.

Example of application

A school has a service level agreement that a change to an emergency contact will occur within 2 days.

A parent gives an updated emergency contact number on 1 June.

It is entered into the student database on 4 June.

It has taken 3 days to update the system which breaches the agreed data quality rule.
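The rule in this example can be sketched as a date comparison. The year is not given in the example, so 2025 is used as a placeholder, and the sketch assumes the 2-day limit is counted in calendar days rather than working days.

```python
from datetime import date


def breaches_sla(received: date, recorded: date, sla_days: int = 2) -> bool:
    """True if the update took longer than the agreed number of calendar days."""
    return (recorded - received).days > sla_days


received = date(2025, 6, 1)  # placeholder year
recorded = date(2025, 6, 4)
print((recorded - received).days, breaches_sla(received, recorded))
```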

Validity

Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range.

Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.

Example of application

Primary and Junior School applications capture the age of a child. This age is entered into the database and checked to ensure it is between 4 and 11. Any values outside of this range are rejected as invalid.
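A minimal sketch of this range check. The function name is hypothetical, and treating both 4 and 11 as valid (an inclusive range) is an assumption, since the example does not say how the boundaries are handled.

```python
def is_valid_age(age: int, minimum: int = 4, maximum: int = 11) -> bool:
    """True if the age falls within the expected range (inclusive, by assumption)."""
    return minimum <= age <= maximum


for age in (3, 4, 11, 12):
    print(age, is_valid_age(age))
```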

Accuracy

Accuracy describes the degree to which data matches reality.

Bias in data may impact accuracy. When data is biased it means that it is not representative of the entire population. Account for bias in your measurements if possible, and make sure that data bias is communicated to your users.

In a data set, individual records can be measured for accuracy, or the whole data set can be measured. Which you choose to do should depend on the purpose of the data and your business needs.

Example of application

A school receives applications for its annual September intake and requires students to be aged 5 before 31 August of the intake year.

A parent from the USA completes the Date of Birth (D.O.B) on the application in the US date format, MM/DD/YYYY rather than DD/MM/YYYY format, with the days and months reversed.

The student is accepted in error as the date of birth given is 09/08/YYYY rather than 08/09/YYYY.

The representation of the student’s D.O.B. – whilst valid in its US context – means that in the UK the age was not derived correctly, and the value recorded was consequently not accurate.
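The example can be sketched by parsing the same string under both conventions. The birth year 2019 and intake year 2024 are placeholders for the unspecified YYYY, and the helper names are hypothetical. Both readings parse as valid dates, which is why a validity check alone would not catch the error.

```python
from datetime import date, datetime

RAW_DOB = "09/08/2019"  # placeholder year standing in for YYYY


def parse_dob(raw: str, fmt: str) -> date:
    """Parse a date-of-birth string using the given strptime format."""
    return datetime.strptime(raw, fmt).date()


def aged_five_before_cutoff(dob: date, intake_year: int) -> bool:
    """Aged 5 before 31 August of the intake year, i.e. born before 31 August five years earlier."""
    return dob < date(intake_year - 5, 8, 31)


uk_reading = parse_dob(RAW_DOB, "%d/%m/%Y")  # read as 9 August
us_reading = parse_dob(RAW_DOB, "%m/%d/%Y")  # read as 8 September, what the parent meant
print(uk_reading, aged_five_before_cutoff(uk_reading, 2024))
print(us_reading, aged_five_before_cutoff(us_reading, 2024))
```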
