Add queries for validating data against CubiQL's expectations. #145

Open: 4 commits to be merged into base branch `master`.
29 changes: 29 additions & 0 deletions README.md
@@ -98,6 +98,35 @@ During development `lein run` can be used instead of building the uberjar:
The server hosts a GraphQL endpoint at http://localhost:PORT/graphql which follows the
protocol described [here](http://graphql.org/learn/serving-over-http/).

## Data validation

CubiQL currently makes assumptions about the data in the cubes it finds, namely that certain resources (e.g. dimensions and measures) have
an associated label, and that time and geographical dimension values have a particular structure. When producing your own data for use
with CubiQL, you may want to check that it conforms to these expectations.

The [rdf-validator](https://github.com/Swirrl/rdf-validator) is a tool for running a collection of validation tests against a SPARQL endpoint.
CubiQL includes validation queries encoding its requirements in the `validation` directory. To run these validations against your data:

1. Download the [latest version](https://github.com/Swirrl/rdf-validator/releases) of rdf-validator.
2. Clone the CubiQL repository, or copy the files in the `validation` directory to your local machine.
3. Define the CubiQL configuration file for your data. The required configuration keys are listed below.
4. Run rdf-validator, specifying the location of the data, the validation directory and the CubiQL configuration, e.g.

    java -jar rdf-validator-standalone.jar --endpoint my_data.ttl --suite validation/ --variables cubiql-config.edn

The `--endpoint` parameter can refer to an RDF file, a folder containing RDF files or a remote SPARQL endpoint URI.

The `cubiql-config.edn` file must contain the following keys:

| Key |
|---------------------|
| :geo-dimension-uri |
| :time-dimension-uri |
| :codelist-label-uri |
| :dataset-label-uri |

If you are not using time or geography dimensions in your runtime configuration, you should set `:geo-dimension-uri` and/or `:time-dimension-uri` to a dummy value.
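
As a rough illustration, a `cubiql-config.edn` for data that uses the SDMX `refArea` and `refPeriod` dimensions might look like the sketch below. The URIs are illustrative assumptions rather than required values, and the sketch assumes that rdf-validator accepts plain string values for its variables; substitute the predicates your own data actually uses.

```edn
;; cubiql-config.edn
;; Illustrative values only (assumptions); replace each URI with the one used by your data.
{:geo-dimension-uri  "http://purl.org/linked-data/sdmx/2009/dimension#refArea"
 :time-dimension-uri "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
 :codelist-label-uri "http://www.w3.org/2004/02/skos/core#prefLabel"
 :dataset-label-uri  "http://www.w3.org/2000/01/rdf-schema#label"}
```

With a file like this, a placeholder such as `<{{dataset-label-uri}}>` in a validation query would presumably be rendered as `<http://www.w3.org/2000/01/rdf-schema#label>` before the query is run.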

## License

Copyright © 2017 Swirrl IT Ltd.
11 changes: 11 additions & 0 deletions validation/codelist_members_must_have_labels.sparql
@@ -0,0 +1,11 @@
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?this WHERE {
  { ?codelist skos:member ?this }
  UNION { ?this skos:inScheme ?codelist }
  FILTER NOT EXISTS {
    ?this <{{codelist-label-uri}}> ?label .
  }
}
11 changes: 11 additions & 0 deletions validation/datasets_must_have_labels.sparql
@@ -0,0 +1,11 @@
# Datasets must have an associated label

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?this WHERE {
  ?this a qb:DataSet .
  FILTER NOT EXISTS {
    ?this <{{dataset-label-uri}}> ?label .
  }
}
12 changes: 12 additions & 0 deletions validation/datasets_must_have_measure_type_component.sparql
@@ -0,0 +1,12 @@
# Datasets must have a qb:measureType component

PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?this WHERE {
  ?this a qb:DataSet .
  FILTER NOT EXISTS {
    ?this qb:structure ?dsd .
    ?dsd qb:component ?comp .
    ?comp qb:dimension qb:measureType .
  }
}
24 changes: 24 additions & 0 deletions validation/dimensions_codelist_must_have_only_codes_used.sparql
@@ -0,0 +1,24 @@
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?v WHERE {
  { SELECT ?v WHERE {
      ?q a qb:DataSet .
      ?q qb:structure/qb:component ?comp .
      ?comp qb:dimension|qb:attribute ?dim .
      ?comp <{{codelist-predicate}}> ?list .
      ?obs ?dim ?v .
      FILTER NOT EXISTS { { ?list skos:member ?v } UNION { ?v skos:inScheme ?list } }
  } } UNION
  { SELECT DISTINCT ?v WHERE {
      ?q a qb:DataSet .
      ?q qb:structure/qb:component ?comp .
      ?comp qb:dimension|qb:attribute ?dim .
      ?comp <{{codelist-predicate}}> ?list .
      { ?list skos:member ?v } UNION { ?v skos:inScheme ?list }
      FILTER NOT EXISTS { ?obs qb:dataSet ?q . ?obs ?dim ?v } } }
}

# The codelist of each dimension should contain exactly the codes used in the cube.
# Checks that 1) every code used in the cube exists in the codelist, and
# 2) every code in the codelist appears in the cube.
11 changes: 11 additions & 0 deletions validation/dimensions_must_have_labels.sparql
@@ -0,0 +1,11 @@
# All dimension properties must have an associated label

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?this WHERE {
  ?this a qb:DimensionProperty .
  FILTER NOT EXISTS {
    ?this <{{dataset-label-uri}}> ?label .
  }
}
16 changes: 16 additions & 0 deletions validation/geo_values_must_have_labels.sparql
@@ -0,0 +1,16 @@
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?this WHERE {
  { SELECT DISTINCT ?this WHERE {
      ?obs a qb:Observation .
      ?obs <{{geo-dimension-uri}}> ?this .
    }
  }

  FILTER NOT EXISTS {
    ?this <{{dataset-label-uri}}> ?label .
  }
}

# Finds all geographic dimension values which do not have a corresponding label

11 changes: 11 additions & 0 deletions validation/measures_must_have_labels.sparql
@@ -0,0 +1,11 @@
# All measure properties should have a label

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?this WHERE {
  ?this a qb:MeasureProperty .
  FILTER NOT EXISTS {
    ?this <{{dataset-label-uri}}> ?label .
  }
}
16 changes: 16 additions & 0 deletions validation/time_values_must_have_beginning_time.sparql
@@ -0,0 +1,16 @@
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?this WHERE {
  { SELECT DISTINCT ?this WHERE {
      ?obs a qb:Observation .
      ?obs <{{time-dimension-uri}}> ?this .
    }
  }
  FILTER NOT EXISTS {
    ?this time:hasBeginning ?begin .
    ?begin time:inXSDDateTime ?begintime .
    FILTER(datatype(?begintime) = xsd:dateTime)
  }
}
16 changes: 16 additions & 0 deletions validation/time_values_must_have_end_time.sparql
@@ -0,0 +1,16 @@
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?this WHERE {
  { SELECT DISTINCT ?this WHERE {
      ?obs a qb:Observation .
      ?obs <{{time-dimension-uri}}> ?this .
    }
  }
  FILTER NOT EXISTS {
    ?this time:hasEnd ?end .
    ?end time:inXSDDateTime ?endtime .
    FILTER(datatype(?endtime) = xsd:dateTime)
  }
}