Commit cdedf13

First commit on some shape graph material
1 parent 1cb8360 commit cdedf13

30 files changed

+2878
-0
lines changed

validation/README.md

+19
@@ -0,0 +1,19 @@
1+
# Validation
2+
3+
## About
4+
5+
This directory holds work related to graph validation. The focus is on using the
6+
W3C SHACL Recommendation (https://www.w3.org/TR/shacl/).
7+
8+
At present we have worked up some examples around the Google Developers guidance at
9+
https://developers.google.com/search/docs/data-types/dataset#dataset. However,
10+
examples focused on the Science on Schema, FAIR Data, DataONE or other community
11+
principles would be welcome.
12+
13+
## Resources
14+
15+
If you are looking for tools to test SHACL shapes with, you should look at the
16+
W3C Implementation Report (https://w3c.github.io/data-shapes/data-shapes-test-suite/).
17+
Two of the higher-ranking tools are pySHACL (https://github.com/RDFLib/pySHACL)
18+
and the TopBraid SHACL API (https://github.com/TopQuadrant/shacl).
19+

validation/framing/README.md

+6
@@ -0,0 +1,6 @@
1+
# Framing
2+
3+
## About
4+
5+
This directory is where work with the JSON-LD Framing API (https://json-ld.org/spec/latest/json-ld-framing/) will take place. Framing could potentially be used as part of a validation
6+
or extraction process for the data graphs.

validation/framing/geoFrame.jsonld

+9
@@ -0,0 +1,9 @@
{
  "@context": "http://schema.org/",
  "@explicit": true,
  "@type": "Dataset",
  "spatialCoverage": {
    "@type": "Place",
    "geo": {}
  }
}

validation/notes.md

+137
@@ -0,0 +1,137 @@
1+
# Notes
2+
3+
## About
4+
This is a collection of unedited notes that I moved to this
5+
repo. Much of this needs to be removed or updated but I'll leave
6+
it here for reference until it gets some attention.
7+
8+
## Tangram: Simple service example (ref: https://gleaner.io)
9+
10+
11+
The Tangram service is a web service wrapper around the pySHACL
12+
(https://github.com/RDFLib/pySHACL) package. It allows you to send in JSON-LD data
13+
graphs to test against a Turtle (ttl) encoded shape graph.
14+
15+
Invoke the tool with something like:
16+
17+
With the httpie client:
18+
19+
```bash
20+
http -f POST https://tangram.gleaner.io/uploader datagraph@./datagraphs/dataset-minimal-BAD.json-ld shapegraph@./shapegraphs/googleRecommended.ttl format=human
21+
```
22+
23+
Or with good old curl (with format set to human):
24+
25+
```bash
26+
curl -F 'datagraph=@./datagraphs/dataset-minimal-BAD.json-ld' -F 'shapegraph=@./shapegraphs/googleRecommended.ttl' -F 'format=human' https://tangram.gleaner.io/uploader
27+
```
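
For scripted use, the same upload can be prepared from Python. The following is a minimal sketch using only the standard library. The endpoint and the form field names (`datagraph`, `shapegraph`, `format`) come from the commands above; the helper names `build_multipart` and `build_request` and the placeholder filenames are hypothetical, and the sketch only builds the request without sending it.

```python
import uuid
import urllib.request

# Endpoint taken from the httpie/curl examples above.
TANGRAM_URL = "https://tangram.gleaner.io/uploader"


def build_multipart(fields, files):
    """Encode text fields and file payloads as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, (filename, payload) in files.items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + payload + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(datagraph, shapegraph, fmt="human", url=TANGRAM_URL):
    """Prepare (but do not send) the POST for one data graph / shape graph pair."""
    body, content_type = build_multipart(
        {"format": fmt},
        {
            # Placeholder filenames; the service may or may not inspect them.
            "datagraph": ("datagraph.json-ld", datagraph),
            "shapegraph": ("shapegraph.ttl", shapegraph),
        },
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To actually submit, read both files as bytes, pass them to `build_request`, and hand the result to `urllib.request.urlopen`. Whether the service is reachable, and what it returns, is not checked here.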
28+
29+
## Set up a Python Env with pySHACL (see refs)
30+
31+
A requirements.txt provides all the needed pip installs. The following
32+
should work to set up a new environment for you. You can also simply install
33+
these into your main python3 installation if you wish.
34+
35+
```bash
36+
# before 15.1.0
37+
virtualenv --no-site-packages --distribute .env &&\
38+
source .env/bin/activate &&\
39+
pip install -r requirements.txt
40+
41+
# after deprecation of some arguments in 15.1.0
42+
virtualenv .env && source .env/bin/activate && pip install -r requirements.txt
43+
```
44+
45+
Then, to activate or deactivate the environment, use the following:
46+
47+
* source .env/bin/activate
48+
* deactivate
49+
50+
The full process of setting up this approach is shown below. Here I have used
51+
a directory in my ~/src/python/venvs to house all my various virtual environments.
52+
53+
```bash
54+
> python3 -m virtualenv ~/src/python/venvs/shaclenv
55+
Using base prefix '/usr'
56+
New python executable in /home/fils/src/python/venvs/shaclenv/bin/python3
57+
Also creating executable in /home/fils/src/python/venvs/shaclenv/bin/python
58+
Installing setuptools, pip, wheel...
59+
done.
60+
> source ~/src/python/venvs/shaclenv/bin/activate
61+
> which pip
62+
/home/fils/src/python/venvs/shaclenv/bin/pip
63+
> pip install -r requirements.txt
64+
[ ... pip install output removed ... ]
65+
Installing collected packages: six, isodate, pyparsing, rdflib, rdflib-jsonld, owlrl, pyshacl
66+
Successfully installed isodate-0.6.0 owlrl-5.2.0 pyparsing-2.3.1 pyshacl-0.9.9.post1 rdflib-4.2.2 rdflib-jsonld-0.4.0 six-1.12.0
67+
68+
# now test this
69+
70+
> pyshacl -s ./shapegraphs/googleRequired.ttl -m -f human -df json-ld ./datagraphs/dataset-full.json-ld
71+
Validation Report
72+
Conforms: True
73+
```
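
The same setup can also be scripted. The sketch below swaps in Python's standard-library `venv` module for `virtualenv`, so it is an alternative to the commands above rather than a reproduction of them. The function names are hypothetical; `requirements.txt` is the file mentioned above.

```python
import os
import venv


def env_bin_dir(env_dir):
    """Return the directory that holds the environment's executables."""
    return os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")


def pip_install_command(env_dir, requirements="requirements.txt"):
    """Build the command that installs requirements with the environment's own interpreter."""
    exe = "python.exe" if os.name == "nt" else "python"
    python = os.path.join(env_bin_dir(env_dir), exe)
    return [python, "-m", "pip", "install", "-r", requirements]


def create_env(env_dir):
    """Create the environment (including pip) and return the install command to run next."""
    venv.create(env_dir, with_pip=True)
    return pip_install_command(env_dir)
```

A typical call would be `subprocess.run(create_env(".env"), check=True)`. Running pip through the environment's interpreter avoids having to activate the environment first.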
74+
75+
![alt install](./media/venvSetup.png "Install example")
76+
77+
78+
## On owl:imports
79+
80+
I was hoping to leverage some import method to allow us to have various shape graphs we could composite
81+
into a collection of constraints easily. While this may still be possible, my initial pattern does not work,
82+
and the reqrec.ttl file in the shapes directory will not work.
83+
84+
Ref: https://github.com/RDFLib/pySHACL/issues/18
85+
86+
## References
87+
88+
* https://www.w3.org/TR/shacl/
89+
* https://github.com/RDFLib/pySHACL
90+
* https://packaging.python.org/guides/installing-using-pip-and-virtualenv/
91+
* http://datashapes.org/
92+
* https://github.com/geological-survey-of-queensland/gsq-sample-profile/blob/master/shapes/sample.ttl
93+
* https://developers.google.com/search/docs/data-types/dataset#dataset
94+
95+
96+
## Notes
97+
98+
Example commands:
99+
```bash
100+
pyshacl -s ./shapegraphs/requiredShape.ttl -m -f human -df json-ld ./datagraphs/dataset-minimal.json-ld
101+
pyshacl -s ./shapegraphs/recomendShape.ttl -m -f human -df json-ld ./datagraphs/dataset-full.json-ld
102+
103+
```
104+
105+
Example output
106+
```
107+
pyshacl -s ./shapegraphs/recomendShape.ttl -m -f human -df json-ld ./datagraphs/dataset-full.json-ld
108+
Validation Report
109+
Conforms: True
110+
111+
pyshacl -s ./shapegraphs/recomendShape.ttl -m -f human -df json-ld ./datagraphs/dataset-minimal.json-ld
112+
Validation Report
113+
Conforms: False
114+
Results (1):
115+
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
116+
Severity: sh:Violation
117+
Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <http://schema.org/citation> ]
118+
Focus Node: [ ]
119+
Result Path: <http://schema.org/citation>
120+
```
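
The violation above comes from a property shape that requires exactly one `schema:citation` (`sh:minCount 1`, `sh:maxCount 1`). As a rough illustration of what that cardinality check means, here is a plain-Python sketch that counts values on a compacted JSON-LD node. It is not how pySHACL works internally (pySHACL evaluates RDF graphs), and the two sample nodes are hypothetical stand-ins for the minimal and full data graphs.

```python
def count_values(node, prop):
    """Return how many values a compacted JSON-LD node carries for a property."""
    value = node.get(prop)
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1


def check_cardinality(node, prop, min_count=None, max_count=None):
    """Mimic sh:minCount / sh:maxCount for one property; return a list of messages."""
    n = count_values(node, prop)
    problems = []
    if min_count is not None and n < min_count:
        problems.append(f"{prop}: found {n}, expected at least {min_count}")
    if max_count is not None and n > max_count:
        problems.append(f"{prop}: found {n}, expected at most {max_count}")
    return problems


# Hypothetical stand-ins for the minimal and full data graphs.
minimal = {"@type": "Dataset", "name": "Example"}
full = {"@type": "Dataset", "name": "Example", "citation": "Some citation"}

print(check_cardinality(minimal, "citation", min_count=1, max_count=1))
print(check_cardinality(full, "citation", min_count=1, max_count=1))
```

The first call reports a missing citation, mirroring the `MinCountConstraintComponent` result; the second returns an empty list.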
121+
122+
Use the fencepull command to get the JSON-LD and feed it through Tangram:
123+
124+
```
125+
curl -s 'https://fence.gleaner.io/fencepull?url=http://opencoredata.org/doc/dataset/b8d7bd1b-ef3b-4b08-a327-e28e1420adf0' \
  | curl -F 'datagraph=@-' -F 'shapegraph=@./shapegraphs/googleRequired.ttl' -F 'format=human' https://tangram.gleaner.io/uploader
127+
128+
```
129+
130+
```
131+
xmllint --xpath "/urlset/url/loc/text()" test.xml > out
132+
133+
curl -s http://opencoredata.org/sitemap.xml | grep -o '<loc>.*</loc>' | sed 's/\(<loc>\|<\/loc>\)//g' | head -3
134+
135+
curl -s http://opencoredata.org/sitemap.xml | grep -o '<loc>.*</loc>' | sed 's/\(<loc>\|<\/loc>\)//g' | sed -n "100,110p"
136+
137+
```
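
The `grep`/`sed` pipelines above pull `<loc>` entries out of a sitemap by pattern matching. A hedged alternative is to parse the XML properly. The sketch below uses Python's `xml.etree.ElementTree` and assumes the sitemap declares the standard sitemaps.org namespace; the function name `sitemap_locs` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be used by the sitemap).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text, start=0, stop=None):
    """Return the <loc> URLs from a sitemap document, optionally sliced."""
    root = ET.fromstring(xml_text)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return locs[start:stop]


# Small inline sample so the sketch runs without network access.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/doc/1</loc></url>
  <url><loc>http://example.org/doc/2</loc></url>
  <url><loc>http://example.org/doc/3</loc></url>
  <url><loc>http://example.org/doc/4</loc></url>
</urlset>"""

print(sitemap_locs(sample, stop=3))  # first three entries, like `head -3`
```

The equivalent of `sed -n "100,110p"` would be `sitemap_locs(xml_text, 99, 110)`, since Python slices are zero-based and exclude the end index.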
+50
@@ -0,0 +1,50 @@
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://www.earthcube.org/schema#> .
@prefix family: <http://example.org/family#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.earthcube.org/schema>
    rdf:type owl:Ontology ;
    owl:imports <http://datashapes.org/dash> ;
.

ex:IDShape
    a sh:NodeShape ;
    sh:nodeKind sh:IRI ;
    sh:message "Expect to see an ID for this resource"@en ;
    sh:targetClass <http://schema.org/Dataset> .

ex:URLShape
    a sh:NodeShape ;
    sh:property [
        sh:path <http://schema.org/url> ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:nodeKind sh:IRIOrLiteral ;
    ] ;
    sh:message "This needs to be a schema:URL"@en ;
    sh:targetClass <http://schema.org/Dataset> .

ex:DescriptionShape
    a sh:NodeShape ;
    sh:property [
        sh:path <http://schema.org/description> ;
        sh:nodeKind sh:Literal ;
        sh:minCount 1 ;
    ] ;
    sh:message "Needs to be Text"@en ;
    sh:targetClass <http://schema.org/Dataset> .

ex:NameShape
    a sh:NodeShape ;
    sh:property [
        sh:path <http://schema.org/name> ;
        sh:nodeKind sh:Literal ;
        sh:minCount 1 ;
    ] ;
    sh:message "Needs to be Text"@en ;
    sh:targetClass <http://schema.org/Dataset> .

validation/shapegraphs/README.md

+15
@@ -0,0 +1,15 @@
1+
# Shape Graphs
2+
3+
## About
4+
5+
Some details on the shape graphs in this directory:
6+
7+
Filename | Description
------------ | -------------
googleRequired.ttl | Checks for the Google required items described in https://developers.google.com/search/docs/data-types/dataset
googleRecommended.ttl | Checks for the Google recommended items described in https://developers.google.com/search/docs/data-types/dataset
googleRecommendedCoverageCheck.ttl | Same as the test for Google recommended, but sets all items to a minimum count of 1 to check for coverage. Use googleRecommended.ttl if you don't care about coverage of recommended items.
P418Required.ttl | Same as googleRequired, but adds a check for an @id on the Dataset type. Otherwise, checks for the Google recommended items described in https://developers.google.com/search/docs/data-types/dataset
importTest.ttl | TESTING: A file for checking whether shape imports work, which would let people stack together a set of shape graphs to check against.
temporalRange.ttl | TESTING: A file for exploring validation of temporal items in a data graph.
testingDataGraphs | A directory with various data graphs (some with errors) to use for testing shape graphs, and perhaps as part of a CI path in the future.
