Fix TIME namespace definition to use DCAT recommendation #344
Conversation
@mjanez thanks for the detailed report. The fix looks good, but I'm curious: where are you getting serializations with the incorrect format?
Thanks @amercader, good point. You're right that the example I gave with … The context here is an external system that consumes the RDF serialization provided by a custom DCAT ES profile based on … The specific case serializes the RDF with time periods using dct:accrualPeriodicity:

```turtle
dct:accrualPeriodicity [
    a dct:Frequency ;
    rdf:value [
        a time:DurationDescription ;
        rdfs:label "{time-interval}" ;
        time:{period} {n}
    ] ;
    rdfs:label "Every {time-interval}"
] ;
```

The issue is that the harvesting extension defines its own TIME namespace, and that definition does have the trailing hash (#). So when a publisher generates RDF that doesn't match the same TIME namespace declaration, the harvester fails and throws an error. Because of the quirks of the parsing method, it never parsed the update frequency. Probably the only thing left to check is whether all namespaces are used canonically, which is likely the case since this is the first bug of this kind we've encountered. But I can check.
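One way to run the canonical-namespace check mentioned above is to flag any namespace URI that does not end in a separator. This is a heuristic sketch under our own assumptions: the NAMESPACES dict and find_non_canonical are hypothetical names, not part of ckanext-dcat, and a few legitimate vocabularies may use other separators. Since rdflib's Namespace is a str subclass, the same endswith check should also work on Namespace objects.

```python
# Hypothetical prefix map for illustration; the real profile declares
# rdflib Namespace objects rather than a plain dict.
NAMESPACES = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "time": "http://www.w3.org/2006/time",  # the buggy definition
}


def find_non_canonical(namespaces):
    """Return the prefixes whose namespace URI does not end in '#' or '/'.

    Prefixed names are expanded by plain concatenation, so a namespace
    without a trailing separator fuses into its local names.
    """
    return sorted(
        prefix
        for prefix, uri in namespaces.items()
        if not uri.endswith(("#", "/"))
    )


print(find_non_canonical(NAMESPACES))  # ['time']
```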
Update from master
@amercader, please check whether this is sufficient: 4a47c10
Thanks @mjanez
Description
Context
This PR addresses an issue with namespace definitions while working on implementing profiles according to the contributing guidelines. We've been developing custom profiles for the Spanish application profiles - both the current NTI-RISP (based on DCAT) and the future Spanish profile based on DCAT-AP.
Although this can be handled within the harvester, following DCAT's recommendation to use the more common namespace would be beneficial, especially since many publishers rely on ckanext-dcat to serialize their RDF for the national catalog.
Problem
During development, we identified an issue with the TIME namespace definition in base.py. The current implementation defines TIME without the trailing hash (#), as http://www.w3.org/2006/time.
However, the correct URI according to the W3C specification and DCAT-AP includes the trailing hash: http://www.w3.org/2006/time#.
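The mismatch comes from how prefixed names are expanded: the local name is appended to the namespace string with no separator added. The standalone sketch below imitates this with plain string concatenation (rdflib's Namespace builds terms the same way, so TIME.years becomes URIRef(TIME + "years")). The names expand, TIME_BROKEN and TIME_FIXED are illustrative, not ckanext-dcat code.

```python
def expand(namespace: str, local_name: str) -> str:
    """Expand a prefixed name such as time:years by plain concatenation."""
    return namespace + local_name


TIME_BROKEN = "http://www.w3.org/2006/time"   # without the trailing hash
TIME_FIXED = "http://www.w3.org/2006/time#"   # W3C / DCAT-AP form

for ns in (TIME_BROKEN, TIME_FIXED):
    print(expand(ns, "years"), expand(ns, "days"))
# http://www.w3.org/2006/timeyears http://www.w3.org/2006/timedays
# http://www.w3.org/2006/time#years http://www.w3.org/2006/time#days
```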
Impact
This incorrect namespace definition causes properties like time:years, time:days, etc. to be serialized with incorrect URIs: http://www.w3.org/2006/timeyears instead of http://www.w3.org/2006/time#years.
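On the consumer side, a harvester typically looks up OWL-Time properties by their full canonical URI, so a fused URI is simply never matched and the frequency is dropped. A minimal sketch, assuming a simplified harvester that receives one DurationDescription blank node as a hypothetical dict of predicate URI to value (read_duration is an illustrative name, not a real harvester API):

```python
TIME = "http://www.w3.org/2006/time#"  # canonical namespace the consumer expects


def read_duration(description: dict) -> dict:
    """Collect the OWL-Time duration components found in one description.

    Predicates that do not match a canonical TIME URI are silently ignored,
    which is why a malformed URI leads to a missing update frequency.
    """
    units = ("years", "months", "weeks", "days", "hours")
    return {u: description[TIME + u] for u in units if TIME + u in description}


# The same one-year duration, serialized with the broken and the fixed namespace.
broken = {"http://www.w3.org/2006/timeyears": 1}
fixed = {"http://www.w3.org/2006/time#years": 1}

print(read_duration(broken))  # {}
print(read_duration(fixed))   # {'years': 1}
```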
This breaks interoperability with standards-compliant harvesters, especially for the dct:accrualPeriodicity data structures used for federation with portals like datos.gob.es.
Example of the issue
We found datasets that weren't federating correctly with datos.gob.es. Examining the RDF revealed that years/days properties weren't being properly recognized:
Incorrect (won't federate):
Here, time:years points to http://www.w3.org/2006/timeyears instead of http://www.w3.org/2006/time#years.
Correctly federated example:
Solution
This PR updates the namespace definition in base.py to use the correct URI with the trailing hash (http://www.w3.org/2006/time#), as specified in DCAT. It also updates the namespace in the tests.
Testing & Verification
We've verified this fix by manually applying it to our local installation and confirming that datasets with temporal properties using the TIME ontology (e.g., dct:accrualPeriodicity) are now correctly harvested by external systems. The tests also pass.
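A regression test along these lines could guard against the namespace losing its hash again. This is only a sketch: TIME_NS is a hypothetical stand-in for the constant defined in base.py, and the actual ckanext-dcat test suite may be organized differently.

```python
import unittest

# Hypothetical constant mirroring the fixed TIME namespace in base.py.
TIME_NS = "http://www.w3.org/2006/time#"


class TimeNamespaceTest(unittest.TestCase):
    def test_namespace_ends_with_hash(self):
        # Without the hash, local names fuse into the namespace URI.
        self.assertTrue(TIME_NS.endswith("#"))

    def test_years_expands_to_canonical_uri(self):
        self.assertEqual(TIME_NS + "years", "http://www.w3.org/2006/time#years")
```

Run it with python -m unittest on the module that contains it.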
References