
normalization

standard-readme compliant

A Kotlin library for normalising and producing a digest of RDF graphs in Stardog.

Table of Contents

  • Background
  • Package
  • Usage
  • Future Work
  • Contributing
  • Maintainers
  • License

Background

RDF is a graph-based model for describing resources via their properties: each (subject, predicate, object) triple forms an edge in the graph, with the subject and object as vertices and the predicate as the edge label. Often it is useful to compare the differences between sets of graphs, or to generate short identifiers for graphs via hashing algorithms, e.g. to digitally sign a graph. This library provides an implementation of an algorithm for normalizing RDF datasets so that these operations can be performed. From the spec of that algorithm:

When data scientists discuss canonicalization, they do so in the context of achieving a particular set of goals. Since the same information may sometimes be expressed in a variety of different ways, it often becomes necessary to be able to transform each of these different ways into a single, standard format. With a standard format, the differences between two different sets of data can be easily determined, a cryptographically-strong hash identifier can be generated for a particular set of data, and a particular set of data may be digitally-signed for later verification.

In particular, this specification is about normalizing RDF datasets, which are collections of graphs. Since a directed graph can express the same information in more than one way, it requires canonicalization to achieve the aforementioned goals and any others that may arise via serendipity.

The algorithm implemented in this library is URDNA2015, as detailed in the specification above.
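To give a flavour of how URDNA2015 deals with blank nodes, the sketch below implements only its first step, the "Hash First Degree Quads" hash, over a deliberately simplified term model. The Term, Quad and hashFirstDegreeQuads names are illustrative only and are not part of this library, which works against Stardog's own statement types:

import java.security.MessageDigest

// Hypothetical, deliberately simplified RDF term model for illustration only
sealed class Term
data class Iri(val value: String) : Term() { override fun toString() = "<$value>" }
data class Literal(val value: String) : Term() { override fun toString() = "\"$value\"" }
data class BNode(val id: String) : Term() { override fun toString() = "_:$id" }

data class Quad(val s: Term, val p: Term, val o: Term, val g: Term? = null)

// Sketch of URDNA2015's "Hash First Degree Quads" step: serialize every quad
// that mentions the reference blank node, renaming it to _:a and any other
// blank node to _:z, then sort the lines and SHA-256 the concatenation
fun hashFirstDegreeQuads(quads: List<Quad>, reference: BNode): String {
    fun render(term: Term): String =
        if (term is BNode) (if (term == reference) "_:a" else "_:z") else term.toString()
    val lines = quads
        .filter { it.s == reference || it.o == reference || it.g == reference }
        .map { q -> "${render(q.s)} ${render(q.p)} ${render(q.o)}${q.g?.let { " ${render(it)}" } ?: ""} .\n" }
        .sorted()
    return MessageDigest.getInstance("SHA-256")
        .digest(lines.joinToString("").toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
}

Blank nodes whose first-degree hashes still collide are disambiguated later by the full algorithm's N-degree hashing.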

Package

dependencies {
    implementation("io.docrozza:normalization:21.11")
}

The project uses Calendar Versioning for version numbers.

Usage

The JAR file needs to be added to the STARDOG_HOME/server/ext directory, or to whichever directory your STARDOG_EXT environment variable is pointing at. Once installed, the graphDigest function becomes available and can be used as follows:

PREFIX gd: <urn:docrozza:stardog:normalization:>
SELECT ?context ?hash FROM NAMED <tag:stardog:api:context:all> WHERE {
    GRAPH ?context {}
    (?context) gd:graphDigest (?hash)
}
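Since the function is exposed through SPARQL, any client can call it; the sketch below simply POSTs the query above to Stardog's SPARQL HTTP endpoint. The server URL, the database name (myDb) and the admin/admin credentials are assumptions to adjust for your installation:

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.util.Base64

fun main() {
    // assumed endpoint, database name and credentials - change for your setup
    val endpoint = "http://localhost:5820/myDb/query"
    val query = """
        PREFIX gd: <urn:docrozza:stardog:normalization:>
        SELECT ?context ?hash FROM NAMED <tag:stardog:api:context:all> WHERE {
            GRAPH ?context {}
            (?context) gd:graphDigest (?hash)
        }
    """.trimIndent()

    val auth = Base64.getEncoder().encodeToString("admin:admin".toByteArray())
    val request = HttpRequest.newBuilder(URI.create(endpoint))
        .header("Authorization", "Basic $auth")
        .header("Content-Type", "application/sparql-query")
        .header("Accept", "application/sparql-results+json")
        .POST(HttpRequest.BodyPublishers.ofString(query))
        .build()

    // each result row binds a named graph to the digest of its normalized form
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
}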

Future Work

  • Add an extra subject parameter (Boolean) to control whether the context should be included in the normalization routine
  • Add an extra subject parameter (String) to select a different digest algorithm from the current SHA-256
  • Add guards to prevent the loading of huge graphs, which would need a lot of RAM to process with the algorithm as currently written
  • Look into some simple shortcuts in normalization, e.g. if a graph contains no BNodes then its statements can simply be sorted (see the sketch after this list)
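As a hypothetical illustration of that last shortcut, and independent of this library's API, a graph with no blank nodes can be digested by hashing its sorted N-Triples lines, since every statement already has a single canonical textual form:

import java.security.MessageDigest

// Hypothetical shortcut for graphs with no blank nodes: every N-Triples line is
// already canonical, so sorting the lines and hashing gives a stable digest
fun digestWithoutBNodes(ntriples: List<String>): String =
    MessageDigest.getInstance("SHA-256")
        .digest(ntriples.map { it.trim() }.sorted().joinToString("\n").toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }

fun main() {
    // two orderings of the same blank-node-free graph produce the same digest
    val graph = listOf(
        "<urn:ex:s> <urn:ex:p> \"one\" .",
        "<urn:ex:s> <urn:ex:q> <urn:ex:o> ."
    )
    println(digestWithoutBNodes(graph) == digestWithoutBNodes(graph.reversed()))
}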

Someone with in-depth knowledge of Stardog internals might be able to use off-heap processing for the BNode tracking, allowing larger graphs to be normalized, but this isn't part of Stardog's public API at the moment.

Contributing

Help appreciated :-) Open an issue or submit PRs.

NB: to use the library or run the tests, a working Stardog installation is required. See here for more information. To then run the tests, the following Gradle project properties need to be set so that the embedded database can be created (an example invocation follows the list):

  • stardogHome - the path to the directory used to store the database and where the license file is kept
  • stardogLibs - the path to the Stardog library JARs
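For example, the tests can be run by passing both properties on the command line (the paths below are placeholders for your own installation):

./gradlew test -PstardogHome=/path/to/stardog-home -PstardogLibs=/path/to/stardog/lib

The same properties can also be placed in a gradle.properties file.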

Maintainers

@DocRozza

License

MIT © Rory Steele
