DCAT-AP: How to use identifiers for datasets and distributions?

Published on: 30/03/2016 Last update: 08/11/2017

Issue

In DCAT-AP, three identifiers can be associated with a Dataset:

(1a) The URI of the description itself, i.e. the identifier of the RDF graph, which appears as the subject (left-hand side) of its triples
(1b) The optional property identifier (dct:identifier) with range rdfs:Literal, commonly used to hold an HTTP URI
(1c) The optional property other identifier (adms:identifier) with range adms:Identifier, which allows providing information on the identifier scheme, the version of the scheme, and the agency that manages the scheme

For Distributions, there is only one:

(2a) The URI of the description itself, i.e. the identifier of the RDF graph, which appears as the subject (left-hand side) of its triples

In addition, the description of a Distribution can contain two more URIs that identify the physical location of the data:

(2b) The mandatory property access URL (dcat:accessURL)
(2c) The optional property download URL (dcat:downloadURL)

DCAT-AP does not provide information on how to use these identifiers.

Current situation

RDF-based implementations necessarily assign identifiers to the graphs that contain the Dataset and Distribution descriptions. In these cases, the graph identifier of the Dataset description (1a) is usually copied into dct:identifier (1b).

Implementations that are not based on RDF need to export descriptions from a non-RDF system to RDF. In some cases, this is done by assembling a single RDF/XML structure that embeds all metadata for the catalogue. Such approaches may embed the descriptions of Distributions within the description of the associated Dataset, and embed the descriptions of all Datasets in the description of the Catalog, creating one large file that holds all metadata. This approach does not require URIs to be assigned to the entities, and such implementations may indeed not assign any.

In addition, even in RDF-based implementations, some entities may be modelled as blank nodes. For example, a Period of Time may be expressed as a blank node with properties for start and end date. Some tools have difficulty processing such blank nodes.
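One way to avoid such blank nodes is to give the node a Skolem URI, as the recommendation in this guideline suggests. The following is a minimal sketch only: the host example.org, the dataset URI, the dates, and the function name skolem_uri are assumptions made for illustration, and the use of schema:startDate and schema:endDate is assumed from the DCAT-AP modelling of a Period of Time. The /.well-known/genid/ path follows the convention suggested in RDF 1.1 Concepts. The URI is derived from the owning entity and the node's content, so the same input yields the same URI on every export.

```python
import hashlib
import json

# Hypothetical host; the /.well-known/genid/ path is the Skolem IRI convention from RDF 1.1.
SKOLEM_BASE = "http://example.org/.well-known/genid/"


def skolem_uri(parent_uri, node):
    """Derive a repeatable Skolem URI from the owning entity's URI and the node's properties."""
    # Sorting the keys makes the serialisation independent of property order.
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((parent_uri + "|" + canonical).encode("utf-8")).hexdigest()
    return SKOLEM_BASE + digest[:32]


# Illustrative Period of Time that would otherwise be exported as a blank node.
period = {
    "@type": "http://purl.org/dc/terms/PeriodOfTime",
    "http://schema.org/startDate": "2015-01-01",
    "http://schema.org/endDate": "2015-12-31",
}
dataset_uri = "http://example.org/dataset/1"  # hypothetical Dataset URI

# Attach the Skolem URI as the node identifier instead of leaving the node anonymous.
period_with_id = {"@id": skolem_uri(dataset_uri, period), **period}
print(json.dumps(period_with_id, indent=2))
```

Because the URI depends on the node's content, it changes when the start or end date changes. An implementation that needs the URI to survive such edits would have to derive it from a persistent key instead.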

Recommendation

The following approach is recommended:

Stable URIs should be minted for all entities.
If possible, URIs should resolve to metadata (303 redirect).
URIs generated on export must be unique and stable (the same URI must be produced every time the entity is exported).
In RDF/XML, the URI goes in the rdf:about attribute of the rdf:Description element for each entity.
In JSON-LD, the URI goes in the @id keyword.
If necessary, blank nodes should be assigned Skolem URIs.
The Dataset URI should be copied into dct:identifier.
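For an implementation that exports from a non-RDF system, the points above could be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the namespace http://example.org/catalogue/, the record layout, and the helper names mint_uri and dataset_to_jsonld are all hypothetical. The sketch assumes each record has a persistent key in the source system, so the URI built from it is the same on every export.

```python
import json
from urllib.parse import quote

BASE = "http://example.org/catalogue/"  # hypothetical namespace controlled by the publisher


def mint_uri(kind, local_key):
    """Build a URI from the entity kind and the record's persistent key in the source system."""
    # Percent-encode the key so that spaces or slashes cannot break the URI.
    return f"{BASE}{kind}/{quote(str(local_key), safe='')}"


def dataset_to_jsonld(record):
    """Export one source record as a JSON-LD node for a dcat:Dataset."""
    uri = mint_uri("dataset", record["key"])
    return {
        "@id": uri,  # the URI of the description itself
        "@type": ["http://www.w3.org/ns/dcat#Dataset"],
        "http://purl.org/dc/terms/title": [{"@value": record["title"], "@language": "en"}],
        # The Dataset URI is copied into dct:identifier.
        "http://purl.org/dc/terms/identifier": [{"@value": uri}],
        # Distributions get their own URIs instead of being left anonymous.
        "http://www.w3.org/ns/dcat#distribution": [
            {"@id": mint_uri("distribution", d["key"])} for d in record["distributions"]
        ],
    }


# Hypothetical record as it might come out of a non-RDF catalogue database.
record = {"key": "ds 1", "title": "Example dataset", "distributions": [{"key": "ds1-csv"}]}
doc = dataset_to_jsonld(record)
print(json.dumps(doc, indent=2))
```

Deriving the URI from a persistent key rather than from a counter or a random value is what keeps it stable across exports. Uniqueness then depends on the key being unique within each entity kind.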

Rationale

If stable identifiers are assigned to all entities, the processing of the information will be made easier.

Example

The example is based on the Nobel Prize catalogue, which is available via http://www.nobelprize.org/datasets/dcat. Some modifications were made in order to clarify the guideline.

RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://nobelprize.org/datasets/dcat#ds1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title xml:lang="en">Linked Nobel prizes</dct:title>
    <dct:identifier>http://nobelprize.org/datasets/dcat#ds1</dct:identifier>
  </rdf:Description>
</rdf:RDF>

JSON-LD

{
  "@id": "http://nobelprize.org/datasets/dcat#ds1",
  "@type": ["http://www.w3.org/ns/dcat#Dataset"],
  "http://purl.org/dc/terms/title": [
    {"@value": "Linked Nobel prizes", "@language": "en"}
  ],
  "http://purl.org/dc/terms/identifier": [
    {"@value": "http://nobelprize.org/datasets/dcat#ds1"}
  ]
}
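A consumer or a validation step could check that a Dataset description follows the guideline, i.e. that it has a non-blank @id and that the same value appears in dct:identifier. The sketch below applies such a check to the JSON-LD example above. The function name identifier_matches_uri is hypothetical, and the check assumes expanded JSON-LD with full property URIs as in the example.

```python
import json

DCT_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def identifier_matches_uri(node):
    """Return True if the node has a non-blank @id that is also listed in dct:identifier."""
    uri = node.get("@id")
    if not uri or uri.startswith("_:"):
        # No URI at all, or a blank node label: the guideline is not met.
        return False
    values = [v.get("@value") for v in node.get(DCT_IDENTIFIER, []) if isinstance(v, dict)]
    return uri in values


# The JSON-LD example from this guideline, parsed from its serialised form.
dataset = json.loads("""
{
  "@id": "http://nobelprize.org/datasets/dcat#ds1",
  "@type": ["http://www.w3.org/ns/dcat#Dataset"],
  "http://purl.org/dc/terms/title": [{"@value": "Linked Nobel prizes", "@language": "en"}],
  "http://purl.org/dc/terms/identifier": [{"@value": "http://nobelprize.org/datasets/dcat#ds1"}]
}
""")

print(identifier_matches_uri(dataset))
```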