How to model and express provenance?
- Source and lineage of metadata and data
- Use of PROV-O (https://www.w3.org/TR/prov-o/)
The DCAT model treats the descriptions of datasets in a catalogue as entities that only exist in the context of the catalogue, and does not consider situations where these descriptions are imported from and exported to other catalogues.
In an environment where descriptions of datasets are exchanged among data portals, which is the situation DCAT-AP is designed for, it may be important for users to understand where data comes from and how it may have been modified along the way. For example, knowing which organisation created the metadata for a dataset, and how the description was modified along a chain of exchanges, could support the credibility of that dataset.
DCAT-AP specifies an optional property dct:provenance for Dataset but does not provide any guidance on how to describe instances of the class dct:ProvenanceStatement.
A common approach to the expression of provenance would improve interoperability among catalogues.
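As an illustration of what such a common approach could look like, the sketch below records a chain of exchanges between catalogues with PROV-O terms: each harvested catalogue record is linked to the record it was copied from (prov:wasDerivedFrom), to the organisation responsible for it (prov:wasAttributedTo), and to the time it was produced (prov:generatedAtTime). This is one possible pattern, not something DCAT-AP currently prescribes. The names RecordHop and chain_to_turtle and all example.org URIs are hypothetical, and the Turtle is built with plain string formatting to keep the example free of third-party libraries.

```python
from dataclasses import dataclass
from typing import List, Optional

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


@dataclass
class RecordHop:
    """One copy of a dataset description in one catalogue (hypothetical model)."""
    record_uri: str    # URI of the dcat:CatalogRecord in that catalogue
    agent_uri: str     # organisation that produced this copy of the description
    generated_at: str  # xsd:dateTime lexical form, e.g. "2016-01-15T10:00:00Z"


def chain_to_turtle(hops: List[RecordHop]) -> str:
    """Emit Turtle for hops ordered from the original record to the latest copy.

    Every record except the first is linked to its predecessor with
    prov:wasDerivedFrom, so the exchange chain can be followed back to its origin.
    """
    lines = [PREFIXES]
    previous: Optional[RecordHop] = None
    for hop in hops:
        lines.append(f"<{hop.record_uri}> a dcat:CatalogRecord ;")
        lines.append(f"    prov:wasAttributedTo <{hop.agent_uri}> ;")
        if previous is not None:
            lines.append(f"    prov:wasDerivedFrom <{previous.record_uri}> ;")
        lines.append(f'    prov:generatedAtTime "{hop.generated_at}"^^xsd:dateTime .')
        lines.append("")
        previous = hop
    return "\n".join(lines)


# Hypothetical chain: a national portal's record harvested by a European portal.
hops = [
    RecordHop("http://example.org/national/record/1",
              "http://example.org/org/national-agency", "2016-01-10T09:00:00Z"),
    RecordHop("http://example.org/edp/record/1",
              "http://example.org/org/edp", "2016-01-15T10:00:00Z"),
]
out = chain_to_turtle(hops)
print(out)
```

Attaching provenance to the catalogue record rather than to the dataset keeps the history of the metadata separate from the history of the data itself, which matches the concern raised above about descriptions being imported and exported between catalogues.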
This issue has been reported by Sadia Vancauwenbergh:
http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-January/000364.html
Comments
Provenance is a concept that dictionaries commonly define as ‘place or source of origin’ and, more specifically, as ‘the history of the ownership of an object’. See for example: http://www.thefreedictionary.com/provenance
DCAT-AP v1.1 has several properties that are related to provenance:
It would be useful to know how implementations use these properties and whether there are additional properties that are used for provenance-related information, for example from W3C PROV-O provenance ontology.
In our European Data Portal Geo-Harvesters, we follow the approach defined in GeoDCAT-AP to map the ISO 19115 attribute "lineage". In ISO 19115/ISO 19139 metadata, the attribute is found at the following XPath: /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString
The candidate property proposed by DCAT-AP and GeoDCAT-AP is dct:provenance. Because the range of dct:provenance is not a literal but the class dct:ProvenanceStatement, the free-text lineage statement is attached to the dct:ProvenanceStatement instance with rdfs:label.
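A minimal sketch of that mapping, using only the Python standard library: it reads the lineage statement from an ISO 19139 record at the XPath quoted above and emits it as the rdfs:label of a dct:ProvenanceStatement attached to the dataset via dct:provenance. The sample record, the dataset URI, and the helper names (lineage_statement, turtle_literal, provenance_turtle) are hypothetical. Using a blank node for the provenance statement is an assumption made for brevity, not a requirement of DCAT-AP or GeoDCAT-AP.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard ISO 19139 namespace URIs for the prefixes used in the XPath.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# The lineage XPath, written relative to the gmi:MI_Metadata root element.
LINEAGE_PATH = (
    "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/"
    "gmd:LI_Lineage/gmd:statement/gco:CharacterString"
)


def lineage_statement(xml_text: str) -> Optional[str]:
    """Return the trimmed lineage statement, or None if it is absent or empty."""
    root = ET.fromstring(xml_text)
    node = root.find(LINEAGE_PATH, NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def provenance_turtle(dataset_uri: str, statement: str) -> str:
    """Express the lineage text as dct:provenance / dct:ProvenanceStatement / rdfs:label."""
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
        f"<{dataset_uri}> dct:provenance [\n"
        "    a dct:ProvenanceStatement ;\n"
        f"    rdfs:label {turtle_literal(statement)}\n"
        "] .\n"
    )


# Hypothetical ISO 19139 fragment containing only the lineage path.
SAMPLE = (
    '<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi" '
    'xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:dataQualityInfo><gmd:DQ_DataQuality><gmd:lineage><gmd:LI_Lineage>"
    "<gmd:statement><gco:CharacterString>Digitised from 1:25 000 topographic maps."
    "</gco:CharacterString></gmd:statement>"
    "</gmd:LI_Lineage></gmd:lineage></gmd:DQ_DataQuality></gmd:dataQualityInfo>"
    "</gmi:MI_Metadata>"
)

stmt = lineage_statement(SAMPLE)
if stmt is not None:
    print(provenance_turtle("http://example.org/dataset/1", stmt))
```

Returning None when the lineage element is missing lets a harvester omit dct:provenance entirely rather than publish an empty dct:ProvenanceStatement.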
Proposed resolution: