Which processes and tools could be used to manage the quality of metadata? 1.0 Latest release

Published on: 28/03/2017 Last update: 08/11/2017

Issue

National implementations of DCAT-AP may face difficulties in appropriately managing the quality of their metadata.

The related issue is available here.

Current situation

Metadata quality is currently not high on the priority list for implementers. The current focus is more on quantity. However, awareness is growing that users will not get good results if the quality of the metadata is poor.

Recommendation

DCAT-AP implementers should be enabled to:

  • Verify the conformance of the metadata to DCAT-AP (mandatory elements, prescribed data types, controlled vocabularies, etc.):
    • The user interface should check the metadata at creation time, with feedback given to the creator in real time.
    • A validator could be used (e.g. the DCAT-AP validator) for harvested metadata.
  • Ensure that the metadata values are accurate with respect to the content of the dataset:
    • Detailed guidelines such as the EDP's Gold Book for Data Publishers or local guides should be provided to data publishers in order to help them understand the elements of DCAT-AP.
    • Overall statistics on metadata quality, such as those in the Metadata Quality Dashboard, could be published on a weekly basis to create sufficient awareness of the situation and to support the identification of the most urgent issues to be tackled.
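To illustrate the kind of portal-level statistic such a dashboard might aggregate, the sketch below computes a simple completeness score per catalogue. The checked property list and the scoring are illustrative assumptions, not the EDP's actual Metadata Quality Dashboard methodology.

```python
# Illustrative sketch: aggregate a completeness score over a catalogue's
# dataset records. Property names follow DCAT-AP terms; the selection and
# weighting are hypothetical, not the EDP's actual methodology.

CHECKED_PROPERTIES = ["dct:title", "dct:description",
                      "dcat:distribution", "dct:publisher"]

def completeness(record):
    """Fraction of checked properties that are present and non-empty."""
    present = sum(1 for p in CHECKED_PROPERTIES if record.get(p))
    return present / len(CHECKED_PROPERTIES)

def catalogue_report(records):
    """Average completeness over all dataset records of one catalogue."""
    if not records:
        return 0.0
    return sum(completeness(r) for r in records) / len(records)

records = [
    {"dct:title": "Air quality 2017", "dct:description": "Hourly NO2 readings"},
    {"dct:title": "Traffic counts", "dct:description": "Daily counts",
     "dcat:distribution": "http://example.org/traffic.csv",
     "dct:publisher": "http://example.org/city"},
]
print(catalogue_report(records))  # 0.75: the records score 0.5 and 1.0
```

Published weekly per harvested catalogue, a score like this makes it easy to spot which portals need attention first.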

Rationale

There are two broad categories of issues that are related to metadata quality.

The first category concerns the conformance of the metadata to DCAT-AP. Aspects here include, for example, whether the metadata contains the mandatory elements, and whether the metadata values have the prescribed data types or refer to the controlled vocabularies specified in the profile. Depending on whether an implementation provides a user interface for the creation of metadata or harvests metadata from other sources, two mechanisms can be distinguished.

If an implementation allows the creation of metadata through a user interface, the interface can check the correctness of the metadata in terms of conformance by showing error messages if mandatory information is not entered, and by providing drop-down lists with the possible controlled terms.
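A minimal sketch of such form-side checking might look as follows. The mandatory elements are taken from DCAT-AP (dct:title and dct:description are mandatory for a Dataset); the theme list is a small illustrative subset of the MDR Data Theme vocabulary, and the function names are hypothetical.

```python
# Sketch of real-time conformance feedback for a metadata entry form.
# MANDATORY reflects DCAT-AP's mandatory Dataset elements; DATA_THEMES is
# a small illustrative subset of the MDR Data Theme authority table.

MANDATORY = ["dct:title", "dct:description"]
DATA_THEMES = {
    "http://publications.europa.eu/resource/authority/data-theme/ENVI",
    "http://publications.europa.eu/resource/authority/data-theme/TRAN",
}

def validate_form(fields):
    """Return a list of error messages to show the metadata creator."""
    errors = []
    for prop in MANDATORY:
        if not fields.get(prop, "").strip():
            errors.append(f"Missing mandatory element: {prop}")
    theme = fields.get("dcat:theme")
    if theme and theme not in DATA_THEMES:
        errors.append("dcat:theme must be chosen from the Data Theme vocabulary")
    return errors

print(validate_form({"dct:title": "Air quality 2017"}))
# One error for the missing description; a conforming record returns [].
```

In a real form, the theme check would be made unnecessary by offering the controlled terms in a drop-down list, as described above.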

If an implementation harvests metadata from other sources, for example a national data portal that aggregates data from regional or city portals, conformance can only be checked through a validator. Many implementations in Europe use the DCAT-AP Validator developed for the European Commission in the context of the Open Data Support project. Other approaches could make use of work taking place at W3C on the Shapes Constraint Language (SHACL), a language for validating RDF graphs against a set of conditions provided as shapes and other constructs expressed in the form of an RDF graph.

When automated validators are applied, problems that typically surface are, for example, that mandatory properties are missing or that text is provided for properties for which links to other resources are expected (e.g. text for the dcat:theme property, which should contain a link to one of the MDR Data Theme terms). In addition, errors could be introduced by mapping tools that convert local data to DCAT-AP.
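The "text where a link is expected" error can be detected with a simple heuristic over harvested values. The sketch below is an illustrative assumption, not part of the DCAT-AP Validator: it only checks whether a value is shaped like an absolute IRI, which a SHACL sh:nodeKind constraint would express more rigorously.

```python
# Sketch of a harvested-metadata check for the error described above:
# a text literal supplied where a link (IRI) is expected, e.g. dcat:theme.
from urllib.parse import urlparse

def looks_like_iri(value):
    """Crude heuristic: an absolute IRI has a scheme and a body."""
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)

def check_theme(value):
    """Return an error message, or None if the value is link-shaped."""
    if not looks_like_iri(value):
        return f"dcat:theme expects a link to a Data Theme term, got text: {value!r}"
    return None

print(check_theme("Environment"))  # flagged: plain text, not a link
print(check_theme(
    "http://publications.europa.eu/resource/authority/data-theme/ENVI"))  # None
```

A full validator would additionally check that the IRI actually belongs to the Data Theme vocabulary, not merely that it is a link.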

The European Data Portal publishes metadata quality reports for the portals it harvests in its Metadata Quality Dashboard.

The second category of quality aspects relates to whether the actual values in the metadata are accurate and appropriately reflect what the dataset is and what it is about. This is a much more challenging aspect and depends to a large extent on the understanding of the metadata creator. Assuring quality in this area can only be achieved by providing good guidelines that explain the meaning of the various metadata properties.

A rich source of guidelines is the European Data Portal’s Gold Book for Data Publishers.

The Best Practice guidelines published by the Share-PSI 2.0 project and by W3C's Data on the Web Best Practices Working Group may also be relevant for metadata quality.

Tools may be considered that give metadata creators immediate feedback through the provision of a preview function that shows how the metadata would look on a data portal.
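Such a preview could be as simple as rendering the entered values the way a portal listing would display them. The sketch below is purely illustrative; the field names follow DCAT-AP terms and the layout is an assumption.

```python
def preview(record):
    """Render a metadata record roughly as a data portal listing would.
    Field names follow DCAT-AP terms; the layout is purely illustrative."""
    title = record.get("dct:title") or "(no title)"
    description = record.get("dct:description") or "(no description given)"
    lines = [title, description]
    themes = record.get("dcat:theme", [])
    if themes:
        lines.append("Themes: " + ", ".join(themes))
    return "\n".join(lines)

print(preview({"dct:title": "Air quality 2017",
               "dct:description": "Hourly NO2 readings",
               "dcat:theme": ["Environment"]}))
```

Seeing "(no title)" or "(no description given)" in the preview gives the creator the same immediate cue as a validation error, but in the context of how the record will actually appear.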

Example

Metadata for datasets should be created as part of the workflow, as close to the source as possible, helping metadata creators with good guidelines and user interfaces that include drop-down lists and immediate feedback.

Future activities

Consideration of emerging approaches for the specification and automated execution of quality checks, e.g. based on SPARQL or the current work at W3C on the Shapes Constraint Language (SHACL, https://www.w3.org/TR/shacl/).