Introduction

Data Catalogues done right

piveau is an open-source metadata catalogue solution. It is highly scalable and covers the essential life cycle of your metadata: harvesting, storage and quality assurance.

piveau was designed and developed around Semantic Web technologies, the W3C standard DCAT, and DCAT-AP, the European standard for Open Data. It closes the gap between formal metadata specifications and their application in production. piveau puts a strong emphasis on Open Data and is a leading solution for public administrations and non-profit organizations to publish interoperable and flexible metadata catalogues.

The project has been developed thanks to the support of data.europa.eu.

Background

Datasets

It is customary in data management to divide data into individual chunks, so-called datasets. A dataset holds data about a certain topic. This could be, for example, the demographic development of a country over a certain period of time, or the number of people who used a city's public transportation system during the last few months. A dataset contains two things:

   information about the data itself ("metadata"), such as a title, a description and the time the dataset was created or changed
   distributions, which contain the actual data and are mostly provided as XLS, CSV or other file formats
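
The two-part structure described above can be sketched as a small data model. This is only an illustration, not piveau's internal representation: the class names, field names and example URLs are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Distribution:
    """One concrete form of the actual data, e.g. a downloadable CSV file."""
    access_url: str   # hypothetical field name
    file_format: str  # hypothetical field name


@dataclass
class Dataset:
    """Metadata about the data plus the distributions that carry it."""
    title: str
    description: str
    issued: date      # when the dataset was created
    modified: date    # when the dataset was last changed
    distributions: list[Distribution] = field(default_factory=list)


# Example values are invented for illustration only.
ridership = Dataset(
    title="Public transport ridership",
    description="Monthly passenger counts for a city's public transport network.",
    issued=date(2024, 1, 15),
    modified=date(2024, 6, 1),
    distributions=[
        Distribution("https://example.org/ridership.csv", "CSV"),
        Distribution("https://example.org/ridership.xls", "XLS"),
    ],
)

formats = [d.file_format for d in ridership.distributions]
print(formats)
```

The same data is offered in two file formats here, which is why distributions are modelled as a list attached to a single set of metadata.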

DCAT-AP

One of the most widely adopted standards for describing datasets is DCAT, together with its extension, the DCAT Application Profile for data portals in Europe (DCAT-AP). The latter adds metadata fields and mandatory property ranges, making it suitable for use with Open Data management platforms.
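
To give an impression of what such a description looks like, the following sketch renders a dataset as RDF in Turtle syntax using the DCAT and Dublin Core Terms vocabularies. It covers only the title and description of a dataset and the access URL of each distribution, which DCAT-AP lists as mandatory. The helper `dataset_to_turtle` and all example URIs are hypothetical and not part of piveau; a real application would normally use an RDF library instead of string formatting.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def dataset_to_turtle(uri, title, description, distributions):
    """Render one dataset and its distributions as Turtle text.

    `distributions` is a list of (distribution_uri, access_url) pairs.
    Assumption: titles and descriptions contain no characters that
    need escaping in a Turtle string literal.
    """
    predicates = [
        f'dct:title "{title}"@en',
        f'dct:description "{description}"@en',
    ]
    if distributions:
        links = ", ".join(f"<{d_uri}>" for d_uri, _ in distributions)
        predicates.append(f"dcat:distribution {links}")

    # The dataset block first, then one block per distribution.
    blocks = [
        f"<{uri}> a dcat:Dataset ;\n    " + " ;\n    ".join(predicates) + " ."
    ]
    for d_uri, access_url in distributions:
        blocks.append(
            f"<{d_uri}> a dcat:Distribution ;\n"
            f"    dcat:accessURL <{access_url}> ."
        )

    prefixes = f"@prefix dcat: <{DCAT}> .\n@prefix dct: <{DCT}> ."
    return "\n\n".join([prefixes] + blocks)


turtle = dataset_to_turtle(
    "https://example.org/dataset/ridership",
    "Public transport ridership",
    "Monthly passenger counts.",
    [("https://example.org/distribution/ridership-csv",
      "https://example.org/files/ridership.csv")],
)
print(turtle)
```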

Components

piveau is based on a microservice architecture and a custom pipeline system, which allows features to be composed flexibly and scaled independently.

piveau hub

Hub is the central component for storing and registering the data. Its persistence layer consists of a Virtuoso triplestore as the principal database, Elasticsearch as the indexing server and MongoDB for storing binary files.

piveau consus

Consus is responsible for the data acquisition from various sources and data providers. This includes scheduling, transformation and harmonization.
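
The general idea of passing harvested records through a sequence of processing steps can be sketched as follows. This is a generic illustration and not the Consus API: the stage functions, the field mapping and the record layout are all assumptions, and scheduling is left out entirely.

```python
from typing import Callable, Iterable

Record = dict
Stage = Callable[[Record], Record]


def transform(record: Record) -> Record:
    """Map source-specific field names onto a common set (hypothetical mapping)."""
    mapping = {"name": "title", "notes": "description"}
    return {mapping.get(key, key): value for key, value in record.items()}


def harmonize(record: Record) -> Record:
    """Normalize values, here simply by trimming whitespace from strings."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def run_pipeline(records: Iterable[Record], stages: list[Stage]) -> list[Record]:
    """Pass every harvested record through each stage in order."""
    results = []
    for record in records:
        for stage in stages:
            record = stage(record)
        results.append(record)
    return results


# A single invented record as it might arrive from a data provider.
harvested = [{"name": "  Ridership 2024 ", "notes": "Monthly counts"}]
processed = run_pipeline(harvested, [transform, harmonize])
print(processed)
```

Because each stage is a plain function from record to record, stages can be reordered or replaced per data source without touching the others.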

piveau metrics

Metrics is responsible for creating and maintaining comprehensive quality information and feeding it back to the Hub.