Dataset quality rating

Alessio DRAGONI

Published on: 04/04/2013 Discussion Archived

We propose adding an optional dataset property to store a data quality rating indicator, so that both publishers and other potential authorities can make explicit how reliable the dataset's resources are. We are not seeking here to agree on the possible rating ranges and their meanings. That aspect can be regulated outside of this

Component

Documentation

Comments

Makx DEKKERS Fri, 05/04/2013 - 13:03

In an earlier version of DCAT (see http://vocab.deri.ie/dcat) there was a property dcat:dataQuality but that is not included in the current version.

It could of course be proposed as a comment to the Last Call that is open until this coming Monday 8 April.

Anonymous (not verified) Mon, 08/04/2013 - 12:27

It is quite difficult to assign a single data quality rating with clear semantics, because quality is strongly related to the use of the data. For example, given a dataset of pollution data for a given area, a citizen could be satisfied if it shows a spatial precision of street or district, while a researcher could be interested in a finer precision.

I suggest encouraging publishers to add as many metadata values as are available about the quality of the exposed dataset (e.g. currency, accuracy, completeness and so on), so that users can decide whether to use the dataset or not.
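
As a rough illustration of the point above, here is a minimal Python sketch in which the same per-dimension quality metadata is judged against two different users' requirements. The class `QualityMetadata`, its fields, the precision levels and the thresholds are all hypothetical names chosen for this example; none of them comes from DCAT or the AP.

```python
from dataclasses import dataclass


@dataclass
class QualityMetadata:
    """Hypothetical per-dimension quality descriptors for one dataset."""
    currency_days: int        # age of the most recent update, in days
    spatial_precision: str    # "street" or "district" in this sketch
    completeness: float       # fraction of expected records present, 0.0 to 1.0


# Finer precision levels get smaller ranks (illustrative ordering only).
PRECISION_RANK = {"street": 0, "district": 1}


def is_fit_for_use(meta, max_age_days, required_precision, min_completeness):
    """Return True if the dataset meets this particular user's requirements."""
    return (
        meta.currency_days <= max_age_days
        and PRECISION_RANK[meta.spatial_precision] <= PRECISION_RANK[required_precision]
        and meta.completeness >= min_completeness
    )


# One dataset, described once by its publisher (made-up values).
pollution = QualityMetadata(currency_days=30, spatial_precision="district", completeness=0.9)

# Two users with different needs reach different conclusions.
citizen_ok = is_fit_for_use(pollution, 90, "district", 0.8)
researcher_ok = is_fit_for_use(pollution, 90, "street", 0.8)
print(citizen_ok, researcher_ok)
```

The design choice here is that the publisher only states facts about the data, and each user applies their own thresholds, so no single rating has to fit every use.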

Anonymous (not verified) Mon, 08/04/2013 - 20:30

I agree that quality is very context-sensitive and very hard to generalize (comparability is a similar case). This is a complex issue that I would not attempt to address here. I would rather make sure datasets have unique identifiers and let users make up their own indicators or link to a quality framework.

Makx DEKKERS Mon, 15/04/2013 - 20:05

The proposed resolution is that the Working Group will not consider this issue.

Anonymous (not verified) Tue, 16/04/2013 - 14:30

I also think that, in the absence of a common data quality framework with specific and detailed rules, a dataQuality property is not necessary in terms of interoperability, but it may still be useful for specific implementations. My point is that it should be part of neither DCAT nor the AP, but it will also be easy to extend on top of them to address your specific needs.

Alessio DRAGONI Tue, 16/04/2013 - 15:27

Our starting point is that the value of a dataset increases proportionally with the reuse of the data it contains.
The absence of a standard attribute to store or link to a quality/reliability indicator, coming either from the publisher or from a third-party entity (like a rating agency), is currently seen as a limitation.
We really believe that leaving it as an extension of DCAT or the AP could create confusion and, of course, encourage the proliferation of catalogues that are not easily aggregated.

Enric STAROMIEJSKI Tue, 16/04/2013 - 15:43

Is it actually possible to measure the re-usability of a dataset? How would you do it?

Makx DEKKERS Mon, 22/04/2013 - 21:24

The DCAT AP will not contain properties to indicate data quality.

Andrea PEREGO Fri, 26/04/2013 - 16:17

Just to mention that a possible approach to dataset quality rating is to model such ratings as social annotations, which can then be aggregated and weighted based on annotation provenance.

For social annotations, interesting work is being carried out by the W3C Open Annotation Community Group [1]. For provenance information, the W3C Provenance Ontology [2] can be an appropriate tool, as mentioned in another thread [3].
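
To make the aggregation idea a bit more concrete, here is a minimal Python sketch of a provenance-weighted average of ratings. It is only an assumption about how such weighting could work: the `RatingAnnotation` class, the 0 to 5 scale, the annotator names and the trust weights are all hypothetical, and the sketch does not use the Open Annotation or PROV-O vocabularies themselves.

```python
from dataclasses import dataclass


@dataclass
class RatingAnnotation:
    """Hypothetical annotation: who rated which dataset, and how."""
    dataset_id: str
    annotator: str
    rating: float  # illustrative 0 to 5 scale


def aggregate_rating(annotations, trust):
    """Weighted mean of ratings for one dataset.

    `trust` maps an annotator to a weight derived from provenance.
    Annotators missing from `trust` get weight 0 and are ignored.
    The caller is expected to pass annotations for a single dataset.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for ann in annotations:
        weight = trust.get(ann.annotator, 0.0)
        weighted_sum += weight * ann.rating
        total_weight += weight
    if total_weight == 0:
        return None  # no trusted rating available
    return weighted_sum / total_weight


annotations = [
    RatingAnnotation("dataset-1", "rating-agency", 4.0),
    RatingAnnotation("dataset-1", "end-user", 2.0),
]
# Made-up weights: the agency counts three times as much as an end user.
trust = {"rating-agency": 3.0, "end-user": 1.0}

score = aggregate_rating(annotations, trust)
print(score)
```

How the trust weights are obtained is left open here; that is exactly where provenance information about each annotation would come in.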

Andrea PEREGO Fri, 26/04/2013 - 16:43

Another important reference concerning data quality is the work at the ODI on the Open Data Certificate [1].

[1] http://theodi.github.io/open-data-certificate/

Alessio DRAGONI Fri, 26/04/2013 - 19:23

The ODI Open Data Certificate represents well the kind of attribute I would have liked to see regulated within a Dataset. Thanks, Andrea, for mentioning it.

Regarding your previous comment, while PROV-O is valuable, I'm not sure social annotations are the right mechanism to produce an authoritative quality level indicator for a dataset. I see them as more appropriate for producing a more generic usefulness or liking indicator.

Andrea PEREGO Wed, 08/05/2013 - 15:01

I partially agree, Alex.

However, I see social annotations as a possible approach to comprehensively model any kind of "annotation" concerning a resource. As such, they may be used to address a large number of use cases - for instance:

They can be used to collect and record, in a machine-readable way, feedback from end users about, e.g., the quality of a dataset or the presence of errors (and this links to the issue raised by Christopher in a comment to DCAT [1]).
They can be used for ratings generated by software agents evaluating datasets based on a predefined set of criteria.
They can be applied to ratings provided by the organisations / agents publishing the dataset, or even by third parties. For instance, they can be used to model ratings specified by agencies in charge of providing an official assessment of dataset quality.

As for their actual use and presentation, they can be similar to the end users' comments on datasets found in many data portals. A more sophisticated approach is based on the use of reputation / trust systems to weigh their trustworthiness in either an objective or a subjective way.

I see the approach based on social annotations and the one using the ODI ODC as complementary, and not mutually exclusive. Modelling social annotations is more complex, but this approach is in line with current work trying to consistently integrate official and non-official statements concerning given characteristics of a resource. And, after all, the ODI ODC can also be modelled as a rating specified by a given agent on a given resource.
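
To illustrate how the use cases above could share one structure, here is a minimal Python sketch in which end-user feedback, software-generated ratings and official assessments are all recorded as the same kind of annotation and then separated by source. The `Annotation` class, the `Source` categories, the agent names and the annotation bodies are hypothetical; this is not the Open Annotation data model, only a simplified stand-in for it.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Hypothetical categories of annotating agent."""
    END_USER = "end_user"
    SOFTWARE_AGENT = "software_agent"
    PUBLISHER = "publisher"
    THIRD_PARTY = "third_party"


@dataclass
class Annotation:
    target: str     # identifier of the annotated dataset
    agent: str      # who made the statement
    source: Source  # kind of agent that made it
    body: str       # the rating or comment itself


def by_source(annotations, target):
    """Group the annotations about one dataset by the kind of agent."""
    groups = defaultdict(list)
    for ann in annotations:
        if ann.target == target:
            groups[ann.source].append(ann)
    return dict(groups)


# Made-up annotations; a certificate is just one more statement by an agent.
annotations = [
    Annotation("dataset-1", "alice", Source.END_USER, "column 3 has errors"),
    Annotation("dataset-1", "checker-bot", Source.SOFTWARE_AGENT, "4/5"),
    Annotation("dataset-1", "example-agency", Source.THIRD_PARTY, "certificate: example level"),
    Annotation("dataset-2", "bob", Source.END_USER, "useful"),
]

groups = by_source(annotations, "dataset-1")
official = [a.body for a in groups.get(Source.THIRD_PARTY, [])]
print(official)
```

A portal could then choose to show only the third-party statements as the official rating, while still keeping end-user feedback alongside it.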

[1] http://lists.w3.org/Archives/Public/public-gld-comments/2013Apr/0039.html

DCAT Application Profile for data portals in Europe
