We would propose to add an optional dataset property to store a data quality rating indicator, so that both the publisher and potential authorities can make explicit how reliable the dataset's resources are. We are not trying here to agree on the possible rating ranges and their meanings; that aspect can be regulated outside of this proposal.
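As a rough illustration, such an optional property could be a single extra triple on the dataset. The ex: namespace, the qualityRating property name and the integer scale below are purely hypothetical assumptions, not part of DCAT; the sketch uses Python's rdflib:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

DCAT = Namespace("http://www.w3.org/ns/dcat#")
EX = Namespace("http://example.org/ns#")  # hypothetical extension namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("ex", EX)

dataset = URIRef("http://example.org/dataset/air-quality")
g.add((dataset, RDF.type, DCAT.Dataset))

# Hypothetical optional property: a publisher-asserted quality rating.
# The range and semantics of the value would be regulated elsewhere.
g.add((dataset, EX.qualityRating, Literal(4, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```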
Component: Documentation
Category: improvement
Comments
In an earlier version of DCAT (see http://vocab.deri.ie/dcat) there was a property dcat:dataQuality but that is not included in the current version.
It could of course be proposed as a comment to the Last Call that is open until this coming Monday 8 April.
It is quite difficult to assign a single data quality rating with clear semantics, because quality is strongly related to how the data is used. For example, given a dataset of pollution data for a given area, a citizen could be satisfied if it shows spatial precision at the street or district level, while a researcher could require a much finer precision.
I suggest instead encouraging the addition of as many metadata values as are available relating to the quality of the exposed dataset (e.g. currency, accuracy, completeness and so on), so that users can decide whether or not to use the dataset.
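A variation of the sketch above, assuming per-dimension properties instead of a single opaque score; again, all the property names, datatypes and values are illustrative assumptions, not defined by DCAT:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/ns#")  # hypothetical, as above
g = Graph()
g.bind("ex", EX)

dataset = URIRef("http://example.org/dataset/air-quality")

# Separate, self-describing quality dimensions instead of one opaque rating,
# leaving it to each consumer to judge fitness for their own use.
g.add((dataset, EX.currency, Literal("2013-03-31", datatype=XSD.date)))
g.add((dataset, EX.accuracy, Literal("street-level", datatype=XSD.string)))
g.add((dataset, EX.completeness, Literal("0.92", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```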
Agree that quality is very context sensitive and very hard to generalize (comparability is a similar one). This is a complex issue that I would not attempt to address here. I would rather make sure datasets have unique identifiers and let users build their own indicators or link to a quality framework.
The proposed resolution is that the Working Group will not consider this issue.
I also think that, in the absence of a common data quality framework with specific and detailed rules, a dataQuality property is not necessary in terms of interoperability, though it may still be useful for specific implementations. My point is that it should be part of neither DCAT nor the AP, but it would also be easy to extend them on top to address your specific needs.
Our starting point is that a dataset's value increases proportionally with the reuse of the data it contains.
The absence of a standard attribute to store, or link to, a quality/reliability indicator coming either from the publisher or from a third-party entity (like a rating agency) is currently seen as a limitation.
We really believe that leaving it as an extension of DCAT or the AP could create confusion and, of course, encourage the proliferation of catalogues that cannot easily be aggregated.
Is it actually possible to measure the reusability of a dataset? How would you do it?
The DCAT AP will not contain properties to indicate data quality.
Just to mention that a possible approach to dataset quality rating is to model such ratings as social annotations, which can then be aggregated and weighted based on annotation provenance.
For social annotations, interesting work is being carried out by the W3C Open Annotation Community Group [1]. For provenance information, the W3C Provenance Ontology [2] can be an appropriate tool, as mentioned in another thread [3].
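A rough sketch of what such a rating-as-annotation could look like, combining the Open Annotation model (oa:hasTarget / oa:hasBody) with PROV-O attribution. The resource URIs and the ex:qualityRating body are hypothetical; only the oa: and prov: namespaces are the published ones:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

OA = Namespace("http://www.w3.org/ns/oa#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/ns#")  # hypothetical

g = Graph()
for prefix, ns in (("oa", OA), ("prov", PROV), ("ex", EX)):
    g.bind(prefix, ns)

annotation = URIRef("http://example.org/annotation/42")
dataset = URIRef("http://example.org/dataset/air-quality")
rating = URIRef("http://example.org/rating/42")
reviewer = URIRef("http://example.org/agent/rating-agency")

# The annotation links a quality rating (body) to the dataset (target).
g.add((annotation, RDF.type, OA.Annotation))
g.add((annotation, OA.hasTarget, dataset))
g.add((annotation, OA.hasBody, rating))
g.add((rating, EX.qualityRating, Literal(4, datatype=XSD.integer)))

# PROV-O records who asserted the rating, so consumers can weight it
# by the annotator's authority or reputation.
g.add((rating, PROV.wasAttributedTo, reviewer))
g.add((reviewer, RDF.type, PROV.Agent))

print(g.serialize(format="turtle"))
```

Multiple such annotations on the same dataset could then be aggregated, with each one weighted by the reputation of the agent it is attributed to.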
Another important reference concerning data quality is the work at the ODI on the Open Data Certificate [1].
The ODI Open Data Certificate represents well the kind of attribute I would have included within a Dataset. Thanks, Andrea, for mentioning it.
Regarding your previous comment: while PROV-O is valuable, I'm not sure social annotations represent the right mechanism to produce an authoritative quality indicator for a dataset. I see them as more appropriate for producing a more generic usefulness or "liking" indicator.
I partially agree, Alex.
However, I see social annotations as a possible approach to comprehensively model any kind of "annotation" concerning a resource. As such, they may be used to address a large number of use cases, from end users' comments and ratings to official certifications.
As to their actual use and presentation, they can be similar to the end users' comments on datasets found in many data portals. A more sophisticated approach is based on the use of reputation/trust systems to weight their trustworthiness in either an objective or a subjective way.
I see the approach based on social annotations and the one using the ODI ODC as complementary, not mutually exclusive. Modelling social annotations is more complex, but this approach is in line with current work trying to consistently integrate official and non-official statements concerning given characteristics of a resource. And, after all, the ODI ODC itself can be modelled as a rating specified by a given agent on a given resource.