Description
From: http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2015-February/000120.html
Based on feedback from public sector organizations we have thrown in these extra properties for the dataset class:
- Access Level - to distinguish open data from the rest by dividing datasets into public, restricted and non-public.
- Access Rights - to express why the dataset is restricted/non-public. Applies only to restricted/non-public datasets.
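To make the two proposed properties concrete, here is a minimal sketch of how a catalogue tool might model them. `AccessLevel`, `Dataset` and the field names are hypothetical and not part of DCAT-AP. The proposal says Access Rights applies only to restricted/non-public datasets. Requiring a reason for every such dataset is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    # The three levels listed in the proposal (hypothetical identifiers).
    PUBLIC = "public"
    RESTRICTED = "restricted"
    NON_PUBLIC = "non-public"


@dataclass
class Dataset:
    title: str
    access_level: AccessLevel
    # Free-text reason why access is limited (the proposed "Access Rights").
    access_rights: Optional[str] = None

    def __post_init__(self) -> None:
        # Access Rights applies only to restricted/non-public datasets.
        if self.access_level is AccessLevel.PUBLIC and self.access_rights is not None:
            raise ValueError("access_rights applies only to restricted/non-public datasets")
        # Assumption: a restricted/non-public dataset must say why.
        if self.access_level is not AccessLevel.PUBLIC and not self.access_rights:
            raise ValueError("restricted/non-public datasets must state why")
```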
Proposed solution
Add new property to Dataset to indicate why the Dataset is restricted or non-public.
Component: Code
Category: improvement
Comments
Proposed resolution: No new property. This information may be included as text in dct:Description.
I think the issue here is how to provide a machine-readable description of access levels (or restrictions) that would enable data consumers (users and software agents) to filter out data that do not satisfy given requirements.
About the access levels in POD, the term "restricted" probably covers very different situations, e.g., the data are behind a paywall, or you have to register and/or do additional things to get the data.
It would, however, be useful to distinguish between discriminatory and non-discriminatory access. For instance, we have quite a few examples of "open data" that anybody can download after registering. This is different from the notion of "authorisation", where data are accessible after authentication only to users with the right privileges. By contrast, non-discriminatory registration means that ALL registered users can access the data.
Non-discriminatory registration is usually motivated by the need of data producers/providers to get feedback on who is using their data, but some users may not be willing to provide even minimal personal information (e.g., name and email address) because of privacy concerns and/or other reasons. So, for them, it might be important to be able to exclude this kind of data from search results as well.
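The distinction above (open, non-discriminatory registration, authorisation) is what a search filter would need in order to honour a user's privacy preference. The following is a minimal sketch, assuming a hypothetical three-value `AccessMode` vocabulary. No such vocabulary exists in DCAT-AP. `searchable` and the sample catalogue are illustrative only.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class AccessMode(Enum):
    # Hypothetical vocabulary following the distinction drawn above.
    OPEN = "open"                    # anybody, no registration
    REGISTRATION = "registration"    # non-discriminatory: every registered user gets access
    AUTHORISATION = "authorisation"  # only authenticated users with the right privileges


def searchable(datasets: Iterable[Tuple[str, AccessMode]],
               accept_registration: bool) -> List[str]:
    """Return titles a user can obtain without special privileges."""
    allowed = {AccessMode.OPEN}
    if accept_registration:
        allowed.add(AccessMode.REGISTRATION)
    return [title for title, mode in datasets if mode in allowed]


catalogue = [
    ("Air quality", AccessMode.OPEN),
    ("Cadastre extract", AccessMode.REGISTRATION),
    ("Health records", AccessMode.AUTHORISATION),
]
print(searchable(catalogue, accept_registration=False))  # ['Air quality']
print(searchable(catalogue, accept_registration=True))   # ['Air quality', 'Cadastre extract']
```

A user unwilling to register simply passes `accept_registration=False`. Data behind authorisation never appear, because no registration alone grants access to them.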
Andrea's contextualization is indeed relevant. I have encountered several public bodies that would like to offer the data to anybody but, at the same time, would also like to provide some SLA through registration before granting access to the data.
In particular, for larger amounts of data and for offerings via an API (to be able to control erroneous programs), some form of registration is used.
To the question of how far we want to go in DCAT-AP: enabling machine-to-machine interaction in which an agent, based on a query on the dataset catalogue, automatically obtains access to all the data would imply standardising (or selecting among existing models) file, data and API access on the web.
However, I agree that, for a user of a dataset catalogue, it is valuable to know whether access is free and immediate (today's default in open data portals), free after registration, or a paid service.
Whereas the first option requires no further information for the user, the second and third require a pointer to the page where the process starts and the conditions are explained. From the end-user perspective, I believe this information is more valuable than introducing a property whose range is a controlled vocabulary.
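The three cases above, each paired with a pointer to a conditions page where needed, could be captured as follows. This is a minimal sketch under stated assumptions: `AccessOffer`, `AccessInfo` and `conditions_page` are hypothetical names, and the URLs are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessOffer(Enum):
    # Hypothetical labels for the three cases discussed above.
    FREE_IMMEDIATE = "free-immediate"
    FREE_AFTER_REGISTRATION = "free-after-registration"
    PAID = "paid"


@dataclass
class AccessInfo:
    offer: AccessOffer
    # Page where the registration/purchase process starts and conditions are explained.
    conditions_page: Optional[str] = None

    def __post_init__(self) -> None:
        # Only free and immediate access can do without a pointer.
        needs_page = self.offer is not AccessOffer.FREE_IMMEDIATE
        if needs_page and not self.conditions_page:
            raise ValueError(f"{self.offer.value} access needs a conditions_page URL")
```

Here the pointer, not the label, carries most of the information for the end user. The enum only tells software which of the three cases applies.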
We have also seen the potential need for an 'access level' property.
But maybe this raises a more fundamental question: is DCAT-AP designed only for 'Open Data', in the sense that it is envisioned to be used only with datasets associated with an open license? If so, there's no need for access levels, as all data is freely accessible.
However, if DCAT-AP can be used for data portals with closed and/or restricted data, then it is important to be able to classify the access level in a machine-readable way, e.g. confidentiality, data protection, research, etc.
If indeed we should consider such a property, two questions arise:
1. is there a property in a well-known namespace that could be used for this?
2. Is there an existing SKOS concept scheme that can provide the values for it?
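On question 1, Dublin Core already defines `dct:accessRights` (`http://purl.org/dc/terms/accessRights`, "information about who can access the resource or an indication of its security status", range `dct:RightsStatement`). The following is a minimal sketch that emits one N-Triples statement linking a dataset to an access-level concept through that property. The concept URIs under `http://example.org/access-level/` are placeholders for whatever SKOS scheme question 2 settles on. `access_triple` is a hypothetical helper.

```python
DCT_ACCESS_RIGHTS = "http://purl.org/dc/terms/accessRights"
# Placeholder concept scheme; a real catalogue would use a published SKOS scheme.
ACCESS_LEVEL_SCHEME = "http://example.org/access-level/"
ACCESS_LEVELS = {"public", "restricted", "non-public"}


def access_triple(dataset_iri: str, level: str) -> str:
    """Return one N-Triples line linking a dataset to an access-level concept."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level}")
    return f"<{dataset_iri}> <{DCT_ACCESS_RIGHTS}> <{ACCESS_LEVEL_SCHEME}{level}> ."


print(access_triple("http://example.org/dataset/1", "restricted"))
```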
I definitely agree with the analysis of Andrea and Bert. I would also like to hear the view of the pan-European data portal team, especially if access to "not open" data falls within its scope.
+1
In Belgium, we have had a few interesting discussions on various interpretations of "open", on the use of DCAT-AP for exchanging metadata about not-entirely-open data, and on the need to distinguish between them when presenting these datasets to users.
In fact, I think this makes even more sense at the Distribution level, given that quite frequently a Dataset combines a freely accessible distribution with more restricted ones that require, for example, prior registration (e.g. an API distribution).
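If the access level is recorded per Distribution, a portal can still derive a dataset-level view for filtering. The following is a minimal sketch, assuming hypothetical `Distribution`/`Dataset` classes and string labels. It also assumes the rule that a dataset counts as openly available when at least one of its distributions is public.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    access_url: str
    access_level: str  # hypothetical labels: "public", "restricted" or "non-public"


@dataclass
class Dataset:
    title: str
    distributions: List[Distribution] = field(default_factory=list)

    def openly_available(self) -> bool:
        # Open if at least one distribution is public, even when others
        # (e.g. an API) require registration.
        return any(d.access_level == "public" for d in self.distributions)

    def public_distributions(self) -> List[Distribution]:
        return [d for d in self.distributions if d.access_level == "public"]


traffic = Dataset("Traffic counts", [
    Distribution("https://example.org/traffic.csv", "public"),
    Distribution("https://example.org/api/traffic", "restricted"),
])
print(traffic.openly_available())  # True
```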