Description
From: http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2015-February/000123.html
It is often the case that access to data requires user registration and/or authorisation.
Including this information in metadata would be beneficial for a number of reasons, including the following ones:
- Grouping data based on access restrictions
- Informing users about access restrictions
- Allowing users to filter data based on access restrictions
From: http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2015-February/000120.html
Based on feedback from public sector organizations we have thrown in these extra properties for the dataset class:
- Access Level - to distinguish open data from the rest by dividing datasets into public, restricted and non-public.
- Access Rights - to express why the dataset is restricted/non-public. Applies only for restricted/non-public datasets.
Proposed solution
Add a new property to Dataset to indicate whether the Dataset is public, restricted or non-public.
- Recommend the specification of access restrictions, possibly by using dct:accessRights
- Identify / develop a code list for access restrictions
A possible list of access restrictions:
- no limitations
- registration required (non-discriminatory)
- authorisation required (“closed data”, that only authorised users can access)
- unknown
Component: Code
Category: improvement
Comments
This property is in use in Project open data metadata schema, but with a slightly different twist. From their usage note:
"This field refers to the degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is public even if there are no files online. A restricted public dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A non-public dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency"
I agree with Øystein in principle. data.gov.uk has not just public data, but plenty of 'unpublished' ones, with some text explaining why, or whether there are plans/timeline to review it or release it. So either we add these two fields, or people will just have to understand that these datasets have no distributions or licence. (As he notes, the two fields needed are subtly different to the USA ones.)
It is not entirely clear to me what the requirement is. I see two different requirements from the description of the issue and the comments:
1. a need for one property expressing a type of access licence for the dataset and one property to explain why a dataset is not open; this was the original request submitted by Øystein.
This one appears to be related to the discussion about providing licence information on the level of the dataset and to the issue https://joinup.ec.europa.eu/discussion/pr8-move-dctrights-distribution-dataset. We seem to converge on a conclusion that this is not in line with the model of DCAT.
2. a need for a property to tell users that the dataset does not have distributions (yet), which seems to be the need that David expresses.
If this is merely used to give a human user information, this could easily be done by adding a dct:description to the Dataset.
Makx,
I think that the parallel discussion concerning issue PR3 can help clarify the requirements.
Also, I think it would be important to make a distinction between access and use conditions. Licences describe only how a resource can be used, not who can access it, and under which conditions. In theory, you can have "closed" data (i.e., data accessible only to authorised users) released according to an "open" licence (e.g., CC BY).
As far as I can understand it, the issue under discussion is only about access, and the requirement is about associating datasets with information concerning their access levels / restrictions, that can be used by data consumers (humans and software agents) to filter out, e.g., those that they won't be able to access - I elaborated this in a comment to issue PR3.
The proposed solution can address this requirement.
About expressing why a dataset does not have a public distribution, this can be addressed by using dct:description, as you propose, or a different, and possibly more specific, property (vann:usageNote?).
Regarding point 2 of #3, I think a simple SPARQL query counting the dcat:Distribution instances of a dcat:Dataset would be sufficient.
A returned value equal to zero means no dcat:Distributions for this dcat:Dataset (yet).
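The counting approach described above could be sketched as follows. This is a minimal illustration, not a tested deployment: the `ex:` identifiers and the in-memory triple list are made up for the example, and the SPARQL text is included only for reference (the snippet itself performs the same count in plain Python rather than running the query against an endpoint).

```python
# SPARQL 1.1 form of the count, shown for reference only; not executed here.
# OPTIONAL keeps datasets with no distributions, which then get a count of 0.
SPARQL_QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset (COUNT(?distribution) AS ?count)
WHERE {
  ?dataset a dcat:Dataset .
  OPTIONAL { ?dataset dcat:distribution ?distribution }
}
GROUP BY ?dataset
"""

RDF_TYPE = "rdf:type"
DCAT_DATASET = "dcat:Dataset"
DCAT_DISTRIBUTION = "dcat:distribution"

# Hypothetical catalogue as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:dataset-a", RDF_TYPE, DCAT_DATASET),
    ("ex:dataset-a", DCAT_DISTRIBUTION, "ex:dataset-a-csv"),
    ("ex:dataset-a", DCAT_DISTRIBUTION, "ex:dataset-a-json"),
    ("ex:dataset-b", RDF_TYPE, DCAT_DATASET),
]


def count_distributions(triples):
    """Return {dataset: number of dcat:distribution links}, zeros included."""
    counts = {s: 0 for s, p, o in triples if p == RDF_TYPE and o == DCAT_DATASET}
    for s, p, o in triples:
        if p == DCAT_DISTRIBUTION and s in counts:
            counts[s] += 1
    return counts


def datasets_without_distributions(triples):
    """Return the datasets whose distribution count is zero, sorted by id."""
    return sorted(d for d, n in count_distributions(triples).items() if n == 0)
```

Note that a zero count only says that no distribution is listed; it does not say why, which is the gap the access-related properties are meant to fill.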
The Working Group decided in its meeting of 10 June 2015 to add the optional property dct:accessRights to Dataset with reference to a controlled vocabulary with three members – Public, Restricted, Non-public – to be created and maintained by the Publications Office.
Has the EU Publications Office already created the vocabulary mentioned above?
There is a thread on the mailing list (start http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/20…) to discuss the semantics of the vocabulary terms.
The proposed resolutions are:
There have been a number of comments that I think we should not take into account:
The proposed definitions are:
Label: Public
Definition: Publicly accessible by everyone.
Usage note/comment: Permissible obstacles include: registration and request
for API keys, as long as anyone can request such registration and/or API
keys.
Label: Restricted
Definition: Only available under certain conditions.
Usage note/comment: This category may include: resources that require
payment, resources shared under non-disclosure agreements, resources for
which the publisher or owner has not yet decided if they can be publicly
released.
Label: Non-public
Definition: Not publicly accessible for privacy, security or other reasons.
Usage note/comment: This category may include resources that contain
sensitive or personal information.
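To illustrate how a data consumer might use the three proposed terms for filtering, here is a minimal sketch. It assumes the labels above are the only values, models datasets as plain dictionaries, and uses invented `ex:` identifiers; the `accessRights` key and the `filter_by_access` helper are hypothetical names, not part of DCAT-AP or of any published vocabulary.

```python
from enum import Enum


class AccessRights(Enum):
    """The three proposed vocabulary members, keyed by their labels."""
    PUBLIC = "Public"
    RESTRICTED = "Restricted"
    NON_PUBLIC = "Non-public"


# Hypothetical dataset records; the property is optional, so one omits it.
DATASETS = [
    {"id": "ex:air-quality", "accessRights": AccessRights.PUBLIC},
    {"id": "ex:patient-records", "accessRights": AccessRights.NON_PUBLIC},
    {"id": "ex:licensed-maps", "accessRights": AccessRights.RESTRICTED},
    {"id": "ex:legacy-survey"},
]


def filter_by_access(datasets, allowed):
    """Return ids of datasets whose access level is in `allowed`.

    Datasets without the property are excluded, since their access level
    is unknown.
    """
    return [d["id"] for d in datasets if d.get("accessRights") in allowed]
```

For example, `filter_by_access(DATASETS, {AccessRights.PUBLIC})` keeps only the dataset marked Public. Whether datasets lacking the property should be excluded or treated as a separate "unknown" group is a design choice this sketch leaves open.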