StatDCAT-AP respects the conformance requirements defined for DCAT-AP version 1.1 (https://joinup.ec.europa.eu/release/dcat-ap-v11), which means that it will have, at least, the same mandatory classes and mandatory properties as DCAT-AP 1.1. StatDCAT-AP may extend DCAT-AP by specifying additional properties, as long as they are reused from existing RDF vocabularies.
During discussion with stakeholders, the following additional property was proposed:
‘Number of data series’ as a property for Dataset
The number of data series provides information on how values in the Dataset are grouped; for example, a Dataset could contain data for three regions with three values for each region. In this example, the number of series is three while the number of observations is nine.
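The distinction between series and observations in the example above can be illustrated with a small sketch. The region names, periods and values are invented for illustration, and the sketch assumes that one series corresponds to one region; the proposal itself does not prescribe how series are identified.

```python
# Hypothetical observations as (region, period, value) tuples.
# Region names, periods and values are invented for illustration only.
observations = [
    ("Region A", "2013", 10.0),
    ("Region A", "2014", 11.0),
    ("Region A", "2015", 12.0),
    ("Region B", "2013", 20.0),
    ("Region B", "2014", 21.0),
    ("Region B", "2015", 22.0),
    ("Region C", "2013", 30.0),
    ("Region C", "2014", 31.0),
    ("Region C", "2015", 32.0),
]

# Assumption: observations are grouped into one series per region.
series = {region for region, _period, _value in observations}

num_series = len(series)              # distinct regions
num_observations = len(observations)  # individual values

print(num_series, num_observations)  # prints "3 9"
```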
This property is intended to provide an indication of the size of a Dataset. DCAT-AP already allows the size of a data file in bytes to be indicated through the property byteSize (https://www.w3.org/TR/vocab-dcat/#Property:distribution_size) for Distribution, but that gives only the physical size of the dataset, which is not the only aspect of interest for statistical Datasets.
The expected value for this property is a string in an agreed format, e.g. “5 series”.
Participants in this activity are asked to respond to the following questions:
- Is the information for this property available in existing statistical systems and applications?
- How will exposing this information to general data portals enhance the discoverability of statistical datasets?
- Do you know of any property in existing RDF vocabularies that could hold this information?
Please note that there is also a proposal (https://joinup.ec.europa.eu/discussion/number-observations) to provide information about the total number of observations in the Dataset.
Comments
An option could be to use the property dct:extent which is defined as "The size or duration of the resource". The range of this property is dct:SizeOrDuration, for which the definition gives examples "a number of pages, a specification of length, width, and breadth, or a period in hours, minutes, and seconds".
The definition implies that the value is a resource, so it would be expressed as a blank node with an rdfs:label carrying the text.
This text could be normalised as suggested above: "5 series".
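As a rough sketch of what this option might look like, the following helper writes the dct:extent pattern as Turtle: a blank node typed as dct:SizeOrDuration with an rdfs:label holding the normalised text. The function name and the dataset IRI are hypothetical; only the dct: and rdfs: terms come from the existing vocabularies.

```python
def extent_as_turtle(dataset_iri: str, count: int) -> str:
    """Render a dct:extent statement whose value is a labelled blank node."""
    # Normalised text in the suggested format, e.g. "5 series".
    label = f"{count} series"
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
        "\n"
        f"<{dataset_iri}> dct:extent [\n"
        "    a dct:SizeOrDuration ;\n"
        f'    rdfs:label "{label}"\n'
        "] .\n"
    )


# Hypothetical dataset IRI, used only for illustration.
print(extent_as_turtle("http://example.org/dataset/1", 5))
```

A consumer would have to parse the label text to recover the number, which is one practical drawback of a string value.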
The WG decided in the meeting of 13 May 2016 to include a new property for the number of data series. We are inviting the community to make suggestions for such a property, and in particular to indicate whether there is a property in an existing RDF namespace that could be used.
The new property would likely be modelled in RDF as a subproperty of dct:extent. The property would be optional and repeatable for cases where a dataset contains more than one data series. The range of the property is proposed as rdfs:Literal (xsd:integer).
Proposed resolution: Extension of DCAT-AP with new property stat:numSeries
The working group has decided to meet the requirement to express the number of data series in a dataset by re-using a property from an existing namespace, if possible, or creating a new property in the StatDCAT-AP namespace. As we have been unable to identify an existing property, a new property will be created with a range of rdfs:Literal with datatype xsd:integer.
In the application profile, this property will be used on Dataset with cardinality 0..n.
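A minimal sketch of how the resolved property could appear in data, written as N-Triples: each value is a literal with datatype xsd:integer, and the 0..n cardinality means a Dataset may carry no value, one value, or several. The namespace IRI below is a placeholder, not the actual StatDCAT-AP namespace, and the function name is hypothetical.

```python
# Placeholder namespace: an assumption for illustration, NOT the real
# StatDCAT-AP namespace.
STAT_NS = "http://example.org/statdcat-ap#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def num_series_triples(dataset_iri: str, counts: list) -> list:
    """Return one N-Triples line per numSeries value (cardinality 0..n)."""
    triples = []
    for count in counts:
        # Only plain integers are accepted, matching the xsd:integer range.
        if isinstance(count, bool) or not isinstance(count, int):
            raise TypeError(f"numSeries value must be an integer, got {count!r}")
        triples.append(
            f'<{dataset_iri}> <{STAT_NS}numSeries> "{count}"^^<{XSD_INTEGER}> .'
        )
    return triples


# Hypothetical dataset IRI; the property is repeated for two values.
for line in num_series_triples("http://example.org/dataset/1", [3, 12]):
    print(line)
```

Passing an empty list yields no triples, which corresponds to the property being optional.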
A new property numSeries is included in Draft 4, section 6.2.4.
An issue has been raised concerning the definition of this property, proposing:
The Cartesian Product of the number of modalities of each dimension, excluding what QB calls the measure dimension (that denotes which particular measure is being conveyed by the observation)
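Read literally, the proposed definition multiplies the number of modalities of every dimension except the measure dimension. The sketch below shows that calculation; the dimension names, their modalities and the function name are hypothetical and serve only to illustrate the arithmetic.

```python
from math import prod


def num_series_from_dimensions(modalities: dict, measure_dimension: str) -> int:
    """Multiply the modality counts of all dimensions except the measure dimension."""
    return prod(
        len(values)
        for name, values in modalities.items()
        if name != measure_dimension
    )


# Hypothetical dimensions and modalities, invented for illustration.
dimensions = {
    "region": ["Region A", "Region B", "Region C"],
    "sex": ["F", "M"],
    "measureType": ["population", "density"],  # the measure dimension
}

result = num_series_from_dimensions(dimensions, measure_dimension="measureType")
print(result)  # prints 6, i.e. 3 regions x 2 sexes
```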
See also issue https://joinup.ec.europa.eu/node/156471/.