Service-based data access

Anonymous · 2016-03-03T10:08:31+0100

How should the different information elements needed to search for and bind to a service be described? In this case, it would be useful to provide the minimal information needed for connecting to a service.

Anonymous (not verified) Thu, 03/03/2016 - 10:44

The following comment was submitted by Uwe Voges:

"For the Earth Observation/geospatial community, service-based online data access is the most important use case for data access/delivery.

As this is currently underspecified in DCAT-AP (only catalogues are specifically considered as services), I propose to add an issue "services based data access".

I guess implementation guidelines are needed here."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000381.html

Anonymous (not verified) Thu, 03/03/2016 - 10:46

The following comment was submitted by Marco Combetto:

"I agree with Uwe's point, even though I am involved in a completely different application scenario.

We are about to publish (on dati.trentino.it) a lot of PSI data through service-based online data access (REST/JSON endpoints), and we are now facing the following question: "How could I describe that it is service-based online data access and not a file?" (FileType and MediaType are not enough).

We are thinking of adding an "As a service" metadata field to each dataset, in order to be able to extract in one simple query "all the service-based online data services" published in the catalogue, but it would be better to have a standard there."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000382.html

Anonymous (not verified) Thu, 03/03/2016 - 10:49

The following comment was submitted by Makx Dekkers:

"Dear Uwe,

We can create an issue for "services based data access", although it was not an issue that came up in the earlier round, so it is not on our priority list at the moment.

Could you maybe propose a guideline for this? Maybe you can also look at the discussion during the DCAT-AP revision last year: see https://joinup.ec.europa.eu/discussion/details-distribution, and see if there is something useful there."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000383.html

Anonymous (not verified) Thu, 03/03/2016 - 11:32

The following comment was submitted by Uwe Voges:

"There are different information elements which need to be described for being able to search for and dynamically bind to a service:

Per Service:

ServiceType: e.g. OGC DSEO
ServiceType version: e.g. 1.0
Specification-Link: <link to documentation of the service-specification>, e.g. https://portal.opengeospatial.org/files/?artifact_id=55210
ServiceDescription-URL: Capabilities, WSDL, OS-Description,...
Binding/DCP to access service-description: e.g. WebService: GET/KVP, GET/REST, POST/XML, POST/XML/SOAP,
ConnectPoint (URL)
SampleRequest: ....

Per Operation:

Name: e.g. GetProduct
Description: e.g. "... allows a client to request a specified Product providing its unique URI..."
URL(s): possibly per operation
Binding/DCP to access service-description: e.g. WebService: GET/KVP, GET/REST, POST/XML, POST/XML/SOAP,
ConnectPoint (URL)
SampleRequest: ...."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000384.html
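
The elements listed in the comment above could be captured in a small data structure. The following is only a sketch of one possible shape, not a proposed vocabulary: the class and field names are invented for illustration, and the endpoint URL is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    """One operation of a service (the 'Per Operation' elements)."""
    name: str                                       # e.g. "GetProduct"
    description: str = ""
    urls: List[str] = field(default_factory=list)   # possibly one per operation
    binding: Optional[str] = None                   # e.g. "GET/KVP", "POST/XML/SOAP"
    connect_point: Optional[str] = None
    sample_request: Optional[str] = None


@dataclass
class ServiceDescription:
    """Service-level elements (the 'Per Service' elements)."""
    service_type: str                               # e.g. "OGC DSEO"
    service_type_version: str                       # e.g. "1.0"
    specification_link: Optional[str] = None
    description_url: Optional[str] = None           # Capabilities, WSDL, OS-Description, ...
    binding: Optional[str] = None
    connect_point: Optional[str] = None
    sample_request: Optional[str] = None
    operations: List[Operation] = field(default_factory=list)

    def operation_names(self) -> List[str]:
        """Return the names of all described operations."""
        return [op.name for op in self.operations]


# Hypothetical example values; the endpoint is a placeholder, not a real service.
service = ServiceDescription(
    service_type="OGC DSEO",
    service_type_version="1.0",
    binding="GET/KVP",
    connect_point="http://example.org/dseo",
    operations=[Operation(name="GetProduct", binding="GET/KVP")],
)
```

Keeping the operations as a nested list mirrors the "Per Service" / "Per Operation" split of the comment; none of these fields corresponds to an existing DCAT property.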

Anonymous (not verified) Thu, 03/03/2016 - 11:43

The following comment was submitted by Makx Dekkers:

"Uwe,

It seems to me that you’re looking for a full-fledged approach to describing services. The challenge that I see is that DCAT was not designed to provide such a complete approach and as far as I can see, none of the properties in your list are available in DCAT at the moment.

It is my feeling that it will be hard to come up with a ‘standard’ way of doing this in the few weeks we have left to publish the first set of guidelines.

Maybe we can keep this on the list for future work?"

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000385.html

Anonymous (not verified) Thu, 03/03/2016 - 11:44

The following comment was submitted by Uwe Voges:

"Makx,

It makes sense.

Maybe we could provide minimal information for a service based distribution.

We do this in a proprietary manner currently within the European data portal.

We provide the title, the URL and a format for the service.

Example:

<dcat:distribution>
  <dcat:Distribution>
    <dct:title xml:lang="en">view</dct:title>
    <dcat:accessURL rdf:resource="http://geo.osnabrueck.de/arcgis/services/wms_statistischeEinheiten/MapS…?"/>
    <dct:format rdf:parseType="Resource">
      <rdfs:label>WMS</rdfs:label>
    </dct:format>
  </dcat:Distribution>
</dcat:distribution>"

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000386.html
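
As an illustration of how a client might read such a minimal description, here is a sketch that extracts the title, access URL and format label from an RDF/XML fragment shaped like the example above. It uses only the Python standard library rather than a full RDF parser, so it only works for this particular serialization. The dataset URI and the access URL in the sample are hypothetical placeholders, and the function name is invented for this sketch.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Sample document; the URLs are placeholders, not real services.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="http://example.org/dataset/1">
    <dcat:distribution>
      <dcat:Distribution>
        <dct:title xml:lang="en">view</dct:title>
        <dcat:accessURL rdf:resource="http://example.org/services/wms?"/>
        <dct:format rdf:parseType="Resource">
          <rdfs:label>WMS</rdfs:label>
        </dct:format>
      </dcat:Distribution>
    </dcat:distribution>
  </dcat:Dataset>
</rdf:RDF>
"""


def read_distributions(rdf_xml):
    """Return title, access URL and format label of each dcat:Distribution."""
    root = ET.fromstring(rdf_xml)
    resource_attr = "{%s}resource" % NS["rdf"]
    result = []
    for dist in root.iter("{%s}Distribution" % NS["dcat"]):
        title = dist.find("dct:title", NS)
        access = dist.find("dcat:accessURL", NS)
        label = dist.find("dct:format/rdfs:label", NS)
        result.append({
            "title": title.text if title is not None else None,
            "access_url": access.get(resource_attr) if access is not None else None,
            "format": label.text if label is not None else None,
        })
    return result


distributions = read_distributions(SAMPLE)
```

With only these three values, a client knows where the service is and roughly what kind it is, but not its version or binding.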

Anonymous (not verified) Thu, 03/03/2016 - 11:46

The following comment was submitted by Marten Hogeweg:

"Have you looked at how data.gov (http://data.gov) does this in the US? There was a mapping from formal geospatial metadata such as ISO 191xx (and thus INSPIRE) to DCAT. This is used to describe both files and web service APIs, whether they implement an OGC API or another, non-standard open data API.

By connecting DCAT to existing metadata management workflows, agencies have little extra work to produce the mandatory open data listing and technology providers have a clear path to supporting these workflows."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000387.html

Anonymous (not verified) Thu, 03/03/2016 - 11:47

The following comment was submitted by Marco Combetto:

"Just to clarify, in my case I didn't want to provide all the details needed to "bind" to a service-based online data access. I fully agree that more is needed to do this, and that it is out of the scope of DCAT-AP.

I was just looking for the opportunity to add an additional "standard facet" to our data catalogue to select only the datasets that are distributed as service-based online data access instead of a simpler (maybe) file distribution… (We think it could be a potential KPI, tracking the % of service-based online data access in our catalogue)."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000388.html
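
The facet and KPI described in the comment above could be computed along the following lines. This is a sketch under stated assumptions: the catalogue is represented as plain dictionaries, and the set of format labels treated as "service-based" is invented for illustration, since no such standard list exists in DCAT-AP.

```python
# Assumed set of format labels that mark a distribution as service-based.
SERVICE_FORMATS = {"WMS", "WFS", "WCS", "CSW", "REST", "SPARQL"}


def is_service_based(distribution):
    """True if the distribution's format label is in the assumed service set."""
    return str(distribution.get("format", "")).upper() in SERVICE_FORMATS


def service_share(datasets):
    """Percentage of datasets with at least one service-based distribution."""
    if not datasets:
        return 0.0
    hits = sum(
        1 for ds in datasets
        if any(is_service_based(d) for d in ds.get("distributions", []))
    )
    return 100.0 * hits / len(datasets)


# Hypothetical catalogue content.
catalogue = [
    {"title": "Bus stops", "distributions": [{"format": "CSV"}, {"format": "REST"}]},
    {"title": "Land use", "distributions": [{"format": "WMS"}]},
    {"title": "Budget", "distributions": [{"format": "CSV"}]},
    {"title": "Air quality", "distributions": [{"format": "JSON"}]},
]

share = service_share(catalogue)
```

The sketch depends entirely on the format labels being consistent across the catalogue, which is the gap a standard facet would close.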

Anonymous (not verified) Thu, 03/03/2016 - 11:48

The following comment was submitted by Makx Dekkers:

"Uwe,

Your approach of indicating the type of service using dct:format is one way of doing it. The discussion at https://joinup.ec.europa.eu/asset/dcat_application_profile/issue/detail… led to a consensus to use dct:conformsTo to point to a schema (JSON/XML/CSV), an ontology (linked data) or format documentation."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-February/000389.html

Anonymous (not verified) Thu, 03/03/2016 - 11:51

The following comment was submitted by Uwe Voges:

"Marten,

You mean these links, right?

https://project-open-data.cio.gov/schema/

https://project-open-data.cio.gov/metadata-resources/#common_core_required_fields_equivalents

Looks similar to what we're doing in the European Data Portal. For a full-fledged service approach the metadata defined may not be sufficient (see e.g. ISO 19119).

I guess it would make things easier if each common service/serviceTypeVersion had a unique IANA MediaType...

In practice (at least in the geo domain), one of the most interesting use cases is to find a service which can be bound directly by an appropriate client, isn't it?"

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000390.html

Anonymous (not verified) Thu, 03/03/2016 - 11:55

The following comment was submitted by Andrea Perego:

"Hi Uwe,

On 01/03/2016 7:54, Uwe Voges wrote:

> Marten,

> You mean these links, right?

> https://project-open-data.cio.gov/schema/

> https://project-open-data.cio.gov/metadata-resources/#common_core_required_fields_equivalents

> Looks similar to what we're doing in the European Data Portal. For a full-fledged service approach the metadata defined may not be sufficient (see e.g. ISO 19119).

> I guess it would make things easier if each common service/serviceTypeVersion had a unique IANA MediaType…

You may consider using the URI code list for protocols maintained here:

https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv

For more details, see the relevant mail from Paul van Genuchten (cc'ed):

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile-geo/2015-September/000203.html

> In practice (at least in the Geo-Domain) one of the most interesting

> use-cases is to find a service which can directly be bound with an

> appropriate client, not?

Indeed. But I think the problem is the level of detail needed to address your specific use case.

Services were one of the hot topics discussed during the development of GeoDCAT-AP, and actually one of the questions was "do we really need to represent services?", i.e., users are interested in data, not services, and, outside the geo domain, nobody understands what we mean by discovery, view, download, etc. services.

Actually, GeoDCAT-AP also covers services, represented as metadata records in a catalogue, but not with the level of detail you describe.

Anyway, this is a topic that can be further elaborated in the framework of the GeoDCAT-AP WG, possibly focussing on modelling the output of a GetCapabilities request. The outcomes can be the basis for a future revision to the GeoDCAT-AP specification."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000391.html

Anonymous (not verified) Thu, 03/03/2016 - 11:58

The following comment was submitted by Uwe Voges:

"Andrea,

Big (Open) Data is on everyone's lips. I guess that in such a situation downloading data will fade from the spotlight, and cloud-based access/processing services ("Open Services", not just Open Data) will more and more become the means of choice (not only in the Earth Observation domain, but also e.g. for scientific data, climate data, medical data...)"

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000392.html

Anonymous (not verified) Thu, 03/03/2016 - 12:00

The following comment was submitted by Maurino Andrea:

"Dear all, have you considered the Core Public Service Vocabulary Application Profile: https://joinup.ec.europa.eu/solution/core-public-service-vocabulary-application-profile?

It is related to public services in general (e.g. residence permit or similar) but it can be also suitable for describing specific electronic services of public administration."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000393.html

Anonymous (not verified) Thu, 03/03/2016 - 12:02

The following comment was submitted by Marten Hogeweg:

"+1 to #12. The geospatial domain has been working on open service specifications for over 20 years. These specs have been widely adopted by industry and users and form the basis for example of the INSPIRE services technical guidance."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000394.html

Anonymous (not verified) Thu, 03/03/2016 - 12:05

The following comment was submitted by Nikolaos Loutas:

"> I guess it would make things easier if each common service/serviceTypeVersion had a unique IANA MediaType?

+1 Uwe (#12). I think that at this point this would be a good starting point and probably the right level of detail to be included in the DCAT-AP. If we want to look into how digital (or Web) services are described, then this is imho a different discussion, and we should look at how this was done in semantic services and linked services frameworks in the past, where the description of a service was studied."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000396.html

Anonymous (not verified) Thu, 03/03/2016 - 12:07

The following comment was submitted by Vassilios Peristeras:

"+1 to Nikos (#15).

Describing services has been a field in itself for a couple of decades now. I am not sure that mixing the description of services with the data they use is the right way to go.

The CPSV is a spec to describe public services, not technical/web services. For the latter, you can take a look at WSDL, WSML, OWL-S, WSMO/WSMO-Lite, SA-REST and many others… See also this differentiation between services/processes and data in the upper ontologies literature (e.g. SUMO, DOLCE, Cyc etc.)."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000397.html

Anonymous (not verified) Thu, 03/03/2016 - 12:09

The following comment was submitted by Marten Hogeweg:

"Assigning a media type to a web service may not be that straightforward. Does it describe the response format (application/json, image/png ...)? Or is it a classification of service types, independent of the format of a particular request? Wouldn't an HTTP URI be preferred from a linked data perspective?

OGC gets close to that with their XSD structuring:

http://schemas.opengis.net/wms/1.3.0/

http://schemas.opengis.net/csw/2.0.2/

http://schemas.opengis.net/context/1.1.0/ etc.

They could put some additional description there (RDF or such) or perhaps the XSD is sufficient for that:

http://schemas.opengis.net/wms/1.3.0/capabilities_1_3_0.xsd etc.

These OGC services have different operations each with possibly different response formats. Hence the need to classify the services themselves."

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000398.html
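
To make the "HTTP URI instead of media type" idea in the comment above concrete, the following sketch derives a URI from a service type and version using the folder pattern of the schema links quoted there, and parses it back. Treating these schema folders as service-type identifiers is an assumption made here for illustration, not an official OGC identifier scheme, and both function names are invented.

```python
OGC_SCHEMA_PREFIX = "http://schemas.opengis.net/"


def ogc_schema_uri(service_type, version):
    """Build a schema-folder URI such as http://schemas.opengis.net/wms/1.3.0/."""
    return "{}{}/{}/".format(OGC_SCHEMA_PREFIX, service_type.lower(), version)


def parse_ogc_schema_uri(uri):
    """Split a schema-folder URI back into (service_type, version)."""
    if not uri.startswith(OGC_SCHEMA_PREFIX):
        raise ValueError("not an OGC schema URI: %r" % uri)
    parts = uri[len(OGC_SCHEMA_PREFIX):].strip("/").split("/")
    if len(parts) != 2:
        raise ValueError("expected <type>/<version>: %r" % uri)
    return parts[0], parts[1]


wms_uri = ogc_schema_uri("WMS", "1.3.0")
csw_type, csw_version = parse_ogc_schema_uri("http://schemas.opengis.net/csw/2.0.2/")
```

A single URI of this kind classifies the service itself, independently of the response formats of its individual operations.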

Anonymous (not verified) Thu, 03/03/2016 - 12:11

The following comment was submitted by Uwe Voges:

"Anyhow, the question is: what is the absolute minimum (!) of information needed to be able to connect to a service (ideally automatically)?

I guess it is:

The kind of service (type/name + (optional) version, e.g. WMS, 1.3.0) as mediaType or URI.
A network-address (e.g. URL) to a (machine readable) service-description (+ protocol? (E.g. HTTP/POST or HTTP/GET))
??"

http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/2016-March/000399.html
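
As a rough illustration of how far the minimal information from the comment above would carry a client, the sketch below builds the request for the machine-readable service description of an OGC service from just the service type, an optional version and a network address. It assumes the HTTP GET/KVP binding with the SERVICE, REQUEST and VERSION parameters used by OGC web services such as WMS; other kinds of service would need a different rule. The endpoint URLs and the function name are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def capabilities_url(endpoint, service_type, version=None):
    """Build a GetCapabilities URL (GET/KVP binding) for an OGC-style service."""
    parts = urlsplit(endpoint)
    reserved = {"service", "request", "version"}
    # Keep any existing query parameters except the ones set below.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in reserved
    ]
    params.append(("SERVICE", service_type))
    params.append(("REQUEST", "GetCapabilities"))
    if version:
        params.append(("VERSION", version))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


# Hypothetical endpoints.
wms_request = capabilities_url("http://example.org/wms?", "WMS", "1.3.0")
wfs_request = capabilities_url("http://example.org/ows?lang=en", "WFS")
```

Everything beyond this point (layers, operations, supported formats) would then come from the service description itself rather than from the catalogue record.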

Anonymous (not verified) Fri, 18/03/2016 - 16:10

I think this was discussed or at least mentioned during the previous round.

I made almost the same remark (3rd paragraph of [1]) in the past, and I recall clarifying it further, but I cannot find where. I agree with Nikos that describing services has been a field of its own, but it would make sense to include in the guidelines how more explicit descriptions may be aligned with DCAT descriptions to provide more details in cases where they are required.

We have done the same in the case of the RML vocabulary, to describe how information about retrieving data from different access interfaces might be aligned with the descriptions of the rules that specify how RDF is generated from the data retrieved from such an access interface. You may find out more at [2], [3] and [4].

[1] http://joinup.ec.europa.eu/mailman/archives/dcat_application_profile/20…

[2] http://rml.io/RMLdataRetrieval.html 

[3] http://www.slideshare.net/andimou/machineinterpretable-dataset-and-serv…

[4] http://dl.acm.org/citation.cfm?id=2814873

Anonymous (not verified) Fri, 08/04/2016 - 12:37

Dear Anastasia,

First, let me try to roughly summarize what I found in the referenced documents (related to this point):

In general what you provided is a mapping mechanism from hierarchical source data to an RDF-representation based on templates. The templates can include (reference) information how to access the source data (aka "LogicalSource").

This logical source has some technical information ("source") and "model specific information" ("reference"/"iterator"/"query") about how to access the source, e.g. localFile + XML based access described by XPath-expressions.

For the technical information ("source") you provided the following use cases:

- file based access

- database connectivity (D2RQ)

- HTTP/GET web source(s) (Web API/service)

- RDF source(s): VoID (Endpoint), SPARQL-SD

The technical information ("source") can include additional information, e.g. driver, username, password

As far as I understand, you propose to use this approach also for addressing (access to) the datasets described by the DCAT-AP metadata, right?

For the use cases named above this could be a good starting point. But these use cases cover only part of what we have to deal with, e.g. in the geospatial and Earth Observation world. We have to deal with very different kinds of data and ways of accessing it, e.g. big raster datasets (such as GeoTIFF data) stored behind facades that provide complex interfaces through which the data can be accessed (this can be compared to database access). It may also be the case that the data must be ordered a priori (e.g. because it is not in an online archive but in a storage robot) via a more complex order model (e.g. OGC EO Order 1.0). Further, the data/services can be accessed via different protocol bindings (e.g. HTTP/POST/XML/SOAP), and the response data may be compressed in some way (e.g. zipped). Finally, the information about the data access may be some kind of indirection (e.g. just a link to a service description) plus some additional information from which the client can generate appropriate service calls.

But I guess it should be possible to generalize those access cases as well, to get an RDF description able to cover most of the use cases (80:20 rule). If that is not possible, the RDF information is a nice toy; the main benefit only comes into play when machine clients can use the information to get access to the data/service...

DCAT application profile implementation guidelines
