Provide your use case and user scenario

Portal Admin

Published on: 04/04/2013 Discussion Archived

During its first virtual meeting, the Working Group was invited to challenge, and provide further input on, the basic use case and user scenarios in the draft specification.

What are your requirements for cross-lingual search?
Is anything missing? Are any properties missing (e.g. information on the pricing of datasets, interoperability of datasets) that would help users search for datasets more effectively?
Can we exclude some less relevant metadata?
What is in it for the data portals? The advantage of exchanging description metadata between data portals using the DCAT Application Profile should be made clearer.

Component

Documentation

Comments

Bart HANSSENS Sat, 06/04/2013 - 21:41

Cross-lingual

Belgium has three official languages (NL, FR and DE), and data owners often want to add English as well. Metadata (categories, regions, ...) must therefore be searchable in several languages. For a given dataset it is possible that, say:

a short description is available in 4 languages
link(s) to webpages with more information are available in 3 languages
the dataset itself is only available in 2 languages

It is often helpful if a developer searching for a dataset gets all this information in a single search request,

and/or if someone who is looking for a dataset in language X can find out that the dataset may (only) exist in language Y.
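A minimal sketch of the two requirements above, assuming a simple in-memory catalogue. The record type, its field names and the example.org URLs are hypothetical illustrations (the fields only loosely mirror dct:description, dcat:landingPage and dct:language), not part of DCAT or of any portal's API. A query in any language returns the whole record, and the user can then see in which languages the data itself exists.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue record; field names loosely mirror DCAT properties."""
    identifier: str
    descriptions: dict   # language code -> short description (cf. dct:description)
    landing_pages: dict  # language code -> web page URL (cf. dcat:landingPage)
    data_languages: set = field(default_factory=set)  # languages of the data itself (cf. dct:language)


def search(records, query):
    """Match the query against the descriptions in every language and return whole
    records, so one request yields descriptions, landing pages and data languages."""
    q = query.lower()
    return [r for r in records if any(q in text.lower() for text in r.descriptions.values())]


def data_available_in(record, language):
    """Tell a user looking for `language` whether the data exists in it,
    and in which languages it does exist."""
    return language in record.data_languages, sorted(record.data_languages)


# Hypothetical record modelled on the example above: descriptions in 4 languages,
# landing pages in 3 languages, data in 2 languages.
catalogue = [
    DatasetRecord(
        identifier="medicines-human-use",
        descriptions={
            "nl": "Databank van vergunde geneesmiddelen voor menselijk gebruik",
            "fr": "Base de données des médicaments autorisés à usage humain",
            "de": "Datenbank der zugelassenen Humanarzneimittel",
            "en": "Database of authorised medicines for human use",
        },
        landing_pages={
            "nl": "http://example.org/nl/geneesmiddelen",
            "fr": "http://example.org/fr/medicaments",
            "en": "http://example.org/en/medicines",
        },
        data_languages={"nl", "fr"},
    )
]

hits = search(catalogue, "médicaments")
found, languages = data_available_in(hits[0], "en")
print(found, languages)  # False ['fr', 'nl']
```

A real portal would use a proper full-text index rather than substring matching; the point here is only that all language variants belong to one record.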

Anonymous (not verified) Mon, 08/04/2013 - 12:37

As I mentioned in the first meeting, I suggest describing the users in more detail.

Data consumers can be divided into the following categories:
- End users (or citizens) use the data portal of their choice to search through various collections of datasets from a single point of access. They want to access specific data values (e.g. the list of museums in the Andalusia region).
- Data journalists use their preferred data portal to look for interesting data for their research (e.g. the list of companies that won public procurement tenders in Belgium, and the list of companies that financed Belgian political parties).
- Developers need to find data sources to build value-added applications (e.g. tourist information and services in central Europe).

End users need a simple and effective way to find data values by exploring [FRSAD – Functional Requirements for Subject Authority Records], finding, and selecting [FRBR – Functional Requirements for Bibliographic Records] datasets coming from different EU Member States, different portals and different organisations. Developers need to know the semantics of data attributes and the quality of the exposed data (e.g. update policy, precision, completeness), rather than the data values themselves, to build their applications, while the requirements of data journalists sit at an intermediate level between those of end users and developers.

Anonymous (not verified) Tue, 16/04/2013 - 10:45

I prepared a description of another use case, involving legal information.

Use case: enable a search for datasets across various typologies

Julie works for a university and is looking for datasets on migration in the European Union. She wants to carry out a study analysing the evolution of migration flows from 1950 to 2013, compared with the variation in employment rates in different Member States. She will therefore need to look not only for statistics at national and European level but also for legislation on immigration control. Migration datasets and related legislation are made available by both national and European Union public administrations at different levels, in a distributed environment that Julie is not familiar with. The same holds for datasets on employment rates, which are usually made available by institutes of statistics as well as by public administrations and labour unions.

Without a DCAT Application Profile: there are several statistics available at national and European level on migration. Their interpretation needs to take into account the evolution of the relevant legislation in order to analyse these trends correctly. Furthermore, the way statistics are gathered may vary depending on the chosen criteria. Without DCAT this search is very cumbersome and time-consuming: Julie has to identify relevant legislative datasets on the specific matter of immigration control in the European Union and in different Member States, as well as datasets on employment rates, distributed among several actors. Even once Julie has identified such datasets, managing the data remains difficult because of the variety of user interfaces, metadata and languages.

With a DCAT Application Profile: with DCAT as a common metadata vocabulary describing datasets, and with the support of a Metadata Broker service, Julie can query this service as a single point of access to identify relevant datasets on immigration control legislation, as well as statistics on migration and on the variation of employment rates in the EU and in different Member States over specific periods of time. Starting from this information, Julie can easily access the different datasets, select the information of interest, and collect it. This can be the starting point for developing facilities that transform the data into a common language, allowing Julie to mash the data up and visualize the variations of employment rates in different geographical EU regions, and to compare these data with the legislation on immigration control in force in each specific country and at EU level.
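To make the "single point of access" idea concrete, here is a minimal sketch of a query against descriptions already harvested by a Metadata Broker, filtering on theme and on temporal overlap with Julie's 1950-2013 period. The Description type, the theme labels and the publishers are hypothetical; real descriptions would carry dcat:theme values from a controlled vocabulary and a dct:temporal period rather than two integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical harvested dataset description held by a Metadata Broker."""
    title: str
    publisher: str
    themes: frozenset  # labels from a controlled vocabulary (cf. dcat:theme)
    start_year: int    # temporal coverage (cf. dct:temporal), simplified to whole years
    end_year: int


def broker_query(descriptions, themes, start_year, end_year):
    """Return descriptions sharing at least one requested theme whose
    coverage overlaps the period [start_year, end_year]."""
    return [
        d for d in descriptions
        if d.themes & themes and d.start_year <= end_year and d.end_year >= start_year
    ]


# Hypothetical descriptions from different publishers and levels of government.
harvested = [
    Description("Migration flows by Member State", "Hypothetical statistics institute",
                frozenset({"migration"}), 1990, 2012),
    Description("Immigration control acts", "Hypothetical legislative portal",
                frozenset({"immigration legislation"}), 1950, 2013),
    Description("Employment rate by Member State", "Hypothetical labour union",
                frozenset({"employment"}), 2000, 2010),
    Description("Agricultural output", "Hypothetical ministry",
                frozenset({"agriculture"}), 1960, 2000),
]

wanted = frozenset({"migration", "employment", "immigration legislation"})
for d in broker_query(harvested, wanted, 1950, 2013):
    print(d.publisher, "|", d.title)
```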

stijngoedertier (not verified) Wed, 17/04/2013 - 01:16

@Bart: thank you for stating your requirements for a cross-lingual search. To summarise:

Multilingual descriptions (dct:description): a short description is available in 4 languages
Multilingual links (dcat:landingPage): link(s) to webpages with more information are available in 3 languages
Multilingual datasets (dct:language): the dataset itself is only available in 2 languages

I think that DCAT enables these requirements, except for the "multilingual links", which dcat:landingPage does not address. Looking at this example on data.gov.be, I see what you mean: http://data.gov.be/dataset/database-authorised-medicines-human-use. The landing page URLs contain Dutch and French keywords, so they are in effect different URLs.
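As an illustration of the two requirements DCAT does cover, the sketch below serialises one dataset as N-Triples with one language-tagged dct:description per language and one dct:language statement per language of the data. The dataset URI and texts are hypothetical; the language values are plain literals here, whereas an application profile may instead require URIs from a controlled language vocabulary.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def literal(text, lang=None):
    """N-Triples literal; json.dumps escaping (quotes, backslashes,
    control characters) is compatible with N-Triples string syntax."""
    quoted = json.dumps(text, ensure_ascii=False)
    return f"{quoted}@{lang}" if lang else quoted


def describe(dataset_uri, descriptions, data_languages):
    """One language-tagged dct:description per language, plus one
    dct:language triple per language in which the data itself exists."""
    s = f"<{dataset_uri}>"
    lines = [f"{s} <{RDF_TYPE}> <{DCAT}Dataset> ."]
    lines += [f"{s} <{DCT}description> {literal(text, lang)} ."
              for lang, text in sorted(descriptions.items())]
    lines += [f"{s} <{DCT}language> {literal(code)} ." for code in sorted(data_languages)]
    return "\n".join(lines)


ntriples = describe(
    "http://example.org/dataset/medicines",  # hypothetical dataset URI
    {
        "nl": "Databank van vergunde geneesmiddelen voor menselijk gebruik",
        "fr": "Base de données des médicaments autorisés à usage humain",
        "de": "Datenbank der zugelassenen Humanarzneimittel",
        "en": "Database of authorised medicines for human use",
    },
    {"nl", "fr"},
)
print(ntriples)
```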

I guess we will not be able to convince Belgian public administrations to use the HTTP Accept-Language header (content negotiation) to serve multilingual content from the same URL. Would it be acceptable not to exchange the language of landing pages?
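For reference, serving several languages from one URL would mean the server reads the client's Accept-Language header and picks a variant. The sketch below shows that selection step only, assuming a simplified prefix match (a range such as "nl" matches "nl" or "nl-be"); it is not a full implementation of the RFC 4647 matching schemes. A real server would also typically send Content-Language and Vary: Accept-Language response headers.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language-range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def choose_language(header, available, default):
    """Pick the best available landing-page language for the request (simplified matching)."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 explicitly marks a language as unacceptable
        if tag == "*":
            return default
        for lang in available:
            if lang == tag or lang.startswith(tag + "-"):
                return lang
    return default


print(choose_language("en-GB,en;q=0.9,fr;q=0.8,nl;q=0.7", ["nl", "fr"], "nl"))  # fr
```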

stijngoedertier (not verified) Wed, 17/04/2013 - 01:35

@Andrea: Thanks for the suggested segmentation of "data consumers". I am not sure whether the requirements differ significantly per segment; one could easily argue that interoperability and quality are equally important for the other segments. Could you elaborate on a user scenario on dataset interoperability and dataset quality (see also https://joinup.ec.europa.eu/discussion/dataset-quality-rating)?

I also have to say that DCAT does support this...

stijngoedertier (not verified) Wed, 17/04/2013 - 01:39

@Enrico: Thanks for contributing this user scenario. Your requirement is to enable a search for datasets related to migration, employment, and legislation on immigration control. This is valuable input for "assessing" the usefulness of the proposed controlled vocabularies for dcat:theme...

Enric STAROMIEJSKI Wed, 17/04/2013 - 12:47

New proposal of

User Scenario - oProcurement infomediary business need

(oProcurement stands for Linked Open Procurement Data. Cfr. European Commission’s E-TEG – E-Tendering Experts Group; CEN WS/BII 3 Business Plan; and everis’ oProcurement initiative)

Luis is the owner of a small Spanish infomediary (an SME) that provides its clients with reports on (1) the good business opportunities existing in the European public sector procurement market, based on published Contract Notices and Calls for Tenders; and (2) how to improve their competitiveness, based mainly, but not exclusively, on published pre-award catalogues, Contract Award Notices and even post-award documents such as orders and invoices. Luis is thinking of commissioning the development of a software application that would take advantage of the growing number of public authorities that publish procurement information in a standard, structured way.

Without a DCAT Application Profile:

The data that Luis' firm needs to identify matching opportunities and prepare reports are scattered across, literally, tens of thousands of buyer profiles, electronic gazettes, and e-procurement portals and platforms. There is no catalogue of all the existing sources of public procurement data. The effort required to manually explore, find, identify, select, obtain and analyse the data is enormous. Typical difficulties that Luis' employees encounter in their daily work are: discovering which sources publish structured information; determining whether this information is structured according to a known standard or not (and identifying the standard); establishing what specific restrictions apply to the data (national, regional and local legal and re-use conditions); and coping with the great variety of languages and alphabets (transliteration problems) used across Europe, even within a single country such as Spain (4 different languages).

With a DCAT Application Profile: electronic gazettes (e.g. TED: http://www.ted.europa.eu); pan-European e-Procurement federations (e.g. PEPPOL); e-Procurement platforms (e.g. e-Prior: http://ec.europa.eu/dgs/informatics/supplier_portal/, or the Spanish Platform: http://contrataciondelestado.es); national, regional and local buyers’ profiles; and many other procurement-related content providers (e.g. e-Certis: http://ec.europa.eu/markt/ecertis/login.do; Tenderers Registers such as ROLECE in Spain, etc.) exchange description metadata of their own collections using a common metadata vocabulary and common controlled vocabularies (CEN/BII-based, see XML bindings in UBL and UN/CEFACT and Code and Value Lists in http://cenbii.eu). The exchanged descriptions are supported by a Metadata Broker (e.g. PEPPOL). Luis’ application can now integrate a connector to the Brokers’ gateway that facilitates (1) searching for new datasets all across Europe and within a particular country; (2) automatically exploring each dataset; (3) automatically finding, identifying and selecting the information that matches the requirements of Luis’ clients, thus creating real business opportunities; and (4) automatically obtaining additional information for further analysis and re-purposing. Once new data sources are identified via the Metadata Broker, Luis’ application adds them to its own catalogue, so that from then on regular automatic operations are performed on them: in other words, a procurement-infomediary-specific DCAT AP would naturally (i.e. automatically) emerge.
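The last step of this scenario (adding newly discovered sources to the application's own catalogue so they are processed from then on) could look roughly like the sketch below. It assumes the broker's search results have already been turned into simple records; the SourceDescription type, its fields and the example.org URIs are hypothetical and do not reflect the actual PEPPOL or broker interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """Hypothetical description of a procurement data source returned by a broker."""
    uri: str        # identifier of the described dataset / source
    publisher: str
    country: str    # ISO 3166-1 alpha-2 code, e.g. "ES"
    standard: str   # declared data standard, e.g. "UBL" (hypothetical field)


def sync_from_broker(catalogue, broker_results, country=None):
    """Add sources not yet in the local catalogue, optionally restricted to one country.

    Returns only the newly added sources, so they can be scheduled for regular processing."""
    added = []
    for desc in broker_results:
        if country is not None and desc.country != country:
            continue
        if desc.uri not in catalogue:
            catalogue[desc.uri] = desc
            added.append(desc)
    return added


catalogue = {}
r1 = SourceDescription("http://example.org/es/buyer-profile/1", "Hypothetical city council", "ES", "UBL")
r2 = SourceDescription("http://example.org/es/platform/tenders", "Hypothetical regional platform", "ES", "UBL")
r3 = SourceDescription("http://example.org/be/gazette", "Hypothetical gazette", "BE", "UN/CEFACT")
r4 = SourceDescription("http://example.org/es/buyer-profile/2", "Hypothetical ministry", "ES", "UBL")

first = sync_from_broker(catalogue, [r1, r2, r3], country="ES")       # two new Spanish sources
second = sync_from_broker(catalogue, [r1, r2, r3, r4], country="ES")  # only r4 is new
print(len(first), len(second), len(catalogue))  # 2 1 3
```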

philarcher (not verified) Wed, 17/04/2013 - 17:36

Just a note on languages and hyperlinks. HTML (including HTML5) allows you to declare the language of a linked resource. For example, if I include a hyperlink here to, say, http://www.data.gouv.fr/, then the markup could be <a href="http://www.data.gouv.fr/" hreflang="fr">French government data portal</a>. This has been in the spec for a long time and is described in detail at http://dev.w3.org/html5/markup/a.html. The downside is that browsers haven't implemented this. Use of the attribute in places such as data portals is something the internationalisation group at W3C would be very interested to see, and could use as evidence of the attribute's usefulness.
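Even if browsers ignore the attribute, software can read it. As a sketch, a portal harvesting HTML pages could collect the declared languages of outgoing links with the standard-library HTML parser; what the harvester then does with these pairs is left open here.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (href, hreflang) pairs from <a> and <link> elements that declare hreflang."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            attributes = dict(attrs)
            if attributes.get("href") and attributes.get("hreflang"):
                self.links.append((attributes["href"], attributes["hreflang"]))


page = (
    '<p>See the <a href="http://www.data.gouv.fr/" hreflang="fr">French government data portal</a> '
    'and <a href="http://example.org/about">a link without hreflang</a>.</p>'
)
collector = HreflangCollector()
collector.feed(page)
print(collector.links)  # [('http://www.data.gouv.fr/', 'fr')]
```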

Makx DEKKERS Wed, 17/04/2013 - 20:14

How would this be implemented in RDF?

Anonymous (not verified) Tue, 30/04/2013 - 13:25

We are using RDF, so just use whatever you need. Is there any problem with this?

<dcat:landingPage rdf:about="http://example.org/dataset001.html">
  <dct:title xml:lang="en">My landing page in English</dct:title>
  <dct:language>en</dct:language>
  <dct:description xml:lang="en">This is my informative Web page in English.</dct:description>
</dcat:landingPage>

Makx DEKKERS Tue, 30/04/2013 - 13:46

I see nothing wrong with the RDF snippet that further describes the <dcat:landingPage>. It looks like regular RDF to me. However, it is different from Phil's suggestion to use hreflang. I did not know how hreflang could be used in RDF and I guess the answer is that it can't.

Anonymous (not verified) Tue, 30/04/2013 - 18:56

New proposal of

User Scenario - Federated Legislative Catalog and Search Engine

Each Region of Italy (22, including the autonomous Provinces) maintains a database of its local legislation, published through a web portal. Users who want to search all these legislative databases need to "travel" across the 22 portals to find the acts of interest, and search interfaces and criteria vary among portals, as does the way search results are presented. A list of these databases is provided here: http://www.normattiva.it/static/mappa.html . To ease the burden of searching for acts across all these portals, a federated search engine is under development. The ideas underlying the design of this engine are:

each Region publishes a catalog of all its legislation at a known URL;
the catalog contains the list of legislative references to all the acts of that Region, along with the URL of each act;
a federated indexer, exploiting the catalogs, crawls the text-only version of all acts in order to build a cross-regional federated index, on top of which a federated search engine lets users perform searches.

In our view, the legislative catalog of each Region could be described and published using a DCAT application profile. This profile could also be applied to national and European legislation.
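The indexer and search steps of this design can be sketched as a simple inverted index built from the regional catalogs. The catalogs, act references, URLs and texts below are hypothetical stand-ins, and fetch_text is a placeholder for whatever crawler retrieves the text-only version of an act; no network access is performed here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens (Unicode-aware, so accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def build_federated_index(catalogs, fetch_text):
    """catalogs: region -> list of (act_reference, act_url); fetch_text: url -> plain text.
    Returns an inverted index: word -> set of (region, act_reference)."""
    index = defaultdict(set)
    for region, acts in catalogs.items():
        for reference, url in acts:
            for word in tokenize(fetch_text(url)):
                index[word].add((region, reference))
    return index


def federated_search(index, query):
    """Return the acts, from any region, that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical catalogs and act texts standing in for the regional portals.
texts = {
    "http://example.org/toscana/lr-2010-1": "Norme in materia di turismo e strutture ricettive",
    "http://example.org/lombardia/lr-2012-5": "Disciplina delle strutture ricettive alberghiere",
    "http://example.org/lombardia/lr-2011-3": "Norme sulla tutela delle acque",
}
catalogs = {
    "Toscana": [("L.R. 1/2010", "http://example.org/toscana/lr-2010-1")],
    "Lombardia": [("L.R. 5/2012", "http://example.org/lombardia/lr-2012-5"),
                  ("L.R. 3/2011", "http://example.org/lombardia/lr-2011-3")],
}
index = build_federated_index(catalogs, lambda url: texts[url])
print(sorted(federated_search(index, "strutture ricettive")))
```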

Anonymous (not verified) Tue, 30/04/2013 - 19:13

@Makx: Agreed. I don't think hreflang could play any role here either. My point was that it doesn't matter, as we can already use the expressivity of RDF to provide language information for a landing page.
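Applied to the data.gov.be case, the RDF approach means each landing page becomes a resource with its own language statement. The sketch below emits such triples in N-Triples form; it assumes foaf:Document as the type of a landing page (the range DCAT gives to dcat:landingPage) and uses hypothetical example.org URLs and titles.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def lit(text, lang=None):
    """N-Triples literal, optionally language-tagged (json.dumps escaping is compatible)."""
    quoted = json.dumps(text, ensure_ascii=False)
    return f"{quoted}@{lang}" if lang else quoted


def landing_page_triples(dataset_uri, pages):
    """pages: list of (page_url, language_code, title).
    Each landing page is described as a resource carrying its own dct:language."""
    triples = []
    for url, lang, title in pages:
        triples.append(f"<{dataset_uri}> <{DCAT}landingPage> <{url}> .")
        triples.append(f"<{url}> <{RDF_TYPE}> <{FOAF}Document> .")
        triples.append(f"<{url}> <{DCT}language> {lit(lang)} .")
        triples.append(f"<{url}> <{DCT}title> {lit(title, lang)} .")
    return triples


triples = landing_page_triples(
    "http://example.org/dataset/medicines",  # hypothetical dataset URI
    [
        ("http://example.org/nl/geneesmiddelen", "nl", "Databank vergunde geneesmiddelen"),
        ("http://example.org/fr/medicaments", "fr", "Base de données des médicaments autorisés"),
    ],
)
print("\n".join(triples))
```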

stijngoedertier (not verified) Thu, 02/05/2013 - 10:46

@Carlo: Thanks a lot for contributing this use case. We will include it in Draft 3.

Some aspects that we could discuss with the Working Group:

Can we consider a law (or a legislative document) to be a dcat:Dataset?
The DCAT application profile cannot incorporate legislation-specific topics (or any other sector-specific properties). For these, more specialised vocabularies (such as CEN MetaLex and Akoma Ntoso) could be useful. A mapping to DCAT for inclusion on general-purpose data portals would still be possible.

Makx DEKKERS Tue, 16/07/2013 - 17:20

DCAT Application Profile for data portals in Europe
