During the 1st virtual meeting of the Working Group, members were invited to challenge and provide further input on the basic use case and user scenarios in the draft specification.
- What are your requirements for cross-lingual search?
- Is anything missing? Are there any properties missing (e.g. information on the pricing of datasets, interoperability of datasets) that would allow users to better search for datasets?
- Can we exclude some less relevant metadata?
- What is in it for the data portals? The advantage of exchanging description metadata between data portals using the DCAT Application Profile should be made clearer.
Component: Documentation
Category: Use case
Comments
Cross-lingual
Belgium has 3 official languages (NL, FR and DE), and often data owners want to add English as well. Of course, metadata (categories, regions, ...) must be searchable in several languages. It is possible that, say,
It is often helpful if a developer searching for a dataset gets all this info in a single search request
and/or that someone who is looking for a dataset in language X can find out that the dataset may (only) exist in language Y
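To make the requirement above concrete, here is a minimal sketch of a search over multilingual keywords that also reports which languages a dataset is described in. The `Dataset` class, the `search_catalogue` function and the sample data are hypothetical illustrations, not part of DCAT or of data.gov.be.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    identifier: str
    # Keywords per language code, e.g. {"nl": [...], "fr": [...]} (assumed layout).
    keywords: dict = field(default_factory=dict)


def search_catalogue(datasets, term):
    """Return (identifier, matched_languages, available_languages) for each hit."""
    needle = term.casefold()
    hits = []
    for dataset in datasets:
        # Languages in which at least one keyword contains the search term.
        matched = sorted(
            lang
            for lang, words in dataset.keywords.items()
            if any(needle in word.casefold() for word in words)
        )
        if matched:
            # Also report every language the dataset is described in, so a user
            # searching in language X can see that it (only) exists in language Y.
            hits.append((dataset.identifier, matched, sorted(dataset.keywords)))
    return hits


# Hypothetical sample catalogue.
CATALOGUE = [
    Dataset("medicines", {"nl": ["geneesmiddelen"], "fr": ["médicaments"]}),
    Dataset("roads", {"nl": ["wegen"], "en": ["roads"]}),
]
```

A single call such as `search_catalogue(CATALOGUE, "wegen")` returns the matching dataset together with the languages it is available in, so no second request is needed.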
As I mentioned in the first meeting, I suggest describing the users better.
I prepared a description of another use case, involving legal information.
Use case: enable a search for datasets across various typologies
Julie works for a university and is looking for data sets on migration in the European Union. She wants to carry out a study to analyse the evolution of migration flows from 1950 to 2013, as compared to the variation of employment rate in different Member States. Therefore, she will not only need to look for statistics at national and European level but also at legislation on immigration control. Migration data sets and related legislation are made available by both national and European Union public administrations at different levels in a distributed environment, which Julie is not aware of. The same holds for datasets on employment rates, which are usually made available by institutes of statistics as well as by public administrations and labour unions.
@Bart: thank you for stating your requirements for a cross-lingual search. To summarise:
I think that DCAT enables these requirements, except for the "multilingual links". This is something that dcat:landingPage does not address. Looking at this example on data.gov.be, I see what you mean: http://data.gov.be/dataset/database-authorised-medicines-human-use. The landing page URLs contain either Dutch or French keywords, and hence are basically different URLs.
I guess we will not be able to convince Belgian public administrations to use the HTTP Accept-Language request header to serve multilingual content from the same URL. Would it be acceptable not to exchange the language of landing pages?
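For readers unfamiliar with the mechanism, serving several languages from one URL comes down to the server reading the `Accept-Language` value sent by the client and picking the best available translation. The following is only a simplified sketch: the function names are hypothetical, and wildcard (`*`) handling and full language-range matching are left out.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language, quality) pairs, best first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        preferences.append((lang.strip().lower(), quality))
    # sorted() is stable, so equal weights keep the order given by the client.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default):
    """Pick the first acceptable language that the portal can actually serve."""
    for lang, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if lang in available:
            return lang
        primary = lang.split("-")[0]  # fall back from "fr-be" to "fr"
        if primary in available:
            return primary
    return default
```

A server doing this would normally also state the chosen language in a `Content-Language` response header and add `Vary: Accept-Language`, so that caches do not mix up the language variants.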
@Andrea: Thanks for the suggested segmentation of "data consumers". I am not sure whether these requirements are significantly different per segment: one could easily argue that interoperability and quality are equally important for the other segments. Could you elaborate on a user scenario on dataset interoperability and dataset quality (see also https://joinup.ec.europa.eu/discussion/dataset-quality-rating)?
I also have to say that DCAT does support this...
@Enrico: Thanks for contributing this user scenario. Your requirement is to enable a search for datasets related to migration, employment, and legislation on immigration control. This is valuable input for "assessing" the usefulness of the proposed controlled vocabularies for dcat:theme...
New proposal of User Scenario - oProcurement infomediary business need
(oProcurement stands for Linked Open Procurement Data. Cfr. European Commission’s E-TEG – E-Tendering Experts Group; CEN WS/BII 3 Business Plan; and everis’ oProcurement initiative)
Luis is the owner of a small Spanish infomediary (an SME) that provides its clients with reports on (1) the good opportunities existing in the European Public Sector Procurement market, based on the published Contract Notices and Calls for Tenders; and (2) how to improve their competitiveness, based mainly, but not exclusively, on published pre-awarding Catalogues, Contract Award Notices and even post-awarding documents such as orders and invoices. Luis is thinking of commissioning the development of a software application that would take advantage of the rising number of public authorities that publish Procurement information in a standard, structured way.
Without a DCAT Application Profile:
The data that Luis' firm needs to match good opportunities and prepare reports are scattered across literally tens of thousands of buyer profiles, electronic gazettes, and e-procurement portals and platforms. There is no catalogue of all the existing sources of public procurement data. The effort required to manually explore, find, identify, select, obtain and analyse the data is enormous. Typical difficulties that Luis' employees encounter in their daily work are discovering which sources publish structured information; whether this information is structured in a known standard way or not (and identifying the standard); what specific restrictions apply to the data (national, regional and local legal and re-use conditions); and the great variety of languages and alphabets (transliteration problems) used across Europe, even within a single country such as Spain (4 different languages).
With a DCAT Application Profile: electronic gazettes (e.g. TED: http://www.ted.europa.eu); Pan-European e-Procurement federations (e.g. PEPPOL); e-Procurement platforms (e.g. e-Prior: http://ec.europa.eu/dgs/informatics/supplier_portal/, or the Spanish Platform: http://contrataciondelestado.es); national, regional and local buyers’ profiles; and many other Procurement-related content providers (e.g. e-Certis: http://ec.europa.eu/markt/ecertis/login.do; Tenderers Registers such as ROLECE in Spain, etc.) exchange description metadata of their own collections using a common metadata vocabulary and common controlled vocabularies (CEN/BII-based, see XML bindings in UBL and UN/CEFACT and Code and Value Lists in http://cenbii.eu). The exchanged descriptions are supported by a Metadata Broker (e.g. PEPPOL). Now Luis’ application can integrate a connector to the Brokers’ gateway that would facilitate (1) searching for new datasets all across Europe and inside a particular country; (2) automatically exploring each dataset; (3) automatically finding, identifying and selecting the information that matches the requirements of Luis’ clients, thus creating real good opportunities; (4) automatically obtaining additional information for further analysis and re-purposing. Once new data sources are identified via the Metadata Broker, Luis’ application adds them to its own catalogue so that, from then on, regular operations are performed on them automatically: in other words, a Procurement-infomediary-specific DCAT AP would naturally (i.e. automatically) emerge.
Just a note on languages and hyperlinks. HTML (including HTML5) allows you to declare the language of a linked resource. For example, if I include a hyperlink here to, say, http://www.data.gouv.fr/, then the markup could be <a href="http://www.data.gouv.fr/" hreflang="fr">French government data portal</a>. This has been in the spec for a long time and can be seen in detail at http://dev.w3.org/html5/markup/a.html. The downside is that browsers haven't implemented this. The use of the attribute in things like data portals would be something that the internationalisation group at W3C would be very interested to see and use as evidence of the usefulness of the attribute.
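Even if browsers ignore the attribute, a harvester could still read it. As a rough sketch (the class and function names are hypothetical), the links in a page and their declared languages can be collected with Python's standard `html.parser` module:

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (href, hreflang) pairs from the <a> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # hreflang is optional, so the language may be None.
            self.links.append((href, attributes.get("hreflang")))


def links_with_language(html):
    collector = HreflangCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Applied to the markup in the comment above, `links_with_language` yields the portal URL paired with the language code `fr`; links without the attribute come back with `None` as their language.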
How would this be implemented in RDF?
We are using RDF so, just use whatever you need. Is there any problem with this?
I see nothing wrong with the RDF snippet that further describes the <dcat:landingPage>. It looks like regular RDF to me. However, it is different from Phil's suggestion to use hreflang. I did not know how hreflang could be used in RDF and I guess the answer is that it can't.
New proposal of User Scenario - Federated Legislative Catalog and Search Engine
Each Region of Italy (22, including the autonomous Provinces) owns a database of its local legislation, published through a web portal. Users who want to search all these legislative databases need to "travel" across the 22 portals in order to find the acts of interest. Search interfaces and criteria vary among portals, along with the way in which search results are presented. A list of these databases is provided here: http://www.normattiva.it/static/mappa.html . In order to ease the burden of searching acts across all these portals, a federated search engine is under development. The ideas underlying the design of this engine are: each Region publishes a catalog of all its legislation at a known URL; the catalog contains the list of the legislative references to all the acts of a given Region, along with URLs to each act; a federated indexer - exploiting the catalogs - crawls the text-only version of all acts in order to build a cross-regional federated index, on top of which a federated search engine allows searches to be performed. In our view, the legislative catalog of each Region could be described and published using a DCAT application profile. This profile could also be applied to national and European legislations.
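The design ideas above can be sketched in a few lines. This is not the engine under development, only an illustration under stated assumptions: each catalog is taken to be a list of entries with a `reference` and a `url`, the `fetch_text` callable that returns the text-only version of an act is supplied by the caller, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of case-insensitive word tokens."""
    return set(re.findall(r"\w+", text.casefold()))


def build_index(catalogs, fetch_text):
    """Build a cross-regional inverted index from the regional catalogs.

    catalogs: maps a region name to a list of {"reference": ..., "url": ...}
              entries (assumed catalog layout).
    fetch_text: callable returning the text-only version of the act at a URL.
    """
    index = defaultdict(set)
    for region, entries in catalogs.items():
        for entry in entries:
            act = (region, entry["reference"], entry["url"])
            for token in tokenize(fetch_text(entry["url"])):
                index[token].add(act)
    return index


def search_acts(index, query):
    """Return the acts, from any region, that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    results = None
    for token in tokens:
        postings = index.get(token, set())
        results = postings if results is None else results & postings
    return sorted(results)
```

Because every hit carries its region and URL, one query covers all regional catalogs and the user no longer has to visit each portal separately.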
@Makx Agree. Don't think that hreflang could play any role here either. My point was that it just doesn't matter as we can already use RDF expressivity to provide language information for a landingPage.
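For readers who want to see what this could look like, here is a hypothetical sketch (not the snippet discussed earlier in this thread) that emits N-Triples giving each landing page its own language via `dct:language`. The example.org URIs are made up, and the use of the EU Publications Office language authority URIs as language values is an assumption.

```python
DCAT_LANDING_PAGE = "http://www.w3.org/ns/dcat#landingPage"
DCT_LANGUAGE = "http://purl.org/dc/terms/language"
# Assumption: languages are identified with URIs from the EU Publications
# Office language authority table; any other language URI scheme would do.
LANGUAGE_AUTHORITY = "http://publications.europa.eu/resource/authority/language/"


def landing_page_triples(dataset_uri, pages):
    """Return N-Triples lines for a dataset with one landing page per language.

    pages maps a landing page URL to a language code such as "NLD" or "FRA".
    """
    lines = []
    for page_uri, code in sorted(pages.items()):
        # Link the dataset to the landing page...
        lines.append(f"<{dataset_uri}> <{DCAT_LANDING_PAGE}> <{page_uri}> .")
        # ...and state the language on the landing page resource itself.
        lines.append(f"<{page_uri}> <{DCT_LANGUAGE}> <{LANGUAGE_AUTHORITY}{code}> .")
    return lines
```

A dataset with a Dutch and a French landing page then simply gets two `dcat:landingPage` statements, each pointing at a resource that carries its own language.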
@Carlo: Thanks a lot for contributing this use case. We will include it in Draft 3.
Some aspects that we could discuss with the Working Group: