{% extends 'rtd/base.html' %} {% load staticfiles %} {% block about_style %} header_menu_element_dark {% endblock %} {% block content %} About

ABOUT



INTRODUCTION
This proof of concept, carried out in cooperation with DG RTD, showcases the use of big data in the EC research domain and proves the usefulness and policy benefit that big data can bring. It demonstrates the use of text mining techniques applied to large amounts of unstructured research
DATA SOURCES
After analysing multiple data sources (Science Direct, PLOS one, the European Patent Office, etc.), the following ones have been selected due to the advantages shown below:

  • PubMed: Multiple; massive amount of biomedical literature. This data source fills the need for real research information.
  • CORDIS: Multiple filters. This data source fills the need for real European funding information.
METHODOLOGICAL APPROACH
From a methodological point of view, the main tasks that have been carried out are:
  • Text-mining treatment: transform the abstracts of the documents into a format that serves as input for modelling algorithms
  • Clustering: form homogeneous groups of similar documents
  • Classification model: obtain the rules to classify a document into one of the defined categories
  • Draw conclusions from the classification of documents
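
The tasks above can be illustrated with a minimal sketch. It is not the implementation used in the PoC: it assumes TF-IDF weighting for the text-mining step, plain k-means for clustering and a nearest-centroid rule for classification, and all function names and the four toy abstracts are hypothetical, written only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than 2 characters."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w.isalpha() and len(w) > 2]

def vectorize(counts, vocab, idf):
    """Build a length-normalised TF-IDF vector over a fixed vocabulary."""
    vec = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fit_tfidf(abstracts):
    """Text-mining step (assumed TF-IDF): abstracts -> numeric vectors."""
    docs = [Counter(tokenize(a)) for a in abstracts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1 for w in vocab}
    return vocab, idf, [vectorize(d, vocab, idf) for d in docs]

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Clustering step (assumed k-means), seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def classify(text, vocab, idf, centroids):
    """Classification step (assumed nearest centroid) for a new abstract."""
    v = vectorize(Counter(tokenize(text)), vocab, idf)
    return min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))

# Toy abstracts, invented for illustration only (not PubMed or CORDIS data).
abstracts = [
    "Gene expression in tumour cells under chemotherapy",
    "Solar panel efficiency and photovoltaic energy storage",
    "Tumour cells respond to chemotherapy through gene regulation",
    "Photovoltaic storage improves solar energy efficiency",
]
vocab, idf, vectors = fit_tfidf(abstracts)
labels, centroids = kmeans(vectors, k=2)
print(labels)
print(classify("Gene regulation of tumour cells during chemotherapy", vocab, idf, centroids))
```

Conclusions can then be drawn by, for example, counting how many documents fall into each category. Seeding k-means with the first k vectors keeps the sketch deterministic; a real pipeline would more likely rely on an established library and a more careful initialisation.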


TECHNICAL ARCHITECTURE
The reference architecture is a collection of building blocks that consist of a series of technologies and methodologies.
The NTT DATA and everis Big Data Reference Architecture is composed of 2 levels:
  1. Layers: a layer is a group of building blocks that share the same technology/methodology. There are 8 layers, which together categorize all the technologies/methodologies relevant to the Big Data Architecture.
  2. Building blocks: each one represents a technology/methodology. There are 38 building blocks in the Big Data Architecture.

On the base Big Data Architecture, the building blocks used in the PoC have been highlighted for each layer.
    {% endblock %}