Running this proof of concept (PoC) shows the value of studying and analysing the social media domain.
The PoC applies text mining techniques to social media posts as a means to identify areas of interest in research,
specifically to draw conclusions about the most important topics to discuss during the ICT conference 2016.
The information to analyse is gathered from Twitter and Yammer posts concerning previous events.
DATA SOURCES
The data for the PoC is obtained directly from Twitter and Yammer. Working from the original posts keeps the data authentic and relevant for analysis and classification.
Although both Twitter and Yammer offer APIs, their restrictions (Twitter, for example, does not allow users to retrieve tweets more than a week old) make them unsuitable in the context of the PoC, so the data is collected manually.
Lastly, additional data from a survey launched in October 2015 is incorporated. This survey’s objective was to gather information on the topics that people would most like to see on the agenda of the next ICT conference 2016.
METHODOLOGICAL APPROACH
From a methodological point of view, the main tasks that have been carried out are:
Text-mining treatment: transform the abstract of the documents into a format that serves as an input for modelling algorithms
Clustering: form homogeneous groups of similar documents
Classification model: obtain the rules to classify a document into one of the defined categories
Draw conclusions from the classification of documents
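The four tasks above can be sketched end to end in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the PoC: the sample posts, the stopword list, and every function name are hypothetical, TF-IDF weighting is assumed for the text-mining step, a small k-means variant on cosine similarity stands in for the clustering step, and a nearest-centroid rule stands in for the classification model, none of which the text specifies.

```python
import math
import re
from collections import Counter

# Hypothetical sample posts, invented for illustration (not PoC data).
POSTS = [
    "Big data analytics for public services",
    "Open data and big data platforms",
    "Cloud security and identity management",
    "Security threats in cloud infrastructure",
]
STOPWORDS = {"and", "for", "in", "the", "of", "a", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf):
    """Unit-length TF-IDF vector as a {term: weight} dict; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(u, v):
    """Dot product of two sparse vectors (the cosine when both have unit length)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum of sparse vectors, rescaled to unit length."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def nearest(vec, centroids):
    """Index of the most similar centroid (ties go to the lowest index)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


def cluster(vectors, k=2, iterations=10):
    """Small k-means variant on cosine similarity with farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to the seeds chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        centroids = [
            centroid([v for v, lab in zip(vectors, labels) if lab == i])
            for i in range(k)
        ]
    return labels, centroids


# 1. Text-mining treatment: turn raw posts into numeric vectors.
idf = fit_idf(POSTS)
vectors = [vectorize(p, idf) for p in POSTS]

# 2. Clustering: group similar posts.
labels, centroids = cluster(vectors, k=2)

# 3. Classification: assign a new post to the closest existing group.
new_label = nearest(vectorize("New big data dashboards", idf), centroids)

# 4. Conclusions: for example, how many posts fall into each group.
print(labels, new_label, Counter(labels))
```

On real posts the number of clusters, the stopword list, and the similarity measure would all need tuning, and a post that shares no vocabulary with the corpus falls back to the first cluster in this sketch.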
TECHNICAL ARCHITECTURE
The reference architecture is a collection of building blocks which consist of a series of technologies and methodologies.
The NTT DATA and everis Big Data Reference Architecture is composed of two levels:
1. Layer: a group of building blocks that share the same technology/methodology. There are 8 layers, which together categorize all technologies/methodologies important for the Big Data Architecture.
2. Building block: each one represents a technology/methodology. There are 38 building blocks in the Big Data Architecture.
The building blocks used in the PoC are highlighted, layer by layer, on the base Big Data Architecture.