A data mining pilot project by the Flemish government to unlock the region's academic research relies mainly on open source tools. The so-called RILOD (Research Information Linked Open Data) pilot is about to enter its next - public - phase. After that, the final two proprietary software tools will be replaced by open source alternatives, says Geert Van Grootel, a knowledge management expert working for the Economy, Science & Innovation (EWI) department of the Flemish government in Belgium. "We're replacing the proprietary relational database system with PostgreSQL, and the text mining tool will also be switched to an open source alternative."
RILOD's components include multiple servers running Linux, the Java application server Tomcat and the Apache web server. "We're using standards as much as possible, and we aim to be as interoperable as possible, which explains why it will all be open source soon."
Van Grootel emphasises the project's use of open standards. "This allows us to choose between open source and proprietary solutions - as long as the latter use open standards." One example is the current proprietary text mining tool, which connects to the system using Apache's Unstructured Information Management Architecture (UIMA) standard. "Others, preferring a different text mining tool, can connect using the same standard."
The EWI researcher stresses that the project will not exclusively use open source. "Not all requirements can be addressed by this type of solution."
Manufacturing
The system gathers information including draft research papers, academic research websites and publicly available research abstracts. It combines this with public data in machine-readable format made available by Flemish universities, colleges and research institutes. "We're building a solid base for research intelligence", Van Grootel said. The system currently contains some 200 million facts.
One use case for the research data mining could be to find types of industry that can replace those that are disappearing, such as car manufacturing. "The tool could show that there is much research capacity in 'Logistics'. That can underpin a government's decision to support entrepreneurs in that field."
Van Grootel presented the data mining project at a big data conference in Ter Hulpen on 21 November. "This open data project aims to maximise the re-use of scientific data and components, reduce cost, enable monitoring of government policies and increase understanding."
In its current pilot phase, the data mining project is used by researchers and government staffers. Once it is public, Van Grootel expects that the tool will also be used by the financial sector, by industry and by citizens.
Collaboration
The researchers are hurrying to improve the visualisation of all the research data before opening the system to the public sometime in early 2014. "We're also exploring the data to find potential areas for collaboration." All of RILOD's data and tools will be made publicly available, Van Grootel announces. "We're looking at ways to package the solution, so it can be deployed by others."