
Study on data tools and technologies used in the public sector to gather, store, manage, process, get insights and share data

Published on: 19/06/2020 | Last update: 28/07/2020

This study covers new architectures, frameworks, tools and technologies that public administrations can use to gather, store, manage, process, derive insights from and share data. The domain also covers how data are governed, including data collaboratives, with particular emphasis on the joint analysis of governance and technologies. For more detail, read the report and the case studies.

The main recommendations are:

  1. Put user needs before organisational needs. The European Commission should aim to meet the needs both of the consumers of public sector data products and of the analysts who produce them. The need of consumers (be they individual citizens, businesses, public bodies, or decision makers) for timely and accurate information is clearly critical to any data infrastructure and analysis strategy. However, it is also important to recognise analysts as a user group with distinct needs of their own, and often with the capability to meet those needs themselves if given sufficient flexibility. The case studies examined in this analysis demonstrate the ability of analysts to build the tools they need to do their work better and, by working openly, to share those tools with the wider community and enable their reuse.
  2. Work in the open and foster reusability. The European Commission should embrace open ways of working and encourage the same approach among Member States. In two of the case studies examined in this analysis, working in the open has been a major contributor to success. The decision in New Zealand to work openly on the SIAL and the SIDF has led to significant cost savings for other public sector bodies, which as a result do not need to repeat the same work. Similarly, working openly on the production of RAPs has fostered a community that spans all the devolved administrations in the UK and some regional public sector bodies: a grassroots movement for the modernisation of tooling and practices that originates from the analysts themselves.
  3. Adapt to data readiness. The Commission should recognise that different public sector bodies have different needs and capabilities, and that a ‘one size fits all’ approach to data analysis tools and infrastructure is unlikely to be appropriate. It is also important that tools and infrastructure are interoperable, support common standards (for example, data formats), and can scale to support future needs. The implementation of RAP, for instance, varies significantly between organisations depending on their requirements and capability. NHS Scotland defines seven levels of maturity that an agency can adopt, all based on the principles of RAP and all built with open source technologies that can be easily adapted and developed as required.
  4. Use open source. The Commission and the Member States should start prioritising the use of open source technologies in future developments. Advances in statistical techniques, the availability of large amounts of data, and cheap computing power have led to rapid changes in the field of data analysis. Software companies and researchers routinely publish their research and tools freely under open source licences, and these tools are almost uniformly written in open source languages. Allowing analysts to use the same open source tools ensures that they can keep up to date with developments in the field. This is critically important as public sector bodies increasingly adopt machine learning and artificial intelligence: the field moves so quickly that what was once considered cutting edge can become obsolete in a matter of months. Furthermore, open source languages act as a ‘programmatic glue’ that can combine disparate data sources, varied analyses, and multiple outputs with minimal effort (a minimal sketch of this pattern follows the list). This is why the R language has been an indispensable part of the RAP project: it offers flexibility rarely seen in proprietary tools. Moreover, public sector bodies often differ in their choices of proprietary software for all manner of budgetary and political reasons. Adopting common open source tools such as Python and R removes these barriers to sharing and enables reuse.
  5. Invest in data capability at all levels. The data landscape is changing rapidly, and the pace of that change is increasing. Member States should recognise the need to invest in the capabilities of their personnel in order to keep pace with these changes. The RAP project provides a good example: because it relied predominantly on open source software, it did not require a large new capital investment, but it did require capability building both among the analysts who would use and develop the tools and among the managers responsible for them. As public sector organisations become increasingly sophisticated in their exploitation of data, they must ensure that the whole organisation develops data literacy as a core skill, and that the benefits that data can bring are not siloed among small groups of highly data literate specialists.
  6. Break down silos. The Commission should work to break down the siloing of data within public sector organisations, and encourage Member States to do the same, whilst prioritising proportionate data security and protection measures that maintain public trust that data are being well managed. One of the biggest data problems the public sector faces is that data are often siloed in different organisations, in different formats, and on different infrastructure. Both the IDI and Findata provide legislative and infrastructural solutions to these problems, whilst some of the issues solved by RAP exist only because of inconsistencies in the way data are stored and managed by UK Government departments. However, Member States should be aware that citizens may be concerned about the collation of data sets on government servers and about the release of these data to organisations outside the public sector. Both the IDI and Findata have strong approval processes in place to ensure that this is done appropriately, and technical safeguards to protect citizens’ privacy.
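
To make the ‘programmatic glue’ point in recommendation 4 concrete, the short Python sketch below shows how a few lines of an open source language can read two disparate data sources, combine them, and write both a machine-readable and a human-readable output. It is an illustration only: the RAP implementations discussed in the report are built in R and are specific to each organisation, and all file names, column names and figures here are hypothetical.

```python
"""Illustrative RAP-style 'glue' script. All paths and columns are hypothetical."""
import csv
import json
from pathlib import Path
from statistics import mean

RAW_CSV = Path("data/hospital_admissions.csv")    # hypothetical tabular source
RAW_JSON = Path("data/regional_population.json")  # hypothetical second source
OUT_DIR = Path("outputs")


def load_admissions(path: Path) -> dict[str, list[int]]:
    """Read admissions per region from a CSV with 'region' and 'admissions' columns."""
    admissions: dict[str, list[int]] = {}
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            admissions.setdefault(row["region"], []).append(int(row["admissions"]))
    return admissions


def load_population(path: Path) -> dict[str, int]:
    """Read a {region: population} mapping from a JSON file."""
    return json.loads(path.read_text())


def build_summary(admissions: dict[str, list[int]], population: dict[str, int]) -> list[dict]:
    """Combine the two sources into per-region admission rates."""
    return [
        {
            "region": region,
            "mean_admissions": round(mean(values), 1),
            "rate_per_100k": round(mean(values) / population[region] * 100_000, 2),
        }
        for region, values in sorted(admissions.items())
        if region in population
    ]


def main() -> None:
    summary = build_summary(load_admissions(RAW_CSV), load_population(RAW_JSON))
    OUT_DIR.mkdir(exist_ok=True)

    # Machine-readable output for reuse by other analysts or systems.
    with (OUT_DIR / "summary.csv").open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "mean_admissions", "rate_per_100k"])
        writer.writeheader()
        writer.writerows(summary)

    # Human-readable output for publication alongside the statistics.
    lines = [f"{r['region']}: {r['rate_per_100k']} admissions per 100,000" for r in summary]
    (OUT_DIR / "summary.txt").write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    main()
```

Because every step lives in version-controlled code rather than in manual spreadsheet operations, the same script can be re-run end to end whenever the source data change, which is the core idea behind a Reproducible Analytical Pipeline.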

Do you agree with these recommendations?

Do you have specific comments or questions on the report?