
BLOG 4: UK Reproducible Analytical Pipelines (RAP)

Published on: 06/07/2020 Last update: 27/08/2020

Can a time-consuming activity such as the production of statistical publications be reduced to a minimum while still assuring the quality of the products? The answer is yes, and the solution relies on a set of methodologies, open source tools and investment in data capabilities.

Reproducible Analytical Pipelines (RAP) is a methodology for the production of statistical publications that was developed in 2016 through a collaboration between the Government Digital Service (GDS) and the Department for Digital, Culture, Media & Sport (DCMS). The project aimed to improve the production of a statistical bulletin by introducing techniques from software engineering, data science, and academia. The use of open source software was critical to the success of the project, which reduced the production time of the statistical bulletin by an estimated 75%.

The outputs from the project were published openly, and the methodology, which came to be known as RAP, has since been adopted widely across local and national government in the UK, now totalling some 30 projects.

 

What RAP does

The usual workflow for statistics production was characterised by heavy use of proprietary software spread across multiple applications, repeated ‘copy and paste’ operations and a very time-consuming quality assurance process. RAP aims to be much simpler, with the most time-consuming and error-prone steps replaced by bespoke software owned and managed by the team that produces the publication.

The RAP workflow is characterised by:

  • Analysis as code, creating a pipeline from the datastore to the final analysis that is easily reproducible and auditable and that allows automated testing.
  • R and Python, languages that are available to analysts on government computing infrastructure and that have a mature ecosystem of modules to facilitate the production of statistical bulletins.
  • Open source, which made it possible to share the source code of the first prototypes freely across government departments and beyond.
  • Identifying different users' needs: those of end users, such as ministers, members of the public, businesses, or public servants within the same or other departments, but also those of the public servants responsible for producing the publications. RAP responds to the need to provide accurate and timely statistics to clients, to provide interesting and rewarding work to analysts, and to upskill and induct new team members as others leave.

A usual process of statistics production in Government compared to a process using RAP
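
To make the "analysis as code" idea in the list above more concrete, here is a minimal sketch in Python of what a pipeline from raw data to publication text might look like. It is not taken from the original RAP project: the function names (load, clean, summarise, render, run_pipeline), the column names and the figures in RAW_CSV are all invented for illustration, and a real pipeline would read from the team's datastore rather than from an inline string.

```python
import csv
import io

# Hypothetical raw extract with made-up figures; a real pipeline would
# read this from the team's datastore instead of an inline string.
RAW_CSV = (
    "sector,year,employment\n"
    "Sector A,2015,100\n"
    "Sector A,2016,120\n"
    "Sector B,2015,200\n"
    "Sector B,2016,210\n"
)


def load(source):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def clean(rows):
    """Convert numeric fields; int() raises ValueError on bad input."""
    return [
        {
            "sector": row["sector"].strip(),
            "year": int(row["year"]),
            "employment": int(row["employment"]),
        }
        for row in rows
    ]


def summarise(rows, year):
    """Return total employment per sector for the given year."""
    totals = {}
    for row in rows:
        if row["year"] == year:
            totals[row["sector"]] = totals.get(row["sector"], 0) + row["employment"]
    return totals


def render(totals, year):
    """Turn the summary into the text that would go in the bulletin."""
    lines = [f"Employment by sector, {year}"]
    for sector in sorted(totals):
        lines.append(f"{sector}: {totals[sector]}")
    return "\n".join(lines)


def run_pipeline(source, year):
    """Run every step in order, with no manual copy and paste between them."""
    return render(summarise(clean(load(source)), year), year)


def test_summarise():
    """An automated check that can run every time the code changes."""
    rows = clean(load(RAW_CSV))
    assert summarise(rows, 2015) == {"Sector A": 100, "Sector B": 200}


if __name__ == "__main__":
    test_summarise()
    print(run_pipeline(RAW_CSV, 2016))
```

Because every step is a function, the same publication can be regenerated from new data by calling run_pipeline again, and checks such as test_summarise can replace part of the manual quality assurance.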

 

What we can learn from RAP

Whilst methodologies like RAP solve only part of the data quality problem and can take time to implement when first introduced, there are a few lessons that make RAP a best practice in data technologies.

Open source technologies can greatly improve data analytics in government. RAP has proven to be a great success in improving reproducibility and reducing the production times of statistical publications. These improvements are hard to quantify, but the RAP proof of concept was reported to have reduced the time taken to produce the publication in future years by 75%, and the Ministry of Justice recently reported a time reduction from two weeks down to just over a day for one publication. The utility of RAP has been widely recognised across the UK government, and it continues to grow in popularity as an approach.

RAP is as much about people as it is about technology. Whilst the idea and technology behind RAP are clearly compelling, RAP would not have been as successful in the UK government had it not been for a community of motivated individuals pushing for its adoption. RAP has been one of the most compelling and widespread applications for the skills brought by data scientists, whom the UK government began recruiting in 2014/2015. It also became apparent that there is a limited number of analysts in government with the skills to support the work going forward. For RAP or similar methodologies to be really successful, they need to be supported by an approach to upskilling existing analytical staff and recruiting more highly skilled analysts.

Could RAP be implemented beyond the UK government and improve the production of statistical publications across Europe? If so, what are the current data capabilities inside Member States' governments and how can they be scaled up? What other processes could be greatly improved with open source tools?

Please share your thoughts on the topic. If you want to know more about the work behind RAP, you can consult the complete case study (at this link).