In April 2024, the UK AI Safety Institute (AISI) launched its open source AI evaluation platform: Inspect. With Inspect, AISI aims to contribute to the global development of safe AI models by offering a solution to the challenge of standardised AI testing.
Towards standardised safety evaluation for AI
AI benchmarking and safety evaluation are complex processes, notably due to the lack of transparency around AI models, both in terms of infrastructure and training data. Inspect responds to this by offering a software library and standardised testing techniques built on three components: Datasets, Solvers, and Scorers.
Datasets contain samples of evaluation tests (scenarios), including the prompts and target outputs. The tests are then carried out by Solvers, before Scorers aggregate the results to conclude the evaluation of the AI model. Through this process, users can test different capabilities of an AI model, including its core knowledge, ability to reason, and autonomous capabilities.
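As a minimal sketch of how these three components fit together, the following task follows the structure described in Inspect's documentation; the sample questions are invented for illustration, and exact parameter names may vary across versions of the library:

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def arithmetic_eval():
    # Dataset: evaluation samples pairing an input prompt with a target output
    dataset = [
        Sample(input="What is 12 + 30?", target="42"),
        Sample(input="What is 7 * 6?", target="42"),
    ]
    return Task(
        dataset=dataset,
        solver=generate(),  # Solver: sends each prompt to the model under test
        scorer=match(),     # Scorer: checks the model output against the target
    )
```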
A commitment towards open source and transparency
AISI published Inspect under the MIT open source licence. This allows AI actors worldwide to run their own testing by integrating Inspect with their AI models and building custom evaluation scripts to generate safety information. The source code and a usage guide are available on GitHub, and the built-in components can be augmented via third-party Python packages.
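As an illustration of such a custom evaluation script, the task sketched above could be run programmatically against a model of the user's choice; the model identifier below is only an example, and Inspect also provides a command line interface for the same purpose:

```python
from inspect_ai import eval

# Run the task defined earlier against a chosen model and collect the logs;
# the model identifier here is illustrative, not a recommendation
logs = eval(arithmetic_eval(), model="openai/gpt-4")
```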
Inspect is not only an open source AI evaluation tool, it is also the first such platform to be backed by a government: a joint collaboration between AISI – itself a directorate of the UK Department for Science, Innovation, and Technology – and the Incubator for AI (i.AI) – the UK government’s unit supporting the use of AI in the delivery of public services. Ian Hogarth, AISI Chair, highlighted that the decision to make Inspect open source not only benefits the continual improvement of the tool, it is also a way of giving back to the open source AI community: “We have been inspired by some of the leading open source AI developers [...]. This is our effort to contribute back.”
International impact
Inspect will be available globally and should be understood in the context of the US–UK partnership on AI safety testing announced in April 2024, which followed the AI Safety Summit at Bletchley Park in November 2023. Given policymakers’ increased interest in the development of safe AI models, including via regulatory requirements and proposals on safety and ethics principles, it will be interesting to see whether this leads to further government-developed open source solutions to test and encourage safe AI development.
Open source evaluation tools such as Inspect also suggest a potential approach to facilitating compliance with the EU AI Act. Transparency, risk assessments, and evaluations are key requirements for providers of high-risk AI systems under the Act. While precise evaluation processes are yet to be outlined in the upcoming Codes of Practice, offering open source evaluation tools can be one way to facilitate compliance and thereby foster innovation in the EU.
Featured image by Markus Spiske via Unsplash.