
Evaluation Bundesbot
Proof of concept for an automatic evaluation of an ITZBund Chatbot using a language model
Description
This project provides ideas for an automatic evaluation of an instance of the ITZBund chatbot (aka Bundesbot, see e.g. https://www.itzbund.de/DE/itloesungen/standardloesungen/chatbots/chatbots.html). It should be seen as a proof of concept and as a source of ideas for others who are tasked with a similar problem. In this project, we use a large language model (LLM) to generate questions that might be sent to the chatbot. The newly generated questions include variations of the pre-defined questions with or without typos, translations, and new questions on a given topic. The answers are retrieved from the chatbot via API calls and checked against the pre-defined set of answers. Additionally, the LLM rates each answer on a scale from 1 to 3, judging whether it is a valid response to the question.
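The evaluation loop described above can be sketched roughly as follows. This is only an illustrative sketch: the `query_chatbot` and `ask_llm` callables stand in for the actual Bundesbot API and the LLM backend, whose concrete interfaces are not specified here.

```python
def rate_answer(question: str, answer: str, ask_llm) -> int:
    """Ask the LLM to rate an answer on a scale from 1 (invalid) to 3 (valid).

    `ask_llm` is a placeholder callable wrapping whatever LLM backend is
    used; its interface is an assumption of this sketch.
    """
    prompt = (
        "Rate on a scale from 1 to 3 whether the answer is a valid "
        f"response to the question.\nQuestion: {question}\nAnswer: {answer}\n"
        "Reply with a single digit."
    )
    reply = ask_llm(prompt)
    digits = [c for c in reply if c.isdigit()]
    score = int(digits[0]) if digits else 1  # default to lowest score
    return min(max(score, 1), 3)  # clamp to the 1..3 range


def evaluate(questions, query_chatbot, ask_llm):
    """Send each question to the chatbot and collect the LLM's rating."""
    results = []
    for q in questions:
        answer = query_chatbot(q)  # placeholder for the Bundesbot API call
        results.append({"question": q, "answer": answer,
                        "score": rate_answer(q, answer, ask_llm)})
    return results
```

Parsing the LLM's reply defensively (extracting the first digit and clamping it) keeps the score usable even when the model does not answer with a single digit as instructed.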
Features
- parsing the JSON export of the Bundesbot
- generating new questions using a large language model
- evaluating the chatbot response
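For the first feature, a minimal sketch of parsing the export could look like this. The actual export format of the Bundesbot is not documented here, so the JSON structure below (an `intents` list with `question`/`answer` fields) is purely a hypothetical example.

```python
import json

# Hypothetical excerpt of a Bundesbot JSON export; the real export
# format may differ, so this structure is only an assumption.
EXPORT = """
{
  "intents": [
    {"question": "Wie beantrage ich einen Account?",
     "answer": "Bitte nutzen Sie das Antragsformular."}
  ]
}
"""

def parse_export(raw: str):
    """Return the pre-defined (question, answer) pairs from a JSON export."""
    data = json.loads(raw)
    return [(item["question"], item["answer"]) for item in data["intents"]]
```

The resulting pairs provide both the seed questions for generating variations and the reference answers against which the chatbot's responses are checked.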