With the release of its first large language models (LLMs), the EuroLLM project aims to offer a competitive multilingual and multimodal LLM for all 24 official languages of the European Union. Launched in September 2024 and released under the Apache 2.0 open source licence, EuroLLM’s first models exemplify how artificial intelligence (AI) can be tailored to Europe’s language diversity whilst encouraging an innovative European AI ecosystem.
Developing a competitive European LLM
EuroLLM’s first models, EuroLLM-1.7B and its instruction-tuned companion EuroLLM-1.7B-Instruct, aim to provide European users with a competitive LLM that can receive prompts and generate text in all official European languages. Established LLMs typically focus on English and a few widely spoken languages. EuroLLM, in contrast, addresses this gap by covering a wide range of languages.
To build multilingual and multimodal capabilities, EuroLLM trained its EuroLLM-1.7B model on an extensive dataset of 4 trillion tokens, drawn from different data sources and covering all the languages considered. For instruction fine-tuning, the EuroLLM-1.7B-Instruct model was further developed using EuroBlocks, a multilingual dataset built by EuroLLM for instruction-following tasks.
In a paper published on arXiv, the developers report that the EuroLLM models performed comparably to, and in some instances better than, other models on prominent LLM benchmarks such as HellaSwag and ARC Challenge. EuroLLM-1.7B-Instruct notably outperformed Gemma-2B, Google’s “open model” built from the same research as its Gemini models.
An open source project rooted in European innovation
By developing LLMs available in all official European languages, as well as other major languages such as Russian, Arabic, and Chinese, EuroLLM provides European and global users with access to competitive AI technology in their preferred languages.
EuroLLM is a project co-funded by the European Union and run by a consortium of nine partners, including leading European universities, established technical research labs and AI translation companies, based in Europe and beyond. The project is also linked to the European High Performance Computing Joint Undertaking (EuroHPC JU). As such, it is part of a broader strategic goal of creating a competitive and innovative European AI ecosystem. With both released models published under an open source licence, including open weights, the project has the potential to encourage European open source innovation in artificial intelligence.
Presenting itself as an open source and “open weight” project, EuroLLM published its models on Hugging Face upon release. The decision to open source the models also provides a basis for further innovative development, and demonstrates how the EU’s supercomputing facility can be used to support open source innovation.