As the world gradually uncovers the potential of artificial intelligence, language models have long been a significant focus within scientific research. While models such as GPT, Claude, and Mistral have demonstrated their ability to perform a wide variety of tasks, they are not without flaws, and many studies continue to explore ways to enhance their capabilities.
In a recent study, the SEMIC team delved into the realm of domain adaptation, investigating how retraining Large Language Models (LLMs) with public sector data could enhance AI performance specifically for public sector-related tasks.
Adapting LLMs for the Public Sector
Domain adaptation of Large Language Models has emerged as one of the most active topics in recent research on language modelling. This surge in interest stems from observations that LLMs tend to underperform when applied to highly specific domains. To address this challenge, domain adaptation enriches these models with domain-specific data, thereby refining their understanding of the unique nuances of specialised language. By tailoring the training data to specific sectors, researchers hope to significantly enhance the models' performance and accuracy in these targeted areas.
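To make the mechanics concrete, below is a minimal sketch of one common form of domain adaptation: continuing a masked-language-model (MLM) objective on an in-domain corpus with the Hugging Face transformers library. The base model, corpus file, and hyperparameters are illustrative placeholders, not the setup used in the study.

```python
# Minimal sketch of domain-adaptive pretraining: continue the masked-language-model
# objective on an in-domain corpus. All names and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# "domain_corpus.txt" (hypothetical file): one in-domain document or paragraph per line.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# The collator randomly masks 15% of tokens, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-domain-adapted",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-domain-adapted")
```

The result is an encoder with the same architecture as the base model but with weights nudged towards the vocabulary and phrasing of the target domain.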
In their study, the SEMIC team set out to explore the benefits of domain adaptation for the public sector. More specifically, they focused on improving the performance of LLMs for a specific use case within the European public service: clustering pledges on the Transition Pathway for Tourism (see details on the use case here).
To achieve this, they compiled a targeted corpus of official documents and legislation related to the Transition Pathway for Tourism. These datasets were used to further train two existing language models, BERT and RoBERTa, thereby adapting them to the domain of interest.
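For the clustering use case, a domain-adapted encoder can then be used to embed each pledge before grouping similar ones. The sketch below assumes mean-pooled last-hidden-state embeddings fed into k-means; the pooling strategy, cluster count, and example texts are assumptions for illustration, as the study's exact pipeline may differ.

```python
# Sketch: embed pledge texts with the adapted encoder, then cluster with k-means.
# Mean pooling and the cluster count (5) are illustrative assumptions.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-domain-adapted")
model = AutoModel.from_pretrained("bert-domain-adapted")
model.eval()

pledges = ["Pledge text one ...", "Pledge text two ..."]  # hypothetical inputs

with torch.no_grad():
    enc = tokenizer(pledges, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state             # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()  # exclude padding from the mean
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

labels = KMeans(n_clusters=5, random_state=0).fit_predict(embeddings.numpy())
print(labels)  # cluster assignment for each pledge
```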
Key findings and implications
To evaluate the impact of the domain adaptation, the SEMIC team employed an innovative approach using GPT-4 to assess the quality of clusters created with the different models. The results revealed that domain adaptation had a nuanced impact on clustering performance. While domain-adapted LLMs enhanced the interpretability and usability of clusters, the study also showed that a simpler model such as Word2Vec could produce even better results. This points to a trade-off between model sophistication and practical usability, suggesting that in some cases simplicity can outperform complexity.
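The LLM-as-judge pattern behind this evaluation can be sketched as follows; the prompt wording and the 1-to-5 rubric are hypothetical stand-ins, since the study's actual evaluation protocol is not reproduced here. Only the overall pattern, asking GPT-4 to rate a cluster's coherence, mirrors the approach described above.

```python
# Hedged sketch of LLM-as-judge cluster scoring; the prompt and rubric are invented
# for illustration. Requires the openai package (v1+) and OPENAI_API_KEY to be set.
from openai import OpenAI

client = OpenAI()

def judge_cluster(texts: list[str]) -> str:
    """Ask GPT-4 to rate a cluster's thematic coherence and propose a label."""
    sample = "\n".join(f"- {t}" for t in texts[:10])  # cap the prompt size
    prompt = (
        "The following documents were grouped into one cluster:\n"
        f"{sample}\n\n"
        "Rate the cluster's thematic coherence from 1 (incoherent) to 5 (very "
        "coherent), then suggest a short label for the cluster."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```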
Showcasing research at NLDB 2024
Given the growing interest in language models and domain adaptation, the SEMIC team's study was accepted for presentation at the NLDB 2024 conference as part of the industry track.
Organised in Turin, the 29th Annual International Conference on Natural Language and Information Systems (NLDB) focused this year on large language models, transparency and bias in AI, multimodal models combining NLP and Computer Vision, and conversational AI. Over three days, researchers from academia and industry presented the latest research and industrial applications of NLP across information systems.
For SEMIC, this conference represented an opportunity to share their findings with the scientific community and receive feedback on their work. Following the presentation, the study will be published as a conference paper in the NLDB 2024 proceedings by Springer and on the SEMIC Support Center in September 2024.
Conclusion
Overall, the SEMIC team's research offers valuable insights into the application of domain adaptation and LLMs in public services. It demonstrated that domain adaptation can be a relevant approach for improving an AI model's ability to understand the particularities of domain-specific language. This improved understanding could lead to more accurate and efficient data processing, thereby enhancing interoperability across different public administrations.
Furthermore, by working with a limited domain corpus, the SEMIC team illustrated what can be achieved even with modest efforts, contributing to the body of research on natural language processing techniques in public services. The study also underscored the importance of caution when using complex AI models. It highlighted that increased complexity does not necessarily guarantee higher quality.
One of the next crucial steps in this research is refining the approach for comparing LLM accuracy in real-world applications. While theoretical metrics provide valuable benchmarks, they often do not align with the subjective perceptions of users. Real-world effectiveness depends not only on objective measures but also on how users experience and interact with these models in practical scenarios. This discrepancy emphasises the need for more nuanced evaluation strategies that consider both the quantitative and qualitative aspects of LLM performance in domain-specific applications.