LettuceDetect: token-level hallucination detection for RAG outputs
Abstract
Retrieval-augmented generation (RAG) systems remain vulnerable to hallucinated answers despite drawing on external knowledge sources. LettuceDetect addresses two limitations of existing hallucination-detection methods: the context-window constraints of traditional encoder-based approaches and the computational cost of LLM-judge approaches. The model is a token classifier built on ModernBERT (English, up to 8k tokens of context) and EuroBERT (multilingual) and trained on RAGTruth; it processes (context, question, answer) triples and flags the answer spans that the context does not support. On RAGTruth example-level detection, LettuceDetect reaches an F1 of 79.22%, a 14.8% improvement over the previous best encoder-based architecture, while processing 30 to 60 examples per second on a single GPU.
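Because the model labels individual answer tokens, a post-processing step has to turn those labels into the "unsupported spans" a user sees, and into a single yes/no flag per example. The sketch below shows one plausible way to do that, assuming each answer token comes with hypothetical `(start, end, label)` character offsets and a binary label (1 = unsupported). The helper names `tokens_to_spans` and `example_is_hallucinated` are illustrative and are not part of the LettuceDetect library, whose actual output format and aggregation may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical token record: (start, end, label), where start/end are character
# offsets into the answer and label is 1 = unsupported, 0 = supported.
# This is an illustrative assumption, not LettuceDetect's real output format.
Token = Tuple[int, int, int]


def tokens_to_spans(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive unsupported tokens into character spans."""
    spans: List[Tuple[int, int]] = []
    current: Optional[List[int]] = None  # [start, end] of the open span
    for start, end, label in tokens:
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end  # extend the open span (covers gaps between tokens)
        elif current is not None:
            spans.append((current[0], current[1]))  # close the span
            current = None
    if current is not None:
        spans.append((current[0], current[1]))  # span running to the end
    return spans


def example_is_hallucinated(tokens: List[Token]) -> bool:
    """Example-level flag: True if at least one token is labeled unsupported."""
    return any(label == 1 for _, _, label in tokens)


answer = "Paris is the capital of France, founded in 1999."
# Hand-written offsets and labels, for illustration only.
tokens = [
    (0, 5, 0), (6, 8, 0), (9, 12, 0), (13, 20, 0), (21, 23, 0),
    (24, 30, 0), (30, 31, 0), (32, 39, 1), (40, 42, 1), (43, 47, 1), (47, 48, 0),
]
spans = tokens_to_spans(tokens)
print([answer[s:e] for s, e in spans])  # ['founded in 1999']
print(example_is_hallucinated(tokens))  # True
```

Extending the open span to each new token's end offset means the whitespace between flagged tokens is included, so the reported span reads as a contiguous phrase.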
TL;DR
- Token-level hallucination detection for RAG, trained on RAGTruth.
- Encoder-based, 8k-token context, around 30 times smaller than the best LLM-judge models.
- F1 79.22% on RAGTruth example-level detection.
- Open-source library and models published under KRLabsOrg on GitHub and Hugging Face.
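The headline number is an example-level F1. As a reminder of what that metric counts, here is a minimal sketch, assuming (as is usual for RAGTruth-style evaluation) that an example is positive when it contains at least one hallucinated span and that F1 is computed on that positive class. The function name `example_level_f1` and the toy labels are invented for illustration and have no connection to the reported results.

```python
from typing import Sequence


def example_level_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 with 'example contains a hallucination' as the positive class (assumed)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))        # correctly flagged
    fp = sum((not g) and p for g, p in zip(gold, pred))  # flagged but clean
    fn = sum(g and (not p) for g, p in zip(gold, pred))  # missed hallucinations
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (not RAGTruth data): True = example contains a hallucination.
gold = [True, True, False, False, True, False]
pred = [True, False, False, True, True, False]
print(round(example_level_f1(gold, pred), 4))  # 0.6667 (tp=2, fp=1, fn=1)
```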
Resources
The paper, benchmarks, and ablations are on arXiv. The library, training scripts, and the published model family (English base and large, EuroBERT, Hungarian, TinyLettuce) are under github.com/KRLabsOrg/LettuceDetect and huggingface.co/KRLabsOrg.
Cite
@article{kovacs-recski-2025-lettucedetect,
title = {{LettuceDetect}: A Hallucination Detection Framework for {RAG} Applications},
author = {Kov{\'a}cs, {\'A}d{\'a}m and Recski, G{\'a}bor},
journal = {arXiv preprint arXiv:2502.17125},
year = {2025},
url = {https://arxiv.org/abs/2502.17125}
}