LettuceDetect: token-level hallucination detection for RAG outputs

Ádám Kovács, Gábor Recski
Preprint, arXiv:2502.17125
2025-02-24
evaluation

Abstract

Retrieval-augmented generation systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. LettuceDetect addresses two limitations of existing hallucination-detection methods: the context-window constraints of traditional encoder approaches, and the computational inefficiency of LLM-judge approaches. Built on ModernBERT (English, 8k tokens) and EuroBERT (multilingual), trained on RAGTruth, the model is a token-classification system that processes (context, question, answer) triples and flags unsupported spans. On RAGTruth example-level detection, LettuceDetect reaches an F1 of 79.22%, a 14.8% improvement over the previous best encoder-based architecture, while running at 30 to 60 examples per second on a single GPU.
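
To make the token-classification framing concrete, here is a minimal sketch of how per-token labels over an answer could be turned into flagged spans and an example-level decision. It is an illustration under stated assumptions, not the library's implementation: whitespace tokens stand in for the model's subword tokens, the labels are written by hand rather than predicted, and the names `Span`, `token_offsets`, `merge_flagged_tokens`, and `is_hallucinated` are hypothetical. Merging adjacent flagged tokens into one span, and treating an answer as hallucinated when any token is flagged, are likewise assumptions about how such outputs can be aggregated.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    start: int  # character offset into the answer, inclusive
    end: int    # character offset into the answer, exclusive
    text: str


def token_offsets(answer: str) -> list[tuple[int, int]]:
    """Character offsets of whitespace-separated tokens (a stand-in for a real tokenizer)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", answer)]


def merge_flagged_tokens(answer: str, offsets: list[tuple[int, int]], labels: list[int]) -> list[Span]:
    """Merge runs of consecutive flagged tokens (label 1) into character spans."""
    spans: list[Span] = []
    current: list[int] | None = None  # [start, end] of the span being built
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]   # open a new span
            else:
                current[1] = end         # extend the open span
        elif current is not None:
            spans.append(Span(current[0], current[1], answer[current[0]:current[1]]))
            current = None
    if current is not None:              # close a span that runs to the last token
        spans.append(Span(current[0], current[1], answer[current[0]:current[1]]))
    return spans


def is_hallucinated(labels: list[int]) -> bool:
    """Example-level decision: flagged if any token is flagged (an assumption)."""
    return any(label == 1 for label in labels)


# Hand-written labels for illustration only; a trained model would predict these.
answer = "The capital of France is Paris, home to 10 million people."
offsets = token_offsets(answer)
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # "10" and "million" marked unsupported
spans = merge_flagged_tokens(answer, offsets, labels)
```

In this toy input, the two flagged tokens are adjacent, so they collapse into a single span covering "10 million", and the answer as a whole counts as hallucinated at the example level.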

TL;DR

Token-level hallucination detection for RAG, trained on RAGTruth.
Encoder-based, 8k-token context, around 30 times smaller than the best LLM-judge models.
F1 79.22% on RAGTruth example-level detection.
Open-source library and models published under KRLabsOrg on GitHub and Hugging Face.

Resources

The paper, benchmarks, and ablations are on arXiv. The library, training scripts, and the published model family (English base and large, EuroBERT, Hungarian, TinyLettuce) are under github.com/KRLabsOrg/LettuceDetect and huggingface.co/KRLabsOrg.

Cite

@article{kovacs-recski-2025-lettucedetect,
  title   = {{LettuceDetect}: A Hallucination Detection Framework for {RAG} Applications},
  author  = {Kov{\'a}cs, {\'A}d{\'a}m and Recski, G{\'a}bor},
  journal = {arXiv preprint arXiv:2502.17125},
  year    = {2025},
  url     = {https://arxiv.org/abs/2502.17125}
}

Related

KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering
