VerbatimRAG
Retrieval-augmented generation where the answer is assembled from source text the system can point back to, with the goal of zero unsupported factual claims.
Live now: Ask the ACL Anthology with cited source spans
What it does
VerbatimRAG replaces free-form factual generation with source-span extraction, so the target is zero hallucinated factual claims in the final answer.
Most RAG systems retrieve documents, then let a language model write a free-form answer that draws on them. That final step is where unsupported claims enter: the model can paraphrase too far, merge sources, or invent a fact that was never in the retrieved context.
VerbatimRAG moves the factual work into extraction. A query-conditioned span extractor identifies source passages that answer the question. The answer is then assembled from those passages, so the reader can inspect the exact text behind each cited claim.
The key property is not that the answer sounds plausible. It is that every factual claim is either replayable against source text, source id, and character offsets, or left out.
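The replay property described above can be illustrated with a small check. This is a minimal sketch, not the package's implementation: the `Citation` class, the `replay` function, and the sample document are hypothetical names introduced here, and end offsets are assumed to be exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record: source id, character offsets, quoted text."""
    source_id: str
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)
    text: str   # span text as it appears in the answer


def replay(citation: Citation, sources: dict[str, str]) -> bool:
    """Return True only when the quoted text matches the source at the stated offsets."""
    source = sources.get(citation.source_id)
    if source is None:
        return False  # unknown source id: the claim cannot be replayed
    if not (0 <= citation.start < citation.end <= len(source)):
        return False  # offsets fall outside the document
    return source[citation.start:citation.end] == citation.text


sources = {"doc_1": "The extractor supports an 8,192-token context window."}
good = Citation("doc_1", 4, 13, "extractor")
bad = Citation("doc_1", 4, 13, "generator")

print(replay(good, sources))
print(replay(bad, sources))
```

A claim that fails this kind of check would be dropped rather than shown, which is the "or left out" half of the property.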
- model: 150M parameters. ModernBERT extractor, small enough to run as a dedicated evidence model.
- context: 8,192-token context window for long retrieved passages.
- dataset: 195k training rows in the public verbatim-spans dataset.
- licence: MIT code licence for the VerbatimRAG package.
How it works
The pipeline is designed around one constraint: if the source text does not contain the fact, the answer should not present it.
VerbatimRAG moves the factual decision before generation. The extractor selects source spans; the answer can only cite what passed that boundary.
Source text with offsets
doc_14 · chars 1482 to 1533
The extractor runs on ModernBERT and supports an 8,192-token context window for long-form context.
doc_27 · chars 904 to 957
Each predicted span returns character offsets relative to the source for replay against the original document.
ModernBERT span extractor
- input: question + retrieved_context[]
- decision: token labels for supported answer spans
- failure mode: no selected span means no grounded answer
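One way token labels can become character spans is to merge consecutive positive tokens and read their offsets back from the source. The sketch below is an illustration under stated assumptions, not the model's actual decoding: `labels_to_spans` is a hypothetical helper, tokens are split on whitespace instead of using the real tokenizer, the scores are made up, and the 0.5 threshold is an arbitrary illustrative value.

```python
import re


def labels_to_spans(text, offsets, scores, threshold=0.5):
    """Merge runs of tokens scoring at or above the threshold into (start, end, text) spans."""
    spans = []
    current = None  # [start, end] of the span currently being built
    for (start, end), score in zip(offsets, scores):
        if score >= threshold:
            if current is None:
                current = [start, end]  # open a new span at this token
            else:
                current[1] = end        # extend the open span to this token
        elif current is not None:
            spans.append(tuple(current))  # a negative token closes the span
            current = None
    if current is not None:
        spans.append(tuple(current))
    return [(s, e, text[s:e]) for s, e in spans]


text = "Aspirin reduces fever. It is cheap."
# Whitespace tokens stand in for the tokenizer's offset mapping (assumption).
offsets = [m.span() for m in re.finditer(r"\S+", text)]
scores = [0.9, 0.8, 0.7, 0.1, 0.1, 0.6]  # made-up per-token scores

spans = labels_to_spans(text, offsets, scores)
for span in spans:
    print(span)
```

When no token clears the threshold the function returns an empty list, which corresponds to the failure mode above: no selected span, no grounded answer.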
Cited answer
The extractor uses a ModernBERT backbone that supports an 8,192-token context window [1]. Each cited span carries character offsets relative to the source [2], so the claim can be checked against the original document.
- answer.text: Readable response assembled from selected spans.
- citations[]: Marker, source id, span text, character offsets.
- abstain: Returned when the extractor finds no supporting span.
VerbatimRAG treats answer generation as evidence selection before language generation. Retrieval supplies candidate context, the extractor selects the source spans that can support an answer, and generation is limited to arranging those spans into readable prose. That is the basis for the zero-unsupported-claims target. The output includes citations, source ids, and character offsets, and if no supporting span is found, the system should abstain rather than ask the LLM to fill the gap.
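The assemble-or-abstain step can be sketched as follows. This is a simplified illustration, not the package's code: `assemble_answer` and the dictionary layout are hypothetical, and here the "generation" step is reduced to joining spans with numbered markers, where the real pipeline may arrange them into more readable prose.

```python
def assemble_answer(spans):
    """Build an answer from selected spans only, or abstain when there are none.

    Each span is a dict with source_id, start, end, and text (assumed layout).
    """
    if not spans:
        # No supporting evidence: abstain instead of asking a model to fill the gap.
        return {"text": None, "citations": [], "abstain": True}

    parts, citations = [], []
    for n, span in enumerate(spans, start=1):
        marker = f"[{n}]"
        parts.append(f"{span['text']} {marker}")
        citations.append({"marker": marker, **span})
    return {"text": " ".join(parts), "citations": citations, "abstain": False}


selected = [
    {"source_id": "doc_14", "start": 0, "end": 34, "text": "The extractor runs on ModernBERT."},
]
answer = assemble_answer(selected)
print(answer["text"])
print(assemble_answer([])["abstain"])
```

Because the answer text is built only from entries in `spans`, nothing reaches the reader that did not first pass the extraction boundary.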
Results
The public evidence is split across two evaluations: the BioNLP paper tests the full verbatim pipeline in clinical QA, while the Hugging Face model card shows the generic extractor leading every reported Word-F1 slice.
- Clinical QA: 42.01%. Full VerbatimRAG pipeline score on ArchEHR-QA 2025, placing in the top 10 on core metrics.
- ACL evidence selection: 0.463 Word-F1. Best Word-F1 on ACL gold.
- RAG domains: 0.618 Word-F1. Best Word-F1 across multi-domain QA, including finance, medical, legal, and product-manual sources.
- Generalization: 0.588 / 0.513 Word-F1. Best Word-F1 on the Squeez tool-output pruning and QASPER scientific QA slices, respectively.
The clinical-QA result evaluates the full VerbatimRAG pipeline. The Word-F1 scores evaluate the public span extractor: whether it selects the same evidence text as the reference annotations. The strongest claim we can make from the public model card is specific: best Word-F1 on ACL gold, RAGBench, Squeez, and QASPER, not a blanket claim over all RAG systems. For setup, baselines, and slice-level details, read the BioNLP paper and the Hugging Face model card.
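For readers unfamiliar with the metric, a word-overlap F1 can be computed as below. This is a sketch under assumptions: it uses the common bag-of-words formulation with lowercasing and whitespace tokenization, and the model card's exact normalization and averaging may differ. The function name `word_f1` is introduced here for illustration.

```python
from collections import Counter


def word_f1(predicted: str, reference: str) -> float:
    """Bag-of-words F1 between predicted and reference evidence text."""
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred and not ref:
        return 1.0  # both empty: treated as perfect agreement (assumed convention)
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared word at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(word_f1("the cat", "the cat sat"))
```

Under this formulation an extractor is penalized both for selecting extra text (lower precision) and for missing annotated evidence (lower recall).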
Built with VerbatimRAG
VerbatimRAG is not only a package. We use it as the evidence layer for live corpus products, agent tooling, and hosted API workflows.
These surfaces show the same architecture at different levels: a user-facing research tool, a reusable API, an MCP server for agent contexts, and a Claude Code skill for working directly inside development workflows.
- Hosted API: Verbatim API. Hosted query and transform endpoints for teams that want to test source-span extraction before running a full retrieval stack.
- Claude Code: verbatim-acl-skill. Claude Code workflow for searching ACL papers and transforming source context into cited answers.
- MCP: verbatim-mcp. MCP server for querying academic papers, retrieving metadata, and exporting citations inside agent environments.
For developers
Use the lightweight transform when you already have context, or the full package when you want retrieval, extraction, and cited answers in one stack.
Inspect the moving parts
The package, model, dataset, and paper are public so teams can inspect the implementation before adopting the hosted workflow.
- GitHub: Core VerbatimRAG package, examples, docs, web interface, and tests.
- PyPI: verbatim-rag. Full RAG package for retrieval, span extraction, and cited answers.
- PyPI: verbatim-core. Lightweight transform package with a small dependency surface.
- Model: verbatim-rag-modern-bert-v2. 150M ModernBERT token classifier for query-conditioned evidence spans.
- Dataset: verbatim-spans. Multi-domain evidence-selection data with ACL, RAGBench, Squeez, and related sources.
- Paper: ArchEHR-QA 2025. The BioNLP shared-task paper describing the original verbatim pipeline.
Start with the package boundary
Use verbatim-core when your application already has retrieved context. Use verbatim-rag when you also want indexing, retrieval, extraction, and cited answer assembly.
Install
Add the model extra when the lightweight transform should use a local ModernBERT extractor.
pip install "verbatim-core[model]"
pip install verbatim-rag
Transform provided context
Use verbatim-core when your application already has question and context pairs. Pass your own extractor when span selection should run through a local model.
from verbatim_core import VerbatimTransform
from verbatim_core.extractors import ModelSpanExtractor

# Load the public ModernBERT span extractor as the local evidence model.
extractor = ModelSpanExtractor(
    model_path="KRLabsOrg/verbatim-rag-modern-bert-v2",
    threshold=0.2,
    device=None,
)
transform = VerbatimTransform(extractor=extractor)

# Pass the question together with context you have already retrieved.
response = transform.transform(
    question="What is the main finding?",
    context=[
        {
            "content": "The study found that X leads to Y.",
            "title": "Paper A",
        },
        {
            "content": "Results show Z is significant.",
            "title": "Paper B",
        },
    ],
)

# Print the assembled answer, then each supporting span with its character offsets.
print(response.answer)
for document in response.documents:
    for highlight in document.highlights:
        print(document.title, highlight.start, highlight.end, highlight.text)

Retrieve and answer
Use verbatim-rag when the system should retrieve candidate passages before extraction and answer assembly.
from verbatim_rag import VerbatimIndex, VerbatimRAG
from verbatim_rag.ingestion import DocumentProcessor
from verbatim_rag.vector_stores import LocalMilvusStore
from verbatim_rag.embedding_providers import SpladeProvider

# Fetch and process the source document.
processor = DocumentProcessor()
document = processor.process_url(
    url="https://aclanthology.org/2025.bionlp-share.8.pdf",
    title="KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering",
    metadata={"authors": ["Adam Kovacs", "Paul Schmitt", "Gabor Recski"]},
)

# Sparse retrieval only: a SPLADE-style encoder backed by a local Milvus store.
sparse = SpladeProvider(
    model_name="opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill",
    device="cpu",
)
store = LocalMilvusStore(
    db_path="./index.db",
    collection_name="verbatim_rag",
    enable_dense=False,
    enable_sparse=True,
)

# Build the index and add the processed document.
index = VerbatimIndex(
    vector_store=store,
    sparse_provider=sparse,
)
index.add_documents([document])

# Retrieve, extract supporting spans, and assemble the cited answer.
rag = VerbatimRAG(index)
response = rag.query("What is the main contribution of the paper?")
print(response.answer)

Paper reference
@inproceedings{kovacs-etal-2025-kr,
    title = {{KR} Labs at {A}rch{EHR}-{QA} 2025: A Verbatim Approach for Evidence-Based Question Answering},
    author = {Kovacs, Adam and Schmitt, Paul and Recski, Gabor},
    booktitle = {Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)},
    year = {2025},
    address = {Vienna, Austria},
    publisher = {Association for Computational Linguistics},
    pages = {69--74},
    url = {https://aclanthology.org/2025.bionlp-share.8/}
}

Compatibility and licensing
The public stack separates code, model, and dataset artefacts so technical and legal teams can review the adoption boundary.
- code: verbatim-rag and verbatim-core, MIT licence.
- model: KRLabsOrg/verbatim-rag-modern-bert-v2, Apache-2.0. ModernBERT token classification with an 8,192-token context window.
- dataset: KRLabsOrg/verbatim-spans, Apache-2.0. Multi-domain evidence-selection data.
- runtime: Python package, hosted API, self-hosted service, local experiments, MCP server, and Claude Code workflow.
Combine with the rest of the stack
Each product can run on its own. Together, they turn an LLM answer into something a team can inspect, reject, or enforce.
VerbatimRAG is the evidence extraction layer in the pipeline: retrieval supplies candidate context, then VerbatimRAG selects the source spans an answer is allowed to rely on. Pair it with a detector to check whether the answer is supported by the retrieved context, and a rules layer to apply the domain constraints that should govern the output. The result is not just an answer, but a record of what was retrieved, what was checked, and which rules fired.
Trace the answer back to the source
Start with the evaluation details, or inspect the repository for the extractor, API, examples, and deployment surface.