[technology]
The trust and control layer for production AI systems.
Small, specialised, inspectable systems around the LLM. KR Labs develops open-source, state-of-the-art small models and frameworks for grounded generation, hallucination detection, and rule enforcement. Even frontier LLMs hallucinate about 10% of the time in RAG settings; in agentic systems, those errors compound.
From retrieval to evidence
Similarity is useful. It is not the same thing as evidence.
Figure: Two retrieval flows side by side. Standard generative RAG returns a free-text answer with no guaranteed link back to its source. Grounded RAG returns a cited answer where each claim points to a verbatim span in the source documents.
Standard generative RAG has known failure modes when the answer must be auditable: source-level traceability is not guaranteed, similarity is not grounding, and an LLM is still free to invent. RAGTruth measures this directly in RAG responses [1]. KR Labs publishes both sides of the response: extraction-based RAG for answers that must point back to source text [2] and hallucination detection for generated answers that still need a support check [3].
The alternative is an LLM pipeline with explicit evidence boundaries: VerbatimRAG extracts the source spans an answer is allowed to use, LettuceDetect checks unsupported generated text, and RuleChef enforces hard rules where the domain requires them. The goal is not to make the upstream LLM perfect; it is to make unsupported final answers detectable, blockable, or impossible to generate.
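One way to picture an explicit evidence boundary is a check that every cited span is a verbatim slice of its source document. The sketch below is illustrative only: `CitedSpan`, `is_verbatim`, and `unsupported_spans` are hypothetical names invented for this example and are not the VerbatimRAG API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CitedSpan:
    """Evidence for one claim: document id plus character offsets (hypothetical schema)."""
    doc_id: str
    start: int
    end: int
    text: str


def is_verbatim(span: CitedSpan, documents: dict[str, str]) -> bool:
    """True only if the span text equals the source text at the stated offsets."""
    source = documents.get(span.doc_id)
    if source is None:
        return False
    if not (0 <= span.start < span.end <= len(source)):
        return False
    return source[span.start:span.end] == span.text


def unsupported_spans(spans: list[CitedSpan], documents: dict[str, str]) -> list[CitedSpan]:
    """Collect spans that fail the verbatim check so the answer can be blocked."""
    return [s for s in spans if not is_verbatim(s, documents)]


# Made-up example data.
documents = {"policy.txt": "Refunds are issued within 14 days of a written request."}
quote = "within 14 days"
start = documents["policy.txt"].find(quote)
good = CitedSpan("policy.txt", start, start + len(quote), quote)
bad = CitedSpan("policy.txt", start, start + len(quote), "within 30 days")

print(unsupported_spans([good, bad], documents))  # only the altered span is reported
```

Because the check is plain string comparison, an auditor can replay it against the source documents without a model in the loop.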
The technology family
Three specialised systems in your LLM pipeline: one grounds the answer, one checks it, one enforces the rules around it.
- VerbatimRAG selects source spans before answer assembly, so claims can be cited, replayed, and audited. Built with SOTA small encoder models for evidence extraction. Live now: Ask the ACL Anthology.
- LettuceDetect flags unsupported answer spans with 79.22% RAGTruth F1 while running 30 to 60 examples per second on a single GPU. Built with SOTA small encoder models for hallucination detection.
- RuleChef provides rule-grounded reasoning for systems where business and regulatory constraints must be enforced. It learns regex, Python, and spaCy patterns from labelled examples and runs locally with no LLM at inference time.
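To make "no LLM at inference time" concrete, here is a minimal sketch of applying regex rules locally once they exist. It covers only the inference side, not how rules are learned. The `Rule` type, the rule names, and the patterns are assumptions made up for illustration; they are not RuleChef's API or its output format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """A named regex rule (hypothetical format)."""
    name: str
    pattern: re.Pattern


# Illustrative rules only; a real deployment would load its own.
RULES: tuple[Rule, ...] = (
    Rule("no_guaranteed_returns", re.compile(r"\bguaranteed\s+returns?\b", re.IGNORECASE)),
    Rule("no_raw_iban", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),
)


def fired_rules(text: str, rules: Sequence[Rule] = RULES) -> list[tuple[str, str]]:
    """Return (rule name, matched text) for every match, in rule order."""
    hits: list[tuple[str, str]] = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            hits.append((rule.name, match.group(0)))
    return hits


print(fired_rules("This fund offers guaranteed returns."))
print(fired_rules("Past performance is not a guide to future results."))
```

Each hit carries the rule name and the exact matched text, which is the kind of detail a rule log needs.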
Common design principles
Across the stack, the design goal is the same: make the system easier to inspect than to take on faith.
- verifiable: Every output traces back to a span, a rule, or a labelled example. Nothing is asserted that cannot be checked.
- model-agnostic: Any compliant LLM at the generation step; encoder models we publish at the retrieval and detection steps. No vendor lock-in.
- EU-AI-Act-aligned: Designed so the obligations under Articles 9, 13, 14, and 15 can be evidenced, not asserted.
- audit-ready: Outputs come with the trail an auditor needs: sources cited, claims classified, rules logged.
How they compose
Each product can run on its own. Together, they turn an LLM answer into something a team can inspect, reject, or enforce.
VerbatimRAG grounds the answer in cited source spans. LettuceDetect checks whether the answer is supported by the retrieved context. RuleChef applies the domain rules that should govern the output. The result is not just an answer, but a record of what was retrieved, what was checked, and which rules fired.
Figure: Documents feed retrieval. VerbatimRAG extracts cited spans, RuleChef formalizes constraints, an LLM generates the answer, and LettuceDetect checks the output against the evidence.
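The composition can be sketched as a function that threads four steps together and returns a record of what was retrieved, what was checked, and which rules fired. Everything here is an assumption for illustration: `AuditRecord` and `run_pipeline` are hypothetical names, and the lambdas are trivial placeholders (keyword filter, canned answer, exact-match check) standing in for the real components, not descriptions of how they work.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class AuditRecord:
    """What the pipeline did for one question (hypothetical schema)."""
    question: str
    evidence: list[str]
    answer: str
    unsupported: list[str] = field(default_factory=list)
    rules_fired: list[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        # Accept only when nothing is unsupported and no rule fired.
        return not self.unsupported and not self.rules_fired


def run_pipeline(
    question: str,
    documents: list[str],
    extract: Callable[[str, list[str]], list[str]],
    generate: Callable[[str, list[str]], str],
    detect: Callable[[str, list[str]], list[str]],
    enforce: Callable[[str], list[str]],
) -> AuditRecord:
    """Extract evidence, generate, check support, apply rules, and log each step."""
    evidence = extract(question, documents)
    answer = generate(question, evidence)
    return AuditRecord(question, evidence, answer, detect(answer, evidence), enforce(answer))


docs = ["Refunds are issued within 14 days.", "Shipping is free over 50 EUR."]
record = run_pipeline(
    question="How fast are refunds?",
    documents=docs,
    extract=lambda q, d: [s for s in d if "refund" in s.lower()],   # placeholder retriever
    generate=lambda q, ev: "Refunds are issued within 30 days.",    # simulated faulty LLM
    detect=lambda ans, ev: [] if ans in ev else [ans],               # placeholder support check
    enforce=lambda ans: [],                                          # no rules in this toy run
)

print(record.accepted)
print(asdict(record))
```

Passing the steps in as callables keeps the sketch model-agnostic: any component can be swapped without changing the shape of the record.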
From architecture to deployment
The technology answers the architectural question. The Practice path handles the production question: what already exists, what has to change, and what proof the system needs before it can be trusted in use.
Most teams start with an audit, then move into consultation, MVP work, or deployment depending on what the evidence shows.
Make the system show its work
Read about the evidence extraction layer first, or bring us an existing system that needs an evidence trail before it reaches production.