[technology]

The trust and control layer for production AI systems.

Small, specialised, inspectable systems around the LLM. KR Labs develops open-source, state-of-the-art small models and frameworks for grounded generation, hallucination detection, and rule enforcement. Even frontier LLMs hallucinate about 10% of the time in RAG settings; in agentic systems, those errors compound.

From retrieval to evidence

Similarity is useful. It is not the same thing as evidence.

Figure: Two retrieval paths, generative RAG vs evidence

Standard generative RAG has known failure modes when the answer must be auditable: source-level traceability is not guaranteed, similarity is not grounding, and the LLM is still free to invent. The RAGTruth benchmark measures this directly by annotating hallucinated spans in RAG responses [1]. KR Labs publishes a response to each side of the problem: extraction-based RAG for answers that must point back to source text [2], and hallucination detection for generated answers that still need a support check [3].

The alternative is an LLM pipeline with explicit evidence boundaries: VerbatimRAG extracts the source spans an answer is allowed to use, LettuceDetect checks generated text for unsupported spans, and RuleChef enforces hard rules where the domain requires them. The goal is not to make the upstream LLM perfect; it is to make unsupported final answers detectable, blockable, or impossible to generate.

The technology family

Three specialised systems in your LLM pipeline: one grounds the answer, one checks it, one enforces the rules around it.

VerbatimRAG (ground, MIT)

VerbatimRAG selects source spans before answer assembly, so claims can be cited, replayed, and audited. Built with SOTA small encoder models for evidence extraction.
Live now: Ask the ACL Anthology
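The idea of selecting spans before assembling the answer can be sketched in a few lines. This is a minimal illustration, not VerbatimRAG's API: the `Span` type, `is_verbatim`, and `assemble_answer` are hypothetical names, and the spans are supplied by hand where a real system would use an extraction model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    doc_id: str  # which source document the text came from
    start: int   # character offset where the span begins
    end: int     # character offset just past the span's last character
    text: str    # the quoted text itself


def is_verbatim(span: Span, sources: dict[str, str]) -> bool:
    """True only if the span's text is an exact slice of its source document."""
    source = sources.get(span.doc_id, "")
    return source[span.start:span.end] == span.text


def assemble_answer(spans: list[Span], sources: dict[str, str]) -> str:
    """Build the answer from verified spans only, each with a citation marker."""
    lines = []
    for span in spans:
        if not is_verbatim(span, sources):
            raise ValueError(f"span is not verbatim in {span.doc_id!r}")
        lines.append(f'"{span.text}" [{span.doc_id}:{span.start}-{span.end}]')
    return "\n".join(lines)


# Hypothetical example data.
sources = {"doc1": "Aspirin reduces fever. It can irritate the stomach."}
good = Span("doc1", 0, 22, "Aspirin reduces fever.")
bad = Span("doc1", 0, 22, "Aspirin cures fever.")  # paraphrase, not a quote

print(assemble_answer([good], sources))
```

Because every line of the answer carries a document id and offsets, an auditor can replay the check later against the same source text. A paraphrased span such as `bad` is rejected rather than silently accepted.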

LettuceDetect (verify, MIT)

LettuceDetect flags unsupported answer spans with 79.22% RAGTruth F1 while running 30 to 60 examples per second on a single GPU. Built with SOTA small encoder models for hallucination detection.
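To show what "flagging unsupported answer spans" looks like as an output, here is a deliberately naive stand-in. It uses lexical overlap, not an encoder model, and is not LettuceDetect's API: `content_words`, `flag_unsupported`, and the 0.5 threshold are assumptions chosen for illustration. The point is the output shape: character offsets into the answer plus the flagged text.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def flag_unsupported(context: str, answer: str, threshold: float = 0.5) -> list[dict]:
    """Flag answer sentences whose content words are mostly absent from the context."""
    ctx = content_words(context)
    flagged = []
    # Split the answer into rough sentences while keeping character offsets.
    for match in re.finditer(r"[^.!?]+[.!?]?", answer):
        raw = match.group()
        sentence = raw.strip()
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & ctx) / len(words)
        if support < threshold:
            # Skip leading whitespace so the offsets point at the sentence itself.
            start = match.start() + (len(raw) - len(raw.lstrip()))
            flagged.append({
                "start": start,
                "end": start + len(sentence),
                "text": sentence,
                "support": round(support, 2),
            })
    return flagged


# Hypothetical example data.
context = "The capital of France is Paris. Its population is about 2.1 million."
answer = "The capital of France is Paris. It was founded by Napoleon in 1850."

for span in flag_unsupported(context, answer):
    print(span["start"], span["end"], span["text"])
```

A lexical check like this misses paraphrases and contradictions, which is why a trained detector is used in practice. The span-level output is what lets a downstream step highlight, block, or regenerate only the unsupported part.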

RuleChef (enforce, Apache-2.0)

RuleChef provides rule-grounded reasoning for systems where business and regulatory constraints must be enforced. It learns regex, Python, and spaCy patterns from labelled examples and runs locally with no LLM at inference time.
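What "no LLM at inference time" means in practice: once rules exist as plain patterns, applying them is ordinary deterministic code. The sketch below covers only that inference step, with hand-written regex rules. It is not RuleChef's API and does not show rule learning; the `Rule` type, the two example rules, and `enforce` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str  # regular expression, hand-written here for illustration
    action: str   # "block": a match is a violation; "require": a missing match is


# Hypothetical rules for a financial-advice setting.
RULES = (
    Rule("no_guaranteed_returns", r"\bguaranteed\s+returns?\b", "block"),
    Rule("cite_source", r"\[[\w.-]+:\d+-\d+\]", "require"),
)


def enforce(text: str, rules: tuple[Rule, ...] = RULES) -> list[str]:
    """Return the names of violated rules; an empty list means the text passes."""
    violations = []
    for rule in rules:
        matched = re.search(rule.pattern, text, flags=re.IGNORECASE) is not None
        if (rule.action == "block" and matched) or (
            rule.action == "require" and not matched
        ):
            violations.append(rule.name)
    return violations


print(enforce("This fund offers guaranteed returns."))
print(enforce('"Fees are 1% per year." [doc1:0-22]'))
```

Each violation is reported by rule name, so the log records which rule fired and why an output was blocked.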


Common design principles

Across the stack, the design goal is the same: make the system easier to inspect than to take on faith.

verifiable: Every output traces back to a span, a rule, or a labelled example. Nothing is asserted that cannot be checked.
model-agnostic: Use any compliant LLM at the generation step; the retrieval and detection steps run on encoder models we publish. No vendor lock-in.
EU-AI-Act-aligned: Designed so the obligations under Articles 9, 13, 14, and 15 can be evidenced, not asserted.
audit-ready: Outputs come with the trail an auditor needs: sources cited, claims classified, rules logged.

How they compose

Each product can run on its own. Together, they turn an LLM answer into something a team can inspect, reject, or enforce.

VerbatimRAG grounds the answer in cited source spans. LettuceDetect checks whether the answer is supported by the retrieved context. RuleChef applies the domain rules that should govern the output. The result is not just an answer, but a record of what was retrieved, what was checked, and which rules fired.
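One way to picture the composition is a pipeline that takes each stage as a pluggable function and returns a single record of what happened. This is a sketch under stated assumptions: `AuditRecord`, `run_pipeline`, and the four `toy_*` stages are hypothetical stand-ins, not the products' interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditRecord:
    question: str
    retrieved: list[str] = field(default_factory=list)    # what was retrieved
    answer: str = ""
    unsupported: list[str] = field(default_factory=list)  # what the check flagged
    rules_fired: list[str] = field(default_factory=list)  # which rules fired

    @property
    def accepted(self) -> bool:
        """Accept only if nothing was flagged and no rule fired."""
        return not self.unsupported and not self.rules_fired


def run_pipeline(
    question: str,
    ground: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    verify: Callable[[list[str], str], list[str]],
    enforce: Callable[[str], list[str]],
) -> AuditRecord:
    """Run ground -> generate -> verify -> enforce, logging each stage's output."""
    record = AuditRecord(question=question)
    record.retrieved = ground(question)
    record.answer = generate(question, record.retrieved)
    record.unsupported = verify(record.retrieved, record.answer)
    record.rules_fired = enforce(record.answer)
    return record


# Toy stand-ins for the three systems and the LLM (all hypothetical).
def toy_ground(question: str) -> list[str]:
    return ["Aspirin reduces fever."]


def toy_generate(question: str, spans: list[str]) -> str:
    return " ".join(spans) + " It also cures colds."  # the LLM adds a claim


def toy_verify(spans: list[str], answer: str) -> list[str]:
    sentences = [s.strip() + "." for s in answer.split(".") if s.strip()]
    return [s for s in sentences if s not in spans]


def toy_enforce(answer: str) -> list[str]:
    return ["no_cure_claims"] if "cures" in answer else []


record = run_pipeline(
    "What does aspirin do?", toy_ground, toy_generate, toy_verify, toy_enforce
)
print(record.accepted, record.unsupported, record.rules_fired)
```

Keeping the stages as separate callables mirrors the point that each product can run on its own. The record is what a team inspects when deciding whether to reject or release an answer.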

Figure: Evidence path through the stack

From architecture to deployment

The technology answers the architectural question. The Practice path handles the production question: what already exists, what has to change, and what proof the system needs before it can be trusted in use.

Most teams start with an audit, then move into consultation, MVP work, or deployment depending on what the evidence shows.

Make the system show its work

Read about the evidence extraction layer first, or bring us an existing system that needs an evidence trail before it reaches production.
