
Squeez: task-conditioned tool-output pruning for coding agents

Ádám Kovács
Preprint, arXiv:2604.04979
2026-04-04
evaluation

Abstract

Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. Squeez studies task-conditioned tool-output pruning: given a focused query and one raw tool observation, return the smallest verbatim evidence block the agent should inspect next. The benchmark contains 11,477 examples from SWE-bench repository interactions and synthetic multi-ecosystem tool outputs, with a manually curated 618-example test set. A LoRA-tuned Qwen 3.5 2B model reaches 0.86 recall and 0.80 F1 while removing 92% of input tokens.

TL;DR

Task-conditioned pruning for noisy coding-agent tool output.
Benchmark of 11,477 examples, with a manually curated 618-example test set.
LoRA-tuned Qwen 3.5 2B reaches 0.86 recall and 0.80 F1 at 92% compression.
Open benchmark, model, and CLI for coding-agent workflows.
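
To make the reported numbers concrete, here is a minimal sketch of how recall, F1, and compression could be computed for a single example. It assumes line-level matching between the predicted and gold evidence blocks and whitespace token counts; the paper's exact metric definitions and tokenizer may differ, and `score_pruning` and `count_tokens` are hypothetical names used only for illustration.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens stand in for the model tokenizer.
    return len(text.split())


def score_pruning(raw: str, predicted: str, gold: str) -> dict:
    """Score one pruned observation against its gold evidence block."""
    # Assumption: evidence is compared as a set of non-empty, stripped lines.
    pred_lines = {ln.strip() for ln in predicted.splitlines() if ln.strip()}
    gold_lines = {ln.strip() for ln in gold.splitlines() if ln.strip()}
    overlap = len(pred_lines & gold_lines)

    precision = overlap / len(pred_lines) if pred_lines else 0.0
    recall = overlap / len(gold_lines) if gold_lines else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Compression: fraction of input tokens removed by pruning.
    raw_tokens = count_tokens(raw)
    compression = 1 - count_tokens(predicted) / raw_tokens if raw_tokens else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "compression": compression,
    }


# Toy example: ten two-token lines, two of which are gold evidence.
raw = "\n".join(f"line {i}" for i in range(10))
gold = "line 3\nline 4"
predicted = "line 3\nline 7"
scores = score_pruning(raw, predicted, gold)
```

In this toy case the prediction keeps one of the two gold lines and one irrelevant line, so precision and recall are both 0.5, and 4 of the 20 input tokens survive, giving a compression of 0.8.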

Resources


The paper, benchmark, and implementation are linked from the source record in the KR Labs publication inventory. The project studies how to keep verbatim evidence from noisy tool output while removing the lines a coding agent does not need for the next step.
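
The task interface can be illustrated with a deliberately naive baseline. The sketch below is not the Squeez model or its CLI: it assumes a simple keyword-overlap heuristic, and `prune_observation` and its `context` parameter are hypothetical names. It only shows the input/output contract described above, a query plus one raw observation in and a contiguous verbatim block out.

```python
import re


def _terms(text: str) -> set[str]:
    # Lowercased alphanumeric terms; a crude stand-in for real relevance scoring.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def prune_observation(query: str, observation: str, context: int = 1) -> str:
    """Return a contiguous verbatim block around the line that best matches the query.

    Hypothetical baseline for illustration only, not the method from the paper.
    """
    lines = observation.splitlines()
    query_terms = _terms(query)
    scores = [len(query_terms & _terms(line)) for line in lines]
    if not lines or max(scores) == 0:
        return ""  # nothing in the observation overlaps with the query

    best = scores.index(max(scores))
    start = max(0, best - context)
    end = min(len(lines), best + context + 1)
    # Lines are copied unchanged, so the output stays verbatim evidence.
    return "\n".join(lines[start:end])


observation = "\n".join([
    "collected 3 items",
    "test_a.py .",
    "test_b.py F",
    "E   AssertionError: expected 200, got 500",
    "test_c.py .",
    "3 passed in 0.1s",
])
evidence = prune_observation("why did the AssertionError happen", observation)
```

Here `evidence` holds the failing assertion line together with one line of context on each side, copied unchanged from the observation. A learned pruner replaces the keyword heuristic while keeping the same contract.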

Cite

@article{kovacs-2026-squeez,
  title   = {Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
  author  = {Kov{\'a}cs, {\'A}d{\'a}m},
  journal = {arXiv preprint arXiv:2604.04979},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.04979}
}

Related

KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering