Squeez: task-conditioned tool-output pruning for coding agents

Abstract

Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. Squeez studies task-conditioned tool-output pruning: given a focused query and one raw tool observation, return the smallest verbatim evidence block the agent should inspect next. The benchmark contains 11,477 examples from SWE-bench repository interactions and synthetic multi-ecosystem tool outputs, with a manually curated 618-example test set. A LoRA-tuned Qwen 3.5 2B model reaches 0.86 recall and 0.80 F1 while removing 92% of input tokens.
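The input/output contract described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `PruningExample` container, its field names, and the `keyword_prune` baseline are hypothetical and are not taken from the Squeez code or benchmark schema. The keyword baseline stands in for the learned model purely so that the verbatim constraint has something to check.

```python
from dataclasses import dataclass


@dataclass
class PruningExample:
    """Hypothetical container for one task instance (field names are assumptions)."""
    query: str        # focused description of what the agent needs next
    observation: str  # raw tool output, unmodified


def is_verbatim(observation: str, evidence: str) -> bool:
    """Return True if every evidence line appears in the observation, in order."""
    remaining = iter(observation.splitlines())
    # `line in remaining` consumes the iterator up to the match, which enforces order.
    return all(line in remaining for line in evidence.splitlines())


def keyword_prune(example: PruningExample) -> str:
    """Naive stand-in pruner (NOT the Squeez model): keep lines sharing a word with the query."""
    terms = set(example.query.lower().split())
    kept = [
        line
        for line in example.observation.splitlines()
        if terms & set(line.lower().split())
    ]
    return "\n".join(kept)


example = PruningExample(
    query="why did test_b fail",
    observation=(
        "collected 3 items\n"
        "test_a PASSED\n"
        "test_b FAILED\n"
        "AssertionError: expected 2 got 3\n"
        "1 failed, 2 passed"
    ),
)
evidence = keyword_prune(example)
print(evidence)
print(is_verbatim(example.observation, evidence))
```

Because the pruner only selects whole lines, its output passes the verbatim check by construction; a pruner that paraphrased or reordered lines would not. Note that this word-overlap baseline keeps the `test_b FAILED` line but drops the `AssertionError` line, since that line shares no word with the query.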

TL;DR

  • Task-conditioned pruning for noisy coding-agent tool output.
  • Benchmark of 11,477 examples, with a manually curated 618-example test set.
  • LoRA-tuned Qwen 3.5 2B reaches 0.86 recall and 0.80 F1 at 92% compression.
  • Open benchmark, model, and CLI for coding-agent workflows.
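To make the reported numbers concrete, the following sketch shows one plausible way to score a pruned output against a gold evidence block. It assumes line-level set matching for recall and F1 and whitespace-delimited tokens for compression; the paper's actual matching granularity and tokenizer are not stated in this summary, so `span_scores` and `compression` are hypothetical helpers rather than the benchmark's scoring code.

```python
def line_set(text: str) -> set:
    """Non-empty lines of a text block, stripped of surrounding whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def span_scores(predicted: str, gold: str) -> dict:
    """Line-level precision, recall, and F1 (assumed granularity)."""
    pred, ref = line_set(predicted), line_set(gold)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def compression(observation: str, predicted: str) -> float:
    """Fraction of whitespace-delimited tokens removed (a proxy for model tokens)."""
    total = len(observation.split())
    kept = len(predicted.split())
    return 1.0 - kept / total if total else 0.0


scores = span_scores(predicted="a\nc\nd", gold="a\nb")
ratio = compression(observation="a b c d e f g h i j", predicted="a")
print(scores, ratio)
```

In this toy case one of three predicted lines is correct and one of two gold lines is recovered, so precision is 1/3, recall is 1/2, and F1 is 0.4, while keeping one of ten tokens gives a compression of 0.9.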

Resources

The paper, benchmark, and implementation are linked from the source record in the KR Labs publication inventory. The project studies how to keep verbatim evidence from noisy tool output while removing the lines a coding agent does not need for the next step.

Cite

@article{kovacs-2026-squeez,
  title   = {Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
  author  = {Kov{\'a}cs, {\'A}d{\'a}m},
  journal = {arXiv preprint arXiv:2604.04979},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.04979}
}