{"id":16,"date":"2025-09-23T09:00:00","date_gmt":"2025-09-23T13:00:00","guid":{"rendered":"https:\/\/monizesairesearch.com\/index.php\/2025\/09\/23\/backfill-05-agents-tool-use\/"},"modified":"2026-05-26T01:48:42","modified_gmt":"2026-05-26T05:48:42","slug":"backfill-05-agents-tool-use","status":"publish","type":"post","link":"https:\/\/monizesairesearch.com\/index.php\/2025\/09\/23\/backfill-05-agents-tool-use\/","title":{"rendered":"The Agent Stack Is Being Rewritten"},"content":{"rendered":"<h2>Orchestration, skills, and security \u2014 the year agent research grew up.<\/h2>\n<p>May 2025 \u2013 May 2026 | 37 papers surveyed<\/p>\n<p>&#8212;<\/p>\n<p>A year ago, if you wanted to build an AI agent, you picked a framework: LangGraph, CrewAI, AutoGen, Google ADK, OpenAI Agents SDK. These frameworks \u2014 collectively exceeding 290,000 GitHub stars \u2014 defined the architectural orthodoxy: an external orchestrator sits above the LLM, injecting instructions and routing decisions every turn. It felt obvious. It felt necessary.<\/p>\n<p>Then the research caught up. Across the 37 papers surveyed from May 2025 to May 2026, the agent research community systematically questioned the assumptions underlying the orchestration paradigm. The result is a field in productive upheaval. The external orchestrator model is being challenged by weight-compilation approaches. The skill management pipeline has been systematically analyzed and its bottlenecks identified. And multi-agent systems have moved from toy demos to real coordination problems with shared resources and temporal dynamics.<\/p>\n<p>Here&#8217;s what happened \u2014 and why the agent stack as you know it may not survive the next year intact.<\/p>\n<p>&#8212;<\/p>\n<h2>The Skill Management Revolution<\/h2>\n<p>The year&#8217;s most significant agent research cluster concerned how agents acquire, manage, and improve skills at inference time. 
This isn&#8217;t about making agents that can write code or browse the web \u2014 that&#8217;s old news. It&#8217;s about making agents that can improve themselves systematically, without weight updates.<\/p>\n<p><strong>&#8220;SkillOpt: Executive Strategy for Self-Evolving Agent Skills&#8221;<\/strong> (Yang et al., May 2026) made the case that skill acquisition should be treated with the same rigor as weight-space optimization. Rather than ad-hoc self-revision or one-shot generation, SkillOpt treats the skill as <em>trainable external state<\/em> and applies a systematic optimization strategy over the text-space representation. The paper introduces three mechanisms: skill pruning (removing skills that hurt performance), skill merging (combining related skills), and skill scheduling (deciding when to use which).<\/p>\n<p>This is a conceptual shift: skills are not knowledge that the agent happens to record \u2014 they are parameters in an external optimization loop. Just as you wouldn&#8217;t train a neural network without a learning rate schedule and gradient clipping, SkillOpt argues you shouldn&#8217;t manage agent skills without analogous mechanisms. The paper demonstrates that systematic skill optimization consistently outperforms ad-hoc skill accumulation, and the advantage grows with task diversity.<\/p>\n<p><strong>&#8220;From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills&#8221;<\/strong> (Huang et al., May 2026) provided the first comprehensive study spanning the full skill pipeline from extraction to consumption. The paper&#8217;s key finding \u2014 that domain-level and model-generated skills are finally becoming viable at scale \u2014 marks a turning point. The field now has systematic evidence rather than anecdotal success stories. 
The authors paid particular attention to domain-level skills (as opposed to general-purpose heuristics) and found that model-generated skills in specialized domains \u2014 medicine, law, engineering \u2014 are reaching the quality threshold for practical deployment.<\/p>\n<p><strong>Why this matters:<\/strong> SkillOpt provides the <em>optimization framework<\/em> for skills; From Raw Experience provides the <em>empirical characterization<\/em>. Together, they establish that skill management is a solvable optimization problem \u2014 not an irreducible challenge of agent design.<\/p>\n<p>&#8212;<\/p>\n<h2>Compilation vs. Orchestration: The Year&#8217;s Most Provocative Challenge<\/h2>\n<p><strong>&#8220;Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost&#8221;<\/strong> (Dennis et al., May 2026) is arguably the most provocative paper in the entire agent research corpus this year. The authors started with a simple observation: agent orchestration frameworks (LangGraph, CrewAI, etc.) all follow the same pattern \u2014 an external orchestrator above the LLM, injecting instructions and routing decisions every turn. This pattern has become so dominant that it&#8217;s rarely questioned.<\/p>\n<p>Their finding: for procedural tasks, this architecture is <em>dominated<\/em> by simply providing the procedure in the system prompt of a frontier model \u2014 at two orders of magnitude less cost. The paper goes further, showing that the procedures can be <em>compiled into the model&#8217;s weights<\/em>, effectively internalizing the orchestration logic. 
The results are striking: near-frontier quality on procedural tasks at roughly 1% of the compute cost of running an external orchestrator alongside a frontier model.<\/p>\n<p><strong>Why this matters:<\/strong> If correct, this finding challenges the premise of the multi-billion-dollar agent framework ecosystem. It suggests that for a large class of tasks \u2014 specifically procedural tasks with well-defined workflows \u2014 external orchestration is pure overhead.<\/p>\n<p><strong>Key insight:<\/strong> For strictly procedural tasks, weight compilation dominates external orchestration. But for dynamic tasks requiring tool selection and environmental interaction, external orchestration may remain necessary. The frontier is characterizing <em>when<\/em> each wins.<\/p>\n<p>&#8212;<\/p>\n<h2>Tool Use and Multimodal Agents<\/h2>\n<p><strong>&#8220;ETCHR: Editing To Clarify and Harness Reasoning&#8221;<\/strong> (Zhang et al., May 2026) \u2014 covered across multiple topics in this digest \u2014 is fundamentally an agent paper. It uses a dedicated image editing tool as an external module that the reasoning model calls when needed. The decoupled design (editing model + understanding model) is a tool-use architecture where the tool (image editing) is specialized and the orchestrator (understanding model) decides when to invoke it. This is a concrete example of the external orchestration approach that the compilation paper challenges \u2014 illustrating the productive tension between these paradigms.<\/p>\n<p><strong>&#8220;SPACENUM: Revisiting Spatial Numerical Understanding in VLMs&#8221;<\/strong> (Zhang et al., May 2026) examines a critical capability gap for embodied agents: can vision-language models genuinely ground numerical outputs (action magnitudes, spatial coordinates) in spatial perception, or are they generating statistically plausible numbers? 
The paper&#8217;s SpaceNum framework reveals the latter \u2014 a finding with direct implications for any agent operating in physical environments.<\/p>\n<p>&#8212;<\/p>\n<h2>Multi-Agent Systems Get Real<\/h2>\n<p><strong>&#8220;CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces&#8221;<\/strong> (Chandra, May 2026) addressed a specific but widely generalizable multi-agent coordination problem: multiple agents sharing a differential-privacy budget over a temporally evolving knowledge graph. The three-layer architecture (neural-ODE temporal decay, time-aware Shapley pricing, coordinated DP budget management) provides a unified treatment of temporal, economic, and privacy constraints.<\/p>\n<p><strong>Why this matters beyond data marketplaces:<\/strong> The coordination mechanisms \u2014 shared resource allocation, temporal awareness, incentive-compatible pricing \u2014 apply to any multi-agent system with shared resources. As multi-agent deployments grow (autonomous fleets, distributed sensor networks, collaborative AI systems), these patterns become essential infrastructure rather than academic curiosities.<\/p>\n<p>&#8212;<\/p>\n<h2>Embodied Agents<\/h2>\n<p><strong>&#8220;Leveraging Foundation Models for Causal Generative Modeling&#8221;<\/strong> (Komanduri &#038; Wu, May 2026) \u2014 FM-CGM \u2014 formalizes end-to-end visual causal reasoning using pretrained foundation models. For embodied agents, the ability to reason causally about visual scenes is a prerequisite for reliable real-world action.<\/p>\n<p><strong>&#8220;Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control&#8221;<\/strong> (Bashir et al., May 2026) demonstrated a complete closed-loop agentic system: vision segmentation \u2192 RL-based planning \u2192 ROS-based execution. 
A full-stack agent deployment \u2014 from perception through decision-making to physical action \u2014 that works in real-world agricultural conditions.<\/p>\n<p>&#8212;<\/p>\n<h2>Looking Forward<\/h2>\n<p>The agent research of 2025\u20132026 reveals a field productively questioning its own foundations. The external orchestrator model is being challenged by weight compilation, and the debate between them will define agent architecture for the next year. Skill lifecycle management is now systematically understood. Multi-agent coordination has moved from toy problems to real resource-sharing constraints.<\/p>\n<p>The open question is where the balance lies between compilation and orchestration. For strictly procedural tasks, weight compilation seems to dominate. But for tasks requiring dynamic tool selection, environmental interaction, and real-time adaptation, external orchestration may remain necessary. The research frontier for the next year will be characterizing exactly <em>when<\/em> each approach wins \u2014 and whether hybrid architectures can capture the best of both.<\/p>\n<p>What&#8217;s clear is that the agent stack is being rewritten. The frameworks we reach for today may not be the ones we reach for tomorrow.<\/p>\n<p>&#8212;<\/p>\n<p><em>This article is part of the Frontier AI Research Digest backfill series, surveying 37 papers from May 2025 \u2013 May 2026. Views expressed are the author&#8217;s research synthesis, not affiliate endorsements.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Orchestration, skills, and security \u2014 the year agent research grew up. May 2025 \u2013 May 2026 | 37 papers surveyed &#8212; A year ago, if you wanted to build an AI agent, you picked a framework: LangGraph, CrewAI, AutoGen, Google ADK, OpenAI Agents SDK. 
These frameworks \u2014 collectively exceeding 290,000 GitHub stars \u2014 defined the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":15,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-16","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-topic-05"],"_links":{"self":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/16","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/comments?post=16"}],"version-history":[{"count":1,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/16\/revisions"}],"predecessor-version":[{"id":40,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/16\/revisions\/40"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/media\/15"}],"wp:attachment":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/media?parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/categories?post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/tags?post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}