{"id":52,"date":"2026-05-31T13:27:59","date_gmt":"2026-05-31T17:27:59","guid":{"rendered":"https:\/\/monizesairesearch.com\/index.php\/2026\/05\/31\/weekly-research-digest-6\/"},"modified":"2026-05-31T20:45:25","modified_gmt":"2026-06-01T00:45:25","slug":"weekly-research-digest-6","status":"publish","type":"post","link":"https:\/\/monizesairesearch.com\/index.php\/2026\/05\/31\/weekly-research-digest-6\/","title":{"rendered":"Week 22, 2026 \u2014 Agentic Systems &#038; Skills"},"content":{"rendered":"<p>Agent research made notable progress this week, with advances in skill optimization, long-horizon memory management, and production-scale autonomous code review.<\/p>\n<h2>SkillOpt: Training Agent Skills Like Neural Network Weights<\/h2>\n<p><strong>SkillOpt<\/strong> by Yifan Yang et al. introduces what the authors present as the first systematic, controllable text-space optimizer for agent skills. An optimizer model turns scored rollouts into bounded add\/delete\/replace edits on skill documents, and an edit is accepted only when the validation score strictly improves. Its components include a textual learning-rate budget, a rejected-edit buffer, and epoch-wise slow\/meta updates for stable training. In the reported results, SkillOpt beats all competing methods across the 52 evaluated cells. On GPT-5.5, it improves accuracy over the no-skill baseline by 23.5 points in direct chat and by 24.8 points inside Codex. The learned skills also transfer across model scales and execution environments. <a href=\"https:\/\/arxiv.org\/abs\/2605.23904v1\">Paper<\/a><\/p>\n<h2>Meta-Cognitive Memory Policy Optimization<\/h2>\n<p><strong>MMPO<\/strong> by Ziyan Liu et al. tackles a fundamental problem: memory-augmented agents degrade as their summaries progressively discard task-relevant information. Instead of outcome-based RL, MMPO uses <em>Belief Entropy<\/em>, a self-supervised proxy for the agent&#8217;s epistemic uncertainty given its current memory, as fine-grained supervision. It maintains 97.1% performance even at 1.75M-token contexts, which the authors report as a new SOTA for long-horizon agent memory. 
<a href=\"https:\/\/arxiv.org\/abs\/2605.30159v1\">Paper<\/a><\/p>\n<h2>RADAR: Meta&#8217;s Automated Code Review at 535K+ Diffs<\/h2>\n<p><strong>RADAR<\/strong> (Risk Aware Diff Auto Review) at Meta has processed 535K+ diffs, landing 331K+ of them without human review. Its multi-stage funnel classifies diffs by authorship, then applies eligibility gates, static heuristics, a learned Diff Risk Score, LLM-based review, and deterministic validation. Key metrics: compared with non-RADAR diffs, RADAR diffs show one-third the revert rate and 1\/50 the production incident rate, along with a reported 330% improvement in median time to close. With the volume of AI-generated code growing 105.9% year over year, automated review at this scale is becoming essential. <a href=\"https:\/\/arxiv.org\/abs\/2605.30208v1\">Paper<\/a><\/p>\n<h2>PushBench: Agents Don&#8217;t Know When to Stop<\/h2>\n<p><strong>PushBench<\/strong> by Yuandao Cai et al. measures Quantitative Goal Persistence: whether agents keep working until a verified count is complete. Claude Code and Codex CLI solve many 50-artifact tasks but succeed on only 3 of 9 tasks at 100 artifacts. A state-tracking controller reaches 69-78% success and eliminates duplicate artifacts. Quantitative goals stress a reliability dimension distinct from local task competence. <a href=\"https:\/\/arxiv.org\/abs\/2605.23574v1\">Paper<\/a><\/p>\n<h2>Loong: RL-Optimized Document Translation Agent<\/h2>\n<p><strong>Loong<\/strong> by Yutong Wang et al. pairs a 3E memory module (Essence-Exemplar-Entity) with RL-optimized context selection for long-document translation. It reports average gains of 13.0 points across English\u2194Chinese, English\u2194German, and English\u2194French, along with strong cross-domain generalization and robustness to contextual noise. 
<a href=\"https:\/\/arxiv.org\/abs\/2605.30274v1\">Paper<\/a><\/p>\n<p><strong>Additional papers:<\/strong><br \/>\n&#8211; <a href=\"https:\/\/arxiv.org\/abs\/2605.30237v1\">GRASP: Plan-Guided Graph Retrieval<\/a> \u2014 New SOTA on STaRK benchmarks (+11.9 Hit@1)<br \/>\n&#8211; <a href=\"https:\/\/arxiv.org\/abs\/2605.30219v1\">Contextual Belief Management (CBM)<\/a> \u2014 RL reduces belief-tracking failures by 70.9%<br \/>\n&#8211; <a href=\"https:\/\/arxiv.org\/abs\/2605.30152v1\">Proactive Agents with TGL Triggers<\/a> \u2014 4-83x faster than LLM-as-trigger, 14x higher F1<br \/>\n&#8211; <a href=\"https:\/\/arxiv.org\/abs\/2605.30227v1\">Unifying Temporal and Structural Credit Assignment<\/a> \u2014 Block coordinate descent for multi-agent prompt optimization<\/p>\n<p><strong>Key insight:<\/strong> SkillOpt suggests that agent skills can be trained with much of the rigor applied to neural network weights, opening a path to systematic, reproducible agent improvement rather than ad-hoc prompt engineering.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Agent research had a breakthrough week, with advances in skill optimization, long-horizon memory management, and production-scale deployment of autonomous code review. SkillOpt: Training Agent Skills Like Neural Network Weights SkillOpt by Yifan Yang et al. introduces the first systematic controllable text-space optimizer for agent skills. 
An optimizer model turns scored rollouts into bounded add\/delete\/replace edits [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":97,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,16],"tags":[],"class_list":["post-52","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-topic-06","category-weekly-digest"],"_links":{"self":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/52","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/comments?post=52"}],"version-history":[{"count":2,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/52\/revisions"}],"predecessor-version":[{"id":85,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/posts\/52\/revisions\/85"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/media\/97"}],"wp:attachment":[{"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/media?parent=52"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/categories?post=52"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monizesairesearch.com\/index.php\/wp-json\/wp\/v2\/tags?post=52"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}