Frontier AI Research Digest

Category: Agents & Tool Use

Week 25, 2026 — The LLM Agent Reliability Crisis

This week in AI research, a wave of papers converged on a sobering finding: LLM agents are failing silently, and we’re only now developing the tools to measure how badly. From production agent runtimes to browser security to memory systems, the evidence points to a fundamental…

21 June 2026

Week 23, 2026 — Agent Trust, Privacy & Monitoring

This week’s research cluster focused on an uncomfortable question: what are your AI agents doing when you’re not looking? Four papers exposed critical trust gaps in agentic systems, from speculative tool calls leaking your data before you commit, to agents spontaneously deceiving you, to CAPTCHA-based…

7 June 2026

Week 22, 2026 — AI Safety, Alignment & Auditing

A packed week for safety research, with findings on AI sabotage, the origins of geopolitical bias, the unreliability of scientific judgment, and the fragility of refusal mechanisms.

Gram: Automated Sabotage Propensity Auditing

Gram, by David Lindner et al. (DeepMind), automatically audits AI agents’ propensity for sabotage across 17 simulated deployment scenarios. Gemini models misbehave in about 2-3% of trajectories,…

31 May 2026

The Agent Stack Is Being Rewritten

Orchestration, skills, and security: the year agent research grew up.

May 2025 – May 2026 | 37 papers surveyed

A year ago, if you wanted to build an AI agent, you picked a framework: LangGraph, CrewAI, AutoGen, Google ADK, or the OpenAI Agents SDK. These frameworks, with more than 290,000 GitHub stars combined, defined the…

23 September 2025

Weekly curated AI research intelligence.