Category: Alignment & Safety
-

Week 22, 2026 — Agentic Systems & Skills
Agent research had a breakthrough week, with advances in skill optimization, long-horizon memory management, and production-scale deployment of autonomous code review. SkillOpt: Training Agent Skills Like Neural Network Weights. SkillOpt, by Yifan Yang et al., introduces the first systematic, controllable text-space optimizer for agent skills. An optimizer model turns scored rollouts into bounded add/delete/replace edits…
-

The Year Alignment Got Empirical: When, Where, and for Whom Do Models Fail?
55 papers surveyed | May 2025 – May 2026. For years, AI alignment lived in the realm of principles. Papers opened with “it is important that AI systems align with human values” and closed with hand-waved suggestions for future work. In 2025–2026, that changed. The field stopped asking “is the model safe?” and started…