Category: Alignment & Safety

Week 22, 2026 — Agentic Systems & Skills

Agent research had a breakthrough week, with advances in skill optimization, long-horizon memory management, and production-scale deployment of autonomous code review.

SkillOpt: Training Agent Skills Like Neural Network Weights

SkillOpt, by Yifan Yang et al., introduces the first systematic, controllable text-space optimizer for agent skills. An optimizer model turns scored rollouts into bounded add/delete/replace edits…

31 May 2026
The Year Alignment Got Empirical: When, Where, and for Whom Do Models Fail?

55 papers surveyed | May 2025 – May 2026

For years, AI alignment lived in the realm of principles. Papers opened with “it is important that AI systems align with human values” and closed with hand-waved suggestions for future work. In 2025–2026, that changed. The field stopped asking “Is the model safe?” and started…

23 October 2025
