Category: Alignment & Safety

  • Week 22, 2026 — Agentic Systems & Skills

    Agent research had a breakthrough week, with advances in skill optimization, long-horizon memory management, and production-scale deployment of autonomous code review.

    SkillOpt: Training Agent Skills Like Neural Network Weights

    SkillOpt by Yifan Yang et al. introduces the first systematic controllable text-space optimizer for agent skills. An optimizer model turns scored rollouts into bounded add/delete/replace edits…

  • The Year Alignment Got Empirical: When, Where, and for Whom Do Models Fail?

    55 papers surveyed | May 2025 – May 2026

    For years, AI alignment lived in the realm of principles. Papers opened with “it is important that AI systems align with human values” and closed with hand-waved suggestions for future work. In 2025–2026, that changed. The field stopped asking “is the model safe?” and started…