We build tools for understanding and steering AI systems, and the insights from using them inform our research.
Curious about strange behaviors you've noticed in language models? Try our prototype Monitor to look inside →
research report
o3 frequently fabricates actions it took to fulfill user requests, and elaborately justifies the fabrications when confronted
16 April 2025
technical demonstration
A system for analyzing and intervening on agent behavior
24 March 2025
research report
Open-source AI systems trained to describe components of other AI systems at the level of a human expert
23 October 2024
research report
Language models trained to automatically surface harmful behaviors in other language models
23 October 2024
technical demonstration
An interface designed to help humans observe, understand, and steer computations inside models
23 October 2024