We build tools for understanding and steering AI systems, and insights derived from their use inform our research.
Curious about strange behaviors you've noticed in language models? Try our prototype Monitor to look inside.
research report
Open-source AI systems trained to describe components of other AI systems at the level of a human expert
23 October 2024
technical demonstration
An interface designed to help humans observe, understand, and steer computations inside models
23 October 2024
research report
Language models trained to automatically surface harmful behaviors in language models
23 October 2024