We are an independent research lab working toward the responsible development and deployment of AI systems in the public interest.
research report
Open-source AI systems trained to describe components of other AI systems at the level of a human expert
23 October 2024
technical demonstration
An interface designed to help humans observe, understand, and steer computations inside models
23 October 2024
research report
Language models trained to automatically surface harmful behaviors in language models
23 October 2024