Scalable insight driven by AI-backed tools

We build tools for understanding and steering AI systems, and insights derived from their use inform our research.

Curious about strange behaviors you've noticed in language models? Try our prototype Monitor to look inside.

Research Updates

research report

Automatically Jailbreaking Frontier Language Models with Investigator Agents

Discovering cost-effective attacks with reinforcement learning

3 September 2025

research report

Surfacing Pathological Behaviors in Language Models

Improving our investigator agents with propensity bounds

5 June 2025

research report

Investigating truthfulness in a pre-release o3 model

o3 frequently fabricates actions it took to fulfill user requests, and elaborately justifies the fabrications when confronted

16 April 2025

technical demonstration

Introducing Docent

A system for analyzing and intervening on agent behavior

24 March 2025

research report

Scaling Automatic Neuron Explanation

Open-source AI systems trained to describe components of other AI systems at the level of a human expert

23 October 2024

research report

Eliciting Language Model Behaviors with Investigator Agents

Language models trained to automatically surface harmful behaviors in language models

23 October 2024

technical demonstration

Monitor: An AI-Driven Observability Interface

An interface designed to help humans observe, understand, and steer computations inside models

23 October 2024
