Precisely understand complex AI behaviors

How Docent works
1. Ingest your data
To ingest your data, use our Python tracing library which hooks your LLM API calls directly, upload files directly with our native Inspect integration, or write a custom ingestion script with our SDK.
Docent supports any text-only, single or multi-agent transcript.
from docent.trace import agent_run, initialize_tracing
initialize_tracing("my-collection")
@agent_run
def analyze_document(document_text: str):
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"Analyze this document: {document_text}"
}]
)
return response.choices[0].message.content
2. Create a rubric
Ask any question about your agent, such as "where is it reward hacking?" or "why did it fail?"
Docent first converts it into a precise behavior rubric by reading through your data, asking questions about ambiguities, and suggesting concrete re-writes based on your feedback.
3. Spot-check qualitative results
Docent then searches for agent behaviors that match your rubric. You can click on each result to see where it was found in each transcript, as well as an explanation of why it matched.
4. Quantify and visualize
Finally, visualize quantitative patterns by aggregating, slicing, and filtering using Docent's charts. For example, you can plot the number of reward hacks across training steps, or compare the prevalence of reward hacking between different models.
Use cases
Select an example use case to understand how Docent helps.
Search for eval awareness
Using Docent, we find that the agent frequently suggests that it is in an evaluation environment, which may affect behavior.
Compare across models
Get started in minutes
Choose a path that fits your workflow—no heavy setup required.
Python tracing library
Instrument any Python project with a tiny tracing helper and stream rich events into Docent.
Inspect auto-ingestion
Custom transcript converter
from docent.trace import agent_run, initialize_tracing
initialize_tracing("my-collection")
@agent_run
def analyze_document(document_text: str):
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"Analyze this document: {document_text}"
}
]
)
return response.choices[0].message.content
Works alongside your existing stack (OpenAI, Anthropic, vLLM, LangChain, custom scaffolds, etc.).

