Changelog

Release Date

October 30, 2025

Major Release

Weekly Digest

(New) Multi-rollout

You can now run multiple rollouts of a judge on a single agent run. Docent will automatically highlight inconsistencies between judge rollouts and disagreements with your labels.
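As a rough illustration of what "inconsistencies between rollouts" and "disagreements with your labels" can mean, the sketch below compares several rollout verdicts for one agent run with each other and with a human label. The function name, the verdict strings, and the dictionary keys are hypothetical and are not part of the Docent SDK.

```python
from collections import Counter


def summarize_rollouts(verdicts, human_label=None):
    """Summarize agreement among judge rollouts for one agent run.

    Hypothetical helper for illustration only; not a Docent API.
    """
    if not verdicts:
        raise ValueError("need at least one rollout verdict")
    counts = Counter(verdicts)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "majority": majority,
        # Fraction of rollouts that returned the majority verdict.
        "agreement": majority_count / len(verdicts),
        # True when the rollouts did not all return the same verdict.
        "inconsistent": len(counts) > 1,
        # True when a human label exists and differs from the majority verdict.
        "disagrees_with_label": human_label is not None and majority != human_label,
    }


summary = summarize_rollouts(["match", "match", "no match"], human_label="no match")
```

Here the three rollouts are inconsistent with each other, and the majority verdict also disagrees with the label, so both flags are set.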

(New) Label sets

Labels are namespaced into label sets. You can now migrate label sets between rubrics with compatible schemas.

(New) Docent Query Language

You can now query data in the UI and SDK using Docent Query Language, an SQL-like language.

Misc

Metadata included in citations: Judges and chat messages can cite metadata in agent runs, transcripts, and transcript blocks.
Usage stats: The new settings page displays how much of your free LLM quota you have consumed.
Increased collection size limits: Collections now support up to 1M agent runs. We’ve also improved performance for large collections.
Bug Fixes: Reduced judge errors from LLMs outputting invalid JSON.
Tracing Improvements: Support for google-genai, plus reliability, usability, and performance improvements.

UI improvements

Labeling moved from tabs in the run result page to the judge results list.
You can now create labels from the agent run page and associate them with a label set.
We removed the critical moments / run summary feature from the agent run page.
New searchable fields, sticky settings, a step slider, and additional filterable fields.

Release Date

September 14, 2025

Major Release

Weekly Digest

(New) Improvements to judging

Custom schemas for judge results: specify a desired JSONSchema for judge results to be generated and validated against.
Human labeling of judge results: label judge results in the UI and iterate on agreement rate in real time.
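To make the custom-schema idea concrete, here is a minimal sketch of a JSON Schema for a judge result and a check against it. The schema fields (`label`, `explanation`) and the `check_result` helper are assumptions made for illustration; the checker handles only `type`, `required`, and `enum`, which is a small subset of JSON Schema, and it says nothing about how Docent validates results internally.

```python
import json

# Hypothetical schema for a judge result; the field names are illustrative.
JUDGE_RESULT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["match", "no match"]},
        "explanation": {"type": "string"},
    },
    "required": ["label", "explanation"],
}

_PY_TYPES = {"string": str, "object": dict}


def check_result(raw_json, schema=JUDGE_RESULT_SCHEMA):
    """Return a list of problems; an empty list means the result passed."""
    try:
        result = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(result, dict):
        return ["result is not an object"]
    problems = [
        f"missing field: {name}"
        for name in schema["required"]
        if name not in result
    ]
    for name, rules in schema["properties"].items():
        if name not in result:
            continue
        value = result[name]
        if not isinstance(value, _PY_TYPES[rules["type"]]):
            problems.append(f"{name}: expected {rules['type']}")
        elif "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: {value!r} not in {rules['enum']}")
    return problems
```

A result such as `{"label": "match", "explanation": "..."}` passes, while one with an unexpected label or a missing explanation is reported rather than silently accepted.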

Tracing

Tracing reliability improvements, with fixes to retry logic, parsing of highly parallel workloads, and segmentation of almost-identical transcripts.

Misc

Reworked API key management to address extremely slow API key validation.

Release Date

September 7, 2025

Major Release

Weekly Digest

(New) Improved UI for reviewing judge results

New multi-column layout for viewing judge results alongside transcript details and chat.
Judge-aware chat assistant that answers questions about a specific judge result.
Richer text citations that highlight specific excerpts from the transcript (works in both chat and judges).

Agent runs

Support transcript deletion from agent runs and transcript groups.
Support sorting of agent runs by any metadata field.
Non-blocking agent run logging so ingestion doesn't block client code.
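The non-blocking logging item above follows a common pattern: the caller hands work to a queue and a background thread performs the upload. The sketch below shows that general pattern only; `BackgroundLogger` and its `send` callback are hypothetical names, and this is not the Docent SDK's implementation.

```python
import queue
import threading


class BackgroundLogger:
    """Hypothetical non-blocking logger; `send` stands in for a network upload."""

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, agent_run):
        # Returns immediately; the upload happens on the worker thread.
        self._queue.put(agent_run)

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel placed by close(): stop the worker
                return
            self._send(item)

    def close(self):
        # Let the worker finish everything already queued, then stop it.
        self._queue.put(None)
        self._worker.join()


uploaded = []
logger = BackgroundLogger(uploaded.append)
logger.log({"id": "run-1"})
logger.log({"id": "run-2"})
logger.close()
```

Calling `close()` before the program exits matters in this sketch, because the worker is a daemon thread and would otherwise be stopped with items still queued.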

Release Date

August 31, 2025

Weekly Digest

Tracing ingestion

Live tracing ingestion keeps runs updated as events stream in.
Parallel processing accelerates ingestion and reduces latency.
Clearer error messaging when tracing encounters issues.

Transcript groups, metadata, and filtering

Transcript naming and metadata support for richer context on every run.
Preview data values in the filter picker so you can see what's available.
Bug fixes for metadata fields with mixed types.

Release Date

August 24, 2025

Weekly Digest

Tracing

Transcript groups are now displayed in the front end, with quality-of-life improvements for agent runs that contain many transcripts
Disable tracing by setting an environment variable
Miscellaneous improvements: better error messaging, more robust ingestion, support for new formats, the ability to have fine-grained control over which libraries get instrumented, and bug fixes

Ingestion

Improved speed and reliability when ingesting large Inspect .eval files; uploading logs of multiple GBs now works

Rubrics and judges

Bring your own API keys for running rubrics with OpenAI/Anthropic/Gemini models
Choose a specific LLM when running rubrics

Release Date

August 17, 2025

Weekly Digest

Transcript ingestion

Removed metadata schemas, so ingestion no longer requires predefined schemas
More robust Inspect one-click ingestion: Docent used to silently drop metadata and scores when using our file upload or drag-and-drop ingestion; this is now fixed.
Auto-reload agent run list after successful file upload

Tracing (now in public preview)

Transcript groups: Ability to group transcripts in the Docent UI, with support for hierarchical relationships between groups.
Incremental loading: Tracing data is now loaded into the collection as it is being run.
Improved instrumentation: Auto-instrumentation is now more targeted and can also be manually configured.
Improved stability: Various bug fixes, primarily related to supporting a wider range of models and formats.
Documentation: Now publicly available.

Charts

Export charts as PNG and CSV
Resizable chart area

Misc

Upgraded to Claude 4 Sonnet from Claude 3.7 Sonnet for rubric evaluation
Internal infrastructure improvements

Release Date

August 10, 2025

Weekly Digest

Rubrics and judges

Rubric versioning: Edits to rubrics are now versioned, allowing you to compare results to past iterations

Charts & metadata

Any numeric metadata field can now be used as a chart measure; previously, only scores were allowed
Charting uses latest rubric version to avoid plotting stale judge results

Infrastructure

Improved scalability of application serving infra

Release Date

August 3, 2025

Weekly Digest

Clustering & Charts

Clustering metrics: Allow switching between counting total results vs. unique agent runs
Interactive chart filtering: Click on chart elements to filter data dynamically
Improved chart creation workflow: Auto-populate new charts with valid data
Fixed chart sharing, which previously did not work
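The difference between the two clustering metrics mentioned above shows up whenever one agent run produces several results in the same cluster. A minimal sketch, assuming each result is a dictionary with a hypothetical `agent_run_id` key:

```python
def count_cluster(results, unique_runs=False):
    """Count results in a cluster, or the distinct agent runs behind them."""
    if unique_runs:
        return len({r["agent_run_id"] for r in results})
    return len(results)


cluster = [
    {"agent_run_id": "run-1", "result_id": "a"},
    {"agent_run_id": "run-1", "result_id": "b"},
    {"agent_run_id": "run-2", "result_id": "c"},
]
total = count_cluster(cluster)                       # counts every result
distinct = count_cluster(cluster, unique_runs=True)  # counts each run once
```

Counting unique agent runs keeps a single run with many results from dominating a cluster's size.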

(New) Tracing & instrumentation

This feature is still in private preview

Auto-instrumentation of OpenAI and Anthropic Python SDKs
Simple Python SDK with decorators and context managers for tracing agent scaffolds
Auto transcript splitting logic that converts individual LLM calls into contiguous transcripts
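One plausible way to turn individual LLM calls into contiguous transcripts is prefix matching: a call whose message list begins with an existing transcript is treated as a continuation of it. The sketch below illustrates that idea under this assumption only; `group_calls` is a hypothetical function, messages are simplified to strings, and Docent's actual splitting logic may differ.

```python
def group_calls(calls):
    """Group LLM calls into transcripts by message-prefix matching.

    Each call is a list of messages (the prompt messages plus the response).
    A call extends an existing transcript when it starts with that
    transcript's messages; otherwise it starts a new transcript.
    """
    transcripts = []
    for messages in calls:
        for i, existing in enumerate(transcripts):
            if messages[:len(existing)] == existing:
                transcripts[i] = list(messages)  # continuation: keep the longer history
                break
        else:
            transcripts.append(list(messages))  # no match: new transcript
    return transcripts


calls = [
    ["system", "user 1", "assistant 1"],
    ["system", "user 1", "assistant 1", "user 2", "assistant 2"],
    ["system", "other task", "assistant A"],
]
transcripts = group_calls(calls)
```

The first two calls collapse into one transcript because the second extends the first, while the third shares only the system message and therefore starts its own transcript.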

Infrastructure

VPN networking infrastructure for private access to Docent
End-to-end PostHog analytics for internal observability

Release Date

July 27, 2025

Major Release

Weekly Digest

(New) Rubric search

Complete rewrite of global search: Significantly better search and clustering reliability, especially across concurrent users
New rubric schema for searches that enforces inclusion and exclusion rules for more precise results

(New) Quantitative visualization

Charts for plotting agent runs, judge results, cluster centroids, and statistics. Allows for multi-dimensional slicing and filtering.

Data & ingestion

Inspect logs in the web UI: Import & view Inspect logs directly from the website
Embeddings at upload: Trigger embeddings computation as part of the upload flow

SDK & APIs

Query agent runs via SDK for easier scripting/automation

Release Date

July 1, 2025

Major Release

0.1.0-alpha

Performance & scalability

Performance improvements: Docent now smoothly supports 50-100K transcripts, up from 0.5-1K previously
Faster search: Added re-ranking to global search, returning results ~5x faster
Robust background processing: Searches now run as background batch jobs, resilient to frontend disconnections

(New) Multi-user collaboration

Multi-user support and access controls: Share Docent results with others and control what they can do
Self-hosting ready: Deploy once and support any number of users on the same instance

(New) Multi-agent

Basic multi-agent transcript support: A basic transcript format for runs involving multiple agents

Bug Fixes & Stability

Fixed numerous bugs and improved overall system reliability

Release Date

March 24, 2025

Major Release

Research Preview

The initial release of Docent, a system for analyzing and intervening on agent behavior.