Changelog

August 24, 2025

Weekly Digest

Tracing

  • Transcript groups are now displayed in the front-end, along with quality-of-life improvements for agent runs that contain many transcripts
  • Disable tracing by setting an environment variable
  • Miscellaneous improvements: better error messaging, more robust ingestion, support for new formats, fine-grained control over which libraries get instrumented, and bug fixes

Ingestion

  • Improved speed and reliability when ingesting large Inspect .eval files; logs of multiple gigabytes can now be uploaded

Rubrics and judges

  • Bring your own API keys for running rubrics with OpenAI/Anthropic/Gemini models
  • Choose a specific LLM when running rubrics

August 17, 2025

Weekly Digest

Transcript ingestion

  • Removed metadata schemas, so ingestion no longer requires predefined schemas
  • More robust Inspect one-click ingestion: Docent previously dropped metadata and scores silently during file-upload or drag-and-drop ingestion; this is now fixed.
  • Auto-reload agent run list after successful file upload

Tracing (now in public preview)

  • Transcript groups: Ability to group transcripts in the Docent UI, with support for hierarchical relationships between groups.
  • Incremental loading: Tracing data is now loaded into the collection while the traced run is still in progress.
  • Improved instrumentation: Auto-instrumentation is now more targeted and can also be manually configured.
  • Improved stability: Various bug fixes, primarily related to supporting a wider range of models and formats.
  • Documentation: Now publicly available.

Charts

  • Export charts as PNG and CSV
  • Resizable chart area

Misc

  • Upgraded from Claude 3.7 Sonnet to Claude Sonnet 4 for rubric evaluation
  • Internal infrastructure improvements

August 10, 2025

Weekly Digest

Rubrics and judges

  • Rubric versioning: Edits to rubrics are now versioned, allowing you to compare results to past iterations

Charts & metadata

  • Any numeric metadata usable as a chart measure: Previously, only scores were allowed
  • Charting uses latest rubric version to avoid plotting stale judge results

Infrastructure

  • Improved scalability of the application-serving infrastructure

August 3, 2025

Weekly Digest

Clustering & Charts

  • Clustering metrics: Allow switching between counting total results vs. unique agent runs
  • Interactive chart filtering: Click on chart elements to filter data dynamically
  • Improved chart creation workflow: Auto-populate new charts with valid data
  • Fixed chart sharing, which previously did not work
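
The difference between the two clustering metrics can be illustrated with a minimal Python sketch. This is not Docent's implementation: the function name `cluster_counts` and the `(cluster, agent_run_id)` tuple layout are hypothetical, chosen only to show why the two counts can diverge when one agent run produces several results in the same cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cluster_counts(
    results: Iterable[Tuple[str, str]], unique_runs: bool = False
) -> Dict[str, int]:
    """Count results per cluster (hypothetical helper, not a Docent API).

    Each result is a (cluster, agent_run_id) pair. With unique_runs=False,
    every result is counted; with unique_runs=True, each agent run is
    counted at most once per cluster.
    """
    runs_by_cluster = defaultdict(list)
    for cluster, run_id in results:
        runs_by_cluster[cluster].append(run_id)
    return {
        cluster: (len(set(run_ids)) if unique_runs else len(run_ids))
        for cluster, run_ids in runs_by_cluster.items()
    }


# Illustrative data: run-1 matches the same cluster twice.
example_results = [
    ("reward hacking", "run-1"),
    ("reward hacking", "run-1"),
    ("reward hacking", "run-2"),
    ("tool misuse", "run-3"),
]
total = cluster_counts(example_results)
unique = cluster_counts(example_results, unique_runs=True)
```

Counting unique agent runs keeps a single run with many matches from inflating a cluster's apparent size.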

(New) Tracing & instrumentation

This feature is still in private preview

  • Auto-instrumentation of OpenAI and Anthropic Python SDKs
  • Simple Python SDK with decorators and context managers for tracing agent scaffolds
  • Auto transcript splitting logic that converts individual LLM calls into contiguous transcripts
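
One plausible way to turn individual LLM calls into contiguous transcripts is prefix matching, sketched below in Python. This is an assumption about the general approach, not Docent's actual splitting logic: the function `split_into_transcripts` and the `{"input": [...], "output": {...}}` call layout are hypothetical. A call extends an existing transcript when its input messages begin with that transcript's messages; otherwise it starts a new transcript.

```python
from typing import Dict, List

Message = Dict[str, str]


def split_into_transcripts(calls: List[dict]) -> List[List[Message]]:
    """Group chronologically ordered LLM calls into transcripts.

    Hypothetical sketch: each call is {"input": [messages], "output": message}.
    """
    transcripts: List[List[Message]] = []
    for call in calls:
        prompt = call["input"]
        full = prompt + [call["output"]]
        for i, existing in enumerate(transcripts):
            # The call continues a transcript if its prompt starts with
            # everything that transcript already contains.
            if prompt[: len(existing)] == existing:
                transcripts[i] = full
                break
        else:
            # No transcript is a prefix of this prompt: start a new one.
            transcripts.append(full)
    return transcripts


def msg(role: str, content: str) -> Message:
    return {"role": role, "content": content}


# Two calls from one conversation, then one call from an unrelated conversation.
calls = [
    {"input": [msg("system", "A"), msg("user", "hi")],
     "output": msg("assistant", "hello")},
    {"input": [msg("system", "A"), msg("user", "hi"),
               msg("assistant", "hello"), msg("user", "more")],
     "output": msg("assistant", "sure")},
    {"input": [msg("system", "B"), msg("user", "task")],
     "output": msg("assistant", "done")},
]
transcripts = split_into_transcripts(calls)
```

Here the first two calls merge into one five-message transcript and the third call becomes its own transcript.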

Infrastructure

  • VPN networking infrastructure for private access to Docent
  • End-to-end PostHog analytics for internal observability

July 27, 2025

Major
v0.1.1-alpha
  • Complete rewrite of global search: Significantly better search and clustering reliability, especially across concurrent users
  • New rubric schema for searches that enforces inclusion and exclusion rules for more precise results

(New) Quantitative visualization

  • Charts for plotting agent runs, judge results, cluster centroids, and statistics. Allows for multi-dimensional slicing and filtering.

Data & ingestion

  • Inspect logs in the web UI: Import & view Inspect logs directly from the website
  • Embeddings at upload: Trigger embeddings computation as part of the upload flow

SDK & APIs

  • Query agent runs via SDK for easier scripting/automation

July 1, 2025

Major
v0.1.0-alpha

Performance & scalability

  • Performance improvements: Docent now smoothly supports 50-100K transcripts, up from 0.5-1K previously
  • Faster search: Added re-ranking to global search, returning results ~5x faster
  • Robust background processing: Searches now run as background batch jobs, resilient to frontend disconnections

(New) Multi-user collaboration

  • Multi-user support and access controls: Share Docent results with others and control what they can do
  • Self-hosting ready: Deploy once and support any number of users on the same instance

(New) Multi-agent

  • Basic multi-agent transcript support: An initial transcript format for runs involving multiple agents

Bug Fixes & Stability

  • Fixed numerous bugs and improved overall system reliability

March 24, 2025

Major
Research Preview

The initial release of Docent, a system for analyzing and intervening on agent behavior.
