Changelog
Release Date
September 14, 2025
Major Release
Weekly Digest
(New) Improvements to judging
- Custom schemas for judge results: specify a desired JSONSchema for judge results to be generated and validated against.
- Human labeling of judge results: label judge results in the UI and iterate on agreement rate in real time.
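As a rough illustration of what validating a judge result against a custom schema involves, here is a minimal sketch in plain Python. The schema, its field names (`label`, `explanation`), and the `validate` helper are all hypothetical and are not Docent's API. The helper handles only a small subset of JSON Schema (`type`, `enum`, `properties`, `required`); a real setup would more likely use a full validator library.

```python
import json

# Hypothetical schema for a judge result; field names are illustrative only.
JUDGE_RESULT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["match", "no match"]},
        "explanation": {"type": "string"},
    },
    "required": ["label", "explanation"],
}

# Maps JSON Schema type names to Python types (only the subset used here).
_TYPES = {"string": str, "object": dict}


def validate(instance, schema):
    """Return a list of error strings; an empty list means the instance is valid.

    Simplified on purpose: supports only `type`, `enum`, `properties`, `required`.
    """
    expected = schema.get("type")
    if expected and not isinstance(instance, _TYPES[expected]):
        return [f"expected {expected}, got {type(instance).__name__}"]

    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{instance!r} is not one of {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required field {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in validate(instance[key], subschema))
    return errors


# A made-up judge output that conforms to the schema.
raw_output = '{"label": "match", "explanation": "The agent edited the test file."}'
good = validate(json.loads(raw_output), JUDGE_RESULT_SCHEMA)

# A made-up judge output with an invalid label and a missing field.
bad = validate({"label": "maybe"}, JUDGE_RESULT_SCHEMA)

print(good)
print(bad)
```

Returning a list of errors rather than raising on the first problem makes it easier to show every schema violation for a result at once.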
Tracing
- Improved tracing reliability: fixed retry logic, parsing of highly parallel workloads, and segmentation of near-identical transcripts.
Misc
- Reworked API key management to address extremely slow API key validation.
Release Date
September 7, 2025
Major Release
Weekly Digest
(New) Improved UI for reviewing judge results
- New multi-column layout for viewing judge results alongside transcript details and chat.
- Judge-aware chat assistant that answers questions about a specific judge result.
- Richer text citations that highlight specific excerpts from the transcript (works in both chat and judges).
Agent runs
- Support transcript deletion from agent runs and transcript groups.
- Support sorting of agent runs by any metadata field.
- Non-blocking agent run logging so ingestion doesn't block client code.
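The general idea behind non-blocking logging can be sketched with a queue and a background worker thread. This is a generic pattern under stated assumptions, not Docent's implementation: `BackgroundLogger` and the `send` callback are hypothetical names, and `send` stands in for whatever call actually uploads an agent run.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class BackgroundLogger:
    """Hypothetical helper: queues runs and uploads them on a worker thread."""

    def __init__(self, send):
        self._send = send          # callable that performs the (slow) upload
        self.errors = []           # upload failures are collected, not raised
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, run):
        # Enqueue and return immediately, so the caller is never blocked.
        self._queue.put(run)

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is _STOP:
                    return
                self._send(item)
            except Exception as exc:
                self.errors.append(exc)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until everything queued so far has been processed.
        self._queue.join()

    def close(self):
        self._queue.put(_STOP)
        self._worker.join()


received = []
logger = BackgroundLogger(received.append)  # list.append stands in for an upload
for i in range(3):
    logger.log({"run_id": i})
logger.flush()
print(received)
logger.close()
```

A single worker keeps uploads in submission order; calling `flush()` before the process exits avoids losing runs that are still queued.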
Release Date
August 31, 2025
Weekly Digest
Tracing ingestion
- Live tracing ingestion keeps runs updated as events stream in.
- Parallel processing accelerates ingestion and reduces latency.
- Clearer error messaging when tracing encounters issues.
Transcript groups, metadata, and filtering
- Transcript naming and metadata support for richer context on every run.
- Preview data values in the filter picker so you can see what's available.
- Bug fixes for mixed metadata types
Release Date
August 24, 2025
Weekly Digest
Tracing
- Transcript group display in the front end, plus quality-of-life improvements for handling many transcripts in an agent run
- Disable tracing by setting an environment variable
- Miscellaneous improvements: better error messaging, more robust ingestion, support for new formats, the ability to have fine-grained control over which libraries get instrumented, and bug fixes
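The changelog does not name the environment variable, so the sketch below uses a made-up one (`EXAMPLE_TRACING_DISABLED`) purely to show how such a switch is commonly checked. The `init_tracing` function and its return value are likewise hypothetical placeholders and do not reflect the SDK's real interface.

```python
import os

# Hypothetical variable name for illustration; the real name is not given here.
DISABLE_ENV_VAR = "EXAMPLE_TRACING_DISABLED"

# Values treated as "disable tracing" (an assumption about a sensible convention).
_TRUTHY = {"1", "true", "yes", "on"}


def tracing_enabled(environ=None):
    """Return False when the disable variable is set to a truthy value."""
    environ = os.environ if environ is None else environ
    value = environ.get(DISABLE_ENV_VAR, "").strip().lower()
    return value not in _TRUTHY


def init_tracing(libraries, environ=None):
    """Hypothetical initializer: returns the libraries that would be instrumented."""
    if not tracing_enabled(environ):
        return []  # tracing disabled, so nothing is instrumented
    return list(libraries)


print(init_tracing(["openai", "anthropic"], environ={}))
print(init_tracing(["openai", "anthropic"], environ={DISABLE_ENV_VAR: "true"}))
```

Passing the environment as an argument keeps the check easy to test without mutating `os.environ`.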
Ingestion
- Improved speed and reliability when ingesting large Inspect .eval files; uploading logs of multiple GBs now works
Rubrics and judges
- Bring your own API keys for running rubrics with OpenAI/Anthropic/Gemini models
- Choose a specific LLM when running rubrics
Release Date
August 17, 2025
Weekly Digest
Transcript ingestion
- Removed metadata schemas, so ingestion no longer requires predefined schemas
- More robust Inspect one-click ingestion: Docent used to silently drop metadata and scores when using our file upload or drag-and-drop ingestion; this is now fixed.
- Auto-reload agent run list after successful file upload
Tracing (now in public preview)
- Transcript groups: Ability to group transcripts in the Docent UI, with support for hierarchical relationships between groups.
- Incremental loading: Tracing data is now loaded into the collection as it is being run.
- Improved instrumentation: Auto-instrumentation is now more targeted and can also be manually configured.
- Improved stability: Various bug fixes, primarily related to supporting a wider range of models and formats.
- Documentation: Now publicly available.
Charts
- Export charts as PNG and CSV
- Resizable chart area
Misc
- Upgraded to Claude 4 Sonnet from Claude 3.7 Sonnet for rubric evaluation
- Internal infrastructure improvements
Release Date
August 10, 2025
Weekly Digest
Rubrics and judges
- Rubric versioning: Edits to rubrics are now versioned, allowing you to compare results to past iterations
Charts & metadata
- Any numeric metadata usable as a chart measure: Previously, only scores were allowed
- Charting uses latest rubric version to avoid plotting stale judge results
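To make the "latest rubric version" behavior concrete, here is a small sketch of filtering judge results down to the newest version of each rubric before plotting. The record fields (`rubric_id`, `rubric_version`, `score`) and the function name are assumptions made for illustration and may not match Docent's data model.

```python
def latest_version_results(judge_results):
    """Keep only results produced by the newest version of each rubric."""
    latest = {}
    for result in judge_results:
        rubric_id = result["rubric_id"]
        version = result["rubric_version"]
        if rubric_id not in latest or version > latest[rubric_id]:
            latest[rubric_id] = version
    return [
        r for r in judge_results
        if r["rubric_version"] == latest[r["rubric_id"]]
    ]


# Made-up example data: rubric "r1" was edited once, rubric "r2" never.
judge_results = [
    {"rubric_id": "r1", "rubric_version": 1, "score": 0.2},
    {"rubric_id": "r1", "rubric_version": 2, "score": 0.7},
    {"rubric_id": "r1", "rubric_version": 2, "score": 0.9},
    {"rubric_id": "r2", "rubric_version": 1, "score": 0.5},
]

fresh = latest_version_results(judge_results)
print([(r["rubric_id"], r["rubric_version"], r["score"]) for r in fresh])
```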
Infrastructure
- Improved scalability of application serving infra
Release Date
August 3, 2025
Weekly Digest
Clustering & Charts
- Clustering metrics: Allow switching between counting total results vs. unique agent runs
- Interactive chart filtering: Click on chart elements to filter data dynamically
- Improved chart creation workflow: Auto-populate new charts with valid data
- Fixed chart sharing, which previously did not work
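The difference between the two clustering metrics is easy to see on a toy dataset: one agent run can contribute several results to the same cluster. The sketch below assumes each result carries a `cluster` label and an `agent_run_id`; both field names and the `cluster_counts` function are hypothetical.

```python
from collections import defaultdict

# Made-up results: run-1 appears twice in the same cluster.
results = [
    {"cluster": "reward hacking", "agent_run_id": "run-1"},
    {"cluster": "reward hacking", "agent_run_id": "run-1"},
    {"cluster": "reward hacking", "agent_run_id": "run-2"},
    {"cluster": "tool misuse", "agent_run_id": "run-3"},
]


def cluster_counts(results, unique_runs=False):
    """Count results per cluster, or distinct agent runs when unique_runs=True."""
    if unique_runs:
        runs = defaultdict(set)
        for r in results:
            runs[r["cluster"]].add(r["agent_run_id"])
        return {cluster: len(ids) for cluster, ids in runs.items()}

    totals = defaultdict(int)
    for r in results:
        totals[r["cluster"]] += 1
    return dict(totals)


print(cluster_counts(results))
print(cluster_counts(results, unique_runs=True))
```

Counting unique runs prevents a single verbose run from inflating a cluster's apparent size.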
(New) Tracing & instrumentation
This feature is still in private preview
- Auto-instrumentation of OpenAI and Anthropic Python SDKs
- Simple Python SDK with decorators and context managers for tracing agent scaffolds
- Auto transcript splitting logic that converts individual LLM calls into contiguous transcripts
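The changelog does not describe how transcript splitting works. One plausible heuristic, shown here only as an illustration and not as Docent's algorithm, is prefix matching: an LLM call continues an existing transcript if its input messages begin with that transcript's full history. The call format (`messages`, `response`) and the function name are assumptions.

```python
def split_into_transcripts(calls):
    """Group LLM calls into transcripts using a simple prefix-matching heuristic.

    Each call is a dict with "messages" (the prompt) and "response" (the reply).
    A call extends a transcript when its prompt starts with that transcript's
    messages so far; otherwise it starts a new transcript.
    """
    transcripts = []
    for call in calls:
        prompt, reply = call["messages"], call["response"]
        for transcript in transcripts:
            if prompt[: len(transcript)] == transcript:
                # Same conversation: replace history with the longer version.
                transcript[:] = prompt + [reply]
                break
        else:
            transcripts.append(prompt + [reply])
    return transcripts


def msg(role, content):
    return {"role": role, "content": content}


# Made-up calls: the first two belong to one conversation, the third to another.
calls = [
    {"messages": [msg("user", "List the files.")],
     "response": msg("assistant", "Running ls.")},
    {"messages": [msg("user", "List the files."),
                  msg("assistant", "Running ls."),
                  msg("user", "Now open README.")],
     "response": msg("assistant", "Opening README.")},
    {"messages": [msg("user", "Summarize the task.")],
     "response": msg("assistant", "The task is a refactor.")},
]

transcripts = split_into_transcripts(calls)
print([len(t) for t in transcripts])
```

This version takes the first matching transcript; a more careful variant would prefer the longest match, which matters when conversations share a common opening.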
Infrastructure
- VPN networking infrastructure for private access to Docent
- End-to-end PostHog analytics for internal observability
Release Date
July 27, 2025
Major Release
Weekly Digest
(New) Rubric search
- Complete rewrite of global search: Significantly better search and clustering reliability, especially across concurrent users
- New rubric schema for searches that enforces inclusion and exclusion rules for more precise results
(New) Quantitative visualization
- Charts for plotting agent runs, judge results, cluster centroids, and statistics. Allows for multi-dimensional slicing and filtering.
Data & ingestion
- Inspect logs in the web UI: Import & view Inspect logs directly from the website
- Embeddings at upload: Trigger embeddings computation as part of the upload flow
SDK & APIs
- Query agent runs via SDK for easier scripting/automation
Release Date
July 1, 2025
Major Release
0.1.0-alpha
Performance & scalability
- Performance improvements: Docent now smoothly supports 50-100K transcripts, up from 0.5-1K previously
- Faster search: Added re-ranking to global search, returning results ~5x faster
- Robust background processing: Searches now run as background batch jobs, resilient to frontend disconnections
(New) Multi-user collaboration
- Multi-user support and access controls: Share Docent results with others and control what they can do
- Self-hosting ready: Deploy once and support any number of users on the same instance
(New) Multi-agent
- Basic multi-agent transcript support: an initial transcript format for multiple agents
Bug Fixes & Stability
- Fixed numerous bugs and improved overall system reliability
Release Date
March 24, 2025
Major Release
Research Preview
The initial release of Docent, a system for analyzing and intervening on agent behavior.