In March 2025, we launched a technical preview of Docent, our tool for precisely analyzing complex AI agent behaviors. We’ve spent the last several months iterating on Docent with a handful of early users. Today, we're excited to share that Docent is now in public alpha, available for anyone to try with their own data!
- Check out our new landing page
- Try Docent at docent.transluce.org
- Read the docs at docs.transluce.org
- Join our Slack community with this invite
- Schedule a call with us on our calendar
What is Docent?
AI agents exhibit complex behaviors in their transcripts, which are often lengthy and hard to understand. We built Docent to help researchers and practitioners scalably answer questions they have about agent behavior. For example, to answer a question like "is my model reward hacking," Docent interactively converts the user's underspecified question into a precise rubric for the behavior, then provides a quantitative measurement of how often it occurs.

We encourage readers to visit the Docent landing page to learn more, or read the quickstart guide if you just want to dive in. Please try Docent out and let us know what you think!
The rest of this post explains what’s new since the original technical preview.
Updates since the technical preview
Docent is now hosted at docent.transluce.org, where any user can upload data for analysis. Getting data into Docent takes just a few lines of code: either use our Python tracing library to wrap your LLM API calls, or use our native integration with Inspect.
import openai

from docent.trace import agent_run, initialize_tracing

# Start tracing; runs are recorded to the named collection
initialize_tracing("my-collection")

client = openai.OpenAI()

# Each call to a decorated function is captured as one agent run
@agent_run
def analyze_document(document_text: str):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": f"Analyze this document: {document_text}",
            }
        ],
    )
    return response.choices[0].message.content
We’ve also made two major categories of updates based on early user feedback.
The first category is general product scaling: enabling access control and multi-user collaboration, raising transcript limits by two orders of magnitude, and reducing friction in data ingestion.
The second category reflects a better understanding of the key difficulties in analyzing complex AI agents. Often, the first challenge in analyzing a complex behavior is helping a user understand what they even want. For instance, a user who cares about “reward hacking” might have in mind any (or all, or none) of the following, depending on the specific context: deleting tests to game the grader, stuffing outputs with nonsensical jargon to confuse a weak LLM grader, or completing tasks the user never asked for.
To help with this, we've been prototyping a rubric refinement tool that interactively helps users pinpoint what they're looking for in a data-driven way.
We also introduced chart visualizations that help users quantitatively measure the behaviors they find.
Check out our regularly updated changelog for a comprehensive list of updates.
Looking ahead
Open source: In March, we made a commitment to open source the code behind Docent. This has taken longer than expected: our codebase has been evolving rapidly in response to user feedback, and it took us several months to settle on a reliable architecture, implement maintainable engineering practices, optimize performance, and build deployment infrastructure. Now that Docent is in public alpha, we’re almost there, and you should expect an update on that front soon. In the interim, we’ve done our best to honor our commitment by providing codebase access to early testers.
Alpha status: We still consider Docent to be in alpha, meaning that you should expect rapid improvements to core features, application stability, and performance over the next few months. Please continue sharing your feedback, bug reports, and feature requests!
Acknowledgments
Thanks to all our private alpha testers: Anthropic, Apollo Research, Bridgewater AIA Labs, Epoch AI, Google DeepMind, HAL (Princeton Language and Intelligence), MATS Research, METR, Palisade Research, Penrose, Redwood Research, Thinking Machines Lab, UIUC DDKang Lab, as well as other unlisted organizations and individuals.
Please reach out to join our community of alpha testers and receive white-glove support!