Help Fund Scalable Democratic Oversight of AI

Transluce is a non-profit AI lab working to ensure that AI oversight scales with AI capabilities. This means developing novel automated oversight tools and putting them in the hands of AI evaluators, companies, governments, and civil society. Reliable, independent evaluations of AI systems are key to both safety and innovation, and Transluce is building tools to make such evaluations practical at scale and across domains.

So far, we have built scalable systems for monitoring AI agents, testing their behaviors, and interpreting their inner workings, and we have used these to study issues like sycophancy, self-harm, and reward hacking. Our systems have helped build and improve popular agent evaluations like HAL and SWE-bench, been used by developers to align frontier models like Claude 4, and helped governments evaluate risks to public safety. We have been publicly endorsed by leading researchers from across the field, including Wojciech Zaremba (OpenAI co-founder), Ethan Perez (Anthropic), and Percy Liang (Stanford).

This blog post gives an overview of Transluce's efforts to create scalable oversight of AI and describes our current $11 million fundraising goal for this giving season. This funding will allow us to build new AI evaluation platforms and methods, advance public accountability for AI systems, and apply our research to a range of pressing AI risks, from manipulation and deception to mental health and child safety. Donations of all sizes help inform our approach and advance our efforts.

We enjoy talking to prospective donors, so please reach out at info@transluce.org with any questions!

Our Theory of Change

The problem: oversight is unreliable and siloed. Today's complex AI systems are difficult to understand—even experts struggle to predict behaviors like blackmail, sycophancy, or spiral personas. At the same time, most analyses of AI systems are done behind closed doors by the same labs deploying them, presenting inherent conflicts of interest. This combination is dangerous: we face technologies whose behavior we cannot reliably forecast, and we lack trusted public channels to even assess the risks.

The solution: use AI to understand AI, in public. We need scalable technology for understanding and overseeing AI systems, backed by AI itself, so that oversight can scale with AI capabilities and adoption. This technology should be developed in public, so that it can be publicly vetted and drive third-party accountability. We address this in two parts:

  1. Scaling AI oversight. A fundamental takeaway from the last decade of AI is the bitter lesson: simple methods that leverage compute and data outperform specialized, hand-crafted methods. This is driving rapid progress in AI capabilities, but how do we leverage the bitter lesson to also drive understanding and oversight of AI?

    Our key insight is that AI systems generate vast amounts of data—agent transcripts, diverse behaviors across prompts, neuron activations, etc. This scale overwhelms humans, but we can use it to train AI-backed tools. By training AI agents to understand this data and explain it to humans, we build specialized, superhuman AI assistants. Rather than being broadly superhuman, our tools are superhuman specifically at helping humans oversee other AI systems, for instance by catching strange or unwanted behaviors at scale, uncovering such issues before models are deployed, and informing reliable fixes that avoid similar problems in the future.

  2. Advancing public accountability. Society cannot rely on AI developers to grade their own homework. We need a robust ecosystem of independent actors capable of systematically understanding AI systems and providing accountability.

    Our open-source, AI-backed tools provide a technology stack that powers this public accountability. Independent evaluators can use it to oversee frontier AI systems with state-of-the-art, publicly vetted tools, which increases quality, reduces conflicts of interest, and helps drive adoption of best practices.

In summary, we leverage the bitter lesson to create specialized, superhuman systems for understanding and overseeing AI, and use this technology to drive an industry standard for trustworthy oversight. We build this technology using philanthropic funding, so that we can design it openly and in the public interest.

Impact

Since Transluce launched in October 2024, we have:

Built and scaled an agent evaluation platform. Docent, our framework for scalably analyzing agent behavior, has been used by over 25 organizations, including frontier AI labs (Anthropic, DeepMind, Thinking Machines), third-party safety orgs (METR, Redwood, Apollo, Palisade), government evaluators, AI start-ups (Penrose), large enterprises (Bridgewater Associates), and academic labs (Princeton, UIUC). It was also used as part of Claude 4's pre-deployment safety analysis and is integrated with SWE-bench, one of the most-used AI agent benchmarks.

Developed novel methods for investigating AI behaviors. We introduced the idea of trainable investigator agents and developed a new reinforcement learning method, PRBO, for eliciting unexpected but realistic low-probability behaviors from language models. Our method discovered new behaviors in open-weight models, including a propensity to recommend self-harm to users. We used investigators trained with PRBO to conduct a demonstration audit of an open-weight frontier model for behaviors specified by policy experts.

Conducted high-impact red-teaming. Our pre-deployment testing of OpenAI's o3 model uncovered a persistent tendency for o3 to fabricate actions it claimed to have taken to fulfill user requests. Our work received coverage in TechCrunch, ArsTechnica, Yahoo News, and TechInAsia, and our report was viewed more on social media than the official o3 release. We have also shown that specialized small models can red-team frontier models, automatically uncovering CBRN jailbreaks for all frontier systems we tested.

Advanced the state of the art for interpreting model internals. Our public tools include natural-language neuron descriptions (state-of-the-art description quality, 12,000+ downloads) and the Monitor interpretability interface. We have also developed new foundational methods for interpretability, including training models to directly verbalize the meaning of their internal activations, uncovering latent inferences about users, and discovering sparse neuron circuits.

Strengthened the evaluator ecosystem. We established the AI Evaluator Forum, which brings together leading researchers to set shared standards and foster a healthy ecosystem of independent evaluators operating in the public interest. Together, we released AEF-1, a new standard for ensuring minimum levels of access, transparency, and independence for third-party evaluations. Transluce also provides technical expertise to help policymakers, insurers, and enterprises address AI risks using independent evaluations, including serving as a contractor to the EU AI Office.

What Funding Will Enable

Our targeted $11 million raise would be allocated roughly as follows:

  • 60% to scale our existing research, including hiring new research engineers, expanding our compute budget 10x, and building a dedicated infrastructure team.

  • 15% to apply our methods to evaluating model releases and risks of acute public concern, from deception and manipulation to mental health and child safety.

  • 10% to governance and public accountability, including establishing best practices and standards for evaluation, expanding the evaluator ecosystem, and providing technical analysis to governments and policymakers.

  • 10% to kickstart new efforts such as fine-tuning generalization (e.g., emergent misalignment, character training) or multi-agent failures (e.g., AI-induced psychosis, parasitic AI, implicit/explicit collusion across AI systems).

  • 5% to overhead (e.g., office, operations, legal).
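
At the full $11 million target, these shares work out to roughly $6.6 million for scaling existing research, $1.65 million for applied evaluations, $1.1 million each for governance and new efforts, and $550,000 for overhead.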

Even small donations contribute meaningfully to our progress, for example by funding one additional researcher or bringing one new cluster node online.

Growth Plan

Transluce began operations in late 2024, initially supported entirely by donations. We have since begun earning revenue from both private companies and governments; this revenue accounts for approximately 20% of our funding this year and will likely make up a larger share in future years.

Revenue model. Over time, we project that earned revenue will fund a significant portion of Transluce's work. Our revenue model is open core—our core oversight stack remains public and open source to keep it accessible and vettable, while hosted services and advanced features (e.g., specialized AI tools) generate revenue. Earned revenue will help subsidize our public interest work, such as safety audits, publicly motivated R&D, and technical support to governments and other non-profits.

Role of donations. Donations help us in three ways:

  • In the short term, they give us enough runway to grow our earned revenue into a sustainable business.
  • In the long term, they help subsidize public service work, allowing us to consistently prioritize the public interest, even when commercial incentives diverge.
  • On all time horizons, they accelerate our growth, allowing us to attract senior technical talent and make longer-term bets.

Donor support today ensures that Transluce can both move quickly and remain aligned with its public mission.

Transluce is a 501(c)(3) nonprofit organization. Our registered name is Clarity AI Research Inc.

To donate, contact us at info@transluce.org, and we'll set up a donation method that works best for you.

Ways to Give:

  • Bank Transfer (we accept wire transfers in multiple currencies)
  • Check
  • Donor-Advised Funds
  • Stock Donations (we accept stock donations through our Vanguard brokerage account)