Sarah Schwettmann, Jacob Steinhardt

We build scalable technology for AI understanding and oversight.

Stream overview

We’re building scalable, AI-backed systems for analyzing, testing, and interpreting AI agents, and using these to study behaviors like sycophancy, self-harm, and reward hacking. We’re looking for scholars who want to help us push forward this work.

Some concrete projects include: scalable, end-to-end tools for interpretability and behavior elicitation; creating robust LLM judges for Docent; scalable search and retrieval for large agent transcripts.

Mentors

Jacob Steinhardt
Transluce, Co-Founder, CEO
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I am an Assistant Professor of Statistics and EECS at UC Berkeley, where I’m also part of BAIR and CLIMB. I am also Founder & CEO of Transluce, a non-profit research lab building open, scalable technology for understanding frontier AI systems.

Sarah Schwettmann
Transluce, Co-Founder, Chief Scientist
SF Bay Area
Interpretability, Monitoring, Dangerous Capability Evals

I’m a Research Scientist in MIT CSAIL with the MIT-IBM Watson AI Lab. I did my PhD in Brain and Cognitive Sciences at MIT, as an NSF Fellow working with Josh Tenenbaum and Antonio Torralba. My work investigates representations underlying intelligence in artificial (and previously, biological) neural networks.

Mentorship style

You will work closely with a mentor through recurring meetings (group and individual) and Slack.

Representative papers

https://transluce.org/pathological-behaviors 

https://transluce.org/observability-interface 

https://transluce.org/docent and https://transluce.org/introducing-docent 

Scholars we are looking for

We're looking for strong, experienced software engineers or talented researchers who can hit the ground running and iterate quickly.

ML experience is a bonus but not required.

Scholars will probably work with collaborators from the stream.

Project selection

We will talk through project ideas with each scholar.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars also organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and trips to ACX meetups.