Dan Murfet, Jesse Hoogland

We study applications of singular learning theory (SLT) to AI safety, with a focus on interpretability and alignment. Ideal candidates have a strong technical background in mathematics, physics, computer science, or biology, and aren't afraid to get their hands dirty with ML experiments. We don't expect you to have deep expertise in SLT, but some familiarity will help.

Stream overview

We expect to support research projects applying singular learning theory (SLT) to AI safety, especially interpretability and alignment. 

We support both empirical projects (e.g., applying Bayesian Influence Functions to study or improve unlearning) and theoretical projects (e.g., proving convergence properties of SGMCMC-based sampling methods). Most projects will involve both components: your specific focus will depend on your background and interests.

On the empirical side, a typical project will involve the following SLT-based tools:

  • Local Learning Coefficients (LLCs) and refined LLCs, which capture model complexity and generalization in singular settings (see the estimation sketch after this list).
  • Susceptibilities, which measure how model components respond to perturbations/distribution shifts.
  • Bayesian Influence Functions (BIFs), which measure how model predictions respond to perturbations of the training data.
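
For concreteness, here is a minimal sketch of how the first of these tools can be estimated: the LLC at a trained checkpoint is approximated by sampling from a tempered posterior localized at the trained weights w* (via SGLD) and comparing the expected loss under those samples to the loss at w*. This is a from-scratch PyTorch illustration under simplifying assumptions; the function `estimate_llc`, its hyperparameter defaults, and the (inputs, targets) dataloader convention are illustrative, not the exact setup used in our projects, where we rely on more careful estimators and diagnostics.

```python
# Minimal sketch: estimating the local learning coefficient (LLC) with SGLD.
# Assumes a PyTorch model, a per-batch loss function, and a dataloader that
# yields (inputs, targets). Names and hyperparameters here are illustrative.
import copy
import math

import torch


def batch_loss(model, loss_fn, batch, device):
    """Average loss of `model` on a single batch."""
    inputs, targets = (t.to(device) for t in batch)
    return loss_fn(model(inputs), targets)


def mean_loss(model, loss_fn, dataloader, device):
    """Average loss over the full dataloader (used for L_n at the center w*)."""
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in dataloader:
            inputs, targets = (t.to(device) for t in batch)
            total += loss_fn(model(inputs), targets).item() * len(targets)
            count += len(targets)
    return total / count


def estimate_llc(model, loss_fn, dataloader, n_samples,
                 num_steps=1_000, num_burnin=200,
                 epsilon=1e-4, gamma=100.0, beta=None, device="cpu"):
    """Estimate lambda_hat = n * beta * (E_w[L_n(w)] - L_n(w*)), where the
    expectation is over SGLD samples from a tempered posterior localized at w*."""
    if beta is None:
        beta = 1.0 / math.log(n_samples)  # a common default inverse temperature

    sampler = copy.deepcopy(model).to(device)
    center = [p.detach().clone() for p in sampler.parameters()]
    loss_at_center = mean_loss(sampler, loss_fn, dataloader, device)

    data_iter = iter(dataloader)
    sampled_losses = []
    for step in range(num_steps):
        try:
            batch = next(data_iter)
        except StopIteration:
            data_iter = iter(dataloader)
            batch = next(data_iter)

        loss = batch_loss(sampler, loss_fn, batch, device)
        grads = torch.autograd.grad(loss, list(sampler.parameters()))

        with torch.no_grad():
            for p, g, c in zip(sampler.parameters(), grads, center):
                # Gradient of the localized, tempered potential:
                #   n * beta * grad L_n(w)  +  gamma * (w - w*)
                drift = n_samples * beta * g + gamma * (p - c)
                noise = torch.randn_like(p) * math.sqrt(epsilon)
                p.add_(-0.5 * epsilon * drift + noise)

        if step >= num_burnin:
            sampled_losses.append(loss.item())

    expected_loss = sum(sampled_losses) / len(sampled_losses)
    return n_samples * beta * (expected_loss - loss_at_center)
```

The localization term gamma * (w - w*) keeps the chain near the trained solution, so the estimate reflects the local geometry of the loss landscape around w* rather than global structure.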

With these tools, we investigate diverse areas, such as the following:

  • Developmental interpretability: How do models acquire structure over training?
  • Spectroscopy ("Circuit discovery"): How can we identify which structures are responsible for model generalization?
  • Unlearning: Can we erase structure/knowledge in a targeted way?
  • Emergent misalignment: Can we predict the fraction of harmful code data at which emergent misalignment appears?
  • Elicitation: Can we design data perturbations that elicit harmful outputs?

Mentors

Daniel Murfet (Dan)
Timaeus, Director of Research
SF Bay Area
Interpretability, Model Organisms, Red-Teaming, Safeguards, Scheming & Deception

Until recently, I was a professional mathematician at the University of Melbourne, where I worked on algebraic geometry, mathematical logic, some aspects of mathematical physics, and most recently statistical learning theory. In early 2025, I left academia to direct research on AI safety at Timaeus.

Jesse Hoogland
Timaeus, Executive Director
SF Bay Area
Interpretability, Model Organisms, Red-Teaming, Safeguards, Scheming & Deception

Jesse is the co-founder and executive director of Timaeus, an AI safety non-profit researching applications of singular learning theory (SLT) to interpretability and alignment. Jesse comes from a background in physics and leads several research projects at Timaeus, in addition to being involved in outreach and operations.

Mentorship style

The team will meet weekly with both mentors. Separately, you will meet one-on-one with at least one of the mentors every other week. We conduct asynchronous communication through an internal Discord server. We expect you to schedule additional pair-programming/debugging calls with other people on the team as needed.

We'll help you work through research obstacles, including outside of scheduled meetings.

Scholars we are looking for

If you're interested in working more on the empirical side, you should have prior experience with ML engineering (at least at the level of a program like ARENA) and prior research experience (potentially in a field outside of ML). Prior experience designing and running ML experiments, or prior research specifically in AI safety, is a bonus.

If you're interested in working more on the theoretical side, you should have prior research experience in a relevant field like mathematics, theoretical physics, or theoretical computer science.

Please make sure that your background and interests are clearly described in your application. By default, we'll be looking for evidence of research ability in the form of publications. 

We do not expect you to already be familiar with SLT, but if you pass the first round, please prepare by doing some background reading (see timaeus.co/learn).

You will most likely be working in a team with 2-4 other people, led by one of this stream's mentors. In some cases, scholars also work on their own or with their own collaborators. 

Project selection

The mentors will talk through project ideas with each scholar and suggest several options to choose from.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.