Dan Murfet, Jesse Hoogland

We study applications of singular learning theory (SLT) to AI safety, with a focus on interpretability and alignment. Ideal candidates have a strong technical background in mathematics, physics, computer science, or biology, and aren't afraid to get their hands dirty with ML experiments. We don't expect you to have deep expertise in SLT, but basic familiarity will help.

Stream overview

We expect to support research projects applying singular learning theory (SLT) to AI safety, especially interpretability and alignment.

We support both empirical projects (e.g., applying Bayesian Influence Functions to study or improve unlearning) and theoretical projects (e.g., proving convergence properties of SGMCMC-based sampling methods). Most projects will involve both components; your specific focus will depend on your background and interests.

On the empirical side, a typical project will involve the following SLT-based tools:

Local Learning Coefficients (LLCs) and refined LLCs, which capture model complexity and generalization in singular settings.
Susceptibilities, which measure how model components respond to perturbations/distribution shifts.
Bayesian Influence Functions (BIFs), which measure how model predictions respond to perturbations/distribution shifts.

With these tools, we investigate diverse areas, such as the following:

Developmental interpretability: How do models acquire structure over training?
Spectroscopy ("circuit discovery"): How can we identify which structures are responsible for model generalization?
Unlearning: Can we erase structure/knowledge in a targeted way?
Emergent misalignment: Can we predict the fraction of harmful code in the training data at which emergent misalignment appears?
Elicitation: Can we design data perturbations that elicit harmful outputs?

Mentors

Daniel Murfet (Dan)

Timaeus

Director of Research

SF Bay Area

—

Interpretability

Model Organisms

Red-Teaming

Safeguards

Scheming and Deception

Until recently, I was a professional mathematician at the University of Melbourne, where I worked on algebraic geometry, mathematical logic, some aspects of mathematical physics, and most recently statistical learning theory. In early 2025, I left academia to direct AI safety research at Timaeus.

Jesse Hoogland

Timaeus

Executive Director

SF Bay Area

—

Interpretability

Model Organisms

Red-Teaming

Safeguards

Scheming and Deception

Jesse is the co-founder and executive director of Timaeus, an AI safety non-profit researching applications of singular learning theory (SLT) to AI safety, particularly for interpretability and alignment. Jesse has a background in physics and leads several research projects at Timaeus, in addition to working on outreach and operations.

Mentorship style

The team will meet weekly with both mentors. Separately, you will meet 1-on-1 with at least one of the mentors every other week. Asynchronous communication happens on an internal Discord server. We expect you to schedule additional pair-programming/debugging calls with other people on the team as needed.

We'll also help you work through research obstacles outside of meetings.

Representative papers

An overview of our research agenda:

You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

SLT for Data Attribution:

SLT for Interpretability:

Foundations of SLT:

Scholars we are looking for

If you're interested in working on the empirical side, you should have prior experience with ML engineering (at least at the level of a program like ARENA) and prior research experience (potentially in a field outside of ML). Prior experience designing and running ML experiments, or prior research specifically in AI safety, is a bonus.

If you're interested in working on the theoretical side, you should have prior research experience in a relevant field like mathematics, theoretical physics, or theoretical computer science.

Please make sure that your background and interests are clearly described in your application. By default, we'll be looking for evidence of research ability in the form of publications.

We do not expect you to already be familiar with SLT, but if you pass the first round, please prepare by doing some background reading (see timaeus.co/learn).

You will most likely be working in a team with 2-4 other people, led by one of this stream's mentors. In some cases, scholars also work on their own or with their own collaborators.

Project selection

The mentors will talk through project ideas with each scholar and suggest several options to choose from.