Alignment Research Center (ARC)

The Alignment Research Center is a small non-profit research group based in Berkeley, California, that is working on a systematic and theoretically grounded approach to mechanistically explaining neural network behavior. We are interested in scholars with a strong math background and mathematical maturity. If you'd be excited to work on the research direction described in this blog post, we'd encourage you to apply!

Stream overview

ARC will be supervising projects that fit into our technical research agenda, which is outlined here. Such projects could be:

  • Theoretical: for example, developing mechanistic algorithms for estimating some types of quantities (such as permanents of matrices or expected outputs of MLPs) in a way that competes with estimating those same quantities via sampling.
  • Philosophical: for example, fleshing out the picture for how heuristic arguments could be used for mechanistic anomaly detection in neural networks.
  • Empirical: for example, testing our algorithms in practice and using empirical evidence to iterate on those algorithms.
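To make the contrast between a mechanistic estimate and a sampling estimate concrete, here is a minimal toy sketch. It is not ARC's algorithm. It assumes a one-hidden-layer ReLU MLP with standard Gaussian inputs, a case where the expected output happens to have a closed form that can be read off the weights. All function names (relu_gaussian_mean, mechanistic_estimate, sampling_estimate) are hypothetical and chosen for illustration only.

```python
import math
import random


def relu_gaussian_mean(mu, sigma):
    """Exact E[max(0, z)] for z ~ N(mu, sigma^2)."""
    if sigma == 0.0:
        return max(0.0, mu)
    t = mu / sigma
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def mechanistic_estimate(W, b, a):
    """E[sum_j a[j] * relu(W[j] . x + b[j])] for x ~ N(0, I), from the weights alone.

    Each pre-activation W[j] . x + b[j] is Gaussian with mean b[j] and
    standard deviation ||W[j]||, so linearity of expectation gives the answer.
    """
    total = 0.0
    for w_j, b_j, a_j in zip(W, b, a):
        sigma = math.sqrt(sum(w * w for w in w_j))
        total += a_j * relu_gaussian_mean(b_j, sigma)
    return total


def sampling_estimate(W, b, a, n_samples, rng):
    """Monte Carlo baseline: average the network output over random inputs."""
    dim = len(W[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for w_j, b_j, a_j in zip(W, b, a):
            pre = sum(w * xi for w, xi in zip(w_j, x)) + b_j
            total += a_j * max(0.0, pre)
    return total / n_samples


# Toy network with random weights (sizes are arbitrary assumptions).
rng = random.Random(0)
dim, hidden = 4, 8
W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hidden)]
b = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
a = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

exact = mechanistic_estimate(W, b, a)
approx = sampling_estimate(W, b, a, 20000, rng)
print("from weights:", exact, "from sampling:", approx)
```

In this toy case the weight-based estimate is exact and costs one pass over the weights, while the sampling error shrinks only as the inverse square root of the number of samples. Deeper networks have no such closed form, so this sketch only illustrates the kind of comparison involved.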

Mentors

Eric Neyman
ARC, Researcher
SF Bay Area
Interpretability

Eric Neyman is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. Before joining ARC, he was a PhD student at Columbia University, where he researched algorithmic Bayesian epistemology.

George Robinson
ARC, Researcher
London
Interpretability

George Robinson is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. Before joining ARC, he was a PhD student at the University of Oxford, specialising in algebraic number theory. He lives in London and is a member of the London Initiative for Safe AI (LISA).

Jacob Hilton
ARC, Researcher, Executive Director
SF Bay Area
Interpretability

Jacob Hilton is a researcher and the executive director at the Alignment Research Center (ARC), a nonprofit working on the theoretical foundations of mechanistic interpretability. He previously worked at OpenAI on reinforcement learning from human feedback, scaling laws and interpretability. His background is in pure mathematics, and he holds a PhD in set theory from the University of Leeds, UK.

Michael Winer (Mike Winer)
ARC, Research collaborator
SF Bay Area
Interpretability

Mike Winer is a researcher at the Alignment Research Center (ARC), where he studies how mechanistic estimates can beat black-box techniques in toy setups. His background is in statistical physics, where he studies how many objects obeying simple rules can exhibit complex behaviors like magnetism, glassiness, or scoring 87% on GPQA.

Victor Lecomte
ARC, Researcher
SF Bay Area
Interpretability

Victor Lecomte is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. He holds a PhD from Stanford University, where he did research in computational complexity and other areas of theoretical computer science before pivoting to AI safety research.

Wilson Wu
ARC, Researcher
SF Bay Area
Interpretability

Wilson Wu is a researcher at the Alignment Research Center (ARC), which is working on a systematic and theoretically grounded approach to mechanistic interpretability. He has previously worked on alternative approaches to interpretability, including compact proofs and applications of singular learning theory.

Mentorship style

Scholars will work out of ARC's offices in Berkeley (though we might take a London-based scholar as well). Each scholar will meet with their mentor at least once a week for an hour, though 2-3 hours per week is not uncommon. Besides time with their official mentor, scholars will likely spend time working in collaboration with other researchers; a typical scholar will likely spend about 25% of their time actively collaborating or learning about others' research.

Scholars we are looking for

Essential:

  • Mathematical maturity and a math background at the level of a strong undergraduate math major at a top-20 university.
  • Ability to do productive research in the absence of formal problem statements: ARC researchers spend a lot of their time searching for the correct formalization of problems, or working on problems where the precise goal is not clear.
  • Potentially interested in joining ARC full-time by September 2027.

Preferred:

  • Background in ML theory, theoretical physics, and/or theoretical CS.
  • Basic ML engineering experience, such as running experiments on small neural nets.
  • Open to working at ARC from our office in Berkeley.

Scholars are encouraged to collaborate with anyone at ARC, including full-time researchers and other scholars/visiting researchers. Scholars are also welcome to collaborate with researchers outside of ARC, and are encouraged to do so when outside researchers have expertise that we could benefit from.

Project selection

Each scholar will be paired with the mentor that best suits their skills and interests. The mentor will discuss potential projects with the scholar, and they will decide what project makes the most sense, based on ARC's research goals and the scholar's preferences.

Most scholars will work on multiple projects over the course of their time at ARC, and some scholars will work with multiple mentors.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.