The hardest problems in AI safety may not be solvable with experiments alone — they require the kind of foundational thinking in mathematics and philosophy that gives the field something solid to build on. Streams in this track work on agent foundations, formal models of trust and agency, mechanistic interpretability theory, and AI welfare. We're looking for researchers with deep mathematical maturity who want to tackle the problems that will still matter when AI systems are far more capable than they are today.
This track works on problems where the goal is durable conceptual progress rather than experimental results on today's models. The bet is that some of the hardest alignment questions, such as those about agency, optimization, trust, and the structure of cognition, will not be settled by scaling current empirical techniques, and that mathematical and philosophical foundations will matter when AI systems are much more capable than they are now. Projects here cover agent foundations, formal models of trust and agency, mechanistic interpretability theory, and AI welfare. Methods are largely paper, pen, and proof, though some work intersects with empirical interpretability or formal verification.
We are looking for researchers with serious mathematical maturity and a willingness to sit with problems where the right formalization is itself part of the work. Essential traits are research independence (theory questions are open-ended and require self-direction), fluency with formal reasoning (proofs, probability, type theory, dynamical systems, or an analogous area), and the ability to write clearly about abstract ideas. Strong candidates have come from mathematics, theoretical computer science, theoretical physics, formal philosophy, and economic theory, but background is less load-bearing than demonstrated ability to do hard formal work, ideally with written output we can read.
Fellows are matched to mentors based on fit and produce concrete artifacts by program end: papers, technical reports, conceptual write-ups, or formal results. Target audiences include the agent foundations and alignment theory communities, alignment-relevant teams at frontier labs, and academic venues for formal work. Theory outputs typically have longer time horizons than empirical ones, and we expect many fellows to continue refining results past the program.
In this stream we will explore extensions and implications of our discovery that neural networks pretrained on next-token prediction represent belief-state geometry in their activations. We will build on this fundamental theory of neural network representations in order to discover what AI systems are thinking, and understand their emergent behaviors.
Early in the program, Paul and Adam will meet in person with scholars to help them get up to speed on the theoretical and technical background needed to understand and contribute to our framework. Subsequent weekly meetings with mentees aim to answer questions, unblock research, explore project ideas, and give feedback and suggestions on research.
The project can leverage applicants’ strengths in mathematical modeling and/or ML engineering. We welcome highly driven and relatively autonomous researchers who would like to benefit from our mentorship while taking the lead on a relevant project of their choice. The ideal scholar can move fast and has experience in either research (e.g., a PhD in any field) or software/ML engineering.
We will talk through project ideas with scholars.
My MATS fellows will do philosophical thinking about multi-agent intelligence and how agents change their values. This will likely involve trying to explore and synthesize ideas from game theory, signaling theory, reinforcement learning, and other related domains.
I'll come meet scholars in person around 2 days a week on average. On those days I'll be broadly available for discussions and brainstorming. On other days scholars can message me for guidance (though I'd prefer to spend most of my effort on this during the in-person days).
My main criterion for selecting scholars will be clarity of reasoning.
I will talk through project ideas with the scholar.
The MATS Program is a 10-week research fellowship designed to train and support emerging researchers working on AI alignment, transparency and security. Fellows collaborate with world-class mentors, receive dedicated research management support, and join a vibrant community in Berkeley focused on advancing safe and reliable AI. The program provides the structure, resources, and mentorship needed to produce impactful research and launch long-term careers in AI safety.
MATS mentors are leading researchers from a broad range of AI safety, alignment, governance, field-building and security domains. They include academics, industry researchers, and independent experts who guide scholars through research projects, provide feedback, and help shape each scholar’s growth as a researcher. The mentors represent expertise in areas such as:
Key dates
Application:
The main program will then run from September 28th to December 4th, with the extension phase for accepted fellows beginning in December.
MATS accepts applicants from diverse academic and professional backgrounds - from machine learning, mathematics, and computer science to policy, economics, physics, cognitive science, biology, and public health, as well as founders, operators, and field-builders without traditional research backgrounds. The primary requirements are strong motivation to contribute to AI safety and evidence of technical aptitude, research potential, or relevant operational experience. Prior AI safety experience is helpful but not required.
Applicants submit a general application, applying to various tracks (Empirical, Theory, Strategy & Forecasting, Policy & Governance, Systems Security, Biosecurity, Founding & Field-Building).
In stage 2, applicants apply to streams within those tracks and complete track-specific evaluations.
After a centralized review period, applicants who advance undergo additional evaluations, depending on the preferences of the streams they applied to, before final interviews and offers.
For more information on how to get into MATS, please look at this page.