Streams in this track involve hands-on research that uses machine learning experiments to understand and improve model safety, including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness. This is the largest track in the program and is defined by its methods rather than any single research agenda. If your primary tool is ML engineering, this is your track.
The track is defined by its methodology more than by any single research agenda. Fellows run ML experiments to understand and improve the safety properties of frontier models, with work spanning interpretability, AI control, scalable oversight, evaluations, red-teaming, robustness, and model organisms of misalignment. The unifying thread is that progress comes from getting hands on real models (training, probing, fine-tuning, measuring) rather than reasoning from first principles alone. This is the largest track in the program and the most common entry point into technical AI safety research.
We are looking for fellows whose primary tool is ML engineering, broadly construed. The essential requirement is the ability to design and run experiments on language models or other deep learning systems and iterate quickly on the results. In practice that usually means strong Python (with and without AI coding tools), comfort with the infrastructure around running models at moderate scale, and enough research taste to know which experiments are worth running. Mission alignment matters: fellows should be able to say why a given line of empirical work meaningfully reduces frontier risk, not just whether it yields a successful publication. Educational background and seniority are weighted lightly here relative to other tracks. Past cohorts have included strong fellows ranging from undergraduates to senior industry researchers.
Fellows are matched to mentors based on fit, and projects are scoped to produce concrete artifacts by program end: papers, evaluation suites, open-source tooling, or technical reports. Target audiences include safety and alignment teams at frontier labs, governments and other evaluation organizations, and the broader ML research community.
GDM stream focused on scheming risk, AI control, monitoring, monitorability, and loss-of-control evaluations. Probably running in-person in London.
I'm generally quite hands-off. I propose projects that I think matter for AGI safety and are tractable, to set scholars up for success. I then expect scholars to fully own the project, and update / consult me as needed.
By default we'd meet once a week to discuss the project for 30 min - 1 hour. I see my role as giving feedback on the project direction, stress-testing or advising on design / prioritisation decisions, and occasionally suggesting experiments or methodological improvements (which you should treat as suggestions / advice, not orders!).
You can also book ad-hoc meetings with me, ping me on Slack, or send me docs / paper drafts for review.
I also offer scholars a 30-minute meeting once a month to discuss career stuff, skill-building, feedback on their progress, or anything else.
Preferred technical skills:
I'll propose ~5 projects for scholars to choose from. I am also open to scholar-proposed projects if they are well articulated, promising, and align with my research interests.
This stream will focus on the science and development of model evaluations, especially monitorability and alignment evals.
I'll meet with scholars 2x/week each. I'll also be generally available async and potentially for code review.
Various profiles could be a good fit.
Wanted:
Some of the following would be great but not essential:
I'll provide a list of possible projects to pick from, and talk through the options before making a decision.
Scholars can also suggest their own projects.
Research papers (technical governance or ML) related to evaluating and mitigating dangerous AI capabilities, with a focus on what's actionable and relevant for AGI companies
I like to get daily standup messages about progress that has been made on the project, and I'm happy to provide some quick async feedback on new outputs. I'll also have weekly meetings. I'm based at Constellation in Berkeley.
Good writers/researchers who can work independently and autonomously! I'm looking for scholars who can ship a meaningful research output end-to-end and ideally have prior experience in writing relevant papers.
I may assign a project, have you pick from a list of projects, or talk through project ideas with you.
I'm interested in empirical projects that improve our ability to evaluate model capabilities or enable us to understand or evaluate model monitorability. An ideal project culminates in a research output (conference/arXiv paper or research blogpost with artifacts).
Time commitments: I expect not to be able to spend more than 5 hours in any week.
Meetings: I expect to have project meetings weekly for about an hour, where we chat about your results from last week, the planned next steps, and any blockers or uncertainties. We'll also have a monthly check-in about broader progress towards the project's overall goals.
Help outside of meetings: I am available to provide some help most weeks outside of the meeting, but by and large I expect mentees to be self-directed and self-sufficient in solving problems.
An ideal mentee has a strong AI research background (software engineering is a plus). It's important that they are self-motivated and can make weekly progress with little intervention. Mentees interested in working on projects that are not concretely scoped should be able to write well-scoped project proposals with realistic planned milestones and deliverables. Evidence of successful projects of this kind would be very helpful in evaluating this.
A mentee can be a PhD student and they can work on a paper that will be part of their thesis.
I will talk through project ideas with the scholar.
Making society safe from AI doesn't just mean making safe AI: we're figuring out how to uplift human collective intelligence, manage a highly multiagent world, improve foresight and institutional competence, ideally learning how to make best positive use of frontier AI systems as we go. FLF has a small, sharp team of researchers with a wide network, and we're looking to nurture new and missing approaches to minimising large-scale risks while steering to a flourishing future.
Willing to devote a few hours per week to this - I'll keep a 30m or 1h slot available weekly, and interact on Slack circa daily. Some closer projects might be much more interactive.
Depends a lot on direction. Ideally be able to make proposals and dig into things somewhat independently. Be good at explaining your thinking, and able+willing to teach me things!
For collective intelligence/human reasoning, I'd usually want someone very familiar with software production, at least skilled in software development or in product management and prototyping. Other candidates with great vision can succeed here if they're able to work with complementary talent to get things going.
For foresight, any of: polymathic/multi-STEM/futurism background, deep expertise in bio and/or AI, natsec experience or connections, unusual writer/game dev talent, safety engineering background, other background that you think I might want to hear about.
For multiagent accountability: law, economics, politics, history, or a combination, plus some familiarity with AI and agents.
I'll ask for interests and (if you have them) a proposal or two right away. We'll spend the first week or two iterating on that, discussing other options, and maybe trying out little experiments. Likely we'll pick a direction then, but it's also fine if we pivot later.
Projects on this stream cluster into a few broad areas from the empirical track: scalable oversight, AI control, monitorability and interpretability, adversarial robustness, and security.
Most fellows will work closely with one or two mentors on something that fits into the mentors' ongoing research. The list of mentors above is tentative.
Essential:
Preferred (at least one of):
Projects in this stream will be on AI welfare and moral status; more specifically, on what it takes to be a moral patient and how we can determine whether AI systems meet the conditions. I'm looking for applicants who have ideas about these topics and are motivated to explore them in more detail.
By default, scholars will meet with me online for 1hr/week and I will respond to questions on email/slack.
Scholars should have the following characteristics:
I will talk through project ideas with the scholar.
In this stream we will explore extensions and implications of our discovery that neural networks pretrained on next-token prediction represent belief-state geometry in their activations. We will build on this fundamental theory of neural network representations in order to discover what AI systems are thinking, and understand their emergent behaviors.
Early in the program, Paul and Adam will meet in person with scholars to help them get up to speed on the theoretical and technical background needed to understand and contribute to our framework. Subsequent weekly meetings with mentees aim to answer questions, unblock research, explore project ideas, and give feedback and suggestions on research.
The project can leverage applicants’ strengths in mathematical modeling and/or ML engineering. We welcome highly driven and relatively autonomous researchers who would like to benefit from our mentorship while taking the lead on a relevant project of their choice. The ideal scholar can move fast and has experience in either research (e.g., a PhD in any field) or software/ML engineering.
We will talk through project ideas with the scholar.
The MATS Program is a 10-week research fellowship designed to train and support emerging researchers working on AI alignment, transparency and security. Fellows collaborate with world-class mentors, receive dedicated research management support, and join a vibrant community in Berkeley focused on advancing safe and reliable AI. The program provides the structure, resources, and mentorship needed to produce impactful research and launch long-term careers in AI safety.
MATS mentors are leading researchers from a broad range of AI safety, alignment, governance, field-building and security domains. They include academics, industry researchers, and independent experts who guide scholars through research projects, provide feedback, and help shape each scholar’s growth as a researcher. The mentors represent expertise in areas such as:
Key dates
Application:
The main program will then run from September 28th to December 4th, with the extension phase for accepted fellows beginning in December.
MATS accepts applicants from diverse academic and professional backgrounds - from machine learning, mathematics, and computer science to policy, economics, physics, cognitive science, biology, and public health, as well as founders, operators, and field-builders without traditional research backgrounds. The primary requirements are strong motivation to contribute to AI safety and evidence of technical aptitude, research potential, or relevant operational experience. Prior AI safety experience is helpful but not required.
Applicants submit a general application, applying to one or more tracks (Empirical, Theory, Strategy & Forecasting, Policy & Governance, Systems Security, Biosecurity, Founding & Field-Building).
In stage 2, applicants apply to streams within those tracks and complete track-specific evaluations.
After a centralized review period, applicants who advance undergo additional evaluations, depending on the preferences of the streams they've applied to, before doing final interviews and receiving offers.
For more information on how to get into MATS, please look at this page.