MATS mentors are advancing the frontiers of AI alignment, transparency, and security

Gabriel Kulp
RAND, Adjunct Staff

Gabriel works with RAND on hands-on projects to build and test prototypes of secure compute infrastructure. He focuses on how to secure the most sensitive AI data centers against the most sophisticated current and future threats. Gabriel has also worked on hardware-enabled governance mechanisms (HEMs, at the intersection of GPU export control and hardware security) and on technical verification of agreements on the development and use of AI systems. He holds a master's degree in computer science and is pursuing a PhD in AI.

Focus:
Compute Infrastructure
Compute and Hardware, Security
Kyle Fish
Anthropic, Model Welfare Lead

Kyle works on model welfare at Anthropic. He previously co-founded Eleos AI Research, Telis Bioscience, and Alvea. 

Focus:
Empirical
Control, Model Organisms, Red-Teaming, Scheming and Deception
Julian Stastny
Redwood Research, Member of Technical Staff

Julian Stastny is a Member of Technical Staff at Redwood Research. He has a Master's in ML from the University of Tübingen, and was previously a researcher at the Center on Long-Term Risk.

Focus:
Empirical
Control, Model Organisms, Scheming and Deception, Strategy and Forecasting
Alex Cloud
Anthropic, Member of Technical Staff

Alex is a researcher at Anthropic. He is interested in developing principled methods to induce safety-relevant structure in models. Examples include gradient routing to localize learning updates in models and distillation for robust unlearning.

Previously, Alex conducted applied research in reinforcement learning at Riot Games AI and Amazon. He earned a PhD in Statistics from North Carolina State University, where he was advised by Eric Laber.

Focus:
Empirical
Interpretability, Agent Foundations
Milad Nasr
OpenAI, Research Scientist
Focus:
Empirical
Security, Adversarial Robustness, Dangerous Capability Evals

I research AI safety and alignment. Most recently, I was a research scientist at Google DeepMind. I completed my PhD at UC Berkeley's Center for Human-Compatible AI, advised by Stuart Russell. I previously cofounded FAR.AI, a 501(c)(3) research nonprofit that incubates and accelerates beneficial AI research agendas.

I develop AI alignment frameworks, stress-test their limits, and turn insights into methodology adopted across the field. I have established that chain-of-thought monitoring is a substantial defense when reasoning is necessary for misalignment, designed practical metrics to preserve monitorability during model development, shown that obfuscated activations can bypass latent-space defenses, and developed StrongREJECT, a jailbreak benchmark now used by OpenAI, US/UK AISI, Amazon, and others.

Focus:
Empirical
Control, Dangerous Capability Evals, Red-Teaming, Monitoring, Model Organisms, Safeguards
Krishnamurthy Dvijotham (Dj)
Google DeepMind, Senior Staff Research Scientist

Krishnamurthy (Dj) Dvijotham is a senior staff research scientist at Google DeepMind, where he leads efforts on the development of secure and trustworthy AI agents. He previously founded the AI security research team at ServiceNow Research and co-founded the robust and verified AI team at DeepMind. His past research has received best paper awards at leading AI conferences, most recently at ICML and CVPR 2024. His research led to the framework used for AI security testing at ServiceNow and has been deployed in several Google products, including the Android Play Store, YouTube, and Gemini.

Focus:
Empirical
Dangerous Capability Evals, Adversarial Robustness, Security, Red-Teaming, Scalable Oversight
Dan Mossing
OpenAI, Member of Technical Staff

I lead the interpretability team at OpenAI. I am most interested in simple, practical interpretability approaches that are targeted at making models safer. In a previous life, I worked as a neuroscientist.

Focus:
Empirical
Interpretability
Jacob Merizian
UK AISI, Research Scientist, Workstream Lead

I work at the UK AI Security Institute. In the past, I’ve done research in high-performance computing, language model pretraining, interpretability, and hardware-enabled governance.

Focus:
Empirical
Control, Dangerous Capability Evals
Lee Sharkey
Goodfire AI, Principal Investigator

Lee Sharkey is a Principal Investigator at Goodfire. His team has focused on improved interpretability methods, including parameter decomposition methods such as Attribution-based Parameter Decomposition and Stochastic Parameter Decomposition.

Previously, Lee was Chief Strategy Officer and cofounder of Apollo Research, and a Research Engineer at Conjecture, where he worked on sparse autoencoders as a solution to representational superposition. Lee’s past research includes “Goal Misgeneralization in Deep Reinforcement Learning” and “Circumventing interpretability: How to defeat mind-readers.”

Focus:
Empirical
Interpretability
Cody Rushing
Redwood Research, Member of Technical Staff

Cody Rushing is a Member of Technical Staff at Redwood Research. He studied CS at UT Austin before attending MATS in 2023.

Focus:
Empirical
Control, Model Organisms, Scheming and Deception, Strategy and Forecasting
Micah Carroll
OpenAI, Member of Technical Staff

Micah Carroll is a Member of Technical Staff on OpenAI's safety team, interested in AI deception, scalable oversight, and monitorability. Micah is on leave from a PhD at UC Berkeley, where he focused on AI alignment with changing and influenceable humans. In particular, he worked on AI manipulation emergent from RL training and on the effects of algorithmic choices in recommender systems.

Focus:
Empirical
Control, Model Organisms, Red-Teaming, Scheming and Deception
Robert Kirk
UK AISI, Research Scientist

Robert is a research scientist and the acting lead of the alignment red-teaming sub-team at UK AISI. The team's focus is stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. Before that, he worked on misuse research, focusing on evaluations of safeguards against misuse and on mitigations for misuse risk, particularly in open-weight systems. He completed his PhD at University College London on generalisation in LLM fine-tuning and RL agents in January 2025.

Focus:
Empirical
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Alex Souly
UK AISI, Researcher

Alex Souly is a researcher on the Red Team at the UK AI Security Institute, where she works on the safety and security of frontier LLMs. She has contributed to pre-deployment evaluations and red-teaming of misuse safeguards and alignment (see the Anthropic and OpenAI blog posts), and has worked on open-source evals such as StrongREJECT and AgentHarm. Previously, she studied Maths at Cambridge and Machine Learning at UCL as part of the UCL DARK lab, interned at CHAI, and in another life worked as a software engineer at Microsoft.

Focus:
Empirical
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Fin Moorhouse
Forethought, Researcher

Fin Moorhouse is a researcher at Forethought. Previously, he was a researcher at the Future of Humanity Institute and at Longview Philanthropy, and studied philosophy at Cambridge.

Focus:
Policy and Strategy
AI Welfare, Strategy and Forecasting, Policy and Governance

Adrià is an independent researcher focused on open-source self-alignment, self-exploration, and reproducible inference. Previously, he was a Research Scientist at FAR.AI, where he reverse-engineered a recurrent neural network that plans. His previous interpretability work includes measuring progress in interpretability with InterpBench, Automatic Circuit Discovery, and Causal Scrubbing. Before that, he worked at Redwood Research on neural network interpretability. He holds a PhD from the University of Cambridge, where he worked on Bayesian neural networks.

Focus:
Empirical
AI Welfare, Scalable Oversight, Compute and Hardware
Eric Winsor
UK AISI, Research Engineer
Focus:
Empirical
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Romeo works on forecasting detailed AI scenarios and developing policy recommendations at the AI Futures Project, focusing primarily on compute and security forecasting. Previously, he was an IAPS Policy Fellow and earned a concurrent master's in Computer Science from Harvard with a systems and hardware focus.

Focus:
Policy and Strategy
Strategy and Forecasting, Policy and Governance
Patrick Butlin
Eleos AI, Senior Research Lead

I am a philosopher of mind and a researcher at Eleos AI, where I work on AI consciousness, agency and welfare. Before joining Eleos, I worked at the Future of Humanity Institute and Global Priorities Institute in Oxford. I'm interested in projects including purely philosophical work on the grounds of moral status; research drawing on cognitive science to gain a mechanistic understanding of sentience and agency; and empirical studies that can shed light on welfare-relevant features in AI.

Focus:
Theory
AI Welfare

Frequently asked questions

What is the MATS Program?
Who are the MATS Mentors?
What are the key dates of the MATS Program?
Who is eligible to apply?
How does the application and mentor selection process work?