MATS Research

Participating in the MATS program has boosted my career, my research experience, my professional network and my confidence immeasurably. Great mentorship paired with high personal autonomy, a brilliant and ambitious community, and proper resource support set all the right conditions for learning by doing and getting things done fast. On top of that, I can hardly imagine a more important problem to work on than the one MATS scholars and mentors (and the AI safety community in general) are trying to solve. It is stupidly ambitious, and infinitely important. There is a strong sense of camaraderie and conviction among people in this field. Join us!

Naci’s work is focused on transparency and verification mechanisms for AI development and use. These mechanisms aim to enable international agreements on restraint and caution with AI, as well as democratic oversight over AI technologies and stakeholders. Naci has a master’s degree in physics from RWTH Aachen University and conducted research on AI hardware technology and supply chains under mentorship from Aaron Scher at SPAR and Mauricio Baker at MATS.

Jesse Hoogland

Timaeus

There's life pre-MATS and life post-MATS. It was the inflection point that set me up to become a technical AI safety researcher. I don't think there are other opportunities as good at getting early-career people integrated into AI safety. The in-person program was the most impactful and high-energy two months I've ever been a part of, and it's my number one recommendation to people considering work on AI safety.

Jesse Hoogland is the executive director of Timaeus, an AI safety research organization studying developmental interpretability and singular learning theory. He was a MATS scholar during MATS 3.0 and 3.1 in Evan Hubinger's Deceptive AI stream. During this period, he became interested in understanding how AI systems develop during training. This led to him helping to organize the SLT and Alignment conference and the DevInterp conference, which resulted in the developmental interpretability research agenda.

Marius Hobbhahn

Apollo Research

Apollo almost certainly would not have happened without MATS. One of the core reasons starting an organization is hard is that the founding members need to know and trust each other. It is often hard to systematically find people with similar agendas whom you also personally enjoy working with. MATS implicitly created such an environment because it enabled many of us to understand what everyone else was working on, get to know them personally, and see their research progress without having to commit to anything in particular.

Marius took part in the MATS Winter 2022/23 Cohort under the mentorship of Evan Hubinger (Anthropic). He published multiple pieces on mechanistic interpretability on LessWrong, including work on maximum data dimension and double descent. He is currently the CEO and Director of Apollo Research, a new London-based technical alignment organization. Previously, he did a Ph.D. in Machine Learning and conducted independent alignment research. Read more on his website.

Robert Krzyzanowski

Poseidon Research

Before MATS, I had a strong interest in alignment generally but few skillsets relevant to the frontier of research and little idea of how to get started. Directly thanks to MATS, I achieved: (1) a relatively complete understanding of the structure of the most important questions and associated communities in the AI safety space, (2) legible and significant research outputs that gave me the confidence to continue switching into a full-time career in the space, and (3) access to a broad base of present and future collaborators with a very wide range of perspectives. On this third point, the talent at MATS is fearsome and highly motivated to solve the problems. It would not surprise me at all if, when the dust settles and the grand project of alignment reaches eventual fruition, it becomes apparent that a double-digit percentage or more of the credit for the key problems and solutions belongs to MATS alumni.

I am an independent AI safety researcher currently focused on mechanistic interpretability and training process transparency.

Joseph Bloom

UKAISI

MATS is the best way to rapidly upskill and build a network in AI safety. I cannot recommend it enough.

Joseph Bloom is Head of Model Transparency at the UK AI Security Institute, where he works at the intersection of Loss-of-Control risks, monitorability and interpretability. His team recently published on Auditing Games for Sandbagging. Joseph was mentored by Neel Nanda in the MATS 5.0 cohort. Joseph was previously the maintainer of the TransformerLens package, authored the SAE Lens package and published A is for Absorption as a LASR mentor. Joseph has a double degree in Computational Biology and Statistics from the University of Melbourne.

Thomas Larsen

AI Futures Project

MATS helped me upskill in alignment at a >3x rate relative to the counterfactual, which was independently learning infra-Bayesianism because I liked math and didn't have an inside view on which parts of alignment were important. MATS caused me to develop a much deeper view of the alignment problem, and afterwards I felt able to focus on the most important parts of the problem and the biggest sources of confusion within myself.

Thomas took part in the Summer 2022 Cohort with John Wentworth and the Winter 2023 Cohort with Nate Soares. During this time, he wrote a detailed overview of AI Safety approaches. He continued his SERI MATS work at MIRI, before leaving to found the Center for AI Policy, an AI safety advocacy organization. He is currently a researcher at the AI Futures Project and a guest fund manager at the LTFF.

Nina Panickssery

Anthropic

Participating in MATS was a great way to rapidly upskill in AI safety research, learn about the field, and meet other researchers/collaborators. The environment/office space was also very thoughtfully designed to enable productivity.

Nina participated in the MATS Summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper Steering Llama 2 via Contrastive Activation Addition, which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist, and has mentored a number of SPAR and MATS cohorts working on LLM alignment projects.

Cody Rushing

Redwood Research

I endorse MATS strongly! MATS is my top recommendation for people looking to get into technical AI Safety research. The mentorship and community I received through MATS enabled me to quickly grow as a researcher and gave me the space to pursue useful research directions.

Cody Rushing is an undergraduate CS major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI Control. He is continuing this research into the fall.

https://starship006.github.io/

Quentin Feuillade-Montixi

Seldon Lab (Batch 2)

MATS was a life-changing experience. I met and was mentored by amazing people, and I learned so much in such a short amount of time. Looking back at who I was before this program, I can hardly recognize myself from 8 months ago. Even though I have no academic background, I felt listened to, empowered and supported in tackling the biggest challenges that I (and possibly we) have ever faced.

After MATS, I worked as a contractor for METR evaluating GPT-4 pre-release. I then co-founded PRISM Eval and created an automated red-teaming system (Behavior Elicitation Tool: https://github.com/qfeuilla/BehaviorEliciationTool) that I presented at the Paris AI Summit. I am now founding WeaveMind (https://weavemind.ai/) at Seldon Lab Batch 2.

Johannes Treutlein

Anthropic

MATS helped me get deeper into AI safety research by motivating me to get up to speed with current research and giving me access to mentorship from an expert in AI safety, as well as a smart and talented cohort and a large network of researchers. It also provided infrastructure such as office space in Berkeley and a generous stipend. SERI MATS worked as a matchmaker between Evan Hubinger and me and thus helped me get involved in his projects, which would have been harder to do otherwise. I feel like I have developed faster as a researcher since doing MATS.

Johannes completed the MATS Summer 2022 Cohort under the mentorship of Evan Hubinger (then a Research Fellow at MIRI). As a result of MATS, Johannes co-authored the paper Conditioning Predictive Models: Risks and Strategies with Evan as a lead author. He also published a follow-up paper on Incentivizing honest performative predictions with proper scoring rules at the UAI 2023 conference. After MATS, Johannes started a PhD in Computer Science at CHAI. Since 2024, Johannes has been working at Anthropic on alignment stress-testing.

Kay Kozaronek

AI Safety Connect (AISC)

Working in a team environment, particularly one as stimulating as MATS, was a transformative experience. It not only refined my research skills but also instilled a newfound entrepreneurial spirit in me. The program encouraged me to think beyond the conventional, to innovate, and to take risks. Additionally, the array of skills I acquired during my time at MATS was vast. I delved deep into research engineering, honed my science communication abilities, and even tapped into the art of fundraising. These skills, I believe, are indispensable and have equipped me to navigate the ever-evolving world of research with confidence. In conclusion, I wholeheartedly endorse the MATS program. To anyone considering embarking on this journey, you are not only signing up for an unparalleled research experience but also a lifetime of growth, learning, and camaraderie.

I'm working on AI Safety Connect, a new organization convening diplomatic and AI safety stakeholders at the highest level (think the UN, the India AI Impact Summit, etc.). We are also seeding a few other projects, like engaging the UAE in AI safety and helping prevent critical coordination failures among frontier labs.

Jay Bailey

Arcadia Impact

MATS was an excellent environment to get productive work done and a fantastic resource to improve my future impact in AI alignment. I made connections, learned a great deal about my mentor's subfield and alignment in general, and was fired up to keep working when I got back to Australia. Since MATS, I've been funded for a project with a collaborator I met at MATS, and I've gotten significantly further in organizations' hiring processes than before.

Former UK AISI employee experienced in frontier LLM evaluation, now looking to contribute to technical AI safety and to reducing extinction risks from misaligned AGI systems.

Dan Valentine

Anthropic

Ethan spent a lot of time discussing our research with us and gave great advice on direction. He unblocked us in various ways, such as getting access to more models or to lots of compute budget. He connected us with lots of great people, some of whom became collaborators. And he was a very inspiring mentor to work with.

Dan Valentine is a Member of Technical Staff at Anthropic, an AI safety and research company. His work is primarily focused on AI safety and alignment research, including scalable oversight methods and understanding how AI models interact with data and prompts.

Research produced by MATS fellows

The body of research produced by MATS fellows spans the full spectrum of advancing AI safety, resilience, and understanding. Scholars investigate the inner workings of modern AI systems through mechanistic interpretability, sparse feature analysis, studies of latent representations and other techniques.


Featured research

Sparse Autoencoders Find Highly Interpretable Features in Language Models

One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is superposition, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task (Wang et al., 2022) to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.

Authors:


Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

Date:

Sep 15, 2023

AI agents find $4.6M in blockchain smart contract exploits

AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents' ability to exploit smart contracts on the Smart CONtracts Exploitation benchmark (SCONE-bench), a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoff (March 2025), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.

Authors:


Winnie Xiao, Cole Killian, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng

Date:

Dec 1, 2025

MATS fellows' core work focuses on these tracks


Founding and Field-Building

AI safety needs to scale fast, and the bottleneck is increasingly organizational. This track is for founders, field-builders, and high-agency generalists launching new AI safety organizations, programs, and projects, with mentorship from founders, sitting CEOs, and program directors across the ecosystem.

Empirical

Hands-on research using machine learning experiments to understand and improve model safety, including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness.

Policy and Governance

Research on how advanced AI is and should be governed, spanning governance mechanisms, regulatory and institutional analysis, and the technical systems that make governance enforceable.

Biosecurity

Research on catastrophic biological risk in a world being reshaped by advanced AI. Spans pathogen detection, medical countermeasures, synthesis screening, physical biodefense, threat modeling, and red-teaming biological AI for dangerous capabilities.

Strategy and Forecasting

Research on how AI development is likely to unfold and what that means for long-term safety. Includes timelines, takeoff dynamics, risk modeling, and strategic analysis of AI's trajectory.

Theory

Foundational research on the mathematical and philosophical principles underlying agency, alignment, and safe reasoning in advanced AI systems.

Systems Security

Research on software and hardware security for the infrastructure on which advanced AI runs, including side-channel analysis, cluster security, model-weight protection, and physical-layer verification.

Join the MATS team


The MATS Program aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence. We are actively hiring in a variety of roles to advance our mission.
