I currently work on AI control, and I'm excited about, for example, prototyping and evaluating mitigations for specific threat models, doing science that sheds light on important considerations in AI control, and building evals for control-related capabilities.
It's hard for me to give specific project descriptions this far in advance, but here's a selection of projects I've been excited about in the past (some of which are now papers!). You can expect the actual projects to be of a similar flavour, but not exactly the same:
Mary is a research scientist on the Frontier Safety Loss of Control team at DeepMind, where she works on AGI control (security and monitoring). Her role involves helping make sure that potentially misaligned, internally deployed models cannot cause severe harm or sabotage, even if they wanted to. Previously, she worked on dangerous capability evaluations for scheming precursor capabilities (stealth and situational awareness) as well as catastrophic misuse capabilities.
I'm pretty hands-off. I expect scholars to fully take charge of the project, and update / consult me as needed. I do want my scholars to succeed, and am happy to advise on project direction, experiment design, interpreting results, decision-making / breaking ties, or getting unstuck.
During the program, we'll meet once a week to go through any updates / results, and your plans for the next week. I'm also happy to comment on docs, respond on Slack, or have additional ad hoc meetings when useful.
https://arxiv.org/abs/2511.06626
https://arxiv.org/abs/2505.23575
https://arxiv.org/abs/2505.01420
https://arxiv.org/abs/2412.12480
https://static1.squarespace.com/static/660eea75305d9a0e1148118a/t/68fb7d6f70f3d60ca3aabd88/1761312111659/2025LASRdata_poisoning.pdf
I prefer scholars to work in pairs or groups of three within the stream, but I'm happy to take on external collaborators as long as they are committed full-time to the project.
I'll propose ~5 projects for scholars to red-team and flesh out, before deciding on one to own. I'm also open to scholar-proposed projects if they sound promising; I'd just be less useful as an advisor.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.