Conceptual research on deceptive alignment, including designing scheming propensity evaluations and honeypots. Some example directions:
I am a research scientist on the AGI Safety & Alignment team at Google DeepMind. I am currently focusing on deceptive alignment and AI control (recent work: https://arxiv.org/abs/2505.01420), particularly scheming propensity evaluations and honeypots. My past research includes power-seeking incentives, specification gaming, and avoiding side effects.
During the program, we will meet once a week to go through any updates or results and your plans for the next week. I'm also happy to comment on docs, respond on Slack, or have additional ad hoc meetings as needed.
https://arxiv.org/abs/2505.01420
https://arxiv.org/abs/2403.13793
https://arxiv.org/abs/2505.23575
https://arxiv.org/abs/2412.12480
Scholars will work together in one or more teams.
I will talk through project ideas with scholars.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.