This stream is for the UK AISI Red Team. The team focuses on stress-testing mitigations for AI risk, including misuse safeguards, control techniques, and model alignment red-teaming. We plan to work on projects that build and improve methods for performing these kinds of evaluations.
Projects could include:
Alex Souly is a researcher on the Red Team at the UK AI Security Institute, where she works on the safety and security of frontier LLMs. She has contributed to pre-deployment evaluations and red-teaming of misuse safeguards and alignment (see Anthropic and OpenAI blog posts), and has worked on open-source evals such as StrongReject and AgentHarm. Previously, she studied Maths at Cambridge and Machine Learning at UCL as part of the UCL DARK lab, interned at CHAI, and in another life worked as a SWE at Microsoft.
I'm a research scientist at the UK AI Security Institute, working on AI control red teaming and model organisms of misalignment. I was previously a postdoc with Sam Bowman at NYU, did MATS with Owain Evans, and mentored for the MATS, SPAR and Pivotal fellowships. I got my PhD at the University of Edinburgh, supervised by Iain Murray.
Robert is a research scientist and the acting lead of the alignment red-teaming sub-team at UK AISI. This sub-team focuses on stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. Before that, he worked on misuse research, focusing on evaluations of safeguards against misuse and mitigations for misuse risk, particularly in open-weight systems. He completed his PhD at University College London on generalisation in LLM fine-tuning and RL agents in January 2025.
Xander Davies is a Member of the Technical Staff at the UK AI Security Institute, where he leads the Red Teaming group, which uses adversarial ML techniques to understand, attack, and mitigate frontier AI safeguards. He is also a PhD student at the University of Oxford, supervised by Dr. Yarin Gal. He previously studied computer science at Harvard, where he founded and led the Harvard AI Safety Team.
Each scholar will have one primary mentor from the Red Team who will provide weekly guidance and day-to-day support
Scholars will also have access to secondary advisors within their specific sub-team (misuse, alignment, or control) for technical deep-dives
Team lead Xander Davies and advisors Geoffrey Irving and Yarin Gal will provide periodic feedback through team meetings and project reviews
For scholars working on cross-cutting projects, we can arrange mentorship from multiple sub-teams as needed
Structure:
Weekly 1:1 meetings (60 minutes) with primary mentor for project updates, technical guidance, and problem-solving
Asynchronous communication via Slack/email throughout the week for quick questions and feedback
Bi-weekly team meetings where scholars can present work-in-progress and get broader team input
Working style:
We expect scholars to work semi-independently – taking initiative on their research direction while leveraging mentors for guidance on technical challenges, research strategy, and navigating AISI resources
Scholars will have access to our compute resources, pre-release frontier models, and operational support to focus on research
We encourage scholars to document their work and, if appropriate, aim for publication or public blog posts
On misuse:
On control:
On alignment:
We're looking for scholars with hands-on experience in machine learning and AI security, particularly those interested in adversarial robustness, red teaming, or AI safeguards. Ideal candidates would have:
We welcome scholars at various career stages, especially those who are eager to work on problems with direct impact on how frontier AI is governed and deployed.
Scholars will primarily collaborate with members of the Red Team, most likely within one of our misuse, alignment, or control sub-teams.
They'll have access to:
We are open to scholars seeking out collaborators within AISI and potentially with external researchers, subject to mentor approval and any necessary agreements.
Scholars will choose from a set of predefined project directions aligned with our current research priorities, such as:
We'll provide initial direction and guidance on project scoping, then scholars will have autonomy to explore specific approaches within that framework.
Expect weekly touchpoints to ensure progress and refine directions.
If mentees have particular ideas they're excited about that fit within the scope of the team's work, they're welcome to propose them, but there is no guarantee they will be selected.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.