UK AISI Red Team

This stream is for the UK AISI Red Team. The team focuses on stress-testing mitigations for AI risk, including misuse safeguards, control techniques, and model alignment red-teaming. We plan to work on projects that build and improve methods for performing these kinds of evaluations.

Stream overview

Projects could include: 

  • developing methods for automated red-teaming of mitigations for any of these risk areas
  • building environments for evaluating misuse of agentic AI systems (see the sketch after this list)
  • developing attacks against asynchronous or stateful monitoring systems for misuse
  • developing and red-teaming control protocols and modelling their effects in realistic deployment settings
  • improving automated auditing tools such as Petri and Bloom to be more realistic and controllable and to offer additional affordances for propensity evaluations; building propensity evaluations within these tools or elsewhere.
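
To give a concrete flavour of the tooling involved, here is a minimal sketch of a safeguard-probing evaluation written against a recent version of the open-source Inspect framework (mentioned under "Scholars we are looking for"). The prompt, target behaviour, and scorer choice are illustrative placeholders, not AISI's actual evaluations.

```python
# Minimal sketch of a safeguard-probing eval using the open-source
# inspect_ai package. All prompts and targets below are placeholders.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import generate, system_message


@task
def safeguard_probe():
    # Each sample pairs a request that a safeguard should handle with the
    # behaviour we would expect from a well-safeguarded model.
    dataset = [
        Sample(
            input="Placeholder request that a deployed safeguard should refuse.",
            target="The model declines the request and offers a safe alternative.",
        ),
    ]
    return Task(
        dataset=dataset,
        solver=[system_message("You are a helpful assistant."), generate()],
        # A judge model compares the output to the target behaviour; agentic
        # misuse environments would add tools and a sandbox here.
        scorer=model_graded_fact(),
    )
```

Running it would look something like `inspect eval safeguard_probe.py --model <provider/model-name>`.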

Mentors

Alexandra Souly (Alex)
UK AISI, Technical Staff
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Alex Souly is a researcher on the Red Team at the UK AI Security Institute, where she works on the safety and security of frontier LLMs. She has contributed to pre-deployment evaluations and red-teaming of misuse safeguards and alignment (see the Anthropic and OpenAI blogposts), and has worked on open-source evals such as StrongREJECT and AgentHarm. Previously, she studied Maths at Cambridge and Machine Learning at UCL as part of the UCL DARK lab, interned at CHAI, and in another life worked as a SWE at Microsoft.

Asa Cooper Stickland
UK AISI, Research Scientist
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

I'm a research scientist at the UK AI Security Institute, working on AI control red teaming and model organisms of misalignment. I was previously a postdoc with Sam Bowman at NYU, did MATS with Owain Evans, and mentored for the MATS, SPAR and Pivotal fellowships. I got my PhD at the University of Edinburgh, supervised by Iain Murray.

Eric Winsor
UK AISI, Research Engineer
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Giorgi Giglemiani
UK AISI, Research Engineer
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Robert Kirk
UK AISI, Research Scientist
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Robert is a research scientist and the acting lead of the alignment red-teaming sub-team at UK AISI. This team's focus is stress-testing model alignment to detect and understand model propensities relevant to loss-of-control risks. Before that, he worked on misuse research, focusing on evaluations of safeguards against misuse and mitigations for misuse risk, particularly in open-weight systems. He completed his PhD at University College London on generalisation in LLM fine-tuning and RL agents in January 2025.

Xander Davies
UK AISI, Safeguards Team Lead
London
Monitoring, Adversarial Robustness, Control, Model Organisms, Red-Teaming, Dangerous Capability Evals, Safeguards

Xander Davies is a Member of the Technical Staff at the UK AI Security Institute, where he leads the Red Teaming group, which uses adversarial ML techniques to understand, attack, and mitigate frontier AI safeguards. He is also a PhD student at the University of Oxford, supervised by Dr. Yarin Gal. He previously studied computer science at Harvard, where he founded and led the Harvard AI Safety Team.

Mentorship style

  • Each scholar will have one primary mentor from the Red Team who will provide weekly guidance and day-to-day support
  • Scholars will also have access to secondary advisors within their specific sub-team (misuse, alignment, or control) for technical deep-dives
  • Team lead Xander Davies and advisors Geoffrey Irving and Yarin Gal will provide periodic feedback through team meetings and project reviews
  • For scholars working on cross-cutting projects, we can arrange mentorship from multiple sub-teams as needed

Structure: 

  • Weekly 1:1 meetings (60 minutes) with primary mentor for project updates, technical guidance, and problem-solving
  • Asynchronous communication via Slack/email throughout the week for quick questions and feedback
  • Bi-weekly team meetings where scholars can present work-in-progress and get broader team input

Working style:

  • We expect scholars to work semi-independently – taking initiative on their research direction while leveraging mentors for guidance on technical challenges, research strategy, and navigating AISI resources
  • Scholars will have access to our compute resources, pre-release frontier models, and operational support to focus on research
  • We encourage scholars to document their work and, if appropriate, aim for publication or public blog posts

Scholars we are looking for

We're looking for scholars with hands-on experience in machine learning and AI security, particularly those interested in adversarial robustness, red teaming, or AI safeguards. Ideal candidates would have:

  • Experience with large language models (training, fine-tuning, evaluation, or safety research)
  • Strong technical foundations in ML, ideally with coding experience in PyTorch or Inspect
  • Interest in one or more of our three focus areas: misuse (securing systems against bad actors), alignment (ensuring AI systems behave as intended), or control (keeping AI systems under human control even when misaligned)
  • A mission-driven mindset and curiosity about how AI security research can inform real-world policy and deployment decisions
  • An ability to advocate for your own research ideas and work in a self-directed way, while also collaborating effectively and prioritizing team efforts over extensive solo work.  

We welcome scholars at various career stages, especially those who are eager to work on problems with direct impact on how frontier AI is governed and deployed.

Scholars will primarily collaborate with members of the Red Team, likely specifically within our misuse, alignment, and control sub-teams.

They'll have access to:

  • Weekly meetings with their assigned mentor(s)
  • Regular interaction with the broader Red Team during team meetings and informal discussions
  • Potential collaboration with external partners (e.g., Anthropic, OpenAI, Gray Swan) on relevant projects
  • Opportunity to engage with researchers across other AISI teams (e.g., Cyber and Autonomous Systems, Science of Evaluations)

We are open to scholars seeking out collaborators within AISI and potentially with external researchers, subject to mentor approval and any necessary agreements.

Project selection

Scholars will choose from a set of predefined project directions aligned with our current research priorities, such as:

  • Developing automated methods to test AI misuse safeguards (see the sketch after this list)
  • Investigating data poisoning attacks and defenses
  • Designing benchmarks for misuse detection across multiple model interactions
  • Testing control measures for potentially misaligned AI systems
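
As a rough illustration of the first direction above, the sketch below shows a heavily simplified automated red-teaming loop: an attacker model rewrites a seed request, the target model responds, and a judge scores whether the safeguard held. The helpers `attacker_rewrite`, `target_respond`, and `judge_score` are hypothetical stand-ins for whatever model APIs a given project would actually use.

```python
# Simplified automated red-teaming loop. The three callables are
# hypothetical stand-ins for real attacker / target / judge model calls.
from typing import Callable


def red_team_loop(
    seed_request: str,
    attacker_rewrite: Callable[[str, str], str],  # (attempt, feedback) -> new attempt
    target_respond: Callable[[str], str],         # attempt -> target model response
    judge_score: Callable[[str, str], float],     # (attempt, response) -> harm score in [0, 1]
    max_turns: int = 10,
    threshold: float = 0.8,
) -> list[dict]:
    """Iteratively search for prompts that slip past a misuse safeguard."""
    transcript: list[dict] = []
    attempt = seed_request
    for turn in range(max_turns):
        response = target_respond(attempt)
        score = judge_score(attempt, response)
        transcript.append(
            {"turn": turn, "attempt": attempt, "response": response, "score": score}
        )
        if score >= threshold:
            break  # candidate bypass found; flag for human review
        # Ask the attacker model for a revised attempt, conditioned on feedback.
        feedback = f"Previous attempt scored {score:.2f}; the target refused or deflected."
        attempt = attacker_rewrite(attempt, feedback)
    return transcript
```

Real projects in this area layer on many refinements (diverse seed sets, multi-turn and agentic settings, stronger judges), but the basic attacker / target / judge structure is similar.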

We'll provide initial direction and guidance on project scoping; scholars will then have autonomy to explore specific approaches within that framework.

Expect weekly touchpoints to ensure progress and refine directions.

If mentees have particular ideas they're excited about that they see as fitting within the scope of the team's work, they're welcome to propose them, but there is no guarantee those ideas will be selected.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.