Alan Cooney

This stream focuses on empirical AI control research, including defending against AI-driven data poisoning, evaluating and attacking chain-of-thought monitorability, and related monitoring/red-teaming projects. It is well-suited to applicants already interested in AI safety with solid Python skills, and ideally prior research or familiarity with control literature/tools (e.g. Inspect/ControlArena).

Apply

View all streams

Stream overview

Control for data poisoning. Studying scenarios where an AI system covertly data poisons another AI, to e.g., instill backdoors or secret loyalties.
Chain-of-thought monitorability. Evaluating the monitorability of reasoning verbalised by AIs & studying ways this may be compromised.
Other Control projects (see https://alignmentproject.aisi.gov.uk/research-area/empirical-investigations-into-ai-monitoring-and-red-teaming for ideas)

Mentors

Alan Cooney

UK AISI

Researcher

London

—

Control

Monitoring

Alan Cooney leads the Autonomous Systems workstream within the UK's AI Safety Institute. His team is responsible for assessing the capabilities and risks of Frontier AI systems released by AI labs such as OpenAI, Google and Anthropic. Prior to working in AI safety, he was an investment consultant and start-up founder, with his company Skyhook being acquired in 2023. He also completed Stanford’s Machine Learning and Alignment Theory Scholars Programme, where he was supervised by Google DeepMind researcher Neel Nanda.

Mentorship style

1-hour weekly meetings for going through your research log & high level guidance. Daily updates on slack are also very useful and I typically reply within 2 days to any questions.

Representative papers

AI Control: Improving Safety Despite Intentional Subversion

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Scholars we are looking for

Essential:

Existing interested in AI safety
Programming experience with Python/similar

You may be a good fit if you also have some of:

Research experience: prior AI research experience (on any topic)
Strong Python skills (essential for CoT monitorability eval projects)
Familiarity with the Control literature - see here
Familiarity with Inspect/ControlArena

Not a good fit:

Scholars primarily interested in conceptual rather than empirical research

Collaborating with other MATS scholars.

Project selection

By default I'll propose several projects for you to choose from, but you can also pitch ideas that you're interested in.