The stream focuses on evaluating and mitigating catastrophic risks arising from dangerous scientific capabilities in frontier AI systems, with an emphasis on the challenges posed by lab integrations and novel science. Potential research directions include evaluation design, risk mitigations, and evaluation science.
I work in the AI Safety team at LILA, an AI company with the goal of achieving scientific superintelligence. LILA's approach rests on two in-house capability verticals: the development of AI models for scientific reasoning, and the development of a lab automation platform with broad experimental coverage rather than narrow workflows. The capabilities that the combination of these verticals might unlock present novel challenges from a safety perspective.
With this context, let me now turn to the research directions. The field of dangerous scientific capabilities is moving fast, so I will describe broad directions and leave more granular scoping until closer to the start of the fellowship. I expect the fellow to focus on one of the three directions below, though projects can draw on more than one.
Dangerous capability evaluations
Risk mitigations
Evaluation science
This direction is more methodological: it aims to develop frameworks that assure and improve the quality of our evaluations.
David is a Scientist in the AI Safety team at Lila Sciences, where he leads the technical design and implementation of dangerous capability evaluations in scientific domains like biology, chemistry, and materials science. His work involves measuring model performance on scientific tasks that could pose safety risks through misuse or unintended failure modes, as well as the efficacy of guardrails and other mitigations. A core challenge is designing evaluations that meaningfully capture frontier capabilities: assessing not just what models can do today, but what emerging scientific reasoning abilities might enable in adversarial or failure scenarios.
In his previous role, David developed biosafety evaluations featured in the system cards of Meta's Muse Spark and multiple versions of Anthropic's Claude Sonnet and Opus. Previously, he completed his PhD in Theoretical Physics at Queen Mary University of London.
We can schedule a weekly one-hour meeting for general progress updates, sharing results, and overall guidance. I will also be reachable on Slack for async comms, and I am happy to jump on ad-hoc calls for specific discussions or pair coding/debugging. I am based in London and work UK hours (10am-7pm), but I also visit the US (Boston) a few times a year.
Essential
Preferred
Not a good fit:
I will work with the fellow to find the right project for their interests within the directions spelled out above. I will pitch a few project ideas and support the fellow in making the decision. I also welcome project suggestions; in those cases I would work with the fellow to scope them appropriately.