
Google DeepMind
—
Senior Research Engineer
Arthur Conmy is a Research Engineer at Google DeepMind, on the Language Model Interpretability team with Neel Nanda.
Arthur's focus is on practically useful interpretability and related AI Safety research. For example, he was one of the core engineers who first added probes to Gemini deployments. He has also recently led research on how to interpret reasoning models (https://arxiv.org/abs/2506.19143) and how to elicit knowledge from model organisms (https://arxiv.org/abs/2510.01070), both through MATS.
Earlier, Arthur did influential early work on automating interpretability and on finding circuits. Before joining Google DeepMind, he worked at Redwood Research.