
Google DeepMind
—
Senior Research Scientist
Neel leads the mechanistic interpretability team at Google DeepMind, working to use the internals of models to understand them better and to make them safer, e.g. detecting deception, understanding concerning behaviours, and monitoring deployed systems for harmful behaviour.
Since mid-2024, Neel has become more pessimistic about ambitious mechanistic interpretability and more optimistic that pragmatic approaches can add a lot of value. He is doing less basic science and more model biology, along with work applying interpretability to real-world safety problems like monitoring.
He has spent far too much time mentoring MATS scholars, with around 50 alumni so far, and he's excited to take on even more!