Scott Emmons

Independent

I research AI safety and alignment. Most recently, I was a research scientist at Google DeepMind. I completed my PhD at UC Berkeley's Center for Human-Compatible AI, advised by Stuart Russell. I previously cofounded FAR.AI, a 501(c)(3) research nonprofit that incubates and accelerates beneficial AI research agendas.

I develop AI alignment frameworks, stress-test their limits, and turn the insights into methodology adopted across the field. I have established that chain-of-thought monitoring is a substantial defense when misaligned behavior requires reasoning, designed practical metrics to preserve monitorability during model development, shown that obfuscated activations can bypass latent-space defenses, and developed StrongREJECT, a jailbreak benchmark now used by OpenAI, the US and UK AI Safety Institutes, Amazon, and others.