Stephen McAleer

Anthropic

—

Member of Technical Staff

Focus

Control, Model Organisms, Red-Teaming, Scheming and Deception

Stream

Anthropic and OpenAI Megastream

Stephen McAleer is a Member of Technical Staff at Anthropic, where he works on the Alignment Science team. He was previously a postdoc at Carnegie Mellon University (CMU), working with Tuomas Sandholm. Stephen received his PhD in computer science from the University of California, Irvine, where he worked with Pierre Baldi. During his PhD, he did research scientist internships at Intel Labs and DeepMind. Before that, he received his bachelor's degree in mathematics and economics from Arizona State University in 2017. Projects he is interested in include:

- Anything related to control/monitoring for coding agents

- Scalable oversight for agent alignment

- Scheming evaluations and mitigations

- Adversarial training for robust monitors / reward models

- Reward hacking / deception in agents