
Anthropic
—
Member of Technical Staff
Stephen McAleer is a Member of Technical Staff at Anthropic, working on the Alignment Science team. He was previously a postdoc at CMU, working with Tuomas Sandholm. Stephen received his PhD in computer science from the University of California, Irvine, where he worked with Pierre Baldi. During his PhD, he completed research scientist internships at Intel Labs and DeepMind. Before that, Stephen received his bachelor's degree in mathematics and economics from Arizona State University in 2017.

Projects he is interested in include:
- Anything related to control/monitoring for coding agents
- Scalable oversight for agent alignment
- Scheming evaluations and mitigations
- Adversarial training for robust monitors / reward models
- Reward hacking / deception in agents