
OpenAI
—
Research Scientist
Focus
Scalable Oversight, Control, Monitoring, Interpretability, Adversarial Robustness, Red-Teaming, Alignment Training, Security, Scheming and Deception, Multi-Agent Safety
Stream
OpenAI Safety Team
My focus these days is on adversarial machine learning: the safety, security, and alignment of frontier models. I am particularly interested in alignment/safety RL and evaluations. In the past, I studied memorization, privacy, and security harms in language modelling, including auditing for these risks and mitigating them. I've also worked on differentially private (DP) training algorithms, unlearning, collaborative learning approaches, and methods for ownership verification.