
LawZero
—
Researcher
Rhys is a researcher at LawZero. He recently finished his PhD on formalising and evaluating AI deception.
His work combines conceptual research at the intersection of game theory, causality, and philosophy with empirical evaluations of frontier AI systems. Rhys is now focusing on control-style research and frontier LM agent alignment.
He is a member of Tom Everitt's Causal Incentives Working Group. Previously, Rhys worked at the Centre for Assuring Autonomy, the Center on Long-Term Risk, the Centre for the Governance of AI, and the UK’s AI Safety Institute.
In this MATS cohort, he is looking to work on evals and control-style research.