MATS Fellow:
Jasmine Li
Authors:
Jasmine Li, Alex Turner
Abstract:
Behavioral evaluations may become worthless, which we think would be a disaster. Smart misaligned models may realize they are being evaluated (“eval awareness”) and then act to look good to us so we don’t realize they’re misaligned (“eval gaming”). We think increasing eval cooperativeness might be a more scalable solution to eval gaming than reducing eval awareness.
Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs
Authors:
Dylan Feng
Date:
May 24, 2026
Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming
Authors:
Jasmine Li
Date:
May 24, 2026
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.