Marcus Williams - MATS Program Alumni

Marcus took part in Micah Carroll's MATS Summer 2024 stream, where he explored annotator vulnerabilities and the tendency of LLMs to influence human preferences. This work resulted in the paper "Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback". Previously, Marcus did independent alignment research on preference modeling and reinforcement learning (RL). He currently works on deception and scheming monitoring at OpenAI.

Program: Summer 2024 (MATS 6.0)

The Summer 2024 cohort marked a significant expansion, supporting approximately 90 fellows with 40 mentors, the broadest mentor selection in MATS history. During this cohort, MATS incorporated as a 501(c)(3) nonprofit organization, formalizing its institutional structure. The program broadened its research portfolio to include at least four governance mentors alongside the technical research streams, reflecting growing interest in AI policy and technical governance work. The 10-week research phase continued in Berkeley, with fellows working across mechanistic interpretability, evaluations, scalable oversight, and governance research. Notable outputs from this cohort include research on targeted manipulation and deception in LLMs trained on user feedback, which was accepted to NeurIPS workshops, and contributions to a paper on AI safety via debate that won a best paper award at ICML 2024. One fellow co-founded Decode Research, a new AI safety organization focused on building interpretability tools.
