Why Do Some Language Models Fake Alignment While Others Don't?

MATS Alumnus

Abhay Sheshadri

Collabortators

Abhay Sheshadri, John Hughes, Julian Michael, Alex Mallen, Arun Jose, Janus, Fabien Roger

Citations

4 Citations

Abstract

Recent research

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

Authors:

Jorio Cocola, Dylan Feng

Date:

December 10, 2025

Citations:

0

AI agents find $4.6M in blockchain smart contract exploits

Authors:

Fellow: Winnie Xiao

Date:

December 1, 2025

Citations:

0

Frequently asked questions

What is the MATS Program?
How long does the program last?