David Lindner

Google DeepMind

Research Scientist

Focus

Control, Monitoring, Safeguards, Dangerous Capability Evals, Scheming and Deception

David Lindner

David Lindner is a Research Scientist on Google DeepMind's AGI Safety and Alignment team, where he works on evaluations and mitigations for deceptive alignment and scheming. His recent work includes MONA, a method for reducing multi-turn reward hacking during RL; designing evaluations for stealth and situational awareness; and helping develop GDM's approach to deceptive alignment. Currently, David is interested in studying mitigations for scheming, including CoT monitoring and AI control. You can find more details on his website.