MATS Fellow:
Patrick Leask
Authors:
Patrick Leask, Noura Al Moubayed
Citations
Abstract:
Linear probes have been used to demonstrate that LLM activations linearly encode high-level properties of the input, such as truthfulness, and that these directions can evolve significantly during fine-tuning and training. However, despite their seeming simplicity, linear probes can have complex geometric interpretations, leverage spurious correlations, and lack selectivity. We present a method for decomposing linear probe directions into weighted sums of as few as 10 model activations, whilst maintaining task performance. These probes are also invariant to affine transformations of the representation space, and we demonstrate that, in some cases, poor base to fine-tune probe generalization performance is partially due to simple transformations of representation subspaces, and the structure of the representation space changes less than indicated by other methods.
What Happens When Superhuman AIs Compete for Control?
Authors:
Steven Veld
Date:
January 11, 2026
Citations:
0
AI Futures Model: Timelines & Takeoff
Authors:
Brendan Halstead, Alex Kastner
Date:
December 30, 2025
Citations:
0
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.