MATS Alumnus
Constantin Venhoff
Collaborators
Constantin Venhoff, Ashkan Khakzar, Sonia Joseph, Philip Torr, Neel Nanda
Abstract
LLaVA-style Vision-Language Models (VLMs) have demonstrated impressive capabilities, but struggle with factual recall tasks compared to their underlying language model (LM). While previous work attributes this to insufficient computational depth after visual processing, we provide an alternative explanation: because visual information remains distributed across visual tokens in early layers, it bypasses the factual recall mechanism that resides in the early-layer MLPs of the LM backbone. The performance gap therefore stems from the architectural design of VLMs rather than from insufficient computational capacity. Using linear probes, we show that dedicated linear representations of visual information emerge only in the middle-to-late layers of VLMs. As a result, factual recall in VLMs becomes a “two-hop” challenge: the fact can only be recalled once the visual entity is resolved, yet the recall mechanism sits in layers before visual processing completes, so the visual information arrives too late in the model. Through comparative analysis, we demonstrate that successful factual recall depends on the speed of the first processing “hop,” i.e., how quickly the visual input is resolved. To further support our hypothesis, we patch early-layer MLP outputs from the LM backbone into the corresponding VLM layers, significantly improving factual recall performance. This suggests that the absence of properly aligned token embeddings in early layers is a key factor in factual recall degradation. Finally, we introduce a benchmark to systematically evaluate factual recall accuracy and knowledge hallucination in multimodal settings. Our findings highlight a fundamental architectural limitation in current VLMs and pave the way for designing models that better integrate visual and linguistic information for reliable factual reasoning.
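The intervention described in the abstract, patching early-layer MLP outputs from the text-only LM backbone into the corresponding VLM layers, can be sketched with PyTorch forward hooks. The snippet below is a minimal illustration, not the paper's implementation: the module paths (vlm.language_model.model.layers[i].mlp, lm.model.layers[i].mlp), the number of patched layers, and the crude end-aligned position matching are all assumptions made for the example.

```python
# Hedged sketch of the early-layer MLP patching idea described above.
# Assumptions (not from the paper's code): a LLaVA-style VLM whose LM backbone
# layers are reachable as vlm.language_model.model.layers[i].mlp, a text-only
# copy of the same backbone reachable as lm.model.layers[i].mlp, and that
# "early layers" means the first N_PATCHED_LAYERS layers.
import torch

N_PATCHED_LAYERS = 8  # illustrative cutoff for "early layers"

@torch.no_grad()
def patch_early_mlps(vlm, lm, vlm_inputs, lm_inputs, n_layers=N_PATCHED_LAYERS):
    """Cache early-layer MLP outputs from a text-only run of the LM backbone,
    then rerun the VLM with those activations written over the corresponding
    MLP outputs."""
    cached = {}

    def make_cache_hook(idx):
        def hook(module, inputs, output):
            cached[idx] = output.detach()
        return hook

    def make_patch_hook(idx):
        def hook(module, inputs, output):
            src = cached[idx]
            # Overwrite the trailing positions (the shared text-prompt tokens).
            # Aligning text positions between LM and VLM runs this way is a
            # simplifying assumption; image tokens shift positions in the VLM.
            n = min(src.shape[1], output.shape[1])
            output[:, -n:, :] = src[:, -n:, :]
            return output  # returning a value replaces the module's output
        return hook

    # 1) Run the text-only LM backbone and cache its early MLP outputs.
    handles = [lm.model.layers[i].mlp.register_forward_hook(make_cache_hook(i))
               for i in range(n_layers)]
    lm(**lm_inputs)
    for h in handles:
        h.remove()

    # 2) Rerun the VLM with the cached activations patched into its early MLPs.
    handles = [vlm.language_model.model.layers[i].mlp.register_forward_hook(make_patch_hook(i))
               for i in range(n_layers)]
    out = vlm(**vlm_inputs)
    for h in handles:
        h.remove()
    return out
```

A faithful reproduction would align text-token positions explicitly (the VLM's prompt tokens are offset by its image tokens) rather than slicing from the end as this sketch does.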
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Authors:
Jorio Cocola, Dylan Feng
Date:
December 10, 2025
Citations:
0
AI agents find $4.6M in blockchain smart contract exploits
Authors:
Fellow: Winnie Xiao
Date:
December 1, 2025
Citations:
0
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.