Investigating the Indirect Object Identification circuit in Mamba

MATS Alumnus

Danielle Ensign

Collaborators

Danielle Ensign, Adrià Garriga-Alonso

Citations

0 Citations

Abstract

How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture with scaling comparable to Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. Our techniques provide evidence that (1) layer 39 is a key bottleneck, (2) convolutions in layer 39 shift names one position forward, and (3) the name entities are stored linearly in layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
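As a toy illustration of the second finding, the sketch below shows how a causal 1-D convolution can move a feature one position forward in a sequence. It is not the paper's code and does not use Mamba's learned weights: the function `causal_conv1d`, the scalar "name feature", and the hand-picked kernel are all assumptions made for illustration. The kernel width of 4 is chosen to match what is believed to be Mamba's default convolution width.

```python
def causal_conv1d(xs, kernel):
    """Causal 1-D convolution over a sequence of scalars.

    out[t] = sum_k kernel[k] * xs[t - (K - 1) + k],
    where positions left of the sequence start are treated as zero.
    """
    k_size = len(kernel)
    # Left-pad so that position t only sees inputs at positions <= t.
    padded = [0.0] * (k_size - 1) + list(xs)
    return [
        sum(kernel[k] * padded[t + k] for k in range(k_size))
        for t in range(len(xs))
    ]


# Hypothetical toy input: a "name" feature active only at position 2.
name_feature = [0.0, 0.0, 1.0, 0.0, 0.0]

# Hand-picked kernel (an assumption, not learned weights): the last tap
# reads the current position, so the second-to-last tap reads position t - 1.
shift_kernel = [0.0, 0.0, 1.0, 0.0]

shifted = causal_conv1d(name_feature, shift_kernel)
print(shifted)  # [0.0, 0.0, 0.0, 1.0, 0.0]
```

With all the weight on the tap that reads position t - 1, the feature that was active at position 2 appears at position 3 in the output, which is the kind of one-position shift the abstract attributes to the layer 39 convolutions.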

