Paul Riechers, Adam Shai

In this stream we will explore extensions and implications of our discovery that neural networks pretrained on next-token prediction represent belief-state geometry in their activations. We will build on this fundamental theory of neural network representations in order to discover what AI systems are thinking, and understand their emergent behaviors.

Stream overview

Work in this stream will build on our discovery of belief-state geometry in the activations of pretrained neural networks. The project can leverage applicants’ strengths in mathematical modeling and/or ML engineering. We welcome highly driven and relatively autonomous researchers who would like to benefit from our mentorship while taking the lead on a relevant project of their choice. Possible projects include studying in-context learning, out-of-distribution generalization, theoretically benchmarking SAEs, designing new methods for finding features, methods of compressing representations in transformers, and interpretability of RL models.

Mentors

Adam Shai
Simplex, Research Lead
SF Bay Area
Interpretability

Adam Shai has extensive research experience in experimental and computational neuroscience. He earned his PhD from Caltech and has over a decade of experience investigating the neural basis of intelligent behavior, most recently as a researcher at Stanford. Driven by the pressing need for AI safety, he has now turned his expertise to neural networks, aiming to develop principled methods for controlling and aligning increasingly advanced AI systems.

Adam co-founded and now leads research at Simplex, an organization dedicated to building a science of representations in AI systems.

Paul Riechers
Simplex, Research Lead
SF Bay Area
Interpretability

Paul Riechers is a researcher and scientific leader with deep expertise in the physics of information and the fundamental limits of learning and prediction. He co-founded Simplex, an AI safety research organization, with Dr. Adam Shai, applying insights from theoretical physics and neuroscience to build foundational understanding of internal representations and emergent behavior in neural networks. Paul earned a PhD in theoretical physics and an MS in electrical and computer engineering from UC Davis. Prior to founding Simplex, he spent five years as a Research Fellow at Nanyang Technological University in Singapore. He is also a co-founder of the Beyond Institute for Theoretical Science (BITS) and a former Senior Fellow at UCLA’s Mathematics of Intelligences program at IPAM, and he has served as both a MATS scholar and mentor. Paul has co-organized multiple workshops on AI interpretability and alignment, and now co-leads the growing Simplex team with support from the Astera Institute.

Mentorship style

Early in the program, Paul and Adam will meet in person with scholars to help them get up to speed on the theoretical and technical background needed to understand and contribute to our framework.  Subsequent weekly meetings with mentees aim to answer questions, unblock research, explore project ideas, and give feedback and suggestions on research.

Scholars we are looking for

The project can leverage applicants’ strengths in mathematical modeling and/or ML engineering. We welcome highly driven and relatively autonomous researchers who would like to benefit from our mentorship while taking the lead on a relevant project of their choice. The ideal scholar can move fast and has experience in either research (e.g., a PhD in any field) or software/ML engineering.

Scholars can independently find collaborators, but this is not required.

Project selection

We will talk through project ideas with scholars.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.