A Causal Model of Theory-of-Mind in AI Agents

MATS Fellow:

Jack Foxabbott

Authors:

Jack Foxabbott, Rohan Subramani, James Fox, Francis Rhys Ward

Citations

0 Citations

Abstract:

Agency is a vital concept for understanding and predicting the behaviour of future AI systems. There has been much focus on the goal-directed nature of agency, i.e., the fact that AI agents may capably pursue goals. However, the dynamics of agency become significantly more complex when autonomous agents interact with other agents and humans, necessitating engagement in theory-of-mind, the ability to reason about the beliefs and intentions of others. In this paper, we extend the framework of multi-agent influence diagrams (MAIDs) to explicitly capture this complex form of reasoning. We also show that our extended framework, MAIDs with incomplete information (II-MAIDs), has a strong theoretical connection to dynamic games with incomplete information with no common prior over types. We prove the existence of important equilibria concepts in these frameworks, and illustrate the applicability of II-MAIDs using an example from the AI safety literature.

Recent research

What Happens When Superhuman AIs Compete for Control?

Authors:

Steven Veld

Date:

January 11, 2026

Citations:

0

AI Futures Model: Timelines & Takeoff

Authors:

Brendan Halstead, Alex Kastner

Date:

December 30, 2025

Citations:

0

Frequently asked questions

What is the MATS Program?
How long does the program last?