Agent Foundations

Some systems in the world seem to behave like “agents”: they make consistent decisions, and sometimes display complex goal-seeking behaviour. Can we develop a robust mathematical description of such systems and build provably aligned AI agents?

Mentor

Research Projects

  • Learning-Theoretic Agenda

    The learning-theoretic AI alignment research agenda is an attempt to create a foundational mathematical theory of computationally bounded, (approximately) rational agents, drawing on tools from statistical and computational learning theory, control theory, algorithmic information theory, imprecise probability theory, and other areas of theoretical computer science and mathematics. The hope is that this theory will allow a much more precise and confident analysis of threat models and solutions in AI alignment than has been possible so far. It also generates entirely new approaches to solving alignment, such as Physicalist Superimitation.

    For more details about the research agenda, see “The Learning-Theoretic Agenda: Status 2023.”

    For some criticism, see “A mostly critical review of infra-Bayesianism.”

Personal Fit

  • Ideal Candidates

    Ideal candidates should have:

    • Experience with mathematical research, including:

      • Proving difficult original theorems;

      • Starting from a vague/informal description of a mathematical problem or idea and crystallizing it into something rigorous;

      • Studying the literature to find relevant prior work and fill gaps in personal knowledge.

    • Background in relevant mathematical fields:

      • Theory of computation, probability theory, general topology, functional analysis;

      • Bonus points for: computational complexity theory, statistical learning theory, control theory, algorithmic information theory, game theory, category theory, type theory.

    • Experience writing up explanations or results of mathematical research in a crisp, academic style.

  • Mentorship Style

    Mentorship looks like:

    • A one-hour weekly group meeting over video call.

    • Shared Slack workspace:

      • Response time is usually within 1-2 days;

      • Occasional real-time chat (time zone permitting).

    • Detailed feedback on drafts.

Selection Questions