MATS empowers researchers to advance AI safety

The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that provides talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and connects them with the Berkeley AI safety research community.

The Winter 2023-24 Program runs Jan 8-Mar 15, 2024, in Berkeley, California, and features seminar talks from leading AI safety researchers, workshops on research strategy, and networking events with the Bay Area AI safety community.

Applications for the Summer 2024 (Jun 17-Aug 23) and Winter 2024-25 (Jan 6-Mar 14) programs are now open! Updated mentor lists are coming soon.

Our Mission

The MATS program aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence (AI). We believe that ambitious researchers from a variety of backgrounds have the potential to meaningfully contribute to the field of alignment research. We aim to provide the training, logistics, and community necessary to aid their transition into the field. We also connect our scholars with funding to ensure their financial security. Please see our theory of change for more details.

Program Details

  • MATS is a scientific and educational seminar and independent research program intended to serve as an introduction to the field of AI alignment and to facilitate networking with alignment researchers and institutions. Read more about the program timeline and content in our program overview.

    The Winter 2023-24 Program will run Jan 8-Mar 15, 2024, in Berkeley, California, and feature seminar talks from leading AI safety researchers, workshops on research strategy, and networking events with the Bay Area AI safety community. Applications close on Nov 17, 2023 at 11:59 pm PT, except for Neel Nanda’s stream, which closes on Nov 10, 2023 at 11:59 pm PT.

    The MATS program is an initiative supported by the Berkeley Existential Risk Initiative. We currently receive funding from Open Philanthropy and DALHAP Investments, and are accepting donations to support additional research scholars.

  • Since its inception in late 2021, the MATS program has supported 155 scholars and 26 mentors.

  • Our ideal applicant has:

    • An understanding of the AI alignment research landscape equivalent to completing the AI Safety Fundamentals Alignment Course;

    • Previous experience with technical research (e.g., ML, CS, maths, physics, or neuroscience), ideally at a postgraduate level;

    • Strong motivation to pursue a career in AI alignment research, particularly to reduce global catastrophic risk, prevent human disempowerment, and enable sentient flourishing.

    Even if you do not meet all of these criteria, we encourage you to apply! Several past scholars applied without strong expectations and were accepted.

    We are currently unable to accept applicants who will be under the age of 18 on Jan 8, 2024. The sole exception is Neel Nanda’s online training program, which might accept exceptional applicants under the age of 18.

  • MATS will run several concurrent streams, each for a different alignment research agenda. Read through the descriptions of each stream and the associated candidate selection questions below. To apply for a stream, submit an application via this portal, including your resume and your responses to that stream’s candidate selection questions. We will assess your application based on your responses to the candidate selection questions and your prior research experience.

    Please note that the candidate selection questions can be quite hard, depending on the mentor! Allow yourself sufficient time to apply to your chosen stream(s). You are welcome to apply to multiple streams; we will assess each application independently, though a strong application to one stream may be worth more than moderate applications to several.

Program Streams

  • Agent Foundations

    Vanessa Kosoy (MIRI)

    Some systems in the world seem to behave like “agents”: they make consistent decisions, and sometimes display complex goal-seeking behaviour. Can we develop a robust mathematical description of such systems and build provably aligned AI agents?

  • Aligning Language Models

    Ethan Perez (Anthropic)

    Current ML models that predict human language are surprisingly powerful and might scale into transformative AI. What novel alignment failures will future models exhibit, how can we develop demonstrations of those failures, and how can we mitigate them?

  • Concept-Based Interpretability

    Stephen Casper (MIT AAG), Erik Jenner (UC Berkeley CHAI), Jessica Rumbelow (Leap Labs)

    Identifying high-level concepts in ML models might be critical to predicting and restricting dangerous or otherwise unwanted behaviour. Can we identify structures corresponding to “goals” or dangerous capabilities within a model and surgically alter them?

  • Cooperative AI

    Jesse Clifton (Center on Long-Term Risk), Caspar Oesterheld (CMU FOCAL)

    The world may soon contain many advanced AI systems frequently interacting with humans and with each other. Can we create a solid game-theoretic foundation for reasoning about these interactions to prevent catastrophic conflict and incentivize cooperation?

  • Deceptive Alignment

    Evan Hubinger (Anthropic)

    Powerful AI systems may be instrumentally motivated to secretly manipulate their training process. What ML training processes and architectures might lead to this deceptive behavior, and how can it be detected or averted?

  • Developmental Interpretability

    Jesse Hoogland (Timaeus), Daniel Murfet (Timaeus, University of Melbourne)

    Singular learning theory (SLT) offers a principled scientific approach to detecting phase transitions during ML training. Can we develop methods to identify, understand, and ultimately prevent the formation of dangerous capabilities and harmful values?

  • Evaluating Dangerous Capabilities

    Owain Evans (University of Oxford), Francis Rhys Ward (Imperial College London)

    Many stories of AI accident and misuse involve potentially dangerous capabilities, such as sophisticated deception and situational awareness, that have not yet been demonstrated in AI. Can we evaluate such capabilities in existing AI systems to form a foundation for policy and further technical work?

  • Mechanistic Interpretability

    Neel Nanda (Google DeepMind), Alex Turner (Google DeepMind), Lee Sharkey (Apollo Research), Adrià Garriga Alonso (FAR AI)

    Rigorously understanding how ML models function may allow us to identify and train against misalignment. Can we reverse engineer neural nets from their weights, similar to how one might reverse engineer a binary compiled program?

  • Provable AI Safety

    David “davidad” Dalrymple (ARIA)

    If we could encode a detailed world model and coarse human preferences in a formal language, it might be possible to formally verify that an AI-generated agent won’t take actions leading to catastrophe. Can we use frontier AI to help create such a detailed, multi-scale world model, and/or to synthesize agents and proof certificates for small-scale demonstrations today?

  • Safety Cases for AI

    Buck Shlegeris (Redwood Research)

    When a power company wants to build a nuclear power plant, it is obligated to provide a safety case: an argument, backed by evidence, that the plant is acceptably safe. What is the shortest path towards AI developers being able to make reliable safety cases for their training and deployment, and how can we start iterating now on techniques that fit into these safety cases?

  • Scalable Oversight

    Asa Cooper Stickland, Julian Michael, Shi Feng, David Rein (NYU ARG)

    Human overseers alone might not be able to supervise superhuman AI in domains that we don’t understand. Can we design systems that scalably evaluate AI and incentivize AI truthfulness?

  • Understanding AI Hacking

    Jeffrey Ladish (Palisade Research)

    Current and near-term language models have the potential to greatly empower hackers and fundamentally change cybersecurity. How effectively can current models assist bad actors, and how soon might models be capable of hacking unaided?