Aligning Language Models

Current ML models that predict human language are surprisingly powerful and might scale into transformative AI. What novel alignment failures will future models exhibit, how can we develop demonstrations of those failures, and how can we mitigate them?

Mentors

Personal Fit

  1. You’re probably a great fit if you enjoy (or would be good at) coding, running machine learning experiments, and doing highly empirical work, spending 95% of your time on this kind of work. You’re probably a better fit for other streams if you’re looking to do work that is heavily or primarily conceptual, theoretical, or mathematical in nature (though some projects will involve thinking through parts of conceptual alignment and how to test ideas there empirically). The day-to-day work is fairly fast-paced and involves a lot of writing Python scripts, using the OpenAI and Anthropic APIs to prototype ideas with language models, etc.

  2. I’m currently only seeking applications from people who are at least 25% likely to want to continue working together full-time post-MATS (e.g., 4-6 additional months post-MATS until the research project runs to completion).

  3. My projects involve learning to execute well on a well-scoped research project (as a first step for getting into research). I will have several project ideas which you would be able to score your interest/fit with, and I’ll match you with a project that fits your interests. If you’re excited (or not excited) about some of my past work, that’s probably reasonably representative of whether you’d be excited about the project we’d match you with. For people who have led 2+ machine learning research projects in the past, I may be more flexible, especially where we can scope out a project that seems promising to both of us.

  4. My projects are fairly collaborative. All projects will have 2-4 full-time research contributors (e.g., people running experiments), and generally one more hands-on research co-advisor (generally someone more experienced). I’ll provide feedback on the project primarily through weekly project meetings, one-off 1:1 meetings as needed, or random in-person check-ins/chats.

Selection Questions

Selection questions include 12 required short-response questions and 7 optional short-response questions. Your responses will be reviewed, after which we will send an email containing a coding challenge, which will take up to 90 minutes. The short-response questions include ones such as the following:

  1. What are your odds of being interested in continuing to work together with me full-time beyond the 2-month MATS period? I’m currently only seeking applications from people who are at least 25% likely to want to continue working together full-time post-MATS (e.g., 4-6 additional months post-MATS until the research project runs to completion). (Include a %, possibly with some explanation of your number as needed.)

  2. How excited are you to pursue a well-scoped research project (vs. pursue a project that you scope out)? Most of my projects involve learning to execute well on a well-scoped research project (as a first step for getting into research). For people who have led 2+ machine learning research papers in the past, I may be more flexible, especially where we can scope out a project that seems promising to both of us. (~1 sentence)

  3. In what ways are you opinionated on what you work on (if any)? (~1-3 sentences)

  4. What programming languages are you fluent in? Please provide a rough estimate of how many hours you’ve spent programming in each language you list. (1 sentence)

  5. Please talk briefly about an area of technical work that you’re currently most interested in or excited about. (~3-5 sentences)