Marius Hobbhahn

We will continue working on black-box monitors for scheming in complex agentic settings, building on the success of the previous stream.

See here for details.

Stream overview

The entire stream will be dedicated to building high-quality black-box scheming monitors. 

In a previous stream, we had great initial success designing black-box monitors for scheming, and we have made substantial progress since then.

Potential directions for this version of the stream include: (a) fine-tuning monitors, (b) improving monitors through adversarial training, (c) more "science of monitoring", and (d) harder and more complex scheming datasets to improve monitor training, selection, and testing.

Mentors

Marius Hobbhahn
Apollo Research, CEO
London
Control, Scheming & Deception, Dangerous Capability Evals, Monitoring

Marius Hobbhahn is the CEO of Apollo Research, where he also leads the evals team. Apollo is an evals research organization focused on scheming, evals and control. Prior to starting Apollo, he did a PhD in Bayesian ML and worked on AI forecasting at Epoch. 

I want you to grow as a researcher/engineer as fast as possible. Therefore, the goal is always for scholars to write a paper and submit it to a top ML conference before the end of the extension. We will also try to produce other outputs, such as intermediate blog posts. 

We have a large compute budget available for the stream, allowing us to run many fine-tuning experiments and build large data-generation pipelines.

Mentorship style

We have two weekly 60-minute calls by default. Since everyone will work on the same project, these calls will be with all participants of the stream. I respond to asynchronous messages on Slack daily. Scholars will have a lot of freedom in day-to-day decisions and direction setting. In the best case, you will understand the project better than I do after a few weeks and have a clear vision for where it should be heading. I recommend that scholars focus 100% of their work time on the project and not pursue anything on the side. I think this is how people learn the most during MATS.

Representative papers

Here is a doc with more details on the current plans for the black-box monitoring stream.

Scholars we are looking for

  1. You are interested in this project in particular, i.e. working on black-box monitors for scheming in agentic settings. I will only focus on this project and no other research.
  2. You enjoy tinkering with LLMs, e.g. prompting, building basic LM agents, and fine-tuning. 
  3. You are happy to build datasets and synthetic data generation pipelines. We found that a substantial amount of initial work is required for the data generation process. 
  4. You like quick empirical iteration and direct feedback loops. 
  5. I expect that you will spend 20% of your time on conceptual work (e.g., thinking about which environments could work or what techniques to try) and 80% on hands-on empirical work (e.g., implementing and running experiments).
  6. I prefer that scholars focus 100% of their work time on the project and not pursue any side projects. In general, I’m happy to support highly ambitious scholars who want to make a lot of progress during MATS. In the past, people have described my stream as "intense, but in a good way". 

You will work with other scholars on the stream. The exact level of collaboration depends on your preferences and which exact subproject you're working on. You may also work with scholars from previous streams who are still working on the project. 

Project selection

You will work on subprojects of black-box monitoring. See here for details.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars also organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.