Shi Feng

The stream will focus on conceptual, empirical, and theoretical work on scalable oversight and control. This includes but is not limited to creating model organisms for specific failure modes, designing training procedures against them, and making progress on subproblems involved in safety cases.

Stream overview

  • Realistic model organisms of deception and collusion
  • Easy-to-hard generalization of monitors
  • Legibility and easy-to-hard generalization of scalable oversight protocols
  • Metacognition

Mentors

Shi Feng
George Washington University, Assistant professor
New York City
Control, Scalable Oversight, Red-Teaming, Model Organisms, Monitoring

Shi Feng leads a research group working on oversight and control. He is an assistant professor at George Washington University. Prior to that, he was a postdoc in the NYU Alignment Research Group under Sam Bowman. He currently focuses on deception and collusion, with an emphasis on propensity and evaluation realism.

Mentorship style

  • Mentors can meet with scholars about 1.5 times per week on average.
  • Mentors will focus on unbottlenecking collaborators as quickly as possible and will check Slack messages every few hours.

Scholars we are looking for

  • Interested in hybrid (conceptual, empirical, theoretical) work on scalable oversight / control methods and threat modeling against them.
  • The ability to articulate the crux of a (proposed) work and translate it into empirical experiments: what are the hidden assumptions? What hypothetical finding would make the idea more or less promising?
  • The ability to look at the data and do careful manual qualitative analysis, e.g., reading debate transcripts.
  • Comfortable with building scaffolds and various post-training processes.
  • Human evaluation experience is a plus.

Scholars will collaborate with people involved in the group but can also find new collaborators.

Project selection

A research agenda document will be shared ahead of time with a short list of project ideas. The scholars can also brainstorm and pitch ideas that are aligned with the research agenda. We will decide on assignments in week 2.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.