Shi Feng

The stream will focus on conceptual, empirical, and theoretical work on scalable oversight and control. This includes, but is not limited to, creating model organisms for specific failure modes, designing training procedures against them, and making progress on the subproblems involved in building safety cases.

Stream overview

Realistic model organisms of deception and collusion
Easy-to-hard generalization of monitors
Legibility and easy-to-hard generalization of scalable oversight protocols
Metacognition

Mentors

Shi Feng

George Washington University

Assistant professor

New York City

Control

Scalable Oversight

Red-Teaming

Model Organisms

Monitoring

Shi Feng leads a research group working on oversight and control. He is an assistant professor at George Washington University. Prior to that, he was a postdoc in the NYU Alignment Research Group under Sam Bowman. He currently focuses on deception and collusion, with an emphasis on propensity and evaluation realism.

Mentorship style

Mentors meet with scholars about 1.5 times per week on average.
Mentors will focus on unblocking collaborators as quickly as possible and will check Slack messages every few hours.

Representative papers

Scholars we are looking for

Interest in hybrid (conceptual, empirical, and theoretical) work on scalable oversight and control methods, and in threat modeling against them.
The ability to articulate the crux of a (proposed) project and translate it into empirical experiments: What are the hidden assumptions? What hypothetical finding would make the idea more or less promising?
The ability to look at the data and do careful manual qualitative analysis, e.g., reading debate transcripts.
Comfort with building scaffolds and running various post-training processes.
Human evaluation experience is a plus.

Scholars will collaborate with members of the group but are also free to find new collaborators.

Project selection

A research agenda document with a short list of project ideas will be shared ahead of time. Scholars can also brainstorm and pitch ideas that align with the research agenda. Project assignments will be decided in week 2.