Roger Grosse’s stream investigates how to improve influence functions and other training data attribution methods, and uses these tools to study alignment-related phenomena such as out-of-context reasoning and emergent misalignment. The ideal scholar has experience with LLM internals, strong statistics/applied math skills (especially numerical linear algebra), and can independently drive research from literature review through experimentation and analysis. Roger provides shovel-ready projects while giving exceptional scholars freedom to pursue their own ideas, and is open to scholars collaborating with others.
Projects in this stream explore ways to improve influence functions or other training data attribution methods, or use training data attribution to understand alignment-related phenomena such as out-of-context reasoning or emergent misalignment.
I will meet with scholars 1 hour per week by default, and will be available to answer questions on Slack roughly daily.
Grosse et al., "Studying Large Language Model Generalization with Influence Functions"
Wang et al., "Better Training Data Attribution via Better Inverse Hessian-Vector Products"
Choe et al., "What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions"
Scholars are welcome to find collaborators if they'd find it valuable.
I will give the scholar the level of freedom they are ready for. I will be prepared with focused, shovel-ready projects, but exceptional scholars with a vision they are excited about will have the flexibility to pursue it.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and ACX meetups.