Roger Grosse

Roger Grosse’s stream investigates how to improve influence functions and other training data attribution methods, and uses these tools to study alignment-related phenomena such as out-of-context reasoning and emergent misalignment. The ideal scholar has experience with LLM internals, strong statistics/applied math skills (especially numerical linear algebra), and can independently drive research from literature review through experimentation and analysis. Roger provides shovel-ready projects while giving exceptional scholars freedom to pursue their own ideas, and is open to scholars collaborating with others.

Stream overview

This stream focuses on improving influence functions and other training data attribution methods, and on using training data attribution to understand alignment-related phenomena such as out-of-context reasoning and emergent misalignment.

Mentors

Roger Grosse
Anthropic, Associate Professor, Toronto
Interpretability

Mentorship style

I will meet with scholars for 1 hour per week by default, and will be available to answer questions on Slack roughly daily.

Representative papers

Grosse et al., "Studying Large Language Model Generalization with Influence Functions"

Wang et al., "Better Training Data Attribution via Better Inverse Hessian-Vector Products"

Choe et al., "What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions"

Scholars we are looking for

  • Experience working with LLM internals
  • Strong background in statistics and/or applied math (especially numerical linear algebra)
  • Ability to carry out research independently on the timescale of weeks (reading the literature, formulating and carrying out experiments, interpreting results)
  • Ability and willingness to dig into details to get at the root causes of phenomena

Scholars are welcome to bring in collaborators if they would find it valuable.

Project selection

I will give the scholar the level of freedom they are ready for. I will be prepared with focused, shovel-ready projects, but exceptional scholars with a vision they are excited about will have the flexibility to pursue it.