Luca Righetti, Seth Donoughe

This stream will work on projects that empirically assess national security threats from AI misuse (CBRN terrorism and cyberattacks) and improve dangerous capability evaluations. Threat modeling applicants should have a skeptical mindset, enjoy case study work, and be strong written communicators. Eval applicants should be able and excited to help demonstrate concepts like sandbagging and elicitation gaps in an AI misuse context.

Stream overview

This stream is primarily interested in mentoring biosecurity projects that either (1) create rigorous threat models of AI biological misuse or (2) create benchmarks and tools that allow us to evaluate and mitigate these risks and verify that companies are taking suitable precautions.

Potential example projects include:

  • Threat Model: What do inference compute trends imply for how fast dangerous biological capabilities may proliferate and become harder to monitor?
  • Evaluations: Formalizing "scientific ideation in an empirical field" in a manner that allows one to assess human and LLM-generated hypotheses for novelty, plausibility, etc.
  • Mitigations: Developing a way to more richly assess and describe the "blast radius" or "collateral damage" of efforts to remove material from pretraining data or unlearn it from LLMs.
  • Verification: How can we standardize and compare the effectiveness of classifiers from different AI companies, and assess how much they reduce misuse risk?

Mentors

Luca Righetti
Center for the Governance of AI (GovAI), Senior Research Fellow
SF Bay Area
Biorisk, Security, Safeguards

I am a Senior Research Fellow at the Center for the Governance of AI, where I lead a workstream investigating national security threats from advanced AI systems. I am also a collaborator at METR, where I help improve the rigor of system cards and evals, and hold a senior role at the Forecasting Research Institute.

I am interested in mentoring projects that create rigorous threat models of near-term AI misuse, especially within biosecurity. Given that this work can touch on sensitive topics, the final output may take the form of memos and briefings for decision-makers rather than academic publications.

I am also interested in projects that strengthen the science and transparency of dangerous capability evaluation reporting. This includes creating standards and checklists, writing peer reviews of model cards, and designing randomized controlled trials that push the current frontier.

Seth Donoughe
SecureBio, Director of AI
Chicago
Biorisk, Security, Safeguards

Seth Donoughe is Director of AI at SecureBio. He holds a PhD in Organismic and Evolutionary Biology from Harvard and now researches the intersection of AI and biosecurity.

Mentorship style

Mentorship typically includes weekly meetings, detailed comments on drafts, and asynchronous messaging.

Scholars we are looking for

For threat modeling work: a skeptical mindset, transparent reasoning, and strong analytical writing skills

For evaluations, mitigations, and verification work: LLM engineering skills (e.g., agent orchestration) and biosecurity knowledge

Mentorship will be a collaboration between me and my team at GovAI. The specifics depend on the candidate and project.

I am based in Berkeley, and some of my team is based in London.

Project selection

Mentor(s) will talk through project ideas with the scholar.

Community at MATS

The MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.