Seth Donoughe

This stream will work on projects that empirically assess national security threats of AI misuse (CBRN terrorism and cyberattacks) and improve dangerous capability evaluations. Threat modeling applicants should have a skeptical mindset, enjoy case study work, and be strong written communicators. Eval applicants should be able and excited to help demonstrate concepts like sandbagging elicitation gaps in an AI misuse context.

Apply

View all streams

Stream overview

This stream is primarily interested in mentoring projects in biosecurity that either (1) create rigorous threat models of AI biological misuse or (2) create benchmarks and tools that allow us to evaluate and mitigate these risks, as well as verifying that companies are taking suitable precautions.

Potential example projects include:

Threat Model: What do inference compute trends imply for how fast dangerous biological capabilities may proliferate and become harder to monitor?
Evaluations: Formalizing "scientific ideation in an empirical field" in a manner that allows one to assess human and LLM-generated hypotheses for novelty, plausibility, etc.
Mitigations: Developing a way to more richly assess and describe the "blast radius" or "collateral damage" of efforts to remove-in-pretraining or unlearn material from LLMs
Verification: How are we better able to standardize and compare the effectiveness of classifiers from different AI companies and assess how much they reduce misuse risk?

No results

Seth Donoughe

Stream overview

Mentors

Mentorship style

Fellows we are looking for

Project selection