Empirical

Streams in this track involve hands-on research that uses machine learning experiments to understand and improve model safety, including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness. This is the largest track in the program and is defined by its methods rather than any single research agenda. If your primary tool is ML engineering, this is your track.

Apply by June 7th

Application process

Initial application: No track-specific questions.
Stage 2: Complete 1–2 assessments evaluating research taste and technical implementation skills.
Stream applications & follow-up: Apply to individual streams; follow-up includes interviews or additional assessments depending on the stream.

Empirical track overview

The track is defined by its methodology more than by any single research agenda. Fellows run ML experiments to understand and improve the safety properties of frontier models, with work spanning interpretability, AI control, scalable oversight, evaluations, red-teaming, robustness, and model organisms of misalignment. The unifying thread is that progress comes from getting hands on real models (training, probing, fine-tuning, measuring) rather than reasoning from first principles alone. This is the largest track in the program and the most common entry point into technical AI safety research.

We are looking for fellows whose primary tool is ML engineering, broadly construed. The essential requirement is the ability to design and run experiments on language models or other deep learning systems and iterate quickly on the results. In practice that usually means strong Python (with and without AI coding tools), comfort with the infrastructure around running models at moderate scale, and enough research taste to know which experiments are worth running. Mission alignment matters: fellows should be able to say why a given line of empirical work meaningfully reduces frontier risk, not just whether it yields a successful publication. Educational background and seniority are weighted lightly here relative to other tracks. Past cohorts have included strong fellows ranging from undergraduates to senior industry researchers.

Fellows are matched to mentors based on fit, and projects are scoped to produce concrete artifacts by program end: papers, evaluation suites, open-source tooling, or technical reports. Target audiences include safety and alignment teams at frontier labs, governments and other evaluation organizations, and the broader ML research community.

Empirical track streams

Mary Phuong (GDM stream)

Empirical

GDM stream focused on scheming risk, AI control, monitoring, monitorability, and loss-of-control evaluations. Probably running in person in London.

Megan Kinniment

Empirical

This stream will focus on the science and development of model evaluations, especially monitorability and alignment evals.

Michael Chen

Empirical

Policy and Governance

Research papers (technical governance or ML) related to evaluating and mitigating dangerous AI capabilities, with a focus on what's actionable and relevant for AGI companies.

Neev Parikh

Empirical

I'm interested in empirical projects that improve our ability to evaluate model capabilities or that enable us to understand or evaluate model monitorability. An ideal project culminates in a research output (a conference or arXiv paper, or a research blog post with artifacts).

Oliver Sourbut

Empirical

Theory

Policy and Governance

Making society safe from AI doesn't just mean making safe AI: we're figuring out how to uplift human collective intelligence, manage a highly multiagent world, improve foresight and institutional competence, ideally learning how to make best positive use of frontier AI systems as we go. FLF has a small, sharp team of researchers with a wide network, and we're looking to nurture new and missing approaches to minimising large-scale risks while steering to a flourishing future.

OpenAI Safety Team

Empirical

Projects on this stream cluster into a few broad areas from the empirical track: scalable oversight, AI control, monitorability and interpretability, adversarial robustness, and security.

Most fellows will work closely with one or two mentors on something that fits into the mentors' ongoing research. The list of mentors above is tentative.

Patrick Butlin

Theory

Empirical

Projects in this stream will be on AI welfare and moral status; more specifically, on what it takes to be a moral patient and how we can determine whether AI systems meet the conditions. I'm looking for applicants who have ideas about these topics and are motivated to explore them in more detail.

Paul Riechers, Adam Shai

Empirical

Theory

In this stream we will explore extensions and implications of our discovery that neural networks pretrained on next-token prediction represent belief-state geometry in their activations. We will build on this fundamental theory of neural network representations in order to discover what AI systems are thinking, and understand their emergent behaviors.

Frequently asked questions