Bloom: an open source tool for automated behavioral evaluations

MATS Fellow:

Isha Gupta

Authors:

Isha Gupta, Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay, Rowan Wang, Samuel R. Bowman, Sara Price

Citations

0 Citations

Abstract:

We are releasing Bloom, an agentic framework for developing behavioral evaluations. Bloom's evaluations are reproducible and targeted: unlike open-ended auditing, Bloom takes a researcher-specified behavior and quantifies its frequency and severity across automatically generated scenarios. Bloom's evaluations correlate strongly with our hand-labelled judgments and reliably separate baseline models from intentionally misaligned ones. As examples, we also release benchmark results for four alignment relevant behaviors on 16 models. Bloom is available at github.com/safety-research/bloom.

Recent research

The 2025 AI Agent Index

Authors:

Leon Staufer, Mick Yang

Date:

February 20, 2026

Citations:

0

Frequently asked questions

What is the MATS Program?
How long does the program last?