MATS Fellows:
Jacob Charnock, Raja Moreno, Justin Miller, William L. Anderson
Authors:
Jacob Charnock, Raja Mehta Moreno, Justin Miller, William L. Anderson
Abstract:
Frontier AI developers are increasingly deploying highly capable models internally to automate AI R&D, but these deployments currently face limited external oversight. It is essential, therefore, that developers provide evidence that internally deployed models are safe. While recent work has highlighted the risks of internal deployments and proposed broad approaches to transparency and governance, there remains little guidance on the specific information developers should disclose about them. We address this gap by identifying key information that companies should disclose about internally deployed models across four categories: capabilities, usage, safety mitigations, and governance. For each category, we analyse the key benefits and limitations of disclosure and consider how disclosure-related risks can be mitigated. Our framework could be used by developers to inform both public transparency documents, such as model system cards, and private periodic reports required under emerging frontier AI regulation.
Interpreting Language Model Parameters
Authors:
Bart Bussmann, Nathan Hu, Michael Ivanitskiy
Date:
May 5, 2026
Removing Sandbagging in LLMs by Training with Weak Supervision
Authors:
Emil Ryd
Date:
May 1, 2026
The MATS Program is an independent research and educational initiative connecting emerging researchers with mentors in AI alignment, governance, and security.
Each MATS cohort runs for 12 weeks in Berkeley, California, followed by an optional 6–12 month extension in London for selected scholars.