AI Governance

As AI systems grow more capable, can we develop policies, standards, and frameworks to guide the ethical development, deployment, and regulation of AI technologies, with a focus on safety and societal benefit?

Mentors

Nico Miailhe
CEO, PRISM Eval; Founder, The Future Society

Nico is developing a comprehensive cognitive profiling benchmark for LLMs to evaluate their higher-order cognitive abilities, with a focus on governance and policy implementation for AI safety.

  • Co-founder and CEO of PRISM Eval, a new Paris-based evals organization I am starting with two MATS alumni. I am also the Founder and President (now non-executive) of The Future Society (TFS), the independent AI policy think tank I started in 2014 at HKS, which is active in Brussels, SF, DC, Paris, London, Montreal, and more. I am also an appointed expert to the OECD, UNESCO, and GPAI, and have been working on AI governance since 2011.

  • AI policy institutional innovation, focused on building a global governance regime for AI (in particular for the race to AGI) and on implementation of the EU AI Act. This includes the design, development, and "policy injection" of blueprints for key AI alignment institutions and governance mechanisms, such as metrology, evals and certification protocols, AI Safety Institutes, content provenance protocols, major-accident prevention regimes, and more.

    In particular, I would like to work with mentees on the governance aspects (what, how, who, when, etc.) of the following project: developing a comprehensive Cognitive Profiling Benchmark for LLMs. The goal is to create a robust, comprehensive benchmark that evaluates the higher-order cognitive abilities of LLMs in a more realistic and less biased manner than existing benchmarks and compute-based governance thresholds. Such a benchmark would focus on assessing higher-order cognitive capabilities such as reasoning, theory of mind, and world modeling, rather than just knowledge or task-specific performance. By establishing a set of cognitive ability levels, it could inform safety considerations and guide the governance of responsible development and deployment of LLMs.

    • A minimum of 2-3 years of experience in AI governance (more preferred)

    • Good understanding of AI Eval landscape and challenges is a plus.

    • Good understanding of info-hazard dynamics

    • Robust safety mindset

    • Basic understanding of simulator theory

    • Basic understanding of the new LLM Psychology research agenda

Timothy Fist
Senior Technology Fellow, Institute for Progress
Adjunct Senior Fellow, CNAS

Timothy is investigating methods to classify and monitor AI workloads on large compute clusters, developing techniques that remain robust to distributional shift and adversarial obfuscation, and exploring how compute-based reporting can support AI governance and regulatory compliance.

  • Tim Fist is a Senior Technology Fellow with the Institute for Progress, and an Adjunct Senior Fellow with the Technology and National Security Program at the Center for a New American Security (CNAS). His work focuses on policy for AI compute and differential technological development. Previously, Tim was a Fellow at CNAS, and worked as the Head of Strategy & Governance at Fathom Radiant, an artificial intelligence hardware company. Before that, he held several senior machine learning engineering positions and was the Head of Product at Eliiza, a machine learning consultancy. His writing has been featured in Foreign Policy and ChinaTalk, and his work has been cited in numerous outlets, including Wired, Reuters, Inside AI Policy, The Wire China, and the Wall Street Journal.

    With a background in machine learning and emerging AI hardware technologies, Fist brings a technical lens to problems in national security and AI policy. His career has ranged from developing and deploying AI systems for Fortune 500 companies to writing code currently being used on a satellite in orbit. He has built software in a wide range of domains, including agriculture, finance, telecommunications, and space navigation. Originally from Australia, Fist holds a B.A. (Honors) in Aerospace Engineering and a B.A. in Political Science from Monash University.

  • Frontier AI workload classification

    It will likely be useful to move toward a governance regime where compute providers (e.g. large cloud firms) provide an independent layer of verification on the activities of their customers. For example, it would be useful if compute providers submitted independent reports to the U.S. government whenever one of their customers ran a large-scale pre-training workload exceeding 10^26 operations (mirroring the current requirements imposed by the U.S. AI Executive Order on model developers). However, compute providers will typically (by design) have limited information about the workloads run by their customers. Can we use this limited information to develop and implement techniques that reliably classify workloads run on large clusters, where the classes of interest will likely be "not deep learning", "pre-training", "fine-tuning", and "inference"?
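    As a concrete illustration, below is a minimal sketch of what such a classifier might look like, assuming the provider can observe only coarse per-job telemetry (allocation size, duration, accelerator utilization, interconnect traffic, checkpoint-write frequency). The telemetry fields and the scikit-learn model choice are illustrative assumptions, not a proposed standard.

    ```python
    # Illustrative sketch only: assumes a compute provider can log coarse,
    # per-job telemetry without inspecting customer code or data.
    from dataclasses import dataclass

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    CLASSES = ["not_deep_learning", "pre_training", "fine_tuning", "inference"]

    @dataclass
    class WorkloadTelemetry:
        num_accelerators: int              # size of the allocation
        duration_hours: float              # wall-clock runtime
        mean_accelerator_util: float       # 0-1, averaged over the job
        interconnect_gbps: float           # mean inter-node traffic
        checkpoint_writes_per_hour: float  # large periodic writes suggest training

    def to_features(t: WorkloadTelemetry) -> np.ndarray:
        return np.array([
            np.log1p(t.num_accelerators),
            np.log1p(t.duration_hours),
            t.mean_accelerator_util,
            np.log1p(t.interconnect_gbps),
            t.checkpoint_writes_per_hour,
        ])

    def train_classifier(telemetry: list[WorkloadTelemetry], labels: list[str]):
        X = np.stack([to_features(t) for t in telemetry])
        y = np.array([CLASSES.index(label) for label in labels])
        return GradientBoostingClassifier().fit(X, y)

    def classify(clf, t: WorkloadTelemetry) -> str:
        return CLASSES[int(clf.predict(to_features(t)[None, :])[0])]
    ```

    In practice, the hard parts are choosing features that providers can observe without inspecting customer code or data, and assembling labeled examples of each workload class.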

    Making workload classification techniques robust to distributional shift

    Changes in the hardware, software packages, and specific algorithms used to run frontier AI workloads will likely degrade the accuracy of workload classification techniques over time, perhaps in sudden and unexpected ways. Can we design an overall process for developing, implementing, and monitoring the efficacy of workload classification techniques that is robust to this kind of degradation?

    This is a follow-up project to "Frontier AI workload classification".
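    One way to make this degradation visible early is to monitor the distributions of the telemetry features the classifier consumes and flag drift before accuracy silently decays. The sketch below uses a per-feature two-sample Kolmogorov-Smirnov test; this is an illustrative choice of drift statistic, not a recommendation.

    ```python
    # Illustrative drift monitor: compares recent telemetry features against the
    # reference window the classifier was trained on, one feature at a time.
    import numpy as np
    from scipy.stats import ks_2samp

    def drift_report(reference: np.ndarray, recent: np.ndarray,
                     feature_names: list[str], alpha: float = 0.01) -> dict:
        """reference, recent: (n_samples, n_features) feature matrices."""
        report = {}
        for j, name in enumerate(feature_names):
            stat, p_value = ks_2samp(reference[:, j], recent[:, j])
            report[name] = {"ks_stat": stat, "p_value": p_value,
                            "drifted": p_value < alpha}
        return report
    ```

    A drift flag would then trigger re-labeling a sample of recent workloads and re-validating (or retraining) the classifier before its outputs are relied on for reporting decisions.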

    Making workload classification techniques robust to adversarial gaming

    Workload classification becomes much harder in cases where the customer is actively trying to obfuscate their activities, by introducing deliberate noise in the code and/or data run as part of a workload. Can we design workload classification approaches that are robust to this kind of gaming (to within certain cost-efficiency bounds from the customer's perspective), or that are able to detect when this kind of gaming is occurring?

    This is a follow-up project to "Frontier AI workload classification", and could possibly be combined with "Making workload classification techniques robust to distributional shift".
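    One hedged sketch of a detection-side check: deliberately obfuscating a workload (e.g. padding it with junk computation or noise) generally costs the customer useful-work efficiency, so frontier-scale allocations whose efficiency falls far outside the range seen for honestly reported workloads of the same class may merit closer review. The utilization ranges and cluster-size cutoff below are hypothetical placeholders, not empirically derived values.

    ```python
    # Illustrative consistency check: flag very large allocations whose observed
    # efficiency is implausible for the workload class the customer reports.
    CLASS_UTIL_RANGE = {  # hypothetical per-class utilization ranges
        "pre_training": (0.30, 0.65),
        "fine_tuning": (0.20, 0.65),
        "inference": (0.05, 0.50),
        "not_deep_learning": (0.00, 1.00),
    }

    def flag_possible_gaming(num_accelerators: int, claimed_class: str,
                             mean_accelerator_util: float,
                             min_cluster_size: int = 512) -> bool:
        if num_accelerators < min_cluster_size:
            return False  # small jobs are out of scope for frontier-scale oversight
        low, high = CLASS_UTIL_RANGE[claimed_class]
        return not (low <= mean_accelerator_util <= high)
    ```

    A check like this only raises the cost of gaming rather than preventing it outright, which is why the framing above is in terms of robustness within cost-efficiency bounds.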

    Tackling "compute structuring"

    Compute-based reporting thresholds for training runs (e.g. the requirements in the U.S. AI Executive Order) could be a useful way to ensure model developers are complying with best practices for AI safety. However, such thresholds are likely to be highly "gameable". One relevant case study comes from finance: in the United States, a law was introduced requiring banks to report all transactions above a particular threshold to the government, in order to monitor money laundering, tax evasion, and other nefarious financial activities. In response, "transaction structuring" became a widespread practice, where bank customers eager to avoid scrutiny broke their transactions up into multiple smaller transactions, each below the reporting threshold. There are two clear ways this could be done in the compute world:

    1. Breaking up a large workload into multiple sequential workloads, where each workload is run using a different cloud account or provider, each workload is below the compute threshold, and each workload takes as input the partially trained model weights produced by the previous workload.

    2. Distributing a large workload across multiple cloud accounts/providers, where each account/provider serves as a single data-parallel worker within the larger distributed training run, periodically broadcasting weight updates to and receiving them from the other workers.

    What other forms of compute structuring might be possible? What are some effective technical solutions to both detect and prevent different forms of compute structuring? For example, tackling (1) might involve having compute providers report all workloads above a particular throughput (in terms of operations per second), and then aggregating those reports in order to identify (perhaps also informed by shared identity information based on KYC) likely sequential training runs. Tackling (2) might involve imposing technical boundaries on off-cluster latency/bandwidth in order to make such a scheme prohibitive from a cost-efficiency perspective.
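    A minimal sketch of the mitigation described above for (1): aggregate workload reports across providers and accounts by the underlying KYC-verified customer, and flag any customer whose cumulative compute within a rolling window crosses the reporting threshold even though every individual workload stayed below it. The window length, the per-workload operation estimates, and the assumption that identities can be reliably linked across providers are open design questions, not settled parameters.

    ```python
    # Illustrative aggregation of per-workload reports to detect sequential
    # "compute structuring" across accounts and providers.
    from collections import defaultdict
    from dataclasses import dataclass

    THRESHOLD_OPS = 1e26  # mirrors the Executive Order pre-training threshold
    WINDOW_DAYS = 90      # assumed aggregation window; a policy choice

    @dataclass
    class WorkloadReport:
        kyc_customer_id: str  # identity linked across providers via KYC
        provider: str
        start_day: int        # days since some fixed epoch
        total_ops: float      # provider's estimate of operations performed

    def flag_structuring(reports: list[WorkloadReport]) -> set[str]:
        by_customer = defaultdict(list)
        for r in reports:
            by_customer[r.kyc_customer_id].append(r)
        flagged = set()
        for customer, rs in by_customer.items():
            rs.sort(key=lambda r: r.start_day)
            for i, first in enumerate(rs):
                window_ops = sum(r.total_ops for r in rs[i:]
                                 if r.start_day - first.start_day <= WINDOW_DAYS)
                if window_ops >= THRESHOLD_OPS:
                    flagged.add(customer)
                    break
        return flagged
    ```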

    A system for secure chip IDs

    Being able to assign un-spoofable IDs to hardware devices in data centers will be extremely useful for tracking and verifying compute usage. Such IDs could also be useful for linking AI agents back to the physical infrastructure on which they are deployed, in order to provide an additional layer of oversight. What are some promising technical approaches for implementing secure IDs that are sufficiently robust against hardware and software-level attacks? How can ID creation, provisioning, and tracking be handled in a way that makes downstream policy measures maximally easy (e.g. chip tracking for anti-smuggling)?
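    A common building block here is a device-bound keypair whose private key never leaves the chip (e.g. held in a hardware root of trust), with the ID derived from the public key and verified by challenge-response, so the ID cannot be claimed by a device that lacks the matching key. The sketch below shows only the protocol shape, using Ed25519 in software via the cryptography package; it says nothing about the genuinely hard part, which is keeping the key extraction-resistant against hardware- and software-level attacks.

    ```python
    # Protocol-shape sketch only: in a real system the private key would live in
    # a hardware root of trust on the device, not in host software.
    import hashlib
    import os

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey, Ed25519PublicKey)

    class Device:
        """Stands in for a chip holding a non-exportable signing key."""

        def __init__(self):
            self._key = Ed25519PrivateKey.generate()
            self.public_key = self._key.public_key()
            raw = self.public_key.public_bytes(
                serialization.Encoding.Raw, serialization.PublicFormat.Raw)
            # The chip ID is derived from the public key, so it cannot be
            # claimed by a device that does not hold the private key.
            self.chip_id = hashlib.sha256(raw).hexdigest()[:16]

        def attest(self, challenge: bytes) -> bytes:
            return self._key.sign(self.chip_id.encode() + challenge)

    def verify_device(chip_id: str, public_key: Ed25519PublicKey, attest_fn) -> bool:
        challenge = os.urandom(32)  # fresh nonce prevents replay of old attestations
        signature = attest_fn(challenge)
        try:
            public_key.verify(signature, chip_id.encode() + challenge)
            return True
        except InvalidSignature:
            return False

    # Example: a verifier holding the registered (chip_id, public_key) pair can
    # confirm it is talking to the device provisioned with that ID.
    device = Device()
    assert verify_device(device.chip_id, device.public_key, device.attest)
    ```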

  • Timothy Fist has worked with professionals from a wide range of fields.