LawZero

We are excited to supervise projects that fall within the two following categories:

  1. Studying the causes, implications, and mitigations of [instances of] situational awareness;
  2. Contributing directly to LawZero's Scientist AI. 

For 1., we are particularly interested in:

  • Evaluation / monitorability awareness;
  • Self-awareness, in an introspective sense.

For 2., we are especially interested in:

  • Testing if  "truth-ification" (a process that, given a corpus of text, augments it so as to make sources of information explicit) allows language models to generalize better;
  • Developing amortized inference methods to estimate the uncertainty of a predictor (such as an autoregressive model). 

Stream overview

Causes, implications, and mitigations of evaluation awareness and introspection.

There is growing evidence that language models exhibit degrees of situational awareness [1]( https://arxiv.org/abs/2309.00667), [2], i.e., they encode or verbalize information consistent with their actual operating context, ranging from (i) distinguishing training vs testing vs deployment (see, e.g., [34] and frontier models' system cards), to (ii) recognizing their own outputs [5] or exhibiting introspective access over their activations [67].

We are especially interested in supervising projects that empirically investigate the science behind these phenomena. 

Examples of projects about evaluation awareness (for representative projects see, e.g., [8910]):

  • Building white- and black-box evaluations to quantify the extent to which models are, in fact, evaluation aware;
  • Building model organisms for evaluation awareness, in order to identify factors (e.g., in the data, or in the training procedure) that make awareness more prominent and generalizable;
  • Proposing mitigations to evaluation awareness. 

Examples of projects about introspection (for representative projects see, e.g., [56711]) including investigating whether introspection can be helpful or harmful for safety.

  • For example, whereas inconsistent answers about a model’s knowledge (or, more generally, “beliefs”) from a non-introspective model can be attributed to hallucinations or lack of capabilities, inconsistencies produced by an introspective model (which is better at accessing/explaining certain hidden states) may signal “dishonesty” or “obfuscation” at a more concerning level.

The Scientist AI: truthification and uncertainty estimation.

LawZero is building the Scientist AI (SAI), a system based on the intuitions that it is possible to disentangle understanding from agency [12], and that oracles can be used as guardrails for agents [13]. 

Two components of the SAI will be a (i) "truthifier", that decomposes a corpus of text into statements with attribution sources [14], and (ii) an estimator of a predictor's uncertainty.

We are especially interested in projects that use amortized inference methods (such as GFlowNets [151617]) to approximate posteriors over latent variables, such as (i) sources or (ii) predictors behind an autoregressive model, such that predictive uncertainty can be estimated from learned distributions as opposed to single-point estimates.

For projects related to “truth-ification”, we would like to investigate whether the truthification pipeline allows to learn better world models, (of the form of, e.g., [18]) in the presence of unreliable agent-generated data.

Mentors

Damiano Fornasiere
LawZero
,
Senior AI safety research scientist
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Damiano is a research scientist at LawZero, where he works on (i) the maths behind the Scientist AI, (ii) model organisms to study elicitation, (iii) interpretability and evaluation techniques for situational awareness and introspection.

Jean-Pierre Falet
LawZero
,
Machine Learning Research Scientist
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Jean-Pierre is a machine learning research scientist at LawZero, focused on designing model-based AI systems with quantitative safety guarantees. His primary interests are in probabilistic inference in graphical models, and he draws inspiration from his multidisciplinary background in neurology and neuroscience, which informs his understanding of human cognition. Jean-Pierre studied at McGill University, obtaining a medical degree in 2017, completing a neurology residency in 2022, and earning a master's degree in neuroscience in 2023. During his master’s, he developed causal machine learning methods for precision medicine. Concurrently with his work at LawZero, Jean-Pierre is completing a PhD in computer science at Mila and Université de Montréal, supervised by Yoshua Bengio. In addition to contributing to the foundations of guaranteed-safe AI, Jean-Pierre is passionate about translating advances in AI into clinically meaningful, safety-critical applications.

Marc-Antoine Rondeau
LawZero
,
Senior Machine Learning Research Scientist
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Marc-Antoine is a Research Scientist at LawZero. His main area of expertise is NLP and applied ML, and he is currently applying this to AI safety projects.

His research areas include interpretability and evaluation.

Oliver Richardson (Oli)
LawZero
,
Senior ML Research Scientist (LawZero) / Postdoctoral Fellow (UdeM)
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

OIi(ver) is a computer scientist (a staff member at LawZero and postdoc under Yoshua Bengio) with unusually broad scientific and mathematical expertise.

He is a sucker for pretty demos and grand unifying theories—unfortunately, sometimes losing sight of what is practical. Over the last few years (i.e., during his PhD at Cornell), Oli has discovered a beautiful theory describing how a great deal of artificial intelligence, classical and modern, can be fruitfully understood as resolving a natural information-theoretic measure of epistemic inconsistency. There remain many unanswered questions, but the hope is that this already much clearer view can lead to powerful generalist AI systems that are safer because they fundamentally do not meaningfully have goals or desires.

Pierre-Luc St-Charles
LawZero
,
Senior ML Research Scientist
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Pierre-Luc St-Charles is a researcher and developer specializing in applied machine learning with over a decade of experience across different non-profit institutes. He has held research roles at the Computer Research Institute of Montréal and senior research roles at Mila, collaborating with industrial partners and multidisciplinary academic teams on innovative projects in natural resources, transportation, digital media, document intelligence, and earth observation. Pierre-Luc earned his PhD in Computer Vision from Polytechnique Montréal in 2018, receiving the departmental Best Thesis Award. Since 2024, he has joined LawZero, a Mila-incubated organization focused on developing safe AI technologies. He is currently focused on building benchmarks and evaluation methodologies for frontier AI systems.

Yoshua Bengio
LawZero
,
Co-President and Scientific Director (LawZero) / Full Professor (UdeM) / Founder and Scientific Advisor (Mila)
Montreal
Agent Foundations, Dangerous Capability Evals, Monitoring, Control, Red-Teaming, Scalable Oversight

Yoshua Bengio is Full Professor of Computer Science at Université de Montreal, Co-President and Scientific Director of LawZero, as well as the Founder and Scientific Advisor of Mila. He also holds a Canada CIFAR AI Chair. Considered one of the world’s leaders in Artificial Intelligence and Deep Learning, he is the recipient of the 2018 A.M. Turing Award, considered to be the "Nobel Prize of computing." He is the most cited computer scientist worldwide, and the most-cited living scientist across all fields (by total citations).

Professor Bengio is a Fellow of both the Royal Society of London and Canada, an Officer of the Order of Canada, a Knight of the Legion of Honor of France, a member of the UN’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and chairs the International AI Safety Report. 

Mentors' vacation will be communicated as soon as possible, both to MATS and the mentees. Should the primary mentor go on vacation during MATS (e.g., 2 weeks at the end of August), it will be responsibility of the mentor to (i) provide the minimal amount of mentorship required by the guidelines, (ii) ensure that the secondary mentor can act as primary for the duration of the vacation, (iii) ensure that the tasks for the project are well scoped and doable.

Mentorship style

  • Mentees will be assigned a primary mentor and at least a secondary mentor, such that mentorship w.r.t. both research and engineering are covered.
  • We provide at least 1h meeting / week with both mentors and, typically, daily availability of both mentors on email / Slack during workdays. We expect the primary and secondary mentor to spend ~3 hours/week interacting with the scholar, with some variance depending on the phase of the program. 
  • The independence of the scholar depends on the scholar's experience. A priori, we do not expect research independence (cf. the question on how projects are assigned), but implementation independence roughly comparable to a CS grad student.

Mentors' vacation will be communicated as soon as possible, both to MATS and the mentees. Should the primary mentor go on vacation during MATS (e.g., 2 weeks at the end of August), it will be responsibility of the mentor to (i) provide the minimal amount of mentorship required by the guidelines, (ii) ensure that the secondary mentor can act as primary for the duration of the vacation, (iii) ensure that the tasks for the project are well scoped and doable.

Representative papers

See the papers linked earlier.

Scholars we are looking for

Essential knowledge:

  • Foundations of machine- and deep-learning;
  • Transformer architecture and large language models;
  • Empirical AI safety literature (e.g., evaluations, guardrails, interpretability, …).

Essential experience:

  • Python;
  • Designing and implementing machine learning workflows using PyTorch;
  • Supervised- or RL-fin tuning of language models, at least with toy experiments and some publicly available datasets.

Desired experience:

  • Prompt engineering.
  • Experience with libraries such as vLLM, TRL, Hugging Face;
  • Familiarity with statistical hypothesis testing.

Bonus:

  • Async APIs;
  • Multi-gpu training.
  • We encourage but do not necessarily expect collaborations between mentees in our stream (certain projects can be pursued in parallel, while others benefit from people working together).
  • We are open to collaborations with MATS' scholars belonging to other streams, if vetted by the mentor(s).
  • We encourage collaborations with people inside LawZero, subject to respecting first-mentorship (as per MATS' mentorship policy).

Project selection

  • The stream's mentors will propose and present the projects during week 1.
  • The mentee(s) will engage with the projects during week(s) 1 or 2 (e.g., reading the literature, replicating a paper, building a demo / MVP). 
  • At the end of week 2 at the latest, the mentor(s) and mentee(s) will agree together on a project, which will be chosen according to interest, feasibility (the ideal proxy-goal is to publish a ML conference paper), state of the literature and relevance to AI safety, and expertise of the mentors and mentees.
  • Projects may be re-assigned in exceptional circumstances, for example if a discovery suggests a sudden steer in the research direction. 

Community at MATS

MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing chat and exchanging hundreds of anonymous compliment notes.  Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.