LawZero

We are excited to supervise projects that:

Study the causes and implications of (multi-agent) situational awareness;
Contribute to LawZero's Scientist AI, in the form of contextualization and uncertainty estimation.

Stream overview

We are especially interested in supervising projects about:

(1) (Multi-agent) Situational awareness. Language models can recognize data-agnostic, a-semantic perturbations to their activations; when they fail to do so, they can learn, in context, to tell perturbed and unperturbed cases apart. Moreover, they can (learn in context to) identify, e.g., the magnitude of a perturbation or the layer at which it occurs, often generalizing to unseen examples. We want to study (i) the causes of these abilities, (ii) the extent to which they constitute, or extend to, epistemic privilege, and (iii) whether they facilitate predicting, by similarity, how other agents react to one's actions.

(2) Contextualization, a way of pre-processing data to situate statements in context, turning potentially unqualified claims into truth-apt claims. For a sketch of how we envision this being useful for safety, see this blog post, which details its role in LawZero’s project of building intelligent systems without the capacity for goals [12]; such systems might in turn serve as guardrails [13]. In particular, contextualization decomposes a corpus of text into statements with attribution sources [14], and we are interested in testing whether this helps mitigate preference biases in the presence of unreliable agent-generated data.

(3) Uncertainty estimation for partially trained models. We are studying ensembles, epistemic neural networks, calibration, conformal prediction techniques, and other methods in synthetic environments. We are especially interested in projects that use amortized inference methods (such as GFlowNets [15, 16, 17]) to approximate posteriors over latent variables, such as (i) sources or (ii) predictors behind an autoregressive model, such that predictive uncertainty can be estimated from learned distributions, as opposed to single-point estimates.
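As a rough illustration of what "predictive uncertainty from a distribution rather than a single-point estimate" can mean in item (3), here is a minimal sketch using a plain ensemble of classifiers. It is not LawZero's method and does not involve GFlowNets; the function names and the toy probabilities are hypothetical. It splits the entropy of the averaged prediction into an average per-member term and a disagreement term, which is a common way to separate aleatoric from epistemic uncertainty.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def ensemble_uncertainty(member_probs: List[Sequence[float]]) -> Dict[str, float]:
    """Decompose predictive uncertainty for a single input.

    member_probs[k][c] is the probability that (hypothetical) ensemble
    member k assigns to class c.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Averaged predictive distribution over ensemble members.
    mean_probs = [
        sum(m[c] for m in member_probs) / n_members for c in range(n_classes)
    ]
    total = entropy(mean_probs)  # entropy of the averaged prediction
    aleatoric = sum(entropy(m) for m in member_probs) / n_members
    epistemic = total - aleatoric  # disagreement between members
    return {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}


# Toy inputs (made up): members that agree vs. members that disagree.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

u_agree = ensemble_uncertainty(agree)
u_disagree = ensemble_uncertainty(disagree)
```

When the members agree, the epistemic term is close to zero even though each prediction is itself uncertain; when they disagree, the epistemic term grows. A single-point estimate cannot expose that distinction.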

Mentors

Mentorship style

Mentees will be assigned a primary mentor and a secondary mentor, such that mentorship w.r.t. both research and engineering is covered.
We provide at least one 1-hour meeting per week with both mentors and, typically, daily availability of both mentors on email / Slack during workdays.
The independence expected of the scholar depends on the scholar's experience. A priori, we do not expect research independence, but we do expect implementation independence roughly comparable to that of a CS grad student.

Fellows we are looking for

Essential knowledge:

Foundations of machine learning and deep learning;
Transformer architecture and large language models;
Empirical AI safety literature (e.g., evaluations, guardrails, interpretability, …).

Essential experience:

Python;
Designing and implementing machine learning workflows using PyTorch;
Supervised or RL fine-tuning of language models, at least with toy experiments and some publicly available datasets;
Prompt engineering.

Desired experience:

Experience with libraries such as vLLM, TRL, Hugging Face;
Familiarity with statistical hypothesis testing.

Bonus:

Async APIs;
Multi-GPU training.

We encourage but do not necessarily expect collaborations between mentees in our stream (certain projects can be pursued in parallel, while others benefit from people working together).
We are open to collaborations with MATS scholars belonging to other streams, if vetted by the mentor(s).
We encourage collaborations with people inside LawZero, subject to respecting first-mentorship (as per the MATS mentorship policy).

Project selection

The stream's mentors will propose and present the projects during week 1.
The mentee(s) will engage with the projects during week(s) 1 or 2 (e.g., reading the literature, replicating a paper, building a demo / MVP).
By the end of week 2 at the latest, the mentor(s) and mentee(s) will agree together on a project, chosen according to interest, feasibility (the ideal proxy goal is to publish an ML conference paper), the state of the literature, relevance to AI safety, and the expertise of the mentors and mentees.
Projects may be re-assigned in exceptional circumstances, for example if a discovery suggests a sudden steer in the research direction.
