We are excited to supervise projects, especially on the following topics:
(1) (Multi-agent) situational awareness. Language models can recognize data-agnostic, asemantic perturbations to their activations; when they fail to do so, they can learn in context to discriminate perturbed from unperturbed activations. Moreover, they can identify (or learn in context to identify) properties of a perturbation, such as its magnitude or the layer at which it occurs, often generalizing to unseen examples. We want to study (i) the causes of these abilities, (ii) the extent to which they constitute, or extend to, epistemic privilege, and (iii) whether they help a model predict, by similarity, how other agents will react to its actions.
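To make "data-agnostic, asemantic perturbation" concrete, here is a minimal sketch that assumes a toy numpy MLP stands in for a language model. The perturbation is a random direction, drawn independently of the input, rescaled to a chosen magnitude and added to the activations at a chosen layer. The network, `forward`, and its `perturb_layer`/`magnitude` parameters are hypothetical illustrations, not our experimental setup. A detection task would then ask the model, or a probe, to tell `clean` runs from `pert` runs, or to recover the layer and magnitude.

```python
import numpy as np

def make_toy_mlp(widths, seed=0):
    """Hypothetical toy network standing in for a language model: a list of (W, b) layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(widths[:-1], widths[1:])]

def random_perturbation(dim, magnitude, rng):
    """Data-agnostic perturbation: a uniformly random direction with a fixed L2 norm."""
    v = rng.normal(size=dim)
    return magnitude * v / np.linalg.norm(v)

def forward(layers, x, perturb_layer=None, magnitude=0.0, seed=1):
    """Run the network and return every layer's activations.

    If perturb_layer is set, a random vector of the given magnitude (independent
    of x) is added to that layer's output before it is passed to the next layer.
    """
    rng = np.random.default_rng(seed)
    acts = []
    h = x
    for i, (W, b) in enumerate(layers):
        h = np.tanh(h @ W + b)
        if i == perturb_layer:
            h = h + random_perturbation(h.shape[-1], magnitude, rng)
        acts.append(h)
    return acts

layers = make_toy_mlp([8, 16, 16, 4])
x = np.ones(8)
clean = forward(layers, x)
pert = forward(layers, x, perturb_layer=1, magnitude=2.0)
# Layer 0 is untouched; layer 1 differs by exactly the injected vector; later layers inherit the change.
print(np.allclose(clean[0], pert[0]), np.linalg.norm(pert[1] - clean[1]))
```

Because the injected vector ignores `x`, it carries no input semantics; only its position (layer) and size (magnitude) vary, which are the properties the models above can learn to identify.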
(2) Contextualization: a way of pre-processing data that situates statements in their context, turning potentially unqualified claims into truth-apt ones. For a sketch of how we envision this being useful for safety, see this blogpost, which details its role in LawZero’s project of building intelligent systems without the capacity for goals [12]; such systems might in turn serve as guardrails [13]. In particular, contextualization decomposes a corpus of text into statements paired with their attribution sources [14], and we are interested in testing whether this helps mitigate preference biases in the presence of unreliable agent-generated data.
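A minimal sketch of the decomposition step, assuming naive regex sentence splitting and a hypothetical `ContextualizedStatement` record; it is not LawZero's pipeline or the method of [14]. It shows how two sources that contradict each other become a pair of attributed reports, each of which can be checked as a claim about what its source said.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class ContextualizedStatement:
    """A claim paired with the source it is attributed to (hypothetical schema)."""
    source: str
    claim: str

    def as_truth_apt(self) -> str:
        # Whether or not the claim itself is true, "S asserts C" is true or false
        # as a report of what the source said.
        return f"{self.source} asserts that: {self.claim}"

def contextualize(document: str, source: str) -> list[ContextualizedStatement]:
    """Split a document into sentences and attribute each one to its source.

    Naive regex sentence splitting is an assumption made for illustration only.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [ContextualizedStatement(source, s) for s in sentences if s]

# Hypothetical corpus mixing agent-generated and human-written text.
corpus = {
    "agent_A": "The deployment passed all tests. Users love the new feature.",
    "forum_post_17": "The feature is broken.",
}
statements = [st for src, doc in corpus.items() for st in contextualize(doc, src)]
for st in statements:
    print(st.as_truth_apt())
```

A model trained on the attributed statements rather than the raw text is never asked to treat "Users love the new feature" as a fact; it only learns that `agent_A` said so, which is the kind of separation that could matter when agent-generated data is unreliable.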
(3) Uncertainty estimation for partially trained models. We are studying ensembles, epistemic neural networks, calibration, conformal prediction, and other methods in synthetic environments. We are especially interested in projects that use amortized inference methods (such as GFlowNets [15, 16, 17]) to approximate posteriors over latent variables, such as (i) the sources or (ii) the predictors behind an autoregressive model, so that predictive uncertainty can be estimated from learned distributions rather than from point estimates.
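The contrast between a posterior over latent sources and a single point estimate can be seen in a toy case. This sketch assumes three hypothetical sources, each a fixed next-token distribution (an i.i.d. simplification; a real autoregressive source would condition on the prefix), and computes the posterior by exact enumeration. It is not a GFlowNet: amortized methods would learn to sample from such a posterior when enumerating sources or predictors is infeasible.

```python
import numpy as np

# Hypothetical latent sources: each row is a next-token distribution over a
# 3-token vocabulary (i.i.d. simplification of an autoregressive generator).
SOURCES = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.4, 0.4, 0.2],
])
PRIOR = np.full(len(SOURCES), 1 / len(SOURCES))

def posterior_over_sources(tokens, sources=SOURCES, prior=PRIOR):
    """Exact Bayes: p(source | tokens) is proportional to p(source) * prod_t p(token_t | source)."""
    log_post = np.log(prior) + np.log(sources[:, tokens]).sum(axis=1)
    log_post -= log_post.max()  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def predictive(tokens):
    """Posterior predictive: mix the sources' next-token distributions by posterior weight."""
    return posterior_over_sources(tokens) @ SOURCES

def point_estimate_predictive(tokens):
    """Plug-in prediction from the single most probable (MAP) source."""
    return SOURCES[np.argmax(posterior_over_sources(tokens))]

def entropy(p):
    """Shannon entropy in nats (all probabilities here are strictly positive)."""
    return -(p * np.log(p)).sum()

observed = np.array([0])  # one observed token: weak evidence about the source
mix = predictive(observed)
plug_in = point_estimate_predictive(observed)
print(mix, entropy(mix), plug_in, entropy(plug_in))
```

With a single observed token 0, source 0 is the most probable but far from certain, so the mixture puts more mass on tokens 1 and 2 than the plug-in prediction does. That extra spread is the uncertainty about the latent source that a point estimate discards.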
Essential knowledge:
Essential experience:
Desired experience:
Bonus: