AI control-focussed stream, probably running in person in London.
I currently work on AI control and I'm excited about e.g. prototyping and evaluating mitigations for specific threat models, doing science that sheds light on important considerations related to AI control, or building evals for control-related capabilities.
It's hard for me to give specific project descriptions so far in advance, but here's a selection of projects I've been excited about in the past (some of which are now papers!). You can expect the actual projects to be of similar flavour but not exactly the same:
I'm pretty hands-off. I expect scholars to fully take charge of the project, and update / consult me as needed. I do want my scholars to succeed, and am happy to advise on project direction, experiment design, interpreting results, decision-making / breaking ties, or getting unstuck.
During the program, we'll meet once a week to go through any updates / results, and your plans for the next week. I'm also happy to comment on docs, respond on Slack, or have additional ad hoc meetings when useful.
I prefer scholars to work in pairs or groups of three within the stream, but I'm happy to take on external collaborators as long as they are committed full-time to the project.
I'll propose ~5 projects for scholars to red-team and flesh out, after which each scholar decides on one to own. I'm also open to scholar-proposed projects if they sound promising; I'd just be less useful as an advisor.