
Anthropic
Nina participated in the MATS Summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper "Steering Llama 2 via Contrastive Activation Addition," which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist and has mentored a number of SPAR and MATS cohorts working on LLM alignment projects.
The Summer 2023 cohort supported 60 scholars with 15 mentors, working across 12 different research areas. The program consisted of a remote 4-week training phase, an 8-week research phase in Berkeley, and a 4-month extension phase. MATS leadership co-founded the London Initiative for Safe AI (LISA) in September 2023 to provide a dedicated research space in London for AI safety researchers and organizations, and to give MATS scholars a place to continue their research projects. Research projects were distributed across multiple areas, with approximately one-third focused on evaluations and capability demonstrations and one-fifth on mechanistic interpretability, alongside work on agent foundations, activation engineering, and cooperative AI.
Participating in MATS was a great way to rapidly upskill in AI safety research, learn about the field, and meet other researchers/collaborators. The environment/office space was also very thoughtfully designed to enable productivity.
Nina Panickssery