MATS Alumni

Alumni Spotlight

Select Winter 2023-2024 Alumni

  • Aaquib Syed

    Aaquib joined the MATS 5.0 Winter 2024 cohort under the mentorship of Neel Nanda, where he worked on understanding refusal behaviors and jailbreaks in LLMs. He is continuing his work with Neel Nanda to make models more robust to jailbreaks. Before MATS, he was an ARENA 2.0 participant and worked on edge attribution patching, an efficient automated circuit detection algorithm.

  • Aengus Lynch

    Aengus worked in Stephen Casper's stream and collaborated with Philip Guo and Aidan Ewart to publish Eight Methods to Evaluate Robust Unlearning in LLMs. The team, alongside Abhay Sheshadri and Cindy Wu, then began work on Latent Adversarial Training, which continues in the MATS 5.0 extension in London.

  • Alessandro Stolfo

    Alessandro is a doctoral student in the Institute for Machine Learning at ETH Zürich, working at the intersection of LLM reasoning and interpretability. During MATS, Alessandro worked with Neel Nanda on mechanistically interpreting uncertainty mechanisms in language models. The results of the project are currently being turned into a paper. After MATS, Alessandro is going back to complete his PhD and will spend a period interning at the Frontier AI Group at Microsoft Research, working on interpretability.

  • Ali Cataltepe

    Ali was a MATS 5.0 scholar in Vanessa Kosoy’s Agent Foundations stream. Her work during the program focused on formalizing and defining complexity constraints, as well as obtaining runtime and description-complexity guarantees, for “String Machines,” a mathematical framework for describing automata used in theoretical RL hypothesis spaces. She is starting her Ph.D. in Mathematics at Northeastern University in the fall of 2024, having completed her B.S. in the same subject before MATS 5.0.

  • Andis Draguns

    Andis, mentored by Jeffrey Ladish, designed transformer model backdoor circuits with behaviours that are computationally hard to elicit and investigated autonomous LLM-based agents performing tasks in the real world. He collaborated with Andrew Gritsevskiy and Sumeet Motwani on these projects during the MATS Winter 2024 cohort. Previously focused on ML research and cybersecurity, Andis continues his work in the MATS extension phase and other alignment research directions.

  • Andrew Gritsevskiy

    Andrew participated in the MATS Winter 2024 cohort under the mentorship of Jeffrey Ladish, where he developed synthetic model organisms with trojans of varying undetectability, tools for exploring the capabilities of agentic models to hack and take real-world actions, and realistic training setups that lead to misalignment. Previously, he worked as a research director at Cavendish Labs, and he is currently starting a PhD in machine learning at UW-Madison.

  • Arjun Pitchanathan

    Arjun worked with Vanessa Kosoy on the learning-theoretic agenda for AI alignment. Besides MATS, he is a PhD student in Computer Science working on compilers and discrete optimization.

  • Artyom Karpov

    During the MATS Winter 2023-2024 cohort, Artyom worked with Jeffrey Ladish on the Understanding AI Hacking track, investigating steganography in LLMs under RL(HF). Before that, he coauthored a project on inducing human-like biases in moral reasoning LMs (neuroconnectionism), finished the ARENA and MLSS programs, and worked as a software engineer on high-load projects for several years.

  • Ben Wu

    As part of MATS, Ben conducted Mechanistic Interpretability research under the mentorship of Neel Nanda. His work focused on investigating how language models express confidence and uncertainty through their internal mechanisms. The findings from this research will be published in an upcoming paper on Entropy Neurons, co-authored with Alessandro Stolfo and Wes Gurnee. Currently, Ben is continuing his PhD in NLP at the University of Sheffield while also participating in the MATS extension phase.

  • Benjamin Wright

    Ben, under the mentorship of Lee Sharkey, worked on improving sparse autoencoders (SAEs) as an algorithm while at MATS 5.0. He was previously a student at MIT.

  • Chandler Smith

    Chandler took part in the Winter 2024 cohort under the mentorship of Jesse Clifton. During the program, he presented "Cooperative AI: Exploitability <> Punitiveness Benchmark," a methodology for benchmarking cooperative properties and dispositions in multi-agent systems. This research is a continuation of his recent work, "Escalation Risks from Language Models in Military and Diplomatic Decision-Making," which was covered in New Scientist and Vice. In addition to AI Safety research, Chandler is a Technical Architect Fellow at IQT, where he advises the US National Security and Intelligence Community.

  • Connor Kissane

    Connor participated in Neel Nanda's mechanistic interpretability cohort, where he focused on applying Sparse Autoencoders to understand attention layer outputs. You can find his research write-ups on the Alignment Forum. He is currently continuing his research during the MATS extension phase.

  • Egg Syntax

    Egg Syntax is an independent AI safety researcher with a current focus on deception and manipulation, and came to AI safety from a career as a software engineer and work in climate science. As part of MATS 5.0 and 5.1, under the mentorship of Jessica Rumbelow, Egg investigated what LLMs can infer about their users.

  • Evžen Wybitul

    Evžen took part in MATS 5.0, working under David Lindner on scalable oversight methods for RL based on process supervision. After MATS he went back to finish his Master's in Data Science at ETH Zurich, where he plans to work on a thesis focusing on mechanistic interpretability. There he hopes to apply some of the ideas he picked up during his undergrad in bioinformatics.

  • Felix Hofstätter

    During the 2023-2024 Winter Cohort, Felix worked under the mentorship of Francis Rhys Ward. He worked on evaluating the capabilities of models to intentionally fail evaluations ("sandbag") in order to appear safe. He also worked on an interpretability project investigating how prompting models to act as characters ("personas") affects behavior. Before MATS, Felix completed an MEng in Mathematics and Computer Science at Imperial College London, worked as a software engineer, and participated in ARENA.

  • Garrett Baker

    During the MATS Winter 2023-2024 cohort, Garrett worked on developmental interpretability of a maze-solving deep RL policy under the mentorship of Jesse Hoogland and Daniel Murfet. After MATS, he is continuing to study the development of RL systems in close collaboration with Timaeus. Prior to the MATS Winter cohort, he was working independently on AI alignment, having gotten into the field after attending the MATS Summer 2022 cohort. He has thought quite a lot about alignment and has a background in applied math.

  • Iván Arcuschin Moreno

    During the MATS Winter 2023-2024 program, Iván worked on building a benchmark of synthetic but realistic transformers with known circuits for evaluating mech interp techniques, mentored by Adrià Garriga-Alonso. He now continues this work. Prior to MATS, he obtained a Computer Science PhD from the University of Buenos Aires, focused on automatic test generation.

  • Jordan Taylor

    During MATS 5.0, Jordan worked with Lee Sharkey and Dan Braun on methods for extracting more functionally relevant interpretable features from model internals, using sparse autoencoders. He is now extending this research to finding sparse interpretable connections between features ("transcoders"), and finishing up his PhD in theoretical quantum physics and tensor-network algorithms.

  • Joseph Bloom

    Joseph Bloom participated in MATS Winter 2023-2024 under the mentorship of Neel Nanda. Before MATS, he was an independent alignment researcher focusing on Mechanistic Interpretability and Reinforcement Learning. During MATS, he published a set of residual stream Sparse Autoencoders for GPT-2 and accompanying statistical analyses. Joseph will continue researching sparse autoencoders (SAEs) while also supporting development of the Neuronpedia platform for Sparse Autoencoder research.

  • Justice Sefas

    Justice is a PhD student in computer science at the University of British Columbia who works on rare-event estimation using generative modeling and reinforcement learning. He took a term off to participate in the winter 2023-2024 MATS cohort to do research in provable AI safety with Davidad. In particular, he used control barrier functions to guarantee obstacle avoidance of an RL agent. Justice plans to extend these techniques to achieve probabilistic safety in critical systems such as electrical grids and nuclear power plants.

  • Lucy Farnick

    Lucy worked on mechanistic interpretability research with Neel Nanda and Arthur Conmy, focusing on SAE-based approaches to circuit-style analysis. Before MATS, she did ARENA and AI Safety Camp, as well as independent research funded by Lightspeed grants while cofounding a small student research org (BASC). She is continuing her research in the MATS extension phase as well as her ML PhD.

  • Mikhail Selezynov

    While at MATS, Mikhail tried to leverage vision-language models (VLMs) for process supervision. He hypothesized that capable VLMs can help build aligned systems by providing rich, detailed, low-level feedback. This turned out to be challenging to show, so he scaled down to building a benchmark of the capabilities a VLM needs in order to generate good feedback. He plans to finish the VLM benchmark during the MATS extension phase and then take part in a CHAI internship with Scott Emmons, working on adversarial attacks and safety evaluation benchmarks. At university he studied programming, math, probability theory, and ML/DL.

  • Niels uit de Bos

    During the MATS 2023-2024 winter cohort, Niels worked under the mentorship of Adrià Garriga-Alonso on adversarially evaluating circuit hypotheses to determine to what extent they retain their explanatory power in edge-cases and off-distribution. He holds a Ph.D. in algebraic geometry and had been working as a software engineer prior to joining MATS. He intends to continue doing alignment research.

  • Paul Riechers

    Paul earned a PhD in theoretical physics and MS in electrical and computer engineering from UC Davis, and subsequently co-founded the Beyond Institute for Theoretical Science (BITS). MATS enabled Paul's exploration of a new interpretability agenda that leveraged his expertise in the ultimate limits of learning and prediction. By the end of MATS, Paul and collaborator Adam Shai had anticipated and found fractals in the residual stream of transformers. Paul and Adam have since co-founded Simplex, an AI safety organization that continues to build the scientific understanding of AI cognition, to enable benchmarks and interventions for future AI.

  • Philippe Chlenski

    Philippe was part of the Winter 2023-2024 MATS cohort working with Neel Nanda on mechanistic interpretability. His work focused on reverse-engineering features in sparse autoencoders and generalizing this to transcoders (AKA input-output encoders). He is returning to complete his PhD in computer science at Columbia University, focusing on representation learning in computational biology.

  • Rick David Goldstein

    Rick participated in the MATS Winter 2023-2024 cohort under the mentorship of Erik Jenner. Rick applied mechanistic interpretability techniques to understand a 2-layer maze-solving network. As of April 2024, Rick is continuing this research with Erik and independently working on several other mech interp projects.

  • Rohan Gupta

    Rohan worked on benchmarking mech-interp methods/circuit discovery, under Adrià Garriga-Alonso during MATS Winter 2023-2024. He plans on continuing independent alignment research. He co-founded Axiom Futures in order to promote AI safety in India. Rohan graduated from IIT Bombay in 2022 and was a software engineer at Adobe until 2023.

  • Roman Soletskyi

    Roman participated in the MATS Winter 2023-2024 Cohort under the mentorship of Davidad, working on provable AI safety and verification of neural networks. He has returned to Paris to finish his Master's in Physics at ENS-PSL, and will continue his research in the MATS extension program.

  • Sumeet Motwani

    Sumeet is currently an undergrad studying CS at UC Berkeley and was part of Jeffrey Ladish's MATS Stream. He worked on agentic systems (web-browsing language agents) that hire humans in order to complete end-to-end tasks autonomously. After MATS, he will be attending graduate school to pursue more research on agentic systems and alignment.

  • Teun van der Weij

    Teun participated in MATS 5.0, and is currently doing MATS 5.1 in London, both under the mentorship of Francis Rhys Ward. Before MATS, he worked on projects on activation addition and shutdown avoidance evaluations. During MATS, he worked on intentional underperformance on evaluations (sandbagging) and on understanding personas in language models. In addition to technical research, Teun also co-founded the European Network for AI Safety.

  • Thomas Kwa

    Thomas worked on a project with Adrià Garriga-Alonso during the Winter 2023-2024 cohort involving building a benchmark for interpretability to enable further development of circuit discovery methods. Thomas is continuing this research with the MATS extension program and is pursuing safety roles in industry. Before MATS, Thomas was a researcher for MIRI.

  • Yoav Tzfati

    During the MATS Winter 2023-2024 cohort, Yoav started work on implementing meta-level adversarial evaluations of debate under the mentorship of Julian Michael and David Rein of the NYU Alignment Research Group. Prior to MATS, Yoav worked as a software/algorithms engineer at CyCognito, building an automated attack surface discovery system.

  • Yujin Potter

    Yujin took part in the MATS Winter 2023-2024 Cohort under Jesse Clifton. She is interested in a wide range of topics within AI alignment/ethics, including examining the societal impacts of AI, identifying AI misalignment, studying AI behaviors in multi-agent settings, and addressing AI bias. She is currently a postdoc at UC Berkeley. Previously, she researched decentralization technologies including DeFi, blockchain, and DAOs.