In this project, we will explore GPU side-channel attacks that extract information about model usage. A simple example is to observe (via radio emissions, power fluctuations, acoustics, etc.) which experts were used in each forward pass of a mixture-of-experts (MoE) model, then use those observations to guess which tokens were produced.
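To make the MoE example concrete, here is a minimal toy sketch of the inference step. It assumes, hypothetically, that the attacker knows the model weights and that the side channel reveals exactly which top-k experts fired in the first MoE layer for each token. In that layer the router sees only the token embedding, so the expert set depends on the token alone. The embeddings, router weights, dimensions, and function names are all made up for illustration; this is not the project's actual method, and real routing in deeper layers also depends on context.

```python
import numpy as np

# Toy, hypothetical dimensions; real MoE models are far larger.
VOCAB, D_MODEL, N_EXPERTS, TOP_K = 50, 16, 8, 2

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((VOCAB, D_MODEL))    # hypothetical token embeddings
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))  # hypothetical first-layer router weights


def experts_for(token_id: int) -> frozenset:
    """Top-k experts the first MoE layer routes this token to (no context dependence)."""
    logits = embeddings[token_id] @ router_w
    return frozenset(np.argsort(logits)[-TOP_K:].tolist())


# Offline profiling: the attacker (assumed to know the weights) maps every
# observable expert set back to the tokens that produce it.
lookup: dict[frozenset, list[int]] = {}
for tok in range(VOCAB):
    lookup.setdefault(experts_for(tok), []).append(tok)


def guess_tokens(observed_sets):
    """For each observed expert set, return the candidate tokens consistent with it."""
    return [lookup.get(s, []) for s in observed_sets]


secret = [3, 17, 42, 8]                      # tokens the victim actually processed
observed = [experts_for(t) for t in secret]  # what the side channel is assumed to reveal
candidates = guess_tokens(observed)

for tok, cands in zip(secret, candidates):
    print(f"true token {tok:2d} -> {len(cands)} candidate(s): {cands}")
```

With 8 experts and top-2 routing there are at most 28 distinguishable expert sets, so a single observation narrows 50 tokens to a small candidate set rather than a unique answer. Observing more layers, or combining observations with a language-model prior over likely next tokens, would be natural ways to narrow it further.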
As a team, we will decide which projects to pursue based on individual interest and skills. Broadly, we want to demonstrate information leaving a GPU in unexpected or surprising ways, especially to steal prompt tokens, response tokens, or model weights. Additionally, we are interested in training a model to induce leakage, thus turning the side channel into a covert channel. We are interested in both standard and hardened tech stacks, and scholars more interested in defense will be free to research hardware and software countermeasures.
The experimental setup is a frontier data center GPU with various sensors attached, including an oscilloscope on the power supply and an electromagnetic probe near the GPU die. These and other sensors can be read from the same Jupyter notebook that runs inference and training on the GPU, making it easy to correlate sensor readings with code execution, performance counters, and the GPU's built-in sensors (such as those reported by nvidia-smi).
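One common way to line up a sensor recording with a log of what the code was doing is to find the time offset that maximizes their cross-correlation. The sketch below is illustrative only: it uses synthetic data, assumes both signals are already sampled at the same rate on a shared time base, and the names (`activity`, `sensor`, `estimate_lag`) and numbers are hypothetical rather than part of the actual testbench.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 2000        # samples in the capture window (assumed)
TRUE_LAG = 137  # hypothetical delay of the sensor relative to the software log, in samples

# Hypothetical activity marker logged from the notebook: 1 while a kernel runs,
# 0 while idle, in random on/off blocks of 20 samples.
activity = np.repeat(rng.integers(0, 2, N // 20), 20).astype(float)

# Simulated sensor trace: the same activity, delayed, scaled, and buried in Gaussian noise.
sensor = np.zeros(N)
sensor[TRUE_LAG:] = activity[: N - TRUE_LAG]
sensor = 5.0 * sensor + rng.normal(0.0, 1.0, N)


def estimate_lag(reference: np.ndarray, trace: np.ndarray, max_lag: int) -> int:
    """Return the non-negative lag (in samples) that maximizes the cross-correlation."""
    ref = reference - reference.mean()
    tr = trace - trace.mean()
    # Score each candidate lag by the dot product of the overlapping windows.
    scores = [np.dot(ref[: len(ref) - lag], tr[lag:]) for lag in range(max_lag + 1)]
    return int(np.argmax(scores))


lag = estimate_lag(activity, sensor, max_lag=400)
print(f"estimated lag: {lag} samples (true: {TRUE_LAG})")
```

Real captures would likely need resampling to a common rate and some filtering before this step; the block structure of the activity marker is what gives the correlation a sharp peak at the true offset.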
Gabriel is a fellow at RAND working out how to secure the most sensitive AI data centers against the most sophisticated current and future threats. He is starting new hands-on work to build and test prototypes of secure compute infrastructure. Gabriel has also worked on hardware-enabled governance mechanisms (HEMs, at the intersection of GPU export control and hardware security) and on technical verification of agreements on the development and use of AI systems. He holds a master's degree in computer science and is pursuing a PhD in AI.
My two scholars will work together and with non-scholars on the team, including with direct hires and mentees from other programs. I'm not positive what this cast of characters will look like when MATS begins.
This project is supported via a new spin-out nonprofit that works closely with RAND on more physical projects for which RAND procurement is not a good fit. From the scholars' perspective, I don't expect this detail to matter. I am retaining my RAND affiliation, so the mentor profile is still accurate, but I may be in touch later asking to add a new affiliation.
Co-working 2-4 hours per week, including detailed guidance; the schedule is flexible. 1-hour check-ins each week. You can schedule ad-hoc calls if you're stuck or want to brainstorm.
Please note: experience with hardware is not a requirement for this stream, as long as you are willing to work hard and learn fast, and can show other evidence of exceptional ability. If in doubt: we encourage you to apply!
We will provide you with a lot of autonomy and plug-and-play access to a rare combination of tools and equipment. In exchange, we expect you to have strong self-direction, intellectual ambition, and a lot of curiosity. This stream requires a tight experiment loop to form and test hypotheses on the fly.
Example skill profiles:
Must have: Trained or fine-tuned a transformer language model in PyTorch (toy models, or models trained by following a tutorial, are fine). Familiar with basic electronics concepts (voltage, current, transistors). Has experience writing research papers, even as a class assignment.
Nice to have: Familiarity with LaTeX, PyTorch internals, CUDA/OpenCL, GPU architecture, chip design, oscilloscopes, signal processing, electrical engineering.
You will collaborate with other scholars and researchers pursuing similar topics. You may find other collaborators, but they will need to be individually approved for access to the remote research testbenches.
There is a cluster of potential projects to choose from. As a team, we will decide which to pursue based on individual interest and skills. Mentors will pitch example projects and scholars can then modify and re-pitch them. Once the research problem, hypothesis, and testing plan are written and agreed on, scholars begin object-level work. We encourage failing fast and jumping to a fallback project.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.