I have two broad areas of interest.
Security:
I am interested in building demonstrations for hacking real-world AI deployments to show that they are not secure. The goal is to force companies to invest in alignment techniques that can solve the underlying security issues.
Benchmarks:
I am interested in building benchmarks to determine how generalizable modern LLM techniques actually are, now that we are no longer in the pre-training scaling era.
For security:
You will focus on hacking real-world AI deployments to show that they are not secure.
For benchmarks:
You will develop private benchmarks to probe the generalization properties of reinforcement learning. The goal is to build benchmarks that sit in labs' blind spots, to test whether capabilities must be directly added or whether they can emerge on their own in the world of RL.
Alternatively, please reach out directly to me if you have several years of cybersecurity experience.
Daniel is a professor of computer science at UIUC, where he studies the progress of AI, with a particular focus on dangerous capabilities of AI agents. His work includes:
- CVE-Bench, an award-winning benchmark (SafeBench award, ICML spotlight) used by frontier labs and governments to measure AI agents' ability to find and exploit real-world vulnerabilities.
- Agent Benchmark Checklist, award-winning work (Berkeley AI Summit, 1st place in the Benchmarks & Evaluations track) that highlights major issues in existing benchmarks.
- InjecAgent, one of the first AI agent safety benchmarks, used by governments and major labs.
I will meet 1-1 or as a group, depending on how scholars' interests relate to the projects. Outside of 1-1s, we will communicate over Slack.
I strongly prefer multiple short meetings over single long meetings, except at the start.
I'll help with research obstacles, including outside of meetings.
For security:
You should have a strong security mindset and a demonstrated willingness to be creative. I would like to see past evidence that you are willing to get your hands dirty and try many different systems.
For benchmarks:
As creative as possible, with a willingness to work on the nitty-gritty and to work really hard on problems other people find boring. Interests as far away from SF-related interests as possible.
Scholars will probably work with collaborators from the stream.
Mentor(s) will talk through project ideas with scholars.
The MATS Research phase provides scholars with a community of peers.
During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.
Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.
Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing Chat and exchanging hundreds of anonymous compliment notes. Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.