Krishnamurthy Dvijotham (Dj)

This stream will pursue research on securing and hardening AI systems through rigorous testing, provable defenses, and formal specification, including improving benchmarks for agentic security, scaling mathematically-grounded robustness techniques like randomized smoothing and Lipschitz-constrained training, and developing formal methods for specifying safe agent behaviors.

Stream overview

  1. Grounded security testing for AI agents in realistic environments, and fixing bugs in existing benchmarks - Benchmarks for agentic security testing suffer from several shortcomings, including implementation bugs in the benchmark, lack of realism in tasks/attacks/threat models, . A great project could be an effort to fix some of these.
  2. Scaling provable defenses against adversarial attacks: There has been continuous progress towards provable defenses against adversarial attacks, the most recent being from approaches that either use randomized smoothing to improve robustness post-hoc, or bake in robustness by training models with Lipschitz layers or other mathematical controls. Combining and scaling these could be a great direction to move forward on this
  3. The science of specification - While much work on alignment focuses on evaluating and improving alignment, there has been far less work on the science of specifying behaviors we would like agents to align to. There is rich literature on formal specification of unsafe behaviors in robotics and computer systems literature, and adapting those to the new generation of AI agents, understanding the limits of what can and cannot be formally specified, and scaling autoformalization of these seems like a worthwhile effort.

Mentors

Krishnamurthy Dvijotham (Dj)
Google DeepMind
,
Senior Staff Research Scientist
SF Bay Area
Dangerous Capability Evals, Adversarial Robustness, Security, Red-Teaming, Scalable Oversight

Krishnamurthy (Dj) Dvijotham is a senior staff research scientist at Google DeepMind.where he leads efforts on the development of secure and trustworthy AI agents. He previously founded the AI security research team at ServiceNow Research and co-founded the robust and verified AI team at DeepMind. His past research has received best paper awards at many leading AI conferences, including most recently at ICML and CVPR 2024. His research led to the framework used for AI security testing at ServiceNow and has been deployed in several Google products, including the Android Play Store, YouTube and Gemini.

Mentorship style

Representative papers

Scholars we are looking for

Programming experience, some experience with using AI based systems and mathematical maturity would be great for all the projects. 

Beyond that, if someone has prior experience with building AI benchmarks, red teaming, formal methods etc. that would be great too.

Project selection

Community at MATS

MATS Research phase provides scholars with a community of peers.

During the Research phase, scholars work out of a shared office, have shared housing, and are supported by a full-time Community Manager.

Working in a community of independent researchers gives scholars easy access to future collaborators, a deeper understanding of other alignment agendas, and a social network in the alignment community.

Previous MATS cohorts included regular lightning talks, scholar-led study groups on mechanistic interpretability and linear algebra, and hackathons. Other impromptu office events included group-jailbreaking Bing chat and exchanging hundreds of anonymous compliment notes.  Scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups.