Krishnamurthy Dvijotham (Dj)

This stream will pursue research on securing and hardening AI systems through rigorous testing, provable defenses, and formal specification, including improving benchmarks for agentic security, scaling mathematically-grounded robustness techniques like randomized smoothing and Lipschitz-constrained training, and developing formal methods for specifying safe agent behaviors.

Stream overview

  1. Grounded security testing for AI agents in realistic environments, and fixing bugs in existing benchmarks - Benchmarks for agentic security testing suffer from several shortcomings, including implementation bugs in the benchmark, lack of realism in tasks/attacks/threat models, . A great project could be an effort to fix some of these.
  2. Scaling provable defenses against adversarial attacks: There has been continuous progress towards provable defenses against adversarial attacks, the most recent being from approaches that either use randomized smoothing to improve robustness post-hoc, or bake in robustness by training models with Lipschitz layers or other mathematical controls. Combining and scaling these could be a great direction to move forward on this
  3. The science of specification - While much work on alignment focuses on evaluating and improving alignment, there has been far less work on the science of specifying behaviors we would like agents to align to. There is rich literature on formal specification of unsafe behaviors in robotics and computer systems literature, and adapting those to the new generation of AI agents, understanding the limits of what can and cannot be formally specified, and scaling autoformalization of these seems like a worthwhile effort.

Mentors

Krishnamurthy Dvijotham (Dj)
Google DeepMind
,
Senior Staff Research Scientist
SF Bay Area
Dangerous Capability Evals
Adversarial Robustness
Security
Red-Teaming
Scalable Oversight

Krishnamurthy (Dj) Dvijotham is a senior staff research scientist at Google DeepMind.where he leads efforts on the development of secure and trustworthy AI agents. He previously founded the AI security research team at ServiceNow Research and co-founded the robust and verified AI team at DeepMind. His past research has received best paper awards at many leading AI conferences, including most recently at ICML and CVPR 2024. His research led to the framework used for AI security testing at ServiceNow and has been deployed in several Google products, including the Android Play Store, YouTube and Gemini.

Read more

Mentorship style

Representative papers

Scholars we are looking for

Programming experience, some experience with using AI based systems and mathematical maturity would be great for all the projects. 

Beyond that, if someone has prior experience with building AI benchmarks, red teaming, formal methods etc. that would be great too.

Project selection