Emerging Challenges in Securing Frontier AI Systems – On Analytics

This article is based on a presentation at the Fall 2023 Institute of Business Analytics Conference on Generative AI and Cybersecurity: Navigating the New Era of Threats and Safeguards, by Mikel Rodriguez, leader of the AI Red Team at Google DeepMind.

For the past 20 years, if someone wanted to interact with machine learning (ML) systems, they would have to go to a state-of-the-art lab. Today, anyone who is interested in ML can pull out their phone and easily interact with these interfaces.

“The world is accelerating so quickly,” says Mikel Rodriguez, who leads AI assurance at Google DeepMind. “There isn’t a week that goes by without some interesting new development in artificial intelligence capabilities.”

Because AI is becoming so democratized, more attention is going to making these systems secure and safe as part of the public discourse. High profile leaders are trying to address the same issues that developers are tackling from a technical perspective.

“For many years, this work was more of an academic exercise,” says Rodriguez. “There was research coming out and papers being published, but we weren’t sure how this was going to play out in reality.”

For example, just a few years ago, MITRE ATLAS, a knowledge base of adversarial tactics and techniques, began to capture how security risk played out in the real world. The methods are unfolding faster and in even more surprising ways today, says Rodriguez.

Building a Career in Artificial Intelligence

Rodriguez has spent more than two decades in the public and private sectors to engineer—and then secure—applications of AI in high-stakes environments.

Rodriguez worked on one of the earliest attempts to build a self-driving car using computer vision systems. And during time with MITRE, a nonprofit committed to the public interest, he focused on applications of AI like health, transportation, and national security. These high-stakes applications made him realize that the applications need to work, but they also need to be safe and secure.

“When someone asks, should we deploy these systems or not? That’s when it becomes real, says Rodriguez. “You really need to understand the work, the research that we’re doing and within the broader context of mission impact or business impact.”

One of the first security problems that Rodriguez got to work on was a voice identification system, a biometric verification that uses a user’s voice as their password. There was a string of suspicious authentications, and a government body wanted a group to investigate what was going on. After looking at a variety of suspicious authentications, they realized the attack was a clever approach using a buffer overflow and OpenCV library which is a free, open-source library of programming functions.

“That got me thinking, how do we red team AI systems, not just algorithms?” Rodriguez said. “I learned that it’s not enough to just come up with a bunch of cool attacks. It’s important to be able to tell the story of the relevance. We’re not just building and using these systems for things like Snapchat filters. They’re getting connected to very high stakes environments.”

DeepMind’s Work to Build Safe Artificial Intelligence Systems

DeepMind focuses on foundational research and developing foundational models, then partners with product areas that are a part of Google to build more responsible AI. Rodriguez leads the red team that focuses on models and collaborates with another group that concentrates on system red teaming.

“At DeepMind, we focus a great deal on foundational models,” says Rodriguez. “When you train these systems with the scale of the data that we have access to, it really is astonishing and it’s really cool to see what’s possible.”

Some of the training includes giving models the ability to access third party tools, multimodal inputs, and deeper reasoning with varying conditions and memory rates.

“Right now, we’re used to models that have very fairly episodic, very short term memory,” Rodriguez says. “But what happens when these interactions go over the course of months? It opens up privacy and security issues that need to be addressed.”

Challenges DeepMind is Working on Today

There are vulnerabilities inherent to technology at the hardware level, in the data layer, and at the inference level, Rodriguez says, and we’re beginning to see new risks emerging with third-party tool integration. One challenge DeepMind is beginning to address is that inputs can come not only from user interaction, but from anywhere across the web.

The system can be prompted through personal engagement with the model and also from data sources on the web. That indirect interaction with the model creates all kinds of changes in the attack surface, and once you unpack that, Rodriguez says, you realize it’s a different game.

“We’re going from a one vs. one game to a free for all,” Rodriguez says.

The other challenge that occupies Rodriguez’s days is tool use. While a program is running, a model will annotate an output and say, “I think I can answer this question better.” When it works, the model is very cool, but there are many different things that can go wrong, says Rodriguez.

“The model could reveal information that it shouldn’t, or it could invoke a tool maliciously,” Rodriguez says. “You could teach it to learn to associate things in a negative way. This is a whole new set of tasks that we need to start thinking about securing.”

Rodriguez offered an example: users could upload an image into a model, then have that image invoke a calendar that reveals when you’re free. Doing this academically is one thing, he said, but actually deploying a real-world attack is another system in the sense that these aren’t just like generic ML models.

Optimism about the Future of AI

Rodriguez is excited about events like the Kelley Fall 2023 Institute of Business Analytics (IBA) Conference on Generative AI and Cybersecurity, and community resources like the MITRE ATT&CK database that will be helpful for researchers.

He’s also encouraged by new ideas, especially the pattern that teams—including DeepMind—are creating to perform auto red teaming.

“I’m really optimistic about how many people are getting involved,” says Rodriguez. “This isn’t a super niche topic anymore. All kinds of people are participating, which is exciting because it’s bringing people from different backgrounds. It’s an interesting intersection of different disciplines, and together, we can address some interesting gaps in these areas.”

Leave a Reply Cancel reply

Social media