This article is based on a presentation by Christina Liaghati, AI Strategy Execution & Operations Manager at MITRE, at the Fall 2023 Institute of Business Analytics Conference on Generative AI and Cybersecurity: Navigating the New Era of Threats and Safeguards.
Over a two-and-a-half-year period, an unsophisticated pair of hackers stole $77 million from the Chinese government by defeating a machine learning-enabled facial recognition system that controlled access to the Shanghai Tax Authority.
The attackers created a shell company, which they used to acquire black-market cell phones, headshots of Chinese citizens, and a long list of personally identifiable information. They manipulated the headshots and paired them with the stolen personal information to register accounts that slipped past the ML-enabled facial recognition system used to verify identity.
With privileged access to the Shanghai Tax Authority's verified-account environment, they repeatedly submitted fraudulent invoices that ultimately totaled $77 million.
MITRE ATLAS’s Role in Protecting AI Systems
Artificial intelligence is an emerging technology, and we've seen time and again that AI systems can fail. MITRE, a not-for-profit organization committed to raising awareness of the unique and evolving vulnerabilities of AI, is encouraging organizations to dip their toes into red teaming: a set of techniques used to emulate the attacks that could be mounted against their own systems.
MITRE has developed a knowledge base of adversarial techniques and case studies for AI systems called MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). It’s a comprehensive resource for anyone interested in understanding and safeguarding AI systems, says Christina Liaghati, AI strategy execution and operations manager for MITRE. Organizations can use the tools in the knowledge base to find and mitigate risks in their own AI models.
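For teams that want to explore the knowledge base programmatically rather than through the website, a minimal sketch along the following lines can enumerate the ATLAS techniques. It assumes you have downloaded the project's ATLAS.yaml data file (published in the mitre-atlas/atlas-data repository on GitHub) to a local path, and that technique entries carry an "id" beginning with "AML.T" along with a "name"; the exact file layout may differ in the version you fetch.

```python
# A minimal sketch of listing ATLAS techniques from the project's YAML data.
# Assumptions: ATLAS.yaml has been downloaded locally, and technique entries
# are mappings whose "id" starts with "AML.T" and that also carry a "name".
import yaml  # pip install pyyaml


def iter_techniques(node):
    """Recursively walk the parsed YAML and yield (id, name) for technique-like entries."""
    if isinstance(node, dict):
        if str(node.get("id", "")).startswith("AML.T") and "name" in node:
            yield node["id"], node["name"]
        for value in node.values():
            yield from iter_techniques(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_techniques(item)


with open("ATLAS.yaml", encoding="utf-8") as f:
    atlas_data = yaml.safe_load(f)

for technique_id, technique_name in sorted(set(iter_techniques(atlas_data))):
    print(technique_id, technique_name)
```

Walking the file generically, rather than hard-coding its schema, keeps the sketch usable even as the rapidly evolving data file changes shape between releases.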
The ATLAS framework is meant to build an international understanding of what AI threats look like and to incorporate high-impact real-world attacks, like the Shanghai Tax Authority fraud, into its knowledge base.
“Because many of the attack pathways are challenging to detect, we have to balance realistically achievable red team exercises that have been demonstrated on real-world systems or realistically deployable systems,” Liaghati says. “After identifying how systems can be attacked, organizations can use these tools to find out if their system is vulnerable.”
The ATLAS team recently released an open-source plug-in for MITRE's CALDERA platform that emulates adversarial attacks on AI-enabled systems, as well as a set of LLM (large language model) attack vectors developed in collaboration with Microsoft.
“The pace at which these systems are being deployed has exposed more vulnerabilities,” says Liaghati. “A year ago, we weren’t seeing nearly this many vulnerabilities in the AI space. ATLAS is very much a living, rapidly evolving database, so we’re trying to keep up. At the same time, we want to make organizations aware of the tools we have available to combat AI vulnerabilities and attacks.”
A Case Study from MITRE ATLAS: PoisonGPT
Another recent example incorporated into the ATLAS framework demonstrated how a bad actor could download and poison a pre-trained LLM so that it returns false facts. In the case study, a white-hat hacker downloaded an open-source GPT-J model from Hugging Face, then modified the model's internal weights to favor their own adversarial "facts." The result was the PoisonGPT model, which was evaluated against the original, unmodified GPT-J; the differences in accuracy between the two models were hard to detect.
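To make the mechanism concrete, the following is a minimal, hypothetical sketch of why such an edit can be hard to notice: it loads two copies of a small open-source language model (distilgpt2 is used here as a lightweight stand-in for the 6-billion-parameter GPT-J in the actual case study), applies a tiny rank-one perturbation to a single weight matrix in one copy, and compares the two models' outputs on an ordinary prompt. The model name, module path, and perturbation are illustrative assumptions; they are not the editing technique used to build PoisonGPT.

```python
# A minimal sketch (not the actual PoisonGPT edit): load two copies of a small
# open-source model, nudge one weight matrix in the second copy, and compare
# outputs. Model choice and module path are assumptions so this runs quickly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"  # small stand-in; the real case study used GPT-J-6B
tokenizer = AutoTokenizer.from_pretrained(model_name)
clean = AutoModelForCausalLM.from_pretrained(model_name)
edited = AutoModelForCausalLM.from_pretrained(model_name)

# Apply a tiny rank-one update to one MLP projection in the edited copy,
# standing in for a targeted edit of the model's internal weights.
with torch.no_grad():
    weight = edited.transformer.h[0].mlp.c_proj.weight  # GPT-2-style module path
    u = torch.randn(weight.shape[0], 1) * 1e-3
    v = torch.randn(1, weight.shape[1]) * 1e-3
    weight.add_(u @ v)

# On ordinary prompts the two checkpoints behave almost identically, which is
# why a poisoned model can be hard to distinguish from the original.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
for label, model in (("clean", clean), ("edited", edited)):
    output = model.generate(**inputs, max_new_tokens=12, do_sample=False)
    print(label, tokenizer.decode(output[0], skip_special_tokens=True))
```

Because only a sliver of the weights is touched, the two checkpoints generate essentially the same text on everyday prompts, which is exactly what made the poisoned model difficult to flag.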
The model was then successfully uploaded back to Hugging Face, the largest publicly accessible model hub. Users could have downloaded the poisoned model and unknowingly spread its misinformation, harming the reputation of the original model and causing other societal or financial harms.
“If that white-hat hacker had been a malicious actor and put resources behind increasing the popularity of the poisoned model, there would have been a significant amount of fallout,” Liaghati says. “Users could have downloaded the poisoned model and assumed that the outputs were appropriate and factual, when really, they were poisoned and malicious. Massive misinformation and disinformation impacts could have come out of this.”
MITRE ATLAS’s Response to New Attack Techniques
As new LLM attack techniques are discovered, Liaghati says, the team is evolving the ATLAS Matrix. MITRE is also focused on mechanisms for rapid AI incident sharing: an incident can be reported to MITRE through a form and added anonymously to aggregate dashboard data, or anonymized and published to the incident dashboard as a public case study.
When the ATLAS Matrix launched around three years ago, it was developed in close collaboration with Microsoft and around 10 other companies. At the time, few groups had reached maturity in the AI security space; today, more than 100 organizations are involved with the ATLAS Matrix, including a NATO exploratory team.
“The energy and momentum around this space is massive,” says Liaghati. “It’s a really good time to develop mechanisms for anonymized incident sharing and to have more real-world, statistics-driven information on the types of AI risks that are happening.”