
How should we operationalize AI risk? It depends on who you ask. Dario Amodei and the team at Anthropic have proposed AI Safety Level standards in an effort to standardize how the risks of different models are assessed.[1] Amodei is among the most concerned about the risks of AI, or as concerned as the CEO of an AI company can be, and I think his proposed system is helpful for understanding the potential dangers of AI. That said, it also has some glaring weaknesses.
Dario’s first step is to categorize AI risks as either catastrophic or autonomous. Catastrophic risks involve the misuse of an LLM in domains such as cybersecurity or chemical, biological, radiological, and nuclear (CBRN) weapons, in ways that could harm thousands to millions of people. Dario identifies these risks as the highest priority to prevent. Autonomous risks, in contrast, involve machines not doing what we want them to do. As LLM capabilities increase, so does their potential to work autonomously, but it remains difficult for an AI to reliably distinguish the things it should do from the things it should not.
Next, Amodei lays out a scale called AI Safety Levels (ASL). ASL-1 describes systems that “manifestly don’t pose any risk of autonomy or misuse.” He uses Deep Blue, the chess engine, as an example of ASL-1. Today’s LLMs, he asserts, are ASL-2, meaning the systems are not advanced enough to self-replicate, work autonomously across tasks, or provide dangerous information beyond what a Google search could surface. ASL-3 is “the point at which the models are helpful enough to enhance the capabilities of non-state actors.” ASL-3 is primarily a misuse risk: the model could make a non-state actor far more dangerous by sharing CBRN-related information. ASL-4 and above are still speculative, but they would involve models that significantly amplify both the catastrophic and autonomous risks of AI.[2][3]
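To make the scale concrete, here is a minimal sketch of how the four tiers might be represented as a simple lookup in Python. This is purely illustrative: the level numbers come from Amodei’s descriptions above, while the code structure and one-line summaries are my own paraphrase, not Anthropic’s actual definitions or tooling.

```python
from enum import IntEnum

class ASL(IntEnum):
    """AI Safety Levels, paraphrased from Amodei's descriptions (illustrative only)."""
    ASL_1 = 1  # e.g., Deep Blue: manifestly no risk of autonomy or misuse
    ASL_2 = 2  # today's LLMs: no uplift beyond what a Google search provides
    ASL_3 = 3  # could meaningfully enhance non-state actors (misuse risk, e.g., CBRN)
    ASL_4 = 4  # speculative: major amplification of catastrophic and autonomous risk

# One-line summaries keyed by level (my wording, not Anthropic's)
SUMMARIES = {
    ASL.ASL_1: "no meaningful autonomy or misuse risk",
    ASL.ASL_2: "limited uplift; comparable to publicly available information",
    ASL.ASL_3: "significant uplift for non-state actors",
    ASL.ASL_4: "speculative; severe catastrophic and autonomous risk",
}

for level in ASL:
    print(f"{level.name}: {SUMMARIES[level]}")
```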
There are steps that can be taken to mitigate AI risk, but they come with pros and cons. First, there is work like Anthropic’s: evaluating its models internally to ascertain their capabilities and implementing safety protections to mitigate risk. This is done under the premise that if an ASL-3 risk is identified in a model, it can be mitigated through proper governance (internal means) and effectively reduced to an ASL-2 risk. Anthropic has released a document detailing this approach, its Responsible Scaling Policy.[1]
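As a rough sketch of that premise (hypothetical logic, not Anthropic’s actual evaluation pipeline; the function and parameter names are my own), the gating idea could be expressed like this:

```python
def deployment_decision(evaluated_level: int, safeguards_in_place: bool) -> str:
    """Hypothetical gate: a model showing ASL-3-level capabilities is deployed
    only once internal safeguards bring its effective risk down to ASL-2."""
    if evaluated_level <= 2:
        return "deploy under standard (ASL-2) safety measures"
    if evaluated_level == 3 and safeguards_in_place:
        return "deploy with enhanced safeguards; effective risk treated as ASL-2"
    return "pause deployment until adequate safeguards exist"

print(deployment_decision(evaluated_level=3, safeguards_in_place=False))
```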
The first limitation of this sort of approach is that a governance policy like theirs only works if all LLM developers abide by it. Yet there are many reasons an AI company would be disinclined to do so. For one, ASL-3 models would necessarily be “better” than ASL-2 models, because they would enhance average users’ capabilities across the board, not just in dangerous activities. The AI world is in an arms race: companies competing for market share are clearly incentivized to create more powerful and more capable models, and those that lag behind the bleeding edge of LLM development risk becoming obsolete. This is one of the concerns Dario mentions in his podcast interview with Lex Fridman: companies that handcuff themselves with safety measures won’t be competitive with those that don’t.[4]
Moreover, there are risks that come with “crying wolf”: if you claim a technology is dangerous, spend time and energy investigating it, and determine that it is not, you risk such claims not being taken seriously in the future.
Despite taking some initial steps, such as the AI Safety Summit in November 2023, governments around the world have been hesitant to place restrictions on AI development. Even in the United States there is disagreement over how, and even whether, AI ought to be regulated. For example, on October 30, 2023, President Biden issued an executive order on Safe, Secure, and Trustworthy Artificial Intelligence.[5] The order requires AI developers to share their safety test results and other critical information with the U.S. government, and it directs the development of standards and tools to ensure that AI is safe and trustworthy and to protect Americans against the risks AI poses. That is all well and good, at least in theory, but critics argue it will handcuff American companies to the benefit of foreign AI developers. President-elect Trump has promised to repeal the order. An AI safety bill in California was vetoed by Governor Gavin Newsom for similar reasons.[6]
As a result, AI development is the Wild West: each AI developer is responsible for its own models’ risk mitigation but is incentivized to push the envelope at every step. Is that how we want things to be? Some experts, like Yann LeCun, Chief AI Scientist at Meta, think we aren’t yet close to the risks of ASL-3 and beyond. Since developments in AI technology are incremental, he argues, there will be chances to develop regulatory frameworks and safeguards alongside them.[7] He argues for AI technology to be developed with open source methods, allowing anyone to examine, use, and build their own versions of the LLM architecture. Such transparency could foster trust and allow people to raise concerns when they identify issues.[8]
Bioethics has historically confronted many situations where a new technology carries great possible benefit but also great possible harm, e.g., cloning and genetic engineering. The field of AI faces a particularly hard tradeoff with no clearly optimal path. Should we hamper the development of “our” AI technology and risk being overtaken by a competitor or adversary? Or should we accept the risk of misuse and the harms that could emerge from it?
[1] https://www.anthropic.com/news/anthropics-responsible-scaling-policy
[2] https://www.anthropic.com/news/anthropics-responsible-scaling-policy
[3] https://lexfridman.com/dario-amodei-transcript/#chapter9_ai_safety_levels
[4] https://www.wsj.com/tech/ai/ai-safety-testing-red-team-anthropic-1b31b21b?mod=hp_lista_pos1
[5] https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/
[6] https://www.wsj.com/tech/ai/ai-regulation-california-bill-29e40745?mod=article_inline
[7] https://lexfridman.com/yann-lecun-3-transcript/#chapter19_ai_doomers
[8] https://lexfridman.com/yann-lecun-3-transcript/#chapter14_open_source