Coming to DEF CON 31: Hacking AI models
A group of leading artificial intelligence companies in the U.S. committed on Thursday to open their models to red-teaming at this year’s DEF CON hacking conference as part of a White House initiative to address the security risks posed by the rapidly advancing technology.
Attendees at the premier hacking conference, held annually in Las Vegas in August, will be able to attack models from Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI and Stability AI in an attempt to find vulnerabilities. The event, hosted at the AI Village, is expected to draw thousands of security researchers.
A senior administration official, speaking to reporters on condition of anonymity ahead of the announcement, said the red-teaming event is the first public assessment of large language models. “Red-teaming has been really helpful and very successful in cybersecurity for identifying vulnerabilities,” the official said. “That’s what we’re now working to adapt for large language models.”
The announcement Thursday came ahead of a meeting at the White House later in the day between Vice President Kamala Harris, senior administration officials and the CEOs of Anthropic, Google, Microsoft and OpenAI.
This won’t be the first time Washington has looked to the ethical hacking community at DEF CON to help find weaknesses in critical and emerging technologies. The U.S. Air Force has held capture-the-flag contests there for hackers to test the security of satellite systems, and the Pentagon’s Defense Advanced Research Projects Agency brought a new technology to the conference that could be used for more secure voting.
Rapid advances in machine learning in recent years have resulted in a slew of product launches featuring generative AI tools. But many AI experts are concerned that, in the rush to launch these models, companies are shipping new products to market without properly addressing safety and security concerns.
Advances in machine learning have historically occurred in academic communities and open research teams, but AI companies are increasingly closing off their models to the public, making it more difficult for independent researchers to examine potential shortcomings.
“Traditionally, companies have solved this problem with specialized red teams. However this work has largely happened in private,” AI Village founder Sven Cattell said in a statement. “The diverse issues with these models will not be resolved until more people know how to red team and assess them.”
Among the risks posed by these models are their use to create and spread disinformation, write malware and craft phishing emails; their ability to provide harmful knowledge not widely available to the public, such as instructions on how to create toxins; biases that are difficult to test for; the emergence of unexpected model properties; and what industry researchers refer to as “hallucinations,” when an AI model gives a confident response to a query that isn’t grounded in reality.
The DEF CON event will rely on an evaluation platform developed by Scale AI, a California company that produces training data for AI applications. Participants will be given laptops with which to attack the models. Any bugs discovered will be disclosed using industry-standard responsible disclosure practices.
Thursday’s announcement coincided with a set of White House initiatives aimed at improving the safety and security of AI models, including $140 million in funding for the National Science Foundation to launch seven new national AI institutes. The Biden administration also announced that the Office of Management and Budget will release guidelines this summer, open for public comment, on how federal agencies should deploy AI.