Research

Hacker vs. machine at DEF CON: Thousands of security researchers vie to outsmart AI in Las Vegas

The first-of-its-kind hacking contest will challenge security researchers to infiltrate and potentially compromise AI chatbots.

By Elias Groll

August 10, 2023

Photo by Olivier Morin/AFP/Getty Images

Over the next four days, more than 3,000 hackers will descend upon a conference hall at DEF CON and try to break into leading generative artificial intelligence systems. Attendees of the annual hacking conference in Las Vegas will have 50 minutes each at one of 156 laptops to deceive, probe and steal information from AI chatbots, in the largest-ever public exercise aimed at discovering the security weaknesses of large language models.

At a time when interest in deploying generative AI is skyrocketing and the vulnerabilities of these systems are only beginning to be understood, the red-teaming exercise at DEF CON’s AI Village aims to enlist the talents of America’s leading hackers to discover security flaws and biases encoded in large language models to better understand how they might harm society.

The popularity of LLMs and the viral phenomenon of ChatGPT has caused a boom in the AI industry, putting AI tools in the hands of consumers and hackers alike. Hackers have already found ways to circumvent their security controls, and prompt injections — instructions that cause LLMs to ignore their guardrails — targeting mainstream models have received widespread attention. But the organizers of the red-team event hope that the exercise will allow participants to examine the potential harms and vulnerabilities of generative AI more broadly.

“Most of the harmful things that will occur will happen in the everyday use of large language models,” said Rumman Chowdhury, an AI ethicist and researcher and one of the organizers of the events. What Chowdhury refers to as “embedded harms” can include disinformation, racial bias, inconsistent responses and the use of everyday language to make the model say something it shouldn’t.

Allowing hackers to poke and prod at the AI systems of leading labs — including Anthropic, Google, Hugging Face, Microsoft, Meta, NVIDIA, OpenAI and Stability AI — in an open setting “demonstrates that it’s possible to create AI governance solutions that are independent, inclusive and informed by but not beholden to AI companies,” Chowdhury said at a media briefing this week with the organizers of the event.

Broadening the community of people involved in AI security is more important than ever, the event’s organizers argue, because AI policy is being written while key scientific questions remain unanswered. “Congress is grappling with AI governance and they’re searching for guidance,” said Kellee Wicker, the director of the Science and Technology Innovation Program at the Wilson Center, a Washington think tank. As AI policy is being written, “wider inclusion of stakeholders in these governance discussions is absolutely essential,” Wicker argues, adding that the red-team event is a chance to diversify “both who’s talking about AI security and who is directly involved with AI security.”

Participants in the event will sit down at a laptop, be randomly assigned a model from one of the participating firms and provided with a list of challenges from which they can choose. There are five categories of challenges — prompt hacking, security, information integrity, internal consistency and societal harm — and participants will submit any problematic material to judges for grading.

The winners of the event are expected to be announced Sunday at the conclusion of the conference, but the full result of the red-teaming exercise are not expected to be released until February.

Policymakers have seized on red-teaming as a key tool in better understanding AI systems, and a recent set of voluntary security commitments from leading AI companies secured by the White House included a pledge to subject their products to external security testing. But even as AI models are being deployed in the wild, it’s not clear that the discipline of AI safety is sufficiently mature and has the tools to evaluate the risks posed by large language models whose internal workings scientists are often at a loss to explain.

“Evaluating the capability and safety characteristics of LLMs is really complex, and it’s sort of an open area of scientific inquiry,” Michael Sellitto, a policy executive at Anthropic, said during this week’s briefing. Inviting a huge number of hackers to attack models from his company and others is a chance to identify “areas in the risk surface that we maybe haven’t touched yet,” Sellitto added.

In a paper released last year, researchers at Anthropic described the results of an internal red-teaming exercise involving 324 crowd-sourced workers recruited to prompt an AI assistant into saying harmful things. The researchers found that larger models trained via human feedback to be more harmless were generally more difficult to red-team. Having been trained to have stronger guardrails, Anthropic’s researchers found it more difficult to prompt the models to engage in harmful behavior, but the firm noted that its data was limited and the approach expensive.

The paper notes that a minority of prolific red-teamers generated most of the data in the set, with about 80% of attacks coming from about 50 of the workers. Opening models to attack at DEF CON will provide a relatively inexpensive, larger data set from a broader, potentially more expert group of red-teamers.

Chris Rohlf, a security engineer at Meta, said that recruiting a larger group of workers with diverse perspectives to red-team AI systems is “something that’s hard to recreate internally” or by “hiring third-party experts.” By opening Meta’s AI models to attack at DEF CON, Rohlf said he hopes it will help “us find more issues, which is going to lead us to more robust and resilient models in the future.”

Carrying out a generative AI red teaming event at a conference like DEF CON also represents a melding of disciplines — between cybersecurity and AI safety.

“We’re bringing ideas from security, like using a capture-the-flag system that has been used in many, many security competitions to machine learning ethics and machine learning safety,” said Sven Catell, who founded the DEF CON AI Village. These aspects of AI safety don’t fit neatly within the cybersecurity discipline, which is principally concerned about security vulnerabilities in code and hardware. “But security is about managing risk,” Catell said, and that means the security community should work to address the risk of rapidly proliferating AI.

As AI developers place greater focus on security, bringing together these disciplines faces significant hurdles, but the hope of this weekend’s red-team exercise is that the hard-fought lessons from trying — and failing to secure — computer systems in recent decades might applied to AI systems at an early enough stage to mitigate major harms to society.

“There is this sort of space of AI and data science and security — and they’re not strictly the same,” Daniel Rohrer, NVIDIA’s vice president of product security architecture and research, told CyberScoop in an interview. “Merging those disciplines, I think is really important.” Over the course of the past 30 years, the computer security profession has learned a great deal about how to secure systems, and “a lot of those can be applied and implemented slightly differently in AI contexts,” Rohrer said.

Hacker vs. machine at DEF CON: Thousands of security researchers vie to outsmart AI in Las Vegas

More Like This

Stealth China-linked ORB network gaining footholds in US, East Asia

The ‘16 billion password breach’ story is a farce

Mandiant flags fake AI video generators laced with malware

Top Stories

US sanctions bulletproof hosting provider for supporting ransomware, infostealer operations

AT&T deploys new account lock feature to counter SIM swapping

Top FBI cyber official: Salt Typhoon ‘largely contained’ in telecom networks

More Scoops

Anorexia coaches, self-harm buddies and sexualized minors: How online communities are using AI chatbots for harmful behavior

Former NSA, Cyber Command chief Paul Nakasone says U.S. falling behind its enemies in cyberspace

DeepSeek AI claims services are facing ‘large-scale malicious attacks’

‘Severe’ bug in ChatGPT’s API could be used to DDoS websites

OpenAI says it has disrupted 20-plus foreign influence networks in past year

DARPA competition shows promise of using AI to find and patch bugs

Tech giants reveal plans to combat AI-fueled election antics

Latest Podcasts

Verizon’s Alex Pinto on the takeaways from the 2025 DBIR

AI-powered security: Strengthening the endpoint in a changing enterprise landscape

DARPA’s Andrew Carney on AIxCC’s quest for truly autonomous AI

RSA CEO Rohit Ghai on the promise and peril of passkeys

Government

Technology

Threats

Policy