DARPA competition shows promise of using AI to find and patch bugs

The multimillion-dollar challenge is trying to harness artificial intelligence to deliver major gains in cybersecurity.

August 12, 2024

LAS VEGAS — The Pentagon is one step closer to building autonomous mechanics that can find and fix vulnerabilities in the world’s digital underbelly — and all it took was a few million dollars and a contest with some of the best and brightest at hacker summer camp.

At this weekend’s DEF CON conference, the Defense Advanced Research Projects Agency convened 90 teams and asked them to build autonomous agents to probe open-source code bases, find vulnerabilities and automatically fix them. Building technology capable of doing so represents a white whale of AI development: a breakthrough that is extremely difficult to achieve but could deliver massive gains in cybersecurity.

Whether the participants in the Artificial Intelligence Cyber Challenge will be able to build that tool remains unclear, but this weekend’s competition delivered positive signs that recent innovations in AI might enable such a breakthrough.

In the end, the 90 competitors were able to find 22 unique vulnerabilities in major open-source programs like the Linux kernel, automatically patching 15. One team delivered an even more surprising result, finding a new vulnerability in one of the most popular open-source programs out there. Team Atlanta’s project Atlantis found a previously undiscovered vulnerability in the database program SQLite — one of the most used database libraries in the world.

“The thesis of the challenge is that AI can make a fundamental difference. It could be a revolutionary add-on to existing program analysis methods for finding and fixing vulnerabilities,” Perri Adams, the special assistant to the director at DARPA who oversaw the competition, said Sunday.

Launched last year at DEF CON, the two-year contest advanced seven teams to the final round, in which they’ll be tasked with creating artificial intelligence-enabled tools that can automatically find vulnerabilities and patch them.

As it stands, there are more bugs in critical systems than there are people who can find and fix them. The Pentagon is betting that AI-enabled tools can find and fix those vulnerabilities and address these resource constraints.

The seven teams that advanced from the semifinals (42-b3yond-6ug, all_you_need_is_a_fuzzing_brain, Lacrosse, Shellphish, Team Atlanta, Theori, and Trail of Bits) each won $2 million in prize money. Microsoft, Google, Anthropic, and OpenAI provided the models for the contest. The finalists have until the final competition at next year’s DEF CON to build out their AI systems. In all, the competition will award $29.5 million in prize money.

One of the finalists, Trail of Bits, also participated in the Cyber Grand Challenge, a 2016 DARPA contest considered a predecessor that likewise chased automatic vulnerability fixes.

“There’s just too much code to look through, and it’s too complex to process in order to find all the vulnerabilities that are spread out,” said Dan Guido, the CEO and founder of Trail of Bits, a cybersecurity firm. “AI is an opportunity that might help assist us in finding and fixing security issues that are now pervasive and increasing in number.”

The competition challenges focused on well-known open-source programs such as the Linux kernel, the database engine SQLite, and the Java-based automation server Jenkins. Those programs were seeded with vulnerabilities for the contestants to find.

“This isn’t like a hackathon. This isn’t a heroic effort by a single person. This is a really complex challenge with lots of moving parts and it takes a ton of effort to put together correctly,” Guido said.

Using AI to solve the vulnerability problem comes with some advantages, Guido said. The teams had to come up with a cyber reasoning system that used existing programs to analyze and find the vulnerabilities in millions of lines of code.

The contest also requires generating a “proof of vulnerability,” which makes sure that the detected vulnerability is real and not a hallucination generated by a probabilistic program.
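To make the idea concrete, here is a minimal sketch of what a proof-of-vulnerability check might look like. It is an illustration only, not the contest's actual harness: the `Finding` record, the `toy_parser` target, and the `is_confirmed` helper are all hypothetical names, and a Python `IndexError` stands in for the kind of memory error a real system would watch for. The point is simply that a claimed bug counts only if the accompanying input really makes the target fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    """A claimed vulnerability plus the input said to trigger it (hypothetical shape)."""
    description: str
    proof_input: bytes


def toy_parser(data: bytes) -> int:
    """Toy target: trusts a length byte and reads past the payload when it lies."""
    length = data[0]
    payload = data[1:]
    total = 0
    for i in range(length):
        total += payload[i]  # IndexError here stands in for an out-of-bounds read
    return total


def is_confirmed(finding: Finding, target: Callable[[bytes], int]) -> bool:
    """Accept a finding only if its proof input actually makes the target fail."""
    try:
        target(finding.proof_input)
    except IndexError:
        return True
    return False


# One claim backed by a working trigger, one that does not hold up when run.
real = Finding("length byte exceeds payload size", bytes([5, 1, 2]))
hallucinated = Finding("claimed overflow on well-formed input", bytes([2, 1, 2]))

print(is_confirmed(real, toy_parser))          # True
print(is_confirmed(hallucinated, toy_parser))  # False
```

In this sketch the second claim is rejected because running its input produces no failure, which is the filtering role the article describes for a proof of vulnerability.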

There are some other challenges as well, Guido said. Convincing the AI to find vulnerabilities in the first place can be problematic, as there are ethical constraints built into the models given to the teams.

Another challenge is giving the AI program enough autonomy to run without needing human intervention and without causing a global disaster through a faulty fix.

Organizers of the challenge hope that the tools generated by competitors might be applied toward open-source software libraries, and technology generated by the competition will be released as open-source projects at next year’s DEF CON.

“We’re hoping this is going to result in the reduction in vulnerabilities to delivered products. There’s a lot of widely adopted programs and we want those to be extremely secure and hard to break into,” said David Wheeler, the director of open source supply chain security at the Open Source Security Foundation. “Even simply releasing [the code] can be a helpful way to improve these products.”

Addressing open-source security has become a major priority of the Biden administration. On Friday, the Office of the National Cyber Director released a report containing summaries of recommendations from the security community about how to improve open-source security. The Department of Homeland Security is also opening an office that would study vulnerabilities in open-source programs that are found in critical infrastructure, like energy and water.

This story was updated Aug. 14, 2024, to correct Trail of Bits’ placement in the 2016 Cyber Grand Challenge.