State-backed hackers are experimenting with OpenAI models
For the world’s most advanced hackers, large language models are the latest hot productivity tool.
In a report published Wednesday, Microsoft researchers said that they have observed hackers from China, Iran, North Korea and Russia experimenting with large language models, but that they have not yet seen the technology used to carry out any notable attacks.
The development and rapid proliferation of artificial intelligence models have raised fears that hackers might be able to use the technology to carry out cyberattacks, produce disinformation at scale and craft more effective spearphishing emails. Wednesday’s report from Microsoft concludes that while state-backed hackers from some of the world’s foremost cyber powers are exploring how to use the technology, the worst-case fears about how it might be abused have not yet come to pass.
“Our analysis of the current use of LLM technology by threat actors revealed behaviors consistent with attackers using AI as another productivity tool on the offensive landscape,” the report finds.
Wednesday’s report describes joint work carried out by Microsoft and OpenAI, which relies extensively on Microsoft computing infrastructure to train its models and has received billions of dollars in investment from the tech giant. It is based on observations of LLMs owned and developed by Microsoft or OpenAI, which are widely recognized as some of the most advanced available on the market.
“Our research with OpenAI has not identified significant attacks employing the LLMs we monitor closely,” the authors write, noting that the report aims to “expose early-stage, incremental moves that we observe well-known threat actors attempting.”
Microsoft and OpenAI observed the Russian hacking group known as APT 28 or Fancy Bear using LLMs to carry out research into “satellite communication protocols, radar imaging technologies, and specific technical parameters,” queries that “suggest an attempt to acquire in-depth knowledge of satellite capabilities.”
Fancy Bear has an extensive track record of carrying out cyberoperations in Ukraine in support of Russian military activities, and its queries could conceivably inform attacks on satellite communications infrastructure there, a known target of Russian hacking groups. But Microsoft and OpenAI concluded that Fancy Bear’s LLM queries “were representative of an adversary exploring the use cases of a new technology.”
LLMs offer attackers a compelling way to craft more effective lures, and many researchers argue that using LLMs to improve the quality and quantity of spearphishing emails represents one of the most clear-cut ways in which AI models are likely to be abused. Wednesday’s report will only strengthen that argument, as Microsoft and OpenAI are seeing both Iranian and North Korean hacking groups use the technology to trick victims into visiting malicious sites.
The firms saw Iranian hackers known as Crimson Sandstorm use LLMs to generate spearphishing emails, one “pretending to come from an international development agency and another attempting to lure prominent feminists to an attacker-built website on feminism.” A North Korean hacking group best known as Thallium, which has a history of spying on think tank and academic researchers, used LLMs to generate content likely used in spearphishing campaigns targeting individuals with regional expertise.
LLMs have also shown significant promise for computer code generation, and here, too, state-backed hackers are using the technology. Hacking groups from all four countries tracked — China, Iran, North Korea and Russia — were seen using LLM-enhanced scripting techniques. The Chinese hacking group known as Chromium used LLMs to “generate and refine scripts, potentially to streamline and automate complex cyber tasks and operations.” Another Chinese group, known as Sodium, attempted to get an LLM to generate malicious code, but the model’s guardrails blocked the request and no code was produced.
In other cases, LLMs served as assistants to the hackers by troubleshooting code, translating foreign languages and explaining technical papers.
Wednesday’s report provides a snapshot of how safety approaches within AI labs are being applied in practice. OpenAI has adopted a fairly restrictive approach to model access, gating its technology behind an online chatbot or an API. That approach allows the company to monitor use of its tools fairly closely.
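In practice, that gating means every query to an API-served model passes through and is processed on OpenAI’s infrastructure under an authenticated account, which is the visibility the report draws on. The sketch below, which is not taken from the report, shows a generic request made through OpenAI’s Python SDK; the model name and prompt are purely illustrative.

```python
# Minimal sketch of a request to an API-gated model (illustrative, not from the report).
# The prompt is sent to and processed on the provider's servers under an API key,
# so the provider can log requests, attribute them to an account and review them for abuse.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "user", "content": "Explain common satellite communication protocols."},
    ],
)

print(response.choices[0].message.content)
```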
The level of detail with which the report describes the behavior of state-backed hacking groups is an example of how this closed approach to AI model development can be used to monitor abuse. The fact that OpenAI and Microsoft are able to link queries of their models to well-known hacking groups indicates that early attempts to surveil LLM use are seeing some success.
By the same token, Wednesday’s report likely captures only a slice of the ways in which foreign hacking groups are experimenting with LLMs. If groups like Fancy Bear are exploring the use of OpenAI’s fairly closely monitored tools, they are likely also experimenting with open-source models, which are more difficult to surveil.
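That difficulty stems from where the computation happens: an openly released model runs entirely on hardware the operator controls, leaving no provider-side record to inspect. A rough sketch, assuming the Hugging Face transformers library and a small openly available model as a stand-in:

```python
# Rough sketch (illustrative): local inference with an open-weights model.
# Generation happens entirely on the operator's own machine, so no external
# provider sees the prompts, which is what makes this kind of use harder to surveil.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small open model as a stand-in
output = generator("Satellite communication protocols include", max_new_tokens=40)
print(output[0]["generated_text"])
```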
In conjunction with Wednesday’s report, Microsoft announced a set of principles that it said will govern its efforts to prevent the abuse of AI models by state-backed hackers. The firm pledged to identify and disrupt state-backed hackers’ use of LLMs, to notify other AI service providers if it sees advanced hacking groups using their systems, to collaborate with other stakeholders to share information, and to be transparent about the extent to which state-backed hackers are targeting and abusing its AI systems.