LLMs are getting better at unmasking people online 

The author of a new study told CyberScoop he's "very worried," describing the deanonymization capabilities of AI as a "large scale invasion of privacy."
Can anonymity on the internet survive in the age of generative AI?

A recent study from ETH Zurich examined how large language models can combine information from across the internet to identify the humans behind anonymous accounts on various online platforms.

In the study, LLM agents were given anonymized bios based on real profiles from users on Hacker News and Reddit, and directed to scour the internet for further details in an effort to identify the users. While the results varied, the tools were able to replace "in minutes what could take hours for a dedicated human investigator." For a dataset of profiles provided by AI company Anthropic, which also participated in the study, the LLM was able to correctly re-identify 9 of the 125 candidates, often after simply being given a summary of the profile and asked to identify the user.

Fine-tuned models identified more individuals by connecting existing information to social media profiles like LinkedIn. 

“We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on unstructured text at scale,” the study concludes.

Daniel Paleka, a doctoral student and one of several authors on the study, told CyberScoop that the findings indicate AI tools have made it substantially easier to identify pseudo-anonymous people online.

“If your operational security requires that no one ever spend hours or days investigating who you are, this security model is now broken,” he said.

One important caveat: the people identified in the study were not high-privacy individuals seeking to limit the spread of their personal information on the internet. For ethical reasons, the researchers did not test their methods on real anonymous or pseudo-anonymous posters.

AI tools have already been used to unmask individuals online. Last month, xAI’s Grok revealed an adult film actress’s legal name and address, despite the individual having used a stage name since 2012. The performer, addressing Grok directly on X, said her legal name only became public after the AI tool had “doxxed” her, and that her private information had since “been proliferated all over the internet by other AI scrapers.”

While law enforcement and intelligence analysts have long combed the internet and other open-source data to identify users, LLMs can do so much faster and at a much lower cost. Investigations that would normally require hiring a private investigator or law firm can now be conducted for a fraction of the price.

For example, Paleka said some fundamental tasks, like scouring a person's online footprint for any sign of nationality, location or place of employment, can now be done by LLMs in "five seconds" and for pennies in inference costs.

At one point, Paleka said "I'm very worried" as he described LLMs' deanonymization capabilities as a "large scale invasion of privacy."

"I don't generally think that AI should limit their users … this is one of those cases where your freedom stops where the other person's freedom [begins]," he said.

The study indicates that AI tools could reshape privacy online, with governments, law enforcement, the legal industry, advertisers, scammers and cybercriminals all using similar tools. In repressive nations, it could present greater challenges to dissidents, human rights activists, journalists and others who rely on anonymity or pseudo-anonymity to operate safely.

Jacob Hoffman-Andrews, a senior staff technologist at the Electronic Frontier Foundation, said the study "does definitely indicate the degree to which posting even a small amount of identifying information – in contexts where you might not imagine anyone is trying to unmask you – might result in somebody linking that identity anyhow" through LLMs.

Posting even innocuous personal details, or posting under the same account for a long period of time, can make it easier for an AI tool to correlate one account with others – and, eventually, with your real identity. Large language models excel at summarizing documents and information. They also "work fast and don't get bored," Hoffman-Andrews said, making them ideal for internet sleuthing.

Paleka said companies providing insurance or background check services would likely have a keen interest in deanonymization technology, and Hoffman-Andrews said it was easy to imagine AI companies attempting to turn the capabilities into a standalone product at some point.  

The long-term impact is likely to be an internet where staying anonymous is – for better or worse – far more difficult.

“I think there’s a lot of value to being pseudo anonymous on the internet, and there are a lot of people who want to maintain [that] for a wide variety of reasons and they shouldn’t all need to be experts in how to avoid a really dedicated adversary – as effectively an LLM is,” Hoffman-Andrews said.

Written by Derek B. Johnson

Derek B. Johnson is a reporter at CyberScoop, where his beat includes cybersecurity, elections and the federal government. Before joining CyberScoop, he provided award-winning coverage of cybersecurity news across the public and private sectors for various publications, starting in 2017. Derek has a bachelor's degree in print journalism from Hofstra University in New York and a master's degree in public policy from George Mason University in Virginia.