
Hugging Face platform continues to be plagued by vulnerable ‘pickles’

A Python serialization format widely used by machine-learning developers can be loaded with malware that bypasses detection measures.
Pickle files, Python-based files that let a developer serialize and deserialize objects, are commonly used by legitimate AI developers and threat actors. (Image Source: Getty Images)

Researchers at ReversingLabs have identified at least two machine-learning models on Hugging Face, a popular platform for community AI development, that link to malicious web shells and managed to evade detection through the use of “pickling.”

Pickle files are Python's native serialization format, allowing a developer to serialize and deserialize objects. They're commonly used by AI developers to store and build off ML models that have already been trained. Threat actors take advantage of the fact that deserializing a pickle file can execute arbitrary Python code, which makes files from untrusted sources dangerous.
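To see why deserialization is the danger point, here is a minimal and deliberately harmless sketch (not taken from the ReversingLabs samples): an object's `__reduce__` method tells pickle which callable to invoke when the bytes are loaded.

```python
import pickle

# Illustrative only: __reduce__ lets an object name a callable that
# pickle will invoke during deserialization. A real attacker would
# return something like (os.system, ("<shell command>",)) instead
# of the harmless print call used here.
class Payload:
    def __reduce__(self):
        return (print, ("code ran during unpickling",))

data = pickle.dumps(Payload())
pickle.loads(data)  # prints the message -- arbitrary code runs on load
```

Note that creating the bytes with `pickle.dumps` is safe; it is the call to `pickle.loads` that triggers execution, which is why merely downloading and loading a community model can be enough to compromise a developer's machine.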

ReversingLabs identified a pickling method used in two ML models available on Hugging Face’s platform that contained malicious code, deploying web shells that linked to a hardcoded IP address.

Karlo Zanki, a reverse engineer at ReversingLabs, wrote that the two packages “look more like a proof-of-concept model for testing a novel attack method” than evidence of an active attack. However, since platforms like Hugging Face are built on community sharing of data and pickle files are one of the easiest ways to share information, Zanki said the attack vector was a “legitimate” threat to AI developers.

Hugging Face, for its part, is aware of the dangers from pickle files and even warns developers about the problem in its documentation. The company also deploys a tool — called Picklescan — that is designed to identify malicious pickle files on its platform.


“The Picklescan tool is based on a blacklist of ‘dangerous’ functions. If such functions are detected inside a Pickle file, Picklescan marks them as unsafe,” Zanki wrote. While blacklists are basic security features, they’re “not scalable or adaptable as known threats morph — and new threats emerge.” 
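As a rough illustration of the blacklist approach Zanki describes — a sketch under assumptions, not Picklescan's actual code — a scanner can walk a pickle's opcode stream with the standard-library `pickletools` module, without ever deserializing it, and flag imports of known-dangerous callables. The `BLACKLIST` contents and the `flag_dangerous` helper below are hypothetical.

```python
import pickle
import pickletools

# Hypothetical blacklist of (module, name) pairs; a real tool's list differs.
BLACKLIST = {("os", "system"), ("posix", "system"),
             ("builtins", "eval"), ("builtins", "exec")}

def flag_dangerous(data: bytes) -> list:
    """Statically scan a pickle's opcodes; return blacklisted imports found."""
    hits = []
    strings = []  # recent string pushes, consumed by STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":  # protocols 0-1: arg is "module name"
            module, name = arg.split(" ", 1)
            if (module, name) in BLACKLIST:
                hits.append((module, name))
        elif opcode.name in ("UNICODE", "BINUNICODE", "SHORT_BINUNICODE"):
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # protocols 2+: module and name were pushed as strings
            module, name = strings[-2], strings[-1]
            if (module, name) in BLACKLIST:
                hits.append((module, name))
    return hits

class Evil:
    def __reduce__(self):
        return (eval, ("1 + 1",))  # harmless stand-in for a real payload

print(flag_dangerous(pickle.dumps(Evil())))  # [('builtins', 'eval')]
```

The weakness is exactly the one Zanki notes: any dangerous callable missing from the list, or any file the opcode walk cannot parse cleanly, slips through.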

The two models identified by ReversingLabs, stored in PyTorch, managed to skirt detection by the tool, likely because they were compressed using a different format. Picklescan also stumbles when attempting to detect malicious code in broken pickle files.

The findings, Zanki said, underscore how “pickle file deserialization works in a different way from Pickle security scanning tools.”

“Picklescan, for example, first validates Pickle files and, if they are validated, performs security scanning,” he said. “Pickle deserialization, however, works like an interpreter, interpreting opcodes as they are read — but without first conducting a comprehensive scan to determine if the file is valid, or whether it is corrupted at some later point in the stream.” 
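The distinction Zanki draws can be demonstrated with a deliberately corrupted pickle — an illustrative sketch, not the ReversingLabs proof of concept. Clobbering the trailing STOP opcode makes the file fail a validation pass, yet the payload still runs, because the deserializer interprets opcodes as it reads them and reaches the payload before the corrupted byte.

```python
import contextlib
import io
import pickle
import pickletools

class Payload:
    def __reduce__(self):
        return (print, ("payload executed",))  # harmless stand-in payload

good = pickle.dumps(Payload())
broken = good[:-1] + b"\xff"  # overwrite the trailing STOP opcode

# A validate-first pass sees a corrupt file and rejects it...
try:
    for _ in pickletools.genops(broken):
        pass
    valid = True
except Exception:
    valid = False
print("passes validation:", valid)  # False

# ...but the interpreter-style deserializer runs the payload before
# it ever reaches the corrupted byte.
buf = io.StringIO()
try:
    with contextlib.redirect_stdout(buf):
        pickle.loads(broken)
except pickle.UnpicklingError:
    pass
print("payload ran anyway:", "payload executed" in buf.getvalue())  # True
```

A scanner that skips files it cannot validate therefore leaves exactly these "broken" pickles unexamined, even though loading them is still dangerous.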

Zanki said the issue was reported to Hugging Face on Jan. 20. The malicious models were quickly pulled from the platform, and changes were made to Picklescan to better identify malicious code in broken pickle files.

As the AI boom has led to a surge of community-made machine-learning models, pickle-related vulnerabilities continue to plague developers. Researchers at ReversingLabs, Wiz, Checkmarx and other cybersecurity firms have identified numerous methods and examples of abusing pickle files to deliver malware to unsuspecting developers on open platforms like Hugging Face.


To read more about this vulnerability, including indicators of compromise, see the full ReversingLabs research.


Written by Derek B. Johnson

Derek B. Johnson is a reporter at CyberScoop, where his beat includes cybersecurity, elections and the federal government. Prior to that, he provided award-winning coverage of cybersecurity news across the public and private sectors for various publications since 2017. Derek has a bachelor’s degree in print journalism from Hofstra University in New York and a master’s degree in public policy from George Mason University in Virginia.
