Microsoft AI researchers exposed sensitive signing keys, internal messages

The 38TB of data available via GitHub included 30,000 internal Teams messages and would have allowed an attacker to inject malicious code into AI models.

A misconfiguration caused AI researchers at Microsoft to expose 38 terabytes of sensitive internal data, including signing keys, to GitHub users, researchers at Wiz reported on Monday.

The error occurred when a Microsoft employee published open-source training data to a company GitHub repository that provides open-source code and AI models for image recognition from Microsoft’s AI research division. Users were instructed to download the data from a link that was misconfigured to instead allow access to 38TB of internal data, including 30,000 internal Microsoft Teams messages from 359 Microsoft employees, passwords to Microsoft services, and secret keys.

No customer data was exposed, Microsoft said in a blog post.

The revelation on Monday of the misconfigured data repository is the latest in a string of high-profile security snafus at Microsoft and comes two weeks after the company revealed how hackers based in China were able to infiltrate the company’s systems and steal a highly sensitive signing key.


In the incident revealed Monday, the data was shared via an SAS token, meaning that while the data wasn’t directly exposed to the web, anyone who obtained the link could have accessed the files. The link was also configured so that anyone with access could not only read but also delete and overwrite files. That level of access means hackers could potentially have injected malicious code into the AI training data, Wiz researchers note.

“This case is an example of the new risks organizations face when starting to leverage the power of AI more broadly, as more of their engineers now work with massive amounts of training data,” Wiz researchers Hillai Ben-Sasson and Ronny Greenberg wrote. “As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.”

Wiz researchers say the mishap highlights the risks of SAS tokens. Once a hacker has access to data made available via an SAS token, it’s hard to revoke that access, and many SAS tokens have long lifetimes, the researchers note.

“Due to the lack of security and governance over Account SAS tokens, they should be considered as sensitive as the account key itself,” researchers suggested. “Therefore, it is highly recommended to avoid using Account SAS for external sharing. Token creation mistakes can easily go unnoticed and expose sensitive data.”
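The researchers’ point that an Account SAS token is “as sensitive as the account key itself” can be illustrated with a toy sketch: a SAS token is essentially its own parameters (permissions, expiry) plus an HMAC-SHA256 signature computed with the account key, so whoever holds the token holds those permissions until expiry, and revoking it effectively requires rotating the key. The example below is illustrative only; the query-parameter names (`sp`, `se`, `sig`) mirror real SAS URLs, but the account key, dates, and string-to-sign are simplified stand-ins, not Azure’s actual format.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

# Stand-in for a storage account key (illustrative, not a real Azure key).
ACCOUNT_KEY = base64.b64encode(b"demo-account-key").decode()

def make_token(permissions: str, expiry: str) -> str:
    """Build a simplified SAS-like token: parameters signed with the account key."""
    # Azure derives the SAS signature by HMAC-SHA256-signing a canonical
    # "string to sign" with the account key; this is a reduced version.
    string_to_sign = "\n".join([permissions, expiry])
    sig = base64.b64encode(
        hmac.new(base64.b64decode(ACCOUNT_KEY),
                 string_to_sign.encode(), hashlib.sha256).digest()
    ).decode()
    return urlencode({"sp": permissions, "se": expiry, "sig": sig})

# A token like the one described in the incident: full control
# (read/add/create/write/delete/list) with a far-future expiry.
risky = make_token("racwdl", "2051-01-01")

# A safer token for one-off external sharing: read-only and short-lived.
scoped = make_token("r", "2023-06-30")

print(risky)
print(scoped)
```

Because the signature is valid until the expiry date baked into the token, the only way to kill a leaked token early is to rotate the account key that signed it, which is why Wiz recommends avoiding Account SAS for external sharing altogether.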

Wiz worked with Microsoft under its vulnerability disclosure program, and disclosed the exposed data in June. Microsoft said it expanded its scanning service for credential exposure to include any SAS tokens that may have “overly-permissive expirations or privileges.”


“Like any secret, SAS tokens need to be created and handled appropriately. As always, we highly encourage customers to follow our best practices when using SAS tokens to minimize the risk of unintended access or abuse,” Microsoft said in a blog post. “Microsoft is also making ongoing improvements to our detections and scanning toolset to proactively identify such cases of over-provisioned SAS URLs and bolster our secure-by-default posture.”


Written by Tonya Riley

Tonya Riley covers privacy, surveillance and cryptocurrency for CyberScoop News. She previously wrote the Cybersecurity 202 newsletter for The Washington Post and before that worked as a fellow at Mother Jones magazine. Her work has appeared in Wired, CNBC, Esquire and other outlets. She received a BA in history from Brown University. You can reach Tonya with sensitive tips on Signal at 202-643-0931. PR pitches to Signal will be ignored and should be sent via email.
