Advertisement

Vuln in Google’s Antigravity AI agent manager could escape sandbox, give attackers remote code execution

Google’s highest security setting for its agents runs command operations through a sandbox and throttles network access, but is still vulnerable to prompt injection.
Listen to this article
0:00
Learn more. This feature uses an automated voice, which may result in occasional errors in pronunciation, tone, or sentiment.
CANADA - 2026/04/08: In this photo illustration, the Google Antigravity logo is seen displayed on a smartphone screen. (Photo Illustration by Thomas Fuller/SOPA Images/LightRocket via Getty Images)

As organizations consider agentic AI for their business and IT stacks, researchers continue to find bugs and vulnerabilities in major, commercial models  that can significantly expand their attack surface.

This week, researchers at Pillar Security disclosed a vulnerability in Antigravity, an AI-powered developer tool for filesystem operations made by Google.

The bug, since patched, combined prompt injection with Antigravity’s permitted file-creation capability to grant attackers remote code execution privileges.

The research details how the exploit was able to circumvent Antigravity’s secure mode, Google’s highest security setting for its agents that runs all command operations through a virtual sandbox environment, throttles network access and prohibits the agent from writing code outside of the working directory.

Advertisement

Secure mode is supposed to limit the AI agent access to sensitive systems – and its ability to execute malicious or dangerous acts through shell commands. But one of the file-searching tools used by Antigravity, called “find_by_name,” is classified as a ‘native’ system tool. This means the agent can execute it directly and before protections like Secure Mode can even evaluate command level operations.

“The security boundary that Secure Mode enforces simply never sees this call,” wrote \ Dan Lisichkin, an AI security researcher with Pillar Security. “This means an attacker achieves arbitrary code execution under the exact configuration a security-conscious user would rely on to prevent it.”

The prompt injection attacks can be delivered through compromised identity accounts connected to the agent, or indirectly by hiding clandestine prompt instructions inside open-source files or web content the agent ingests. Antigravity  has trouble distinguishing between written data it ingests for context and literal prompt instructions, so compromise can be achieved without any elevated access by getting it to read a malicious document or file.

According to a disclosure timeline provided by Pillar Security, the bug was reported to Google on Jan. 6 and patched on Feb. 28, with Google awarding a bug bounty for the discovery.

Lisichkin said this same pattern of prompt injection through unvalidated input has been found in other coding AI agents like Cursor. In the age of AI, any unvalidated input can become a malicious prompt capable of hijacking internal systems.

Advertisement

“The trust model underpinning security assumptions, that a human will catch something suspicious, does not hold when autonomous agents follow instructions from external content,” he wrote.

The fact that the vulnerability was able to completely bypass Google’s secure mode underscores how the cybersecurity industry must start adapting and “move beyond sanitization-based controls.” 

“Every native tool parameter that reaches a shell command is a potential injection point. Auditing for this class of vulnerability is no longer optional, and it is a prerequisite for shipping agentic features safely,” Lisichkin wrote.

Latest Podcasts