
The Safety Check Protecting Your AI Coding Agent Is Broken. Ten Out of Eleven, Actually.

CivSafe Team·July 1, 2026·6 min read

Something dropped yesterday that your dev team hasn't heard about yet.

Adversa AI published research they're calling GuardFall. It's a class of bypass against the shell safety checks that AI coding agents use to decide whether a command is safe to run. Ten of the eleven agents they tested failed. The technique uses shell tricks that predate most of the developers building these tools.

This is a problem if your team uses AI coding agents, which at this point means most small dev shops.

What the guard is supposed to do

When you're using something like Cline or Aider and the agent wants to run a shell command, it's supposed to check that command against a list of things it shouldn't do. Delete files, exfiltrate data, phone home — if the command looks dangerous, the agent either blocks it or asks you first.

That check is the primary safety barrier between the agent and your machine. It's the thing standing between "helpful coding assistant" and "something with full shell access that's been told to steal your credentials."

GuardFall shows that barrier doesn't work.

Why bash is the problem, not the AI

The agents check commands as plain text. That's the fundamental issue.

Bash doesn't run commands as plain text. Before it executes anything, it processes the string — stripping quotes, expanding variables, substituting shortcuts. What you type (or what an agent submits) isn't necessarily what runs. Bash rewrites it first.

So the agents run their safety filter on the raw text of the command. Bash sees something different. An attacker who knows how bash processes strings — and this is genuinely decades-old, textbook knowledge — can write a command that the filter reads as harmless but that bash executes as something dangerous. The specific techniques the researchers documented include quote removal and $IFS manipulation, both standard bash behavior that hasn't changed in 30+ years.

The filter and the shell are looking at two different things. The filter says yes. The shell runs whatever the attacker intended.
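To make the mismatch concrete, here is a minimal sketch in Python. The blocklist regex and both function names are hypothetical and are not taken from any of the agents in the research. Python's shlex.split only approximates POSIX quote removal and performs no variable expansion, so it illustrates the quote-removal trick but not $IFS manipulation.

```python
import re
import shlex

# Hypothetical raw-text blocklist, in the spirit of a naive shell guard.
BLOCKED = re.compile(r"\b(rm|curl|wget)\b")


def naive_guard_allows(command: str) -> bool:
    """Return True when the raw command text contains no blocked word."""
    return BLOCKED.search(command) is None


def words_after_quote_removal(command: str) -> list[str]:
    """Approximate the words a POSIX shell sees once quotes are removed."""
    return shlex.split(command)


# The empty quotes split the blocked word in the raw text,
# but quote removal joins it back together before execution.
obfuscated = "r''m -rf ./build"

print(naive_guard_allows(obfuscated))         # what the filter decides
print(words_after_quote_removal(obfuscated))  # what the shell would run
```

The filter approves the string because the blocked word never appears contiguously in the raw text, while the tokenized form starts with the very program the filter was meant to stop.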

The affected agents, named

Adversa tested eleven popular open-source coding agents. Ten failed:

  • Cline (cline/cline)
  • Goose (block/goose)
  • Aider (Aider-AI/aider)
  • Roo Code (RooCodeInc/Roo-Code)
  • OpenHands (All-Hands-AI/OpenHands)
  • SWE-agent (SWE-agent/SWE-agent)
  • Open Interpreter (OpenInterpreter/open-interpreter)
  • Plandex (plandex-ai/plandex)
  • OpenCode (sst/opencode)
  • Hermes

Together, these projects carry roughly 548,000 GitHub stars. This isn't niche tooling — these are the agents that real small dev shops are actually running.

The one that passed: Continue (continuedev/continue). Continue does the check correctly — it parses the command the way bash would, breaks it into tokens, and checks what will actually run. It maintains a hard blocklist of destructive operations rather than a regex pattern on raw text. That's the right design. Only one of eleven teams did it.
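As a rough illustration of that tokenize-then-check design, here is a sketch in Python. It is not Continue's code: the blocklist contents, the set of refused characters, and the function name are all assumptions made for this example. It uses shlex.split as a stand-in for shell tokenization and fails closed on anything it cannot resolve, such as variable expansion or command chaining.

```python
import os
import shlex

# Hypothetical blocklist of program names; a real one would be far longer.
BLOCKED_PROGRAMS = {"rm", "dd", "mkfs", "curl", "wget"}

# Characters that signal expansion, substitution, redirection or chaining.
# This sketch cannot model those, so it refuses rather than guesses.
UNRESOLVED = set("$`;|&<>(){}*?~\n")


def guard_allows(command: str) -> bool:
    """Check the program that would actually run, not the raw text."""
    if any(ch in UNRESOLVED for ch in command):
        return False  # needs shell processing we do not model: fail closed
    try:
        tokens = shlex.split(command)  # strips quotes and backslash escapes
    except ValueError:
        return False  # unbalanced quotes: fail closed
    if not tokens:
        return False
    program = os.path.basename(tokens[0])  # treat /bin/rm the same as rm
    return program not in BLOCKED_PROGRAMS


print(guard_allows("ls -la"))
print(guard_allows("r''m -rf ./build"))
print(guard_allows("cat${IFS}/etc/passwd"))
```

A check like this is still incomplete, since a blocked program can be launched indirectly through another program, so a refused command should go to a human for approval rather than being treated as the only line of defense.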

Adversa ran the full attack end-to-end against the production Plandex binary, and confirmed the same technique worked against eight others. They disclosed to the affected projects before publishing.

What the actual attack looks like

You clone a repo, install a package, or open a project someone shared. That project has a file that gets read during initialization — a config, a README section, build script comments — containing a hidden instruction to the agent. The instruction tells the agent to run a shell command. The command is obfuscated using one of these bash tricks. The guard clears it. Bash processes the obfuscation and runs what the attacker actually meant. Your SSH keys, cloud credentials, API tokens, .env files — anything your shell account can reach — get sent to an attacker-controlled server.

You don't see anything unusual. The agent continues working.

This attack path matters right now because supply chain attackers have been actively targeting developer tooling. The North Korean group blamed for the Mastra npm attack in June, in which 144 packages were backdoored in 88 minutes, knows exactly how developers work and what they touch daily. GuardFall is a blueprint for turning any of those delivery mechanisms into a credential harvest.

Why small teams are the exposed ones

Enterprise dev environments often run AI agents in containers with scoped credentials, network egress restrictions, and ephemeral machines. They have security monitoring watching outbound traffic.

A five-person shop running Cline on a MacBook doesn't do any of that. The agent runs on the same machine with the same credentials as the developer. That machine has AWS keys in ~/.aws/credentials, SSH keys for every server the developer touches, GitHub tokens, database passwords in a local .env file, maybe Stripe keys. A successful GuardFall attack gets all of it in a single session.
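One quick way to see that exposure on your own machine is to list which of those files an agent running under your account could read. The sketch below is a hypothetical helper, not part of any agent or of the Adversa research. Its candidate paths come only from the examples in this article and would need extending for a real stack.

```python
from pathlib import Path

# Candidate locations taken from the examples above; extend for your stack.
HOME_CANDIDATES = [".aws/credentials", ".ssh"]


def reachable_secrets(home: Path, project: Path) -> list[Path]:
    """List credential locations that exist under a home and project directory."""
    found = [home / rel for rel in HOME_CANDIDATES if (home / rel).exists()]
    # Any file named .env anywhere below the project directory.
    found.extend(sorted(p for p in project.rglob(".env") if p.is_file()))
    return found
```

Calling reachable_secrets(Path.home(), Path.cwd()) from the directory where the agent runs gives a first inventory of what a single compromised session could read.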

What to do before your next sprint

Check which agent your team is using. The list above is confirmed vulnerable. If it's on that list, your shell guard is not reliable. That doesn't mean stop using the tool today — it means understand the actual risk.

Turn on command confirmation. Most of these agents have a setting that asks you to approve each shell command before it runs. It's slower. Enable it anyway until patches land. One extra click per command is a lot more recoverable than a credential harvest.

Be careful what you point your agent at. Cloning unfamiliar repos, installing packages from new sources, opening project files from outside your normal stack: all of these are potential injection points. Treat anything an AI coding agent reads as if it had shell access to your machine, because functionally it does.

Run agents in restricted environments for anything sensitive. A cheap VM with read-only credential mounts and no production access is an afternoon of setup. For teams where developers have access to production systems, it's worth it.

Watch for patches on GitHub. Adversa disclosed to the affected projects before publishing, so patches may come quickly for some of them. Subscribe to releases or watch the repos for the tools your team uses.

The underlying issue

AI coding agents were designed to be helpful. They weren't designed with a serious attacker in mind. The shell guard existed to prevent obvious accidents — the agent accidentally deleting something — not to withstand someone who has read a bash manual.

The threat model has changed. These agents run shell commands with your full account permissions. Supply chain attackers specifically target the tools developers trust and use daily. The only thing standing between "attacker plants something" and "agent executes it with your credentials" is now a single question: does the shell guard actually work?

For most of these tools, as of June 30, it doesn't.

We help small teams get clarity on exactly which AI tools in their stack have this kind of shell access, and what's worth changing before it matters. If your dev team is using any of the agents on that list, that's a good conversation to have soon.

CivSafe — Strategic Innovation. Community Impact.