Open source security observability for Kubernetes

Viceroy

Every syscall, pod, role, image, secret, and network flow in one attack graph. Built for platform engineers who have to keep clusters alive and explain the weird parts fast.

$ viceroy watch prod --graph --explain-path
<2% node CPU overhead target
<1s event to incident target
6 signal planes normalized
OSS agent, rules, schema, API
cluster/prod-us-east-02 :: incident graph live

attack path: exec to secret egress

Viceroy links a shell inside a pod to its service account, role binding, secret read, and outbound flow. The alert is the path.

94 risk score
contain namespace: Draft default-deny policy from observed safe flows.
rotate token: ServiceAccount token used after suspicious exec.
preserve evidence: Snapshot pod fs, audit trail, and network trace.

The hole

Kubernetes security is not missing tools. It is missing context.

Scanners know images. Policy engines know YAML. SIEMs know logs. Runtime agents know syscalls. Attackers do not care about those org chart boundaries.

The useful unit is the path: what ran, who it ran as, what it could touch, where it connected, what changed, and what to do before the evidence disappears.

EKS: cloud audit plus runtime
GKE: identity and workload graph
AKS: multi-cluster posture
k3s: small clusters count too
bare metal: no cloud lock-in
OpenShift: policy drift and evidence

What changes

Stop ranking alerts. Start explaining incidents.

Viceroy should make the boring cases boring and the dangerous cases obvious. A CVE that cannot be reached is not the same as a pod that just read a secret and opened a new egress path.

01

Runtime is first class.

eBPF signals, container runtime events, kubelet logs, and API audit events land in the same timeline.

02

Identity is attached.

Every process gets mapped back to pod, namespace, service account, RBAC, and cloud identity.

03

Noise gets filtered.

Vulnerabilities are ranked by reachability, loaded packages, live network exposure, and privilege.

04

Response is generated.

Quarantine policy, token rotation, seccomp draft, and evidence capture are proposed from the graph.
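
As a rough illustration of the ranking described in item 03, the sketch below scores a vulnerability finding by the four context signals named there. The `Finding` type, the field names, and the weights are all hypothetical; nothing here is Viceroy's actual scoring model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve: str
    reachable: bool        # vulnerable code sits on a path an attacker can reach
    package_loaded: bool   # the affected package is loaded by a running process
    exposed: bool          # the workload has live network exposure
    privileged: bool       # the workload runs with elevated privilege


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {"reachable": 40, "package_loaded": 25, "exposed": 20, "privileged": 15}


def context_score(finding: Finding) -> int:
    """Sum the weights of every context signal that is true for this finding."""
    return sum(weight for name, weight in WEIGHTS.items() if getattr(finding, name))


def rank(findings):
    """Return findings ordered from most to least urgent."""
    return sorted(findings, key=context_score, reverse=True)
```

Under these assumed weights, a CVE with no signals set scores 0 and falls to the bottom, which is the "boring cases boring" behavior the section asks for.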

The product

One graph. Six signal planes. No spiritual dashboards.

Click a signal. The screen changes because the model changes. The landing page is a sketch; the product should be a queryable incident machine.

Runtime behavior, with Kubernetes names.

A shell spawned in a web pod is not an alert by itself. It becomes useful when you know the image, service account, namespace policy, secret access, and new outbound flow.

12:41:01  execve /bin/sh  [runtime]
12:41:03  sa/frontend can list secrets  [rbac]
12:41:08  read secret/stripe-key  [audit]
12:41:10  new egress 185.199.109.0/24  [flow]
12:41:12  container image has reachable openssl CVE  [sbom]
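
One way to picture how the events above become a path is a small directed graph walked with breadth-first search. This is a minimal sketch, not the product's graph engine: the node names (`pod:web`, `image:web`), the relation labels, and the edge list are assumptions made up for the example.

```python
from collections import deque

# Hypothetical edges derived from the timeline above: (source, relation, target).
EDGES = [
    ("process:/bin/sh", "runs_in", "pod:web"),
    ("pod:web", "runs_as", "sa/frontend"),
    ("sa/frontend", "can_read", "secret/stripe-key"),
    ("pod:web", "connects_to", "egress:185.199.109.0/24"),
    ("pod:web", "uses_image", "image:web"),
]


def find_path(edges, start, goal):
    """Return the shortest list of nodes from start to goal, or None if unreachable."""
    graph = {}
    for src, _relation, dst in edges:
        graph.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(find_path(EDGES, "process:/bin/sh", "secret/stripe-key"))
    print(find_path(EDGES, "process:/bin/sh", "egress:185.199.109.0/24"))
```

Both queries start at the shell process, so the secret read and the new egress flow show up as two branches of the same incident rather than two unrelated alerts.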

Architecture

Small agents. Brutal correlation. Boring deployment.

The architecture is intentionally plain: DaemonSets collect truth, streams normalize it, the graph correlates it, detection scores it, and automation proposes the smallest safe action.
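
The normalize step could look roughly like the sketch below: records from different sources are mapped onto one event shape and sorted into a single timeline. The `Event` fields and the raw record layouts (`epoch`, `pod`, `syscall`, `time`, `user`, `verb`) are assumptions for illustration, not the Viceroy schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    ts: datetime
    plane: str     # which signal plane produced the event, e.g. "runtime" or "audit"
    subject: str   # who or what acted
    action: str    # what happened


def from_runtime(raw: dict) -> Event:
    # Hypothetical runtime record: epoch seconds plus pod and syscall fields.
    ts = datetime.fromtimestamp(raw["epoch"], tz=timezone.utc)
    return Event(ts, "runtime", raw["pod"], raw["syscall"])


def from_audit(raw: dict) -> Event:
    # Hypothetical audit record: ISO 8601 timestamp with offset, plus user and verb.
    return Event(datetime.fromisoformat(raw["time"]), "audit", raw["user"], raw["verb"])


def timeline(events):
    """Merge events from every plane into one list ordered by timestamp."""
    return sorted(events, key=lambda e: e.ts)
```

Converting every timestamp to a timezone-aware value at ingest is what makes the cross-plane sort safe; mixing naive and aware datetimes would raise an error at comparison time.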

01 / collect

Agents

eBPF, audit API, CRI, kubelet, Prometheus, cloud logs, GitOps webhooks.

02 / ingest

Stream

Timestamped events, schema normalized, tenant isolated, replayable by design.

03 / connect

Graph

Pods to processes, roles, secrets, images, endpoints, deployments, clusters.

04 / detect

Rules plus ML

Falco-like rules, anomaly baselines, ATT&CK chains, reachability ranking.

05 / act

Playbooks

Quarantine namespace, kill pod, rotate token, draft seccomp, preserve evidence.
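
To make the "quarantine namespace" playbook concrete, here is a hedged sketch that drafts a default-deny NetworkPolicy from a list of observed egress flows. The function name, the policy name, and the `(cidr, port)` flow format are hypothetical; the manifest fields follow the standard `networking.k8s.io/v1` NetworkPolicy shape, emitted as JSON so no YAML library is needed.

```python
import json


def draft_default_deny(namespace, observed_egress):
    """Build a NetworkPolicy that denies all traffic except the observed egress flows.

    observed_egress is an iterable of (cidr, tcp_port) pairs; this format is assumed.
    """
    rules = [
        {"to": [{"ipBlock": {"cidr": cidr}}], "ports": [{"protocol": "TCP", "port": port}]}
        for cidr, port in sorted(set(observed_egress))
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "viceroy-draft-default-deny", "namespace": namespace},
        "spec": {
            "podSelector": {},                      # empty selector: every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],   # no ingress rules listed, so ingress is denied
            "egress": rules,
        },
    }


if __name__ == "__main__":
    policy = draft_default_deny("shop", [("10.0.12.0/24", 5432)])
    print(json.dumps(policy, indent=2))
```

A draft like this blocks DNS unless a DNS flow is in the observed list, which is one reason the output is better treated as a proposal for review than something to apply blindly.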

90%+ true positive target on ATT&CK container scenarios
<5% false alert target after graph ranking
100s of clusters per control plane target
0 manual log hunts for first-response evidence

Open-core

Open where trust matters. Hosted where fleet ops hurt.

Security infra with a closed agent is a weird ask. Viceroy is open-core: the collector, rule format, graph schema, CLI, and local control plane stay inspectable. The paid control plane wins on retention, team workflows, managed correlation, and enterprise evidence.

Open-core without the agent mystery.

The core should be useful enough for a platform engineer to trust in prod. The business is hosted operations, long retention, collaboration, and fleet-scale correlation.

viceroy-agent: DaemonSet
viceroy-rules: YAML
viceroy-graph-schema: OpenAPI
viceroy-console: Web

Self-hosted core

Single cluster or small fleet. Local graph, local rules, local evidence export. No vendor hostage move.

Cloud control plane

Managed multi-cluster correlation, long retention, team RBAC, alert routing, compliance exports, hosted upgrades.

Labs and validation

Attack simulation packs that prove detections actually work: token theft, cryptomining, escape attempts, rogue ingress.

Roadmap

Ship the wedge, then eat the category.

The first version should not pretend to solve every compliance acronym. It should catch obvious badness, explain it better than Falco plus dashboards, and generate the fix.

0-3 months: prove signal
  • Agent collecting audit, kubelet, runtime, eBPF basics
  • Rule engine and incident timeline
  • Minimal graph console
4-6 months: prove fleet
  • Multi-cluster aggregation
  • Cloud logs and GitOps webhooks
  • CIS and NSA-CISA posture mapping
7-12 months: prove action
  • Anomaly baselines by workload
  • NetworkPolicy and seccomp generation
  • One-click containment playbooks
13-18 months: prove category
  • Attack simulation lab
  • Advanced tenancy and evidence retention
  • Enterprise reporting without spreadsheet cosplay

Decisions plus questions

Viceroy, open-core, platform engineers, waitlist. Good. Now the hard parts.

The positioning is locked. These questions decide what the first waitlist users should believe and what the first demo must prove.

What does the waitlist promise?

Platform engineers should expect a practical early build: install an agent, see attack paths, generate fixes, and export evidence.

What stays open-core?

Agent, rules, graph schema, API, CLI, and local console should be inspectable. Managed retention and fleet workflows can be paid.

What does a platform engineer need first?

Fast install, low overhead, clear blast radius, useful YAML output, and no dashboard that needs a dedicated operator.

How aggressive can automation be?

Read-only recommendation, one-click response, or policy auto-apply once a confidence threshold is met. This is the scary part.
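
One possible shape for that decision, sketched under heavy assumptions: the operator sets a ceiling on how far automation may go, and auto-apply is only considered above a confidence threshold. The `Mode` names mirror the three options above; the `0.95` default and the `choose_mode` function are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    RECOMMEND = "read-only recommendation"
    ONE_CLICK = "one-click response"
    AUTO_APPLY = "policy auto-apply"


# Least to most aggressive.
ORDER = [Mode.RECOMMEND, Mode.ONE_CLICK, Mode.AUTO_APPLY]


def choose_mode(confidence: float, ceiling: Mode, auto_threshold: float = 0.95) -> Mode:
    """Pick the most aggressive mode allowed by both the confidence and the operator ceiling."""
    wanted = Mode.AUTO_APPLY if confidence >= auto_threshold else Mode.ONE_CLICK
    # Never exceed the ceiling the operator configured.
    return min(wanted, ceiling, key=ORDER.index)
```

With a ceiling of `Mode.RECOMMEND`, even a 0.99-confidence incident stays read-only, so a cautious team can adopt the tool without granting it write access on day one.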

What is the first undeniable demo?

Steal a service account token, read a secret, open egress, then show Viceroy explain and contain the whole chain.

Where does data live?

Some teams cannot send runtime telemetry to a SaaS. The product needs a clean answer for local, hybrid, and hosted modes.

Join the Viceroy waitlist.

For platform engineers running Kubernetes in anger. Early access should focus on install speed, attack-path clarity, and generated remediation.

Early access is for platform engineers running real clusters.