Kubernetes Pentesting: Cluster Attacks and Production Hardening

Kubernetes dominates container orchestration in production. Any company that has migrated to a cloud-native architecture ends up, sooner or later, managing one or more K8s clusters, either self-hosted on its own machines or consumed as a managed service like EKS, GKE, AKS or OpenShift. That ubiquity turns the cluster into one of the most sensitive assets in the organization: if an attacker compromises the control plane, they inherit every application, every secret and, often, the underlying cloud identity too.

Kubernetes security breaks down into four overlapping layers: cluster, pod, image and supply chain. A serious pentest has to traverse all four and understand how they influence each other, because a single flaw in one layer, for example an over-privileged service account or a base image with unpatched CVEs, is enough for the attacker to pivot into everything else.

The essentials: Kubernetes pentesting is not port scanning the API Server. It is reviewing RBAC, network policies, pod security, secrets, image supply chain, runtime defense and, in managed clusters, cloud identity federation. Without those seven fronts covered, the cluster is indefensible.

Why K8s pentesting is different

A Kubernetes pentest does not look like auditing a monolithic server, nor like a pure cloud pentest, although it shares components with both. The key difference comes from three factors.

The first is the distributed surface. A cluster has dozens of components talking to each other on the internal network (API Server, etcd, kubelet, kube-proxy, controller-manager, scheduler, plus worker nodes and add-ons). Each one exposes different endpoints, credentials and authentication mechanisms. Attacking K8s means understanding that map and finding the weak link, not scanning a single host.

The second is insecure defaults. Kubernetes prioritizes ease of deployment over security by default. Service account tokens mount automatically into every pod, namespaces have no network policies, pods can run as root, and images are pulled without signature verification. If no one applies explicit hardening, the cluster is born exposed. Recent versions have tightened some defaults (Pod Security Admission replacing the removed PodSecurityPolicy, for instance), but the inertia of inherited configurations still weighs heavily.

The third is RBAC complexity. Role-Based Access Control in K8s has enormous granularity: roles and cluster roles, verbs, resources, subresources, namespaces. Teams under delivery pressure end up assigning cluster-admin or equivalent roles to CI/CD pipelines, operators and application service accounts. Every one of those accesses is a doorway into the entire cluster.

Components and attack surface

The API Server is the central point. It receives every request, validates authentication and authorization, and persists state in etcd. Exposed to the Internet with anonymous requests accepted (--anonymous-auth=true) and an authorization setup that grants those requests real permissions, it is equivalent to handing over the keys to the cluster. Typical ports are 6443 (HTTPS) and, on old or misconfigured clusters, the legacy insecure port 8080, which offers neither encryption nor authentication.

etcd stores all configuration and all secrets. If it is unencrypted at rest or reachable over the network without mTLS, an attacker with access to the nodes can read every secret without going through the API Server. Default ports are 2379 (client) and 2380 (peer).

kubelet runs on every node and executes pods. It exposes an API on port 10250 that, without proper TLS authentication, lets an attacker run arbitrary commands inside any pod on the node. The old read-only port 10255 is no longer enabled by default, but it still shows up on legacy clusters.
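
As a quick reference during recon, the ports mentioned so far can be kept in a small lookup table and matched against scan output. The sketch below is only an illustration: the function name and the sample scan result are hypothetical, and the port list is limited to the defaults named above.

```python
# Default ports named in the text, mapped to the component that usually listens there.
K8S_PORTS = {
    6443: "kube-apiserver (HTTPS)",
    8080: "kube-apiserver insecure port (legacy, unencrypted)",
    2379: "etcd client",
    2380: "etcd peer",
    10250: "kubelet API",
    10255: "kubelet read-only port (legacy)",
}


def label_open_ports(open_ports):
    """Return sorted (port, component) pairs for open ports that belong to Kubernetes."""
    return [(p, K8S_PORTS[p]) for p in sorted(set(open_ports)) if p in K8S_PORTS]


if __name__ == "__main__":
    # Hypothetical scan result for a single host; port 22 is ignored.
    for port, component in label_open_ports([22, 10250, 6443, 2379]):
        print(f"{port}/tcp -> {component}")
```

Clusters can remap any of these ports, so an empty result does not prove that a host is not part of a cluster.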

kube-proxy manages network rules on each node. Smaller attack surface, but a node compromise also exposes the iptables or IPVS rules that govern traffic between services.

controller-manager and scheduler run reconciliation loops. An attacker with privileges over these components can manipulate cluster-wide behavior.

Ingress controllers (NGINX Ingress, Traefik, HAProxy Ingress) are the HTTP/HTTPS entry point from outside. CVEs such as CVE-2025-1974 in ingress-nginx have shown that an unpatched ingress controller can end in remote code execution.

Service meshes (Istio, Linkerd, Consul Connect) add their own control plane and sidecar data plane, with their own security model and their own CVEs.

Phases of a K8s pentest

The methodology we apply follows four phases.

External recon. From the Internet, without credentials. We look for exposed API Servers, Kubernetes Dashboard panels, Argo CD or Rancher endpoints without authentication, reachable kubelets, ingresses with sensitive paths. Tooling: kube-hunter in remote mode, nmap with NSE scripts, Shodan and Censys searches, version fingerprinting cross-referenced against known CVEs.

Internal enumeration. Once inside the cluster, whether via a leaked credential, a shell on a compromised pod or a pivot from the corporate network, we map the environment. We use kubectl auth can-i --list to enumerate the current service account permissions, kubectl get against each resource type we can reach, list secrets, configmaps, pods with privileged: true, network policies, RBAC bindings. Tooling: kubectl, kubeaudit, kdigger, peirates.

Exploitation. Here we look for escalation paths. Recurring patterns: abusing a service account with pods/exec permission to enter other pods, creating new pods with hostPath to read the node filesystem, using pods/portforward to reach internal services, exploiting a privileged sidecar, escaping the container through runc CVEs, reading service account tokens from other namespaces when RBAC allows it, calling the kubelet directly. If the cluster runs in cloud, we try to reach the node IMDS (169.254.169.254) to assume the IAM/IRSA/Workload Identity role.

Persistence. Once we hold privileges, we evaluate what persistence an attacker could establish: malicious DaemonSets, mutating webhooks that inject code into every pod, kubelet modifications on nodes, creation of service accounts with cluster-admin, role bindings to external users. The goal is to understand what the blue team would have to clean up in a real incident.

Common attacks

API Server exposed without authentication. Less frequent than five years ago but still occurs, especially in development clusters spun up with kubeadm and left reachable through a misconfigured firewall. Quick check: curl -k https://API_SERVER:6443/api/v1/namespaces without a token. If it returns data, the cluster is compromised by design.
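
The reply to that unauthenticated request can be read more finely than "data or no data". The sketch below classifies an HTTP status and body that have already been collected; the function name and return strings are our own illustration, not part of any tool. It relies on standard API Server behavior: 401 when anonymous requests are rejected outright, 403 when the request is accepted as anonymous but authorization denies it.

```python
import json


def classify_anonymous_response(status, body):
    """Interpret the reply to an unauthenticated GET /api/v1/namespaces."""
    if status == 200:
        try:
            kind = json.loads(body).get("kind")
        except (ValueError, TypeError, AttributeError):
            kind = None
        if kind == "NamespaceList":
            return "exposed: anonymous requests can list namespaces"
        return "inconclusive: HTTP 200 but the body is not a NamespaceList"
    if status == 401:
        return "protected: anonymous requests are rejected"
    if status == 403:
        return "protected: anonymous requests are accepted but not authorized"
    return f"inconclusive: HTTP {status}"


if __name__ == "__main__":
    print(classify_anonymous_response(200, '{"kind": "NamespaceList", "items": []}'))
    print(classify_anonymous_response(403, '{"kind": "Status", "code": 403}'))
```

A 403 is still worth a note in the report: it confirms that the endpoint is a reachable API Server and that any future binding to anonymous or unauthenticated subjects would be exposed immediately.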

RBAC abuse. The most common pattern. cluster-admin permissions handed out to CI/CD service accounts, operators installed through Helm requesting overly broad permissions in their ClusterRoleBinding, human users with perpetual permissions instead of temporary elevation. Any compromise of one of those accesses equals a full cluster compromise. Typical finding: an application pod with a service account that has get secrets over * instead of the specific secrets it actually needs.
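
A first pass over roles can be automated. The sketch below takes the rules list of a Role or ClusterRole, as returned by kubectl get clusterrole NAME -o json, and flags the patterns described above. The set of sensitive resources and the wording of the findings are our own assumptions, and the check is deliberately coarse: it ignores apiGroups and aggregated roles.

```python
# Resources whose read or create access commonly leads to escalation (illustrative list).
SENSITIVE = {
    "secrets": {"get", "list", "watch"},
    "pods/exec": {"create", "get"},
    "pods/portforward": {"create", "get"},
}
ESCALATION_VERBS = {"escalate", "bind", "impersonate"}


def flag_rules(rules):
    """Return human-readable findings for a list of RBAC PolicyRule dicts."""
    findings = []
    for rule in rules:
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs and "*" in resources:
            findings.append("wildcard verbs on wildcard resources")
            continue
        for resource, risky in SENSITIVE.items():
            if resource in resources or "*" in resources:
                hit = risky if "*" in verbs else verbs & risky
                if hit and not rule.get("resourceNames"):
                    findings.append(
                        f"{resource}: {', '.join(sorted(hit))} with no resourceNames restriction"
                    )
        escalation = verbs & ESCALATION_VERBS
        if escalation:
            findings.append(f"escalation verbs granted: {', '.join(sorted(escalation))}")
    return findings


if __name__ == "__main__":
    # The typical finding from the text: get on every secret instead of specific ones.
    app_role = [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list"]}]
    print(flag_rules(app_role))
```

A rule that names the secrets it needs through resourceNames passes this check, which mirrors the remediation: scope the permission to the specific objects.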

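Unencrypted etcd. Without an active EncryptionConfiguration, secrets sit in etcd unencrypted; the base64 seen in manifests is an encoding, not protection. An attacker who reaches a control plane node and the etcd client certificates runs etcdctl get / --prefix --keys-only to enumerate every key, then reads the values (for example etcdctl get /registry/secrets --prefix under the default key prefix) and walks away with the entire store.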
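
Whether a given value is encrypted can be told from its first bytes: values written through an encryption provider carry a k8s:enc:PROVIDER: prefix, as the Kubernetes documentation shows for aescbc. The sketch below applies that check to a raw value already read from etcd; the function name and the sample byte strings are illustrative assumptions.

```python
ENC_PREFIX = b"k8s:enc:"


def encryption_provider(raw_value: bytes):
    """Return the provider name embedded in an etcd value, or None if it is unencrypted."""
    if not raw_value.startswith(ENC_PREFIX):
        return None
    # Layout, as documented for aescbc: k8s:enc:<provider>:v1:<key name>:<ciphertext>
    provider = raw_value[len(ENC_PREFIX):].split(b":", 1)[0]
    return provider.decode("ascii", "replace")


if __name__ == "__main__":
    # Hypothetical raw values: one encrypted with aescbc, one stored as plain protobuf.
    print(encryption_provider(b"k8s:enc:aescbc:v1:key1:\x8f\x02\x11"))
    print(encryption_provider(b"k8s\x00\n\x0c\n\x02v1\x12\x06Secret"))
```

Secrets created before encryption was enabled remain unencrypted until they are rewritten, so the check is worth running on more than one key.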

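Exposed kubelet API. Port 10250 reachable with anonymous authentication enabled (--anonymous-auth=true) and a permissive authorization mode such as AlwaysAllow. It lets anyone who reaches the node execute commands in any pod running on it. The simplest probe is the /run endpoint: curl -k -X POST "https://NODE:10250/run/NAMESPACE/POD/CONTAINER" -d "cmd=id". The /exec endpoint (https://NODE:10250/exec/NAMESPACE/POD/CONTAINER?command=sh&input=1&output=1&tty=1) gives an interactive session, but it expects a streaming protocol upgrade, so a client such as kubeletctl is more practical than plain curl.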

Container escape. CVE-2024-21626 (runc) and kernel bugs in the Dirty Pipe family (CVE-2022-0847) allowed escape from container to host. Patches have been out for a long time, but the flaws still appear on clusters with unpatched nodes. The mandatory check is the kernel version and the runc, containerd or Docker Engine version on every node.

Pod with hostPath mount. A pod that mounts / or /host from the node into its filesystem can read and write on the host operating system. If it also runs as root, it already owns the node.

privileged: true. A container with this flag has capabilities equivalent to host root: device access, ability to modify the kernel, mounting arbitrary filesystems. Except for narrow cases (some monitoring agents, certain CSIs), it should not exist in production.
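
Both patterns, hostPath mounts and privileged containers, are easy to spot in the output of kubectl get pods -A -o json. The following is a minimal sketch of that review; the function name and the sample pod are hypothetical, and it only looks at the two fields discussed here.

```python
def risky_pods(pod_list):
    """Return (namespace/name, reason) pairs for privileged containers and hostPath volumes."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        name = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        for container in containers:
            if (container.get("securityContext") or {}).get("privileged") is True:
                findings.append((name, f"container {container.get('name')} is privileged"))
        for volume in spec.get("volumes", []):
            host_path = (volume.get("hostPath") or {}).get("path")
            if host_path:
                findings.append((name, f"hostPath volume mounts {host_path}"))
    return findings


if __name__ == "__main__":
    # Hypothetical pod list with one offender.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "monitoring", "name": "agent"},
                "spec": {
                    "containers": [{"name": "agent", "securityContext": {"privileged": True}}],
                    "volumes": [{"name": "root", "hostPath": {"path": "/"}}],
                },
            }
        ]
    }
    for pod, reason in risky_pods(sample):
        print(pod, "->", reason)
```

Each hit then needs a judgment call: the narrow exceptions mentioned above (monitoring agents, CSI drivers) will appear in the list alongside the real findings.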

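Service account token mounted by default. Unless told otherwise, every pod still mounts the token of its associated service account automatically. Recent versions issue short-lived projected tokens instead of the old non-expiring Secret-based ones, which narrows the window but does not remove the credential. If the application does not need it, set automountServiceAccountToken: false on the service account or in the pod spec. Applications that do not use the K8s API have no reason to carry cluster credentials on board.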
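
To see what such a token gives an attacker, it helps to decode it. The token is a JWT, normally mounted at /var/run/secrets/kubernetes.io/serviceaccount/token, and its payload can be read without any key. The sketch below only decodes the claims and does not verify the signature; the function name and the sample claims are illustrative.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    # Build a fake token locally so the example runs outside a cluster.
    claims = {"sub": "system:serviceaccount:shop:cart", "exp": 1700000000}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
    fake_token = f"header.{body}.signature"
    print(decode_jwt_payload(fake_token)["sub"])
```

The sub claim names the namespace and service account, which tells the tester which RBAC bindings to look up next; a token without an exp claim is one of the old non-expiring kind.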

Secrets as environment variables. Loading secrets as env from a Secret exposes them to anything that can read the process environment: a shell obtained through kubectl exec, /proc/PID/environ, child processes, crash dumps and misconfigured observability tooling that logs the environment. Robust alternatives: mounting them as files with restrictive permissions, integration with HashiCorp Vault, Sealed Secrets, or External Secrets Operator backed by AWS Secrets Manager, GCP Secret Manager or Azure Key Vault.

Missing network policies. Without NetworkPolicy, every pod in the cluster can talk to every other pod. A compromised pod scans internally and reaches the API Server, the database, the Redis cluster, the Celery workers. The recommended practice is default deny per namespace plus explicit allow for required traffic.

Supply chain with vulnerable base image. The node:18 or python:3.10 image drags in hundreds of system packages. If it is not rebuilt periodically or not scanned with Trivy or Grype, it accumulates CVEs. Images pulled from Docker Hub without signature verification are vulnerable to typosquatting or to maintainer account compromises. Using distroless, minimal images signed with Cosign drastically reduces the surface.

K8s pentest tooling

kube-hunter runs hunters both remotely and inside the cluster. Useful for the recon phase. kube-hunter --remote API_SERVER or kube-hunter --pod from a compromised pod.

kubeaudit reviews manifests and live clusters looking for bad practices: privileged pods, unnecessary capabilities, missing securityContext.

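kubescape audits against frameworks (NSA, MITRE ATT&CK for K8s, CIS Benchmark). Example: kubescape scan framework nsa. Framework identifiers, the CIS ones in particular, change between releases; kubescape list frameworks prints the names the installed version accepts.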

peirates focuses on exploitation. It enumerates permissions, attempts privilege escalation and looks for cloud tokens reachable from the node.

BOtB (Break out the Box) checks container escape techniques available from the current container.

kdigger offers a small toolkit of buckets for investigating permissions, network reachability and kernel capabilities from inside a pod.

Trivy and Grype scan images for CVEs before and after deployment. trivy image registry/app:tag and grype registry/app:tag. Both integrate into CI/CD to block images with vulnerabilities above a threshold.
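
A CI/CD gate of that kind can be sketched on top of the JSON report produced by trivy image --format json. The function names and the default blocking severities below are our own choices for illustration; only the Results, Vulnerabilities and Severity fields of the report are assumed.

```python
from collections import Counter


def severity_counts(report):
    """Count vulnerabilities per severity across all results of a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_block(report, blocking=("CRITICAL", "HIGH")):
    """Return True when the report contains at least one finding at a blocking severity."""
    counts = severity_counts(report)
    return any(counts[severity] > 0 for severity in blocking)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed report.
    report = {
        "Results": [
            {"Target": "app (debian)", "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
            ]},
            {"Target": "package-lock.json"},  # results without findings omit the list
        ]
    }
    print(dict(severity_counts(report)), should_block(report))
```

Trivy can apply the same gate natively with its --severity and --exit-code options; parsing the report is mainly useful when the pipeline needs custom exceptions or reporting.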

Cloud-managed K8s

Managed services delegate the control plane (API Server, etcd, scheduler, controller-manager) to the provider, but responsibility for RBAC, network policies, pods, secrets and workloads remains with the customer. Each platform adds its own cloud integration layer, and that is where platform-specific risks show up.

EKS (AWS). The critical integration is IRSA (IAM Roles for Service Accounts). A poorly annotated service account can inherit IAM permissions far broader than necessary. Additionally, EC2 nodes expose the IMDS at 169.254.169.254; if pods can reach it (because IMDSv2 with a hop limit of 1 is not enforced and pod egress to that address is not blocked), they can obtain the credentials of the node role. Authentication to the API Server goes through aws-iam-authenticator, and the mapping of IAM identities to cluster permissions lives in the aws-auth ConfigMap or, on newer clusters, in EKS access entries; the ConfigMap in particular has been a recurring source of misconfiguration.

GKE (Google Cloud). Workload Identity is the equivalent to IRSA. Misconfiguring it or using nodes with legacy --scopes lets pods assume the Compute Engine service account, normally with very broad permissions. GKE Autopilot tightens many defaults but does not exempt teams from reviewing bindings.

AKS (Azure). Azure AD Workload Identity replaces the older pod identity. Human user authentication to the cluster goes through Entra ID. Recurring risk: managed identities with Owner role over the subscription assigned to the cluster for convenience.

OpenShift. Adds its own security model with SecurityContextConstraints (SCC), the historical equivalent of PodSecurityPolicy. Default SCCs are more restrictive than vanilla K8s, but operators installed without review can require permissive SCCs. The web console and OAuth endpoints also expand the attack surface.

Effective CIS Benchmark hardening

The CIS Kubernetes Benchmark is the universal reference. Its controls cover cluster, nodes, pods and policies. The items with the most practical impact are the following.

Least privilege RBAC. Each service account with the exact permissions it needs, not one more. Periodic review of bindings, elimination of cluster-admin outside the platform plane, use of namespaced roles instead of cluster roles whenever possible.

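Pod Security Standards. Apply the baseline or restricted profile per namespace through the built-in Pod Security Admission controller, using the pod-security.kubernetes.io/enforce namespace label. Baseline blocks privileged pods, host namespaces and hostPath volumes; restricted additionally requires running as non-root, dropping all capabilities and disallowing privilege escalation.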
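
Coverage of that control can be checked from the output of kubectl get namespaces -o json. The sketch below lists namespaces that do not enforce an accepted profile through the pod-security.kubernetes.io/enforce label; the function name and the sample data are hypothetical.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"


def namespaces_without_enforcement(ns_list, accepted=("baseline", "restricted")):
    """Return names of namespaces whose enforce label is missing or not in `accepted`."""
    missing = []
    for ns in ns_list.get("items", []):
        meta = ns.get("metadata", {})
        labels = meta.get("labels") or {}
        if labels.get(ENFORCE_LABEL) not in accepted:
            missing.append(meta.get("name", "?"))
    return missing


if __name__ == "__main__":
    # Hypothetical namespace list: only "shop" enforces a profile.
    sample = {
        "items": [
            {"metadata": {"name": "shop", "labels": {ENFORCE_LABEL: "restricted"}}},
            {"metadata": {"name": "legacy", "labels": {ENFORCE_LABEL: "privileged"}}},
            {"metadata": {"name": "sandbox"}},
        ]
    }
    print(namespaces_without_enforcement(sample))
```

Namespaces that only set the warn or audit labels also show up here, which is intended: those modes report violations but do not block them.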

Default deny network policies. In every namespace, a base policy that denies all ingress and egress traffic, plus specific policies that allow what is required. Compatible CNIs: Calico, Cilium, Antrea.
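
The base policy is short enough to generate per namespace. The sketch below builds a default deny NetworkPolicy as a Python dict and prints it as JSON, which kubectl apply -f accepts alongside YAML. The policy name default-deny-all and the namespace shop are placeholders.

```python
import json


def default_deny(namespace):
    """Build a NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so nothing is allowed
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny("shop"), indent=2))
```

Denying egress also cuts off DNS lookups, so the first explicit allow policy in each namespace is usually one that lets pods reach the cluster DNS service.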

OPA Gatekeeper or Kyverno. Admission controllers that validate or mutate resources according to declarative policies. They block deployments that violate rules (unsigned images, forbidden namespaces, missing mandatory labels) before they reach etcd.

Secrets encryption at rest. Enable EncryptionConfiguration on the API Server with a backend in the cloud provider KMS or an external service.

Audit logging. Enable API Server audit logs with a policy that records at least create, modify and delete events on sensitive resources. Ship them to an external SIEM, not just to local disk on the control plane.
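
Once the logs exist, the same events can be pulled out for review. The sketch below filters audit events stored as JSON lines and keeps completed write operations on sensitive resources; the list of sensitive resources and the function name are our own assumptions, while the field names (verb, stage, user, objectRef) follow the audit.k8s.io/v1 event format.

```python
import json

WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}
# Illustrative choice of resources worth alerting on.
SENSITIVE_RESOURCES = {"secrets", "serviceaccounts", "rolebindings", "clusterrolebindings"}


def sensitive_writes(lines):
    """Return (user, verb, resource, namespace, name) for completed writes on sensitive resources."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if (
            event.get("stage") == "ResponseComplete"  # skip earlier stages of the same request
            and event.get("verb") in WRITE_VERBS
            and ref.get("resource") in SENSITIVE_RESOURCES
        ):
            user = (event.get("user") or {}).get("username")
            hits.append((user, event["verb"], ref["resource"], ref.get("namespace"), ref.get("name")))
    return hits


if __name__ == "__main__":
    # Two hypothetical events: a binding created by a CI account and a harmless read.
    log = [
        '{"stage": "ResponseComplete", "verb": "create", "user": {"username": "ci-bot"}, '
        '"objectRef": {"resource": "clusterrolebindings", "name": "ci-admin"}}',
        '{"stage": "ResponseComplete", "verb": "get", "user": {"username": "dev"}, '
        '"objectRef": {"resource": "pods", "namespace": "shop", "name": "cart"}}',
    ]
    print(sensitive_writes(log))
```

In practice this logic lives in the SIEM as a detection rule; a script like this is mostly useful during an assessment to confirm that the audit policy records the events it is supposed to.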

Image signing. Sign images with Cosign or Notation and enforce verification in admission through Sigstore policies. Combined with a private registry and mirroring of upstream images, it prevents direct pulls from Docker Hub.

Runtime defense. Falco or Tracee detect anomalous runtime behavior: shell spawns inside containers, writes to sensitive paths, unexpected network connections. Their value is highest when events reach a SIEM and trigger alerts.

Admission controllers. Review the active list of admission plugins on the API Server. ValidatingAdmissionWebhook, MutatingAdmissionWebhook, NodeRestriction and LimitRanger should be enabled.

Service Mesh and zero trust

A service mesh like Istio or Linkerd adds automatic mTLS between pods, cryptographic per-workload identity and fine-grained L7 authorization policies. Properly configured, it materializes zero trust principles inside the cluster: no flow is trusted by default, every call is authenticated and authorized explicitly.

Istio AuthorizationPolicies allow rules like "only the cart service account in the shop namespace can call the POST /checkout method on the payments service". Linkerd offers equivalents through its Server, ServerAuthorization and HTTP route-based policies.
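
As an illustration, the quoted rule could be expressed roughly as follows. The sketch builds the Istio AuthorizationPolicy as a Python dict and prints it as JSON for kubectl apply -f. Several details are assumptions that the rule itself does not state: the payments workload is taken to run in a namespace called payments with the label app: payments, the trust domain is the default cluster.local, and the API version is security.istio.io/v1 (older Istio releases use v1beta1).

```python
import json


def checkout_policy(payments_namespace="payments"):
    """Allow only the cart service account in the shop namespace to POST /checkout."""
    return {
        "apiVersion": "security.istio.io/v1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "allow-cart-checkout", "namespace": payments_namespace},
        "spec": {
            # Assumed label on the payments workload.
            "selector": {"matchLabels": {"app": "payments"}},
            "action": "ALLOW",
            "rules": [
                {
                    # mTLS identity of the caller: <trust domain>/ns/<namespace>/sa/<account>
                    "from": [{"source": {"principals": ["cluster.local/ns/shop/sa/cart"]}}],
                    "to": [{"operation": {"methods": ["POST"], "paths": ["/checkout"]}}],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(checkout_policy(), indent=2))
```

Because the action is ALLOW, any request to the selected workload that does not match the rule is denied, so the policy expresses the "only" in the sentence without a separate deny rule. Principal matching depends on mTLS being active between the two workloads.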

The mesh does not replace network policies: it complements them. Network policies operate at L3/L4 and are enforced by the CNI; the mesh operates at L7 through sidecars or an eBPF dataplane. Running both in parallel builds layered defense.

Frequently asked questions

Is it safe to pentest K8s in production?

Yes, with a well-defined scope and a non-disruptive methodology. Enumeration, RBAC review, configuration analysis and image scanning are non-intrusive. Active exploitation against the control plane and container escape attempts run in agreed windows, on staging replicas when possible, or limited to an isolated namespace.

Does managed Kubernetes simplify security?

It reduces the operational burden of the control plane and applies saner defaults, but it does not exempt the customer from most controls. RBAC, network policies, pod security, secrets, supply chain and runtime defense remain customer responsibility. It also adds the complexity of cloud IAM federation (IRSA, Workload Identity, Azure AD Workload Identity), a frequent source of findings.

Does the CIS Benchmark cover everything?

It covers baseline cluster and node posture thoroughly, but it does not audit application code, does not deeply review supply chain (signatures, SBOM, provenance), does not evaluate the service mesh and does not go into the business threat model. It is a starting point and a minimum requirement, not a ceiling.

Who owns the cluster vs the applications?

The typical split is platform (cluster, CNI, ingress, mesh, baseline policy) and application (manifests, images, secrets, configuration). The platform team sets guardrails with OPA Gatekeeper or Kyverno; application teams build inside them. Auditing K8s means reviewing both planes and the interface between them.

Should supply chain images be audited?

Yes, mandatorily. Every image running in the cluster is third-party code executing inside your perimeter. The recommended practice combines scanning with Trivy or Grype in CI/CD, signing with Cosign, admission verification, mirroring to a private registry and periodic rebuild of base images.

How long does a Kubernetes pentest take?

For a small cluster with fewer than fifty workloads, between one and two weeks. For production clusters with multiple namespaces, multi-tenancy, service mesh and cloud federation, between three and five weeks. Time roughly splits as: recon and enumeration 20%, RBAC and configuration 25%, exploitation 25%, supply chain 15%, runtime and reporting 15%.

K8s audit with Secra

At Secra we run full offensive Kubernetes audits: cluster pentest from external and internal perspectives, exhaustive RBAC review with identification of over-privileged bindings, image supply chain analysis with CVE scanning and signature verification, and CIS Benchmark hardening with a prioritized remediation plan. We work on self-managed clusters and on EKS, GKE, AKS and OpenShift, integrating findings with NIS2, ENS and ISO 27001 when applicable.

If your team operates Kubernetes in production and you want to know the real security posture of your cluster, let's talk and we will design the pentest scope together.

About the author

Secra Solutions team

Ethical hackers with OSCP, OSEP, OSWE, CRTO, CRTL and CARTE certifications, 7+ years of experience in offensive cybersecurity, and authors of CVE-2025-40652 and CVE-2023-3512.
