Prefix Delegation

Overview

Prefix delegation increases the number of pods per node by assigning IP prefixes (/28 blocks of 16 IPs) to ENIs instead of individual secondary IP addresses. Each address slot on an ENI then holds 16 pod IPs instead of one, which substantially raises pod capacity on Nitro-based instances.

Why Prefix Delegation?

Standard Mode (No Prefix Delegation)

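Component             Limit
ENIs per instance     Instance-dependent (e.g., m5.xlarge = 4)
IPs per ENI           Instance-dependent (e.g., m5.xlarge = 15, including the primary)
Usable IPs per ENI    14 (one of the 15 is the ENI's primary IP, used by the node)
Max pods formula      (ENIs × (IPs per ENI - 1)) + 2 = (4 × 14) + 2 = 58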

Prefix Delegation Mode

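Component             Limit
ENIs per instance     Instance-dependent (e.g., m5.xlarge = 4)
Prefixes per ENI      One per secondary IP slot (e.g., m5.xlarge = 14)
IPs per prefix        16 IPs (/28)
Usable IPs per ENI    14 × 16 = 224
Max pods formula      (ENIs × (IPs per ENI - 1) × 16) + 2 = (4 × 14 × 16) + 2 = 898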
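The two max-pods formulas can be checked with a little shell arithmetic. This is a minimal sketch, assuming the per-ENI figure is the IPv4 address count reported for the instance type (one of which is the ENI's primary address) and that each remaining address slot can hold one /28 prefix. The function names are illustrative and not part of any AWS tool.

```shell
# Hypothetical helpers (names are illustrative only).
# $1 = ENIs per instance
# $2 = IPv4 addresses per ENI, including the ENI's primary address

max_pods_standard() {
  # One pod IP per secondary address, plus 2 for host-network pods
  echo $(( $1 * ($2 - 1) + 2 ))
}

max_pods_prefix() {
  # Assumption: every secondary address slot holds one /28 prefix (16 IPs)
  echo $(( $1 * ($2 - 1) * 16 + 2 ))
}

max_pods_standard 4 15   # m5.xlarge, prints 58
max_pods_prefix 4 15     # m5.xlarge, prints 898
```

The prefix-mode result is a theoretical ceiling on IP capacity; the pod limit configured on the node can be lower.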

Pod Capacity Comparison

Instance Type   Standard Mode   With Prefix Delegation   Increase
t3.medium       17              ~110                     6.5×
t3.large        35              ~234                     6.7×
m5.xlarge       58              ~722                     12.4×
m5.2xlarge      118             ~1474                    12.5×
c5.4xlarge      234             ~2922                    12.5×

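Note: These figures are approximate. Exact numbers depend on the instance type's ENI and per-ENI address limits, on ENIs attached for other purposes, and on the pod limit configured on the node, so compute the value for your own instance type before relying on it.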

How It Works

Without Prefix Delegation

ENI (eth0)
├── Primary IP: 10.0.1.10/32 (node)
├── Secondary IP: 10.0.1.11/32 → Pod A
├── Secondary IP: 10.0.1.12/32 → Pod B
├── Secondary IP: 10.0.1.13/32 → Pod C
... (up to the instance type's limit, e.g., 14 secondary IPs on m5.xlarge)

With Prefix Delegation

ENI (eth0)
├── Primary IP: 10.0.1.10/32 (node)
├── /28 Prefix: 10.0.1.16/28
│   ├── 10.0.1.16/32 → Pod A
│   ├── 10.0.1.17/32 → Pod B
│   ├── 10.0.1.18/32 → Pod C
│   ... (16 IPs per prefix)
├── /28 Prefix: 10.0.1.32/28
│   ├── 10.0.1.32/32 → Pod D
│   ├── 10.0.1.33/32 → Pod E
│   ... (16 more pods)
... (e.g., 14 prefixes per ENI on m5.xlarge = 224 pod IPs per ENI)
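To make the layout above concrete, the sketch below lists the 16 addresses covered by a /28 prefix. It is illustrative only: the function name is made up, it handles only /28 prefixes, and it does no input validation.

```shell
# list_prefix_ips <a.b.c.d/28> : print the 16 addresses in that /28 block
list_prefix_ips() {
  base=${1%/*}                 # strip the /28 suffix
  net=${base%.*}               # first three octets, e.g. 10.0.1
  last=${base##*.}             # last octet, e.g. 16
  start=$(( last / 16 * 16 ))  # align down to the start of the /28 block
  i=0
  while [ "$i" -lt 16 ]; do
    echo "$net.$(( start + i ))"
    i=$(( i + 1 ))
  done
}

list_prefix_ips 10.0.1.16/28   # prints 10.0.1.16 through 10.0.1.31
```

Each of these addresses can be handed to one pod as a /32, which is why a single prefix serves up to 16 pods.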

Prerequisites

  • Nitro-based instance types (T3 and most current generations qualify; T2 and other Xen-based types do not)
  • VPC CNI v1.9.0 or later
  • Kubernetes 1.18 or later (for best support)

Instance Type Support

Prefix delegation works on most Nitro-based instances:

# Show the ENI and per-ENI IPv4 address limits that determine prefix capacity
aws ec2 describe-instance-types \
  --instance-types t3.medium t3.large m5.xlarge m5.2xlarge c5.xlarge c5.4xlarge \
  --query 'InstanceTypes[*].[InstanceType,NetworkInfo.MaximumNetworkInterfaces,NetworkInfo.Ipv4AddressesPerInterface]'
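The query above reports address limits but not whether a type is Nitro-based. The same API also returns a Hypervisor field (nitro or xen), so one way to narrow a candidate list is to filter on it. The sketch below assumes text output with the instance type in the first column and the hypervisor in the second; the helper name nitro_only is hypothetical.

```shell
# Reads "<instance-type> <hypervisor>" rows on stdin and prints the Nitro ones.
nitro_only() {
  awk '$2 == "nitro" { print $1 }'
}

# Intended use (needs AWS credentials, so it is left commented out here):
# aws ec2 describe-instance-types \
#   --instance-types t2.medium t3.medium m5.xlarge \
#   --query 'InstanceTypes[*].[InstanceType,Hypervisor]' \
#   --output text | nitro_only

# Offline demonstration with sample rows:
printf 't2.medium xen\nt3.medium nitro\nm5.xlarge nitro\n' | nitro_only
```

With the sample rows, only t3.medium and m5.xlarge are printed.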

Unsupported

  • T2 instances (Xen-based, not Nitro)
  • Some older non-Nitro instance types
  • Windows nodes

Configuration

Enable Prefix Delegation

# Via kubectl
kubectl set env daemonset/aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true

Complete DaemonSet Environment

env:
- name: ENABLE_PREFIX_DELEGATION
  value: "true"
- name: WARM_PREFIX_TARGET
  value: "1"
- name: MINIMUM_IP_TARGET
  value: "16"
- name: WARM_IP_TARGET
  value: "16"

Verify Configuration

# Check that prefix delegation is enabled on the DaemonSet
kubectl describe daemonset aws-node -n kube-system | grep ENABLE_PREFIX_DELEGATION

# View allocated prefixes (run where the AWS CLI is configured, not inside the aws-node pod)
aws ec2 describe-network-interfaces \
  --filters Name=attachment.instance-id,Values=<instance-id> \
  --query 'NetworkInterfaces[*].[NetworkInterfaceId,Ipv4Prefixes]'

# Inspect the ipamd state file; its JSON layout varies by CNI version
kubectl exec -n kube-system aws-node-xxxx -- \
  cat /var/run/aws-node/ipam.json | jq .

Transitioning from Standard to Prefix Delegation

Critical: Do Not Rolling Replace Nodes

Important: When transitioning from standard mode (secondary IPs) to prefix delegation mode, create new node groups with prefix delegation enabled. Do not attempt to enable prefix delegation on existing nodes via rolling replacement.

  1. Enable prefix delegation on the aws-node DaemonSet (see Configuration), then create a new node group
# new-nodegroup-prefix.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  - name: ng-prefix
    instanceType: m5.xlarge
    desiredCapacity: 3
    labels:
      ip-mode: prefix
    # Prefix delegation is a setting of the aws-node DaemonSet, not of the node.
    # Raise the kubelet pod limit so the extra IPs can be used (110 is an example value).
    maxPodsPerNode: 110
# Apply the config
eksctl create nodegroup --config-file=new-nodegroup-prefix.yaml
  2. Cordon old nodes
kubectl cordon <old-node-name>
  3. Drain old nodes
kubectl drain <old-node-name> --ignore-daemonsets
  4. Delete old node group
eksctl delete nodegroup --cluster my-cluster --name <old-ng-name>

Why Rolling Replace Doesn’t Work

When a node joins the cluster:

  1. Node gets ENIs attached
  2. ipamd starts with existing ENIs (secondary IPs, not prefixes)
  3. Transitioning to prefix mode would require full ENI detachment/reattachment
  4. This causes pod disruption and potential networking issues

WARM Prefix Target vs WARM IP Target

WARM_PREFIX_TARGET (Prefix Delegation Mode)

# Keep specified number of /28 prefixes warm
kubectl set env daemonset/aws-node -n kube-system \
  WARM_PREFIX_TARGET=1
Value         Behavior
0             Allocate prefixes only when needed
1 (default)   Keep 1 prefix (16 IPs) warm
2             Keep 2 prefixes (32 IPs) warm

Interaction with WARM_IP_TARGET

When ENABLE_PREFIX_DELEGATION=true:

  • WARM_IP_TARGET overrides WARM_PREFIX_TARGET
  • If WARM_IP_TARGET=16 and WARM_PREFIX_TARGET=1, the behavior follows WARM_IP_TARGET
# WARM_IP_TARGET takes precedence over WARM_PREFIX_TARGET
kubectl set env daemonset/aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_IP_TARGET=32 \
  MINIMUM_IP_TARGET=64
Scenario   WARM_IP_TARGET   WARM_PREFIX_TARGET   Result
A          16               1                    16 IPs warm (via prefixes)
B          32               1                    32 IPs warm (2 prefixes needed)
C          not set          1                    16 IPs warm (1 prefix)
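The scenarios above can be reproduced with a small calculation. The sketch below is a simplified model, not the actual ipamd logic: it assumes the node needs enough whole /28 prefixes to cover the IPs in use plus WARM_IP_TARGET, and never fewer IPs than MINIMUM_IP_TARGET. The function name is hypothetical.

```shell
# prefixes_needed <ips_in_use> <warm_ip_target> <minimum_ip_target>
# Simplified model: prints how many /28 prefixes cover the requested IPs.
prefixes_needed() {
  want=$(( $1 + $2 ))              # assigned IPs plus the warm buffer
  if [ "$want" -lt "$3" ]; then
    want=$3                        # never go below the minimum target
  fi
  echo $(( (want + 15) / 16 ))     # round up to whole 16-IP prefixes
}

prefixes_needed 0 16 0     # scenario A, prints 1
prefixes_needed 0 32 0     # scenario B, prints 2
prefixes_needed 10 32 64   # 42 IPs wanted, minimum 64 applies, prints 4
```

Because prefixes are allocated whole, an IP target that is not a multiple of 16 still consumes a full extra prefix.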

Using with Security Groups for Pods

Prefix delegation works with Security Groups for Pods (SGP):

# Both can be enabled together
env:
- name: ENABLE_PREFIX_DELEGATION
  value: "true"
- name: ENABLE_POD_ENI
  value: "true"
- name: POD_SECURITY_GROUP_ENFORCING_MODE
  value: "standard"

Branch ENI Behavior with Prefix Delegation

Feature                    Standard Mode      With Prefix Delegation
Branch ENIs per instance   Instance limit     Same instance limit
IPs per branch ENI         1 (primary only)   1 (primary only)
Prefix delegation effect   N/A                Does not affect branch ENI pods

Branch ENI pods (with SGP) still get a single primary IP from their dedicated ENI; prefix delegation affects only standard pods.

Verifying Prefix Delegation is Working

Check Pod Network Interface

# From a pod scheduled on a prefix-delegation node (the image must include the ip tool)
kubectl exec <pod-name> -- ip addr show eth0

# The pod holds a single /32 address drawn from one of the node's /28 prefixes
# Example output (truncated):
# inet 10.0.1.17/32 scope global eth0

Check CNI Logs for Prefix Allocation

# Look for prefix-related logs
kubectl logs -n kube-system -l k8s-app=aws-node --tail=100 | grep -i prefix

Log entries similar to the following indicate prefix allocation (exact wording varies by CNI version):

level=debug msg="Creating/deleting ENI"
level=debug msg="Allocating prefix" ipv4Prefix=10.0.1.16/28
level=info msg="ishi: isPrimaryDevice: true, getDeviceNumber: 0"

Check ENI Prefixes

# Run where the AWS CLI is configured, not inside the aws-node pod
aws ec2 describe-network-interfaces \
  --filters Name=attachment.instance-id,Values=<instance-id> \
  --query 'NetworkInterfaces[*].[NetworkInterfaceId,Ipv4Prefixes]'

Example output:

[
    ["eni-abc123", [{"Ipv4Prefix": "10.0.1.16/28"}, {"Ipv4Prefix": "10.0.1.32/28"}]],
    ["eni-def456", [{"Ipv4Prefix": "10.0.1.48/28"}]]
]
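Output in this shape can be summarized without extra tooling. The sketch below counts the prefixes in the JSON and multiplies by 16 to estimate the pod IP capacity they provide; the function name is hypothetical, and the count relies on a plain text match for the "Ipv4Prefix" key rather than a real JSON parser.

```shell
# Reads describe-network-interfaces JSON on stdin and prints the number of
# delegated prefixes and the pod IPs they cover (16 per /28 prefix).
prefix_capacity() {
  count=$(grep -o '"Ipv4Prefix"' | wc -l | tr -d ' ')
  echo "$count prefixes, $(( count * 16 )) pod IPs"
}

# Demonstration using the example output above
prefix_capacity <<'EOF'
[
    ["eni-abc123", [{"Ipv4Prefix": "10.0.1.16/28"}, {"Ipv4Prefix": "10.0.1.32/28"}]],
    ["eni-def456", [{"Ipv4Prefix": "10.0.1.48/28"}]]
]
EOF
```

For the example output this prints "3 prefixes, 48 pod IPs".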

Limitations

  1. Linux only - Prefix delegation not supported on Windows nodes
  2. Nitro instances only - Not available on T2 or other non-Nitro instance types
  3. No downgrade below v1.9.0 - Once prefix delegation is enabled, the VPC CNI cannot be downgraded below v1.9.0 without replacing the nodes
  4. Mixed mode limitations - All pods on a node should use the same mode
  5. External SNAT behavior - With POD_SECURITY_GROUP_ENFORCING_MODE=standard and AWS_VPC_K8S_CNI_EXTERNALSNAT=false, pod traffic leaving the VPC uses the node’s security groups

Performance Considerations

Pod Launch Latency

  • First pod on a new ENI may have slightly higher latency (prefix assignment)
  • A warm prefix pool avoids this delay in most cases

Memory Usage

ipamd memory usage increases slightly with prefix delegation (tracking more IPs):

  • Standard: ~100MB typical
  • Prefix Delegation: ~120MB typical (20% increase)

Common Issues

Issue: Pods stuck in Pending after enabling prefix delegation

Diagnosis:

# Check if prefix delegation is actually enabled
kubectl exec -n kube-system aws-node-xxxx -- \
  env | grep PREFIX
 
# Check ipamd logs
kubectl logs -n kube-system aws-node-xxxx -c aws-node --tail=50 | grep -i prefix

Solution: Nodes may need to be recycled to pick up the configuration. Create new nodes with prefix delegation enabled.

Issue: “Insufficient IP address capacity” error

Cause: Transitioning nodes but old allocation mode still active on some nodes.

Solution: Complete the node transition - all nodes should use prefix delegation.

IPv6 Prefix Delegation

Prefix delegation for IPv6 uses /80 prefixes:

# Enable IPv6 mode (requires prefix delegation; the cluster must already use the IPv6 address family)
kubectl set env daemonset/aws-node -n kube-system \
  ENABLE_IPv6=true \
  ENABLE_PREFIX_DELEGATION=true

IPv6 prefix delegation follows the same principles as IPv4, but each /80 prefix holds far more addresses than a /28.

References