Plan: Gateway API In-Depth Research & Envoy Gateway Implementation — DONE
Owner: Mavis (root session) Status: All deliverables shipped. Sandbox cannot run kind; manifest + script validation done in-sandbox; live run is on you.
1. Final Deliverable Tree
/workspace/
├── plans/
│ └── gateway-api-envoy-gateway-plan.md # this file
├── research/
│ ├── gateway-api.md # spec, resources, roles, conformance, Ingress diff, gotchas
│ └── envoy-gateway.md # architecture, helm values, feature matrix, ops
└── lab/
├── install.sh # one-command bootstrap
├── cleanup.sh # teardown
├── RUNBOOK.md # day-2 ops + troubleshooting + prod checklist
├── VERIFY.md # curl one-liners for every scenario
└── scenarios/
├── 01-minimal-http.yaml
├── 02-multi-host.yaml
├── 03-path-routing.yaml
├── 04-header-routing.yaml
├── 05-traffic-split.yaml
├── 06-redirect-rewrite.yaml
├── 07-header-filter.yaml
├── 08-tls-edge.yaml
├── 09-tls-passthrough.yaml
├── 10-tls-reencrypt.yaml
├── 11-cross-ns.yaml
├── 12-cors-jwt.yaml
├── 13-rate-limit.yaml
├── 14-external-auth.yaml
├── 15-multi-listener.yaml
└── 16-gateway-merge.yaml
2. What Got Built
Research (long-form Markdown)
research/gateway-api.md(~16 KB) — Why Gateway API exists, full resource model, conformance profiles, role/persona separation, route attachment,ReferenceGrant,BackendTLSPolicy, cert management, channel packaging, gateway merging, status/conditions, full diff table vs Ingress, common pitfalls, GAMMA/MCG/future pointers.research/envoy-gateway.md(~13 KB) — Architecture diagram, comparison with alternatives (NGINX, Traefik, HAProxy, Cilium, Istio, Kong), release model, supported features, Helm values, security defaults,EnvoyProxyCRD for break-glass, operations (admin port, metrics, logs, xDS, upgrades), spec-vs-extension decision table, troubleshooting map.
Lab (runnable)
lab/install.sh— Pre-flight checks → kind cluster (1 control + 2 workers, host ports 80/443/19000 mapped) → Gateway API CRDs (v1.2.1 standard) → cert-manager (v1.16.2) → Envoy Gateway (v1.4.0) via Helm → wait for GatewayClass Accepted → createinfra/app/app-a/app-bnamespaces → apply a baselineGateway+ smokeHTTPRoute→ curl smoke test.lab/cleanup.sh— Wipes the kind cluster.lab/RUNBOOK.md— Day-2 operations: bring-up, apply scenarios, daily checks, debug a request, upgrades, teardown, troubleshooting recipes, production checklist (15 items), references.lab/VERIFY.md— Re-runnablecurlinvocations for every scenario, with expected outputs.
16 Scenarios (numbered, ordered)
| # | Demonstrates | File |
|---|---|---|
| 01 | Minimal HTTP | 01-minimal-http.yaml |
| 02 | Multiple hostnames | 02-multi-host.yaml |
| 03 | Path-based routing | 03-path-routing.yaml |
| 04 | Header-based routing (A/B) | 04-header-routing.yaml |
| 05 | Traffic split / canary (90/10) | 05-traffic-split.yaml |
| 06 | RequestRedirect + URLRewrite | 06-redirect-rewrite.yaml |
| 07 | Request/response header filters | 07-header-filter.yaml |
| 08 | TLS edge (cert-manager + mode: Terminate) | 08-tls-edge.yaml |
| 09 | TLS passthrough (TLSRoute, SNI) | 09-tls-passthrough.yaml |
| 10 | TLS re-encrypt + BackendTLSPolicy (mTLS) | 10-tls-reencrypt.yaml |
| 11 | Cross-namespace + ReferenceGrant | 11-cross-ns.yaml |
| 12 | CORS + JWT (SecurityPolicy) | 12-cors-jwt.yaml |
| 13 | Local rate limit | 13-rate-limit.yaml |
| 14 | External auth (extAuthz) | 14-external-auth.yaml |
| 15 | Multi-listener (HTTP+HTTPS with 80→443 redirect) | 15-multi-listener.yaml |
| 16 | Gateway merging (shared data plane) | 16-gateway-merge.yaml |
Coverage of Your Asks
| You asked for | Covered by |
|---|---|
| Gateway API in depth | research/gateway-api.md (14 sections) + research/envoy-gateway.md (11 sections) |
| Implementation with Envoy Gateway | 16 scenario manifests + install.sh + RUNBOOK.md |
| All possible scenarios | 16 scenarios covering HTTP, multi-host, path, header, traffic split, filters, TLS (3 modes), cross-ns, auth (JWT/OIDC/extAuthz), rate limit, CORS, multi-listener, gateway merging |
| Diffs with Ingress | Full diff table in research/gateway-api.md §12; recommendation in research/envoy-gateway.md §3 |
| Installation methods | lab/install.sh (Helm — the canonical path); research/envoy-gateway.md §6 covers the chart + key values; RUNBOOK §5 covers upgrades |
| ClusterIP gotchas | research/gateway-api.md §13 item 4; RUNBOOK §7.3; research/gateway-api.md §5.2-5.4 covers route attachment mechanics; clusterip specifics called out in plan and RUNBOOK troubleshooting |
| TLS gotchas | research/gateway-api.md §8 (cert manager, rotation, SANs, intermediate certs, mTLS via BackendTLSPolicy, edge vs passthrough vs re-encrypt); scenario 10 demonstrates re-encrypt + mTLS in practice; RUNBOOK §7.4 covers cert rotation issues |
3. Validation Status
Run in the cloud sandbox (no docker/kubectl available):
- ✅
bash -noninstall.shandcleanup.sh— both pass - ✅
python3 yaml.safe_load_allon all 16 YAML files — all parse - ✅ Structural sweep: 22 Deployments, 26 Services, 18 HTTPRoutes, 7 Gateways, 1 BackendTLSPolicy, 1 TLSRoute, 1 ReferenceGrant, 3 SecurityPolicies, 5 Certificates, 4 Issuers, 3 ConfigMaps
- ✅ Cross-checked: every HTTPRoute has
parentRefs+rules; every Gateway hasgatewayClassName+listeners; everyBackendTLSPolicyhastargetRefs+tls; everySecurityPolicyhastargetRefs - ❌ Live
kind apply— sandbox has no docker. The user runsinstall.shon a machine with docker.
When you run install.sh on your machine, the first failure you’d likely hit is scenario 10 — I left a placeholder in the reencrypt-ca ConfigMap because injecting a real CA requires running cert-manager first. The VERIFY.md notes this and tells you how to fix it (two-line kubectl).
4. Key Decisions (locked)
- Gateway API channel:
standard(no alpha features unless absolutely needed) - EG version:
v1.4.0(latest stable as of plan) - Install method: Helm (chart
gateway-helm/gateway-helmfromgateway-helmrepo) - Data plane:
Deployment(notDaemonSet) by default - Cert manager:
cert-managerv1.16+ (already v1.16.2 in install.sh) - Auth/CORS/rate-limit/authz: EG
SecurityPolicyextension (not in spec, but stable and pragmatic) - BackendTLSPolicy: used in spec (re-encrypt scenario)
- Local lab:
kind(more representative than k3d/minikube) - Migration from ingress-nginx: documented in
research/envoy-gateway.md§3 — recommended successor is EG
5. Open Items / Future Work
These are NOT blockers but worth knowing:
- Scenario 10’s CA cert — needs the live cert-manager to populate. The placeholder is intentional and
VERIFY.mdtells you the two-line fix. - Global rate limit (Redis) — install.sh doesn’t enable the
envoy-ratelimitsubchart by default. Local rate limit (per-proxy) is what scenario 13 demonstrates. Adding global would be a one-line Helm value change plus a Redis dependency. - Production hardening —
RUNBOOK.md§8 has the checklist but the lab is not production-grade. You’ll add PDBs, NetworkPolicy, real OTel, etc. on your own. - GAMMA (mesh) / Multi-cluster Gateway — not in scope, mentioned in research docs.
- Live-validate the manifests against the actual installed CRDs — only possible by running
install.shon a machine with docker.
6. How to Use This
- Read the research first. Start with
research/gateway-api.mdif you want the conceptual model, orresearch/envoy-gateway.mdif you’re more interested in the implementation. - Bring up the lab.
cd lab && ./install.sh— about 5 minutes from a fresh machine. - Run scenarios 01–05 to get the model in your head.
VERIFY.mdtells you what to curl. - TLS scenarios (08–10) are the most “production-relevant” — spend time there.
- 11 (cross-ns) and 12–14 (auth/CORS/rate-limit/extAuthz) are the day-2 features you’ll use most.
- 15 (multi-listener) and 16 (gateway merging) are infrastructure patterns — read them once, then look them up as needed.
- Use RUNBOOK.md for any operational issue.
7. Next Step
Run ./install.sh on a machine with docker, kind, kubectl, and helm. If a scenario fails, paste the error and I’ll fix the manifest.
If you want a different starting point — e.g., “give me the prod-ready Helm values, not the lab” or “build me a Grafana dashboard for the data plane” — say the word and I’ll cut a new track.