GitOps: git is the source of truth for both app code AND infrastructure. A controller (Argo CD, Flux) pulls from git, applies to the cluster, and reconciles continuously. The cluster is always told what to look like, not told what to do.
The two models
| Pattern | Model | Tools | State |
|---|---|---|---|
| Push | CI pushes to cluster (kubectl apply) | Jenkins, GitHub Actions | Cluster state can drift |
| Pull (GitOps) | Controller pulls from git | Argo CD, Flux | Cluster state converges to git |
GitOps is pull-based. The cluster decides what to run, not CI.
The flow
Developer          Git             CI             GitOps Controller        Cluster
    │               │               │                     │                   │
    │    commit     │               │                     │                   │
    │ ────────────> │               │                     │                   │
    │               │  push event   │                     │                   │
    │               │ ────────────> │                     │                   │
    │               │               │ test, build, push   │                   │
    │               │               │ image: myapp:v123   │                   │
    │               │ <──────────── │                     │                   │
    │               │  update tag   │                     │                   │
    │               │               │                     │                   │
    │               │            poll/webhook             │                   │
    │               │ <────────────────────────────────── │                   │
    │               │               │                     │  detect diff      │
    │               │               │                     │  apply manifest   │
    │               │               │                     │ ────────────────> │
    │               │               │                     │                   │
    │               │               │                     │  health check     │
    │               │               │                     │ <──────────────── │
    │               │               │                     │                   │
Key insight: CI does not touch the cluster. CI updates git (image tag, manifest, etc.). The GitOps controller reconciles.
The four principles
From the OpenGitOps spec:
- Declarative — the entire system is described declaratively (yaml, json, etc.)
- Versioned and immutable — stored in git, with full version history
- Pulled automatically — software agents pull the desired state, not humans pushing
- Continuously reconciled — agents observe and apply, not just on event
The two main tools
Argo CD
CNCF Graduated. The most popular.
Pros:
- Web UI
- Multi-cluster support
- App of Apps pattern
- Rich RBAC
- Notifications
- Sync waves
- Resource hooks
- Helm, Kustomize, Jsonnet, plain manifests
Cons:
- More complex than Flux
- More components to run (API server, repo server, Redis cache)
- Heavier resource footprint
Flux CD
CNCF Graduated. The CNCF reference.
Pros:
- Lighter weight
- Composable (GitOps Toolkit)
- Multi-tenancy
- Image automation
- Native Helm + Kustomize
- CRDs are the interface
Cons:
- No built-in UI (use Weave GitOps)
- Less out-of-box features
Comparison
| Feature | Argo CD | Flux |
|---|---|---|
| Web UI | ✅ built-in | ❌ use Weave GitOps |
| Multi-cluster | ✅ hub-spoke | ✅ hub-spoke |
| Image automation | ✅ via Image Updater | ✅ built-in |
| RBAC | ✅ rich | ✅ simpler |
| Notifications | ✅ built-in | ✅ via Notification Controller |
| Helm | ✅ | ✅ |
| Kustomize | ✅ | ✅ |
| Jsonnet | ✅ | ❌ |
| Helm values | ✅ | ✅ |
| OCI registry | ✅ | ✅ |
| App of Apps | ✅ | ✅ (Kustomization) |
| Sync waves | ✅ | ✅ (dependsOn) |
| Drift detection | ✅ | ✅ |
| Resource hooks | ✅ | ❌ |
| Multi-tenancy | ✅ Projects | ✅ namespaces |
For most teams: Argo CD offers the better UX (UI, hooks, app-centric model), while Flux stays closer to a pure GitOps model (everything is a CRD reconciled from git). Both work.
The repository structure
A GitOps repo is structured around apps and environments.
Pattern 1: one repo per app (simple)
my-app/
├── base/                      # common manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/                  # env-specific
    ├── dev/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-resources.yaml
    └── prod/
        ├── kustomization.yaml
        ├── patch-replicas.yaml
        └── patch-resources.yaml
One repo per app. Each app has its own git history, RBAC, etc. Easy for app teams to own.
Pattern 2: monorepo (centralized)
gitops/
├── apps/
│   ├── my-app/
│   │   ├── base/
│   │   └── overlays/
│   └── other-app/
├── infrastructure/
│   ├── cert-manager/
│   ├── ingress-nginx/
│   └── monitoring/
└── clusters/
    ├── dev/
    │   ├── apps.yaml          # which apps run in dev
    │   └── infra.yaml
    └── prod/
        ├── apps.yaml
        └── infra.yaml
One repo for everything. Easier to manage at scale, but more access control complexity.
Pattern 3: environment-per-repo (separation of concerns)
gitops-dev/
└── apps/
gitops-staging/
└── apps/
gitops-prod/
└── apps/
One repo per environment. Strongest separation, but most overhead.
When to use which
| Pattern | Best for |
|---|---|
| App-per-repo | Small orgs, independent apps |
| Monorepo | Platform team owns ops, app teams contribute |
| Env-per-repo | Strict change control, audit requirements |
The reconciliation model
GitOps controllers continuously reconcile:
git commit → desired state in git
      ↓
controller → reads git
      ↓
controller → compares to cluster state
      ↓
diff exists?
├── no  → done
└── yes → apply desired state
              ↓
          health check
              ↓
          success?
          ├── yes → done
          └── no  → retry / alert
Self-healing: if someone changes the cluster manually, the controller reverts. This is the key value of GitOps.
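The loop above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Argo CD or Flux are implemented: git and cluster state are plain dicts keyed by resource name, and the function names plan and reconcile are made up for illustration.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, object]   # simplified stand-in for a resource body
State = Dict[str, Manifest]    # resource name -> manifest


def plan(desired: State, live: State, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, name) steps needed to make live match desired."""
    steps: List[Tuple[str, str]] = []
    for name, manifest in sorted(desired.items()):
        if name not in live:
            steps.append(("create", name))
        elif live[name] != manifest:
            steps.append(("update", name))   # new commit or manual drift
    if prune:
        for name in sorted(live.keys() - desired.keys()):
            steps.append(("delete", name))   # resource was removed from git
    return steps


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply the plan to a copy of live and return the resulting state."""
    result = dict(live)
    for action, name in plan(desired, live, prune):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


git = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
cluster = {"deploy/web": {"replicas": 5}, "cm/old": {"k": "v"}}  # manual edit + leftover

print(plan(git, cluster))
# [('update', 'deploy/web'), ('create', 'svc/web'), ('delete', 'cm/old')]
cluster = reconcile(git, cluster)
print(plan(git, cluster))  # [] once the cluster has converged
```

The manual change to replicas shows up as an ordinary update step, which is all that self-healing amounts to in this model: the controller never needs to know who caused the diff.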
Sync options
Manual vs auto sync
# manual sync
syncPolicy:
  automated: null     # requires manual click / API call
# OR
syncPolicy: {}        # no automated block, defaults to manual
# auto sync
syncPolicy:
  automated:
    prune: true       # delete resources removed from git
    selfHeal: true    # revert manual changes
    allowEmpty: false
Manual sync: the controller shows you the diff, you click sync. Safer for prod.
Auto sync: the controller applies changes without intervention. Faster, but risk of unintended changes.
Prune: when you remove a resource from git, it’s also removed from the cluster.
Self-heal: when someone manually changes the cluster, the controller reverts.
Sync waves
For ordered deployments:
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # default wave; lower numbers sync first
Lower numbers are applied first, higher numbers later. Use this for:
- Database migrations before app
- App before monitoring
- ConfigMaps before Pods
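As a rough illustration of the ordering rule, the sketch below groups manifests by their sync-wave annotation and returns the waves in the order they would be applied. It assumes manifests are already parsed into dicts and treats a missing annotation as wave 0; Argo CD also orders by phase and resource kind, which this sketch ignores. The helper names wave_of and sync_order are hypothetical.

```python
from itertools import groupby
from typing import Dict, List

WAVE_KEY = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: Dict) -> int:
    """Read the wave annotation; resources without one are in wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_KEY, "0"))


def sync_order(manifests: List[Dict]) -> List[List[str]]:
    """Return resource names grouped by wave, lowest wave first."""
    ordered = sorted(manifests, key=wave_of)  # stable: keeps input order within a wave
    return [
        [m["metadata"]["name"] for m in group]
        for _, group in groupby(ordered, key=wave_of)
    ]


manifests = [
    {"kind": "Deployment", "metadata": {"name": "web", "annotations": {WAVE_KEY: "1"}}},
    {"kind": "ConfigMap", "metadata": {"name": "app-config", "annotations": {WAVE_KEY: "-1"}}},
    {"kind": "Job", "metadata": {"name": "db-migrate"}},  # no annotation -> wave 0
]

print(sync_order(manifests))
# [['app-config'], ['db-migrate'], ['web']]
```

Negative waves are a convenient way to say "before everything that has no annotation", as the ConfigMap here shows.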
Resource hooks
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
Hooks let you run a Job before/after sync (e.g., DB migration, cache invalidation).
Retry and backoff
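syncPolicy:
  retry:
    limit: 5
    backoff:
      duration: 5s
      factor: 2
      maxDuration: 3m
If a sync attempt fails, the controller retries up to limit times, multiplying the wait by factor after each attempt and capping it at maxDuration. Useful for flaky resources such as Jobs.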
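To see what those numbers mean in practice, the sketch below computes the wait before each retry, assuming the usual exponential rule (duration × factor^attempt, capped at maxDuration). The function names are illustrative, and the duration parser only understands the s/m/h suffixes used here.

```python
from typing import List


def parse_seconds(value: str) -> int:
    """Convert a simple duration such as '5s' or '3m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]


def backoff_schedule(limit: int, duration: str, factor: int, max_duration: str) -> List[int]:
    """Seconds to wait before each of the `limit` retries."""
    base = parse_seconds(duration)
    cap = parse_seconds(max_duration)
    return [min(base * factor ** attempt, cap) for attempt in range(limit)]


print(backoff_schedule(5, "5s", 2, "3m"))
# [5, 10, 20, 40, 80]
print(backoff_schedule(8, "5s", 2, "3m"))
# [5, 10, 20, 40, 80, 160, 180, 180]
```

With the values from the snippet, the cap never kicks in: five retries add up to roughly two and a half minutes of waiting. The cap only matters once limit is raised.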
Drift detection
The controller periodically checks whether the cluster matches git (Argo CD polls about every 3 minutes by default; Flux uses the interval set on each resource). If drifted:
- Argo CD: shows “OutOfSync” status, can be configured to alert, and reverts only if selfHeal is enabled
- Flux: the next reconcile reverts to git state
# manually check for drift
argocd app diff my-app
Drift sources:
- Manual kubectl apply
- A different controller modifying the resource
- A bug in the GitOps controller
Drift is bad — it means the cluster state doesn’t match what’s documented. Investigate root cause.
Secrets in GitOps
The classic problem: secrets shouldn’t be in plain git.
Solution 1: sealed-secrets
# install kubeseal
brew install kubeseal
# fetch the public key
kubeseal --fetch-cert \
  --controller-name=sealed-secrets \
  --controller-namespace=kube-system \
  > pub-cert.pem
# encrypt a secret
kubectl create secret generic my-secret \
  --from-literal=password=secretvalue \
  --dry-run=client -o yaml | \
  kubeseal --cert pub-cert.pem -o yaml > my-sealed-secret.yaml
The sealed secret is in git. The cluster’s controller decrypts it.
Pros: simple, works with any GitOps controller
Cons: encrypted to a specific cluster, can’t move between clusters
Solution 2: external-secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-secret
spec:
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: my-secret
  data:
    - secretKey: password
      remoteRef:
        key: my-app/prod/password
The controller reads from AWS Secrets Manager / Vault / etc. and creates a k8s Secret.
Pros: real secret store, rotation, audit
Cons: operator required, more complex
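Conceptually, the operator does a lookup and a copy. The sketch below mimics that mapping with a dict standing in for the external store. It is an illustration only: render_secret and the fake store are hypothetical, and the real operator also handles authentication, refresh intervals, and templating.

```python
import base64
from typing import Dict


def render_secret(external_secret: Dict, store: Dict[str, str]) -> Dict:
    """Build the Kubernetes Secret an ExternalSecret describes, reading values from `store`."""
    spec = external_secret["spec"]
    data = {}
    for item in spec["data"]:
        value = store[item["remoteRef"]["key"]]  # lookup in the external store
        # Secret.data values are base64-encoded strings
        data[item["secretKey"]] = base64.b64encode(value.encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": spec["target"]["name"]},
        "type": "Opaque",
        "data": data,
    }


external_secret = {
    "spec": {
        "target": {"name": "my-secret"},
        "data": [{"secretKey": "password", "remoteRef": {"key": "my-app/prod/password"}}],
    }
}
fake_store = {"my-app/prod/password": "s3cr3t"}  # stand-in for AWS Secrets Manager / Vault

secret = render_secret(external_secret, fake_store)
print(secret["metadata"]["name"], sorted(secret["data"]))
# my-secret ['password']
```

Only the ExternalSecret lives in git; the value itself never leaves the store except into the cluster.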
Solution 3: SOPS
# install sops
brew install sops
# encrypt a secret
sops --encrypt --age <public-key> secret.yaml > secret.enc.yaml
# decrypt
sops --decrypt secret.enc.yaml
Encrypted YAML in git. The controller decrypts it at apply time (Flux natively, Argo CD through a plugin).
Pros: works with any controller
Cons: asymmetric encryption keys need management
Recommendation
For cloud-managed secrets, External Secrets Operator (ESO, the external-secrets project shown in Solution 2) is the most production-ready option.
See security-baseline for the full secret management guide.
Image automation
The hardest part of GitOps: how does the image tag in git get updated when a new image is built?
Pattern 1: CI updates git
# in CI
git clone gitops-repo && cd gitops-repo
# kustomization.yaml pins the tag under images:, so let kustomize rewrite it
(cd apps/my-app/overlays/prod && kustomize edit set image myapp=myapp:"$TAG")
git commit -am "bump myapp to $TAG"
git push
CI has push access to the GitOps repo. Simple, works with any controller.
Cons: CI holds write access to the repo that drives production, and bot commits add noise to the git history.
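Whatever tool CI uses, the edit itself is small. The sketch below shows it as a text rewrite, assuming the overlay pins the tag through a kustomize images entry with a newTag field (an assumption; the overlay file is not shown in this guide). The bump_tag helper is hypothetical and deliberately refuses to proceed unless it finds exactly one match.

```python
import re


def bump_tag(kustomization: str, image: str, tag: str) -> str:
    """Replace newTag for `image` in a kustomization.yaml given as text."""
    pattern = re.compile(
        r"(- name: " + re.escape(image) + r"\n"   # the images entry for this image
        r"(?:[ \t]+[^-\s].*\n)*?"                 # other fields of the same entry
        r"[ \t]+newTag: ).*"                      # the tag value to replace
    )
    updated, count = pattern.subn(lambda m: m.group(1) + tag, kustomization)
    if count != 1:
        raise ValueError(f"expected exactly one newTag for {image!r}, found {count}")
    return updated


overlay = (
    "resources:\n"
    "- ../../base\n"
    "images:\n"
    "- name: myapp\n"
    "  newName: myregistry/myapp\n"
    "  newTag: v122\n"
)

print(bump_tag(overlay, "myapp", "v123"))
```

Failing loudly on zero or multiple matches matters here: a silent no-op would produce an empty commit and a deployment that never happens.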
Pattern 2: Image updater (Argo CD)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  annotations:
    argocd-image-updater.argoproj.io/image-list: myapp=myregistry/myapp
    argocd-image-updater.argoproj.io/myapp.update-strategy: latest
spec:
  # ...
The Image Updater watches the registry, finds new tags, and writes the new tag back: by default as a parameter override through the Argo CD API, or as a git commit (optionally to a separate branch that you review via PR) when the git write-back method is configured.
Pros: no CI access needed, automation
Cons: additional controller
Pattern 3: Flux Image Automation
Flux has built-in image automation:
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app
spec:
  image: myregistry/myapp
  interval: 1m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: 1.0.x
An ImageUpdateAutomation resource then commits the selected tag to git, either to the tracked branch or to a separate branch for review. Flux does not patch the cluster directly, so git stays the source of truth.
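The policy step boils down to "of all tags in the registry, pick the highest one inside the range". A minimal sketch of that selection, assuming plain MAJOR.MINOR.PATCH tags and x-style wildcards only (real semver constraint syntax is richer, and pre-release tags are ignored here). The function names are illustrative.

```python
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(tag: str) -> Optional[Version]:
    """Parse 'MAJOR.MINOR.PATCH'; return None for anything else (e.g. 'latest')."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def latest_in_range(tags: Iterable[str], range_: str) -> Optional[str]:
    """Return the highest tag matching an x-range such as '1.0.x', or None."""
    wanted = range_.split(".")
    candidates = []
    for tag in tags:
        version = parse(tag)
        if version is None:
            continue
        if all(w in ("x", "*") or int(w) == v for w, v in zip(wanted, version)):
            candidates.append((version, tag))
    return max(candidates)[1] if candidates else None


tags = ["1.0.2", "1.0.10", "1.1.0", "0.9.9", "latest"]
print(latest_in_range(tags, "1.0.x"))
# 1.0.10
```

Note that 1.0.10 beats 1.0.2 because versions are compared numerically, not as strings. That is the main reason to use a semver policy rather than sorting tags alphabetically.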
Multi-cluster GitOps
The cluster topology question.
Hub-spoke
┌────────────────┐
│ Hub cluster │
│ (Argo CD) │
│ │
│ connects to: │
│ - prod-us │
│ - prod-eu │
│ - staging │
│ - dev │
└────────────────┘
One cluster runs the GitOps controller. Other clusters are connected to it.
Pros: single pane of glass, single set of credentials
Cons: hub cluster is critical
Per-cluster
Each cluster has its own Argo CD / Flux.
Pros: no single point of failure, simpler blast radius
Cons: multiple UIs to manage, harder to see at scale
AppSets and multi-tenancy
Argo CD ApplicationSets let you define one template, generate many apps:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-app-all-clusters
spec:
  generators:
    - list:
        elements:
          - cluster: prod-us
            url: https://prod-us.example.com
          - cluster: prod-eu
            url: https://prod-eu.example.com
  template:
    metadata:
      name: '{{cluster}}-my-app'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/my-app
        targetRevision: HEAD
        path: overlays/{{cluster}}
      destination:
        server: '{{url}}'
One ApplicationSet, one source, N applications across clusters.
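The list generator is essentially template substitution repeated once per element. The sketch below reproduces that expansion for the example above. It is an approximation under stated assumptions: render and generate are hypothetical helpers, and only simple {{name}} placeholders are handled.

```python
import re
from typing import Any, Dict, List


def render(template: Any, params: Dict[str, str]) -> Any:
    """Recursively replace {{name}} placeholders in strings, dicts, and lists."""
    if isinstance(template, str):
        return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: params[m.group(1)], template)
    if isinstance(template, dict):
        return {key: render(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, params) for value in template]
    return template


def generate(elements: List[Dict[str, str]], template: Dict) -> List[Dict]:
    """One rendered Application per generator element."""
    return [render(template, element) for element in elements]


elements = [
    {"cluster": "prod-us", "url": "https://prod-us.example.com"},
    {"cluster": "prod-eu", "url": "https://prod-eu.example.com"},
]
template = {
    "metadata": {"name": "{{cluster}}-my-app"},
    "spec": {
        "source": {"path": "overlays/{{cluster}}"},
        "destination": {"server": "{{url}}"},
    },
}

for app in generate(elements, template):
    print(app["metadata"]["name"], "->", app["spec"]["destination"]["server"])
# prod-us-my-app -> https://prod-us.example.com
# prod-eu-my-app -> https://prod-eu.example.com
```

Adding a cluster is then a one-element change to the generator list, which is the whole point of the pattern.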
Progressive delivery with GitOps
GitOps + progressive delivery = safe rollouts.
Argo Rollouts
Replaces Deployments with Rollouts. Supports canary, blue-green, traffic shifting.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myregistry/myapp:v1
See argo-rollouts for full details.
Flagger
Flux-native progressive delivery.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  progressDeadlineSeconds: 60
  analysis:               # named canaryAnalysis in older Flagger releases
    interval: 30s
    threshold: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
      - name: request-duration
        thresholdRange:
          max: 500
Flagger uses Istio/Linkerd/App Mesh for traffic splitting.
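The interval, threshold, and metric ranges combine into a simple decision loop: check the metrics every interval, count the failed checks, and roll back once the count reaches the threshold. The sketch below models that logic with the values from the manifest. It assumes failed checks accumulate over the whole analysis and takes pre-collected samples instead of querying a metrics server; analyze is a hypothetical name.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float]  # (success rate in %, request duration in ms)


def analyze(
    samples: Iterable[Sample],
    threshold: int = 5,
    min_success_rate: float = 99.0,
    max_duration_ms: float = 500.0,
) -> str:
    """Return 'rollback' once `threshold` checks have failed, else 'promote'."""
    failed_checks = 0
    for success_rate, duration_ms in samples:  # one sample per analysis interval
        healthy = success_rate >= min_success_rate and duration_ms <= max_duration_ms
        if not healthy:
            failed_checks += 1
            if failed_checks >= threshold:
                return "rollback"
    return "promote"


good = (99.9, 120.0)
slow = (99.9, 900.0)    # violates the request-duration range
errors = (97.0, 120.0)  # violates the request-success-rate range

print(analyze([good, slow, good, good]))                 # promote
print(analyze([slow, errors, slow, errors, slow, good])) # rollback
```

With interval: 30s and threshold: 5, a consistently unhealthy canary is rolled back after about two and a half minutes in this model.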
Common GitOps pitfalls
- CI pushing to cluster. Defeats the purpose. CI updates git, controller applies.
- Long-lived credentials in GitOps controller. Use OIDC / workload identity.
- No drift detection. If someone uses kubectl, the controller should alert.
- Secrets in plain text. Use sealed-secrets, SOPS, or external secret operators.
- No review process for git changes. A bot that auto-commits can break prod.
- Sync waves in wrong order. App before DB.
- Auto-prune enabled in dev. Destructive when developing.
- No rollback procedure. git revert is the rollback.
- Massive monorepo with no clear ownership. Every team can break every team.
- GitOps controller as single point of failure. Hub cluster down = no updates anywhere.
GitOps for cluster add-ons
Same pattern, different scope:
# infrastructure repo
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cert-manager
spec:
source:
repoURL: https://charts.jetstack.io
chart: cert-manager
targetRevision: v1.14.0
destination:
server: https://kubernetes.default.svcSame GitOps flow, but for cluster components (CNI, ingress, cert-manager, etc.).
Common gotchas
- Argo CD and the cluster-admin role. The controller needs broad access to apply manifests. Restrict to specific namespaces / projects.
- Flux with the GitOps Toolkit is more verbose than Argo CD. Trade-off: more flexibility, more yaml.
- Image updater can spam commits or PRs if you use floating tags (latest, semver ranges).
- Sync windows (e.g., “no syncs on Friday”) can delay fixes. Use sparingly.
- Multi-cluster with cluster-scoped resources needs careful RBAC. Avoid cluster-scoped when possible.
- Helm values in GitOps: different controllers handle them differently. Argo CD takes values files or inline values on the Application; Flux takes them on the HelmRelease (values / valuesFrom).
- Manifests with side-effects (e.g., creating a database) are dangerous in GitOps. Use a separate process for one-time infra.
- The cluster that runs the GitOps controller: if the controller’s own installation is not in git, it is the one thing GitOps does not cover. Use GitOps for the GitOps controller too.
- GitOps != no CI. You still need CI for tests, builds, image scans. GitOps handles deployment only.
A worked example
Goal: deploy a stateless web service via GitOps, with auto-sync to dev, manual sync to prod, image automation, and secrets in external store.
Setup:
- Two repos:
  - my-app (code, CI builds images)
  - my-app-gitops (manifests, GitOps deploys)
- CI (GitHub Actions):
  - On push to main: test, build, push to ECR
  - Updates my-app-gitops overlay with new tag
  - For dev: opens a PR and auto-merges it
  - Opens PR for staging/prod (manual approval)
- Argo CD:
  - Connects to my-app-gitops repo
  - Two Applications: my-app-dev, my-app-prod
  - dev: auto-sync, prune, self-heal
  - prod: manual sync, with notifications
- Secrets:
  - External Secrets Operator reads from AWS Secrets Manager
  - Creates k8s Secret at sync time
- RBAC:
  - Argo CD’s ServiceAccount has admin in my-app namespace
  - No access to kube-system or other namespaces
On push to main:
- CI builds image myregistry/myapp:v123
- CI updates overlays/dev/kustomization.yaml to v123
- Argo CD detects change, syncs
- New pods roll out
- Argo CD’s notification fires: “dev deployment complete”
- Engineer sees, opens PR to update staging tag to v123
- After PR merge + manual sync: staging rolls out
- Same for prod (with canary via Argo Rollouts)
Total time from merge to dev: 2-5 minutes
Manual approval gates: staging and prod
See also
- kustomize — patching layered with GitOps
- helm-cicd — Helm in GitOps
- argo-workflows — CI for image builds
- argo-rollouts — safe rollouts
- oidc-integration — auth for the controller