Software Planning

Architecture decisions live and die by documentation. Without a paper trail, the team forgets why something was built a certain way — and repeats the same mistakes.


Core Artifacts

1. Architecture Decision Records (ADRs)

A short document capturing a significant architectural decision: the context, the decision, and its consequences.

# ADR-042: Use Kafka for async inter-service events
 
## Status: Accepted
## Date: 2025-05-24
## Deciders: jane@corp.com, bob@corp.com
 
## Context
Orders service needs to notify fulfillment, billing, and analytics
without coupling. Sync HTTP calls create circular dependency risk.
 
## Decision
Apache Kafka with consumer groups per downstream service.
Topic: `orders.events`
 
## Consequences
+ Decoupled: producers don't know consumers
+ Replay: new services can consume from beginning
+ High throughput: handles 50k events/sec
- Operational complexity: need Kafka cluster / MSK
- Learning curve: offset management, consumer groups
- Latency: async, not real-time

Store ADRs in version control (docs/adr/) — keeps them in sync with code.

2. RFCs (Request for Comments)

For controversial or high-impact decisions — propose, gather feedback, then decide.

RFC-001: Switch from REST to gRPC for internal services
├── Author: jane@corp.com
├── Created: 2025-05-20
├── Status: Review
├── Review Deadline: 2025-05-27
└── Reactions: +8 👍 -2 👎  5 💬

3. RFC Process

1. Author writes RFC (problem, proposed solution, alternatives)
2. Share with stakeholders (async or meeting)
3. Collect feedback (5 business days)
4. Author revises or withdraws
5. Decider approves / rejects / defers
6. ADR created from final decision

4. SLO Documents

Define Service Level Objectives during design — before launch.

# slo.yaml
api-gateway:
  availability:
    target: 99.95%
    window: 30d
    alert_threshold: 99.9%
  latency:
    target: p99 < 500ms
    window: 30d
    alert_threshold: p99 > 800ms
  error_rate:
    target: < 0.1%
    window: 30d
    alert_threshold: > 0.5%

Planning Meeting Formats

1. Architecture Review (1-2h, infrequent)

Agenda:
1. Present proposed architecture (author, 20min)
2. Clarifying questions (all, 15min)
3. Alternatives discussion (all, 30min)
4. Risks and concerns (all, 20min)
5. Decision (decider, 15min)
6. Action items (all, 10min)

2. Design Studio (2-4h, when starting something new)

Collaborative whiteboarding session for greenfield problems.

Format:
1. Problem framing (15min)
2. Individual sketching (30min)
3. Gallery walk (15min)
4. Group discussion (60min)
5. Vote on top 2-3 approaches (10min)
6. Consolidate into next steps (15min)

3. Retrospective (1h, per sprint/iteration)

Format (4Ls):
- Liked: what worked well
- Learned: new insights
- Lacked: what was missing
- Longed for: what we wish we had

Roadmap Planning

Now / Next / Later Framework

HorizonTimeframeOutput
NowThis quarterSprint backlog
NextNext quarterRoadmap (themes, not features)
Later6-12 monthsStrategic initiatives

OKRs for Architecture

# Example
Objective: Improve platform reliability
Key Results:
  - KR1: Reduce P50 incident detection time from 15min to 2min
  - KR2: Achieve 99.95% API availability (from 99.7%)
  - KR3: 80% of services instrumented with SLO dashboards

Anti-Patterns

Anti-PatternProblemFix
No ADRsSame debates every yearMandate ADR for any cross-team decision
ADRs written after the factThey become rationalization, not documentationWrite ADR before decision is final
Giant spec documentsNobody reads themKeep ADRs under 1 page
Planning without constraintsArchitects design fantasiesStart with budget, timeline, team size
No rollback planChanges are one-wayAlways document rollback procedure

Tools

ArtifactTool
ADRsMarkdown in docs/adr/, or use adr-tools
RFCsGitHub PRs, Notion, or HackMD
SLOsPrometheus, Datadog, or Grafana
RoadmapLinear, Notion, or Aha!
DiagramsExcalidraw, draw.io, Mermaid

Source