Risk Register
OGC's framing: value and risk are paired. Once an option is preferred, build the register for *that* option and iterate the value-risk balance until acceptable. Don't pre-build a generic register.
Entry shape
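
| ID | Risk | Likelihood | Impact | Score | Owner | Mitigation | Trigger / signal |
|----|------|------------|--------|-------|-------|------------|------------------|
| R1 | Cache invalidation gap | M | H | 6 | X | Stale-while-revalidate + TTL audit | CDN miss-rate spike |

Score = Likelihood x Impact (1-3 each). Sort descending. Review top 5 only.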
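
The scoring and ranking rule can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Risk` class, the `top_risks` helper, and the mapping of L/M/H onto 1/2/3 are assumptions (the mapping is chosen so that the R1 row, M x H, scores 6), and the R2 entry is hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping of L/M/H ratings onto the 1-3 scale.
LEVEL = {"L": 1, "M": 2, "H": 3}


@dataclass
class Risk:
    id: str
    risk: str
    likelihood: str  # "L", "M", or "H"
    impact: str      # "L", "M", or "H"
    owner: str
    mitigation: str
    trigger: str

    @property
    def score(self) -> int:
        # Score = Likelihood x Impact, each on a 1-3 scale.
        return LEVEL[self.likelihood] * LEVEL[self.impact]


def top_risks(register, n=5):
    """Sort by score, highest first, and keep the top n for review."""
    # sorted() is stable, so equal scores keep their register order.
    return sorted(register, key=lambda r: r.score, reverse=True)[:n]


register = [
    Risk("R1", "Cache invalidation gap", "M", "H", "X",
         "Stale-while-revalidate + TTL audit", "CDN miss-rate spike"),
    # Hypothetical second entry, for illustration only.
    Risk("R2", "Irreversible writes on rollback", "H", "H", "Y",
         "Feature flag", "Rollback rehearsal fails"),
]

for r in top_risks(register):
    print(r.id, r.score, r.risk)
```

Keeping `score` as a derived property rather than a stored field means it cannot drift out of step when a likelihood or impact rating is revised during iteration.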
Software risk taxonomy
Use as a checklist - only include risks that actually apply.
- Data migration - schema change, backfill, dual-write window.
- Cache invalidation - staleness, thundering herd, key collisions.
- Permission boundaries - privilege escalation, multi-tenant leak, scope creep on tokens.
- Observability gaps - no metric, no log, no alert for the new failure mode.
- Deploy order - services brought up out of order, contract mismatch.
- Rollback - irreversible writes, missing feature flag, schema-locked rollback.
- Compatibility - old clients, queued messages with old schema, third-party API drift.
- Test gaps - no integration test, no load test, no chaos test for the new dependency.
- Cost surprise - LLM tokens, egress, retry storms, log volume.
- Concurrency - races on shared state, lock ordering, idempotency.
Non-software risk taxonomy
- Stakeholder defection - sponsor leaves, priorities shift.
- Skills gap - required expertise unavailable in time.
- External dependency - vendor, regulator, partner timing.
- Sequencing - blocked by another project's output.
- Reputation - failure visible to users/customers.
- Sunk-cost lock-in - committing to a path that's hard to abandon.
Mitigation patterns
- Reduce likelihood - change design, add review gate, shrink scope.
- Reduce impact - feature flag, staged rollout, kill switch, blast-radius cap.
- Detect early - telemetry, leading indicator, manual check at first run.
- Transfer - vendor SLA, insurance, contract.
- Accept - name it explicitly with the trigger that would force re-planning.
For each top-5 risk, name at least one detection and at least one mitigation. If you can't, the option is not ready for VM4.
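
The readiness rule above can be checked mechanically. The sketch below assumes each top-5 risk is a dict with hypothetical `detection` and `mitigation` lists; the field names, the `readiness_gaps` helper, and the R2 entry are illustrative, not part of any existing tool.

```python
def readiness_gaps(top_risks):
    """Map risk ID -> the fields ("detection", "mitigation") still missing."""
    gaps = {}
    for risk in top_risks:
        missing = [
            field
            for field in ("detection", "mitigation")
            if not risk.get(field)  # absent key or empty list both count
        ]
        if missing:
            gaps[risk["id"]] = missing
    return gaps


top5 = [
    {"id": "R1",
     "detection": ["CDN miss-rate spike"],
     "mitigation": ["Stale-while-revalidate + TTL audit"]},
    # Hypothetical entry with no detection named yet.
    {"id": "R2", "detection": [], "mitigation": ["Feature flag"]},
]

gaps = readiness_gaps(top5)
ready_for_vm4 = not gaps  # ready only when no risk has a gap
print(gaps, ready_for_vm4)
```

Returning the gaps per risk, rather than a single yes/no, shows which risk to work on next.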
Question template
Ask, one at a time:
"What signal would tell us R<n> is becoming real before users feel it?" - *Recommended: <existing metric, log, or check>; if none, propose adding it.*
Iterate until every top-5 risk has a signal and an owner.
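
As a rough sketch of that loop, assuming each top-5 risk is a dict with hypothetical `signal` and `owner` fields (the helper names are illustrative too), the next question to ask and the stopping condition could look like this:

```python
TEMPLATE = (
    "What signal would tell us {risk_id} is becoming real "
    "before users feel it?"
)


def next_question(top_risks):
    """Return the question for the first risk without a signal, else None."""
    for risk in top_risks:
        if not risk.get("signal"):
            return TEMPLATE.format(risk_id=risk["id"])
    return None


def register_complete(top_risks):
    """True once every risk has both a signal and an owner."""
    return all(r.get("signal") and r.get("owner") for r in top_risks)


top5 = [
    {"id": "R1", "signal": "CDN miss-rate spike", "owner": "X"},
    # Hypothetical entry still waiting for an answer.
    {"id": "R2", "signal": None, "owner": None},
]

print(next_question(top5))
print(register_complete(top5))
```

`next_question` returns a single question at a time, matching the one-at-a-time instruction; an owner still has to be recorded separately before `register_complete` returns `True`.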