Goal

Identify every point where the money state recorded in your systems can diverge from reality (what the provider, bank, or counterparty actually holds), then design safeguards so that failures are contained, reversible, and auditable.

The Risk Map Framework

1) Enumerate money states

  • Available balance
  • Held / reserved balance
  • Pending settlement
  • Final settled balance
  • Reversal / refund states
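One way to make these states concrete is to keep each bucket as an explicit field and derive the spendable figure instead of storing it. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, amounts use `Decimal` to avoid float rounding, and it assumes available equals settled minus held, with pending funds visible but not spendable. Reversals are left out here; they would usually be recorded as compensating ledger entries rather than as a separate bucket.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Balances:
    """Hypothetical per-account view of the money states listed above."""
    settled: Decimal = Decimal("0")  # final, provider-confirmed funds
    held: Decimal = Decimal("0")     # reserved for open orders or withdrawals
    pending: Decimal = Decimal("0")  # seen by us, not yet settled by the provider

    @property
    def available(self) -> Decimal:
        # Assumption: only settled funds not under a hold are spendable.
        return self.settled - self.held

    def check(self) -> None:
        # Invariants a reconciliation job could assert on every snapshot.
        for name in ("settled", "held", "pending"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} bucket is negative")
        if self.available < 0:
            raise ValueError("holds exceed settled funds")


b = Balances(settled=Decimal("100"), held=Decimal("30"), pending=Decimal("50"))
b.check()
print(b.available)  # 70
```

Deriving `available` rather than storing it removes one place where two numbers could drift apart.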

2) Enumerate transitions

  • Deposit initiated → credited
  • Order placed → hold created
  • Trade executed → positions updated
  • Fee applied → ledger entry posted
  • Withdrawal requested → processed
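Transitions are easier to reason about when the allowed moves are written down as data, so an unexpected event (for example, a late "processing" webhook arriving after "complete") is rejected instead of silently rewinding state. Below is a minimal sketch for a single withdrawal flow; the status names reuse the pending/processing/complete/failed set from the product safeguards plus a reversed state, and the transition table itself is an illustrative assumption, not a standard.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETE = "complete"
    FAILED = "failed"
    REVERSED = "reversed"


# Hypothetical allowed-transition table for one withdrawal.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING, Status.FAILED},
    Status.PROCESSING: {Status.COMPLETE, Status.FAILED},
    Status.COMPLETE: {Status.REVERSED},  # e.g. the bank returns the funds later
    Status.FAILED: set(),                # terminal; any hold should be released
    Status.REVERSED: set(),              # terminal
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Calling `advance(Status.COMPLETE, Status.PROCESSING)` raises, which turns a late webhook into an explicit exception to handle rather than a silent state change.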

3) Failure modes (the PM must push the team to produce this list)

  • Duplicate events (retries, user double-clicks, repeated network deliveries)
  • Out-of-order events (late webhook, delayed worker)
  • Partial success (one service commits, another fails)
  • Provider mismatch (gateway says success, bank settlement differs)
  • Concurrency / race conditions (two actions on the same funds)
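Duplicate and out-of-order events are usually handled at the consumer, before any balance is touched. This sketch assumes each provider event carries an `event_id` that stays the same across redeliveries and a per-payment `sequence` number that only increases; not every provider offers both, so treat these fields as hypothetical. State is kept in memory for brevity; a real consumer would persist it (for example, a unique constraint on `event_id`) so the check survives restarts and concurrent workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderEvent:
    event_id: str   # assumed identical across redeliveries of the same event
    payment_id: str
    sequence: int   # assumed to increase monotonically per payment
    status: str


class EventGate:
    """Decide what to do with each incoming event before touching balances."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.last_seq: dict[str, int] = {}

    def classify(self, ev: ProviderEvent) -> str:
        if ev.event_id in self.seen_ids:
            return "duplicate"   # retry or redelivery: acknowledge, do nothing
        self.seen_ids.add(ev.event_id)
        if ev.sequence <= self.last_seq.get(ev.payment_id, -1):
            return "stale"       # older than what was applied: route to exception queue
        self.last_seq[ev.payment_id] = ev.sequence
        return "apply"
```

Routing stale events to review, rather than dropping them, keeps the out-of-order case visible and auditable.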

Safeguards (Product + System + Operational)

Product safeguards

  • Clear statuses: pending/processing/complete/failed
  • Time-based disclaimers (e.g., settlement can take T+1/T+2)
  • Rate limits on risky actions (withdrawals, bursts of order cancellations)
  • Escalation paths: “raise a ticket” with pre-filled evidence
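Rate limits on risky actions can be as simple as a per-user sliding window. This is a minimal in-memory sketch: the class name and limits are placeholders, and timestamps (in seconds) are passed in explicitly so the behavior is deterministic and testable. A production limiter would typically keep its counts in shared storage so every app instance sees the same history.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` actions per `window_s` seconds for each (user, action)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.history: defaultdict = defaultdict(deque)

    def allow(self, user_id: str, action: str, now: float) -> bool:
        q = self.history[(user_id, action)]
        # Drop timestamps that have left the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # denied attempts are not recorded
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_s=60)
print([limiter.allow("u1", "withdraw", t) for t in (0, 1, 2, 3)])
```

Keying by action as well as user lets withdrawals and order cancellations carry different limits if each gets its own limiter.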

System safeguards

  • Idempotency keys for every money-changing request
  • Ledger-first updates (append-only records)
  • Atomic holds + releases (never go negative silently)
  • Transactional outbox for events, with retry-safe (idempotent) consumers
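The first three system safeguards can be shown together with Python's built-in `sqlite3`. In this minimal sketch (an assumption, not a reference schema; table and function names are hypothetical), balances are never stored: they are sums over an append-only `ledger` table. Amounts are integer cents, a `UNIQUE` constraint turns a replayed deposit into a no-op, and a hold is two entries (available down, held up) written in one transaction only if the available balance covers it. `BEGIN IMMEDIATE` takes SQLite's write lock before the balance is read, so two concurrent holds cannot both pass the check. A release would be the mirror-image pair of entries; the outbox is not shown.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        """CREATE TABLE ledger (
               id INTEGER PRIMARY KEY,
               account TEXT NOT NULL,
               bucket TEXT NOT NULL,           -- 'available' or 'held'
               amount_cents INTEGER NOT NULL,  -- signed; rows are never updated
               idempotency_key TEXT NOT NULL,
               UNIQUE (idempotency_key, account, bucket)
           )"""
    )
    return db


def balance(db: sqlite3.Connection, account: str, bucket: str) -> int:
    row = db.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger WHERE account = ? AND bucket = ?",
        (account, bucket),
    ).fetchone()
    return row[0]


def credit(db: sqlite3.Connection, account: str, amount_cents: int, key: str) -> str:
    """Record a deposit; replaying the same idempotency key is a no-op."""
    try:
        db.execute(
            "INSERT INTO ledger (account, bucket, amount_cents, idempotency_key) "
            "VALUES (?, 'available', ?, ?)",
            (account, amount_cents, key),
        )
        return "credited"
    except sqlite3.IntegrityError:
        return "duplicate"


def place_hold(db: sqlite3.Connection, account: str, amount_cents: int, key: str) -> str:
    """Move funds available -> held atomically; never lets available go negative."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading balances
    try:
        if db.execute("SELECT 1 FROM ledger WHERE idempotency_key = ?", (key,)).fetchone():
            db.execute("COMMIT")
            return "duplicate"
        if balance(db, account, "available") < amount_cents:
            db.execute("ROLLBACK")
            return "insufficient_funds"
        db.executemany(
            "INSERT INTO ledger (account, bucket, amount_cents, idempotency_key) "
            "VALUES (?, ?, ?, ?)",
            [(account, "available", -amount_cents, key), (account, "held", amount_cents, key)],
        )
        db.execute("COMMIT")
        return "held"
    except Exception:
        db.execute("ROLLBACK")
        raise
```

A client retrying `place_hold` with the same key after a timeout gets `"duplicate"` back instead of a second hold, which is the point of attaching a key to every money-changing request.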

Operational safeguards

  • Reconciliation dashboard (provider vs internal ledger)
  • Exception queues (mismatches, stuck pending items, duplicate events)
  • Maker-checker for manual adjustments
  • Immutable audit trail for each adjustment
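A reconciliation job is essentially a keyed comparison whose output feeds the exception queues. The sketch assumes both sides share a reference (`ref`) per transaction, which is itself a design requirement; the exception labels and names are hypothetical. It also shows maker-checker on a frozen (immutable) adjustment record: approval produces a new record instead of editing the old one, and the maker cannot approve their own change. Persisting both versions would give the audit trail.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Txn:
    ref: str           # shared reference, assumed present on both sides
    amount_cents: int
    status: str


def reconcile(provider: list[Txn], internal: list[Txn]) -> list[tuple[str, str]]:
    """Return (ref, reason) items for the exception queue; empty means the sides agree."""
    exceptions = []
    for side, txns in (("provider", provider), ("internal", internal)):
        for ref, n in Counter(t.ref for t in txns).items():
            if n > 1:
                exceptions.append((ref, f"duplicate_{side}"))
    p = {t.ref: t for t in provider}
    i = {t.ref: t for t in internal}
    for ref in sorted(p.keys() | i.keys()):
        pt, it = p.get(ref), i.get(ref)
        if it is None:
            exceptions.append((ref, "missing_internal"))
        elif pt is None:
            exceptions.append((ref, "missing_at_provider"))
        elif pt.amount_cents != it.amount_cents:
            exceptions.append((ref, "amount_mismatch"))
        elif pt.status != it.status:
            exceptions.append((ref, "status_mismatch"))  # e.g. stuck pending
    return exceptions


@dataclass(frozen=True)
class Adjustment:
    ref: str
    amount_cents: int
    reason: str
    maker: str
    checker: Optional[str] = None


def approve(adj: Adjustment, checker: str) -> Adjustment:
    """Maker-checker: a second person must approve; returns a new record."""
    if checker == adj.maker:
        raise PermissionError("maker cannot approve their own adjustment")
    return replace(adj, checker=checker)
```

Running this on a schedule and alerting on a non-empty result is one way to back the reconciliation dashboard with data.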

Example Output (deliverable)

The exercise should end with a table containing one row per transition, for example:

  • Transition: Deposit credited
  • Dependencies: Gateway webhook, internal ledger, user wallet
  • Failure modes: late webhook, duplicate webhook, partial commit
  • Controls: idempotency, retry-safe consumer, reconciliation job
  • Backoffice: view evidence + manual resolution workflow
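If the risk map is kept as data rather than only as a document, gaps can be flagged automatically. In this sketch the `RiskRow` fields mirror the columns above and the deposit row copies the example; the withdrawal row is a hypothetical incomplete entry. The two rules in `gaps` (rows that list failure modes must list at least one control, and every row needs a recovery path) are one reasonable reading of this exercise, not a fixed standard.

```python
from dataclasses import dataclass, field


@dataclass
class RiskRow:
    transition: str
    dependencies: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)
    backoffice: str = ""  # manual recovery path; empty means none defined


def gaps(rows: list[RiskRow]) -> list[str]:
    """Flag rows where failures have no control or no recovery path."""
    problems = []
    for r in rows:
        if r.failure_modes and not r.controls:
            problems.append(f"{r.transition}: failure modes without controls")
        if not r.backoffice:
            problems.append(f"{r.transition}: no recovery path")
    return problems


deposit = RiskRow(
    transition="Deposit credited",
    dependencies=["Gateway webhook", "internal ledger", "user wallet"],
    failure_modes=["late webhook", "duplicate webhook", "partial commit"],
    controls=["idempotency", "retry-safe consumer", "reconciliation job"],
    backoffice="view evidence + manual resolution workflow",
)
withdrawal = RiskRow(transition="Withdrawal processed", failure_modes=["provider mismatch"])
print(gaps([deposit, withdrawal]))
# ['Withdrawal processed: failure modes without controls', 'Withdrawal processed: no recovery path']
```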

Key PM Lesson

Money safety is not just an “engineering responsibility”. The PM must define the states, the failure modes, and the operational recovery path. If you don’t define recovery, you’re implicitly accepting the loss.