How "Safe" Defaults Blow Up Production
Defaults Matter
Most incidents aren’t exotic edge cases; they’re human choices around configuration, including the choice to stick with defaults. We treat a default as “obviously working” and therefore safe. Then an empty partition key melts one broker, and Airflow’s catchup=True quietly queues two years of backfills.
Three moves that hold under pressure:
If a value is sensible for 80% or more of users, make it the default. If no single value fits the majority, do not set a default; force an explicit choice.
Reject ambiguous sentinels ("", *, null); choose safe strategies (sticky/round‑robin; explicit include lists).
Make scope visible: previews (“this will enqueue N runs”), dry-runs by default, and hard caps.
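Here is a minimal Python sketch of the three moves, not taken from any real scheduler. Every name in it (BackfillRequest, plan_backfill, MAX_RUNS) and the cap of 30 runs are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence

# Hypothetical values: tune the cap and the sentinel set for your own system.
AMBIGUOUS_SENTINELS = {"", "*", "null"}
MAX_RUNS = 30


@dataclass(frozen=True)
class BackfillRequest:
    # Move 1: no single date range or table list fits most users,
    # so these fields have no default and must be passed explicitly.
    start: date
    end: date
    tables: Sequence[str]
    # Previewing is sensible for nearly everyone, so it is the default.
    dry_run: bool = True


def validate_tables(tables: Sequence[str]) -> None:
    """Move 2: reject empty lists and ambiguous sentinel values."""
    if not tables:
        raise ValueError("tables must be an explicit, non-empty include list")
    for name in tables:
        if name is None or name.strip().lower() in AMBIGUOUS_SENTINELS:
            raise ValueError(f"ambiguous table selector: {name!r}")


def plan_backfill(req: BackfillRequest) -> str:
    """Move 3: compute the scope, enforce a hard cap, preview by default."""
    validate_tables(req.tables)
    days = (req.end - req.start).days + 1  # inclusive date range
    if days < 1:
        raise ValueError("end must not be before start")
    runs = days * len(req.tables)
    if runs > MAX_RUNS:
        raise ValueError(f"this would enqueue {runs} runs; cap is {MAX_RUNS}")
    if req.dry_run:
        return f"DRY RUN: this will enqueue {runs} runs"
    return f"enqueued {runs} runs"
```

Calling plan_backfill with a three-day range and two tables returns a preview of 6 runs. A two-year range or a "*" table selector raises an error before anything is queued. To actually enqueue, the caller has to pass dry_run=False on purpose.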
In Short: Bad defaults are worse than no defaults — a wrong paved road silently guides everyone over a cliff. The full post has a number of real stories and a copy-ready checklist: link below.
Quick question: Which default has bitten you lately? Share your stories in the comments!
