Historical Extreme Events & Clustering — Black Swans Rarely Come Alone
4-section structure: Concept / How We Compute / How to Read / Caveats.
1. Concept
VaR and CVaR summarize tail risk statistically, but users usually want something more concrete: "Which days exactly were the worst, and why?"
That is the goal of "historical extreme event flagging": explicitly list the worst 10 single days in the past 252 trading days, each with:
- Same-day benchmark return — systemic crash or idiosyncratic event?
- Excess return = stock − benchmark — quantify "how much worse than peers"
- Clustering analysis — events uniformly distributed or concentrated?
Why Clustering Matters
Many standard finance models assume returns are i.i.d. (independent and identically distributed). Real markets contradict this:
Volatility clustering: big moves tend to follow big moves.
- 2020/3 COVID: 5 × −5%+ days in one month
- 2022/10 rate panic: 3 crashes in two weeks
- 2008/9 Lehman: turbulent for weeks
This is not coincidence: volatility clustering is a well-documented statistical fact. This indicator quantifies it.
2. How We Compute
2.1 Event Selection
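1. Take the simple daily returns for the past 252 trading days
2. Compute the 1st-percentile return as the threshold
3. If fewer than 10 days fall below the threshold, take the worst 10 days anyway (guarantees a usable sample; with 252 observations the 1st percentile alone captures only 2 to 3 days, so this fallback normally applies)
4. Sort the selected events by date, ascending
The default view shows 10 events.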
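The selection steps above can be sketched as follows. This is a minimal illustration, not the production code: the function name `select_extreme_events`, the nearest-rank percentile rule, and the synthetic sample data are all assumptions made for the example.

```python
import math
from datetime import date, timedelta

def select_extreme_events(returns, min_events=10, pct=0.01):
    """Pick the worst days from a non-empty list of (date, simple_return) pairs.

    Returns (events sorted by date ascending, effective threshold).
    """
    ordered = sorted(returns, key=lambda item: item[1])  # worst return first
    values = [ret for _, ret in ordered]
    # Nearest-rank 1st percentile (assumption: the real code may interpolate).
    rank = max(1, math.ceil(pct * len(values)))
    pct_threshold = values[rank - 1]
    n_below = sum(1 for v in values if v <= pct_threshold)
    # Fallback: keep at least `min_events` days even if fewer breach the percentile.
    n_keep = min(len(ordered), max(n_below, min_events))
    events = ordered[:n_keep]
    effective_threshold = events[-1][1]  # mildest return that still made the list
    return sorted(events, key=lambda item: item[0]), effective_threshold

# Synthetic, deterministic sample: 252 distinct returns between -5% and about +5%.
start = date(2024, 1, 1)
sample = [(start + timedelta(days=i), ((i * 37) % 252 - 126) / 2520)
          for i in range(252)]
events, threshold = select_extreme_events(sample)
print(len(events), round(threshold, 4))
```

With 252 observations the percentile rule alone selects only three days here, so the worst-10 fallback determines the final list and the effective threshold.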
2.2 Benchmark Return
Taken from the market_ret column of the _calc_risk_series DataFrame:
- TW stocks → ^TWII
- US stocks → ^GSPC
2.3 Excess Return
excess = stock_return − benchmark_return
- Excess < −2% → stock dropped much more than market → idiosyncratic event
- Excess ≈ 0% → systemic event
- Excess > 0% → outperformed on a bad day
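The reading rules above can be expressed as a small classifier. This is only a sketch: the function name `classify_event`, the ±1% band used to interpret "≈ 0%", and the "mixed" label for excess between −2% and −1% are assumptions, since the text does not define those boundaries.

```python
def classify_event(stock_ret, bench_ret, idio_cutoff=-0.02, systemic_band=0.01):
    """Return (excess, label) for one extreme day; returns are decimals (-0.05 = -5%)."""
    excess = stock_ret - bench_ret
    if excess < idio_cutoff:
        label = "idiosyncratic"   # stock fell much more than the market
    elif excess > systemic_band:
        label = "outperformed"    # stock held up better than the market
    elif abs(excess) <= systemic_band:
        label = "systemic"        # stock moved roughly with the market
    else:
        label = "mixed"           # hypothetical bucket: between -2% and -1%
    return excess, label

print(classify_event(-0.062, -0.011))  # large negative excess
print(classify_event(-0.055, -0.048))  # market fell almost as much
```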
2.4 Clustering Analysis
mean_gap_days = average spacing between consecutive events
hottest_cluster = max events in any 30-day rolling window
Under i.i.d. returns, 10 events spread evenly across 252 days would sit about 25 days apart (252 / 10 ≈ 25). An observed mean gap below 15 days indicates clear clustering.
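A minimal sketch of the two clustering statistics follows. It assumes gaps and the 30-day window are measured in calendar days (the text does not say whether calendar or trading days are used), and the function name `clustering_stats` is hypothetical.

```python
from datetime import date

def clustering_stats(event_dates, window_days=30):
    """Return (mean_gap_days, hottest_cluster) for a list of event dates."""
    dates = sorted(event_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    # Sliding window: count the most events falling inside any `window_days` span.
    hottest = 0
    start = 0
    for end in range(len(dates)):
        while (dates[end] - dates[start]).days >= window_days:
            start += 1
        hottest = max(hottest, end - start + 1)
    return mean_gap, hottest

example_dates = [date(2024, 3, 11), date(2024, 3, 13),
                 date(2024, 3, 18), date(2024, 10, 9)]
mean_gap_days, hottest_cluster = clustering_stats(example_dates)
print(mean_gap_days, hottest_cluster)
```

Note that one long quiet stretch can inflate the mean gap even when several events are tightly packed, which is why the hottest-cluster count is reported alongside it.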
3. How to Read
3.1 Event List Interpretation
| Date | Stock % | Bench % | Excess % | Reading |
|---|---|---|---|---|
| 2024-03-11 | -6.20% | -1.10% | -5.10% | Idiosyncratic (large negative excess) |
| 2024-03-13 | -4.80% | -0.40% | -4.40% | |
| 2024-03-18 | -3.90% | -0.20% | -3.70% | |
| 2024-10-09 | -5.50% | -4.80% | -0.70% | Systemic (market fell similarly) |
Insights:
- 3/11 through 3/18: every excess is below −3% → a stock-specific cascade (possible causes include a customer cut, an inventory correction, or an earnings miss)
- 10/9: excess of −0.7% → a market-wide event, not stock-specific
This is far more actionable than a bare "−5% day".
3.2 Clustering Reading
| Mean Gap | Interpretation |
|---|---|
| > 50 days | 🟢 Sparse, near-i.i.d. |
| 20–50 days | 🟡 Typical stock |
| < 20 days | 🔴 Significant clustering, severe vol regime |
Hottest cluster: highlighted at the top. Four or more events within 30 days indicate a structural crisis period rather than isolated black swans.
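The table and the hottest-cluster rule can be combined into one lookup. This is a sketch only: the function name `read_clustering` and the plain-text labels are illustrative, and placing the exact boundary values (20 and 50 days) in the middle bucket is an assumption.

```python
def read_clustering(mean_gap_days, hottest_cluster):
    """Map the two clustering statistics to (gap_label, is_crisis_period)."""
    if mean_gap_days > 50:
        gap_label = "sparse"      # near-i.i.d.
    elif mean_gap_days >= 20:
        gap_label = "typical"     # typical stock
    else:
        gap_label = "clustered"   # significant clustering
    is_crisis_period = hottest_cluster >= 4
    return gap_label, is_crisis_period

print(read_clustering(12, 5))
```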
3.3 Pairs Well With
- VaR/CVaR: they give the statistical tail; this gives the concrete days
- Jarque-Bera: it tells you how severe the fat tails are; this shows what fat tails look like in practice
- Max DD: MDD reports only the single deepest drawdown; this lists every severe single-day drop
4. Caveats
⚠️ Dynamic Threshold
Threshold = max(1st-percentile return, 10th-worst return), i.e. whichever cutoff is looser. During quiet periods (e.g., the 2021 slow bull market), the worst 10 days may not be very extreme. During crash periods, the threshold is genuinely severe.
The UI shows the effective threshold (X%) at the top for clarity.
⚠️ Daily Resolution Misses Intraday
Returns are computed from daily closes. A stock that fell 8% intraday and recovered to close down 1% is logged as only −1%.
Mitigation: cross-check with P1C.1 CVaR (captures borderline days) and news.
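To make the gap concrete, the sketch below compares the close-to-close return with the worst intraday return for one day. It assumes an intraday low is available, which the indicator itself does not use; the function name `close_vs_intraday` and the prices are made up for illustration.

```python
def close_vs_intraday(prev_close, low, close):
    """Return (close_to_close_return, worst_intraday_return) for a single day."""
    close_ret = close / prev_close - 1
    low_ret = low / prev_close - 1
    return close_ret, low_ret

# Previous close 100, intraday low 92, close 99.
close_ret, low_ret = close_vs_intraday(100.0, 92.0, 99.0)
print(f"logged: {close_ret:.1%}, intraday worst: {low_ret:.1%}")
```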
⚠️ Excess Return Not Beta-Adjusted
Excess is a direct subtraction (stock − benchmark) with no Beta adjustment.
- A stock with Beta = 1.5 should "normally" drop 4.5% when the market drops 3%
- Plain excess may therefore overstate idiosyncratic risk for high-Beta stocks
The academic approach uses the CAPM residual instead. We chose plain subtraction because it is intuitive; users can mentally adjust using the Beta card above.
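For readers who want the adjustment spelled out, here is a minimal sketch of a Beta-adjusted excess. It is not what the indicator computes: it ignores alpha and the risk-free rate, estimates Beta with a plain covariance/variance ratio, and the function names are hypothetical.

```python
def estimate_beta(stock_rets, bench_rets):
    """OLS slope of stock returns on benchmark returns: cov(s, b) / var(b)."""
    n = len(stock_rets)
    mean_s = sum(stock_rets) / n
    mean_b = sum(bench_rets) / n
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(stock_rets, bench_rets))
    var = sum((b - mean_b) ** 2 for b in bench_rets)
    return cov / var

def beta_adjusted_excess(stock_ret, bench_ret, beta):
    """Simplified CAPM-style residual: stock return minus beta times benchmark return."""
    return stock_ret - beta * bench_ret

# Beta = 1.5 stock on a day the market drops 3% and the stock drops 4.5%.
raw_excess = -0.045 - (-0.03)                             # plain subtraction
adj_excess = beta_adjusted_excess(-0.045, -0.03, 1.5)     # Beta-adjusted
print(f"raw: {raw_excess:.2%}, beta-adjusted: {adj_excess:.2%}")
```

In this example the plain excess looks like a 1.5% stock-specific loss, while the Beta-adjusted figure is approximately zero: the drop is fully explained by market exposure.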
⚠️ Clustering ≠ Full Time-Series Diagnostic
"4 events in 30 days" shows clustering, but:
- It doesn't tell you why the events clustered
- It doesn't predict when the next cluster will occur (that needs a GARCH model, Phase 4)
- It is purely retrospective
⚠️ News Integration Pending
The original plan included same-day news. This version delivers the core data; news integration is deferred to Phase 2 and requires news_d_tw_h integration plus real-time event extraction.
⚠️ Fixed 252-day Window
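Events outside the 252-day window are forgotten. For example, when analyzing the impact of COVID (2020/3):
- 252 trading days is about one calendar year, so the 2020/3 crash left the window around 2021/3 and is long gone from any window ending after 2022/3
- Use the long-horizon stress tests instead (P2.3, planned)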
Further Reading
- VaR vs CVaR
- Q-Q Plot & Jarque-Bera
- Max Drawdown, Ulcer, Calmar
- GARCH Volatility Model (Phase 4)
Try It
- Stock Analysis → Risk: scroll to "Historical Extreme Events"
- Watch excess-return column — red means stock-specific drops
- Cross-reference "hottest cluster" with market events you remember
- Switch stocks: steady large-caps vs speculative names show very different clustering
- Click 📐 for event selection strategy and clustering formulas