M/03 — Topological Data Analysis
Persistence barcodes on strategy correlation structure
H₀ persistent homology under a correlation-distance Vietoris–Rips filtration: counting how many stable clusters survive in a strategy population.
The mathematics
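Take N strategies with return histories rᵢ ∈ ℝᵀ, and let ρᵢⱼ denote the Pearson correlation between rᵢ and rⱼ. Define the correlation distance
$$d(i, j) = \sqrt{2\,(1 - \rho_{ij})},$$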
which is the Euclidean distance between the centered, unit-norm representatives zᵢ, zⱼ of the two return vectors, since ‖zᵢ − zⱼ‖² = 2 − 2ρᵢⱼ. It is a true metric as long as no two strategies are perfectly correlated (Mantegna 1999). At ε = 0 every strategy is its own cluster, giving N connected components. As ε grows we add an edge for every pair with d(i, j) ≤ ε. The Vietoris–Rips complex VR_ε is the simplicial complex whose k-simplices are the (k+1)-cliques of this graph; for H₀ we only need its 1-skeleton (the graph itself).
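As a rough illustration of the distance just described, the sketch below builds the full distance matrix with NumPy and checks it against plain Euclidean distances between standardized rows. The function names `standardize` and `correlation_distance` are hypothetical (they are not taken from the reference implementation), the input data are random numbers used purely for illustration, and the code assumes no return history is constant.

```python
import numpy as np

def standardize(X):
    """Center each row of X (N, T) and scale it to unit Euclidean norm."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=1, keepdims=True)
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def correlation_distance(X):
    """Return the (N, N) matrix with entries sqrt(2 * (1 - rho_ij))."""
    Z = standardize(X)
    rho = np.clip(Z @ Z.T, -1.0, 1.0)   # Pearson correlations; clip guards round-off
    D = np.sqrt(2.0 * (1.0 - rho))
    np.fill_diagonal(D, 0.0)            # remove round-off noise on the diagonal
    return D

# Illustrative data only: 5 random return histories of length 250.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 250))
D = correlation_distance(X)

# The same matrix, computed as Euclidean distances between standardized rows.
Z = standardize(X)
D_euclid = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
print(np.allclose(D, D_euclid))
```

Centering matters here: without it the inner product of unit-norm rows is a cosine similarity rather than a Pearson correlation, and the two only agree for zero-mean histories.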
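Connected components of VR_ε are tracked with a union-find (disjoint-set) structure. Sorting the distinct pairwise distances as ε₁ < ε₂ < ⋯ < ε_m and writing G_ε for the graph at threshold ε, we obtain a nested sequence of graphs
$$G_{\varepsilon_1} \subseteq G_{\varepsilon_2} \subseteq \cdots \subseteq G_{\varepsilon_m}.$$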
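Each new edge either (a) connects two vertices already in the same component (no change to H₀), or (b) merges two components into one. In case (b), one of the two components “dies” at ε; because every component is born at ε = 0, the elder rule is a tie, and the choice of survivor is a convention that does not affect the barcode. Persistence formalizes this: every component has a birth at ε = 0 and a death at its merge distance. Writing δ₁ ≤ δ₂ ≤ ⋯ ≤ δ_{N−1} for the distances at which the N − 1 merges occur, the 0-th persistence diagram is
$$\mathrm{Dgm}_0 = \{(0, \delta_k) : k = 1, \dots, N-1\} \cup \{(0, \infty)\},$$
i.e. N − 1 finite bars and one immortal bar (the component that eventually contains every strategy). The barcode is this multiset drawn as horizontal segments.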
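The edge-by-edge procedure above can be sketched in a few lines. This is a minimal illustration and not the reference implementation: the name `h0_barcode` is hypothetical, the input is assumed to be a symmetric distance matrix, and the toy data (four points on a line) are chosen only so the merges are easy to follow by hand.

```python
import numpy as np

def h0_barcode(D):
    """Finite H0 bars [birth, death] from a symmetric (N, N) distance matrix."""
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    iu, ju = np.triu_indices(n, k=1)               # all pairs i < j
    order = np.argsort(D[iu, ju], kind="stable")   # edges in order of appearance
    bars = []
    for e in order:
        ri, rj = find(int(iu[e])), find(int(ju[e]))
        if ri != rj:                               # case (b): two components merge
            parent[rj] = ri
            bars.append((0.0, float(D[iu[e], ju[e]])))
            if len(bars) == n - 1:                 # everything is connected
                break
    return np.array(bars).reshape(-1, 2)

# Toy example: points at 0, 1, 3, 7 on a line, with absolute-difference distances.
points = np.array([0.0, 1.0, 3.0, 7.0])
D = np.abs(points[:, None] - points[None, :])
bars = h0_barcode(D)
print(bars[:, 1])   # deaths: 1.0, 2.0, 4.0
```

The edge of length 3 between the points at 0 and 3 falls under case (a): both endpoints are already connected through the point at 1, so it produces no bar. The immortal bar is implicit and not returned.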
Stable cluster count
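Long bars correspond to clusters that resisted absorption over a wide range of ε, meaning that any reasonable threshold inside that range splits the population the same way. Writing δ₁, …, δ_{N−1} for the finite death values, a standard heuristic for the “number of stable clusters” is
$$K^{*} = 1 + \#\{\,k : \delta_k > c \cdot \operatorname{median}(\delta_1, \dots, \delta_{N-1})\,\},$$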
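with c around 1.5–2.0; the +1 accounts for the immortal bar. Because d never exceeds 2, a large c can push the threshold beyond every attainable death value, so c should be checked against the scale of the median death; the demo and the reference snippet use c = 1.5. More principled alternatives include the bottleneck distance to a perturbed barcode under bootstrap resampling, which we report in the production pipeline.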
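A minimal sketch of the counting rule, assuming it means "count the finite deaths that exceed c times the median death, then add one for the immortal bar" (the rule used in the reference snippet). The function name `stable_cluster_count` is hypothetical, and the death values are idealized numbers chosen for illustration: 21 merges at 0.89 and 2 merges at 1.41.

```python
import numpy as np

def stable_cluster_count(deaths, c=1.5):
    """K* = 1 + number of finite deaths above c * median(deaths)."""
    deaths = np.asarray(deaths, dtype=float)
    if deaths.size == 0:           # a single strategy: only the immortal bar
        return 1
    threshold = c * np.median(deaths)
    return 1 + int(np.sum(deaths > threshold))

# Idealized barcode: 21 short bars and 2 long bars (illustrative values).
deaths = np.array([0.89] * 21 + [1.41] * 2)

print(stable_cluster_count(deaths, c=1.5))   # threshold 1.335 -> 3
print(stable_cluster_count(deaths, c=2.0))   # threshold 1.78  -> 1
```

The second call shows how sensitive the rule is to c on this distance scale: for these values any c above 1.41 / 0.89 ≈ 1.58 discards both long bars.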
Worked example
Three planted clusters of 8 strategies each, intra-cluster ρ = 0.6, over T = 250 observations. As T grows, the empirical correlations within a cluster converge almost surely to 0.6 (so d → √(2·0.4) ≈ 0.89) and the cross-cluster correlations to 0 (d → √2 ≈ 1.41). At finite T each merge happens at the smallest distance linking two components, so observed deaths sit somewhat below these limits. Expected barcode:
- 21 short bars dying near ε ≈ 0.89 (the 21 within-cluster merges).
- 2 long bars dying near ε ≈ 1.41 (the two between-cluster merges).
- 1 immortal bar.
K* = 1 + 2 = 3, matching ground truth: with c = 1.5 the threshold is 1.5 × 0.89 ≈ 1.34, which only the two long bars clear. The demo below reproduces this experiment.
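The planted-cluster experiment can be approximated offline with the sketch below. Everything in it is an assumption made for illustration: the one-factor Gaussian construction of each cluster, the seed, and the helper names. To keep it short it uses Prim's minimum-spanning-tree algorithm in place of union-find; the MST edge weights coincide with the finite H₀ death values.

```python
import numpy as np

def simulate_planted_returns(n_clusters=3, per_cluster=8, rho=0.6, T=250, seed=0):
    """Each cluster shares one Gaussian factor, giving pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_clusters):
        common = rng.standard_normal(T)
        noise = rng.standard_normal((per_cluster, T))
        rows.append(np.sqrt(rho) * common + np.sqrt(1.0 - rho) * noise)
    return np.vstack(rows)

def h0_deaths_via_mst(D):
    """Prim's algorithm on a dense distance matrix; returns sorted MST edge weights."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)      # cheapest link from the tree to each vertex
    best[0] = np.inf
    deaths = []
    for _ in range(n - 1):
        j = int(np.argmin(best))
        deaths.append(best[j])
        in_tree[j] = True
        best = np.minimum(best, D[j])
        best[in_tree] = np.inf     # never re-add a vertex already in the tree
    return np.sort(np.array(deaths))

X = simulate_planted_returns()
D = np.sqrt(np.clip(2.0 * (1.0 - np.corrcoef(X)), 0.0, None))
deaths = h0_deaths_via_mst(D)

# Split short from long bars at the largest gap between consecutive deaths.
gap_index = int(np.argmax(np.diff(deaths)))
n_long = len(deaths) - 1 - gap_index
k_star = 1 + int(np.sum(deaths > 1.5 * np.median(deaths)))   # heuristic with c = 1.5

print(np.round(deaths, 3))
print("long bars by largest gap:", n_long, "| K* with c = 1.5:", k_star)
```

Printing the deaths makes the finite-sample effect visible: the merge levels land below the asymptotic 0.89 and 1.41, so the c = 1.5 threshold can sit close to the long bars, and it is worth comparing K* against the largest-gap split.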
Demo — H0 persistence barcode
3 planted clusters of 8 strategies each, with pairwise correlation distance d(i, j) = √(2(1 − ρ)). The barcode shows the ε at which each connected component merges into another.
Each horizontal bar is one connected component, born at ε=0 and dying when it merges with another component. Amber bars are persistent — they survive past 1.5× the median merge-distance. Their count + 1 (the immortal component) ≈ the number of stable clusters in the population.
Why this matters for systematic strategies
Cluster count is an under-appreciated regime indicator. In a benign market the strategy population decomposes into several stable clusters of near-orthogonal styles, and long bars are common. In a stressed regime correlations rise across the board, the gap between within-cluster and cross-cluster merge distances narrows, and the barcode collapses to one or two long bars: everything trades together. The integer K* derived from the H₀ barcode can flag that collapse before it shows up in headline portfolio statistics.
Operationally we run M/03 alongside M/01 (RMT eigenspectrum). The count of signal eigenvalues above the Marchenko–Pastur (MP) edge gives an upper bound on the number of effective factors; the H₀ persistent-cluster count gives a lower bound on the number of effective styles. The two should track each other. When they diverge, usually because a single factor is driving everything, that is a flag.
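M/01 is not reproduced here, so the following is only a sketch of the eigenvalue side of that comparison. It assumes the common convention of counting eigenvalues of the sample correlation matrix above the Marchenko–Pastur upper edge (1 + √(N/T))². The function name `mp_signal_count` and the simulated three-cluster data are illustrative assumptions.

```python
import numpy as np

def mp_signal_count(X):
    """Count correlation-matrix eigenvalues above the Marchenko-Pastur upper edge."""
    X = np.asarray(X, dtype=float)
    n, t = X.shape
    eigvals = np.linalg.eigvalsh(np.corrcoef(X))
    lambda_plus = (1.0 + np.sqrt(n / t)) ** 2   # MP upper edge for unit-variance series
    return int(np.sum(eigvals > lambda_plus)), lambda_plus

# Illustrative data: 3 clusters of 8 strategies, intra-cluster correlation 0.6, T = 250.
rng = np.random.default_rng(0)
blocks = []
for _ in range(3):
    common = rng.standard_normal(250)
    noise = rng.standard_normal((8, 250))
    blocks.append(np.sqrt(0.6) * common + np.sqrt(0.4) * noise)
X = np.vstack(blocks)

n_signal, edge = mp_signal_count(X)
print(n_signal, round(edge, 3))
```

On data like this the signal-eigenvalue count can be set beside K* from the barcode; a persistent disagreement between the two is the divergence flag described above.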
Reproducibility
DaruFinance / strategy-tda
Python — open source reference implementation
Minimal invocation
import numpy as np
from strategy_tda import h0_barcode_from_returns

# X: (N, T) matrix of strategy returns, one row per strategy
bars = h0_barcode_from_returns(X)

# bars has shape (N-1, 2): one [birth, death] row per merge.
# Every H0 class is born at 0; long bars indicate stable clusters.
deaths = bars[:, 1]
threshold = 1.5 * np.median(deaths)  # c = 1.5, as in the demo
n_robust_clusters = int(np.sum(deaths > threshold)) + 1  # +1 for the immortal bar
References
- [1] Edelsbrunner, H., Letscher, D. & Zomorodian, A. (2002). Topological persistence and simplification. Discrete & Computational Geometry 28, 511–533.
- [2] Carlsson, G. (2009). Topology and data. Bulletin of the AMS 46, 255–308.
- [3] Gidea, M. & Katz, Y. (2018). Topological data analysis of financial time series: landscapes of crashes. Physica A 491, 820–834.
- [4] Mantegna, R. N. (1999). Hierarchical structure in financial markets. European Physical Journal B 11, 193–197.