Lab

M/02 — Higher Criticism + Model-X Knockoffs

Sparse signal detection with FDR control

Higher Criticism for global sparse-signal screening, paired with Model-X knockoffs for FDR-controlled variable selection on populations of strategies.

The mathematics

Higher Criticism

Suppose you observe p two-sided p-values from p hypothesis tests. Under the global null, the order statistics are distributed as the order statistics of p i.i.d. Uniform(0,1) variables, with j-th expected value j/(p+1). The Higher Criticism statistic compares the empirical CDF to its uniform expectation, scaled by the standard deviation of the binomial count under the null:

  HCⱼ = √p · (j/p − p₍ⱼ₎) / √((j/p)(1 − j/p)),   HC* = max { HCⱼ : 1 ≤ j ≤ αp },

with α typically taken as 1/2. Donoho & Jin (2004) show that under the rare-and-weak alternative (a sparsity fraction p^(−β) of non-nulls with common effect size √(2r log p), 0 < β, r < 1), HC is asymptotically optimal at the detection boundary that no thresholding rule can cross. As a side benefit, the argmax j* gives a data-driven cutoff: reject the smallest j* p-values.
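The computation is a few lines. A minimal sketch in Python (illustrative, not the lab's reference implementation; the function name is ours):

```python
import numpy as np

def higher_criticism(pvals, alpha=0.5):
    """HC_j = sqrt(p) * (j/p - p_(j)) / sqrt((j/p) * (1 - j/p)),
    maximised over the smallest alpha*p order statistics.
    Returns (HC*, j*): the global statistic and the argmax rank."""
    p = len(pvals)
    p_sorted = np.sort(pvals)                # order statistics p_(1) <= ... <= p_(p)
    frac = np.arange(1, p + 1) / p           # empirical CDF heights j/p
    hc = np.sqrt(p) * (frac - p_sorted) / np.sqrt(frac * (1 - frac))
    jmax = min(int(alpha * p), p - 1)        # j < p keeps the denominator nonzero
    j_star = int(np.argmax(hc[:jmax])) + 1   # 1-based rank of the argmax
    return hc[j_star - 1], j_star
```

To use it as a selector, reject the j* smallest p-values; to use it as a global test, compare HC* against a null quantile (simulated or asymptotic).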

Model-X knockoffs

For p features X₁, …, Xₚ with known joint distribution, a Model-X knockoff matrix X̃ satisfies two conditions:

  • Pairwise exchangeability: for any subset S ⊆ {1,…,p}, swapping Xⱼ and X̃ⱼ for every j ∈ S leaves the joint distribution unchanged: (X, X̃)_swap(S) =ᵈ (X, X̃).
  • Conditional independence from the response: X̃ ⊥ Y | X, i.e. the knockoffs are constructed without ever looking at Y.
Construct any feature-importance statistic Zⱼ for the original variable and Z̃ⱼ for its knockoff (e.g. lasso coefficients, regression z-scores). The knockoff statistic is

  Wⱼ = |Zⱼ| − |Z̃ⱼ|

(or any antisymmetric function of the pair); a large positive Wⱼ says the real feature beats its own knockoff.
Under the null, the signs of the Wⱼ are i.i.d. symmetric coin flips conditional on the magnitudes |Wⱼ|. The knockoff filter picks the smallest threshold t such that

  #{j : Wⱼ ≤ −t} / max(1, #{j : Wⱼ ≥ t}) ≤ q

and selects S = {j : Wⱼ ≥ t}. Theorem 3.4 of Candes et al. (2018) then gives finite-sample control of the modified false discovery rate

  mFDR = E[ #{j ∈ S : j null} / (|S| + 1/q) ] ≤ q.

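The filter itself is short. An illustrative Python sketch (names ours), taking the vector W as input:

```python
import numpy as np

def knockoff_threshold(W, q=0.10, offset=0):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
    offset=0: knockoff filter, controls the modified FDR;
    offset=1: knockoff+, controls the usual FDR."""
    for t in np.sort(np.unique(np.abs(W[W != 0]))):   # candidate thresholds
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf            # no qualifying t: select nothing

def knockoff_select(W, q=0.10, offset=0):
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))
```

The numerator counts sign-flipped statistics (Wⱼ ≤ −t); by the null sign-symmetry these estimate the number of false positives among {Wⱼ ≥ t}, which is what makes the ratio an FDP estimate.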
Worked example

p = 200 features; k = 10 truly non-null with effect A = 3.0; the remaining 190 are pure N(0, 1). Compare three procedures:

  • Naive |z| > 1.96: selects ~19 features (~9 true + ~10 null). The two-sided 5% test admits 0.05 × (p − k) ≈ 9.5 nulls in expectation, a count that grows linearly in p, so with k fixed the false-discovery proportion tends to 1.
  • Higher Criticism: selects the smallest j* p-values where j* = argmax HC. HC* decisively rejects the global null at A = 3, but the cutoff is tuned for detection rather than FDR-controlled selection, and it typically over-selects.
  • Knockoff filter at q = 0.10: selects ~9 features, expected ≤ 1 false discovery.
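The naive rule's numbers can be checked in closed form. A quick Python sanity check (the exact counts in any single demo run vary by seed):

```python
from math import erfc, sqrt

p, k, A, z_crit = 200, 10, 3.0, 1.96

Phi = lambda x: 0.5 * erfc(-x / sqrt(2))     # standard normal CDF

alpha = 2 * (1 - Phi(z_crit))                # two-sided size, ~0.05
power = Phi(A - z_crit) + Phi(-A - z_crit)   # P(|N(A,1)| > z_crit), ~0.85

exp_fp = alpha * (p - k)                     # ~9.5 expected null selections
exp_tp = power * k                           # ~8.5 expected true selections
exp_fdp = exp_fp / (exp_fp + exp_tp)         # ~0.53: over half the picks are false
```

The expected FDP of roughly 53% matches the ~52.6% observed in the demo run below.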

The demo below regenerates the experiment live. Note that as A → 0 (weak signal) the knockoff filter correctly returns the empty set rather than fabricating selections, a property no naive thresholding rule has.

Demo — Higher Criticism vs Model-X Knockoffs

p features, k true non-nulls with effect size A. Knockoffs control FDR at level q; HC chooses a data-driven threshold from the empirical p-value process.

Parameters: p = 200 (features), k = 10 (true non-nulls), A = 3.0 (signal strength), q = 0.10 (target FDR), seed = 11.

  Method                 selected   true pos   false pos   FDP      power
  |z| > 1.96 (naive)         19          9          10     52.6%     90.0%
  Higher Criticism           43         10          33     76.7%    100.0%
  Knockoffs @ q = 0.10        0          0           0      0.0%      0.0%

Higher Criticism: HC* = 3.50 at argmax j* = 43. Knockoffs: at this seed no threshold t qualifies, so the filter selects nothing.

[Histogram: Wⱼ = |Zⱼ| − |Z̃ⱼ|, with the knockoff threshold marked]

Histogram of W statistics. Amber overlay = true non-nulls. The vertical green line is the knockoff threshold +t; everything to the right is selected. Note that under H0, W is symmetric about 0 — that’s what makes the FDR control work.

Figures

W-statistic histogram with knockoff FDR threshold
Fig. 1: Knockoff W-statistic Wⱼ = |Zⱼ| − |Z̃ⱼ| on the BTC OOS strategy pool. Z is built from real Sharpes scaled by √n_trades and MAD-normalised; Z̃ is a sign-symmetric Gaussian knockoff. The amber dashed line is the data-dependent threshold τ that achieves the FDR target q = 0.10; selected strategies are everything to the right of τ.
Higher-Criticism statistic curve over rank
Fig. 2: Higher-Criticism statistic HCⱼ = √p · (j/p − p₍ⱼ₎)/√((j/p)(1 − j/p)), swept over the rank j of the sorted p-values from the same BTC pool. HC* is the maximum over j ∈ [1, p/2]. The dashed and dotted lines are the 95% and 99% quantiles of HC* under a uniform null, estimated from 400 simulated replicates of the same dimension.

Why this matters for systematic strategies

A strategy pool with p = 38,000 parameter combinations on a single asset will produce several thousand nominally significant t-statistics under any unadjusted threshold. The standard remedies in finance have been Bonferroni (too conservative: it kills real signals) and BH-FDR (valid under independence or positive dependence, but not under arbitrary dependence). Knockoffs are designed for the dependence structure that pools of trading strategies actually exhibit: they share market data, indicator families, and parameter neighbourhoods.

Combined use: run Higher Criticism on the pool first. If HC* is below its asymptotic null quantile, stop — there is no detectable signal at this sparsity. If HC* is above, run knockoffs to extract the responsible features at controlled FDR.
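Glued together, the gate looks like the sketch below (illustrative Python under a simplified null model: knockoff scores drawn as independent N(0,1), following the sign-symmetric Gaussian construction described in Fig. 1; all function names are ours):

```python
import numpy as np
from math import erfc, sqrt

def hc_star(pvals, alpha=0.5):
    """Higher-Criticism global statistic HC* (max over j <= alpha*p)."""
    p = len(pvals)
    frac = np.arange(1, p + 1) / p
    hc = np.sqrt(p) * (frac - np.sort(pvals)) / np.sqrt(frac * (1 - frac))
    return hc[: min(int(alpha * p), p - 1)].max()

def hc_null_q95(p, reps=400, seed=0):
    """95% quantile of HC* under the uniform null, by simulation."""
    rng = np.random.default_rng(seed)
    return float(np.quantile([hc_star(rng.uniform(size=p)) for _ in range(reps)], 0.95))

def screen_then_select(z, q=0.10, seed=0):
    """Step 1: HC gate on the pooled p-values. Step 2: knockoff filter."""
    p = len(z)
    pvals = np.array([erfc(abs(v) / sqrt(2)) for v in z])   # two-sided p-values
    if hc_star(pvals) <= hc_null_q95(p):
        return np.array([], dtype=int)       # no detectable sparse signal: stop
    rng = np.random.default_rng(seed)
    W = np.abs(z) - np.abs(rng.standard_normal(p))          # W_j = |Z_j| - |Z~_j|
    for t in np.sort(np.unique(np.abs(W[W != 0]))):
        if np.sum(W <= -t) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)
```

On a pool with no signal the gate stops the pipeline before any selection is attempted; on a pool with a strong sparse signal the knockoff step returns the responsible indices at target FDR q.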

Reproducibility

DaruFinance / hc-knockoffs

R — open source reference implementation

Minimal invocation

library(hc.knockoffs)

# Z: vector of per-strategy z-scores, length p
hc <- higher_criticism(Z)
hc$HC_star      # global statistic
hc$selected     # indices below the HC threshold

# Model-X knockoffs at FDR q = 0.10
sel <- knockoff_filter(Z, q = 0.10, method = "gaussian")
sel$tau         # data-dependent threshold
sel$selected    # selected feature indices

References

  1. Donoho, D. & Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32(3), 962–994.
  2. Barber, R. F. & Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Annals of Statistics 43(5), 2055–2085.
  3. Candès, E., Fan, Y., Janson, L. & Lv, J. (2018). Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. JRSS-B 80(3), 551–577.
  4. Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS-B 57(1), 289–300.