Article — 8 min read — 2026-05-13
The partition does the work
A pre-registered statistical bar, cleared. Three choices in partition-based portfolio construction beat rank-by-Sharpe on a 26-asset crypto corpus — after correcting for the many comparisons made.
The bar
Before any of this work was run, a target was written down: a portfolio construction that beats the standard "rank strategies by historical Sharpe and take the top twenty" baseline on a large crypto corpus, under a multiple-comparison correction strict enough that the result cannot be the brightest of many lottery tickets.
The bar is now cleared. After correcting for the many comparisons made (Holm's step-down procedure, controlling the family-wise error rate at 5%), a partition-based portfolio construction beats the historical-Sharpe baseline decisively. That includes the moving-average strategy families, which had previously been the hardest contest for any approach that tries to be coverage-aware.
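Holm's procedure is short enough to state in code. The sketch below is the textbook step-down rule in Python. The function name and example p-values are illustrative, and the article does not say how the per-comparison p-values themselves were computed.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns one reject flag per hypothesis, in input order, while
    controlling the family-wise error rate at `alpha`.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is tested at alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first failure: every larger p-value is retained too.
            break
    return reject


print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

With four p-values at alpha = 0.05, the sorted thresholds are 0.05/4, 0.05/3, 0.05/2 and 0.05/1. The two smallest p-values clear theirs, the third (0.03 against 0.025) does not, so the procedure stops there. This is what makes the bar strict: the smallest p-value in a large family has to beat alpha divided by the size of the whole family.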
This article describes what the construction is, what makes it work, and one honest revision to a claim made earlier in this research line.
The test corpus
The evaluation runs on the crypto subset of the lab's full strategy corpus: 26 assets spanning majors, large-caps, and mid-caps, crossed with seven indicator families (the public construction set used elsewhere on this site), for roughly 1.24 million backtested strategies in total. Every strategy is evaluated under strict walk-forward optimisation. No subset of the corpus is held out as "the good one"; the whole corpus runs through the same pipeline.
The baseline is the simplest reasonable thing a practitioner would do: at each rebalance window, rank every strategy in the pool by its historical Sharpe up to that window, take the top twenty, and hold them at equal weight until the next window. It is not a strawman; on this corpus it had been a stubborn target.
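A minimal sketch of that baseline, assuming each strategy's history is stored as a list of per-window returns indexed by window. The Sharpe here is a plain per-window mean over population standard deviation, with no risk-free rate and no annualisation, and the names `sharpe`, `top_k_by_sharpe` and `portfolio_return` are hypothetical; the article does not specify how its Sharpe was computed.

```python
from statistics import mean, pstdev


def sharpe(returns):
    """Per-window Sharpe: mean / population std, no risk-free rate, not annualised."""
    if len(returns) < 2:
        return float("-inf")  # too little history to score
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else float("-inf")


def top_k_by_sharpe(history, t, k=20):
    """Baseline weights for decision window t.

    history maps strategy id -> list of per-window returns (index = window).
    Only windows 0..t-1 are scored, so the choice for window t never sees
    window t itself. The k best are held at equal weight.
    """
    scores = {sid: sharpe(r[:t]) for sid, r in history.items()}
    chosen = sorted(scores, key=scores.get, reverse=True)[:k]
    return {sid: 1.0 / len(chosen) for sid in chosen}


def portfolio_return(weights, history, t):
    """Realised return of the held portfolio over window t."""
    return sum(w * history[sid][t] for sid, w in weights.items())
```

Because only `r[:t]` is scored, a spectacular or disastrous return in window t cannot influence the selection made for window t; it shows up only in `portfolio_return` for that window.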
Three orthogonal choices
The construction that clears the bar stacks three changes on top of the previous coverage-aware approach (one strategy per cluster, reliability-weighted). Each is a deliberate methodological decision, each contributes independently, and the combination is what wins.
1. Profit factor as the partition lens. The construction begins by carving the strategy population into clusters of behaviourally similar strategies: a partition. Which similarity metric defines that partition is a non-trivial choice. Sharpe is the usual default; here, profit factor (gross profit divided by gross loss) is used instead. Profit factor interacts more cleanly with the population's geometry, and the resulting partition reflects how strategies actually trade rather than how their realised volatilities happen to scale. The lens choice alone moves the result, but it is not enough by itself.
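To make the lens concrete, here is a sketch of profit factor and of one way a partition could be built from it. The clustering step is an assumption: the article does not disclose its clustering method, so this uses a simple greedy rule that groups strategies whose per-window profit-factor profiles correlate above a hypothetical threshold of 0.8. The cap for windows with no losing periods and the neutral value 1.0 for windows with no profits or losses are also hypothetical conventions.

```python
import math


def profit_factor(returns, cap=10.0):
    """Gross profit / gross loss over one window's period returns.

    Hypothetical conventions: capped at `cap` when there are no losses,
    and 1.0 (neutral) when the window has neither profits nor losses.
    """
    gross_profit = sum(r for r in returns if r > 0)
    gross_loss = -sum(r for r in returns if r < 0)
    if gross_loss == 0:
        return cap if gross_profit > 0 else 1.0
    return min(gross_profit / gross_loss, cap)


def pearson(x, y):
    """Pearson correlation; 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partition_by_profile(profiles, threshold=0.8):
    """Greedy partition: each strategy joins the first cluster whose seed
    profile correlates with its own at >= threshold, else seeds a new one.

    profiles maps strategy id -> list of per-window lens values
    (here, profit factors). Every strategy lands in exactly one cluster.
    """
    clusters = []  # list of (seed id, member ids)
    for sid, prof in profiles.items():
        for seed, members in clusters:
            if pearson(profiles[seed], prof) >= threshold:
                members.append(sid)
                break
        else:
            clusters.append((sid, [sid]))
    return [members for _, members in clusters]
```

Swapping the lens means swapping what goes into `profiles`. A Sharpe lens would feed per-window Sharpe ratios into the same clustering step; the profit-factor lens feeds `[profit_factor(w) for w in windows]` for each strategy instead.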
2. Temporal restriction to the most recent eight walk-forward windows. Reliability scoring inside each cluster uses only the most recent eight windows of out-of-sample history. Older history is informative about regimes that no longer exist; including it pollutes the reliability signal at the moment of decision. Eight windows is short enough to track the regime that matters and long enough to be statistically meaningful — and it was fixed before evaluation, not chosen to maximise the result.
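The article does not disclose its reliability score, so the sketch below uses a hypothetical stand-in, the share of positive out-of-sample windows, purely to show the temporal restriction: only the eight windows immediately before the decision window are ever read.

```python
RECENT_WINDOWS = 8  # fixed in advance, as in the article


def reliability(oos_returns, t, n_recent=RECENT_WINDOWS):
    """Hypothetical reliability score for decision window t.

    oos_returns holds one out-of-sample return per walk-forward window.
    Only windows t - n_recent .. t - 1 are read; anything older is ignored,
    and window t itself is never used.
    """
    recent = oos_returns[max(0, t - n_recent):t]
    if not recent:
        return 0.0
    return sum(1 for r in recent if r > 0) / len(recent)
```

A strategy that lost in windows 0 to 9 and won in every window from 10 onward scores 1.0 at t = 18, because its older losses sit outside the eight-window horizon. At t = 14 the horizon straddles the change and the score is 0.5.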
3. Strict-causal partition construction. This is the load-bearing change. The partition at window t is built using only information available before window t, so nothing about the partition at decision time can peek at the data the portfolio is about to act on. Earlier substrates in this research line had not enforced this strictly. Doing so changes the substrate the whole construction sits on and, as the revision below explains, also accounts for why an earlier claim turned out to be wrong.
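Strict causality is easiest to enforce structurally, by truncating every strategy's history before the partition builder ever sees it. In this sketch `build_partition` is a placeholder for whatever clustering the construction uses (that detail is reserved), and the data layout, a mapping from strategy id to per-window values, is an assumption.

```python
def strict_causal_partitions(per_window_data, first_t, build_partition):
    """Yield (t, partition) for each decision window t >= first_t.

    per_window_data maps strategy id -> list of per-window values.
    Every history is truncated to windows 0..t-1 before build_partition
    (a placeholder for the real, unpublished clustering step) sees it,
    so the partition used at window t cannot depend on window t or later.
    """
    n_windows = min((len(w) for w in per_window_data.values()), default=0)
    for t in range(first_t, n_windows):
        visible = {sid: w[:t] for sid, w in per_window_data.items()}
        yield t, build_partition(visible)
```

A leaking variant would hand the builder the untruncated histories; the article does not say exactly how earlier substrates leaked, only that they did not enforce this strictly. Truncating in the driver loop, rather than trusting each builder to ignore future windows, keeps the guarantee independent of the clustering code.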
Each choice contributes; the three stack. The result is a portfolio rule that, in the strong sense (Holm-corrected, FWER controlled at 5%), beats the historical-Sharpe baseline on the same pool, on the same windows, on the same evaluation metric. The previous coverage-aware approach — one strategy per cluster, weighted by within-cluster reliability — is dominated by the new construction on the same substrate.
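For comparison, the previous coverage-aware rule can be sketched as follows. This is one reading of "one strategy per cluster, weighted by within-cluster reliability": the most reliable member of each cluster is picked, and the picks are weighted in proportion to their reliability scores, which are assumed non-negative. The function name and the equal-weight fallback are hypothetical.

```python
def one_per_cluster(clusters, reliability):
    """Sketch of the previous coverage-aware rule.

    clusters: list of lists of strategy ids (a partition).
    reliability: strategy id -> non-negative reliability score (assumed).
    Picks the most reliable member of each cluster, then weights the picks
    in proportion to their scores.
    """
    picks = {}
    for members in clusters:
        if not members:
            continue  # skip empty clusters
        best = max(members, key=lambda sid: reliability[sid])
        picks[best] = reliability[best]
    total = sum(picks.values())
    if total <= 0:
        # Hypothetical fallback: equal weight if no pick has positive reliability.
        return {sid: 1.0 / len(picks) for sid in picks}
    return {sid: score / total for sid, score in picks.items()}
```

Coverage comes from the one-pick-per-cluster constraint: however good a single cluster looks, it contributes at most one strategy.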
Hierarchical risk parity and equal-weight, for reference
The picture is more lopsided against the conventional portfolio-construction baselines. Hierarchical risk parity and equal weighting of the same filtered pool are dominated by margins that are not close. This is not the interesting comparison — those baselines have known weaknesses on populations with very high effective dimensionality. The interesting comparison is the historical-Sharpe ranker, because that is the rule a competent practitioner would default to. That is the one the construction has to beat to count as a real result. It does.
The bar wasn't beating equal-weight. The bar was beating the rule a working practitioner would default to without thinking.
An honest revision
An earlier note in this research line had explained the historical-Sharpe baseline's stubborn performance as accidental concentration: the ranker happened to pile into whichever cluster was carrying the regime, and that concentration was the source of its lift. That explanation was an artifact of the partition substrate used before strict causality was enforced.
Under strict-causal partition construction, where the substrate cannot leak future information into the partition at decision time, the historical-Sharpe ranker no longer concentrates in the way the earlier note described. It is the new partition-based approach that organises the population in a way the baseline cannot replicate, and on that substrate the lens choice and the temporal restriction are what carry the work.
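One way to check a concentration story like this on any substrate is to measure how a selection spreads across clusters. The sketch uses a Herfindahl index over cluster shares; it is a hypothetical diagnostic, not the measure used in this research line, which the article does not name.

```python
from collections import Counter


def cluster_concentration(selected, cluster_of):
    """Herfindahl index of a selection's spread across clusters.

    selected: strategy ids held in the portfolio.
    cluster_of: strategy id -> cluster label.
    Returns 1.0 when every pick shares one cluster and 1/n when the picks
    are spread evenly over n clusters.
    """
    if not selected:
        return 0.0
    counts = Counter(cluster_of[sid] for sid in selected)
    n = len(selected)
    return sum((c / n) ** 2 for c in counts.values())
```

A top-twenty selection split five per cluster over four clusters scores 0.25; one that piles entirely into a single cluster scores 1.0. Tracking this per window, against partitions built with and without strict causality, is one way to test whether apparent concentration survives once leakage is removed.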
What this leaves
The locus of detectable edge in this corpus is not any single strategy and is not the act of ranking by Sharpe. It is a portfolio construction that picks the right metric for grouping strategies, restricts reliability scoring to the windows that still matter, and refuses to look at future data when building the substrate the rule depends on. Each of those three choices is methodological, not magical. Each is testable on a different corpus by anyone willing to do the work.
This sits within the lab's broader empirical setting, which the SSRN working paper establishes at length and the article "Edge is in the Process" develops at the population level. This piece is a working note reporting that the methodology developed since then clears the pre-registered statistical bar that earlier phases of this research line had been chasing.
The construction is described above in enough detail to be falsifiable in principle. The specific implementation that produced the numbers — the noise model, the threshold calibration, the cluster-aggregation rule — remains reserved.

