Project B · Open-source artefact · Engine + paper drafted
Quant Research Framework
A research-grade systematic-trading framework anyone can use to generate strategies, rule-based or ML, with property-tested walk-forward optimisation, a built-in lookback optimiser that rotates per regime, a customisable regime segmenter, and a 5-scenario robustness suite. Ships as a Python reference and a spec-equivalent Rust port held to a published numerical-parity contract on every PR.
Two engines, one specification
Not a Python wrapper around Rust: two independent implementations of the same backtester
A common pattern in quant tooling is a Python facade over a Rust hot loop. This framework is something different: two complete, standalone engines, one in Python, one in Rust, that implement the same algorithmic specification independently and are held to identical numerical output by a parity protocol that runs on every PR.
Pick the language that fits your stack. If you live in pandas, numpy, scikit-learn, and notebooks, install the Python reference (pip install quant-research-framework) and you have a complete research engine (strategy contract, walk-forward, regime, robustness suite, metric ledger, Monte Carlo and Deflated-Sharpe diagnostics) running natively in Python, with numba JIT inside the trade-execution loop.
If you're shipping to production, want a single static binary, or are running inside a Rust trading stack, install the Rust port (cargo add quant-research-framework-rs). It is not a Python binding: it's an independent re-implementation of the same specification with a Vec<Bar> data path and a Rust-native strategy contract fn(&[Bar], usize) → Vec<i8>.
Both engines compute their indicator ladder in IEEE 754 double precision. Everything else differs: memory layouts, rounding regimes (LLVM-via-rustc vs LLVM-via-numba), and RNGs. The fact that their printed metrics agree across 210 verified points within the declared 10⁻³ relative tolerance, with a worst-case observed deviation of 5×10⁻⁵, is exactly what the parity protocol exists to demonstrate. A bug in one implementation has to either match a corresponding bug in the other or get caught.
What sets this apart
What this framework gets right that other backtesters don't
Every component in this framework exists somewhere in the open-source ecosystem. The contribution is the combination, and the discipline that holds the combination together. The table below summarises a survey we did of six widely used backtesters in April 2026 along the four axes that matter most for research-grade work.
Built-in walk-forward optimisation, per-regime lookback selection, strict no-look-ahead enforced by automated property tests at the trade-ledger level, and cross-language parity testing: that combination is, to the best of our knowledge, unique to this framework. Each individual axis is implemented somewhere (QuantConnect Lean ships WFO, vectorbt has a sophisticated splitter API, NautilusTrader runs a Python+Rust dual stack), but none of them puts all four together with a published parity contract.
The other axis worth noting: every realism control (fees, slippage, funding, intrabar SL/TP, session windows, forex pip sizing, partial leverage caps) is a parameter of the engine, not of the strategy. A strategy author cannot forget to apply slippage or skip funding charges; the engine applies them to every trade in every WFO window in every robustness overlay.
Feature comparison · April 2026
| Framework | License | Built-in WFO | Per-regime LB | Strict-LAH tests | Cross-lang parity |
|---|---|---|---|---|---|
| this work (Py + Rs) | MIT | ✓ | ✓ | ✓ | ✓ |
| vectorbt | Apache + CC | ✓ (Splitter) | - | - | n/a |
| backtrader | GPL-3.0 | - (community) | - | - | n/a |
| NautilusTrader | LGPL-3.0 | - (engine only) | - | - | - (bilingual; no parity) |
| zipline-reloaded | Apache-2.0 | - (3rd-party) | - | - | n/a |
| QuantConnect Lean | Apache-2.0 | ✓ | - | - | n/a |
| bt | MIT | - | - | - | n/a |
Strict-LAH tests = strict no-look-ahead enforced by automated property tests at the trade-ledger level.
Verified against primary documentation, April 2026. The combination WFO + per-regime LB + strict-LAH + cross-language parity is, to the best of our knowledge, unique to this work.
Empirical speed
Same workload, six engines, one timing protocol
100 strategies sampled from the framework's combinatorial generator, each taken end-to-end through an in-sample lookback sweep, an RRR optimisation, and an out-of-sample evaluation. Same SOLUSDT 1-hour bars, same costs (fees + slippage where the engine supports it), same single-thread budget, signal generation and the backtest engine both inside the timed region.
Single-thread wall, 100 strategies, SOLUSDT 1h
| Engine | Wall time | vs Rust port |
|---|---|---|
| this work (Rust port) | 6.3 s | 1×, fastest |
| this work (Python) | 37.5 s | 6× slower |
| vectorbt | 4.1 min | 39× slower |
| backtesting.py | 23.7 min | 226× slower |
| backtrader | 13.4 h | 7,670× slower |
| bt | 37.2 h | 21,340× slower |
Single-thread, end-to-end (signal generation + backtest engine). Identical strategies, identical IS/OOS slicing, identical fee + slippage assumptions; funding is charged only where the engine supports it out of the box.
Backtrader and bt were parallelised to 15 workers to fit in clock time; the single-thread-equivalent total is reported here for a like-for-like comparison. bt is a portfolio-rebalancing framework rather than a directional-trade simulator and is included only to bound the slowest end of the comparison.
Trust model
Why the numbers can be trusted
Reproducibility is a four-layer claim: the algorithm has tests that catch the failure modes that matter, two independent implementations agree on the output, the methodology cites and applies the published statistical literature, and every artefact is openly licensed and DOI-archived so a third party can rebuild and re-verify everything.
01 · Property-tested invariants
Hypothesis-based property tests verify the no-look-ahead invariant at the trade-ledger level: every trade's entry index must be strictly greater than the index at which the parameter was fit. A separate detector-fit invariant test pins the default regime detector as causal under bar-tail perturbations, and explicitly flags KMeans / vol-quantile / trend-vol detectors as documented anti-patterns rather than letting their leak go silent.
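The two invariants can be illustrated with a minimal standard-library sketch. It is not the framework's test suite (which is Hypothesis-based and runs against the real trade ledger); `ema_regime`, `leaky_median_regime`, `is_causal`, and `ledger_has_no_lookahead` are hypothetical names chosen for illustration, and the full-sample median detector stands in for the kind of leak the documented anti-patterns exhibit.

```python
from statistics import median

def ema_regime(closes, span=200):
    """Causal detector: the label at bar i depends only on closes[0..i]."""
    alpha = 2.0 / (span + 1)
    ema, labels = closes[0], []
    for c in closes:
        ema = alpha * c + (1 - alpha) * ema
        labels.append(1 if c >= ema else 0)
    return labels

def leaky_median_regime(closes):
    """Anti-pattern: the threshold is fit on the whole sample, future bars included."""
    threshold = median(closes)
    return [1 if c >= threshold else 0 for c in closes]

def is_causal(detector, closes, cut, factors=(0.5, 2.0)):
    """Rescale every bar after `cut`; labels up to and including `cut` must not move."""
    base = detector(closes)[: cut + 1]
    for f in factors:
        perturbed = closes[: cut + 1] + [c * f for c in closes[cut + 1 :]]
        if detector(perturbed)[: cut + 1] != base:
            return False
    return True

def ledger_has_no_lookahead(entry_indices, fit_index):
    """Every entry must come strictly after the bar the parameter was fit on."""
    return all(i > fit_index for i in entry_indices)

closes = [float(x) for x in range(1, 21)]
print(is_causal(ema_regime, closes, cut=9))           # causal by construction
print(is_causal(leaky_median_regime, closes, cut=9))  # tail changes move past labels
print(ledger_has_no_lookahead([11, 15], fit_index=10))
```

A property-based version would draw `closes`, `cut`, and the perturbation at random instead of fixing them, which is what lets a test of this shape cover cases nobody thought to write down.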
02 · Cross-language parity
210 metric points verified across three configuration surfaces (default · regime + WFO · forex) at 10⁻³ relative tolerance. Worst observed deviation across all 210 points is 5×10⁻⁵, twenty times tighter than the declared band. The parity harness runs on every PR; a bug that survives this bar has to be mirrored by a corresponding bug in the other engine, or it gets caught.
03 · Citations to the published literature
The accompanying preprint (49 pages, 12 internal review rounds, avg score 7.0 → final 8/8/9) cites 24 references across the Deflated Sharpe Ratio (Bailey & López de Prado), Combinatorial-Symmetric CV / PBO (Bailey-Borwein-LdP-Zhu), the multi-testing literature (Harvey-Liu, Bonferroni / Holm / BHY), the bootstrap-data-snooping family (White's Reality Check, Romano-Wolf step-down), and the regime-switching foundations (Hamilton, Ang-Timmermann). The diagnostics shipped with the engine implement these methods, not approximations of them.
04 · Open, DOI'd, and re-verifiable
Both artefacts are MIT-licensed and DOI-archived on Zenodo (10.5281/zenodo.19798594 for Python · 10.5281/zenodo.19798592 for Rust). The paper repository ships requirements-paper.txt with pinned Python deps and a Makefile target make verify that re-runs the parity harnesses against pinned sibling-clone SHAs and regenerates the per-metric residuals CSV. Reproducing every figure and every parity claim in this work takes 5–10 minutes on a recent laptop.
Inside the optimiser
What makes the optimiser more effective than a grid sweep
Most backtesters expose a parameter grid and let the user run an exhaustive sweep. This framework adds three things on top of that: a coarse-then-fine search that halves the search cost without losing the optimum, a smart-optimisation guard that rejects sharp PF peaks before they overfit the in-sample window, and an RRR probe that picks the risk-to-reward ratio maximising total realised R per WFO window.
The coarse pass sweeps the lookback range {12, 14, ..., 76} on a step-2 grid (33 candidates), evaluates the strategy on each, and ranks by the configured optimiser metric (Sharpe by default; Profit Factor, Expectancy, MaxDD also configurable). The top coarse candidate then triggers a fine-tune pass on its ±1 neighbourhood (three additional evaluations), so about 36 evaluations replace the 65 of an exhaustive step-1 sweep over the same range. We have not seen the fine-tune step disagree with the exhaustive optimum on any tested dataset.
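A minimal sketch of the coarse-then-fine idea, assuming a higher-is-better scoring callable. `coarse_then_fine` and the toy objective are hypothetical; in the engine the score would come from a full backtest under the configured optimiser metric.

```python
def coarse_then_fine(score, lo=12, hi=76, step=2):
    """Return (best_lb, n_evaluations) for a higher-is-better `score(lb)`."""
    cache = {}

    def evaluate(lb):
        if lb not in cache:
            cache[lb] = score(lb)
        return cache[lb]

    coarse = list(range(lo, hi + 1, step))   # 12, 14, ..., 76: 33 candidates
    top = max(coarse, key=evaluate)
    # Fine pass: the coarse winner and its +/-1 neighbours, clipped to the range.
    fine = [lb for lb in (top - 1, top, top + 1) if lo <= lb <= hi]
    best = max(fine, key=evaluate)
    return best, len(cache)

# Toy objective whose optimum (lb = 41) falls between two step-2 grid points.
best_lb, n_evals = coarse_then_fine(lambda lb: -(lb - 41) ** 2)
print(best_lb, n_evals)
```

Because this sketch caches scores, re-visiting the coarse winner in the fine pass costs nothing extra; an implementation without a cache would simply re-score it.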
The smart-optimisation guard sits in front of the coarse-then-fine result. A candidate whose Profit Factor exceeds the median PF of its immediate neighbours by more than 10% is rejected as a sharp peak and the runner-up is promoted. The 10% threshold is honestly disclosed as folk engineering: it's the value at which a qualitative reduction in OOS overfitting was observed on a panel of synthetic strategies. The guard's on/off behaviour is part of the parity surface, so anyone can flip it and see the effect.
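The guard can be sketched as follows. Several details here are assumptions rather than documented behaviour: `smart_pick` is a hypothetical name, "immediate neighbours" is read as the adjacent grid points on either side, the guard is re-applied to each promoted runner-up, and the top candidate is kept if every candidate turns out to be a peak. For brevity the ranking below is by Profit Factor itself.

```python
from statistics import median

def smart_pick(pf_by_lb, ranked, step=2, threshold=0.10):
    """Walk the ranking and skip any candidate that towers over its neighbours."""
    for lb in ranked:
        neighbours = [pf_by_lb[n] for n in (lb - step, lb + step) if n in pf_by_lb]
        if not neighbours:
            return lb                    # nothing to compare against
        if pf_by_lb[lb] <= (1 + threshold) * median(neighbours):
            return lb                    # sits on a plateau: accept
    return ranked[0]                     # assumption: keep the top one if all are peaks

pf = {20: 1.30, 22: 1.32, 24: 1.95, 26: 1.31, 28: 1.35, 30: 1.33}
ranked = sorted(pf, key=pf.get, reverse=True)    # best Profit Factor first
choice = smart_pick(pf, ranked)
print(choice)   # lb=24 is an isolated spike, so the runner-up lb=28 is promoted
```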
The RRR probe is the third stage. For each candidate RRR ∈ {1, 2, 3} (classic) or {1, 2, 3, 4, 5} (regime-path), the engine recomputes the realised R per trade: min(peak_R, RRR) for trades that would have hit the candidate TP first (capped profit), and the actual close-R for trades that exited before reaching it. The chosen RRR★ is the one that maximises ΣR. This is what lets a strategy author plug in a single rule and have the engine find a sensible risk-to-reward fit rather than guess.
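A small worked sketch of the probe, under one reading of the rule: a trade books the candidate RRR once its peak favourable excursion reaches the take-profit, and its actual close-R otherwise (with stopped-out trades already carrying their negative close-R). The function names and the toy trades are hypothetical.

```python
def realised_r(peak_r, close_r, rrr):
    """R a trade would have booked had the take-profit sat at `rrr`."""
    # When the peak reaches the target, min(peak_r, rrr) is just rrr.
    return float(rrr) if peak_r >= rrr else close_r

def probe_rrr(trades, candidates=(1, 2, 3)):
    """trades: (peak_r, close_r) pairs from one window. Returns (rrr_star, totals)."""
    totals = {rrr: sum(realised_r(p, c, rrr) for p, c in trades) for rrr in candidates}
    return max(totals, key=totals.get), totals

trades = [(2.5, 0.5), (0.5, -1.0), (3.25, 1.0), (1.25, -0.5), (0.75, 0.5)]
rrr_star, totals = probe_rrr(trades)
print(rrr_star, totals)   # 2 {1: 2.5, 2: 3.0, 3: 2.5}
```

A tight target (RRR = 1) caps the winners too early and a wide one (RRR = 3) lets the 2.5R trade slip back to its close, so the middle candidate collects the most total R on this toy ledger.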
- All three stages run inside every WFO sub-window, never on the full sample, so the no-look-ahead invariant is preserved at the ledger level.
- Per-regime lookback selection runs the same three stages restricted to bars whose label is r, producing a vector {lb★_r} of length equal to the regime-label count. In OOS the engine rotates the slow-EMA span bar-by-bar.
- All optimiser knobs are Config fields. There are no hidden defaults inside numba kernels or Rust closures; flipping any of them is a one-line change, reproducible against the parity record.
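To make the sub-window point concrete, here is a sketch of a rolling walk-forward splitter. The fixed-length rolling scheme, the parameter names, and the bar counts are assumptions for illustration; the engine's candle-triggered WFO may slice differently. What the sketch does show is why fitting only inside each in-sample slice keeps every out-of-sample entry strictly after the last bar used for fitting.

```python
def walk_forward_windows(n_bars, is_len, oos_len, warmup=0):
    """Yield (is_start, is_end, oos_start, oos_end) as half-open index ranges."""
    start = warmup
    while start + is_len + oos_len <= n_bars:
        is_end = start + is_len
        yield start, is_end, is_end, is_end + oos_len
        start += oos_len                 # roll forward by one OOS block

windows = list(walk_forward_windows(n_bars=1_300, is_len=400, oos_len=150, warmup=200))
for is_start, is_end, oos_start, oos_end in windows:
    last_fit_bar = is_end - 1            # final bar the optimiser stages may see
    assert oos_start > last_fit_bar      # OOS trading starts strictly afterwards
    print(f"fit on [{is_start}, {is_end})  trade on [{oos_start}, {oos_end})")
```

With these numbers the splitter yields four windows whose out-of-sample blocks tile bars 600 to 1200 without overlap.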
Per-regime parameter rotation
Most backtesters expose a parameter and let the user pick it. A few expose a parameter grid and let the user run a sweep. Almost none ship a built-in optimiser, and the ones that do don't optimise per regime. This framework does both: it sweeps the lookback range inside the in-sample window, restricted to bars whose regime label is r, and produces a separate lb★_r per regime. In OOS the engine then rotates the slow-EMA span bar-by-bar according to the live regime label, so the strategy runs with the lookback that performed best on the bars that actually look like the current regime.
The whole machinery (coarse-then-fine sweep, smart-optimisation guard, RRR probe, per-regime restriction, OOS rotation) runs without any user intervention. A strategy author writes the (df, lb) → int8[] function and configures the regime detector (default 8-bar EMA-200; pluggable for vol-quantile, KMeans, or a custom function). Everything else is the engine's job. There's an empty-regime safety: if a regime has fewer than min_trades inside the IS window, its lb★_r falls back to the global IS optimum so the rotation never silently produces zero-trade behaviour.
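The fallback and the rotation can be sketched in a few lines. This assumes the per-regime optimum and the in-sample trade count per regime have already been computed; the function names and the numbers are hypothetical.

```python
def per_regime_lookbacks(best_by_regime, trades_by_regime, global_lb, regimes, min_trades):
    """Pick lb*_r per regime, falling back to the global IS optimum on thin regimes."""
    return {
        r: best_by_regime[r] if trades_by_regime.get(r, 0) >= min_trades else global_lb
        for r in regimes
    }

def rotate(live_labels, lb_by_regime):
    """OOS: the lookback in force on each bar follows that bar's regime label."""
    return [lb_by_regime[r] for r in live_labels]

lb_by_regime = per_regime_lookbacks(
    best_by_regime={0: 20, 1: 48, 2: 66},
    trades_by_regime={0: 35, 1: 4, 2: 60},   # regime 1 has too few IS trades
    global_lb=34,
    regimes=(0, 1, 2),
    min_trades=10,
)
print(lb_by_regime)                          # {0: 20, 1: 34, 2: 66}
print(rotate([0, 0, 1, 2, 2, 1], lb_by_regime))
```

Regime 1 inherits the global lookback of 34 rather than its own thinly supported 48, so bars labelled 1 still trade on a parameter backed by enough in-sample evidence.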
Phased roadmap
9 done · 0 in flight · 0 planned
- Done · quant-research-framework v0.4.0: pip-installable MIT backtester with WFO, regime segmentation, and a 5-scenario robustness suite ready out of the box.
- Done · quant-research-framework-rs v0.3.3: spec-driven Rust engine with identical strategy semantics, ready for production deployment.
- Done · Two engines, one specification: 210 metric points across 3 surfaces (default · regime+WFO · forex) verified at 10⁻³ relative tolerance.
- Done · The Rust port is fast enough for production: ~37× lower whole-process memory and a 25–60× end-to-end speedup on the bundled bench.
- Done · 49-page LaTeX preprint, 12 internal review rounds (avg 7.0 → final 8/8/9). arXiv submission pending.
- Done · ARM64 cross-compile under QEMU: every deterministic metric byte-identical to x86_64 across all six datasets. Native-ARM and QEMU jobs both wired into CI.
- Done · Pairs, baskets, beta-hedged portfolios over a redesigned per-leg ledger. Shipped in v0.5.0; eight parity surfaces agree within 1e-3, single-asset output byte-unchanged.
- Done · Regime + WFO + forex + session. Closed: a harness knob mismatch plus three session-end-bar divergences in the Rust core, now aligned. The combo is byte-exact and gated in CI (v0.5.0).
- Done · Deflated Sharpe Ratio is now mirrored in Rust and cross-language parity-verified (within 1e-3, and below 1e-9 on finite cases), bringing the diagnostic block to feature parity with the engine.
Phase 01 · Done · v0.4.0 · MIT
Python reference engine
The Python reference is the framework anyone can install today and use to generate strategies, rule-based or ML, under a research-grade pipeline. You write a function that takes (df, lb) and returns int8 signals in {−1, 0, +1}; the framework owns optimisation, walk-forward orchestration, regime rotation, the 5-scenario robustness overlay suite, and metric computation. The numba JIT inner loop applies fees, slippage, funding (crypto-only), intrabar SL/TP, session windows, and forced-close logic over a typed trade ledger, so the simulated PnL is what a deployable execution layer would actually see.
Seven major subsystems ship in the box (data loader, indicator core, strategy contract, execution core, optimiser, walk-forward orchestrator, regime engine), plus the Monte Carlo and Deflated-Sharpe diagnostic blocks. The v0.4.0 release detangled Config from globals, which is what lets two backtests coexist in the same interpreter and is also what unblocked the cross-language parity protocol downstream.
- Install: pip install quant-research-framework. DOI: 10.5281/zenodo.19798594. MIT licensed.
- Strategy contract is deliberately minimal: (df, lb) → int8[] in {−1, 0, +1}. Same surface for hand-written rules and for ML models.
- Bundled CSVs to make the first backtest a one-liner: SOLUSDT 1h (48,094 bars) and EURUSD 1h (53,160 bars).
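A sketch of what a strategy satisfying the (df, lb) → int8[] contract might look like. To stay dependency-free it reads `df["close"]` by duck typing (a pandas DataFrame or a plain dict both work) and returns a standard-library `array("b")`, which is a signed 8-bit buffer standing in for a numpy int8 array; a real strategy for the framework would presumably return numpy. The `ema_cross` rule, the `close` column name, and the `fast` parameter are illustrative assumptions, not part of the documented API.

```python
from array import array

def ema(values, span):
    """Recursive EMA seeded with the first value; uses only past and current bars."""
    alpha = 2.0 / (span + 1)
    out, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out

def ema_cross(df, lb, fast=5):
    """Desired state per bar: +1 long, -1 short, 0 flat. `lb` is the slow-EMA span."""
    closes = [float(c) for c in df["close"]]
    fast_ema, slow_ema = ema(closes, fast), ema(closes, lb)
    signals = array("b", [0] * len(closes))      # 'b' = signed 8-bit integers
    for i in range(lb, len(closes)):             # stay flat for the first lb bars
        if fast_ema[i] > slow_ema[i]:
            signals[i] = 1
        elif fast_ema[i] < slow_ema[i]:
            signals[i] = -1
    return signals

# Toy series: 30 rising bars followed by 30 falling bars.
bars = {"close": [100 + i for i in range(30)] + [129 - 2 * i for i in range(30)]}
signals = ema_cross(bars, lb=12)
print(list(signals))
```

The function returns levels (the state the strategy wants to hold on each bar), not entries and exits; fees, slippage, stops, and sizing never appear in it because the engine owns them.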
Phase 02 · Done · v0.3.3 · MIT
Rust port, spec-driven re-implementation
The Rust port gives you the same engine (same WFO, same regime segmenter, same robustness suite, same strategy semantics) in a binary you can drop into a production deployment without a Python interpreter. It is a separate implementation of the same specification, not a transliteration: different memory layout, different rounding regime (LLVM-via-rustc vs LLVM-via-numba), different RNG. The point is to exercise the algorithmic spec with two engines whose only shared lineage is the spec itself.
Strategy contract mirrors Python exactly: fn(&[Bar], usize) → Vec<i8>. The flip-detection layer translates raw signal levels into entry/exit codes; this separation lets strategy authors specify desired state (long/short/flat) rather than transitions, which is much easier to keep no-look-ahead-correct.
- DOI: 10.5281/zenodo.19798592. crates.io: cargo add quant-research-framework-rs.
- Dependencies are deliberately tiny and pinned: rand =0.9 (exact), chrono 0.4, chrono-tz 0.10. The exact rand pin is documented because rand 0.10 makes random_range a trait method, breaking call sites.
- Rust binary needs no warm-up; Python pays a one-time numba JIT cache cost before the first measured run.
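The flip-detection idea described above (strategies state the position they want, and a separate layer derives the transitions) can be sketched as below. It is written in Python for consistency with the other examples; the event names are hypothetical, and the sketch says nothing about whether the engine fills on the signal bar or the next one.

```python
FLAT, LONG, SHORT = 0, 1, -1

def detect_flips(levels):
    """Turn desired-state levels into (bar, action) events."""
    events, position = [], FLAT
    for i, want in enumerate(levels):
        if want == position:
            continue                         # state unchanged: nothing to do
        if position != FLAT:                 # close whatever is open first
            events.append((i, "exit_long" if position == LONG else "exit_short"))
        if want != FLAT:                     # then open the new side, if any
            events.append((i, "enter_long" if want == LONG else "enter_short"))
        position = want
    return events

events = detect_flips([0, 1, 1, -1, -1, 0, 1])
for bar, action in events:
    print(bar, action)
```

A long-to-short flip becomes an exit followed by an entry on the same bar, which is the bookkeeping a strategy author would otherwise have to get right by hand.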
Phase 03 · Done · evidence the two engines are the same engine
Cross-language parity protocol
Parity is the protocol that lets us claim Python and Rust are running the same backtester rather than two backtesters that happen to look alike. Three deterministic configuration surfaces are verified end-to-end on every PR: default config (56 metric points · candle-trigger WFO + smart-optimised lookback + auto-RRR), regime + WFO (98 metric points · per-regime LB optimisation + OOS LB rotation + 200-bar warmup + 4 WFO windows), and forex mode (56 metric points · pip-scaled position sizing + forex-mode PnL capping).
Tolerance is 10⁻³ relative; the worst observed deviation across all 210 metric points is 5×10⁻⁵, twenty times tighter than declared, which we take as evidence that algorithmic equivalence, not numerical accumulation, is binding. The 156-point paper-time replication matched bit-for-bit at %.4f printed-rounding precision.
- Re-running parity is one command per surface, total runtime ≈ 5–10 min on a recent laptop.
- Disabled by design: Monte Carlo percentiles (different RNG implementations between engines). The previously-divergent INDICATOR_VARIANCE overlay was closed during paper preparation by adding IND_VARIANCE_SEED = 42 to both engines.
- Honestly disclosed unverified scope: Windows MSVC. The four-way regime+WFO+forex+session combination is now byte-exact and gated in CI (Phase 08), and cross-architecture parity on aarch64 is CI-verified (Phase 06).
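For readers who want to see what a relative-tolerance parity check amounts to, here is a minimal sketch. It is not the shipped harness: the metric names and values are made up, and scaling the difference by the larger magnitude is one common definition of relative deviation, assumed here because the text does not spell out the exact formula.

```python
def relative_deviation(a, b):
    """|a - b| scaled by the larger magnitude; 0.0 when both are exactly zero."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale

def parity_report(py_metrics, rs_metrics, tol=1e-3):
    """Compare two {metric: value} dicts; return (passed, worst_metric, worst_dev)."""
    if py_metrics.keys() != rs_metrics.keys():
        raise ValueError("engines reported different metric sets")
    devs = {k: relative_deviation(py_metrics[k], rs_metrics[k]) for k in py_metrics}
    worst = max(devs, key=devs.get)
    return devs[worst] <= tol, worst, devs[worst]

# Hypothetical printed metrics from the two engines.
py = {"sharpe": 1.2345, "profit_factor": 1.8100, "max_dd": -0.1520, "trades": 412}
rs = {"sharpe": 1.2345, "profit_factor": 1.8101, "max_dd": -0.1520, "trades": 412}
passed, worst, dev = parity_report(py, rs)
print(passed, worst, f"{dev:.2e}")
```

Reporting the worst deviation alongside the pass/fail flag is what makes a statement like "twenty times tighter than declared" checkable rather than anecdotal.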
Phase 04 · Done · published bench harness
Performance, the Rust port is fast enough for production
The point of the perf phase is that the Rust engine is operationally usable in a production deployment, not that the Python reference is slow. On the bundled bench the Rust port runs 25–60× faster end-to-end (replicated as 24–62× on the paper-time host) with ≈ 37× lower whole-process peak RSS. The engine-attributable memory delta is plausibly only 3–5× once Python's pandas/numpy/numba interpreter baseline is subtracted, and the paper reads the headline with that caveat.
The speedup comparison is meaningful precisely because parity holds: without parity, a faster re-implementation might just be doing less work; with parity, you are timing two implementations of the same algorithm.
- Benchmarks run on bundled SOLUSDT 1h slices at 15k / 25k / 35k / 48k bars, three warm runs (with one warm-up discarded for numba JIT cache).
- The full default pipeline runs end-to-end: IS/OOS baseline, smart-optimised lookback search with auto-RRR, candle-triggered walk-forward, Monte Carlo diagnostics, and the four v0.1.x robustness overlays.
Phase 05 · Done · 49 pages · ~880 KB PDF
Reproducible-walk-forward-backtester preprint
The accompanying paper documents the architecture, the WFO + regime + robustness pipeline, the parity methodology, the performance comparison, and a reproducibility blueprint suitable for academic publication. It positions the framework against six widely used open-source backtesters and the relevant statistical literature; the contribution is the combination plus the parity protocol, not new statistical machinery.
Twelve internal review rounds took the paper from an average reviewer score of 7.0 to a final 8/8/9. The paper repository ships requirements-paper.txt with pinned Python deps, a Makefile whose make verify target re-runs the parity harnesses against pinned sibling-clone SHAs, and a generated parity_residuals.csv that backs every figure.
- Full source: github.com/DaruFinance (paper repo + python repo + rust repo as siblings).
- Both artefacts have separate Zenodo DOIs; the paper repository will receive its own DOI when archived to Zenodo at publication time.
Phase 06 · Done · byte-identical on aarch64, QEMU + native-ARM CI green
Cross-architecture parity
The Rust port has been cross-compiled to aarch64-unknown-linux-gnu and run under QEMU user-mode emulation against the same inputs as the x86_64 binary. After seeding the previously-unseeded indicator-variance overlay (an oversight uncovered and fixed during paper preparation), every printed deterministic metric is byte-identical between the two architectures across all six bundled datasets: 1,176 metric lines, zero differences. Only the load and total wall-clock timings differ, both inflated by QEMU's emulation overhead.
The comparison is exact string equality of the printed metric block, strictly stronger than the relative-tolerance check used for cross-language parity, and it holds because the hot path uses only correctly-rounded IEEE-754 operations, with no fast-math or fused-multiply-add codegen. A native-ARM job on a free public ARM runner runs the same byte-identity assertion on real silicon alongside the QEMU job, so both architectures are checked on every change and the QEMU result stands as the reproducible backstop.
Phase 07 · Done · released as v0.5.0 in both engines
Multi-asset core
The execution core at the paper-pinned version operated on a single OHLC series at a time. The multi-asset substrate that lifts that limit shipped in v0.5.0 across both engines: a panel layer (cross-asset regime detection, long/short baskets, equal-risk-contribution sizing, beta / dollar / sigma neutralisations, portfolio constraints) plus pairs and carry primitives, over a redesigned per-leg trade ledger that carries a leg id, a trade-group id, and a full cost decomposition (fee, slippage, funding, gross and net) satisfying gross − costs = net to floating-point tolerance.
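The ledger identity gross − costs = net can be sketched with a small record type. `LegRecord` and its method names are hypothetical, not the engine's schema; only the fields named in the text are modelled, and the figures are invented.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class LegRecord:
    leg_id: int
    trade_group_id: int
    gross: float
    fee: float
    slippage: float
    funding: float        # may be negative when funding is received
    net: float

    def costs(self):
        return self.fee + self.slippage + self.funding

    def is_consistent(self, rel_tol=1e-9, abs_tol=1e-12):
        """gross - costs must equal net up to floating-point tolerance."""
        return math.isclose(self.gross - self.costs(), self.net,
                            rel_tol=rel_tol, abs_tol=abs_tol)

# Two legs of one pairs trade, linked by a shared trade-group id.
pair = [
    LegRecord(leg_id=0, trade_group_id=7, gross=125.0,
              fee=1.5, slippage=0.75, funding=0.25, net=122.5),
    LegRecord(leg_id=1, trade_group_id=7, gross=-40.0,
              fee=1.5, slippage=0.5, funding=-0.25, net=-41.75),
]
group_net = sum(leg.net for leg in pair)
print(all(leg.is_consistent() for leg in pair), group_net)
```

Checking the identity per leg, rather than only on the group total, is what stops a fee booked on the wrong leg from cancelling out unnoticed.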
Every primitive is mirrored Python ↔ Rust and exercised by its own parity gate in CI, holding to the same discipline as the original surfaces: eight surfaces now agree within 1e-3 (trade counts exact). Single-asset metric outputs are byte-unchanged from the prior release. The proprietary strategy research that sits on top of this substrate stays private; the public release is the engine capability, not the strategies.
Phase 08 · Done · byte-exact and gated in CI (shipped in v0.5.0)
Four-way parity surface (regime + WFO + forex + session)
The published parity record covered three surfaces; the four-way combination, regime + WFO + forex + session, was the remaining one, and it is now closed. Root-causing the combo divergence found two causes. The dominant one was a harness mismatch, not an engine bug: the combo driver fed the two engines different selection knobs (minimum-trades and risk-to-reward optimisation are compile-time constants on the Rust side but were being overridden on the Python side), so the in-sample phase chose a different lookback.
The residual was three session-end-bar divergences in the Rust backtest core: on the last in-session bar of the day it let an opposite-flip signal open a position, never blocked a new entry, and ran the intrabar stop/target check, whereas the reference engine force-closes unconditionally and skips the stop/target check there. Aligning all three (fully guarded by the session flag, so the single-feature surfaces stay byte-unchanged) makes the combo byte-exact across every stage. It now runs as a fourth gated parity surface in CI, shrinking the honestly-disclosed unverified footprint to Windows MSVC.
Phase 09 · Done · Rust mirror landed and parity-verified
Deflated Sharpe Ratio in Rust
The DSR utility (Bailey & López de Prado 2014) was Python-only because it depends on scipy's normal distribution and is invoked once per optimisation trajectory rather than per bar. It is now mirrored in Rust behind a lightweight feature flag, using the same parity discipline applied to its scalar output: a dedicated harness feeds identical (Sharpe, trial-Sharpes, returns) fixtures, including skewed, fat-tailed, and every degenerate-guard case, to both implementations and confirms agreement on the expected-maximum-Sharpe and the deflated Sharpe within the standard tolerance, and in fact below 1e-9 on all finite cases since both are closed form.
The normal CDF and its inverse come from a vetted statistics crate that matches scipy to roughly 1e-12; the coarse error-function approximation used elsewhere in the port is deliberately not reused, because its tail error would corrupt the expected-maximum-Sharpe term at the relevant quantile. The benefit is a Rust-only deployment that computes deflated-Sharpe diagnostics on its own optimiser trajectories without round-tripping through Python.
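For orientation, the two quantities being compared can be written out using the closed forms from Bailey & López de Prado (2014). This is a standard-library sketch, not the framework's implementation: it uses `statistics.NormalDist` where the Python engine uses scipy, takes the population standard deviation of the trial Sharpes (an assumption), expects a per-period (non-annualised) Sharpe with non-excess kurtosis, and its degenerate guard is a simplified stand-in for the real ones.

```python
import math
from statistics import NormalDist, pstdev

_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329

def expected_max_sharpe(trial_sharpes):
    """Expected maximum Sharpe across N trials when the true Sharpe is zero."""
    n = len(trial_sharpes)
    if n < 2:
        raise ValueError("need at least two trials")
    return pstdev(trial_sharpes) * (
        (1 - EULER_GAMMA) * _NORMAL.inv_cdf(1 - 1 / n)
        + EULER_GAMMA * _NORMAL.inv_cdf(1 - 1 / (n * math.e))
    )

def deflated_sharpe(sr, trial_sharpes, n_obs, skew=0.0, kurt=3.0):
    """Probability that `sr` exceeds the expected maximum, given the return moments."""
    sr0 = expected_max_sharpe(trial_sharpes)
    denom = 1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    if n_obs < 2 or denom <= 0:
        return float("nan")                  # simplified degenerate guard
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / math.sqrt(denom))

trials = [0.02, 0.05, 0.08, 0.01, 0.04]      # per-period Sharpes of the candidates tried
dsr = deflated_sharpe(0.08, trials, n_obs=1_000)
dsr_more_trials = deflated_sharpe(0.08, trials * 4, n_obs=1_000)
print(round(dsr, 3), round(dsr_more_trials, 3))
```

The second call keeps the dispersion of the trial Sharpes fixed but quadruples the number of trials, which raises the expected-maximum benchmark and lowers the deflated Sharpe: the same headline number counts for less the more candidates were tried. Both inverse-CDF arguments sit in the upper tail, which is why tail accuracy of the normal quantile function matters for this term.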
Reproducibility
quant-research-framework
Python · MIT · DOI 10.5281/zenodo.19798594
quant-research-framework-rs
Rust · MIT · DOI 10.5281/zenodo.19798592
Reproduce parity in 5–10 minutes
git clone …/quant-research-framework
git clone …/quant-research-framework-rs
cd quant-research-framework-rs
cargo build --release
QRF_PY_DIR=../quant-research-framework \
python tools/parity_check.py --tol 0.001
QRF_PY_DIR=../quant-research-framework \
python tools/parity_regime.py --tol 0.001
QRF_PY_DIR=../quant-research-framework \
python tools/parity_forex.py --tol 0.001
