Edge Builder
Validation & metrics
A backtest that looks brilliant can simply be overfitting — the strategy memorised noise, not a real edge. Tester and Explorer run formal checks to tell the difference before you risk capital.
The checks we run
In Tester and Explorer, the engine applies:
- Walk-forward analysis — rolling in-sample / out-of-sample windows, so the strategy is repeatedly tested on data it wasn't fitted on.
- A sacred final hold-out — a slice of data left completely untouched until the last step, with no feedback loop.
- Deflated Sharpe Ratio — a Sharpe figure corrected for the fact that testing many variations inflates the best result by luck.
- Probability of Backtest Overfit (PBO) — the estimated chance that the “good” backtest is a fluke.
- Monte Carlo permutation — reshuffling the trade sequence to see whether the result survives.
- Regional consistency (Explorer) — checking the edge isn't concentrated in a single market regime.
Builder does none of these — it produces a single backtest with aggregate metrics only.
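As a rough illustration of how walk-forward windows and a final hold-out fit together, the sketch below splits a bar index into rolling in-sample / out-of-sample pairs and reserves the tail of the data. The function name, the window lengths and the choice to roll forward by one out-of-sample block are assumptions made for this example, not the platform's actual settings.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    in_sample: range      # bar indices the strategy may be fitted on
    out_of_sample: range  # bar indices used only for testing


def walk_forward_windows(
    n_bars: int,
    in_sample_len: int,
    out_of_sample_len: int,
    holdout_len: int,
) -> Iterator[Window]:
    """Yield rolling windows over bars [0, n_bars - holdout_len).

    The last `holdout_len` bars are never yielded, so they stay
    untouched until one final confirmation run.
    """
    usable = n_bars - holdout_len
    start = 0
    while start + in_sample_len + out_of_sample_len <= usable:
        split = start + in_sample_len
        yield Window(range(start, split),
                     range(split, split + out_of_sample_len))
        start += out_of_sample_len  # roll forward by one test block


# Hypothetical sizes: 1000 bars, the last 200 reserved as the hold-out.
windows = list(walk_forward_windows(n_bars=1000, in_sample_len=300,
                                    out_of_sample_len=100, holdout_len=200))
holdout = range(800, 1000)

for w in windows:
    print(f"fit on {w.in_sample.start}-{w.in_sample.stop - 1}, "
          f"test on {w.out_of_sample.start}-{w.out_of_sample.stop - 1}")
```

Every out-of-sample block ends before the hold-out begins, which is what keeps the final slice free of any feedback loop.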
The validation gates
These checks act as gates: a strategy must clear each one to be considered robust. The current set covers walk-forward performance, the hold-out confirmation, the Deflated Sharpe correction, the overfit probability, the Monte Carlo confidence interval, and (for Explorer) regional consistency.
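To make the Monte Carlo confidence interval concrete, here is a minimal sketch that reshuffles a list of per-trade P&L figures and reports a percentile interval for maximum drawdown. Reordering trades leaves total P&L unchanged, so this example looks only at drawdown, which does depend on order. The function names, the 2,000 shuffles and the 95% level are assumptions for illustration; the source does not state how the engine builds its intervals.

```python
import random


def max_drawdown(pnl: list[float]) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def drawdown_interval(pnl: list[float], n_shuffles: int = 2000,
                      level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile interval of max drawdown across shuffled trade orders."""
    rng = random.Random(seed)      # seeded so the run is repeatable
    trades = list(pnl)             # copy: the caller's list is left alone
    draws = []
    for _ in range(n_shuffles):
        rng.shuffle(trades)
        draws.append(max_drawdown(trades))
    draws.sort()
    lo_idx = int((1 - level) / 2 * (n_shuffles - 1))
    hi_idx = int((1 + level) / 2 * (n_shuffles - 1))
    return draws[lo_idx], draws[hi_idx]


# Made-up trade results, purely for demonstration.
pnl = [120, -80, 60, -40, 200, -150, 90, -30, 70, -60]
low, high = drawdown_interval(pnl)
print(f"95% of shuffles had a max drawdown between {low} and {high}")
```

If the drawdown seen in the original backtest sits near the low end of this interval, the historical trade order was a fortunate one and live drawdowns may well be deeper.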
Note
The exact gate set is being finalised ahead of launch; treat the list above as the shape of the validation, not a frozen specification.
What the report shows
- A verdict card: deploy-ready, caution, or robustness not found.
- The equity curve, in-sample versus out-of-sample, with drawdown beneath it.
- A metrics table: profit factor, Sharpe, Sortino, max drawdown, win rate, expectancy, trade count, average holding time.
- A walk-forward table — one row per window with its in/out-of-sample figures.
- A Monte Carlo summary with confidence intervals on Sharpe and drawdown.
- Trade distribution by regime, session and day of week.
- For Explorer: the iteration history and a shortlist of alternative variants.
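For readers who want to see how several figures in the metrics table are usually defined, the sketch below computes them from a list of per-trade P&L values. It uses common textbook definitions with a zero risk-free rate and no annualisation; these are assumptions, and the platform's exact conventions may differ. Drawdown and average holding time are left out because they need the equity curve and trade timestamps.

```python
import math
import statistics


def trade_metrics(pnl: list[float]) -> dict[str, float]:
    """Summary metrics from per-trade P&L (needs at least two trades)."""
    wins = [p for p in pnl if p > 0]
    gross_loss = -sum(p for p in pnl if p < 0)
    mean = statistics.fmean(pnl)
    stdev = statistics.stdev(pnl)  # sample standard deviation
    # Downside deviation: root mean square of losses, against a target of 0.
    downside = math.sqrt(sum(min(p, 0.0) ** 2 for p in pnl) / len(pnl))
    return {
        "trade_count": len(pnl),
        "win_rate": len(wins) / len(pnl),
        "expectancy": mean,  # average P&L per trade
        "profit_factor": sum(wins) / gross_loss if gross_loss else math.inf,
        "sharpe": mean / stdev if stdev else math.nan,
        "sortino": mean / downside if downside else math.inf,
    }


# Made-up trades: two winners and two losers.
metrics = trade_metrics([100, -50, 200, -50])
for name, value in metrics.items():
    print(f"{name:>14}: {value:.3f}")
```

Sortino divides by downside deviation only, so a strategy with large winners and small losers scores higher on Sortino than on Sharpe.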
Reading the verdict
- Deploy-ready — every gate passed; the strategy holds up out-of-sample with manageable drawdown.
- Caution — some gates passed only by a thin margin; consider reduced position sizing and close monitoring.
- Robustness not found — it worked in-sample but collapsed out-of-sample. The edge wasn't there.
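The three verdicts can be read as a simple decision rule over the gate results. The sketch below is one hypothetical way to express it: the `GateResult` type, the "higher score is better" convention, the 10% thin-margin cutoff and the rule that any failed gate yields "robustness not found" are all assumptions for illustration, since the real gates and thresholds are not published here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    score: float      # measured value; higher is better (assumed convention)
    threshold: float  # minimum score needed to pass (hypothetical)

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


def verdict(gates: list[GateResult], thin_margin: float = 0.10) -> str:
    """Map gate results to a verdict label (illustrative rule only)."""
    if not all(g.passed for g in gates):
        return "robustness not found"
    # A pass counts as thin when it clears the bar by less than 10% of it.
    if any(g.score - g.threshold < thin_margin * abs(g.threshold)
           for g in gates):
        return "caution"
    return "deploy-ready"


# Hypothetical gate names and numbers.
solid = [GateResult("walk_forward", 1.5, 1.0), GateResult("hold_out", 1.3, 1.0)]
thin = [GateResult("walk_forward", 1.5, 1.0), GateResult("hold_out", 1.05, 1.0)]
failed = [GateResult("walk_forward", 1.5, 1.0), GateResult("hold_out", 0.8, 1.0)]

for label, gates in (("solid", solid), ("thin", thin), ("failed", failed)):
    print(label, "->", verdict(gates))
```

Checking for failures before margins means a strategy is never labelled "caution" when it has actually failed a gate.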
Why we say no
“Robustness not found” is not a technical failure. It's an honest answer, and it's the single most valuable thing the platform can tell you. It means we saved you from deploying a strategy that would have failed live. Because it's a real result, that build isn't refunded: you paid to find out the truth.
New to these terms? The glossary explains each one in plain language.