Edge Builder

Validation & metrics

A backtest that looks brilliant may simply be overfit: the strategy has memorised noise rather than captured a real edge. Tester and Explorer run formal checks to tell the difference before you risk capital.

The checks we run

In Tester and Explorer, the engine applies:

Walk-forward analysis — rolling in-sample / out-of-sample windows, so the strategy is repeatedly tested on data it wasn't fitted on.
A sacred final hold-out — a slice of data left completely untouched until the last step, with no feedback loop.
Deflated Sharpe Ratio — a Sharpe figure corrected for the fact that testing many variations inflates the best result by luck.
Probability of Backtest Overfitting (PBO) — the estimated chance that the “good” backtest is a fluke, i.e. that the variant ranked best in-sample falls below the median out-of-sample.
Monte Carlo permutation — reshuffling the trade sequence to see whether the result survives.
Regional consistency (Explorer) — checking the edge isn't concentrated in a single market regime.
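
To make the first two checks concrete, here is a minimal Python sketch of how rolling in-sample / out-of-sample windows might be laid out while a final hold-out slice stays untouched. The function name, its parameters and the window sizes are illustrative assumptions, not the engine's actual settings.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int, holdout_size: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (in_sample, out_of_sample) index ranges that roll forward.

    The last `holdout_size` bars are never handed out, so they remain
    available as an untouched final hold-out (hypothetical layout).
    """
    usable = n_bars - holdout_size
    start = 0
    while start + train_size + test_size <= usable:
        in_sample = range(start, start + train_size)
        out_of_sample = range(start + train_size, start + train_size + test_size)
        yield in_sample, out_of_sample
        start += test_size  # step forward by one out-of-sample block


# Example: 1200 bars, the last 200 reserved as the final hold-out.
windows = list(walk_forward_windows(1200, train_size=500, test_size=100, holdout_size=200))
final_holdout = range(1000, 1200)

for in_sample, out_of_sample in windows:
    print(f"fit on bars {in_sample.start}-{in_sample.stop - 1}, "
          f"test on bars {out_of_sample.start}-{out_of_sample.stop - 1}")
```

Each out-of-sample block starts where its in-sample block ends, so the strategy is always scored on bars it was not fitted on, and no window reaches into the hold-out.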

Builder does none of these — it produces a single backtest with aggregate metrics only.

The validation gates

These checks act as gates: a strategy must clear each one to be considered robust. The current set covers walk-forward performance, the hold-out confirmation, the Deflated Sharpe correction, the overfit probability, the Monte Carlo confidence interval, and (for Explorer) regional consistency.

Note: The exact gate set is being finalised ahead of launch. Treat the list above as the shape of the validation, not a frozen specification.
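
As an illustration of the Monte Carlo gate, the sketch below reshuffles a list of per-trade P&L values and reports a percentile interval for maximum drawdown. Shuffling leaves the total P&L unchanged, so a path-dependent figure such as drawdown is what varies between runs. The function names, the number of runs and the 95% level are assumptions made for this example, not the platform's configuration.

```python
import random
from typing import List, Sequence, Tuple


def max_drawdown(pnls: Sequence[float]) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve (positive number)."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def shuffled_drawdown_interval(
    pnls: Sequence[float], n_runs: int = 2000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile interval of max drawdown over random trade orderings."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    trades: List[float] = list(pnls)
    results = []
    for _ in range(n_runs):
        rng.shuffle(trades)
        results.append(max_drawdown(trades))
    results.sort()
    lo_idx = int((1 - level) / 2 * (n_runs - 1))
    hi_idx = int((1 + level) / 2 * (n_runs - 1))
    return results[lo_idx], results[hi_idx]


trades = [10.0, -5.0, -5.0, 20.0, -3.0, 7.0]  # made-up per-trade P&L
low, high = shuffled_drawdown_interval(trades)
print(f"95% interval for max drawdown: {low} to {high}")
```

If the drawdown seen in the original backtest sits near the favourable end of this interval, the trade ordering was lucky and a live run could plausibly look worse.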

What the report shows

A verdict card: deploy-ready, caution, or robustness not found.
The equity curve, in-sample versus out-of-sample, with drawdown beneath it.
A metrics table: profit factor, Sharpe, Sortino, max drawdown, win rate, expectancy, trade count, average holding time.
A walk-forward table — one row per window with its in/out-of-sample figures.
A Monte Carlo summary with confidence intervals on Sharpe and drawdown.
Trade distribution by regime, session and day of week.
For Explorer: the iteration history and a shortlist of alternative variants.
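
For readers who want to see how a few of the figures in the metrics table are usually defined, here is a small Python sketch that derives them from a list of per-trade P&L values. It uses the common textbook definitions (profit factor as gross profit over gross loss, expectancy as average P&L per trade); the function name is hypothetical and the platform's exact conventions may differ. Sharpe and Sortino are left out because they depend on return sampling and annualisation choices not stated here.

```python
from typing import Dict, Sequence


def summary_metrics(pnls: Sequence[float]) -> Dict[str, float]:
    """Compute basic trade statistics from per-trade P&L values."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # expressed as a positive number
    n = len(pnls)
    return {
        "trade_count": n,
        "win_rate": len(wins) / n if n else 0.0,
        # With no losing trades the ratio is undefined; report infinity.
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "expectancy": sum(pnls) / n if n else 0.0,
    }


metrics = summary_metrics([100.0, -50.0, 200.0, -50.0])  # made-up trades
print(metrics)
```

For these four example trades the win rate is 0.5, the profit factor is 3.0 (300 gained against 100 lost) and the expectancy is 50.0 per trade.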

Reading the verdict

Deploy-ready — every gate passed; the strategy holds up out-of-sample with manageable drawdown.
Caution — some gates passed only by a thin margin; consider reduced position sizing and close monitoring.
Robustness not found — it worked in-sample but collapsed out-of-sample. The edge wasn't there.
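
One way to picture how gate results could roll up into these three verdicts is sketched below in Python. Everything here is an assumption made for illustration: the gate names, the thresholds, the size of a "thin" margin, and the rule that any failed gate leads to "robustness not found". The real gate set is not specified in this page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GateResult:
    name: str
    value: float            # measured statistic
    threshold: float        # hypothetical pass level
    higher_is_better: bool = True

    def margin(self) -> float:
        """Signed distance past the threshold; negative means the gate failed."""
        diff = self.value - self.threshold
        return diff if self.higher_is_better else -diff


def verdict(gates: Sequence[GateResult], thin_margin: float = 0.05) -> str:
    """Map gate margins to a verdict label (illustrative rule only)."""
    margins = [g.margin() for g in gates]
    if any(m < 0 for m in margins):
        return "robustness not found"
    if any(m < thin_margin for m in margins):
        return "caution"
    return "deploy-ready"


# Made-up gate values and thresholds.
strong = [GateResult("walk_forward_sharpe", 1.5, 1.0),
          GateResult("pbo", 0.1, 0.5, higher_is_better=False)]
marginal = [GateResult("walk_forward_sharpe", 1.02, 1.0),
            GateResult("pbo", 0.1, 0.5, higher_is_better=False)]
failed = [GateResult("walk_forward_sharpe", 0.8, 1.0)]

for gates in (strong, marginal, failed):
    print(verdict(gates))
```

A single absolute margin is a crude yardstick across statistics with different scales; a per-gate margin would be a natural refinement.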

Why we say no

“Robustness not found” is not a technical failure. It is an honest answer, and the single most valuable thing the platform can tell you: it saved you from deploying a strategy that would most likely have failed live. Because it is a real result, that build isn't refunded. You paid to find out the truth.

New to these terms? The glossary explains each one in plain language.
