One Year of Out-of-Sample Testing: Did the FVG Bot Survive?

Why is out-of-sample testing the most important step?

Every strategy looks good on the data it was trained on. That’s the whole point of optimization — you find parameters that work on historical data.

The question is: does it work on data it has never seen?

My momentum bot answered “no.” It showed incredible backtest returns, then lost money immediately in live trading. Classic overfitting.

My FVG bot needed to pass this test before I’d trust it with real money.

How did I structure the out-of-sample test?

I tested across 4 quarters, 10 coins, using the exact parameters from the live bot:

FVG gap size: 1.5% - 4.0%
C2 candle body: ≥ 2.0%
Risk-reward: 1:3 (fixed TP at 3x the risk)
SL at FVG boundary + 0.2% buffer
MA20 trend filter
SL ratio < 65% coin filter
Fee rate: 0.07% round trip (maker 0.02% + taker 0.05%)

None of these parameters were optimized on the test data. They were set during development on a separate training period.
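As a rough sketch, the setup rules listed above could be written down like this. It is not the bot's actual code. It assumes the common three-candle definition of a bullish FVG (candle 3's low sits above candle 1's high), measures the gap relative to its lower edge, places the entry at the top of the gap, and places the stop 0.2% below the bottom. The names `Candle`, `Setup`, and `bullish_fvg_setup` are hypothetical, and the MA20 and per-coin filters are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the parameter list; the constant names are made up here.
GAP_MIN_PCT = 1.5
GAP_MAX_PCT = 4.0
C2_BODY_MIN_PCT = 2.0
RISK_REWARD = 3.0
SL_BUFFER_PCT = 0.2


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float


@dataclass(frozen=True)
class Setup:
    entry: float
    stop: float
    target: float


def bullish_fvg_setup(c1: Candle, c2: Candle, c3: Candle) -> Optional[Setup]:
    """Return entry/stop/target for a bullish FVG, or None if a filter fails."""
    gap_low, gap_high = c1.high, c3.low
    if gap_high <= gap_low:
        return None  # no gap between candle 1 and candle 3
    if c2.close <= c2.open:
        return None  # middle candle must be bullish for a bullish FVG

    # Assumption: both percentages are measured relative to a price level.
    gap_pct = (gap_high - gap_low) / gap_low * 100
    body_pct = (c2.close - c2.open) / c2.open * 100
    if not (GAP_MIN_PCT <= gap_pct <= GAP_MAX_PCT):
        return None
    if body_pct < C2_BODY_MIN_PCT:
        return None

    entry = gap_high                             # assumption: enter at top of gap
    stop = gap_low * (1 - SL_BUFFER_PCT / 100)   # FVG boundary plus 0.2% buffer
    target = entry + RISK_REWARD * (entry - stop)  # fixed 1:3 risk-reward
    return Setup(entry, stop, target)
```

Whatever the exact entry rule is, the take profit always ends up three times as far from the entry as the stop loss, which is what makes the 1:3 risk-reward fixed.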

What were the results?

Quarter	Trades	Win Rate	PnL
2025 Q2	~380	38%	+271U
2025 Q3	~350	35%	+38U
2025 Q4	~400	40%	+273U
2026 Q1	~300	32%	-36U

Total: 1,506 trades, ~40% win rate, +548U over 12 months.

Three profitable quarters, one losing quarter.

What does the losing quarter tell us?

Q1 2026 was a sideways market. Price oscillated without clear direction, which means:

FVGs formed but didn’t fill cleanly — price entered the gap zone and kept going
The trend filter (MA20) gave mixed signals — constantly flipping between bullish and bearish
More trades hit the stop loss before reaching the 3R take profit

This is a known weakness of the FVG strategy. Mean reversion relies on price returning to fill gaps. In strong-trend markets, gaps get filled. In choppy sideways markets, they often don’t.
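One way to put a number on "constantly flipping" is to count how often the close crosses its 20-period average. The sketch below assumes the trend filter is a simple "close above or below SMA20" check, which the post does not spell out. `sma` and `trend_flips` are hypothetical helper names.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def trend_flips(closes, window=20):
    """Count how often the close moves from one side of its SMA to the other."""
    flips = 0
    prev_side = None
    for close, avg in zip(closes, sma(closes, window)):
        if avg is None or close == avg:
            continue  # not enough history yet, or exactly on the average
        side = close > avg
        if prev_side is not None and side != prev_side:
            flips += 1
        prev_side = side
    return flips


# A steady uptrend never crosses its average; a two-price chop crosses every bar.
uptrend = [float(i) for i in range(1, 61)]
chop = [100.0 if i % 2 == 0 else 101.0 for i in range(60)]
print(trend_flips(uptrend), trend_flips(chop))
```

A high flip count over a quarter is a cheap warning sign that the trend filter is not separating bullish from bearish conditions in any useful way.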

One losing quarter out of four is realistic. If all four were profitable with smooth equity curves, I’d be suspicious of overfitting.

How does a 33% win rate make money?

This is the question everyone asks. The answer is risk-reward:

Average win: 3x the risk (by design — fixed 1:3 RR)
Average loss: 1x the risk

Over 100 trades with a 33% win rate:

33 wins × 3R = 99R
67 losses × 1R = 67R
Net: +32R

Even losing two-thirds of all trades, the bot is profitable because winners are 3x larger than losers.
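The same arithmetic can be written as a small sketch that works for any win rate, measured in R (multiples of the risk per trade) and before fees. It also shows the break-even point: with a fixed 1:3 risk-reward, any win rate above 25% has positive expectancy. The function names are made up for illustration.

```python
def expectancy_r(win_rate, reward_r=3.0, loss_r=1.0):
    """Average result per trade in R, before fees."""
    return win_rate * reward_r - (1 - win_rate) * loss_r


def breakeven_win_rate(reward_r=3.0, loss_r=1.0):
    """Win rate at which expectancy is exactly zero."""
    return loss_r / (reward_r + loss_r)


print(round(expectancy_r(0.33) * 100, 2))  # net R over 100 trades at 33% wins
print(breakeven_win_rate())                # break-even win rate for 1:3
```

The gap between the actual win rate and the 25% break-even level is the cushion that has to absorb fees and bad quarters.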

This is psychologically brutal. You watch the bot lose 5, 6, 7 trades in a row. Your instinct screams “turn it off.” But the math says hold.
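To see how normal those streaks are, here is a small sketch that computes the chance of hitting at least one losing streak of a given length. It assumes every trade is independent with the same loss probability, which real trades are not, so treat the output as a ballpark figure. `prob_losing_streak` is a hypothetical helper.

```python
def prob_losing_streak(n_trades, streak_len, loss_prob):
    """Probability of at least one run of `streak_len` straight losses."""
    # state[j]: probability that no full streak has happened yet and the
    # current run of consecutive losses is exactly j trades long.
    state = [0.0] * streak_len
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_trades):
        nxt = [0.0] * streak_len
        for run, p in enumerate(state):
            if p == 0.0:
                continue
            nxt[0] += p * (1 - loss_prob)      # a win resets the run
            if run + 1 == streak_len:
                hit += p * loss_prob           # streak completed
            else:
                nxt[run + 1] += p * loss_prob  # run grows by one loss
        state = nxt
    return hit


# Chance of at least one 7-loss streak in 100 trades at a 67% loss rate.
print(prob_losing_streak(100, 7, 0.67))
```

With a 67% loss rate, a seven-trade losing streak somewhere in a run of 100 trades is more likely than not, so it is not a sign that the bot is broken.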

What changed after the OOS test?

The OOS test validated two things:

The edge is real. +548U over 1 year and 1,506 trades on unseen data is very unlikely to be luck. It points to a genuine statistical edge in how markets fill Fair Value Gaps.
The parameters are robust. The same parameters that worked on training data also worked on 4 separate quarters. No spike optimization — a genuine plateau.

After this validation, I deployed the FVG bot live with real money. Live entry prices matched the backtest 100% of the time. The only differences were minor ones in SL timing.

How does the FVG bot compare to the trend following bot?

I run both simultaneously. They complement each other:

Aspect	Trend Following	FVG
Market type	Strong trends	Any (with trend filter)
Entry style	Momentum breakout	Mean reversion to gap
Win rate	~57%	~33%
Risk-reward	~1:1.2	1:3
Weakness	Sideways/choppy markets	Strong trends without retracement

When one bot struggles, the other often thrives. This isn’t accidental — I specifically chose a mean-reversion strategy to complement my trend-following strategy.

What filters improved the OOS results?

After the initial OOS validation, I tested additional filters using post-hoc analysis (not re-optimization):

Filter	Effect
Gap 1.5-4.0% (was 0.5-2.0%)	Eliminated small noisy gaps, reduced trade count 62%, improved WR from 26% to 36%
Body ≥ 2.0% (was 0.7%)	Filtered out wick-only FVGs, improved signal quality
C2 body overlap check	Ensured gap exists within the candle body, not just wicks
SL ratio < 65% per coin	Excluded coins that historically don’t respect FVGs
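The per-coin filter in the last row can be sketched as follows. This assumes "SL ratio" means the share of a coin's closed trades that ended at the stop loss, measured on historical trades. The post does not give the exact definition, and the function name, trade format, and symbols are hypothetical.

```python
from collections import defaultdict


def coins_passing_sl_filter(trades, max_sl_ratio=0.65):
    """Keep coins whose share of stopped-out trades is below the threshold.

    `trades` is an iterable of (symbol, outcome) pairs, where outcome is
    "sl" for a stop loss or "tp" for a take profit.
    """
    totals = defaultdict(int)
    stops = defaultdict(int)
    for symbol, outcome in trades:
        totals[symbol] += 1
        if outcome == "sl":
            stops[symbol] += 1
    return sorted(s for s, n in totals.items() if stops[s] / n < max_sl_ratio)


# Made-up history: AAA stops out 60% of the time, BBB 70% of the time.
history = (
    [("AAA", "sl")] * 6 + [("AAA", "tp")] * 4
    + [("BBB", "sl")] * 7 + [("BBB", "tp")] * 3
)
print(coins_passing_sl_filter(history))
```

To keep the out-of-sample test honest, the ratio has to come from trades before the test period. Computing it on the test data itself would leak information into the filter.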

The gap/body filter change was dramatic: 7-day performance went from -15U (666 trades, 26% WR) to +660U (254 trades, 36% WR). Fewer trades, much better trades.

The 1-year OOS with these filters: 1,506 trades, 40% WR, +1,608U (including fees).

What’s the honest takeaway?

The FVG bot works. The OOS test is strong evidence that it has a real edge, not just curve-fitted parameters.

But it’s not a money printer:

One losing quarter out of four
67% of trades are losses
Requires discipline to let winners run to 3R
Performance varies significantly by market regime

The value of OOS testing isn’t proving your strategy is great. It’s proving it isn’t garbage. If it survives a year of unseen data, you have something worth trading. If it doesn’t, you saved yourself real money by finding out with fake money.

The backtest tells you what could happen. The out-of-sample test tells you what probably will happen. Only live trading tells you what actually happened.

Related:

Fair Value Gaps: The Strategy That Changed Everything — The FVG strategy explained
How to Avoid Overfitting — The checklist I use
The Backtest Looked Amazing. It Was Lying. — What happens without OOS testing
Why I Run Two Bots, Not One — The portfolio approach
