Backtesting: In-Sample (IS) versus Out-of-Sample (OS)

Example: Moving Average trading strategy on the CADUSD exchange rate

The purpose of this exercise is to compare the in-sample (IS) and out-of-sample (OS) backtest performance of a simple moving average (MA) trading strategy. The trading signals come from two moving averages, and the number of days in each averaging window is chosen by optimization.

My hypothesis is that IS backtesting results cannot be achieved in live trading, except by coincidence. OS backtesting performance can be achieved in theory, gross of trading costs and slippage. The statistics and graphs below show that, for an MA strategy on the CADUSD exchange rate, my hypothesis holds, at least over the period studied.

The maximization criterion in this version is Mcrit = mean / (standard deviation of losses). The initial criterion I used was Mcrit = mean / (standard deviation). The results didn't change notably from one version to the other, although they may with other datasets. Dividing by the standard deviation of losses makes more sense because it penalizes only the variation of losses, rather than the variation of both losses and gains.
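The two criteria can be written as small functions. This is a minimal sketch, not the original code: it assumes "standard deviation of losses" means the sample standard deviation of the negative returns only, and the names `mcrit_losses` and `mcrit_total` are illustrative.

```python
import numpy as np

def mcrit_losses(returns):
    """Mean return divided by the sample standard deviation of the losing returns."""
    r = np.asarray(returns, dtype=float)
    losses = r[r < 0]
    if losses.size < 2:          # the deviation is undefined with fewer than two losses
        return np.nan
    sd = losses.std(ddof=1)
    return r.mean() / sd if sd > 0 else np.nan

def mcrit_total(returns):
    """Mean return divided by the sample standard deviation of all returns."""
    r = np.asarray(returns, dtype=float)
    sd = r.std(ddof=1)
    return r.mean() / sd if sd > 0 else np.nan

example = [0.02, -0.01, 0.03, -0.03]   # made-up daily log-returns
crit_losses = mcrit_losses(example)
crit_total = mcrit_total(example)
```

Returning NaN when the denominator is undefined is a design choice of this sketch; an optimizer would need to skip or penalize such cases.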

  1. Importing market data from Yahoo Finance.
  1. Plotting the data can be done with various packages.
  1. Defining the MA Strategy as a function of MA parameters i, j representing the sizes of the rolling MA windows.

The two methods give the same optimization result on the entire series. However, the second method should be faster, because it gets rid of the rolling mean calculation with Pandas.
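The original implementations are not reproduced here, so the following is only a sketch of what two such methods might look like. It assumes the first uses the pandas rolling mean and the second replaces it with a NumPy cumulative sum. The long/short rule (long when the i-day MA is above the j-day MA, short otherwise) and the synthetic price series are also assumptions.

```python
import numpy as np
import pandas as pd

def rolling_mean_pandas(prices, n):
    """Method 1 (assumed): trailing mean via the pandas rolling window."""
    return pd.Series(prices, dtype=float).rolling(n).mean().to_numpy()

def rolling_mean_cumsum(prices, n):
    """Method 2 (assumed): trailing mean from differences of a cumulative sum."""
    x = np.asarray(prices, dtype=float)
    c = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(x.shape, np.nan)
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def ma_strategy(prices, i, j, rolling_mean=rolling_mean_cumsum):
    """Log-returns of a long/short rule: long when MA(i) > MA(j), short otherwise."""
    x = np.asarray(prices, dtype=float)
    fast, slow = rolling_mean(x, i), rolling_mean(x, j)
    pos = np.where(fast > slow, 1.0, -1.0)
    pos[np.isnan(fast) | np.isnan(slow)] = 0.0   # flat until both averages exist
    # The position decided at the close of day t earns the return from t to t+1.
    return pos[:-1] * np.diff(np.log(x))

# Synthetic random-walk prices stand in for the CADUSD series.
rng = np.random.default_rng(0)
prices = 0.75 * np.exp(np.cumsum(rng.normal(0.0, 0.005, 500)))
r_pandas = ma_strategy(prices, 10, 50, rolling_mean_pandas)
r_cumsum = ma_strategy(prices, 10, 50, rolling_mean_cumsum)
```

The two rolling means agree up to floating-point rounding, so the resulting signals should match except where the two averages are almost exactly equal.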

  1. Find the optimal i, j parameters in the [0:N] window by maximizing the risk-adjusted value of the investment.
  1. Find the optimal i, j parameters in the full series by maximizing the risk-adjusted value of the investment, for the pure IS backtesting.
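The two searches above can be sketched as a brute-force grid search. Everything here is an assumption for illustration: the synthetic prices, the split point `N`, the window grid, the restriction i < j, and the choice to score every pair from a common start date so that all pairs are compared on the same observations.

```python
import numpy as np

def rolling_mean(x, n):
    """Trailing n-period mean via cumulative sums; NaN until n points exist."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(len(x), np.nan)
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def ma_returns(prices, i, j, start):
    """Strategy log-returns from index `start` on, trading on the prior close's signal."""
    pos = np.where(rolling_mean(prices, i) > rolling_mean(prices, j), 1.0, -1.0)
    return (pos[:-1] * np.diff(np.log(prices)))[start:]

def score(r):
    """Mean divided by the standard deviation of losses; -inf when undefined."""
    losses = r[r < 0]
    if losses.size < 2:
        return -np.inf
    sd = losses.std(ddof=1)
    return r.mean() / sd if sd > 0 else -np.inf

def best_params(prices, windows):
    """Score every pair i < j and return the best pair plus the full grid."""
    start = max(windows)   # both averages exist for every pair from here on
    grid = {(i, j): score(ma_returns(prices, i, j, start))
            for i in windows for j in windows if i < j}
    return max(grid, key=grid.get), grid

rng = np.random.default_rng(42)
prices = 0.75 * np.exp(np.cumsum(rng.normal(0.0, 0.005, 1500)))
N = 1000                       # hypothetical end of the in-sample window
windows = range(5, 105, 10)    # hypothetical grid of window sizes

(i_is, j_is), grid_is = best_params(prices[:N], windows)      # [0:N] window
(i_full, j_full), grid_full = best_params(prices, windows)    # pure IS, full series
```

Keeping the whole grid, not just the maximum, makes it possible to inspect how isolated the optimum is from its neighbours.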

It appears the optimal parameters are found on a small island above water, which makes me pessimistic about the success of the out-of-sample backtesting.

  1. Calculating out-of-sample log-returns.
  1. Plotting the cumulative log-returns of the MA strategy in and out of sample and of holding long-only CAD versus USD.
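A minimal sketch of these two steps, under stated assumptions: the prices are synthetic, and `N`, `i`, and `j` are hypothetical stand-ins for the split point and the parameters found on the [0:N] window. The moving averages are computed on the full series so that they are already warmed up at the start of the out-of-sample period, while the parameters themselves come only from in-sample data.

```python
import numpy as np

def rolling_mean(x, n):
    """Trailing n-period mean via cumulative sums; NaN until n points exist."""
    c = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(len(x), np.nan)
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def positions(prices, i, j):
    """+1 when MA(i) > MA(j), -1 otherwise, 0 until both averages exist."""
    fast, slow = rolling_mean(prices, i), rolling_mean(prices, j)
    pos = np.where(fast > slow, 1.0, -1.0)
    pos[np.isnan(fast) | np.isnan(slow)] = 0.0
    return pos

rng = np.random.default_rng(1)
prices = 0.75 * np.exp(np.cumsum(rng.normal(0.0, 0.005, 1500)))
N, i, j = 1000, 10, 50          # hypothetical split point and in-sample optimum

log_ret = np.diff(np.log(prices))              # log_ret[t]: return from t to t+1
strat = positions(prices, i, j)[:-1] * log_ret

# Returns that use only prices[:N] are in-sample; the rest are out-of-sample.
is_ret, os_ret = strat[:N - 1], strat[N - 1:]

cum_is = np.cumsum(is_ret)      # cumulative log-return, in sample
cum_os = np.cumsum(os_ret)      # cumulative log-return, out of sample
cum_hold = np.cumsum(log_ret)   # holding long-only CAD versus USD
```

Each cumulative series can then be passed to any plotting package, for example `matplotlib.pyplot.plot`.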

It appears OS backtesting far underperforms IS backtesting. We can perform a structural break test (Chow test) to check whether the difference is statistically significant.
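As a hedged illustration of such a test, the sketch below computes the Chow F statistic for the simplest possible model, a constant mean return, fitted separately to the IS and OS returns and to the pooled sample. With only a constant, the test reduces to an F test for equal means. The function name and the toy inputs are assumptions, not part of the original analysis.

```python
import numpy as np

def chow_f_constant_mean(a, b):
    """Chow F statistic for a break in the mean between samples a and b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    k = 1                                          # parameters per regime (the mean)
    pooled = np.concatenate([a, b])
    rss_pooled = np.sum((pooled - pooled.mean()) ** 2)
    rss_split = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)
    dof = len(a) + len(b) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)

# Toy returns standing in for the IS and OS strategy log-returns.
f_stat = chow_f_constant_mean([0.01, 0.02, 0.03], [-0.01, -0.02, -0.03])
# Under the null of no break, F follows an F(k, n1 + n2 - 2k) distribution;
# a p-value could be obtained with scipy.stats.f.sf(f_stat, 1, dof).
```

A large F statistic indicates that the mean return differs between the two periods. Because the IS parameters were optimized on the IS data, some drop in the OS mean is expected even without a change in the underlying process.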

  1. Plotting the two moving averages and the long-short regimes for the combined strategy.
  1. Calculating basic descriptive statistics for the MA strategy log-returns.
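The statistics step might look like the sketch below. The choice of statistics, the 252-day annualization factor, and the function name are assumptions, since the original list of statistics is not shown.

```python
import numpy as np

def describe_returns(r, periods_per_year=252):
    """Basic descriptive statistics for a series of daily log-returns."""
    r = np.asarray(r, dtype=float)
    cum = np.cumsum(r)
    # Running peak of the cumulative log-return, starting from zero.
    peak = np.maximum.accumulate(np.concatenate(([0.0], cum)))[1:]
    return {
        "n": r.size,
        "annualized_mean": r.mean() * periods_per_year,
        "annualized_vol": r.std(ddof=1) * np.sqrt(periods_per_year),
        "hit_rate": float(np.mean(r > 0)),           # share of days with a gain
        "max_drawdown": float(np.max(peak - cum)),   # in log-return units
        "min": r.min(),
        "max": r.max(),
    }

stats = describe_returns([0.01, -0.02, 0.01, 0.03])   # made-up returns
```

Calling the function once on the in-sample returns and once on the out-of-sample returns gives two directly comparable summaries.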

The graphs show only small differences between in-sample and out-of-sample testing. The statistics below add more precision to the story.

  1. Plotting rolling performance of the combined (in-sample + out-of-sample) MA strategy.

The risk level is fairly constant. However, performance falls dramatically in the out-of-sample backtest relative to the in-sample backtest. (The 1-year break in the middle of the chart comes from my wish to keep the in-sample and out-of-sample periods completely separate in the rolling averages.) Other datasets may produce different outcomes. Still, this illustrates how unreliable in-sample backtesting is for forming expectations about live trading.
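The break in the chart follows directly from computing the rolling statistics on each segment separately. This is a sketch, assuming a 252-day window and synthetic returns: the first `window - 1` values of each segment are undefined, so no out-of-sample window ever contains in-sample observations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_stats(r, window):
    """Trailing mean and sample standard deviation; NaN until a full window exists."""
    mean = np.full(len(r), np.nan)
    std = np.full(len(r), np.nan)
    if len(r) >= window:
        w = sliding_window_view(r, window)      # shape: (len(r) - window + 1, window)
        mean[window - 1:] = w.mean(axis=1)
        std[window - 1:] = w.std(axis=1, ddof=1)
    return mean, std

def rolling_separate(is_ret, os_ret, window=252):
    """Rolling statistics computed per segment, then joined for one chart."""
    m_is, s_is = rolling_stats(is_ret, window)
    m_os, s_os = rolling_stats(os_ret, window)
    return np.concatenate([m_is, m_os]), np.concatenate([s_is, s_os])

# Synthetic daily log-returns standing in for the strategy's IS and OS returns.
rng = np.random.default_rng(2)
is_ret = rng.normal(0.0003, 0.005, 600)
os_ret = rng.normal(0.0, 0.005, 400)
roll_mean, roll_std = rolling_separate(is_ret, os_ret)
```

Most plotting packages leave gaps at NaN values, which produces the visible break at the start of the out-of-sample segment.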

This strategy could be improved by: