Stochastic Processes for Equity Index Prices

I started working on this problem while studying Python and its application to financial time series. One of the references I used is the book "Python for Finance" by Yves Hilpisch, O'Reilly Media (Chapter 10, Stochastics).

In order to test some stochastic models I downloaded 'S&P 500' prices from Yahoo Finance.

Next, I plot the index prices using plotly. Most textbooks recommend matplotlib, whose syntax is more straightforward. Plotly graphs, on the other hand, look much better.

Making a plot in plotly requires the following steps:

I plotted the index prices on a log scale, because it allows for the visual assessment of the magnitude of stock price movements at any point in time. To me, this is the only acceptable way of plotting any series that is built by compounding.
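The argument for the log scale can be checked with a few lines of arithmetic. This is only an illustrative sketch: the index levels below are made up, not taken from the downloaded data.

```python
import numpy as np

# Two hypothetical 10% moves at very different index levels.
low_start, low_end = 100.0, 110.0
high_start, high_end = 3000.0, 3300.0

# On a linear axis the vertical distances differ by a factor of 30.
linear_gap_low = low_end - low_start
linear_gap_high = high_end - high_start

# On a log axis the vertical distance is log(end / start),
# which is identical for both moves.
log_gap_low = np.log(low_end) - np.log(low_start)
log_gap_high = np.log(high_end) - np.log(high_start)

print(linear_gap_low, linear_gap_high)
print(log_gap_low, log_gap_high)
```

Equal relative moves therefore take up equal vertical space on a log axis, whatever the level of the index.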

In the last 10 years, the American stock market has seen an astounding bull run. The Covid crisis produced the largest drawdown since the financial crisis, but the index recovered very fast.

Todo: calculate drawdowns
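One possible way to calculate drawdowns is to compare each price with the running maximum up to that date. The sketch below uses a short made-up price series, and the helper name `drawdowns` is my own choice, not a library function.

```python
import numpy as np

def drawdowns(prices):
    """Relative distance of each price from the running peak (values <= 0)."""
    prices = np.asarray(prices, dtype=float)
    running_max = np.maximum.accumulate(prices)
    return prices / running_max - 1.0

# Hypothetical price path, for illustration only.
prices = np.array([100.0, 120.0, 90.0, 110.0, 130.0, 104.0])
dd = drawdowns(prices)

max_dd = dd.min()                            # deepest drawdown
trough = int(dd.argmin())                    # position of the trough
peak = int(prices[: trough + 1].argmax())    # position of the preceding peak

print(dd)
print(max_dd, peak, trough)
```

With a pandas Series of closing prices, the same idea can be written as `prices / prices.cummax() - 1`.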

Pandas provides an easy way to calculate log-returns (and returns in general), and to quickly describe the data. Log-returns closely approximate period returns when the time intervals between observations are short, so that each individual return is small.
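The relation between the two kinds of return can be sketched on a toy series. I use plain NumPy and invented prices here; with a pandas Series `s` the log-returns would be `np.log(s / s.shift(1))`.

```python
import numpy as np

# Hypothetical daily closing prices, for illustration only.
prices = np.array([100.0, 101.0, 99.5, 100.2, 102.0])

simple_returns = prices[1:] / prices[:-1] - 1.0
log_returns = np.log(prices[1:] / prices[:-1])

# For small moves the two are nearly identical.
max_gap = np.max(np.abs(simple_returns - log_returns))

# Log-returns add up over time, simple returns do not.
total_log_return = log_returns.sum()

print(simple_returns)
print(log_returns)
print(max_gap, total_log_return, np.log(prices[-1] / prices[0]))
```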

NumPy and SciPy, however, expand the possibilities for vector and array operations, and scipy.stats returns a different set of descriptive statistics.

Geometric Brownian Motion process

The easiest way to model stock prices is with the GBM process. The GBM process assumes that relative changes in price are driven by a constant drift term $r$ and a random iid shock with standard deviation $\sigma$. Because GBM is memoryless (it has the Markov property) and its shocks are independent, all the shocks can be drawn at once and the simulation can be vectorized, eliminating the need for explicit loops.

Geometric Brownian motion SDE: $dS_t = rS_tdt + \sigma S_t dZ_t$

Index level solution: $S_t = S_{t-dt} \exp\left( (r-\tfrac{1}{2}\sigma^2)\,dt + \sigma \sqrt{dt}\, z_t \right)$

$\log$ process: $\log(S_t/S_{t-dt}) = \log(r_t+1) = (r-\tfrac{1}{2}\sigma^2)\,dt + \sigma \sqrt{dt}\, z_t$

Also recall that $\log(r_t+1) \approx r_t$ when the period return $r_t$ is small.
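A minimal vectorized sketch of the index level solution follows. The function name `simulate_gbm` and all parameter values are illustrative assumptions, not estimates from the S&P 500 data.

```python
import numpy as np

def simulate_gbm(s0, r, sigma, horizon, n_steps, n_paths, seed=0):
    """Simulate GBM paths on an equally spaced grid without Python loops."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Draw every shock z_t up front: one row per time step, one column per path.
    z = rng.standard_normal((n_steps, n_paths))
    # Log increments: (r - sigma^2 / 2) dt + sigma sqrt(dt) z_t
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Cumulative sums of log increments give log(S_t / S_0).
    log_paths = np.vstack([np.zeros((1, n_paths)), np.cumsum(log_increments, axis=0)])
    return s0 * np.exp(log_paths)

# Hypothetical parameters: 5% drift, 20% volatility, one year of daily steps.
paths = simulate_gbm(s0=100.0, r=0.05, sigma=0.2, horizon=1.0,
                     n_steps=252, n_paths=10_000)

mean_log_return = np.log(paths[-1] / paths[0]).mean()
print(paths.shape)
print(mean_log_return)  # close to r - sigma^2 / 2 = 0.03, not to r = 0.05
```

The cumulative sum replaces the step-by-step recursion, which is possible because each increment depends only on its own shock.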

Expected compounded average return

$CAGR = (S_T/S_0)^{1/T}-1 \approx \frac{\log(S_T/S_0)}{T}$

Mean log returns and compounded returns are approximately equal. Neither is the same as the drift term $r$: under GBM the expected log return per unit of time is $r - \sigma^2/2$.

The state variable follows a lognormal distribution at each point in time. Below, I plot the distribution of terminal values.

If everything stays the same, the expected return of the index over one year under the random walk process is a large gain. The sample mean is not a robust summary of the typical outcome, because terminal index values are lognormally distributed and the lognormal distribution is asymmetric, with a long right tail that pulls the mean up. A more robust estimator is the sample median. Here, however, the two are not far from each other.
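The gap between mean and median can be sketched by sampling terminal values directly from the index level solution. The parameters are hypothetical; the closed forms used for comparison are the standard lognormal results $E[S_T] = S_0 e^{rT}$ and $\mathrm{median}(S_T) = S_0 e^{(r-\sigma^2/2)T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
s0, r, sigma, horizon = 100.0, 0.05, 0.2, 1.0   # hypothetical values
n_paths = 100_000

# Terminal values in one step: S_T = S_0 exp((r - sigma^2/2) T + sigma sqrt(T) z)
z = rng.standard_normal(n_paths)
s_T = s0 * np.exp((r - 0.5 * sigma**2) * horizon + sigma * np.sqrt(horizon) * z)

sample_mean = s_T.mean()
sample_median = np.median(s_T)
theory_mean = s0 * np.exp(r * horizon)
theory_median = s0 * np.exp((r - 0.5 * sigma**2) * horizon)

print(sample_mean, theory_mean)
print(sample_median, theory_median)
```

The mean sits above the median because of the right tail, and the distance between them grows with $\sigma^2 T$.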

The variance of the process, on the other hand, is extremely large. The index might experience either a large gain or a huge loss.

Parameter estimation

$\theta^\star = \arg\min_\theta \sum_i \|y_i-f(x_i,\theta)\|^2 = \arg\min_\theta g(\theta)$

$f(z)=(r-\sigma^2/2)+\sigma z$

$g =\|y -(r-\sigma^2/2)-\sigma z\|^2$

I found that calibrating parameters by minimizing a deterministic distance is not adequate for stochastic data: all the uncertainty is removed from the solution. The calibration needs to be done by minimizing a distance specially defined for stochastic processes, such as the earth mover's (Wasserstein) distance.
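One way to see the problem is the following sketch, which is my own illustration with synthetic data and hypothetical daily parameters. The shocks behind the observed returns are never observed, so the least-squares fit has to use a fresh noise draw $z$, and the fitted coefficient on $z$ (the would-be $\sigma$) collapses towards zero. Matching moments of the distribution recovers $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(42)
r_true, sigma_true, n = 0.0005, 0.01, 5000   # hypothetical daily values, dt = 1

# Synthetic "observed" log returns: y = (r - sigma^2/2) + sigma * shock
y = (r_true - 0.5 * sigma_true**2) + sigma_true * rng.standard_normal(n)

# Independent noise draw used as the regressor z in g(theta).
z = rng.standard_normal(n)

# Least squares for y ~ a + b z, where a = r - sigma^2/2 and b = sigma.
X = np.column_stack([np.ones(n), z])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

# Moment matching on the distribution of y instead.
sigma_mom = y.std(ddof=1)
r_mom = y.mean() + 0.5 * sigma_mom**2

print(b_hat)       # close to 0: the fit throws the randomness away
print(sigma_mom)   # close to sigma_true
```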

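Jarque-Bera normality test: the test statistic JB is defined as

$JB=\frac{n}{6} \left(S^2+\frac{1}{4}(EK)^2\right)$

where $n$ is the sample size, $S$ the sample skewness and $EK$ the sample excess kurtosis.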
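The statistic can be computed by hand with NumPy, as in the sketch below. It assumes $S$ and $EK$ are the sample skewness and excess kurtosis based on biased central moments, and it uses the asymptotic chi-square distribution with 2 degrees of freedom for the p-value, whose survival function is $e^{-JB/2}$. The two samples are synthetic; `scipy.stats.jarque_bera` offers a ready-made version of the test.

```python
import math
import numpy as np

def jarque_bera(x):
    """Return the JB statistic and its asymptotic chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5                # S
    excess_kurt = np.mean(d**4) / m2**2 - 3.0     # EK
    jb = n / 6.0 * (skew**2 + 0.25 * excess_kurt**2)
    p_value = math.exp(-jb / 2.0)                 # chi-square(2) survival function
    return jb, p_value

rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(2000)         # stand-in for GBM log returns
fat_tailed = rng.standard_t(df=3, size=2000)      # stand-in for heavy-tailed returns

jb_normal, p_normal = jarque_bera(normal_sample)
jb_fat, p_fat = jarque_bera(fat_tailed)
print(jb_normal, p_normal)
print(jb_fat, p_fat)
```

A large JB value (small p-value) rejects normality, which is what heavy-tailed returns tend to produce.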

The GBM process proves inadequate to model stock prices. Will a stochastic volatility model do a better job?

In conclusion, a stochastic volatility model may be more appropriate for modelling stock prices.

Is there autocorrelation in the stock returns?

Both coefficients are significant, with the implications that:

I repeat the exercise with two lags, and I find that both first and second lag returns affect current returns.

The third-lag coefficient, however, is insignificant. Therefore, an AR(2) process might describe equity returns more adequately.
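The lag regressions can be sketched with ordinary least squares in NumPy. The helper `fit_ar` is a name I made up, and the series is a synthetic AR(2) with arbitrary coefficients rather than the actual index returns, so the numbers only illustrate the mechanics.

```python
import numpy as np

def fit_ar(x, p):
    """OLS fit of x_t = c + phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = x[p:]
    # Column k holds the series lagged by k periods, aligned with y.
    lags = [x[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    s2 = resid @ resid / dof                      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)             # classical OLS covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats

# Synthetic AR(2) series with hypothetical coefficients 0.3 and -0.2.
rng = np.random.default_rng(7)
n = 5000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.3 * x[t - 1] - 0.2 * x[t - 2] + eps[t]

beta, t_stats = fit_ar(x, p=3)
print(beta)      # [constant, lag 1, lag 2, lag 3]
print(t_stats)   # |t| above roughly 2 suggests significance at the 5% level
```

On real returns, standard errors that are robust to heteroskedasticity would be a safer basis for the significance statements, since volatility clusters over time.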