I started working on this problem while studying Python and its application to financial time series. Among the references I used is the book "Python for Finance" by Yves Hilpisch, O'Reilly Media (Chapter 10, Stochastics).
import numpy as np
import numpy.random as npr
import matplotlib.pyplot as plt
import pandas as pd
import plotly.graph_objects as go
import scipy.stats
from pandas_datareader import data
In order to test some stochastic models I downloaded 'S&P 500' prices from Yahoo Finance.
start_date='6/30/2010'; end_date='10/24/2020'
Yahoo = data.DataReader('^GSPC', 'yahoo', start_date, end_date)
Next, I shall plot the index prices using plotly. Most textbooks recommend matplotlib, whose syntax is more straightforward; plotly graphs, on the other hand, look much better.
Making a plot in plotly requires the following steps: build a list of traces, define a layout dictionary, and combine the two in a `go.Figure`:
plotarray=[
go.Scatter(y=Yahoo['Close'], x=Yahoo.index, name='Index Value', mode = 'lines')
]
figlayout = {
'title': "S&P 500 Index",
'yaxis':{'title':'Log Scale', 'type':'log'},
'showlegend':True
} #dict type
fig = go.Figure(data = plotarray, layout = figlayout)
fig.show()
I plotted the index prices on a log scale, because it allows for the visual assessment of the magnitude of stock price movements at any point in time. To me, this is the only acceptable way of plotting any series that is built by compounding.
In the last 10 years, the American stock market has seen an astounding bull run. The Covid crisis produced the largest drawdown since the financial crisis, but the index recovered very fast.
Todo: calculate drawdowns
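A minimal sketch of how this to-do could be filled in, assuming only a price vector is available (the toy prices below are illustrative, not the S&P data):

```python
import numpy as np

def drawdowns(prices):
    """Drawdown at each point: fractional distance below the running maximum."""
    running_max = np.maximum.accumulate(prices)
    return prices / running_max - 1.0

# Toy path: peak at 120, trough at 90, so the maximum drawdown is 25%.
prices = np.array([100.0, 110.0, 120.0, 90.0, 115.0])
dd = drawdowns(prices)
print(dd.min())  # -0.25
```

On the downloaded series, `drawdowns(Yahoo['Close'].to_numpy()).min()` would give the largest drawdown of the sample, i.e. the Covid crash.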
Pandas provides an easy way to calculate log-returns (and returns in general), and to quickly describe the data. Log-returns closely approximate period returns when the time intervals between data points are small.
Yahoo['logReturn']=np.log(Yahoo['Close']/Yahoo['Close'].shift(1))
Yahoo['logReturn'].describe()
count    2598.000000
mean        0.000467
std         0.010981
min        -0.127652
25%        -0.003382
50%         0.000653
75%         0.005264
max         0.089683
Name: logReturn, dtype: float64
rets = np.array(Yahoo['logReturn'][1:]) #the vector of log returns
However, numpy and scipy expand the possibilities for vector and array operations. `scipy.stats.describe` returns a different set of descriptive statistics.
pd.DataFrame(rets).describe()
|   | 0 |
|---|---|
| count | 2598.000000 |
| mean | 0.000467 |
| std | 0.010981 |
| min | -0.127652 |
| 25% | -0.003382 |
| 50% | 0.000653 |
| 75% | 0.005264 |
| max | 0.089683 |
x = scipy.stats.describe(rets)
y = dict(x._asdict())
y["min"] = y["minmax"][0]
y["max"] = y["minmax"][1]
y.pop("minmax"); y
{'nobs': 2598,
'mean': 0.00046673491230296213,
'variance': 0.00012057369088690606,
'skewness': -0.9067317822033426,
'kurtosis': 17.749492855510095,
'min': -0.12765219747281742,
'max': 0.08968323251796326}
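The skewness of about −0.91 and excess kurtosis of about 17.7 are far from the values (both zero) implied by the Gaussian shocks that GBM assumes. A quick sketch of a formal check with the Jarque–Bera test from `scipy.stats`, run here on synthetic Gaussian data of the same length and scale (an assumption for self-containedness, not the actual returns):

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
gaussian_rets = rng.standard_normal(2598) * 0.011  # same length/scale as the sample

# Jarque-Bera tests whether skewness and kurtosis match a normal distribution.
stat, pvalue = scipy.stats.jarque_bera(gaussian_rets)
print(f'JB statistic={stat:.2f}, p-value={pvalue:.3f}')
```

Applied to the actual `rets`, with excess kurtosis near 17.7, the test would reject normality overwhelmingly.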
The easiest way to model stock prices is with the GBM process. The GBM process assumes that relative changes in price are driven by a constant drift term $r$ and a random iid shock with standard deviation $\sigma$.
The fact that GBM is memoryless (it has the Markov property) allows the process to be vectorized, eliminating the need for `for` loops.
Geometric Brownian motion SDE: $dS_t = rS_tdt + \sigma S_t dZ_t$
Index level solution: $S_t = S_{t-dt} \exp ( (r-1/2\sigma^2)dt + \sigma \sqrt{dt} z_t )$
$\log$ process: $\log(S_t/S_{t-dt}) = \log(r_t+1)=(r-1/2\sigma^2) dt + \sigma \sqrt{dt}z_t$
Also recall that $\log(r_t+1) \approx r_t$.
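The quality of that approximation is easy to check numerically; for a 1% daily move (a magnitude typical of the data above, chosen here for illustration), the error is on the order of $r^2/2$:

```python
import numpy as np

r = 0.01                 # a typical daily return
log_r = np.log1p(r)      # log(1 + r), computed without cancellation error
print(r - log_r)         # about 5e-05, i.e. roughly r**2 / 2
```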
#The GBM Model takes as inputs the parameters theta = (r, sigma) and the iid shock matrix zt.
#It returns log-returns and log-prices. One series would have sufficed as output, however.
#Quick calibration (bird's eye view) of GBM parameters:
sigma = np.std(rets)
r = np.mean(rets)+1/2*sigma**2 #the drift term
print(f'r={round(r,5)}, sigma={round(sigma,5)}')
r=0.00053, sigma=0.01098
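These calibrated parameters are per trading day. If annualized figures are wanted, a common convention (assuming 252 trading days per year; the choice of 252 is an assumption of mine, not part of the original calibration) scales the drift linearly in time and the volatility with the square root of time:

```python
import numpy as np

r_daily, sigma_daily = 0.00053, 0.01098    # the values printed above
r_annual = r_daily * 252                   # drift scales linearly with time
sigma_annual = sigma_daily * np.sqrt(252)  # volatility scales with sqrt(time)
print(f'annual r={r_annual:.3f}, annual sigma={sigma_annual:.3f}')
# annual r=0.134, annual sigma=0.174
```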
N=100
T = len(rets)
theta = [r,sigma]
zt = npr.standard_normal((T, N))
S0 = Yahoo['Close'].iloc[0]
#Simulate N GBM paths:
def GBM(theta, zt):
    T, N = zt.shape
    # Log-returns for every period and path, in one vectorized step:
    log_Rt = (theta[0] - theta[1]**2/2) + theta[1]*zt
    S_log = np.zeros((T+1, N))
    S_log0 = np.log(S0)       # every path starts at the observed price
    S_log[0, :] = S_log0
    S_log[1:] = S_log0 + np.cumsum(log_Rt, axis=0)
    return S_log, log_Rt
#Simulate the calibrated GBM path:
[S_log, log_Rt] = GBM(theta,zt)
S = np.exp(S_log) #Quick simulation of plausible price process.
plt.plot(S, lw=0.5)
plt.title('GBM Model - linear scale')
plt.ylabel('Bird Eye Calibration')
plt.grid(True)
$\mathrm{CAGR} = (S_T/S_0)^{1/T} - 1 \approx \frac{\log(S_T/S_0)}{T}$
Mean log returns and compounded returns are approximately equal. Note the subtlety: the per-path mean log return equals $r - \sigma^2/2$, not the drift $r$; averaging the terminal prices across paths first, as done below, recovers $r$ itself, since $E[S_T] = S_0 e^{rT}$.
LOG = np.log(np.mean(S[-1,:])/S0)/T
CAGR = (np.mean(S[-1,:])/S0)**(1/T)-1
print(round(LOG,5), round(CAGR,5))
0.00053 0.00053
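The distinction shows up directly in simulation: taking the log of the average terminal price recovers the drift $r$, while averaging the per-path log growth rates recovers $r - \sigma^2/2$. A self-contained sketch with illustrative parameters close to the calibrated ones (and $S_0 = 1$ for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
r, sigma, T, N = 0.0005, 0.011, 2598, 200_000

# Terminal log-prices drawn directly: the sum of T iid Gaussian log-returns
# is Gaussian with mean (r - sigma**2/2)*T and std sigma*sqrt(T).
log_S_T = (r - sigma**2/2)*T + sigma*np.sqrt(T)*rng.standard_normal(N)
S_T = np.exp(log_S_T)

log_of_mean = np.log(S_T.mean()) / T   # close to r
mean_of_log = log_S_T.mean() / T       # close to r - sigma**2/2
print(log_of_mean, mean_of_log)
```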
#Plotting series is easy with matplotlib:
plt.plot(S_log, lw=0.5)
plt.xlabel('# periods'); plt.ylabel('Bird Eye Calibration'); plt.title('GBM Model - logarithmic scale')
plt.grid(True)
#Plotting with plotly is much more difficult and time-consuming, but the result may be well worth the pain.
plotarray=[]
for i in range(N):
plotarray.append(go.Scatter(y=S[:,i], x=Yahoo.index, mode = 'lines', line=dict(width=1)))
figlayout = {
'title': "GBM Model",
'yaxis':{'title':'Log Scale', 'type':'log'},
'showlegend':False
} #dict type
fig = go.Figure(data = plotarray, layout = figlayout)
fig.show()