Exploring Data Patterns with Autocorrelation Analysis
Autocorrelation is a measure of linear relation of Yt with its past (or lagged) values. Trend and seasonality patterns can also be discerned using autocorrelation analysis in addition to graphical analysis as discussed above. The formula for autocorrelation of k lags denoted as rk is:
rk = k = 0,1,2,…, where
rk = autocorrelation coefficient for a lag of k periods
= the average or mean of the Y values
Yt = observed value at time t
Yt-k = observed value at time t-k
n= number of observations in the series
Example: Suppose we have data for twelve months (sales) and want to calculate r1. Using Excel, I calculated as shown below
||1704/12 = 142||r1 = 843/1474 = 0.572|
Similarly, we can calculate r2 or autocorrelation for lag 2 = 0.463 which is lower than that for lag 1. Generally, as time lag increases, the autocorrelation declines. Instead of going through all the above calculations it would be nice if you could use a shortcut Excel formula. There is no in-built Excel formula for autocorrelation, but you can use the following command for
n =12 and k = 1 (for different n you have the change the number in the formula accordingly).
r1 = . The Excel formula is (assuming first column for t, second column for Y and first row for label in the spreadsheet):
r3 =(SUMPRODUCT(B2:B10-AVERAGE(B2:B13), B5:B13-AVERAGE(B2:B13))/COUNT(B2:B13))/VAR.P(B2:B13)
r4 =(SUMPRODUCT(B2:B9-AVERAGE(B2:B13), B6:B13-AVERAGE(B2:B13))/COUNT(B2:B13))/VAR.P(B2:B13)
r5 =(SUMPRODUCT(B2:B8-AVERAGE(B2:B13), B7:B13-AVERAGE(B2:B13))/COUNT(B2:B13))/VAR.P(B2:B13)
and so on. We get the values :
A plot of rk against k is known as a correlogram using Insert, recommended charts-more charts-X-Y plot)
Autocorrelation coefficients for different time lags can be used to answer the following: (i) Are the data random? (White noise) (ii) Does data contain a trend (nonstationary)? (iii) Are the data stationary? (iv) Are the data seasonal?
If the series is random, the autocorrelations for any time lag are close to zero. If there is trend, the autocorrelations for the first several time lags are significantly different from zero but gradually drop toward zero. The first autocorrelation is very large, close to 1. The second lag autocorrelation will also be large. We see such a feature in the above example, as the time plot also shows:
If the time series has seasonal pattern, the autocorrelations will be pronounced at intervals equal to seasonal lag or its multiples: seasonal lag 4 for quarterly data and 12 for monthly data. There are statistical tests for the significance of the autocorrelation coefficients, but we will skip them in this course.