The mean was introduced as a method to describe the center of a data set, but variability in the data is also important. Here, we introduce two measures of variability: the variance and the standard deviation. Both are very useful in data analysis, even though their formulas are a bit tedious to calculate by hand. The standard deviation is the easier of the two to comprehend; it roughly describes how far away the typical observation is from the mean.
We call the distance of an observation from its mean its deviation. Below are the deviations for the 1st, 2nd, 3rd, and 50th observations in the interest rate variable:
\begin{align*}
x_1 - \bar{x} &= 10.90 - 11.57 = -0.67 \\
x_2 - \bar{x} &= 9.92 - 11.57 = -1.65 \\
x_3 - \bar{x} &= 26.30 - 11.57 = 14.73 \\
&\;\;\vdots \\
x_{50} - \bar{x} &= 6.08 - 11.57 = -5.49
\end{align*}
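As a minimal sketch of this step in Python (not part of the text), the deviations come from subtracting the mean from each observation. The four rates below are just the ones shown above, standing in for the full 50-observation sample, and the mean 11.57 is taken from the text.

```python
x_bar = 11.57                       # sample mean from the text
rates = [10.90, 9.92, 26.30, 6.08]  # the four rates shown above (x1, x2, x3, x50)

# Deviation of each observation from the mean
deviations = [x - x_bar for x in rates]
print(deviations)  # approx [-0.67, -1.65, 14.73, -5.49], up to float rounding
```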
If we square these deviations and then take an average, the result is equal to the sample variance, denoted by $s^2$:
\begin{align*}
s^2 &= \frac{(-0.67)^2 + (-1.65)^2 + (14.73)^2 + \cdots + (-5.49)^2}{50 - 1} \\
&= \frac{0.45 + 2.72 + 216.97 + \cdots + 30.14}{49} \\
&= 25.52
\end{align*}
We divide by $n - 1$, rather than by $n$, when computing a sample's variance; there is some mathematical nuance here, but the end result is that this adjustment makes the statistic slightly more reliable and useful.
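A short Python sketch of this formula, using a hypothetical four-value sample since the full data set is not reproduced here. Python's statistics.variance uses the same $n - 1$ divisor, while statistics.pvariance divides by $n$ instead.

```python
import statistics

# Hypothetical small sample; the full 50-observation data set isn't reproduced here.
data = [10.90, 9.92, 26.30, 6.08]
n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations divided by n - 1, as in the formula above
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# statistics.variance uses the same n - 1 divisor;
# statistics.pvariance divides by n instead.
assert abs(s2 - statistics.variance(data)) < 1e-9
print(s2, statistics.pvariance(data))
```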
Notice that squaring the deviations does two things. First, it makes large values relatively much larger, seen by comparing $(-0.67)^2$, $(-1.65)^2$, $(14.73)^2$, and $(-5.49)^2$. Second, it gets rid of any negative signs.
The standard deviation is defined as the square root of the variance:
$$s = \sqrt{25.52} = 5.05$$
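Continuing the sketch above with the same hypothetical sample, the standard deviation is just the square root of the sample variance; statistics.stdev computes the same quantity directly.

```python
import math
import statistics

data = [10.90, 9.92, 26.30, 6.08]   # hypothetical small sample

# Square root of the n - 1 variance, per the definition above
s = math.sqrt(statistics.variance(data))

# statistics.stdev is defined the same way
assert abs(s - statistics.stdev(data)) < 1e-9
print(round(s, 2))
```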
While often omitted, a subscript of $x$ may be added to the variance and standard deviation, i.e. $s_x^2$ and $s_x$, if it is useful as a reminder that these are the variance and standard deviation of the observations represented by $x_1, x_2, \ldots, x_n$.