Sunday, March 1, 2020

Sum of Squares Formula Shortcut

The calculation of a sample variance or standard deviation is typically stated as a fraction. The numerator of this fraction involves a sum of squared deviations from the mean. In statistics, the formula for this total sum of squares is

$$\sum (x_i - \bar{x})^2$$

Here the symbol $\bar{x}$ refers to the sample mean, and the symbol $\sum$ tells us to add up the squared differences $(x_i - \bar{x})^2$ for all $i$.

While this formula works for calculations, there is an equivalent shortcut formula that does not require us to first calculate the sample mean. This shortcut formula for the sum of squares is

$$\sum x_i^2 - \left(\sum x_i\right)^2 / n$$

Here the variable $n$ refers to the number of data points in our sample.

Standard Formula Example

To see how this shortcut formula works, we will consider an example that is calculated using both formulas. Suppose our sample is 2, 4, 6, 8. The sample mean is $(2 + 4 + 6 + 8)/4 = 20/4 = 5$. Now we calculate the difference between each data point and the mean 5: $2 - 5 = -3$, $4 - 5 = -1$, $6 - 5 = 1$, $8 - 5 = 3$.

We now square each of these numbers and add them together:

$$(-3)^2 + (-1)^2 + 1^2 + 3^2 = 9 + 1 + 1 + 9 = 20.$$

Shortcut Formula Example

Now we will use the same data set, 2, 4, 6, 8, with the shortcut formula to determine the sum of squares. We first square each data point and add the results together:

$$2^2 + 4^2 + 6^2 + 8^2 = 4 + 16 + 36 + 64 = 120.$$

The next step is to add together all of the data and square this sum: $(2 + 4 + 6 + 8)^2 = 20^2 = 400$. We divide this by the number of data points to obtain $400/4 = 100$. We now subtract this number from 120, which gives $120 - 100 = 20$ as the sum of the squared deviations. This is exactly the number we already found with the standard formula.

How Does This Work?

Many people simply accept the formula at face value without any idea of why it works. With a little bit of algebra, we can see why this shortcut formula is equivalent to the standard, traditional way of calculating the sum of squared deviations.
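Before turning to the algebra, both worked examples can be reproduced numerically. The following is a minimal Python sketch; the function names `sum_sq_standard` and `sum_sq_shortcut` are hypothetical, chosen only for illustration, and the data are the sample 2, 4, 6, 8 used above.

```python
def sum_sq_standard(data):
    """Standard formula: sum of (x_i - mean)^2. The mean must be computed first."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data)


def sum_sq_shortcut(data):
    """Shortcut formula: sum(x_i^2) - (sum(x_i))^2 / n. No mean is needed."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


sample = [2, 4, 6, 8]
print(sum_sq_standard(sample))  # 20.0
print(sum_sq_shortcut(sample))  # 20.0
```

Both functions return 20, matching the hand calculations: the standard version squares the deviations $-3, -1, 1, 3$, while the shortcut version computes $120 - 400/4$.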
Although there may be hundreds, if not thousands, of values in a real-world data set, we will assume that there are only three data values: $x_1$, $x_2$, $x_3$. The same argument extends to a data set with thousands of points.

We begin by noting that $x_1 + x_2 + x_3 = 3\bar{x}$. Written out term by term,

$$\sum (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2.$$

We now use the fact from basic algebra that $(a - b)^2 = a^2 - 2ab + b^2$. This means that $(x_1 - \bar{x})^2 = x_1^2 - 2x_1\bar{x} + \bar{x}^2$. We do this for the other two terms of our summation, and we have:

$$x_1^2 - 2x_1\bar{x} + \bar{x}^2 + x_2^2 - 2x_2\bar{x} + \bar{x}^2 + x_3^2 - 2x_3\bar{x} + \bar{x}^2.$$

We rearrange this and have:

$$x_1^2 + x_2^2 + x_3^2 + 3\bar{x}^2 - 2\bar{x}(x_1 + x_2 + x_3).$$

Substituting $x_1 + x_2 + x_3 = 3\bar{x}$ turns the last two terms into $3\bar{x}^2 - 6\bar{x}^2 = -3\bar{x}^2$, so the above becomes:

$$x_1^2 + x_2^2 + x_3^2 - 3\bar{x}^2.$$

Now since $\bar{x} = (x_1 + x_2 + x_3)/3$, we have $3\bar{x}^2 = (x_1 + x_2 + x_3)^2/3$, and our formula becomes:

$$x_1^2 + x_2^2 + x_3^2 - (x_1 + x_2 + x_3)^2/3.$$

This is the special case $n = 3$ of the general shortcut formula mentioned above:

$$\sum x_i^2 - \left(\sum x_i\right)^2 / n$$

Is It Really a Shortcut?

It may not seem like this formula is truly a shortcut. After all, in the example above there seem to be just as many calculations. Part of this is because we only looked at a small sample. As the sample grows, the savings add up. We do not need to subtract the mean from each data point before squaring, so the shortcut saves one subtraction per data point, and the two totals $\sum x_i$ and $\sum x_i^2$ can be built up together in a single pass through the data, without first computing the mean. This cuts down noticeably on the total number of operations.
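Because the shortcut formula needs only the running totals $n$, $\sum x_i$ and $\sum x_i^2$, all three can be accumulated in one loop over the data, with no separate pass to find $\bar{x}$ first. Here is a minimal Python sketch under that assumption; the function name `sum_sq_one_pass` is hypothetical, and the second data set (the integers 1 through 1000) is just an illustrative larger sample.

```python
def sum_sq_one_pass(values):
    """Sum of squared deviations via the shortcut formula, in a single pass.

    Works with any iterable, including a generator that can be read only once.
    """
    n = 0
    total = 0.0     # running sum of x_i
    total_sq = 0.0  # running sum of x_i^2
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("need at least one data point")
    # Shortcut formula: sum(x_i^2) - (sum(x_i))^2 / n
    return total_sq - total * total / n


# The sample from the worked examples, supplied as a one-shot generator.
print(sum_sq_one_pass(x for x in (2, 4, 6, 8)))  # 20.0

# A larger sample: the integers 1, 2, ..., 1000.
print(sum_sq_one_pass(range(1, 1001)))  # 83333250.0
```

The standard formula, by contrast, would have to read the data once to compute $\bar{x}$ and then again to sum $(x_i - \bar{x})^2$, which a one-shot generator does not allow.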
