How to Get a Good Estimate With Only a Few Samples
- Published on Sunday, 01 July 2012 23:12
- Written by Jonathan Story
Sometimes it is difficult to measure enough samples to find a reliable average value: measuring may be too costly, or the items may simply be scarce or hard to reach. However, if you are willing to accept some risk that the result will be wrong, a method exists, as Douglas Hubbard describes in his book How To Measure Anything: Finding The Value of Intangibles in Business.
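The method uses the Student t-statistic (based on Student's t-distribution) to calculate, from a small handful of samples, a range that contains the average of the entire population with 90 percent confidence; that is, ranges built this way capture the true average nine times out of ten. The more familiar approach based on the normal distribution is usually reserved for samples of about 30 or more, but the t-statistic gives reasonable results with as few as five samples, provided the population itself is roughly bell-shaped (normally distributed).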
It works like this: randomly select a few samples from the total population, say jelly beans from a jar. In this example, their weights in grams are 1.4, 1.4, 1.5, 1.6, and 1.1. From this sample, the t-statistic calculation lets us say with 90 percent confidence that the average weight of the jelly beans in the entire jar is between 1.222 and 1.578 grams. The calculation, which we will walk through next, is straightforward.
The calculation has four parts. First, find the average weight of the sample. Next, find how much the weight of each jelly bean differs from that average, and combine these differences into a single measure of spread called the sample variance. Then, scale the sample variance down into an "average" variance (the standard error), which describes how much the sample average itself is likely to vary. Finally, multiply this "average" variance by a t-score to get the sample error; subtract the sample error from the average weight found in step one to get the lower bound of the range, and add it to get the upper bound.
Step One - Find average of samples
This step is straightforward. Add the values of the five samples, then divide by five (the number of samples) to find the mean average:
average sample weight = ( 1.4 + 1.4 + 1.5 + 1.6 + 1.1 ) / 5 = 7/5 = 1.4
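The same arithmetic in Python, as a minimal sketch; the variable names `weights` and `mean` are illustrative choices, not part of the method:

```python
# Jelly bean weights in grams, from the example above
weights = [1.4, 1.4, 1.5, 1.6, 1.1]

# Mean = sum of the samples divided by the number of samples
mean = sum(weights) / len(weights)
print(round(mean, 3))  # 1.4
```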
Step Two - Find the variance of the samples around the average
This step needs a little explanation for newcomers to statistics. We cannot simply add up the differences between each sample and the average, because the negative differences (samples below the average) would cancel out the positive ones (samples above it); in fact, they always sum to zero. The mathematical "trick" is therefore to square each difference, so that every difference counts as a positive amount. For example, a difference of -2 (two less than the average) contributes 4, exactly the same as a difference of 2. This produces:
(1.4 - 1.4)² = 0
(1.4 - 1.4)² = 0
(1.5 - 1.4)² = 0.01
(1.6 - 1.4)² = 0.04
(1.1 - 1.4)² = 0.09
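A short Python sketch makes the cancellation visible: the raw differences from the mean add up to (effectively) zero, while the squared differences are all non-negative and match the list above. The names used here are illustrative only:

```python
weights = [1.4, 1.4, 1.5, 1.6, 1.1]
mean = sum(weights) / len(weights)

# Raw differences from the mean: negatives and positives cancel out
differences = [w - mean for w in weights]
print(abs(sum(differences)) < 1e-9)  # True (zero apart from floating-point noise)

# Squaring makes every difference count as a positive amount
squared = [d ** 2 for d in differences]
print([round(s, 4) for s in squared])  # [0.0, 0.0, 0.01, 0.04, 0.09]
```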
The squared differences are added together and divided by one less than the number of samples (here, 5 - 1 = 4). This produces what is called the sample variance:
(0 + 0 + 0.01 + 0.04 + 0.09) / 4 = 0.035
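As a sketch, the sample variance can be computed by hand and cross-checked against Python's standard `statistics.variance`, which also divides by n - 1:

```python
import statistics

weights = [1.4, 1.4, 1.5, 1.6, 1.1]
n = len(weights)
mean = sum(weights) / n

# Sum of squared differences, divided by n - 1 (not n)
sample_variance = sum((w - mean) ** 2 for w in weights) / (n - 1)
print(round(sample_variance, 4))  # 0.035

# The standard library's sample variance uses the same n - 1 divisor
print(round(statistics.variance(weights), 4))  # 0.035
```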
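Why divide by one less than the sample size? Wouldn't getting the average require dividing by the sample size? The reason is that the differences were measured from the sample's own average rather than from the true average of the whole jar. The sample average is, by construction, the value that makes the sum of squared differences as small as possible, so these differences tend to understate the true spread. Dividing by one less than the sample size (known as Bessel's correction) compensates, making the sample variance an unbiased estimate of the variance of the whole population. The next step has another statistical trick.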
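One way to see the effect is a quick simulation, a sketch under assumed conditions: draw many samples of five from a normal population whose true variance is 1 (population parameters chosen arbitrarily for illustration), and average both versions of the variance. Dividing by n should come out about 20 percent low (on average, (n - 1)/n = 0.8 of the true value), while dividing by n - 1 should come out close to 1:

```python
import random

random.seed(1)          # fixed seed so the run is repeatable
TRUE_SD = 1.0           # assumed population standard deviation (variance = 1)
N, TRIALS = 5, 20_000   # five samples per trial, as in the jelly bean example

total_div_n = total_div_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0.0, TRUE_SD) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared differences
    total_div_n += ss / N
    total_div_n_minus_1 += ss / (N - 1)

avg_div_n = total_div_n / TRIALS
avg_div_n_minus_1 = total_div_n_minus_1 / TRIALS
print(f"divide by n:     {avg_div_n:.3f}")          # noticeably below 1
print(f"divide by n - 1: {avg_div_n_minus_1:.3f}")  # close to 1
```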
Step Three - Calculate the "average" variance
The "average" variance is known in statistics books as the "standard error of the mean" (the standard deviation of the estimate of the mean). It is calculated by dividing the sample variance by the number of samples and taking the square root of the result:
Sqrt(0.035 / 5) = 0.0837
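The same step in Python, as a minimal sketch using the sample variance from Step Two:

```python
import math

sample_variance = 0.035  # from Step Two
n = 5                    # number of samples

# Standard error of the mean: square root of (sample variance / n)
standard_error = math.sqrt(sample_variance / n)
print(round(standard_error, 4))  # 0.0837
```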
Step Four - Determine the range
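To begin, we need the t-score that corresponds to the sample size and the confidence level. The t-score is a multiplier that widens the range to account for the extra uncertainty of working with few samples: the smaller the sample, the larger the t-score. It is looked up in a t-table using the degrees of freedom, which is one less than the sample size. For five samples (4 degrees of freedom) and 90 percent confidence, the t-score is 2.13. We multiply it by the result of the previous step to get the sample error (2.13 x 0.0837 = 0.178).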
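A sketch of the lookup, assuming a small hard-coded table of two-sided 90 percent critical values as printed in standard t-tables (the table name `T_90` and helper `t_score_90` are hypothetical). Where SciPy is available, `scipy.stats.t.ppf(0.95, df)` computes the same values instead of relying on a table:

```python
# Two-sided 90% critical values of Student's t, keyed by degrees of freedom (n - 1).
# Rounded to three decimals, as printed in standard t-tables.
T_90 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
        6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def t_score_90(sample_size: int) -> float:
    """Return the 90% t-score for a sample of 2 to 11 items (hypothetical helper)."""
    df = sample_size - 1
    if df not in T_90:
        raise ValueError("this table only covers 2 to 11 samples")
    return T_90[df]


standard_error = 0.0837          # from Step Three
t = t_score_90(5)                # df = 4, so t = 2.132
sample_error = t * standard_error
print(round(sample_error, 3))   # 0.178
```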
The range that contains the average of the total population (in our example, the jar of jelly beans) with 90 percent confidence is found by subtracting the sample error from the sample average and adding it to the sample average:
lower bound = 1.4 - 0.178 = 1.222
upper bound = 1.4 + 0.178 = 1.578
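Putting the four steps together, here is a minimal sketch of the whole calculation as one function. The name `confidence_interval_90` is hypothetical, and the t-scores come from the same kind of small hard-coded table of two-sided 90 percent values, so it only handles 2 to 11 samples:

```python
import math
import statistics

# Two-sided 90% t-scores by degrees of freedom (from standard t-tables)
T_90 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
        6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def confidence_interval_90(samples: list[float]) -> tuple[float, float]:
    """90% confidence interval for the population mean (hypothetical helper)."""
    n = len(samples)
    if n - 1 not in T_90:
        raise ValueError("this sketch only handles 2 to 11 samples")
    mean = statistics.mean(samples)                  # Step One
    variance = statistics.variance(samples)          # Step Two (divides by n - 1)
    standard_error = math.sqrt(variance / n)         # Step Three
    sample_error = T_90[n - 1] * standard_error      # Step Four
    return mean - sample_error, mean + sample_error


low, high = confidence_interval_90([1.4, 1.4, 1.5, 1.6, 1.1])
print(f"{low:.3f} to {high:.3f}")  # 1.222 to 1.578
```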
The result is that, using only five samples, we can be 90 percent confident that the average weight of all the jelly beans in the jar is between 1.222 g and 1.578 g.
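What "90 percent confidence" means can be checked by simulation. The sketch below assumes a normal population with a made-up true mean of 1.4 g and standard deviation of 0.2 g, repeatedly draws five beans, builds the interval as described above, and counts how often it contains the true mean. For comparison it also uses 1.645, the normal-distribution multiplier for 90 percent that would be appropriate with large samples; with only five samples, that narrower interval should miss noticeably more often:

```python
import math
import random

random.seed(42)                  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 1.4, 0.2    # assumed population, for illustration only
N, TRIALS = 5, 20_000
T_SCORE = 2.132                  # two-sided 90% t-score for N - 1 = 4
Z_SCORE = 1.645                  # two-sided 90% normal-distribution score

hits_t = hits_z = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = sum(sample) / N
    variance = sum((x - mean) ** 2 for x in sample) / (N - 1)
    standard_error = math.sqrt(variance / N)
    # Count how often each interval actually contains the true mean
    if abs(mean - TRUE_MEAN) <= T_SCORE * standard_error:
        hits_t += 1
    if abs(mean - TRUE_MEAN) <= Z_SCORE * standard_error:
        hits_z += 1

coverage_t = hits_t / TRIALS
coverage_z = hits_z / TRIALS
print(f"t-score 2.132: {coverage_t:.3f}")  # close to 0.90
print(f"z-score 1.645: {coverage_z:.3f}")  # noticeably lower, roughly 0.82
```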
Conclusion
This article describes one of the methods Douglas Hubbard uses in his book to arrive at answers with minimal information, but statistical tools are only one element of the book. For example, he points out that many of the measurements currently performed and reported do not affect decision-making, and that measuring something can sometimes cost more than the information is worth. Although the book presents many measurement techniques, it should not be forgotten that just because one can measure anything does not mean that one should.

