From the request thread, I was hoping for a nice easy softball, maybe from an undergraduate or mathematical amateur. Apparently, though, I have finally scared off anyone other than procrastinating professional mathematicians, who want me to actually write the posts I promised.

In the comments here I promised a post explaining why most statistics satisfy the Central Limit Theorem. I thought I’d start slowly with an explanation of what a statistic *is*.

A statistic is just something you compute from the data. This definition is so uninteresting that statistics books are a little apologetic about how contentless the definition sounds. (This usage of the term “statistic” was coined by Fisher. There is a cutting quote by Pearson on the terminology that is impossible to Google for, since all I remember is that it’s about the word statistic, and it involves Fisher and Pearson, who are probably the two most famous statisticians.)

Probability distributions are mathematical abstractions, while statistics are numbers we compute from actual data. If we believe that we can model that data as if it were generated by a random variable, then we have to relate the statistic to some property of the probability distribution. Usually, we are interested in some property of the underlying distribution and use a statistic to estimate it. For example, we may be interested in the mean of the underlying random variable, which we can approximate by the mean of the data.
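To make the mean example concrete, here is a minimal sketch in Python. The choice of an exponential distribution with mean 2, the seed, and the function name `sample_mean` are all my own assumptions for illustration; any distribution with a finite mean would do.

```python
import random

def sample_mean(n, true_mean=2.0, seed=0):
    """Draw n points from an exponential distribution and return their mean.

    The distribution and its mean (2.0) are illustrative assumptions.
    """
    rng = random.Random(seed)
    # expovariate takes the rate, which is 1 / mean for an exponential.
    data = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
    # The statistic: something computed purely from the data.
    return sum(data) / len(data)

# The statistic should drift toward the true mean (2.0) as n grows.
for n in (10, 1000, 100000):
    print(n, sample_mean(n))
```

The point is only that the number printed is computed from data alone, while the 2.0 it approaches is a property of the model.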

Approximating the mean of the random variable in this way is a special case of a general technique for estimating a property of a random variable. A random sample of size n drawn from a probability distribution can be thought of as a (discrete) probability distribution in its own right, one that puts mass 1/n on each observation. The property computed for this sample distribution can be used as an estimate of the property for the true distribution; this is known as the plug-in estimate for the property. An analogue of the law of large numbers shows that, under suitable conditions, this estimate converges to the true value as the sample grows.
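The plug-in idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names (`empirical_distribution`, `expectation`, `variance`) are hypothetical, and the normal distribution with mean 1 and standard deviation 2 is an arbitrary choice. The sample is turned into a discrete distribution with mass 1/n on each observation, and the property is then computed for that distribution exactly as it would be for any other discrete distribution.

```python
import random

def empirical_distribution(sample):
    """Treat a sample as a discrete distribution: mass 1/n on each observation."""
    n = len(sample)
    return [(x, 1.0 / n) for x in sample]

def expectation(dist, f=lambda x: x):
    """Expected value of f(X) under a discrete distribution of (value, mass) pairs."""
    return sum(mass * f(x) for x, mass in dist)

def variance(dist):
    """Variance as a property of a distribution: E[(X - E[X])^2]."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)

# Assumed "true" distribution: normal with mean 1 and standard deviation 2,
# so the true variance is 4.
rng = random.Random(0)
for n in (10, 1000, 100000):
    sample = [rng.gauss(1.0, 2.0) for _ in range(n)]
    dist = empirical_distribution(sample)
    # Plug-in estimates: the same functions, applied to the sample distribution.
    print(n, expectation(dist), variance(dist))
```

Note that the plug-in variance divides by n rather than n - 1, because it is literally the variance of the sample distribution; it is not the usual bias-corrected estimate.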

Next time: the analogue of the central limit theorem.