From the request thread, I was hoping for a nice easy softball, maybe from an undergraduate or mathematical amateur. Apparently, though, I have finally scared off anyone other than procrastinating professional mathematicians, who want me to actually write the posts I promised.

In the comments here I promised a post explaining why most statistics satisfy the Central Limit Theorem. I thought I’d start slowly with an explanation of what a statistic *is*.

A statistic is just something you compute from the data. This definition is so uninteresting that statistics books are a little apologetic about how contentless the definition sounds. (This usage of the term “statistic” was coined by Fisher. There is a cutting quote by Pearson on the terminology that is impossible to Google for, since all I remember is that it’s about the word statistic, and it involves Fisher and Pearson, who are probably the two most famous statisticians.)

Probability distributions are mathematical abstractions, while statistics are numbers we compute from actual data. If we believe the data can be modelled as if it were generated by a random variable, then we can relate the statistic to some property of the probability distribution. Usually we are interested in some property of the underlying distribution, and we use statistics to estimate it. For example, we may be interested in the mean of the underlying random variable, which we can approximate by the mean of the data.
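As a minimal sketch of this (the particular distribution, its parameters, and the sample size are my own choices for illustration):

```python
import random

random.seed(0)

# Draw a sample from a normal distribution whose true mean is 5.
sample = [random.gauss(5, 2) for _ in range(100_000)]

# The sample mean is a statistic: a single number computed from the data.
sample_mean = sum(sample) / len(sample)

# With this many draws, it lands close to the true mean of 5.
print(sample_mean)
```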

Approximating the mean of the random variable in this way is a special case of a general technique for estimating a property of a random variable. A random sample drawn from a probability distribution can be thought of as a (discrete) probability distribution in its own right. The property computed for this sample distribution can be used as an estimate of the property of the true distribution — this is known as the plug-in estimate of the property. An analogue of the law of large numbers shows that this estimate converges to the true value.

Next time: the analogue of the central limit theorem.

Sure, so it’s something that takes N data points as input and gives 1 number as output. That’s not so uninteresting: it means we’re not getting an “interesting” structure as output, and since N → 1 we’re throwing away a lot of detail.

Most statistics are some kind of integral, no? Or at least they pass through an integrator, perhaps with other things happening at the same time. So that’s another way of saying what “a statistic” is in fancier terms. I believe the word “operator” carries the same meaning in QM.

To tantalise the computer people you can use the word “reduce” (with perhaps a “map” beforehand). Not so totally meaningless. One takeaway, for example, is that the mean income of a country, or the mean economic return on a PhD, may not be relevant to you if either [a] you aren’t a member of some important subset (whites, say, if we’re discussing US wealth, or numerical-methods PhDs if we’re discussing economic returns on a mathematics degree) or [b] further statistics such as skew or variance are high enough to make the first moment less informative as the only number.
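The map/reduce framing above can be sketched in a few lines of Python (the income figures are made up purely for illustration):

```python
from functools import reduce

# Hypothetical incomes; the last entry is a deliberate outlier.
incomes = [31_000, 45_000, 52_000, 38_000, 2_400_000]

# "map": transform each datum (here just a cast; could be log, square, ...).
mapped = list(map(float, incomes))

# "reduce": collapse N numbers into 1 -- the statistic.
total = reduce(lambda acc, x: acc + x, mapped, 0.0)
mean_income = total / len(mapped)

# The outlier drags the mean far above what most individuals earn,
# which is why the first moment alone can mislead.
print(mean_income)
```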

PS: I think you could also compute a statistic on a probability distribution. I believe you are thinking of an estimator that gets computed only on measurements of our world.