What is a Statistic?

November 1st, 2008 by Walt

From the request thread, I was hoping for a nice easy softball, maybe from an undergraduate or mathematical amateur. Apparently, though, I have finally scared off anyone other than procrastinating professional mathematicians, who want me to actually write the posts I promised.

In the comments here I promised a post explaining why most statistics satisfy the Central Limit Theorem. I thought I’d start slowly with an explanation of what a statistic is.

A statistic is just something you compute from the data. This definition is so uninteresting that statistics books are a little apologetic about how contentless the definition sounds. (This usage of the term “statistic” was coined by Fisher. There is a cutting quote by Pearson on the terminology that is impossible to Google for, since all I remember is that it’s about the word statistic, and it involves Fisher and Pearson, who are probably the two most famous statisticians.)

Probability distributions are mathematical abstractions, while statistics are numbers we compute from actual data. If we believe that we can model the data as if they were generated by a random variable, then we have to relate the statistic to some property of the probability distribution. Usually, we are interested in some property of the underlying distribution and use statistics to estimate it. For example, we may be interested in the mean of the underlying random variable, which we can approximate by the mean of the data.
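
As a minimal Python sketch of this, assume (purely for illustration) that the data are independent draws from an exponential distribution with rate 2, whose true mean is 1/2. The helper names `sample_mean` and `simulate_sample_mean` are hypothetical and not taken from any library. The statistic itself uses only the observed numbers; the model enters only when we compare it to the true mean.

```python
import random

def sample_mean(data):
    """A statistic: computed from the observed numbers alone, with no reference to any model."""
    return sum(data) / len(data)

def simulate_sample_mean(n, rate=2.0, seed=0):
    """Assumed model for illustration: n i.i.d. exponential draws with the given rate.
    The true mean of this distribution is 1 / rate."""
    rng = random.Random(seed)
    data = [rng.expovariate(rate) for _ in range(n)]
    return sample_mean(data)

# The sample mean should settle near the true mean 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, simulate_sample_mean(n))
```

The printed values for small n can be noticeably off; the point is only that the gap tends to shrink as more data arrive.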

Approximating the mean of the random variable in this way is a special case of a general technique for estimating a property of a random variable. A random sample drawn from a probability distribution can be thought of as a (discrete) probability distribution in its own right. The value of the property for the sample distribution can be used as an estimate of its value for the true distribution; this is known as the plug-in estimate for the property. An analogue of the law of large numbers shows that this estimate converges to the true value.
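
Here is a minimal Python sketch of the plug-in idea, treating the sample as a discrete distribution that puts mass 1/n on each observation. The function names are hypothetical, and the Exponential(1) model in the demo is an assumption chosen only because its tail probability P(X > 1) = e^{-1} is known exactly, so the plug-in estimate has something to be compared against.

```python
import math
import random
from collections import Counter
from fractions import Fraction

def empirical_distribution(sample):
    """Treat the sample as a discrete distribution: each observation gets mass 1/n,
    and repeated values accumulate mass."""
    n = len(sample)
    return {x: Fraction(count, n) for x, count in Counter(sample).items()}

def expectation(dist, g=lambda x: x):
    """E[g(X)] for a discrete distribution stored as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

def plug_in_variance(dist):
    """Variance of the empirical distribution (divides by n, not n - 1)."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)

def plug_in_tail(dist, t):
    """P(X > t) under the empirical distribution."""
    return sum((p for x, p in dist.items() if x > t), Fraction(0))

# Illustration under an assumed model: Exponential(1) data, for which P(X > 1) = e^{-1}.
rng = random.Random(1)
for n in (100, 10_000):
    dist = empirical_distribution([rng.expovariate(1.0) for _ in range(n)])
    print(n, float(plug_in_tail(dist, 1.0)), math.exp(-1))
```

Note that the plug-in variance divides by n rather than n - 1, so it is not the usual unbiased sample variance; it is simply the variance of the sample viewed as a distribution.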

Next time: the analogue of the central limit theorem.

9 Responses to “What is a Statistic?”

  1. Alex Says:

    Well, here’s a softball question: As I understand it, consensus has not been reached on what a probability actually is, and more generally, on how one should go about making decisions under uncertainty (decision theory). I know that Bayesian probability is applicable to decision theory, and frequentist probability isn’t (or at least is difficult to use). So my question is, what are the main approaches to decision theory other than straight Bayesian? I’ve heard of Pollock’s ‘nomic probability’, but I don’t understand it and I don’t know whether it’s worth my time trying to figure it out.

  2. Todd Trimble Says:

    “From the request thread, I was hoping for a nice easy softball, maybe from an undergraduate or mathematical amateur. Apparently, though, I have finally scared off anyone other than procrastinating professional mathematicians”

    Including notedscholar, who had the better question about irrationality of infinity and irrationality of imaginary numbers.

  3. Peter Says:

    Walt — It is not necessarily straightforward to define simple, taken-for-granted, terms. The Uncertainty in AI (UAI) email list had a very long debate a few years back over the definition of “random variable”.

    Alex — Probability theory (as represented, say, by the axioms of Kolmogorov) provides one basis for the formal representation of uncertainty, and, subsequently, a basis for decision-making under uncertainty. But, contrary to the beliefs of many Bayesian statisticians, probability theory is not the only way to formally represent uncertainty, nor even necessarily the most appropriate in any given application domain. Because, historically, statisticians mostly failed to consider other formalisms, it was left to people outside statistics (in law, in medicine, and latterly, in AI) to consider non-probabilistic formalisms for uncertainty. These alternative formalisms (e.g., Dempster-Shafer theory, possibility theory) also provide a basis for decision-making under uncertainty.

  4. Will Says:

    Karl Pearson objected, “Are we also to introduce the words a mathematic, a physic, an electric etc., for parameters or constants of other branches of science?” (p. 49n of Biometrika, 28, 34-59 1936).

    That one?

    Found at Jeff Miller’s Earliest Known Uses of Some of the Words of Mathematics page:
    http://jeff560.tripod.com/s.html

  5. hellblazer Says:

    Todd, that is a joke, isn’t it? (Having spent half an hour looking at notedscholar’s blog, and doing some googling, I’m not sure any discussion here could have done justice to his, erm, grand and bold vision.)

  6. Todd Trimble Says:

    hellblazer: yes, and you’re probably right: it’d best be left to notedscholar.

  7. Alex Says:

    Peter: thanks!

  8. notedscholar Says:

    Interesting post. As Todd points out - not maximally interesting, but interesting.

    I appreciate the compliments, although do I detect a hint of sarcasm?

    Anyway, I’m not sure I like this definition of statistics. The general intuition is that algebra is not statistics, but your definition overlaps both strata.

    Also, I would point out that probability distributions are not unique in being abstract. *All* mathematical statements are abstract, however obvious.

  9. Disco Says:

    Hi all.

    At the risk of introducing new terminology, does this definition of a statistic include a “conditional expectation”? I guess it does, but conditional expectations are defined to possess certain properties as well as being functions of the data.
