Mathematics of a Serial Killer

Someone sent me a link to this story about a mathematical model of a particular serial killer’s behavior. Two things struck me about it:

  1. How much it sounded like the kind of bizarre model you’d see Charlie on Numb3rs come up with in order to crack the case.
  2. That Cosma Shalizi would hate the model, since it’s the kind of casual use of power laws he regularly criticizes. And here’s his analysis of the paper. He points out that, as in many other cases, a lognormal distribution provides a better fit.

The Tragedy of Blogging

Like most blogs, this blog gets a lot of spam, most of which gets caught by the spam filter, but not all. Some of it I have to go through manually. Sadly, I have come to learn that if a message says something like “great job”, it’s spam with probability 1. Probably the next generation of spambots will say “Completely wrong, you idiot!” and compare you to Hitler for greater verisimilitude.

Combined with the revelation that I accidentally turned down the Clay Prize, it’s been a discouraging day for this blogger.

What is a Statistic?

From the request thread, I was hoping for a nice easy softball, maybe from an undergraduate or mathematical amateur. Apparently, though, I have finally scared off anyone other than procrastinating professional mathematicians, who want me to actually write the posts I promised.

In the comments here I promised a post explaining why most statistics satisfy the Central Limit Theorem. I thought I’d start slowly with an explanation of what a statistic is.

A statistic is just something you compute from the data. This definition is so uninteresting that statistics books are a little apologetic about how contentless the definition sounds. (This usage of the term “statistic” was coined by Fisher. There is a cutting quote by Pearson on the terminology that is impossible to Google for, since all I remember is that it’s about the word statistic, and it involves Fisher and Pearson, who are probably the two most famous statisticians.)

Probability distributions are mathematical abstractions, while statistics are numbers we compute from actual data. If we believe that we can model that data as if it were generated by a random variable, then we have to relate the statistic to some property of the probability distribution. Usually, we are interested in some property of the underlying distribution, and we use a statistic to estimate it. For example, we may be interested in the mean of the underlying random variable, which we can approximate by the mean of the data.

Approximating the mean of the random variable in this way is a special case of a general technique for estimating a property of a random variable. A random sample of size n drawn from a probability distribution can be thought of as a (discrete) probability distribution in its own right: the one that puts probability 1/n on each observed data point. The value of the property for this sample distribution can be used as an estimate of the property for the true distribution. This is known as the plug-in estimate for the property. An analogue of the law of large numbers shows that this estimate converges to the true value.
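Here is a minimal sketch of the plug-in idea in Python, using only the standard library. The choice of an Exponential(1) distribution, the three properties (mean, variance, median), and the name plug_in_estimates are all my own assumptions for illustration. Each property is simply evaluated on the sample as if the sample were the whole distribution.

```python
import math
import random
import statistics


def plug_in_estimates(sample):
    """Evaluate each property on the empirical distribution, i.e. the
    discrete distribution putting mass 1/n on each observed point."""
    return {
        "mean": statistics.fmean(sample),
        # pvariance divides by n, which is the variance of the empirical
        # distribution (not the n - 1 "sample variance").
        "variance": statistics.pvariance(sample),
        "median": statistics.median(sample),
    }


# True values for an Exponential(1) random variable (illustrative choice).
TRUE_VALUES = {"mean": 1.0, "variance": 1.0, "median": math.log(2)}

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns give the same numbers
    for n in (100, 10_000, 100_000):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        estimates = plug_in_estimates(sample)
        errors = {k: abs(estimates[k] - TRUE_VALUES[k]) for k in TRUE_VALUES}
        print(n, {k: round(e, 4) for k, e in errors.items()})
```

The printed errors should tend to shrink as n grows, though for any single seed they need not decrease at every step. Note that the plug-in variance divides by n rather than n - 1, so it is slightly biased for small samples; the bias vanishes in the limit.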

Next time: the analogue of the central limit theorem.

Tremellius and Naibod

God Plays Dice has a post that answers a question I’ve long had about the Mathematics Genealogy Project: just how far back can you go? The answer is 1572, when Immanuel Tremellius and Valentine Naibod advised Rudolph Snellius. Snellius was the father of Willebrord Snellius, who discovered Snell’s law.

Tremellius was a Bible translator who was briefly jailed for being a Calvinist. It sounds like he was forced to move frequently as the prevailing winds for Protestants changed. (This was the early Reformation.) Naibod was an astrologer who had a book banned by the Catholic Church. An astrological prediction told him that his life was in danger, so he tried holing up in his house until the danger passed. Since the house showed no external signs of life, thieves thought the house was abandoned and broke in. Discovering Naibod, they murdered him. Apparently astrology works after all.

The Genealogy Project has a page dedicated to what it calls extrema. I would support a campaign to rename the Guinness Book of World Records the Guinness Book of Extrema.

Update. Between when I hit “Post” and now, the Mathematics Genealogy Project updated its database, making this post completely obsolete.

Looting the Library

I promised a while back to write a post describing why so many statistics have a central limit theorem. I went to the library to look up the result I had in mind, to refresh my memory as to the details. The book I wanted was checked out. I thought about requesting the book, but it seemed a bit much to request a book just for a blog post. A couple of days later, I found out who had the book checked out: me.