Perron-Frobenius on the web

Or, how to make a search engine.

Imagine the web is irreducible, by which I mean you could get from any page to any other by following links; pages without links (and pages no one links to) demonstrate that the web is not irreducible — but this is mathematics, so we are not going to let it worry us. Further, imagine there are millions of monkeyspigeons randomly clicking on links (forming a Markov chain). Perron-Frobenius theory can tell us the probability of these random walks through cyberspace visiting a particular page at an instance in time.

To do this we need an adjacency matrix of the web, an 8 billion by 8 billion matrix which is mostly zeros; when page i links to page j we put a one in the (i,j) entry (yes, we need to number all 8 billion pages).

Perron-Frobenius says: since this matrix has only non-negative entries, it has a unique positive eigenvalue and corresponding (normalized) positive eigenvectors. This (left) Perron eigenvector is what we are after; each of its 8 billion entries represents the probability of each pigeon being at the corresponding page.

When someone searches for the word “math”, each of our pigeons is given a singular focus: “math”. Now, when presented with a choice of links on a page, instead of randomly picking among them with equal probability, our pigeons are more likely to pick the links related to “math”. In our matrix, instead of using a one for every link, we use the value of the link which we calculate from the appearance of “math” in the link as well as its frequency and prominence on the target page (our pigeons have been doing this for so long they already know the contents of the page they are going to; but, alas, they have reached the point where they just look at the pages without really reading them).

Apply Perron-Frobenius to this new matrix to get a new Perron eigenvalue and Perron eigenvector. This eigenvector contains the probabilities our pigeons are visiting each page, making a good approximation of the usefulness of each page relative to the word “math”. Moreover, the Perron eigenvalue — being related to the entropy of our pigeons — can measure the usefulness of the web relative to the word “math”.

Current Implementations

Google’s pigeons appear to have a hard time concentrating and sometimes — instead of following links — may randomly jump to any page on the web. Google spins this as a “feature”, because now they can pretend that every page links to every other page making the web irreducible. It gets worse. Their pigeons can’t even focus on a search term; so, instead of creating an 8 billion by 8 billion matrix (and finding its eigenvector) for every search term, they do this once for an adjacency matrix (with very small numbers instead of zeros to account for the pigeon wanderings) and declare that the resulting Perron eigenvector represents the importance (or “PageRank”) of the web pages.

For a single website, say, less than 8 billion pages, it might be possible to maintain pigeon discipline and focus, allowing the matrix and eigenvector to be generated for each query (the power method can help with this).