markovpr (GPL 2 or higher)
MarkovPR is a PageRank calculator using Markov chains. It reads a large
number of web pages, builds an efficient in-memory representation of the web link
graph, and runs a Markov chain particle system which converges to PageRank.
Besides plain PageRank, it can compute various generalizations, including a
version of PageRank which takes into account the age of a web page.
It can also be used to obtain a "perfect sample" from PageRank.
It is described in two accompanying papers.
For their first programming contest (2002), the search engine Google made
available just under one million web pages from their index, totalling
5 GB of data. MarkovPR read this dataset in about 15 minutes on a 500 MHz
Pentium III, using just under 200 MB of RAM (fairly impressive for the
low-end hardware of the time).
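This page does not describe the in-memory layout MarkovPR uses for its link graph. As an illustrative assumption only, one common compact layout is compressed sparse row (CSR): all out-link targets live in one array, grouped by source page, and a second array of offsets marks where each page's block starts. With 32-bit page ids this costs 4 bytes per link plus 4 bytes per page, with no per-page allocation overhead. The names `LinkGraph` and `build_graph` below are hypothetical, not MarkovPR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical compact link graph in compressed sparse row (CSR) form.
// Assumes pages have already been mapped to dense ids 0..n-1.
// 32-bit offsets limit this sketch to fewer than 2^32 links.
struct LinkGraph {
    std::vector<std::uint32_t> offsets;  // out-links of page p: targets[offsets[p] .. offsets[p+1])
    std::vector<std::uint32_t> targets;  // all out-link targets, grouped by source page

    std::uint32_t num_pages() const {
        return static_cast<std::uint32_t>(offsets.size() - 1);
    }
    std::uint32_t out_degree(std::uint32_t p) const {
        return offsets[p + 1] - offsets[p];
    }
};

// Builds the CSR arrays from an unsorted (source, target) edge list
// with a counting sort: two passes over the edges, no per-page vectors.
LinkGraph build_graph(std::uint32_t n,
                      const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    LinkGraph g;
    g.offsets.assign(n + 1, 0);
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++g.offsets[e.first + 1];  // count the out-degree of each source page
    }
    for (std::uint32_t p = 0; p < n; ++p)
        g.offsets[p + 1] += g.offsets[p];  // prefix sum turns counts into start offsets

    g.targets.resize(edges.size());
    // cursor[p] is the next free slot in page p's block of targets
    std::vector<std::uint32_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges)
        g.targets[cursor[e.first]++] = e.second;
    return g;
}
```

For instance, the edge list `{0,1}, {2,0}, {0,2}, {1,2}, {3,2}` over 4 pages yields `offsets = {0, 2, 3, 4, 5}` and `targets = {1, 2, 2, 0, 2}`, so page 0 links to pages 1 and 2.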
While the source code here is Free, only a small number of sample web pages are
bundled with the code. To reproduce the results, you will need to obtain
Google's dataset (5 GB) by contacting them yourself, as I am not allowed to redistribute it (besides, I don't have the bandwidth):
This repository of web page information is being provided to you by Google
Inc. solely for academic and research purposes related to the Google
programming contest. You may not modify, distribute, or make any commercial
use of the repository.
- markovpr-1.1.tar.bz2 (MD5)
- markovpr-1.01.tar.gz
MarkovPR is a collection of programs (written in C++) that build a web link graph and
calculate various types of page ranking distributions by means of suitable
Markov chains.
The data defining the web graph must be obtained separately as
described inside the download. You can find technical descriptions of this software
on my preprints page. Tested on Linux.
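MarkovPR's own chains and particle scheme are described in the accompanying papers and are not reproduced here. As a rough illustration of how a Markov chain particle system can converge to PageRank, here is a minimal sketch assuming the standard random-surfer chain: with probability `alpha` a particle follows a uniformly chosen out-link, otherwise (or at a page with no out-links) it jumps to a uniformly random page. After a burn-in period, the fraction of particle visits to each page estimates its PageRank. The function `particle_pagerank`, its parameters, and the dangling-page rule are hypothetical, not MarkovPR's interface, and a plain adjacency list stands in for a compact graph layout to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// out_links[p] lists the pages that page p links to (ids 0..n-1).
using Adjacency = std::vector<std::vector<std::uint32_t>>;

// Hypothetical random-surfer particle estimate of PageRank.
// alpha: probability of following an out-link (damping factor).
// Assumption: pages with no out-links always teleport uniformly.
std::vector<double> particle_pagerank(const Adjacency& out_links,
                                      std::size_t num_particles,
                                      std::size_t burn_in_steps,
                                      std::size_t sample_steps,
                                      double alpha,
                                      std::uint64_t seed) {
    const auto n = static_cast<std::uint32_t>(out_links.size());
    assert(n > 0 && num_particles > 0 && sample_steps > 0);
    assert(alpha >= 0.0 && alpha <= 1.0);

    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint32_t> any_page(0, n - 1);
    std::bernoulli_distribution follow_link(alpha);

    // One transition of the Markov chain for a single particle.
    auto step = [&](std::uint32_t p) -> std::uint32_t {
        const auto& links = out_links[p];
        if (links.empty() || !follow_link(rng))
            return any_page(rng);  // teleport to a uniformly random page
        std::uniform_int_distribution<std::size_t> pick(0, links.size() - 1);
        return links[pick(rng)];   // follow a uniformly chosen out-link
    };

    // Start every particle at a uniformly random page, then let the chain mix.
    std::vector<std::uint32_t> pos(num_particles);
    for (auto& p : pos) p = any_page(rng);
    for (std::size_t t = 0; t < burn_in_steps; ++t)
        for (auto& p : pos) p = step(p);

    // Count visits after burn-in; visit frequencies estimate PageRank.
    std::vector<std::uint64_t> visits(n, 0);
    for (std::size_t t = 0; t < sample_steps; ++t)
        for (auto& p : pos) {
            p = step(p);
            ++visits[p];
        }

    const double total = static_cast<double>(num_particles) * static_cast<double>(sample_steps);
    std::vector<double> rank(n);
    for (std::uint32_t i = 0; i < n; ++i) rank[i] = static_cast<double>(visits[i]) / total;
    return rank;
}
```

For the three-page graph `{{1}, {0}, {0}}` (pages 0 and 1 link to each other, page 2 links to page 0) with `alpha = 0.85`, the exact PageRank is about (0.486, 0.464, 0.050), and `particle_pagerank(g, 1000, 50, 2000, 0.85, 42)` should return visit frequencies close to those values. Accuracy improves with the number of particles times sampling steps. The age-aware PageRank and perfect sampling are not covered by this sketch.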