Sunday, January 16, 2011

Google's PageRank and Beyond: The Science of Search Engine Rankings



2006-07-03 | SEO


Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings, and how? The first book ever about the science of Web page rankings, Google's PageRank and Beyond, supplies the answers to these questions and more.

The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.

The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB programs and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.

Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.

- Many illustrative examples and entertaining asides
- MATLAB code
- Accessible and informal style
- A complete, self-contained section for mathematics review

User review
Google's PageRank and Beyond: The Science of Search Engine Rankings
The book is good at explaining Google's PageRank, and it tries to present rigorous mathematical proofs to demonstrate the idea. It is good; however, the math is not well organized, and it is not easy for people without linear algebra knowledge to follow. Anyway, it is still a good book for demonstrating Markov chains in PageRank.

User review
Probability Transition Matrix, Markov Chain, and Stationary Vector
A web search engine has six major components: (1) the crawler module, (2) the page repository, (3) the indexing module, (4) the indexes, (5) the query module, and (6) the ranking module. The ranking module takes the set of relevant pages and ranks them according to both a content score and a popularity score. The popularity score is the focus of Amy N. Langville and Carl D. Meyer's "Google's PageRank and Beyond: The Science of Search Engine Rankings." The popularity score of a web page is determined by the hyperlink structure of the Web.
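The review does not say how the ranking module combines the two scores. The Python below is a purely illustrative sketch that assumes a linear blend with a hypothetical weight `w`. It also assumes both scores are already on comparable scales. The `Page` type and `rank_pages` function are invented names, not anything taken from the book or from a real search engine.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    content_score: float     # query-dependent relevance (assumed already normalized)
    popularity_score: float  # query-independent score, e.g. a PageRank value


def rank_pages(pages, w=0.5):
    """Sort relevant pages by a hypothetical blend w*content + (1-w)*popularity."""
    def combined(p):
        return w * p.content_score + (1 - w) * p.popularity_score
    # Highest combined score first.
    return sorted(pages, key=combined, reverse=True)
```

With `w=1.0` this reduces to pure content ranking. Lowering `w` lets link popularity reorder pages that match the query about equally well.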


Brin and Page's PageRank philosophy is that a page with more recommendations (inlinks) must be more important than a page with only a few, and that a web page is more important if it is pointed to by other important pages. Brin and Page therefore build a normalized hyperlink matrix H. After two adjustments, called stochasticity and primitivity, they obtain the Google matrix G, which is in fact the probability transition matrix of a Markov chain. The desired ranking of the web pages is the stationary vector of G, or equivalently the solution of the corresponding homogeneous linear system.
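A minimal Python sketch of this construction for a toy graph, under some stated assumptions. Row i of H spreads 1/|out(i)| over page i's outlinks. The stochasticity fix replaces each all-zero (dangling) row with a uniform 1/n row, i.e. S = H + a e^T / n, where a flags dangling pages and e is the all-ones vector. The primitivity fix blends in uniform teleportation: G = αS + (1 − α) e e^T / n. The damping factor α = 0.85 is an assumed value. The stationary vector is found by solving π^T (I − G) = 0^T together with sum(π) = 1. The function names are hypothetical, and forming G densely like this only works for tiny graphs.

```python
def google_matrix(links, n, alpha=0.85):
    """Build dense H, apply the stochasticity and primitivity fixes, return G.

    links: dict mapping page i -> list of pages i links to (0-based).
    """
    # H: row-normalized hyperlink matrix.
    H = [[0.0] * n for _ in range(n)]
    for i, outs in links.items():
        targets = set(outs)
        for j in targets:
            H[i][j] = 1.0 / len(targets)
    # Stochasticity: dangling rows (no outlinks) become uniform rows.
    S = [row if any(row) else [1.0 / n] * n for row in H]
    # Primitivity: G = alpha*S + (1 - alpha) * (1/n) * e e^T.
    return [[alpha * S[i][j] + (1 - alpha) / n for j in range(n)] for i in range(n)]


def stationary_vector(G):
    """Solve pi^T (I - G) = 0^T with sum(pi) = 1 by Gaussian elimination."""
    n = len(G)
    # Row i of A is equation i of (I - G)^T pi = 0, i.e. pi_i = sum_j pi_j G[j][i].
    A = [[(1.0 if i == j else 0.0) - G[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # The equations are linearly dependent, so swap one for the normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]
    return pi
```

For example, with `links = {0: [1, 2], 1: [2], 2: [0], 3: []}`, page 3 is a dangling page with no inlinks and receives the smallest score.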


Calculating the ranking vector is not an easy task, because the matrix G has 8.1 billion rows and 8.1 billion columns, and it grows every day as the number of web pages grows. The book considers several major large-scale implementation issues, such as storage, convergence criteria, accuracy, dangling nodes, and back-button modeling. Acceleration methods are presented as well: the adaptive power method, extrapolation, and aggregation. Once the ranking vector is calculated, it has to be updated periodically; however, no effective and efficient updating method is available other than recomputing from scratch.
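The storage issue is why a large-scale computation never forms the dense G. The sketch below uses a common rewriting of the power-method step in terms of the sparse link structure alone: π_next^T = α π^T H + (α π^T a + 1 − α) e^T / n. The probability mass on dangling pages is spread uniformly. The sketch stops when the 1-norm of the change between iterates falls below a tolerance. The function name, damping factor, and tolerance are illustrative assumptions, and none of the acceleration methods is included.

```python
def pagerank_power(links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method using only outlink lists; the dense G is never built.

    Returns (pi, iterations_used).
    """
    out = {i: sorted(set(links.get(i, []))) for i in range(n)}
    pi = [1.0 / n] * n  # start from the uniform vector
    for it in range(1, max_iter + 1):
        # Score currently sitting on dangling pages (no outlinks).
        dangling = sum(pi[i] for i in range(n) if not out[i])
        # Dangling mass plus teleportation, spread uniformly over all pages.
        new = [(alpha * dangling + 1 - alpha) / n] * n
        # Sparse product alpha * pi^T H: each page splits its score over its outlinks.
        for i, targets in out.items():
            for j in targets:
                new[j] += alpha * pi[i] / len(targets)
        # Convergence criterion: 1-norm of the change between iterates.
        residual = sum(abs(x - y) for x, y in zip(new, pi))
        pi = new
        if residual < tol:
            return pi, it
    return pi, max_iter
```

Each step keeps the entries summing to 1, and memory use is just the outlink lists plus two score vectors. That is the point of avoiding G on a web-sized graph.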


The book also introduces other ranking methods, such as HITS and SALSA. Both are query dependent, both compute hub and authority scores, and both are easier to spam than PageRank. Several interesting MATLAB programs are provided; one could use them to crawl the web, build the matrices, and accelerate the calculation of the stationary vector.
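The book's own programs are in MATLAB. As a rough Python illustration, the sketch below runs the basic HITS iteration from Kleinberg's formulation. L is the adjacency matrix of the query's neighborhood graph, authority scores are updated as a = L^T h, hub scores as h = L a, and both are renormalized each round. It assumes the neighborhood graph has already been built from the pages matching a query; that step is what makes HITS query dependent, and it is omitted here. The fixed iteration count is an arbitrary choice, and SALSA is not shown.

```python
import math


def hits(links, n, iters=100):
    """Basic HITS on a small neighborhood graph; returns (hubs, authorities).

    links: dict mapping page i -> list of pages i links to (0-based).
    Both score vectors are normalized to unit 2-norm.
    """
    edges = [(i, j) for i, outs in links.items() for j in set(outs)]
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority update (a = L^T h): good authorities are linked to by good hubs.
        auth = [0.0] * n
        for i, j in edges:
            auth[j] += hub[i]
        # Hub update (h = L a): good hubs link to good authorities.
        hub = [0.0] * n
        for i, j in edges:
            hub[i] += auth[j]
        # Renormalize so the scores stay bounded.
        for vec in (auth, hub):
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            for k in range(n):
                vec[k] /= norm
    return hub, auth
```

Each page gets two scores rather than one. For example, a page that links to many strong authorities earns a high hub score even if nothing links to it.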


This is a wonderful book with timely technical material, entertaining asides, and a cute book cover. Best of all, the primary author is a lady. I am looking forward to reading more books like this.



User review
Great book
Great book. It's nice to have all the recent work on trust metrics in one place.

User review
Good balance
The book strikes a good balance between the novice and the highly experienced math junkie.

User review
More a math textbook than anything else
You need a degree in math to comprehend this book. If that is what you are looking for, great. Otherwise, this book is not for web professionals like myself.

