In 1997, Google's founders created an algorithmic method to determine importance and popularity based on several key principles.
That algorithm, of course, was PageRank, and it changed the course of web search, providing tremendous value to Google's early efforts around quality and relevancy in results. As knowledge of PageRank spread, those with a vested interest in influencing the search rankings (SEOs) found ways to leverage this information for their websites and pages.
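For those who've never looked under the hood, the core intuition of the original algorithm can be sketched in a few lines. This is a minimal, illustrative power-iteration version, not Google's actual implementation (the real system handles dangling pages, scale and much more):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank via power iteration.
    links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets a small "random surfer" baseline...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # ...plus an equal share of each linking page's rank
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Three pages: A and B link to each other, and both link to C
ranks = pagerank({"A": ["B", "C"], "B": ["A", "C"], "C": []})
```

Note that C, which receives links from both other pages, ends up with the highest score — importance flows along links, which is exactly the insight SEOs learned to exploit.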
But Google didn't stand still or rest on their laurels in the field of link analysis. They innovated, leveraging signals like anchor text, trust, hubs & authorities, topic modeling and even human activity to influence the weight a link might carry. Unfortunately, many in the SEO field are still unaware of these changes and how they impact external marketing and link acquisition best practices.
In this post, I'm going to walk through ten principles of link valuation that can be observed, tested and, in some cases, have been patented. I'd like to extend special thanks to Bill Slawski from SEO By the Sea, whose recent posts on Google's Reasonable Surfer Model and What Makes a Good Seed Site for Search Engine Web Crawls? were catalysts (and sources) for this post.
As you read through the following ten principles, please note that these are not hard and fast rules. They are, from our perspective, accurate based on our experiences, testing and observation, but as with all things in SEO, this is opinion. We invite and strongly encourage readers to test these themselves. Nothing is better for learning SEO than going out and experimenting in the wild.
Whenever we (or many of the other SEOs we've talked to) conduct tests of page or link features in (hopefully) controlled environments on the web, the results suggest that links higher up in the HTML code of a page pass more ranking ability/value than those lower down. This certainly fits with the recently granted Google patent - Ranking Documents Based on User Behavior and/or Feature Data - which suggests a number of items that may be considered in how link metrics are passed.
Those who've leveraged testing environments also often struggle against the "higher link wins" phenomenon; it can take a surprising amount of on-page optimization to overcome the value the higher link carries.
There's little surprise here, but recall that the original PageRank concept makes no mention of external vs. internal links counting differently. It's quite likely that other, more recently created metrics (post-1997) do reward external links over internal links. You can see this in the correlation data from our post a few weeks back, which noted that external mozRank (the "PageRank" sent from external pages) had a much higher correlation with rankings than standard mozRank (PageRank).
I don't think it's a stretch to imagine Google separately calculating external PageRank vs. internal PageRank and potentially using them in different ways for page valuation in the rankings.
Speaking of correlation data, no single, simple metric is better correlated with rankings in Google's results than the number of unique domains containing an external link to a given page. This strongly suggests that a diversity component is at play in the ranking systems and that it's better to have 50 links from 50 different domains than to have 500 more links from a site that already links to you. Curiously again, the original PageRank algorithm makes no provision for this, which could be one reason sitewide links from domains with many high-PageRank pages worked so well in those early years after Google's launch.
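The difference between raw link counts and unique linking domains is easy to see with a toy example. The sketch below uses a naive "last two host labels" heuristic to group links by root domain — real registrable-domain parsing needs a public-suffix list, so treat this as illustrative only:

```python
from urllib.parse import urlparse

def link_diversity(backlink_urls):
    """Toy sketch: count total links vs. unique linking root domains."""
    domains = set()
    for url in backlink_urls:
        host = urlparse(url).hostname or ""
        # naive root-domain heuristic: keep only the last two host labels
        domains.add(".".join(host.split(".")[-2:]))
    return len(backlink_urls), len(domains)

total, unique = link_diversity([
    "http://blog.example.com/post-1",
    "http://www.example.com/sidebar",
    "http://other-site.org/review",
])
```

Here three links collapse to just two unique domains — under a diversity-weighted model, the second link from example.com adds far less than a link from a brand new domain would.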
We've talked previously about TrustRank on SEOmoz and have generally referenced the Yahoo! research paper - Combating Webspam with TrustRank. However, Google's certainly done plenty on this front as well (as Bill covers here), and this patent application on selecting trusted seed sites certainly speaks to the ongoing need and value of this methodology. Linkscape's own mozTrust score functions in precisely this way, using a PageRank-like algorithm that's biased to only flow link juice from trusted seed sites rather than equally from across the web.
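The "biased to seed sites" idea is a small tweak to vanilla PageRank: instead of the random-jump probability flowing to every page equally, it flows only to a trusted seed set, so trust propagates outward from the seeds. A minimal sketch of that idea (hypothetical, not mozTrust's or Google's actual code):

```python
def trust_biased_rank(links, seeds, damping=0.85, iterations=50):
    """TrustRank-style sketch: like PageRank, but the teleport
    probability goes only to trusted seed pages."""
    pages = set(links) | {p for ts in links.values() for p in ts}
    rank = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    for _ in range(iterations):
        # only seed pages receive the random-jump mass
        new = {p: ((1.0 - damping) / len(seeds) if p in seeds else 0.0)
               for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# A seed site linking into a good page, plus an isolated spam cluster
graph = {"seed": ["good"], "good": ["seed"],
         "spam": ["spam2"], "spam2": ["spam"]}
scores = trust_biased_rank(graph, seeds={"seed"})
```

The spam cluster, unreachable from any seed, ends up with zero trust no matter how many links it trades internally — which is precisely why this family of metrics is so useful against manipulation.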
Papers like Microsoft's VIPS (Vision Based Page Segmentation), Google's Document Ranking Based on Semantic Distance, and the recent Reasonable Surfer stuff all suggest that valuing links from content more highly than those in sidebars or footers can have net positive impacts on avoiding spam and manipulation. As webmasters and SEOs, we can certainly attest to the fact that a lot of paid links exist in these sections of sites and that getting non-natural links from inside content is much more difficult.
This one isn't covered in any papers or patents (to my knowledge), but our testing has shown (and testing from others supports) that anchor text carried through HTML text links is somehow more potent or valued than anchor text from the alt attributes of image links. That's not to say we should run out and ditch image links, badges or the alt attributes they carry. It's just good to be aware that Google seems to have this bias (perhaps it will be temporary).
We've likely all experienced the sinking feeling of seeing a competitor with fewer links, from what appear to be less powerful pages, outranking us. This may be partly explained by a domain's ability to pass along value via a link in ways that aren't fully reflected in page-level metrics. It can also help search engines combat spam and provide more trusted results in general: if links from sites that rarely link to junk pass significantly more value than links from sites whose linking practices and overall impact on the web are questionable, the engines can much better control quality.
NOTE: Having trouble digging up the papers/patents on this one; I'll try to revisit and find them tomorrow.
Over the years, this phenomenon has been reported and contradicted numerous times. Our testing certainly suggested that noscript links don't pass value, but that may not be true in every case. This is why we included the ability to filter noscript links in Linkscape, though the overall quantity of links inside this tag on the web is quite small.
Even apart from Google's QDF (Query Deserves Freshness) algorithm, which may value more recently created and linked-to content in certain "trending" searches, it appears that the engine also uses temporal signals around linking both to evaluate spam/manipulation and to reward pages that earn a large number of references in a short period of time. Google's patent on Information Retrieval Based on Historical Data first suggested the use of temporal data, but the model has likely seen revision and refinement since that time.
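To make the temporal idea concrete, here's one purely hypothetical way an engine might flag an unusual link burst: compare the most recent period's link acquisition against the page's historical baseline. The function, thresholds and data are all invented for illustration — nothing in the patents specifies this exact math:

```python
def link_velocity(weekly_new_links):
    """Toy sketch: ratio of the latest week's new links to the
    page's historical weekly average (hypothetical signal)."""
    history = weekly_new_links[:-1]
    baseline = sum(history) / max(len(history), 1)
    latest = weekly_new_links[-1]
    return latest / max(baseline, 1.0)

# A page that normally earns a handful of links suddenly gets 120 in a week
spike = link_velocity([3, 4, 2, 5, 120])
```

A spike like this could be a viral story deserving a freshness boost — or a paid-link blast deserving scrutiny — which is presumably why temporal data cuts both ways in the patents.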
I was fascinated to see Richard Baxter's own experiments on this in his post - Google Page Level Penalty for Comment Spam. Since then, I've been keeping an eye on some popular, valuable blog posts that have received similarly overwhelming spam and, lo and behold, the pattern seems verifiable. Webmasters would be wise to keep up to date on their spam removal to avoid incurring potential ranking penalties from Google (and the possible loss of link value).
But what about classic "PageRank" - the score of which we get a tiny inkling from the Google toolbar's green pixels? I'd actually surmise that while many (possibly all) of the link features discussed above make their way into the ranking process, PR has stayed relatively unchanged from its classic concept. My reasoning? SEOmoz's own mozRank, which correlates remarkably well with toolbar PR (off on avg. by 0.42 w/ 0.25 being "perfect" due to the 2 extra significant digits we display) and is calculated with very similar intuition to that of the original PageRank paper. If I had to guess (and I really am guessing), I'd say that Google's maintained classic PR because they find the simple heuristic useful for some tasks (likely including crawling/indexation priority), and have adopted many more metrics to fit into the algorithmic pie.
As always, we're looking forward to your feedback and hope that some of you will take up the challenge to test these on your own sites or inside test environments and report back with your findings.
p.s. I finished this post at nearly 3am (and have a board meeting tomorrow), so please excuse the odd typo or missed link. Hopefully Jen will take a red pen to this in the morning!