Creating Information Search Engines can Trust
As search engines index the web's link structure and page contents, they
find two distinct kinds of information about a given site or page:
attributes of the page/site itself and descriptions of that
site/page from other pages. Since the web is such a commercial place,
with so many parties interested in ranking well for particular
searches, the engines have learned that they cannot always rely on
websites to be honest about their importance. Thus, the days when
artificially stuffed meta tags and keyword-rich pages dominated
search results (pre-1998) have vanished and given way to search
engines that measure trust via links and content.
The theory goes that if hundreds or thousands of other websites link to
you, your site must be popular and thus have value. If those links
come from very popular and important (and thus, trustworthy)
websites, their power is multiplied to even greater degrees. Links
from sites like NYTimes.com, Yale.edu, Whitehouse.gov and others
carry with them inherent trust that search engines then use to boost
your ranking position. If, on the other hand, the links that point to
you are from low-quality, interlinked sites or automated garbage
domains (aka link farms), search engines have systems in place to
discount the value of those links.
The most well-known system for ranking sites based on link data is the
relatively simple formula developed by Google's founders - PageRank.
PageRank, which is calculated iteratively from the web's link graph,
is described by Google in their technology section:
PageRank
relies on the uniquely democratic nature of the web by using its vast
link structure as an indicator of an individual page's value. In
essence, Google interprets a link from page A to page B as a vote, by
page A, for page B. But, Google looks at more than the sheer volume
of votes, or links a page receives; it also analyzes the page that
casts the vote. Votes cast by pages that are themselves "important"
weigh more heavily and help to make other pages "important."
PageRank
is derived (roughly speaking), by amalgamating all the links that
point to a particular page, adding the value of the PageRank that
they pass (based on their own PageRank) and applying calculations in
the formula (see Ian
Rogers' explanation
for more details).
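To make the "votes weighted by the voter's own importance" idea concrete, here is a minimal sketch of the iterative calculation in Python. It is an illustration only, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the four-page toy web are all assumptions chosen for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    Assumption: a simple fixed-iteration loop with a damping factor of 0.85;
    real search engines use far more elaborate systems.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D; nothing links to D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C collects the most votes and ends up with the highest score, while page D, which nothing links to, keeps only the baseline share. Note also that A outranks B even though each has a single inbound link, because A's link comes from the highly ranked C.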
Google's toolbar (available here) includes an icon that shows a
PageRank value from 0 to 10.
PageRank,
in essence, measures the brute link force of a site based on every
other link that points to it without significant regard for quality,
relevance or trust. Hence, in the modern era of SEO, the PageRank
measurement in Google's toolbar, directory or through sites that
query the service is of limited value. Pages with PR8 can be found
ranked 20-30 positions below pages with a PR3 or PR4. In addition,
the toolbar numbers are updated only every 3-6 months by Google,
making the values even less useful. Rather than focusing on PageRank,
it's important to think holistically about a link's worth.
Here's
a small list of the most important factors search engines look at
when attempting to value a link:
- The Anchor Text of Link -
Anchor text describes the visible characters and words that
hyperlink to another document or location on the web. For example, in
the phrase, "CNN
is a good source of news, but I actually prefer the
BBC's take on events,"
two unique pieces of anchor text exist - "CNN" is the
anchor text pointing to http://www.cnn.com,
while "the BBC's take on events" points to
http://news.bbc.co.uk.
Search engines use this text to help them determine the subject
matter of the linked-to document. In the example above, the links
would tell the search engine that when users search for "CNN",
SEOmoz.org thinks that http://www.cnn.com
is a relevant site for the term "CNN" and that
http://news.bbc.co.uk
is relevant to "the BBC's take on events". If hundreds or
thousands of sites think that a particular page is relevant for a
given set of terms, that page can manage to rank well even if the
terms NEVER appear in the text itself (for example, see the BBC's
explanation of why Google ranks certain pages for the term
"Miserable
Failure").
- Global Popularity of the Site
- More popular sites, as denoted by the number and power of the
links pointing to them, provide more powerful links. Thus, while a
link from SEOmoz may be a valuable vote for a site, a link from
bbc.co.uk or cnn.com carries far more weight. This is one area where
PageRank (assuming it were accurate) could be a good measure, as
it's designed to calculate global popularity.
- Popularity of Site in Relevant Communities
- In the example above, the weight or power of a site's vote is
based on its raw popularity across the web. As search engines became
more sophisticated and granular in their approach to link data, they
acknowledged the existence of "topical communities"; sites
on the same subject that often interlink with one another,
referencing documents and providing unique data on a particular
topic. Sites in these communities provide more value when they link
to a site/page on a relevant subject rather than a site that is
largely irrelevant to their topic.
- Subject Matter of the Linking Page
- The topical relationship between the subject of a given page and
the sites/pages linked to on it may also factor into the value a
search engine assigns to that link. Thus, it will be more valuable
to have links from pages that are related to the site's or page's subject
matter than those that have little to do with the topic.
These
are only a few of the many factors search engines measure and weight
when evaluating links. For a more complete list, see SEOmoz's
search engine ranking factors article.
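The anchor text factor described in the list above can be illustrated with a short Python sketch that pulls each link's destination and visible text out of a snippet of HTML, using only the standard library's `html.parser`. The class name `AnchorTextCollector` and the sample markup are assumptions made for this example; how a real search engine collects and weights anchor text is not public.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Hypothetical helper: records the visible text of every <a href="..."> link."""

    def __init__(self):
        super().__init__()
        self.anchors = defaultdict(list)  # destination URL -> list of anchor texts
        self._href = None                 # URL of the link currently being read
        self._text = []                   # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears between <a ...> and </a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so wrapped text compares cleanly.
            text = " ".join("".join(self._text).split())
            if text:
                self.anchors[self._href].append(text)
            self._href = None


# Sample markup modelled on the CNN/BBC sentence used in the list above.
sample_html = (
    '<p><a href="http://www.cnn.com">CNN</a> is a good source of news, '
    'but I actually prefer <a href="http://news.bbc.co.uk">the BBC\'s take '
    'on events</a>.</p>'
)

collector = AnchorTextCollector()
collector.feed(sample_html)
collector.close()

for url, texts in collector.anchors.items():
    print(url, "<-", texts)
```

Run over many pages, a collector like this would build up, for each destination URL, the set of phrases other sites use to describe it, which is the raw material behind anchor-text relevance.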
Link metrics are in place so that search engines can find information to
trust. In the academic world, greater citation meant greater
importance, but in a commercial environment, manipulation and
conflicting interests interfere with the purity of citation-based
measurements. Thus, on the modern WWW, the source, style and context
of those citations are vital to ensuring high quality results.