Modern commercial search engines rely on the science of information
retrieval (IR). That science has existed since the middle of the 20th
century, when retrieval systems ran on computers in libraries,
research facilities and government labs. Early in the development of
search systems, IR scientists realized that two critical components
made up the majority of search functionality:
Relevance - the degree to which the content of the documents returned in a
search matches the user's query intention and terms. The relevance of
a document increases if the terms or phrase queried by the user
occur multiple times and show up in the title of the work or in
important headlines or subheaders.
Popularity - the relative importance of a given document that matches
the user's query, measured via citation (the act of one work
referencing another, as often occurs in academic and business
documents). The popularity of a given document increases with every
other document that references it.
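The citation-based notion of popularity described above can be sketched in a few lines of Python. This is only an illustration: the document names and the `citation_counts` helper are hypothetical, and real systems weigh citations far less naively than a simple count.

```python
from collections import Counter

# Hypothetical citation data: each document maps to the documents it references.
citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_a"],
}

def citation_counts(citations):
    """Count how many distinct other documents reference each document."""
    counts = Counter({doc: 0 for doc in citations})
    for source, targets in citations.items():
        for target in set(targets):      # count each citing document once
            if target != source:         # ignore self-citations
                counts[target] += 1
    return counts

popularity = citation_counts(citations)
```

In this toy data set, `paper_c` is referenced by three other documents, so it ends up with the highest popularity count.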
These two items were translated to web search 40 years later and manifest
themselves in the form of document analysis and link analysis.
In document analysis, search engines look at whether the search terms
are found in important areas of the document - the title, the
metadata, the heading tags and the body text. They also
attempt to automatically measure the quality of the document (through
complex systems beyond the scope of this guide).
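As a rough sketch of the term-matching half of document analysis, the Python below counts query terms in each area of a document and weights the areas differently. The field names, the weights in `FIELD_WEIGHTS` and the `document_score` function are assumptions made for illustration; they are not values or methods any search engine has published.

```python
import re

# Hypothetical field weights (assumed for illustration only).
FIELD_WEIGHTS = {"title": 3.0, "meta": 1.5, "headings": 2.0, "body": 1.0}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def document_score(doc, query):
    """Sum weighted counts of query terms found in each document field."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(doc.get(field, ""))
        matches = sum(1 for token in tokens if token in terms)
        score += weight * matches
    return score

doc = {
    "title": "Guide to Search Engines",
    "meta": "How search engines rank pages",
    "headings": "Document analysis",
    "body": "Search engines look at where the query terms appear.",
}
score = document_score(doc, "search engines")
```

Because the title carries the largest weight here, a match in the title moves the score more than the same match in the body text.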
In link analysis, search engines measure not only who is linking to a
site or page, but what they are saying about that page/site. They
also have a good grasp on who is affiliated with whom (through
historical link data, the site's registration records and other
sources), who is worthy of being trusted (links from .edu and .gov
pages are generally more valuable for this reason) and contextual
data about the site the page is hosted on (who links to that site,
what they say about the site, etc.).
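A toy version of link analysis might look like the following Python. It assumes each inbound link is a record with a source URL and anchor text, gives links from .edu and .gov hosts a higher trust multiplier, and adds a bonus when the anchor text mentions the query terms. The multipliers, field names and functions are all hypothetical, and the sketch leaves out affiliation and site-level context entirely.

```python
from urllib.parse import urlparse

# Hypothetical trust multipliers (assumed values, not real engine settings).
TRUSTED_SUFFIXES = {".edu": 2.0, ".gov": 2.0}

def link_trust(source_url):
    """Return a trust multiplier based on the linking host's suffix."""
    host = urlparse(source_url).hostname or ""
    for suffix, weight in TRUSTED_SUFFIXES.items():
        if host.endswith(suffix):
            return weight
    return 1.0

def link_score(inbound_links, query):
    """Score inbound links by who links and what their anchor text says."""
    terms = set(query.lower().split())
    score = 0.0
    for link in inbound_links:
        anchor_terms = set(link["anchor"].lower().split())
        # A link counts more when its anchor text mentions the query terms.
        anchor_bonus = 1.0 + len(terms & anchor_terms)
        score += link_trust(link["source"]) * anchor_bonus
    return score

inbound = [
    {"source": "https://library.example.edu/resources", "anchor": "search engine guide"},
    {"source": "https://blog.example.com/post", "anchor": "click here"},
]
total = link_score(inbound, "search engine")
```

Here the .edu link with descriptive anchor text contributes most of the total, while the generic "click here" link adds only its base value.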
Link and document analysis combine and overlap hundreds of factors that
can be individually measured and filtered through the search engine
algorithms (the set of instructions that tell the engines what
importance to assign to each factor). The algorithm then determines
scoring for the documents and (ideally) lists results in decreasing
order of importance (rankings).
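The final step, assigning an importance to each factor and listing results in decreasing order of score, can be sketched as a weighted sum followed by a sort. The two factor names, their weights and the example URLs below are assumptions chosen for illustration; a real algorithm combines hundreds of factors in ways that are not public.

```python
# Hypothetical factor weights: how much importance each factor receives.
FACTOR_WEIGHTS = {"document": 0.6, "links": 0.4}

def total_score(item):
    """Combine a candidate's factor scores into one weighted total."""
    return sum(FACTOR_WEIGHTS[name] * item["factors"][name]
               for name in FACTOR_WEIGHTS)

def rank(candidates):
    """Return candidates sorted by total score, highest first."""
    return sorted(candidates, key=total_score, reverse=True)

candidates = [
    {"url": "https://example.com/a", "factors": {"document": 5.0, "links": 1.0}},
    {"url": "https://example.com/b", "factors": {"document": 2.0, "links": 9.0}},
    {"url": "https://example.com/c", "factors": {"document": 1.0, "links": 1.0}},
]
ranking = [item["url"] for item in rank(candidates)]
```

With these weights, page b outranks page a despite a weaker document score, because its link score is much higher. Changing the weights changes the order, which is the sense in which the algorithm decides what importance to assign to each factor.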