How Search Engines Operate
Search engines rely on a short list of critical operations to provide
relevant web results when searchers use their systems to find
information.
-
Crawling the Web
Search engines run automated programs, called "bots" or "spiders",
that use the hyperlink structure of the web to "crawl" the pages and
documents that make up the World Wide Web. By some estimates, search
engines have crawled between 8 and 10 billion of the approximately
20 billion existing pages.
-
Indexing Documents
Once a page has been crawled, its contents can be "indexed": stored
in a giant database of documents that makes up a search engine's
"index". This index needs to be tightly managed so that requests
which must search and sort billions of documents can be completed in
fractions of a second.
-
Processing Queries
When a request for information comes into the search engine (hundreds
of millions do each day), the engine retrieves from its index all the
documents that match the query. A document matches if the terms or
phrase are found on the page in the manner specified by the user. For
example, a search for car and driver magazine at Google returns 8.25
million results, but a search for the same phrase in quotes ("car and
driver magazine") returns only 166 thousand results. In the first
search, commonly called "Findall" mode, Google returned all documents
containing the terms "car", "driver" and "magazine" (it ignores the
term "and" because it is not useful for narrowing the results), while
in the second search, only those pages with the exact phrase "car and
driver magazine" were returned. Other advanced operators (Google has a
list of 11) can change which results a search engine will consider a
match for a given query.
-
Ranking Results
Once the search engine has determined which results match the query,
the engine's algorithm (a mathematical equation commonly used for
sorting) runs calculations on each result to determine how relevant
it is to the given query. The engine then lists the results on the
results pages in order from most relevant to least, so that users can
choose which to select.
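The four operations above can be sketched end to end on a toy scale. This is only an illustration, not how any real search engine is built: the in-memory PAGES "web", the example URLs, the STOP_WORDS list, and every function name are hypothetical, and counting query-term occurrences is a simple stand-in for a real relevance algorithm.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("car and driver magazine reviews every new car",
                  ["b.example", "c.example"]),
    "b.example": ("the driver parked the car near a magazine stand",
                  ["a.example"]),
    "c.example": ("magazine about gardening", []),
    "d.example": ("car and driver magazine archive", []),  # nothing links here
}

STOP_WORDS = {"and"}  # assumed list; the text only mentions "and"


def crawl(seed):
    """Crawling: follow hyperlinks breadth-first from one seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)


def build_index(crawled):
    """Indexing: inverted index of term -> {url: [word positions]}."""
    index = defaultdict(dict)
    for url, text in crawled:
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(url, []).append(pos)
    return index


def match_all(index, query):
    """Unquoted query: pages containing every non-stop-word term."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    if not terms:
        return set()
    return set.intersection(*(set(index.get(t, {})) for t in terms))


def match_phrase(index, phrase):
    """Quoted query: pages containing the words consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return set()
    hits = set()
    for url, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(url, [])
                   for i, t in enumerate(terms)):
                hits.add(url)
                break
    return hits


def rank(index, urls, query):
    """Ranking: order matches by total query-term count (toy score)."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]

    def score(url):
        return sum(len(index.get(t, {}).get(url, [])) for t in terms)

    return sorted(urls, key=lambda u: (-score(u), u))


crawled_urls = [url for url, _ in crawl("a.example")]
index = build_index(crawl("a.example"))
query = "car and driver magazine"
all_terms = match_all(index, query)    # unquoted search
phrase = match_phrase(index, query)    # quoted search
ranked = rank(index, all_terms, query)
```

In this toy data the unquoted query matches two pages while the quoted phrase matches only one, which mirrors the gap between the two Google result counts described above. The page d.example never appears in any result because no crawled page links to it, so the crawler never reaches it.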
Although this list of operations is short, systems like Google,
Yahoo!, AskJeeves and MSN are among the most complex,
processing-intensive computing systems in the world, performing
millions of calculations each second and delivering information to an
enormous group of users.