The term "search engine" is often used generically to describe both
crawler-based search engines and human-powered directories. These two types of
search engines gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings
automatically. They "crawl" or "spider" the web, then people search through what they have
found.
If you change your web pages, crawler-based search engines eventually find these changes, and that
can affect how you are listed. Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its
listings. You submit a short description to the directory for your entire site, or editors
write one for sites they review. A search looks for matches only in the descriptions
submitted.
Changing your web pages has no effect on your listing. Things that are useful for
improving a listing with a search engine have nothing to do with improving a listing in a
directory. The only exception is that a good site, with good content, might be more likely
to get reviewed for free than a poor site.
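To make the contrast concrete, here is a minimal sketch of how a directory search might behave, assuming listings are stored as one short description per site. The site names, descriptions, and the `search_directory` function are hypothetical illustrations, not how any real directory is implemented.

```python
# Hypothetical directory listings: one short, human-written description per site.
directory = {
    "example-gardening.com": "Guides to growing vegetables and herbs at home",
    "example-cameras.com": "Reviews of digital cameras and lenses",
}

# The live page content, which this sketch of a directory never reads.
pages = {
    "example-gardening.com": "We now also sell tomato seeds and compost.",
    "example-cameras.com": "Latest tripod deals and photo tips.",
}


def search_directory(query):
    """Return the sites whose description contains every word of the query."""
    words = query.lower().split()
    return sorted(
        site
        for site, description in directory.items()
        if all(word in description.lower().split() for word in words)
    )
```

In this sketch, a search for "vegetables" finds the gardening site, while a search for "tomato" finds nothing, because that word appears only on the page itself and not in the submitted description.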
"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine presented either crawler-based results
or human-powered listings. Today, it is extremely common for both types of
results to be presented. Usually, a hybrid search engine will favor one type of
listing over the other. For example, MSN Search is more likely to present
human-powered listings from LookSmart. However, it also presents crawler-based
results (as provided by Inktomi), especially for more obscure queries.
The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider, also called the crawler.
The spider visits a web page, reads it, and then follows links to other pages within the
site. This is what it means when someone refers to a site being "spidered" or
"crawled." The spider returns to the site on a regular basis, such as every
month or two, to look for changes.
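The visit, read, and follow-links loop described above can be sketched as follows. This is an illustration under stated assumptions, not any search engine's actual spider: the `FAKE_WEB` dictionary stands in for real HTTP fetching so the example runs without network access, and the names `LinkParser` and `spider` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical site held in memory in place of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/about.html">About</a> <a href="http://other.test/">Other</a>',
    "http://example.test/about.html": '<a href="/">Home</a> <a href="contact.html">Contact</a>',
    "http://example.test/contact.html": "No links here.",
}


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(start_url, fetch=FAKE_WEB.get):
    """Visit pages breadth-first, following only links within the start site."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        found[url] = html  # "read" the page
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return found
```

Calling `spider("http://example.test/")` returns the three pages on the example site and ignores the link to `other.test`. A real spider would also schedule return visits and respect each site's crawling rules, which this sketch leaves out.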
Everything the spider finds goes into the second part of the search engine, the index.
The index, sometimes called the catalog, is like a giant book containing a copy of every
web page that the spider finds. If a web page changes, then this book is updated
with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be
added to the index. Thus, a web page may have been "spidered" but not yet
"indexed." Until it is indexed -- added to the index -- it is not available to
those searching with the search engine.
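The gap between "spidered" and "indexed" can be illustrated with a small sketch. It assumes, purely for illustration, that spidered pages wait in a pending area until a separate indexing step copies them into the searchable index; the `SearchIndex` class and its method names are hypothetical.

```python
class SearchIndex:
    """Toy index that separates spidered pages from searchable ones."""

    def __init__(self):
        self.pending = {}  # spidered, but not yet indexed
        self.indexed = {}  # the searchable copy of each page

    def record_spidered(self, url, text):
        """Store what the spider found; it is not searchable yet."""
        self.pending[url] = text

    def run_indexing(self):
        """Copy pending pages into the index, replacing older copies."""
        self.indexed.update(self.pending)
        self.pending.clear()

    def search(self, word):
        """Return indexed URLs whose text contains the given word."""
        word = word.lower()
        return sorted(
            url
            for url, text in self.indexed.items()
            if word in text.lower().split()
        )
```

In this sketch, a page passed to `record_spidered` does not appear in `search` results until `run_indexing` has been called, and a changed page replaces its older copy at the next indexing run.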
Search engine software is the third part of a search engine. This is the program that
sifts through the millions of pages recorded in the index to find matches to a search and
rank them in order of what it believes is most relevant. You can learn more about how
search engine software ranks web pages on the aptly named How Search Engines Rank
Web Pages page.
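As a rough illustration of matching and ranking, the sketch below scores each indexed page by counting query words, with words in the title counting more than words in the body. The scoring rule, the `title_weight` parameter, and the sample pages are assumptions made for this example; real search engines use their own, far more elaborate methods.

```python
# Hypothetical index contents: a title and body copy for each page.
INDEX = {
    "a.test": {"title": "Tomato growing guide",
               "body": "How to grow a tomato plant from seed"},
    "b.test": {"title": "Garden tools",
               "body": "A trowel helps when planting tomato seedlings"},
    "c.test": {"title": "Camera reviews",
               "body": "Lenses and tripods compared"},
}


def rank(index, query, title_weight=3):
    """Return matching URLs, highest score first (assumed toy scoring)."""
    words = query.lower().split()
    scored = []
    for url, page in index.items():
        title = page["title"].lower().split()
        body = page["body"].lower().split()
        # Each title occurrence counts title_weight times a body occurrence.
        score = sum(title_weight * title.count(w) + body.count(w) for w in words)
        if score > 0:  # pages with no matching words are left out
            scored.append((score, url))
    # Sort by descending score, then by URL so ties have a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]
```

With these sample pages, `rank(INDEX, "tomato")` places `a.test` ahead of `b.test` because the word appears in its title as well as its body, and it omits `c.test` entirely. Changing `title_weight` shows in miniature how differently tuned engines can order the same pages differently.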
All crawler-based search engines have the basic parts described above, but there are differences in
how these parts are tuned. That is why the same search on different search engines often
produces different results. Some of the significant differences between the major
crawler-based search engines are summarized on the Search Engine Features Page.
Information on this page has been drawn from the help pages of each search engine,
along with knowledge gained from
articles, reviews, books, independent research, tips from others and additional
information received directly from the various search engines.