Certain types of navigation may hinder or entirely prevent search engines
from reaching your website's content. As search engine spiders crawl
the web, they rely on the architecture of hyperlinks to find new
documents and revisit those that may have changed. Think of these
obstacles as speed bumps and walls: complex links and deep site structures
with little unique content may act as "bumps" that slow spiders down,
while content that cannot be reached through spiderable links is a "wall"
they cannot get past.
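To make "relying on the architecture of hyperlinks" concrete, here is a minimal sketch of how a simple spider might discover links on a page. It assumes the spider follows only plain `<a href="...">` links; the class name `LinkExtractor`, the function `extract_links`, and the example.com URLs are hypothetical. Real search engine crawlers are far more sophisticated, but the sample page shows why a form or a drop-down menu yields no discoverable link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags, the way a simple spider finds pages."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchor links are followed. Form actions and <select> option values
        # are ignored, which is why such navigation can act as a "wall".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one plain link, one search form, one drop-down menu.
page = """
<a href="/products.html">Products</a>
<form action="/search"><input name="q"><input type="submit"></form>
<select><option value="/hidden.html">Hidden page</option></select>
"""
print(extract_links("http://www.example.com/", page))
# ['http://www.example.com/products.html']
```

Only the anchor link is discovered; the search form and the drop-down option lead nowhere as far as this spider is concerned.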
Possible "Speed Bumps" for SE Spiders:
- URLs with 2+ dynamic parameters, e.g.
http://www.url.com/page.php?id=4&CK=34rr&User=%Tom% (spiders
may be reluctant to crawl complex URLs like this because they often
result in errors with non-human visitors)
- Pages with more than 100 unique links to other pages on the site (spiders
may not follow each one)
- Pages buried more than 3 clicks/links from the home page of a website
(unless many external links point to the site, spiders will often
ignore deep pages)
- Pages requiring a "Session ID" or cookie to enable navigation
(spiders may not be able to retain these elements as a browser user
can)
- Pages that are split into "frames" (these can hinder crawling and cause
confusion about which pages to rank in the results)
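The speed bumps above can be turned into a rough checklist. The sketch below is hypothetical: the function names and constants are illustrative, and the thresholds (2+ URL parameters, more than 100 links, more than 3 clicks) are simply the rules of thumb from the list, not limits published by any search engine. It assumes you already know each page's link count and click depth, for example from a crawl of your own site.

```python
from urllib.parse import parse_qsl, urlsplit

# Thresholds taken from the rules of thumb above; treat them as heuristics.
MAX_DYNAMIC_PARAMS = 1   # 2+ parameters counts as a speed bump
MAX_LINKS_PER_PAGE = 100
MAX_CLICK_DEPTH = 3


def count_dynamic_params(url: str) -> int:
    """Number of query-string parameters, e.g. ?id=4&CK=34rr has 2."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def speed_bumps(url: str, link_count: int, click_depth: int,
                needs_session: bool = False, uses_frames: bool = False) -> list[str]:
    """Return warnings for one page; an empty list means no bumps were found."""
    warnings = []
    params = count_dynamic_params(url)
    if params > MAX_DYNAMIC_PARAMS:
        warnings.append(f"{params} dynamic parameters in URL")
    if link_count > MAX_LINKS_PER_PAGE:
        warnings.append(f"{link_count} links on page")
    if click_depth > MAX_CLICK_DEPTH:
        warnings.append(f"{click_depth} clicks from home page")
    if needs_session:
        warnings.append("requires session ID or cookie")
    if uses_frames:
        warnings.append("uses frames")
    return warnings


print(speed_bumps("http://www.url.com/page.php?id=4&CK=34rr&User=%Tom%",
                  link_count=150, click_depth=5))
# ['3 dynamic parameters in URL', '150 links on page', '5 clicks from home page']
```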
Possible "Walls" for SE Spiders:
- Pages accessible only via a select form and submit button
- Pages requiring a drop-down menu (an HTML select element) to access them
- Documents accessible only via a search box
- Documents blocked purposefully (via a robots meta tag or a robots.txt
file)
- Pages requiring a login
- Pages that redirect before showing content (search engines call this
cloaking or bait-and-switch and may actually ban sites that use this
tactic)
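Purposeful blocks are the one kind of wall you can check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test a URL against a robots.txt file and a small HTML parser to look for a `noindex` robots meta tag. The robots.txt contents, the user agent `ExampleBot`, the helper `is_blocked`, and the example.com URLs are all hypothetical; a real spider would fetch the site's own /robots.txt, and each search engine applies its own rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real spider would fetch them from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_blocked(url: str, html: str, user_agent: str = "ExampleBot") -> bool:
    """True if robots.txt disallows crawling the URL or the page asks not to be indexed."""
    if not robots.can_fetch(user_agent, url):
        return True
    meta = RobotsMetaParser()
    meta.feed(html)
    meta.close()
    return "noindex" in meta.directives


print(is_blocked("http://www.example.com/private/report.html", ""))  # True
print(is_blocked("http://www.example.com/thanks.html",
                 '<meta name="robots" content="noindex, follow">'))  # True
print(is_blocked("http://www.example.com/products.html", "<p>Widgets</p>"))  # False
```

The other walls in the list (forms, drop-downs, search boxes, logins) cannot be detected this way; they simply leave no link for a spider to follow.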
The key to ensuring that a site's contents are fully crawlable is to
provide direct HTML links to each page you want the search engine
spiders to index. Remember that if a page cannot be reached from the
home page (where most spiders are likely to start their crawl), it will
likely not be indexed by the search engines. A sitemap (discussed later
in this guide) can be of tremendous help for this purpose.
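One way to audit this is a breadth-first search over your site's internal links, starting from the home page. The minimal sketch below assumes you already have the link graph as a dictionary mapping each page to the pages it links to; the `site` data and the `click_depths` helper are hypothetical. It reports pages no spider could reach from the home page and pages deeper than 3 clicks; both are candidates for direct HTML links or a sitemap entry.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from the home page to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
    "/about": [],
    "/orphan": ["/"],  # links out, but nothing links to it (e.g. reachable only via a search box)
}

depths = click_depths(site, "/")
unreachable = sorted(set(site) - set(depths))
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
print(unreachable)  # ['/orphan']
print(too_deep)     # ['/products/widgets/blue/specs']
```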