Search Engine Robots - How They Work, What They Do (Part II)

April 20, 2008 – 7:29 pm

If your site isn’t found in the search engines, it is probably because the robots couldn’t deal with it. The cause could be as simple as the robots not being able to find the site, or it may be a more complicated problem, such as the robots not being able to crawl the site or figure out what your pages are about.
Submitting your site to the major search engines will help with the “can’t find it” problem. Even having links pointing back to your site can be enough to attract the robots. Some search engines, for example, suggest that you may not have to submit your pages; their robots will find your site if at least one other site on the web links to it.
If the robots can find your site but can’t make sense of it, then you may need to look at the content and technology used on your pages. Frames, Flash, dynamically generated pages, and invalid HTML can cause problems when the robot tries to access your pages. While some search engines are beginning to be able to index dynamically generated pages and Flash, use of these technologies can hinder your ability to be indexed by the robots.
Text in images cannot be read by the robots. Using ALT text is an important way to help the robots “read” your images. Websites with extensive graphics rely heavily on ALT text to present their content.
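One way to act on the ALT advice is to scan a page for images that give the robots nothing to read. The following is a minimal sketch using Python's standard html.parser module; the class name, function name, and sample markup are illustrative assumptions, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no usable ALT text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A bare `alt` attribute arrives as None, so treat it as empty.
        alt = attributes.get("alt") or ""
        if not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images without ALT text (hypothetical helper)."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Made-up sample markup: one image with ALT text, one without.
sample = '<p><img src="logo.gif" alt="Widget Co. logo"><img src="banner.gif"></p>'
print(images_missing_alt(sample))
```

Any image this reports is one whose content is likely invisible to a text-only robot.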
How Do I Get The Most Out Of Indexing?
If you know what to “feed” the spidering robots, you will help yourself with ranking.
Having a website full of good content is the major factor. Search engines exist to serve their visitors, not to rank your website. You need to present your site in the way that will be most useful to the visitor. Each search engine has its own idea of what is important in a page, but they all value text highly. Making sure that the text on your pages includes your most important keywords will help the search engines evaluate the content of those pages.
Making sure that you have good title and meta tags will further assist the search engines in understanding what your page is about. If the text on the page is about widgets, the title is about widgets, and the meta tags are about widgets, the search engines will have a pretty good idea that you are all about widgets. When their visitors search for widgets, the search engines know to list your site in the results.
A sitemap page is a very good way of giving the robot every opportunity to reach your website pages. Since robots follow the links on your pages, make sure that at least your most important pages are included in the sitemap; you may even want to include all your pages there, depending on the size of your site. Be sure to add a link to the sitemap page from each page on your site.
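The "all about widgets" check can be roughed out in code. The sketch below, again using Python's standard html.parser, reports whether a keyword appears in the title, the meta description, the meta keywords, and the body text. It is only an illustration: the names are hypothetical, the sample page is invented, and real search engines weigh these signals in ways this simple substring test does not model.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Gather the title, named meta tags, and body text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.body_text = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.body_text.append(data)


def keyword_coverage(html, keyword):
    """Report where the keyword appears (case-insensitive substring match)."""
    page = PageSummary()
    page.feed(html)
    keyword = keyword.lower()
    return {
        "title": keyword in page.title.lower(),
        "meta description": keyword in page.meta.get("description", "").lower(),
        "meta keywords": keyword in page.meta.get("keywords", "").lower(),
        "body text": keyword in " ".join(page.body_text).lower(),
    }


# Invented sample page with no meta keywords tag.
sample_page = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made widgets in every colour.">
</head><body><h1>Our widgets</h1><p>We sell widgets.</p></body></html>"""
print(keyword_coverage(sample_page, "widgets"))
```

A False entry in the result points at a place where the page's title, meta tags, and text are not yet telling the same story.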
Another important consideration is that of keeping all of your pages within a small number of “clicks” from your top page. Many robots will not follow links more than two or three levels deep, so if your “widgets” page can only be reached from your home page by following multiple links (e.g. home page >> about us page >> products page >> widgets page), the robot may not crawl deep enough to get to the widgets page.
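If you keep a simple map of which pages link to which, the click distance from the home page can be computed with a breadth-first search. The sketch below assumes a hypothetical site structure matching the example above and treats "more than two clicks" as too deep; both the structure and the threshold are assumptions for illustration, since robots differ in how deep they crawl.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link map: each page lists the pages it links to.
site = {
    "home": ["about us"],
    "about us": ["products"],
    "products": ["widgets"],
}

MAX_CLICKS = 2  # assumed crawl depth; adjust to taste
depths = click_depths(site, "home")
too_deep = [page for page, depth in depths.items() if depth > MAX_CLICKS]
print(too_deep)
```

Adding a direct link from the home page (or from a sitemap page linked on the home page) to the widgets page would bring it back within reach.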
Testing Your Website For Robot Accessibility
To get an idea just what the robot “sees” on your page, you can look at the Sim Spider tool. You may be surprised at how different your site looks to the robot. You can find this tool at http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
You will see text and ALT text show up in the results. If your entire website is built in Flash, you will see nothing at all because robots don’t understand Flash movies.
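For a rough local approximation of that robot's-eye view, the sketch below keeps only visible text and image ALT text and discards everything else. It is not the Sim Spider tool and makes no claim to match any real robot; the class and function names are hypothetical, and it uses only Python's standard html.parser.

```python
from html.parser import HTMLParser


class RobotView(HTMLParser):
    """Keep visible text and image ALT text; skip script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt") or ""
            if alt.strip():
                self.parts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def robot_text(html):
    """Return the text a simple text-only robot might extract (an approximation)."""
    view = RobotView()
    view.feed(html)
    view.close()
    return " ".join(view.parts)


# Invented samples: a mixed page and a Flash-only page.
mixed_page = (
    '<h1>Widgets</h1><img src="w.gif" alt="Blue widget">'
    '<script>var x = 1;</script>'
)
flash_only = '<object data="site.swf"></object>'
print(repr(robot_text(mixed_page)))
print(repr(robot_text(flash_only)))
```

The Flash-only sample yields an empty string, which is the "nothing at all" result described above.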
The Bottom Line
When it comes to robots, think simply. Lots of good content and text, hyperlinks the robots can follow, optimization of your pages, topical links pointing back to your site, and a sitemap will help ensure the best results when the robots come visiting.
Resources
*SpiderSpotting - Search Engine Watch
http://searchenginewatch.com/webmasters/spiders.html
*Robotstxt.org
List of robots and protocols for setting up a robots.txt file.
http://www.robotstxt.org/
*Spider-Food
Tutorials, forums and articles about spiders and Search Engine Marketing.
http://spider-food.net/
*Spiderhunter.com
Articles and resources about tracking spiders.
http://www.spiderhunter.com/
*Sim Spider Robot Simulator
Search Engine World has a spider that simulates what the robots read from your website.
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Daria Goetsch is the founder and Marketing Consultant for Search Innovation Marketing, a Search Engine Optimization company serving small businesses. She has specialized in Promotion since 1998, including three years as the Specialist for O’Reilly Media, Inc., a technical book publishing company.
Copyright © 2002-2005 Search Innovation Marketing.
http://www.searchinnovation.com All Rights Reserved.
Permission to reprint this article is granted if the article is reproduced in its entirety, without editing, including the bio information. Please include a hyperlink to http://www.searchinnovation.com when using this article in newsletters or online.
