
Search Engine Spiders Lost Without Guidance - Post This Sign!








The robots.txt file implements the robots exclusion standard, a convention that tells web crawlers/robots which files and directories you want them to stay OUT of on your site. Compliance is voluntary: not all crawlers/bots follow the exclusion standard, and some will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit http://www.robotstxt.org/

To see the proper format for a somewhat standard robots.txt file look directly below. That file should be at the root of the domain because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.

A quick review of the information in the robots.txt file above follows. "User-agent: msnbot" is the crawler from MSN, Slurp is from Yahoo and Teoma is from AskJeeves. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The * asterisk is a wildcard meaning that "All" crawlers/spiders/bots should stay out of the files or directories listed in that group.

Bots given the instruction "Disallow: /" should stay out entirely, and those given "Crawl-delay: 10" are the ones that crawled our site too quickly, causing it to bog down and overuse server resources. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the above robots.txt file. The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * applies to all crawlers, bots and spiders that are not named in a group of their own, including Googlebot. Be aware that a crawler named in its own group generally follows only that group, so if msnbot, Teoma or Slurp should also stay out of the directories listed under *, repeat those Disallow lines under their own User-agent lines.
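One way to check how a robots.txt file will be read is to feed it to a parser. The sketch below uses the robots.txt parser from Python's standard library on the file shown above. The page paths (/index.html, /images/logo.gif) are made up for illustration, and real crawlers may interpret the file somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown earlier in this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for agent in ("Googlebot", "msnbot", "psbot"):
        for path in ("/index.html", "/images/logo.gif"):
            allowed = rules.can_fetch(agent, path)
            print(f"{agent:10} {path:18} allowed={allowed}")
        print(f"{agent:10} crawl delay: {rules.crawl_delay(agent)}")
```

With this parser, Googlebot falls under the * group and is kept out of /images/, psbot is kept out of everything, and msnbot gets its 10 second delay but is not bound by the Disallow lines in the * group.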

The bots we gave the "Crawl-delay: 10" instruction were requesting as many as 7 pages every second, so we asked them to slow down. The number is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show the pages requested with precise times, to within a hundredth of a second. The logs are available from your web host, or ask your web or IT person. If you have server access, the logs can often be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
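If you would rather not count log lines by eye, a short script can do it. The sketch below is only an illustration: it assumes the common Apache/NCSA "combined" log layout, which timestamps requests to the nearest second, and the sample lines, IP addresses and user-agent strings are invented. Your host's log format may differ, in which case the pattern needs adjusting.

```python
import re
from collections import Counter

# Assumed layout: Apache/NCSA "combined" log format, for example
# 192.0.2.1 - - [17/Aug/2005:10:15:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "msnbot/1.0"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def peak_requests_per_second(lines, agent_substring):
    """Return the most requests seen in any one second from a matching bot."""
    per_second = Counter()
    wanted = agent_substring.lower()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and wanted in match.group("agent").lower():
            # Each distinct timestamp string is one second in this format.
            per_second[match.group("time")] += 1
    return max(per_second.values(), default=0)


# Invented sample data for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [17/Aug/2005:10:15:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '192.0.2.1 - - [17/Aug/2005:10:15:01 -0700] "GET /b.html HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '192.0.2.1 - - [17/Aug/2005:10:15:01 -0700] "GET /c.html HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '192.0.2.1 - - [17/Aug/2005:10:15:02 -0700] "GET /d.html HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '192.0.2.7 - - [17/Aug/2005:10:15:02 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for bot in ("msnbot", "Googlebot"):
        peak = peak_requests_per_second(SAMPLE_LOG, bot)
        print(f"{bot}: peak of {peak} request(s) in one second")
```

A bot whose peak is consistently several requests per second is a candidate for a Crawl-delay line, or for a "Disallow: /" if it brings you no benefit.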

To see the contents of any robots.txt file, just type /robots.txt after any domain name. If the site has that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com

http://www.Amazon.com/robots.txt

You can see the contents of any website robots.txt file that way.
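The same rule, that the file always sits at the root of the domain, can be written as a tiny helper. This is a minimal sketch using Python's standard library; the function name robots_url and the example page address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace everything else with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page address on the domain mentioned above.
    print(robots_url("http://www.Amazon.com/some/page?x=1"))
```

Whatever page you start from, the result points at the root of that domain, which is the only place crawlers look for the file.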

The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, just launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.

One thing we didn't expect to glean from our research into indexing delays was how much robots.txt files matter to quick and efficient crawling by the spiders from the major search engines. We were also surprised by the number of heavy crawls from bots that do no earthly good to the site owner, yet crawl most sites extensively, straining servers to the breaking point with requests coming as fast as 7 pages per second.

We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.

Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directory or files you have instructed them to stay out of and must now adjust their crawling schedule to eliminate those files from their list.)

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory, as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of the site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines. Keep in mind that robots.txt is itself publicly readable and is only a request, so truly private areas still need password protection.

The importance of robots.txt is rarely discussed by average webmasters. Even webmasters at some of my client businesses have asked me what it is and how to implement it when I tell them how important it is, both to efficient crawling by the search engines and to keeping private areas out of their indexes. This should be standard knowledge for webmasters at substantial companies, and it illustrates how little attention is paid to the use of robots.txt.

The search engine spiders really do want your guidance and this tiny text file is the best way to provide crawlers and bots a clear signpost to warn off trespassers and protect private property - and to warmly welcome invited guests, such as the big three search engines while asking them nicely to stay out of private areas.

Copyright © August 17, 2005 by Mike Banks Valentine

Google Sandbox Case Study: http://publish101.com/Sandbox2
Mike Banks Valentine operates http://Publish101.com, Free Web Content Distribution for Article Marketers, and provides content aggregation, press release optimization and custom web content for Search Engine Positioning: http://www.seoptimism.com/SEO_Contact.htm


© 2004 - 2008 "Web Hosting Geeks"