The Business Blog at Intuitive.com

Dave Taylor
Dave Taylor has been involved with the online world since 1980 and is recognized globally as an expert on both technical and business issues. He has been published over a thousand times, launched four Internet-related startup companies, has written twenty business and technical books and holds both an MBA and MS Ed. He's a columnist for the Boulder Daily Camera and Linux Journal and frequently appears in other publications both online and in print. Additionally, Dave maintains four weblogs: The Business Blog at Intuitive.com, Ask Dave Taylor, Dave On Film, and GoFatherhood. Based in beautiful Boulder, Colorado, Dave is an award-winning speaker, sought-after conference and workshop participant and frequent guest on radio and podcast programs, as well as an active member of his community and busy single father to three children.

Who is that knocking at my Weblog's door?

As part of my research for a new book I'm writing, I was digging around in my Ask Dave Taylor Web site just to see how the most recent Web browsers identify themselves. Much to my surprise, there are literally hundreds of different crawlers hitting the site now, over and above the usual 20-30 popular Web browsers. Crawlers that I've never heard of from sites -- when they're identified at all -- that are equally unfamiliar to me.

Here's a pile of different robots and crawlers I found in my log file, all visiting within a single 24-hour period:

  • Amfibibot/0.06 (Amfibi Robot; http://www.amfibi.com)
  • Baiduspider+(+http://www.baidu.com/search/spider.htm)
  • BecomeBot/1.23; +http://www.become.com/webmasters.html)
  • BecomeBot/2.0beta; +http://www.become.com/webmasters.html)
  • blogsnowbot (+http://www.blogsnow.com/bot.html)
  • boitho.com-dc/0.xx (http://www.boitho.com/dcbot.html)
  • Enterprise_Search/1.00.143;MSSQL (http://www.innerprise.net/es-spider.asp)
  • everyfeed-spider/1.0 (http://www.everyfeed.com)
  • FAST Enterprise Crawler 6 (Experimental)
  • HenryTheMiragoRobot (http://www.miragorobot.com/scripts/mrinfo.asp)
  • HooWWWer/2.0.9 (+http://cosco.hiit.fi/search/hoowwwer/)
  • Iltrovatore-Setaccio/1.2 (It-bot; http://www.iltrovatore.it/bot.html)
  • msnbot/0.3 (+http://search.msn.com/msnbot.htm)
  • NewzCrawler/1.7 (Newz Crawler
  • NextopiaBOT (+http://www.nextopia.com)distributed crawler client beta
  • NPBot (http://www.nameprotect.com/botinfo.html)
  • NusEyeFeedCrawler/0.005 (cs.northwestern.edu);
  • NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html)
  • psbot/0.1 (+http://www.picsearch.com/bot.html)
  • Spider-Sleek/2.0 (+http://search-info.com/linktous.html)
  • SpurlBot/0.2)
  • SurveyBot/2.3 (Whois Source)
  • Trampel-Bot (www.trampelpfad.de)
  • TutorGigBot/1.5 ( +http://www.tutorgig.info )
  • Vagabondo/2.0 MT; http://aanmelden.ilse.nl/?aanmeld_mode=webhints)
  • ZyBorg/1.0 ( http://www.WISEnutbot.com)

Thankfully, many of these are polite enough to include a URL where I can glean more information, but it's a darn surprise how many there are!
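
For anyone who wants to try the same digging on their own server, here's a minimal sketch in Python of one way to tally the user agents in a log file. It assumes the server writes Apache's "combined" log format, where the user agent is the last quoted field on each line. The function name, the sample lines and their IP addresses are all made up for illustration, and this isn't necessarily how the list above was produced.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, which ends each line with
# "referer" "user-agent". Other log formats would need a different pattern.
LOG_TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match:  # silently skip lines that don't fit the assumed format
            counts[match.group("agent")] += 1
    return counts


# Hypothetical log lines (documentation-range IPs, invented timestamps).
sample_lines = [
    '192.0.2.1 - - [24/Dec/2004:10:00:00 -0700] "GET / HTTP/1.1" 200 512 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [24/Dec/2004:10:00:05 -0700] "GET /feed HTTP/1.1" 200 900 '
    '"-" "psbot/0.1 (+http://www.picsearch.com/bot.html)"',
    '192.0.2.1 - - [24/Dec/2004:10:01:00 -0700] "GET /about HTTP/1.1" 200 300 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    'this line is not in the expected format',
]

agent_counts = count_user_agents(sample_lines)
for agent, hits in agent_counts.most_common():
    print(hits, agent)
```

To run it against a real log, you'd pass an open file instead of the sample list, for example `count_user_agents(open("access_log"))`, where the file name is whatever your server actually uses.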

Playing detective for a bit, there are some interesting sites visiting my server, including BecomeBot, which is "the user-agent for Become's new web crawler. Become is crawling the web to build a next generation search engine," and TutorGig, which "lists thousands of courses. These courses include not only online courses, but also more traditional courses that are taught in person on or off campus. Users locate courses by searching on keywords of interest. TutorGig.com has a huge database of over a million tutorial sites categorized by more than 2000 subjects."

Further, I'm sure that some of the crawlers that hit my site are spam tools. When a crawler identifies itself as larbin_2.6.3 larbin2.6.3@unspecified.mail, libwww-perl/5.76, gazz/5.0, Pluck Soap Client/1.0, Program Shareware 1.0.2, HenryTheMiragoRobot, LWP::Simple, or one of my other favorites, Anonymized by Stegos Internet Anonymizer, ya just gotta wonder...
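
One rough way to separate the crawlers that say where they come from from the ones that don't is to check whether the user-agent string contains anything that looks like a URL. Here's a small Python sketch of that idea. The pattern and function names are hypothetical, and a missing URL is only a hint that a visitor deserves a closer look, not proof that it's a spam tool.

```python
import re

# Assumption: an "info URL" is anything starting with http://, https:// or www.
# and running up to the next space, semicolon or closing parenthesis.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s;)]+", re.IGNORECASE)


def info_url(agent):
    """Return the first URL-like string in a user agent, or None."""
    match = URL_PATTERN.search(agent)
    return match.group(0) if match else None


def split_by_disclosure(agents):
    """Split agents into (those with an info URL, those without one)."""
    with_url, without_url = [], []
    for agent in agents:
        (with_url if info_url(agent) else without_url).append(agent)
    return with_url, without_url


# User-agent strings taken from the post above.
agents = [
    "psbot/0.1 (+http://www.picsearch.com/bot.html)",
    "Trampel-Bot (www.trampelpfad.de)",
    "SurveyBot/2.3 (Whois Source)",
    "libwww-perl/5.76",
    "larbin_2.6.3 larbin2.6.3@unspecified.mail",
]

with_url, without_url = split_by_disclosure(agents)
print("Worth a closer look:", without_url)
```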

Anyone else being overrun by weird and suspicious bots?

Posted by Dave Taylor at December 24, 2004 7:53 PM

Comments

I pull your site's RSS every morning using Sunrise 0.36. I read most blogs on my Palm Tungsten at work during breaks, lunch, etc.

Posted by: Stewart Vardaman on January 6, 2005 9:45 PM

Re: the become.com spider
They're launching Feb 10, 2005 and will have 2.2 billion pages in their index... All of which are related to shopping. They'll be debuting a proprietary algorithm as well.

Posted by: Jason Dowdell on January 24, 2005 10:29 AM

I have been getting hit with the obidos-bot. Ever heard of that one?

Posted by: Dave on June 8, 2005 10:23 AM

Some Google investigation reveals that the chap who owns this Web site -- http://www.onfocus.com/ -- is the author of obidos-bot. It also suggests that his 'bot ignores the robots.txt file and ruleset, frustratingly.

Posted by: Dave Taylor on June 9, 2005 12:25 AM