Making technology work for you!

Computer & Internet Support Services
for Business and Home

Search engine robots and others
Browsers
Link Checkers, Link monitors and bookmark managers
Validators
FTP clients and download managers
Research projects
Software packages
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Other useful sites
...And finally, some fakers

Search engines and other sites send robots to read and index your pages. This page reverses that process and indexes the robots. This information has been gleaned by looking at the server logs for numerous sites, including our directory SearchCity.biz.

Whenever a page is read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent.
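As a rough illustration, the details mentioned above can be pulled out of a log line with a short script. This is only a sketch: it assumes the server writes the Apache "combined" log format (other servers may lay the line out differently), and the sample line and the function name parse_log_line are made up for the example.

```python
import re

# Assumed layout: Apache "combined" log format. Other servers may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical log entry, for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2003:13:55:36 +0000] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" "ExampleBot/1.0"'
)
entry = parse_log_line(sample)
print(entry["ip"], entry["time"], entry["referrer"], entry["agent"])
```

Lines that don't fit the assumed layout come back as None, so they can be counted or inspected separately rather than silently misread.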

Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just issue a HEAD request for the file, rather than doing a GET on the whole file.
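The two clues just described (a request for robots.txt, or a HEAD rather than a GET) are easy to look for once the log has been split into fields. The sketch below assumes each request is already available as a dict with "method", "path" and "agent" keys; that structure, the function name and the sample data are all made up for the example, and neither clue is proof on its own.

```python
from collections import defaultdict


def likely_robots(entries):
    """Group user agents by the clue that suggests they are not browsers.

    Each entry is a dict with "method", "path" and "agent" keys
    (a hypothetical structure chosen for this example).
    """
    clues = defaultdict(set)
    for entry in entries:
        if entry["path"] == "/robots.txt":
            clues[entry["agent"]].add("read robots.txt")
        if entry["method"] == "HEAD":
            clues[entry["agent"]].add("HEAD request")
    return {agent: sorted(found) for agent, found in clues.items()}


# Made-up requests for illustration.
requests_seen = [
    {"method": "GET", "path": "/robots.txt", "agent": "ExampleBot/1.0"},
    {"method": "GET", "path": "/index.html", "agent": "ExampleBot/1.0"},
    {"method": "HEAD", "path": "/index.html", "agent": "ExampleLinkChecker"},
    {"method": "GET", "path": "/index.html", "agent": "Mozilla/4.0"},
]
print(likely_robots(requests_seen))
```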

Search engine robots and others

The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as in inktomi) a different search engine.

Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites (e.g. www.snap.com and www.hotbot.com).

Wherever <nn> appears, it indicates that a number of different digits may be used.
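If you want to match host names in your own logs against entries written this way, the placeholder can be turned into a regular expression. This is a minimal sketch; it assumes that <n> and <nn> both stand for one or more decimal digits, and the function name is made up for the example.

```python
import re


def placeholder_to_regex(name):
    """Turn a table entry such as 'tv<nn>.sv.av.com' into a compiled regex.

    Assumption for this sketch: '<n>' and '<nn>' both stand for one or
    more decimal digits.
    """
    parts = re.split(r"(<n+>)", name)
    pattern = "".join(
        r"\d+" if re.fullmatch(r"<n+>", part) else re.escape(part)
        for part in parts
    )
    return re.compile(pattern + r"\Z")


host_pattern = placeholder_to_regex("tv<nn>.sv.av.com")
print(bool(host_pattern.match("tv12.sv.av.com")))
print(bool(host_pattern.match("tvx.sv.av.com")))
```

Everything outside the placeholder is escaped, so the dots in the host name only match literal dots.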

Home page/search engine Robot identifier IP address(es)
www.abacho.com AbachoBOT srv-ze-robot1.tricus.com
www.abcdatos.com abcdatos_botlink
http://www.abcdatos.com/botlink/
217.126.39.167
www.aesop.com AESOP_com_SpiderMan 209.189.115.49
www.ah-ha.com ah-ha.com crawler (crawler@ah-ha.com) c7pub-216-250-141-186.center7.com
www.alexa.com ia_archiver green.alexa.com
sarah.alexa.com
www.altavista.com Scooter test-scooter.pa.alta-vista.net
brillo.pa.alta-vista.net
av-dev4.pa.alta-vista.net
scooter.aveurope.co.uk
bigip1-snat.sv.av.com
  Mercator mercator.pa-x.dec.com
scooter.pa.alta-vista.net
election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com
  Scooter2_Mercator_3-1.0 scooter.sv.av.com
  roach.smo.av.com-1.0 avfwclient.sv.av.com
  Tv<nn>_Merc_resh_26_1_D-1.0 tv<nn>.sv.av.com
www.altavista.co.uk AltaVista-Intranet
jan.gelin@av.com
host-119.altavista.se
www.alltheweb.com FAST-WebCrawler
crawler@fast.no
209.67.247.154
  www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html
  Wget ext-gw.trd.fast.no
www.acoon.de Acoon Robot 194.231.42.178
www.antisearch.net antibot 62.210.155.50
www.atomz.com Atomz router-sc.atomz.com
index.atomz.com
www.axmo.com AxmoRobot 194.248.208.82
www.buscaplus.com Buscaplus Robi
http://www.buscaplus.com/robi/
 
www.canseek.ca CanSeek/
support@canseek.ca
216.168.111.111
www.christcrawler.com/search.cfm ChristCRAWLER
http://www.christcrawler.com/
207.191.111.231
www.crawler.de Crawler
admin@crawler.de
crawlit.crawler.de
www.daadle.com DaAdLe.com ROBOT/ 216.12.213.32
www.daum.net RaBot
Agent-admin/ phortse@hanmail.net
210.183.28.46
  contact/jylee@kies.co.kr 211.50.57.6
  RaBot
Agent-admin/ webmaster@kisco.go.kr
202.30.94.34
www.en.deepindex.com DeepIndex deepindex.net1.nerim.net
www.ditto.com DittoSpyder 65.169.94.188
domanova.co.uk Jack  
www.entireweb.com Speedy Spider 62.13.25.209
www.excite.com ArchitextSpider Musical instruments are used
in the name such as viola.excite.com
cello.excite.com
piano.excite.com
kazoo.excite.com
ride.excite.com
sabian.excite.com
sax.excite.com
bugle.excite.com
snare.excite.com
ziljian.excite.com
bongos.excite.com
maturana.excite.com
mandolin.excite.com
piccolo.excite.com
kettle.excite.com
ichiban.excite.com
(and the rest of the band)
more recently first names are being
used like philip.excite.com
peter.excite.com
perdita.excite.com
macduff.excite.com
agouti.excite.com
(excite) ArchitectSpider crimpshrine.atext.com
ichiban.atext.com
www.euroseek.net Arachnoidea
arachnoidea@euroseek.net
212.209.54.134
www.ezresults.com EZResult 216.28.23.59
www.fastsearch.net Fast PartnerSite Crawler psprdcrw001.sac2.fastsearch.net
  FAST Data Search Crawler 65.198.110.185
www.fireball.de KIT-Fireball ????
www.fybersearch.com FyberSearch 69.49.241.9
www.galaxy.com GalaxyBot
http://www.galaxy.com/galaxybot.html
63.121.41.175
www.geckobot.com geckobot ???.rdc1.az.coxatwork.com
www.gendoor.com
(Genealogical Search Engine)
GenCrawler ????
www.geona.com GeonaBot 69.59.142.17
www.google.com Googlebot
googlebot@googlebot.com
http://googlebot.com/
c<nn>.googlebot.com
www.goo.ne.jp moget/2.0
moget@goo.ne.jp
202.229.31.13
www.girafa.com Aranha Aranha.girafa.com
(inktomi) Slurp.so/1.0 q2004.inktomisearch.com
  slurp@inktomi.com j5006.inktomisearch.com
(inktomi) Slurp/2.0j 202.212.5.34
  slurp@inktomi.com
www.inktomisearch.com
goo313.goo.ne.jp
(inktomi) Slurp/2.0-KiteHourly
slurp@inktomi.com;
www.inktomi.com/slurp.html
y400.inktomi.com
(inktomi) Slurp/2.0-OwlWeekly
spider@aeneid.com
www.inktomi.com/slurp.html
209.185.143.198
(inktomi) Slurp/3.0-AU
slurp@inktomi.com
j6000.inktomi.com
http://hoppa.com/
(need V5 browsers to view)
Toutatis 2.5-2 tisnix.xs4all.nl
www.hubat.com Hubater 209.114.176.250
www.almaden.ibm.com
(research centre)
http://www.almaden.ibm.com/cs/crawler wfp2.almaden.ibm.com
www.iltrovatore.it IlTrovatore-Setaccio 213.26.21.8
www.incywincy.com IncyWincy 64.81.243.66
www.infoseek.com UltraSeek cde2c923.infoseek.com
cde2c91f.infoseek.com
  InfoSeek Sidewinder cca26215.infoseek.com
www.intags.de Mole2/1.0
webmaster@intags.de
217.160.75.10
http://mp3bot.de/ MP3Bot <..>
www.ip3000.com C-PBWF-ip3000.com-crawler
ip3000.com-crawler
www.ip3000.com
www.kuloko.com kuloko-bot/0.2 66.90.81.41
www.lexis-nexis.com LNSpiderguy firewall5.lexis-nexis.com
www.look.com lookbot magma.com
www.looksmart.com MantraAgent fjupiter.looksmart.com
www.loopimprovements.com NetResearchServer leg-64-133-109-250-STK.sprinthome.com
(see also www.incywincy.com) www.loopimprovements.com/robot.html  
www.lycos.com Lycos_Spider_(T-Rex) bos-spider<n>.bos.lycos.com
216.35.194.188
www.joocer.com JoocerBot 80.46.38.169
www.mirago.co.uk HenryTheMiragoRobot 194.202.39.46
www.mozdex.com mozDex/ (within comcast.net)
http://search.msn.com/ MSNBOT/0.1
http://search.msn.com/msnbot.htm
131.107.163.47
www.northernlight.com Gulliver marvin.northernlight.com
taz.northernlight.com
www.objectssearch.com ObjectsSearch/0.01 68.88.244.177
www.picosearch.com PicoSearch/ pipe.picosearch.com
www.portaljuice.com PJspider timber.nextopia.com
www.powerinter.net
but it won't let us in :-(
DIIbot node-d8e93393.powerinter.net
http://navi.ocn.ne.jp/ nttdirectory_robot
super-robot@super.navi.ocn.ne.jp
lilis00.navi.ocn.ne.jp
  griffon
griffon@super.navi.ocn.ne.jp
lilis04.navi.ocn.ne.jp
www.maxbot.com Spider/maxbot.com
admin@maxbot.com
search.wport.com
??? various (fakes agent on each access) pool0058.cvx2-bradley.dialup.earthlink.net
??? gazz/1.0 deleuze.infobee.ne.jp
  gazz@nttrd.com derrida.infobee.ne.jp
??? ??? search-8.xift.com
www.nationaldirectory.com NationalDirectory-SuperSpider spider.nationaldirectory.com
209.116.58.143
www.naver.com dloader(NaverRobot)/
dumrobo(NaverRobot)/
211.218.151.209
www.openfind.com Openfind piranha,Shark ???
(Chinese language) robot-response@openfind.com.tw  
  Openbot/ abovenet4.openfind.com
www.picsearch.org psbot
www.picsearch.org/bot.html
217.75.104.26
www.pinpoint.com CrawlerBoy Pinpoint.com nitrogen.pinpoint.com
www.petersnews.com user<n>.ip3000.com news<n>.petersnews.com
www.vestris.com/alkaline AlkalineBOT host130.uv-ray.com
www.searchhippo.com Fluffy the spider
info@searchhippo.com
208.148.122.27
www.scrubtheweb.com Scrubby/ 208.145.190.254
www.singingfish.com asterias grouper.singingfish.com
www.speedfind.de speedfind ramBot xtreme BWEB.highway.telekom.at
www.s.u-tokyo.ac.jp Kototoi/0.1 crawler-red3.is.s.u-tokyo.ac.jp
www.searchspider.com Searchspider/ 24.90.243.203
www.sightquest.com SightQuestBot/
http://www.sightquest.com/bot.htm
64.49.245.212
www.spidermonkey.ca Spider_Monkey/ 66.163.18.197
www.surfnomore.com Surfnomore Spider v1.1 165.90.194.245
www.supersnooper.com Robot@SuperSnooper.Com 207.8.212.162
www.teoma.com teoma_agent1
teoma_admin@hawkholdings.com
63.236.92.148
http://mapper.teradex.com Teradex_Mapper
mapper@teradex.com
65.110.6.26
www.travel-finder.com ESISmartSpider 202.46.33.15
www.traficdublu.ro Spider TraficDublu 81.196.*.*, 193.16.218.66
www.tutorgig.com Tutorial Crawler
http://www.tutorgig.com/crawler
216.40.225.75
www.uksearcher.co.uk UK Searcher Spider -
www.vivante.com
(coming soon)
Vivante Link Checker 216.93.167.106
www.walhello.com appie uses an address at planet.nl, a Dutch ISP
www.websmostlinked.com Nazilla -
www.webwombat.com.au www.WebWombat.com.au 202.139.99.131
www.webseek.de marvin/infoseek
marvin-team@webseek.de
arthur4.sda.t-online.de
www.webtop.com MuscatFerret ferret<nn>.webtop.com
www.whizbanglabs.com WhizBang! Lab 216.250.143.108
www.wisenut.com ZyBorg
(info@WISEnut.com)
-
www.wire.co.uk WIRE WebRefiner:
webrefiner@wire.co.uk
brighton.wire.co.uk
www.worldsearchcenter.com WSCbot ???
www.yandex.com Yandex ya.yandex.ru
www.yellowpet.com
pet-based search engine
Yellopet-Spider 212-82-36-23.ip.zeitraum.com
<client sites> libwww-perl www.linpro.no/lwp/
http://verno.ueda.info.waseda.ac.jp/  
  Iron33 207.18.183.251

Browsers

Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.

Browser identifier Information
AmigaVoyager
http://v3.vapor.com/
Voyager browser for the Amiga
xChaos_Arachne
http://browser.arachne.cz/
(DOS-compatible browser. Linux version under development)
IBrowse
www.hisoft.co.uk (search for IBrowse)
Amiga-based browser
ICab
www.icab.de/index.html
(Macintosh-only)
JustView
http://www3.justsystem.co.jp/download/justview/3.01win1a.html
(I think this is a browser. Site is in Japanese)
KMeleon
http://kmeleon.sourceforge.net/
(Light browser based on the Mozilla code base)
Konqueror
www.konqueror.org/konq-browser.html
(Linux KDE browser)
Lynx
http://lynx.browser.org/
(Cross-platform text based browser)
OmniWeb
www.omnigroup.com/products/omniweb/
(Macintosh-only)
Opera
www.opera.com
(Cross-platform, small, efficient and standards-led browser)
Plucker
www.plkr.org/index.pl/faq#1.1
(Palm handhelds. Written in Python)
pwWebSpeak
www.prodworks.com/issound/catalog/catalog_pwwebspeak.html
Audio Browser
QWeb
http://sunsite.auc.dk/qweb/ (Linux browser)
(see also http://browserwatch.internet.com/news/story/qweb8.html)
SlimBrowser
www.flashpeak.com/sbrowser/sbrowser.htm
Freeware tabbed browser
Sleipnir
http://sleipnir.pos.to/software/sleipnir/index.html (Japanese)
Japanese browser, apparently with an English version available.
VMS_Mosaic
http://vaxa.wvnet.edu/vmswww/vms_mosaic.html
(OpenVMS only version of Mosaic, a pre-Netscape browser)
WannaBe
http://mindstory.com/wb2/
(Macintosh text-only browser)
w3m
http://w3m.sourceforge.net/
(text-based browser)

Link Checkers, Link monitors and bookmark managers

Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".

If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.

If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).

It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software, or signed up to some centralized link checking service. If the referrer URL field is blank but you have the client's IP address, you can always try visiting that address and surfing their site.
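Building up the list of sites that link to you can be automated once the referrer values have been pulled out of the log. The sketch below counts referrer URLs that come from somewhere other than your own site; the function name and sample values are made up, and www.example.com simply stands in for your own host name.

```python
from collections import Counter
from urllib.parse import urlparse


def external_referrers(referrers, own_host):
    """Count referrer URLs whose host is not our own site.

    Blank referrers (logged as "" or "-") are skipped.
    """
    counts = Counter()
    for url in referrers:
        if url in ("", "-"):
            continue
        host = urlparse(url).hostname
        if host and host != own_host:
            counts[url] += 1
    return counts


# Made-up referrer values; www.example.com stands in for your own site.
seen = [
    "http://links.example.org/page.html",
    "-",
    "http://www.example.com/index.html",
    "http://links.example.org/page.html",
]
for url, hits in external_referrers(seen, "www.example.com").most_common():
    print(hits, url)
```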

Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers.

A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK

Robot identifier IP address(es) Link Checker home page
ALink
<client site>
http://www.info-pack.com/alink/
Reciprocal Link Checker, Manager and Page Generator.
AMeta
<client site>
http://www.info-pack.com/ameta/
Meta Tag Generator
ASPSearch URL Checker
<client site>
http://search.santry.com/downloads/
a site search engine/index maintenance tool
BlogBot
<client site>
http://sourceforge.net/projects/blogbot/
BMChecker
<client site>
www.fureai.or.jp/~yoichi37/soft/bmchecker.html
(Japanese Bookmark Checker)
Bookmark Buddy
<client site>
www.bookmarkbuddy.net/about.shtml
Check&Get
<client site>
www.checkget.com
CheckWeb
<client site>
www.checkweb.com
CNET_Snoop

www.download.com
(only if you have software listed at that site)
CSE HTML Validator
<client site>
www.htmlvalidator.com
HTML page validator that includes a link checker
amongst its functions.
DRKSpider
<client site>
www.drk.com.ar/spider/ (An Open Source project)
DISCo Watchman
<client site>
www.t-guild.com/gamesite/Software/Disco_w/Disco_w.htm
DoctorHTML
draco.imagiware.com
http://www2.imagiware.com/RxHTML/
Email Extractor
<client site>
<email collector> We don't list links to
email collectors on this site
EmailSiphon
<client site>
<email collector> We don't list links to
email collectors on this site
EmailWolf
<client site>
www.pixeltech.com.au/~msw/ewolf/index.html
FavOrg
<client site>
http://www.pcmag.com/article2/0,1759,1558477,00.asp
A utility written by PC Magazine to fetch icon files
(favicon.ico) for your IE favorites
Favorites Sweeper
<client site>
www.manitoolssoftware.cjb.net
Another "favorites" tidy-up utility
FreshLinks.exe
<client site>
www.resqpc.com/features.html
Funnel Web Profiler
<client site>
www.quest.com/funnel_web/profiler/
Profiles your site, including links to/from it
Html Link Validator
<client site>
www.lithopssoft.com/hlv/index.html
The Informant
The Intraformant
cosmo.dartmouth.edu
http://informant.dartmouth.edu/
InternetLinkAgent
<client site>
http://www1.odn.ne.jp/freeware/rank/ineternet/internetlinkagent.html
(in Japanese)
InternetPeriscope
<client site>
www.lokboxsoftware.com/internetperiscope.asp
javElink
salix.ingetech.com
www.dailydiffs.com
jdwhatsnew.cgi
<client site>
www.jdrowell.com/projects/jdwhatsnew/view
JRTS Check Favorites Utility
<client site>
www.jrtwine.com/Products/CheckFavs/
Lambda LinkCheck
195.139.70.25
www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html
LinkLint-checkonly
--
www.goldwarp.com/bowlin/linklint/
LinkAlarm
linkalarm.com
www.linkalarm.com
Linkbot
<client site>
www.tetranetsoftware.com/products/linkbot.htm
Linkman (Mozilla...)
66.89.128.242
http://www.outertech.com/product.php?product=5
LinkProver
<client site>
www.tafweb.com/linkprover.html
Links
--
http://gossamer-threads.com/scripts/links/
(Link management cgi script)
LinkScan Server
<client site>
www.elsop.com
LinkSweeper
<client site>
www.lss.com.au/lss/windows/ls/linksweeper.htm
Link Valet Online
195.82.114.5
www.htmlhelp.com/tools/valet/
LinkVerify Spider
frances.yourwebhost.com
www.enduser.co.uk/linkverify/
LinkWalker
lw.seventwentyfour.com
209.167.50.23
www.seventwentyfour.com
Morning Paper
<client site>
www.boutell.com/morning/
MoveAnnouncer
--
www.moveannouncer.com
(notifies webmasters when your pages have moved)
NetLookout
--
www.frugalsoft.com
NetMechanic
www.elsop.com
gamma.netmechanic2.com
www.netmechanic.com
NetMind-Minder
marvin.netmind.com (retired)
gary.netmind.com
meg.netmind.com
inyanga.netmind.com
leo.netmind.com
gemini.netmind.com
www.netmind.com
NetMonitor
--
www.modemwizard.com/netmonitor.html
Netprospector JavaCrawler
<client site>
www.actaddons.com/products/netprospector.asp
online link validator
216.93.171.138
www.dead-links.com
(online link checker - submit your URL)
Rational SiteCheck
<client site>
www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl
Robozilla
h-206-<n>-<n>-<n>.netscape.com
http://dmoz.org/
(checks links in the dmoz directory)
RPT-HTTPClient
<client site>
www.purplefrog.com/~thoth/jchecklinks/
Java utility that uses the Java HTTPClient class library
SurfMaster
<client site>
www.maskbit.com/surfmaster.htm
SyncIT
<client site>
www.bookmarksync.com
Watchfire WebXM
<client site>
www.watchfire.com/products/webxm.asp
WatzNew Agent
<client site>
www.watznew.com
WebSite-Watcher
<client site>
www.aignes.com
WebTrends Link Analyzer
<client site>
www.webtrends.com
Weblink Scanner
<client site>
www.iterix.com/products/WeblinkScanner/weblinkScanner.asp
Xenu's Link Sleuth <client site> www.snafu.de/~tilman/xenulink.html

Validators

Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this. Consequently the "list" below is restricted to the on-line validators I've used myself.

Robot Identifier IP address Validator home page
W3C_Validator
abyss.w3.org
http://validator.w3.org/
WDG_Validator/
64.29.16.182
www.htmlhelp.com/tools/validator/
Tooter selfpromotion.com www.selfpromotion.com. This is
used as part of a link submission
agent (trebor@animeigo.com)

FTP clients and download managers

If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.
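For downloads over HTTP, resuming is normally done with a Range request header, which asks the server for only the part of the file that is still missing. The sketch below just builds such a request without sending it; the URL, the byte offset and the function name are made up, and whether any particular download manager works exactly this way is an assumption.

```python
from urllib.request import Request


def build_resume_request(url, bytes_already_saved):
    """Build a GET request that asks only for the bytes still missing."""
    request = Request(url)
    # "bytes=N-" means "from offset N to the end of the file".
    request.add_header("Range", f"bytes={bytes_already_saved}-")
    return request


# Hypothetical download that was interrupted after 1,000,000 bytes.
resume = build_resume_request("http://www.example.com/files/big.zip", 1_000_000)
print(resume.get_header("Range"))
```

A server that supports ranges answers with status 206 (Partial Content); one that doesn't will typically send the whole file again with a 200.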

Client Identifier FTP Client home page
Alligator
www.nearsoftware.com/alligator/maininfo/
BatchFTP
www.dynamicnet.net/products/batchftp.htm
ChinaClaw
http://download.pchome.net/internet/download/860.html (Chinese)
(Chinese download utility)
DA
www.lidan.com
www.downloadaccelerator.com
DLExpert
www.yanew.com (English and Chinese versions available)
Download Demon
www.netzip.com
Download Master
www.one.com.ua/dm/ (Russian)
Download Ninja
www.h-fd.org/~mkro/mt/archives/000585.html (Japanese)
Download Wonder
www.forty.com
Ez Auto Downloader
www.anatari.com/ezad/index.html
Downloads all files of a given type from a site, so it's
more like a site grabber
FreshDownload
www.freshdevices.com/freshdown.html
Go!Zilla
www.gozilla.com
GetRight
MyGetRight
www.getright.com
GetSmart
http://getsmart.hypermart.net/
HiDownload
www.hidownload.com
JetCar (or FlashGet)
www.amazesoft.com
Kapere
www.kapere.com/menu.php?lang=english
Kontiki Client
www.kontiki.com/products/index.html
LeechFTP
http://stud.fh-heilbronn.de/~jdebis/leechftp/
LeechGet
www.leechget.de
LightningDownload
www.lightningdownload.com
Mass Downloader
www.geocities.com/SiliconValley/Vista/2865/md.htm
MetaProducts Download Express
www.metaproducts.com/DE.html
NetZip Downloader
SmartDownload
www.netzip.com
NetAnts
www.netants.com
NetButler
www.webcelerator.com/netbutler/
NetPumper
www.netpumper.com
Net Vampire
www.netvampire.com
Nitro Downloader
www.klsofttools.com/nitro.html
Octopus
http://moskalyuk.com/octopus/
PuxaRapido
www.puxarapido.com.br
RealDownload
http://service.real.com/help/faq/rdown4/rdownfaqa01.html
SpeedDownload
www.yazsoft.com (for Macintosh)
WebDownloader for X 1.30
www.krasu.ru/soft/chuchelo/features.php3
(Linux web downloader with X GUI)
WebLeacher
www.webleacher.dk (down last time I tried it)
more details at www.davecentral.com/projects/thewebleacher/
WebPictures Downloader
www.fullstrong.com
Locates and downloads pictures
X-Uploader Can't find the home page, but it's described (in Russian)
on www.compulenta.ru/2002/1/17/24333/

Research projects

These agents come from research projects. Of course that's how Google started...

citenikbot/
http://www.citenik.co.uk/bot.html. One-man project due
for release in 2004.
CLIPS-index
http://clips-index.imag.fr/ (French)
French research robot from a linguistics project (?)
Computer_and_Automation_Research_Institute_Crawler
  Robot from the research centre at the Hungarian Academy
of Sciences at www.sztaki.hu. Crawls from IP 195.111.1.93
cosmos
robot@xyleme.com
Spider from www.xyleme.com which is a project to locate
and index XML content on the web. The company is a spin-off
from a project at INRIA in France, a frequent source of
web robots. The word "xyleme" apparently relates to the
vascular system in plants, but cleverly must be one of
the very few words to contain the letters "X", "M" and "L"
(although not in that order ;-)
DiaGem/
Experimental spider from Mitsubishi R&D division
www.skyrocket.gr.jp/diagem.html
Crawls from IP 203.178.88.244
Digimarc WebReader
Digimarc searches images on the web looking for digital watermarks.
More details at www.digimarc.com
EchO!/2.0
Spiders from 194.254.160.3, which would seem to be part
of www.voila.com, a French-based search engine.
FinaleRobot
robot-master@expressus.com
The www.expressus.com site describes an Interactive Natural
Language encyclopedia that will become a search engine
at www.final-e.com. Good name, but at present it just
maps back onto the ExpressUs site (not such a good name).
Crawls from IP address 64.114.34.115
Ideare - SignSite
www.ideare.com. Spiders from spider3.tiscalinet.it. Ideare are
a research company producing search engine technology, and are
part owned by Tiscali in Italy, who seem to use their various
tools for different search engines (mp3, images etc).
GentleSpider
Some sort of spider that usually visits using
an IP address from within www.research.att.com or
crawler.tivra.com
Gulper Web Bot
www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot
(Open research project to produce opinion-based search engine)
larbin
sebastien.ailleret@inria.fr
ghi@lcs.mit.edu
And from the people that brought you xyro (see below)
comes another, newer bot. This one seems to crawl from
the IP address cremant.inria.fr. Update: more recently
it's also been seen coming from barracutta.lcs.mit.edu
cosmos
And then there was "cosmos", crawling from pomelos.inria.fr
Seems these people are a webbot factory. Cosmos doesn't
offer an email address.
MultiText
Research project to index the last weeks' news items
http://canola1.uwaterloo.ca/
NEC Research Agent
http://heavenly.nj.nec.com/
Research "Inquirus" (meta?) search engine
OntoSpider
http://ontospider.i-n.info
Dutch robot for a research project. Crawls from 195.11.244.52
sherlock_spider
www.sherlock.com.cn. A course project from
http://burrowww.cs.indiana.edu:15003/b659/
Crawls from 129.79.245.98
Steeler
www.tkl.iis.u-tokyo.ac.jp/~crawler/crawler.html.en
Japanese research robot.
ru-robot
0.1_hseo(at)cs.rutgers.edu
Unable to find details on this, but I'm guessing it's
a research spider from www.rutgers.edu. Crawls using
the IP teal.rutgers.edu
WebGather
http://pccms.pku.edu.cn:8000/
Chinese search project
xyro
xcrawler@inria.fr
Seems to be a spider associated with a French
research institute. Usually crawls using the IP
address vamos.inria.fr
Zao/0.2
www.kototoi.org/zao/ Another Japanese research robot
Crawls from 133.11.36.41.

Software packages

These agents are the default identifiers for various software packages. Software developers use these packages to add Internet functionality to their own applications. As such, it's impossible to say what these agents are being used for without looking at the pattern of access, as the same agent name may be used by different developers to achieve different results.

While many of these packages allow you to change the user agent, some do not, and many developers are too lazy to change the agent string.
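Where a package does allow it, changing the agent string is usually a one-line job. As an illustration, Python's urllib (one of the packages listed below) sends a default identifier beginning "Python-urllib" unless the developer supplies a User-Agent header. The sketch below builds a request with a custom identifier without sending it; the agent name, the contact URL and the function name are made up for the example.

```python
from urllib.request import Request


def build_request(url, agent_name):
    """Build a request that announces itself with a custom user agent."""
    return Request(url, headers={"User-Agent": agent_name})


# Hypothetical agent name and contact URL, for illustration only.
request = build_request(
    "http://www.example.com/",
    "ExampleLinkChecker/0.1 (+http://www.example.com/bot.html)",
)
# urllib stores header names capitalised as "User-agent".
print(request.get_header("User-agent"))
```

Including a contact URL or email address in the string, as many of the robots on this page do, gives site owners somewhere to go if the agent misbehaves.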

HTTPClient
Default agent name used by the Java HTTPClient class.
www.innovation.ch/java/HTTPClient/ (See also RPT-HTTPClient below)
IP*Works!
Set of TCP/IP components used in cross-platform development
of internet tools www.nsoftware.com/products/ipworks.aspx
libwww-perl
The PERL programming language comes with a number of
routines for constructing web-aware scripts. This and
related strings are the default user agent identifiers,
although it's perfectly easy to change this to be whatever
you want.
libghttp
The GNOME http library. A Linux software library
that offers connectivity to the web. Found in many
places on the web. There is a description at
www.fifi.org/doc/libghttp-dev/html/ghttp.html
Macromedia Flash Player
Flash movies can contain scripts that can fetch content
from the web (such as other Flash movies or images)
MFC_Tear_Sample
Agent name used in the sample code supplied with
Visual C++ for accessing the web. This may therefore
be someone running a program they've written based on
that code.
PEAR HTTP_Request class
PEAR is a framework and distribution system for reusable PHP
components http://pear.php.net/
Python-urllib
Presumably the default identifier for the urllib module
in the Python programming language
www.lib.uchicago.edu/keith/courses/python/class/7/
RPT-HTTPClient
The Java HTTPClient class library
TeamSoft WinInet Component
www.winsoft.sk/wininet.htm (menus require Java)
Internet software component suite
wget
www.gnu.org/software/wget/wget.html
Free Unix/Linux package for retrieving web pages
WinScripter iNet Tools
www.winscripter.com/wsh/tools/wsInetTools.asp
COM/DLL object that supports the SMTP and HTTP protocols
W3CRobot/
A fast web-spidering robot included with the libwww
package (?). See www.w3.org/Robot/
Zeus <nnnn> Webster Pro www.homepagesw.com/webster_overview.htm

Offline browsers and other agents

Agent Identifier Agent home page
DigOut4U
www.arisem.com/Enu/
DISCoFinder
www.ars.ru/eng/products/discof.asp
eCatch
www.ecatch.com
EirGrabber
http://www2p.biglobe.ne.jp/~eir/index.htm
(Japanese software from the "Eir Project")
ExtractorPro
(Bulk email marketing tool. URL deliberately omitted)
FairAd Client
www.hager.co.at/fordelka/fairad.htm (German)
A German pay-to-surf client
JoBo
www.matuschek.net/software/jobo/index.html a site downloader
iSiloWeb
www.isilo.com (for palm pilot)
Kenjin Spider
www.autonomy.com
MSIECrawler
MSProxy
(Microsoft IE4.0)
NexTools WebAgent
www.vector.co.jp/soft/win95/net/se053030.html
Offline Explorer
www.metaproducts.com/OE.html
NetAttache
Offline browser and search engine agent
PageDown
Details (in Japanese) at
http://www01.u-page.so-net.ne.jp/fa2/y_yutaka/share/pagedown.htm
ParaSite
www.ianett.com/parasite/
Searchworks Spider
www.nedesign.com/Phipps/products.html
SiteMapper
www.trellian.com/mapper/index.html
SiteSnagger
http://www.pcmag.com/article2/0,1759,1559896,00.asp
SuperBot
www.sparkleware.com/superbot/index.html
Teleport Pro
www.tenmax.com/teleport/pro/home.htm
Web2Map
www.web2map.com/us/index.htm
Web site copier. English/German versions available
WebAuto
www.yanasoft.co.jp/webauto.html
I think this is an offline browser. Site is in Japanese
WebCopier
www.maximumsoft.com
Webdup
www.webdup.com
(Chinese software. Not 100% sure what it does)
WebFetch
www.webfetch.com
WebReaper
http://www.webreaper.net/
Webrobot
www.multimania.com/dilletb/WebRobot/
Website eXtractor
www.asona.org
WebSnatcher
www.theronwelch.com/websnatcher/
WebStripper
www.solentsoftware.com/webstripper/
WebTwin
www.WebTwin.com
Convert websites into help files.
WebVCR
www.netresultscorp.com/fs_webvcr_info.html
WebZIP
www.spidersoft.com
WWWOFFLE
www.gedanken.demon.co.uk/wwwoffle/
Xaldon WebSpider www.xaldon.de/produkte_webspider.html (German)
Offline browser

Other miscellaneous agents

These agents are ones that we've seen, but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to info@jafsoft.com


User Agent Information
Ad Muncher
www.admuncher.com
Browser plug-in that monitors the pages as you view them,
and removes all adverts, popup windows etc.
ADSAComponent
ADSARobot
http://cnds.ucd.ie/adsa/
distributed search engine project
Contact postmaster@cnds.ucd.ie
browses from acropolis.ucd.ie (which doesn't make
sense for a distributed search engine :-)
Albert Indexer
www.albert.com
Multi-lingual search technology
AnswerChase
www.answerchase.com a personal search robot.
ASPSeek
www.aspseek.org/about.html. An open source search engine project
ATA-Translation-Service
Looks to be an online translation tool, much like
Babelfish. Possibly related to www.atanet.org/
AVSearch
Seems to be the AltaVista personal search agent. The
crawling site is sometimes referred to in the agent name
Avant Browser
www.avantbrowser.com Browser add-on for Internet Explorer
beholder or
e-sense
www.vigiltech.com/esensedisclaim.html
BravoBrian
http://bstop.bravobrian.it/ (may require IE). A content filtering
service that offers protection from pornography and
other unwanted content for children. Comes from IP 213.215.133.19
bumblebee@relevare.com
Software used to build "Vortals" (vertical portals).
Details (requires Flash) can be found at
www.relevare.com/site/
Checkbot
Seems to come from www.oxxfordinfo.com who offer B2B
services
contype
Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader
used with MSIE (I have been unable to confirm this)
Convera Internet Spider
A "RetrievalWare" product which claims to be a multimedia
web crawler. www.convera.com/Products/rw_ancillis.asp
DaviesBot
www.wholeweb.net/web/
deepweb
Also calls itself an "Intelligent Deep-Web Robotic Agent"
A search engine indexer that will index dynamic content.
www.deepweb.com. Indexes from IP 66.96.221.180
EbiNess
http://sourceforge.net/projects/ebiness
An Open Source project to display Internet information
in a 3D format.
EmailWolf
www.pixeltech.com.au/~msw/ewolf/
email program no longer available - that's the only reason I'm
prepared to list it on this page.
Excalibur Internet Spider
www.excalib.com/products/ispi/index.shtml
Expired Domain Sleuth
Hunts down popular, yet expired domain names with
a view to letting you purchase an already popular
domain name. www.expireddomainsleuth.com
GigaBaz
GigaBazVStheWeb
crawler@brainbot.com
http://brainbot.com/
Giskard
www.oralco.com
(Trivia note: Giskard is probably named after the Isaac Asimov robot)
grub-client
Grub is a distributed, open source web crawler. Users
download the client which then indexes the web as part
of a distributed effort www.grub.org/html/documents.php
htdig
www.htdig.org
search engine software for companies and universities
http://webwarper.net
A browser accelerator. The idea is that you browse "through"
their site, taking advantage of their faster Internet connection,
caching and - most importantly - compression (of the file sent
to your browser) in return for their adverts added to the viewed pages.
Such accesses give the webwarper URL as the User Agent, concealing
the true agent of the original user.
More details at http://webwarper.net/ww.pl/0/wwgz/about.htm?*
infoGIST
www.infogist.com
InterGO
www.teachersoft.com
http://browserwatch.internet.com/news/story/intergo1.html
This was a child-safe browser, but it seems no associated
page remains
InternetArchive
Presumably www.internetarchive.com, but that's in "stealth mode"
Internet Ninja
www.ifour.co.jp (Japanese Macintosh browser?)
InternetSeer
A web monitoring service.
More details at www.internetseer.com/
ipiumBot
www.laurion.com/ipium-analysis.html (French)
A tool that searches for copies of your documents on
the web. Crawls from petula.laurion.net
InternetAmi IOR
www.internetami.se/ior.html robot gathering data for
an English/Swedish translation service.
InsumaScout/
www.insuma.de/insuma/de/SEscout.html
Searches data situated in open data sources.
Katriona
Something to do with the European Regional Internet Registry (RIPE)
Browses using IP address 213.219.19.148
larbin
http://pauillac.inria.fr/~ailleret/prog/larbin/index-eng.html
LEIA
Unable to find
(Too many "Star Wars" references get in the way)
LexiBot
www.lexibot.com
logikabot
www.logika.net
Mister Pix II
Picture finder www.mister-pix.com/en/home.htm
MOSES 2.0 Spider
www.ideas2internet.com/products/moses2/
NOTE: Site crashes my version of Netscape 4.7
Mata Hari
www.thewebtools.com
(Internet search agent)
metabot
Geographical-based text search tool. Crawls from 66.28.23.147
www.metacarta.com/products.htm
NetCruiser
www.netcruiser-software.com/products.html
It's not clear to me which of these products this might be,
but I'm assuming it's one of them.
NPBot
www.nameprotect.com crawls from 12.148.209.196 (crawler1.crawler918.com)
A trademark protection service
NetZippy
www.innerprise.net/usp-spider.asp
NZBot
www.navigationzone.com
Offers "information management" tools
Opencola
www.opencola.com
A search application, combining data from multiple sources
ORA_checksite
www.oreilly.com/openbook/webclient/ch06.html
Identifier used in a sample Perl program in the online
book "Web Client Programming with Perl". The program is
used to check links. Obviously people have tried it, and it works :-)
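As a rough illustration of what a link checker of this kind does, here is a minimal Python sketch (not the Perl program from the book). It builds a HEAD request carrying its own User-Agent, which is what ends up in the visited site's log. The identifier "example-checksite/0.1" and the function names are made up for this example.

```python
import urllib.error
import urllib.request

# Hypothetical identifier for illustration only; not a real tool's name.
AGENT = "example-checksite/0.1"

def build_head_request(url, agent=AGENT):
    """Build (but do not send) a HEAD request with a custom User-Agent."""
    return urllib.request.Request(url, method="HEAD",
                                  headers={"User-Agent": agent})

def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(build_head_request(url),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return None
```

Calling check_link on a URL sends only the HEAD request, so the server returns headers and a status code without the page body.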
Onekit.com - PAD File Get.
PAD file poller. PAD files describe software applications to
download sites.
Oxxbot1
www.oxxfordinfo.com
(Data mining bot on IP 216.0.86.75)
Pansophica
http://homepage.mac.com/zigkit/Pansophica/index.html
A Web search agent with neural net intelligence which organizes
and personalizes Web sites and searches.
Phoaks
www.phoaks.com/index.html. An index of web resources
listed in Usenet. See also
www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm
phpMySearch-Crawler
http://phpMySearch.web4.hm a search engine for individual
sites.
PICgrabber
A free picture and movie locator
www.movies-free.net
PictureOfInternet
erik@malfunction.org
www.malfunction.org/poi
Seems to be a project to create a collage of images gathered
from the Internet.
PintaSpider
Unable to find, but the spider came from www.cnet.fr
Pita (Chub.Stanford.EDU)
--
PitSpyder Thread<n>0
Unable to find
psbot
www.picsearch.org/bot.html
A bot indexing pictures. Crawls from ps.direct2internet.com
PolyBot
http://cis.poly.edu/polybot/
crawls from
weasel.poly.edu,
grampus.poly.edu,
bumblebee.poly.edu
PureSight
www.puresight.com/Products/PureSightHomeDescription.htm
(child-safe content filtering)
Rumours-Agent
Comes from IP 202.214.69.131, which a lookup
identifies as "Cross Lingual Info Research" in Japan.
RepoMonkey Bait & Tackle
A bit of detective work here. Recent entries in
the log file link this to the site www.hungryhippo.com,
although the robot always appears to come from an IP
address at backflip.com (a bookmarking service).
Visiting www.hungryhippo.com reveals a "coming soon"
site. Looking at the HTML source leads to another page
at www.mezzaluna.net/hungryhippo.com/ (appears
identical).
The META tags for this page all appear to be references
to day trading, futures, training and the like, although
we did spot the word "fibonacci" (our favourite :-).
So... possibly a future search engine related to stock
trading? Or maybe the Monkey and Hippo are just feeding
me a red herring?
There's more. The picture on the Kenjin site at
www.kenjin.com/kenjin/info.html is currently the same as
that at HungryHippo. Kenjin is an Autonomy company.
Robot2.0(PingSoft)
There are several "PingSoft"s around, but I suspect that
this belongs to one of the products listed at
http://www.pingsoft.net/ (e.g. SmartHunter)
since I was visited from a Chinese IP address.
SilentSurf
www.silentsurf.com. A surf anonymizer service
SlySearch
slysearch@slysearch.com
www.slysearch.com. A site that hunts down infringements
of intellectual property rights.
SpaceBison
http://www.proxomitron.org/
A web filter that is "ShonenWare", i.e. you should
purchase a Shonen Knife CD if you use it. Shonen Knife
are a great Japanese band, much loved by the late Kurt
Cobain. Sometimes this sets the referrer page to the
band's home page at www.mmjp.or.jp/knife/ (or maybe
the users just happen to go there themselves).
SpotOn
www.spoton.com
(IE add-on that organizes your browsing)
SQ Webscanner
http://macinsearch.com/users/webscanner/
(on holiday last time I looked)
Squid
www.squid-cache.org
An open-source web proxy cache for Unix systems
Sqworm
Not 100% sure about this one. When it visited me it came
from the WebSense site 63.212.171.* (and a Google search shows
that others seem to see the same). At the WebSense site you
can find WebCatcher, a product used to monitor
employees' web-surfing habits (as near as I can tell).
But as I say, I'm not 100% sure...
www.websense.com/products/about/webcatcher/index.cfm
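To check whether a visitor in your own log falls inside a block like 63.212.171.*, a small Python sketch along these lines may help. It assumes the wildcard means the whole /24 network; the function and constant names are invented for this example.

```python
import ipaddress

# 63.212.171.* from the Sqworm entry, assumed to mean the full /24 block.
SQWORM_RANGE = ipaddress.ip_network("63.212.171.0/24")

def in_range(ip, network=SQWORM_RANGE):
    """Return True if the logged IP address string lies inside network."""
    try:
        return ipaddress.ip_address(ip) in network
    except ValueError:
        # Not a valid IP address (e.g. the log recorded a host name).
        return False
```

The same function works for any other block by passing a different network, for example in_range(ip, ipaddress.ip_network("10.0.0.0/8")).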
SurfControl
www.surfcontrol.com/products/web/default.aspx
content tracking product
Tagword
Tool that surveys the links in the Open Directory
at http://dmoz.org, checking their status etc.
See http://tagword.com/dmoz_survey.php
TaWWWantula
Unable to find
Tcl http client package
The default identifier for any software built using
the Tcl HTTP package
http://tcl.activestate.com/software/tcltk/
http://tcl.activestate.com/man/tcl8.0/TclCmd/http.htm
TeraCrawl
Unable to find
TurnitinBot
www.turnitin.com
Plagiarism prevention system. Crawls from 64.140.48.25
UCmore
www.ucmore.com
A browser plug-in (initially IE only) that searches for
related pages and categories. In my experience this
seems to entail accessing a favicon.ico file on a daily
basis (presumably to refresh the "favorites" list)
UdmSearch
http://search.mnogo.ru/
Search engine technology, as used at sites such as
www.maplesearch.com. Now called mnoGoSearch.
unlostBot
unlostBot@unlost.com
www.unlost.com is "under construction". The robot came
from IP address 212.37.219.147, which is in France.
URLBlaze
File/web search utility www.urlblaze.net
utopy
crawler@utopy.com
Coming soon at www.utopy.com (requires Flash). This
venture-capital funded site is "running in stealth mode"
before launching the "new new thing" (is that a typo?).
One of the Flash pages defines Utopia (geddit?), and some
of the browsing is done by IP addresses at ...myutopy.com.
UtilMind HTTPGet
A component intended for downloading pages from the web using
the standard Microsoft Windows Internet library (wininet.dll).
Listed on www.utilmind.com/delphi2.html
UrlScope
Unable to find
Vagabondo
Appears to be a log analyzer for Russian BBS systems.
(I may have got that wrong). I found reference to
it being copyright John Gladkih 1998, but I've not found
any URL that gives