Author |
Topic |
Chuck McB
Junior Member
WooYay
196 Posts |
|
aspwiz
Junior Member
250 Posts |
Posted - 14 August 2004 : 14:59:17
|
Those are RSS aggregators...
Also 'SOAP clients' and 'News Is Free', as well as MS's XML components. |
|
|
aspwiz
Junior Member
250 Posts |
Posted - 14 August 2004 : 15:00:36
|
Chuck.... most new UAs you are notified of from my forums will likely be RSS aggregators....
I have just added XML feeds for everything, including portal mod content.
Take a look! |
Edited by - aspwiz on 14 August 2004 15:01:08 |
|
|
Chuck McB
Junior Member
WooYay
196 Posts |
Posted - 14 August 2004 : 15:20:09
|
"I have just added XML feeds for everything, including portal mod content."
Yup, saw that :)
"those are rss aggregators..."
Are we comfortable saying that the UA would only be used by RSS aggregators, rather than anything else, such as a Java-driven browser? |
|
|
aspwiz
Junior Member
250 Posts |
Posted - 14 August 2004 : 20:26:50
|
Have you seen any Java-driven browsers? I can see from active users that it's only the RSS feed they are hitting, so I think it's safe to assume the majority will be aggregators.
Maybe we should fire fscript and fquery across in those emails so we can see what page unknowns are spidering.... It would be safe to assume an unknown looking at an RSS feed is most likely an aggregator. |
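The idea in the post above (include the requested script and query in the notification, and treat an unknown hitting a feed as a probable aggregator) can be sketched roughly as follows. This is a Python illustration, not the forum's VBScript; the function names and the `feed_markers` naming convention are assumptions made up for the example.

```python
def guess_unknown(script):
    """Guess what an unrecognised user agent is from the page it requested."""
    feed_markers = ("rss", "xml", "feed")  # hypothetical naming convention
    page = script.lower()
    if any(marker in page for marker in feed_markers):
        return "PROBABLE RSS AGGREGATOR"
    return "UNKNOWN"


def notification_line(ua, script, query):
    """Build one line of the notification email for an unknown user agent."""
    target = script + ("?" + query if query else "")
    return f"{ua} requested {target} ({guess_unknown(script)})"


print(notification_line("FooBot/1.0", "/forum/rss.asp", "forum_id=3"))
```

A guess like this only narrows things down; someone still has to confirm the category by hand.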
|
|
aspwiz
Junior Member
250 Posts |
Posted - 15 August 2004 : 10:08:15
|
Can you add detection of the following at the start of the UA:
yahooFeedSeeker/1.0 (compatible 4.0; MSIE 5.5;
This currently shows as IE 5.5, but is in fact the Yahoo RSS feed crawler.
Cheers. |
|
|
Chuck McB
Junior Member
WooYay
196 Posts |
Posted - 15 August 2004 : 11:02:32
|
Done. |
|
|
aspwiz
Junior Member
250 Posts |
Posted - 15 August 2004 : 13:22:36
|
Hey Chuck....
This is probably a lot more accurate:
I've changed the display type of some entries from spider to RSS Aggregator, along with a few others I recognise (like the log analyser).
Here it is, Chuck:
AgentCsvStr = "Mediapartners-Google||GOOGLE ADSENSE||ADSENSE CRAWLER**" & _
"Yahoo! Slurp||YAHOO||SEARCH ENGINE SPIDER**" & _
"Googlebot||GOOGLE||SEARCH ENGINE SPIDER**" & _
"NPBot||Name Protect||Domain Research Bot**" & _
"Scooter||ALTAVISTA||SEARCH ENGINE SPIDER**" & _
"yahooFeedSeeker/||Yahoo RSS Feed Seeker||YAHOO RSS AGGREGATOR**" & _
"ia_archiver||ALEXA||SEARCH ENGINE SPIDER**" & _
"FAST-WebCrawler||ALL THE WEB||SEARCH ENGINE SPIDER**" & _
"Speedy Spider||ENTIRE WEB||SEARCH ENGINE SPIDER**" & _
"ArchitextSpider||EXCITE||SEARCH ENGINE SPIDER**" & _
"ArchitectSpider||EXCITE||SEARCH ENGINE SPIDER**" & _
"Ask Jeeves/Teoma||ASK JEEVES / TEOMA||SEARCH ENGINE SPIDER**" & _
"Slurp/||INKTOMI||SEARCH ENGINE SPIDER**" & _
"UltraSeek||INFOSEEK||SEARCH ENGINE SPIDER**" & _
"InfoSeek Sidewinder||INFOSEEK||SEARCH ENGINE SPIDER**" & _
"MantraAgent||LOOKSMART||SEARCH ENGINE SPIDER**" & _
"Lycos_Spider_(T-Rex)||LYCOS||SEARCH ENGINE SPIDER**" & _
"HenryTheMiragoRobot||MIRAGO||SEARCH ENGINE SPIDER**" & _
"MSNBOT/0.1||MSN SEARCH||SEARCH ENGINE SPIDER**" & _
"msnbot/0.11||MSN SEARCH||SEARCH ENGINE SPIDER**" & _
"Gulliver||NORTHERN LIGHT||SEARCH ENGINE SPIDER**" & _
"Scrubby||SCRUB THE WEB||SEARCH ENGINE SPIDER**" & _
"teoma_agent1||TEOMA||SEARCH ENGINE SPIDER**" & _
"marvin/infoseek||WEBSEEK||SEARCH ENGINE SPIDER**" & _
"SlySearch/1.3||SLYSEARCH||SEARCH ENGINE SPIDER**" & _
"Szukacz||SZUKACZ.PL||SEARCH ENGINE SPIDER**" & _
"IE 5.5 Compatible Browser||IE 5.5 Compatible Browser||Unknown**" & _
"almaden||IBM||RESEARCH BOT**" & _
"Google CHTML Proxy/1.0||GOOGLE PROXY SERVER||PROXY SERVER**" & _
"http://grub.org||GRUB||RESEARCH BOT**" & _
"NutchOrg||NUTCH||OPEN SOURCE SPIDER**" & _
"InternetSeer.com||INTERNET SEER||WEBSITE MONITORING SERVICE**" & _
"Baiduspider+||BAIDU||SEARCH ENGINE SPIDER**" & _
"Xenu Link Sleuth||XENU LINK CHECKER||DEAD LINK CHECKER**" & _
"Mozilla/5.||Mozilla 5.x||**" & _
"Mozilla/4.||Mozilla 4.x||**" & _
"Mozilla/3.||Mozilla 3.x||**" & _
"MS FrontPage 4.0||MS FrontPage 4.0||**" & _
"WebTrends/3.||WebTrends 3.0||LOG FILE ANALYSER**" & _
"FavOrg||FavOrg||Favicons Manager**" & _
"JoeDog/1.||www.joedog.org/siege/||WEB SITE TESTER**" & _
"NetMonitor/||NetMonitor||WEBSITE MONITORING SERVICE**" & _
"TurnitinBot/||TurnItInBot||PLAGIARISM RESEARCH BOT**" & _
"dloader(NaverRobot)||NAVER ROBOT||KOREA TELECOM**" & _
"NaverBot-1.0||NAVER ROBOT||KOREA TELECOM**" & _
"ZyBorg/||ZYBORG||DEAD LINK CHECKER**" & _
"QuepasaCreep||QUEPASA.COM||SEARCH ENGINE SPIDER**" & _
"Microsoft URL Control||POSSIBLE EMAIL COLLECTOR||POSSIBLE VULNERABILITY SCANNER**" & _
"Google WAP Proxy||wap.google.com||GOOGLE WAP SEARCH ENGINE**" & _
"Avant Browser||Avant Browser||**" & _
"Openbot/3.0||Openfind.com||Prototype Web-crawling robot**" & _
"Wget/1.8.2||GNU wget||WEBSITE SCRAPER**" & _
"Gigabot/||Gigabot||SEARCH ENGINE SPIDER**" & _
"Jetbot/1||Jetbot||SEARCH ENGINE SPIDER**" & _
"Feedster Crawler||www.feedster.com||RSS AGGREGATOR**" & _
"Sqworm||Sqworm||SEARCH ENGINE SPIDER**" & _
"sohu-search||sohu-search||SEARCH ENGINE SPIDER**" & _
"IECheck||IECheck||IECheck**" & _
"Acme.Spider||Acme.Spider||SEARCH ENGINE SPIDER**" & _
"GetRight/4.5e||GetRight Browser||DOWNLOAD MANAGER**" & _
"appie 1.1||www.walhello.com||SEARCH ENGINE SPIDER**" & _
"LinkWalker||www.seventwentyfour.com||DEAD LINK CHECKER**" & _
"Links SQL||Links SQL||SEARCH ENGINE SPIDER**" & _
"PlantyNet_WebRobot||PlantyNet||SEARCH ENGINE SPIDER**" & _
"exactseek-pagereaper-||Exactseek||SEARCH ENGINE SPIDER**" & _
"Websquash.com||Websquash||SEARCH ENGINE SPIDER**" & _
"Marvin v0.3||www.hon.ch/MedHunt/Marvin.html||SEARCH ENGINE SPIDER**" & _
"TAMU_CS_IRL_CRAWLER/||http://irl-crawler.cs.tamu.edu/||RESEARCH BOT**" & _
"FAST Enterprise Crawler/||FAST Enterprise Crawler||SEARCH ENGINE SPIDER**" & _
"W3C_Validator||http://validator.w3.org/||W3C Validator**" & _
"Iltrovatore-Setaccio/||www.iltrovatore.it||SEARCH ENGINE SPIDER**" & _
"FAST Enterprise Crawler||Fastsearch||SEARCH ENGINE SPIDER**" & _
"Netcraft Web Server Survey||Netcraft||WEB SERVER MONITOR**" & _
"Technoratibot||www.technorati.com||RSS AGGREGATOR**" & _
"Zealbot||www.zeal.com||SEARCH ENGINE SPIDER**" & _
"NIF/||News Is Free RSS AGGREGATOR||RSS AGGREGATOR**" & _
"Syndic8/||Syndic8 RSS AGGREGATOR||RSS AGGREGATOR**" & _
"Pluck Soap Client/||Pluck RSS AGGREGATOR||RSS AGGREGATOR**" & _
"Anonymization.Net||Anonymous Web Surfing||WEB ANONYMIZER**" & _
"lmspider||lmspider lmspider@scansoft.com||Speech Recognition Research Bot**"
|
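For anyone reading the list format: each entry ends with `**`, and the three fields inside an entry (match token, display name, display type) are separated by `||`. Below is a minimal Python sketch of parsing and lookup, not the forum's actual VBScript. It assumes substring matching and that the first matching entry wins; the thread confirms neither. Only three entries from the list are reproduced, and `parse_agents` and `lookup` are names invented for this example.

```python
# Three entries copied from the list above, in the same "token||name||type**" layout.
AGENT_CSV = (
    "Googlebot||GOOGLE||SEARCH ENGINE SPIDER**"
    "Feedster Crawler||www.feedster.com||RSS AGGREGATOR**"
    "Mozilla/4.||Mozilla 4.x||**"
)


def parse_agents(csv_str):
    """Split on '**' into entries, then on '||' into (token, name, kind)."""
    agents = []
    for entry in csv_str.split("**"):
        if not entry:
            continue  # skip the empty piece left after the trailing '**'
        token, name, kind = entry.split("||")
        agents.append((token, name, kind))
    return agents


def lookup(ua, agents):
    """Return (name, kind) for the first entry whose token occurs in ua."""
    for token, name, kind in agents:
        if token in ua:  # case-sensitive substring test (assumed behaviour)
            return name, kind
    return None


agents = parse_agents(AGENT_CSV)
print(lookup("Googlebot/2.1 (+http://www.google.com/bot.html)", agents))
```

If first-match-wins is indeed how the script behaves, entry order matters: generic tokens such as `Mozilla/4.` need to stay below the specific spider entries.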
|
|
Chuck McB
Junior Member
WooYay
196 Posts |
Posted - 15 August 2004 : 14:06:23
|
Updated, cheers. |
|
|
aspwiz
Junior Member
250 Posts |
Posted - 16 August 2004 : 04:31:16
|
Hmmm... Yahoo Feed Seeker isn't working, for two reasons:
a) It needs a capital Y at the start.
b) It needs the IE detection changed from this:
elseif instr(ua, "MSIE") then
to this:
elseif instr(ua, "MSIE") and instr(ua,"YahooFeedSeeker/") = 0 then
I'm considering reworking things so that the UserAgent array is checked first, and the browser checks are done only if nothing matches, instead of the other way round.
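A rough Python sketch of that ordering (agent list first, browser checks second) shows why it would make the extra `YahooFeedSeeker/` exclusion in the MSIE test unnecessary. The function name, the "BROWSER" label and the sample user agent strings are illustrative assumptions, not taken from the script.

```python
# Two entries in (token, name, kind) form; the token uses the capital Y noted above.
KNOWN_AGENTS = [
    ("YahooFeedSeeker/", "Yahoo RSS Feed Seeker", "YAHOO RSS AGGREGATOR"),
    ("Googlebot", "GOOGLE", "SEARCH ENGINE SPIDER"),
]


def detect(ua):
    """Classify a user agent: known-agent list first, browser checks second."""
    # Stage 1: the user agent list.
    for token, name, kind in KNOWN_AGENTS:
        if token in ua:
            return name, kind
    # Stage 2: generic browser checks, reached only when nothing matched above.
    if "MSIE" in ua:
        return "Internet Explorer", "BROWSER"  # label is hypothetical
    return "Unknown", "UNKNOWN"


# Shortened, made-up strings for illustration only.
print(detect("YahooFeedSeeker/1.0 (MSIE 5.5)"))
print(detect("Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"))
```

On the capital Y problem: VBScript's `InStr` does a binary (case-sensitive) comparison by default, just like the `in` test here. Passing `vbTextCompare` as its compare argument would make the match case-insensitive.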
Also, I'm thinking of making the script update its useragent list via an XML feed. Of course.... this will be requested only once per session, and cached.
Feedback anyone? |
Edited by - aspwiz on 16 August 2004 04:31:51 |
|
|
Chuck McB
Junior Member
WooYay
196 Posts |
Posted - 16 August 2004 : 08:09:42
|
"Also, I'm thinking of making the script update its useragent list via an XML feed. Of course.... this will be requested only once per session, and cached."
Sounds logical. |
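The once-per-session caching idea quoted above might look something like this Python sketch. The feed layout (`<agents>` with `<agent>` elements), `fetch_feed` and the session key are all assumptions; the thread does not specify a format, and a real version would make an HTTP request where the stand-in returns a fixed string.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout, invented for this example.
SAMPLE_FEED = """<agents>
  <agent token="Googlebot" name="GOOGLE" kind="SEARCH ENGINE SPIDER"/>
  <agent token="NIF/" name="News Is Free RSS AGGREGATOR" kind="RSS AGGREGATOR"/>
</agents>"""


def fetch_feed():
    """Stand-in for the request to the list server; counts how often it runs."""
    fetch_feed.calls += 1
    return SAMPLE_FEED


fetch_feed.calls = 0


def get_agents(session):
    """Return the agent list, fetching and parsing the feed once per session."""
    if "ua_agents" not in session:
        root = ET.fromstring(fetch_feed())
        session["ua_agents"] = [
            (a.get("token"), a.get("name"), a.get("kind"))
            for a in root.findall("agent")
        ]
    return session["ua_agents"]


session = {}  # plays the role of the per-visitor session store
get_agents(session)
get_agents(session)
print(fetch_feed.calls)  # the feed was requested only once
```

Caching per session keeps the load on the list server low, at the cost of a visitor not seeing list updates until their next session.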
|
|
aspwiz
Junior Member
250 Posts |
Posted - 16 August 2004 : 09:47:23
|
Is everyone here running speedball 2? |
|
|
masterao
Senior Member
Sweden
1678 Posts |
|
aspwiz
Junior Member
250 Posts |
Posted - 16 August 2004 : 15:27:58
|
Chuck....
I will be doing an updated version, but for speedball 2 (encompassing xml, etc).
Would you want to make the backward changes for non-Speedball?
I'll let you have the files once done.
I could do it so that when a new UA is encountered, it sends the data to the host UA server as XML, and a new entry in the XML is made..... obviously, manual editing of the data to establish spider, browser, aggregator, etc. would be necessary. |
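As a rough illustration of the submission step described above, here is a Python sketch that builds the XML for an unrecognised user agent and leaves the category empty for manual editing. The element names (`newagent`, `ua`, `script`, `query`, `kind`) and the function name are invented for the example, and the actual sending is left out.

```python
import xml.etree.ElementTree as ET


def build_report(ua, script, query):
    """Build the XML payload for a user agent the local list did not match."""
    root = ET.Element("newagent")
    ET.SubElement(root, "ua").text = ua
    ET.SubElement(root, "script").text = script
    ET.SubElement(root, "query").text = query
    # Category stays empty: spider, browser or aggregator is decided by hand later.
    ET.SubElement(root, "kind").text = ""
    return ET.tostring(root, encoding="unicode")


print(build_report("FooBot/1.0", "/forum/rss.asp", "forum_id=3"))
```

Building the payload with an XML library rather than string concatenation means characters such as `&` or `<` in a user agent string are escaped automatically.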
|
|
Chuck McB
Junior Member
WooYay
196 Posts |
Posted - 16 August 2004 : 17:32:59
|
If you have the time etc., but I'm thinking that there aren't that many widely used spiders left out there. |
|
|