Snitz Forums 2000
 Auto Update Spider ID MOD : Active Users
Chuck McB
Junior Member

WooYay
196 Posts

Posted - 14 August 2004 : 10:27:40
/me puts hand up to RSS

Have added it to the script (also added your AdSense tweak).

What do you make of these UAs:

User Agent String = "Java/1.5.0-beta2"
Agent="Java/1.4.2_04"
Agent="Java/1.4.2-beta"
and also this one:
User Agent String = "irbot.Spider"

aspwiz
Junior Member

250 Posts

Posted - 14 August 2004 : 14:59:17
those are rss aggregators...

Also 'soap clients' and 'news is free' as well as MS's xml components.

aspwiz
Junior Member

250 Posts

Posted - 14 August 2004 : 15:00:36
Chuck.... most new UAs you are notified of from my forums will likely be RSS aggregators....

I have just added XML feeds for everything, including portal mod content.

Take a look!

Edited by - aspwiz on 14 August 2004 15:01:08

Chuck McB
Junior Member

WooYay
196 Posts

Posted - 14 August 2004 : 15:20:09
quote: "I have just added XML feeds for everything, including portal mod content."
Yup, saw that :)

quote: "those are rss aggregators..."
Are we comfortable saying that this UA would only be used by RSS aggregators rather than anything else, say a Java-driven browser?

aspwiz
Junior Member

250 Posts

Posted - 14 August 2004 : 20:26:50
Have you seen any java driven browsers?
I can see from active users that it's only the RSS feed they are hitting, so I think it's safe to assume the majority will be aggregators.

Maybe we should fire fscript and fquery across in those emails so we can see what page unknowns are spidering.... It would be safe to assume an unknown looking at an RSS feed would most likely be an aggregator.
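
A minimal sketch of what such a notification body could look like. It assumes fscript and fquery hold the requested script name and query string; the helper name BuildUnknownUaReport and the sample path are made up for illustration and are not taken from the mod.

```vbscript
Option Explicit

' Hypothetical helper: builds the text of an "unknown user agent" email,
' including the page that the unknown agent requested.
Function BuildUnknownUaReport(ua, scriptName, queryString)
    Dim page
    page = scriptName
    If Len(queryString) > 0 Then page = page & "?" & queryString
    BuildUnknownUaReport = "User Agent String = """ & ua & """" & vbCrLf & _
                           "Page requested = " & page
End Function

' In an ASP page the arguments would normally come from
'   Request.ServerVariables("HTTP_USER_AGENT")
'   Request.ServerVariables("SCRIPT_NAME")
'   Request.ServerVariables("QUERY_STRING")
' The values below are sample data (the path is invented).
WScript.Echo BuildUnknownUaReport("Java/1.4.2_04", "/forum/rss.asp", "forum_id=3")
```

The last line uses WScript.Echo so the sketch can be tried with cscript; inside an ASP page the returned string would go into the mail body instead.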

aspwiz
Junior Member

250 Posts

Posted - 15 August 2004 : 10:08:15
Can you add detection of the following at the start of the UA....
yahooFeedSeeker/1.0 (compatible 4.0; MSIE 5.5;
This currently shows as IE 5.5 ... but is in fact the yahoo rss feed crawler.

Cheers.
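
One way the request above could be handled, as a sketch only: test for the feed crawler's prefix before the generic MSIE check, and compare without regard to case. StartsWithText is a hypothetical helper name, not something from the mod.

```vbscript
Option Explicit

' True when ua begins with prefix, ignoring upper/lower case.
Function StartsWithText(ua, prefix)
    StartsWithText = (StrComp(Left(ua, Len(prefix)), prefix, vbTextCompare) = 0)
End Function

Dim ua
ua = "yahooFeedSeeker/1.0 (compatible 4.0; MSIE 5.5;"

' The prefix test runs first, so the "MSIE 5.5" token further along
' the string can no longer cause the crawler to be reported as IE.
If StartsWithText(ua, "YahooFeedSeeker/") Then
    WScript.Echo "Yahoo RSS Feed Seeker"
ElseIf InStr(ua, "MSIE") > 0 Then
    WScript.Echo "Internet Explorer"
End If
```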

Chuck McB
Junior Member

WooYay
196 Posts

Posted - 15 August 2004 : 11:02:32
Done.

aspwiz
Junior Member

250 Posts

Posted - 15 August 2004 : 13:22:36
Hey chuck....

This is probably a lot more accurate:

I've changed the display type of some entries from spider to RSS Aggregator, and a few others I recognise... (like the log analyser)...

here it is chuck:-
AgentCsvStr = 	"Mediapartners-Google||GOOGLE ADSENSE||ADSENSE CRAWLER**" & _
			"Yahoo! Slurp||YAHOO||SEARCH ENGINE SPIDER**" & _
			"Googlebot||GOOGLE||SEARCH ENGINE SPIDER**" & _
			"NPBot||Name Protect||Domain Research Bot**" & _
			"Scooter||ALTAVISTA||SEARCH ENGINE SPIDER**" & _
			"yahooFeedSeeker/||Yahoo RSS Feed Seeker||YAHOO RSS AGGREGATOR**" & _
			"ia_archiver||ALEXA||SEARCH ENGINE SPIDER**" & _
			"FAST-WebCrawler||ALL THE WEB||SEARCH ENGINE SPIDER**" & _
			"Speedy Spider||ENTIRE WEB||SEARCH ENGINE SPIDER**" & _
			"ArchitextSpider||EXCITE||SEARCH ENGINE SPIDER**" & _
			"ArchitectSpider||EXCITE||SEARCH ENGINE SPIDER**" & _
			"Ask Jeeves/Teoma||ASK JEEVES / TEOMA||SEARCH ENGINE SPIDER**" & _
			"Slurp/||INKTOMI||SEARCH ENGINE SPIDER**" & _
			"UltraSeek||INFOSEEK||SEARCH ENGINE SPIDER**" & _
			"InfoSeek Sidewinder||INFOSEEK||SEARCH ENGINE SPIDER**" & _
			"MantraAgent||LOOKSMART||SEARCH ENGINE SPIDER**" & _
			"Lycos_Spider_(T-Rex)||LYCOS||SEARCH ENGINE SPIDER**" & _
			"HenryTheMiragoRobot||MIRAGO||SEARCH ENGINE SPIDER**" & _
			"MSNBOT/0.1||MSN SEARCH||SEARCH ENGINE SPIDER**" & _
			"msnbot/0.11||MSN SEARCH||SEARCH ENGINE SPIDER**" & _
			"Gulliver||NORTHERN LIGHT||SEARCH ENGINE SPIDER**" & _
			"Scrubby||SCRUB THE WEB||SEARCH ENGINE SPIDER**" & _
			"teoma_agent1||TEOMA||SEARCH ENGINE SPIDER**" & _
			"marvin/infoseek||WEBSEEK||SEARCH ENGINE SPIDER**" & _
			"SlySearch/1.3||SLYSEARCH||SEARCH ENGINE SPIDER**" & _
			"Szukacz||SZUKACZ.PL||SEARCH ENGINE SPIDER**" & _
			"IE 5.5 Compatible Browser||IE 5.5 Compatible Browser||Unknown**" & _
			"almaden||IBM||RESEARCH BOT**" & _
			"Google CHTML Proxy/1.0||GOOGLE PROXY SERVER||PROXY SERVER**" & _
			"http://grub.org||GRUB||RESEARCH BOT**" & _
			"NutchOrg||NUTCH||OPEN SOURCE SPIDER**" & _
			"InternetSeer.com||INTERNET SEER||WEBSITE MONITORING SERVICE**" & _
			"Baiduspider+||BAIDU||SEARCH ENGINE SPIDER**" & _
			"Xenu Link Sleuth||XENU LINK CHECKER||DEAD LINK CHECKER**" & _
			"Mozilla/5.||Mozilla 5.x||**" & _
			"Mozilla/4.||Mozilla 4.x||**" & _
			"Mozilla/3.||Mozilla 3.x||**" & _
			"MS FrontPage 4.0||MS FrontPage 4.0||**" & _
			"WebTrends/3.||WebTrends 3.0||LOG FILE ANALYSER**" & _
			"FavOrg||FavOrg||Favicons Manager**" & _
			"JoeDog/1.||www.joedog.org/siege/||WEB SITE TESTER**" & _
			"NetMonitor/||NetMonitor||WEBSITE MONITORING SERVICE**" & _
			"TurnitinBot/||TurnItInBot||PLAGIARISM RESEARCH BOT**" & _
			"dloader(NaverRobot)||NAVER ROBOT||KOREA TELECOM**" & _
			"NaverBot-1.0||NAVER ROBOT||KOREA TELECOM**" & _
			"ZyBorg/||ZYBORG||DEAD LINK CHECKER**" & _
			"QuepasaCreep||QUEPASA.COM||SEARCH ENGINE SPIDER**" & _
			"Microsoft URL Control||POSSIBLE EMAIL COLLECTOR||POSSIBLE VULNERABILITY SCANNER**" & _
			"Google WAP Proxy||wap.google.com||GOOGLE WAP SEARCH ENGINE**" & _
			"Avant Browser||Avant Browser||**" & _
			"Openbot/3.0||Openfind.com||Prototype Web-crawling robot**" & _
			"Wget/1.8.2||GNU wget||WEBSITE SCRAPER**" & _
			"Gigabot/||Gigabot||SEARCH ENGINE SPIDER**" & _
			"Jetbot/1||Jetbot||SEARCH ENGINE SPIDER**" & _
			"Feedster Crawler||www.feedster.com||RSS AGGREGATOR**" & _
			"Sqworm||Sqworm||SEARCH ENGINE SPIDER**" & _
			"sohu-search||sohu-search||SEARCH ENGINE SPIDER**" & _
			"IECheck||IECheck||IECheck**" & _
			"Acme.Spider||Acme.Spider||SEARCH ENGINE SPIDER**" & _
			"GetRight/4.5e||GetRight Browser||DOWNLOAD MANAGER**" & _
			"appie 1.1||www.walhello.com||SEARCH ENGINE SPIDER**" & _
			"LinkWalker||www.seventwentyfour.com||DEAD LINK CHECKER**" & _
			"Links SQL||Links SQL||SEARCH ENGINE SPIDER**" & _
			"PlantyNet_WebRobot||PlantyNet||SEARCH ENGINE SPIDER**" & _
			"exactseek-pagereaper-||Exactseek||SEARCH ENGINE SPIDER**" & _
			"Websquash.com||Websquash||SEARCH ENGINE SPIDER**" & _
			"Marvin v0.3||www.hon.ch/MedHunt/Marvin.html||SEARCH ENGINE SPIDER**" & _
			"TAMU_CS_IRL_CRAWLER/||http://irl-crawler.cs.tamu.edu/||RESEARCH BOT**" & _
			"FAST Enterprise Crawler/||FAST Enterprise Crawler||SEARCH ENGINE SPIDER**" & _
			"W3C_Validator||http://validator.w3.org/||W3C Validator**" & _
			"Iltrovatore-Setaccio/||www.iltrovatore.it||SEARCH ENGINE SPIDER**" & _
			"FAST Enterprise Crawler||Fastsearch||SEARCH ENGINE SPIDER**" & _
			"Netcraft Web Server Survey||Netcraft||WEB SERVER MONITOR**" & _
			"Technoratibot||www.technorati.com||RSS AGGREGATOR**" & _
			"Zealbot||www.zeal.com||SEARCH ENGINE SPIDER**" & _
			"NIF/||News Is Free RSS AGGREGATOR||RSS AGGREGATOR**" & _
			"Syndic8/||Syndic8 RSS AGGREGATOR||RSS AGGREGATOR**" & _
			"Pluck Soap Client/||Pluck RSS AGGREGATOR||RSS AGGREGATOR**" & _
			"Anonymization.Net||Anonymous Web Surfing||WEB ANONYMIZER**" & _
			"lmspider||lmspider lmspider@scansoft.com||Speech Recognition Research Bot**"

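The list above appears to pack each entry as token||display name||type, with ** between entries. How the mod itself reads the string is not shown in this thread, so the following is only a sketch of one way to do it; LookupAgent is a hypothetical name.

```vbscript
Option Explicit

' Returns Array(displayName, agentType) for the first entry whose token
' occurs anywhere in ua (case-insensitive), or an empty array if none does.
Function LookupAgent(ua, csv)
    Dim entries, fields, i
    LookupAgent = Array()
    entries = Split(csv, "**")              ' one element per agent entry
    For i = 0 To UBound(entries)
        If Len(entries(i)) > 0 Then         ' skip the empty tail after the last **
            fields = Split(entries(i), "||")
            If UBound(fields) >= 2 Then     ' token, name, type all present
                If InStr(1, ua, fields(0), vbTextCompare) > 0 Then
                    LookupAgent = Array(fields(1), fields(2))
                    Exit Function
                End If
            End If
        End If
    Next
End Function

Dim sample, hit
sample = "Googlebot||GOOGLE||SEARCH ENGINE SPIDER**" & _
         "Mozilla/4.||Mozilla 4.x||**"
hit = LookupAgent("Googlebot/2.1 (+http://www.googlebot.com/bot.html)", sample)
If UBound(hit) >= 0 Then WScript.Echo hit(0) & " / " & hit(1)
```

Because the first match wins, the order of entries matters: specific tokens need to sit above catch-alls such as Mozilla/4.
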


Chuck McB
Junior Member

WooYay
196 Posts

Posted - 15 August 2004 : 14:06:23
Updated, cheers.

aspwiz
Junior Member

250 Posts

Posted - 16 August 2004 : 04:31:16
hmmm... Yahoo Feed Seeker ain't working...
for 2 reasons...
a) it needs a capital Y at the start...
b) it needs the IE detection changed from this:-
elseif instr(ua, "MSIE") then
to this
elseif instr(ua, "MSIE") and instr(ua,"YahooFeedSeeker/") = 0 then

I'm considering reworking things so that the UserAgent Array is handled first...
if no matches, then browser checks are done.... instead of other way round.
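
The reordering described here could look roughly like the sketch below: known agent tokens are tried first, and the generic browser checks run only when none of them matched. ClassifyUa, the three sample tokens and the returned labels are illustrative assumptions; a case-insensitive InStr also sidesteps the capital Y problem from point a).

```vbscript
Option Explicit

Function ClassifyUa(ua)
    Dim known, i
    ' Step 1: the user agent array (a few tokens borrowed from this thread).
    known = Array("YahooFeedSeeker/", "Googlebot", "Feedster Crawler")
    For i = 0 To UBound(known)
        If InStr(1, ua, known(i), vbTextCompare) > 0 Then
            ClassifyUa = "agent:" & known(i)
            Exit Function
        End If
    Next
    ' Step 2: browser checks, reached only when no agent token matched,
    ' so no special-case exclusion is needed on the MSIE test.
    If InStr(ua, "MSIE") > 0 Then
        ClassifyUa = "browser:IE"
    Else
        ClassifyUa = "unknown"
    End If
End Function

WScript.Echo ClassifyUa("yahooFeedSeeker/1.0 (compatible 4.0; MSIE 5.5;")
WScript.Echo ClassifyUa("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")
```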

Also, I'm thinking of making the script update its useragent list via an XML feed.
Of course.... this will be requested only once per session, and cached.
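
A rough sketch of the once-per-session idea, meant to live inside an ASP page (it relies on the Session and Server objects). The feed URL, the session key UaFeedXml and the function name are placeholders; the thread does not specify any of them.

```vbscript
' Hypothetical feed location; no real URL is given in this thread.
Const UA_FEED_URL = "http://example.com/useragents.xml"

' Returns the feed XML as a string, or "" if it could not be fetched.
Function GetAgentFeedXml()
    Dim http
    ' Reuse the copy fetched earlier in this session, if there is one.
    If Not IsEmpty(Session("UaFeedXml")) Then
        GetAgentFeedXml = Session("UaFeedXml")
        Exit Function
    End If

    GetAgentFeedXml = ""
    On Error Resume Next            ' a failed download must not break the page
    Set http = Server.CreateObject("MSXML2.ServerXMLHTTP.6.0")
    http.open "GET", UA_FEED_URL, False
    http.send
    If Err.Number = 0 Then
        If http.status = 200 Then GetAgentFeedXml = http.responseText
    End If
    On Error GoTo 0
    Set http = Nothing

    ' Store the result (even an empty one) so the request happens
    ' at most once per session.
    Session("UaFeedXml") = GetAgentFeedXml
End Function
```

When the function returns an empty string, the caller would presumably fall back to the built-in AgentCsvStr list.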

Feedback anyone?

Edited by - aspwiz on 16 August 2004 04:31:51

Chuck McB
Junior Member

WooYay
196 Posts

Posted - 16 August 2004 : 08:09:42
quote: "Also, I'm thinking of making the script update its useragent list via an XML feed. Of course.... this will be requested only once per session, and cached."

Sounds logical.

aspwiz
Junior Member

250 Posts

Posted - 16 August 2004 : 09:47:23
Is everyone here running speedball 2?

masterao
Senior Member

Sweden
1678 Posts

Posted - 16 August 2004 : 15:21:43
Not me.

Jan
===========
FR Portal Forums | Active Users 4.0.20 Mod

aspwiz
Junior Member

250 Posts

Posted - 16 August 2004 : 15:27:58
Chuck....

I will be doing an updated version, but for speedball 2 (encompassing xml, etc).

Would you want to make the backward changes for Non speedball?

I'll let you have the files once done.

I could do it so that when a new UA is encountered, it sends the data to the host UA server as XML, and a new entry in the XML is made..... Obviously, a manual edit of the data to establish spider, browser, aggregator, etc. would be necessary.
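
As an illustration of the reporting half of that idea, here is a sketch that builds the XML for a newly seen UA. The element names (useragent, string, page, type), the function name and the sample page path are all invented for the example; the real feed format is not described in this thread.

```vbscript
Option Explicit

' Builds an XML fragment describing a user agent that matched nothing.
Function BuildNewUaXml(ua, page)
    Dim doc, root, node
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    Set root = doc.createElement("useragent")
    doc.appendChild root

    Set node = doc.createElement("string")
    node.text = ua                  ' MSXML escapes &, < and > on output
    root.appendChild node

    Set node = doc.createElement("page")
    node.text = page
    root.appendChild node

    ' Left empty on purpose: a person decides later whether the entry
    ' is a spider, browser, aggregator, etc.
    Set node = doc.createElement("type")
    root.appendChild node

    BuildNewUaXml = doc.xml
End Function

WScript.Echo BuildNewUaXml("irbot.Spider", "/forum/rss.asp")
```

Inside ASP, Server.CreateObject would replace CreateObject, and the resulting string could be posted to the host server with a ServerXMLHTTP request.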

Chuck McB
Junior Member

WooYay
196 Posts

Posted - 16 August 2004 : 17:32:59
If you have the time etc, but I'm thinking that there aren't that many widely used spiders left out there.