Snitz Forums 2000
 Googlebot flippin' ?

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 13 June 2005 :  08:45:26
Hi there, I've been having some problems with Googlebot for the last 3 days...
They've pulled over 14000 pageviews totalling 1.11 gigabytes since last Saturday, and have been ignoring my robots.txt file (requests to post.asp alone account for about 836 megs).... (See here for a more detailed description)
I'm curious whether any of you have seen the same behaviour over the last couple of days....
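
A side note on the robots.txt mentioned above: the file actually in use is not shown in this thread, so the following is only a minimal sketch of a rule that asks compliant crawlers to skip post.asp. The /forum/ path is an assumption; adjust it to wherever the forum really lives.

```
User-agent: *
Disallow: /forum/post.asp
```

Disallow rules match by URL prefix, so this also covers post.asp followed by any query string. The file has to be served from the site root as /robots.txt for crawlers to find it.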

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 13 June 2005 08:46:44

Podge
Support Moderator

Ireland
3775 Posts

Posted - 13 June 2005 :  08:57:30
I'll check my stats when I get home from work and let you know.

Are you sure that it is Googlebot and not an imposter? Is there any way you can pull from the logs the IPs of the requests whose user agent is Googlebot?
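
One way to answer that question from the raw logs is sketched below as a Windows Script Host script. It assumes the site logs in the IIS W3C extended format with the c-ip and cs(User-Agent) fields enabled; the script name and the log file name are placeholders, not anything taken from this thread.

```vbscript
' Sketch: count requests per client IP whose User-Agent contains "googlebot".
' Run with: cscript //nologo googlebot_ips.vbs <logfile>
' Assumes an IIS W3C extended log with c-ip and cs(User-Agent) columns.
Option Explicit

Dim fso, logFile, line, fields, cols, i
Dim ipCol, uaCol, counts, ip

If WScript.Arguments.Count < 1 Then
    WScript.Echo "Usage: cscript //nologo googlebot_ips.vbs <logfile>"
    WScript.Quit 1
End If

Set fso = CreateObject("Scripting.FileSystemObject")
Set counts = CreateObject("Scripting.Dictionary")
Set logFile = fso.OpenTextFile(WScript.Arguments(0), 1) ' 1 = ForReading
ipCol = -1
uaCol = -1

Do Until logFile.AtEndOfStream
    line = logFile.ReadLine
    If Left(line, 9) = "#Fields: " Then
        ' The header line names each column; locate the two we need.
        fields = Split(Mid(line, 10), " ")
        For i = 0 To UBound(fields)
            If fields(i) = "c-ip" Then ipCol = i
            If fields(i) = "cs(User-Agent)" Then uaCol = i
        Next
    ElseIf Left(line, 1) <> "#" And ipCol >= 0 And uaCol >= 0 Then
        ' IIS writes spaces inside a field as "+", so splitting on spaces is safe.
        cols = Split(line, " ")
        If UBound(cols) >= ipCol And UBound(cols) >= uaCol Then
            If InStr(LCase(cols(uaCol)), "googlebot") > 0 Then
                ip = cols(ipCol)
                counts(ip) = counts(ip) + 1
            End If
        End If
    End If
Loop
logFile.Close

' One line per IP: address, tab, number of requests.
For Each ip In counts.Keys
    WScript.Echo ip & vbTab & counts(ip)
Next
```

The resulting list of addresses can then be checked against Google's ranges or fed into a firewall rule. Matching on the user agent alone does not prove the requests come from Google, since any client can send that string.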

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.

pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 13 June 2005 :  09:11:02
Try challenging Google - they may deny it, and it may be an imposter as Podge suggests, but an imposter to what end is a mystery :(

Can you firewall the rogue IP block out?

wii
Free ASP Hosts Moderator

Denmark
2632 Posts

Posted - 13 June 2005 :  09:49:53
Yeah, several years ago I had problems with this. I contacted Google and they excluded my forum from their spiders, which is fine in my case, since it's a private forum anyway.

wii
Free ASP Hosts Moderator

Denmark
2632 Posts

Posted - 13 June 2005 :  09:53:34
Here's the topic:

http://forum.snitz.com/forum/topic.asp?ARCHIVE=true&TOPIC_ID=21990

Podge
Support Moderator

Ireland
3775 Posts

Posted - 13 June 2005 :  10:04:47
quote:
Try challenging Google - they may deny it, and it may be an imposter as Podge suggests, but an imposter to what end is a mystery :(

I'm not psychic. Sadly.

My point was to block the IP addresses whether or not they are Google's.

I think I read somewhere that Google introduced a new algorithm recently, which may explain the increased activity.

Podge.


pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 13 June 2005 :  10:09:25
haha

Yep, algorithms change frequently - they're the only way a search company gets any edge over the others. Really ought to respect the robots.txt though and 'play nice', so it's worth challenging Google anyway, in case they're inadvertently 'playing nasty'.

Podge
Support Moderator

Ireland
3775 Posts

Posted - 13 June 2005 :  10:36:36
quote:
Really ought to respect the robots.txt
That's why I thought it might not be Google.

Podge.


pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 13 June 2005 :  10:37:23
agreed :)

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 13 June 2005 :  10:47:41
At this moment I'm seeing 66.249.66.12 crawling the site.
That's 100% Google.

Tonight I'll have full access to my logfiles, so I'll dive in then and see what's been pulling so much lately.
I'll let you know what I come up with, and what response I got from Google.

Edited by - MarcelG on 13 June 2005 10:54:54

Podge
Support Moderator

Ireland
3775 Posts

Posted - 13 June 2005 :  12:31:56
Googlebot is just above normal for me:

Googlebot 121.76 MB 6.87%

It's normally around 5% of bandwidth.

Maybe they added a new file type which can now be indexed, and it's trying to download large files from your website?

Podge.


MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 13 June 2005 :  14:48:25
No, it's mainly post.asp that they've been requesting so many times....
Diving into the logfiles at this moment.

[edit]Dived into the logfiles... It's Google all right: stats with IP
Grr...
Now, how can I 'mod' my inc_header to redirect all incoming Googlebot traffic for post.asp to an empty file? (And prevent image parsing?)

Edited by - MarcelG on 13 June 2005 15:13:35

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 13 June 2005 :  15:51:54
I think I've done it.
I've implemented this neat piece of code in config.asp:
'Spider check
	' Assume the visitor is a normal browser until proven otherwise.
	dim isSpider, agent
	isSpider = 0
	' Testing aid only: ?spider=1 forces the spider behaviour.
	' Compare as a string, since the query-string value is text.
	if request("spider") = "1" then isSpider = 1
	' Lower-case the User-Agent header for comparison.
	agent = lcase(Request.ServerVariables("HTTP_USER_AGENT"))
	' Most bots identify themselves as libwww, java, perl, crawl or bot,
	' so flag any agent that contains one of those substrings.
	if instr(agent, "bot")  > 0 then isSpider = 1
	if instr(agent, "perl") > 0 then isSpider = 1
	if instr(agent, "java") > 0 then isSpider = 1
	if instr(agent, "libw") > 0 then isSpider = 1
	if instr(agent, "crawl") > 0 then isSpider = 1
'end spider check

Now I've got the isSpider value on every page.
So I edited inc_func_common.asp; in the function FormatStr(fString)
I added this part:
		if strIMGInPosts = "1" and isSpider = 0 then
			fString = ReplaceImageTags(fString)
		end if
So, no more image parsing for Googlebot.
And everywhere I didn't want Googlebot to go, I inserted this piece of code in the header, redirecting to an empty file.
For example in post.asp, right after the include of config.asp:
if isSpider = 1 then
	Server.Transfer "empty.asp"
end if

I also changed the linkshrinker so that it no longer redirects Googlebot requests (edited my 404.asp).
Now, we'll just have to wait and see.


Podge
Support Moderator

Ireland
3775 Posts

Posted - 13 June 2005 :  16:58:14
Nicely done. I tested it with Firefox's user agent switcher and it works flawlessly.

One thing (off topic) I thought I would bring to your attention:

The Flash Oxle logo on the top left of your forums behaves like a link but doesn't work when clicked.

Podge.


MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 14 June 2005 :  03:39:46
quote:
Originally posted by Podge
Nicely done. I tested it with Firefox's user agent switcher and it works flawlessly.
Nice plugin. I used it in Mozilla to test this too. I'm not sure whether Google accepts this kind of 'cloaking', but at least they won't be slurping my bandwidth now.
If anyone thinks my spider-detection script needs some improvement, please feel free to suggest changes.
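
As one possible tidy-up in that spirit, the substring checks could be gathered into a single function so that adding a new token is a one-line change. This is only a sketch: IsSpiderAgent is a hypothetical helper name, not part of Snitz, and it is meant to replace the inline checks in config.asp rather than sit alongside them.

```vbscript
' Sketch only: IsSpiderAgent is a hypothetical helper, not a Snitz function.
Function IsSpiderAgent(ByVal userAgent)
    Dim tokens, i
    ' Same substrings as the original check; extend the list as needed.
    tokens = Array("bot", "perl", "java", "libw", "crawl")
    IsSpiderAgent = False
    ' Appending "" turns a missing header into an empty string.
    userAgent = LCase(userAgent & "")
    For i = 0 To UBound(tokens)
        If InStr(userAgent, tokens(i)) > 0 Then
            IsSpiderAgent = True
            Exit Function
        End If
    Next
End Function

' Usage in config.asp, keeping the 0/1 values the other pages test for.
Dim isSpider
If IsSpiderAgent(Request.ServerVariables("HTTP_USER_AGENT")) Then
    isSpider = 1
Else
    isSpider = 0
End If
```

The detection logic is unchanged, so it is still a plain substring match on the User-Agent header and will flag any client whose agent string happens to contain one of the tokens.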

quote:
Originally posted by Podge
One thing (off topic) I thought I would bring to your attention:
The Flash Oxle logo on the top left of your forums behaves like a link but doesn't work when clicked.

Thanks for the tip; I'll look into it later today.


pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 14 June 2005 :  04:39:25
Splendid stuff - it'll be interesting to see what Google say about why they didn't respect your robots.txt!

Out of interest, how is your robots.txt configured?