Snitz Forums 2000
 How long are webcrawlers normally active?

jgs
New Member

Netherlands
95 Posts

Posted - 26 January 2009 :  10:59:42
I've added the Active Users mod, and it monitors all activity, including users who are not logged in.
All day yesterday and last night there was a lot of activity from two web bots: one from Google and, according to an IP lookup, one from Wiseguys.
I always thought bots were very quick visitors, but I noticed they stayed for over 30 minutes at a time (the inactivity timeout is set to 15 minutes!). They even tried replying to topics, hitting the same topic on several visits.

Is this normal behaviour?
I'm asking because yesterday everything was working fine, but this morning there were several bugs and I had to replace several files with the latest copies I had, all of which were smaller than the ones on the server.

Info about my forum: http://www.govvd.nl/forumsoftware.htm list of Mods included.
Most of the user interface translated into Dutch.

Edited by - jgs on 26 January 2009 11:03:20

Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 26 January 2009 :  11:14:36
Forums can occasionally be a bit of a "bot-trap", catching them in loops for extended periods of time. They try to follow every link they find on a page so, if you count the number of links to post.asp on this page alone, each with a different querystring, you can see how a bot can spend so long trying to index a forum. The best thing to do is to use your robots.txt to prevent them from following links to pages they shouldn't be indexing, such as post.asp - they only really need to index topic.asp and, if you want, default.asp & forum.asp.

Search is your friend
“I was having a mildly paranoid day, mostly due to the
fact that the mad priest lady from over the river had
taken to nailing weasels to my front door again.”

jgs
New Member

Netherlands
95 Posts

Posted - 26 January 2009 :  12:22:56
Thanks for the link. I noticed there is no "Allow" directive. Is there perhaps a ready-made robots.txt for Snitz forums?


Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 26 January 2009 :  13:25:27
No, "Allow" is not part of the standard; there are some bots that will recognise it but you shouldn't rely on it.

I don't know of a ready-made file but, if I've got a few minutes tomorrow, I'll try to knock one together.


jgs
New Member

Netherlands
95 Posts

Posted - 26 January 2009 :  13:41:15
That would be a handy add-on for Snitz; I just had another bot stuck in the still-empty guestbook for 40 minutes.
They seem to have particular trouble with topics that contain links.


Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 28 January 2009 :  06:00:08
Here's something quick I knocked together for you; I want to put a bit more work into it before I release it "officially". Some notes:

- It includes files (such as those for the admin options and the include files) that a bot should never know exist - I just like to be thorough ;)

- It doesn't include any mod-related files, you'll need to add those in yourself.

- All occurrences of "forumdirectory" below will need to be changed to the name of your own forum directory.

- If you don't want bots indexing default.asp or forum.asp, uncomment the relevant lines, but please bear in mind that this may have an impact on a bot's ability to access individual topics.

Here's the file:
User-agent: *
Disallow: /forumdirectory/active.asp
Disallow: /forumdirectory/admin_
#Disallow: /forumdirectory/default.asp
Disallow: /forumdirectory/default_group.asp
Disallow: /forumdirectory/down.asp
Disallow: /forumdirectory/faq.asp
#Disallow: /forumdirectory/forum.asp
Disallow: /forumdirectory/inc_
Disallow: /forumdirectory/link.asp
Disallow: /forumdirectory/members.asp
Disallow: /forumdirectory/moderate.asp
Disallow: /forumdirectory/password.asp
Disallow: /forumdirectory/policy.asp
Disallow: /forumdirectory/pop_
Disallow: /forumdirectory/post
Disallow: /forumdirectory/register.asp
Disallow: /forumdirectory/search.asp
Disallow: /forumdirectory/setup
Disallow: /forumdirectory/subscription_list.asp
Disallow: /forumdirectory/readme.htm
Disallow: /forumdirectory/gpl.txt
Disallow: /forumdirectory/*.gif
Disallow: /forumdirectory/*.js
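
One way to sanity-check rules like these before uploading them is to feed them to a robots.txt parser and ask which URLs would be blocked. The following is only a minimal sketch in Python using the standard library's urllib.robotparser; the shortened rule set, the "ExampleBot" agent name and the sample querystrings are all made up for illustration, and real crawlers may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules; "forumdirectory" is still the placeholder name.
RULES = """\
User-agent: *
Disallow: /forumdirectory/admin_
Disallow: /forumdirectory/members.asp
Disallow: /forumdirectory/pop_
Disallow: /forumdirectory/post
Disallow: /forumdirectory/search.asp
"""


def build_parser(rules_text):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def blocked_paths(parser, paths, agent="ExampleBot"):
    """Return the paths that the given (hypothetical) bot may not fetch."""
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical sample URLs; the querystrings are invented for the example.
SAMPLE_PATHS = [
    "/forumdirectory/topic.asp?TOPIC_ID=1",
    "/forumdirectory/forum.asp?FORUM_ID=1",
    "/forumdirectory/post.asp?method=Reply&TOPIC_ID=1",
    "/forumdirectory/pop_profile.asp?mode=display&id=1",
    "/forumdirectory/admin_home.asp",
]

for path in blocked_paths(build_parser(RULES), SAMPLE_PATHS):
    print("blocked:", path)
```

With these rules the topic.asp and forum.asp URLs stay fetchable, while the post.asp, pop_ and admin_ URLs are reported as blocked, because each Disallow line is treated as a path prefix.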

jgs
New Member

Netherlands
95 Posts

Posted - 28 January 2009 :  06:33:50
Thanks, I hope it helps. They are getting a bit annoying; Google and Yahoo especially hang around for hours.
It's not that I have anything against search engines (Google is my friend), but I don't like useless exercises; they're just a waste of energy, so if I can help prevent them, I will.
Good to know you can use wildcards (*).


Edited by - jgs on 28 January 2009 06:44:47

pitstraight
New Member

Australia
82 Posts

Posted - 28 January 2009 :  06:39:40
Isn't it GOOD that Google is reading your site, though?

Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 28 January 2009 :  06:55:44
Always a good thing to have Google crawl your site, yes, but no need for it to chew up chunks of bandwidth trying to read & index irrelevant pages. In fact, allowing a bot to do so could prove detrimental to your rankings.

Jgs, note that not all bots will recognise the * wildcard so, again, it shouldn't be relied on. In my example above, it's not the end of the world if a bot retrieves a JavaScript or image file but, if you really don't want them to, you should use the full file name to restrict access.
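
To illustrate why the wildcard can't be relied on, here is a rough Python sketch comparing two ways a bot might interpret the same Disallow path: plain prefix matching, and an extended matcher that treats * as "any run of characters". Both functions and the icon_folder.gif file name are hypothetical, written only for this comparison; they are not taken from any particular crawler.

```python
import re


def prefix_match(rule_path, url_path):
    """Plain behaviour: the rule is a literal path prefix, '*' has no special meaning."""
    return url_path.startswith(rule_path)


def wildcard_match(rule_path, url_path):
    """Extended behaviour some bots support: '*' matches any run of characters."""
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.match(pattern, url_path) is not None


RULE = "/forumdirectory/*.gif"
URL = "/forumdirectory/icon_folder.gif"  # hypothetical image file

print(prefix_match(RULE, URL))    # False: a prefix-only bot would still fetch the image
print(wildcard_match(RULE, URL))  # True: a wildcard-aware bot would skip it
print(prefix_match(URL, URL))     # True: the full file name works for both kinds of bot
```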

Note, as well, that robots.txt is not the "be all & end all" of restricting bot access to your site, as there's no requirement anywhere for any bot, good or bad, to adhere to its rules; it's simply considered good practice among bot authors to do so.


Edited by - Shaggy on 28 January 2009 06:56:49

jgs
New Member

Netherlands
95 Posts

Posted - 28 January 2009 :  11:50:56
Maybe it would be a good idea to include a standard robots.txt in the board software download. It might even be possible to add a routine that automatically updates the file so that everything except the three files to be indexed is excluded.

All little savings add up to a big one.

Updating (and creating) such a file is typically something you forget to do.


Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 28 January 2009 :  11:57:51
Including a robots.txt file in the Snitz download would only lead to confusion for those who already have a robots.txt file for their domain and wouldn't have the know-how to edit it. They would either:

- Upload it to their forum directory, in which case it would be useless,

- Overwrite their existing robots.txt, in which case bots would now have access to other areas of their site that they shouldn't, or,

- Try and paste the entire contents of the Snitz robots.txt into the existing one and just make a mess of it.

Also, for those with "require registration" enabled, a file like this would be useless, as they should really be disallowing the indexing of their entire forum directory.

It's far, far too easy to block your entire site from being indexed by any bots with just the simplest of typos so we'd need to add a rather extensive section to the readme explaining what robots.txt is, how to use it and what not to do.
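
As an illustration of how small such a typo can be, the sketch below uses Python's standard urllib.robotparser to compare an intended rule with the same rule accidentally cut short to a bare "/". The "ExampleBot" agent name and the /index.htm page are assumptions made up for the example, and other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules_text, path, agent="ExampleBot"):
    """Return True if the (hypothetical) bot may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, path)


# What was meant: keep bots away from the posting pages only.
INTENDED = "User-agent: *\nDisallow: /forumdirectory/post\n"

# The typo: the path was cut short, leaving a rule that matches every URL.
TYPO = "User-agent: *\nDisallow: /\n"

HOME = "/index.htm"  # hypothetical page elsewhere on the site

print(allowed(INTENDED, HOME))  # True: the rest of the site is still crawlable
print(allowed(TYPO, HOME))      # False: the whole site is now blocked
```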

Snitz Forums 2000 © 2000-2021 Snitz™ Communications
Powered By: Snitz Forums 2000 Version 3.4.07