TOPIC REVIEW
jgs
Posted - 26 January 2009 : 10:59:42 I've added the Active Users mod and it monitors all activity, including users who aren't logged in. Yesterday, all day and last night, there was a lot of activity from two webbots, one from Google and the other from Wiseguys according to an IP lookup. I always thought those were very quick visitors, but I've noticed they stayed for over 30 minutes at a time (the inactivity timeout is set to 15 minutes!). They even tried answering topics, the same topic over several visits.
Is this normal behaviour? I'm asking because yesterday everything was working fine, but this morning there were several bugs and I had to replace several files with the latest copies I had, and all of them were smaller than the ones on the server.
10 LATEST REPLIES (Newest First)
Shaggy
Posted - 28 January 2009 : 11:57:51 Including a robots.txt file in the Snitz download would only lead to confusion for those who already have a robots.txt file for their domain but don't have the know-how to edit it. They would either:
- Upload it to their forum directory, in which case it would be useless,
- Overwrite their existing robots.txt, in which case bots would now have access to other areas of their site that they shouldn't, or,
- Try and paste the entire contents of the Snitz robots.txt into the existing one and just make a mess of it.
Also, for those with "require registration" enabled, a robots.txt file would be useless as they should really be disallowing the indexing of their entire forum directory.
It's far, far too easy to block your entire site from being indexed by any bots with just the simplest of typos so we'd need to add a rather extensive section to the readme explaining what robots.txt is, how to use it and what not to do.
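To illustrate both points (the /forumdirectory/ path here is only a placeholder, as in the example file below): a "require registration" forum would want its whole forum directory kept out of the index, and a single truncated path is enough to shut the entire site out instead:
User-agent: *
Disallow: /forumdirectory/
#A truncated path like the following would block the whole domain, not just the forum:
#Disallow: /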
jgs
Posted - 28 January 2009 : 11:50:56 Maybe it's a good idea to put a standard robots.txt in the download of the board software. Maybe it's also possible to add a routine that automatically updates the file, excluding everything except the three files that should be indexed.
All little savings add up to a big one.
Updating (and creating) such a file is typically something you forget to do.
Shaggy
Posted - 28 January 2009 : 06:55:44 Always a good thing to have Google crawl your site, yes, but no need for it to chew up chunks of bandwidth trying to read & index irrelevant pages. In fact, allowing a bot to do so could prove detrimental to your rankings.
Jgs, note that not all bots will recognise the * wildcard so, again, it shouldn't be relied on - in my example above, it's not the end of the world if a bot retrieves a javascript or image file but, if you really don't want them to, you should use the full file name to restrict access.
Note, as well, that robots.txt is not the "be all & end all" of restricting bot access to your site as there's no requirement anywhere for any bot, good or bad, to adhere to its rules; it's simply considered good practice among bot authors to do so.
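For example (the file names below are hypothetical, purely to show the pattern), a wildcard rule and its explicit equivalent might look like this, with the second form being the one every bot should understand:
#Wildcard form - not recognised by every bot:
Disallow: /forumdirectory/*.js
#Explicit form - name each file in full instead:
Disallow: /forumdirectory/somescript.js
Disallow: /forumdirectory/otherscript.js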
pitstraight
Posted - 28 January 2009 : 06:39:40 Isn't it GOOD that Google is reading your site though?
jgs
Posted - 28 January 2009 : 06:33:50 Thanks, I hope it will help; they are getting a bit annoying, especially Google and Yahoo, which hang around for hours. It's not that I have anything against search engines (Google is my friend), but I don't like useless exercises, just a waste of energy, so if I can help prevent it, I will. Good to know you can use the * wildcard.
Shaggy
Posted - 28 January 2009 : 06:00:08 Here's something quick I knocked together for you; I want to put a bit more work into it before I release it "officially". Some notes:
- It includes files (such as those for the admin options and the include files) that a bot should never know exist - I just like to be thorough ;)
- It doesn't include any mod-related files, you'll need to add those in yourself.
- All occurrences of "forumdirectory" below will need to be changed to the name of your own forum directory.
- If you don't want bots indexing default.asp or forum.asp, uncomment the relevant lines, but please bear in mind that this may have an impact on a bot's ability to access individual topics.
Here's the file:
User-agent: *
Disallow: /forumdirectory/active.asp
Disallow: /forumdirectory/admin_
#Disallow: /forumdirectory/default.asp
Disallow: /forumdirectory/default_group.asp
Disallow: /forumdirectory/down.asp
Disallow: /forumdirectory/faq.asp
#Disallow: /forumdirectory/forum.asp
Disallow: /forumdirectory/inc_
Disallow: /forumdirectory/link.asp
Disallow: /forumdirectory/members.asp
Disallow: /forumdirectory/moderate.asp
Disallow: /forumdirectory/password.asp
Disallow: /forumdirectory/policy.asp
Disallow: /forumdirectory/pop_
Disallow: /forumdirectory/post
Disallow: /forumdirectory/register.asp
Disallow: /forumdirectory/search.asp
Disallow: /forumdirectory/setup
Disallow: /forumdirectory/subscription_list.asp
Disallow: /forumdirectory/readme.htm
Disallow: /forumdirectory/gpl.txt
Disallow: /forumdirectory/*.gif
Disallow: /forumdirectory/*.js
jgs
Posted - 26 January 2009 : 13:41:15 Would be a handy add-on for Snitz; I just had another one locked up in the "still" empty guestbook for 40 minutes. They seem to have particular trouble with topics with links in them.
Shaggy
Posted - 26 January 2009 : 13:25:27 No, "Allow" is not part of the standard; there are some bots that will recognise it but you shouldn't rely on it.
I don't know of a ready made file, but if I've got a few minutes tomorrow, I'll try and knock one together.
jgs
Posted - 26 January 2009 : 12:22:56 Thanks for the link. I noticed there is no "allow" function. Is there maybe a ready-made robots.txt for Snitz forums?
Shaggy
Posted - 26 January 2009 : 11:14:36 Forums can occasionally be a bit of a "bot trap", catching bots in loops for extended periods of time. They try to follow every link they find on a page so, if you count the number of links to post.asp on this page alone, each with a different querystring, you can see how a bot can spend so long trying to index a forum. The best thing to do is to use your robots.txt to prevent them from following links to pages they shouldn't be indexing, such as post.asp - they only really need to index topic.asp and, if you want, default.asp & forum.asp.
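As a rough sketch of that idea (the /forumdirectory/ path is only a placeholder for your own forum directory), a minimal robots.txt that keeps bots away from post.asp while leaving topic.asp, default.asp and forum.asp crawlable could be as short as:
User-agent: *
#Matches post.asp and any other file name starting with "post":
Disallow: /forumdirectory/post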