Snitz Forums 2000

 Code Support: ASP (Non-Forum Related)
 robots.txt help needed

Panhandler
Average Member

USA
783 Posts

Posted - 20 March 2007 :  10:52:15
I'm using a webcrawler to index my site for Google.
The forum has the "Event Calendar" MOD, which spans about a hundred years - and the crawler sees each day as a link.

How do I keep the crawler from looking in the forum?

Here's a portion of my robots.txt:

User-agent: webcrawler
Disallow: /_private
Disallow: /_vti_log
Disallow: /cal
Disallow: /forum

Shouldn't that work?
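One way to sanity-check rules like these offline is Python's standard-library robots.txt parser. This is just a sketch against the rules above - the example URLs are made up for illustration:

```python
import urllib.robotparser

# The same rules as the robots.txt excerpt above.
rules = """\
User-agent: webcrawler
Disallow: /_private
Disallow: /_vti_log
Disallow: /cal
Disallow: /forum
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Hypothetical URLs, just to check which paths the agent may fetch.
print(rp.can_fetch("webcrawler", "http://example.com/forum/cal.asp"))  # False
print(rp.can_fetch("webcrawler", "http://example.com/default.asp"))    # True
```

If a crawler that identifies itself as "webcrawler" still fetches the disallowed paths, the problem is the crawler, not the file.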



"5-in-1 Snitz Common Expansion Pack" - five popular mods packaged for easy install
". . .on a mote of dust, suspended in a sunbeam. . ."
HarborClassifieds
Support Snitz Forums


Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 20 March 2007 :  11:00:43
Here's a handy-dandy reference


Search is your friend
“I was having a mildly paranoid day, mostly due to the
fact that the mad priest lady from over the river had
taken to nailing weasels to my front door again.”

Gremlin
General Help Moderator

New Zealand
7528 Posts

Posted - 21 March 2007 :  03:39:25
Are you sure the webcrawler obeys robots.txt files? And are you sure that "webcrawler" is the user-agent it sends?

Kiwihosting.Net - The Forum Hosting Specialists

Panhandler
Average Member

USA
783 Posts

Posted - 21 March 2007 :  09:55:35
quote:
Originally posted by Gremlin

Are you sure the webcrawler obeys robots.txt files? And are you sure that "webcrawler" is the user-agent it sends?


Yes... there was a big improvement once I cleared the webcrawler's memory, erased all the files it had generated, deleted the project, restarted the application, and set up a new project.

It seems the crawler would not "forget" URLs it had already visited, and would start down those paths again.


Now only about 8,900 URLs on my site are being counted - so yes, it appears the webcrawler is obeying robots.txt. But that's still way too many URLs.

The "Event Calendar" MOD will absolutely add several thousand URLs to a site map - all pointing to empty daily dates for the next 100 years! Hence the need for an effective robots.txt.

By the way, the webcrawler I used was recommended by Google for creating a site index: SOFTplus GSiteCrawler.
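For anyone landing here later: the same rules can be applied to every compliant crawler (Googlebot included, not just GSiteCrawler) with the wildcard user-agent. A sketch reusing the paths from my earlier post - note that Disallow values are case-sensitive URL-path prefixes:

```
# Applies to all compliant crawlers.
User-agent: *
Disallow: /_private
Disallow: /_vti_log
Disallow: /cal
Disallow: /forum
```

Crawlers that ignore robots.txt entirely won't be stopped by any of this, of course.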




Snitz Forums 2000 © 2000-2021 Snitz™ Communications Go To Top Of Page
Powered By: Snitz Forums 2000 Version 3.4.07