Snitz Forums 2000 - robots.txt help needed


Panhandler
Average Member

USA
783 Posts

Posted - 20 March 2007 : 10:52:15

I'm using a webcrawler to index my site for Google.
The forum has the "Event Calendar" mod which spans about a hundred years - and each day is seen as a link by the crawler.

How do I keep the crawler from looking in the forum?

Here's a portion of my robots.txt:

User-agent: webcrawler
Disallow: /_private
Disallow: /_vti_log
Disallow: /cal
Disallow: /forum

Shouldn't that work?
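One way to sanity-check rules like these offline is Python's standard `urllib.robotparser`. The sketch below is a rough check, not a statement of how any particular crawler behaves. The sample paths and the second agent name `SomeOtherBot` are made up for illustration. It assumes the crawler really does identify itself as `webcrawler`.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, copied verbatim.
ROBOTS_TXT = """\
User-agent: webcrawler
Disallow: /_private
Disallow: /_vti_log
Disallow: /cal
Disallow: /forum
"""


def is_allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the rules.
    for agent, path in [
        ("webcrawler", "/forum/default.asp"),
        ("webcrawler", "/index.html"),
        ("SomeOtherBot", "/forum/default.asp"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Under this parser, the rules block `/forum` only for an agent whose name matches `webcrawler`. Any other agent falls through to "allowed" because the file has no `User-agent: *` group, so the rules only help if the crawler's user-agent token is spelled the same way in robots.txt.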

"5-in-1 Snitz Common Expansion Pack" - five popular mods packaged for easy install
". . .on a mote of dust, suspended in a sunbeam. . ."
HarborClassifieds
Support Snitz Forums

Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 20 March 2007 : 11:00:43

Here's a handy-dandy reference

Search is your friend
“I was having a mildly paranoid day, mostly due to the
fact that the mad priest lady from over the river had
taken to nailing weasels to my front door again.”

Gremlin
General Help Moderator

New Zealand
7528 Posts

Posted - 21 March 2007 : 03:39:25

Are you sure the webcrawler obeys robots.txt files? And are you sure that "webcrawler" is its user-agent?

Kiwihosting.Net - The Forum Hosting Specialists

Panhandler
Average Member

USA
783 Posts

Posted - 21 March 2007 : 09:55:35

quote:
Originally posted by Gremlin

Are you sure the webcrawler obeys robots.txt files? And are you sure that "webcrawler" is its user-agent?

Yes. . .there was a big improvement once I cleared out the webcrawler memory, erased all files that it had generated, erased the project, restarted the application and established a new project.

It seems the crawler would not "forget" URLs that it had already visited, and it would start down that path again.

Now there are only 8,900 URLs on my site being counted - so yes, it appears that the webcrawler is obeying the robots.txt. But that's still way too many URLs.

The "Event Calendar MOD" will absolutely add several thousand URLs to a site map - all pointing to empty daily dates for the next 100 years! Thus the need for an effective robots.txt.

By the way, the webcrawler used was recommended by Google for creating a site index: SOFTplus GSiteCrawler

