Author |
Topic  |
|
Shaggy
Support Moderator
    
Ireland
6780 Posts |
Posted - 26 March 2008 : 09:22:26
|
I've included a robots.txt file on a site for the first time and wanted to get ye to throw an eye over it to make sure I haven't disallowed anything I shouldn't have or omitted anything that should be disallowed. Also, do you have any suggestions for other user-agents which should be disallowed completely, whether they read robots.txt or not?
|
Search is your friend “I was having a mildly paranoid day, mostly due to the fact that the mad priest lady from over the river had taken to nailing weasels to my front door again.” |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 26 March 2008 : 09:34:37
|
Maybe a leecher tool like netvampire, but any decent UA can impersonate others, so the only people you'll keep out with a robots.txt tend to be legitimate traffic! |
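To illustrate the impersonation point, here is a minimal sketch using Python's standard library. It only builds a request object and sends nothing over the network. The URL and the user-agent string are placeholders chosen for illustration, not anything taken from the site in question.

```python
from urllib.request import Request

# The client picks its own User-Agent header; the server has no way to
# verify it from the header alone. Both values below are illustrative.
spoofed_request = Request(
    "http://www.example.com/forum/default.asp",
    headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
)

# urllib stores header names capitalized as "User-agent".
claimed_agent = spoofed_request.get_header("User-agent")
print(claimed_agent)
```

So a rule keyed on the user-agent name only binds clients that announce themselves honestly.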
 |
|
bobby131313
Senior Member
   
USA
1163 Posts |
Posted - 26 March 2008 : 09:39:16
|
I did the following for MSNBot because they're hogs (mostly images)...
User-agent: MSNbot
Disallow: /*.jpeg$
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /*.png$
Crawl-delay: 120
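For anyone wanting to sanity-check wildcard rules like these before uploading, here is a rough sketch in Python. It assumes the common extension syntax, where `*` matches any run of characters and a trailing `$` anchors the end of the URL; individual crawlers may interpret patterns differently. The function names and test paths are made up for illustration.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL. Real crawlers may differ in detail.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, patterns):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)

# The image rules from the MSNbot block.
MSNBOT_IMAGE_RULES = ["/*.jpeg$", "/*.jpg$", "/*.gif$", "/*.png$"]

print(is_blocked("/images/logo.jpg", MSNBOT_IMAGE_RULES))
print(is_blocked("/forum/topic.asp", MSNBOT_IMAGE_RULES))
```

Note that under this reading the `$` means a URL such as /photo.jpg?size=large would not match, since it does not end in .jpg.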
To keep Google especially from snagging pages with posts actually in them...
Disallow: /forum/post.asp
To avoid possible duplicate content penalties...
Disallow: /forum/pop_printer_friendly.asp
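Plain path rules like the two above can be checked offline with Python's standard `urllib.robotparser`. This is only a sketch: it assumes the rules sit under `User-agent: *` (the snippets above do not say which group they belong to), and the tested paths other than the two disallowed ones are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the two Disallow lines apply to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/post.asp
Disallow: /forum/pop_printer_friendly.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text, no network access

# Each rule is treated as a path prefix.
post_allowed = parser.can_fetch("Googlebot", "/forum/post.asp")
printer_allowed = parser.can_fetch("Googlebot", "/forum/pop_printer_friendly.asp")
topic_allowed = parser.can_fetch("Googlebot", "/forum/topic.asp")

print(post_allowed, printer_allowed, topic_allowed)
```

As far as I know this parser does simple prefix matching, so it is not a reliable check for the `*` and `$` wildcard rules.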
|
Switch the order of your title tags |
 |
|