Author |
Topic  |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 15 February 2007 : 17:22:11
|
It's been a long time since I have been around, and I just ran across this. I was wondering if anyone has ever created a script that would take certain/expired topics, convert them to HTML files and then delete the records from the database. This way they are stored as files which search engines could still find, but they are no longer searchable from the forum itself. |
Brad Oklahoma City Online Entertainment Guide Oklahoma Event Tickets |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 15 February 2007 : 18:09:34
|
What would be the point of keeping them if you can't search them? Databases are far more efficient than flat file systems, which is why we use them. |
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 15 February 2007 : 20:35:29
|
Huw, I agree... but a huge forum would benefit from this. When you have approx 150 users on your forum at any given time (not counting any bots), 42,000 members and 800,000 posts, Snitz will slow down. All pages load in under 1 second, but when there is a drastic increase in traffic on the site you can really tell. Basically the only ones that visit all topics are search engines and people who find the site from a search engine. So creating static HTML pages for topics that do not need to be returned in forum search results (when you have this many records, searching really doesn't work that well anyway) means that many fewer queries on the database just to pull up content that will never change.
I know this is not something that everyone would want or use, but there are times it would be a benefit, which is why I asked if anyone has ever seen a script like this for Snitz. |
 |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 16 February 2007 : 03:58:26
|
quote: When you have approx 150 users on your forum at any given time (not counting any bots), 42,000 members and 800,000 posts, Snitz will slow down.
You mean like we do here? I don't notice any slowdown, do you?
quote: (when you have this many records searching really doesn't work that well anyway)
It still works fine, but not with the standard search; you need to install one of the search MODs to optimise it for SQL.
I still disagree that turning your database into static HTML will actually help in any way. It is in fact more likely to hurt the performance of the web server, since the bots are now going to be indexing tens of thousands of pages which did not exist before, and a web server is not likely to be as highly specced as a database server. |
 |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 16 February 2007 : 11:02:34
|
Actually I can see some value in this. The pages will be searchable from Google/whatever if you make sure they're spidered, and anyone jumping straight into one of those pages will not initiate a session. Also, if there's no ASP processing to do, then there is a lower IIS load on that box (just streaming HTML pages is easy work, or could be dumped on an Apache box), and no db access means an (admittedly not massively) lower cost of management (managing logs and backups, or paying for db space if you have free regular webspace).
If the box isn't struggling it may be more work than it's worth, but I'd be interested to see some real-life data on this if you go ahead, RedBrad. Something like netvampire should be able to leech your entire site by parsing all the links it finds, and if you run it over HTTP it'll save every page as an HTML version of itself. You may need to do a bit of manual work to make sure you archive the right ones, but it may help. |
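For anyone wondering what "parsing all the links it finds" amounts to, here is a minimal sketch of the link-following step, assuming a mirroring tool works roughly this way. The `LinkCollector` class, the `same_site_links` helper and the `forum.example` host are hypothetical names made up for illustration; this is not how any particular tool is implemented, and the actual HTTP fetching and saving is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html_text):
    """Return absolute links found on the page that stay on the page's own host."""
    collector = LinkCollector()
    collector.feed(html_text)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical page fragment: one forum link, one off-site link.
sample = (
    '<a href="topic.asp?TOPIC_ID=1">one</a> '
    '<a href="http://other.example/">off-site</a>'
)
print(same_site_links("http://forum.example/default.asp", sample))
```

A crawler would fetch each returned URL, save the response body to disk, and repeat until no unseen links remain. Restricting to the same host keeps it from wandering off the forum.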
 |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 16 February 2007 : 11:32:53
|
They may well be searchable from Google, or they may not; that depends on how relevant the content of the pages is, and it may actually result in you losing Google ranking rather than gaining it. It would also mean that the archives are no longer searchable from the search page, a big minus point for regular forum users. It will also increase management work if you decide to move servers, as you will now have tens of thousands of HTML pages to download, and it will make pruning more difficult, as you will have thousands of static HTML files with no management interface to deal with them. |
 |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 16 February 2007 : 13:06:37
|
Didn't want to appear too negative on the idea, just want to make sure you think about ALL the pluses and minuses before going ahead.
Maybe it would be worth logging how often archived topics are viewed or searched for. It would probably be an interesting survey; it may be that they get so few views that it isn't worth keeping them at all. |
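One way the logging idea could be tried is to tally topic views from the web server's request log. The sketch below is only an illustration under assumptions: it assumes topic pages are requested with a `TOPIC_ID` query parameter and that you already have the list of archived topic ids. The `count_topic_views` function and the sample log entries are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def count_topic_views(requested_urls, archived_ids):
    """Tally views per archived topic from a list of requested URLs."""
    views = Counter()
    for url in requested_urls:
        query = parse_qs(urlparse(url).query)
        for topic_id in query.get("TOPIC_ID", []):
            if topic_id in archived_ids:
                views[topic_id] += 1
    return views


# Hypothetical request log: topic 10 is archived, topic 99 is not.
log = [
    "/topic.asp?TOPIC_ID=10",
    "/topic.asp?TOPIC_ID=10",
    "/topic.asp?TOPIC_ID=99",
    "/default.asp",
]
views = count_topic_views(log, archived_ids={"10", "11"})
print(views["10"], views["11"])
```

Archived topics that never show up in the tally are the candidates for dropping altogether.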
 |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 18 February 2007 : 10:29:10
|
Yes, a non-technical view may show interesting results |
 |
|
TonyB7
Junior Member
 
USA
267 Posts |
Posted - 18 February 2007 : 19:53:34
|
I've been suggesting offline archiving like this for ages, but I seem to be alone.
I'd settle for a text dump that I could zip and upload. If someone really wants to search posts from 3 years ago they can download the file and search all they want.
Of course, if databases could grow without consequence this idea makes no sense. But in the real world they can't, so some method of pruning short of deletion would be nice.
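A text dump along these lines could look something like the sketch below. It is a rough illustration only: the topic dictionary layout (`id`, `subject`, `posts`) and the `dump_topics_to_zip` function are hypothetical, and pulling the rows out of the forum database is not shown.

```python
import io
import zipfile


def dump_topics_to_zip(topics):
    """Write each topic as a plain-text file inside an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for topic in topics:
            # Subject line, an underline of the same length, then one line per post.
            lines = [topic["subject"], "=" * len(topic["subject"])]
            for author, message in topic["posts"]:
                lines.append(f"{author}: {message}")
            archive.writestr(f"topic_{topic['id']}.txt", "\n".join(lines))
    return buffer.getvalue()


# Hypothetical sample data standing in for rows pulled from the database.
topics = [
    {"id": 1, "subject": "Old thread", "posts": [("brad", "hello"), ("huw", "hi")]},
]
data = dump_topics_to_zip(topics)
print(len(data), "bytes in the zip")
```

The returned bytes can be written to a `.zip` file and uploaded. Only once that file is safely stored would the matching records be pruned from the database.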
|
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 20 February 2007 : 00:58:19
|
Huw,
This site might have 150 users on the site sometimes, but I do not see it as something that is always going on. Also, when you have some additional mods installed, it means that much more server load per user.
Looking at the Snitz Details it says:
-----------------------------
17007 of 29238 Members have made 48733 posts in 44 forums, with the last post on 19 February 2007 20:28:45 by: JJenson.
-----------------------------
The site I am talking about shows this:
-----------------------------
18909 of 42444 Members have made 551861 posts in 20 forums, with the last post on 02/19/2007 23:20:12 by: steveceleste. There are 221128 archived posts in 32745 archived topics
-----------------------------
I hate to tell you, but there is a pretty big difference in numbers, which will make Snitz handle the user load differently. Almost all the queries use stored procedures, along with other work which was needed to take the home page from an 8-10 second load time to under a second on the site I am talking about.
Searching: I have used the mods for the search and it still times out. I actually had to change things around to make it so you could search at all, even though it is very slow. As for older posts being able to be searched... that is up to the owner of the site. The owner of the site should know how relevant the content would be after 1 year. If the owner of the site does not care that the end user cannot use the search form on Snitz to find content older than XXX days/months, then that is their call.
Search Engines: Over the past couple of years this is all that I have been dealing with, and I know the best things to do when dealing with SEO on websites. If the conversion of the site into HTML pages is done correctly, it will not hurt the site, but will actually help improve the ranking of the website. Yes, the ranking of the pages depends on the content, but content is king and the more content you have... even if it's just posts and replies... the better the ranking you get. If a site has low-grade content, then converting it into HTML pages cannot hurt, because it would be the exact same content on the HTML pages as on the dynamic pages.
File Management: I am not 100% sure about the best way to handle this, but I think the admin of the site could generate the HTML pages from something (like I said, not 100% sure yet). This way, if they ever needed to move servers they would not have to worry about the HTML pages; after they moved the forum to the new server they could rebuild all the static pages. Like I said... I am not 100% sure on all of this, but this last part is just something I have been thinking about.
As pdrg said.. there is a difference between a web server just having to serve static HTML pages and one running ASP pages that connect to a large database. Can you really compare a web server finding a file and sending the contents to the end user, and a web server finding a file, connecting to a database and grabbing the data and then compiling the HTML to send to the end user? |
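To make the "rebuild all the static pages" idea a bit more concrete, here is a rough sketch under loud assumptions: the `topics` and `posts` tables, their columns and the `archived` flag are a made-up schema for illustration, not the real Snitz tables, and an in-memory SQLite database stands in for the forum's SQL Server. The `render_topic` and `build_pages` names are hypothetical too.

```python
import html
import sqlite3


def render_topic(subject, posts):
    """Build a minimal static HTML page for one topic, escaping user text."""
    title = html.escape(subject)
    items = "\n".join(
        f"<li><b>{html.escape(author)}</b>: {html.escape(message)}</li>"
        for author, message in posts
    )
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ul>\n{items}\n</ul></body></html>"
    )


def build_pages(conn):
    """Return {filename: html} for every topic flagged as archived."""
    pages = {}
    archived = conn.execute(
        "SELECT id, subject FROM topics WHERE archived = 1 ORDER BY id"
    ).fetchall()
    for topic_id, subject in archived:
        posts = conn.execute(
            "SELECT author, message FROM posts WHERE topic_id = ? ORDER BY id",
            (topic_id,),
        ).fetchall()
        pages[f"topic_{topic_id}.html"] = render_topic(subject, posts)
    return pages


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY, subject TEXT, archived INTEGER);
CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, author TEXT, message TEXT);
INSERT INTO topics VALUES (1, 'Old <news>', 1), (2, 'Live topic', 0);
INSERT INTO posts VALUES (1, 1, 'brad', 'first'), (2, 1, 'huw', 'second');
""")
pages = build_pages(conn)
print(sorted(pages))
```

Rebuilding after a server move only works while the archived rows are still kept somewhere, for example in a separate archive database, so that is a choice to make before deleting anything.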
 |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 20 February 2007 : 04:35:16
|
quote: This site might have 150 users on the site sometimes, but I do not see it as something that is always going on
This site has that many users online pretty much constantly 24 hours a day, so don't comment on the relevance of something you know nothing about.
If your searches are timing out even after optimising for use with SQL, then that is the fault of your SQL server or the query code; a SQL server is perfectly capable of searching 800,000 records without timing out. And as for SEO, content may be king, but it is quality and relevance that count, not quantity.
quote: Can you really compare a web server finding a file and sending the contents to the end user, and a web server finding a file, connecting to a database and grabbing the data and then compiling the html to send to the end user?
Yes, of course you can; it all depends on the configuration of the servers and the efficiency of the connection between the two. If you turn all your posts into static HTML you won't actually be serving them up to your users anyway, so that shouldn't be an issue, and you would get the same speed-up for your users simply by deleting them, since they can no longer access them other than via a search engine, which makes them of little relevance to your regular visitors. |
 |
|
HuwR
Forum Admin
    
United Kingdom
20600 Posts |
Posted - 20 February 2007 : 05:30:15
|
You could probably convert the printer-friendly code to accomplish this, since it already converts the topic into readable text. |
 |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 20 February 2007 : 11:14:36
|
quote: Originally posted by HuwR
you could probably convert the printer-friendly code to accomplish this, since it already converts the topic into readable text
That's a smart approach. And even if the HTML topics aren't searchable from within the forum, then maybe adding a Google site-search box to the site (or even the search page - for archived posts) wouldn't cause too many tears...
I hear your points Huw, and it certainly sounds like your servers/system are better tuned than RedBrad0's are, but maybe he's bound to some dreadful shared server with loads of cycle-wasting other users, or something. Maybe his IIS and SQL are on the same box, or there's a log management problem slowly starving his systems of air - I have no idea, and it may be that a better fix would be to really have a fresh look at the server architecture etc, but there may still be some benefit if he, the forum admin, had no other options.
I'm still certainly interested to see the outcome, as if the server is running close to capacity, then the longer ASP execution time/clock-cycle drain would be much more apparent than a straight HTML page. Then it's RedBrad0's call as to what he wants to do with searching if it buys him enough clock-cycles to breathe in again...or not if it doesn't
|
 |
|