Author |
Topic |
|
wildfiction
Junior Member
167 Posts |
Posted - 22 December 2005 : 15:02:13
|
I'm interested in writing some search code in ASP.NET that makes searching the Snitz DB lightning fast. Does anybody have any links or suggestions on where I can get some info to get started?
I'm generally looking for an algorithm that takes all of the forum text out of the DB and creates an index file which itself can be quickly searched and then refer back to the forum DB. The index file would obviously have to be updated at regular intervals to keep it current.
Any ideas or links for me? Thanks! |
|
laser
Advanced Member
Australia
3859 Posts |
Posted - 22 December 2005 : 15:58:26
|
Why take the data out of the database ? Just use the database directly .... wouldn't you ? |
|
|
wildfiction
Junior Member
167 Posts |
Posted - 22 December 2005 : 18:47:29
|
laser - yes you can use the DB directly but then (correct me if I'm wrong) you look at the contents of each and every record and search all of the data in each of the records for the word/phrase that you're looking for right?
If you had pre-processed all of those records and created a number of index files then you would be able to locate a word and that would tell you which records had that word in it.
So, for example, say you'd preprocessed your DB and generated 26 files - 1 for each letter of the alphabet.
Someone searches for 'acrobat' and so your search code opens the a.ndx file and finds the word acrobat in there and discovers that records 25698, 26186, and 127969 have the word acrobat in them.
You very quickly find the records you want. It is also quick to add NOT words. If you added '-circus' to your search then you'd open the c.ndx file, get the records that have 'circus' in them, and exclude those from the results of the previous search.
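A rough sketch of that idea in Python, kept in memory rather than split into per-letter .ndx files. The record texts below are invented sample data, the ID 300001 is hypothetical, and the function names (`tokenize`, `build_index`, `search`) are made up for illustration; this is not how Snitz stores or searches anything.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return IDs containing every plain term and none of the '-' terms."""
    required, excluded = [], []
    for term in query.lower().split():
        if term.startswith("-"):
            excluded.append(term[1:])
        else:
            required.append(term)
    if not required:
        return set()
    # Start from the first required word, then narrow the result down.
    result = set(index.get(required[0], set()))
    for word in required[1:]:
        result &= index.get(word, set())
    # NOT words: drop every record that contains an excluded word.
    for word in excluded:
        result -= index.get(word, set())
    return result


# Invented sample data; only the first three IDs come from the example above.
records = {
    25698: "The acrobat joined the circus",
    26186: "Adobe Acrobat reads PDF files",
    127969: "An acrobat needs good balance",
    300001: "The circus came to town",
}
index = build_index(records)
matches = search(index, "acrobat -circus")
```

The lookup cost depends on how many records contain the word, not on how many records exist, which is the whole point of pre-processing. The index would still need rebuilding or updating whenever posts are added or edited.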
|
|
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 23 December 2005 : 05:34:01
|
what's the db server? Just indexing properly in the db will help a load, but if you're using MS SQL Server, have a look at the 'fulltext' searches - probably exactly what you want
hth |
|
|
wildfiction
Junior Member
167 Posts |
Posted - 09 January 2006 : 15:10:40
|
fulltext search is probably what would be best here. thanks for the idea. |
|
|
mios
Junior Member
United Kingdom
101 Posts |
Posted - 10 January 2006 : 08:39:50
|
Take a look at Lucene http://www.dotlucene.net/ it's a .NET port of Apache Lucene and is very fast and easy to use.
You basically add documents (a document is a collection of fields) to the index, so a document could contain topicID, subject, post, author, date... These fields are then fully searchable.
so for example
author:mios
Would return all my posts
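To make the document/field idea concrete, here is a toy sketch in Python. It is not Lucene's API and does none of Lucene's analysis or ranking; the `FieldIndex` class, its methods, and the sample posts are all hypothetical, and it only handles a single `field:term` query.

```python
import re
from collections import defaultdict


class FieldIndex:
    """Toy fielded index: maps (field, term) pairs to document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, fields):
        """Index every word of every field under its (field, term) key."""
        for field, value in fields.items():
            for term in re.findall(r"[a-z0-9]+", str(value).lower()):
                self.postings[(field.lower(), term)].add(doc_id)

    def search(self, query, default_field="post"):
        """Look up 'field:term'; a bare term searches the default field."""
        field, sep, term = query.partition(":")
        if not sep:
            field, term = default_field, query
        return sorted(self.postings.get((field.lower(), term.lower()), set()))


# Hypothetical sample documents using the field names mentioned above.
forum_index = FieldIndex()
forum_index.add_document(1, {"topicID": 101, "subject": "Fast search",
                             "post": "Take a look at Lucene", "author": "mios"})
forum_index.add_document(2, {"topicID": 101, "subject": "Fast search",
                             "post": "Use fulltext search", "author": "pdrg"})
my_posts = forum_index.search("author:mios")
```

Keying the postings on the field as well as the term is what lets the same word be searched separately in the author, subject, or post body.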
I've used this for a DMS project, and have indexed 20,000 documents (including the full works of Shakespeare). The index size is about 6GB and a search takes about 0.5 sec.
You can also use it in conjunction with the highlighter object, which generates summaries of the returned items that are dependent on the search terms. Very cool! Oh, and it's free! |
|
|
|