Snitz Forums 2000
 Tips on writing a search

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  05:48:32
Can anyone give me some tips on how to go about writing a search.

I already have a search working nicely where it is just using filters to narrow down results. However, I need to add a keywords box to this search, and it needs to use CONTAINSTABLE rather than LIKE because we need the results to be ranked by relevance.

I am not sure how to pull back the values from the text field in the form and split them up correctly.

Someone could enter "'Search Phrase', Search Words , another, word" or other combinations and I am not sure how to pull these values back and split them into the correct parts to put in the CONTAINS("search for this") value. Request.Form("textfieldname") is not enough to split them up.

Any input would be appreciated!



KatsKorner

HuwR
Forum Admin

United Kingdom
20595 Posts

Posted - 08 May 2001 :  06:01:43
Kat,

You will need to write some parsing functions to extract the keywords from the input box. It will be quite complex, as you can't control how the users input the keywords. If they were all nicely separated by commas as in your example, you could just use Split to obtain the separate keywords or phrases.


gor
Retired Admin

Netherlands
5511 Posts

Posted - 08 May 2001 :  06:11:19
Even if they are not separated using commas, you can still use Split().
Best thing is to first split on " so users can use "search for this sentence"
and then split on spaces.
That way you also get the boolean operators split from the rest.
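
The two-pass split described above can be sketched as follows. This is a minimal illustration in Python rather than ASP, and the function name split_keywords is made up for the example. It assumes phrases are wrapped in double quotes only (single quotes are not treated specially) and that commas are just separators. An unbalanced opening quote is treated as starting a phrase that runs to the end of the input.

```python
def split_keywords(raw):
    """Split a search box value into quoted phrases and single words."""
    terms = []
    # Splitting on the double quote alternates between text outside quotes
    # (even positions) and text inside quotes (odd positions).
    for position, chunk in enumerate(raw.split('"')):
        if position % 2 == 1:
            # Inside quotes: keep the phrase whole, collapsing repeated spaces.
            phrase = " ".join(chunk.split())
            if phrase:
                terms.append(phrase)
        else:
            # Outside quotes: treat commas as spaces, then split on whitespace.
            terms.extend(chunk.replace(",", " ").split())
    return terms


print(split_keywords('"search for this sentence" asp AND forum, search'))
# ['search for this sentence', 'asp', 'AND', 'forum', 'search']
```

The boolean operator AND comes out as its own term, so a later step can decide whether to treat it as an operator or as a plain word.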


Pierre

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  07:04:47
Ok Gor,

I will go with your suggestion and split by spaces.

I understand that there are some words that need stripping from a search, otherwise SQL Server won't like it. Has anyone got any advice on what they are and how to handle them?

KatsKorner

HuwR
Forum Admin

United Kingdom
20595 Posts

Posted - 08 May 2001 :  07:08:20
I can't think of any that would affect your search, there are words you can't use as column names, but there should be no restriction on what you can search for.


Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  08:06:05
Hmm. Some of the guys here have a file of 'noise' words that apparently I should filter before trying to do a search.

I shall investigate.

KatsKorner

HuwR
Forum Admin

United Kingdom
20595 Posts

Posted - 08 May 2001 :  08:11:55
They probably just mean commonly used words that are likely to appear in most records, things like 'the'.


gor
Retired Admin

Netherlands
5511 Posts

Posted - 08 May 2001 :  08:16:35
Yes, a noise file has words like 'the', 'of', 'a', etc.
You can also see this at http://www.google.com/ (not the list, but how it works).
If you type a search string with words that are on that list, it disregards them in the search because they are too common.

Pierre

gor
Retired Admin

Netherlands
5511 Posts

Posted - 08 May 2001 :  08:30:37
I found this info in an Oracle document (http://ksu154.himolde.no/oracledok/doc/cartridg.804/a58164.pdf):


To calculate a relevance score for a returned document in a text query, ConText uses an inverse frequency algorithm. Inverse frequency scoring assumes that frequently occurring terms in a document set are "noise" terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.

Note: This section discusses how ConText calculates score for text queries, which is different from the way it calculates score for theme queries.

The following table illustrates ConText’s inverse frequency scoring. The first column shows the number of documents in the document set, and the second column shows the number of terms in the document necessary to score 100.

Number of Documents    Frequency of Term
in Document Set        in Document
---------------------------------------
1                      34
5                      20
10                     17
50                     13
100                    12
500                    10
1,000                  9
10,000                 7
100,000                5
1,000,000              4

This table assumes that only one document in the set contains the query term. The table illustrates that if only one document contained the query term and there were five documents in the set, the term would have to occur 20 times in the document to score 100. Whereas, if there were 1,000,000 documents in the set, the term would have to occur only 4 times in the document to score 100.

Example
You have 5000 documents dealing with chemistry in which the term chemical occurs at least once in every document. The term chemical thus occurs frequently in the document set.
You have a document that contains 5 occurrences of chemical and 5 occurrences of the term hydrogen. No other document contains the term hydrogen.
Because chemical occurs so frequently in the document set, its score for the document is lower with respect to hydrogen, which is infrequent is the document set as a whole. This is so even though both terms occur 5 times in the document.

Note: Even if the relatively infrequent term hydrogen occurred 4 times in the document, and chemical occurred 5 times in the document, the score for hydrogen might still be higher, because chemical occurs so frequently in the document set (at least 5000 times).

Inverse frequency scoring also means that adding documents that contain hydrogen lowers the score for that term in the document, and adding more documents that do not contain hydrogen raises the score.
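
The idea in the quoted passage can be tried out with a generic inverse-frequency weight. The sketch below is not ConText's actual formula, which the excerpt does not give, and it is not meant to reproduce the table. It is a common TF-IDF style weighting, with a hypothetical function name, used only to show the direction of the effect in the chemical/hydrogen example.

```python
import math


def inverse_frequency_weight(term_count, docs_total, docs_with_term):
    """Generic TF-IDF style weight: term count scaled by rarity in the set."""
    # A term found in every document gets rarity 1; rarer terms get more.
    rarity = 1 + math.log10(docs_total / docs_with_term)
    return term_count * rarity


# The chemistry example: 5000 documents, "chemical" in all, "hydrogen" in one.
chemical = inverse_frequency_weight(5, 5000, 5000)
hydrogen = inverse_frequency_weight(5, 5000, 1)
print(chemical, hydrogen)
```

With this weighting, hydrogen outscores chemical even though both occur 5 times in the document, and it still does so when hydrogen occurs only 4 times. Adding documents that do not contain hydrogen raises its weight; adding documents that do contain it lowers the weight.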


This site explains how to build it in PHP and provides a noise list: http://www.phpbuilder.com/columns/clay19990421.php3

noisewords.txt
--------------
a
about
after
ago
all
almost
along
also
am
an
and
answer
any
anybody
anywhere
are
aren't
around
as
ask
at
bad
be
been
before
being
best
better
between
big
but
by
can
can't
come
could
couldn't
day
did
didn't
do
does
don't
down
each
either
else
even
ever
every
everybody
everyone
far
find
for
found
from
get
go
going
gone
good
got
had
has
have
haven't
having
her
here
hers
him
his
home
how
href
I
if
in
into
is
isn't
it
its
know
large
less
like
little
looking
look
many
me
more
most
must
my
near
never
new
news
no
none
not
nothing
of
off
often
old
on
once
only
or
other
our
ours
out
over
page
please
question
rather
recent
she
should
sites
small
so
some
something
sometime
somewhere
than
true
thank
that
the
their
theirs
them
then
there
these
they
this
those
though
through
thus
time
times
to
too
under
until
untrue
up
upon
use
users
version
very
via
want
was
way
web
were
what
when
where
which
who
whom
whose
why
wide
will
with
within
without
world
worse
worst
would
www
yes
yet
you
your
yours
how
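
A noise list like the one above could be applied to terms that have already been split, before they are put into a full-text search condition. The sketch below is in Python for illustration. NOISE_WORDS is a small hypothetical subset (in practice the whole file would be loaded), and both function names are made up. It assumes the terms are joined with OR and each one is wrapped in double quotes, which is how CONTAINS and CONTAINSTABLE accept words and phrases.

```python
# Hypothetical subset of the noise list; in practice load the whole file.
NOISE_WORDS = {"a", "about", "and", "for", "of", "or", "the", "this", "to"}


def remove_noise(terms, noise_words=NOISE_WORDS):
    """Drop terms that appear in the noise list (case-insensitive)."""
    return [t for t in terms if t.lower() not in noise_words]


def build_search_condition(terms, operator="OR"):
    """Quote each remaining term and join them into one search condition."""
    # Strip stray double quotes so a term cannot break out of its quoting.
    quoted = ['"{}"'.format(t.replace('"', "")) for t in terms]
    return " {} ".format(operator).join(quoted)


terms = remove_noise(["the", "history", "of", "search engines", "ASP"])
print(build_search_condition(terms))
# "history" OR "search engines" OR "ASP"
```

The resulting string is best passed to the query as a parameter rather than concatenated into the SQL text, and an empty result (every word was noise) should skip the keyword search altogether.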


Pierre

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  09:04:41
Thanks Gor. That noise list is great. I think I know how to handle that bit now. Time for a trial. And to learn more about CONTAINSTABLE.

KatsKorner

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  09:45:25
I can't get CONTAINSTABLE to work. Can anyone help?

I have a full-text-indexed field on tblCompanyContact called s_companyname. Some of its values should contain the word 'company', but I can't get the SQL to return anything.

I based it on something I found on the Microsoft site:

SELECT FT_TBL.s_companyname
FROM tblcompanycontact AS FT_TBL
    INNER JOIN CONTAINSTABLE(tblcompanycontact, s_companyname, 'company') AS KEY_TBL
        ON FT_TBL.l_companyid = KEY_TBL.[KEY]


I'm confused. I don't understand what the join is trying to do, but if I remove it nothing works. It doesn't work anyway. Help?
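
On what the join is for: CONTAINSTABLE does not return the indexed column. It returns a small table with two columns, KEY (the value of the full-text unique key for each matching row) and RANK, so the join is what maps each match back to the real row. The Python sketch below only simulates that step. The company rows and rank numbers are invented, and it assumes l_companyid is the column chosen as the full-text key.

```python
# Hypothetical base table rows keyed by the full-text unique key (l_companyid).
companies = {
    1: "Acme Trading Company",
    2: "Northwind Ltd",
    3: "The Company Store",
}

# Stand-in for what CONTAINSTABLE returns: one (KEY, RANK) row per match.
# The rank numbers are made up for illustration.
key_table = [(1, 48), (3, 64)]


def join_on_key(base_rows, matches):
    """Inner join: look up each matched key in the base table, best rank first."""
    joined = [(base_rows[key], rank) for key, rank in matches if key in base_rows]
    return sorted(joined, key=lambda row: row[1], reverse=True)


for name, rank in join_on_key(companies, key_table):
    print(rank, name)
```

If the real query returns no rows at all, two things may be worth checking: that the join column really is the full-text key column, and that the full-text catalog has been populated since the index was defined.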

KatsKorner


Edited by - kat on 08 May 2001 09:46:18

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  10:02:48
I have decided not to bother with full-text indexing because the amount of data does not justify it. I am going to use PATINDEX instead.
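
A PATINDEX-based keyword filter might be assembled along these lines. This is a sketch in Python, not the code actually used. The function name is hypothetical, the table and column names are taken from the earlier query, and the ? placeholders assume a driver that supports positional parameters. PATINDEX returns the position of the first match or 0, so it filters rows but gives no relevance ranking.

```python
def build_patindex_query(terms):
    """Return (sql, params) matching rows that contain any of the terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    # One PATINDEX test per term; a result above 0 means the term was found.
    clauses = ["PATINDEX(?, s_companyname) > 0"] * len(terms)
    sql = (
        "SELECT s_companyname FROM tblcompanycontact WHERE "
        + " OR ".join(clauses)
    )
    # PATINDEX needs the % wildcards to find the term anywhere in the column.
    params = ["%{}%".format(term) for term in terms]
    return sql, params


sql, params = build_patindex_query(["company", "trading"])
print(sql)
print(params)
```

Characters such as %, _ and [ inside a term act as wildcards in the pattern, so user input containing them may need escaping first.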

Thanks for the help guys!

KatsKorner

Doug G
Support Moderator

USA
6493 Posts

Posted - 08 May 2001 :  10:48:14
SQL Server 7 has a built-in noise word list.

http://support.microsoft.com/support/kb/articles/q247/5/61.asp


======
Doug G
======

Kat
Advanced Member

United Kingdom
3065 Posts

Posted - 08 May 2001 :  11:45:56
Thanks Doug!

KatsKorner