Podge
Support Moderator
    
Ireland
3776 Posts |
Posted - 27 August 2007 : 19:04:58
|
I'm trying to convert a lisp algorithm to vb. This is the lisp alg.
(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
  (unless (< (+ g b) 5)
    (max .01
         (min .99 (float (/ (min 1 (/ b nbad))
                            (+ (min 1 (/ g ngood))
                               (min 1 (/ b nbad)))))))))
Division in Lisp uses the format (/ dividend divisor), e.g. (/ 10 5) = 2, and (float integer) just converts an integer to a float. The bit in red above should equate to the bit in red below; I think this is where the problem is. VB doesn't have Min and Max functions, so I have included my own below. I realise they could be prettier.
Function calculateWordProbability(word)
    nGood = 4000 ' number of good topics and replies
    nBad = 4000 ' number of bad topics and replies
    g = 10 ' good word frequency to be pulled from db
    b = 10 ' bad word frequency to be pulled from db
    g = g * 2 ' good word bias
    if (g + b) > 5 then ' only consider words with a frequency greater than 5
        calculateWordProbability = Max(.01,(Min(.99, (Min(1,b/nbad))))) / (Min(1,g/nGood) + Min(1,b/nBad))
    else
        calculateWordProbability = 0.4 ' word is not popular in the db so assign default value of .4
    end if
End function
Function Min(a, b)
    if a > b then
        Min = b
    else
        Min = a
    end if
end function
Function Max(a, b)
    if a > b then
        Max = a
    else
        Max = b
    end if
end function
I cannot test the Lisp algorithm as I don't have a Lisp interpreter/compiler, but I'm sure that the values returned should be between 0.01 and 0.99. The hardcoded values above return 1.33333333333333.
Any Lisp experts out there? Anyone see where I'm going wrong?
Podge.
Edited by - Podge on 27 August 2007 19:05:38 |
|
HuwR
Forum Admin
    
United Kingdom
20595 Posts |
Posted - 28 August 2007 : 04:00:04
|
I think your brackets are wrong. Shouldn't it be more like this?
Max(.01, Min(.99, Min(1,b/nBad) / (Min(1,g/nGood) + Min(1,b/nBad))))
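For anyone who wants to check the bracketing, here is a minimal sketch of the Lisp expression in Python (not VB, only so it can be run as-is). The name word_prob and its parameters are placeholders; the counts of 4000 are the hardcoded values from the question. It assumes both clamps wrap the whole ratio, as in the Lisp. The Lisp returns nothing (nil) for rare words, so the 0.4 default is carried over from the VB version.

```python
def word_prob(g, b, n_good=4000, n_bad=4000, default=0.4):
    """Probability that a word is 'bad', following the Lisp expression.

    g, b: raw good/bad frequencies of the word (hypothetical inputs).
    """
    g = 2 * g  # good word bias, the (* 2 ...) in the Lisp
    if g + b < 5:
        # The Lisp returns nil here; 0.4 is the default from the VB version.
        return default
    bad_ratio = min(1.0, b / n_bad)
    good_ratio = min(1.0, g / n_good)
    # Clamp the whole ratio into [0.01, 0.99], not just the numerator.
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


# Hardcoded values from the question: roughly 0.33 rather than 1.33.
print(word_prob(10, 10))
```

With the question's values the ratio is 0.0025 / (0.005 + 0.0025), so the result falls inside the expected 0.01 to 0.99 range.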
|
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
HuwR
Forum Admin
    
United Kingdom
20595 Posts |
Posted - 28 August 2007 : 09:54:03
|
Sorry, no. Couldn't you just keep a running count of bad words and, if more than 3 of them are over 0.9, flag the sentence as bad?
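A rough sketch of that running-count idea, in Python for illustration only. The function name flag_by_count and its parameters are made up; the 0.9 and 3 come from the suggestion above.

```python
def flag_by_count(word_probs, threshold=0.9, max_bad=3):
    """Flag a sentence when more than max_bad words score above threshold."""
    bad_count = 0
    for p in word_probs:
        if p > threshold:
            bad_count += 1  # running count of high-probability words
    return bad_count > max_bad


print(flag_by_count([0.95, 0.92, 0.99, 0.91, 0.20]))  # True: four words over 0.9
print(flag_by_count([0.95, 0.92, 0.99, 0.20]))        # False: only three
```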
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
HuwR
Forum Admin
    
United Kingdom
20595 Posts |
Posted - 28 August 2007 : 10:35:22
|
Surely you would just ignore the low-probability ones, since it is only the high-probability words you are interested in?
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
Shaggy
Support Moderator
    
Ireland
6780 Posts |
Posted - 28 August 2007 : 11:04:30
|
Why not take the average of all words in the sentence and apply the same criteria to that average (0.01=good sentence, etc.)?
|
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
Posted - 28 August 2007 : 11:21:54
|
Only because it's not correct. Word pairs and sequences may be assigned probabilities in the future.
Like the example above:
Two words in a two-word sentence have the following probabilities
.90 .90
Then you can be virtually certain (.99 probability) that the sentence is a bad sentence. The average is .90 but the correct probability is .99. That's a 10% discrepancy.
I don't have a lot of room for errors to creep in, as there may be other factors which will skew the results. If I can find a mathematical way of combining probabilities I'll post it here and work on a function from there.
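One candidate for the combining step is the naive Bayes rule commonly used in Bayesian spam filters: multiply the word probabilities together, then divide by that product plus the product of their complements. Below is a minimal sketch in Python, for illustration only. It assumes the words are independent, which is a simplification, and the names combine and average are hypothetical.

```python
from math import prod  # available in Python 3.8+


def combine(probs):
    """Naive Bayes combination of per-word 'bad' probabilities.

    Assumes each probability is strictly between 0 and 1 (e.g. already
    clamped to 0.01..0.99), so the denominator cannot be zero.
    """
    bad = prod(probs)
    good = prod(1.0 - p for p in probs)
    return bad / (bad + good)


def average(probs):
    """Plain mean, shown only for comparison."""
    return sum(probs) / len(probs)


pair = [0.90, 0.90]
print(round(average(pair), 2))  # 0.9
print(round(combine(pair), 2))  # 0.99
```

For the two-word example this gives 0.81 / (0.81 + 0.01), which matches the .99 figure rather than the .90 average.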
Podge.
Edited by - Podge on 28 August 2007 11:23:30 |
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
HuwR
Forum Admin
    
United Kingdom
20595 Posts |
Posted - 28 August 2007 : 12:09:05
|
I'm not sure your logic holds up. It shouldn't make any difference how many good words are in a sentence: if it contains just one .99 word then it ought to be flagged bad regardless of how many good words it contains. Otherwise, things like Bayesian filters would be pretty easy to fool by flooding the text with "good" words.
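To make the flooding point concrete, here is a small sketch in Python with made-up numbers: one .99 word padded with twenty .01 words. The helper names and thresholds are hypothetical. It only shows that an average gets dragged down by the padding, while a rule that looks at the worst word still flags the sentence.

```python
def flag_by_average(word_probs, threshold=0.9):
    """Flag when the mean word probability reaches the threshold."""
    return sum(word_probs) / len(word_probs) >= threshold


def flag_by_worst_word(word_probs, threshold=0.99):
    """Flag when any single word reaches the threshold."""
    return max(word_probs) >= threshold


flooded = [0.99] + [0.01] * 20  # one bad word hidden among "good" ones
print(flag_by_average(flooded))     # False
print(flag_by_worst_word(flooded))  # True
```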
 |
|