Author |
Topic  |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 12 March 2001 : 13:54:13
|
If I have a string that has HTML in it, and I want to remove all the HTML and just leave the text, how could I do this?
EXAMPLE: strTEXT = "<font color=""red"">This is the text</font>"
I want it to return "This is the text"
But the HTML would be different every time, so I guess it needs to look for < and > and remove everything in between them.
Brad |
|
Da_Stimulator
DEV Team Forum Moderator
    
USA
3373 Posts |
Posted - 12 March 2001 : 16:10:23
|
Why not just replace the < and > with the appropriate HTML codes for them? I think it's &lt; for the < and &gt; for the >.
---------------- Da_Stimulator Need a Mod? My Snitz Test Center
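The replace-with-entities idea above could be sketched as follows. This is only an illustration: EscapeHTML is a hypothetical helper name, not something from the thread, and it assumes the code runs inside a classic ASP page (hence Response.Write).

```vbscript
' Hypothetical helper: turn markup characters into entities so the
' HTML is displayed as literal text instead of being rendered.
Function EscapeHTML(ByVal sIN)
    ' Escape "&" first so the entities added below are not escaped twice.
    sIN = Replace(sIN, "&", "&amp;")
    sIN = Replace(sIN, "<", "&lt;")
    sIN = Replace(sIN, ">", "&gt;")
    EscapeHTML = sIN
End Function

Dim strTEXT
strTEXT = "<font color=""red"">This is the text</font>"
' Writes: &lt;font color="red"&gt;This is the text&lt;/font&gt;
Response.Write EscapeHTML(strTEXT)
```

Note that this keeps the tags visible as text rather than removing them. In classic ASP, the built-in Server.HTMLEncode does a similar job.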
|
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 12 March 2001 : 16:18:09
|
Because I am building a custom search engine for a site. It is going to read in each page's entire HTML, then remove the HTML and, if possible, keep just the text to be inserted into the database.
Brad |
 |
|
Da_Stimulator
DEV Team Forum Moderator
    
USA
3373 Posts |
Posted - 12 March 2001 : 16:22:52
|
You could try something like Filter(string1, "< >", , -1). I don't know if that would work or not, but I'm assuming something along those lines.
---------------- Da_Stimulator Need a Mod? My Snitz Test Center
|
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 12 March 2001 : 16:33:14
|
Does anyone know what would work?
Brad |
 |
|
Da_Stimulator
DEV Team Forum Moderator
    
USA
3373 Posts |
Posted - 12 March 2001 : 16:35:53
|
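Haven't tested it, but this SHOULD work.
' Note: this cuts everything from the first "<" to the last ">",
' so it only suits a string that holds a single tag.
strPos1 = InStr(String1, "<")      ' position of the first "<"
strPos2 = InStrRev(String1, ">")   ' position of the last ">"
If strPos1 > 0 And strPos2 > strPos1 Then
    strCount1 = Left(String1, strPos1 - 1)   ' text before the first "<"
    strCount2 = Mid(String1, strPos2 + 1)    ' text after the last ">"
    String1 = strCount1 & strCount2
End If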
---------------- Da_Stimulator Need a Mod? My Snitz Test Center
|
 |
|
Da_Stimulator
DEV Team Forum Moderator
    
USA
3373 Posts |
Posted - 12 March 2001 : 16:37:44
|
Just was sitting here doing nothing, so I figured I'd figure out a way for you. I have nothing to test it on, or I'd test it for you.
---------------- Da_Stimulator Need a Mod? My Snitz Test Center
|
 |
|
Doug G
Support Moderator
    
USA
6493 Posts |
Posted - 12 March 2001 : 16:45:35
|
This isn't fully tested but may give you an idea about one way to go.
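Function StripHTML(ByVal sIN)
    Dim n1, n2
    Do While InStr(1, sIN, "<") > 0
        n1 = InStr(1, sIN, "<")
        n2 = InStr(n1, sIN, ">")
        ' Stop if a "<" has no closing ">"; otherwise the loop never ends
        If n2 = 0 Then Exit Do
        ' Keep the text before the "<" and the text after the ">"
        sIN = Mid(sIN, 1, n1 - 1) & Mid(sIN, n2 + 1)
    Loop
    StripHTML = sIN
End Function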
====== Doug G ====== |
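For comparison, the same tag stripping can be sketched with the RegExp object instead of a loop. This is a hedged alternative, not Doug G's method: StripHTMLRegEx is a hypothetical name, and it assumes a scripting engine recent enough to provide the RegExp object (VBScript 5.0 or later).

```vbscript
' Hypothetical alternative to the InStr loop: remove every <...> run
' in a single pass with the RegExp object.
Function StripHTMLRegEx(ByVal sIN)
    Dim re
    Set re = New RegExp
    re.Pattern = "<[^>]*>"   ' "<", any run of characters other than ">", then ">"
    re.Global = True         ' replace every match, not just the first one
    StripHTMLRegEx = re.Replace(sIN, "")
    Set re = Nothing
End Function

' With the string from the original question:
' StripHTMLRegEx("<font color=""red"">This is the text</font>")
' returns "This is the text"
```

A "<" that never gets a closing ">" is simply left in the text, so there is no loop to get stuck in.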
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 12 March 2001 : 17:16:54
|
Works great. Check it out here: http://207.21.203.31/test/test_remove_html.asp
It takes the following page and removes the HTML:
http://207.21.203.31/contact/email.asp
But now that I've got what it returns, I'm trying to figure out how to do this search, because it really brings back too many options. So can y'all give me your input?
I am going to make a crawler for the site. I almost have it to where it gets all the links on a page, which can be viewed at ( http://207.21.203.31/test/test_find_all_links.asp ). This would then cache the pages, with their HTML, in my database, so when a user does a search it can pull up the cached pages and highlight the words they searched for (like the google.com search engine does). So would I insert both the plain text and the HTML into the database, search in the plain text, and display the HTML?
Brad |
 |
|
Da_Stimulator
DEV Team Forum Moderator
    
USA
3373 Posts |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 12 March 2001 : 23:50:00
|
Nope, I don't know what to do. Maybe I'll just read the meta tags.
Brad |
 |
|
Doug G
Support Moderator
    
USA
6493 Posts |
Posted - 13 March 2001 : 09:55:49
|
Are you passing the text one line at a time? Try passing the entire page text as one string. The problem may be when a <...> tag spans multiple lines, typically with scripts and styles.
Or make some kind of test for the closing ">" of a tag, and if there isn't one in the original line, keep throwing away lines until you find a line with the closing ">".
====== Doug G ====== |
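The "pass the entire page as one string" suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: StripPage is a hypothetical name, the page text is assumed to already be in one string, and the non-greedy *? quantifier assumes the VBScript 5.5 regular expression engine.

```vbscript
' Hypothetical sketch: strip a whole page held in ONE string, so tags,
' scripts and styles that span several lines are handled together.
Function StripPage(ByVal sPage)
    Dim re
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True

    ' Drop <script> and <style> blocks, contents included.
    ' [\s\S] matches any character, line breaks too.
    re.Pattern = "<script[\s\S]*?</script\s*>"
    sPage = re.Replace(sPage, " ")
    re.Pattern = "<style[\s\S]*?</style\s*>"
    sPage = re.Replace(sPage, " ")

    ' Drop the remaining tags; [^>] also matches line breaks,
    ' so a tag split across lines is removed as one piece.
    re.Pattern = "<[^>]*>"
    sPage = re.Replace(sPage, " ")

    ' Collapse the leftover runs of whitespace into single spaces.
    re.Pattern = "\s+"
    StripPage = Trim(re.Replace(sPage, " "))
    Set re = Nothing
End Function
```

Replacing with a space rather than an empty string keeps words from neighbouring elements from running together, which matters if the result is going to be searched.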
 |
|
redbrad0
Advanced Member
    
USA
3725 Posts |
Posted - 13 March 2001 : 10:59:00
|
Doug,
It actually does everything right (thanks for the code). I think the best thing would just be to read the meta tags. Have you ever used Google.com's search? How do those search engines get their search working so well? Maybe I will just try it, but it seems to only get the info it really wants.
Look at this page on Google LINK ON GOOGLE
It says this at the top of the page:
These search terms have been highlighted: department home contact These terms only appear in links pointing to this page: menu_title exhibitstore
How does it know that menu_title only appears in links pointing to this page? Maybe it only reads the info below the </head> tag.
Any other ideas?
Brad
Edited by - redbrad0 on 13 March 2001 12:32:05 |
 |
|
|