Snitz Forums 2000
 Please Help With Web Fetching Script

shaneb
Junior Member

USA
319 Posts

Posted - 08 August 2007 : 22:43:41
Hi everyone. I've been out of the loop on web development for a long time, and Snitz members have always been great at helping me whenever I've had a problem, so thanks in advance to everyone who helps.

Here is my problem. I found a script that uses the Microsoft XMLHTTP object to grab a web page, but I don't know how to strip out the unwanted navigation on the left side and at the top of the page. All I need are the tables on this page, http://www.i44speedway.com/Trackpoints.htm, without the navigation. The script I found is below:

<%
 ' URL of the web page we want to retrieve
 Dim thisURL, GetConnection, ResponsePage
 thisURL = "http://www.i44speedway.com/Trackpoints.htm"

 ' Create the XMLHTTP object
 ' (MSXML2.ServerXMLHTTP is the variant Microsoft intends for server-side code)
 Set GetConnection = CreateObject("Microsoft.XMLHTTP")

 ' Connect to the URL (synchronous GET request)
 GetConnection.Open "GET", thisURL, False
 GetConnection.Send

 ' ResponsePage now holds the HTML returned
 ' by the remote web server
 ResponsePage = GetConnection.responseText

 ' Write out the content of ResponsePage
 Response.Write ResponsePage

 Set GetConnection = Nothing
%>


Can anyone rewrite this for me so that it strips out the navigation on the left side and at the top of the page and leaves me with just the tables? Or, better yet, I only need one specific table on this page: the one called Turf Tires. I am making a web site for my little cousin so that he can hopefully get sponsored. I appreciate any help you guys can offer. Thanks again!

'Surround your mind and you shall see a great future ahead'

Shane B.

Doug G
Support Moderator

USA
6493 Posts

Posted - 08 August 2007 : 23:09:10
google for "screen scraping"

======
Doug G
======
Computer history and help at www.dougscode.com

shaneb
Junior Member

USA
319 Posts

Posted - 08 August 2007 : 23:25:35
quote:
Originally posted by Doug G

google for "screen scraping"




Hi Doug.

Thanks, but I tried that, as well as "web scraping", "page scraping", "web fetching", and "page grabber". There was a script called ASP Page Grabber, but the site no longer exists. I know there are components out there such as ASP Tear, but my host will not install them. There seems to be some material for .NET but not for classic ASP, so I am having a heck of a time finding something. Do developers even do this kind of thing anymore? If not, what do they do to grab content from a web page and put it on their own web site? Just so you know, I always ask the owner's permission before I display their content on my web page.


Shane B.


Edited by - shaneb on 08 August 2007 23:40:31

Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 09 August 2007 : 04:50:23
I'm not an XML programmer, so I can't give you the exact code, but here is what I would do in this case. Instead of writing out the entire contents of the retrieved file, keep the page in a variable (such as ResponsePage above), scan through it (using XML, RegEx, or whatever you prefer) to find the opening and closing tags of the part you want, and copy that part into its own variable.
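A minimal sketch of the approach Shaggy describes, in VBScript for classic ASP, using only InStr and Mid rather than XML or RegEx. The function name ExtractBetween and its parameters are hypothetical (not part of any library), and it assumes both markers appear in the page, with the end marker after the start marker. It would go inside the page's <% ... %> block.

```vbscript
' Hypothetical helper (not from the thread): returns the text that lies
' between startMarker and endMarker, excluding both markers.
' Returns an empty string if either marker is missing.
Function ExtractBetween(pageText, startMarker, endMarker)
    Dim posStart, posEnd
    ExtractBetween = ""

    ' vbTextCompare makes the search case-insensitive,
    ' so "<TABLE" and "<table" are treated the same.
    posStart = InStr(1, pageText, startMarker, vbTextCompare)
    If posStart = 0 Then Exit Function

    ' Skip past the start marker itself.
    posStart = posStart + Len(startMarker)

    posEnd = InStr(posStart, pageText, endMarker, vbTextCompare)
    If posEnd = 0 Then Exit Function

    ' Everything from just after the start marker up to the end marker.
    ExtractBetween = Mid(pageText, posStart, posEnd - posStart)
End Function
```

For example, Response.Write ExtractBetween(ResponsePage, startMarker, endMarker), where the two markers are strings copied from the page's HTML source. Because the start position is advanced by Len(startMarker), there is no need to hand-tune character offsets.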


Search is your friend
“I was having a mildly paranoid day, mostly due to the
fact that the mad priest lady from over the river had
taken to nailing weasels to my front door again.”

pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 09 August 2007 : 10:01:03
How about using Right() and Left()? Sorry, I'm so rusty now...

pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 09 August 2007 : 10:06:02
Or would http://simile.mit.edu/wiki/Solvent be any help in your quest?

Podge
Support Moderator

Ireland
3776 Posts

Posted - 09 August 2007 : 11:26:31
You could try using a regex to match all the tables on the page. More than likely, the table you want will always be the same match, e.g. the second or third table.

http://regexlib.com/Search.aspx?k=table
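To find out which match is the one you want, a small debugging helper (hypothetical, not from the thread) can list every table the regex finds, with its zero-based index and a short plain-text preview. It assumes VBScript 5.5 or later (for the lazy *? quantifier) and that the tables are not nested: with nested tables, the lazy match stops at the first </table>, so an outer table is cut short at its inner table's closing tag.

```vbscript
' Hypothetical debugging helper: lists every <table>...</table> match
' with its zero-based index and a short plain-text preview.
Function ListTables(pageText, previewLength)
    Dim tableRe, textRe, matches, m, i, plain, report

    Set tableRe = New RegExp
    ' [\s\S]*? is a lazy "any character, including line breaks";
    ' in VBScript a plain "." does not match a line feed.
    tableRe.Pattern = "<table[^>]*>[\s\S]*?</table>"
    tableRe.IgnoreCase = True
    tableRe.Global = True

    Set textRe = New RegExp
    textRe.Global = True

    Set matches = tableRe.Execute(pageText)
    report = ""
    i = 0
    For Each m In matches
        ' Replace tags with spaces, then collapse runs of whitespace.
        textRe.Pattern = "<[^>]+>"
        plain = textRe.Replace(m.Value, " ")
        textRe.Pattern = "\s+"
        plain = Trim(textRe.Replace(plain, " "))
        report = report & i & ": " & Left(plain, previewLength) & vbCrLf
        i = i + 1
    Next
    ListTables = report
End Function
```

Calling Response.Write "<pre>" & Server.HTMLEncode(ListTables(ResponsePage, 60)) & "</pre>" once should show which index holds the Turf Tires data.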

Podge.

The Hunger Site - Click to donate free food | My Blog | Snitz 3.4.05 AutoInstall (Beta!)

My Mods: CAPTCHA Mod | GateKeeper Mod
Tutorial: Enable subscriptions on your board

Warning: The post above or below may contain nuts.

shaneb
Junior Member

USA
319 Posts

Posted - 09 August 2007 : 20:09:44
quote:
Originally posted by Podge

You could try using a regex to match all tables on the page. More than likely it will always be the same match i.e. the second or third table.

http://regexlib.com/Search.aspx?k=table


Thanks, everyone!
I found this expression on RegExLib:
(?s)<tr[^>]*>(?<content>.*?)</tr>


This expression matches complete table rows (<tr>...</tr>) and puts everything between the tr tags into a group named "content".
So if I change it to the following, it matches all of the tables and puts each one's contents in a group called content, correct?
(?s)<table[^>]*>(?<content>.*?)</table>
So, in principle, it would find all of the tables on the page at http://www.i44speedway.com/Trackpoints.htm.
Looking at the HTML code, the Turf Tires table is the sixth table in the HTML. I had to copy and paste the code and then use Find Next until I got to the table I needed (Turf Tires).
My question is: how do you write the expression so that it finds only the sixth table in the HTML page?

Sorry, I still know very little about ASP.

Thanks again everyone.
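A sketch of how this might look in VBScript. As far as I know, the RegExp object in VBScript uses a simpler flavor than the .NET engine behind many RegExLib entries: it has no (?s) modifier and no named groups such as (?<content>...), so those two expressions will not run as written. The function below (GetNthTable is a hypothetical name) uses [\s\S]*? to match across line breaks and picks the n-th match from the Matches collection, which is zero-based.

```vbscript
' Hypothetical helper: returns the n-th <table>...</table> block
' (n counts from 1, so the sixth table is n = 6), or "" if the page
' has fewer than n tables.
Function GetNthTable(pageText, n)
    Dim re, matches
    Set re = New RegExp
    ' No (?s) flag or named groups in VBScript, so [\s\S]*? stands in
    ' for ".*?" when the table spans several lines.
    re.Pattern = "<table[^>]*>[\s\S]*?</table>"
    re.IgnoreCase = True
    re.Global = True   ' find every table, not just the first

    Set matches = re.Execute(pageText)
    If n >= 1 And matches.Count >= n Then
        ' The Matches collection is zero-based.
        GetNthTable = matches(n - 1).Value
    Else
        GetNthTable = ""
    End If
End Function
```

With the count from the post above, Response.Write GetNthTable(ResponsePage, 6) would write out the sixth table, assuming the page's tables are not nested (nested tables throw the count off).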


Shane B.


Edited by - shaneb on 09 August 2007 20:32:03

Shaggy
Support Moderator

Ireland
6780 Posts

Posted - 10 August 2007 : 04:45:16
Just to clarify, is it only the data from the "Turf Tires" table you need to pull into your page or do you need the data from any of the other tables, such as "Kid Sprints"?



Podge
Support Moderator

Ireland
3776 Posts

Posted - 10 August 2007 : 05:43:33
Can you post your working code?

Podge.


shaneb
Junior Member

USA
319 Posts

Posted - 11 August 2007 : 22:49:20
Sorry, the only working code I have is the script from my first post in this topic. Again, it pulls the whole page from that link; I just want one table from this page.

Yes, the only table I need is the Turf Tires table. This is the only class of race he runs in and it is where his standings are.

The expression from RegExLib looks like it would do the trick and pull just the Turf Tires table data. However, I wouldn't know where to begin.

Thanks


Shane B.


Edited by - shaneb on 11 August 2007 22:52:08

pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 12 August 2007 : 11:55:04
The following may help...

http://msdn2.microsoft.com/en-us/library/ms974570.aspx
http://www.brettb.com/VBScriptRegularExpressions.asp
http://www.devx.com/vb2themax/Tip/18636

shaneb
Junior Member

USA
319 Posts

Posted - 20 August 2007 : 02:19:50
All I can seem to find is the XMLHTTP script I already posted, or a .NET solution at http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack, but I am using classic ASP.

I can't use regular expressions; I don't know how to use them. I guess the best I can do is capture the whole page. I wanted a cleaner look, but I'm tired of looking for a solution I can understand and implement.

If someone can come up with a way to do this for me, so that it parses the HTML, grabs only the Turf Tires table data, and writes it back out as a plain HTML table, I'll pay you $20.00 USD via PayPal.
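One way to cover the last step (writing the data back out as a plain HTML table) is sketched below. BuildPlainTable is a hypothetical helper: it assumes you already have the HTML of the single Turf Tires table in a string, and it uses regular expressions to pull the text of each <td>/<th> cell, dropping the original fonts and styles. Regexes are a rough tool for HTML, so this assumes simple, non-nested tables.

```vbscript
' Hypothetical helper: rebuilds one table as a plain <table> that holds
' only the text of each cell (original fonts, styles and attributes dropped).
Function BuildPlainTable(tableHtml)
    Dim rowRe, cellRe, textRe, rows, row, cells, cell, cellText, html

    Set rowRe = New RegExp
    rowRe.Pattern = "<tr[^>]*>([\s\S]*?)</tr>"   ' SubMatches(0) = row body
    rowRe.IgnoreCase = True
    rowRe.Global = True

    Set cellRe = New RegExp
    cellRe.Pattern = "<t[dh][^>]*>([\s\S]*?)</t[dh]>"   ' <td> or <th> cells
    cellRe.IgnoreCase = True
    cellRe.Global = True

    Set textRe = New RegExp
    textRe.Global = True

    html = "<table border=""1"">" & vbCrLf
    Set rows = rowRe.Execute(tableHtml)
    For Each row In rows
        html = html & "<tr>"
        Set cells = cellRe.Execute(row.SubMatches(0))
        For Each cell In cells
            ' Drop inner tags (font, b, p...), then collapse whitespace.
            textRe.Pattern = "<[^>]+>"
            cellText = textRe.Replace(cell.SubMatches(0), " ")
            textRe.Pattern = "\s+"
            cellText = Trim(textRe.Replace(cellText, " "))
            html = html & "<td>" & cellText & "</td>"
        Next
        html = html & "</tr>" & vbCrLf
    Next
    BuildPlainTable = html & "</table>"
End Function
```

For example, Response.Write BuildPlainTable(tableHtml), where tableHtml holds the extracted table; the plain table can then be styled with your own CSS.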

Thanks guys!


Shane B.


Podge
Support Moderator

Ireland
3776 Posts

Posted - 20 August 2007 : 08:22:34
<%
 ' URL of the web page we want to retrieve
 Dim thisURL, GetConnection, ResponsePage, posStart, posTable, posEnd
 thisURL = "http://www.i44speedway.com/Trackpoints.htm"

 ' Create the XMLHTTP object
 Set GetConnection = CreateObject("Microsoft.XMLHTTP")

 ' Connect to the URL (synchronous GET request)
 GetConnection.Open "GET", thisURL, False
 GetConnection.Send

 ' ResponsePage now holds the HTML returned
 ' by the remote web server
 ResponsePage = GetConnection.responseText
 Set GetConnection = Nothing

' Alternative: write out the table found by the regexp function below
'Response.Write getTable(ResponsePage)

' Find the Turf Tires anchor, then the first <table ...> and </table> after it
' (the last argument, 1, makes InStr ignore case)
posStart = InStr(1, ResponsePage, "<a name=""Turf Tires"">Turf Tires</a>", 1)
If posStart > 0 Then
  posTable = InStr(posStart, ResponsePage, "<table", 1)
  If posTable > 0 Then
    posEnd = InStr(posTable, ResponsePage, "</table>", 1)
    If posEnd > 0 Then
      ' Write from "<table" up to and including "</table>" (8 characters)
      Response.Write Mid(ResponsePage, posTable, posEnd - posTable + 8)
    End If
  End If
End If

Function getTable(pageString)
  Dim RegularExpressionObject, myMatches

  Set RegularExpressionObject = New RegExp
  With RegularExpressionObject
    ' [\s\S]*? matches lazily across line breaks; "." alone stops at a line feed
    .Pattern = "<table class=""MsoTableGrid"" border=""1"" cellspacing=""1"" style=""border: 3px ridge #0000FF; padding-left: 4; padding-right: 4; padding-top: 1; padding-bottom: 1"">[\s\S]*?</table>"
    .IgnoreCase = True
    .Global = True
  End With

  ' IgnoreCase already handles case, so the page is not lowercased first
  Set myMatches = RegularExpressionObject.Execute(pageString)
  If myMatches.Count > 0 Then
    getTable = myMatches(0).Value
  Else
    getTable = ""
  End If

  Set RegularExpressionObject = Nothing
End Function
%>


I couldn't get the regexp to work correctly, so I used InStr to find the start and end points of the table. I've included the regexp function so you can see how to use matches, but you don't need it for this to work. The Response.Write Mid(...) line is the one that outputs the table.

Podge.


Edited by - Podge on 20 August 2007 08:36:36

pdrg
Support Moderator

United Kingdom
2897 Posts

Posted - 20 August 2007 : 08:29:25
I'm afraid it'll have to be pseudocode from me... no warranties for this code, but I hope it gives you enough of a starting point to get you going. If it's worth it, donate the $20 to the Snitz fund; it helps towards Huw's hosting costs, as we don't carry any advertising!

sourcestr = ResponsePage   ' all the page text, fetched as above
startpos = InStr(sourcestr, ">Turf Tires</a></font></b></p>") + 0
mystring = Right(sourcestr, Len(sourcestr) - startpos)
mystring = Left(mystring, InStr(mystring, ">Multi Class</a></font></b></p>") + 0)

Response.Write mystring

It's not easy to pick unique markers that don't introduce extra quote-mark complications, but the ones I've used above will work. If you find they return too much or too little, then instead of the + 0 (clearly meaningless; it's there as a safe placeholder) you could put -6 or +12, etc., until you get the result you want.

Hope it helps

Podge
Support Moderator

Ireland
3776 Posts

Posted - 20 August 2007 : 15:18:55
Finally got a version working with the regexp. It's a more elegant solution and probably more reliable. It matches all the tables on the page and puts them into a Matches collection called myMatches. You can Response.Write any table on the page with this code by changing the index on the myMatches(4) line.

<%
 ' URL of the web page we want to retrieve
 Dim thisURL, GetConnection, ResponsePage
 thisURL = "http://www.i44speedway.com/Trackpoints.htm"

 ' Create the XMLHTTP object
 Set GetConnection = CreateObject("Microsoft.XMLHTTP")

 ' Connect to the URL (synchronous GET request)
 GetConnection.Open "GET", thisURL, False
 GetConnection.Send

 ' ResponsePage now holds the HTML returned
 ' by the remote web server
 ResponsePage = GetConnection.responseText
 Set GetConnection = Nothing

' Write out just the selected table
Response.Write getTable(ResponsePage)

Function getTable(pageString)
  Dim RegularExpressionObject, myMatches

  Set RegularExpressionObject = New RegExp
  With RegularExpressionObject
    ' <table[^>]*> stops at the end of the opening tag; [\s\S]*? then
    ' matches lazily (across line breaks) up to the first </table>
    .Pattern = "<table[^>]*>[\s\S]*?</table>"
    .IgnoreCase = True
    .Global = True   ' collect every table, not just the first
  End With

  Set myMatches = RegularExpressionObject.Execute(pageString)

  ' myMatches is zero-based, so index 4 is the fifth table on the page
  If myMatches.Count > 4 Then
    getTable = myMatches(4).Value
  Else
    getTable = ""
  End If

  Set RegularExpressionObject = Nothing
End Function
%>
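A fixed index such as myMatches(4) breaks if the track adds or removes a table above Turf Tires. A possible alternative, sketched below with a hypothetical GetTableAfter function, finds a unique marker first and returns the first table that starts after it. It assumes the marker occurs first where you expect it; the site's navigation may also contain the words "Turf Tires", so the anchor tag `<a name="Turf Tires">` is likely a safer marker than the bare heading text.

```vbscript
' Hypothetical helper: returns the first <table>...</table> that starts
' after the given marker, instead of relying on a fixed match index.
Function GetTableAfter(pageText, marker)
    Dim re, matches, m, markerPos
    GetTableAfter = ""

    markerPos = InStr(1, pageText, marker, vbTextCompare)
    If markerPos = 0 Then Exit Function

    Set re = New RegExp
    re.Pattern = "<table[^>]*>[\s\S]*?</table>"
    re.IgnoreCase = True
    re.Global = True

    Set matches = re.Execute(pageText)
    For Each m In matches
        ' FirstIndex is zero-based; InStr positions are one-based.
        If m.FirstIndex + 1 > markerPos Then
            GetTableAfter = m.Value
            Exit Function
        End If
    Next
End Function
```

In the page above, that would be Response.Write GetTableAfter(ResponsePage, "<a name=""Turf Tires"">"), with the quotes doubled as VBScript requires.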

Podge.

Snitz Forums 2000 © 2000-2021 Snitz™ Communications
Powered By: Snitz Forums 2000 Version 3.4.07