How to Mine/Extract/Pull Data from your Web Browser into Visual Basic.NET

By md2thomas

Sometimes the data you want to pull into Visual Basic.NET is located somewhere on a web page. Using the process described here you can usually pull this data directly from the web page into VB and do whatever you want with the data from there.

Difficulty: Moderately Challenging
Instructions

Things You'll Need:

  • Visual Basic.NET installed on your PC
  • Familiarity with VB programming in general.
  • Basic understanding of Object Properties and Methods.
  1. Step 1
     

    Set up the required reference:

    * Right-click the References icon, click Add Reference, then click Browse and select c:\windows\system32\shdocvw.dll to add it as a reference. Your system32 directory may be under c:\winnt or another path, depending on your version of Windows.

  2. Step 2

    Set up the object variable so it can be used throughout your form:

    * Put the following code directly below any Inherits statement in your form class:
    Dim ie As New SHDocVw.InternetExplorer

    In all the remaining steps, I'm going to use the variable name "ie" as the browser object. You can call yours anything you like.

    eHow would not let me put all the finished code into a single step (trying made the site complain and forced me to redo the entire how-to), so I added a resource at the bottom that links to it.

  3. Step 3

    Initiate the Web Browser Object:

    * In the Form1_Load sub initialize the variable with the following code:
    ie.Top = -4
    ie.Left = -4
    ie.Height = 480
    ie.Width = 640
    ie.Visible = True

    Top, Left, Height, Width and Visible are some of the properties of the browser object. A link is set up as a resource at the bottom if you need to brush up on the properties and methods available on this object.

    After copying/pasting the above code, run your program - it should start Internet Explorer. This object always drives Internet Explorer, even if another browser is your default. You can close the browser once you confirm that the program launches it.

  4. Step 4

    Navigate to the website that you want to pull data from:

    * In my example, we're going to go to finance.yahoo.com and pull the current values of the Dow, Nasdaq and S&P. Add the following code under the previous code and execute your program:

    ie.Navigate("finance.yahoo.com")

    Navigate is one of the many methods available on the browser object. Note that VB.NET requires parentheses around method arguments.

  5. Step 5

    Add a command button to execute the code that pulls the web page content:

    Add a button to your project's only form (Form1), double-click it, and add (copy/paste) the following code in its Button1_Click sub:

    ' ie.Document is late bound, so this code requires Option Strict Off.
    Dim workpage As String()
    Dim x, workpagecnt As Integer

    ' Split the page text into lines on the line-feed character.
    workpage = Split(ie.Document.activeElement.innerText, Chr(10))
    workpagecnt = UBound(workpage)
    For x = 0 To workpagecnt
        ' Report every line that names an index and contains a percent value.
        For Each index As String In New String() {"Dow", "Nasdaq", "S&P"}
            If InStr(workpage(x), index) > 0 AndAlso InStr(workpage(x), "%") > 0 Then
                ' Show the index name and the last 7 characters of the line.
                MsgBox(index & " " & Microsoft.VisualBasic.Right(workpage(x), 7))
            End If
        Next
    Next x
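
    The matching logic in the loop above can be tried without a browser. Here is a minimal sketch, assuming the page text arrives as a single string. The module name IndexScan, the function FindIndexLines and the sample text are made up for illustration and are not part of the original project.

```vbnet
Imports System
Imports System.Collections.Generic

Module IndexScan
    ' Returns "<index name> <last 7 characters>" for every line that names
    ' one of the given indexes and also contains a percent sign.
    Function FindIndexLines(ByVal pageText As String, ByVal indexNames As String()) As List(Of String)
        Dim results As New List(Of String)
        ' Split on line feed, then trim stray carriage returns and spaces.
        For Each rawLine As String In pageText.Split(Chr(10))
            Dim line As String = rawLine.Trim()
            For Each indexName As String In indexNames
                If line.Contains(indexName) AndAlso line.Contains("%") Then
                    ' Take at most the last 7 characters of the line.
                    Dim tail As String = line.Substring(Math.Max(0, line.Length - 7))
                    results.Add(indexName & " " & tail)
                End If
            Next
        Next
        Return results
    End Function

    Sub Main()
        ' Hypothetical sample text standing in for the page's innerText.
        Dim sample As String = "Dow 9,350.05 +0.75%" & Chr(10) &
                               "Some headline" & Chr(10) &
                               "Nasdaq 1,989.22 +1.01%"
        For Each hit As String In FindIndexLines(sample, New String() {"Dow", "Nasdaq", "S&P"})
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```

    Keeping the text processing in its own function makes it easy to experiment with different delimiters and string functions before wiring the result back into Button1_Click.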

  6. Step 6
     

    Getting the data:

    * Execute the program. Once the webpage fully loads, click on the command button. You should receive 3 message boxes, each displaying an index name and its up/down percent.
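
    If you would rather have the program wait for the page than judge by eye, a polling loop is one option. The following is a rough sketch under stated assumptions: the module name PageWait and the helper WaitUntil are hypothetical, and the condition used in Main is a stand-in so the sketch runs without a browser. With the ie object from Step 2, the condition could be Function() Not ie.Busy, since Busy is a property of the InternetExplorer object.

```vbnet
Imports System
Imports System.Threading

Module PageWait
    ' Polls the condition every pollMs milliseconds until it returns True
    ' or roughly timeoutMs milliseconds have been spent waiting.
    ' Returns True if the condition was met, False on timeout.
    Function WaitUntil(ByVal condition As Func(Of Boolean), ByVal timeoutMs As Integer, ByVal pollMs As Integer) As Boolean
        Dim waited As Integer = 0
        While Not condition.Invoke()
            If waited >= timeoutMs Then Return False
            Thread.Sleep(pollMs)
            waited += pollMs
        End While
        Return True
    End Function

    Sub Main()
        ' Stand-in condition: becomes True about 200 ms after start.
        ' With the article's browser object you might instead pass:
        '   Function() Not ie.Busy
        Dim startedAt As DateTime = DateTime.Now
        Dim ready As Boolean = WaitUntil(Function() (DateTime.Now - startedAt).TotalMilliseconds > 200, 5000, 50)
        Console.WriteLine("Ready: " & ready.ToString())
    End Sub
End Module
```

    Note that sleeping on the form's own thread freezes the window while it waits, so keep the timeout short or stick with the manual button click described above.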

  7. Step 7

    Final thoughts:

    * I've mined data from countless web pages over the years, and the process is very similar no matter which site I'm working with. Sometimes it's easier to use ie.Document.activeElement.innerHTML rather than ie.Document.activeElement.innerText. I review both for each site to see which can be dissected most easily.
    * Sometimes, instead of using Chr(10) with the Split function, it's easier to use Chr(13) or both of them together. Occasionally you will need a completely different delimiter. Just adjust as needed.
    * Once you find the data you're after within either innerHTML or innerText, the string functions you use to extract that data vary from site to site. The good news is that VB.NET has a very rich set of string processing functions.
    * If all you'd like to do is get this sample working: set up the reference as explained, then add the command button. From there, you can click on the bottom resource and copy/paste all of the code over your project's code :)
    * If you've found this article helpful, please rate it accordingly :)
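
    As an illustration of the delimiter and string-function points above, here is a small sketch that splits on Chr(13) and Chr(10) together and picks out the percent value by token rather than by a fixed character count. The module name LineSplit, the functions SplitLines and LastPercentToken, and the sample text are hypothetical. Treat it as one possible approach, not the only one.

```vbnet
Imports System

Module LineSplit
    ' Splits text on carriage return and/or line feed, dropping empty lines.
    Function SplitLines(ByVal text As String) As String()
        Dim separators As Char() = {Chr(13), Chr(10)}
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' Returns the last space-separated token in the line that ends with "%",
    ' or an empty string if the line has no such token.
    Function LastPercentToken(ByVal line As String) As String
        Dim tokens As String() = line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        For i As Integer = tokens.Length - 1 To 0 Step -1
            If tokens(i).EndsWith("%") Then Return tokens(i)
        Next
        Return ""
    End Function

    Sub Main()
        ' Hypothetical sample text with mixed line endings.
        Dim sample As String = "Dow 9,350.05 +0.75%" & Chr(13) & Chr(10) & "S&P 1,007.37 +1.09%"
        For Each line As String In SplitLines(sample)
            Console.WriteLine(LastPercentToken(line))
        Next
    End Sub
End Module
```

    Picking the value by token avoids cutting a number in half when a percent figure is longer or shorter than 7 characters.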

Comments  

md2thomas said


on 8/20/2009 So glad to hear users are finding this how-to useful. Please suggest other such how-to's that you would like to see!

davecross said


on 8/20/2009 Excellent how-to on mining the data. I wish I had found this a while back. 5*

Copyright © 1999-2009 eHow, Inc.