
Thread: Automatic download of data from website that requires login and password

  1. #1
    Join Date
    Dec 2009
    Posts
    23

    Automatic download of data from website that requires login and password

    I recently wrote a script that automatically downloads data from a website that requires a login and password, without logging in manually. I wanted to share it with those who may need something similar.

    Investbulls is a website that supplies stock data for the BSE (Bombay Stock Exchange). It is free but requires registration. After that, its users manually log in to the website, navigate to the download pages, select the previous trading day's data, download it, log out, and then open the downloaded .zip file on their computer.

    This script automates all of that. It is written in biterscripting. I have commented the code to explain the script's workflow.

    Investbulls is a PHP site. After logging in, the site sends the client a cookie, which must be submitted in the headers of all subsequent requests to the server. I am using the command 'script SS_ISCookies.txt' to do that.


    Code:
    # Script investbulls.txt
    # Input arguments
    var string username # Username that one would manually enter on the login page
    var string password # Password that one would manually enter on the login page
    
    var string out, downloadlink, description, zipfile
    
    # Start a browser session and connect it to investbulls.com.
    isstart ib "daily download of EOD" "Mozilla/4.0"
    isconnect ib "http://www.investbulls.com" > $out
    
    # Submit the login form.
    issubmit ib "/news.php" ("username="+$username) ("userpass="+$password) "autologin=0" "userlogin=Login" > null
    
    # Exchange cookies with the server.
    script SS_ISCookies.txt from("ib") to("ib")
    
    # Retrieve the page that has the links for the last few days.
    # The link at the top is the immediately previous day's data - this
    #    is the data we are interested in.
    isretrieve ib "/download.php?list.40" > $out
    
    # The source for this page is now in string $out.
    # Do string extractions (stex command) to get the link for yesterday's data.
    # Also get its description; we will use it as the file name when saving the data.
    stex -c "]^<a href='download.php?view^" $out > null
    stex -c "^<a href='^]" $out > null
    stex -c "]^'^" $out > $downloadlink
    stex -c "^'>^]" $out > null
    stex -c "]^<^" $out > $description
    set $zipfile=$description+".zip"
    
    # Yesterday's URL is in $downloadlink. Get that page. It contains a link
    # which, when clicked, starts the actual file download.
    isretrieve ib $downloadlink > $out
    stex -c "]^<a href='request.php?^" $out > null
    stex -c "^<a href='^]" $out > null
    stex -c "]^'^" $out > $downloadlink
    
    # OK. The link to the actual downloadable .zip file is now in $downloadlink.
    # Retrieve this file in binary (-b) mode.
    isretrieve -b ib $downloadlink > $out
    
    # The binary file data is in the session's buffer. Save it to a local file.
    issave -b ib $zipfile
    
    # Open the local .zip file for the user.
    system ("\""+$zipfile+"\"")
    
    # We are done. Click the 'logout' link, disconnect session, close browser.
    isretrieve ib "index.php?logout" > null
    isdisconnect ib
    isend ib

    To try:

    Save the script as file "C:/Scripts/investbulls.txt", start biterscripting, and enter this command.

    Code:
    script "C:/Scripts/investbulls.txt" username("your user name") password("your password")

    In the above command, use the username and password registered with investbulls in place of "your user name" and "your password".

    On my computer, the script takes about a minute to run. When it is done, the user sees the downloaded .zip file opened. No actual browser window is opened; the script simulates the communication a browser would carry out with the web server.
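
    For anyone who prefers Python, here is a rough sketch of the same flow using the requests library. The paths and form field names come from the script above; the HTML patterns mirror the stex extractions and are only guesses about the page layout, so they may need adjusting. A requests.Session keeps the login cookie in memory for the whole run, much like the SS_ISCookies.txt step.

    Code:
    # Rough Python sketch of the same workflow (assumptions: the login form
    # is a POST to /news.php and the pages match the patterns below).
    import re
    import requests

    BASE = "http://www.investbulls.com"

    def download_latest_eod(username, password):    # hypothetical helper name
        s = requests.Session()                      # keeps cookies in memory
        s.headers["User-Agent"] = "Mozilla/4.0"

        # Submit the login form; the session stores the cookie the server returns.
        s.post(BASE + "/news.php", data={
            "username": username,
            "userpass": password,
            "autologin": "0",
            "userlogin": "Login",
        })

        # Page listing the last few days; the top link is yesterday's data.
        listing = s.get(BASE + "/download.php?list.40").text
        m = re.search(r"<a href='(download\.php\?view[^']*)'>([^<]+)<", listing)
        if m is None:
            raise RuntimeError("download link not found; page layout may differ")
        view_url, description = m.group(1), m.group(2)

        # The view page holds the link to the actual .zip file.
        view_page = s.get(BASE + "/" + view_url).text
        m = re.search(r"<a href='(request\.php\?[^']*)'", view_page)
        if m is None:
            raise RuntimeError("zip link not found; page layout may differ")

        # Fetch the zip as binary data and save it under the description.
        zip_name = description + ".zip"
        with open(zip_name, "wb") as f:
            f.write(s.get(BASE + "/" + m.group(1)).content)

        # Click the logout link and return the saved file name.
        s.get(BASE + "/index.php?logout")
        return zip_name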

    Thought this may be useful to someone in the future.

    Similar code will work on most PHP sites. For .NET sites, you may need to extract some variable values, such as __VIEWSTATE, from each response ($out) and return them to the server with the next request, as sketched below. Feel free to post your question if you need help automating downloads from .NET websites.
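
    As a rough illustration of the .NET case, the sketch below reads __VIEWSTATE (and __EVENTVALIDATION) out of one response and echoes them back with the next request. The URL, input layout, and form field names here are hypothetical, purely for illustration.

    Code:
    # Sketch: carrying ASP.NET state fields across requests.
    # The page name and field names are hypothetical.
    import re
    import requests

    def hidden_field(html, name):
        # Pull the value of a hidden input such as __VIEWSTATE out of the page.
        m = re.search(r'name="%s"[^>]*value="([^"]*)"' % re.escape(name), html)
        return m.group(1) if m else ""

    s = requests.Session()
    page = s.get("http://example.com/Login.aspx").text   # placeholder URL

    # Echo the server's state fields back along with the login fields.
    s.post("http://example.com/Login.aspx", data={
        "__VIEWSTATE": hidden_field(page, "__VIEWSTATE"),
        "__EVENTVALIDATION": hidden_field(page, "__EVENTVALIDATION"),
        "txtUser": "your user name",     # hypothetical form field names
        "txtPass": "your password",
        "btnLogin": "Login",
    })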

  2. #2
    Join Date
    Aug 2009
    Posts
    2,881

    Re: Automatic download of data from website that requires login and password

    The tip is really helpful for getting data from various sites without logging in each time. But the process is a bit lengthy and needs some advanced technical skills. Here is one more solution by which you can enter some sites and forums without registering an account. For that, you just need a user-agent addon for Firefox, configured to identify the browser as Googlebot. You get access to the site without a login ID and password. But some sites can sue you for this.
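
    If you want to try the same idea from a script rather than a Firefox addon, the sketch below simply sends a request with Googlebot's user-agent string (example.com is a placeholder). As noted, many sites forbid this in their terms of use, so use it at your own risk.

    Code:
    # Sketch: requesting a page while identifying as Googlebot.
    # example.com is a placeholder; many sites ignore or block this trick.
    import requests

    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    }
    resp = requests.get("http://example.com/members-only-page", headers=headers)
    print(resp.status_code)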

  3. #3
    Join Date
    Dec 2009
    Posts
    23

    Re: Automatic download of data from website that requires login and password

    I would be quite surprised if this can be done using Firefox without a proper login and password. I am not saying it can't be done, just that I can't imagine how it can be done, and I would be surprised.

    My scripting approach isn't that bold: it requires a proper login and password. In essence, we are creating our own mini-browser in a scripting language; we are merely automating the manual typing, reading, entering, pointing, and clicking.

    Notice that I did not hardcode the login and password in the script, so the caller of the script must supply their own. That way, if the website does billing based on the login, the website will have the correct information.

    Another advantage of the scripting approach, in case you didn't notice, is that the site's cookies remain in the script's buffer; they are never written to the hard drive. Google Analytics scripts embedded in most websites routinely read the cookies on the browser's computer and collect information on which sites you visited, even if you never reached those sites through Google.

    With the scripting approach, Google would not see your cookies. That is the right approach, since Google does not need to know, nor has any right to know, which sites you visit if you did NOT go to those sites through Google.
    Last edited by ranjankumar09; 20-04-2010 at 08:51 PM.

