I recently wrote a script that automatically downloads data from a website that requires a login and password, without anyone having to log in manually. I wanted to share it with anyone who may need something similar.
Investbulls is a website that supplies stock data for the BSE (Bombay Stock Exchange). It is free but requires registration. Once registered, users normally log in to the website manually, navigate to the download pages, select the previous trading day's data, download it, log out, and then open the downloaded .zip file on their computer.
This script automates all of that. It is written in biterscripting. I have commented the code to explain the script's workflow.
Investbulls is a PHP site. After login, the site sends the client a cookie, which must be submitted in the headers of all subsequent requests to their server. I use the command 'script SS_ISCookies.txt' to do that.
Code:
# Script investbulls.txt
# Input arguments
var string username # Username that one would manually enter on the login page
var string password # Password that one would manually enter on the login page
var string out, downloadlink, description, zipfile
# Start a browser session and connect it to investbulls.com.
isstart ib "daily download of EOD" "Mozilla/4.0"
isconnect ib "http://www.investbulls.com" > $out
# Submit the login form.
issubmit ib "/news.php" ("username="+$username) ("userpass="+$password) "autologin=0" "userlogin=Login" > null
# Exchange cookies with the server.
script SS_ISCookies.txt from("ib") to("ib")
# Retrieve the page that has the links for the last few days.
# The link at the top is the immediately previous day's data - this
# is the data we are interested in.
isretrieve ib "/download.php?list.40" > $out
# The source for this page is now in string $out.
# Do string extractions (stex command) to get the link for yesterday.
# Also, get its description. We will use it to save this data under that file name.
stex -c "]^<a href='download.php?view^" $out > null
stex -c "^<a href='^]" $out > null
stex -c "]^'^" $out > $downloadlink
stex -c "^'>^]" $out > null
stex -c "]^<^" $out > $description
set $zipfile=$description+".zip"
# Yesterday's URL is in $downloadlink. Get that page. It has a link
# that, when clicked in a browser, starts the file download.
isretrieve ib $downloadlink > $out
stex -c "]^<a href='request.php?^" $out > null
stex -c "^<a href='^]" $out > null
stex -c "]^'^" $out > $downloadlink
# OK. The link to the actual downloadable .zip file is now in $downloadlink.
# Retrieve this file in binary (-b) mode.
isretrieve -b ib $downloadlink > $out
# The binary file data is in session's buffer. Save it to the local file.
issave -b ib $zipfile
# Open the local .zip file for the user.
system ("\""+$zipfile+"\"")
# We are done. Click the 'logout' link, disconnect session, close browser.
isretrieve ib "index.php?logout" > null
isdisconnect ib
isend ib
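For readers who do not know the stex command, here is a rough Python sketch of what the first group of string extractions in the script does: find the first link whose target starts with "download.php?view", take its href, and take the link text as the description. This is only an illustration, not part of the script. The function name first_link and the sample markup are made up; the real page will look different.

```python
def first_link(html, prefix):
    """Return (href, text) of the first <a href='...'> whose target starts with prefix.

    Returns None if no such link exists. Assumes single-quoted href values,
    as in the extraction patterns used by the script.
    """
    opener = "<a href='"
    start = html.find(opener + prefix)
    if start == -1:
        return None
    href_start = start + len(opener)
    href_end = html.index("'", href_start)       # closing quote of the href
    text_start = html.index(">", href_end) + 1   # end of the opening <a ...> tag
    text_end = html.index("<", text_start)       # start of the closing </a>
    return html[href_start:href_end], html[text_start:text_end]


# Hypothetical markup for illustration only.
sample = (
    "<td><a href='download.php?view.123'>BSE EOD 01-Jan</a></td>"
    "<td><a href='download.php?view.122'>older</a></td>"
)
link, description = first_link(sample, "download.php?view")
zipfile_name = description + ".zip"
```

Because the newest day is listed first, taking the first match is enough; no loop over the remaining links is needed.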
To try:
Save the script in the file "C:/Scripts/investbulls.txt", start biterscripting, and enter this command:
Code:
script "C:/Scripts/investbulls.txt" username("your user name") password("your password")
In the above command, replace "your user name" and "your password" with the username and password you registered with Investbulls.
On my computer, the script takes about a minute to run. When it finishes, the user sees the downloaded .zip file opened. No actual browser window is opened; the script simulates the communication a browser would carry out with the web server.
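To make the cookie part of that simulated communication more concrete, here is a small Python sketch of the idea: take the Set-Cookie values the server sent after login and build the Cookie header that goes back with every later request. I do not know how SS_ISCookies.txt does this internally, so treat this as an illustration of the principle only. The function name cookie_header and the cookie values are made up; PHPSESSID is PHP's default session cookie name, but whether this particular site uses it is an assumption.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Build a Cookie request header value from Set-Cookie response values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # load() parses "name=value; attr=..." and keeps attributes such as
        # path separate from the cookie's own name and value.
        jar.load(value)
    # Only name=value pairs are sent back; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie values, for illustration only.
received = ["PHPSESSID=abc123; path=/", "lang=en; path=/"]
header = cookie_header(received)
```

The resulting string is what a browser would put after "Cookie:" in the headers of each subsequent request.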
Thought this may be useful to someone in the future.
Similar code will work on most PHP sites. For .NET sites, you may need to extract some variable values such as __VIEWSTATE from each response ($out) and return them to the server with the next request. Feel free to post your question if you need help automating downloads from .NET web sites.
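As a starting point for the .NET case, here is a hedged Python sketch of the "extract and return" step: collect every hidden form field (which is where __VIEWSTATE lives) from a response and include those fields in the body of the next request. The class name HiddenFields, the function next_form_body, and the sample form are made up for illustration; a real page may need more handling than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFields(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def next_form_body(page_html, extra):
    """Return a URL-encoded body: hidden fields from the page plus extra fields."""
    parser = HiddenFields()
    parser.feed(page_html)
    body = dict(parser.fields)
    body.update(extra)  # the values the user would have typed or clicked
    return urlencode(body)


# Hypothetical response markup, for illustration only.
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="abc" />'
    '<input type="text" name="user" /></form>'
)
body = next_form_body(sample, {"user": "me"})
```

Collecting all hidden fields, rather than only __VIEWSTATE, avoids having to know in advance which ones a given page expects back.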