Time for the fun stuff now. The holy grail for a lot of Internet Marketers is automation. This can be obtained through simple iMacros scripts, some PHP scripts on a server, or with a little tool called Watir using the Ruby programming language. All of these combos have their own inherent advantages and disadvantages, but that’s not something I’m going to go over here. I like to use Watir for a lot of botting needs, so that’s what I’m going to show you how to do today.
Why Ruby? Why Watir?
For anyone who has had the joy of switching to Ruby from other languages, this question should be a no-brainer. Ruby is a fun language: it’s clear and concise, reads like English, and has a great collection of plugins (gems) that let you significantly expand the capabilities of your apps. Ruby obviously came to fame because of Rails, which is a really nice and powerful framework, but is a whole different beast from web bots. Ruby is a great general-purpose language that can be used for a lot of different projects.
As for Watir, here’s the basic rundown. Watir is a browser-based testing tool for web app front ends. Automated testing is a big deal for guys who work on large code bases, but when it comes to doing assertions on the visual side of things, there’s no substitute for simulating a browser. What Watir allows you to do is directly control Firefox (or Chrome, Safari, or even IE) through Ruby code to simulate a user on a site, for instance to make sure your site doesn’t break down if a user clicks a button 10,000 times. So Watir exists for noble causes, but there are obviously other ways you can utilise the power that it gives you.
Admittedly, it’s not the fastest botting solution out there, because Firefox is heavy as hell. Firefox’s heft also makes running concurrent (threaded) bots difficult. But no other solution allows you to so quickly and easily get up and running with a fully functional, robust bot. Plus, you can watch the thing run in your browser as you’re testing, so you can see exactly what you need to do next. It’s sweet.
Let’s Get Started
Ok, so there are a couple of things you need to do to prepare to write bots with Watir. First, you need to get Ruby installed. On Windows, there are things like the One Click installer or just the actual Ruby installer. On Linux, like Ubuntu (what I’m running), you can use apt-get to install everything you need. I’m not going to go over how to install everything; I’m just going to point you to Watir’s installation page, Install Watir.
Next, you should install Firefox. You could technically run this bot in any browser, but there are a couple of plugins that make this a lot easier and you can only use those in Firefox. So go do it. Once Firefox is installed, you need to get the JSSH Plugin installed as well. The instructions for that are on the Install Watir page too.
Next, you need to install those Firefox plugins. Here are the ones that we’ll use in this guide:
- Test-Wise Recorder – This plugin is amazing. It records your interactions with a site and spits out Ruby Watir code. It’s not 100% all of the time, but it is a damn good starting point.
- Firebug – If you want to write web bots but don’t know what Firebug is, just, uhhh, come on. Go get it. Go love it.
- FireXPath – This lets you right click any item on the page and grab the exact XPath string to access it through code. It is super handy for a lot of these projects.
Finally, you need to install the Ruby Gem Nokogiri. Your best bet is to actually install Mechanize, because Mechanize comes with Nokogiri, and you’re going to need both of them eventually, so you might as well. Should be as simple as:
[cc lang="bash"]
gem install mechanize
[/cc]
If that doesn’t work right, fire up Google and get it to work.
So what are we going to build? There are obviously a lot of dubious projects that you can do with the tools described in this guide, but for now, I’m going to keep things nice and white hat. So today, we’re going to build a bot that will automatically scrape the forecast for a location 7 days from today. I know, super useful.
So anywho, fire up your browser and go to weather.com. Now, when botting, you want to run through the process by hand (often more than once) to get a feel for exactly what you’re going to do. So go ahead and type your location into the search box up top (I did San Diego, CA for this guide). That takes you to the forecast page for your location.
Ok, so to get the forecast for a week from today, you need to get over to the “10-Day” page, so click that link on the blue menu bar. Finally, what we want is the 8th row of that table (it includes today, so 7 + 1 = 8, simple math). What we want to pull out of that row is the text underneath the rainy clouds icon: “Showers”. Seems easy enough, right? Let’s get started.
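If you want to double check the “7 + 1” row math, here is a minimal sketch in plain Ruby. The helper names target_date and forecast_row are made up for this illustration, and it assumes the first row of the 10-Day table is always today.

```ruby
require 'date'

# Hypothetical helper: the date `days_ahead` days after `today`.
# Date#+ adds whole days.
def target_date(today, days_ahead)
  today + days_ahead
end

# Hypothetical helper: the 1-based table row for that date,
# assuming row 1 of the table is always today.
def forecast_row(days_ahead)
  days_ahead + 1
end

today = Date.today
puts "Forecast date: #{target_date(today, 7)}"
puts "Table row:     #{forecast_row(7)}"
```

The row number is the only part the bot really needs; the date is just handy for eyeballing that you grabbed the right row.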
Writing The Bot
So now, go back to weather.com’s home page, and fire up “iTest2 Recorder Sidebar” in Firefox (that’s the recorder plugin you installed earlier). A menu will pop up on the left side of the screen. Click the Watir tab on that, then start interacting with the website. As you click around, enter form fields, etc., it records what you do and spits out Ruby code. Really cool.
So go ahead and click into the search box, and type in that location you used when we were exploring the site earlier. Then click “Search”. Now, click on the “10-Day” link in the blue menu bar. Your iTest panel should now have the recorded Watir code in it. Right click in that box, select “Copy All”, then fire up your text editor and paste the code in. You can close up iTest now.
Next, we’re going to get the appropriate requires into your code so we can run this initial script. At the top of your Ruby file, add these lines:
[cc lang="ruby"]
require 'rubygems'
require 'watir'
Watir::Browser.default = "firefox"
[/cc]
All we’re doing there is getting the right gems loaded up, then making sure that when Watir starts running, it uses Firefox by default. Your code should now look like this:
[cc lang="ruby"]
require 'rubygems'
require 'watir'

# start the browser up
Watir::Browser.default = "firefox"
browser = Watir::Browser.start "http://www.weather.com/"

# type the location into the search box and submit it
browser.text_field(:id, "whatwhereForm2").set("san diego, ca")
browser.button(:src, "http://i.imwx.com/web/common/searchbutton.gif").click

# jump to the 10-Day forecast page
browser.link(:text, "10-Day").click
[/cc]
Save that file out as weather.rb, then drop down into the command line/terminal, navigate to where you saved the file, and run it by entering:
[cc lang="bash"]
ruby weather.rb
[/cc]
If everything was set up correctly, you should see Firefox pop up, visit weather.com, type in your location, then click the 10-Day link. Sweet!
Scraping the Forecast with Nokogiri and XPath
So now that we’re on the right page, we need to get that specific forecast for one week from today. In Firefox, right click the “Showers” forecast in the table row for 7 days from today (in this case, it’s April 11th) and click “Inspect XPath”. Copy the XPath string it shows you, “.//*[@id='tenDay']/div[8]/div[1]/div[2]/div[1]/p[1]”, and paste it into your weather.rb file.
So now we’re going to load the current page into Nokogiri and, using the .xpath method, parse out this row of the table and get the text within that <p> tag. Just under your “require 'watir'” line, you need to add this:
[cc lang="ruby"]
require 'nokogiri'
[/cc]
Next, after the current last line of your script (the line that gets us onto the 10-Day forecast page), you’re going to add this:
[cc lang="ruby"]
page_html = Nokogiri::HTML.parse(browser.html)
puts page_html.xpath(".//*[@id='tenDay']/div[8]/div[1]/div[2]/div[1]/p[1]").inner_text
[/cc]
What the hell is that? Well, the first line creates the variable “page_html” and sets it to the result of parsing the current page (accessible via browser.html) with Nokogiri. Watir always gives you the current page’s HTML through browser.html, fyi. What this is basically doing is getting the page locked and loaded in Nokogiri so you can parse out the information you want from the DOM.
The next line prints to the screen (that’s what “puts” does, it’s like “echo” in PHP) the inner_text of whatever is located at the XPath location that we copied from FireXPath earlier. Let me break that down a bit. I’m not going into what XPath is here, but it’s basically a quick way to traverse the DOM. What this XPath says is “go through the whole page, find any element with the id of 'tenDay', find the 8th div inside of that, find the first div inside of that, the 2nd div inside of that, the first div inside of that, then the first p inside of that”. If we left out the .inner_text call, this would spit out the <p> tag itself along with any HTML inside of it. But by adding .inner_text, you are telling Nokogiri to ignore all of the HTML in that <p> tag and just print out whatever text is located in there, which in this case is “Showers”.
So that’s basically it. Your final code should look like this:
[cc lang="ruby"]
require 'rubygems'
require 'watir'
require 'nokogiri'

# start the browser up
Watir::Browser.default = "firefox"
browser = Watir::Browser.start "http://www.weather.com/"
browser.text_field(:id, "whatwhereForm2").set("san diego, ca")
browser.button(:src, "http://i.imwx.com/web/common/searchbutton.gif").click
browser.link(:text, "10-Day").click

# pass in current page's html to nokogiri for parsing
page_html = Nokogiri::HTML.parse(browser.html)
puts page_html.xpath(".//*[@id='tenDay']/div[8]/div[1]/div[2]/div[1]/p[1]").inner_text
[/cc]
Run that with “ruby weather.rb” again, and it should run and stop on the 10-Day forecast page. Check your terminal, and it should print out “Showers”. Boom. You just botted Weather.com.
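If the “8th div, first p” business is still fuzzy, here is a small sketch you can run without a browser. Two loud assumptions: the markup is a made-up three-row stand-in (not weather.com’s real HTML), and it uses REXML, which is bundled with Ruby, instead of Nokogiri so you don’t need any extra gems. The XPath strings work the same way in Nokogiri’s .xpath; REXML’s text plays roughly the role of inner_text here. The names SAMPLE, DOC, all_forecasts and forecast_node are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for the forecast table: three "rows",
# each a div holding a label and a forecast paragraph.
SAMPLE = <<~XML
  <div id="tenDay">
    <div><span>Today</span><p>Sunny</p></div>
    <div><span>Tomorrow</span><p>Cloudy</p></div>
    <div><span>Day 3</span><p>Showers</p></div>
  </div>
XML

DOC = REXML::Document.new(SAMPLE)

# No positions in the path: the <p> in every row matches.
def all_forecasts(doc)
  REXML::XPath.match(doc, ".//*[@id='tenDay']/div/p")
end

# With positions: only the first <p> inside the div at position `row`.
def forecast_node(doc, row)
  REXML::XPath.first(doc, ".//*[@id='tenDay']/div[#{row}]/p[1]")
end

puts all_forecasts(DOC).map(&:text).inspect  # text of every row's <p>
node = forecast_node(DOC, 3)
puts node.to_s   # the element itself, tags included
puts node.text   # just the text inside it
```

Swap the 3 for another row number to see how the position in the path picks the row, which is the same job the 8 does in the weather.com XPath.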
Where To Go From Here?
Ok, so I’ll admit it, that was a really simple and White Hat bot. But, there is A LOT of potential with what I just wrote up if you take some time to plan out some good targets. So start tweaking this code and try botting some other sites. Once you’re comfortable, start thinking about how you can expand your efforts. I want to write some more guides like this for different efforts, but I probably won’t ever write anything very Black Hat just because that’s a job for you to figure out anyways :-). There are a few other useful topics revolving around these concepts though that I will write about soon, so stay tuned. Good luck, D