How To Write Your First Ruby Web Bot In Watir – Scraping Weather.com

Time for the fun stuff now. The holy grail for a lot of Internet Marketers is automation. This can be obtained through simple iMacros scripts, some PHP scripts on a server, or with a little tool called Watir using the Ruby programming language. All of these combos have their own inherent advantages and disadvantages, but that’s not something I’m going to go over here. I like to use Watir for a lot of botting needs, so that’s what I’m going to show you how to do today.

Why Ruby? Why Watir?

For anyone who has had the joy of switching to Ruby from other languages, this question should be a no-brainer. Ruby is a fun language: it’s clear and concise, reads like English, and has a great collection of plugins (gems) that let you significantly expand the capabilities of your apps. Ruby obviously came to fame because of Rails, which is a really nice and powerful framework, but that’s a whole different beast from web bots. Ruby is a great general-purpose language that can be used for a lot of different projects.

As for Watir, here’s the basic rundown. Watir is a browser-based, front-end testing tool for web apps. Automated testing is a big deal for guys who work on large code bases, but when it comes to doing assertions on the visual side of things, there’s no substitute for simulating a browser. What Watir lets you do is directly control Firefox (or Chrome, Safari, or even IE) through Ruby code to simulate a user on a site, so you can make sure your site doesn’t break down if a user clicks a button 10,000 times, for instance. So Watir exists for noble causes, but there are obviously other ways you can utilise the power it gives you.

Admittedly, it’s not the fastest botting solution out there, ’cause Firefox is heavy as hell. Running concurrent (threaded) bots also gets difficult because of Firefox’s heft. But I don’t know of any other solution that gets you up and running with a fully functional, robust bot as quickly and easily. Plus, you can watch the thing run in your browser as you’re testing, so you can see exactly what you need to do next. It’s sweet.

Let’s Get Started

Ok, so there are a couple of things you need to do to prepare to write bots with Watir. First, you need to get Ruby installed. On Windows, there are options like the One-Click Installer or the regular Ruby installer. On Linux, like Ubuntu (what I’m running), you can use apt-get to install everything you need. I’m not going to go over how to install everything; I’m just going to point you to Watir’s installation page, Install Watir.

Next, you should install Firefox. You could technically run this bot in any browser, but a couple of the plugins that make this a lot easier only work in Firefox. So go do it. Once Firefox is installed, you need to get the JSSH plugin installed as well; the instructions for that are on the Install Watir page too. Then install the Firefox plugins we’ll use in this guide:

  • Test-Wise Recorder – This plugin is amazing. It records your interactions with a site and spits out Ruby Watir code. It isn’t 100% accurate all of the time, but it is a damn good starting point.
  • Firebug – If you want to write web bots but don’t know what Firebug is, just, uhhh, come on. Go get it. Go love it.
  • FireXPath – This lets you right click any item on the page and grab the exact XPath string to access it through code. It is super handy for a lot of these projects.

Finally, you need to install the Nokogiri gem. Your best bet is to install Mechanize instead, because Mechanize depends on Nokogiri (so installing it pulls Nokogiri in too), and you’re going to need both of them eventually anyway. Should be as simple as:

gem install mechanize

If that doesn’t work right, fire up Google and get it to work.
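If you want a quick way to check that everything actually installed, a throwaway script like the one below tries to load each gem this guide uses and reports which ones are missing. It’s just a sketch of my own (the gem_status name is made up; it isn’t part of Watir or Mechanize), and it assumes the gems load under the names watir, nokogiri, and mechanize, the same names used in the require lines later in this guide.

```ruby
# check_gems.rb: a hypothetical helper (not part of any gem) that tries to
# load each library this guide uses and reports which ones are missing.
require 'rubygems' # only needed on Ruby 1.8; Ruby 1.9+ loads RubyGems automatically

REQUIRED_GEMS = %w[watir nokogiri mechanize].freeze

# Returns a hash like { "nokogiri" => :ok, "mechanize" => :missing },
# in the same order as the names passed in.
def gem_status(names)
  names.each_with_object({}) do |name, status|
    begin
      require name
      status[name] = :ok
    rescue LoadError
      status[name] = :missing
    end
  end
end

# Only print the report when this file is run directly (ruby check_gems.rb).
if __FILE__ == $PROGRAM_NAME
  gem_status(REQUIRED_GEMS).each { |name, state| puts "#{name}: #{state}" }
end
```

Run it with “ruby check_gems.rb”; anything listed as missing is what you need to go back and install.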

Party Time

So what are we going to build? There are obviously a lot of dubious projects that you can do with the tools described in this guide, but for now, I’m going to keep things nice and white hat. Today, we’re going to build a bot that automatically scrapes the forecast for a location 7 days from today. I know, super useful. So anywho, fire up your browser and go to weather.com, where you’ll land on the home page.

When botting, you want to run through the process by hand (oftentimes, a lot) to get a feel for exactly what your bot is going to do. So go ahead and type your location into the search box up top (I did San Diego, CA for this guide) and search. That takes you to the page for your location. To get the forecast for a week from today, you need to get over to the “10-Day” page, so click that link on the blue menu bar.

Finally, what we want is the 8th row of that table (it includes today, so 7 + 1 = 8; simple math). What we want to pull out of that row is the forecast text underneath the weather icon, which for me was rainy clouds and the word “Showers”. Seems easy enough, right? Let’s get started.
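If you’d rather have the bot work out that 7 + 1 itself instead of hard-coding row 8, a couple of tiny helpers do it. This is a minimal sketch that assumes, like the steps above, that the first row of the 10-Day table is always today; the method names are hypothetical, not anything weather.com or Watir provides.

```ruby
require 'date'

# Row 1 of the 10-Day table is today, so "N days from today" is row N + 1.
# (Assumes the table always starts with today, as it did when this was written.)
def forecast_row_for(days_ahead)
  days_ahead + 1
end

# The calendar date N days from `today`; handy for checking you grabbed the right row.
def forecast_date_for(days_ahead, today = Date.today)
  today + days_ahead
end

days_ahead = 7
puts "Row #{forecast_row_for(days_ahead)} should be #{forecast_date_for(days_ahead)}"
```

For example, forecast_row_for(7) gives 8, and if today were April 4, 2010, forecast_date_for(7, Date.new(2010, 4, 4)) gives April 11, 2010, which is an easy way to double-check that the row you scraped is the day you meant.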

Writing The Bot

So now, go back to weather.com’s home page and fire up “iTest2 Recorder Sidebar” (the Test-Wise Recorder plugin) in Firefox. A panel will pop up on the left side of the screen. Click the Watir tab on that panel, then start interacting with the website. As you click around, fill in form fields, etc., it records what you do and spits out Ruby code. Really cool. So go ahead and click into the search box and type in the location you used when we were exploring the site earlier. Then click “Search”. Now, click the “10-Day” link in the blue menu bar. Your iTest panel should now contain the Watir code for those steps.

Right-click in that panel, select “Copy All”, then fire up your text editor and paste the code in. You can close up iTest now. Next, we’re going to add the appropriate require statements to your code so we can run this initial script. At the top of your Ruby file, add these lines:

 require 'rubygems'
 require 'watir'
 Watir::Browser.default = "firefox"

All we’re doing there is getting the right gems loaded up, then making sure that when Watir starts running, it uses Firefox by default. Your code should now look like this:

 require 'rubygems'
 require 'watir'
 # start the browser up
 Watir::Browser.default = "firefox"
 browser = Watir::Browser.start "http://www.weather.com/"
 browser.text_field(:id, "whatwhereForm2").set("san diego, ca")
 browser.button(:src, "http://i.imwx.com/web/common/searchbutton.gif").click
 browser.link(:text, "10-Day").click

Save that file out as weather.rb, then drop down into the command line/terminal and navigate to where you saved the file and run it by entering:

 ruby weather.rb

If everything was set up correctly, you should see Firefox pop up, visit weather.com, type in your location, and then click the 10-Day link. Sweet!

Scraping the Forecast with Nokogiri and XPath

So now that we’re on the right page, we need to get that specific forecast for one week from today. In Firefox, right-click the “Showers” forecast in the table row for 7 days from today (in this case, it’s April 11th) and click “Inspect XPath”. FireXPath will show you the XPath for that element. Copy the “.//*[@id='tenDay']/div[8]/div/div[2]/div/p” string and paste it into your weather.rb file. Now we’re going to load the current page into Nokogiri and, using the .xpath method, pull out that row of the table and get the text within that <p> tag. Just under your “require 'watir'” line, add this:

 require 'nokogiri'

Next, after the current last line of your script (the line that gets us onto the 10-Day forecast page), add this:

 page_html = Nokogiri::HTML.parse(browser.html)
 puts page_html.xpath(".//*[@id='tenDay']/div[8]/div/div[2]/div/p").inner_text

What the hell is that? Well, the first line creates the variable “page_html” and sets it to the result of parsing the current page with Nokogiri. Watir always hands you the HTML of the page it’s currently on through browser.html, fyi. This basically gets the page locked and loaded in Nokogiri so you can parse the information you want out of the DOM.

The next line prints to the screen (that’s what “puts” does; it’s like “echo” in PHP) the inner_text of whatever is located at the XPath location that we copied from FireXPath earlier. Let me break that down a bit. I’m not going to go into what XPath is here, but it’s basically a quick way to traverse the DOM. This XPath says “search the whole page for any element with the id ‘tenDay’, take the 8th div directly inside it, then the divs directly inside that, then the 2nd div inside each of those, then the divs inside that, and finally the p elements inside that”. Note that a bare div step matches every div child, not just the first, so if there were several at any level you’d get all of their text back.

If we left out the .inner_text call, puts would print the HTML of the matched <p> element, tags and all. By adding .inner_text, you are telling Nokogiri to ignore the HTML markup and give you just the text inside it, which in this case is “Showers”. So that’s basically it. Your final code should look like this:

 require 'rubygems'   # only needed on Ruby 1.8; Ruby 1.9+ loads RubyGems automatically
 require 'watir'
 require 'nokogiri'
 # start the browser up
 Watir::Browser.default = "firefox"
 browser = Watir::Browser.start "http://www.weather.com/"
 # search for the location, then open the 10-Day forecast
 browser.text_field(:id, "whatwhereForm2").set("san diego, ca")
 browser.button(:src, "http://i.imwx.com/web/common/searchbutton.gif").click
 browser.link(:text, "10-Day").click
 # pass in the current page's html to nokogiri for parsing
 page_html = Nokogiri::HTML.parse(browser.html)
 # row 8 of the tenDay table is 7 days from today; print its forecast text
 puts page_html.xpath(".//*[@id='tenDay']/div[8]/div/div[2]/div/p").inner_text

Run that with “ruby weather.rb” again, and it should run and stop on the 10-Day forecast page. Check your terminal, and it should print out “Showers”. Boom. You just botted Weather.com.

Where To Go From Here?

Ok, so I’ll admit it, that was a really simple and White Hat bot. But, there is A LOT of potential with what I just wrote up if you take some time to plan out some good targets. So start tweaking this code and try botting some other sites. Once you’re comfortable, start thinking about how you can expand your efforts. I want to write some more guides like this for different efforts, but I probably won’t ever write anything very Black Hat just because that’s a job for you to figure out anyways :-). There are a few other useful topics revolving around these concepts though that I will write about soon, so stay tuned. Good luck, D

Comments

abroms

Well I’m downloading Ruby etc. and messing around with it as soon as I have time. Great post, clearly laid out, very helpful. Keep it up man!

admin

Thanks! Let me know if you need any help with anything!

skori

How do I get rid of warning pop-ups in Watir? I am not finding any better solution for that.

Adam Hermsdorfer

Nice post Darrin. The author seems to know what he’s talking about ;)

angel

Hi… thanks a lot… a question: is it possible to use Watir with JavaScript instead of HTML? Thanks!!

admin

Not exactly sure what you mean, but if you’re asking whether you can handle pages with JavaScript on them, then yes. You can do everything you could normally do in Firefox, which basically means everything.

Contempt

Will be plugging you in the next post…

admin

Yeahhhhhhhhhh!

Snow Poet

I. Just. Jizzed.

Amazing post man! Please add a donate button on your page. Or at least let me send you a case of beers. I feel like a kid who just discovered a monster-sized easter egg.

admin

And my job is now done.

Chris Kemp

Clever… very clever application.

You’ve chewed up my next three evenings, damn you!

uberVU - social comments

Social comments and analytics for this post…

This post was mentioned on Hackernews by ddemchuk: Hopefully this isn’t in poor taste, I’m not trying to whore out my post. Just thought that some of you guys might be interested in this topic. Please let me know what you think, I want to continue on …

Jon Tucker

Well done Darrin. Helps me visualize more of what we chatted about with Watir.

links, ideas and geek stuff » Blog Archive » links for 2010-04-05

[...] How To Write Your First Ruby Web Bot In Watir [...]

khelll

I’m wondering if the same could be achieved using Mechanize only, no Watir.

admin

Absolutely. The advantage you get from using Watir, though, is that you can utilize the full power of a browser, plus you can record your interactions with a site. The increase in development speed is huge compared to working with Mechanize.

Mechanize does have its advantages and uses, though; I’ll cover those later on.

Thanks for checking it out though!

Eric Davis

I’ve used Mechanize to script some software setups before. It could work for this but development is really slow because you have to go back and forth between a browser and your code.

I can see using watir for quick and dirty scripts and then pulling out mechanize later on if you need to optimize (or run on a server).

Edward

http://www.ihighfive.com/

Go on. You deserved it.

gutterseo

Nice blog post, absolutely perfect for something that I have coming up.

Chri Michael

Great post. Went to download the Firefox extensions and wouldn’t you know it, Firefox updated my system and many of the extensions now don’t work (until the authors update them).

I’m listening: I put your site into my RSS reader, so I’ll be very interested in what you have to say regarding both WordPress (which I use and tweak) and Ruby/Rails, which I also use. Some good Mechanize/Nokogiri examples are always appreciated; I have a hard time locating detailed examples. I’ve scraped several websites with them, but would really appreciate some in-depth examples. Have you used Webrat for scraping?

Here’s a good Mechanize cheatsheet: http://www.e-tobi.net/blog/2010/02/05/ruby-mechanize-cheat-sheet

gutterseo

For anyone having problems with the “sudo gem update --system” command on Ubuntu, the answer you need can be found here: http://blog.rubyhead.com/2009/11/02/ubuntu-9-10-ruby-installation/

admin

Thanks for posting that. I ran into issues with gem updates last night when I was trying to install Padrino (pretty sweet-looking web framework, btw). I had installed RubyGems through apt-get, but I ended up having to install gems separately and redo things. Now I’m back in business, though.

Thanks buddy.

sgtryan

I digg this post (and blog). Very intelligent writeup and some really great information in here. Nice to see a talented Ruby programmer take the time to write something industry-related and provide code as well.

-Ryan

Du1

Sup D, can you point me in the right direction on a decaptcha implementation? I am currently using an iMacros/WAMP setup to meet the needs you hinted at. I’d rather use Watir because I can easily integrate database access, which can’t be done out of the box with the macro plugin. Feel free to shoot me an IM or email if needed. Thanks.

admin

Hey, I actually have the code for decaptcha (in Ruby) and deathbycaptcha, so once I get some time in the next week or two, I’ll clean it up and post it here for ya.

D

Du1

Awesome! Thanks in advance.

Michael Alexander

Great post. I have decided to spend a few weeks learning Ruby/Watir even though I don’t have the opportunity to use it at work….yet. Keep up the great work; posts like this that give great examples and scenarios are much appreciated.

-Michael

Mathu

How do I install Watir on my machine? At the command prompt I am getting an error while installing.

Michael

Really nice post. Currently I’m just working with the mechanize gem, but I’ll give Watir a try. Good read.

Getting data with Excel 2013: using FilterXML for web scraping from Alibaba | tycho01

[...] scraping from websites requiring authentication, for which you’re better off with Ruby gem Watir.) As such, I turned towards Excel [...]

