Rubyroid Labs Blog
Tech Tips and Freebies

Web Scraping: Part 3. Digging deeper

The time has come to dig deeper and learn more about the art of web scraping. Our previous articles covered the possible legal issues of web scraping and how to start scraping with Ruby. Now it is time to learn how to scrape pages that rely on JS using Wombat and Mechanize.

Why use Wombat and Mechanize?

But wait, you just taught us how to scrape with Ruby. Why do we need another tool? The method we showed in the previous article works mainly for plain HTML pages; it does not work for pages that rely on JS. The Wombat and Mechanize gems are much more powerful and cope with a far wider range of websites. Bear in mind, though, that Mechanize (which Wombat builds on) does not execute JavaScript itself, so content rendered entirely in the browser may still be out of reach. Both gems also require more resources, so we recommend using them only where they are really needed.

How to use Mechanize?

To get things rolling with Mechanize, install it (gem install mechanize) and specify which page you are looking at.

Mechanize will now be looking at the Someurl website. It uses Nokogiri to parse HTML, so once you have pointed it at the required page, you can scrape it with the usual Nokogiri methods.

Mechanize page search accepts both CSS and XPath expressions.

For more details, see the Mechanize documentation.

How to use Wombat?

The Wombat gem is another powerful tool for scraping almost any page. Let’s see how to use it properly.

First, install the gem (gem install wombat).

Then start scraping the page with a Wombat.crawl call.

Wombat has many more functions and settings than we can cover here; see its documentation for more information.

To scrape or not to scrape

As you can see from our three articles, web scraping can be very useful when working with large amounts of data. It can save you a lot of time and make your business more efficient. Using just a few Ruby gems, you can scrape almost any page. We hope you now know how and when to use web scraping, and we are always ready to help with any questions you have.

Questions? Comments? Let’s talk about them in the comments section below.

Vera Rabkina

Business Analyst at Rubyroid Labs

