
Web Scraping: Part 1. Legal Aspect

Web scraping is a popular technique used in marketing, content analysis and plenty of other areas. We are often asked about it, so we decided to start a series of articles on the topic. We will cover its legal aspects, the ways to do web scraping and a number of insights on it. Here is our first article, which explains what web scraping is and which legal questions it raises.

What is Web Scraping?

The name ‘web scraping’ speaks for itself. The essence of this technique is gathering information from the World Wide Web. There are a number of ways to do that; we will cover some of them in later articles. Web scraping can be used for a number of purposes:

  • Competitor analysis;
  • Tracking changes on specified websites;
  • Data integration;
  • Data transfer;
  • Research purposes.

With this technique one extracts data (usually HTML) from the specified websites, and that data can later be used for all sorts of analysis. Web scraping can be done manually or automatically; the latter, of course, is preferable.
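To make the idea of extracting data from HTML more concrete, here is a minimal sketch in Python that uses only the standard library. The sample page, the `product` class name and the `extract_products` helper are all made up for illustration; a real page would be downloaded first and would have its own markup.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this HTML would come from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Shop</h1>
  <ul>
    <li class="product">Red mug</li>
    <li class="product">Blue mug</li>
    <li class="note">Free shipping</li>
  </ul>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects the text of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._in_product = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and dict(attrs).get("class") == "product":
            self._in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_product = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a product item.
        if self._in_product and data.strip():
            self.products.append(data.strip())


def extract_products(html):
    """Return the product names found in the given HTML string."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    print(extract_products(SAMPLE_HTML))  # ['Red mug', 'Blue mug']
```

The parser ignores everything except the list items it was told to look for, which is the core of most scraping tasks: find the part of the markup that holds the data and keep only that.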

Is Web Scraping Legal?

There is a lot of debate about how legal data scraping is. Keep in mind that web scraping is only a tool; the debate is really not about the technique itself, but about how the extracted data is used.

Having a computer read a website, which is what data scraping comes down to, is neither illegal nor unethical in itself. Besides, there are many uses of it whose legality is beyond any doubt.

Legal Web Scraping

The most common case is when a user has an online shop and wants to migrate from one CMS to another.

Of course, it is a real burden to create all the products again, so web scraping is used to make the migration a little easier.

Another good example is when a user has a few unconnected online shops with some data that needs to be synchronized. Web scraping works really well in this case too and saves the owner a lot of time.

And finally, one more common case is when one needs to get data (e.g. a weather forecast) from open sources, but there is no API available. Web scraping can save you in this case as well.

As you can see, web scraping raises no problems when the information being extracted belongs to the user himself or comes from open public sources.

Shady Web Scraping

But why does everyone talk about the ethical side of web scraping then? The thing is that some people have misused the power of web scraping, which has spoiled its image a little.

For example, some hackers use web scraping for their attacks. Since a computer can read a page much quicker than a person, they use it to increase the load on a website, which can lead to a ‘Denial of Service’ attack.
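The flip side of this is that a well-behaved scraper can deliberately slow itself down so that it does not put that kind of load on a site. Below is a minimal sketch of the idea in Python. The `Throttle` class, the `fetch_all` helper and the example URLs are hypothetical names invented for this illustration, and the interval is an arbitrary choice; the `fetch` argument stands in for whatever function actually downloads a page.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Pause until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Fetch each URL in turn, pausing between requests."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stand-in for a real download function (an assumption for this demo).
    def fake_fetch(url):
        return "contents of " + url

    pages = fetch_all(
        ["https://example.com/a", "https://example.com/b"],
        fake_fetch,
        Throttle(min_interval=0.5),
    )
    print(pages)
```

The clock and sleep functions are passed in as parameters only so that the pause logic can be checked without actually waiting; in everyday use the defaults are enough.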

Other people with low ethical standards use it to steal data from other websites and then publish it as if it were their own content. Of course, search engines work hard to prevent such pages from ranking, but it can still be harmful or at least spoil someone's day.

It is not only search engines that combat this kind of web scraping. It is also banned at the legislative level in some countries. So if you decide to employ this technique, it is a good idea to check the applicable legislation first.

As you can see, web scraping is a powerful technique, and with great power comes great responsibility. If you use it wisely, everything should be fine.

Vera Rabkina

Business Analyst at Rubyroid Labs

