I’ve been thinking about what my first tutorial should look like, and I came to the conclusion that it must be something that I learned thanks to the internet. So yes, everything I’m going to show you in this tutorial is common knowledge, but think of it as a condensed guide to web scraping, from what it is to how to do it.
Information, information, information. Let’s be honest, that’s where the future is heading. For me, this need for information started with a school project. I needed to get a bunch of data and, to be honest, I’m kinda lazy when it comes to repetitive tasks, and this was a repetitive task: page after page, all the same but with different information. Searching the web while trying to avoid the task at hand, I found a way to automate the process. It took me a while, but I found … web scraping.
What is web scraping?
In short, extracting data from webpages. The term mostly refers to the automated process, but it can also be done by hand, one page at a time. All websites have relevant information, and being able to put that information into condensed databases for analysis is what scraping is all about.
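To make the idea concrete, here is a minimal sketch of what "extracting data from webpages" can look like in Python, using only the standard library. The HTML snippet, the `book`, `title` and `price` class names, and the `BookParser` and `scrape_books` names are all made up for illustration; a real scraper would download the page first and the markup would be different.

```python
from html.parser import HTMLParser

# Hypothetical page content, hard-coded so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li class="book"><span class="title">Dune</span><span class="price">9.99</span></li>
    <li class="book"><span class="title">Emma</span><span class="price">4.50</span></li>
  </ul>
</body></html>
"""


class BookParser(HTMLParser):
    """Collects the text inside <span class="title"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per book found on the page
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "book":
            # A new book starts: open a fresh row.
            self.rows.append({})
        elif tag == "span" and attrs.get("class") in ("title", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_books(html):
    """Turn the page markup into a list of {'title': ..., 'price': ...} rows."""
    parser = BookParser()
    parser.feed(html)
    return parser.rows


books = scrape_books(SAMPLE_HTML)
print(books)
```

The result is a small list of rows, which is the "condensed database" part: once the data is in that shape it can be saved to a file or analyzed like any other table.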
Is it legal?
Yes, no, maybe, I don’t know. Can you repeat the question?
The answer is not that simple, as it has been a tug of war. As the company Icreon put it:
It’s a “wild west” environment, where the legal implications of web scraping are in a constant state of flux — Icreon’s blog
Basically, and for the moment, it is not illegal as long as the company being scraped does not lose out as a result, be it in performance or in sensitive information. I strongly suggest reading this article to get the whole picture.
Note: The reason I’m using the US legal system to answer the question is that most major companies have their servers in that country, and therefore the US courts are the ones that get to decide.
Personally, I don’t encourage scraping unless the owner of the website has agreed to it beforehand.