Web scraping is a method of extracting specific data from a website, such as prices or other details of particular products or services.
For example, if you were building a website that compared shoe prices across several online stores, you would start by scraping each store’s website and then use that data to build your own shoe comparison website or app.
There are a few ways of scraping data. In our experience the best one is Scrapy, a Python framework that we use ourselves to scrape data for clients.
Other tools include BeautifulSoup, a Python library for parsing HTML, and PhantomJS, a headless browser, but in our view nothing beats Scrapy. There are also Chrome extensions, but in my experience they’re aimed at beginners and are actually more difficult to use.
When using Scrapy, you write your own spiders, which crawl each website you want data from.
The spider runs on a server, usually a Linux server, crawls the website and returns the data in just about any format you require, such as JSON, CSV or XML. I personally prefer JSON, as I find it’s the easiest to work with.
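As a small illustration of why JSON is convenient, the sketch below takes scraped records and picks the cheapest offer for each shoe, which is the core of a price comparison site. The sample data, the field names (site, name, price) and the cheapest_offers helper are assumptions made for this example, not real scraped output.

```python
import json

# Hypothetical scraped output: sites, names and prices are made up.
scraped_json = """
[
  {"site": "shop-a.example", "name": "Runner", "price": "59.99"},
  {"site": "shop-b.example", "name": "Runner", "price": "54.50"},
  {"site": "shop-a.example", "name": "Trail", "price": "80.00"}
]
"""


def cheapest_offers(raw):
    """Return the lowest-priced record for each product name."""
    best = {}
    for item in json.loads(raw):
        price = float(item["price"])
        current = best.get(item["name"])
        # Keep this record if it is the first or the cheapest seen so far.
        if current is None or price < float(current["price"]):
            best[item["name"]] = item
    return best


offers = cheapest_offers(scraped_json)
for name, item in sorted(offers.items()):
    print(f"{name}: {item['price']} at {item['site']}")
```

Because each record is just a dictionary once loaded, the same data can be fed straight into a web page template or an app without any custom parsing.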