Web scraping, data mining any website
In this article, I will show you how to data mine any website and export the data to a spreadsheet, using a real project I took on as the example. I found the project on the freelancing website Upwork. The posting read as follows:
Record Store Day Website Scrape
We’re looking for a CSV file / Excel Spreadsheet of all participating stores on this website:
The result of my scraping solution is below.
How I did it
- Create a corpus of all the URLs that contain the company data.
- Scrape all the URLs in the corpus and store the data in a file.
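The two steps above can be sketched roughly as follows. This is only an outline under stated assumptions, not the code used for the project: the function names `build_corpus` and `scrape_corpus` are hypothetical, and the link extraction and page fetching are passed in as callables so the sketch runs without network access.

```python
import csv
from typing import Callable, Dict, Iterable, List, TextIO


def build_corpus(
    listing_pages: Iterable[str],
    extract_links: Callable[[str], List[str]],
) -> List[str]:
    """Step 1: collect every data-page URL, dropping duplicates but keeping order."""
    seen = set()
    corpus = []
    for page in listing_pages:
        for url in extract_links(page):
            if url not in seen:
                seen.add(url)
                corpus.append(url)
    return corpus


def scrape_corpus(
    corpus: Iterable[str],
    fetch_record: Callable[[str], Dict[str, str]],
    out_file: TextIO,
    fieldnames: List[str],
) -> None:
    """Step 2: visit each URL in the corpus and write one CSV row per record."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for url in corpus:
        # fetch_record is a placeholder for "download the page and parse it".
        writer.writerow(fetch_record(url))
```

Keeping the corpus as a separate step means the list of URLs can be checked, or saved and resumed, before any of the slower page-by-page scraping starts.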
Creating a corpus
Python 3 and Libraries
Documentation for the dryscrape library can be found at https://dryscrape.readthedocs.io/
Documentation for the Beautiful Soup library can be found at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
According to its documentation, Beautiful Soup is "a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work."
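To give a feel for what that looks like in practice, here is a minimal sketch of pulling records out of HTML with Beautiful Soup and turning them into CSV text. The markup, the class names (`store`, `name`, `city`) and the store entries are invented for illustration; the real site's structure will differ, so the selectors would need to be adapted. The helper names `parse_stores` and `to_csv` are also hypothetical.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup for illustration only; not taken from the real website.
SAMPLE_HTML = """
<ul>
  <li class="store"><span class="name">Vinyl Corner</span><span class="city">Austin</span></li>
  <li class="store"><span class="name">Spin Cycle</span><span class="city">Portland</span></li>
</ul>
"""


def parse_stores(html):
    """Return one dict per <li class="store"> element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")  # parser from the standard library
    rows = []
    for item in soup.find_all("li", class_="store"):
        rows.append({
            "name": item.find("span", class_="name").get_text(strip=True),
            "city": item.find("span", class_="city").get_text(strip=True),
        })
    return rows


def to_csv(rows):
    """Serialize the parsed rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_stores(SAMPLE_HTML)))
```

The resulting CSV text can be written to a file and opened directly in Excel or any other spreadsheet program.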