Supercharged Web Scraping with Asyncio
Web scraping is the automated process of opening a website and extracting the data on it that matters to you. It's fundamental to the internet: search engines, data science, automation, machine learning, and much more depend on it.
Opening websites and extracting data are only part of what makes web scraping great. The real value comes from parsing that data.
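To make "parsing" concrete, here is a minimal sketch that uses only Python's standard-library `html.parser` to pull product titles out of a page. The markup, the `product-title` class, and the names `ProductTitleParser` and `extract_titles` are hypothetical and are not taken from the course code. The project itself works with Selenium and arsenic, but the idea of turning raw HTML into structured values is the same.

```python
from html.parser import HTMLParser


class ProductTitleParser(HTMLParser):
    """Collect the text of every <h2 class="product-title"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        if tag == "h2" and ("class", "product-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching <h2>
        if self._in_title:
            self.titles[-1] += data.strip()


def extract_titles(html: str) -> list[str]:
    parser = ProductTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


sample = """
<div><h2 class="product-title">USB Cable</h2><span>$9.99</span></div>
<div><h2 class="product-title">Desk Lamp</h2><span>$24.50</span></div>
<h2>Not a product</h2>
"""
print(extract_titles(sample))  # ['USB Cable', 'Desk Lamp']
```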
This project will cover:
- Basic web scraping with Python
- Web scraping with Selenium
- Sync vs Async
- Asynchronous Web scraping with Asyncio
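The sync vs async distinction can be sketched without touching the network. In the example below, `time.sleep` and `asyncio.sleep` stand in for waiting on a server. The 0.2 s delay and the `example.com` URLs are placeholders chosen for illustration. The synchronous loop waits for each "page" in turn. The asyncio version starts all five waits together with `asyncio.gather`, so total time is close to a single delay instead of the sum of all of them.

```python
import asyncio
import time

DELAY = 0.2  # simulated network latency in seconds (an assumption)
URLS = [f"https://example.com/page/{i}" for i in range(5)]  # placeholder URLs


def fetch_sync(url: str) -> str:
    time.sleep(DELAY)  # blocks the whole thread while "waiting on the network"
    return f"html for {url}"


async def fetch_async(url: str) -> str:
    await asyncio.sleep(DELAY)  # yields control so other fetches can run meanwhile
    return f"html for {url}"


def run_sync() -> tuple[list[str], float]:
    start = time.perf_counter()
    pages = [fetch_sync(u) for u in URLS]
    return pages, time.perf_counter() - start


async def run_async() -> tuple[list[str], float]:
    start = time.perf_counter()
    pages = await asyncio.gather(*(fetch_async(u) for u in URLS))
    return list(pages), time.perf_counter() - start


sync_pages, sync_time = run_sync()
async_pages, async_time = asyncio.run(run_async())
print(f"sync:  {sync_time:.2f}s")   # roughly len(URLS) * DELAY
print(f"async: {async_time:.2f}s")  # roughly 1 * DELAY
```

`asyncio.gather` returns results in the same order as its inputs, so both versions produce identical page lists. Only the waiting overlaps.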
Requirements:
- Python experience (at least the first 15 days of this project).
- Selenium & chromedriver installed (a separate walkthrough covers the installation).
Reference code
Lessons
1. Welcome (3:25)
2. Requirements (0:59)
3. Project Demo (10:27)
4. Sync vs Async (12:57)
5. Blocking & Timeouts (10:25)
6. Scraping with Selenium (9:01)
7. Async Web Scraping with chromedriver and arsenic (15:00)
8. Hide Arsenic Logs (1:12)
9. Async Data with Pandas (13:12)
10. Prepare to Scrape Multiple URLs (11:32)
11. Extract Product Data (13:24)
12. Async Product Data Extraction (9:21)
13. Modules & Submodules (5:39)
14. Service Specific Submodule (3:25)
15. Decouple Logging & Scraper (5:23)
16. Synchronous SQL Storage with Pandas (7:16)
17. Store Scraped Data to SQL Tables (13:26)
18. Inspect Stored Data in Jupyter (5:34)
19. Scraping URLs from Stored Links Table (16:29)
20. Scrape Paginated List View (13:30)
21. Results & Timing (8:44)
22. Thank you & next steps (2:48)
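Several lessons above (Blocking & Timeouts, scraping multiple URLs, async product extraction) revolve around running many page loads at once without letting one slow page stall the batch. Below is a minimal sketch of that pattern with plain asyncio. It assumes nothing about the course's actual code. `fetch` is a hypothetical placeholder for a real async page load (for example, through a browser driver), and the per-URL latencies, the concurrency limit of 2, and the 0.5 s timeout are invented values for illustration. An `asyncio.Semaphore` caps how many fetches run at the same time, and `asyncio.wait_for` abandons any page that exceeds the timeout.

```python
import asyncio

# Hypothetical per-URL latencies (seconds) standing in for real page loads.
LATENCY = {
    "https://example.com/p/1": 0.05,
    "https://example.com/p/2": 0.05,
    "https://example.com/p/3": 1.00,  # a page that hangs
    "https://example.com/p/4": 0.05,
}


async def fetch(url: str) -> str:
    """Placeholder for a real async page load (hypothetical)."""
    await asyncio.sleep(LATENCY[url])
    return f"<html>{url}</html>"


async def scrape_one(url: str, sem: asyncio.Semaphore, timeout: float) -> dict:
    # The semaphore limits how many fetches are in flight at once.
    async with sem:
        try:
            html = await asyncio.wait_for(fetch(url), timeout=timeout)
            return {"url": url, "ok": True, "html": html}
        except asyncio.TimeoutError:
            # A slow page is recorded as a failure instead of blocking the batch.
            return {"url": url, "ok": False, "html": None}


async def scrape_all(urls: list[str], max_concurrency: int = 2,
                     timeout: float = 0.5) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)
    # gather keeps results in the same order as the input URLs
    return await asyncio.gather(*(scrape_one(u, sem, timeout) for u in urls))


results = asyncio.run(scrape_all(list(LATENCY)))
for r in results:
    print(r["url"], "ok" if r["ok"] else "timed out")
```

Capping concurrency keeps the load on the target site and on the local machine (for example, the number of open browser sessions) predictable. The per-page timeout lets a single hanging page fail on its own while the rest of the results still come back.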