Project Overview
In this project for the Programming for Business Analytics course, I developed a reusable web scraping framework using Selenium to extract real-time product data from Amazon’s Bestsellers pages. By focusing on a live, dynamic retail site, I demonstrated how automated browser interaction can overcome common challenges such as infinite scroll, dynamic content loading, and inconsistent HTML structures. The project highlights practical techniques for collecting data from modern web platforms in support of agile market analysis and consumer trend tracking.
Objective
- Learn and apply Selenium for dynamic web scraping in Python
- Extract structured data from Amazon Bestsellers pages
- Build a scalable and adaptable scraping template that can be reused across product categories
- Handle challenges such as infinite scroll and content delays
Dynamic Web Scraping Project: Amazon Bestsellers
Step 2: Selenium Setup & Automated Flow
Using Python and Selenium, the script automates:
- Browser launch
- Navigation to the Bestsellers URL
- Controlled scrolling until the 'endOfList' element is loaded
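The flow above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names, the scroll step, the pause length, and the `END_SELECTOR` value are all assumptions (the write-up names an 'endOfList' element but does not say whether it is addressed by id or by class).

```python
import time

# Assumption: the source names an 'endOfList' element but not whether that is
# an id or a class, so this hypothetical selector matches either.
END_SELECTOR = "#endOfList, .endOfList"


def scroll_until_end(driver, end_selector=END_SELECTOR, max_scrolls=40, pause=1.0):
    """Scroll one viewport at a time until the end marker is present.

    Returns the number of scroll steps performed. Stops after max_scrolls
    so that a missing marker cannot cause an endless loop.
    """
    for scrolls_done in range(max_scrolls):
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        if driver.find_elements("css selector", end_selector):
            return scrolls_done
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)  # give lazily loaded products time to render
    return max_scrolls


def load_bestsellers(url):
    """Launch Chrome, open the given URL, and scroll towards the end marker."""
    # Imported here so scroll_until_end can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get(url)
    scroll_until_end(driver)
    return driver  # the caller is responsible for calling driver.quit()
```

Scrolling in viewport-sized steps, instead of jumping straight to the bottom, is one way to keep the scrolling "controlled": each step gives the page a chance to load the next batch of products before the marker is checked again.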
Step 4: Reusability & Expansion
The final code was modularized to allow easy reuse across other Amazon departments. The logic accounts for:
- Different product category layouts
- Dynamic loading speeds
- HTML inconsistencies
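One way to structure this kind of reuse is to keep everything that varies between departments in a small configuration object and pass it through a shared pipeline. The sketch below is illustrative only: `CategoryConfig`, `scrape_category`, the placeholder URL, and the selector value are hypothetical names and values, not the project's actual code or Amazon's real markup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryConfig:
    """Settings that may differ between departments (all values hypothetical)."""
    name: str
    url: str
    container_selector: str = "div.product-card"  # placeholder, not a real Amazon class
    max_scrolls: int = 40       # upper bound for slow-loading pages
    pause_seconds: float = 1.0  # wait between scroll steps


def scrape_category(config, load_page, extract_products):
    """Run the same pipeline for any department.

    load_page(config) and extract_products(page, config) are injected
    callables, so the loading and parsing steps can be swapped per layout.
    """
    page = load_page(config)
    # Tag every row with its category so results from several runs can be merged.
    return [dict(row, category=config.name) for row in extract_products(page, config)]


# Usage with stand-in callables (no browser involved):
books = CategoryConfig(name="Books", url="https://example.com/bestsellers/books")
rows = scrape_category(
    books,
    load_page=lambda cfg: cfg.url,
    extract_products=lambda page, cfg: [{"rank": 1, "name": "Sample title"}],
)
```

Adding a department then means adding one more `CategoryConfig` instead of editing the scraping logic, and per-category timing values cover differences in loading speed.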
Step 3: Data Extraction & Error Handling
Extracted key fields such as:
- Product Name
- Rank
- Price (if available)
- Rating
- Number of Reviews
Built flexibility into the script to skip missing fields and adapt to different product formats.
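A minimal sketch of this skip-if-missing behaviour is shown below. The selectors in `FIELD_SELECTORS` are hypothetical placeholders (the write-up does not list the real ones), and `first_text` and `extract_product` are illustrative helper names. The sketch relies on the fact that Selenium's `find_elements` returns an empty list when nothing matches, so a missing field becomes `None` instead of raising an exception.

```python
# Hypothetical selectors: the real class names are not given in the source.
FIELD_SELECTORS = {
    "name": ".product-name",
    "rank": ".product-rank",
    "price": ".product-price",
    "rating": ".product-rating",
    "reviews": ".product-review-count",
}


def first_text(container, selector):
    """Return the stripped text of the first match, or None if absent or empty."""
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    matches = container.find_elements("css selector", selector)
    if not matches:
        return None
    text = matches[0].text.strip()
    return text or None


def extract_product(container, selectors=FIELD_SELECTORS):
    """Build one record per product container, leaving missing fields as None."""
    return {field: first_text(container, sel) for field, sel in selectors.items()}
```

Keeping every field in the record, with `None` for anything missing, means a product without a price still yields a complete row that downstream analysis can filter or fill.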
Step 1: Page Structure & Selector Identification
Manually inspected the structure of the Amazon Bestsellers page, noting class names, scroll behavior, and the location of product containers.
Approach & Structure
Insights Gained
Challenges Faced
- Infinite scroll: Simply scrolling to the bottom didn’t load all products.
  → Solution: Targeted the 'endOfList' element and added implicit waits to ensure complete loading.
- Varying data structures: Layouts differed across departments, which required extra logic to handle missing or malformed elements.
Key Takeaways
- Selenium is powerful for automating scraping from dynamic, interactive web pages.
- Robust scraping logic must account for scalability, variation in structure, and loading delays.
- Designing reusable frameworks is key to efficiency in real-world data engineering.