Amazon Bestseller

Project Overview

In this project for the Programming for Business Analytics course, I developed a reusable web scraping framework using Selenium to extract real-time product data from Amazon’s Bestseller pages. By focusing on a live, dynamic retail site, I demonstrated how automated browser interaction can overcome common challenges like infinite scroll, dynamic content loading, and inconsistent HTML structures. This project highlights practical techniques for data collection from modern web platforms, supporting agile market analysis and consumer trend tracking.

Objective

Learn and apply Selenium for dynamic web scraping in Python
Extract structured data from Amazon Bestsellers pages
Build a scalable and adaptable scraping template that can be reused across product categories
Handle challenges such as infinite scroll and content delays

Dynamic Web Scraping Project: Amazon Bestseller

Approach & Structure

Step 1:

Page Structure & Selector Identification

Manually inspected the structure of the Amazon Bestsellers page, noting class names, scroll behavior, and the location of product containers.

Step 2:

Selenium Setup & Automatic Flow

Automated the following with Python and Selenium:

Browser launch
Navigation to the Bestseller URL
Controlled scrolling until the 'endOfList' element is loaded

Step 3:

Data Extraction & Error Handling

Extracted key fields such as:

Product Name
Rank
Price (if available)
Rating
Number of Reviews

Built flexibility into the script to skip missing fields and adapt to different product formats.

Step 4:

Reusability & Expansion

The final code was modularized to allow easy reuse across other Amazon departments. The logic accounts for:

Different product category layouts
Dynamic loading speeds
HTML inconsistencies
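The extraction and reuse steps described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the selector strings, the `Product` class, and the helper names are all hypothetical placeholders, and Amazon's real class names would need to be found by inspecting the page. The sketch relies on the fact that Selenium's `find_elements` returns an empty list when nothing matches, so a missing field becomes `None` instead of raising an error.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical selectors: placeholders only, not Amazon's real class names.
DEFAULT_SELECTORS = {
    "name": ".product-name",
    "rank": ".product-rank",
    "price": ".product-price",
    "rating": ".product-rating",
    "reviews": ".product-reviews",
}


@dataclass
class Product:
    name: Optional[str] = None
    rank: Optional[str] = None
    price: Optional[str] = None
    rating: Optional[str] = None
    reviews: Optional[str] = None


def first_text(container, css):
    """Return the stripped text of the first match, or None if absent."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    # find_elements returns [] when nothing matches, so no exception handling
    # is needed for a missing field.
    matches = container.find_elements("css selector", css)
    if not matches:
        return None
    text = matches[0].text.strip()
    return text or None


def parse_product(container, selectors=DEFAULT_SELECTORS):
    """Build a Product from one product container, skipping missing fields."""
    return Product(**{field: first_text(container, css)
                      for field, css in selectors.items()})


def scrape_products(driver, container_css, selectors=DEFAULT_SELECTORS):
    """Parse every product container on the currently loaded page."""
    containers = driver.find_elements("css selector", container_css)
    return [parse_product(c, selectors) for c in containers]
```

Under these assumptions, reusing the scraper for another department would only mean passing a different `container_css` and, where the layout differs, a different `selectors` mapping; the parsing logic itself stays unchanged.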

Insight Gained

Challenges Faced

Infinite scroll: Simply scrolling to the bottom didn’t load all products.

→ Solution: Targeted the 'endOfList' element and added implicit waits to ensure complete loading.

Varying data structures: Layouts differed across departments, which required extra logic to handle missing or malformed elements.
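The infinite-scroll fix described above might look something like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the `#endOfList` selector assumes the marker is addressable by that id, and the function names, round limit, and pause length are hypothetical. The loop scrolls one viewport at a time and stops once the marker is present, while the implicit wait set on the driver gives each lookup time to succeed.

```python
import time


def scroll_until_marker(driver, marker_css="#endOfList", max_rounds=40, pause=1.0):
    """Scroll one viewport at a time until the end marker is present.

    Returns the number of scroll rounds used, or None if the marker
    never appeared within max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        # With an implicit wait configured on the driver, this lookup itself
        # waits up to that timeout before returning an empty list.
        if driver.find_elements("css selector", marker_css):
            return round_no
        time.sleep(pause)
    return None


def open_page(url, implicit_wait=2):
    """Launch Chrome, set an implicit wait, and load the given URL."""
    # Imported lazily so scroll_until_marker has no hard selenium dependency.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.implicitly_wait(implicit_wait)  # seconds
    driver.get(url)
    return driver
```

A caller would open the page with `open_page(...)`, call `scroll_until_marker(driver)`, and start extracting only if the result is not `None`. Capping the number of rounds keeps the script from looping forever on a page where the marker never appears.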

Key Takeaways

Selenium is powerful for automating scraping from dynamic, interactive web pages.
Robust scraping logic must account for scalability, variation in structure, and loading delays.
Designing reusable frameworks is key to efficiency in real-world data engineering.

