
Project Overview

In this project for the Programming for Business Analytics course, I developed a reusable web scraping framework using Selenium to extract real-time product data from Amazon’s Bestseller pages. By focusing on a live, dynamic retail site, I demonstrated how automated browser interaction can overcome common challenges like infinite scroll, dynamic content loading, and inconsistent HTML structures. This project highlights practical techniques for data collection from modern web platforms, supporting agile market analysis and consumer trend tracking.

Objective

  • Learn and apply Selenium for dynamic web scraping in Python

  • Extract structured data from Amazon Bestsellers pages

  • Build a scalable and adaptable scraping template that can be reused across product categories

  • Handle challenges such as infinite scroll and content delays


Dynamic Web Scraping Project:
Amazon Bestseller

Step 2:

Selenium Setup & Automatic Flow

Using Python and Selenium, I automated the following:

  • Browser launch

  • Navigation to the Bestseller URL

  • Controlled scrolling until the 'endOfList' element is loaded
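The flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `max_scrolls` and `pause` parameters, and the selector used for the 'endOfList' element are all assumptions (the selector simply matches either an id or a class with that name).

```python
import time

# Selenium's By.CSS_SELECTOR constant is the plain string "css selector",
# so this helper needs no Selenium import and accepts any driver-like object.
CSS = "css selector"


def scroll_until_end(driver, end_selector="#endOfList, .endOfList",
                     max_scrolls=30, pause=1.0, sleep=time.sleep):
    """Scroll one viewport at a time until the end marker exists in the DOM.

    Returns True if the marker was found, False if max_scrolls ran out.
    """
    for _ in range(max_scrolls):
        if driver.find_elements(CSS, end_selector):
            return True
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        sleep(pause)  # give lazily loaded products time to render
    return bool(driver.find_elements(CSS, end_selector))


def open_and_scroll(url):
    """Launch Chrome, open the page, and scroll to the end marker.

    The caller is responsible for calling driver.quit() afterwards.
    """
    from selenium import webdriver  # imported here so the helper above stays testable

    driver = webdriver.Chrome()
    driver.get(url)
    scroll_until_end(driver)
    return driver
```

Capping the number of scrolls keeps the script from looping forever if the marker never appears, for example when a page layout changes.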

Step 4:

Reusability & Expansion

The final code was modularized to allow easy reuse across other Amazon departments. The logic accounts for:

  • Different product category layouts

  • Dynamic loading speeds

  • HTML inconsistencies
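One way to make a scraper reusable across departments is to keep everything that varies per category in a small configuration object. The sketch below is illustrative only: `CategoryConfig`, the selector strings, and the example URLs are all hypothetical placeholders, not values taken from the project or from Amazon's pages.

```python
from dataclasses import dataclass, field

# Hypothetical default selectors; real class names come from inspecting the page.
DEFAULT_SELECTORS = {
    "card": "div.product-card",
    "name": ".product-name",
    "rank": ".product-rank",
    "price": ".product-price",
    "rating": ".product-rating",
    "reviews": ".product-reviews",
}


@dataclass(frozen=True)
class CategoryConfig:
    """Everything that varies between departments, kept out of the scraping logic."""
    name: str
    url: str
    selector_overrides: dict = field(default_factory=dict)
    pause_seconds: float = 1.0  # raise for slower-loading categories

    @property
    def selectors(self):
        # Category-specific entries win over the shared defaults.
        return {**DEFAULT_SELECTORS, **self.selector_overrides}


# Placeholder categories; the URLs are illustrative, not real endpoints.
CATEGORIES = [
    CategoryConfig("books", "https://example.com/bestsellers/books"),
    CategoryConfig(
        "electronics",
        "https://example.com/bestsellers/electronics",
        selector_overrides={"price": ".offer-price"},
        pause_seconds=2.0,
    ),
]
```

With this layout, supporting a new department means adding one `CategoryConfig` entry rather than editing the scraping functions.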

Step 3:

Data Extraction & Error Handling

Extracted key fields such as:

  • Product Name

  • Rank

  • Price (if available)

  • Rating

  • Number of Reviews

I built flexibility into the script so that it skips missing fields and adapts to different product formats.
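A minimal sketch of this kind of tolerant extraction is shown below. The selectors in `FIELD_SELECTORS` and the helper names are assumptions made for illustration. The key idea is to use `find_elements`, which returns an empty list when nothing matches, so a missing field becomes `None` instead of raising an exception.

```python
import re

CSS = "css selector"  # same string as Selenium's By.CSS_SELECTOR

# Hypothetical selectors; the real ones depend on the inspected page.
FIELD_SELECTORS = {
    "name": ".product-name",
    "rank": ".product-rank",
    "price": ".product-price",
    "rating": ".product-rating",
    "reviews": ".product-reviews",
}


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if absent or empty."""
    matches = card.find_elements(CSS, selector)
    if not matches:
        return None
    return matches[0].text.strip() or None


def to_number(text):
    """Pull the first number out of strings like '#12' or '$1,299.00'."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def to_int(text):
    number = to_number(text)
    return int(number) if number is not None else None


def extract_product(card, selectors=FIELD_SELECTORS):
    """Build one product record; fields missing from the card stay None."""
    raw = {key: text_or_none(card, sel) for key, sel in selectors.items()}
    return {
        "name": raw["name"],
        "rank": to_int(raw["rank"]),
        "price": to_number(raw["price"]),
        "rating": to_number(raw["rating"]),
        "reviews": to_int(raw["reviews"]),
    }
```

Keeping `None` for absent values, rather than dropping the product, preserves the rank ordering and lets later analysis decide how to treat gaps such as items without a listed price.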

Step 1:

Page Structure & Selector Identification

I manually inspected the structure of the Amazon Bestsellers page, noting class names, scroll behavior, and the location of product containers.
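Selectors noted during manual inspection can be sanity-checked in code before the scraper relies on them. The helper below is a hypothetical sketch (the function names and the candidate selectors in the usage note are assumptions): it counts how many elements each candidate selector matches, so a selector that matches nothing, or far too many elements, stands out.

```python
CSS = "css selector"  # same string as Selenium's By.CSS_SELECTOR


def count_matches(root, candidate_selectors):
    """Return {selector: number of matching elements} for each candidate."""
    return {sel: len(root.find_elements(CSS, sel)) for sel in candidate_selectors}


def pick_container_selector(root, candidate_selectors, expected_min=1):
    """Return the first candidate matching at least expected_min elements, else None."""
    counts = count_matches(root, candidate_selectors)
    for sel in candidate_selectors:
        if counts[sel] >= expected_min:
            return sel
    return None
```

For example, `pick_container_selector(driver, ["div.product-card", "li.product-item"], expected_min=20)` would return whichever of these illustrative selectors matches at least 20 elements on the loaded page.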

Approach & Structure

Insight Gained


Challenges Faced

  • Infinite scroll: Simply scrolling to the bottom didn’t load all products.

→ Solution: Targeted the 'endOfList' element and added implicit waits to ensure complete loading.

  • Varying data structures across departments required extra logic to handle missing or malformed elements.
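To illustrate the infinite-scroll problem, here is one possible way to confirm that loading has finished. Note that this sketch swaps in a different check from the implicit waits described above: it scrolls the end marker into view and stops only once the number of product cards has stopped growing. The function name, parameters, and selectors are assumptions for illustration.

```python
import time

CSS = "css selector"  # same string as Selenium's By.CSS_SELECTOR


def load_all_products(driver, card_selector, end_selector="#endOfList, .endOfList",
                      stable_rounds=2, max_rounds=40, pause=1.0, sleep=time.sleep):
    """Scroll until the end marker is present and the card count stops growing.

    Returns the final number of product cards found.
    """
    last_count, unchanged = -1, 0
    for _ in range(max_rounds):
        markers = driver.find_elements(CSS, end_selector)
        if markers:
            # Bringing the marker into view triggers any remaining lazy loads.
            driver.execute_script("arguments[0].scrollIntoView();", markers[0])
        else:
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)
        count = len(driver.find_elements(CSS, card_selector))
        unchanged = unchanged + 1 if count == last_count else 0
        last_count = count
        if markers and unchanged >= stable_rounds:
            break
    return max(last_count, 0)
```

Requiring the count to stay unchanged for a few rounds guards against the case where the marker is already in the page while the last batch of products is still rendering.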

Key Takeaways

  • Selenium is powerful for automating scraping from dynamic, interactive web pages.

  • Robust scraping logic must account for scalability, variation in structure, and loading delays.

  • Designing reusable frameworks is key to efficiency in real-world data engineering.

Appendix: Report
