Capital Bikeshare DC

About

Project Summary

Forecasted hourly bike rental demand in Washington, D.C. using R and Tableau. Built predictive models, including XGBoost and a stacked ensemble, and visualized rider behavior to uncover actionable trends by user type and time. Delivered strategic recommendations to support smarter scheduling and marketing.

Project Overview

This project focuses on understanding and forecasting hourly bike rental demand in Washington, D.C., using historical data from the Capital Bikeshare system. The analysis combines machine learning modeling in R with data visualization in Tableau to uncover key behavioral trends, environmental influences, and temporal patterns that drive bike usage. The goal is to support data-driven decision-making for resource allocation, user segmentation, and marketing strategy.

The project is divided into two parts:

  • Part 1 (RStudio): Development of predictive models using linear regression, regression trees, and ensemble learning techniques such as Extreme Gradient Boosting (XGBoost) and model stacking.

  • Part 2 (Tableau): Visual exploration of usage behavior across user types, weekdays vs. weekends, and seasonal patterns to generate actionable business insights.

Project Objectives

  1. Predict hourly bike rental demand using weather and calendar-related features through machine learning algorithms

  2. Compare model performance across different techniques (linear models, decision trees, ensemble models) and improve accuracy through stacking

  3. Uncover behavioral differences between casual and registered users using visual analytics

  4. Generate actionable recommendations for operations and marketing by integrating model outputs and user behavior insights

  5. Communicate insights clearly to both technical and non-technical stakeholders through modeling metrics and visual storytelling

Part 1: Predictive Modeling in RStudio

Overview

The first part of this project aimed to forecast hourly bike rental demand using machine learning models, leveraging historical data from Capital Bikeshare in Washington, D.C. The focus was on extracting temporal and weather-related patterns to support more efficient bike fleet management and operational planning.

Objectives

  • Predict hourly bike rental count using environmental and temporal features

  • Compare different model types (linear, tree-based, ensemble)

  • Apply ensemble learning to improve predictive accuracy and robustness

Methodology

Data Preparation

  • Dropped the data-leakage columns 'casual' and 'registered'

  • Extracted 'year', 'month', 'weekday', and 'hour' from 'datetime' using 'lubridate'

  • Converted relevant columns to categorical variables using 'as.factor()'

  • Removed 'minute' and 'second' variables as they were not informative

  • Partitioned dataset into a 70% training set and 30% test set ('set.seed(1234)')
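The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `bikes` data frame is synthetic, the target column name `count` is an assumption, and the date parts are extracted with base R's `as.POSIXlt` instead of 'lubridate' so the sketch has no package dependencies.

```r
# Hypothetical stand-in for the raw hourly data (10 days of synthetic records).
set.seed(1234)
n <- 240
start <- as.POSIXct("2011-01-01 00:00:00", tz = "UTC")
bikes <- data.frame(
  datetime   = format(start + 3600 * (0:(n - 1)), "%Y-%m-%d %H:%M:%S"),
  season     = sample(1:4, n, replace = TRUE),
  atemp      = runif(n, 5, 35),
  humidity   = runif(n, 20, 100),
  casual     = rpois(n, 10),
  registered = rpois(n, 50)
)
bikes$count <- bikes$casual + bikes$registered  # assumed name of the target

# 1. Drop the leakage columns.
bikes$casual     <- NULL
bikes$registered <- NULL

# 2. Extract calendar parts from the timestamp (base R in place of lubridate).
ts <- as.POSIXlt(bikes$datetime, tz = "UTC")
bikes$year    <- ts$year + 1900
bikes$month   <- ts$mon + 1
bikes$weekday <- ts$wday  # 0 = Sunday
bikes$hour    <- ts$hour
bikes$datetime <- NULL

# 3. Treat calendar fields as categories rather than numbers.
cat_cols <- c("season", "year", "month", "weekday", "hour")
bikes[cat_cols] <- lapply(bikes[cat_cols], as.factor)

# 4. 70% training / 30% test split with a fixed seed.
set.seed(1234)
train_idx <- sample(nrow(bikes), size = floor(0.7 * nrow(bikes)))
train <- bikes[train_idx, ]
test  <- bikes[-train_idx, ]
```

Fixing the seed immediately before sampling keeps the split reproducible regardless of what random draws happened earlier in the session.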

Models Built & Compared

Key Insights from Model Comparisons

1. Linear Regression

  • Used stepwise selection ('step()') to identify significant predictors

  • Top drivers: 'atemp', 'temp', 'humidity', 'hour', 'season'

  • Served as a simple, interpretable baseline that performed better than the benchmark but was limited by its linear assumptions

  • Showed poor performance in low-demand hours due to its linearity
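As a rough illustration of the stepwise step, the sketch below runs `step()` on synthetic data with one deliberately irrelevant predictor. The data frame `d` and the `noise` column are hypothetical. Note that `step()` compares candidate models by AIC by default, not by p-values, so "significant" here means "retained by the AIC search".

```r
# Synthetic data: two real drivers plus one irrelevant column (hypothetical).
set.seed(1234)
n <- 300
d <- data.frame(
  atemp    = runif(n, 5, 35),
  humidity = runif(n, 20, 100),
  noise    = rnorm(n)
)
d$count <- 20 + 6 * d$atemp - 1.5 * d$humidity + rnorm(n, sd = 15)

# Start from the full model and let step() add or drop terms by AIC.
full_fit <- lm(count ~ atemp + humidity + noise, data = d)
step_fit <- step(full_fit, direction = "both", trace = 0)

# Inspect which predictors survived the search.
formula(step_fit)
```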

2. Regression Tree (CART)

  • Built a fully grown tree and pruned it using the optimal complexity parameter 'cp' from cross-validation

  • Strong predictors: 'hour', 'atemp', 'humidity', 'wday', 'year'

  • Effectively captured nonlinear relationships and interactions, especially with time variables

  • Pruning significantly improved generalization and interpretability
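The grow-then-prune workflow might look like the sketch below. It assumes the 'rpart' package (the text names only the 'cp' parameter) and uses synthetic data with rush-hour peaks; the control settings are illustrative, not the project's values.

```r
library(rpart)

# Synthetic hourly demand with peaks at 8, 17 and 18 (hypothetical data).
set.seed(1234)
n <- 500
d <- data.frame(
  hour  = factor(sample(0:23, n, replace = TRUE)),
  atemp = runif(n, 5, 35)
)
rush <- d$hour %in% c("8", "17", "18")
d$count <- 50 + 150 * rush + 3 * d$atemp + rnorm(n, sd = 20)

# Grow a very large tree: cp = 0 removes the complexity penalty.
full_tree <- rpart(
  count ~ hour + atemp, data = d, method = "anova",
  control = rpart.control(cp = 0, minsplit = 2, xval = 10)
)

# Pick the cp value with the lowest 10-fold cross-validated error.
cp_table <- full_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to that complexity level.
pruned_tree <- prune(full_tree, cp = best_cp)

# Compare tree sizes (number of nodes) before and after pruning.
c(full = nrow(full_tree$frame), pruned = nrow(pruned_tree$frame))
```

A common alternative is the one-standard-error rule, which picks the simplest tree whose cross-validated error is within one standard error of the minimum.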

3. XGBoost

  • One-hot encoded categorical variables using 'dummyVars()'

  • Tuned hyperparameters with a grid search over manually selected candidate values

  • Top features: 'hour.17', 'hour.18', 'atemp', 'humidity', 'year'

  • Achieved strong performance in absolute error (lowest RMSE among single models), but showed slightly worse relative error (MAPE) than the pruned tree, especially in lower demand periods
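To show what one-hot encoding does to a categorical column such as 'hour', here is a dependency-free sketch using base R's `model.matrix()` as a stand-in for `dummyVars()`. The tiny data frame is made up. The column names differ from the project's: `model.matrix()` yields names like `hour17`, whereas `dummyVars()` yields names like 'hour.17'.

```r
# Four made-up observations with a categorical hour and a numeric feature.
d <- data.frame(
  hour  = factor(c(8, 17, 18, 17)),
  atemp = c(12.1, 25.3, 24.0, 30.2)
)

# "- 1" drops the intercept so the factor keeps one indicator per level.
X <- model.matrix(~ hour + atemp - 1, data = d)

colnames(X)  # "hour8" "hour17" "hour18" "atemp"
X
```

With several factors, `model.matrix()` gives full indicator sets only for the first one, which is one reason a dedicated encoder is convenient before passing a numeric matrix to XGBoost.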

4. Stacked Ensemble (Final Model)

  • Combined predictions from Linear regression, Pruned Tree, and XGBoost

  • Tested Ridge Regression and Quantile Random Forest (QRF) as stackers, but XGBoost performed best as the meta-learner

  • Achieved the lowest RMSE and MAPE, showing robust performance across high and low demand hours

  • Reduced MAPE by ~17 percentage points compared to standalone XGBoost

  • Feature importance in the stacked model revealed that XGBoost predictions were weighted most, followed by the tree and linear predictions, with some marginal contributions from original features like 'hour' and 'humidity'
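The stacking idea can be sketched in a simplified form. This is not the project's pipeline: it uses only two base learners (a linear model and an 'rpart' tree), swaps in a plain linear regression as the meta-learner where the project used XGBoost, omits the original features from the meta-model, and trains the meta-learner on a separate holdout split (sometimes called blending). All data and names are hypothetical.

```r
library(rpart)

# Synthetic hourly demand with rush-hour peaks (hypothetical data).
set.seed(1234)
n <- 600
d <- data.frame(
  hour  = sample(0:23, n, replace = TRUE),
  atemp = runif(n, 5, 35)
)
d$count <- 40 + 120 * (d$hour %in% c(8, 17, 18)) + 3 * d$atemp + rnorm(n, sd = 15)

# Three disjoint parts: base-model training, meta-model training, final test.
part    <- sample(rep(c("base", "meta", "test"), each = n / 3))
base_df <- d[part == "base", ]
meta_df <- d[part == "meta", ]
test_df <- d[part == "test", ]

# Level 0: base learners are fit on the base split only.
lm_fit   <- lm(count ~ hour + atemp, data = base_df)
tree_fit <- rpart(count ~ hour + atemp, data = base_df, method = "anova")

# Level 1 features: base-model predictions on data the base models never saw.
level1 <- function(df) {
  data.frame(
    pred_lm   = as.numeric(predict(lm_fit, newdata = df)),
    pred_tree = as.numeric(predict(tree_fit, newdata = df)),
    count     = df$count
  )
}

# Meta-learner (linear here; XGBoost in the project) combines the predictions.
meta_fit   <- lm(count ~ pred_lm + pred_tree, data = level1(meta_df))
stack_pred <- as.numeric(predict(meta_fit, newdata = level1(test_df)))

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
rmse_stack <- rmse(test_df$count, stack_pred)
rmse_lm    <- rmse(test_df$count, predict(lm_fit, newdata = test_df))
c(stacked = rmse_stack, linear_only = rmse_lm)
```

Training the meta-learner on predictions for unseen rows matters: if it saw in-sample base predictions, it would learn to over-trust whichever base model overfits most.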

Reflection & Learning

  • Model performance generally improved with complexity, showing the value of combining approaches; the one exception was MAPE, where the pruned tree slightly outperformed standalone XGBoost

  • Overfitting was mitigated through stepwise selection, pruning, regularization, and sampling techniques

  • This modeling experience deepened my understanding of feature engineering, error metrics (RMSE & MAPE), hyperparameter tuning, and ensemble learning in a real-world forecasting context

  • Built a strong foundation for communicating model performance to both technical and non-technical audiences
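For readers less familiar with the two error metrics, the small worked example below shows why a model can rank well on RMSE yet worse on MAPE in low-demand hours. The helper functions and the numbers are illustrative, not taken from the project.

```r
# RMSE measures absolute error; MAPE measures error relative to actual demand.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

# The same miss of 5 rentals in a quiet hour (10) and a busy hour (500).
actual <- c(10, 500)
pred   <- c(15, 505)

rmse(actual, pred)  # 5: both misses count equally
mape(actual, pred)  # 25.5: the quiet hour contributes 50%, the busy hour 1%
```

MAPE is undefined when the actual value is zero, so hours with no rentals need to be excluded or handled separately before computing it.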

Part 2: Data Visualization & Behavioral Analysis in Tableau

Overview

This part of the project explored behavioral trends in bike usage through data visualization in Tableau. By analyzing the rental patterns of casual vs. registered users across time, weekdays, and seasons, it aimed to uncover actionable insights for customer segmentation, marketing strategies, and service planning, and so to support strategic decision-making through human-centered data interpretation.

Objectives
  • Identify seasonal and temporal trends for different user types (casual vs. registered)

  • Compare usage patterns across working and non-working days

  • Provide recommendations to improve user engagement and operational efficiency based on behavior patterns

Approach

This analysis was developed for a leadership team with both technical and non-technical members interested in understanding how rider behaviors differ between casual and registered users, with the goal of making data-driven strategic decisions.

Two connected business questions were designed to reveal distinct patterns in usage behavior between casual and registered users:

  1. How do rental trends differ by user type over time?

  2. How does rental behavior change by time of day and day type (working vs. non-working)?

These questions were selected to address both temporal dynamics and user segmentation, which are highly relevant to marketing strategy and operational planning.

Key Analyses & Insights
1. Longitudinal Rental Trends
  • Casual users showed strong seasonal variation, with usage peaking in June and July and dropping significantly in winter.

  • Registered users had consistent growth over time and maintained steady usage across seasons.

  • Insights: Casual usage is more weather-sensitive and leisure-oriented, while registered usage is more commuting-driven and stable.

🔗 Visualization Link: Rental Trends Over Time (Tableau)
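The aggregation behind a trend chart like this one can be approximated in R. The sketch below sums rentals per month and user type on synthetic hourly records; the `rides` data frame is hypothetical and the actual view was built in Tableau.

```r
# One synthetic year of hourly records (hypothetical data).
set.seed(1234)
hours <- seq(
  as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
  as.POSIXct("2011-12-31 23:00:00", tz = "UTC"),
  by = "hour"
)
rides <- data.frame(
  datetime   = hours,
  casual     = rpois(length(hours), 5),
  registered = rpois(length(hours), 40)
)

# Label each row with its year-month, then total rentals per user type.
rides$month <- format(rides$datetime, "%Y-%m", tz = "UTC")
monthly <- aggregate(cbind(casual, registered) ~ month, data = rides, FUN = sum)

head(monthly)
```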

2. Time of Day vs. Day Type (Working vs. Non-Working Days)
  • Registered users had sharp peaks at 8 AM and 5-6 PM, indicating commute-related behavior on working days.

  • Casual users showed even distribution across daytime hours, with mild increases on weekends and afternoons, implying leisure activity rather than routine travel.

  • Insights: Clear distinction in daily usage rhythms, opening opportunities for targeted campaigns and schedule-based resource allocation.

🔗 Visualization Link: Hourly Patterns by User Type (Tableau)
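A comparable hourly profile can be computed outside Tableau by averaging rentals per hour, split by day type. This sketch uses synthetic data, and the `workingday` flag (1 = working day, 0 = non-working day) is an assumed column name.

```r
# Two synthetic weeks of hourly records (hypothetical data).
set.seed(1234)
day_type <- c(1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0)
rides <- data.frame(
  hour       = rep(0:23, times = length(day_type)),
  workingday = rep(day_type, each = 24),
  casual     = rpois(24 * length(day_type), 5),
  registered = rpois(24 * length(day_type), 40)
)

# Mean rentals per hour and day type, one column per user type.
hourly <- aggregate(
  cbind(casual, registered) ~ hour + workingday,
  data = rides, FUN = mean
)

# Hour with the highest average registered demand on working days.
wd <- hourly[hourly$workingday == 1, ]
wd$hour[which.max(wd$registered)]
```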

Recommendations Based on Findings
For Casual Users:
  • Offer seasonal promotions during off-peak months (e.g., September to February)

  • Promote weekend packages or event-based offers tied to warmer seasons

  • Send push notifications to casual users on weekends with good weather forecasts

For Registered Users:
  • Introduce subscription incentives and commuter discounts aligned with rush hours

  • Optimize bike availability near office hubs during weekday peaks

Reflection & Learning
  • Even without interactive dashboards, Tableau enabled clear segmentation of user types and revealed time-dependent behaviors that complemented the machine learning findings in R

  • The process of visual storytelling helped contextualize model outputs and connect data-driven insights with real-world business decisions

  • Strengthened ability to communicate insights visually for stakeholders without technical backgrounds

Integrated Insights & Strategic Implications

This project combined predictive modeling (Part 1) and behavioral analysis (Part 2) to provide a well-rounded understanding of bike rental demand and rider behavior. While each part offers distinct insights, their integration reveals three key strategic takeaways:

1. Hour Is the Most Critical Feature Across Analyses

Both the predictive models and Tableau visualizations identified time of day as the most influential factor.

  • In Part 1, 'hour' was consistently the top-ranked predictor in models like XGBoost and regression trees.

  • In Part 2, registered users showed sharp rush hour peaks, while casual users had steady daytime usage.

→ Strategic Implication: Allocate bike inventory and target promotions by hour and user type to maximize utilization.

2. Clear Distinction Between Casual and Registered User Behavior
  • In the models, environmental variables like 'humidity', 'temp', and 'atemp' strongly influenced overall rental demand. While user types were not modeled separately due to data leakage concerns, the behavioral analysis in Part 2 suggests that casual users may be more weather-sensitive, providing a potential explanation for seasonal patterns observed in the data.

  • Tableau visualizations reinforced this by showing seasonal usage spikes for casual users and stable usage for registered users.

→ Strategic Implication: Use different engagement strategies: weather- and event-based offers for casual riders and loyalty programs for registered users.

3. Complementary Strengths: Forecasting + Contextual Understanding
  • Predictive modeling (Part 1) answered “how much demand to expect”

  • Behavioral visualization (Part 2) answered “who is using the service, and when/how”

→ Strategic Implication: Together, these insights support smarter scheduling, demand forecasting, and customer-specific marketing grounded in both quantitative rigor and user behavior awareness.

Appendix

Part 1: Predictive Modeling Report

Part 2: Data Visualization & Behavioral Analysis in Tableau Report
