Leveraging Machine Learning for Congestion Level Prediction at 10 AI-Powered ITCS Intersections

August 2023
Team Project
Clustering
Regression
Classification
Streamlit
Transportation

Overview

Built a traffic congestion forecasting model using ML algorithms on 1600+ traffic data points from the HERE Maps API, achieving 94% prediction accuracy and enabling proactive congestion management through a publicly accessible Streamlit app.

Background

Since May 2023, Jakarta has consistently featured among the top 10 most polluted cities globally according to the Air Quality Index, even peaking at number one in August.

In response to chronic traffic congestion, the Jakarta Provincial Government has implemented several initiatives, the most recent being an AI-driven Intelligent Traffic Control System (ITCS) developed in collaboration with Google under the Greenlight project. This system adjusts traffic light durations dynamically based on real-time traffic volume at intersections.

Methodology

Workflow

  1. Scrape data from the HERE Maps API
  2. Data preprocessing
  3. Exploratory data analysis
  4. Feature engineering
  5. KMeans clustering to generate the congestion-level labels used for classification
  6. Random Forest and XGBoost regressors to predict travel time in seconds
  7. Random Forest and XGBoost classifiers to classify congestion level
  8. Model tuning and evaluation
  9. Deployment with Streamlit
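
Steps 5 and 7 can be illustrated with a rough sketch. It is not the project's code: the data is synthetic, the variable names (base_duration, hour, travel_time) and all numeric values are assumptions made for illustration, and Random Forest stands in for either classifier. It clusters travel times into four groups with KMeans, reorders the cluster ids so that a higher label means a longer travel time, and then trains a classifier on those labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the traffic data (hypothetical values, not project data).
n = 400
base_duration = rng.uniform(60, 180, n)            # free-flow travel time, seconds
hour = rng.integers(0, 24, n)                      # hour of day, 0-23
rush = np.isin(hour, [7, 8, 17, 18])               # assumed rush hours
travel_time = base_duration * (1 + 0.8 * rush + rng.uniform(0, 0.3, n))

# Step 5: cluster travel times into four groups; cluster ids become labels.
tt = travel_time.reshape(-1, 1)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
raw_labels = kmeans.fit_predict(tt)
sil = silhouette_score(tt, raw_labels)

# KMeans ids are arbitrary, so rank clusters by their centre:
# level 0 = shortest travel times, level 3 = longest.
order = np.argsort(kmeans.cluster_centers_.ravel())
rank = np.empty_like(order)
rank[order] = np.arange(4)
congestion_level = rank[raw_labels]

# Step 7: train a classifier to predict the congestion level from features.
angle = 2 * np.pi * hour / 24
X = np.column_stack([base_duration, np.sin(angle), np.cos(angle)])
X_train, X_test, y_train, y_test = train_test_split(
    X, congestion_level, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"silhouette={sil:.3f}, test accuracy={accuracy:.3f}")
```

Ranking the clusters by centre is a design choice of this sketch; it makes the labels ordinal so they can be read as increasing congestion levels.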

Results

The following visualizations and tables were created to help understand and communicate the findings:

Figure: Travel Time Boxplot by Clusters
Figure: Cluster Interpretation

Regression Model Performance

Model                  CV SMAPE   CV MSE
Random Forest          1.727      4.217
Random Forest Tuning   1.719      4.212
XGBoost                1.729      4.268
XGBoost Tuning         1.714      4.121
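
For readers unfamiliar with SMAPE (symmetric mean absolute percentage error), here is a minimal sketch of one common definition. The page does not say which SMAPE variant was used or whether the scores are percentages, so both the formula and the percent scaling below are assumptions; the function name smape is also just illustrative.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed variant: mean of |p - a| / ((|a| + |p|) / 2), times 100.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term (both values zero) as zero error.
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100 * total / len(actual)


# Hypothetical travel times in seconds, not project data.
print(round(smape([100, 200], [110, 190]), 3))  # 7.326
```

Unlike plain MAPE, this variant divides by the average of the actual and predicted magnitudes, so over- and under-predictions of the same size are penalised more evenly.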

Classification Model Performance

Model                  Accuracy   Recall   Precision   F1-Score
Random Forest          94.59%     94.59%   94.63%      94.57%
Random Forest Tuning   95.65%     95.65%   95.65%      95.63%
XGBoost                94.76%     94.76%   94.81%      94.74%
XGBoost Tuning         95.59%     95.59%   95.66%      95.57%

Conclusion

KMeans clustering, with a silhouette score of 0.596, effectively groups travel times into four clusters. The tuned XGBoost Regressor yielded the lowest CV SMAPE (1.714) and the lowest CV MSE (4.121). For congestion level prediction, the tuned XGBoost Classifier achieved the highest precision (95.66%), while the tuned Random Forest classifier was marginally ahead on accuracy (95.65%) and F1-score (95.63%).

The most important features in both the regression and classification models are normal speed, base duration, intersection, and the sine-encoded hour of day (sin hours). The Streamlit app gives users an accessible interface for obtaining predictions.
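
The "sin hours" feature presumably refers to a cyclical encoding of the hour of day; that reading is an assumption, as is the cosine companion term added here. A minimal sketch of the idea, with an illustrative function name:

```python
import math


def encode_hour(hour):
    """Map an hour of day (0-23) to a point on the unit circle.

    Returns (sin, cos). The cosine term is an assumption of this sketch;
    the page only mentions a sine feature.
    """
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)


def circular_distance(h1, h2):
    """Euclidean distance between two encoded hours."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


# 23:00 and 00:00 end up close together, unlike with the raw values 23 and 0.
print(circular_distance(23, 0) < circular_distance(12, 0))  # True
```

With the sine alone, two different hours can share the same value (for example 03:00 and 09:00), which is why a cosine term is often paired with it.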

Categories

machine deep-learning
end-to-end

Objectives

  • We propose extending the ITCS with a machine-learning model that lets road users at ITCS-equipped intersections predict upcoming traffic congestion
  • As a pilot phase, we focus on 10 of the 20 initially equipped intersections, combining clustering, regression, and classification on traffic data acquired via the HERE Maps API

Tools & Technologies

Python
Scikit-learn
Streamlit
HERE Maps API

Data Source

1600+ traffic data points collected from HERE Maps API