Leveraging Machine Learning for Congestion Level Prediction at 10 AI-Powered ITCS Intersections

Overview
Built a traffic congestion forecasting model by applying ML algorithms to 1600+ traffic data points from the HERE Maps API. The classifiers reached over 94% accuracy, and a user-accessible Streamlit app exposes the predictions to support proactive congestion management.
Background
Since May 2023, Jakarta has consistently featured among the top 10 most polluted cities globally according to the Air Quality Index, even peaking at number one in August.
In response to chronic traffic congestion, the Jakarta Provincial Government has implemented several initiatives, the most recent being an AI-driven Intelligent Traffic Control System (ITCS) developed in collaboration with Google under the Greenlight project. This system adjusts traffic light durations dynamically based on real-time traffic volume at intersections.
Methodology
- Scrape data from the HERE Maps API
- Data Preprocessing
- Exploratory Data Analysis
- Feature Engineering
- KMeans clustering to generate the congestion-level labels used for classification
- Random Forest and XGBoost regressors to predict travel time in seconds
- Random Forest and XGBoost classifiers to predict congestion level
- Model Tuning and Evaluation
- Streamlit for deployment
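The label-generation step in the list above can be illustrated with a small sketch. It assumes the clustering runs on travel time alone and uses a hand-written 1-D Lloyd's algorithm as a stand-in for a library KMeans; the source does not say which features or implementation were used. The function names and the `travel_times` values are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k=4, iters=100):
    """Cluster 1-D values with Lloyd's algorithm; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: evenly spaced quantiles of the sorted data.
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def to_congestion_levels(values, k=4):
    """Map each value to an ordinal level: 0 = shortest travel times."""
    centroids, labels = kmeans_1d(values, k)
    # Rank clusters by centroid so the level order is meaningful.
    by_centroid = sorted(range(k), key=lambda c: centroids[c])
    rank = {cluster: level for level, cluster in enumerate(by_centroid)}
    return [rank[lab] for lab in labels]


# Made-up travel times in seconds, for illustration only.
travel_times = [60, 62, 65, 90, 95, 98, 130, 135, 140, 200, 210, 220]
levels = to_congestion_levels(travel_times)
print(levels)
```

The resulting levels would then serve as the target column for the classifiers, while the regressors predict the travel time itself.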
Results
The following visualization and tables were created to help understand and communicate the findings:
Travel Time Boxplot by Clusters
Regression Model Performance
Model | CV SMAPE | CV MSE |
---|---|---|
Random Forest | 1.727 | 4.217 |
Random Forest Tuning | 1.719 | 4.212 |
XGBoost | 1.729 | 4.268 |
XGBoost Tuning | 1.714 | 4.121 |
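For readers unfamiliar with the metrics in this table, here is a minimal sketch of SMAPE and MSE. SMAPE has several definitions in circulation; this one assumes the common percentage form, the mean of 2|y - ŷ| / (|y| + |ŷ|) times 100, which may differ from the exact formula used for the scores above. The sample values are hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        denom = abs(y) + abs(y_hat)
        # Treat a 0/0 term as zero error rather than dividing by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(y - y_hat) / denom
    return 100.0 * total / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical travel times in seconds: observed vs. predicted.
observed = [100, 200]
forecast = [110, 180]
print(round(smape(observed, forecast), 3), mse(observed, forecast))
```

Because SMAPE is scale-free and MSE is in squared seconds, the two columns are not directly comparable to each other, only across models.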
Classification Model Performance
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
Random Forest | 94.59% | 94.59% | 94.63% | 94.57% |
Random Forest Tuning | 95.65% | 95.65% | 95.65% | 95.63% |
XGBoost | 94.76% | 94.76% | 94.81% | 94.74% |
XGBoost Tuning | 95.59% | 95.59% | 95.66% | 95.57% |
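In the table above, Accuracy and Recall are identical for every model. That pattern is what support-weighted averaging of per-class recall produces, although the source does not state which averaging mode was used. The sketch below, with hypothetical labels, shows why the two quantities coincide under that assumption.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    # recall_c = hits_c / support_c and weight_c = support_c / n,
    # so the sum collapses to total hits / n, which is accuracy.
    return sum((hits[c] / support[c]) * (support[c] / n) for c in support)


# Hypothetical labels for four congestion levels (0 to 3).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Under this averaging, the Recall column adds no information beyond Accuracy, so Precision and F1-Score are the columns that actually separate the models.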
Conclusion
KMeans clustering, with a silhouette score of 0.596, separates travel times into four groups. Among the regressors, the tuned XGBoost model yielded the lowest CV SMAPE (1.714) and the lowest CV MSE (4.121). For congestion level classification, the tuned XGBoost classifier achieved the highest precision (95.66%), while the tuned Random Forest was marginally ahead on accuracy (95.65%) and F1-score (95.63%).
The most important features in both the regression and classification models are normal speed, base duration, intersections, and sin hours. The Streamlit app gives users an accessible interface for viewing predictions.
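The "sin hours" feature is not defined in the source. A plausible reading is a cyclical encoding of the hour of day, sketched below under that assumption; the function name `encode_hour` and the 24-hour period are illustrative choices, not the author's confirmed method.

```python
import math


def encode_hour(hour):
    """Map an hour of day onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)


def circular_distance(hour_a, hour_b):
    """Euclidean distance between two hours in the encoded space."""
    sin_a, cos_a = encode_hour(hour_a)
    sin_b, cos_b = encode_hour(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 23:00 and 00:00 are neighbours on the circle, unlike in raw hour values.
print(circular_distance(23, 0) < circular_distance(0, 12))
```

The point of such an encoding is that late-night and early-morning hours end up close together, which a raw 0 to 23 integer does not capture. A sine term alone is ambiguous (for example, 01:00 and 11:00 share the same sine), which is why it is usually paired with a cosine term.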
Objectives
- Extend the ITCS with a machine-learning model that lets road users at ITCS-equipped intersections predict upcoming traffic congestion
- As a pilot phase, focus on 10 of the 20 initially equipped intersections, applying clustering, regression, and classification to traffic data acquired via the HERE Maps API
Data Source
1600+ traffic data points collected from HERE Maps API