Figure 1. Fluctuation characteristics of TPI over different periods.
Figure 2. Relative importance of different influencing factors.
Figure 3. Comparison of TPI prediction results for one week.
Figure 4. Comparison of TPI prediction results for rainy, snowy, and hazy weather.
Name | Symbol | Count
Month | 0: January; 1: February; ...; 11: December | 18 months
Week | 0: Sunday; 1: Monday; ...; 6: Saturday | 72 weeks
Time period | 21: 05:00-05:15; 22: 05:15-05:30; ...; 92: 22:45-23:00 | 39,312 periods
Day type | 0: Weekday; 1: Weekend | 546 d
Public holiday | 1: First day of holiday | 12 d
Public holiday | 2: Middle day(s) during holiday | 25 d
Public holiday | 3: Last day of holiday | 12 d
Summer or winter vacation | 0: Normal days | 426 d
Summer or winter vacation | 1: Summer and winter vacation | 120 d
Special holiday | 0: Normal day | 421 d
Special holiday | 1: Special holiday | 5 d
Car usage restriction policy | 0: The last digit of the license plate number is 0 or 5 | 73 d
Car usage restriction policy | 1: The last digit of the license plate number is 1 or 6 | 74 d
Car usage restriction policy | 2: The last digit of the license plate number is 2 or 7 | 73 d
Car usage restriction policy | 3: The last digit of the license plate number is 3 or 8 | 71 d
Car usage restriction policy | 4: The last digit of the license plate number is 4 or 9 | 70 d
Car usage restriction policy | 5: No limit | 185 d
Weather | 0: Sunny or cloudy | 490 d
Weather | 1: Rain | 63 d
Weather | 2: Snow | 6 d
Weather | 3: Haze | 31 d
Special events | 1: Short-term events | 252 times
Special events | 2: Large events lasting the whole day | 314 times
Table 1. Descriptive statistics of influencing factors.
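For illustration, a minimal sketch of how one 15-minute observation could be encoded with the categorical scheme in Table 1 is given below; the field names, the Python dictionary representation, and the use of 0 for "no public holiday" and "no special event" are assumptions made for readability, not the coding used in the original data set.

# Hypothetical encoding of one 15-minute observation following Table 1.
# Field names and the 0 codes for "no public holiday" / "no special event"
# are assumptions; the paper's data set may use a different representation.
sample_observation = {
    "month": 3,               # April
    "week": 1,                # Monday
    "time_period": 21,        # 05:00-05:15
    "day_type": 0,            # weekday
    "public_holiday": 0,      # assumed code for a non-holiday day
    "vacation": 0,            # normal (non-vacation) day
    "special_holiday": 0,     # normal day
    "restriction_policy": 4,  # plates ending in 4 or 9 restricted
    "weather": 1,             # rain
    "special_event": 0,       # assumed code for no special event
}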
XGBoost pseudo-code:
Input: Training set D = {(xi, yi)}, where xi is the i-th input vector and yi is the corresponding label.
Output: Prediction model f(x).
// Step 1: Initialize the ensemble with a constant value
f0(x) = initialization_constant
// Step 2: Iterate over the boosting rounds
for m = 1 to M:    // M is the number of boosting rounds
    // Step 3: Compute the pseudo-residuals as the negative gradient of the loss function
    // with respect to the current model's predictions
    rmi = − ∂L(yi, fm−1(xi)) / ∂fm−1(xi)
    // Step 4: Fit a base learner (e.g., a decision tree) hm(x) to the pseudo-residuals
    // Step 5: Update the prediction model by adding the new base learner
    fm(x) = fm−1(x) + η * hm(x)    // η is the learning rate
// Step 6: Output the final prediction model
f(x) = fM(x)
Table 2. Pseudo-code of the XGBoost algorithm.
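As a companion to Table 2, the following is a minimal Python sketch of the same boosting loop, assuming a squared-error loss so that the pseudo-residuals reduce to y − fm−1(x); it omits XGBoost's second-order gradients and regularized tree construction, and its default hyperparameters simply echo the starred configuration in Table 3.

# Minimal sketch of the boosting loop in Table 2 under a squared-error loss.
# Not the full XGBoost algorithm (no second-order gradients, no regularized
# tree construction); X and y are assumed to be NumPy arrays.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=160, eta=0.1, max_depth=5):
    y = np.asarray(y, dtype=float)
    f0 = float(np.mean(y))                     # Step 1: constant initialization
    pred = np.full(len(y), f0)
    learners = []
    for m in range(n_rounds):                  # Step 2: boosting rounds
        residuals = y - pred                   # Step 3: pseudo-residuals for squared loss
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # Step 4
        pred = pred + eta * h.predict(X)       # Step 5: update the ensemble
        learners.append(h)
    def f(X_new):                              # Step 6: final prediction model
        return f0 + eta * sum(h.predict(X_new) for h in learners)
    return f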
Learning rate | Number of trees | R2 | MAE | MSE
Maximum depth of the tree = 3
0.05 | 1,400 | 0.8800 | 0.4934 | 0.4911
0.1 | 1,300 | 0.8779 | 0.4978 | 0.4998
0.5 | 160 | 0.8666 | 0.5274 | 0.5461
1 | 140 | 0.8117 | 0.6442 | 0.7708
Maximum depth of the tree = 4
0.05 | 700 | 0.8797 | 0.4923 | 0.4927
0.1 | 600 | 0.8978 | 0.4640 | 0.4430
0.5 | 120 | 0.8872 | 0.4763 | 0.4620
1 | 110 | 0.8889 | 0.4791 | 0.4550
Maximum depth of the tree = 5*
0.05 | 350 | 0.8865 | 0.4734 | 0.4646
0.1* | 160* | 0.8950 | 0.4474 | 0.4309
0.5 | 50 | 0.8886 | 0.4730 | 0.4560
1 | 30 | 0.8756 | 0.5103 | 0.5095
Maximum depth of the tree = 6
0.05 | 195 | 0.8896 | 0.4655 | 0.4520
0.1 | 70 | 0.8791 | 0.4902 | 0.4950
0.5 | 30 | 0.8945 | 0.4572 | 0.4321
1 | 20 | 0.8860 | 0.4838 | 0.4666
Table 3. Performance of extreme gradient boosting (XGBoost) models for daily TPI prediction.
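A possible way to reproduce a grid such as Table 3 with the xgboost scikit-learn wrapper is sketched below; the candidate tree counts, the train/validation split, and the selection rule (lowest MAE) are assumptions rather than the paper's exact tuning protocol.

# Sketch of a grid search over tree depth, learning rate, and tree count.
# The candidate tree counts and the held-out validation split are assumptions.
from itertools import product
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_grid(X_train, y_train, X_val, y_val):
    results = []
    for depth, eta, n_trees in product([3, 4, 5, 6], [0.05, 0.1, 0.5, 1.0],
                                       [50, 160, 350, 700]):
        model = XGBRegressor(max_depth=depth, learning_rate=eta,
                             n_estimators=n_trees, objective="reg:squarederror")
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        results.append({"max_depth": depth, "learning_rate": eta, "n_estimators": n_trees,
                        "R2": r2_score(y_val, pred),
                        "MAE": mean_absolute_error(y_val, pred),
                        "MSE": mean_squared_error(y_val, pred)})
    # Keep the configuration with the lowest MAE (analogue of the starred row).
    return min(results, key=lambda r: r["MAE"])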
Forecast data | Prediction accuracy
Week 1 (April 22 to April 28, 2019) | 94.3%
Week 2 (April 29 to May 5, 2019) | 85.3%
Week 3 (May 6 to May 12, 2019) | 91.1%
Week 4 (May 13 to May 19, 2019) | 89.1%
Average value | 90.0%
Table 4. Forecast accuracy of TPI for each week.
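The exact definition of prediction accuracy is not reproduced in this excerpt; the sketch below assumes it is computed as one minus the mean absolute percentage error over a week of 15-minute TPI values, which may differ from the definition used in the paper.

# Weekly accuracy taken as 1 - MAPE over observed vs. predicted TPI values.
# This metric is an assumption; the paper may define accuracy differently.
import numpy as np

def weekly_accuracy(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.mean(np.abs(y_true - y_pred) / y_true)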
TPI prediction | SVR | ElasticNet | Bayesian Ridge | Linear Regression | XGBoost
MAE | 0.611 | 1.668 | 1.581 | 2.189 | 0.396*
MSE | 1.693 | 3.111 | 4.121 | 3.553 | 0.989*
R2 | 0.784 | 0.034 | 0.113 | 0.391 | 0.786*
Performance of the different models is measured by MAE, MSE, and R2. MAE, Mean Absolute Error; MSE, Mean Squared Error.
Table 5. Accuracy verification results of different models.
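A minimal sketch of how the comparison in Table 5 could be run with scikit-learn baselines and the xgboost wrapper is given below; the baseline hyperparameters (library defaults) and the train/test split are assumptions, so the reported numbers would not be reproduced exactly.

# Sketch of the baseline comparison in Table 5. Baselines use scikit-learn
# defaults (an assumption); the XGBoost settings echo the starred row of Table 3.
from sklearn.svm import SVR
from sklearn.linear_model import ElasticNet, BayesianRidge, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from xgboost import XGBRegressor

def compare_models(X_train, y_train, X_test, y_test):
    models = {"SVR": SVR(),
              "ElasticNet": ElasticNet(),
              "BayesianRidge": BayesianRidge(),
              "LinearRegression": LinearRegression(),
              "XGBoost": XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160)}
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        scores[name] = {"MAE": mean_absolute_error(y_test, pred),
                        "MSE": mean_squared_error(y_test, pred),
                        "R2": r2_score(y_test, pred)}
    return scores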