In this section, the conventional LSTM network is introduced, its shortcomings are analyzed, and then the $ {\overline{\delta }}_{relax} $-LSTM network is proposed for traffic flow forecasting.

The conventional LSTM network for forecasting
The LSTM network has been proven stable and powerful in modeling the long-term correlation of traffic flow sequences[27]. The LSTM network is composed of several basic LSTM cell units and a fully connected network (FCN). Taking cell unit un as an example, hn−1 represents the cell hidden state at moment n−1, and xn is the cell input at moment n. When hn−1, xn, and the offset b enter the sigmoid (sig) and tanh boxes, they pass through a basic neural network[23], with outputs denoted by in, fn, on, and $ \tilde{C} $ respectively. The relationship is expressed in Eqn (1).
$ \left(\begin{array}{c}\begin{array}{c}{i}_{n}\\ {f}_{n}\\ {o}_{n}\end{array}\\ \tilde{C}\end{array}\right)=\left(\begin{array}{c}\begin{array}{c}sigmoid\\ sigmoid\\ sigmoid\end{array}\\ tanh\end{array}\right)W\left(\genfrac{}{}{0pt}{}{{x}_{n}}{{h}_{n-1}}\right)+\left(\begin{array}{c}\begin{array}{c}{b}_{i}\\ {b}_{f}\\ {b}_{o}\end{array}\\ {b}_{C}\end{array}\right) $ (1)

where W represents the weight matrix in the hidden layer of the basic neural network, and xn is the normalized data. In Eqn (1), in, fn, and on are called the input gate, the forgetting gate, and the output gate respectively, and
$ \tilde{C} $ is an intermediate variable used to calculate the cell state cn. bi, bf, bo, and bC are the offset vectors corresponding to in, fn, on, and $ \tilde{C} $. Since the range of the sigmoid function is from 0 to 1, in, fn, and on are all non-negative, while $ \tilde{C} $ ranges from −1 to 1, as determined by the hyperbolic tangent function. The cell state cn is then calculated by summing cn−1 and $ \tilde{C} $ in a certain proportion. The proportion of cn−1 is determined by fn, while the contribution of $ \tilde{C} $ is controlled by in, as shown in Eqn (2).

$ {c}_{n}={f}_{n}\otimes{c}_{n-1}+{i}_{n}\otimes\tilde{C} $ (2)

Since cn−1 is the previous cell state, $ \tilde{C} $ is calculated by the current cell, and fn and in are their corresponding coefficients, fn and in are called the forgetting gate and the input gate. The cell output hn is then calculated as the activated value of cn, taken to a certain extent. The extent is determined by the output gate on, as shown in Eqn (3).
$ {h}_{n}={o}_{n}\otimes\mathrm{tanh}\left({c}_{n}\right) $ (3)

Finally, hn is fed into an FCN to obtain the prediction $ {\hat{x}}_{n+1} $. Since the cell state at any moment is related to the previous cell state and the input of the current moment, hn contains the information of all previous moments and the current moment, which realizes the correlation dependence of long sequences. Before using the LSTM network to predict traffic flow, it is necessary to train the parameters of the LSTM network by the back-propagation algorithm under the guidance of an error function. The error function of the conventional LSTM network is the mean square error (MSE) function, and its expression is shown in Eqn (4).
$ MSE=\dfrac{1}{N}\textstyle\sum _{n=1}^{N}{\left({\hat{x}}_{n+1}-{x}_{n+1}\right)}^{2} $ (4)

where $ {\hat{x}}_{n+1} $ is the predicted value at time n+1, xn+1 represents the true value at time n+1, and N is the total number of samples in the training set. As shown in Eqn (4), when the data form a stationary sequence, or when the noise is Gaussian or absent, satisfying $ |{\hat{x}}_{n+1}-{x}_{n+1}| < 1 $, the network parameters guided by MSE converge rapidly. However, non-Gaussian outlier noise is often generated in traffic flow data for various reasons, such as accidents. When the error satisfies $ |{\hat{x}}_{n+1}-{x}_{n+1}| > 1 $, the square operation in MSE further amplifies the error and thereby distorts the parameters in the network. The MSE loss thus makes the LSTM network vulnerable to non-Gaussian noise, so the canonical LSTM network guided by MSE loss cannot provide accurate predictions under non-Gaussian distributions, especially for traffic flow data. Therefore, the standard LSTM network needs to be further improved.
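As a concrete illustration of Eqns (1)−(4), the following is a minimal PyTorch sketch of one cell step. The function name, the random initialization, and the sizes (input size 1; hidden size 256, matching Table 2) are our illustrative choices, not the paper's implementation.

```python
import torch

input_size, hidden_size = 1, 256                       # hidden size as in Table 2
# Weight matrix W of Eqn (1), mapping [x_n; h_{n-1}] to all four pre-activations
W = torch.randn(4 * hidden_size, input_size + hidden_size) * 0.1
b = torch.zeros(4 * hidden_size)                       # offsets b_i, b_f, b_o, b_C stacked

def lstm_cell_step(x_n, h_prev, c_prev):
    """One LSTM cell step following Eqns (1)-(3)."""
    z = W @ torch.cat([x_n, h_prev]) + b               # affine map of Eqn (1)
    i_n, f_n, o_n, c_tilde = z.chunk(4)
    i_n, f_n, o_n = torch.sigmoid(i_n), torch.sigmoid(f_n), torch.sigmoid(o_n)
    c_tilde = torch.tanh(c_tilde)                      # intermediate variable, in (-1, 1)
    c_n = f_n * c_prev + i_n * c_tilde                 # Eqn (2): proportional sum
    h_n = o_n * torch.tanh(c_n)                        # Eqn (3): gated, activated state
    return h_n, c_n

mse = torch.nn.MSELoss()                               # Eqn (4), the conventional criterion
```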
The $ {\overline{\delta }}_{relax} $-LSTM network for forecasting
Although the LSTM model can learn long-sequence dependence, its prediction performance is highly dependent on the MSE criterion. However, the MSE criterion assumes that the prediction errors are independent and identically Gaussian distributed (i.i.d.), which makes MSE unsuitable for complex traffic flow sequences containing non-Gaussian noise such as impulse noise. To solve this problem, we propose to introduce the MCVC function into the LSTM network to guide the network parameters and thus carry out higher-quality traffic flow forecasting.
It is well known that the error function plays a key role in the performance of deep learning networks. From the perspective of information theory, the correntropy criterion, as a nonlinear similarity measure, has been successfully used as an effective optimization cost in signal processing and machine learning[28]. The correntropy between two random variables X and Y is shown in Eqn (5).
$ V(X,Y)=\mathbb{E}\left[\kappa \left(e\right)\right]=\dfrac{1}{N}\textstyle\sum _{n=1}^{N}\kappa \left({e}_{n}\right) $ (5)

where $ \mathbb{E}[\cdot ] $ denotes the expectation operator, $ \kappa (\cdot ) $ is the Mercer kernel, e = X − Y, and N represents the number of samples.

It is worth noting that the selection of the kernel function $ \kappa $ plays an important role in the correntropy. If the kernel adopts the triangle kernel $ \kappa \left(e\right)={\parallel e\parallel }^{d} $ with d = 2, V degenerates to MSE. When the kernel is the Gaussian kernel $ {G}_{\delta }\left(e\right)=\mathrm{exp}\left[-\dfrac{{e}^{2}}{2{\delta }^{2}}\right] $, so that $ V=\dfrac{1}{N}\textstyle\sum _{n=1}^{N}{G}_{\delta }\left({e}_{n}\right) $, V is the maximum correntropy. Further, when the kernel function in Eqn (5) adopts a mixed Gaussian kernel, V is called the mixed correntropy (MC), as shown in Eqn (6).

$ {V}_{MC}=\textstyle\sum _{i=1}^{I}{\alpha }_{i}\dfrac{1}{N}\textstyle\sum _{n=1}^{N}{G}_{{\delta }_{i}}\left({e}_{n}\right) $ (6)

where δi is the kernel bandwidth of the ith Gaussian kernel, and αi is the corresponding proportionality coefficient, satisfying α1 + α2 + ... + αI = 1. Since the Taylor expansion of the Gaussian kernel contains measures from zero to infinite order, it covers the measure orders of non-Gaussian noise, whether heavy-tailed or light-tailed, so the Gaussian kernel readily suppresses non-Gaussian noise during training.
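A quick numeric check (ours, not from the paper) of why the Gaussian kernel suppresses outliers: a single impulse-noise error dominates the squared-error terms used by MSE, but contributes almost nothing to the correntropy terms.

```python
import numpy as np

def gaussian_kernel(e, delta=1.0):
    # G_delta(e) = exp(-e^2 / (2 delta^2)), the Gaussian kernel of Eqn (6)
    return np.exp(-e**2 / (2 * delta**2))

errors = np.array([0.1, -0.2, 0.15, 50.0])  # last entry mimics an impulse-noise outlier
print(errors**2)                 # MSE terms: [0.01, 0.04, 0.0225, 2500.0] -- outlier dominates
print(gaussian_kernel(errors))   # correntropy terms: [~0.995, ~0.980, ~0.989, ~0.0]
```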
In Eqn (6), VMC is a linear combination of multiple Gaussian kernels. However, every Gaussian kernel in VMC is centered at zero error, so VMC only performs well on noise whose distribution is centered at zero. Chen et al.[30] therefore proposed the MCVC criterion, which further improves the performance of correntropy by broadening its applicability, as shown in Eqn (7).
$ {V}_{MCVC}=\textstyle\sum _{k=1}^{K}{\lambda }_{k}\dfrac{1}{N}\textstyle\sum _{n=1}^{N}{G}_{{\delta }_{k}}({e}_{n}-{c}_{k}) $ (7)

where δk defines the kernel bandwidth of the kth Gaussian kernel, ck is its center, and λk is the corresponding proportionality coefficient, satisfying λ1 + λ2 + ... + λK = 1.
It should be noted that the kernel function in VMCVC is a multi-Gaussian function, which usually does not satisfy Mercer's condition. However, this is not a problem, because Mercer's condition is not required for a similarity measure[30]. As for the convergence of VMCVC, it involves the kernel method and the unified framework of regression and classification; however, its convergence can be guaranteed if an appropriate parameter search method is adopted[31].
To account for both the Gaussian and non-Gaussian errors of the LSTM network, a $ {\overline{\delta }}_{relax} $-LSTM network is proposed. In this network, a new loss function based on the MCVC criterion is adopted, as shown in Eqn (8).

$ \begin{split}\mathcal{L}=\;&1-{V}_{MCVC} =1-\sum _{k=1}^{K}{\lambda }_{k}\dfrac{1}{N}\textstyle\sum _{n=1}^{N}{G}_{{\delta }_{k}}\left({e}_{n}-{c}_{k}\right)\\ =\;&1-\Bigg({\lambda }_{1}\dfrac{1}{N}\textstyle\sum _{n=1}^{N}\mathrm{exp}\left[-\dfrac{{\left({e}_{n}-{c}_{1}\right)}^{2}}{2{{\delta }_{1}}^{2}}\right]+\cdots +\\&{\lambda }_{K}\dfrac{1}{N}\textstyle\sum _{n=1}^{N}\mathrm{exp}\left[-\dfrac{{\left({e}_{n}-{c}_{K}\right)}^{2}}{2{{\delta }_{K}}^{2}}\right]\Bigg)\end{split} $ (8)

Through the analysis of Eqn (8), the advantages of $ \mathcal{L} $ are as follows:

● $ \mathcal{L} $ performs a negative exponential operation on the prediction error. This means that when the sequence is mixed with non-Gaussian noise such as impulse noise or outliers, the value of $ \dfrac{{\left({e}_{n}-{c}_{k}\right)}^{2}}{2{{\delta }_{k}}^{2}} $ becomes very large, but the negative exponential operation makes the correntropy VMCVC tend to zero. That is, VMCVC is not sensitive to non-Gaussian noise, which weakens the misjudgment of the LSTM network.

● When K = 2, δ1 < δ2, and $ {\delta }_{1}\to \infty $ are satisfied, VMCVC is approximately equal to MSE, which means that the $ {\overline{\delta }}_{relax} $-LSTM network has the potential to maintain good performance in a Gaussian noise environment. On the other hand, when ck = 0 (k = 1, 2, ..., K) is satisfied, VMCVC = VMC, that is, the performance of the MCVC-guided LSTM network in a non-Gaussian noise environment is not inferior to that under the MC criterion. The proposed $ \mathcal{L} $ loss function can therefore give the LSTM network excellent prediction performance under both Gaussian and non-Gaussian noise.

● The single Gaussian kernels in $ \mathcal{L} $ are no longer limited to zero mean, but can be concentrated at different positions. By studying the Gaussian mixture kernel with variable centers, it is found that VMCVC is more general and flexible and can adapt to more complex error distributions, such as skewed, multi-peak, and discrete-valued distributions. Therefore, when $ \mathcal{L} $ is employed as the cost function of the LSTM network, traffic flow forecasting can achieve better performance by setting the centers appropriately.
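To make the loss concrete, the following is a minimal PyTorch sketch of Eqn (8) with K = 2. The class name MCVCLoss is ours, and the default parameter values are simply the A1 settings from Table 3, not a recommendation.

```python
import torch

class MCVCLoss(torch.nn.Module):
    """Sketch of the MCVC-based loss L = 1 - V_MCVC of Eqn (8), K = 2."""
    def __init__(self, lambdas=(0.6, 0.4), deltas=(0.3, 10.0), centers=(0.0, -1.0)):
        super().__init__()
        self.lambdas, self.deltas, self.centers = lambdas, deltas, centers

    def forward(self, pred, target):
        e = pred - target                        # prediction errors e_n
        v_mcvc = 0.0
        for lam, delta, c in zip(self.lambdas, self.deltas, self.centers):
            # lambda_k * mean of the kernel values G_{delta_k}(e_n - c_k)
            v_mcvc = v_mcvc + lam * torch.exp(-(e - c)**2 / (2 * delta**2)).mean()
        return 1.0 - v_mcvc                      # Eqn (8)
```

During training it simply replaces the MSE criterion, e.g. `loss = MCVCLoss()(prediction, target)` followed by `loss.backward()`.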
In this section, the performance of the $ {\overline{\delta }}_{relax} $-LSTM network in traffic flow prediction is tested. In addition to the classical historical average (HA), Kalman filter (KF)[32], stacked auto-encoder (SAE)[20], and MSE-based LSTM method, the NiLSTM method[27] is also selected as a comparison benchmark because of its excellent robustness in non-Gaussian noise environments. Unless otherwise noted, all experiments were conducted on a computer equipped with an Intel Core i7-8850H CPU and 32 GB of RAM, and the source code is implemented in PyTorch 1.2.0 on Python 3.7.3.

Data description
The datasets A1, A2, A4, and A8, collected with Monica sensors by Wang et al.[33], were used in the experiments; they record the per-minute traffic flow of the A1, A2, A4, and A8 freeways over 35 d starting from May 20, 2010. These datasets are widely used in the evaluation of traffic flow prediction models[7,9,16,21,34,35].
The geographical locations of the four expressways are shown in Fig. 2. Among them, the A1 highway, connecting Amsterdam and the German border, is the first double three-lane highway in Europe and has a high utilization rate. Its traffic volume varies greatly over time, which increases the difficulty of prediction. The A2 motorway connects Amsterdam to the Belgian border and carries more than 2,000 vehicles an hour. The A4 highway connects the city of Amsterdam to Belgium's northern border and is 154 km long. The A8 highway starts at the northern end of the A10 highway and ends at Zaandijk, with a length of less than 10 km.
In the experiment, the data are aggregated into 10-min intervals and expressed as an hourly flow rate, in units of vehs/h, which is consistent with other traffic flow prediction models[6,9]. The first 28 d of each dataset were used for training the model, and the last 7 d were used for testing. All data are min-max normalized before being fed into the model.
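A sketch of this preprocessing is given below, assuming `flow_per_min` is a NumPy array of one dataset's per-minute vehicle counts. The function name is ours, and using training-set statistics for the min-max normalization is our assumption, since the paper does not specify it.

```python
import numpy as np

def preprocess(flow_per_min, train_days=28, test_days=7):
    # Aggregate 1-min counts into 10-min intervals, expressed in vehs/h (x6)
    flow_10min = flow_per_min.reshape(-1, 10).sum(axis=1) * 6
    # Min-max normalization, here computed on the training portion only
    split = train_days * 24 * 6                 # 6 ten-minute samples per hour
    lo, hi = flow_10min[:split].min(), flow_10min[:split].max()
    normed = (flow_10min - lo) / (hi - lo)
    return normed[:split], normed[split:]       # 28 d for training, 7 d for testing
```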
Evaluation criteria
In the test, two common indicators, root mean square error (RMSE) and mean absolute percentage error (MAPE), were used to evaluate all the prediction methods. RMSE measures the average difference between the predicted and true values, while MAPE represents the percentage difference between them. The calculation methods of RMSE and MAPE are shown in Eqns (9) and (10), respectively.
$ \mathrm{RMSE}=\sqrt{\dfrac{1}{M}\textstyle\sum _{m=1}^{M}{({y}_{m}-{\hat{y}}_{m})}^{2}} $ (9)

$ \mathrm{MAPE}=\dfrac{1}{M}\textstyle\sum _{m=1}^{M}\left|\dfrac{{y}_{m}-{\hat{y}}_{m}}{{y}_{m}}\right|\times 100{\text{%}} $ (10)

where M is the total number of samples in the test set, $ {\hat{y}}_{m} $ represents the predicted value of the mth sample in the test set, and ym is the corresponding true value.
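Direct implementations of Eqns (9) and (10) might look as follows; the function and argument names are ours.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Eqn (9): root mean square error, in vehs/h
    return np.sqrt(np.mean((y_true - y_pred)**2))

def mape(y_true, y_pred):
    # Eqn (10): mean absolute percentage error, in %
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```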
Performance evaluation
In this section, the test results of the $ {\overline{\delta }}_{relax} $-LSTM and the eight baseline networks are compared, which demonstrates the superiority of the proposed $ {\overline{\delta }}_{relax} $-LSTM network for traffic forecasting. Then, the influence of different parameters on the performance of the $ {\overline{\delta }}_{relax} $-LSTM is analyzed, to explore how the parameters affect the $ {\overline{\delta }}_{relax} $-LSTM network. Each result in the experiment is averaged over 20 replicates.

In this part, the performance of the $ {\overline{\delta }}_{relax} $-LSTM is compared with the following eight models on the traffic flow datasets: historical average (HA), Kalman filter (KF)[32], artificial neural network (ANN)[36], stacked auto-encoder (SAE)[16], GSA-ELM[13], PSOGSA-ELM[21], LSTM, and NiLSTM[27].

The data preprocessing of the KF model in Table 1 adopts the wavelet de-noising method proposed by Xie et al.[32]: the mother wavelet is Daubechies 4, and the variance of the process error is V = 0.1I, where I represents the identity matrix. The variance of the measurement noise is 0, so the measurements are considered correct. The initial state is defined as [1/N, ..., 1/N] with N = 8, and the covariance matrix of the initial state estimation error is 10−2I. The ANN is a one-hidden-layer feed-forward neural network, where the mean squared error is set to 0.001, the spread of the radial basis function (RBF) is 2000, and the maximum number of neurons in the hidden layer is set to 40. Through cross-validation, the layer sizes of the SAE network are set to [120, 60, 30], and the layer-wise greedy training method is adopted. In the LSTM, NiLSTM, and $ {\overline{\delta }}_{relax} $-LSTM networks, the tanh function is used as the activation function of the LSTM layer, while the sigmoid function is used for the fully connected layer. In the back-propagation algorithm, the Adam optimization method is used with an initial learning rate of 0.001. The other hyperparameters of the three networks are shown in Table 2. The Gaussian kernel bandwidth in the NiLSTM network is set to δ = 1.0.

Table 1. The comparison of the $ {\overline{\delta }}_{relax} $-LSTM model with the eight baseline models on the four benchmark datasets, with boldface representing the best performance.
| Models | Criterion | A1 | A2 | A4 | A8 |
|---|---|---|---|---|---|
| HA | RMSE (vehs/h) | 404.84 | 348.96 | 357.85 | 218.72 |
| | MAPE (%) | 16.87 | 15.53 | 16.72 | 16.24 |
| KF | RMSE (vehs/h) | 332.03 | 239.87 | 250.51 | 187.48 |
| | MAPE (%) | 12.46 | 10.72 | 12.62 | 12.63 |
| ANN | RMSE (vehs/h) | 299.64 | 212.95 | 225.86 | 166.50 |
| | MAPE (%) | 12.61 | 10.89 | 12.49 | 12.53 |
| SAE | RMSE (vehs/h) | 295.43 | 209.32 | 226.91 | 167.01 |
| | MAPE (%) | 11.92 | 10.23 | 11.87 | 12.03 |
| GSA-ELM | RMSE (vehs/h) | 287.89 | 203.04 | 221.39 | 163.24 |
| | MAPE (%) | 11.69 | 10.25 | 11.72 | 12.05 |
| PSOGSA-ELM | RMSE (vehs/h) | 288.03 | 204.09 | 220.52 | 163.92 |
| | MAPE (%) | 11.53 | 10.16 | 11.67 | 12.02 |
| LSTM | RMSE (vehs/h) | 289.56 | 204.71 | 224.49 | 165.13 |
| | MAPE (%) | 12.38 | 10.56 | 11.99 | 12.48 |
| NiLSTM | RMSE (vehs/h) | 285.54 | 203.69 | 223.72 | 163.25 |
| | MAPE (%) | 12.00 | 10.14 | 11.57 | 11.76 |
| $ {\overline{\delta }}_{relax} $-LSTM | RMSE (vehs/h) | **280.54** | **195.28** | **220.08** | **161.69** |
| | MAPE (%) | **11.48** | **10.02** | **11.51** | **11.54** |

In addition, for the $ {\overline{\delta }}_{relax} $-LSTM network, K = 2. Combined with the experimental verification in the literature[31], the parameter ranges of λ1, λ2, δ1, δ2, c1, and c2 are set as follows: λ1 = [0.2, 0.4, 0.6, 0.8], λ2 = 1 − λ1, δ1 = [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7], δ2 = [1, 3, 5, 7, 10, 15, 30, 60], c1 = [−5, −3, −1, −0.5, 0, 0.5, 1, 3, 5], and c2 = [−5, −3, −1, −0.5, 0, 0.5, 1, 3, 5]. Through grid search, the best-performing parameters for each dataset are shown in Table 3; a sketch of this search appears at the end of this section.

Table 2. The hyperparameters for the LSTM, NiLSTM, and $ {\overline{\delta }}_{relax} $-LSTM networks.
| Hyperparameter | Value |
|---|---|
| Hidden layers | 1 |
| Hidden units | 256 |
| Batch size | 32 |
| Input length | 12 |
| Epochs | 200 |

Table 3. The parameter settings of $ \mathcal{L} $ for the $ {\overline{\delta }}_{relax} $-LSTM network.
| Dataset | λ1 | λ2 | δ1 | δ2 | c1 | c2 |
|---|---|---|---|---|---|---|
| A1 | 0.6 | 0.4 | 0.3 | 10 | 0 | −1 |
| A2 | 0.8 | 0.2 | 30 | 0.3 | 5 | 0 |
| A4 | 1 | 0 | 0.7 | 30 | 0 | −0.5 |
| A8 | 0.6 | 0.4 | 0.3 | 15 | 0 | −1 |

The performance results are listed in Table 1. According to these results, the prediction accuracy of the $ {\overline{\delta }}_{relax} $-LSTM is better than that of all the other baseline models. This is because the parametric models among the baselines can hardly capture the nonlinear relationships in traffic data with limited parameters and fixed model settings, while the machine learning baselines cannot accurately capture the long-term dependence between traffic flow sequences. In addition, the ordinary LSTM model is limited by its network setting and cannot effectively resist Gaussian noise and non-Gaussian noise at the same time. For these reasons, the baseline models struggle to achieve better performance in the real world. The $ {\overline{\delta }}_{relax} $-LSTM method fully considers the large uncertainty of traffic flow data and provides more selectivity and pertinence in the network setting, thereby obtaining a better prediction effect.
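The grid search over the parameter ranges given above could be sketched as follows, reusing the MCVCLoss sketch from earlier. Here `train_and_eval` is a hypothetical helper that trains the network with the given loss and returns a validation error; the exhaustive loop (20,736 combinations) is shown only for clarity.

```python
from itertools import product

lambda1_grid = [0.2, 0.4, 0.6, 0.8]
delta1_grid  = [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7]
delta2_grid  = [1, 3, 5, 7, 10, 15, 30, 60]
center_grid  = [-5, -3, -1, -0.5, 0, 0.5, 1, 3, 5]   # shared by c1 and c2

best = (float("inf"), None)
for l1, d1, d2, c1, c2 in product(lambda1_grid, delta1_grid, delta2_grid,
                                  center_grid, center_grid):
    loss_fn = MCVCLoss(lambdas=(l1, 1 - l1), deltas=(d1, d2), centers=(c1, c2))
    score = train_and_eval(loss_fn)          # hypothetical helper, e.g. validation RMSE
    if score < best[0]:
        best = (score, (l1, 1 - l1, d1, d2, c1, c2))
```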
In this paper, a $ {\overline{\delta }}_{relax} $-LSTM network for short-term traffic flow prediction is proposed. The network formulates a loss function in which the centers of the Gaussian mixture kernels can be placed at different positions, i.e., variable centers. In this way, the $ {\overline{\delta }}_{relax} $-LSTM network can effectively resist various noise distributions, such as Gaussian noise and impulse noise, to achieve high prediction accuracy and robustness. Extensive experiments on four benchmark datasets show that the $ {\overline{\delta }}_{relax} $-LSTM model performs better than typical prediction models as well as the most advanced LSTM-family models. In the future, we plan to explore the combination of Gaussian and non-Gaussian kernels as a new hybrid kernel and apply it to short-term traffic flow prediction.
About this article
Cite this article
Fang W, Li X, Lin Z, Zhou J, Zhou T. 2024. Mixture correntropy with variable center LSTM network for traffic flow forecasting. Digital Transportation and Safety 3(4): 264−270 doi: 10.48130/dts-0024-0023
- Received: 11 September 2024
- Revised: 28 October 2024
- Accepted: 11 November 2024
- Published online: 27 December 2024
Abstract: Timely and accurate traffic flow prediction is the core of an intelligent transportation system. Canonical long short-term memory (LSTM) networks are guided by the mean square error (MSE) criterion, so they can handle Gaussian noise in traffic flow effectively. The MSE criterion is a global measure of the total error between the predictions and the ground truth. When the errors between the predictions and the ground truth are independent and identically Gaussian distributed, MSE-guided LSTM networks work well. However, traffic flow is often impacted by non-Gaussian noise and can no longer maintain an identical Gaussian distribution. Therefore, a $ {\overline{\delta }}_{relax} $-LSTM network guided by the mixture correntropy with variable center (MCVC) criterion is proposed to simultaneously handle both Gaussian and non-Gaussian distributions. Extensive experiments on four benchmark traffic flow datasets show that the $ {\overline{\delta }}_{relax} $-LSTM network obtains more accurate prediction results than state-of-the-art models.
Key words:
- Traffic flow theory
- Machine learning
- Robust modeling
- Mixture correntropy