Driving risk assessment under the connected vehicle environment: a CNN-LSTM modeling approach

Yin Zheng; Lei Han; Jiqing Yu; Rongjie Yu; Yin Zheng; Lei Han; Jiqing Yu; Rongjie Yu

doi:10.48130/DTS-2023-0017

2023 Volume 2

Article Contents

Next Previous

ARTICLE Open Access

Driving risk assessment under the connected vehicle environment: a CNN-LSTM modeling approach

1.
The Key Laboratory of Road and Traffic Engineering, Ministry of Education, No. 4800 Cao'an Road, Shanghai 201804, China
2.
College of Transportation Engineering, Tongji University, No. 4800 Cao'an Road, Shanghai 201804, China
3.
Ningbo Hangzhou Bay Bridge Development Co., Ltd., No. 1 Hongqiao Road, Cixi 315300, Ningbo, China

More Information

Corresponding author: yurongjie@tongji.edu.cn

Received: 31 May 2023
Accepted: 04 September 2023
Published online: 28 September 2023
Digital Transportation and Safety 2023, 2(3): 211−219 | Cite this article

Abstract

Connected vehicle (CV) is regarded as a typical feature of the future road transportation system. One core benefit of promoting CV is to improve traffic safety, and to achieve that, accurate driving risk assessment under Vehicle-to-Vehicle (V2V) communications is critical. There are two main differences concluded by comparing driving risk assessment under the CV environment with traditional ones: (1) the CV environment provides high-resolution and multi-dimensional data, e.g., vehicle trajectory data, (2) Rare existing studies can comprehensively address the heterogeneity of the vehicle operating environment, e.g., the multiple interacting objects and the time-series variability. Hence, this study proposes a driving risk assessment framework under the CV environment. Specifically, first, a set of time-series top views was proposed to describe the CV environment data, expressing the detailed information on the vehicles surrounding the subject vehicle. Then, a hybrid CNN-LSTM model was established with the CNN component extracting the spatial interaction with multiple interacting vehicles and the LSTM component solving the time-series variability of the driving environment. It is proved that this model can reach an AUC of 0.997, outperforming the existing machine learning algorithms. This study contributes to the improvement of driving risk assessment under the CV environment.
- Connected vehicle,
- Connected vehicle environment,
- Driving risk assessment,
- CNN-LSTM,
- Traffic safety
Rights and permissions
Copyright: © 2023 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Bennett R, Marsters R, Szymkowski T, Balke KN. 2018. Connected vehicle pilot deployment independent evaluation: Mobility, environmental, and public agency efficiency (MEP) refined evaluation plan-wyoming.
[2]	Lampropoulos G, Siakas K, Anastasiadis T. 2018. Internet of Things (IoT) in industry: Contemporary application domains, innovative technologies and intelligent manufacturing. International Journal of Advances in Scientific Research and Engineering (IJASRE) 4(10):109−18 doi: 10.31695/ijasre.2018.32910 CrossRef Google Scholar
[3]	Bezzina D, Sayer J. 2015. Safety pilot model deployment: Test conductor team report. Report. No. DOT HS 812 171. Washington, DC: National Highway Traffic Safety Administration.
[4]	Wang C, Xie Y, Huang H, Liu P. 2021. A review of surrogate safety measures and their applications in connected and automated vehicles safety modeling. Accident Analysis & Prevention 157:106157 doi: /10.1016/j.aap.2021.106157 CrossRef Google Scholar
[5]	Wang J, Wu J, Zheng X, Ni D, Li K. 2016. Driving safety field theory modeling and its application in pre-collision warning system. Transportation Research Part C: Emerging Technologies 72:306−24 doi: 10.1016/j.trc.2016.10.003 CrossRef Google Scholar
[6]	Mullakkal-Babu FA, Wang M, He X, van Arem B, Happee R. 2020. Probabilistic field approach for motorway driving risk assessment. Transportation Research Part C: Emerging Technologies 118:102716 doi: 10.1016/j.trc.2020.102716 CrossRef Google Scholar
[7]	Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller PA. 2019. Deep learning for time series classification: a review. Data Mining and Knowledge Discovery 33:917−63 doi: 10.1007/s10618-019-00619-1 CrossRef Google Scholar
[8]	Wang J, Zheng Y, Li X, Yu C, Kodaka K, et al. 2015. Driving risk assessment using near-crash database through data mining of tree-based model. Accident Analysis & Prevention 84:54−64 doi: 10.1016/j.aap.2015.07.007 CrossRef Google Scholar
[9]	Hu H, Wang Q, Cheng M, Gao Z. 2021. Cost-sensitive semi-supervised deep learning to assess driving risk by application of naturalistic vehicle trajectories. Expert Systems with Applications 178:115041 doi: 10.1016/j.eswa.2021.115041 CrossRef Google Scholar
[10]	Zhu X, Yuan Y, Hu X, Chiu YC, Ma YL. 2017. A Bayesian Network model for contextual versus non-contextual driving behavior assessment. Transportation Research Part C: Emerging Technologies 81:172−87 doi: 10.1016/j.trc.2017.05.015 CrossRef Google Scholar
[11]	Petraki V, Ziakopoulos A, Yannis G. 2020. Combined impact of road and traffic characteristic on driver behavior using smartphone sensor data. Accident Analysis & Prevention 144:105657 doi: 10.1016/j.aap.2020.105657 CrossRef Google Scholar
[12]	Jiang K, Yang D, Xie S, Xiao Z, Victorino AC, et al. 2019. Real-time estimation and prediction of tire forces using digital map for driving risk assessment. Transportation Research Part C: Emerging Technologies 107:463−89 doi: 10.1016/j.trc.2019.08.016 CrossRef Google Scholar
[13]	Mao H, Guo F, Deng X, Doerzaph ZR. 2021. Decision-adjusted driver risk predictive models using kinematics information. Accident Analysis & Prevention 156:106088 doi: 10.1016/j.aap.2021.106088 CrossRef Google Scholar
[14]	Ali Y, Zheng Z, Haque MM. 2021. Modelling lane-changing execution behaviour in a connected environment: A grouped random parameters with heterogeneity-in-means approach. Communications in Transportation Research 1:100009 doi: 10.1016/j.commtr.2021.100009 CrossRef Google Scholar
[15]	Lee E-Y, Cho H-J, Ryu K-Y. 2016. A probabilistic approach for collision avoidance of uncertain moving objects within black zones. Ad Hoc Networks 52:50−62 doi: 10.1016/j.adhoc.2016.08.009 CrossRef Google Scholar
[16]	Lim KL, Whitehead J, Jia D, Zheng Z. 2021. State of data platforms for connected vehicles and infrastructures. Communications in Transportation Research 1:100013 doi: 10.1016/j.commtr.2021.100013 CrossRef Google Scholar
[17]	Abdel-Aty M, Uddin N, Pande A, Abdalla MF, Hsia L. 2004. Predicting freeway crashes from loop detector data by matched case-control logistic regression. Transportation Research Record: Journal of the Transportation Research Board 1897:88−95 doi: 10.3141/1897-12 CrossRef Google Scholar
[18]	Yu R, Quddus M, Wang X, Yang K. 2018. Impact of data aggregation approaches on the relationships between operating speed and traffic safety. Accident Analysis & Prevention 120:304−10 doi: 10.1016/j.aap.2018.06.007 CrossRef Google Scholar
[19]	Yu R, Zheng Y, Abdel-Aty M, Gao Z. 2019. Exploring crash mechanisms with microscopic traffic flow variables: A hybrid approach with latent class logit and path analysis models. Accident Analysis & Prevention 125:70−78 doi: 10.1016/j.aap.2019.01.022 CrossRef Google Scholar
[20]	Sharma A, Ali Y, Saifuzzaman M, Zheng Z, Haque MM. 2017. Human factors in modelling mixed traffic of traditional, connected, and automated vehicles. AHFE 2017: Advances in Human Factors in Simulation and Modeling, ed. Cassenti D. Switzerland: Springer, Cham. pp. 262−73. https://doi.org/10.1007/978-3-319-60591-3_24
[21]	Sharma A, Zheng Z, Kim J, Bhaskar A, Haque MM. 2021. Assessing traffic disturbance, efficiency, and safety of the mixed traffic flow of connected vehicles and traditional vehicles by considering human factors. Transportation Research Part C: Emerging Technologies 124:102934 doi: 10.1016/j.trc.2020.102934 CrossRef Google Scholar
[22]	Liu H, Wei H, Zuo T, Li Z, Yang YJ. 2017. Fine-tuning ADAS algorithm parameters for optimizing traffic safety and mobility in connected vehicle environment. Transportation Research Part C: Emerging Technologies 76:132−49 doi: 10.1016/j.trc.2017.01.003 CrossRef Google Scholar
[23]	Jo Y, Jang J, Park S, Oh C. 2021. Connected vehicle-based road safety information system (CROSS): Framework and evaluation. Accident Analysis & Prevention 151:105972 doi: 10.1016/j.aap.2021.105972 CrossRef Google Scholar
[24]	Xiao G, Lee J, Jiang Q, Huang H, Abdel-Aty M, Wang L. 2021. Safety improvements by intelligent connected vehicle technologies: A meta-analysis considering market penetration rates. Accident Analysis & Prevention 159:106234 doi: 10.1016/j.aap.2021.106234 CrossRef Google Scholar
[25]	Jia D, Ngoduy D. 2016. Platoon based cooperative driving model with consideration of realistic inter-vehicle communication. Transportation Research part C: Emerging Technologies 68:245−64 doi: 10.1016/j.trc.2016.04.008 CrossRef Google Scholar
[26]	Xin Q, Fu R, Ukkusuri SV, Yu S, Jiang R. 2021. Modeling and impact analysis of connected vehicle merging accounting for mainline random length tight-platoon. Physica A: Statistical Mechanics and its Applications 563:125452 doi: 10.1016/j.physa.2020.125452 CrossRef Google Scholar
[27]	Rahman MS, Abdel-Aty M. 2018. Longitudinal safety evaluation of connected vehicles' platooning on expressways. Accident Analysis & Prevention 117:381−91 doi: 10.1016/j.aap.2017.12.012 CrossRef Google Scholar
[28]	Abdel-Aty M, Wu Y, Saad M, Rahman MS. 2020. Safety and operational impact of connected vehicles' lane configuration on freeway facilities with managed lanes. Accident Analysis & Prevention 144:105616 doi: 10.1016/j.aap.2020.105616 CrossRef Google Scholar
[29]	Essa M, Sayed T. 2020. Self-learning adaptive traffic signal control for real-time safety optimization. Accident Analysis & Prevention 146:105713 doi: 10.1016/j.aap.2020.105713 CrossRef Google Scholar
[30]	Hu J, Huang MC, Yu X. 2020. Efficient mapping of crash risk at intersections with connected vehicle data and deep learning models. Accident Analysis & Prevention 144:105665 doi: 10.1016/j.aap.2020.105665 CrossRef Google Scholar
[31]	Li P, Abdel-Aty M, Cai Q, Yuan C. 2020. The application of novel connected vehicles emulated data on real-time crash potential prediction for arterials. Accident Analysis & Prevention 144:105658 doi: 10.1016/j.aap.2020.105658 CrossRef Google Scholar
[32]	Ma Y, Zhu J. 2021. Left-turn conflict identification at signal intersections based on vehicle trajectory reconstruction under real-time communication conditions. Accident Analysis & Prevention 150:105933 doi: 10.1016/j.aap.2020.105933 CrossRef Google Scholar
[33]	Ali EM, Ahmed MM, Yang G. 2021. Normal and risky driving patterns identification in clear and rainy weather on freeway segments using vehicle kinematics trajectories and time series cluster analysis. IATSS Research 45:137−52 doi: 10.1016/j.iatssr.2020.07.002 CrossRef Google Scholar
[34]	Narmadha S, Vijayakumar V. 2023. Spatio-Temporal vehicle traffic flow prediction using multivariate CNN and LSTM model. Materials Today: Proceedings 81:826−33 doi: 10.1016/j.matpr.2021.04.24 CrossRef Google Scholar
[35]	Wang K, Ma C, Qiao Y, Lu X, Hao W, et al. 2021. A hybrid deep learning model with 1DCNN-LSTM-Attention networks for short-term traffic flow prediction. Physica A: Statistical Mechanics and its Applications 583:126293 doi: 10.1016/j.physa.2021.126293 CrossRef Google Scholar
[36]	Boukerche A, Wang J. 2020. A performance modeling and analysis of a novel vehicular traffic flow prediction system using a hybrid machine learning-based model. Ad Hoc Networks 106:102224 doi: 10.1016/j.adhoc.2020.102224 CrossRef Google Scholar
[37]	Guo Y, Zhang H, Wang C, Sun Q, Li W. 2021. Driver lane change intention recognition in the connected environment. Physica A: Statistical Mechanics and its Applications 575:126057 doi: 10.1016/j.physa.2021.126057 CrossRef Google Scholar
[38]	Krajewski R, Bock J, Kloeker L, Eckstein L. 2018. The highD dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems. Proc. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4-7 November 2018. USA: IEEE. pp. 2118-25. https://doi.org/10.1109/ITSC.2018.8569552
[39]	Meng Q, Qu X. 2012. Estimation of rear-end vehicle crash frequencies in urban road tunnels. Accident Analysis & Prevention 48:254−63 doi: 10.1016/j.aap.2012.01.025 CrossRef Google Scholar
[40]	Yu R, Han L, Zhang H. 2021. Trajectory data based freeway high-risk events prediction and its influencing factors analyses. Accident Analysis & Prevention 154:106085 doi: 10.1016/j.aap.2021.106085 CrossRef Google Scholar
[41]	Bergasa LM, Almería D, Almazán J, Yebes JJ, Arroyo R. 2014. Drivesafe: An app for alerting inattentive drivers and scoring driving behaviors. 2014 IEEE Intelligent Vehicles Symposium Proceedings, Dearborn, MI, USA, 8-11 June 2014. USA: IEEE. pp. 240-45. https://doi.org/10.1109/IVS.2014.6856461
[42]	Kingma DP, Ba J. 2014. Adam: A method for stochastic optimization. arXiv Preprint doi: 10.48550/arXiv.1412.6980 CrossRef Google Scholar
[43]	Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation 9:1735−80 doi: 10.1162/neco.1997.9.8.1735 CrossRef Google Scholar
[44]	Yuan C, Li Y, Huang H, Wang S, Sun Z, et al. 2022. Application of explainable machine learning for real-time safety analysis toward a connected vehicle environment. Accident Analysis & Prevention 171:106681 doi: 10.1016/j.aap.2022.106681 CrossRef Google Scholar

About this article

Cite this article

Zheng Y, Han L, Yu J, Yu R. 2023. Driving risk assessment under the connected vehicle environment: a CNN-LSTM modeling approach. Digital Transportation and Safety 2(3):211−219 doi: 10.48130/DTS-2023-0017

Zheng Y, Han L, Yu J, Yu R. 2023. Driving risk assessment under the connected vehicle environment: a CNN-LSTM modeling approach. Digital Transportation and Safety 2(3):211−219 doi: 10.48130/DTS-2023-0017

Figures(7) / Tables(6)

Download PDF

Article Metrics

Article views(8687) PDF downloads(2211)

Other Articles By Authors

on this site
on Google Scholar

HTML

Introduction

Connected vehicle (CV) is regarded as one of the most significant features of the future road transportation system. Various countries and regions, including the United States, European Union, Japan, Australia, Korea, etc., are supporting the development of CV technologies through policies, rules and regulations, and key projects. And Original Equipment Manufacturers (OEMs), including Ford, General Motors, etc., are investing heavily in supporting CV technology research to promote CV commercialization. The reason that CV has gained great attention is that it helps to address critical traffic issues, including safety, mobility, and environmental sustainability. Among these, improving traffic safety is the highest priority. For instance, within the pilot deployment program in the US, 80% Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) applications are safety-related^[1]. The EU claims that investments in connected vehicles are focused on V2V and V2I solutions and infrastructure, with a focus on reducing road risk and crashes^[2]. Under the CV environment, a combination of advanced devices, enabling V2V and V2I communications, collects the data necessary to perform evaluations of safety applications^[3]. These applications include Forward Collision Warning (FCW), Emergency Electronic Brake Lights (EEBL), Work Zone Warnings (WZW), etc. To conduct driving risk assessment applied to CV environment, the corresponding algorithms require better accuracy and robustness to cope with complex and variable driving scenarios, especially the spatio-temporal interactions with neighboring vehicles.

The current studies mainly provide driving risk assessment from three approaches: (1) Surrogate Safety Measures (SSMs)^[4], (2) models based on physical field theory^[5,6], and (3) machine learning methods. To be specific, firstly, the SSMs including Time-to-Collision (TTC), Deceleration Rate to Avoid the Crash (DRAC), DeltaV, etc., are utilized to measure traffic safety and then compared with pre-determined thresholds to identify traffic conflicts. However, the SSMs face the problems that only a single interactive object is considered, some scenarios are not applicable (e.g., TTC in a high-speed car-following scenario), and theoretically, the validation of SSMs requires a large amount of crash data. Secondly, as for the models based on physical field theory, elements of the entire traffic system, including the driver, vehicle, and environment, are considered in a general model. For example, based on artificial potential field theory, driver-vehicle-road interactions were utilized to calculate virtual mass, as well as the field strength and field force, and hence a driving safety field model was proposed^[5]. However, these models only utilized cross-sectional data to describe the situation at each moment but ignored that the traffic data belong to time series data, raising the time series classification problem^[7]. Thirdly, the machine learning methods, including Classification and Regression Tree (CART)^[8], deep learning^[9], Bayesian Network^[10], etc., are adopted to evaluate and predict driving risks in a more reliable and robust manner. And some of these methods can overcome the time series classification problem, e.g., Recurrent Neural Network (RNN). Therefore, this kind of machine learning method can comprehensively solve the problems in driving risk assessment research.

Besides, existing driving risk assessment studies mainly employed three kinds of data: (1) driver behavior information^[11], (2) kinematic data^[12,13], e.g., velocity, acceleration, tire forces, (3) contextual features of the neighboring environment^[10], e.g., road conditions, dynamic traffic flow information. These data include influencing elements from all aspects of the entire transportation system. However, these data cannot provide a timely and comprehensive picture of how the subject vehicle interacts with other traffic participants (e.g., neighboring vehicles), which contributes significantly to driving safety. Fortunately, the CV environment data provide a promising way to address this issue. Under the CV environment, detailed information of each vehicle's driving environment, like the kinematics and spatial information of the neighboring vehicles, is available via V2V communication technologies^[14,15]. However, the detailed information, which was always not accessible and hence left out in the previous studies, can be high-resolution and multi-dimensional^[16]. To describe the detailed information in a driving scenario, constructing structured data (e.g., text independent variables), which is commonly used in traffic safety assessment analysis^[17−19], can be clumsy and unattainable. More specifically, some important characteristics (e.g., time series feature) can be obscured due to the data aggregating process from the spatial and temporal dimensions. Meanwhile, it is difficult to describe the neighboring environmental information based on structured data using the exhaustion method. Therefore, it is necessary to explore an effective and comprehensive description of the CV environment data.

In this study, a driving risk assessment framework under the CV environment is proposed, and two advances are highlighted as follows: (1) A novel form of describing the detailed information of the vehicles neighboring the subject vehicle, i.e., time series top view set, was proposed to describe the CV environment data; (2) Developed a hybrid Convolutional Neural Network and Long Short-term Memory (CNN-LSTM) model to analyze both the spatial and temporal features, respectively, i.e., the CNN component considers various elements of the driving environment and the LSTM component solves the time series classification problem. The performance outperformed the existing machine learning algorithms with an AUC of 0.997.

The rest of this paper is structured as follows. In the background section, the driving risk assessment data and approaches are reviewed briefly. In the data preparation section, the identification process of high-risk and non-high-risk events is described in detail. In the methodology section, the CNN-LSTM model structure is introduced, as well as the evaluation metrics. In the modeling results section, the experiment design and the modeling results are clarified. Finally, the summary and discussions section presents the summary of this study and the discussions on the application scenarios of the proposed model.

Background

Traffic safety studies related to CV

In recent years, there has been an increasing amount of traffic safety studies related to CV. According to the objectives of these studies, four main categories can be summarized as follows.

(1) First, to evaluate the human aspects that affect the safety of CVs, such as driver compliance^[20,21]. Sharma et al.^[21] thoroughly investigated the the effect of driver compliance on the mixed traffic environment of CVs and traditional vehicles including both high-compliance and low-compliance drivers.

(2) To calculate the safety implications of CVs in a mixed traffic environment while taking into account various CV Market Penetration Rates (MPRs)^[22−24]. By using a meta-analysis methodology, Xiao et al.^[24] estimated the safety impacts of CVs using MPR and discovered that safety is increased by 4% with 10% MPR, and by 43% with 90% MPR.

(3) To develop traffic flow models for the CV environment, such as platoon-based cooperative driving models^[25], merging advisory models^[26], alarm algorithms^[22], etc. Jia & Ngoduy^[25] introduced a platoon-based car-following model for CVs and used numerical simulations to validate the proposed model.

(4) In addition, to implement traffic safety strategies and management improvements for the CV environment, such as managed lanes^[27,28], improved intersections^[29,30], etc. Rahman & Abdel-Aty^[27] suggested an algorithm for controlling CVs to form platoons in managed lanes, and the longitudinal safety of these platoons was evaluated. Regarding studies on estimating CV-related risk, there are several that concentrate on road segment crash potential prediction^[31], crash-prone intersection identification^[30,32], and risky driving pattern detection^[33].

There are, however, few studies on the real-time driving risk assessment for the CV environment, particularly for the CV environmental characterization data.

Time series classification
As for the time series classification problem, there are some basic deep learning approaches^[7], e.g., Multi-Layer Perceptron (MLP), CNN, RNN, their variants, e.g., LSTM, Echo State Networks, which is a type of RNN. The practices of adopting these approaches to solve time series classification problems in the traffic field are summarized below.

(1) First, traffic flow prediction is one of the most crucial applications^[34,35]. In order to extract the spatial and temporal characteristics, Wang et al.^[35] developed a traffic flow prediction model based on the 1DCNN-LSTM network. This model was demonstrated to display a faster convergence speed and higher prediction accuracy when compared to typical neural network models. With the help of Graph Convolutional Network (GCN) and the Gated Recurrent Unit (GRU)'s sequence-to-sequence structure, Boukerche & Wang^[36] created a hybrid deep learning model that increased the predictability and effectiveness of traffic flow.

(2) Second, the time series classification methods are also widely employed in the field of predicting driver behaviors, such as lane change intention recognition. To recognize the desire to change lanes, Guo et al.^[37] developed a bi-directional long and short-term memory network based on the attention mechanism (AT-BiLSTM). The suggested approach surpassed the current machine learning techniques with an accuracy of 93.33% at 3 s prior to lane change.

(3) Third, increasing research are treating the classification of time series as a challenge in driving risk assessment. To generate driving risk scores, Hu et al.^[9] integrated a convolutional neural network and a long short-term memory encoder/decoder network into a semi-supervised framework. These scores represented the best comprehensive performance among the available machine learning techniques.

However, to the best of our knowledge, there is no driving risk assessment study that considers time series classification problem for the CV environment.

Summary and discussions

This study is intended to improve the accuracy of the driving risk assessment model under the CV environment. Firstly, a novel data form, i.e., time series top view set, was proposed to describe the CV environment data. Then, a hybrid CNN-LSTM model was established to analyze both the spatial and temporal features. More specifically, the CNN component was used to comprehensively consider various elements of the driving environment, and the LSTM component was used to solve the time series classification problem, which the driving risk assessment can be treated as. Besides, to identify which spatial range and temporal range contribute more to the modeling result, six experiment designs were conducted considering both spatial and temporal variables. Finally, it is proved that the proposed model performed best when the spatial range includes only the front part of the subject vehicle and the temporal range is from 5 s to 2 s before the zero time, with a sensitivity of 0.996, a False Alarm Rate of 0.065, and an AUC of 0.997.

The advantages of the proposed model come to the fore when compared with other studies, where the same dataset is utilized, as shown in Table 6. In terms of the form of data organization, independent variable extraction in a single vehicle dimension, e.g., the time series top view set in this study, the temporal and spatial traffic flow characteristic variables^[40], etc., is superior to aggregation of data in the cross-sectional dimension, e.g., the cross-sectional traffic data^[44]. Besides, considering time-series variability can improve the modeling performance. In this study, the time-series variability is considered in the LSTM component, and in the study by Yu et al.^[40], it is considered in the variable construction process.

Table 6. Comparation results of modeling performance based on testing data.

Model	Variable dimension	Time-series variability consideration	Sensitivity	False alarm rate	AUC
Random Forest^[44]	Cross-section	No	0.517	0.088	0.827
RPLR model^[40]	Single vehicle	Yes	0.980	0.060	0.960
CNN-LSTM model (this study)	Single vehicle	Yes	0.996	0.065	0.997

This study is a beneficial exploration, particularly in terms of the form of data description for the CV environment and model construction. In terms of driving safety, this proposed model can be used for real-time driving risk prediction under the CV environment, where information about the neighboring environment can be obtained in real-time via an onboard unit. From a more macro perspective, this proposed model can be applied to road segment safety assessment by comprehensively considering the driving risk of vehicles within the road segment, as the operational status of all vehicles is accessible under the CV environment.

However, there are still some limitations in this study, e.g., due to data limitations, the spatial range defined in this study was fixed in shape and size with the subject vehicle in the center. However, the spatial range can be irregularly shaped and the size can vary dynamically with the velocity of the subject vehicle. And the temporal range can also take on more continuous values. Besides, highD trajectory dataset was utilized to represent the CV environment data. In future research, the model performance on the real CV dataset should be verified. Moreover, in future studies, multi-dimensional safety surrogate indicators could be used to extract the multi-vehicle collision and other types of high-risk scenarios. Last but not least, it is essential to consider the influence of CV penetration rate and its impacts of the model performance.

Author contributions

The authors confirm contribution to the paper as follows: Study conception and design: Y. Zheng, R. Yu; Data collection: L. Han, Y. Zheng; Analysis and interpretation of results: J. Yu, R. Yu, Y. Zheng; Draft manuscript preparation: Y. Zheng, R. Yu. All authors reviewed the results and approved the final version of the manuscript.

Modeling process	Parameters with values
Input of CNN	75 top views with both the front and rear of the subject vehicle: 360 × 30 (or 75 top views with only the front of the subject vehicle: 360 × 60)
Convolution layer	No. of layers: 4 No. of kernels: 32, 64,128, and 256 Kernel size: (5 × 5), (3 × 3), (3 × 3), and (3 × 3) Stride: (2,2), (2,2), (2,2), and (2,2) Padding: (0,0), (0,0), (0,0), and (0,0) Activating function: ReLU
Pooling layer	No. of layers: 1 Kernel size: (2 × 2) Stride: (2,2) Padding: (0,0)
Fully connected layer	No. of layers: 2 Hidden neurons: 512 and 256 Activating function: ReLU
Output of CNN model/ Input of LSTM model	No. of features: 64 (for each top view) 75 top views for each event
LSTM	No. of layers: 3 Hidden neurons: 512, 512, 512
Fully connected layer	No. of layers: 1 Hidden neurons: 256 Activating function: ReLU
Output of LSTM model	Binary classification result: high-risk or non-high-risk
Training process	Backpropagation Learning rate: StepLR (lr = 1e-3, γ = 0.3) Loss function: Cross-entropy Mini-batch size: 128 Epochs: 50

Actual condition	Predicted condition
Actual condition	Positive	Negative
Positive	True Positive (TP)	False Negative (FN)
Negative	False Positive (FP)	True Negative (TN)

	Spatial range	Temporal range
1	100 m in both the front and rear of the subject vehicle	5 s to 2 s before the zero time
2		5 s to 3 s before the zero time
3		4 s to 2 s before the zero time
4	Only 100 m in the front of the subject vehicle	5 s to 2 s before the zero time
5		5 s to 3 s before the zero time
6		4 s to 2 s before the zero time

	Sensitivity	False Alarm Rate	AUC
1	0.988	0.177	0.992
2	0.977	0.119	0.988
3	0.988	0.274	0.989
4	0.996	0.065	0.997
5	0.992	0.032	0.996
6	0.988	0.387	0.988

{{lists.name}}

Driving risk assessment under the connected vehicle environment: a CNN-LSTM modeling approach

Abstract