In recent years, the rapid advancement of autonomous driving has significantly increased the demand for sophisticated digital twins[1]. By creating digital replicas of the physical world, digital twin technology enables real-time monitoring, precise analysis, and efficient optimization of transportation systems[2,3]. Additionally, it provides planners with high-precision detection and object modeling capabilities, which are instrumental in enhancing the automation and intelligence of transportation networks[4].
The primary challenge in developing high-fidelity transportation digital twin systems lies in achieving robust, precise, and detailed environmental sensing, which demands advanced object detection capabilities. While current research and industry projects have predominantly focused on onboard sensor solutions, the potential of roadside infrastructure sensing remains underexplored[5,6]. Onboard solutions are favored for their ease of deployment and maintenance[7,8], making them well suited to the single-vehicle intelligence approaches currently popular in industry. Compared with onboard systems, roadside infrastructure offers several distinct, complementary advantages, including reduced edge computing costs, continuous and reliable area-wide sensing, and an expanded detection range with diminished susceptibility to occlusion[9]. These strengths make roadside infrastructure promising for advanced applications such as vehicle-to-infrastructure (V2I) collaboration and higher-level autonomous driving. Both academia and industry increasingly recognize roadside infrastructure sensing as a valuable complementary solution. Consequently, there is a pressing need for interdisciplinary research that integrates roadside multi-view sensing and digital twin technology to extend the capabilities of existing transportation systems.
In the context of roadside deployment, the choice of sensor type is crucial. Different sensors, such as cameras and LiDAR, offer distinct advantages based on their respective imaging principles. Cameras capture rich color and semantic information through pixel-based textures, while LiDAR provides precise positional and depth information through spatial 3D point clouds. The performance of these common sensors is compared in Table 1. In complex transportation environments, relying on a single type of sensor may lead to problems such as low accuracy, susceptibility to interference, and poor adaptability[10,11]. Multi-sensor fusion avoids these drawbacks and effectively improves the quality of the acquired information, yielding more comprehensive and accurate sensing results. For roadside sensors, factors such as data offsets caused by different installation heights, differences in sensing perspectives, and the experimental difficulty arising from scarce datasets pose additional challenges for the fusion process.
Table 1. Common sensors in road infrastructure sensing systems.
Type | Advantage | Disadvantage | Principle | Main application | Deployment location
Geomagnetic coil[12] | Fast response time, cost-effective | Difficult to maintain, accuracy sensitive | Electromagnetic induction | Detecting the presence and passage of vehicles | 0.2−0.4 m deep below the road surface
Infrared sensors[13] | High precision | Low resolution, short range | Infrared reflectance | Night vision, infrared imaging | Signal arm or building, above 3 m
Fisheye camera[14] | Large detection range, high picture quality | Highly expensive, distortion problems | Image recognition technology | Safety monitoring | Signal arm or street lamp post, 3−5 m height
LiDAR[15] | 3D information available, high accuracy | Slow processing, high cost | Laser beam reflection | Depth information perception, 3D reconstruction | Signal arm or street lamp post, 1.5−2.5 m height
Camera[5] | Wide range, rich textures, low price | Light-sensitive, blurred at high speed | Image recognition technology | Path and object recognition | Signal arm or street lamp post, 2.5−5 m height

In response to the above issues, scholars have conducted a great deal of research in related fields. Liu et al.[16] developed a method that integrates vehicle camera image data with digital twins to improve the performance of vision systems in intelligent vehicles, achieving a detection accuracy of 79.2% at an IoU threshold of 0.7. Zheng et al.[17] constructed digital twin models at 12 urban traffic locations by capturing image data and extracting trajectory information via drones. He et al.[18] utilized deep learning algorithms and near-real-time projection methods to develop a digital twin system for 3D reconstruction of construction sites based on video camera image data. Wojke et al.[19] introduced a real-time traffic monitoring system that integrates roadside LiDAR and visual data for 3D object detection, extending the DeepSORT model into the 3DSORT tracking model. Bai et al.[20] pioneered a multimodal 3D object detection framework that combines roadside LiDAR and camera data, integrating various fusion stages (early and late fusion) and methods (traditional and learning-based fusion) within a single system. Young et al.[21] proposed an infrastructure-based perceptual fusion scheme, in which multiple sensors (LiDAR and cameras) acquire and fuse perceptual information for monitoring the traffic status of moving objects. In a recent study, Chen et al.[22] employed the PointPillars algorithm with a late-fusion-based cooperative sensing strategy to generate highly complete and smooth vehicle trajectories across the entire roadway.
The literature review traces the progression of digital twin construction from onboard sensing and aerial photography to roadside sensing, emphasizing the shift from unimodal to multimodal fusion detection methods. Despite these advancements, a gap remains in integrating roadside infrastructure sensing, multi-sensor data fusion, and digital twin technologies into a unified framework. Additionally, much of the existing research in roadside sensing relies on open-source datasets (e.g., nuScenes, Waymo) or real-world field experiments to train object detection models, which demands substantial resources and incurs high costs. This study introduces a novel approach that leverages the CARLA-SUMO co-simulator to generate heterogeneous sensor data from roadside infrastructure and proposes an innovative fusion technique that integrates LiDAR point cloud data with camera RGB image data. The framework of this research comprises three key components: (1) Simulation Platform Construction and Data Collection: the CARLA-SUMO co-simulator is employed to build detailed simulation environments, in which LiDAR and cameras are deployed to generate comprehensive test datasets; this phase also includes essential tasks such as data preprocessing, co-calibration, and labeling. (2) Unimodal Detection: in this phase, the PointPillars model processes the LiDAR-acquired point cloud data, while the YOLOv5 model handles the RGB image data from the cameras; the detection results from both models undergo spatial transformation to account for the differing perspectives of the detection bounding boxes. (3) Near Real-Time Mapping and Fusion Modeling: this final phase involves correlating and matching detection frames within a unified viewpoint, followed by fusing the sensing results at the decision level through a mapping approach. The overarching objective of this framework is to enhance perception accuracy and develop a digital twin intersection capable of real-time monitoring of traffic dynamics at the vehicle level. The key advantages of this research are outlined as follows:
Versatility and modularity
The study employs a late-fusion approach, in which the object detection results from each sensor are fused at the decision level, allowing any pre-trained 2D and 3D object detection algorithms to be flexibly selected and combined through the proposed fusion method to achieve the digital twin effect.
Stability and superior performance
By deploying sensors within the roadside infrastructure, the system mitigates occlusions caused by vehicles and buildings, ensuring a continuous and stable perceptual field.
The collection of high-quality camera and LiDAR detection data for urban intersection scenes is critical to this study. Given the time and cost constraints of real-world data collection, this study utilizes the CARLA-SUMO co-simulator for traffic simulation and for collecting high-fidelity point cloud and RGB image data. Specifically, Simulation of Urban MObility (SUMO) is an open-source microscopic traffic simulator capable of generating multi-modal traffic flows and simulating driving behaviors. CARLA, a high-performance autonomous driving simulator built on the Unreal Engine (UE), is dedicated to accurately simulating vehicle dynamics and components related to perception, planning, decision-making, and control. By integrating CARLA with SUMO, highly realistic virtual traffic scenarios can be constructed, allowing the simulation of various sensor processes and the generation of data, including images and 3D point clouds. This integrated simulation framework not only facilitates the flexible creation of multi-modal traffic environments but also provides an ideal experimental platform for multi-sensor fusion detection. The CARLA-SUMO co-simulator was chosen for this research owing to its low cost, high performance, and easy scalability. CARLA enables detailed modeling of sensors, including cameras, LiDARs, and millimeter-wave radars, and offers high-fidelity 3D scene rendering, while SUMO generates background traffic flows that comply with real-world traffic regulations. Together, the two platforms can rapidly and synchronously construct intricate traffic scenarios, which aligns well with the high-precision and high-speed requirements of digital twin intersections. The simulation environment is based on the Town10 map within the CARLA simulator. The selected intersection is situated in the core area of the town. One LiDAR sensor and four cameras are deployed at the center of the yellow grid area depicted in Fig. 1.
The sensors were deployed at the center of the intersection in the downtown area of the CARLA built-in Town10 map, at coordinates (51.2, 51.2), mounted on a roadside monitoring pole at a height of 3.17 m above ground level. The four cameras are strategically oriented to cover each approach of the intersection, ensuring uninterrupted monitoring of the entire area. More specific sensor configuration parameters are given in Table 2.
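As an illustration only (not the authors' configuration script), the following sketch shows how such a roadside LiDAR and one of the four cameras could be spawned through the CARLA Python API with the parameters listed in Table 2; the host, port, output paths, and camera yaw are assumptions.

```python
import carla

# Illustrative roadside sensor deployment in CARLA (assumed host/port and output paths)
client = carla.Client("localhost", 2000)
world = client.get_world()
bp_lib = world.get_blueprint_library()

# Roadside LiDAR configured with the Table 2 parameters
lidar_bp = bp_lib.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("channels", "64")
lidar_bp.set_attribute("range", "100")
lidar_bp.set_attribute("rotation_frequency", "20")
lidar_bp.set_attribute("points_per_second", "500000")
lidar_bp.set_attribute("upper_fov", "5")
lidar_bp.set_attribute("lower_fov", "-35")
lidar_tf = carla.Transform(carla.Location(x=51.2, y=51.2, z=3.17))
lidar = world.spawn_actor(lidar_bp, lidar_tf)
lidar.listen(lambda data: data.save_to_disk("lidar/%06d.ply" % data.frame))

# One of the four roadside RGB cameras (the yaw would differ for each approach)
cam_bp = bp_lib.find("sensor.camera.rgb")
cam_bp.set_attribute("image_size_x", "640")
cam_bp.set_attribute("image_size_y", "480")
cam_bp.set_attribute("fov", "90")
cam_tf = carla.Transform(carla.Location(x=51.2, y=51.2, z=3.17), carla.Rotation(yaw=0.0))
camera = world.spawn_actor(cam_bp, cam_tf)
camera.listen(lambda image: image.save_to_disk("cam0/%06d.png" % image.frame))
```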
Table 2. Parameter configuration and description for the cameras and LiDAR.
Sensor | Parameter | Default | Description
LiDAR | Channels | 64 | Number of lasers
LiDAR | Height | 3.17 m | Height with respect to the road surface
LiDAR | Range | 100 m | Maximum distance to measure/ray-cast in meters
LiDAR | Rotation frequency | 20 Hz | LiDAR rotation frequency
LiDAR | Points per second | 500,000 | Number of points
LiDAR | Upper FOV | 5 | Angle in degrees of the highest laser beam
LiDAR | Lower FOV | −35 | Angle in degrees of the lowest laser beam
LiDAR | Noise stddev | 0.01 | Standard deviation of the point noise model
LiDAR | Dropoff rate | 20% | General proportion of points that are randomly dropped
LiDAR | Dropoff intensity limit | 0.8 | Threshold of intensity value for exempting points from dropoff
Camera | FOV | 90 | Angle in degrees
Camera | Focal length | 360 | Optical characteristic of the camera lens
Camera | Principal point coordinate | (320, 240) | Image center coordinates
Camera | Resolution | 640 × 480 | Measure of image sharpness

FOV indicates field of view; stddev indicates standard deviation.

Data processing
The raw data generated from the simulation is processed and converted into the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) format, which is widely used for evaluating computer vision algorithms in autonomous driving scenarios. The raw data comprises several file sets, including image data from four cameras, point cloud data, labeling data, and calibration parameters.
Before performing detection and fusion operations, the labeling files must undergo further processing. Specifically, using the camera's external parameters, rotation matrix, and other relevant parameters from the calibration files, the 3D bounding box coordinates of detected objects are converted into the image coordinate system. This conversion allows for the generation of 2D planar bounding box coordinates, facilitating the transformation of the collected data into the standard KITTI format. The conversion process is guided by Eqs (1)−(4). The KITTI dataset, established by the Karlsruhe Institute of Technology (KIT) in Germany and the Toyota Technological Institute at Chicago (TTI-C) in the United States in 2012, is a leading benchmark for computer vision algorithms in self-driving applications[11]. Due to its open-source nature and widespread adoption, many 2D and 3D detection frameworks include interfaces specifically designed for processing KITTI-formatted data, making it a convenient format for this study.
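For reference, the converted annotations follow the standard KITTI label convention, in which each object occupies one text line. The helper below is a hypothetical illustration of that line layout (the function name and example values are not from the paper).

```python
# Hypothetical helper that writes one object as a line of a KITTI-format label file
# (standard KITTI fields; the example values are placeholders).
def kitti_label_line(obj_type, bbox2d, dims, loc_cam, rotation_y,
                     truncated=0.0, occluded=0, alpha=0.0):
    left, top, right, bottom = bbox2d          # 2D box in pixel coordinates
    h, w, l = dims                             # object height, width, length (m)
    x, y, z = loc_cam                          # object center in camera coordinates (m)
    return (f"{obj_type} {truncated:.2f} {occluded} {alpha:.2f} "
            f"{left:.2f} {top:.2f} {right:.2f} {bottom:.2f} "
            f"{h:.2f} {w:.2f} {l:.2f} {x:.2f} {y:.2f} {z:.2f} {rotation_y:.2f}")

# e.g. kitti_label_line("Car", (120.0, 200.5, 180.0, 260.0), (1.5, 1.8, 4.2),
#                       (2.0, 1.6, 25.0), -1.57)
```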
To map a point cloud coordinate x from the LiDAR (point cloud) coordinate system to the image coordinate system of camera i, yielding the corresponding image coordinate y, the following transformation is applied:
$ y=P_{rect}^{\left(i\right)}R_{rect}^{\left(0\right)}T_{velo}^{cam}x \quad \left(i=0,1,2,3\right) $ (1)

where $ P_{rect}^{\left(i\right)} $ is the projection matrix of camera i, composed of the camera's intrinsic parameters, with i = 0, 1, 2, 3 denoting the four cameras; $ R_{rect}^{\left(0\right)} $ is the rectification (aberration correction) matrix used to correct image aberration; and $ T_{velo}^{cam} $ is the extrinsic matrix of the camera, composed of a rotation matrix and a translation matrix.

$ x=\left[x_{velo};y_{velo};z_{velo};1\right] $ (2)

$ y=\left[x_{cam};y_{cam};Z_{c}\right] $ (3)

$ y'=\left[\dfrac{x_{cam}}{Z_{c}};\dfrac{y_{cam}}{Z_{c}}\right] $ (4)

where x = [xvelo; yvelo; zvelo; 1] is the homogeneous-coordinate form of a point cloud point, y = [xcam; ycam; Zc] is its homogeneous coordinate after mapping into the image coordinate system, and y' is the resulting pixel coordinate. The final 2D bounding-box coordinates are obtained by taking the maximum and minimum values of y' over the eight vertices of the 3D bounding box.

The three matrices $ P_{rect}^{\left(i\right)} $, $ R_{rect}^{\left(0\right)} $, and $ T_{velo}^{cam} $ can be obtained directly from the simulation, which completes the data preprocessing.
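The following NumPy sketch illustrates Eqs (1)−(4) for one 3D bounding box, assuming P_rect is supplied as a 3 × 4 matrix and R_rect and T_velo_cam as 4 × 4 homogeneous matrices; the function name and array shapes are illustrative, not the authors' code.

```python
import numpy as np

def project_box_to_image(corners_velo, P_rect, R_rect, T_velo_cam):
    """corners_velo: (8, 3) vertices of a 3D box in the LiDAR frame.
    P_rect: (3, 4); R_rect, T_velo_cam: (4, 4) homogeneous matrices."""
    # Homogeneous LiDAR coordinates x = [x_velo; y_velo; z_velo; 1]   (Eq. 2)
    x = np.hstack([corners_velo, np.ones((8, 1))])                    # (8, 4)
    # y = P_rect^(i) R_rect^(0) T_velo^cam x                          (Eq. 1)
    y = (P_rect @ R_rect @ T_velo_cam @ x.T).T                        # (8, 3)
    # Normalize by the depth Z_c to obtain pixel coordinates y'       (Eq. 4)
    pix = y[:, :2] / y[:, 2:3]
    # The 2D box is the min/max of the eight projected vertices
    u_min, v_min = pix.min(axis=0)
    u_max, v_max = pix.max(axis=0)
    return u_min, v_min, u_max, v_max
```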
This research aims to enhance engineering feasibility by leveraging mature and widely accepted algorithms and techniques in the industry. A dedicated roadside sensor dataset is generated using the CARLA-SUMO co-simulation platform described previously. Following specific preprocessing of the raw data, detection tasks are executed using the widely adopted YOLOv5 for 2D image detection and PointPillars for 3D point cloud detection. The 2D and 3D detection results are then integrated within a unified viewpoint through a near-real-time mapping approach. This process involves operations such as matching, fusion, and information sharing, ultimately leading to the development of a preliminary digital twin system. The framework architecture of this paper is shown in Fig. 2.
Near real-time mapping approach to fusion modeling
The fusion process integrates the detection results from the 2D and 3D detectors, capitalizing on the higher accuracy of 2D detection. The fusion strategy is as follows: first, the bounding boxes from the 3D point cloud detection are transformed from the LiDAR coordinate system into the camera coordinate system. These transformed bounding boxes are then projected onto the image plane, with the center point of each 3D bounding box mapped to the pixel coordinate system to generate projection points. Next, the 2D detection results are utilized, where the corresponding bounding boxes are identified. Associative matching is conducted by examining the logical containment relationship between the projection points and the 2D bounding boxes. For objects that are successfully matched, the perception information of both parties is shared. For objects that are not successfully matched, decisions are made based on the confidence scores of the unimodal detectors, retaining the unimodal detection results with confidence higher than the set threshold and discarding the detection results below the threshold. This approach allows for the effective integration of multimodal sensor data, enhancing the accuracy and reliability of the perception system.
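A minimal sketch of this matching and decision logic is given below, assuming the 3D box centers have already been projected to pixel coordinates and that detections are held in simple dictionaries with a 'box' and a 'score' field; the data layout and the 0.5 threshold are assumptions, not the authors' exact implementation.

```python
# Illustrative decision-level fusion: associative matching by point-in-box containment,
# then confidence-based retention of unmatched unimodal detections.
def fuse_detections(dets3d, proj_pts, dets2d, conf_thresh=0.5):
    fused, used2d = [], set()
    for det3d, (u, v) in zip(dets3d, proj_pts):
        match = None
        for j, det2d in enumerate(dets2d):
            u_min, v_min, u_max, v_max = det2d["box"]
            # Match if the projected 3D center lies inside the 2D bounding box
            if j not in used2d and u_min <= u <= u_max and v_min <= v <= v_max:
                match = j
                break
        if match is not None:
            used2d.add(match)
            # Matched: share perception information from both modalities
            fused.append({"source": "both", "det3d": det3d, "det2d": dets2d[match]})
        elif det3d["score"] >= conf_thresh:
            # Unmatched 3D detection kept only if its confidence clears the threshold
            fused.append({"source": "lidar", "det3d": det3d})
    for j, det2d in enumerate(dets2d):
        if j not in used2d and det2d["score"] >= conf_thresh:
            # Unmatched 2D detection kept only if its confidence clears the threshold
            fused.append({"source": "camera", "det2d": det2d})
    return fused
```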
To finalize the fusion, a fixed region within the sensor's detection field of view is selected as the background view for the digital twin. The final detection results, post-fusion, are then overlaid on this background. The mathematical transformations involved in mapping the detection results between the LiDAR coordinate system, the camera coordinate system, and the image pixel coordinate system are outlined in Eqs (5), (6):
Camera coordinate system to pixel coordinate system
A point (u, v) in the pixel coordinate system can be represented by a point (xc, yc, zc) in the camera coordinate system as Eq. (5):
$ \left[\begin{array}{c}u\\ v\\ 1\end{array}\right]=\dfrac{1}{z_{c}}\left[\begin{array}{cccc}f/dx& 0& u_{0}& 0\\ 0& f/dy& v_{0}& 0\\ 0& 0& 1& 0\end{array}\right]\left[\begin{array}{c}x_{c}\\ y_{c}\\ z_{c}\\ 1\end{array}\right] $ (5) where, (u, v) denotes the point coordinates in the pixel coordinate system, f is the focal length of the camera (determined solely by the camera's attributes), dx and dy are the physical dimensions of a pixel, (u0, v0) is the principal point (the image center), and (xc, yc, zc) denotes the point coordinates in the camera coordinate system.
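As a small numeric illustration of Eq. (5), using the intrinsic values from Table 2 (f/dx = f/dy = 360, principal point (320, 240)); the camera-frame point itself is a made-up example.

```python
import numpy as np

# Build the 3x4 intrinsic projection matrix of Eq. (5)
K = np.array([[360.0,   0.0, 320.0, 0.0],
              [  0.0, 360.0, 240.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0]])

p_cam = np.array([2.0, -1.0, 25.0, 1.0])   # hypothetical camera-frame point (x_c, y_c, z_c) = (2, -1, 25) m
u, v, _ = (K @ p_cam) / p_cam[2]           # divide by z_c as in Eq. (5)
print(round(u, 1), round(v, 1))            # -> 348.8 225.6
```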
LiDAR coordinate system to camera coordinate system
The conversion from the LiDAR (global) coordinate system to the camera coordinate system requires the camera's extrinsic matrix K2, which is a 4 × 4 matrix. A point (xc, yc, zc) in the camera coordinate system can be obtained from a point (xw, yw, zw) in the global coordinate system as Eq. (6):

$ \left[\begin{array}{c}x_{c}\\ y_{c}\\ z_{c}\\ 1\end{array}\right]=K_{2}\cdot \left[\begin{array}{c}x_{w}\\ y_{w}\\ z_{w}\\ 1\end{array}\right]={\left[\begin{array}{cc}R& T\\ 0& 1\end{array}\right]}_{4\times 4}\left[\begin{array}{c}x_{w}\\ y_{w}\\ z_{w}\\ 1\end{array}\right] $ (6)

where R is a 3 × 3 rotation matrix and T is a 3 × 1 translation vector, both obtained through sensor calibration. The computational workflow outlined above describes a cooperative roadside sensing strategy utilizing late fusion. The following sections detail the processes used in this study to obtain the 2D and 3D sensing results.
YOLOv5-based RGB image detector
The performance of common 2D object detection models in terms of detection accuracy and speed is compared in Table 3. Among them, the YOLO series has demonstrated particularly outstanding performance. The YOLO architecture has undergone continuous development and iteration in recent years, with YOLOv11 representing the most recent version. While newer releases typically offer improved perception and inference capabilities, they often demand greater computational resources. Considering engineering deployment feasibility and practical application requirements, YOLOv5 is selected as the 2D detector for this study. YOLOv5 features a lightweight network architecture, adaptive anchor box computation, and robust community support. Since its initial release, it has been extensively validated through numerous practical applications and academic studies, demonstrating mature implementation and stable performance. Its design ensures real-time detection capability and effective processing of objects at various scales (small, medium, and large), which is essential for roadside scenarios[23]. Furthermore, YOLOv5 is straightforward to train and deploy across multiple programming languages and deep learning frameworks. This versatility enhances both portability and scalability, making it especially suitable for engineering applications in roadside detection systems.
Table 3. Comparison of the average detection performance of common 2D detection algorithms for cars.
The modeling framework of YOLOv5 is shown in Fig. 3. The first module is responsible for data preprocessing. It utilizes the Mosaic augmentation technique for data enhancement, which combines four random images into one to increase variability and reduce overfitting. Adaptive anchor box computation adjusts the anchor box sizes based on the dataset's statistics, improving detection accuracy for objects of varying scales. Image scaling ensures consistent object sizes within the input image. Following this, the YOLOv5 Backbone network processes the input through structures such as Focus, which reduces the spatial dimensions while increasing channel depth, and Cross Stage Partial (CSP) connections, which enhance feature extraction by promoting information flow and reducing computational load. The extracted features are then passed to the Neck network, which further refines the feature representation using advanced structures like Spatial Pyramid Pooling (SPP), Feature Pyramid Network (FPN), and Path Aggregation Network (PAN). SPP improves multi-scale feature representation, FPN integrates features from different levels, and PAN aggregates bottom-up and top-down pathways for better feature refinement. Finally, object detection is performed in the Head network, where bounding boxes and class probabilities for potential vehicles in the input image are predicted using a combination of convolutional layers and non-maximum suppression to eliminate redundant detections.
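As a usage illustration (assuming the public Ultralytics torch.hub interface, a generic pretrained checkpoint, and a hypothetical image path rather than the study's own trained weights), 2D detections with class labels and confidence scores can be obtained as follows.

```python
import torch

# Load a YOLOv5 model via torch.hub and run inference on one roadside RGB frame
model = torch.hub.load("ultralytics/yolov5", "yolov5s")   # or a custom-trained weight file
model.conf = 0.5                                          # confidence threshold

results = model("cam0/000123.png")                        # hypothetical image path
# results.xyxy[0]: one row per detection -> [x_min, y_min, x_max, y_max, conf, class]
for x_min, y_min, x_max, y_max, conf, cls in results.xyxy[0].tolist():
    print(f"class={int(cls)} conf={conf:.2f} "
          f"box=({x_min:.0f}, {y_min:.0f}, {x_max:.0f}, {y_max:.0f})")
```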
The complete workflow for vehicle detection using YOLOv5 in a single roadside image is outlined as follows:
Firstly, the raw RGB images are generated by the four cameras deployed in the center area of the intersection, each with a resolution of 640 × 480 pixels. The objects in the raw RGB images are described as follows:
$ L=\left\{{\left[u_{l},v_{l},u_{r},v_{r}\right]}^{T}\,\middle|\,u_{l}\in \left[0,1\right],v_{l}\in \left[0,1\right],u_{r}\in \left[0,1\right],v_{r}\in \left[0,1\right]\right\} $ (7) where, L represents the original image data, and [ul, vl, ur, vr]T are the coordinates of the lower-left and upper-right vertices of an object's bounding box in the image coordinate system, converted to relative positions between 0 and 1.
Next, objects within the target area are labeled using the object annotations automatically generated by the CARLA-SUMO co-simulation platform as described previously. Following this, the YOLOv5 model is trained, and the well-trained model is employed to infer and predict the bounding boxes of detected objects based on the input RGB images. The prediction of the bounding box for each object can be expressed by the following Eq. (8):
$ \left[\begin{array}{cccccc}{x}_{1}& {y}_{1}& {l}_{1}& {w}_{1}& {c}_{1}& {s}_{1}\\ {x}_{2}& {y}_{2}& {l}_{2}& {w}_{2}& {c}_{2}& {s}_{2}\\ ...& ...& ...& ...& ...& ...\\ {x}_{n}& {y}_{n}& {l}_{n}& {w}_{n}& {c}_{n}& {s}_{n}\end{array}\right]=\mathcal{F}\left(L\right) $ (8) where,
$ \mathcal{F}\left(L\right) $ represents the prediction of the input RGB image. [x, y, l, w, c, s] represents the 2D coordinates of the center point of a predicted object bounding box, and the length, width, class label, and confidence score of the object, respectively. n represents the number of estimated boxes. The j-th estimated bounding box is denoted by (xj, yj, lj, wj, cj, sj), j ∈ [1, n]. The perception results of vehicles travelling on each arm of the intersection are obtained independently by YOLOv5, which is used as an input to the cooperative perception strategy based on late fusion introduced above.

PointPillars-based point cloud detector
The performance comparison of common 3D object detection models in terms of detection accuracy and detection speed is shown in Table 4. The PointPillars algorithm is chosen for 3D traffic detection due to its exceptional balance between object detection accuracy and real-time performance, achieving 62 frames per second (FPS)[27]. Its robust performance has made it a widely adopted model for tasks involving point cloud data. A key strength of PointPillars lies in its effective data aggregation along the Z-axis (i.e., vertical height from the road surface), enabling precise detection of traffic objects with varying heights—a critical advantage when applied to roadside LiDARs, which are typically mounted at a specific height.
Table 4. Comparison of the average detection performance of common 3D detection algorithms for cars.
In this study, PointPillars is employed to detect and classify traffic objects, particularly vehicles, at intersections using point cloud data. The model's architecture is depicted in Fig. 4, comprising three main modules: (1) Pillar Feature Network (PFN): this module transforms 3D point cloud data into 2D sparse pseudo-images by dividing the point cloud into 'pillars'. The dimensions of the point cloud vectors, the number of non-empty pillars, and the number of points within each pillar are represented as D, P, N, respectively. The process begins by converting unordered point cloud data into a normalized 4-dimensional tensor. This is achieved by defining the spatial range of the point cloud and determining the size of each pillar. Each point is assigned to a corresponding pillar based on its spatial location, and a fixed number of points are randomly sampled from each pillar (zero-padding is applied if fewer points are present). Further operations, such as mean and center encoding, expand the dimensions. The features are then extracted through fully connected layers and max pooling. The final output is a 64-dimensional 2D pseudo-image, which can be processed using a convolutional framework similar to YOLO. (2) FPN: in this module, the sparse pseudo-images generated from the point cloud data are fed into a convolutional neural network (CNN) backbone. This network extracts both fine-grained and coarse-level features through convolution, enabling accurate detection of objects at varying scales. By integrating different levels of the network hierarchy, multi-scale features are captured. The input data is represented by the number of channels C, height H, and width W. (3) Single Shot Multi-Box Detector (SSD): the SSD serves as the detection head, producing the final output, which includes 3D bounding boxes, object categories, and confidence scores. The classification loss, localization loss, and orientation loss collectively form the loss function to train the PointPillars model. Specifically, the classification loss ensures correct object category identification, the localization loss refines the bounding box coordinates, and the orientation loss ensures accurate estimation of object orientation.
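To make the pillarization step concrete, the sketch below shows an illustrative (non-optimized) NumPy implementation of scattering points into pillars and fixing the number of points per pillar; the detection range and pillar size follow the values later listed in Table 5, while max_points_per_pillar, the function name, and the omission of the subsequent mean/center encoding are assumptions rather than the reference implementation.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
              pillar_size=0.16, max_points_per_pillar=32):
    """points: (M, 4) array of [x, y, z, reflectance]."""
    # Keep only points inside the detection range
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    # Integer pillar index of each point on the x-y grid
    ix = ((pts[:, 0] - x_range[0]) // pillar_size).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) // pillar_size).astype(np.int64)
    pillars = {}
    for i, j, p in zip(ix, iy, pts):
        pillars.setdefault((int(i), int(j)), []).append(p)
    # Fix the number of points N per non-empty pillar: random sampling or zero-padding
    tensors = []
    for plist in pillars.values():
        arr = np.asarray(plist)
        if len(arr) > max_points_per_pillar:
            arr = arr[np.random.choice(len(arr), max_points_per_pillar, replace=False)]
        else:
            arr = np.vstack([arr, np.zeros((max_points_per_pillar - len(arr), arr.shape[1]))])
        tensors.append(arr)
    # Stacked tensor of shape (P non-empty pillars, N points, D raw features)
    return np.stack(tensors), list(pillars.keys())
```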
The complete workflow for vehicle detection using PointPillars in a single roadside point cloud is outlined as follows: Firstly, the raw point clouds are generated by a 64-channel roadside LiDAR with the detection range of a 100 m × 100 m area centred around its location. The raw point clouds can be described by:
$ \mathcal{P}=\left\{{\left[x,y,z,r\right]}^{T}|{\left[x,y,z\right]}^{T}\in R^{3},r\in \left[0,1\right]\right\} $ (9) where, [x, y, z, r] represents the x-, y-, and z-coordinates of a 3D point and its reflectance value, which depends on the material and characteristics of the target surface, and R3 denotes the 3D space covered by the LiDAR detection range.
In this study, the roadside LiDAR was installed at a height of 3.17 m and was responsible for generating ground truth 3D annotations for objects within its detection range. The bounding boxes for the detected objects are estimated by the trained PointPillars model based on the input point cloud data. The formula for estimating these bounding boxes is as follows:
$ \left[\begin{array}{ccccccccc}{x}_{1}^{e}& {y}_{1}^{e}& {z}_{1}^{e}& {w}_{1}^{e}& {l}_{1}^{e}& {h}_{1}^{e}& {\theta }_{1}^{e}& {c}_{1}^{e}& {s}_{1}^{e}\\ {x}_{2}^{e}& {y}_{2}^{e}& {z}_{2}^{e}& {w}_{2}^{e}& {l}_{2}^{e}& {h}_{2}^{e}& {\theta }_{2}^{e}& {c}_{2}^{e}& {s}_{2}^{e}\\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ {x}_{n}^{e}& {y}_{n}^{e}& {z}_{n}^{e}& {w}_{n}^{e}& {l}_{n}^{e}& {h}_{n}^{e}& {\theta }_{n}^{e}& {c}_{n}^{e}& {s}_{n}^{e}\end{array}\right]=\mathcal{F}\left(\mathcal{P}\right) $ (10) where,
$ \mathcal{F}\left(\mathcal{P}\right) $ represents the prediction of the input point cloud data. [x, y, z, w, l, h, θ, c, s] represents the 3D coordinates of the center point of a predicted object bounding box, and the width, length, height, orientation angle, class label, and confidence score of the object. n represents the number of estimated boxes. The j-th estimated bounding box is denoted by $ \left(x_{j}^{e},y_{j}^{e},z_{j}^{e},w_{j}^{e},l_{j}^{e},h_{j}^{e},\theta_{j}^{e},c_{j}^{e},s_{j}^{e}\right) $, j ∈ [1, n]. PointPillars is employed to independently generate 3D perception results for transportation objects at the intersection. These results serve as additional input for the cooperative perception strategy based on the late-fusion process. Through this approach, the final vehicle positioning and classification results for the entire roadway are obtained.
Experimental performance evaluation was conducted using the custom roadside infrastructure sensing dataset generated by the CARLA-SUMO co-simulator. The testing environment was configured with a 12-core Xeon Platinum 8260C CPU, an RTX 3090 (24GB) GPU, and an Ubuntu 18.04 server. The software environment comprised Python 3.8, CUDA 11.3, and PyTorch 1.11.0. Table 5 outlines the hyperparameter settings used for the two uni-modal detectors, YOLOv5 and PointPillars, during the experiments. These settings were carefully chosen to optimize the performance of both models, ensuring accurate and efficient detection within the simulated environment.
Table 5. Hyper-parameter configuration.
Parameter | Description | Value (PointPillars) | Value (YOLOv5)
Range | Detection range of the model | [0, −39.68, −3, 69.12, 39.68, 1] | −
Voxel size | A voxel is a pixel in 3D space; voxel_size represents the size of the voxel | [0.16, 0.16, 4] | −
No. of classes | Classes of detection objects | 1 | 1
Lr | Learning rate, which determines the step size of parameter updates during optimization | 0.003 | 0.01
Batch size | Number of samples input at once when training the model | 4 | 16
Epoch | Number of iterations during training | 80 | 300

The YOLOv5 detector was trained for 300 epochs, and the PointPillars detector was trained for 100 epochs, resulting in the convergence of the loss functions. The results showed that YOLOv5 achieved a mean Average Precision (mAP) exceeding 90%, while PointPillars reached an mAP of about 60%. The mAP metric is widely used in computer vision and is the key evaluation criterion for detection accuracy in this study. Supplementary Text 1 details the calculation of two essential metrics, precision and recall, which are integral to the computation of mAP. Supplementary Algorithm 1 and Supplementary Algorithm 2 present the pseudocode for calculating mAP in two-dimensional (2D) and three-dimensional (3D) contexts, respectively. Both detectors exhibited strong performance on the test set, particularly in accurately recognizing targets close to the sensors and effectively mitigating issues related to obstacle occlusion. However, the detection accuracy for distant objects remained less than ideal. This limitation may stem from the CARLA simulator's constraints on image resolution and point cloud density, leading to feature loss in distant images and increased sparsity in distant point clouds.
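For reference, the sketch below shows the textbook all-point-interpolation form of Average Precision computed from a precision-recall curve (mAP is then the mean over object classes); this is a generic illustration, not the supplementary pseudocode, and the function name and inputs are assumptions.

```python
import numpy as np

def average_precision(scores, matches, num_gt):
    """scores: confidence of each detection; matches: 1 if the detection hits a
    ground-truth box at the chosen IoU threshold, else 0; num_gt: number of ground truths."""
    order = np.argsort(-np.asarray(scores))            # sort detections by confidence
    tp = np.asarray(matches, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    # Make precision monotonically decreasing, then integrate over recall
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))
```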
After a series of coordinate transformations, projections, and matchings, an integrated digital twin system was developed. Fig. 5 captures a representative frame from the simulation, showcasing the functionality of this system. Fig. 5a is the Bird's Eye View (BEV) fusion view of the digital twin system, which integrates both 2D and 3D perception results. The figure shows that in this frame, there are a total of 13 target vehicles, with 11 objects (red bounding boxes) detected and matched by both sensors. This indicates that their information across different dimensions has been integrated and shared. Blue bounding boxes represent two vehicles detected by the camera but not by LiDAR. Orange bounding boxes indicate vehicles detected by LiDAR but not by the camera. Figure 5b is the 2D camera view of the digital twin system, where green bounding boxes represent the camera's object detection results, and red points are the projections of the center points of the 3D LiDAR detection bounding boxes onto the 2D plane. Figure 5c is the 3D LiDAR view of the digital twin system, with cubic bounding boxes representing the LiDAR's object detection results, and the entire view also includes a certain level of three-dimensional reconstruction of the background of interest.
Within this digital twin system, 2D detection provides richer texture and color information. In scenarios where the 3D detector fails to identify sparse point cloud objects (as shown in the red circle of Fig. 6a), the 2D detector's perceptual results can effectively complement and enhance the 3D detector's performance (red circle of Fig. 6b). Conversely, when 2D detection fails due to occlusion (red circle of Fig. 6d), the 3D detection results projected onto the 2D view can serve as valuable hints, reducing the likelihood of missed detections (red circle of Fig. 6c). Additionally, the more precise spatial position and distance information provided by the 3D detector enriches the data available to the digital twin system, further enhancing overall performance.
Figure 6. Local details of the digital twin view. (a) 3D Detection Failure: the 3D detector fails to identify a vehicle near the intersection. (b) 2D Detection Success: the 2D detector successfully captures the vehicle that was missed in the 3D detection (as shown in a). (c) 3D Detection Success: the 3D detector successfully detects a vehicle that was occluded and missed by the 2D detector (as shown in d). (d) 2D Detection Failure: the 2D detector fails to detect a vehicle on the far side of the road due to the overlap of two vehicles.
In this study, we address the limited availability of roadside infrastructure sensing solutions at intersections by integrating traffic simulation, 2D and 3D unimodal detectors, and multi-source heterogeneous sensor data fusion, and we propose a digital twin system for intersections based on multi-sensor data fusion. The approach validates the feasibility of deploying digital twin systems on roadside infrastructure using sensor fusion and evaluates its performance, fusing decision-level perceptual results through a near real-time mapping approach. The experimental results demonstrate that the fusion strategy effectively integrates the complementary advantages of RGB cameras and LiDAR. For instance, when sparse point clouds lead to 3D detection failures, 2D detection can provide supplementary information, while conversely, the global perspective of 3D detection compensates for occlusion issues inherent in 2D detection. Unlike existing studies, which predominantly focus on onboard sensors, this research validates the feasibility of deploying the framework on roadside infrastructure and proposes a modular fusion architecture. In terms of methodology, this study employs simulated data to reduce the costs associated with real-world data collection, offering greater scalability than physical vehicle testing. Additionally, the fusion strategy is compatible with various pre-trained models, enhancing its engineering applicability. This research contributes theoretically and practically to the fields of digital twin technology and multi-sensor data fusion.
The research still has certain limitations. Owing to the data generation mechanism of the CARLA-SUMO joint simulation platform, discrepancies exist between the simulated sensor models (such as RGB camera noise and LiDAR point cloud density) and real-world conditions. For example, the simulation does not account for random electromagnetic interference or extreme weather on real roads, which may lead to an overestimation of the robustness of the multimodal fusion algorithm in practical deployment. Additionally, the generalization ability of the current decision-level fusion strategy under complex environmental disturbances has not been fully verified, and the effects of non-motorized vehicles, pedestrians, and weather and lighting changes (such as rain, fog, and nighttime) on overall system performance have not been considered.
Future research efforts will concentrate on the following key areas: (1) Enhancing the quality of RGB image data generated by the CARLA-SUMO co-simulator is essential to accurately capture key object features. Future efforts will prioritize improving the resolution and clarity of these images to ensure comprehensive feature extraction. (2) Further optimization of the deployment locations and strategies for roadside infrastructure sensors is crucial. This includes a thorough investigation of how various sensor types, orientations, heights, elevations, pitch angles, and configuration combinations influence detection performance. Such optimizations will enhance the adaptability and effectiveness of the proposed solution across different scenarios. (3) To bolster model robustness and minimize the costs associated with large-scale testing, future research should focus on integrating real-world data with simulation-generated data through cross-validation. This approach will provide a more reliable and cost-effective method for refining detection models.
This work was supported by the National Natural Science Foundation of China (Grant No. 72371251), the National Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 2024JJ2080), and the Key Research and Development Program of Hunan Province (Grant No. 2024JK2007).
The authors confirm contribution to the paper as follows: study conception and design: Zheng L; data collection: Wang Y, Chen Y; analysis and interpretation of results: Wang Y, Zheng L, Chen Y; draft manuscript preparation: Wang Y, Zheng L, Chen Y. All authors reviewed the results and approved the final version of the manuscript.
The datasets generated in this study are available from the corresponding author upon reasonable request.
The authors declare that they have no conflict of interest.
- Supplementary Text 1 Precision and recall calculation formulas.
- Supplementary Algorithm 1 Pseudo-codes for calculating 3D mAP and recall.
- Supplementary Algorithm 2 Pseudo-codes for calculating 2D mAP and recall.
- Copyright: © 2025 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
Cite this article
Wang Y, Chen Y, Zheng L. 2025. Digital twin intersection based on roadside multi-sensor data fusion. Digital Transportation and Safety 4(4): 242−250 doi: 10.48130/dts-0025-0023
- Received: 25 November 2024
- Revised: 09 April 2025
- Accepted: 21 June 2025
- Published online: 31 December 2025
Abstract: Digital twin technology is pivotal in the advancement of smart cities and autonomous driving due to its unique capabilities in virtual-reality integration, interactive control, and predictive analysis. The primary enabler for achieving advanced transportation digital twins lies in enhancing environmental sensing capabilities, with multi-sensor data fusion emerging as a widely adopted strategy to improve sensing performance. However, existing research has predominantly focused on onboard systems, leaving roadside sensor deployment and roadside multi-sensor data fusion strategies insufficiently explored. Recognizing the potential advantages of roadside sensor systems, such as broader sensory field coverage and reduced occlusion, this study investigates the integration of roadside multi-sensor data fusion with digital twin technology in the transportation domain. Consequently, this paper introduces an innovative intersection digital twin system developed through a simulation-based approach, leveraging late fusion of roadside multi-sensor data. The Car Learning to Act-Simulation of Urban Mobility (CARLA-SUMO) co-simulator acts as a data generation platform, synchronously producing RGB images and Light Detection and Ranging (LiDAR) point clouds with spatiotemporal consistency. For object detection, we employ the You Only Look Once version 5 (YOLOv5) and PointPillars algorithms. A decision-level fusion strategy is then proposed to integrate these heterogeneous sensor outputs into a cohesive roadside digital twin system. Experimental results demonstrate that YOLOv5 and PointPillars achieve a mean Average Precision (mAP) of approximately 90% and 60%, respectively. Moreover, the detection frequency of both detectors is well suited to the dynamic nature of intersection traffic, and the fusion strategy synergistically exploits the complementary advantages of heterogeneous sensors to enhance overall system performance. This research contributes to the field by facilitating low-cost autonomous driving simulation tests and enabling the reconstruction of intersections using roadside digital twin technology, with significant implications for vehicle-road coordination and traffic management.
Key words: Digital twin, CARLA-SUMO co-simulator, Multi-sensor data fusion, YOLOv5, PointPillars