ARTICLE   Open Access    

Double deep network-based traffic signal optimization method for isolated intersections

References

[1] Wu J, Qu X. 2022. Intersection control with connected and automated vehicles: a review. Journal of Intelligent and Connected Vehicles 5:260−269 doi: 10.1108/JICV-06-2022-0023
[2] Florin R, Olariu S. 2015. A survey of vehicular communications for traffic signal optimization. Vehicular Communications 2:70−79 doi: 10.1016/j.vehcom.2015.03.002
[3] Hamilton A, Waterson B, Cherrett T, Robinson A, Snell I. 2013. The evolution of urban traffic control: changing policy and technology. Transportation Planning and Technology 36:24−43 doi: 10.1080/03081060.2012.745318
[4] Webster FV. 1958. Traffic signal settings. London, England: Her Majesty's Stationery Office
[5] Wong CK, Wong SC. 2003. Lane-based optimization of signal timings for isolated junctions. Transportation Research Part B: Methodological 37:63−84 doi: 10.1016/S0191-2615(01)00045-5
[6] Yin Y. 2008. Robust optimal traffic signal timing. Transportation Research Part B: Methodological 42:911−924 doi: 10.1016/j.trb.2008.03.005
[7] Yun I, Park BB. 2012. Stochastic optimization for coordinated actuated traffic signal systems. Journal of Transportation Engineering 138:819−829 doi: 10.1061/(ASCE)TE.1943-5436.0000384
[8] Dai R, Cai P, Wang X, Zhang R. 2023. A computationally efficient and refined signal control method for isolated intersections in a connected vehicle environment. Expert Systems with Applications 234:121073 doi: 10.1016/j.eswa.2023.121073
[9] Mercader P, Uwayid W, Haddad J. 2020. Max-pressure traffic controller based on travel times: an experimental analysis. Transportation Research Part C: Emerging Technologies 110:275−290 doi: 10.1016/j.trc.2019.10.002
[10] Abed Al Raheem Magableh A, Almakhadmeh MA, Alsrehin N, Klaib AF. 2020. Smart traffic light management systems: a systematic literature review. International Journal of Technology Diffusion 11:22−47 doi: 10.4018/ijtd.2020070102
[11] Touhbi S, Babram MA, Nguyen-Huu T, Marilleau N, Hbid ML, et al. 2017. Adaptive traffic signal control: exploring reward definition for reinforcement learning. Procedia Computer Science 109:513−520 doi: 10.1016/j.procs.2017.05.327
[12] Joo H, Ahmed SH, Lim Y. 2020. Traffic signal control for smart cities using reinforcement learning. Computer Communications 154:324−330 doi: 10.1016/j.comcom.2020.03.005
[13] Acar B, Sterling M. 2023. Ensuring federated learning reliability for infrastructure-enhanced autonomous driving. Journal of Intelligent and Connected Vehicles 6:125−135 doi: 10.26599/JICV.2023.9210009
[14] Sun H, Chen C, Liu Q, Zhao J. 2020. Traffic signal control method based on deep reinforcement learning. Computer Science 47:169−174
[15] La P, Bhatnagar S. 2011. Reinforcement learning with function approximation for traffic signal control. IEEE Transactions on Intelligent Transportation Systems 12:412−421 doi: 10.1109/TITS.2010.2091408
[16] Shabestary SMA, Abdulhai B. 2022. Adaptive traffic signal control with deep reinforcement learning and high dimensional sensory inputs: case study and comprehensive sensitivity analyses. IEEE Transactions on Intelligent Transportation Systems 23:20021−20035 doi: 10.1109/TITS.2022.3179893
[17] Haddad TA, Hedjazi D, Aouag S. 2022. A deep reinforcement learning-based cooperative approach for multi-intersection traffic signal control. Engineering Applications of Artificial Intelligence 114:105019 doi: 10.1016/j.engappai.2022.105019
[18] Shabestary SMA, Abdulhai B. 2018. Deep learning vs. discrete reinforcement learning for adaptive traffic signal control. Proc. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 2018. US: IEEE. pp. 286–293 doi: 10.1109/ITSC.2018.8569549
[19] Liang X, Du X, Wang G, Han Z. 2019. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology 68:1243−1253 doi: 10.1109/TVT.2018.2890726
[20] Nigam A, Srivastava S. 2023. Hybrid deep learning models for traffic stream variables prediction during rainfall. Multimodal Transportation 2:100052 doi: 10.1016/j.multra.2022.100052
[21] Bouktif S, Cheniki A, Ouni A, El-Sayed H. 2023. Deep reinforcement learning for traffic signal control with consistent state and reward design approach. Knowledge-Based Systems 267:110440 doi: 10.1016/j.knosys.2023.110440
[22] Liang X, Guler SI, Gayah VV. 2020. An equitable traffic signal control scheme at isolated signalized intersections using Connected Vehicle technology. Transportation Research Part C: Emerging Technologies 110:81−97 doi: 10.1016/j.trc.2019.11.005
[23] Chu T, Wang J, Codecà L, Li Z. 2020. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems 21:1086−1095 doi: 10.1109/TITS.2019.2901791
[24] Aslani M, Mesgari MS, Wiering M. 2017. Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events. Transportation Research Part C: Emerging Technologies 85:732−752 doi: 10.1016/j.trc.2017.09.020
[25] Haydari A, Yilmaz Y. 2022. Deep reinforcement learning for intelligent transportation systems: a survey. IEEE Transactions on Intelligent Transportation Systems 23(1):11−32 doi: 10.1109/TITS.2020.3008612
[26] El-Tantawy S, Abdulhai B. 2010. An agent-based learning towards decentralized and coordinated traffic signal control. Proc. 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal, 2010. US: IEEE. pp. 665–670 doi: 10.1109/ITSC.2010.5625066
[27] Mousavi SS, Schukat M, Howley E. 2017. Traffic light control using deep policy-gradient and value-function-based reinforcement learning. IET Intelligent Transport Systems 11:417−423 doi: 10.1049/iet-its.2017.0153
[28] Khamis MA, Gomaa W. 2013. Enhanced multiagent multi-objective reinforcement learning for urban traffic light control. Proc. 2012 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 2012. US: IEEE. pp. 586–591 doi: 10.1109/ICMLA.2012.108
[29] Noaeen M, Naik A, Goodman L, Crebo J, Abrar T, et al. 2022. Reinforcement learning in urban network traffic signal control: a systematic literature review. Expert Systems with Applications 199:116830 doi: 10.1016/j.eswa.2022.116830
[30] Van Hasselt H, Guez A, Silver D. 2016. Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence 30(1):2094−2100 doi: 10.1609/aaai.v30i1.10295
[31] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, et al. 2015. Human-level control through deep reinforcement learning. Nature 518:529−533 doi: 10.1038/nature14236

  • Cite this article

    Dai R, Li Y. 2026. Double deep network-based traffic signal optimization method for isolated intersections. Digital Transportation and Safety 5(1): 42−52 doi: 10.48130/dts-0026-0004


Digital Transportation and Safety 2026, 5(1): 42−52

Abstract: This study addresses the limitations of existing reinforcement learning (RL)-based traffic signal control methods, which typically optimize either the signal phase sequence or phase duration independently. We propose a novel joint optimization framework based on the Double Deep Q-Network (DDQN) that simultaneously determines both the phase sequence and phase duration. To ensure stability, the base phase duration is determined using the classical Webster method. Furthermore, a hybrid state representation is developed by integrating both microscopic and macroscopic traffic features, such as queue length and vehicle delay. A Squeeze-and-Excitation (SE) attention mechanism is introduced to guide the agent's attention toward critical traffic attributes. Simulation experiments conducted on the SUMO platform demonstrate that the proposed method significantly reduces average queue length and vehicle travel time when compared to traditional fixed-time and vehicle-actuated control strategies, particularly under medium to high traffic demand. The results validate the effectiveness, robustness, and practical applicability of the method for intelligent signal control in complex urban intersections.

    • With the accelerating pace of urbanization, traffic congestion has emerged as a major bottleneck impeding sustainable urban development. It not only reduces the efficiency of road resource utilization and prolongs commuting times, but also contributes to a range of environmental pollution issues[1]. To address these challenges, researchers have proposed a variety of traffic management strategies, including signal timing optimization, roadway expansion, and the implementation of intelligent transportation systems[2,3]. Among these, traffic signal control has gained prominence as a core technology for enhancing network efficiency, owing to its relatively low deployment cost and short implementation cycle.

      With continuous advancements in traffic data acquisition technologies, traffic signal control methods have undergone significant evolution and innovation[4,5]. In general, their development can be divided into three stages: static control based on fixed-time plans, vehicle-actuated control triggered by detectors[6,7], and adaptive control based on real-time optimization[8,9]. The latter two categories leverage sensor data to capture real-time traffic volumes and dynamically adjust signal timing plans[10]. Although these methods have improved traffic efficiency to some extent, they still suffer from shortcomings such as delayed response and limited generalization ability in dealing with the highly nonlinear and stochastic characteristics of urban traffic flow.

      In recent years, the rapid advancement of artificial intelligence has introduced new research methods and technical paradigms to the field of traffic signal control[11]. Among them, reinforcement learning (RL) has become a research hotspot due to its ability to make autonomous decisions and optimize policies in dynamic environments[12−14]. In RL, an agent interacts with the environment by trial and error, continuously learning and updating the mapping between actions and states—namely, the signal control policy—and adaptively adjusts signal plans based on the long-term cumulative reward (Q-value). However, when handling high-dimensional continuous state spaces, traditional RL algorithms exhibit insufficient representational capacity, resulting in poor control performance in complex network environments[15,16].

      Deep reinforcement learning (DRL), which combines the perceptual strength of deep learning (DL) with the sequential decision-making capabilities of RL, offers a promising solution to these limitations[17−19]. DRL agents follow a model-free learning paradigm, leveraging the powerful perception capabilities of deep networks to learn fine-grained control strategies directly from traffic data, thereby enabling more flexible and adaptive signal control schemes[20]. As a result, DRL-based traffic signal control approaches have attracted growing interest from the academic community[21−23].

      At present, reinforcement learning (RL)-based traffic signal control (TSC) strategies can be broadly divided into two categories: one focuses on optimizing the phase duration under a fixed phase sequence (i.e., timing optimization); the other focuses on dynamically adjusting the phase sequence under fixed or preset phase durations (i.e., phase sequence optimization). In timing optimization strategies, the agent adjusts the green duration of each signal phase based on real-time traffic demand to improve intersection efficiency[24]. This approach is adaptable to a certain extent and can respond to traffic flow fluctuations. However, its flexibility and adaptability are significantly limited under peak or high-density traffic conditions[25]. Moreover, due to the highly uneven arrival patterns across directions, the performance of timing-based strategies often lacks consistency and robustness in practical applications. In contrast, phase sequence optimization strategies offer greater flexibility and responsiveness. These strategies allow the agent to dynamically select the optimal phase according to real-time traffic conditions. When the same phase is consecutively selected, the clearance interval can be skipped, thereby effectively extending green time in a specific direction and improving traffic flow stability[26−28]. Typically, such strategies switch phases at fixed intervals, enabling them to adapt to varying traffic patterns throughout the day. Despite their advantages, one critical challenge lies in determining the appropriate phase-switching interval[29]. If the interval is too short, frequent phase transitions may lead to excessive yellow time and reduced intersection capacity, as well as increased computational burden during training and operation. On the other hand, excessively long intervals may reduce the system's ability to promptly respond to traffic fluctuations, degrade real-time performance, and hinder the learning process, leading to suboptimal policies.

      Reinforcement learning (RL) has attracted increasing attention in traffic signal control, yet its practical application remains constrained by several key challenges. A central difficulty is that simultaneously optimizing phase sequence and phase duration dramatically enlarges the action space, particularly at intersections with multiple phases. This high dimensionality increases exploration complexity and often leads to slow or unstable convergence. As a result, many existing studies restrict optimization to either phase duration or phase sequence, thereby simplifying the control problem at the cost of reduced operational flexibility. This separation limits the controller's ability to cope with heterogeneous demands and rapidly fluctuating traffic conditions. Moreover, conventional RL-based controllers typically rely on coarse-grained state representations that lack lane-level detail. Without sufficient microscopic information, these models struggle to identify localized congestion phenomena such as queue spillback or uneven lane utilization, ultimately reducing their responsiveness in complex real-world environments.

      To address these limitations, this study proposes a DDQN-based control framework that incorporates a structured action design, enabling the integrated optimization of phase sequence and phase duration while avoiding uncontrolled action-space inflation. A multi-scale state representation is constructed to capture both macroscopic and microscopic traffic features, and a Squeeze-and-Excitation (SE) attention mechanism is introduced to emphasize congestion-critical dimensions within the state vector. These components enhance the stability of the learning process, improve convergence efficiency, and support more flexible and adaptive control under dynamic urban traffic conditions.

      Motivated by these shortcomings, this study proposes an enhanced Double Deep Q-Network (DDQN) framework that supports integrated decision-making for both phase sequence and phase duration. The major contributions are as follows:

      (1) Unified decision-making structure: The proposed framework combines phase selection and timing adjustment within a single decision architecture, improving flexibility and adaptability under dynamic traffic conditions.

      (2) Domain-informed phase duration anchoring: A baseline phase duration derived from the classical Webster method is incorporated to stabilize control actions and prevent excessively frequent or erratic switching, thereby improving policy robustness.

      (3) Multi-scale state representation with SE attention: The state vector integrates macroscopic indicators (e.g., average queue length, delay) with microscopic lane-level features (e.g., vehicle counts), while the SE module selectively emphasizes critical congestion features to enhance decision relevance and accuracy.

    • This section introduces the fundamental concepts of RL and DRL, followed by the RL-based traffic signal control framework for intersections.

    • Reinforcement learning (RL) is a learning-based control methodology aimed at maximizing long-term cumulative rewards. It has been widely applied in optimization problems that require sequential decision-making in dynamic environments. As a major subfield of machine learning, RL enables an agent to continuously interact with its environment and iteratively refine its behavior through trial and error, ultimately learning an optimal control policy.

      Formally, an RL problem is typically modeled within the framework of a Markov Decision Process (MDP), defined by a five-element tuple $ \left\langle S,A,R,P,\gamma \right\rangle $. The components are described as follows:

      State space S: A finite set of all possible environment states. The agent perceives the current state to make decisions. If the Markov property is satisfied—i.e., the current state contains all relevant information for future decision making—then the agent can ignore historical states and rely solely on the current state for policy selection.

      Action space A: A set of all valid actions that the agent can execute in a given state st. At each time step t, the agent selects an action at from the action space according to a policy π, aiming to maximize its long-term cumulative return.

      Transition probability P: The transition function P(st+1|st,at) defines the conditional probability of transitioning to state st+1 after taking action at in state st. This models the dynamics of the environment.

      Reward function R: The reward function R(st,at) specifies the immediate reward rt received after performing action at in state st. This reward signal is critical for evaluating the quality of actions and guiding policy updates.

      The expected return Gt is defined as the discounted cumulative reward from time step t onward, as shown in Eq. (1):

      $ {G}_{t}={R}_{t}+\gamma {R}_{t+1}+{\gamma }^{2}{R}_{t+2}+\cdots =\sum\limits_{k=0}^{\infty }{\gamma }^{k}{R}_{t+k} $ (1)

      where, $ \gamma \in [0,1] $ is the discount factor, used to balance the importance of immediate and future rewards. A smaller γ favors short-term gain, while a larger γ emphasizes long-term return.

      Based on the MDP framework, the goal of an agent is to learn an optimal policy π* that maximizes the expected cumulative return for any given state. In this study, we employ Q-learning, a widely used value-based RL algorithm. In Q-learning, the agent updates the action-value function (i.e., Q-function) through continuous interaction with the environment. The Q-value reflects the expected return of taking action a in state s, and is updated iteratively as follows:

      $ {Q}_{t+1}(s,a)={Q}_{t}(s,a)+\alpha {\delta }_{t+1} $ (2)

      Here, $ \alpha \in \left[0,1\right] $ is the learning rate, and δt+1 is the temporal-difference (TD) error, defined as:

      $ {\delta }_{t+1}(s,a)={R}_{t}+\gamma \underset{{a}_{t+1}\in A}{\max }{Q}_{t}({s}_{t+1},{a}_{t+1})-{Q}_{t}(s,a)={y}_{t}-{Q}_{t}(s,a) $ (3)

      The TD target is then given by:

      $ {y}_{t}={R}_{t}+\gamma \underset{{a}_{t+1}\in A}{\max }{Q}_{t}({s}_{t+1},{a}_{t+1}) $ (4)
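      As a concrete illustration of Eqs. (2)–(4), a minimal tabular update is sketched below. The array-based Q-table and the step-size values are illustrative only; the later sections replace this tabular form with a neural approximator.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update; Q is a 2-D array indexed as Q[state, action].

    alpha and gamma are illustrative values, not the paper's hyper-parameters.
    """
    td_target = r + gamma * np.max(Q[s_next])   # Eq. (4)
    td_error = td_target - Q[s, a]              # Eq. (3)
    Q[s, a] += alpha * td_error                 # Eq. (2)
    return td_error
```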

      To approximate Q-values in high-dimensional state-action spaces, the Deep Q-Network (DQN) algorithm uses a neural network as a nonlinear function approximator. The DQN typically includes an input layer to receive state features, multiple hidden layers for feature extraction, and an output layer to estimate Q-values for all available actions. The network is parameterized as Q(s,a;θ), and the interaction experience collected at time step t is denoted as et = (st,at,Rt,st+1).

      To improve training stability, DQN incorporates an experience replay mechanism. At each time step t, the agent stores the interaction tuple et = (st,at,Rt,st+1) into an experience replay buffer. Subsequently, during each model update, a mini-batch of samples is randomly drawn from the experience replay buffer. This approach effectively breaks the temporal correlations in the data while enhancing the independence and diversity of training samples.
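      A minimal replay-buffer sketch consistent with this description is given below; the buffer capacity is an assumption, as the paper does not report it.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer storing transitions e_t = (s_t, a_t, R_t, s_{t+1})."""

    def __init__(self, capacity=50_000):  # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks temporal correlation between samples
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```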

      During the training process, the DQN employs the same TD target as standard Q-learning, as shown in Eq. (5):

      $ y_{t}^{\text{DQN}}={R}_{t}+\gamma \underset{{a}_{t+1}\in A}{\max }{Q}_{t}({s}_{t+1},{a}_{t+1};\theta ) $ (5)

      However, during the process of Q-value updating, using the same network for both action selection and evaluation can lead to overestimation of Q-values. In other words, this approach tends to overestimate the value of certain actions, thereby affecting the quality of the learned policy and reducing training stability. To address this issue, Van Hasselt et al. proposed the double network mechanism[30], known as DDQN. This method decouples action selection from value evaluation, effectively reducing the estimation bias in Q-values. In DDQN, the current network with parameter θ is used to select the action, while a separate target network with parameter θ⁻ is introduced to evaluate the value of the selected action, as illustrated in Fig. 1. The corresponding TD target is given by Eq. (6):

      $ y_{t}^{\text{DDQN}}={R}_{t}+\gamma {Q}_{t}\left({s}_{t+1},\underset{{a}_{t+1}\in A}{\arg\max }\,{Q}_{t}({s}_{t+1},{a}_{t+1};\theta );{\theta }^{-}\right) $ (6)

      Figure 1. 

      The update process of DDQN.
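      A minimal PyTorch-style sketch of the target in Eq. (6) is shown below. The function name, tensor layout, and the terminal-state masking via `done` are assumptions added for completeness; the paper's equations do not treat terminal states explicitly.

```python
import torch

def ddqn_td_target(reward, next_state, gamma, online_net, target_net, done):
    """Compute the DDQN TD target of Eq. (6).

    The online network (theta) selects the greedy next action;
    the target network (theta^-) evaluates the value of that action.
    """
    with torch.no_grad():
        # Action selection with the online network
        next_q_online = online_net(next_state)                 # [batch, n_actions]
        best_actions = next_q_online.argmax(dim=1, keepdim=True)
        # Action evaluation with the target network
        next_q_target = target_net(next_state).gather(1, best_actions).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q_target
```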

    • In the framework of RL, traffic signal control (TSC) can be modeled as an agent–environment interaction process. The decision-making environment evolves continuously due to the highly dynamic and uncertain nature of traffic flows. These changes stem from temporal, spatial, and behavioral variations in traffic patterns. As a result, the agent must constantly learn and refine its decision-making policy to consistently make optimal action choices at every time step. This ongoing policy learning process enables the agent to accumulate experience through interaction with the environment, recognize emerging traffic state patterns, and adjust its control behavior in a timely manner. Such adaptability is essential for maintaining responsiveness and control efficiency in complex traffic scenarios. Especially in real-world systems, fixed control strategies often fail to accommodate all possible traffic conditions. The RL framework offers an effective mechanism for continuously iterating and improving policies during both training and deployment, thereby adapting to environmental changes.

      In the RL-based intersection traffic signal control framework proposed in this study, the intersection is modeled as the environment E, and the agent G interacts with it. At each time step t, the agent receives the current environment state st, selects an action at, and obtains an immediate reward rt from the environment. The goal of the agent is to learn and execute the optimal action for each perceivable state, that is, to select the appropriate traffic phase to optimize the defined control objectives.

      As shown in Fig. 2, this study considers a typical four-arm intersection. Right-turn movements are not controlled by traffic signals and are generally allowed to proceed freely when yielding conditions are met. Therefore, in the optimization of traffic signal control strategies, only eight signal-controlled traffic movements are considered: Westbound Through (WBT), Westbound Left (WBL), Eastbound Through (EBT), Eastbound Left (EBL), Southbound Through (SBT), Southbound Left (SBL), Northbound Through (NBT), and Northbound Left (NBL).

      Figure 2. 

      Typical four-arm signalized intersection.

      This study adopts a four-phase signal control scheme, as illustrated in Fig. 3, where each signal phase corresponds to a set of non-conflicting traffic movement directions, ensuring both safety and operational efficiency at the intersection. For each phase, a base duration gb is defined—this represents the minimum time that the phase must remain active to maintain traffic flow stability. During this base duration, the signal remains fixed and does not switch. Once the base duration of the current phase expires, the agent evaluates the current traffic state—such as queue length and vehicle waiting time—and determines whether to switch phases. If a switch is deemed necessary, the agent selects the next phase to be executed, thereby realizing dynamic phase sequence optimization. Additionally, if the agent chooses not to switch, the current phase will continue to be extended, enabling joint optimization of phase sequence and phase duration. This mechanism significantly enhances the flexibility of the signal control strategy and improves its responsiveness to complex, real-time traffic variations. For example, if the current phase is Phase 1, the agent can, at the end of its base duration, flexibly choose to either continue with Phase 1 or switch to another phase, as shown in Fig. 3.

      Figure 3. 

      Signal phase setting.

      It is important to note that, to ensure safety during phase transitions, the system must insert a clearance interval each time the signal changes. This interval includes both the yellow light duration and an all-red period, allowing vehicles from the previous phase to completely clear the intersection before the next phase begins. The inclusion of this safety buffer is essential for improving the operational stability of the signal system and ensuring overall traffic safety.

    • In this study, we develop a traffic signal control framework based on the DDQN to better adapt to the complexity and variability of urban traffic environments. In the DDQN architecture, the value network and target network work collaboratively to decouple the processes of action selection and evaluation, effectively mitigating the Q-value overestimation problem commonly found in traditional DQN approaches.

      As illustrated in Fig. 4, action selection is performed by the value (online) network, while action evaluation is handled by the target network. This separation enables more stable and efficient policy learning, making DDQN particularly suitable for highly dynamic and complex scenarios such as traffic signal control.

      Figure 4. 

      Traffic signal control framework based on DDQN.

      The value network takes the observed state st as input, processes it through several fully connected (FC) layers to extract features, and outputs the Q-values corresponding to all possible actions under the current state. The parameter of the value network (i.e., $ \theta $) is optimized by minimizing the mean squared error (MSE) loss between predicted Q-values and the TD target, as shown in Eq. (7), thereby continuously improving the agent's decision policy.

      $ L(\theta )={\mathbb{E}}_{\left({s}_{t},{a}_{t},{R}_{t},{s}_{t+1}\right)}\left[{\left(y_{t}^{DDQN}-Q\left({s}_{t},{a}_{t};\theta \right)\right)}^{2}\right] $ (7)

      The target network does not directly participate in policy decision-making. It is solely used to compute the next-step Q-value in the TD target. To ensure training stability, the parameters of the target network θ⁻ are not updated at every time step. Instead, they are periodically synchronized by copying the parameters θ from the value network at fixed intervals. This delayed update mechanism reduces fluctuations in target values during training, allowing the target network to serve as a relatively 'stable reference' over short time horizons. As a result, it improves the convergence behavior of the training process and enhances the overall quality of the learned policy.
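      A sketch of one optimization step combining the loss of Eq. (7) with the delayed target-network synchronization described above is given below. It reuses the `ddqn_td_target` helper from the earlier sketch; the tensor shapes, optimizer, and update counter are assumptions.

```python
import torch
import torch.nn.functional as F

def ddqn_update(batch, online_net, target_net, optimizer, gamma, step, sync_interval):
    """One gradient step on the MSE loss of Eq. (7), with periodic target sync."""
    states, actions, rewards, next_states, dones = batch  # tensors; actions is a LongTensor
    q_pred = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    target = ddqn_td_target(rewards, next_states, gamma, online_net, target_net, dones)
    loss = F.mse_loss(q_pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Delayed update: copy theta into theta^- every `sync_interval` updates
    if step % sync_interval == 0:
        target_net.load_state_dict(online_net.state_dict())
    return loss.item()
```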

      In this study, we further optimize the DDQN architecture to enhance training stability and convergence speed. First, for network weight initialization, we adopt a uniform distribution bound by the square root of the reciprocal of the input dimension, i.e., $ \sqrt{1/\text{input}} $, to initialize the weights and biases of the first and second fully connected layers. This strategy helps reduce the risk of gradient vanishing or explosion, and ensures more stable model training. Second, we replace the standard ReLU activation with LeakyReLU, which mitigates the 'dying neuron' problem by maintaining gradient flow in the negative domain, thereby improving the robustness of the training process.
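      A possible PyTorch realization of the described initialization and activation choices is sketched below; the hidden-layer width is illustrative since the paper does not report it, and the LeakyReLU negative slope uses the library default.

```python
import math
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network with U(-sqrt(1/fan_in), +sqrt(1/fan_in)) initialization."""

    def __init__(self, state_dim, n_actions, hidden=128):  # hidden width is an assumption
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_actions)
        for layer in (self.fc1, self.fc2):
            bound = math.sqrt(1.0 / layer.in_features)
            nn.init.uniform_(layer.weight, -bound, bound)
            nn.init.uniform_(layer.bias, -bound, bound)
        self.act = nn.LeakyReLU(0.01)  # keeps a small gradient in the negative domain

    def forward(self, x):
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        return self.out(x)  # Q-values for all available actions
```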

      To enhance the model's ability to recognize traffic-critical features, the Squeeze-and-Excitation (SE) attention mechanism is incorporated into the DDQN framework as a lightweight feature-refinement module. Rather than altering the reinforcement learning architecture, the SE module adaptively reweights the state variables to emphasize those most relevant to signal control, such as long queues, low speeds, and prolonged delays, thereby improving the robustness and responsiveness of the learned policy.

      As illustrated in Fig. 5, the SE module operates on the input state vectors s and performs channel-wise significance modeling through three sequential steps:

      Figure 5. 

      Network architecture of the SE attention mechanism.

      (1) Squeeze: A global average pooling (GAP) operation is applied to s to obtain a compact statistical descriptor z, which captures the aggregated traffic state information:

      $ z=GAP(s) $ (8)

      (2) Excitation: The descriptor z is passed through two fully connected layers with ReLU and Sigmoid activations to generate a normalized set of importance weights:

      $ w=\sigma \left({W}_{2}\delta \left({W}_{1}z\right)\right) $ (9)

      (3) Reweighting: The generated importance vector w is applied to the original state vector through element-wise multiplication to produce the attention-enhanced state representation s':

      $ {s}^{\prime}=s\odot w $ (10)

      This reweighted state amplifies congestion-critical variables while suppressing less informative ones, yielding a more discriminative input for the DDQN network.
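      A minimal sketch of the SE reweighting applied to a flat state vector, following Eqs. (8)–(10), is given below; the reduction ratio r and layer sizes are assumptions, as the paper does not report them.

```python
import torch
import torch.nn as nn

class SEStateAttention(nn.Module):
    """Squeeze-and-Excitation reweighting of the state vector (Eqs. 8-10)."""

    def __init__(self, state_dim, r=4):  # reduction ratio r is an assumed hyper-parameter
        super().__init__()
        self.fc1 = nn.Linear(1, state_dim // r)          # excitation, first FC (W1)
        self.fc2 = nn.Linear(state_dim // r, state_dim)  # excitation, second FC (W2)

    def forward(self, s):                  # s: [batch, state_dim]
        z = s.mean(dim=1, keepdim=True)    # squeeze: global average over state entries, Eq. (8)
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # excitation weights, Eq. (9)
        return s * w                       # reweighting: element-wise product s ⊙ w, Eq. (10)
```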

      It should be emphasized that the SE mechanism in this study serves as an auxiliary enhancement rather than a standalone performance-determining component. Its role is to refine feature importance within the state vector, not to independently drive control performance. For this reason, and because SE does not modify the action space or decision architecture, a separate ablation experiment is not essential for validating the core contribution of the proposed framework. The influence of the SE module is reflected in its ability to highlight features associated with severe congestion, thereby improving interpretability and contributing to more targeted control decisions. Nonetheless, future research may incorporate quantitative ablation analyses to more systematically examine the marginal contribution of the SE mechanism across different traffic scenarios.

      As illustrated in Fig. 4, during training, the agent continuously interacts with the simulation environment. At each time step t, the agent observes the current traffic state st, and selects an action at according to its current policy. After the action is executed, the environment transitions to a new state st+1, and returns an immediate reward Rt. The experience tuple (st,at,Rt,st+1) is then stored in the experience replay buffer.

      To enhance the diversity and independence of training samples, the DDQN framework adopts an experience replay mechanism[31]. During each policy update iteration, a mini-batch of samples is randomly drawn from the replay buffer and used to train the neural network. This random sampling helps break the temporal correlations among samples and prevents instability in the learning process caused by sequential data dependencies.

      The detailed implementation process is shown in Table 1.

      Table 1.  Signal control algorithm based on DDQN.

      Input: Discount factor γ; Learning rate α; Number of update iterations I; Target network update interval κ; Mini-batch size b; Number of training episodes N; Number of simulation steps T.
      Output: Optimized network weights θ*.
      Initialization:
      Initialize value network $ Q(\cdot ;\theta ) $ and target network $ Q(\cdot ;{\theta }^{-}) $, and set $ {\theta }^{-}\leftarrow \theta $;
      Initialize experience replay buffer D;
      Initialize update counter $ Counter\leftarrow 0 $.
      Detailed algorithm flow:
      For episode n = 1 to N:
      Observe initial environment state s0 and take initial action a0;
      For t = 1, 2, …, T:
      Observe current state st, and select action at based on ε-greedy policy;
      Execute action at, observe next state st+1, and receive reward Rt;
      Compute TD error: $ {\delta }_{t+1}=y_{t}^{DDQN}-{Q}_{t}\left({s}_{t},{a}_{t};{\theta }_{t}\right) $;
      Store experience tuple $ \left\langle {\delta }_{t+1},({s}_{t},{a}_{t},{R}_{t},{s}_{t+1})\right\rangle $ in D.
      End for
      If $ n\geq 2 $: // skip early episodes to stabilize exploration
      For i = 1 to I:
      Sample a mini-batch B of size b from D;
      Compute TD target yt using Eq. (6) with the target network;
      Update θ by minimizing loss via gradient descent: $ \theta \leftarrow \theta -\alpha \nabla L(\theta ) $;
      Increment counter: Counter$ \leftarrow $Counter + 1.
      If Counter mod $ \kappa $== 0:
      Update target network parameters: $ {\theta }^{-}\leftarrow \theta $.
      End if
      End for
      End if
      End for
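      For readers who prefer code, a condensed Python sketch of the loop in Table 1 follows. The environment and agent interfaces (env.reset, env.step, agent.greedy_action, agent.update) and the per-episode update count are illustrative assumptions; the episode count, step count, batch size, and ε schedule follow the experimental settings reported later (see Table 3).

```python
import random

def train(env, agent, buffer, episodes=1500, steps=3600, batch_size=64,
          updates_per_episode=50, warmup_episodes=1):
    """Condensed DDQN training loop (interfaces and update count are assumptions)."""
    eps, eps_end = 1.0, 0.02
    for n in range(1, episodes + 1):
        state = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection over the four signal phases
            if random.random() < eps:
                action = random.randrange(agent.n_actions)
            else:
                action = agent.greedy_action(state)
            next_state, reward, done = env.step(action)
            buffer.push(state, action, reward, next_state, done)
            state = next_state
        # skip early episodes to stabilize exploration, as in Table 1
        if n > warmup_episodes and len(buffer) >= batch_size:
            for _ in range(updates_per_episode):
                agent.update(buffer.sample(batch_size))
        eps = max(eps_end, eps - (1.0 - eps_end) / episodes)  # linear annealing (assumed form)
```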
    • The phase base duration is a critical parameter that directly influences the performance of traffic signal control. It governs both the responsiveness and stability of the control strategy. Appropriately determining the base duration is essential for developing a signal control policy that is efficient, stable, and adaptive to real-time traffic dynamics.

      In this study, the classical Webster method[4] is first employed to calculate the optimal cycle length for the fixed-time signal control scheme. The resulting cycle length serves as a stable baseline for the intersection and prevents the signal plan from operating with excessively short or unstable cycles. It should be noted that Webster's formula is used only to establish the baseline cycle and does not directly determine the phase durations in our framework. Based on this baseline cycle, the initial effective green times for each phase are obtained using an equal allocation principle. Subsequently, the proposed DDQN-based controller dynamically adjusts both the phase sequence and phase duration in real time according to the observed traffic states. This design ensures that the controller remains effective even when actual traffic conditions deviate from the simplified assumptions underlying the Webster method.

      Thus, the Webster-derived cycle acts only as an initialization reference, while the RL agent determines the actual phase sequence and timing during operation.

      According to the Webster method, the optimal signal cycle length C can be calculated using Eq. (11). L denotes the total lost time per cycle, which includes both yellow and all-red intervals. Y denotes the sum of the critical flow ratios across all phases, calculated as shown in Eq. (12). The critical flow ratio yi for phase i is defined as the ratio of the critical lane flow qi to the saturation flow $ {\vartheta }_{i} $ of that phase, as expressed in Eq. (13).

      $ C=\dfrac{1.5L+5}{1-Y} $ (11)
      $ Y=\sum\limits_{i=1}^{4}{y}_{i} $ (12)
      $ {y}_{i}=\dfrac{{q}_{i}}{{\vartheta }_{i}} $ (13)

      In this study, the effective green time under equal allocation is adopted as a reference, as shown in Eq. (14):

      $ {g}^{b}\leftarrow \dfrac{C-L}{4} $ (14)
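      The Webster initialization of Eqs. (11)–(14) can be computed directly, as in the sketch below; the example flows in the comment are illustrative and are not the measured demand of Table 2.

```python
def webster_base_green(critical_flows, saturation_flows, lost_time, n_phases=4):
    """Webster optimal cycle (Eq. 11) and equally allocated base green (Eq. 14).

    critical_flows / saturation_flows: one value per phase (veh/h).
    """
    y = [q / s for q, s in zip(critical_flows, saturation_flows)]   # Eq. (13)
    Y = sum(y)                                                      # Eq. (12)
    if Y >= 1.0:
        raise ValueError("Intersection oversaturated: Webster formula not applicable")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - Y)                     # Eq. (11)
    g_base = (cycle - lost_time) / n_phases                         # Eq. (14)
    return cycle, g_base

# Illustrative usage (flows and lost time are placeholder values):
# cycle, g_b = webster_base_green([400, 180, 200, 150], [1800] * 4, lost_time=20)
```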
    • In the RL framework, the state space serves as the foundation for the agent to perceive the environment and make decisions. The quality of the state representation directly affects both the training efficiency and the generalization capability of the learned policy. For traffic signal control tasks, the state space should comprehensively reflect the real-time operational status of the intersection.

      In this study, a lane-group-based state representation is adopted instead of a fine-grained lane-level description. This approach preserves essential traffic characteristics while avoiding unnecessary growth in state dimensionality. As defined in Eqs (15)–(17), the state consists of traffic-flow attributes and signal-phase information.

      For each lane group p, the traffic-related variables include the average queue length Lp, average speed Vp, average waiting time Wp, and the number of vehicles Np, jointly reflecting the congestion and movement conditions of the approach. In addition to these traffic indicators, the state also incorporates the current and previous signal phases (i.e., $ {\varphi }_{t} $ and $ {\varphi }_{t-1} $), encoded in a one-hot format to capture phase transitions.

      $ {s}_{t}=\left\{s_{t}^{tra},s_{t}^{sig}\right\} $ (15)
      $ s_{t}^{tra}={\left\{{L}_{p},{V}_{p},{W}_{p},{N}_{p}\right\}}_{p=1,2,3,4} $ (16)
      $ s_{t}^{sig}=\left\{{\varphi }_{t},{\varphi }_{t-1}\right\} $ (17)

      To enhance the model's perception capability, we constructed a hybrid multi-scale representation that integrates both microscopic (queue length Lp, vehicle count Np) and macroscopic (delay Wp, speed Vp) features at the lane-group level. All variables are normalized to the range [0,1] using min-max scaling, and the complete state vector is formed by concatenating the normalized traffic features with the phase encodings.

      This unified representation enables the agent to accurately capture congestion patterns and temporal variations in traffic flow, providing a compact yet informative basis for effective signal control decision-making.
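      A sketch of how the state vector of Eqs. (15)–(17) might be assembled is given below. The dictionary keys and the min-max bounds used for normalization are assumptions (the speed cap simply reuses the 13.89 m/s limit of the experiments), and phases are indexed from 0.

```python
import numpy as np

def build_state(lane_groups, phase_now, phase_prev, n_phases=4, bounds=None):
    """Assemble s_t = {s_t^tra, s_t^sig} for four lane groups (Eqs. 15-17).

    lane_groups: list of dicts with keys 'queue', 'speed', 'wait', 'count'.
    bounds: assumed min-max normalization caps for each feature.
    """
    bounds = bounds or {'queue': 50.0, 'speed': 13.89, 'wait': 300.0, 'count': 60.0}
    traffic = []
    for g in lane_groups:
        traffic += [min(g['queue'] / bounds['queue'], 1.0),
                    min(g['speed'] / bounds['speed'], 1.0),
                    min(g['wait']  / bounds['wait'],  1.0),
                    min(g['count'] / bounds['count'], 1.0)]
    # one-hot encoding of the current and previous signal phases
    sig = np.zeros(2 * n_phases)
    sig[phase_now] = 1.0
    sig[n_phases + phase_prev] = 1.0
    return np.concatenate([np.asarray(traffic), sig]).astype(np.float32)
```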

    • The design of the action space reflects the set of control operations available to the agent. In this study, the action space A is defined as the set of available signal phases, as shown in Eq. (18):

      $ A=\left\{{p}_{1},{p}_{2},{p}_{3},{p}_{4}\right\} $ (18)

      Here, p1 corresponds to executing phase 1 (as illustrated in Fig. 3), while p2, p3, and p4 correspond to the remaining three phases. This design allows the agent to flexibly adjust the phase sequence during the control process.

      In this study, each phase is assigned the same base duration gb. Once the base duration of the current phase ends, the agent selects the next phase to execute from the action set A. If the agent chooses to continue the current phase, the green time is extended without requiring any clearance time. However, if the agent opts to switch to a different phase, a clearance interval (including yellow and all-red times) must be executed before the new phase begins. In addition, when no vehicles are present at the intersection, the agent will automatically select the major phase, i.e., the one corresponding to the highest traffic demand.
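      The extend-versus-switch rule can be summarized by a small helper, sketched below; how the clearance interval is encoded is illustrative, while the 12 s base duration and 5 s clearance match the experimental settings.

```python
def apply_action(current_phase, action, g_base=12, clearance=5):
    """Return the list of (phase, duration) intervals to execute next.

    Same phase chosen again: extend green with no clearance interval.
    Different phase chosen: insert yellow + all-red before the new green.
    """
    if action == current_phase:
        return [(current_phase, g_base)]
    return [('clearance', clearance), (action, g_base)]
```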

    • The reward function plays a central role in guiding the learning behavior of reinforcement learning (RL) models. In this study, queue length is selected as the sole reward indicator because it maintains a clear and direct relationship with congestion levels and can be reliably measured in both simulation environments and real-world deployment. Compared with delay or stop frequency, queue length exhibits lower temporal volatility and stronger numerical stability, which helps reduce reward noise and enhances convergence for value-based RL methods. Moreover, using a single, physically interpretable metric avoids the complexity and scale inconsistencies inherent in multi-objective reward formulations, which often require careful weighting or normalization across heterogeneous indicators.

      At each time step t, the reward is defined as:

      $ {R}_{t}=-\left(\dfrac{1}{{N}_{lane}}\sum\limits_{l=1}^{{N}_{lane}}Q_{l}^{t}\right) $ (19)

      where, Nlane is the total number of entry lanes at the intersection (excluding dedicated right-turn lanes); $ Q_{l}^{t} $ represents the number of queued vehicles on lane l at time step t.

      To reflect performance trends, a reward difference metric is also defined as:

      $ \Delta {R}_{t}={R}_{t}-{R}_{t-1} $ (20)

      A positive value ΔRt > 0 indicates improved traffic flow, evidenced by a reduction in queue length under the current policy.
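      In a SUMO/TraCI implementation, the reward of Eq. (19) can be computed from halting-vehicle counts, as sketched below. The sketch assumes the standard traci.lane.getLastStepHaltingNumber query, whose 0.1 m/s halting threshold matches the queue definition used in the experiments; the lane-ID list is an input the user must supply.

```python
import traci  # SUMO's Python TraCI API; requires a running SUMO simulation

def queue_reward(entry_lanes):
    """Reward of Eq. (19): negative mean queue length over signal-controlled entry lanes.

    entry_lanes: SUMO lane IDs of the intersection approaches,
    excluding dedicated right-turn lanes.
    """
    queues = [traci.lane.getLastStepHaltingNumber(lane) for lane in entry_lanes]
    return -sum(queues) / len(queues)
```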

      While queue length offers a stable and consistent learning signal, we acknowledge that traffic signal control is inherently multi-dimensional, involving factors such as delay, stop frequency, fuel consumption, and emissions. Extending the reward to a multi-objective or weighted formulation, therefore, represents a promising direction for future research.

      It is also important to emphasize that the reward function remains unchanged across all traffic demand levels. Instead, the proposed DDQN-SE controller adapts to varying traffic conditions through its real-time state perception and continuous interaction with the environment. This enables the learned policy to generalize effectively across low-, medium-, and high-demand scenarios without requiring scenario-specific modifications to the reward design.

    • In this study, simulation experiments are conducted based on a typical four-approach intersection, as illustrated in Fig. 2. Each approach includes four lanes: the innermost lane is dedicated for left turns, the outermost lane is for right turns, and the two middle lanes serve as through lanes. The baseline traffic demand for each direction is provided in Table 2. The control zone extends 250 m upstream from the intersection, and the speed limit is set to 13.89 m/s. The clearance time (including yellow and all-red intervals) is fixed at 5 s, and the phase base duration is set to 12 s for all phases.

      Table 2.  Basic traffic demand.

      Entry approach Through (pcu/h) Left-turn (pcu/h)
      Westbound approach 400 100
      Northbound approach 200 100
      Eastbound approach 380 180
      Southbound approach 200 150

      The intersection environment is constructed using SUMO (Simulation of Urban MObility), a microscopic traffic simulation platform. SUMO enables the modeling of intersections under various traffic scenarios and provides the flexibility to integrate external control algorithms. Based on this platform, the proposed DDQN-based traffic signal control algorithm is implemented and evaluated.

      The training parameters of the DDQN model are summarized in Table 3. An ε-greedy exploration strategy is employed to balance exploration and exploitation during training. The initial exploration rate is set to 1.0, and it is gradually annealed to 0.02 over the course of training. This ensures that the agent performs sufficient exploration during the early stages, and gradually shifts toward exploiting the learned optimal policy in later stages.

      Table 3.  Value used for training parameters.

      Parameters Value Description
      Total training episodes 1,500 Total number of training episodes during which the agent interacts with the environment and updates its policy
      Maximum simulation steps per episode 3,600 The maximum number of simulation steps executed in a single training episode
      Target network update interval κ 3 The target network is updated once every three updates of the main (evaluation) network
      Batch size 64 The number of samples used in each training batch for network parameter updates
      Learning rate α 0.0025 The learning rate used for training the DDQN network
      Discount factor γ 0.95 The discount factor used to calculate the cumulative future reward

      To comprehensively evaluate the performance of the proposed traffic signal control method under various scenarios, two key metrics are selected: average travel time, and average queue length. Average travel time serves as a core indicator of intersection efficiency—shorter values indicate higher overall traffic throughput and improved flow. Average queue length, defined as the number of vehicles in a lane moving at speeds less than 0.1 m/s, reflects the level of congestion at the intersection. A longer queue length typically signals a delayed response of the control strategy to evolving traffic conditions.

      This study focuses on developing a reinforcement learning-based traffic signal control method capable of jointly optimizing phase sequence and phase duration. Since most existing RL approaches optimize only one of these aspects and rely on different action definitions or phase structures, direct comparison requires substantial modifications to their architectures. Therefore, the experimental evaluation concentrates on fixed-time and vehicle-actuated controllers, which represent the most widely implemented strategies in practice, and provide a clear benchmark for assessing the improvements achieved by the proposed integrated optimization framework. Exploring the integration of the proposed design into policy-gradient or actor-critic RL methods, such as PPO, will be considered in future work.

    • Figure 6 illustrates the variation in average queue lengths across different traffic movements corresponding to each signal phase over the course of DDQN training. It can be observed that, as training progresses, the average queue lengths for all directions exhibit a steady downward trend, indicating that the agent gradually learns an effective signal control strategy. After approximately 400 training episodes, the queue length curves for all phases begin to stabilize, demonstrating good convergence behavior. This confirms the stability and reliability of the proposed control method, thereby validating the effectiveness and feasibility of the DDQN-based traffic signal control strategy in urban intersection scenarios.

      Figure 6. 

      The trend of average queue length variation for each traffic flow direction.

      It is also noteworthy that the final converged queue lengths differ among phases and show a positive correlation with the traffic volumes of their corresponding directions. This outcome can be attributed to the design of the reward function, which applies equal weights across all directions without prioritizing higher-demand flows. While this equal-weighting strategy ensures fairness, it may limit the model's ability to strengthen control over dominant directions under imbalanced traffic conditions.

      To validate the feasibility of the phase base duration determination method, the performance of the DDQN-based control method was tested under different phase base durations, as shown in Fig. 7. When the base duration is set to 8 s, the average travel time reaches its highest value, exceeding 80 s. At this point, signal phase switching occurs too frequently, and a significant amount of time is spent on clearance phases, which severely reduces the intersection's throughput. As the base duration increases, the average travel time decreases significantly, reaching its optimal range between 10 and 12 s, with an average value stabilizing around 27 to 28 s. However, when the base duration exceeds 14 s, the average travel time starts to gradually increase. This indicates that while a larger base duration reduces the frequency of phase switches, it also decreases the system's responsiveness to changing traffic conditions. In summary, a base duration range of 10 to 12 s is identified as the optimal setting for the current traffic scenario. This range strikes a good balance between signal switching costs and response speed. The results validate the rationality of the phase base duration setting method proposed in this study.

      Figure 7. 

      Impact of base phase duration on the average travel time at the intersection.

      To validate the performance of the proposed DDQN-based traffic signal control method, comparison experiments were designed against fixed-time control and vehicle-actuated signal control methods. Figure 8 presents a comparison of the average queue length across different traffic directions under the three control strategies. It is observed that the DDQN-based control strategy outperforms the other strategies in most traffic directions, demonstrating superior queue control performance. Specifically, in the north-south left turn (SBL-NBL) and east-west left turn (WBL-EBL) directions, the DDQN-based control strategy effectively keeps the queue length under three vehicles, significantly outperforming both fixed-time control and vehicle-actuated control. However, a special case is observed in the east-west through (WBT-EBT) direction, where the queue length under the DDQN-based control strategy is higher than that of the vehicle-actuated control. This phenomenon can be attributed to the design of the current reward function, which does not incorporate differentiated weighting for high-traffic directions. As a result, the agent did not sufficiently reinforce its optimization tendencies toward the critical traffic flow directions during training. Despite minor performance disadvantages in certain directions, the DDQN-based control method still demonstrates stronger overall global optimization and robustness.

      Figure 8. 

      Comparison of directional average queue lengths under three traffic signal control strategies.

      Table 4 presents the average travel time for vehicles under the three control strategies. The results show a high degree of consistency between the comparison of average travel time and average queue length across the different strategies. Although the DDQN-based control strategy exhibits some performance disadvantages in certain traffic directions, it achieves significant improvements in traffic efficiency across most directions. These results thoroughly validate the potential application of the proposed method in complex intersection traffic management, and demonstrate a clear advantage over traditional rule-based control strategies. It is important to note that, in high-demand traffic directions, there is still room for improvement in the DDQN-based control strategy. Future research could consider introducing traffic-sensitive reward weighting mechanisms to enhance the optimization tendency of the strategy for key traffic directions, thereby achieving more precise resource allocation and further improving traffic efficiency.

      Table 4.  Average travel time under the three control strategies.

      Traffic movement   Fixed-time control   Vehicle-actuated control   DDQN-based control
      SBT-NBT   21.63   37.06   24.29
      WBT-EBT   51.26   29.03   42.60
      SBL-NBL   56.71   32.10   16.36
      WBL-EBL   57.16   32.49   17.91
      The minimum value in each row indicates the control method that achieves the best traffic efficiency for that traffic movement.

      We further investigate the performance of the proposed control method under different demand levels to assess its adaptability across varying traffic scenarios. Different demand levels are generated by multiplying the baseline traffic demand by different demand coefficients. The baseline traffic demand corresponds to a low-demand scenario, a demand coefficient of 1.2~1.4 corresponds to a medium-demand scenario, and demand coefficients greater than 1.4 correspond to high-demand scenarios.

      Figure 9 illustrates the average queue length at the intersection under the three control strategies across different traffic demand levels. Overall, the queue length increases for all strategies as demand intensifies, reflecting the substantial pressure that high-volume conditions place on intersection capacity. Among the three methods, the fixed-time controller is the most sensitive to traffic growth. Its average queue length rises sharply and reaches nearly 45 vehicles when the demand coefficient is 1.8, indicating a lack of adaptability to rapidly changing traffic conditions. The vehicle-actuated controller performs more effectively under low and medium demand, maintaining relatively stable queues with a slower growth trend. However, its performance deteriorates as demand becomes high, where fluctuations in detector inputs lead to less efficient phase adjustments and noticeably longer queues. In contrast, the DDQN-based control strategy demonstrates the strongest adaptability across all demand levels. Although queue lengths increase under high-volume conditions, the growth rate is significantly lower than that of the fixed-time and vehicle-actuated controllers. This indicates that the DDQN-SE controller can better allocate green time dynamically, prevent excessive queue accumulation, and maintain more stable control performance under congested traffic.

      Figure 9. 

      Average queue length at the intersection under different demand levels.

      It is important to note that while the reinforcement learning-based signal control method effectively reduces the average queue length at the intersection overall, its performance has not yet reached optimal levels for certain traffic directions. Figure 10 shows the variation in average queue length at the WBT-EBT direction under the three control strategies across different traffic demand levels. It can be observed that as the demand coefficient increases, the queue lengths under all three strategies show an upward trend. Among them, fixed-time control exhibits the most rapid increase in queue length, with the overall performance being the worst. At high demand levels, it leads to severe congestion and queue buildup, demonstrating its lack of ability to adjust to changes in traffic flow. In contrast, actuated signal control maintains lower queue lengths across all demand levels, with particularly good performance at demand coefficients of 1.6 and 1.8, which reflects its capacity to dynamically adjust signal timings in response to fluctuating traffic volumes. This ability to recognize traffic flow variations and adapt signal timings in real time gives actuated control a strong advantage in dynamic traffic environments. The DDQN-based strategy performs relatively stably under medium- and low-demand conditions, with queue lengths similar to actuated control and clearly outperforming fixed-time control. However, at high-demand levels, the queue length increases slightly more than in vehicle-actuated control, indicating some performance limitations. This phenomenon highlights that the current RL model still has room for improvement in handling high-demand traffic directions.

      Figure 10. 

      Average queue length in the WBT-EBT direction under different demand levels.

    • To evaluate the deployability of the proposed DDQN-SE controller in real-world traffic signal systems, the computational complexity and operational efficiency were further analyzed. The final trained model contains approximately 1.2 million parameters, which corresponds to a lightweight neural network architecture compared with typical deep reinforcement learning models.

      Training was conducted on an NVIDIA RTX 3090 GPU, and 1,500 episodes required approximately 4.8 h, demonstrating that the model can be trained efficiently offline. More importantly, the inference time during deployment—when the agent selects an action based on the current state—is extremely low. The average inference time per decision is 3.1 ms, which is significantly below the typical signal controller update interval (≥ 1 s). This ensures that the DDQN-SE controller can operate comfortably within real-time constraints.

      Overall, the results indicate that the proposed method offers a good balance between control performance and computational efficiency. The model is lightweight enough for real-time deployment and can be retrained periodically as traffic conditions evolve, making it suitable for practical intelligent signal control applications.

    • This paper addresses the issues of weak responsiveness and rigid control strategies in urban intersection signal control under complex traffic environments. A Double Deep Q-Network (DDQN)-based reinforcement learning method is proposed, achieving the joint optimization of signal phase sequence and base duration. The method enhances the agent's perception accuracy and learning efficiency by constructing a state space that integrates key traffic indicators, such as queue length, average speed, and waiting time, and by introducing the principle of consistency between state and reward definitions. Simulation results show that the proposed method significantly reduces average queue length and vehicle travel time in most traffic directions compared to fixed-time control and vehicle-actuated control methods. Sensitivity analysis under different traffic demand levels reveals the importance of appropriately setting the base phase duration to ensure control performance. Robustness evaluation further confirms the stability of the proposed method in high-demand scenarios.

      Future research can focus on the following two aspects: first, designing a dynamically weighted reward function based on traffic pressure, incorporating traffic flow distribution characteristics to improve the agent's responsiveness to key traffic directions; and second, extending the method to multi-intersection cooperative control scenarios, further exploring the potential of deep reinforcement learning in complex traffic systems.

      • This research was supported by the National Natural Science Foundation of China (Grant No. 52402373), the Shandong Provincial Natural Science Foundation (Grant No. ZR2024QG016), the Fund of the National Engineering Research Center for Water Transport Safety (No. A202502), and the China Postdoctoral Science Foundation (Grant No. BX20230203).

      • The authors confirm contributions to the paper as follows: study conception and design: Dai R; analysis and interpretation of results: Li Y, Dai R; draft manuscript preparation: Dai R, Li Y. All authors reviewed the results and approved the final version of the manuscript.

      • The authors declare that they have no conflict of interest.

      • Copyright: © 2026 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
