An integrated and cooperative architecture for multi-intersection traffic signal control

Qiang Wu; Jianqing Wu; Bojian Kang; Bo Du; Jun Shen; Adriana Simona Mihăiţă

doi:10.48130/DTS-2023-0012

Traffic signal control (TSC) systems are an essential component of intelligent transport systems. However, existing studies usually treat the urban traffic simulation environment, collaborative TSC algorithms, and traffic signal communication independently of one another. In this paper, we propose (1) an integrated and cooperative Internet-of-Things architecture, namely General City Traffic Computing System (GCTCS), which simultaneously leverages an urban traffic simulation environment, TSC algorithms, and traffic signal communication; and (2) a general multi-agent reinforcement learning algorithm, namely General-MARL, considering cooperation and communication between traffic lights for multi-intersection TSC. In experiments, we demonstrate that the integrated and cooperative architecture of GCTCS is much closer to the real-life traffic environment. General-MARL increases the average movement speed of vehicles in traffic by 23.2% while decreasing the network latency by 11.7%.


Introduction

Urban traffic congestion has become a global issue with severe effects, such as increased travel time, fuel consumption, and air pollution^[1−3]. It has serious implications for the global economy and the environment. For example, in recent years, the USA alone lost $87 billion per year in extra driving time and gasoline due to traffic jams^[4,5].

Inadequate traffic infrastructure, the rapid growth of vehicles, and pre-determined traffic signal control (TSC) methods are the leading causes of urban traffic congestion^[6]. Addressing urban planning and infrastructure concerns necessitates significant financial and material resources. As a result, improving the existing TSC methods is the most cost-efficient way to relieve traffic congestion.

In recent years, significant progress has been made on reinforcement learning (RL) for solving various sequential decision-making problems in Artificial Intelligence (AI) games, such as Atari^[7], Go^[8], and Dota2^[9,10]. The TSC problem can likewise be framed as an agent that makes decisions at intersections by interacting with the environment, just like in a game.

It is challenging to alleviate traffic congestion by optimizing and controlling traffic signals only at a single intersection^[11]. TSC is being extended from the optimization of a single intersection to multiple intersections, which can be formulated as a multi-agent system with cooperation between agents. Hence, multi-agent reinforcement learning (MARL) has received increasing attention from researchers in recent years^[12−14]. Furthermore, once the network transmission and communication between traffic lights are taken into account, efficiently deploying MARL in practice remains a research challenge.

Additionally, most research has focused on lab theories and algorithms with little consideration of industrial-scale deployment issues. Given the limited network transmission bandwidth and underlying computing resources, optimizing the deployment structure and algorithmic performance for TSC is essential for intelligent transport systems (ITS).

ITS-related technologies, such as urban traffic simulation environments^[15−18], TSC algorithms^[19,20], and traffic signal communication^[14,21], have considerably enhanced traffic operation and management. To the best of our knowledge, little research considers all of the above aspects for TSC, and few works can be deployed in the real world. We summarize several existing challenges in TSC:

1) Although some studies have contributed to multi-intersection TSC^[22−24], there is still no integrated architecture that leverages the traffic simulation environment, cooperative TSC algorithm, and traffic signal communication to achieve optimal multi-intersection TSC;

2) Traditional algorithms in urban TSC rarely consider traffic light cooperation and communication simultaneously;

3) Most studies optimize the algorithms but ignore the network capacity or latency in the urban TSC process.

To address the aforementioned challenges, an integrated and cooperative architecture for TSC across multiple intersections is proposed in this paper. The main contribution is threefold:

1) An integrated architecture, namely General City Traffic Computing System (GCTCS), is proposed, which integrates an urban traffic simulation environment, TSC algorithm, and communication across the traffic signal network simultaneously.

2) A MARL algorithm, namely General-MARL, is developed for TSC based on GCTCS, considering cooperation and communication between traffic lights.

3) Comprehensive experiments have been conducted to validate the proposed architecture and algorithm with promising results. The experimental results indicate that the proposed architecture is much closer to the real-life traffic environment. With the proposed algorithm, the average speed of vehicles is increased by 23.2%, and the network latency is reduced by 11.7% compared with baseline algorithms.

The remainder of this paper is organized as follows: Section Related Work introduces related works. Section Preliminary explains the basic concepts and problem definition. Section Methodology describes the architecture of GCTCS and details the General-MARL method for cooperative traffic light control based on GCTCS. Section Experiments conducts experiments to demonstrate the advantage of the General-MARL algorithm. Section Conclusions concludes the paper and discusses future work.

Related Work

In this section, we review studies on TSC, which fall into two typical categories: conventional approaches and RL-based methods.

Conventional methods for TSC
Conventional TSC methods are classified into four types: fixed-time control^[25], actuated control^[26], adaptive control^[27], and optimization-based control^[28]. Fixed-time control is the conventional and primary method of urban TSC, benefiting from its simplicity of deployment. It usually consists of a pre-timed cycle length, a fixed cycle-based phase sequence, and a phase split. When calculating the cycle length and phase split, the traffic flow is assumed to be uniform during a specific period. Since its introduction in the 1950s^[25], it has been a leading solution to TSC in practice, considering that the urban traffic environment is complex and uncertain and that mathematical approaches cannot precisely model the internal operational mechanisms of a TSC system. Actuated control^[26] decides whether to keep or change the current phase based on pre-defined rules and real-time traffic data. For example, it can set the green signal for a specific traffic signal phase if the number of approaching vehicles is larger than a threshold. Based on traffic volume data from loop sensors, adaptive control^[27] creates a set of traffic plans and chooses the one that is best for the current traffic situation. Optimization-based control^[28] formulates TSC as an optimization problem under a dynamic traffic flow and decides the traffic signal according to the observed traffic information. All of the methods discussed above rely heavily on human-designed traffic signal plans or rules.
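
As a minimal illustration of the actuated rule described above, the following Python sketch keeps or switches the green phase according to a vehicle-count threshold. The function name, the threshold value, and the minimum-green guard are illustrative assumptions and are not taken from the cited works.

```python
def actuated_next_phase(current_phase, queue_counts, time_in_phase,
                        threshold=5, min_green=10):
    """Return the phase to display next at one intersection.

    queue_counts maps each phase id to the number of approaching vehicles
    (e.g., from loop detectors). All default values are hypothetical.
    """
    # Never cut a green shorter than the minimum green time.
    if time_in_phase < min_green:
        return current_phase
    # Consider only the competing phases.
    others = {p: n for p, n in queue_counts.items() if p != current_phase}
    if not others:
        return current_phase
    busiest = max(others, key=others.get)
    # Switch only if demand on the busiest competing phase exceeds the threshold.
    if others[busiest] > threshold:
        return busiest
    return current_phase
```

With `queue_counts={"NS": 2, "EW": 8}` and the current phase `"NS"` held for 12 s, the rule switches to `"EW"`; the same demand after only 5 s keeps `"NS"` because of the minimum-green guard.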

RL based methods for TSC
RL-based methods have emerged as a promising TSC solution and have been designed for different application scenarios, including single-intersection control^[11,29] and multi-intersection control^[30,31].

Single intersection control
Abdulhai et al.^[11] introduced Q-learning for TSC and presented a case study of its application to traffic signal control. Li et al.^[29] proposed using a deep neural network (DNN) to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance results. Park et al.^[32] developed two traffic signal control models using reinforcement learning and a microscopic simulation-based evaluation for an isolated intersection. Additionally, the models could also be adapted for two coordinated intersections.
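
To make the Q-learning idea concrete, the sketch below shows a generic tabular Q-learning update and an ϵ-greedy action choice for a single intersection. It is not the implementation of any cited work: the state and action labels, the reward (a negative queue length), and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def q_learning_step(q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: the reward is the negative number of queued vehicles.
q_table = defaultdict(float)
phases = ["keep", "switch"]
q_learning_step(q_table, "s0", "keep", -4.0, "s1", phases)
```

After this single update, the value of keeping the phase in `s0` drops below zero, so a purely greedy agent (`epsilon=0`) would pick `"switch"` the next time it visits `s0`.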

Multi intersection control
Multi-agent reinforcement learning (MARL) involves the participation of more than one agent^[12]. Agents can learn cooperatively by (1) sharing instantaneous information through interaction with the environment and (2) sharing learned policies from episodic experience.

MARL is a suitable method for the TSC problem, which can be formulated as a typical MARL system that optimizes all intersections^[13,14]. There exist intelligent traffic agents in the environment that can facilitate learning progress in MARL. The Co-DQL model^[12] used a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy for multiple intersections. Wang et al.^[13] proposed two distributed MARL control models as well as a Federated Learning (FL) framework to solve the adaptive traffic signal control (ATSC) problem, where the former is based on the Advantage Actor-Critic (A2C) algorithm and the latter on the Federated Averaging (FedAvg) algorithm. El-Tantawy et al.^[30] investigated the following dimensions of the control problem: (1) RL learning methods, (2) traffic state representations, (3) action selection methods, (4) traffic signal phasing schemes, (5) reward definitions, and (6) variability of flow arrivals to the intersection. Rasheed et al.^[31] designed a multi-agent DQN (MADQN) and investigated its use to further address the curse of dimensionality under traffic network scenarios with high traffic volume and disturbances. Wu et al.^[14] introduced a multi-agent auto communication (MAAC) algorithm, which is an adaptive global traffic light control method based on MARL and an auto communication protocol in an edge computing architecture. The MAAC model considered traffic communication but did not leverage MARL and traffic simulation environment optimization.

From the literature, we find that most studies attempt to develop an RL or MARL model to address the TSC problem directly, without simultaneously considering traffic simulation environment optimization and traffic communication.

[1]	Zhao D, Dai Y, Zhang Z. 2011. Computational intelligence in urban traffic signal control: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42:485−94 doi: 10.1109/TSMCC.2011.2161577
[2]	Ng V, Kim HM. 2021. Autonomous vehicles and smart cities: A case study of Singapore. In Smart cities for technological and social innovation, eds. Kim HM, Sabri S, Kent A. USA: Academic Press, Elsevier. pp. 265–287. https://doi.org/10.1016/B978-0-12-818886-6.00014-9
[3]	Sheng MS, Sreenivasan AV, Sharp B, Du B. 2021. Well-to-wheel analysis of greenhouse gas emissions and energy consumption for electric vehicles: A comparative study in Oceania. Energy Policy 158:112552 doi: 10.1016/j.enpol.2021.112552
[4]	Harris N, Shealy T, Klotz L. 2016. Choice architecture as a way to encourage a whole systems design perspective for more sustainable infrastructure. Sustainability 9(1):54 doi: 10.3390/su9010054
[5]	Afrin T, Yodo N. 2020. A survey of road traffic congestion measures towards a sustainable and resilient transportation system. Sustainability 12(11):4660 doi: 10.3390/su12114660
[6]	Lee WH, Chiu CY. 2020. Design and implementation of a smart traffic signal control system for smart city applications. Sensors 20(2):508 doi: 10.3390/s20020508
[7]	Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, et al. 2013. Playing atari with deep reinforcement learning. arXiv Preprint doi: 10.48550/arXiv.1312.5602
[8]	Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, et al. 2017. Mastering the game of go without human knowledge. Nature 550:354−59 doi: 10.1038/nature24270
[9]	Berner C, Brockman G, Chan B, Cheung V, Dębiak P, et al. 2019. Dota 2 with large scale deep reinforcement learning. arXiv Preprint doi: 10.48550/arXiv.1912.06680
[10]	Telikani A, Tahmassebi A, Banzhaf W, Gandomi AH. 2022. Evolutionary machine learning: A survey. ACM Computing Surveys (CSUR) 54(8):1−35 doi: 10.1145/3467477
[11]	Abdulhai B, Pringle R, Karakoulas GJ. 2003. Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering 129(3):278−85 doi: 10.1061/(ASCE)0733-947X(2003)129:3(278)
[12]	Wang X, Ke L, Qiao Z, Chai X. 2020. Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Transactions on Cybernetics 51(1):174−87 doi: 10.1109/TCYB.2020.3015811
[13]	Wang T, Liang T, Li J, Zhang W, Zhang Y, et al. 2020. Adaptive traffic signal control using distributed MARL and federated learning. 2020 IEEE 20^th International Conference on Communication Technology (ICCT), Nanning, China, 28-31 October 2020. USA: IEEE. pp. 1242−48. https://doi.org/10.1109/ICCT50939.2020.9295660
[14]	Wu Q, Wu J, Shen J, Yong B, Zhou Q. 2020. An edge based multi-agent auto communication method for traffic light control. Sensors 20(15):4291 doi: 10.3390/s20154291
[15]	Ben-Akiva M, Koutsopoulos HN, Toledo T, Yang Q, Choudhury CF, et al. 2010. Traffic simulation with MITSIMLab. In Fundamentals of traffic simulation, ed. Barceló J. New York: Springer. pp. 233−68. https://doi.org/10.1007/978-1-4419-6142-6_6
[16]	Krajzewicz D. 2010. Traffic simulation with SUMO – simulation of urban mobility. In Fundamentals of traffic simulation, ed. Barceló J. New York: Springer. pp. 269−93. https://doi.org/10.1007/978-1-4419-6142-6_7
[17]	Zhang H, Feng S, Liu C, Ding Y, Zhu Y, et al. 2019. Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario. WWW '19: The world wide web conference, San Francisco, CA, USA, 2019. New York, NY, USA: Association for Computing Machinery. pp. 3620−24. https://doi.org/10.1145/3308558.3314139
[18]	Jang K, Vinitsky E, Chalaki B, Remer B, Beaver L, et al. 2019. Simulation to scaled city: zero-shot policy transfer for traffic control via autonomous vehicles. ICCPS '19: Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, Montreal Quebec Canada, April 16−18, 2019. pp. 291−300. https://doi.org/10.1145/3302509.3313784
[19]	Wei H, Zheng G, Yao H, Li Z. 2018. Intellilight: A reinforcement learning approach for intelligent traffic light control. KDD '18: Proceedings of the 24^th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London United Kingdom, August 19−23, 2018. New York, United States: Association for Computing Machinery. pp. 2496−505. https://doi.org/10.1145/3219819.3220096
[20]	Liang X, Du X, Wang G, Han Z. 2019. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology 68(2):1243−53 doi: 10.1109/TVT.2018.2890726
[21]	Wu Q, Shen J, Yong B, Wu J, Li F, et al. 2019. Smart fog based workflow for traffic control networks. Future Generation Computer Systems 97:825−35 doi: 10.1016/j.future.2019.02.058
[22]	Huo Y, Tao Q, Hu J. 2020. Cooperative control for multi-intersection traffic signal based on deep reinforcement learning and imitation learning. IEEE Access 8:199573−85 doi: 10.1109/ACCESS.2020.3034419
[23]	Yang S, Yang B. 2021. A semi-decentralized feudal multi-agent learned-goal algorithm for multi-intersection traffic signal control. Knowledge-Based Systems 213:106708 doi: 10.1016/j.knosys.2020.106708
[24]	Yang S, Yang B, Kang Z, Deng L. 2021. IHG-MA: Inductive heterogeneous graph multi-agent reinforcement learning for multi-intersection traffic signal control. Neural networks 139:265−77 doi: 10.1016/j.neunet.2021.03.015
[25]	Webster FV. 1958. Traffic signal settings. Technical report. Road Research Technique Paper No. 39. Road Research Laboratory, London.
[26]	Cools SB, Gershenson C, D’Hooghe B. 2013. Self-organizing traffic lights: A realistic simulation. In Advances in applied self-organizing systems, ed. Prokopenko M. London: Springer. pp. 45−55. https://doi.org/10.1007/978-1-4471-5113-5_3
[27]	Hunt PB, Robertson DI, Bretherton RD, Royle MC. 1982. The SCOOT on-line traffic signal optimisation technique. Traffic Engineering & Control 23(4):190−92
[28]	Sun X, Yin Y. 2018. A simulation study on max pressure control of signalized intersections. Transportation research record 2672(18):117−27 doi: 10.1177/0361198118786840
[29]	Li L, Lv Y, Wang F. 2016. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica 3(3):247−54 doi: 10.1109/JAS.2016.7508798
[30]	El-Tantawy S, Abdulhai B, Abdelgawad H. 2014. Design of reinforcement learning parameters for seamless application of adaptive traffic signal control. Journal of Intelligent Transportation Systems 18(3):227−45 doi: 10.1080/15472450.2013.810991
[31]	Rasheed F, Yau KLA, Low YC. 2020. Deep reinforcement learning for traffic signal control under disturbances: A case study on Sunway city, Malaysia. Future Generation Computer Systems 109:431−45 doi: 10.1016/j.future.2020.03.065
[32]	Park S, Han E, Park S, Jeong H, Yun I. 2021. Deep Q-network-based traffic signal control models. Plos One 16(9):e0256405 doi: 10.1371/journal.pone.0256405
[33]	Lownes NE, Machemehl RB. 2006. VISSIM: a multi-parameter sensitivity analysis. Proceedings of the 2006 Winter Simulation Conference, Monterey, CA, USA, December 3-6, 2006. pp. 1406-13. IEEE. https://doi.org/10.1109/WSC.2006.323241
[34]	Cameron GDB, Duncan GID. 1996. PARAMICS—Parallel microscopic simulation of road traffic. The Journal of Supercomputing 10:25−53 doi: 10.1007/BF00128098
[35]	Fox A, Griffith R, Joseph A, Katz R, Konwinski A, et al. 2009. Above the clouds: A berkeley view of cloud computing. Technical Report No. UCB/EECS-2009-28. University of California at Berkeley, USA. www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
[36]	Bagchi S, Siddiqui MB, Wood P, Zhang H. 2019. Dependability in edge computing. Communications of the ACM 63(1):58−66 doi: 10.1145/3362068
[37]	Sutton RS, Barto AG. 2018. Reinforcement learning: An introduction. Cambridge, MA: MIT press.
[38]	Bochkovskiy A, Wang CY, Liao HYM. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv Preprint doi: 10.48550/arXiv.2004.10934
[39]	Telikani A, Shen J, Yang J, Wang P. 2022. Industrial IoT intrusion detection via evolutionary cost-sensitive learning and fog computing. IEEE Internet of Things Journal 9(22):23260−71 doi: 10.1109/JIOT.2022.3188224
[40]	Zhang L, Wu J, Shen J, Chen M, Wang R, et al. 2021. SATP-GAN: Self-attention based generative adversarial network for traffic flow prediction. Transportmetrica B: Transport Dynamics 9(1):552−68 doi: 10.1080/21680566.2021.1916646
[41]	Goodfellow I, Bengio Y, Courville A. 2016. Deep learning. Cambridge, Massachusetts (MA): MIT press.
[42]	Dong Z, Wu Y, Pei M, Jia Y. 2015. Vehicle type classification using a semisupervised convolutional neural network. IEEE transactions on intelligent transportation systems 16(4):2247−56 doi: 10.1109/TITS.2015.2402438
[43]	Wu Q, Wu J, Shen J, Du B, Telikani A, et al. 2022. Distributed agent-based deep reinforcement learning for large scale traffic signal control. Knowledge-Based Systems 241:108304 doi: 10.1016/j.knosys.2022.108304
[44]	Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, et al. 2016. Asynchronous methods for deep reinforcement learning. Proceedings of The 33rd International Conference on Machine Learning (ICML), New York, USA, 2016. New York, USA: PMLR. pp. 1928−37.
[45]	Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, et al. 2017. Attention is all you need. Advances in neural information processing systems. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS). pp.6000–10
[46]	Merkel D. 2014. Docker: lightweight linux containers for consistent development and deployment. Linux Journal 239(2):2
[47]	Agarap AF. 2018. Deep learning using rectified linear units (relu). arXiv Preprint doi: 10.48550/arXiv.1803.08375
[48]	Watkins CJCH. 1989. Learning from delayed rewards. PhD Thesis. University of Cambridge, England
[49]	Hu J, Wellman MP. 2003. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4(Nov):1039−69
[50]	Nash JF Jr. 1950. Equilibrium points in n-person games. PNAS 36(1):48−49 doi: 10.1073/pnas.36.1.48
[51]	Casgrain P, Ning B, Jaimungal S. 2022. Deep Q-learning for Nash equilibria: Nash-DQN. Applied Mathematical Finance 29(1):62−78 doi: 10.1080/1350486X.2022.2136727
[52]	Du B, Zhang C, Shen J, Zheng Z. 2021. A dynamic sensitivity model for unidirectional pedestrian flow with overtaking behaviour and its application on social distancing's impact during COVID-19. IEEE Transactions on Intelligent Transportation Systems 23(8):10404−17 doi: 10.1109/TITS.2021.3093714

1:	Init Episode $ B > 0 $, $ b=1 $; Minibatch Size $ \hat{M} > 0 $, Game Step $ N $
2:	Init Replay Buffer $ D $, $ {\theta }_{V} $ and $ {\theta }_{A} $
3:	repeat
4:	Reset, go to the initial state $ {x}_{0} $
5:	repeat
6:	Select $ u\leftarrow {\pi }^{{\theta }_{A}}\left(x\right) $ or select $ u $ randomly (e.g., ϵ-greedy)
7:	Observe $ {y}_{t}=\left({x}_{t-1},u,{x}_{t}\right) $
8:	Store $ {y}_{t} $ to Replay Buffer $ D $
9:	Sample from Replay Buffer: $ Y={\left\{{y}_{i}\right\}}_{i=1}^{\hat{M}} $
10:	Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V},{\theta }_{A}\right) $: fix $ {\theta }_{A} $, update $ {\theta }_{V} $
11:	Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V},{\theta }_{A}\right) $: fix $ {\theta }_{V} $, update $ {\theta }_{A} $
12:	until $ t > N $
13:	until $ b > B $
14:	return $ {\theta }_{V} $ and $ {\theta }_{A} $
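
The replay-buffer steps of the listing above (store, sample, then average the loss over the sampled minibatch plus the current transition) can be sketched in plain Python as follows. The class and function names, the buffer capacity, and the placeholder loss are hypothetical; the actual loss $ \hat{L} $ and the parameter updates are not reproduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions y_t = (x_prev, u, x_next)."""

    def __init__(self, capacity=10_000, seed=None):
        # A bounded deque silently drops the oldest transition when full.
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, transition):
        self._data.append(transition)

    def sample(self, batch_size):
        # Sample without replacement; shrink the batch if the buffer is small.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)


def batch_loss(loss_fn, minibatch, current):
    """Average loss over the sampled minibatch plus the current transition."""
    batch = list(minibatch) + [current]
    return sum(loss_fn(y) for y in batch) / len(batch)
```

In the alternating scheme of the listing, a value like `batch_loss(...)` would be minimized twice per step, once with respect to one parameter set while the other is held fixed, and then the other way round.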

1: Apply the communication module:
2: Initialize the communication matrix $ {C}_{0} $ of all fog computing node Agents
3: Initialize the parameters $ {\theta }_{Sender}^{i} $ and $ {\theta }_{Receiver}^{i} $ of the fog computing node Agents
4: Receive the global parameter sets $ {\theta }_{V} $ and $ {\theta }_{A} $ distributed by the cloud computing node and initialize the parameter sets $ {\theta }_{V}^{i} $ and $ {\theta }_{A}^{i} $
5: Initialize the Episode $ B > 0 $, $ b=1 $; the Minibatch Size $ \bar{M} > 0 $; the number of game steps $ N $
6: Apply the Nash-MARL Module:
7: Initialize the memory record Replay Buffer $ D $
8: repeat
9:	Reset the environment and enter the initial state $ {x}_{0} $
10:	repeat
11:	Choose joint action $ u\leftarrow {\pi }^{{\theta }_{A}}\left(x\right) $ or randomly choose joint action $ u $ (e.g., ϵ-greedy exploration)
12:	Observe the state-action-state triplet $ {y}_{t}=\left({x}_{t-1},u,{x}_{t}\right) $
13:	Store the triplet in the Replay Buffer $ D $
14:	Extract data $ Y={\left\{{y}_{i}\right\}}_{i=1}^{\bar{M}} $ from the Replay Buffer
15:	The $ {Agent}^{i} $ receiver uses the Attention mechanism to generate the communication matrix $ {\hat{C}}_{t} $
16:	The strategy choice network of the $ {Agent}^{i} $ sender chooses an action $ {a}_{t+1}^{i} $, or randomly chooses an action (e.g., ϵ-greedy exploration)
17:	The $ {Agent}^{i} $ sender generates its own information $ {c}_{t+1}^{i} $ through the communication matrix $ {\hat{C}}_{t} $ at the receiving end
18:	Collect the joint actions of all Agents, execute the actions $ {a}_{t+1}^{i},\cdots ,{a}_{t+1}^{N} $, and get the rewards $ {R}_{t+1} $ and the next state $ {X}_{t+1} $ from the environment
19:	Optimization step $ \frac{1}{\bar{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V}^{i},{\theta }_{A}^{i},{\hat{C}}_{t}\right) $: fix $ {\theta }_{A}^{i} $, update $ {\theta }_{V}^{i} $
20:	Optimization step $ \frac{1}{\bar{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V}^{i},{\theta }_{A}^{i},{\hat{C}}_{t}\right) $: fix $ {\theta }_{V}^{i} $, update $ {\theta }_{A}^{i} $
21:	until $ t > N $
22: until $ b > B $
23: Return $ {\theta }_{V}^{i} $ and $ {\theta }_{A}^{i} $
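
The listing states that each receiver uses an attention mechanism to build the communication matrix but does not spell out its form. The sketch below assumes standard scaled dot-product attention^[45] over per-agent query and key vectors; this choice, the function names, and the vector shapes are assumptions for illustration and may differ from the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def communication_matrix(queries, keys):
    """Row i holds the attention weights agent i assigns to every agent j."""
    d = len(keys[0])
    matrix = []
    for q in queries:
        # Scaled dot product between agent i's query and each agent's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        matrix.append(softmax(scores))
    return matrix


def aggregate_messages(matrix, messages):
    """Weighted sum of all agents' messages for each receiving agent."""
    dim = len(messages[0])
    return [[sum(w * msg[c] for w, msg in zip(row, messages))
             for c in range(dim)]
            for row in matrix]
```

Each row of the resulting matrix sums to one, so every receiving agent obtains a convex combination of the messages sent by all agents.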

Module	Parameters	Description
Cloud Computing Center	$ x=0\|x=10 $	The delay from the cloud to the fog node
Cloud Computing Center	20, 60, 60, 20	The hidden layers in the network
Cloud Computing Center	0.001	The learning rate
Fog Computing Node	$ x=0\|x=1 $	The delay from intersection to the fog node
Fog Computing Node	20, 60, 60, 20	The hidden layers in the network
Fog Computing Node	0.001	The learning rate
Edge Computing Node	$ x=0\|x=1 $	The delay from edge nodes to fog node
Experiment Settings	$ {g}_{t}={r}_{t}=27,{y}_{t}=6 $	The initial intervals of the green, red and yellow
Experiment Settings	15	The traffic flow prediction period
Experiment Settings	E = 1,000	The number of Episodes
Experiment Settings	I = 27	The number of intersections
Experiment Settings	l = 0.001	The learning rate
Experiment Settings	γ = 0.982	The discount rate

Method	Average speed (km/h)	Average waiting time (s)
Fixed-time	10.17	166.70
Q learning	18.43	135.62
DQN	20.10	112.24
A3C	24.12	90.73
Nash-Q	29.70	70.14
Nash-DQN	33.81	61.21
MAAC	27.39	80.21
General-MARL	31.22	62.87

Method	Accumulated time (s)	Network delay (s)	Delay rate
Fixed-time	38827.5	0.0	0.0%
Q learning(center)	45448.0	10809.7	23.8%
Q learning(edge)	35789.3	6522.9	18.2%
DQN	33789.9	8997.2	26.6%
A3C	31340.4	7792.1	24.9%
Nash-Q	30940.5	7536.9	24.4%
Nash-DQN	31994.6	7937.7	24.8%
MAAC	28940.7	6552.1	22.6%
General-MARL	26912.7	3264.5	12.2%
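
The Delay rate column appears to be the network delay divided by the accumulated time. This reading is inferred from the numbers and is not stated in the table. The Python sketch below recomputes the ratio for each row; the dictionary and function names are illustrative.

```python
# (accumulated time in s, network delay in s, reported delay rate in %)
ROWS = {
    "Fixed-time": (38827.5, 0.0, 0.0),
    "Q learning(center)": (45448.0, 10809.7, 23.8),
    "Q learning(edge)": (35789.3, 6522.9, 18.2),
    "DQN": (33789.9, 8997.2, 26.6),
    "A3C": (31340.4, 7792.1, 24.9),
    "Nash-Q": (30940.5, 7536.9, 24.4),
    "Nash-DQN": (31994.6, 7937.7, 24.8),
    "MAAC": (28940.7, 6552.1, 22.6),
    "General-MARL": (26912.7, 3264.5, 12.2),
}


def delay_rate(accumulated_s, delay_s):
    """Share of the accumulated time spent on network delay, in percent."""
    return 100.0 * delay_s / accumulated_s


# Largest gap between the recomputed and the reported delay rate.
max_gap = max(abs(delay_rate(acc, delay) - reported)
              for acc, delay, reported in ROWS.values())
```

Under this assumed definition, every recomputed value agrees with the reported one to within 0.1 percentage points.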

ID	Fixed-time	Q-edge	DQN	A3C	Nash-Q	Nash-DQN	MAAC	General
1	175.27	161.96	136.93	110.89	109.34	111.21	124.69	96.81
2	188.35	172.68	145.58	116.43	115.88	119.02	130.73	105.46
3	155.28	145.58	123.71	102.43	99.35	99.27	120.99	83.59
4	197.72	180.36	151.78	120.39	120.57	124.61	133.34	111.66
5	155.23	145.54	123.68	102.41	99.32	99.24	116.97	83.56
6	168.97	156.8	132.76	108.22	106.19	107.45	122.26	92.64
7	157.68	147.55	125.30	103.44	100.55	100.7	107.91	85.18
8	161.32	150.53	127.70	104.98	102.37	102.88	119.31	87.58
9	185.23	170.13	143.52	115.11	114.32	109.88	128.53	103.40
10	176.47	162.95	137.72	111.40	109.94	111.92	115.15	97.60
11	161.21	150.44	127.63	104.94	102.31	102.81	119.28	87.51
12	125.96	121.55	104.32	90.01	104.30	81.77	105.69	64.20
13	131.62	126.19	108.06	92.41	87.52	85.15	117.87	67.94
14	169.15	156.95	132.88	108.30	106.28	107.55	122.33	92.76
15	175.52	180.84	152.16	120.64	109.47	124.96	40.06	112.04
16	132.47	126.89	108.62	92.77	87.94	85.65	108.20	68.50
17	164.12	152.83	129.56	87.56	103.77	104.55	113.46	89.44
18	166.87	155.08	131.38	107.33	105.14	106.19	121.45	91.26
19	150.39	141.57	120.48	100.36	96.90	96.35	115.11	80.36
20	177.77	164.01	138.58	111.95	110.59	112.70	125.65	98.46
21	153.63	144.23	122.62	101.73	98.52	98.29	96.35	82.50
22	167.54	155.63	131.82	107.62	105.48	106.59	121.71	91.70
23	177.62	163.89	162.05	157.98	110.52	112.61	139.54	121.93
24	186.84	171.45	144.58	115.79	115.13	118.11	129.15	104.46
25	175.36	162.04	136.99	110.93	109.39	111.26	128.73	96.87
26	175.23	161.93	136.90	110.87	109.32	111.18	124.67	96.78
27	188.35	172.68	145.58	116.43	115.88	119.02	104.64	102.77

Method	Average speed (km/h)	Average waiting time (s)
Fixed-time	10.15	166.71
Q learning(center)	21.69	182.75
Q learning(edge)	23.47	155.64
DQN	24.94	132.70
A3C	26.12	108.65
Nash-Q	28.32	105.55
Nash-DQN	30.86	106.37
MAAC	27.61	116.81
General-MARL	30.78	92.48
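
The per-intersection table above and this summary table appear to be linked: averaging the General column over the 27 intersections gives the 92.48 s listed here for General-MARL. The check below makes that arithmetic explicit. Reading the per-intersection values as waiting times in seconds is an inference from this match, and the variable names are illustrative.

```python
from statistics import fmean

# "General" column of the per-intersection table, intersections 1 to 27.
GENERAL_PER_INTERSECTION = [
    96.81, 105.46, 83.59, 111.66, 83.56, 92.64, 85.18, 87.58, 103.40,
    97.60, 87.51, 64.20, 67.94, 92.76, 112.04, 68.50, 89.44, 91.26,
    80.36, 98.46, 82.50, 91.70, 121.93, 104.46, 96.87, 96.78, 102.77,
]

# Network-wide mean of the per-intersection values.
mean_wait = fmean(GENERAL_PER_INTERSECTION)
print(f"{mean_wait:.2f}")  # 92.48
```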
