Joint control of traffic signal phase sequence and timing: a deep reinforcement learning method

References

[1] Mirchandani P, Head L. 2001. A real-time traffic signal control system: architecture, algorithms, and analysis. Transportation Research Part C: Emerging Technologies 9:415−32 doi: 10.1016/S0968-090X(00)00047-4


    [2] Rasheed F, Yau KA, Noor RM, Wu C, Low YC. 2020. Deep reinforcement learning for traffic signal control: a review. IEEE Access 8:208016−44 doi: 10.1109/ACCESS.2020.3034141


    [3] Qin Z, Ji A, Sun Z, Wu G, Hao P, et al. 2024. Game theoretic application to intersection management: a literature review. IEEE Transactions on Intelligent Vehicles 00:1−19 doi: 10.1109/TIV.2024.3379986


    [4] Maslekar N, Mouzna J, Boussedjra M, Labiod H. 2013. CATS: an adaptive traffic signal system based on car-to-car communication. Journal of Network and Computer Applications 36:1308−15 doi: 10.1016/j.jnca.2012.05.011


    [5] Hoogendoorn S, Knoop V. 2013. Traffic flow theory and modelling. In The transport system and transport policy: An introduction, eds. van Wee B, Annema JA, Banister D, Pudāne B. Cheltenham, UK: Edward Elgar Publishing. pp. 125–59
    [6] Miller A. 1963. A computer control system for traffic networks. Proceedings of the International Symposium on the Theory of Traffic Flow and Transportation, London, UK, 1963. London, UK: Organisation for Economic Co-operation and Development. https://trid.trb.org/View/612653
    [7] Zheng X, Recker W, Chu L. 2010. Optimization of control parameters for adaptive traffic-actuated signal control. Journal of Intelligent Transportation Systems 14:95−108 doi: 10.1080/15472451003719756


    [8] Zheng X, Chu L. 2008. Optimal parameter settings for adaptive traffic-actuated signal control. 2008 11th International IEEE Conference on Intelligent Transportation Systems. October 12-15, 2008, Beijing, China. USA: IEEE. pp. 105−10. doi: 10.1109/ITSC.2008.4732676
    [9] Sims AG, Dobinson KW. 1980. The Sydney coordinated adaptive traffic (SCAT) system philosophy and benefits. IEEE Transactions on Vehicular Technology 29:130−37 doi: 10.1109/T-VT.1980.23833


    [10] Gartner N. 1983. OPAC: A demand-responsive strategy for traffic signal control. Transportation Research Record, No. 906. pp. 75–81
[11] Bing B, Carter A. 1995. SCOOT: the world's foremost adaptive traffic control system. Traffic Technology International '95. Surrey, UK: UK and International Press. https://trid.trb.org/View/415757
    [12] Henry JJ, Farges JL, Tuffal J. 1984. The PRODYN real time traffic algorithm. In Control in Transportation Systems. Proceedings of the 4th IFAC/IFIP/IFORS Conference, Baden-Baden, Federal Republic of Germany, 20–22 April 1983. Germany: Elsevier. pp. 305–10. doi: 10.1016/B978-0-08-029365-3.50048-1
    [13] Brilon W, Wietholt T. 2013. Experiences with adaptive signal control in Germany. Transportation Research Record: Journal of the Transportation Research Board 2356:9−16 doi: 10.1177/0361198113235600102


    [14] Lertworawanich P, Unhasut P. 2021. A CO emission-based adaptive signal control for isolated intersections. Journal of the Air & Waste Management Association 71:564−85 doi: 10.1080/10962247.2020.1862940


    [15] Mondal MA, Rehena Z. 2022. Priority-based adaptive traffic signal control system for smart cities. SN Computer Science 3:417 doi: 10.1007/s42979-022-01316-5


    [16] Lee WH, Wang HC. 2022. A person-based adaptive traffic signal control method with cooperative transit signal priority. Journal of Advanced Transportation 2022:2205292 doi: 10.1155/2022/2205292


    [17] Jing P, Huang H, Chen L. 2017. An adaptive traffic signal control in a connected vehicle environment: a systematic review. Information 8:101 doi: 10.3390/info8030101


    [18] Liu Z. 2007. A survey of intelligence methods in urban traffic signal control. International Journal of Computer Science and Network Security 7(7):105−12


[19] Mannion P, Duggan J, Howley E. 2016. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, eds. McCluskey T, Kotsialos A, Müller J, Klügl F, Rana O, et al. Cham: Springer International Publishing. pp. 47−66. doi: 10.1007/978-3-319-25808-9_4
    [20] La P, Bhatnagar S. 2011. Reinforcement learning with function approximation for traffic signal control. IEEE Transactions on Intelligent Transportation Systems 12:412−21 doi: 10.1109/TITS.2010.2091408


    [21] Mohamad Alizadeh Shabestary S, Abdulhai B. 2022. Adaptive traffic signal control with deep reinforcement learning and high dimensional sensory inputs: case study and comprehensive sensitivity analyses. IEEE Transactions on Intelligent Transportation Systems 23:20021−35 doi: 10.1109/TITS.2022.3179893


    [22] Liang X, Du X, Wang G, Han Z. 2019. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology 68:1243−53 doi: 10.1109/TVT.2018.2890726


    [23] Ge H, Song Y, Wu C, Ren J, Tan G. 2019. Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control. IEEE Access 7:40797−809 doi: 10.1109/ACCESS.2019.2907618


    [24] Haddad TA, Hedjazi D, Aouag S. 2022. A deep reinforcement learning-based cooperative approach for multi-intersection traffic signal control. Engineering Applications of Artificial Intelligence 114:105019 doi: 10.1016/j.engappai.2022.105019


    [25] Chu T, Wang J, Codecà L, Li Z. 2020. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems 21:1086−95 doi: 10.1109/TITS.2019.2901791


    [26] Bouktif S, Cheniki A, Ouni A. 2021. Traffic signal control using hybrid action space deep reinforcement learning. Sensors 21:2302 doi: 10.3390/s21072302


    [27] Du Y, ShangGuan W, Rong D, Chai L. 2019. RA-TSC: learning adaptive traffic signal control strategy via deep reinforcement learning. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), October 27–30, 2019. Auckland, New Zealand. USA: IEEE. pp. 3275–80. doi: 10.1109/itsc.2019.8916967
    [28] Kumar N, Mittal S, Garg V, Kumar N. 2022. Deep reinforcement learning-based traffic light scheduling framework for SDN-enabled smart transportation system. IEEE Transactions on Intelligent Transportation Systems 23:2411−21 doi: 10.1109/TITS.2021.3095161


    [29] Li L, Lv Y, Wang FY. 2016. Traffic signal timing via deep reinforcement learning. IEEE/CAA Journal of Automatica Sinica 3:247−54 doi: 10.1109/JAS.2016.7508798


    [30] Kumar N, Rahman SS, Dhakad N. 2021. Fuzzy inference enabled deep reinforcement learning-based traffic light control for intelligent transportation system. IEEE Transactions on Intelligent Transportation Systems 22:4919−28 doi: 10.1109/TITS.2020.2984033


    [31] Ma D, Zhou B, Song X, Dai H. 2022. A deep reinforcement learning approach to traffic signal control with temporal traffic pattern mining. IEEE Transactions on Intelligent Transportation Systems 23:11789−800 doi: 10.1109/TITS.2021.3107258


    [32] Aslani M, Mesgari MS, Wiering M. 2017. Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events. Transportation Research Part C: Emerging Technologies 85:732−52 doi: 10.1016/j.trc.2017.09.020


    [33] Wei H, Zheng G, Gayah V, Li Z. 2019. A survey on traffic signal control methods. arXiv Preprint doi: 10.48550/arXiv.1904.08117


    [34] El-Tantawy S, Abdulhai B. 2010. An agent-based learning towards decentralized and coordinated traffic signal control. 13th International IEEE Conference on Intelligent Transportation Systems, September 19−22, 2010, Funchal, Portugal. USA: IEEE. pp. 665−70. doi: 10.1109/ITSC.2010.5625066
    [35] Khamis MA, Gomaa W. 2012. Enhanced multiagent multi-objective reinforcement learning for urban traffic light control. 2012 11th International Conference on Machine Learning and Applications, December 12−15, 2012, Boca Raton, FL, USA. USA: IEEE. pp. 586−91. doi: 10.1109/ICMLA.2012.108
    [36] Mousavi SS, Schukat M, Howley E. 2017. Traffic light control using deep policy-gradient and value-function-based reinforcement learning. IET Intelligent Transport Systems 11:417−23 doi: 10.1049/iet-its.2017.0153


    [37] Zhao J, Yao T, Zhang C, Shafique MA. 2024. Signal control for overflow prevention at intersections using partial connected vehicle data. Transportmetrica A: Transport Science 1−31 doi: 10.1080/23249935.2024.2361648


    [38] Ma C, Yu C, Zhang C, Yang X. 2023. Signal timing at an isolated intersection under mixed traffic environment with self-organizing connected and automated vehicles. Computer-Aided Civil and Infrastructure Engineering 38:1955−72 doi: 10.1111/mice.12961


    [39] Yao T, Zhang C, Zhao J, Gupta A, Mondal S. 2023. Adaptive signal control for overflow prevention at isolated intersections based on fuzzy control. Transportation Research Record: Journal of the Transportation Research Board 2677:1387−401 doi: 10.1177/03611981221143380


    [40] Noaeen M, Naik A, Goodman L, Crebo J, Abrar T, et al. 2022. Reinforcement learning in urban network traffic signal control: a systematic literature review. Expert Systems with Applications 199:116830 doi: 10.1016/j.eswa.2022.116830


    [41] Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. 2017. Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine 34:26−38 doi: 10.1109/MSP.2017.2743240


    [42] Bellman R. 1952. On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America 38:716−19 doi: 10.1073/pnas.38.8.716


    [43] Van Hasselt H, Guez A, Silver D. 2016. Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence 30:2094−100 doi: 10.1609/aaai.v30i1.10295


    [44] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, et al. 2015. Human-level control through deep reinforcement learning. Nature 518:529−33 doi: 10.1038/nature14236


    [45] Schaul T, Quan J, Antonoglou I, Silver D. 2015. Prioritized experience replay. arXiv Preprint doi: 10.48550/arXiv.1511.05952


    [46] Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, et al. 2016. Dueling network architectures for deep reinforcement learning. International Conference on Machine Learning, New York, USA, 2016. New York, USA: PMLR. pp. 1995–2003. https://proceedings.mlr.press/v48/wangf16.pdf
    [47] Bellemare MG, Dabney W, Munos R. 2017. A distributional perspective on reinforcement learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017. PMLR. pp. 449–58. https://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf
    [48] Fortunato M, Azar MG, Piot B, Menick J, Osband I, et al. 2017. Noisy networks for exploration. arXiv Preprint doi: 10.48550/arXiv.1706.10295


    [49] Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, et al. 2018. Rainbow: combining improvements in deep reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence 32(1):3215−222 doi: 10.1609/aaai.v32i1.11796


    [50] Gu J, Fang Y, Sheng Z, Wen P. 2020. Double deep Q-network with a dual-agent for traffic signal control. Applied Sciences 10:1622 doi: 10.3390/app10051622


    [51] Park S, Han E, Park S, Jeong H, Yun I. 2021. Deep Q-network-based traffic signal control models. PLoS One 16:e0256405 doi: 10.1371/journal.pone.0256405


    [52] Ducrocq R, Farhi N. 2023. Deep reinforcement Q-learning for intelligent traffic signal control with partial detection. International Journal of Intelligent Transportation Systems Research 21:192−206 doi: 10.1007/s13177-023-00346-4


    [53] Nishi T, Otaki K, Hayakawa K, Yoshimura T. 2018. Traffic signal control based on reinforcement learning with graph convolutional neural nets. 2018 21st International Conference on Intelligent Transportation Systems (ITSC), November 4−7, 2018, Maui, HI, USA. USA: IEEE. pp. 877−83. doi: 10.1109/ITSC.2018.8569301
    [54] Zang X, Yao H, Zheng G, Xu N, Xu K, et al. 2020. MetaLight: value-based meta-reinforcement learning for traffic signal control. Proceedings of the AAAI Conference on Artificial Intelligence 34:1153−60 doi: 10.1609/aaai.v34i01.5467


    [55] Steingrover M, Schouten R, Peelen S, Nijhuis E, Bakker B. 2005. Reinforcement learning of traffic light controllers adapting to traffic congestion. Proceedings of the Seventeenth Belgium-Netherlands Conference on Artificial Intelligence, Brussels, Belgium, October 17−18, 2005.
    [56] Gokulan BP, Srinivasan D. 2010. Distributed geometric fuzzy multiagent urban traffic signal control. IEEE Transactions on Intelligent Transportation Systems 11:714−27 doi: 10.1109/TITS.2010.2050688


[57] Salkham A. 2010. Decentralized optimization of fluctuating urban traffic using reinforcement learning. PhD thesis. Trinity College Dublin, Ireland
    [58] Xu LH, Xia XH, Luo Q. 2013. The study of reinforcement learning for traffic self-adaptive control under multiagent Markov game environment. Mathematical Problems in Engineering 2013:962869 doi: 10.1155/2013/962869


    [59] Salkham A, Cunningham R, Garg A, Cahill V. 2008. A collaborative reinforcement learning approach to urban traffic control optimization. 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, December 9−12, 2008, Sydney, NSW, Australia. USA: IEEE. pp. 560−66. doi: 10.1109/WIIAT.2008.88
    [60] Kuleshov V, Precup D. 2014. Algorithms for multi-armed bandit problems. arXiv Preprint doi: 10.48550/arXiv.1402.6028


    [61] Webster FV. 1958. Traffic signal settings. Road Research Technical Paper, No. 39. London, UK: Department of Scientific and Industrial Research.
    [62] Varaiya P. 2013. Max pressure control of a network of signalized intersections. Transportation Research Part C: Emerging Technologies 36:177−95 doi: 10.1016/j.trc.2013.08.014



Digital Transportation and Safety 2025, 4(2): 118−126

Abstract: Recent advances in artificial intelligence have opened up new possibilities for optimizing traffic operations. In this study, a novel deep reinforcement learning (DRL)-based traffic signal control strategy is proposed to jointly optimize phase sequence and signal timing, allowing for more efficient and flexible signal control. A comparison between the proposed approach and two traditional methods, Webster and MaxPressure, is conducted to highlight the advantages of AI-empowered signal control. Different techniques for state representation and action selection are explored to enhance the performance of the DRL-based agent. Results from simulation experiments indicate that the 3DQN framework with prioritized experience replay outperforms the other methods, reducing queue length by at least 7.56%. Additionally, the combined state representation of macroscopic feature-based and microscopic cell-based information provides a valuable enhancement to model performance. The ablation experiments demonstrate that considering microscopic information only leads to a 2.44% increase in queue length compared to the proposed method with combined micro-level and macro-level information, while the model relying only on the macroscopic feature-based representation fails to converge during training. Furthermore, the proposed joint control method reduces queue length by 6.37% compared to the phase-switching control method, while the single-agent model that optimizes only the phase duration performs even worse. This study can hopefully offer references for future research on deep reinforcement learning-based traffic signal control schemes and reveal their potential to cope with more dynamic and complex scenarios.

• Advances in automation and communication technology have reshaped intelligent transportation systems (ITS), revealing the potential to further enhance traffic control and operations. The emergence of artificial intelligence (AI) provides novel insights into managing ITS with minimal human intervention. Traffic signal control (TSC) at intersections, one of the crucial problems in urban traffic systems, is expected to benefit from advanced AI technology to alleviate traffic congestion[1−3]. Reinforcement learning (RL) and deep reinforcement learning (DRL) with neural networks have been deemed promising for adaptive traffic control, with the potential to reduce total intersection delay.

Since the inception of traffic signals, fixed-time control has been widely deployed in the real world. It operates by repeating a preset pattern built from historical data. Since traffic demand is unpredictable and fluctuates over time[4,5], fixed-time control has limited adaptability to dynamic traffic situations. Actuated control, usually applied at isolated intersections, was proposed to address this limitation[6]. By employing loop detectors at intersections, real-time traffic data can be collected and utilized to adjust signal control strategies. Actuated control changes signal strategies based on a set of predefined, static parameters such as the unit extension time and the minimum and maximum green times[7,8]. For instance, if the number of vehicles on a specific approach exceeds a predefined threshold, the signal will prioritize that approach to release more vehicles. Although actuated control can respond to varying traffic conditions, it struggles to remain effective under highly saturated traffic volumes. In response to this challenge, adaptive signal control systems such as SCATS[9], OPAC[10], SCOOT[11], RHODES[1], PRODYN[12], and MOTION[13] have been proposed. Built upon strategies designed by experts, adaptive signal control can integrate additional traffic-related knowledge and operate with a greater level of intelligence than pure actuated control, which relies solely on collected data. By utilizing upstream detector data to estimate incoming traffic flow, adaptive control seeks an optimal strategy based on objectives such as traffic delay and emission reduction[14−16].

      Conventional adaptive control grapples with many significant challenges[17]. A noteworthy limitation stems from its reliance on predefined rules (e.g., restrictions on the number of vehicles/lane occupancy and fixed thresholds of signal duration), which may impact the control performance, particularly in dynamic real-world scenarios. Furthermore, during demand surge periods, it often struggles to dynamically adjust signal timings, resulting in prolonged delays in densely populated urban areas with excessive traffic volumes.

Recent decades have witnessed an increasing number of learning-based applications in TSC optimization. In the early stages, fuzzy logic, neural networks (NNs), and RL were mostly adopted for intelligent signal control[18,19]. Owing to the curse of dimensionality, as the information collected from various sources (such as fixed-location sensors and probe vehicles) continues to grow, conventional methods struggle to cope with such high-dimensional inputs on a large scale[20,21]. Therefore, DRL-based TSC algorithms have been developed to handle this complexity[22]. Numerous studies have applied DRL methods to traffic-related problems including TSC, showcasing their effectiveness in managing complex traffic scenarios[23,24]. By leveraging DRL, the agent autonomously learns intricate strategies directly from data in a model-free manner, enabling more flexible and adaptive signal control strategies[25,26]. Consequently, its performance relies primarily on the collected data rather than on simulation settings or model assumptions, which supports model robustness.

In general, DRL agents are trained to control traffic signals by selecting appropriate phases within a cycle (i.e., phase allocation)[27,28], or by keeping or switching the current phase (i.e., phase split)[29−31]. In response to dynamic and/or unpredictable conditions at an intersection, agents may also adjust their control strategies to minimize overall waiting time or queue length from the system perspective.

The optimization problem of DRL-based signal control involves two main tasks: signal timing and phase sequencing. Specifically, a signal timing optimization strategy determines the duration of each phase given a fixed phase sequence. For example, the phase duration can be adjusted in a phase-by-phase manner based on real-time traffic demands[32]. Although this strategy allows for some adaptability, it has certain limitations, particularly during peak hours or high-demand traffic flow scenarios. Moreover, the non-uniform distribution of vehicle arrivals across different directions further complicates its practical application, making it difficult to achieve consistent and effective performance.

The phase-switching strategy, on the other hand, offers a potential solution to the drawbacks mentioned above[33]. The method introduces a dynamic control mechanism for signals by selecting phases in a flexible sequence. Additionally, if a selected phase aligns with the previous one, its duration is extended, leading to smooth traffic flow under stable traffic conditions[34−36]. The strategy is implemented by switching phases at fixed time intervals, thereby enhancing the flexibility of signal control, and it can be adjusted to accommodate dynamic traffic patterns during different periods throughout the day. Some recent relevant papers are recommended for interested readers[37−39].

Determining appropriate intervals, however, presents a significant challenge when implementing the phase-switching strategy[40]. A short phase-switching interval results in frequent phase changes, limiting the number of vehicles that can pass through the intersection at once and deteriorating intersection capacity. It also increases the computational resources required for strategy updating during training and operation. Conversely, an excessively long interval prolongs the green duration, making it difficult to respond promptly to real-time dynamic traffic; this may also hinder the agent's learning process, leading to sub-optimal solutions.

In summary, many existing reinforcement learning-based signal control models may result in ineffective phase allocation. For example, some strategies cannot adapt to peak traffic flow in specific directions, resulting in extended periods of green light with no vehicles passing through the intersection. Meanwhile, the phase-switching strategy may create excessive yellow time due to inappropriate phase-switching intervals; both issues hinder intersection traffic efficiency. To address this, a joint approach can be employed: a phase-switching agent controls the signal phases to improve the efficiency of vehicle movements, while a phase duration optimization agent determines the duration of each phase and reduces the proportion of yellow time.

Focusing on the TSC of isolated intersections, which can potentially be extended to multiple intersections, we propose a DRL-based control strategy that jointly optimizes phase sequence and duration. The contributions of this study include:

      ● We train two DRL-based agents to jointly optimize phase sequence and timing for intersection control, which aims to address the limitations of control strategies with fixed sequences or durations.

● We stitch together image-like cell information (such as vehicle positions and speeds) and feature-based information (such as queue length and phase duration) as the state representation, and investigate whether the training of DRL-based agents benefits from knowledge at both microscopic and macroscopic scales.

● We conduct ablation tests on the joint action setting within the proposed framework, comparing it with methods that control phase switching or phase duration only, which could provide guidelines for future DRL-based TSC studies.

• Deep Q-Network (DQN) is commonly used in the field of intelligent TSC because of its straightforward implementation and tuning process. Additionally, DQN and its variants are off-policy methods that allow more exploration and generate discrete actions, which are practical and suitable for TSC problems and are adopted by most existing studies. Following a policy π, the RL agent observes the state st, takes the action at, and obtains the corresponding value Q at time t. The action-value function of this so-called Q-learning is given as:

$ Q^{*}(s,a)=\max_{\pi}\mathbb{E}\left[R_{t}\mid s_{t}=s,a_{t}=a,\pi\right] $ (1)

In this procedure, the agent learns the optimal policy π* through Q-values, i.e., expected future rewards. However, Q*(s,a) is often difficult to obtain with conventional methods because of the high-dimensional input (the curse of dimensionality)[41]. The integration of a neural network (NN) makes it possible to learn an approximator Q(s,a;θ), which is the DQN structure. Using the Bellman equation[42], the NN parameters are updated by stochastic gradient descent, and the loss, or temporal difference (TD) error function, is minimized:

$ \mathcal{L}=\mathbb{E}\big[\big(r_{t}+\gamma \max_{a_{t+1}}Q(s_{t+1},a_{t+1})-Q(s_{t},a_{t})\big)^{2}\big] $ (2)

Many DQN extensions have emerged that can boost agents' performance by a factor of 2−6 or even more in some tasks (e.g., Atari games). Therefore, we posit that these DQN variants will also prove valuable in solving TSC problems.

Eqn (2) uses the same neural network for greedy action selection and for action evaluation, which leads to overestimation bias during training. To overcome this problem, double DQN (DDQN) was proposed, separating the two procedures[43]. The loss function of DDQN becomes:

$ \mathcal{L}=\mathbb{E}\left[\big(r_{t}+\gamma Q_{target}(s_{t+1},\mathrm{argmax}_{a}Q_{main}(s_{t+1},a))-Q_{main}(s_{t},a_{t})\big)^{2}\right] $ (3)
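
As a concrete illustration of Eqn (3), the following minimal PyTorch sketch computes double-DQN targets for a batch of transitions. The network handles and tensor names are assumptions for illustration, not the code used in this study; the discount factor 0.75 is taken from Table 1.

```python
import torch

def ddqn_targets(q_main, q_target, rewards, next_states, dones, gamma=0.75):
    """Double-DQN targets (Eqn 3): the main network selects the greedy next
    action, while the target network evaluates that action."""
    with torch.no_grad():
        next_actions = q_main(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, next_actions).squeeze(1)
        # dones masks out the bootstrap term at episode ends
        return rewards + gamma * next_q * (1.0 - dones)
```

The squared difference between these targets and Q_main(s_t, a_t) is then minimized by stochastic gradient descent, exactly as in Eqn (3).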

Considering sample selection, the agent's experiences are typically archived in a dataset. During the training phase, samples are randomly drawn from this dataset to break inter-sample correlations, a process known as experience replay[44]. However, in this way, rare samples that are valuable to the learning goal are seldom selected compared with high-frequency, redundant ones. To address this imbalance, a prioritized experience replay mechanism was introduced, which allocates higher priority to samples exhibiting large absolute temporal difference (TD) errors[45]:

$ p_{i}=\left|r_{i}+\gamma \max_{a}Q(s_{i},a)-Q(s_{i-1},a_{i-1})\right| $ (4)

where pi denotes the priority of sample $ i $, which is updated from the loss after the neural network's forward pass.
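
A minimal sketch of proportional prioritized replay is given below: transitions are stored with priority |TD error| as in Eqn (4) and sampled with probability proportional to p_i^α. The list-based buffer and the exponent α are simplifying assumptions; a full implementation would use a sum-tree and importance-sampling weights as in Schaul et al.[45].

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (simplified sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error):
        # new samples enter with priority |TD error| (Eqn 4)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def sample(self, batch_size):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # refresh priorities with the latest TD errors after the forward pass
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + 1e-6
```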

Based on the advantages of DDQN, a network architecture named the 'dueling network' (3DQN) was presented by Wang et al.[46], in which the state-value function V(s) and the advantage function A(s,a) are estimated by two parallel streams with different sets of parameters. By aggregating the two streams, the Q-value function can be approximated by the following formula:

$ Q(s,a\mid\theta,\theta')=V(s\mid\theta)+\Big(A(s,a\mid\theta')-\frac{1}{\left|\Delta_{\pi}\right|}\sum\nolimits_{a'}A(s,a'\mid\theta')\Big) $ (5)

in which θ and θ' are two different sets of network parameters and $ \Delta_{\pi} $ is the discrete action space of policy π.

Other extensions, such as distributional RL and noisy networks, address further issues in the training of DQNs[47,48]. The 'Rainbow' algorithm integrates all of the aforementioned improvements to train agents[49]. Due to limited space, these variants are not introduced here. To date, DQN and its variants remain among the most widely used approaches for TSC problems[50−52]. In this study, we deploy the 3DQN framework to train the signal controller and examine the impacts of different training settings on its performance.
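
The dueling aggregation of Eqn (5) can be sketched in PyTorch as below. The layer sizes are placeholders and a convolutional front end for the cell matrices is omitted; the value and advantage streams correspond to V(s|θ) and A(s,a|θ').

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling Q-network head implementing the aggregation of Eqn (5)."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s | theta)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a | theta')

    def forward(self, s):
        h = self.feature(s)
        v, a = self.value(h), self.advantage(h)
        # subtract the mean advantage over the action space, as in Eqn (5)
        return v + a - a.mean(dim=1, keepdim=True)
```

Combining this head with the double-DQN targets and the prioritized replay sketched above yields a 3DQN-with-PER configuration of the kind evaluated in the experiments.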

• The problem of traffic signal control can be modeled as a Markov Decision Process (MDP) due to its sequential characteristics. In this process, an agent interacts with traffic scenarios to capture feature data, makes decisions according to the dynamics of the environment, and collects evaluation data as feedback to improve its strategies (as presented in Fig. 1). The key to dealing with TSC problems lies in the design of the DRL state, action, and reward, which are presented in the following subsections.

      Figure 1. 

      The schematic diagram of the reinforcement learning process.

• The state space should contain information that accurately depicts the environment. For TSC, an isolated intersection is the basic unit for collecting vehicle, road, and signal information in the agent's learning process; its approaches are divided into cells, as shown in Fig. 2. We discretize the real-time position, speed, and acceleration of vehicles into three matrices. When a vehicle is located in a certain cell, the corresponding value in the position matrix P is set to 1, and the corresponding entries of the speed matrix V and the acceleration matrix A store the vehicle's instantaneous speed and acceleration. To ensure the accuracy of vehicle information, the cell size is slightly larger than the vehicle size. For road information, the number of vehicles N, average queue length $ \overline{L} $, average waiting time $ \overline{W} $, and average speed $ \overline{V} $ describe the intersection traffic conditions. By collecting these states in different directions, the agent can quickly determine the critical directions/lanes at the intersection using macroscopic information and evaluate the temporal and spatial relationships of vehicles using microscopic information. This enables the agent to effectively identify and locate congestion points, allowing for proactive responses and reasonable adjustments to the intersection signal control strategy. The road information can be described as follows:

      Figure 2. 

      Vehicle information extraction at one entrance of the intersection.

      $ RI=\{{N}_{ns},{N}_{ew},{\overline{L}}_{ns},{\overline{L}}_{ew},{\overline{W}}_{ns},{\overline{W}}_{ew},{\overline{V}}_{ns},{\overline{V}}_{ew}\} $ (6)

In addition, signal information is included for each agent: the previous phase ϕt for the agent that selects the next phase, and the current phase ϕt+1 for the agent that optimizes the timing. Therefore, the state space for TSC incorporates three types of information, namely st = {P, V, A, RI, ϕ}.
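
Assuming the simulation state is read through SUMO's TraCI interface, the sketch below shows one way the cell matrices P, V, A and the road-information vector RI of Eqn (6) could be assembled. The cell length, number of cells, lane grouping, and the use of halting-vehicle counts as a queue proxy are illustrative assumptions rather than the exact extraction code of this study.

```python
import numpy as np
import traci

CELL_LEN = 7.0   # assumed cell length (m), slightly larger than a vehicle
N_CELLS = 108    # cells per 750 m approach lane (assumption)

def build_state(lanes_ns, lanes_ew, phase):
    """Assemble the combined micro/macro state s_t = {P, V, A, RI, phi}."""
    lanes = lanes_ns + lanes_ew
    P = np.zeros((len(lanes), N_CELLS))
    V = np.zeros_like(P)
    A = np.zeros_like(P)
    for li, lane in enumerate(lanes):
        for veh in traci.lane.getLastStepVehicleIDs(lane):
            cell = min(int(traci.vehicle.getLanePosition(veh) // CELL_LEN),
                       N_CELLS - 1)
            P[li, cell] = 1.0
            V[li, cell] = traci.vehicle.getSpeed(veh)
            A[li, cell] = traci.vehicle.getAcceleration(veh)

    def macro(group):
        # feature-based road information per direction group (Eqn 6);
        # halting-vehicle count is used here as a proxy for queue length
        n = sum(traci.lane.getLastStepVehicleNumber(l) for l in group)
        queue = np.mean([traci.lane.getLastStepHaltingNumber(l) for l in group])
        wait = np.mean([traci.lane.getWaitingTime(l) for l in group])
        speed = np.mean([traci.lane.getLastStepMeanSpeed(l) for l in group])
        return [n, queue, wait, speed]

    RI = np.asarray(macro(lanes_ns) + macro(lanes_ew))
    return P, V, A, RI, phase
```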

• In this study, we design separate actions for the two agents, namely phase switching and phase duration. The phase-switching agent selects which phase should be activated at the next interval; thus, the action space A1 consists of all available phases ϕ. The phase duration agent receives action a1 from A1 and determines the duration of a1. The action space A2 can then be defined as a discrete set of integers representing the phase duration in seconds, because signal timing values such as cycle length, green time, and red time are implemented as integers in practice.

$ \begin{array}{l}A_{1}=\{\phi_{1},\phi_{2},\cdots,\phi_{p}\}\\ A_{2}=\{0,1,\cdots,T\}\end{array} $ (7)

To address the problems of RL-based TSC (insufficient or excessive green duration, unexpected phase switching, etc.), several widely adopted rules are implemented to restrict arbitrary action selection (a schematic sketch follows the list):

(1) Minimum green time: A minimum green time is employed in TSC to align with drivers' psychological expectations. Consequently, the controller can perform a phase switch only after the minimum green time has been reached.

      (2) Default phase: During the learning stage, RL-based TSC would randomly select a phase to activate in the absence of vehicles at the intersection. This random selection can have adverse effects on both traffic efficiency and safety. To address this, a default phase representing the primary movements of the major approach is established to prevent arbitrary phase changes. Specifically, the default phase is set when there is no vehicle within the intersection range, with transitions following the phase switching sequence of traditional signal timing, such as switching from north-south straight-through to north-south left-turn.

(3) Maximum green time: Setting a maximum green time helps to ensure a fair allocation of right-of-way and prevents excessive delay for directions with low traffic volume. If the duration of any phase exceeds the maximum green time, the signal is immediately switched to another phase.
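
The three rules can be applied as a filter on the phase-switching agent's greedy selection, as in the schematic sketch below; the threshold values and phase indices are assumptions for illustration, not the exact configuration of this study.

```python
MIN_GREEN = 10      # assumed minimum green time (s)
MAX_GREEN = 60      # assumed maximum green time (s)
DEFAULT_PHASE = 0   # primary movements of the major approach

def constrained_phase_choice(q_values, phases, current_phase,
                             elapsed_green, vehicles_present):
    """Apply rules (1)-(3) before the agent's greedy phase selection."""
    # Rule (2): fall back to the default phase when the intersection is empty
    if not vehicles_present:
        return DEFAULT_PHASE
    # Rule (1): hold the current phase until the minimum green time is reached
    if elapsed_green < MIN_GREEN:
        return current_phase
    # Rule (3): once the maximum green time is exceeded, the current phase is
    # no longer admissible and a switch is forced
    candidates = [p for p in phases
                  if not (p == current_phase and elapsed_green >= MAX_GREEN)]
    return max(candidates, key=lambda p: q_values[p])
```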

• The reward determines whether an agent can achieve the intended objectives. Most related studies exploit waiting time[19,43,53], queue length[54−56], or vehicle throughput[57−59] as components of the reward function. While many evaluation metrics, including queue length, average speed, and throughput, can train agents to achieve the learning goal to some extent, they may suffer from unclear reward signals during training. In this study, we use the total waiting time as the reward component, following the studies by Mannion et al.[19], Van Hasselt et al.[43], and Nishi et al.[53], which can be expressed as:

$ W_{t}=\sum\nolimits_{i=1}^{n}wt_{(i,t)} $ (8)

where Wt is the total waiting time of all vehicles currently in the scenario, wt(i,t) is the waiting time of vehicle i at time step t, and n is the total number of vehicles at that time. The reward is defined as follows:

      $ {R}_{t}={W}_{t+1}-{W}_{t} $ (9)

where Rt is the reward at time step t. When Rt < 0, the cumulative waiting time of the vehicles in the road network has been reduced.
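
Assuming waiting times are read through TraCI, Eqns (8) and (9) can be computed as sketched below; the use of each vehicle's accumulated waiting time is an assumption about how the waiting-time definition maps onto the simulator's counters.

```python
import traci

def total_waiting_time():
    """Eqn (8): total waiting time of all vehicles currently in the network."""
    return sum(traci.vehicle.getAccumulatedWaitingTime(v)
               for v in traci.vehicle.getIDList())

# Inside a running simulation loop:
w_prev = total_waiting_time()
traci.simulationStep()
w_next = total_waiting_time()
reward = w_next - w_prev   # Eqn (9): negative when total waiting time decreases
```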

• To validate the performance of the proposed joint control model, experiments were conducted using the Simulation of Urban Mobility (SUMO) platform. Firstly, we provide a brief introduction to the simulation settings. Then, we conduct experiments and compare the results with the widely adopted Webster and MaxPressure methods. Finally, ablation experiments are performed using different representations of states and actions to examine how these elements affect the results.

• 3DQN was chosen for learning and decision-making in the joint control of the two agents, and ReLU was used as the activation function for all networks. Parameter settings are listed in Table 1[22,29]. An ϵ-greedy scheme was added to ensure that the method can explore all executable actions even near the end of the training process, alleviating the exploration-exploitation dilemma[60]. In this study, the exploration rate ϵ decreased from 1 to 0 as the training episodes progressed (a minimal sketch of this scheme is given after Table 1).

      Table 1.  Parameter settings of the training process.

      Parameters Description Value
      total_episodes Total number of training episodes where agents interact with the environment and update strategies 2,000
      max_steps Maximum steps (s) in one episode 3,600
      iterations Number of batches extracted during training 100
      batch size Number of data in one batch 256
      memory_size_min Minimum memory size 512
      memory_size_max Maximum memory size 20,480
      learning_rate Step size in the optimization process 0.001
      gamma Discount factor 0.75
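
A minimal sketch of the ϵ-greedy selection with the exploration rate decaying from 1 to 0 over the 2,000 training episodes is shown below; the linear form of the decay is an assumption consistent with the description above.

```python
import random

TOTAL_EPISODES = 2000  # from Table 1

def epsilon(episode):
    """Exploration rate decreasing from 1 to 0 as training progresses."""
    return max(0.0, 1.0 - episode / TOTAL_EPISODES)

def epsilon_greedy(q_values, episode):
    # explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon(episode):
        return random.randrange(len(q_values))
    return int(max(range(len(q_values)), key=lambda a: q_values[a]))
```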

The experiment scenario is illustrated in Fig. 3. We assumed a four-legged intersection with four lanes in each direction, including two through lanes, a shared through and right-turn lane, and a left-turn lane. The lane length was set to 750 m, and vehicle arrivals follow a Weibull distribution. To test the performance of each method under different traffic conditions, we first examined three traffic volumes to simulate low, medium, and high traffic conditions, corresponding to flows of 1,000, 1,500, and 2,000 veh/h at the junction, respectively. As for the specific traffic volume in different directions, we simulated vehicle routing stochastically to validate the model's robustness. Traffic flow directions are categorized into straight-through and turning cases: straight-through situations include four scenarios (e.g., north to south), while turning situations consist of eight scenarios (including left and right turns, e.g., north to east). The generation positions of vehicles are uniformly distributed across all entrances, with a 75% probability of going straight and a 25% probability of turning; among turning movements, left and right turns are equally likely (a sketch of this routing scheme is given after Fig. 3). All experiments were implemented using the Python API provided by SUMO, and the neural networks were trained with PyTorch.

      Figure 3. 

      The simulated intersection scenario in all experiments.
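
The stochastic routing described above (origins uniform over the four entrances, 75% through, 12.5% left turns, 12.5% right turns) can be reproduced with a sampler such as the one below; the turn geometry mapping is an assumption consistent with the 'north to east' example, and the Weibull-distributed arrival times are handled separately.

```python
import random

ENTRANCES = ["N", "E", "S", "W"]
THROUGH = {"N": "S", "E": "W", "S": "N", "W": "E"}
LEFT    = {"N": "E", "E": "S", "S": "W", "W": "N"}   # assumed turn geometry
RIGHT   = {"N": "W", "E": "N", "S": "E", "W": "S"}

def sample_route():
    """Sample one vehicle's origin-destination pair."""
    origin = random.choice(ENTRANCES)   # uniform over all entrances
    r = random.random()
    if r < 0.75:                        # 75% straight-through
        return origin, THROUGH[origin]
    if r < 0.875:                       # 12.5% left turn
        return origin, LEFT[origin]
    return origin, RIGHT[origin]        # 12.5% right turn
```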

• We compared the joint control model with the benchmark methods listed below. To ensure fairness, all methods were carefully tuned in the experiments. We chose queue length, vehicle speed, and travel time as evaluation metrics because they intuitively reflect traffic mobility, congestion, and throughput efficiency, allowing for a comprehensive assessment of the overall performance of the traffic system and the effectiveness of the signal control strategy.

      ● Webster: It operates on a fixed cycle, using road and traffic flow data from intersections to determine the duration of each phase within the cycle. In addition, phase switching is carried out in a fixed sequence[61].

      ● MaxPressure: It selects a phase for switching based on the current traffic situation at the intersection at every fixed time interval. The selection criteria are to prioritize the phase with the highest pressure. The phase duration of the signal timing scheme is fixed while the sequence is dynamic[62].

To comprehensively understand the experimental results, we use the average queue length, average travel time, and average vehicle speed to evaluate performance. In existing studies, these metrics are often used to reflect the actual capacity of an intersection; they are calculated as the average length of all vehicle queues, the average travel time of all vehicles, and the average speed of all vehicles in the system. The comparative results for all metrics are given in Table 2. By analyzing these results, we have the following findings:

      Table 2.  Comparative results with benchmark models.

Method          Average queue length (m)     Average travel time (s)        Average vehicle speed (m/s)
                Low     Medium   High        Low      Medium   High         Low     Medium   High
Webster         22.15   46.35    94.85       139.61   147.95   161.59       11.90   11.23    10.30
MaxPressure     18.95   38.25    81.90       136.24   143.35   154.76       12.03   11.44    10.45
Joint control   17.65   35.80    77.95       135.74   141.34   153.98       12.11   11.46    10.71
      Values in bold indicate the optimal results across different models.

(1) Among the traditional methods, the Webster method performs the worst on all three metrics, mainly due to its limited expert-experience guidance and inflexibility. MaxPressure, known as an efficient signal timing control method, exhibits satisfactory outcomes in terms of average queue length, travel time, and vehicle speed. The proposed joint control model outperforms all other methods and is the most effective in improving intersection conditions.

      (2) The analysis under varying traffic volumes reveals that the Webster method faces great challenges when applied to scenarios with high traffic flow, as its performance drastically declines with the increasing demand. In contrast, both the MaxPressure and joint control models can exhibit stable adaptability to different traffic conditions. In particular, the joint control model outperforms others in low traffic flow, achieving a 6.86% reduction in the average queue length compared to MaxPressure.

(3) The choice of phase-switching interval significantly impacts the performance of MaxPressure, and determining the most appropriate interval in advance is challenging: enumerating all possible intervals is computationally inefficient and time-consuming. In contrast, the joint control model does not require the interval to be pre-determined, making it more flexible under dynamic traffic conditions. Further simulation experiments with dynamic traffic volumes ranging from 1,000 to 2,000 vehicles per hour (as shown in Fig. 4) were conducted, and the results are summarized in Table 3. The joint control model outperforms MaxPressure, achieving a 7.56% reduction in average queue length.

      Figure 4. 

      Settings of dynamic traffic flow.

      Table 3.  The results of tests under the temporal dynamic traffic flow.

Metric                         Joint control   MaxPressure   Improvement
Average queue length (m)       48.50           52.47         −7.56%
Average travel time (s)        146.48          147.87        −0.94%
Average vehicle speed (m/s)    11.46           11.19         +2.41%
• We conducted two ablation experiments on the proposed method to explore the effects of state representation and action selection. The settings are presented as follows:

      State representation is a crucial aspect of the joint control model, where both microscopic vehicle information and macroscopic road information are employed as its state. To investigate how the information influences strategy selection and the model's performance, we conducted experiments using micro vehicle information and macro road information as separate state representations.

Action selection has been explored in some existing studies, but typically only a single type of action has been applied to control signal settings, and few studies have delved into the impact of action selection on model performance in terms of the type and dimension of the action. In this study, we introduce a novel approach involving two dimensions of different action types, simultaneously controlling phase and timing. By comparing phase control only, timing control only, and their combination, we aim to gain valuable insights into the impacts of different action selections on overall performance.

The training curves for all of the above settings are shown in Fig. 5, and the results of the ablation experiments are given in Table 4. The results highlight the importance of both state representation and action selection for the performance of the joint control model. Regarding state representation, we observed that the model utilizing micro information outperforms the one using macro information. This suggests that microscopic information provides more detailed and specific descriptions of traffic scenarios, enabling the model to leverage and incorporate it effectively. In addition, macro information offers a broader perspective on the overall traffic conditions, enhancing decision-making accuracy as a supplementary factor. However, relying solely on macro information makes it difficult to gather effective traffic data, leading to sub-optimal decisions and inferior performance. As a result, a balanced integration of both micro and macro information proves critical for optimizing the joint control model's performance.

      Figure 5. 

      The training curves of ablation experiments.

      Table 4.  The simulation results of ablation experiments.

Model                           Average queue length (m)   Average travel time (s)   Average vehicle speed (m/s)
Micro info only                 79.90                      154.50                    10.42
Macro info only                 215.80                     206.62                    8.96
Phase control – single agent    83.25                      155.97                    10.45
Timing control – single agent   135.58                     180.44                    9.92
Joint control                   77.95                      153.98                    10.71
      Values in bold indicate the optimal results across different models.

As for action selection, we found that phase-switching control outperforms the strategy with a fixed sequence and better accommodates the spatio-temporal imbalances in traffic flow. Furthermore, the joint control model is more effective in reducing traffic congestion. Its superiority can be attributed to its capability to jointly optimize the phase sequence and the switching interval based on real-time traffic conditions. In contrast, adjusting only the phase duration is insufficient to alleviate congestion caused by excessive vehicle volumes. Thus, the holistic approach of the joint model, controlling both phase sequencing and duration, is vital for optimizing traffic flow and minimizing congestion.

• To further investigate how the proposed joint control method operates under dynamic traffic conditions, Figure 6 provides insights into the ratio of each phase, the number of vehicles, and the cumulative queue length in each lane. Note that 'vehicles in each lane' and 'cumulative queue length in each lane' are categorized by phase. The traffic flow settings were the same as those used for Table 3. Figure 6b illustrates a well-balanced traffic flow: for example, the number of vehicles driving from N(S) to S(N) is similar to that driving from E(W) to W(E). From Fig. 6a, we find that the duration of phase N-S(S-N) accounts for almost half of the entire process, while E-W(W-E) only occupies about a quarter. Hence, the proposed model has learned an asymmetric strategy to clear queues. Typically, the ratio of each phase corresponds to the number of vehicles in each lane, as more vehicles in the lanes necessitate more time to clear them, for example, in directions N-E(S-W) and E-S(W-N). However, the cases of N-S(S-N) and E-W(W-E) do not follow this rule, which could be due to variations in vehicle arrival frequency in different directions: in general, denser vehicle arrivals require a shorter time to clear, as the vehicles arrive and leave within a more concentrated period. Examining the proportion of cumulative queue length in each lane in Fig. 6c, it becomes evident that the N-E(S-W) direction exhibits the highest ratio, while E-S(W-N), E-W(W-E), and N-S(S-N) account for 18.6%, 16.3%, and 13.2%, respectively. These results show that the proposed asymmetric strategy sacrifices some traffic flow in the left-turn lanes to optimize the overall performance. For comparison purposes, we define a symmetric strategy that allocates phase durations based on the traffic flow, i.e., it assigns larger phase ratios to directions with more vehicles in each lane, as shown in Fig. 6d. With the symmetric strategy, the signal timing is set to 20 s for each phase.

      Figure 6. 

      The ratios of phase duration, the number of vehicles, and cumulative queue length with two different strategies. (a) The proportion of duration of all phases with the asymmetric strategy (the proposed model), (b) The proportion of vehicles in all lanes with the asymmetric strategy (the proposed model), (c) The proportion of queue length in all lanes with the asymmetric strategy (the proposed model), (d) The proportion of queue length in all lanes with the symmetric strategy.

Figure 7a demonstrates that, with the proposed asymmetric strategy, the cumulative queue length distribution remains balanced across all controlled directions throughout the whole process, while the distribution of vehicles passing in each lane is asymmetrical. For instance, the cumulative queue length in E-W(W-E) is 3.1% greater than that in N-S(S-N) despite a 21.9% lower phase duration. With the symmetric strategy, however, the queue lengths in different directions are drastically imbalanced, as shown in Fig. 7b.

      Figure 7. 

      The time-varying queue length proportion under asymmetric and symmetric strategies. (a) With the asymmetric strategy (the proposed model), (b) With the symmetric strategy.

      In conclusion, this study demonstrates the crucial role of state representation and action selection in the joint control model's performance. The findings emphasize the significance of balancing micro and macro information and highlight the advantages of dynamic phase sequence control in traffic management. These insights contribute to enhancing the efficiency of traffic signal control systems, ultimately leading to traffic flow improvements and congestion mitigation in urban areas.

• This study introduced a novel deep reinforcement learning-based joint control model for signal optimization at isolated intersections. The model employs vehicle information, road information, and signal data as inputs, allowing a comprehensive understanding of intersection dynamics. To effectively smooth traffic flow, two intelligent agents were designed to work in tandem, cooperatively controlling signal phase switching and phase duration.

Through simulation experiments, we thoroughly evaluated the performance of the proposed approach in various traffic scenarios. Under fixed traffic flow conditions, the joint control model demonstrated a slight but notable advantage over the baseline methods. In dynamic traffic flow scenarios, the algorithm obtains states and rewards from the scenario and makes decisions, generating samples from various traffic flows; by learning from these different samples, it can determine the optimal signal control strategy under various traffic flow conditions. By continuously collecting traffic data, the algorithm can further refine the model, enabling it to adapt effectively to dynamic traffic flow. The real strength of the model came to the forefront when traffic flow changed dynamically, where it exhibited significant benefits over the other methods.

      Delving into the ablation experiments, we unveiled the pivotal role played by microscopic vehicle information in the signal control process. The specific data related to individual vehicles significantly influenced the model's performance, enabling a more precise and effective traffic control strategy. Moreover, we found that incorporating dynamic phase sequence control further boosted the model's performance, particularly in alleviating traffic congestion. The flexibility and adaptability provided by dynamic phase sequence adjustments can better improve traffic conditions compared to other methods.

      In summary, the proposed joint control model is a promising solution for traffic management, harnessing the power of microscopic vehicle data and dynamic phase sequence control. By coordinating signal switching and duration, the model effectively improves intersection conditions.

There are still some limitations to this study, mainly its exclusive focus on traditional four-way, four-phase intersections. To enhance the model's applicability, future studies can expand the scope of scenarios to include other intersection/vehicle types and incorporate pedestrian waiting time, ensuring a more comprehensive and effective approach. The heterogeneity of driving styles among different drivers is another important factor affecting the effectiveness of signal control: it influences following distances, vehicle speeds, and lane-changing behaviors, which change the state representation of the environment and, consequently, affect the agent's ability to determine the optimal signal control strategy.

In addition, the framework needs to be extended to multi-intersection control. Future research will focus on how to effectively utilize and coordinate the state and reward information of each intersection, for example through improved state representations and comprehensive reward designs. It is also essential to explore differences in algorithm performance under different road structures, vehicle arrival patterns, and partially observable conditions to ensure adaptability to real-world scenarios. Our approach proves more stable and effective than fixed-time signal control in various simulated scenarios; however, applying the algorithm in the real world requires consideration of various factors, such as bus priority, fairness in traffic flow allocation under mixed conditions, and the ability to handle unexpected events. In future experiments, we will consider incorporating different vehicle types and pedestrian flows into the scenarios, as well as introducing unexpected events to test the algorithm's generalization capability. Providing a globally optimal control strategy for multiple agents, to address the failure caused by the dramatic increase in action-space dimension in multi-intersection scenarios, is another possible extension for follow-up research. Measures should also be implemented to alleviate drivers' confusion and psychological discomfort caused by unexpected phase switching. By addressing these improvements, the joint control model can potentially make significant contributions to intelligent transportation systems, smoothing traffic flow and minimizing congestion in urban traffic environments.

      • The work is supported by the Tencent-SWJTU Joint Laboratory of Intelligent Transportation (Grant No. R113623H01015), the National Natural Science Foundation of China (Grant Nos 52072316 and 52302418), the Sichuan Science and Technology Program (Grant No. 2024NSFSC0942), the Fundamental Research Funds for the Central Universities (Grant No. 2682023CX047), and the Postdoctoral International Exchange Program (Grant No. YJ20220311).

      • The authors confirm contribution to the paper as follows: study conception and design: Sun Z, Jia X, Ji A, Lin X; analysis and interpretation of results: Jia X, Cai Y, Ji A, Liu L, Tu Y; draft manuscript preparation: Jia X, Cai Y, Ji A, Wang W. All authors reviewed the results and approved the final version of the manuscript.

      • Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

      • The authors declare that they have no conflict of interest.

      • Copyright: © 2025 by the author(s). Published by Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.
  • About this article
    Cite this article
    Sun Z, Jia X, Cai Y, Ji A, Lin X, et al. 2025. Joint control of traffic signal phase sequence and timing: a deep reinforcement learning method. Digital Transportation and Safety 4(2): 118−126 doi: 10.48130/dts-0025-0008
