Figures (7)  Tables (4)
    • Figure 1. 

      The schematic diagram of the reinforcement learning process.

    • Figure 2. 

      Vehicle information extraction at one entrance of the intersection.

    • Figure 3. 

      The simulated intersection scenario used in all experiments.

    • Figure 4. 

      Settings of dynamic traffic flow.

    • Figure 5. 

      The training curves of ablation experiments.

    • Figure 6. 

      Proportions of phase duration, vehicle count, and cumulative queue length under two different strategies. (a) Proportion of total duration taken by each phase under the asymmetric strategy (the proposed model); (b) proportion of vehicles in each lane under the asymmetric strategy (the proposed model); (c) proportion of queue length in each lane under the asymmetric strategy (the proposed model); (d) proportion of queue length in each lane under the symmetric strategy.

    • Figure 7. 

      The time-varying queue length proportion under asymmetric and symmetric strategies. (a) With the asymmetric strategy (the proposed model), (b) With the symmetric strategy.

    • Parameter         Value    Description
      total_episodes    2,000    Total number of training episodes in which agents interact with the environment and update their strategies
      max_steps         3,600    Maximum number of simulation steps (s) in one episode
      iterations        100      Number of batches sampled during training
      batch_size        256      Number of samples in one batch
      memory_size_min   512      Minimum memory size
      memory_size_max   20,480   Maximum memory size
      learning_rate     0.001    Step size in the optimization process
      gamma             0.75     Discount factor

      Table 1. 

      Parameter settings of the training process.

    • Model           Average queue length (m)    Average travel time (s)     Average vehicle speed (m/s)
                      Low     Medium  High        Low     Medium  High        Low     Medium  High
      Webster         22.15   46.35   94.85       139.61  147.95  161.59      11.90   11.23   10.30
      Max pressure    18.95   38.25   81.90       136.24  143.35  154.76      12.03   11.44   10.45
      Joint control   17.65   35.80   77.95       135.74  141.34  153.98      12.11   11.46   10.71
      Values in bold indicate the best result for each metric across models; Joint control attains the best value in every column.

      Table 2. 

      Comparative results with benchmark models.

    • Metric                         Joint control   Max pressure   Improvement
      Average queue length (m)       48.50           52.47          −7.56%
      Average travel time (s)        146.48          147.87         −0.94%
      Average vehicle speed (m/s)    11.46           11.19          +2.41%

      Table 3. 

      Results of tests under temporally dynamic traffic flow.

    • Model                           Average queue length (m)    Average travel time (s)    Average vehicle speed (m/s)
      Micro info only                 79.90                       154.50                     10.42
      Macro info only                 215.80                      206.62                     8.96
      Phase control – single agent    83.25                       155.97                     10.45
      Timing control – single agent   135.58                      180.44                     9.92
      Joint control                   77.95                       153.98                     10.71
      Values in bold indicate the best result for each metric across models; Joint control attains the best value in every column.

      Table 4. 

      Simulation results of the ablation experiments.
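The parameter names in Table 1 (total_episodes, batch_size, memory_size_min, memory_size_max, gamma) are typical of a DQN-style training loop with experience replay. The sketch below illustrates how such settings are commonly wired together, assuming that kind of agent; it is not the authors' code. TrainingConfig, ReplayMemory, and q_target are hypothetical names, and the rule that no batch is drawn until memory_size_min samples are stored is an assumed reading of "Minimum memory size".

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Training hyperparameters copied from Table 1 (hypothetical container)."""
    total_episodes: int = 2000
    max_steps: int = 3600          # simulation steps (s) per episode
    iterations: int = 100          # batches sampled during training
    batch_size: int = 256
    memory_size_min: int = 512
    memory_size_max: int = 20480
    learning_rate: float = 0.001
    gamma: float = 0.75            # discount factor


class ReplayMemory:
    """Hypothetical bounded experience-replay buffer (not the authors' code)."""

    def __init__(self, size_min: int, size_max: int) -> None:
        self.size_min = size_min
        # A deque with maxlen drops the oldest sample once size_max is reached.
        self.samples = deque(maxlen=size_max)

    def add(self, sample) -> None:
        self.samples.append(sample)

    def sample(self, n: int) -> list:
        # Assumption: no batch is returned until size_min samples are stored.
        if len(self.samples) < self.size_min:
            return []
        return random.sample(list(self.samples), min(n, len(self.samples)))


def q_target(reward: float, next_q_values: list, gamma: float) -> float:
    """One-step Q-learning target r + gamma * max_a' Q(s', a') (assumed DQN agent)."""
    return reward + gamma * max(next_q_values)


cfg = TrainingConfig()
memory = ReplayMemory(cfg.memory_size_min, cfg.memory_size_max)
for t in range(600):
    # Placeholder transition: (state, action, reward, next_state)
    memory.add((t, 0, 0.0, t + 1))
print(len(memory.sample(cfg.batch_size)))      # 256
print(q_target(-1.0, [2.0, 4.0], cfg.gamma))   # -1.0 + 0.75 * 4.0 = 2.0
```

With gamma = 0.75, a reward k steps ahead is weighted by 0.75^k (about 0.056 after 10 steps), so the learned values emphasize short-horizon outcomes.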
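The Improvement column in Table 3 is consistent with the relative change of Joint control with respect to Max pressure, (Joint control − Max pressure) / Max pressure × 100. The check below assumes that formula, which the table itself does not state. From the rounded table values it reproduces −0.94% and +2.41%; for queue length it gives −7.57% rather than the reported −7.56%, a small gap that may stem from rounding of the underlying values. The function and dictionary names are illustrative.

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Table 3 values: metric -> (Joint control, Max pressure)
TABLE3 = {
    "Average queue length (m)": (48.50, 52.47),
    "Average travel time (s)": (146.48, 147.87),
    "Average vehicle speed (m/s)": (11.46, 11.19),
}

for metric, (joint, max_pressure) in TABLE3.items():
    print(f"{metric}: {relative_change(joint, max_pressure):+.2f}%")
# Average queue length (m): -7.57%
# Average travel time (s): -0.94%
# Average vehicle speed (m/s): +2.41%
```

A negative change is an improvement for queue length and travel time, while a positive change is an improvement for speed, which is why the signs in the column differ.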
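The bold markers in Tables 2 and 4 can be checked mechanically: for average queue length and average travel time the best model has the lowest value, and for average vehicle speed the highest. A minimal sketch using the table values; TABLE2, TABLE4, and best_per_column are illustrative names.

```python
# Table 2 columns: queue length (m) Low/Medium/High, travel time (s) Low/Medium/High,
# vehicle speed (m/s) Low/Medium/High
TABLE2 = {
    "Webster": (22.15, 46.35, 94.85, 139.61, 147.95, 161.59, 11.90, 11.23, 10.30),
    "Max pressure": (18.95, 38.25, 81.90, 136.24, 143.35, 154.76, 12.03, 11.44, 10.45),
    "Joint control": (17.65, 35.80, 77.95, 135.74, 141.34, 153.98, 12.11, 11.46, 10.71),
}
TABLE2_HIGHER_IS_BETTER = (False,) * 6 + (True,) * 3

# Table 4 columns: queue length (m), travel time (s), vehicle speed (m/s)
TABLE4 = {
    "Micro info only": (79.90, 154.50, 10.42),
    "Macro info only": (215.80, 206.62, 8.96),
    "Phase control – single agent": (83.25, 155.97, 10.45),
    "Timing control – single agent": (135.58, 180.44, 9.92),
    "Joint control": (77.95, 153.98, 10.71),
}
TABLE4_HIGHER_IS_BETTER = (False, False, True)


def best_per_column(table: dict, higher_is_better: tuple) -> list:
    """Return the name of the best model for each column."""
    winners = []
    for col, higher in enumerate(higher_is_better):
        choose = max if higher else min
        winners.append(choose(table, key=lambda model: table[model][col]))
    return winners


print(best_per_column(TABLE2, TABLE2_HIGHER_IS_BETTER))
print(best_per_column(TABLE4, TABLE4_HIGHER_IS_BETTER))
```

Both calls return "Joint control" for every column, matching the notes under Tables 2 and 4.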