Figures (12)  Tables (10)
    • Figure 1. 

      An urban traffic simulation environment for traffic research.

    • Figure 2. 

An illustration of traffic signal control at an intersection. According to the current traffic state $ {S}_{t} $ and the reward $ {R}_{t} $, the agent selects and executes an action $ {A}_{t} $ (change or maintain the current traffic light). The environment then returns a new traffic state $ {S}_{t+1} $ and a new reward $ {R}_{t+1} $.

    • Figure 3. 

      The MARL structure in urban traffic signal control.

    • Figure 4. 

      The General City Traffic Computing System (GCTCS) for urban traffic signal control.

    • Figure 5. 

The process of constructing the real urban traffic environment.

    • Figure 6. 

General-MARL is composed of three sub-algorithms corresponding to the different layers of the GCTCS architecture.

    • Figure 7. 

The process of abstracting traffic information from video to generate text information about the traffic flow. Detecting vehicles in the three predefined regions distinguishes the different directions of passing vehicles.

    • Figure 8. 

      The communication process between agents in the communication module.

    • Figure 9. 

An illustration of the topology map of a real traffic intersection.

    • Figure 10. 

      Multi-intersection traffic signal control training process (without network delay).

    • Figure 11. 

      Multi-intersection traffic signal control training process (with network delay).

    • Figure 12. 

      Comparative results among different algorithms.

    • 1: Input the video information of the traffic situation.
      2: Capture one frame from the video: G
      3: Use and fine-tune YOLO to recognize each vehicle's position (x, y) and type in G.
      4: for vehicle in G do
      5: Determine the vehicle's direction by checking which of the three predefined regions contains its position (x, y).
      6: Record the traffic state text information (vehicle id, type, direction, and timestamp) into the traffic state text T.
      7: end for
      8: Apply GCN-GAN to T for traffic flow prediction.
      9: Connect the traffic flow prediction results to the urban simulation environment.

      Table 1. 

      Edge-General-control algorithm.
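
      As a concrete illustration of this edge-layer procedure, the following is a minimal Python sketch. The `detect_vehicles` stub, the region coordinates, and the record fields are assumptions standing in for the fine-tuned YOLO detector and the three predefined regions used in the paper; the GCN-GAN prediction step (line 8) is left out.

```python
import cv2  # OpenCV, used here only to grab a single frame

# Hypothetical detection regions: region name -> bounding box (x1, y1, x2, y2),
# each region corresponding to one direction of passing vehicles.
REGIONS = {
    "left_turn": (0, 160, 210, 480),
    "straight": (210, 160, 430, 480),
    "right_turn": (430, 160, 640, 480),
}

def detect_vehicles(frame):
    """Placeholder for the fine-tuned YOLO detector (line 3).
    Returns (vehicle_id, vehicle_type, x, y) tuples; dummy output for illustration."""
    return [(1, "car", 120, 300), (2, "bus", 320, 400)]

def direction_of(x, y):
    """Line 5: assign a direction by checking which predefined region contains (x, y)."""
    for name, (x1, y1, x2, y2) in REGIONS.items():
        if x1 <= x <= x2 and y1 <= y <= y2:
            return name
    return "unknown"

def extract_traffic_text(video_path):
    """Lines 1-7: build the traffic state text T from one captured frame G."""
    cap = cv2.VideoCapture(video_path)          # line 1: video input
    ok, frame = cap.read()                      # line 2: capture one frame G
    timestamp = cap.get(cv2.CAP_PROP_POS_MSEC)
    cap.release()
    if not ok:
        return []
    records = []
    for vid, vtype, x, y in detect_vehicles(frame):   # line 3
        records.append({                               # line 6
            "vehicle_id": vid,
            "type": vtype,
            "direction": direction_of(x, y),           # line 5
            "timestamp": timestamp,
        })
    return records  # T, later passed to GCN-GAN for traffic flow prediction (line 8)
```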

    • 1: Init Episode $ B > 0 $, $ b=1 $; Minibatch Size $ \hat{M} > 0 $, Game Step $ N $
      2: Init Replay Buffer $ D $, $ {\theta }_{V} $ and $ {\theta }_{A} $
      3: repeat
      4: Reset the environment and go to the initial state $ {x}_{0} $
      5: repeat
      6: Select $ u\leftarrow {\pi }^{{\theta }_{A}}\left(x\right) $ or select $ u $ randomly (e.g., ϵ-greedy)
      7: Observe $ {y}_{t}=\left({x}_{t-1},u,{x}_{t}\right) $
      8: Store $ {y}_{t} $ in the Replay Buffer $ D $
      9: Sample from the Replay Buffer: $ Y={\left\{{y}_{i}\right\}}_{i=1}^{\hat{M}} $
      10: Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V},{\theta }_{A}\right) $, fixing $ {\theta }_{A} $ and updating $ {\theta }_{V} $
      11: Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V},{\theta }_{A}\right) $, fixing $ {\theta }_{V} $ and updating $ {\theta }_{A} $
      12: until $ t > N $
      13: until $ b > B $
      14: return $ {\theta }_{V} $ and $ {\theta }_{A} $

      Table 2. 

      Nash-MARL Module.
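
      Read as plain Python, the control flow of the Nash-MARL module looks roughly like the sketch below. The `env`, `policy`, `optimize_V`, and `optimize_A` callables are hypothetical stand-ins for the environment interface and the two alternating optimization steps; only the loop structure follows the pseudocode.

```python
import random
from collections import deque

def nash_marl_train(env, policy, optimize_V, optimize_A,
                    episodes=10, steps=100, minibatch=32, epsilon=0.1):
    """Mirror of the Nash-MARL module: alternating optimization of theta_V and
    theta_A over minibatches drawn from a shared replay buffer."""
    replay = deque(maxlen=10_000)            # Replay Buffer D
    theta_V, theta_A = {}, {}                # parameter sets (stand-ins)
    for b in range(episodes):                # outer repeat ... until b > B
        x = env.reset()                      # go to x_0
        for t in range(steps):               # inner repeat ... until t > N
            # line 6: epsilon-greedy selection from the current policy
            u = env.sample_action() if random.random() < epsilon else policy(theta_A, x)
            x_next = env.step(u)             # line 7: observe the transition
            y_t = (x, u, x_next)
            replay.append(y_t)               # line 8: store y_t in D
            # line 9: sample Y from D, then optimize over Y ∪ {y_t}
            batch = random.sample(list(replay), min(minibatch, len(replay))) + [y_t]
            theta_V = optimize_V(theta_V, theta_A, batch)  # line 10: fix theta_A
            theta_A = optimize_A(theta_V, theta_A, batch)  # line 11: fix theta_V
            x = x_next
    return theta_V, theta_A
```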

    • 1: Initialize the communication matrix $ {C}_{0} $ of all agents
      2: Initialize the parameters $ {\theta }_{Sender}^{i} $ and $ {\theta }_{Receiver}^{i} $ of each agent
      3: repeat
      4: Receiver of $ {Agent}^{i} $: uses the attention mechanism to generate the communication matrix $ {\hat{C}}_{t} $
      5: Sender of $ {Agent}^{i} $: chooses an action $ {a}_{t+1}^{i} $ from the policy selection network, or chooses an action randomly (e.g., ϵ-greedy exploration)
      6: Sender of $ {Agent}^{i} $: generates its own message $ {c}_{t+1}^{i} $ from the receiver's communication matrix $ {\hat{C}}_{t} $
      7: Collect the joint actions of all agents, execute $ {a}_{t+1}^{1},\cdots ,{a}_{t+1}^{N} $, and obtain the reward $ {R}_{t+1} $ and the next state $ {X}_{t+1} $ from the environment
      8: until the end of the Episode
      9: return $ {\theta }_{Sender}^{i} $ and $ {\theta }_{Receiver}^{i} $ for each agent

      Table 3. 

      The communication module.
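
      A minimal numerical sketch of one receiver/sender exchange is given below. The scaled dot-product attention, the message dimensions, and the sender weight matrix are assumptions used only to illustrate how a receiver could aggregate other agents' messages into its communication vector and how a sender could emit its next message.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def receiver_step(own_state, incoming_messages):
    """Receiver of Agent^i: weight the other agents' messages with scaled
    dot-product attention and aggregate them into one vector (a stand-in for
    this agent's row of the communication matrix C_t)."""
    keys = np.stack(incoming_messages)                   # one message per sender
    scores = keys @ own_state / np.sqrt(own_state.size)  # scaled dot-product scores
    weights = softmax(scores)                            # attention over senders
    return weights @ keys                                # aggregated communication vector

def sender_step(own_state, comm_vector, w_message):
    """Sender of Agent^i: generate the outgoing message c_{t+1}^i from the
    agent's state and the aggregated communication vector."""
    return np.tanh(w_message @ np.concatenate([own_state, comm_vector]))

# Tiny usage example with 3 agents and 4-dimensional states/messages.
rng = np.random.default_rng(0)
states = [rng.normal(size=4) for _ in range(3)]
w = rng.normal(size=(4, 8))                   # hypothetical sender weights
comm = receiver_step(states[0], states[1:])   # what agent 0 attends to
msg = sender_step(states[0], comm, w)         # agent 0's next message
```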

    • 1: Apply the communication module:
      2: Initialize the communication matrix $ {C}_{0} $ of all fog computing node Agents
      3: Initialize the parameters $ {\theta }_{Sender}^{i} $ and $ {\theta }_{Receiver}^{i} $ of the fog computing node Agents
      4: Receive the global parameter sets $ {\theta }_{V} $ and $ {\theta }_{A} $ distributed by the cloud computing node and initialize the local parameter sets $ {\theta }_{V}^{i} $ and $ {\theta }_{A}^{i} $
      5: Initialize the Episode $ B > 0 $, $ b=1 $; the Minibatch Size $ \hat{M} > 0 $; the number of game steps $ N $
      6: Apply the Nash-MARL Module:
      7: Initialize the Replay Buffer $ D $
      8: repeat
      9: Reset the environment and enter the initial state $ {x}_{0} $
      10: repeat
      11: Choose the joint action $ u\leftarrow {\pi }^{{\theta }_{A}}\left(x\right) $ or choose the joint action $ u $ randomly (e.g., ϵ-greedy exploration)
      12: Observe the state-action-state triplet $ {y}_{t}=\left({x}_{t-1},u,{x}_{t}\right) $
      13: Store the triplet in the Replay Buffer $ D $
      14: Sample data $ Y={\left\{{y}_{i}\right\}}_{i=1}^{\hat{M}} $ from the Replay Buffer
      15: The $ {Agent}^{i} $ receiver uses the attention mechanism to generate the communication matrix $ {\hat{C}}_{t} $
      16: The policy selection network of the $ {Agent}^{i} $ sender chooses an action $ {a}_{t+1}^{i} $, or an action is chosen randomly (e.g., ϵ-greedy exploration)
      17: The $ {Agent}^{i} $ sender generates its own message $ {c}_{t+1}^{i} $ from the receiver's communication matrix $ {\hat{C}}_{t} $
      18: Collect the joint actions of all Agents, execute the actions $ {a}_{t+1}^{1},\cdots ,{a}_{t+1}^{N} $, and obtain the reward $ {R}_{t+1} $ and the next state $ {X}_{t+1} $ from the environment
      19: Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V}^{i},{\theta }_{A}^{i},{\hat{C}}_{t}\right) $, fixing $ {\theta }_{A}^{i} $ and updating $ {\theta }_{V}^{i} $
      20: Optimize $ \frac{1}{\hat{M}+1}{\sum }_{y\in Y\cup \left\{{y}_{t}\right\}}\hat{L}\left(y,{\theta }_{V}^{i},{\theta }_{A}^{i},{\hat{C}}_{t}\right) $, fixing $ {\theta }_{V}^{i} $ and updating $ {\theta }_{A}^{i} $
      21: until $ t > N $
      22: until $ b > B $
      23: return $ {\theta }_{V}^{i} $ and $ {\theta }_{A}^{i} $

      Table 4. 

      Fog-General-control.
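
      At the fog layer, the essential point is how parameters flow: a node pulls the global parameter sets from the cloud, runs the local Nash-MARL loop with communication, and returns its updated parameters. The sketch below only shows that flow; `nash_marl_step` and the dict-based parameter sets are hypothetical stand-ins for the whole inner repeat block of the pseudocode.

```python
def fog_general_control(global_theta_V, global_theta_A, local_env, nash_marl_step):
    """One fog-node cycle: pull the cloud's global parameters, run the local
    Nash-MARL + communication loop, and hand the updated parameters back."""
    # line 4: initialize the local parameter sets from the global ones
    theta_V_i = dict(global_theta_V)
    theta_A_i = dict(global_theta_A)
    # lines 6-22: the inner training loop; `nash_marl_step` stands in for the
    # whole repeat block (replay buffer, communication, alternating optimization)
    theta_V_i, theta_A_i = nash_marl_step(local_env, theta_V_i, theta_A_i)
    # line 23: return the locally updated parameters to the cloud layer
    return theta_V_i, theta_A_i
```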

    • 1: Apply the Nash-MARL module:
      2: Initialize the global parameter sets $ {\theta }_{V} $ and $ {\theta }_{A} $ of the cloud computing center and the global counter $ T $
      3: repeat
4: Distribute the global parameters to the fog computing nodes: $ {\theta }_{V}^{i}={\theta }_{V} $, $ {\theta }_{A}^{i}={\theta }_{A} $
      5: repeat
      6: Update global parameters $ {\theta }_{V}={\theta }_{V}+d{\theta }_{V}^{i} $, $ {\theta }_{A}={\theta }_{A}+d{\theta }_{A}^{i} $
7: until all fog computing nodes have been traversed and their updates collected
      8: $ T\leftarrow T+1 $
      9: until $ T > {T}_{max} $

      Table 5. 

      Cloud-General-control.
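
      A minimal Python sketch of the cloud-level aggregation follows. The additive update $ {\theta }\leftarrow {\theta }+d{\theta }^{i} $ mirrors lines 4-8 of the pseudocode; the `node.receive` / `node.collect` interface and the dict-of-parameters representation are assumptions.

```python
def cloud_general_control(fog_nodes, theta_V, theta_A, T_max):
    """Repeatedly distribute the global parameters to every fog node, collect
    each node's increments, and add them into the global parameter sets."""
    T = 0                                          # global counter T
    while T <= T_max:                              # until T > T_max
        for node in fog_nodes:                     # line 4: distribute
            node.receive(theta_V, theta_A)         # hypothetical interface
        for node in fog_nodes:                     # lines 5-7: collect updates
            d_theta_V, d_theta_A = node.collect()  # d(theta_V^i), d(theta_A^i)
            for k in theta_V:
                theta_V[k] += d_theta_V[k]         # theta_V <- theta_V + d theta_V^i
            for k in theta_A:
                theta_A[k] += d_theta_A[k]         # theta_A <- theta_A + d theta_A^i
        T += 1                                     # line 8: T <- T + 1
    return theta_V, theta_A
```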

    • Module | Parameters | Description
      Cloud Computing Center | $ x=0|x=10 $ | The delay from the cloud to the fog node
      Cloud Computing Center | 20, 60, 60, 20 | The hidden layers in the network
      Cloud Computing Center | 0.001 | The learning rate
      Fog Computing Node | $ x=0|x=1 $ | The delay from the intersection to the fog node
      Fog Computing Node | 20, 60, 60, 20 | The hidden layers in the network
      Fog Computing Node | 0.001 | The learning rate
      Edge Computing Node | $ x=0|x=1 $ | The delay from the edge nodes to the fog node
      Experiment Settings | $ {g}_{t}={r}_{t}=27,{y}_{t}=6 $ | The initial intervals of the green, red, and yellow phases
      Experiment Settings | 15 | The traffic flow prediction period
      Experiment Settings | E = 1,000 | The number of Episodes
      Experiment Settings | I = 27 | The number of intersections
      Experiment Settings | l = 0.001 | The learning rate
      Experiment Settings | γ = 0.982 | The discount rate

      Table 1. 

      List of parameters in this paper.
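
      For reference, the experiment settings from this table can be written as a plain Python configuration dictionary. The key names are illustrative (not from the paper); the values are the ones listed above.

```python
EXPERIMENT_CONFIG = {
    "green_interval_s": 27,             # g_t: initial green interval
    "red_interval_s": 27,               # r_t: initial red interval
    "yellow_interval_s": 6,             # y_t: initial yellow interval
    "prediction_period": 15,            # traffic flow prediction period
    "episodes": 1000,                   # E: number of Episodes
    "intersections": 27,                # I: number of intersections
    "learning_rate": 0.001,             # l
    "discount_rate": 0.982,             # gamma
    "hidden_layers": [20, 60, 60, 20],  # cloud/fog network hidden layers
}
```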

    • Method | Average speed (km/h) | Average waiting time (s)
      Fixed-time | 10.17 | 166.70
      Q learning | 18.43 | 135.62
      DQN | 20.10 | 112.24
      A3C | 24.12 | 90.73
      Nash-Q | 29.70 | 70.14
      Nash-DQN | 33.81 | 61.21
      MAAC | 27.39 | 80.21
      General-MARL | 31.22 | 62.87

      Table 2. 

      Results of General-MARL and other algorithms (ignoring network delay).

    • Method | Average speed (km/h) | Average waiting time (s)
      Fixed-time | 10.15 | 166.71
      Q learning (center) | 21.69 | 182.75
      Q learning (edge) | 23.47 | 155.64
      DQN | 24.94 | 132.70
      A3C | 26.12 | 108.65
      Nash-Q | 28.32 | 105.55
      Nash-DQN | 30.86 | 106.37
      MAAC | 27.61 | 116.81
      General-MARL | 30.78 | 92.48

      Table 3. 

Results of the average speed and waiting time in each episode (considering network delay).

    • Method | Accumulated time (s) | Network delay (s) | Delay rate
      Fixed-time | 38827.5 | 0.0 | 0.0%
      Q learning (center) | 45448.0 | 10809.7 | 23.8%
      Q learning (edge) | 35789.3 | 6522.9 | 18.2%
      DQN | 33789.9 | 8997.2 | 26.6%
      A3C | 31340.4 | 7792.1 | 24.9%
      Nash-Q | 30940.5 | 7536.9 | 24.4%
      Nash-DQN | 31994.6 | 7937.7 | 24.8%
      MAAC | 28940.7 | 6552.1 | 22.6%
      General-MARL | 26912.7 | 3264.5 | 12.2%

      Table 4. 

Results of accumulated time and network delay in each episode (considering network delay).

    • ID | Fixed-time | Q-edge | DQN | A3C | Nash-Q | Nash-DQN | MAAC | General
      1 | 175.27 | 161.96 | 136.93 | 110.89 | 109.34 | 111.21 | 124.69 | 96.81
      2 | 188.35 | 172.68 | 145.58 | 116.43 | 115.88 | 119.02 | 130.73 | 105.46
      3 | 155.28 | 145.58 | 123.71 | 102.43 | 99.35 | 99.27 | 120.99 | 83.59
      4 | 197.72 | 180.36 | 151.78 | 120.39 | 120.57 | 124.61 | 133.34 | 111.66
      5 | 155.23 | 145.54 | 123.68 | 102.41 | 99.32 | 99.24 | 116.97 | 83.56
      6 | 168.97 | 156.8 | 132.76 | 108.22 | 106.19 | 107.45 | 122.26 | 92.64
      7 | 157.68 | 147.55 | 125.30 | 103.44 | 100.55 | 100.7 | 107.91 | 85.18
      8 | 161.32 | 150.53 | 127.70 | 104.98 | 102.37 | 102.88 | 119.31 | 87.58
      9 | 185.23 | 170.13 | 143.52 | 115.11 | 114.32 | 109.88 | 128.53 | 103.40
      10 | 176.47 | 162.95 | 137.72 | 111.40 | 109.94 | 111.92 | 115.15 | 97.60
      11 | 161.21 | 150.44 | 127.63 | 104.94 | 102.31 | 102.81 | 119.28 | 87.51
      12 | 125.96 | 121.55 | 104.32 | 90.01 | 104.30 | 81.77 | 105.69 | 64.20
      13 | 131.62 | 126.19 | 108.06 | 92.41 | 87.52 | 85.15 | 117.87 | 67.94
      14 | 169.15 | 156.95 | 132.88 | 108.30 | 106.28 | 107.55 | 122.33 | 92.76
      15 | 175.52 | 180.84 | 152.16 | 120.64 | 109.47 | 124.96 | 40.06 | 112.04
      16 | 132.47 | 126.89 | 108.62 | 92.77 | 87.94 | 85.65 | 108.20 | 68.50
      17 | 164.12 | 152.83 | 129.56 | 87.56 | 103.77 | 104.55 | 113.46 | 89.44
      18 | 166.87 | 155.08 | 131.38 | 107.33 | 105.14 | 106.19 | 121.45 | 91.26
      19 | 150.39 | 141.57 | 120.48 | 100.36 | 96.90 | 96.35 | 115.11 | 80.36
      20 | 177.77 | 164.01 | 138.58 | 111.95 | 110.59 | 112.70 | 125.65 | 98.46
      21 | 153.63 | 144.23 | 122.62 | 101.73 | 98.52 | 98.29 | 96.35 | 82.50
      22 | 167.54 | 155.63 | 131.82 | 107.62 | 105.48 | 106.59 | 121.71 | 91.70
      23 | 177.62 | 163.89 | 162.05 | 157.98 | 110.52 | 112.61 | 139.54 | 121.93
      24 | 186.84 | 171.45 | 144.58 | 115.79 | 115.13 | 118.11 | 129.15 | 104.46
      25 | 175.36 | 162.04 | 136.99 | 110.93 | 109.39 | 111.26 | 128.73 | 96.87
      26 | 175.23 | 161.93 | 136.90 | 110.87 | 109.32 | 111.18 | 124.67 | 96.78
      27 | 188.35 | 172.68 | 145.58 | 116.43 | 115.88 | 119.02 | 104.64 | 102.77

      Table 5. 

      Results of average waiting time at each intersection in each episode (considering network delay).