CN114613169A - Traffic signal lamp control method based on double experience pools DQN - Google Patents

Traffic signal lamp control method based on double experience pools DQN

Info

Publication number
CN114613169A
CN114613169A
Authority
CN
China
Prior art keywords
experience
traffic signal
network
value
dqn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210415387.1A
Other languages
Chinese (zh)
Other versions
CN114613169B (en)
Inventor
孔燕 (Kong Yan)
杨智超 (Yang Zhichao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202210415387.1A priority Critical patent/CN114613169B/en
Publication of CN114613169A publication Critical patent/CN114613169A/en
Application granted granted Critical
Publication of CN114613169B publication Critical patent/CN114613169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic signal lamp control method based on a double-experience-pool DQN, which comprises the following steps: 1. establishing a traffic signal lamp control main network and a target value network based on the DQN algorithm; 2. initializing the algorithm parameters, collecting road condition information of the traffic intersection, and establishing the state value s_t; 3. inputting s_t into the main network and taking the action a_t with the maximum Q value; 4. executing a_t, calculating the reward r_t and the state s_{t+1}, and storing (s_t, a_t, r_t, s_{t+1}) in the first experience pool; 5. if the reward r_t is greater than the historical average reward r̄, also storing (s_t, a_t, r_t, s_{t+1}) in the second experience pool; 6. generating a random number P, selecting the first experience pool with probability 1 − P and the second experience pool with probability P, randomly sampling from the selected pool, and training the parameters of the main network by minimizing the loss function; 7. updating the parameters of the target value network at regular intervals, updating s_t according to the current road condition information, and jumping to step 3 to continue execution. The method enables the algorithm to converge quickly, so that the obtained traffic signal lamp control strategy is optimized rapidly.

Description

Traffic signal lamp control method based on double experience pools DQN
Technical Field
The invention belongs to the technical field of traffic light control, and particularly relates to a traffic light control method based on double-experience pool deep Q learning.
Background
There has been a great deal of research on using the deep Q-network (DQN) algorithm, which is based on deep Q-learning, to regulate traffic signal lamps. Such methods require no labeled training data; instead, training data are constructed by building an experience pool. The strategy obtained in the initial stage of the algorithm is poor, and it is gradually optimized as the experience pool is continuously updated and training proceeds. Therefore, how to make the algorithm converge quickly, that is, how to optimize the strategy quickly, is an important factor affecting the overall effectiveness of such methods.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a traffic signal lamp control method based on a double-experience-pool DQN, which enables the algorithm to converge quickly so that the obtained traffic signal lamp control strategy is optimized rapidly.
The technical scheme of the invention is as follows:
a traffic signal lamp control method based on double experience pools DQN comprises the following steps:
S1, establishing a traffic signal lamp control main network and a target value network based on the DQN algorithm; the traffic signal lamp control main network and the target value network have the same structure; the input is a state value, and the output is the maximum Q value over the available actions under the input state value together with the action corresponding to that maximum; the state space of the main network and the target value network is a vector formed by the numbers of vehicles on the lanes of the traffic intersection, the action space is a vector formed by the regulation operations on the current phases of the traffic signal lamps of the traffic intersection, and the reward function is the difference between the number of vehicles on the incoming lanes and the number of vehicles on the outgoing lanes of the traffic intersection;
S2, randomly initialize the parameter θ of the main network and initialize the parameter θ' of the target value network to θ; set the initial time step t = 0; collect road condition information of the traffic intersection and establish the initial state value s_t; initialize the historical average reward r̄_0 = 0.
S3, input s_t into the main network and select the action a_t that maximizes Q(s_t, a; θ) as the regulation operation of the traffic signal lamp at the current time, namely: a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network with parameter θ for state s_t and action a;
S4, execute action a_t and calculate the reward r_t and the next state s_{t+1}; store (s_t, a_t, r_t, s_{t+1}) as one experience record in the first experience pool;
S5, when t > 0, calculate the current historical average reward r̄_t = ((t − 1) · r̄_{t−1} + r_t) / t; if r_t > r̄_t, also store (s_t, a_t, r_t, s_{t+1}) in the second experience pool;
S6, generate a random number P in the interval (p_1, p_2); select the first experience pool with probability 1 − P and the second experience pool with probability P; randomly sample B records from the selected experience pool, and train the parameter θ of the main network by minimizing the loss function; p_1 and p_2 are the preset lower and upper limits of the interval, with 0 < p_1 < p_2 < 1;
The loss function is:

L(θ) = (1/B) · Σ_{i=1}^{B} [ r_i + γ · max_{a'} Q'(s_{i+1}, a'; θ') − max_a Q(s_i, a; θ) ]²

where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_{a'} Q'(s_{i+1}, a'; θ') denotes the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) denotes the maximum Q value output by the main network for input state s_i;
S7, increase t by one; if mod(t, C) = 0, update the parameter θ' of the target value network to the parameter θ of the main network, where mod is the remainder operation and C is the preset parameter-update time step; update s_t according to the current road condition information, and return to step S3.
Further, in step S6 the parameters of the main network are obtained by minimizing the loss function using gradient descent.
Further, when the traffic intersection is a four-way intersection, the state value in the state space of the main network and the target value network is [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th incoming lane of the intersection and m_j is the number of vehicles on the j-th outgoing lane, j = 1, 2, 3, 4.
Further, the action values in the action spaces of the main network and the target value network take three values: ac1: add T seconds to the current phase duration; ac2: subtract T seconds from the current phase duration; ac3: keep the current phase duration unchanged. In the present invention, T is 5 seconds.
Further, in the present invention, the lower limit of the interval for generating the random number P is p_1 = 0.7 and the upper limit is p_2 = 0.9.
Further, the value of the reward function is:

r_t = Σ_{j=1}^{4} n_j − Σ_{j=1}^{4} m_j
further, the first experience pool and the second experience pool both store records by using queues with fixed capacity.
Further, step S5 calculates the current historical average reward as r̄_t = ((t − 1) · r̄_{t−1} + r_t) / t.
Beneficial effects: the traffic signal lamp control method disclosed by the invention combines double experience pools with DQN. The double-experience-pool mechanism makes the training of the network parameters converge quickly, and the obtained traffic signal lamp control strategy is optimized rapidly, thereby better realizing intelligent regulation of traffic signal lamps.
Drawings
FIG. 1 is a flow chart of a traffic signal light control method disclosed in the present invention;
FIG. 2 is a schematic diagram of an intersection in an embodiment;
FIG. 3 is a diagram illustrating a network architecture according to the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the detailed description.
The invention discloses a traffic signal lamp control method based on a double-experience-pool DQN, which comprises the following steps:
S1, establishing a traffic signal lamp control main network and a target value network based on the DQN algorithm; the traffic signal lamp control main network and the target value network have the same structure; the input is a state value, and the output is the Q value of each available action under the input state value; the state space of the main network and the target value network is a vector formed by the numbers of vehicles on the lanes of the traffic intersection, the action space is a vector formed by the regulation operations on the current phases of the traffic signal lamps of the traffic intersection, and the reward function is the difference between the number of vehicles on the incoming lanes and the number of vehicles on the outgoing lanes of the traffic intersection;
When the traffic intersection is a four-way intersection, as shown in FIG. 2, each of the four approaches has a lane entering the intersection and a lane leaving it; in the drawing, N1-N4 are the lanes entering the intersection and M1-M4 are the lanes leaving it. The state value in the state space of the main network and the target value network is then [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th incoming lane and m_j is the number of vehicles on the j-th outgoing lane, j = 1, 2, 3, 4. The data may be captured by sensors or cameras positioned at the lanes in each direction. The value of the reward function is:

r_t = Σ_{j=1}^{4} n_j − Σ_{j=1}^{4} m_j
That is, the reward is the difference between the number of vehicles on the incoming lanes and the number of vehicles on the outgoing lanes. The action values in the action spaces of the main network and the target value network take three values: ac1: add T seconds to the current phase duration; ac2: subtract T seconds from the current phase duration; ac3: keep the current phase duration unchanged, i.e., the current phase changes state according to the preset phase sequence of the traffic signal lamp.
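For illustration, the following Python sketch shows how the state vector and reward can be assembled from per-lane vehicle counts. It is a minimal sketch under the definitions above, not the patented implementation; the function and variable names (build_state, counts_in, counts_out) are assumptions, and the reward sign follows the incoming-minus-outgoing difference as stated in the text.

```python
import numpy as np

def build_state(counts_in, counts_out):
    """Interleave incoming/outgoing lane counts into [n1, m1, n2, m2, n3, m3, n4, m4]."""
    state = []
    for n, m in zip(counts_in, counts_out):
        state.extend([n, m])
    return np.array(state, dtype=np.float32)

def reward(counts_in, counts_out):
    """r_t: difference between the vehicle totals on incoming and outgoing lanes."""
    return float(sum(counts_in) - sum(counts_out))

# Hypothetical counts from sensors/cameras at a four-way intersection:
s_t = build_state([3, 5, 2, 4], [1, 2, 0, 3])  # -> [3., 1., 5., 2., 2., 0., 4., 3.]
r_t = reward([3, 5, 2, 4], [1, 2, 0, 3])       # -> 8.0
```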
S2, randomly initialize the parameter θ of the main network and initialize the parameter θ' of the target value network to θ; set the initial time step t = 0; collect road condition information of the traffic intersection and establish the initial state value s_t; initialize the historical average reward r̄_0 = 0.
S3, input s_t into the main network and select the action a_t that maximizes Q(s_t, a; θ) as the regulation operation of the traffic signal lamp at the current time, namely: a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network with parameter θ for state s_t and action a;
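A minimal sketch of this greedy action selection in PyTorch, assuming main_net maps the 8-dimensional state to 3 Q values (one per action ac1/ac2/ac3); the names are illustrative, not taken from the patent.

```python
import torch

def select_action(main_net, state):
    """a_t = argmax_a Q(s_t, a; theta): forward the state through the main
    network and return the index of the largest Q value (0=ac1, 1=ac2, 2=ac3)."""
    with torch.no_grad():
        q_values = main_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```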
S4, execute action a_t and calculate the reward r_t and the next state s_{t+1}; store (s_t, a_t, r_t, s_{t+1}) as one experience record in the first experience pool;
S5, when t > 0, calculate the current historical average reward r̄_t = ((t − 1) · r̄_{t−1} + r_t) / t; if r_t > r̄_t, also store (s_t, a_t, r_t, s_{t+1}) in the second experience pool;
current historical experience average rewards
Figure BDA0003605668050000044
I.e. average reward according to last time step
Figure BDA0003605668050000045
And the current time step number t and the reward rtTo calculate.
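In code, this incremental update avoids storing the whole reward history; a one-function sketch under the indexing assumed above (the update is applied for t >= 1):

```python
def update_average_reward(avg_prev, r_t, t):
    """Incremental mean: avg_t = ((t - 1) * avg_prev + r_t) / t, valid for t >= 1."""
    return ((t - 1) * avg_prev + r_t) / t
```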
According to the invention, the first experience pool and the second experience pool store records in fixed-capacity queues: when a queue is full, the record at the head of the queue is deleted and the new record is stored at the tail, thereby updating the experience pool.
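Such a fixed-capacity queue maps directly onto Python's collections.deque, which drops the head automatically when full. The sketch below is illustrative; the class and method names are assumptions, not the patent's.

```python
import random
from collections import deque

class ExperiencePool:
    """FIFO experience pool: when the queue is full, the oldest record is
    dropped from the head and the new record is appended at the tail."""
    def __init__(self, capacity):
        self.records = deque(maxlen=capacity)  # maxlen enforces the fixed capacity

    def store(self, s, a, r, s_next):
        self.records.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sample of B records (fewer if the pool is still small).
        return random.sample(list(self.records), min(batch_size, len(self.records)))

    def __len__(self):
        return len(self.records)
```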
S6, generate a random number P in the interval (p_1, p_2); select the first experience pool with probability 1 − P and the second experience pool with probability P; randomly sample B records from the selected experience pool, and train the parameter θ of the main network by minimizing the loss function; p_1 and p_2 are the preset lower and upper limits of the interval, with 0 < p_1 < p_2 < 1. In this embodiment, p_1 = 0.7 and p_2 = 0.9, so the second experience pool is selected with higher probability than the first. Because the records in the second experience pool carry larger rewards, they represent better experience than those in the first pool, and training on them accelerates convergence. The first experience pool is still selected with the lower probability 1 − P in order to reduce the risk of the network overfitting. A sketch of this selection rule is given below.
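The selection rule of S6 in Python, under the embodiment's values p_1 = 0.7 and p_2 = 0.9 (the function name and pool interface are assumptions carried over from the pool sketch above):

```python
import random

def choose_pool(pool_first, pool_second, p1=0.7, p2=0.9):
    """Draw P uniformly from (p1, p2); pick the second pool with probability P
    and the first pool with probability 1 - P."""
    P = random.uniform(p1, p2)
    if random.random() < P and len(pool_second) > 0:
        return pool_second  # high-reward records: speeds up convergence
    return pool_first       # sampled with probability ~1-P to curb overfitting
```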
The loss function is:

L(θ) = (1/B) · Σ_{i=1}^{B} [ r_i + γ · max_{a'} Q'(s_{i+1}, a'; θ') − max_a Q(s_i, a; θ) ]²

where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_{a'} Q'(s_{i+1}, a'; θ') denotes the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) denotes the maximum Q value output by the main network for input state s_i;
in this embodiment, the parameters of the main network are obtained by minimizing the loss function by using a gradient descent method. Fig. 3 is a schematic diagram of a network architecture according to the present invention.
S7, increase t by one; if mod(t, C) = 0, update the parameter θ' of the target value network to the parameter θ of the main network, where mod is the remainder operation and C is the preset parameter-update time step. The frequency with which the target value network parameters are updated can be controlled through the duration between time t − 1 and time t and the value of C. Update s_t according to the current road condition information and jump to step S3 to continue execution.
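The periodic update of S7 in a short sketch, assuming the networks are PyTorch modules (load_state_dict copies θ into θ'):

```python
def sync_target(main_net, target_net, t, C):
    """Copy the main-network parameters theta into the target network theta'
    whenever mod(t, C) == 0, i.e. every C time steps."""
    if t % C == 0:
        target_net.load_state_dict(main_net.state_dict())
```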
By adopting the double-experience-pool form, the invention accelerates the convergence of the network during DQN training, thereby better relieving traffic congestion and promoting the development of the fields of intelligent transportation and deep reinforcement learning.

Claims (9)

1. A traffic signal lamp control method based on a double experience pool DQN is characterized by comprising the following steps:
S1, establishing a traffic signal lamp control main network and a target value network based on the DQN algorithm; the traffic signal lamp control main network and the target value network have the same structure; the input is a state value, and the output is the maximum Q value over the available actions under the input state value together with the action corresponding to that maximum; the state space of the main network and the target value network is a vector formed by the numbers of vehicles on the lanes of the traffic intersection, the action space is a vector formed by the regulation operations on the current phases of the traffic signal lamps of the traffic intersection, and the reward function is the difference between the number of vehicles on the incoming lanes and the number of vehicles on the outgoing lanes of the traffic intersection;
S2, randomly initialize the parameter θ of the main network and initialize the parameter θ' of the target value network to θ; set the initial time step t = 0; collect road condition information of the traffic intersection and establish the initial state value s_t; initialize the historical average reward r̄_0 = 0.
S3, input s_t into the main network and select the action a_t that maximizes Q(s_t, a; θ) as the regulation operation of the traffic signal lamp at the current time, namely: a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network with parameter θ for state s_t and action a;
S4, execute action a_t and calculate the reward r_t and the next state s_{t+1}; store (s_t, a_t, r_t, s_{t+1}) as one experience record in the first experience pool;
S5, when t > 0, calculate the current historical average reward r̄_t = ((t − 1) · r̄_{t−1} + r_t) / t; if r_t > r̄_t, also store (s_t, a_t, r_t, s_{t+1}) in the second experience pool;
S6, generate a random number P in the interval (p_1, p_2); select the first experience pool with probability 1 − P and the second experience pool with probability P; randomly sample B records from the selected experience pool, and train the parameter θ of the main network by minimizing the loss function; p_1 and p_2 are the preset lower and upper limits of the interval, with 0 < p_1 < p_2 < 1;
The loss function is:

L(θ) = (1/B) · Σ_{i=1}^{B} [ r_i + γ · max_{a'} Q'(s_{i+1}, a'; θ') − max_a Q(s_i, a; θ) ]²

where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_{a'} Q'(s_{i+1}, a'; θ') denotes the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) denotes the maximum Q value output by the main network for input state s_i;
S7, increase t by one; if mod(t, C) = 0, update the parameter θ' of the target value network to the parameter θ of the main network, where mod is the remainder operation and C is the preset parameter-update time step; update s_t according to the current road condition information, and return to step S3.
2. The traffic signal lamp control method based on the double-experience-pool DQN of claim 1, wherein in step S6 the parameters of the main network are obtained by minimizing the loss function using gradient descent.
3. The traffic signal lamp control method based on the double-experience-pool DQN of claim 1, wherein when the traffic intersection is a four-way intersection, the state value in the state spaces of the main network and the target value network is [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th incoming lane of the intersection and m_j is the number of vehicles on the j-th outgoing lane, j = 1, 2, 3, 4.
4. The traffic signal lamp control method based on the double experience pools DQN of claim 1, wherein the action values in the action spaces of the main network and the target value network have three values, which are respectively: ac1: adding T seconds to the current phase duration; ac2: subtracting T seconds from the current phase duration; ac3: the current phase duration is unchanged.
5. The traffic signal lamp control method based on the double-experience-pool DQN of claim 1, wherein p_1 = 0.7 and p_2 = 0.9.
6. The traffic signal lamp control method based on the double-experience-pool DQN of claim 3, wherein the value of the reward function is:

r_t = Σ_{j=1}^{4} n_j − Σ_{j=1}^{4} m_j
7. the traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein the first experience pool and the second experience pool both employ queues with fixed capacity to store records.
8. The traffic signal lamp control method based on the double-experience-pool DQN of claim 1, wherein step S5 calculates the current historical average reward as r̄_t = ((t − 1) · r̄_{t−1} + r_t) / t.
9. The traffic signal light control method based on dual experience pool DQN of claim 4, wherein T is 5 seconds.
CN202210415387.1A 2022-04-20 2022-04-20 Traffic signal lamp control method based on double experience pools DQN Active CN114613169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210415387.1A CN114613169B (en) 2022-04-20 2022-04-20 Traffic signal lamp control method based on double experience pools DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210415387.1A CN114613169B (en) 2022-04-20 2022-04-20 Traffic signal lamp control method based on double experience pools DQN

Publications (2)

Publication Number Publication Date
CN114613169A true CN114613169A (en) 2022-06-10
CN114613169B CN114613169B (en) 2023-02-28

Family

ID=81870213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210415387.1A Active CN114613169B (en) 2022-04-20 2022-04-20 Traffic signal lamp control method based on double experience pools DQN

Country Status (1)

Country Link
CN (1) CN114613169B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning
CN110060475A (en) * 2019-04-17 2019-07-26 清华大学 A kind of multi-intersection signal lamp cooperative control method based on deeply study
CN110930734A (en) * 2019-11-30 2020-03-27 天津大学 Intelligent idle traffic indicator lamp control method based on reinforcement learning
CN111696370A (en) * 2020-06-16 2020-09-22 西安电子科技大学 Traffic light control method based on heuristic deep Q network
CN111898211A (en) * 2020-08-07 2020-11-06 吉林大学 Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof
CN112632858A (en) * 2020-12-23 2021-04-09 浙江工业大学 Traffic light signal control method based on Actor-critical frame deep reinforcement learning algorithm
CN113411099A (en) * 2021-05-28 2021-09-17 杭州电子科技大学 Double-change frequency hopping pattern intelligent decision method based on PPER-DQN
CN113947928A (en) * 2021-10-15 2022-01-18 河南工业大学 Traffic signal lamp timing method based on combination of deep reinforcement learning and extended Kalman filtering
CN113963553A (en) * 2021-10-20 2022-01-21 西安工业大学 Road intersection signal lamp green signal ratio control method, device and equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WANG H et al.: "Value-based deep reinforcement learning for adaptive isolated intersection signal control", IET Intelligent Transport Systems *
DING Wenjie: "Research on Adaptive Traffic Signal Control Based on Deep Reinforcement Learning", China Masters' Theses Full-text Database, Engineering Science and Technology II *
XU Dongwei et al.: "A Survey of Urban Traffic Signal Control Based on Deep Reinforcement Learning", Journal of Transportation Engineering and Information *
GAN Zhengsheng et al.: "Few-shot Remote Sensing Image Classification Based on Meta-learning", Computer Engineering and Design *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115758705A (en) * 2022-11-10 2023-03-07 北京航天驭星科技有限公司 Modeling method, model and acquisition method of satellite north-south conservation strategy model
CN117010482A (en) * 2023-07-06 2023-11-07 三峡大学 Strategy method based on double experience pool priority sampling and DuelingDQN implementation

Also Published As

Publication number Publication date
CN114613169B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
CN114613169B (en) Traffic signal lamp control method based on double experience pools DQN
CN109559530B (en) Multi-intersection signal lamp cooperative control method based on Q value migration depth reinforcement learning
CN110047278B (en) Adaptive traffic signal control system and method based on deep reinforcement learning
CN109215355A (en) A kind of single-point intersection signal timing optimization method based on deeply study
CN111260937A (en) Cross traffic signal lamp control method based on reinforcement learning
CN113963553A (en) Road intersection signal lamp green signal ratio control method, device and equipment
CN115019523B (en) Deep reinforcement learning traffic signal coordination optimization control method based on minimized pressure difference
CN115691167A (en) Single-point traffic signal control method based on intersection holographic data
CN113392577B (en) Regional boundary main intersection signal control method based on deep reinforcement learning
CN114419884A (en) Self-adaptive signal control method and system based on reinforcement learning and phase competition
CN113724507A (en) Traffic control and vehicle induction cooperation method and system based on deep reinforcement learning
CN116639124A (en) Automatic driving vehicle lane changing method based on double-layer deep reinforcement learning
CN116824848A (en) Traffic signal optimization control method based on Bayesian deep Q network
CN114723156B (en) Global traffic signal lamp regulation and control method based on improved genetic algorithm
CN116597670A (en) Traffic signal lamp timing method, device and equipment based on deep reinforcement learning
CN115472023A (en) Intelligent traffic light control method and device based on deep reinforcement learning
CN114613170A (en) Traffic signal lamp intersection coordination control method based on reinforcement learning
CN114613168B (en) Deep reinforcement learning traffic signal control method based on memory network
CN116822659B (en) Automatic driving motor skill learning method, system, equipment and computer medium
Xu et al. Training a Reinforcement Learning Agent with AutoRL for Traffic Signal Control
CN117649776B (en) Single intersection signal lamp control method, device, terminal and storage medium
CN115691110B (en) Intersection signal period stable timing method based on reinforcement learning and oriented to dynamic traffic flow
CN117275259B (en) Multi-intersection cooperative signal control method based on field information backtracking
CN116137103B (en) Large-scale traffic light signal control method based on primitive learning and deep reinforcement learning
CN116994444B (en) Traffic light control method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant