CN114613169A - Traffic signal lamp control method based on double experience pools DQN - Google Patents
- Publication number
- CN114613169A (application CN202210415387.1A)
- Authority
- CN
- China
- Prior art keywords
- experience
- traffic signal
- network
- value
- dqn
- Prior art date
- 2022-04-20
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
- G08G1/08—Controlling traffic signals according to detected number or speed of vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Analytical Chemistry (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a traffic signal lamp control method based on a double experience pool DQN, which comprises the following steps: 1. establish a traffic signal lamp control main network and a target value network based on the DQN algorithm; 2. initialize the algorithm parameters, collect road condition information at the traffic intersection, and establish the state value s_t; 3. input s_t into the main network and obtain the action a_t with the maximum Q value; 4. execute a_t, calculate the reward r_t and the next state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the first experience pool; 5. if the reward r_t is greater than the historical experience average reward r̄, store (s_t, a_t, r_t, s_{t+1}) in the second experience pool as well; 6. generate a random number P, select the first experience pool with probability 1−P and the second experience pool with probability P, randomly sample in the selected experience pool, and train the main network parameters by minimizing the loss function; 7. update the target value network parameters at regular intervals, update s_t according to the current road condition information, and jump back to step 3. The method enables the algorithm to converge quickly, so that the obtained traffic signal lamp control strategy is optimized quickly.
Description
Technical Field
The invention belongs to the technical field of traffic light control, and particularly relates to a traffic light control method based on double experience pool deep Q-learning.
Background
There has been a great deal of research into using deep Q-learning algorithms (DQN) to regulate traffic lights. Such methods require no labeled test data: training data are built up in an experience pool, so the strategy obtained at the initial stage of the algorithm is poor, and it gradually improves as the experience pool is continually updated and the network is trained. How to make the algorithm converge quickly, that is, how to optimize the strategy quickly, is therefore an important factor affecting the overall effectiveness of such methods.
Disclosure of Invention
Purpose of the invention: the invention provides a traffic signal lamp control method based on a double experience pool DQN that enables the algorithm to converge quickly, so that the obtained traffic signal lamp control strategy is optimized quickly.
The technical scheme is as follows: the invention adopts the following technical scheme:
a traffic signal lamp control method based on double experience pools DQN comprises the following steps:
s1, establishing a traffic signal lamp control main network and a target value network based on a DQN algorithm; the traffic signal lamp control main network and the target value network have the same structure, the input is a state value, and the output is the maximum value of the Q value for executing various actions under the input state value and the action corresponding to the maximum value of the Q value; the state space of the main network and the target value network is a vector formed by the number of vehicles on each lane of the traffic intersection, the action space is a vector formed by the regulation and control operation on the phases of all current traffic signal lamps of the traffic intersection, and the reward function is the difference between the number of vehicles on all lanes of the traffic intersection and the number of vehicles on the lanes;
s2, randomly initializing the parameter theta of the main network, initializing the parameter theta' of the target value network to theta, setting the initialization time step t to be 0, collecting road condition information of the traffic intersection, and establishing an initial state value StInitialization of
S3, mixing StInputting into main network, selecting Q(s)tA; θ) action of taking maximum value atAs the regulation and control operation of the traffic signal lamp at the current time, namely: a is at=argmaxaQ(stA; θ) where Q(s)tA; theta) indicates that the master network is in accordance with state s under parameter thetatThe Q value output by the action a;
s4, executing action atAnd calculates the prize rtAnd state st+1(ii) a Will(s)t,at,rt,st+1) Storing the experience into a first experience pool;
s5, calculating the average reward of the current historical experience when t is more than 0If it is notWill(s)t,at,rt,st+1) Storing the experience data into a second experience pool;
s6 at (p)1,p2) Generating a random number P in the interval, selecting a first experience pool by taking 1-P as probability, selecting a second experience pool by taking P as probability, randomly sampling B records in the selected experience pool, and training a parameter theta of the main network through a minimum loss function; p is a radical of1,p2Is a preset interval lower limit and upper limit, 0 < p1<p2<1;
wherein(s)i,ai,ri,si+1) For records of random sampling in the selected experience pool, γ is the discount factor, maxa′Q′(si+1And a ', theta') represents the target value network at input state si+1Maximum Q value, max, of time outputaQ(siA, theta) indicates that the main network is in an input state siThe maximum Q value of the time output;
s7, adding one to t, and if mod (t, C) is 0, updating the parameter theta' of the target network to the parameter theta of the main network; mod is a remainder operation, and C is a preset parameter updating time step; updating s according to the current road condition informationtThen, the process proceeds to step S3.
Further, in step S6 the main network parameters are obtained by minimizing the loss function with a gradient descent method.
Further, when the traffic intersection is a four-way intersection, the state value in the state space of the main network and the target value network is [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th lane entering the intersection and m_j is the number of vehicles on the j-th lane leaving it, j = 1, 2, 3, 4.
Further, the action space of the main network and the target value network contains three actions: Ac1, add T seconds to the current phase duration; Ac2, subtract T seconds from the current phase duration; Ac3, leave the current phase duration unchanged. In the present invention, T is 5 seconds.
Further, in the present invention, the lower limit of the interval for generating the random number P is p_1 = 0.7 and the upper limit is p_2 = 0.9.
Further, the first experience pool and the second experience pool both store records in queues of fixed capacity.
Beneficial effects: the traffic light control method disclosed by the invention combines double experience pools with DQN. The double experience pool mechanism makes the network parameter training converge quickly, and the obtained traffic light control strategy is optimized quickly, thereby better realizing intelligent regulation of traffic lights.
Drawings
FIG. 1 is a flow chart of a traffic signal light control method disclosed in the present invention;
FIG. 2 is a schematic diagram of an intersection in an embodiment;
FIG. 3 is a diagram illustrating a network architecture according to the present invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the detailed description.
The invention discloses a traffic signal lamp control method based on a double experience pool DQN, which comprises the following steps:
S1, establish a traffic signal lamp control main network and a target value network based on the DQN algorithm. The main network and the target value network have the same structure; the input is a state value, and the output is the Q values of the various actions under that state value. The state space of the two networks is the vector formed by the number of vehicles on each lane of the traffic intersection, the action space is formed by the regulation operations on the current traffic signal phase of the intersection, and the reward function is the difference between the number of vehicles on all lanes entering the intersection and the number of vehicles on all lanes leaving it.
When the traffic intersection is a four-way intersection, as shown in Fig. 2, each of the four approaches has a lane entering the intersection and a lane leaving it: N1-N4 are the lanes entering the intersection and M1-M4 are the lanes leaving it. The state value in the state space of the main network and the target value network is then [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th entering lane and m_j is the number of vehicles on the j-th leaving lane, j = 1, 2, 3, 4. These counts may be captured by sensors or cameras positioned on the lanes in each direction. The value of the reward function is r_t = (n_1 + n_2 + n_3 + n_4) − (m_1 + m_2 + m_3 + m_4), i.e. the difference between the number of vehicles on the entering lanes and the number of vehicles on the leaving lanes. The action space of the main network and the target value network contains three actions: Ac1, add T seconds to the current phase duration; Ac2, subtract T seconds from the current phase duration; Ac3, leave the current phase duration unchanged, i.e. the phase changes according to the traffic signal's preset phase sequence.
S2, randomly initialize the main network parameters θ; initialize the target value network parameters θ' = θ; set the time step t = 0; collect road condition information at the traffic intersection and establish the initial state value s_t; initialize the historical experience average reward r̄ = 0.
S3, input s_t into the main network and select the action a_t at which Q(s_t, a; θ) takes its maximum value as the regulation operation on the traffic signal at the current time, namely a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network under parameters θ for state s_t and action a.
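By way of illustration, the following Python sketch shows how the state vector, reward, and greedy action selection of steps S1-S3 could be realized. It is a minimal sketch only: the fully connected network, its layer sizes, and the toy sensor readings are assumptions for illustration, since the invention does not prescribe a specific implementation (the actual structure is shown in Fig. 3).

```python
# Illustrative sketch only: the network shape, sensor inputs, and helper
# names below are assumptions, not part of the patented method itself.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Main/target value network: maps a state to one Q value per action."""
    def __init__(self, state_dim: int = 8, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)  # Q(s, a; theta) for every action a

def build_state(incoming, outgoing):
    """Interleaved lane counts [n1, m1, ..., n4, m4] (entering/leaving)."""
    return torch.tensor(
        [c for pair in zip(incoming, outgoing) for c in pair],
        dtype=torch.float32,
    )

def reward(incoming, outgoing):
    """Entering-lane count minus leaving-lane count, per the definition above."""
    return float(sum(incoming) - sum(outgoing))

# Actions Ac1/Ac2/Ac3: lengthen, shorten, or keep the current phase (T = 5 s).
ACTIONS = ("phase +5s", "phase -5s", "phase unchanged")

q_main = QNet()
s_t = build_state([3, 1, 4, 2], [2, 2, 1, 0]).unsqueeze(0)  # toy counts
with torch.no_grad():
    a_t = q_main(s_t).argmax(dim=1).item()  # a_t = argmax_a Q(s_t, a; theta)
print(ACTIONS[a_t])
```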
s4, executing action atAnd calculates the prize rtAnd state st+1(ii) a Will(s)t,at,rt,st+1) Storing the experience into a first experience pool;
s5, calculating the current historical experience average reward when t is more than 0If it is notWill(s)t,at,rt,st+1) Storing the experience data into a second experience pool;
current historical experience average rewardsI.e. average reward according to last time stepAnd the current time step number t and the reward rtTo calculate.
According to the invention, the first experience pool and the second experience pool both store records in queues of fixed capacity: when a queue is full, the record at the head of the queue is deleted and the new record is appended at the tail, so that the experience pool is continually updated.
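As a sketch, the two pools and the storage rule of steps S4-S5 might look as follows in Python, where collections.deque with a maxlen reproduces the fixed-capacity queue behavior just described; the capacity value and the incremental form of the average-reward update are assumptions.

```python
# Sketch of the dual experience pools (assumed capacity; the incremental
# average-reward formula is a reconstruction, not quoted from the patent).
from collections import deque

CAPACITY = 10_000                      # assumed queue capacity
pool_first = deque(maxlen=CAPACITY)    # every transition goes here
pool_second = deque(maxlen=CAPACITY)   # only above-average-reward transitions

avg_reward = 0.0   # running mean of all rewards seen so far
t = 0

def store(s_t, a_t, r_t, s_next):
    """Steps S4-S5: always store in pool 1; store in pool 2 if r_t beats the mean."""
    global avg_reward, t
    pool_first.append((s_t, a_t, r_t, s_next))   # full deque drops its head
    if t > 0 and r_t > avg_reward:
        pool_second.append((s_t, a_t, r_t, s_next))
    avg_reward = (t * avg_reward + r_t) / (t + 1)  # incremental mean update
    t += 1
```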
S6, generate a random number P in the interval (p_1, p_2); select the first experience pool with probability 1−P and the second experience pool with probability P; randomly sample B records from the selected experience pool and train the main network parameters θ by minimizing the loss function. Here p_1 and p_2 are preset lower and upper interval limits with 0 < p_1 < p_2 < 1. In this embodiment, p_1 = 0.7 and p_2 = 0.9, so the second experience pool is always more likely to be selected than the first. Because the records in the second experience pool carry larger rewards, they represent better-performing experience, and training on them accelerates convergence; the first experience pool is still selected with the smaller probability 1−P in order to reduce the risk of the network overfitting.
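A sketch of this selection rule follows; the batch size B and the fallback to the first pool while the second pool is still nearly empty are illustrative assumptions.

```python
# Sketch of the S6 pool choice: P ~ U(p1, p2); pick pool 2 with probability P.
import random

P1, P2, B = 0.7, 0.9, 32   # B (batch size) is an assumed value

def sample_batch(pool_first, pool_second):
    P = random.uniform(P1, P2)
    pool = pool_second if random.random() < P else pool_first
    if len(pool) < B:                  # assumed fallback for a young pool 2
        pool = pool_first
    return random.sample(list(pool), min(B, len(pool)))
```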
The loss function is
L(θ) = (1/B) · Σ_{i=1..B} [ r_i + γ · max_{a'} Q'(s_{i+1}, a'; θ') − max_a Q(s_i, a; θ) ]²
where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_{a'} Q'(s_{i+1}, a'; θ') is the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) is the maximum Q value output by the main network for input state s_i.
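Under the same assumptions, one gradient step on this loss could be sketched as follows. Note that, exactly as the definitions above state, the predicted term is the main network's maximum Q value max_a Q(s_i, a; θ) rather than the Q(s_i, a_i; θ) of textbook DQN; the discount factor value is an assumption.

```python
# Sketch of one minimization step of L(theta); networks and optimizer are
# assumed to exist (e.g. q_main, q_target: QNet; optimizer: torch.optim.Adam).
import torch
import torch.nn.functional as F

GAMMA = 0.9  # assumed discount factor

def train_step(q_main, q_target, optimizer, batch):
    s = torch.stack([torch.as_tensor(b[0], dtype=torch.float32) for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s_next = torch.stack([torch.as_tensor(b[3], dtype=torch.float32) for b in batch])

    with torch.no_grad():                                  # fixed TD target
        y = r + GAMMA * q_target(s_next).max(dim=1).values
    pred = q_main(s).max(dim=1).values                     # max_a Q(s_i, a; theta)

    loss = F.mse_loss(pred, y)   # mean squared TD error over the B records
    optimizer.zero_grad()
    loss.backward()              # gradient descent on theta only
    optimizer.step()
    return loss.item()
```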
in this embodiment, the parameters of the main network are obtained by minimizing the loss function by using a gradient descent method. Fig. 3 is a schematic diagram of a network architecture according to the present invention.
S7, increase t by one; if mod(t, C) = 0, update the target value network parameters θ' to the main network parameters θ, where mod is the remainder operation and C is the preset parameter update time step. The frequency at which the target value network parameters are updated can thus be controlled through the duration between time t−1 and time t and the value of C. Update s_t according to the current road condition information and jump back to step S3.
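The timed update of step S7 then reduces to a parameter copy every C time steps; a sketch, with an arbitrarily chosen C:

```python
# Sketch of the S7 synchronization; C is an assumed update period.
C = 100

def maybe_sync_target(t, q_main, q_target):
    if t % C == 0:  # mod(t, C) == 0
        q_target.load_state_dict(q_main.state_dict())  # theta' <- theta
```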
By adopting the double experience pool form, the invention accelerates network convergence during DQN training, thereby better relieving traffic congestion and promoting the development of intelligent transportation and deep reinforcement learning.
Claims (9)
1. A traffic signal lamp control method based on a double experience pool DQN, characterized by comprising the following steps:
S1, establish a traffic signal lamp control main network and a target value network based on the DQN algorithm; the main network and the target value network have the same structure; the input is a state value, and the output is the maximum Q value over the available actions under the input state value, together with the action that attains this maximum; the state space of the main network and the target value network is the vector formed by the number of vehicles on each lane of the traffic intersection, the action space is formed by the regulation operations on the current traffic signal phase of the intersection, and the reward function is the difference between the number of vehicles on all lanes entering the intersection and the number of vehicles on all lanes leaving it;
S2, randomly initialize the main network parameters θ; initialize the target value network parameters θ' = θ; set the time step t = 0; collect road condition information at the traffic intersection and establish the initial state value s_t; initialize the historical experience average reward r̄ = 0;
S3, input s_t into the main network and select the action a_t at which Q(s_t, a; θ) takes its maximum value as the regulation operation on the traffic signal at the current time, namely a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network under parameters θ for state s_t and action a;
S4, execute action a_t, calculate the reward r_t and the next state s_{t+1}, and store the experience (s_t, a_t, r_t, s_{t+1}) in the first experience pool;
S5, when t > 0, calculate the current historical experience average reward r̄; if r_t > r̄, store the experience (s_t, a_t, r_t, s_{t+1}) in the second experience pool as well;
S6, generate a random number P in the interval (p_1, p_2); select the first experience pool with probability 1−P and the second experience pool with probability P; randomly sample B records from the selected experience pool and train the main network parameters θ by minimizing the loss function, where p_1 and p_2 are preset lower and upper interval limits with 0 < p_1 < p_2 < 1;
the loss function being
L(θ) = (1/B) · Σ_{i=1..B} [ r_i + γ · max_{a'} Q'(s_{i+1}, a'; θ') − max_a Q(s_i, a; θ) ]²
where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_{a'} Q'(s_{i+1}, a'; θ') is the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) is the maximum Q value output by the main network for input state s_i;
S7, increase t by one; if mod(t, C) = 0, update the target value network parameters θ' to the main network parameters θ, where mod is the remainder operation and C is the preset parameter update time step; update s_t according to the current road condition information and return to step S3.
2. The traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein in step S6 the main network parameters are obtained by minimizing the loss function with a gradient descent method.
3. The traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein, when the traffic intersection is a four-way intersection, the state value in the state space of the main network and the target value network is [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles on the j-th lane entering the intersection and m_j is the number of vehicles on the j-th lane leaving it, j = 1, 2, 3, 4.
4. The traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein the action space of the main network and the target value network contains three actions: Ac1, add T seconds to the current phase duration; Ac2, subtract T seconds from the current phase duration; Ac3, leave the current phase duration unchanged.
5. The traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein p_1 = 0.7 and p_2 = 0.9.
7. The traffic signal lamp control method based on the double experience pool DQN of claim 1, wherein the first experience pool and the second experience pool both store records in queues of fixed capacity.
9. The traffic signal lamp control method based on the double experience pool DQN of claim 4, wherein T is 5 seconds.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415387.1A CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415387.1A CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114613169A (en) | 2022-06-10
CN114613169B CN114613169B (en) | 2023-02-28 |
Family
ID=81870213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210415387.1A Active CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613169B (en) |
-
2022
- 2022-04-20 CN CN202210415387.1A patent/CN114613169B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109559530A (en) * | 2019-01-07 | 2019-04-02 | 大连理工大学 | A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning |
CN110060475A (en) * | 2019-04-17 | 2019-07-26 | 清华大学 | A kind of multi-intersection signal lamp cooperative control method based on deeply study |
CN110930734A (en) * | 2019-11-30 | 2020-03-27 | 天津大学 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
CN111696370A (en) * | 2020-06-16 | 2020-09-22 | 西安电子科技大学 | Traffic light control method based on heuristic deep Q network |
CN111898211A (en) * | 2020-08-07 | 2020-11-06 | 吉林大学 | Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof |
CN112632858A (en) * | 2020-12-23 | 2021-04-09 | 浙江工业大学 | Traffic light signal control method based on Actor-critical frame deep reinforcement learning algorithm |
CN113411099A (en) * | 2021-05-28 | 2021-09-17 | 杭州电子科技大学 | Double-change frequency hopping pattern intelligent decision method based on PPER-DQN |
CN113947928A (en) * | 2021-10-15 | 2022-01-18 | 河南工业大学 | Traffic signal lamp timing method based on combination of deep reinforcement learning and extended Kalman filtering |
CN113963553A (en) * | 2021-10-20 | 2022-01-21 | 西安工业大学 | Road intersection signal lamp green signal ratio control method, device and equipment |
Non-Patent Citations (4)
Title |
---|
WANG H ET AL.: "Value-based deep reinforcement learning for adaptive isolated intersection signal control", 《IET INTELLIGENT TRANSPORT SYSTEMS》 *
DING WENJIE: "Research on Adaptive Traffic Signal Control Based on Deep Reinforcement Learning", 《China Excellent Master's Theses Full-text Database, Engineering Science and Technology II》 *
XU DONGWEI ET AL.: "A Survey of Urban Traffic Signal Control Based on Deep Reinforcement Learning", 《Journal of Transportation Engineering and Information》 *
GAN ZHENGSHENG ET AL.: "Few-shot Remote Sensing Image Classification Based on Meta-learning", 《Computer Engineering and Design》 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115758705A (en) * | 2022-11-10 | 2023-03-07 | 北京航天驭星科技有限公司 | Modeling method, model and acquisition method of satellite north-south conservation strategy model |
CN117010482A (en) * | 2023-07-06 | 2023-11-07 | 三峡大学 | Strategy method based on double experience pool priority sampling and DuelingDQN implementation |
Also Published As
Publication number | Publication date |
---|---|
CN114613169B (en) | 2023-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114613169B (en) | Traffic signal lamp control method based on double experience pools DQN | |
CN109559530B (en) | Multi-intersection signal lamp cooperative control method based on Q value migration depth reinforcement learning | |
CN110047278B (en) | Adaptive traffic signal control system and method based on deep reinforcement learning | |
CN109215355A (en) | A kind of single-point intersection signal timing optimization method based on deeply study | |
CN111260937A (en) | Cross traffic signal lamp control method based on reinforcement learning | |
CN113963553A (en) | Road intersection signal lamp green signal ratio control method, device and equipment | |
CN115019523B (en) | Deep reinforcement learning traffic signal coordination optimization control method based on minimized pressure difference | |
CN115691167A (en) | Single-point traffic signal control method based on intersection holographic data | |
CN113392577B (en) | Regional boundary main intersection signal control method based on deep reinforcement learning | |
CN114419884A (en) | Self-adaptive signal control method and system based on reinforcement learning and phase competition | |
CN113724507A (en) | Traffic control and vehicle induction cooperation method and system based on deep reinforcement learning | |
CN116639124A (en) | Automatic driving vehicle lane changing method based on double-layer deep reinforcement learning | |
CN116824848A (en) | Traffic signal optimization control method based on Bayesian deep Q network | |
CN114723156B (en) | Global traffic signal lamp regulation and control method based on improved genetic algorithm | |
CN116597670A (en) | Traffic signal lamp timing method, device and equipment based on deep reinforcement learning | |
CN115472023A (en) | Intelligent traffic light control method and device based on deep reinforcement learning | |
CN114613170A (en) | Traffic signal lamp intersection coordination control method based on reinforcement learning | |
CN114613168B (en) | Deep reinforcement learning traffic signal control method based on memory network | |
CN116822659B (en) | Automatic driving motor skill learning method, system, equipment and computer medium | |
Xu et al. | Training a Reinforcement Learning Agent with AutoRL for Traffic Signal Control | |
CN117649776B (en) | Single intersection signal lamp control method, device, terminal and storage medium | |
CN115691110B (en) | Intersection signal period stable timing method based on reinforcement learning and oriented to dynamic traffic flow | |
CN117275259B (en) | Multi-intersection cooperative signal control method based on field information backtracking | |
CN116137103B (en) | Large-scale traffic light signal control method based on primitive learning and deep reinforcement learning | |
CN116994444B (en) | Traffic light control method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |