CN114613169A - Traffic signal lamp control method based on double experience pools DQN - Google Patents
Traffic signal lamp control method based on double experience pools DQN
- Publication number
- CN114613169A (application number CN202210415387.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- value
- experience pool
- traffic
- experience
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
- G08G1/08—Controlling traffic signals according to detected number or speed of vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a traffic signal light control method based on a double-experience-pool DQN, comprising: 1. establishing a DQN-based traffic signal control main network and a target value network; 2. initializing the relevant algorithm parameters, collecting road condition information at the intersection, and building the state value s_t; 3. inputting s_t into the main network and selecting the action a_t with the maximum Q value; 4. executing a_t, computing the reward r_t and the state s_{t+1}, and storing (s_t, a_t, r_t, s_{t+1}) in the first experience pool; 5. if the reward r_t is greater than the historical-experience average reward, also storing (s_t, a_t, r_t, s_{t+1}) in the second experience pool; 6. generating a random number P, selecting the first experience pool with probability 1 − P and the second experience pool with probability P, randomly sampling from the selected experience pool, and training the parameters of the main network by minimizing the loss function; 7. periodically updating the parameters of the target value network, updating s_t according to the current road condition information, and returning to step 3. The method enables the algorithm to converge quickly, so the obtained traffic signal control strategy is rapidly optimized.
Description
Technical Field
The invention belongs to the technical field of traffic signal control, and in particular relates to a traffic signal control method based on dual-experience-pool deep Q-learning.
Background Art
There has been extensive research on traffic signal control based on the deep Q-learning algorithm (DQN). This approach requires no labeled test data; instead, training data are built by maintaining an experience pool. The policy obtained at the beginning of the algorithm is poor, and as the experience pool is continuously updated and training proceeds, the policy is gradually optimized and keeps improving. Therefore, how to make the algorithm converge quickly, that is, how to optimize the policy rapidly, is an important factor affecting the overall performance of the method.
Summary of the Invention
Purpose of the invention: the present invention provides a traffic signal control method based on a dual-experience-pool DQN, which enables the algorithm to converge quickly so that the obtained traffic signal control strategy is rapidly optimized.
Technical solution: the present invention adopts the following technical solution:
A traffic signal control method based on a dual-experience-pool DQN, comprising the steps of:
S1. Establish a DQN-based traffic signal control main network and a target value network. The main network and the target value network have the same structure: the input is a state value, and the output is the maximum Q value over the actions that can be executed under the input state value, together with the action corresponding to that maximum Q value. The state space of the main network and the target value network is the vector formed by the numbers of vehicles in each lane of the intersection, the action space is the vector formed by the control operations on the current phases of all traffic lights at the intersection, and the reward function is the difference between the number of vehicles in all incoming lanes and the number of vehicles in all outgoing lanes at the intersection.
S2. Randomly initialize the parameter θ of the main network, initialize the parameter θ′ of the target value network to θ, initialize the time step t = 0, collect the road condition information of the intersection, build the initial state value s_t, and initialize the historical-experience average reward r̄.
S3. Input s_t into the main network, and select the action a_t that maximizes Q(s_t, a; θ) as the control operation on the traffic lights at the current time, i.e. a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network for state s_t and action a under the parameters θ.
S4. Execute action a_t, compute the reward r_t and the state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the first experience pool.
S5. When t > 0, compute the current historical-experience average reward r̄_t; if the reward r_t is greater than the historical-experience average reward, also store (s_t, a_t, r_t, s_{t+1}) in the second experience pool.
S6. Generate a random number P in the interval (p_1, p_2), select the first experience pool with probability 1 − P and the second experience pool with probability P, randomly sample B records from the selected experience pool, and train the parameters θ of the main network by minimizing the loss function; p_1 and p_2 are the preset lower and upper bounds of the interval, with 0 < p_1 < p_2 < 1.
The loss function is

L(θ) = (r_i + γ·max_a′ Q′(s_{i+1}, a′; θ′) − max_a Q(s_i, a; θ))²,

where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_a′ Q′(s_{i+1}, a′; θ′) denotes the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) denotes the maximum Q value output by the main network for input state s_i.
S7. Increase t by one; if mod(t, C) is 0, update the parameter θ′ of the target value network to the parameter θ of the main network, where mod is the modulo operation and C is the preset parameter-update time step. Update s_t according to the current road condition information and jump to step S3 to continue execution.
Further, in step S6 the parameters of the main network are obtained by minimizing the loss function with the gradient descent method.
Further, when the intersection is a crossroads, the state value in the state space of the main network and the target value network is [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles in the j-th incoming lane of the crossroads and m_j is the number of vehicles in the j-th outgoing lane, j = 1, 2, 3, 4.
Further, the action values in the action space of the main network and the target value network take three values: ac_1: increase the current phase duration by T seconds; ac_2: decrease the current phase duration by T seconds; ac_3: keep the current phase duration unchanged. In the present invention, T is 5 seconds.
Further, in the present invention the lower bound of the interval for generating the random number P is p_1 = 0.7 and the upper bound is p_2 = 0.9.
Further, the reward function value is r_t = Σ_j n_j − Σ_j m_j, where the sums run over all incoming lanes and all outgoing lanes of the intersection, respectively.
Further, the first experience pool and the second experience pool both use fixed-capacity queues to store records.
Further, step S5 computes the current historical-experience average reward as r̄_t = ((t − 1)·r̄_{t−1} + r_t) / t.
Beneficial effects: the traffic signal control method disclosed in the present invention combines dual experience pools with DQN. The dual-experience-pool mechanism enables the training of the network parameters to converge quickly, so the obtained traffic signal control strategy is rapidly optimized, thereby better realizing intelligent control of the traffic lights.
Brief Description of the Drawings
Fig. 1 is a flowchart of the traffic signal control method disclosed in the present invention;
Fig. 2 is a schematic diagram of the intersection in the embodiment;
Fig. 3 is a schematic diagram of the network architecture of the present invention.
Detailed Description of the Embodiments
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
The present invention discloses a traffic signal light control method based on a double-experience-pool DQN. As shown in Fig. 1, the method comprises the following steps:
S1. Establish a DQN-based traffic signal control main network and a target value network. The main network and the target value network have the same structure: the input is a state value, and the output is the Q values of the various actions executed under the input state value. The state space of the main network and the target value network is the vector formed by the numbers of vehicles in each lane of the intersection, the action space is the vector formed by the control operations on the current phases of all traffic lights at the intersection, and the reward function is the difference between the number of vehicles in all incoming lanes and the number of vehicles in all outgoing lanes at the intersection.
When the intersection is a crossroads, as shown by intersection A in Fig. 2, each of its four approaches has lanes entering the intersection and lanes leaving it; in the figure, N1–N4 are the lanes entering the intersection and M1–M4 are the lanes leaving it. The state value in the state space of the main network and the target value network is then [n_1, m_1, n_2, m_2, n_3, m_3, n_4, m_4], where n_j is the number of vehicles in the j-th incoming lane of the crossroads and m_j is the number of vehicles in the j-th outgoing lane, j = 1, 2, 3, 4. These data can be captured by sensors or cameras installed on the roads in each direction. The reward function value is r_t = (n_1 + n_2 + n_3 + n_4) − (m_1 + m_2 + m_3 + m_4), i.e., the difference between the number of vehicles in the incoming lanes and the number of vehicles in the outgoing lanes. The action values in the action space of the main network and the target value network take three values: ac_1: increase the current phase duration by T seconds; ac_2: decrease the current phase duration by T seconds; ac_3: keep the current phase duration unchanged, i.e., change the state of the current phase according to the preset traffic light phase sequence.
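As an illustration of the state and reward definitions above, the following Python sketch builds the state vector and reward from per-lane vehicle counts. It is a minimal sketch under assumptions: the helper names, the plain-list inputs and the example detector readings are illustrative and are not taken from the patent.

```python
import numpy as np

# T = 5 s in the embodiment; the three actions adjust or keep the current phase duration.
ACTIONS = ("ac1: current phase +T", "ac2: current phase -T", "ac3: keep current phase")

def build_state(incoming, outgoing):
    """Interleave per-lane counts into [n1, m1, n2, m2, n3, m3, n4, m4]."""
    assert len(incoming) == len(outgoing) == 4
    state = []
    for n_j, m_j in zip(incoming, outgoing):
        state.extend([n_j, m_j])
    return np.asarray(state, dtype=np.float32)

def reward(incoming, outgoing):
    """Difference between vehicles on all incoming lanes and all outgoing lanes."""
    return float(sum(incoming) - sum(outgoing))

# Hypothetical detector readings (not from the patent):
s_t = build_state([3, 5, 2, 4], [1, 2, 6, 0])   # -> 8-dimensional state vector
r_t = reward([3, 5, 2, 4], [1, 2, 6, 0])        # -> 5.0
```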
S2. Randomly initialize the parameter θ of the main network, initialize the parameter θ′ of the target value network to θ, initialize the time step t = 0, collect the road condition information of the intersection, build the initial state value s_t, and initialize the historical-experience average reward r̄.
S3. Input s_t into the main network, and select the action a_t that maximizes Q(s_t, a; θ) as the control operation on the traffic lights at the current time, i.e. a_t = argmax_a Q(s_t, a; θ), where Q(s_t, a; θ) denotes the Q value output by the main network for state s_t and action a under the parameters θ.
S4. Execute action a_t, compute the reward r_t and the state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the first experience pool.
S5. When t > 0, compute the current historical-experience average reward r̄_t; if the reward r_t is greater than the historical-experience average reward, also store (s_t, a_t, r_t, s_{t+1}) in the second experience pool.
That is, the current historical-experience average reward is computed from the average reward r̄_{t−1} of the previous time step, the current time step number t and the reward r_t, as r̄_t = ((t − 1)·r̄_{t−1} + r_t) / t.
In the present invention, both the first experience pool and the second experience pool use fixed-capacity queues to store records. When a queue is full, the record at the head of the queue is deleted and the new record is stored at the tail of the queue, thereby updating the experience pool.
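A minimal sketch of the two fixed-capacity pools and the storage rule of steps S4–S5 is given below. The class name, the default capacity and the running-average bookkeeping are assumptions for illustration; in particular, comparing r_t against the running average that already includes r_t is one possible reading of step S5.

```python
from collections import deque

class DualExperiencePool:
    """Two fixed-capacity FIFO queues; the second keeps only above-average-reward transitions."""

    def __init__(self, capacity=10000):
        self.pool1 = deque(maxlen=capacity)  # every transition (step S4)
        self.pool2 = deque(maxlen=capacity)  # transitions whose reward beats the average (step S5)
        self.avg_reward = 0.0                # historical-experience average reward
        self.t = 0                           # number of stored transitions

    def store(self, s_t, a_t, r_t, s_next):
        record = (s_t, a_t, r_t, s_next)
        self.pool1.append(record)
        # incremental average: r_bar_t = ((t - 1) * r_bar_{t-1} + r_t) / t
        self.t += 1
        self.avg_reward += (r_t - self.avg_reward) / self.t
        if self.t > 1 and r_t > self.avg_reward:
            self.pool2.append(record)
```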
S6. Generate a random number P in the interval (p_1, p_2), select the first experience pool with probability 1 − P and the second experience pool with probability P; randomly sample B records from the selected experience pool, and train the parameters θ of the main network by minimizing the loss function; p_1 and p_2 are the preset lower and upper bounds of the interval, with 0 < p_1 < p_2 < 1. In this embodiment, p_1 = 0.7 and p_2 = 0.9, so the probability of selecting the second experience pool is greater than that of selecting the first. Because the records in the second experience pool have larger rewards, they perform better than those in the first pool, and training on records from the second pool accelerates convergence. The first experience pool is still selected with the lower probability 1 − P in order to reduce the probability of the network overfitting.
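A minimal sketch of this pool-selection and sampling rule, assuming the two pools are sequences of transition tuples (for example the deques above); the function name, the defaults and the fallback when the chosen pool is still too small are illustrative additions.

```python
import random

def sample_batch(pool1, pool2, batch_size, p1=0.7, p2=0.9):
    """Draw P from (p1, p2), pick pool2 with probability P (pool1 otherwise), then sample a minibatch."""
    p = random.uniform(p1, p2)
    pool = pool2 if random.random() < p else pool1
    if len(pool) < batch_size:                      # fallback to the other pool (an added assumption)
        pool = pool1 if pool is pool2 else pool2
    return random.sample(list(pool), min(batch_size, len(pool)))
```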
The loss function is

L(θ) = (r_i + γ·max_a′ Q′(s_{i+1}, a′; θ′) − max_a Q(s_i, a; θ))²,

where (s_i, a_i, r_i, s_{i+1}) is a record randomly sampled from the selected experience pool, γ is the discount factor, max_a′ Q′(s_{i+1}, a′; θ′) denotes the maximum Q value output by the target value network for input state s_{i+1}, and max_a Q(s_i, a; θ) denotes the maximum Q value output by the main network for input state s_i.
In this embodiment, the gradient descent method is used to minimize the loss function and obtain the parameters of the main network. Fig. 3 is a schematic diagram of the network architecture of the present invention.
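The following PyTorch sketch shows one possible training step that minimizes the loss above by gradient descent. The network architecture (a small fully connected net), the optimizer, the discount factor and all hyperparameters are assumptions for illustration and are not specified by the patent.

```python
import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping the 8-dimensional state to Q values for the 3 actions."""
    def __init__(self, state_dim=8, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_step(main_net, target_net, batch, optimizer, gamma=0.9):
    """One gradient step on L(θ) = (r + γ·max_a' Q'(s', a'; θ') − max_a Q(s, a; θ))²."""
    s, a, r, s_next = zip(*batch)          # a is unused: the patent's loss uses max_a Q, not Q(s, a)
    s = torch.as_tensor(np.stack(s), dtype=torch.float32)
    r = torch.as_tensor(np.asarray(r), dtype=torch.float32)
    s_next = torch.as_tensor(np.stack(s_next), dtype=torch.float32)

    with torch.no_grad():                  # target term uses the target network, θ' held fixed
        target = r + gamma * target_net(s_next).max(dim=1).values
    q_max = main_net(s).max(dim=1).values  # max_a Q(s_i, a; θ) from the main network
    loss = ((target - q_max) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage: optimizer = torch.optim.SGD(main_net.parameters(), lr=1e-3)
```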
S7. Increase t by one; if mod(t, C) is 0, update the parameter θ′ of the target value network to the parameter θ of the main network, where mod is the modulo operation and C is the preset parameter-update time step. The frequency with which the target value network parameters are updated can be controlled through the duration between time t − 1 and time t and the value of C. Update s_t according to the current road condition information and jump to step S3 to continue execution.
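A minimal sketch of the periodic synchronization in step S7, assuming the networks are PyTorch modules as in the previous sketch; the interval C = 100 is an illustrative value, not one given by the patent.

```python
def maybe_sync_target(t, main_net, target_net, C=100):
    """Copy the main network parameters θ into θ' whenever mod(t, C) == 0 (step S7)."""
    if t % C == 0:
        target_net.load_state_dict(main_net.state_dict())
```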
The dual-experience-pool form adopted in the present invention accelerates the convergence of the network during DQN training, thereby better alleviating traffic congestion and promoting the development of the fields of intelligent transportation and deep reinforcement learning.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415387.1A CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415387.1A CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114613169A true CN114613169A (en) | 2022-06-10 |
CN114613169B CN114613169B (en) | 2023-02-28 |
Family
ID=81870213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210415387.1A Active CN114613169B (en) | 2022-04-20 | 2022-04-20 | Traffic signal lamp control method based on double experience pools DQN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613169B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115643623A (en) * | 2022-10-17 | 2023-01-24 | 北京航空航天大学 | A Wireless Ad Hoc Network Device Routing Method Based on Deep Q-Learning |
CN115758705A (en) * | 2022-11-10 | 2023-03-07 | 北京航天驭星科技有限公司 | Modeling method, model and acquisition method of satellite north-south conservation strategy model |
CN117010482A (en) * | 2023-07-06 | 2023-11-07 | 三峡大学 | Strategy method based on double experience pool priority sampling and DuelingDQN implementation |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109559530A (en) * | 2019-01-07 | 2019-04-02 | 大连理工大学 | A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning |
CN110060475A (en) * | 2019-04-17 | 2019-07-26 | 清华大学 | A kind of multi-intersection signal lamp cooperative control method based on deeply study |
CN110930734A (en) * | 2019-11-30 | 2020-03-27 | 天津大学 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
CN111696370A (en) * | 2020-06-16 | 2020-09-22 | 西安电子科技大学 | Traffic light control method based on heuristic deep Q network |
CN111898211A (en) * | 2020-08-07 | 2020-11-06 | 吉林大学 | Intelligent vehicle speed decision method and simulation method based on deep reinforcement learning |
CN112632858A (en) * | 2020-12-23 | 2021-04-09 | 浙江工业大学 | Traffic light signal control method based on Actor-critical frame deep reinforcement learning algorithm |
CN113411099A (en) * | 2021-05-28 | 2021-09-17 | 杭州电子科技大学 | Double-change frequency hopping pattern intelligent decision method based on PPER-DQN |
CN113947928A (en) * | 2021-10-15 | 2022-01-18 | 河南工业大学 | Traffic signal lamp timing method based on combination of deep reinforcement learning and extended Kalman filtering |
CN113963553A (en) * | 2021-10-20 | 2022-01-21 | 西安工业大学 | Road intersection signal lamp green signal ratio control method, device and equipment |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109559530A (en) * | 2019-01-07 | 2019-04-02 | 大连理工大学 | A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning |
CN110060475A (en) * | 2019-04-17 | 2019-07-26 | 清华大学 | A kind of multi-intersection signal lamp cooperative control method based on deeply study |
CN110930734A (en) * | 2019-11-30 | 2020-03-27 | 天津大学 | Intelligent idle traffic indicator lamp control method based on reinforcement learning |
CN111696370A (en) * | 2020-06-16 | 2020-09-22 | 西安电子科技大学 | Traffic light control method based on heuristic deep Q network |
CN111898211A (en) * | 2020-08-07 | 2020-11-06 | 吉林大学 | Intelligent vehicle speed decision method and simulation method based on deep reinforcement learning |
CN112632858A (en) * | 2020-12-23 | 2021-04-09 | 浙江工业大学 | Traffic light signal control method based on Actor-critical frame deep reinforcement learning algorithm |
CN113411099A (en) * | 2021-05-28 | 2021-09-17 | 杭州电子科技大学 | Double-change frequency hopping pattern intelligent decision method based on PPER-DQN |
CN113947928A (en) * | 2021-10-15 | 2022-01-18 | 河南工业大学 | Traffic signal lamp timing method based on combination of deep reinforcement learning and extended Kalman filtering |
CN113963553A (en) * | 2021-10-20 | 2022-01-21 | 西安工业大学 | Road intersection signal lamp green signal ratio control method, device and equipment |
Non-Patent Citations (4)
Title |
---|
WANG H et al.: "Value-based deep reinforcement learning for adaptive isolated intersection signal control", IET Intelligent Transport Systems *
DING Wenjie: "Research on adaptive traffic signal control based on deep reinforcement learning", China Master's Theses Full-text Database, Engineering Science and Technology II *
XU Dongwei et al.: "A survey of urban traffic signal control based on deep reinforcement learning", Journal of Transportation Engineering and Information *
GAN Zhengsheng et al.: "Few-shot remote sensing image classification based on meta-learning", Computer Engineering and Design *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115643623A (en) * | 2022-10-17 | 2023-01-24 | 北京航空航天大学 | A Wireless Ad Hoc Network Device Routing Method Based on Deep Q-Learning |
CN115758705A (en) * | 2022-11-10 | 2023-03-07 | 北京航天驭星科技有限公司 | Modeling method, model and acquisition method of satellite north-south conservation strategy model |
CN117010482A (en) * | 2023-07-06 | 2023-11-07 | 三峡大学 | Strategy method based on double experience pool priority sampling and DuelingDQN implementation |
Also Published As
Publication number | Publication date |
---|---|
CN114613169B (en) | 2023-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114613169A (en) | Traffic signal lamp control method based on double experience pools DQN | |
CN109559530B (en) | A collaborative control method for multi-intersection signal lights based on Q-value transfer deep reinforcement learning | |
CN110264750B (en) | Multi-intersection signal lamp cooperative control method based on Q value migration of multi-task deep Q network | |
CN110047278A (en) | A kind of self-adapting traffic signal control system and method based on deeply study | |
CN114627657A (en) | Adaptive traffic signal control method based on deep graph reinforcement learning | |
WO2021051870A1 (en) | Reinforcement learning model-based information control method and apparatus, and computer device | |
CN110136456A (en) | Traffic light anti-jamming control method and system based on deep reinforcement learning | |
CN110570672B (en) | A method of regional traffic light control based on graph neural network | |
CN111260937A (en) | Cross traffic signal lamp control method based on reinforcement learning | |
CN108335497A (en) | A kind of traffic signals adaptive control system and method | |
CN113554875B (en) | Variable speed-limiting control method for heterogeneous traffic flow of expressway based on edge calculation | |
CN106097733B (en) | A kind of traffic signal optimization control method based on Policy iteration and cluster | |
CN111951574A (en) | Adaptive Iterative Learning Control Method for Traffic Signals Based on Attenuated Memory Removal | |
CN115019523A (en) | Deep reinforcement learning traffic signal coordination optimization control method based on minimized pressure difference | |
CN113299079B (en) | Regional intersection signal control method based on PPO and graph convolution neural network | |
CN115691167A (en) | Single-point traffic signal control method based on intersection holographic data | |
CN115359672A (en) | A Traffic Area Boundary Control Method Combining Data-Driven and Reinforcement Learning | |
CN115472023B (en) | Intelligent traffic light control method and device based on deep reinforcement learning | |
CN116524745A (en) | Cloud edge cooperative area traffic signal dynamic timing system and method | |
CN115578870A (en) | A Traffic Signal Control Method Based on Proximal Policy Optimization | |
CN118172951A (en) | Urban intersection signal control method based on deep reinforcement learning | |
CN114419884B (en) | Self-adaptive signal control method and system based on reinforcement learning and phase competition | |
CN116597670A (en) | Traffic signal lamp timing method, device and equipment based on deep reinforcement learning | |
CN113724507B (en) | Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning | |
WO2023206248A1 (en) | Control method and apparatus for traffic light, and road network system, electronic device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |