CN117708594A - Deep reinforcement learning traffic light control method - Google Patents

Deep reinforcement learning traffic light control method

Info

Publication number
CN117708594A
CN117708594A (application CN202311717211.2A)
Authority
CN
China
Prior art keywords
traffic
reinforcement learning
deep reinforcement
traffic light
data set
Prior art date
Legal status
Pending
Application number
CN202311717211.2A
Other languages
Chinese (zh)
Inventor
孔燕
李颖
杨智超
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202311717211.2A
Publication of CN117708594A
Legal status: Pending

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention discloses a deep reinforcement learning traffic light control method comprising the following steps: (1) preprocessing urban traffic network data; (2) constructing a model with the Multi-step DQN algorithm from the preprocessed data; (3) accumulating n single-step experiences and learning from the accumulated experience; (4) updating the network parameters of the Multi-step DQN; (5) combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model; (6) importing the traffic data set and the traffic flow data set into the deep reinforcement learning model, training, and recording the experimental results; (7) comparing the experimental results of step (2) and step (5); (8) performing visual display. Compared with traditional control methods, MALight performs better at reducing the average travel time of vehicles and improving the average throughput of intersections.

Description

Deep reinforcement learning traffic light control method
Technical Field
The invention relates to the technical fields of deep reinforcement learning and intelligent transportation, and in particular to a deep reinforcement learning traffic light control method.
Background
With the rapid growth in the number of motor vehicles in China and the rapid pace of urbanization, traffic demand and traffic volume have increased sharply, and traffic congestion has become a complex worldwide problem. Data from the Ministry of Public Security show that by August 2022 the number of motor vehicles nationwide had reached 408 million. The continued growth of vehicles and the limited expansion of urban roads are the main causes of urban traffic congestion. To resolve the contradiction between vehicles and road capacity, methods such as limiting the growth in the number of vehicles and increasing road infrastructure construction are generally employed. However, both methods have their limitations. Traffic light control systems are another way to address this contradiction. At present, the traffic lights in most cities are still traditional fixed-time traffic lights; when the traffic flow is too large, they cannot process it effectively and thus cannot relieve the traffic congestion problem.
Conventional Traffic Signal Control (TSC) can be divided into three major categories: timing control, actuated (induction) control, and adaptive control. Timing control is the most common traffic signal control scheme; it sets the time interval of each phase to a constant value and is used under steady traffic conditions. Actuated control decides, based on real-time traffic flow and pre-defined timing rules, whether to keep or change the current phase. Adaptive control is among the most effective traffic signal control methods; it essentially uses advanced algorithms to train a network model to convergence and thereby regulate traffic signals intelligently. Max pressure is one of the advanced ideas in this field: a suitable phase is selected according to road conditions so as to minimize the traffic pressure.
Currently, adaptive traffic lights are an effective way to reduce traffic congestion. Compared with traditional traffic lights, adaptive traffic lights have the following advantages: (1) the signals can be adjusted dynamically to match actual traffic demand, so traffic flow is controlled more effectively; (2) optimizing according to real-time traffic conditions can reduce congestion and improve the passage efficiency of vehicles and pedestrians; (3) accurate traffic flow prediction and control can be realized, improving traffic safety and reliability.
Disclosure of Invention
Objective of the invention: the invention aims to provide a deep reinforcement learning traffic light control method that regulates traffic lights intelligently and scientifically according to the conditions of the roads between traffic nodes and the degree of mutual influence between adjacent nodes, which benefits research in various fields of intelligent transportation and helps alleviate the traffic congestion problem.
Technical scheme: the invention relates to a deep reinforcement learning traffic light control method comprising the following steps:
(1) Preprocessing urban traffic network data;
(2) Constructing a model by utilizing a Multi-step DQN algorithm according to the preprocessed data;
(3) Accumulating the experiences of n single steps, and learning by utilizing the accumulated experiences;
(4) Updating network parameters of the Multi-step DQN;
(5) Combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model;
(6) Importing a traffic data set and a traffic flow data set into a deep reinforcement learning model, training, and recording experimental results;
(7) Comparing the experimental results in step (2) and step (5);
(8) Performing visual display.
Further, the step (1) includes the following steps:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
Further, the step (2) specifically includes the following steps:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3);
further, the step (3) specifically includes the following steps:
the expression of the accumulated rewards, i.e. the multi-step rewards, is:
training is carried out through the model, and the average passing time of vehicles and the maximum passing quantity of the intersections are recorded.
Further, the step (4) specifically includes the following steps:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
further, the step (5) specifically comprises the following steps: selecting experiences with higher similarity for playback by comparing the state distribution in the experience pool with the current state distribution;
in the step (7), the average passing time of the vehicles in the step (2) and the step (5) and the passing quantity of the crossing are compared to generate a return visit file.
Furthermore, the step (8) imports the replay file generated during the training in step (7) into the Cityflow platform for display.
The device of the invention comprises a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements any one of the above deep reinforcement learning traffic light control methods.
The storage medium of the invention stores a computer program designed to implement any one of the above deep reinforcement learning traffic light control methods when run.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: (1) the network framework of the DQN is improved by accumulating n single-step experiences into one experience and learning from the accumulated experience, which speeds up network convergence; (2) Attentive experience replay is combined with the DQN network so that experiences similar to the current state are learned preferentially and the agent learns a better strategy; (3) compared with traditional control methods, MALight performs better at reducing the average travel time of vehicles and improving the average throughput of intersections.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of an intersection according to the present invention;
FIG. 3 is a schematic diagram of the deep reinforcement learning framework constructed by the Multi-step DQN algorithm of the present invention;
FIG. 4 is a schematic diagram of Attentive experience replay selecting experiences according to the present invention;
FIG. 5 is a schematic diagram of the visual display on the Cityflow platform according to the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a method for controlling a deep reinforcement learning traffic light, including the following steps:
(1) Preprocessing urban traffic network data; the method comprises the following steps:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
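Step (13) above (data cleaning followed by data extraction) can be sketched as below; the record field names are illustrative assumptions, not specified by the patent:

```python
def clean_and_extract(records, required=("node_id", "timestamp", "vehicle_count")):
    """Data cleaning: drop records missing any required field;
    data extraction: keep only the fields used by the model."""
    cleaned = [r for r in records if all(r.get(k) is not None for k in required)]
    return [{k: r[k] for k in required} for r in cleaned]
```

In practice the same pass would be run once over the traffic data set and once over the traffic flow data set before training.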
As shown in fig. 2, (2) constructing a model using a Multi-step DQN algorithm based on the preprocessed data; the method comprises the following steps:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3);
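The pressure-based reward of equations (1)-(3) can be sketched as follows, assuming per-movement vehicle counts are already available (the function names are illustrative):

```python
def movement_pressure(n_in, n_out):
    """Pressure of one traffic movement: the difference between the
    numbers of vehicles entering and leaving the lane (eq. 1)."""
    return n_in - n_out

def intersection_reward(movements):
    """Total intersection reward: the sum over all traffic movements of
    the negative pressure r_i = -P_i (eqs. 2-3).

    `movements` is an iterable of (n_in, n_out) counts per movement.
    """
    return sum(-movement_pressure(n_in, n_out) for n_in, n_out in movements)
```

Minimizing this pressure is the max pressure idea mentioned in the background: the agent is rewarded for balancing incoming and outgoing vehicle counts.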
as shown in fig. 3, (3) n single-step experiences are accumulated, and learning is performed by using the accumulated experiences; the method comprises the following steps:
the expression of the accumulated rewards, i.e. the multi-step rewards, is:
training is carried out through the model, and the average passing time of vehicles and the maximum passing quantity of the intersections are recorded.
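A minimal sketch of accumulating n single-step transitions into one multi-step experience, assuming a standard discounted n-step return (class and argument names are illustrative):

```python
from collections import deque

class MultiStepBuffer:
    """Accumulates n single-step transitions into one n-step experience.

    Emits (s_t, a_t, R, s_{t+n}, done), where R is the discounted sum
    of the n buffered rewards (the multi-step reward).
    """
    def __init__(self, n=3, gamma=0.99):
        self.n, self.gamma = n, gamma
        self.steps = deque()

    def append(self, state, action, reward, next_state, done):
        self.steps.append((state, action, reward, next_state, done))
        if len(self.steps) < self.n and not done:
            return None  # not enough single steps accumulated yet
        # discounted sum of the buffered single-step rewards
        R = sum(self.gamma ** k * s[2] for k, s in enumerate(self.steps))
        s0, a0 = self.steps[0][0], self.steps[0][1]
        experience = (s0, a0, R, next_state, done)
        self.steps.popleft()
        if done:
            self.steps.clear()
        return experience
```

Each emitted experience would then be stored in the replay pool and learned from in place of a single-step transition.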
(4) Updating the network parameters of the Multi-step DQN; the method comprises the following steps:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
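The multi-step target and the mean-square-error loss of step (4) can be sketched numerically as below; this assumes a non-terminal bootstrap state and takes the target network's Q-values as plain arrays (function names are illustrative):

```python
import numpy as np

def multi_step_target(rewards, q_next, gamma=0.99):
    """Multi-step target value: the discounted sum of the n rewards plus
    gamma^n times the target network's best Q-value at s_{t+n}."""
    n = len(rewards)
    multi_step_reward = sum(gamma ** k * r for k, r in enumerate(rewards))
    return multi_step_reward + gamma ** n * max(q_next)

def mse_loss(targets, q_taken):
    """Mean-square-error loss between the target values y_t and the
    online network's Q(s_t, a_t); its gradient drives the parameter update."""
    targets, q_taken = np.asarray(targets), np.asarray(q_taken)
    return float(np.mean((targets - q_taken) ** 2))
```

In a full implementation the gradient of this loss with respect to the online network parameters would be taken by a deep learning framework's autograd rather than by hand.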
as shown in fig. 4, (5) combining Attentive experience replay with DQN network to construct a deep reinforcement learning model; the method comprises the following steps: selecting experiences with higher similarity for playback by comparing the state distribution in the experience pool with the current state distribution;
as shown in fig. 5, (6) introducing the traffic data set and the traffic flow data set into a deep reinforcement learning model for training, and recording experimental results;
(7) Comparing the experimental results of step (2) and step (5); the method comprises the following step: comparing the average travel time of vehicles and the intersection throughput of step (2) and step (5) to generate a replay file.
(8) Performing visual display; the method comprises the following step: importing the replay file generated during the training in step (7) into the Cityflow platform for display.
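For the CityFlow display, the simulator writes the replay files when replay saving is enabled in its config. A minimal sketch of such a config follows; the key names follow the CityFlow config format, while the file names and values are illustrative assumptions:

```python
def make_cityflow_config(dir_path="./data/", save_replay=True):
    """Build a minimal CityFlow config dict that saves a replay
    for later visualization in the CityFlow front-end."""
    return {
        "interval": 1.0,
        "seed": 0,
        "dir": dir_path,
        "roadnetFile": "roadnet_4x4.json",   # 4x4 grid road network (assumed name)
        "flowFile": "flow_4x4.json",         # traffic flow data set (assumed name)
        "rlTrafficLight": True,              # phases controlled by the agent
        "saveReplay": save_replay,
        "roadnetLogFile": "replay_roadnet.json",
        "replayLogFile": "replay.txt",
    }

# With CityFlow installed, a simulation would then be driven via, e.g.:
#   eng = cityflow.Engine(config_path, thread_num=1)
#   eng.next_step()
# and the two replay log files loaded in the CityFlow web front-end.
```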
The embodiment of the invention also provides a device comprising a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements any one of the above deep reinforcement learning traffic light control methods.
The embodiment of the invention also provides a storage medium storing a computer program designed to implement the above deep reinforcement learning traffic light control method when run.

Claims (10)

1. A method for controlling a deep reinforcement learning traffic light, comprising the steps of:
(1) Preprocessing urban traffic network data;
(2) Constructing a model by utilizing a Multi-step DQN algorithm according to the preprocessed data;
(3) Accumulating the experiences of n single steps, and learning by utilizing the accumulated experiences;
(4) Updating network parameters of the Multi-step DQN;
(5) Combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model;
(6) Importing a traffic data set and a traffic flow data set into a deep reinforcement learning model, training, and recording experimental results;
(7) Comparing the experimental results in step (2) and step (5);
(8) Performing visual display.
2. The deep reinforcement learning traffic light control method according to claim 1, wherein the step (1) comprises the steps of:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
3. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (2) is specifically as follows:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3).
4. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (3) is specifically as follows:
the expression of the accumulated reward, i.e. the multi-step reward, is:
R_t^(n) = Σ_{k=0}^{n-1} γ^k r_{t+k} (4)
where γ is the discount factor;
training is carried out with the model, and the average travel time of vehicles and the maximum throughput of the intersections are recorded.
5. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (4) is specifically as follows:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
6. The method for controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (5) is specifically as follows: selecting experiences with higher similarity for replay by comparing the state distribution in the experience pool with the current state distribution.
7. The method for controlling the deep reinforcement learning traffic light according to claim 1, wherein in the step (7), the average travel time of vehicles and the intersection throughput of step (2) and step (5) are compared to generate a replay file.
8. The method for controlling the deep reinforcement learning traffic light according to claim 1, wherein the step (8) imports the replay file generated during the training in step (7) into the Cityflow platform for display.
9. A device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor implements the deep reinforcement learning traffic light control method according to any one of claims 1-8 when executing the program.
10. A storage medium storing a computer program, wherein the computer program is designed to implement the deep reinforcement learning traffic light control method according to any one of claims 1-8 when run.
CN202311717211.2A 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method Pending CN117708594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311717211.2A CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311717211.2A CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Publications (1)

Publication Number Publication Date
CN117708594A true CN117708594A (en) 2024-03-15

Family

ID=90156482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311717211.2A Pending CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Country Status (1)

Country Link
CN (1) CN117708594A (en)

Similar Documents

Publication Publication Date Title
CN100444210C (en) Mixed controlling method of single dot signal controlling crossing
CN110264750B (en) Multi-intersection signal lamp cooperative control method based on Q value migration of multi-task deep Q network
CN110047278B (en) Adaptive traffic signal control system and method based on deep reinforcement learning
CN109840641B (en) Method for quickly optimizing train multi-section operation curve
CN110570672B (en) Regional traffic signal lamp control method based on graph neural network
CN105118308B (en) Urban road intersection traffic signal optimization method based on cluster intensified learning
CN107393319B (en) Signal optimization control method for preventing single cross port queuing overflow
CN112201060B (en) Actor-Critic-based single-intersection traffic signal control method
CN110718077B (en) Signal lamp optimization timing method under action-evaluation mechanism
CN112365714B (en) Traffic signal control method for intersection of intelligent rail passing main branch road
WO2022188387A1 (en) Multi-model learning particle swarm-based intelligent city signal light timing optimization method
CN105046990A (en) Pavement signal lamp control method between adjacent intersections based on particle swarm algorithm
CN113223305A (en) Multi-intersection traffic light control method and system based on reinforcement learning and storage medium
CN114613169B (en) Traffic signal lamp control method based on double experience pools DQN
CN111524345A (en) Induction control method for multi-objective optimization under constraint of real-time queuing length of vehicle
CN116524745B (en) Cloud edge cooperative area traffic signal dynamic timing system and method
CN115083149B (en) Reinforced learning variable duration signal lamp control method for real-time monitoring
CN117708594A (en) Deep reinforcement learning traffic light control method
CN116824848A (en) Traffic signal optimization control method based on Bayesian deep Q network
CN116758768A (en) Dynamic regulation and control method for traffic lights of full crossroad
CN115472023B (en) Intelligent traffic light control method and device based on deep reinforcement learning
Luo et al. Researches on intelligent traffic signal control based on deep reinforcement learning
CN113096415B (en) Signal coordination optimization control method for secondary pedestrian crossing intersection
CN115705771A (en) Traffic signal control method based on reinforcement learning
CN117275260B (en) Emergency control method for urban road intersection entrance road traffic accident

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination