CN117708594A - Deep reinforcement learning traffic light control method - Google Patents

Deep reinforcement learning traffic light control method

Info

Publication number
CN117708594A
CN117708594A (application CN202311717211.2A)
Authority
CN
China
Prior art keywords
traffic
reinforcement learning
deep reinforcement
traffic light
data set
Prior art date
Legal status
Pending
Application number
CN202311717211.2A
Other languages
Chinese (zh)
Inventor
孔燕
李颖
杨智超
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202311717211.2A
Publication of CN117708594A
Legal status: Pending

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention discloses a deep reinforcement learning traffic light control method comprising the following steps: (1) preprocessing urban traffic network data; (2) constructing a model with the Multi-step DQN algorithm from the preprocessed data; (3) accumulating n single-step experiences and learning from the accumulated experience; (4) updating the network parameters of the Multi-step DQN; (5) combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model; (6) importing the traffic data set and the traffic flow data set into the deep reinforcement learning model, training, and recording the experimental results; (7) comparing the experimental results of step (2) and step (5); (8) performing visual display. Compared with traditional control methods, MALight performs better at reducing the average travel time of vehicles and improving the average throughput of intersections.

Description

Deep reinforcement learning traffic light control method
Technical Field
The invention relates to the technical fields of deep reinforcement learning and intelligent transportation, and in particular to a deep reinforcement learning traffic light control method.
Background
With the rapid growth in the number of motor vehicles in China and the rapid pace of urbanization, traffic demand and traffic volume have increased sharply, and traffic congestion has become a complex worldwide problem. Data from the Ministry of Public Security show that by August 2022 the number of motor vehicles nationwide had reached 408 million. The continued growth of vehicles and the limited expansion of urban roads are the main causes of urban traffic congestion. To resolve the contradiction between vehicles and road capacity, methods such as limiting the growth in the number of vehicles and increasing road infrastructure construction are generally employed. However, both methods have their limitations. Traffic light control systems are another way to address this contradiction. At present, the traffic lights in most cities are still traditional fixed-time traffic lights; when the traffic flow is too large, they cannot process it effectively and thus cannot relieve the traffic congestion problem.
Conventional Traffic Signal Control (TSC) can be divided into three major categories: timing control, actuated (induction) control, and adaptive control. Timing control is the most common traffic signal control scheme; it sets the time interval of each phase to a constant value and is used under steady traffic conditions. Actuated control decides, based on real-time traffic flow and pre-defined timing rules, whether to keep or change the current phase. Adaptive control is among the most effective traffic signal control methods; it essentially uses advanced algorithms to train a network model to convergence and thereby regulate traffic signals intelligently. Max pressure is one of the advanced ideas in this field: a suitable phase is selected according to road conditions so as to minimize the traffic pressure.
Currently, adaptive traffic lights are an effective way to reduce traffic congestion. Compared with traditional traffic lights, adaptive traffic lights have the following advantages: (1) the signals can be adjusted dynamically to match actual traffic demand, so traffic flow is controlled more effectively; (2) optimizing according to real-time traffic conditions can reduce congestion and improve the passage efficiency of vehicles and pedestrians; (3) accurate traffic flow prediction and control can be realized, improving traffic safety and reliability.
Disclosure of Invention
Objective of the invention: the invention aims to provide a deep reinforcement learning traffic light control method that regulates traffic lights intelligently and scientifically according to the conditions of the roads between traffic nodes and the degree of mutual influence between adjacent nodes, which benefits research in various fields of intelligent transportation and helps alleviate the traffic congestion problem.
Technical scheme: the invention relates to a deep reinforcement learning traffic light control method comprising the following steps:
(1) Preprocessing urban traffic network data;
(2) Constructing a model by utilizing a Multi-step DQN algorithm according to the preprocessed data;
(3) Accumulating the experiences of n single steps, and learning by utilizing the accumulated experiences;
(4) Updating network parameters of the Multi-step DQN;
(5) Combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model;
(6) Importing a traffic data set and a traffic flow data set into a deep reinforcement learning model, training, and recording experimental results;
(7) Comparing the experimental results in step (2) and step (5);
(8) Performing visual display.
Further, the step (1) includes the following steps:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
Further, the step (2) specifically includes the following steps:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3);
further, the step (3) specifically includes the following steps:
the expression of the accumulated rewards, i.e. the multi-step rewards, is:
training is carried out through the model, and the average passing time of vehicles and the maximum passing quantity of the intersections are recorded.
Further, the step (4) specifically includes the following steps:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
further, the step (5) specifically comprises the following steps: selecting experiences with higher similarity for playback by comparing the state distribution in the experience pool with the current state distribution;
in the step (7), the average passing time of the vehicles in the step (2) and the step (5) and the passing quantity of the crossing are compared to generate a return visit file.
Furthermore, the step (8) imports the replay file generated during the training in step (7) into the Cityflow platform for display.
The device of the invention comprises a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements any one of the above deep reinforcement learning traffic light control methods.
The storage medium of the invention stores a computer program designed to implement any one of the above deep reinforcement learning traffic light control methods when run.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: (1) the network framework of the DQN is improved by accumulating n single-step experiences into one experience and learning from the accumulated experience, which speeds up network convergence; (2) Attentive experience replay is combined with the DQN network so that experiences similar to the current state are learned preferentially and the agent learns a better strategy; (3) compared with traditional control methods, MALight performs better at reducing the average travel time of vehicles and improving the average throughput of intersections.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of an intersection according to the present invention;
FIG. 3 is a schematic diagram of the deep reinforcement learning framework constructed by the Multi-step DQN algorithm of the present invention;
FIG. 4 is a schematic diagram of Attentive experience replay selecting experiences according to the present invention;
FIG. 5 is a schematic diagram of the visual display on the Cityflow platform according to the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a method for controlling a deep reinforcement learning traffic light, including the following steps:
(1) Preprocessing urban traffic network data; the method comprises the following steps:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
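Step (13) above (data cleaning followed by data extraction) can be sketched as below; the record field names are illustrative assumptions, not specified by the patent:

```python
def clean_and_extract(records, required=("node_id", "timestamp", "vehicle_count")):
    """Data cleaning: drop records missing any required field;
    data extraction: keep only the fields used by the model."""
    cleaned = [r for r in records if all(r.get(k) is not None for k in required)]
    return [{k: r[k] for k in required} for r in cleaned]
```

In practice the same pass would be run once over the traffic data set and once over the traffic flow data set before training.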
As shown in fig. 2, (2) constructing a model using a Multi-step DQN algorithm based on the preprocessed data; the method comprises the following steps:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3);
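The pressure-based reward of equations (1)-(3) can be sketched as follows, assuming per-movement vehicle counts are already available (the function names are illustrative):

```python
def movement_pressure(n_in, n_out):
    """Pressure of one traffic movement: the difference between the
    numbers of vehicles entering and leaving the lane (eq. 1)."""
    return n_in - n_out

def intersection_reward(movements):
    """Total intersection reward: the sum over all traffic movements of
    the negative pressure r_i = -P_i (eqs. 2-3).

    `movements` is an iterable of (n_in, n_out) counts per movement.
    """
    return sum(-movement_pressure(n_in, n_out) for n_in, n_out in movements)
```

Minimizing this pressure is the max pressure idea mentioned in the background: the agent is rewarded for balancing incoming and outgoing vehicle counts.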
as shown in fig. 3, (3) n single-step experiences are accumulated, and learning is performed by using the accumulated experiences; the method comprises the following steps:
the expression of the accumulated rewards, i.e. the multi-step rewards, is:
training is carried out through the model, and the average passing time of vehicles and the maximum passing quantity of the intersections are recorded.
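A minimal sketch of accumulating n single-step transitions into one multi-step experience, assuming a standard discounted n-step return (class and argument names are illustrative):

```python
from collections import deque

class MultiStepBuffer:
    """Accumulates n single-step transitions into one n-step experience.

    Emits (s_t, a_t, R, s_{t+n}, done), where R is the discounted sum
    of the n buffered rewards (the multi-step reward).
    """
    def __init__(self, n=3, gamma=0.99):
        self.n, self.gamma = n, gamma
        self.steps = deque()

    def append(self, state, action, reward, next_state, done):
        self.steps.append((state, action, reward, next_state, done))
        if len(self.steps) < self.n and not done:
            return None  # not enough single steps accumulated yet
        # discounted sum of the buffered single-step rewards
        R = sum(self.gamma ** k * s[2] for k, s in enumerate(self.steps))
        s0, a0 = self.steps[0][0], self.steps[0][1]
        experience = (s0, a0, R, next_state, done)
        self.steps.popleft()
        if done:
            self.steps.clear()
        return experience
```

Each emitted experience would then be stored in the replay pool and learned from in place of a single-step transition.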
(4) Updating the network parameters of the Multi-step DQN; the method comprises the following steps:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
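The multi-step target and the mean-square-error loss of step (4) can be sketched numerically as below; this assumes a non-terminal bootstrap state and takes the target network's Q-values as plain arrays (function names are illustrative):

```python
import numpy as np

def multi_step_target(rewards, q_next, gamma=0.99):
    """Multi-step target value: the discounted sum of the n rewards plus
    gamma^n times the target network's best Q-value at s_{t+n}."""
    n = len(rewards)
    multi_step_reward = sum(gamma ** k * r for k, r in enumerate(rewards))
    return multi_step_reward + gamma ** n * max(q_next)

def mse_loss(targets, q_taken):
    """Mean-square-error loss between the target values y_t and the
    online network's Q(s_t, a_t); its gradient drives the parameter update."""
    targets, q_taken = np.asarray(targets), np.asarray(q_taken)
    return float(np.mean((targets - q_taken) ** 2))
```

In a full implementation the gradient of this loss with respect to the online network parameters would be taken by a deep learning framework's autograd rather than by hand.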
as shown in fig. 4, (5) combining Attentive experience replay with DQN network to construct a deep reinforcement learning model; the method comprises the following steps: selecting experiences with higher similarity for playback by comparing the state distribution in the experience pool with the current state distribution;
as shown in fig. 5, (6) introducing the traffic data set and the traffic flow data set into a deep reinforcement learning model for training, and recording experimental results;
(7) Comparing the experimental results of step (2) and step (5); the method comprises the following step: comparing the average travel time of vehicles and the intersection throughput of step (2) and step (5) to generate a replay file.
(8) Performing visual display; the method comprises the following step: importing the replay file generated during the training in step (7) into the Cityflow platform for display.
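For the CityFlow display, the simulator writes the replay files when replay saving is enabled in its config. A minimal sketch of such a config follows; the key names follow the CityFlow config format, while the file names and values are illustrative assumptions:

```python
def make_cityflow_config(dir_path="./data/", save_replay=True):
    """Build a minimal CityFlow config dict that saves a replay
    for later visualization in the CityFlow front-end."""
    return {
        "interval": 1.0,
        "seed": 0,
        "dir": dir_path,
        "roadnetFile": "roadnet_4x4.json",   # 4x4 grid road network (assumed name)
        "flowFile": "flow_4x4.json",         # traffic flow data set (assumed name)
        "rlTrafficLight": True,              # phases controlled by the agent
        "saveReplay": save_replay,
        "roadnetLogFile": "replay_roadnet.json",
        "replayLogFile": "replay.txt",
    }

# With CityFlow installed, a simulation would then be driven via, e.g.:
#   eng = cityflow.Engine(config_path, thread_num=1)
#   eng.next_step()
# and the two replay log files loaded in the CityFlow web front-end.
```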
The embodiment of the invention also provides a device comprising a memory, a processor, and a program stored in the memory and executable on the processor; when executing the program, the processor implements any one of the above deep reinforcement learning traffic light control methods.
The embodiment of the invention also provides a storage medium storing a computer program designed to implement the above deep reinforcement learning traffic light control method when run.

Claims (10)

1. A method for controlling a deep reinforcement learning traffic light, comprising the steps of:
(1) Preprocessing urban traffic network data;
(2) Constructing a model by utilizing a Multi-step DQN algorithm according to the preprocessed data;
(3) Accumulating the experiences of n single steps, and learning by utilizing the accumulated experiences;
(4) Updating network parameters of the Multi-step DQN;
(5) Combining Attentive experience replay with the DQN network to construct a deep reinforcement learning model;
(6) Importing a traffic data set and a traffic flow data set into a deep reinforcement learning model, training, and recording experimental results;
(7) Comparing the experimental results in step (2) and step (5);
(8) Performing visual display.
2. The deep reinforcement learning traffic light control method according to claim 1, wherein the step (1) comprises the steps of:
(11) Collecting information of all traffic nodes in the city to form a traffic data set;
(12) Collecting the urban road network, which comprises intersections arranged in a 4x4 grid, and collecting traffic flow data from the intersection cameras to form a traffic flow data set;
(13) Performing data cleaning on the traffic data set and the traffic flow data set for the Multi-step DQN, and then performing data extraction.
3. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (2) is specifically as follows:
(21) Setting the state as the numbers of vehicles entering and leaving the lanes, and setting the action as the regulation of the signal duration, where 0 represents keeping the current phase and 1 represents changing the current phase;
(22) Setting the reward function as the negative of the maximum pressure, where the pressure is the difference between the numbers of vehicles entering and leaving the lane; the pressure of one traffic movement is:
P_i = N_in - N_out (1)
the reward of a traffic movement is the negative of its pressure:
r_i = -P_i (2)
and the total reward of the current intersection is the sum of the rewards over all traffic movements:
R(s_t, a_t) = Σ_i r_i (3).
4. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (3) is specifically as follows:
the expression of the accumulated reward, i.e. the multi-step reward, is:
R_t^(n) = Σ_{k=0}^{n-1} γ^k r_{t+k} (4)
where γ is the discount factor;
training is carried out with the model, and the average travel time of vehicles and the maximum throughput of the intersections are recorded.
5. The method of controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (4) is specifically as follows:
updating the parameters by gradient descent, with the loss function expressed as a mean square error;
the expression of the multi-step target value is:
y_t = R_t^(n) + γ^n max_a Q(s_{t+n}, a; θ⁻) (5)
where θ⁻ denotes the parameters of the target network;
calculating the loss function from the target value, with the expression:
L(θ) = E[(y_t - Q(s_t, a_t; θ))^2] (6)
6. The method for controlling a deep reinforcement learning traffic light according to claim 1, wherein the step (5) is specifically as follows: selecting experiences with higher similarity for replay by comparing the state distribution in the experience pool with the current state distribution.
7. The method for controlling the deep reinforcement learning traffic light according to claim 1, wherein in the step (7), the average travel time of vehicles and the intersection throughput of step (2) and step (5) are compared to generate a replay file.
8. The method for controlling the deep reinforcement learning traffic light according to claim 1, wherein the step (8) imports the replay file generated during the training in step (7) into the Cityflow platform for display.
9. A device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor implements the deep reinforcement learning traffic light control method according to any one of claims 1-8 when executing the program.
10. A storage medium storing a computer program, wherein the computer program is designed to implement the deep reinforcement learning traffic light control method according to any one of claims 1-8 when run.
CN202311717211.2A 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method Pending CN117708594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311717211.2A CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311717211.2A CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Publications (1)

Publication Number Publication Date
CN117708594A true CN117708594A (en) 2024-03-15

Family

ID=90156482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311717211.2A Pending CN117708594A (en) 2023-12-13 2023-12-13 Deep reinforcement learning traffic light control method

Country Status (1)

Country Link
CN (1) CN117708594A (en)

Similar Documents

Publication Publication Date Title
CN100444210C (en) Mixed controlling method of single dot signal controlling crossing
CN110264750B (en) Multi-intersection signal lamp cooperative control method based on Q value migration of multi-task deep Q network
CN110047278B (en) Adaptive traffic signal control system and method based on deep reinforcement learning
CN109840641B (en) Method for quickly optimizing train multi-section operation curve
CN110570672B (en) Regional traffic signal lamp control method based on graph neural network
CN105118308B (en) Urban road intersection traffic signal optimization method based on cluster intensified learning
CN107393319B (en) Signal optimization control method for preventing single cross port queuing overflow
CN112201060B (en) Actor-Critic-based single-intersection traffic signal control method
CN110718077B (en) Signal lamp optimization timing method under action-evaluation mechanism
CN112365714B (en) Traffic signal control method for intersection of intelligent rail passing main branch road
WO2022188387A1 (en) Multi-model learning particle swarm-based intelligent city signal light timing optimization method
CN105046990A (en) Pavement signal lamp control method between adjacent intersections based on particle swarm algorithm
CN113223305A (en) Multi-intersection traffic light control method and system based on reinforcement learning and storage medium
CN114613169B (en) Traffic signal lamp control method based on double experience pools DQN
CN111524345A (en) Induction control method for multi-objective optimization under constraint of real-time queuing length of vehicle
CN116524745B (en) Cloud edge cooperative area traffic signal dynamic timing system and method
CN115083149B (en) Reinforced learning variable duration signal lamp control method for real-time monitoring
CN117708594A (en) Deep reinforcement learning traffic light control method
CN116824848A (en) Traffic signal optimization control method based on Bayesian deep Q network
CN116758768A (en) Dynamic regulation and control method for traffic lights of full crossroad
CN115472023B (en) Intelligent traffic light control method and device based on deep reinforcement learning
Luo et al. Researches on intelligent traffic signal control based on deep reinforcement learning
CN113096415B (en) Signal coordination optimization control method for secondary pedestrian crossing intersection
CN115705771A (en) Traffic signal control method based on reinforcement learning
CN117275260B (en) Emergency control method for urban road intersection entrance road traffic accident

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination