CN112258859A

CN112258859A - Intersection traffic control optimization method based on time difference learning

Info

Publication number: CN112258859A
Application number: CN202011037914.7A
Authority: CN
Inventors: 方忠良; 徐韧; 刘亮; 许泸军; 徐琛; 冯远静; 李永强
Original assignee: Aerospace Science And Technology Guangxin Intelligent Technology Co ltd
Current assignee: Aerospace Science And Technology Guangxin Intelligent Technology Co ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2021-01-22

Abstract

An intersection signal control optimization method based on temporal difference (TD) learning comprises the following steps: 1) acquiring, in chronological order, the number of vehicles in every lane of an intersection and the signal light state of each lane; 2) initializing the relevant parameters of TD learning; 3) traversing different learning parameters to obtain different Q value tables, and selecting the optimal Q value table; 4) selecting the optimal signal control scheme for the next moment of the intersection signal lights by obtaining the optimal action from the Q value table. Compared with the prior art, the method adapts to highly random intersection traffic flow through real-time signal control, and the traffic signal timing scheme designed by the invention can improve the traffic efficiency of the intersection compared with the traditional fixed-time control scheme.

Description

Intersection traffic control optimization method based on time difference learning

Technical Field

The invention relates to the fields of traffic control engineering and artificial intelligence applications, and in particular to a temporal difference (TD) learning method and a traffic signal control method.

Background

Nowadays, automobiles have entered countless households. However, road capacity has not kept pace with the growing number of automobiles. As a result, traffic congestion in first-tier cities is increasingly severe. In urban road networks, traffic lights are used to control traffic at almost all intersections, and fixed-time schemes are generally adopted for their timing. However, an urban road network is a large, complex system in which traffic flow changes randomly. Traffic lights with unchanging timing ignore the dynamic information of the road network, so vehicles cannot pass through the network efficiently, which degrades people's urban travel experience and wastes a great deal of precious natural energy. In recent years, the rapid development of artificial intelligence technology has provided ample theoretical support for signal control, while the vigorous development of sensors such as radar and the spread of 5G communication technology have laid a solid hardware foundation for it.

Disclosure of Invention

In order to overcome the defects of the prior art and solve the problem that the existing traffic signal timing scheme cannot well deal with the actual situation of road network traffic flow changing in real time, the invention provides an intersection traffic control optimization method based on time difference learning.

The technical scheme adopted by the invention for solving the technical problems is as follows:

an intersection traffic control optimization method based on time difference learning comprises the following steps:

1) by means of radar ranging, for a single intersection, the number of vehicles within the radar ranging range of each lane is measured at each moment, and the signal light state at the current moment is recorded at the same time. A data set $\{N_k, S_k\}$ is obtained in chronological order, where $S_k$ is the signal light state of each lane of the intersection at time $k$, $N_k$ is the number of vehicles in each lane of the intersection at time $k$, $k = 1, 2, \dots, K$, and $K$ is the number of records contained in the data set;

2) relevant parameters for initializing TD learning:

2.1) Q value table: all entries are initialized to 0, and each entry of the Q value table corresponds to one record of the vehicle number-light state data set;

2.2) λ: 0.1, the TD learning parameter, reflecting the look-ahead strength of the training process;

2.3) γ: 0.99, discount factor;

2.4) ε: 0.001, convergence index;

2.5) $r = -V_k$: the reward value for TD learning;

3) using the vehicle number-light state data set $\{N_k, S_k\}$, training the Q value table by TD learning until the training index is reached;

4) according to the obtained Q value table, the intersection traffic control scheme based on TD learning is as follows: at the actual intersection, the number of vehicles $N_{now}$ in each lane of the current intersection is acquired by a radar sensor, and the signal light state $S_{next}$ that should be executed next is obtained from the Q value table as the optimal action for $N_{now}$.
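Steps 2) and 4) can be illustrated with a minimal sketch. The encodings used here are assumptions made for illustration only: a state is a tuple of per-lane vehicle counts, a signal light state is a string label, $V_k$ is taken to be the total vehicle count at time $k$, and the rule for $S_{next}$ is assumed to be a greedy choice of the candidate with the largest Q value. None of these function names come from the patent.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical encodings (not from the source text).
Counts = Tuple[int, ...]            # N_k: vehicles per lane
Record = Tuple[Counts, str]         # (N_k, S_k)
QTable = Dict[Record, float]


def init_q_table(dataset: Sequence[Record]) -> QTable:
    """Step 2.1: one zero-valued entry per (N_k, S_k) record of the data set."""
    return {(n_k, s_k): 0.0 for n_k, s_k in dataset}


def reward(n_k: Counts) -> float:
    """Step 2.5, r = -V_k, assuming V_k is the total vehicle count at time k."""
    return -float(sum(n_k))


def next_signal_state(q: QTable, n_now: Counts, candidates: Sequence[str]) -> str:
    """Step 4 under an assumed greedy rule: pick the candidate signal light
    state with the largest Q value for the current counts. Pairs that are not
    in the table fall back to 0.0, the initial table value."""
    return max(candidates, key=lambda s: q.get((n_now, s), 0.0))
```

For example, with a trained table in which `((3, 1), "NS_green")` has a larger value than `((3, 1), "EW_green")`, calling `next_signal_state(q, (3, 1), ["NS_green", "EW_green"])` returns `"NS_green"`.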

Further, the process of step 3) is as follows:

3.1) in the chronological order of the data set, updating the entries of the Q value table according to the TD learning update formula;

3.2) calculating the convergence index ε, which is a difference value; if ε is greater than 0.001, continuing to execute step 3.1);

3.3) increasing the learning parameter λ by 0.1 and repeating the training until λ = 1, which yields 10 different Q value tables, and selecting the one with the largest total Q value.
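The training procedure of steps 3.1) to 3.3) can be sketched as follows. The update formula and the exact definition of the convergence index are not legible in this text, so the sketch substitutes a standard tabular TD(λ) update with replacing eligibility traces and measures convergence as the largest absolute change of any table entry over one pass through the data. The learning rate `alpha`, the cap `max_sweeps`, the trace mechanism, and the reading of $V_k$ as the total vehicle count are all assumptions, not values stated in the patent.

```python
from typing import Dict, Sequence, Tuple

Counts = Tuple[int, ...]
Record = Tuple[Counts, str]          # one (N_k, S_k) pair
QTable = Dict[Record, float]

LAMBDAS = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0


def train_q_table(data: Sequence[Record], lam: float, gamma: float = 0.99,
                  eps: float = 0.001, alpha: float = 0.1,
                  max_sweeps: int = 1000) -> QTable:
    """Train one Q value table for a given lambda (assumed TD(lambda) update)."""
    q: QTable = {rec: 0.0 for rec in data}             # step 2.1: all entries 0
    for _ in range(max_sweeps):
        before = dict(q)
        trace = {rec: 0.0 for rec in q}                # assumed eligibility traces
        for cur, nxt in zip(data, data[1:]):           # step 3.1: chronological pass
            r = -float(sum(cur[0]))                    # r = -V_k, V_k assumed total count
            delta = r + gamma * q[nxt] - q[cur]        # TD error
            trace[cur] = 1.0                           # replacing trace
            for key in q:
                q[key] += alpha * delta * trace[key]
                trace[key] *= gamma * lam
        # Step 3.2 (assumed form): largest change of any entry in this pass.
        change = max((abs(q[k] - before[k]) for k in q), default=0.0)
        if change <= eps:
            break
    return q


def sweep_lambda(data: Sequence[Record], **kwargs) -> Tuple[float, Dict[float, QTable]]:
    """Step 3.3: train one table per lambda and keep the largest total Q value."""
    tables = {lam: train_q_table(data, lam, **kwargs) for lam in LAMBDAS}
    best = max(tables, key=lambda lam: sum(tables[lam].values()))
    return best, tables
```

Replacing traces are used here rather than accumulating ones so that the effective step size stays bounded by `alpha` even when the same record appears many times in a row.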

The technical conception of the invention is as follows: first, the vehicle number information and signal light information of each lane of the intersection are collected; the data set is then used for training with the TD learning algorithm to obtain the optimal action of the signal lights for each state of the intersection, and the result is applied at the actual intersection.

The invention has the beneficial effects that: the invention can effectively improve the vehicle passing condition at the intersection, improve the vehicle passing efficiency, reduce the vehicle delay time and relieve the traffic jam problem.

Drawings

FIG. 1 shows a training flow diagram of a TD learning algorithm applied to intersection signal control optimization;

FIG. 2 shows the road network diagram built in the simulation software Vissim, which is used for the example analysis below.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, an intersection traffic control optimization method based on time difference learning includes the following steps:

1) by means of radar ranging, for a single intersection, the number of vehicles within the radar ranging range of each lane is measured at each moment, and the signal light state at the current moment is recorded at the same time. A data set $\{N_k, S_k\}$ is obtained in chronological order, where $S_k$ is the signal light state of each lane of the intersection at time $k$, $N_k$ is the number of vehicles in each lane of the intersection at time $k$, $k = 1, 2, \dots, K$, and $K$ is the number of records contained in the data set;

2) relevant parameters for initializing TD learning:

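2.1) Q value table: all entries are initialized to 0, and each entry of the Q value table corresponds to one record of the vehicle number-light state data set;

2.2) λ: 0.1, the TD learning parameter, reflecting the look-ahead strength of the training process;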

2.3) γ: 0.99, discount factor;

2.4) ε: 0.001, convergence index;

2.5) $r = -V_k$: the reward value for TD learning;

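3) using the vehicle number-light state data set $\{N_k, S_k\}$, training the Q value table by TD learning until the training index is reached; the process is as follows:

3.1) in the chronological order of the data set, updating the entries of the Q value table according to the TD learning update formula;

3.2) calculating the convergence index ε, which is a difference value; if ε is greater than 0.001, continuing to execute step 3.1);

3.3) increasing the learning parameter λ by 0.1 and repeating the training until λ = 1, which yields 10 different Q value tables, and selecting the one with the largest total Q value;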

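4) according to the obtained Q value table, the intersection traffic control scheme based on TD learning is as follows: at the actual intersection, the number of vehicles $N_{now}$ in each lane of the current intersection is acquired by a radar sensor, and the signal light state $S_{next}$ that should be executed next is obtained from the Q value table as the optimal action for $N_{now}$.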
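The update formula of step 3.1) and the selection formula of step 4) are not legible in this text. For orientation only, one standard tabular form that is consistent with the listed parameters (λ, γ, and $r = -V_k$) is written out below. The learning rate $\alpha$, the eligibility trace $e$, and the greedy $\arg\max$ rule are assumptions introduced here and may differ from the formulas in the original filing.

```latex
\begin{aligned}
\delta_k &= r_k + \gamma\, Q(N_{k+1}, S_{k+1}) - Q(N_k, S_k), \qquad r_k = -V_k,\\
e(N_k, S_k) &\leftarrow 1,\\
Q(N, S) &\leftarrow Q(N, S) + \alpha\, \delta_k\, e(N, S) \quad \text{for all entries } (N, S),\\
e(N, S) &\leftarrow \gamma \lambda\, e(N, S) \quad \text{for all entries } (N, S),\\
S_{next} &= \arg\max_{S}\; Q(N_{now}, S).
\end{aligned}
```

With λ close to 0 the trace vanishes after one step and only the most recent entry is corrected; with λ close to 1 the correction is propagated to entries visited many steps earlier, which matches the description of λ as the look-ahead strength of training.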

This embodiment uses vehicle numbers measured at an intersection built in the simulation software Vissim. The intersection traffic control optimization method based on time difference learning comprises the following steps:

1) by calling the Vissim interface, for a single intersection, the number of vehicles within the radar ranging range of each lane is measured at each moment, and the signal light state at the current moment is recorded at the same time. A data set $\{N_k, S_k\}$ is obtained in chronological order, where $S_k$ is the signal light state of each lane of the intersection at time $k$, $N_k$ is the number of vehicles in each lane of the intersection at time $k$, $k = 1, 2, \dots, K$, and $K$ is the number of records contained in the data set;

2) relevant parameters for initializing TD learning:

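2.1) Q value table: all entries are initialized to 0, and each entry of the Q value table corresponds to one record of the vehicle number-light state data set;

2.2) λ: 0.1, the TD learning parameter, reflecting the look-ahead strength of the training process;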

2.3) γ: 0.99, discount factor;

2.4) ε: 0.001, convergence index;

2.5) $r = -V_k$: the reward value for TD learning;

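3) using the vehicle number-light state data set $\{N_k, S_k\}$, training the Q value table by TD learning until the training index is reached; the process is as follows:

3.1) in the chronological order of the data set, updating the entries of the Q value table according to the TD learning update formula;

3.2) calculating the convergence index ε, which is a difference value; if ε is greater than 0.001, continuing to execute step 3.1);

3.3) increasing the learning parameter λ by 0.1 and repeating the training until λ = 1, which yields 10 different Q value tables, and selecting the one with the largest total Q value;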

4) according to the obtained Q value table, the intersection traffic control scheme based on TD learning is as follows: at the simulated intersection, the number of vehicles $N_{now}$ in each lane of the current intersection is acquired through the Vissim interface, and the signal light state $S_{next}$ that should be executed next is obtained from the Q value table as the optimal action for $N_{now}$.
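The closed loop of step 4) in this embodiment (read the counts, look up the Q value table, apply the chosen signal light state, advance the simulation) can be sketched as below. The `Intersection` protocol and the `ToyIntersection` stand-in are hypothetical: they mark where calls to the Vissim interface would go and do not reproduce any real Vissim API. The greedy selection rule is likewise an assumption.

```python
from typing import Dict, List, Protocol, Sequence, Tuple

Counts = Tuple[int, ...]
QTable = Dict[Tuple[Counts, str], float]


class Intersection(Protocol):
    """Hypothetical adapter; a real one would wrap the simulator interface."""
    def vehicle_counts(self) -> Counts: ...
    def set_signal_state(self, state: str) -> None: ...
    def step(self) -> None: ...


def control_loop(sim: Intersection, q: QTable, states: Sequence[str],
                 n_steps: int) -> List[Tuple[Counts, str]]:
    """At each step, read N_now and apply the state with the largest Q value."""
    log = []
    for _ in range(n_steps):
        n_now = sim.vehicle_counts()
        # Assumed greedy rule; unseen pairs fall back to the initial value 0.0.
        s_next = max(states, key=lambda s: q.get((n_now, s), 0.0))
        sim.set_signal_state(s_next)
        sim.step()
        log.append((n_now, s_next))
    return log


class ToyIntersection:
    """Replays a fixed list of counts; for exercising the loop only,
    it is not a traffic model."""
    def __init__(self, counts_sequence: Sequence[Counts]) -> None:
        self._counts = list(counts_sequence)
        self._t = 0
        self.applied: List[str] = []

    def vehicle_counts(self) -> Counts:
        return self._counts[min(self._t, len(self._counts) - 1)]

    def set_signal_state(self, state: str) -> None:
        self.applied.append(state)

    def step(self) -> None:
        self._t += 1
```

Keeping the simulator behind a small adapter means the same loop could in principle be driven by recorded data, a simulator, or radar hardware without changing the selection logic.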

Taking Vissim simulation data as an embodiment, a traffic signal optimization scheme based on TD learning was obtained with the above method. The simulation results show that the average delay time on the road network is 8% shorter than with the traditional fixed-time control method.
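Read as a relative reduction, the reported figure can be written as follows, where $\bar{D}_{\text{fixed}}$ and $\bar{D}_{\text{TD}}$ are symbols introduced here (they do not appear in the patent) for the average delay under fixed-time control and under the TD learning scheme respectively:

```latex
\frac{\bar{D}_{\text{fixed}} - \bar{D}_{\text{TD}}}{\bar{D}_{\text{fixed}}} \approx 0.08
\quad\Longleftrightarrow\quad
\bar{D}_{\text{TD}} \approx 0.92\,\bar{D}_{\text{fixed}}
```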

While the foregoing has described the preferred embodiments of the present invention, the invention is clearly not limited to the embodiments described and can be practiced with modifications that do not depart from the essential spirit of the invention.

Claims

1. An intersection traffic control optimization method based on time difference learning is characterized by comprising the following steps:

1) by means of radar ranging, for a single intersection, the number of vehicles within the radar ranging range of each lane is measured at each moment, the signal light state at the current moment is recorded, and a data set $\{N_k, S_k\}$ is obtained in chronological order, where $S_k$ is the signal light state of each lane of the intersection at time $k$, $N_k$ is the number of vehicles in each lane of the intersection at time $k$, $k = 1, 2, \dots, K$, and $K$ is the number of records contained in the data set;

2) relevant parameters for initializing TD learning:

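2.1) Q value table: all entries are initialized to 0, and each entry of the Q value table corresponds to one record of the vehicle number-light state data set;

2.2) λ: 0.1, the TD learning parameter, reflecting the look-ahead strength of the training process;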

2.3) γ: 0.99, discount factor;

2.4) ε: 0.001, convergence index;

2.5) $r = -V_k$: the reward value for TD learning;

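3) using the vehicle number-light state data set $\{N_k, S_k\}$, training the Q value table by TD learning until the training index is reached;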

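4) according to the obtained Q value table, the intersection traffic control scheme based on TD learning is as follows: at the actual intersection, the number of vehicles $N_{now}$ in each lane of the current intersection is acquired by a radar sensor, and the signal light state $S_{next}$ that should be executed next is obtained from the Q value table as the optimal action for $N_{now}$.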

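2. The intersection traffic control optimization method based on time difference learning as claimed in claim 1, wherein the process of step 3) is as follows:

3.1) in the chronological order of the data set, updating the entries of the Q value table according to the TD learning update formula;

3.2) calculating the convergence index ε, which is a difference value; if ε is greater than 0.001, continuing to execute step 3.1);

3.3) increasing the learning parameter λ by 0.1 and repeating the training until λ = 1, which yields 10 different Q value tables, and selecting the one with the largest total Q value.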