CN112258859A - Intersection traffic control optimization method based on time difference learning - Google Patents


Info

Publication number
CN112258859A
Authority
CN
China
Prior art keywords
intersection
learning
time
data set
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011037914.7A
Other languages
Chinese (zh)
Inventor
方忠良
徐韧
刘亮
许泸军
徐琛
冯远静
李永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Science And Technology Guangxin Intelligent Technology Co ltd
Original Assignee
Aerospace Science And Technology Guangxin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Science And Technology Guangxin Intelligent Technology Co ltd
Priority to CN202011037914.7A
Publication of CN112258859A
Legal status: Pending

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles

Abstract

An intersection signal control optimization method based on temporal difference (TD) learning comprises the following steps: 1) acquiring, in chronological order, the number of vehicles in every lane of an intersection and the signal light state of each lane; 2) initializing the parameters of TD learning; 3) traversing different learning parameters to obtain different Q-value tables, and selecting the optimal Q-value table; 4) selecting the optimal signal control scheme for the next moment of the intersection signal lights by taking the optimal action from the Q-value table. Compared with the prior art, the method adapts to highly random intersection traffic flow through real-time signal control, and the resulting signal timing scheme can improve the traffic efficiency of the intersection relative to a traditional fixed-time control scheme.

Description

Intersection traffic control optimization method based on time difference learning
Technical Field
The invention relates to the fields of traffic control engineering and applied artificial intelligence, and in particular to temporal difference (TD) learning and traffic signal control.
Background
Automobiles are now found in a great many households. However, the growing number of automobiles has not been matched by correspondingly broad and open roads, so traffic congestion in first-tier cities is becoming increasingly severe. In urban road networks, traffic lights control traffic at almost all intersections, and these lights are generally run on fixed-time schedules. An urban road network, however, is a large and complex system in which traffic flow changes randomly. Fixed-time traffic lights ignore the dynamic state of the road network, so vehicles cannot pass through it efficiently, people's urban travel experience suffers, and valuable energy is wasted. In recent years, the rapid development of artificial intelligence has provided ample theoretical support for signal control, while advances in sensors such as radar and the spread of 5G communication have laid a solid hardware foundation for it.
Disclosure of Invention
To overcome the shortcomings of the prior art, in which existing traffic signal timing schemes cannot cope well with road network traffic flow that changes in real time, the invention provides an intersection traffic control optimization method based on time difference learning.
The technical scheme adopted by the invention for solving the technical problems is as follows:
An intersection traffic control optimization method based on time difference learning comprises the following steps:
1) Using radar ranging, measure the number of vehicles within the radar range of each lane of a single intersection at each time step, and record the signal light state at that time step. This yields a chronologically ordered data set {N_k, S_k}, where S_k is the signal light state of each lane of the intersection at time k, N_k is the number of vehicles in each lane of the intersection at time k, k = 1, 2, …, K, and K is the number of records in the data set;
2) Initialize the parameters of TD learning:
2.1) Q-value table (formula image: Figure BDA0002705705150000011): all entries are initialized to 0, and each entry of the Q-value table corresponds to one record of the vehicle-count/light-state data set;
2.2) λ = 0.1: the TD learning parameter, which reflects how far ahead the training process looks;
2.3) γ = 0.99: the discount factor;
2.4) ε = 0.001: the convergence index;
2.5) r = -V_k: the reward value for TD learning;
3) Using the vehicle-count/light-state data set {N_k, S_k}, train the Q-value table (formula image: Figure BDA0002705705150000021) by TD learning until the training index is reached;
4) Based on the resulting Q-value table (formula image: Figure BDA0002705705150000022), the TD-learning-based intersection traffic control scheme is as follows: at the actual intersection, a radar sensor measures the current number of vehicles in each lane, N_now; according to (formula image: Figure BDA0002705705150000023), the signal light state to execute next, S_next, is given by the formula (formula image: Figure BDA0002705705150000024).
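Step 4) can be illustrated with a short sketch. The selection formula survives here only as an image reference, so the code below assumes the usual greedy rule: among the candidate signal light states, pick the one with the largest Q value for the current vehicle counts. The function name `next_signal_state`, the dictionary layout of the Q table, and the default of 0 for unseen entries are illustrative assumptions, not taken from the patent.

```python
def next_signal_state(q_table, n_now, signal_states):
    """Pick the signal state with the largest Q value for the counts n_now.

    q_table maps (vehicle_counts, signal_state) pairs to Q values; pairs that
    were never seen in training are treated as 0 (an assumption).
    """
    return max(signal_states, key=lambda s: q_table.get((n_now, s), 0.0))


# Hypothetical two-lane intersection with two candidate signal states, 0 and 1.
q_table = {((3, 1), 0): -5.0, ((3, 1), 1): -2.0}
s_next = next_signal_state(q_table, (3, 1), [0, 1])
```

With rewards defined as negative quantities, a larger (less negative) Q value corresponds to the preferred action, which is why the sketch takes a maximum.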
Further, step 3) proceeds as follows:
3.1) Traverse the data set in chronological order and update the entries of the Q-value table (formula image: Figure BDA0002705705150000025) according to the following formula (formula image: Figure BDA0002705705150000026);
3.2) Compute the convergence index ε as the difference given by (formula image: Figure BDA0002705705150000027); if ε is greater than 0.001, continue executing step 3.1;
3.3) Increase the learning parameter λ by 0.1 and repeat until λ = 1, which yields 10 different Q-value tables (formula image: Figure BDA0002705705150000028); select the one with the largest total Q value (formula image: Figure BDA0002705705150000029).
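The training procedure of steps 3.1) to 3.3) can be sketched as follows. The update and convergence formulas are available here only as image references, so this sketch swaps in a standard SARSA(λ)-style tabular update with replacing eligibility traces and uses the largest per-sweep change of any entry as the convergence index. The learning rate `alpha`, the cap `max_sweeps`, all function names, and the way rewards are supplied are assumptions; V_k is not defined in the text, so the caller passes the rewards in directly.

```python
from collections import defaultdict


def train_q_table(data, rewards, lam, gamma=0.99, alpha=0.1, eps=0.001, max_sweeps=500):
    """Sweep a logged sequence of (N_k, S_k) pairs until the Q table stops changing.

    data[k] is the hashable pair (N_k, S_k); rewards[k] plays the role of
    r = -V_k for the transition from record k to record k + 1 (assumed reading).
    """
    q = defaultdict(float)  # every entry starts at 0 (step 2.1)
    for _ in range(max_sweeps):
        before = dict(q)
        trace = defaultdict(float)
        for k in range(len(data) - 1):
            pair, next_pair = data[k], data[k + 1]
            delta = rewards[k] + gamma * q[next_pair] - q[pair]  # TD error
            trace[pair] = 1.0  # replacing eligibility trace
            for key in list(trace):
                q[key] += alpha * delta * trace[key]
                trace[key] *= gamma * lam
        # Assumed convergence index: largest change of any entry in this sweep.
        change = max((abs(v - before.get(key, 0.0)) for key, v in q.items()), default=0.0)
        if change <= eps:
            break
    return dict(q)


def sweep_lambda(data, rewards):
    """Train one table per lambda in 0.1, 0.2, ..., 1.0 (step 3.3)."""
    lams = [round(0.1 * i, 1) for i in range(1, 11)]
    return {lam: train_q_table(data, rewards, lam) for lam in lams}


def select_best(tables):
    """Return (lambda, table) for the table with the largest total Q value."""
    best_lam = max(tables, key=lambda lam: sum(tables[lam].values()))
    return best_lam, tables[best_lam]
```

A call such as `select_best(sweep_lambda(data, rewards))` then returns the λ value and Q table that would be carried forward to step 4). The `max_sweeps` cap is there only so the sketch always terminates, even for settings where a constant learning rate does not settle below ε.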
The technical concept of the invention is as follows: vehicle-count and signal-light information is first collected for each lane of an intersection; the TD learning algorithm is then trained on this data set to obtain the optimal signal light action for each state of the intersection; the result is finally applied at the actual intersection.
The invention has the beneficial effects that: the invention can effectively improve the vehicle passing condition at the intersection, improve the vehicle passing efficiency, reduce the vehicle delay time and relieve the traffic jam problem.
Drawings
FIG. 1 shows a training flow diagram of a TD learning algorithm applied to intersection signal control optimization;
FIG. 2 shows the road network built in the simulation software Vissim and used in the example analysis below.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, an intersection traffic control optimization method based on time difference learning includes the following steps:
1) Using radar ranging, measure the number of vehicles within the radar range of each lane of a single intersection at each time step, and record the signal light state at that time step. This yields a chronologically ordered data set {N_k, S_k}, where S_k is the signal light state of each lane of the intersection at time k, N_k is the number of vehicles in each lane of the intersection at time k, k = 1, 2, …, K, and K is the number of records in the data set;
2) Initialize the parameters of TD learning:
2.1) Q-value table (formula image: Figure BDA0002705705150000031): all entries are initialized to 0, and each entry of the Q-value table corresponds to one record of the vehicle-count/light-state data set;
2.2) λ = 0.1: the TD learning parameter, which reflects how far ahead the training process looks;
2.3) γ = 0.99: the discount factor;
2.4) ε = 0.001: the convergence index;
2.5) r = -V_k: the reward value for TD learning;
3) Using the vehicle-count/light-state data set {N_k, S_k}, train the Q-value table (formula image: Figure BDA0002705705150000032) by TD learning until the training index is reached. The process is as follows:
3.1) Traverse the data set in chronological order and update the entries of the Q-value table (formula image: Figure BDA0002705705150000033) according to the following formula (formula image: Figure BDA0002705705150000034);
3.2) Compute the convergence index ε as the difference given by (formula image: Figure BDA0002705705150000035); if ε is greater than 0.001, continue executing step 3.1;
3.3) Increase the learning parameter λ by 0.1 and repeat until λ = 1, which yields 10 different Q-value tables (formula image: Figure BDA0002705705150000036); select the one with the largest total Q value (formula image: Figure BDA0002705705150000037);
4) Based on the resulting Q-value table (formula image: Figure BDA0002705705150000038), the TD-learning-based intersection traffic control scheme is as follows: at the actual intersection, a radar sensor measures the current number of vehicles in each lane, N_now; according to (formula image: Figure BDA0002705705150000039), the signal light state to execute next, S_next, is given by the formula (formula image: Figure BDA00027057051500000310).
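Steps 1) and 2.1) above, which assemble the chronological data set and zero-initialize the Q-value table, can be sketched as below. The record layout (dictionaries with `time`, `counts`, and `signal` keys), the function names, and the signal state labels are hypothetical choices made for illustration; the patent does not specify a data format.

```python
def build_dataset(records):
    """Order raw measurements by time and keep the (N_k, S_k) pairs."""
    ordered = sorted(records, key=lambda rec: rec["time"])
    # Tuples are hashable, so each (N_k, S_k) pair can later index the Q table.
    return [(tuple(rec["counts"]), rec["signal"]) for rec in ordered]


def init_q_table(dataset):
    """Create one zero-valued entry per distinct (N_k, S_k) record (step 2.1)."""
    return {pair: 0.0 for pair in dataset}


# Hypothetical measurements for a two-lane intersection, deliberately out of order.
records = [
    {"time": 2, "counts": [1, 4], "signal": "EW_green"},
    {"time": 1, "counts": [3, 0], "signal": "NS_green"},
    {"time": 3, "counts": [3, 0], "signal": "NS_green"},
]
dataset = build_dataset(records)
q_table = init_q_table(dataset)
```

Repeated records collapse into a single table entry, so the table size grows with the number of distinct count/state combinations rather than with K.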
This embodiment uses vehicle counts measured at an intersection built in the simulation software Vissim. The intersection traffic control optimization method based on time difference learning comprises the following steps:
1) By calling the Vissim interface, measure the number of vehicles within the radar range of each lane of a single intersection at each time step, and record the signal light state at that time step. This yields a chronologically ordered data set {N_k, S_k}, where S_k is the signal light state of each lane of the intersection at time k, N_k is the number of vehicles in each lane of the intersection at time k, k = 1, 2, …, K, and K is the number of records in the data set;
2) Initialize the parameters of TD learning:
2.1) Q-value table (formula image: Figure BDA0002705705150000041): all entries are initialized to 0, and each entry of the Q-value table corresponds to one record of the vehicle-count/light-state data set;
2.2) λ = 0.1: the TD learning parameter, which reflects how far ahead the training process looks;
2.3) γ = 0.99: the discount factor;
2.4) ε = 0.001: the convergence index;
2.5) r = -V_k: the reward value for TD learning;
3) Using the vehicle-count/light-state data set {N_k, S_k}, train the Q-value table (formula image: Figure BDA0002705705150000042) by TD learning until the training index is reached. The process is as follows:
3.1) Traverse the data set in chronological order and update the entries of the Q-value table (formula image: Figure BDA0002705705150000043) according to the following formula (formula image: Figure BDA0002705705150000044);
3.2) Compute the convergence index ε as the difference given by (formula image: Figure BDA0002705705150000045); if ε is greater than 0.001, continue executing step 3.1;
3.3) Increase the learning parameter λ by 0.1 and repeat until λ = 1, which yields 10 different Q-value tables (formula image: Figure BDA0002705705150000046); select the one with the largest total Q value (formula image: Figure BDA0002705705150000047);
4) Based on the resulting Q-value table (formula image: Figure BDA0002705705150000048), the TD-learning-based intersection traffic control scheme is as follows: at the intersection, the Vissim interface is used to obtain the current number of vehicles in each lane, N_now; according to (formula image: Figure BDA0002705705150000049), the signal light state to execute next, S_next, is given by the formula (formula image: Figure BDA00027057051500000410).
Taking the Vissim simulation as the embodiment, the method above yields a TD-learning-based traffic signal optimization scheme. The simulation results show that the average delay time on the road network is 8% shorter than under the traditional fixed-time control method.
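For readers who want to reproduce this kind of comparison, the relative reduction in average delay can be computed as below. The function name and the two delay values are made up for illustration; the patent reports only the 8% figure, not the underlying delay measurements.

```python
def relative_delay_reduction(baseline_delay, optimized_delay):
    """Fraction by which the optimized average delay undercuts the baseline."""
    if baseline_delay <= 0:
        raise ValueError("baseline_delay must be positive")
    return (baseline_delay - optimized_delay) / baseline_delay


# Made-up average delays in seconds, chosen only to illustrate the arithmetic.
reduction = relative_delay_reduction(50.0, 46.0)
print(f"{reduction:.0%}")
```

The baseline here stands for the fixed-time controller and the optimized value for the TD-learning controller, both averaged over the same simulated traffic demand.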
The foregoing describes the preferred embodiments of the present invention. The invention is not limited to the embodiments described and can be practiced with modifications that do not depart from its essential spirit.

Claims (2)

1. An intersection traffic control optimization method based on time difference learning, characterized by comprising the following steps:
1) using radar ranging, measuring the number of vehicles within the radar range of each lane of a single intersection at each time step, recording the signal light state at that time step, and obtaining a chronologically ordered data set {N_k, S_k}, where S_k is the signal light state of each lane of the intersection at time k, N_k is the number of vehicles in each lane of the intersection at time k, k = 1, 2, …, K, and K is the number of records in the data set;
2) initializing the parameters of TD learning:
2.1) Q-value table (formula image: Figure FDA0002705705140000011): all entries are initialized to 0, and each entry of the Q-value table corresponds to one record of the vehicle-count/light-state data set;
2.2) λ = 0.1: the TD learning parameter, which reflects how far ahead the training process looks;
2.3) γ = 0.99: the discount factor;
2.4) ε = 0.001: the convergence index;
2.5) r = -V_k: the reward value for TD learning;
3) using the vehicle-count/light-state data set {N_k, S_k}, training the Q-value table (formula image: Figure FDA0002705705140000012) by TD learning until the training index is reached;
4) based on the resulting Q-value table (formula image: Figure FDA0002705705140000013), the TD-learning-based intersection traffic control scheme is as follows: at the actual intersection, a radar sensor measures the current number of vehicles in each lane, N_now; according to (formula image: Figure FDA0002705705140000014), the signal light state to execute next, S_next, is given by the formula (formula image: Figure FDA0002705705140000015).
2. The intersection traffic control optimization method based on time difference learning as claimed in claim 1, wherein step 3) proceeds as follows:
3.1) traversing the data set in chronological order and updating the entries of the Q-value table (formula image: Figure FDA0002705705140000016) according to the following formula (formula image: Figure FDA0002705705140000017);
3.2) computing the convergence index ε as the difference given by (formula image: Figure FDA0002705705140000018); if ε is greater than 0.001, continuing to execute step 3.1;
3.3) increasing the learning parameter λ by 0.1 and repeating until λ = 1, which yields 10 different Q-value tables (formula image: Figure FDA0002705705140000019); selecting the one with the largest total Q value (formula image: Figure FDA0002705705140000021).
Application CN202011037914.7A, priority date 2020-09-28, filing date 2020-09-28: Intersection traffic control optimization method based on time difference learning. Status: Pending. Publication: CN112258859A (en)

Publications (1)

Publication Number Publication Date
CN112258859A true CN112258859A (en) 2021-01-22

Family

ID=74234141

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327011A1 (en) * 2008-06-30 2009-12-31 Autonomous Solutions, Inc. Vehicle dispatching method and system
CN108510764A (en) * 2018-04-24 2018-09-07 南京邮电大学 A kind of adaptive phase difference coordinated control system of Multiple Intersections and method based on Q study
CN108805348A (en) * 2018-06-05 2018-11-13 北京京东金融科技控股有限公司 A kind of method and apparatus of intersection signal timing control optimization
CN109559530A (en) * 2019-01-07 2019-04-02 大连理工大学 A kind of multi-intersection signal lamp cooperative control method based on Q value Transfer Depth intensified learning
CN111489568A (en) * 2019-01-25 2020-08-04 阿里巴巴集团控股有限公司 Traffic signal lamp regulation and control method and device and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sun Hao et al.: "Traffic signal control method based on deep reinforcement learning" (基于深度强化学习的交通信号控制方法), Computer Science (《计算机科学》) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210122)
WD01 Invention patent application deemed withdrawn after publication