CN109472984A

CN109472984A - Traffic signal light control method, system and storage medium based on deep reinforcement learning

Info

Publication number: CN109472984A
Application number: CN201811616142.5A
Authority: CN
Inventors: 傅启明; 吴少波; 高振; 陈建平; 钟珊; 陆悠
Original assignee: Suzhou University of Science and Technology
Current assignee: Suzhou University of Science and Technology
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2019-03-15

Abstract

The present invention relates to an intelligent traffic signal light control method based on deep reinforcement learning, comprising: selecting a central intersection that is surrounded by multiple peripheral intersections connected to it; obtaining the traffic information and signal light information of each intersection; establishing an intersection congestion/free-flow state model; modeling the traffic signal light control problem as a Markov decision process and defining its states, actions and immediate reward function; establishing a return value function model; solving for the optimal policy using the DQN deep reinforcement learning algorithm; and controlling the traffic signal lights of each intersection using the optimal policy. The method can adaptively and dynamically adjust the signal light control strategy according to real-time traffic information, and it adjusts multiple intersections synchronously so as to make the fullest use of the traffic capacity of each intersection.

Description

Traffic signal light control method, system and storage medium based on deep reinforcement learning

Technical field

The present invention relates to the field of traffic signal light control, and in particular to a traffic signal light control method, system and storage medium based on deep reinforcement learning.

Background art

In the early 20th century, electrically operated traffic signal lights first appeared in the United States, and traffic signal technology has developed continuously since then. Its emergence made effective traffic control possible and has played a positive role in relieving traffic flow, improving road capacity and reducing traffic accidents.

With rapid social development and economic growth, living conditions have improved and cars have become common in nearly every household. This has undoubtedly increased the pressure on municipal roads and made urban roads crowded, especially at crossroads. Because traditional traffic signal systems cannot adapt in time to complex and changing road conditions, they frequently cause congestion at crossroads and waste part of the available transport resources.

As cities develop and traffic volumes grow, the traffic signal control mode currently used in Chinese cities has shown defects. First, when vehicles are released at a crossroad, arterial roads with different traffic volumes are often given the same green time, which easily causes vehicles to accumulate and leads to traffic jams. Second, an arterial road may be given its green time precisely when no vehicles are on it, creating a control blind spot during that period. Third, when the traffic volume on an arterial road is very high, the signal timing cannot be changed to extend that road's green time, so its vehicles cannot pass and accumulate.

With continuing development, today's traffic signal technology is functionally far superior to that of the past. A modern traffic signal control system is a regional, real-time networked signal control system that integrates computer, communication and control technology. It can provide real-time control of intersection signals, regional coordinated control, central and local optimized control, real-time query and monitoring of intersection status, signal light fault location, real-time upload and download of timing plans, recording and management of operation logs, and multi-user remote login control and rights management. This has greatly relieved congestion at crossroads, reduced traffic accidents there, and brought great convenience to daily travel. However, in adapting to complex and changing road conditions, traditional systems still suffer from many shortcomings, such as insufficient intelligence, inconvenient use, low efficiency and dependence on manual operation, and they cannot truly meet the needs of practical application.

Summary of the invention

In view of this, it is necessary to provide an intelligent traffic signal light control method based on deep reinforcement learning that addresses the poor adaptive adjustment capability of traditional traffic signal light control methods.

An intelligent traffic signal light control method based on deep reinforcement learning, comprising:

selecting a central intersection, the central intersection being surrounded by multiple peripheral intersections connected to it,

obtaining the traffic information and signal light information of each intersection,

establishing an intersection congestion/free-flow state model,

modeling the traffic signal light control problem as a Markov decision process, and defining its states, actions and immediate reward function,

establishing a return value function model,

solving for the optimal policy using the DQN deep reinforcement learning algorithm,

controlling the traffic signal lights of each intersection using the optimal policy.

The above method can adaptively and dynamically adjust the signal light control strategy according to real-time traffic information. It also adjusts multiple intersections synchronously, making the fullest use of the traffic capacity of each intersection.

In one embodiment, the number of peripheral intersections is 4, and the 4 peripheral intersections are evenly distributed around the central intersection.

In one embodiment, the central intersection and the peripheral intersections are all crossroads.

In one embodiment, the traffic information includes the vehicle queue length and the average speed of the vehicles.

In one embodiment, establishing the intersection congestion/free-flow state model is specifically:

The traffic signal control agent uses a deep reinforcement learning method. A convolutional neural network Q^V is constructed as the current value network, and a network Q* with the same structure is constructed as the target value network. The constructed convolutional neural network comprises an input layer, two convolutional layer networks, a fully connected layer and an output layer. The input layer receives pictures of the current traffic information and signal light information of each intersection. The traffic information picture and the signal light information picture are passed through different convolutional layer networks, and the resulting features are fully connected with all possible actions. The output layer gives the value estimate Q(s, a) of every action in the current state s. The experience replay memory pool records all samples <s, s', a, r>, where s denotes the current road state, a denotes the action executed in the current road state, s' denotes the next state reached after executing action a in state s, and r denotes the immediate reward obtained by executing action a in the current road state s.

In one embodiment, modeling the traffic signal light control problem as a Markov decision process and defining its states, actions and immediate reward function is specifically:

The state is denoted by s. The current traffic state s is represented by the features that the convolutional neural network extracts from the input traffic information picture and signal light information picture;

The action is denoted by a. Let G denote a green light being on and R a red light being on. The through and left-turn signals of a first direction and a second direction are defined separately, the first direction and the second direction being perpendicular to each other. The action a at time t is written as [first direction through, first direction left turn, second direction through, second direction left turn], so the set of actions that a single intersection can take at time t is:

A = {[G, R, R, R], [R, G, R, R], [R, R, G, R], [R, R, R, G]};

The immediate reward function is denoted by r. The total number of stationary vehicles at each intersection in state s is counted: each additional stationary vehicle earns a reward of -1, and each stationary vehicle fewer earns a reward of +1.
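The action set and reward described above can be written down compactly. The following is a minimal sketch rather than the patented implementation: the names `ACTIONS`, `joint_actions` and `immediate_reward` are hypothetical, and it assumes five intersections (one central plus four peripheral, as in one embodiment) that each select one of the four phases independently.

```python
from itertools import product

# Phase order: [direction-1 through, direction-1 left turn,
#               direction-2 through, direction-2 left turn].
ACTIONS = [
    ("G", "R", "R", "R"),
    ("R", "G", "R", "R"),
    ("R", "R", "G", "R"),
    ("R", "R", "R", "G"),
]


def joint_actions(num_intersections):
    """Enumerate every combination of per-intersection action indices.

    Assumes each intersection chooses its phase independently.
    """
    return list(product(range(len(ACTIONS)), repeat=num_intersections))


def immediate_reward(stopped_before, stopped_after):
    """+1 for each stationary vehicle removed, -1 for each one added."""
    return stopped_before - stopped_after
```

With five intersections the joint action space has 4^5 combinations, which is why a learned value function is used instead of exhaustive tuning by hand.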

In one embodiment, establishing the return value function model is specifically:

Let R(s, a) denote the return value of taking action a in state s. The value function Q(s, a) is the expectation of R(s, a), so Q(s, a) = E[R(s, a)].
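The text does not spell out how the return R(s, a) is accumulated. Because the loss function given for the DQN step contains a discount factor γ, one common reading, stated here as an assumption, is a discounted sum of immediate rewards, where r_t is the immediate reward for the action taken at time t. Under that reading the optimal value function satisfies the usual Bellman relation:

$$
R_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad
Q(s,a) = E\left[R_t \mid s_t = s,\ a_t = a\right], \qquad
Q(s,a) = E\left[\, r + \gamma \max_{a'} Q(s',a') \,\right]
$$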

In one embodiment, solving for the optimal policy using the DQN deep reinforcement learning algorithm is specifically:

Initialize the replay memory unit, with capacity N, for storing training samples;

Initialize the current value network, with randomly initialized weight parameters ω;

Initialize the target value network, whose structure and initial weights are the same as those of the current value network;

Feed the picture showing the road conditions through the current value network to obtain Q(s, a) for any state s. After the current value network has computed the value function, select action a using the ε-greedy policy. Each state transition, that is, each action taken, is counted as one time step t, and the data (s, a, r, s') obtained at each time step are stored in the replay memory unit;

Define a loss function:

L(ω) = E[(r + γ max_{a'} Q(s', a'; ω⁻) − Q(s, a; ω))²],

Randomly draw one sample (s, a, r, s') from the replay memory unit and pass (s, a), s' and r to the current value network, the target value network and L(ω) respectively. L(ω) is then updated with respect to ω by stochastic gradient descent, with the update formula as follows:
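The update formula itself is missing from this text. For orientation only, a standard stochastic gradient descent step on L(ω), with the target network weights ω⁻ held fixed and a learning rate α that the source does not name, would take the following form (the constant factor 2 from differentiating the square is absorbed into α):

$$
y = r + \gamma \max_{a'} Q(s', a'; \omega^{-}), \qquad
\omega \leftarrow \omega + \alpha \,\bigl(y - Q(s,a;\omega)\bigr)\, \nabla_{\omega} Q(s,a;\omega)
$$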

A computer storage medium storing at least one executable instruction, the executable instruction causing a processor to perform the operations corresponding to the intelligent traffic signal light control method based on deep reinforcement learning.

An intelligent traffic signal light control system based on deep reinforcement learning, comprising:

information acquisition units, arranged at the central intersection and at the peripheral intersections connected to the central intersection, the information acquisition units being used to obtain the traffic information and signal light information of each intersection;

a signal light control unit, for controlling the operation of the traffic signal lights;

a terminal processing unit, communicatively connected to the information acquisition units and the signal light control unit respectively, the terminal processing unit being able to perform the following operations according to the information obtained by the information acquisition units:

establishing an intersection congestion/free-flow state model,

establishing a return value function model,

solving for the optimal policy using the DQN deep reinforcement learning algorithm,

controlling the traffic signal lights through the signal light control unit of each intersection using the optimal policy.

Brief description of the drawings

Fig. 1 is a flow chart of the traffic signal light control method of an embodiment of the present invention.

Fig. 2 is a schematic diagram of the 5 crossroads used in the traffic signal light control method of an embodiment of the present invention.

Fig. 3 is a schematic diagram showing the information acquisition unit and the signal light control unit of a single intersection each connected to the terminal processing unit in the signal light control system of an embodiment of the present invention.

Fig. 4 is a schematic diagram of the DQN algorithm training process in the traffic signal light control method of an embodiment of the present invention.

Specific embodiments

To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Many specific details are set out in the following description to provide a full understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar improvements without departing from its spirit; the invention is therefore not limited by the specific embodiments disclosed below.

It should be noted that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may be present. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or intervening elements may be present.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

As shown in Fig. 1 and Fig. 2, an embodiment of the present invention provides an intelligent traffic signal light control method based on deep reinforcement learning, comprising:

S100, selecting a central intersection, the central intersection being surrounded by multiple peripheral intersections connected to it;

S200, obtaining the traffic information and signal light information of each intersection;

S300, establishing an intersection congestion/free-flow state model;

S400, modeling the traffic signal light control problem as a Markov decision process, and defining its states, actions and immediate reward function;

S500, establishing a return value function model;

S600, solving for the optimal policy using the DQN deep reinforcement learning algorithm;

S700, controlling the traffic signal lights of each intersection using the optimal policy.

Further, as training proceeds until the end of the training process, the policy obtained by the above method becomes progressively more effective at relieving crossroad congestion. The method adapts to the road conditions of the intersections and does not depend on a specific environment model. In particular, it maximizes the traffic capacity of the whole region formed by five crossroads centered on one intersection, rather than only the capacity of a single intersection.

It can be understood that there may be multiple peripheral intersections, for example 4, and that the 4 peripheral intersections may be arranged in various ways. For example, the 4 peripheral intersections are evenly distributed around the central intersection. Fig. 2 shows one such embodiment, in which the terminal processing intersection is the central intersection described above and the 4 peripheral intersections are intersection 1, intersection 2, intersection 3 and intersection 4. The 4 intersections lie due east, due west, due south and due north of the central intersection.

Further, each of the above intersections can take various forms. For example, as shown in Fig. 2, every intersection is a crossroad; that is, it has a first traffic direction and a second traffic direction that are perpendicular to each other. In Fig. 2, the first traffic direction is east-west and the second traffic direction is north-south.

In this embodiment, the traffic information includes the vehicle queue length and the average speed of the vehicles. The vehicle queue length is calculated separately for each lane, so the queue lengths of the left-turn and through lanes can both be obtained. For example, the queue length of the east left-turn lane of intersection 1 is 25 m.

In this embodiment, establishing the intersection congestion/free-flow state model in step S300 is specifically:

In this embodiment, modeling the traffic signal light control problem as a Markov decision process in step S400 and defining its states, actions and immediate reward function is specifically:

The state is denoted by s. The current traffic state s is represented by the features that the convolutional neural network extracts from the input traffic information picture and signal light information picture. Specifically, the input traffic information picture is 227×227 pixels, and each 1×1 pixel is defined as follows: if a vehicle is present, the region is set to 1; if no vehicle is present, it is set to 0. The traffic information picture is passed through three convolutional layers with kernel sizes of 11×11, 5×5 and 3×3, and the final output feature has dimension 8192. These features, together with the features extracted from the signal light information picture, represent the state of the current intersection. Taking two time steps as one group not only describes the traffic state at a given moment but also better reflects the dynamics of the traffic state.

A = {[G, R, R, R], [R, G, R, R], [R, R, G, R], [R, R, R, G]};

If there are 5 intersections, the actions that can be taken in state s therefore number 4⁵ = 1024 possibilities.

The immediate reward function is denoted by r. The total number of stationary vehicles at each intersection in state s is counted: each additional stationary vehicle earns a reward of -1, and each stationary vehicle fewer earns a reward of +1. The ultimate goal is to maximize the reward, that is, to minimize the number of stationary vehicles at the five intersections.
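The per-pixel state encoding described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names and the input format (a list of (row, column) pixel coordinates occupied by vehicles) are hypothetical, since the text only says that each pixel of the 227×227 picture is set to 1 or 0 and that two time steps form one group.

```python
GRID_SIZE = 227  # picture size given in the text


def occupancy_grid(vehicle_pixels, size=GRID_SIZE):
    """Return a size x size grid holding 1 where a vehicle is present, else 0.

    vehicle_pixels is a hypothetical input format: an iterable of
    (row, col) coordinates. Coordinates outside the picture are ignored.
    """
    grid = [[0] * size for _ in range(size)]
    for row, col in vehicle_pixels:
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1
    return grid


def stack_time_steps(previous_grid, current_grid):
    """Group two consecutive time steps into one observation."""
    return [previous_grid, current_grid]
```

Stacking two consecutive grids lets the network see which cells changed between time steps, which is one way to expose vehicle motion that a single frame cannot show.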

In this embodiment, establishing the return value function model in step S500 is specifically:

In this embodiment, solving for the optimal policy using the DQN deep reinforcement learning algorithm in step S600 is specifically:

Initialize the current value network, with randomly initialized weight parameters ω;

Define a loss function:

L(ω) = E[(r + γ max_{a'} Q(s', a'; ω⁻) − Q(s, a; ω))²],
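The pieces of the DQN procedure (replay memory, ε-greedy selection and the loss term) can be sketched without any neural network library. This is an illustrative sketch only: the class and function names are hypothetical, the discount factor value is an assumption, and the Q-values are passed in as plain lists where the patent would use the outputs of the current and target value networks.

```python
import random
from collections import deque

GAMMA = 0.9  # discount factor; the value is an assumption, not from the text


class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s_next) samples."""

    def __init__(self, capacity):
        # The oldest samples are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, rng):
        """Draw one stored sample uniformly at random."""
        return rng.choice(list(self.buffer))


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(r, next_q_target, gamma=GAMMA):
    """r + gamma * max over a' of Q(s', a'), using target-network values."""
    return r + gamma * max(next_q_target)


def squared_td_error(q_sa, target):
    """Single-sample estimate of the loss L(w)."""
    return (target - q_sa) ** 2
```

Sampling from the memory instead of training on consecutive transitions reduces the correlation between training samples, and computing the target from a separate network keeps the regression target from shifting with every weight update.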

An embodiment of the present invention also provides a computer storage medium storing at least one executable instruction, the executable instruction causing a processor to perform the operations corresponding to the intelligent traffic signal light control method based on deep reinforcement learning.

An embodiment of the present invention also provides an intelligent traffic signal light control system based on deep reinforcement learning, the system comprising:

a signal light control unit, for controlling the operation of the traffic signal lights;

establishing an intersection congestion/free-flow state model,

establishing a return value function model,

solving for the optimal policy using the DQN deep reinforcement learning algorithm,

In the above system, multiple adjacent intersections form one group, and the intersection at the center of each group can be configured as the terminal processing intersection. The traffic information picture of each crossroad and the signal light information picture of each intersection at the same moment are transmitted to the terminal processing unit in groups of two time steps. The time step in the system can be determined by the actual degree of congestion at the intersection. The degree of congestion, that is, the traffic information, can be defined by the vehicle queue length at the intersection and the average speed of all vehicles, and it can be adjusted dynamically according to the actual situation. For example: if the vehicle queue length is greater than 25 m and the average speed is less than 10 km/h, the time step can be set to 5 s. If the vehicle queue length is less than 25 m and the average speed is less than 10 km/h, the time step can be set to 5 s. If the vehicle queue length is greater than 25 m and the average speed is greater than 10 km/h, the time step can be set to 10 s. If the vehicle queue length is less than 25 m and the average speed is greater than 10 km/h, the time step can be set to 10 s.
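The four example cases can be expressed as a small lookup. The sketch below is hypothetical in its names and in its handling of boundary values: the text does not say what happens at exactly 25 m or exactly 10 km/h, so treating those as the "short queue" and "not slow" cases is an assumption.

```python
QUEUE_THRESHOLD_M = 25     # queue length threshold from the example
SPEED_THRESHOLD_KMH = 10   # average speed threshold from the example

# (queue is long, traffic is slow) -> time step in seconds
TIME_STEP_TABLE = {
    (True, True): 5,
    (False, True): 5,
    (True, False): 10,
    (False, False): 10,
}


def time_step_seconds(queue_length_m, avg_speed_kmh):
    """Look up the time step for the measured congestion level."""
    long_queue = queue_length_m > QUEUE_THRESHOLD_M
    slow = avg_speed_kmh < SPEED_THRESHOLD_KMH
    return TIME_STEP_TABLE[(long_queue, slow)]
```

In these four examples the result depends only on the average speed; keeping the queue length in the table makes it easy to assign different values per case if the rule is tuned later.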

Further, the terminal processing unit can compute the optimal policy with the corresponding algorithm. For example, the traffic information and the signal light information are passed through two separate convolutional neural networks, a Markov decision process is constructed and the optimal policy is solved by reinforcement learning, so that the signal light control system takes the most suitable action according to the optimal policy.

In this embodiment, the traffic information includes the vehicle queue length and the average speed of the vehicles. The vehicle queue length can be calculated separately for each lane, so the queue lengths of the left-turn and through lanes can both be obtained. For example, the queue length of the east left-turn lane of intersection 1 is 25 m.

Further, the terminal processing unit is arranged at the central intersection. This makes the system better suited to large-scale deployment, and placing the terminal processing unit at the central intersection also minimizes transmission loss during data transfer.

Specifically, taking Fig. 2 as an example, each crossroad in the system can be provided with four information acquisition units and two signal light control units. The terminal processing intersection is additionally provided with a terminal processing unit. Each information acquisition unit comprises an electronic camera supporting USB transmission and a first communication module connected to the camera, an arrangement that allows the road conditions at the intersection to be captured in real time. Each signal light control unit comprises a traffic signal controller and a second communication module connected to the traffic signal controller. The second communication module and the first communication module are connected through a Wi-Fi network. The terminal processing unit comprises a data processing component and a third communication module connected to it. The third communication module and the second communication module are connected through a Wi-Fi network. The data processing component and the third communication module are connected through a USB interface. It can be understood that the connections between the above elements are not limited to those described; the elements can also be connected using other existing interfaces and connection methods.
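The equipment layout of one group of intersections can be summarized in a small data model. This is only an illustrative sketch; the `Intersection` class, its field names and `build_group` are hypothetical and simply restate the counts given above for the Fig. 2 arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intersection:
    name: str
    acquisition_units: int = 4   # camera plus first communication module
    control_units: int = 2       # controller plus second communication module
    has_terminal_processor: bool = False


def build_group():
    """One group as in Fig. 2: a central intersection and four peripheral ones."""
    center = Intersection("center", has_terminal_processor=True)
    peripherals = [Intersection(f"intersection {i}") for i in range(1, 5)]
    return [center] + peripherals
```

For one group this gives 20 information acquisition units and 10 signal light control units feeding a single terminal processing unit at the central intersection.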

In this embodiment, the first communication module is an SKW77-WIFI module, and the electronic camera and the SKW77-WIFI module are communicatively connected through a USB interface.

In this embodiment, the second communication module is an SKW77-WIFI module. The second communication module and the first communication module are connected through a Wi-Fi network.

In this embodiment, the traffic signal controller and the second communication module are communicatively connected through a USB interface.

In this embodiment, the third communication module is an SKW77-WIFI module. The third communication module and the second communication module are connected through a Wi-Fi network.

In this embodiment, the data processing component is an NVIDIA Jetson TK1 developer kit. The data processing component and the third communication module are communicatively connected through a USB interface.

The specific workflow of the above system of the present invention is as follows:

The electronic cameras acquire the traffic information and signal light information of their intersections in real time.

The first communication module and the second communication module are connected through a Wi-Fi network, and the first communication module transmits the traffic information and signal light information to the second communication module.

The second communication module and the third communication module communicate through a Wi-Fi network, and the second communication module transmits the traffic information and signal light information to the third communication module.

The third communication module and the data processing component are communicatively connected through a USB interface, and the third communication module transmits the traffic information and signal light information to the data processing component.

After receiving the traffic information and signal light information, the data processing component establishes the intersection congestion/free-flow state model from the traffic information and signal light information of each intersection.

The traffic signal light control problem is modeled as a Markov decision process, and its states, actions and immediate reward function are modeled.

The return value function model is established.

The optimal policy is solved using the DQN deep reinforcement learning algorithm.

Through the terminal processing unit, the present invention builds an environment model from the received data and obtains the optimal signal light regulation scheme with the DQN algorithm. The traffic lights are adjusted adaptively according to the current traffic volume at the crossroads, with no need for manually provided learning samples. The DQN algorithm learns the optimal adjustment policy online, updating on the loss function by stochastic gradient descent so that the parameters of the current value network gradually converge. Compared with existing fixed traffic light control systems, the significant advantages of the invention are: 1) it can dynamically revise the optimal policy for random and complex road conditions; 2) as training proceeds until the end of the training process, the policy obtained by the system becomes increasingly effective at relieving crossroad congestion; 3) the system adapts to the road conditions of the intersections and does not depend on a specific environment model; 4) it maximizes the traffic capacity of the region formed by five crossroads centered on one intersection, rather than only that of a single intersection.

The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims

1. An intelligent traffic signal light control method based on deep reinforcement learning, characterized by comprising:

obtaining the traffic information and signal light information of each intersection,

establishing an intersection congestion/free-flow state model,

modeling the traffic signal light control problem as a Markov decision process, and defining its states, actions and immediate reward function,

establishing a return value function model,

solving for the optimal policy using the DQN deep reinforcement learning algorithm,

controlling the traffic signal lights of each intersection using the optimal policy.

2. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that the number of peripheral intersections is 4, and the 4 peripheral intersections are evenly distributed around the central intersection.

3. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 2, characterized in that the central intersection and the peripheral intersections are all crossroads.

4. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that the traffic information includes the vehicle queue length and the average speed of the vehicles.

5. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that establishing the intersection congestion/free-flow state model is specifically:

The traffic signal control agent uses a deep reinforcement learning method. A convolutional neural network Q^V is constructed as the current value network, and a network Q* with the same structure is constructed as the target value network. The constructed convolutional neural network comprises an input layer, two convolutional layer networks, a fully connected layer and an output layer. The input layer receives pictures of the current traffic information and signal light information of each intersection. The traffic information picture and the signal light information picture are passed through different convolutional layer networks, and the resulting features are fully connected with all possible actions. The output layer gives the value estimate Q(s, a) of every action in the current state s. The experience replay memory pool records all samples <s, s', a, r>, where s denotes the current road state, a denotes the action executed in the current road state, s' denotes the next state reached after executing action a in state s, and r denotes the immediate reward obtained by executing action a in the current road state s.

6. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that modeling the traffic signal light control problem as a Markov decision process and defining its states, actions and immediate reward function is specifically:

The state is denoted by s. The current traffic state s is represented by the features that the convolutional neural network extracts from the input traffic information picture and signal light information picture;

The action is denoted by a. Let G denote a green light being on and R a red light being on. The through and left-turn signals of a first direction and a second direction are defined separately, the first direction and the second direction being perpendicular to each other. The action a at time t is written as [first direction through, first direction left turn, second direction through, second direction left turn], so the set of actions that a single intersection can take at time t is:

A = {[G, R, R, R], [R, G, R, R], [R, R, G, R], [R, R, R, G]};

The immediate reward function is denoted by r. The total number of stationary vehicles at each intersection in state s is counted:

each additional stationary vehicle earns a reward of -1, and each stationary vehicle fewer earns a reward of +1.

7. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that establishing the return value function model is specifically:

Let R(s, a) denote the return value of taking action a in state s. The value function Q(s, a) is the expectation of R(s, a), so Q(s, a) = E[R(s, a)].

8. The intelligent traffic signal light control method based on deep reinforcement learning according to claim 1, characterized in that solving for the optimal policy using the DQN deep reinforcement learning algorithm is specifically:

Initialize the current value network, with randomly initialized weight parameters ω;

Feed the picture showing the road conditions through the current value network to obtain Q(s, a) for any state s. After the current value network has computed the value function, select action a using the ε-greedy policy. Each state transition, that is, each action taken, is counted as one time step t, and the data (s, a, r, s') obtained at each time step are stored in the replay memory unit;

Define a loss function:

L(ω) = E[(r + γ max_{a'} Q(s', a'; ω⁻) − Q(s, a; ω))²],

Randomly draw one sample (s, a, r, s') from the replay memory unit and pass (s, a), s' and r to the current value network, the target value network and L(ω) respectively. L(ω) is then updated with respect to ω by stochastic gradient descent, with the update formula as follows:

9. A computer storage medium, characterized in that at least one executable instruction is stored in the storage medium, the executable instruction causing a processor to perform the operations corresponding to the intelligent traffic signal light control method based on deep reinforcement learning according to any one of claims 1 to 8.

10. An intelligent traffic signal light control system based on deep reinforcement learning, characterized by comprising:

information acquisition units, arranged at the central intersection and at the peripheral intersections connected to the central intersection, the information acquisition units being used to obtain the traffic information and signal light information of each intersection;

a signal light control unit, for controlling the operation of the traffic signal lights;

a terminal processing unit, communicatively connected to the information acquisition units and the signal light control unit respectively, the terminal processing unit being able to perform the following operations according to the information obtained by the information acquisition units:

establishing an intersection congestion/free-flow state model,

establishing a return value function model,

solving for the optimal policy using the DQN deep reinforcement learning algorithm,