CN115691167A - Single-point traffic signal control method based on intersection holographic data - Google Patents


Info

Publication number
CN115691167A
CN115691167A (application CN202211253243.7A)
Authority
CN
China
Prior art keywords
network, data, intersection, signal control, holographic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211253243.7A
Other languages
Chinese (zh)
Inventor
王涛
赵晓寅
程瑞
徐奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202211253243.7A
Publication of CN115691167A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention relates to the technical field of traffic control, and in particular to a single-point traffic signal control method based on intersection holographic data: a deep reinforcement learning model comprising multiple agents is constructed, the neural networks are trained with the MARDDPG algorithm, and the trained networks are then used for signal control.

Description

Single-point traffic signal control method based on intersection holographic data
Technical Field
The invention relates to the technical field of traffic control, in particular to a single-point traffic signal control method based on intersection holographic data.
Background
Currently, the actuated signal control systems widely used in China are mainly the British SCOOT (Split Cycle Offset Optimisation Technique) system and the Australian SCATS (Sydney Coordinated Adaptive Traffic System). With the continuing acceleration of urbanization and the exponential growth in the number of motor vehicles, the traditional actuated traffic signal control mode can hardly cope with, let alone effectively manage, traffic flows that change enormously in real time. The traditional actuated control principle is to adjust the green time based on the headway, occupancy, queue length and degree of congestion.
The disadvantages are mainly shown in the following aspects:
1. The detection data of traditional actuated traffic signal control systems cannot comprehensively and effectively represent the traffic demand of an intersection; coil detectors have large errors and low reliability, so the effectiveness of the algorithms and control methods is difficult to realize.
2. Conventional deep reinforcement learning traffic control methods are generally weak in cooperative optimization: a single agent has to optimize the traffic signal control of the whole system, and the discrete decision space of the joint optimization problem is too complex.
3. The state space dimension of a single agent is too high and the computation time too long for practical deployment.
Disclosure of Invention
The invention aims to provide a single-point traffic signal control method based on intersection holographic data, which uses holographic detection of vehicles at the intersection to acquire vehicle position and speed information in real time and to control traffic signals more effectively and accurately.
To achieve this purpose, the invention provides a single-point traffic signal control method based on intersection holographic data, comprising the following steps:
collecting initial holographic traffic data;
processing holographic traffic data;
constructing a deep reinforcement learning model comprising a plurality of agents;
training a neural network by using a MARDDPG algorithm;
and performing signal control by using the trained neural network.
The holographic traffic data comprises target vehicle operation data, lane-level traffic data, the current intersection design, and the current traffic signal control scheme, wherein the target vehicle operation data comprises the identification ID of a target vehicle, the vehicle type, the vehicle longitudinal speed, the number of the lane where the vehicle is located and the distance between the vehicle and the stop line; the lane-level traffic data comprises the target lane queue length, the total vehicle waiting time, the average delay and the number of vehicles passing the stop line; the current intersection design comprises the number of lanes of each entrance approach and the lane function allocation; and the current traffic signal control scheme comprises the current phase sequence at the intersection and the duration allocation of each phase.
The processing of the holographic traffic data comprises the following steps:
deleting redundant data;
deleting abnormal track data;
and completing noisy track data using linear function interpolation.
In the process of constructing the deep reinforcement learning model comprising a plurality of agents, the traffic signal control agents use the MARDDPG algorithm for deep reinforcement learning, and a state space S, an action space A and a reward value R are defined respectively.
The process of training the neural networks with the MARDDPG algorithm comprises the following steps:
Step 1: the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data of the i-th agent in the actor network at time step t, and the critic network likewise holds the historical memory data of the i-th agent at time step t; the value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network; separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i.
Step 2: every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i.
Step 3: the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D, whose capacity is not less than 20000 samples; agent i samples M historical training steps stored in the replay buffer D for training the actor and critic networks.
Step 4: after the agent selects the M historical training-step records, a minibatch (denoted m) within M is used so that the critic network estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 1 and updates the critic network parameters φ_i by minimizing the average loss function; similarly, the actor network updates its parameter θ_i by computing the policy gradient through a loss function.
The loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents at the next time step.
The objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
To update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward.
Step 5: the agent updates the target network parameters φ'_i and θ'_i by soft update; first, a rate τ (0 < τ < 1) for updating the target network from the main network is defined; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
Steps 2 to 5 are repeated; when the change in the Q value between successive updates is no greater than Δp, the optimum in the traversable state space has been reached and the agent completes the training.
The process of performing signal control with the trained neural network specifically comprises: acquiring real-time holographic data according to the agent construction requirements, inputting it to the signal control agents, and generating the phase time matrix G' from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn'].
The invention provides a single-point traffic signal control method based on intersection holographic data: a deep reinforcement learning model comprising multiple agents is constructed, the neural networks are trained with the MARDDPG algorithm, and the trained networks are then used for signal control.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a single-point traffic signal control method based on intersection holographic data according to the present invention.
FIG. 2 is a schematic diagram of a convolution network structure of the deep reinforcement learning model of the present invention.
FIG. 3 is a schematic diagram comparing vehicle positions at an intersection with the position matrix and the speed matrix of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the invention provides a single-point traffic signal control method based on intersection holographic data, which comprises the following steps:
s1: collecting initial holographic traffic data;
s2: processing holographic traffic data;
s3: constructing a deep reinforcement learning model comprising a plurality of agents;
s4: training a neural network by using a MARDDPG algorithm;
s5: and performing signal control by using the trained neural network.
Specifically, in step S1, the holographic traffic data includes target vehicle operation data, lane-level traffic data, the current intersection design, and the current traffic signal control scheme.
Wherein the target vehicle operation data includes: the target vehicle identification number C_id, vehicle type C_s, vehicle longitudinal speed V_p, number L_i of the lane where the vehicle is located, and distance Y_i between the vehicle and the stop line.
The lane-level traffic data includes: target lane queue length Q_i, total vehicle waiting time W_i, average delay D_i, and number N_c of vehicles passing the stop line.
The current intersection design includes: the number of lanes of each entrance approach at the intersection and the lane function allocation.
The current traffic signal control scheme includes: the current phase sequence at the intersection and the duration allocation of each phase.
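For illustration only, one target vehicle operation record could be organised as the following minimal Python sketch (the field names mirror the symbols C_id, C_s, V_p, L_i and Y_i above; the layout itself is an assumption, not a format prescribed by the invention):

```python
from dataclasses import dataclass

@dataclass
class VehicleRecord:
    """One target vehicle operation record; fields mirror C_id, C_s, V_p, L_i, Y_i."""
    c_id: str    # target vehicle identification number
    c_s: str     # vehicle type
    v_p: float   # longitudinal speed, m/s
    l_i: int     # number of the lane the vehicle occupies
    y_i: float   # distance between the vehicle and the stop line, metres
```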
In step S2, the holographic traffic data is processed to improve data accuracy and thus traffic control efficiency, specifically comprising the following steps:
Step 2.1, deleting redundant and corrupted data: abnormal records may occur under the influence of transmission interference and limited equipment recognition rates. Normal output data is a character string; if the number of digits does not match or the data is garbled, all traffic data of the related target vehicle is deleted.
Step 2.2, deleting abnormal track data: if the lane turning movement does not correspond to the data collected by the equipment, the track is abnormal, and all traffic data of the related target vehicle is deleted.
Step 2.3, completing noisy track data: since the equipment collects traffic data at a fixed timing step, linear function interpolation can be adopted. First, the consecutive position points of the target vehicle whose track offset exceeds a threshold form a track; the formed track is fitted to a linear function; the offset track points are then inserted at the median positions between the preceding and following track points on the fitted curve.
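As an illustration of step 2.3, the fit-and-interpolate completion could look as follows (a minimal sketch assuming numpy arrays, a least-squares line fit and an example offset threshold; none of these specifics are fixed by the text):

```python
import numpy as np

def complete_track(timestamps, positions, offset_threshold=2.0):
    """Replace track points whose offset from the fitted line exceeds a
    threshold with values interpolated between their clean neighbours."""
    t = np.asarray(timestamps, dtype=float)
    y = np.asarray(positions, dtype=float)
    a, b = np.polyfit(t, y, deg=1)          # fit the track to a linear function
    noisy = np.abs(y - (a * t + b)) > offset_threshold
    y_clean = y.copy()
    # Interpolate each noisy point from the surrounding clean points.
    y_clean[noisy] = np.interp(t[noisy], t[~noisy], y[~noisy])
    return y_clean
```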
In step S3, the deep reinforcement learning model comprising a plurality of agents is constructed, specifically as follows:
Assume the intersection is currently controlled by a classical n-phase signal and the number of passable lanes in each phase is m. Without adjusting the phase sequence, one signal control agent is deployed per phase to adjust that phase's green duration and thereby control the traffic flow; after each adjustment the state and feedback are obtained from the environment. All signal control agents are optimized cooperatively to reduce the total traffic congestion at the intersection.
Step 3.1, the traffic signal control agents use a multi-agent deep reinforcement learning method. Given the high precision of holographic traffic data and the temporal continuity of traffic flow, the multi-agent recurrent deep deterministic policy gradient algorithm (MARDDPG) is selected; MARDDPG is the MADDPG algorithm with an LSTM (long short-term memory) network added.
The input of the actor network is the agent state S_t,i = (P, V, L), where P and V are matrices of the same dimension, 45 × m; the P matrix and the V matrix are combined into a two-channel image and input to the stacked sub-network. The sub-network comprises two convolutional layers: the first contains 32 filters of size 4 × 4 with stride (2, 2); the second contains 64 filters of size 2 × 2 with stride (2, 2). The phase matrix L is encoded into an 8-dimensional vector by a fully connected layer. The outputs of all sub-networks are then concatenated into one vector; the vectors synthesized by all agents are fed into an LSTM with 64 hidden units, and the action value prediction Q(S, A) is output through a softmax activation function. The specific convolutional network structure is shown in FIG. 2.
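A minimal PyTorch sketch of this actor structure is given below (the filter counts, kernel sizes, strides, 8-dimensional phase encoding and 64-unit LSTM follow the text; the default lane count m, the derived flattened feature size and the 11-action output are assumptions):

```python
import torch
import torch.nn as nn

class ActorNet(nn.Module):
    """Actor: two-channel (P, V) image plus phase vector to action distribution."""
    def __init__(self, m_lanes=8, n_phases=4, n_actions=11):  # m_lanes >= 6 needed by the convolutions
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=4, stride=2), nn.ReLU(),   # 32 filters, 4x4, stride (2, 2)
            nn.Conv2d(32, 64, kernel_size=2, stride=2), nn.ReLU(),  # 64 filters, 2x2, stride (2, 2)
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size for a 45 x m input
            feat = self.conv(torch.zeros(1, 2, 45, m_lanes)).shape[1]
        self.phase_fc = nn.Linear(n_phases, 8)  # encode phase matrix L as an 8-dim vector
        self.lstm = nn.LSTM(feat + 8, 64, batch_first=True)
        self.head = nn.Sequential(nn.Linear(64, n_actions), nn.Softmax(dim=-1))

    def forward(self, pv, phase, hidden=None):
        # pv: (batch, 2, 45, m) position/speed channels; phase: (batch, n_phases)
        x = torch.cat([self.conv(pv), self.phase_fc(phase)], dim=-1)
        out, hidden = self.lstm(x.unsqueeze(1), hidden)  # one LSTM step per control decision
        return self.head(out.squeeze(1)), hidden
```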
the reviewer (Critic) network structure is similar to the participant network, and in addition to inputting the state space, the global action set a of all the intelligent agents at the intersection needs to be input.
The replay buffer D is used to randomly sample stored experience at each training step while the actor network and the critic network are updated.
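A minimal sketch of such a buffer is given below (the 20000-sample capacity follows step 4.3; the tuple layout is an assumption):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience store with uniform random sampling."""
    def __init__(self, capacity=20000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between samples.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```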
Step 3.2, defining the state space S: the collected target vehicle operation data is read, and each agent defines its state from the vehicle positions and speeds in the passable lanes of its assigned phase.
As shown in fig. 3, to represent vehicle positions, the passable lane is divided into discrete units of 6 meters at equal intervals, each discrete unit being a cell; if a vehicle is in a cell, the corresponding position value is 1, otherwise it is 0. The matrices Pi are laid horizontally with the stop line on the right for each direction, and the agent's position matrix P consists of the matrices Pi of all directions:
P = [P1 P2 P3 …]
The target vehicle speeds are read to represent vehicle speed; speed matrices Vi are formed following the arrangement of matrix P, and the agent's speed matrix V consists of the matrices Vi of each direction:
V = [V1 V2 V3 …]
The intersection phase scheme is expressed in the state space: the green time of each phase is Gi, its ratio to the cycle length is Li, and these form the green time matrix G and the phase matrix L:
G=[G1 G2 G3 … Gn]
L=[L1 L2 L3 … Ln]
in summary, the intersection state is defined as St = (P, V, L) at discrete time step t.
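A minimal sketch of filling one lane's rows of P and V is given below (the 6-metre cell and the 45-cell depth follow the text; mapping each vehicle by its distance to the stop line is an assumption about the exact encoding):

```python
import numpy as np

CELL_LEN = 6.0  # metres per discrete cell

def encode_lane(distances, speeds, n_cells=45):
    """Build one lane's row of the position matrix P and speed matrix V.
    distances: each vehicle's distance to the stop line, metres."""
    p_row = np.zeros(n_cells)
    v_row = np.zeros(n_cells)
    for d, v in zip(distances, speeds):
        cell = int(d // CELL_LEN)
        if 0 <= cell < n_cells:
            p_row[cell] = 1.0  # occupancy indicator
            v_row[cell] = v    # longitudinal speed of the occupying vehicle
    return p_row, v_row
```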
Step 3.3, defining the action space A: the agent controlling the current phase selects an action when the phase green time ends. To keep the system stable, the phase green time may only change within a small range, and the green time Gi of each phase is limited between the maximum green time Gmax and the minimum green time Gmin (Gmin ≤ Gi ≤ Gmax).
The maximum green time is calculated as:
Gmax = (Cmax - L) × y / Y
in the formula: Gmax - the maximum green time,
Cmax - the maximum cycle time, recommended to be 180 seconds;
L - the total lost time,
y - the critical flow ratio of this phase,
Y - the sum of the critical flow ratios of all phases.
The minimum green time is calculated as:
Gmin = 7 + PLp / Pvp - I
in the formula: Gmin - the minimum green time,
PLp - the longitudinal length of the crosswalk,
Pvp - the pedestrian crossing speed, taken as 1.2 m/s from experience,
I - the green interval time.
The action set is Ai = (-5, -4, -3, -2, -1, 0, +1, +2, +3, +4, +5); if the agent selects Ai = +3, the current phase green duration is increased by 3 seconds, and the changed phase time is converted and used to update the phase matrix in the state space (a sketch of this bounded adjustment follows).
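A minimal sketch of the bounded adjustment is given below (the bound formulas follow the text above, but the 7-second pedestrian start-up term in g_min and all example parameter values are assumptions for illustration):

```python
ACTIONS = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]  # seconds, the action set Ai

def g_max(c_max=180.0, lost_time=12.0, y_phase=0.25, y_sum=0.85):
    # Gmax = (Cmax - L) * y / Y; flow ratios here are made-up examples.
    return (c_max - lost_time) * y_phase / y_sum

def g_min(crosswalk_len=24.0, ped_speed=1.2, green_interval=3.0):
    # Gmin = 7 + PLp / Pvp - I; the 7 s start-up term is an assumption.
    return 7.0 + crosswalk_len / ped_speed - green_interval

def apply_action(green_time, action_index):
    """Shift the phase green time by the chosen action, clipped to [Gmin, Gmax]."""
    return min(max(green_time + ACTIONS[action_index], g_min()), g_max())
```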
Step 3.4, defining the reward value R: different indicators influence each agent's strategy to different degrees, so a weighted combination of several intersection traffic parameters is used as the agent's reward (a computation sketch follows this list), i.e. R = W1·Rl + W2·Rw + W3·Rd + W4·Rc (in the invention W1 = -0.25, W2 = 0.2, W3 = -1 and W4 = 1), wherein:
(1) Queue length Rl: the sum of the queue lengths lij of all relevant lanes controlled by the agent, obtained from the lane-level traffic data:
Rl = Σi Σj lij
lij - the queue length of lane j in direction i controlled by the agent;
(2) Waiting time Rw: the sum of the waiting times Wijn of the vehicles on all relevant lanes controlled by the agent, obtained by combining the target vehicle operation data with the lane-level traffic data:
Rw = Σi Σj Σn Wijn
Wijn - the queuing time of every vehicle n on the lanes controlled by the agent;
(3) Average delay Rd: the average delay of all relevant lanes controlled by the agent, obtained from the lane-level traffic data:
Rd = (1/N) Σi Σj dij
dij - the average delay of lane j in direction i controlled by the agent, N - the number of such lanes;
(4) Vehicles passing the intersection Rc, obtained from the target vehicle operation data:
Rc = Σi Σj Σn cijn
cijn - the sum of all vehicles passing the stop line during the phase green time on all lanes controlled by the agent.
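A computation sketch of this weighted reward is given below (the weights follow the text; the per-lane and per-vehicle input structures are assumptions):

```python
W1, W2, W3, W4 = -0.25, 0.2, -1.0, 1.0  # weights as stated in the text

def reward(queue_lengths, wait_times, delays, passed_counts):
    """queue_lengths, delays: one value per controlled lane;
    wait_times: one value per queued vehicle;
    passed_counts: vehicles past the stop line per lane."""
    r_l = sum(queue_lengths)                 # (1) total queue length
    r_w = sum(wait_times)                    # (2) total waiting time
    r_d = sum(delays) / max(len(delays), 1)  # (3) average delay
    r_c = sum(passed_counts)                 # (4) throughput during green
    return W1 * r_l + W2 * r_w + W3 * r_d + W4 * r_c
```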
In step S4, the neural networks are trained: the traffic signal control agents are trained with the MARDDPG deep reinforcement learning algorithm, which selects the optimal action by updating the action strategy. This specifically comprises the following sub-steps:
Step 4.1, the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data (comprising the state, reward and action information) of the i-th agent in the network at time step t. The value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network.
Separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i.
Step 4.2, every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i.
Step 4.3, the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D; in the invention the buffer capacity is set to 20000. Agent i then samples M historical training steps stored in the replay buffer D for training the actor and critic networks.
Step 4.4, after the agent selects the M historical training-step records (in the invention the sample number M is 64), the 64 historical records are used for random batching into several minibatches (denoted m): 64 records are first drawn from the buffer without being updated, randomly split into several minibatches, and the loss function is then updated with the data of each minibatch. The state and the action selected by the actor network are input into the critic network, which estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 4.1 and updates the critic network parameters φ_i by minimizing the average loss function.
The actor network uses the value returned by the critic network to compute the policy gradient through a loss function and thereby update its parameter θ_i.
The loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient, taken as 0.99 in the invention; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents at the next time step.
The objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
To update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward.
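As a sketch of the updates in step 4.4, one critic/actor training step might look as follows (the attribute names agent.critic, agent.actor_target, the optimizers and the joint-action handling are assumptions, not the patent's reference implementation):

```python
import torch
import torch.nn.functional as F

def maddpg_update(agent, minibatch, gamma=0.99):
    """One critic and actor update for a single agent from a sampled minibatch."""
    h, a, r, h_next = minibatch  # batched histories, joint actions, rewards

    # Critic: minimise the mean squared TD error over the minibatch.
    with torch.no_grad():
        a_next = agent.actor_target(h_next)
        y = r + gamma * agent.critic_target(h_next, a_next)  # target Q value
    critic_loss = F.mse_loss(agent.critic(h, a), y)
    agent.critic_opt.zero_grad()
    critic_loss.backward()
    agent.critic_opt.step()

    # Actor: ascend the deterministic policy gradient through the critic.
    actor_loss = -agent.critic(h, agent.actor(h)).mean()
    agent.actor_opt.zero_grad()
    actor_loss.backward()
    agent.actor_opt.step()
```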
Step 4.5, all agents in the system update the target network parameters φ'_i and θ'_i by soft update. First, the rate τ (0 < τ < 1) at which the target network is updated from the main network is defined, taken as 0.001 in the invention; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
Steps 4.2 to 4.5 are repeated; when the change in the Q value between successive updates is no greater than Δp (Δp is taken as 0.05 in the invention), the optimum in the traversable state space has been reached and the agent completes the training.
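A minimal sketch of the soft update and the stopping test is given below (reading the stopping criterion as a Q-value change of at most Δp is an assumption about the hidden formula):

```python
def soft_update(target_net, main_net, tau=0.001):
    """Step 4.5: target <- tau * main + (1 - tau) * target, per parameter."""
    for tp, p in zip(target_net.parameters(), main_net.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

def converged(q_prev, q_curr, delta_p=0.05):
    """Stop repeating steps 4.2 to 4.5 once the Q value has settled."""
    return abs(q_curr - q_prev) <= delta_p
```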
In step S5, signal control is performed with the trained neural network: real-time holographic data is acquired according to the agent construction requirements of steps 3.2 and 3.4 and input to the signal control agents, and the phase time matrix is generated from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn']
Gn' is the optimized green time of the agent controlling phase n; the green times are combined and input to the signal controller before the next cycle begins, and the optimized signal control scheme is used to control the traffic signals.
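The resulting online loop might look as follows (read_holographic_state and push_to_signal_controller are hypothetical stand-ins for the roadside data feed and the signal controller interface; apply_action is the clipping sketch from step 3.3):

```python
def control_cycle(agents, current_greens):
    """One control cycle: read state, query each phase's trained agent, emit G'."""
    new_greens = []
    for phase, agent in enumerate(agents):
        state = read_holographic_state(phase)      # hypothetical: P, V, L per steps 3.2/3.4
        action_index = agent.select_action(state)  # trained actor, exploration noise off
        new_greens.append(apply_action(current_greens[phase], action_index))
    push_to_signal_controller(new_greens)          # hypothetical: G' before the next cycle
    return new_greens
```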
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A single-point traffic signal control method based on intersection holographic data is characterized by comprising the following steps:
collecting initial holographic traffic data;
processing holographic traffic data;
constructing a deep reinforcement learning model comprising a plurality of agents;
training a neural network by using a MARDDPG algorithm;
and performing signal control by using the trained neural network.
2. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the holographic traffic data comprises target vehicle operation data, lane-level traffic data, the current intersection design and the current traffic signal control scheme, wherein the target vehicle operation data comprises the identification ID of a target vehicle, the vehicle type, the vehicle longitudinal speed, the number of the lane where the vehicle is located and the distance between the vehicle and the stop line; the lane-level traffic data comprises the target lane queue length, the total vehicle waiting time, the average delay and the number of vehicles passing the stop line; the current intersection design comprises the number of lanes of each entrance approach at the intersection and the lane function allocation; and the current traffic signal control scheme comprises the current phase sequence at the intersection and the duration allocation of each phase.
3. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the processing of the holographic traffic data comprises the following steps:
deleting redundant data;
deleting abnormal track data;
and completing noisy track data using linear function interpolation.
4. The single-point traffic signal control method based on intersection holographic data according to claim 1, wherein
in the process of constructing the deep reinforcement learning model comprising a plurality of agents, the traffic signal control agents use the MARDDPG algorithm for deep reinforcement learning, and a state space S, an action space A and a reward value R are defined respectively.
5. The single-point traffic signal control method based on intersection holographic data according to claim 1, wherein
the process of training the neural networks with the MARDDPG algorithm comprises the following steps:
Step 1: the actor network and the critic network initialize the parameterized action selection strategy μ_θi(h_t,i), where h_t,i is the historical memory data of the i-th agent in the actor network at time step t, and the critic network likewise holds the historical memory data of the i-th agent at time step t; the value function Q_i(h_t, a_t) is constructed and parameterized in the critic network, its inputs comprising the historical state h and the action a selected by the actor network; separately, the weights θ'_i and φ'_i of all target networks are initialized; the replay buffer D is initialized, the time step is cleared, and the actor network reads the initial state s_t,i;
Step 2: every 5 seconds, a single agent selects an action by the strategy a_t,i = μ_θi(h_t,i) + N_t, where N_t is the exploration noise at the current time step; the actor network selects action a_t,i in the action set Ai and executes it, and after execution receives the reward value r_t,i and the new state s_t+1,i, generating the new history data h_t+1,i;
Step 3: the executed tuples {s_1,i, a_1,i, r_1,i, s_2,i, a_2,i, r_2,i, …} are stored in the replay buffer D, and the number of samples is not less than 20000; agent i samples M historical training steps stored in the replay buffer D for training the actor and critic networks;
Step 4: after the agent selects the M historical training-step records, a minibatch (denoted m) within M is used so that the critic network estimates the Q value through the value function Q_i(h_t, a_t) constructed in step 1 and updates the critic network parameters φ_i by minimizing the average loss function; similarly, the actor network updates its parameter θ_i by computing the policy gradient through a loss function;
the loss function for updating φ_i is specifically as follows:
L(φ_i) = (1/m) Σ_j (y_j - Q_i(h_j, a_j; φ_i))², with y_j = r_j + γ Σ_i Q'_i(h_{j+1}, a'_{j+1})
where (1/m) Σ_j Q_i denotes the average of the Q values of the m minibatch selected actions over all agent critics; r_j is the total reward value; γ is the discount coefficient; Σ_i Q'_i(h_{j+1}, a'_{j+1}) denotes the sum of the Q values of all agents in the critic network at the next time step;
the objective function for updating θ_i is specifically as follows:
∇_θi J(θ_i) = (1/m) Σ_j ∇_θi μ_θi(h_j) ∇_a Q_i(h_j, a)|_(a = μ_θi(h_j))
to update the actor network to maximize the future expected reward, J(θ_i) is defined so as to find the direction that maximizes the cumulative reward;
Step 5: the agent updates the target network parameters φ'_i and θ'_i by soft update; first, the rate τ (0 < τ < 1) for updating the target network from the main network is defined; the target network is updated with a convex combination of the current network parameters and the target network parameters, as follows:
φ'_i = τφ_i + (1 - τ)φ'_i
θ'_i = τθ_i + (1 - τ)θ'_i
steps 2 to 5 are repeated; when the change in the Q value between successive updates is no greater than Δp, the optimum in the traversable state space has been reached and the agent completes the training.
6. The single-point traffic signal control method based on intersection holographic data according to claim 1,
the process of performing signal control with the trained neural network specifically comprises: acquiring real-time holographic data according to the agent construction requirements, inputting it to the signal control agents, and generating the phase time matrix G' from the duration-change actions of each phase signal output by the agents:
G' = [G1' G2' G3' … Gn'].
CN202211253243.7A 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data Pending CN115691167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253243.7A CN115691167A (en) 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data

Publications (1)

Publication Number Publication Date
CN115691167A true CN115691167A (en) 2023-02-03

Family

ID=85064353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253243.7A Pending CN115691167A (en) 2022-10-13 2022-10-13 Single-point traffic signal control method based on intersection holographic data

Country Status (1)

Country Link
CN (1) CN115691167A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071939A (en) * 2023-03-24 2023-05-05 华东交通大学 Traffic signal control model building method and control method
CN117114079A (en) * 2023-10-25 2023-11-24 中泰信合智能科技有限公司 Method for migrating single intersection signal control model to target environment
CN117114079B (en) * 2023-10-25 2024-01-26 中泰信合智能科技有限公司 Method for migrating single intersection signal control model to target environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination