CN113487889B

CN113487889B - Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent

Info

Publication number: CN113487889B
Application number: CN202110813579.3A
Authority: CN
Inventors: 徐东伟; 王达; 李呈斌; 周磊
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2022-06-17
Anticipated expiration: 2041-07-19
Also published as: CN113487889A

Abstract

A traffic state countermeasure disturbance generation method based on single intersection signal control of rapid gradient descent is characterized in that according to a traffic intersection signal lamp control model trained by an existing reinforcement learning DQN algorithm, countermeasure samples are generated by utilizing FGSM attack based and discretization processing on the countermeasure disturbance in combination with gradient values, the final disturbance state obtained by combining the countermeasure disturbance and an original state is input into an intelligent model, and finally the effect of fluency or congestion degree inspection on a single intersection is achieved on sumo. The invention can limit the disturbance under the condition that the output disturbance has physical significance, thereby efficiently generating a confrontation state, increasing the queuing length and waiting time of the intersection, greatly reducing the performance of the model and greatly reducing the traffic intersection circulation.

Description

Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent

Technical Field

The invention belongs to the crossing field of intelligent traffic and machine learning information safety, and relates to a traffic state anti-disturbance generation method based on single crossing signal control of fast gradient descent (FGSM).

Background

The problem of traffic congestion becomes an urgent challenge for urban traffic, and when a modern city is designed, one of the most critical considerations is to develop an intelligent traffic management system. The main goal of traffic management systems is to reduce traffic congestion, which has become one of the major problems in large cities today. Efficient urban traffic management can save time and money and reduce carbon dioxide emissions to the atmosphere.

Reinforcement Learning (RL) has produced impressive results as a machine learning technique for the traffic signal control problem. Reinforcement learning does not require prior full understanding of the environment, such as traffic flow. Instead, they can acquire knowledge and model environmental dynamics by interacting with the environment. After each operation is performed in the environment, it will receive a scalar reward. The reward earned depends on the degree of action taken and the goal of the agent is to learn the best control strategy, so by repeatedly interacting with the environment, the accumulated reward at a reduced price can be maximised. Deep Reinforcement Learning (DRL) has numerous applications in the real world due to its excellent ability to adapt quickly to the surrounding environment. Although DRL has great advantages, it is vulnerable to antagonistic attacks such as: the method comprises the following steps of attracting attack, strategy timing attack, value function-based anti-attack, Trojan attack and the like.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a traffic state anti-disturbance generation method based on single intersection signal control with rapid gradient descent, which can add a small amount of disturbance to the number of vehicles and the positions of the vehicles and ensure that the disturbance has actual physical significance, thereby efficiently generating the anti-disturbance and greatly reducing the performance of a model and the smoothness degree of a traffic intersection.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent comprises the following steps:

step 1: training a reinforcement learning Deep Q Network (DQN) intelligent model on a single intersection road grid, wherein Network parameters of the model do not change after training, the model has high mobility, and high fluency and no congestion are embodied in a single intersection testing process;

step 2: acquiring the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection, namely inputting the number and the positions of the vehicles into a model, generating corresponding traffic signal lamps, namely output actions, and attacking the input at each moment one by utilizing a FGSM (highway fault warning message) attack algorithm to obtain corresponding counterdisturbance;

and 3, step 3: discretizing the generated countermeasure disturbance, and combining the generated countermeasure disturbance with the originally acquired traffic flow to obtain a final disturbance state, namely the number and the positions of the vehicles at the traffic intersection input into the model at the moment;

and 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and when the disturbance amount is smaller than the disturbance limit, the disturbance state is input into the model; inputting the original state into the model when the disturbance quantity is larger than the disturbance limit;

and 5: and finally, comparing fluency of traffic intersections by traffic light phases obtained by traffic flows in different input states on sumo.

As a research hotspot in the field of artificial intelligence, Deep Reinforcement Learning (DRL) has achieved certain success in various fields such as robot control, computer vision, intelligent transportation, and the like. Meanwhile, the possibility of attack and whether it has strong resistance are also the hot topics in recent years. Therefore, the method selects a representative Deep Q Network (DQN) algorithm in Deep reinforcement learning, controls single intersection signal lamps to be application scenes, and adopts a fast gradient descent method (FGSM) to attack the DQN algorithm to generate countermeasure samples.

The technical conception of the invention is as follows: according to a traffic intersection signal lamp control model trained by the existing reinforcement learning DQN algorithm, discrete processing is carried out on the countermeasure disturbance by using FGSM attack and combining with the gradient value to generate a countermeasure sample, the final disturbance state obtained by combining the countermeasure disturbance and the original state is input into an intelligent model, and finally the effect of detecting the fluency or the congestion degree of a single-cross intersection is achieved on sumo.

The invention has the following beneficial effects: and generating corresponding opposite disturbance to the maximum gradient value by using an FGSM attack algorithm, wherein the generated disturbance is a discrete value, combining the opposite disturbance and the original traffic flow to form a disturbance state, adding disturbance limitation to the disturbance quantity of the disturbance state, and obtaining the output which is the disturbance state. The invention can limit the disturbance under the condition that the output disturbance has physical significance, thereby efficiently generating a confrontation state, increasing the queuing length and waiting time of the intersection, greatly reducing the performance of the model and greatly reducing the traffic intersection circulation.

Drawings

Fig. 1 is a schematic diagram of reinforcement learning.

Fig. 2 is a general flow diagram of FGSM generation against perturbation.

Fig. 3 is a schematic view of a single intersection.

Fig. 4 is a discrete state of the vehicle position.

FIG. 5 is a comparison graph of single intersection vehicle waiting queue lengths.

FIG. 6 is a comparison graph of vehicle waiting times at a single intersection.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 1 to 6, a traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent includes the following steps:

step 1: reinforcement learning is an algorithm that interacts constantly with the environment, as shown in fig. 1. The reinforcement learning algorithm contains three most basic elements: environmental status, agent actions, environmental rewards. Take a typical crossroad as an example. Firstly, training a reinforcement learning intelligent agent model on a road grid of a single intersection, and carrying out discrete coding on traffic states of all roads entering the single intersection. Dividing a single intersection into c discrete units equidistantly from a road k (k is 1,2,3,4) with the length of l between a road entrance and a stop line, and representing the vehicle position of the road k of the single intersection at the time t as a vehicle position matrix s_k(t) when the vehicle head is on a discrete cell, then the vehicle position matrix s_k(t) has a value of 0.5 for the ith (i ═ 1,2, …, c) position, otherwise the value is-0.5, and the formula is:

wherein

Matrix s representing the position of the vehicle_k(t) value of ith position, matrix s of vehicle positions at four intersection input ends at t moment_k(t) splicing according to line head and tail to form s_tThe formula is expressed as:

s_t＝[s₁(t),s₂(t),s₃(t),s₄(t)] (2)

then handle s_tAs environmental conditions, into agent models, and the agent outputs the corresponding action, i.e. the phase to be executed by the traffic light (e.g. south)North green light or east-west green light).

A typical crossroad will be taken as an example for explanation. We define the phase of the traffic light as the motion space a ═ a₁,a₂,a₃,a₄In which a is₁Green light in east-west direction, a₂Turning green light to the left in east and west directions, a₃Green light in the north-south direction, a₄Turning to green light from south to north. In operation set a_iThe initial duration of the yellow phase is m and the yellow phase duration is n. At the time t, the current state s is set_tInputting the phase into an intelligent traffic light model, and selecting the phase a by the intelligent traffic light_i(i ═ 1,2,3,4) when a_iAfter the phase is executed, the intelligent traffic light collects the state s at the moment of t +1 from the environment_t+1Then selects phase a_j(j is 1,2,3, 4). If a_i≠a_jThen a_iThe phase execution time is not lengthened any more, i.e. a_iEnd of phase at a_iAfter the phase is finished, the intelligent traffic light executes the yellow light phase, and after the yellow light phase is finished, executes a_jA phase; if a_i＝a_jThen a is_iPhase execution time is prolonged by m; will award r_tSet as the difference in the waiting time of the crossing vehicle between two consecutive actions, the formula is given as:

r_t＝W_t-W_t+1 (3)

wherein W_t，W_t+1The waiting time for entering all lanes of the single intersection at the time t and the time t +1 are respectively. And judging the action according to the executed action and the environment reward, thereby continuously updating the parameters of the network. The reinforcement learning model used was: deep Q Network (DQN). The structure comprises a convolution layer and a full connection layer; the parameters include the convolution kernel size, the number of full connectivity layer neurons. A deep neural network is used as a Q value network, network parameters are initialized, the output of the network is the Q value, a Relu nonlinear activation function is adopted by a hidden layer, and the number of neurons of an output layer is equal to the size of an action space of a single intersection. The formula is expressed as:

Q＝h(ws_t+b) (4)

where w represents the weight of the neural network, s_tIs the input to the network, b is the bias, and h (.) represents the Relu activation function. The loss function of DQN is:

L_t＝(y_t-Q(s_t,a_i；θ′))² (6)

wherein y is_tRepresents a target value, a_i,a_jE A represents the traffic light phase r which is the action of the intelligent output_tRepresenting the reward at the moment T, gamma is a learning rate, theta and theta 'represent parameters w and b of a target network and parameters w and b' of an estimation network in the DQN respectively, the parameters of the estimation network are updated gradually along with a time step, the parameters of the target network are updated by directly copying the parameters of the network from the estimation network at intervals of time T, and the formula is as follows:

step 2: the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection are obtained at the traffic intersections, namely the number and the positions of the vehicles are input into the model, and corresponding traffic lights, namely output actions, are generated. Using FGSM attack algorithm to attack the input of each time one by one to obtain corresponding anti-disturbance; the process is as follows:

2.1: obtaining an input value s of an input model at time t_tWherein s is_tRepresenting the number of vehicles at the input end of the single intersection and the positions of the vehicles at the input end of the single intersection, which are obtained from sumo at the moment t;

2.2: input original state s_tSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent model_m(m is 1,2,3,4) namelyThe optimal traffic light phase is expressed as:

where θ represents a parameter of the trained intelligent agent model network, a_mIndicating the action of the output, i.e. the phase the traffic light is to perform.

2.3: using FGSM attack algorithm, assigning values along gradient direction according to sign function to generate corresponding anti-disturbance eta at t moment_tThe formula is expressed as:

where ε represents the perturbation coefficient, s_tRepresenting the input value, i.e. the position of the vehicle, a_mRepresenting the optimum phase to be carried out by the traffic light at that time, sign representing a sign function, L_t(θ,s_t,a_m) Representing the loss function of the model at time t.

And step 3: the state taken is a discrete value since it is the number of vehicles and their positions. Therefore, the counterdisturbance η is processed to obtain a disturbance value with actual physical significance. The process is as follows:

3.1：

wherein c is the number of discrete units divided by the input end of the traffic intersection,

representing the confrontation state of the ith discrete cell at time t. Calculating the resistance disturbance eta of t moment_tThen, taking the absolute value of the disturbance at the time t and finding the maximum value thereof

And minimum value

And ordering eta according to the size sequence to obtain a new ordering array

Finally pass through

And discretizing the disturbance to make the disturbance have practical physical significance.

3.2: at eta_tReading the disturbance in sequence and comparing the disturbance with original data, and if the original state is inconsistent with the counterdisturbance, assigning the corresponding disturbance to the corresponding original state; if the original state is consistent with the counterdisturbance, then get eta again_t' the next countermeasure disturbance is assigned in the manner described above until the selected disturbance is valid, resulting in a disturbance state s_t′。

And 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and the disturbance amount mu is measured at the moment t_tInputting the disturbance state into the model when the disturbance state is less than or equal to delta (delta is a disturbance limit); when the disturbance amount mu_tInputting the original state into the model when the value is less than or equal to delta;

calculating the disturbance quantity mu added by the disturbance state at the time t_tThe formula is expressed as:

where len (.) denotes the calculation s_tAnd s_tThe number of' middle vehicle state is 0.5, when the disturbance amount mu_tWhen the value is less than or equal to delta, the state of disturbance s is detected_t' input into the agent model, otherwise the original state s_tInput into the agent model.

And 5: and testing the performance of the generated anti-disturbance, and after the state is input into the model, the intelligent agent can select the phase of the traffic signal lamp according to the current state to control the traffic flow of the single intersection. Finally, comparing the fluency of the traffic intersection by the traffic light phases obtained by the traffic flows in different input states on sumo;

the process of the step 5 is as follows:

5.1: original state s at each time_tModel input into the model will select the optimal action (traffic light phase)

Controlling the traffic flow at the intersection and calculating the waiting time difference (reward r) of the traffic intersection_t＝W_t-W_t+1)。

5.2: to the final disturbance s after adding effective disturbance_t' calculating the disturbance quantity mu_tTo meet the requirement (mu)_tδ) input of input state into the agent model and output action

I.e. the traffic light phase, when the difference in waiting time (reward r) at the traffic crossing is calculated as well_t＝W_t-W_t+1)。

Example (c): the data in the actual experiment are as follows:

(1) selecting experimental data

The experimental data are 100 cars randomly generated by a single intersection on sumo, and the size of each car, the distance from the generation position to the intersection and the speed of the car from generation to passing through the intersection are all the same. The initial time of traffic light phase at the traffic intersection is 10 seconds for green light and 4 seconds for yellow light. A road k (k 1,2,3,4) of 700 meters in length starting from the stop line is divided into 100 discrete units of 7 meters in length. Original state s collected at input end of traffic intersection_tThe number and the positions of the vehicles at the input end of the single intersection are recorded. The perturbation limit δ is 20%.

(2) Results of the experiment

In result analysis, a single intersection is used as an experimental scene, a reinforcement learning Deep Q Network (DQN) intelligent model is trained, a fast gradient descent method (FGSM) is adopted, discretization processing is carried out on disturbance to generate counterdisturbance, the number of vehicles at the input end of the intersection and the positions thereof are changed to cause the phase of the traffic lights to be changed, the comparison experiment is carried out under the two conditions of attack and non-attack, and the experimental results are shown in fig. 5 and fig. 6 (when the attack is continuously carried out, the traffic light phase cannot well ensure the circulation of the single intersection vehicles, so that the vehicles are accumulated at the intersection.

The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims

1. A traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent is characterized by comprising the following steps:

step 1: training an reinforcement learning intelligent agent model on a single intersection road grid, wherein network parameters of the model do not change after training, the model has high mobility, and high fluency and no congestion are realized in a single intersection testing process;

step 2: acquiring the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection, namely inputting the number and the positions of the current vehicles into a model, generating corresponding traffic signal lamps, namely output actions, and attacking the input at each moment one by utilizing an FGSM (fuzzy basis function) attack algorithm to obtain corresponding counterdisturbance;

and step 3: discretizing the generated countermeasure disturbance, and combining the generated countermeasure disturbance with the originally acquired traffic flow to obtain a final disturbance state, namely the number and the positions of the vehicles at the traffic intersection input into the model at the moment;

and 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and when the disturbance amount is smaller than the disturbance limit, the disturbance state is input into the model; inputting the original state into the model when the disturbance amount is larger than the disturbance limit;

and 5: testing the performance of the generated anti-disturbance, inputting the state into a model, selecting the phase of a traffic signal lamp according to the current state by an intelligent agent to control the traffic flow of a single intersection, and finally comparing the fluency of traffic intersections corresponding to the traffic lamp phase obtained by the traffic flow of different input states on sumo;

in the step 1, the single intersection is a crossroad, firstly, a reinforcement learning intelligent agent model is trained on a road grid of the single intersection, the traffic states of all roads entering the single intersection are discretely coded, a road k with the length of l from a road section entrance to a stop line of the single intersection is equidistantly divided into c discrete units, k is 1,2,3 and 4, and the vehicle position of the road k of the single intersection at the time t is represented as a vehicle position matrix s_k(t) when the vehicle head is on a discrete cell, then the vehicle position matrix s_k(t) the value corresponding to the ith position is 0.5, i-1, 2, …, c, otherwise the value is-0.5, and the formula is:

wherein

Matrix s representing the position of the vehicle_k(t) value of ith position, matrix s of vehicle positions of four intersection input ends at time t_k(t) splicing end to end in lines to form s_tThe formula is expressed as:

s_t＝[s₁(t),s₂(t),s₃(t),s₄(t)] (2)

then handle s_tInputting the state of the environment into an intelligent agent model for training, and outputting a corresponding action, namely a phase to be executed by a traffic light by the intelligent agent;

defining the phase of traffic light as motion space a ═ a₁,a₂,a₃,a₄In which a is₁Green light in east-west direction, a₂Turning green light to the left in east and west directions, a₃Green light in the north-south direction, a₄Turning green light to the left in south and north, setting a during operation_iThe initial time length of the phase is m, the phase time length of the yellow lamp is n, and the current state s is compared with the current state s at the moment t_tInputting the phase into an intelligent traffic light model, and selecting the phase a by the intelligent traffic light_iWhen i is 1,2,3,4, a_iAfter the phase is executed, the intelligent traffic light collects the state s at the t +1 moment from the environment_t+1Then select phase a_jJ is 1,2,3,4, if a_i≠a_jThen a_iThe phase execution time is not lengthened any more, i.e. a_iEnd of phase at a_iAfter the phase is finished, the intelligent traffic light executes the yellow light phase, and after the yellow light phase is finished, executes a_jA phase; if a_i＝a_jThen a is_iPhase execution time is prolonged by m; will award r_tSet as the difference in the waiting time of the crossing vehicle between two consecutive actions, the formula is expressed as:

r_t＝W_t-W_t+1 (3)

wherein W_t，W_t+1Respectively waiting time of entering all lanes of the single intersection at the time t and the time t +1, judging the action according to the executed action and environment reward, and continuously updating the parameters of the network, wherein the used reinforcement learning model is DQN, and the structure comprises a convolution layer and a full connection layer; the parameters comprise the size of a convolution kernel and the number of neurons of a full connection layer, a deep neural network is used as a Q value network, network parameters are initialized, the output of the network is the Q value, a Relu nonlinear activation function is adopted by a hidden layer, the number of the neurons of the output layer is equal to the size of an action space of a single intersection, and the formula is expressed as follows:

Q＝h(ws_t+b) (4)

where w represents the weight of the neural network, s_tFor the input of the network, b is the bias, h (.) represents the Relu activation function, the loss function of DQN is：

L_t＝(y_t-Q(s_t,a_i；θ′))² (6)

the process of the step 2 is as follows:

2.2: input original state s_tSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent model_mThat is, the optimal traffic light phase at this time, m is 1,2,3,4, and is expressed as:

where θ represents a parameter of the trained agent model network, a_mIndicating the movement of the outputI.e. the phase the traffic light is going to execute;

where ε represents the perturbation coefficient, s_tRepresenting the input value, i.e. the position of the vehicle, a_mRepresenting the optimum phase to be carried out by the traffic light at that time, sign representing a sign function, L_t(θ,s_t,a_m) A loss function representing the model at time t;

the process of the step 3 is as follows:

3.1：

representing the antagonistic disturbance of the ith discrete unit at the t moment, and calculating the antagonistic disturbance eta at the t moment_tThen, taking the absolute value of the disturbance at the time t and finding the maximum value thereof

And minimum value

And pair eta in order of magnitude_tSorting to obtain a new sorting array

Finally pass through

i is 1,2, …, c discretizes the perturbation,making it of practical physical significance;

3.2: at eta_tReading the disturbance in sequence and comparing the disturbance with original data, and if the original state is inconsistent with the counterdisturbance, assigning the corresponding disturbance to the corresponding original state; if the original state is consistent with the counterdisturbance, then get eta again_t' the next countermeasure disturbance is assigned in the manner described above until the selected disturbance is valid, resulting in a disturbance state s_t′；

The process of the step 4 is as follows: calculating the disturbance quantity mu added by the disturbance state at the moment t_tThe formula is expressed as:

where len (.) denotes the calculation s_tAnd s_tThe number of' middle vehicle state is 0.5, when the disturbance amount mu_tWhen the value is less than or equal to delta, the state of disturbance s is detected_t' input into the agent model, otherwise the original state s_tInputting into an agent model;

the process of the step 5 is as follows:

5.1: original state s at each time_tModel input into the model selects the optimal action

Controlling the traffic flow at the intersection and calculating the waiting time difference of the traffic intersection, i.e. the reward r_t＝W_t-W_t+1；

5.2: for the final disturbance s after adding effective disturbance_t' calculating the disturbance quantity mu_tTo satisfy the requirement mu_tInputting input state less than or equal to delta into intelligent agent model and outputting action

I.e. the traffic light phase, and the waiting time difference of the traffic intersection is calculated and r is awarded_t＝W_t-W_t+1。