CN115426149A

CN115426149A - Method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map

Info

Publication number: CN115426149A
Application number: CN202211040566.8A
Authority: CN
Inventors: 徐东伟; 刘沛文; 王达; 李呈斌
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2022-08-29
Filing date: 2022-08-29
Publication date: 2022-12-02

Abstract

The invention discloses a method, based on the Jacobian saliency map, for generating adversarial traffic-state perturbations against single-intersection traffic signal control. Starting from a traffic signal control model trained with the reinforcement learning algorithm DQN, the method uses the forward-derivative Jacobian matrix and the saliency map of the JSMA attack to craft adversarial examples, inputs the adversarial examples that satisfy the perturbation limit into the agent model, and finally analyzes the traffic state of the single-intersection road section in SUMO to verify the effect of the adversarial attack. By modifying only a small part of the original state, the invention can strongly influence the signal action that is finally output, thereby efficiently affecting traffic on the single-intersection road section and degrading the performance of the model.

Description

Method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map

Technical Field

The invention belongs to the interdisciplinary technical field of intelligent transportation and deep reinforcement learning, and in particular relates to a method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map.

Background

As urbanization accelerates and cities develop rapidly, traffic conditions have become an increasingly important factor in modern urban systems. Faced with ever more serious congestion, there is hope that an intelligent transportation system can achieve efficient traffic management through a higher level of automation, thereby saving travel time, saving energy, and reducing traffic risk.

An intelligent transportation system places strict requirements on automatic control, so adaptive control of the system is realized with artificial intelligence techniques based on learned models. Reinforcement learning (RL) is a branch of machine learning that differs from traditional supervised and unsupervised learning; its main characteristic is learning through interaction. The agent in a reinforcement learning model interacts with its surrounding environment and receives feedback each time it outputs an action. Through a preset reward mechanism, the agent evaluates the actions it has taken and gradually learns about the environment, so as to master the action policy that obtains the maximum reward, which is the goal of the reinforcement learning agent. Deep reinforcement learning (DRL), which combines RL with deep neural networks, currently has great application potential because of its strong decision-making and perception capabilities. For example, DRL is expected to become a new solution to the control optimization problem of traffic signals. At the same time, however, DRL has been shown to be vulnerable to adversarial perturbations, which may introduce a variety of unexpected hazards.

As DRL has become a focus of artificial intelligence research and is applied ever more widely in fields such as images, games, and unmanned systems, its robustness to attacks has also received more attention. By proposing threat models and possible attack methods, researchers can build more complete defense mechanisms and thus improve the resistance of DRL models to attack. The invention adopts the typical Deep Q Network (DQN) algorithm, takes the control of a single-intersection traffic signal as the application scenario, and carries out an attack based on the Jacobian saliency map method (JSMA) to generate adversarial examples.

Disclosure of Invention

To extend the prior art, the invention provides a method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map. The method can form an effective adversarial example while adding only a small amount of perturbation, so that the output action of the signal changes markedly and both the performance of the model and the traffic flow on the single-intersection road section are strongly affected.

The technical scheme adopted by the invention is as follows:

A method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map comprises the following steps:

step 1, training an agent model on a simulated single-intersection road section and keeping the parameters w and b of the trained DQN network fixed, so that the model has a certain transferability; the initially trained agent should show good traffic fluency on the simulated road section, and this fluency serves as the baseline for comparison with the fluency after the adversarial attack is applied;

step 2, acquiring the road state at each approach of the single intersection, namely the number and positions of vehicles on each road, as the model input; the model gives the corresponding action output, namely the signal phase; perturbation is then added based on the JSMA attack algorithm to generate an adversarial example;

step 3, calculating the magnitude of the perturbation; if the perturbation is within the limited range, the adversarial state obtained in the previous step is input into the model, otherwise the original state is input;

step 4, after the perturbed input, the model outputs the corresponding signal action to control traffic at the single intersection; the effect of the adversarial attack can be analyzed by comparing the waiting times of vehicles passing through the intersection.

Further, the process of step 1 is as follows:

first, a reinforcement learning agent model is trained on a single-intersection road in SUMO;

second, the traffic states of all roads in the environment are discretized: let the distance from the road entrance to the stop line be $l$, and divide lane $k$ ($k = 1, 2, 3, 4$) on the road equally into $c$ units; the vehicle positions on lane $k$ at time $t$ are represented as a matrix $s_k(t)$: when the head of a vehicle lies in a discrete unit, the value of $s_k(t)$ at the corresponding position $i$ ($i = 1, 2, \ldots, c$) is 0.5, otherwise it is -0.5; the $s_k(t)$ of the four approaches are arranged by row to obtain the original environment state $s_t$ that is input into the model;
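The discretization described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the convention that head positions are measured in meters from the road entrance, and the example numbers (four lanes of 20 cells, giving 80 entries in total) are all assumptions.

```python
import numpy as np

def discretize_lane(head_positions, lane_length, num_cells):
    """Encode one lane as a vector of length num_cells.

    A cell is 0.5 if the head of some vehicle lies in it, otherwise -0.5.
    head_positions are distances from the road entrance (assumed convention).
    """
    cell_length = lane_length / num_cells
    lane = np.full(num_cells, -0.5)
    for x in head_positions:
        if 0.0 <= x < lane_length:
            # Clamp guards against floating-point rounding at the far end.
            idx = min(int(x // cell_length), num_cells - 1)
            lane[idx] = 0.5
    return lane

def build_state(lanes_head_positions, lane_length, num_cells):
    """Stack the per-lane vectors by row to form the state s_t."""
    return np.stack([
        discretize_lane(positions, lane_length, num_cells)
        for positions in lanes_head_positions
    ])

# Hypothetical example: four lanes of 140 m with 20 cells each (7 m per cell).
s_t = build_state([[3.0, 20.0], [], [139.9], [7.0]],
                  lane_length=140.0, num_cells=20)
```

Each row of `s_t` corresponds to one lane, so flattening the array gives the feature vector that the later steps index as $s_{ti}$.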

For the agent model, the environment state describing the traffic condition is input and a specific traffic signal action is obtained; the signal phases form the action space of the agent, $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south through green, $a_2$ is the north-south left-turn green, $a_3$ is the east-west through green, and $a_4$ is the east-west left-turn green; the initial green duration of phase $a_i$ is set to $m$ and the yellow duration to $n$; the current state $s_t$ is input into the model and the agent outputs the corresponding $a_i$ ($i = 1, 2, 3, 4$) as its action; after $a_i$ has run for its duration, the agent collects the next state $s_{t+1}$ from the environment and outputs phase $a_j$ ($j = 1, 2, 3, 4$); if $a_i \neq a_j$, a yellow phase of duration $n$ is executed after the green phase of $a_i$ ends, and phase $a_j$ is then executed; otherwise the execution time of $a_i$ is extended by $m$; the reinforcement learning reward $r_t$ is set to the difference between the total waiting times of vehicles at the intersection for two consecutive actions, given by the following formula:

$$r_t = W_t - W_{t+1} \tag{1}$$
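The phase-switching rule and the reward in formula (1) can be sketched as follows. The function names and the list-of-tuples schedule are hypothetical, and the sketch assumes that a newly selected phase runs for the initial green duration m.

```python
def next_segments(a_i, a_j, m, n):
    """Return the (phase, duration) segments run after the agent picks a_j.

    If the phase changes, a yellow of length n precedes the new green
    (assumed to last m); otherwise the current green is extended by m.
    """
    if a_i != a_j:
        return [("yellow", n), (a_j, m)]
    return [(a_i, m)]

def reward(w_t, w_t_plus_1):
    """Formula (1): r_t = W_t - W_{t+1}; positive when total waiting falls."""
    return w_t - w_t_plus_1
```

For example, `next_segments(1, 2, 10, 4)` yields a 4-second yellow followed by 10 seconds of phase 2, while `next_segments(1, 1, 10, 4)` simply extends phase 1 by 10 seconds.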

In formula (1), $W_t$ and $W_{t+1}$ denote the total waiting time of all vehicles entering the single intersection at times $t$ and $t+1$, respectively; DQN is used as the reinforcement learning model, and the output of the initialized neural network is the Q value; the hidden layers of the deep neural network use ReLU as the activation function, and the number of output neurons is set equal to the size of the action space of the traffic signal; the formula is as follows:

$$Q = h(w s_t + b) \tag{2}$$

In formula (2), $w$ denotes the weights of the neural network, $s_t$ is the network input at time $t$, $b$ is the bias, and $h(\cdot)$ denotes the ReLU activation function. The loss function of DQN is expressed as:

$$L_t = \left(y_t - Q(s_t, a_i; \theta')\right)^2 \tag{4}$$
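Formulas (2) and (4) can be illustrated with a small NumPy sketch. The target $y_t$ is not defined in the text above, so the sketch assumes the standard DQN target $y_t = r_t + \gamma \max_a Q(s_{t+1}, a; \theta)$, in which $\gamma$ acts as a discount factor; the single-layer network, the function names, and the argument layout are likewise assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def q_values(s, w, b):
    """Formula (2): Q = h(w s + b), one Q value per signal phase."""
    return relu(w @ s + b)

def dqn_loss(s_t, a_i, r_t, s_next, est, target, gamma):
    """Formula (4): squared error between y_t and Q(s_t, a_i; theta')."""
    w_est, b_est = est        # theta': estimation network
    w_tgt, b_tgt = target     # theta: target network
    # Assumed standard DQN target; it is not stated in the source text.
    y_t = r_t + gamma * np.max(q_values(s_next, w_tgt, b_tgt))
    return (y_t - q_values(s_t, w_est, b_est)[a_i]) ** 2
```

Keeping the target parameters separate from the estimation parameters mirrors the two-network arrangement the text describes.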

Here, $\gamma$ is the learning rate, and $\theta$ and $\theta'$ denote the parameters $w, b$ of the target network and $w', b'$ of the estimation network, respectively; as the reinforcement learning agent is trained, the parameters of the target network are updated on a time-step schedule: every interval $T$ they are copied directly from the estimation network to the target network, and the formula is as follows:
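The periodic copy described above can be written compactly. The notation below is an assumption based on that description, with $\theta$ the target-network parameters and $\theta'$ the estimation-network parameters:

```latex
\theta \leftarrow \theta' \quad \text{every } T \text{ time steps}
```

Between copies the target network stays fixed, so the target $y_t$ in the loss changes only at these intervals.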

Further, the process of step 2 is as follows:

2.1: The original environment state $s_t$ is obtained and input into the trained DQN agent model, and the action $a_m$ ($m = 1, 2, 3, 4$) that maximizes the Q value is selected; this is the optimal signal action at that moment, and the formula is as follows:

where $\theta$ denotes the parameters $w, b$ of the trained agent and $a_m$ denotes the next action of the traffic signal.
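The selection rule in step 2.1 corresponds to a greedy choice over the Q values. Assuming the usual notation $Q(s_t, a; \theta)$ for the trained network's output for action $a$, it can be written as:

```latex
a_m = \arg\max_{a \in A} Q(s_t, a; \theta)
```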

2.2: Based on the JSMA attack algorithm, the Jacobian matrix of the neural network output with respect to the input is computed along the gradient direction and used to express the saliency map $X$ of the input state $s_t$, which describes which information in the input state has the greatest influence on the output; for input $s_{ti}$ ($i = 1, 2, 3, \ldots, 80$), the saliency map $X$ is given by the following formula:

where the derivative term denotes the forward derivative of the neural network output with respect to the input $s_t$; the input feature $s_{ti}$ that maximizes the saliency map $X$ is selected and a modification of +1 is applied to it, giving a perturbed state; modification of the input state stops once the action produced by the perturbed state differs from the optimal action $a_m$.
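The loop in step 2.2 can be sketched in NumPy as follows. Several parts are assumptions rather than the patent's method: the two-layer ReLU network, the saliency rule (a JSMA-style rule adapted to push the output away from $a_m$, since the patent's own saliency formula is not reproduced above), the restriction to empty cells so that adding 1 turns -0.5 into 0.5, and the hypothetical `max_changes` cap.

```python
import numpy as np

def q_values(s, W1, b1, W2, b2):
    """Two-layer network (assumed): hidden = relu(W1 s + b1), Q = W2 hidden + b2."""
    hidden = np.maximum(0.0, W1 @ s + b1)
    return W2 @ hidden + b2

def jacobian(s, W1, b1, W2):
    """Forward derivative dQ/ds with shape (n_actions, n_features)."""
    active = (W1 @ s + b1 > 0).astype(float)   # ReLU derivative per hidden unit
    return W2 @ (active[:, None] * W1)

def jsma_perturb(s, params, max_changes):
    """Raise one empty cell at a time until the greedy action changes."""
    W1, b1, W2, b2 = params
    a_m = int(np.argmax(q_values(s, W1, b1, W2, b2)))
    s_adv = s.copy()
    for _ in range(max_changes):
        J = jacobian(s_adv, W1, b1, W2)
        d_best = J[a_m]                     # effect of each feature on Q(a_m)
        d_other = J.sum(axis=0) - d_best    # combined effect on the other actions
        # Assumed saliency rule: keep features that lower Q(a_m) and raise the rest.
        saliency = np.where((d_best < 0) & (d_other > 0), -d_best * d_other, 0.0)
        saliency[s_adv > 0] = 0.0           # only empty cells (-0.5) are raised
        i = int(np.argmax(saliency))
        if saliency[i] <= 0:
            break                           # no useful feature left
        s_adv[i] += 1.0                     # -0.5 + 1 = 0.5: a phantom vehicle
        if int(np.argmax(q_values(s_adv, W1, b1, W2, b2))) != a_m:
            return s_adv, True
    return s_adv, False
```

The function returns the perturbed state together with a flag indicating whether the action actually changed, so the caller can still apply the perturbation-size check of step 3 before using it.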

Further, the process of step 3 is as follows:

The perturbation $\mu_t$ measures the change between the perturbed state $s_t'$ and the original state $s_t$ at time $t$; it is evaluated against the limit to decide whether the perturbed state produced by the attack is input; the magnitude of the perturbation $\mu_t$ at time $t$ is calculated by the following formula:

where $\mathrm{len}(\cdot)$ counts the number of entries equal to 0.5 in the vehicle state set; when $\mu_t \le \delta$, the perturbed state $s_t'$ is input into the model, otherwise the original state $s_t$ is input into the model.
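The threshold test of step 3 can be sketched as follows. The formula for $\mu_t$ is not reproduced in the text above, so this sketch assumes the relative change in the number of 0.5 entries, $|\mathrm{len}(s_t') - \mathrm{len}(s_t)| / \mathrm{len}(s_t)$; the function names and the handling of an empty intersection are also assumptions.

```python
import numpy as np

def count_vehicles(state):
    """len(.): number of entries equal to 0.5 in the state."""
    return int(np.sum(state == 0.5))

def perturbation_size(s_adv, s_orig):
    """Assumed definition of mu_t: relative change in the vehicle count."""
    n_orig = count_vehicles(s_orig)
    if n_orig == 0:
        return float("inf")   # assumption: never perturb an empty intersection
    return abs(count_vehicles(s_adv) - n_orig) / n_orig

def select_input(s_adv, s_orig, delta):
    """Feed the perturbed state only when mu_t <= delta, else the original."""
    return s_adv if perturbation_size(s_adv, s_orig) <= delta else s_orig
```

With ten vehicles in the original state and a threshold of 0.2, one added phantom vehicle would pass the check under this assumed definition, while three would not.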

Further, the process of step 4 is as follows:

The perturbed state $s_t'$ whose perturbation magnitude satisfies the limit is input into the model to obtain the adversarial action, and the difference between the waiting times of vehicles on the single-intersection road section at the previous moment and the current moment is calculated to obtain the reward $r_t$.

The technical concept of the invention is as follows: starting from a traffic signal control model trained with the reinforcement learning algorithm DQN, adversarial examples are crafted using the forward-derivative Jacobian matrix and the saliency map of the JSMA attack, the adversarial examples that satisfy the limit are input into the agent model, and finally the traffic condition of the single-intersection road section is analyzed in SUMO to test the effect of the adversarial attack.

Compared with the prior art, the invention has the following beneficial effects: using the saliency map idea of the JSMA algorithm, the part of the input state that most influences the output is found and modified within a given perturbation size limit to obtain an adversarial example; by modifying only a small part of the original state, the method can strongly influence the signal action that is finally output, thereby efficiently affecting traffic on the single-intersection road section and degrading the performance of the model.

Drawings

Fig. 1 is a flow chart of JSMA-based adversarial perturbation generation.

Fig. 2 is a schematic diagram of the single-intersection scenario.

Fig. 3 is a schematic diagram of the discrete states of vehicle positions.

Fig. 4 is a comparison of vehicle waiting queue lengths at the single intersection.

Fig. 5 is a comparison of vehicle waiting times at the single intersection.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The invention will be described in detail below with reference to the accompanying figures 1-5 in conjunction with exemplary embodiments.

Example 1

Step 1: First, a reinforcement learning agent model is trained on a single-intersection road in SUMO. The traffic states of all roads in the environment are discretized: let the distance from the road entrance to the stop line be $l$, and divide lane $k$ ($k = 1, 2, 3, 4$) on the road equally into $c$ units. The vehicle positions on lane $k$ at time $t$ are represented as a matrix $s_k(t)$: when the head of a vehicle lies in a discrete unit, the value of $s_k(t)$ at the corresponding position $i$ ($i = 1, 2, \ldots, c$) is 0.5, otherwise it is -0.5. The $s_k(t)$ of the four approaches are arranged by row to obtain the original environment state $s_t$ that is input into the model.

For the agent model of the invention, the environment state describing the traffic condition is input and a specific traffic signal action is obtained. The signal phases form the action space of the agent, $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south through green, $a_2$ is the north-south left-turn green, $a_3$ is the east-west through green, and $a_4$ is the east-west left-turn green. The initial green duration of phase $a_i$ is set to $m$ and the yellow duration to $n$. The current state $s_t$ is input into the model and the agent outputs the corresponding $a_i$ ($i = 1, 2, 3, 4$) as its action; after $a_i$ has run for its duration, the agent collects the next state $s_{t+1}$ from the environment and outputs phase $a_j$ ($j = 1, 2, 3, 4$). If $a_i \neq a_j$, a yellow phase of duration $n$ is executed after the green phase of $a_i$ ends, and phase $a_j$ is then executed; otherwise the execution time of $a_i$ is extended by $m$. The reinforcement learning reward $r_t$ is set to the difference between the total waiting times of vehicles at the intersection for two consecutive actions, given by the following formula:

$$r_t = W_t - W_{t+1} \tag{1}$$

where $W_t$ and $W_{t+1}$ denote the total waiting time of all vehicles entering the single intersection at times $t$ and $t+1$, respectively. DQN is used as the reinforcement learning model, and the output of the initialized neural network is the Q value. The hidden layers of the deep neural network use ReLU as the activation function, and the number of output neurons is set equal to the size of the action space of the traffic signal. The formulas are as follows:

$$Q = h(w s_t + b) \tag{2}$$

$$L_t = \left(y_t - Q(s_t, a_i; \theta')\right)^2 \tag{4}$$

where $\gamma$ is the learning rate, and $\theta$ and $\theta'$ denote the parameters $w, b$ of the target network and $w', b'$ of the estimation network, respectively. As the reinforcement learning agent is trained, the parameters of the target network are updated on a time-step schedule: every interval $T$ they are copied directly from the estimation network to the target network, and the formula is as follows:

Step 2: The positions and number of vehicles on the single-intersection road section are acquired as state information and input into the model to obtain the corresponding traffic signal action. The input features to be modified are found with the JSMA algorithm, and an adversarial example is crafted from them. The specific process is as follows:

2.1: The original environment state $s_t$ is obtained and input into the trained DQN agent model, and the action $a_m$ ($m = 1, 2, 3, 4$) that maximizes the Q value is selected; this is the optimal signal action at that moment, and the formula is as follows:

2.2: Based on the JSMA attack algorithm, the Jacobian matrix of the neural network output with respect to the input is computed along the gradient direction and used to express the saliency map $X$ of the input state $s_t$, which describes which information in the input state has the greatest influence on the output. For input $s_{ti}$ ($i = 1, 2, 3, \ldots, 80$), the saliency map $X$ is given by the following formula:

where the derivative term denotes the forward derivative of the neural network output with respect to the input $s_t$. The input feature $s_{ti}$ that maximizes the saliency map $X$ is selected and a modification of +1 is applied to it, giving a perturbed state. Modification of the input state stops once the action produced by the perturbed state differs from the optimal action $a_m$.

Step 3: The perturbation $\mu_t$ measures the change between the perturbed state $s_t'$ and the original state $s_t$ at time $t$; it is evaluated against the limit to decide whether the perturbed state produced by the attack is input. The magnitude of the perturbation $\mu_t$ at time $t$ is calculated by the following formula:

Step 4: The perturbed state $s_t'$ whose perturbation magnitude satisfies the limit is input into the model to obtain the adversarial action, and the difference between the waiting times of vehicles on the single-intersection road section at the previous moment and the current moment is calculated to obtain the reward $r_t$.

Example 2: data in actual experiments

(1) Selecting experimental data

The experimental data come from 100 randomly generated vehicles at a single intersection in SUMO; the body length, initial position, and travel speed are the same for every vehicle.

(2) Parameter determination

The green phase of the traffic signal initially lasts 10 seconds and the yellow phase lasts 4 seconds. Each road $k$ ($k = 1, 2, 3, 4$) is 700 meters long from the road entrance to the stop line and is divided into 100 units of 7 meters each. The learning rate of the neural network in the agent model is 0.001. The perturbation threshold is $\delta = 20\%$.
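For reference, the experimental parameters above can be collected in a small configuration object. The class and field names are hypothetical; only the values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    green_seconds: int = 10        # initial green duration m
    yellow_seconds: int = 4        # yellow duration n
    lane_length_m: float = 700.0   # road entrance to stop line
    cells_per_lane: int = 100      # discrete units per road
    learning_rate: float = 0.001   # neural network learning rate
    delta: float = 0.20            # perturbation threshold

    @property
    def cell_length_m(self) -> float:
        """Length of one discrete unit in meters."""
        return self.lane_length_m / self.cells_per_lane

config = ExperimentConfig()
```

The derived `cell_length_m` reproduces the 7-meter unit length stated above, which gives a quick consistency check on the lane length and unit count.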

(3) Results of the experiment

Experiments were carried out on a single-intersection road section in SUMO. Based on the trained DQN reinforcement learning agent model, the Jacobian saliency map (JSMA) based method for generating adversarial traffic-state perturbations for single-intersection signal control was used to add perturbations to the vehicle number and position states input into the model at each moment, thereby changing the action output of the traffic signal. Comparative experiments were run with and without the attack, and the results are shown in Fig. 4 and Fig. 5. Under continuous attack, the traffic signal becomes less effective at guiding vehicles through the intersection, so congestion grows increasingly severe. Once congestion reaches a certain level, however, a small perturbation is no longer enough to keep influencing the traffic condition; the perturbation then exceeds the set threshold, and the state input into the system reverts to the original state.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map, comprising the following steps:

step 1, training an agent model on a simulated single-intersection road section and keeping the parameters w and b of the trained DQN network fixed, so that the model has a certain transferability; the initially trained agent should show good traffic fluency on the simulated road section, and this fluency serves as the baseline for comparison with the fluency after the adversarial attack is applied;

2. The method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map as claimed in claim 1, wherein the process of step 1 is as follows:

second, the traffic states of all roads in the environment are discretized: let the distance from the road entrance to the stop line be $l$, and divide lane $k$ ($k = 1, 2, 3, 4$) on the road equally into $c$ units; the vehicle positions on lane $k$ at time $t$ are represented as a matrix $s_k(t)$: when the head of a vehicle lies in a discrete unit, the value of $s_k(t)$ at the corresponding position $i$ ($i = 1, 2, \ldots, c$) is 0.5, otherwise it is -0.5; the $s_k(t)$ of the four approaches are arranged by row to obtain the original environment state $s_t$ that is input into the model;

For the agent model, the environment state describing the traffic condition is input and a specific traffic signal action is obtained; the signal phases form the action space of the agent, $A = \{a_1, a_2, a_3, a_4\}$, where $a_1$ is the north-south through green, $a_2$ is the north-south left-turn green, $a_3$ is the east-west through green, and $a_4$ is the east-west left-turn green; the initial green duration of phase $a_i$ is set to $m$ and the yellow duration to $n$; the current state $s_t$ is input into the model and the agent outputs the corresponding $a_i$ ($i = 1, 2, 3, 4$) as its action; after $a_i$ has run for its duration, the agent collects the next state $s_{t+1}$ from the environment and outputs phase $a_j$ ($j = 1, 2, 3, 4$); if $a_i \neq a_j$, a yellow phase of duration $n$ is executed after the green phase of $a_i$ ends, and phase $a_j$ is then executed; otherwise the execution time of $a_i$ is extended by $m$; the reinforcement learning reward $r_t$ is set to the difference between the total waiting times of vehicles at the intersection for two consecutive actions, given by the following formula:

$$r_t = W_t - W_{t+1} \tag{1}$$

where $W_t$ and $W_{t+1}$ denote the total waiting time of all vehicles entering the single intersection at times $t$ and $t+1$, respectively; DQN is used as the reinforcement learning model, and the output of the initialized neural network is the Q value; the hidden layers of the deep neural network use ReLU as the activation function, and the number of output neurons is set equal to the size of the action space of the traffic signal; the formulas are as follows:

$$Q = h(w s_t + b) \tag{2}$$

$$L_t = \left(y_t - Q(s_t, a_i; \theta')\right)^2 \tag{4}$$

where $\gamma$ is the learning rate, and $\theta$ and $\theta'$ denote the parameters $w, b$ of the target network and $w', b'$ of the estimation network, respectively; as the reinforcement learning agent is trained, the parameters of the target network are updated on a time-step schedule: every interval $T$ they are copied directly from the estimation network to the target network, and the formula is as follows:

3. The method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map as claimed in claim 1, wherein the process of step 2 is as follows:

where $\theta$ denotes the parameters $w, b$ of the trained agent and $a_m$ denotes the next action of the traffic signal.

2.2: Based on the JSMA attack algorithm, the Jacobian matrix of the neural network output with respect to the input is computed along the gradient direction and used to express the saliency map $X$ of the input state $s_t$, which describes which information in the input state has the greatest influence on the output; for input $s_{ti}$ ($i = 1, 2, 3, \ldots, 80$), the saliency map $X$ is given by the following formula:

wherein

4. The method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map as claimed in claim 1, wherein the process of step 3 is as follows:

the perturbation $\mu_t$ measures the change between the perturbed state $s_t'$ and the original state $s_t$ at time $t$; it is evaluated against the limit to decide whether the perturbed state produced by the attack is input; the magnitude of the perturbation $\mu_t$ at time $t$ is calculated by the following formula:

5. The method for generating adversarial traffic-state perturbations for single-intersection traffic signal control based on the Jacobian saliency map as claimed in claim 1, wherein the process of step 4 is as follows:

the perturbed state $s_t'$ whose perturbation magnitude satisfies the limit is input into the model to obtain the adversarial action, and the difference between the waiting times of vehicles on the single-intersection road section at the previous moment and the current moment is calculated to obtain the reward $r_t$.