CN113487889B - Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method


Info

Publication number
CN113487889B
Authority
CN
China
Prior art keywords
disturbance
traffic
phase
model
intersection
Prior art date
Legal status
Active
Application number
CN202110813579.3A
Other languages
Chinese (zh)
Other versions
CN113487889A (en)
Inventor
徐东伟 (Xu Dongwei)
王达 (Wang Da)
李呈斌 (Li Chengbin)
周磊 (Zhou Lei)
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110813579.3A
Publication of CN113487889A
Application granted
Publication of CN113487889B
Legal status: Active

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 - Traffic control systems for road vehicles
    • G08G 1/07 - Controlling traffic signals
    • G08G 1/08 - Controlling traffic signals according to detected number or speed of vehicles

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

A traffic-state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method. Based on a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial samples are generated with an FGSM-based attack, the adversarial perturbation is discretized in combination with the gradient values, the final perturbed state obtained by combining the adversarial perturbation with the original state is input into the agent model, and finally the effect of checking the fluency or congestion degree of the single intersection is achieved in sumo. The invention can limit the perturbation while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states, increasing the queue length and waiting time at the intersection, greatly reducing the performance of the model, and greatly reducing the throughput of the traffic intersection.

Description

Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method
Technical Field
The invention belongs to the intersection of intelligent transportation and machine-learning information security, and relates to a traffic-state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method (FGSM).
Background
Traffic congestion has become an urgent challenge for urban traffic, and one of the most critical considerations in designing a modern city is developing an intelligent traffic management system. The main goal of a traffic management system is to reduce traffic congestion, which has become one of the major problems of large cities today. Efficient urban traffic management saves time and money and reduces carbon-dioxide emissions into the atmosphere.
Reinforcement Learning (RL), as a machine-learning technique, has produced impressive results on the traffic signal control problem. Reinforcement learning does not require a full prior understanding of the environment, such as the traffic flow; instead, the agent acquires knowledge and models the environment dynamics by interacting with the environment. After each action is executed in the environment, the agent receives a scalar reward. The reward depends on the quality of the action taken, and the goal of the agent is to learn the optimal control strategy, so that by repeatedly interacting with the environment the discounted cumulative reward is maximised. Deep Reinforcement Learning (DRL) has numerous real-world applications due to its excellent ability to adapt quickly to the surrounding environment. Despite its great advantages, DRL is vulnerable to adversarial attacks such as the enchanting attack, the strategically-timed attack, value-function-based adversarial attacks, Trojan attacks, and the like.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a traffic-state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, which adds a small perturbation to the number and positions of vehicles while ensuring that the perturbation has actual physical significance, thereby efficiently generating adversarial perturbations and greatly reducing the performance of the model and the smoothness of the traffic intersection.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent comprises the following steps:
step 1: training a reinforcement learning Deep Q Network (DQN) intelligent model on a single intersection road grid, wherein Network parameters of the model do not change after training, the model has high mobility, and high fluency and no congestion are embodied in a single intersection testing process;
step 2: acquiring the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection, namely inputting the number and the positions of the vehicles into a model, generating corresponding traffic signal lamps, namely output actions, and attacking the input at each moment one by utilizing a FGSM (highway fault warning message) attack algorithm to obtain corresponding counterdisturbance;
and 3, step 3: discretizing the generated countermeasure disturbance, and combining the generated countermeasure disturbance with the originally acquired traffic flow to obtain a final disturbance state, namely the number and the positions of the vehicles at the traffic intersection input into the model at the moment;
and 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and when the disturbance amount is smaller than the disturbance limit, the disturbance state is input into the model; inputting the original state into the model when the disturbance quantity is larger than the disturbance limit;
and 5: and finally, comparing fluency of traffic intersections by traffic light phases obtained by traffic flows in different input states on sumo.
As a research hotspot in the field of artificial intelligence, Deep Reinforcement Learning (DRL) has achieved notable success in various fields such as robot control, computer vision, and intelligent transportation. Meanwhile, whether it can be attacked and whether it is robust have also been hot topics in recent years. The method therefore selects the representative Deep Q Network (DQN) algorithm in deep reinforcement learning, takes single-intersection signal-light control as the application scenario, and adopts the fast gradient sign method (FGSM) to attack the DQN algorithm and generate adversarial samples.
The technical conception of the invention is as follows: based on a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial samples are generated with the FGSM attack, the adversarial perturbation is discretized in combination with the gradient values, the final perturbed state obtained by combining the adversarial perturbation with the original state is input into the agent model, and finally the effect of checking the fluency or congestion degree of the single intersection is achieved in sumo.
The invention has the following beneficial effects. The FGSM attack algorithm is used to generate the adversarial perturbation corresponding to the largest gradient values, and the generated perturbation is a discrete value; the adversarial perturbation and the original traffic flow are combined into the perturbed state, a perturbation limit is imposed on the perturbation amount of the perturbed state, and the output obtained is the perturbed state. The invention can limit the perturbation while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states, increasing the queue length and waiting time at the intersection, greatly reducing the performance of the model, and greatly reducing the throughput of the traffic intersection.
Drawings
Fig. 1 is a schematic diagram of reinforcement learning.
Fig. 2 is a general flow diagram of FGSM generation against perturbation.
Fig. 3 is a schematic view of a single intersection.
Fig. 4 shows the discretized state of the vehicle positions.
FIG. 5 is a comparison graph of single intersection vehicle waiting queue lengths.
FIG. 6 is a comparison graph of vehicle waiting times at a single intersection.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 6, a traffic-state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method includes the following steps:
step 1: reinforcement learning is an algorithm that interacts constantly with the environment, as shown in fig. 1. The reinforcement learning algorithm contains three most basic elements: environmental status, agent actions, environmental rewards. Take a typical crossroad as an example. Firstly, training a reinforcement learning intelligent agent model on a road grid of a single intersection, and carrying out discrete coding on traffic states of all roads entering the single intersection. Dividing a single intersection into c discrete units equidistantly from a road k (k is 1,2,3,4) with the length of l between a road entrance and a stop line, and representing the vehicle position of the road k of the single intersection at the time t as a vehicle position matrix sk(t) when the vehicle head is on a discrete cell, then the vehicle position matrix sk(t) has a value of 0.5 for the ith (i ═ 1,2, …, c) position, otherwise the value is-0.5, and the formula is:
Figure BDA0003169108510000041
wherein
Figure BDA0003169108510000042
Matrix s representing the position of the vehiclek(t) value of ith position, matrix s of vehicle positions at four intersection input ends at t momentk(t) splicing according to line head and tail to form stThe formula is expressed as:
st=[s1(t),s2(t),s3(t),s4(t)] (2)
then handle stAs environmental conditions, into agent models, and the agent outputs the corresponding action, i.e. the phase to be executed by the traffic light (e.g. south)North green light or east-west green light).
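For illustration only, a minimal Python sketch of the state encoding of equations (1)-(2) is given below; the road length, cell count, and helper names are assumptions and not part of the patent:

```python
import numpy as np

def encode_road(head_positions_m, road_length_m=700.0, n_cells=100):
    """Equation (1): one entry per discrete cell, 0.5 where a vehicle
    head occupies the cell and -0.5 elsewhere."""
    cell_len = road_length_m / n_cells
    s_k = np.full(n_cells, -0.5)
    for pos in head_positions_m:  # distance of each vehicle head from the entrance
        i = min(int(pos // cell_len), n_cells - 1)
        s_k[i] = 0.5
    return s_k

def encode_state(per_road_head_positions):
    """Equation (2): concatenate the four approach vectors head to tail."""
    return np.concatenate([encode_road(p) for p in per_road_head_positions])

# s_t for a toy snapshot: one vehicle on road 1, two vehicles on road 3
s_t = encode_state([[12.0], [], [3.5, 250.0], []])
```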
A typical crossroad is taken as an example for explanation. We define the traffic-light phases as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green, a_2 the east-west left-turn green, a_3 the north-south green, and a_4 the north-south left-turn green. In operation, each phase a_i has an initial duration m, and the yellow phase has duration n. At time t, the current state s_t is input into the intelligent traffic-light model, and the intelligent traffic light selects a phase a_i (i = 1, 2, 3, 4). After a_i has been executed, the intelligent traffic light collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4). If a_i ≠ a_j, the execution time of a_i is not extended, i.e. phase a_i ends; after a_i ends, the intelligent traffic light executes the yellow phase and, after the yellow phase ends, executes phase a_j. If a_i = a_j, the execution time of a_i is extended by m. The reward r_t is set as the difference in vehicle waiting time at the intersection between two consecutive actions:

r_t = W_t - W_{t+1} (3)

where W_t and W_{t+1} are the waiting times on all entry lanes of the single intersection at times t and t+1, respectively. The action is judged according to the executed action and the environment reward, so the network parameters are continuously updated. The reinforcement-learning model used is the Deep Q Network (DQN). Its structure comprises a convolution layer and a fully connected layer; its parameters include the convolution-kernel size and the number of fully connected neurons. A deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layer uses the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection. The formula is expressed as:
Q = h(w·s_t + b) (4)
where w represents the weights of the neural network, s_t is the network input, b is the bias, and h(.) denotes the ReLU activation function. The loss function of DQN is:
y_t = r_t + γ·max_{a_j} Q(s_{t+1}, a_j; θ) (5)

L_t = (y_t - Q(s_t, a_i; θ′))² (6)

where y_t represents the target value, a_i, a_j ∈ A are the actions output by the agent, i.e. traffic-light phases, r_t represents the reward at time t, and γ is the discount factor. θ and θ′ denote the parameters w, b of the target network and the parameters w′, b′ of the estimation network in the DQN, respectively. The parameters of the estimation network are updated step by step by gradient descent with learning rate α, while the parameters of the target network are updated every T time steps by directly copying the parameters of the estimation network:

θ′ ← θ′ - α·∇_{θ′} L_t (7)

θ ← θ′ every T time steps (8)
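As a concrete illustration, a compact PyTorch sketch of the Q-network of equation (4) and the DQN update of equations (5)-(8) follows; the layer sizes, kernel width, optimizer, and discount factor are assumptions rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

N_CELLS, N_ROADS, N_ACTIONS = 100, 4, 4  # assumed sizes

class QNet(nn.Module):
    """One convolution layer plus one fully connected layer; ReLU hidden
    activation; output size equals the intersection's action space (eq. 4)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=5, padding=2)
        self.fc = nn.Linear(8 * N_CELLS * N_ROADS, N_ACTIONS)

    def forward(self, s):                  # s: (batch, N_ROADS * N_CELLS)
        h = torch.relu(self.conv(s.unsqueeze(1)))
        return self.fc(h.flatten(1))       # Q-values, one per phase

estimation, target = QNet(), QNet()              # theta' and theta
target.load_state_dict(estimation.state_dict())  # eq. (8): periodic copy
opt = torch.optim.Adam(estimation.parameters(), lr=1e-3)

def dqn_loss(s, a, r, s_next, gamma=0.9):
    """Equations (5)-(6): squared TD error against the target network."""
    with torch.no_grad():
        y = r + gamma * target(s_next).max(dim=1).values   # eq. (5)
    q = estimation(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((y - q) ** 2).mean()                           # eq. (6)
```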
step 2: the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection are obtained at the traffic intersections, namely the number and the positions of the vehicles are input into the model, and corresponding traffic lights, namely output actions, are generated. Using FGSM attack algorithm to attack the input of each time one by one to obtain corresponding anti-disturbance; the process is as follows:
2.1: obtaining an input value s of an input model at time ttWherein s istRepresenting the number of vehicles at the input end of the single intersection and the positions of the vehicles at the input end of the single intersection, which are obtained from sumo at the moment t;
2.2: input original state stSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent modelm(m is 1,2,3,4) namelyThe optimal traffic light phase is expressed as:
Figure BDA0003169108510000063
where θ represents a parameter of the trained intelligent agent model network, amIndicating the action of the output, i.e. the phase the traffic light is to perform.
2.3: using FGSM attack algorithm, assigning values along gradient direction according to sign function to generate corresponding anti-disturbance eta at t momenttThe formula is expressed as:
Figure BDA0003169108510000064
where ε represents the perturbation coefficient, stRepresenting the input value, i.e. the position of the vehicle, amRepresenting the optimum phase to be carried out by the traffic light at that time, sign representing a sign function, Lt(θ,st,am) Representing the loss function of the model at time t.
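A minimal sketch of the FGSM step of equation (10) might look as follows; the concrete attacker loss used here (cross-entropy of the Q-values against the greedy phase) is an assumed stand-in for the patent's L_t:

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, s_t, epsilon=0.1):
    """Equation (10): eta_t = epsilon * sign(grad_{s_t} L_t(theta, s_t, a_m)).
    The cross-entropy loss against the greedy phase a_m (eq. 9) is an
    assumption; the patent only names L_t(theta, s_t, a_m)."""
    s = s_t.clone().detach().requires_grad_(True)
    q = model(s.unsqueeze(0))                 # Q-values of the four phases
    a_m = q.argmax(dim=1)                     # eq. (9): optimal phase
    loss = F.cross_entropy(q, a_m)
    loss.backward()
    return epsilon * s.grad.detach().sign()   # eta_t
```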
And step 3: the state taken is a discrete value since it is the number of vehicles and their positions. Therefore, the counterdisturbance η is processed to obtain a disturbance value with actual physical significance. The process is as follows:
3.1: Let η_t^i denote the adversarial perturbation of the i-th discrete cell at time t (i = 1, 2, ..., c), where c is the number of discrete cells into which each approach of the intersection is divided. After the adversarial perturbation η_t at time t is calculated, the absolute values of the perturbation at time t are taken and their maximum |η_t|max and minimum |η_t|min are found; η_t is then sorted by magnitude to obtain a new sorted array η_t′. Finally, each component of the sorted perturbation is discretized so that it takes values consistent with the ±0.5 state encoding, giving the perturbation actual physical significance.
3.2: at etatReading the disturbance in sequence and comparing the disturbance with original data, and if the original state is inconsistent with the counterdisturbance, assigning the corresponding disturbance to the corresponding original state; if the original state is consistent with the counterdisturbance, then get eta againt' the next countermeasure disturbance is assigned in the manner described above until the selected disturbance is valid, resulting in a disturbance state st′。
And 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and the disturbance amount mu is measured at the moment ttInputting the disturbance state into the model when the disturbance state is less than or equal to delta (delta is a disturbance limit); when the disturbance amount mutInputting the original state into the model when the value is less than or equal to delta;
calculating the disturbance quantity mu added by the disturbance state at the time ttThe formula is expressed as:
Figure BDA0003169108510000077
where len (.) denotes the calculation stAnd stThe number of' middle vehicle state is 0.5, when the disturbance amount mutWhen the value is less than or equal to delta, the state of disturbance s is detectedt' input into the agent model, otherwise the original state stInput into the agent model.
And 5: and testing the performance of the generated anti-disturbance, and after the state is input into the model, the intelligent agent can select the phase of the traffic signal lamp according to the current state to control the traffic flow of the single intersection. Finally, comparing the fluency of the traffic intersection by the traffic light phases obtained by the traffic flows in different input states on sumo;
the process of the step 5 is as follows:
5.1: original state s at each timetModel input into the model will select the optimal action (traffic light phase)
Figure BDA0003169108510000081
Controlling the traffic flow at the intersection and calculating the waiting time difference (reward r) of the traffic intersectiont=Wt-Wt+1)。
5.2: to the final disturbance s after adding effective disturbancet' calculating the disturbance quantity mutTo meet the requirement (mu)tδ) input of input state into the agent model and output action
Figure BDA0003169108510000082
I.e. the traffic light phase, when the difference in waiting time (reward r) at the traffic crossing is calculated as wellt=Wt-Wt+1)。
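Putting the pieces together, an evaluation loop in the spirit of steps 5.1-5.2 might look as follows, reusing the helpers sketched above; `env` is an assumed wrapper around the sumo simulation (exposing reset, observe, apply_phase, and the total waiting time W_t) and is not part of the patent:

```python
def evaluate(model, env, attack=False, delta=0.2, epsilon=0.1, steps=1000):
    """Roll out the agent with or without the attack and accumulate the
    reward r_t = W_t - W_{t+1}; a lower accumulated reward under attack
    means longer queues and waiting times at the intersection."""
    env.reset()
    total_reward = 0.0
    w_prev = env.total_waiting_time()                  # W_t
    for _ in range(steps):
        s_t = env.observe()                            # vehicle-position state
        if attack:
            eta_t = fgsm_perturbation(model, s_t, epsilon)
            s_adv = apply_discrete_perturbation(s_t, eta_t)
            s_in = choose_input(s_t, s_adv, delta)     # step 4 budget check
        else:
            s_in = s_t
        phase = model(s_in.unsqueeze(0)).argmax(dim=1).item()
        env.apply_phase(phase)                         # advance the simulation
        w_now = env.total_waiting_time()               # W_{t+1}
        total_reward += w_prev - w_now                 # r_t = W_t - W_{t+1}
        w_prev = w_now
    return total_reward
```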
Example: the data in the actual experiment are as follows:
(1) selecting experimental data
The experimental data are 100 vehicles randomly generated at a single intersection in sumo; every vehicle has the same size, the same distance from its generation position to the intersection, and the same speed from generation until it passes through the intersection. The initial traffic-light timing at the intersection is 10 seconds of green and 4 seconds of yellow. Each road k (k = 1, 2, 3, 4), 700 meters long starting from the stop line, is divided into 100 discrete cells of 7 meters each. The original state s_t collected at the approaches records the number and positions of the vehicles at the inputs of the single intersection. The perturbation limit δ is 20%.
(2) Results of the experiment
In the result analysis, a single intersection is used as the experimental scene, a reinforcement-learning Deep Q Network (DQN) agent model is trained, and the fast gradient sign method (FGSM) with discretization of the perturbation is adopted to generate adversarial perturbations; changing the number and positions of vehicles at the approaches of the intersection causes the traffic-light phase to change. A comparison experiment is carried out under the two conditions of attack and no attack, and the experimental results are shown in fig. 5 and fig. 6 (under continuous attack, the traffic-light phases can no longer keep the vehicles at the single intersection flowing, so vehicles accumulate at the intersection).
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims (1)

1. A traffic-state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, characterized by comprising the following steps:
Step 1: training a reinforcement-learning agent model on a single-intersection road grid, wherein the network parameters of the model do not change after training; the model transfers well and exhibits high fluency and no congestion during testing on the single intersection;
Step 2: acquiring the number of vehicles at each approach of the intersection and their positions, i.e. inputting the current number and positions of the vehicles into the model, and generating the corresponding traffic signal, i.e. the output action; the input at each time step is attacked one by one with the FGSM (fast gradient sign method) attack algorithm to obtain the corresponding adversarial perturbation;
Step 3: discretizing the generated adversarial perturbation and combining it with the originally collected traffic flow to obtain the final perturbed state, i.e. the number and positions of the vehicles at the traffic intersection that are input into the model at that time;
Step 4: limiting the perturbation magnitude of the currently constructed perturbed state: when the perturbation amount is smaller than the perturbation limit, the perturbed state is input into the model; when the perturbation amount is larger than the perturbation limit, the original state is input into the model;
Step 5: testing the performance of the generated adversarial perturbation: after the state is input into the model, the agent selects the traffic-signal phase according to the current state to control the traffic flow of the single intersection, and finally the fluency of the traffic intersection under the traffic-light phases obtained from the traffic flows of the different input states is compared in sumo;
in the step 1, the single intersection is a crossroad, firstly, a reinforcement learning intelligent agent model is trained on a road grid of the single intersection, the traffic states of all roads entering the single intersection are discretely coded, a road k with the length of l from a road section entrance to a stop line of the single intersection is equidistantly divided into c discrete units, k is 1,2,3 and 4, and the vehicle position of the road k of the single intersection at the time t is represented as a vehicle position matrix sk(t) when the vehicle head is on a discrete cell, then the vehicle position matrix sk(t) the value corresponding to the ith position is 0.5, i-1, 2, …, c, otherwise the value is-0.5, and the formula is:
Figure FDA0003539094900000021
wherein
Figure FDA0003539094900000022
Matrix s representing the position of the vehiclek(t) value of ith position, matrix s of vehicle positions of four intersection input ends at time tk(t) splicing end to end in lines to form stThe formula is expressed as:
st=[s1(t),s2(t),s3(t),s4(t)] (2)
then handle stInputting the state of the environment into an intelligent agent model for training, and outputting a corresponding action, namely a phase to be executed by a traffic light by the intelligent agent;
the traffic-light phases are defined as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green, a_2 the east-west left-turn green, a_3 the north-south green, and a_4 the north-south left-turn green; each phase a_i has an initial duration m, and the yellow phase has duration n; at time t the current state s_t is input into the intelligent traffic-light model, which selects a phase a_i (i = 1, 2, 3, 4); after a_i has been executed, the intelligent traffic light collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4); if a_i ≠ a_j, the execution time of a_i is not extended, i.e. phase a_i ends; after a_i ends, the intelligent traffic light executes the yellow phase and, after the yellow phase ends, executes phase a_j; if a_i = a_j, the execution time of a_i is extended by m; the reward r_t is set as the difference in vehicle waiting time at the intersection between two consecutive actions:

r_t = W_t - W_{t+1} (3)

where W_t and W_{t+1} are the waiting times on all entry lanes of the single intersection at times t and t+1, respectively; the action is judged according to the executed action and the environment reward, so the network parameters are continuously updated; the reinforcement-learning model used is DQN, whose structure comprises a convolution layer and a fully connected layer and whose parameters include the convolution-kernel size and the number of fully connected neurons; a deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layer uses the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection:

Q = h(w·s_t + b) (4)

where w represents the weights of the neural network, s_t is the network input, b is the bias, and h(.) denotes the ReLU activation function; the loss function of DQN is:

y_t = r_t + γ·max_{a_j} Q(s_{t+1}, a_j; θ) (5)

L_t = (y_t - Q(s_t, a_i; θ′))² (6)

where y_t represents the target value, a_i, a_j ∈ A are the actions output by the agent, i.e. traffic-light phases, r_t represents the reward at time t, and γ is the discount factor; θ and θ′ denote the parameters w, b of the target network and the parameters w′, b′ of the estimation network in the DQN, respectively; the parameters of the estimation network are updated step by step by gradient descent with learning rate α, while the parameters of the target network are updated every T time steps by directly copying the parameters of the estimation network:

θ′ ← θ′ - α·∇_{θ′} L_t (7)

θ ← θ′ every T time steps (8)
the process of step 2 is as follows:

2.1: obtaining the input value s_t of the model at time t, where s_t represents the number of vehicles at the approaches of the single intersection and their positions, obtained from sumo at time t;

2.2: inputting the original state s_t, the trained DQN agent model selects the action a_m (m = 1, 2, 3, 4) with the largest action-value function Q, i.e. the optimal traffic-light phase at this time:

a_m = argmax_a Q(s_t, a; θ) (9)

where θ represents the parameters of the trained agent network and a_m denotes the output action, i.e. the phase the traffic light is to execute;

2.3: using the FGSM attack algorithm, values are assigned along the gradient direction according to the sign function to generate the corresponding adversarial perturbation η_t at time t:

η_t = ε·sign(∇_{s_t} L_t(θ, s_t, a_m)) (10)

where ε represents the perturbation coefficient, s_t represents the input value, i.e. the vehicle positions, a_m represents the optimal phase the traffic light would execute at that time, sign(.) denotes the sign function, and L_t(θ, s_t, a_m) represents the loss function of the model at time t;
the process of step 3 is as follows:

3.1: let η_t^i denote the adversarial perturbation of the i-th discrete cell at time t (i = 1, 2, ..., c), where c is the number of discrete cells into which each approach of the intersection is divided; after the adversarial perturbation η_t at time t is calculated, the absolute values of the perturbation at time t are taken and their maximum |η_t|max and minimum |η_t|min are found; η_t is then sorted by magnitude to obtain a new sorted array η_t′; finally, each component of the sorted perturbation (i = 1, 2, ..., c) is discretized so that it takes values consistent with the ±0.5 state encoding, giving the perturbation actual physical significance;

3.2: the perturbations are read from η_t′ in order and compared with the original data; if the original state is inconsistent with the adversarial perturbation, the corresponding perturbation is assigned to the corresponding original state; if the original state is consistent with the adversarial perturbation, the next adversarial perturbation is taken from η_t′ and assigned in the same manner, until the selected perturbation is valid, yielding the perturbed state s_t′;
the process of step 4 is as follows: the perturbation amount μ_t added by the perturbed state at time t is calculated as:

μ_t = |len(s_t′) - len(s_t)| / len(s_t)

where len(.) counts the entries of s_t and s_t′ whose vehicle state is 0.5; when the perturbation amount μ_t ≤ δ, the perturbed state s_t′ is input into the agent model, otherwise the original state s_t is input into the agent model;
the process of step 5 is as follows:

5.1: at each time step, the original state s_t is input into the model, which selects the optimal action a_m to control the traffic flow at the intersection, and the waiting-time difference of the traffic intersection, i.e. the reward r_t = W_t - W_{t+1}, is calculated;

5.2: for the final perturbed state s_t′ obtained after adding a valid perturbation, the perturbation amount μ_t is calculated; the input state that meets the requirement μ_t ≤ δ is input into the agent model and the output action a_m′, i.e. the traffic-light phase, is obtained, and the waiting-time difference of the traffic intersection, the reward r_t = W_t - W_{t+1}, is likewise calculated.
CN202110813579.3A 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method Active CN113487889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813579.3A CN113487889B (en) 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813579.3A CN113487889B (en) 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method

Publications (2)

Publication Number Publication Date
CN113487889A CN113487889A (en) 2021-10-08
CN113487889B true CN113487889B (en) 2022-06-17

Family

ID=77941350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813579.3A Active CN113487889B (en) 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method

Country Status (1)

Country Link
CN (1) CN113487889B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023082205A1 (en) * 2021-11-12 2023-05-19 华为技术有限公司 Method for evaluating reinforcement learning agent, and related apparatus
CN115830887B (en) * 2023-02-14 2023-05-12 武汉智安交通科技有限公司 Self-adaptive traffic signal control method, system and readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726134B2 (en) * 2018-08-14 2020-07-28 Intel Corporation Techniques to detect perturbation attacks with an actor-critic framework
US20220004709A1 (en) * 2018-11-14 2022-01-06 North Carolina State University Deep neural network with compositional grammatical architectures
WO2021061266A1 (en) * 2019-09-24 2021-04-01 Hrl Laboratories, Llc A deep reinforcement learning based method for surreptitiously generating signals to fool a recurrent neural network
CN111243299B (en) * 2020-01-20 2020-12-15 浙江工业大学 Single cross port signal control method based on 3 DQN-PSER algorithm
CN111260937B (en) * 2020-02-24 2021-09-14 武汉大学深圳研究院 Cross traffic signal lamp control method based on reinforcement learning
CN111461226A (en) * 2020-04-01 2020-07-28 深圳前海微众银行股份有限公司 Countermeasure sample generation method, device, terminal and readable storage medium
CN112232434B (en) * 2020-10-29 2024-02-20 浙江工业大学 Correlation analysis-based anti-attack cooperative defense method and device

Also Published As

Publication number Publication date
CN113487889A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN112215337B (en) Vehicle track prediction method based on environment attention neural network model
CN110647839B (en) Method and device for generating automatic driving strategy and computer readable storage medium
CN113487889B (en) Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method
CN109959388B (en) Intelligent traffic refined path planning method based on grid expansion model
JP6916552B2 (en) A method and device for detecting a driving scenario that occurs during driving and providing information for evaluating a driver's driving habits.
CN103593535A (en) Urban traffic complex self-adaptive network parallel simulation system and method based on multi-scale integration
CN112508164B (en) End-to-end automatic driving model pre-training method based on asynchronous supervised learning
CN112784485B (en) Automatic driving key scene generation method based on reinforcement learning
CN114358128A (en) Method for training end-to-end automatic driving strategy
CN115188204B (en) Highway lane-level variable speed limit control method under abnormal weather condition
Alam et al. Intellegent traffic light control system for isolated intersection using fuzzy logic
CN114120670A (en) Method and system for traffic signal control
Shi et al. Efficient Lane-changing Behavior Planning via Reinforcement Learning with Imitation Learning Initialization
CN111488674B (en) Plane intersection vehicle running track simulation method
CN116448134B (en) Vehicle path planning method and device based on risk field and uncertain analysis
CN116758768A (en) Dynamic regulation and control method for traffic lights of full crossroad
CN115426149A (en) Single intersection signal lamp control traffic state anti-disturbance generation method based on Jacobian saliency map
CN113487870B (en) Anti-disturbance generation method for intelligent single intersection based on CW (continuous wave) attack
WO2021258847A1 (en) Driving decision-making method, device, and chip
CN115031753A (en) Driving condition local path planning method based on safety potential field and DQN algorithm
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
Wen et al. Modeling human driver behaviors when following autonomous vehicles: An inverse reinforcement learning approach
CN114701517A (en) Multi-target complex traffic scene automatic driving solution based on reinforcement learning
Tran et al. Revisiting pixel-based traffic signal controls using reinforcement learning with world models
CN113628455A (en) Intersection signal optimization control method considering number of people in vehicle under Internet of vehicles environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant