CN113487889A - Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method


Info

Publication number
CN113487889A
Authority
CN
China
Prior art keywords
disturbance
traffic
state
phase
intersection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110813579.3A
Other languages
Chinese (zh)
Other versions
CN113487889B (en)
Inventor
徐东伟
王达
李呈斌
周磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority: CN202110813579.3A
Publication of CN113487889A
Application granted
Publication of CN113487889B
Legal status: Active

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/07Controlling traffic signals
    • G08G1/08Controlling traffic signals according to detected number or speed of vehicles

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

A traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method is characterized in that, starting from a traffic-intersection signal-light control model trained with the existing reinforcement learning DQN algorithm, adversarial samples are generated by applying an FGSM attack using the gradient values and discretizing the resulting adversarial perturbation; the final disturbed state, obtained by combining the adversarial perturbation with the original state, is input into the agent model, and the resulting smoothness or congestion of the single intersection is finally examined in SUMO. The invention can limit the perturbation while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states, increasing the queue length and waiting time at the intersection, greatly degrading the performance of the model, and greatly reducing the throughput of the traffic intersection.

Description

Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method
Technical Field
The invention belongs to the intersection of intelligent transportation and machine-learning information security, and relates to a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method (FGSM).
Background
The problem of traffic congestion has become an urgent challenge for urban traffic, and when a modern city is designed, one of the most critical considerations is the development of an intelligent traffic management system. The main goal of a traffic management system is to reduce traffic congestion, which has become one of the major problems of large cities today. Efficient urban traffic management saves time and money and reduces carbon dioxide emissions into the atmosphere.
Reinforcement Learning (RL), as a machine learning technique for the traffic signal control problem, has produced impressive results. Reinforcement learning does not require prior full understanding of the environment, such as the traffic flow. Instead, the agent acquires knowledge and models the environment dynamics by interacting with the environment. After each action is executed in the environment, the agent receives a scalar reward. The reward earned depends on the action taken, and the goal of the agent is to learn the best control strategy, so that by repeatedly interacting with the environment the discounted cumulative reward can be maximized. Deep Reinforcement Learning (DRL) has numerous applications in the real world due to its excellent ability to adapt quickly to the surrounding environment. Although DRL has great advantages, it is vulnerable to adversarial attacks such as enchanting attacks, strategically-timed attacks, value-function-based adversarial attacks, and Trojan attacks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, which adds a small perturbation to the number and positions of the vehicles while ensuring that the perturbation has actual physical meaning, thereby efficiently generating the adversarial perturbation and greatly degrading the performance of the model and the smoothness of the traffic intersection.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent comprises the following steps:
step 1: training a reinforcement learning Deep Q Network (DQN) intelligent model on a single intersection road grid, wherein Network parameters of the model do not change after training, the model has high mobility, and high fluency and no congestion are embodied in a single intersection testing process;
step 2: acquiring the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection, namely inputting the number and the positions of the vehicles into a model, generating corresponding traffic signal lamps, namely output actions, and attacking the input at each moment one by utilizing a FGSM (highway fault warning message) attack algorithm to obtain corresponding counterdisturbance;
and step 3: discretizing the generated countermeasure disturbance, and combining the generated countermeasure disturbance with the originally acquired traffic flow to obtain a final disturbance state, namely the number and the positions of the vehicles at the traffic intersection input into the model at the moment;
and 4, step 4: in the currently constructed disturbance state, the disturbance magnitude is limited, and when the disturbance amount is smaller than the disturbance limit, the disturbance state is input into the model; inputting the original state into the model when the disturbance amount is larger than the disturbance limit;
and 5: and finally, comparing fluency of traffic intersections by traffic light phases obtained by traffic flows in different input states on sumo.
As a research hotspot in the field of artificial intelligence, Deep Reinforcement Learning (DRL) has achieved notable success in fields such as robot control, computer vision, and intelligent transportation. At the same time, its vulnerability to attack and the strength of its robustness have become hot topics in recent years. This method therefore selects the representative Deep Q Network (DQN) algorithm in deep reinforcement learning, takes single-intersection signal-light control as the application scenario, and attacks the DQN algorithm with the fast gradient sign method (FGSM) to generate adversarial samples.
The technical concept of the invention is as follows: starting from a traffic-intersection signal-light control model trained with the existing reinforcement learning DQN algorithm, an FGSM attack using the gradient values is applied and the adversarial perturbation is discretized to generate adversarial samples; the adversarial perturbation is combined with the original state to obtain the final disturbed state, which is input into the agent model, and the resulting smoothness or congestion of the single intersection is finally examined in SUMO.
The invention has the following beneficial effects: the FGSM attack algorithm generates the corresponding adversarial perturbation along the direction of greatest gradient, and the generated perturbation is discretized; the adversarial perturbation and the original traffic flow are combined into a disturbed state, a disturbance limit is applied to the disturbance amount of that state, and the output is the disturbed state. The invention can limit the perturbation while keeping the output perturbation physically meaningful, thereby efficiently generating adversarial states, increasing the queue length and waiting time at the intersection, greatly degrading the performance of the model, and greatly reducing the throughput of the traffic intersection.
Drawings
Fig. 1 is a schematic diagram of reinforcement learning.
Fig. 2 is a general flow diagram of FGSM adversarial perturbation generation.
Fig. 3 is a schematic view of a single intersection.
Fig. 4 shows the discretized vehicle-position state.
FIG. 5 is a comparison graph of single intersection vehicle waiting queue lengths.
FIG. 6 is a comparison graph of vehicle waiting times at a single intersection.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 to Fig. 6, a traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method includes the following steps:
step 1: reinforcement learning is an algorithm that interacts with the environment continuously, as shown in fig. 1. The reinforcement learning algorithm contains three most basic elements: environmental status, agent actions, environmental rewards. Take a typical crossroad as an example. Firstly, training a reinforcement learning intelligent agent model on a road grid of a single intersection, and carrying out discrete coding on traffic states of all roads entering the single intersection. Equally dividing a single intersection into c discrete units from a road k (k is 1,2,3 and 4) with the length of l between a road section entrance and a stop line, and expressing the vehicle position of the road k of the single intersection at the time t as a vehicle position momentArray sk(t) when the vehicle head is on a discrete cell, then the vehicle position matrix sk(t) has a value of 0.5 for the ith (i ═ 1,2, …, c) position, otherwise the value is-0.5, and the formula is:
Figure BDA0003169108510000041
wherein
Figure BDA0003169108510000042
Representing a vehicle position matrix sk(t) value of ith position, matrix s of vehicle positions of four intersection input ends at time tk(t) splicing according to line head and tail to form stThe formula is expressed as:
st=[s1(t),s2(t),s3(t),s4(t)] (2)
then handle stThe state of the environment is input into the intelligent agent model for training, and the intelligent agent outputs corresponding actions, namely phases (such as south-north green light or east-west green light) to be executed by the traffic light.
A typical crossroad is taken as an example. The phases of the traffic light are defined as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green light, a_2 the east-west left-turn green light, a_3 the north-south green light, and a_4 the north-south left-turn green light. During operation, the initial duration of phase a_i is set to m and the yellow-light phase duration to n. At time t, the current state s_t is input into the intelligent traffic-light model, and the intelligent traffic light selects phase a_i (i = 1, 2, 3, 4); after phase a_i has been executed, the intelligent traffic light collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4). If a_i ≠ a_j, the execution time of phase a_i is not extended further, i.e., phase a_i ends; after phase a_i ends, the intelligent traffic light executes the yellow-light phase and, when the yellow-light phase ends, executes phase a_j. If a_i = a_j, the execution time of phase a_i is extended by m. The reward r_t is set to the difference in waiting time of the vehicles at the intersection between two consecutive actions:

r_t = W_t - W_{t+1}   (3)

where W_t and W_{t+1} are the waiting times on all approach lanes of the single intersection at times t and t+1, respectively. The action is judged from the executed action and the environment reward, and the parameters of the network are updated continuously. The reinforcement learning model used is the Deep Q Network (DQN). Its structure comprises a convolutional layer and a fully connected layer; its parameters include the convolution kernel size and the number of fully-connected-layer neurons. A deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection:

Q = h(w s_t + b)   (4)

where w denotes the weights of the neural network, s_t is the network input, b is the bias, and h(·) denotes the ReLU activation function. The loss function of DQN is:

y_t = r_t + γ max_{a_j ∈ A} Q(s_{t+1}, a_j; θ)   (5)

L_t = (y_t - Q(s_t, a_i; θ'))²   (6)

where y_t denotes the target value, a_i, a_j ∈ A denote the actions output by the agent, i.e., traffic-light phases, r_t denotes the reward at time t, and γ is the discount factor. θ and θ' denote the parameters w, b of the target network and the parameters w', b' of the estimation network in DQN, respectively. The parameters of the estimation network are updated step by step with the time step, and the parameters of the target network are updated every interval of time T by directly copying the network parameters from the estimation network:

θ' ← θ' - α ∇_{θ'} L_t   (7)

θ ← θ'   every T time steps   (8)

where α is the learning rate.
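A minimal PyTorch sketch of the Q-network and loss of formulas (4)-(8) follows; it is illustrative only. The patent specifies a convolutional plus a fully connected layer, while this sketch uses fully connected layers alone, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim=400, n_actions=4):  # 4 roads x 100 cells; |A| = 4
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),    # hidden layer with ReLU, as in (4)
            nn.Linear(256, n_actions),               # one Q value per phase
        )

    def forward(self, s):
        return self.net(s)

q_est, q_target = QNet(), QNet()
q_target.load_state_dict(q_est.state_dict())         # formula (8): periodic hard copy

def dqn_loss(batch, gamma=0.99):
    """Formulas (5)-(6): squared error between target value and estimated Q."""
    s, a, r, s_next = batch                           # a: LongTensor of chosen phases
    with torch.no_grad():
        y = r + gamma * q_target(s_next).max(dim=1).values  # y_t, formula (5)
    q = q_est(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((y - q) ** 2).mean()                            # L_t, formula (6)
```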
step 2: the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection are obtained at the traffic intersections, namely the number and the positions of the vehicles are input into the model, and corresponding traffic lights, namely output actions, are generated. Using FGSM attack algorithm to attack the input of each time one by one to obtain corresponding anti-disturbance; the process is as follows:
2.1: obtaining an input value s of an input model at time ttWherein s istRepresenting the number of vehicles at the input end of the single intersection and the positions of the vehicles at the input end of the single intersection, which are obtained from sumo at the moment t;
2.2: input original state stSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent modelm(m ═ 1,2,3,4) which is the optimum traffic light phase at this time, the formula is given as:
Figure BDA0003169108510000063
where θ represents a parameter of the trained intelligent agent model network, amIndicating the action of the output, i.e. the phase the traffic light is to perform.
2.3: using FGSM attack algorithm, assigning values along gradient direction according to sign function to generate corresponding anti-disturbance eta at t momenttThe formula is expressed as:
Figure BDA0003169108510000064
where ε represents the perturbation coefficient, stRepresenting the input value, i.e. the position of the vehicle, amRepresenting the optimum phase for traffic light execution at that time, sign representing the sign functionNumber, Lt(θ,st,am) Representing the loss function of the model at time t.
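A sketch of the FGSM step of formula (10) follows. At attack time the DQN target is unavailable, so the cross-entropy between the Q-values and the selected phase a_m is used here as an assumed surrogate for L_t; this substitution is ours, not the patent's.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(q_est, s_t, a_m, epsilon=1.0):
    """eta_t = epsilon * sign(grad_s L_t(theta, s_t, a_m)), formula (10)."""
    s = s_t.clone().detach().requires_grad_(True)
    logits = q_est(s.unsqueeze(0))                       # Q values as class scores
    loss = F.cross_entropy(logits, torch.tensor([a_m]))  # assumed surrogate for L_t
    loss.backward()
    return epsilon * s.grad.sign()
```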
Step 3: Since the acquired state is the number of vehicles and their positions, it takes discrete values. The adversarial perturbation η is therefore processed to obtain a perturbation value with actual physical meaning. The process is as follows:
3.1: Write the adversarial perturbation per cell as η_t^i, where c is the number of discrete cells into which each approach of the traffic intersection is divided and η_t^i denotes the adversarial perturbation of the ith discrete cell at time t. After computing the adversarial perturbation η_t at time t, take the absolute values of the perturbations at time t, find their maximum η_t^max and minimum η_t^min, and sort η_t by magnitude to obtain a new ordered array η_t'; through η_t' the perturbation is finally discretized so that it has practical physical meaning.
3.2: Read the perturbations from η_t' in order and compare them with the original data: if the original state is inconsistent with the adversarial perturbation, assign the corresponding perturbation to the corresponding original state; if the original state is consistent with the adversarial perturbation, take the next adversarial perturbation from η_t' and assign it in the same manner, until the selected perturbation is valid, yielding the disturbed state s_t'.
Step 4: Limit the magnitude of the currently constructed disturbance: at time t, when the disturbance amount μ_t ≤ δ (δ being the disturbance limit), input the disturbed state into the model; when μ_t > δ, input the original state into the model.
The disturbance amount μ_t added by the disturbed state at time t is computed from len(s_t) and len(s_t'), where len(·) counts the positions in s_t or s_t' whose vehicle-state value is 0.5. When μ_t ≤ δ, the disturbed state s_t' is input into the agent model; otherwise the original state s_t is input into the agent model.
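Formula (12) is likewise garbled in the source; the sketch below assumes, and labels as an assumption, that μ_t is the relative change in the occupied-cell count between s_t and s_t'.

```python
import numpy as np

def count_vehicles(s):
    """len(.): number of positions whose vehicle-state value is 0.5."""
    return int(np.sum(s == 0.5))

def select_input(s_t, s_adv, delta=0.20):
    """Step 4: feed s_t' only while the disturbance budget delta is respected."""
    n0, n1 = count_vehicles(s_t), count_vehicles(s_adv)
    mu_t = abs(n1 - n0) / max(n0, 1)  # assumed form of mu_t (formula (12))
    return s_adv if mu_t <= delta else s_t
```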
Step 5: Test the performance of the generated adversarial perturbation. After the state is input into the model, the agent selects the traffic-signal phase according to the current state to control the traffic flow of the single intersection. Finally, the smoothness of the traffic intersection is compared in SUMO under the traffic-light phases obtained from the traffic flows of the different input states.
The process of step 5 is as follows:
5.1: The original state s_t at each time step is input into the model, which selects the optimal action (traffic-light phase) a* = argmax_{a ∈ A} Q(s_t, a; θ), controls the traffic flow at the intersection, and the waiting-time difference of the traffic intersection (the reward r_t = W_t - W_{t+1}) is computed.
5.2: For the final disturbed state s_t' obtained after adding the valid perturbation, the disturbance amount μ_t is computed; the input state that meets the requirement (μ_t ≤ δ) is input into the agent model, the action a*, i.e., the traffic-light phase, is output, and the waiting-time difference of the traffic intersection (the reward r_t = W_t - W_{t+1}) is likewise computed.
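A high-level sketch of the comparison of step 5, reusing the helpers above. The `env` object stands in for a SUMO/TraCI wrapper whose `reset`/`step` interface is an assumption, not the patent's API; the same episode is run with and without the attack and the rewards r_t = W_t - W_{t+1} are compared.

```python
import torch

def run_episode(env, q_est, attack=False, delta=0.20, epsilon=1.0):
    s_t = env.reset()                 # numpy state, as produced by encode_state
    total_extra_wait, done = 0.0, False
    while not done:
        if attack:
            s_in_t = torch.as_tensor(s_t, dtype=torch.float32)
            a_m = int(q_est(s_in_t).argmax())                     # formula (9)
            eta = fgsm_perturbation(q_est, s_in_t, a_m, epsilon)  # formula (10)
            s_in = select_input(s_t, discretize_and_apply(s_t, eta.numpy()), delta)
        else:
            s_in = s_t
        a = int(q_est(torch.as_tensor(s_in, dtype=torch.float32)).argmax())
        s_t, r_t, done = env.step(a)  # r_t = W_t - W_{t+1}
        total_extra_wait += -r_t      # accumulated increase in waiting time
    return total_extra_wait
```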
Example: the data in the actual experiment are as follows:
(1) Selection of experimental data
The experimental data are 100 vehicles randomly generated at a single intersection in SUMO; every vehicle has the same size, the same distance from its generation position to the intersection, and the same speed from generation until passing through the intersection. The initial traffic-light phase timing at the intersection is 10 seconds of green and 4 seconds of yellow. Each road k (k = 1, 2, 3, 4), 700 meters long measured from the stop line, is divided into 100 discrete cells of 7 meters. The original state s_t collected at the approaches of the traffic intersection records the number and positions of the vehicles at the approaches of the single intersection. The perturbation limit δ is 20%.
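For reference, the embodiment's constants collected in one place; this configuration block is illustrative and the key names are ours.

```python
CONFIG = {
    "n_vehicles": 100,     # vehicles randomly generated in SUMO
    "road_length_m": 700,  # road k, measured from the stop line
    "n_cells": 100,        # discrete cells per road
    "cell_length_m": 7,    # 700 m / 100 cells
    "green_init_s": 10,    # initial green-phase duration
    "yellow_s": 4,         # yellow-phase duration
    "delta": 0.20,         # perturbation limit
}
```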
(2) Results of the experiment
In the result analysis, a single intersection is used as the experimental scenario: a reinforcement learning Deep Q Network (DQN) agent model is trained, the fast gradient sign method (FGSM) is adopted, and the perturbation is discretized to generate the adversarial perturbation; changing the number and positions of the vehicles at the approaches of the intersection causes the traffic-light phases to change. The comparison experiment is carried out under the two conditions of attack and no attack, and the experimental results are shown in Fig. 5 and Fig. 6 (when the attack is applied continuously, the traffic-light phases can no longer keep the vehicles at the single intersection flowing, so vehicles accumulate at the intersection).
The embodiments described in this specification merely illustrate implementations of the inventive concept and are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the embodiments, but extends to equivalents thereof that will occur to those skilled in the art upon consideration of the present inventive concept.

Claims (6)

1. A traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method, characterized by comprising the following steps:
Step 1: train a reinforcement learning agent model on a single-intersection road network; the network parameters of the model do not change after training, the model transfers well, and it keeps the single intersection flowing smoothly and free of congestion during testing;
Step 2: acquire the number and positions of the vehicles at each approach of the intersection and input them into the model, which generates the corresponding traffic-light action as output; attack the input at each time step one by one with the FGSM (fast gradient sign method) attack algorithm to obtain the corresponding adversarial perturbation;
Step 3: discretize the generated adversarial perturbation and combine it with the originally collected traffic flow to obtain the final disturbed state, i.e., the number and positions of the vehicles at the traffic intersection that are input into the model at that moment;
Step 4: limit the magnitude of the currently constructed disturbance: when the disturbance amount is smaller than the disturbance limit, input the disturbed state into the model; when the disturbance amount is larger than the disturbance limit, input the original state into the model;
Step 5: finally, compare the smoothness of the traffic intersection in SUMO under the traffic-light phases obtained from the traffic flows of the different input states.
2. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1, wherein in step 1 the single intersection is a crossroad. First, a reinforcement learning agent model is trained on the single-intersection road network, and the traffic state of all roads entering the single intersection is discretely encoded: each road k (k = 1, 2, 3, 4) of length l from the section entrance to the stop line is divided equally into c discrete cells, and the vehicle positions on road k of the single intersection at time t are expressed as a vehicle-position matrix s_k(t); when a vehicle head occupies the ith discrete cell (i = 1, 2, ..., c), the value of s_k(t) at position i is 0.5, otherwise the value is -0.5:

s_k^i(t) = 0.5 if a vehicle head occupies cell i, and -0.5 otherwise   (1)

where s_k^i(t) denotes the value of the ith position of the vehicle-position matrix s_k(t); at time t, the vehicle-position matrices s_k(t) of the four approaches of the intersection are concatenated head-to-tail into s_t:

s_t = [s_1(t), s_2(t), s_3(t), s_4(t)]   (2)

then s_t, the state of the environment, is input into the agent model for training, and the agent outputs the corresponding action, i.e., the phase to be executed by the traffic light;
the phases of the traffic light are defined as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is the east-west green light, a_2 the east-west left-turn green light, a_3 the north-south green light, and a_4 the north-south left-turn green light; during operation, the initial duration of phase a_i is set to m and the yellow-light phase duration to n; at time t, the current state s_t is input into the intelligent traffic-light model, and the intelligent traffic light selects phase a_i (i = 1, 2, 3, 4); after phase a_i has been executed, the intelligent traffic light collects the state s_{t+1} at time t+1 from the environment and then selects phase a_j (j = 1, 2, 3, 4); if a_i ≠ a_j, the execution time of phase a_i is not extended further, i.e., phase a_i ends; after phase a_i ends, the intelligent traffic light executes the yellow-light phase and, when the yellow-light phase ends, executes phase a_j; if a_i = a_j, the execution time of phase a_i is extended by m; the reward r_t is set to the difference in waiting time of the vehicles at the intersection between two consecutive actions:

r_t = W_t - W_{t+1}   (3)

where W_t and W_{t+1} are the waiting times on all approach lanes of the single intersection at times t and t+1, respectively; the action is judged from the executed action and the environment reward, and the parameters of the network are updated continuously; the reinforcement learning model used is DQN, whose structure comprises a convolutional layer and a fully connected layer; its parameters include the convolution kernel size and the number of fully-connected-layer neurons; a deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection:

Q = h(w s_t + b)   (4)

where w denotes the weights of the neural network, s_t is the network input, b is the bias, and h(·) denotes the ReLU activation function; the loss function of DQN is:

y_t = r_t + γ max_{a_j ∈ A} Q(s_{t+1}, a_j; θ)   (5)

L_t = (y_t - Q(s_t, a_i; θ'))²   (6)

where y_t denotes the target value, a_i, a_j ∈ A denote the actions output by the agent, i.e., traffic-light phases, r_t denotes the reward at time t, and γ is the discount factor; θ and θ' denote the parameters w, b of the target network and the parameters w', b' of the estimation network in DQN, respectively; the parameters of the estimation network are updated step by step with the time step, and the parameters of the target network are updated every interval of time T by directly copying the network parameters from the estimation network:

θ' ← θ' - α ∇_{θ'} L_t   (7)

θ ← θ'   every T time steps   (8)

where α is the learning rate.
3. the method for generating the traffic state anti-disturbance based on the single intersection signal control with the rapid gradient descent according to the claim 1 or 2, characterized in that the process of the step 2 is as follows:
2.1: obtaining an input value s of an input model at time ttWherein s istRepresenting the number of vehicles at the input end of the single intersection and the positions of the vehicles at the input end of the single intersection, which are obtained from sumo at the moment t;
2.2: input original state stSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent modelm(m ═ 1,2,3,4) which is the optimum traffic light phase at this time, the formula is given as:
Figure FDA0003169108500000034
where θ represents a parameter of the trained intelligent agent model network, amIndicating the phase of the output, i.e. the traffic light is to be executed;
2.3: using FGSM attack algorithm, assigning values along gradient direction according to sign function to generate corresponding anti-disturbance eta at t momenttThe formula is expressed as:
Figure FDA0003169108500000035
where ε represents the perturbation coefficient, stRepresenting the input value, i.e. the position of the vehicle, amRepresenting the traffic light at that timeOptimum phase of the line, sign stands for sign function, Lt(θ,st,am) Representing the loss function of the model at time t.
4. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 3 is as follows:
3.1: write the adversarial perturbation per cell as η_t^i, where c is the number of discrete cells into which each approach of the traffic intersection is divided and η_t^i denotes the adversarial perturbation of the ith discrete cell at time t; after computing the adversarial perturbation η_t at time t, take the absolute values of the perturbations at time t, find their maximum η_t^max and minimum η_t^min, and sort η_t by magnitude to obtain a new ordered array η_t'; through η_t' the perturbation is finally discretized so that it has practical physical meaning;
3.2: read the perturbations from η_t' in order and compare them with the original data: if the original state is inconsistent with the adversarial perturbation, assign the corresponding perturbation to the corresponding original state; if the original state is consistent with the adversarial perturbation, take the next adversarial perturbation from η_t' and assign it in the same manner, until the selected perturbation is valid, yielding the disturbed state s_t'.
5. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 4 is as follows: compute the disturbance amount μ_t added by the disturbed state at time t from len(s_t) and len(s_t'), where len(·) counts the positions in s_t or s_t' whose vehicle-state value is 0.5; when μ_t ≤ δ, the disturbed state s_t' is input into the agent model, otherwise the original state s_t is input into the agent model.
6. The traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method according to claim 1 or 2, characterized in that the process of step 5 is as follows:
5.1: the original state s_t at each time step is input into the model, which selects the optimal action a* = argmax_{a ∈ A} Q(s_t, a; θ), controls the traffic flow at the intersection, and the waiting-time difference of the traffic intersection, i.e., the reward r_t = W_t - W_{t+1}, is computed;
5.2: for the final disturbed state s_t' obtained after adding the valid perturbation, the disturbance amount μ_t is computed; the input state that meets the requirement (μ_t ≤ δ) is input into the agent model, the action a*, i.e., the traffic-light phase, is output, and the waiting-time difference of the traffic intersection (the reward r_t = W_t - W_{t+1}) is likewise computed.
CN202110813579.3A 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method Active CN113487889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813579.3A CN113487889B (en) 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method


Publications (2)

Publication Number Publication Date
CN113487889A (en) 2021-10-08
CN113487889B CN113487889B (en) 2022-06-17

Family

ID=77941350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813579.3A Active CN113487889B (en) 2021-07-19 2021-07-19 Traffic state adversarial perturbation generation method for single-intersection signal control based on the fast gradient sign method

Country Status (1)

Country Link
CN (1) CN113487889B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830887A (en) * 2023-02-14 2023-03-21 武汉智安交通科技有限公司 Self-adaptive traffic signal control method, system and readable storage medium
WO2023082205A1 (en) * 2021-11-12 2023-05-19 华为技术有限公司 Method for evaluating reinforcement learning agent, and related apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200327238A1 (en) * 2018-08-14 2020-10-15 Intel Corporation Techniques to detect perturbation attacks with an actor-critic framework
WO2020102526A1 (en) * 2018-11-14 2020-05-22 North Carolina State University Deep neural network with compositional grammatical architectures
US20210089891A1 (en) * 2019-09-24 2021-03-25 Hrl Laboratories, Llc Deep reinforcement learning based method for surreptitiously generating signals to fool a recurrent neural network
CN111243299A (en) * 2020-01-20 2020-06-05 浙江工业大学 Single cross port signal control method based on 3 DQN-PSER algorithm
CN111260937A (en) * 2020-02-24 2020-06-09 武汉大学深圳研究院 Cross traffic signal lamp control method based on reinforcement learning
CN111461226A (en) * 2020-04-01 2020-07-28 深圳前海微众银行股份有限公司 Countermeasure sample generation method, device, terminal and readable storage medium
CN112232434A (en) * 2020-10-29 2021-01-15 浙江工业大学 Attack-resisting cooperative defense method and device based on correlation analysis

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Anay Pattanaik, "Robust Deep Reinforcement Learning with Adversarial Attacks," Computer Science.
Tong Chen, "Adversarial attack and defense in reinforcement learning - from AI security view," Cybersecurity.
Xiaoyuan Liang, "A Deep Reinforcement Learning Network for Traffic Light Cycle Control," IEEE Transactions on Vehicular Technology.
Yu-Ying Chen, "Adversarial Attacks Against Reinforcement Learning-Based Portfolio Management Strategy," IEEE Access.
Liu Zhi, "Single-intersection signal control based on an improved deep reinforcement learning method," Computer Science (in Chinese).
Li Meng, "Black-box adversarial attack algorithm based on deep reinforcement learning," Computer and Modernization (in Chinese).


Also Published As

Publication number Publication date
CN113487889B (en) 2022-06-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant