CN113487889A - Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent - Google Patents
- Publication number
- CN113487889A (application CN202110813579.3A)
- Authority
- CN
- China
- Prior art keywords
- disturbance
- traffic
- state
- phase
- intersection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/07—Controlling traffic signals
- G08G1/08—Controlling traffic signals according to detected number or speed of vehicles
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
Abstract
A traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent. Given a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial samples are generated by an FGSM attack combined with discretization of the adversarial perturbation according to the gradient values; the final disturbed state, obtained by combining the adversarial perturbation with the original state, is input into the agent model, and the smoothness or congestion of a single intersection is then examined in SUMO. The invention can bound the perturbation while keeping it physically meaningful, thereby efficiently generating adversarial states that increase the queue length and waiting time at the intersection, greatly degrade the performance of the model, and greatly reduce traffic throughput at the intersection.
Description
Technical Field
The invention belongs to the intersection of intelligent transportation and machine-learning information security, and relates to a traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent (the fast gradient sign method, FGSM).
Background
The problem of traffic congestion has become an urgent challenge for urban traffic, and one of the most critical considerations in designing a modern city is developing an intelligent traffic-management system. The main goal of a traffic-management system is to reduce traffic congestion, which has become one of the major problems in large cities today. Efficient urban traffic management can save time and money and reduce carbon-dioxide emissions into the atmosphere.
Reinforcement Learning (RL), as a machine-learning technique for the traffic-signal control problem, has produced impressive results. Reinforcement learning does not require full prior knowledge of the environment, such as the traffic flow; instead, the agent acquires knowledge and models the environment dynamics by interacting with the environment. After each action is executed in the environment, the agent receives a scalar reward. The reward depends on the quality of the action taken, and the goal of the agent is to learn the best control strategy, so that by repeatedly interacting with the environment the cumulative discounted reward is maximized. Deep Reinforcement Learning (DRL) has numerous applications in the real world due to its excellent ability to adapt quickly to its surroundings. Despite its great advantages, DRL is vulnerable to adversarial attacks such as luring attacks, strategically-timed attacks, value-function-based adversarial attacks, and trojan attacks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent, which can add a small perturbation to the number and positions of vehicles while ensuring that the perturbation has actual physical meaning, thereby efficiently generating adversarial perturbations and greatly reducing the performance of the model and the smoothness of the traffic intersection.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent comprises the following steps:
step 1: train a reinforcement-learning Deep Q-Network (DQN) agent model on a single-intersection road network. The network parameters of the model no longer change after training, the model generalizes well, and it achieves high fluency and no congestion during testing at the single intersection;
step 2: acquire the number and positions of vehicles at each entrance of the intersection, i.e. the input into the model, and generate the corresponding traffic-signal phases, i.e. the output actions; attack the input at each time step one by one with the FGSM attack algorithm to obtain the corresponding adversarial perturbation;
step 3: discretize the generated adversarial perturbation and combine it with the originally acquired traffic flow to obtain the final disturbed state, i.e. the number and positions of vehicles at the intersection input into the model at that time;
step 4: limit the magnitude of the currently constructed disturbed state: when the perturbation amount is smaller than the perturbation limit, input the disturbed state into the model; when the perturbation amount is larger than the perturbation limit, input the original state into the model;
step 5: finally, compare in SUMO the fluency of the intersection under the traffic-light phases obtained from the traffic flows of the different input states.
As a research hotspot in the field of artificial intelligence, Deep Reinforcement Learning (DRL) has achieved notable success in fields such as robot control, computer vision, and intelligent transportation. Meanwhile, whether it can be attacked and whether it is robust have also become hot topics in recent years. The method therefore selects the representative Deep Q-Network (DQN) algorithm in deep reinforcement learning, takes single-intersection signal control as the application scenario, and attacks the DQN algorithm with the fast gradient sign method (FGSM) to generate adversarial samples.
The technical conception of the invention is as follows: given a traffic-intersection signal-light control model trained with the existing reinforcement-learning DQN algorithm, adversarial samples are generated by an FGSM attack, discretizing the adversarial perturbation according to the gradient values; the adversarial perturbation is combined with the original state to obtain the final disturbed state, which is input into the agent model, and finally the smoothness or congestion of the single intersection is examined in SUMO.
The invention has the following beneficial effects: the FGSM attack algorithm generates the corresponding adversarial perturbation from the gradient values, the generated perturbation is discretized, the adversarial perturbation and the original traffic flow are combined into a disturbed state, a perturbation limit is imposed on the perturbation amount of the disturbed state, and the resulting input is the disturbed state. The invention can bound the perturbation while keeping it physically meaningful, thereby efficiently generating adversarial states that increase the queue length and waiting time at the intersection, greatly degrade the performance of the model, and greatly reduce traffic throughput at the intersection.
Drawings
Fig. 1 is a schematic diagram of reinforcement learning.
Fig. 2 is a general flow diagram of FGSM generation against perturbation.
Fig. 3 is a schematic view of a single intersection.
Fig. 4 shows the discretized vehicle-position state.
FIG. 5 is a comparison graph of single intersection vehicle waiting queue lengths.
FIG. 6 is a comparison graph of vehicle waiting times at a single intersection.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to figs. 1 to 6, a traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent includes the following steps:
step 1: reinforcement learning is an algorithm that interacts with the environment continuously, as shown in fig. 1. The reinforcement learning algorithm contains three most basic elements: environmental status, agent actions, environmental rewards. Take a typical crossroad as an example. Firstly, training a reinforcement learning intelligent agent model on a road grid of a single intersection, and carrying out discrete coding on traffic states of all roads entering the single intersection. Equally dividing a single intersection into c discrete units from a road k (k is 1,2,3 and 4) with the length of l between a road section entrance and a stop line, and expressing the vehicle position of the road k of the single intersection at the time t as a vehicle position momentArray sk(t) when the vehicle head is on a discrete cell, then the vehicle position matrix sk(t) has a value of 0.5 for the ith (i ═ 1,2, …, c) position, otherwise the value is-0.5, and the formula is:
whereinRepresenting a vehicle position matrix sk(t) value of ith position, matrix s of vehicle positions of four intersection input ends at time tk(t) splicing according to line head and tail to form stThe formula is expressed as:
s_t = [s_1(t), s_2(t), s_3(t), s_4(t)]   (2)
Then s_t, the state of the environment, is input into the agent model for training, and the agent outputs the corresponding action, i.e. the phase to be executed by the traffic light (such as north-south green or east-west green).
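As an illustration (not part of the patent text), the discrete state encoding described above can be sketched as follows; the helper names `encode_road`/`encode_state` are assumptions for this sketch, with road length and cell count taken from the embodiment (700 m roads, 100 cells of 7 m):

```python
import numpy as np

def encode_road(head_positions_m, road_length_m=700.0, num_cells=100):
    """Encode one approach road as a vector s_k(t): 0.5 where a vehicle
    head occupies a discrete cell, -0.5 elsewhere."""
    cell_len = road_length_m / num_cells
    s_k = np.full(num_cells, -0.5)
    for pos in head_positions_m:  # distance of each vehicle head from the stop line
        i = min(int(pos // cell_len), num_cells - 1)
        s_k[i] = 0.5
    return s_k

def encode_state(per_road_positions, road_length_m=700.0, num_cells=100):
    """Concatenate the four approach vectors head-to-tail into s_t (eq. (2))."""
    return np.concatenate([encode_road(p, road_length_m, num_cells)
                           for p in per_road_positions])

# Four approaches with 2, 0, 1, 1 vehicles -> a 400-dimensional state vector.
s_t = encode_state([[3.0, 10.0], [], [250.0], [699.0]])
```

With 100 cells per road, the concatenated state s_t has 4 × 100 = 400 entries, exactly four of which are 0.5 in this example.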
A typical four-way intersection is taken as an example. We define the traffic-light phases as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is east-west green, a_2 is east-west left-turn green, a_3 is north-south green, and a_4 is north-south left-turn green. At runtime, the initial duration of phase a_i is set to m and the yellow-phase duration to n. At time t, the current state s_t is input into the traffic-light agent, which selects a phase a_i (i = 1, 2, 3, 4). After a_i has been executed, the agent collects the state s_{t+1} at time t+1 from the environment and then selects a phase a_j (j = 1, 2, 3, 4). If a_i ≠ a_j, the execution time of a_i is not extended, i.e. phase a_i ends; after a_i ends, the agent executes the yellow phase, and after the yellow phase ends it executes phase a_j. If a_i = a_j, the execution time of a_i is extended by m. The reward r_t is set to the difference in the waiting time of vehicles at the intersection between two consecutive actions. The formula is expressed as:
r_t = W_t - W_{t+1}   (3)
where W_t and W_{t+1} are the waiting times on all entering lanes of the single intersection at times t and t+1, respectively. The action is evaluated according to the executed action and the environment reward, and the network parameters are thereby updated continuously. The reinforcement-learning model used is a Deep Q-Network (DQN). Its structure comprises convolutional layers and fully-connected layers; its parameters include the convolution-kernel size and the number of fully-connected-layer neurons. A deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection. The formula is expressed as:
Q = h(w s_t + b)   (4)
where w represents the weights of the neural network, s_t is the input to the network, b is the bias, and h(·) denotes the ReLU activation function. The target value of DQN is:

y_t = r_t + γ max_{a_j} Q(s_{t+1}, a_j; θ)   (5)

and the loss function of DQN is:
L_t = (y_t - Q(s_t, a_i; θ'))^2   (6)
wherein y istRepresents a target value, ai,ajE A represents the traffic light phase r which is the action of the intelligent outputtRepresenting the reward at the moment T, gamma is a learning rate, theta and theta 'represent parameters w and b of a target network and parameters w and b' of an estimation network in the DQN respectively, the parameters of the estimation network are updated gradually along with a time step, the parameters of the target network are updated by directly copying the parameters of the network from the estimation network at intervals of time T, and the formula is as follows:
step 2: the number of vehicles at the input end of each intersection and the positions of the vehicles at the input end of each intersection are obtained at the traffic intersections, namely the number and the positions of the vehicles are input into the model, and corresponding traffic lights, namely output actions, are generated. Using FGSM attack algorithm to attack the input of each time one by one to obtain corresponding anti-disturbance; the process is as follows:
2.1: obtaining an input value s of an input model at time ttWherein s istRepresenting the number of vehicles at the input end of the single intersection and the positions of the vehicles at the input end of the single intersection, which are obtained from sumo at the moment t;
2.2: input original state stSelecting the action a with the maximum action value function Q through the trained DQN intelligent agent modelm(m ═ 1,2,3,4) which is the optimum traffic light phase at this time, the formula is given as:
where θ represents a parameter of the trained intelligent agent model network, amIndicating the action of the output, i.e. the phase the traffic light is to perform.
2.3: using FGSM attack algorithm, assigning values along gradient direction according to sign function to generate corresponding anti-disturbance eta at t momenttThe formula is expressed as:
where ε represents the perturbation coefficient, stRepresenting the input value, i.e. the position of the vehicle, amRepresenting the optimum phase for traffic light execution at that time, sign representing the sign functionNumber, Lt(θ,st,am) Representing the loss function of the model at time t.
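The FGSM assignment along the gradient direction can be sketched as follows; the input gradient is assumed to be given (in practice it would be obtained by backpropagating the loss L_t through the DQN to the input s_t):

```python
import numpy as np

def fgsm_perturbation(grad_wrt_input, epsilon=0.1):
    """eta_t = epsilon * sign(grad of L_t w.r.t. the input s_t)."""
    return epsilon * np.sign(grad_wrt_input)

grad = np.array([0.3, -1.2, 0.0, 0.7])        # hypothetical input gradient
eta_t = fgsm_perturbation(grad, epsilon=0.1)  # -> [0.1, -0.1, 0.0, 0.1]
```

Each entry of the perturbation has magnitude ε (or 0 where the gradient vanishes), so the attack budget per cell is controlled by a single coefficient.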
step 3: since the acquired state is the number of vehicles and their positions, it takes discrete values. The adversarial perturbation η_t is therefore processed to obtain a perturbation value with actual physical meaning. The process is as follows:
3.1: η_t = (η_t^1, ..., η_t^c), where c is the number of discrete cells into which each entrance of the traffic intersection is divided and η_t^i represents the adversarial perturbation of the i-th discrete cell at time t. After the adversarial perturbation η_t at time t has been calculated, the absolute values of the perturbations at time t are taken and their maximum and minimum are found, and η_t is sorted by magnitude to obtain a new sorted array η_t'. Finally, the perturbation is discretized with these values so that it has actual physical meaning.
3.2: at etatReading the disturbance in sequence and comparing the disturbance with original data, and if the original state is inconsistent with the counterdisturbance, assigning the corresponding disturbance to the corresponding original state; if the original state is consistent with the counterdisturbance, then get eta againt' the next countermeasure disturbance is assigned in the manner described above until the selected disturbance is valid, resulting in a disturbance state st′。
step 4: the magnitude of the currently constructed disturbed state is limited. At time t, when the perturbation amount μ_t ≤ δ (δ being the perturbation limit), the disturbed state is input into the model; when the perturbation amount μ_t > δ, the original state is input into the model;
calculating the disturbance quantity mu added by the disturbance state at the moment ttThe formula is expressed as:
where len (.) denotes the calculation stAnd stThe number of' middle vehicle state is 0.5, when the disturbance amount mutWhen the value is less than or equal to delta, the state of disturbance s is detectedt' input into the agent model, otherwise the original state stInput into the agent model.
step 5: the performance of the generated adversarial perturbation is tested. After the state is input into the model, the agent selects the traffic-signal phase according to the current state to control the traffic flow of the single intersection. Finally, the fluency of the traffic intersection under the traffic-light phases obtained from the traffic flows of the different input states is compared in SUMO;
the process of the step 5 is as follows:
5.1: original state s at each timetModel input into the model will select the optimal action (traffic light phase)Controlling the traffic flow at the intersection and calculating the waiting time difference (reward r) of the traffic intersectiont=Wt-Wt+1)。
5.2: for the final disturbance s after adding effective disturbancet' calculating the disturbance quantity mutTo meet the requirement (mu)tδ) input of input state into the agent model and output actionI.e. the traffic light phase, when the difference in waiting time (reward r) at the traffic crossing is calculated as wellt=Wt-Wt+1)。
Example: the data of the actual experiment are as follows:
(1) selecting experimental data
The experimental data are 100 vehicles randomly generated at a single intersection in SUMO; each vehicle's size, distance from its generation position to the intersection, and speed from generation to passing through the intersection are identical. The initial traffic-light phase durations at the intersection are 10 seconds for green and 4 seconds for yellow. Each road k (k = 1, 2, 3, 4), 700 meters long measured from the stop line, is divided into 100 discrete cells of 7 meters. The original state s_t collected at the intersection entrances records the number and positions of vehicles at the entrances of the single intersection. The perturbation limit δ is 20%.
(2) Results of the experiment
In the result analysis, a single intersection is used as the experimental scenario: a reinforcement-learning Deep Q-Network (DQN) agent model is trained, the fast gradient sign method (FGSM) is applied, and the perturbation is discretized to generate the adversarial perturbation. Changing the number and positions of vehicles at the intersection entrances causes the traffic-light phases to change. A comparison experiment is carried out for the two cases with and without attack, and the experimental results are shown in fig. 5 and fig. 6 (under sustained attack, the traffic-light phases can no longer keep vehicles flowing through the single intersection, so vehicles accumulate at the intersection).
The embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.
Claims (6)
1. A traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent, characterized by comprising the following steps:
step 1: training a reinforcement-learning agent model on a single-intersection road network, wherein the network parameters of the model no longer change after training, the model generalizes well, and it achieves high fluency and no congestion during testing at the single intersection;
step 2: acquiring the number and positions of vehicles at each entrance of the intersection, i.e. the input into the model, and generating the corresponding traffic-signal phases, i.e. the output actions; attacking the input at each time step one by one with the FGSM attack algorithm to obtain the corresponding adversarial perturbation;
step 3: discretizing the generated adversarial perturbation and combining it with the originally acquired traffic flow to obtain the final disturbed state, i.e. the number and positions of vehicles at the intersection input into the model at that time;
step 4: limiting the magnitude of the currently constructed disturbed state: when the perturbation amount is smaller than the perturbation limit, inputting the disturbed state into the model; when the perturbation amount is larger than the perturbation limit, inputting the original state into the model;
step 5: finally, comparing in SUMO the fluency of the intersection under the traffic-light phases obtained from the traffic flows of the different input states.
2. The traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent according to claim 1, characterized in that in step 1, the single intersection is a four-way intersection. First, a reinforcement-learning agent model is trained on the road network of the single intersection, and the traffic states on all roads entering the intersection are discretely encoded. Each road k (k = 1, 2, 3, 4) of the intersection, of length l from the road-segment entrance to the stop line, is divided equally into c discrete cells, and the vehicle positions on road k at time t are represented as a vehicle-position matrix s_k(t): when a vehicle head is on the i-th discrete cell (i = 1, 2, ..., c), the i-th value of s_k(t) is 0.5, otherwise the value is -0.5. The formula is:

s_k^i(t) = 0.5 if a vehicle head occupies cell i, and s_k^i(t) = -0.5 otherwise   (1)

where s_k^i(t) represents the value of the i-th position of s_k(t). At time t, the vehicle-position matrices s_k(t) of the four intersection entrances are concatenated head-to-tail by rows to form s_t:

s_t = [s_1(t), s_2(t), s_3(t), s_4(t)]   (2)

Then s_t, the state of the environment, is input into the agent model for training, and the agent outputs the corresponding action, i.e. the phase to be executed by the traffic light;
the traffic-light phases are defined as the action space A = {a_1, a_2, a_3, a_4}, where a_1 is east-west green, a_2 is east-west left-turn green, a_3 is north-south green, and a_4 is north-south left-turn green. At runtime, the initial duration of phase a_i is set to m and the yellow-phase duration to n. At time t, the current state s_t is input into the traffic-light agent, which selects a phase a_i (i = 1, 2, 3, 4). After a_i has been executed, the agent collects the state s_{t+1} at time t+1 from the environment and then selects a phase a_j (j = 1, 2, 3, 4). If a_i ≠ a_j, the execution time of a_i is not extended, i.e. phase a_i ends; after a_i ends, the agent executes the yellow phase, and after the yellow phase ends it executes phase a_j. If a_i = a_j, the execution time of a_i is extended by m. The reward r_t is set to the difference in the waiting time of vehicles at the intersection between two consecutive actions:

r_t = W_t - W_{t+1}   (3)

where W_t and W_{t+1} are the waiting times on all entering lanes of the single intersection at times t and t+1, respectively. The action is evaluated according to the executed action and the environment reward, and the network parameters are thereby updated continuously. The reinforcement-learning model used is DQN, whose structure comprises convolutional layers and fully-connected layers and whose parameters include the convolution-kernel size and the number of fully-connected-layer neurons. A deep neural network is used as the Q-value network and its parameters are initialized; the output of the network is the Q value, the hidden layers use the ReLU nonlinear activation function, and the number of output-layer neurons equals the size of the action space of the single intersection:

Q = h(w s_t + b)   (4)

where w represents the weights of the neural network, s_t is the input to the network, b is the bias, and h(·) denotes the ReLU activation function. The target value of DQN is:

y_t = r_t + γ max_{a_j} Q(s_{t+1}, a_j; θ)   (5)

and the loss function of DQN is:

L_t = (y_t - Q(s_t, a_i; θ'))^2   (6)

where y_t represents the target value, a_i, a_j ∈ A represent the traffic-light phases output by the agent, r_t represents the reward at time t, and γ is the discount factor. θ and θ' represent the parameters w, b of the target network and the parameters w', b' of the estimation network in DQN, respectively. The parameters of the estimation network are updated step by step over the time steps, while the parameters of the target network are updated every interval of T time steps by directly copying the parameters of the estimation network:

θ ← θ'   (7)
3. The traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent according to claim 1 or 2, characterized in that the process of step 2 is as follows:
2.1: obtain the input value s_t of the model at time t, where s_t represents the number and positions of vehicles at the entrances of the single intersection, obtained from SUMO at time t;
2.2: input the original state s_t into the trained DQN agent model and select the action a_m (m = 1, 2, 3, 4) with the largest action-value function Q, which is the optimal traffic-light phase at this time:

a_m = argmax_a Q(s_t, a; θ)

where θ represents the parameters of the trained agent-model network, and a_m indicates the output action, i.e. the phase the traffic light is to execute;
2.3: using the FGSM attack algorithm, assign values along the gradient direction according to the sign function to generate the corresponding adversarial perturbation η_t at time t:

η_t = ε · sign(∇_{s_t} L_t(θ, s_t, a_m))

where ε represents the perturbation coefficient, s_t represents the input value, i.e. the vehicle positions, a_m represents the optimal phase for the traffic light at that time, sign(·) represents the sign function, and L_t(θ, s_t, a_m) represents the loss function of the model at time t.
4. The traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent according to claim 1 or 2, characterized in that the process of step 3 is as follows:
3.1: η_t = (η_t^1, ..., η_t^c), where c is the number of discrete cells into which each entrance of the traffic intersection is divided and η_t^i represents the adversarial perturbation of the i-th discrete cell at time t. After the adversarial perturbation η_t at time t has been calculated, the absolute values of the perturbations at time t are taken and their maximum and minimum are found, and η_t is sorted by magnitude to obtain a new sorted array η_t'. Finally, the perturbation is discretized with these values so that it has actual physical meaning;
3.2: read the perturbations from η_t' in order and compare them with the original data; if the original state and the adversarial perturbation are inconsistent, assign the corresponding perturbation to the corresponding original state; if the original state and the adversarial perturbation are consistent, take the next adversarial perturbation from η_t' and assign it in the same way, until the selected perturbation is valid, yielding the disturbed state s_t'.
5. The traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent according to claim 1 or 2, characterized in that the process of step 4 is as follows: the perturbation amount μ_t added by the disturbed state at time t is calculated as:

μ_t = |len(s_t') - len(s_t)| / len(s_t)

where len(·) denotes counting the number of entries of s_t or s_t' whose vehicle state is 0.5. When the perturbation amount μ_t ≤ δ, the disturbed state s_t' is input into the agent model; otherwise the original state s_t is input into the agent model.
6. The traffic-state adversarial disturbance generation method for single-intersection signal control based on rapid gradient descent according to claim 1 or 2, characterized in that the process of step 5 is as follows:
5.1: the original state s_t at each time is input into the model, which selects the optimal action (traffic-light phase), controls the traffic flow at the intersection, and calculates the waiting-time difference of the traffic intersection, i.e. the reward r_t = W_t - W_{t+1};
5.2: for the final disturbed state s_t' after adding a valid perturbation, the perturbation amount μ_t is calculated; if it meets the requirement (μ_t ≤ δ), the state is input into the agent model and the output action, i.e. the traffic-light phase, is obtained; the waiting-time difference of the traffic intersection (reward r_t = W_t - W_{t+1}) is likewise calculated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110813579.3A CN113487889B (en) | 2021-07-19 | 2021-07-19 | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113487889A true CN113487889A (en) | 2021-10-08 |
CN113487889B CN113487889B (en) | 2022-06-17 |
Family
ID=77941350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110813579.3A Active CN113487889B (en) | 2021-07-19 | 2021-07-19 | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113487889B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115830887A (en) * | 2023-02-14 | 2023-03-21 | 武汉智安交通科技有限公司 | Self-adaptive traffic signal control method, system and readable storage medium |
WO2023082205A1 (en) * | 2021-11-12 | 2023-05-19 | 华为技术有限公司 | Method for evaluating reinforcement learning agent, and related apparatus |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020102526A1 (en) * | 2018-11-14 | 2020-05-22 | North Carolina State University | Deep neural network with compositional grammatical architectures |
CN111243299A (en) * | 2020-01-20 | 2020-06-05 | 浙江工业大学 | Single cross port signal control method based on 3 DQN-PSER algorithm |
CN111260937A (en) * | 2020-02-24 | 2020-06-09 | 武汉大学深圳研究院 | Cross traffic signal lamp control method based on reinforcement learning |
CN111461226A (en) * | 2020-04-01 | 2020-07-28 | 深圳前海微众银行股份有限公司 | Countermeasure sample generation method, device, terminal and readable storage medium |
US20200327238A1 (en) * | 2018-08-14 | 2020-10-15 | Intel Corporation | Techniques to detect perturbation attacks with an actor-critic framework |
CN112232434A (en) * | 2020-10-29 | 2021-01-15 | 浙江工业大学 | Attack-resisting cooperative defense method and device based on correlation analysis |
US20210089891A1 (en) * | 2019-09-24 | 2021-03-25 | Hrl Laboratories, Llc | Deep reinforcement learning based method for surreptitiously generating signals to fool a recurrent neural network |
2021-07-19: CN application CN202110813579.3A filed; granted as CN113487889B (status: Active)
Non-Patent Citations (6)
Title |
---|
ANAY PATTANAIK: "Robust Deep Reinforcement Learning with Adversarial Attacks", 《Computer Science》 *
TONG CHEN: "Adversarial attack and defense in reinforcement learning - from AI security view", 《Cybersecurity》 *
XIAOYUAN LIANG: "A deep reinforcement learning network for traffic light cycle control", 《IEEE Transactions on Vehicular Technology》 *
YU-YING CHEN: "Adversarial Attacks Against Reinforcement Learning-Based Portfolio Management Strategy", 《IEEE Access》 *
LIU ZHI: "Single intersection signal control based on an improved deep reinforcement learning method", 《Computer Science》 *
LI MENG: "Black-box adversarial attack algorithm based on deep reinforcement learning", 《Computer and Modernization》 *
Also Published As
Publication number | Publication date |
---|---|
CN113487889B (en) | 2022-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112215337B (en) | Vehicle track prediction method based on environment attention neural network model | |
JP7351487B2 (en) | Intelligent navigation method and system based on topology map | |
CN109959388B (en) | Intelligent traffic refined path planning method based on grid expansion model | |
JP6916552B2 (en) | A method and device for detecting a driving scenario that occurs during driving and providing information for evaluating a driver's driving habits. | |
CN113487889B (en) | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent | |
CN103593535A (en) | Urban traffic complex self-adaptive network parallel simulation system and method based on multi-scale integration | |
CN112487954B (en) | Pedestrian crossing behavior prediction method for plane intersection | |
CN112016499A (en) | Traffic scene risk assessment method and system based on multi-branch convolutional neural network | |
CN112508164B (en) | End-to-end automatic driving model pre-training method based on asynchronous supervised learning | |
CN114358128A (en) | Method for training end-to-end automatic driving strategy | |
CN115188204B (en) | Highway lane-level variable speed limit control method under abnormal weather condition | |
CN111081022A (en) | Traffic flow prediction method based on particle swarm optimization neural network | |
Alam et al. | Intellegent traffic light control system for isolated intersection using fuzzy logic | |
CN114120670A (en) | Method and system for traffic signal control | |
CN116448134B (en) | Vehicle path planning method and device based on risk field and uncertain analysis | |
CN116758768A (en) | Dynamic regulation and control method for traffic lights of full crossroad | |
CN115426149A (en) | Single intersection signal lamp control traffic state anti-disturbance generation method based on Jacobian saliency map | |
WO2021258847A1 (en) | Driving decision-making method, device, and chip | |
CN113487870B (en) | Anti-disturbance generation method for intelligent single intersection based on CW (continuous wave) attack | |
Wen et al. | Modeling human driver behaviors when following autonomous vehicles: An inverse reinforcement learning approach | |
CN115062202A (en) | Method, device, equipment and storage medium for predicting driving behavior intention and track | |
CN115031753A (en) | Driving condition local path planning method based on safety potential field and DQN algorithm | |
Tran et al. | Revisiting pixel-based traffic signal controls using reinforcement learning with world models | |
CN113628455A (en) | Intersection signal optimization control method considering number of people in vehicle under Internet of vehicles environment | |
CN113486568A (en) | Vehicle control dynamic simulation learning algorithm based on surround vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||