CN109375514A - Design method of an optimal tracking controller in the presence of false data injection attacks - Google Patents

Design method of an optimal tracking controller in the presence of false data injection attacks

Info

Publication number
CN109375514A
CN109375514A (application CN201811453386.6A)
Authority
CN
China
Prior art keywords
policy
algorithm
false data
optimal
following
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811453386.6A
Other languages
Chinese (zh)
Other versions
CN109375514B (en)
Inventor
刘皓 (Liu Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shensu Intelligent Agricultural Machinery Equipment Henan Co ltd
Original Assignee
Shenyang Aerospace University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Aerospace University filed Critical Shenyang Aerospace University
Priority to CN201811453386.6A priority Critical patent/CN109375514B/en
Publication of CN109375514A publication Critical patent/CN109375514A/en
Application granted granted Critical
Publication of CN109375514B publication Critical patent/CN109375514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention relates to an intelligent tracking controller that, in the presence of false data injection attacks, computes the optimal tracking control law in real time so that the output of the system can track the reference input. The controller may include different control-algorithm processors, employing adaptive dynamic programming based on game theory and Q-learning, and is applicable when the system dynamics are unknown, and even when only input-output data can be obtained. The present invention is suitable for systems connected to their controllers through wireless networks, or transmitting data over wireless communication networks, and has great application value in UAV formation flight and intelligent vehicles.

Description

Design method of an optimal tracking controller in the presence of false data injection attacks
Technical field
The present invention relates to a method that uses game theory, adaptive dynamic programming, and reinforcement learning to determine an optimal tracking controller for linear discrete-time systems subject to false data injection attacks.
Background technique
Optimal tracking control is an important topic in the control field with a wide range of application backgrounds, for example trajectory tracking of intelligent vehicles and unmanned aerial vehicles, and tracking control of robots. The purpose of optimal tracking control is to make the output of the system track the reference input (or reference trajectory) in an optimal sense, which can be achieved by minimizing a previously given quadratic performance index. It should be pointed out that, with the development and application of network technology, wireless transmission is increasingly used for the remote transmission of data. However, the presence of the wireless network makes the transmitted data vulnerable to adversarial attacks, mainly denial-of-service attacks, replay attacks, and false data injection attacks. Studying optimal tracking control under network attacks therefore has important practical significance. The present invention mainly addresses false data injection attacks.
Traditional optimal tracking control designs the corresponding tracking controller with dynamic programming. However, dynamic programming is a backward-in-time recursion, so it cannot be computed online, and it suffers from the curse of dimensionality. Adaptive dynamic programming belongs to the field of artificial intelligence and is fundamentally based on reinforcement learning theory; it imitates the way humans learn from feedback from a complex environment and solves for the control policy recursively forward in time, so it can be executed online.
Computing the optimal control law by Q-learning may dispense with the system matrices of the original system and the reference trajectory generator, and is therefore suitable when some of the dynamics are unknown. Moreover, this method can iteratively solve for the optimal tracking control policy using only input-output data, without current state information.
Summary of the invention
The present invention aims to propose a design method for an optimal tracking controller of discrete-time systems under false data injection attacks, solving the problem that tracking previously failed in the presence of such attacks. The system structure of the invention is shown in Fig. 1. The technical solution of the present invention is implemented as follows:
1) establish the false data attack model and the augmented system model;
2) using game theory, establish the game model of the attacker and the defender; the defender is the controller and the attacker is the false data injector;
3) establish the Bellman equation and, by optimal control theory, solve for the optimal control policy and attack policy; solve the game algebraic Riccati equation by policy iteration and value iteration;
4) use the Q-function-based reinforcement learning method to solve for the optimal policies of both players, including policy iteration and value iteration;
5) based only on input-output data, iteratively solve for the optimal policy with Q-learning.
Detailed description of the invention
Fig. 1 is the system structure diagram in the presence of false data injection attacks.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the present invention and are not intended to limit it.
With reference to Fig. 1, the invention proposes a method using game theory, adaptive dynamic programming, and Q-learning to solve the optimal tracking control problem of discrete-time systems. The specific embodiment is as follows:
1) Establishing the false data attack model and the augmented model
Consider the following system model
x_{k+1} = A x_k + B u_k   (1)
where A and B are the system matrices. Suppose the control input u_k is attacked during transmission; after the false data injection attack the system model becomes
x_{k+1} = A x_k + B(u_k + Σ_{j=1}^{q} Γ_j a_k^j)   (2)
where q is the number of attackers and Γ_j = diag(γ_{1j}, …, γ_{mj}); γ_{ij} = 1 indicates that the i-th transmission channel is attacked by the j-th attacker, otherwise it is not attacked; a_k^j is the false data injected by the j-th attacker at time k.
Assume the reference trajectory model has the following form
r_{k+1} = T r_k   (3)
where the matrix T is the trajectory model matrix; it should be noted that T need not be Hurwitz. Combining (2) and (3) and defining the augmented state X_k = [x_k^T, r_k^T]^T, the augmented system equation is obtained as follows
X_{k+1} = Ã X_k + B̃(u_k + Σ_{j=1}^{q} Γ_j a_k^j),   Ã = [A 0; 0 T],   B̃ = [B; 0]   (4)
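As a numerical illustration of this step, the block structure of the augmented model can be sketched as follows. The plant, input, and reference matrices are invented placeholders (the patent leaves A, B, and T generic), and a single attacked input channel is assumed:

```python
import numpy as np

# Illustrative matrices: a 2-state plant, one input, and a scalar
# reference generator r_{k+1} = T r_k (these are NOT the patent's).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
T = np.array([[1.0]])            # marginally stable; T need not be Hurwitz

n, m = A.shape[0], B.shape[1]
p = T.shape[0]

# Augmented state X_k = [x_k; r_k] gives
#   X_{k+1} = A_aug X_k + B_aug (u_k + attack terms)
A_aug = np.block([[A,                np.zeros((n, p))],
                  [np.zeros((p, n)), T               ]])
B_aug = np.vstack([B, np.zeros((p, m))])
D_aug = B_aug.copy()             # one attacked channel in this sketch
```

Because the reference generator is driven by neither the control nor the attack, its rows in B_aug are zero; that is what makes the augmented formulation work even when T is only marginally stable.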
2) Using game theory, establishing the game model of the attacker and the defender
In general, controllers take many forms, for example state feedback, output feedback, and dynamic output feedback; likewise, the injected false data can vary widely. The present invention assumes that the tracking controller and the false data are linear functions of the augmented state X_k, i.e.
u_k = K X_k,   a_k = L X_k   (5), (6)
where K = [K_1, K_2] and L = [(L_1)^T, …, (L_q)^T]^T are the feedback gains of the defender and the attacker, respectively. The two players choose the following payoff functions:
where Q_e ≥ 0 and R > 0 are weight matrices and γ ∈ (0,1) is the discount factor. The optimal policies of the defender and the attacker are then given by (9) and (10).
Solving (9) and (10) is equivalent to solving the following game problem:
3) Establishing the Bellman equation and solving the optimal control policy and attack policy by optimal control theory
First, define the following utility function:
Then, by calculation, the following optimal-control Bellman equation is obtained:
According to optimal control theory, the optimal value function is quadratic, V*(X_k) = X_k^T P X_k with P > 0. Hence, by solving the optimality equation, the optimal policies of the two players are obtained as follows:
where
Θ = [(Θ_1)^T, (Θ_2)^T, …, (Θ_q)^T]^T,
L(P) = [(L_1(P))^T, (L_2(P))^T, …, (L_q(P))^T]^T,
and the matrix P > 0 satisfies the following game algebraic Riccati equation:
The result above follows from dynamic programming and can only be computed offline. We now use reinforcement learning to compute the optimal policies of both players online. Algorithms 1 and 2 below give the policy iteration and value iteration procedures, respectively.
Algorithm 1: online policy iteration
1. Initialization: set j = 0 and select stabilizing initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation for P^{j+1}:
3. Policy improvement:
4. Stopping condition: ||K^{j+1} − K^j|| < ε, ||L^{j+1} − L^j|| < ε.
Algorithm 2: value iteration
1. Initialization: set j = 0 and select initial policies K^0 and L^0, which need not be stabilizing.
2. Policy evaluation: solve the following equation for P^{j+1}:
3. Policy improvement:
4. Stopping condition: ||K^{j+1} − K^j|| < ε, ||L^{j+1} − L^j|| < ε.
As can be seen from Algorithm 1, solving equation (17) requires the system data to be known, and the initial policies must be stabilizing, otherwise the equation has no solution. Algorithm 2 improves on this correspondingly: the initial policies no longer need to be stabilizing.
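The value iteration of Algorithm 2 can be sketched on a small discounted zero-sum LQ game. All numbers here are illustrative assumptions (plant A, B, attack channel D, weights Q, R, an attacker penalty Ra, and discount gamma); the recursion is the standard game-Riccati value iteration implied by the Bellman equation above, started from P = 0:

```python
import numpy as np

# Illustrative data: a 2-state plant attacked through its input channel.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = B.copy()                       # attacker shares the input channel
Q = np.eye(2)                      # state weight
R = np.eye(1)                      # control weight
Ra = 10.0 * np.eye(1)              # attacker penalty (assumed large enough)
gamma = 0.9                        # discount factor

def game_blocks(P):
    """Blocks of the quadratic min-max problem at value matrix P."""
    M = np.block([[R + gamma * B.T @ P @ B,   gamma * B.T @ P @ D],
                  [gamma * D.T @ P @ B,      -Ra + gamma * D.T @ P @ D]])
    S = gamma * np.vstack([B.T @ P @ A, D.T @ P @ A])
    return M, S

P = np.zeros((2, 2))               # value iteration may start from zero
for _ in range(1000):
    M, S = game_blocks(P)
    P_next = Q + gamma * A.T @ P @ A - S.T @ np.linalg.solve(M, S)
    if np.linalg.norm(P_next - P) < 1e-12:
        P = P_next
        break
    P = P_next

M, S = game_blocks(P)
KL = -np.linalg.solve(M, S)        # saddle-point gains: u = K x, a = L x
K, L = KL[:1, :], KL[1:, :]
```

The saddle point exists only while the attacker block -Ra + gamma·DᵀPD stays negative definite; the relatively large Ra above guarantees this for the chosen plant.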
4) Solving the optimal policies of both players with the Q-function-based reinforcement learning method
Define the Q-function as follows:
For convenience, write it again in the following compact form:
where
Then, by solving the stationarity equations of the Q-function with respect to u_k and a_k, the following optimal policies of the two players are obtained:
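The stationarity step can be made concrete: once a symmetric Q-matrix H is available, the saddle-point gains follow from a single linear solve over its (u, a) blocks, with no system matrices involved. The partitioning and numbers below are illustrative assumptions, not the patent's:

```python
import numpy as np

# Illustrative dimensions: augmented state n, control m, attack q.
n, m, q = 3, 1, 1
rng = np.random.default_rng(0)

# A stand-in for a learned symmetric Q-matrix H.  For a well-posed saddle
# point the (u, u) block must be positive definite and the (a, a) block
# negative definite; the off-diagonal blocks are arbitrary here.
Hxu = rng.standard_normal((n, m))
Hxa = rng.standard_normal((n, q))
Hua = rng.standard_normal((m, q))
Hxx = np.eye(n)
Huu = 2.0 * np.eye(m)
Haa = -3.0 * np.eye(q)
H = np.block([[Hxx,   Hxu,   Hxa],
              [Hxu.T, Huu,   Hua],
              [Hxa.T, Hua.T, Haa]])

# Stationarity of Q(X, u, a) in (u, a):
#   [Huu Hua; Hua' Haa] [u; a] = -[Hxu'; Hxa'] X
Hblock = np.block([[Huu, Hua], [Hua.T, Haa]])
Hcross = np.vstack([Hxu.T, Hxa.T])
KL = -np.linalg.solve(Hblock, Hcross)
K, L = KL[:m, :], KL[m:, :]        # u = K X, a = L X
```

Setting the gradient of the quadratic form to zero in (u, a) gives exactly the linear system solved above; Huu > 0 and Haa < 0 make the stationary point a min-max saddle rather than a pure minimum.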
Substituting (20) into (19) yields the Bellman equation based on the Q-function, an important equation in the iterative process. The Q-function-based policy iteration and value iteration methods are given in Algorithm 3 and Algorithm 4, respectively.
Algorithm 3: Q-function-based policy iteration algorithm
1. Initialization: set j = 0 and select H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}:
3. Policy improvement:
4. Stopping condition: ||H^{j+1} − H^j|| < ε.
Algorithm 4: Q-function-based value iteration algorithm
1. Initialization: set j = 0 and select H^0 = (H^0)^T.
2. Policy evaluation: solve the following equation for H^{j+1}:
3. Policy improvement:
4. Stopping condition: ||H^{j+1} − H^j|| < ε.
It is worth noting that the Q-function-based iterative Algorithms 3 and 4 do not require prior knowledge of the augmented system matrices.
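As a sketch of what a model-free variant of Algorithm 4 can look like, the quadratic Q-matrix below is re-fitted by least squares from simulated transitions at every iteration, so the system matrices appear only in the data generator and never in the learning update. All dynamics, dimensions, and weights are invented for illustration:

```python
import numpy as np

# Illustrative stand-ins (not the patent's matrices): the plant is used only
# to *generate* transitions; the learning update itself touches data alone.
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = B.copy()
Q, R, Ra, gamma = np.eye(2), np.eye(1), 10.0 * np.eye(1), 0.9
n, m, q = 2, 1, 1
d = n + m + q

idx = [(i, j) for i in range(d) for j in range(i, d)]

def phi(z):
    """Quadratic features of z = [X; u; a] for a symmetric parameterization."""
    return np.array([(2.0 if i != j else 1.0) * z[i] * z[j] for i, j in idx])

def unpack(h):
    """Parameter vector back to the symmetric matrix H."""
    H = np.zeros((d, d))
    for c, (i, j) in enumerate(idx):
        H[i, j] = H[j, i] = h[c]
    return H

P = np.zeros((n, n))                       # V_0 = 0
for _ in range(400):
    X = rng.standard_normal((60, n))       # exploratory states and actions
    U = rng.standard_normal((60, m))
    Aa = rng.standard_normal((60, q))
    Xn = X @ A.T + U @ B.T + Aa @ D.T      # simulated one-step transitions
    cost = (np.einsum('ki,ij,kj->k', X, Q, X)
            + np.einsum('ki,ij,kj->k', U, R, U)
            - np.einsum('ki,ij,kj->k', Aa, Ra, Aa))
    target = cost + gamma * np.einsum('ki,ij,kj->k', Xn, P, Xn)
    Phi = np.vstack([phi(np.concatenate(z)) for z in zip(X, U, Aa)])
    H = unpack(np.linalg.lstsq(Phi, target, rcond=None)[0])
    Hblock, Hcross = H[n:, n:], H[n:, :n]  # (u,a) block and its state coupling
    P_next = H[:n, :n] - Hcross.T @ np.linalg.solve(Hblock, Hcross)
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

KL = -np.linalg.solve(Hblock, Hcross)      # learned saddle-point gains [K; L]
```

Because each least-squares target is exactly quadratic in (X, u, a), the fitted H reproduces the model-based value-iteration update whenever the batch is rich enough; exploration noise on both u and a plays the role of the persistency-of-excitation condition.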
5) Iteratively solving the optimal policy by Q-learning based on input-output data
Assume the system is observable; then the system state X_k can be expressed by the following input-output sequence:
where
As can be seen from the above equation, there exists a constant κ > 0 such that rank(V_N) < n + p for N < κ and rank(V_N) = n + p for N ≥ κ, where n is the state dimension of the original system and p is the output dimension. Therefore, choosing N ≥ κ makes the matrix V_N have full column rank. Define
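The horizon κ can be found numerically as the smallest N for which the stacked observability matrix has full column rank. The sketch below uses an illustrative pair (A, C) and, for simplicity, reconstructs an initial state from outputs alone; the patent's V_N additionally involves the input sequence:

```python
import numpy as np

# Illustrative observable pair (not the patent's matrices).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
n = A.shape[0]

def stacked_observability(A, C, N):
    """V_N = [C; CA; ...; CA^{N-1}]."""
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(N)])

# Smallest horizon with full column rank (kappa in the text).
kappa = next(N for N in range(1, n + 1)
             if np.linalg.matrix_rank(stacked_observability(A, C, N)) == n)

# Reconstructing an unknown initial state from kappa output samples of the
# autonomous dynamics x_{k+1} = A x_k, y_k = C x_k.
x0 = np.array([1.5, -0.7])
ys = [C @ np.linalg.matrix_power(A, i) @ x0 for i in range(kappa)]
x0_hat = np.linalg.lstsq(stacked_observability(A, C, kappa),
                         np.concatenate(ys), rcond=None)[0]
```

For this pair the second output row C·A already supplies the missing direction, so κ equals the state dimension; in general κ is bounded above by n + p for the augmented system, which is why N ≥ κ guarantees full column rank of V_N.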
Then the Q-function can be written in the following form:
Therefore, the optimal policies of the two players are obtained as
where
The Bellman equation based on the Q-function and input-output data can be written as
Linearly parameterizing the Q-function yields
In the above formula, the unknown matrix is symmetric, so only its upper-triangular elements need to be identified. Based on the above analysis, Algorithms 5 and 6 give the policy iteration and value iteration methods using Q-learning, respectively; these methods use only input-output data.
Algorithm 5: policy iteration algorithm using Q-learning
1. Initialization: set j = 0 and select stabilizing initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation for h^{j+1}:
3. Policy improvement:
4. Stopping condition: ||H^{j+1} − H^j|| < ε.
Algorithm 6: value iteration algorithm using Q-learning
1. Initialization: set j = 0 and select arbitrary initial policies K^0 and L^0.
2. Policy evaluation: solve the following equation for h^{j+1}:
3. Policy improvement:
4. Stopping condition: ||H^{j+1} − H^j|| < ε.
As can be seen from Algorithm 6, the initial policies of the two players need not be stabilizing. In addition, the number of samples used in the recursive computation must satisfy a corresponding lower bound.

Claims (5)

1. A design method of an optimal tracking controller in the presence of false data injection attacks, characterized by comprising the following steps:
Step 1: establish the false data attack model and the augmented system model;
Step 2: using game theory, establish the game model of the attacker and the defender;
Step 3: use the Q-function-based reinforcement learning method to solve for the optimal policies of both players, including policy iteration and value iteration;
Step 4: based on input-output data, iteratively solve for the optimal policy with Q-learning.
2. The design method of an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 1 specifically comprises:
Consider the following system model:
x_{k+1} = A x_k + B u_k
where A and B are the system matrices. If the control input u_k is attacked during transmission, the system model after the false data injection attack becomes:
x_{k+1} = A x_k + B(u_k + Σ_{j=1}^{q} Γ_j a_k^j)
where q is the number of attackers; γ_{ij} = 1 indicates that the i-th transmission channel is attacked by the j-th attacker, otherwise it is not attacked; a_k^j is the false data injected by the j-th attacker at time k;
Assume the reference trajectory model has the following form:
r_{k+1} = T r_k
where the matrix T is the trajectory model matrix; the augmented system can then be stated as:
3. The design method of an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 2 specifically comprises:
Assume that the tracking controller and the false data are linear functions of the augmented state X_k, i.e.
where K = [K_1, K_2] and L are the feedback gains of the defender and the attacker, respectively.
The two players choose the following payoff functions:
where γ ∈ (0,1) is the discount factor and Q_e and R are given positive semidefinite and positive definite matrices, respectively; the optimal policies of the defender and the attacker are designed as:
4. The design method of an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 3 specifically comprises:
Define the Q-function as follows:
By solving the stationarity equations of the Q-function with respect to u_k and a_k, the following optimal policies of the two players are obtained:
where the Q-function-based policy iteration and value iteration methods are given in Algorithm 1 and Algorithm 2, respectively;
Algorithm 1: the Q-function-based policy iteration algorithm comprises the following steps:
1) Initialization: set j = 0 and select H^0 = (H^0)^T;
2) Policy evaluation: solve the following equation for H^{j+1}:
3) Policy improvement:
4) Stopping condition: ||H^{j+1} − H^j|| < ε;
Algorithm 2: the Q-function-based value iteration algorithm comprises the following steps:
1) Initialization: set j = 0 and select H^0 = (H^0)^T;
2) Policy evaluation: solve the following equation for H^{j+1}:
3) Policy improvement:
4) Stopping condition: ||H^{j+1} − H^j|| < ε.
5. The design method of an optimal tracking controller in the presence of false data injection attacks according to claim 1, characterized in that step 4 specifically comprises:
The system state X_k can be expressed by the following input-output sequence:
Then the Q-function can be written in the following form:
Therefore, the optimal policies of the two players are:
where
the policy iteration and value iteration methods using Q-learning are given in Algorithm 3 and Algorithm 4, respectively:
Algorithm 3: the policy iteration algorithm using Q-learning comprises the following steps:
1) Initialization: set j = 0 and select stabilizing initial policies K^0 and L^0;
2) Policy evaluation: solve the following equation for h^{j+1}:
3) Policy improvement:
4) Stopping condition: ||H^{j+1} − H^j|| < ε;
Algorithm 4: the value iteration algorithm using Q-learning comprises the following steps:
1) Initialization: set j = 0 and select arbitrary initial policies K^0 and L^0;
2) Policy evaluation: solve the following equation for h^{j+1}:
3) Policy improvement:
4) Stopping condition: ||H^{j+1} − H^j|| < ε.
CN201811453386.6A 2018-11-30 2018-11-30 Design method of optimal tracking controller in presence of false data injection attack Active CN109375514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811453386.6A CN109375514B (en) 2018-11-30 2018-11-30 Design method of optimal tracking controller in presence of false data injection attack


Publications (2)

Publication Number Publication Date
CN109375514A (en) 2019-02-22
CN109375514B CN109375514B (en) 2021-11-05

Family

ID=65376219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811453386.6A Active CN109375514B (en) 2018-11-30 2018-11-30 Design method of optimal tracking controller in presence of false data injection attack

Country Status (1)

Country Link
CN (1) CN109375514B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2140650B1 (en) * 2007-03-30 2011-05-25 International Business Machines Corporation Method and system for resilient packet traceback in wireless mesh and sensor networks
CN104994569A (en) * 2015-06-25 2015-10-21 厦门大学 Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method
CN106937295A (en) * 2017-02-22 2017-07-07 沈阳航空航天大学 Heterogeneous network high energy efficiency power distribution method based on game theory
CN107038477A (en) * 2016-08-10 2017-08-11 哈尔滨工业大学深圳研究生院 A kind of neutral net under non-complete information learns the estimation method of combination with Q
CN107819785A (en) * 2017-11-28 2018-03-20 东南大学 A kind of double-deck defence method towards power system false data injection attacks
CN108181816A (en) * 2018-01-05 2018-06-19 南京航空航天大学 A kind of synchronization policy update method for optimally controlling based on online data
CN108196448A (en) * 2017-12-25 2018-06-22 北京理工大学 False data injection attacks method based on inaccurate mathematical model
CN108512837A (en) * 2018-03-16 2018-09-07 西安电子科技大学 A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAO LIU et al.: "Optimal Tracking Control of Linear Discrete-Time Systems Under Cyber Attacks", IFAC 2020 *
YING CHEN et al.: "Evaluation of Reinforcement Learning Based False Data Injection Attack to Automatic Voltage Control", IEEE *
YUZHE LI et al.: "SINR-based DoS Attack on Remote State Estimation: A Game-theoretic Approach", IEEE *
LIU Hao: ""Attack and Defense" of Cyber-Physical Systems" (信息物理系统的"攻与防"), Journal of Shenyang Aerospace University *
TIAN Jiwei et al.: "Optimal Defense Strategy Against Load Redistribution Attacks Based on Game Theory" (基于博弈论的负荷重分配攻击最佳防御策略), Computer Simulation (计算机仿真) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109932905A (en) * 2019-03-08 2019-06-25 辽宁石油化工大学 A kind of optimal control method of the Observer State Feedback based on non-strategy
CN109932905B (en) * 2019-03-08 2021-11-09 辽宁石油化工大学 Optimization control method based on non-strategy observer state feedback
CN110083064B (en) * 2019-04-29 2022-02-15 辽宁石油化工大学 Network optimal tracking control method based on non-strategy Q-learning
CN110083064A (en) * 2019-04-29 2019-08-02 辽宁石油化工大学 A kind of network optimal track control method based on non-strategy Q- study
CN111273543A (en) * 2020-02-15 2020-06-12 西北工业大学 PID optimization control method based on strategy iteration
CN111273543B (en) * 2020-02-15 2022-10-04 西北工业大学 PID optimization control method based on strategy iteration
CN111673750A (en) * 2020-06-12 2020-09-18 南京邮电大学 Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack
CN111673750B (en) * 2020-06-12 2022-03-04 南京邮电大学 Speed synchronization control scheme of master-slave type multi-mechanical arm system under deception attack
CN112149361A (en) * 2020-10-10 2020-12-29 中国科学技术大学 Adaptive optimal control method and device for linear system
CN112149361B (en) * 2020-10-10 2024-05-17 中国科学技术大学 Self-adaptive optimal control method and device for linear system
CN112650057B (en) * 2020-11-13 2022-05-20 西北工业大学深圳研究院 Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain
CN112650057A (en) * 2020-11-13 2021-04-13 西北工业大学深圳研究院 Unmanned aerial vehicle model prediction control method based on anti-spoofing attack security domain
CN113885330A (en) * 2021-10-26 2022-01-04 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN113885330B (en) * 2021-10-26 2022-06-17 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN114415633A (en) * 2022-01-10 2022-04-29 云境商务智能研究院南京有限公司 Security tracking control method based on dynamic event trigger mechanism under multi-network attack
CN114415633B (en) * 2022-01-10 2024-02-02 云境商务智能研究院南京有限公司 Security tracking control method based on dynamic event triggering mechanism under multi-network attack
CN115877871A (en) * 2023-03-03 2023-03-31 北京航空航天大学 Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning

Also Published As

Publication number Publication date
CN109375514B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN109375514A (en) A kind of optimal track control device design method when the injection attacks there are false data
Yan et al. A path planning algorithm for UAV based on improved Q-learning
CN108803349B (en) Optimal consistency control method and system for nonlinear multi-agent system
Duan et al. Imperialist competitive algorithm optimized artificial neural networks for UCAV global path planning
Givigi et al. A reinforcement learning adaptive fuzzy controller for differential games
Yu et al. Distributed multi‐agent deep reinforcement learning for cooperative multi‐robot pursuit
Fang et al. Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning
Schultz et al. Improving tactical plans with genetic algorithms
Wei et al. Recurrent MADDPG for object detection and assignment in combat tasks
Yue et al. Deep reinforcement learning for UAV intelligent mission planning
Liu et al. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning
CN111811532B (en) Path planning method and device based on impulse neural network
CN115047907B (en) Air isomorphic formation command method based on multi-agent PPO algorithm
Xiao et al. Graph attention mechanism based reinforcement learning for multi-agent flocking control in communication-restricted environment
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
Xu et al. Pursuit and evasion game between UVAs based on multi-agent reinforcement learning
Esrafilian et al. Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks
Yang et al. Learning graph-enhanced commander-executor for multi-agent navigation
Zhao et al. Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games
CN116165886A (en) Multi-sensor intelligent cooperative control method, device, equipment and medium
Tuba et al. Water cycle algorithm for robot path planning
Lin et al. Choice of discount rate in reinforcement learning with long-delay rewards
Liu et al. A distributed driving decision scheme based on reinforcement learning for autonomous driving vehicles
Bromo Reinforcement Learning Based Strategic Exploration Algorithm for UAVs Fleets
Yang et al. An interrelated imitation learning method for heterogeneous drone swarm coordination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220718

Address after: 452370 Building 2, Xingfu industrial new town, Micun Town, Xinmi City, Zhengzhou City, Henan Province

Patentee after: Shensu intelligent agricultural machinery equipment (Henan) Co.,Ltd.

Address before: 110136, Liaoning, Shenyang, Shenbei New Area moral South Avenue No. 37

Patentee before: SHENYANG AEROSPACE University