CN113268081B - Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning - Google Patents
- Publication number
- CN113268081B CN113268081B CN202110602580.1A CN202110602580A CN113268081B CN 113268081 B CN113268081 B CN 113268081B CN 202110602580 A CN202110602580 A CN 202110602580A CN 113268081 B CN113268081 B CN 113268081B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
The invention discloses a reinforcement-learning-based command decision method for small unmanned aerial vehicle prevention and control, comprising the following steps: determining the composition of a small unmanned aerial vehicle prevention and control system, which comprises a detection subsystem, a disposal subsystem and a command control system, wherein the detection subsystem provides combat situation information and the disposal subsystem implements prevention and control disposal; establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle; constructing a prevention and control command decision model; training and optimizing the decision model; and verifying and evaluating its prevention and control effect. The invention also discloses a reinforcement-learning-based small unmanned aerial vehicle prevention and control command decision system, comprising a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module. The invention solves the low decision speed and the difficulty in handling complex scenes of existing prevention and control command decision systems, and can be widely applied to small unmanned aerial vehicle management and control, civil supervision and military fields.
Description
Technical Field
The invention belongs to the technical field of command control, and particularly relates to a small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning.
Background
At present, many mature technologies and achievements exist at home and abroad for the detection and processing of "low, slow and small" unmanned aerial vehicles. However, for problems such as generating a specific disposal strategy from detection information and constructing a small unmanned aerial vehicle prevention and control command decision system, a commander still needs to make decisions manually, and an operator completes the corresponding disposal instructions for the unmanned aerial vehicle according to the decision result.
Considering the current level of intelligent technology in command control systems, the existing small unmanned aerial vehicle prevention and control command control systems mainly have the following problems: (1) the prevention and control work is mainly completed manually by operators, and the degree of command automation is extremely low; (2) small unmanned aerial vehicle prevention and control belongs to short-range defense, which requires short command decision times and fast responses; the response time of manual operation can hardly meet the defense requirement, and the gap is even more obvious when dealing with multiple targets; (3) the situation of small unmanned aerial vehicles is complex and changeable, and existing prevention and control systems and processes based on empirical rules can hardly adapt to the prevention and control requirements. No existing product or small unmanned aerial vehicle prevention and control command decision system applies a decision method based on a reinforcement-learning-trained algorithm model.
Disclosure of Invention
Aiming at the automatic generation of process strategies such as detection, analysis, prevention and control command, scheduling and handling of low-altitude targets such as small unmanned aerial vehicles in complex scenes such as cities, the invention discloses a reinforcement-learning-based small unmanned aerial vehicle prevention and control command decision method and system. The method efficiently converts comprehensive situation data for small unmanned aerial vehicle prevention and control into disposal schemes and instructions for the unmanned aerial vehicle, can access multi-source detection means and multiple disposal means for command decision, effectively improves the intelligent decision level in the four command flow stages of situation fusion, threat analysis, plan generation and disposal control, solves the problems of low decision speed and difficulty in handling complex scenes in existing prevention and control command systems, and meets the prevention and control requirements for small unmanned aerial vehicles. A small unmanned aerial vehicle generally refers to an unmanned aerial vehicle with a takeoff weight of no more than 25 kilograms, including fixed-wing and rotary-wing types, and is characterized by low cost and strong maneuverability.
The invention discloses a small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning, which comprises the following steps:
s1, determining the composition of a small unmanned aerial vehicle prevention and control system;
s2, establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle;
s3, constructing a small unmanned aerial vehicle prevention and control command decision model;
s4, training and optimizing a small unmanned aerial vehicle prevention and control command decision model;
and S5, verifying and evaluating the prevention and control effect of the small unmanned aerial vehicle prevention and control command decision model.
Further, the step S1 specifically includes: determining the composition of the small unmanned aerial vehicle prevention and control system, which comprises a detection subsystem, a disposal subsystem and a command control system. The detection subsystem provides combat situation information; the disposal subsystem is responsible for implementing prevention and control disposal; and the command control system receives the combat situation information from the detection subsystem and schedules multiple disposal means to generate a disposal strategy. The detection subsystem comprises single-type or multi-type detection equipment, and the disposal subsystem comprises multiple types of soft-kill and hard-interception disposal equipment. The command control system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
specifically, the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, and the treatment subsystem comprises radio interference equipment and laser interception equipment;
further, the step S2 specifically includes: in the unmanned aerial vehicle prevention and control operation, mainly prevent and control the processing according to information such as the target position that the subsystem obtained of surveying, speed, consequently the key is the model of research prevention and control object in the prevention and control operation, regards unmanned aerial vehicle as the particle, establishes its three degree of freedom particle motion model:
wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in a three-dimensional space coordinate system of the earth, v, theta and psi respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle, and t represents time.
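As a concrete illustration, the point-mass kinematics can be stepped forward numerically. This is a minimal sketch assuming simple Euler integration and the conventional point-mass sign conventions (the patent itself does not prescribe an integration scheme):

```python
import math

def step_3dof(x, y, z, v, theta, psi, dt):
    """One Euler step of the three-degree-of-freedom point-mass model:
    dx/dt = v*cos(theta)*cos(psi), dy/dt = v*cos(theta)*sin(psi), dz/dt = v*sin(theta)."""
    x += v * math.cos(theta) * math.cos(psi) * dt
    y += v * math.cos(theta) * math.sin(psi) * dt
    z += v * math.sin(theta) * dt
    return x, y, z

# Level flight (theta = 0) heading along the x-axis (psi = 0) at 10 m/s:
pos = (0.0, 0.0, 50.0)
for _ in range(10):          # simulate 1 s in 0.1 s steps
    pos = step_3dof(*pos, v=10.0, theta=0.0, psi=0.0, dt=0.1)
print(pos)
```

With pitch angle θ = 0 the altitude stays constant and the drone advances v·Δt along its heading at each step.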
Further, the step S3 specifically includes: the disposal equipment of the unmanned aerial vehicle prevention and control system comprises laser interception equipment and radio interference equipment. The actions of the laser equipment comprise four actions: opening the laser equipment, closing the laser equipment, keeping the equipment state, and adjusting the laser pointing direction; the actions of the radio interference equipment comprise four actions: opening interference, closing interference, keeping the current action, and adjusting the interference pointing direction. Each action of the disposal equipment is encoded with a three-bit binary number, in which the first bit represents the equipment type and the last two bits represent the specific action of that equipment; that is, the action taken by the disposal equipment of the prevention and control system is represented by the triple formed by the three binary bits.
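The three-bit action coding can be sketched as follows; the concrete bit-to-action assignments are illustrative assumptions, since the text fixes only the layout (one equipment bit, two action bits):

```python
# Equipment bit: 0 = laser interception equipment, 1 = radio interference equipment.
# Action bits (assumed assignment): 00 = open, 01 = close, 10 = hold, 11 = adjust pointing.
DEVICES = {0: "laser", 1: "jammer"}
ACTIONS = {(0, 0): "open", (0, 1): "close", (1, 0): "hold", (1, 1): "adjust pointing"}

def decode(triple):
    """Map a (D_t, D_a1, D_a2) bit triple to a human-readable command."""
    d_t, d_a1, d_a2 = triple
    return f"{DEVICES[d_t]}: {ACTIONS[(d_a1, d_a2)]}"

# All 8 encodable commands (2 devices x 4 actions):
for code in [(d, a1, a2) for d in (0, 1) for a1 in (0, 1) for a2 in (0, 1)]:
    print(code, "->", decode(code))
```

Three bits thus span the full discrete action set of the disposal subsystem.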
According to the characteristics of the small unmanned aerial vehicle prevention and control task and the Markov decision process, a small unmanned aerial vehicle prevention and control command decision model is established, a state space and a disposal decision space are designed, and a reward function is determined according to the prevention and control intention of a small unmanned aerial vehicle prevention and control system;
the small unmanned aerial vehicle control command decision model is established by adopting a reinforcement learning algorithm, interaction between the intelligent decision model and the environment is described by adopting a Markov decision process in reinforcement learning, and the Markov decision process is realized by utilizing a state space, an action space, a reward function and a discount coefficient;
the expression of the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [d_t, v_t, θ_t, ψ_t, t_l, t_j],

wherein the expression of d_t is:

d_t = √((x_t − x_a)² + (y_t − y_a)² + (z_t − z_a)²),

wherein (x_t, y_t, z_t) and (x_{t−Δt}, y_{t−Δt}, z_{t−Δt}) respectively represent the position coordinates of the small unmanned aerial vehicle at time t and time t − Δt, (x_a, y_a, z_a) represents the position coordinates of the detection equipment, and Δt represents the stepping time interval of the Markov decision process; d_t represents the distance between the small unmanned aerial vehicle and the detection equipment at time t; v_t represents the flight speed of the small unmanned aerial vehicle at time t; t_l represents the light-emitting time of the laser interception equipment; t_j represents the on-time of the radio interference equipment; θ and ψ denote the pitch angle and yaw angle of the unmanned aerial vehicle, respectively.
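A small sketch of assembling the state vector S = [d_t, v_t, θ_t, ψ_t, t_l, t_j]; taking d_t as the Euclidean drone-to-sensor distance and estimating v_t by a finite difference over Δt are assumptions consistent with the definitions above:

```python
import math

def build_state(p_t, p_prev, p_a, theta, psi, t_l, t_j, dt):
    """Assemble S = [d_t, v_t, theta_t, psi_t, t_l, t_j] from raw detection data."""
    d_t = math.dist(p_t, p_a)            # drone-to-sensor distance at time t
    v_t = math.dist(p_t, p_prev) / dt    # speed estimated from positions at t and t - dt
    return [d_t, v_t, theta, psi, t_l, t_j]

s = build_state(p_t=(30.0, 40.0, 0.0), p_prev=(30.0, 39.0, 0.0),
                p_a=(0.0, 0.0, 0.0), theta=0.0, psi=1.57, t_l=0.0, t_j=0.0, dt=0.1)
print(s)
```

The sensor position p_a and the step interval dt come from the detection subsystem configuration and the Markov decision process, respectively.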
The expression of the action space A of the unmanned aerial vehicle prevention and control command decision model is A = [D_t, D_a1, D_a2], wherein the equipment type D_t takes the value 0 or 1, and the action type of the equipment is determined by the combination of the action variables D_a1 and D_a2; the specific values of [D_a1, D_a2] include the four combinations 00, 01, 10 and 11.
When the prevention and control intention of the small unmanned aerial vehicle prevention and control system is to prevent and control targets at medium and long range, the defense success condition is expressed through reward functions over the flight components of the small unmanned aerial vehicle,

wherein R_a, R_d and R_v respectively represent the angle reward function, the distance reward function and the speed reward function. q represents the included angle between the velocity vector of the small unmanned aerial vehicle and the line connecting it with the detection equipment; q_m represents the angle at which the angle reward takes its minimum positive value; the angle reward takes different values depending on whether the detection equipment lies within or outside the line-of-sight angle range of the unmanned aerial vehicle; when q = 0 the angle reward value is minimal, and when q = π it is maximal. The distance reward function is expressed as a linear function of the distance, k is a smoothing coefficient keeping the distance reward function at its minimum positive reward value, and d_f and d_c respectively represent the maximum radius of the prevention and control area of the small unmanned aerial vehicle and the minimum detection distance of the detection equipment. The speed reward uses reward coefficients corresponding to the flight speed of the small unmanned aerial vehicle being below a certain flight speed threshold or above the maximum flight speed threshold; v_min, v_max and v_xh respectively represent the minimum flight speed, the maximum flight speed and the cruising flight speed of the small unmanned aerial vehicle.
R_a, R_d and R_v are weighted and summed to obtain the expression of the reward function R of the small unmanned aerial vehicle prevention and control command decision model, specifically:

R = a_1·R_a + a_2·R_d + a_3·R_v,

wherein a_1, a_2 and a_3 are the weights corresponding to the angle reward function, the distance reward function and the speed reward function; they can be obtained from empirical values and satisfy the constraints a_1 + a_2 + a_3 = 1 and a_1, a_2, a_3 ≥ 0.
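The weighted combination can be sketched directly; the component reward values and weights passed in below are placeholder assumptions, since the exact shapes of R_a, R_d and R_v are given by the formulas described in the text above:

```python
def combined_reward(R_a, R_d, R_v, a=(0.4, 0.4, 0.2)):
    """R = a1*R_a + a2*R_d + a3*R_v, with a1 + a2 + a3 = 1 and a_i >= 0."""
    a1, a2, a3 = a
    # Enforce the constraint on the empirically chosen weights.
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9 and min(a) >= 0.0
    return a1 * R_a + a2 * R_d + a3 * R_v

r = combined_reward(R_a=1.0, R_d=0.5, R_v=0.0)
print(r)  # approximately 0.6
```

Retuning the weight tuple shifts the prevention and control intention, e.g. emphasizing distance keeps the drone outside the defended area.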
Further, the step S4 specifically includes: training the small unmanned aerial vehicle prevention and control command decision model with the Deep Q Network algorithm (DQN for short) until the model can generate prevention and control disposal strategies for driving away or destructively striking small unmanned aerial vehicles executing different tasks (such as strike and reconnaissance). When the defense success rate of the strategies exceeds a certain threshold, training is stopped and the neural network model parameters at that moment are saved, completing the training and optimization of the small unmanned aerial vehicle prevention and control command decision model.
In the DQN algorithm, a value evaluation network and a value target network are constructed. The output of the value evaluation network is denoted Q(s, a | θ); its inputs are the disposal action variable a taken at the previous moment and the state variable s at the current moment, its output determines the disposal action variable taken at the next moment, and its parameters are θ. The value evaluation network updates and optimizes θ by minimizing the difference between its state-action value and that of the value target network, and the Q(s, a | θ) value is output directly by the network. The output of the value target network is denoted Q(s, a | θ⁻); its inputs are likewise the disposal action variable a taken at the previous moment and the current state variable s, and its parameters are θ⁻. The target value combines the value target network output with the reward r_j, specifically:

y_j = r_j + γ · max_{a_{j+1}} Q(s_{j+1}, a_{j+1} | θ⁻),

L(θ) = E_j[(y_j − Q(s_j, a_j | θ))²],

wherein the index j indicates the j-th data item in the data set sampled from the experience pool; r_j indicates the reward corresponding to the j-th data item; s_j and a_j indicate its state variable and disposal action variable; s_{j+1} and a_{j+1} indicate the state variable and disposal action variable corresponding to the (j+1)-th data item in the sampled data set; γ is the reward discount factor; L(θ) represents the loss function used when training the value evaluation network with parameter θ; and max_{a_{j+1}} Q(s_{j+1}, a_{j+1} | θ⁻) is the maximum value target network output obtained by taking action a_{j+1} in state s_{j+1}. Finally, the least-squares error between the target value formed with the value target network and the value evaluation network prediction is obtained.
For the value evaluation network, the parameter θ is updated along the gradient so as to reduce the loss, a process expressed as:

θ ← θ + α · (y_j − Q(s_j, a_j | θ)) · ∇_θ Q(s_j, a_j | θ),

wherein ∇_θ Q(s_j, a_j | θ) represents the gradient of the Q-value function with respect to the parameter θ for state variable s_j and action variable a_j, ∇_θ L(θ) represents the gradient of the loss function L(θ) with respect to θ, and α is the learning rate. By temporarily freezing the value target network parameters, the value evaluation network parameter θ is copied to the value target network parameter θ⁻ only after a certain number of training periods of the value evaluation network; this keeps the value target network fixed in stages and improves the stability of algorithm training;
the value target network and the value evaluation network both adopt a neural network architecture formed by full connection layers, 3 full connection layers are arranged on the value target network and the value evaluation network, and 200, 100 and 50 neurons are respectively selected from the 3 full connection layers.
Further, the step S5 specifically includes: loading the small unmanned aerial vehicle prevention and control command decision model trained in step S4 in an actual prevention and control scene; making decisions according to the state space obtained in real time from the scene to obtain the disposal action variable a; and applying a to the actual scene, which immediately yields the small unmanned aerial vehicle prevention and control strategy, changes the environmental state, and produces real-time reward feedback.
The invention discloses a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning, which comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module, wherein the four modules are sequentially connected;
the multi-source data fusion module is used for fusing data acquired by detecting the prevention and control environment and the target by the multi-type detection equipment;
the situation analysis module is used for performing attribute analysis and judgment and threat assessment on multi-source target data obtained by the multi-type detection equipment;
the control planning module is used for realizing the small unmanned aerial vehicle control decision method based on reinforcement learning to obtain a small unmanned aerial vehicle control command decision model, and automatically generating a small unmanned aerial vehicle control disposal decision scheme according to threat judgment information obtained by the situation analysis module;
the effect evaluation module analyzes and processes the real-time prevention and control environment situation, the damage degree of the prevention and control target and the specific striking effect of the prevention and control disposal equipment, evaluates the prevention and control effect of the prevention and control disposal decision scheme of the small unmanned aerial vehicle, and provides real-time feedback for the prevention and control command decision action of the unmanned aerial vehicle.
Further, the multi-source data fusion module extracts, manages and organizes information of data obtained by the multi-type detection equipment according to the prevention and control target type, the prevention and control environment elements, the prevention and control target elements, the disposal elements and the like;
furthermore, the situation analysis module performs attribute analysis and judgment on the multi-source target data throughout the prevention and control process, constructs a threat level model for threat assessment to obtain threat judgment information used to grasp the threat degree of relevant targets, and uploads the threat judgment information to the prevention and control planning module.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a reinforcement-learning-based small unmanned aerial vehicle prevention and control command decision method and system, combining reinforcement learning theory with a small unmanned aerial vehicle prevention and control decision model, realizing the automatic processing of comprehensive situation data for small unmanned aerial vehicle prevention and control and efficiently generating disposal schemes and instructions for the unmanned aerial vehicle from these data;
(2) The invention provides a reinforcement-learning-based small unmanned aerial vehicle prevention and control command decision method and system, which realize situation fusion, threat analysis, plan generation and disposal control, improve the intelligent decision level across the four stages of the unmanned aerial vehicle prevention and control command flow, solve the problems of low decision speed and difficulty in handling complex scenes in existing prevention and control command decision systems, and provide a new technical approach for small unmanned aerial vehicle prevention and control command decision.
(3) The invention provides a method and a system for small unmanned aerial vehicle prevention and control command decision based on reinforcement learning, which can be widely applied to the fields of small unmanned aerial vehicle management and control, civil supervision and military.
Drawings
Fig. 1 is a flow chart of a control command decision method of a small unmanned aerial vehicle based on reinforcement learning according to the invention;
FIG. 2 is a flow chart of a deep Q network algorithm in the present invention;
fig. 3 is a composition diagram of a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning.
Detailed Description
For a better understanding of the present disclosure, an example is given here.
In order to facilitate understanding of those skilled in the art, the method and system for unmanned aerial vehicle prevention and control command decision based on reinforcement learning provided by the invention are further described in detail with reference to the accompanying drawings and specific embodiments.
The invention discloses a small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning, which comprises the following steps:
s1, determining the composition of a small unmanned aerial vehicle prevention and control system;
s2, constructing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle;
s3, constructing a small unmanned aerial vehicle prevention and control command decision model;
s4, training and optimizing a small unmanned aerial vehicle prevention and control command decision model;
and S5, verifying and evaluating the prevention and control effect of the small unmanned aerial vehicle prevention and control command decision model.
Further, the step S1 specifically includes: determining the composition of the small unmanned aerial vehicle prevention and control system, which comprises a detection subsystem, a disposal subsystem and a command control system. The detection subsystem provides combat situation information; the disposal subsystem is responsible for implementing prevention and control disposal; and the command control system receives the combat situation information from the detection subsystem and schedules multiple disposal means to generate a disposal strategy. The detection subsystem comprises single-type or multi-type detection equipment, and the disposal subsystem comprises multiple types of soft-kill and hard-interception disposal equipment. The command control system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
specifically, the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, and the treatment subsystem comprises radio interference equipment and laser interception equipment;
Further, the step S2 specifically includes: in unmanned aerial vehicle prevention and control operations, disposal is mainly carried out according to information such as the target position and speed obtained by the detection subsystem, so the key in prevention and control operations is to study the model of the prevention and control object. Regarding the unmanned aerial vehicle as a particle, its three-degree-of-freedom particle motion model is established:

dx/dt = v·cosθ·cosψ,
dy/dt = v·cosθ·sinψ,
dz/dt = v·sinθ,

wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in an earth-fixed three-dimensional coordinate system, v, θ and ψ respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle, and t represents time.
Further, the step S3 specifically includes: the disposal equipment of the unmanned aerial vehicle prevention and control system comprises laser interception equipment and radio interference equipment. The actions of the laser equipment comprise four actions: opening the laser equipment, closing the laser equipment, keeping the equipment state, and adjusting the laser pointing direction; the actions of the radio interference equipment comprise four actions: opening interference, closing interference, keeping the current action, and adjusting the interference pointing direction. Each action of the disposal equipment is encoded with a three-bit binary number, in which the first bit represents the equipment type and the last two bits represent the specific action of that equipment; that is, the action taken by the disposal equipment of the prevention and control system is represented by the triple formed by the three binary bits.
According to the characteristics of the small unmanned aerial vehicle prevention and control task and the Markov decision process, a small unmanned aerial vehicle prevention and control command decision model is established, a state space and a disposal decision space are designed, and a reward function is determined according to the prevention and control intention of a small unmanned aerial vehicle prevention and control system;
the small unmanned aerial vehicle control command decision model is established by adopting a reinforcement learning algorithm, interaction between the intelligent decision model and the environment is described by adopting a Markov decision process in reinforcement learning, and the Markov decision process is realized by utilizing a state space, an action space, a reward function and a discount coefficient;
the expression of the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [d_t, v_t, θ_t, ψ_t, t_l, t_j],
wherein the expression of d_t is:

d_t = √[(x_u^t − x_a)² + (y_u^t − y_a)² + (z_u^t − z_a)²],
wherein (x_u^t, y_u^t, z_u^t) and (x_u^{t−Δt}, y_u^{t−Δt}, z_u^{t−Δt}) respectively represent the position coordinates of the small unmanned aerial vehicle at time t and time t−Δt, (x_a, y_a, z_a) represents the position coordinates of the detection equipment, and Δt represents the stepping time interval of the Markov decision process; d_t represents the distance between the small unmanned aerial vehicle and the detection equipment at time t; v_t represents the flight speed of the small unmanned aerial vehicle at time t; t_l represents the light emitting time of the laser interception equipment; t_j represents the time for which the radio interference equipment is on; θ and ψ respectively denote the pitch angle and yaw angle of the unmanned aerial vehicle.
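A minimal sketch of assembling the state vector S from raw track data follows. Estimating v_t, θ_t and ψ_t by finite differences of successive positions is an assumption made for a self-contained illustration, not the patent's prescribed method:

```python
import math

def build_state(uav_pos, uav_pos_prev, detector_pos, dt, t_l, t_j):
    """Assemble S = [d_t, v_t, theta_t, psi_t, t_l, t_j] from raw
    positions (one plausible reading of the state definition above)."""
    dx = uav_pos[0] - detector_pos[0]
    dy = uav_pos[1] - detector_pos[1]
    dz = uav_pos[2] - detector_pos[2]
    d_t = math.sqrt(dx * dx + dy * dy + dz * dz)  # UAV-detector distance
    # velocity estimated by finite difference over the MDP step interval
    vx = (uav_pos[0] - uav_pos_prev[0]) / dt
    vy = (uav_pos[1] - uav_pos_prev[1]) / dt
    vz = (uav_pos[2] - uav_pos_prev[2]) / dt
    v_t = math.sqrt(vx * vx + vy * vy + vz * vz)
    psi_t = math.atan2(vy, vx)                    # yaw from velocity
    theta_t = math.atan2(vz, math.hypot(vx, vy))  # pitch from velocity
    return [d_t, v_t, theta_t, psi_t, t_l, t_j]

s = build_state((300.0, 400.0, 0.0), (310.0, 400.0, 0.0), (0.0, 0.0, 0.0),
                dt=1.0, t_l=0.0, t_j=2.5)
```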
The expression of the action space A of the unmanned aerial vehicle prevention and control command decision model is A = [D_t, D_a1, D_a2], wherein the device type D_t takes the value 0 or 1, and the action type of the equipment is determined by the combination of the action variables D_a1 and D_a2; the action variables [D_a1, D_a2] take one of four specific combinations: 00, 01, 10 and 11.
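The three-bit action coding can be sketched as below. The ordering of the four actions within each device's table is a hypothetical assignment, since the text does not fix which two-bit pattern maps to which action:

```python
# Hypothetical mapping for the triple (D_t, D_a1, D_a2):
# the first bit selects the device (0 = laser interception,
# 1 = radio interference), the last two bits select its action.
LASER_ACTIONS = {(0, 0): "open", (0, 1): "close",
                 (1, 0): "hold state", (1, 1): "adjust pointing"}
JAMMER_ACTIONS = {(0, 0): "open jamming", (0, 1): "close jamming",
                  (1, 0): "hold action", (1, 1): "adjust jam pointing"}

def decode_action(triple):
    """Map an action triple (D_t, D_a1, D_a2) to (device, action) labels."""
    d_t, d_a1, d_a2 = triple
    table = JAMMER_ACTIONS if d_t == 1 else LASER_ACTIONS
    return ("radio interference" if d_t == 1 else "laser interception",
            table[(d_a1, d_a2)])

def action_index(triple):
    """Flatten the triple into an index 0-7 for a discrete action space."""
    d_t, d_a1, d_a2 = triple
    return d_t * 4 + d_a1 * 2 + d_a2

device, act = decode_action((1, 1, 1))
```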
When the prevention and control intention of the small unmanned aerial vehicle prevention and control system is the prevention and control target of medium and long distance, the defense success condition at the moment is expressed by the reward function of each flight component of the small unmanned aerial vehicle,
wherein R_a, R_d and R_v respectively represent the angle reward function, the distance reward function and the speed reward function; q represents the included angle between the speed vector of the small unmanned aerial vehicle and the line connecting the small unmanned aerial vehicle and the detection equipment; q_m represents the angle value at which the angle reward takes its minimum positive value; the two angle reward values correspond to the detection equipment being within and outside the line-of-sight angle range of the unmanned aerial vehicle, respectively; when the angle q = 0, the angle reward value is minimum, and when q = π, the angle reward value is maximum. The distance reward function is expressed as a linear function of the distance, where k is the smoothing coefficient that keeps the distance reward function at the minimum positive reward value, and d_f and d_c respectively represent the maximum radius of the small unmanned aerial vehicle prevention and control area and the minimum detection distance of the detection equipment; two reward coefficients correspond to the flying speed of the small unmanned aerial vehicle being below a certain flight speed threshold and above the maximum flight speed threshold, respectively; v_min, v_max and v_xh respectively represent the minimum flying speed, the maximum flying speed and the cruising flying speed of the small unmanned aerial vehicle.
R is to bea,RdAnd RvAnd performing weighted summation to obtain an expression of a reward function R of the small unmanned aerial vehicle prevention and control command decision model, wherein the expression specifically comprises the following steps:
R = a_1·R_a + a_2·R_d + a_3·R_v,
wherein a_1, a_2 and a_3 are the weights corresponding to the angle reward function, the distance reward function and the speed reward function respectively; they can be obtained from empirical values and satisfy the constraint condition: a_1 + a_2 + a_3 = 1, a_1, a_2, a_3 ≥ 0.
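The weighted single-step reward and its constraint can be checked in a few lines; the weight values used here are illustrative, not taken from the patent:

```python
def total_reward(r_a, r_d, r_v, weights=(0.3, 0.4, 0.3)):
    """Weighted sum R = a1*Ra + a2*Rd + a3*Rv; the default weights are
    example empirical values, checked against the stated constraints."""
    a1, a2, a3 = weights
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9 and min(weights) >= 0.0
    return a1 * r_a + a2 * r_d + a3 * r_v

r = total_reward(1.0, 0.5, -0.2)
```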
Further, the step 4 specifically includes: the small unmanned aerial vehicle prevention and control command decision model is trained with the Deep Q Network algorithm (DQN for short) until it can generate prevention and control treatment strategies for driving away or striking small unmanned aerial vehicles executing different tasks (such as strike and reconnaissance). When the defense success rate of the strategy exceeds a certain threshold, training is stopped and the neural network model parameters at that moment are saved, completing the training and optimization of the small unmanned aerial vehicle prevention and control command decision model.
In the DQN algorithm, a value evaluation network and a value target network are constructed. The output of the value evaluation network is denoted Q(s, a | θ); its inputs are the treatment action variable a taken at the previous moment and the state variable s at the current moment, its output determines the treatment action variable taken at the next moment, and the corresponding value evaluation network parameters are θ. The value evaluation network updates and optimizes its parameters θ by minimizing the difference between its own state-action value and the state-action value of the value target network; the Q(s, a | θ) value is output directly by the network. The output of the value target network is denoted Q̂(s, a | θ⁻); its inputs are the treatment action variable a taken at the previous moment and the state variable s at the current moment, and the corresponding value target network parameters are θ⁻. The training target is constructed from the value target network output Q̂ and the reward r_j, with the specific expression:
where the subscript j denotes the j-th data item in the dataset sampled from the experience pool; r_j denotes the reward corresponding to the j-th item; s_j denotes the state variable and a_j the treatment action variable corresponding to the j-th item; s_{j+1} and a_{j+1} denote the state variable and treatment action variable corresponding to the (j+1)-th item; Q̂(s_{j+1}, a_{j+1} | θ⁻) denotes the value target network output corresponding to the j-th item; γ is the reward discount coefficient; and L(θ) denotes the loss function used in training the value evaluation network with parameters θ, i.e. the least-squares error between the value target network prediction and the target true value:

L(θ) = E[(r_j + γ·max_{a_{j+1}} Q̂(s_{j+1}, a_{j+1} | θ⁻) − Q(s_j, a_j | θ))²],

where max_{a_{j+1}} Q̂(s_{j+1}, a_{j+1} | θ⁻) is the maximum value target network output obtained by taking action a_{j+1} in state s_{j+1}.
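The target construction and least-squares loss described above can be sketched framework-free as follows; the `done` terminal-flag handling and the `q_target` stub are assumptions added to make the example self-contained:

```python
def dqn_targets(batch, q_target, gamma=0.99):
    """Compute DQN regression targets y_j = r_j + gamma * max_a' Qhat(s_{j+1}, a')
    for a sampled mini-batch; q_target(s) returns the target network's
    Q-values for every action (stubbed here)."""
    targets = []
    for (s_j, a_j, r_j, s_next, done) in batch:
        y = r_j if done else r_j + gamma * max(q_target(s_next))
        targets.append(y)
    return targets

def td_loss(q_pred, targets):
    """Least-squares error between evaluation-network predictions and targets."""
    return sum((q - y) ** 2 for q, y in zip(q_pred, targets)) / len(targets)

# toy check with a constant target network
batch = [((0,), 1, 1.0, (1,), False), ((1,), 0, 0.0, (2,), True)]
ys = dqn_targets(batch, q_target=lambda s: [0.0, 2.0], gamma=0.5)
```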
For the value evaluation network, the parameter θ is updated toward the direction of increasing value of the value evaluation network output value, and the process is expressed as:
wherein ∇_θ Q(s_j, a_j | θ) represents the gradient of the Q-value function with respect to the parameter θ for the state variable s_j and action variable a_j, and ∇_θ L(θ) represents the gradient of the loss function L(θ) with respect to θ. By temporarily freezing the value target network parameters, the value target network is updated only after the value evaluation network has trained for a certain period; the value evaluation network parameters θ are then passed to the value target network parameters θ⁻. This keeps the value target network fixed in stages and improves the stability of algorithm training;
the value target network and the value evaluation network both adopt a neural network architecture formed by full connection layers, the neural network architecture and the value evaluation network are provided with 3 full connection layers in total, and 200, 100 and 50 neurons are respectively selected from the 3 full connection layers.
Further, the step S5 specifically includes: the small unmanned aerial vehicle prevention and control command decision model obtained by the training of step S4 is loaded in an actual small unmanned aerial vehicle prevention and control scene; a decision is made according to the state space obtained in real time from the actual scene to obtain a treatment action variable a, which is applied to the actual scene, immediately yielding a small unmanned aerial vehicle prevention and control strategy, changing the environment state, and obtaining real-time reward feedback.
The invention discloses a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning, which comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module, wherein the four modules are sequentially connected;
the data fusion module is used for fusing data acquired by detecting the prevention and control environment and the target by the multi-type detection equipment;
the situation analysis module is used for performing attribute analysis and judgment and threat assessment on multi-source target data obtained by the multi-type detection equipment;
the control planning module is used for realizing the small unmanned aerial vehicle control decision method based on reinforcement learning to obtain a small unmanned aerial vehicle control command decision model, and automatically generating a small unmanned aerial vehicle control disposal decision scheme according to threat judgment information obtained by the situation analysis module;
the effect evaluation module analyzes and processes the real-time prevention and control environment situation, the damage degree of the prevention and control target and the specific striking effect of the prevention and control disposal equipment, evaluates the prevention and control effect of the prevention and control disposal decision scheme of the small unmanned aerial vehicle, and provides real-time feedback for the prevention and control command decision action of the unmanned aerial vehicle.
Further, the multi-source data fusion module extracts, manages and organizes information of data obtained by the multi-type detection equipment according to the prevention and control target type, the prevention and control environment elements, the prevention and control target elements, the disposal elements and the like;
Furthermore, the situation analysis module performs attribute analysis and judgment on the multi-source target data throughout the prevention and control judgment process, constructs a threat level model for threat assessment, obtains threat judgment information used to grasp the threat degree of relevant targets, and uploads the threat judgment information to the prevention and control planning module.
Referring to fig. 1, the small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning of the present invention includes the following steps:
step 1, defining the composition of a small unmanned aerial vehicle prevention and control system. Determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system; the system comprises a detection subsystem, a disposal subsystem and a command control system, wherein the detection subsystem is used for providing combat situation information, the disposal subsystem is responsible for implementing prevention and control disposal, and the command control system is used for receiving the combat situation information and generating a disposal strategy; the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, the treatment subsystem comprises radio interference equipment and laser interception equipment, and the command control system comprises a data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
Consider the case where the small unmanned aerial vehicle prevention and control system is composed of 1 detection subsystem, 1 treatment subsystem and a command control system: the detection subsystem comprises one radar, one photoelectric detection device and one radio detection device, and the treatment subsystem comprises one radio interference device and one laser interception device. The command control system is composed of data fusion, situation analysis, prevention and control planning and effect evaluation modules.
And 2, constructing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle. In small unmanned aerial vehicle prevention and control operations, prevention and control treatment is carried out mainly according to information such as target positions and speeds acquired by the detection subsystem; the important point is therefore to model the prevention and control target in the prevention and control operation. The small unmanned aerial vehicle is regarded as a particle and its three-degree-of-freedom particle motion model is studied:
wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in three-dimensional space with the ground as the reference frame, and v, θ and ψ respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle.
In this embodiment, it is assumed that N drones executing reconnaissance and strike tasks are initialized randomly outside the protection area where the drone prevention and control system is located, with coordinate information (x_i, y_i, z_i), i = 1…N.
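The random initialization of N drones outside the protection area might be sketched with rejection sampling; the radii and altitude range below are illustrative assumptions, not values from the embodiment:

```python
import math
import random

def init_drones(n, protect_radius, spawn_radius, alt=(50.0, 500.0), seed=42):
    """Randomly place n drones outside the protection area (radius
    protect_radius around the origin) but within spawn_radius."""
    rng = random.Random(seed)
    drones = []
    while len(drones) < n:
        x = rng.uniform(-spawn_radius, spawn_radius)
        y = rng.uniform(-spawn_radius, spawn_radius)
        # reject samples inside the protected zone or beyond the spawn ring
        if protect_radius < math.hypot(x, y) <= spawn_radius:
            drones.append((x, y, rng.uniform(*alt)))
    return drones

fleet = init_drones(n=5, protect_radius=1000.0, spawn_radius=5000.0)
```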
And 3, constructing a small unmanned aerial vehicle prevention and control command decision model. Establishing a small unmanned aerial vehicle prevention and control command decision model according to the small unmanned aerial vehicle prevention and control task characteristics and the Markov decision process, designing a state space and a disposal decision space, and determining a reward function according to the intentions of different targets to be prevented and controlled;
In the invention, the small unmanned aerial vehicle prevention and control command decision model is established by a model-free reinforcement learning algorithm, so only the elements other than the state transition probability need to be considered.
Wherein, the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [d_t, v_t, θ_t, ψ_t, t_l, t_j],
wherein the expression of d_t is:

d_t = √[(x_b^t − x_a)² + (y_b^t − y_a)² + (z_b^t − z_a)²],
wherein (x_a, y_a, z_a) represents the radar coordinates and (x_b, y_b, z_b) represents the coordinates of the small unmanned aerial vehicle; the superscripts t and t−dt respectively denote the position of the unmanned aerial vehicle at time t and at the previous moment; dt represents the simulation step time interval; d_t represents the distance between the unmanned aerial vehicle and the radar; v_t represents the flight speed of the unmanned aerial vehicle; t_l represents the light emitting time of the laser interception equipment; t_j represents the interference time of the radio interference equipment; θ and ψ respectively denote the pitch angle and yaw angle of the unmanned aerial vehicle.
Wherein the action space of the unmanned aerial vehicle prevention and control command decision model is A = [D_t, D_a1, D_a2]; the device type D_t takes the value 0 or 1, and the specific action value [D_a1, D_a2] includes four combinations: 00, 01, 10 and 11.
The treatment equipment for preventing and controlling the small unmanned aerial vehicle comprises laser interception equipment and radio interference equipment, wherein actions of the laser equipment comprise four actions of opening the laser equipment, closing the laser equipment, keeping the equipment state and adjusting laser pointing direction, the radio interference equipment is basically the same, and the actions comprise four actions of opening interference, closing interference, keeping action and adjusting interference pointing direction.
And the actions are coded by adopting three-digit binary numbers, wherein the first digit represents the type of the equipment, and the last two digits represent the specific actions corresponding to the equipment, namely the action taken by the prevention and control system is represented by a triple.
The specific content of the reward function R of the unmanned aerial vehicle prevention and control command decision model is as follows:
when the intention of the defense and control system is to defend the medium-long distance target, the defense success condition is
Wherein R_a, R_d and R_v respectively represent the angle reward function, the distance reward function and the speed reward function; q represents the included angle between the velocity vector and the line connecting the unmanned aerial vehicle and the radar; q_m represents the critical-point angle; when the relative angle q = 0°, the penalty is maximum, and when q = 180°, the penalty is minimum. The distance reward is expressed as a linear function of the distance, where k is the smoothing coefficient keeping the function at the critical point, and d_f and d_l respectively represent the maximum radius of the protection area and the radius of the core area; v_min, v_max and v_xh respectively represent the minimum speed, the maximum speed and the cruising speed of the drone targets.
R is to bea,RdAnd RvAnd weighting to obtain a comprehensive single-step reward R:
R = a_1·R_a + a_2·R_d + a_3·R_v
wherein a_1, a_2 and a_3 are the weights corresponding to each reward function; they can be obtained from empirical values and satisfy the constraint a_1 + a_2 + a_3 = 1 (a_1, a_2, a_3 ≥ 0).
And 4, training and optimizing the prevention and control command decision model. The unmanned aerial vehicle prevention and control command decision model is trained with the Deep Q Network (DQN) algorithm until the decision model can effectively handle unmanned aerial vehicles with different prevention and control intents; when the defense success rate of the strategy exceeds a certain threshold, the neural network corresponding to the model is obtained.
The DQN algorithm introduces the techniques of experience replay and a fixed target network, and is one of the more popular deep reinforcement learning algorithms. Its schematic diagram is shown in fig. 2, in which a value evaluation network and a value target network are constructed; the output of the value evaluation network can be expressed as Q(s, a | θ) with corresponding parameters θ, and the output of the value target network is expressed as Q̂(s, a | θ⁻) with corresponding parameters θ⁻. For the value evaluation network, the input is the action a taken at the previous moment and the state s at the current moment, and the output is Q(s, a). The value evaluation network parameters θ are updated and optimized by minimizing the difference between the evaluation network's state-action value and the target network's state-action value; the Q value of the evaluation network is output directly by the network, while the target value is constructed from the target network output Q̂ and the reward r_j, as shown in the following formula:
wherein the subscript j denotes the index of the j-th data item sampled from the experience pool; γ is the reward discount coefficient; L(θ) represents the loss function for training the evaluation network.
For the evaluation network, the input is the current environment state s, the output is the action a, and the parameter θ of the network is updated toward the direction of increasing the output value of the evaluation network, as shown in the following formula:
The target network parameters are updated by temporary freezing: every time a certain number of steps is reached, θ⁻ ← θ.
The small unmanned aerial vehicle prevention and control command decision model is trained with the DQN algorithm, programmed in Python 3.8 using the PyTorch deep learning framework. Both the target network and the evaluation network adopt a neural network architecture composed of fully connected layers, with 3 fully connected layers in total containing 200, 100 and 50 neurons respectively. The upper limit of each training run is set to 10000 rounds, and the step limit of each round is set to 10^5. When the defense success rate of the strategy exceeds a certain threshold, specifically when 270 or more of every 300 training rounds succeed, training is stopped and the neural network model parameters at that moment are saved.
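The 270-out-of-300 stopping criterion can be expressed compactly; reading "each 300 training rounds" as a sliding window over recent episodes is an interpretation:

```python
from collections import deque

def should_stop(results, window=300, required=270):
    """Stopping rule sketched from the text: halt training once at least
    `required` of the last `window` episodes were defense successes.
    `results` is a sequence of 1 (success) / 0 (failure) flags."""
    recent = deque(results, maxlen=window)  # keep only the last `window`
    return len(recent) == window and sum(recent) >= required

# 300 episodes with 285 successes -> 285 >= 270, so training would stop
history = [1] * 285 + [0] * 15
stop = should_stop(history)
```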
And 5, verifying and evaluating the effect of the decision model. The prevention and control command decision model obtained by training is loaded in a typical small unmanned aerial vehicle prevention and control battle scene; a decision is made according to the state space s obtained in real time from the scene to obtain a real-time unmanned aerial vehicle prevention and control strategy, and the treatment equipment action a is applied in the scene to change the environment state and obtain real-time reward feedback.
Fig. 3 is a composition diagram of the reinforcement learning-based small unmanned aerial vehicle prevention and control command decision system of the present invention, which includes: the system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module.
The data fusion module is used for fusing data acquired by detecting the prevention and control environment and the target by the multi-type detection means; aiming at different types of prevention and control targets, information extraction, management, compilation and the like are carried out on prevention and control environment elements, prevention and control elements and disposal elements;
the situation analysis module is used for carrying out attribute analysis and judgment and threat assessment on the multi-source target data; performing attribute analysis and judgment on multi-source target data in the whole process of prevention and control judgment, and constructing a threat level model for threat assessment;
the control planning module is used for providing automatic treatment decision support for the unmanned aerial vehicle control specific tasks and resource planning activities; by adopting the small unmanned aerial vehicle prevention and control decision method based on reinforcement learning, the composition of a small unmanned aerial vehicle prevention and control system is clarified, and an internal model of the small unmanned aerial vehicle prevention and control system is constructed so as to extract combat situation information; designing a state space, an action space and a reward function, and constructing a small unmanned aerial vehicle prevention and control command decision model; training and optimizing a prevention and control command decision model to obtain a prevention and control disposal strategy, and verifying and evaluating the effect of the decision model;
the effect evaluation module is used for evaluating relevant disposal strategies and effects of unmanned aerial vehicle prevention and control and providing real-time feedback for unmanned aerial vehicle prevention and control command decision actions; and analyzing and processing the real-time prevention and control environment situation, the prevention and control target damage degree and the specific attack condition of the prevention and control treatment equipment.
An application method of a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning comprises the following steps:
s1: the data fusion module is used for fusing data acquired by detection of the prevention and control environment and the targets by a multi-type detection means to the prevention and control targets of different types based on information extraction, management, compilation and the like of the prevention and control environment elements, the prevention and control elements and the disposal elements;
s2: the situation analysis module is oriented to the whole process of prevention and control judgment, performs attribute analysis and judgment on multi-source target data, constructs a threat level model for threat assessment, is used for mastering the threat degree of a related target, and uploads threat judgment information to the prevention and control planning module;
s3: the control planning module adopts the small unmanned aerial vehicle control decision method based on reinforcement learning to make clear the composition of the small unmanned aerial vehicle control system and construct an internal model of the small unmanned aerial vehicle control system so as to extract the combat situation information; designing a state space, an action space and a reward function, and constructing a small unmanned aerial vehicle prevention and control command decision model; training and optimizing a prevention and control command decision model to obtain a prevention and control disposal strategy, and verifying and evaluating the effect of the decision model; the finally obtained small unmanned aerial vehicle prevention and control command decision model can be used for providing automatic disposal decision support for unmanned aerial vehicle prevention and control specific tasks and resource planning activities;
s4: the effect evaluation module analyzes and processes the real-time prevention and control environment situation, the damage degree of the prevention and control target and the specific striking situation of the prevention and control disposal equipment, is used for evaluating the relevant disposal strategies and effects of the prevention and control of the unmanned aerial vehicle, and provides real-time feedback for the decision-making action of the prevention and control command of the unmanned aerial vehicle.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (5)
1. A small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning is characterized by comprising the following steps:
s1, determining the composition of a small unmanned aerial vehicle prevention and control system; determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system; the system comprises a detection subsystem, a disposal subsystem, a command control system and a command control system, wherein the detection subsystem is used for providing combat situation information, the disposal subsystem is responsible for implementing prevention and control disposal, and the command control system is used for receiving the combat situation information from the detection subsystem and scheduling a plurality of disposal means to generate a disposal strategy; the detection subsystem comprises single-type or multi-type detection equipment, and the disposal subsystem comprises multi-type soft killing disposal equipment and hard interception disposal equipment; the command control system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
s2, establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle;
s3, constructing a small unmanned aerial vehicle prevention and control command decision model;
s4, training and optimizing a small unmanned aerial vehicle prevention and control command decision model;
s5, verifying and evaluating the prevention and control effect of the small unmanned aerial vehicle prevention and control command decision model;
the step S3 specifically includes: the treatment equipment of the unmanned aerial vehicle prevention and control system comprises laser interception equipment and radio interference equipment, wherein actions of the laser equipment comprise four actions of turning on the laser equipment, turning off the laser equipment, keeping the equipment state and adjusting laser pointing direction, and actions of the radio interference equipment comprise four actions of turning on interference, turning off interference, keeping action and adjusting interference pointing direction; the method comprises the steps that various actions of disposal equipment are coded by adopting three-bit binary numbers, the first bit of the three-bit binary numbers represents the type of the equipment, the last two bits of the three-bit binary numbers are used for representing the corresponding specific actions of the equipment, and the action taken by the disposal equipment of the prevention and control system is represented by a triple group formed by the three-bit binary numbers;
according to the characteristics of the small unmanned aerial vehicle prevention and control task and the Markov decision process, a small unmanned aerial vehicle prevention and control command decision model is established, a state space and a disposal decision space are designed, and a reward function is determined according to the prevention and control intention of a small unmanned aerial vehicle prevention and control system;
the small unmanned aerial vehicle control command decision model is established by adopting a reinforcement learning algorithm, interaction between the intelligent decision model and the environment is described by adopting a Markov decision process in reinforcement learning, and the Markov decision process is realized by utilizing a state space, an action space, a reward function and a discount coefficient;
the expression of the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [d_t, v_t, θ_t, ψ_t, t_l, t_j],
wherein the expression of d_t is:

d_t = √[(x_u^t − x_a)² + (y_u^t − y_a)² + (z_u^t − z_a)²],
wherein (x_u^t, y_u^t, z_u^t) and (x_u^{t−Δt}, y_u^{t−Δt}, z_u^{t−Δt}) respectively represent the position coordinates of the small unmanned aerial vehicle at time t and time t−Δt, (x_a, y_a, z_a) represents the position coordinates of the detection equipment, and Δt represents the stepping time interval of the Markov decision process; d_t represents the distance between the small unmanned aerial vehicle and the detection equipment at time t; v_t represents the flight speed of the small unmanned aerial vehicle at time t; t_l represents the light emitting time of the laser interception equipment; t_j represents the time for which the radio interference equipment is on; θ and ψ respectively denote the pitch angle and yaw angle of the unmanned aerial vehicle;
the expression of the action space A of the unmanned aerial vehicle prevention and control command decision model is A = [D_t, D_a1, D_a2], wherein the device type D_t takes the value 0 or 1, and the action type of the equipment is determined by the combination of the action variables D_a1 and D_a2; the specific values of [D_a1, D_a2] include the four combinations 00, 01, 10 and 11;
when the prevention and control intention of the small unmanned aerial vehicle prevention and control system is medium- and long-range prevention and control of the target, the defense success condition is expressed through reward functions over the flight components (angle, distance and speed) of the small unmanned aerial vehicle,
wherein R_a, R_d and R_v respectively represent the angle reward function, the distance reward function and the speed reward function; q represents the included angle between the velocity vector of the small unmanned aerial vehicle and the line connecting the small unmanned aerial vehicle and the detection equipment; q_m represents the angle value at which the angle reward takes its minimum positive value; the angle reward takes different values inside and outside the line-of-sight angle range of the detection equipment with respect to the unmanned aerial vehicle: when the angle q = 0, the angle reward value is minimum, and when the angle q = π, the angle reward value is maximum; the distance reward function is expressed by a linear function of the distance, where k is a smoothing coefficient that keeps the distance reward function at its minimum positive reward value, and d_f and d_c respectively represent the maximum radius of the prevention and control area for the small unmanned aerial vehicle and the minimum detection distance of the detection equipment; the speed reward function uses reward coefficients corresponding to the flight speed of the small unmanned aerial vehicle falling below a certain flight-speed threshold or exceeding the maximum flight-speed threshold; v_min, v_max and v_xh respectively represent the minimum flight speed, the maximum flight speed and the cruising flight speed of the small unmanned aerial vehicle;
R_a, R_d and R_v are weighted and summed to obtain the expression of the reward function R of the small unmanned aerial vehicle prevention and control command decision model, specifically:

R = a1·R_a + a2·R_d + a3·R_v,

where a1, a2 and a3 are the weights corresponding to the angle reward function, the distance reward function and the speed reward function, can be obtained from empirical values, and satisfy the constraints: a1 + a2 + a3 = 1, a1, a2, a3 ≥ 0.
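The weighted-sum reward can be sketched as follows; note the component reward functions appear only as images in the source, so `angle_reward` below is a stand-in shape (smallest positive value at q = 0, largest at q = π) and the default weights are invented examples, not the patent's values:

```python
import math

def angle_reward(q, r_min=0.1, r_max=1.0):
    """Illustrative angle reward: minimum positive value at q = 0, maximum at
    q = pi (the patent's exact piecewise form is not recoverable here)."""
    return r_min + (r_max - r_min) * q / math.pi

def total_reward(r_a, r_d, r_v, a1=0.4, a2=0.3, a3=0.3):
    """Weighted sum R = a1*Ra + a2*Rd + a3*Rv, with a1 + a2 + a3 = 1 and
    ai >= 0; the default weights are placeholder empirical values."""
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9 and min(a1, a2, a3) >= 0
    return a1 * r_a + a2 * r_d + a3 * r_v
```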
2. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, and the treatment subsystem comprises radio interference equipment and laser interception equipment.
3. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the step S2 specifically includes: treating the small unmanned aerial vehicle as a particle, a three-degree-of-freedom particle motion model is established:

dx/dt = v·cosθ·cosψ, dy/dt = v·cosθ·sinψ, dz/dt = v·sinθ,

wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in the ground three-dimensional coordinate system, v, θ and ψ respectively represent the flight speed, pitch angle and yaw angle of the small unmanned aerial vehicle, and t represents time.
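Assuming the standard three-degree-of-freedom point-mass kinematics (the patent's equations are rendered as an image, so this is the common textbook form), one Euler integration step can be sketched as:

```python
import math

def step_uav(state, dt=0.1):
    """One Euler step of a standard 3-DOF point-mass model:
        dx/dt = v*cos(theta)*cos(psi)
        dy/dt = v*cos(theta)*sin(psi)
        dz/dt = v*sin(theta)
    state = (x, y, z, v, theta, psi); v, theta, psi held constant here."""
    x, y, z, v, theta, psi = state
    x += v * math.cos(theta) * math.cos(psi) * dt
    y += v * math.cos(theta) * math.sin(psi) * dt
    z += v * math.sin(theta) * dt
    return (x, y, z, v, theta, psi)
```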
4. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the step S4 specifically includes: training the small unmanned aerial vehicle prevention and control command decision model with the deep Q-network (DQN) algorithm until the model can generate prevention and control handling strategies, for driving away or destructively striking small unmanned aerial vehicles executing different tasks; when the defense success rate of the strategies exceeds a certain threshold, training is stopped and the neural network model parameters at that moment are saved, thereby completing the training and optimization of the small unmanned aerial vehicle prevention and control command decision model;
in the DQN algorithm, a value evaluation network and a value target network are constructed; the output of the value evaluation network is denoted Q(s, a | θ), its inputs are the handling action variable a taken at the previous moment and the state variable s at the current moment, its output determines the handling action variable taken at the next moment, and the corresponding value evaluation network parameter is θ; the value evaluation network parameter θ is updated and optimized by minimizing the difference between the state-action value of the value evaluation network and that of the value target network; the output of the value target network is denoted Q̂(s, a | θ⁻), its inputs are likewise the handling action variable a taken at the previous moment and the state variable s at the current moment, and the corresponding value target network parameter is θ⁻; the training target is formed from the value target network output and the reward r_j, specifically:

y_j = r_j + γ · max_{a_{j+1}} Q̂(s_{j+1}, a_{j+1} | θ⁻),

L(θ) = E[ (y_j − Q(s_j, a_j | θ))² ],

where the index j indicates the j-th item of the dataset sampled from the experience pool; r_j denotes the reward corresponding to the j-th item; s_j and a_j denote the state variable and handling action variable of the j-th item; s_{j+1} and a_{j+1} denote the state variable and handling action variable of the (j+1)-th item; γ is the reward discount factor; L(θ) denotes the loss function used in training the value evaluation network with parameter θ; max_{a_{j+1}} Q̂(s_{j+1}, a_{j+1} | θ⁻) denotes the maximum value target network output over actions a_{j+1} taken in state s_{j+1}; and the loss is the least-squares error between the target value y_j and the value evaluation network prediction Q(s_j, a_j | θ);
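The training target and loss above can be sketched as follows, assuming `q_target` and `q_eval` are callables returning a list of per-action values (a generic DQN update, not the patent's exact implementation):

```python
def dqn_targets(batch, q_target, gamma=0.99):
    """TD targets y_j = r_j + gamma * max_a' Q_target(s_{j+1}, a') for a
    minibatch of (s_j, a_j, r_j, s_{j+1}) transitions."""
    return [r_j + gamma * max(q_target(s_next)) for (s_j, a_j, r_j, s_next) in batch]

def dqn_loss(batch, q_eval, targets):
    """Mean squared error between Q(s_j, a_j | theta) and the TD targets."""
    preds = [q_eval(s_j)[a_j] for (s_j, a_j, _, _) in batch]
    return sum((y - q) ** 2 for y, q in zip(targets, preds)) / len(batch)
```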
for the value evaluation network, the parameter θ is updated in the direction that minimizes the loss function, i.e., along its negative gradient, a process expressed as:

θ ← θ − α · ∇θ L(θ),

where ∇θ Q(s_j, a_j | θ) denotes the gradient of the Q-value function with respect to the parameter θ for state variable s_j and action variable a_j, ∇θ L(θ) denotes the gradient of the loss function L(θ) with respect to θ, and α is the learning rate; by temporarily freezing the value target network parameters, the value target network is updated only after the value evaluation network has been trained for a certain number of periods, at which point the value evaluation network parameters θ are copied to the value target network parameters θ⁻, thereby maintaining stage-wise stationarity of the value target network;
the value target network and the value evaluation network both adopt a neural network architecture composed of fully connected layers; each network has 3 fully connected layers, with 200, 100 and 50 neurons respectively.
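A minimal NumPy sketch of the stated architecture (three fully connected hidden layers of 200, 100 and 50 neurons) together with the frozen target-network copy; the input size (6 state components plus the 3-bit action triple) and the output size (one Q-value per encoded action) are assumptions not fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Weight/bias pairs for a fully connected network with the given sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU hidden layers, linear output layer (one Q-value per action)."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# 6 state inputs + 3 action bits -> 200/100/50 hidden -> 8 encoded actions.
q_eval = init_mlp([9, 200, 100, 50, 8])
# Frozen target network: a deep copy, re-synced periodically during training.
q_target = [(W.copy(), b.copy()) for W, b in q_eval]
```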
5. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1, wherein the step S5 specifically comprises: loading the small unmanned aerial vehicle prevention and control command decision model trained in step S4 in an actual small unmanned aerial vehicle prevention and control scene; making a decision according to the state space obtained in real time from the actual scene to obtain a handling action variable a; and applying the handling action variable a to the actual scene, thereby immediately obtaining a small unmanned aerial vehicle prevention and control strategy, changing the environment state and obtaining real-time reward feedback.
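The deployment step above (sense, decide, act) can be sketched as a simple loop; `get_state`, `q_net` and `apply_action` are hypothetical stand-ins for the real sensing, model-inference and actuation interfaces:

```python
def select_action(q_values):
    """Greedy action selection at deployment: pick the highest-valued action."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def control_loop(get_state, q_net, apply_action, steps=3):
    """Minimal sense-decide-act loop: read the live state, run the trained
    value network, and apply the chosen handling action to the scene."""
    for _ in range(steps):
        s = get_state()
        a = select_action(q_net(s))
        apply_action(a)
```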
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110602580.1A CN113268081B (en) | 2021-05-31 | 2021-05-31 | Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113268081A CN113268081A (en) | 2021-08-17 |
CN113268081B true CN113268081B (en) | 2021-11-09 |
Family
ID=77233727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110602580.1A Active CN113268081B (en) | 2021-05-31 | 2021-05-31 | Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113268081B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114239392B (en) * | 2021-12-09 | 2023-03-24 | 南通大学 | Unmanned aerial vehicle decision model training method, using method, equipment and medium |
CN114963879B (en) * | 2022-05-20 | 2023-11-17 | 中国电子科技集团公司电子科学研究院 | Comprehensive control system and method for unmanned aerial vehicle |
CN115017759B (en) * | 2022-05-25 | 2023-04-07 | 中国航空工业集团公司沈阳飞机设计研究所 | Terminal autonomic defense simulation verification platform of unmanned aerial vehicle |
JP7407329B1 (en) * | 2023-10-04 | 2023-12-28 | 株式会社インターネットイニシアティブ | Flight guidance device and flight guidance method |
CN117527135B (en) * | 2024-01-04 | 2024-03-22 | 北京领云时代科技有限公司 | System and method for interfering unmanned aerial vehicle communication based on deep learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007080584A2 (en) * | 2006-01-11 | 2007-07-19 | Carmel-Haifa University Economic Corp. Ltd. | Uav decision and control system |
CN109445456A (en) * | 2018-10-15 | 2019-03-08 | 清华大学 | A kind of multiple no-manned plane cluster air navigation aid |
CN111026147A (en) * | 2019-12-25 | 2020-04-17 | 北京航空航天大学 | Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning |
CN112215283A (en) * | 2020-10-12 | 2021-01-12 | 中国人民解放军海军航空大学 | Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system |
CN112797846A (en) * | 2020-12-22 | 2021-05-14 | 中国船舶重工集团公司第七0九研究所 | Unmanned aerial vehicle prevention and control method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190220737A1 (en) * | 2018-01-17 | 2019-07-18 | Hengshuai Yao | Method of generating training data for training a neural network, method of training a neural network and using neural network for autonomous operations |
Non-Patent Citations (3)
Title |
---|
A Neural Network-based Intelligent Decision-Making in the Air-Offensive Campaign with Simulation; Gang Hu; 16th International Conference on Computational Intelligence and Security; 20201130; full text *
Deep Q-Network Learning Based on Action-Space Noise; Wu Xiaming; Journal of Changchun University of Science and Technology (Natural Science Edition); 20200831; full text *
Research on UAV Air Combat Maneuver Decision-Making Based on a Reinforced Genetic Algorithm; Xie Jianfeng; Journal of Northwestern Polytechnical University; 20201231; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113268081B (en) | Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning | |
CN110488859B (en) | Unmanned aerial vehicle route planning method based on improved Q-learning algorithm | |
CN111240353B (en) | Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree | |
CN107063255B (en) | Three-dimensional route planning method based on improved drosophila optimization algorithm | |
CN110991972B (en) | Cargo transportation system based on multi-agent reinforcement learning | |
CN109669475A (en) | Multiple no-manned plane three-dimensional formation reconfiguration method based on artificial bee colony algorithm | |
CN113625569B (en) | Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model | |
CN114330115B (en) | Neural network air combat maneuver decision-making method based on particle swarm search | |
CN113159266B (en) | Air combat maneuver decision method based on sparrow searching neural network | |
CN114510078A (en) | Unmanned aerial vehicle maneuver evasion decision-making method based on deep reinforcement learning | |
CN114444201A (en) | Autonomous capability evaluation method of ground attack unmanned aerial vehicle based on Bayesian network | |
CN113741186B (en) | Double-aircraft air combat decision-making method based on near-end strategy optimization | |
CN114089776B (en) | Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning | |
CN114815891A (en) | PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method | |
CN113255893B (en) | Self-evolution generation method of multi-agent action strategy | |
Wang et al. | Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction | |
Li et al. | A UAV coverage path planning algorithm based on double deep q-network | |
CN110986948B (en) | Multi-unmanned aerial vehicle grouping collaborative judgment method based on reward function optimization | |
CN113110101A (en) | Production line mobile robot gathering type recovery and warehousing simulation method and system | |
CN115357051B (en) | Deformation and maneuvering integrated avoidance and defense method | |
CN115574826B (en) | National park unmanned aerial vehicle patrol path optimization method based on reinforcement learning | |
CN117035435A (en) | Multi-unmanned aerial vehicle task allocation and track planning optimization method in dynamic environment | |
CN114879742B (en) | Unmanned aerial vehicle cluster dynamic coverage method based on multi-agent deep reinforcement learning | |
CN116400726A (en) | Rotor unmanned aerial vehicle escape method and system based on reinforcement learning | |
CN112698666A (en) | Aircraft route optimization method based on meteorological grid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||