CN113268081B - Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning

Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning

Info

Publication number
CN113268081B
CN113268081B (application CN202110602580.1A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
prevention
value
small unmanned
Prior art date
Legal status
Active
Application number
CN202110602580.1A
Other languages
Chinese (zh)
Other versions
CN113268081A
Inventor
刘阳
温志津
牛余凯
晋晓曦
李晋徽
Current Assignee
32802 Troops Of People's Liberation Army Of China
Original Assignee
32802 Troops Of People's Liberation Army Of China
Priority date
Filing date
Publication date
Application filed by 32802 Troops Of People's Liberation Army Of China filed Critical 32802 Troops Of People's Liberation Army Of China
Priority to CN202110602580.1A
Publication of CN113268081A
Application granted
Publication of CN113268081B
Legal status: Active

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/10 - Simultaneous control of position or course in three dimensions
    • G05D 1/101 - Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning, which comprises the following steps: determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system, the detection subsystem is used for providing combat situation information and the disposal subsystem is responsible for implementing prevention and control disposal; establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle; constructing a prevention and control command decision model; training and optimizing the small unmanned aerial vehicle prevention and control command decision model; and verifying and evaluating the prevention and control effect of the prevention and control command decision model. The invention also discloses a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning, which comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module. The invention solves the problems of low decision speed and difficulty in handling complex scenes in existing prevention and control command decision systems, and can be widely applied to small unmanned aerial vehicle management and control, civil supervision and military defense.

Description

Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
Technical Field
The invention belongs to the technical field of command control, and particularly relates to a small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning.
Background
At present, many mature technologies and achievements exist at home and abroad for the detection and handling of 'low, slow and small' unmanned aerial vehicles. However, for problems such as how to generate a specific disposal strategy from detection information and how to construct a small unmanned aerial vehicle prevention and control command decision system, a commander still needs to make decisions manually, and an operator then completes the corresponding disposal instructions for the unmanned aerial vehicle according to the decision result.
Considering the current level of intelligent technology in command control systems, the existing small unmanned aerial vehicle prevention and control command control systems mainly have the following problems: (1) at present, the prevention and control of small unmanned aerial vehicles is mainly completed manually by operators, and the degree of command automation is extremely low; (2) small unmanned aerial vehicle prevention and control belongs to short-range defense, the command decision time is short and the required response speed is high, so the response time of manual operation can hardly meet the defense requirements, and the gap is even more obvious when coping with multiple targets; (3) the situation faced by small unmanned aerial vehicle prevention and control is complex and changeable, and existing control systems and processes based on empirical rules can hardly adapt to the prevention and control requirements. No existing product or small unmanned aerial vehicle prevention and control command decision system applies a command decision method based on an algorithm model trained by reinforcement learning.
Disclosure of Invention
Aiming at the problem of automatically generating the whole-process strategy of detection, analysis, prevention and control command control, scheduling and handling for low-altitude targets such as small unmanned aerial vehicles in complex scenes such as cities, the invention discloses a small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning. The method and system realize the efficient conversion of comprehensive situation data for small unmanned aerial vehicle prevention and control into prevention and control disposal schemes and instructions for the unmanned aerial vehicle, can access multi-source detection means and multiple disposal means for command decision, effectively promote the intelligent decision level in the 4 stages of the small unmanned aerial vehicle prevention and control command flow (situation fusion, threat analysis, scheme planning and disposal control), solve the problems of low decision speed and difficulty in handling complex scenes in existing prevention and control command systems, and meet the prevention and control requirements for small unmanned aerial vehicles. A small unmanned aerial vehicle generally refers to an unmanned aerial vehicle with a takeoff weight of not more than 25 kilograms, including both fixed-wing and rotary-wing types, and has characteristics such as low cost and strong maneuverability.
The invention discloses a small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning, which comprises the following steps:
s1, determining the composition of a small unmanned aerial vehicle prevention and control system;
s2, establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle;
s3, constructing a small unmanned aerial vehicle prevention and control command decision model;
s4, training and optimizing a small unmanned aerial vehicle prevention and control command decision model;
and S5, verifying and evaluating the prevention and control effect of the small unmanned aerial vehicle prevention and control command decision model.
Further, the step S1 specifically includes: determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system; the detection subsystem is used for providing combat situation information, the disposal subsystem is responsible for implementing prevention and control disposal, and the command control system is used for receiving the combat situation information from the detection subsystem and scheduling multiple disposal means to generate a disposal strategy; the detection subsystem comprises single-type or multi-type detection equipment, and the disposal subsystem comprises multiple types of soft-kill disposal equipment and hard-interception disposal equipment; the command control system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
Specifically, the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, and the disposal subsystem comprises radio interference equipment and laser interception equipment;
further, the step S2 specifically includes: in the unmanned aerial vehicle prevention and control operation, mainly prevent and control the processing according to information such as the target position that the subsystem obtained of surveying, speed, consequently the key is the model of research prevention and control object in the prevention and control operation, regards unmanned aerial vehicle as the particle, establishes its three degree of freedom particle motion model:
dx/dt = v·cosθ·cosψ
dy/dt = v·cosθ·sinψ
dz/dt = v·sinθ
wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in a three-dimensional space coordinate system of the earth, v, theta and psi respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle, and t represents time.
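As an illustration only, the particle model above can be stepped forward in simulation with a simple Euler integration of the three kinematic equations; the following Python sketch assumes this form of the model, and the function and parameter names are illustrative rather than taken from the patent.

```python
import math

def step_drone_state(x, y, z, v, theta, psi, dt):
    """Advance the three-degree-of-freedom particle model by one Euler step.

    (x, y, z): position in the ground coordinate system
    v:         flight speed
    theta:     pitch angle (rad)
    psi:       yaw angle (rad)
    dt:        integration time step (s)
    """
    x += v * math.cos(theta) * math.cos(psi) * dt
    y += v * math.cos(theta) * math.sin(psi) * dt
    z += v * math.sin(theta) * dt
    return x, y, z

# Example: a drone cruising at 15 m/s in level flight, heading 45 degrees
pos = step_drone_state(0.0, 0.0, 100.0, 15.0, 0.0, math.pi / 4, 0.1)
print(pos)
```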
Further, the step S3 specifically includes: the disposal equipment of the unmanned aerial vehicle prevention and control system comprises laser interception equipment and radio interference equipment; the actions of the laser equipment comprise four actions, namely turning on the laser equipment, turning off the laser equipment, keeping the equipment state and adjusting the laser pointing direction, and the actions of the radio interference equipment likewise comprise four actions, namely turning on interference, turning off interference, keeping the current action and adjusting the interference pointing direction. The actions of the disposal equipment are coded with a three-bit binary number, wherein the first bit represents the type of equipment and the last two bits represent the specific action of that equipment; that is, the action taken by the disposal equipment of the prevention and control system is represented by a triple formed by the three-bit binary number.
According to the characteristics of the small unmanned aerial vehicle prevention and control task and the Markov decision process, a small unmanned aerial vehicle prevention and control command decision model is established, a state space and a disposal decision space are designed, and a reward function is determined according to the prevention and control intention of a small unmanned aerial vehicle prevention and control system;
the small unmanned aerial vehicle control command decision model is established by adopting a reinforcement learning algorithm, interaction between the intelligent decision model and the environment is described by adopting a Markov decision process in reinforcement learning, and the Markov decision process is realized by utilizing a state space, an action space, a reward function and a discount coefficient;
the expression of the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S=[dt,vt,θt,ψt,,tl,tj],
wherein d istThe expression of (a) is:
Figure BDA0003093374690000032
Figure BDA0003093374690000041
wherein the content of the first and second substances,
Figure BDA0003093374690000042
and
Figure BDA0003093374690000043
respectively representing the position coordinates of the small unmanned aerial vehicle at the time t and the time t-delta t, (x)a,ya,za) Representing the position coordinates of the detection device, at representing the stepping time interval of the Markov decision process; dtThe distance between the small unmanned aerial vehicle and the detection equipment at the moment t is represented; v. oftRepresenting the flight speed of the small unmanned aerial vehicle at the moment t; t is tlRepresenting the light emitting time of the laser interception equipment; t is tjRepresenting the time when the radio interference device is on; theta and psi are denoted as the pitch angle and yaw angle of the drone, respectively.
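A minimal sketch of how the state vector S could be assembled from two consecutive position measurements is given below; the NumPy-based helper and its argument names are assumptions introduced for illustration, not part of the patent text.

```python
import numpy as np

def build_state(p_t, p_prev, p_detector, theta_t, psi_t, t_laser, t_jam, delta_t):
    """Assemble S = [d_t, v_t, theta_t, psi_t, t_l, t_j].

    p_t, p_prev: drone positions at time t and t - delta_t (3-vectors)
    p_detector:  position of the detection equipment (3-vector)
    t_laser:     accumulated emission time of the laser interception equipment
    t_jam:       accumulated on-time of the radio interference equipment
    """
    p_t, p_prev, p_detector = map(np.asarray, (p_t, p_prev, p_detector))
    d_t = np.linalg.norm(p_t - p_detector)        # drone-to-detector distance
    v_t = np.linalg.norm(p_t - p_prev) / delta_t  # speed estimated from displacement
    return np.array([d_t, v_t, theta_t, psi_t, t_laser, t_jam], dtype=np.float32)

s = build_state([120.0, 80.0, 60.0], [118.5, 79.0, 60.0], [0.0, 0.0, 5.0],
                0.0, 0.8, 0.0, 2.0, delta_t=0.1)
print(s.shape)  # (6,)
```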
The expression of the action space A of the unmanned aerial vehicle prevention and control command decision model is A = [Dt, Da1, Da2], wherein the device type Dt takes the value 0 or 1, and the action type of the equipment is determined by the combination of the action variables Da1 and Da2; the specific values of the action variables [Da1, Da2] include the four combinations 00, 01, 10 and 11.
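The three-bit action coding can be illustrated with the following sketch, which decodes a triple [Dt, Da1, Da2] into a device and an action; the mapping of the two action bits to concrete actions is one plausible assignment and is only an assumption.

```python
# Hypothetical mapping of the two action bits to concrete actions; the patent
# does not fix which two-bit pattern corresponds to which action.
DEVICES = {0: "laser interception equipment", 1: "radio interference equipment"}
ACTIONS = {
    (0, 0): "turn on",
    (0, 1): "turn off",
    (1, 0): "hold current state",
    (1, 1): "adjust pointing direction",
}

def decode_action(triple):
    """Decode an action triple [Dt, Da1, Da2] into (device, action)."""
    d_t, d_a1, d_a2 = triple
    return DEVICES[d_t], ACTIONS[(d_a1, d_a2)]

# Enumerating all triples gives a discrete action space of 2 devices x 4 actions = 8 actions.
ACTION_SPACE = [(d, a1, a2) for d in (0, 1) for a1 in (0, 1) for a2 in (0, 1)]

print(decode_action((1, 1, 1)))  # ('radio interference equipment', 'adjust pointing direction')
print(len(ACTION_SPACE))         # 8
```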
When the prevention and control intention of the small unmanned aerial vehicle prevention and control system is to prevent and control a target at medium and long range, the defense success condition at that moment is expressed by a reward function over each flight component of the small unmanned aerial vehicle:
(Piecewise expressions for the angle reward Ra, the distance reward Rd and the speed reward Rv.)
wherein Ra, Rd and Rv respectively represent the angle reward function, the distance reward function and the speed reward function; q represents the included angle between the speed vector of the small unmanned aerial vehicle and the line connecting the small unmanned aerial vehicle and the detection equipment; qm represents the angle value at which the angle reward takes its minimum positive value; two reward coefficients respectively represent the reward value when the detection equipment is within the line-of-sight angle range of the unmanned aerial vehicle and the reward value when it is outside that range; when the angle q is 0, the angle reward value is minimum, and when the angle q is π, the angle reward value is maximum. The distance reward function is expressed by a linear function of the distance, k is a smoothing coefficient that keeps the distance reward function at its minimum positive reward value, and df and dc respectively represent the maximum radius of the small unmanned aerial vehicle prevention and control area and the minimum detection distance of the detection equipment; two further reward coefficients respectively correspond to the cases in which the flying speed of the small unmanned aerial vehicle is lower than a certain flying speed threshold and higher than the maximum flying speed threshold; vmin, vmax and vxh respectively represent the minimum flying speed, the maximum flying speed and the cruising flying speed of the small unmanned aerial vehicle.
Ra, Rd and Rv are weighted and summed to obtain the expression of the reward function R of the small unmanned aerial vehicle prevention and control command decision model, specifically:
R = a1·Ra + a2·Rd + a3·Rv,
wherein a1, a2 and a3 are the weights corresponding to the angle reward function, the distance reward function and the speed reward function, can be obtained from empirical values, and satisfy the constraint conditions a1 + a2 + a3 = 1 and a1, a2, a3 ≥ 0.
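Since only the qualitative shape of Ra, Rd and Rv is specified above, the sketch below combines illustrative piecewise component functions with the weighted sum R = a1·Ra + a2·Rd + a3·Rv; the concrete component formulas and all constants are assumptions for illustration, not the patent's expressions.

```python
import math

def angle_reward(q, q_m=math.pi / 3, r_in=0.1, r_out=1.0):
    """Illustrative angle reward: smallest at q = 0, largest at q = pi.
    q_m is the angle of the minimum positive reward; r_in / r_out are assumed
    coefficients inside / outside the line-of-sight angle range."""
    coeff = r_in if q < q_m else r_out
    return coeff * (1.0 - math.cos(q)) / 2.0

def distance_reward(d, d_c=500.0, d_f=5000.0, k=0.05):
    """Illustrative linear distance reward between the minimum detection
    distance d_c and the maximum prevention-and-control radius d_f."""
    if d <= d_c:
        return 1.0
    if d >= d_f:
        return k  # smoothing coefficient keeps a minimum positive reward
    return 1.0 - (1.0 - k) * (d - d_c) / (d_f - d_c)

def speed_reward(v, v_min=5.0, v_max=40.0, c_low=0.2, c_high=0.2):
    """Illustrative speed reward penalising speeds below v_min or above v_max."""
    if v < v_min:
        return c_low
    if v > v_max:
        return c_high
    return 1.0

def total_reward(q, d, v, a1=0.4, a2=0.4, a3=0.2):
    """Weighted sum R = a1*Ra + a2*Rd + a3*Rv with a1 + a2 + a3 = 1."""
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9 and min(a1, a2, a3) >= 0
    return a1 * angle_reward(q) + a2 * distance_reward(d) + a3 * speed_reward(v)

print(round(total_reward(q=math.pi / 2, d=2000.0, v=20.0), 3))
```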
Further, the step S4 specifically includes: training the small unmanned aerial vehicle prevention and control command decision model with the Deep Q Network algorithm (DQN algorithm for short) until the model can generate prevention and control disposal strategies for driving away or destroying small unmanned aerial vehicles executing different tasks (such as strike and reconnaissance); when the defense success rate of the strategy exceeds a certain threshold, training is stopped and the neural network model parameters at that moment are stored, completing the training and optimization of the small unmanned aerial vehicle prevention and control command decision model.
In the DQN algorithm, a value evaluation network and a value target network are constructed. The output value of the value evaluation network is denoted Q(s, a | θ); its input is the disposal action variable a taken at the previous moment and the state variable s at the current moment, its output determines the disposal action variable to be taken at the next moment, and the corresponding value evaluation network parameter is θ. The value evaluation network parameter θ is updated and optimized by minimizing the difference between the state-action value of the value evaluation network and the state-action value of the value target network, and the Q(s, a | θ) value is output directly by the value evaluation network. The output value of the value target network is denoted Q′(s, a | θ⁻); its input is the disposal action variable a taken at the previous moment and the state variable s at the current moment, and the corresponding value target network parameter is θ⁻. The training target yj is constructed from the value output by the value target network and the reward rj, and the loss function is the least-squares error between this target and the prediction of the value evaluation network; the specific expressions are as follows:
yj = rj + γ·max(aj+1) Q′(sj+1, aj+1 | θ⁻),
L(θ) = E[(yj - Q(sj, aj | θ))²],
wherein the subscript j indicates the j-th data sample taken from the experience pool; rj indicates the reward corresponding to the j-th data, sj the state variable corresponding to the j-th data, aj the disposal action variable of the j-th data, sj+1 the state variable corresponding to the (j+1)-th data, and aj+1 the disposal action variable corresponding to the (j+1)-th data; γ is the reward discount coefficient; L(θ) represents the loss function used in training the value evaluation network with parameter θ; and max(aj+1) Q′(sj+1, aj+1 | θ⁻) represents the maximum value of the value target network output obtained by taking action aj+1 in the state sj+1.
For the value evaluation network, the parameter θ is updated in the direction that increases the output value of the value evaluation network; the process is expressed as:
θ ← θ - α·∇θ L(θ),
where α is the learning rate, ∇θ Q(sj, aj | θ) represents the gradient of the Q-value function with respect to the parameter θ for the state variable sj and action variable aj, and ∇θ L(θ) represents the gradient of the loss function L(θ) with respect to the parameter θ; minimizing L(θ) drives Q(sj, aj | θ) toward the target yj along the gradient ∇θ Q(sj, aj | θ). By temporarily freezing the value target network parameters, the value target network parameters are updated only after the value evaluation network has been trained for a certain period; at that point the value evaluation network parameter θ is copied to the value target network parameter θ⁻, which keeps the value target network fixed in stages and improves the stability of algorithm training;
the value target network and the value evaluation network both adopt a neural network architecture composed of fully connected layers; each network has 3 fully connected layers, with 200, 100 and 50 neurons respectively.
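The following PyTorch sketch shows a value network with the stated 200/100/50 fully connected layers together with one DQN update step (target computation, least-squares loss, and periodic copying of θ to θ⁻); the 6-dimensional state input, the 8-action output and the hyperparameter values are assumptions for illustration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected value network: 3 hidden layers with 200, 100 and 50 neurons."""
    def __init__(self, state_dim=6, n_actions=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, n_actions),
        )

    def forward(self, s):
        return self.net(s)

q_eval = QNetwork()                      # value evaluation network, parameters theta
q_target = QNetwork()                    # value target network, parameters theta^-
q_target.load_state_dict(q_eval.state_dict())
optimizer = torch.optim.Adam(q_eval.parameters(), lr=1e-3)
gamma = 0.99                             # reward discount coefficient

def dqn_update(batch, step, sync_every=1000):
    """One DQN update of theta from a minibatch sampled from the experience pool."""
    s, a, r, s_next, done = batch        # states, actions (long), rewards, next states, done flags
    with torch.no_grad():                # y_j = r_j + gamma * max_a' Q'(s_{j+1}, a' | theta^-)
        y = r + gamma * (1.0 - done) * q_target(s_next).max(dim=1).values
    q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, y)   # L(theta): least-squares error to the target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:           # periodic copy: theta^- <- theta (frozen in between)
        q_target.load_state_dict(q_eval.state_dict())
    return loss.item()
```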
Further, the step S5 specifically includes: loading the small unmanned aerial vehicle prevention and control command decision model obtained by the training of step S4 in an actual small unmanned aerial vehicle prevention and control scene; making a decision according to the state space obtained in real time from the actual scene to obtain a disposal action variable a; and applying the disposal action variable a to the actual scene, thereby obtaining a small unmanned aerial vehicle prevention and control strategy in real time, changing the environment state and obtaining real-time reward feedback.
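A sketch of this deployment loop is given below, assuming a hypothetical environment object that exposes the real-time state space and accepts a disposal action variable; the interface names are not taken from the patent.

```python
import torch

def run_prevention_and_control(env, q_eval, max_steps=1000):
    """Run the trained decision model in an actual prevention and control scene.

    env is assumed to provide reset() and step(action) returning the
    6-dimensional state, the real-time reward feedback, a done flag and an
    info dictionary; q_eval is the trained value evaluation network.
    """
    s = env.reset()
    for _ in range(max_steps):
        with torch.no_grad():
            q_values = q_eval(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
        a = int(q_values.argmax(dim=1))    # disposal action variable a
        s, reward, done, _ = env.step(a)   # apply a to the scene: new state, real-time reward feedback
        if done:
            break
```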
The invention discloses a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning, which comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module, wherein the four modules are sequentially connected;
the multi-source data fusion module is used for fusing data acquired by detecting the prevention and control environment and the target by the multi-type detection equipment;
the situation analysis module is used for performing attribute analysis and judgment and threat assessment on multi-source target data obtained by the multi-type detection equipment;
the control planning module is used for realizing the small unmanned aerial vehicle control decision method based on reinforcement learning to obtain a small unmanned aerial vehicle control command decision model, and automatically generating a small unmanned aerial vehicle control disposal decision scheme according to threat judgment information obtained by the situation analysis module;
the effect evaluation module analyzes and processes the real-time prevention and control environment situation, the damage degree of the prevention and control target and the specific striking effect of the prevention and control disposal equipment, evaluates the prevention and control effect of the prevention and control disposal decision scheme of the small unmanned aerial vehicle, and provides real-time feedback for the prevention and control command decision action of the unmanned aerial vehicle.
Further, the multi-source data fusion module extracts, manages and organizes information of data obtained by the multi-type detection equipment according to the prevention and control target type, the prevention and control environment elements, the prevention and control target elements, the disposal elements and the like;
Furthermore, the situation analysis module performs attribute analysis and judgment on the multi-source target data throughout the whole prevention and control judgment process, constructs a threat level model for threat assessment, obtains threat judgment information used to grasp the threat degree of the relevant targets, and uploads the threat judgment information to the prevention and control planning module.
Compared with the prior art, the invention has the beneficial effects that:
(1) the invention provides a small unmanned aerial vehicle prevention and control command decision method and a system based on reinforcement learning, wherein a reinforcement learning theory is combined with a small unmanned aerial vehicle prevention and control decision model, so that the automatic generation of comprehensive situation data for the prevention and control of a small unmanned aerial vehicle is realized, and a prevention and control disposal scheme and instructions for the unmanned aerial vehicle are efficiently generated by utilizing the data;
(2) the invention provides a small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning, which realize situation fusion, threat analysis, planning scheme and treatment control, and improve the intelligent decision level of 4 unmanned aerial vehicle prevention and control command flow stages, solve the problems of low decision speed, difficulty in processing complex scenes and the like in the conventional prevention and control command decision system, and provide a new technical thought for small unmanned aerial vehicle prevention and control command decision.
(3) The invention provides a method and a system for small unmanned aerial vehicle prevention and control command decision based on reinforcement learning, which can be widely applied to small unmanned aerial vehicle management and control, civil supervision and military defense.
Drawings
Fig. 1 is a flow chart of a control command decision method of a small unmanned aerial vehicle based on reinforcement learning according to the invention;
FIG. 2 is a flow chart of a deep Q network algorithm in the present invention;
fig. 3 is a composition diagram of a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning.
Detailed Description
For a better understanding of the present disclosure, an example is given here.
In order to facilitate understanding of those skilled in the art, the method and system for unmanned aerial vehicle prevention and control command decision based on reinforcement learning provided by the invention are further described in detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, the small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning of the present invention includes the following steps:
step 1, defining the composition of a small unmanned aerial vehicle prevention and control system. Determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system; the system comprises a detection subsystem, a disposal subsystem and a command control system, wherein the detection subsystem is used for providing combat situation information, the disposal subsystem is responsible for implementing prevention and control disposal, and the command control system is used for receiving the combat situation information and generating a disposal strategy; the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, the treatment subsystem comprises radio interference equipment and laser interception equipment, and the command control system comprises a data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
under the condition that the small unmanned aerial vehicle prevention and control system is considered to be composed of 1 set of detection subsystem, 1 set of treatment subsystem and an instruction control system, the detection subsystem comprises 1 station of each of radar, photoelectric detection equipment and radio detection equipment, and the treatment subsystem comprises 1 station of each of radio interference equipment and laser interception equipment. The command control system is composed of data fusion, situation analysis, prevention and control planning and effect evaluation modules.
And 2, constructing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle. In small unmanned aerial vehicle prevention and control operations, prevention and control disposal is mainly carried out according to information such as the target positions and speeds acquired by the detection subsystem, so the important point is to study the model of the prevention and control target in the prevention and control operation; the target is regarded as a particle and its three-degree-of-freedom particle model is studied:
dx/dt = v·cosθ·cosψ
dy/dt = v·cosθ·sinψ
dz/dt = v·sinθ
wherein (x, y, z) represents the coordinate of the small unmanned aerial vehicle in a three-dimensional space with the ground as a reference system, and v, theta and psi respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle.
In this embodiment, it is assumed that N drones executing reconnaissance and strike tasks are initialized randomly outside the protection area where the drone prevention and control system is located, and the coordinate information of the drones is (xi, yi, zi), i = 1…N.
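A small sketch of this random initialization, assuming a circular prevention and control area of radius d_f centred at the origin; the radius value and the sampling scheme are illustrative assumptions.

```python
import numpy as np

def init_drones(n, d_f=5000.0, z_range=(50.0, 500.0), margin=2000.0, seed=0):
    """Randomly place N drones outside a prevention and control area of radius d_f."""
    rng = np.random.default_rng(seed)
    bearing = rng.uniform(0.0, 2.0 * np.pi, size=n)
    radius = d_f + rng.uniform(0.0, margin, size=n)  # strictly outside the protected area
    x = radius * np.cos(bearing)
    y = radius * np.sin(bearing)
    z = rng.uniform(z_range[0], z_range[1], size=n)
    return np.stack([x, y, z], axis=1)               # rows are (x_i, y_i, z_i), i = 1..N

positions = init_drones(n=5)
print(positions.shape)  # (5, 3)
```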
And 3, constructing a small unmanned aerial vehicle prevention and control command decision model. Establishing a small unmanned aerial vehicle prevention and control command decision model according to the small unmanned aerial vehicle prevention and control task characteristics and the Markov decision process, designing a state space and a disposal decision space, and determining a reward function according to the intentions of different targets to be prevented and controlled;
in the invention, the small unmanned aerial vehicle prevention and control command decision model is established by a model-free reinforcement learning algorithm, so that other elements except the state transition probability are only considered.
Wherein, the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [dt, vt, θt, ψt, tl, tj],
wherein the expression of dt is:
dt = √[(xb^t - xa)² + (yb^t - ya)² + (zb^t - za)²],
vt = √[(xb^t - xb^(t-Δt))² + (yb^t - yb^(t-Δt))² + (zb^t - zb^(t-Δt))²] / Δt,
wherein (xa, ya, za) represents the radar coordinates and (xb, yb, zb) represents the coordinates of the small unmanned aerial vehicle; the superscripts t and t-Δt respectively denote the positions of the unmanned aerial vehicle at the moment t and at the previous moment; Δt represents the simulation step time interval; dt represents the distance of the unmanned aerial vehicle from the radar; vt represents the flight speed of the unmanned aerial vehicle; tl represents the light emitting time of the laser interception equipment; tj represents the interference time of the radio interference equipment; θ and ψ denote the pitch angle and yaw angle of the unmanned aerial vehicle, respectively.
Wherein, the action space of the unmanned aerial vehicle prevention and control command decision model is A = [Dt, Da1, Da2]; the device type Dt takes the value 0 or 1, and the specific action value [Da1, Da2] includes the four combinations 00, 01, 10 and 11.
The treatment equipment for preventing and controlling the small unmanned aerial vehicle comprises laser interception equipment and radio interference equipment, wherein actions of the laser equipment comprise four actions of opening the laser equipment, closing the laser equipment, keeping the equipment state and adjusting laser pointing direction, the radio interference equipment is basically the same, and the actions comprise four actions of opening interference, closing interference, keeping action and adjusting interference pointing direction.
And the actions are coded by adopting three-digit binary numbers, wherein the first digit represents the type of the equipment, and the last two digits represent the specific actions corresponding to the equipment, namely the action taken by the prevention and control system is represented by a triple.
The specific content of the reward function R of the unmanned aerial vehicle prevention and control command decision model is as follows:
when the intention of the defense and control system is to defend the medium-long distance target, the defense success condition is
Figure BDA0003093374690000162
Wherein R isa、RdAnd RvRespectively representing an angle reward function, a distance reward function and a speed reward function; q represents an included angle between the velocity vector and a connecting line of the unmanned aerial vehicle and the radar; q. q.smRepresenting a critical point angle; when the relative angle q is 0 degrees, the punishment is maximum; when q is 180 °, the penalty is minimal. The distance reward is expressed by a linear function related to the distance, k is a smoothing coefficient of the retention function at a critical point, dfAnd dlRespectively representing the maximum radius of the protective area and the radius of the core area; v. ofmin,vmax,vxhRespectively representing the minimum speed, the maximum speed and the cruising speed of the drone targets.
R is to bea,RdAnd RvAnd weighting to obtain a comprehensive single-step reward R:
R=a1·Ra+a2·Rd+a3·Rv
wherein, a1,a2,a3The weight corresponding to each reward function can be obtained according to the empirical value and satisfies the following constraint a1+a2+a3=1(a1,a2,a3≥0)
And 4, training and optimizing the prevention and control command decision model. The unmanned aerial vehicle prevention and control command decision model is trained with the Deep Q Network algorithm until the decision model can effectively generate prevention and control strategies for unmanned aerial vehicles with different intents; when the defense success rate of the strategy exceeds a certain threshold, the neural network corresponding to the model is obtained.
The DQN algorithm applies the techniques of experience replay and a fixed target network, and is one of the more popular deep reinforcement learning algorithms; its schematic diagram is shown in fig. 2. A value evaluation network and a value target network are constructed; the output of the value evaluation network is represented as Q(s, a | θ) with corresponding parameter θ, and the output of the value target network is represented as Q′(s, a | θ⁻) with corresponding parameter θ⁻. For the value evaluation network, the input is the action a taken at the previous moment and the state s at the current moment, and the output is Q(s, a); the value evaluation network parameter θ is updated and optimized by minimizing the difference between the evaluation network state-action value and the target network state-action value, where the Q value of the evaluation network is output directly by that network and the Q′ value of the target network is output directly by the target network. The training target is constructed from the value output by the target network and the reward rj, specifically as shown in the following formulas:
yj = rj + γ·max(aj+1) Q′(sj+1, aj+1 | θ⁻),
L(θ) = E[(yj - Q(sj, aj | θ))²],
wherein the subscript j represents the index of the j-th data sample taken from the experience pool; γ is the reward discount coefficient; L(θ) represents the loss function for training the evaluation network.
For the evaluation network, the input is the current environment state s and the output is the action a; the parameter θ of the network is updated in the direction that increases the output value of the evaluation network, as shown in the following formula:
θ ← θ - α·∇θ L(θ),
where α is the learning rate. The parameters of the target network are temporarily frozen and are updated only every time a certain number of training steps is reached, θ⁻ ← θ.
The small unmanned aerial vehicle prevention and control command decision model is trained with the DQN algorithm, specifically programmed in Python 3.8 with the PyTorch deep learning framework; the target network and the evaluation network both adopt a neural network architecture formed by fully connected layers, with 3 fully connected layers in total and 200, 100 and 50 neurons respectively; the upper limit of training is set to 10000 rounds, and the maximum number of steps per round is set to 10^5. When the defense success rate of the strategy exceeds a certain threshold, specifically when 270 or more successful rounds are reached in every 300 training rounds, training is stopped and the neural network model parameters at that moment are stored.
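The stopping rule described above (at least 270 successful defenses in every 300 training rounds, with at most 10000 rounds of 10^5 steps) can be sketched as follows; the environment and agent interfaces are hypothetical.

```python
from collections import deque

def train_decision_model(env, agent, max_rounds=10000, max_steps=10**5,
                         window=300, success_threshold=270):
    """Train until the defense success rate reaches the threshold, then save the model."""
    recent = deque(maxlen=window)                   # success flags of the last 300 rounds
    for round_idx in range(max_rounds):
        s = env.reset()
        success = False
        for _ in range(max_steps):
            a = agent.act(s)                        # action from the evaluation network (e.g. epsilon-greedy)
            s_next, r, done, info = env.step(a)
            agent.store(s, a, r, s_next, done)      # experience pool
            agent.learn()                           # one DQN update (cf. the update sketch above)
            s = s_next
            if done:
                success = info.get("defense_success", False)
                break
        recent.append(success)
        if len(recent) == window and sum(recent) >= success_threshold:
            agent.save("uav_control_dqn.pt")        # store the neural network model parameters
            return round_idx
    return max_rounds
```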
And 5, verifying and evaluating the effect of the decision model. The prevention and control command decision model obtained by training is loaded in a typical small unmanned aerial vehicle prevention and control battle scene; a decision is made according to the state space s obtained in real time from the scene to obtain a real-time unmanned aerial vehicle prevention and control strategy, and the disposal equipment action a is applied in the scene to change the environment state and obtain real-time reward feedback.
Fig. 3 is a composition diagram of the reinforcement learning-based small unmanned aerial vehicle prevention and control command decision system of the present invention, which includes: the system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module.
The data fusion module is used for fusing data acquired by detecting the prevention and control environment and the target by the multi-type detection means; aiming at different types of prevention and control targets, information extraction, management, compilation and the like are carried out on prevention and control environment elements, prevention and control elements and disposal elements;
the situation analysis module is used for carrying out attribute analysis and judgment and threat assessment on the multi-source target data; performing attribute analysis and judgment on multi-source target data in the whole process of prevention and control judgment, and constructing a threat level model for threat assessment;
the control planning module is used for providing automatic treatment decision support for the unmanned aerial vehicle control specific tasks and resource planning activities; by adopting the small unmanned aerial vehicle prevention and control decision method based on reinforcement learning, the composition of a small unmanned aerial vehicle prevention and control system is clarified, and an internal model of the small unmanned aerial vehicle prevention and control system is constructed so as to extract combat situation information; designing a state space, an action space and a reward function, and constructing a small unmanned aerial vehicle prevention and control command decision model; training and optimizing a prevention and control command decision model to obtain a prevention and control disposal strategy, and verifying and evaluating the effect of the decision model;
the effect evaluation module is used for evaluating relevant disposal strategies and effects of unmanned aerial vehicle prevention and control and providing real-time feedback for unmanned aerial vehicle prevention and control command decision actions; and analyzing and processing the real-time prevention and control environment situation, the prevention and control target damage degree and the specific attack condition of the prevention and control treatment equipment.
An application method of a small unmanned aerial vehicle prevention and control command decision system based on reinforcement learning comprises the following steps:
s1: for different types of prevention and control targets, the data fusion module fuses the data acquired by the multi-type detection means from detection of the prevention and control environment and the targets, based on information extraction, management and compilation of the prevention and control environment elements, prevention and control elements and disposal elements;
s2: the situation analysis module, oriented to the whole prevention and control judgment process, performs attribute analysis and judgment on the multi-source target data and constructs a threat level model for threat assessment, so as to grasp the threat degree of the relevant targets, and uploads the threat judgment information to the prevention and control planning module;
s3: the control planning module adopts the small unmanned aerial vehicle control decision method based on reinforcement learning to make clear the composition of the small unmanned aerial vehicle control system and construct an internal model of the small unmanned aerial vehicle control system so as to extract the combat situation information; designing a state space, an action space and a reward function, and constructing a small unmanned aerial vehicle prevention and control command decision model; training and optimizing a prevention and control command decision model to obtain a prevention and control disposal strategy, and verifying and evaluating the effect of the decision model; the finally obtained small unmanned aerial vehicle prevention and control command decision model can be used for providing automatic disposal decision support for unmanned aerial vehicle prevention and control specific tasks and resource planning activities;
s4: the effect evaluation module analyzes and processes the real-time prevention and control environment situation, the damage degree of the prevention and control target and the specific striking situation of the prevention and control disposal equipment, is used for evaluating the relevant disposal strategies and effects of the prevention and control of the unmanned aerial vehicle, and provides real-time feedback for the decision-making action of the prevention and control command of the unmanned aerial vehicle.
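To show how the four modules cooperate in steps S1 to S4, a minimal, hypothetical Python wiring is sketched below; every class and method name in it is an assumption used only to illustrate the data flow, not an interface defined by the present application.

class CommandControlPipeline:
    """Hypothetical wiring of the four command-control modules (steps S1-S4)."""
    def __init__(self, fusion, situation, planner, evaluator):
        self.fusion = fusion        # multi-source data fusion module
        self.situation = situation  # situation analysis / threat assessment module
        self.planner = planner      # reinforcement-learning prevention and control planning module
        self.evaluator = evaluator  # effect evaluation module

    def step(self, sensor_reports):
        track = self.fusion.fuse(sensor_reports)           # S1: fuse multi-sensor detections
        threat = self.situation.assess(track)              # S2: attribute judgment and threat level
        action = self.planner.decide(track, threat)        # S3: DQN-based disposal decision
        feedback = self.evaluator.evaluate(track, action)  # S4: real-time effect feedback
        return action, feedback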
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (5)

1. A small unmanned aerial vehicle prevention and control command decision method based on reinforcement learning is characterized by comprising the following steps:
s1, determining the composition of a small unmanned aerial vehicle prevention and control system, wherein the small unmanned aerial vehicle prevention and control system comprises a detection subsystem, a disposal subsystem and a command control system; the detection subsystem is used for providing combat situation information, the disposal subsystem is responsible for implementing prevention and control disposal, and the command control system is used for receiving the combat situation information from the detection subsystem and scheduling a plurality of disposal means to generate a disposal strategy; the detection subsystem comprises single-type or multi-type detection equipment, and the disposal subsystem comprises multiple types of soft-kill disposal equipment and hard-interception disposal equipment; the command control system comprises a multi-source data fusion module, a situation analysis module, a prevention and control planning module and an effect evaluation module;
s2, establishing a three-degree-of-freedom particle motion model of the small unmanned aerial vehicle;
s3, constructing a small unmanned aerial vehicle prevention and control command decision model;
s4, training and optimizing a small unmanned aerial vehicle prevention and control command decision model;
s5, verifying and evaluating the prevention and control effect of the small unmanned aerial vehicle prevention and control command decision model;
the step S3 specifically comprises: the disposal equipment of the unmanned aerial vehicle prevention and control system comprises laser interception equipment and radio interference equipment, wherein the actions of the laser equipment comprise four actions of turning on the laser equipment, turning off the laser equipment, keeping the equipment state and adjusting the laser pointing direction, and the actions of the radio interference equipment comprise four actions of turning on interference, turning off interference, keeping the action and adjusting the interference pointing direction; the various actions of the disposal equipment are encoded with three binary bits, of which the first bit represents the type of equipment and the last two bits represent the corresponding specific action of the equipment, so that the action taken by the disposal equipment of the prevention and control system is represented by a triplet formed from the three binary bits;
according to the characteristics of the small unmanned aerial vehicle prevention and control task and the Markov decision process, a small unmanned aerial vehicle prevention and control command decision model is established, a state space and a disposal decision space are designed, and a reward function is determined according to the prevention and control intention of a small unmanned aerial vehicle prevention and control system;
the small unmanned aerial vehicle control command decision model is established by adopting a reinforcement learning algorithm, interaction between the intelligent decision model and the environment is described by adopting a Markov decision process in reinforcement learning, and the Markov decision process is realized by utilizing a state space, an action space, a reward function and a discount coefficient;
the expression of the state space S of the unmanned aerial vehicle prevention and control command decision model is as follows:
S = [d_t, v_t, θ_t, ψ_t, t_l, t_j],
wherein the distance d_t and the flight speed v_t are computed from the position coordinates of the small unmanned aerial vehicle and of the detection device:
d_t = √[(x_t − x_a)² + (y_t − y_a)² + (z_t − z_a)²],
v_t = √[(x_t − x_{t−Δt})² + (y_t − y_{t−Δt})² + (z_t − z_{t−Δt})²] / Δt,
wherein (x_t, y_t, z_t) and (x_{t−Δt}, y_{t−Δt}, z_{t−Δt}) respectively denote the position coordinates of the small unmanned aerial vehicle at time t and at time t−Δt, (x_a, y_a, z_a) denotes the position coordinates of the detection device, and Δt denotes the stepping time interval of the Markov decision process; d_t denotes the distance between the small unmanned aerial vehicle and the detection equipment at time t; v_t denotes the flight speed of the small unmanned aerial vehicle at time t; t_l denotes the light emitting time of the laser interception equipment; t_j denotes the time for which the radio interference device is switched on; θ and ψ respectively denote the pitch angle and the yaw angle of the unmanned aerial vehicle;
the expression of the action space A of the unmanned aerial vehicle prevention and control command decision model is A = [D_t, D_a1, D_a2], wherein the device type D_t takes the value 0 or 1, and the action type of the equipment is determined by the combination of the action variables D_a1 and D_a2; the specific values of the action variables [D_a1, D_a2] include the four combinations 00, 01, 10 and 11;
when the prevention and control intention of the small unmanned aerial vehicle prevention and control system is to prevent and control targets at medium and long range, the defense success condition is expressed through reward functions defined over the flight components of the small unmanned aerial vehicle, namely an angle reward function R_a, a distance reward function R_d and a speed reward function R_v, whose piecewise expressions are given as equation images in the original publication;
wherein q denotes the included angle between the speed vector of the small unmanned aerial vehicle and the line connecting the small unmanned aerial vehicle with the detection equipment, and q_m denotes the angle at which the angle reward takes its minimum positive reward value; the angle reward function R_a takes two different reward values depending on whether the unmanned aerial vehicle lies within or outside the line-of-sight angle range of the detection equipment, is minimum when the angle q is 0 and maximum when the angle q is π; the distance reward function R_d is expressed by a linear function of the distance, wherein k is a smoothing coefficient that keeps the distance reward function at the minimum positive reward value, and d_f and d_c respectively denote the maximum radius of the prevention and control area of the small unmanned aerial vehicle and the minimum detection distance of the detection equipment; the speed reward function R_v uses two reward coefficients corresponding to the flight speed of the small unmanned aerial vehicle being below a certain flight speed threshold or above the maximum flight speed threshold, wherein v_min, v_max and v_xh respectively denote the minimum flight speed, the maximum flight speed and the cruising flight speed of the small unmanned aerial vehicle;
r is to bea,RdAnd RvAnd performing weighted summation to obtain an expression of a reward function R of the small unmanned aerial vehicle prevention and control command decision model, wherein the expression specifically comprises the following steps:
R=a1·Ra+a2·Rd+a3·Rv
wherein, a1,a2,a3The weights corresponding to the angle reward function, the distance reward function and the speed reward function can be obtained according to empirical values, and satisfy constraint conditions: a is1+a2+a3=1,a1,a2,a3≥0。
2. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the detection subsystem comprises radar detection equipment, photoelectric detection equipment and radio detection equipment, and the treatment subsystem comprises radio interference equipment and laser interception equipment.
3. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the step S2 specifically includes: taking the small unmanned aerial vehicle as particles, and establishing a three-degree-of-freedom particle motion model:
dx/dt = v·cosθ·cosψ, dy/dt = v·cosθ·sinψ, dz/dt = v·sinθ,
wherein (x, y, z) represents the coordinates of the small unmanned aerial vehicle in a three-dimensional space coordinate system of the ground, v, theta and psi respectively represent the flight speed, the pitch angle and the yaw angle of the small unmanned aerial vehicle, and t represents time.
4. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1,
the step S4 specifically comprises: training the small unmanned aerial vehicle prevention and control command decision model by using a deep Q-network (DQN) algorithm until the model can generate prevention and control disposal strategies for driving away, damaging and striking small unmanned aerial vehicles executing different tasks; when the defense success rate of the strategy exceeds a certain threshold, training is stopped and the neural network model parameters at that moment are saved, thereby completing the training and optimization of the small unmanned aerial vehicle prevention and control command decision model;
in the DQN algorithm, a value evaluation network and a value target network are constructed; the output of the value evaluation network is denoted Q(s, a | θ), its inputs are the handling action variable a taken at the previous moment and the state variable s at the current moment, its output determines the handling action variable to be taken at the next moment, and its parameters are θ; the value evaluation network updates and optimizes its parameters θ by minimizing the difference between its own state-action value and the state-action value given by the value target network, the Q(s, a | θ) value being produced directly by the network; the output of the value target network is denoted Q'(s, a | θ⁻), its inputs are the handling action variable a taken at the previous moment and the state variable s at the current moment, and its parameters are θ⁻; the training target is formed from the value target network output Q' and the reward r_j, with the specific expressions:
y_j = r_j + γ·max_{a_{j+1}} Q'(s_{j+1}, a_{j+1} | θ⁻),
L(θ) = E_j[(y_j − Q(s_j, a_j | θ))²],
wherein the index j denotes the j-th data item in the dataset sampled from the experience pool, r_j denotes the reward corresponding to the j-th data item, s_j and a_j denote the state variable and the handling action variable of the j-th data item, and s_{j+1} and a_{j+1} denote the state variable and the handling action variable corresponding to the (j+1)-th data item in the sampled dataset; Q'(s_{j+1}, a_{j+1} | θ⁻) denotes the value target network output corresponding to the j-th data item; γ is the reward discount factor; L(θ) denotes the loss function used in training the value evaluation network with parameter θ; max_{a_{j+1}} Q'(s_{j+1}, a_{j+1} | θ⁻) denotes the maximum value output by the value target network after action a_{j+1} is taken in state s_{j+1}; and L(θ) is the least-squares error between the target value formed from the value target network and the value predicted by the value evaluation network;
for the value evaluation network, the parameter θ is updated in the direction that increases the value output by the value evaluation network, a process expressed as:
θ ← θ + α·(y_j − Q(s_j, a_j | θ))·∇_θ Q(s_j, a_j | θ),
wherein ∇_θ Q(s_j, a_j | θ) denotes the gradient of the Q-value function with respect to the parameter θ for the state variable s_j and action variable a_j, α is the learning rate, and following this direction is equivalent to descending the gradient ∇_θ L(θ) of the loss function L(θ) with respect to the parameter θ; by temporarily freezing the value target network parameters, the parameters of the value target network are updated only after the value evaluation network has been trained for a certain number of periods, at which point the value evaluation network parameters θ are copied to the value target network parameters θ⁻, thereby maintaining the stage-wise stationarity of the value target network;
the value target network and the value evaluation network both adopt a neural network architecture composed of fully connected layers; 3 fully connected layers are used, with 200, 100 and 50 neurons respectively.
5. The reinforcement learning-based unmanned aerial vehicle control and command decision method according to claim 1, wherein the step S5 specifically comprises: loading the small unmanned aerial vehicle prevention and control command decision model obtained by the training in step S4 in an actual small unmanned aerial vehicle prevention and control scene, making a decision according to the state space obtained in real time from that scene to obtain a handling action variable a, and applying the handling action variable a to the actual scene, thereby obtaining the small unmanned aerial vehicle prevention and control strategy in real time, changing the environment state and obtaining real-time reward feedback.
CN202110602580.1A 2021-05-31 2021-05-31 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning Active CN113268081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602580.1A CN113268081B (en) 2021-05-31 2021-05-31 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110602580.1A CN113268081B (en) 2021-05-31 2021-05-31 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113268081A CN113268081A (en) 2021-08-17
CN113268081B true CN113268081B (en) 2021-11-09

Family

ID=77233727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602580.1A Active CN113268081B (en) 2021-05-31 2021-05-31 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113268081B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239392B (en) * 2021-12-09 2023-03-24 南通大学 Unmanned aerial vehicle decision model training method, using method, equipment and medium
CN114963879B (en) * 2022-05-20 2023-11-17 中国电子科技集团公司电子科学研究院 Comprehensive control system and method for unmanned aerial vehicle
CN115017759B (en) * 2022-05-25 2023-04-07 中国航空工业集团公司沈阳飞机设计研究所 Terminal autonomic defense simulation verification platform of unmanned aerial vehicle
JP7407329B1 (en) * 2023-10-04 2023-12-28 株式会社インターネットイニシアティブ Flight guidance device and flight guidance method
CN117527135B (en) * 2024-01-04 2024-03-22 北京领云时代科技有限公司 System and method for interfering unmanned aerial vehicle communication based on deep learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190220737A1 (en) * 2018-01-17 2019-07-18 Hengshuai Yao Method of generating training data for training a neural network, method of training a neural network and using neural network for autonomous operations

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007080584A2 (en) * 2006-01-11 2007-07-19 Carmel-Haifa University Economic Corp. Ltd. Uav decision and control system
CN109445456A (en) * 2018-10-15 2019-03-08 清华大学 A kind of multiple no-manned plane cluster air navigation aid
CN111026147A (en) * 2019-12-25 2020-04-17 北京航空航天大学 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning
CN112215283A (en) * 2020-10-12 2021-01-12 中国人民解放军海军航空大学 Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN112797846A (en) * 2020-12-22 2021-05-14 中国船舶重工集团公司第七0九研究所 Unmanned aerial vehicle prevention and control method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Neural Network-based Intelligent Decision-Making in the Air-Offensive Campaign with Simulation; Gang Hu; 16th International Conference on Computational Intelligence and Security; 20201130; full text *
Deep Q-Network Learning Based on Action Space Noise; 吴夏铭; Journal of Changchun University of Science and Technology (Natural Science Edition); 20200831; full text *
Research on UAV Air Combat Maneuver Decision-Making Based on a Reinforced Genetic Algorithm; 谢建峰; Journal of Northwestern Polytechnical University; 20201231; full text *

Also Published As

Publication number Publication date
CN113268081A (en) 2021-08-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant