CN111666631A - Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning - Google Patents
- Publication number
- CN111666631A (application number CN202010497478.5A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial
- aerial vehicle
- decision
- enemy
- hesitation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
All classifications below fall under G: Physics; G06: Computing; calculating or counting; G06F: Electric digital data processing.
- G06F30/15: Vehicle, aircraft or watercraft design (within G06F30/00 Computer-aided design [CAD], G06F30/10 Geometric CAD)
- G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model (within G06F30/00 Computer-aided design [CAD], G06F30/20 Design optimisation, verification or simulation)
- G06F2111/04: Constraint-based CAD (within G06F2111/00 Details relating to CAD techniques)
- G06F2111/06: Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA] (within G06F2111/00)
- G06F2111/10: Numerical modelling (within G06F2111/00)
- G06F2119/14: Force analysis or force optimisation, e.g. static or dynamic forces (within G06F2119/00 Details relating to the type or aim of the analysis or the optimisation)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Aviation & Aerospace Engineering (AREA)
- Automation & Control Theory (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses an unmanned aerial vehicle (UAV) maneuver decision method combining hesitant fuzzy sets and dynamic deep reinforcement learning. First, a UAV air combat motion model is established, and a decision model based on a weighted optimization target is built from the attack parameters of both sides and a situation-based energy parameter difference. Second, the optimal weights of the optimization targets in the decision model are determined in real time with the maximum deviation method under hesitant fuzzy theory. Next, the state space and action space for air combat maneuver decision reinforcement learning are constructed. Then, the UAV states at several moments are combined into a state set that serves as the neural network input, and a dynamic deep Q network is built and trained for UAV maneuver decisions. Finally, the trained dynamic deep Q network yields the optimal maneuver decision. The method mainly addresses UAV maneuver decision-making under incomplete environmental information, takes the course of the air combat into account during decision-making, and better meets the requirements of actual air combat.
Description
Technical Field
The invention belongs to the field of UAV air combat decision-making, and in particular relates to a UAV maneuver decision method combining hesitant fuzzy sets and dynamic deep reinforcement learning.
Background Art
In air combat, an unmanned combat aerial vehicle (UCAV) must choose an optimal tactical scheme or maneuver according to complex battlefield situation information, and the quality of its decision mechanism is key to completing the air combat mission. As air combat environments become increasingly complex and uncertain, the main current research direction in UAV air combat is to raise the UAV's level of intelligence so that it can perceive the battlefield environment autonomously, generate control commands automatically, and select maneuvers in air combat on its own.
In recent years, with the rapid development of artificial intelligence, deep learning and machine learning have shown great potential in UAV air combat decision-making. Reinforcement learning is a machine learning method that needs no labeled training data: the aircraft obtains rewards by interacting with the environment, learns to adapt to the environment by seeking the maximum reward, and stores the learned experience in a Q-value table. In actual air combat, however, the number of states is large and the state dimension is high, so a Q-value table is impractical. Deep reinforcement learning algorithms solve this by replacing the Q-value table with a neural network that approximates the Q-value function, but they mainly target air combat maneuver decisions with known environmental parameters. In the actual air combat decision process, different air combat conditions place different demands on the environmental parameters, and each optimization target parameter carries some fuzziness and imprecision, so such methods cannot meet these requirements.
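For readers unfamiliar with the Q-value table mentioned above, the following minimal Python sketch shows one tabular Q-learning update. The state labels `"far"` and `"near"`, the learning rate `alpha` and the discount factor `gamma` are illustrative assumptions, not values taken from this patent.

```python
from collections import defaultdict

def q_table_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Q-value table: unseen (state, action) pairs start at 0.0
Q = defaultdict(float)
actions = range(7)                       # seven maneuvers, as in step 3
q_table_update(Q, s="far", a=1, r=1.0, s_next="near", actions=actions)
print(Q[("far", 1)])                     # 0.1
```

A deep Q network replaces the dictionary `Q` with a neural network that maps a state to the Q values of all actions, which is what makes large, continuous air combat state spaces tractable.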
To address these problems, the invention provides a UAV maneuver decision method combining hesitant fuzzy sets and dynamic deep reinforcement learning. The method uses the hesitant fuzzy maximum deviation method to determine the weight of each optimization target at every moment, which overcomes the fixed and unreasonable weighting of multiple optimization targets in traditional reinforcement learning. It also combines the states at several moments into a state set used as the neural network input and trains the network to maximize return. Using a state set accounts for the influence of the course of the air combat on the result, rather than only that of the current moment, which makes the method better suited to actual air combat.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
The UAV maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning comprises the following steps:
Step 1. Establish a UAV air combat motion model, and build a decision model based on a weighted optimization target from the attack parameters of both sides and the situation-based energy parameter difference.
Step 2. Determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method under hesitant fuzzy theory.
Step 3. Construct the state space and action space for air combat maneuver decision reinforcement learning.
Step 4. Combine the UAV states at multiple moments into a state set used as the neural network input, and construct a dynamic deep Q network.
Step 5. Take the decision model based on the weighted optimization target as the immediate reward for taking an action in the current state, and use this reward to train the dynamic deep Q network for UAV maneuver decisions.
Step 6. Input the current state set of the UAV into the trained dynamic deep Q network to obtain the optimal maneuver decision.
The invention has the following advantages:
1. Using hesitant fuzzy theory, the method determines the parameter weights in real time with the maximum deviation method and uses them to form a weighted sum of the multiple targets, which solves the problem that the weights of multiple optimization targets are fixed and unreasonable in traditional reinforcement learning.
2. By introducing a dynamic deep Q network, the invention forms the states at multiple moments into a state set used as the neural network input, so the decision accounts for the influence of the course of the air combat rather than only that of the current moment, making the decision result more reasonable.
Description of the figures
FIG. 1 is a flow chart of the method of the present invention
FIG. 2 is a schematic diagram of the air combat situation between the enemy UAV and our UAV
FIG. 3 is the air combat trajectory diagram when the enemy UAV adopts the "S-shaped maneuver" strategy
FIG. 4 shows the variation curves of each optimization target value when the enemy UAV adopts the "S-shaped maneuver" strategy
FIG. 5 is the air combat trajectory diagram when the enemy UAV adopts the "pure tracking" strategy
FIG. 6 shows the variation curves of each optimization target value when the enemy UAV adopts the "pure tracking" strategy
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the drawings.
The UAV maneuver decision method combining hesitant fuzzy sets and dynamic deep reinforcement learning specifically comprises the following steps:
Step 1. Establish a UAV air combat motion model, and build a decision model based on a weighted optimization target from the missile attack parameters $\xi_U$ and $\xi_T$ of both sides and the situation-based energy parameter difference $\Delta W$, specifically:
(1.1) The UCAV is treated as a point mass, and its motion is described with a three-degree-of-freedom point-mass model that ignores rigid-body motion and the specific flight control algorithm. The motion model is as follows:
where $x$, $y$ and $z$ are the aircraft position in the inertial frame; $v$ is the flight speed; $\theta$ is the flight-path angle, i.e., the angle between the velocity and the x-O-y plane; $\psi$ is the heading angle, i.e., the angle between the y-axis and the projection $v'$ of the velocity on the x-O-y plane; $g$ is the gravitational acceleration at the current position; and $[\eta_x, \eta_z, \phi]$ are the UAV control inputs, where $\eta_x$ is the overload along the velocity direction, representing thrust, $\eta_z$ is the overload along the aircraft's top direction, i.e., the normal overload, and $\phi$ is the roll angle about the velocity vector.
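Formula (1) is not reproduced on this page. The sketch below assumes the three-degree-of-freedom point-mass model commonly used in the UAV air combat literature, written to match the variable definitions above (heading $\psi$ measured from the y-axis, roll $\phi$ about the velocity vector). The explicit Euler integration, the time step `dt` and the constant `G` are assumptions for illustration only, not the patent's implementation.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2], assumed constant

def point_mass_step(state, control, dt=0.1, g=G):
    """One explicit-Euler step of an ASSUMED 3-DOF point-mass model.

    state   = (x, y, z, v, theta, psi): position [m], speed [m/s],
              flight-path angle theta and heading psi (from the y-axis) [rad]
    control = (eta_x, eta_z, phi): overload along the velocity, normal
              overload, roll angle about the velocity vector [rad]
    """
    x, y, z, v, theta, psi = state
    eta_x, eta_z, phi = control
    dx = v * math.cos(theta) * math.sin(psi)
    dy = v * math.cos(theta) * math.cos(psi)
    dz = v * math.sin(theta)
    dv = g * (eta_x - math.sin(theta))
    dtheta = g / v * (eta_z * math.cos(phi) - math.cos(theta))
    dpsi = g * eta_z * math.sin(phi) / (v * math.cos(theta))
    return (x + dx * dt, y + dy * dt, z + dz * dt,
            v + dv * dt, theta + dtheta * dt, psi + dpsi * dt)

# Level flight with [eta_x, eta_z, phi] = [0, 1, 0] ("keep the original flight")
s = point_mass_step((0.0, 0.0, 1000.0, 200.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

With $[0, 1, 0]$ the speed, flight-path angle and heading stay constant, so the aircraft simply advances $v \cdot dt$ along its heading.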
(1.2) establishing a decision model based on a weighted optimization target according to the model, specifically:
(1.2.1) Attack parameter modeling based on weapon performance
The purpose of air combat is to shoot down the enemy while protecting oneself, and maneuver decisions aim to create weapon launch conditions against the enemy while preventing the enemy from creating weapon launch conditions against us. The method therefore proposes a new attack parameter $\xi$ as an optimization target, based on the attack performance of the airborne weapons and combining angle, distance, weapon configuration and weapon range.
Assume that both sides carry air-to-air missiles, with attack zones as shown in Fig. 2, which marks the line-of-sight angle of our UAV and the line-of-sight angle of the target UAV; $V_U$ and $V_T$ are the flight speeds of our UAV and the target UAV, and $R$ is the distance between the two UAVs. The attack parameter $\xi_i$ of missile $i$ is defined as:
where $k$ is the attack parameter adjustment factor, taken as $k = 1$, and $R_g$ is the maximum attack range of the missile along the direction of our UAV's line-of-sight angle.
If our UCAV carries $n$ missiles, numbered $U1, U2, \dots, Un$, the attack parameters $\{\xi_{U1}, \xi_{U2}, \dots, \xi_{Un}\}$ of all our missiles are obtained from formula (2), and their maximum is taken as our current attack parameter against the enemy, $\xi_U$:
$$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \dots, \xi_{Un}\} \qquad (3)$$
If the enemy UCAV carries $m$ missiles, numbered $T1, T2, \dots, Tm$, the attack parameters $\{\xi_{T1}, \xi_{T2}, \dots, \xi_{Tm}\}$ of all enemy missiles are obtained from formula (2), and their maximum is taken as the enemy's current attack parameter against us, $\xi_T$:
$$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \dots, \xi_{Tm}\} \qquad (4)$$
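Formulas (3) and (4) take the maximum over a side's missiles. The sketch below returns both the maximum value, which is what $\xi_U$ and $\xi_T$ are defined as, and the index of the missile attaining it (the arg max), to make the distinction explicit. Because formula (2) is not reproduced here, the per-missile values and the helper name `side_attack_parameter` are hypothetical.

```python
def side_attack_parameter(missile_params):
    """Return (xi, index): the largest per-missile attack parameter and the missile attaining it."""
    if not missile_params:
        raise ValueError("a side must carry at least one missile")
    best = max(range(len(missile_params)), key=lambda i: missile_params[i])  # arg max (index)
    return missile_params[best], best                                         # max (value)

# Hypothetical per-missile values from formula (2)
xi_U, best_U = side_attack_parameter([0.2, 0.7, 0.5])   # missiles U1, U2, U3
xi_T, best_T = side_attack_parameter([0.4, 0.1])        # missiles T1, T2
```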
(1.2.2) energy parameter difference modeling based on UCAV situation
Every maneuver of the UAV consumes energy, and higher energy means more maneuver options, which makes it easier to gain the advantage in air combat. Let $W_U$ be our energy parameter and $W_T$ the enemy energy parameter, and define the situation-based energy parameter difference $\Delta W = W_U - W_T$, where $W_U$ and $W_T$ are:
where $E_{Up}$ and $E_{Uk}$ are the gravitational potential energy and kinetic energy of our UAV and $m_U$ is its mass; $E_{Tp}$ and $E_{Tk}$ are the gravitational potential energy and kinetic energy of the target UAV and $m_T$ is its mass; and $k_1$ and $k_2$ are the energy adjustment parameters of our UAV and the target UAV, both taken as 1.
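Formula (5) is not reproduced on this page. Purely as an assumption, the sketch below takes $W$ as the mass-normalized total energy $k(E_p + E_k)/m = k(gh + v^2/2)$, which is consistent with the quantities listed above (potential energy, kinetic energy, mass and an adjustment parameter set to 1) but may differ from the patent's actual formula. The function names are hypothetical.

```python
G = 9.81  # gravitational acceleration [m/s^2], assumed constant

def energy_parameter(h, v, m, k=1.0, g=G):
    """ASSUMED form of formula (5): mass-normalized total energy k * (E_p + E_k) / m."""
    e_p = m * g * h              # gravitational potential energy
    e_k = 0.5 * m * v ** 2       # kinetic energy
    return k * (e_p + e_k) / m

def energy_difference(h_u, v_u, m_u, h_t, v_t, m_t, k1=1.0, k2=1.0):
    """Delta W = W_U - W_T, with k1 = k2 = 1 as stated in the text."""
    return energy_parameter(h_u, v_u, m_u, k1) - energy_parameter(h_t, v_t, m_t, k2)

# Hypothetical states: same speed and mass, our UAV 1000 m higher
dW = energy_difference(2000.0, 200.0, 500.0, 1000.0, 200.0, 500.0)
```

Under this assumed form, two UAVs at the same altitude and speed have $\Delta W = 0$ regardless of their masses.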
(1.2.3) establishing a decision model based on a weighted optimization target
The missile attack parameters $\xi_U$ and $\xi_T$ of both sides and the situation-based energy parameter difference $\Delta W$ are computed from the current states of both sides and combined by weighted summation into the optimization-target decision model:
$$f(\xi_U, \xi_T, \Delta W; \omega^*) = \omega_1 \xi_U + \omega_2 \xi_T + \omega_3 \Delta W \qquad (6)$$
where $\omega_1$, $\omega_2$ and $\omega_3$ are the weights, which make up the weight vector $\omega^*$.
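A direct transcription of formula (6) into Python, with hypothetical input values. In step 5 this value serves as the immediate reward $r$, and the weights would be recomputed by step 2 at each moment; the signs follow the formula as printed.

```python
def decision_reward(xi_u, xi_t, delta_w, weights):
    """Formula (6) as printed: f = w1 * xi_U + w2 * xi_T + w3 * dW."""
    w1, w2, w3 = weights
    return w1 * xi_u + w2 * xi_t + w3 * delta_w

# Hypothetical inputs and weights (in the method, the weights come from step 2)
r = decision_reward(xi_u=0.7, xi_t=0.4, delta_w=0.1, weights=(0.5, 0.3, 0.2))
```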
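Step 2. Determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method under hesitant fuzzy theory, specifically:
(2.1) Construct a hesitant fuzzy evaluation matrix for the optimization-target weights based on the UAV situation at multiple moments, as follows: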
Let $A = \{A_1, A_2, \dots, A_n\}$ be the decision set of the UAV at $n$ consecutive moments and $X = (\xi_U, \xi_T, \Delta W)$ the set of optimization parameters. The hesitant fuzzy set $A_i$ on $X$ at moment $i$ is
$$A_i = \{\langle x_j, h_{A_i}(x_j) \rangle \mid x_j \in X\}$$
where $h_{A_i}(x_j)$ is the set of possible membership degrees of moment $i$ under optimization target $x_j$.
When $x_j = \xi_U$, $h_{A_i}(\xi_U)$ is the set of possible membership degrees of moment $i$ under optimization target $\xi_U$. The larger our attack parameter $\xi_{Ui}$ against the enemy at the current moment, the greater our air combat advantage, and vice versa. The corresponding hesitant fuzzy element under this attribute is:
where $\rho$ is the attack factor coefficient, taken as $\rho = 0.2$.
When $x_j = \xi_T$, $h_{A_i}(\xi_T)$ is the set of possible membership degrees of moment $i$ under optimization target $\xi_T$. The larger the enemy's attack parameter $\xi_{Ti}$ against us at the current moment, the smaller our air combat advantage, and vice versa. The corresponding hesitant fuzzy element under this attribute is:
where $\rho$ is the attack factor coefficient, taken as $\rho = 0.2$.
When $x_j = \Delta W$, $h_{A_i}(\Delta W)$ is the set of possible membership degrees of moment $i$ under the optimization target energy parameter difference $\Delta W$. The larger the energy parameter difference $\Delta W_i$ at the current moment, the greater our air combat advantage, and vice versa. The corresponding hesitant fuzzy element under this attribute is:
where $\gamma$ is the attack factor coefficient, taken as $\gamma = 0.3$.
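Let $H$ be the hesitant fuzzy decision matrix, composed of $n \times 3$ hesitant fuzzy elements $h_{ij} = h_{A_i}(x_j)$, where $n$ is the number of selected moments:
$$H = (h_{ij})_{n \times 3} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ \vdots & \vdots & \vdots \\ h_{n1} & h_{n2} & h_{n3} \end{pmatrix}$$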
(2.2) determining the optimal weight of the objective function based on a maximum deviation method, specifically:
For optimization target $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at moment $i$ from all other moments under this target is:
where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row $i$, column $j$ of the hesitant fuzzy decision matrix $H$, $h_{kj}$ is the element in row $k$, column $j$ of $H$, $m$ is the number of optimization targets, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant fuzzy elements $h_{ij}$ and $h_{kj}$ of $H$, defined as:
where $l$ is the number of values in a hesitant fuzzy element.
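The distance formula itself is not reproduced on this page. Assuming the commonly used hesitant normalized Euclidean distance, $d(h_1, h_2) = \sqrt{\frac{1}{l}\sum_{\lambda=1}^{l} \left(h_1^{\sigma(\lambda)} - h_2^{\sigma(\lambda)}\right)^2}$ with the values of each element sorted, a sketch follows. Elements of different lengths are padded with their largest value (an optimistic convention); this padding rule is an assumption, since the text only defines $l$.

```python
import math

def extend(h, length):
    """Sort a hesitant fuzzy element and pad it with its largest value (assumed convention)."""
    h = sorted(h)
    return h + [h[-1]] * (length - len(h))

def hesitant_distance(h1, h2):
    """Hesitant normalized Euclidean distance between two hesitant fuzzy elements."""
    l = max(len(h1), len(h2))
    a, b = extend(h1, l), extend(h2, l)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / l)

d = hesitant_distance([0.2, 0.4], [0.3, 0.5])   # hypothetical elements; about 0.1
```

Sorting before pairing makes the distance independent of the order in which membership values are listed.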
For optimization target $x_j \in X$, the total deviation of all moments from the others under this target is:
A nonlinear model is constructed to find the weight vector $\omega$ that maximizes the deviation over all optimization target parameters, specifically:
Solving this model is converted into a constrained optimization problem by constructing the Lagrangian function $f(\omega, \lambda)$:
where $\lambda$ is the Lagrange multiplier.
The partial derivatives of $f(\omega, \lambda)$ are:
Solving this system of equations gives the weight vector $\omega = (\omega_1, \omega_2, \dots, \omega_j, \dots, \omega_m)$, where $\omega_j$ is:
Substituting formula (14) into formula (18), this becomes:
Finally, $\omega = \{\omega_1, \omega_2, \dots, \omega_m\}$ is normalized to obtain the optimal weight vector $\omega^*$:
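The corresponding formulas are not reproduced on this page, so the sketch below assumes the usual formulation of the maximum deviation method for hesitant fuzzy information: maximize $\sum_j \omega_j D_j$ with $D_j = \sum_i \sum_k d(h_{ij}, h_{kj})$, subject to $\sum_j \omega_j^2 = 1$. The Lagrangian conditions give $\omega_j = D_j / \sqrt{\sum_j D_j^2}$, and normalizing so the weights sum to 1 gives $\omega_j^* = D_j / \sum_j D_j$. The matrix $H$ is given as nested lists; the equal-weight fallback for a matrix with no deviation and all function names are added assumptions.

```python
import math

def extend(h, length):
    """Sort a hesitant fuzzy element and pad it with its largest value (assumed convention)."""
    h = sorted(h)
    return h + [h[-1]] * (length - len(h))

def hesitant_distance(h1, h2):
    """Hesitant normalized Euclidean distance (assumed standard form)."""
    l = max(len(h1), len(h2))
    a, b = extend(h1, l), extend(h2, l)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / l)

def max_deviation_weights(H):
    """Normalized optimal weights w*_j = D_j / sum_j D_j for an n x m hesitant fuzzy matrix H."""
    n, m = len(H), len(H[0])
    D = [sum(hesitant_distance(H[i][j], H[k][j]) for i in range(n) for k in range(n))
         for j in range(m)]
    total = sum(D)
    if total == 0:                    # every moment identical: no target discriminates
        return [1.0 / m] * m
    return [d / total for d in D]

# Hypothetical 2-moment matrix; columns are (xi_U, xi_T, dW)
H = [[[0.2, 0.4], [0.5], [0.3]],
     [[0.3, 0.5], [0.5], [0.6]]]
w_star = max_deviation_weights(H)     # roughly [0.25, 0.0, 0.75]
```

A target whose membership values barely change across moments (the middle column here) receives little or no weight, which is the intent of the maximum deviation method.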
Step 3. Construct the state space $S$ and action space $A$ for air combat maneuver decision reinforcement learning.
(3.1) The state space $S$ contains the situation factors of both sides that affect the computation of the air combat advantage function, namely:
1) the line-of-sight angle of our UAV and the line-of-sight angle of the enemy UAV;
2) the distance $R$ between the two UAVs;
3) the speed $v_U$ of our UAV and the speed $v_T$ of the enemy UAV;
4) the height difference $\Delta h$ between the two UAVs.
These quantities are selected as the state space for air combat maneuver decision reinforcement learning and describe the UAV's air combat situation at the current moment.
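A sketch of how the state vector of (3.1) might be assembled from positions and velocity vectors. The definitions of the two line-of-sight angles are assumptions (each taken as the angle between that UAV's velocity vector and the line of sight from our UAV to the enemy), as is the sign convention $\Delta h = z_U - z_T$; the function names are hypothetical.

```python
import numpy as np

def angle_between(u, w):
    """Angle in [0, pi] between two 3-D vectors."""
    c = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def air_combat_state(p_u, v_u, p_t, v_t):
    """Assemble the state (phi_U, phi_T, R, v_U, v_T, dh) from positions and velocity vectors."""
    p_u, v_u, p_t, v_t = (np.asarray(a, dtype=float) for a in (p_u, v_u, p_t, v_t))
    los = p_t - p_u                       # line of sight from our UAV to the enemy UAV
    return np.array([
        angle_between(v_u, los),          # our line-of-sight angle (assumed definition)
        angle_between(v_t, los),          # enemy line-of-sight angle (assumed definition)
        np.linalg.norm(los),              # distance R
        np.linalg.norm(v_u),              # our speed v_U
        np.linalg.norm(v_t),              # enemy speed v_T
        p_u[2] - p_t[2],                  # height difference dh = z_U - z_T (assumed sign)
    ])

# Hypothetical geometry: our UAV 1 km above and 3 km behind the enemy, both flying along +y
s = air_combat_state((0, 0, 2000), (0, 200, 0), (0, 3000, 1000), (0, 250, 0))
```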
(3.2) The UAV's final trajectory in air combat can be regarded as a combination of the maneuvers decided at each step. Seven basic air combat maneuvers are selected: 1) keep the current flight; 2) straight flight at maximum acceleration; 3) maximum-overload left turn; 4) maximum-overload right turn; 5) maximum-overload climb; 6) maximum-overload dive; 7) flight at maximum deceleration.
With the UAV air combat motion model established by formula (1) and its control inputs designed as $[\eta_x, \eta_z, \phi]$, the control inputs corresponding to the seven basic maneuvers are:
1) keep the current flight: $[\eta_x, \eta_z, \phi] = [0, 1, 0]$;
where the maximum overload along the velocity direction corresponds to the maximum thrust, and the maximum overload along the aircraft's top direction is the maximum normal overload.
The control inputs of the seven maneuvers are denoted $a_i$, $i = 1, 2, \dots, 7$, and the action space of the air combat decision is $A = \{a_1, a_2, \dots, a_7\}$.
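Only the control input of maneuver 1) survives on this page. The sketch below fills in the other six with choices that are common in the literature but are assumptions here, as are the overload limits `ETA_X_MAX` and `ETA_Z_MAX`. For the level turns, the roll angle $\phi = \arccos(1/\eta_z)$ keeps $\eta_z \cos\phi = 1$, so the flight-path angle does not change in level flight under the usual point-mass model.

```python
import math

ETA_X_MAX = 2.0   # hypothetical maximum overload along the velocity (maximum thrust)
ETA_Z_MAX = 5.0   # hypothetical maximum normal overload

def action_library(eta_x_max=ETA_X_MAX, eta_z_max=ETA_Z_MAX):
    """Seven basic maneuvers a_1..a_7 as (eta_x, eta_z, phi); only a_1 is given in the text."""
    turn_roll = math.acos(1.0 / eta_z_max)   # roll for a level maximum-overload turn
    return [
        (0.0, 1.0, 0.0),                     # a1 keep the current flight (from the text)
        (eta_x_max, 1.0, 0.0),               # a2 straight flight, maximum acceleration
        (0.0, eta_z_max, -turn_roll),        # a3 maximum-overload left turn
        (0.0, eta_z_max, turn_roll),         # a4 maximum-overload right turn
        (0.0, eta_z_max, 0.0),               # a5 maximum-overload climb
        (0.0, eta_z_max, math.pi),           # a6 maximum-overload dive (inverted pull)
        (-eta_x_max, 1.0, 0.0),              # a7 maximum deceleration
    ]

A = action_library()
```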
Step 4. Combine the UAV states at multiple moments into a state set used as the neural network input, and construct the dynamic deep Q network.
(4.1) Establish the dynamic deep Q-learning network and initialize its parameters as shown in the following table.
(4.2) The state set $(s_{t-2}, s_{t-1}, s_t)$, consisting of the current state vector and the state vectors of the previous two moments, is the neural network input, and the output is the Q value of every action, $Q((s_{t-2}, s_{t-1}, s_t), a; \theta)$, where $a$ is the action taken by the agent in that state and $\theta$ is the network weight vector. The states $s_{t-2}$, $s_{t-1}$ and $s_t$ are specifically:
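One way to maintain the state set $(s_{t-2}, s_{t-1}, s_t)$ is a fixed-length buffer that drops the oldest state each time a new one arrives. Concatenating the three vectors into a single input vector is an assumption, since the page does not say how the network consumes the set; the class name `StateSet` is hypothetical.

```python
from collections import deque

import numpy as np

class StateSet:
    """Hypothetical helper holding (s_{t-2}, s_{t-1}, s_t) for the dynamic deep Q network."""
    def __init__(self, s1, s2, s3):
        self.buf = deque((np.asarray(s, dtype=float) for s in (s1, s2, s3)), maxlen=3)

    def push(self, s_next):
        """Append the newest state; the oldest one drops out automatically."""
        self.buf.append(np.asarray(s_next, dtype=float))

    def as_input(self):
        """Concatenate the three state vectors into one network input vector."""
        return np.concatenate(list(self.buf))

ss = StateSet([1, 2], [3, 4], [5, 6])   # (s_1, s_2, s_3), as in step (5.3)
ss.push([7, 8])                          # now (s_2, s_3, s_4)
```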
Step 5. Let $r$ be the immediate reward for taking an action in the current state, with $r = f(\xi_U, \xi_T, \Delta W; \omega^*)$ given by the optimization-target decision model of formula (6). The reward $r$ is used to train the dynamic deep Q network for UAV maneuver decisions, as follows:
(5.1) Initialize the experience pool $D$ with a capacity of 50000.
(5.2) Build the action-value Q estimation network and randomly initialize its weights $\theta$; build the action-value target network and initialize its weights as $\theta^- = \theta$.
(5.3) Initialize the UAV state sequence; the initial neural network input is $(s_1, s_2, s_3)$.
(5.4) At each step of the episode, select a random action $a_k$ with exploration probability $\varepsilon$; otherwise, select $a_k = \arg\max_a Q(s, a; \theta)$.
(5.5) The UAV executes action $a_k$; compute the potential-function reward $r_k$ at moment $k$ and the UAV state $s_{k+1}$ at moment $k+1$, and store the experience $((s_{k-2}, s_{k-1}, s_k), a_k, r_k, (s_{k-1}, s_k, s_{k+1}))$ in the experience pool $D$.
(5.6) Randomly sample a minibatch $D_{\min}$ from the experience pool $D$ and compute the target value $y_k$:
Perform gradient descent on $(y_k - Q((s_{k-2}, s_{k-1}, s_k), a_k; \theta))^2$ to update the Q estimation network weights $\theta$.
(5.7) If training of the current episode is finished, end training; otherwise, return to step (5.3).
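The following minimal sketch illustrates steps (5.1) to (5.6) in Python with NumPy. It is not the patent's implementation: a linear function stands in for the neural network, and the discount factor `GAMMA`, the learning rate and the use of a `done` flag in the standard DQN target are assumptions, since the page reproduces neither the target formula nor the network parameter table. The target network would be synchronized with `copy_from` at some interval, which the page also does not specify.

```python
import random
from collections import deque

import numpy as np

GAMMA = 0.9       # discount factor: assumed, not given on this page
N_ACTIONS = 7     # the seven basic maneuvers a_1..a_7

class ReplayPool:
    """Experience pool D (step 5.1) holding (state set, a, r, next state set, done)."""
    def __init__(self, capacity=50000):
        self.data = deque(maxlen=capacity)   # oldest experiences are discarded first

    def push(self, s_set, a, r, s_next_set, done):
        self.data.append((s_set, a, r, s_next_set, done))

    def sample(self, batch_size):
        return random.sample(list(self.data), batch_size)

class LinearQ:
    """Linear stand-in for the Q network: Q(x, .) = W x + b (an MLP in practice)."""
    def __init__(self, n_in, rng):
        self.W = rng.normal(0.0, 0.1, (N_ACTIONS, n_in))
        self.b = np.zeros(N_ACTIONS)

    def q(self, x):
        return self.W @ x + self.b

    def copy_from(self, other):              # theta^- <- theta
        self.W, self.b = other.W.copy(), other.b.copy()

def epsilon_greedy(net, x, eps, rng):
    """Step (5.4): random action with probability eps, otherwise argmax_a Q(x, a)."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(net.q(x)))

def td_targets(batch, target_net, gamma=GAMMA):
    """Step (5.6), standard DQN target: y = r at episode end, else r + gamma * max_a' Q^-(s', a')."""
    return np.array([r if done else r + gamma * np.max(target_net.q(s_next))
                     for _, _, r, s_next, done in batch])

def sgd_step(net, batch, ys, lr=1e-3):
    """One pass of gradient descent on 0.5 * (y - Q(s, a))^2 over the minibatch."""
    for (s, a, _, _, _), y in zip(batch, ys):
        err = net.q(s)[a] - y
        net.W[a] -= lr * err * s
        net.b[a] -= lr * err
```

Here each stored state set is the flattened vector of $(s_{k-2}, s_{k-1}, s_k)$; swapping `LinearQ` for a multilayer network changes only `q` and the gradient step.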
Step 6. Input the current state set of the UAV into the trained dynamic deep Q network to obtain the optimal maneuver decision, as follows:
(6.1) Set $k = 1$, randomly initialize the states of both UAVs, and obtain the initial network input $(s_k, s_{k+1}, s_{k+2})$.
(6.2) Input this state set into the trained dynamic deep Q network, which outputs the optimal action $a_k = \arg\max_a Q(s, a; \theta)$; after executing the action, the UAV obtains the state set $(s_{k+1}, s_{k+2}, s_{k+3})$ for the next moment.
(6.3) If our UAV has formed an attack condition against the enemy, the maneuver decision ends; otherwise, input $(s_{k+1}, s_{k+2}, s_{k+3})$ into the neural network and return to step (6.2).
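Steps (6.1) to (6.3) amount to a greedy rollout. In the sketch below, `q_values`, `step` and `attack_condition` are hypothetical callables standing for the trained network, the air combat simulator and the attack-condition test, and the `max_steps` cap is an added safeguard not mentioned in the text.

```python
import numpy as np

def run_decision(q_values, step, attack_condition, state_set, max_steps=500):
    """Greedy maneuver decisions until our UAV forms an attack condition.

    q_values(state_set) -> array of 7 Q values   (trained network, hypothetical callable)
    step(state_set, action) -> next state_set    (simulator, hypothetical)
    attack_condition(state_set) -> bool          (attack-condition test, hypothetical)
    """
    actions = []
    for _ in range(max_steps):
        a = int(np.argmax(q_values(state_set)))   # (6.2) a_k = argmax_a Q(s, a; theta)
        actions.append(a)
        state_set = step(state_set, a)            # next state set after executing a_k
        if attack_condition(state_set):           # (6.3) stop once the attack condition holds
            return actions, True
    return actions, False
```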
In order to verify the feasibility and effectiveness of the method, the invention is described in further detail below with reference to examples.
The experimental environment for algorithm simulation was built in Python on PyCharm, running on macOS (Mojave 10.14.5; processor: 2.5 GHz Intel Core i7; memory: 16 GB 1600 MHz DDR3; graphics: Intel Iris Pro 1536 MB), and the simulation results were exported for visualization.
The trained dynamic deep Q network is used to make UAV maneuver decisions in the following two cases, with both sides employing classic tactical maneuvers.
Case 1: our UAV starts in a more favorable situation, and the enemy UAV uses only the "S-shaped maneuver" for tactical evasion.
Fig. 3 shows the air combat trajectories when the enemy UAV adopts the "S-shaped maneuver" strategy; the upper red curve is the trajectory of our UAV and the lower blue curve is the trajectory of the enemy UAV. In the initial state, our UAV is at (1.2 km, 8.2 km, 2.4 km) and the enemy UAV is at (1.5 km, 1.5 km, 1.0 km). The figure shows that our UAV starts above the enemy and holds the advantage; after a moderate dive, it pulls its nose up to avoid overshooting the enemy, then adjusts its line-of-sight angle, thereby forming and maintaining the attack condition.
Fig. 4 shows the variation of each optimization target parameter when the enemy UAV adopts the "S-shaped maneuver" strategy. The lower red curve is our current attack parameter against the enemy, $\xi_U$; the middle black curve is the enemy's current attack parameter against us, $\xi_T$; and the upper blue curve is the energy parameter difference $\Delta W = W_U - W_T$. During the decision process, $\xi_U$ increases while $\xi_T$ gradually decreases, and $\Delta W$ keeps decreasing throughout, which shows that our UAV ultimately attacks the enemy at the cost of some of its energy advantage.
Case 2: the enemy uses the "pure tracking" tactic to approach and attack our UAV.
Fig. 5 shows the air combat trajectories when the enemy UAV adopts the "pure tracking" strategy; the upper red curve is the trajectory of our UAV and the lower blue curve is the trajectory of the enemy UAV. In the initial state, our UAV is at (8.0 km, 9.5 km, 8.5 km) and the enemy UAV is at (1.5 km, 1.6 km, 0.8 km). The figure shows that our UAV initially loses some tactical advantage after diving, but then pulls its nose up while maneuvering nimbly toward the lower right, reaches the attack condition first, and finally attacks the enemy.
Fig. 6 shows the variation of each optimization target when the enemy UAV adopts the "pure tracking" strategy. The lower red curve is our current attack parameter against the enemy, $\xi_U$; the middle black curve is the enemy's current attack parameter against us, $\xi_T$; and the upper blue curve is the energy parameter difference $\Delta W = W_U - W_T$. During the decision process, $\xi_U$ increases while $\xi_T$ decreases, and the energy parameter difference $\Delta W$ gradually decreases throughout, which shows that our UAV ultimately attacks the enemy at the cost of some of its energy advantage.
Claims (4)
1. An unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning, characterized by comprising the following steps:
Step 1, establish an unmanned aerial vehicle air combat motion model, and establish a decision model based on weighted optimization targets from the attack parameters of both sides and the situation-based energy parameter difference.
Step 2, determine the optimal weights of the optimization-target decision model in real time by the maximum deviation method, according to hesitant fuzzy theory.
Step 3, construct the state space and action space for air combat maneuver decision reinforcement learning.
Step 4, merge the states of the unmanned aerial vehicle at multiple moments into a state set used as the neural network input, and construct a dynamic deep Q network.
Step 5, take the decision model based on weighted optimization targets as the immediate reward obtained for taking an action in the current state, and use this reward to train the dynamic deep Q network for unmanned aerial vehicle maneuvering decisions.
Step 6, input the current state set of the unmanned aerial vehicle into the trained dynamic deep Q network to obtain the optimal maneuver decision.
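Steps 4 to 6 train a deep Q network whose immediate reward is the weighted optimization target. The exact form of the weighted decision model is not given in this excerpt, so the following is a hypothetical sketch only: it assumes a reward of the form $r = \omega_1 \xi_U - \omega_2 \xi_T + \omega_3 \Delta W$ (the minus sign on $\xi_T$ is an assumption, reflecting that a larger enemy attack parameter means a smaller advantage for us) and shows the standard DQN temporal-difference target $y = r + \gamma \max_{a'} Q(s', a')$. The function names and the discount factor $\gamma$ are illustrative, not taken from the patent.

```python
def weighted_reward(weights, xi_u, xi_t, delta_w):
    """Hypothetical immediate reward built from the three optimization targets.

    weights: (w1, w2, w3), e.g. the normalized optimal weights from step 2.
    The minus sign on xi_t is an assumption: a larger enemy attack parameter
    is treated as a smaller advantage for our side.
    """
    w1, w2, w3 = weights
    return w1 * xi_u - w2 * xi_t + w3 * delta_w


def td_target(reward, next_q_values, gamma, done=False):
    """Standard DQN target: y = r + gamma * max_a' Q_target(s', a').

    next_q_values: Q-values the target network assigns to each maneuver
    in the next state set; ignored when the episode has terminated.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Example: reward for one transition and the regression target for the Q network
r = weighted_reward((0.5, 0.3, 0.2), xi_u=0.8, xi_t=0.4, delta_w=0.1)
y = td_target(r, next_q_values=[0.1, 0.5, 0.2], gamma=0.9)
```

Because the weights are recomputed in real time (step 2), the same transition can yield a different reward at different moments; a sketch like this would simply pass the current weight vector in on each call.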
2. The unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, wherein the attack parameters of both sides and the situation-based energy parameter difference in step 1 specifically comprise:
(2.1) The attack parameter $\xi_i$ of missile i is defined as:
where k is the attack parameter adjustment factor, R is the distance between our unmanned aerial vehicle and the enemy unmanned aerial vehicle, and $R_g$ is the maximum attack distance of the missile along the line-of-sight direction of our unmanned aerial vehicle.
If our UCAV carries n missiles, numbered U1, U2, ..., Un, the attack parameters $\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ of all our missiles are obtained from formula (1), and their maximum is taken as our current attack parameter against the enemy, $\xi_U$:
$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ (2)
If the enemy UCAV carries m missiles, numbered T1, T2, ..., Tm, the attack parameters $\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ of all enemy missiles are obtained from formula (1), and their maximum is taken as the enemy's current attack parameter against us, $\xi_T$:
$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ (3)
(2.2) The situation-based energy parameter difference is defined as $\Delta W = W_U - W_T$, where $W_U$ is our energy parameter and $W_T$ is the enemy's energy parameter, which specifically satisfy:
where $E_{Up}$ and $W_{Uk}$ are the gravitational potential energy and kinetic energy of our unmanned aerial vehicle, $m_U$ is the mass of our unmanned aerial vehicle, $E_{Tp}$ and $W_{Tk}$ are the gravitational potential energy and kinetic energy of the target unmanned aerial vehicle, $m_T$ is the mass of the target unmanned aerial vehicle, and $k_1$ and $k_2$ are the energy adjustment parameters of our unmanned aerial vehicle and the target unmanned aerial vehicle, respectively.
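The aggregation in formulas (2) and (3), together with the definition of $\Delta W$, can be sketched as follows. Formula (1) for the per-missile attack parameter and formula (4) for the energy parameters are not reproduced in this excerpt, so this hypothetical sketch takes the per-missile values and $W_U$, $W_T$ as inputs instead of computing them.

```python
def current_attack_parameter(missile_params):
    """Formulas (2)/(3): a side's current attack parameter is the largest
    per-missile attack parameter. The per-missile values come from formula (1),
    which is not reproduced here, so they are taken as inputs."""
    if not missile_params:
        raise ValueError("need at least one missile attack parameter")
    return max(missile_params)


def situation_terms(our_missiles, enemy_missiles, w_u, w_t):
    """Return (xi_U, xi_T, delta_W) for one time step.

    w_u, w_t: energy parameters W_U and W_T, assumed already computed
    from formula (4)."""
    xi_u = current_attack_parameter(our_missiles)
    xi_t = current_attack_parameter(enemy_missiles)
    delta_w = w_u - w_t
    return xi_u, xi_t, delta_w
```

Note that the claim takes the maximum *value* of the attack parameters, not the index of the best missile, which is why `max` rather than an argmax is used.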
3. The unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, wherein in step 2 the optimal weights of the optimization-target decision model are determined in real time by the maximum deviation method according to hesitant fuzzy theory, specifically:
(3.1) A hesitant fuzzy evaluation matrix for the optimization-target weights is constructed from the unmanned aerial vehicle situation at multiple moments, as follows:
Let $A = \{A_1, A_2, \ldots, A_n\}$ be the decision set of the unmanned aerial vehicle at n consecutive moments and $X = \{\xi_U, \xi_T, \Delta W\}$ be the set of optimization parameters; the hesitant fuzzy set $A_i$ on X at moment i is then:
where each hesitant element denotes the set of possible membership degrees of moment i under the optimization objective $x_j$.
When $x_j$ is $\xi_U$, the hesitant element denotes the set of possible membership degrees of moment i under the optimization objective $\xi_U$. The larger our current attack parameter against the enemy, $\xi_{Ui}$, the greater our air combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy set under this attribute is specifically:
where ρ is the attack factor coefficient, taken here as ρ = 0.2.
When $x_j$ is $\xi_T$, the hesitant element denotes the set of possible membership degrees of moment i under the optimization objective $\xi_T$. The larger the enemy's current attack parameter against us, $\xi_{Ti}$, the smaller our air combat advantage; otherwise, the greater. The corresponding hesitant fuzzy set under this attribute is specifically:
where ρ is the attack factor coefficient, taken here as ρ = 0.2.
When $x_j$ is $\Delta W$, the hesitant element denotes the set of possible membership degrees of moment i under the optimization objective $\Delta W$ (the energy parameter difference). The larger the energy parameter difference $\Delta W_i$ at the current moment, the greater our air combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy set under this attribute is specifically:
Let H be the hesitant fuzzy decision matrix, composed of n × 3 hesitant elements, where n is the number of selected moments; H is specifically:
(3.2) The optimal weights of the objective function are determined by the maximum deviation method, specifically:
For optimization objective $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at moment i from all other moments under this objective is expressed as:
where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row i, column j of the hesitant fuzzy decision matrix H, $h_{kj}$ is the element in row k, column j of H, m is the number of optimization objectives, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant elements $h_{ij}$ and $h_{kj}$ of H, defined as:
where l is the number of values in each hesitant element.
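The distance formula itself did not survive extraction. The sketch below assumes the commonly used normalized hesitant Euclidean distance, $d(h_1, h_2) = \sqrt{\tfrac{1}{l}\sum_{s=1}^{l} \big(h_1^{\sigma(s)} - h_2^{\sigma(s)}\big)^2}$, in which the values of each element are sorted so that the s-th values are compared pairwise; that choice, and the function name, are assumptions rather than the patent's exact definition.

```python
import math


def hesitant_euclidean_distance(h1, h2):
    """Normalized hesitant Euclidean distance between two hesitant elements.

    Assumes both elements hold the same number l of membership values.
    Both are sorted so that corresponding (s-th) values are compared.
    """
    if len(h1) != len(h2) or not h1:
        raise ValueError("hesitant elements must be non-empty and of equal length l")
    a, b = sorted(h1), sorted(h2)
    l = len(a)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / l)
```

When two hesitant elements have different lengths, a common convention is to extend the shorter one by repeating its minimum or maximum value (a pessimistic or optimistic choice); the sketch leaves that step to the caller.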
For optimization objective $x_j \in X$, the total deviation over all moments relative to the other moments is expressed as:
A nonlinear model is constructed to determine the weight vector ω that maximizes the total deviation over all optimization objectives, specifically:
The model is solved by converting it into a constrained optimization problem and constructing the Lagrangian function $f(\omega, \lambda)$:
where λ is the Lagrange multiplier.
The partial derivatives of $f(\omega, \lambda)$ are computed as follows:
Solving this system of equations yields the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_m)$, where $\omega_j$ is:
Substituting formula (12) into formula (16), the above formula becomes:
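The equations for this derivation are not reproduced in the extracted text. As a hedged reconstruction, assuming the usual maximizing-deviation formulation with the normalization constraint $\sum_j \omega_j^2 = 1$ (the patent's exact constraint is not visible here), the chain from the deviation measures to the weights reads:

```latex
\begin{aligned}
D_{ij}(\omega) &= \sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}), \qquad
D_j(\omega) = \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) \\
\max_{\omega}\; D(\omega) &= \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})
  \quad \text{s.t. } \omega_j \ge 0,\; \sum_{j=1}^{m} \omega_j^2 = 1 \\
f(\omega, \lambda) &= \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})
  + \frac{\lambda}{2}\Big(\sum_{j=1}^{m} \omega_j^2 - 1\Big) \\
\frac{\partial f}{\partial \omega_j} &= \sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj}) + \lambda\, \omega_j = 0,
  \qquad
\frac{\partial f}{\partial \lambda} = \frac{1}{2}\Big(\sum_{j=1}^{m} \omega_j^2 - 1\Big) = 0 \\
\omega_j &= \frac{\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}
  {\sqrt{\sum_{j=1}^{m}\Big(\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})\Big)^{2}}},
  \qquad
\omega_j^{*} = \frac{\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}
  {\sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}
\end{aligned}
```

The square root in $\omega_j$ comes from substituting $\omega_j = -\tfrac{1}{\lambda}\sum_i\sum_k d(h_{ij}, h_{kj})$ into the constraint and choosing the negative root for λ so that every $\omega_j \ge 0$; $\omega_j^{*}$ is the result after normalizing the weights to sum to 1.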
Then $\omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is normalized to obtain the normalized optimal weight vector, specifically:
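A minimal sketch of the whole weighting step (3.2), assuming the normalized hesitant Euclidean distance and the closed-form normalized weight $\omega_j^{*} = S_j / \sum_j S_j$ with $S_j = \sum_i \sum_k d(h_{ij}, h_{kj})$, the usual result of the maximizing-deviation method. The function names and the equal-weight fallback when no objective separates the moments are hypothetical additions, not part of the claim.

```python
import math


def hesitant_distance(h1, h2):
    """Normalized hesitant Euclidean distance; assumes equal-length elements."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("hesitant elements must be non-empty and of equal length")
    a, b = sorted(h1), sorted(h2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def max_deviation_weights(H):
    """Normalized optimal weights from an n x m hesitant fuzzy decision matrix.

    H[i][j] is the hesitant element (list of l membership values) of moment i
    under optimization target j, e.g. j = 0, 1, 2 for xi_U, xi_T, delta_W.
    Returns omega_j* = S_j / sum_j S_j with S_j = sum_i sum_k d(h_ij, h_kj).
    """
    n, m = len(H), len(H[0])
    # Total deviation of each target over every ordered pair of moments
    s = [
        sum(hesitant_distance(H[i][j], H[k][j]) for i in range(n) for k in range(n))
        for j in range(m)
    ]
    total = sum(s)
    if total == 0:
        # No target discriminates between moments: fall back to equal weights
        # (an assumption, not specified in the patent).
        return [1.0 / m] * m
    return [sj / total for sj in s]


H = [
    [[0.1, 0.2], [0.5, 0.5], [0.3, 0.3]],  # moment 1
    [[0.3, 0.4], [0.5, 0.5], [0.4, 0.4]],  # moment 2
]
weights = max_deviation_weights(H)
```

In this toy matrix the second target never changes between moments, so it receives zero weight, while the first target, which varies most, receives the largest weight; that is exactly the intuition behind maximizing deviation.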
4. The unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, wherein in step 4 the states of the unmanned aerial vehicle at multiple moments are merged into a state set used as the neural network input, specifically:
First, a state vector consisting of the line-of-sight angle of our unmanned aerial vehicle, the visual angle of the enemy unmanned aerial vehicle, the distance R between the two unmanned aerial vehicles, our speed $v_U$, the enemy's speed $v_T$, and the height difference Δh between the two unmanned aerial vehicles is selected as the state space for air combat maneuver decision reinforcement learning; it describes the air combat situation of the unmanned aerial vehicle at the current moment.
Then, the state set $(s_{t-2}, s_{t-1}, s_t)$, composed of the current state vector and the state vectors of the previous two moments, is used as the neural network input, where the states $s_{t-2}$, $s_{t-1}$ and $s_t$ respectively satisfy:
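The state set $(s_{t-2}, s_{t-1}, s_t)$ can be maintained with a fixed-length buffer that drops the oldest state whenever a new one arrives. This is a hypothetical sketch: the class name, the ordering of the six state components, and padding with the initial state at the start of an episode are assumptions for illustration.

```python
from collections import deque


class StateSetBuffer:
    """Builds the network input (s_{t-2}, s_{t-1}, s_t) from a stream of states.

    Each state is assumed to be a 6-element vector ordered as
    (our line-of-sight angle, enemy visual angle, R, v_U, v_T, delta_h);
    the ordering is illustrative, not taken from the patent.
    """

    def __init__(self, history=3):
        self.history = history
        # deque with maxlen drops the oldest state automatically
        self.states = deque(maxlen=history)

    def reset(self, initial_state):
        # Pad with the initial state so the first input already holds
        # `history` states (an assumed convention for the episode start).
        self.states.clear()
        for _ in range(self.history):
            self.states.append(list(initial_state))

    def push(self, state):
        """Append the newest state s_t and return the flattened state set."""
        self.states.append(list(state))
        return self.network_input()

    def network_input(self):
        if len(self.states) < self.history:
            raise RuntimeError("call reset() before requesting network input")
        # Oldest first: s_{t-2}, s_{t-1}, s_t concatenated into one vector
        return [value for s in self.states for value in s]
```

Flattening the three states into one 18-element vector suits a fully connected Q network; a recurrent or convolutional variant would instead keep the three states as separate rows.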
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010497478.5A CN111666631A (en) | 2020-06-03 | 2020-06-03 | Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111666631A true CN111666631A (en) | 2020-09-15 |
Family
ID=72385998
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666631A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319286A (en) * | 2018-03-12 | 2018-07-24 | 西北工业大学 | A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning |
CN110796255A (en) * | 2019-09-16 | 2020-02-14 | 湖州师范学院 | Hesitation fuzzy multi-attribute decision method based on binary union coefficient |
Non-Patent Citations (4)
Title |
---|
ALSAGER: "A decision-making approach based on multi Q-dual hesitant fuzzy soft rough model", JOURNAL OF INTELLIGENT & FUZZY SYSTEMS * |
J.A: "A dynamic group decision making process for high number of alternatives using hesitant Fuzzy Ontologies and sentiment analysis", HERRERA-VIEDMA KNOWLEDGE-BASED SYSTEMS * |
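丁勇 (Ding Yong) et al.: "UAV air combat maneuvering decision based on intuitionistic fuzzy game", 系统工程与电子技术 (Systems Engineering and Electronics) * |
左家亮 (Zuo Jialiang): "Intelligent decision-making for air combat maneuvering based on heuristic reinforcement learning", 航空学报 (Acta Aeronautica et Astronautica Sinica) * |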
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112150510B (en) * | 2020-09-29 | 2024-03-26 | 中国人民解放军63875部队 | Stepping target tracking method based on dual-depth enhancement network |
CN112150510A (en) * | 2020-09-29 | 2020-12-29 | 中国人民解放军63875部队 | Stepping target tracking method based on double-depth enhanced network |
CN112215283A (en) * | 2020-10-12 | 2021-01-12 | 中国人民解放军海军航空大学 | Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system |
CN112461059A (en) * | 2020-10-30 | 2021-03-09 | 彩虹无人机科技有限公司 | Image-seeking guided missile ground launching method |
CN112595174B (en) * | 2020-11-27 | 2022-09-13 | 合肥工业大学 | Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment |
CN112595174A (en) * | 2020-11-27 | 2021-04-02 | 合肥工业大学 | Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment |
CN112598046A (en) * | 2020-12-17 | 2021-04-02 | 沈阳航空航天大学 | Target tactical intention identification method in multi-machine collaborative air combat |
CN112598046B (en) * | 2020-12-17 | 2023-09-26 | 沈阳航空航天大学 | Target tactical intent recognition method in multi-machine cooperative air combat |
CN113128021A (en) * | 2021-03-12 | 2021-07-16 | 合肥工业大学 | Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms |
CN113128021B (en) * | 2021-03-12 | 2022-10-25 | 合肥工业大学 | Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms |
CN113093803A (en) * | 2021-04-03 | 2021-07-09 | 西北工业大学 | Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm |
CN113159266A (en) * | 2021-05-21 | 2021-07-23 | 大连大学 | Air combat maneuver decision method based on sparrow search neural network |
CN113159266B (en) * | 2021-05-21 | 2023-07-21 | 大连大学 | Air combat maneuver decision method based on sparrow searching neural network |
CN113392396A (en) * | 2021-06-11 | 2021-09-14 | 浙江工业大学 | Strategy protection defense method for deep reinforcement learning |
CN113625753A (en) * | 2021-08-07 | 2021-11-09 | 中国航空工业集团公司沈阳飞机设计研究所 | Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules |
CN113625753B (en) * | 2021-08-07 | 2023-07-07 | 中国航空工业集团公司沈阳飞机设计研究所 | Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules |
CN113741525A (en) * | 2021-09-10 | 2021-12-03 | 南京航空航天大学 | Strategy set based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method |
CN113741525B (en) * | 2021-09-10 | 2024-02-06 | 南京航空航天大学 | Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method |
CN115392444A (en) * | 2022-10-31 | 2022-11-25 | 中国人民解放军国防科技大学 | Parameter optimization method of unmanned aerial vehicle knowledge model combination based on reinforcement learning |
CN116069056B (en) * | 2022-12-15 | 2023-07-18 | 南通大学 | Unmanned plane battlefield target tracking control method based on deep reinforcement learning |
CN116069056A (en) * | 2022-12-15 | 2023-05-05 | 南通大学 | Unmanned plane battlefield target tracking control method based on deep reinforcement learning |
CN116861645A (en) * | 2023-06-27 | 2023-10-10 | 四川大学 | Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method |
CN116861645B (en) * | 2023-06-27 | 2024-04-16 | 四川大学 | Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method |
CN117130379A (en) * | 2023-07-31 | 2023-11-28 | 南通大学 | LQR near vision distance-based unmanned aerial vehicle air combat attack method |
CN117130379B (en) * | 2023-07-31 | 2024-04-16 | 南通大学 | LQR near vision distance-based unmanned aerial vehicle air combat attack method |
CN117332680A (en) * | 2023-09-15 | 2024-01-02 | 四川大学 | Close-range air combat maneuver decision optimization method based on safety reinforcement learning |
CN117348392A (en) * | 2023-09-27 | 2024-01-05 | 四川大学 | Multi-machine short-distance air combat maneuver decision distributed optimization method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||