CN111666631A - Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning - Google Patents


Info

Publication number
CN111666631A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
decision
enemy
hesitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010497478.5A
Other languages
Chinese (zh)
Inventor
Ding Yong (丁勇)
He Jin (何金)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010497478.5A priority Critical patent/CN111666631A/en
Publication of CN111666631A publication Critical patent/CN111666631A/en
Pending legal-status Critical Current

Classifications

    • G06F 30/15 Vehicle, aircraft or watercraft design (Computer-aided design [CAD], Geometric CAD)
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2111/04 Constraint-based CAD
    • G06F 2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G06F 2111/10 Numerical modelling
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning. First, a UAV air-combat motion model is established, and a decision model based on a weighted optimization target is built from the attack parameters of the two sides and the situation-based energy parameter difference. Second, the optimal weights of the optimization-target decision model are determined in real time by the maximum deviation method according to hesitation fuzzy theory. Next, the state space and action space for air-combat maneuver-decision reinforcement learning are constructed. Then, the UAV states at multiple moments are merged into a state set as the neural-network input, and a dynamic deep Q network is constructed and trained for UAV maneuver decisions. Finally, the trained dynamic deep Q network yields the optimal maneuver decision. The method mainly solves the problem of UAV maneuver decisions under incomplete environmental information, takes the influence of the air-combat process into account during decision making, and better meets the requirements of actual air combat.

Description

Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning
Technical Field
The invention belongs to the field of unmanned aerial vehicle air combat decision making, and particularly relates to an unmanned aerial vehicle maneuver decision making method combining hesitation fuzzy and dynamic deep reinforcement learning.
Background Art
An Unmanned Combat Aerial Vehicle (UCAV) must decide an optimal tactical scheme or maneuver according to complex battlefield situation information during air combat, and the quality of the UAV's decision mechanism is key to successfully completing the air-combat task. As the air-combat environment becomes increasingly complex and uncertain, raising the intelligence level of UAVs, so that they can autonomously perceive the battlefield environment, generate control commands, and complete maneuver selection in air combat, is a main research direction of current UAV air-combat research.
In recent years, with the rapid development of artificial intelligence, deep learning and reinforcement learning have shown huge potential in the field of UAV air-combat decision making. In reinforcement learning, the aircraft obtains rewards through interaction with the environment, learns how to adapt to the environment on the principle of maximizing the obtained reward, and updates and stores the learned experience in a Q-value table. In actual air combat, when there are too many states and the dimension is too high, a Q-value table is clearly unsuitable; a deep reinforcement learning algorithm that replaces the table with a neural network fitting the Q-value function solves this problem, but such methods mainly target air-combat maneuver decisions under the condition that the environment parameters are known. In the actual air-combat decision process, however, different air-combat situations place different requirements on the environment parameters, and each optimization-target parameter carries a certain fuzziness and inaccuracy, so these methods cannot meet the requirements.
Therefore, aiming at these problems, the invention provides a UAV maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning. The hesitation-fuzzy maximum deviation method determines the weight of each optimization target at every moment, solving the problem that the weights of multiple optimization targets in traditional reinforcement learning are fixed and unreasonable. The states at multiple moments are formed into a state set as the neural-network input, and the network is trained on the principle of maximizing the return. Using a state set takes the influence of the air-combat process on the result into account, instead of only the influence of the current moment, which better fits actual air combat.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning comprises the following steps:
Step 1: establish a UAV air-combat motion model, and build a decision model based on a weighted optimization target from the attack parameters of the two sides and the situation-based energy parameter difference.
Step 2: according to hesitation fuzzy theory, determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method.
Step 3: construct the state space and action space for air-combat maneuver-decision reinforcement learning.
Step 4: merge the UAV states at multiple moments into a state set as the neural-network input, and construct a dynamic deep Q network.
Step 5: let the instant reward for taking an action in the current state be given by the weighted-optimization-target decision model, and use this reward to train the dynamic deep Q network for UAV maneuver decisions.
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision.
The invention has the following advantages:
1. The method uses hesitation fuzzy theory and the maximum deviation method to determine the parameter weights in real time, and uses these weights to weight and sum the multiple targets, solving the problem that the weights of multiple optimization targets are fixed and unreasonable in the optimization process of traditional reinforcement learning.
2. By introducing the dynamic deep Q network and forming the multi-moment states into a state set as the neural-network input, the invention considers the influence of the air-combat process on the decision, rather than only the influence of the current moment on the result, making the decision result more reasonable.
Description of the drawings
FIG. 1 is a flow chart of the method of the invention;
FIG. 2 is a schematic diagram of the air-combat situation of the two sides;
FIG. 3 is the air-combat trajectory diagram when the enemy UAV adopts the "S-shaped maneuver" strategy;
FIG. 4 shows the curves of each optimization-target value when the enemy UAV adopts the "S-shaped maneuver" strategy;
FIG. 5 is the air-combat trajectory diagram when the enemy UAV adopts the "pure pursuit" strategy;
FIG. 6 shows the curves of each optimization-target value when the enemy UAV adopts the "pure pursuit" strategy.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning specifically comprises the following steps:
Step 1: establish the UAV air-combat motion model, and build the decision model based on a weighted optimization target from the missile attack parameters $\xi_U$, $\xi_T$ of the two sides and the situation-based energy parameter difference $\Delta W$, specifically:
(1.1) Treat the UCAV as a point mass; ignoring the specific rigid-body motion and the flight-control algorithm, describe its motion state with a three-degree-of-freedom particle model:

$$\begin{cases} \dot{x}=v\cos\theta\sin\psi \\ \dot{y}=v\cos\theta\cos\psi \\ \dot{z}=v\sin\theta \\ \dot{v}=g(\eta_x-\sin\theta) \\ \dot{\theta}=\dfrac{g}{v}\left(\eta_z\cos\phi-\cos\theta\right) \\ \dot{\psi}=\dfrac{g\,\eta_z\sin\phi}{v\cos\theta} \end{cases} \tag{1}$$

where x, y and z give the aircraft position in the inertial coordinate system; v is the flight speed; θ is the track inclination angle, i.e. the angle between the velocity and the x-O-y plane; ψ is the heading angle, i.e. the angle between v' (the projection of the velocity onto the x-O-y plane) and the y-axis; g is the local gravitational acceleration; and $[\eta_x, \eta_z, \phi]$ is the UAV control quantity, in which $\eta_x$ is the overload along the velocity direction, representing the aircraft thrust, $\eta_z$ is the overload along the normal direction, and φ is the roll angle about the velocity vector.
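For concreteness, a minimal numerical sketch of equation (1) follows, assuming simple forward-Euler integration; the function name and time step are illustrative and not from the patent.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def step_3dof(state, control, dt=0.1):
    """One forward-Euler step of the 3-DOF particle model of equation (1).

    state   = [x, y, z, v, theta, psi]: position, speed, track angle, heading
    control = [eta_x, eta_z, phi]: tangential overload, normal overload, roll
    """
    x, y, z, v, theta, psi = state
    eta_x, eta_z, phi = control
    dx = v * np.cos(theta) * np.sin(psi)
    dy = v * np.cos(theta) * np.cos(psi)
    dz = v * np.sin(theta)
    dv = G * (eta_x - np.sin(theta))
    dtheta = (G / v) * (eta_z * np.cos(phi) - np.cos(theta))
    dpsi = G * eta_z * np.sin(phi) / (v * np.cos(theta))
    return np.array([x + dt * dx, y + dt * dy, z + dt * dz,
                     v + dt * dv, theta + dt * dtheta, psi + dt * dpsi])
```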
(1.2) Establish the decision model based on a weighted optimization target according to the above model, specifically:
(1.2.1) Attack parameter modeling based on weapon performance
The aim of air combat is to shoot down the enemy while protecting one's own side, and maneuver decisions are made so that our side attains weapon-launch conditions against the enemy while preventing the enemy from attaining them against us. Therefore, based on the attack performance of airborne weapons and combining angle, distance, weapon configuration and weapon range, a new attack parameter ξ is proposed as an optimization target.
Assuming that both sides carry air-to-air missiles, the attack zones are as shown in Fig. 2, where $\varphi_U$ is the line-of-sight angle of our UAV, $\varphi_T$ is the line-of-sight angle of the target UAV, $V_U$ and $V_T$ are the flight speeds of our UAV and the target UAV respectively, and R is the distance between the two UAVs. The attack parameter $\xi_i$ of missile i is defined by equation (2) [equation image not reproduced: $\xi_i$ as a function of the distance R, the attack-parameter adjustment factor k (taken as 1 here), and $R_g$, the missile's maximum attack distance along our UAV's line-of-sight direction $\varphi_U$].
If our UCAV carries n missiles, numbered U1, U2, ..., Un, the attack parameters $\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ of all our missiles are obtained through equation (2), and the maximum value is taken as our current attack parameter against the enemy, $\xi_U$:

$$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\} \tag{3}$$

If the enemy UCAV carries m missiles, numbered T1, T2, ..., Tm, the attack parameters $\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ of all enemy missiles are obtained through equation (2), and the maximum value is taken as the enemy's current attack parameter against us, $\xi_T$:

$$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\} \tag{4}$$
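Since the image for equation (2) is not reproduced, the sketch below substitutes an assumed exponential falloff for $\xi_i$; only the max-pooling of equations (3) and (4) is taken directly from the text.

```python
import numpy as np

def missile_attack_parameter(R, R_g, k=1.0):
    # Equation (2) is an image the text does not reproduce; this assumed
    # form decays attack capability once the target lies beyond the
    # missile's maximum attack distance R_g (illustrative stand-in only).
    return float(np.exp(-k * max(R - R_g, 0.0) / R_g))

def side_attack_parameter(R, missile_ranges, k=1.0):
    """Equations (3)/(4): a side's current attack parameter is the maximum
    attack parameter over all missiles it carries."""
    return max(missile_attack_parameter(R, R_g, k) for R_g in missile_ranges)
```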
(1.2.2) Energy parameter difference modeling based on UCAV situation
A UAV completes its actions at the cost of energy, and higher energy means more freedom of action, which is more favorable for gaining the upper hand in air combat. Let $W_U$ be our energy parameter and $W_T$ the enemy's, and define the situation-based energy parameter difference $\Delta W = W_U - W_T$, where $W_U$ and $W_T$ are specifically:

$$W_U = k_1\left(E_{Up} + E_{Uk}\right) = k_1\left(m_U g z_U + \tfrac{1}{2} m_U v_U^2\right),\qquad W_T = k_2\left(E_{Tp} + E_{Tk}\right) = k_2\left(m_T g z_T + \tfrac{1}{2} m_T v_T^2\right) \tag{5}$$

where $E_{Up}$ and $E_{Uk}$ are our gravitational potential energy and kinetic energy, $m_U$ is our UAV's mass, $E_{Tp}$ and $E_{Tk}$ are the target UAV's gravitational potential energy and kinetic energy, $m_T$ is the target UAV's mass, and $k_1$ and $k_2$ are the energy adjustment parameters of our UAV and the target UAV, both taken as 1 here.
(1.2.3) Establishing the decision model based on a weighted optimization target
From the current states of the two sides, the missile attack parameters $\xi_U$, $\xi_T$ and the situation-based energy parameter difference $\Delta W$ are weighted and summed to obtain the optimization-target decision model, specifically:

$$f(\xi_U, \xi_T, \Delta W; \omega^*) = \omega_1 \xi_U + \omega_2 \xi_T + \omega_3 \Delta W \tag{6}$$

where $\omega_1$, $\omega_2$ and $\omega_3$ are the weights.
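As a usage sketch of equation (6), the function below combines the three optimization targets with the weights from step 2; the names are illustrative.

```python
def decision_reward(xi_U, xi_T, dW, w):
    """Equation (6): f = w1*xi_U + w2*xi_T + w3*dW, used as the instant
    reward r of the dynamic deep Q network in step 5. `w` = (w1, w2, w3)
    comes from the hesitant-fuzzy maximum deviation method of step 2."""
    return w[0] * xi_U + w[1] * xi_T + w[2] * dW
```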
Step 2: according to hesitation fuzzy theory, determine the optimal weights $\omega^*$ of the optimization-target decision model in real time with the maximum deviation method, specifically:
(2.1) Construct the optimization-target-weight hesitant fuzzy evaluation matrix based on the multi-moment UAV situation. The specific method is as follows:
Let $A = \{A_1, A_2, \ldots, A_n\}$ be the decision set of the UAV at n consecutive moments, and $X = (\xi_U, \xi_T, \Delta W)$ the optimization-parameter set. The hesitant fuzzy set $A_i$ about X at the i-th moment is:

$$A_i = \{\langle x_j, h_{A_i}(x_j)\rangle \mid x_j \in X\} \tag{7}$$

where $h_{A_i}(x_j)$ denotes the set of possible membership degrees under optimization target $x_j$ at the i-th moment.
When $x_j = \xi_U$, $h_{A_i}(\xi_U)$ denotes the set of possible membership degrees under the optimization target $\xi_U$ at the i-th moment, abbreviated $h_{i1}$. The larger our current attack parameter $\xi_{Ui}$ against the enemy, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i1}$:

[equation (8): image not reproduced; $h_{i1}$ is constructed from $\xi_{Ui}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \xi_T$, $h_{A_i}(\xi_T)$ denotes the set of possible membership degrees under the optimization target $\xi_T$ at the i-th moment, abbreviated $h_{i2}$. The larger the enemy's current attack parameter $\xi_{Ti}$ against us, the smaller the air-combat advantage; otherwise, the greater. The corresponding hesitant fuzzy element under this attribute is $h_{i2}$:

[equation (9): image not reproduced; $h_{i2}$ is constructed from $\xi_{Ti}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \Delta W$, $h_{A_i}(\Delta W)$ denotes the set of possible membership degrees under the optimization-target energy parameter difference ΔW at the i-th moment, abbreviated $h_{i3}$. The larger the current energy parameter difference $\Delta W_i$, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i3}$:

[equation (10): image not reproduced; $h_{i3}$ is constructed from $\Delta W_i$ with the factor coefficient γ, taken as 0.3 here]
Let H be the hesitant fuzzy decision matrix, consisting of n × 3 hesitant elements, where n is the number of selected moments:

$$H = (h_{ij})_{n \times 3} \tag{11}$$
(2.2) Determine the optimal weights of the objective function by the maximum deviation method, specifically:
For optimization target $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at the i-th moment from all other moments under this target is expressed as:

$$D_{ij}(\omega_j) = \sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}),\quad i = 1, \ldots, n,\ j = 1, \ldots, m \tag{12}$$

where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row i, column j of the hesitant fuzzy decision matrix H, $h_{kj}$ is the element in row k, column j, m is the number of optimization targets, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant elements $h_{ij}$ and $h_{kj}$, specifically defined as:

$$d(h_{ij}, h_{kj}) = \sqrt{\frac{1}{l}\sum_{s=1}^{l}\left(h_{ij}^{\sigma(s)} - h_{kj}^{\sigma(s)}\right)^2} \tag{13}$$

where l is the number of values in a hesitant element and $h^{\sigma(s)}$ is its s-th largest value.
For optimization target $x_j \in X$, the total deviation of all moments from one another under this target is expressed as:

$$D_j(\omega_j) = \sum_{i=1}^{n} D_{ij}(\omega_j) = \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) \tag{14}$$
A nonlinear model is constructed to determine the weight vector ω by maximizing the deviation values of all the optimization-target parameters, specifically:

$$\max\ D(\omega) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})\quad \text{s.t.}\ \omega_j \ge 0,\ \sum_{j=1}^{m}\omega_j^2 = 1 \tag{15}$$

To solve it, the model is converted into a constrained optimization problem, and the Lagrangian f(ω, λ) is constructed as follows:

$$f(\omega, \lambda) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) + \frac{\lambda}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) \tag{16}$$

where λ is the Lagrange multiplier. The partial derivatives of f(ω, λ) are set to zero:

$$\frac{\partial f}{\partial \omega_j} = \sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj}) + \lambda\,\omega_j = 0,\qquad \frac{\partial f}{\partial \lambda} = \frac{1}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) = 0 \tag{17}$$

Solving this system of equations yields the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_m)$, where $\omega_j$ is:

$$\omega_j = \frac{\displaystyle\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}{\sqrt{\displaystyle\sum_{j=1}^{m}\left(\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})\right)^2}} \tag{18}$$

Substituting the hesitant Euclidean distance (13) into (18) gives the explicit form (19). Then $\omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is normalized to obtain the normalized optimal weight vector $\omega^*$:

$$\omega_j^* = \frac{\omega_j}{\sum_{j=1}^{m}\omega_j},\quad j = 1, \ldots, m,\qquad \sum_{j=1}^{m}\omega_j^* = 1 \tag{20}$$
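A minimal sketch of equations (13), (18) and (20) follows, assuming the hesitant elements are supplied as equal-length, sorted sequences of membership values (the constructions of equations (8) to (10) are not reproduced in the text).

```python
import numpy as np

def hesitant_distance(h1, h2):
    """Equation (13): hesitant Euclidean distance between two hesitant
    elements, each an equal-length sequence of membership values."""
    h1, h2 = np.sort(h1), np.sort(h2)   # compare s-th largest values
    return float(np.sqrt(np.mean((np.asarray(h1) - np.asarray(h2)) ** 2)))

def max_deviation_weights(H):
    """Equations (18) and (20): optimal weights from an n x m hesitant
    fuzzy decision matrix H, where H[i][j] is a tuple of memberships."""
    n, m = len(H), len(H[0])
    total_dev = np.array([                      # inner sums of equation (18)
        sum(hesitant_distance(H[i][j], H[k][j])
            for i in range(n) for k in range(n))
        for j in range(m)
    ])
    w = total_dev / np.linalg.norm(total_dev)   # equation (18), unit norm
    return w / w.sum()                          # equation (20), normalized
```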
and 3, constructing a state space S and an action space A for the air combat maneuver decision reinforcement learning.
(3.1) the state space S of the air combat maneuver decision reinforcement learning comprises all two situation factors influencing the calculation of the air combat advantage function, and specifically comprises the following steps:
1) unmanned aerial vehicle line of sight angle of our party
Figure BSA0000210581380000086
And the visual line angle of the enemy unmanned aerial vehicle
Figure BSA0000210581380000087
2) The distance R of the unmanned aerial vehicles of the two enemies;
3) speed v of unmanned aerial vehicle of our partyUWith enemy unmanned aerial vehicle velocity vT
4) The height difference delta h of the unmanned aerial vehicles of the two enemies.
Selecting
Figure BSA0000210581380000088
And as a state space for air combat maneuver decision reinforcement learning, describing the air combat situation of the unmanned aerial vehicle at the current moment.
(3.2) The final flight trajectory of the UAV in air combat can be regarded as the combination of the maneuvers decided at each step. Seven basic air-combat maneuvers of the UAV are selected here, specifically: 1) maintain the current flight; 2) maximum-acceleration straight flight; 3) maximum-overload left turn; 4) maximum-overload right turn; 5) maximum-overload climb; 6) maximum-overload dive; 7) maximum-deceleration flight.
With the UAV air-combat motion model of equation (1) and the control quantity designed as $[\eta_x, \eta_z, \phi]$, the control quantities corresponding to the seven basic maneuvers are respectively:
1) maintain the current flight: $[\eta_x, \eta_z, \phi] = [0, 1, 0]$;
2) maximum-acceleration straight flight: $[\eta_{x\max}, 1, 0]$;
3) maximum-overload left turn: [equation image not reproduced];
4) maximum-overload right turn: [equation image not reproduced];
5) maximum-overload climb: [equation image not reproduced];
6) maximum-overload dive: [equation image not reproduced];
7) maximum-deceleration flight: $[-\eta_{x\max}, 1, 0]$;
where $\eta_{x\max}$ is the maximum overload in the velocity direction, i.e. the maximum thrust, and $\eta_{z\max}$ is the maximum overload in the normal direction, i.e. the maximum normal overload.
The control quantities of the seven maneuvers are denoted $a_i$, i = 1, 2, ..., 7, giving the action space of the air-combat decision $A = \{a_1, a_2, \ldots, a_7\}$; an assumed concrete mapping is sketched below.
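Because the control-vector images for maneuvers 3) to 6) are not reproduced, the following sketch assumes a common convention: turns roll by $\pm\arccos(1/\eta_{z\max})$ at maximum normal overload (which keeps the turn level under equation (1)), and climb/dive apply $\pm\eta_{z\max}$ with wings level. The overload limits are illustrative values, not taken from the patent.

```python
import numpy as np

ETA_X_MAX = 2.0   # assumed maximum tangential overload (illustrative)
ETA_Z_MAX = 8.0   # assumed maximum normal overload (illustrative)

PHI_TURN = np.arccos(1.0 / ETA_Z_MAX)  # roll angle for a level max-overload turn

# Action space A = {a_1, ..., a_7}: one [eta_x, eta_z, phi] control per maneuver.
ACTIONS = np.array([
    [0.0,        1.0,        0.0],        # 1) maintain current flight
    [ETA_X_MAX,  1.0,        0.0],        # 2) max acceleration
    [0.0,        ETA_Z_MAX, -PHI_TURN],   # 3) max-overload left turn (assumed)
    [0.0,        ETA_Z_MAX,  PHI_TURN],   # 4) max-overload right turn (assumed)
    [0.0,        ETA_Z_MAX,  0.0],        # 5) max-overload climb (assumed)
    [0.0,       -ETA_Z_MAX,  0.0],        # 6) max-overload dive (assumed)
    [-ETA_X_MAX, 1.0,        0.0],        # 7) max deceleration
])
```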
Step 4: take the state set formed by the current state vector and the state vectors of the previous two moments as the neural-network input, and construct the dynamic deep Q network, specifically:
(4.1) Establish the dynamic deep Q learning network and initialize its parameters. [Table of initialization parameters: image not reproduced.]
(4.2) The state set $(s_{t-2}, s_{t-1}, s_t)$ formed by the current state vector and the state vectors of the previous two moments is the input of the neural network; the output is the action value $Q((s_{t-2}, s_{t-1}, s_t), a; \theta)$ of every action, where a is the action taken by the agent in that state, θ is the network weight, and the states $s_{t-2}$, $s_{t-1}$ and $s_t$ are specifically:

$$s_k = [\varphi_{Uk}, \varphi_{Tk}, R_k, v_{Uk}, v_{Tk}, \Delta h_k],\quad k = t-2,\ t-1,\ t \tag{21}$$
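A sketch of the dynamic deep Q network in PyTorch follows, assuming a plain fully connected body; the hidden sizes are assumptions since the parameter table above is not reproduced. The input concatenates the three 6-dimensional states of equation (21), and the output is one Q value per maneuver in A.

```python
import torch
import torch.nn as nn

class DynamicDQN(nn.Module):
    """Q((s_{t-2}, s_{t-1}, s_t), a; theta): maps a stacked state set of
    three 6-dimensional states to 7 action values, one per basic maneuver."""
    def __init__(self, state_dim=6, history=3, n_actions=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * history, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state_set):      # state_set: (batch, 18)
        return self.net(state_set)     # -> (batch, 7) action values
```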
Step 5: let the instant reward obtained by taking the action in the current state be $r = f(\xi_U, \xi_T, \Delta W; \omega^*)$, where $f(\xi_U, \xi_T, \Delta W; \omega^*)$ is given by the optimization-target decision model of equation (6), and use the reward r to train the dynamic deep Q network for UAV maneuver decisions, specifically:
(5.1) Initialize the experience pool D, whose capacity is taken as 50000 here.
(5.2) Establish the action-value estimation network Q and randomly initialize its weights θ; establish the action-value target network $\hat{Q}$ and initialize its weights $\theta^- = \theta$.
(5.3) Initialize the UAV state sequence $s_1, s_2, s_3$; the initial input to the neural network is $(s_1, s_2, s_3)$.
(5.4) For each step in the episode, select a random action $a_k$ with probability ε; otherwise select $a_k = \arg\max_a Q(s, a; \theta)$.
(5.5) The UAV executes action $a_k$; compute the reward $r_k$ at moment k and the UAV state $s_{k+1}$ at moment k+1, and store the current experience $((s_{k-2}, s_{k-1}, s_k), a_k, r_k, (s_{k-1}, s_k, s_{k+1}))$ in the experience pool D.
(5.6) Randomly draw a minibatch $D_{\min}$ from the experience pool D and compute the target value $y_k$, specifically:

$$y_k = \begin{cases} r_k, & \text{if the episode terminates at step } k+1 \\ r_k + \gamma\,\max_{a'} \hat{Q}\big((s_{k-1}, s_k, s_{k+1}), a'; \theta^-\big), & \text{otherwise} \end{cases} \tag{22}$$

where γ is the discount factor. Perform a gradient-descent step on $(y_k - Q((s_{k-2}, s_{k-1}, s_k), a_k; \theta))^2$ to update the estimation-network weights θ.
(5.7) End the training if the training episodes are finished; otherwise, jump to step (5.3).
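A condensed sketch of steps (5.1) to (5.6) follows, assuming the DynamicDQN sketch above; γ, ε, the learning rate and batch size are illustrative assumptions, and each replay entry is assumed to be a tuple (state_set, a_k, r_k, next_state_set, done).

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

GAMMA, EPSILON, BATCH = 0.9, 0.1, 32             # assumed hyperparameters

q_net, target_net = DynamicDQN(), DynamicDQN()
target_net.load_state_dict(q_net.state_dict())   # step (5.2): theta^- = theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=50000)                     # step (5.1): experience pool D

def select_action(state_set):
    """Step (5.4): epsilon-greedy selection over the 7 maneuvers."""
    if random.random() < EPSILON:
        return random.randrange(7)
    with torch.no_grad():
        return int(q_net(state_set.unsqueeze(0)).argmax())

def train_step():
    """Step (5.6): sample a minibatch and take one gradient-descent step on
    (y_k - Q((s_{k-2}, s_{k-1}, s_k), a_k; theta))^2."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([e[0] for e in batch])                 # state sets, (B, 18)
    a = torch.tensor([e[1] for e in batch])                # actions a_k
    r = torch.tensor([e[2] for e in batch])                # rewards, equation (6)
    s2 = torch.stack([e[3] for e in batch])                # next state sets
    done = torch.tensor([e[4] for e in batch], dtype=torch.float32)
    with torch.no_grad():                                  # equation (22)
        y = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```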
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision, specifically:
(6.1) Set k = 1, randomly initialize the states of the two UAVs, and obtain the initial neural-network input $(s_k, s_{k+1}, s_{k+2})$.
(6.2) Input this state set into the trained dynamic deep Q network; the network outputs the optimal action $a_k = \arg\max_a Q(s, a; \theta)$; after the UAV executes this action, the state set $(s_{k+1}, s_{k+2}, s_{k+3})$ of the next moment is obtained.
(6.3) When our UAV forms an attack condition against the enemy, the maneuver decision ends; otherwise, input $(s_{k+1}, s_{k+2}, s_{k+3})$ into the neural network and jump to step (6.2).
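A corresponding sketch of step (6.2)'s greedy inference, assuming the trained q_net above; in step (6.3) this would be called in a loop until the attack condition is formed.

```python
def optimal_maneuver(state_set):
    """Step (6.2): a_k = argmax_a Q(s, a; theta) from the trained network."""
    with torch.no_grad():
        return int(q_net(state_set.unsqueeze(0)).argmax())
```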
In order to verify the feasibility and effectiveness of the method, the invention is described in further detail below with reference to examples.
On a macOS operating system (Mojave 10.14.5; processor: 2.5 GHz Intel Core i7; memory: 16 GB 1600 MHz DDR3; graphics: Intel Iris Pro 1536 MB), an experimental environment was built in PyCharm using the Python language for the algorithm simulation, and the simulation results were exported for visualization.
The trained dynamic deep Q network is used to make maneuver decisions for our UAV in the following two cases, in which the enemy adopts classical tactical actions.
Case 1: the unmanned aerial vehicle of our party is initially in a better situation environment, and the unmanned aerial vehicle of the enemy party only adopts S-shaped maneuver to carry out tactical evasion
Fig. 3 shows an air battle locus diagram of an enemy unmanned aerial vehicle adopting a "S-shaped maneuver" strategy, wherein an upper red curve is the motion locus of the unmanned aerial vehicle of the enemy, and a lower blue curve is the motion locus of the unmanned aerial vehicle of the enemy. Coordinates of the unmanned aerial vehicle of the party are (1.2km, 8.2km and 2.4km) and coordinates of the unmanned aerial vehicle of the enemy are (1.5km, 1.5km and 1.0km) in the initial state. As can be seen from the figure, initially, the unmanned aerial vehicle of our party is positioned above the enemy to take advantage, and after the unmanned aerial vehicle of our party is properly dived down, the head is pulled up to avoid the warplane to rush, and then the line-of-sight angle is adjusted, so that the attack condition is formed and maintained.
FIG. 4 shows the variation curve of each optimized target parameter of the enemy unmanned aerial vehicle adopting the strategy of S-shaped maneuver, and the lower red curve in the diagram is the current enemy attack parameter ξ of our partyUThe middle black curve is the current attack on my party parameter ξ of the enemyTThe upper blue curve represents the energy parameter difference Δ W ═ WU-WTIt can be seen from the figure that in the decision making process, my current adversary attack parameter ξUGetting larger and the enemy attacking the parameter ξ to my partyTThe difference is gradually reduced, and the delta W is continuously reduced in the whole decision making process, so that the unmanned aerial vehicle of the party finally realizes attack on the enemy under the condition of sacrificing a certain energy advantage.
Case 2: the enemy adopts the tactics of 'pure tracking' to try to approach and attack unmanned aerial vehicle of our party
Fig. 5 shows an air battle locus diagram of an enemy unmanned aerial vehicle adopting a 'pure tracking' strategy, wherein an upper red curve is a motion locus of the unmanned aerial vehicle of the enemy, and a lower blue curve is a motion locus of the unmanned aerial vehicle of the enemy. Coordinates of the unmanned aerial vehicle of the party are (8.0km, 9.5km and 8.5km) and coordinates of the unmanned aerial vehicle of the enemy are (1.5km, 1.6km and 0.8km) in the initial state. As can be seen from the figure, initially, the unmanned aerial vehicle at one side loses tactical advantages after diving, but then lifts the machine head while flexibly avoiding towards the lower right, so that an attack condition is achieved in advance, and finally attack on an enemy is realized.
FIG. 6 shows the change curve of each optimized target by the enemy unmanned aerial vehicle adopting the 'pure tracking' strategy, and the lower red curve in the graph is the current enemy attack parameter ξ of our partyUThe middle black curve is the current attack on my party parameter ξ of the enemyTThe upper blue curve represents the energy parameter difference Δ W ═ WU-WTIt can be seen from the figure that in the decision making process, my current adversary attack parameter ξUGetting larger and the enemy attacking the parameter ξ to my partyTThe energy parameter difference delta W is gradually reduced in the whole decision making process, and the fact that the unmanned aerial vehicle of the party finally attacks the enemy under the condition that a certain energy advantage is sacrificed is shown.

Claims (4)

1. An unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning, characterized by comprising the following steps:
Step 1: establish a UAV air-combat motion model, and build a decision model based on a weighted optimization target from the attack parameters of the two sides and the situation-based energy parameter difference.
Step 2: according to hesitation fuzzy theory, determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method.
Step 3: construct the state space and action space for air-combat maneuver-decision reinforcement learning.
Step 4: merge the UAV states at multiple moments into a state set as the neural-network input, and construct a dynamic deep Q network.
Step 5: let the instant reward for taking an action in the current state be given by the weighted-optimization-target decision model, and use this reward to train the dynamic deep Q network for UAV maneuver decisions.
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision.
2. The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, characterized in that the attack parameters of the two sides and the situation-based energy parameter difference in step 1 are specifically:
(2.1) The attack parameter $\xi_i$ of missile i is defined by equation (1) [equation image not reproduced: $\xi_i$ as a function of the distance R between the two UAVs, the attack-parameter adjustment factor k, and $R_g$, the missile's maximum attack distance along our UAV's line-of-sight direction $\varphi_U$].
If our UCAV carries n missiles, numbered U1, U2, ..., Un, the attack parameters $\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ of all our missiles are obtained through equation (1), and the maximum value is taken as our current attack parameter against the enemy, $\xi_U$:

$$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\} \tag{2}$$

If the enemy UCAV carries m missiles, numbered T1, T2, ..., Tm, the attack parameters $\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ of all enemy missiles are obtained through equation (1), and the maximum value is taken as the enemy's current attack parameter against us, $\xi_T$:

$$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\} \tag{3}$$

(2.2) Define the situation-based energy parameter difference $\Delta W = W_U - W_T$, where $W_U$ is our energy parameter and $W_T$ the enemy's, specifically satisfying:

$$W_U = k_1\left(E_{Up} + E_{Uk}\right) = k_1\left(m_U g z_U + \tfrac{1}{2} m_U v_U^2\right),\qquad W_T = k_2\left(E_{Tp} + E_{Tk}\right) = k_2\left(m_T g z_T + \tfrac{1}{2} m_T v_T^2\right) \tag{4}$$

where $E_{Up}$ and $E_{Uk}$ are our gravitational potential energy and kinetic energy, $m_U$ is our UAV's mass, $E_{Tp}$ and $E_{Tk}$ are the target UAV's gravitational potential energy and kinetic energy, $m_T$ is the target UAV's mass, and $k_1$ and $k_2$ are the energy adjustment parameters of our UAV and the target UAV respectively.
3. The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, characterized in that in step 2 the optimal weights of the optimization-target decision model are determined in real time by the maximum deviation method according to hesitation fuzzy theory, specifically:
(3.1) Construct the optimization-target-weight hesitant fuzzy evaluation matrix based on the multi-moment UAV situation. The specific method is as follows:
Let $A = \{A_1, A_2, \ldots, A_n\}$ be the decision set of the UAV at n consecutive moments, and $X = (\xi_U, \xi_T, \Delta W)$ the optimization-parameter set. The hesitant fuzzy set $A_i$ about X at the i-th moment is:

$$A_i = \{\langle x_j, h_{A_i}(x_j)\rangle \mid x_j \in X\} \tag{5}$$

where $h_{A_i}(x_j)$ denotes the set of possible membership degrees under optimization target $x_j$ at the i-th moment.
When $x_j = \xi_U$, $h_{A_i}(\xi_U)$ denotes the set of possible membership degrees under the optimization target $\xi_U$ at the i-th moment, abbreviated $h_{i1}$. The larger our current attack parameter $\xi_{Ui}$ against the enemy, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i1}$:

[equation (6): image not reproduced; $h_{i1}$ is constructed from $\xi_{Ui}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \xi_T$, $h_{A_i}(\xi_T)$ denotes the set of possible membership degrees under the optimization target $\xi_T$ at the i-th moment, abbreviated $h_{i2}$. The larger the enemy's current attack parameter $\xi_{Ti}$ against us, the smaller the air-combat advantage; otherwise, the greater. The corresponding hesitant fuzzy element under this attribute is $h_{i2}$:

[equation (7): image not reproduced; $h_{i2}$ is constructed from $\xi_{Ti}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \Delta W$, $h_{A_i}(\Delta W)$ denotes the set of possible membership degrees under the optimization-target energy parameter difference ΔW at the i-th moment, abbreviated $h_{i3}$. The larger the current energy parameter difference $\Delta W_i$, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i3}$:

[equation (8): image not reproduced; $h_{i3}$ is constructed from $\Delta W_i$ with the factor coefficient γ, taken as 0.3 here]

Let H be the hesitant fuzzy decision matrix, consisting of n × 3 hesitant elements, where n is the number of selected moments:

$$H = (h_{ij})_{n \times 3} \tag{9}$$
(3.2) Determine the optimal weights of the objective function by the maximum deviation method, specifically:
For optimization target $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at the i-th moment from all other moments under this target is expressed as:

$$D_{ij}(\omega_j) = \sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}),\quad i = 1, \ldots, n,\ j = 1, \ldots, m \tag{10}$$

where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row i, column j of the hesitant fuzzy decision matrix H, $h_{kj}$ is the element in row k, column j, m is the number of optimization targets, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant elements $h_{ij}$ and $h_{kj}$, specifically defined as:

$$d(h_{ij}, h_{kj}) = \sqrt{\frac{1}{l}\sum_{s=1}^{l}\left(h_{ij}^{\sigma(s)} - h_{kj}^{\sigma(s)}\right)^2} \tag{11}$$

where l is the number of values in a hesitant element.
For optimization target $x_j \in X$, the total deviation of all moments from one another under this target is expressed as:

$$D_j(\omega_j) = \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) \tag{12}$$

A nonlinear model is constructed to determine the weight vector ω by maximizing the deviation values of all the optimization-target parameters, specifically:

$$\max\ D(\omega) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})\quad \text{s.t.}\ \omega_j \ge 0,\ \sum_{j=1}^{m}\omega_j^2 = 1 \tag{13}$$

To solve it, the model is converted into a constrained optimization problem, and the Lagrangian f(ω, λ) is constructed as follows:

$$f(\omega, \lambda) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) + \frac{\lambda}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) \tag{14}$$

where λ is the Lagrange multiplier. The partial derivatives of f(ω, λ) are set to zero:

$$\frac{\partial f}{\partial \omega_j} = \sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj}) + \lambda\,\omega_j = 0,\qquad \frac{\partial f}{\partial \lambda} = \frac{1}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) = 0 \tag{15}$$

Solving this system of equations yields the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_m)$, where $\omega_j$ is:

$$\omega_j = \frac{\displaystyle\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}{\sqrt{\displaystyle\sum_{j=1}^{m}\left(\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})\right)^2}} \tag{16}$$

Substituting the hesitant Euclidean distance (11) into (16) gives the explicit form (17). Then $\omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is normalized to obtain the normalized optimal weight vector $\omega^*$:

$$\omega_j^* = \frac{\omega_j}{\sum_{j=1}^{m}\omega_j},\quad j = 1, \ldots, m,\qquad \sum_{j=1}^{m}\omega_j^* = 1 \tag{18}$$
4. the unmanned aerial vehicle maneuver decision method combining hesitation ambiguity and dynamic deep reinforcement learning according to claim 1, wherein in the step 4, the states of the unmanned aerial vehicle at a plurality of moments are combined into a state set as a neural network input, specifically:
first, select
Figure FSA0000210581370000061
As a state space for the air war maneuver decision reinforcement learning, the air war situation of the unmanned aerial vehicle at the current moment is described, and here,
Figure FSA0000210581370000062
is the line-of-sight angle of the unmanned aerial vehicle of our party,
Figure FSA0000210581370000063
is the visual angle of the enemy unmanned aerial vehicle, R is the distance between the enemy unmanned aerial vehicle and the unmanned aerial vehicle, vUIs the speed, v, of the unmanned aerial vehicle of our partyTThe speed of the enemy unmanned aerial vehicle is shown, and delta h is the height difference of the enemy unmanned aerial vehicle and the unmanned aerial vehicle.
Then, a state set(s) composed of the current state vector and the state vectors of the previous two time instantst-2,st-1,st) As input to the neural network, each timeState of carving st-2,st-1And stRespectively satisfy:
Figure FSA0000210581370000064
CN202010497478.5A 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning Pending CN111666631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497478.5A CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010497478.5A CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN111666631A true CN111666631A (en) 2020-09-15

Family

ID=72385998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497478.5A Pending CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111666631A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319286A (en) * 2018-03-12 2018-07-24 西北工业大学 A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning
CN110796255A (en) * 2019-09-16 2020-02-14 湖州师范学院 Hesitation fuzzy multi-attribute decision method based on binary union coefficient

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Alsager: "A decision-making approach based on multi Q-dual hesitant fuzzy soft rough model", Journal of Intelligent & Fuzzy Systems *
J.A. Morente-Molinera et al.: "A dynamic group decision making process for high number of alternatives using hesitant fuzzy ontologies and sentiment analysis", Knowledge-Based Systems *
Ding Yong et al.: "UAV air combat maneuver decision based on intuitionistic fuzzy game" (基于直觉模糊博弈的无人机空战机动决策), Systems Engineering and Electronics (系统工程与电子技术) *
Zuo Jialiang: "Intelligent air combat maneuver decision based on heuristic reinforcement learning" (基于启发式强化学习的空战机动智能决策), Acta Aeronautica et Astronautica Sinica (航空学报) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150510B (en) * 2020-09-29 2024-03-26 中国人民解放军63875部队 Stepping target tracking method based on dual-depth enhancement network
CN112150510A (en) * 2020-09-29 2020-12-29 中国人民解放军63875部队 Stepping target tracking method based on double-depth enhanced network
CN112215283A (en) * 2020-10-12 2021-01-12 中国人民解放军海军航空大学 Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN112461059A (en) * 2020-10-30 2021-03-09 彩虹无人机科技有限公司 Image-seeking guided missile ground launching method
CN112595174B (en) * 2020-11-27 2022-09-13 合肥工业大学 Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment
CN112595174A (en) * 2020-11-27 2021-04-02 合肥工业大学 Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment
CN112598046A (en) * 2020-12-17 2021-04-02 沈阳航空航天大学 Target tactical intention identification method in multi-machine collaborative air combat
CN112598046B (en) * 2020-12-17 2023-09-26 沈阳航空航天大学 Target tactical intent recognition method in multi-machine cooperative air combat
CN113128021A (en) * 2021-03-12 2021-07-16 合肥工业大学 Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms
CN113128021B (en) * 2021-03-12 2022-10-25 合肥工业大学 Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms
CN113093803A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm
CN113159266B (en) * 2021-05-21 2023-07-21 大连大学 Air combat maneuver decision method based on sparrow searching neural network
CN113159266A (en) * 2021-05-21 2021-07-23 大连大学 Air combat maneuver decision method based on sparrow search neural network
CN113392396A (en) * 2021-06-11 2021-09-14 浙江工业大学 Strategy protection defense method for deep reinforcement learning
CN113625753A (en) * 2021-08-07 2021-11-09 中国航空工业集团公司沈阳飞机设计研究所 Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules
CN113625753B (en) * 2021-08-07 2023-07-07 中国航空工业集团公司沈阳飞机设计研究所 Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules
CN113741525B (en) * 2021-09-10 2024-02-06 南京航空航天大学 Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN113741525A (en) * 2021-09-10 2021-12-03 南京航空航天大学 Strategy set based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN115392444A (en) * 2022-10-31 2022-11-25 中国人民解放军国防科技大学 Parameter optimization method of unmanned aerial vehicle knowledge model combination based on reinforcement learning
CN116069056B (en) * 2022-12-15 2023-07-18 南通大学 Unmanned plane battlefield target tracking control method based on deep reinforcement learning
CN116069056A (en) * 2022-12-15 2023-05-05 南通大学 Unmanned plane battlefield target tracking control method based on deep reinforcement learning
CN116861645A (en) * 2023-06-27 2023-10-10 四川大学 Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method
CN116861645B (en) * 2023-06-27 2024-04-16 四川大学 Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method
CN117130379A (en) * 2023-07-31 2023-11-28 南通大学 LQR near vision distance-based unmanned aerial vehicle air combat attack method
CN117130379B (en) * 2023-07-31 2024-04-16 南通大学 LQR near vision distance-based unmanned aerial vehicle air combat attack method

Similar Documents

Publication Publication Date Title
CN111666631A (en) Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning
Changqiang et al. Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization
CN108319286A (en) A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning
CN110928329B (en) Multi-aircraft track planning method based on deep Q learning algorithm
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN113050686B (en) Combat strategy optimization method and system based on deep reinforcement learning
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN114063644B (en) Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning
CN113159266B (en) Air combat maneuver decision method based on sparrow searching neural network
CN115291625A (en) Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning
CN113741500B (en) Unmanned aerial vehicle air combat maneuver decision-making method for intelligent predation optimization of simulated Harris eagle
CN110673488A (en) Double DQN unmanned aerial vehicle concealed access method based on priority random sampling strategy
CN113282061A (en) Unmanned aerial vehicle air game countermeasure solving method based on course learning
CN114492805A (en) Air combat maneuver decision design method based on fuzzy reasoning
CN115755956B (en) Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system
CN115688268A (en) Aircraft near-distance air combat situation assessment adaptive weight design method
CN113671825A (en) Maneuvering intelligent decision missile avoidance method based on reinforcement learning
CN115903865A (en) Aircraft near-distance air combat maneuver decision implementation method
Duan et al. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization
Wang et al. Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction
CN111773722B (en) Method for generating maneuver strategy set for avoiding fighter plane in simulation environment
CN116432030A (en) Air combat multi-intention strategy autonomous generation method based on deep reinforcement learning
Yang et al. Ballistic missile maneuver penetration based on reinforcement learning
CN116011315A (en) Missile escape area fast calculation method based on K-sparse self-coding SVM
CN115859778A (en) Air combat maneuver decision method based on DCL-GWOO algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination