CN111666631A - Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning - Google Patents


Info

Publication number
CN111666631A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
decision
enemy
hesitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010497478.5A
Other languages
Chinese (zh)
Inventor
Ding Yong (丁勇)
He Jin (何金)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010497478.5A priority Critical patent/CN111666631A/en
Publication of CN111666631A publication Critical patent/CN111666631A/en
Pending legal-status Critical Current

Classifications

    • G06F 30/15 Vehicle, aircraft or watercraft design (Computer-aided design [CAD], Geometric CAD)
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2111/04 Constraint-based CAD
    • G06F 2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G06F 2111/10 Numerical modelling
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning. First, a UAV air-combat motion model is established, and a decision model based on a weighted optimization target is built from the attack parameters of the two sides and the situation-based energy parameter difference. Second, the optimal weights of the optimization-target decision model are determined in real time by the maximum deviation method according to hesitation fuzzy theory. Next, the state space and action space for air-combat maneuver-decision reinforcement learning are constructed. Then, the UAV states at multiple moments are merged into a state set as the neural-network input, and a dynamic deep Q network is constructed and trained for UAV maneuver decisions. Finally, the trained dynamic deep Q network yields the optimal maneuver decision. The method mainly solves the problem of UAV maneuver decisions under incomplete environmental information, takes the influence of the air-combat process into account during decision making, and better meets the requirements of actual air combat.

Description

Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning
Technical Field
The invention belongs to the field of unmanned aerial vehicle air combat decision making, and particularly relates to an unmanned aerial vehicle maneuver decision making method combining hesitation fuzzy and dynamic deep reinforcement learning.
Background Art
An Unmanned Combat Aerial Vehicle (UCAV) must decide an optimal tactical scheme or maneuver according to complex battlefield situation information during air combat, and the quality of the UAV's decision mechanism is key to successfully completing the air-combat task. As the air-combat environment becomes increasingly complex and uncertain, raising the intelligence level of UAVs, so that they can autonomously perceive the battlefield environment, generate control commands, and complete maneuver selection in air combat, is a main research direction of current UAV air-combat research.
In recent years, with the rapid development of artificial intelligence, deep learning and reinforcement learning have shown huge potential in the field of UAV air-combat decision making. In reinforcement learning, the aircraft obtains rewards through interaction with the environment, learns how to adapt to the environment on the principle of maximizing the obtained reward, and updates and stores the learned experience in a Q-value table. In actual air combat, when there are too many states and the dimension is too high, a Q-value table is clearly unsuitable; a deep reinforcement learning algorithm that replaces the table with a neural network fitting the Q-value function solves this problem, but such methods mainly target air-combat maneuver decisions under the condition that the environment parameters are known. In the actual air-combat decision process, however, different air-combat situations place different requirements on the environment parameters, and each optimization-target parameter carries a certain fuzziness and inaccuracy, so these methods cannot meet the requirements.
Therefore, aiming at these problems, the invention provides a UAV maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning. The hesitation-fuzzy maximum deviation method determines the weight of each optimization target at every moment, solving the problem that the weights of multiple optimization targets in traditional reinforcement learning are fixed and unreasonable. The states at multiple moments are formed into a state set as the neural-network input, and the network is trained on the principle of maximizing the return. Using a state set takes the influence of the air-combat process on the result into account, instead of only the influence of the current moment, which better fits actual air combat.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning comprises the following steps:
Step 1: establish a UAV air-combat motion model, and build a decision model based on a weighted optimization target from the attack parameters of the two sides and the situation-based energy parameter difference.
Step 2: according to hesitation fuzzy theory, determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method.
Step 3: construct the state space and action space for air-combat maneuver-decision reinforcement learning.
Step 4: merge the UAV states at multiple moments into a state set as the neural-network input, and construct a dynamic deep Q network.
Step 5: let the instant reward for taking an action in the current state be given by the weighted-optimization-target decision model, and use this reward to train the dynamic deep Q network for UAV maneuver decisions.
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision.
The invention has the following advantages:
1. The method uses hesitation fuzzy theory and the maximum deviation method to determine the parameter weights in real time, and uses these weights to weight and sum the multiple targets, solving the problem that the weights of multiple optimization targets are fixed and unreasonable in the optimization process of traditional reinforcement learning.
2. By introducing the dynamic deep Q network and forming the multi-moment states into a state set as the neural-network input, the invention considers the influence of the air-combat process on the decision, rather than only the influence of the current moment on the result, making the decision result more reasonable.
Description of the drawings
FIG. 1 is a flow chart of the method of the invention;
FIG. 2 is a schematic diagram of the air-combat situation of the two sides;
FIG. 3 is the air-combat trajectory diagram when the enemy UAV adopts the "S-shaped maneuver" strategy;
FIG. 4 shows the curves of each optimization-target value when the enemy UAV adopts the "S-shaped maneuver" strategy;
FIG. 5 is the air-combat trajectory diagram when the enemy UAV adopts the "pure pursuit" strategy;
FIG. 6 shows the curves of each optimization-target value when the enemy UAV adopts the "pure pursuit" strategy.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning specifically comprises the following steps:
Step 1: establish the UAV air-combat motion model, and build the decision model based on a weighted optimization target from the missile attack parameters $\xi_U$, $\xi_T$ of the two sides and the situation-based energy parameter difference $\Delta W$, specifically:
(1.1) Treat the UCAV as a point mass; ignoring the specific rigid-body motion and the flight-control algorithm, describe its motion state with a three-degree-of-freedom particle model:

$$\begin{cases} \dot{x}=v\cos\theta\sin\psi \\ \dot{y}=v\cos\theta\cos\psi \\ \dot{z}=v\sin\theta \\ \dot{v}=g(\eta_x-\sin\theta) \\ \dot{\theta}=\dfrac{g}{v}\left(\eta_z\cos\phi-\cos\theta\right) \\ \dot{\psi}=\dfrac{g\,\eta_z\sin\phi}{v\cos\theta} \end{cases} \tag{1}$$

where x, y and z give the aircraft position in the inertial coordinate system; v is the flight speed; θ is the track inclination angle, i.e. the angle between the velocity and the x-O-y plane; ψ is the heading angle, i.e. the angle between v' (the projection of the velocity onto the x-O-y plane) and the y-axis; g is the local gravitational acceleration; and $[\eta_x, \eta_z, \phi]$ is the UAV control quantity, in which $\eta_x$ is the overload along the velocity direction, representing the aircraft thrust, $\eta_z$ is the overload along the normal direction, and φ is the roll angle about the velocity vector.
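For concreteness, a minimal numerical sketch of equation (1) follows, assuming simple forward-Euler integration; the function name and time step are illustrative and not from the patent.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def step_3dof(state, control, dt=0.1):
    """One forward-Euler step of the 3-DOF particle model of equation (1).

    state   = [x, y, z, v, theta, psi]: position, speed, track angle, heading
    control = [eta_x, eta_z, phi]: tangential overload, normal overload, roll
    """
    x, y, z, v, theta, psi = state
    eta_x, eta_z, phi = control
    dx = v * np.cos(theta) * np.sin(psi)
    dy = v * np.cos(theta) * np.cos(psi)
    dz = v * np.sin(theta)
    dv = G * (eta_x - np.sin(theta))
    dtheta = (G / v) * (eta_z * np.cos(phi) - np.cos(theta))
    dpsi = G * eta_z * np.sin(phi) / (v * np.cos(theta))
    return np.array([x + dt * dx, y + dt * dy, z + dt * dz,
                     v + dt * dv, theta + dt * dtheta, psi + dt * dpsi])
```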
(1.2) Establish the decision model based on a weighted optimization target according to the above model, specifically:
(1.2.1) Attack parameter modeling based on weapon performance
The aim of air combat is to shoot down the enemy while protecting one's own side, and maneuver decisions are made so that our side attains weapon-launch conditions against the enemy while preventing the enemy from attaining them against us. Therefore, based on the attack performance of airborne weapons and combining angle, distance, weapon configuration and weapon range, a new attack parameter ξ is proposed as an optimization target.
Assuming that both sides carry air-to-air missiles, the attack zones are as shown in Fig. 2, where $\varphi_U$ is the line-of-sight angle of our UAV, $\varphi_T$ is the line-of-sight angle of the target UAV, $V_U$ and $V_T$ are the flight speeds of our UAV and the target UAV respectively, and R is the distance between the two UAVs. The attack parameter $\xi_i$ of missile i is defined by equation (2) [equation image not reproduced: $\xi_i$ as a function of the distance R, the attack-parameter adjustment factor k (taken as 1 here), and $R_g$, the missile's maximum attack distance along our UAV's line-of-sight direction $\varphi_U$].
If our UCAV carries n missiles, numbered U1, U2, ..., Un, the attack parameters $\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ of all our missiles are obtained through equation (2), and the maximum value is taken as our current attack parameter against the enemy, $\xi_U$:

$$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\} \tag{3}$$

If the enemy UCAV carries m missiles, numbered T1, T2, ..., Tm, the attack parameters $\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ of all enemy missiles are obtained through equation (2), and the maximum value is taken as the enemy's current attack parameter against us, $\xi_T$:

$$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\} \tag{4}$$
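Since the image for equation (2) is not reproduced, the sketch below substitutes an assumed exponential falloff for $\xi_i$; only the max-pooling of equations (3) and (4) is taken directly from the text.

```python
import numpy as np

def missile_attack_parameter(R, R_g, k=1.0):
    # Equation (2) is an image the text does not reproduce; this assumed
    # form decays attack capability once the target lies beyond the
    # missile's maximum attack distance R_g (illustrative stand-in only).
    return float(np.exp(-k * max(R - R_g, 0.0) / R_g))

def side_attack_parameter(R, missile_ranges, k=1.0):
    """Equations (3)/(4): a side's current attack parameter is the maximum
    attack parameter over all missiles it carries."""
    return max(missile_attack_parameter(R, R_g, k) for R_g in missile_ranges)
```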
(1.2.2) Energy parameter difference modeling based on UCAV situation
A UAV completes its actions at the cost of energy, and higher energy means more freedom of action, which is more favorable for gaining the upper hand in air combat. Let $W_U$ be our energy parameter and $W_T$ the enemy's, and define the situation-based energy parameter difference $\Delta W = W_U - W_T$, where $W_U$ and $W_T$ are specifically:

$$W_U = k_1\left(E_{Up} + E_{Uk}\right) = k_1\left(m_U g z_U + \tfrac{1}{2} m_U v_U^2\right),\qquad W_T = k_2\left(E_{Tp} + E_{Tk}\right) = k_2\left(m_T g z_T + \tfrac{1}{2} m_T v_T^2\right) \tag{5}$$

where $E_{Up}$ and $E_{Uk}$ are our gravitational potential energy and kinetic energy, $m_U$ is our UAV's mass, $E_{Tp}$ and $E_{Tk}$ are the target UAV's gravitational potential energy and kinetic energy, $m_T$ is the target UAV's mass, and $k_1$ and $k_2$ are the energy adjustment parameters of our UAV and the target UAV, both taken as 1 here.
(1.2.3) Establishing the decision model based on a weighted optimization target
From the current states of the two sides, the missile attack parameters $\xi_U$, $\xi_T$ and the situation-based energy parameter difference $\Delta W$ are weighted and summed to obtain the optimization-target decision model, specifically:

$$f(\xi_U, \xi_T, \Delta W; \omega^*) = \omega_1 \xi_U + \omega_2 \xi_T + \omega_3 \Delta W \tag{6}$$

where $\omega_1$, $\omega_2$ and $\omega_3$ are the weights.
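As a usage sketch of equation (6), the function below combines the three optimization targets with the weights from step 2; the names are illustrative.

```python
def decision_reward(xi_U, xi_T, dW, w):
    """Equation (6): f = w1*xi_U + w2*xi_T + w3*dW, used as the instant
    reward r of the dynamic deep Q network in step 5. `w` = (w1, w2, w3)
    comes from the hesitant-fuzzy maximum deviation method of step 2."""
    return w[0] * xi_U + w[1] * xi_T + w[2] * dW
```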
Step 2: according to hesitation fuzzy theory, determine the optimal weights $\omega^*$ of the optimization-target decision model in real time with the maximum deviation method, specifically:
(2.1) Construct the optimization-target-weight hesitant fuzzy evaluation matrix based on the multi-moment UAV situation. The specific method is as follows:
Let $A = \{A_1, A_2, \ldots, A_n\}$ be the decision set of the UAV at n consecutive moments, and $X = (\xi_U, \xi_T, \Delta W)$ the optimization-parameter set. The hesitant fuzzy set $A_i$ about X at the i-th moment is:

$$A_i = \{\langle x_j, h_{A_i}(x_j)\rangle \mid x_j \in X\} \tag{7}$$

where $h_{A_i}(x_j)$ denotes the set of possible membership degrees under optimization target $x_j$ at the i-th moment.
When $x_j = \xi_U$, $h_{A_i}(\xi_U)$ denotes the set of possible membership degrees under the optimization target $\xi_U$ at the i-th moment, abbreviated $h_{i1}$. The larger our current attack parameter $\xi_{Ui}$ against the enemy, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i1}$:

[equation (8): image not reproduced; $h_{i1}$ is constructed from $\xi_{Ui}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \xi_T$, $h_{A_i}(\xi_T)$ denotes the set of possible membership degrees under the optimization target $\xi_T$ at the i-th moment, abbreviated $h_{i2}$. The larger the enemy's current attack parameter $\xi_{Ti}$ against us, the smaller the air-combat advantage; otherwise, the greater. The corresponding hesitant fuzzy element under this attribute is $h_{i2}$:

[equation (9): image not reproduced; $h_{i2}$ is constructed from $\xi_{Ti}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \Delta W$, $h_{A_i}(\Delta W)$ denotes the set of possible membership degrees under the optimization-target energy parameter difference ΔW at the i-th moment, abbreviated $h_{i3}$. The larger the current energy parameter difference $\Delta W_i$, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i3}$:

[equation (10): image not reproduced; $h_{i3}$ is constructed from $\Delta W_i$ with the factor coefficient γ, taken as 0.3 here]
Let H be the hesitant fuzzy decision matrix, consisting of n × 3 hesitant elements, where n is the number of selected moments:

$$H = (h_{ij})_{n \times 3} \tag{11}$$
(2.2) Determine the optimal weights of the objective function by the maximum deviation method, specifically:
For optimization target $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at the i-th moment from all other moments under this target is expressed as:

$$D_{ij}(\omega_j) = \sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}),\quad i = 1, \ldots, n,\ j = 1, \ldots, m \tag{12}$$

where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row i, column j of the hesitant fuzzy decision matrix H, $h_{kj}$ is the element in row k, column j, m is the number of optimization targets, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant elements $h_{ij}$ and $h_{kj}$, specifically defined as:

$$d(h_{ij}, h_{kj}) = \sqrt{\frac{1}{l}\sum_{s=1}^{l}\left(h_{ij}^{\sigma(s)} - h_{kj}^{\sigma(s)}\right)^2} \tag{13}$$

where l is the number of values in a hesitant element and $h^{\sigma(s)}$ is its s-th largest value.
For optimization target $x_j \in X$, the total deviation of all moments from one another under this target is expressed as:

$$D_j(\omega_j) = \sum_{i=1}^{n} D_{ij}(\omega_j) = \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) \tag{14}$$
A nonlinear model is constructed to determine the weight vector ω by maximizing the deviation values of all the optimization-target parameters, specifically:

$$\max\ D(\omega) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})\quad \text{s.t.}\ \omega_j \ge 0,\ \sum_{j=1}^{m}\omega_j^2 = 1 \tag{15}$$

To solve it, the model is converted into a constrained optimization problem, and the Lagrangian f(ω, λ) is constructed as follows:

$$f(\omega, \lambda) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) + \frac{\lambda}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) \tag{16}$$

where λ is the Lagrange multiplier. The partial derivatives of f(ω, λ) are set to zero:

$$\frac{\partial f}{\partial \omega_j} = \sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj}) + \lambda\,\omega_j = 0,\qquad \frac{\partial f}{\partial \lambda} = \frac{1}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) = 0 \tag{17}$$

Solving this system of equations yields the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_m)$, where $\omega_j$ is:

$$\omega_j = \frac{\displaystyle\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}{\sqrt{\displaystyle\sum_{j=1}^{m}\left(\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})\right)^2}} \tag{18}$$

Substituting the hesitant Euclidean distance (13) into (18) gives the explicit form (19). Then $\omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is normalized to obtain the normalized optimal weight vector $\omega^*$:

$$\omega_j^* = \frac{\omega_j}{\sum_{j=1}^{m}\omega_j},\quad j = 1, \ldots, m,\qquad \sum_{j=1}^{m}\omega_j^* = 1 \tag{20}$$
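A minimal sketch of equations (13), (18) and (20) follows, assuming the hesitant elements are supplied as equal-length, sorted sequences of membership values (the constructions of equations (8) to (10) are not reproduced in the text).

```python
import numpy as np

def hesitant_distance(h1, h2):
    """Equation (13): hesitant Euclidean distance between two hesitant
    elements, each an equal-length sequence of membership values."""
    h1, h2 = np.sort(h1), np.sort(h2)   # compare s-th largest values
    return float(np.sqrt(np.mean((np.asarray(h1) - np.asarray(h2)) ** 2)))

def max_deviation_weights(H):
    """Equations (18) and (20): optimal weights from an n x m hesitant
    fuzzy decision matrix H, where H[i][j] is a tuple of memberships."""
    n, m = len(H), len(H[0])
    total_dev = np.array([                      # inner sums of equation (18)
        sum(hesitant_distance(H[i][j], H[k][j])
            for i in range(n) for k in range(n))
        for j in range(m)
    ])
    w = total_dev / np.linalg.norm(total_dev)   # equation (18), unit norm
    return w / w.sum()                          # equation (20), normalized
```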
and 3, constructing a state space S and an action space A for the air combat maneuver decision reinforcement learning.
(3.1) the state space S of the air combat maneuver decision reinforcement learning comprises all two situation factors influencing the calculation of the air combat advantage function, and specifically comprises the following steps:
1) unmanned aerial vehicle line of sight angle of our party
Figure BSA0000210581380000086
And the visual line angle of the enemy unmanned aerial vehicle
Figure BSA0000210581380000087
2) The distance R of the unmanned aerial vehicles of the two enemies;
3) speed v of unmanned aerial vehicle of our partyUWith enemy unmanned aerial vehicle velocity vT
4) The height difference delta h of the unmanned aerial vehicles of the two enemies.
Selecting
Figure BSA0000210581380000088
And as a state space for air combat maneuver decision reinforcement learning, describing the air combat situation of the unmanned aerial vehicle at the current moment.
(3.2) The final flight trajectory of the UAV in air combat can be regarded as the combination of the maneuvers decided at each step. Seven basic air-combat maneuvers of the UAV are selected here, specifically: 1) maintain the current flight; 2) maximum-acceleration straight flight; 3) maximum-overload left turn; 4) maximum-overload right turn; 5) maximum-overload climb; 6) maximum-overload dive; 7) maximum-deceleration flight.
With the UAV air-combat motion model of equation (1) and the control quantity designed as $[\eta_x, \eta_z, \phi]$, the control quantities corresponding to the seven basic maneuvers are respectively:
1) maintain the current flight: $[\eta_x, \eta_z, \phi] = [0, 1, 0]$;
2) maximum-acceleration straight flight: $[\eta_{x\max}, 1, 0]$;
3) maximum-overload left turn: [equation image not reproduced];
4) maximum-overload right turn: [equation image not reproduced];
5) maximum-overload climb: [equation image not reproduced];
6) maximum-overload dive: [equation image not reproduced];
7) maximum-deceleration flight: $[-\eta_{x\max}, 1, 0]$;
where $\eta_{x\max}$ is the maximum overload in the velocity direction, i.e. the maximum thrust, and $\eta_{z\max}$ is the maximum overload in the normal direction, i.e. the maximum normal overload.
The control quantities of the seven maneuvers are denoted $a_i$, i = 1, 2, ..., 7, giving the action space of the air-combat decision $A = \{a_1, a_2, \ldots, a_7\}$; an assumed concrete mapping is sketched below.
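Because the control-vector images for maneuvers 3) to 6) are not reproduced, the following sketch assumes a common convention: turns roll by $\pm\arccos(1/\eta_{z\max})$ at maximum normal overload (which keeps the turn level under equation (1)), and climb/dive apply $\pm\eta_{z\max}$ with wings level. The overload limits are illustrative values, not taken from the patent.

```python
import numpy as np

ETA_X_MAX = 2.0   # assumed maximum tangential overload (illustrative)
ETA_Z_MAX = 8.0   # assumed maximum normal overload (illustrative)

PHI_TURN = np.arccos(1.0 / ETA_Z_MAX)  # roll angle for a level max-overload turn

# Action space A = {a_1, ..., a_7}: one [eta_x, eta_z, phi] control per maneuver.
ACTIONS = np.array([
    [0.0,        1.0,        0.0],        # 1) maintain current flight
    [ETA_X_MAX,  1.0,        0.0],        # 2) max acceleration
    [0.0,        ETA_Z_MAX, -PHI_TURN],   # 3) max-overload left turn (assumed)
    [0.0,        ETA_Z_MAX,  PHI_TURN],   # 4) max-overload right turn (assumed)
    [0.0,        ETA_Z_MAX,  0.0],        # 5) max-overload climb (assumed)
    [0.0,       -ETA_Z_MAX,  0.0],        # 6) max-overload dive (assumed)
    [-ETA_X_MAX, 1.0,        0.0],        # 7) max deceleration
])
```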
Step 4: take the state set formed by the current state vector and the state vectors of the previous two moments as the neural-network input, and construct the dynamic deep Q network, specifically:
(4.1) Establish the dynamic deep Q learning network and initialize its parameters. [Table of initialization parameters: image not reproduced.]
(4.2) The state set $(s_{t-2}, s_{t-1}, s_t)$ formed by the current state vector and the state vectors of the previous two moments is the input of the neural network; the output is the action value $Q((s_{t-2}, s_{t-1}, s_t), a; \theta)$ of every action, where a is the action taken by the agent in that state, θ is the network weight, and the states $s_{t-2}$, $s_{t-1}$ and $s_t$ are specifically:

$$s_k = [\varphi_{Uk}, \varphi_{Tk}, R_k, v_{Uk}, v_{Tk}, \Delta h_k],\quad k = t-2,\ t-1,\ t \tag{21}$$
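A sketch of the dynamic deep Q network in PyTorch follows, assuming a plain fully connected body; the hidden sizes are assumptions since the parameter table above is not reproduced. The input concatenates the three 6-dimensional states of equation (21), and the output is one Q value per maneuver in A.

```python
import torch
import torch.nn as nn

class DynamicDQN(nn.Module):
    """Q((s_{t-2}, s_{t-1}, s_t), a; theta): maps a stacked state set of
    three 6-dimensional states to 7 action values, one per basic maneuver."""
    def __init__(self, state_dim=6, history=3, n_actions=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * history, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state_set):      # state_set: (batch, 18)
        return self.net(state_set)     # -> (batch, 7) action values
```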
Step 5: let the instant reward obtained by taking the action in the current state be $r = f(\xi_U, \xi_T, \Delta W; \omega^*)$, where $f(\xi_U, \xi_T, \Delta W; \omega^*)$ is given by the optimization-target decision model of equation (6), and use the reward r to train the dynamic deep Q network for UAV maneuver decisions, specifically:
(5.1) Initialize the experience pool D, whose capacity is taken as 50000 here.
(5.2) Establish the action-value estimation network Q and randomly initialize its weights θ; establish the action-value target network $\hat{Q}$ and initialize its weights $\theta^- = \theta$.
(5.3) Initialize the UAV state sequence $s_1, s_2, s_3$; the initial input to the neural network is $(s_1, s_2, s_3)$.
(5.4) For each step in the episode, select a random action $a_k$ with probability ε; otherwise select $a_k = \arg\max_a Q(s, a; \theta)$.
(5.5) The UAV executes action $a_k$; compute the reward $r_k$ at moment k and the UAV state $s_{k+1}$ at moment k+1, and store the current experience $((s_{k-2}, s_{k-1}, s_k), a_k, r_k, (s_{k-1}, s_k, s_{k+1}))$ in the experience pool D.
(5.6) Randomly draw a minibatch $D_{\min}$ from the experience pool D and compute the target value $y_k$, specifically:

$$y_k = \begin{cases} r_k, & \text{if the episode terminates at step } k+1 \\ r_k + \gamma\,\max_{a'} \hat{Q}\big((s_{k-1}, s_k, s_{k+1}), a'; \theta^-\big), & \text{otherwise} \end{cases} \tag{22}$$

where γ is the discount factor. Perform a gradient-descent step on $(y_k - Q((s_{k-2}, s_{k-1}, s_k), a_k; \theta))^2$ to update the estimation-network weights θ.
(5.7) End the training if the training episodes are finished; otherwise, jump to step (5.3).
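A condensed sketch of steps (5.1) to (5.6) follows, assuming the DynamicDQN sketch above; γ, ε, the learning rate and batch size are illustrative assumptions, and each replay entry is assumed to be a tuple (state_set, a_k, r_k, next_state_set, done).

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

GAMMA, EPSILON, BATCH = 0.9, 0.1, 32             # assumed hyperparameters

q_net, target_net = DynamicDQN(), DynamicDQN()
target_net.load_state_dict(q_net.state_dict())   # step (5.2): theta^- = theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=50000)                     # step (5.1): experience pool D

def select_action(state_set):
    """Step (5.4): epsilon-greedy selection over the 7 maneuvers."""
    if random.random() < EPSILON:
        return random.randrange(7)
    with torch.no_grad():
        return int(q_net(state_set.unsqueeze(0)).argmax())

def train_step():
    """Step (5.6): sample a minibatch and take one gradient-descent step on
    (y_k - Q((s_{k-2}, s_{k-1}, s_k), a_k; theta))^2."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([e[0] for e in batch])                 # state sets, (B, 18)
    a = torch.tensor([e[1] for e in batch])                # actions a_k
    r = torch.tensor([e[2] for e in batch])                # rewards, equation (6)
    s2 = torch.stack([e[3] for e in batch])                # next state sets
    done = torch.tensor([e[4] for e in batch], dtype=torch.float32)
    with torch.no_grad():                                  # equation (22)
        y = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```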
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision, specifically:
(6.1) Set k = 1, randomly initialize the states of the two UAVs, and obtain the initial neural-network input $(s_k, s_{k+1}, s_{k+2})$.
(6.2) Input this state set into the trained dynamic deep Q network; the network outputs the optimal action $a_k = \arg\max_a Q(s, a; \theta)$; after the UAV executes this action, the state set $(s_{k+1}, s_{k+2}, s_{k+3})$ of the next moment is obtained.
(6.3) When our UAV forms an attack condition against the enemy, the maneuver decision ends; otherwise, input $(s_{k+1}, s_{k+2}, s_{k+3})$ into the neural network and jump to step (6.2).
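A corresponding sketch of step (6.2)'s greedy inference, assuming the trained q_net above; in step (6.3) this would be called in a loop until the attack condition is formed.

```python
def optimal_maneuver(state_set):
    """Step (6.2): a_k = argmax_a Q(s, a; theta) from the trained network."""
    with torch.no_grad():
        return int(q_net(state_set.unsqueeze(0)).argmax())
```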
In order to verify the feasibility and effectiveness of the method, the invention is described in further detail below with reference to examples.
On a macOS operating system (Mojave 10.14.5; processor: 2.5 GHz Intel Core i7; memory: 16 GB 1600 MHz DDR3; graphics: Intel Iris Pro 1536 MB), an experimental environment was built in PyCharm using the Python language for the algorithm simulation, and the simulation results were exported for visualization.
The trained dynamic deep Q network is used to make maneuver decisions for our UAV in the following two cases, in which the enemy adopts classical tactical actions.
Case 1: the unmanned aerial vehicle of our party is initially in a better situation environment, and the unmanned aerial vehicle of the enemy party only adopts S-shaped maneuver to carry out tactical evasion
Fig. 3 shows an air battle locus diagram of an enemy unmanned aerial vehicle adopting a "S-shaped maneuver" strategy, wherein an upper red curve is the motion locus of the unmanned aerial vehicle of the enemy, and a lower blue curve is the motion locus of the unmanned aerial vehicle of the enemy. Coordinates of the unmanned aerial vehicle of the party are (1.2km, 8.2km and 2.4km) and coordinates of the unmanned aerial vehicle of the enemy are (1.5km, 1.5km and 1.0km) in the initial state. As can be seen from the figure, initially, the unmanned aerial vehicle of our party is positioned above the enemy to take advantage, and after the unmanned aerial vehicle of our party is properly dived down, the head is pulled up to avoid the warplane to rush, and then the line-of-sight angle is adjusted, so that the attack condition is formed and maintained.
FIG. 4 shows the variation curve of each optimized target parameter of the enemy unmanned aerial vehicle adopting the strategy of S-shaped maneuver, and the lower red curve in the diagram is the current enemy attack parameter ξ of our partyUThe middle black curve is the current attack on my party parameter ξ of the enemyTThe upper blue curve represents the energy parameter difference Δ W ═ WU-WTIt can be seen from the figure that in the decision making process, my current adversary attack parameter ξUGetting larger and the enemy attacking the parameter ξ to my partyTThe difference is gradually reduced, and the delta W is continuously reduced in the whole decision making process, so that the unmanned aerial vehicle of the party finally realizes attack on the enemy under the condition of sacrificing a certain energy advantage.
Case 2: the enemy adopts the tactics of 'pure tracking' to try to approach and attack unmanned aerial vehicle of our party
Fig. 5 shows an air battle locus diagram of an enemy unmanned aerial vehicle adopting a 'pure tracking' strategy, wherein an upper red curve is a motion locus of the unmanned aerial vehicle of the enemy, and a lower blue curve is a motion locus of the unmanned aerial vehicle of the enemy. Coordinates of the unmanned aerial vehicle of the party are (8.0km, 9.5km and 8.5km) and coordinates of the unmanned aerial vehicle of the enemy are (1.5km, 1.6km and 0.8km) in the initial state. As can be seen from the figure, initially, the unmanned aerial vehicle at one side loses tactical advantages after diving, but then lifts the machine head while flexibly avoiding towards the lower right, so that an attack condition is achieved in advance, and finally attack on an enemy is realized.
FIG. 6 shows the change curve of each optimized target by the enemy unmanned aerial vehicle adopting the 'pure tracking' strategy, and the lower red curve in the graph is the current enemy attack parameter ξ of our partyUThe middle black curve is the current attack on my party parameter ξ of the enemyTThe upper blue curve represents the energy parameter difference Δ W ═ WU-WTIt can be seen from the figure that in the decision making process, my current adversary attack parameter ξUGetting larger and the enemy attacking the parameter ξ to my partyTThe energy parameter difference delta W is gradually reduced in the whole decision making process, and the fact that the unmanned aerial vehicle of the party finally attacks the enemy under the condition that a certain energy advantage is sacrificed is shown.

Claims (4)

1. An unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning, characterized by comprising the following steps:
Step 1: establish a UAV air-combat motion model, and build a decision model based on a weighted optimization target from the attack parameters of the two sides and the situation-based energy parameter difference.
Step 2: according to hesitation fuzzy theory, determine the optimal weights of the optimization-target decision model in real time with the maximum deviation method.
Step 3: construct the state space and action space for air-combat maneuver-decision reinforcement learning.
Step 4: merge the UAV states at multiple moments into a state set as the neural-network input, and construct a dynamic deep Q network.
Step 5: let the instant reward for taking an action in the current state be given by the weighted-optimization-target decision model, and use this reward to train the dynamic deep Q network for UAV maneuver decisions.
Step 6: input the UAV's current state set into the trained dynamic deep Q network to obtain the optimal maneuver decision.
2. The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, characterized in that the attack parameters of the two sides and the situation-based energy parameter difference in step 1 are specifically:
(2.1) The attack parameter $\xi_i$ of missile i is defined by equation (1) [equation image not reproduced: $\xi_i$ as a function of the distance R between the two UAVs, the attack-parameter adjustment factor k, and $R_g$, the missile's maximum attack distance along our UAV's line-of-sight direction $\varphi_U$].
If our UCAV carries n missiles, numbered U1, U2, ..., Un, the attack parameters $\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\}$ of all our missiles are obtained through equation (1), and the maximum value is taken as our current attack parameter against the enemy, $\xi_U$:

$$\xi_U = \max\{\xi_{U1}, \xi_{U2}, \ldots, \xi_{Un}\} \tag{2}$$

If the enemy UCAV carries m missiles, numbered T1, T2, ..., Tm, the attack parameters $\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\}$ of all enemy missiles are obtained through equation (1), and the maximum value is taken as the enemy's current attack parameter against us, $\xi_T$:

$$\xi_T = \max\{\xi_{T1}, \xi_{T2}, \ldots, \xi_{Tm}\} \tag{3}$$

(2.2) Define the situation-based energy parameter difference $\Delta W = W_U - W_T$, where $W_U$ is our energy parameter and $W_T$ the enemy's, specifically satisfying:

$$W_U = k_1\left(E_{Up} + E_{Uk}\right) = k_1\left(m_U g z_U + \tfrac{1}{2} m_U v_U^2\right),\qquad W_T = k_2\left(E_{Tp} + E_{Tk}\right) = k_2\left(m_T g z_T + \tfrac{1}{2} m_T v_T^2\right) \tag{4}$$

where $E_{Up}$ and $E_{Uk}$ are our gravitational potential energy and kinetic energy, $m_U$ is our UAV's mass, $E_{Tp}$ and $E_{Tk}$ are the target UAV's gravitational potential energy and kinetic energy, $m_T$ is the target UAV's mass, and $k_1$ and $k_2$ are the energy adjustment parameters of our UAV and the target UAV respectively.
3. The unmanned aerial vehicle maneuver decision method combining hesitation fuzzy and dynamic deep reinforcement learning according to claim 1, characterized in that in step 2 the optimal weights of the optimization-target decision model are determined in real time by the maximum deviation method according to hesitation fuzzy theory, specifically:
(3.1) Construct the optimization-target-weight hesitant fuzzy evaluation matrix based on the multi-moment UAV situation. The specific method is as follows:
Let $A = \{A_1, A_2, \ldots, A_n\}$ be the decision set of the UAV at n consecutive moments, and $X = (\xi_U, \xi_T, \Delta W)$ the optimization-parameter set. The hesitant fuzzy set $A_i$ about X at the i-th moment is:

$$A_i = \{\langle x_j, h_{A_i}(x_j)\rangle \mid x_j \in X\} \tag{5}$$

where $h_{A_i}(x_j)$ denotes the set of possible membership degrees under optimization target $x_j$ at the i-th moment.
When $x_j = \xi_U$, $h_{A_i}(\xi_U)$ denotes the set of possible membership degrees under the optimization target $\xi_U$ at the i-th moment, abbreviated $h_{i1}$. The larger our current attack parameter $\xi_{Ui}$ against the enemy, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i1}$:

[equation (6): image not reproduced; $h_{i1}$ is constructed from $\xi_{Ui}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \xi_T$, $h_{A_i}(\xi_T)$ denotes the set of possible membership degrees under the optimization target $\xi_T$ at the i-th moment, abbreviated $h_{i2}$. The larger the enemy's current attack parameter $\xi_{Ti}$ against us, the smaller the air-combat advantage; otherwise, the greater. The corresponding hesitant fuzzy element under this attribute is $h_{i2}$:

[equation (7): image not reproduced; $h_{i2}$ is constructed from $\xi_{Ti}$ with the attack factor coefficient ρ, taken as 0.2 here]

When $x_j = \Delta W$, $h_{A_i}(\Delta W)$ denotes the set of possible membership degrees under the optimization-target energy parameter difference ΔW at the i-th moment, abbreviated $h_{i3}$. The larger the current energy parameter difference $\Delta W_i$, the greater the air-combat advantage; otherwise, the smaller. The corresponding hesitant fuzzy element under this attribute is $h_{i3}$:

[equation (8): image not reproduced; $h_{i3}$ is constructed from $\Delta W_i$ with the factor coefficient γ, taken as 0.3 here]

Let H be the hesitant fuzzy decision matrix, consisting of n × 3 hesitant elements, where n is the number of selected moments:

$$H = (h_{ij})_{n \times 3} \tag{9}$$
(3.2) Determine the optimal weights of the objective function by the maximum deviation method, specifically:
For optimization target $x_j \in X$, the deviation of the hesitant fuzzy set $A_i$ at the i-th moment from all other moments under this target is expressed as:

$$D_{ij}(\omega_j) = \sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}),\quad i = 1, \ldots, n,\ j = 1, \ldots, m \tag{10}$$

where $\omega_j$ is the weight coefficient, $h_{ij}$ is the element in row i, column j of the hesitant fuzzy decision matrix H, $h_{kj}$ is the element in row k, column j, m is the number of optimization targets, and $d(h_{ij}, h_{kj})$ is the hesitant Euclidean distance between the hesitant elements $h_{ij}$ and $h_{kj}$, specifically defined as:

$$d(h_{ij}, h_{kj}) = \sqrt{\frac{1}{l}\sum_{s=1}^{l}\left(h_{ij}^{\sigma(s)} - h_{kj}^{\sigma(s)}\right)^2} \tag{11}$$

where l is the number of values in a hesitant element.
For optimization target $x_j \in X$, the total deviation of all moments from one another under this target is expressed as:

$$D_j(\omega_j) = \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) \tag{12}$$

A nonlinear model is constructed to determine the weight vector ω by maximizing the deviation values of all the optimization-target parameters, specifically:

$$\max\ D(\omega) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj})\quad \text{s.t.}\ \omega_j \ge 0,\ \sum_{j=1}^{m}\omega_j^2 = 1 \tag{13}$$

To solve it, the model is converted into a constrained optimization problem, and the Lagrangian f(ω, λ) is constructed as follows:

$$f(\omega, \lambda) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{k=1}^{n} \omega_j\, d(h_{ij}, h_{kj}) + \frac{\lambda}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) \tag{14}$$

where λ is the Lagrange multiplier. The partial derivatives of f(ω, λ) are set to zero:

$$\frac{\partial f}{\partial \omega_j} = \sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj}) + \lambda\,\omega_j = 0,\qquad \frac{\partial f}{\partial \lambda} = \frac{1}{2}\left(\sum_{j=1}^{m}\omega_j^2 - 1\right) = 0 \tag{15}$$

Solving this system of equations yields the weight vector $\omega = (\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_m)$, where $\omega_j$ is:

$$\omega_j = \frac{\displaystyle\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})}{\sqrt{\displaystyle\sum_{j=1}^{m}\left(\sum_{i=1}^{n}\sum_{k=1}^{n} d(h_{ij}, h_{kj})\right)^2}} \tag{16}$$

Substituting the hesitant Euclidean distance (11) into (16) gives the explicit form (17). Then $\omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$ is normalized to obtain the normalized optimal weight vector $\omega^*$:

$$\omega_j^* = \frac{\omega_j}{\sum_{j=1}^{m}\omega_j},\quad j = 1, \ldots, m,\qquad \sum_{j=1}^{m}\omega_j^* = 1 \tag{18}$$
4. the unmanned aerial vehicle maneuver decision method combining hesitation ambiguity and dynamic deep reinforcement learning according to claim 1, wherein in the step 4, the states of the unmanned aerial vehicle at a plurality of moments are combined into a state set as a neural network input, specifically:
first, select
Figure FSA0000210581370000061
As a state space for the air war maneuver decision reinforcement learning, the air war situation of the unmanned aerial vehicle at the current moment is described, and here,
Figure FSA0000210581370000062
is the line-of-sight angle of the unmanned aerial vehicle of our party,
Figure FSA0000210581370000063
is the visual angle of the enemy unmanned aerial vehicle, R is the distance between the enemy unmanned aerial vehicle and the unmanned aerial vehicle, vUIs the speed, v, of the unmanned aerial vehicle of our partyTThe speed of the enemy unmanned aerial vehicle is shown, and delta h is the height difference of the enemy unmanned aerial vehicle and the unmanned aerial vehicle.
Then, a state set(s) composed of the current state vector and the state vectors of the previous two time instantst-2,st-1,st) As input to the neural network, each timeState of carving st-2,st-1And stRespectively satisfy:
Figure FSA0000210581370000064
CN202010497478.5A 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning Pending CN111666631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497478.5A CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010497478.5A CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN111666631A true CN111666631A (en) 2020-09-15

Family

ID=72385998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497478.5A Pending CN111666631A (en) 2020-06-03 2020-06-03 Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111666631A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319286A (en) * 2018-03-12 2018-07-24 西北工业大学 A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning
CN110796255A (en) * 2019-09-16 2020-02-14 湖州师范学院 Hesitation fuzzy multi-attribute decision method based on binary union coefficient

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Alsager: "A decision-making approach based on multi Q-dual hesitant fuzzy soft rough model", Journal of Intelligent & Fuzzy Systems *
J.A. Morente-Molinera et al.: "A dynamic group decision making process for high number of alternatives using hesitant fuzzy ontologies and sentiment analysis", Knowledge-Based Systems *
Ding Yong et al.: "UAV air combat maneuver decision based on intuitionistic fuzzy game" (基于直觉模糊博弈的无人机空战机动决策), Systems Engineering and Electronics (系统工程与电子技术) *
Zuo Jialiang: "Intelligent air combat maneuver decision based on heuristic reinforcement learning" (基于启发式强化学习的空战机动智能决策), Acta Aeronautica et Astronautica Sinica (航空学报) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150510B (en) * 2020-09-29 2024-03-26 中国人民解放军63875部队 Stepping target tracking method based on dual-depth enhancement network
CN112150510A (en) * 2020-09-29 2020-12-29 中国人民解放军63875部队 Stepping target tracking method based on double-depth enhanced network
CN112215283A (en) * 2020-10-12 2021-01-12 中国人民解放军海军航空大学 Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN112461059A (en) * 2020-10-30 2021-03-09 彩虹无人机科技有限公司 Image-seeking guided missile ground launching method
CN112595174B (en) * 2020-11-27 2022-09-13 合肥工业大学 Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment
CN112595174A (en) * 2020-11-27 2021-04-02 合肥工业大学 Multi-unmanned aerial vehicle tactical decision method and device in dynamic environment
CN112598046A (en) * 2020-12-17 2021-04-02 沈阳航空航天大学 Target tactical intention identification method in multi-machine collaborative air combat
CN112598046B (en) * 2020-12-17 2023-09-26 沈阳航空航天大学 Target tactical intent recognition method in multi-machine cooperative air combat
CN113128021A (en) * 2021-03-12 2021-07-16 合肥工业大学 Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms
CN113128021B (en) * 2021-03-12 2022-10-25 合肥工业大学 Real-time re-decision method and system for cooperative confrontation of multiple unmanned platforms
CN113093803A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm
CN113159266B (en) * 2021-05-21 2023-07-21 大连大学 Air combat maneuver decision method based on sparrow searching neural network
CN113159266A (en) * 2021-05-21 2021-07-23 大连大学 Air combat maneuver decision method based on sparrow search neural network
CN113392396A (en) * 2021-06-11 2021-09-14 浙江工业大学 Strategy protection defense method for deep reinforcement learning
CN113625753A (en) * 2021-08-07 2021-11-09 中国航空工业集团公司沈阳飞机设计研究所 Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules
CN113625753B (en) * 2021-08-07 2023-07-07 中国航空工业集团公司沈阳飞机设计研究所 Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules
CN113741525B (en) * 2021-09-10 2024-02-06 南京航空航天大学 Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN113741525A (en) * 2021-09-10 2021-12-03 南京航空航天大学 Strategy set based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN115392444A (en) * 2022-10-31 2022-11-25 中国人民解放军国防科技大学 Parameter optimization method of unmanned aerial vehicle knowledge model combination based on reinforcement learning
CN116069056B (en) * 2022-12-15 2023-07-18 南通大学 Unmanned plane battlefield target tracking control method based on deep reinforcement learning
CN116069056A (en) * 2022-12-15 2023-05-05 南通大学 Unmanned plane battlefield target tracking control method based on deep reinforcement learning
CN116861645A (en) * 2023-06-27 2023-10-10 四川大学 Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method
CN116861645B (en) * 2023-06-27 2024-04-16 四川大学 Non-linear prediction control-based aircraft beyond-sight air combat maneuver decision-making method
CN117130379A (en) * 2023-07-31 2023-11-28 南通大学 LQR near vision distance-based unmanned aerial vehicle air combat attack method
CN117130379B (en) * 2023-07-31 2024-04-16 南通大学 LQR near vision distance-based unmanned aerial vehicle air combat attack method

Similar Documents

Publication Publication Date Title
CN111666631A (en) Unmanned aerial vehicle maneuvering decision method combining hesitation fuzzy and dynamic deep reinforcement learning
Changqiang et al. Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization
CN108319286A (en) A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning
CN110928329B (en) Multi-aircraft track planning method based on deep Q learning algorithm
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN113050686B (en) Combat strategy optimization method and system based on deep reinforcement learning
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN114063644B (en) Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning
CN113159266B (en) Air combat maneuver decision method based on sparrow searching neural network
CN115291625A (en) Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning
CN113741500B (en) Unmanned aerial vehicle air combat maneuver decision-making method for intelligent predation optimization of simulated Harris eagle
CN110673488A (en) Double DQN unmanned aerial vehicle concealed access method based on priority random sampling strategy
CN113282061A (en) Unmanned aerial vehicle air game countermeasure solving method based on course learning
CN114492805A (en) Air combat maneuver decision design method based on fuzzy reasoning
CN115755956B (en) Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system
CN115688268A (en) Aircraft near-distance air combat situation assessment adaptive weight design method
CN113671825A (en) Maneuvering intelligent decision missile avoidance method based on reinforcement learning
CN115903865A (en) Aircraft near-distance air combat maneuver decision implementation method
Duan et al. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization
Wang et al. Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction
CN111773722B (en) Method for generating maneuver strategy set for avoiding fighter plane in simulation environment
CN116432030A (en) Air combat multi-intention strategy autonomous generation method based on deep reinforcement learning
Yang et al. Ballistic missile maneuver penetration based on reinforcement learning
CN116011315A (en) Missile escape area fast calculation method based on K-sparse self-coding SVM
CN115859778A (en) Air combat maneuver decision method based on DCL-GWOO algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination