CN116880186A

CN116880186A - Data-driven self-adaptive dynamic programming air combat decision method

Info

Publication number: CN116880186A
Application number: CN202310861633.0A
Authority: CN
Inventors: 李彬; 宁召柯; 史明明; 李清亮; 陶呈纲; 孙绍山; 李导
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-10-13
Anticipated expiration: 2043-07-13
Also published as: CN116880186B

Abstract

The invention discloses a data-driven self-adaptive dynamic planning air combat decision method, which comprises the following steps: s1, establishing an unmanned aerial vehicle escape-following problem system model; s2, solving the unmanned aerial vehicle escape problem by adopting model-free self-adaptive dynamic programming; acquiring real-time control rates of the red unmanned aerial vehicle and the blue unmanned aerial vehicle by adopting an offline neural network model training algorithm, and collecting information of the control rates of the red unmanned aerial vehicle and state information of the red unmanned aerial vehicle and the blue unmanned aerial vehicle in real time; and S4, updating the neural network on line through an on-line model training algorithm to realize the air combat decision of the self-adaptive dynamic programming of the red unmanned aerial vehicle and the blue unmanned aerial vehicle in the tracking-escaping problem. The invention combines the advantages of offline training and online training, and improves the capability of the online self-adaptive adjustment strategy of the invention. The method does not depend on an aircraft system model, has strong generalization capability, and can be popularized in a plurality of application scenes.

Description

Data-driven self-adaptive dynamic programming air combat decision method

Technical Field

The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a data-driven self-adaptive dynamic planning air combat decision method.

Background

The unmanned air fighter decision is aimed at making it take advantage of or turn down in fight, and the key of research is to design an efficient autonomous decision mechanism. The autonomous decision of the unmanned flight fighter is a mechanism for making a tactical plan or selecting a flight action in real time according to the actual combat environment in the air combat, and the degree of the decision mechanism reflects the intelligent level of the unmanned flight fighter in the modern air combat. The input of the autonomous decision mechanism is various parameters related to air combat, such as flight parameters of an aircraft, weapon parameters, three-dimensional space scene parameters and the relative relationship of two parties of a friend and foe, the decision process is information processing and calculation decision process inside a system, and the output is a tactical plan or some specific flight actions of decision making.

The self-adaptive dynamic programming integrates the ideas of dynamic programming and reinforcement learning, not only inherits the advantages of a dynamic programming method, but also can overcome the problem of dimension disaster generated by dynamic programming. The principle of the self-adaptive dynamic programming is to approximate the performance function and the control strategy in the traditional dynamic programming by adopting a function approximation structure, and obtain an optimal value function and the control strategy by means of the idea of reinforcement learning so as to meet the principle of the Belman optimality. The idea of adaptive dynamic programming can be represented by fig. 1.

Air combat decision making is a complex task involving a large amount of information and variables, making conventional manually-made decision rules difficult to adapt to changing battlefield environments. Therefore, the existing air combat decision method often has the following problems:

1. the static planning method cannot cope with the dynamic environment: conventional decision methods are generally based on preset rules or models, and are difficult to adapt to the battlefield environment and dynamic enemy conditions which change in real time.

2. The manual decision requires a lot of time and effort: the decision process needs to process a large amount of information and variables, consumes a large amount of time and effort, and is also prone to omission and misjudgment.

3. Lack of comprehensive consideration and flexible strain capacity: conventional decision methods typically make decisions based on a single factor or a small number of factors, and it is difficult to comprehensively consider and flexibly strain multiple factors, which may lead to decision bias or inaccuracy.

4. Cannot accommodate the needs of informationized warfare: the modern air combat environment has large information quantity and rapid change, and the traditional method for manually making decision rules cannot adapt to the requirements of informationized war.

Disclosure of Invention

The invention aims to provide a data-driven self-adaptive dynamic planning air combat decision-making method, which mainly solves the problem that the traditional manual decision-making rule is difficult to adapt to continuously-changing battlefield environments.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

a data-driven self-adaptive dynamic programming air combat decision method comprises the following steps:

s1, assuming that the fighter unmanned aerial vehicle is a red unmanned aerial vehicle and a blue unmanned aerial vehicle; respectively establishing an unmanned aerial vehicle escape problem system model according to red side pursuit-blue Fang Taoyi and red side escape-blue side pursuit problems;

s2, solving the unmanned aerial vehicle escape problem by adopting model-free self-adaptive dynamic programming, and improving a strategy by adopting a bounded exploration signal;

s3, acquiring real-time control rates of the red unmanned aerial vehicle and the blue unmanned aerial vehicle by adopting an offline neural network model training algorithm, and collecting state information of the red unmanned aerial vehicle and the blue unmanned aerial vehicle in real time;

and S4, updating the neural network on line through an on-line model training algorithm to realize the air combat decision of the self-adaptive dynamic programming of the red unmanned aerial vehicle and the blue unmanned aerial vehicle in the tracking-escaping problem.

Further, in the invention, the method for establishing the red party pursuit-blue party escape problem model is as follows:

real-time position of red square unmanned aerial vehicleIs X _r (t), blue unmanned aerial vehicle position is X _b (t), the two-party position difference is:

e＝X _b (t)-X _r (t) (1)

the tracking error system is:

in the formula ,the tracking error is represented as differential of the two-party position difference e with respect to time, < >>Real-time position X of blue unmanned aerial vehicle _b (t) differentiation with respect to time, < >>Real-time position X of unmanned aerial vehicle for red square _r (t) differentiation with respect to time;

assuming that the red party chaser can only measure the three-dimensional movement speed of the blue party unmanned aerial vehicle, the expression (2) can be expressed specifically as:

the system model of the red party pursuing the blue party is expressed as:

wherein ,V_r The speed of the unmanned aerial vehicle is Mach,speed V of unmanned plane _r Differentiation with respect to time; x-shaped articles _r The course angle of the red unmanned aerial vehicle is in radian, < > for red square unmanned aerial vehicle>Heading angle χ for red square unmanned aerial vehicle _r Differentiation with respect to time; gamma ray _r The track inclination angle of the red square unmanned aerial vehicle is in radian, and the units are +.>Differentiating the track dip angle of the red unmanned aerial vehicle with respect to time; distance error e _x ,e _y ,e _z The unit is km, jersey>Regarding e as distance error _x ,e _y ,e _z Differentiating in time; g is gravity acceleration; v (V) _c Is the sound velocity, n _x ,n _y ,n _z Is the overload control quantity of the red unmanned aerial vehicle.

Further, in the invention, the method for establishing the red party escape-blue party pursuit problem model is as follows:

the virtual displacement method is adopted to minimize the distance between the local reverse displacement and the enemy aircraft, namely, the effect of maximizing the local position and the enemy aircraft position is achieved, wherein the virtual displacement is the displacement quantity generated by the virtual displacement speed V', namely:

the system model of red-side escape-blue-side chase is expressed as:

wherein the sign and meaning are the same as the pursuit problem.

Further, in the invention, the red party pursuit-blue party escape pursuit problem system model is processed, which comprises the following steps:

s11, the nonlinear continuous state space equation of the unmanned aerial vehicle is abbreviated as:

wherein x= [ V _r ，χ _r ，γ _r ，e _x ，e _y ，e _z ] ^T Representing a state vector for the red-side aircraft,represents the differentiation of the red aircraft state vector x with respect to time, u= [ n ] _x ，n _y ，n _z ] ^T Representing control vectors of the red aircraft, F (x), G (x) are respectively

S12, defining a performance index function as:

wherein Q (x, t) is an index function related to the state, and R (u, t) is an index function related to the control amount;

s13, establishing an angle dominance function of the unmanned aerial vehicle, and setting a speed vector of the unmanned aerial vehicle of the red party as follows:

V _r ＝[cosγ _r cosχ _r ，cosγ _r sinχ _r ，sinγ _r ] ^T

the blue-side unmanned aerial vehicle speed vector is:

V _b ＝[cosγ _b cosχ _b ，cosγ _b sinχ _b ，sinγ _b ] ^T ，

red square unmanned aerial vehicle is to blue Fang Lanfang unmanned aerial vehicle distance vector be e _rb ＝[e _x ，e _y ，e _z ] ^T The geometric relationship is that

Obtaining an angle dominance function:

Q _α ＝cα _r +(1-c)α _b (9)

wherein c＝(α_r +α _b )/2π；

S14, defining a distance dominance function as follows:

Q _d ＝e ^T Q ₁ e (10)

wherein e＝[e_x ，e _y ，e _z ] ^T ，Is a positive definite matrix;

the state index function of the red party can be expressed as:

Q(x，t)＝Q _d +Q ₂ Q _α (11)

wherein Q₂ Is a weight coefficient;

s15, defining a controller index function as:

R(u，t)＝(u-u ₀ ) ^T R(u-u ₀ ) (12)

wherein ,to control the weighting coefficient, u ₀ ＝[sinγ _r ，0，cosγ _r ] ^T The control quantity of the unmanned aerial vehicle under the stable flight is provided.

Further, the specific implementation method of the step 2 in the invention is as follows:

definition of bounded exploration signal u _e The model type (5) of the red unmanned aerial vehicle system can be rewritten as:

the performance index function is:

the derivative of the performance index function (7) with respect to time is expressed as:

when the performance index is the minimum value calculated by the functional formula (7), the following Belman equation is satisfied:

wherein r^(j) =q (x, t) +r (u, t); by combining the formula (17) and the formula (18), it is possible to obtain:

the optimal control quantity of the real system is as follows:

the formula (20) is used to solve G, and the formula (19) is used to obtain:

from t at both ends of formula (21) ₀ Integrating by t to obtain:

a neural network is employed to approximate the cost function and control inputs, namely:

wherein ,ideal neural network weights for the evaluation network and the execution network, respectively; l (L) ₁ ，L ₂ The number of hidden layer neurons of the evaluation network and the execution network, respectively; />Neural network activation functions of the evaluation network and the execution network, respectively; />The reconstruction errors of the evaluation network and the execution network are respectively;

let the evaluation network and the evaluation network execute the estimated values of the network as follows:

wherein ,respectively the ideal neural network weight W _c ，W _a Is a function of the estimated value of (2); substituting equation (24) into equation (22) yields the following residual term errors:

wherein The control amount obtained for the improved strategy is expressed as follows:

wherein omega is the exploration set of control quantity, which is composed ofAdding a bounded random search signal, and +.> Optimizing ∈k by least squares algorithm>Namely:

optimization by least squares algorithmNamely:

further, in step S3 in the present invention, the offline neural network model training algorithm includes the steps of:

s31: by giving different initial states, a data set { x } can be obtained _k (t ₀ ) }, initialize

S32: obtaining control quantity corresponding to the state, namely data set according to the formula (26)

S33: utilizing a data setUpdate according to formula (27)>Update according to formula (28)>

S34: if it isOr->The algorithm is terminated; otherwise j=j+1, step S32 is skipped, where e _a 、∈ _c Is convergence accuracy.

Further, in step S4, the step of online updating the neural network by the online model training algorithm is as follows:

s41: the current neural network is off-weight W _c ，W _a The online learning rate is alpha, a real-time data set { x (t), u (t) } is obtained by sampling at a fixed time interval delta t, and a step S42 is carried out after a plurality of groups of data are acquired;

s42: obtaining control quantity corresponding to the state, namely data set according to the formula (26)

S43: utilizing a data setCalculate +.>Calculate +.>

S44: on-line updating of neural network weights, i.eThe process goes to step S41.

Drawings

Fig. 1 is a diagram of a prior art adaptive dynamic programming architecture.

FIG. 2 is a flow chart of the present invention.

Fig. 3 is a schematic view of the angular advantage of the unmanned aerial vehicle according to the embodiment of the present invention.

Fig. 4 is a schematic diagram of virtual displacement principle in an embodiment of the present invention.

Detailed Description

The invention will be further illustrated by the following description and examples, which include but are not limited to the following examples.

As shown in fig. 2, in the data-driven adaptive dynamic programming air combat decision method disclosed by the invention, in a tracking-escaping scene, one tracker and one escaper exist, and in this embodiment, both sides of the tracking and escaping are represented by both red and blue sides. This problem is described herein in terms of red party chase, blue party escape, and red party unmanned aerial vehicle reduces the distance with blue party unmanned aerial vehicle through maneuver, avoids simultaneously being caught by blue party unmanned aerial vehicle, avoids being aimed at by blue party unmanned aerial vehicle aircraft nose promptly to be in the inferior situation.

In the embodiment, an unmanned aerial vehicle escape-following problem system model is established according to the red-side pursuit-blue Fang Taoyi and red-side escape-blue-side pursuit problems respectively.

First, the real-time position of the red-recording unmanned aerial vehicle is X _r (t), blue unmanned aerial vehicle position is X _b (t), the two-party position difference is:

e＝X _b (t)-X _r (t) (1)

the tracking error system is:

the system model of the red party pursuing the blue party is expressed as:

wherein ,V_r The speed of the unmanned aerial vehicle is Mach,speed V of unmanned plane _r Differentiation with respect to time; x-shaped articles _r The course angle of the red unmanned aerial vehicle is in radian, < > for red square unmanned aerial vehicle>Heading angle χ for red square unmanned aerial vehicle _r Differentiation with respect to time; gamma ray _r The track inclination angle of the red square unmanned aerial vehicle is in radian, and the units are +.>Differentiating the track dip angle of the red unmanned aerial vehicle with respect to time; distance error e _x ，e _y ，e _z The unit is km, jersey>Regarding e as distance error _x ，e _y ，e _z Differentiating in time; g is gravity acceleration; v (V) _c Is the sound velocity, n _x ，n _y ，n _z The overload control quantity of the red unmanned aerial vehicle is normally limited by saturation.

For convenience of description, the nonlinear continuous state space equation of the unmanned aerial vehicle is abbreviated as

Wherein x= [ V _r ，χ _r ，γ _r ，e _x ，e _y ，e _z ] ^T Representing a state vector for the red-side aircraft,represents the differentiation of the red aircraft state vector x with respect to time, u= [ n ] _x ，n _y ，n _z ] ^T Representing the control vector of the red aircraft, F (x), G (x) are:

because the unmanned aerial vehicle chases after the escape problem is a nonlinear optimal control problem with a saturated actuator, the performance index function is defined as follows:

V _r ＝[cosγ _r cosχ _r ，cosγ _r sinχ _r ，sinγ _r ] ^T ，

the blue-side unmanned aerial vehicle speed vector is:

V _b ＝[cosγ _b cosχ _b ，cosγ _b sinχ _b ，sinγ _b ] ^T ，

red square unmanned aerial vehicle is to blue Fang Lanfang unmanned aerial vehicle distance vector be e _rb ＝[e _x ，e _y ，e _z ] ^T As shown in fig. 3, the geometric relationship is:

during air combat, it is desirable to have alpha _r ，α _b Are as small as possible to achieve the dominant red angle situation. Taking a red square as an example, the method comprises the steps of,when alpha is _r -(π-α _b ) < 0, i.e. alpha _r +α _b The attack angle of the red square is dominant when pi is less than the angle of the red square; conversely, if alpha _r +α _b If pi is larger, the attack angle of the red party is in a disadvantage; when alpha is _r +α _b Pi, the red attack angle is at equilibrium. Setting an angle dominance function:

Q _α ＝cα _r +(1-c)α _b (9)

wherein c＝(α_r +α _b ) 2 pi; the angle alpha can be dynamically adjusted by the weight c _r ，α _b When c is less than 0.5, the attack angle of the red square is dominant, and alpha should be optimized with emphasis _b The blue party is prevented from obtaining the dominant angle situation; when c is more than 0.5, the attack angle of the red square is at a disadvantage, and alpha should be optimized with emphasis _r The dominant angle situation is obtained for the red square.

In the tracking problem, the goal of the red party is to shorten the distance to the blue party, thus defining a distance dominance function as:

Q _d ＝e ^T Q ₁ e (10)

wherein e＝[e_x ，e _y ，e _z ] ^T ，Is a positive definite matrix;

the state index function of the red party can be expressed as:

Q(x，t)＝Q _d +Q ₂ Q _α (11)

wherein Q₂ Is a weight coefficient;

in order to meet the control limitation requirement, the unmanned aerial vehicle is stable in a stable flight state, and the controller index function is defined as follows:

R(u，t)＝(u-u ₀ ) ^T R(u-u ₀ ) (12)

For the red-side escape-blue-side pursuit problem model establishment, the escape problem is different from the pursuit problem in that the objective function is opposite to the pursuit problem, so as to maximize the double-machine distance. Meanwhile, in order to avoid the missile, when the distance between the unmanned aerial vehicle and the missile is smaller, the unmanned aerial vehicle needs to change the course and the climbing angle in a large maneuver, so as to avoid the missile. In order to solve the problem of maximizing the distance between the two aircraft, a virtual displacement method is adopted to minimize the distance between the reverse displacement of the aircraft and the enemy aircraft, namely, the effect of maximizing the position of the aircraft and the position of the enemy aircraft is achieved.

As shown in fig. 4, the host is chased by the enemy machine, and the distance between the host and the enemy machine is intended to be maximized, and the distance between the virtual displacement and the enemy machine is minimized for a virtual displacement speed V' with the opposite direction of the host speed vector V. The virtual displacement is the displacement amount generated by V', namely:

the system model of red-side escape-blue-side chase is expressed as:

wherein the sign and meaning are the same as the pursuit problem.

Generally, an accurate unmanned aerial vehicle system model cannot be obtained in actual operation, but the existing model-free adaptive dynamic programming based on data is seriously dependent on the data, and policy improvement cannot be performed on the basis of the existing data. Therefore, the embodiment adopts model-free self-adaptive dynamic programming to solve the escape problem of the unmanned aerial vehicle, and adopts bounded exploration signals to improve strategies.

/>

the performance index function is:

when the performance index function (16) is a minimum value, the following bellman equation is satisfied:

the optimal control quantity of the real system is as follows:

the formula (20) is used to solve G, and the formula (19) is used to obtain:

from t at both ends of formula (21) ₀ Integrating by t to obtain:

wherein ,ideal neural network weights for the evaluation network and the execution network, respectively; l (L) ₁ ，L ₂ The number of hidden layer neurons of the evaluation network and the execution network, respectively; />Neural network activation functions of the evaluation network and the execution network, respectively; />The reconstruction errors of the evaluation network and the execution network, respectively.

optimization by least squares algorithmNamely:

in the embodiment, an offline neural network model training algorithm is adopted to obtain the real-time control rate of the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and the information of the red control rate and the state information of the red and blue unmanned aerial vehicles are collected in real time. The method specifically comprises the following steps:

In the embodiment, the neural network is updated on line through an on-line model training algorithm at intervals, so that the air combat decision of self-adaptive dynamic programming of the red unmanned aerial vehicle and the blue unmanned aerial vehicle in the tracking-escaping problem is realized. The method specifically comprises the following steps:

S43: utilizing a data setCalculate +.>Calculate +.>

Through the method, the capacity of the online self-adaptive adjustment strategy is improved, and the adaptability of the unmanned aerial vehicle air combat decision in different scenes is improved. The method does not depend on an aircraft system model, has strong generalization capability, and can be popularized to the control technology of other equipment, such as a plurality of application scenes of unmanned vehicles, mechanical arms and the like. Thus, the present invention provides a significant and substantial improvement over the prior art.

The above embodiment is only one of the preferred embodiments of the present invention, and should not be used to limit the scope of the present invention, but all the insubstantial modifications or color changes made in the main design concept and spirit of the present invention are still consistent with the present invention, and all the technical problems to be solved are included in the scope of the present invention.

Claims

1. The data-driven self-adaptive dynamic programming air combat decision-making method is characterized by comprising the following steps of:

2. The data-driven adaptive dynamic programming air combat decision method of claim 1, wherein the red party pursuit-blue party escape problem model building method is as follows:

real-time position of red square unmanned aerial vehicle is X _r (t), blue unmanned aerial vehicle position is X _b (t), the two-party position difference is:

e＝X _b (t)-X _r (t) (1)

the tracking error system is:

the system model of the red party pursuing the blue party is expressed as:

3. The data-driven adaptive dynamic programming air combat decision method of claim 2, wherein the red party escape-blue party pursuit problem model building method is as follows:

the system model of red-side escape-blue-side chase is expressed as:

wherein the sign and meaning are the same as the pursuit problem.

4. The data-driven adaptive dynamic programming air combat decision method of claim 2, wherein processing the red party chase-blue party escape chase problem system model comprises:

wherein ,representing a red aircraft state vector, +.>Differential with respect to time of the state vector x representing the red aircraft,/->Representing control vectors of the red aircraft, F (x), G (x) are respectively

S12, defining a performance index function as:

the blue-side unmanned aerial vehicle speed vector is:

distance vector of red unmanned aerial vehicle to blue Fang Lanfang unmanned aerial vehicle isThe geometric relationship is that

Obtaining an angle dominance function:

Q _α ＝cα _r +(1-c)α _b (9)

wherein c＝(α_r +α _b )/2π；

S14, defining a distance dominance function as follows:

wherein Is a positive definite matrix;

the state index function of the red party can be expressed as:

Q(x，t)＝Q _d +Q ₂ Q _α (11)

wherein Q₂ Is a weight coefficient;

s15, defining a controller index function as:

wherein ,for controlling the weighting factor of the quantity, < >>The control quantity of the unmanned aerial vehicle under the stable flight is provided.

5. The method for making a data-driven adaptive dynamic programming air combat decision according to claim 4, wherein said step 2 is implemented as follows:

definition of bounded exploration signal u _e The red unmanned aerial vehicle system model (5) is rewritten as:

the performance index function is:

the optimal control quantity of the real system is as follows:

the formula (20) is used to solve G, and the formula (19) is used to obtain:

from t at both ends of formula (21) ₀ Integrating by t to obtain:

optimization by least squares algorithmNamely:

6. the method for data-driven adaptive dynamic programming air combat decision of claim 5, wherein in step S3, the offline neural network model training algorithm comprises the steps of:

7. The method for data-driven adaptive dynamic programming air combat decision of claim 6, wherein in step S4, the step of online updating the neural network by an online model training algorithm is as follows:

s41: the current neural network is off-weight W _c ，W _a The online learning rate is alpha, and a real-time data set { is obtained by sampling at a fixed time interval delta tx (t), u (t) }, and entering step S42 after a plurality of groups of data are acquired;

S43: utilizing a data setCalculate +.>Calculate +.>