CN116880186A - Data-driven self-adaptive dynamic programming air combat decision method - Google Patents


Publication number
CN116880186A
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, red, blue, party
Legal status (assumption; not a legal conclusion): Granted
Application number
CN202310861633.0A
Other languages: Chinese (zh)
Other versions: CN116880186B (en)
Inventor
李彬
宁召柯
史明明
李清亮
陶呈纲
孙绍山
李导
Current Assignee (the listed assignee may be inaccurate)
Sichuan University
Original Assignee
Sichuan University
Priority date (assumption; not a legal conclusion)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202310861633.0A
Publication of CN116880186A
Application granted
Publication of CN116880186B
Legal status: Active


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02: Adaptive control systems as above, electric
    • G05B 13/04: Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B 13/042: Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a data-driven adaptive dynamic programming air combat decision method, which comprises the following steps: S1, establishing a system model of the unmanned aerial vehicle pursuit-evasion problem; S2, solving the pursuit-evasion problem by model-free adaptive dynamic programming; S3, obtaining the real-time control laws of the red-side and blue-side unmanned aerial vehicles with an offline neural network training algorithm, and collecting in real time the red-side control law together with the state information of the red-side and blue-side unmanned aerial vehicles; S4, updating the neural network online through an online training algorithm, thereby realizing adaptive dynamic programming air combat decisions for the red-side and blue-side unmanned aerial vehicles in the pursuit-evasion problem. The invention combines the advantages of offline and online training and improves the capability of the strategy to adapt online. The method does not depend on an aircraft system model, has strong generalization capability, and can be extended to many application scenarios.

Description

Data-driven self-adaptive dynamic programming air combat decision method
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a data-driven adaptive dynamic programming air combat decision method.
Background
Unmanned air-combat decision making aims at gaining the advantage, or turning a disadvantage around, in combat; the key research problem is designing an efficient autonomous decision mechanism. The autonomous decision making of an unmanned combat aircraft is a mechanism for making a tactical plan or selecting flight actions in real time according to the actual combat environment, and the sophistication of this mechanism reflects the intelligence level of the unmanned combat aircraft in modern air combat. The inputs of the autonomous decision mechanism are the various parameters related to air combat, such as the flight parameters of the aircraft, weapon parameters, three-dimensional scene parameters and the relative relationship between the friendly and enemy sides; the decision process is the information processing and computation inside the system; and the output is a tactical plan or specific flight actions.
Adaptive dynamic programming integrates the ideas of dynamic programming and reinforcement learning: it inherits the advantages of dynamic programming while overcoming the curse of dimensionality that dynamic programming suffers from. Its principle is to approximate the performance function and the control policy of classical dynamic programming with function approximation structures, and to obtain the optimal value function and control policy with the help of reinforcement learning so as to satisfy the Bellman optimality principle. The idea of adaptive dynamic programming is illustrated in fig. 1.
Air combat decision making is a complex task involving a large amount of information and many variables, so conventional manually designed decision rules struggle to adapt to a changing battlefield environment. Existing air combat decision methods therefore often suffer from the following problems:
1. Static planning cannot cope with a dynamic environment: conventional decision methods are generally based on preset rules or models and have difficulty adapting to a battlefield environment and enemy situation that change in real time.
2. Manual decision making requires considerable time and effort: the decision process must handle a large amount of information and many variables, consumes substantial time and effort, and is prone to omissions and misjudgments.
3. Lack of comprehensive consideration and flexible response: conventional decision methods typically decide based on a single factor or a small number of factors, and struggle to weigh and respond flexibly to multiple factors, which may lead to biased or inaccurate decisions.
4. Inability to meet the needs of informatized warfare: the modern air combat environment involves a large volume of rapidly changing information, and manually designed decision rules cannot keep up with the requirements of informatized warfare.
Disclosure of Invention
The invention aims to provide a data-driven adaptive dynamic programming air combat decision method, which mainly solves the problem that traditional manual decision rules are difficult to adapt to a continuously changing battlefield environment.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a data-driven self-adaptive dynamic programming air combat decision method comprises the following steps:
S1, assuming that the opposing combat unmanned aerial vehicles are a red-side unmanned aerial vehicle and a blue-side unmanned aerial vehicle, establish system models of the unmanned aerial vehicle pursuit-evasion problem for the red-pursuit/blue-escape and red-escape/blue-pursuit problems respectively;
S2, solve the unmanned aerial vehicle pursuit-evasion problem with model-free adaptive dynamic programming, improving the policy with a bounded exploration signal;
S3, obtain the real-time control laws of the red-side and blue-side unmanned aerial vehicles with an offline neural network model training algorithm, and collect the state information of the red-side and blue-side unmanned aerial vehicles in real time;
S4, update the neural network online through an online model training algorithm, realizing adaptive dynamic programming air combat decisions for the red-side and blue-side unmanned aerial vehicles in the pursuit-evasion problem.
Further, in the invention, the red-pursuit/blue-escape problem model is established as follows:
The real-time position of the red-side unmanned aerial vehicle is X_r(t) and that of the blue-side unmanned aerial vehicle is X_b(t); the position difference between the two sides is:
e = X_b(t) - X_r(t) (1)
the tracking error system is:
de/dt = dX_b(t)/dt - dX_r(t)/dt (2)
where de/dt is the derivative of the position difference e with respect to time, dX_b(t)/dt is the derivative of the blue-side real-time position X_b(t) with respect to time, and dX_r(t)/dt is the derivative of the red-side real-time position X_r(t) with respect to time;
assuming that the red-side pursuer can only measure the three-dimensional velocity of the blue-side unmanned aerial vehicle, expression (2) can be written specifically as:
the system model of the red party pursuing the blue party is expressed as:
where V_r is the speed of the red-side unmanned aerial vehicle in Mach and dV_r/dt is its derivative with respect to time; χ_r is the heading angle of the red-side unmanned aerial vehicle in radians and dχ_r/dt is its derivative with respect to time; γ_r is the track inclination angle of the red-side unmanned aerial vehicle in radians and dγ_r/dt is its derivative with respect to time; the distance errors e_x, e_y, e_z are in km and de_x/dt, de_y/dt, de_z/dt are their derivatives with respect to time; g is the gravitational acceleration; V_c is the speed of sound; and n_x, n_y, n_z are the overload control quantities of the red-side unmanned aerial vehicle.
Further, in the invention, the red-escape/blue-pursuit problem model is established as follows:
a virtual displacement method is adopted: minimizing the distance between the ownship's reverse displacement and the enemy aircraft is equivalent to maximizing the distance between the ownship's position and the enemy aircraft's position, where the virtual displacement is the displacement generated by the virtual velocity V', namely:
the system model of red-side escape-blue-side chase is expressed as:
wherein the symbols and their meanings are the same as in the pursuit problem.
Further, in the invention, processing the red-pursuit/blue-escape problem system model comprises the following steps:
s11, the nonlinear continuous state space equation of the unmanned aerial vehicle is abbreviated as:
where x = [V_r, χ_r, γ_r, e_x, e_y, e_z]^T represents the state vector of the red-side aircraft, dx/dt represents its derivative with respect to time, u = [n_x, n_y, n_z]^T represents the control vector of the red-side aircraft, and F(x), G(x) are respectively
S12, defining a performance index function as:
wherein Q (x, t) is an index function related to the state, and R (u, t) is an index function related to the control amount;
S13, establishing the angle dominance function of the unmanned aerial vehicle; let the velocity direction vector of the red-side unmanned aerial vehicle be:
V_r = [cos γ_r cos χ_r, cos γ_r sin χ_r, sin γ_r]^T
the velocity direction vector of the blue-side unmanned aerial vehicle is:
V_b = [cos γ_b cos χ_b, cos γ_b sin χ_b, sin γ_b]^T
the distance vector from the red-side unmanned aerial vehicle to the blue-side unmanned aerial vehicle is e_rb = [e_x, e_y, e_z]^T, and the geometric relationship is
Obtaining an angle dominance function:
Q_α = c·α_r + (1 - c)·α_b (9)
where c = (α_r + α_b)/(2π);
S14, defining a distance dominance function as follows:
Q d =e T Q 1 e (10)
wherein e=[ex ,e y ,e z ] TIs a positive definite matrix;
the state index function of the red side can then be expressed as:
Q(x, t) = Q_d + Q_2·Q_α (11)
where Q_2 is a weight coefficient;
S15, defining the controller index function as:
R(u, t) = (u - u_0)^T R (u - u_0) (12)
where R is the control weighting matrix, and u_0 = [sin γ_r, 0, cos γ_r]^T is the control quantity of the unmanned aerial vehicle in steady flight.
Further, the specific implementation of step S2 in the invention is as follows:
With a bounded exploration signal u_e defined, the red-side unmanned aerial vehicle system model (5) can be rewritten as:
the performance index function is:
the derivative of the performance index function (7) with respect to time is expressed as:
when the performance index function (16) attains its minimum value, the following Bellman equation is satisfied:
where r^(j) = Q(x, t) + R(u, t); combining equations (17) and (18) yields:
the optimal control quantity of the real system is as follows:
solving for G from equation (20) and substituting into equation (19) gives:
integrating both sides of equation (21) from t_0 to t gives:
a neural network is employed to approximate the cost function and control inputs, namely:
where W_c and W_a are the ideal neural network weights of the evaluation network and the execution network respectively; L_1 and L_2 are the numbers of hidden-layer neurons of the evaluation network and the execution network respectively; the remaining terms are, respectively, the neural network activation functions and the reconstruction errors of the evaluation network and the execution network;
let the estimated outputs of the evaluation network and the execution network be:
where Ŵ_c and Ŵ_a are the estimates of the ideal neural network weights W_c and W_a respectively; substituting equation (24) into equation (22) yields the following residual error:
the control quantity obtained by the improved policy is expressed as follows:
where Ω is the exploration set of the control quantity, formed by adding a bounded random exploration signal to the current control estimate; Ŵ_c is optimized by a least squares algorithm, namely:
Ŵ_a is likewise optimized by a least squares algorithm, namely:
further, in step S3 in the present invention, the offline neural network model training algorithm includes the steps of:
S31: by giving different initial states, a data set {x_k(t_0)} is obtained; initialize the network weights;
S32: obtain the control quantity corresponding to each state according to equation (26), i.e. the corresponding data set;
S33: using the data set, update Ŵ_c according to equation (27) and update Ŵ_a according to equation (28);
S34: if the change in Ŵ_a is smaller than ε_a or the change in Ŵ_c is smaller than ε_c, the algorithm terminates; otherwise set j = j + 1 and go to step S32, where ε_a and ε_c are the convergence accuracies.
Further, in step S4, the step of online updating the neural network by the online model training algorithm is as follows:
S41: take the offline-trained network weights W_c and W_a and the online learning rate α; sample at a fixed time interval Δt to obtain the real-time data set {x(t), u(t)}, and go to step S42 after several groups of data have been collected;
S42: obtain the control quantity corresponding to each state according to equation (26), i.e. the corresponding data set;
S43: using the data set, calculate the updates of the evaluation network weights and of the execution network weights;
S44: update the neural network weights online with learning rate α, and return to step S41.
Drawings
Fig. 1 is a diagram of a prior art adaptive dynamic programming architecture.
FIG. 2 is a flow chart of the present invention.
Fig. 3 is a schematic view of the angular advantage of the unmanned aerial vehicle according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of virtual displacement principle in an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following description and examples, which include but are not limited to the following embodiment.
As shown in fig. 2, in the data-driven adaptive dynamic programming air combat decision method disclosed by the invention there are, in a pursuit-evasion scenario, one pursuer and one evader; in this embodiment the two sides are represented as the red side and the blue side. The problem is described here as the red side pursuing and the blue side escaping: the red-side unmanned aerial vehicle reduces its distance to the blue-side unmanned aerial vehicle by maneuvering while avoiding being captured by it, i.e. avoiding being pointed at by the blue-side aircraft's nose and thereby falling into an inferior situation.
In this embodiment, system models of the unmanned aerial vehicle pursuit-evasion problem are established for the red-pursuit/blue-escape and red-escape/blue-pursuit problems respectively.
First, the real-time position of the red-recording unmanned aerial vehicle is X r (t), blue unmanned aerial vehicle position is X b (t), the two-party position difference is:
e=X b (t)-X r (t) (1)
the tracking error system is:
de/dt = dX_b(t)/dt - dX_r(t)/dt (2)
where de/dt is the derivative of the position difference e with respect to time, dX_b(t)/dt is the derivative of the blue-side real-time position X_b(t) with respect to time, and dX_r(t)/dt is the derivative of the red-side real-time position X_r(t) with respect to time;
assuming that the red-side pursuer can only measure the three-dimensional velocity of the blue-side unmanned aerial vehicle, expression (2) can be written specifically as:
the system model of the red party pursuing the blue party is expressed as:
where V_r is the speed of the red-side unmanned aerial vehicle in Mach and dV_r/dt is its derivative with respect to time; χ_r is the heading angle of the red-side unmanned aerial vehicle in radians and dχ_r/dt is its derivative with respect to time; γ_r is the track inclination angle of the red-side unmanned aerial vehicle in radians and dγ_r/dt is its derivative with respect to time; the distance errors e_x, e_y, e_z are in km and de_x/dt, de_y/dt, de_z/dt are their derivatives with respect to time; g is the gravitational acceleration; V_c is the speed of sound; and n_x, n_y, n_z are the overload control quantities of the red-side unmanned aerial vehicle, normally subject to saturation limits.
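Since the display equations (3) and (4) are not reproduced above, the sketch below pairs the tracking-error relation (2) with a standard three-degree-of-freedom overload-control kinematic model that matches the symbol list (V_r in Mach, overloads n_x, n_y, n_z, gravity g, speed of sound V_c). The right-hand sides, the constants G0 and VC, and the function name are assumptions for illustration, not the patent's own equations.

```python
import numpy as np

G0 = 9.81    # gravitational acceleration g, m/s^2 (assumed value)
VC = 340.0   # speed of sound V_c, m/s (assumed value)

def pursuit_dynamics(x, u, vb_xyz):
    """Assumed 3-DOF right-hand side for the red-pursuit model.

    x = [V_r (Mach), chi_r (rad), gamma_r (rad), e_x, e_y, e_z (km)]
    u = [n_x, n_y, n_z], the red-side overload controls
    vb_xyz: blue-side inertial velocity in km/s, the only blue-side
            quantity the red pursuer is assumed able to measure.
    """
    V, chi, gam = x[0], x[1], x[2]
    nx, ny, nz = u
    # speed/attitude kinematics (standard overload-control form, assumed)
    V_dot   = G0 * (nx - np.sin(gam)) / VC              # Mach per second
    chi_dot = G0 * ny / (VC * V * np.cos(gam))
    gam_dot = G0 * (nz - np.cos(gam)) / (VC * V)
    # red-side inertial velocity in km/s (Mach -> m/s -> km/s)
    vr = V * VC / 1000.0 * np.array([np.cos(gam) * np.cos(chi),
                                     np.cos(gam) * np.sin(chi),
                                     np.sin(gam)])
    e_dot = np.asarray(vb_xyz) - vr                     # eq. (2): de/dt
    return np.concatenate(([V_dot, chi_dot, gam_dot], e_dot))
```

A quick sanity check on such a model: in steady level flight the control u_0 = [sin γ_r, 0, cos γ_r]^T of equation (12) should leave speed and attitude unchanged.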
For convenience of description, the nonlinear continuous state space equation of the unmanned aerial vehicle is abbreviated as
where x = [V_r, χ_r, γ_r, e_x, e_y, e_z]^T represents the state vector of the red-side aircraft, dx/dt represents its derivative with respect to time, u = [n_x, n_y, n_z]^T represents the control vector of the red-side aircraft, and F(x), G(x) are:
Because the unmanned aerial vehicle pursuit-evasion problem is a nonlinear optimal control problem with actuator saturation, the performance index function is defined as:
wherein Q (x, t) is an index function related to the state, and R (u, t) is an index function related to the control amount;
S13, establishing the angle dominance function of the unmanned aerial vehicle; let the velocity direction vector of the red-side unmanned aerial vehicle be:
V_r = [cos γ_r cos χ_r, cos γ_r sin χ_r, sin γ_r]^T
the velocity direction vector of the blue-side unmanned aerial vehicle is:
V_b = [cos γ_b cos χ_b, cos γ_b sin χ_b, sin γ_b]^T
the distance vector from the red-side unmanned aerial vehicle to the blue-side unmanned aerial vehicle is e_rb = [e_x, e_y, e_z]^T; as shown in fig. 3, the geometric relationship is:
During air combat it is desirable that α_r and α_b both be as small as possible so that the red side gains the dominant angular situation. Taking the red side as an example, when α_r - (π - α_b) < 0, i.e. α_r + α_b < π, the red-side attack angle is dominant; conversely, if α_r + α_b > π, the red-side attack angle is at a disadvantage; and when α_r + α_b = π, the red-side attack angle is in equilibrium. The angle dominance function is set as:
Q_α = c·α_r + (1 - c)·α_b (9)
where c = (α_r + α_b)/(2π); the weight c dynamically adjusts the relative importance of the angles α_r and α_b. When c < 0.5 the red-side attack angle is dominant and α_b should be optimized with emphasis to prevent the blue side from obtaining the dominant angular situation; when c > 0.5 the red-side attack angle is at a disadvantage and α_r should be optimized with emphasis so that the red side obtains the dominant angular situation.
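A minimal numeric sketch of the angle dominance function (9). The geometry is an assumption consistent with the description of fig. 3: α_r is taken as the angle between the red velocity direction and the line of sight e_rb, and α_b as the angle between the blue velocity direction and e_rb, so that α_r + α_b = π corresponds to the stated equilibrium; the function names are illustrative.

```python
import numpy as np

def angle_advantage(v_r, v_b, e_rb):
    """Q_alpha = c*alpha_r + (1 - c)*alpha_b, c = (alpha_r + alpha_b)/(2*pi).

    v_r, v_b : red/blue velocity direction vectors
    e_rb     : distance vector from the red UAV to the blue UAV
    Returns (Q_alpha, alpha_r, alpha_b, c).
    """
    def angle(a, b):
        cos_ab = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(cos_ab, -1.0, 1.0))
    a_r = angle(v_r, e_rb)   # assumed: red velocity vs. line of sight
    a_b = angle(v_b, e_rb)   # assumed: blue velocity vs. line of sight
    c = (a_r + a_b) / (2.0 * np.pi)
    return c * a_r + (1.0 - c) * a_b, a_r, a_b, c
```

Under these assumptions a tail chase (red directly behind blue, both flying the same way) gives α_r = α_b = 0 and Q_α = 0, the best case for red, while a head-on geometry gives α_r + α_b = π and c = 0.5, the stated equilibrium.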
In the tracking problem, the goal of the red party is to shorten the distance to the blue party, thus defining a distance dominance function as:
Q_d = e^T Q_1 e (10)
where e = [e_x, e_y, e_z]^T and Q_1 is a positive definite matrix;
the state index function of the red party can be expressed as:
Q(x, t) = Q_d + Q_2·Q_α (11)
where Q_2 is a weight coefficient;
In order to satisfy the control limits and keep the unmanned aerial vehicle stable in the steady flight state, the controller index function is defined as:
R(u, t) = (u - u_0)^T R (u - u_0) (12)
where R is the control weighting matrix, and u_0 = [sin γ_r, 0, cos γ_r]^T is the control quantity of the unmanned aerial vehicle in steady flight.
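The two index terms combine into the integrand of the performance index; a sketch follows, with illustrative default weights Q_1, Q_2, R (the patent does not fix their values here).

```python
import numpy as np

def stage_cost(e, q_alpha, u, gamma_r,
               Q1=np.eye(3), Q2=1.0, R=np.eye(3)):
    """Integrand Q(x, t) + R(u, t) built from eqs. (10)-(12).

    Q(x, t) = e^T Q1 e + Q2 * Q_alpha penalises distance and angle
    disadvantage; R(u, t) = (u - u0)^T R (u - u0) penalises deviation
    from the steady-flight control u0 = [sin(gamma_r), 0, cos(gamma_r)]^T.
    The defaults for Q1, Q2, R are illustrative only.
    """
    Qd = e @ Q1 @ e                                    # eq. (10)
    u0 = np.array([np.sin(gamma_r), 0.0, np.cos(gamma_r)])
    Ru = (u - u0) @ R @ (u - u0)                       # eq. (12)
    return Qd + Q2 * q_alpha + Ru                      # eq. (11) plus eq. (12)
```

At zero tracking error, zero angle disadvantage and u = u_0 the cost vanishes, matching the intent that steady flight with no disadvantage incurs no penalty.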
For the red-side escape-blue-side pursuit problem model establishment, the escape problem is different from the pursuit problem in that the objective function is opposite to the pursuit problem, so as to maximize the double-machine distance. Meanwhile, in order to avoid the missile, when the distance between the unmanned aerial vehicle and the missile is smaller, the unmanned aerial vehicle needs to change the course and the climbing angle in a large maneuver, so as to avoid the missile. In order to solve the problem of maximizing the distance between the two aircraft, a virtual displacement method is adopted to minimize the distance between the reverse displacement of the aircraft and the enemy aircraft, namely, the effect of maximizing the position of the aircraft and the position of the enemy aircraft is achieved.
As shown in fig. 4, the ownship is pursued by the enemy aircraft and intends to maximize the distance between them. For a virtual velocity V' opposite in direction to the ownship velocity vector V, minimizing the distance between the virtual displacement and the enemy aircraft achieves this. The virtual displacement is the displacement generated by V', namely:
the system model of red-side escape-blue-side chase is expressed as:
wherein the symbols and their meanings are the same as in the pursuit problem.
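The virtual-displacement idea can be sketched in a few lines; the function name and the discrete step dt are illustrative assumptions.

```python
import numpy as np

def virtual_error(x_self, v_self, x_enemy, dt):
    """Virtual-displacement trick of fig. 4 for the escape problem.

    A virtual copy of the ownship moves with velocity V' = -V, opposite
    to the real velocity vector; driving the *virtual* position toward
    the enemy recasts distance maximisation as the same
    distance-minimisation (pursuit) form solved above.
    """
    x_virtual = x_self + (-v_self) * dt   # displacement generated by V'
    return x_enemy - x_virtual            # error fed to the pursuit law
```

If the ownship flies away from the enemy, the virtual position moves toward the enemy, so the virtual error shrinks exactly when the real distance grows.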
Generally, an accurate unmanned aerial vehicle system model cannot be obtained in actual operation, while existing data-based model-free adaptive dynamic programming depends heavily on data and cannot improve the policy beyond the existing data. This embodiment therefore solves the unmanned aerial vehicle pursuit-evasion problem with model-free adaptive dynamic programming and improves the policy with a bounded exploration signal.
Definition of bounded exploration signal u e The model type (5) of the red unmanned aerial vehicle system can be rewritten as:
the performance index function is:
the derivative of the performance index function (7) with respect to time is expressed as:
when the performance index function (16) attains its minimum value, the following Bellman equation is satisfied:
where r^(j) = Q(x, t) + R(u, t); combining equations (17) and (18) yields:
the optimal control quantity of the real system is as follows:
solving for G from equation (20) and substituting into equation (19) gives:
integrating both sides of equation (21) from t_0 to t gives:
a neural network is employed to approximate the cost function and control inputs, namely:
where W_c and W_a are the ideal neural network weights of the evaluation network and the execution network respectively; L_1 and L_2 are the numbers of hidden-layer neurons of the evaluation network and the execution network respectively; the remaining terms are, respectively, the neural network activation functions and the reconstruction errors of the evaluation network and the execution network.
Let the evaluation network and the evaluation network execute the estimated values of the network as follows:
wherein ,respectively the ideal neural network weight W c ,W a Is a function of the estimated value of (2); substituting equation (24) into equation (22) yields the following residual term errors:
the control quantity obtained by the improved policy is expressed as follows:
where Ω is the exploration set of the control quantity, formed by adding a bounded random exploration signal to the current control estimate; Ŵ_c is optimized by a least squares algorithm, namely:
Ŵ_a is likewise optimized by a least squares algorithm, namely:
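The least-squares updates (27) and (28) are not reproduced above, but the generic step they rely on, fitting network weights to regression targets built from the collected data, can be sketched as a plain normal-equation solve (the construction of the regressors and targets is abstracted away and the names are illustrative).

```python
import numpy as np

def least_squares_weights(Phi, targets):
    """Generic batch least-squares step behind eqs. (27)-(28).

    Phi     : (N, L) matrix of activation values over the data set
    targets : (N,) or (N, m) regression targets, e.g. Bellman residuals
    Returns the weight vector/matrix minimising ||Phi @ W - targets||^2.
    """
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return W
```

With more samples N than weights L the solve is overdetermined, so per-sample noise in the targets is averaged out.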
in the embodiment, an offline neural network model training algorithm is adopted to obtain the real-time control rate of the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and the information of the red control rate and the state information of the red and blue unmanned aerial vehicles are collected in real time. The method specifically comprises the following steps:
S31: by giving different initial states, a data set {x_k(t_0)} is obtained; initialize the network weights;
S32: obtain the control quantity corresponding to each state according to equation (26), i.e. the corresponding data set;
S33: using the data set, update Ŵ_c according to equation (27) and update Ŵ_a according to equation (28);
S34: if the change in Ŵ_a is smaller than ε_a or the change in Ŵ_c is smaller than ε_c, the algorithm terminates; otherwise set j = j + 1 and go to step S32, where ε_a and ε_c are the convergence accuracies.
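The iteration skeleton of steps S31-S34 can be sketched as follows; the two callbacks stand in for the least-squares updates (27) and (28), which are not reproduced above, and the termination test mirrors S34 (stop once either weight change falls below its convergence accuracy). All names are illustrative.

```python
import numpy as np

def offline_train(Wc, Wa, update_critic, update_actor,
                  eps_c=1e-6, eps_a=1e-6, max_iter=100):
    """Offline policy-iteration skeleton of steps S31-S34 (schematic).

    Wc, Wa        : initial evaluation/execution network weights (S31)
    update_critic : (Wc, Wa) -> new Wc, standing in for the eq. (27) fit
    update_actor  : (Wc, Wa) -> new Wa, standing in for the eq. (28) fit
    Returns (Wc, Wa, iterations used).
    """
    for j in range(max_iter):
        Wc_new = update_critic(Wc, Wa)       # S33, evaluation-network update
        Wa_new = update_actor(Wc_new, Wa)    # S33, execution-network update
        # S34: terminate once either weight change is below tolerance
        if (np.linalg.norm(Wa_new - Wa) < eps_a
                or np.linalg.norm(Wc_new - Wc) < eps_c):
            return Wc_new, Wa_new, j + 1
        Wc, Wa = Wc_new, Wa_new              # otherwise j = j + 1
    return Wc, Wa, max_iter
```

With contractive updates the loop terminates well before max_iter, which is the behaviour S34's convergence test is meant to guarantee.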
In this embodiment, the neural network is updated online at intervals through an online model training algorithm, realizing adaptive dynamic programming air combat decisions for the red-side and blue-side unmanned aerial vehicles in the pursuit-evasion problem. The algorithm specifically comprises the following steps:
S41: take the offline-trained network weights W_c and W_a and the online learning rate α; sample at a fixed time interval Δt to obtain the real-time data set {x(t), u(t)}, and go to step S42 after several groups of data have been collected;
S42: obtain the control quantity corresponding to each state according to equation (26), i.e. the corresponding data set;
S43: using the data set, calculate the updates of the evaluation network weights and of the execution network weights;
S44: update the neural network weights online with learning rate α, and return to step S41.
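The online loop S41-S44 reduces to: sample a real-time batch every Δt, recompute the residual-based weight corrections, and nudge the offline weights with learning rate α. The residual_grads callback stands in for the update expressions, which are not reproduced above; all names are illustrative.

```python
import numpy as np

def online_loop(Wc, Wa, sample_batch, residual_grads, alpha=0.05, steps=3):
    """Online update skeleton of steps S41-S44 (schematic).

    sample_batch   : () -> real-time data set {x(t), u(t)} collected at a
                     fixed interval (S41)
    residual_grads : (Wc, Wa, batch) -> (grad_c, grad_a), standing in for
                     the residual-based corrections of S43
    Each pass performs S44: W <- W - alpha * grad, then returns to S41.
    """
    for _ in range(steps):
        batch = sample_batch()                           # S41/S42
        grad_c, grad_a = residual_grads(Wc, Wa, batch)   # S43
        Wc = Wc - alpha * grad_c                         # S44, evaluation net
        Wa = Wa - alpha * grad_a                         # S44, execution net
    return Wc, Wa
```

The small learning rate keeps each online correction gentle, so the offline solution is adapted rather than overwritten by a few noisy samples.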
Through the above method, the capability of the strategy to adapt online is improved, and the adaptability of unmanned aerial vehicle air combat decision making in different scenarios is improved. The method does not depend on an aircraft system model, has strong generalization capability, and can be extended to the control of other equipment, with many application scenarios such as unmanned ground vehicles and robotic manipulators. The invention thus provides a significant and substantial improvement over the prior art.
The above embodiment is only one of the preferred embodiments of the present invention and should not be used to limit its scope of protection; all insubstantial modifications or variations made within the main design concept and spirit of the invention, addressing the same technical problem, fall within the scope of protection of the present invention.

Claims (7)

1. A data-driven self-adaptive dynamic programming air combat decision method, characterized by comprising the following steps:
S1, assuming that the opposing combat unmanned aerial vehicles are a red-side unmanned aerial vehicle and a blue-side unmanned aerial vehicle, establish system models of the unmanned aerial vehicle pursuit-evasion problem for the red-pursuit/blue-escape and red-escape/blue-pursuit problems respectively;
S2, solve the unmanned aerial vehicle pursuit-evasion problem with model-free adaptive dynamic programming, improving the policy with a bounded exploration signal;
S3, obtain the real-time control laws of the red-side and blue-side unmanned aerial vehicles with an offline neural network model training algorithm, and collect the state information of the red-side and blue-side unmanned aerial vehicles in real time;
S4, update the neural network online through an online model training algorithm, realizing adaptive dynamic programming air combat decisions for the red-side and blue-side unmanned aerial vehicles in the pursuit-evasion problem.
2. The data-driven adaptive dynamic programming air combat decision method of claim 1, wherein the red-pursuit/blue-escape problem model is established as follows:
the real-time position of the red-side unmanned aerial vehicle is X_r(t) and that of the blue-side unmanned aerial vehicle is X_b(t); the position difference between the two sides is:
e = X_b(t) - X_r(t) (1)
the tracking error system is:
de/dt = dX_b(t)/dt - dX_r(t)/dt (2)
where de/dt is the derivative of the position difference e with respect to time, dX_b(t)/dt is the derivative of the blue-side real-time position X_b(t) with respect to time, and dX_r(t)/dt is the derivative of the red-side real-time position X_r(t) with respect to time;
assuming that the red-side pursuer can only measure the three-dimensional velocity of the blue-side unmanned aerial vehicle, expression (2) can be written specifically as:
the system model of the red party pursuing the blue party is expressed as:
where V_r is the speed of the red-side unmanned aerial vehicle in Mach and dV_r/dt is its derivative with respect to time; χ_r is the heading angle of the red-side unmanned aerial vehicle in radians and dχ_r/dt is its derivative with respect to time; γ_r is the track inclination angle of the red-side unmanned aerial vehicle in radians and dγ_r/dt is its derivative with respect to time; the distance errors e_x, e_y, e_z are in km and de_x/dt, de_y/dt, de_z/dt are their derivatives with respect to time; g is the gravitational acceleration; V_c is the speed of sound; and n_x, n_y, n_z are the overload control quantities of the red-side unmanned aerial vehicle.
3. The data-driven adaptive dynamic programming air combat decision method of claim 2, wherein the red-party escape-blue-party pursuit problem model is built as follows:
a virtual displacement method is adopted: minimizing the distance between the own aircraft's reverse virtual displacement and the enemy aircraft achieves the effect of maximizing the distance between the own position and the enemy position, the virtual displacement being the displacement generated by the virtual displacement speed V';
the system model of red-party escape-blue-party pursuit is then expressed in the same form as the pursuit model (4), in which the symbols and their meanings are the same as in the pursuit problem.
4. The data-driven adaptive dynamic programming air combat decision method of claim 2, wherein processing the red-party pursuit-blue-party escape problem system model comprises:
S11, abbreviating the nonlinear continuous state-space equation of the unmanned aerial vehicle as:
ẋ = F(x) + G(x)u (5)
where x denotes the state vector of the red aircraft, ẋ its differential with respect to time, u the control vector of the red aircraft, and F(x), G(x) the drift dynamics and the control input matrix of the system, respectively;
S12, defining the performance index function as:
J(x(t)) = ∫_t^∞ [Q(x,τ) + R(u,τ)] dτ (7)
where Q(x,t) is an index function related to the state and R(u,t) is an index function related to the control quantity;
S13, establishing the angle dominance function of the unmanned aerial vehicle; the speed vector of the red-party unmanned aerial vehicle is:
v_r = V_c V_r [cos γ_r cos χ_r, cos γ_r sin χ_r, sin γ_r]^T
the speed vector of the blue-party unmanned aerial vehicle is:
v_b = [V_bx, V_by, V_bz]^T
the distance vector from the red-party unmanned aerial vehicle to the blue-party unmanned aerial vehicle is D = X_b(t) - X_r(t), and the geometric relationship gives the angles α_r and α_b between each party's speed vector and D:
cos α_r = (v_r · D)/(‖v_r‖‖D‖), cos α_b = (v_b · D)/(‖v_b‖‖D‖)
obtaining the angle dominance function:
Q_α = c α_r + (1-c) α_b (9)
where c = (α_r + α_b)/(2π);
S14, defining the distance dominance function as:
Q_d = e^T Q_1 e (10)
where Q_1 is a positive definite matrix;
the state index function of the red party can then be expressed as:
Q(x,t) = Q_d + Q_2 Q_α (11)
where Q_2 is a weight coefficient;
S15, defining the controller index function as:
R(u,t) = (u - u_0)^T R_1 (u - u_0) (12)
where R_1 is the weight matrix of the control quantity and u_0 is the control quantity of the unmanned aerial vehicle in stable flight.
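The dominance and index functions of steps S13-S15 can be sketched as follows. This is an illustrative sketch only: the scalar weights q1, q2, r1 stand in for the patent's weight matrices, and the helper angle() is an assumed implementation of the geometric relationship.

```python
import math

def angle_dominance(v_red, v_blue, d):
    """Q_alpha = c*alpha_r + (1-c)*alpha_b with c = (alpha_r + alpha_b)/(2*pi)."""
    def angle(u, w):
        # angle between two 3-D vectors via the dot product, clamped for acos
        dot = sum(a * b for a, b in zip(u, w))
        nu = math.sqrt(sum(a * a for a in u))
        nw = math.sqrt(sum(a * a for a in w))
        return math.acos(max(-1.0, min(1.0, dot / (nu * nw))))
    a_r = angle(v_red, d)   # red velocity vs line of sight
    a_b = angle(v_blue, d)  # blue velocity vs line of sight
    c = (a_r + a_b) / (2.0 * math.pi)
    return c * a_r + (1.0 - c) * a_b

def state_index(e, v_red, v_blue, d, q1=1.0, q2=1.0):
    """Q(x,t) = Q_d + Q2*Q_alpha, with Q_d = q1*||e||^2 (scalar positive-definite case)."""
    qd = q1 * sum(c * c for c in e)
    return qd + q2 * angle_dominance(v_red, v_blue, d)

def control_index(u, u0, r1=1.0):
    """R(u,t) = (u-u0)^T R1 (u-u0), scalar weight case."""
    return r1 * sum((a - b) ** 2 for a, b in zip(u, u0))
```

When both velocity vectors point along the line of sight, both aspect angles vanish and the angle dominance term is zero, as expected from (9).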
5. The data-driven adaptive dynamic programming air combat decision method according to claim 4, wherein step S2 is implemented as follows:
defining a bounded exploration signal u_e, the red-party unmanned aerial vehicle system model (5) is rewritten as:
ẋ = F(x) + G(x)(u^(j) + u_e) (14)
the performance index function is:
J^(j)(x(t)) = ∫_t^∞ [Q(x,τ) + R(u^(j),τ)] dτ (16)
the derivative of the performance index function (7) with respect to time along the trajectories of (14) is expressed as:
J̇^(j) = (∂J^(j)/∂x)^T [F(x) + G(x)(u^(j) + u_e)] (17)
when the performance index function (16) attains a minimum value, the following Bellman equation is satisfied:
(∂J^(j)/∂x)^T [F(x) + G(x)u^(j)] + r^(j) = 0 (18)
where r^(j) = Q(x,t) + R(u^(j),t); combining formula (17) and formula (18) gives:
J̇^(j) = -r^(j) + (∂J^(j)/∂x)^T G(x) u_e (19)
the optimal control quantity of the real system is:
u^(j+1) = u_0 - (1/2) R_1^(-1) G(x)^T (∂J^(j)/∂x) (20)
solving formula (20) for G(x)^T (∂J^(j)/∂x) and substituting into formula (19) gives:
J̇^(j) = -r^(j) - 2(u^(j+1) - u_0)^T R_1 u_e (21)
integrating both ends of formula (21) from t_0 to t gives:
J^(j)(x(t)) - J^(j)(x(t_0)) = -∫_{t_0}^t [r^(j) + 2(u^(j+1) - u_0)^T R_1 u_e] dτ (22)
a neural network is employed to approximate the cost function and the control input, namely:
J^(j)(x) = W_c^T φ_c(x) + ε_c(x), u^(j+1)(x) = W_a^T φ_a(x) + ε_a(x) (23)
where W_c, W_a are the ideal neural network weights of the evaluation network and the execution network, respectively; L_1, L_2 are the numbers of hidden-layer neurons of the evaluation network and the execution network, respectively; φ_c, φ_a are the neural network activation functions of the evaluation network and the execution network, respectively; and ε_c, ε_a are the reconstruction errors of the evaluation network and the execution network, respectively;
the estimated values of the evaluation network and the execution network are:
Ĵ^(j)(x) = Ŵ_c^T φ_c(x), û^(j+1)(x) = Ŵ_a^T φ_a(x) (24)
where Ŵ_c, Ŵ_a are the estimated values of the ideal neural network weights W_c, W_a, respectively; substituting formula (24) into formula (22) yields the following residual term error:
δ(t) = Ŵ_c^T [φ_c(x(t)) - φ_c(x(t_0))] + ∫_{t_0}^t [r^(j) + 2(û^(j+1) - u_0)^T R_1 u_e] dτ (25)
the control quantity obtained by the improved strategy is expressed as:
u^(j+1)(x) = argmin_{u∈Ω} [r(x,u) + Ĵ^(j)(x⁺)] (26)
where Ω is the exploration set of control quantities, formed by adding a bounded random exploration signal to the current control, and x⁺ denotes the successor state under control u; Ŵ_c is optimized by the least-squares algorithm, namely:
Ŵ_c = (Φ_c Φ_c^T)^(-1) Φ_c ρ (27)
where the k-th column of Φ_c is φ_c(x_k(t)) - φ_c(x_k(t_0)) and the k-th element of ρ is -∫_{t_0}^t [r^(j) + 2(û^(j+1) - u_0)^T R_1 u_e] dτ evaluated along the k-th sample; Ŵ_a is optimized by the least-squares algorithm, namely:
Ŵ_a = (Φ_a Φ_a^T)^(-1) Φ_a U^T (28)
where the k-th column of Φ_a is φ_a(x_k) and the k-th column of U is the improved control u_k^(j+1).
6. The data-driven adaptive dynamic programming air combat decision method of claim 5, wherein in step S3 the offline neural network model training algorithm comprises the following steps:
S31: giving different initial states to obtain a data set {x_k(t_0)}, and initializing Ŵ_c, Ŵ_a and j = 0;
S32: obtaining the control quantities corresponding to the states, namely the data set {u_k^(j+1)}, according to formula (26);
S33: using the data set, updating Ŵ_c according to formula (27) and Ŵ_a according to formula (28);
S34: if ‖Ŵ_c^(j) - Ŵ_c^(j-1)‖ ≤ ε_c or ‖Ŵ_a^(j) - Ŵ_a^(j-1)‖ ≤ ε_a, the algorithm terminates; otherwise j = j + 1 and the procedure returns to step S32, where ε_a, ε_c are the convergence accuracies.
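Steps S31-S34 form an offline policy-iteration loop. A structural sketch, in which improve and evaluate are hypothetical callables standing in for formulas (26) and (27)-(28):

```python
import numpy as np

def offline_train(states, improve, evaluate, wc, wa,
                  eps_c=1e-6, eps_a=1e-6, max_iter=100):
    """Offline ADP loop: alternate policy improvement and least-squares evaluation.

    states   : data set {x_k(t0)} of initial states (S31)
    improve  : (states, wc, wa) -> control data set     (role of formula (26), S32)
    evaluate : (states, controls) -> (wc_new, wa_new)   (role of (27)-(28), S33)
    wc, wa   : initial critic / actor weight estimates
    Returns the converged weights and the number of iterations used.
    """
    for j in range(max_iter):
        controls = improve(states, wc, wa)           # S32
        wc_new, wa_new = evaluate(states, controls)  # S33
        # S34: stop when either weight vector has converged
        if (np.linalg.norm(wc_new - wc) <= eps_c or
                np.linalg.norm(wa_new - wa) <= eps_a):
            return wc_new, wa_new, j + 1
        wc, wa = wc_new, wa_new
    return wc, wa, max_iter
```

A toy contraction (each evaluation halves the weights) converges well before the iteration cap, illustrating the S34 stopping rule.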
7. The data-driven adaptive dynamic programming air combat decision method of claim 6, wherein in step S4 the neural network is updated online through the online model training algorithm as follows:
S41: taking the current offline neural network weights W_c, W_a and the online learning rate α, sampling at a fixed time interval Δt to obtain a real-time data set {x(t), u(t)}, and entering step S42 after several groups of data have been acquired;
S42: obtaining the control quantities corresponding to the states, namely the data set {u^(j+1)}, according to formula (26);
S43: using the data set, calculating Ŵ_c according to formula (27) and Ŵ_a according to formula (28);
S44: updating the neural network weights W_c, W_a online with the learning rate α, and returning to step S41.
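Step S44 applies the online learning rate α to the weights. A sketch under the assumption that the update blends the current weights with the freshly fitted least-squares estimates; this interpretation of α is not stated explicitly in the claim text:

```python
import numpy as np

def online_update(w_old, w_fit, alpha=0.1):
    """Blend current weights with newly fitted ones: W <- (1-alpha)*W + alpha*W_hat.

    A small alpha keeps the offline-trained policy dominant while the weights
    track the real-time data set; alpha = 1 would discard the offline weights.
    """
    return (1.0 - alpha) * np.asarray(w_old, dtype=float) \
        + alpha * np.asarray(w_fit, dtype=float)
```

With α = 0.5 the update lands halfway between the old and newly fitted weights.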
CN202310861633.0A 2023-07-13 2023-07-13 Data-driven self-adaptive dynamic programming air combat decision method Active CN116880186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310861633.0A CN116880186B (en) 2023-07-13 2023-07-13 Data-driven self-adaptive dynamic programming air combat decision method

Publications (2)

Publication Number Publication Date
CN116880186A true CN116880186A (en) 2023-10-13
CN116880186B CN116880186B (en) 2024-04-16

Family

ID=88265747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310861633.0A Active CN116880186B (en) 2023-07-13 2023-07-13 Data-driven self-adaptive dynamic programming air combat decision method

Country Status (1)

Country Link
CN (1) CN116880186B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109085754A (en) * 2018-07-25 2018-12-25 西北工业大学 Neural-network-based spacecraft pursuit-evasion game method
CN112215283A (en) * 2020-10-12 2021-01-12 中国人民解放军海军航空大学 Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN113050686A (en) * 2021-03-19 2021-06-29 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning
CN113095481A (en) * 2021-04-03 2021-07-09 西北工业大学 Air combat maneuver method based on parallel self-game
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN114330115A (en) * 2021-10-27 2022-04-12 中国空气动力研究与发展中心计算空气动力研究所 Neural network air combat maneuver decision method based on particle swarm search
CN115951709A (en) * 2023-01-09 2023-04-11 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle air combat strategy generation method based on TD3
CN116187777A (en) * 2022-12-28 2023-05-30 中国航空研究院 Unmanned aerial vehicle air combat autonomous decision-making method based on SAC algorithm and alliance training
CN116185059A (en) * 2022-08-17 2023-05-30 西北工业大学 Unmanned aerial vehicle air combat autonomous evasion maneuver decision-making method based on deep reinforcement learning
CN116400718A (en) * 2023-04-06 2023-07-07 中国人民解放军空军航空大学 Unmanned aerial vehicle short-distance air combat maneuver autonomous decision-making method, system, equipment and terminal

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109085754A (en) * 2018-07-25 2018-12-25 西北工业大学 Neural-network-based spacecraft pursuit-evasion game method
CN112215283A (en) * 2020-10-12 2021-01-12 中国人民解放军海军航空大学 Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN113050686A (en) * 2021-03-19 2021-06-29 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning
CN113095481A (en) * 2021-04-03 2021-07-09 西北工业大学 Air combat maneuver method based on parallel self-game
US20220315219A1 (en) * 2021-04-03 2022-10-06 Northwestern Polytechnical University Air combat maneuvering method based on parallel self-play
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN114330115A (en) * 2021-10-27 2022-04-12 中国空气动力研究与发展中心计算空气动力研究所 Neural network air combat maneuver decision method based on particle swarm search
CN116185059A (en) * 2022-08-17 2023-05-30 西北工业大学 Unmanned aerial vehicle air combat autonomous evasion maneuver decision-making method based on deep reinforcement learning
CN116187777A (en) * 2022-12-28 2023-05-30 中国航空研究院 Unmanned aerial vehicle air combat autonomous decision-making method based on SAC algorithm and alliance training
CN115951709A (en) * 2023-01-09 2023-04-11 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle air combat strategy generation method based on TD3
CN116400718A (en) * 2023-04-06 2023-07-07 中国人民解放军空军航空大学 Unmanned aerial vehicle short-distance air combat maneuver autonomous decision-making method, system, equipment and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JINGJING XU, ET AL.: "Deep Neural Network-Based Footprint Prediction and Attack Intention Inference of Hypersonic Glide Vehicles", 《MATHEMATICS 》, vol. 11, no. 1, pages 1 - 24 *

Also Published As

Publication number Publication date
CN116880186B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN113095481B (en) Air combat maneuver method based on parallel self-game
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN112215283A (en) Close-range air combat intelligent decision method based on manned/unmanned aerial vehicle system
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN113268081B (en) Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN114063644B (en) Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning
CN111898201B (en) High-precision autonomous attack guiding method for fighter in air combat simulation environment
CN106527462A (en) Unmanned aerial vehicle (UAV) control device
Li et al. Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm
CN113962012A (en) Unmanned aerial vehicle countermeasure strategy optimization method and device
CN114003050A (en) Active defense guidance method of three-body countermeasure strategy based on differential game
CN115688268A (en) Aircraft near-distance air combat situation assessment adaptive weight design method
CN116501086A (en) Aircraft autonomous avoidance decision method based on reinforcement learning
CN115525058A (en) Unmanned underwater vehicle cluster cooperative countermeasure method based on deep reinforcement learning
CN117313561B (en) Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
CN116880186B (en) Data-driven self-adaptive dynamic programming air combat decision method
CN117471952A (en) Integrated control method for backstepping supercoiled sliding mode guidance of aircraft
CN114815878B (en) Hypersonic aircraft collaborative guidance method based on real-time optimization and deep learning
CN114371729B (en) Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback
CN115421508A (en) Particle swarm optimization method suitable for multi-unmanned aerial vehicle multi-target track planning
CN114995129A (en) Distributed optimal event trigger cooperative guidance method
CN113625739A (en) Expert system optimization method based on heuristic maneuver selection algorithm
CN113110428A (en) Carrier-based aircraft landing fixed time trajectory tracking method based on limited backstepping control
CN117332684B (en) Optimal capturing method under multi-spacecraft chase-escaping game based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant