CN113791634A - Multi-aircraft air combat decision method based on multi-agent reinforcement learning - Google Patents
Multi-aircraft air combat decision method based on multi-agent reinforcement learning
- Publication number
- CN113791634A (application number CN202110964271.9A)
- Authority
- CN
- China
- Prior art keywords
- machine
- blue
- unmanned aerial
- aerial vehicle
- red
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Abstract
The invention discloses a multi-aircraft air combat decision method based on multi-agent reinforcement learning. The method first establishes a six-degree-of-freedom model, a missile model, a neural network normalization model, a battlefield environment model, and a situation judgment and target distribution model for the unmanned aerial vehicle; it then adopts the MAPPO algorithm as the multi-agent reinforcement learning algorithm and designs a corresponding return function for the specific air combat environment; finally, it combines the constructed unmanned aerial vehicle models with the multi-agent reinforcement learning algorithm to produce the final multi-aircraft cooperative air combat decision method based on multi-agent reinforcement learning. The method effectively addresses the problems of traditional multi-agent cooperative air combat: the large amount of computation required, and the difficulty of responding in real time to a rapidly changing battlefield situation.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a multi-aircraft air combat decision method.
Background
The purpose of unmanned aerial vehicle decision-making is to let the vehicle gain an advantage in battle, or turn a disadvantage into an advantage; the key research problem is designing an efficient autonomous decision-making mechanism. Autonomous decision-making for an unmanned combat aircraft is the mechanism by which it forms a tactical plan or selects flight actions in real time according to the actual combat environment, and the quality of this mechanism reflects the intelligence level of the unmanned aircraft in modern air combat. The inputs of the autonomous decision-making mechanism are the various parameters related to air combat, such as the aircraft's flight parameters, weapon parameters, three-dimensional scene parameters, and the relative relationship between the two sides; the decision-making process is one of information processing and computation within the system; and the output is the tactical plan or the specific flight actions decided upon.
At present, air combat tactical decision methods fall broadly into two classes. The first class comprises traditional rule-based, non-learning strategies, mainly including the differential game method, expert systems, the influence diagram method, matrix game algorithms, and the like; their decision strategies are generally fixed and cannot fully cover the complex, rapidly changing problem of multi-aircraft air combat. The second class comprises self-learning strategies based on intelligent algorithms, mainly including artificial immune systems, genetic algorithms, transfer learning, approximate dynamic programming, reinforcement learning, and the like, which optimize the structure and parameters of the decision model from their own experience. Self-learning strategies are highly adaptive and can cope with a complex and changeable air battlefield environment.
With the development of air combat technology, modern unmanned aerial vehicle combat is no longer limited to the one-on-one engagements of the prior art. Formation cooperation implies a many-to-many attack mode; mutual cover between unmanned aerial vehicles and cooperative attack have likewise become important components of multi-aircraft air combat decision-making.
The difficulty of multi-agent, multi-aircraft tactical decision-making is mainly reflected in: (1) cooperation among multiple heterogeneous agents; (2) real-time confrontation and action persistence; (3) incomplete-information games with strong uncertainty; (4) a huge search space and multiple complex tasks. With the breakthroughs in artificial intelligence technology centered on deep reinforcement learning, a new technical approach has opened up for the intelligentization of command information systems, bringing a new solution to complex multi-agent, multi-aircraft decision-making.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-aircraft air combat decision method based on multi-agent reinforcement learning. The method first establishes a six-degree-of-freedom model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment model and a target distribution model for the unmanned aerial vehicle; then adopts the MAPPO algorithm as the multi-agent reinforcement learning algorithm and designs a corresponding return function for the specific air combat environment; and finally combines the constructed unmanned aerial vehicle models with the multi-agent reinforcement learning algorithm to produce the final multi-aircraft cooperative air combat decision method. The method effectively addresses the problems of traditional multi-agent cooperative air combat: the large amount of computation required, and the difficulty of responding in real time to a rapidly changing battlefield situation.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: the unmanned aerial vehicles of the two opposing parties are our side's vehicle and the opponent's vehicle; our side's unmanned aerial vehicle is the red machine and the opponent's unmanned aerial vehicle is the blue machine; establishing a six-degree-of-freedom model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment model and a target distribution model of the unmanned aerial vehicle;
step 2: adopting an MAPPO algorithm as a multi-agent reinforcement learning algorithm, and designing a corresponding return function on the basis of a specific air combat environment;
and step 3: and (3) combining the unmanned aerial vehicle model constructed in the step (1) with the multi-agent reinforcement learning algorithm in the step (2) to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning.
Further, in the step 1, an airplane model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment model and a target distribution model of the unmanned aerial vehicle are established, and the specific steps are as follows:
step 1-1: establishing an airplane model of the unmanned aerial vehicle;
step 1-1-1: input the state of the unmanned aerial vehicle S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r]: the speed V_r, the pitch angle γ_r, the roll angle φ_r, and the three-axis position (x_r, y_r, h_r);
Step 1-1-2: constructing the six-degree-of-freedom model and the seven actions of the unmanned aerial vehicle; the actions are encoded by selecting the tangential overload, the normal overload and the roll angle of the unmanned aerial vehicle, i.e. formula (1) represents the action taken at each moment of the simulation; after encoding, the action set comprises seven actions: constant level flight, acceleration, deceleration, left turn, right turn, pull-up and dive;

dv/dt = g·(N_x − sinθ)
dθ/dt = (g/v)·(N_z·cosφ − cosθ)
dψ/dt = g·N_z·sinφ / (v·cosθ)        (1)
dx/dt = v·cosθ·cosψ
dy/dt = v·cosθ·sinψ
dh/dt = v·sinθ

where v represents the speed of the drone, N_x the tangential overload of the drone, θ the pitch angle, ψ the yaw angle, N_z the normal overload, φ the roll angle of the unmanned aerial vehicle, t the update time of the unmanned aerial vehicle state, and g the gravitational acceleration;
step 1-1-3: inputting the action to be executed by the unmanned aerial vehicle;
step 1-1-4: solving for the state of the aircraft after the action is executed, by means of the Runge-Kutta method;
step 1-1-5: updating the state of the airplane;
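Steps 1-1-1 through 1-1-5 can be sketched as below. This is an illustrative Python sketch assuming the standard point-mass kinematic model; the overload and roll values assigned to the seven actions are hypothetical, chosen only to make the sketch concrete.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def dynamics(state, nx, nz, phi):
    """Kinematic right-hand side; state = (v, theta, psi, x, y, h)."""
    v, theta, psi, x, y, h = state
    return (G * (nx - math.sin(theta)),                        # dv/dt
            (G / v) * (nz * math.cos(phi) - math.cos(theta)),  # dtheta/dt
            G * nz * math.sin(phi) / (v * math.cos(theta)),    # dpsi/dt
            v * math.cos(theta) * math.cos(psi),               # dx/dt
            v * math.cos(theta) * math.sin(psi),               # dy/dt
            v * math.sin(theta))                               # dh/dt

def rk4_step(state, action, dt=0.1):
    """Step 1-1-4: one fourth-order Runge-Kutta update of the UAV state."""
    nx, nz, phi = action
    add = lambda s, k, c: tuple(si + c * ki for si, ki in zip(s, k))
    k1 = dynamics(state, nx, nz, phi)
    k2 = dynamics(add(state, k1, dt / 2), nx, nz, phi)
    k3 = dynamics(add(state, k2, dt / 2), nx, nz, phi)
    k4 = dynamics(add(state, k3, dt), nx, nz, phi)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Step 1-1-2: the seven encoded actions as hypothetical
# (tangential overload Nx, normal overload Nz, roll angle) triples.
ACTIONS = {
    "level":      (0.0, 1.0, 0.0),
    "accelerate": (2.0, 1.0, 0.0),
    "decelerate": (-1.0, 1.0, 0.0),
    "turn_left":  (0.0, 3.0, -math.radians(60)),
    "turn_right": (0.0, 3.0, math.radians(60)),
    "pull_up":    (0.0, 3.0, 0.0),
    "dive":       (0.0, -1.0, 0.0),
}
```

Note that for level flight at θ = 0 with N_z = 1 both the speed and the pitch-rate terms vanish, so the "level" action leaves speed and altitude unchanged, as expected.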
step 1-2: constructing a missile model;
step 1-2-1: determining the missile capability parameters: the maximum off-axis launch angle, the maximum and minimum attack distances DM_max and DM_min, the maximum and minimum no-escape distances DMk_max and DMk_min, and the cone angle;
the missile attack area is assumed to be static, and only the maximum attack distance, the no-escape distance and the cone angle are considered; the attack area is denoted Area_ack and satisfies the following conditions:
where d_t denotes the distance from the red machine to the blue machine, q_t denotes the line-of-sight angle from the red machine to the blue machine, and pos(target) denotes the position of the blue machine;
the no-escape area is denoted Area_dead and satisfies the following conditions:
when the blue machine enters the attack area of the red machine, it is destroyed with a certain probability;
step 1-2-2: dividing an attack area;
when the angle condition of the attack area is met and DM_min < d < DM_max, the blue machine is located in zone ② or zone ③ of the attack area; which of the two it is in is judged from the relative position of the red and blue machines, given by formula (4):

Δx = x_b − x_r,  Δy = y_b − y_r,  Δz = z_b − z_r        (4)

where Δx, Δy and Δz denote the distance differences between the red and blue machines along the x-axis, y-axis and z-axis respectively; x_b, y_b, z_b denote the position of the blue machine along the x-axis, y-axis and z-axis, and x_r, y_r, z_r the position of the red machine along the x-axis, y-axis and z-axis;
if the blue machine lies to the right of the red machine, it is in zone ③ of the attack area; if it lies to the left, it is in zone ②;
in summary, the attack area is divided as follows:
step 1-2-3: when the blue machine is in zone ①, it is in the no-escape area of the red machine and the missile hit probability is maximal; when the blue machine is in the other zones, the missile hit probability is a function ranging from 0 to 1, whose value is related to the distance, the departure angle, the deviation angle and the flight direction; when the missile hit probability is less than 0.3 the missile is considered unable to hit, and cannot be launched at that moment; the specific destruction probability is:
where p_a represents the destruction probability associated with the blue machine's maneuver, p_d the destruction probability associated with distance, and pos(aircraft_aim) the zone of the red machine's attack area in which the blue machine is located;
step 1-2-4: the specific steps for launching the missile are as follows:
step 1-2-4-1: inputting the distance d, the departure angle AA, the deviation angle ATA, the position and the speed of the red machine and the blue machine;
step 1-2-4-2: constructing a missile model, and setting the number of missiles;
step 1-2-4-3: judging whether the blue machine is in the attack area of the red machine according to the distance d and the deviation angle ATA;
step 1-2-4-4: when the blue machine is in the attack area of the red machine, judging which part of the attack area the blue machine is in;
step 1-2-4-5: judging the speed direction of the blue machine relative to the red machine;
step 1-2-4-6: calculating the hit rate of the missile at the moment;
step 1-2-4-7: judging whether the missile is hit;
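Steps 1-2-4-1 through 1-2-4-7 amount to an envelope test followed by a probability threshold. A minimal Python sketch follows; the envelope parameters and the trapezoidal distance-to-probability profile are assumptions for illustration (the patent's actual p_a and p_d factors are not reproduced here).

```python
import math

def in_attack_zone(d, ata, p):
    """Step 1-2-4-3: inside the envelope if the distance and the deviation
    angle ATA are within the missile's limits."""
    return p["dm_min"] < d < p["dm_max"] and abs(ata) < p["off_axis"]

def hit_probability(d, p):
    """Steps 1-2-4-4..1-2-4-6 (illustrative): probability 1 inside the
    no-escape band [DMk_min, DMk_max], ramping linearly to 0 toward the
    attack-distance limits."""
    if d <= p["dm_min"] or d >= p["dm_max"]:
        return 0.0
    if p["dmk_min"] <= d <= p["dmk_max"]:
        return 1.0
    if d < p["dmk_min"]:
        return (d - p["dm_min"]) / (p["dmk_min"] - p["dm_min"])
    return (p["dm_max"] - d) / (p["dm_max"] - p["dmk_max"])

def try_launch(d, ata, p, threshold=0.3):
    """Step 1-2-4-7: the missile may only be launched when the blue machine
    is in the zone and the hit probability is at least 0.3."""
    if not in_attack_zone(d, ata, p):
        return False, 0.0
    prob = hit_probability(d, p)
    return prob >= threshold, prob

# Hypothetical envelope values (meters / radians), not from the patent.
PARAMS = {"dm_min": 1000.0, "dm_max": 8000.0,
          "dmk_min": 2000.0, "dmk_max": 5000.0,
          "off_axis": math.radians(30)}
```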
step 1-3: a neural network normalization model;
step 1-3-1: inputting state variables of the unmanned aerial vehicle;
Step 1-3-5: making a difference on the positions of the normalized red machine and the normalized blue machine;
step 1-3-6: outputting the data;
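Step 1-3 can be sketched as below. The scale constants are assumptions chosen only for illustration, and the sketch collapses the intermediate steps 1-3-2 to 1-3-4, which are not listed in this text.

```python
import math

# Hypothetical scale factors for speed, two angles, and x/y/h positions
# (the patent does not specify them).
SCALES = (400.0, math.pi, math.pi, 10000.0, 10000.0, 10000.0)

def normalize_state(state):
    """Steps 1-3-1..1-3-4 (sketch): map S_r = [V, gamma, phi, x, y, h]
    into comparable ranges before feeding the neural network."""
    return [s / c for s, c in zip(state, SCALES)]

def relative_observation(red, blue):
    """Step 1-3-5: difference of the normalized red and blue positions
    (the last three components of the state)."""
    nr, nb = normalize_state(red), normalize_state(blue)
    return [r - b for r, b in zip(nr[3:], nb[3:])]
```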
step 1-4: constructing a battlefield environment model;
step 1-5: situation judgment and target distribution model;
step 1-5-1: inputting the states of the red machine and the blue machine, including the speed, the pitch angle, the yaw angle and the triaxial position;
step 1-5-2: calculating the respective angle advantages from the pitch angle and the yaw angle, where φ_t is the target entry angle and φ_f is the target azimuth angle;
step 1-5-4: calculating the respective energy advantages from the speed and the altitude component of the three-axis position;
step 1-5-5: calculating the comprehensive advantage S = C1·Sa + C2·Sr + C3·Eg by combining the angle, speed and energy advantages, where C1, C2 and C3 are weighting coefficients;
step 1-5-6: sorting the targets according to their comprehensive advantage to generate the target distribution matrix;
step 1-5-7: outputting the target allocation according to the target distribution matrix.
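Steps 1-5-5 through 1-5-7 can be sketched as follows. The weights C1..C3 are illustrative, and the greedy assignment is a simple stand-in for the patent's allocation-matrix criterion, not the patented rule itself.

```python
def composite_advantage(s_a, s_r, e_g, c=(0.4, 0.3, 0.3)):
    """Step 1-5-5: S = C1*Sa + C2*Sr + C3*Eg (weights C_i are illustrative)."""
    return c[0] * s_a + c[1] * s_r + c[2] * e_g

def allocate_targets(adv):
    """Steps 1-5-6/1-5-7: adv[i][j] is red machine i's composite advantage
    over blue machine j.  Red machines pick in order of their best achievable
    advantage; each takes the unassigned blue target it dominates most."""
    order = sorted(range(len(adv)), key=lambda i: -max(adv[i]))
    assignment, taken = {}, set()
    for i in order:
        choices = [(adv[i][j], j) for j in range(len(adv[i])) if j not in taken]
        if choices:
            _, j = max(choices)
            assignment[i] = j
            taken.add(j)
    return assignment
```

A globally optimal assignment (e.g. the Hungarian algorithm) could replace the greedy pass; the greedy version is shown only because it mirrors the sort-then-assign wording of steps 1-5-6 and 1-5-7.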
Further, in step 2, the MAPPO algorithm is adopted as the multi-agent reinforcement learning algorithm: a centralized-training, distributed-execution framework is combined with the PPO algorithm to form MAPPO, and a corresponding return function is designed for the specific air combat environment. The specific steps are as follows:
the return function consists of four sub-return functions, namely a height return function, a speed return function, an angle return function and a distance return function; the method comprises the following specific steps:
step 2-1: input the state of the unmanned aerial vehicle S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r];
step 2-2: calculating the height difference Δh = h_r − h_b and the height-difference reward r_h, where h_r and h_b denote the altitudes of the red machine and the blue machine respectively, in meters:
step 2-3: calculating the red machine's altitude-safety reward r_h_self:
step 2-4: calculating the total altitude reward R_h = r_h + r_h_self;
step 2-5: calculating the speed difference Δv = v_r − v_b and the speed-difference reward r_v, where v_r and v_b denote the speeds of the red machine and the blue machine respectively, in meters per second:
step 2-6: calculating the red machine's speed-safety reward r_v_self:
step 2-7: calculating the total speed reward R_v = r_v + r_v_self;
step 2-8: calculating the departure angle AA and the deviation angle ATA of the red machine and the blue machine;
step 2-10: calculating the distance reward R_d from the distance between the red and blue machines when the deviation angle ATA is less than 60 degrees;
step 2-11: summing the rewards with different weights to obtain the continuous reward R_c = a1·R_a + a2·R_h + a3·R_v + a4·R_d, where a1, a2, a3 and a4 denote the different weights.
Further, in step 3, the unmanned aerial vehicle model constructed in step 1 and the multi-agent reinforcement learning algorithm in step 2 are combined to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning, which is specifically as follows:
step 3-1: the multi-agent reinforcement learning algorithm consists of a policy network and a value network; the value network is responsible for evaluating the action selected by the policy network and thereby guiding the policy network's updates. The inputs of the value network are the speed, pitch angle, yaw angle, x-position, y-position, altitude and selected action of the unmanned aerial vehicle, its friendly aircraft and the enemy aircraft at the previous moment; the inputs of the policy network are the speed, pitch angle, yaw angle, x-position, y-position and altitude of the unmanned aerial vehicle, and its output is the selected action;
step 3-2: first, the red and blue machines select initial actions according to the initial parameters of their policy networks and execute them in the battlefield environment model to obtain new states; rewards are then computed, and the states, rewards and actions of the red and blue machines are normalized, packaged and stored in the experience replay library of the multi-agent reinforcement learning algorithm. Once enough data have been stored, the value networks of the red and blue machines sample from the experience replay library, combining the states of both sides, and the policy networks update their policies. Each unmanned aerial vehicle then takes its own state as the input of its policy network, which selects an action accordingly; the vehicle executes the action to obtain new data, and the cycle repeats.
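The interaction cycle of step 3-2 can be sketched as follows. `env_step` and the policy callables are assumed interfaces standing in for the battlefield environment model and the MAPPO policy networks; they are not the patent's actual implementation.

```python
import random

class ReplayBuffer:
    """Shared experience replay library (step 3-2): normalized
    (state, action, reward, next_state) tuples from every machine
    are packaged into one store."""
    def __init__(self, capacity=10000):
        self.data, self.capacity = [], capacity

    def push(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)  # drop the oldest transition
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

def decision_cycle(env_step, policies, states, buffer):
    """One cycle: each policy network selects an action from its own local
    state (distributed execution); the transitions land in the shared buffer
    that the centralized value networks later sample during training."""
    actions = [pi(s) for pi, s in zip(policies, states)]
    next_states, rewards = env_step(actions)
    for s, a, r, s2 in zip(states, actions, rewards, next_states):
        buffer.push((s, a, r, s2))
    return next_states
```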
The invention has the following beneficial effects:
(1) the method effectively solves the problems that traditional multi-agent cooperative air combat requires a large amount of computation and struggles to respond in real time to a rapidly changing battlefield situation.
(2) The multi-machine cooperative air combat decision algorithm based on multi-agent reinforcement learning formed by the method effectively solves the problems of multi-heterogeneous agent cooperation, real-time confrontation and action continuity, huge search space, multi-complex tasks and the like in multi-agent decision.
(3) The multi-machine cooperative air combat decision algorithm based on multi-agent reinforcement learning comprises a battlefield environment construction module, a normalization module, a reinforcement learning module, an airplane module, a missile module, a reward module and a target distribution module, and a decision model can be established according to battlefield environment and situation information.
(4) The invention can realize multi-aircraft air combat decision output; the reinforcement learning algorithm can be trained independently for different scenarios, and the decision algorithm features well-defined input/output interfaces and rapid modular porting.
Drawings
Fig. 1 is a schematic cross-sectional view of an attack area of an unmanned aerial vehicle according to the present invention.
Fig. 2 is a flowchart of a battlefield environment module of the present invention.
FIG. 3 is a multi-agent multi-aircraft air combat decision algorithm design framework according to the present invention.
FIG. 4 is a diagram showing the relationship between modules in the method of the present invention.
Fig. 5 shows the initial positions of the two sides in the 2v2 air combat of the embodiment of the present invention.
FIG. 6 is a diagram of the velocity change of both air fighters according to the embodiment of the invention.
FIG. 7 is a diagram of the height change of both air fighters in accordance with the embodiment of the present invention.
FIG. 8 is a diagram illustrating situation changes of both air fighters according to an embodiment of the present invention.
FIG. 9 is a track diagram of both air war parties in accordance with an embodiment of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
A multi-airplane air combat decision method based on multi-agent reinforcement learning comprises the following steps:
step 1: the unmanned aerial vehicles of the two opposing parties are our side's vehicle and the opponent's vehicle; our side's unmanned aerial vehicle is the red machine and the opponent's unmanned aerial vehicle is the blue machine; establishing a six-degree-of-freedom model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment model and a target distribution model of the unmanned aerial vehicle;
step 2: adopting an MAPPO algorithm as a multi-agent reinforcement learning algorithm, and designing a corresponding return function on the basis of a specific air combat environment;
and step 3: and (3) combining the unmanned aerial vehicle model constructed in the step (1) with the multi-agent reinforcement learning algorithm in the step (2) to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning.
Further, in the step 1, an airplane model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment model and a target distribution model of the unmanned aerial vehicle are established, and the specific steps are as follows:
step 1-1: establishing an airplane model of the unmanned aerial vehicle;
firstly, the six-degree-of-freedom model of the unmanned aerial vehicle is constructed from the three-dimensional kinematic equations in the ground inertial coordinate system; then the seven actions of the aircraft are constructed from the tangential overload, normal overload and roll angle of the unmanned aerial vehicle; whenever the aircraft chooses to execute one of these actions, its post-action state is updated by the Runge-Kutta method;
step 1-1-1: input the state of the unmanned aerial vehicle S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r]: the speed V_r, the pitch angle γ_r, the roll angle φ_r, and the three-axis position (x_r, y_r, h_r);
Step 1-1-2: constructing a six-degree-of-freedom model and seven actions of the unmanned aerial vehicle;
step 1-1-3: inputting the action to be executed by the unmanned aerial vehicle;
step 1-1-4: solving for the state of the aircraft after the action is executed, by means of the Runge-Kutta method;
step 1-1-5: updating the state of the airplane;
step 1-2: constructing a missile model;
step 1-2-1: determining the missile capability parameters: the maximum off-axis launch angle, the maximum and minimum attack distances DM_max and DM_min, the maximum and minimum no-escape distances DMk_max and DMk_min, and the cone angle;
in order to simplify the problem, the missile attack area is assumed to be static, and only the maximum attack distance, the no-escape distance and the cone angle are considered; the attack area is denoted Area_ack and satisfies the following conditions:
where q_t denotes the line-of-sight angle from the red machine to the blue machine, and pos(target) denotes the position of the blue machine;
the no-escape area is denoted Area_dead and satisfies the following conditions:
when the blue machine enters an attack area of the red machine, the blue machine is destroyed with a certain probability;
to better determine this probability, the attack zone is further analyzed as shown in FIG. 1.
Step 1-2-2: dividing an attack area;
when the angle condition of the attack area is met and DM_min < d < DM_max, the blue machine is located in zone ② or zone ③ of the attack area; which of the two it is in is judged from the relative position of the red and blue machines, given by formula (4):
if the blue machine lies to the right of the red machine, it is in zone ③ of the attack area; if it lies to the left, it is in zone ②;
in summary, the attack area is specifically divided as follows:
step 1-2-3: when the blue machine is in zone ①, it is in the no-escape area of the red machine and the missile hit probability is maximal; when the blue machine is in the other zones, the missile hit probability is a function ranging from 0 to 1, whose value is related to the distance, the departure angle, the deviation angle and the flight direction; when the missile hit probability is less than 0.3 the missile is considered unable to hit, and cannot be launched at that moment; the specific destruction probability is:
step 1-2-4: the specific steps for launching the missile are as follows:
step 1-2-4-1: inputting the distance d, the departure angle AA, the deviation angle ATA, the position and the speed of the red machine and the blue machine;
step 1-2-4-2: constructing a missile model, and setting the number of missiles;
step 1-2-4-3: judging whether the blue machine is in the attack area of the red machine according to the distance d and the deviation angle ATA;
step 1-2-4-4: when the blue machine is in the attack area of the red machine, judging which part of the attack area the blue machine is in;
step 1-2-4-5: judging the speed direction of the blue machine relative to the red machine;
step 1-2-4-6: calculating the hit rate of the missile at the moment;
step 1-2-4-7: judging whether the missile is hit;
step 1-3: a neural network normalization model;
normalization can ensure that when the input of each layer of the neural network keeps the same distribution gradient and is reduced, the model is converged to a correct place, and the gradient updating direction is deviated under different dimensions. And normalization to a reasonable range favors model generalization.
Step 1-3-1: inputting state variables of the unmanned aerial vehicle;
Step 1-3-5: making a difference on the positions of the normalized red machine and the normalized blue machine;
step 1-3-6: outputting the data;
step 1-4: constructing a battlefield environment model;
step 1-5: situation judgment and target distribution model;
and the situation judgment and target distribution model constructs a comprehensive advantage function by analyzing distance threat, angle advantage and energy advantage so as to construct an air war threat degree model. And then, calculating a target distribution matrix according to a target distribution matrix criterion after data fusion according to all information obtained by the long machine. And then selecting a tactical caution degree or risk degree coefficient according to the target distribution matrix, and representing the balance of the pilot on attacking and avoiding the danger problem.
Step 1-5-1: inputting the states of the red machine and the blue machine, including the speed, the pitch angle, the yaw angle and the triaxial position;
step 1-5-2: calculating the respective angle advantages from the pitch angle and the yaw angle, where φ_t is the target entry angle and φ_f the target azimuth angle;
step 1-5-4: calculating the respective energy advantages from the velocity and the altitude in the three-axis position;
step 1-5-5: calculating the comprehensive advantage by combining the angle, speed and energy advantages;
step 1-5-6: sorting the targets by comprehensive advantage to generate the target allocation matrix;
step 1-5-7: outputting the target allocation according to the target allocation matrix.
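A minimal sketch of the composite advantage and the target allocation of steps 1-5-5 through 1-5-7; the weights and the greedy one-to-one rule are assumptions, since the patent only states that targets are sorted by composite advantage:

```python
import numpy as np

def advantage(angle_adv, speed_adv, energy_adv, c=(0.4, 0.3, 0.3)):
    """Composite advantage S = C1*Sa + C2*Sr + C3*Eg (weights assumed)."""
    return c[0] * angle_adv + c[1] * speed_adv + c[2] * energy_adv

def allocate(adv_matrix):
    """Greedy one-to-one allocation from an advantage matrix whose entry
    [i, j] is red machine i's composite advantage over blue machine j.
    The greedy rule is an assumption for illustration."""
    adv = np.array(adv_matrix, float)
    n_red, _ = adv.shape
    assign = np.zeros_like(adv, dtype=int)
    taken = set()
    # visit red machines in order of their best achievable advantage
    for i in sorted(range(n_red), key=lambda i: -adv[i].max()):
        for j in np.argsort(-adv[i]):
            if j not in taken:
                assign[i, j] = 1
                taken.add(int(j))
                break
    return assign
```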
Further, in step 2, the MAPPO algorithm is adopted as the multi-agent reinforcement learning algorithm, and a corresponding reward function is designed on the basis of the specific air combat environment; the specific steps are as follows:
MAPPO algorithm:
Because the state and action spaces of a multi-aircraft air combat scenario are huge, the space a single UAV can explore is limited and sample efficiency is low. Moreover, as a typical multi-agent system, in multi-aircraft cooperative air combat the strategy of a single UAV depends not only on its own strategy and the environment's feedback, but also on the actions of the other UAVs and its cooperative relationships with them. An experience-sharing mechanism is therefore designed, comprising a shared sample experience library and shared network parameters. For the shared sample experience library, the global environment situation, the UAV's action decision, the environment situation after the UAV executes the new action, and the reward fed back by the environment for that action are stored as a quadruple in an experience replay library, and every UAV stores its information in the same replay library in this form. When the network parameters are updated, samples are drawn from the replay library, the loss values of samples generated by different UAVs under the Actor and Critic networks are computed separately, the update gradients of the two neural networks are obtained, and the gradient values computed from different UAVs' samples are weighted to give the global gradient formula.

As shown in Fig. 3, the overall framework of the deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework contains seven modules: battlefield environment construction, normalization, reinforcement learning, aircraft, missile, reward and target allocation. The framework's input is the real-time battlefield situation information and its output is the action decision scheme of the controlled entity. Raw battlefield situation information is first handled by the situation processing module; after the data are cleaned, screened, extracted, packaged, normalized and formatted, they are passed to the deep reinforcement learning module, which receives the situation data and outputs action decisions. The policy network's action decision output is decoded and packaged into operating instructions acceptable to the platform environment, controlling the corresponding units. Meanwhile, the new environment situation and the reward obtained by executing the new action are packaged together with the environment situation information and the decision scheme of this step and stored in the experience storage module; when the network is to be trained, sample data are drawn from the experience library and passed to the neural network training module for training.
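The shared sample experience library can be sketched as below, assuming a deque-backed buffer holding the quadruple (state, action, next state, reward); the class name and capacity are illustrative:

```python
import random
from collections import deque

class SharedReplay:
    """Shared sample experience library: every UAV stores its transition
    quadruple (state, action, next_state, reward) in the same buffer,
    and parameter updates sample across all UAVs' experience."""

    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, state, action, next_state, reward):
        self.buf.append((state, action, next_state, reward))

    def sample(self, batch_size):
        # draw without replacement; never ask for more than is stored
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))
```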
The reward function consists of four sub-reward functions: a height reward, a speed reward, an angle reward and a distance reward. Together they reflect the distribution of energy advantage, kinetic-energy advantage and attack-zone hit probability in air combat, summarizing the whole air combat environment. The reward function reflects the occupation situation of our aircraft relative to the opponent's aircraft at the current moment and guides the aircraft toward high reward values, i.e. toward more favorable situations. The specific steps are as follows:
step 2-1: inputting the UAV state S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r];
step 2-2: calculating the height difference Δh = h_r − h_b and the height-difference reward r_h:
step 2-3: calculating the altitude safety reward r_h_self of the red machine:
step 2-4: calculating the total height reward R_h = r_h + r_h_self;
step 2-5: calculating the speed difference Δv = v_r − v_b and the speed-difference reward r_v:
step 2-6: calculating the speed safety reward r_v_self of the red machine:
step 2-7: calculating the total speed reward R_v = r_v + r_v_self;
step 2-8: calculating the departure angle AA and the deviation angle ATA of the red machine and the blue machine;
step 2-10: calculating the distance between the red machine and the blue machine, and obtaining the distance reward R_d when the deviation angle ATA is less than 60 degrees;
step 2-11: setting different weights and summing the rewards to obtain the continuous reward R_c = a1·R_a + a2·R_h + a3·R_v + a4·R_d, where a1, a2, a3 and a4 denote the different weights.
Further, in step 3, the unmanned aerial vehicle model constructed in step 1 and the multi-agent reinforcement learning algorithm in step 2 are combined to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning, which is specifically as follows:
step 3-1: the relation between the model constructed in step 1, the MAPPO algorithm and the reward function designed in step 2 is shown in Fig. 4. The multi-agent reinforcement learning algorithm consists of a policy network and a value network; the value network evaluates the actions selected by the policy network so as to guide the policy network's updates. The inputs of the value network are the speed, pitch angle, yaw angle, x position, y position, altitude and selected action of the UAV, its friendly aircraft and the enemy aircraft at the previous moment; the inputs of the policy network are the UAV's own speed, pitch angle, yaw angle, x position, y position and altitude, and its output is the selected action;
step 3-2: first, the red and blue machines select initial actions from the initial parameters of their policy networks and execute them in the battlefield environment model to obtain new states; the rewards are then computed, and the states, rewards and actions of the red and blue machines are normalized, packaged and stored in the experience replay library of the multi-agent reinforcement learning algorithm. After enough data have been stored, the value networks of the red and blue machines sample from the replay library, the states of both sides are combined, and the policy networks update their policies; each UAV then feeds its own state to its policy network, which selects the UAV's action from that state; the UAV executes the action to obtain new data, and the cycle repeats.
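The interaction cycle of step 3-2 can be sketched as follows; `env_step` and the policies are stand-ins (hypothetical names) for the battlefield environment model and the MAPPO policy networks:

```python
def rollout(env_step, policies, init_states, replay, horizon=200):
    """Step 3-2 interaction loop sketch: each UAV's policy picks an action
    from its own state, the battlefield model returns next states and
    rewards, and the transition tuples go to the shared replay library."""
    states = list(init_states)
    for _ in range(horizon):
        actions = [pi(s) for pi, s in zip(policies, states)]
        next_states, rewards, done = env_step(states, actions)
        for s, a, s2, r in zip(states, actions, next_states, rewards):
            replay.append((s, a, s2, r))   # quadruple form, shared library
        states = next_states
        if done:
            break
    return states
```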
The specific embodiment is as follows:
The initial two-versus-two engagement situation is shown in Fig. 5: the four aircraft are in the same plane, with red machine 1 and red machine 2 directly in front of blue machine 1 and blue machine 2 respectively. Blue machines 1 and 2 tend to approach the joint attack zone of red machines 1 and 2, and the red machines likewise tend to approach the joint attack zone of the blue machines, so the red machines and the blue machines start from symmetric situations.
After training was completed, the numbers of red and blue wins over 1000 trials are shown in Table 1. The red side's winning rate is 51.8% and the blue side's winning rate is 48.2%.
TABLE 1 Numbers of red and blue wins

| Situation | Number of times |
| --- | --- |
|  | 226 |
|  | 129 |
|  | 0 |
|  | 163 |
|  | 330 |
|  | 0 |
|  | 152 |
|  | 0 |
The analysis below takes red machine 1 and blue machine 1 as examples.
The action selected by red machine 1 is [right, right, right, right, acc, acc, acc, acc, acc, acc].
The action selected by Red machine 2 is [ right, right, acc, right, acc, acc, acc, acc, acc, acc ].
The action selected by the blue machine 1 is [ right, right, right, right, acc, acc, acc, acc, acc, acc ].
The action selected by blue machine 2 is [right, right, right, right, acc, acc, acc, acc, acc, acc].
The simulation results are shown in Figs. 6-8, where the solid line denotes red machine 1, the dashed line red machine 2, the dotted line blue machine 1 and the dash-dot curve blue machine 2. As Fig. 6 shows, blue machine 2 has the highest speed and thus the greatest speed advantage, and the speeds of red machines 1 and 2 are far below those of blue machines 1 and 2. Fig. 7 shows that blue machines 1 and 2 hold no height advantage over red machines 1 and 2. Fig. 8 shows that all four aircraft initially fly safely, so their initial situation values are all positive. As the air combat proceeds, blue machines 1 and 2 pincer red machine 1, so the blue machines' situations gradually rise while the red machines' situation deteriorates; the two red machines then begin to pincer blue machine 2, so the blue situation falls and the red situation rises. Finally the blue machines complete their pincer on red machine 1 first, and blue machine 2 launches a missile that successfully hits red machine 1, seizing the battlefield initiative.
Fig. 9 is a trajectory diagram of four drones.
Taken together, the simulation results demonstrate the effectiveness of the multi-aircraft cooperative air combat decision algorithm based on multi-agent reinforcement learning designed by the invention. It effectively addresses the heavy computational load of traditional multi-agent cooperative air combat and the difficulty of handling a rapidly changing battlefield situation that requires real-time resolution, as well as the problems of cooperation among heterogeneous agents, real-time confrontation, action continuity, huge search spaces and complex multi-task settings in multi-agent decision making. A decision model can be established from the battlefield environment and situation information; multi-aircraft air combat decision outputs can be produced; the reinforcement learning algorithm can be trained independently for different scenarios; and the decision algorithm offers clean input/output interfaces and rapid modular transplantation.
Claims (4)
1. A multi-airplane air combat decision method based on multi-agent reinforcement learning is characterized by comprising the following steps:
step 1: the UAVs of the two opposing sides are taken as our side's UAV and the opponent's UAV, our side's UAV being the red machine and the opponent's UAV the blue machine; establishing the UAV's six-degree-of-freedom model, missile model, neural network normalization model, battlefield environment model, situation judgment model and target allocation model;
step 2: adopting an MAPPO algorithm as a multi-agent reinforcement learning algorithm, and designing a corresponding return function on the basis of a specific air combat environment;
and step 3: and (3) combining the unmanned aerial vehicle model constructed in the step (1) with the multi-agent reinforcement learning algorithm in the step (2) to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning.
2. The multi-aircraft air combat decision method based on multi-agent reinforcement learning as claimed in claim 1, wherein in step 1, an unmanned aerial vehicle model, a missile model, a neural network normalization model, a battlefield environment model, a situation judgment and target distribution model are established, and the specific steps are as follows:
step 1-1: establishing an airplane model of the unmanned aerial vehicle;
step 1-1-1: inputting the UAV state S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r], where V_r is the speed of the UAV, γ_r its pitch angle, φ_r its roll angle, and (x_r, y_r, h_r) its three-axis position;
step 1-1-2: constructing the six-degree-of-freedom model and the seven actions of the UAV; the actions are encoded by selecting the UAV's tangential overload, normal overload and roll angle, i.e. formula (1) is used to represent the action taken at each moment of the simulation; after encoding, the actions comprise seven maneuvers: constant level flight, acceleration, deceleration, left turn, right turn, pull-up and dive;
where v denotes the speed of the UAV, N_x its tangential overload, θ its pitch angle, ψ its yaw angle and N_z its normal overload; the roll angle of the UAV is denoted by the remaining symbol in formula (1); t denotes the update time of the UAV state and g the gravitational acceleration;
step 1-1-3: inputting the action to be executed by the unmanned aerial vehicle;
step 1-1-4: solving for the aircraft state after the action is executed by means of the Runge-Kutta method;
step 1-1-5: updating the state of the airplane;
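Steps 1-1-4 and 1-1-5 update the aircraft state by Runge-Kutta integration; a minimal fourth-order sketch, with the six-degree-of-freedom derivative function `f` left as a caller-supplied stub, is:

```python
def rk4_step(f, state, u, dt):
    """Fourth-order Runge-Kutta update of the aircraft state after an
    action u, as in steps 1-1-4/1-1-5. `f(state, u)` returns the state
    derivative from the six-DOF equations (supplied by the caller)."""
    k1 = f(state, u)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)], u)
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)], u)
    k4 = f([s + dt * k for s, k in zip(state, k3)], u)
    # weighted combination of the four slope estimates
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
```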
step 1-2: constructing a missile model;
step 1-2-1: determining the missile performance parameters: the maximum off-axis launch angle, the maximum and minimum attack distances DM_max and DM_min, the maximum and minimum no-escape distances DMk_max and DMk_min, and the cone angle;
The missile attack area is assumed to be static, and only the maximum attack distance, the maximum no-escape distance and the cone angle are considered; the attack area is denoted Area_ack and satisfies:
where d_t denotes the distance from the red machine to the blue machine, q_t the line-of-sight angle from the red machine to the blue machine, and pos(target) the position of the blue machine;
The no-escape area is denoted Area_dead and satisfies:
when the blue machine enters an attack area of the red machine, the blue machine is destroyed with a certain probability;
step 1-2-2: dividing an attack area;
When the off-axis angle condition holds and DM_min < d < DM_max, the blue machine lies in zone II or zone III of the attack area; which of the two it lies in is judged from the relative position of the red and blue machines, given by formula (4):
where Δx, Δy, Δz denote the distance differences between the red machine and the blue machine along the x-, y- and z-axis directions respectively, x_b, y_b, z_b the position of the blue machine along the x-, y- and z-axis directions, and x_r, y_r, z_r the position of the red machine along the x-, y- and z-axis directions;
if the corresponding sign condition on formula (4) holds, the blue machine lies to the right of the red machine, i.e. in zone III of the attack area; if the opposite holds, the blue machine lies to the left of the red machine, i.e. in zone II of the attack area;
in summary, the attack area is specifically divided as follows:
step 1-2-3: when the blue machine is in the no-escape zone of the red machine, the missile hit probability is maximal; in the other zones, the hit probability is a function taking values between 0 and 1 whose magnitude depends on the distance, the departure angle, the deviation angle and the flight direction; when the missile hit probability is below 0.3, the missile is considered unable to hit and is not launched; the specific destruction probability is:
where p_a denotes the destruction probability associated with the blue machine's maneuvering, p_d the destruction probability associated with the distance, and position(aircraft_aim) the zone of the red machine's attack area in which the blue machine is located;
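A sketch of the destruction-probability logic of step 1-2-3; the product form p_a·p_d and the zone encoding are assumptions, as the patent's formula image is not reproduced in the text:

```python
def kill_probability(zone, p_a, p_d):
    """Destruction-probability sketch for step 1-2-3: in the no-escape
    zone the hit probability is maximal; elsewhere it is taken here as
    the product of the maneuver-related term p_a and the distance-related
    term p_d (an assumed combination rule)."""
    if zone == "no_escape":
        return 1.0
    p = p_a * p_d
    # below 0.3 the missile is considered unable to hit and is not launched
    return p if p >= 0.3 else 0.0
```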
step 1-2-4: the specific steps for launching the missile are as follows:
step 1-2-4-1: inputting the distance d, the departure angle AA, the deviation angle ATA, the position and the speed of the red machine and the blue machine;
step 1-2-4-2: constructing a missile model, and setting the number of missiles;
step 1-2-4-3: judging whether the blue machine is in the attack area of the red machine according to the distance d and the deviation angle ATA;
step 1-2-4-4: when the blue machine is in the attack area of the red machine, judging which part of the attack area the blue machine is in;
step 1-2-4-5: judging the speed direction of the blue machine relative to the red machine;
step 1-2-4-6: calculating the hit rate of the missile at the moment;
step 1-2-4-7: judging whether the missile is hit;
step 1-3: a neural network normalization model;
step 1-3-1: inputting state variables of the unmanned aerial vehicle;
Step 1-3-5: making a difference on the positions of the normalized red machine and the normalized blue machine;
step 1-3-6: outputting the data;
step 1-4: constructing a battlefield environment model;
step 1-5: situation judgment and target distribution model;
step 1-5-1: inputting the states of the red machine and the blue machine, including the speed, the pitch angle, the yaw angle and the triaxial position;
step 1-5-2: calculating the respective angle advantages from the pitch angle and the yaw angle, where φ_t is the target entry angle and φ_f the target azimuth angle;
step 1-5-4: calculating the respective energy advantages from the velocity and the altitude in the three-axis position;
step 1-5-5: calculating the comprehensive advantage S = C1·S_a + C2·S_r + C3·E_g by combining the angle, speed and energy advantages, where C1, C2 and C3 are weighting coefficients;
step 1-5-6: sorting the targets by comprehensive advantage to generate the target allocation matrix;
step 1-5-7: outputting the target allocation according to the target allocation matrix.
3. The multi-aircraft air combat decision method based on multi-agent reinforcement learning according to claim 2, wherein in step 2, the MAPPO algorithm is adopted as the multi-agent reinforcement learning algorithm, formed by combining a centralized-training, distributed-execution framework with the PPO algorithm, and a corresponding reward function is designed on the basis of the specific air combat environment, specifically as follows:
The reward function consists of four sub-reward functions, namely a height reward function, a speed reward function, an angle reward function and a distance reward function; the specific steps are as follows:
step 2-1: inputting the UAV state S_r = [V_r, γ_r, φ_r, x_r, y_r, h_r];
step 2-2: calculating the height difference Δh = h_r − h_b and the height-difference reward r_h, where h_r and h_b denote the heights of the red machine and the blue machine respectively, in meters:
step 2-3: calculating the altitude safety reward r_h_self of the red machine:
step 2-4: calculating the total height reward R_h = r_h + r_h_self;
step 2-5: calculating the speed difference Δv = v_r − v_b and the speed-difference reward r_v, where v_r and v_b denote the speeds of the red machine and the blue machine respectively, in meters per second:
step 2-6: calculating the speed safety reward r_v_self of the red machine:
step 2-7: calculating the total speed reward R_v = r_v + r_v_self;
step 2-8: calculating the departure angle AA and the deviation angle ATA of the red machine and the blue machine;
step 2-10: calculating the distance between the red machine and the blue machine, and obtaining the distance reward R_d when the deviation angle ATA is less than 60 degrees;
step 2-11: setting different weights and summing the rewards to obtain the continuous reward R_c = a1·R_a + a2·R_h + a3·R_v + a4·R_d, where a1, a2, a3 and a4 denote the different weights.
4. The multi-machine air combat decision method based on multi-agent reinforcement learning as claimed in claim 3, wherein in the step 3, the unmanned aerial vehicle model constructed in the step 1 and the multi-agent reinforcement learning algorithm in the step 2 are combined to generate a final multi-machine cooperative air combat decision method based on multi-agent reinforcement learning, which is specifically as follows:
step 3-1: the multi-agent reinforcement learning algorithm consists of a policy network and a value network; the value network evaluates the actions selected by the policy network so as to guide the policy network's updates. The inputs of the value network are the speed, pitch angle, yaw angle, x position, y position, altitude and selected action of the UAV, its friendly aircraft and the enemy aircraft at the previous moment; the inputs of the policy network are the UAV's own speed, pitch angle, yaw angle, x position, y position and altitude, and its output is the selected action;
step 3-2: first, the red and blue machines select initial actions from the initial parameters of their policy networks and execute them in the battlefield environment model to obtain new states; the rewards are then computed, and the states, rewards and actions of the red and blue machines are normalized, packaged and stored in the experience replay library of the multi-agent reinforcement learning algorithm. After enough data have been stored, the value networks of the red and blue machines sample from the replay library, the states of both sides are combined, and the policy networks update their policies; each UAV then feeds its own state to its policy network, which selects the UAV's action from that state; the UAV executes the action to obtain new data, and the cycle repeats.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110964271.9A CN113791634B (en) | 2021-08-22 | 2021-08-22 | Multi-agent reinforcement learning-based multi-machine air combat decision method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113791634A true CN113791634A (en) | 2021-12-14 |
CN113791634B CN113791634B (en) | 2024-02-02 |
Family
ID=78876259
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114371729A (en) * | 2021-12-22 | 2022-04-19 | 中国人民解放军军事科学院战略评估咨询中心 | Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback |
CN114492059A (en) * | 2022-02-07 | 2022-05-13 | 清华大学 | Multi-agent confrontation scene situation assessment method and device based on field energy |
CN114492058A (en) * | 2022-02-07 | 2022-05-13 | 清华大学 | Multi-agent confrontation scene oriented defense situation assessment method and device |
CN114578838A (en) * | 2022-03-01 | 2022-06-03 | 哈尔滨逐宇航天科技有限责任公司 | Reinforced learning active disturbance rejection attitude control method suitable for aircrafts of various configurations |
CN115047907A (en) * | 2022-06-10 | 2022-09-13 | 中国电子科技集团公司第二十八研究所 | Air isomorphic formation command method based on multi-agent PPO algorithm |
CN115113642A (en) * | 2022-06-02 | 2022-09-27 | 中国航空工业集团公司沈阳飞机设计研究所 | Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method |
CN115484205A (en) * | 2022-07-12 | 2022-12-16 | 北京邮电大学 | Deterministic network routing and queue scheduling method and device |
CN116187787A (en) * | 2023-04-25 | 2023-05-30 | 中国人民解放军96901部队 | Intelligent planning method for cross-domain allocation problem of combat resources |
CN116679742A (en) * | 2023-04-11 | 2023-09-01 | 中国人民解放军海军航空大学 | Multi-six-degree-of-freedom aircraft collaborative combat decision-making method |
CN116880186A (en) * | 2023-07-13 | 2023-10-13 | 四川大学 | Data-driven self-adaptive dynamic programming air combat decision method |
CN116909155A (en) * | 2023-09-14 | 2023-10-20 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle autonomous maneuver decision-making method and device based on continuous reinforcement learning |
CN117313561A (en) * | 2023-11-30 | 2023-12-29 | 中国科学院自动化研究所 | Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game |
WO2020000399A1 (en) * | 2018-06-29 | 2020-01-02 | 东莞理工学院 | Multi-agent deep reinforcement learning proxy method based on intelligent grid |
WO2020024097A1 (en) * | 2018-07-30 | 2020-02-06 | 东莞理工学院 | Deep reinforcement learning-based adaptive game algorithm |
CN112861442A (en) * | 2021-03-10 | 2021-05-28 | 中国人民解放军国防科技大学 | Multi-machine collaborative air combat planning method and system based on deep reinforcement learning |
CN112947581A (en) * | 2021-03-25 | 2021-06-11 | 西北工业大学 | Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning |
Non-Patent Citations (1)
Title |
---|
CUI Wenhua; LI Dong; TANG Yubo; LIU Shaojun: "A decision-making method framework for wargaming based on deep reinforcement learning", National Defense Science & Technology, no. 02
Also Published As
Publication number | Publication date |
---|---|
CN113791634B (en) | 2024-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113791634A (en) | | Multi-aircraft air combat decision method based on multi-agent reinforcement learning |
Yang et al. | | Maneuver decision of UAV in short-range air combat based on deep reinforcement learning |
CN111880563B (en) | | Multi-unmanned aerial vehicle task decision method based on MADDPG |
CN111240353B (en) | | Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree |
CN111859541B (en) | | PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement |
CN112198892B (en) | | Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method |
CN110928329A (en) | | Multi-aircraft track planning method based on deep Q learning algorithm |
CN115291625A (en) | | Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning |
CN105678030B (en) | | Divide the air-combat tactics team emulation mode of shape based on expert system and tactics tactics |
CN113893539B (en) | | Cooperative fighting method and device for intelligent agent |
Zhang et al. | | Maneuver decision-making of deep learning for UCAV thorough azimuth angles |
CN114063644B (en) | | Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning |
CN115951709A (en) | | Multi-unmanned aerial vehicle air combat strategy generation method based on TD3 |
Bae et al. | | Deep reinforcement learning-based air-to-air combat maneuver generation in a realistic environment |
CN114638339A (en) | | Intelligent agent task allocation method based on deep reinforcement learning |
CN113435598A (en) | | Knowledge-driven intelligent strategy deduction decision method |
Wu et al. | | Visual range maneuver decision of unmanned combat aerial vehicle based on fuzzy reasoning |
CN117313561B (en) | | Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method |
CN113741186B (en) | | Double-aircraft air combat decision-making method based on near-end strategy optimization |
Wang et al. | | Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction |
Duan et al. | | Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization |
Zhu et al. | | Mastering air combat game with deep reinforcement learning |
Kong et al. | | Multi-ucav air combat in short-range maneuver strategy generation using reinforcement learning and curriculum learning |
Wang et al. | | Research on naval air defense intelligent operations on deep reinforcement learning |
CN114330093A (en) | | Multi-platform collaborative intelligent confrontation decision-making method for aviation soldiers based on DQN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |