CN117171984A - Air combat maneuver decision method based on deep reinforcement learning - Google Patents

Air combat maneuver decision method based on deep reinforcement learning

Info

Publication number
CN117171984A
CN117171984A (application number CN202311071553.1A)
Authority
CN
China
Prior art keywords
aircraft
air combat
reinforcement learning
deep reinforcement
maneuver decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311071553.1A
Other languages
Chinese (zh)
Inventor
陈宇哲
李秋妮
宋祺
焦城阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Air Force Engineering University of PLA filed Critical Air Force Engineering University of PLA
Priority to CN202311071553.1A
Publication of CN117171984A
Legal status: Pending

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an air combat maneuver decision method based on deep reinforcement learning, which comprises the following steps: 1. constructing a one-on-one air combat adversarial scenario between aircraft A and aircraft B; 2. constructing the three-dimensional relative situation, reward function and action space of the two aircraft; 3. establishing and training an air combat maneuver decision model based on deep reinforcement learning; 4. making maneuver decisions based on the air combat maneuver decision model. By adding a long short-term memory (LSTM) network layer on top of the PPO algorithm, the invention improves the ability of aircraft A to perceive the temporal characteristics of maneuver sequences, and by randomly perturbing the actions output by aircraft A during air combat according to the noise sampling frequency parameter it improves the ability of aircraft A to find the optimal maneuver decision, effectively solving the problem that traditional reinforcement learning algorithms have insufficient perception of the temporal characteristics of air combat maneuvers.

Description

Air combat maneuver decision method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of air combat decision-making, and particularly relates to an air combat maneuver decision method based on deep reinforcement learning.
Background
As an important military force in modern warfare, an air force must make air combat maneuver decisions whether the platform is a manned fighter or an unmanned aerial vehicle: a manned fighter relies on the pilot as the main decision-maker and on human-in-the-loop control, while an unmanned aerial vehicle relies mainly on intelligent algorithms. Making the air combat process intelligent is the key route to future intelligent air combat, and it runs through the whole observe-judge-decide-act loop of air combat. Intelligent air combat decision-making will greatly change the mode and form of future warfare and have a disruptive influence on its development. Intelligent air combat decision-making simulates the decisions made to control a fighter under various air combat conditions and is the core intelligent module of an intelligent combat aircraft. Because the reaction speed of such an aircraft exceeds that of any human pilot and is not constrained by pilot physiology, it has an advantage in seizing victory and carrying out active attacks. However, implementing intelligent air combat decision-making is very complex: it involves dynamic, real-time factors and a large solution space, which poses a significant challenge.
Air combat can be classified into short-range, medium-range and long-range air combat. Although air weapon technology has made significant progress and the air combat battlefield has extended from close range to medium and long range, close-range air combat cannot be neglected, and the related technology is evolving rapidly. At present, traditional air combat maneuver decision methods mainly include expert systems, influence diagrams, matrix games, differential games and the like; their common shortcomings are complex computation, poor real-time performance and heavy dependence on human expert knowledge.
Disclosure of Invention
Aiming at the above shortcomings of the prior art, the invention provides an air combat maneuver decision method based on deep reinforcement learning. On the basis of the PPO algorithm, a long short-term memory (LSTM) network layer is added to improve the ability of aircraft A to perceive the temporal characteristics of maneuvers, and the actions output by aircraft A in air combat are randomly perturbed according to the noise sampling frequency parameter to improve its ability to find the optimal maneuver decision. The method effectively solves the problem that traditional reinforcement learning algorithms have insufficient perception of the temporal characteristics of air combat maneuvers, and is convenient to popularize and use.
In order to solve the above technical problems, the invention adopts the following technical scheme: an air combat maneuver decision method based on deep reinforcement learning, characterized by comprising the following steps:
step one, constructing a one-on-one air combat adversarial scenario between aircraft A and aircraft B;
step two, constructing the three-dimensional relative situation, reward function and action space of aircraft A and aircraft B;
step three, establishing and training an air combat maneuver decision model based on deep reinforcement learning;
step four, making maneuver decisions based on the air combat maneuver decision model.
The above air combat maneuver decision method based on deep reinforcement learning is characterized in that: in step one, a one-on-one air combat adversarial scenario is constructed on an air combat simulation platform; in the scenario, aircraft A is the aircraft whose maneuvers are to be decided, and aircraft B is controlled by the simulation platform.
The air combat maneuver decision method based on the deep reinforcement learning is characterized by comprising the following steps of: in the second step, three-dimensional relative situation of the two-party aircraft A and B is constructedWherein z is r Altitude, z of the A-square aircraft b Is the altitude of the second aircraft, delta h is the relative altitude of the first and second aircraft, V r Is the velocity vector of the A-square aircraft, V b For the speed of the B-planeVector, deltav is the absolute value difference of the speeds of the first and second aircrafts, d is the distance vector of the first and second aircrafts, and AA and ATA are the two-aircraft disengaging angle and the disengaging angle observed from the first angle respectively;
according to the formulaConstructing a reward function R, wherein ∈>For the angle bonus function R a Normalized results,/-> Awarding a function R for speed v Normalized results,/-> For a height bonus function R h Normalized results,/-> For distance rewarding function R d Normalized results,/->D is the range of the missile on the A-square aircraft, D min Is the minimum value of the range of the missile on the first aircraft, d max For the maximum range of missiles on a square aircraft, < +.>To win or lose the bonus function R end Normalized results,/->k 1 、k 2 、k 3 And k 4 Respectively-> And->Weight coefficient, k of (2) 1 、k 2 、k 3 And k 4 Are all nonnegative numbers and k 1 +k 2 +k 3 +k 4 =1;
An action space [Δψ, ΔV, ΔH] of aircraft A relative to the previous step is constructed, where Δψ is the change in heading angle of aircraft A relative to the previous step, discretized within (-20°, +20°), ΔV is the change in speed of aircraft A relative to the previous step, discretized within (-100 km/h, +100 km/h), and ΔH is the change in altitude of aircraft A relative to the previous step, discretized within (-500 m, +500 m).
The above air combat maneuver decision method based on deep reinforcement learning is characterized in that: in step three, the air combat maneuver decision model based on deep reinforcement learning is established and trained as follows:
step 301, a single-layer LSTM network layer is established and used as the initial layer of the PPO algorithm network, so as to build the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 302, the three-dimensional relative situation of aircraft A and aircraft B is taken as the input parameter, and the action-space result of aircraft A relative to the previous step is output as the action to execute;
wherein the action-space result of aircraft A relative to the previous step, i.e. the new action a_t, is obtained according to the formula a_t = π(S_t; θ_π) + ε(S_t; θ_ε), where π(S_t; θ_π) is the action result given by the LSTM-PPO model for the three-dimensional relative situation S_t of aircraft A and aircraft B under the policy hyper-parameters θ_π, ε(S_t; θ_ε) is the value of the noise function for the three-dimensional relative situation S_t and the noise parameters θ_ε, and the noise parameters θ_ε ~ N(0, σ²) are sampled from the Gaussian distribution N(0, σ²) at the beginning of the episode;
step 303, the reward function value is calculated;
step 304, a noise sampling frequency is set; in any episode, when the set number of time steps has not yet been reached, step 305 is executed, and when the set number of time steps is reached, step 306 is executed;
step 305, the three-dimensional relative situation of aircraft A and aircraft B continues to be taken as the input parameter and the action-space result of aircraft A relative to the previous step as the output action for training, and the reward function value is calculated in the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 306, the noise is resampled and the policy hyper-parameters θ_π of the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning are updated;
step 307, steps 302 to 306 are repeated in a loop until the reward function value is no smaller than the set threshold, thereby completing the training of the air combat maneuver decision model based on deep reinforcement learning.
The above air combat maneuver decision method based on deep reinforcement learning is characterized in that: in step four, a maneuver-decision test is performed on aircraft A using the air combat maneuver decision model based on deep reinforcement learning trained in step 307.
Compared with the prior art, the invention has the following advantages:
(1) The method effectively solves the problem that traditional reinforcement learning algorithms have insufficient perception of the temporal characteristics of air combat maneuvers.
(2) The air combat maneuver decision method based on deep reinforcement learning formed by the method shows good adversarial performance in 1v1 close-range air combat simulation.
(3) The method can output maneuver decisions for single-aircraft air combat and can be trained for different scenarios or different aircraft types using the deep reinforcement learning algorithm.
(4) The method has good compatibility and can be quickly ported to different simulation environments and algorithms.
The technical scheme of the present invention is further described in detail below through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the three-dimensional relative situation of the two aircraft.
FIG. 3 is a schematic diagram of the exploration strategy based on generalized state-dependent exploration according to the present invention.
Fig. 4 shows the combat area set for the 1v1 air combat in the embodiment of the present invention.
FIG. 5 is a comparison curve of episode step length for the method of the present invention versus the conventional PPO algorithm.
FIG. 6 is a comparison curve of reward for the method of the present invention versus the conventional PPO algorithm.
Fig. 7 is a graph of the change in win rate of the air combat aircraft.
Fig. 8 is a first diagram of the maneuver decision trajectory of the air combat aircraft.
Fig. 9 is a second diagram of the maneuver decision trajectory of the air combat aircraft.
Detailed Description
As shown in fig. 1 to 9, the air combat maneuver decision method based on deep reinforcement learning of the present invention comprises the following steps:
step one, constructing a one-on-one air combat adversarial scenario between aircraft A and aircraft B;
step two, constructing the three-dimensional relative situation, reward function and action space of aircraft A and aircraft B;
step three, establishing and training an air combat maneuver decision model based on deep reinforcement learning;
step four, making maneuver decisions based on the air combat maneuver decision model.
In this embodiment, in step one, a one-on-one air combat adversarial scenario is constructed on an air combat simulation platform; in the scenario, aircraft A is the aircraft whose maneuvers are to be decided, and aircraft B is controlled by the simulation platform.
It should be noted that the Mozi wargaming system is used as the air combat simulation platform. The Mozi system can carry out tactical- and campaign-level simulation and deduction, and provides a Python-based AI development kit that supports the development of military intelligent agents. In the simulation environment, a side A and a side B are added and set as hostile to each other, with the awareness level and training level of both sides set to ordinary. An air force base is added for each side, a fighter is added to each base, and each fighter carries 4 short-range air-to-air missiles. The two fighters are positioned so that the distance between them is 98 km and the initial altitude is 10973 m, and the setup is saved as the scenario.
In step two, the three-dimensional relative situation of aircraft A and aircraft B is constructed as S = [z_r, z_b, Δh, V_r, V_b, Δv, d, AA, ATA], where z_r is the altitude of aircraft A, z_b is the altitude of aircraft B, Δh is the relative altitude of the two aircraft, V_r is the velocity vector of aircraft A, V_b is the velocity vector of aircraft B, Δv is the absolute difference of the two aircraft's speeds, d is the distance vector between the two aircraft, and AA and ATA are respectively the aspect angle and the antenna train angle between the two aircraft as observed from aircraft A;
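For illustration only (not part of the claimed method), the following Python sketch shows one way such a relative-situation vector could be assembled from the raw states of the two aircraft; the input fields and the angle computations are assumptions, since the patent does not give the exact formulas for AA and ATA, and scalar speeds and the line-of-sight distance are used in place of the full vectors for compactness.

```python
import numpy as np

def relative_situation(pos_r, vel_r, pos_b, vel_b):
    """Build an illustrative relative-situation vector
    [z_r, z_b, dh, |V_r|, |V_b|, dv, |d|, AA, ATA].

    pos_* and vel_* are 3-D position (x, y, z in metres) and velocity arrays.
    AA and ATA are computed here as the angles between each velocity vector
    and the line of sight from A to B (an assumed convention).
    """
    z_r, z_b = pos_r[2], pos_b[2]
    dh = z_r - z_b
    v_r, v_b = np.linalg.norm(vel_r), np.linalg.norm(vel_b)
    dv = abs(v_r - v_b)
    los = pos_b - pos_r                      # line-of-sight vector from A to B
    dist = np.linalg.norm(los)

    def angle_deg(u, w):
        cos = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w) + 1e-9)
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    ata = angle_deg(vel_r, los)              # angle off A's nose to the target
    aa = angle_deg(vel_b, los)               # angle between B's velocity and the line of sight
    return np.array([z_r, z_b, dh, v_r, v_b, dv, dist, aa, ata])
```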
The reward function R is constructed according to the formula R = k_1·R_a' + k_2·R_v' + k_3·R_h' + k_4·R_d' + R_end', where R_a' is the normalized result of the angle reward function R_a, R_v' is the normalized result of the speed reward function R_v, R_h' is the normalized result of the altitude reward function R_h, R_d' is the normalized result of the distance reward function R_d, in which D is the range of the missiles carried by aircraft A and d_min and d_max are the minimum and maximum values of that range, R_end' is the normalized result of the win/lose reward function R_end, and k_1, k_2, k_3 and k_4 are respectively the weight coefficients of R_a', R_v', R_h' and R_d'; k_1, k_2, k_3 and k_4 are all non-negative and k_1 + k_2 + k_3 + k_4 = 1;
An action space [Δψ, ΔV, ΔH] of aircraft A relative to the previous step is constructed, where Δψ is the change in heading angle of aircraft A relative to the previous step, discretized within (-20°, +20°), ΔV is the change in speed of aircraft A relative to the previous step, discretized within (-100 km/h, +100 km/h), and ΔH is the change in altitude of aircraft A relative to the previous step, discretized within (-500 m, +500 m).
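As a minimal sketch of how such an incremental action could be applied to the command state of aircraft A (the state fields and the speed/altitude floors below are illustrative assumptions, only the ±20°, ±100 km/h and ±500 m bounds come from the description):

```python
from dataclasses import dataclass

@dataclass
class AircraftCommand:
    heading_deg: float   # current commanded heading (0-360 degrees)
    speed_kmh: float     # current commanded speed (km/h)
    altitude_m: float    # current commanded altitude (m)

def apply_action(cmd: AircraftCommand, d_psi: float, d_v: float, d_h: float) -> AircraftCommand:
    """Apply an incremental action [d_psi, d_v, d_h] to the previous command,
    clipping each increment to the action-space bounds described above."""
    d_psi = max(-20.0, min(20.0, d_psi))
    d_v = max(-100.0, min(100.0, d_v))
    d_h = max(-500.0, min(500.0, d_h))
    return AircraftCommand(
        heading_deg=(cmd.heading_deg + d_psi) % 360.0,
        speed_kmh=max(200.0, cmd.speed_kmh + d_v),      # illustrative lower bound
        altitude_m=max(100.0, cmd.altitude_m + d_h),    # illustrative lower bound
    )
```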
In this embodiment, k_1, k_2, k_3 and k_4 are set to 0.5, 0.2, 0.2 and 0.1 respectively.
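The following Python sketch illustrates how the weighted reward could be assembled; the individual terms are passed in already normalized, since the patent does not spell out each normalization, and adding the win/lose term unweighted is an assumption consistent with only four weights being defined.

```python
def total_reward(r_angle, r_speed, r_height, r_dist, r_end,
                 k=(0.5, 0.2, 0.2, 0.1)):
    """Combine normalized reward terms into the scalar reward R.

    r_angle, r_speed, r_height, r_dist, r_end are the normalized values of
    R_a, R_v, R_h, R_d and R_end; k holds the weights k_1..k_4, which must be
    non-negative and sum to 1 (the default uses the embodiment values above).
    """
    k1, k2, k3, k4 = k
    assert all(ki >= 0 for ki in k) and abs(sum(k) - 1.0) < 1e-9
    return k1 * r_angle + k2 * r_speed + k3 * r_height + k4 * r_dist + r_end
```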
In this embodiment, in step three, the air combat maneuver decision model based on deep reinforcement learning is established and trained as follows:
step 301, a single-layer LSTM network layer is established and used as the initial layer of the PPO algorithm network, so as to build the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 302, the three-dimensional relative situation of aircraft A and aircraft B is taken as the input parameter, and the action-space result of aircraft A relative to the previous step is output as the action to execute;
wherein the action-space result of aircraft A relative to the previous step, i.e. the new action a_t, is obtained according to the formula a_t = π(S_t; θ_π) + ε(S_t; θ_ε), where π(S_t; θ_π) is the action result given by the LSTM-PPO model for the three-dimensional relative situation S_t of aircraft A and aircraft B under the policy hyper-parameters θ_π, ε(S_t; θ_ε) is the value of the noise function for the three-dimensional relative situation S_t and the noise parameters θ_ε, and the noise parameters θ_ε ~ N(0, σ²) are sampled from the Gaussian distribution N(0, σ²) at the beginning of the episode;
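A minimal sketch of the action perturbation in step 302 is given below, assuming the noise function is a simple linear mapping of the state; this follows the generalized state-dependent exploration (gSDE) idea, in which the noise weights θ_ε are drawn from N(0, σ²) and kept fixed until they are resampled, so exploration is a deterministic function of the state between resampling points. The σ value and dimensions are assumptions.

```python
import numpy as np

class StateDependentNoise:
    """Noise epsilon(S_t; theta_eps) with theta_eps ~ N(0, sigma^2), resampled only periodically."""

    def __init__(self, state_dim, action_dim, sigma=0.1, rng=None):
        self.sigma = sigma
        self.rng = rng or np.random.default_rng()
        self.shape = (state_dim, action_dim)
        self.resample()

    def resample(self):
        # theta_eps is drawn once and reused, so the same state always gets the
        # same perturbation until the next resampling point.
        self.theta_eps = self.rng.normal(0.0, self.sigma, size=self.shape)

    def __call__(self, state):
        return np.asarray(state) @ self.theta_eps

def act(policy, noise, state):
    """a_t = pi(S_t; theta_pi) + epsilon(S_t; theta_eps): policy output plus state-dependent noise."""
    return policy(state) + noise(state)
```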
step 303, the reward function value is calculated;
step 304, a noise sampling frequency is set; in any episode, when the set number of time steps has not yet been reached, step 305 is executed, and when the set number of time steps is reached, step 306 is executed;
step 305, the three-dimensional relative situation of aircraft A and aircraft B continues to be taken as the input parameter and the action-space result of aircraft A relative to the previous step as the output action for training, and the reward function value is calculated in the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 306, the noise is resampled and the policy hyper-parameters θ_π of the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning are updated;
step 307, steps 302 to 306 are repeated in a loop until the reward function value is no smaller than the set threshold, thereby completing the training of the air combat maneuver decision model based on deep reinforcement learning.
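Steps 302 to 307 can be summarized by the following training-loop sketch; the environment interface (reset/step), the model's policy and update calls and the reward threshold are assumptions standing in for the Mozi platform and the LSTM-PPO implementation, not the actual API.

```python
def train(env, model, noise, resample_every=2, reward_threshold=0.0, max_episodes=1000):
    """Loop steps 302-306 until the episode reward reaches the threshold (step 307)."""
    for episode in range(max_episodes):
        state = env.reset()
        noise.resample()                                 # theta_eps drawn at the start of the episode
        episode_reward, done, step = 0.0, False, 0
        while not done:
            action = model.policy(state) + noise(state)  # step 302: perturbed policy action
            state, reward, done, _ = env.step(action)    # step 303: reward from the environment
            episode_reward += reward
            step += 1
            if step % resample_every == 0:               # steps 304/306: sampling frequency reached
                noise.resample()
                model.update()                           # update policy hyper-parameters theta_pi
        if episode_reward >= reward_threshold:           # step 307 stopping criterion
            break
    return model
```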
It should be noted that a single-layer LSTM network layer is established, and the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning comprises a value network and a policy network; the input dimension of the networks is 5, the output dimension of the value network is 1, and the output dimension of the policy network is 3. The number of hidden-layer nodes and the activation function are set in the model, with the ReLU function selected as the activation function, and the gSDE (generalized state-dependent exploration) policy-exploration method is added so that the actions output by the aircraft in air combat are randomly perturbed according to the sampling frequency parameter, improving the aircraft's ability to find the optimal maneuver decision. The sampling frequency is set to 2 steps, i.e. in any episode, every time aircraft A executes 2 steps the noise is resampled and the policy hyper-parameters of the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning are updated.
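A minimal PyTorch sketch of such an LSTM-PPO network structure (an LSTM initial layer feeding separate policy and value heads) is given below; the hidden size and exact head layout are assumptions, and only the input dimension of 5 and the output dimensions of 3 and 1 come from the description above.

```python
import torch.nn as nn

class LSTMPPONet(nn.Module):
    """LSTM initial layer shared by a policy head (dim 3) and a value head (dim 1)."""

    def __init__(self, state_dim=5, action_dim=3, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, num_layers=1, batch_first=True)
        self.policy_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, action_dim)
        )
        self.value_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, states, hidden=None):
        # states: (batch, seq_len, state_dim) sequence of relative-situation vectors
        features, hidden = self.lstm(states, hidden)
        last = features[:, -1, :]             # features of the most recent time step
        return self.policy_head(last), self.value_head(last), hidden
```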
The decision interval is set to 5 seconds, the maximum episode length is set to 100 steps, and the termination and win/lose conditions of each episode are as follows (an illustrative check covering these conditions is sketched after the list):
(1) Aircraft B is shot down and aircraft A survives: aircraft A is judged the winner and the episode ends.
(2) Aircraft A is shot down and aircraft B survives: aircraft B is judged the winner and the episode ends.
(3) Both aircraft are shot down at the same time: a draw is judged and the episode ends.
(4) Aircraft A is more than 200 km from its base: the aircraft is considered unable to return and the episode ends.
(5) As shown in fig. 4, the two bases are the diagonal vertices of a rectangle (dashed box), and the combat zone is this rectangle expanded outward by 5 km (solid box); an aircraft that flies beyond the solid box is considered to have left the combat zone, and the episode ends.
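The sketch below evaluates the five termination conditions in code form; the argument names (alive flags, distance to base, in-zone flags) are illustrative assumptions about how the simulation state would be exposed.

```python
from enum import Enum

class Outcome(Enum):
    ONGOING = 0
    A_WINS = 1
    B_WINS = 2
    DRAW = 3
    TERMINATED = 4   # out of area / cannot return / step limit reached

def episode_outcome(a_alive, b_alive, a_dist_to_base_km, a_in_zone, b_in_zone,
                    step, max_steps=100):
    """Evaluate the termination and win/lose conditions listed above."""
    if not a_alive and not b_alive:
        return Outcome.DRAW                  # condition (3): both shot down
    if not b_alive:
        return Outcome.A_WINS                # condition (1): B shot down, A survives
    if not a_alive:
        return Outcome.B_WINS                # condition (2): A shot down, B survives
    if a_dist_to_base_km > 200:
        return Outcome.TERMINATED            # condition (4): A cannot return to base
    if not a_in_zone or not b_in_zone:
        return Outcome.TERMINATED            # condition (5): an aircraft left the combat zone
    if step >= max_steps:
        return Outcome.TERMINATED            # maximum episode length reached
    return Outcome.ONGOING
```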
The number of training steps is set to 50000 and the number of test episodes to 100, and the hyper-parameters of the LSTM-PPO algorithm are set for training as follows:
LSTM-PPO algorithm hyper-parameters:
In this embodiment, in step four, a maneuver-decision test is performed on aircraft A using the air combat maneuver decision model based on deep reinforcement learning trained in step 307.
After 50000 training steps, the training results are analysed. Fig. 5 shows the episode-step-length curve during training: the average episode length rises gradually from about 25 steps at the beginning and stabilizes at about 35 steps. The increase in step length indicates that aircraft A performs more actions per episode on average in the later stage of training, and that its strategy has become more complex and more likely to produce a better maneuvering method. Fig. 6 shows the reward curve of aircraft A: the reward obtained by aircraft A rises gradually from an initial negative value and finally remains stable within a certain range, indicating that the deep reinforcement learning algorithm has converged and learned a better strategy. Fig. 7 shows the change in the win rate of the air combat aircraft, which increases gradually from 0 to about 70%, demonstrating good combat performance.
In one episode, as shown in fig. 8 and fig. 9, the trajectories of both fighters can be seen: facing the oncoming enemy aircraft, aircraft A adopts a circling maneuver or a lateral cut, quickly forms an attack posture behind and to the side of the enemy aircraft, occupies the dominant position in the engagement, establishes missile-launch conditions first, and obtains an air combat situation more favorable to side A.
The invention can output maneuver decisions for single-aircraft air combat and can be trained for different scenarios or different aircraft types using the deep reinforcement learning algorithm; the air combat maneuver decision method based on deep reinforcement learning enables aircraft A to actively seize the dominant position and shows good adversarial performance; the LSTM-PPO deep reinforcement learning algorithm effectively solves the problem that traditional reinforcement learning algorithms have insufficient perception of the temporal characteristics of air combat maneuvers in one-on-one maneuver decision-making; the method has good compatibility and can be quickly ported to other simulation environments and algorithms.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent structural changes made to the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (5)

1. An air combat maneuver decision method based on deep reinforcement learning, characterized by comprising the following steps:
step one, constructing a one-on-one air combat adversarial scenario between aircraft A and aircraft B;
step two, constructing the three-dimensional relative situation, reward function and action space of aircraft A and aircraft B;
step three, establishing and training an air combat maneuver decision model based on deep reinforcement learning;
step four, making maneuver decisions based on the air combat maneuver decision model.
2. The air combat maneuver decision method based on deep reinforcement learning according to claim 1, characterized in that: in step one, the one-on-one air combat adversarial scenario is constructed on an air combat simulation platform; in the scenario, aircraft A is the aircraft whose maneuvers are to be decided, and aircraft B is controlled by the simulation platform.
3. The air combat maneuver decision method based on deep reinforcement learning according to claim 2, characterized in that: in step two, the three-dimensional relative situation of aircraft A and aircraft B is constructed as S = [z_r, z_b, Δh, V_r, V_b, Δv, d, AA, ATA], where z_r is the altitude of aircraft A, z_b is the altitude of aircraft B, Δh is the relative altitude of the two aircraft, V_r is the velocity vector of aircraft A, V_b is the velocity vector of aircraft B, Δv is the absolute difference of the two aircraft's speeds, d is the distance vector between the two aircraft, and AA and ATA are respectively the aspect angle and the antenna train angle between the two aircraft as observed from aircraft A;
the reward function R is constructed according to the formula R = k_1·R_a' + k_2·R_v' + k_3·R_h' + k_4·R_d' + R_end', where R_a' is the normalized result of the angle reward function R_a, R_v' is the normalized result of the speed reward function R_v, R_h' is the normalized result of the altitude reward function R_h, R_d' is the normalized result of the distance reward function R_d, in which D is the range of the missiles carried by aircraft A and d_min and d_max are the minimum and maximum values of that range, R_end' is the normalized result of the win/lose reward function R_end, and k_1, k_2, k_3 and k_4 are respectively the weight coefficients of R_a', R_v', R_h' and R_d'; k_1, k_2, k_3 and k_4 are all non-negative and k_1 + k_2 + k_3 + k_4 = 1;
an action space [Δψ, ΔV, ΔH] of aircraft A relative to the previous step is constructed, where Δψ is the change in heading angle of aircraft A relative to the previous step, discretized within (-20°, +20°), ΔV is the change in speed of aircraft A relative to the previous step, discretized within (-100 km/h, +100 km/h), and ΔH is the change in altitude of aircraft A relative to the previous step, discretized within (-500 m, +500 m).
4. The air combat maneuver decision method based on deep reinforcement learning according to claim 3, characterized in that: in step three, the air combat maneuver decision model based on deep reinforcement learning is established and trained as follows:
step 301, a single-layer LSTM network layer is established and used as the initial layer of the PPO algorithm network, so as to build the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 302, the three-dimensional relative situation of aircraft A and aircraft B is taken as the input parameter, and the action-space result of aircraft A relative to the previous step is output as the action to execute;
wherein the action-space result of aircraft A relative to the previous step, i.e. the new action a_t, is obtained according to the formula a_t = π(S_t; θ_π) + ε(S_t; θ_ε), where π(S_t; θ_π) is the action result given by the LSTM-PPO model for the three-dimensional relative situation S_t of aircraft A and aircraft B under the policy hyper-parameters θ_π, ε(S_t; θ_ε) is the value of the noise function for the three-dimensional relative situation S_t and the noise parameters θ_ε, and the noise parameters θ_ε ~ N(0, σ²) are sampled from the Gaussian distribution N(0, σ²) at the beginning of the episode;
step 303, the reward function value is calculated;
step 304, a noise sampling frequency is set; in any episode, when the set number of time steps has not yet been reached, step 305 is executed, and when the set number of time steps is reached, step 306 is executed;
step 305, the three-dimensional relative situation of aircraft A and aircraft B continues to be taken as the input parameter and the action-space result of aircraft A relative to the previous step as the output action for training, and the reward function value is calculated in the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning;
step 306, the noise is resampled and the policy hyper-parameters θ_π of the air combat maneuver decision model LSTM-PPO based on deep reinforcement learning are updated;
step 307, steps 302 to 306 are repeated in a loop until the reward function value is no smaller than the set threshold, thereby completing the training of the air combat maneuver decision model based on deep reinforcement learning.
5. The air combat maneuver decision method based on deep reinforcement learning according to claim 4, characterized in that: in step four, a maneuver-decision test is performed on aircraft A using the air combat maneuver decision model based on deep reinforcement learning trained in step 307.
CN202311071553.1A 2023-08-24 2023-08-24 Air combat maneuver decision method based on deep reinforcement learning Pending CN117171984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311071553.1A CN117171984A (en) 2023-08-24 2023-08-24 Air combat maneuver decision method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311071553.1A CN117171984A (en) 2023-08-24 2023-08-24 Air combat maneuver decision method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN117171984A true CN117171984A (en) 2023-12-05

Family

ID=88935865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311071553.1A Pending CN117171984A (en) 2023-08-24 2023-08-24 Air combat maneuver decision method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117171984A (en)

Similar Documents

Publication Publication Date Title
CN113095481B (en) Air combat maneuver method based on parallel self-game
CN113791634B (en) Multi-agent reinforcement learning-based multi-machine air combat decision method
CN108629422A (en) A kind of intelligent body learning method of knowledge based guidance-tactics perception
CN113893539B (en) Cooperative fighting method and device for intelligent agent
CN113221444B (en) Behavior simulation training method for air intelligent game
CN113282061A (en) Unmanned aerial vehicle air game countermeasure solving method based on course learning
CN113052289B (en) Method for selecting cluster hitting position of unmanned ship based on game theory
CN110673488A (en) Double DQN unmanned aerial vehicle concealed access method based on priority random sampling strategy
CN116127848A (en) Multi-unmanned aerial vehicle collaborative tracking method based on deep reinforcement learning
CN114638339A (en) Intelligent agent task allocation method based on deep reinforcement learning
CN116187777A (en) Unmanned aerial vehicle air combat autonomous decision-making method based on SAC algorithm and alliance training
CN113962012A (en) Unmanned aerial vehicle countermeasure strategy optimization method and device
CN113222106A (en) Intelligent military chess deduction method based on distributed reinforcement learning
CN112306070A (en) Multi-AUV dynamic maneuver decision method based on interval information game
CN113282100A (en) Unmanned aerial vehicle confrontation game training control method based on reinforcement learning
CN116700079A (en) Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP
Kong et al. Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat
CN117171984A (en) Air combat maneuver decision method based on deep reinforcement learning
CN113741186A (en) Double-machine air combat decision method based on near-end strategy optimization
CN113705828B (en) Battlefield game strategy reinforcement learning training method based on cluster influence degree
CN116432030A (en) Air combat multi-intention strategy autonomous generation method based on deep reinforcement learning
CN116520884A (en) Unmanned plane cluster countermeasure strategy optimization method based on hierarchical reinforcement learning
CN116415646A (en) Course-based reinforcement learning single-machine air combat decision-making method
CN116360500A (en) Missile burst prevention method capable of getting rid of controllable distance
Kong et al. Multi-ucav air combat in short-range maneuver strategy generation using reinforcement learning and curriculum learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination