CN114035602A - Airplane maneuvering control method based on layered reinforcement learning

Airplane maneuvering control method based on layered reinforcement learning

Info

Publication number
CN114035602A
CN114035602A (application CN202110904677.8A)
Authority
CN
China
Prior art keywords
missile
command
probability
intelligent agent
embedding vector
Prior art date
Legal status
Pending
Application number
CN202110904677.8A
Other languages
Chinese (zh)
Inventor
杨晟琦
朴海音
孙智孝
彭宣淇
韩玥
樊松源
孙阳
于津
田明俊
金琳乘
Current Assignee
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Original Assignee
Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority date
Filing date
Publication date
Application filed by Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority to CN202110904677.8A
Publication of CN114035602A

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/107: Simultaneous control of position or course in three dimensions specially adapted for missiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)

Abstract

The application relates to the technical field of flight control, and in particular to an airplane maneuvering control method based on layered reinforcement learning. The method comprises the following steps: step S1, obtaining an action embedding vector of the agent computed by a neural network; step S2, outputting a horizontal-angle probability list, a vertical-angle probability list and a shooting probability list from the action embedding vector; and step S3, sampling from these probability lists to determine the agent's horizontal control mode, vertical control mode and whether to shoot, and controlling the agent accordingly. By combining horizontal tactical maneuver intentions with vertical three-dimensional maneuver intentions, the application can generate a large number of more diverse and flexible maneuver patterns.

Description

Airplane maneuvering control method based on layered reinforcement learning
Technical Field
The application relates to the technical field of flight control, in particular to an airplane maneuvering control method based on layered reinforcement learning.
Background
Beyond-visual-range air combat is the main form of modern air combat. It typically unfolds as the fighter selecting targets for detection and tracking, launching and guiding missiles, and evading missiles launched by the opponent. This process involves a large number of maneuver decisions: how to maneuver into a favorable attack position, and how to maneuver evasively so as to avoid being hit by enemy missiles, are key problems in beyond-visual-range air combat. In recent years, how to make an unmanned agent produce tactical decision-making behavior comparable to that of a human pilot has become a hot topic in research on unmanned autonomous air combat. Existing AI air combat methods mainly comprise rule-based expert system methods, hybrid methods combining probability models/fuzzy logic with computational intelligence, and machine learning and deep reinforcement learning methods. These methods each perform well in different respects. Rule-based expert system methods depend entirely on an air combat rule database defined in advance by human pilots; everything must be designed beforehand, there is no capacity for self-evolution, and the agent's behavior has an obvious upper limit. Hybrid methods based on probability models/fuzzy logic and computational intelligence require experts to construct a probabilistic reasoning network or to design heuristic objective functions, cannot cover all air combat states, and are very complex and difficult to design. Machine learning methods rely heavily on large amounts of real air combat data, which are often scarce or even unavailable, and tend to limit the agent's performance to the range of capabilities that the data can provide. Deep reinforcement learning methods automatically generate air combat tactics through self-play reinforcement learning training without supervision by human knowledge, but the resulting maneuver style is fixed and greatly lacks the diversity and flexibility of a human pilot.
Disclosure of Invention
In order to solve the above problems, the application provides an airplane maneuvering control method based on layered reinforcement learning, in which the maneuver of an aircraft is decomposed into a horizontal-dimension maneuver angle and a vertical-dimension maneuver angle; by combining horizontal tactical maneuver intentions with vertical three-dimensional maneuver intentions, a large number of more diverse and flexible maneuver patterns can be generated.
The application relates to an airplane maneuvering control method based on layered reinforcement learning, used for the maneuver control of two teams of aircraft agents during a game process. The method comprises the following steps:
step S1, obtaining an action embedding vector of the agent computed by a neural network;
step S2, outputting a horizontal-angle probability list, a vertical-angle probability list and a shooting probability list from the action embedding vector, wherein the horizontal-angle probability list comprises a plurality of probability values corresponding to a plurality of preset horizontal control commands, the vertical-angle probability list comprises a plurality of probability values corresponding to a plurality of preset vertical control commands, and the shooting probability list comprises two probability values corresponding to shooting and not shooting; and
step S3, sampling from the probability lists, determining the agent's horizontal control mode, vertical control mode and whether to shoot, and controlling the agent accordingly.
Preferably, step S1 further includes:
step S11, acquiring the overall air combat state and dividing it into the absolute state quantity of the current agent, which characterizes the agent's own attributes, the relative state quantities between the current agent and other agents, the missile state quantities of teammate agents, and the missile state quantities of opponent agents;
step S12, determining a global embedding vector from the overall air combat state, a relative observation embedding vector from the relative state quantities, a friendly missile embedding vector from the missile state quantities of the teammate agents, and an enemy missile embedding vector from the missile state quantities of the opponent agents; and
step S13, forming the action embedding vector by concatenating the global embedding vector, the relative observation embedding vector, the friendly missile embedding vector and the enemy missile embedding vector.
Preferably, in step S11, the absolute state quantities of the current agent include true airspeed, current altitude, climb rate, three-axis attitude angles, normal overload, fire-control radar lock signal, electronic warning device warning state, and number of remaining air-to-air missiles; the relative state quantities include relative distance, closing rate, relative altitude difference, target entry angle, own beam angle and attack zone information; the missile state quantities include missile speed, current altitude, missile-target distance, missile-target closing rate, remaining hit time, and the entry angle and beam angle between missile and target; the missile state quantities of the opponent agents include only the missile state quantities that threaten the current agent.
Preferably, in step S2, there are six preset horizontal control commands: a hold command that holds the current heading, an attack command that points at the target, an attack command offset by ±30°, an attack command offset by ±50°, a defense command offset by ±90°, and a defense command offset by ±180°.
Preferably, in step S2, there are six preset vertical control commands: a hold command that holds the current heading, an attack command that points at the target, a +30° climb attack command, a +60° climb attack command, a -30° dive defense command, and a -60° dive defense command.
Preferably, after step S3, the method further comprises:
step S4, determining a command speed probability list and a command overload probability list from the sampling results corresponding to the horizontal control mode and the vertical control mode together with the action embedding vector, and sampling from the command speed probability list and the command overload probability list respectively to obtain the command speed and the command overload of the agent.
Preferably, the command speed probability list includes a plurality of probabilities corresponding to a plurality of speed values, the command overload probability list includes a plurality of probabilities corresponding to a plurality of overload values, and the speed values and overload values are obtained by discretizing the command speed and the command overload.
By combining horizontal tactical maneuver intentions with vertical three-dimensional maneuver intentions, the application can generate a large number of more diverse and flexible maneuver patterns.
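By way of illustration only, the following Python sketch shows how the sampling of steps S3 and S4 can be carried out once the probability lists are available. The function name, the use of NumPy and the argument layout are assumptions made for this sketch, not details specified by the application.

```python
import numpy as np

def sample_decision(p_hor, p_ver, p_shoot, p_speed, p_overload, rng=None):
    """Draw one index from each probability list (steps S3 and S4).

    The returned indices select the preset horizontal control command,
    the preset vertical control command, the shoot flag (0 = no shot,
    1 = shoot), the discretized command speed and the discretized
    command overload, respectively."""
    rng = rng or np.random.default_rng()
    return (rng.choice(len(p_hor), p=p_hor),
            rng.choice(len(p_ver), p=p_ver),
            rng.choice(len(p_shoot), p=p_shoot),
            rng.choice(len(p_speed), p=p_speed),
            rng.choice(len(p_overload), p=p_overload))
```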
Drawings
FIG. 1 is a flow chart of an airplane maneuver control method based on hierarchical reinforcement learning according to the present application.
Detailed Description
In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are some, but not all, of the embodiments of the present application. The embodiments described below with reference to the drawings are illustrative, are intended to explain the present application, and should not be construed as limiting it. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present application. Embodiments of the present application are described in detail below with reference to the drawings.
The application provides an airplane maneuvering control method based on layered reinforcement learning. The way in which micro-level maneuver actions, viewed along several dimensions, combine to shape the aircraft's overall macro-level tactical intent is referred to here as maneuver semantics. This maneuver-semantic information is used to improve the agent's policy network, so that the agent can exhibit more flexible tactical behavior. The method comprises the following procedures:
a) extracting air combat features as the input of the neural network;
b) designing a semantic maneuver controller as the decision output of the neural network;
c) constructing a hierarchical network for the forward propagation of the neural network.
The specific example steps are as follows:
1) Extracting air combat features.
The air combat state S is divided into the absolute state quantity of the current agent, the relative state quantities between the current agent and other agents, the missile state quantities of teammate agents, and the missile state quantities of opponent agents (denoted here as $s^{\text{self}}$, $s^{\text{rel}}$, $s^{\text{am}}$ and $s^{\text{om}}$, respectively).
The absolute state quantity $s^{\text{self}}$ consists of the following elements: true airspeed $tas$, current altitude $h$, climb rate $\dot h$, three-axis attitude angles $\psi$, $\theta$, $\phi$, normal overload $n_n$, fire-control radar lock signal $lo$, electronic warning device warning state $wan$, and number of remaining air-to-air missiles $m_{\text{left}}$. The absolute state characterizes the attributes of the agent itself.
The relative state quantity $s^{\text{rel}}$ consists of the following elements: relative distance $r$, closing rate $\dot r$, relative altitude difference $\Delta h$, target entry angle $AA$ (the angle between the target's velocity vector and the line-of-sight vector), own beam angle $BA$ (the angle between the own aircraft's nose direction and the line-of-sight vector), and attack zone information $DLZ$. The relative state characterizes the situational relation between the current agent and a target. Together, the absolute and relative states provide the agent with information about the whole battlefield and supply feature information for the agent's attack decisions and cooperation with teammates.
A missile state quantity consists of the following elements: missile speed $v_m$, current altitude $h_m$, missile-target distance $r_m$, missile-target closing rate $\dot r_m$, remaining hit time $T_{go}$, and the entry angle $AA_m$ and beam angle $BA_m$ between missile and target. $s^{\text{am}}$ is composed of the missile state quantities of all teammate agents and provides feature information for the agent's cooperative guidance decisions. $s^{\text{om}}$ differs in that it contains only the missile state quantities that threaten the current agent, and provides feature information for the agent's defensive decisions.
For a given agent, the overall air combat state can therefore be written as:
$s^{\text{self}} = [\,tas,\ h,\ \dot h,\ \psi,\ \theta,\ \phi,\ n_n,\ lo,\ wan,\ m_{\text{left}}\,]$
$s^{\text{rel}} = [\,r,\ \dot r,\ \Delta h,\ AA,\ BA,\ DLZ\,]$
$s^{m} = [\,v_m,\ h_m,\ r_m,\ \dot r_m,\ T_{go},\ AA_m,\ BA_m\,]$
$S = \{\, s^{\text{self}},\ s^{\text{rel}},\ s^{\text{am}},\ s^{\text{om}} \,\}$
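As a purely illustrative data layout for the four observation groups described above, the following sketch packs each group into a flat NumPy feature vector; the function names, field ordering and use of NumPy are assumptions for the sketch rather than the application's actual encoding.

```python
import numpy as np

def own_state(tas, h, h_dot, psi, theta, phi, n_n, lock, warn, m_left):
    """Absolute state of the current agent: true airspeed, altitude, climb
    rate, three-axis attitude angles, normal overload, fire-control radar
    lock signal, warning-device state, remaining air-to-air missiles."""
    return np.array([tas, h, h_dot, psi, theta, phi, n_n, lock, warn, m_left],
                    dtype=np.float32)

def relative_state(r, r_dot, dh, aa, ba, dlz):
    """Relative state w.r.t. one other agent: distance, closing rate,
    altitude difference, target entry angle AA, own beam angle BA, DLZ."""
    return np.array([r, r_dot, dh, aa, ba, dlz], dtype=np.float32)

def missile_state(v_m, h_m, r_m, r_m_dot, t_go, aa_m, ba_m):
    """State of one missile: speed, altitude, missile-target distance,
    closing rate, remaining hit time, entry angle and beam angle."""
    return np.array([v_m, h_m, r_m, r_m_dot, t_go, aa_m, ba_m],
                    dtype=np.float32)

# The overall air combat state S collects the four groups: the own state,
# the relative observations, the teammate missile states and the
# threatening opponent missile states.
```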
(2) Designing a semantic maneuver controller.
The application decomposes the maneuver of a fighter into several micro-level maneuvers along the horizontal and vertical dimensions, see Table 1. The horizontal-dimension maneuver takes the line of sight as reference and uses the angle between the aircraft's velocity direction and the line of sight as the control command: at 0° the aircraft flies toward the target, and the sign of the other angles is resolved toward the fastest turning direction. Angles of 0°, ±30° and ±50° mean that the agent is currently using an attack strategy, which usually occurs when it must close on a target in order to launch a missile, or when it flies a biased maneuver after launch to keep guiding the missile; these carry attack semantics. The ±90° and ±180° cases correspond to defensive maneuvers that exploit the Doppler notch of the opponent's fire-control radar and to end-game turn-away escape maneuvers; these carry defense semantics. The vertical-dimension maneuver takes the horizontal plane as reference and uses the angle between the aircraft's velocity direction and the horizontal as the control command: positive angles are climbs and negative angles are dives. The purpose of a climbing maneuver is to enter a region of thinner air so as to reduce the energy loss of a launched missile, carrying attack semantics. A diving maneuver takes the aircraft into denser air, increases the energy consumption of an incoming missile and improves survival probability, carrying defense semantics. See Table 1 for details.
TABLE 1 Semantic maneuver list (reconstructed from the command definitions above)

No.  Horizontal maneuver (angle to line of sight)  Semantics  Vertical maneuver (angle to horizontal)  Semantics
1    hold current heading                          -          hold current heading                     -
2    attack, point at target (0°)                  attack     attack, point at target                  attack
3    biased attack, ±30°                           attack     climb attack, +30°                       attack
4    biased attack, ±50°                           attack     climb attack, +60°                       attack
5    defense, ±90°                                 defense    dive defense, -30°                       defense
6    defense, ±180°                                defense    dive defense, -60°                       defense
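A minimal sketch of how the semantic maneuver commands of Table 1 could be stored as lookup tables is given below; the numeric indexing follows the order used in the worked example later in this description, and the dictionary layout is an illustrative assumption.

```python
# Semantic maneuver lookup tables (indexing follows the worked example).
# Horizontal commands: angle between the aircraft's velocity vector and the
# line of sight; the sign of nonzero angles follows the fastest turn direction.
HORIZONTAL_MANEUVERS = {
    1: ("hold current heading", None),
    2: ("attack, point at target", 0.0),
    3: ("biased attack", 30.0),     # +/-30 deg
    4: ("biased attack", 50.0),     # +/-50 deg
    5: ("defense", 90.0),           # +/-90 deg, radar-notch maneuver
    6: ("defense", 180.0),          # +/-180 deg, end-game escape
}

# Vertical commands: angle between the velocity vector and the horizontal;
# positive angles are climbs, negative angles are dives.
VERTICAL_MANEUVERS = {
    1: ("hold current heading", None),
    2: ("attack, point at target", None),  # pitch toward the target
    3: ("climb attack", +30.0),
    4: ("climb attack", +60.0),
    5: ("dive defense", -30.0),
    6: ("dive defense", -60.0),
}
```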
(3) Constructing a hierarchical network.
As described above, the application characterizes air combat with four types of observations: the absolute observation, the relative observation, the friendly-missile observation and the enemy-missile observation. Together, these four observation types form the global observation state of the air combat and contain almost all of the battlefield information, so estimating the state value of each time step from the global observation makes the value estimate more accurate. The global observation is fed into the network $f_{O2E}$ to generate the corresponding global embedding vector $e_g$, and the state value of the current time step is then output through the network $f_{E2V}$, as shown on the left side of FIG. 1:
$e_g = f_{O2E}(S)$
$V(S) = f_{E2V}(e_g)$
On the other hand, as noted above, the relative observation, the friendly-missile observation and the enemy-missile observation each carry their own state semantics, which are closely related to tactical air combat decisions, so the application makes full use of this information when extracting the policy. The three observations are each passed through their own $f_{O2E}$ network to generate the corresponding embedding vectors: the relative observation embedding vector $e_r$, the friendly-missile embedding vector $e_{am}$ and the enemy-missile embedding vector $e_{om}$, as shown at the bottom right of FIG. 1. They are concatenated with the global embedding vector $e_g$ to form the comprehensive action embedding vector, i.e. the hidden state vector $e_{tot}$ shown in FIG. 1. The hidden state vector integrates the feature information of the global observation and of each semantic observation, which benefits policy generation. The application divides the policy into a maneuver policy and a fire policy, and therefore outputs three decision actions: the horizontal maneuver mode $a_{hor}$, the vertical maneuver mode $a_{ver}$ and the firing command $a_{shoot}$, where the selection ranges of the horizontal and vertical maneuvers are given in Table 1, and $a_{shoot} \in \{0, 1\}$ with 0 meaning no shot and 1 meaning shoot. The hidden state vector is passed through the embedding-layer network $f_{action}$ of each action to generate the respective action embedding vector $e_a$, and the selection probability of each action is proportional to the exponential of that embedding, $\pi(a\,|\,s) \propto \exp(f_{action}(e_{tot}))$, with the probability of each action computed by the Softmax activation function. The following formulas describe the forward propagation of the policy network:
$e_r = f_{O2E}^{rel}(s^{\text{rel}})$
$e_{am} = f_{O2E}^{am}(s^{\text{am}})$
$e_{om} = f_{O2E}^{om}(s^{\text{om}})$
$e_{tot} = \mathrm{Concat}(e_g, e_r, e_{am}, e_{om})$
$e_a = f_{action}(e_{tot})$
$\pi(a\,|\,s) = \mathrm{Softmax}(f_{action}(e_{tot}))$
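The following PyTorch-style sketch illustrates one possible realization of the forward pass just described: per-observation embedding networks, a value head on the global embedding, the concatenated hidden state $e_{tot}$, and the three action heads with Softmax outputs. The class name, layer widths, activation functions and the use of single linear layers for each $f_{O2E}$ network are assumptions made for illustration; the application does not specify them.

```python
import torch
import torch.nn as nn

class HierarchicalAirCombatPolicy(nn.Module):
    """Illustrative sketch of the upper-level policy/value network.
    Observation sizes and hidden widths are placeholders."""

    def __init__(self, d_global, d_rel, d_am, d_om, d_emb=32,
                 n_hor=6, n_ver=6):
        super().__init__()
        # f_O2E networks, one per observation group (here: single layers)
        self.f_o2e_global = nn.Linear(d_global, d_emb)
        self.f_o2e_rel = nn.Linear(d_rel, d_emb)
        self.f_o2e_am = nn.Linear(d_am, d_emb)
        self.f_o2e_om = nn.Linear(d_om, d_emb)
        # value head f_E2V on the global embedding
        self.f_e2v = nn.Linear(d_emb, 1)
        # action heads on the concatenated hidden state e_tot
        self.f_hor = nn.Linear(4 * d_emb, n_hor)
        self.f_ver = nn.Linear(4 * d_emb, n_ver)
        self.f_shoot = nn.Linear(4 * d_emb, 2)

    def forward(self, s_global, s_rel, s_am, s_om):
        # s_global is the full global observation; the others are the
        # semantic observation groups described in the text.
        e_g = torch.relu(self.f_o2e_global(s_global))
        e_r = torch.relu(self.f_o2e_rel(s_rel))
        e_am = torch.relu(self.f_o2e_am(s_am))
        e_om = torch.relu(self.f_o2e_om(s_om))
        value = self.f_e2v(e_g)                        # V(S)
        e_tot = torch.cat([e_g, e_r, e_am, e_om], -1)  # hidden state e_tot
        pi_hor = torch.softmax(self.f_hor(e_tot), -1)      # pi(a_hor | s)
        pi_ver = torch.softmax(self.f_ver(e_tot), -1)      # pi(a_ver | s)
        pi_shoot = torch.softmax(self.f_shoot(e_tot), -1)  # pi(a_shoot | s)
        return value, e_tot, pi_hor, pi_ver, pi_shoot
```

Estimating V(S) from the global embedding while the action heads also see the semantic embeddings mirrors the split between value estimation and policy extraction described above.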
in addition, in order to enable the maneuvering action to be suitable for richer situations, the control of the command speed and the command overload is added in the maneuvering mode. In order to reduce the learning complexity, a few common values are selected, and the instruction speed and the instruction overload are discretized, namely v epsilon [ v ∈ [ v [ ]1,…,vn],nn∈[nn1,…,nnm]. Thus, even under the same upper-layer mechanical strategy, the intelligent agent can select different instruction speeds and instruction overload to flexibly deal with different air combat situations. Since the selection of the two lower-layer instructions is closely related to the maneuver strategy of the upper layer, the application therefore adopts a layered idea to handle the selection of the two instruction actions. Converting the currently selected horizontal maneuver number and the currently selected vertical maneuver number into a one-hot vector form, and etotPassing in together instruction generation network fsteerAnd then outputting the instruction speed and the instruction overload through the softmax activating function. The forward propagation process of the underlying policy network is defined as follows:
$e_v,\ e_{nn} = f_{steer}(e_{tot},\ T_{\text{one-hot}}(a_{hor}),\ T_{\text{one-hot}}(a_{ver}))$
$\pi(v\,|\,s) = \mathrm{Softmax}(e_v), \qquad \pi(n_n\,|\,s) = \mathrm{Softmax}(e_{nn})$
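In the same illustrative spirit, the lower-level command network $f_{steer}$ can be sketched as follows; it conditions on $e_{tot}$ together with the one-hot encodings of the selected horizontal and vertical maneuvers and outputs the command-speed and command-overload distributions. The hidden width, discretization sizes and PyTorch layering are assumptions for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SteerHead(nn.Module):
    """Illustrative lower-level command network f_steer: produces the
    command-speed and command-overload distributions from e_tot and the
    one-hot encodings of the selected horizontal/vertical maneuvers."""

    def __init__(self, d_tot, n_hor=6, n_ver=6, n_speed=2, n_overload=2):
        super().__init__()
        self.n_hor, self.n_ver = n_hor, n_ver
        self.f_steer = nn.Linear(d_tot + n_hor + n_ver, 64)
        self.speed_head = nn.Linear(64, n_speed)        # produces e_v
        self.overload_head = nn.Linear(64, n_overload)  # produces e_nn

    def forward(self, e_tot, a_hor, a_ver):
        # a_hor, a_ver are integer (LongTensor) maneuver indices, 0-based
        x = torch.cat([e_tot,
                       F.one_hot(a_hor, self.n_hor).float(),   # T_one-hot(a_hor)
                       F.one_hot(a_ver, self.n_ver).float()],  # T_one-hot(a_ver)
                      dim=-1)
        h = torch.relu(self.f_steer(x))
        pi_speed = torch.softmax(self.speed_head(h), dim=-1)        # pi(v | s)
        pi_overload = torch.softmax(self.overload_head(h), dim=-1)  # pi(n_n | s)
        return pi_speed, pi_overload
```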
the following is a specific example.
(1) The overall air combat state S is input into the neural network to compute $e_g = f_{O2E}(S)$; assume the output vector is $e_g = [10, 22, 21]$. $e_g$ is then passed through $V(S) = f_{E2V}(e_g)$ to compute and output the value of the current state, here $V(S) = 34$. The relative state quantity is passed through its $f_{O2E}$ network, giving an assumed output vector $e_r = [9, 2, 11]$; the missile state quantity of the teammate agents gives an assumed output vector $e_{am} = [11, 23, 67]$; and the missile state quantity of the opponent agents gives an assumed output vector $e_{om} = [54, 3, 7]$. $e_g$, $e_r$, $e_{am}$ and $e_{om}$ are concatenated to form the action embedding vector of step S1:
$e_{tot} = \mathrm{Concat}(e_g, e_r, e_{am}, e_{om})$,
which yields the 12-dimensional vector $e_{tot} = [10, 22, 21, 9, 2, 11, 11, 23, 67, 54, 3, 7]$.
(2) In step S2, referring to FIG. 1, the generated action embedding vector $e_{tot}$ is first used to compute $e_{hor} = f_{Hor}(e_{tot})$; assume the output is $e_{hor} = [34, 21, 1]$. The horizontal-angle probabilities are then computed as $\pi(a_{hor}\,|\,s) = \mathrm{Softmax}(e_{hor})$ and are assumed to be $[0.1, 0.1, 0.2, 0.4, 0.1, 0.1]$. In step S3, sampling from this probability list selects the 4th horizontal angle, and the horizontal control mode is determined from Table 1 to be the ±50° biased attack. As shown in FIG. 1, the horizontal maneuver number 4 also has to be converted into the one-hot vector $T_{\text{one-hot}}(a_{hor}) = [0, 0, 0, 1, 0, 0]$ for use in controlling the agent's speed and overload;
similarly, still referring to FIG. 1, in step S2 the generated action embedding vector $e_{tot}$ is used to compute $e_{ver} = f_{Ver}(e_{tot})$; assume the output is $e_{ver} = [4, 2, 12]$. The vertical-angle probabilities are then computed as $\pi(a_{ver}\,|\,s) = \mathrm{Softmax}(e_{ver})$ and are assumed to be $[0.3, 0, 0.2, 0.2, 0.2, 0.1]$. In step S3, sampling from this probability list selects the 1st vertical angle, and the vertical control mode is determined from Table 1 to be holding the current heading. As shown in FIG. 1, the vertical maneuver number 1 also has to be converted into the one-hot vector $T_{\text{one-hot}}(a_{ver}) = [1, 0, 0, 0, 0, 0]$ for use in controlling the agent's speed and overload;
similarly, still referring to FIG. 1, in step S2 the generated action embedding vector $e_{tot}$ is used to compute $e_{shoot} = f_{Shoot}(e_{tot})$; assume the output is $e_{shoot} = [2, 52, 12]$. The shooting-action probabilities are then computed as $\pi(a_{shoot}\,|\,s) = \mathrm{Softmax}(e_{shoot})$ and are assumed to be $[0.3, 0.7]$. In step S3, sampling from this probability list selects the 2nd shooting action, and a firing command is formed, i.e. the agent shoots.
In step S4, the $e_{tot}$ generated in (1) and the $T_{\text{one-hot}}(a_{hor})$ and $T_{\text{one-hot}}(a_{ver})$ generated in (2) are used to compute the command speed and the command overload according to the following formula:
$e_v,\ e_{nn} = f_{steer}(e_{tot},\ T_{\text{one-hot}}(a_{hor}),\ T_{\text{one-hot}}(a_{ver}))$
Assume the outputs are $e_v = [12, 2, 42]$ and $e_{nn} = [14, 23, 4]$. Then $\pi(v\,|\,s) = \mathrm{Softmax}(e_v)$ and $\pi(n_n\,|\,s) = \mathrm{Softmax}(e_{nn})$ are computed to determine the command speed probability list and the command overload probability list. Assume the command speed probabilities are $[0.6, 0.4]$ and the command overload probabilities are $[0.2, 0.8]$, and assume that there are two command speed options $[200, 300]$ m/s and two command overload options $[5, 6]$ g. Sampling from these probabilities selects command speed option 1, i.e. 200 m/s, and command overload option 2, i.e. an overload of 6 g.
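Purely to illustrate the sampling chain of this example, the snippet below plugs in the probability lists and option values assumed in steps (2) to (4) above and draws one decision; the concrete draws naturally depend on the random seed, and the code itself is not part of the application.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probability lists assumed in the worked example above.
p_hor = [0.1, 0.1, 0.2, 0.4, 0.1, 0.1]   # horizontal commands 1..6
p_ver = [0.3, 0.0, 0.2, 0.2, 0.2, 0.1]   # vertical commands 1..6
p_shoot = [0.3, 0.7]                      # [no shot, shoot]
p_speed = [0.6, 0.4]                      # over speed options [200, 300] m/s
p_overload = [0.2, 0.8]                   # over overload options [5, 6] g

a_hor = rng.choice(6, p=p_hor) + 1        # e.g. 4 -> ±50° biased attack
a_ver = rng.choice(6, p=p_ver) + 1        # e.g. 1 -> hold current heading
a_shoot = rng.choice(2, p=p_shoot)        # e.g. 1 -> shoot
v_cmd = [200, 300][rng.choice(2, p=p_speed)]      # e.g. 200 m/s
nn_cmd = [5, 6][rng.choice(2, p=p_overload)]      # e.g. 6 g

one_hot_hor = np.eye(6)[a_hor - 1]        # T_one-hot(a_hor)
one_hot_ver = np.eye(6)[a_ver - 1]        # T_one-hot(a_ver)
print(a_hor, a_ver, a_shoot, v_cmd, nn_cmd)
```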
The airplane maneuvering control method based on hierarchical reinforcement learning (HRLMC) decomposes the learning of maneuver policies into two aspects, tactical maneuver intention learning and three-dimensional maneuver intention learning. Through self-play deep reinforcement learning, the agent can learn flexible and diverse maneuver policies to deal with different air combat situations, which enhances the robustness of the algorithm. The whole learning process involves no hand-written human rules. By combining horizontal tactical maneuver intentions with vertical three-dimensional maneuver intentions, the application can generate a large number of more diverse and flexible maneuver patterns.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. An airplane maneuvering control method based on layered reinforcement learning, used for the maneuver control of two teams of aircraft agents during a game process, characterized by comprising the following steps:
step S1, obtaining an action embedding vector of the agent computed by a neural network;
step S2, outputting a horizontal-angle probability list, a vertical-angle probability list and a shooting probability list from the action embedding vector, wherein the horizontal-angle probability list comprises a plurality of probability values corresponding to a plurality of preset horizontal control commands, the vertical-angle probability list comprises a plurality of probability values corresponding to a plurality of preset vertical control commands, and the shooting probability list comprises two probability values corresponding to shooting and not shooting; and
step S3, sampling from the probability lists, determining the agent's horizontal control mode, vertical control mode and whether to shoot, and controlling the agent accordingly.
2. The airplane maneuvering control method based on layered reinforcement learning according to claim 1, wherein step S1 further comprises:
step S11, acquiring the overall air combat state and dividing it into the absolute state quantity of the current agent, which characterizes the agent's own attributes, the relative state quantities between the current agent and other agents, the missile state quantities of teammate agents, and the missile state quantities of opponent agents;
step S12, determining a global embedding vector from the overall air combat state, a relative observation embedding vector from the relative state quantities, a friendly missile embedding vector from the missile state quantities of the teammate agents, and an enemy missile embedding vector from the missile state quantities of the opponent agents; and
step S13, forming the action embedding vector by concatenating the global embedding vector, the relative observation embedding vector, the friendly missile embedding vector and the enemy missile embedding vector.
3. The airplane maneuvering control method based on layered reinforcement learning according to claim 2, wherein in step S11 the absolute state quantities of the current agent include true airspeed, current altitude, climb rate, three-axis attitude angles, normal overload, fire-control radar lock signal, electronic warning device warning state, and number of remaining air-to-air missiles; the relative state quantities include relative distance, closing rate, relative altitude difference, target entry angle, own beam angle and attack zone information; the missile state quantities include missile speed, current altitude, missile-target distance, missile-target closing rate, remaining hit time, and the entry angle and beam angle between missile and target; and the missile state quantities of the opponent agents include the missile state quantities that threaten the current agent.
4. The airplane maneuvering control method based on layered reinforcement learning according to claim 1, wherein the preset horizontal control commands in step S2 are six, namely a hold command that holds the current heading, an attack command that points at the target, an attack command offset by ±30°, an attack command offset by ±50°, a defense command offset by ±90°, and a defense command offset by ±180°.
5. The airplane maneuvering control method based on layered reinforcement learning according to claim 1, wherein the preset vertical control commands in step S2 are six, namely a hold command that holds the current heading, an attack command that points at the target, a +30° climb attack command, a +60° climb attack command, a -30° dive defense command, and a -60° dive defense command.
6. The airplane maneuvering control method based on layered reinforcement learning according to claim 1, further comprising, after step S3:
step S4, determining a command speed probability list and a command overload probability list from the sampling results corresponding to the horizontal control mode and the vertical control mode together with the action embedding vector, and sampling from the command speed probability list and the command overload probability list respectively to obtain the command speed and the command overload of the agent.
7. The airplane maneuvering control method based on layered reinforcement learning according to claim 1, wherein the command speed probability list includes a plurality of probabilities corresponding to a plurality of speed values, the command overload probability list includes a plurality of probabilities corresponding to a plurality of overload values, and the speed values and overload values are obtained by discretizing the command speed and the command overload.
CN202110904677.8A 2021-08-07 2021-08-07 Airplane maneuvering control method based on layered reinforcement learning Pending CN114035602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904677.8A CN114035602A (en) 2021-08-07 2021-08-07 Airplane maneuvering control method based on layered reinforcement learning


Publications (1)

Publication Number Publication Date
CN114035602A true CN114035602A (en) 2022-02-11

Family

ID=80139840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904677.8A Pending CN114035602A (en) 2021-08-07 2021-08-07 Airplane maneuvering control method based on layered reinforcement learning

Country Status (1)

Country Link
CN (1) CN114035602A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004051485A1 (en) * 2002-12-05 2004-06-17 Nir Padan Dynamic guidance for close-in maneuvering air combat
CN111027143A (en) * 2019-12-18 2020-04-17 四川大学 Shipboard aircraft approach guiding method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIYIN PIAO et al.: "Beyond-Visual-Range Air Combat Tactics Auto-Generation by Reinforcement Learning", International Joint Conference on Neural Networks (IJCNN), pages 1-8 *
SUN Chu et al.: "Autonomous maneuver decision-making method for UAV based on reinforcement learning" (基于强化学习的无人机自主机动决策方法), Fire Control & Command Control (火力与指挥控制), vol. 44, no. 4, pages 142-149 *

Similar Documents

Publication Publication Date Title
CN113536528B (en) Early warning aircraft tactical behavior simulation method and system under non-convoy condition
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
Hu et al. Application of deep reinforcement learning in maneuver planning of beyond-visual-range air combat
Li et al. Deep reinforcement learning with application to air confrontation intelligent decision-making of manned/unmanned aerial vehicle cooperative system
CN109063819B (en) Bayesian network-based task community identification method
CN113435598B (en) Knowledge-driven intelligent strategy deduction decision method
CN114460959A (en) Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN115951709A (en) Multi-unmanned aerial vehicle air combat strategy generation method based on TD3
Santoso et al. State-of-the-art integrated guidance and control systems in unmanned vehicles: A review
CN115903865A (en) Aircraft near-distance air combat maneuver decision implementation method
Bae et al. Deep reinforcement learning-based air-to-air combat maneuver generation in a realistic environment
CN115993835A (en) Target maneuver intention prediction-based short-distance air combat maneuver decision method and system
Qiu et al. One-to-one air-combat maneuver strategy based on improved TD3 algorithm
Xianyong et al. Research on maneuvering decision algorithm based on improved deep deterministic policy gradient
Xu et al. Autonomous decision-making for dogfights based on a tactical pursuit point approach
Dahlbom et al. Detection of hostile aircraft behaviors using dynamic bayesian networks
Duan et al. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization
CN114035602A (en) Airplane maneuvering control method based on layered reinforcement learning
CN115859778A (en) Air combat maneuver decision method based on DCL-GWOO algorithm
Meng et al. One-to-one close air combat maneuver decision method based on target maneuver intention prediction
Wang et al. Over-the-Horizon Air Combat Environment Modeling and Deep Reinforcement Learning Application
Zhang et al. Intelligent Close Air Combat Design based on MA-POCA Algorithm
Scukins et al. Monte carlo tree search and convex optimization for decision support in beyond-visual-range air combat
Stilman et al. Adapting the linguistic geometry—abstract board games approach to air operations
Lu et al. Strategy Generation Based on DDPG with Prioritized Experience Replay for UCAV

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination