CN115857548A - Terminal guidance law design method based on deep reinforcement learning - Google Patents

Terminal guidance law design method based on deep reinforcement learning

Info

Publication number
CN115857548A
CN115857548A
Authority
CN
China
Prior art keywords
missile
target
state
action
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211509505.1A
Other languages
Chinese (zh)
Inventor
易文俊
杨书
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211509505.1A priority Critical patent/CN115857548A/en
Publication of CN115857548A publication Critical patent/CN115857548A/en
Pending legal-status Critical Current

Landscapes

  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)

Abstract

The invention discloses a terminal guidance law design method based on deep reinforcement learning, belonging to the field of missile and rocket guidance. The method comprises the following steps: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of a missile intercepting a target; abstracting the problem and modeling it as a Markov decision process so that it fits the research paradigm of reinforcement learning; building the algorithm network, setting the algorithm parameters, and selecting DQN as the deep reinforcement learning algorithm; during the terminal guidance process of each episode, obtaining a sufficient number of training samples through Q-learning, training the neural network and updating the target network each at a fixed frequency, and repeating this process until the set number of learning episodes is reached. By applying the technical scheme of the invention, the guidance precision of the traditional proportional navigation guidance law can be improved, and the missile acquires a certain autonomous decision-making capability.

Description

Terminal guidance law design method based on deep reinforcement learning
Technical Field
The invention belongs to the field of missile and rocket guidance, and particularly relates to a terminal guidance law design method based on deep reinforcement learning.
Background
The terminal guidance law is the control law that steers a missile flying at very high speed so that it accurately hits the enemy target, and it is a vital technology of a defense and control system. The control quantity output by the guidance law is the key basis on which the intercepting missile adjusts its flight attitude, and most guidance laws actually applied in engineering practice today are still the proportional navigation guidance law or improved variants of it. Its principle is to keep the rotation rate of the missile velocity vector proportional to the rotation rate of the missile-target line of sight by control means such as the missile-borne steering engine.
In an ideal situation the proportional navigation guidance law achieves a good hit effect, but when the non-ideal missile aerodynamic model, the inherent delay of the autopilot and a highly maneuvering target are taken into account, this guidance law leads to a larger miss distance.
Disclosure of Invention
In order to overcome the technical defects in the prior art, the invention provides a terminal guidance law design method based on deep reinforcement learning.
The technical scheme for realizing the purpose of the invention is as follows: a terminal guidance law design method based on deep reinforcement learning comprises the following steps:
Step 1: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of the missile intercepting the target;
Step 2: abstracting the guidance problem described by the kinematic equations and modeling it as a Markov decision process;
Step 3: building an algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
Step 4: the agent continuously caches the state-transition data and reward values as learning samples in an experience pool according to the Q-learning algorithm, and repeatedly selects a fixed number of samples at random from the experience pool to train the network, until the set number of learning episodes is reached;
Step 5: in an actual guidance process, the learned network is used to generate the action in real time according to the current state and transfer to the next state, and this process is repeated until the target is hit and the guidance process ends.
Preferably, the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of the missile intercepting the target established in step 1 are specifically as follows:
x_r = x_t - x_m
y_r = y_t - y_m
ẋ_r = V_t cos θ_t - V_m cos θ_m
ẏ_r = V_t sin θ_t - V_m sin θ_m
r = sqrt(x_r² + y_r²)
q = arctan(y_r / x_r)
ṙ = (x_r ẋ_r + y_r ẏ_r) / r
q̇ = (x_r ẏ_r - y_r ẋ_r) / r²
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the target linear velocity, θ_t is the angle between the target velocity direction and the horizontal direction, V_m is the missile linear velocity, θ_m is the angle between the missile velocity direction and the horizontal direction, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal direction, also called the line-of-sight angle, ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
Preferably, abstracting the guidance problem described by the kinematic equations and modeling it as a Markov decision process specifically comprises:
The action space setting is specifically as follows: constructing the action space by taking the proportional navigation guidance law PNG as expert experience;
The state space setting is specifically as follows: taking the line-of-sight angle rate of change q̇ as the state space of the current problem;
The reward function setting is specifically as follows:
R_t = r_{t-1} - r_t, for t < end;
R_t = a large positive hit reward, for t = end and r_end ≤ r_hit;
R_t = 0, for t = end and r_end > r_hit,
where r_hit is the relative distance at which the missile is judged to finally hit the target, r_end is the relative distance between the missile and the target at the termination time, end is the total number of simulation periods at the termination time, and r_t is the distance between the missile and the target at time t in the simulation process.
Preferably, the specific process of constructing the action space by taking the proportional navigation guidance law PNG as expert experience is as follows:
taking the relative velocity and the line-of-sight rotation rate as input, the output is an overload command, expressed as:
a = K |ṙ| q̇
where K is the proportionality coefficient, i.e., the navigation ratio, ṙ is the relative velocity and q̇ is the line-of-sight rotation rate; the proportionality coefficient K is discretized into a finite set of values within a certain range of values to form the action space, the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated therefrom.
Preferably, the specific steps of initializing the neural network weight parameters are as follows:
Step 301: the algorithm network adopts a BP neural network whose input is the (state, action) two-dimensional column vector and whose output is the Q value corresponding to the (state, action) pair;
a random function is called within a given value range to generate a series of random data as the input data set of the network, and the reward values of the random data, treated as states and actions, are calculated according to the reward function as the output reference data set;
step 302: and training the neural network according to the data set obtained in the step 301 to determine initial weight parameters of the neural network.
Preferably, the specific method for training the neural network and updating the target network with a fixed frequency is as follows:
in each simulation step, selecting an executed action from an action space according to an epsilon-greedy strategy for the current state, integrating according to a kinetic equation to obtain the state of the next moment, and calculating to obtain an obtained reward value; setting an experience pool and saving the current state, the executed action, the reward value and the next state as experience into the experience pool;
randomly taking out a data set with a certain size from the experience pool at a fixed frequency, calculating a corresponding target value of the data set, training a neural network by using the data set and the target value corresponding to the data set, and updating the target network by adopting a certain frequency, namely replacing the target network with the network trained in a period of time.
Preferably, the specific calculation method of the target value is as follows:
Q_target = Q(s_t, a_t) + α[R_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
where Q_target represents the updated Q value corresponding to (s_t, a_t), s_t represents the state at time t, a_t represents the action executed in state s_t, Q(s_t, a_t) represents the Q value of executing action a_t in state s_t, α is the learning rate, i.e., the rate at which the Q value is updated, R_t represents the reward value obtained by executing action a_t in state s_t, γ represents the discount rate, s_{t+1} represents the state at time t+1, and max_a Q(s_{t+1}, a) represents the Q value of executing the optimal action in state s_{t+1}.
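As a brief numerical illustration with values chosen purely for illustration (not taken from the embodiment): if α = 0.1, γ = 0.9, Q(s_t, a_t) = 2.0, R_t = 0.5 and max_a Q(s_{t+1}, a) = 3.0, then Q_target = 2.0 + 0.1 × (0.5 + 0.9 × 3.0 - 2.0) = 2.12.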
Compared with the prior art, the invention has the following remarkable advantages: by applying a deep reinforcement learning algorithm through offline learning within a given navigation-ratio range, the invention obtains an optimal sequence of navigation ratios, so that at every moment the missile can select the most appropriate navigation-ratio parameter according to its current state to generate the required overload, which alleviates the difficulty of selecting the navigation ratio and at the same time improves the hit precision.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a geometrical diagram of missile interception end-guidance plane engagement provided according to an embodiment of the invention.
FIG. 2 is a diagram of the terminal guidance law learning process based on deep reinforcement learning according to the present invention.
FIG. 3 is a flowchart of the deep reinforcement learning algorithm of the present invention.
Fig. 4 shows the relationship between the miss distance and the number of learning episodes for an embodiment of the present invention.
FIG. 5 is a two-dimensional trajectory graph of the missile and target provided in accordance with an embodiment of the present invention.
FIG. 6 is a sequence of navigation ratios for a specific example of the present invention.
Fig. 7 is a graph of line-of-sight angular velocity for a specific example of the present invention.
Detailed Description
It is easily understood that those skilled in the art can conceive various embodiments of the present invention according to the technical solution of the present invention without changing its essential spirit. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical solution of the present invention and should not be construed as constituting the entirety of the present invention or as limiting its technical scope. Rather, these embodiments are provided so that this disclosure will be thorough and complete. The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof and which together with the embodiments serve to explain the innovative concepts of the invention.
The invention discloses a terminal guidance law design method based on deep reinforcement learning, which comprises the following steps:
Step 1: with reference to the geometrical schematic diagram of the planar engagement in the missile interception terminal guidance phase in FIG. 1, the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of the missile intercepting the target are established as follows:
x_r = x_t - x_m
y_r = y_t - y_m
ẋ_r = V_t cos θ_t - V_m cos θ_m
ẏ_r = V_t sin θ_t - V_m sin θ_m
r = sqrt(x_r² + y_r²)
q = arctan(y_r / x_r)
ṙ = (x_r ẋ_r + y_r ẏ_r) / r
q̇ = (x_r ẏ_r - y_r ẋ_r) / r²
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the target linear velocity, θ_t is the angle between the target velocity direction and the horizontal direction, V_m is the missile linear velocity, θ_m is the angle between the missile velocity direction and the horizontal direction, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal direction, also called the line-of-sight angle, ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
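To make these relations concrete, the following Python sketch evaluates the relative-state quantities from the missile and target states; the function name, argument names, and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def relative_kinematics(x_t, y_t, x_m, y_m, V_t, theta_t, V_m, theta_m):
    """Missile-target relative kinematics in the longitudinal plane (illustrative sketch)."""
    x_r = x_t - x_m                                   # lateral relative distance
    y_r = y_t - y_m                                   # longitudinal relative distance
    x_r_dot = V_t * np.cos(theta_t) - V_m * np.cos(theta_m)
    y_r_dot = V_t * np.sin(theta_t) - V_m * np.sin(theta_m)
    r = np.hypot(x_r, y_r)                            # relative distance
    q = np.arctan2(y_r, x_r)                          # line-of-sight angle (robust form of arctan(y_r/x_r))
    r_dot = (x_r * x_r_dot + y_r * y_r_dot) / r       # range rate
    q_dot = (x_r * y_r_dot - y_r * x_r_dot) / r**2    # line-of-sight angular rate
    return r, q, r_dot, q_dot
```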
Step 2: abstracting the guidance problem described by the kinematic equations and modeling it as a Markov decision process;
further, step 2 specifically includes:
and (4) setting an action space, and constructing the action space by taking the proportional guidance law PNG as expert experience in order to avoid the condition that the final algorithm cannot be converged due to overlarge action space search.
Specifically, the proportional navigation guidance law takes the relative velocity and the line-of-sight rotation rate as input and outputs an overload command, expressed as:
a = K |ṙ| q̇
where K is the proportionality coefficient, i.e., the navigation ratio, ṙ is the relative velocity and q̇ is the line-of-sight rotation rate; the proportionality coefficient K is discretized into a finite set of values within a certain range of values to form the action space, the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated therefrom.
Setting the state space: in the design of the navigation ratio, the selected state space must cover all states of the guidance process; taking the line-of-sight angle rate of change q̇ as the state space of the current problem is sufficient to represent all motion states.
Setting the reward function: the DQN algorithm uses the reward function to judge whether an executed action is good. In the missile pursuit process, if the relative distance between the missile and the target decreases between adjacent moments, a positive reward is obtained; if the missile finally hits the target, a large reward is obtained, whereas a missile that does not hit the target receives a reward of 0. In summary, the reward function is set as:
R_t = r_{t-1} - r_t, for t < end;
R_t = a large positive hit reward, for t = end and r_end ≤ r_hit;
R_t = 0, for t = end and r_end > r_hit,
where r_hit is the relative distance at which the missile is judged to finally hit the target, r_end is the relative distance between the missile and the target at the termination time, end is the total number of simulation periods at the termination time, and r_t is the distance between the missile and the target at time t in the simulation process. During the missile's pursuit of the target the relative velocity ṙ is always negative; when ṙ changes from negative to positive at some moment, that moment is taken as the termination time.
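For illustration only, the MDP ingredients described in this step can be written down as the short Python sketch below; the navigation-ratio grid mirrors the embodiment, while the hit threshold r_hit and the terminal hit bonus are assumed constants that the patent does not fix.

```python
import numpy as np

# Action space: the navigation ratio K discretized from 2.0 to 5.0 in steps of 0.1 (as in the embodiment).
ACTIONS = np.linspace(2.0, 5.0, 31)

def png_overload(K, r_dot, q_dot):
    """Proportional-navigation overload command a = K * |r_dot| * q_dot (the expert action)."""
    return K * abs(r_dot) * q_dot

def reward(r_prev, r_now, terminal, r_hit=1.0, hit_bonus=100.0):
    """Shaping reward while closing on the target; terminal bonus on a hit, 0 on a miss.

    r_hit and hit_bonus are assumed values used only for illustration."""
    if terminal:
        return hit_bonus if r_now <= r_hit else 0.0
    return r_prev - r_now  # positive when the missile-target range decreases
```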
Step 3: building the algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
specifically, the algorithm determining network adopts a BP neural network, inputs the (state, action) two-dimensional column vectors, and outputs Q values corresponding to the (state, action) two-tuple, wherein the significance of the Q values is to determine the optimal action to be executed according to the Q values of different actions to be executed in the same state; calling a random function within a given value range to generate a series of random data as an input data set of the network, and calculating a reward value taking the random data set as a state and an action according to the reward function in the step 4 as an output reference data set;
and 4, step 4: the intelligent agent continuously caches state transition data and reward values as learning samples in an experience pool according to a Q-learning algorithm, and continuously selects a fixed number of sample training networks from the experience pool at random until a set learning turn is reached, and the method specifically comprises the following steps:
in each simulation step, determining which action is taken for the current state through an epsilon-greedy strategy, integrating according to a kinetic equation to obtain the state of the next moment, and meanwhile calculating to obtain an obtained reward value; setting experience pool and obtaining the current state, the executed action, the reward value and the next state(s) t ,a t ,r t ,s t+1 ) Saving the experience in an experience pool as experience;
randomly fetching a data set of a certain size from an experience pool at a fixed frequencyThen, calculating a corresponding target value of the data set, wherein the specific calculation method is as follows: q t arg et =Q(s t ,a t )+α[R t +γmax a Q(s t+1 ,a)-Q(s t ,a t )]Wherein Q is t arg et Represents updated(s) t ,a t ) Corresponding Q value, s t Representing the state at time t, a t Representative state s t Action performed, Q(s) t ,a t ) Representative state s t Lower execution action a t Is the rate of updating of the Q value, R t Representative state s t Lower execution action a t The value of the reward earned, γ representing the discount rate, is how important the future experience is in performing the action on the current state, s t+1 Represents the state at time t +1, max a Q(s t+1 A) represents the state s t+1 The Q value of the optimal action is executed; then training a neural network by using the data set and the corresponding target value obtained by calculation until reaching the set learning turn;
and 5: in a specific guidance process, a learned network is used for generating action in real time according to the current state and transferring to the next state, and the process is continuously repeated until a target is hit to finish the guidance process.
As a specific example of the present invention, the initial conditions are set as follows:
[initial-condition table of the embodiment, provided as an image in the original publication]
meanwhile, the motion space, i.e., the navigation ratio, is designed to be a = {2,2.12.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9,4.0,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9,5.0}; the neural network is set as 2 hidden layers, each layer has 40 neurons, and the error back propagation strategy selects a gradient descent method; the total number of learning rounds is 2200, and as can be seen from fig. 4, the miss distance converges from the initial random distribution to a lower value as the number of learning rounds increases, thus demonstrating the convergence of the algorithm of the present invention.
And intercepting the target by applying the learned algorithm model, and calculating a guidance trajectory by a fourth-order Runge Kutta to obtain a trajectory diagram as shown in FIG. 5. Comparing the depth-enhanced learning-based guidance law (DQNG) with the traditional proportional guidance law (PNG), wherein the miss distance of the DQNG is 0.5386m, the miss distance of the QNG is 1.3268m, and finding that the guidance trajectory of the DQNG can approach a target more quickly and perform accurate striking; meanwhile, the DQNG has the hit time of 12.44s, while the PNG has the hit time of 12.94s, compared with the DQNG, the target can be intercepted faster.
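The classical fourth-order Runge-Kutta step used to propagate the engagement states can be written generically as below; this is the standard textbook formula, shown only to make the integration explicit, and is not code taken from the patent.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y); y may be a NumPy state vector."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```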
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes described in a single embodiment or with reference to a single figure, for the purpose of streamlining the disclosure and aiding those skilled in the art in understanding various aspects of the invention. However, this should not be construed as meaning that every feature of the exemplary embodiments is an essential technical feature of the patent claims.
It should be understood that the modules, units, components, and the like included in the device of one embodiment of the present invention may be adaptively changed to be provided in a device different from that of the embodiment. The different modules, units or components comprised by the apparatus of an embodiment may be combined into one module, unit or component or may be divided into a plurality of sub-modules, sub-units or sub-components.

Claims (7)

1. A terminal guidance law design method based on deep reinforcement learning is characterized by comprising the following steps:
step 1: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of a missile intercepting a target;
step 2: abstracting the guidance problem described by the kinematic equations and modeling it as a Markov decision process;
step 3: building an algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
step 4: the agent continuously caches the state-transition data and reward values as learning samples in an experience pool according to the Q-learning algorithm, and repeatedly selects a fixed number of samples at random from the experience pool to train the network, until the set number of learning episodes is reached;
step 5: in an actual guidance process, the learned network is used to generate the action in real time according to the current state and transfer to the next state, and this process is repeated until the target is hit and the guidance process ends.
2. The terminal guidance law design method based on deep reinforcement learning as claimed in claim 1, wherein the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase of the missile intercepting the target established in step 1 are specifically as follows:
x_r = x_t - x_m
y_r = y_t - y_m
ẋ_r = V_t cos θ_t - V_m cos θ_m
ẏ_r = V_t sin θ_t - V_m sin θ_m
r = sqrt(x_r² + y_r²)
q = arctan(y_r / x_r)
ṙ = (x_r ẋ_r + y_r ẏ_r) / r
q̇ = (x_r ẏ_r - y_r ẋ_r) / r²
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the target linear velocity, θ_t is the angle between the target velocity direction and the horizontal direction, V_m is the missile linear velocity, θ_m is the angle between the missile velocity direction and the horizontal direction, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal direction, also called the line-of-sight angle, ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
3. The terminal guidance law design method based on deep reinforcement learning according to claim 1, wherein abstracting the guidance problem described by the kinematic equations and modeling it as a Markov decision process specifically comprises:
the action space setting is specifically as follows: constructing the action space by taking the proportional navigation guidance law PNG as expert experience;
the state space setting is specifically as follows: taking the line-of-sight angle rate of change q̇ as the state space of the current problem;
the reward function setting is specifically as follows:
R_t = r_{t-1} - r_t, for t < end;
R_t = a large positive hit reward, for t = end and r_end ≤ r_hit;
R_t = 0, for t = end and r_end > r_hit,
where r_hit is the relative distance at which the missile is judged to finally hit the target, r_end is the relative distance between the missile and the target at the termination time, end is the total number of simulation periods at the termination time, and r_t is the distance between the missile and the target at time t in the simulation process.
4. The terminal guidance law design method based on deep reinforcement learning as claimed in claim 3, wherein the specific process of constructing the action space by taking the proportional navigation guidance law PNG as expert experience is as follows:
taking the relative velocity and the line-of-sight rotation rate as input, the output is an overload command, expressed as:
a = K |ṙ| q̇
where K is the proportionality coefficient, i.e., the navigation ratio, ṙ is the relative velocity and q̇ is the line-of-sight rotation rate; the proportionality coefficient K is discretized into a finite set of values within a certain range of values to form the action space, the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated therefrom.
5. The terminal guidance law design method based on deep reinforcement learning as claimed in claim 1, wherein the specific steps of initializing the neural network weight parameters are as follows:
Step 301: the algorithm network adopts a BP neural network whose input is the (state, action) two-dimensional column vector and whose output is the Q value corresponding to the (state, action) pair;
a random function is called within a given value range to generate a series of random data as the input data set of the network, and the reward values of the random data, treated as states and actions, are calculated according to the reward function as the output reference data set;
step 302: and training the neural network according to the data set obtained in the step 301 to determine initial weight parameters of the neural network.
6. The terminal guidance law design method based on deep reinforcement learning as claimed in claim 1, wherein the specific method for training the neural network and updating the target network with fixed frequency is as follows:
in each simulation step, selecting an executed action from an action space according to an epsilon-greedy strategy for the current state, integrating according to a kinetic equation to obtain the state of the next moment, and calculating to obtain an obtained reward value; setting an experience pool and saving the current state, the executed action, the reward value and the next state as experience into the experience pool;
randomly taking out a data set with a certain size from the experience pool at a fixed frequency, calculating a corresponding target value of the data set, training a neural network by using the data set and the target value corresponding to the data set, and updating the target network by adopting a certain frequency, namely replacing the target network with the network trained in a period of time.
7. The terminal guidance law design method based on the deep reinforcement learning as claimed in claim 6, wherein the specific calculation method of the target value is as follows:
Q_target = Q(s_t, a_t) + α[R_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
where Q_target represents the updated Q value corresponding to (s_t, a_t), s_t represents the state at time t, a_t represents the action executed in state s_t, Q(s_t, a_t) represents the Q value of executing action a_t in state s_t, α is the learning rate, i.e., the rate at which the Q value is updated, R_t represents the reward value obtained by executing action a_t in state s_t, γ represents the discount rate, s_{t+1} represents the state at time t+1, and max_a Q(s_{t+1}, a) represents the Q value of executing the optimal action in state s_{t+1}.
CN202211509505.1A 2022-11-29 2022-11-29 Terminal guidance law design method based on deep reinforcement learning Pending CN115857548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211509505.1A CN115857548A (en) 2022-11-29 2022-11-29 Terminal guidance law design method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211509505.1A CN115857548A (en) 2022-11-29 2022-11-29 Terminal guidance law design method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115857548A true CN115857548A (en) 2023-03-28

Family

ID=85667624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211509505.1A Pending CN115857548A (en) 2022-11-29 2022-11-29 Terminal guidance law design method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115857548A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116222310A (en) * 2023-04-13 2023-06-06 哈尔滨工业大学 Two-pair synchronous region coverage interception method based on RBF_G in three-dimensional space
CN116222310B (en) * 2023-04-13 2024-04-26 哈尔滨工业大学 Two-pair synchronous region coverage interception method based on RBF_G in three-dimensional space
CN117989923A (en) * 2024-03-22 2024-05-07 哈尔滨工业大学 Variable proportion coefficient multi-bullet collaborative guidance method and system based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN115857548A (en) Terminal guidance law design method based on deep reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
US11794898B2 (en) Air combat maneuvering method based on parallel self-play
CN110989576A (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111240345B (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111538241B (en) Intelligent control method for horizontal track of stratospheric airship
CN112947592B (en) Reentry vehicle trajectory planning method based on reinforcement learning
CN113093802A (en) Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN114253296B (en) Hypersonic aircraft airborne track planning method and device, aircraft and medium
CN111240344B (en) Autonomous underwater robot model-free control method based on reinforcement learning technology
Yue et al. Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs
CN113671825B (en) Maneuvering intelligent decision-avoiding missile method based on reinforcement learning
CN114675673B (en) Method and system for tracking moving target in air
CN113625740B (en) Unmanned aerial vehicle air combat game method based on transfer learning pigeon swarm optimization
CN114063644B (en) Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning
CN114089776A (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
Jiang et al. Realizing midcourse penetration with deep reinforcement learning
CN116697829A (en) Rocket landing guidance method and system based on deep reinforcement learning
Wang et al. Autonomous target tracking of multi-UAV: A two-stage deep reinforcement learning approach with expert experience
CN117908565A (en) Unmanned aerial vehicle safety path planning method based on maximum entropy multi-agent reinforcement learning
CN117332684A (en) Optimal capturing method under multi-spacecraft chase-escaping game based on reinforcement learning
CN116796844A (en) M2 GPI-based unmanned aerial vehicle one-to-one chase game method
CN114815878B (en) Hypersonic aircraft collaborative guidance method based on real-time optimization and deep learning
CN114997048A (en) Automatic driving vehicle lane keeping method based on TD3 algorithm improved by exploration strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination