CN115857548A - Terminal guidance law design method based on deep reinforcement learning - Google Patents
- Publication number
- CN115857548A CN115857548A CN202211509505.1A CN202211509505A CN115857548A CN 115857548 A CN115857548 A CN 115857548A CN 202211509505 A CN202211509505 A CN 202211509505A CN 115857548 A CN115857548 A CN 115857548A
- Authority
- CN
- China
- Prior art keywords
- missile
- target
- state
- action
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)
Abstract
The invention discloses a terminal guidance law design method based on deep reinforcement learning, and belongs to the field of missile and rocket guidance. The method comprises the following steps: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase in which the missile intercepts the target; abstracting and modeling the problem as a Markov decision process in order to fit the research paradigm of reinforcement learning; building the algorithm network, setting the algorithm parameters, and selecting DQN as the deep reinforcement learning algorithm; in the terminal guidance process of each round, collecting a sufficient number of training samples through Q-learning, training the neural network and updating the target network at their respective fixed frequencies, and repeating this process until the set number of learning rounds is reached. By applying the technical scheme of the invention, the guidance precision of the traditional proportional navigation guidance law can be improved, and the missile acquires a certain autonomous decision-making capability.
Description
Technical Field
The invention belongs to the field of missile and rocket guidance, and particularly relates to a terminal guidance law design method based on deep reinforcement learning.
Background
The control law that steers a missile flying at very high speed so that it accurately hits an enemy target is called the terminal guidance law, and it is a vital technology of a defense system. The control quantity output by the guidance law is the key basis on which the intercepting missile adjusts its flight attitude, and most guidance laws actually applied in engineering practice are still the proportional navigation guidance law or improved variants of it. Their principle is to use control means such as the missile-borne steering engine to keep the rotation rate of the missile's velocity vector in constant direct proportion to the rotation rate of the missile-target line of sight.
Under ideal conditions, the proportional navigation guidance law can achieve a good hit effect; however, once the non-ideal characteristics of the missile aerodynamic model, the inherent delay of the autopilot, and large target maneuvers are taken into account, this guidance law can lead to a large miss distance.
Disclosure of Invention
In order to overcome the above deficiencies of the prior art, the invention provides a terminal guidance law design method based on deep reinforcement learning.
The technical scheme for realizing the purpose of the invention is as follows: a terminal guidance law design method based on deep reinforcement learning comprises the following steps:
Step 1: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase in which the missile intercepts the target;
Step 2: abstracting the solution of the kinematic equations and modeling it as a Markov decision process;
Step 3: building an algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
Step 4: according to the Q-learning algorithm, the agent continuously caches state-transition data and reward values in an experience pool as learning samples, and repeatedly draws a fixed number of samples at random from the experience pool to train the network, until the set number of learning rounds is reached;
Step 5: in an actual guidance process, the learned network is used to generate an action in real time according to the current state and to transition to the next state, and this process is repeated until the target is hit and the guidance process ends.
Preferably, the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase established in step 1 are:
x_r = x_t - x_m
y_r = y_t - y_m
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the linear velocity of the target, θ_t is the angle between the target velocity direction and the horizontal, V_m is the linear velocity of the missile, θ_m is the angle between the missile velocity direction and the horizontal, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal (also called the line-of-sight angle), ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
Preferably, abstracting and modeling the solution of the kinematic equations as a Markov decision process specifically comprises:
The action space setting is specifically as follows: the action space is constructed by taking the proportional navigation guidance law PNG as expert experience;
The state space setting is specifically as follows: the line-of-sight angle rate q̇ is taken as the state space of the current problem;
The reward function setting is specifically as follows:
In the formula, r_hit is the relative distance within which the missile is judged to have finally hit the target, r_end is the relative distance between the missile and the target at the termination time, end is the total number of simulation periods at the termination time, and r_t is the distance between the missile and the target at time t during the simulation.
Preferably, the specific process of constructing the action space by taking the proportional navigation guidance law PNG as expert experience is as follows:
Taking the relative velocity and the line-of-sight rotation rate as inputs, the output is an overload command, expressed as a_c = K·|ṙ|·q̇, where K is the proportionality coefficient (navigation ratio), ṙ is the relative velocity, and q̇ is the line-of-sight rotation rate. The proportionality coefficient K is discretized within a given range of values into a finite set that serves as the action space; the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated from it.
Preferably, the specific steps of initializing the neural network weight parameters are as follows:
Step 301: determining that the algorithm network adopts a BP neural network, whose input is the (state, action) two-dimensional column vector and whose output is the Q value corresponding to the (state, action) pair;
calling a random function within a given value range to generate a series of random data as the input data set of the network, and calculating, according to the reward function, the reward values obtained when the random data are treated as states and actions, which serve as the output reference data set;
Step 302: training the neural network on the data set obtained in step 301 to determine the initial weight parameters of the neural network.
Preferably, the specific method for training the neural network and updating the target network with a fixed frequency is as follows:
in each simulation step, selecting an executed action from an action space according to an epsilon-greedy strategy for the current state, integrating according to a kinetic equation to obtain the state of the next moment, and calculating to obtain an obtained reward value; setting an experience pool and saving the current state, the executed action, the reward value and the next state as experience into the experience pool;
randomly taking out a data set with a certain size from the experience pool at a fixed frequency, calculating a corresponding target value of the data set, training a neural network by using the data set and the target value corresponding to the data set, and updating the target network by adopting a certain frequency, namely replacing the target network with the network trained in a period of time.
Preferably, the specific calculation method of the target value is as follows:
Q_target = Q(s_t, a_t) + α[R_t + γ·max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
where Q_target represents the updated Q value corresponding to (s_t, a_t), s_t represents the state at time t, a_t represents the action executed in state s_t, Q(s_t, a_t) represents the Q value of executing action a_t in state s_t, α is the learning rate, i.e. the rate at which the Q value is updated, R_t represents the reward value obtained by executing action a_t in state s_t, γ represents the discount rate, s_{t+1} represents the state at time t+1, and max_a Q(s_{t+1}, a) represents the Q value of executing the optimal action in state s_{t+1}.
Compared with the prior art, the invention has the following notable advantages: within a given range of navigation ratios, a deep reinforcement learning algorithm is applied offline to learn an optimal sequence of navigation ratios, so that at every moment the missile can select the most appropriate navigation-ratio parameter according to its current state and generate the required overload. This alleviates, to a certain extent, the difficulty of choosing the navigation ratio, while also improving hit accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a geometric diagram of the planar engagement in the missile interception terminal guidance phase according to an embodiment of the invention.
FIG. 2 is a diagram of terminal guidance law learning with deep reinforcement learning according to the present invention.
FIG. 3 is a flowchart of the deep reinforcement learning algorithm of the present invention.
Fig. 4 is a relationship between the amount of miss and the number of learning rounds for an embodiment of the present invention.
FIG. 5 is a two-dimensional trajectory graph of the missile and target provided in accordance with an embodiment of the present invention.
FIG. 6 is a sequence of navigation ratios for a specific example of the present invention.
Fig. 7 is a graph of line-of-sight angular velocity for a specific example of the present invention.
Detailed Description
It is easily understood that those skilled in the art can conceive various embodiments of the present invention from its technical solution without changing its essential spirit. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical solution of the invention and should not be construed as constituting the whole of the invention or as restricting or limiting its technical solution. Rather, these embodiments are provided so that this disclosure will be thorough and complete. The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof and which, together with the embodiments of the invention, serve to explain the innovative concepts of the invention.
The invention discloses a terminal guidance law design method based on deep reinforcement learning, which comprises the following steps:
Step 1: with reference to the geometric schematic diagram of the planar engagement of the missile interception terminal guidance phase in FIG. 1, the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase are established as follows:
x_r = x_t - x_m
y_r = y_t - y_m
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the linear velocity of the target, θ_t is the angle between the target velocity direction and the horizontal, V_m is the linear velocity of the missile, θ_m is the angle between the missile velocity direction and the horizontal, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal (also called the line-of-sight angle), ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
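For readability, planar engagement kinematics of this kind can be sketched in Python as below. This is a minimal sketch under stated assumptions: the constant speeds, the state layout, the function names, and the use of normal acceleration divided by speed as the turn rate are illustrative choices, not values taken from the patent.

```python
import numpy as np

V_M, V_T = 600.0, 300.0   # assumed constant missile/target speeds (m/s)

def engagement_derivatives(state, a_m, a_t=0.0):
    """Time derivatives of the planar engagement state.

    state = [x_m, y_m, theta_m, x_t, y_t, theta_t]
    a_m, a_t: normal accelerations (overloads) of missile and target.
    """
    x_m, y_m, th_m, x_t, y_t, th_t = state
    return np.array([
        V_M * np.cos(th_m),   # dx_m/dt
        V_M * np.sin(th_m),   # dy_m/dt
        a_m / V_M,            # dtheta_m/dt: normal acceleration rotates the velocity vector
        V_T * np.cos(th_t),   # dx_t/dt
        V_T * np.sin(th_t),   # dy_t/dt
        a_t / V_T,            # dtheta_t/dt
    ])

def relative_quantities(state):
    """Relative distance r, line-of-sight angle q, and their rates r_dot, q_dot."""
    x_m, y_m, th_m, x_t, y_t, th_t = state
    x_r, y_r = x_t - x_m, y_t - y_m                      # x_r = x_t - x_m, y_r = y_t - y_m
    xr_dot = V_T * np.cos(th_t) - V_M * np.cos(th_m)     # lateral relative velocity
    yr_dot = V_T * np.sin(th_t) - V_M * np.sin(th_m)     # longitudinal relative velocity
    r = np.hypot(x_r, y_r)                               # relative distance
    q = np.arctan2(y_r, x_r)                             # line-of-sight angle
    r_dot = (x_r * xr_dot + y_r * yr_dot) / r            # relative (closing) velocity
    q_dot = (x_r * yr_dot - y_r * xr_dot) / r**2         # line-of-sight angle rate
    return r, q, r_dot, q_dot
```

The quantities r, q, ṙ and q̇ computed here are the same ones used below by the guidance law and by the reinforcement learning state.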
Step 2: abstracting the solution of the kinematic equations and modeling it as a Markov decision process;
further, step 2 specifically includes:
and (4) setting an action space, and constructing the action space by taking the proportional guidance law PNG as expert experience in order to avoid the condition that the final algorithm cannot be converged due to overlarge action space search.
Specifically, the proportional navigation guidance law takes the relative velocity and the line-of-sight rotation rate as inputs and outputs an overload command, expressed as a_c = K·|ṙ|·q̇, where K is the proportionality coefficient (navigation ratio), ṙ is the relative velocity, and q̇ is the line-of-sight rotation rate. The proportionality coefficient K is discretized within a certain range of values into a finite set that serves as the action space; the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated from it;
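A minimal sketch of such a discretized, PNG-based action space is given below. The concrete values (2.0 to 5.0 in steps of 0.1) follow the embodiment described later in this description, while the function name and the use of |ṙ| in the command are assumptions.

```python
import numpy as np

# Discretized navigation ratios used as the action space
# (assumed values: K from 2.0 to 5.0 in steps of 0.1, as in the embodiment).
ACTIONS = np.round(np.arange(2.0, 5.0 + 1e-9, 0.1), 1)

def overload_command(action_index, r_dot, q_dot):
    """PNG-style overload command a_c = K * |r_dot| * q_dot,
    where K is the navigation ratio selected by the agent."""
    K = ACTIONS[action_index]
    return K * abs(r_dot) * q_dot
```

Selecting an action therefore amounts to picking one navigation ratio per decision step; the resulting command is fed back into the engagement dynamics sketched above.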
setting a state space, wherein in the design of the guidance ratio, the selected state space must contain all states of the guidance process, and converting the sight line into the ratioAs a state space for the current problem-aware, all states of motion can be adequately represented;
setting a reward function, wherein the DQN algorithm judges whether the execution action is good or not by using the reward function; in the missile pursuit process, if the relative distance between the missile and the target is shortened at adjacent moments, a positive reward is obtained; if the missile finally hits the target, a larger reward will be obtained, but otherwise the missile that did not hit the target will set the reward to 0, and in summary, the reward function is set to:
in the formula, r hit Relative distance, r, for the final target hit by the missile end Is the time of terminationThe relative distance between the missile and the target, end is the total period duration of the termination time, r t The distance between the missile and the target at the moment t in the simulation process. Relative velocity during target pursuit by the missileIs always negative when>The time is changed from negative to positive at a certain moment, and the moment is the termination moment.
Step 3: building an algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
specifically, the algorithm determining network adopts a BP neural network, inputs the (state, action) two-dimensional column vectors, and outputs Q values corresponding to the (state, action) two-tuple, wherein the significance of the Q values is to determine the optimal action to be executed according to the Q values of different actions to be executed in the same state; calling a random function within a given value range to generate a series of random data as an input data set of the network, and calculating a reward value taking the random data set as a state and an action according to the reward function in the step 4 as an output reference data set;
and 4, step 4: the intelligent agent continuously caches state transition data and reward values as learning samples in an experience pool according to a Q-learning algorithm, and continuously selects a fixed number of sample training networks from the experience pool at random until a set learning turn is reached, and the method specifically comprises the following steps:
in each simulation step, determining which action is taken for the current state through an epsilon-greedy strategy, integrating according to a kinetic equation to obtain the state of the next moment, and meanwhile calculating to obtain an obtained reward value; setting experience pool and obtaining the current state, the executed action, the reward value and the next state(s) t ,a t ,r t ,s t+1 ) Saving the experience in an experience pool as experience;
randomly fetching a data set of a certain size from an experience pool at a fixed frequencyThen, calculating a corresponding target value of the data set, wherein the specific calculation method is as follows: q t arg et =Q(s t ,a t )+α[R t +γmax a Q(s t+1 ,a)-Q(s t ,a t )]Wherein Q is t arg et Represents updated(s) t ,a t ) Corresponding Q value, s t Representing the state at time t, a t Representative state s t Action performed, Q(s) t ,a t ) Representative state s t Lower execution action a t Is the rate of updating of the Q value, R t Representative state s t Lower execution action a t The value of the reward earned, γ representing the discount rate, is how important the future experience is in performing the action on the current state, s t+1 Represents the state at time t + 1, max a Q(s t+1 A) represents the state s t+1 The Q value of the optimal action is executed; then training a neural network by using the data set and the corresponding target value obtained by calculation until reaching the set learning turn;
and 5: in a specific guidance process, a learned network is used for generating action in real time according to the current state and transferring to the next state, and the process is continuously repeated until a target is hit to finish the guidance process.
As a specific example of the present invention, the initial conditions are set as follows:
Meanwhile, the action space, i.e. the set of navigation ratios, is designed as A = {2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0}; the neural network has 2 hidden layers with 40 neurons each, and gradient descent is selected as the error back-propagation strategy; the total number of learning rounds is 2200. As can be seen from FIG. 4, the miss distance converges from an initially random distribution to a low value as the number of learning rounds increases, which demonstrates the convergence of the algorithm of the invention.
The learned algorithm model is then applied to intercept the target, and the guidance trajectory is computed with the fourth-order Runge-Kutta method, yielding the trajectory diagram shown in FIG. 5. Comparing the deep-reinforcement-learning-based guidance law (DQNG) with the traditional proportional navigation guidance law (PNG), the miss distance of DQNG is 0.5386 m while that of PNG is 1.3268 m, showing that the DQNG guidance trajectory approaches the target more directly and strikes it accurately; meanwhile, the hit time of DQNG is 12.44 s versus 12.94 s for PNG, so DQNG intercepts the target faster.
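For reference, a classic fourth-order Runge-Kutta step of the kind used here to propagate the engagement dynamics can be sketched as follows; the function name and step size are generic, not values from the embodiment.

```python
def rk4_step(f, t, state, dt):
    """One classic fourth-order Runge-Kutta step for d(state)/dt = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + dt / 2.0, state + dt / 2.0 * k1)
    k3 = f(t + dt / 2.0, state + dt / 2.0 * k2)
    k4 = f(t + dt, state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

With f wrapping the engagement derivatives and the overload command produced by the learned policy, repeatedly applying rk4_step yields the guidance trajectory compared in FIG. 5.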
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
It should be appreciated that, in the foregoing description of exemplary embodiments, various features of the invention are sometimes described within a single embodiment or with reference to a single figure in order to streamline the disclosure and aid the understanding of various aspects of the invention by those skilled in the art. However, this should not be construed as meaning that the features of the exemplary embodiments are all essential technical features of the claims.
It should be understood that the modules, units, components, and the like included in the device of one embodiment of the present invention may be adaptively changed to be provided in a device different from that of the embodiment. The different modules, units or components comprised by the apparatus of an embodiment may be combined into one module, unit or component or may be divided into a plurality of sub-modules, sub-units or sub-components.
Claims (7)
1. A terminal guidance law design method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: establishing the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase in which the missile intercepts the target;
Step 2: abstracting the solution of the kinematic equations and modeling it as a Markov decision process;
Step 3: building an algorithm network, setting the algorithm parameters, and training the algorithm network on a randomly initialized data set to determine the weight parameters of the initial network;
Step 4: according to the Q-learning algorithm, the agent continuously caches state-transition data and reward values in an experience pool as learning samples, and repeatedly draws a fixed number of samples at random from the experience pool to train the network, until the set number of learning rounds is reached;
Step 5: in an actual guidance process, the learned network is used to generate an action in real time according to the current state and to transition to the next state, and this process is repeated until the target is hit and the guidance process ends.
2. The terminal guidance law design method based on deep reinforcement learning according to claim 1, wherein the relative kinematic equations between the missile and the target in the longitudinal plane of the terminal guidance phase established in step 1 are specifically:
x_r = x_t - x_m
y_r = y_t - y_m
where x_t is the abscissa of the target, x_m is the abscissa of the missile, x_r is the lateral relative distance between the target and the missile, y_t is the ordinate of the target, y_m is the ordinate of the missile, y_r is the longitudinal relative distance between the target and the missile, V_t is the linear velocity of the target, θ_t is the angle between the target velocity direction and the horizontal, V_m is the linear velocity of the missile, θ_m is the angle between the missile velocity direction and the horizontal, ẋ_r is the rate of change of the lateral distance between the target and the missile, ẏ_r is the rate of change of the longitudinal distance between the target and the missile, r is the relative distance between the target and the missile, q is the angle between the missile-target line of sight and the horizontal (also called the line-of-sight angle), ṙ is the rate of change of the relative distance, and q̇ is the rate of change of the line-of-sight angle.
3. The terminal guidance law design method based on deep reinforcement learning according to claim 1, wherein abstracting and modeling the solution of the kinematic equations as a Markov decision process specifically comprises:
the action space setting is specifically as follows: the action space is constructed by taking the proportional navigation guidance law PNG as expert experience;
the state space setting is specifically as follows: the line-of-sight angle rate q̇ is taken as the state space of the current problem;
the reward function setting is specifically as follows:
in the formula, r_hit is the relative distance within which the missile is judged to have finally hit the target, r_end is the relative distance between the missile and the target at the termination time, end is the total number of simulation periods at the termination time, and r_t is the distance between the missile and the target at time t during the simulation.
4. The terminal guidance law design method based on deep reinforcement learning according to claim 3, wherein the specific process of constructing the action space by taking the proportional navigation guidance law PNG as expert experience is as follows:
taking the relative velocity and the line-of-sight rotation rate as inputs, the output is an overload command, expressed as a_c = K·|ṙ|·q̇, where K is the proportionality coefficient, ṙ is the relative velocity, and q̇ is the line-of-sight rotation rate; the proportionality coefficient K is discretized within a certain range of values into a finite set that serves as the action space, the proportionality coefficient is determined by selecting an action from the action space, and the overload command is calculated from it.
5. The terminal guidance law design method based on deep reinforcement learning according to claim 1, wherein the specific steps of initializing the neural network weight parameters are as follows:
step 301: determining that the algorithm network adopts a BP neural network, whose input is the (state, action) two-dimensional column vector and whose output is the Q value corresponding to the (state, action) pair;
calling a random function within a given value range to generate a series of random data as the input data set of the network, and calculating, according to the reward function, the reward values obtained when the random data are treated as states and actions, which serve as the output reference data set;
step 302: training the neural network on the data set obtained in step 301 to determine the initial weight parameters of the neural network.
6. The terminal guidance law design method based on deep reinforcement learning according to claim 1, wherein the specific method of training the neural network and updating the target network at a fixed frequency is as follows:
in each simulation step, an action to execute is selected from the action space for the current state according to an ε-greedy strategy, the state at the next moment is obtained by integrating the kinetic equations, and the resulting reward value is calculated; an experience pool is set up, and the current state, the executed action, the reward value and the next state are saved into the experience pool as experience;
a data set of a certain size is randomly drawn from the experience pool at a fixed frequency, its corresponding target values are calculated, and the neural network is trained with this data set and its target values; the target network is updated at a certain frequency, that is, the target network is periodically replaced by the network that has been trained over the preceding period.
7. The terminal guidance law design method based on the deep reinforcement learning as claimed in claim 6, wherein the specific calculation method of the target value is as follows:
Q_target = Q(s_t, a_t) + α[R_t + γ·max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
where Q_target represents the updated Q value corresponding to (s_t, a_t), s_t represents the state at time t, a_t represents the action executed in state s_t, Q(s_t, a_t) represents the Q value of executing action a_t in state s_t, α is the learning rate, i.e. the rate at which the Q value is updated, R_t represents the reward value obtained by executing action a_t in state s_t, γ represents the discount rate, s_{t+1} represents the state at time t+1, and max_a Q(s_{t+1}, a) represents the Q value of executing the optimal action in state s_{t+1}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211509505.1A CN115857548A (en) | 2022-11-29 | 2022-11-29 | Terminal guidance law design method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211509505.1A CN115857548A (en) | 2022-11-29 | 2022-11-29 | Terminal guidance law design method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115857548A true CN115857548A (en) | 2023-03-28 |
Family
ID=85667624
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211509505.1A Pending CN115857548A (en) | 2022-11-29 | 2022-11-29 | Terminal guidance law design method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115857548A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116222310A (en) * | 2023-04-13 | 2023-06-06 | 哈尔滨工业大学 | Two-pair synchronous region coverage interception method based on RBF_G in three-dimensional space |
CN116222310B (en) * | 2023-04-13 | 2024-04-26 | 哈尔滨工业大学 | Two-pair synchronous region coverage interception method based on RBF_G in three-dimensional space |
CN117989923A (en) * | 2024-03-22 | 2024-05-07 | 哈尔滨工业大学 | Variable proportion coefficient multi-bullet collaborative guidance method and system based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||