CN115524964A - Rocket landing real-time robust guidance method and system based on reinforcement learning - Google Patents
Rocket landing real-time robust guidance method and system based on reinforcement learning
- Publication number
- CN115524964A (application CN202210972207.XA)
- Authority
- CN
- China
- Prior art keywords
- rocket
- landing
- flight
- intelligent agent
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a rocket landing real-time robust guidance method and system based on reinforcement learning. A rocket three-degree-of-freedom motion model is established according to the acting forces borne by the rocket during earth landing power descent flight, and a rocket landing Markov decision process model is established from the three-degree-of-freedom motion model. An intelligent Agent is then constructed according to the Markov decision process model and trained through interactive simulation with a pre-established rocket landing flight simulation environment to obtain a landing control Agent, and the rocket landing flight is guided according to the real-time control instructions generated by the landing control Agent.
Description
Technical Field
The invention relates to the technical field of vertical take-off and landing rocket earth landing guidance, in particular to a rocket landing real-time robust guidance method and system based on reinforcement learning.
Background
The vertical take-off and landing reusable carrier rocket is a novel carrier rocket and an effective tool for reducing the cost of space launch missions and improving the efficiency of access to space. Rocket sublevel earth landing guidance is a key technology for controlling the position and velocity of the carrier rocket's center of mass during the three-degree-of-freedom return-and-landing flight; that is, instructions guiding the motion of the rocket's center of mass are generated according to a certain principle or strategy so that the motion process satisfies the constraint conditions and the terminal state meets a preset target, thereby ensuring the recovery accuracy of the carrier rocket, reducing fuel consumption, and enabling reliable reuse.
The existing rocket sublevel earth landing guidance methods mainly include online trajectory optimization guidance methods, which solve the optimal flight trajectory online by establishing a target rocket dynamics model and a corresponding trajectory optimization problem model using an indirect or direct method, and deep learning guidance methods that adopt a strategy of offline training and online application. Although the existing guidance methods have certain real-time performance and optimality and can, to a certain extent, achieve recycling of the carrier rocket, they are all model-based guidance methods: their algorithmic efficiency and the usability of their solutions depend heavily on modeling precision and accuracy, and their robustness is poor. Once unmodeled factors exist in the environment or the models contain deviations and uncertain disturbances, the algorithm performance and the usability of the solutions are seriously affected, which can further lead to guidance failure.
Disclosure of Invention
The invention aims to provide a rocket landing real-time robust guidance method based on reinforcement learning. A rocket three-degree-of-freedom motion model is constructed through stress analysis of the rocket's earth landing power descent stage flight, a rocket landing Markov decision process model is constructed by combining the gaze-heuristic idea, and an intelligent Agent based on a value function neural network and a strategy neural network is trained through interactive simulation with a rocket landing flight simulation environment to obtain a landing control Agent that generates the rocket landing guidance control strategy; real-time control instructions are then generated according to this strategy to guide the rocket landing flight.
In order to achieve the above object, it is necessary to provide a rocket landing real-time robust guidance method and system based on reinforcement learning to solve the above technical problems.
In a first aspect, an embodiment of the present invention provides a rocket landing real-time robust guidance method based on reinforcement learning, including the following steps:
constructing a rocket three-degree-of-freedom motion model according to acting force borne by the rocket in the earth landing power descent stage;
constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model;
constructing an intelligent Agent according to the rocket landing Markov decision process model, and performing interactive training on the intelligent Agent and a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value function-based neural network and a strategy-based neural network;
and generating a real-time control instruction according to the landing control Agent, and guiding the rocket to land and fly according to the real-time control instruction.
Further, the step of constructing the rocket three-degree-of-freedom motion model according to the acting force borne by the rocket earth landing power descent segment flight comprises the following steps:
establishing a landing point coordinate system by taking the rocket sub-level target landing point as the origin; the landing point coordinate system is a coordinate system which takes a target landing point for rocket sublevel landing as a coordinate origin O, takes the vertically upward direction (away from the geocenter) as a coordinate axis Oz, takes the main flight direction of the rocket during landing as a coordinate axis Ox, and takes the direction which is perpendicular to the plane xOz and forms a right-hand rectangular coordinate system with the coordinate axis Ox and the coordinate axis Oz as a coordinate axis Oy;
based on the landing point coordinate system, carrying out stress analysis on the rocket flying in the earth landing power descent stage, and determining corresponding earth attraction, aerodynamic resistance and engine thrust;
constructing the rocket three-degree-of-freedom motion model according to the earth attraction, the aerodynamic resistance and the engine thrust; the rocket three-degree-of-freedom motion model is expressed as:

dr/dt = V
dV/dt = g(r) + (T + D)/m
dm/dt = −‖T‖/(I_sp·g_0)

with D = −C_D·S_ref·ρ·‖V‖·V/2 and ρ = ρ_0·exp(−h/h_ref),

where r represents the rocket position vector; V represents the rocket velocity vector; m represents the rocket mass; g(r) represents the gravitational acceleration vector acting on the rocket; T represents the engine thrust vector; D represents the aerodynamic resistance vector; I_sp represents the fuel specific impulse; g_0 represents the average gravitational acceleration at earth sea level; dm/dt is the second consumption (mass flow rate) of the propellant after the engine is started; C_D represents the drag coefficient; S_ref represents the reference area of the rocket substage; ρ_0 represents the reference atmospheric density at earth sea level; h represents the flight height of the rocket substage; h_ref represents the reference height.
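As an illustrative sketch only (not part of the claimed method), the three-degree-of-freedom model above can be written as a state-derivative function; the numerical parameter values and function names below are assumptions chosen for demonstration:

```python
# Minimal sketch of the 3-DOF model above in the landing-point frame.
# All numerical parameter values are illustrative assumptions.
import numpy as np

G0 = 9.80665      # average sea-level gravitational acceleration, m/s^2
ISP = 300.0       # fuel specific impulse, s (assumed)
CD = 1.0          # drag coefficient (assumed)
S_REF = 10.0      # reference area of the rocket substage, m^2 (assumed)
RHO0 = 1.225      # reference sea-level atmospheric density, kg/m^3
H_REF = 7200.0    # reference height of the exponential atmosphere, m (assumed)

def dynamics(state, thrust):
    """Right-hand side of the 3-DOF model: state = [r (3), V (3), m]."""
    r, v, m = state[:3], state[3:6], state[6]
    rho = RHO0 * np.exp(-r[2] / H_REF)                  # exponential atmosphere, h = r_z
    drag = -0.5 * CD * S_REF * rho * np.linalg.norm(v) * v
    g = np.array([0.0, 0.0, -G0])                       # constant gravity field
    r_dot = v
    v_dot = g + (thrust + drag) / m
    m_dot = -np.linalg.norm(thrust) / (ISP * G0)        # propellant mass flow
    return np.concatenate([r_dot, v_dot, [m_dot]])
```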
Further, the step of constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model comprises the following steps:
based on the gaze-heuristic idea, converting the state variables of the rocket to obtain the state quantity of the rocket landing Markov decision process model; the state quantity is expressed as:

S = [V_error^T, r_z, t_go]^T

with V_error = V − V_sight,

where S represents the state quantity of the rocket landing Markov decision process model; r, V and V_0 respectively represent the rocket position vector, the rocket velocity vector and the rocket initial velocity; t_go represents the remaining flight time of the rocket; r_z represents the Z-axis component of the rocket position vector; V_sight represents the line-of-sight vector; V_error represents the error between the rocket velocity vector and the line-of-sight vector; λ represents a parameter that adjusts the magnitude of the line-of-sight vector over time;
obtaining the action quantity of the rocket landing Markov decision process model according to the control instruction of the rocket; the action quantity is expressed as:

A = T = [T_x, T_y, T_z]^T

where A represents the action quantity of the rocket landing Markov decision process model; T represents the engine thrust vector; T_x, T_y and T_z respectively represent the X-axis, Y-axis and Z-axis components of the engine thrust;
determining a return function design principle according to rocket fixed point soft landing requirements, and obtaining a return function of the rocket landing Markov decision process model according to the return function design principle;
discretizing a continuous rocket landing process according to a preset period, and determining the state transition probability of the rocket landing Markov decision process model according to rocket integral dynamics.
Further, the step of constructing an intelligent Agent according to the rocket landing Markov decision process model comprises the following steps:
selecting a proximal policy optimization algorithm as the reinforcement learning algorithm of the intelligent Agent according to the rocket landing Markov decision process model;
and constructing the value function-based neural network and the strategy-based neural network according to a multilayer perceptron model, based on the proximal policy optimization algorithm.
Further, the rocket landing flight simulation environment construction step comprises:
and constructing a rocket landing operating environment based on the rocket three-degree-of-freedom motion model, and synchronously constructing a corresponding initial value condition generator and a corresponding flight termination determiner to obtain the rocket landing flight simulation environment.
Further, the step of interactively training the intelligent Agent and the pre-constructed rocket landing flight simulation environment to obtain the landing control Agent comprises the following steps:
and training a strategy-based neural network of the intelligent Agent until convergence through interactive simulation of the intelligent Agent and the rocket landing flight simulation environment to obtain the landing control Agent.
Further, the step of training a policy-based neural network of the intelligent Agent through interactive simulation of the intelligent Agent and the rocket landing flight simulation environment until convergence to obtain the landing control Agent comprises:
randomly selecting an initial state to be simulated from a preset initial state space according to the initial value condition generator;
according to the initial state to be simulated, executing interactive simulation of the intelligent Agent and the flight simulation environment, terminating the simulation flight of the current wheel when a simulation termination condition preset by the flight termination judger is reached, evaluating and obtaining an accumulated return value of each state point in the current simulation flight track according to a return function, and updating a value-function-based neural network parameter of the intelligent Agent according to the accumulated return value;
predicting expected accumulated return values of all state points in the current simulated flight trajectory according to the updated value-based function neural network of the intelligent Agent, calculating an advantage function according to the accumulated return values and the expected return values, and updating parameters of the intelligent Agent based on a strategy neural network according to the advantage function;
and judging whether the strategy-based neural network of the intelligent Agent reaches a preset convergence condition, if so, stopping simulation training to obtain the landing control Agent, otherwise, reselecting an initial state to be simulated according to the initial condition generator, and starting the next round of interactive simulation training.
In a second aspect, an embodiment of the present invention provides a rocket landing real-time robust guidance system based on reinforcement learning, where the system includes:
the motion model building module is used for building a rocket three-degree-of-freedom motion model according to acting force borne by the rocket in the earth landing power descent stage;
the optimization model building module is used for building a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model;
the control strategy training module is used for constructing an intelligent Agent according to the rocket landing Markov decision process model and interactively training the intelligent Agent and a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value function-based neural network and a strategy-based neural network;
and the rocket landing guidance module is used for generating a real-time control instruction according to the landing control Agent and guiding the rocket to land and fly according to the real-time control instruction.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the above method.
The method implements the technical scheme of constructing a rocket three-degree-of-freedom motion model according to the acting forces borne by the rocket during earth landing power descent flight, constructing a rocket landing Markov decision process model from the three-degree-of-freedom motion model, constructing an intelligent Agent from the Markov decision process model, training the intelligent Agent through interactive simulation with a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent, and guiding the rocket landing flight according to the real-time control instructions generated by the landing control Agent. Compared with the prior art, the rocket landing real-time robust guidance method based on reinforcement learning has extremely high real-time performance and strong algorithmic robustness, can adapt to relatively wide modeling deviations, can guide the rocket to perform a high-precision fixed-point soft landing even under working conditions with uncertain environmental disturbances, and therefore has high application value.
Drawings
FIG. 1 is a schematic diagram of an application scenario of a rocket landing real-time robust guidance method based on reinforcement learning in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a rocket landing real-time robust guidance method based on reinforcement learning in the embodiment of the invention;
FIG. 3 is a schematic diagram of a landing site coordinate system used for building a rocket three-degree-of-freedom motion model in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a policy-based neural network of an intelligent Agent in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a value-based function neural network of an intelligent Agent in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a rocket landing real-time robust guidance system based on reinforcement learning in the embodiment of the present invention;
fig. 7 is an internal structural diagram of a computer device in the embodiment of the present invention.
Detailed Description
In order to make the purpose, technical solution and advantages of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments, and it is obvious that the embodiments described below are part of the embodiments of the present invention, and are used for illustrating the present invention only, but not for limiting the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The rocket landing real-time robust guidance method based on reinforcement learning of the invention can be applied to the earth return-landing guidance of vertical take-off and landing reusable carrier rockets. Based on the overall framework shown in FIG. 1, it maps the rocket's real-time state to an engine thrust instruction, and the issued instruction is adaptive to wide-range rocket model deviations and environmental disturbances, ensuring that the rocket sublevel is guided to a high-precision fixed-point soft landing under complex uncertainty. By adopting a deep neural network as the strategy network for reinforcement learning and training it with an improved PPO algorithm, high-dimensional continuous action-space instructions can be fitted effectively; by using the gaze-heuristic method to set the state quantities and designing different reward discount rates for the terminal and process indices of the rocket landing trajectory, the convergence speed of the strategy is accelerated. The method therefore has extremely high real-time performance and strong algorithmic robustness, is suitable for relatively wide modeling deviations, can effectively cope with uncertain disturbances in the environment, provides a reliable guarantee for guiding the rocket to a high-precision fixed-point soft landing, and has high application value. It should be noted that the method of the invention can be executed by a server that undertakes the relevant functions; the following embodiments all take the server as the execution subject, and the rocket landing real-time robust guidance method based on reinforcement learning of the invention is described in detail.
In one embodiment, as shown in fig. 2, a rocket landing real-time robust guidance method based on reinforcement learning is provided, which comprises the following steps:
s11, constructing a rocket three-degree-of-freedom motion model according to acting force borne by the rocket in the earth landing power descent stage; the rocket three-degree-of-freedom motion model can be understood as a motion model obtained by performing targeted improvement on a current rocket landing operation model based on the consideration of actual flight conditions and task targets and establishing a nonlinear and continuous rocket fuel optimal landing trajectory optimization problem; because the rocket sublevel is mainly influenced by engine thrust, earth attraction and aerodynamic force generated in a dense atmosphere environment in the final landing process, in order to simplify the problems as much as possible on the basis of ensuring the reliability of research problems, the embodiment mainly considers the acting force borne by the rocket in the flight of the earth landing power descent stage to construct a corresponding rocket three-degree-of-freedom motion model; specifically, the step of constructing a rocket three-degree-of-freedom motion model according to the acting force applied to the rocket earth landing power descent stage flight comprises the following steps:
establishing a landing point coordinate system by taking the rocket sublevel target landing point as an origin; the landing point coordinate system is a coordinate system which takes a target landing point of rocket sublevel landing as a coordinate origin O, takes the vertical upward direction of the geocenter as a coordinate axis Oz, takes the main flight direction of the rocket during landing as a coordinate axis Ox, and takes the direction which is vertical to a plane xOz and forms a right-hand rectangular coordinate system with the coordinate axis Ox and the coordinate axis Oz as a coordinate axis Oy as shown in FIG. 3; it should be noted that, because the last flight phase in the rocket substage landing process considered by the invention has the characteristics of short flight time and narrow flight airspace, the influence caused by the curvature of the earth surface and the earth rotation can be ignored, and the earth surface is taken as a plane, so that in order to describe the rocket substage flight process more intuitively and simplify the solution of the problem, the embodiment preferably establishes a landing point coordinate system for the stress analysis when constructing a rocket three-degree-of-freedom motion model;
based on the landing point coordinate system, carrying out stress analysis on the rocket flying in the earth landing power descent stage, and determining corresponding earth attraction, aerodynamic drag and engine thrust; in the rocket three-degree-of-freedom motion model the earth gravity is set to a constant value: because the flight time of the power descent segment is short (dozens of seconds), the influence of earth rotation can be ignored, and because the flight airspace is narrow (dozens of kilometers), adopting a plane landing field model and a constant gravity field model already meets the precision requirement, which effectively simplifies the solution of the problem; the aerodynamic drag is understood to be the aerodynamic drag to which the rocket is subjected in the dense atmosphere and can be expressed as:
D = −C_D·S_ref·ρ·‖V‖·V/2

where the atmospheric density is described by the exponential model ρ = ρ_0·exp(−h/h_ref),

and where C_D represents the drag coefficient; S_ref represents the reference area of the rocket substage; ρ represents the atmospheric density in the earth landing environment; V represents the velocity vector of the rocket; ρ_0 represents the reference atmospheric density at earth sea level; h represents the flight height of the rocket sublevel, namely the Z-axis rocket position component in the landing point coordinate system; h_ref represents the reference height;
the engine thrust can be understood as follows: without considering rocket attitude changes, the several engines equipped on the rocket substage are combined and treated as one equivalent engine that provides the rocket thrust, whose magnitude satisfies ‖T‖ = I_sp·g_0·|dm/dt|, where I_sp is the fuel specific impulse, g_0 is the average gravitational acceleration at earth sea level, and dm/dt is the second consumption (mass flow rate) of the propellant after the engine is started.
Furthermore, in the landing problem studied by the invention, the influence of control mechanisms such as grid fins and the Reaction Control System (RCS) on rocket adjustment is not considered, and the thrust generated by the rocket engine is taken as the only control quantity; meanwhile, the attitude motion of the rocket is not considered and the landing motion of the rocket is treated as center-of-mass motion, so the total engine thrust T can be decomposed along the three axes of the established landing point coordinate system to obtain the thrust components T = [T_x, T_y, T_z]^T; this effectively avoids complex trigonometric thrust resolution, and in the subsequent problem modeling and solving the three thrust components are directly taken as the control quantities of the rocket sublevel, with the thrust amplitude defined as:

‖T‖ = √(T_x² + T_y² + T_z²)

Because of the limitation of current reusable engine technology, and in order to ensure safety during landing, the engine is not shut down after ignition and start-up in the final landing flight; that is, throughout the power descent flight the rocket sublevel is acted on by a non-zero minimum thrust, and the corresponding engine thrust amplitude satisfies the constraint:

T_min ≤ ‖T‖ ≤ T_max

where T_max and T_min are respectively the upper and lower bounds of the rocket engine thrust amplitude;
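The thrust amplitude constraint above can be enforced, for example, by projecting a commanded thrust vector onto the admissible amplitude band; the following is a minimal sketch with assumed function names, not the patented procedure:

```python
import numpy as np

def clip_thrust(T, T_min, T_max):
    """Project a commanded thrust vector onto T_min <= ||T|| <= T_max,
    keeping its direction unchanged."""
    mag = np.linalg.norm(T)
    if mag < 1e-9:                          # degenerate command: fall back to minimum thrust along +z
        return np.array([0.0, 0.0, T_min])
    clipped = np.clip(mag, T_min, T_max)
    return T * (clipped / mag)
```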
constructing the rocket three-degree-of-freedom motion model according to the earth attraction, the aerodynamic drag and the engine thrust; the rocket three-degree-of-freedom motion model is expressed as:

dr/dt = V
dV/dt = g(r) + (T + D)/m
dm/dt = −‖T‖/(I_sp·g_0)

where r represents the rocket position vector; V represents the rocket velocity vector; m represents the rocket mass; g(r) represents the gravitational acceleration vector acting on the rocket, which is in general a function of the rocket position r and is set to a constant value in the problem solving of the invention; T represents the engine thrust vector, which is the control variable of the trajectory optimization problem of the invention; D represents the aerodynamic drag vector; I_sp represents the fuel specific impulse; g_0 represents the average gravitational acceleration at earth sea level; dm/dt is the second consumption (mass flow rate) of the propellant after the engine is started; C_D represents the drag coefficient; S_ref represents the reference area of the rocket substage; ρ_0 represents the reference atmospheric density at earth sea level; h represents the flight height of the rocket substage; h_ref represents the reference height.
For the rocket substage landing process, the system state and system control can be expressed as:

x = [r^T, V^T, m]^T,  u = T

where the system state x of the rocket comprises the rocket position, velocity and mass, and the system control u of the rocket is the thrust of the equivalent rocket engine;
s12, constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model; wherein, the rocket landing Markov decision process model comprises five elements: the state quantity S, the action quantity A, the return function R, the state transition probability P and the discount factor gamma; specifically, the step of constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model comprises the following steps:
based on the gaze-heuristic idea, converting the state variables of the rocket to obtain the state quantity of the rocket landing Markov decision process model; the state quantity S does not directly adopt the raw state variables of the rocket; instead, the observed rocket state is converted using the gaze-heuristic idea to accelerate the convergence of the agent's strategy in early learning; it should be noted that the rocket system states referred to below can be understood as the state quantities obtained through this conversion; the state quantity S can be expressed as:

S = [V_error^T, r_z, t_go]^T

with V_error = V − V_sight,

where S represents the state quantity of the rocket landing Markov decision process model; r, V and V_0 respectively represent the rocket position vector, the rocket velocity vector and the rocket initial velocity; t_go represents the remaining flight time of the rocket; r_z represents the Z-axis component of the rocket position vector; V_sight represents the line-of-sight vector; V_error represents the error between the rocket velocity vector and the line-of-sight vector; λ represents a parameter that adjusts the magnitude of the line-of-sight vector over time;
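For illustration, the gaze-heuristic state quantity can be assembled as sketched below; the exact definitions of V_sight and t_go are given by formulas in the specification that are not reproduced here, so the forms used are assumptions:

```python
import numpy as np

def build_state(r, V, V0, lam=1.0):
    """Assemble the 5-dimensional MDP state [V_error (3), r_z, t_go].
    V_sight and t_go below are assumed forms: V_sight points from the rocket
    toward the landing site, scaled by lambda and the initial speed, and
    t_go is a simple range/speed estimate."""
    t_go = np.linalg.norm(r) / max(np.linalg.norm(V), 1e-3)                  # assumed remaining-time estimate
    V_sight = -lam * np.linalg.norm(V0) * r / max(np.linalg.norm(r), 1e-3)   # assumed "gaze" vector
    V_error = V - V_sight
    return np.concatenate([V_error, [r[2], t_go]])
```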
obtaining the action quantity of the rocket landing Markov decision process model according to the control instruction of the rocket; the action quantity directly adopts the control instruction of the rocket, namely the engine thrust, and is expressed as:

A = T = [T_x, T_y, T_z]^T

where A represents the action quantity of the rocket landing Markov decision process model; T represents the engine thrust vector; T_x, T_y and T_z respectively represent the X-axis, Y-axis and Z-axis components of the engine thrust; in addition, considering that the control instruction given by the agent's strategy cannot by itself constrain the modulus of the output action, in order to ensure that the output control instruction satisfies the thrust amplitude constraint of the rocket engine, in this embodiment the control instruction preferably also undergoes amplitude clipping so that the engine thrust amplitude constraint is strictly satisfied;
determining a return function design principle according to rocket fixed point soft landing requirements, and obtaining a return function of the rocket landing Markov decision process model according to the return function design principle; wherein, the return function design principle can be understood as rocket fixed point soft landing limiting conditions, such as:
(1) The rocket landing terminal position reaches the landing site, i.e., r_f = 0;
(2) The rocket landing terminal velocity is zero, i.e., V_f = 0;
(3) The rocket landing terminal residual mass m_f is as large as possible, i.e., the fuel consumption during flight is reduced as much as possible;
(4) The transverse maneuver during the rocket's landing flight must not be too large;
it should be noted that the return function design principles may include, but are not limited to, the principles listed above, depending on the actual analysis; after the design principles are determined, the trajectory return function can be divided into two parts by combining the gaze-heuristic idea: a process cumulative return R_prog and a terminal reward return R_term; the process cumulative return is expressed as:

R_prog = α‖V_error‖ + β‖F_use‖ + η·P_glide
s.t. V_error = V − V_sight

where V_error is the error between the current rocket velocity V and the "line-of-sight" vector V_sight; F_use is the fuel consumption of the rocket at the current moment, related to the amplitude ‖A‖ of the thrust output by the control instruction and the maximum rocket thrust T_max; gs (glide slope) denotes the track slope, and P_glide is an envelope constraint that limits the transverse maneuver of the rocket: whenever the altitude drops by more than 2 m between two states during landing, the ratio gs between the longitudinal maneuver dr_z of the rocket and its transverse maneuver is computed; the remaining variables are initialization parameters, e.g., α = −0.01, β = −0.05 and η = −100 are the scale factors of the corresponding terms in the cumulative return R_prog, while gs_limit = 0.1 and gs_τ = 0.05 respectively denote the minimum track slope and the scale factor of the P_glide envelope constraint formula;
the terminal reward return is expressed as:

R_term = reward_landing + P_term

where reward_landing is the reward given when the rocket landing terminal position and velocity requirements are satisfied, and P_term is a penalty applied when the transverse maneuver just before touchdown is too large; ‖V_term‖ and ‖r_term‖ respectively denote the modulus of the terminal velocity and the terminal position; gs_term is the ratio of the longitudinal displacement to the transverse displacement of the rocket at landing, computed in the same way as gs in the process constraint; the remaining variables V_limit, r_limit and gs_limit are initialization parameters;

through the process cumulative return and the terminal reward return, the intelligent Agent can be guided to control the rocket to achieve the goal of vertical fixed-point soft landing.
Discretizing the continuous rocket landing process according to a preset period, and determining the state transition probability of the rocket landing Markov decision process model according to the rocket integral dynamics; specifically, the state transition probability P is expressed as:

P(s_{τ+1} = f(s_τ, a_τ) | s_τ, a_τ) = 1

where s_τ and a_τ respectively represent the current state of the system at time τ and the action currently taken by the system; s_{τ+1} represents the state of the system at time τ+1; f(s, a) represents the system state transition dynamics equation; P represents the probability of transitioning from the state s_τ at time τ to the state s_{τ+1} at time τ+1 given the state quantity s_τ and the action quantity a_τ;
correspondingly, the discount factor γ in the rocket landing Markov decision process model is used to attenuate, over time, the cumulative return of future steps in the trajectory, and is preferably taken as 0.95.
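For illustration, the deterministic state transition over one discretization period can be realized by integrating the three-degree-of-freedom dynamics (reusing the dynamics sketch above); the period length and the Runge-Kutta scheme are assumptions:

```python
DT = 0.1   # discretization period, s (assumed)

def env_step(state, thrust):
    """Deterministic transition matching P(s' = f(s, a) | s, a) = 1:
    integrate the 3-DOF dynamics over one period with a 4th-order Runge-Kutta step."""
    k1 = dynamics(state, thrust)
    k2 = dynamics(state + 0.5 * DT * k1, thrust)
    k3 = dynamics(state + 0.5 * DT * k2, thrust)
    k4 = dynamics(state + DT * k3, thrust)
    return state + DT / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```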
S13, constructing an intelligent Agent according to the rocket landing Markov decision process model, and performing interactive training on the intelligent Agent and a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value function-based neural network and a strategy-based neural network, and specifically, the step of constructing the intelligent Agent according to the rocket landing Markov decision process model comprises the following steps:
selecting a proximal policy optimization algorithm as the reinforcement learning algorithm of the intelligent Agent according to the rocket landing Markov decision process model; the proximal policy optimization algorithm can be understood as an improved PPO algorithm used to train the intelligent Agent for the rocket sublevel landing task;
constructing the value function-based neural network and the strategy-based neural network according to a multilayer perceptron model, based on the proximal policy optimization algorithm; the input of the strategy-based neural network is the system state S obtained by processing the intelligent Agent's observations, and the corresponding output is the thrust vector control instruction value A for rocket landing; the value function-based neural network is used to accelerate the convergence of the strategy-based neural network: it is trained from the rocket state values x of the trajectories in episode simulation and the corresponding actual accumulated returns Q(s, a), and predicts the expected accumulated return V(s) of a given state;
the strategy-based neural network and the value function-based neural network both adopt a 3-hidden-layer structure, the hidden-layer activation function adopts the tanh function, and the output-layer activation function adopts a linear activation function; the number of input-layer neurons n_in of both the strategy-based and value function-based neural networks is 5, the dimension of the state quantity S; the output layer of the strategy-based neural network contains 3 neurons, corresponding to the three thrust components of the rocket, while the output layer of the value function-based neural network contains only one neuron, corresponding to the expected cumulative return. Specifically, the structural parameters of the strategy-based neural network shown in FIG. 4 and the value function-based neural network shown in FIG. 5 are listed in Table 1:
TABLE 1 structural parameters of policy-based neural networks and value-function-based neural networks
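A minimal sketch of the two networks consistent with the stated structure (3 hidden layers, tanh activations, linear output, 5 inputs, 3 policy outputs, 1 value output) is given below; since Table 1 is not reproduced here, the hidden-layer width is an assumption:

```python
import torch.nn as nn

HIDDEN = 64   # hidden-layer width (assumed; Table 1 is not reproduced here)

def mlp(n_in, n_out):
    """3 hidden layers with tanh activations and a linear output layer, as described."""
    return nn.Sequential(
        nn.Linear(n_in, HIDDEN), nn.Tanh(),
        nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
        nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
        nn.Linear(HIDDEN, n_out),
    )

policy_net = mlp(5, 3)   # state S (5-d) -> thrust command A (3-d)
value_net = mlp(5, 1)    # state S (5-d) -> expected cumulative return V(s)
```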
The rocket landing flight simulation environment can be understood as a simulation model which is constructed based on a rocket landing dynamics model and is used for simulating rocket landing flight; specifically, the building steps of the rocket landing flight simulation environment include:
constructing a rocket landing operating environment based on the rocket three-degree-of-freedom motion model, and synchronously constructing a corresponding initial value condition generator and flight termination determiner to obtain the rocket landing flight simulation environment; the initial value condition generator can be understood as an initial state selector that randomly selects an initial state from the set initial state space to start a round of trajectory simulation; the flight termination determiner can be understood as a motion state detector that sets both abnormal and normal termination criteria for the rocket flight and detects in real time whether the rocket landing has terminated;
after the intelligent Agent and the rocket landing flight simulation environment are obtained through the above steps, each episode of the rocket landing operating environment can be simulated through continuous interaction between the intelligent Agent and the environment as follows: the initial value condition generator first randomly selects an initial state of the rocket landing from the initial state space; the intelligent Agent then guides the rocket landing flight according to the observed system state and the corresponding control instruction given by the strategy-based neural network; when the rocket lands successfully or a cut-off condition is reached in advance and the flight is terminated, one episode of simulation ends and a complete rocket landing flight trajectory is obtained; under different initial states, the corresponding reinforcement learning training is completed after multiple rounds of episode simulation;
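The episode interaction described above can be sketched as follows; initial_state_sampler and terminated stand in for the initial value condition generator and the flight termination determiner, the thrust bounds are assumed values, and the helper functions reuse the earlier sketches:

```python
import numpy as np
import torch

def run_episode(policy_net, initial_state_sampler, terminated, max_steps=2000):
    """One episode: sample an initial state, then let the strategy network
    interact with the simulated environment until termination."""
    trajectory = []
    state = initial_state_sampler()          # assumed to return [r (3), V (3), m]
    V0 = state[3:6].copy()                   # initial velocity, used by the gaze-heuristic state
    for _ in range(max_steps):
        obs = build_state(state[:3], state[3:6], V0)
        with torch.no_grad():
            action = policy_net(torch.as_tensor(obs, dtype=torch.float32)).numpy()
        action = clip_thrust(action, T_min=100e3, T_max=500e3)    # assumed thrust bounds, N
        next_state = env_step(state, action)
        reward = process_reward(obs[:3], action,
                                dr_z=next_state[2] - state[2],
                                dr_xy=np.linalg.norm(next_state[:2] - state[:2]),
                                T_max=500e3)
        trajectory.append((obs, action, reward))
        state = next_state
        if terminated(state):
            break
    return trajectory
```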
specifically, the step of interactively training the intelligent Agent and the pre-constructed rocket landing flight simulation environment to obtain the landing control Agent comprises the following steps:
training a strategy-based neural network of the intelligent Agent until convergence through interactive simulation of the intelligent Agent and the rocket landing flight simulation environment to obtain the landing control Agent; correspondingly, the training process of the landing control Agent is as follows:
randomly selecting an initial state to be simulated from a preset initial state space according to the initial value condition generator;
according to the initial state to be simulated, executing the interactive simulation of the intelligent Agent and the flight simulation environment; when the simulation termination condition preset by the flight termination determiner is reached, the simulated flight of the current round is terminated, the accumulated return value of each state point in the current simulated flight trajectory is evaluated according to the return function, and the value function-based neural network parameters of the intelligent Agent are updated according to the accumulated return values; in the PPO algorithm learning framework, the intelligent Agent in each round of episode simulation obtains, through interaction with the flight simulation environment, a complete trajectory composed of observed states, actions and returns (s_l, a_l, r_l), where s_l is the environmental state observed by the intelligent Agent, a_l is the action taken by the intelligent Agent according to the observation, and r_l is the reward fed back to the intelligent Agent by the environment, generally expressed as a function of s_l and a_l; the trajectory from time k to time T (the episode end time) can be represented as (s_k, a_k, ..., s_T, a_T), and the cumulative discounted return of the trajectory can be expressed as:

R_k = Σ_{i=k}^{T} γ^{i−k}·r_i

where γ ∈ [0,1] is the discount factor used to discount the return of each time node in the trajectory; the goal of the reinforcement learning algorithm is to find a set of strategies that maximizes the expected cumulative discounted return of the trajectory as far as possible;
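For illustration, the cumulative discounted return of each step can be computed backward over the episode as follows:

```python
def discounted_returns(rewards, gamma=0.95):
    """Cumulative discounted return from each step to the end of the episode."""
    returns, running = [0.0] * len(rewards), 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns
```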
predicting expected accumulated return values of all state points in the current simulated flight trajectory according to the updated value-based function neural network of the intelligent Agent, calculating an advantage function according to the accumulated return values and the expected return values, and updating parameters of the intelligent Agent based on a strategy neural network according to the advantage function; wherein the merit function may be expressed as:
A(s,a)=Q(s,a)-V(s)
wherein A(s, a), Q(s, a) and V(s) respectively denote the advantage function, the accumulated return value and the expected accumulated return value;
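A sketch of one strategy-network update using the standard PPO clipped surrogate is given below; the patent's specific "improved PPO" modifications are not reproduced, and a Gaussian policy head with fixed standard deviation is an assumption. The advantages passed in would be the accumulated returns minus the value-network predictions, as described above:

```python
import torch

def ppo_policy_update(policy_net, optimizer, obs, actions, advantages, old_log_probs,
                      clip_eps=0.2, action_std=0.1):
    """One clipped-surrogate PPO step (standard form, assumed hyperparameters)."""
    mean = policy_net(obs)                                   # obs: [N, 5] float tensor
    dist = torch.distributions.Normal(mean, action_std)      # fixed-std Gaussian head (assumption)
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    loss = -torch.min(surr1, surr2).mean()                   # maximize the clipped surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```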
and judging whether the strategy-based neural network of the intelligent Agent reaches a preset convergence condition, if so, stopping simulation training to obtain the landing control Agent, otherwise, reselecting an initial state to be simulated according to the initial condition generator, and starting the next round of interactive simulation training.
It should be noted that, in the reinforcement learning process, in order to improve the convergence rate of the network and avoid reaching the saturation region of the hidden-layer activation function as much as possible, it is preferable to normalize the network inputs: for the network input data, the mean and standard deviation of each dimension are computed and the data are scaled accordingly, for example as x_norm = (x − mean)/std;
meanwhile, for the output based on the strategy neural network, in order to satisfy the thrust constraint, it is preferable to perform a clipping operation on the total amplitude of the output thrust command, and the specific process is as follows:
wherein, a andrespectively representing thrust instructions before and after the amplitude limiting operation, and explaining other variables with reference to the previous description, which is not repeated herein;
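A minimal sketch of the input normalization is shown below, assuming the common (x − mean)/std scaling:

```python
import numpy as np

def normalize_inputs(batch):
    """Scale each input dimension to zero mean and unit standard deviation,
    as described for the network inputs (assumed standard form)."""
    mean = batch.mean(axis=0)
    std = batch.std(axis=0) + 1e-8
    return (batch - mean) / std, mean, std
```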
s14, generating a real-time control instruction according to the landing control Agent, and guiding the rocket landing flight according to the real-time control instruction; after the landing control Agent is obtained through the above interactive simulation training, the resulting strategy-based neural network can be used for online rocket landing guidance: without the aid of the value function-based neural network, corresponding control instructions can be given in real time according to the rocket's state during flight, guiding the rocket to complete a high-precision landing in an environment with deviations.
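Online use of the trained landing control Agent can be sketched as follows, reusing the state-construction, normalization and clipping sketches above; the value function-based neural network is not needed at this stage:

```python
import torch

def guidance_command(policy_net, obs_mean, obs_std, r, V, V0, T_min, T_max):
    """Map the observed rocket state to a real-time thrust command
    (illustrative deployment loop body, assumed interfaces)."""
    obs = build_state(r, V, V0)
    obs = (obs - obs_mean) / obs_std                  # same normalization as in training
    with torch.no_grad():
        thrust = policy_net(torch.as_tensor(obs, dtype=torch.float32)).numpy()
    return clip_thrust(thrust, T_min, T_max)          # enforce engine amplitude constraint
```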
According to the method, an engine thrust instruction that copes with wide-range rocket model deviations and environmental disturbances can be mapped from the rocket's real-time state, and a high-precision fixed-point soft landing can be performed under complex uncertainty. Moreover, a deep neural network is adopted as the strategy network for reinforcement learning and is trained by simulation with an improved PPO algorithm, which effectively fits high-dimensional continuous action-space instructions; the gaze-heuristic method is used to set the state quantities, and different discount rates are designed for the terminal and process indices of the landing trajectory, which accelerates convergence and effectively improves the learning efficiency of the fixed-point soft-landing strategy. Compared with the prior art, the method has extremely high real-time performance and strong algorithmic robustness, adapts to relatively wide modeling deviations, can still guide the rocket to a high-precision fixed-point soft landing under working conditions with uncertain environmental disturbances, and has high application value.
It should be noted that, although the steps in the above-described flowcharts are shown in sequence as indicated by arrows, the steps are not necessarily executed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise.
In one embodiment, as shown in fig. 6, there is provided a reinforcement learning-based rocket landing real-time robust guidance system, the system comprising:
the motion model building module 1 is used for building a rocket three-degree-of-freedom motion model according to acting force borne by the rocket in the earth landing power descent stage;
the optimization model building module 2 is used for building a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model;
the control strategy training module 3 is used for constructing an intelligent Agent according to the rocket landing Markov decision process model, and interactively training the intelligent Agent and a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value function-based neural network and a strategy-based neural network;
and the rocket landing guidance module 4 is used for generating a real-time control instruction according to the landing control Agent and guiding the rocket to land and fly according to the real-time control instruction.
For specific limitations of a rocket landing real-time robust guidance system based on reinforcement learning, reference may be made to the above limitations of a rocket landing real-time robust guidance method based on reinforcement learning, which are not described herein again. All modules in the rocket landing real-time robust guidance system based on reinforcement learning can be completely or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
Fig. 7 shows an internal structure diagram of a computer device in one embodiment, and the computer device may be specifically a terminal or a server. As shown in fig. 7, the computer apparatus includes a processor, a memory, a network interface, a display, and an input device, which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize a rocket landing real-time robust guidance method based on reinforcement learning. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those of ordinary skill in the art that the architecture shown in FIG. 7 is a block diagram of only a portion of the architecture associated with the subject application, and is not intended to limit the computing devices to which the subject application may be applied, as a particular computing device may include more or less components than those shown, or may combine certain components, or have a similar arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method.
In summary, the rocket landing real-time robust guidance method and system based on reinforcement learning provided by the embodiments of the invention adopt the technical scheme of constructing a rocket three-degree-of-freedom motion model according to the acting forces borne by the rocket during earth landing power descent flight, constructing a rocket landing Markov decision process model from the three-degree-of-freedom motion model, constructing an intelligent Agent from the Markov decision process model, training the intelligent Agent through interactive simulation with a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent, and guiding the rocket landing flight according to the real-time control instructions generated by the landing control Agent.
The embodiments in this specification are described in a progressive manner, and all the same or similar parts of the embodiments are directly referred to each other, and each embodiment is described with emphasis on differences from other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points. It should be noted that, the technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express some preferred embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these should be construed as the protection scope of the present application. Therefore, the protection scope of the present patent shall be subject to the protection scope of the claims.
Claims (10)
1. A rocket landing real-time robust guidance method based on reinforcement learning is characterized by comprising the following steps:
constructing a rocket three-degree-of-freedom motion model according to acting force borne by the rocket in the earth landing power descent stage;
constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model;
constructing an intelligent Agent according to the rocket landing Markov decision process model, and performing interactive training on the intelligent Agent and a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value function-based neural network and a strategy-based neural network;
and generating a real-time control instruction according to the landing control Agent, and guiding the rocket to land and fly according to the real-time control instruction.
2. A rocket landing real-time robust guidance method based on reinforcement learning as claimed in claim 1, wherein said step of constructing a rocket three-degree-of-freedom motion model according to the acting force applied to the rocket in the landing power descent stage of the earth comprises:
establishing a landing point coordinate system by taking the rocket sub-level target landing point as the origin; the landing point coordinate system is a coordinate system which takes a target landing point of rocket sublevel landing as a coordinate origin O, takes the vertically upward direction (away from the geocenter) as a coordinate axis Oz, takes the main flight direction of the rocket during landing as a coordinate axis Ox, and takes the direction which is perpendicular to the plane xOz and forms a right-hand rectangular coordinate system with the coordinate axis Ox and the coordinate axis Oz as a coordinate axis Oy;
based on the landing point coordinate system, carrying out stress analysis on the rocket flying in the earth landing power descent stage, and determining corresponding earth attraction, aerodynamic resistance and engine thrust;
constructing the rocket three-degree-of-freedom motion model according to the earth attraction, the aerodynamic resistance and the engine thrust; the rocket three-degree-of-freedom motion model is expressed as:

dr/dt = V
dV/dt = g(r) + (T + D)/m
dm/dt = −‖T‖/(I_sp·g_0)

with D = −C_D·S_ref·ρ·‖V‖·V/2 and ρ = ρ_0·exp(−h/h_ref),

where r represents the rocket position vector; V represents the rocket velocity vector; m represents the rocket mass; g(r) represents the gravitational acceleration vector acting on the rocket; T represents the engine thrust vector; D represents the aerodynamic resistance vector; I_sp represents the fuel specific impulse; g_0 represents the average gravitational acceleration at earth sea level; dm/dt is the second consumption (mass flow rate) of the propellant after the engine is started; C_D represents the drag coefficient; S_ref represents the reference area of the rocket substage; ρ_0 represents the reference atmospheric density at earth sea level; h represents the flight height of the rocket substage; h_ref represents the reference height.
3. The rocket landing real-time robust guidance method based on reinforcement learning according to claim 1, wherein the step of constructing a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model comprises:
transforming the rocket state variables based on the gaze-heuristic concept to obtain the state quantity of the rocket landing Markov decision process model; the state quantity S is built from the velocity error, the Z-axis position component and the remaining flight time, with the velocity error defined as

$$\boldsymbol{V}_{error} = \boldsymbol{V} - \boldsymbol{V}_{sight}$$

where S denotes the state quantity of the rocket landing Markov decision process model; r, V and $V_0$ denote the rocket position vector, the rocket velocity vector and the initial rocket speed, respectively; $t_{go}$ denotes the remaining flight time of the rocket; $r_z$ denotes the Z-axis component of the rocket position vector; $\boldsymbol{V}_{sight}$ denotes the sight-line vector; $\boldsymbol{V}_{error}$ denotes the error between the rocket velocity vector and the sight-line vector; and λ denotes a parameter that adjusts the magnitude of the sight-line vector over time;
obtaining the action quantity of the rocket landing Markov decision process model according to the control instruction of the rocket; the action quantity is expressed as:

$$A = \boldsymbol{T} = \left[\, T_x,\ T_y,\ T_z \,\right]^{\mathrm{T}}$$

where A denotes the action quantity of the rocket landing Markov decision process model; T denotes the engine thrust vector; and $T_x$, $T_y$ and $T_z$ denote the X-axis, Y-axis and Z-axis components of the engine thrust, respectively;
determining a reward function design principle according to the rocket fixed-point soft landing requirements, and obtaining the reward function of the rocket landing Markov decision process model according to the reward function design principle;
discretizing the continuous rocket landing process with a preset period, and determining the state transition probability of the rocket landing Markov decision process model by integrating the rocket dynamics.
4. The rocket landing real-time robust guidance method based on reinforcement learning according to claim 1, wherein the step of constructing an intelligent Agent according to the rocket landing Markov decision process model comprises:
selecting the proximal policy optimization (PPO) algorithm as the reinforcement learning algorithm of the intelligent Agent according to the rocket landing Markov decision process model;
constructing the value-function-based neural network and the policy-based neural network from multilayer perceptron models, based on the proximal policy optimization algorithm.
5. The rocket landing real-time robust guidance method based on reinforcement learning according to claim 1, wherein the step of constructing the rocket landing flight simulation environment comprises:
constructing a rocket landing running environment based on the rocket three-degree-of-freedom motion model, and constructing a corresponding initial-value condition generator and flight termination determiner, thereby obtaining the rocket landing flight simulation environment.
6. The rocket landing real-time robust guidance method based on reinforcement learning according to claim 5, wherein the step of interactively training the intelligent Agent with the pre-constructed rocket landing flight simulation environment to obtain a landing control Agent comprises:
training the policy-based neural network of the intelligent Agent to convergence through interactive simulation between the intelligent Agent and the rocket landing flight simulation environment, thereby obtaining the landing control Agent.
7. The rocket landing real-time robust guidance method based on reinforcement learning according to claim 6, wherein the step of training the policy-based neural network of the intelligent Agent to convergence through interactive simulation between the intelligent Agent and the rocket landing flight simulation environment, thereby obtaining the landing control Agent, comprises:
randomly selecting an initial state to be simulated from a preset initial-state space by means of the initial-value condition generator;
performing interactive simulation between the intelligent Agent and the flight simulation environment from the initial state to be simulated; terminating the simulated flight of the current round when a simulation termination condition preset in the flight termination determiner is reached; evaluating the accumulated return value of each state point in the current simulated flight trajectory according to the reward function; and updating the parameters of the value-function-based neural network of the intelligent Agent according to the accumulated return values;
predicting the expected accumulated return value of each state point in the current simulated flight trajectory with the updated value-function-based neural network of the intelligent Agent, computing an advantage function from the accumulated return values and the expected accumulated return values, and updating the parameters of the policy-based neural network of the intelligent Agent according to the advantage function;
judging whether the policy-based neural network of the intelligent Agent has reached a preset convergence condition; if so, stopping the simulation training to obtain the landing control Agent; otherwise, reselecting an initial state to be simulated by means of the initial-value condition generator and starting the next round of interactive simulation training.
8. A rocket landing real-time robust guidance system based on reinforcement learning, which is characterized in that the system comprises:
the motion model building module is used for constructing a rocket three-degree-of-freedom motion model according to the acting forces borne by the rocket in the Earth-landing powered descent phase;
the optimization model building module is used for building a rocket landing Markov decision process model according to the rocket three-degree-of-freedom motion model;
the control strategy training module is used for constructing an intelligent Agent according to the rocket landing Markov decision process model and interactively training the intelligent Agent with a pre-constructed rocket landing flight simulation environment to obtain a landing control Agent; the intelligent Agent comprises a value-function-based neural network and a policy-based neural network;
and the rocket landing guidance module is used for generating a real-time control instruction using the landing control Agent and guiding the rocket landing flight according to the real-time control instruction.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
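The three-degree-of-freedom model recited in claim 2 can be illustrated with a short Python sketch of the landing dynamics. This is a minimal sketch only: the flat-Earth constant gravity vector and every numerical constant below are assumptions chosen for illustration, not values taken from the specification.

```python
import numpy as np

# Illustrative constants (assumed values, not taken from the specification)
G0 = 9.80665          # sea-level gravitational acceleration, m/s^2
ISP = 300.0           # fuel specific impulse, s
CD = 1.0              # drag coefficient
S_REF = 10.0          # reference area of the rocket stage, m^2
RHO0 = 1.225          # sea-level atmospheric density, kg/m^3
H_REF = 7200.0        # reference (scale) height, m

def dynamics(state, thrust):
    """Three-degree-of-freedom rocket landing dynamics.

    state  = [rx, ry, rz, vx, vy, vz, m]   (landing-point frame, z up)
    thrust = [Tx, Ty, Tz]                  (engine thrust vector, N)
    Returns d(state)/dt.
    """
    r, v, m = state[0:3], state[3:6], state[6]
    T = np.asarray(thrust, dtype=float)

    g = np.array([0.0, 0.0, -G0])                        # flat-Earth gravity (assumption)
    h = r[2]                                             # flight height = z-component
    rho = RHO0 * np.exp(-h / H_REF)                      # exponential atmosphere
    D = -0.5 * rho * CD * S_REF * np.linalg.norm(v) * v  # aerodynamic drag

    r_dot = v
    v_dot = g + (T + D) / m
    m_dot = -np.linalg.norm(T) / (ISP * G0)              # propellant mass flow rate
    return np.concatenate([r_dot, v_dot, [m_dot]])
```

A guidance loop would integrate this right-hand side once per control period to propagate the rocket state.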
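The state and action quantities of claim 3 can be assembled as below. The exact sight-line-vector expression is not reproduced in the claim text, so the form of V_sight used here (pointing from the rocket toward the landing point, scaled by the initial speed and the parameter lam) is an assumption made purely for illustration.

```python
import numpy as np

def build_state(r, v, v0, t_go, lam=0.5):
    """Assemble an MDP state in the spirit of claim 3 (gaze-heuristic shaping).

    V_sight below is an ASSUMED form: it points from the current position toward
    the landing point (the origin), scaled by the initial speed and the parameter
    lam; the patent's exact expression is given in the specification.
    """
    r, v = np.asarray(r, dtype=float), np.asarray(v, dtype=float)
    v_sight = -lam * np.linalg.norm(v0) * r / (np.linalg.norm(r) + 1e-8)
    v_error = v - v_sight                      # velocity tracking error
    r_z = r[2]                                 # Z-axis (altitude) component
    return np.concatenate([v_error, [r_z, t_go]])

def build_action(T):
    """Action = engine thrust vector [Tx, Ty, Tz] (claim 3)."""
    return np.asarray(T, dtype=float)
```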
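For claim 4, the following sketch shows an intelligent Agent whose value-function network and policy network are multilayer perceptrons, written with PyTorch. The hidden-layer sizes, the Gaussian thrust policy and the five-dimensional state are illustrative assumptions rather than the patent's exact design.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=(64, 64)):
    """Small multilayer perceptron used for both networks (sizes are illustrative)."""
    layers, last = [], in_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.Tanh()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)

class Agent(nn.Module):
    """Value-function-based network and policy-based network of the intelligent Agent."""
    def __init__(self, state_dim=5, action_dim=3):
        super().__init__()
        self.value_net = mlp(state_dim, 1)            # state-value estimate V(s)
        self.policy_net = mlp(state_dim, action_dim)  # mean of a Gaussian thrust policy
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def value(self, s):
        return self.value_net(s).squeeze(-1)

    def act(self, s):
        mean = self.policy_net(s)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        a = dist.sample()
        return a, dist.log_prob(a).sum(-1)
```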
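Claim 5 pairs the dynamics with an initial-value condition generator and a flight termination determiner. The sketch below reuses the dynamics() function from the first sketch; the sampling bounds, integration step and termination thresholds are assumed, illustrative values.

```python
import numpy as np

class RocketLandingEnv:
    """Minimal landing-flight simulation environment in the spirit of claim 5."""
    def __init__(self, dt=0.1, seed=0):
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self):
        # Initial-value condition generator: sample a start state from a preset space.
        r = self.rng.uniform([-200, -200, 1500], [200, 200, 2500])
        v = self.rng.uniform([-20, -20, -90], [20, 20, -60])
        m = self.rng.uniform(20e3, 25e3)
        self.state = np.concatenate([r, v, [m]])
        return self.state.copy()

    def step(self, thrust):
        # One guidance period of flight (fixed-step Euler integration for brevity).
        self.state = self.state + self.dt * dynamics(self.state, thrust)
        done = self._terminated(self.state)
        return self.state.copy(), done

    @staticmethod
    def _terminated(s):
        # Flight termination determiner: touchdown or propellant exhaustion (assumed dry mass).
        return s[2] <= 0.0 or s[6] <= 18e3
```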
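Finally, the interaction-and-update cycle of claim 7 can be organized as follows. The featurize and reward_fn arguments are assumed hooks standing in for the state construction and reward function of claim 3, and the policy update is simplified to a plain advantage-weighted policy-gradient step rather than the full PPO clipped-surrogate objective described in claim 4.

```python
import torch

def cumulative_returns(rewards, gamma=0.99):
    """Accumulated return value of every state point along one simulated flight."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))

def train(agent, env, featurize, reward_fn, episodes=1000, lr=3e-4):
    """Interactive-simulation training loop following the structure of claim 7."""
    value_opt = torch.optim.Adam(agent.value_net.parameters(), lr=lr)
    policy_params = list(agent.policy_net.parameters()) + [agent.log_std]
    policy_opt = torch.optim.Adam(policy_params, lr=lr)

    for _ in range(episodes):
        # Roll out one simulated flight from a randomly generated initial state.
        state, done = env.reset(), False
        feats, logps, rewards = [], [], []
        while not done:
            s = torch.as_tensor(featurize(state), dtype=torch.float32)
            action, logp = agent.act(s)
            next_state, done = env.step(action.numpy())
            feats.append(s)
            logps.append(logp)
            rewards.append(reward_fn(state, action.numpy(), next_state, done))
            state = next_state

        # Evaluate accumulated returns and update the value-function-based network.
        returns = torch.as_tensor(cumulative_returns(rewards), dtype=torch.float32)
        feats_t = torch.stack(feats)
        value_loss = ((agent.value(feats_t) - returns) ** 2).mean()
        value_opt.zero_grad(); value_loss.backward(); value_opt.step()

        # Advantage = accumulated return - predicted expected return; update the policy network.
        with torch.no_grad():
            advantages = returns - agent.value(feats_t)
        policy_loss = -(torch.stack(logps) * advantages).mean()
        policy_opt.zero_grad(); policy_loss.backward(); policy_opt.step()
```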
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210972207.XA CN115524964B (en) | 2022-08-12 | 2022-08-12 | Rocket landing real-time robust guidance method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115524964A true CN115524964A (en) | 2022-12-27 |
CN115524964B CN115524964B (en) | 2023-04-11 |
Family
ID=84696584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210972207.XA Active CN115524964B (en) | 2022-08-12 | 2022-08-12 | Rocket landing real-time robust guidance method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115524964B (en) |
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020008657A1 (en) * | 1993-12-21 | 2002-01-24 | Aubrey B. Poore Jr | Method and system for tracking multiple regional objects by multi-dimensional relaxation |
US7454037B2 (en) * | 2005-10-21 | 2008-11-18 | The Boeing Company | System, method and computer program product for adaptive video processing |
CN107807617A (en) * | 2016-09-08 | 2018-03-16 | 通用电气航空系统有限责任公司 | Based on fuel, time and the improved flying vehicles control for consuming cost |
US20190018375A1 (en) * | 2017-07-11 | 2019-01-17 | General Electric Company | Apparatus and method for event detection and duration determination |
CN109343341A (en) * | 2018-11-21 | 2019-02-15 | 北京航天自动控制研究所 | It is a kind of based on deeply study carrier rocket vertically recycle intelligent control method |
CN110687918A (en) * | 2019-10-17 | 2020-01-14 | 哈尔滨工程大学 | Underwater robot trajectory tracking control method based on regression type neural network online approximation |
US20210123741A1 (en) * | 2019-10-29 | 2021-04-29 | Loon Llc | Systems and Methods for Navigating Aerial Vehicles Using Deep Reinforcement Learning |
CN111338375A (en) * | 2020-02-27 | 2020-06-26 | 中国科学院国家空间科学中心 | Control method and system for four-rotor unmanned aerial vehicle to move and land based on hybrid strategy |
CN111766782A (en) * | 2020-06-28 | 2020-10-13 | 浙江大学 | Strategy selection method based on Actor-Critic framework in deep reinforcement learning |
CN112069997A (en) * | 2020-09-04 | 2020-12-11 | 中山大学 | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net |
CN112278334A (en) * | 2020-11-06 | 2021-01-29 | 北京登火汇智科技有限公司 | Method for controlling the landing process of a rocket |
US20220234765A1 (en) * | 2021-01-25 | 2022-07-28 | Brian Haney | Precision Landing for Rockets using Deep Reinforcement Learning |
CN112818599A (en) * | 2021-01-29 | 2021-05-18 | 四川大学 | Air control method based on reinforcement learning and four-dimensional track |
CN113065709A (en) * | 2021-04-13 | 2021-07-02 | 西北工业大学 | Cross-domain heterogeneous cluster path planning method based on reinforcement learning |
CN113486938A (en) * | 2021-06-28 | 2021-10-08 | 重庆大学 | Multi-branch time convolution network-based re-landing analysis method and device |
CN113359843A (en) * | 2021-07-02 | 2021-09-07 | 成都睿沿芯创科技有限公司 | Unmanned aerial vehicle autonomous landing method and device, electronic equipment and storage medium |
CN114265308A (en) * | 2021-09-08 | 2022-04-01 | 哈尔滨工程大学 | Anti-saturation model-free preset performance track tracking control method for autonomous water surface vehicle |
Non-Patent Citations (4)
Title |
---|
WANG, JB et al.: "Optimal Rocket Landing Guidance Using Convex Optimization and Model Predictive Control" *
ZHANG, HY et al.: "Autonomous Navigation with Improved Hierarchical Neural Network Based on Deep Reinforcement Learning" *
WANG, Jinbo: "Research on Online Trajectory Optimization and Guidance Methods for Reusable Launch Vehicles" (in Chinese) *
CHENG, Zilong: "Research on Modeling and Optimization of a Crewed Lunar Exploration System Using a Reusable Earth-Moon Transfer Spacecraft" (in Chinese) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117688826A (en) * | 2023-07-13 | 2024-03-12 | 东方空间技术(山东)有限公司 | Sea-shooting rocket sub-level recovery method, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115524964B (en) | 2023-04-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109343341B (en) | Carrier rocket vertical recovery intelligent control method based on deep reinforcement learning | |
CN110806759B (en) | Aircraft route tracking method based on deep reinforcement learning | |
Han et al. | State-constrained agile missile control with adaptive-critic-based neural networks | |
CN111027143B (en) | Shipboard aircraft approach guiding method based on deep reinforcement learning | |
CN111538255B (en) | Anti-swarm unmanned aerial vehicle control method and system | |
CN111240345A (en) | Underwater robot trajectory tracking method based on double BP network reinforcement learning framework | |
CN112550770B (en) | Rocket soft landing trajectory planning method based on convex optimization | |
CN108984907A (en) | Iterative guidance method based on yaw angle conditions | |
CN114428517B (en) | End-to-end autonomous landing control method for unmanned plane and unmanned ship cooperative platform | |
CN115524964B (en) | Rocket landing real-time robust guidance method and system based on reinforcement learning | |
CN116697829A (en) | Rocket landing guidance method and system based on deep reinforcement learning | |
CN109144099B (en) | Fast evaluation method for unmanned aerial vehicle group action scheme based on convolutional neural network | |
CN113031448A (en) | Aircraft ascending section track optimization method based on neural network | |
CN102279568B (en) | Data control method used for formation flying | |
CN115289917B (en) | Rocket sublevel landing real-time optimal guidance method and system based on deep learning | |
Candeli et al. | A deep deterministic policy gradient learning approach to missile autopilot design | |
CN111830848A (en) | Unmanned aerial vehicle super-maneuvering flight performance simulation training system and method | |
CN118192584A (en) | Unmanned aerial vehicle boarding method, equipment and medium based on MPC-NDQN | |
CN117908565A (en) | Unmanned aerial vehicle safety path planning method based on maximum entropy multi-agent reinforcement learning | |
Al-Helal et al. | UAV search: Maximizing target acquisition | |
CN109062044B (en) | Terminal iterative learning docking control method | |
CN112380692B (en) | Method for planning online track in atmosphere of carrier rocket | |
CN106647327B (en) | Modeling method for longitudinal mandatory commands of the carrier landing signal officer based on virtual flight experience | |
CN113111433B (en) | Double-thread embedded real-time track optimization and guidance method | |
CN112161626B (en) | High-flyability route planning method based on route tracking mapping network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||