CN117762022A - satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning - Google Patents


Info

Publication number
CN117762022A
CN117762022A CN202410194622.6A CN202410194622A CN117762022A CN 117762022 A CN117762022 A CN 117762022A CN 202410194622 A CN202410194622 A CN 202410194622A CN 117762022 A CN117762022 A CN 117762022A
Authority
CN
China
Prior art keywords
satellite
control
orbit
reinforcement learning
robust
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410194622.6A
Other languages
Chinese (zh)
Other versions
CN117762022B (en)
Inventor
张鹏
陈谋
邵书义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202410194622.6A
Priority claimed from CN202410194622.6A
Publication of CN117762022A
Application granted
Publication of CN117762022B
Legal status: Active
Anticipated expiration

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning. For the discrete-time satellite orbit control problem with uncertainty, it designs a novel robust second-order approximate Hamilton-Jacobi-Bellman equation and, based on this equation, a policy iteration algorithm with a convergence property. The method not only handles the uncertainty in the system effectively but also ensures the stability of satellite orbit control, and the policy iteration method is convenient to apply in practical engineering.

Description

Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning
Technical Field
The invention relates to the field of intelligent control of satellite orbits, and in particular to an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning.
Background
Intelligent control of satellite orbits has long been a key topic in on-orbit servicing and is widely used in critical space missions such as active removal of space debris and deep-space exploration. For a satellite in space, thruster propellant is an essential resource for normal operation, so fuel consumption must be incorporated into the control performance index when designing an orbit control strategy. However, because satellite orbital dynamics are strongly nonlinear, designing an orbit controller with fuel as a performance index is difficult. In addition, uncertainties such as various perturbation forces can severely degrade orbit control accuracy during operation, which further increases the difficulty of designing a nonlinear optimal controller for the satellite.
Various control methods have been proposed for the optimal control of satellite orbits. Common strategies first linearize the satellite orbit system locally and then design the controller, but such local linearization can reduce control accuracy. Moreover, existing nonlinear optimal control methods do not consider the influence of uncertainty, so the resulting control strategy is not accurate enough. How to design a nonlinear robust optimal control method for the satellite orbit optimal control task while avoiding local linearization therefore remains a difficult problem.
Disclosure of Invention
The technical solution of the invention is as follows: for the optimal control problem of nonlinear satellite orbits, the invention overcomes the defects of the prior art, makes full use of the nonlinear approximation capability of neural networks, and provides an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning.
To achieve the above purpose, the invention adopts the following technical scheme:
An intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning comprises the following steps:
Step 1: establish a discrete-time satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal satellite control using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.
Further, step 1 specifically comprises the following:
Establish the satellite orbit control model:
Here the state is the vector of satellite position and velocity, and the model further contains the nonlinear term of the satellite orbit control system, the coefficient matrix of the control input, and the control input of the satellite. The nonlinear term and the coefficient matrix have the following specific form:
Here the parameters are the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius. The true anomaly and the reference orbit radius satisfy the following dynamic equations:
Further, the Euler discretization method is used to establish the discrete-time satellite orbit control model:
Here the model is written in terms of the sampling period, the value of the satellite state at the current sampling instant, and the value of the satellite input at the current sampling instant.
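The Euler discretization step can be sketched in Python as follows. This is only an illustration under stated assumptions: the symbols `f`, `g_times_u`, `T`, and `MU`, the planar point-mass two-body drift term, and the numerical values are hypothetical stand-ins and are not the specific model of the invention.

```python
import math

# Assumed value for illustration: Earth's gravitational parameter in m^3/s^2.
MU = 3.986004418e14

def f(x):
    """Hypothetical drift term: planar two-body point-mass dynamics.

    State layout (assumed): [pos_x, pos_y, vel_x, vel_y].
    """
    px, py, vx, vy = x
    r3 = math.hypot(px, py) ** 3
    return [vx, vy, -MU * px / r3, -MU * py / r3]

def g_times_u(u):
    """Hypothetical input map: thrust acceleration enters the velocity rows only."""
    return [0.0, 0.0, u[0], u[1]]

def euler_step(x, u, T):
    """One forward-Euler step: x_next = x + T * (f(x) + g u)."""
    fx = f(x)
    gu = g_times_u(u)
    return [xi + T * (fi + gi) for xi, fi, gi in zip(x, fx, gu)]

# Propagate an uncontrolled circular orbit for 100 sampling periods.
r0 = 7.0e6                      # assumed orbit radius in metres
v0 = math.sqrt(MU / r0)         # circular-orbit speed
x = [r0, 0.0, 0.0, v0]
T = 0.1                         # assumed sampling period in seconds
for _ in range(100):
    x = euler_step(x, [0.0, 0.0], T)
radius = math.hypot(x[0], x[1])
```

Forward Euler is first-order accurate, so the sampling period has to be small relative to the orbital period for the discrete model to track the continuous one; in this sketch the radius stays close to its initial value over the short propagation.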
Further, a discrete-time satellite orbit control model containing uncertainty is established:
Here the additional term is the unmatched uncertainty term of the satellite orbit control system, and it satisfies the following inequality:
Here the bounding term is a known, deterministic function of the state.
To quantify control performance, the satellite control cost function is set to:
Here the weighting matrix is an adjustable, known, positive-definite constant matrix. The controller is designed to minimize this control cost function.
Further, step 2 specifically comprises: establishing the robust second-order approximate Hamilton-Jacobi-Bellman equation using the second-order Taylor expansion:
and
Here the two derivative terms are the gradient vector and the Hessian matrix of the value function, respectively, with the following expressions:
Here the subscripted quantity denotes the corresponding element of the vector.
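The role of the gradient vector and the Hessian matrix in a second-order Taylor expansion of a value function can be illustrated with a short Python sketch. The quadratic function `value`, the state `x_k`, and the increment `dx` are hypothetical and chosen only so that the expansion can be checked by hand; they are not the value function of the invention.

```python
def value(x):
    """Hypothetical value function V(x) = x0^2 + 3*x0*x1 + 2*x1^2."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2

def gradient(x):
    """Analytic gradient of the hypothetical value function."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]]

def hessian(x):
    """Analytic Hessian of the hypothetical value function (constant here)."""
    return [[2.0, 3.0], [3.0, 4.0]]

def second_order_value(x, dx):
    """V(x + dx) ~ V(x) + grad(x)^T dx + 0.5 * dx^T H(x) dx."""
    grad = gradient(x)
    hess = hessian(x)
    n = len(x)
    first = sum(grad[i] * dx[i] for i in range(n))
    second = 0.5 * sum(dx[i] * hess[i][j] * dx[j]
                       for i in range(n) for j in range(n))
    return value(x) + first + second

x_k = [1.0, -0.5]
dx = [0.2, 0.1]
approx = second_order_value(x_k, dx)
exact = value([x_k[0] + dx[0], x_k[1] + dx[1]])
```

Because the example function is quadratic, the second-order expansion reproduces it exactly; for a general nonlinear value function the expansion is only an approximation whose error grows with the size of the state increment.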
Further, step 3 specifically comprises: based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: select an initial admissible control policy and initialize the computation error threshold.
Step 3.2: for each iteration number, compute the iterative value function according to the following equation:
Here the coefficient is an adjustable positive constant.
Step 3.3: based on the obtained value function, compute the control policy for the next iteration:
Wherein,
Step 3.4: compute the norm of the error between the control policies of two successive iterations. If the error exceeds the threshold, return to step 3.2; otherwise, stop the computation and output the optimal control policy.
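The structure of steps 3.1 to 3.4 (admissible initial policy, policy evaluation, policy improvement, stop when successive policies agree within a threshold) can be sketched on a much simpler problem. The example below is a stand-in, not the algorithm of the invention: it applies classical policy iteration to a hypothetical scalar linear system x_next = a*x + b*u with stage cost q*x^2 + r*u^2 and linear policy u = -k*x, and it contains neither the robust term nor the second-order approximate Hamilton-Jacobi-Bellman equation. All names and numbers are assumptions.

```python
def evaluate_policy(a, b, q, r, k):
    """Policy evaluation: coefficient p of V(x) = p*x^2 under u = -k*x.

    Requires an admissible (stabilizing) gain, i.e. |a - b*k| < 1.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not admissible: closed loop is unstable")
    return (q + r * k * k) / (1.0 - closed_loop * closed_loop)

def improve_policy(a, b, r, p):
    """Policy improvement: gain minimizing r*u^2 + p*(a*x + b*u)^2."""
    return a * b * p / (r + b * b * p)

def policy_iteration(a, b, q, r, k0, tol=1e-10, max_iter=100):
    """Iterate evaluation and improvement until the policy change is below tol."""
    k = k0                                        # step 3.1: admissible initial policy
    p = evaluate_policy(a, b, q, r, k)
    for i in range(max_iter):
        p = evaluate_policy(a, b, q, r, k)        # step 3.2: value of current policy
        k_next = improve_policy(a, b, r, p)       # step 3.3: next policy
        if abs(k_next - k) <= tol:                # step 3.4: stopping test
            return k_next, p, i + 1
        k = k_next
    return k, p, max_iter

a, b, q, r = 1.1, 1.0, 1.0, 1.0                   # hypothetical unstable plant and weights
k_opt, p_opt, iterations = policy_iteration(a, b, q, r, k0=0.5)
```

In this scalar setting the converged value coefficient satisfies the discrete-time Riccati equation, which gives an independent check on the iteration; the method of the invention replaces the evaluation and improvement formulas with those derived from its own equation.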
The beneficial effects of the invention are as follows: for the discrete-time satellite orbit control problem with uncertainty, the invention designs a novel robust second-order approximate Hamilton-Jacobi-Bellman equation and a policy iteration algorithm with a convergence property. The invention not only handles the nonlinear characteristics of the satellite orbit but also mitigates the adverse effect of orbit uncertainty, thereby ensuring the control accuracy of the satellite orbit.
Drawings
FIG. 1 is a graph of the position error of a satellite versus a desired orbit in accordance with the present invention;
FIG. 2 is a graph of the velocity error of a satellite versus a desired orbit in accordance with the present invention;
FIG. 3 is a flowchart of the algorithm of the present invention.
Detailed Description
The principles and features of the present invention are described below; the examples are given for the purpose of illustration only and are not intended to limit the scope of the invention.
As shown in Fig. 3, an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning comprises the following steps:
Step 1: establish a discrete-time satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.
The above steps are described in further detail below in connection with specific examples:
In step 1, a satellite orbit control model, a discrete-time satellite orbit control model, and a discrete-time satellite orbit control model containing uncertainty are established in sequence, as follows:
First step: establish the satellite orbit control model:
Here the state is the vector of satellite position and velocity, and the model further contains the nonlinear term of the satellite orbit control system, the coefficient matrix of the control input, and the control input of the satellite. The nonlinear term and the coefficient matrix have the following specific form:
Here the parameters are the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius. The true anomaly and the reference orbit radius satisfy the following dynamic equations:
In the prior art, optimal satellite orbit control strategies commonly linearize the satellite orbit system locally before designing the controller, which reduces the control accuracy of the satellite orbit. This embodiment of the invention overcomes that defect by exploiting the nonlinear approximation capability of a neural network, thereby effectively improving control accuracy. Specific initial state parameter values are assigned in this embodiment.
Second step: establish the discrete-time satellite orbit control model using the Euler discretization method:
Here the model is written in terms of the sampling period, the value of the satellite state at the current sampling instant, and the value of the satellite input at the current sampling instant. In this embodiment of the invention, a specific value is assigned to the parameter.
Third step: establish the discrete-time satellite orbit control model containing uncertainty:
Here the additional term is the unmatched uncertainty term of the satellite orbit control system, and it satisfies the following inequality:
Here the bounding term is a known, deterministic function of the state.
In this embodiment, the control accuracy of the satellite orbit is further improved by introducing an uncertainty term into the discrete control model.
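The idea of an uncertainty term bounded by a known function of the state can be sketched as follows. The exact form of the inequality is not reproduced here, so the sketch assumes a simple norm bound; the functions `disturbance` and `bound`, their coefficients, and the sample states are hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def disturbance(x):
    """Hypothetical uncertainty term; stands in for unmodeled perturbations."""
    return [0.01 * math.sin(xi) for xi in x]

def bound(x):
    """Hypothetical known bounding function of the state."""
    return 0.02 * norm(x)

def uncertain_step(nominal_next, x):
    """Add the uncertainty term to a nominal next state computed elsewhere."""
    return [ni + di for ni, di in zip(nominal_next, disturbance(x))]

def bound_holds(x):
    """Check the assumed norm bound ||d(x)|| <= bound(x) at one state."""
    return norm(disturbance(x)) <= bound(x)

samples = [[0.0, 0.0], [1.0, -2.0], [10.0, 3.0], [-0.3, 0.4]]
all_ok = all(bound_holds(x) for x in samples)
```

A robust design only needs the bounding function, not the uncertainty itself, which is why the bound is required to be known while the uncertainty term is not.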
In another embodiment, as a further improvement of the present invention, the satellite control cost function is set as follows to quantify control performance:
Here the weighting matrix is an adjustable, known, positive-definite constant matrix. The controller is designed to minimize this control cost function. In this embodiment, the weighting matrix is specified as a diagonal matrix,
where the operator notation denotes the diagonal matrix formed from the elements in brackets.
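A cost function with diagonal weighting matrices can be sketched in Python as below. The quadratic state-plus-input form, the names `Q` and `R`, and all numerical weights are assumptions made for illustration; penalizing the input is one common way to reflect fuel consumption in the performance index, as the Background motivates.

```python
def diag(entries):
    """Build a diagonal matrix (list of rows) from the given entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def quad_form(v, m):
    """Compute v^T m v for a vector v and a square matrix m."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))

def stage_cost(x, u, Q, R):
    """Assumed quadratic stage cost: x^T Q x + u^T R u."""
    return quad_form(x, Q) + quad_form(u, R)

def total_cost(states, inputs, Q, R):
    """Sum the stage cost along a finite trajectory."""
    return sum(stage_cost(x, u, Q, R) for x, u in zip(states, inputs))

Q = diag([1.0, 2.0])      # hypothetical state weights
R = diag([0.5])           # hypothetical input (fuel) weight
cost_one = stage_cost([1.0, 2.0], [2.0], Q, R)
cost_traj = total_cost([[1.0, 2.0], [0.5, 0.0]], [[2.0], [0.0]], Q, R)
```

Larger entries in the input weight trade tracking speed for lower control effort, which is the tuning knob the adjustable weighting matrix provides.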
In step 2, the robust second-order approximate Hamilton-Jacobi-Bellman equation is established using the second-order Taylor expansion:
and
Here the two derivative terms are the gradient vector and the Hessian matrix of the value function, respectively, with the following expressions:
Here the subscripted quantity denotes the corresponding element of the vector.
In step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: select an initial admissible control policy and initialize the computation error threshold. In this embodiment, specific values are assigned to these quantities.
Step 3.2: for each iteration number, compute the iterative value function according to the following equation:
Here the coefficient is an adjustable positive constant. In this embodiment, a specific value is assigned to this constant.
Step 3.3: based on the obtained value function, compute the control policy for the next iteration:
Wherein,
Step 3.4: compute the norm of the error between the control policies of two successive iterations. If the error exceeds the threshold, return to step 3.2; otherwise, stop the computation and output the optimal control policy.
Figs. 1 and 2 show the simulation results of this example. Fig. 1 depicts the position error curve of the satellite relative to the desired orbit; it shows that, with the satellite control method of the present application, the satellite successfully transfers to the desired orbital position after a period of time. Fig. 2 depicts the velocity error curve of the satellite relative to the desired orbit; the analysis shows that the relative velocity error eventually converges to zero.
The foregoing describes preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope of protection.

Claims (6)

1. An intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning, characterized by comprising the following steps:
Step 1: establish a discrete-time satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.
2. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 1, wherein a satellite orbit control model is established:
where the state is the vector of satellite position and velocity, and the model further contains the nonlinear term of the satellite orbit control system, the coefficient matrix of the control input, and the control input of the satellite;
the nonlinear term and the coefficient matrix have the following specific form:
where the parameters are the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius;
the true anomaly and the reference orbit radius satisfy the following dynamic equations:
3. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 2, wherein a discrete-time satellite orbit control model is established using the Euler discretization method:
where the model is written in terms of the sampling period, the value of the satellite state at the current sampling instant, and the value of the satellite input at the current sampling instant.
4. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 2, wherein a discrete-time satellite orbit control model containing uncertainty is established:
where the additional term is the unmatched uncertainty term of the satellite orbit control system and satisfies the following inequality:
where the bounding term is a known, deterministic function of the state.
5. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 4, wherein the satellite control cost function is set as:
where the weighting matrix is an adjustable, known, positive-definite constant matrix, and the controller is designed to minimize this control cost function.
6. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 5, wherein in step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: select an initial admissible control policy and initialize the computation error threshold;
Step 3.2: for each iteration number, compute the iterative value function according to the following equation:
where the coefficient is an adjustable positive constant;
Step 3.3: based on the obtained value function, compute the control policy for the next iteration:
Wherein,
Step 3.4: compute the norm of the error between the control policies of two successive iterations; if the error exceeds the threshold, return to step 3.2; otherwise, stop the computation and output the optimal control policy.
CN202410194622.6A 2024-02-22 Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning Active CN117762022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410194622.6A CN117762022B (en) 2024-02-22 Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410194622.6A CN117762022B (en) 2024-02-22 Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning

Publications (2)

Publication Number Publication Date
CN117762022A (en) 2024-03-26
CN117762022B (en) 2024-05-14

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1455992A (en) * 2000-09-28 2003-11-12 SES Astra S.A. Spread spectrum communication system using quasi-geostationary satellite
CN108196446A (en) * 2017-12-14 2018-06-22 Beijing Institute of Technology The Dynamic Programming method for optimally controlling of the bi-motor load of unknown-model
CN111874267A (en) * 2020-04-30 2020-11-03 Space Engineering University of the PLA Strategic Support Force Low-orbit satellite off-orbit control method and system based on particle swarm optimization
CN113128828A (en) * 2021-03-05 2021-07-16 National Space Science Center, Chinese Academy of Sciences Satellite observation distributed online planning method based on multi-agent reinforcement learning
CN117579126A (en) * 2023-11-21 2024-02-20 Chongqing University of Posts and Telecommunications Satellite mobile edge calculation unloading decision method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
Reddy et al. Reliability based structural optimization: a simplified safety index approach
CN109255096B (en) Geosynchronous satellite orbit uncertain evolution method based on differential algebra
Li et al. An approach and landing guidance design for reusable launch vehicle based on adaptive predictor–corrector technique
CN105203110A (en) Low-orbit-satellite orbit prediction method based on atmospheric resistance model compensation
CN114839880B (en) Self-adaptive control method based on flexible joint mechanical arm
CN114879515A (en) Spacecraft attitude reconstruction fault-tolerant control method based on learning neural network
CN117762022A (en) satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning
CN110015445B (en) Earth-moon L2 point Halo track maintaining method
CN117762022B (en) Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning
CN111624872B (en) PID controller parameter setting method and system based on adaptive dynamic programming
US6317662B1 (en) Stable and verifiable state estimation methods and systems with spacecraft applications
Jayakumar et al. A computational method for solving singular perturbation problems
CN115993777A (en) Track perturbation model inversion-based diameter-cut joint control decoupling iteration calibration method
CN114063458A (en) Preset performance control method of non-triangular structure system independent of initial conditions
CN110032066B (en) Adaptive iterative learning control method for fractional order nonlinear system trajectory tracking
Yuan et al. Uncertainty-resilient constrained rendezvous trajectory optimization via stochastic feedback control and unscented transformation
CN114200491A (en) Navigation data-based emergency spacecraft ephemeris determination method and system
Rigatos et al. Nonlinear optimal control for autonomous hypersonic vehicles
Burlion et al. Controls for a nonlinear system arising in vision‐based landing of airliners
Jia et al. Collision avoidance in target encirclement and tracking of unmanned aerial vehicles under a dynamic event-triggered formation control
Das et al. Optimal nonlinear control and estimation for a reusable launch vehicle during reentry phase
CN113297666B (en) Design method for high-precision control of spacecraft
CN113886947B (en) Aircraft static aeroelastic system output state quantity interval determination method based on iteration strategy
CN114397906B (en) Rapid high-precision calculation method for earth stationary satellite electric propulsion transfer
CN116202535B (en) Initial value intelligent optimized spacecraft angle measurement-only ultrashort arc initial orbit determination method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant