CN117762022A - satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning - Google Patents
- Publication number
- CN117762022A (application number CN202410194622.6A)
- Authority
- CN
- China
- Prior art keywords
- satellite
- control
- orbit
- reinforcement learning
- robust
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention relates to an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning. Addressing the discrete-time satellite orbit control problem with uncertainty, it formulates a new robust second-order approximate Hamilton-Jacobi-Bellman (HJB) equation and, based on that equation, designs a policy iteration algorithm with a convergence property. The method handles the uncertainty in the system while ensuring the stability of satellite orbit control, and the policy iteration method is convenient to apply in practical engineering.
Description
Technical Field
The invention relates to the field of intelligent control of satellite orbits, and in particular to an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning.
Background
Intelligent control of satellite orbits has long been a key topic in satellite on-orbit servicing and is widely used in critical space missions such as active removal of space debris and interplanetary exploration. For a satellite, thruster propellant is an important resource for ensuring normal operation, so fuel consumption needs to be incorporated into the control performance index when designing an orbit control strategy. However, because satellite orbit dynamics are strongly nonlinear, designing an orbit controller with fuel as a performance index becomes more difficult. In addition, during operation, uncertainties such as various perturbation forces can seriously reduce orbit control accuracy, which further increases the difficulty of designing a nonlinear optimal controller for the satellite.
Various control methods have been proposed for the optimal control of satellite orbits. Common orbit control strategies first linearize the satellite orbit system locally and then design the controller; however, such local linearization may reduce the control accuracy. In addition, existing nonlinear optimal control methods do not consider the influence of uncertainty, so the resulting control strategy is not accurate enough. How to design a nonlinear robust optimal control method for the satellite orbit optimal control task while avoiding local linearization therefore remains a difficult problem.
Disclosure of Invention
The technical problem addressed by the invention is as follows: for the optimal control problem of the nonlinear satellite orbit, the invention overcomes the shortcomings of the prior art, makes full use of the nonlinear approximation capability of neural networks, and provides an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning.
To achieve the above purpose, the invention adopts the following technical scheme:
An intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning comprises the following steps:
Step 1: establish a discrete satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman (HJB) equation for optimal satellite control using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate HJB equation.
Further, step 1 specifically comprises the following:
Establish the satellite orbit control model:
The model has four components: the state, which is the vector of satellite position and velocity; the nonlinear terms of the satellite orbit control system; the coefficient matrix of the control input; and the control input of the satellite. The nonlinear terms and the coefficient matrix take the following specific forms:
These forms involve the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius. The reference orbit radius and true anomaly satisfy the following dynamic equations:
Further, the Euler discretization method is used to establish the discrete control model of the satellite orbit:
The discrete model is written in terms of the sampling period, the value of the satellite state at a given sampling instant, and the value of the satellite control input at that instant.
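The Euler discretization step can be illustrated with a minimal sketch. It assumes a control-affine continuous-time model with drift `f`, input matrix `g`, and sampling period `T`; these names, and the double-integrator stand-in dynamics, are illustrative assumptions rather than the patent's satellite model.

```python
import numpy as np

def euler_discretize(f, g, T):
    """Build the one-step map x_{k+1} = x_k + T * (f(x_k) + g(x_k) @ u_k).

    f, g and T are assumed names for the nonlinear term, the input
    coefficient matrix and the sampling period.
    """
    def step(x, u):
        x = np.asarray(x, dtype=float)
        u = np.asarray(u, dtype=float)
        return x + T * (f(x) + g(x) @ u)
    return step

# Hypothetical stand-in dynamics (a double integrator), not the satellite model.
def f_demo(x):
    return np.array([x[1], 0.0])

def g_demo(x):
    return np.array([[0.0], [1.0]])

step = euler_discretize(f_demo, g_demo, T=0.1)
x1 = step([1.0, 2.0], [3.0])  # one forward-Euler step from x0 = [1, 2] with u0 = [3]
```

Forward Euler keeps the discrete model explicit in the current state and input, which is convenient for iterative schemes, at the price of a truncation error that grows with the sampling period.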
Further, a discrete satellite orbit control model containing uncertainty is established:
The added term is an unmatched uncertainty of the satellite orbit control system and satisfies the following inequality:
The bound in this inequality is a known, deterministic function of the state.
To quantify control performance, the satellite control cost function is set to:
The weighting matrix in the cost function is an adjustable, known positive constant matrix. The controller is designed to minimize this control cost function.
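As a rough illustration of such a cost function, the sketch below assumes a quadratic stage cost with a state weight `Q` and an input weight `R`, summed over a finite trajectory. The quadratic form, both matrix names, and the sample data are assumptions made for this example.

```python
import numpy as np

def stage_cost(x, u, Q, R):
    """Assumed quadratic stage cost x^T Q x + u^T R u."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(x @ Q @ x + u @ R @ u)

def truncated_cost(xs, us, Q, R):
    """Sum of stage costs along a finite state/input trajectory."""
    return sum(stage_cost(x, u, Q, R) for x, u in zip(xs, us))

# Hypothetical weights and trajectory, chosen only to make the arithmetic easy.
Q = np.diag([1.0, 2.0])
R = np.diag([0.5])
xs = [[1.0, 0.0], [0.0, 1.0]]
us = [[2.0], [0.0]]
J = truncated_cost(xs, us, Q, R)  # (1 + 0.5 * 4) + 2 = 5.0
```

Penalizing the control input through `R` is one common way to reflect fuel use in the performance index.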
Further, step 2 specifically comprises: establishing the robust second-order approximate Hamilton-Jacobi-Bellman equation using the second-order Taylor expansion:
and
The two derivative terms are the gradient vector and the Hessian matrix of the value function, respectively, expressed as follows:
The derivatives are taken with respect to the individual elements of the vector argument.
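For orientation, the generic second-order Taylor step behind such an approximation can be written out as follows. The notation (value function V, state x_k, stage cost U) is assumed for this sketch, and the robust terms of the patent's equation are not reproduced here.

```latex
% Assumed notation: V = value function, x_k = state at instant k, U = stage cost.
\begin{align}
\Delta x_k &= x_{k+1} - x_k, \\
V(x_{k+1}) &\approx V(x_k) + \nabla V(x_k)^{\top} \Delta x_k
  + \tfrac{1}{2}\, \Delta x_k^{\top}\, \nabla^2 V(x_k)\, \Delta x_k, \\
\nabla V(x) &= \Big[ \tfrac{\partial V}{\partial x_1}, \dots, \tfrac{\partial V}{\partial x_n} \Big]^{\top},
\qquad
\big[ \nabla^2 V(x) \big]_{ij} = \frac{\partial^2 V}{\partial x_i \, \partial x_j}, \\
% Substituting into the Bellman equation V(x_k) = U(x_k, u_k) + V(x_{k+1}):
0 &\approx U(x_k, u_k) + \nabla V(x_k)^{\top} \Delta x_k
  + \tfrac{1}{2}\, \Delta x_k^{\top}\, \nabla^2 V(x_k)\, \Delta x_k .
\end{align}
```

The last line shows why only the gradient and Hessian of the value function at the current state are needed once the expansion is truncated at second order.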
Further, step 3 specifically comprises: based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: First, select an initial admissible control policy and initialize the computation error threshold.
Step 3.2: For each iteration index, the iterative value function is calculated according to the following equation:
The equation contains an adjustable positive constant.
Step 3.3: Based on the value function obtained, the control policy for the next iteration is calculated:
where
Step 3.4: Calculate the norm of the error between two successive control policies. If the error is within the threshold, stop the computation and output the optimal control policy; otherwise, return to step 3.2.
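The evaluate, improve, and compare structure of steps 3.1 to 3.4 can be sketched on a simpler stand-in problem. The example below swaps in classical policy iteration for a discrete-time linear-quadratic regulator, not the patent's robust second-order HJB update; the matrices `A`, `B`, `Q`, `R`, the initial gain `K0`, and the threshold `eps` are all hypothetical.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Qc + Ac^T P Ac for the fixed gain K."""
    Ac = A - B @ K
    Qc = Q + K.T @ R @ K
    n = A.shape[0]
    # Vectorized discrete Lyapunov equation: (I - Ac^T (x) Ac^T) vec(P) = vec(Qc)
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    return np.linalg.solve(M, Qc.reshape(-1)).reshape(n, n)

def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B^T P B)^{-1} B^T P A."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, eps=1e-9, max_iter=50):
    """Iterate until two successive policies differ by at most eps in norm."""
    K = K0
    for i in range(1, max_iter + 1):
        P = evaluate_policy(A, B, Q, R, K)   # analogue of step 3.2
        K_next = improve_policy(A, B, R, P)  # analogue of step 3.3
        err = np.linalg.norm(K_next - K)     # analogue of step 3.4
        K = K_next
        if err <= eps:
            break
    return K, P, i

# Hypothetical double-integrator example with a stabilizing initial gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])
K, P, iters = policy_iteration(A, B, Q, R, K0)
```

The initial gain must be stabilizing for the evaluation step to be well defined, which mirrors the requirement in step 3.1 that the initial control policy be admissible.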
The beneficial effects of the invention are as follows: for the discrete-time satellite orbit control problem with uncertainty, the invention designs a new robust second-order approximate Hamilton-Jacobi-Bellman equation and a policy iteration algorithm with a convergence property. The invention handles the nonlinear characteristics of the satellite orbit, counteracts the adverse effect of orbit uncertainty, and ensures the control accuracy of the satellite orbit.
Drawings
FIG. 1 is a graph of the position error of a satellite versus a desired orbit in accordance with the present invention;
FIG. 2 is a graph of the velocity error of a satellite versus a desired orbit in accordance with the present invention;
FIG. 3 is a flowchart of the algorithm of the present invention.
Detailed Description
The principles and features of the present invention are described below; the examples are given for illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 3, an intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning comprises the following steps:
Step 1: establish a discrete satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.
The steps above are described in further detail below with reference to a specific example.
In step 1, a satellite orbit control model, a discrete control model of the satellite orbit, and a discrete control model of the satellite orbit containing uncertainty are established in sequence, as follows.
First step: establish the satellite orbit control model:
The model has four components: the state, which is the vector of satellite position and velocity; the nonlinear terms of the satellite orbit control system; the coefficient matrix of the control input; and the control input of the satellite. The nonlinear terms and the coefficient matrix take the following specific forms:
These forms involve the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius. The reference orbit radius and true anomaly satisfy the following dynamic equations:
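As a hedged illustration of reference-orbit dynamics under two-body gravity, the sketch below integrates the standard planar polar-form equations r'' = r θ'^2 - μ/r^2 and θ'' = -2 r' θ'/r. Treating these as the reference-orbit equations, the Earth value of μ, the initial conditions, and the RK4 integrator are all assumptions of this example.

```python
import numpy as np

MU = 398600.4418  # km^3/s^2, Earth's gravitational parameter (assumed central body)

def reference_orbit_rhs(s, mu=MU):
    """Planar two-body dynamics in polar form for s = [r, r_dot, theta, theta_dot]."""
    r, r_dot, theta, theta_dot = s
    r_ddot = r * theta_dot**2 - mu / r**2
    theta_ddot = -2.0 * r_dot * theta_dot / r
    return np.array([r_dot, r_ddot, theta_dot, theta_ddot])

def rk4_step(rhs, s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(s)
    k2 = rhs(s + 0.5 * dt * k1)
    k3 = rhs(s + 0.5 * dt * k2)
    k4 = rhs(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def propagate(s0, dt, steps):
    """Propagate the reference orbit state over a fixed number of steps."""
    s = np.asarray(s0, dtype=float)
    for _ in range(steps):
        s = rk4_step(reference_orbit_rhs, s, dt)
    return s

def angular_momentum(s):
    """Specific angular momentum r^2 * theta_dot, conserved in two-body motion."""
    return s[0] ** 2 * s[3]

# Hypothetical initial condition: 7000 km radius, slightly faster than circular.
r0 = 7000.0
s0 = np.array([r0, 0.0, 0.0, 1.05 * np.sqrt(MU / r0**3)])
s1 = propagate(s0, dt=1.0, steps=1000)
```

Checking that `angular_momentum` stays constant along the trajectory is a quick sanity test of both the equations and the integrator.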
This embodiment addresses a defect of the prior art, in which optimal control strategies for satellite orbits commonly linearize the orbit system locally before designing the controller and thereby lose control accuracy. By exploiting the nonlinear approximation capability of a neural network, this loss of accuracy is avoided and the control accuracy is effectively improved. The specific initial state parameter values are as follows:
Second step: establish the discrete control model of the satellite orbit using the Euler discretization method:
The discrete model is written in terms of the sampling period, the value of the satellite state at a given sampling instant, and the value of the satellite control input at that instant. In this embodiment, the sampling period is assigned a fixed value.
Third step: establish the discrete satellite orbit control model containing uncertainty:
The added term is an unmatched uncertainty of the satellite orbit control system and satisfies the following inequality:
The bound in this inequality is a known, deterministic function of the state.
In this embodiment, the control accuracy of the satellite orbit is further improved by introducing an uncertainty term in the discrete control model.
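A state-dependent bound on an additive uncertainty term can be illustrated as follows. The sketch assumes the inequality is a norm bound of the form ||d(x)|| <= d_bound(x); the disturbance `d_demo`, the bound `d_bound_demo`, and the sampled states are hypothetical.

```python
import numpy as np

def d_demo(x):
    """Hypothetical additive uncertainty term (not the patent's perturbation model)."""
    return 0.01 * np.sin(x)

def d_bound_demo(x):
    """Hypothetical known bound; valid here because |sin(a)| <= |a| elementwise."""
    return 0.01 * np.linalg.norm(x)

def satisfies_bound(d, d_bound, samples, tol=1e-12):
    """Check the assumed norm bound ||d(x)|| <= d_bound(x) on sampled states."""
    return all(np.linalg.norm(d(x)) <= d_bound(x) + tol for x in samples)

def uncertain_step(nominal_step, d, x, u):
    """Discrete model with uncertainty: nominal update plus the term d(x)."""
    x = np.asarray(x, dtype=float)
    return nominal_step(x, u) + d(x)

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 4))
bound_ok = satisfies_bound(d_demo, d_bound_demo, samples)
```

A sampled check like this does not prove the bound, but it is a cheap way to catch a bound function that is too optimistic before using it in a robust design.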
In another embodiment, as a further improvement of the invention, the satellite control cost function is set as follows to quantify control performance:
The weighting matrix in the cost function is an adjustable, known positive constant matrix. The controller is designed to minimize this control cost function. In this embodiment, the weighting matrix is diagonal:
where diag(·) denotes the diagonal matrix formed from the elements in parentheses.
In step 2, the robust second-order approximate Hamilton-Jacobi-Bellman equation is established using the second-order Taylor expansion:
and
The two derivative terms are the gradient vector and the Hessian matrix of the value function, respectively, expressed as follows:
The derivatives are taken with respect to the individual elements of the vector argument.
In step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: First, select an initial admissible control policy and initialize the computation error threshold. In this embodiment, these are assigned specific values.
Step 3.2: For each iteration index, the iterative value function is calculated according to the following equation:
The equation contains an adjustable positive constant, which is assigned a specific value in this embodiment.
Step 3.3: Based on the value function obtained, the control policy for the next iteration is calculated:
where
Step 3.4: Calculate the norm of the error between two successive control policies. If the error is within the threshold, stop the computation and output the optimal control policy; otherwise, return to step 3.2.
FIGS. 1 and 2 show the simulation results of this example. FIG. 1 plots the position error of the satellite relative to the desired orbit; it shows that, with the satellite control method of the present application, the satellite successfully transfers to the desired orbital position after a period of time. FIG. 2 plots the velocity error of the satellite relative to the desired orbit; the relative velocity error eventually converges to zero.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (6)
1. An intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning, characterized by comprising the following steps:
Step 1: establish a discrete satellite orbit control model containing uncertainty according to two-body dynamics;
Step 2: establish a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit using the second-order Taylor expansion;
Step 3: design a policy iteration algorithm for satellite orbit control based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.
2. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 1, wherein a satellite orbit control model is established:
The model has four components: the state, which is the vector of satellite position and velocity; the nonlinear terms of the satellite orbit control system; the coefficient matrix of the control input; and the control input of the satellite;
the nonlinear terms and the coefficient matrix take the following specific forms:
These forms involve the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius;
the reference orbit radius and true anomaly satisfy the following dynamic equations:
3. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 2, wherein a discrete control model of the satellite orbit is established using the Euler discretization method:
The discrete model is written in terms of the sampling period, the value of the satellite state at a given sampling instant, and the value of the satellite control input at that instant.
4. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 2, wherein a discrete satellite orbit control model containing uncertainty is established:
The added term is an unmatched uncertainty of the satellite orbit control system and satisfies the following inequality:
The bound in this inequality is a known, deterministic function of the state.
5. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 4, wherein the satellite control cost function is set to:
The weighting matrix in the cost function is an adjustable, known positive constant matrix; the controller is designed to minimize this control cost function.
6. The intelligent robust near-optimal satellite orbit control method based on discrete-time reinforcement learning according to claim 5, wherein in step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:
Step 3.1: First, select an initial admissible control policy and initialize the computation error threshold;
Step 3.2: For each iteration index, the iterative value function is calculated according to the following equation:
The equation contains an adjustable positive constant;
Step 3.3: Based on the value function obtained, the control policy for the next iteration is calculated:
where
Step 3.4: Calculate the norm of the error between two successive control policies; if the error is within the threshold, stop the computation and output the optimal control policy; otherwise, return to step 3.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410194622.6A CN117762022B (en) | 2024-02-22 | Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410194622.6A CN117762022B (en) | 2024-02-22 | Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117762022A (en) | 2024-03-26 |
CN117762022B (en) | 2024-05-14 |
Family
ID=
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1455992A (en) * | 2000-09-28 | 2003-11-12 | Ses阿斯特拉有限公司 | Spread spectrum communication system using quasi-geostationary satellite |
CN108196446A (en) * | 2017-12-14 | 2018-06-22 | 北京理工大学 | The Dynamic Programming method for optimally controlling of the bi-motor load of unknown-model |
CN111874267A (en) * | 2020-04-30 | 2020-11-03 | 中国人民解放军战略支援部队航天工程大学 | Low-orbit satellite off-orbit control method and system based on particle swarm optimization |
CN113128828A (en) * | 2021-03-05 | 2021-07-16 | 中国科学院国家空间科学中心 | Satellite observation distributed online planning method based on multi-agent reinforcement learning |
CN117579126A (en) * | 2023-11-21 | 2024-02-20 | 重庆邮电大学 | Satellite mobile edge calculation unloading decision method based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Reddy et al. | Reliability based structural optimization: a simplified safety index approach | |
CN109255096B (en) | Geosynchronous satellite orbit uncertain evolution method based on differential algebra | |
Li et al. | An approach and landing guidance design for reusable launch vehicle based on adaptive predictor–corrector technique | |
CN105203110A (en) | Low-orbit-satellite orbit prediction method based on atmospheric resistance model compensation | |
CN114839880B (en) | Self-adaptive control method based on flexible joint mechanical arm | |
CN114879515A (en) | Spacecraft attitude reconstruction fault-tolerant control method based on learning neural network | |
CN117762022A (en) | satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning | |
CN110015445B (en) | Earth-moon L2 point Halo track maintaining method | |
CN117762022B (en) | Satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning | |
CN111624872B (en) | PID controller parameter setting method and system based on adaptive dynamic programming | |
US6317662B1 (en) | Stable and verifiable state estimation methods and systems with spacecraft applications | |
Jayakumar et al. | A computational method for solving singular perturbation problems | |
CN115993777A (en) | Track perturbation model inversion-based diameter-cut joint control decoupling iteration calibration method | |
CN114063458A (en) | Preset performance control method of non-triangular structure system independent of initial conditions | |
CN110032066B (en) | Adaptive iterative learning control method for fractional order nonlinear system trajectory tracking | |
Yuan et al. | Uncertainty-resilient constrained rendezvous trajectory optimization via stochastic feedback control and unscented transformation | |
CN114200491A (en) | Navigation data-based emergency spacecraft ephemeris determination method and system | |
Rigatos et al. | Nonlinear optimal control for autonomous hypersonic vehicles | |
Burlion et al. | Controls for a nonlinear system arising in vision‐based landing of airliners | |
Jia et al. | Collision avoidance in target encirclement and tracking of unmanned aerial vehicles under a dynamic event-triggered formation control | |
Das et al. | Optimal nonlinear control and estimation for a reusable launch vehicle during reentry phase | |
CN113297666B (en) | Design method for high-precision control of spacecraft | |
CN113886947B (en) | Aircraft static aeroelastic system output state quantity interval determination method based on iteration strategy | |
CN114397906B (en) | Rapid high-precision calculation method for earth stationary satellite electric propulsion transfer | |
CN116202535B (en) | Initial value intelligent optimized spacecraft angle measurement-only ultrashort arc initial orbit determination method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |