CN117762022A

CN117762022A - satellite intelligent robust approximate optimal orbit control method based on discrete time reinforcement learning

Info

Publication number: CN117762022A
Application number: CN202410194622.6A
Authority: CN
Inventors: 张鹏; 陈谋; 邵书义
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2024-02-22
Filing date: 2024-02-22
Publication date: 2024-03-26
Anticipated expiration: 2044-02-22

Abstract

The invention relates to a satellite intelligent robust approximate optimal orbit control method based on discrete-time reinforcement learning. Aiming at the problem of discrete-time satellite orbit control with uncertainty, the invention designs a new robust second-order approximate Hamilton-Jacobi-Bellman equation and, based on this equation, a policy iteration algorithm with convergence guarantees. The method not only handles the uncertainty in the system effectively but also ensures the stability of satellite orbit control, and the policy iteration method is convenient to apply in practical engineering.

Description

Satellite intelligent robust approximate optimal orbit control method based on discrete-time reinforcement learning

Technical Field

The invention relates to the field of intelligent control of satellite orbits, in particular to an intelligent robust near-optimal satellite orbit control method based on discrete time reinforcement learning.

Background

Intelligent control of satellite orbits has been a key topic in satellite on-orbit servicing and is widely used in critical space missions such as active space debris removal and interplanetary exploration. For spacecraft, thruster fuel is an important resource that keeps the satellite working properly, so fuel consumption must be incorporated into the control performance index when designing satellite orbit control strategies. However, the strong nonlinearity of satellite orbit dynamics makes it harder to design an orbit controller that uses fuel consumption as the performance index. In addition, during satellite operation, uncertainties such as various perturbation forces can seriously degrade the control accuracy of the satellite orbit, which further increases the difficulty of designing a nonlinear optimal controller for the satellite.

Various control methods have been proposed for the optimal control of satellite orbits. Common satellite orbit control strategies usually linearize the satellite orbit system locally and then design the controller; however, such local linearization may reduce the control accuracy of the satellite orbit. In addition, existing nonlinear optimal control methods do not consider the influence of uncertainty, so the resulting control strategies are not accurate enough. Therefore, designing a nonlinear robust optimal control method for the satellite orbit optimal control task that avoids local linearization remains a difficult problem.

Disclosure of Invention

The technical solution of the invention is as follows: aiming at the optimal control problem of the nonlinear satellite orbit, the method overcomes the defects of the prior art, makes full use of the nonlinear approximation capability of neural networks, and provides a satellite intelligent robust approximate optimal orbit control method based on discrete-time reinforcement learning.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A satellite intelligent robust approximate optimal orbit control method based on discrete-time reinforcement learning comprises the following steps:

Step 1: establishing a satellite orbit discrete control model containing uncertainty according to two-body dynamics;

Step 2: establishing a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal satellite control by using the second-order Taylor expansion;

Step 3: designing a satellite orbit control policy iteration algorithm based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.

Further, step 1 specifically comprises the following steps:

Establishing a satellite orbit control model:

Wherein, the state is the vector of satellite position and velocity, and the model consists of a nonlinear term of the satellite orbit control system, a coefficient matrix of the control input, and the control input of the satellite. The nonlinear term and the input coefficient matrix take the following specific form:

Wherein, these terms involve the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius. The true anomaly and the reference orbit radius satisfy the following dynamic equations:
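One commonly used form of such a model, assumed here only for illustration, is the set of nonlinear relative-motion equations about a reference orbit, written in a rotating frame with radial, along-track and orbit-normal axes, with the control entering as an acceleration on the velocity states. The sketch below evaluates that assumed drift term together with the two-body dynamics of the reference orbit; the function names, the state ordering and the Earth value of the gravitational parameter are assumptions, and the invention's exact expressions may differ.

```python
import numpy as np

# Assumed: Earth's gravitational parameter in SI units [m^3/s^2].
MU = 3.986004418e14

def reference_orbit_rates(r, r_dot, theta_dot):
    """Two-body dynamics of the reference orbit: returns (r_ddot, theta_ddot)."""
    r_ddot = r * theta_dot**2 - MU / r**2
    theta_ddot = -2.0 * r_dot * theta_dot / r
    return r_ddot, theta_ddot

def relative_dynamics(x, r, r_dot, theta_dot):
    """Assumed nonlinear drift term f(x) of the relative-motion model.

    x = [x, y, z, vx, vy, vz]: position and velocity relative to the
    reference orbit (x radial, y along-track, z orbit normal).
    """
    px, py, pz, vx, vy, vz = x
    _, theta_ddot = reference_orbit_rates(r, r_dot, theta_dot)
    d3 = ((r + px) ** 2 + py**2 + pz**2) ** 1.5
    ax = (2 * theta_dot * vy + theta_ddot * py + theta_dot**2 * px
          + MU / r**2 - MU * (r + px) / d3)
    ay = -2 * theta_dot * vx - theta_ddot * px + theta_dot**2 * py - MU * py / d3
    az = -MU * pz / d3
    return np.array([vx, vy, vz, ax, ay, az])

# Assumed input matrix: control acts as acceleration on the velocity states.
G_CONT = np.vstack([np.zeros((3, 3)), np.eye(3)])
```

For a circular reference orbit the radius is constant and the true-anomaly rate equals the mean motion, so zero relative position and velocity is an equilibrium of this assumed drift term.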

Further, a satellite orbit discrete control model is established by the Euler discretization method:

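Wherein, the discrete-time nonlinear term and input coefficient matrix are defined in terms of the continuous-time model and the sampling period, and the state and input samples are the values of the satellite state and control input at the k-th sampling instant.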
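Forward Euler replaces the time derivative by a one-step difference over the sampling period. A minimal sketch, assuming the standard Euler result F(x) = x + T f(x) and G = T g (the helper names are hypothetical), applied to a double integrator instead of the full orbit model:

```python
import numpy as np

def euler_discretize(f, g, T):
    """Forward-Euler discretization of dx/dt = f(x) + g u with sampling period T.

    Returns (F, G) such that x_{k+1} = F(x_k) + G u_k,
    with F(x) = x + T f(x) and G = T g.
    """
    def F(x):
        return x + T * f(x)
    return F, T * g

def discrete_step(F, G, x_k, u_k):
    """Propagate the discrete model by one sampling period."""
    return F(x_k) + G @ u_k

# Usage on a 1-D double integrator (position, velocity).
f = lambda x: np.array([x[1], 0.0])
g = np.array([[0.0], [1.0]])
F, G = euler_discretize(f, g, T=0.1)
x1 = discrete_step(F, G, np.array([1.0, 2.0]), np.array([3.0]))  # [1.2, 2.3]
```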

Further, a satellite orbit discrete control model containing uncertainty is established:

Wherein, the added term is a mismatched uncertainty of the satellite orbit control system and satisfies the following inequality:

Wherein, the bound is a known, deterministic function of the state.
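A common form for such a bound, assumed in this sketch, is the norm bound ||Δ(x)|| ≤ δ(x) with δ a known scalar function of the state. Drawing disturbances inside that bound is a simple way to stress-test a controller in simulation; the helper names are hypothetical.

```python
import numpy as np

def sample_uncertainty(x, bound_fn, rng):
    """Draw a disturbance Delta with ||Delta|| <= bound_fn(x).

    The norm-bound form is an assumption; bound_fn stands in for the
    known, deterministic function of the state.
    """
    direction = rng.standard_normal(x.shape)
    direction /= np.linalg.norm(direction)
    return rng.uniform(0.0, 1.0) * bound_fn(x) * direction

def uncertain_step(F, G, x_k, u_k, bound_fn, rng):
    """x_{k+1} = F(x_k) + G u_k + Delta(x_k), with Delta drawn inside the bound."""
    return F(x_k) + G @ u_k + sample_uncertainty(x_k, bound_fn, rng)
```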

To quantify control performance, the satellite control cost function is set to:

Wherein, the weighting matrix is an adjustable, known, positive constant matrix, and the controller is designed to minimize the control cost function.
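A minimal sketch of evaluating such a cost along a simulated trajectory, assuming the common quadratic form J = sum_k (x_k^T Q x_k + u_k^T R u_k) with positive-definite weights Q and R, and truncating the infinite sum at the simulation horizon; the function names are hypothetical.

```python
import numpy as np

def stage_cost(x, u, Q, R):
    """Quadratic stage cost x^T Q x + u^T R u (assumed form)."""
    return float(x @ Q @ x + u @ R @ u)

def trajectory_cost(xs, us, Q, R):
    """Finite-horizon truncation of J = sum_k (x_k^T Q x_k + u_k^T R u_k)."""
    return sum(stage_cost(x, u, Q, R) for x, u in zip(xs, us))

# Usage with diagonal weights.
Q = np.diag([1.0, 3.0])
R = np.array([[2.0]])
xs = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
us = [np.array([1.0]), np.array([0.5])]
J = trajectory_cost(xs, us, Q, R)  # (1 + 2) + (12 + 0.5) = 15.5
```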

Further, step 2 specifically comprises: establishing a robust second-order approximate Hamilton-Jacobi-Bellman equation by using the second-order Taylor expansion:

together with:

Wherein, the gradient vector and the Hessian matrix of the value function are expressed, respectively, as follows:

Wherein, the subscripted variable is the i-th element of the state vector.
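To show how a second-order Taylor expansion turns the discrete Bellman equation into a condition on the gradient and Hessian of the value function, the following derivation uses the nominal model x_{k+1} = F(x_k) + G u_k (uncertainty omitted) and an assumed quadratic stage cost x^T Q x + u^T R u. It is a hedged illustration of the non-robust case only; the robustness terms of the invention's equation are not modeled.

```latex
\begin{aligned}
\Delta x_k &= F(x_k) + G u_k - x_k,\\
V(x_{k+1}) &\approx V(x_k) + \nabla V(x_k)^{\top}\Delta x_k
  + \tfrac{1}{2}\,\Delta x_k^{\top}\nabla^2 V(x_k)\,\Delta x_k,\\
0 &\approx \min_{u_k}\Big[x_k^{\top}Qx_k + u_k^{\top}Ru_k
  + \nabla V^{\top}\Delta x_k + \tfrac{1}{2}\,\Delta x_k^{\top}\nabla^2 V\,\Delta x_k\Big],\\
0 &= 2Ru_k + G^{\top}\nabla V + G^{\top}\nabla^2 V\,\big(F(x_k) + Gu_k - x_k\big),\\
u_k &= -\big(2R + G^{\top}\nabla^2 V\,G\big)^{-1}G^{\top}
  \big[\nabla V + \nabla^2 V\,\big(F(x_k) - x_k\big)\big].
\end{aligned}
```

The inverse exists when 2R + G^T ∇²V G is positive definite, for instance when V is convex. For a linear model F(x) = Ax and a quadratic value function V(x) = x^T P x the expansion is exact, and the last line reduces to the discrete-time LQR feedback u_k = -(R + G^T P G)^{-1} G^T P A x_k.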

Further, step 3 specifically comprises: based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:

Step 3.1: First, select an initial admissible control policy and initialize the convergence error threshold.

Step 3.2: For the current iteration number, compute the iterative value function from the following equation:

Wherein, the equation contains an adjustable positive constant.
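The invention relies on the nonlinear approximation capability of neural networks for the value function. As a simpler stand-in, the sketch below uses a linear-in-weights approximator V(x) = w^T phi(x) with a hypothetical quadratic basis for a two-dimensional state, and fits w by least squares so that the second-order approximate Bellman equation holds for the current policy at sampled states. The uncertainty term and the adjustable positive constant are omitted, and all names are illustrative.

```python
import numpy as np

def quad_basis(x):
    """Hypothetical quadratic basis for a 2-D state: phi, its gradients, its Hessians."""
    x1, x2 = x
    phi = np.array([x1 * x1, x1 * x2, x2 * x2])
    grads = np.array([[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]])
    hessians = np.array([[[2.0, 0.0], [0.0, 0.0]],
                         [[0.0, 1.0], [1.0, 0.0]],
                         [[0.0, 0.0], [0.0, 2.0]]])
    return phi, grads, hessians

def evaluate_policy_ls(F, G, Q, R, policy, samples, basis=quad_basis):
    """Least-squares fit of V(x) = w^T phi(x) such that, at each sample x,
    x^T Q x + u^T R u + grad V^T dx + 0.5 dx^T (Hess V) dx = 0,
    with u = policy(x) and dx = F(x) + G u - x (nominal model)."""
    rows, rhs = [], []
    for x in samples:
        u = policy(x)
        dx = F(x) + G @ u - x
        _, grads, hessians = basis(x)
        # The Taylor terms are linear in w: entry j uses grad phi_j and Hess phi_j.
        rows.append(grads @ dx + 0.5 * np.einsum("i,jik,k->j", dx, hessians, dx))
        rhs.append(-(x @ Q @ x + u @ R @ u))
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return w

# Usage on a linear test model with a stabilizing feedback u = -K x.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[1.0, 1.5]])
samples = np.random.default_rng(1).standard_normal((20, 2))
w = evaluate_policy_ls(lambda x: A @ x, B, np.eye(2), np.array([[1.0]]),
                       lambda x: -K @ x, samples)
P = np.array([[w[0], w[1] / 2], [w[1] / 2, w[2]]])  # V(x) = x^T P x
```

Because the Taylor terms are linear in w, each policy evaluation is a single linear least-squares solve; with a quadratic basis and a linear model the fit recovers the solution of the discrete Lyapunov equation.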

Step 3.3: Based on the obtained value function, compute the control policy for the next iteration:

Wherein,
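Step 3.3 needs the gradient and Hessian of the value function just obtained. A minimal sketch, assuming the nominal model and the minimizer of the second-order approximate Bellman right-hand side, u = -(2R + G^T H G)^{-1} G^T (grad V + H (F(x) - x)) with H the Hessian (obtained by setting the derivative with respect to u to zero; robustness terms omitted). Central finite differences are used so that any smooth value-function approximator can be plugged in; the helper names are hypothetical.

```python
import numpy as np

def fd_gradient_hessian(V, x, h=1e-4):
    """Central-difference gradient and Hessian of a scalar function V at x."""
    n = x.size
    E = np.eye(n) * h
    grad = np.array([(V(x + E[i]) - V(x - E[i])) / (2.0 * h) for i in range(n)])
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            hess[i, j] = (V(x + E[i] + E[j]) - V(x + E[i] - E[j])
                          - V(x - E[i] + E[j]) + V(x - E[i] - E[j])) / (4.0 * h * h)
    return grad, hess

def improved_control(x, F, G, R, V):
    """Minimizer over u of u^T R u + grad V^T dx + 0.5 dx^T H dx, dx = F(x) + G u - x:
    u = -(2R + G^T H G)^{-1} G^T (grad V + H (F(x) - x))."""
    grad, H = fd_gradient_hessian(V, x)
    lhs = 2.0 * R + G.T @ H @ G
    rhs = G.T @ (grad + H @ (F(x) - x))
    return -np.linalg.solve(lhs, rhs)
```

For a quadratic value function and a linear model this coincides with the discrete-time LQR policy-improvement step, which gives a convenient sanity check.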

Step 3.4: Compute the norm of the error between the control policies of two adjacent iterations. If the error exceeds the threshold, return to Step 3.2; otherwise, stop the computation and output the optimal control policy.
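Steps 3.1 to 3.4 form a loop. The sketch below runs that loop on a linear surrogate model x_{k+1} = A x_k + B u_k with quadratic cost, where the second-order expansion is exact and every step has a closed form, so it reduces to the classical Hewer policy iteration for the discrete-time LQR problem. It only illustrates the loop structure (admissible start, evaluation, improvement, norm-based stopping test); it is not the invention's nonlinear robust algorithm, and the function name and the eigenvalue-based admissibility test are assumptions.

```python
import numpy as np

def policy_iteration(A, B, Q, R, K0, eps=1e-10, max_iter=100):
    """Policy iteration on x_{k+1} = A x_k + B u_k with u_k = -K x_k.

    Step 3.1: admissible K0 and threshold eps; Step 3.2: evaluate V(x) = x^T P x;
    Step 3.3: improve K; Step 3.4: stop when ||K_{i+1} - K_i|| < eps.
    """
    n = A.shape[0]
    # Step 3.1: admissible taken to mean A - B K0 is Schur stable.
    if np.max(np.abs(np.linalg.eigvals(A - B @ K0))) >= 1.0:
        raise ValueError("initial policy K0 is not admissible")
    K = K0
    for _ in range(max_iter):
        # Step 3.2: solve P = Q + K^T R K + A_cl^T P A_cl via the row-major vec/kron form.
        A_cl = A - B @ K
        M = Q + K.T @ R @ K
        lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
        P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
        P = 0.5 * (P + P.T)  # symmetrize against round-off
        # Step 3.3: minimizer of u^T R u + V(A x + B u).
        K_next = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Step 3.4: stop when consecutive policies agree to within eps.
        if np.linalg.norm(K_next - K) < eps:
            return K_next, P
        K = K_next
    raise RuntimeError("policy iteration did not converge within max_iter")
```

In the scalar case A = B = Q = R = 1 with K0 = 0.5, the loop converges to K = (sqrt(5) - 1)/2 and P = (1 + sqrt(5))/2, the solution of the corresponding discrete algebraic Riccati equation.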

The beneficial effects of the invention are as follows: aiming at the problem of discrete-time satellite orbit control with uncertainty, the invention designs a new robust second-order approximate Hamilton-Jacobi-Bellman equation and a policy iteration algorithm with convergence guarantees. The invention not only copes with the nonlinear characteristics of the satellite orbit but also counteracts the adverse effect of orbit uncertainty, thereby ensuring the control precision of the satellite orbit.

Drawings

FIG. 1 is a graph of the position error of a satellite versus a desired orbit in accordance with the present invention;

FIG. 2 is a graph of the velocity error of a satellite versus a desired orbit in accordance with the present invention;

FIG. 3 is a flowchart of the algorithm of the present invention.

Detailed Description

The principles and features of the present invention are described below with examples that are given for illustration only and are not intended to limit the scope of the invention.

As shown in FIG. 3, a satellite intelligent robust near-optimal orbit control method based on discrete-time reinforcement learning comprises the following steps:

Step 1: establishing a satellite orbit discrete control model containing uncertainty according to two-body dynamics.

Step 2: establishing a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit by using the second-order Taylor expansion.

Step 3: designing a satellite orbit control policy iteration algorithm based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.

The steps described above are described in further detail below in connection with specific examples:

In step 1, a satellite orbit control model, a discrete control model of the satellite orbit, and a discrete control model of the satellite orbit containing uncertainty are established in sequence, according to the following steps:

The first step: establishing a satellite orbit control model:


In the embodiment of the invention, prior-art optimal orbit control strategies usually linearize the satellite orbit system locally before designing the controller, which reduces the control precision of the satellite orbit; the invention avoids this loss by exploiting the nonlinear approximation capability of a neural network, thereby effectively improving the control precision. The specific initial state parameter values are as follows:


The second step: establishing a discrete control model of the satellite orbit by the Euler discretization method:

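Wherein, the discrete-time nonlinear term and input coefficient matrix are defined in terms of the continuous-time model and the sampling period, and the state and input samples are the values of the satellite state and control input at the k-th sampling instant. In an embodiment of the invention, the sampling period is set to a fixed value.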

The third step: establishing a satellite orbit discrete control model containing uncertainty:

Wherein, the bound is a known, deterministic function of the state.

In this embodiment, introducing the uncertainty term into the discrete control model further improves the control accuracy of the satellite orbit.

In another embodiment, to quantify control performance, as a further improvement of the present invention, the satellite control cost function is set to:

Wherein, the weighting matrix is an adjustable, known, positive constant matrix, and the controller is designed to minimize the control cost function. In the present embodiment:

Wherein, diag(·) denotes the diagonal matrix formed by the elements in the brackets.

In step 2, a robust second-order approximate Hamilton-Jacobi-Bellman equation is established by using the second-order Taylor expansion:

together with:

Wherein, the subscripted variable is the i-th element of the state vector.

In step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:

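Step 3.1: First, select an initial admissible control policy and initialize the convergence error threshold. In the present embodiment, both are set to specific values.

Step 3.2: For the current iteration number, compute the iterative value function, wherein the equation contains an adjustable positive constant. In the present embodiment, this constant is set to a specific value.

Step 3.3: Based on the obtained value function, compute the control policy for the next iteration.

Step 3.4: Compute the norm of the error between the control policies of two adjacent iterations. If the error exceeds the threshold, return to Step 3.2; otherwise, stop the computation and output the optimal control policy.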

FIG. 1 and FIG. 2 depict the simulation results of this example. FIG. 1 shows the position error of the satellite relative to the desired orbit; it can be seen that, with the satellite control method of the present application, the satellite successfully transfers to the desired orbital position after a period of time. FIG. 2 shows the velocity error of the satellite relative to the desired orbit, from which it can be seen that the relative velocity error eventually converges to zero.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

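1. An intelligent robust near-optimal orbit control method for a satellite based on discrete-time reinforcement learning, characterized by comprising the following steps:

Step 1: establishing a satellite orbit discrete control model containing uncertainty according to two-body dynamics;

Step 2: establishing a robust second-order approximate Hamilton-Jacobi-Bellman equation for optimal control of the satellite orbit by using the second-order Taylor expansion;

Step 3: designing a satellite orbit control policy iteration algorithm based on the robust second-order approximate Hamilton-Jacobi-Bellman equation.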

2. The method for intelligent robust near optimal orbit control of a satellite based on discrete time reinforcement learning according to claim 1, wherein a satellite orbit control model is established:

Wherein, the state is the vector of satellite position and velocity, and the model consists of a nonlinear term of the satellite orbit control system, a coefficient matrix of the control input, and the control input of the satellite;

The nonlinear term and the input coefficient matrix take the following specific form:

Wherein, these terms involve the gravitational constant, the true anomaly of the reference orbit, and the reference orbit radius;

The true anomaly and the reference orbit radius satisfy the following dynamic equations:


3. The method for intelligent robust near-optimal orbit control of a satellite based on discrete-time reinforcement learning according to claim 2, wherein a discrete control model of the satellite orbit is established by the Euler discretization method:

4. The method for intelligent robust near-optimal orbit control of a satellite based on discrete-time reinforcement learning according to claim 2, wherein a satellite orbit discrete control model containing uncertainty is established:

Wherein, the bound is a known, deterministic function of the state.

5. The method for intelligent robust near-optimal orbit control of a satellite based on discrete-time reinforcement learning according to claim 4, wherein the satellite control cost function is set as:

Wherein, the weighting matrix is an adjustable, known, positive constant matrix; the controller is designed to minimize the control cost function.

6. The method for intelligent robust approximate optimal orbit control of a satellite based on discrete-time reinforcement learning according to claim 5, wherein, in step 3, based on the robust second-order approximate Hamilton-Jacobi-Bellman equation, the following policy iteration algorithm is designed using the idea of reinforcement learning:

Step 3.1: first, selecting an initial admissible control policy and initializing the convergence error threshold;

Step 3.2: for the current iteration number, computing the iterative value function, wherein the equation contains an adjustable positive constant;

Step 3.3: based on the obtained value function, computing the control policy for the next iteration;

Step 3.4: computing the norm of the error between the control policies of two adjacent iterations; if the error exceeds the threshold, returning to step 3.2; otherwise, stopping the computation and outputting the optimal control policy.