CN115771624B

CN115771624B - Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Info

Publication number: CN115771624B
Application number: CN202310101472.5A
Authority: CN
Inventors: 刘昊; 吕金虎; 钟森; 高庆; 刘德元; 王田
Original assignee: Beihang University; Academy of Mathematics and Systems Science of CAS
Current assignee: Beihang University; Academy of Mathematics and Systems Science of CAS
Priority date: 2023-02-13
Filing date: 2023-02-13
Publication date: 2023-05-26
Anticipated expiration: 2043-02-13
Also published as: CN115771624A

Abstract

The invention relates to the field of satellite control, and in particular to a self-adaptive satellite attitude and orbit control method based on reinforcement learning. For a formation of actual satellites, the method takes a virtual leading satellite as the reference, establishes an orbit dynamics model of the virtual leading satellite, and combines it with the attitude motion equations to obtain a Lagrangian representation of the satellite attitude dynamics described by modified Rodrigues parameters (MRPs). A six-degree-of-freedom dynamic model of the attitude-orbit coupling of each satellite is then obtained, followed by a satellite attitude dynamics control model. After the dynamic description of the satellite reference signals is combined, a cost function of the satellite subsystem is defined and a Hamiltonian is obtained; an optimal attitude control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm, and the optimal relative position control strategy is obtained by the same method. The method uses neural networks to estimate the optimal control strategy of the nonlinear model directly, adapts well when some satellite parameters are unknown, and designs the optimal control for each thruster.

Description

Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Technical Field

The present invention relates to the field of satellite control, and in particular to a self-adaptive satellite attitude and orbit control method based on reinforcement learning.

Background

Satellite formation flying has attracted a great deal of attention in fields such as earth observation, astronomical observation and satellite-to-ground communication. Compared with a single satellite, a group of satellites is more flexible and offers better mission benefits, because the satellites can carry different payloads and be freely combined according to the particular mission. Satellite formations therefore have great potential in many fields. Because of the complex coupling between the attitude dynamics and the position dynamics of a satellite, the dynamics model of each satellite is highly nonlinear, involving rotation-translation coupling across six degrees of freedom. Furthermore, accurate satellite dynamics parameters are difficult to obtain, which makes it challenging to achieve optimal formation flying for multi-satellite systems.

The current methods for satellite attitude and orbit control mainly fall into the following categories:

1. Simplifying the six-degree-of-freedom satellite dynamic model into a linear model and applying a traditional optimal control method to obtain an optimal control strategy.

2. For a nonlinear satellite dynamics model, partially cancelling the nonlinearity by feedback linearization and then applying an optimization method (such as a Riccati differential method or a … method) together with a traditional optimal control method to obtain an optimal control strategy.

3. For satellite position control, giving the control input in the LVLH (local vertical, local horizontal) coordinate system without considering the coupling effect of attitude on the relative position.

For example, CN105068546A discloses an adaptive neural network configuration containment control method for the relative orbits of a satellite formation: a relative orbit dynamics equation of the follower satellites is established; a distributed velocity observer is designed for each follower satellite; a neural network approximation is then carried out using the relative orbit dynamics equation of the follower satellites and the distributed velocity observer; and an adaptive neural network configuration containment control algorithm is designed according to the approximation result. CN105138010A discloses a distributed finite-time tracking control method for formation satellites: a two-satellite relative motion dynamics model is established; a relative motion dynamics model of each satellite with respect to a reference point is formed; and a distributed finite-time tracking control law is designed, which addresses the problem of limited communication among formation satellites in satellite formation control.

However, the above satellite attitude and orbit control methods have certain problems. For the first category, the dynamics of a satellite in actual operation are strongly nonlinear, so a control strategy obtained from a linearized model is not optimal. For the second category, model information about the controlled object is required both in the feedback linearization process and in the conventional optimal controller design. The dynamic model of an actual satellite is uncertain (for example in its mass distribution and moment of inertia), so such techniques are not applicable when the parameters are completely unknown and the model information of the controlled object is unavailable. For the third category, the position control actuators (such as thrusters) of an actual satellite are fixed on the satellite body, the input matrix of the position control depends on the attitude, and changes in satellite attitude have a significant influence on the thrust distribution of the thrusters. The optimal control input obtained by such techniques is optimal in the LVLH coordinate system, not the optimal control for each thruster in the body coordinate system.

Disclosure of Invention

To address the problems in the prior art, the invention first establishes a complete six-degree-of-freedom nonlinear model of the satellite. In contrast to the first category of prior control techniques, which simplify the satellite dynamics into a linearized model, the invention uses neural networks to estimate the optimal control strategy of the nonlinear model directly, so as to cope with the nonlinear effects. Compared with the second category, the proposed algorithm avoids the requirement for an accurate satellite model, because a reinforcement learning algorithm performs parameter identification based on measurable data; it adapts well when some satellite parameters are unknown, identifying the parameters and learning the true optimal control strategy through interaction with the environment. Compared with the third category, the invention projects the dynamics of the satellite position relative to the reference point into the body coordinate system; the relative position dynamics observed in the body coordinate system contain the coupling of attitude to position, and the optimal control is designed for each thruster in the body coordinate system.

The complete technical scheme of the invention comprises the following steps:

a self-adaptive satellite attitude and orbit control method based on reinforcement learning comprises the following steps:

step 1: for a formation composed of a group of actual satellites, a virtual leading satellite is adopted as the reference of the whole formation; an orbit dynamics model of the virtual leading satellite is established, the relative position dynamics between each actual satellite and the virtual leading satellite are obtained, and the attitude motion equation of each actual satellite is combined to obtain a Lagrangian representation of the attitude dynamics of each actual satellite based on the modified Rodrigues parameter description;

step 2: for the actual satellite formation, a six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite is obtained;

step 3: attitude controller design: for the actual satellite formation, the attitude subsystem of each actual satellite is obtained from the attitude dynamics control model of the actual satellite combined with the dynamic description of its reference signal; a cost function is defined for the attitude subsystem, and an optimal attitude control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm;

step 4: relative position controller design: for the actual satellite formation, the relative position subsystem of each actual satellite is obtained from the relative position dynamics between the actual satellite and the virtual leading satellite combined with the relative position reference dynamics of each actual satellite; a cost function is defined for the relative position subsystem, and an optimal relative position control strategy is obtained by designing an off-policy reinforcement learning relative position control algorithm.

The step 1 specifically comprises the following steps:

1.1 Establishing a geocentric inertial coordinate system of a virtual leading satellite and a body coordinate system of each actual satellite; establishing, in the geocentric inertial coordinate system, a virtual leading satellite orbit dynamics model consisting of a position vector and a gravitational acceleration vector;

1.2 Establishing an orbit dynamics model of each actual satellite consisting of a position vector, a gravitational acceleration vector and a position control force vector;

1.3 Obtaining the relative position and the relative position dynamics between each actual satellite and the virtual leading satellite from the position vector of each actual satellite and the position vector of the virtual leading satellite;

1.4 Projecting the relative position dynamics obtained in step 1.3 into the body coordinate system of each actual satellite; and combining the actual satellite attitude motion equation, which comprises angular velocity, inertia and control moment, to obtain a Lagrangian representation of the actual satellite attitude dynamics based on the modified Rodrigues parameter description.

In step 2, each actual satellite includes six thrusters as position control actuators, and the relative position dynamics between each actual satellite and the virtual leading satellite are expressed in the body coordinate system as the six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite.

The step 3 specifically comprises the following steps:

3.1 Expressing the attitude dynamics of each actual satellite as a function of the attitude state, the output and the control moment input;

3.2 Combining the dynamic description of the satellite reference signals to obtain the attitude subsystem of each actual satellite;

3.3 Defining a cost function for the satellite attitude subsystem;

3.4 Combining the cost function of the attitude subsystem of each actual satellite to obtain the Hamiltonian, and obtaining an optimal attitude control strategy through the off-policy reinforcement learning optimal attitude control algorithm.

The step 4 specifically comprises the following steps:

4.1 Expressing the relative position dynamics between each actual satellite and the virtual leading satellite as a function of the relative position state, the output and the control acceleration input;

4.2 Combining the relative position reference dynamics of each actual satellite to obtain the relative position subsystem of each actual satellite;

4.3 Defining a cost function for the relative position subsystem;

4.4 Combining the cost function of the relative position subsystem of each actual satellite to obtain the Bellman equation, and obtaining an optimal relative position control strategy through the off-policy reinforcement learning relative position control algorithm.

Compared with the prior art, the invention has the following advantages:

1. A complete nonlinear model of the satellite is established, and neural networks are used to estimate the optimal control strategy of the nonlinear model directly, so as to cope with the nonlinear effects.

2. A reinforcement learning algorithm performs parameter identification based on measurable data, which avoids the requirement for an accurate satellite model while the optimal control law is learned. The method adapts well when some satellite parameters are unknown: the parameters are identified through interaction with the environment and the true optimal control strategy is learned.

3. The dynamics of the satellite position relative to the reference point are projected into the body coordinate system; the relative position dynamics observed in the body coordinate system contain the coupling of attitude to position, and the optimal control is designed for each thruster in the body coordinate system.

Drawings

FIG. 1 is a schematic view of a satellite formation coordinate system of the disclosed method;

FIG. 2 is a schematic diagram of a satellite formation model;

FIG. 3 is a simulated 3D flight effect diagram of the satellite of the present invention;

FIG. 4 is a graph of the satellite simulated attitude tracking error of the present invention;

FIG. 5 is a graph of the relative position tracking error of the satellite simulation of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only examples and are not intended to limit the present application.

The present invention will be described in detail with reference to the following examples and drawings, but it should be understood that the examples and drawings are only for illustrative purposes and are not intended to limit the scope of the present invention in any way. All reasonable variations and combinations that are included within the scope of the inventive concept fall within the scope of the present invention.

The technical scheme adopted by the invention is that the self-adaptive satellite attitude and orbit control method based on reinforcement learning comprises the following specific steps:

Before the technical solution of this embodiment is described further, the notation is fixed. Unless otherwise specified, $\mathbb{R}^{i\times j}$ denotes a matrix with $i$ rows and $j$ columns; for example, $x \in \mathbb{R}^{3\times 1}$ means that $x$ is a matrix with 3 rows and 1 column, i.e. a three-dimensional column vector. Dots above a symbol denote derivatives of the quantity the symbol represents, the number of dots being the order of the derivative; for example, if $\omega_i$ is the angular velocity of the i-th satellite, then $\dot{\omega}_i$ is its first derivative, i.e. the angular acceleration of the i-th satellite, and $\ddot{\omega}_i$ is its second derivative. $(\cdot)^{\times}$ denotes the standard skew-symmetric matrix.

The superscript $T$ denotes matrix transposition, e.g. $A^{T}$ is the transpose of $A$; the superscript $-1$ denotes the inverse matrix, e.g. $R_{+n}^{-1}$ is the inverse of $R_{+n}$; and a further superscript mark denotes the adjugate matrix. Parameters that are not explicitly defined in this embodiment are intermediate variables of the derivation and have no practical physical meaning.

Step 1: Satellite dynamics model. A geocentric inertial coordinate system and a body coordinate system for each satellite are established. Consider a group of rigid satellites orbiting the earth, each carrying six thrusters as its position control actuators, as shown in fig. 1; the attitude control moments are given in the body coordinate system. To facilitate the description of the formation, a virtual leader is introduced as the reference of the whole formation, and each satellite must maintain a desired relative position with respect to the virtual leader to build the whole satellite formation, as shown in fig. 2. An orbit dynamics model of the virtual leading satellite is established in the geocentric inertial coordinate system and an orbit dynamics model of each satellite is given; from these, the relative position dynamics between each satellite and the virtual leading satellite are derived and projected into the body coordinate system of each satellite. Combining the result with the attitude motion equation of the satellite gives a Lagrangian representation of the satellite attitude dynamics described by MRPs. Specifically:

1.1 A virtual leading satellite orbit dynamics model is given, defined by the position vector of the virtual leading satellite and its Euclidean norm, the gravitational acceleration vector of the virtual leading satellite and the earth's gravitational constant:

$$\ddot{r}_0(t) = g_0(t) = -\frac{\mu\, r_0(t)}{\lVert r_0(t)\rVert^{3}} \qquad (1)$$

where $r_0$ is the position vector of the virtual leading satellite; $\lVert r_0\rVert$ is the Euclidean norm of $r_0$; $g_0$ is the gravitational acceleration vector of the virtual leading satellite; $\mu$ is the gravitational constant; and $t$ is time;
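As a minimal sketch of the point-mass gravity term that an orbit model of this kind typically uses, the Python function below evaluates $-\mu r/\lVert r\rVert^{3}$ for a three-component position vector. The function name `gravity_accel`, the sample position and the two-body form itself are assumptions made for illustration; only the value of the gravitational constant is taken from the simulation section of this document.

```python
import math

MU_EARTH = 3.986e14  # m^3/s^2, the gravitational constant quoted in the simulation test


def gravity_accel(r, mu=MU_EARTH):
    """Point-mass gravitational acceleration -mu * r / ||r||^3 (m/s^2) for a 3-vector r (m)."""
    norm = math.sqrt(sum(c * c for c in r))
    if norm == 0.0:
        raise ValueError("position vector must be non-zero")
    return [-mu * c / norm**3 for c in r]


# Illustrative low-earth-orbit position on the y axis (an assumed sample value).
r0 = [0.0, 6.871e6, 0.0]
g0 = gravity_accel(r0)
```

The acceleration points from the satellite towards the earth's centre, and its magnitude is $\mu/\lVert r\rVert^{2}$.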

1.2 The orbit dynamics model of the i-th actual satellite is established as follows:

$$\ddot{r}_i(t) = g_i(t) + f_i(t), \qquad g_i(t) = -\frac{\mu\, r_i(t)}{\lVert r_i(t)\rVert^{3}} \qquad (2)$$

where $r_i$ is the position vector of the i-th actual satellite; $g_i$ is the gravitational acceleration vector of the i-th actual satellite; $f_i$ is the position control force vector of the i-th actual satellite; $\lVert r_i\rVert$ is the Euclidean norm of $r_i$; and $\mu$ is the gravitational constant;

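1.3 The relative position $\rho_i$ of the i-th satellite with respect to the virtual leading satellite is obtained as follows, where $r_i$ and $r_0$ are the position vectors of the i-th actual satellite and of the virtual leading satellite:

$$\rho_i = r_i - r_0 \qquad (3)$$

1.4 The relative position dynamics between the i-th satellite and the virtual leading satellite are obtained as follows, where $g_i$ and $g_0$ are the corresponding gravitational acceleration vectors and $f_i$ is the position control force vector of the i-th satellite:

$$\ddot{\rho}_i = g_i - g_0 + f_i \qquad (4)$$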

1.5 Since the actuators (thrusters) that control the relative position are fixed on the satellite body, for the controller design the relative position dynamics between the i-th satellite and the virtual leading satellite are projected into the satellite body coordinate system to obtain:

（5）

where $\omega_i$ is the angular velocity of the i-th satellite; $R_i$ is the coordinate transformation matrix from the geocentric inertial coordinate system to the body coordinate system of the i-th satellite; $\rho_i^{b}$ is the relative position between the i-th satellite and the virtual leading satellite expressed in the body coordinate system; $\dot{\omega}_i$ is the angular acceleration of the i-th satellite; and $(\cdot)^{\times}$ denotes the standard skew-symmetric matrix.

Since the relative position dynamics model includes the angular velocity and the angular acceleration, the satellite attitude has a coupling effect on the relative position dynamics.
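The origin of the angular velocity and angular acceleration terms can be sketched with a short derivation. The symbols here are assumptions chosen for illustration, since the printed form of the projected model is not legible in this text: $\rho_i$ is the inertial relative position, $\rho_i^{b} = R_i\rho_i$ its body-frame expression, and the rotation matrix is assumed to obey $\dot{R}_i = -\omega_i^{\times}R_i$. The patent's own equation may arrange or sign the terms differently.

```latex
\begin{aligned}
\rho_i^{b} &= R_i\,\rho_i, \qquad \dot{R}_i = -\omega_i^{\times} R_i, \\
\dot{\rho}_i^{b} &= -\omega_i^{\times}\rho_i^{b} + R_i\,\dot{\rho}_i, \\
\ddot{\rho}_i^{b} &= -\dot{\omega}_i^{\times}\rho_i^{b}
  - 2\,\omega_i^{\times}\dot{\rho}_i^{b}
  - \omega_i^{\times}\omega_i^{\times}\rho_i^{b}
  + R_i\,\ddot{\rho}_i .
\end{aligned}
```

Under these assumptions the first three terms on the right of the last line carry the attitude coupling, while $R_i\ddot{\rho}_i$ is the inertial relative acceleration rotated into the body frame.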

1.6 The satellite attitude dynamics are described as follows:

$$J_i\,\dot{\omega}_i = -\omega_i^{\times} J_i\,\omega_i + \tau_i \qquad (6)$$

where $J_i$ is the moment of inertia (inertia matrix) of the i-th satellite; $\omega_i$ is the angular velocity of the i-th satellite; and $\tau_i$ is the control moment of the i-th satellite.
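A rigid-body attitude model of this kind can be sketched in a few lines of Python. The example below assumes Euler's rotational equation $J\dot{\omega} = -\omega \times (J\omega) + \tau$ with a diagonal inertia matrix; the function names and the diagonal-inertia simplification are assumptions for illustration and are not taken from the patent.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors, i.e. the skew-symmetric matrix of a applied to b."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def angular_accel(omega, tau, inertia_diag):
    """Angular acceleration from J * omega_dot = -omega x (J * omega) + tau, J diagonal (assumed)."""
    j_omega = [j * w for j, w in zip(inertia_diag, omega)]
    gyro = cross(omega, j_omega)  # gyroscopic coupling term omega x (J omega)
    return [(t - g) / j for t, g, j in zip(tau, gyro, inertia_diag)]


# Torque-free example: a body spinning about an axis that is not a principal axis.
omega_dot = angular_accel([1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

Even with zero control moment the angular velocity changes, which is the gyroscopic nonlinearity the controller has to handle.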

To avoid singularities in the attitude description, the invention describes the attitude $\sigma_i$ of the i-th satellite using modified Rodrigues parameters (MRPs) as follows:

$$\sigma_i = e_i \tan\!\left(\frac{\varphi_i}{4}\right) \qquad (7)$$

where $e_i$ is the unit vector along the Euler axis of the i-th satellite and $\varphi_i$ is the rotation angle of the i-th satellite about the Euler axis.
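The standard MRP definition, $\sigma = e\tan(\varphi/4)$, is easy to check numerically. The sketch below is illustrative only: the function name `mrp_from_axis_angle` is hypothetical, and the axis is normalised inside the function so that any non-zero vector can be passed in.

```python
import math


def mrp_from_axis_angle(axis, angle):
    """Modified Rodrigues parameters e * tan(angle / 4) for an Euler axis and angle (rad)."""
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("Euler axis must be non-zero")
    scale = math.tan(angle / 4.0) / norm  # dividing by the norm makes the axis a unit vector
    return [scale * c for c in axis]


# A half turn (180 degrees) about the z axis gives an MRP vector of unit length.
sigma = mrp_from_axis_angle([0.0, 0.0, 2.0], math.pi)
```

Because of the quarter-angle tangent, this parameterisation only becomes singular as the rotation angle approaches a full turn of ±360°, which is why it is convenient for large-angle attitude description.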

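The attitude kinematics can be written as:

$$\dot{\sigma}_i = G(\sigma_i)\,\omega_i \qquad (8)$$

where $\dot{\sigma}_i$ is the first derivative of the attitude $\sigma_i$ of the i-th satellite; $\omega_i$ is the angular velocity of the i-th satellite;

$$G(\sigma_i) = \frac{1}{4}\left[\left(1-\sigma_i^{T}\sigma_i\right) I_3 + 2\,\sigma_i^{\times} + 2\,\sigma_i\sigma_i^{T}\right];$$

and $I_3$ is the identity matrix.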
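The MRP kinematics can be sketched numerically as follows. The code assumes the textbook form $\dot{\sigma} = \tfrac{1}{4}\left[(1-\sigma^{T}\sigma)I + 2\sigma^{\times} + 2\sigma\sigma^{T}\right]\omega$; this form and the function names are assumptions for illustration and may differ in scaling or arrangement from the patent's own expression.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def mrp_rate(sigma, omega):
    """MRP rate sigma_dot = G(sigma) * omega for the assumed textbook G(sigma)."""
    s2 = sum(c * c for c in sigma)                    # sigma^T sigma
    s_cross_w = cross(sigma, omega)                   # sigma^x omega
    s_dot_w = sum(a * b for a, b in zip(sigma, omega))  # sigma^T omega
    return [
        0.25 * ((1.0 - s2) * w + 2.0 * c + 2.0 * s_dot_w * s)
        for w, c, s in zip(omega, s_cross_w, sigma)
    ]


# At zero attitude the MRP rate is a quarter of the body angular velocity.
rate_at_zero = mrp_rate([0.0, 0.0, 0.0], [0.4, 0.0, -0.8])
```

For a rotation about a single axis this reduces to $\dot{\sigma} = \tfrac{1}{4}(1+\sigma^{2})\dot{\varphi}$, which is the derivative of $\tan(\varphi/4)$ and gives a quick sanity check on the formula.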

A lagrangian representation of satellite attitude dynamics based on MRPs descriptions can be obtained:

（9）

where $M$, $N$ and $H$ are functions.
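How a Lagrangian form with three functions $M$, $N$ and $H$ can arise is sketched below. The derivation assumes rigid-body dynamics $J_i\dot{\omega}_i = -\omega_i^{\times}J_i\omega_i + \tau_i$ and MRP kinematics $\dot{\sigma}_i = G\,\omega_i$ with invertible $G = G(\sigma_i)$; these starting equations and the resulting expressions are assumptions for illustration, and the patent's own definitions of $M$, $N$ and $H$ may be written differently.

```latex
\begin{aligned}
\omega_i &= G^{-1}\dot{\sigma}_i, \qquad
\dot{\omega}_i = G^{-1}\ddot{\sigma}_i - G^{-1}\dot{G}\,G^{-1}\dot{\sigma}_i, \\
M\,\ddot{\sigma}_i + N\,\dot{\sigma}_i &= H\,\tau_i, \\
M &= G^{-T} J_i\, G^{-1}, \qquad H = G^{-T}, \\
N &= -\,G^{-T} J_i\, G^{-1}\dot{G}\,G^{-1}
     - G^{-T}\left(J_i\, G^{-1}\dot{\sigma}_i\right)^{\times} G^{-1}.
\end{aligned}
```

The second line follows by substituting the first line into the rigid-body equation, using $-\omega^{\times}J\omega = (J\omega)^{\times}\omega$, and premultiplying by $G^{-T}$.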

Step 2: Formation problem description. The six-degree-of-freedom dynamic model of the satellite attitude-orbit coupling is obtained as follows:

（10）


A strongly nonlinear and highly coupled six-degree-of-freedom dynamic model of the satellite is thus obtained, in which the coupling effect of the attitude dynamics on the relative position dynamics can be seen. The purpose of this embodiment is to design an adaptive controller for a satellite formation whose members have nonlinear, attitude-orbit-coupled dynamics.

Step 3: Attitude controller design. The dynamic description of the satellite reference signals is given, the cost function of the satellite subsystem is defined and the Hamiltonian is obtained, and an optimal control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm:

(1) For the i-th satellite in the formation, the attitude dynamics in equation (9) can be rearranged as:

（11）

where the symbols denote, in order, the attitude state, the output, the control moment input and the nonlinear terms, and the nonlinear terms satisfy a condition involving a zero matrix.

(2) The reference signal dynamics for the satellite attitude are described as follows:

（12）

where the reference dynamics are given by an unknown smooth function, together with the corresponding output.

(3) From equations (11) and (12), the augmented dynamics of the i-th satellite can be obtained:

（13）

where one of the defined quantities represents the attitude tracking error. The attitude controller must be designed so that the attitude tracking error approaches zero.

(4) Using a reinforcement learning approach, a cost function is defined for the attitude subsystem and combined with the augmented dynamics of the i-th satellite to obtain the Hamiltonian. The optimal attitude control strategy is obtained from the stationarity condition and substituted back into the Hamiltonian, and an off-policy reinforcement learning optimal attitude control algorithm is designed, which yields a Bellman equation.

For a given control strategy, the Bellman equation is solved as follows:

Step 1: For the i-th satellite, set an initial control strategy that stabilizes the system and an exploration noise, and collect the required satellite attitude and control input data.

Step 2: Solve the Bellman equation using the given initial value of the iterative control and the data collected in step 1.

Step 3: Update the control strategy according to the solution and return to step 1 until the termination condition is met.

Step 4: End.

In this step, the Bellman equation is difficult to solve because the cost function and the control strategy are both nonlinear functions. Two neural networks are therefore introduced to estimate the cost function and the control strategy.
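The evaluate-then-improve structure of such an iteration can be illustrated on a toy problem. The sketch below is not the patent's algorithm: it swaps in a known scalar linear system $\dot{x} = ax + bu$ with cost $\int (qx^{2} + ru^{2})\,dt$, so the policy evaluation step has a closed form, whereas the patent's method is data-driven and uses neural networks. All names and numbers are assumptions for illustration.

```python
import math


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=50):
    """Model-based policy iteration for x_dot = a*x + b*u with feedback u = -k*x."""
    if b * k0 - a <= 0:
        raise ValueError("initial gain must stabilise the system (a - b*k0 < 0)")
    k = k0
    p = None
    for _ in range(max_iter):
        # Policy evaluation: V(x) = p*x^2 solves 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement from the stationarity condition of the Hamiltonian.
        k_new = b * p / r
        if abs(k_new - k) < tol:
            return p, k_new
        k = k_new
    return p, k


# Unstable open-loop plant (a = 1) with a stabilising initial gain k0 = 3.
p_opt, k_opt = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
p_riccati = 1.0 + math.sqrt(2.0)  # closed-form Riccati solution for these parameters
```

As in the steps above, the iteration needs a stabilising initial strategy; each pass then evaluates the current strategy and improves it until the update falls below a tolerance.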

(4) Using a reinforcement learning approach, a cost function is defined for the relative position subsystem to obtain the Bellman equation. The optimal value function that solves the Bellman equation yields the optimal relative position control strategy, which is substituted back into the Bellman equation; the result is solved with an off-policy reinforcement learning relative position control algorithm.

Similar to the attitude controller design, two neural networks (NNs) can be used to estimate the cost function and the control strategy, and the least squares (LS) method is used to update the neural networks under the persistent excitation provided by the exploration noise, so as to obtain the optimal position control strategy.
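The least squares update of a network with polynomial basis functions can be sketched on a toy scalar example. Everything below is an illustrative assumption: the two-term basis $(x^{2}, x^{4})$, the synthetic targets and the function names are hypothetical, and the patent's networks, data and regression targets are not specified in this text.

```python
def fit_weights(samples, targets, basis):
    """Least squares weights minimising sum((w . basis(x) - y)^2) for a two-term basis."""
    # Accumulate the 2x2 normal equations (Phi^T Phi) w = Phi^T y.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y in zip(samples, targets):
        p1, p2 = basis(x)
        a11 += p1 * p1
        a12 += p1 * p2
        a22 += p2 * p2
        b1 += p1 * y
        b2 += p2 * y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("data are not rich enough: normal matrix is singular")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def poly_basis(x):
    """Hypothetical polynomial basis for a scalar value-function approximation."""
    return (x * x, x**4)


xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x * x + 0.5 * x**4 for x in xs]  # synthetic targets with known weights
weights = fit_weights(xs, ys, poly_basis)
```

The singularity check is a small-scale analogue of the excitation requirement: if the collected samples do not vary enough, the normal matrix cannot be inverted and the weights are not identifiable.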

In the simulation test, a formation flight task of four satellites is executed. The parameters are as follows: the gravitational acceleration is $g = 9.8$ m/s² and the gravitational constant is $3.986\times10^{14}$ m³/s². The initial position and velocity of the virtual leading satellite are set to $[0\ \ 6871000\ \ 0]^{T}$ m and $[-7700\ \ 0\ \ 0]^{T}$ m/s respectively, which also determines the orbit parameters of the whole formation. The expected relative trajectories of the satellites with respect to the virtual leading satellite are, in order: $[20+30e^{-t}\ \ 0\ \ t]^{T}$ m, $[-20-30e^{-t}\ \ 0\ \ t]^{T}$ m, $[0\ \ 20+30e^{-t}\ \ t]^{T}$ m and $[0\ \ -20-30e^{-t}\ \ t]^{T}$ m.

The expected attitude of each satellite is $[0\ \ 0\ \ 0\ \ 0]^{T}$. In the geocentric inertial coordinate system, the initial positions of the four satellites are $[50\ \ -5\ \ 0]^{T}$ m, $[-50\ \ -5\ \ 2]^{T}$ m, $[2\ \ 50\ \ 1]^{T}$ m and $[5\ \ -50\ \ 0]^{T}$ m; the initial velocities are $[1.5\ \ 0.05\ \ 0.6]^{T}$ m/s for all four satellites; and the initial attitudes are $[0.1\ \ 0.1\ \ 0.1]^{T}$, $[0.09\ \ 0.09\ \ 0.09]^{T}$, $[0.1\ \ 0.1\ \ 0.1]^{T}$ and $[0.09\ \ 0.09\ \ 0.09]^{T}$ respectively. The relevant parameters of the reinforcement learning algorithm are set, polynomial functions are selected as the basis functions of the neural networks, and the initial control strategy and the exploration noise of each satellite are set in the adaptive algorithm before the simulation is run. Fig. 3 shows the 3D flight of the four satellites, fig. 4 shows the attitude tracking errors of the four satellites, and fig. 5 shows their relative position tracking errors; in each figure the four curves correspond to satellites 1 to 4. It can be seen that, after the optimal control strategy has been learned iteratively, the tracking error of each satellite converges to zero. In contrast to model-based satellite control methods, the optimal formation control algorithm in this embodiment can achieve optimal control when the dynamics or parameters of the satellite are uncertain.
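The expected relative trajectories of the simulation test can be written as a small helper, which is convenient when reproducing the tracking-error plots. The function name is hypothetical, and the components are transcribed as they appear in the text (including the third component equal to $t$), so any transcription error in the source carries over.

```python
import math


def desired_relative_positions(t):
    """Expected relative positions (m) of satellites 1 to 4 w.r.t. the virtual leader at time t (s)."""
    d = 20.0 + 30.0 * math.exp(-t)  # in-plane offset that decays from 50 m towards 20 m
    return [
        [d, 0.0, t],
        [-d, 0.0, t],
        [0.0, d, t],
        [0.0, -d, t],
    ]


start = desired_relative_positions(0.0)
```

At $t = 0$ the four references sit 50 m from the virtual leader along the ±x and ±y axes, and the in-plane offset contracts towards 20 m as $t$ grows.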

The above are only some of the embodiments of the present application. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the inventive concept.

Claims

1. The self-adaptive satellite attitude and orbit control method based on reinforcement learning is characterized by comprising the following steps of:

step 1: for a formation composed of a group of actual satellites, a virtual leading satellite is adopted as the reference of the whole formation; an orbit dynamics model of the virtual leading satellite is established, the relative position dynamics between each actual satellite and the virtual leading satellite are obtained, and the attitude motion equation of each actual satellite is combined to obtain a Lagrangian representation of the attitude dynamics of each actual satellite based on the modified Rodrigues parameter description; specifically:

$$\ddot{r}_0(t) = g_0(t) = -\frac{\mu\, r_0(t)}{\lVert r_0(t)\rVert^{3}} \qquad (1)$$

where $r_0$ is the position vector of the virtual leading satellite; $\lVert r_0\rVert$ is the Euclidean norm of $r_0$; $g_0$ is the gravitational acceleration vector of the virtual leading satellite; $\mu$ is the gravitational constant; and $t$ is time;

1.2 establishing an orbit dynamics model of each actual satellite consisting of a position vector, a gravitational acceleration vector and a position control force vector:

$$\ddot{r}_i(t) = g_i(t) + f_i(t), \qquad g_i(t) = -\frac{\mu\, r_i(t)}{\lVert r_i(t)\rVert^{3}} \qquad (2)$$

where $r_i$ is the position vector of the i-th actual satellite; $g_i$ is the gravitational acceleration vector of the i-th actual satellite; $f_i$ is the position control force vector of the i-th actual satellite; $\lVert r_i\rVert$ is the Euclidean norm of $r_i$; and $\mu$ is the gravitational constant;

1.3 obtaining the relative position and the relative position dynamics between each actual satellite and the virtual leading satellite from the position vector $r_i$ of each actual satellite and the position vector $r_0$ of the virtual leading satellite;

the relative position $\rho_i$ of the i-th satellite with respect to the virtual leading satellite is:

$$\rho_i = r_i - r_0 \qquad (3)$$

the relative position dynamics between the i-th satellite and the virtual leading satellite, with $g_i$ and $g_0$ the corresponding gravitational acceleration vectors and $f_i$ the position control force vector of the i-th satellite, are:

$$\ddot{\rho}_i = g_i - g_0 + f_i \qquad (4)$$

1.4 projecting the relative position dynamics obtained in step 1.3 into the body coordinate system of each actual satellite; combining the actual satellite attitude motion equation, which comprises angular velocity, inertia and control moment, to obtain a Lagrangian representation of the actual satellite attitude dynamics based on the modified Rodrigues parameter description:

where $\sigma_i$ is the attitude of the i-th satellite; $M$, $N$ and $H$ are functions; $\tau_i$ is the control moment of the i-th satellite; $e_i$ is the unit vector along the Euler axis of the i-th satellite; $\varphi_i$ is the rotation angle of the i-th satellite about the Euler axis; $I_3$ is the identity matrix; $(\cdot)^{\times}$ denotes the standard skew-symmetric matrix; $J_i$ is the moment of inertia of the i-th satellite; and $\dot{\omega}_i$ is the angular acceleration of the i-th satellite;

step 2: for the actual satellite formation, a six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite is obtained; each actual satellite comprises six thrusters serving as position control actuators, and the relative position dynamics between each actual satellite and the virtual leading satellite are expressed in the body coordinate system as the six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite;

2. The adaptive satellite attitude and orbit control method based on reinforcement learning according to claim 1, wherein the step 3 is specifically:

3.3 defining a cost function for the satellite attitude subsystem;

3. The method for controlling the attitude and orbit of the adaptive satellite based on reinforcement learning according to claim 2, wherein in the step 4,

4.3 defining a cost function for the relative position subsystem;