CN115771624B - Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Info

Publication number
CN115771624B
Authority
CN
China
Prior art keywords
satellite
attitude
actual
relative position
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310101472.5A
Other languages
Chinese (zh)
Other versions
CN115771624A (en)
Inventor
刘昊
吕金虎
钟森
高庆
刘德元
王田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Application filed by Beihang University and Academy of Mathematics and Systems Science of CAS
Priority to CN202310101472.5A
Publication of CN115771624A
Application granted
Publication of CN115771624B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the field of satellite control, and in particular to a self-adaptive satellite attitude and orbit control method based on reinforcement learning. For a formation of actual satellites, the method takes a virtual leading satellite as the reference and establishes its orbit dynamics model, then combines the attitude motion equation to obtain a Lagrangian representation of satellite attitude dynamics described by modified Rodrigues parameters (MRPs). From these it obtains a six-degree-of-freedom dynamic model of the attitude-orbit coupling of each satellite, and then a satellite attitude dynamics control model. Combining this with the dynamic description of the satellite reference signal, the method defines a cost function for the satellite subsystem and obtains the Hamiltonian. An optimal attitude control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm, and the optimal relative position control strategy is obtained in the same way. The method uses neural networks to estimate the optimal control strategy of the nonlinear model directly, adapts well when some satellite parameters are unknown, and designs the optimal control for each thruster.

Description

Self-adaptive satellite attitude and orbit control method based on reinforcement learning
Technical Field
The present invention relates to the field of satellite control, and in particular to a self-adaptive satellite attitude and orbit control method based on reinforcement learning.
Background
Satellite formation flying has attracted a great deal of attention in fields such as Earth observation, astronomical observation, and satellite-to-ground communication. Compared with a single satellite, a group of satellites is more flexible and offers better mission returns, because the satellites can carry different payloads and be combined freely according to the mission. Satellite formations therefore have great potential in many fields. However, because of the complex coupling between the attitude and position dynamics of a satellite, the dynamic model of each satellite is highly nonlinear and involves rotation-translation coupling across six degrees of freedom. Furthermore, accurate satellite dynamics parameters are difficult to obtain, which makes it challenging to achieve optimal formation flight of a multi-satellite system.
Current methods for satellite attitude and orbit control mainly fall into the following categories:
1. The six-degree-of-freedom satellite dynamic model is simplified into a linear model, and a traditional optimal control method is applied to obtain an optimal control strategy.
2. For a nonlinear satellite dynamics model, feedback linearization is used to partially cancel the nonlinearity, and an optimal control strategy is then obtained from an optimization method (such as the Riccati differential equation method or the
Figure SMS_1
method) combined with a traditional optimal control method.
3. For satellite position control, the control input is given in the LVLH (local vertical, local horizontal) coordinate system, without considering the coupling effect of attitude on the relative position.
For example, CN105068546A discloses an adaptive neural network configuration containment control method for the relative orbits of a satellite formation. The method establishes the relative orbit dynamics equation of the follower satellites, designs a distributed velocity observer for each follower satellite, uses a neural network to approximate the relative orbit dynamics equation of the follower satellites and the distributed velocity observer, and designs the adaptive neural network configuration containment control algorithm from the approximation result. CN105138010A discloses a distributed finite-time tracking control method for formation satellites. It establishes a two-satellite relative motion dynamics model, forms the relative motion dynamics model of each satellite with respect to a reference point, and designs a distributed finite-time tracking control law, which addresses the problem of limited communication among formation satellites.
However, each of the above approaches has problems. For the first, the dynamics of a satellite in actual operation are strongly nonlinear, so a control strategy derived from a linearized model is not optimal. For the second, both the feedback linearization step and the conventional optimal controller design require model information about the controlled object, whereas the dynamic model of an actual satellite is uncertain (for example in its mass distribution and moment of inertia). Such techniques are not applicable when the parameters and the model information of the controlled object are unknown. For the third, the position actuators (such as thrusters) of an actual satellite are fixed to the satellite body, so the input matrix of the position control depends on the attitude, and a change in satellite attitude significantly affects the thrust distribution among the thrusters. The optimal control input obtained by such techniques is optimal in the LVLH coordinate system, not for each thruster in the body coordinate system.
Disclosure of Invention
To address the problems in the prior art, the invention first establishes a complete six-degree-of-freedom nonlinear model of the satellite. In contrast to the first type of prior art, which simplifies the satellite dynamics into a linearized model, the invention uses a neural network to estimate the optimal control strategy of the nonlinear model directly, so as to cope with the nonlinearity. In contrast to the second type, the proposed algorithm uses reinforcement learning to perform parameter identification from measurable data, so it does not require an accurate satellite model. It adapts well when some satellite parameters are unknown, identifying the parameters and learning the true optimal control strategy through interaction with the environment. In contrast to the third type, the invention projects the dynamics of the satellite's position relative to the reference point into the body coordinate system. The relative position dynamics observed in the body coordinate system contain the coupling of attitude on position, and the optimal control is designed for each thruster in the body coordinate system.
The complete technical scheme of the invention is as follows:
A self-adaptive satellite attitude and orbit control method based on reinforcement learning comprises the following steps:
Step 1: For a formation composed of actual satellites, a virtual leading satellite is adopted as the reference for the whole formation. An orbit dynamics model of the virtual leading satellite is established, the relative position dynamics between each actual satellite and the virtual leading satellite are obtained, and the attitude motion equation of each actual satellite is combined to obtain a Lagrangian representation of the attitude dynamics of each actual satellite described by modified Rodrigues parameters;
Step 2: For the formation of actual satellites, a six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite is obtained;
Step 3: Attitude controller design. For the formation of actual satellites, the attitude subsystem of each actual satellite is obtained from the attitude dynamics control model of the actual satellite combined with the dynamic description of the actual satellite's reference signal; a cost function is defined for the attitude subsystem, and an optimal attitude control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm;
Step 4: Relative position controller design. For the formation of actual satellites, the relative position subsystem of each actual satellite is obtained from the relative position dynamics between the actual satellite and the virtual leading satellite combined with the relative position reference dynamics of each actual satellite; a cost function is defined for the relative position subsystem, and an optimal relative position control strategy is obtained by designing an off-policy reinforcement learning relative position control algorithm.
Step 1 specifically comprises the following:
1.1 Establish a geocentric inertial coordinate system for the virtual leading satellite and a body coordinate system for each actual satellite; in the geocentric inertial coordinate system, establish an orbit dynamics model of the virtual leading satellite consisting of a position vector and a gravitational acceleration vector;
1.2 Establish an orbit dynamics model of each actual satellite consisting of a position vector, a gravitational acceleration vector, and a position control force vector;
1.3 Obtain the relative position and the relative position dynamics between each actual satellite and the virtual leading satellite from the position vector of each actual satellite and the position vector of the virtual leading satellite;
1.4 Project the relative position dynamics obtained in step 1.3 into the body coordinate system of each actual satellite, and combine the attitude motion equation of the actual satellite, which involves the angular velocity, inertia, and control moment, to obtain a Lagrangian representation of the actual satellite attitude dynamics described by modified Rodrigues parameters.
In step 2, each actual satellite includes six thrusters as position control actuators, and the relative position dynamics between each actual satellite and the virtual leading satellite are expressed in the body coordinate system as a six-degree-of-freedom dynamic model of the attitude-orbit coupling of each actual satellite.
Step 3 specifically comprises the following:
3.1 Express the attitude dynamics of each actual satellite as a function of the attitude state, the output, and the control moment input;
3.2 Combine the dynamic description of the satellite reference signal to obtain the attitude subsystem of each actual satellite;
3.3 Define a cost function for the satellite attitude subsystem;
3.4 Combine the cost function of the attitude subsystem of each actual satellite to obtain the Hamiltonian, and obtain the optimal attitude control strategy through the off-policy reinforcement learning optimal attitude control algorithm.
Step 4 specifically comprises the following:
4.1 Express the relative position dynamics between each actual satellite and the virtual leading satellite as a function of the relative position state, the output, and the control acceleration input;
4.2 Combine the relative position reference dynamics of each actual satellite to obtain the relative position subsystem of each actual satellite;
4.3 Define a cost function for the relative position subsystem;
4.4 Combine the cost function of the relative position subsystem of each actual satellite to obtain the Bellman equation, and obtain the optimal relative position control strategy through the off-policy reinforcement learning relative position control algorithm.
Compared with the prior art, the invention has the following advantages:
1. A complete nonlinear model of the satellite is established, and a neural network is used to estimate the optimal control strategy of the nonlinear model directly, so as to cope with the nonlinearity.
2. A reinforcement learning algorithm performs parameter identification from measurable data, so an accurate satellite model is not required to learn the optimal control law. The method also adapts well when some satellite parameters are unknown: the parameters are identified through interaction with the environment and the true optimal control strategy is learned.
3. The dynamics of the satellite's position relative to the reference point are projected into the body coordinate system. The relative position dynamics observed in the body coordinate system contain the coupling of attitude on position, and the optimal control is designed for each thruster in the body coordinate system.
Drawings
FIG. 1 is a schematic view of a satellite formation coordinate system of the disclosed method;
FIG. 2 is a schematic diagram of a satellite formation model;
FIG. 3 is a simulated 3D flight effect diagram of the satellite of the present invention;
FIG. 4 is a graph of the satellite simulated attitude tracking error of the present invention;
FIG. 5 is a graph of the relative position tracking error of the satellite simulation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only examples and are not intended to limit the present application.
The present invention will be described in detail with reference to the following examples and drawings, but it should be understood that the examples and drawings are only for illustrative purposes and are not intended to limit the scope of the present invention in any way. All reasonable variations and combinations that are included within the scope of the inventive concept fall within the scope of the present invention.
The technical scheme adopted by the invention is a self-adaptive satellite attitude and orbit control method based on reinforcement learning, with the specific steps described below.
Unless otherwise specified, the notation used in this embodiment is as follows.
Figure SMS_2
denotes a matrix with i rows and j columns; for example,
Figure SMS_3
means that
Figure SMS_4
is a matrix with 3 rows and 1 column, i.e. a 3-dimensional column vector. A symbol with dots above it denotes the time derivative of the quantity under the dots, and the number of dots gives the order of the derivative. For example,
Figure SMS_5
is the angular velocity of the i-th satellite,
Figure SMS_6
is the first derivative of the angular velocity of the i-th satellite, i.e. its angular acceleration, and
Figure SMS_7
is the second derivative of the angular velocity of the i-th satellite.
Figure SMS_8
denotes the standard skew-symmetric matrix.
The superscript T denotes the matrix transpose; for example,
Figure SMS_9
denotes the transpose of
Figure SMS_10
The superscript
Figure SMS_11
denotes the matrix inverse; for example, R +n -1 is the inverse matrix of R +n. The superscript
Figure SMS_12
denotes the adjoint (adjugate) matrix. Parameters that are not explicitly defined in this embodiment are intermediate variables of the derivation and have no physical meaning of their own.
Step 1: Satellite dynamics model. A geocentric inertial coordinate system and a body coordinate system for each satellite are established. Consider a group of rigid satellites orbiting the Earth. Each satellite carries six thrusters as its position control actuators, as shown in FIG. 1, and the attitude control moments are given in the body coordinate system. To simplify the description of the formation, a virtual leader is introduced as the reference for the whole formation, and each satellite must maintain a desired relative position with respect to the virtual leader so as to form the whole formation, as shown in FIG. 2. An orbit dynamics model of the virtual leading satellite is established in the geocentric inertial coordinate system, and an orbit dynamics model of each satellite is given. From these, the relative position dynamics between each satellite and the virtual leading satellite are derived and projected into the body coordinate system of each satellite. Combined with the attitude motion equation of the satellite, this gives the Lagrangian representation of satellite attitude dynamics described by MRPs. Specifically:
1.1 The orbit dynamics model of the virtual leading satellite is given in terms of its position vector, the Euclidean norm of that vector, its gravitational acceleration vector, and the Earth's gravitational constant:
Figure SMS_13
(1)
where
Figure SMS_14
is the position vector of the virtual leading satellite;
Figure SMS_15
denotes the Euclidean norm of
Figure SMS_16
Figure SMS_17
is the gravitational acceleration vector of the virtual leading satellite;
Figure SMS_18
is the Earth's gravitational constant; and
Figure SMS_19
is time.
1.2 The orbit dynamics model of the i-th actual satellite is established as follows:
Figure SMS_20
(2)
where
Figure SMS_21
is the position vector of the i-th actual satellite;
Figure SMS_22
is the gravitational acceleration vector of the i-th actual satellite;
Figure SMS_23
is the position control force vector of the i-th actual satellite; and
Figure SMS_24
denotes the Euclidean norm of
Figure SMS_25
1.3 The relative position of the i-th satellite with respect to the virtual leading satellite, denoted
Figure SMS_26
is given by:
Figure SMS_27
(3)
1.4 The relative position dynamics between the i-th satellite and the virtual leading satellite are:
Figure SMS_28
(4)
1.5 Because the actuators (thrusters) that control the relative position are fixed to the satellite body, for controller design the relative position dynamics between the i-th satellite and the virtual leading satellite are projected into the satellite body coordinate system, giving:
Figure SMS_29
(5)
where
Figure SMS_30
is the angular velocity of the i-th satellite;
Figure SMS_31
is the coordinate transformation matrix from the geocentric inertial coordinate system to the body coordinate system of the i-th satellite;
Figure SMS_32
is the relative position dynamics between the i-th satellite and the virtual leading satellite in the body coordinate system;
Figure SMS_33
is the angular acceleration of the i-th satellite; and
Figure SMS_34
denotes the standard skew-symmetric matrix.
Because the relative position dynamic model contains the angular velocity and angular acceleration, the satellite attitude has a coupling effect on the relative position dynamics.
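The projection in step 1.5 can be illustrated with a minimal numerical sketch. Since equations (1) to (5) are available here only as figure placeholders, the code assumes the standard two-body gravity model and the usual rotating-frame (transport theorem) expansion, and it treats the position control term as an acceleration expressed in the inertial frame. The function names, argument names, and sign conventions are illustrative assumptions and may differ from the patent's exact formulas.

```python
import numpy as np

MU = 3.986e14  # Earth's gravitational constant in m^3/s^2 (value quoted in the simulation section)

def skew(v):
    """Standard skew-symmetric matrix, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def gravity(r):
    """Two-body gravitational acceleration at inertial position r (assumed model)."""
    return -MU * r / np.linalg.norm(r) ** 3

def relative_accel_inertial(r0, ri, u_inertial):
    """Second derivative of rho = r_i - r_0 in the inertial frame.

    r0: virtual leader position, ri: satellite position,
    u_inertial: control acceleration of the satellite (assumption: force per unit mass).
    """
    return gravity(ri) - gravity(r0) + u_inertial

def relative_accel_body(R, omega, omega_dot, rho_b, rho_b_dot, rho_ddot_inertial):
    """Relative acceleration seen in the body frame.

    R maps inertial to body coordinates; omega and omega_dot are the body angular
    velocity and acceleration in body coordinates. The Coriolis, Euler, and
    centripetal terms carry the attitude-to-position coupling.
    """
    S, S_dot = skew(omega), skew(omega_dot)
    return R @ rho_ddot_inertial - 2.0 * S @ rho_b_dot - S_dot @ rho_b - S @ S @ rho_b
```

With zero angular velocity and an identity rotation the body-frame and inertial-frame accelerations coincide; any nonzero angular velocity or angular acceleration adds the coupling terms that the paragraph above refers to.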
1.6 The satellite attitude dynamics are described as follows:
Figure SMS_35
(6)
where
Figure SMS_36
is the inertia matrix (moment of inertia) of the i-th satellite; and
Figure SMS_37
denotes the control moment of the i-th satellite.
To avoid singularities in the attitude description, the invention describes the attitude of the i-th satellite using modified Rodrigues parameters (MRPs)
Figure SMS_38
as follows:
Figure SMS_39
(7)
where
Figure SMS_40
is the unit vector along the Euler axis of the i-th satellite; and
Figure SMS_41
is the rotation angle of the i-th satellite about the Euler axis.
The attitude kinematics can be written as:
Figure SMS_42
(8)
where
Figure SMS_43
is the first derivative of the attitude
Figure SMS_44
of the i-th satellite;
Figure SMS_45
Figure SMS_46
and
Figure SMS_47
are identity matrices.
A Lagrangian representation of the satellite attitude dynamics described by MRPs can then be obtained:
Figure SMS_48
(9)
where:
Figure SMS_49
Figure SMS_50
m is the
Figure SMS_51
function; n is the
Figure SMS_52
function; and h is the
Figure SMS_53
function.
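As a rough illustration of the quantities in step 1.6, the sketch below implements the textbook MRP definition, the textbook MRP kinematic equation, and Euler's rigid-body equation. Equations (6) to (9) are available here only as figure placeholders, so the scaling factors and sign conventions used in the code are assumptions taken from the standard MRP formulation and may differ from the patent's. All function names are illustrative.

```python
import numpy as np

def skew(v):
    """Standard skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def mrp_from_axis_angle(e, phi):
    """MRP vector sigma = e * tan(phi / 4) for unit Euler axis e and rotation angle phi."""
    return np.asarray(e, dtype=float) * np.tan(phi / 4.0)

def mrp_rate(sigma, omega):
    """Textbook MRP kinematics: sigma_dot = G(sigma) @ omega."""
    s2 = sigma @ sigma
    G = 0.25 * ((1.0 - s2) * np.eye(3) + 2.0 * skew(sigma) + 2.0 * np.outer(sigma, sigma))
    return G @ omega

def angular_accel(J, omega, tau):
    """Euler's equation J @ omega_dot = tau - omega x (J @ omega), solved for omega_dot."""
    return np.linalg.solve(J, tau - np.cross(omega, J @ omega))
```

The MRP vector stays finite for rotation angles up to, but not including, a full turn, which is why the description states that it avoids singularities in the attitude description over the operating range of interest.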
Step 2: Formation problem description. The six-degree-of-freedom dynamic model of satellite attitude-orbit coupling is obtained as follows:
Figure SMS_54
(10)
where:
Figure SMS_55
Figure SMS_56
and
Figure SMS_57
Figure SMS_58
This gives a strongly nonlinear, highly coupled six-degree-of-freedom satellite dynamic model, in which the coupling effect of the attitude dynamics on the relative position dynamics can be seen. The purpose of this embodiment is to design an adaptive controller for a formation of satellites with nonlinear, attitude-orbit coupled dynamics.
Step 3: Attitude controller design. The dynamic description of the satellite reference signal is given, the cost function of the satellite subsystem is defined, the Hamiltonian is obtained, and the optimal control strategy is obtained by designing an off-policy reinforcement learning optimal attitude control algorithm:
(1) For the i-th satellite in the formation, the attitude dynamics in equation (9) can be rearranged as:
Figure SMS_59
(11)
where
Figure SMS_60
is the attitude state,
Figure SMS_61
is the output,
Figure SMS_62
is the control moment input, and
Figure SMS_63
Figure SMS_64
are nonlinear terms that satisfy:
Figure SMS_65
where
Figure SMS_66
is a zero matrix.
(2) The reference signal dynamics for the attitude of the satellite are described as follows:
Figure SMS_67
(12)
where
Figure SMS_68
is an unknown smooth function, and
Figure SMS_69
is the output.
(3) From equations (11) and (12), the augmented dynamics of the i-th satellite are obtained:
Figure SMS_70
(13)
where:
Figure SMS_71
Figure SMS_72
Figure SMS_73
,
Figure SMS_74
and
Figure SMS_75
denotes the attitude tracking error. The attitude controller must be designed so that the attitude tracking error converges to 0.
(4) Using reinforcement learning, a cost function is defined for the attitude subsystem and combined with the augmented dynamics of the i-th satellite to obtain the Hamiltonian. The optimal attitude control strategy is obtained from the stationarity condition and substituted into the Hamiltonian to obtain the Bellman equation, and an off-policy reinforcement learning optimal attitude control algorithm is designed to solve it.
For a given control strategy
Figure SMS_76
the Bellman equation is solved as follows:
Step 1: For the i-th satellite, set an initial control strategy that stabilizes the system and an exploration noise, and collect the required satellite attitude and control input data.
Step 2: Solve the Bellman equation using the given initial iterate of the control strategy and the data collected in step 1.
Step 3: Update the control strategy according to the solution and return to step 1 until the termination condition is met.
Step 4: End.
In this procedure, the Bellman equation is difficult to solve because the cost function and the control strategy are both nonlinear functions. Two neural networks are therefore introduced to estimate the cost function and the control strategy.
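The iteration above can be illustrated on a deliberately simplified problem. The sketch below is not the patent's algorithm: it replaces the nonlinear satellite dynamics with a hypothetical scalar linear plant x' = a x + b u and the two neural networks with the single basis functions p x^2 (cost) and k x (policy). It does keep the off-policy structure: data are collected once under a stabilizing behaviour policy with exploration noise, and each iteration solves the integral Bellman equation by least squares without using a or b. All names and numerical values are assumptions made for illustration.

```python
import numpy as np

def collect_data(a, b, k0, dt=1e-3, t_end=10.0, x0=1.0):
    """Simulate x' = a*x + b*u under the behaviour policy u = -k0*x + e(t) with RK4."""
    n = int(round(t_end / dt))
    t = np.arange(n + 1) * dt
    # Exploration noise as a sum of sinusoids (hypothetical choice).
    noise = lambda s: 0.5 * (np.sin(1.3 * s) + np.sin(3.7 * s) + np.sin(7.1 * s))
    f = lambda s, x: a * x + b * (-k0 * x + noise(s))
    x = np.empty(n + 1)
    x[0] = x0
    for j in range(n):
        s, xj = t[j], x[j]
        k1 = f(s, xj)
        k2 = f(s + dt / 2, xj + dt / 2 * k1)
        k3 = f(s + dt / 2, xj + dt / 2 * k2)
        k4 = f(s + dt, xj + dt * k3)
        x[j + 1] = xj + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    u = -k0 * x + noise(t)
    return t, x, u

def off_policy_iteration(t, x, u, q, r, k_init, window=100, iters=10):
    """Off-policy policy iteration for the cost integral of q*x^2 + r*u^2.

    For the policy u = -k*x, every data window [t, t+T] satisfies
      p*(x(t+T)^2 - x(t)^2) - 2*r*k_next*int(x*(u + k*x)) = -(q + r*k^2)*int(x^2),
    which is linear in the unknowns (p, k_next) and does not involve a or b.
    """
    dt = t[1] - t[0]
    idx = np.arange(0, len(t) - window, window)

    def integrate(y):
        # Trapezoidal integral of y over each data window.
        c = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)))
        return c[idx + window] - c[idx]

    dx2 = x[idx + window] ** 2 - x[idx] ** 2
    ixx, ixu = integrate(x * x), integrate(x * u)
    k = k_init
    for _ in range(iters):
        regressors = np.column_stack([dx2, -2.0 * r * (ixu + k * ixx)])
        target = -(q + r * k * k) * ixx
        p, k = np.linalg.lstsq(regressors, target, rcond=None)[0]
    return p, k

# Example: unstable plant with a = b = q = r = 1 and stabilizing initial gain k0 = 3.
t, x, u = collect_data(a=1.0, b=1.0, k0=3.0)
p_hat, k_hat = off_policy_iteration(t, x, u, q=1.0, r=1.0, k_init=3.0)
```

For this plant the Riccati equation gives the optimal gain k = 1 + sqrt(2), so the learned gain can be checked against a known answer. In the nonlinear setting of the patent, the scalar unknowns p and k would be replaced by the weight vectors of the two neural networks, with the least-squares step acting on those weights.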
(4) Using reinforcement learning, a cost function is defined for the relative position subsystem to obtain the Bellman equation. The optimal value function that solves the Bellman equation gives the optimal relative position control strategy, which is substituted into the Bellman equation; the resulting equation is solved by the off-policy reinforcement learning relative position control algorithm.
As in the attitude controller design, two neural networks (NNs) can be used to estimate the cost function and the control strategy, and the least squares (LS) method is used to update the neural networks under persistent excitation provided by the exploration noise, yielding the optimal position control strategy.
In the simulation test, a formation flight task with four satellites is executed. The parameters are as follows: gravitational acceleration $g = 9.8\ \mathrm{m/s^2}$ and gravitational constant $3.986 \times 10^{14}\ \mathrm{m^3/s^2}$. The initial position and velocity of the virtual leading satellite are set to $[0\ \ 6871000\ \ 0]^T$ m and $[-7700\ \ 0\ \ 0]^T$ m/s respectively, which also determines the orbit parameters of the whole formation. The expected relative trajectories of the satellites with respect to the virtual leading satellite are, in order: $[20+30e^{-t}\ \ 0\ \ t]^T$ m, $[-20-30e^{-t}\ \ 0\ \ t]^T$ m, $[0\ \ 20+30e^{-t}\ \ t]^T$ m, and $[0\ \ -20-30e^{-t}\ \ t]^T$ m.
The expected attitude of each satellite is the zero vector. The initial positions of the four satellites in the geocentric inertial coordinate system are $[50\ \ -5\ \ 0]^T$ m, $[-50\ \ -5\ \ 2]^T$ m, $[2\ \ 50\ \ 1]^T$ m, and $[5\ \ -50\ \ 0]^T$ m. The initial velocity of each of the four satellites is $[1.5\ \ 0.05\ \ 0.6]^T$ m/s, and the initial attitudes are $[0.1\ \ 0.1\ \ 0.1]^T$, $[0.09\ \ 0.09\ \ 0.09]^T$, $[0.1\ \ 0.1\ \ 0.1]^T$, and $[0.09\ \ 0.09\ \ 0.09]^T$. The relevant parameters of the reinforcement learning algorithm are set, polynomial functions are selected as the basis functions of the neural networks, and the initial control strategy and the exploration noise of each satellite are set in the adaptive algorithm before the simulation is run. FIG. 3 shows the 3D flight paths of the four satellites, FIG. 4 shows their attitude tracking errors, and FIG. 5 shows their relative position tracking errors; in each figure the four curves correspond to satellites 1 to 4. It can be seen that after the optimal control strategy has been learned iteratively, the tracking error of each satellite converges to zero. In contrast to model-based satellite control methods, the optimal formation control algorithm of this embodiment achieves optimal control when the dynamics or parameters of the satellites are uncertain.
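The reference trajectories of the simulation can be written out as a small helper, which makes it easy to see how far each satellite starts from its target. The sketch assumes that the expected relative trajectories read [20+30e^{-t}, 0, t], [-20-30e^{-t}, 0, t], [0, 20+30e^{-t}, t], and [0, -20-30e^{-t}, t] in metres, and that the listed initial positions are relative to the virtual leading satellite; both readings are assumptions about the source text, and the function name is illustrative.

```python
import numpy as np

def desired_relative_positions(t):
    """Expected relative positions (m) of satellites 1 to 4 at time t (s), one row per satellite."""
    d = 20.0 + 30.0 * np.exp(-t)
    return np.array([[d, 0.0, t],
                     [-d, 0.0, t],
                     [0.0, d, t],
                     [0.0, -d, t]])

# Initial relative positions as read from the simulation setup (assumed reading).
initial_positions = np.array([[50.0, -5.0, 0.0],
                              [-50.0, -5.0, 2.0],
                              [2.0, 50.0, 1.0],
                              [5.0, -50.0, 0.0]])

# Initial relative position tracking error of each satellite.
initial_error = initial_positions - desired_relative_positions(0.0)
```

Under this reading each satellite starts within a few metres of its reference, and the in-plane separation from the virtual leader shrinks from 50 m toward 20 m as the exponential term decays.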
The above are only some embodiments of the present application. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the inventive concept.

Claims (3)

1. A self-adaptive satellite attitude and orbit control method based on reinforcement learning, characterized by comprising the following steps:
Step 1: for a formation composed of actual satellites, adopting a virtual leading satellite as the reference for the whole formation; establishing an orbit dynamics model of the virtual leading satellite, obtaining the relative position dynamics between each actual satellite and the virtual leading satellite, and combining the attitude motion equation of each actual satellite to obtain a Lagrangian representation of the attitude dynamics of each actual satellite described by modified Rodrigues parameters; specifically:
1.1 establishing a geocentric inertial coordinate system for the virtual leading satellite and a body coordinate system for each actual satellite; in the geocentric inertial coordinate system, establishing an orbit dynamics model of the virtual leading satellite consisting of a position vector and a gravitational acceleration vector:
Figure QLYQS_1
(1)
where
Figure QLYQS_2
is the position vector of the virtual leading satellite;
Figure QLYQS_3
denotes the Euclidean norm of
Figure QLYQS_4
Figure QLYQS_5
is the gravitational acceleration vector of the virtual leading satellite;
Figure QLYQS_6
is the Earth's gravitational constant; and
Figure QLYQS_7
is time;
1.2 establishing an orbit dynamics model of each actual satellite consisting of a position vector, a gravitational acceleration vector, and a position control force vector:
Figure QLYQS_8
(2)
where
Figure QLYQS_9
is the position vector of the i-th actual satellite;
Figure QLYQS_10
is the gravitational acceleration vector of the i-th actual satellite;
Figure QLYQS_11
is the position control force vector of the i-th actual satellite; and
Figure QLYQS_12
denotes the Euclidean norm of
Figure QLYQS_13
1.3, obtaining the relative position and the relative position dynamic between each actual satellite and the virtual leading satellite according to the position vector of each actual satellite and the position vector of the virtual leading satellite;
relative position of ith satellite and virtual leading satellite
Figure QLYQS_14
The following are provided:
Figure QLYQS_15
(3)
first, theiThe relative positions of the satellites and the virtual leading satellites are dynamically as follows:
Figure QLYQS_16
(4)
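Equations (3) and (4) are likewise only image placeholders, so the sketch below rests on stated assumptions rather than on the patent's formulas: the relative position is taken as the actual satellite's position minus the virtual leader's, each satellite follows point-mass gravity, and the position control force enters as an acceleration (force per unit mass). All function names are hypothetical.

```python
import math


def gravity_accel(r, mu):
    """Point-mass gravitational acceleration -mu * r / ||r||^3 (assumed model)."""
    norm = math.sqrt(sum(c * c for c in r))
    k = -mu / norm ** 3
    return tuple(k * c for c in r)


def relative_position(r_i, r_0):
    """Relative position of satellite i with respect to the virtual leader.

    The sign convention r_i - r_0 is an assumption.
    """
    return tuple(a - b for a, b in zip(r_i, r_0))


def relative_accel(r_i, r_0, u_i, mu):
    """Second derivative of the relative position under the assumptions above:
    (gravity at satellite i) - (gravity at the leader) + (control acceleration).
    """
    g_i = gravity_accel(r_i, mu)
    g_0 = gravity_accel(r_0, mu)
    return tuple(gi - g0 + ui for gi, g0, ui in zip(g_i, g_0, u_i))


if __name__ == "__main__":
    mu = 3.986004418e14                 # assumed Earth value, m^3/s^2
    r_0 = (7.0e6, 0.0, 0.0)             # virtual leading satellite
    r_i = (7.0e6, 100.0, 0.0)           # actual satellite, 100 m cross-track
    print(relative_position(r_i, r_0))
    print(relative_accel(r_i, r_0, (0.0, 0.0, 0.0), mu))
```

Because the virtual leader carries no control input, the only terms that survive the subtraction are the differential gravity and the actual satellite's own control term.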
1.4 Projecting the relative position dynamics obtained in step 1.3 onto the body coordinate system of each actual satellite; combining the actual satellite attitude equation of motion, which comprises angular velocity, inertia and control moment, to obtain the Lagrangian representation of the actual satellite attitude dynamics described by modified Rodrigues parameters:
Figure QLYQS_17
where Figure QLYQS_18 is the attitude of the i-th satellite; m is a function of Figure QLYQS_19; n is a function of Figure QLYQS_20; h is a function of Figure QLYQS_21; and Figure QLYQS_22 denotes the control moment of the i-th satellite; wherein:
Figure QLYQS_23
where Figure QLYQS_24 is the unit vector along the Euler axis of the i-th satellite and Figure QLYQS_25 is the rotation angle of the i-th satellite about the Euler axis;
Figure QLYQS_26
where Figure QLYQS_27 is the identity matrix and Figure QLYQS_28 denotes the standard skew-symmetric matrix;
Figure QLYQS_29, where Figure QLYQS_30 is the product of inertia of the i-th satellite;
Figure QLYQS_31, where Figure QLYQS_32 is the angular acceleration of the i-th satellite;
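The attitude formulas in step 1.4 are image placeholders, so the following is only an illustrative sketch. It assumes the textbook definition of the modified Rodrigues parameters, namely the Euler-axis unit vector scaled by the tangent of one quarter of the rotation angle, together with the usual skew-symmetric (cross-product) matrix. Whether the patent's figures use exactly these forms is not confirmed by the text, and the function names are hypothetical.

```python
import math


def mrp_from_axis_angle(axis, angle):
    """Modified Rodrigues parameters: sigma = e * tan(angle / 4).

    axis  : 3-vector along the Euler axis (normalised here)
    angle : rotation angle about that axis, in radians
    The representation is singular as the angle approaches +/- 2*pi.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("Euler axis must be non-zero")
    t = math.tan(angle / 4.0)
    return tuple(t * c / norm for c in axis)


def skew(v):
    """Skew-symmetric matrix S(v) such that S(v) @ w equals the cross product v x w."""
    x, y, z = v
    return (
        (0.0, -z, y),
        (z, 0.0, -x),
        (-y, x, 0.0),
    )


def mat_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(row[j] * v[j] for j in range(3)) for row in m)


if __name__ == "__main__":
    sigma = mrp_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
    print(sigma)                                           # 90 degree rotation about z
    print(mat_vec(skew((1.0, 0.0, 0.0)), (0.0, 1.0, 0.0)))  # x cross y
```

A three-component parameter set such as this avoids the unit-norm constraint of quaternions, at the cost of a singularity at a full revolution.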
Step 2: for the actual satellite formation, obtaining a six-degree-of-freedom attitude-orbit coupled dynamic model of each actual satellite; each actual satellite comprises six thrusters serving as position control actuators, and the relative position dynamics between each actual satellite and the virtual leading satellite are expressed in the body coordinate system as the six-degree-of-freedom attitude-orbit coupled dynamic model of that satellite;
Step 3: designing the attitude controller: for the actual satellite formation, obtaining the attitude subsystem of each actual satellite from the actual satellite attitude dynamics model combined with the dynamic description of the actual satellite reference signal; defining a cost function for the attitude subsystem; and obtaining the optimal attitude control strategy by designing an off-policy reinforcement learning optimal attitude control algorithm;
Step 4: designing the relative position controller: for the actual satellite formation, obtaining the relative position subsystem of each actual satellite from the relative position dynamics between that satellite and the virtual leading satellite, combined with the relative position reference dynamics of each actual satellite; defining a cost function for the relative position subsystem; and obtaining the optimal relative position control strategy by designing an off-policy reinforcement learning relative position control algorithm.
2. The adaptive satellite attitude and orbit control method based on reinforcement learning according to claim 1, wherein the step 3 is specifically:
3.1 expressing the attitude state of each actual satellite as a function of the output and the control moment input;
3.2 combining the dynamic description of the satellite reference signals to obtain the attitude subsystem of each actual satellite;
3.3 defining a cost function for the satellite attitude subsystem;
3.4 combining the value functions of the attitude subsystems of the actual satellites to obtain the Hamiltonian, and obtaining the optimal attitude control strategy through the off-policy reinforcement learning optimal attitude control algorithm.
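The claim does not spell out the cost function, the Hamiltonian or the learning algorithm, so the following is only a toy illustration of the underlying idea and not the patented method. It assumes a scalar linear system with a quadratic cost and runs model-based policy iteration (Kleinman's algorithm): evaluate the current feedback gain, then improve it by minimising the Hamiltonian. Reinforcement learning variants perform the same two steps from measured data instead of from a known model. All names and numbers are hypothetical.

```python
def evaluate_policy(a, b, q, r, k):
    """Policy evaluation for dx/dt = a*x + b*u with u = -k*x and cost
    integral of (q*x^2 + r*u^2): the value is V(x) = p*x^2, where p solves
    2*(a - b*k)*p + q + r*k^2 = 0.
    """
    closed_loop = a - b * k
    if closed_loop >= 0.0:
        raise ValueError("gain k must stabilise the system (a - b*k < 0)")
    return -(q + r * k * k) / (2.0 * closed_loop)


def improve_policy(b, r, p):
    """Policy improvement: minimising the Hamiltonian
    q*x^2 + r*u^2 + 2*p*x*(a*x + b*u) over u gives u = -(b*p/r)*x.
    """
    return b * p / r


def policy_iteration(a, b, q, r, k0, iterations=20):
    """Alternate evaluation and improvement, starting from a stabilising gain k0."""
    k = k0
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, k)
        k = improve_policy(b, r, p)
    return p, k


if __name__ == "__main__":
    p, k = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, k)  # p approaches the positive root of p^2 - 2p - 1 = 0
```

For this toy system the iteration converges to the solution of the scalar Riccati equation, which is the fixed point where the Hamiltonian is zero under the minimising control.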
3. The adaptive satellite attitude and orbit control method based on reinforcement learning according to claim 2, wherein the step 4 is specifically:
4.1 expressing the relative position dynamics between each actual satellite and the virtual leading satellite as a function of the relative position state quantity, the output quantity and the control acceleration input;
4.2 combining the relative position reference dynamics of each actual satellite to obtain the relative position subsystem of each actual satellite;
4.3 defining a cost function for the relative position subsystem;
4.4 combining the relative position subsystem cost function of each actual satellite to obtain the Bellman equation, and obtaining the optimal relative position control strategy through the off-policy reinforcement learning relative position control algorithm.
CN202310101472.5A 2023-02-13 2023-02-13 Self-adaptive satellite attitude and orbit control method based on reinforcement learning Active CN115771624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310101472.5A CN115771624B (en) 2023-02-13 2023-02-13 Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310101472.5A CN115771624B (en) 2023-02-13 2023-02-13 Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115771624A (en) 2023-03-10
CN115771624B (en) 2023-05-26

Family

ID=85393575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310101472.5A Active CN115771624B (en) 2023-02-13 2023-02-13 Self-adaptive satellite attitude and orbit control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115771624B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6341249B1 (en) * 1999-02-11 2002-01-22 Guang Qian Xing Autonomous unified on-board orbit and attitude control system for satellites
CN105068546A (en) * 2015-07-31 2015-11-18 哈尔滨工业大学 Satellite formation relative orbit adaptive neural network configuration containment control method
CN107187615A (en) * 2017-04-25 2017-09-22 西北工业大学 The formation method of satellite distributed load
CN107554817A (en) * 2017-07-11 2018-01-09 西北工业大学 The compound formation method of satellite
CN108181916A (en) * 2017-12-29 2018-06-19 清华大学 The control method and device of moonlet relative attitude
CN111781827A (en) * 2020-06-02 2020-10-16 南京邮电大学 Satellite formation control method based on neural network and sliding mode control
CN113665849A (en) * 2021-09-29 2021-11-19 长光卫星技术有限公司 Autonomous phase control method combining EKF filtering algorithm and neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
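Liang Wenjie et al. Adaptive formation guidance and control technology for multiple aerial vehicles. Aero Weaponry (《航空兵器》). 2015, (No. 3), pp. 8-12. *
Mei Jie et al. Robust adaptive control of the relative orbit of close-range spacecraft. Journal of Astronautics (《宇航学报》). 2010, Vol. 31 (No. 10), pp. 2276-2282. *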

Also Published As

Publication number Publication date
CN115771624A (en) 2023-03-10

Similar Documents

Publication Publication Date Title
Wang et al. Hybrid finite-time trajectory tracking control of a quadrotor
Mofid et al. Adaptive terminal sliding mode control for attitude and position tracking control of quadrotor UAVs in the existence of external disturbance
Yang et al. Robot learning system based on adaptive neural control and dynamic movement primitives
Gao et al. Hierarchical model predictive image-based visual servoing of underwater vehicles with adaptive neural network dynamic control
Wu et al. Modeling and sliding mode-based attitude tracking control of a quadrotor UAV with time-varying mass
Turpin et al. Trajectory design and control for aggressive formation flight with quadrotors
Lin et al. Flying through a narrow gap using neural network: an end-to-end planning and control approach
Chu et al. Observer-based adaptive neural network control for a class of remotely operated vehicles
CN105652664B (en) A kind of explicit forecast Control Algorithm of quadrotor unmanned plane based on dove group's optimization
Selfridge et al. A multivariable adaptive controller for a quadrotor with guaranteed matching conditions
Burri et al. A framework for maximum likelihood parameter identification applied on MAVs
Xiao et al. Flying through a narrow gap using end-to-end deep reinforcement learning augmented with curriculum learning and sim2real
CN115639830B (en) Air-ground intelligent agent cooperative formation control system and formation control method thereof
Trapiello et al. Position‐heading quadrotor control using LPV techniques
Jin et al. Adaptive finite-time consensus of a class of disturbed multi-agent systems
Chowdhary et al. Experimental results of concurrent learning adaptive controllers
Kuo et al. Quaternion-based adaptive backstepping RFWNN control of quadrotors subject to model uncertainties and disturbances
Zhao et al. Adaptive neural network-based sliding mode tracking control for agricultural quadrotor with variable payload
Duan et al. Attitude tracking control of small-scale unmanned helicopters using quaternion-based adaptive dynamic surface control
Glida et al. Trajectory tracking control of a coaxial rotor drone: Time-delay estimation-based optimal model-free fuzzy logic approach
Li et al. Motion control of mobile under-actuated manipulators by implicit function using support vector machines
De Marco et al. A deep reinforcement learning control approach for high-performance aircraft
Enjiao et al. Finite-time control of formation system for multiple flight vehicles subject to actuator saturation
CN114510067A (en) Approximate optimal guidance method for reusable aircraft
Lugo-Cárdenas et al. The MAV3DSim: A simulation platform for research, education and validation of UAV controllers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant