CN117872753A

CN117872753A - Underactuated system layered sliding mode optimal control method based on adaptive dynamic programming

Info

Publication number: CN117872753A
Application number: CN202410022633.6A
Authority: CN
Inventors: 张颖伟 (Zhang Yingwei); 张浩岩 (Zhang Haoyan); 袁平 (Yuan Ping)
Original assignee: 东北大学 (Northeastern University)
Priority date: 2024-01-08
Filing date: 2024-01-08
Publication date: 2024-04-12

Abstract

The invention provides a layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming. First, a dynamics model of the underactuated system is established in nominal form. Second, a layered sliding mode surface containing the system state information is constructed. Then, a cost function and a Hamilton-Jacobi-Bellman (HJB) equation based on the layered sliding mode surface are constructed, and the optimal control strategy is solved by a gradient descent method. Finally, an evaluation (critic) network and an execution (actor) network are constructed to evaluate the optimal control law executed at each step, so as to achieve the optimal control objective of the underactuated system. The effectiveness of the proposed control method is verified by numerical simulation of an underactuated system. The method provides a useful tool for analyzing optimal control problems of underactuated systems and can, to a certain extent, enhance the reliability of the controlled system.

Description

Underactuated system layered sliding mode optimal control method based on adaptive dynamic programming

Technical Field

The invention belongs to the technical field of optimal control of underactuated systems, and particularly relates to a layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming.

Background

An underactuated system is a typical unstable dynamic system. Because of its flexible operating space and its underactuated, nonlinear and open-loop unstable characteristics, the underactuated system has become a popular research problem in the control field and is widely used in aerospace, robot control systems and similar applications. In recent years, controllers designed with various classical and modern control theory methods have handled the dynamic characteristics of underactuated systems well.

As a special nonlinear control method, sliding mode variable structure control has been widely studied in recent years. Its distinguishing feature is that the control structure is not fixed: the sliding surface is designed so that the controlled system state is forced to move along a predetermined sliding mode, and during this motion the system is insensitive to parameter perturbations and external disturbances. Sliding mode variable structure control therefore has good robustness. However, with the continuous development of industrial processes, conventional single-layer sliding mode variable structure control can no longer meet practical requirements, and sliding mode control with a layered structure has attracted attention in recent years. Each sub-sliding mode surface of a layered sliding mode surface is constructed from the states of one subsystem, and the top-level sliding mode surface is composed of the sub-sliding mode surfaces of all layers; that is, the top-level sliding mode surface contains all of the system state information. For an underactuated system, each layer's sub-sliding mode surface can correspond to one subspace of the system, so layered sliding mode control can effectively solve control problems related to underactuated systems.

Adaptive dynamic programming provides an efficient technique for optimal decision and control problems without relying on linearity assumptions, and under uncertain or random conditions, by employing approximation structures such as neural networks to produce an online approximation of the optimal solution through recursive numerical methods. In particular, adaptive dynamic programming combines reinforcement learning ideas with traditional dynamic programming to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. For linear time-invariant systems with a quadratic cost, the HJB equation reduces to an algebraic Riccati equation, and a well-known iterative solution strategy converts the algebraic Riccati equation into a sequence of linear Lyapunov equations. Along this direction, iterative adaptive dynamic programming strategies have gradually been extended and applied to nonlinear cases.
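The iterative Riccati strategy mentioned above (commonly attributed to Kleinman) can be sketched as follows. This is a background illustration, not part of the claimed method, and it assumes a linear time-invariant plant dx/dt = Ax + Bu, quadratic weights Q and R, and an initial stabilizing gain K0. Each Lyapunov equation is solved with a dense Kronecker-product linear solve so that only NumPy is needed; the function names and the double-integrator example are hypothetical.

```python
import numpy as np

def solve_lyapunov(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 for P by column-major vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # vec(Ac^T P) = (I kron Ac^T) vec(P) and vec(P Ac) = (Ac^T kron I) vec(P)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(M, -Qc.reshape(-1, order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry

def kleinman(A, B, Q, R, K0, iters=50, tol=1e-12):
    """Policy iteration for the continuous-time algebraic Riccati equation.

    Each step evaluates the current gain K with one Lyapunov equation
    (policy evaluation) and then sets K = R^{-1} B^T P (policy improvement).
    K0 must make A - B K0 Hurwitz.
    """
    K, P_prev = K0, None
    for _ in range(iters):
        Ac = A - B @ K                           # closed loop under the current gain
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)  # cost of the current policy
        K = np.linalg.solve(R, B.T @ P)          # improved gain
        if P_prev is not None and np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K

# Double integrator with Q = I, R = 1; K0 = [1, 1] gives a Hurwitz closed loop.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = kleinman(A, B, Q, R, K0=np.array([[1.0, 1.0]]))
```

For this example the Riccati solution is known in closed form, P = [[sqrt(3), 1], [1, sqrt(3)]], so the iteration can be checked directly.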

Disclosure of Invention

To solve the optimal control problem of underactuated systems, the invention provides a layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming. Introducing a layered sliding mode surface improves the system response rate. An optimal controller is designed to enhance the dynamic performance of the underactuated system, and the adaptive dynamic programming method is adopted to guarantee its optimal performance.

The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming specifically comprises the following steps:

S1, establishing a dynamics model of the underactuated system in nominal form;

Define the augmented state variable of the underactuated system as X = [x_1, x_2, x_3, x_4]^T, whose components are x_1, x_2, x_3 and x_4. The underactuated system dynamics model is formulated as:

\dot{x}_1 = x_2, \quad \dot{x}_2 = p_1(X) + q_1(X) u, \quad \dot{x}_3 = x_4, \quad \dot{x}_4 = p_2(X) + q_2(X) u \qquad (1)

where p_1(X) and p_2(X) are system dynamic functions; q_1(X) and q_2(X) are control gain functions; u is the control input;

S2, constructing a layered sliding mode surface containing the state information based on the underactuated system dynamics model;

S21, constructing the sub-sliding mode surface vector s as follows:

s = [s_1, 0, s_2, 0]^T = H X_o + M X_e \qquad (2)

where s_i = h_i x_{2i-1} + x_{2i}, i = 1, 2, denotes the i-th layer sub-sliding mode surface; H = diag(h_1, 1, h_2, 1) is a diagonal matrix with diagonal elements h_1 and h_2; the vectors X_o and X_e are defined as X_o = [x_1, 0, x_3, 0]^T and X_e = [0, x_2, 0, x_4]^T, respectively; the matrix M is:

M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
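The matrix form of equation (2) can be checked numerically against the scalar definition s_i = h_i x_{2i-1} + x_{2i}. The sketch below is illustrative only: the function name is hypothetical, the values h_1 = 1 and h_2 = 2 are assumptions (the text does not give them), and the state is the initial state X(0) from the simulation section.

```python
import numpy as np

def sub_surfaces(X, h1, h2):
    """Sub-sliding mode surface vector s = H X_o + M X_e, equation (2)."""
    x1, x2, x3, x4 = X
    H = np.diag([h1, 1.0, h2, 1.0])
    M = np.zeros((4, 4))
    M[0, 1] = 1.0  # moves x2 into the slot of s_1
    M[2, 3] = 1.0  # moves x4 into the slot of s_2
    X_o = np.array([x1, 0.0, x3, 0.0])
    X_e = np.array([0.0, x2, 0.0, x4])
    return H @ X_o + M @ X_e

X = np.array([0.03, 1.0, 0.002, 0.24])  # X(0) from the simulation section
s = sub_surfaces(X, h1=1.0, h2=2.0)     # hypothetical h values
# s[0] = h1*x1 + x2 and s[2] = h2*x3 + x4; the even slots stay zero.
```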

S22, based on formula (2), constructing the layered sliding mode surface vector S as follows:

S = [S_1, 0, S_2, 0]^T = s + Z\bar{S} \qquad (3)

where S_i = s_i + z_{i-1} S_{i-1}, i = 1, 2, denotes the i-th layered sliding mode surface; Z = diag(z_0, 1, z_1, 1) is a diagonal matrix with diagonal elements z_0 and z_1; the vector \bar{S} is defined as \bar{S} = [S_0, 0, S_1, 0]^T, where S_0 and S_1 are elements of \bar{S};

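S23, iterating formula (3) with the sub-surfaces of formula (2) gives the i-th layered sliding mode surface S_i as follows:

S_i = \sum_{j=1}^{i} \Big( \prod_{k=j}^{i-1} z_k \Big) s_j + \Big( \prod_{k=0}^{i-1} z_k \Big) S_0 \qquad (4)

where the product \prod_{k=j}^{i-1} z_k is taken as 1 when j = i;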
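The recursion S_i = s_i + z_{i-1} S_{i-1} and its unrolled form can be compared directly. This is a minimal sketch with hypothetical function names; the sub-surface values, the coefficients z_0 = 0.5 and z_1 = 1.5, and S_0 = 0.2 are made-up numbers chosen only to exercise both formulas.

```python
import numpy as np

def layered_recursive(s_sub, z, S0=0.0):
    """Layered surfaces from formula (3): S_i = s_i + z_{i-1} S_{i-1}."""
    S_prev, out = S0, []
    for i, s_i in enumerate(s_sub, start=1):
        S_i = s_i + z[i - 1] * S_prev
        out.append(S_i)
        S_prev = S_i
    return out

def layered_unrolled(s_sub, z, S0=0.0):
    """Same surfaces from the unrolled sum; an empty product equals 1 when j = i."""
    out = []
    for i in range(1, len(s_sub) + 1):
        total = np.prod(z[0:i]) * S0                   # product z_0 ... z_{i-1}
        for j in range(1, i + 1):
            total += np.prod(z[j:i]) * s_sub[j - 1]    # product z_j ... z_{i-1}
        out.append(total)
    return out

rec = layered_recursive([1.03, 0.244], [0.5, 1.5], S0=0.2)
unr = layered_unrolled([1.03, 0.244], [0.5, 1.5], S0=0.2)
# Both give S_1 = 1.13 and S_2 = 0.244 + 1.5 * 1.13 = 1.939.
```

The recursive form is what a controller would evaluate at run time; the unrolled form makes explicit that the top-level surface is a weighted sum of every sub-surface, which is why it carries all of the state information.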

S24, based on formulas (2), (3) and (4), re-expressing the layered sliding mode surface vector S as follows:

where the matrix in formula (5) is:

S25, deriving the following result from formula (5):

where the vectors P(X) and U(t) are defined as P(X) = [0, p_1(X), 0, p_2(X)]^T and U = [0, u, 0, u]^T, respectively; the vector Q(X) is:

S26, based on the layered structure, designing the control input u as follows:

where ι ≤ i; u_{eq,ι} and u_{sw,ι} denote the equivalent control law and the switching control law of the ι-th layered sliding mode surface, respectively;

S27, based on formula (7), re-formulating the vector U as:

U = \Xi (u_{eq} + u_{sw}) \qquad (8)

where u_{eq} and u_{sw} are the vectors of the equivalent control laws and of the switching control laws, respectively; the matrix \Xi is:

S3, based on step S2, constructing a cost function and an HJB equation based on the layered sliding mode surface, and solving the optimal control strategy by a gradient descent method;

S31, using Bellman's principle of optimality, constructing a cost function based on the layered sliding mode surface as follows:

where A and B are positive definite matrices; α is a positive constant; ‖X‖ denotes the norm of the augmented state variable X; F(u_sw) is a positive definite function defined as:

where β is a positive constant; ω_i denotes the integration variable; and the function in the integrand is monotonic and odd;
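Formula (10) itself is not reproduced in this text. A common choice in the adaptive dynamic programming literature that fits the description (a positive constant β, an integral over ω, a monotonic odd function) is F(u) = 2β ∫_0^u φ^{-1}(ω/β) r dω with φ = tanh. The sketch below assumes that form, a scalar control and a hypothetical positive weight r, and uses β = 2 from the simulation section. It checks the closed form against a trapezoidal integral and shows that F is zero at the origin, positive elsewhere on (-β, β), and even.

```python
import numpy as np

def F_closed_form(u, beta, r=1.0):
    """Assumed F(u) = 2*beta*r * integral_0^u artanh(w/beta) dw, for |u| < beta."""
    x = u / beta
    return 2.0 * beta * r * u * np.arctanh(x) + beta**2 * r * np.log(1.0 - x**2)

def F_numeric(u, beta, r=1.0, n=10001):
    """Composite trapezoidal rule for the same integral (signed if u < 0)."""
    w = np.linspace(0.0, u, n)
    y = 2.0 * beta * r * np.arctanh(w / beta)
    h = u / (n - 1)
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

beta = 2.0                      # from the simulation section
F_exact = F_closed_form(1.5, beta)
F_approx = F_numeric(1.5, beta)
# Near u = 0, F(u) behaves like r*u**2, so it is a non-quadratic penalty that
# grows without bound as |u| approaches beta.
```

This kind of penalty is what lets the stationarity condition of the Hamiltonian produce a bounded control law, since the inverse of the monotonic odd function saturates at ±β.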

S32, based on formula (9), constructing the Hamiltonian based on the layered sliding mode surface as follows:

where the gradient of the cost function and the transpose of that gradient appear;

S33, constructing the HJB equation from formula (11) as follows:

where the terms are the optimal cost function, its gradient and the transpose of that gradient, together with the optimal equivalent control law vector and the optimal switching control law vector;

S34, based on formulas (11) and (12), setting the partial derivatives of the Hamiltonian with respect to the equivalent control law and the switching control law equal to zero yields:

S4, constructing an evaluation network and an execution network, and evaluating the optimal control strategy executed at each step to achieve the optimal control objective of the underactuated system;

S41, using the universal approximation property of the radial basis function neural network, approximating the optimal cost function as follows:

where W^* is a bounded weight vector and (W^*)^T is its transpose; the remaining terms are the activation function and the approximation error;

S42, based on formula (15), formulating the gradient of the optimal cost function as:

where the terms are the gradient of the activation function, the transpose of that gradient, and the gradient of the approximation error;
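A radial basis function critic of the kind described in S41 and S42, with the approximation error dropped, has value W^T φ(z) and gradient (∂φ/∂z)^T W. The sketch below assumes several details the text does not fix: the network input z is the layered surface pair [S_1, S_2], each Gaussian unit is φ_l(z) = exp(-‖z - c_l·1‖² / w_l²), and the centers {-2, -1, 0, 1, 2} and widths {1, 2, 1, 2, 1} are borrowed from the simulation section. All function names and the example weights are hypothetical.

```python
import numpy as np

CENTERS = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # c_NN,l (simulation section)
WIDTHS = np.array([1.0, 2.0, 1.0, 2.0, 1.0])     # w_NN,l (simulation section)

def rbf(z):
    """Gaussian activations phi_l(z) = exp(-||z - c_l*1||^2 / w_l^2), shape (L,)."""
    z = np.asarray(z, dtype=float)
    diff = z[None, :] - CENTERS[:, None]          # (L, n): z minus each scalar center
    return np.exp(-np.sum(diff**2, axis=1) / WIDTHS**2)

def rbf_jacobian(z):
    """d phi / d z, shape (L, n)."""
    z = np.asarray(z, dtype=float)
    diff = z[None, :] - CENTERS[:, None]
    return (-2.0 * diff / (WIDTHS**2)[:, None]) * rbf(z)[:, None]

def critic_value(W, z):
    """Approximate cost W^T phi(z)."""
    return W @ rbf(z)

def critic_gradient(W, z):
    """Gradient of W^T phi(z) with respect to z, i.e. (d phi/dz)^T W, shape (n,)."""
    return rbf_jacobian(z).T @ W

W = np.array([0.5, -0.2, 1.0, 0.3, -0.7])  # hypothetical weights
z = np.array([0.4, -1.1])                  # hypothetical [S_1, S_2]
g = critic_gradient(W, z)
```

Because the gradient is linear in W, the control laws of S43 and S46 can be written directly in terms of the network weights once the activation Jacobian is available.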

S43, based on formula (16), re-expressing the optimal equivalent control law and the optimal switching control law as follows:

S44, since the bounded weight W^* is unknown, estimating the optimal cost function as follows:

where the terms are the estimate of the optimal cost function, the estimate of W^*, and the transpose of that estimate;

S45, based on formula (19), formulating the gradient of the estimated cost function as:

where the gradient of the activation function appears;

S46, based on formula (20), the estimated forms of the equivalent control law and the switching control law are:

where the two terms are the estimates of the optimal equivalent control law and the optimal switching control law, respectively, and the weights are those of the execution network;

S47, based on formulas (21) and (22), the estimated form of the HJB equation is:

where the weights are those of the evaluation network;

S48, based on formulas (12) and (23), defining the Bellman residual as follows:

S49, to minimize the Bellman residual, defining the objective function e as follows:

S50, based on formula (25), designing the evaluation network update law using the normalization principle and a gradient descent method as follows:

where ξ_c is the evaluation network update rate, and the other symbol denotes the transpose of the vector appearing in the update law;
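The update law of S50 is not reproduced in this text. A normalized gradient-descent critic update that is common in the adaptive dynamic programming literature takes the Bellman residual as e = σ^T W_c + r, the objective as e²/2, and steps the weights along -σ e / (1 + σ^T σ)². The sketch below assumes that form; the regressor σ, the offset r, the Euler step dt and the function name are all hypothetical, and only ξ_c = 0.95 comes from the simulation section.

```python
import numpy as np

def critic_step(W_c, sigma, r, xi_c=0.95, dt=0.01):
    """One Euler step of an assumed normalized gradient-descent critic update.

    e = sigma^T W_c + r is the assumed Bellman residual and the objective is
    e**2 / 2, whose gradient with respect to W_c is sigma * e. Dividing by
    (1 + sigma^T sigma)**2 keeps the step bounded for large regressors.
    Returns the new weights and the residual measured before the step.
    """
    e = sigma @ W_c + r
    norm = (1.0 + sigma @ sigma) ** 2
    return W_c - dt * xi_c * sigma * e / norm, e

# Fixed hypothetical regressor: the residual shrinks by a constant factor per step.
sigma = np.array([0.8, -0.3, 0.5, 0.1, 0.2])
W = np.zeros(5)
r = 1.0
residuals = []
for _ in range(500):
    W, e = critic_step(W, sigma, r, dt=0.1)
    residuals.append(abs(e))
```

With a fixed regressor the residual contracts by 1 - dt·ξ_c·σ^Tσ/(1 + σ^Tσ)² each step; since σ^Tσ/(1 + σ^Tσ)² never exceeds 1/4, the normalization gives a step size that is safe regardless of the regressor's magnitude.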

S51, to ensure system stability, designing the execution network update law as:

where ε and δ are positive constants; J^T denotes the transpose of J; γ_a is the execution network update rate.

The beneficial technical effects of the invention are as follows:

For underactuated systems, the invention provides a layered sliding mode optimal control method based on adaptive dynamic programming. Introducing a layered sliding mode surface that contains all of the state information improves system robustness. Using Bellman's principle of optimality, a cost function and an HJB equation based on the layered sliding mode surface are constructed, and the implicit (non-explicit) optimal control strategy is solved by a gradient descent method. Neural networks are used to approximate this implicit optimal control strategy: the execution network tunes the control strategy, the evaluation network evaluates the strategy after each tuning step, and the evaluation is fed back to the execution network to achieve the optimal control objective of the underactuated system.

Drawings

FIG. 1 is a flow chart of the layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming;

FIG. 2 shows the state variable responses in the simulation of an embodiment of the invention;

FIG. 3 shows the responses of the layered sliding mode variables in the simulation of an embodiment of the invention;

FIG. 4 shows the execution network weight updates in the simulation of an embodiment of the invention;

FIG. 5 shows the evaluation network weight updates in the simulation of an embodiment of the invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings and embodiments.

The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming, shown in FIG. 1, specifically comprises the following steps:

S1, establishing a dynamics model of the underactuated system in nominal form;

S21, constructing the sub-sliding mode surface vector s as follows:

s = [s_1, 0, s_2, 0]^T = H X_o + M X_e \qquad (2)

S = [S_1, 0, S_2, 0]^T = s + Z\bar{S} \qquad (3)

where, in formula (4), the product \prod_{k=j}^{i-1} z_k is taken as 1 when j = i;

where the matrix in formula (5) is:

S25, deriving the following result from formula (5):

S27, based on formula (7), re-formulating the vector U as:

U = \Xi (u_{eq} + u_{sw}) \qquad (8)

where u_{eq} and u_{sw} are the vectors of the equivalent control laws and of the switching control laws, respectively; the matrix \Xi is:

where the gradient of the cost function and the transpose of that gradient appear;

S33, constructing the HJB equation from formula (11) as follows:

S43, based on formula (16), re-expressing the optimal equivalent control law and the optimal switching control law as follows:

S45, based on formula (19), formulating the gradient of the estimated cost function as:

where the gradient of the activation function appears;

S46, based on formula (20), the estimated forms of the equivalent control law and the switching control law are:

where the weights are those of the evaluation network;

To verify the effectiveness of the control method, a numerical simulation of the underactuated system is carried out. The initial state is X(0) = [x_1(0), x_2(0), x_3(0), x_4(0)]^T = [0.03, 1, 0.002, 0.24]^T. The parameters μ_1 and μ_2 are set to μ_1 = 2.1 and μ_2 = 4.9. The parameters of the cost function are set to α = 0.3 and β = 2. The related matrices are given as:

The execution network and the evaluation network each have a weight vector with its own initial value. The activation function is selected as a Gaussian-type function whose centers and widths are c_{NN,l} = {-2, -1, 0, 1, 2} and w_{NN,l} = {1, 2, 1, 2, 1}, respectively. The update-law parameters of the execution network and the evaluation network are ξ_c = 0.95, γ_a = 1, ε = 0.6 and δ = 1.5.
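The numerical settings listed above can be gathered into one configuration object, for instance as the Python dataclass below. This is an illustrative sketch: the class and field names are hypothetical, and the cost-function matrices and the initial network weights are left out because their values are not recoverable from this text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    """Simulation settings of the embodiment (field names are hypothetical)."""
    X0: tuple = (0.03, 1.0, 0.002, 0.24)          # initial state X(0)
    mu1: float = 2.1
    mu2: float = 4.9
    alpha: float = 0.3                            # cost-function parameter
    beta: float = 2.0                             # cost-function parameter
    centers: tuple = (-2.0, -1.0, 0.0, 1.0, 2.0)  # Gaussian centers c_NN,l
    widths: tuple = (1.0, 2.0, 1.0, 2.0, 1.0)     # Gaussian widths w_NN,l
    xi_c: float = 0.95                            # evaluation network update rate
    gamma_a: float = 1.0                          # execution network update rate
    eps: float = 0.6                              # epsilon in the execution update law
    delta: float = 1.5                            # delta in the execution update law

    @property
    def n_neurons(self) -> int:
        """One Gaussian unit per center, so each weight vector has this length."""
        return len(self.centers)

cfg = SimulationConfig()
```

Freezing the dataclass keeps one run's parameters from being changed accidentally between the evaluation network and execution network updates.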

The simulation results are shown in FIGS. 2-5. FIG. 2 shows the responses of the system states x_1, x_2, x_3 and x_4: the states gradually converge to the equilibrium point, so the underactuated system reaches a steady state. FIG. 3 shows the layered sliding mode variables S_1 and S_2: their trajectories oscillate considerably at first, then settle and gradually converge to the equilibrium point. FIGS. 4 and 5 show the updates of the execution network and evaluation network weights, respectively, in the embodiment; both sets of weights converge and reach a steady state after a short training period.

Claims

1. A layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming, characterized by comprising the following steps:

S1, establishing a dynamics model of the underactuated system in nominal form;

and S4, constructing an evaluation network and an execution network, and evaluating the optimal control strategy executed at each step to achieve the optimal control objective of the underactuated system.

2. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 1, wherein step S1 is specifically:

where p_1(X) and p_2(X) are system dynamic functions; q_1(X) and q_2(X) are control gain functions; u is the control input.

3. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 1, wherein step S2 is specifically:

S21, constructing the sub-sliding mode surface vector s as follows:

s = [s_1, 0, s_2, 0]^T = H X_o + M X_e \qquad (2)

S = [S_1, 0, S_2, 0]^T = s + Z\bar{S} \qquad (3)

where S_i = s_i + z_{i-1} S_{i-1}, i = 1, 2, denotes the i-th layered sliding mode surface; Z = diag(z_0, 1, z_1, 1) is a diagonal matrix with diagonal elements z_0 and z_1; the vector \bar{S} is defined as \bar{S} = [S_0, 0, S_1, 0]^T, where S_0 and S_1 are elements of \bar{S};

where, in formula (4), the product \prod_{k=j}^{i-1} z_k is taken as 1 when j = i;

where the matrix in formula (5) is:

S25, deriving the following result from formula (5):

where ι ≤ i; u_{eq,ι} and u_{sw,ι} denote the equivalent control law and the switching control law of the ι-th layered sliding mode surface, respectively;

S27, based on formula (7), re-formulating the vector U as:

U = \Xi (u_{eq} + u_{sw}) \qquad (8)

where u_{eq} and u_{sw} are the vectors of the equivalent control laws and of the switching control laws, respectively; the matrix \Xi is:

4. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 1, wherein step S3 is specifically:

S31, using Bellman's principle of optimality, constructing a cost function based on the layered sliding mode surface;

where the gradient of the cost function and the transpose of that gradient appear;

s33, constructing an HJB equation of the formula (11);

S34, based on formulas (11) and (12), setting the partial derivatives of the Hamiltonian with respect to the equivalent control law and the switching control law equal to zero yields:

5. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 4, wherein the cost function based on the layered sliding mode surface is specifically:

where β is a positive constant; ω_i denotes the integration variable; and the function in the integrand is monotonic and odd.

6. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 4, wherein the HJB equation is specifically:

where the terms are the optimal cost function, its gradient and the transpose of that gradient, together with the optimal equivalent control law vector and the optimal switching control law vector.

7. The layered sliding mode optimal control method for underactuated systems based on adaptive dynamic programming according to claim 1, wherein step S4 is specifically:

S4-1, using the universal approximation property of the radial basis function neural network, approximating the optimal cost function as follows:

S4-2, based on formula (15), formulating the gradient of the optimal cost function as:

S4-3, based on formula (16), re-expressing the optimal equivalent control law and the optimal switching control law as follows:

S4-4, since the bounded weight W^* is unknown, estimating the optimal cost function as follows:

S4-5, based on formula (19), formulating the gradient of the estimated cost function as:

where the gradient of the activation function appears;

S4-6, based on formula (20), the estimated forms of the equivalent control law and the switching control law are:

S4-7, based on formulas (21) and (22), the estimated form of the HJB equation is:

where the weights are those of the evaluation network;

S4-8, based on formulas (12) and (23), defining the Bellman residual as follows:

S4-9, to minimize the Bellman residual, defining the objective function e as follows:

S4-10, based on formula (25), designing the evaluation network update law using the normalization principle and a gradient descent method as follows:

S4-11, to ensure system stability, designing the execution network update law as follows: