CN116208041A - Motor system H infinite reduced order output tracking control method based on reinforcement learning - Google Patents


Info

Publication number
CN116208041A
Authority
CN
China
Prior art keywords
infinite
reinforcement learning
neural network
disturbance
motor system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310067097.7A
Other languages
Chinese (zh)
Inventor
周林娜
厉功贺
杨春雨
褚众
王海
刘晓敏
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202310067097.7A priority Critical patent/CN116208041A/en
Publication of CN116208041A publication Critical patent/CN116208041A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P: CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00: Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/0003: Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
    • H02P21/0014: Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P: CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00: Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/14: Estimation or adaptation of machine parameters, e.g. flux, current or voltage


Abstract

The invention discloses a reinforcement-learning-based H-infinity reduced-order output tracking control method for motor systems, which solves the disturbance-suppression tracking control problem for motor systems with unmodeled dynamics and imperfect data. The method comprises the following steps: decompose the H-infinity output tracking control problem of the original motor system using singular perturbation theory to obtain a reduced-order system problem; based on the output-state data of the original system, propose a state reconstruction mechanism for the virtual subsystem to handle its unmeasurable data, and derive a reinforcement learning H-infinity output tracking iterative algorithm based on the reconstructed data; and introduce execution-evaluation-disturbance neural networks to approximate the controller, performance index, and disturbance, iteratively updating the network weights by least squares to obtain the reinforcement-learning-based reduced-order H-infinity output tracking controller. The invention avoids the potential high-dimensionality and ill-conditioned numerical problems that arise when a tracking controller for a two-time-scale motor system is designed under a reinforcement learning framework.

Description

Motor system H infinite reduced order output tracking control method based on reinforcement learning
Technical Field
The invention belongs to the field of motor system drive control, and particularly relates to a motor system H infinite reduced order output tracking control method based on reinforcement learning.
Background
Nonlinear two-time-scale motor systems, widely present in power systems, the process industries, and other fields, are high-order systems with complex characteristics such as coupled fast and slow dynamics. In practice, such a system is often required to follow a preset reference trajectory while retaining a certain disturbance-rejection capability. The goal of robust tracking control is to design a controller so that the system meets these requirements, and it has therefore been studied extensively.
Existing tracking control methods for nonlinear two-time-scale motor systems are mainly based on sliding-mode control, active disturbance rejection control, and the like. However, these methods provide no quantitative analysis of disturbance suppression, which motivated H-infinity control as an effective means of disturbance rejection. If a tracking control method for general systems is applied directly to a singularly perturbed system, ill-conditioned numerical problems and the curse of dimensionality can arise. For this reason, feasible schemes based on system decomposition are used to control such systems. Although time-scale decomposition has been introduced to design combined robust controllers for nonlinear two-time-scale systems, these designs require the system model to be fully known and the virtual subsystem states to be fully measurable. At present, there is no H-infinity reduced-order output tracking control method for nonlinear two-time-scale motor systems with unknown dynamics.
In actual industrial production, an accurate model of the system is difficult to build. Reinforcement learning, through the trial-and-error interaction of an agent with its environment, has a unique advantage in model-free control: an ideal control law can be obtained from the system's input and output data, so the optimal tracking control problem can be solved. Many approaches have emerged to overcome the adverse effects of disturbance under a reinforcement learning framework, and H-infinity control based on reinforcement learning has attracted attention as a mainstream disturbance-rejection method. Converting the H-infinity control problem into a zero-sum game and solving it with optimal control concepts has proven effective. However, because a two-time-scale system is high-dimensional with coupled fast and slow dynamics, existing reinforcement learning methods are not suitable for such motor systems and can even cause ill-conditioned numerical problems during iterative learning. There is therefore an urgent need for a motor system H-infinity reduced-order output tracking control method with self-learning capability that can still achieve H-infinity reduced-order output tracking control in the presence of unknown dynamics and imperfect data.
Disclosure of Invention
In view of the above technical deficiencies, the invention aims to provide a reinforcement-learning-based motor system H-infinity reduced-order output tracking control method that solves the disturbance-suppression tracking control problem for motor systems with unmodeled dynamics and imperfect data, and that avoids the potential high-dimensionality and ill-conditioned numerical problems arising when a tracking controller for a two-time-scale motor system is designed under a reinforcement learning framework.
In order to solve the technical problems, the invention adopts the following technical scheme:
a motor system H infinite reduced order output tracking control method based on reinforcement learning is used for servo motors, flow industry and other systems, and comprises the following steps:
step one: decomposing the H infinite output tracking control problem of the original motor system by utilizing a singular perturbation theory to obtain a reduced order problem;
step two: based on the output state data of the original system, a state reconstruction mechanism of the virtual subsystem is provided to solve the problem that the data of the virtual subsystem is not measurable, and an H infinite output tracking reinforcement learning iterative algorithm based on the reconstruction data is further deduced;
step three: and introducing an execution-evaluation-disturbance neural network approximation controller, a performance index and disturbance, and iteratively updating the weight of the neural network based on a least square method to obtain the reduced order tracking controller based on reinforcement learning.
Preferably, in step one, the motor system is described by the following state space model:
dx_1/dt = f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w
ε dx_2/dt = f_21(x_1) + f_22(x_1)x_2 + g_2(x_1)u
where x_1, x_2 are the motor system state variables, u = [u_1, …, u_m] is the control input, w = [w_1, …, w_q] is the external disturbance, f_11, f_12, f_21, f_22 are the system dynamics, g_1, g_2 are the input dynamics, k is the disturbance dynamics, and 0 < ε ≪ 1 is a singular perturbation parameter; it is assumed that f_11, f_12, f_21, f_22, g_1, g_2, k are completely unknown and Lipschitz continuous, f(0) = 0, f_22 is invertible, and the fast subsystem is asymptotically stable in a short time without applying a fast controller;
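The two-time-scale structure above can be illustrated with a toy simulation. The linear dynamics and all numbers below are purely hypothetical stand-ins for the unknown f_ij, g_i, k, chosen only to show the fast state collapsing onto its quasi-steady value while the slow state evolves on the O(1) scale:

```python
# Hypothetical linear two-time-scale system for illustration only:
#   x1' = -x1 + x2,   eps * x2' = -x2 + u
eps, dt, T = 0.01, 1e-4, 2.0          # dt must resolve the fast scale ~eps
x1, x2, u = 1.0, 0.0, 0.5
for _ in range(int(T / dt)):
    dx1 = -x1 + x2                    # slow dynamics: O(1) rate
    dx2 = (-x2 + u) / eps             # fast dynamics: O(1/eps) rate
    x1 += dt * dx1
    x2 += dt * dx2
# After a short boundary layer, x2 sits at its quasi-steady value u;
# a reduced-order slow model would then replace x2 by that value.
print(round(x2, 3))   # close to u = 0.5
```

With ε = 0.01 the fast state settles within a boundary layer of a few hundredths of a second; this separation is exactly what the singular perturbation decomposition in step one exploits.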
to make the system slow state x 1 Tracking a bounded reference trajectory r (t), assuming a lipshitz continuous function exists, such that
Figure BDA0004062641280000023
Define tracking error as
ρ = Cx_1 − r(t);
The tracking error dynamics are
dρ/dt = C[f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w] − dr(t)/dt;
The original H-infinity output tracking control problem is: design a state feedback controller u = χ(ρ, r) that satisfies the L2 gain condition defined below in the presence of disturbance and makes the tracking error converge to 0 in the absence of disturbance:
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt
where ‖z‖² = ρᵀQρ + uᵀRu is the defined virtual control output, α > 0 is a discount factor, γ is the attenuation level from the disturbance input w(t) to the defined performance output z(t), Q = [C_1 C_2]ᵀ[C_1 C_2] > 0, and R > 0;
The original system is simplified into the following reduced-order system:
dx_1s/dt = F_s(x_1s) + G_s(x_1s)u_s + K_s(x_1s)w
y = Cx_1s
where C is the system output matrix, x_1s is the reduced-order system state, and
F_s(x_1s) = f_11(x_1s) − f_12(x_1s) f_22⁻¹(x_1s) f_21(x_1s)
G_s(x_1s) = g_1(x_1s) − f_12(x_1s) f_22⁻¹(x_1s) g_2(x_1s)
K_s(x_1s) = k(x_1s);
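As a minimal sketch of assembling the reduced-order dynamics defined above (scalar case, with made-up stand-in dynamics, since the patent assumes f_ij, g_i, k unknown):

```python
# Scalar-case sketch of the reduced-order (slow) dynamics:
#   F_s = f11 - f12 * f22^{-1} * f21
#   G_s = g1  - f12 * f22^{-1} * g2
#   K_s = k
def reduced_order_dynamics(f11, f12, f21, f22, g1, g2, k):
    """Return (F_s, G_s, K_s) as functions of the slow state x1s."""
    def F_s(x):
        return f11(x) - f12(x) / f22(x) * f21(x)
    def G_s(x):
        return g1(x) - f12(x) / f22(x) * g2(x)
    def K_s(x):
        return k(x)
    return F_s, G_s, K_s

# Example with simple, illustrative stand-in dynamics:
Fs, Gs, Ks = reduced_order_dynamics(
    f11=lambda x: -x, f12=lambda x: 1.0, f21=lambda x: 2.0 * x,
    f22=lambda x: -2.0, g1=lambda x: 0.0, g2=lambda x: 1.0, k=lambda x: 1.0)
print(Fs(1.0), Gs(1.0), Ks(1.0))   # 0.0 0.5 1.0
```

In the vector case f_22⁻¹ would be a matrix inverse; the invertibility of f_22 assumed earlier is what makes this elimination of the fast state well defined.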
the H infinite reduced order output tracking control problem is simplified into the following reduced order output tracking problem:
design controller u s So that the reduced order system outputs a state track Cx 1s Tracking a reference track r (t);
defining the output tracking error of the reduced order system as
ρ s =Cx 1s -r(t);
The tracking error dynamic is
Figure BDA0004062641280000034
Virtual control outputs are defined as follows:
||z|| 2 =ρ s Ts +u s T Ru s
the objective of the H infinity reduced order output tracking control problem is to calculate the tracking error rho s And a reference track r, find a control strategy u of a smoothing function χ s =χ(ρ s R) is set to satisfy the following conditions:
1) In the presence of disturbances, the system satisfies the following L 2 Gain conditions:
Figure BDA0004062641280000041
2) In the absence of disturbances, the output tracking error approaches 0.
Preferably, in step two, the state reconstruction mechanism of the virtual subsystem is: use the slow dynamic state x_1 of the original system to reconstruct the unmeasurable virtual subsystem state. The slow-subsystem H-infinity reinforcement learning iterative algorithm based on the reconstructed data x_1 is:
(iterative update equation omitted; rendered as an image in the original)
where i is the slow-controller iteration index.
Preferably, in step three, the reinforcement-learning-based slow controller design specifically comprises:
a. Select linearly independent activation function vectors φ_c, φ_a, φ_d for the evaluation neural network, the execution neural network, and the disturbance neural network, respectively. Design the evaluation-execution-disturbance neural networks to approximate the performance index J_rec, the controller u_rec, and the disturbance w_rec in linear-in-weights form:
J_rec = W_cᵀφ_c,  u_rec = W_aᵀφ_a,  w_rec = W_dᵀφ_d
where W_c, W_a, W_d denote the weight vectors of the evaluation, execution, and disturbance neural networks;
b. Initialize the neural network weight vectors, giving initially stabilizing execution-network and disturbance-network weights. Under different behavior policies u_s and w, collect data pairs {X_1(n), u_s(n), w, X′_1(n)} from the original system and place them in a sample set; the number of collected samples is N_s, n = 1, …, N_s;
c. Using the collected samples and W^(i), construct a database and simultaneously update the weights of the evaluation-execution-disturbance neural networks by least squares (the explicit update equation omitted; rendered as an image in the original).
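The least-squares weight update for a linear-in-weights approximator can be sketched as follows. The basis phi, the true weights, and the noise-free targets are synthetic placeholders standing in for the patent's activation-function vectors and Bellman-equation targets:

```python
import numpy as np

# Batch least-squares fit of a linear-in-weights approximator W^T * phi(x).
rng = np.random.default_rng(0)

def phi(x):                       # illustrative basis (the patent uses powers of rho, r)
    return np.array([x, x**2, x**3])

W_true = np.array([1.0, -0.5, 2.0])           # synthetic "target" weights
X = rng.uniform(-1, 1, size=200)              # collected samples
Phi = np.stack([phi(x) for x in X])           # N x 3 regression matrix
y = Phi @ W_true                              # targets (noise-free for clarity)

W_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # batch least squares
print(np.allclose(W_hat, W_true, atol=1e-6))      # True
```

In the patent's algorithm the targets y would come from the reconstructed-data Bellman-type equation at iteration i, and the same least-squares step updates critic, actor, and disturbance weights simultaneously.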
preferably, the motor system H infinite reduced order output tracking controller based on reinforcement learning is:
Figure BDA0004062641280000054
the beneficial effects of the invention are as follows:
1) The H-infinity output tracking control problem of the original motor system is decomposed using singular perturbation theory into a reduced-order slow-subsystem problem, avoiding ill-conditioned numerical problems;
2) Based on the output-state data of the original system, a state reconstruction mechanism for the virtual subsystem is proposed to handle its unmeasurable data, and an H-infinity output tracking reinforcement learning iterative algorithm based on the reconstructed data is derived;
3) A reinforcement learning algorithm is introduced into the motor control system: execution-evaluation-disturbance neural networks approximate the controller, performance index, and disturbance, and the network weights are updated iteratively by least squares to obtain the reinforcement-learning-based reduced-order H-infinity output tracking controller.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a motor system H infinite reduced order output tracking control framework based on reinforcement learning provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the weight convergence process of the evaluation neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the weight convergence process of the first execution neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the weight convergence process of the second execution neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the weight convergence process of the disturbance neural network according to an embodiment of the present invention;
fig. 6 is a trace curve of the state of the closed-loop motor system under the action of the optimal control law provided by the embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1; referring to fig. 1, the motor system H infinite reduced order output tracking control method based on reinforcement learning includes the following steps:
step 101: the singular perturbation theory is utilized to decompose the H infinite output tracking control problem of the original motor system to obtain a reduced order problem, so that the occurrence of the disease state numerical value problem is avoided;
the specific method comprises the following steps:
(1-1) Consider a nonlinear two-time-scale motor system; without loss of generality, its state space model is described as:
dx_1/dt = f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w    (1)
ε dx_2/dt = f_21(x_1) + f_22(x_1)x_2 + g_2(x_1)u
where x_1, x_2 are the motor system state variables, u = [u_1, …, u_m] is the control input, w = [w_1, …, w_q] is the external disturbance, f_11, f_12, f_21, f_22 are the system dynamics, g_1, g_2 are the input dynamics, k is the disturbance dynamics, and 0 < ε ≪ 1 is a singular perturbation parameter. It is assumed that f_11, f_12, f_21, f_22, g_1, g_2, k are completely unknown and Lipschitz continuous, f(0) = 0, f_22 is invertible, and the fast subsystem is asymptotically stable in a very short time without applying a fast controller.
To make the slow state x_1 track a bounded reference trajectory r(t), it is assumed that there exists a Lipschitz continuous function h such that
dr(t)/dt = h(r(t))    (2)
The tracking error is defined as
ρ = Cx_1 − r(t)    (3)
The tracking error dynamics are
dρ/dt = C[f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w] − dr(t)/dt    (4)
(1-2) The original H-infinity output tracking control problem is: design a state feedback controller u = χ(ρ, r) that satisfies the L2 gain condition defined below in the presence of disturbance and makes the tracking error converge to 0 in the absence of disturbance:
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt    (5)
where ‖z‖² = ρᵀQρ + uᵀRu is the defined virtual control output, α > 0 is a discount factor, γ is the attenuation level from the disturbance input w(t) to the defined performance output z(t), Q = [C_1 C_2]ᵀ[C_1 C_2] > 0, and R > 0.
(1-3) The decomposed slow sub-problem is: design a controller u_s such that the slow-subsystem output state trajectory Cx_1s tracks the reference trajectory r(t).
The output tracking error of the reduced-order system is defined as
ρ_s = Cx_1s − r(t)    (6)
The tracking error dynamics are
dρ_s/dt = C[F_s(x_1s) + G_s(x_1s)u_s + K_s(x_1s)w] − dr(t)/dt    (7)
The virtual control output is defined as
‖z‖² = ρ_sᵀQρ_s + u_sᵀRu_s    (8)
The objective of the H-infinity reduced-order output tracking control problem is: given the tracking error ρ_s and the reference trajectory r, find a control policy u_s = χ(ρ_s, r), with χ a smooth function, satisfying the following conditions:
1) In the presence of disturbances, the system satisfies the L2 gain condition
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt    (9)
2) In the absence of disturbances, the output tracking error approaches 0.
Step 102: based on the output state data of the original system, a state reconstruction mechanism of the virtual subsystem is provided to solve the problem that the data of the virtual subsystem is not measurable, and an H infinite output tracking reinforcement learning iterative algorithm based on the reconstruction data is further deduced; comprising the following steps:
(2-1) Using the Primary System Slow dynamic State x 1 Reconstructing an unmeasurable virtual subsystem state, said reconstructing data x based 1 The slow subsystem H infinite reinforcement learning iterative algorithm is as follows:
Figure BDA0004062641280000081
wherein ,
Figure BDA0004062641280000082
i is the slow controller iteration index.
Step 103: introducing a reinforcement learning algorithm into a motor control system, and iteratively updating weights of a neural network based on a least square method by utilizing an execution-evaluation-disturbance neural network approximation controller, performance indexes and disturbance to obtain a reinforcement learning-based reduced-order H infinite output tracking controller, wherein the reinforcement learning-based reduced-order H infinite output tracking controller comprises the following steps of:
(3-1) designing a slow controller based on reinforcement learning, specifically:
selecting a slow evaluation neural network, an execution neural network and a linear independent activation function vector of a disturbance neural network as follows respectively
Figure BDA0004062641280000083
Designing an evaluation-execution-perturbation neural network for approximating a slow performance index J rec Slow controller u rec Disturbance w rec
Figure BDA0004062641280000084
Figure BDA00040626412800000814
Figure BDA0004062641280000085
wherein ,
Figure BDA0004062641280000086
the weight vectors of the slow evaluation neural network, the slow execution neural network and the first disturbance neural network are respectively represented.
Initializing the weight vector of the neural network
Figure BDA0004062641280000087
Given an initially stable execution network and perturbed network weights
Figure BDA0004062641280000088
In different behavior strategies u s Under the action of w, data pair ∈A is collected from the original system>
Figure BDA0004062641280000089
And put it into sample set +.>
Figure BDA00040626412800000810
In (2), the number of collected samples is N s ,n=1,…,N s
c, utilizing
Figure BDA00040626412800000811
and W(i) Further constructing a database, and simultaneously updating the weights of the evaluation-execution-disturbance neural network based on a least square method:
Figure BDA00040626412800000812
wherein ,
Figure BDA00040626412800000813
will be
Figure BDA0004062641280000091
Acting on the original motor system;
the motor system H infinite reduced order output tracking controller based on reinforcement learning is designed as follows
u=u s (15)。
Example 2
In order to enable those skilled in the art to better understand the invention, a motor system H infinite reduced order output tracking control method based on reinforcement learning is described in detail below with reference to specific embodiments;
consider the permanent magnet synchronous motor model:
Figure BDA0004062641280000092
Figure BDA0004062641280000093
Figure BDA0004062641280000094
wherein the number of pole pairs n p =4, viscous friction coefficient B υ =0.005 n·m·s, stator resistance R s =10.7Ω, synthetic rotor flux linkage
Figure BDA0004062641280000095
Direct axis and quadrature axis inductance L d =L q =0.0098 mH, moment of inertia +.>
Figure BDA00040626412800000910
Select state variable +.>
Figure BDA00040626412800000911
For motor rotation speed, direct axis current and quadrature axis current, the control input u= [ u ] 1 u 2 ] T =[u d u q ] T External disturbance for direct and quadrature voltages>
Figure BDA0004062641280000096
Time scale parameter for load torque
Figure BDA0004062641280000097
Obtaining
x 1 =-0.238x 1 +2.0114x 2 -4.7619w
Figure BDA0004062641280000098
The control objective of this embodiment is to design a state feedback controller so that the motor system follows a given reference trajectory when w ≡ 0 and satisfies the L2 gain condition
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt
with Q and R chosen as the first- and second-order identity matrices, respectively, and γ = 1.
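The discounted L2-gain condition in this objective can be checked numerically for given signal trajectories. The signals z and w below are hypothetical decaying signals, not trajectories of the actual closed-loop PMSM; α = 0.2 and γ = 1 follow this embodiment:

```python
import math

alpha, gamma, dt, T = 0.2, 1.0, 1e-3, 50.0

def z(t):  # hypothetical performance output, decaying under some controller
    return 0.5 * math.exp(-0.5 * t)

def w(t):  # hypothetical disturbance
    return math.exp(-0.1 * t)

# Riemann-sum approximations of the two discounted integrals
num = sum(math.exp(-alpha * t) * z(t)**2 * dt
          for t in (i * dt for i in range(int(T / dt))))
den = sum(math.exp(-alpha * t) * w(t)**2 * dt
          for t in (i * dt for i in range(int(T / dt))))
print(num <= gamma**2 * den)   # True: the attenuation level gamma is met
```

For these particular signals the left side evaluates to about 0.21 and the right side to about 2.5, so the gain condition holds with margin; a real verification would use measured closed-loop trajectories.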
When designing the H-infinity output tracking controller, four neural networks are introduced: an evaluation neural network, two execution neural networks, and a disturbance neural network. The reference trajectory is r = 0.2cos(0.2t) with initial value 0, the initial value of x_1 is 1, and C = 1, Q = I, R = 1, α = 0.2, γ = 1. The basis functions of the evaluation network are σ = [ρ², ρ³, ρ⁴, r, r³]; the execution-network and disturbance-network basis functions and the initial weights are omitted (images in original).
Detection noise is applied, sample data are collected, and the iteration converges the weights of the neural networks. The weight iteration process of the slow-subsystem evaluation neural network is shown in fig. 2, those of the execution neural networks in figs. 3-4, and that of the disturbance neural network in fig. 5. From the converged execution-network weights, combined with the execution-network expression, the H-infinity reduced-order tracking controller u = u_s (15) is obtained.
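A sketch of evaluating this embodiment's basis functions and a linear-in-weights controller follows. The critic basis σ and the reference r = 0.2cos(0.2t) are taken from the text, while the actor basis ψ and the weights W_a are hypothetical (the originals are rendered as images):

```python
import math

def sigma(rho, r):
    """Critic basis from the embodiment: [rho^2, rho^3, rho^4, r, r^3]."""
    return [rho**2, rho**3, rho**4, r, r**3]

def reference(t):
    """Reference trajectory r(t) = 0.2 cos(0.2 t) from the embodiment."""
    return 0.2 * math.cos(0.2 * t)

def u_s(rho, r, W_a, psi):
    """Linear-in-weights controller u_s = W_a^T psi(rho, r)."""
    return sum(w * p for w, p in zip(W_a, psi(rho, r)))

psi = lambda rho, r: [rho, rho**3, r]     # hypothetical actor basis
W_a = [-1.2, -0.1, 0.4]                   # hypothetical converged weights
rho0 = 1.0 - reference(0.0)               # x1(0) = 1, so rho(0) = 1 - 0.2 = 0.8
print(round(u_s(rho0, reference(0.0), W_a, psi), 4))
```

Once least-squares iteration has converged, evaluating the controller online reduces to this single inner product at each sampling instant, which is what makes the learned reduced-order controller cheap to deploy.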
The state trajectory of the closed-loop motor system under the reduced-order tracking controller is shown in fig. 6; it can be seen that the system follows the given reference trajectory in the absence of disturbance.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (5)

1. The motor system H infinite reduced order output tracking control method based on reinforcement learning is characterized by comprising the following steps of:
step one: decomposing the H infinite output tracking control problem of the original motor system by utilizing a singular perturbation theory to obtain a reduced-order system problem;
step two: based on the output state data of the original system, a state reconstruction mechanism of the virtual subsystem is provided to solve the problem that the data of the virtual subsystem is not measurable, and an H infinite output tracking reinforcement learning iterative algorithm based on the reconstruction data is further deduced;
step three: and introducing an execution-evaluation-disturbance neural network approximation controller, performance indexes and disturbance, and iteratively updating the weight of the neural network based on a least square method to obtain a reinforcement learning-based reduced order controller.
2. The reinforcement learning-based motor system H infinite reduced order output tracking control method according to claim 1, wherein in step one, the motor system is described by the following state space model:
dx_1/dt = f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w
ε dx_2/dt = f_21(x_1) + f_22(x_1)x_2 + g_2(x_1)u
where x_1, x_2 are the motor system state variables, u = [u_1, …, u_m] is the control input, w = [w_1, …, w_q] is the external disturbance, f_11, f_12, f_21, f_22 are the system dynamics, g_1, g_2 are the input dynamics, k is the disturbance dynamics, and 0 < ε ≪ 1 is a singular perturbation parameter; it is assumed that f_11, f_12, f_21, f_22, g_1, g_2, k are completely unknown and Lipschitz continuous, f(0) = 0, f_22 is invertible, and the fast subsystem is asymptotically stable in a short time without applying a fast controller;
to make the slow state x_1 track a bounded reference trajectory r(t), it is assumed that there exists a Lipschitz continuous function h such that
dr(t)/dt = h(r(t));
the tracking error is defined as
ρ = Cx_1 − r(t);
the tracking error dynamics are
dρ/dt = C[f_11(x_1) + f_12(x_1)x_2 + g_1(x_1)u + k(x_1)w] − dr(t)/dt;
the original H-infinity output tracking control problem is: design a state feedback controller u = χ(ρ, r) that satisfies the L2 gain condition defined below in the presence of disturbance and makes the tracking error converge to 0 in the absence of disturbance:
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt
where ‖z‖² = ρᵀQρ + uᵀRu is the defined virtual control output, α > 0 is a discount factor, γ is the attenuation level from the disturbance input w(t) to the defined performance output z(t), Q = [C_1 C_2]ᵀ[C_1 C_2] > 0, and R > 0;
the original system is simplified into the following reduced-order system:
dx_1s/dt = F_s(x_1s) + G_s(x_1s)u_s + K_s(x_1s)w
y = Cx_1s
where C is the system output matrix, x_1s is the reduced-order system state, and
F_s(x_1s) = f_11(x_1s) − f_12(x_1s) f_22⁻¹(x_1s) f_21(x_1s)
G_s(x_1s) = g_1(x_1s) − f_12(x_1s) f_22⁻¹(x_1s) g_2(x_1s)
K_s(x_1s) = k(x_1s);
the original H-infinity output tracking control problem is simplified into the following H-infinity reduced-order output tracking problem:
design a controller u_s such that the reduced-order system output state trajectory Cx_1s tracks the reference trajectory r(t);
the output tracking error of the reduced-order system is defined as
ρ_s = Cx_1s − r(t);
the tracking error dynamics are
dρ_s/dt = C[F_s(x_1s) + G_s(x_1s)u_s + K_s(x_1s)w] − dr(t)/dt;
the virtual control output is defined as
‖z‖² = ρ_sᵀQρ_s + u_sᵀRu_s;
the objective of the H-infinity reduced-order output tracking control problem is: given the tracking error ρ_s and the reference trajectory r, find a control policy u_s = χ(ρ_s, r), with χ a smooth function, satisfying the following conditions:
1) in the presence of disturbances, the system satisfies the L2 gain condition
∫₀^∞ e^(−αt) ‖z‖² dt ≤ γ² ∫₀^∞ e^(−αt) ‖w‖² dt;
2) in the absence of disturbances, the output tracking error approaches 0.
3. The reinforcement-learning-based motor system H-infinity reduced-order output tracking control method according to claim 1, wherein in step two the state reconstruction mechanism of the virtual subsystem is: use the slow dynamic state x_1 of the original system to reconstruct the unmeasurable virtual subsystem state; the slow-subsystem H-infinity reinforcement learning iterative algorithm based on the reconstructed data x_1 is:
(iterative update equation omitted; rendered as an image in the original)
where i is the slow-controller iteration index.
4. The motor system H infinite reduced order output tracking control method based on reinforcement learning according to claim 1, wherein in the third step, the slow controller design method based on reinforcement learning specifically comprises:
selecting and evaluating the neural network, executing the neural network and perturbing the linear independent activation function vectors of the neural network to be respectively
Figure FDA0004062641270000033
Design evaluation-execution-perturbation neural network for approximating performance index J rec Controller u rec Disturbance w rec
Figure FDA0004062641270000034
Figure FDA0004062641270000035
Figure FDA0004062641270000036
wherein ,
Figure FDA0004062641270000037
respectively representing the weight vectors of the evaluation neural network, the execution neural network and the disturbance neural network;
a. initializing the neural network weight vectors
Figure FDA0004062641270000038
giving initial stabilizing execution network and disturbance network weights
Figure FDA0004062641270000039
b. under different behavior strategies u_s and disturbances w, collecting data pairs {X_1(n), u_s(n), w, X'_1(n)} from the original system and placing them in the sample set
Figure FDA00040626412700000310
where the number of collected samples is N_s, n = 1, …, N_s;
c. utilizing
Figure FDA00040626412700000311
and W^(i), further constructing the database, and simultaneously updating the weights of the evaluation, execution, and disturbance neural networks based on the least-squares method:
Figure FDA00040626412700000312
wherein
Figure FDA00040626412700000313
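Steps b and c of the claim amount to off-policy data collection followed by a batch least-squares fit of linear-in-the-weights networks. A minimal sketch with a toy discrete-time stand-in for the slow dynamics and a synthetic regression target so the fit can be verified; all matrices, the basis functions, and the target are illustrative assumptions, not the patent's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy discrete-time stand-in for the slow dynamics (illustrative matrices)
A1 = np.array([[0.95, 0.05], [0.0, 0.90]])
B1 = np.array([[0.0], [0.10]])
D1 = np.array([[0.05], [0.0]])

def phi(x):
    """Quadratic activation vector, a common critic basis (illustrative choice)."""
    return np.array([x[0] * x[0], x[0] * x[1], x[1] * x[1]])

# step b: collect {x1, u_s, w, x1'} tuples under an exploratory behavior policy,
# restarting from random states so the data are persistently exciting
N_s = 200
samples = []
for n in range(N_s):
    x1 = rng.standard_normal(2)
    u_s = rng.uniform(-1.0, 1.0, 1)      # exploratory behavior control
    w = 0.1 * rng.standard_normal(1)     # probing disturbance
    x1_next = A1 @ x1 + B1 @ u_s + D1 @ w
    samples.append((x1, u_s, w, x1_next))

# step c: batch least-squares weight update; the regression target here is
# synthetic (a known weight vector plus small noise) so the fit can be checked
Phi = np.stack([phi(s[0]) for s in samples])   # (N_s, n_basis) activation matrix
W_true = np.array([1.0, -0.5, 2.0])            # "unknown" weights to recover
y = Phi @ W_true + 1e-3 * rng.standard_normal(N_s)
W_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(W_hat, W_true, atol=1e-2))  # True
```

In the patent's algorithm the target vector would instead be built from the iterative H infinite equation on the collected tuples; the batch solve itself is the same.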
5. The reinforcement learning-based motor system H infinite reduced order output tracking control method according to claim 1, wherein the reinforcement learning-based motor system H infinite reduced order output tracking controller is:
Figure FDA0004062641270000041
CN202310067097.7A 2023-01-30 2023-01-30 Motor system H infinite reduced order output tracking control method based on reinforcement learning Pending CN116208041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310067097.7A CN116208041A (en) 2023-01-30 2023-01-30 Motor system H infinite reduced order output tracking control method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN116208041A true CN116208041A (en) 2023-06-02

Family

ID=86512321

Country Status (1)

Country Link
CN (1) CN116208041A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination