CN115327903B

CN115327903B - Off-track strategy optimal tracking control method for two-dimensional state time-lag batch processing process

Info

Publication number: CN115327903B
Application number: CN202210962596.8A
Authority: CN
Inventors: 施惠元; 吕梦迪; 李娟�; 苏成利; 姜雪莹; 李平; 郑尚磊; 解俊朋
Original assignee: Liaoning Shihua University
Current assignee: Liaoning Shihua University
Filing date: 2022-08-11
Publication date: 2024-07-05
Anticipated expiration: 2042-08-11

Abstract

An off-track strategy (off-policy) optimal tracking control method for a two-dimensional batch process with state time lag, belonging to the technical field of industrial process control. The method removes the dependence of traditional control methods on an accurate system model and comprises the following steps: step one, establishing an augmented state space equation consisting of a state error and an output error; step two, designing a controller for the two-dimensional time-lag augmented state space model on the basis of the augmented system; step three, designing the two-dimensional time-lag Bellman equation and the controller gain; step four, solving for the controller gain. The method addresses the modeling difficulty, repeatability, and other complex characteristics of injection molding processes with state time lag. Because it is data-based, it reduces the system's dependence on a model and lowers the computational cost.

Description

Off-track strategy optimal tracking control method for two-dimensional state time-lag batch processing process

Technical Field

The invention belongs to the technical field of industrial process control, and particularly relates to an off-track strategy optimal tracking control method for a two-dimensional state time-lag batch processing process.

Background

In traditional industrial control, the flow from raw-material input to product output is continuous and uninterrupted, and such a process is called a continuous process. As society develops, however, product variety keeps growing and quality requirements keep rising, and modern industrial production usually involves a large number of repeated operations, so an efficient, multi-variety, small-scale production mode is increasingly favored. A process of this kind converts raw materials into products in a prescribed processing order and obtains the products by repeating the operation; it is called a batch process. Batch process control is widely applied in mass-production fields such as clothing, chemicals, food production, equipment part manufacturing, medicine, and smelting. Batch processes are repetitive, fast, and low-cost, but an accurate system model is difficult to build for them, and they account for a large share of industrial production. Control of batch processes is therefore one of the hot spots in the control field. Time lag is unavoidable in batch production. A persistent delay reduces the response speed of the system, makes controller design harder, destabilizes control performance, biases the control effect, and makes the optimal control law difficult to obtain. At the same time, the repeatability and model complexity of a typical batch process make accurate modeling difficult, while conventional control methods often depend heavily on the model. How to handle time lag in an intermittent process effectively without depending on a model has therefore been a research hotspot in this field.

In injection molding production, a model is often difficult to obtain, while a large amount of data is generated and stored; this data spans both the time direction and the batch direction. Using this two-dimensional data effectively to design the controller of an injection molding process directly, without an accurate process model, is therefore important, and data-driven methods are well suited to time-lag batch process problems. Because reinforcement learning can optimally control a complex system from data alone, without prior information, it has been studied to some extent for optimal tracking control of systems with state time lag. These studies, however, mainly address one-dimensional state time-lag systems; time-lag batch process control with two-dimensional characteristics has not been studied. It is therefore important to develop a reinforcement-learning-based control method for intermittent processes with state time lag, so that the system does not depend on a model and learns the optimal control law continuously from data in the time and batch directions alone.

Disclosure of Invention

To address the optimal tracking control problem of a two-dimensional batch process with state delay, the invention provides a new model-free two-dimensional off-track strategy optimal tracking control method. The method handles systems that cannot be modeled accurately, reduces the system's dependence on a model, learns the optimal control law continuously from data alone, tracks the set value well, and improves the control and tracking performance of the system. First, an intermittent process with unknown dynamics is written in state space form, and an augmented system composed of the state increment and the output error is constructed. Second, on the basis of the augmented system, a two-dimensional optimal performance index function, a two-dimensional value function, and a two-dimensional Q function, all containing time lags, are designed. Then a target strategy is introduced alongside the behavior strategy, and a time-lag batch process optimal tracking control method based on two-dimensional reinforcement learning is designed from batch-direction and time-direction data, using the Bellman optimality principle, dynamic programming, the Kronecker product, and the least squares method. After repeated learning from the two-dimensional data generated in the injection molding process, the optimal controller gain is obtained, and with it the optimal control law under the performance index; the control law then acts on the actuator so that the system output gradually tracks the set value.

The invention is realized by the following technical scheme:

The invention describes a two-dimensional injection molding process with state delay by a corresponding two-dimensional time-lag state space model, and augments this model with the output error to form a new two-dimensional time-lag augmented state space model. On the basis of the augmented system, a two-dimensional optimal performance index function, a two-dimensional value function, and a two-dimensional Q function containing time lags are designed. A target strategy is then introduced alongside the behavior strategy, and the time-lag batch process optimal tracking control method based on two-dimensional reinforcement learning is designed from batch-direction and time-direction data, using the Bellman optimality principle, dynamic programming, the Kronecker product, and the least squares method. After repeated learning from the two-dimensional data generated in the injection molding process, the optimal controller gain is obtained, and with it the optimal control law under the performance index; the control law then acts on the actuator so that the system output gradually tracks the set value.

Step one: establishing an augmented state space equation consisting of a state error and an output error;

First, the state space model of an intermittent process with time lag can be expressed as:

x(t+1,k)＝Ax(t,k)+A_dx(t-d,k)+Bu(t,k), y(t,k)＝Cx(t,k) (1)

where t is the time index, k is the batch index, d is the state time lag, x(t,k) is the system state of the current batch at the current time, x(t-d,k) is the delayed system state, u(t,k) is the control input of the current batch at the current time, y(t,k) is the system output of the current batch at the current time, and A, A_d, B and C are constant matrices of appropriate dimensions;

To reduce the steady-state error of the system and improve the regulating capability of the controller, the state equation of batch k-1 at time t+1 is subtracted from that of batch k at time t+1, which gives the incremental time-lag state space model:

Δ_kx(t+1,k)＝AΔ_kx(t,k)+A_dΔ_kx(t-d,k)+Br(t,k) (2)

where Δ_kx(t,k)＝x(t,k)-x(t,k-1) is the batch-direction increment of the state, and r(t,k)＝u(t,k)-u(t,k-1) is the update law of the current batch at the current time.
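The subtraction step can be checked numerically. The sketch below assumes the model x(t+1,k)＝Ax(t,k)+A_dx(t-d,k)+Bu(t,k) with zero initial history; the matrices, the lag d, and the horizon T are hypothetical values chosen only for illustration and are not taken from the patent.

```python
import numpy as np

# Hypothetical example matrices (NOT from the patent); d is the state time lag.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
A_d = np.array([[0.05, 0.0], [0.0, 0.05]])
B = np.array([[1.0], [0.5]])
d, T = 2, 10


def run_batch(u):
    """Simulate x(t+1,k) = A x(t,k) + A_d x(t-d,k) + B u(t,k) for one batch.

    Row i of the returned array holds x(i - d), so rows 0..d are the
    (assumed zero) initial history x(-d), ..., x(0).
    """
    x = np.zeros((T + d + 1, A.shape[0]))
    for t in range(T):
        x[t + d + 1] = A @ x[t + d] + A_d @ x[t] + B @ u[t]
    return x


rng = np.random.default_rng(0)
u_prev = rng.normal(size=(T, 1))   # inputs of batch k-1
r = rng.normal(size=(T, 1))        # update law r(t,k)
u_curr = u_prev + r                # u(t,k) = u(t,k-1) + r(t,k)

x_prev, x_curr = run_batch(u_prev), run_batch(u_curr)
dx = x_curr - x_prev               # batch-direction increment of the state

# Check the increment model at every time step of the batch.
max_residual = 0.0
for t in range(T):
    rhs = A @ dx[t + d] + A_d @ dx[t] + B @ r[t]
    max_residual = max(max_residual, float(np.abs(dx[t + d + 1] - rhs).max()))
```

Because the model is linear, the residual is zero up to rounding, which is why the increment between consecutive batches obeys the same dynamics driven by r(t,k) instead of u(t,k).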

Let y_r be the set value of the injection speed, which is held constant. The output tracking error of the system at time t+1 of batch k is e(t+1,k)＝y_r-y(t+1,k), so that

e(t+1,k)＝e(t+1,k-1)-CAΔ_kx(t,k)-CA_dΔ_kx(t-d,k)-CBr(t,k) (3)
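The step from the error definition to (3) can be written out as follows. This is a short derivation sketch that assumes a constant set value y_r, the output equation y(t,k)＝Cx(t,k), and the batch-direction increment Δ_kx(t,k)＝x(t,k)-x(t,k-1); the input enters through r(t,k)＝u(t,k)-u(t,k-1), which is itself the batch-direction increment of the input.

```latex
\begin{aligned}
e(t+1,k) - e(t+1,k-1)
  &= \bigl(y_r - y(t+1,k)\bigr) - \bigl(y_r - y(t+1,k-1)\bigr) \\
  &= -C\,\Delta_k x(t+1,k) \\
  &= -CA\,\Delta_k x(t,k) - CA_d\,\Delta_k x(t-d,k) - CB\,r(t,k).
\end{aligned}
```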

The output tracking error and the incremental state variable are introduced into a new state space variable to obtain a new extended time-lag state space model, and the result is as follows:

where e(t+1,k)＝y_r(t+1,k)-y(t+1,k), and I is an identity matrix of appropriate dimensions;
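One plausible block structure for the augmented model can be sketched in code. It assumes the augmented state is ordered as X(t,k)＝[Δ_kx(t,k); e(t,k)] and that the model reads X(t+1,k)＝A_bar X(t,k)+Ad_bar X(t-d,k)+I_bar X(t+1,k-1)+B_bar r(t,k). The names A_bar, Ad_bar, I_bar, B_bar and the numerical matrices are assumptions made for illustration, not the patent's notation or data.

```python
import numpy as np


def augment(A, A_d, B, C):
    """Build block matrices for X(t,k) = [dx(t,k); e(t,k)] (assumed ordering)."""
    n, p = A.shape[0], C.shape[0]
    A_bar = np.block([[A, np.zeros((n, p))], [-C @ A, np.zeros((p, p))]])
    Ad_bar = np.block([[A_d, np.zeros((n, p))], [-C @ A_d, np.zeros((p, p))]])
    # I_bar passes only e(t+1,k-1) from the previous batch into the error row.
    I_bar = np.block([[np.zeros((n, n)), np.zeros((n, p))],
                      [np.zeros((p, n)), np.eye(p)]])
    B_bar = np.vstack([B, -C @ B])
    return A_bar, Ad_bar, I_bar, B_bar


# Hypothetical example matrices (NOT from the patent).
A = np.array([[0.8, 0.1], [0.0, 0.7]])
A_d = 0.05 * np.eye(2)
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
A_bar, Ad_bar, I_bar, B_bar = augment(A, A_d, B, C)

rng = np.random.default_rng(1)
dx_t, dx_td = rng.normal(size=2), rng.normal(size=2)
e_t, e_td, e_next_prev = (rng.normal(size=1) for _ in range(3))
r = rng.normal(size=1)

X_t = np.concatenate([dx_t, e_t])
X_td = np.concatenate([dx_td, e_td])
X_next_prev = np.concatenate([rng.normal(size=2), e_next_prev])
X_next = A_bar @ X_t + Ad_bar @ X_td + I_bar @ X_next_prev + B_bar @ r

# Reference values computed directly from the increment and error equations.
expected_dx = A @ dx_t + A_d @ dx_td + B @ r
expected_e = e_next_prev - C @ A @ dx_t - C @ A_d @ dx_td - C @ B @ r
```

The first block row reproduces the incremental state equation and the second reproduces the error recursion, so a single augmented state carries both the dynamics and the tracking objective.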

step two: on the basis of the augmentation system, a controller of a two-dimensional time-lag augmentation state space model is designed;

According to model (4), the controller gains for an injection molding system with state lags are designed as follows:

where K₁ is the controller gain associated with time t of batch k, K₂ is the controller gain associated with time t-d of batch k, and K₃ is the controller gain associated with time t+1 of batch k-1;
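A minimal sketch of how three such gains could be applied is given below. It assumes a linear feedback law r(t,k)＝-K₁X(t,k)-K₂X(t-d,k)-K₃X(t+1,k-1) and recovers the actual input from u(t,k)＝u(t,k-1)+r(t,k); the sign convention, the function name `control_input`, and all numbers are assumptions for illustration only.

```python
import numpy as np


def control_input(K1, K2, K3, X_t, X_td, X_next_prev, u_prev_batch):
    """Return (r, u) for the current time and batch.

    Assumed law: r(t,k) = -K1 X(t,k) - K2 X(t-d,k) - K3 X(t+1,k-1),
    followed by u(t,k) = u(t,k-1) + r(t,k).
    """
    r = -(K1 @ X_t + K2 @ X_td + K3 @ X_next_prev)
    return r, u_prev_batch + r


# Hypothetical gains and augmented states (three states, one input).
K1 = np.array([[0.5, 0.0, -0.2]])
K2 = np.array([[0.1, 0.1, 0.0]])
K3 = np.array([[0.0, 0.0, -0.4]])
X_t = np.array([1.0, 2.0, 3.0])          # X(t,k)
X_td = np.array([1.0, 1.0, 1.0])         # X(t-d,k)
X_next_prev = np.array([0.0, 0.0, 2.0])  # X(t+1,k-1)

r, u = control_input(K1, K2, K3, X_t, X_td, X_next_prev, np.array([1.0]))
```

The law combines feedback along the time direction (K₁, K₂) with learning along the batch direction (K₃), which is what makes the controller two-dimensional.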

meanwhile, the performance index of the optimal tracking control problem of the injection molding system with state time lag is designed as follows:

Designing a two-dimensional value function of the system according to the performance index:

Its quadratic form is as follows:

Wherein, α₁＝P₁+d²P₄,α₂＝P₃+d²P₄+d(d-1)P₄,α₃＝P₃+d(d-1)P₄+d(d-2)P₄,α₄＝P₃+d(d-(d-2))P₄+d(d-(d-1))P₄,α₅＝P₃+d(d-(d-1))P₄,α₆＝-d²P₄,α₇＝-d(d-1)P₄,α₈＝-d(d-(d-2))P₄,α₉＝-d(d-(d-1))P₄,α₁₀＝P₂;

From the performance index and the two-dimensional value function, a two-dimensional Q function that evaluates the system state is designed as follows:

Like the two-dimensional value function, the two-dimensional Q function can be written in quadratic form:

Comparing the quadratic forms of the two-dimensional value function and the two-dimensional Q function gives the relationship between the P matrix and the H matrix in terms of the controller gain:

Substituting the state space model of the injection molding process into the two-dimensional Q function gives the specific form of the H matrix in terms of the model parameters:

where, for brevity, X₁ denotes X(t,k), X₂ denotes X(t+1,k-1), and X_d denotes X(t-d+(d-1),k);

Based on dynamic programming, a two-dimensional Bellman equation in terms of the Q function is obtained. It is independent of the model parameters of the injection molding system and depends only on the state of the system; its specific form is:

where z(t,k)＝[X^T(t,k),X^T(t-1,k),…,X^T(t-d,k),X^T(t+1,k-1),r^T(t,k)]^T;

According to the optimality condition, setting the partial derivative of the Q function with respect to r(t,k) to zero gives the optimal control input:
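For a quadratic Q function of the form z^T H z, the stationarity condition has a closed form. The sketch below assumes H is symmetric with a positive definite input block and that the last m entries of z are r(t,k); the partition names H_rs and H_rr, the function `greedy_gain`, and the random H are illustrative assumptions. How the patent splits the resulting gain into K₁, K₂, K₃ is not recoverable from the text and is not modeled here.

```python
import numpy as np


def greedy_gain(H, m):
    """Split H = [[H_ss, H_sr], [H_rs, H_rr]] and return K with r* = -K s.

    s stacks every state entry of z(t,k); the last m entries of z are r(t,k).
    Setting d(z^T H z)/dr = 2 H_rs s + 2 H_rr r to zero gives r* = -H_rr^{-1} H_rs s.
    """
    H_rs, H_rr = H[-m:, :-m], H[-m:, -m:]
    return np.linalg.solve(H_rr, H_rs)


rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
H = M @ M.T + 5 * np.eye(5)   # hypothetical symmetric positive definite H
m = 1
K = greedy_gain(H, m)

s = rng.normal(size=4)        # stacked state part of z
r_star = -K @ s


def q_value(r):
    """Evaluate the quadratic form z^T H z for z = [s; r]."""
    z = np.concatenate([s, r])
    return float(z @ H @ z)


gradient = H[-m:, :-m] @ s + H[-m:, -m:] @ r_star
```

The gain depends only on blocks of H, which is why learning H from data is enough to obtain the controller without the model matrices.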

step three: designing a two-dimensional time-lag bellman equation and a controller gain;

The parameters of the H matrix are solved with a data-driven two-dimensional off-track strategy algorithm, and the controller gains K₁, K₂, K₃ are then designed from the solved H matrix blocks;

To better balance exploring data against exploiting data while the two-dimensional off-track strategy algorithm learns, a target policy r^j(k,p) is introduced into the system and is continuously optimized from the data generated by the behavior policy. With the two policies acting together, both the comprehensiveness of the data and the global optimality of learning are ensured, and the following new system is obtained:

where
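A common way to obtain such a system in off-policy learning is to add and subtract the target policy in the input term. The sketch below assumes the augmented model has the form X(t+1,k)＝Ā X(t,k)+Ā_d X(t-d,k)+Ī X(t+1,k-1)+B̄ r(t,k); the barred matrices and the argument (t,k) of the target policy r^j are notation assumed here, not taken from the patent.

```latex
X(t+1,k) = \bar{A}\,X(t,k) + \bar{A}_d\,X(t-d,k) + \bar{I}\,X(t+1,k-1)
           + \bar{B}\,r^{j}(t,k) + \bar{B}\bigl(r(t,k) - r^{j}(t,k)\bigr)
```

The right-hand side is algebraically identical to the original model, so data generated by the behavior policy r(t,k) remain valid while the target policy r^j is being evaluated and improved.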

Along the trajectory of the new system (15), the two-dimensional Bellman equation can be expressed as follows:

Substituting system (15) and the relationship between the P matrix and the H matrix into formula (16) gives the following two-dimensional Bellman equation:

Using the Kronecker product and the least squares method, formula (17) is rewritten as:

where
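The vectorization step behind this rewriting can be illustrated on its own. The sketch uses the identity z^T H z＝(z^T⊗z^T)vec(H) to turn quadratic-form data into a linear least squares problem in the entries of H. It does not reproduce the patent's regression, whose data vectors also involve Bellman-equation terms; the matrix H_true and the sampled vectors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.normal(size=(n, n))
H_true = (M + M.T) / 2                       # hypothetical symmetric kernel

Z = rng.normal(size=(40, n))                 # 40 sampled data vectors z_i
q = np.einsum("ij,jk,ik->i", Z, H_true, Z)   # q_i = z_i^T H z_i

# z^T H z = (z^T kron z^T) vec(H): each row of Phi is kron(z_i, z_i).
Phi = np.stack([np.kron(z, z) for z in Z])

# Phi is rank deficient because z_a z_b = z_b z_a, so lstsq returns the
# minimum-norm solution, which is the symmetric one.
h, *_ = np.linalg.lstsq(Phi, q, rcond=None)
H_hat = h.reshape(n, n)
```

Enough sufficiently varied samples are needed for the regression to determine every independent entry of H, which is one reason exploratory data from the behavior policy matters.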

Step four: solving the gain of the controller;

Calculating the above terms yields the controller gain:

1. A stabilizing behavior policy is applied to the injection molding process to generate two-dimensional data in the time and batch directions, and the data are stored in the corresponding data matrices, including θ^{j+1}(p);

2. Initial admissible controller gains are selected and j＝0 is set, where j is the iteration index;

3. The off-track strategy algorithm is executed to update the controller gains K₁, K₂, K₃;

4. The algorithm stops when the controller gains satisfy ‖K^{j+1}-K^j‖<ε, where ε is a very small positive number; otherwise j＝j+1 and the algorithm returns to step 3 and continues iterating.
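The iteration and stopping rule in the steps above can be sketched as a generic loop. The function `iterate_gain`, the tolerance, and the stand-in update are assumptions for illustration: in the method itself the update would be the least squares solve followed by the gain computation, which is replaced here by a simple contraction so that the loop can be run in isolation.

```python
import numpy as np


def iterate_gain(update, K0, eps=1e-8, max_iter=1000):
    """Repeat K <- update(K) until ||K_next - K|| < eps.

    Returns the final gain and the number of iterations performed.
    """
    K = np.asarray(K0, dtype=float)
    for j in range(max_iter):
        K_next = update(K)
        if np.linalg.norm(K_next - K) < eps:
            return K_next, j + 1
        K = K_next
    raise RuntimeError("gain did not converge within max_iter iterations")


# Stand-in update (hypothetical): a contraction with fixed point K_star.
K_star = np.array([[0.3, -0.1, 0.7]])
K_final, n_iter = iterate_gain(lambda K: 0.5 * (K + K_star), np.zeros((1, 3)))
```

A cap on the number of iterations is added as a safeguard; the source describes only the norm-based stopping test.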

The invention has the advantages and effects that:

To address the modeling difficulty, repeatability, and other complex characteristics of the optimal tracking control problem for injection molding processes with state time lag, the invention provides a novel model-free off-track strategy optimal tracking control method. Unlike conventional methods, which require an accurate model of the time-lag batch process, the method does not rely on a model and uses only data obtained from the injection molding process in the batch and time directions. Unlike traditional one-dimensional approaches, it realizes optimal tracking control of the injection molding process with a two-dimensional off-track strategy algorithm in a data-driven manner within a two-dimensional framework. An augmented state space equation composed of the state error and the output error is established to ensure the tracking performance of the designed controller. When the controller gain is solved, the behavior strategy and the target strategy are treated separately, so the data are fully mined and used, and the solution remains unbiased even after probing noise is added. After repeated iterative learning, the H matrix gradually approaches the optimal H matrix and the controller gain converges to the optimal controller gain.

Drawings

FIG. 1 shows the convergence process of a block of the H matrix;

FIG. 2 shows the convergence process of a block of the H matrix;

FIG. 3 shows the convergence process of a block of the H matrix;

FIG. 4 shows the convergence process of the controller gain K₁;

FIG. 5 shows the convergence process of the controller gain K₂;

FIG. 6 shows the convergence process of the controller gain K₃;

FIG. 7 shows the control input curve under the two-dimensional off-track strategy algorithm;

FIG. 8 shows the output tracking curve under the two-dimensional off-track strategy algorithm.

Detailed Description

The invention is described in detail below with reference to the drawings and examples for further illustration of the invention, but they are not to be construed as limiting the scope of the invention.

Example 1:

Injection molding is a typical batch production process that plays an important role in the plastic products industry. It consists mainly of three stages: filling, packing, and cooling. The packing stage is important in determining product quality, and the injection speed is a key variable that affects the flow behavior of the melt in the cavity; to ensure product quality, it is controlled within a given profile range. In this section, the proposed algorithm is applied to injection speed control to demonstrate the feasibility and effectiveness of the design.

Based on a large number of experiments, the discrete input/output model between the injection speed and the valve in the two-dimensional time-lag injection molding process is as follows:

where y(z) and u(z) are the z-transforms of the output and input, respectively, and z is the z-transform variable.

Converting the model described by expression (20) into state space form gives:

where C＝[1 0], and noise(t,k) is random noise taking values in [-2,2];

The controller parameters used in the simulation are Q₁＝Q₂＝diag[0.4,0.4,0.4] and R＝1. Solving (13) and (18) with the two-dimensional off-track strategy iterative algorithm yields the optimal H matrix and the optimal controller gains:

After repeated iterative learning, the H matrix (26) and the controller gains (27), (28), (29) obtained by the two-dimensional off-track strategy algorithm gradually converge to the optimal H matrix and the optimal controller gains.

As can be seen from FIG. 1, FIG. 2 and FIG. 3, during iterative learning the H matrix gradually approaches the optimal H matrix as the number of batches increases. FIG. 4, FIG. 5 and FIG. 6 show the controller gains K₁, K₂, K₃ gradually converging to the optimal controller gains. The simulation results show that the method has a good optimizing effect, and that convergence improves as the number of batches increases.

With the controller parameters obtained by the method, the input and output curves are shown in FIG. 7 and FIG. 8, respectively, where the output set value is y_r＝40 mm/s. The control input curve remains stable throughout, so the deviation between the system output response and the target trajectory gradually decreases to zero. In the early batches the actual output differs little from the expected value, and as the number of batches increases the tracking error becomes smaller and the tracking effect improves.

In summary, the invention uses the control design of an injection molding system with state time lag as an example to verify the effectiveness and feasibility of the proposed control method. With the model parameters unknown, the method uses only data measured in the batch and time directions of the injection molding process. Unlike traditional approaches to time-lag intermittent processes, which require an accurate model, and unlike traditional one-dimensional approaches, it relies on data alone, which greatly reduces the system's dependence on the model. Simulation results show that, after repeated learning, the output tracks the set value better as the number of batches increases, and the system has good tracking performance and convergence. The method therefore provides a new design scheme for the optimal tracking control problem of two-dimensional batch processes with state time lag, giving good tracking while reducing the modeling difficulty.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and these are intended to fall within the scope of the present invention.

Claims

1. The off-track strategy optimal tracking control method for the two-dimensional state time-lag batch processing process comprises the following specific steps:

where t is the time index, k is the batch index, x(t,k) is the system state, x(t-d,k) is the delayed system state, u(t,k) is the control input of the system, y(t,k) is the output of the system, and A, A_d, B and C are constant matrices of appropriate dimensions;

to reduce the steady-state error of the system and improve the regulating capability of the controller, the state equation of batch k-1 at time t+1 is subtracted from that of batch k at time t+1 to obtain the following incremental state space model with state time lag:

where r(t,k)＝u(t,k)-u(t,k-1) is the update law of the current batch at the current time of the system;

e(t+1,k)＝e(t+1,k-1)-CAΔ_kx(t,k)-CA_dΔ_kx(t-d,k)-CBr(t,k) (3)

The increment of the output tracking error and the state variable is introduced into a new state space variable to obtain a new extended time-lag state space model, and the result is as follows:

where e(t+1,k)＝y_r(t+1,k)-y(t+1,k), and I is an identity matrix of appropriate dimensions;

On the basis of the model (4), the controller gain of the injection molding system with state time lag is designed as follows:

The performance index of the optimal tracking control problem of the injection molding system with state time lag is then designed as follows:

A two-dimensional value function for the system with state time lag is designed from the performance index:

The quadratic form of the two-dimensional value function is as follows:

Wherein,

α₁＝P₁+d²P₄,α₂＝P₃+d²P₄+d(d-1)P₄,α₃＝P₃+d(d-1)P₄+d(d-2)P₄,

α₄＝P₃+d(d-(d-2))P₄+d(d-(d-1))P₄,α₅＝P₃+d(d-(d-1))P₄,α₆＝-d²P₄,

α₇＝-d(d-1)P₄,α₈＝-d(d-(d-2))P₄,α₉＝-d(d-(d-1))P₄,α₁₀＝P₂;

From the performance index and the two-dimensional value function, a two-dimensional Q function that evaluates the system state can be designed:

The quadratic form of the two-dimensional Q function can then be expressed as follows:

By comparing the quadratic forms of the two-dimensional value function and the two-dimensional Q function, the relationship between the P matrix and the H matrix in terms of the controller gain is obtained:

Substituting the state space model of the injection molding process into the two-dimensional Q function gives the form of the H matrix in terms of the model parameters:

where, for simplicity of expression, X(t,k) is denoted by X₁, X(t+1,k-1) by X₂, and X(t-d+(d-1),k) by X_d1, X_d2, …, X_dd;

Based on the dynamic programming principle, the following two-dimensional Bellman equation in terms of the Q function is obtained:

where z(t,k)＝[X^T(t,k),X^T(t-1,k),…,X^T(t-d,k),X^T(t+1,k-1),r^T(t,k)]^T;

According to the optimality condition, setting the partial derivative of the Q function with respect to r(t,k) to zero gives the optimal control input:

The parameters of the H matrix are solved with a data-driven two-dimensional off-track strategy Q learning algorithm, and the controller gains K₁, K₂, K₃ are then designed from the solved H matrix blocks;

To better balance exploring data against exploiting data in Q learning, a target policy r^j(k,p) is introduced into the system, and the following new system is obtained:

where

Substituting formula (15) and the relationship between the P matrix and the H matrix into formula (16) gives the following two-dimensional Bellman equation:

where

Step four: solving the gain of the controller;

The above expressions are calculated to obtain the controller gain.