CN118192249A - Boiler turbine system load control method based on experience-oriented Q learning - Google Patents

Boiler turbine system load control method based on experience-oriented Q learning

Info

Publication number
CN118192249A
CN118192249A (application number CN202410428471.6A)
Authority
CN
China
Prior art keywords
data, boiler, learning, experience, function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410428471.6A
Other languages
Chinese (zh)
Inventor
刘晓敏
彭献永
范赫
余梦君
王浩宇
杨春雨
周林娜
赵峻
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202410428471.6A
Publication of CN118192249A
Legal status: Pending


Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention discloses a boiler-turbine system load control method based on experience-guided Q learning, which comprises the following steps: converting the original load control problem into the regulation problem of an augmented error system built on the tracking error; constructing an experience pool from the historical operation data of the boiler-turbine system, proposing an off-policy Q learning method that updates the state-action value function from batch-sampled information, designing a single evaluation (critic) network to approximate the Q function, and iteratively updating the state-action value function by least squares; further optimizing the evaluation network weights online through a sampling-training loop-nested training framework with data recycling; and designing a Q learning adaptive controller that generates data with an optimizing trend and stores it in the experience pool, realizing the guided learning of the Q learning algorithm so as to adaptively adjust the system load control strategy. By efficiently utilizing system operation data in an experience-guided learning mode, the invention addresses the difficulties of poor data utilization and high data-quality requirements in boiler-turbine system load control.

Description

Boiler turbine system load control method based on experience-oriented Q learning
Technical Field
The invention relates to the field of data-driven control of boiler-turbine systems, and in particular to a boiler-turbine system load control method based on experience-guided Q learning.
Background
With the advancement of the "dual carbon" policy, coupling thermal power generation with renewable energy sources has become a key strategy for decarbonizing the energy supply. However, this trend also presents thermal power plants with a new challenge: stabilizing grid fluctuations. Against this background, optimizing the load control of the boiler-turbine system of a thermal power plant is essential to ensuring safe and stable operation of the power grid.
Although conventional control schemes such as proportional-integral-derivative (PID) controllers are widely used in industrial process control because they are simple to deploy, they adapt poorly to rapid load changes and struggle to meet design requirements. Some scholars have therefore proposed advanced economic model predictive control schemes based on feedback linearization to achieve better tracking accuracy and economic performance. Others have introduced a state observer and an error integrator into the control scheme and proposed fuzzy robust control, with experiments demonstrating good tracking performance and robustness. However, all of these methods depend on an accurately built system model. In practice, boiler-turbine systems involve many complex phenomena, such as nonlinearity and parameter coupling, and these complexities greatly hinder the further development of model-based methods.
Reinforcement learning, also called adaptive dynamic programming, is a data-driven artificial intelligence approach that can solve optimal control problems for systems with unknown models through interactive learning between an agent and its environment. Scholars have proposed boiler-turbine control methods based on adaptive dynamic programming, but these methods typically train the network weights on a single collected data set. Such an approach usually requires the batch data to be sufficiently rich to obtain satisfactory results. Given that this single-batch training mode suffers from poor data utilization and over-reliance on the input data, how to store data effectively and realize a training mode with an optimizing trend remains to be developed.
Therefore, there is an urgent need for a data-driven method with efficient data utilization and experience guidance that can solve the load control problem of the boiler-turbine system under a given objective.
Disclosure of Invention
The invention provides a boiler-turbine system load control method based on experience-guided Q learning. It solves the load control problem for boiler-turbine systems with unmodeled dynamics, giving the load control strategy adaptive and self-learning capability, and introduces a sampling-training loop-nested training framework with experience-guided data recycling, which effectively avoids the low data utilization and high data requirements of a single sampling-training framework, as described in detail below:
A boiler-turbine system load control method based on experience-guided Q learning, the method comprising:
Step 1, discretizing the boiler-turbine system with a fixed sampling period T_s to obtain a discrete boiler-turbine system, and converting the original load control problem of the discrete system into the regulation problem of an augmented error system built on the tracking error;
Step 2, constructing an experience pool for the augmented error system based on historical sampling data of the boiler-turbine system, proposing an off-policy Q learning method that updates the state-action value function from batch-sampled information, designing a single evaluation network to approximate the Q function, and updating the evaluation network weights by least squares;
Step 3, constructing a sampling-training loop-nested training framework based on experience-guided data reuse, and further optimizing the evaluation network weights online;
Step 4, designing a Q learning adaptive controller with a policy gradient descent method, generating data with an optimizing trend, and storing the data in the experience pool to realize the guided learning of the Q learning algorithm;
Further, step 1 specifically comprises the following:
Step 101, discretize the boiler-turbine system with the fixed sampling period T_s to obtain a discrete boiler-turbine system represented as follows
x(k+1)=f(x(k),u(k)) (1)
Where f (·, ·) represents an unknown nonlinear function about the boiler-turbine dynamics, x (k) and u (k) are the system state vector and the control input vector, respectively, at sample time k.
Step 102, the desired load trajectory takes the form
r(k+1)=h(r(k)) (2)
Where r (k) is the desired load target at time k and h (r) is a Lipschitz continuous vector function.
The load tracking error is then e(k) = x(k) - r(k) (3)
Step 103, the load control problem of the discrete boiler-turbine system is: design an optimal control input u(k) for system (1) such that the state x(k) tracks the desired target load r(k) as quickly as possible while minimizing input consumption.
Step 104, the augmented error system built on the tracking error is as follows
Step 105, the regulation problem of the augmented error system is: for the augmented error system (4), design a control input u(k) that minimizes the tracking error and the input consumption, i.e., the following performance index
J(y(k)) = Σ_{l=k}^{∞} γ^(l-k) R(y(l), u(l)) (5)
Where γ ∈ (0,1) is the discount factor, W(e) and E(u) are positive definite functions, and R(y(l), u(l)) = W(e(l)) + E(u(l)) is the utility function obtained at time l.
Further, step 2 specifically comprises the following:
Step 201, construct the historical-data experience pool in the following form
Where y denotes the augmented error system state, y' denotes the augmented error system state at the next time step, a denotes the control input taken in state y, and N denotes the experience pool size.
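The experience pool above can be sketched as a fixed-capacity replay buffer; the class and method names below are illustrative assumptions, not the patent's notation:

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-size pool of (y, a, R, y') transitions for experience-guided Q learning."""

    def __init__(self, capacity):
        # once full, the oldest transition is discarded automatically
        self.buffer = deque(maxlen=capacity)

    def store(self, y, a, r, y_next):
        self.buffer.append((y, a, r, y_next))

    def sample(self, batch_size):
        # uniform batch sampling, as in standard experience replay
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

In the embodiment described later, the pool size is N = 20000 and batches of M = 400 transitions are drawn for each least-squares update.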
Step 202, for the control strategy u(y), the value function V^u(y(k)) in state y(k) is defined as
Step 203, the corresponding state-action value function Q^u(y(k), a) is:
Step 204, evaluate the state-action value function Q^u(y(k), a) with an off-policy iterative Q learning algorithm, specifically:
(1) Based on the sampled data (y, a, R, y'), iteratively updating the Q value:
Where i is the number of iterations.
(2) Update the policy based on the gradient descent method:
Where ζ is the policy update step size.
(3) Let i=i+1 until the Q value converges.
Step 205, the optimal Q function satisfies the following HJB equation
Q*(y(k),a)=R(y(k),a)+γQ*(y(k+1),u*(k+1)) (11)
The goal of Q learning is to find the optimal strategy that minimizes the Q function, i.e., u*(y(k)) = arg min_a Q*(y(k), a).
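The Bellman relation and greedy policy extraction above can be checked numerically on a toy deterministic MDP (invented purely for illustration); value iteration converges to the discounted fixed point implied by the discount factor γ of the performance index:

```python
import numpy as np

# Toy check of Q*(s,a) = R(s,a) + gamma * min_a' Q*(s',a'): cost-minimizing
# Q iteration on an invented 2-state, 2-action deterministic MDP.
gamma = 0.95
R = np.array([[1.0, 2.0], [0.5, 3.0]])   # cost R(s, a), shape (states, actions)
nxt = np.array([[0, 1], [1, 0]])         # deterministic successor s' = nxt[s, a]

Q = np.zeros((2, 2))
for _ in range(500):                     # fixed-point (value) iteration
    Q = R + gamma * np.min(Q[nxt], axis=2)   # Q[nxt] gathers Q(s', .) per (s, a)

policy = np.argmin(Q, axis=1)            # greedy, cost-minimizing policy
```

Because the Bellman operator is a γ-contraction, the iteration converges to the unique fixed point regardless of the initial Q.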
Step 206, a single evaluation network is designed to approximate the Q function, which can therefore be expressed as:
Where L is the number of neurons in the hidden layer of the evaluation network, φ(·) is the network activation vector function, ω is the corresponding network weight vector, and ε is the evaluation network approximation error.
An estimate of the ideal evaluation network weight vector is selected; the approximate Q function is then the evaluation network output, expressed as follows
Step 207, in combination with batch sampling information, adopting a least square method to update the evaluation network weight in an iterative manner, wherein the method specifically comprises the following steps:
(1) Select a batch of data of size M and calculate, for each sample, the following temporal-difference error:
where the residual term denotes the approximation residual of the evaluation neural network for the l-th sample.
(2) Update the network weight parameters by the least squares method
where the corresponding matrices are assembled from the batch sampling data.
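Step 207's batch least-squares weight update can be sketched for a critic that is linear in its features; all function and variable names below are assumptions, and the Bellman target uses the discount factor γ from the performance index:

```python
import numpy as np

def ls_critic_update(omega, batch, phi, policy, gamma):
    """One least-squares update of critic weights for Q(y, a) ~= omega^T phi(y, a).

    batch  : list of (y, a, r, y_next) transitions sampled from the experience pool
    phi    : feature (activation) map phi(y, a) -> length-L vector
    policy : current improved policy y -> a, used to evaluate the successor state
    """
    Phi = np.array([phi(y, a) for (y, a, r, y_next) in batch])
    # Bellman targets built from the *current* weights (off-policy evaluation)
    b = np.array([r + gamma * omega @ phi(y_next, policy(y_next))
                  for (y, a, r, y_next) in batch])
    # least-squares solution of Phi @ omega_new ~= b (normal equations)
    omega_new, *_ = np.linalg.lstsq(Phi, b, rcond=None)
    return omega_new
```

Iterating this update on the same batch until the weight change falls below a threshold corresponds to the inner convergence loop of Step 303.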
Further, step 3 specifically comprises the following:
Step 301, select the j-th batch of data as training data and optimize the evaluation network weights online according to formula (16), expressed as follows:
Step 302, updating the strategy according to formula (10) based on the gradient descent method
Step 303, let i = i + 1 and repeat steps 301 and 302 above until the evaluation network weights converge.
Step 304, apply the evaluation network weights trained on the j-th batch of data to the boiler-turbine system, sample data from the system, and update the evaluation weights in a soft-update manner, namely ω ← βω_j + (1 − β)ω
Where β is the soft update rate.
Step 305, the sampling-training loop-nested training framework based on experience-guided data reuse is: in each cycle, sample the experience pool data, optimize the evaluation network weights online, and apply the resulting policy to the boiler-turbine system to generate optimizing data that is stored back in the experience pool, thereby constructing a nested training framework with experience guidance, until the network weights converge.
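Steps 301-305 can be summarized as a loop-nested training skeleton; every callable below is an assumed interface standing in for the patent's components, not its actual API:

```python
import numpy as np

def nested_training(sample_batch, critic_step, run_policy, omega0, beta,
                    n_cycles, tol=1e-6, max_inner=1000):
    """Sampling-training cycle with experience-guided data reuse.

    sample_batch() -> batch drawn from the experience pool
    critic_step(omega, batch) -> one least-squares critic iteration on the batch
    run_policy(omega) -> applies the improved policy to the plant and stores the
                         newly generated optimizing data back into the pool
    """
    omega = np.asarray(omega0, dtype=float)
    for _ in range(n_cycles):                       # outer loop: one cycle per batch
        batch = sample_batch()
        omega_j = omega.copy()
        for _ in range(max_inner):                  # inner loop: iterate to convergence
            omega_next = critic_step(omega_j, batch)
            converged = np.linalg.norm(omega_next - omega_j) < tol
            omega_j = omega_next
            if converged:
                break
        run_policy(omega_j)                         # generate new experience data
        omega = beta * omega_j + (1.0 - beta) * omega   # soft update with rate beta
    return omega
```

The soft update damps weight changes between cycles, which is one way to keep the learning stable while the pool contents shift.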
Further, step 4 specifically comprises the following:
Step 401, after the evaluation network weights converge under the j-th batch of data, the optimal load-tracking control law is obtained as
Step 402, designing an adaptive control strategy based on a gradient descent method as follows
Step 403, apply the exploration action with Gaussian noise to the boiler-turbine system, observe the next state y', calculate the corresponding utility function R(y(k), a), and store the data (y, a, R, y') with its optimizing trend into the experience pool N, realizing the guided learning of the Q learning algorithm.
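Step 403's explore-and-store cycle can be sketched as follows; `policy`, `step_env`, `utility`, and `store` are assumed callables, and the noise scale `sigma` is an illustrative choice rather than a value from the patent:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def explore_and_store(y, policy, step_env, utility, store, sigma=0.05):
    """Apply the adaptive control action perturbed by Gaussian exploration noise,
    then store the resulting optimizing transition in the experience pool."""
    u = np.asarray(policy(y), dtype=float)
    a = u + rng.normal(0.0, sigma, size=u.shape)   # exploratory action: u(y) + noise
    y_next = step_env(y, a)                        # observe the next augmented state
    r = utility(y, a)                              # utility R(y, a)
    store(y, a, r, y_next)                         # guides later Q-learning updates
    return y_next
```

The stored transitions carry the controller's current optimizing trend, which is what makes subsequent batches "experience-guided" rather than arbitrary.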
The technical scheme provided by the invention has the following beneficial effects:
1) To address the difficulty of modeling a complex nonlinear boiler-turbine system, a boiler-turbine load control method based on single-evaluation-network Q learning is designed. Compared with model-based control methods, this data-driven method has stronger self-learning and self-adaptive capabilities.
2) By constructing an experience pool for data reuse and designing a sampling-training loop-nested training framework with experience guidance, system data with an optimizing trend can be generated, which reduces the dependence on initial data, improves data utilization and data quality, and guarantees learning stability.
Drawings
FIG. 1 is a diagram of a boiler turbine system load control framework based on empirical guided Q learning;
FIG. 2 is a plot of the two-norm of the weight error of the evaluation neural network weight ω during the iterative process with optimized batch data;
FIG. 3 is a trajectory plot of the state x_1 and the error e_1;
FIG. 4 is a trajectory plot of the state x_2 and the error e_2;
FIG. 5 is a trajectory plot of the state x_3 and the error e_3;
Fig. 6 is a control input trace diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
The invention relates to a boiler-turbine system load control method based on experience-guided Q learning, which, as shown in FIG. 1, comprises the following steps:
Step 1, discretizing the boiler-turbine system with a fixed sampling period T_s to obtain a discrete boiler-turbine system, and converting the original load control problem of the discrete system into the regulation problem of an augmented error system built on the tracking error;
Step 101, discretize the boiler-turbine system with the fixed sampling period T_s to obtain a discrete boiler-turbine system represented as follows
x(k+1)=f(x(k),u(k)) (1)
Where f (·, ·) represents an unknown nonlinear function about the boiler-turbine dynamics, x (k) and u (k) are the system state vector and the control input vector, respectively, at sample time k.
Step 102, the desired load trajectory takes the form
r(k+1)=h(r(k)) (2)
Where r (k) is the desired load target at time k and h (r) is a Lipschitz continuous vector function.
The load tracking error is then e(k) = x(k) - r(k) (3)
Step 103, the load control problem of the discrete boiler-turbine system is: design an optimal control input u(k) for system (1) such that the state x(k) tracks the desired target load r(k) as quickly as possible while minimizing input consumption.
Step 104, the augmented error system built on the tracking error is as follows
Step 105, the regulation problem of the augmented error system is: for the augmented error system (4), design a control input u(k) that minimizes the tracking error and the input consumption, i.e., the following performance index
J(y(k)) = Σ_{l=k}^{∞} γ^(l-k) R(y(l), u(l)) (5)
Where γ ∈ (0,1) is the discount factor, W(e) and E(u) are positive definite functions, and R(y(l), u(l)) = W(e(l)) + E(u(l)) is the utility function obtained at time l.
Step 2, constructing an experience pool for the augmented error system based on historical sampling data of the boiler-turbine system, proposing an off-policy Q learning method that updates the state-action value function from batch-sampled information, designing a single evaluation network to approximate the Q function, and updating the evaluation network weights by least squares, wherein step 2 specifically comprises the following:
Step 201, construct the historical-data experience pool in the following form
Where y denotes the augmented error system state, y' denotes the augmented error system state at the next time step, a denotes the control input taken in state y, and N denotes the experience pool size.
Step 202, for the control strategy u(y), the value function V^u(y(k)) in state y(k) is defined as
Step 203, the corresponding state-action value function Q^u(y(k), a) is
Step 204, evaluate the state-action value function Q^u(y(k), a) with an off-policy iterative Q learning algorithm, specifically:
(1) Based on the sampled data (y, a, R, y'), iteratively updating the Q value:
Where i is the number of iterations.
(2) Update the policy based on the gradient descent method:
Where ζ is the policy update step size.
(3) Let i=i+1 until the Q value converges.
Step 205, the optimal Q function satisfies the following HJB equation
Q*(y(k),a)=R(y(k),a)+γQ*(y(k+1),u*(k+1)) (11)
The goal of Q learning is to find the optimal strategy that minimizes the Q function, i.e., u*(y(k)) = arg min_a Q*(y(k), a).
Step 206, a single evaluation network is designed to approximate the Q function, which is therefore expressed as:
Where L is the number of neurons in the hidden layer of the evaluation network, φ(·) is the network activation vector function, ω is the corresponding network weight vector, and ε is the evaluation network approximation error.
An estimate of the ideal evaluation network weight vector is selected; the approximate Q function is then the evaluation network output, expressed as follows
Step 207, in combination with batch sampling information, adopting a least square method to update the evaluation network weight in an iterative manner, wherein the method specifically comprises the following steps:
(1) Select a batch of data of size M and calculate, for each sample, the following temporal-difference error:
where the residual term denotes the approximation residual of the evaluation neural network for the l-th sample.
(2) Update the network weight parameters by the least squares method
where the corresponding matrices are assembled from the batch sampling data.
Step 3, constructing a sampling-training loop-nested training framework based on experience-guided data reuse and further optimizing the evaluation network weights online, wherein step 3 specifically comprises the following:
Step 301, select the j-th batch of data as training data and optimize the evaluation network weights online according to formula (16), expressed as follows:
Step 302, updating the strategy according to formula (10) based on the gradient descent method
Step 303, let i = i + 1 and repeat steps 301 and 302 above until the evaluation network weights converge.
Step 304, apply the evaluation network weights trained on the j-th batch of data to the boiler-turbine system, sample data from the system, and update the evaluation network weights in a soft-update manner, namely ω ← βω_j + (1 − β)ω
Where β is the soft update rate.
Step 305, the sampling-training loop-nested training framework based on experience-guided data reuse is: in each cycle, sample the experience pool data, optimize the evaluation network weights online, and apply the resulting policy to the boiler-turbine system to generate optimizing data that is stored back in the experience pool, thereby constructing a nested training framework with experience guidance, until the network weights converge.
Step 4, designing a Q learning adaptive controller with a policy gradient descent method, generating data with an optimizing trend, and storing the data in the experience pool to realize the guided learning of the Q learning algorithm, wherein step 4 specifically comprises the following:
Step 401, after the evaluation network weights converge under the j-th batch of data, the optimal load-tracking control law is obtained as
Step 402, designing an adaptive control strategy based on a gradient descent method as follows
Step 403, apply the exploration action with Gaussian noise to the boiler-turbine system, observe the next state y', calculate the corresponding utility function R(y(k), a), and store the data (y, a, R, y') with its optimizing trend into the experience pool N, realizing the guided learning of the Q learning algorithm.
The method is applicable to boiler-turbine systems under various operating conditions. For a better understanding of the present invention, the boiler-turbine system load control method based on experience-guided Q learning is described in detail below with reference to a specific embodiment.
Consider the following nonlinear boiler-turbine system:
Wherein the state variables x_1, x_2, x_3 are the drum pressure (kg/cm^2), the electric power (MW), and the steam-water density (kg/m^3), respectively, and the input variables u_1, u_2, u_3 regulate the fuel flow, steam flow, and feedwater flow, respectively. Discretizing with the sampling period T_s yields the following discrete system:
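The continuous-time model and its discretization are not reproduced in this text, so as a stand-in the widely used Bell-Åström 160 MW drum boiler-turbine benchmark is sketched below with a forward-Euler discretization; whether the patent's embodiment uses exactly this model is an assumption:

```python
import numpy as np

def bell_astrom(x, u):
    """Bell-Åström 160 MW drum boiler-turbine dynamics (a common benchmark;
    assumed here as a stand-in for the patent's unreproduced model)."""
    x1, x2, x3 = x
    u1, u2, u3 = u
    dx1 = -0.0018 * u2 * x1**1.125 + 0.9 * u1 - 0.15 * u3
    dx2 = (0.073 * u2 - 0.016) * x1**1.125 - 0.1 * x2
    dx3 = (141.0 * u3 - (1.1 * u2 - 0.19) * x1) / 85.0
    return np.array([dx1, dx2, dx3])

def euler_step(x, u, Ts):
    # forward-Euler discretization with fixed sampling period Ts
    return x + Ts * bell_astrom(x, u)
```

At the operating point x = (108, 66.65, 428) with inputs u = (0.34, 0.69, 0.433) the derivatives are approximately zero, i.e. the plant is near an equilibrium, which makes it a convenient starting state for sampling.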
For this system, given the load tracking target r(k) = [121, 90, 389.92]^T, the utility function is designed as R(y(k), u(k)) = e(k)^T (0.3 I_3) e(k) + u(k)^T I_3 u(k), with discount factor γ = 0.95.
An activation function vector with 54 neurons is designed for the evaluation network. The experience pool size is N = 20000, the batch size is M = 400, the step size is α = 0.07, and the soft update rate is β = 0.7.
The two-norm iteration termination threshold for the weight vector change is set to 10, and the evaluation neural network converges after 27 iterations; the iteration process is shown in Fig. 2. As shown in Figs. 3-5, the system is ultimately able to track the given target values: the trajectories of the states x_1, x_2, x_3, the corresponding tracking targets r_1, r_2, r_3, and the errors are shown in Figs. 3, 4, and 5, respectively. Fig. 6 shows the corresponding trajectory of the system input u(k).
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise forms disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (5)

1. A boiler-turbine system load control method based on experience-guided Q learning, characterized by comprising the following steps:
Step 1, discretizing the boiler-turbine system with a fixed sampling period T_s to obtain a discrete boiler-turbine system, and converting the original load control problem of the discrete system into the regulation problem of an augmented error system built on the tracking error;
Step 2, constructing an experience pool for the augmented error system based on historical sampling data of the boiler-turbine system, proposing an off-policy Q learning method that updates the state-action value function from batch-sampled information, designing a single evaluation network to approximate the Q function, and updating the evaluation network weights by least squares;
Step 3, constructing a sampling-training loop-nested training framework based on experience-guided data reuse, and further optimizing the evaluation network weights online;
And step 4, designing a Q learning adaptive controller with a policy gradient descent method, generating data with an optimizing trend, and storing the data in the experience pool to realize the guided learning of the Q learning algorithm.
2. The boiler-turbine system load control method based on experience-guided Q learning according to claim 1, wherein step 1 specifically comprises the following:
Step 101, discretizing the boiler-steam turbine system with a fixed sampling period T s to obtain a discrete boiler steam turbine system represented as follows
x(k+1)=f(x(k),u(k)) (1)
Where f (·, ·) represents an unknown nonlinear function with respect to boiler-turbine dynamics, x (k) and u (k) are the system state vector and the control input vector, respectively, at sample time k;
Step 102, the load expected track form is as follows
r(k+1)=h(r(k)) (2)
Where r (k) is the desired load target at time k and h (r) is a Lipschitz continuous vector function;
The load tracking error is then e(k) = x(k) - r(k) (3);
Step 103, the load control problem of the discrete boiler-turbine system is: design an optimal control input u(k) for system (1) such that the state x(k) tracks the desired target load r(k) as quickly as possible while minimizing input consumption;
Step 104, the augmented error system built on the tracking error is as follows
Step 105, the regulation problem of the augmented error system is: for the augmented error system (4), design a control input u(k) that minimizes the tracking error and the input consumption, i.e., the following performance index
J(y(k)) = Σ_{l=k}^{∞} γ^(l-k) R(y(l), u(l)) (5)
Where γ ∈ (0,1) is the discount factor, W(e) and E(u) are positive definite functions, and R(y(l), u(l)) = W(e(l)) + E(u(l)) is the utility function obtained at time l.
3. The boiler-turbine system load control method based on experience-guided Q learning according to claim 2, wherein step 2 specifically comprises the following:
Step 201, construct the historical-data experience pool in the following form
wherein y denotes the augmented error system state, y' denotes the augmented error system state at the next moment, a denotes the control input taken in state y, and N denotes the experience pool size;
Step 202, for the control strategy u(y), the value function V^u(y(k)) in state y(k) is defined as
Step 203, the corresponding state-action value function Q^u(y(k), a) is:
Step 204, evaluate the state-action value function Q^u(y(k), a) with an off-policy iterative Q learning algorithm, specifically:
(1) Based on the sampled data (y, a, R, y'), iteratively updating the Q value:
wherein i is the number of iterations;
(2) Update the policy based on the gradient descent method:
Where ζ is the policy update step size;
(3) Let i=i+1 until the Q value converges;
Step 205, the optimal Q function satisfies the following HJB equation
Q*(y(k),a)=R(y(k),a)+γQ*(y(k+1),u*(k+1)) (11)
The goal of Q learning is to find the optimal strategy that minimizes the Q function, i.e., u*(y(k)) = arg min_a Q*(y(k), a).
Step 206, a single evaluation network is designed to approximate the Q function, which is therefore expressed as:
Where L is the number of neurons in the hidden layer of the evaluation network, φ(·) is the network activation vector function, ω is the corresponding network weight vector, and ε is the evaluation network approximation error;
An estimate of the ideal evaluation network weight vector is selected; the approximate Q function is then the evaluation network output, expressed as follows
Step 207, in combination with batch sampling information, adopting a least square method to update the evaluation network weight in an iterative manner, wherein the method specifically comprises the following steps:
(1) Select a batch of data of size M; the temporal-difference error for each sample is calculated as follows:
where the residual term denotes the approximation residual of the evaluation neural network for the l-th sample;
(2) Update the network weight parameters by the least squares method
where the corresponding matrices are assembled from the batch sampling data.
4. The boiler-turbine system load control method based on experience-guided Q learning according to claim 3, wherein step 3 specifically comprises the following steps:
Step 301, select the j-th batch of data as training data and optimize the evaluation network weights online according to formula (16), expressed as follows:
Step 302, updating the strategy according to formula (10) based on the gradient descent method
Step 303, let i = i + 1 and repeat steps 301 and 302 above until the evaluation network weights converge;
Step 304, apply the evaluation network weights trained on the j-th batch of data to the boiler-turbine system, sample data from the system, and update the evaluation network weights in a soft-update manner, namely ω ← βω_j + (1 − β)ω
wherein β is the soft update rate;
Step 305, the sampling-training loop-nested training framework based on experience-guided data reuse is: in each cycle, sample the experience pool data, optimize the evaluation network weights online, and apply the resulting policy to the boiler-turbine system to generate optimizing data that is stored back in the experience pool, thereby constructing a nested training framework with experience guidance, until the network weights converge.
5. The boiler-turbine system load control method based on experience-guided Q learning according to claim 4, wherein step 4 specifically comprises the following steps:
Step 401, after the evaluation network weights converge under the j-th batch of data, the optimal load-tracking control law is obtained as
Step 402, designing an adaptive control strategy based on a gradient descent method as follows
Step 403, apply the exploration action with Gaussian noise to the boiler-turbine system, observe the next state y', calculate the corresponding utility function R(y(k), a), and store the data (y, a, R, y') with its optimizing trend into the experience pool N, realizing the guided learning of the Q learning algorithm.
CN202410428471.6A 2024-04-10 2024-04-10 Boiler turbine system load control method based on experience-oriented Q learning Pending CN118192249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410428471.6A CN118192249A (en) 2024-04-10 2024-04-10 Boiler turbine system load control method based on experience-oriented Q learning

Publications (1)

Publication Number Publication Date
CN118192249A, published 2024-06-14

Family

ID=91405011



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination