WO2007116591A1 - Plant controller - Google Patents

Plant controller Download PDF

Info

Publication number
WO2007116591A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
improvement
probability
plant
operation command
Prior art date
Application number
PCT/JP2007/050683
Other languages
French (fr)
Japanese (ja)
Inventor
Akihiro Yamada
Takaaki Sekiai
Yoshiharu Hayashi
Naohiro Kusumi
Masayuki Fukai
Satoru Shimizu
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Publication of WO2007116591A1 publication Critical patent/WO2007116591A1/en

Links

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, the criterion being a learning criterion

Definitions

  • the present invention relates to a plant control device.
  • control logic based on PID control has been mainstream in the field of plant control.
  • Many techniques using supervised learning functions, such as neural networks, have also been proposed so that control can respond flexibly to plant characteristics.
  • Reinforcement learning is a learning control framework that generates operation signals to an environment, such as the controlled object, through trial-and-error interaction so that the measurement signals obtained from the environment approach a desired state. Even when successful cases cannot be prepared in advance, it has the advantage that the desired behavior can be learned autonomously simply by defining what a desirable state is.
  • In reinforcement learning, a scalar evaluation value calculated from the measurement signals obtained from the environment (called the reward) is used as a guide, and the learning function generates operation signals to the environment so that the expected value of the reward obtained from the current state into the future is maximized.
  • Methods for implementing such a learning function include algorithms such as Actor-Critic, Q-learning, and real-time dynamic programming described in Non-Patent Document 1.
  • The Dyna architecture has also been introduced in the same document as a reinforcement learning framework that extends these methods. In it, which operation signals should be generated is learned in advance against a model simulating the controlled object, and the learning result is used to determine the operation signal applied to the controlled object. It also has a model adjustment function that reduces the error between the controlled object and the model.
  • As a technique applying reinforcement learning, the technique described in Patent Document 1 can be cited. Multiple reinforcement learning modules, each a pair of a model and a system with a learning function, are prepared; for each module a responsibility signal is computed that takes a larger value the smaller the prediction error between its model and the controlled object, and the operation signal applied to the controlled object is determined by weighting the operation signals generated by the modules in proportion to their responsibility signals.
  • Patent Document 2 describes a method of adjusting a process simulation model using reinforcement learning, as well as methods of configuring an operation training device and an operation diagnosis device using that model. It describes expressing the fluctuation of a phenomenon with a probability density function such as a normal distribution, and explains that reflecting this fluctuation in simulation conditions and model parameters allows more realistic simulation.
  • Patent Document 1: Japanese Patent Application Laid-Open No. 2000-35956
  • Patent Document 2: Japanese Patent Application Laid-Open No. 2004-178492
  • Non-Patent Document 1: Reinforcement Learning, Sadayoshi Mikami and Masaaki Minagawa (co-translators), Morikita Publishing Co., Ltd., December 20, 2000
  • Non-Patent Document 1 and Patent Document 1 do not describe a method for handling variation in plant data.
  • Patent Document 2 describes expressing the variation with a probability density function or the like, but it does not describe countermeasures for the above problem during control operation or during reinforcement learning.
  • The purpose of the present invention is to provide a control device that reduces the risk that the state fails to improve or worsens during control operation, improves control performance, and enables consistently stable operation.
  • To that end, the present invention provides basic control command calculation means for receiving plant measurement data and calculating operation command values for the plant; an operation result database for accumulating operation data comprising the measurement data and the operation command values; state search means for searching for and extracting similar states based on the current operation data and past operation data; improvement probability calculation means for calculating a frequency distribution or probability distribution from the record of changes in the operating state caused by control operations in the past operation data extracted as similar states, and for calculating the improvement probability or non-improvement probability of the control operation; and operation command determination means for determining the next operation command value based on the calculated improvement probability or non-improvement probability. Together these constitute the plant control apparatus.
  • Because the present invention comprises the above means, it can reduce the risk that a control operation fails to improve the plant state as a result of the variation present in actual plant operation data. That is, the risk that a control operation increases the deviation from the control target value (the control deviation) relative to the current state is reduced, so stable operation is always possible.
  • FIG. 1 shows a first embodiment.
  • The control device 200 receives the measured values 205 of process values from the plant 100 to be controlled, performs calculations programmed in advance in the control device 200 using these values, and sends an operation command signal (control signal) 285 to the plant 100.
  • In accordance with the received operation command signal 285, the plant 100 controls its state by operating actuators such as valve openings and damper openings.
  • This embodiment is an example applied to combustion control of a thermal power plant.
  • an example applied to a control function aimed at reducing the NOx and CO concentrations in exhaust gas will be explained.
  • FIG. 10 shows a configuration of a thermal power plant that is a control target.
  • the coal used as fuel, the primary air for transporting the coal, and the secondary air for adjusting the combustion are introduced into the boiler 101 through the burner 102, and the boiler 101 burns the coal.
  • Coal and primary air are routed from line 134 and secondary air is routed from line 141.
  • after-air for two-stage combustion is introduced into the boiler 101 via the after-air port 103. This after air is led from the pipe 142.
  • the feed water circulating in the boiler 101 is guided to the boiler 101 via the feed water pump 105, and is superheated by the gas in the heat exchanger 106 to become high-temperature and high-pressure steam.
  • the number of heat exchangers is one in this embodiment, but a plurality of heat exchangers may be arranged.
  • the high-temperature and high-pressure steam that has passed through the heat exchanger 106 is guided to the steam turbine 108 via the turbine governor 107.
  • the steam turbine 108 is driven by the energy of the steam, and the generator 109 generates power.
  • the primary air is led from the fan 120 into pipe 130, branches into pipe 132, which passes through the air heater, and pipe 131, which does not, merges again in pipe 133, and is led to the mill 110.
  • the air passing through the air heater is superheated by the gas. Using this primary air, coal (pulverized coal) produced in the mill 110 is conveyed to the burner 102.
  • the control device 200 has a function of adjusting the air amount input from the burner and the air amount input from the after air port in order to reduce NOx and CO concentrations.
  • the control device 200 comprises basic control command calculation means 230; correction means 250 for changing or correcting the basic operation command value 235 output from the basic control command calculation means 230; an operation result database 240 that accumulates and stores operation result data consisting of the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like; an input/output interface 220 for exchanging data with the plant 100, the operators, and so on; and input/output means 221 with which the operator can view various data and enter set values, operation modes, manual operation commands, and the like.
  • the basic control command calculation means 230 has PID (proportional-integral-derivative) controllers as its basic components; it takes the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like as inputs, and calculates and outputs the basic operation command values 235 for the various operating devices such as valves, dampers, and motors installed in the plant 100.
  • a feature of this embodiment is that a correction means 250 for changing or correcting the basic operation command value 235 is provided.
  • the correction means 250 will be described.
  • the correction means 250 is composed of a state search means 260, an improvement probability calculation means 270, and an operation command determination means 280.
  • Based on past operation result data, the correction means 250 investigates whether the basic operation command value 235 would improve the plant state in the desired direction, and it has a function to switch, according to that probability, between outputting the current value of the basic operation command value 235 and maintaining the previous value.
  • FIG. 4 shows the relationship between an operation parameter X and the process value (state quantity A) to be controlled. As described above, operation data vary, so plotting the operation results yields data with the distribution shown in FIG. 4.
  • For example, when the current state quantity is b and the basic operation command value 235 is the value indicated by the next operation point, the operation results corresponding to that next operation point have a distribution. If that distribution is the frequency or probability distribution shown on the right side of FIG. 4, the expected value is smaller than the current state quantity b, and the quantity has in fact dropped as low as the minimum value c. However, there is also a probability that it becomes larger than the current state b, and the record shows it has increased up to the maximum value a.
  • The judgment differs depending on whether a larger or a smaller value of the state quantity A is desirable. Assuming, for example, that a smaller value is desirable, the expected value is smaller than b, so it can be judged that the next operation should be performed. However, since there is also a possibility that the state worsens up to a, it is necessary to decide whether or not to carry out the next operation.
  • The correction means 250 therefore automatically analyzes the past operation result data and, based on the result, determines whether to output the current calculated value of the basic operation command value 235 as it is. In particular, this makes it possible to suppress situations in which, as described above, a deterioration of the state seriously affects the product or the environment.
  • The condition parameters are the process values used as indices for identifying the plant state, the deviation tolerance used to define similarity, and the allowable values that serve as criteria for deciding whether to permit output of the current calculated value of the basic operation command value 235 for conditions such as the improvement probability.
  • In this example, the power generation output value and the air damper openings that adjust the air amounts (the manipulated variables of the combustion control) are used as indices for specifying the plant state.
  • this embodiment does not limit what is used for this index.
  • conditions such as the fuel flow rate and the feed water flow rate may be added, or another index may be used.
  • In step 510, the similarity between the past operation data stored in the operation result database 240 and the current state is calculated.
  • the similarity is defined by the Euclidean distance.
  • When the coordinates of two data points P and Q are given by (Xp1, Xp2, Xp3, ..., Xpn) and (Xq1, Xq2, Xq3, ..., Xqn), the squared distance d^2 between the two points is obtained by equation (1), the sum over i of (Xpi - Xqi)^2.
  • the coordinates Xpi and Xqi are the process values defined as indices for identifying the plant state.
  • In step 520, the allowable deviation distance d_max for defining similarity, which was read in step 500, is compared with each distance d calculated in step 510, and only the operation data sets satisfying the condition of equation (2), namely that d does not exceed d_max, are extracted.
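  • As an illustration of steps 500 to 520, the following is a minimal sketch of how the state search means 260 could extract similar past records using the squared Euclidean distance of equation (1) and the threshold of equation (2); it is not taken from the patent, and the record format and names such as find_similar_records and d_max are assumptions.

```python
# Sketch of the state search (steps 500-520). The record format, the names,
# and the use of plain Python lists are illustrative assumptions.

def squared_distance(p, q):
    """Squared Euclidean distance of equation (1) between two state vectors."""
    return sum((xp - xq) ** 2 for xp, xq in zip(p, q))

def find_similar_records(current_state, past_records, d_max):
    """Return the past records whose state indices lie within d_max of the
    current state (the condition of equation (2)).

    current_state : list of process values used as state indices
                    (e.g. generator output and air damper openings)
    past_records  : list of dicts with keys "state" and "nox"
    d_max         : allowable deviation distance read in step 500
    """
    d_max_sq = d_max ** 2
    return [r for r in past_records
            if squared_distance(current_state, r["state"]) <= d_max_sq]
```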
  • In step 530, the frequency distribution of the measured NOx or CO concentrations at the air damper opening given by the basic operation command value 235 is calculated from the data set extracted in step 520, as in the graph on the right side of FIG. 4. The air damper opening, which is the manipulated variable, is binned with a predetermined division width. Since the NOx and CO values fluctuate over time, they are averaged over a predetermined time interval and likewise counted with a predetermined division width.
  • In step 540, the frequency with which the NOx or CO concentration was lower than the current value at the operating point given by the basic operation command value 235 is obtained from the counted frequency distribution, and this count divided by the total frequency is taken as the improvement probability.
  • Similarly, the ratio of the frequency with which the concentration was higher than the current value is defined as the non-improvement probability, in other words, the probability of deterioration.
  • In addition, the minimum NOx or CO concentration in the extracted data is found, and that value is taken as the maximum improvement actual value.
  • Likewise, the maximum NOx or CO concentration is found and taken as the maximum non-improvement actual value, in other words, the worst recorded result.
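  • The following sketch illustrates steps 530 and 540 under the assumption that each extracted record carries a time-averaged NOx (or CO) concentration for the damper opening in question; the dictionary keys and the simple counting scheme are illustrative only.

```python
# Sketch of steps 530-540: improvement probability, non-improvement
# probability, and the best and worst recorded values. Record format assumed.

def improvement_statistics(similar_records, current_value, key="nox"):
    """Compute improvement statistics from the extracted similar records.

    similar_records : records returned by a search such as find_similar_records()
    current_value   : current (time-averaged) NOx or CO concentration
    """
    values = [r[key] for r in similar_records]
    if not values:
        return None  # no similar past operation to judge from
    total = len(values)
    improved = sum(1 for v in values if v < current_value)
    worsened = sum(1 for v in values if v > current_value)
    return {
        "improvement_probability": improved / total,
        "non_improvement_probability": worsened / total,
        "max_improvement_value": min(values),      # best (lowest) concentration seen
        "max_non_improvement_value": max(values),  # worst (highest) concentration seen
    }
```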
  • In step 550, the criteria read in step 500 for deciding whether to permit output of the current calculated value of the basic operation command value 235 are compared with the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540, and whether the current calculated value of the basic operation command value 235 may be output is determined according to a predetermined determination condition.
  • Judgment conditions include improvement probabilities or non-improvement probabilities, maximum improvement actual values, and maximum non-improvement actual values.
  • the determination condition may be an AND condition or an OR condition combining these, and other settings may also be used.
  • the determination conditions may also use the variance, the average value, or the expected value of the distribution, or the probability of occurrence exceeding a predetermined ratio with respect to the maximum improvement value or the maximum non-improvement value.
  • Steps 500, 510, and 520 are executed by the state search means 260, and data 265 that meets similar conditions is extracted from the data 245 stored in the operation result database 240.
  • Steps 530 and 540 are executed by the improvement probability calculation means 270 and become information 275 of improvement probability or non-improvement probability, maximum improvement actual value, and maximum non-improvement actual value.
  • Step 550 is executed by the operation command determination means 280.
  • When output of the current calculated value of the basic operation command value 235 is permitted based on the result of step 550, the operation command determination means 280 outputs that current calculated value as the operation command value 285 as it is.
  • When output is not permitted, the previous value of the basic operation command value 235, that is, the operation command value currently in effect, is output instead.
  • In expectation the NOx concentration would decrease, but there is still a significant probability that it would increase; from the viewpoint of reducing NOx emissions as much as possible, whether to carry out the next operation can thus be chosen according to whether that risk is judged to be too high.
  • Removing NOx with a denitration device requires ammonia for the denitration reaction. If the amount of NOx generated can be reduced by the control device of this embodiment, ammonia consumption is reduced and an economic effect of lower operating cost can be expected. Reducing the amount of NOx generated can also be expected to extend the life of the denitration catalyst and to allow the denitration device to be made smaller.
  • In this example the previous value of the basic operation command value 235 is output when output is not permitted, but there is also a method in which a correction is applied to the current calculated value of the basic operation command value 235.
  • For example, a correction method is possible in which a coefficient between 0 and 1 is determined from the deviations between the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value used in step 550 and their corresponding criterion values.
  • It is also possible to calculate the improvement probability or non-improvement probability for the case where the previous value of the basic operation command value 235 is output, and to select whichever of the previous value and the current value has the higher probability of producing a good result.
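  • A minimal sketch of the permit-or-hold decision of step 550 follows; the threshold names, their default values, and the simple AND of two checks are assumptions, since the patent allows AND/OR combinations and other conditions to be configured.

```python
# Sketch of step 550: permit or reject output of the current calculated value
# of the basic operation command value 235. The criteria and their combination
# (an AND of two checks) are illustrative assumptions.

def decide_next_command(stats, basic_cmd_now, cmd_previous,
                        min_improvement_prob=0.7, worst_allowed=None):
    """Return the next operation command value 285.

    stats          : dict such as the one from improvement_statistics(), or None
    basic_cmd_now  : current calculated value of the basic operation command 235
    cmd_previous   : previous (currently effective) operation command value
    """
    if stats is None:
        return cmd_previous  # no evidence from similar states: hold the present command
    permitted = stats["improvement_probability"] >= min_improvement_prob
    if worst_allowed is not None:
        permitted = permitted and stats["max_non_improvement_value"] <= worst_allowed
    return basic_cmd_now if permitted else cmd_previous
```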
  • FIG. 7 shows an example of a screen for setting a determination condition for determining whether or not output of the current operation value of the basic operation command value 235 is permitted.
  • the screen shown in FIG. 7 is displayed on the display monitor 223, and the operator inputs settings from the keyboard 222 with a mouse.
  • the input / output means 221 is composed of a screen display monitor 223 and a keyboard 222 with a mouse as input means.
  • As other input/output means for the operator, there are voice input/output devices, touch pens, and so on, and these devices may also be used.
  • the setting conditions are not limited to these, and other conditions may be added.
  • When a choice has been made, the “setting end” button 304 at the bottom of the screen becomes selectable, and clicking it with the mouse pointer 301 ends the setting. If “Allow with condition” is selected, the “To condition parameter setting screen” button 303 becomes selectable, and clicking it with the mouse pointer 301 advances to the condition parameter setting screen.
  • the setting screen 300 has a “return” button 302, which can be clicked to return to the state before the setting. Since no condition is set at first, the “return” button 302 is disabled until a condition has been set.
  • an initial setting may be prepared in advance as a default and used as the initial state.
  • default settings can be added as conditions on the setting screen 300.
  • the screen shown in FIG. 8 is displayed.
  • the target process value can be selected from the pull-down menu 305 in the upper right of the screen.
  • the process value selected from the pull-down menu 305 can be divided into levels.
  • NOx is selected as the process value, and it can be divided into multiple levels by entering the upper and lower limits of the NOx value. Each level is managed as a condition number.
  • the "Next" button 306 is clicked to move to a screen for setting an allowable value for each level (condition No.).
  • FIG. 9 is an example of a screen for setting the permissible values for “determination based on improvement probability or non-improvement probability” and “determination based on expected improvement value”.
  • the allowable values to be set are the items checked on the setting screen 300, and the corresponding allowable value setting screen is automatically displayed.
  • the permissible value for "determined by improvement probability or non-improvement probability" can be changed by operating the mouse on the horizontal bar. Values can also be entered directly from the keyboard in the non-improvement probability or improvement probability columns. When a value is entered for either the non-improvement probability or the improvement probability, the other automatically displays the difference between 100% and the entered value.
  • the allowable value of “determined by the expected improvement value” can also be changed by operating the mouse on the horizontal bar.
  • the ratio corresponds to the level range of process values such as NOx values set on the screen in Fig. 8.
  • the upper and lower limits of NOx can be directly entered to set the allowable value for “determined by the expected improvement value”.
  • In this embodiment the process values to be controlled are the NOx and CO concentrations, but the invention is not limited to these.
  • For example, the amounts of CO, SOx, Hg (mercury), volatile organic compounds (VOCs), and fine particles consisting of fluorine compounds, dust, or mist in the exhaust gas may be used, as may the steam temperature, steam pressure, generator output, and efficiency.
  • A plurality of these can also be combined as “AND conditions” or “OR conditions”.
  • The difference from the first embodiment shown in FIG. 1 is that the control device 200 of this embodiment is provided with reinforcement learning means 290.
  • the reinforcement learning means 290 has a function of learning an appropriate operation method corresponding to the plant state by the reinforcement learning theory using the operation data stored in the operation result database 240.
  • FIG. 12 shows the concept of control based on reinforcement learning theory.
  • the control device 610 outputs an operation command 630 to the controlled object 600.
  • the controlled object 600 operates according to the operation command 630.
  • the state of the controlled object 600 is changed by the operation according to the operation command 630.
  • the control device 610 then receives from the controlled object 600 a reward 620, a quantity that indicates whether the changed state is desirable or undesirable and to what degree.
  • in practice, the information received is the state quantities of the controlled object, and the control device 610 generally calculates the reward from this information.
  • the reward is set to increase as it approaches the desired state, and the reward decreases as the state becomes undesirable.
  • the control device 610 operates by trial and error and learns the operation method that maximizes the reward (that is, that approaches the desired state as closely as possible); appropriate operation (control) logic is thus constructed automatically according to the state.
  • supervised learning, typified by neural networks, requires success cases to be provided as teacher data in advance, and is not suitable when the plant is new or the phenomenon is so complicated that success cases cannot be prepared beforehand.
  • reinforcement learning, in contrast, is classified as unsupervised learning and has the advantage that, because it can generate the desired operations by trial and error, it can be applied even when the characteristics of the controlled object are not well understood.
  • this reinforcement learning theory is used.
  • FIG. 13 shows the configuration of reinforcement learning means 290.
  • the reinforcement learning means 290 includes a modeling means 291 and a learning means 292.
  • Modeling means 291 is a neural network consisting of an input layer, an intermediate layer, and an output layer; it reads past operation data from the operation result database 240 and learns the input/output relationship using the error back-propagation method.
  • the configuration and learning method of the neural network are standard, and other methods may be used; since this embodiment does not depend on the particular network configuration or learning method, a detailed description is omitted here.
  • the input data are the air flow rate at each position of the burner and after-air port, the fuel flow rate for each burner, and the generator output, and the output data are NOx and CO concentrations.
  • the relationship between the fuel flow rate, air flow rate, power generation output and NOx and CO concentration is modeled, but the input items and output items are not limited to this.
  • the modeling method is not limited to a neural network; other statistical models such as regression models may also be used.
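  • As one possible realization of the modeling means 291, the sketch below fits a small feed-forward network with scikit-learn; the patent does not prescribe a library, a network size, or a data layout, so the choice of MLPRegressor, its hyperparameters, and the column arrangement are assumptions.

```python
# Sketch of the modeling means 291: learn the mapping from operating
# conditions (air flows per burner and after-air port, fuel flows, generator
# output) to the NOx and CO concentrations stored in the operation database.
# Library and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_plant_model(X, Y):
    """X: rows of operating-condition data; Y: matching [NOx, CO] rows."""
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    model.fit(X, Y)
    return model

# Example with dummy data (real data would come from the operation result
# database 240):
if __name__ == "__main__":
    X = np.random.rand(500, 8)
    Y = np.random.rand(500, 2)
    model = build_plant_model(X, Y)
    print(model.predict(X[:1]))  # predicted [NOx, CO] for one operating point
```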
  • the learning means 292 generates, for the model created by the modeling means 291, input data 293 consisting of the air flow rates at each position of the burners and after-air ports and the fuel flow rate for each burner. The input data 293 correspond to the operating conditions of the plant, and upper and lower limits, a change width (step size), and the maximum change width allowed in one operation are set for them. Each quantity in the input data 293 is determined at random within its allowable range.
  • Modeling means 291 feeds the input data 293 into the created model and outputs the output data 294.
  • the learning means 292 receives the output data 294 and calculates a reward value.
  • the reward is defined by equation (3), where R is the reward value, O_NOx is the NOx value, O_CO is the CO value, S_NOx and S_CO are the target set points for NOx and CO, and k1, k2, k3, and k4 are positive constants.
  • the learning means 292 learns the combination of input data 293, that is, of manipulated variables, that maximizes the reward calculated by equation (3). As a result, it becomes possible to learn combinations of manipulated variables that reduce NOx and CO for the current state.
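  • The following sketch illustrates how the learning means 292 could score randomly drawn manipulated variables against the model; the exact functional form of equation (3) is not reproduced in the text (only its variables are named), so the simple set-point-difference reward, the two weights, and the random search are assumptions.

```python
# Sketch of the learning means 292: draw candidate manipulated variables at
# random within their allowed ranges, score them with an equation-(3)-style
# reward, and keep the best. The reward's exact form and the constants are
# assumptions; only the qualitative intent (higher reward for lower NOx and CO
# relative to their set points) follows the text.
import random

def reward_eq3_like(nox, co, s_nox, s_co, k1=1.0, k2=1.0):
    # Larger when NOx and CO are below their target set points.
    return k1 * (s_nox - nox) + k2 * (s_co - co)

def search_best_operation(model, bounds, s_nox, s_co, n_trials=1000):
    """model  : an object with predict(), e.g. from build_plant_model()
       bounds : list of (low, high) pairs, one per manipulated variable"""
    best_x, best_r = None, float("-inf")
    for _ in range(n_trials):
        x = [random.uniform(lo, hi) for lo, hi in bounds]
        nox, co = model.predict([x])[0]
        r = reward_eq3_like(nox, co, s_nox, s_co)
        if r > best_r:
            best_x, best_r = x, r
    return best_x, best_r
```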
  • once learning is completed, the learning means 292 receives the measured values 205, that is, the operation data at the current time, and, based on the learning result, outputs the manipulated variable 295 that maximizes the reward of equation (3).
  • the state search means 260 is the same as that in the first embodiment described above.
  • the frequency distribution at the manipulated variable 295, which is the output value of the learning means 292, is also calculated, so that one more frequency distribution is obtained.
  • In step 540, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value are calculated from the counted frequency distributions for both the current calculated value of the basic operation command value 235 and the manipulated variable 295.
  • the function of the operation command determination means 280 is basically the same as that of the first embodiment, but differs from the first embodiment in the following points.
  • In step 550 shown in FIG. 3, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value for the current calculated value of the basic operation command value 235 and for the manipulated variable 295 are compared with their respective criterion values.
  • Whether or not both signals can be permitted as the next operation command is determined based on a preset condition as in the first embodiment.
  • If neither is permitted, the previous value of the basic operation command value 235 is output as the operation command value 285. If only one of them is permitted, the permitted signal is output as the operation command value 285. If both are permitted, a condition for deciding which one to select is set in advance, and one is selected based on it.
  • As the selection method in this example, the one having the larger expected improvement value is selected.
  • Other methods such as the one with the largest maximum improvement actual value or the one with the smallest maximum non-improvement actual value may be considered, and methods other than this example may be set.
  • Figure 14 shows the circuit with which the operation command determination means 280 selects the operation command value 285.
  • the subtractor 281 calculates the deviation signal 287 between the current value of the basic operation command value 235 and the manipulated variable 295, which is the reinforcement learning result, and the adder 284 adds it to the current value of the basic operation command value 235 to create the reinforcement learning operation command value 288.
  • if the output value of the reinforcement learning means 290 is abnormal, the coefficient multiplied onto the deviation signal 287 by the multiplier 283 is set to zero, so that the reinforcement learning operation command value 288 becomes equal to the current value of the basic operation command value 235 and the risk of erroneously outputting an abnormal signal is reduced.
  • Whether that output is abnormal is determined by upper and lower limit checks on the input data and output data of the reinforcement learning means 290 and by upper and lower limit checks on their rates of change. If even one of the values deviates from the preset limits, the output of the switch 282 is set to 0 to prevent a possibly abnormal command from being output; in all other cases the switch 282 sets its output signal to 1.
  • switch 286 selects and outputs one of the reinforcement learning operation command value 288, the current value of the basic operation command value 235, and the previous value of the basic operation command value 235.
  • When an abnormality is detected, the reinforcement learning operation command value 288 is excluded from the selection candidates, so either the current or the previous value of the basic operation command value 235 is output and operational safety is ensured. Moreover, as described above, the output signal of the switch 282 is set to 0 in that case, so even if the reinforcement learning operation command value 288 were selected by the switch 286, an abnormal signal would not be output; safety is thus doubly ensured.
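  • The selection circuit of FIG. 14 can be summarized in software as below; the argument names and the shape of the validity test are paraphrased from the text rather than taken from the patent's circuit.

```python
# Sketch of the FIG. 14 selection logic: gate the reinforcement-learning
# correction with a validity coefficient (switch 282 driving multiplier 283)
# and fall back to the basic command when the learning output cannot be used.

def select_operation_command(selection, basic_now, basic_prev,
                             rl_output, rl_output_valid):
    """selection       : "rl", "basic_now", or "basic_prev", the decision of the
                         operation command determination means 280
       basic_now/prev  : current and previous basic operation command values 235
       rl_output       : manipulated variable 295 from the reinforcement learning
       rl_output_valid : result of the upper/lower-limit and rate-of-change checks
    """
    coeff = 1.0 if rl_output_valid else 0.0                   # switch 282 / multiplier 283
    rl_command = basic_now + coeff * (rl_output - basic_now)  # signals 287 and 288
    if selection == "rl" and rl_output_valid:
        return rl_command                                     # switch 286 selects signal 288
    if selection == "basic_now":
        return basic_now
    return basic_prev                                         # hold the present command
```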
  • Reinforcement learning means 290 uses information 275 to calculate a reward.
  • the formula for calculating the reward is shown in Equation (4).
  • In equation (4), R and the terms carried over are the same as in equation (3); P_NOx and P_CO are the improvement probabilities of NOx and CO; O_Pmax.NOx and O_Pmax.CO are the maximum improvement actual values of NOx and CO; O_Pmin.NOx and O_Pmin.CO are the maximum non-improvement actual values of NOx and CO; S1 and S2 are setting parameters for NOx and CO; and k1 through k8 are positive constants.
  • Since the reward is calculated using equation (4), the reward is larger when the improvement probability is high, the maximum improvement actual value is large, and the maximum non-improvement actual value is small. Operations satisfying these conditions are therefore favored automatically during the reinforcement learning stage, so operation methods with less risk of non-improvement can be learned.
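  • The full expression of equation (4) is not reproduced in the text, so the sketch below only illustrates its stated intent: the reward grows with the improvement probability and with the size of the best recorded improvement, and shrinks with the size of the worst recorded deterioration. The additive form and all weights are assumptions.

```python
# Illustrative reward in the spirit of equation (4); the additive form and the
# weights are assumptions, not the patent's actual formula.

def reward_eq4_like(base_reward, current_value, stats, w=(1.0, 1.0, 1.0)):
    """base_reward   : equation-(3)-style term computed from NOx/CO and set points
       current_value : current concentration of the quantity being judged
       stats         : dict such as the one from improvement_statistics()"""
    best_case_gain = current_value - stats["max_improvement_value"]       # largest recorded drop
    worst_case_loss = stats["max_non_improvement_value"] - current_value  # largest recorded rise
    return (base_reward
            + w[0] * stats["improvement_probability"]
            + w[1] * best_case_gain
            - w[2] * worst_case_loss)
```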
  • The choice of the reward calculation method shown in equation (3) or equation (4), and each setting parameter required for the reward definition, can be entered and set through the input/output means 221, in the same way as the parameters on setting screen 300.
  • in normal operation the operation command determination means 280 always outputs the manipulated variable 295 as the operation command value 285.
  • in other cases, the operation command determination means 280 outputs the current value of the basic operation command value 235 as the operation command value 285.
  • because the reward for reinforcement learning is calculated using information such as the improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, it is possible to automatically learn, based on past operation results, operation methods that reduce the risk of non-improvement and have a large improvement effect.
  • control operation can reduce the risk that the deviation from the control target value (control deviation) increases compared to the current state, so that stable operation is always possible.
  • the control characteristics can also be changed easily when the desired operating characteristics change partway through operation.
  • For example, stable operation may be the first priority even if the time to settle is long; on the other hand, reaching the target state quickly may be preferable even at the cost of some fluctuation. Similarly, when the control deviation is relatively large, the settings can be chosen to reduce the non-improvement risk so that the state does not worsen further; conversely, when the control deviation is relatively small, some non-improvement risk can be accepted in order to pursue a more desirable state.
  • In the embodiments above, a control device for controlling a thermal power plant has been described.
  • However, the control device described above can also be used for other kinds of plants.
  • FIG. 1 is a diagram illustrating a configuration of a control device according to a first embodiment.
  • FIG. 2 is a diagram illustrating a configuration of a control device according to a second embodiment.
  • FIG. 3 is a diagram for explaining a calculation procedure of a correction means.
  • FIG. 4 is a diagram for explaining an example of the relationship between plant data distribution and operating parameters.
  • FIG. 5 is a diagram for explaining an example of a frequency distribution of NOx concentration.
  • FIG. 6 is a diagram illustrating an example of a frequency distribution of NOx concentration.
  • FIG. 7 is a diagram illustrating an example of an operation command value output permission condition setting screen.
  • FIG. 8 is a diagram illustrating an example of an operation command value output permission condition setting screen.
  • FIG. 9 is a diagram for explaining an example of an operation command value output permission condition setting screen.
  • FIG. 10 is a diagram illustrating the configuration of a thermal power plant.
  • FIG. 11 is a diagram showing a trend display example of NOx operation results.
  • FIG. 12 is a diagram for explaining the concept of reinforcement learning.
  • FIG. 13 is a diagram illustrating the configuration of reinforcement learning means.
  • FIG. 14 is a diagram illustrating an operation command signal selection circuit.
  • FIG. 15 is a diagram illustrating a configuration of a control device according to a third embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

A plant controller enabling always stable operation with an improved control performance by reducing the risk that the state cannot be improved nor becomes worse during control operation. The plant controller is characterized by comprising basic control command computing means for inputting plant measurement data and computing an operation command variable sent to the plant, operation record database where operation data including the measurement data and the operation command variable are stored, status searching means for searching for and extracting similar statuses according to the current operation data and the past operation data, improvement probability computing means for computing a frequency distribution or a probability distribution from the variation record of the operation status by control operation according to the past operation data being the extracted similar statuses and computing the improvement probability or the nonimprovement probability by the control operation, and operation command determining means for determining the next operation command variable according to the computed improvement or nonimprovement probability.

Description

Plant control device
Technical field
[0001] The present invention relates to a plant control device.
Background art
[0002] Conventionally, control logic based on PID control has been mainstream in the field of plant control. In addition, many techniques using supervised learning functions, typified by neural networks, have been proposed so that control can respond flexibly to plant characteristics.
[0003] Because configuring a control device with a supervised learning function requires success cases to be prepared in advance as teacher data, unsupervised learning methods have also been proposed.
[0004] Reinforcement learning is one example of unsupervised learning.
[0005] Reinforcement learning is a learning control framework that generates operation signals to an environment, such as the controlled object, through trial-and-error interaction so that the measurement signals obtained from the environment approach a desired state. Even when successful cases cannot be prepared in advance, it has the advantage that the desired behavior can be learned autonomously simply by defining what a desirable state is.
[0006] In reinforcement learning, a scalar evaluation value calculated from the measurement signals obtained from the environment (called the reward) is used as a guide, and the learning function generates operation signals to the environment so that the expected value of the reward obtained from the current state into the future is maximized. Methods for implementing such a learning function include algorithms such as Actor-Critic, Q-learning, and real-time dynamic programming described in Non-Patent Document 1.
[0007] The Dyna architecture has also been introduced in the same document as a reinforcement learning framework that extends these methods. In it, which operation signals should be generated is learned in advance against a model simulating the controlled object, and the learning result is used to determine the operation signal applied to the controlled object. It also has a model adjustment function that reduces the error between the controlled object and the model.
[0008] As a technique applying reinforcement learning, the technique described in Patent Document 1 can be cited. Multiple reinforcement learning modules, each a pair of a model and a system with a learning function, are prepared; for each module a responsibility signal is computed that takes a larger value the smaller the prediction error between its model and the controlled object, and the operation signal applied to the controlled object is determined by weighting the operation signals generated by the modules in proportion to their responsibility signals.
[0009] In plant control, variation in actual plant data must also be considered. In general the plant state fluctuates, and even in an apparently settled state fine adjustments are constantly repeated by the control system. Because of plant dynamics, the current state is also affected by earlier states. Furthermore, since there are actuator and instrument errors, signal noise, and so on, the plant state (the process values) is usually not exactly the same even under identical operating conditions. That is, plant data contain variation.
[0010] Variation in plant data is addressed in Patent Document 2, which describes a method of adjusting a process simulation model using reinforcement learning and methods of configuring an operation training device and an operation diagnosis device using that model. It describes expressing the fluctuation of a phenomenon with a probability density function such as a normal distribution, and explains that reflecting this fluctuation in simulation conditions and model parameters allows more realistic simulation.
[0011] Patent Document 1: Japanese Patent Application Laid-Open No. 2000-35956
Patent Document 2: Japanese Patent Application Laid-Open No. 2004-178492
Non-Patent Document 1: Reinforcement Learning, Sadayoshi Mikami and Masaaki Minagawa (co-translators), Morikita Publishing Co., Ltd., December 20, 2000
Disclosure of the invention
Problems to be solved by the invention
[0012] As described above, plant data generally vary, so even when a certain index (for example, plant output) is the same, the process state values are not necessarily identical. Therefore, the same operation often does not produce the same result, so a control operation may not only fail to deliver the expected improvement but may conversely worsen the state.
[0013] In particular, when an operation method is learned by reinforcement learning or the like, the operating state may improve in some cases and not improve in others for the same operation in what an index regards as the same state, so the desired (improving) operation cannot necessarily be learned.
[0014] Non-Patent Document 1 and Patent Document 1 do not describe a method for handling variation in plant data. Patent Document 2 describes expressing the variation with a probability density function or the like, but it does not describe countermeasures for the above problem during control operation or during reinforcement learning.
[0015] In view of the above problems, the purpose of the present invention is to provide a control device that reduces the risk that the state fails to improve or worsens during control operation, improves control performance, and enables consistently stable operation.
Means for solving the problem
[0016] The present invention is a plant control apparatus comprising: basic control command calculation means for receiving plant measurement data and calculating operation command values for the plant; an operation result database for accumulating operation data comprising the measurement data and the operation command values; state search means for searching for and extracting similar states based on the current operation data and past operation data; improvement probability calculation means for calculating a frequency distribution or probability distribution from the record of changes in the operating state caused by control operations in the past operation data extracted as similar states, and for calculating the improvement probability or non-improvement probability of the control operation; and operation command determination means for determining the next operation command value based on the calculated improvement probability or non-improvement probability.
Effects of the invention
[0017] Because the present invention comprises the above means, it can reduce the risk that a control operation fails to improve the plant state as a result of the variation present in actual plant operation data. In other words, the risk that a control operation increases the deviation from the control target value (the control deviation) relative to the current state is reduced, so stable operation is always possible.
Best mode for carrying out the invention
[0018] Embodiments are described below with reference to the drawings.
[0019] FIG. 1 shows a first embodiment. The control device 200 receives the measured values 205 of process values from the plant 100 to be controlled, performs calculations programmed in advance in the control device 200 using these values, and sends an operation command signal (control signal) 285 to the plant 100. In accordance with the received operation command signal 285, the plant 100 controls its state by operating actuators such as valve openings and damper openings.
[0020] This embodiment is an example applied to combustion control of a thermal power plant. In particular, an example applied to a control function aimed at reducing the NOx and CO concentrations in the exhaust gas is described.
[0021] FIG. 10 shows the configuration of the thermal power plant to be controlled. Coal used as fuel, primary air for conveying the coal, and secondary air for adjusting combustion are introduced into the boiler 101 through the burners 102, and the coal is burned in the boiler 101. The coal and primary air are led from pipe 134, and the secondary air from pipe 141. After-air for two-stage combustion is introduced into the boiler 101 through the after-air ports 103; this after-air is led from pipe 142.
[0022] The high-temperature gas generated by the combustion of the coal flows along the path of the boiler 101 and then passes through the air heater 104. After harmful substances are removed by the exhaust gas treatment equipment, the gas is released to the atmosphere from the stack.
[0023] The feed water circulating through the boiler 101 is led to the boiler 101 via the feed water pump 105 and is heated by the gas in the heat exchanger 106 to become high-temperature, high-pressure steam. In this embodiment the number of heat exchangers is one, but a plurality of heat exchangers may be arranged.
[0024] The high-temperature, high-pressure steam that has passed through the heat exchanger 106 is led to the steam turbine 108 via the turbine governor 107. The steam turbine 108 is driven by the energy of the steam, and the generator 109 generates electricity.
[0025] Next, the paths of the primary air and secondary air supplied from the burners 102 and of the after-air supplied from the after-air ports 103 are described.
[0026] The primary air is led from the fan 120 into pipe 130, branches into pipe 132, which passes through the air heater, and pipe 131, which does not, merges again in pipe 133, and is led to the mill 110. The air that passes through the air heater is heated by the gas. This primary air is used to convey the coal (pulverized coal) produced in the mill 110 to the burners 102.
[0027] The secondary air and the after-air are led from the fan 121 into pipe 140, heated in the air heater 104, and then branched into pipe 141 for the secondary air and pipe 142 for the after-air, which lead to the burners 102 and the after-air ports 103, respectively.
[0028] The control device 200 has a function of adjusting the air amount supplied from the burners and the air amount supplied from the after-air ports in order to reduce the NOx and CO concentrations.
[0029] The control device 200 comprises basic control command calculation means 230; correction means 250 for changing or correcting the basic operation command value 235 output from the basic control command calculation means 230; an operation result database 240 that accumulates and stores operation result data consisting of the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like; an input/output interface 220 for exchanging data with the plant 100, the operators, and so on; and input/output means 221 with which the operator can view various data and enter set values, operation modes, manual operation commands, and the like.
[0030] The basic control command calculation means 230 has PID (proportional-integral-derivative) controllers as its basic components; it takes the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like as inputs, and calculates and outputs the basic operation command values 235 for the various operating devices such as valves, dampers, and motors installed in the plant 100.
[0031] A feature of this embodiment is that correction means 250 for changing or correcting the basic operation command value 235 is provided. The correction means 250 is described below.
[0032] The correction means 250 consists of state search means 260, improvement probability calculation means 270, and operation command determination means 280. Based on past operation result data, it investigates whether the basic operation command value 235 would improve the plant state in the desired direction, and it has a function to switch, according to that probability, between outputting the current value of the basic operation command value 235 and maintaining the previous value.
[0033] 一般にプラントデータにはばらつきがあるため、ある指標(例えばプラント出力)が同 じレベルであっても、プロセスの状態値が全く同じである場合は少ない。制御装置が 現在の状態を認識するために参照するプロセス値の種類は限られているため、それ らが仮にすベて同じ値であったとしても、他のプロセス値は異なっている場合がある。 また、ァクチユエータの動作や計測器の誤差もデータのばらつきの要因となっている [0034] 従って、制御装置が同じ状態と認識して、同じ操作をしても結果が同じにならない 場合が多い。すなわち、制御操作によって前回は状態が改善したが、今回は逆に状 態が悪ィ匕する場合も起こり得る。また、思ったほどの改善効果が得られない可能性も ある。 [0033] Generally, there is variation in plant data, so even if an index (for example, plant output) is at the same level, there are few cases where the process state values are exactly the same. Since the types of process values referenced by the control unit to recognize the current state are limited, other process values may be different even if they all have the same value. . In addition, the operation of the actuator and the error of the measuring instrument are the factors of the data dispersion. Therefore, there are many cases where the result is not the same even if the control device recognizes the same state and performs the same operation. In other words, the state improved by the control operation last time, but this time, the state may worsen. In addition, it may not be as effective as expected.
[0035] FIG. 4 shows the relationship between an operation parameter X and a process value to be controlled (state quantity A). As described above, operation data vary, so plotting the operation results yields data with a distribution such as that shown in FIG. 4.

[0036] For example, when the current state quantity is b and the basic operation command value 235 is the value indicated by the next operating point, the operation results corresponding to the next operating point have a distribution. If that distribution is the frequency or probability distribution shown on the right side of FIG. 4, the expected value is smaller than the current state quantity b, and there are past cases in which the value fell as low as the minimum value c. However, there is also a probability of the value becoming larger than the current state b, and there are past cases in which it rose as high as the maximum value a.

[0037] The judgment here depends on whether a larger or a smaller value of the state quantity A is desirable. Assuming, for example, that a smaller value is desirable, one could judge that the next operation should be carried out because the expected value is smaller than b. Conversely, however, the state could worsen as far as the maximum value a, so it is necessary to decide whether or not to carry out the next operation.

[0038] If a worsening of the state at the next operation would fatally affect the quality of the products manufactured by the plant or affect the external environment, the risk of deterioration must be eliminated as far as possible.

[0039] Therefore, the correction means 250 automatically analyzes the past operation result data and, based on the result, decides whether to output the currently calculated value of the basic operation command value 235 as it is. In particular, situations in which a deterioration of the state seriously affects the product or the environment, as described above, can be suppressed.
[0040] Next, a concrete algorithm of the correction means 250 is described with reference to FIG. 3. First, in step 500, condition parameters are read. The condition parameters are the process values used as indices for identifying the plant state, the allowable deviation used to define similarity, and the allowable values that serve as criteria for deciding whether output of the currently calculated basic operation command value 235 is permitted for conditions such as the improvement probability.

[0041] In this example, the power generation output value and the opening of the air damper that adjusts the air flow, the manipulated end of combustion control, are used as indices for identifying the plant state. However, this embodiment does not restrict what is used as these indices; conditions such as the fuel flow rate and the feed-water flow rate may be added, and other indices may be used.

[0042] These are entered by the operator through the input/output means 221. The entered data are stored in storage means (not shown) in the control device and are read when step 500 is executed.

[0043] In step 510, the similarity between the past operation data stored in the operation result database 240 and the current state is calculated.
[0044] The similarity is defined by the Euclidean distance. When the coordinates of two data points P and Q are given by (Xp1, Xp2, Xp3, ..., Xpn) and (Xq1, Xq2, Xq3, ..., Xqn), the square of the distance d between the two points is obtained from Equation (1). Here the coordinates Xpi and Xqi are the process values defined as the indices for identifying the plant state.

[0045] When several of these indices are used, the process values have different units, so it is better to normalize each process value, for example by its maximum value, before use.
[0046] [Equation 1]
d^2 = \sum_{i=1}^{n} (X_{pi} - X_{qi})^2    ... (1)
[0047] Using Equation (1), the distance d between the current operating state and each past operation result point is calculated.

[0048] In step 520, the allowable deviation distance d_max that defines similarity, read in step 500, is compared with each distance d calculated in step 510, and only the operation data sets that satisfy the condition of Equation (2) are extracted.
[0049] [Equation 2]
d \leq d_{max}    ... (2)
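As an illustrative sketch only (not taken from the publication), the similarity search of steps 500 to 520 could look as follows, assuming each operation record is a plain dictionary of process values, that the state indices are normalized by their maximum values before Equation (1) is evaluated, and that d_max is the operator-set allowable distance:

```python
import math

def extract_similar_records(current_state, history, state_keys, max_values, d_max):
    """Return the historical records whose normalized Euclidean distance to the
    current state satisfies d <= d_max (Equations (1) and (2))."""
    similar = []
    for record in history:
        d2 = 0.0
        for key in state_keys:
            # Normalize each process value so that differently scaled quantities
            # contribute comparably to the distance.
            xp = current_state[key] / max_values[key]
            xq = record[key] / max_values[key]
            d2 += (xp - xq) ** 2
        if math.sqrt(d2) <= d_max:
            similar.append(record)
    return similar
```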
[0050] In step 530, from the data set extracted in step 520, the frequency distribution of the past NOx or CO concentrations at the air damper opening corresponding to the basic operation command value 235 is calculated, as shown in the graph on the right side of FIG. 4. The air damper opening, which is the manipulated variable, is binned with a predetermined division width. Since the NOx and CO values fluctuate over time, average values over a predetermined time interval are used, and their occurrences are likewise counted with a predetermined division width.

[0051] In step 540, the number of cases in which the NOx or CO concentration fell below the current value in the operating state given by the basic operation command value 235 is obtained from the counted frequency distribution, and this count divided by the total count is taken as the improvement probability.

[0052] Conversely, the fraction of cases in which the concentration rose above the current value is taken as the non-improvement probability, that is, the deterioration probability.

[0053] In addition, the minimum NOx or CO concentration is searched for and taken as the maximum improvement actual value. Similarly, the maximum NOx or CO concentration is searched for and taken as the maximum non-improvement actual value, that is, the maximum deterioration actual value.
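A minimal sketch of the statistics of steps 530 and 540, assuming the extracted records carry the NOx (or CO) value measured after operating at the commanded damper opening; the dictionary keys and helper name are illustrative, not from the publication. Values below the current concentration count as "improvement" for an emission that is to be minimized.

```python
def improvement_statistics(records, value_key, current_value):
    """Summarize past outcomes of the candidate operation relative to the current value."""
    values = [r[value_key] for r in records]
    if not values:
        return None  # no similar operating experience available
    improved = sum(1 for v in values if v < current_value)
    worsened = sum(1 for v in values if v > current_value)
    total = len(values)
    return {
        "improvement_probability": improved / total,
        "non_improvement_probability": worsened / total,
        "max_improvement_value": min(values),      # best (lowest) concentration seen
        "max_non_improvement_value": max(values),  # worst (highest) concentration seen
    }
```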
[0054] In step 550, the criterion values for deciding whether output of the currently calculated basic operation command value 235 is permitted, read in step 500, are compared with the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540, and whether the currently calculated basic operation command value 235 may be output is determined on the basis of predetermined judgment conditions.

[0055] As the judgment condition, a criterion value is set for each of the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, and output of the currently calculated basic operation command value 235 is permitted when all of the criterion values are satisfied.

[0056] Besides such an AND condition, OR conditions and combinations of the two are also conceivable, and other settings may be used. Possible judgment criteria include the maximum improvement value, the maximum non-improvement value, the variance, the mean value, the expected value, and the probability of occurrence of at least a predetermined fraction of the maximum improvement value or the maximum non-improvement value.
[0057] Steps 500, 510, and 520 are executed by the state search means 260, which extracts, from the data 245 accumulated in the operation result database 240, the data 265 that meet the similarity conditions.

[0058] Steps 530 and 540 are executed by the improvement probability calculation means 270 and yield the information 275: the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value.

[0059] Step 550 is executed by the operation command determination means 280. When the currently calculated basic operation command value 235 is permitted on the basis of the result of step 550, the operation command determination means 280 outputs it as the operation command value 285 as it is. When it is not permitted, the previous value of the basic operation command value 235 (that is, the current operation command value) is output as the operation command value 285.
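The permission check of step 550 and the switching between the current and previous command values can be sketched as below; this is a simplified illustration that assumes the statistics dictionary of the previous sketch and operator-set limits, and it shows only one possible AND/OR combination:

```python
def decide_command(stats, limits, new_command, previous_command, use_and=True):
    """Permit or reject the newly calculated basic operation command value."""
    checks = [
        stats["improvement_probability"] >= limits["min_improvement_probability"],
        stats["non_improvement_probability"] <= limits["max_non_improvement_probability"],
        stats["max_non_improvement_value"] <= limits["worst_value_allowed"],
    ]
    permitted = all(checks) if use_and else any(checks)
    # Permitted: pass the newly computed command through.
    # Not permitted: hold the previous command value.
    return new_command if permitted else previous_command
```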
[0060] In the example of the NOx concentration frequency distribution shown in FIG. 5, the past results indicate that there is no possibility of the NOx concentration rising above the current value; in this case it is appropriate to use the currently calculated basic operation command value 235.

[0061] In the example of the NOx concentration frequency distribution shown in FIG. 6, the expected value indicates a decrease in the NOx concentration, but a considerable probability of an increase remains. From the viewpoint of reducing NOx emissions as far as possible, the next operation can be chosen not to be carried out when its risk is judged to be large.

[0062] In this way, measures such as not carrying out an operation whose non-improvement probability exceeds a predetermined value, or not carrying out an operation whose maximum non-improvement actual value is unacceptable, are taken automatically, so the risk of non-improvement caused by a control operation can be reduced.

[0063] This is particularly effective when non-improvement affects the external environment, for example product quality or the emission of harmful substances such as NOx and CO. Safe and stable operation becomes possible, and the approach also contributes to securing product quality, improving yield, and protecting the environment.

[0064] In a thermal power plant, NOx is removed by a denitration device, which requires ammonia for denitration. If the amount of NOx generated can be reduced by the control device of this embodiment, the ammonia consumption can be reduced, and an economic effect of lower operating cost can also be expected. Reducing the amount of NOx generated can furthermore be expected to allow a smaller denitration device and a longer denitration catalyst life.
[0065] In this example, the previous value of the basic operation command value 235 is output when output is not permitted, but the currently calculated basic operation command value 235 may instead be corrected before being output. One correction method is to multiply by a coefficient between 0 and 1 proportional to the deviation, in step 550, between each of the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540 and its corresponding criterion value; other correction methods may also be used.
[0066] Alternatively, the improvement probability or non-improvement probability for the case of outputting the previous value of the basic operation command value 235 may be calculated in the same way, and whichever of the previous value and the current value has the higher probability of a good result may be selected.
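A minimal sketch of the coefficient correction mentioned in paragraph [0065]; the linear scaling used here is an assumption, since the text only states that the factor lies between 0 and 1 and is proportional to the deviation from the criterion value:

```python
def corrected_command(previous_command, new_command, improvement_probability, criterion):
    """Scale the command change by a 0-1 factor that shrinks as the improvement
    probability falls short of its criterion (illustrative linear rule)."""
    if criterion <= 0:
        return new_command
    shortfall = max(0.0, criterion - improvement_probability)
    factor = max(0.0, 1.0 - shortfall / criterion)
    return previous_command + factor * (new_command - previous_command)
```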
[0067] FIG. 7 shows an example of a screen for setting the judgment conditions that decide whether output of the currently calculated basic operation command value 235 is permitted. The screen shown in FIG. 7 is displayed on the display monitor 223, and the operator enters the settings from the keyboard with mouse 222.

[0068] In FIG. 1, the input/output means 221 consists of the screen display monitor 223 and the keyboard with mouse 222 serving as the input means; other devices such as a voice input/output device or a touch pen can also be used as input/output means for the operator.

[0069] On the setting screen 300, either "always permit a new operation" or "permit conditionally" can be selected; one of the two is chosen by placing a check, with the mouse cursor 301, in the corresponding check box. When one is checked, the check on the other is cleared and no further input to it is possible.

[0070] When "permit conditionally" is selected, the condition statements "judge by improvement probability or non-improvement probability", "judge by expected improvement value", "judge by maximum improvement result", and "judge by maximum non-improvement result" are presented, and a condition is selected by checking the check box in front of the condition statement to be used. Whether the condition statements are combined as an "AND condition" or an "OR condition" can also be selected.

[0071] The setting conditions are not limited to these, and other conditions may be added.

[0072] When "always permit" is selected, the "finish setting" button 304 at the bottom of the screen becomes selectable, and clicking it with the mouse pointer 301 ends the setting. When "permit conditionally" is selected, the "to condition parameter setting screen" button 303 becomes selectable; clicking it with the mouse pointer 301 advances to the condition parameter setting screen.

[0073] The setting screen 300 also has a "back" button 302, and clicking it returns the screen to the state before the setting. Since no condition is set at first, the "back" button 302 is disabled until a condition has been set.

[0074] However, an initial setting may be prepared as a default and used as the initial state, and a default setting may also be added as a condition on the setting screen 300.

[0075] When "permit conditionally" is selected and the "to condition parameter setting screen" button 303 is clicked, the screen shown in FIG. 8 is displayed. On the screen of FIG. 8, the target process value can be selected from the pull-down menu 305 at the upper right of the screen, and the process value selected in the pull-down menu 305 can be divided into levels. In the example of FIG. 8, NOx is selected as the process value, and it can be divided into several levels by entering upper and lower limits of the NOx value. Each level is managed as a condition number.
[0076] When the level setting is completed, the "next" button 306 is clicked to move to a screen for setting an allowable value for each level (condition number).

[0077] An example of the allowable value setting screen is shown in FIG. 9. The condition number is selected from the pull-down menu 307 at the upper right of the screen.

[0078] FIG. 9 is an example of a screen for setting the allowable values for "judge by improvement probability or non-improvement probability" and "judge by expected improvement value". The allowable values to be set are the items checked on the setting screen 300, and the corresponding allowable value setting screens are displayed automatically.

[0079] The allowable value for "judge by improvement probability or non-improvement probability" can be changed as a proportion by mouse operation on the horizontal bar. A numerical value can also be entered directly from the keyboard in the non-improvement probability or improvement probability field. When a value is entered in one of the two fields, the other automatically displays the difference between 100% and the entered value.

[0080] The allowable value for "judge by expected improvement value" can likewise be changed as a proportion by mouse operation on the horizontal bar. Here the proportion corresponds to the level range of the process value, such as the NOx value, set on the screen of FIG. 8. On the screen of FIG. 9, the upper and lower NOx limits for this allowable value can also be entered directly.

[0081] Although not shown, allowable values can be set in the same way for "judge by maximum improvement result" and "judge by maximum non-improvement result".

[0082] The results of operating with the conditions set as described above can be confirmed on a screen such as the NOx operation result trend display example of FIG. 11. In FIG. 11, the power generation output value, the NOx generation value, and whether each operation decreased or increased the NOx (that is, improvement or non-improvement) are displayed as time-series graphs. The proportions of improvement cases and non-improvement cases are also displayed in percent.

[0083] Confirming the results with such graphs makes it easy to review the condition settings. Clicking the "condition setting confirmation" button 308 displays the contents set on the setting screen 300 and subsequent screens in a separate window. Clicking the "condition setting change" button 309 displays the setting screen 300 so that the settings can be changed.
[0084] In the example of this embodiment, the process values to be controlled are the NOx and CO concentrations, but the invention is not limited to these. The amounts of CO2, SOx, and Hg (mercury) in the gas, fluorine, particulates consisting of dust or mist, VOCs (volatile organic compounds), or the steam temperature, steam pressure, generator output, efficiency, and the like may also be targeted. Combinations of several of these may also be used as an "AND condition" or an "OR condition".
[0085] Next, the second embodiment is described with reference to FIG. 2. The difference from the first embodiment shown in FIG. 1 is that the correction means 250 of the control device 200 of this embodiment is provided with reinforcement learning means 290. The reinforcement learning means 290 has the function of learning, by reinforcement learning theory, an appropriate operation method corresponding to the plant state, using the operation data accumulated in the operation result database 240.

[0086] A detailed explanation of reinforcement learning theory is given, for example, in "Reinforcement Learning", translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing Co., Ltd., December 20, 2000; only the concept of reinforcement learning is described here.

[0087] FIG. 12 shows the concept of control based on reinforcement learning theory. The control device 610 outputs an operation command 630 to the controlled object 600, and the controlled object 600 operates according to the control command 630. The operation according to the control command 630 changes the state of the controlled object 600. The control device 610 receives from the controlled object 600 a reward 620, a quantity indicating whether the changed state is desirable or undesirable for the control device 610 and to what degree.

[0088] In practice, the information received from the controlled object consists of its state quantities, and the control device 610 generally calculates the reward from them. In general, the reward is set so that it becomes larger the closer the state is to the desired one and smaller the more undesirable the state becomes.

[0089] The control device 610 performs operations by trial and error and learns an operation method that maximizes the reward (that is, that approaches the desired state as closely as possible), whereby appropriate operation (control) logic corresponding to the state of the controlled object 600 is constructed automatically.
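The interaction of FIG. 12 can be summarized by the following skeleton; the environment and policy interfaces are assumptions for illustration, and any concrete update rule (Q-learning, Actor-Critic, and so on) could sit behind policy.update:

```python
def reinforcement_learning_loop(environment, policy, steps):
    """Trial-and-error interaction loop: act, observe the reward, improve the policy."""
    state = environment.observe()
    for _ in range(steps):
        command = policy.select(state)             # trial-and-error choice of operation
        next_state = environment.apply(command)    # controlled object reacts to the operation
        reward = environment.reward(next_state)    # scalar: larger for more desirable states
        policy.update(state, command, reward, next_state)
        state = next_state
    return policy
```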
[0090] Supervised learning theory, typified by neural networks, requires success cases to be provided in advance as teacher data and is unsuitable when the plant is new or the phenomena are so complicated that success cases cannot be prepared in advance.

[0091] Reinforcement learning theory, in contrast, is classified as unsupervised learning; because it has the ability to generate desirable operations by itself through trial and error, it has the advantage of being applicable even when the characteristics of the controlled object are not necessarily clear.

[0092] The second embodiment makes use of this reinforcement learning theory.

[0093] FIG. 13 shows the configuration of the reinforcement learning means 290, which consists of modeling means 291 and learning means 292.
[0094] Reinforcement learning learns by trial and error, but in plant control, operating directly on the actual plant by trial and error is difficult to realize because of the danger to operation and the damage to the products manufactured by the plant. Therefore, an operating characteristic model is created from the operation results of the plant, and learning is performed against this model.

[0095] The modeling means 291 reads past operation data from the operation result database 240 and learns the input/output relationship with a neural network consisting of an input layer, an intermediate layer, and an output layer, using the error back-propagation method. The configuration and learning method of the neural network are standard, and other methods may be used; since the invention does not depend on the network configuration or the learning method, a detailed description is omitted here.

[0096] The input data are the air flow rate at each burner and after-air port position, the fuel flow rate of each burner, and the generator output; the output data are the NOx and CO concentrations.

[0097] In this example the relationships between the fuel flow rate, air flow rate, and power generation output and the NOx and CO concentrations are modeled, but the input and output items are not limited to these. The modeling method is not limited to a neural network either; other statistical models such as regression models may be used.
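For illustration, the plant characteristic model of the modeling means could be fitted as below; scikit-learn's MLPRegressor stands in for the three-layer network trained by back-propagation described above, which is an assumption, and the arrays shown are placeholders for real operation records:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: one row per historical record -> [air flows per port..., fuel flows per burner..., generator output]
# Y: corresponding [NOx, CO] measurements
X = np.random.rand(200, 10)   # placeholder operation data
Y = np.random.rand(200, 2)    # placeholder NOx / CO values

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
model.fit(X, Y)

# Predict NOx and CO for a candidate set of operating conditions.
candidate = np.random.rand(1, 10)
predicted_nox, predicted_co = model.predict(candidate)[0]
```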
[0098] The learning means 292 supplies to the model created by the modeling means 291 the input data 293, which consist of the air flow rate at each burner and after-air port position and the fuel flow rate of each burner. The input data 293 correspond to the operating conditions of the plant; for each quantity, upper and lower limits, a change width (step width), and the maximum change width allowed in one operation are set. Each quantity of the input data 293 is determined randomly within its allowable range.

[0099] The modeling means 291 feeds the input data 293 into the completed model and calculates the NOx and CO concentrations, which become the output data 294.

[0100] The learning means 292 receives the output data 294 and calculates the reward value.

[0101] The reward is defined by Equation (3), where R is the reward value, O_NOx is the NOx value, O_CO is the CO value, S_NOx and S_CO are the target set values for NOx and CO, and k1, k2, k3, and k4 are positive constants.

[0102] [Equation 3]
R = R1 + R2 + R3 + R4    ... (3)
R1 = k1 (O_NOx ≤ S_NOx);  R1 = 0 (O_NOx > S_NOx)
R2 = k2 (O_CO ≤ S_CO);  R2 = 0 (O_CO > S_CO)
R3 = k3 (S_NOx − O_NOx) (O_NOx < S_NOx);  R3 = 0 (O_NOx ≥ S_NOx)
R4 = k4 (S_CO − O_CO) (O_CO < S_CO);  R4 = 0 (O_CO ≥ S_CO)

[0103] As shown in Equation (3), when the NOx and CO values fall below their target set values, the rewards R1 and R2 are given, and a further reward proportional to the amount by which the value falls below the target set value is also given.
[0104] Various other methods of defining the reward are conceivable, and the definition is not limited to the method of Equation (3).
[0105] Since the learning means 292 learns the combination of the input data 293, that is, the manipulated variables, that maximizes the reward calculated by Equation (3), it ends up learning a combination of manipulated variables that reduces NOx and CO in accordance with the current state.
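A minimal sketch of the Equation (3) reward used by the learning means, with the constants k1 to k4 and the target set values passed in as parameters; the function name is illustrative:

```python
def reward_eq3(o_nox, o_co, s_nox, s_co, k1, k2, k3, k4):
    """Reward of Equation (3): step bonuses for meeting the targets plus terms
    proportional to how far the emissions fall below their targets."""
    r1 = k1 if o_nox <= s_nox else 0.0
    r2 = k2 if o_co <= s_co else 0.0
    r3 = k3 * (s_nox - o_nox) if o_nox < s_nox else 0.0
    r4 = k4 * (s_co - o_co) if o_co < s_co else 0.0
    return r1 + r2 + r3 + r4
```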
[0106] After learning is completed, the learning means 292 reads the measured values 205, the operation data at the current time, and, on the basis of the learning result, outputs the manipulated variable 295 that maximizes the reward of Equation (3).
[0107] The state search means 260 is the same as in the first embodiment described above. In the improvement probability calculation means 270, in step 530 of FIG. 3, two frequency distributions are calculated: the distribution for the currently calculated basic operation command value 235 and, in addition, the distribution for the manipulated variable 295, the output of the learning means 292. In step 540, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value are calculated from the counted frequency distributions for both the currently calculated basic operation command value 235 and the manipulated variable 295. These points differ from the first embodiment, but the calculation methods themselves are the same.

[0108] The function of the operation command determination means 280 is also basically the same as in the first embodiment, but differs in the following respects.

[0109] In this embodiment, in step 550 of FIG. 3, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value for the currently calculated basic operation command value 235 and for the manipulated variable 295 are compared with their respective criterion values.

[0110] Whether each of the two signals can be permitted as the next operation command is judged on the basis of preset conditions, as in the first embodiment.

[0111] As a result, when both are judged not permitted, the previous value of the basic operation command value 235 is output as the operation command value 285. When only one of them is judged permitted, the permitted signal is output as the operation command value 285. When both are judged permitted, a condition for deciding which to select is defined in advance, and one of them is selected on that basis.

[0112] As the selection method, the one with the larger expected improvement value is selected. Selecting the one with the larger maximum improvement actual value, or the one with the smaller maximum non-improvement actual value, is also conceivable, and methods other than this example may be set.
[0113] FIG. 14 shows the circuit with which the operation command determination means 280 selects the current operation command value 285. The subtractor 281 calculates the deviation signal 287 between the current value of the basic operation command value 235 and the manipulated variable 295, the reinforcement learning result, and the adder 284 adds this to the current value of the basic operation command value 235 to create the reinforcement learning operation command value 288.

[0114] If the operation command value 285 derived from the output of the reinforcement learning means 290 becomes abnormal because of abnormal input data or an abnormality in the arithmetic circuit, the coefficient by which the multiplier 283 multiplies the deviation signal 287 is set to zero, so that the reinforcement learning operation command value 288 becomes equal to the current value of the basic operation command value 235; the risk of erroneously outputting an abnormal signal is thereby reduced.

[0115] Whether the operation command value 285 is abnormal is judged by upper and lower limit checks on the input and output data of the reinforcement learning means 290 and by upper and lower limit checks on their rates of change. If even one of them deviates from the preset limits, the output signal of the switch 282 is set to 0, preventing the output of a possibly abnormal operation command value 285. In all other cases the switch 282 sets its output signal to 1.

[0116] The switch 286 receives the judgment result of step 550 and selects and outputs one of the reinforcement learning operation command value 288, the current value of the basic operation command value 235, and the previous value of the basic operation command value 235.

[0117] When the operation command value 285 is abnormal, the reinforcement learning operation command value 288 is excluded from the selection candidates. Therefore, when the operation command value 285 is abnormal, either the current value or the previous value of the basic operation command value 235 is output, and operational safety is ensured. Moreover, as described above, the output signal of the switch 282 is set to 0 when the operation command value 285 is abnormal, so even if the switch 286 should select the reinforcement learning operation command value 288, no abnormal signal is output; safety is thus ensured doubly.
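The selection logic of FIG. 14 can be sketched as follows; the validity flag is assumed to come from the range and rate-of-change checks described above, the signal names follow the reference numerals in the text, and the tie-break between two permitted candidates (selecting the larger expected improvement) is omitted for brevity:

```python
def select_operation_command(basic_current, basic_previous, rl_output,
                             rl_output_valid, permit_rl, permit_basic):
    """Simplified stand-in for the subtractor 281, multiplier 283, adder 284 and switch 286."""
    gain = 1.0 if rl_output_valid else 0.0      # multiplier 283: zero out an invalid RL output
    deviation_287 = rl_output - basic_current   # subtractor 281
    rl_command_288 = basic_current + gain * deviation_287  # adder 284
    # Switch 286: choose among RL command, current basic value, previous value.
    if rl_output_valid and permit_rl:
        return rl_command_288
    if permit_basic:
        return basic_current
    return basic_previous
```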
[0118] By adding the reinforcement learning means 290 as described above, better operations corresponding to the characteristics of the actual plant can be learned, so the control performance is further improved. Even if the calculation result of the reinforcement learning means 290 becomes abnormal for some reason, the double safety measures maintain stable operation of the plant, so reliability is also ensured.

[0119] Next, the third embodiment is described with reference to FIG. 15. It differs from the second embodiment described above in that the information 275, namely the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, is output to the reinforcement learning means 290.
[0120] The reinforcement learning means 290 calculates the reward using the information 275. The reward calculation formula is shown in Equation (4). Here R1, R2, R3, and R4 are the same as in Equation (3); P_a_NOx and P_a_CO are the improvement probabilities for NOx and CO; O_Pmax_NOx and O_Pmax_CO are the maximum improvement actual values for NOx and CO; O_Pmin_NOx and O_Pmin_CO are the maximum non-improvement actual values for NOx and CO; S1_Pa_NOx, S2_Pa_NOx, S1_OPmax_NOx, S1_OPmin_NOx, S1_Pa_CO, S2_Pa_CO, S1_OPmax_CO, and S1_OPmin_CO are setting parameters; and k5 to k12 are positive constants.

[0121] [Equation 4]
R = R1 + R2 + ... + R12    ... (4)
R5 = k5 (P_a_NOx ≥ S1_Pa_NOx);  R5 = 0 (P_a_NOx < S1_Pa_NOx)
R6 = k6 (100·P_a_NOx − S2_Pa_NOx) (100·P_a_NOx ≥ S2_Pa_NOx);  R6 = 0 otherwise
R7 = k7 (O_Pmax_NOx ≥ S1_OPmax_NOx);  R7 = 0 otherwise
R8 = k8 (O_Pmin_NOx ≤ S1_OPmin_NOx);  R8 = 0 otherwise
R9 = k9 (P_a_CO ≥ S1_Pa_CO);  R9 = 0 (P_a_CO < S1_Pa_CO)
R10 = k10 (100·P_a_CO − S2_Pa_CO) (100·P_a_CO ≥ S2_Pa_CO);  R10 = 0 otherwise
R11 = k11 (O_Pmax_CO ≥ S1_OPmax_CO);  R11 = 0 otherwise
R12 = k12 (O_Pmin_CO ≤ S1_OPmin_CO);  R12 = 0 otherwise
with R1 to R4 given by Equation (3).
[0122] Because the reward is calculated using Equation (4), the reward becomes large when the improvement probability is large, the maximum improvement result is large, and the maximum non-improvement actual value is small. Operations satisfying these conditions are therefore learned automatically at the reinforcement learning stage, so operations with little risk of non-improvement can be learned.
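How the Equation (3) reward can be extended with the improvement statistics of the information 275 is sketched below for the NOx terms; the exact term structure shown here (threshold bonuses plus a probability-proportional term, mirrored for CO) is an assumption consistent with paragraph [0122] rather than a verbatim restatement of Equation (4), and all names are illustrative:

```python
def reward_eq4_nox_terms(p_improve_nox, max_improve_nox, max_worsen_nox,
                         s1_p, s2_p, s1_max, s1_min, k5, k6, k7, k8):
    """Additional NOx reward terms driven by past improvement statistics (assumed form)."""
    r5 = k5 if p_improve_nox >= s1_p else 0.0                       # improvement likely enough
    r6 = (k6 * (100.0 * p_improve_nox - s2_p)
          if 100.0 * p_improve_nox >= s2_p else 0.0)                # reward grows with probability
    r7 = k7 if max_improve_nox >= s1_max else 0.0                   # deep improvement seen in the past
    r8 = k8 if max_worsen_nox <= s1_min else 0.0                    # worst past outcome still acceptable
    return r5 + r6 + r7 + r8
```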
[0123] The definition of the reward calculation method shown in Equation (3) or Equation (4) and the setting parameters required for the reward definition can be entered and set through the input/output means 221 on an input screen similar to the setting screen 300.

[0124] Since the manipulated variable 295 is an operation command value determined with the improvement probability, the maximum improvement actual value, the maximum non-improvement actual value, and other information taken into account, the operation command determination means 280 normally outputs the manipulated variable 295 as the operation command value 285 at all times.

[0125] However, when the manipulated variable 295 becomes abnormal because of missing input data to the reinforcement learning means 290 or a failure of the arithmetic circuit, the operation command determination means 280 outputs the current value of the basic operation command 235 as the operation command value 285.

[0126] As described above, in this embodiment the reward for reinforcement learning is calculated using information such as the improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, so an operation method that, based on the past operation results, has a large effect and reduces the non-improvement risk can be learned automatically.
[0127] The embodiments described above can reduce the risk that a control operation fails to improve the plant state because of the scatter in the actual plant operation data. That is, the risk that a control operation increases the deviation from the control target value (the control deviation) beyond the current state can be reduced, so consistently stable operation becomes possible.

[0128] In addition, since the user can set the improvement probability or non-improvement probability of an operation as the index for deciding the next control operation, operation in line with the operating policy of the plant operators or management becomes possible. The control characteristics can also easily be changed when the operating characteristics are changed during operation.

[0129] For example, in the commissioning stage the non-improvement risk can be tolerated and the control parameters tuned while pursuing desirable operating procedures by trial and error, whereas in full operation, stable operation that excludes the non-improvement risk as far as possible can be pursued.

[0130] Depending on the type of plant, stable operation may be the first requirement even if settling takes a long time, while in other cases it may be preferable to reach the target state quickly even with some fluctuation. Similarly, while the control deviation is still fairly large, the settings can be chosen to reduce the non-improvement risk so that the state does not deteriorate further, and once the control deviation has become small, operation that accepts some non-improvement risk in pursuit of an even more desirable state also becomes possible.

[0131] The control device can thus respond flexibly to user requirements and plant conditions, and reducing the non-improvement risk improves operational safety, product quality, yield, and so on.

[0132] When applied to the combustion control of a thermal power plant, the risk of an increase in environmentally harmful substances such as NOx and CO can be reduced, enabling environmentally friendly operation. As a result, the amount of ammonia used in the denitration device can be reduced, and economic effects such as longer-lasting catalyst activity can also be expected.

[0133] The embodiments above describe a control device for a thermal power plant, but the control device described above can also be used for other plants such as manufacturing plants.
Brief Description of Drawings
[0134]
FIG. 1 is a diagram illustrating the configuration of the control device according to the first embodiment.
FIG. 2 is a diagram illustrating the configuration of the control device according to the second embodiment.
FIG. 3 is a diagram explaining the calculation procedure of the correction means.
FIG. 4 is a diagram explaining an example of the relationship between the distribution of plant data and an operation parameter.
FIG. 5 is a diagram explaining an example of a NOx concentration frequency distribution.
FIG. 6 is a diagram explaining an example of a NOx concentration frequency distribution.
FIG. 7 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 8 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 9 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 10 is a diagram explaining the configuration of a thermal power plant.
FIG. 11 is a diagram showing an example of a trend display of NOx operation results.
FIG. 12 is a diagram explaining the concept of reinforcement learning.
FIG. 13 is a diagram explaining the configuration of the reinforcement learning means.
FIG. 14 is a diagram explaining the operation command signal selection circuit.
FIG. 15 is a diagram illustrating the configuration of the control device according to the third embodiment.
Explanation of Reference Numerals
[0135] 100: plant; 200: control device; 220: input/output interface; 221: input/output means; 230: basic control command calculation means; 240: operation result database; 250: correction means; 260: state search means; 270: improvement probability calculation means; 280: operation command determination means

Claims

[1] A plant control device comprising:
basic control command calculation means that receives measurement data of a plant and calculates an operation command value for the plant;
an operation result database that accumulates operation data including the measurement data and the operation command values;
state search means that searches for and extracts similar states on the basis of the current operation data and the past operation data;
improvement probability calculation means that, for the past operation data extracted as similar states by the state search means, calculates a frequency distribution or probability distribution from the actual changes in the operating state caused by control operations and calculates an improvement probability or non-improvement probability of the control operation; and
operation command determination means that determines the next operation command value on the basis of the improvement probability or non-improvement probability calculated by the improvement probability calculation means.

[2] The plant control device according to claim 1, further comprising improvement probability reference value input means for inputting allowable reference values, each being an upper limit or a lower limit for a value calculated by the improvement probability calculation means.

[3] The plant control device according to claim 1, further comprising reinforcement learning means that calculates a predefined reward and learns a plant operation method on the basis of the reward value, wherein the operation command determination means determines the next operation command value by correcting the operation command value calculated as a result of the reinforcement learning.

[4] The plant control device according to claim 3, further comprising operation learning means that determines the reward value calculated during reinforcement learning and learns the plant operation method on the basis of the reward value.

[5] The plant control device according to claim 1, wherein the control device is a control device for a thermal power plant;
the state search means, when searching for similar plant states, searches the operation result database for similar states using, as state indices, any of the generator output or load factor, the fuel flow rate, the feed-water flow rate, the total air flow rate of the burners or air ports, the individual air flow rate of each air port, and the burner ignition position; and
the improvement probability calculation means calculates a frequency distribution or probability distribution of the actual changes, caused by control operations, in any of the state quantities of NOx, CO, CO2, SOx, Hg, fluorine, and particulates consisting of dust or mist in the exhaust gas, the state quantity of VOCs (volatile organic compounds), the steam temperature, the steam pressure, the generator output, and the efficiency, and calculates at least one of the improvement probability of the operation, the non-improvement probability, the variance, the mean value, the expected value, the maximum expected improvement value, the maximum expected non-improvement value, and the probability of occurrence of at least a predetermined fraction of the maximum expected improvement value or the maximum expected non-improvement value.

[6] The thermal power plant control device according to claim 5, further comprising reinforcement learning means that calculates a predefined reward and learns an operation method of the thermal power plant on the basis of the reward value, wherein the operation command determination means determines the next operation command value by correcting the operation command value calculated as a result of the reinforcement learning.

[7] The thermal power plant control device according to claim 5, wherein the operation command value concerns any of the fuel flow rate supplied to the burners, the burner air flow rate, the air flow rate supplied to the air ports, the gas recirculation amount, the burner angle, and the supply air temperature.

[8] The plant control device according to claim 1, wherein the improvement probability calculation means calculates any of the maximum improvement value, the maximum non-improvement value, the variance, the mean value, the expected value, and the probability of occurrence of at least a predetermined fraction of the maximum improvement value or the maximum non-improvement value, and the operation command determination means determines the next operation command value on the basis of the value calculated by the improvement probability calculation means.
PCT/JP2007/050683 2006-03-31 2007-01-18 Plant controller WO2007116591A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006-098518 2006-03-31
JP2006098518A JP4665815B2 (en) 2006-03-31 2006-03-31 Plant control equipment

Publications (1)

Publication Number Publication Date
WO2007116591A1 true WO2007116591A1 (en) 2007-10-18

Family

ID=38580893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/050683 WO2007116591A1 (en) 2006-03-31 2007-01-18 Plant controller

Country Status (2)

Country Link
JP (1) JP4665815B2 (en)
WO (1) WO2007116591A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015219793A (en) * 2014-05-20 2015-12-07 株式会社日立製作所 Electronic control device

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4427074B2 (en) 2007-06-07 2010-03-03 株式会社日立製作所 Plant control equipment
US8135653B2 (en) 2007-11-20 2012-03-13 Hitachi, Ltd. Power plant control device which uses a model, a learning signal, a correction signal, and a manipulation signal
JP4627553B2 (en) 2008-03-28 2011-02-09 株式会社日立製作所 Plant control device and thermal power plant control device
EP2336637A1 (en) * 2009-12-14 2011-06-22 ABB Research Ltd. System and associated method for monitoring and controlling a power plant
JP5918663B2 (en) * 2012-09-10 2016-05-18 株式会社日立製作所 Thermal power plant control device and control method
JP6400511B2 (en) * 2015-03-16 2018-10-03 株式会社日立製作所 Data processing apparatus and plant for plant control
JP6813416B2 (en) * 2017-04-10 2021-01-13 株式会社日立製作所 Plant control device and its control method, rolling mill control device and its control method and program
JP7086692B2 (en) 2018-04-19 2022-06-20 三菱重工業株式会社 Plant control equipment, plants, plant control methods and plant control programs
JP7187961B2 (en) * 2018-10-12 2022-12-13 富士通株式会社 Reinforcement learning program, reinforcement learning method, and reinforcement learning device
JP7033639B2 (en) * 2020-12-17 2022-03-10 株式会社日立製作所 Plant control device and its control method, rolling mill control device and its control method and program
JP7487704B2 (en) 2021-04-28 2024-05-21 横河電機株式会社 EVALUATION APPARATUS, EVALUATION METHOD, EVALUATION PROGRAM, CONTROL APPARATUS, AND CONTROL PROGRAM
JP7478297B1 (en) 2023-09-08 2024-05-02 三菱重工業株式会社 Information processing system, information processing method, learning system, and learning method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02306364A (en) * 1989-05-22 1990-12-19 Nkk Corp Learning control method
JPH02308302A (en) * 1989-05-24 1990-12-21 Hitachi Ltd Process state managing device
JPH07191706A (en) * 1993-12-27 1995-07-28 Nkk Corp Identification method for cause/effect model and learning method for knowledge for control
JPH07219626A (en) * 1994-02-04 1995-08-18 Toshiba Corp Plant control unit and tunnel ventilation control unit
JPH08286727A (en) * 1995-02-17 1996-11-01 Nippon Steel Corp Automatic preset device for manipulated variable of process control
JP2004013393A (en) * 2002-06-05 2004-01-15 Hitachi Ltd Preset control method and controller of control system using control model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63196896A (en) * 1987-02-12 1988-08-15 株式会社東芝 Nuclear reactor output controller
JP3432612B2 (en) * 1994-09-30 2003-08-04 バブコック日立株式会社 Plant operation control device
JP2004178492A (en) * 2002-11-29 2004-06-24 Mitsubishi Heavy Ind Ltd Plant simulation method using enhanced learning method

Also Published As

Publication number Publication date
JP2007272646A (en) 2007-10-18
JP4665815B2 (en) 2011-04-06

Similar Documents

Publication Publication Date Title
WO2007116591A1 (en) Plant controller
JP4573783B2 (en) Plant control apparatus and control method, thermal power plant and control method therefor
JP4270218B2 (en) Control device for control object having combustion device, and control device for plant having boiler
US8554706B2 (en) Power plant control device which uses a model, a learning signal, a correction signal, and a manipulation signal
JP5251938B2 (en) Plant control device and thermal power plant control device
US7813819B2 (en) Control system for control subject having combustion unit and control system for plant having boiler
CA2820216C (en) Optimized integrated controls for oxy-fuel combustion power plant
US7219040B2 (en) Method and system for model based control of heavy duty gas turbine
US7389151B2 (en) Systems and methods for multi-level optimizing control systems for boilers
JP4627553B2 (en) Plant control device and thermal power plant control device
JP2008146371A (en) Controller of boiler plant
JP5918663B2 (en) Thermal power plant control device and control method
JP5503563B2 (en) Plant control device and thermal power plant control device
JP5639613B2 (en) Plant control device and thermal power plant control device
CN104075340A (en) Low-nitrogen combustion control method and system based on PLC
CN108073145B (en) Operation support device and recording medium
JP5410480B2 (en) Plant control equipment
JP7374590B2 (en) KPI improvement support system and KPI improvement support method
JP5378288B2 (en) Plant control device and thermal power plant control device
JP7368332B2 (en) Guidance operation support device, guidance operation support method, and guidance operation support program
JP7222943B2 (en) Operation improvement support system, operation improvement support method, and operation improvement support program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07706987

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07706987

Country of ref document: EP

Kind code of ref document: A1