WO2007116591A1 - Plant controller - Google Patents

Plant controller Download PDF

Info

Publication number
WO2007116591A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
improvement
probability
plant
operation command
Prior art date
Application number
PCT/JP2007/050683
Other languages
French (fr)
Japanese (ja)
Inventor
Akihiro Yamada
Takaaki Sekiai
Yoshiharu Hayashi
Naohiro Kusumi
Masayuki Fukai
Satoru Shimizu
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Publication of WO2007116591A1 publication Critical patent/WO2007116591A1/en

Links

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, the criterion being a learning criterion

Definitions

  • the present invention relates to a plant control device.
  • control logic based on PID control has been mainstream in the field of plant control.
  • Many techniques using supervised learning functions, such as neural networks, have also been proposed so that control can respond flexibly to plant characteristics.
  • Reinforcement learning is a learning control framework that generates operation signals to an environment, such as the controlled object, through trial-and-error interaction so that the measurement signals obtained from the environment approach a desired state. Even when successful cases cannot be prepared in advance, it has the advantage that the desired behavior can be learned autonomously simply by defining what a desirable state is.
  • In reinforcement learning, a scalar evaluation value calculated from the measurement signals obtained from the environment (called the reward) is used as a guide, and the learning function generates operation signals to the environment so that the expected value of the reward obtained from the current state into the future is maximized.
  • Methods for implementing such a learning function include algorithms such as Actor-Critic, Q-learning, and real-time dynamic programming described in Non-Patent Document 1.
  • The Dyna architecture has also been introduced in the same document as a reinforcement learning framework that extends these methods. In it, which operation signals should be generated is learned in advance against a model simulating the controlled object, and the learning result is used to determine the operation signal applied to the controlled object. It also has a model adjustment function that reduces the error between the controlled object and the model.
  • As a technique applying reinforcement learning, the technique described in Patent Document 1 can be cited. Multiple reinforcement learning modules, each a pair of a model and a system with a learning function, are prepared; for each module a responsibility signal is computed that takes a larger value the smaller the prediction error between its model and the controlled object, and the operation signal applied to the controlled object is determined by weighting the operation signals generated by the modules in proportion to their responsibility signals.
  • Patent Document 2 describes a method of adjusting a process simulation model using reinforcement learning, as well as methods of configuring an operation training device and an operation diagnosis device using that model. It describes expressing the fluctuation of a phenomenon with a probability density function such as a normal distribution, and explains that reflecting this fluctuation in simulation conditions and model parameters allows more realistic simulation.
  • Patent Document 1: Japanese Patent Application Laid-Open No. 2000-35956
  • Patent Document 2: Japanese Patent Application Laid-Open No. 2004-178492
  • Non-Patent Document 1: Reinforcement Learning, Sadayoshi Mikami and Masaaki Minagawa (co-translators), Morikita Publishing Co., Ltd., December 20, 2000
  • Non-Patent Document 1 and Patent Document 1 do not describe a method for handling variation in plant data.
  • Patent Document 2 describes expressing the variation with a probability density function or the like, but it does not describe countermeasures for the above problem during control operation or during reinforcement learning.
  • The purpose of the present invention is to provide a control device that reduces the risk that the state fails to improve or worsens during control operation, improves control performance, and enables consistently stable operation.
  • To that end, the present invention provides basic control command calculation means for receiving plant measurement data and calculating operation command values for the plant; an operation result database for accumulating operation data comprising the measurement data and the operation command values; state search means for searching for and extracting similar states based on the current operation data and past operation data; improvement probability calculation means for calculating a frequency distribution or probability distribution from the record of changes in the operating state caused by control operations in the past operation data extracted as similar states, and for calculating the improvement probability or non-improvement probability of the control operation; and operation command determination means for determining the next operation command value based on the calculated improvement probability or non-improvement probability. Together these constitute the plant control apparatus.
  • Because the present invention comprises the above means, it can reduce the risk that a control operation fails to improve the plant state as a result of the variation present in actual plant operation data. That is, the risk that a control operation increases the deviation from the control target value (the control deviation) relative to the current state is reduced, so stable operation is always possible.
  • FIG. 1 shows a first embodiment.
  • The control device 200 receives the measured values 205 of process values from the plant 100 to be controlled, performs calculations programmed in advance in the control device 200 using these values, and sends an operation command signal (control signal) 285 to the plant 100.
  • In accordance with the received operation command signal 285, the plant 100 controls its state by operating actuators such as valve openings and damper openings.
  • This embodiment is an example applied to combustion control of a thermal power plant.
  • an example applied to a control function aimed at reducing the NOx and CO concentrations in exhaust gas will be explained.
  • FIG. 10 shows a configuration of a thermal power plant that is a control target.
  • the coal used as fuel, the primary air for transporting the coal, and the secondary air for adjusting the combustion are introduced into the boiler 101 through the burner 102, and the boiler 101 burns the coal.
  • Coal and primary air are routed from line 134 and secondary air is routed from line 141.
  • after-air for two-stage combustion is introduced into the boiler 101 via the after-air port 103. This after air is led from the pipe 142.
  • the feed water circulating in the boiler 101 is guided to the boiler 101 via the feed water pump 105, and is superheated by the gas in the heat exchanger 106 to become high-temperature and high-pressure steam.
  • the number of heat exchangers is one in this embodiment, but a plurality of heat exchangers may be arranged.
  • the high-temperature and high-pressure steam that has passed through the heat exchanger 106 is guided to the steam turbine 108 via the turbine governor 107.
  • the steam turbine 108 is driven by the energy of the steam, and the generator 109 generates power.
  • the primary air is led from the fan 120 into pipe 130, branches into pipe 132, which passes through the air heater, and pipe 131, which does not, merges again in pipe 133, and is led to the mill 110.
  • the air passing through the air heater is superheated by the gas. Using this primary air, coal (pulverized coal) produced in the mill 110 is conveyed to the burner 102.
  • the control device 200 has a function of adjusting the air amount input from the burner and the air amount input from the after air port in order to reduce NOx and CO concentrations.
  • the control device 200 comprises basic control command calculation means 230; correction means 250 for changing or correcting the basic operation command value 235 output from the basic control command calculation means 230; an operation result database 240 that accumulates and stores operation result data consisting of the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like; an input/output interface 220 for exchanging data with the plant 100, the operators, and so on; and input/output means 221 with which the operator can view various data and enter set values, operation modes, manual operation commands, and the like.
  • the basic control command calculation means 230 has PID (proportional-integral-derivative) controllers as its basic components; it takes the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like as inputs, and calculates and outputs the basic operation command values 235 for the various operating devices such as valves, dampers, and motors installed in the plant 100.
  • a feature of this embodiment is that a correction means 250 for changing or correcting the basic operation command value 235 is provided.
  • the correction means 250 will be described.
  • the correction means 250 is composed of a state search means 260, an improvement probability calculation means 270, and an operation command determination means 280.
  • Based on past operation result data, the correction means 250 investigates whether the basic operation command value 235 would improve the plant state in the desired direction, and it has a function to switch, according to that probability, between outputting the current value of the basic operation command value 235 and maintaining the previous value.
  • FIG. 4 shows the relationship between an operation parameter X and the process value (state quantity A) to be controlled. As described above, operation data vary, so plotting the operation results yields data with the distribution shown in FIG. 4.
  • For example, when the current state quantity is b and the basic operation command value 235 is the value indicated by the next operation point, the operation results corresponding to that next operation point have a distribution. If that distribution is the frequency or probability distribution shown on the right side of FIG. 4, the expected value is smaller than the current state quantity b, and the quantity has in fact dropped as low as the minimum value c. However, there is also a probability that it becomes larger than the current state b, and the record shows it has increased up to the maximum value a.
  • The judgment differs depending on whether a larger or a smaller value of the state quantity A is desirable. Assuming, for example, that a smaller value is desirable, the expected value is smaller than b, so it can be judged that the next operation should be performed. However, since there is also a possibility that the state worsens up to a, it is necessary to decide whether or not to carry out the next operation.
  • The correction means 250 therefore automatically analyzes the past operation result data and, based on the result, determines whether to output the current calculated value of the basic operation command value 235 as it is. In particular, this makes it possible to suppress situations in which, as described above, a deterioration of the state seriously affects the product or the environment.
  • The condition parameters are the process values used as indices for identifying the plant state, the deviation tolerance used to define similarity, and the allowable values that serve as criteria for deciding whether to permit output of the current calculated value of the basic operation command value 235 for conditions such as the improvement probability.
  • In this example, the power generation output value and the air damper openings that adjust the air amounts (the manipulated variables of the combustion control) are used as indices for specifying the plant state.
  • this embodiment does not limit what is used for this index.
  • conditions such as the fuel flow rate and the feed water flow rate may be added, or another index may be used.
  • In step 510, the similarity between the past operation data stored in the operation result database 240 and the current state is calculated.
  • the similarity is defined by the Euclidean distance.
  • When the coordinates of two data points P and Q are given by (Xp1, Xp2, Xp3, ..., Xpn) and (Xq1, Xq2, Xq3, ..., Xqn), the squared distance d^2 between the two points is obtained by equation (1), the sum over i of (Xpi - Xqi)^2.
  • the coordinates Xpi and Xqi are the process values defined as indices for identifying the plant state.
  • In step 520, the allowable deviation distance d_max for defining similarity, which was read in step 500, is compared with each distance d calculated in step 510, and only the operation data sets satisfying the condition of equation (2), namely that d does not exceed d_max, are extracted.
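  • As an illustration of steps 500 to 520, the following is a minimal sketch of how the state search means 260 could extract similar past records using the squared Euclidean distance of equation (1) and the threshold of equation (2); it is not taken from the patent, and the record format and names such as find_similar_records and d_max are assumptions.

```python
# Sketch of the state search (steps 500-520). The record format, the names,
# and the use of plain Python lists are illustrative assumptions.

def squared_distance(p, q):
    """Squared Euclidean distance of equation (1) between two state vectors."""
    return sum((xp - xq) ** 2 for xp, xq in zip(p, q))

def find_similar_records(current_state, past_records, d_max):
    """Return the past records whose state indices lie within d_max of the
    current state (the condition of equation (2)).

    current_state : list of process values used as state indices
                    (e.g. generator output and air damper openings)
    past_records  : list of dicts with keys "state" and "nox"
    d_max         : allowable deviation distance read in step 500
    """
    d_max_sq = d_max ** 2
    return [r for r in past_records
            if squared_distance(current_state, r["state"]) <= d_max_sq]
```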
  • In step 530, the frequency distribution of the measured NOx or CO concentrations at the air damper opening given by the basic operation command value 235 is calculated from the data set extracted in step 520, as in the graph on the right side of FIG. 4. The air damper opening, which is the manipulated variable, is binned with a predetermined division width. Since the NOx and CO values fluctuate over time, they are averaged over a predetermined time interval and likewise counted with a predetermined division width.
  • In step 540, the frequency with which the NOx or CO concentration was lower than the current value at the operating point given by the basic operation command value 235 is obtained from the counted frequency distribution, and this count divided by the total frequency is taken as the improvement probability.
  • Similarly, the ratio of the frequency with which the concentration was higher than the current value is defined as the non-improvement probability, in other words, the probability of deterioration.
  • In addition, the minimum NOx or CO concentration in the extracted data is found, and that value is taken as the maximum improvement actual value.
  • Likewise, the maximum NOx or CO concentration is found and taken as the maximum non-improvement actual value, in other words, the worst recorded result.
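  • The following sketch illustrates steps 530 and 540 under the assumption that each extracted record carries a time-averaged NOx (or CO) concentration for the damper opening in question; the dictionary keys and the simple counting scheme are illustrative only.

```python
# Sketch of steps 530-540: improvement probability, non-improvement
# probability, and the best and worst recorded values. Record format assumed.

def improvement_statistics(similar_records, current_value, key="nox"):
    """Compute improvement statistics from the extracted similar records.

    similar_records : records returned by a search such as find_similar_records()
    current_value   : current (time-averaged) NOx or CO concentration
    """
    values = [r[key] for r in similar_records]
    if not values:
        return None  # no similar past operation to judge from
    total = len(values)
    improved = sum(1 for v in values if v < current_value)
    worsened = sum(1 for v in values if v > current_value)
    return {
        "improvement_probability": improved / total,
        "non_improvement_probability": worsened / total,
        "max_improvement_value": min(values),      # best (lowest) concentration seen
        "max_non_improvement_value": max(values),  # worst (highest) concentration seen
    }
```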
  • In step 550, the criteria read in step 500 for deciding whether to permit output of the current calculated value of the basic operation command value 235 are compared with the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540, and whether the current calculated value of the basic operation command value 235 may be output is determined according to a predetermined determination condition.
  • Judgment conditions include improvement probabilities or non-improvement probabilities, maximum improvement actual values, and maximum non-improvement actual values.
  • the determination condition may be an AND condition or an OR condition combining these, and other settings may also be used.
  • the determination conditions may also use the variance, the average value, or the expected value of the distribution, or the probability of occurrence exceeding a predetermined ratio with respect to the maximum improvement value or the maximum non-improvement value.
  • Steps 500, 510, and 520 are executed by the state search means 260, and data 265 that meets similar conditions is extracted from the data 245 stored in the operation result database 240.
  • Steps 530 and 540 are executed by the improvement probability calculation means 270 and become information 275 of improvement probability or non-improvement probability, maximum improvement actual value, and maximum non-improvement actual value.
  • Step 550 is executed by the operation command determination means 280.
  • When output of the current calculated value of the basic operation command value 235 is permitted based on the result of step 550, the operation command determination means 280 outputs that current calculated value as the operation command value 285 as it is.
  • When output is not permitted, the previous value of the basic operation command value 235, that is, the operation command value currently in effect, is output instead.
  • In expectation the NOx concentration would decrease, but there is still a significant probability that it would increase; from the viewpoint of reducing NOx emissions as much as possible, whether to carry out the next operation can thus be chosen according to whether that risk is judged to be too high.
  • Removing NOx with a denitration device requires ammonia for the denitration reaction. If the amount of NOx generated can be reduced by the control device of this embodiment, ammonia consumption is reduced and an economic effect of lower operating cost can be expected. Reducing the amount of NOx generated can also be expected to extend the life of the denitration catalyst and to allow the denitration device to be made smaller.
  • In this example the previous value of the basic operation command value 235 is output when output is not permitted, but there is also a method in which a correction is applied to the current calculated value of the basic operation command value 235.
  • For example, a correction method is possible in which a coefficient between 0 and 1 is determined from the deviations between the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value used in step 550 and their corresponding criterion values.
  • It is also possible to calculate the improvement probability or non-improvement probability for the case where the previous value of the basic operation command value 235 is output, and to select whichever of the previous value and the current value has the higher probability of producing a good result.
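  • A minimal sketch of the permit-or-hold decision of step 550 follows; the threshold names, their default values, and the simple AND of two checks are assumptions, since the patent allows AND/OR combinations and other conditions to be configured.

```python
# Sketch of step 550: permit or reject output of the current calculated value
# of the basic operation command value 235. The criteria and their combination
# (an AND of two checks) are illustrative assumptions.

def decide_next_command(stats, basic_cmd_now, cmd_previous,
                        min_improvement_prob=0.7, worst_allowed=None):
    """Return the next operation command value 285.

    stats          : dict such as the one from improvement_statistics(), or None
    basic_cmd_now  : current calculated value of the basic operation command 235
    cmd_previous   : previous (currently effective) operation command value
    """
    if stats is None:
        return cmd_previous  # no evidence from similar states: hold the present command
    permitted = stats["improvement_probability"] >= min_improvement_prob
    if worst_allowed is not None:
        permitted = permitted and stats["max_non_improvement_value"] <= worst_allowed
    return basic_cmd_now if permitted else cmd_previous
```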
  • FIG. 7 shows an example of a screen for setting a determination condition for determining whether or not output of the current operation value of the basic operation command value 235 is permitted.
  • the screen shown in FIG. 7 is displayed on the display monitor 223, and the operator inputs settings from the keyboard 222 with a mouse.
  • the input / output means 221 is composed of a screen display monitor 223 and a keyboard 222 with a mouse as input means.
  • As other input/output means for the operator, there are voice input/output devices, touch pens, and so on, and these devices may also be used.
  • the setting conditions are not limited to these, and other conditions may be added.
  • When a choice has been made, the “setting end” button 304 at the bottom of the screen becomes selectable, and clicking it with the mouse pointer 301 ends the setting. If “Allow with condition” is selected, the “To condition parameter setting screen” button 303 becomes selectable, and clicking it with the mouse pointer 301 advances to the condition parameter setting screen.
  • the setting screen 300 has a “return” button 302, which can be clicked to return to the state before the setting. Since no condition is set at first, the “return” button 302 is disabled until a condition has been set.
  • an initial setting may be prepared in advance as a default and used as the initial state.
  • default settings can be added as conditions on the setting screen 300.
  • the screen shown in FIG. 8 is displayed.
  • the target process value can be selected from the pull-down menu 305 in the upper right of the screen.
  • the process value selected from the pull-down menu 305 can be divided into levels.
  • NOx is selected as the process value, and it can be divided into multiple levels by entering the upper and lower limits of the NOx value. Each level is managed as a condition number.
  • the "Next" button 306 is clicked to move to a screen for setting an allowable value for each level (condition No.).
  • FIG. 9 is an example of a screen for setting the permissible values for “determination based on improvement probability or non-improvement probability” and “determination based on expected improvement value”.
  • the allowable values to be set are the items checked on the setting screen 300, and the corresponding allowable value setting screen is automatically displayed.
  • the permissible value for "determined by improvement probability or non-improvement probability" can be changed by operating the mouse on the horizontal bar. Values can also be entered directly from the keyboard in the non-improvement probability or improvement probability columns. When a value is entered for either the non-improvement probability or the improvement probability, the other automatically displays the difference between 100% and the entered value.
  • the allowable value of “determined by the expected improvement value” can also be changed by operating the mouse on the horizontal bar.
  • the ratio corresponds to the level range of process values such as NOx values set on the screen in Fig. 8.
  • the upper and lower limits of NOx can be directly entered to set the allowable value for “determined by the expected improvement value”.
  • In this embodiment the process values to be controlled are the NOx and CO concentrations, but the invention is not limited to these.
  • For example, the amounts of CO, SOx, Hg (mercury), volatile organic compounds (VOCs), and fine particles consisting of fluorine compounds, dust, or mist in the exhaust gas may be used, as may the steam temperature, steam pressure, generator output, and efficiency.
  • A plurality of these can also be combined as “AND conditions” or “OR conditions”.
  • The difference from the first embodiment shown in FIG. 1 is that the control device 200 of this embodiment is provided with reinforcement learning means 290.
  • the reinforcement learning means 290 has a function of learning an appropriate operation method corresponding to the plant state by the reinforcement learning theory using the operation data stored in the operation result database 240.
  • FIG. 12 shows the concept of control based on reinforcement learning theory.
  • the control device 610 outputs an operation command 630 to the controlled object 600.
  • the controlled object 600 operates according to the operation command 630.
  • the state of the controlled object 600 is changed by the operation according to the operation command 630.
  • the control device 610 then receives from the controlled object 600 a reward 620, a quantity that indicates whether the changed state is desirable or undesirable and to what degree.
  • in practice, the information received is the state quantities of the controlled object, and the control device 610 generally calculates the reward from this information.
  • the reward is set to increase as it approaches the desired state, and the reward decreases as the state becomes undesirable.
  • the control device 610 operates by trial and error and learns the operation method that maximizes the reward (that is, that approaches the desired state as closely as possible); appropriate operation (control) logic is thus constructed automatically according to the state.
  • supervised learning, typified by neural networks, requires success cases to be provided as teacher data in advance, and is not suitable when the plant is new or the phenomenon is so complicated that success cases cannot be prepared beforehand.
  • reinforcement learning, in contrast, is classified as unsupervised learning and has the advantage that, because it can generate the desired operations by trial and error, it can be applied even when the characteristics of the controlled object are not well understood.
  • this reinforcement learning theory is used.
  • FIG. 13 shows the configuration of reinforcement learning means 290.
  • the reinforcement learning means 290 includes a modeling means 291 and a learning means 292.
  • Modeling means 291 is a neural network consisting of an input layer, an intermediate layer, and an output layer; it reads past operation data from the operation result database 240 and learns the input/output relationship using the error back-propagation method.
  • the configuration and learning method of the neural network are standard, and other methods may be used; since this embodiment does not depend on the particular network configuration or learning method, a detailed description is omitted here.
  • the input data are the air flow rate at each position of the burner and after-air port, the fuel flow rate for each burner, and the generator output, and the output data are NOx and CO concentrations.
  • the relationship between the fuel flow rate, air flow rate, power generation output and NOx and CO concentration is modeled, but the input items and output items are not limited to this.
  • the modeling method is not limited to a neural network; other statistical models such as regression models may also be used.
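  • As one possible realization of the modeling means 291, the sketch below fits a small feed-forward network with scikit-learn; the patent does not prescribe a library, a network size, or a data layout, so the choice of MLPRegressor, its hyperparameters, and the column arrangement are assumptions.

```python
# Sketch of the modeling means 291: learn the mapping from operating
# conditions (air flows per burner and after-air port, fuel flows, generator
# output) to the NOx and CO concentrations stored in the operation database.
# Library and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_plant_model(X, Y):
    """X: rows of operating-condition data; Y: matching [NOx, CO] rows."""
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    model.fit(X, Y)
    return model

# Example with dummy data (real data would come from the operation result
# database 240):
if __name__ == "__main__":
    X = np.random.rand(500, 8)
    Y = np.random.rand(500, 2)
    model = build_plant_model(X, Y)
    print(model.predict(X[:1]))  # predicted [NOx, CO] for one operating point
```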
  • the learning means 292 generates, for the model created by the modeling means 291, input data 293 consisting of the air flow rates at each position of the burners and after-air ports and the fuel flow rate for each burner. The input data 293 correspond to the operating conditions of the plant, and upper and lower limits, a change width (step size), and the maximum change width allowed in one operation are set for them. Each quantity in the input data 293 is determined at random within its allowable range.
  • Modeling means 291 feeds the input data 293 into the created model and outputs the output data 294.
  • the learning means 292 receives the output data 294 and calculates a reward value.
  • the reward is defined by equation (3), where R is the reward value, O_NOx is the NOx value, O_CO is the CO value, S_NOx and S_CO are the target set points for NOx and CO, and k1, k2, k3, and k4 are positive constants.
  • the learning means 292 learns the combination of input data 293, that is, of manipulated variables, that maximizes the reward calculated by equation (3). As a result, it becomes possible to learn combinations of manipulated variables that reduce NOx and CO for the current state.
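  • The following sketch illustrates how the learning means 292 could score randomly drawn manipulated variables against the model; the exact functional form of equation (3) is not reproduced in the text (only its variables are named), so the simple set-point-difference reward, the two weights, and the random search are assumptions.

```python
# Sketch of the learning means 292: draw candidate manipulated variables at
# random within their allowed ranges, score them with an equation-(3)-style
# reward, and keep the best. The reward's exact form and the constants are
# assumptions; only the qualitative intent (higher reward for lower NOx and CO
# relative to their set points) follows the text.
import random

def reward_eq3_like(nox, co, s_nox, s_co, k1=1.0, k2=1.0):
    # Larger when NOx and CO are below their target set points.
    return k1 * (s_nox - nox) + k2 * (s_co - co)

def search_best_operation(model, bounds, s_nox, s_co, n_trials=1000):
    """model  : an object with predict(), e.g. from build_plant_model()
       bounds : list of (low, high) pairs, one per manipulated variable"""
    best_x, best_r = None, float("-inf")
    for _ in range(n_trials):
        x = [random.uniform(lo, hi) for lo, hi in bounds]
        nox, co = model.predict([x])[0]
        r = reward_eq3_like(nox, co, s_nox, s_co)
        if r > best_r:
            best_x, best_r = x, r
    return best_x, best_r
```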
  • once learning is completed, the learning means 292 receives the measured values 205, that is, the operation data at the current time, and, based on the learning result, outputs the manipulated variable 295 that maximizes the reward of equation (3).
  • the state search means 260 is the same as that in the first embodiment described above.
  • the frequency distribution at the manipulated variable 295, which is the output value of the learning means 292, is also calculated, so that one more frequency distribution is obtained.
  • In step 540, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value are calculated from the counted frequency distributions for both the current calculated value of the basic operation command value 235 and the manipulated variable 295.
  • the function of the operation command determination means 280 is basically the same as that of the first embodiment, but differs from the first embodiment in the following points.
  • In step 550 shown in FIG. 3, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value for the current calculated value of the basic operation command value 235 and for the manipulated variable 295 are compared with their respective criterion values.
  • Whether or not both signals can be permitted as the next operation command is determined based on a preset condition as in the first embodiment.
  • If neither is permitted, the previous value of the basic operation command value 235 is output as the operation command value 285. If only one of them is permitted, the permitted signal is output as the operation command value 285. If both are permitted, a condition for deciding which one to select is set in advance, and one is selected based on it.
  • As the selection method in this example, the one having the larger expected improvement value is selected.
  • Other methods such as the one with the largest maximum improvement actual value or the one with the smallest maximum non-improvement actual value may be considered, and methods other than this example may be set.
  • Figure 14 shows the circuit with which the operation command determination means 280 selects the operation command value 285.
  • the subtractor 281 calculates the deviation signal 287 between the current value of the basic operation command value 235 and the manipulated variable 295, which is the reinforcement learning result, and the adder 284 adds it to the current value of the basic operation command value 235 to create the reinforcement learning operation command value 288.
  • if the output value of the reinforcement learning means 290 is abnormal, the coefficient multiplied onto the deviation signal 287 by the multiplier 283 is set to zero, so that the reinforcement learning operation command value 288 becomes equal to the current value of the basic operation command value 235 and the risk of erroneously outputting an abnormal signal is reduced.
  • Whether that output is abnormal is determined by upper and lower limit checks on the input data and output data of the reinforcement learning means 290 and by upper and lower limit checks on their rates of change. If even one of the values deviates from the preset limits, the output of the switch 282 is set to 0 to prevent a possibly abnormal command from being output; in all other cases the switch 282 sets its output signal to 1.
  • switch 286 selects and outputs one of the reinforcement learning operation command value 288, the current value of the basic operation command value 235, and the previous value of the basic operation command value 235.
  • When an abnormality is detected, the reinforcement learning operation command value 288 is excluded from the selection candidates, so either the current or the previous value of the basic operation command value 235 is output and operational safety is ensured. Moreover, as described above, the output signal of the switch 282 is set to 0 in that case, so even if the reinforcement learning operation command value 288 were selected by the switch 286, an abnormal signal would not be output; safety is thus doubly ensured.
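  • The selection circuit of FIG. 14 can be summarized in software as below; the argument names and the shape of the validity test are paraphrased from the text rather than taken from the patent's circuit.

```python
# Sketch of the FIG. 14 selection logic: gate the reinforcement-learning
# correction with a validity coefficient (switch 282 driving multiplier 283)
# and fall back to the basic command when the learning output cannot be used.

def select_operation_command(selection, basic_now, basic_prev,
                             rl_output, rl_output_valid):
    """selection       : "rl", "basic_now", or "basic_prev", the decision of the
                         operation command determination means 280
       basic_now/prev  : current and previous basic operation command values 235
       rl_output       : manipulated variable 295 from the reinforcement learning
       rl_output_valid : result of the upper/lower-limit and rate-of-change checks
    """
    coeff = 1.0 if rl_output_valid else 0.0                   # switch 282 / multiplier 283
    rl_command = basic_now + coeff * (rl_output - basic_now)  # signals 287 and 288
    if selection == "rl" and rl_output_valid:
        return rl_command                                     # switch 286 selects signal 288
    if selection == "basic_now":
        return basic_now
    return basic_prev                                         # hold the present command
```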
  • Reinforcement learning means 290 uses information 275 to calculate a reward.
  • the formula for calculating the reward is shown in Equation (4).
  • In equation (4), R and the terms carried over are the same as in equation (3); P_NOx and P_CO are the improvement probabilities of NOx and CO; O_Pmax.NOx and O_Pmax.CO are the maximum improvement actual values of NOx and CO; O_Pmin.NOx and O_Pmin.CO are the maximum non-improvement actual values of NOx and CO; S1 and S2 are setting parameters for NOx and CO; and k1 through k8 are positive constants.
  • Since the reward is calculated using equation (4), the reward is larger when the improvement probability is high, the maximum improvement actual value is large, and the maximum non-improvement actual value is small. Operations satisfying these conditions are therefore favored automatically during the reinforcement learning stage, so operation methods with less risk of non-improvement can be learned.
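  • The full expression of equation (4) is not reproduced in the text, so the sketch below only illustrates its stated intent: the reward grows with the improvement probability and with the size of the best recorded improvement, and shrinks with the size of the worst recorded deterioration. The additive form and all weights are assumptions.

```python
# Illustrative reward in the spirit of equation (4); the additive form and the
# weights are assumptions, not the patent's actual formula.

def reward_eq4_like(base_reward, current_value, stats, w=(1.0, 1.0, 1.0)):
    """base_reward   : equation-(3)-style term computed from NOx/CO and set points
       current_value : current concentration of the quantity being judged
       stats         : dict such as the one from improvement_statistics()"""
    best_case_gain = current_value - stats["max_improvement_value"]       # largest recorded drop
    worst_case_loss = stats["max_non_improvement_value"] - current_value  # largest recorded rise
    return (base_reward
            + w[0] * stats["improvement_probability"]
            + w[1] * best_case_gain
            - w[2] * worst_case_loss)
```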
  • The choice of the reward calculation method shown in equation (3) or equation (4), and each setting parameter required for the reward definition, can be entered and set through the input/output means 221, in the same way as the parameters on setting screen 300.
  • in normal operation the operation command determination means 280 always outputs the manipulated variable 295 as the operation command value 285.
  • in other cases, the operation command determination means 280 outputs the current value of the basic operation command value 235 as the operation command value 285.
  • because the reward for reinforcement learning is calculated using information such as the improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, it is possible to automatically learn, based on past operation results, operation methods that reduce the risk of non-improvement and have a large improvement effect.
  • control operation can reduce the risk that the deviation from the control target value (control deviation) increases compared to the current state, so that stable operation is always possible.
  • the control characteristics can also be changed easily when the desired operating characteristics change partway through operation.
  • For example, stable operation may be the first priority even if the time to settle is long; on the other hand, reaching the target state quickly may be preferable even at the cost of some fluctuation. Similarly, when the control deviation is relatively large, the settings can be chosen to reduce the non-improvement risk so that the state does not worsen further; conversely, when the control deviation is relatively small, some non-improvement risk can be accepted in order to pursue a more desirable state.
  • In the embodiments above, a control device for controlling a thermal power plant has been described.
  • However, the control device described above can also be used for other kinds of plants.
  • FIG. 1 is a diagram illustrating a configuration of a control device according to a first embodiment.
  • FIG. 2 is a diagram illustrating a configuration of a control device according to a second embodiment.
  • FIG. 3 is a diagram for explaining a calculation procedure of a correction means.
  • FIG. 4 is a diagram for explaining an example of the relationship between plant data distribution and operating parameters.
  • FIG. 5 is a diagram for explaining an example of a frequency distribution of NOx concentration.
  • FIG. 6 is a diagram illustrating an example of a frequency distribution of NOx concentration.
  • FIG. 7 is a diagram illustrating an example of an operation command value output permission condition setting screen.
  • FIG. 8 is a diagram illustrating an example of an operation command value output permission condition setting screen.
  • FIG. 9 is a diagram for explaining an example of an operation command value output permission condition setting screen.
  • FIG. 10 is a diagram illustrating the configuration of a thermal power plant.
  • FIG. 11 is a diagram showing a trend display example of NOx operation results.
  • FIG. 12 is a diagram for explaining the concept of reinforcement learning.
  • FIG. 13 is a diagram illustrating the configuration of reinforcement learning means.
  • FIG. 14 is a diagram illustrating an operation command signal selection circuit.
  • FIG. 15 is a diagram illustrating a configuration of a control device according to a third embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

A plant controller enabling always stable operation with an improved control performance by reducing the risk that the state cannot be improved nor becomes worse during control operation. The plant controller is characterized by comprising basic control command computing means for inputting plant measurement data and computing an operation command variable sent to the plant, operation record database where operation data including the measurement data and the operation command variable are stored, status searching means for searching for and extracting similar statuses according to the current operation data and the past operation data, improvement probability computing means for computing a frequency distribution or a probability distribution from the variation record of the operation status by control operation according to the past operation data being the extracted similar statuses and computing the improvement probability or the nonimprovement probability by the control operation, and operation command determining means for determining the next operation command variable according to the computed improvement or nonimprovement probability.

Description

Plant control device
Technical field
[0001] The present invention relates to a plant control device.
Background art
[0002] Conventionally, control logic based on PID control has been mainstream in the field of plant control. In addition, many techniques using supervised learning functions, typified by neural networks, have been proposed so that control can respond flexibly to plant characteristics.
[0003] Because configuring a control device with a supervised learning function requires success cases to be prepared in advance as teacher data, unsupervised learning methods have also been proposed.
[0004] Reinforcement learning is one example of unsupervised learning.
[0005] Reinforcement learning is a learning control framework that generates operation signals to an environment, such as the controlled object, through trial-and-error interaction so that the measurement signals obtained from the environment approach a desired state. Even when successful cases cannot be prepared in advance, it has the advantage that the desired behavior can be learned autonomously simply by defining what a desirable state is.
[0006] In reinforcement learning, a scalar evaluation value calculated from the measurement signals obtained from the environment (called the reward) is used as a guide, and the learning function generates operation signals to the environment so that the expected value of the reward obtained from the current state into the future is maximized. Methods for implementing such a learning function include algorithms such as Actor-Critic, Q-learning, and real-time dynamic programming described in Non-Patent Document 1.
[0007] The Dyna architecture has also been introduced in the same document as a reinforcement learning framework that extends these methods. In it, which operation signals should be generated is learned in advance against a model simulating the controlled object, and the learning result is used to determine the operation signal applied to the controlled object. It also has a model adjustment function that reduces the error between the controlled object and the model.
[0008] As a technique applying reinforcement learning, the technique described in Patent Document 1 can be cited. Multiple reinforcement learning modules, each a pair of a model and a system with a learning function, are prepared; for each module a responsibility signal is computed that takes a larger value the smaller the prediction error between its model and the controlled object, and the operation signal applied to the controlled object is determined by weighting the operation signals generated by the modules in proportion to their responsibility signals.
[0009] In plant control, variation in actual plant data must also be considered. In general the plant state fluctuates, and even in an apparently settled state fine adjustments are constantly repeated by the control system. Because of plant dynamics, the current state is also affected by earlier states. Furthermore, since there are actuator and instrument errors, signal noise, and so on, the plant state (the process values) is usually not exactly the same even under identical operating conditions. That is, plant data contain variation.
[0010] Variation in plant data is addressed in Patent Document 2, which describes a method of adjusting a process simulation model using reinforcement learning and methods of configuring an operation training device and an operation diagnosis device using that model. It describes expressing the fluctuation of a phenomenon with a probability density function such as a normal distribution, and explains that reflecting this fluctuation in simulation conditions and model parameters allows more realistic simulation.
[0011] Patent Document 1: Japanese Patent Application Laid-Open No. 2000-35956
Patent Document 2: Japanese Patent Application Laid-Open No. 2004-178492
Non-Patent Document 1: Reinforcement Learning, Sadayoshi Mikami and Masaaki Minagawa (co-translators), Morikita Publishing Co., Ltd., December 20, 2000
Disclosure of the invention
Problems to be solved by the invention
[0012] As described above, plant data generally vary, so even when a certain index (for example, plant output) is the same, the process state values are not necessarily identical. Therefore, the same operation often does not produce the same result, so a control operation may not only fail to deliver the expected improvement but may conversely worsen the state.
[0013] In particular, when an operation method is learned by reinforcement learning or the like, the operating state may improve in some cases and not improve in others for the same operation in what an index regards as the same state, so the desired (improving) operation cannot necessarily be learned.
[0014] Non-Patent Document 1 and Patent Document 1 do not describe a method for handling variation in plant data. Patent Document 2 describes expressing the variation with a probability density function or the like, but it does not describe countermeasures for the above problem during control operation or during reinforcement learning.
[0015] In view of the above problems, the purpose of the present invention is to provide a control device that reduces the risk that the state fails to improve or worsens during control operation, improves control performance, and enables consistently stable operation.
Means for solving the problem
[0016] The present invention is a plant control apparatus comprising: basic control command calculation means for receiving plant measurement data and calculating operation command values for the plant; an operation result database for accumulating operation data comprising the measurement data and the operation command values; state search means for searching for and extracting similar states based on the current operation data and past operation data; improvement probability calculation means for calculating a frequency distribution or probability distribution from the record of changes in the operating state caused by control operations in the past operation data extracted as similar states, and for calculating the improvement probability or non-improvement probability of the control operation; and operation command determination means for determining the next operation command value based on the calculated improvement probability or non-improvement probability.
Effects of the invention
[0017] Because the present invention comprises the above means, it can reduce the risk that a control operation fails to improve the plant state as a result of the variation present in actual plant operation data. In other words, the risk that a control operation increases the deviation from the control target value (the control deviation) relative to the current state is reduced, so stable operation is always possible.
Best mode for carrying out the invention
[0018] Embodiments are described below with reference to the drawings.
[0019] FIG. 1 shows a first embodiment. The control device 200 receives the measured values 205 of process values from the plant 100 to be controlled, performs calculations programmed in advance in the control device 200 using these values, and sends an operation command signal (control signal) 285 to the plant 100. In accordance with the received operation command signal 285, the plant 100 controls its state by operating actuators such as valve openings and damper openings.
[0020] This embodiment is an example applied to combustion control of a thermal power plant. In particular, an example applied to a control function aimed at reducing the NOx and CO concentrations in the exhaust gas is described.
[0021] FIG. 10 shows the configuration of the thermal power plant to be controlled. Coal used as fuel, primary air for conveying the coal, and secondary air for adjusting combustion are introduced into the boiler 101 through the burners 102, and the coal is burned in the boiler 101. The coal and primary air are led from pipe 134, and the secondary air from pipe 141. After-air for two-stage combustion is introduced into the boiler 101 through the after-air ports 103; this after-air is led from pipe 142.
[0022] The high-temperature gas generated by the combustion of the coal flows along the path of the boiler 101 and then passes through the air heater 104. After harmful substances are removed by the exhaust gas treatment equipment, the gas is released to the atmosphere from the stack.
[0023] The feed water circulating through the boiler 101 is led to the boiler 101 via the feed water pump 105 and is heated by the gas in the heat exchanger 106 to become high-temperature, high-pressure steam. In this embodiment the number of heat exchangers is one, but a plurality of heat exchangers may be arranged.
[0024] The high-temperature, high-pressure steam that has passed through the heat exchanger 106 is led to the steam turbine 108 via the turbine governor 107. The steam turbine 108 is driven by the energy of the steam, and the generator 109 generates electricity.
[0025] Next, the paths of the primary air and secondary air supplied from the burners 102 and of the after-air supplied from the after-air ports 103 are described.
[0026] The primary air is led from the fan 120 into pipe 130, branches into pipe 132, which passes through the air heater, and pipe 131, which does not, merges again in pipe 133, and is led to the mill 110. The air that passes through the air heater is heated by the gas. This primary air is used to convey the coal (pulverized coal) produced in the mill 110 to the burners 102.
[0027] The secondary air and the after-air are led from the fan 121 into pipe 140, heated in the air heater 104, and then branched into pipe 141 for the secondary air and pipe 142 for the after-air, which lead to the burners 102 and the after-air ports 103, respectively.
[0028] The control device 200 has a function of adjusting the air amount supplied from the burners and the air amount supplied from the after-air ports in order to reduce the NOx and CO concentrations.
[0029] The control device 200 comprises basic control command calculation means 230; correction means 250 for changing or correcting the basic operation command value 235 output from the basic control command calculation means 230; an operation result database 240 that accumulates and stores operation result data consisting of the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like; an input/output interface 220 for exchanging data with the plant 100, the operators, and so on; and input/output means 221 with which the operator can view various data and enter set values, operation modes, manual operation commands, and the like.
[0030] The basic control command calculation means 230 has PID (proportional-integral-derivative) controllers as its basic components; it takes the process measurement values 205, operator input signals, command signals from the higher-level control system, and the like as inputs, and calculates and outputs the basic operation command values 235 for the various operating devices such as valves, dampers, and motors installed in the plant 100.
[0031] A feature of this embodiment is that correction means 250 for changing or correcting the basic operation command value 235 is provided. The correction means 250 is described below.
[0032] The correction means 250 consists of state search means 260, improvement probability calculation means 270, and operation command determination means 280. Based on past operation result data, it investigates whether the basic operation command value 235 would improve the plant state in the desired direction, and it has a function to switch, according to that probability, between outputting the current value of the basic operation command value 235 and maintaining the previous value.
[0033] 一般にプラントデータにはばらつきがあるため、ある指標(例えばプラント出力)が同 じレベルであっても、プロセスの状態値が全く同じである場合は少ない。制御装置が 現在の状態を認識するために参照するプロセス値の種類は限られているため、それ らが仮にすベて同じ値であったとしても、他のプロセス値は異なっている場合がある。 また、ァクチユエータの動作や計測器の誤差もデータのばらつきの要因となっている [0034] 従って、制御装置が同じ状態と認識して、同じ操作をしても結果が同じにならない 場合が多い。すなわち、制御操作によって前回は状態が改善したが、今回は逆に状 態が悪ィ匕する場合も起こり得る。また、思ったほどの改善効果が得られない可能性も ある。 [0033] Generally, there is variation in plant data, so even if an index (for example, plant output) is at the same level, there are few cases where the process state values are exactly the same. Since the types of process values referenced by the control unit to recognize the current state are limited, other process values may be different even if they all have the same value. . In addition, the operation of the actuator and the error of the measuring instrument are the factors of the data dispersion. Therefore, there are many cases where the result is not the same even if the control device recognizes the same state and performs the same operation. In other words, the state improved by the control operation last time, but this time, the state may worsen. In addition, it may not be as effective as expected.
[0035] FIG. 4 shows the relationship between an operation parameter X and a process value to be controlled (state quantity A). As described above, operation data vary, so plotting the operation results yields data with a distribution such as that shown in FIG. 4.

[0036] For example, when the current state quantity is b and the basic operation command value 235 is the value indicated by the next operating point, the operation results corresponding to the next operating point have a distribution. If that distribution is the frequency or probability distribution shown on the right side of FIG. 4, the expected value is smaller than the current state quantity b, and there are past cases in which the value fell as low as the minimum value c. However, there is also a probability of the value becoming larger than the current state b, and there are past cases in which it rose as high as the maximum value a.

[0037] The judgment here depends on whether a larger or a smaller value of the state quantity A is desirable. Assuming, for example, that a smaller value is desirable, one could judge that the next operation should be carried out because the expected value is smaller than b. Conversely, however, the state could worsen as far as the maximum value a, so it is necessary to decide whether or not to carry out the next operation.

[0038] If a worsening of the state at the next operation would fatally affect the quality of the products manufactured by the plant or affect the external environment, the risk of deterioration must be eliminated as far as possible.

[0039] Therefore, the correction means 250 automatically analyzes the past operation result data and, based on the result, decides whether to output the currently calculated value of the basic operation command value 235 as it is. In particular, situations in which a deterioration of the state seriously affects the product or the environment, as described above, can be suppressed.
[0040] Next, a concrete algorithm of the correction means 250 is described with reference to FIG. 3. First, in step 500, condition parameters are read. The condition parameters are the process values used as indices for identifying the plant state, the allowable deviation used to define similarity, and the allowable values that serve as criteria for deciding whether output of the currently calculated basic operation command value 235 is permitted for conditions such as the improvement probability.

[0041] In this example, the power generation output value and the opening of the air damper that adjusts the air flow, the manipulated end of combustion control, are used as indices for identifying the plant state. However, this embodiment does not restrict what is used as these indices; conditions such as the fuel flow rate and the feed-water flow rate may be added, and other indices may be used.

[0042] These are entered by the operator through the input/output means 221. The entered data are stored in storage means (not shown) in the control device and are read when step 500 is executed.

[0043] In step 510, the similarity between the past operation data stored in the operation result database 240 and the current state is calculated.
[0044] The similarity is defined by the Euclidean distance. When the coordinates of two data points P and Q are given by (Xp1, Xp2, Xp3, ..., Xpn) and (Xq1, Xq2, Xq3, ..., Xqn), the square of the distance d between the two points is obtained from Equation (1). Here the coordinates Xpi and Xqi are the process values defined as the indices for identifying the plant state.

[0045] When several of these indices are used, the process values have different units, so it is better to normalize each process value, for example by its maximum value, before use.
[0046] [Equation 1]
d^2 = \sum_{i=1}^{n} (X_{pi} - X_{qi})^2    ... (1)
[0047] Using Equation (1), the distance d between the current operating state and each past operation result point is calculated.

[0048] In step 520, the allowable deviation distance d_max that defines similarity, read in step 500, is compared with each distance d calculated in step 510, and only the operation data sets that satisfy the condition of Equation (2) are extracted.
[0049] [Equation 2]
d \leq d_{max}    ... (2)
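As an illustrative sketch only (not taken from the publication), the similarity search of steps 500 to 520 could look as follows, assuming each operation record is a plain dictionary of process values, that the state indices are normalized by their maximum values before Equation (1) is evaluated, and that d_max is the operator-set allowable distance:

```python
import math

def extract_similar_records(current_state, history, state_keys, max_values, d_max):
    """Return the historical records whose normalized Euclidean distance to the
    current state satisfies d <= d_max (Equations (1) and (2))."""
    similar = []
    for record in history:
        d2 = 0.0
        for key in state_keys:
            # Normalize each process value so that differently scaled quantities
            # contribute comparably to the distance.
            xp = current_state[key] / max_values[key]
            xq = record[key] / max_values[key]
            d2 += (xp - xq) ** 2
        if math.sqrt(d2) <= d_max:
            similar.append(record)
    return similar
```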
[0050] In step 530, from the data set extracted in step 520, the frequency distribution of the past NOx or CO concentrations at the air damper opening corresponding to the basic operation command value 235 is calculated, as shown in the graph on the right side of FIG. 4. The air damper opening, which is the manipulated variable, is binned with a predetermined division width. Since the NOx and CO values fluctuate over time, average values over a predetermined time interval are used, and their occurrences are likewise counted with a predetermined division width.

[0051] In step 540, the number of cases in which the NOx or CO concentration fell below the current value in the operating state given by the basic operation command value 235 is obtained from the counted frequency distribution, and this count divided by the total count is taken as the improvement probability.

[0052] Conversely, the fraction of cases in which the concentration rose above the current value is taken as the non-improvement probability, that is, the deterioration probability.

[0053] In addition, the minimum NOx or CO concentration is searched for and taken as the maximum improvement actual value. Similarly, the maximum NOx or CO concentration is searched for and taken as the maximum non-improvement actual value, that is, the maximum deterioration actual value.
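A minimal sketch of the statistics of steps 530 and 540, assuming the extracted records carry the NOx (or CO) value measured after operating at the commanded damper opening; the dictionary keys and helper name are illustrative, not from the publication. Values below the current concentration count as "improvement" for an emission that is to be minimized.

```python
def improvement_statistics(records, value_key, current_value):
    """Summarize past outcomes of the candidate operation relative to the current value."""
    values = [r[value_key] for r in records]
    if not values:
        return None  # no similar operating experience available
    improved = sum(1 for v in values if v < current_value)
    worsened = sum(1 for v in values if v > current_value)
    total = len(values)
    return {
        "improvement_probability": improved / total,
        "non_improvement_probability": worsened / total,
        "max_improvement_value": min(values),      # best (lowest) concentration seen
        "max_non_improvement_value": max(values),  # worst (highest) concentration seen
    }
```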
[0054] In step 550, the criterion values for deciding whether output of the currently calculated basic operation command value 235 is permitted, read in step 500, are compared with the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540, and whether the currently calculated basic operation command value 235 may be output is determined on the basis of predetermined judgment conditions.

[0055] As the judgment condition, a criterion value is set for each of the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, and output of the currently calculated basic operation command value 235 is permitted when all of the criterion values are satisfied.

[0056] Besides such an AND condition, OR conditions and combinations of the two are also conceivable, and other settings may be used. Possible judgment criteria include the maximum improvement value, the maximum non-improvement value, the variance, the mean value, the expected value, and the probability of occurrence of at least a predetermined fraction of the maximum improvement value or the maximum non-improvement value.
[0057] Steps 500, 510, and 520 are executed by the state search means 260, which extracts, from the data 245 accumulated in the operation result database 240, the data 265 that meet the similarity conditions.

[0058] Steps 530 and 540 are executed by the improvement probability calculation means 270 and yield the information 275: the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value.

[0059] Step 550 is executed by the operation command determination means 280. When the currently calculated basic operation command value 235 is permitted on the basis of the result of step 550, the operation command determination means 280 outputs it as the operation command value 285 as it is. When it is not permitted, the previous value of the basic operation command value 235 (that is, the current operation command value) is output as the operation command value 285.
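The permission check of step 550 and the switching between the current and previous command values can be sketched as below; this is a simplified illustration that assumes the statistics dictionary of the previous sketch and operator-set limits, and it shows only one possible AND/OR combination:

```python
def decide_command(stats, limits, new_command, previous_command, use_and=True):
    """Permit or reject the newly calculated basic operation command value."""
    checks = [
        stats["improvement_probability"] >= limits["min_improvement_probability"],
        stats["non_improvement_probability"] <= limits["max_non_improvement_probability"],
        stats["max_non_improvement_value"] <= limits["worst_value_allowed"],
    ]
    permitted = all(checks) if use_and else any(checks)
    # Permitted: pass the newly computed command through.
    # Not permitted: hold the previous command value.
    return new_command if permitted else previous_command
```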
[0060] In the example of the NOx concentration frequency distribution shown in FIG. 5, the past results indicate that there is no possibility of the NOx concentration rising above the current value; in this case it is appropriate to use the currently calculated basic operation command value 235.

[0061] In the example of the NOx concentration frequency distribution shown in FIG. 6, the expected value indicates a decrease in the NOx concentration, but a considerable probability of an increase remains. From the viewpoint of reducing NOx emissions as far as possible, the next operation can be chosen not to be carried out when its risk is judged to be large.

[0062] In this way, measures such as not carrying out an operation whose non-improvement probability exceeds a predetermined value, or not carrying out an operation whose maximum non-improvement actual value is unacceptable, are taken automatically, so the risk of non-improvement caused by a control operation can be reduced.

[0063] This is particularly effective when non-improvement affects the external environment, for example product quality or the emission of harmful substances such as NOx and CO. Safe and stable operation becomes possible, and the approach also contributes to securing product quality, improving yield, and protecting the environment.

[0064] In a thermal power plant, NOx is removed by a denitration device, which requires ammonia for denitration. If the amount of NOx generated can be reduced by the control device of this embodiment, the ammonia consumption can be reduced, and an economic effect of lower operating cost can also be expected. Reducing the amount of NOx generated can furthermore be expected to allow a smaller denitration device and a longer denitration catalyst life.
[0065] In this example, the previous value of the basic operation command value 235 is output when output is not permitted, but the currently calculated basic operation command value 235 may instead be corrected before being output. One correction method is to multiply by a coefficient between 0 and 1 proportional to the deviation, in step 550, between each of the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value calculated in step 540 and its corresponding criterion value; other correction methods may also be used.
[0066] Alternatively, the improvement probability or non-improvement probability for the case of outputting the previous value of the basic operation command value 235 may be calculated in the same way, and whichever of the previous value and the current value has the higher probability of a good result may be selected.
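A minimal sketch of the coefficient correction mentioned in paragraph [0065]; the linear scaling used here is an assumption, since the text only states that the factor lies between 0 and 1 and is proportional to the deviation from the criterion value:

```python
def corrected_command(previous_command, new_command, improvement_probability, criterion):
    """Scale the command change by a 0-1 factor that shrinks as the improvement
    probability falls short of its criterion (illustrative linear rule)."""
    if criterion <= 0:
        return new_command
    shortfall = max(0.0, criterion - improvement_probability)
    factor = max(0.0, 1.0 - shortfall / criterion)
    return previous_command + factor * (new_command - previous_command)
```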
[0067] FIG. 7 shows an example of a screen for setting the judgment conditions that decide whether output of the currently calculated basic operation command value 235 is permitted. The screen shown in FIG. 7 is displayed on the display monitor 223, and the operator enters the settings from the keyboard with mouse 222.

[0068] In FIG. 1, the input/output means 221 consists of the screen display monitor 223 and the keyboard with mouse 222 serving as the input means; other devices such as a voice input/output device or a touch pen can also be used as input/output means for the operator.

[0069] On the setting screen 300, either "always permit a new operation" or "permit conditionally" can be selected; one of the two is chosen by placing a check, with the mouse cursor 301, in the corresponding check box. When one is checked, the check on the other is cleared and no further input to it is possible.

[0070] When "permit conditionally" is selected, the condition statements "judge by improvement probability or non-improvement probability", "judge by expected improvement value", "judge by maximum improvement result", and "judge by maximum non-improvement result" are presented, and a condition is selected by checking the check box in front of the condition statement to be used. Whether the condition statements are combined as an "AND condition" or an "OR condition" can also be selected.

[0071] The setting conditions are not limited to these, and other conditions may be added.

[0072] When "always permit" is selected, the "finish setting" button 304 at the bottom of the screen becomes selectable, and clicking it with the mouse pointer 301 ends the setting. When "permit conditionally" is selected, the "to condition parameter setting screen" button 303 becomes selectable; clicking it with the mouse pointer 301 advances to the condition parameter setting screen.

[0073] The setting screen 300 also has a "back" button 302, and clicking it returns the screen to the state before the setting. Since no condition is set at first, the "back" button 302 is disabled until a condition has been set.

[0074] However, an initial setting may be prepared as a default and used as the initial state, and a default setting may also be added as a condition on the setting screen 300.

[0075] When "permit conditionally" is selected and the "to condition parameter setting screen" button 303 is clicked, the screen shown in FIG. 8 is displayed. On the screen of FIG. 8, the target process value can be selected from the pull-down menu 305 at the upper right of the screen, and the process value selected in the pull-down menu 305 can be divided into levels. In the example of FIG. 8, NOx is selected as the process value, and it can be divided into several levels by entering upper and lower limits of the NOx value. Each level is managed as a condition number.
[0076] When the level setting is completed, the "next" button 306 is clicked to move to a screen for setting an allowable value for each level (condition number).

[0077] An example of the allowable value setting screen is shown in FIG. 9. The condition number is selected from the pull-down menu 307 at the upper right of the screen.

[0078] FIG. 9 is an example of a screen for setting the allowable values for "judge by improvement probability or non-improvement probability" and "judge by expected improvement value". The allowable values to be set are the items checked on the setting screen 300, and the corresponding allowable value setting screens are displayed automatically.

[0079] The allowable value for "judge by improvement probability or non-improvement probability" can be changed as a proportion by mouse operation on the horizontal bar. A numerical value can also be entered directly from the keyboard in the non-improvement probability or improvement probability field. When a value is entered in one of the two fields, the other automatically displays the difference between 100% and the entered value.

[0080] The allowable value for "judge by expected improvement value" can likewise be changed as a proportion by mouse operation on the horizontal bar. Here the proportion corresponds to the level range of the process value, such as the NOx value, set on the screen of FIG. 8. On the screen of FIG. 9, the upper and lower NOx limits for this allowable value can also be entered directly.

[0081] Although not shown, allowable values can be set in the same way for "judge by maximum improvement result" and "judge by maximum non-improvement result".

[0082] The results of operating with the conditions set as described above can be confirmed on a screen such as the NOx operation result trend display example of FIG. 11. In FIG. 11, the power generation output value, the NOx generation value, and whether each operation decreased or increased the NOx (that is, improvement or non-improvement) are displayed as time-series graphs. The proportions of improvement cases and non-improvement cases are also displayed in percent.

[0083] Confirming the results with such graphs makes it easy to review the condition settings. Clicking the "condition setting confirmation" button 308 displays the contents set on the setting screen 300 and subsequent screens in a separate window. Clicking the "condition setting change" button 309 displays the setting screen 300 so that the settings can be changed.
[0084] In the example of this embodiment, the process values to be controlled are the NOx and CO concentrations, but the invention is not limited to these. The amounts of CO2, SOx, and Hg (mercury) in the gas, fluorine, particulates consisting of dust or mist, VOCs (volatile organic compounds), or the steam temperature, steam pressure, generator output, efficiency, and the like may also be targeted. Combinations of several of these may also be used as an "AND condition" or an "OR condition".
[0085] Next, the second embodiment is described with reference to FIG. 2. The difference from the first embodiment shown in FIG. 1 is that the correction means 250 of the control device 200 of this embodiment is provided with reinforcement learning means 290. The reinforcement learning means 290 has the function of learning, by reinforcement learning theory, an appropriate operation method corresponding to the plant state, using the operation data accumulated in the operation result database 240.

[0086] A detailed explanation of reinforcement learning theory is given, for example, in "Reinforcement Learning", translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing Co., Ltd., December 20, 2000; only the concept of reinforcement learning is described here.

[0087] FIG. 12 shows the concept of control based on reinforcement learning theory. The control device 610 outputs an operation command 630 to the controlled object 600, and the controlled object 600 operates according to the control command 630. The operation according to the control command 630 changes the state of the controlled object 600. The control device 610 receives from the controlled object 600 a reward 620, a quantity indicating whether the changed state is desirable or undesirable for the control device 610 and to what degree.

[0088] In practice, the information received from the controlled object consists of its state quantities, and the control device 610 generally calculates the reward from them. In general, the reward is set so that it becomes larger the closer the state is to the desired one and smaller the more undesirable the state becomes.

[0089] The control device 610 performs operations by trial and error and learns an operation method that maximizes the reward (that is, that approaches the desired state as closely as possible), whereby appropriate operation (control) logic corresponding to the state of the controlled object 600 is constructed automatically.
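The interaction of FIG. 12 can be summarized by the following skeleton; the environment and policy interfaces are assumptions for illustration, and any concrete update rule (Q-learning, Actor-Critic, and so on) could sit behind policy.update:

```python
def reinforcement_learning_loop(environment, policy, steps):
    """Trial-and-error interaction loop: act, observe the reward, improve the policy."""
    state = environment.observe()
    for _ in range(steps):
        command = policy.select(state)             # trial-and-error choice of operation
        next_state = environment.apply(command)    # controlled object reacts to the operation
        reward = environment.reward(next_state)    # scalar: larger for more desirable states
        policy.update(state, command, reward, next_state)
        state = next_state
    return policy
```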
[0090] Supervised learning theory, typified by neural networks, requires success cases to be provided in advance as teacher data and is unsuitable when the plant is new or the phenomena are so complicated that success cases cannot be prepared in advance.

[0091] Reinforcement learning theory, in contrast, is classified as unsupervised learning; because it has the ability to generate desirable operations by itself through trial and error, it has the advantage of being applicable even when the characteristics of the controlled object are not necessarily clear.

[0092] The second embodiment makes use of this reinforcement learning theory.

[0093] FIG. 13 shows the configuration of the reinforcement learning means 290, which consists of modeling means 291 and learning means 292.
[0094] Reinforcement learning learns by trial and error, but in plant control, operating directly on the actual plant by trial and error is difficult to realize because of the danger to operation and the damage to the products manufactured by the plant. Therefore, an operating characteristic model is created from the operation results of the plant, and learning is performed against this model.

[0095] The modeling means 291 reads past operation data from the operation result database 240 and learns the input/output relationship with a neural network consisting of an input layer, an intermediate layer, and an output layer, using the error back-propagation method. The configuration and learning method of the neural network are standard, and other methods may be used; since the invention does not depend on the network configuration or the learning method, a detailed description is omitted here.

[0096] The input data are the air flow rate at each burner and after-air port position, the fuel flow rate of each burner, and the generator output; the output data are the NOx and CO concentrations.

[0097] In this example the relationships between the fuel flow rate, air flow rate, and power generation output and the NOx and CO concentrations are modeled, but the input and output items are not limited to these. The modeling method is not limited to a neural network either; other statistical models such as regression models may be used.
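For illustration, the plant characteristic model of the modeling means could be fitted as below; scikit-learn's MLPRegressor stands in for the three-layer network trained by back-propagation described above, which is an assumption, and the arrays shown are placeholders for real operation records:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: one row per historical record -> [air flows per port..., fuel flows per burner..., generator output]
# Y: corresponding [NOx, CO] measurements
X = np.random.rand(200, 10)   # placeholder operation data
Y = np.random.rand(200, 2)    # placeholder NOx / CO values

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
model.fit(X, Y)

# Predict NOx and CO for a candidate set of operating conditions.
candidate = np.random.rand(1, 10)
predicted_nox, predicted_co = model.predict(candidate)[0]
```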
[0098] The learning means 292 supplies to the model created by the modeling means 291 the input data 293, which consist of the air flow rate at each burner and after-air port position and the fuel flow rate of each burner. The input data 293 correspond to the operating conditions of the plant; for each quantity, upper and lower limits, a change width (step width), and the maximum change width allowed in one operation are set. Each quantity of the input data 293 is determined randomly within its allowable range.

[0099] The modeling means 291 feeds the input data 293 into the completed model and calculates the NOx and CO concentrations, which become the output data 294.

[0100] The learning means 292 receives the output data 294 and calculates the reward value.

[0101] The reward is defined by Equation (3), where R is the reward value, O_NOx is the NOx value, O_CO is the CO value, S_NOx and S_CO are the target set values for NOx and CO, and k1, k2, k3, and k4 are positive constants.

[0102] [Equation 3]
R = R1 + R2 + R3 + R4    ... (3)
R1 = k1 (O_NOx ≤ S_NOx);  R1 = 0 (O_NOx > S_NOx)
R2 = k2 (O_CO ≤ S_CO);  R2 = 0 (O_CO > S_CO)
R3 = k3 (S_NOx − O_NOx) (O_NOx < S_NOx);  R3 = 0 (O_NOx ≥ S_NOx)
R4 = k4 (S_CO − O_CO) (O_CO < S_CO);  R4 = 0 (O_CO ≥ S_CO)

[0103] As shown in Equation (3), when the NOx and CO values fall below their target set values, the rewards R1 and R2 are given, and a further reward proportional to the amount by which the value falls below the target set value is also given.
[0104] Various other methods of defining the reward are conceivable, and the definition is not limited to the method of Equation (3).
[0105] Since the learning means 292 learns the combination of the input data 293, that is, the manipulated variables, that maximizes the reward calculated by Equation (3), it ends up learning a combination of manipulated variables that reduces NOx and CO in accordance with the current state.
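A minimal sketch of the Equation (3) reward used by the learning means, with the constants k1 to k4 and the target set values passed in as parameters; the function name is illustrative:

```python
def reward_eq3(o_nox, o_co, s_nox, s_co, k1, k2, k3, k4):
    """Reward of Equation (3): step bonuses for meeting the targets plus terms
    proportional to how far the emissions fall below their targets."""
    r1 = k1 if o_nox <= s_nox else 0.0
    r2 = k2 if o_co <= s_co else 0.0
    r3 = k3 * (s_nox - o_nox) if o_nox < s_nox else 0.0
    r4 = k4 * (s_co - o_co) if o_co < s_co else 0.0
    return r1 + r2 + r3 + r4
```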
[0106] After learning is completed, the learning means 292 reads the measured values 205, the operation data at the current time, and, on the basis of the learning result, outputs the manipulated variable 295 that maximizes the reward of Equation (3).
[0107] The state search means 260 is the same as in the first embodiment described above. In the improvement probability calculation means 270, in step 530 of FIG. 3, two frequency distributions are calculated: the distribution for the currently calculated basic operation command value 235 and, in addition, the distribution for the manipulated variable 295, the output of the learning means 292. In step 540, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value are calculated from the counted frequency distributions for both the currently calculated basic operation command value 235 and the manipulated variable 295. These points differ from the first embodiment, but the calculation methods themselves are the same.

[0108] The function of the operation command determination means 280 is also basically the same as in the first embodiment, but differs in the following respects.

[0109] In this embodiment, in step 550 of FIG. 3, the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value for the currently calculated basic operation command value 235 and for the manipulated variable 295 are compared with their respective criterion values.

[0110] Whether each of the two signals can be permitted as the next operation command is judged on the basis of preset conditions, as in the first embodiment.

[0111] As a result, when both are judged not permitted, the previous value of the basic operation command value 235 is output as the operation command value 285. When only one of them is judged permitted, the permitted signal is output as the operation command value 285. When both are judged permitted, a condition for deciding which to select is defined in advance, and one of them is selected on that basis.

[0112] As the selection method, the one with the larger expected improvement value is selected. Selecting the one with the larger maximum improvement actual value, or the one with the smaller maximum non-improvement actual value, is also conceivable, and methods other than this example may be set.
[0113] FIG. 14 shows the circuit with which the operation command determination means 280 selects the current operation command value 285. The subtractor 281 calculates the deviation signal 287 between the current value of the basic operation command value 235 and the manipulated variable 295, the reinforcement learning result, and the adder 284 adds this to the current value of the basic operation command value 235 to create the reinforcement learning operation command value 288.

[0114] If the operation command value 285 derived from the output of the reinforcement learning means 290 becomes abnormal because of abnormal input data or an abnormality in the arithmetic circuit, the coefficient by which the multiplier 283 multiplies the deviation signal 287 is set to zero, so that the reinforcement learning operation command value 288 becomes equal to the current value of the basic operation command value 235; the risk of erroneously outputting an abnormal signal is thereby reduced.

[0115] Whether the operation command value 285 is abnormal is judged by upper and lower limit checks on the input and output data of the reinforcement learning means 290 and by upper and lower limit checks on their rates of change. If even one of them deviates from the preset limits, the output signal of the switch 282 is set to 0, preventing the output of a possibly abnormal operation command value 285. In all other cases the switch 282 sets its output signal to 1.

[0116] The switch 286 receives the judgment result of step 550 and selects and outputs one of the reinforcement learning operation command value 288, the current value of the basic operation command value 235, and the previous value of the basic operation command value 235.

[0117] When the operation command value 285 is abnormal, the reinforcement learning operation command value 288 is excluded from the selection candidates. Therefore, when the operation command value 285 is abnormal, either the current value or the previous value of the basic operation command value 235 is output, and operational safety is ensured. Moreover, as described above, the output signal of the switch 282 is set to 0 when the operation command value 285 is abnormal, so even if the switch 286 should select the reinforcement learning operation command value 288, no abnormal signal is output; safety is thus ensured doubly.
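The selection logic of FIG. 14 can be sketched as follows; the validity flag is assumed to come from the range and rate-of-change checks described above, the signal names follow the reference numerals in the text, and the tie-break between two permitted candidates (selecting the larger expected improvement) is omitted for brevity:

```python
def select_operation_command(basic_current, basic_previous, rl_output,
                             rl_output_valid, permit_rl, permit_basic):
    """Simplified stand-in for the subtractor 281, multiplier 283, adder 284 and switch 286."""
    gain = 1.0 if rl_output_valid else 0.0      # multiplier 283: zero out an invalid RL output
    deviation_287 = rl_output - basic_current   # subtractor 281
    rl_command_288 = basic_current + gain * deviation_287  # adder 284
    # Switch 286: choose among RL command, current basic value, previous value.
    if rl_output_valid and permit_rl:
        return rl_command_288
    if permit_basic:
        return basic_current
    return basic_previous
```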
[0118] By adding the reinforcement learning means 290 as described above, better operations corresponding to the characteristics of the actual plant can be learned, so the control performance is further improved. Even if the calculation result of the reinforcement learning means 290 becomes abnormal for some reason, the double safety measures maintain stable operation of the plant, so reliability is also ensured.

[0119] Next, the third embodiment is described with reference to FIG. 15. It differs from the second embodiment described above in that the information 275, namely the improvement probability or non-improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, is output to the reinforcement learning means 290.
[0120] The reinforcement learning means 290 calculates the reward using the information 275. The reward calculation formula is shown in Equation (4). Here R1, R2, R3, and R4 are the same as in Equation (3); P_a_NOx and P_a_CO are the improvement probabilities for NOx and CO; O_Pmax_NOx and O_Pmax_CO are the maximum improvement actual values for NOx and CO; O_Pmin_NOx and O_Pmin_CO are the maximum non-improvement actual values for NOx and CO; S1_Pa_NOx, S2_Pa_NOx, S1_OPmax_NOx, S1_OPmin_NOx, S1_Pa_CO, S2_Pa_CO, S1_OPmax_CO, and S1_OPmin_CO are setting parameters; and k5 to k12 are positive constants.

[0121] [Equation 4]
R = R1 + R2 + ... + R12    ... (4)
R5 = k5 (P_a_NOx ≥ S1_Pa_NOx);  R5 = 0 (P_a_NOx < S1_Pa_NOx)
R6 = k6 (100·P_a_NOx − S2_Pa_NOx) (100·P_a_NOx ≥ S2_Pa_NOx);  R6 = 0 otherwise
R7 = k7 (O_Pmax_NOx ≥ S1_OPmax_NOx);  R7 = 0 otherwise
R8 = k8 (O_Pmin_NOx ≤ S1_OPmin_NOx);  R8 = 0 otherwise
R9 = k9 (P_a_CO ≥ S1_Pa_CO);  R9 = 0 (P_a_CO < S1_Pa_CO)
R10 = k10 (100·P_a_CO − S2_Pa_CO) (100·P_a_CO ≥ S2_Pa_CO);  R10 = 0 otherwise
R11 = k11 (O_Pmax_CO ≥ S1_OPmax_CO);  R11 = 0 otherwise
R12 = k12 (O_Pmin_CO ≤ S1_OPmin_CO);  R12 = 0 otherwise
with R1 to R4 given by Equation (3).
[0122] Because the reward is calculated using Equation (4), the reward becomes large when the improvement probability is large, the maximum improvement result is large, and the maximum non-improvement actual value is small. Operations satisfying these conditions are therefore learned automatically at the reinforcement learning stage, so operations with little risk of non-improvement can be learned.
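How the Equation (3) reward can be extended with the improvement statistics of the information 275 is sketched below for the NOx terms; the exact term structure shown here (threshold bonuses plus a probability-proportional term, mirrored for CO) is an assumption consistent with paragraph [0122] rather than a verbatim restatement of Equation (4), and all names are illustrative:

```python
def reward_eq4_nox_terms(p_improve_nox, max_improve_nox, max_worsen_nox,
                         s1_p, s2_p, s1_max, s1_min, k5, k6, k7, k8):
    """Additional NOx reward terms driven by past improvement statistics (assumed form)."""
    r5 = k5 if p_improve_nox >= s1_p else 0.0                       # improvement likely enough
    r6 = (k6 * (100.0 * p_improve_nox - s2_p)
          if 100.0 * p_improve_nox >= s2_p else 0.0)                # reward grows with probability
    r7 = k7 if max_improve_nox >= s1_max else 0.0                   # deep improvement seen in the past
    r8 = k8 if max_worsen_nox <= s1_min else 0.0                    # worst past outcome still acceptable
    return r5 + r6 + r7 + r8
```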
[0123] The definition of the reward calculation method shown in Equation (3) or Equation (4) and the setting parameters required for the reward definition can be entered and set through the input/output means 221 on an input screen similar to the setting screen 300.

[0124] Since the manipulated variable 295 is an operation command value determined with the improvement probability, the maximum improvement actual value, the maximum non-improvement actual value, and other information taken into account, the operation command determination means 280 normally outputs the manipulated variable 295 as the operation command value 285 at all times.

[0125] However, when the manipulated variable 295 becomes abnormal because of missing input data to the reinforcement learning means 290 or a failure of the arithmetic circuit, the operation command determination means 280 outputs the current value of the basic operation command 235 as the operation command value 285.

[0126] As described above, in this embodiment the reward for reinforcement learning is calculated using information such as the improvement probability, the maximum improvement actual value, and the maximum non-improvement actual value, so an operation method that, based on the past operation results, has a large effect and reduces the non-improvement risk can be learned automatically.
[0127] The embodiments described above can reduce the risk that a control operation fails to improve the plant state because of the scatter in the actual plant operation data. That is, the risk that a control operation increases the deviation from the control target value (the control deviation) beyond the current state can be reduced, so consistently stable operation becomes possible.

[0128] In addition, since the user can set the improvement probability or non-improvement probability of an operation as the index for deciding the next control operation, operation in line with the operating policy of the plant operators or management becomes possible. The control characteristics can also easily be changed when the operating characteristics are changed during operation.

[0129] For example, in the commissioning stage the non-improvement risk can be tolerated and the control parameters tuned while pursuing desirable operating procedures by trial and error, whereas in full operation, stable operation that excludes the non-improvement risk as far as possible can be pursued.

[0130] Depending on the type of plant, stable operation may be the first requirement even if settling takes a long time, while in other cases it may be preferable to reach the target state quickly even with some fluctuation. Similarly, while the control deviation is still fairly large, the settings can be chosen to reduce the non-improvement risk so that the state does not deteriorate further, and once the control deviation has become small, operation that accepts some non-improvement risk in pursuit of an even more desirable state also becomes possible.

[0131] The control device can thus respond flexibly to user requirements and plant conditions, and reducing the non-improvement risk improves operational safety, product quality, yield, and so on.

[0132] When applied to the combustion control of a thermal power plant, the risk of an increase in environmentally harmful substances such as NOx and CO can be reduced, enabling environmentally friendly operation. As a result, the amount of ammonia used in the denitration device can be reduced, and economic effects such as longer-lasting catalyst activity can also be expected.

[0133] The embodiments above describe a control device for a thermal power plant, but the control device described above can also be used for other plants such as manufacturing plants.
Brief Description of Drawings
[0134]
FIG. 1 is a diagram illustrating the configuration of the control device according to the first embodiment.
FIG. 2 is a diagram illustrating the configuration of the control device according to the second embodiment.
FIG. 3 is a diagram explaining the calculation procedure of the correction means.
FIG. 4 is a diagram explaining an example of the relationship between the distribution of plant data and an operation parameter.
FIG. 5 is a diagram explaining an example of a NOx concentration frequency distribution.
FIG. 6 is a diagram explaining an example of a NOx concentration frequency distribution.
FIG. 7 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 8 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 9 is a diagram explaining an example of a setting screen for the operation command value output permission conditions.
FIG. 10 is a diagram explaining the configuration of a thermal power plant.
FIG. 11 is a diagram showing an example of a trend display of NOx operation results.
FIG. 12 is a diagram explaining the concept of reinforcement learning.
FIG. 13 is a diagram explaining the configuration of the reinforcement learning means.
FIG. 14 is a diagram explaining the operation command signal selection circuit.
FIG. 15 is a diagram illustrating the configuration of the control device according to the third embodiment.
Explanation of Reference Numerals
[0135] 100: plant; 200: control device; 220: input/output interface; 221: input/output means; 230: basic control command calculation means; 240: operation result database; 250: correction means; 260: state search means; 270: improvement probability calculation means; 280: operation command determination means

Claims

[1] A plant control device comprising:
basic control command calculation means that receives measurement data of a plant and calculates an operation command value for the plant;
an operation result database that accumulates operation data including the measurement data and the operation command values;
state search means that searches for and extracts similar states on the basis of the current operation data and the past operation data;
improvement probability calculation means that, for the past operation data extracted as similar states by the state search means, calculates a frequency distribution or probability distribution from the actual changes in the operating state caused by control operations and calculates an improvement probability or non-improvement probability of the control operation; and
operation command determination means that determines the next operation command value on the basis of the improvement probability or non-improvement probability calculated by the improvement probability calculation means.

[2] The plant control device according to claim 1, further comprising improvement probability reference value input means for inputting allowable reference values, each being an upper limit or a lower limit for a value calculated by the improvement probability calculation means.

[3] The plant control device according to claim 1, further comprising reinforcement learning means that calculates a predefined reward and learns a plant operation method on the basis of the reward value, wherein the operation command determination means determines the next operation command value by correcting the operation command value calculated as a result of the reinforcement learning.

[4] The plant control device according to claim 3, further comprising operation learning means that determines the reward value calculated during reinforcement learning and learns the plant operation method on the basis of the reward value.

[5] The plant control device according to claim 1, wherein the control device is a control device for a thermal power plant;
the state search means, when searching for similar plant states, searches the operation result database for similar states using, as state indices, any of the generator output or load factor, the fuel flow rate, the feed-water flow rate, the total air flow rate of the burners or air ports, the individual air flow rate of each air port, and the burner ignition position; and
the improvement probability calculation means calculates a frequency distribution or probability distribution of the actual changes, caused by control operations, in any of the state quantities of NOx, CO, CO2, SOx, Hg, fluorine, and particulates consisting of dust or mist in the exhaust gas, the state quantity of VOCs (volatile organic compounds), the steam temperature, the steam pressure, the generator output, and the efficiency, and calculates at least one of the improvement probability of the operation, the non-improvement probability, the variance, the mean value, the expected value, the maximum expected improvement value, the maximum expected non-improvement value, and the probability of occurrence of at least a predetermined fraction of the maximum expected improvement value or the maximum expected non-improvement value.

[6] The thermal power plant control device according to claim 5, further comprising reinforcement learning means that calculates a predefined reward and learns an operation method of the thermal power plant on the basis of the reward value, wherein the operation command determination means determines the next operation command value by correcting the operation command value calculated as a result of the reinforcement learning.

[7] The thermal power plant control device according to claim 5, wherein the operation command value concerns any of the fuel flow rate supplied to the burners, the burner air flow rate, the air flow rate supplied to the air ports, the gas recirculation amount, the burner angle, and the supply air temperature.

[8] The plant control device according to claim 1, wherein the improvement probability calculation means calculates any of the maximum improvement value, the maximum non-improvement value, the variance, the mean value, the expected value, and the probability of occurrence of at least a predetermined fraction of the maximum improvement value or the maximum non-improvement value, and the operation command determination means determines the next operation command value on the basis of the value calculated by the improvement probability calculation means.
PCT/JP2007/050683 2006-03-31 2007-01-18 Plant controller WO2007116591A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006-098518 2006-03-31
JP2006098518A JP4665815B2 (en) 2006-03-31 2006-03-31 Plant control equipment

Publications (1)

Publication Number Publication Date
WO2007116591A1 true WO2007116591A1 (en) 2007-10-18

Family

ID=38580893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/050683 WO2007116591A1 (en) 2006-03-31 2007-01-18 Plant controller

Country Status (2)

Country Link
JP (1) JP4665815B2 (en)
WO (1) WO2007116591A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015219793A (en) * 2014-05-20 2015-12-07 株式会社日立製作所 Electronic control device

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4427074B2 (en) 2007-06-07 2010-03-03 株式会社日立製作所 Plant control equipment
US8135653B2 (en) 2007-11-20 2012-03-13 Hitachi, Ltd. Power plant control device which uses a model, a learning signal, a correction signal, and a manipulation signal
JP4627553B2 (en) 2008-03-28 2011-02-09 株式会社日立製作所 Plant control device and thermal power plant control device
EP2336637A1 (en) * 2009-12-14 2011-06-22 ABB Research Ltd. System and associated method for monitoring and controlling a power plant
JP5918663B2 (en) * 2012-09-10 2016-05-18 株式会社日立製作所 Thermal power plant control device and control method
JP6400511B2 (en) * 2015-03-16 2018-10-03 株式会社日立製作所 Data processing apparatus and plant for plant control
JP6813416B2 (en) * 2017-04-10 2021-01-13 株式会社日立製作所 Plant control device and its control method, rolling mill control device and its control method and program
JP7086692B2 (en) 2018-04-19 2022-06-20 三菱重工業株式会社 Plant control equipment, plants, plant control methods and plant control programs
JP7187961B2 (en) * 2018-10-12 2022-12-13 富士通株式会社 Reinforcement learning program, reinforcement learning method, and reinforcement learning device
JP7033639B2 (en) * 2020-12-17 2022-03-10 株式会社日立製作所 Plant control device and its control method, rolling mill control device and its control method and program
JP7487704B2 (en) 2021-04-28 2024-05-21 横河電機株式会社 EVALUATION APPARATUS, EVALUATION METHOD, EVALUATION PROGRAM, CONTROL APPARATUS, AND CONTROL PROGRAM
JP7478297B1 (en) 2023-09-08 2024-05-02 三菱重工業株式会社 Information processing system, information processing method, learning system, and learning method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02306364A (en) * 1989-05-22 1990-12-19 Nkk Corp Learning control method
JPH02308302A (en) * 1989-05-24 1990-12-21 Hitachi Ltd Process state managing device
JPH07191706A (en) * 1993-12-27 1995-07-28 Nkk Corp Identification method for cause/effect model and learning method for knowledge for control
JPH07219626A (en) * 1994-02-04 1995-08-18 Toshiba Corp Plant control unit and tunnel ventilation control unit
JPH08286727A (en) * 1995-02-17 1996-11-01 Nippon Steel Corp Automatic preset device for manipulated variable of process control
JP2004013393A (en) * 2002-06-05 2004-01-15 Hitachi Ltd Preset control method and controller of control system using control model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63196896A (en) * 1987-02-12 1988-08-15 株式会社東芝 Nuclear reactor output controller
JP3432612B2 (en) * 1994-09-30 2003-08-04 バブコック日立株式会社 Plant operation control device
JP2004178492A (en) * 2002-11-29 2004-06-24 Mitsubishi Heavy Ind Ltd Plant simulation method using enhanced learning method

Also Published As

Publication number Publication date
JP2007272646A (en) 2007-10-18
JP4665815B2 (en) 2011-04-06

Similar Documents

Publication Publication Date Title
WO2007116591A1 (en) Plant controller
JP4573783B2 (en) Plant control apparatus and control method, thermal power plant and control method therefor
JP4270218B2 (en) Control device for control object having combustion device, and control device for plant having boiler
US8554706B2 (en) Power plant control device which uses a model, a learning signal, a correction signal, and a manipulation signal
JP5251938B2 (en) Plant control device and thermal power plant control device
US7813819B2 (en) Control system for control subject having combustion unit and control system for plant having boiler
CA2820216C (en) Optimized integrated controls for oxy-fuel combustion power plant
US7219040B2 (en) Method and system for model based control of heavy duty gas turbine
US7389151B2 (en) Systems and methods for multi-level optimizing control systems for boilers
JP4627553B2 (en) Plant control device and thermal power plant control device
JP2008146371A (en) Controller of boiler plant
JP5918663B2 (en) Thermal power plant control device and control method
JP5503563B2 (en) Plant control device and thermal power plant control device
JP5639613B2 (en) Plant control device and thermal power plant control device
CN104075340A (en) Low-nitrogen combustion control method and system based on PLC
CN108073145B (en) Operation support device and recording medium
JP5410480B2 (en) Plant control equipment
JP7374590B2 (en) KPI improvement support system and KPI improvement support method
JP5378288B2 (en) Plant control device and thermal power plant control device
JP7368332B2 (en) Guidance operation support device, guidance operation support method, and guidance operation support program
JP7222943B2 (en) Operation improvement support system, operation improvement support method, and operation improvement support program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07706987

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07706987

Country of ref document: EP

Kind code of ref document: A1