WO2007116592A1

WO2007116592A1 - Plant control device

Info

Publication number: WO2007116592A1
Application number: PCT/JP2007/050684
Authority: WO
Inventors: Naohiro Kusumi; Akihiro Yamada; Takaaki Sekiai; Yoshiharu Hayashi; Masayuki Fukai; Satoru Shimizu
Original assignee: Hitachi, Ltd.
Priority date: 2006-03-30
Filing date: 2007-01-18
Publication date: 2007-10-18
Also published as: JP4741968B2; JP2007272361A

Abstract

Provided is a plant control device capable of easily creating a simulation model. The plant control device includes: a model (500) for predicting a value of a measurement signal (1) obtained from a control object (100) when a predetermined operation signal (16) is given to the control object (100); a learning unit (600) for learning a generation method of a model input (12) to be given to the model (500) so that a model output (13) as the prediction result of the model (500) is converged at a model output target value; and an operation signal generation unit (300) for generating an operation signal (15) according to the result of the learning unit (600). The operation signal (15) generated by the operation signal generation unit (300) is used as an operation signal (16). The plant control device further includes: an external input interface (210) for acquiring the measurement signal (1) and a measurement signal database (230) for storing the value of a measurement signal (2). The learning unit (600) calculates an average and variance of the measurement signals stored in the measurement signal database (230). By using the result of the average and the variance, the operation signal generation unit (300) corrects the operation signal (15).

Description

Specification

Plant control device

Technical field

[0001] The present invention relates to a control apparatus for a plant such as power generation equipment, and more particularly to a control apparatus suitable for controlling boiler equipment.

Background art

In the control of a plant such as a thermal power generation facility, a measurement signal obtained from the plant to be controlled is usually processed to calculate an operation signal to be given to the controlled object. For this reason, the controller implements an algorithm that calculates the operation signal so that the plant measurement signal obtained from the controlled object achieves the operation target.

[0003] As a control algorithm used for plant control, the so-called PI (proportional-integral) control algorithm has long been known. The PI algorithm derives the operation signal by multiplying the deviation between the operation target value and the measurement signal by a proportional gain and adding to that product the value obtained by time-integrating the deviation. Alternatively, the plant operation signal may be derived using a learning algorithm.
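
The PI calculation described above can be illustrated with a minimal Python sketch. It is not taken from this specification: the class name, the separate integral gain `ki`, the rectangular integration rule, and the sampling period `dt` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Illustrative PI controller (names and gains are assumptions)."""
    kp: float              # proportional gain, to be tuned for the plant
    ki: float              # integral gain (assumed; the text only names the proportional gain)
    integral: float = 0.0  # running time integral of the deviation

    def step(self, target: float, measurement: float, dt: float) -> float:
        """Return the operation signal for one sampling period of length dt."""
        deviation = target - measurement
        # Rectangular (forward Euler) approximation of the time integral.
        self.integral += deviation * dt
        return self.kp * deviation + self.ki * self.integral
```

Calling `step` once per sampling period with the operation target value and the latest measurement returns the operation signal for that period. The gains are exactly the kind of parameters that have to be tuned to the controlled object beforehand.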

[0004] The control algorithm used for plant control has several parameters, and these parameters need to be tuned in advance to values suitable for the controlled object. In general, this tuning is carried out on a model that simulates the controlled object, built as either a physical model or a statistical model.

[0005] Here, particularly in the case of a statistical model, there is a well-known method that uses a so-called neural network, which artificially imitates the network structure of human neurons: each neuron is represented by an element called a node that is modeled by a linear or nonlinear function, the nodes are arranged in layers, and signals are transmitted from one layer to the next.

[0006] A model using a neural network is given an input signal together with a desired output signal as a teacher signal, and the parameters in the model are adjusted so that the desired output signal is produced. When such a model is used to simulate the controlled object, the operation signal input to the controlled object serves as the input signal of the model, and the measurement signal from the plant serves as the output signal of the model.

[0007] As methods for adjusting the parameters of a model with this basic neural network structure, there are, for example, the back propagation method and, as a learning method for neural networks having a feedback mechanism, the Back Propagation Through Time method (for example, see Non-Patent Document 1).

[0008] On the other hand, unlike learning in which a teacher signal is given as in a neural network, a technique called reinforcement learning has been actively studied in the field of unsupervised learning. Reinforcement learning is a framework of learning control that adapts to the environment through trial and error: the learner observes the state of the environment, acts on it, and receives a reward according to the result. The more appropriate the action is for the environment, or the closer it comes to the goal set for the environment, the larger the reward.

[0009] Therefore, actions that obtain more reward are selected, and as a result the learner adapts toward actions that reach the goal set for the environment. The entity that selects actions so as to obtain more reward from the environment is generally called an agent. If the environment is regarded as the controlled object and the agent as the controller, this is known as a framework of learning control in which, through trial-and-error interaction with the controlled object, the controller learns how to generate the operation signal given to the controlled object so that the measurement signal obtained from the controlled object becomes desirable.

[0010] In reinforcement learning, a scalar evaluation value calculated from the signals obtained from the controlled object (called the reward in reinforcement learning) is used as a clue, and the operation signal generation method is learned so that the expected value of the evaluation values obtained from the current state into the future is maximized. As such a method, it is known to give a positive evaluation value when the measurement signal achieves the operation target value and to learn with an algorithm such as Actor-Critic, Q-learning, or real-time dynamic programming (for example, see Non-Patent Document 2).
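
As an illustration of learning from a positive evaluation value, the following Python sketch applies tabular Q-learning to a toy problem. Everything in it is an assumption made for the example rather than part of this specification: the five discretised measurement levels, the target level, the three actions, the stand-in plant function, and the learning parameters.

```python
import random

N_STATES = 5            # hypothetical discretised measurement levels 0..4
TARGET = 3              # hypothetical operation target level
ACTIONS = (-1, 0, 1)    # decrease, hold, or increase the operation signal


def toy_plant(state: int, action: int) -> int:
    """Hypothetical stand-in for the controlled object (clamped to the valid range)."""
    return max(0, min(N_STATES - 1, state + action))


def reward(state: int) -> float:
    """Positive evaluation value only when the measurement achieves the target."""
    return 1.0 if state == TARGET else 0.0


def train(episodes: int = 2000, steps: int = 20, alpha: float = 0.5,
          gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0):
    """Learn a table q[state][action index] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            # Epsilon-greedy selection: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s_next = toy_plant(s, ACTIONS[a])
            r = reward(s_next)
            # Q-learning update toward reward plus discounted best future value.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


def greedy_action(q, state: int) -> int:
    """Return the action with the highest learned value in the given state."""
    best = max(range(len(ACTIONS)), key=lambda i: q[state][i])
    return ACTIONS[best]
```

After training, `greedy_action(train(), state)` returns the move the learned table prefers in each state. In this toy setting the preferred moves lead toward the target level and hold there.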

[0011] In addition, a framework called the Dyna architecture, developed from the above methods, is also known (see Non-Patent Document 1). In this framework, the controller contains a model that simulates the controlled object. The model takes the operation signal given to the controlled object as its model input and calculates a model output that is the predicted value of the measurement signal of the controlled object. The model is constructed using physical formulas or statistical methods.

[0012] The model input generation method is then learned using the evaluation value calculated from this model output as a clue. In the Dyna architecture, a method of generating model inputs that achieve the model output target value is learned in advance, and the operation signal to be applied to the controlled object is determined according to the learning result.

[0013] Non-Patent Document 1: "Neural network and measurement control", Nishikawa Hoichi and Shinzo Kitamura, Asakura Shoten, published January 25, 1995

Non-Patent Document 2: "Reinforcement Learning", co-translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing Co., Ltd., December 20, 2000

Disclosure of the invention

Problems to be solved by the invention

[0014] As described above, when designing a control device for a plant, it is necessary to create a model that appropriately simulates the controlled object. In order to improve accuracy, detailed physical models and numerical analysis are required. This numerical analysis requires the creation of meshes (calculation grids), and an increase in the number of meshes is necessary to improve accuracy.

[0015] For example, when the combustion phenomenon is analyzed for a large plant such as the boiler of a thermal power plant, a large number of meshes is required, and the calculation takes from several hours to several tens of days. Measures such as parallel computation and high-speed algorithms have been used to reduce the calculation time, but it remains practically difficult to calculate various operating conditions continuously.

[0016] In addition, when the controlled object is simulated with a statistical model, a phenomenon occurs in which the model fits the data used to create it well but its accuracy drops significantly when different values are input. This phenomenon is generally called over-learning, and in the conventional technology some contrivance is needed to avoid it and create a versatile model. The technique is therefore difficult to apply, and its range of application is limited.

[0017] The present invention has been made in view of the above-described problems, and an object thereof is to provide a plant control device in which a simulation model can be easily created.

Means for solving the problem

[0018] The above object is achieved by a plant control device that generates the operation signal needed to bring the value of the measurement signal, obtained from the controlled object when a predetermined operation signal is given to it, within the operation target value of the controlled object, and that uses this operation signal as the predetermined operation signal. The device comprises: a model that predicts the value of the measurement signal obtained from the controlled object when the predetermined operation signal is given to it; learning means for learning how to generate the model input given to the model so that the model output, which is the prediction result of the model, converges to the model output target value; and operation signal generation means for generating, according to the result of the learning means, the operation signal to be given to the controlled object, the operation signal generated by the operation signal generation means being used as the predetermined operation signal. The device further comprises an external input interface for capturing the measurement signal of the controlled object and a measurement signal database for storing the values of the measurement signals captured by the interface. It calculates the average and variance of the measurement signals stored in the measurement signal database, corrects the operation signal using the resulting average and variance, and thereby generates the predetermined operation signal anew.

[0019] Further, the above object is achieved by a plant control device that generates the operation signal needed to bring the value of the measurement signal, obtained from the controlled object when a predetermined operation signal is given to it, within the operation target value of the controlled object, and that uses this operation signal as the predetermined operation signal. The device comprises: a model that predicts the value of the measurement signal obtained from the controlled object when the predetermined operation signal is given to it; learning means for learning how to generate the model input given to the model so that the model output, which is the prediction result of the model, converges to the model output target value; and operation signal generation means for generating, according to the result of the learning means, the operation signal to be given to the controlled object, the operation signal generated by the operation signal generation means being used as the predetermined operation signal. The device further comprises an external input interface for capturing the measurement signal of the controlled object and a measurement signal database for storing the values of the measurement signals captured by the interface. It calculates the average and variance of the measurement signals stored in the measurement signal database, and when it corrects the operation signal to generate a new operation signal, it determines the change width of the operation signal based on the variance of the measurement signal.

[0020] At this time, the device may have a function of generating the operation signal using an expected value calculated from the average and variance of the measurement signal. As an external input function, a user interface for inputting the distribution shape of the measurement signal to the control device may be provided, or a user interface for inputting at least one of the average value, expected value, variance, and distribution shape of the measurement signal to the control device may be provided.

[0021] At this time, the above object is also achieved when the controlled object is a thermal power plant and the device has: a function of taking into the control device at least one of carbon monoxide and nitrogen oxides among the measurement signals of the thermal power plant; a function, as an external input function, of setting at least one environmental regulation value of carbon monoxide and nitrogen oxides as the limit value of the measurement signal; and a function of generating at least an air damper opening operation signal according to the learning result. The object can likewise be achieved by providing: a function of taking into the control device at least one of carbon monoxide and nitrogen oxides among the measurement signals of the thermal power plant; a function, as an external input function, of setting as the limit value of the measurement signal at least one environmental regulation value of carbon monoxide, nitrogen oxides, carbon dioxide, sulfur oxides, mercury, fluorine, fine particles consisting of dust or mist, and volatile organic compounds; and a function of generating, according to the learning result, at least one operation signal among the air damper opening, the fuel flow rate supplied to the burner, the burner air flow rate, the air flow rate supplied to the air port, the gas recirculation amount, the burner angle, and the supply air temperature.

The invention's effect

[0022] According to the present invention, the average and variance of the measurement signal are calculated, and a model that simulates the controlled object is created from the calculation result. The model therefore incorporates a distribution shape corresponding to the amount of accumulated data, and the fluctuation of the data can be known from the magnitude of the variance.

[0023] As a result, it can be seen that when the variance is large, the data is influenced by the operating state of the plant or by other process values, and when the variance is small, it is little influenced by them. Therefore, according to the present invention, by constructing a control algorithm that takes the magnitude of the variance into account, conditions in which reliability is low because of data fluctuation or a small amount of accumulated data can be avoided.

Brief Description of Drawings

FIG. 1 is a block diagram showing an embodiment of a plant control apparatus according to the present invention.

FIG. 2 is a block diagram showing an example of a thermal power plant to be controlled in an embodiment of the present invention.

FIG. 3 is an enlarged view of a piping part and an air heater part in an example of a thermal power plant to be controlled in one embodiment of the present invention.

FIG. 4 is an explanatory diagram showing a state of data stored in a measurement signal database in one embodiment of the present invention.

FIG. 5 is an explanatory diagram showing a state of data stored in an operation signal database in one embodiment of the present invention.

FIG. 6 is an explanatory diagram showing a mechanism of model used in one embodiment of the present invention.

FIG. 7 is an explanatory diagram of a model structure used in an embodiment of the present invention.

FIG. 8 is a flowchart for explaining the process of the model creation unit in an embodiment of the present invention.

FIG. 9 is an explanatory diagram showing an aspect of data stored in a model parameter database in one embodiment of the present invention.

FIG. 10 is a block diagram of a learning unit using a Q-Learning method used in an embodiment of the present invention.

FIG. 11 is a flowchart of an algorithm used in the learning unit according to the embodiment of the present invention.

FIG. 12 is a flowchart of the execution of one-episode learning in the algorithm used in the learning unit according to the embodiment of the present invention.

FIG. 13 is an explanatory diagram of tile coding applied to an evaluator in a learning unit according to an embodiment of the present invention.

FIG. 14 is an explanatory diagram showing an aspect of data stored in a learning information database in an embodiment of the present invention.

FIG. 15 is an explanatory diagram showing an aspect of data stored in a learning information database in an embodiment of the present invention.

FIG. 16 is an explanatory diagram showing an aspect of data stored in a learning parameter database in an embodiment of the present invention.

FIG. 17 is an explanatory diagram of an initial screen displayed as an image in an embodiment of the present invention.

FIG. 18 is an explanatory diagram of a control logic creation / edit screen displayed as an image in an embodiment of the present invention.

FIG. 19 is an explanatory diagram of a first half screen of a learning condition setting screen displayed as an image in an embodiment of the present invention.

FIG. 20 is an explanatory diagram of the second half screen of the learning condition setting screen displayed as an image in one embodiment of the present invention.

FIG. 21 is an explanatory diagram of a display information setting screen displayed as an image in an embodiment of the present invention.

FIG. 22 is an explanatory diagram of a trend graph of measured values displayed as an image in an embodiment of the present invention.

FIG. 23 is a characteristic diagram illustrating the relationship between CO and NOx emitted from a thermal power plant.

Explanation of symbols

100: Control target, 200: Control device, 210: External input interface, 220: External output interface, 230: Measurement signal database, 240: Operation signal database, 250: Control logic database, 260: Learning parameter database, 270: Model parameter database, 280: Learning information database, 300: Operation signal generation unit, 400: Model creation unit, 500: Model, 600: Learning unit, 900: Input device, 901: Keyboard, 902: Mouse, 910: Maintenance tool, 920: External input interface, 930: Data transmission / reception processing unit, 940: External output interface, 950: Image display device.

Best mode for carrying out the invention

Hereinafter, the plant control apparatus according to the present invention will be described in detail with reference to the illustrated embodiments. FIG. 1 shows an embodiment in which a plant control device according to the present invention is applied to a control object 100. For this purpose, a control device 200, an input device 900, a maintenance tool 910, and an image display device 950 are provided.

First, the control device 200 takes in the measurement signal 1 from the controlled object 100 via the external input interface 210 and transmits the operation signal 16 to the controlled object 100 via the external output interface 220. The measurement signal 2 captured by the external input interface 210 is transmitted to the operation signal generation unit 300 and is also stored in the measurement signal database 230. Likewise, the operation signal 15 generated in the operation signal generation unit 300 is transmitted to the external output interface 220 and is also stored in the operation signal database 240.

[0028] The operation signal generation unit 300 uses the information stored in the control logic database 250 and the learning information database 280 to generate the operation signal 15 so that the measurement signal 1 from the controlled object 100 achieves the operation target value. The information stored in the learning information database 280 is generated by the learning unit 600, and the learning unit 600 is connected to the model 500.

Here, the model 500 has a function of simulating the characteristics of the controlled object 100. That is, in the same way that the operation signal 16 is given to the controlled object 100 and the measurement signal 1 is obtained, the model input 12 for operating the model 500 is given to the model 500 and the model output 13 is obtained as a result. The model output 13 is a predicted value of the measurement signal 1. The model 500 thus simulates the characteristics of the controlled object 100 and has a function of calculating the model output 13 for the model input 12 using a model formula based on a physical law or a statistical method.

The model creation unit 400 has a function of generating the model 500 from the previous model parameter 5 stored in the model parameter database 270 and the measurement signal 3. When the model parameter database 270 does not contain a previous model parameter 5, the model creation unit 400 has a function of creating a new model 500 using model parameters generated by random numbers or the like together with the measurement signal 3.

[0031] The learning unit 600 generates the model input 12 using the previous learning information 11 stored in the learning information database 280, the learning parameter 7 stored in the learning parameter database 260, and the model output 13. The evaluation value 14, calculated using the model output 13 computed by the model 500, is input to the learning unit 600. The learning unit 600 then updates the learning information using the evaluation value 14 and transmits the updated learning information 10 to the learning information database 280.

The operation signal generation unit 300 generates the operation signal 15 using the learning information 9 stored in the learning information database 280 and the control logic information 6 stored in the control logic database 250. The operator of the controlled object 100 can use the input device 900, which consists of a keyboard 901 and a mouse 902, together with the image display device 950 to access the information stored in the various databases provided in the control device 200.

The maintenance tool 910 includes an external input interface 920, a data transmission / reception processing unit 930, and an external output interface 940. An input signal 31 generated by the input device 900 is taken into the maintenance tool 910 via the external input interface 920. The data transmission / reception processing unit 930 then acquires the database information 30 held in the control device 200 according to the information of the input signal 32.

The data transmission / reception processing unit 930 transmits the output signal 33 obtained as a result of processing the database information 30 to the external output interface 940. The output signal 34 is supplied from the external output interface 940 to the image display device 950 and displayed as an image for monitoring by the operator.

In this embodiment, all the necessary databases are arranged inside the control device 200, but these may also be arranged outside the control device 200. Further, in this embodiment, all signal processing functions for generating the operation signal 16 are arranged inside the control device 200, but these may also be arranged outside the control device 200.

[0036] Next, the operation of this embodiment will be described, together with the information stored in the databases and the signal processing functions, taking as an example the case where the present invention is applied to a thermal power plant. First, the thermal power plant that is the controlled object 100 will be described with reference to FIG. 2, for the case where coal is used as fuel. In this case, coal is stored in the coal bunker 111 and is supplied from the coal bunker 111 to the mill 110 via the coal feeder 112.

[0037] In the mill 110, the coal is pulverized by an internal roller into fine powder, so-called pulverized coal. The pulverized coal is conveyed to the burner 102 by the primary air for coal transport, supplied into the furnace of the boiler 101 together with the secondary air for combustion adjustment, and burned. The primary air is guided to the mill 110 via the pipe 133, the pulverized coal and the primary air are guided to the burner 102 via the pipe 134, and the secondary air is guided to the burner 102 via the pipe 141.

[0038] At this time, after-air for two-stage combustion is supplied into the furnace of the boiler 101 via the after-air port 103, and this after-air is guided through the pipe 142. The high-temperature gas generated in the furnace by the combustion of coal flows along a predetermined path that includes the heat exchanger 106 of the boiler body in the furnace of the boiler 101, then passes through the air heater 104, undergoes exhaust gas treatment, and is released to the atmosphere through the chimney.

[0039] The feed water circulating through the heat exchanger 106 of the boiler 101 is pressurized by the feed water pump 105, introduced into the boiler 101, and heated in the heat exchanger 106 to become high-temperature, high-pressure steam. In this example there is one heat exchanger, but a plurality of heat exchangers may be arranged.

[0040] The steam that has passed through the heat exchanger 106 and reached high temperature and high pressure is led to the steam turbine 108 through the turbine governor 107, where the energy of the steam is converted into rotational energy. The generator 109 is thereby driven in rotation, and electric power is generated. The exhaust steam from the steam turbine 108 is sent to the condenser 113, where it is cooled into condensed water and sent to the feed water pump 105 again. In this process, steam is extracted from the turbine 108, and a device for heating the feed water with the extracted steam is installed to improve the thermal efficiency.

Various measuring instruments are arranged in such a thermal power plant. For example, FIG. 2 shows a flow rate measuring device 150, a temperature measuring device 151, a pressure measuring device 152, a power generation output measuring device 153, and a concentration measuring device 154. The flow rate measuring device 150 measures the flow rate of the feed water supplied from the feed water pump 105 to the boiler 101. The temperature measuring device 151 and the pressure measuring device 152 measure the temperature and pressure of the steam supplied to the steam turbine 108. The amount of power generated by the generator 109 is measured by the power generation output measuring device 153.

[0042] On the other hand, the concentration of at least one component contained in the gas passing through the boiler 101, such as CO (carbon monoxide), NOx (nitrogen oxides), carbon dioxide, sulfur oxides, mercury, fluorine, fine particles consisting of dust or mist, or volatile organic compounds, is measured by the concentration measuring device 154. In general, a thermal power plant is equipped with many measuring instruments besides those shown in FIG. 2, but they are omitted from the figure. The information obtained from these measuring instruments is shown in FIG. 1 as the measurement signal 1 output from the controlled object 100 and is transmitted to the control device 200.

[0043] Next, the paths of the primary air and secondary air supplied through the burner 102 and of the after-air supplied through the after-air port 103 will be described. First, the primary air is taken into the pipe 130 from the fan 120, branches into a pipe 132 that passes through the air heater 104 and a pipe 131 that bypasses it, then merges again in the pipe 133 and is guided to the mill 110. The air passing through the air heater 104 is heated by the gas and is used to convey the pulverized coal produced by the mill 110 to the burner 102.

[0044] On the other hand, the secondary air and the after-air are taken into the pipe 140 from the fan 121, heated by the air heater 104, then branched into the secondary air pipe 141 and the after-air pipe 142 and guided to the burner 102 and the after-air port 103, respectively.

[0045] FIG. 3 is an enlarged view of the piping section through which the primary air, secondary air, and after-air pass, and of the air heater 104. As shown in this figure, air dampers 160, 161, 162, and 163 are arranged in the pipes. Operating these air dampers changes the area through which air passes in each pipe, so the air flow rate through the pipe can be adjusted. The control device 200 therefore operates equipment such as the feed water pump 105, the mill 110, and the air dampers 160, 161, 162, and 163 using the operation signal 16 that it generates.

Next, the information stored in the measurement signal database 230 and the operation signal database 240 will be described with reference to FIGS. 4 and 5. FIG. 4 shows an example of the information stored in the measurement signal database 230, and FIG. 5 shows an example of the information stored in the operation signal database 240.

First, as shown in FIG. 4, the measurement signal database 230 stores the information measured in the controlled object 100 for each measuring instrument together with its measurement time. For example, the flow rate value F measured by the flow rate measuring device 150 in FIG. 2, the temperature value T measured by the temperature measuring device 151, the pressure value P measured by the pressure measuring device 152, the power generation output value E measured by the power generation output measuring device 153, and the NOx concentration D contained in the exhaust gas are stored along with the time information.

[0048] To make the data stored in the measurement signal database 230 easy to use, each measurement value is assigned a unique number called a PID number, as shown in the figure. In FIG. 4 the data is stored at a 1-second cycle, but the sampling cycle for data collection can be set arbitrarily.

Next, in the operation signal database 240, as shown in FIG. 5, operation signals such as a feed water flow rate command signal are stored together with time information. In this case as well, each operation signal is assigned a unique PID number, and it goes without saying that the time interval can also be set arbitrarily.

[0050] Next, the operations of the model creation unit 400 and the model 500 will be described. The model 500 realizes the relationship of the measurement signals shown in FIG. 6 by the structure shown in FIG. 7. FIG. 6 plots the relationship between the air flow rate ratio and the measurement signal A. The number of data points that can be plotted on the graph differs depending on the plant conditions. For example, in a newly built plant the data must also be derived from design value information, so the number of data is small. On the other hand, the number of data increases in plants with many years of operation.

[0051] As described above, the number of data points varies depending on the plant status. Here, therefore, a distribution is assumed for each data point, and the difference in the number of data points is expressed by the shape of the distribution. When the number of data points is small, the variance is large and the distribution is wide. When the number of data points is large, the variance is small and the distribution is sharp. If prior information on the data is available, the distribution shape can be assumed from it, but for new data the distribution must be estimated from the obtained data without prior information.

[0052] Many methods are known for estimating a distribution from the data alone. Here, a normal distribution is assumed on the basis of the central limit theorem, which states that, whatever the population distribution, the distribution approaches a normal distribution as the number of data points increases. Once the distribution is assumed, its shape can be determined from the mean and variance. The assumption of a normal distribution based on the central limit theorem is described in detail in, for example, "Introduction to Statistics", Department of Statistics, Faculty of Liberal Arts, The University of Tokyo, July 19, 1991.

[0053] Fig. 7 is a diagram for explaining the model structure when the distribution is determined. The model outputs, as its output signals, the median and variance of the distribution shown in Fig. 6. It consists of an input layer, an intermediate layer, and an output layer, and the nodes of each layer are connected to each other. Each node uses a linear or nonlinear function; a sigmoid function is generally used. Each connection between nodes has a weighting coefficient, which represents the strength of the relationship between the nodes.

[0054] Normally, a model parameter (described later) refers to this weighting coefficient. Here, the intermediate layer is drawn as a single layer, but it can also consist of multiple layers. The related measurement signals are given as the input signals. By simulating the controlled object with this model, it is possible to create a model that takes into account the number of stored data points, so various states of the controlled object can be easily simulated.
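The layered structure described above can be sketched as a forward pass through one intermediate layer with sigmoid nodes and two outputs (center value and variance). This is only an illustrative sketch: the function names, the absence of bias terms, and the use of a sigmoid on the output layer (which implies teacher signals normalized to the range 0 to 1) are assumptions, not details given in the specification.

```python
import math


def sigmoid(x):
    """Sigmoid node function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_output):
    """Forward pass through input, intermediate, and output layers.

    w_hidden[j][i] is the weight from input node i to intermediate node j.
    w_output[k][j] is the weight from intermediate node j to output node k.
    Bias terms are omitted in this sketch (assumption).
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]


# Two inputs, two intermediate nodes, two outputs (center value, variance).
outputs = forward([0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])
```

With all-zero output weights every output node sits at the sigmoid midpoint, which is a convenient sanity check before training.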

FIG. 8 is a flowchart showing the processing by which the model creation unit 400 creates the model 500. The parameters necessary for executing this flowchart are stored in the model parameter database 270; the format of the information stored in this database will be described later.

When the processing according to the flowchart of FIG. 8 is started, first, in step 401, it is selected whether to use a model parameter set in the past or create a new model parameter. If a new model parameter is to be created here, the process proceeds to step 402, and the initial value of the model parameter is set using a random number.

[0057] Next, in step 403, the measurement signal 3 that is the input signal and output signal of the model 500 is extracted from the measurement signal database 230, and the average of the measurement signal 3 that is the output signal of the model 500 is calculated. The calculated average is stored in the learning information database 280.

[0058] In step 404, the variance of the measurement signal 3 that is the output signal of the model 500 is calculated. If the measurement signal has only one sample, the variance cannot be calculated. In this case, therefore, a large variance value is given as a default value. For example, 100 may be set as the default value. The user can change this default value as needed.

[0059] Regarding the shape of the distribution at this time, the shape stored in the learning information database 280 is used. However, when the shape is not yet stored in the learning information database 280, a normal distribution is used. The variance thus calculated is stored in the learning information database 280.
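Steps 403 and 404 can be sketched as follows. The sketch uses the sample variance from Python's `statistics` module, which is undefined for a single sample and therefore matches the fallback to a default value described above; whether the specification intends the sample or the population variance is not stated, so this choice is an assumption. The function name `teacher_statistics` is hypothetical.

```python
from statistics import fmean, variance

DEFAULT_VARIANCE = 100.0  # example default from the text; user-adjustable


def teacher_statistics(samples, default_variance=DEFAULT_VARIANCE):
    """Return (mean, variance) of the model output samples.

    With a single sample the variance cannot be calculated, so the
    default (large) variance is returned instead.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    mean = fmean(samples)
    if len(samples) == 1:
        return mean, default_variance
    # Sample variance (divides by n - 1); an assumption of this sketch.
    return mean, variance(samples, xbar=mean)
```

For example, `teacher_statistics([5.0])` falls back to the default variance, while `teacher_statistics([1.0, 3.0])` returns a mean of 2.0 and a variance of 2.0.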

[0060] In step 405, the mean and variance calculated in steps 403 and 404 are set as the teacher signal of the model 500. Next, in step 406, the parameters necessary for learning, such as the number of learning iterations, the learning coefficient, and the number of nodes, are set. When a new model parameter is created, the default values stored in the model parameter database 270 are used.

[0061] In step 407, the model parameters are successively updated from their initial values by learning. The back propagation method is used to update the model parameters by learning. This learning method is described in detail in "Neural Network and Measurement Control", Kei Nishikawa and Shinzo Kitamura, Asakura Shoten, published January 25, 1995. The model parameters are updated so that the difference between the output signal obtained when the input signal is given and the teacher signal disappears.

[0062] Here, the difference between the output signal of the model 500 and the teacher signal is generally expressed as a squared error and is called an evaluation function. The change in the evaluation function when each model parameter is varied is obtained by partial differentiation, and the obtained value multiplied by the learning coefficient is used as the update amount of that model parameter. As this is repeated, the difference between the output signal of the model 500 and the teacher signal disappears, and the evaluation function approaches zero.

[0063] When the evaluation function approaches zero, the partial differential value also approaches zero, and the update amount of the model parameter approaches zero. In numerical calculation, however, it never becomes exactly zero. Therefore, in step 408, learning is considered to have ended when the evaluation function falls below a set value, and model creation ends.
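The update rule and end condition of steps 407 and 408 can be illustrated with a small gradient-descent loop. Note the substitution: for brevity this sketch estimates the partial derivatives by finite differences rather than by back propagation, and the function name `train`, its parameters, and the one-parameter example are all hypothetical.

```python
def train(params, loss, learning_coefficient=0.1, threshold=1e-6,
          max_iterations=10_000, eps=1e-6):
    """Update parameters until the evaluation function falls below threshold.

    Returns (parameters, final evaluation value, iterations used).
    """
    params = list(params)
    for iteration in range(max_iterations):
        value = loss(params)
        if value < threshold:  # end condition of step 408
            return params, value, iteration
        grads = []
        for i in range(len(params)):
            shifted = params.copy()
            shifted[i] += eps
            # Finite-difference estimate of the partial derivative
            # (stands in for back propagation in this sketch).
            grads.append((loss(shifted) - value) / eps)
        # Update amount = learning coefficient x partial derivative.
        params = [p - learning_coefficient * g for p, g in zip(params, grads)]
    return params, loss(params), max_iterations


def squared_error(p):
    """Squared error between a one-weight model (input 2.0) and teacher 1.0."""
    return (p[0] * 2.0 - 1.0) ** 2


fitted, final_value, used = train([0.0], squared_error)
```

The iteration cap plays the role of the set number of learning repetitions, after which the learning parameters would be set again.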

[0064] On the other hand, if the learning end condition is not satisfied in step 408, the iterative calculation is stopped when the number of learning repetitions reaches the set number, and the process returns to step 406 to set the learning parameters again.

[0065] Returning to step 401, if the use of past model parameters is selected, step 409 selects whether the past model parameters are to be corrected by learning. If they are to be corrected, the process proceeds to step 403. If they are not to be corrected, the past model parameters are used as they are, so the model 500 need not be reconstructed and model creation ends.

[0066] In this embodiment, a sigmoid function is used for the nodes, but other network models, such as a radial basis function network that uses a Gaussian function for the nodes, may also be used.

[0067] FIG. 9 is a diagram for explaining the form of information stored in the model parameter database 270. As shown in this figure, the model parameter database 270 stores an ID, a creation date, a learning coefficient, the number of learning iterations, an end condition, the number of nodes, and parameter values. Here, the number of nodes is given separately for the input layer, the intermediate layer, and the output layer. The parameter values are the weighting coefficients, one for each connection between nodes, and are stored as W11, W12, ..., respectively.

[0068] The ID value of 000 indicates the default value of the learning parameter when a new model parameter is created. For new creation, the number of nodes and parameter values are usually blank.

Next, the operation of the model 500 and the learning unit 600 will be described. The learning unit 600 learns how to generate the model input 12 so that the model output 13 achieves the model output target value for the model 500, which simulates the characteristics of the control target 100. One example of an algorithm for performing such learning is the reinforcement learning theory described in "Reinforcement Learning", Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing Co., Ltd., published on December 20, 2000.

[0070] Here, reinforcement learning learns how to generate the model input 12 for achieving the model output target value through the interaction between the learning unit 600 and the model 500, using the evaluation value (reward) information as a clue. By applying reinforcement learning, it is possible to learn how to generate the model input 12 that maximizes the expected value of the evaluation values obtained from the current time into the future. In this embodiment, a case where the Q-learning method is applied as an example of the reinforcement learning algorithm will be described. However, as the learning method in the control device 200 of this embodiment, optimization techniques such as genetic algorithms and linear or nonlinear programming can also be applied in addition to the reinforcement learning method.

[0072] Figure 10 is a schematic diagram of the Q-learning method. As shown here, the learning unit 600 to which the Q-learning method is applied consists of an agent 650 that generates the model input 12 and an evaluator 660 that evaluates the state value.

FIG. 11 and FIG. 12 are flowcharts for explaining the processing in the case of the Q-learning method. The design parameters necessary for executing these flowcharts, such as the discount rate γ, are stored in the learning parameter database 260 and the learning information database 280. The form of information stored in these databases and the method of registering design parameters in them will be described later.

The flowchart of FIG. 11 is executed repeatedly while the control target 100 is being controlled. In the first step 301, the sampling period r of the control is acquired. Next, in step 302, one episode of learning is executed: the model 500 and the learning unit 600 are operated to execute the above-described reinforcement learning algorithm. In step 303, a learning end determination is executed.

[0075] Step 303 is provided to end the learning within the sampling period of the control. When the learning execution time is shorter than r, the process returns to step 302; when the processing time exceeds the period r, learning ends.
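The loop of steps 302 and 303 can be sketched as a time-bounded repetition of episodes. The function name `learn_within_period` and the injectable `clock` argument are hypothetical; the sketch checks the elapsed time only between episodes, so an episode that is already running is allowed to finish (an assumption, since the specification does not say whether an episode is interrupted).

```python
import time


def learn_within_period(run_episode, period_seconds, clock=time.monotonic):
    """Repeat one-episode learning until the sampling period has elapsed.

    Returns the number of episodes executed.
    """
    start = clock()
    episodes = 0
    while True:
        run_episode()                      # step 302: one episode of learning
        episodes += 1
        if clock() - start >= period_seconds:  # step 303: end determination
            return episodes


# Usage with a fake clock so the example is deterministic.
ticks = iter([0.0, 0.4, 0.8, 1.2])
count = learn_within_period(lambda: None, 1.0, clock=lambda: next(ticks))
```

Passing the clock as a parameter is only a testing convenience; in operation the default monotonic clock would be used.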

FIG. 12 is a flowchart for explaining the operation when one episode of learning is executed in step 302 of FIG. 11. First, in step 601, the initial values of the model inputs are set at random. Next, in step 602, the model input 12 generated in step 601 is input to the model 500, and the model output 13 is obtained. Next, in step 603, the model output 13 is compared with the model output target value. If the model output 13 has achieved the model output target value, the episode is terminated; if it has not, the process proceeds to step 604.

In the next step 604, the learning unit 600 determines a model input change width using information stored in the learning information database 280. The method for determining this model input change width Δa will be described later.

In Step 605, the model input 12 is determined using the following equation (1).

[0079] [Equation 1]

$$a(t+1) = a(t) + \Delta a \tag{1}$$

In Step 606, the model input 12 determined in Step 605 is input to the model 500, and the model output 13 is obtained. Next, at step 607, an evaluation value is calculated by the following equation (2) based on the model output 13 obtained at step 606.

[0081] [Equation 2]

Here, Q(s, a) is the value of taking action a in state s, γ (0 ≤ γ < 1) is the discount rate, and r is the reward at each time.

[0082] Here, the value Q(s, a) is determined by a sum over time, for the following reason. The response to an actual action, here the response obtained when the model input 12 is generated and input to the model 500, is often accompanied by a delay time. This effect is particularly significant when the method is applied to plants.

[0083] It is therefore more realistic to determine the value based on the sum of the rewards given in the future than based only on the reward immediately after the action, and so the value is determined based on the sum over time. In this case, by introducing the discount rate γ, the reward obtained immediately after the action is weighted more heavily, so that an evaluation value that takes responsiveness into account can be obtained.
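The idea of valuing an action by a discounted sum of future rewards can be sketched as below. The exact indexing of the sum in the specification's equation is not reproduced here; this sketch assumes the common form in which the k-th reward after the action is weighted by γ to the power k. The function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, discount_rate):
    """Sum of rewards in time order, each weighted by discount_rate ** k."""
    if not 0.0 <= discount_rate < 1.0:
        raise ValueError("discount rate must satisfy 0 <= gamma < 1")
    total = 0.0
    # Accumulate from the last reward backwards: G_k = r_k + gamma * G_(k+1).
    for reward in reversed(rewards):
        total = reward + discount_rate * total
    return total


early = discounted_return([1.0, 0.0, 0.0], 0.5)  # reward right after the action
late = discounted_return([0.0, 0.0, 1.0], 0.5)   # same reward, two steps later
```

The same reward is worth more when it arrives sooner, which is how the discount rate favors responsive behavior.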

In step 608, based on the evaluation value calculated in step 607, the agent parameter is updated by the following equation (3), and the updated result is stored in the learning information database 280.

[0085] [Equation 3]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{3}$$

where α (0 ≤ α ≤ 1) is the learning rate.

[0086] Finally, in step 609, end determination is performed by the same method as in step 403. That is, in step 609, when the learning end condition is not satisfied, the iterative calculation is stopped when the number of learning repetitions reaches the set number of times, and the process returns to step 604.
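The update of step 608 can be sketched with the standard tabular Q-learning rule, which equation (3) is understood to express. The table layout (a dictionary keyed by state number and action), the function name `q_update`, and the example actions are assumptions made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             learning_rate, discount_rate):
    """Apply one Q-learning update and return the new value Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference error: r + gamma * max_a Q(s', a) - Q(s, a).
    td_error = reward + discount_rate * best_next - q[(state, action)]
    q[(state, action)] += learning_rate * td_error
    return q[(state, action)]


# Hypothetical example: two actions (decrease or increase the model input).
actions = (-1, +1)
q = defaultdict(float)        # unseen state-action pairs start at 0.0
q[(2, +1)] = 1.0              # value already learned for state number 2
new_value = q_update(q, state=1, action=+1, reward=0.0, next_state=2,
                     actions=actions, learning_rate=0.5, discount_rate=0.9)
```

Here the value learned for state 2 propagates back to state 1 even though the immediate reward is zero.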

Next, the processing by which the agent 650 of the learning unit 600 generates the model input 12 and the evaluator 660 calculates the state value will be described. The case of using the tile coding method is described here, but the agent 650 and the evaluator 660 may be configured using other methods.

First, the evaluator 660 divides the state by the tile coding method as described above. The tile coding method recognizes continuous states as discrete states by dividing the input space and determining which region a state belongs to. Fig. 13 is a diagram explaining this tile coding method. In this figure, each region is called a tile. For example, if the input signal 12 to the model 500 is two-dimensional, consisting of an input signal A and an input signal B, and the input signal A is between 0 and 1 while the input signal B is between 1 and 2, the state belongs to the tile with state number 1.
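The mapping from a continuous input to a state number can be sketched as a uniform grid lookup. The numbering order of the tiles (first dimension varying slowest, numbers starting at 1), the operation ranges, and the function name `tile_state_number` are assumptions; they are chosen only so that the example in the text (A between 0 and 1, B between 1 and 2) maps to state number 1.

```python
def tile_state_number(values, lower, upper, divisions):
    """Return the 1-based state number of the tile containing `values`.

    lower/upper give the operation range of each input signal and
    divisions gives the number of tiles along each dimension.
    """
    index = 0
    for v, lo, hi, n in zip(values, lower, upper, divisions):
        if not lo <= v <= hi:
            raise ValueError("input outside the operation range")
        # Cell index along this dimension; the upper bound joins the last cell.
        cell = min(int((v - lo) / (hi - lo) * n), n - 1)
        index = index * n + cell
    return index + 1


# Input A in [0, 3], input B in [1, 4], three divisions each (assumed ranges).
first = tile_state_number((0.5, 1.5), (0.0, 1.0), (3.0, 4.0), (3, 3))
last = tile_state_number((2.5, 3.5), (0.0, 1.0), (3.0, 4.0), (3, 3))
```

The operation range and number of divisions correspond to the values entered in the operation end setting field described later.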

In this case, the learning information database 280 stores information in a form in which the state number and the value function correspond as shown in FIG. The evaluator 660 uses the value of the input signal 12 when the model output 13 is obtained and the information stored in the learning information database 280 to calculate the value of the state according to the above-described equation (3).

FIG. 15 shows information stored in the learning information database 280. As shown in the figure, the mean and variance of the teacher signal used when creating the model 500 are stored in correspondence with the state number. In step 604 described above, the model input change width is determined based on the variance of the teacher signal. When the variance is small, the variation is small and the sensitivity to changes in the input signal is low, so the input change width is increased. Conversely, when the variance is large, the variation is large and the sensitivity to changes in the input signal is high, so the input change width is reduced.
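One way to realize the rule of step 604 is a width that decreases monotonically as the variance grows. The specification states only the direction of the relationship; the particular formula, the parameter names, and the lower bound in the sketch below are assumptions.

```python
def input_change_width(variance, max_width=1.0, min_width=0.01, scale=1.0):
    """Return a model input change width that shrinks as the variance grows.

    max_width is used when the variance is zero; min_width keeps the
    search from stalling when the variance is very large.
    """
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    width = max_width / (1.0 + variance / scale)
    return max(min_width, width)


wide = input_change_width(0.0)      # small variance: large step
medium = input_change_width(1.0)
narrow = input_change_width(100.0)  # default variance from step 404: small step
```

With this choice, a state that has only one stored sample (and therefore the default variance of 100) is explored with the smallest steps.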

Next, FIG. 16 shows the form of information stored in the learning parameter database 260. As shown in the figure, parameters such as the learning rate necessary to execute steps 606 and 607 in the flowchart of FIG. 12 are stored. In this reinforcement learning, the generation method of the model input 12 is learned so that the expected value of the evaluation value is maximized. It is therefore desirable that the evaluation value becomes large when the model output 13 reaches the model output target value.

One method for generating such an evaluation value is to use a positive value, for example "1", as the evaluation value when the model output 13 achieves the model output target value. Another method, for the case where the model output target value is not achieved, is to calculate the evaluation value using a function that is inversely proportional to the error between the model output target value and the model output 13. A method of calculating the evaluation value by combining these methods can also be considered.
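The combined method mentioned above can be sketched as follows: a fixed positive value when the target is achieved, and otherwise a value inversely proportional to the error. The tolerance used to decide that the target is achieved, the gain, and the cap at 1.0 are assumptions of this sketch, as is the function name `evaluation_value`.

```python
def evaluation_value(model_output, target, tolerance=0.0, gain=1.0):
    """Return the evaluation value (reward) for one model output."""
    error = abs(model_output - target)
    if error <= tolerance:
        return 1.0                  # target achieved: fixed positive value
    # Otherwise inversely proportional to the error, capped so that a miss
    # never scores higher than an achieved target.
    return min(1.0, gain / error)


achieved = evaluation_value(5.0, 5.0)
near = evaluation_value(7.0, 5.0)
far = evaluation_value(9.0, 5.0)
```

The cap keeps the ordering consistent: outputs closer to the target always receive an evaluation value at least as large as outputs farther away.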

Next, a method by which the operator of the control target 100 displays the database information on the image display device 950 using the maintenance tool 910 will be described with reference to the figures. The operator uses the keyboard 901 and the mouse 902 to perform operations such as entering parameter values in the blank fields of the displayed screen.

FIG. 17 shows the initial screen displayed on the image display device 950. On this screen, a control logic create/edit button 951, a learning condition setting button 952, and an information display button 953 are displayed. The operator selects the required button by moving the cursor 954 with the mouse 902 and clicking the mouse 902, whereby that button is pressed.

[0096] First, FIG. 18 shows the control logic edit screen that is displayed when the control logic create/edit button 951 is clicked. On this screen, the operator presses either the new creation button 967 or the edit button 968. For new creation, a blank logic diagram is opened. For editing, the logic to be edited is selected and its logic diagram is displayed. When creating or editing, the operator selects the required modules from the standard element modules 963 registered in advance and moves them to the logic edit screen 961. The modules are connected using connection/erase 962.

The control logic drawing created on the display screen of FIG. 18 is saved in the control logic database 250 via the data transmission / reception processing unit 930 when the save button 964 is clicked. In addition, the operation signal generation unit 300 generates the operation signal 15 when the measurement signal 2 is input, using the information in the control logic drawing. Furthermore, the operation signal generation unit 300 can generate the operation signal 15 by using information stored in the learning information database 280 together.

At this time, the learning information database 280 stores the state number and the central information of the information shown in FIG. Therefore, by using these pieces of information, it is possible to easily generate an operation signal 15 having the same value as the model input 12 at which the model output 13 takes the desired value. If the control logic drawing created at this time is not to be saved, the cancel button 965 is clicked. On the other hand, by clicking the back button 966, it is possible to return to the screen of FIG.

When the learning condition setting button 952 is clicked on the screen shown in FIG. 17, the screen shown in FIG. 19 is displayed. In the model creation field 971 of this screen, the operator enters, for each model-specific PID, the learning coefficient, the number of learning iterations, and the end condition necessary for executing the flowchart of FIG. 8, or corrects the values if they have already been entered. The operator can also change the default values of ID 000.

Next, in the parameter setting field 972, the setting parameters necessary for executing the flowcharts of FIGS. 11 and 12 are entered. In the operation end setting field 973, the name of the operation end whose operation method is to be learned according to the flowchart, its operation range, and the number of divisions for tile coding are entered. Clicking the next page button 977 in FIG. 19 moves to the second half of the learning condition setting screen. The previous page button 978 and the second half of the learning condition setting screen will be described later.

[0101] Then, by clicking the save button 974 in FIG. 19, the information entered in the model creation field 971 is stored in the model parameter database 270, the information entered in the parameter setting field 972 is stored in the learning parameter database 260, and the information entered in the operation end setting field 973 is stored in the learning information database 280.

Here, if the cancel button 975 is clicked, the information input in the model creation field 971, the parameter setting field 972, and the operation end setting field 973 is canceled. Then, when the return button 976 is clicked, the screen shown in FIG. 17 is restored.

Next, the second half of the learning condition setting screen will be described with reference to the figure. This second-half screen is displayed by clicking the next page button 977 in FIG. 19. On it, the operator can enter the average, variance, and distribution shape of the output signal of the model 500 in the learning information column 979, or correct them if they have already been entered. Based on this information, the model input change width in step 604 of the flowchart of FIG. 12 is determined.

Next, FIG. 21 shows the screen used to set the conditions for displaying the information stored in the measurement signal database 230 and the operation signal database 240 on the image display device 950. It is displayed by clicking the information display button 953 on the initial screen. The operator enters the measurement signal or operation signal to be displayed on the image display device 950 in the input field 981 together with its range (upper limit/lower limit), and enters the time period to be displayed in the time input field 982.

Also, by clicking the display button 983, a trend graph as shown in FIG. 22 is displayed on the image display device 950. Clicking the return button 991 here will return you to the screen in Figure 21. On the other hand, by clicking the back button 984, it is possible to return to the screen of FIG. In addition to the image described above, information stored in the database in the control device 200 can be arbitrarily selected and displayed on the image display device 950 in any manner.

Next, in this embodiment, the control target 100 in FIG. 1 is the thermal power plant described in FIG. 2, and the control device 200 is applied to it to operate the air damper of the thermal power plant. This makes it possible to control the emission status of at least one substance subject to environmental regulation values, such as CO, NOx, carbon dioxide, sulfur oxides, mercury, fluorine, fine particles consisting of dust or mist, and volatile organic compounds.

[0107] Here, first, Fig. 23 explains the basic characteristics of the CO and NOx emitted from a thermal power plant. Generally, the amount of CO and the amount of NOx are in a trade-off relationship as shown in the figure: NOx tends to increase when CO is reduced, and CO tends to increase when NOx is reduced.

[0108] On the other hand, in a thermal power plant, the amounts of CO and NOx emitted from the chimney are legally restricted. NOx in particular is strictly regulated, so the gas at the boiler outlet is led to denitration equipment and treated there to comply with the regulations. The amount of ammonia consumed by the denitration equipment increases as the NOx concentration at the inlet of the denitration equipment increases.

[0109] Therefore, reducing the amount of NOx at the inlet of the denitration equipment offers a great cost advantage, and it is desirable to reduce the NOx concentration as much as possible. A control algorithm that takes the trade-off relationship between CO and NOx into account is therefore required. However, in a thermal power plant, the stored measurement signal data differ depending on whether the plant is at the design, trial operation, or operation stage. Moreover, even when a large number of data points have been accumulated over long-term operation, a large number is not necessarily preferable if the operating conditions differ.

[0110] However, in the case of the above embodiment, the model that simulates the thermal power plant to be controlled can model the shape of the distribution according to the accumulated data according to the flowchart of FIG. Therefore, the past state can be grasped from the shape of the distribution. In other words, a large variance means a state with a lot of variation, which shows that the plant state is very unstable. A small variance means little variation, which shows that the plant state is very stable. Therefore, a control algorithm that takes into account the reliability of the stored data can be constructed from the flowchart in Fig. 12.

[0111] As a result, according to the above embodiment, even when the amount of accumulated data is small, a control algorithm that takes the small number of data points into account can be constructed. Therefore, it is always possible to satisfy the legal regulations regarding CO and NOx, which are in a trade-off relationship.

[0112] In addition to the above-mentioned CO and NOx, emissions subject to legal restrictions include carbon dioxide, sulfur oxides, mercury, fluorine, fine particles consisting of dust or mist, and volatile organic compounds. As described above, according to the above embodiment, control of at least one of these environmental regulation values is also possible.

Claims

The scope of the claims

[1] A plant control apparatus that generates an operation signal necessary for making the value of a measurement signal, obtained from a control object when a predetermined operation signal is given to the control object, fall within an operation target value of the control object, this operation signal being used as the predetermined operation signal, the plant control apparatus comprising: a model for predicting the value of the measurement signal obtained from the control object when the predetermined operation signal is given to the control object; learning means for learning a method of generating a model input to be given to the model so that a model output, which is a prediction result of the model, converges to a model output target value; and operation signal generating means for generating an operation signal to be given to the control object according to the result of the learning means, the operation signal generated by the operation signal generating means being used as the predetermined operation signal, characterized in that the plant control apparatus further comprises an external input interface that captures the measurement signal of the control object and a measurement signal database that stores the value of the measurement signal captured by the interface, calculates the mean and variance of the measurement signals stored in the measurement signal database, corrects the operation signal using the results of the mean and variance, and newly generates the predetermined operation signal.

[2] A plant control apparatus that generates an operation signal necessary for making the value of a measurement signal, obtained from a controlled object when a predetermined operation signal is given to the controlled object, fall within an operation target value of the controlled object, this operation signal being used as the predetermined operation signal, the plant control apparatus comprising: a model for predicting the value of the measurement signal obtained from the controlled object when the predetermined operation signal is given to the controlled object; learning means for learning a method of generating a model input to be given to the model so that a model output, which is a prediction result of the model, converges to a model output target value; and operation signal generating means for generating an operation signal to be given to the controlled object according to the result of the learning means, the operation signal generated by the operation signal generating means being used as the predetermined operation signal, characterized in that the plant control apparatus further comprises an external input interface that captures the measurement signal of the controlled object and a measurement signal database that stores the value of the measurement signal captured by the interface, calculates the mean and variance of the measurement signals stored in the measurement signal database, and, when the operation signal is corrected to generate a new operation signal, determines the change width of the operation signal based on the variance of the measurement signal.

[3] The plant control device according to claim 1, comprising a function of generating an operation signal using a result of calculating an expected value from the average and variance of the measurement signals.

[4] The plant control device according to claim 1, further comprising a user interface for inputting a distribution shape of the measurement signal to the control device as an external input function.

[5] The plant control apparatus according to claim 1, comprising a user interface for inputting at least one of an average value, an expected value, a variance, and a distribution shape of the measurement signal to the control apparatus as an external input function.

[6] The plant control apparatus according to claim 1, wherein the control target is a thermal power plant, the plant control apparatus comprising: a function of capturing into the control device at least one of carbon monoxide and nitrogen oxides among the measurement signals of the thermal power plant; a function of setting, as an external input function, an environmental regulation value of at least one of carbon monoxide and nitrogen oxides as a limit value of the measurement signal; and a function of generating at least an air damper opening operation signal according to the learning result.

[7] The plant control apparatus according to claim 1, wherein the control target is a thermal power plant, the plant control apparatus comprising: a function of capturing into the control device, among the measurement signals of the thermal power plant, at least one of carbon monoxide, nitrogen oxides, carbon dioxide, sulfur oxides, mercury, fluorine, fine particles consisting of dust or mist, and volatile organic compounds; and a function of generating, according to the learning result, at least one operation signal among the air damper opening, the fuel flow rate to be supplied to the burner, the fuel flow rate to the burner, the air flow rate to be supplied to the air port, the gas recirculation amount, the burner angle, and the supply air temperature.