JP2024017830A

JP2024017830A - Optimal control device, optimal control method and computer program

Info

Publication number: JP2024017830A
Application number: JP2022120736A
Authority: JP
Inventors: 理山中; 祐太大西; 勇岐西室
Original assignee: Toshiba Corp; Toshiba Infrastructure Systems and Solutions Corp
Current assignee: Toshiba Corp; Toshiba Infrastructure Systems and Solutions Corp
Priority date: 2022-07-28
Filing date: 2022-07-28
Publication date: 2024-02-08

Abstract

[Problem] To provide an optimal control device that searches for an optimal value in a way that raises control performance as far as possible while maintaining stability. [Solution] The optimal control device according to the embodiment includes: a process evaluation value calculation unit 120 that calculates an evaluation value of an evaluation function using measured values acquired in a controlled process; a process phase lag estimation unit 400 that takes the manipulated variable and the evaluation value as inputs and calculates an estimated value of the phase lag from the manipulated variable to the evaluation value; an evaluation function gradient estimation unit 140 that uses information on the estimated phase lag and information on the evaluation value to calculate an estimated value of the rate of change of the evaluation value with respect to the manipulated variable; an extreme value search unit 170 that determines the direction and amount in which the manipulated variable should move by integrating the estimated rate of change; and a manipulated variable output unit 190 that outputs, to the controlled process, a manipulated variable based on the direction and amount determined by the extreme value search unit 170. [Selected drawing] Figure 3

Description

Embodiments of the present invention relate to an optimal control device, an optimal control method, and a computer program.

In plant control, for example in water treatment plants and environmental plants, the state of the process changes with disturbances and environmental conditions. In many cases the manipulated variables of the process must be optimized so as to mitigate the influence of disturbances, or the manipulated variables that are optimal under the prevailing environment must be determined according to the environmental conditions.

For example, in sewage treatment and wastewater treatment processes, the blower air volume, pump flow rate, and chemical dosage must be adjusted in response to disturbances such as the inflow volume entering the treatment plant and the inflow load (= inflow volume x inflow water quality concentration) so that the effluent quality is maintained. In biological wastewater treatment in particular, the optimal manipulated variables also change with the seasonal variation of water temperature over the year.
In coagulant injection control and chlorine injection control in drinking water treatment, the chemical injection amount needed to maintain the required water quality changes with disturbances such as the turbidity of the intake source and the amount of solar radiation. The injection amounts of coagulant and chlorine are also often strongly affected by annual water temperature variation.

In denitration control at thermal power plants and similar facilities, the nitrogen oxides (NOx) discharged from the power plant act as a disturbance, and the amount of ammonia injected to treat the NOx must be adjusted according to the amount of NOx discharged.
In petrochemical plants and steel plants, the manipulated variables needed for oil refining and steelmaking (temperature, pressure, flow rate, and so on) must be controlled to their optimal values according to the quality and quantity of the raw materials (heavy oil, iron ore, and so on).

As described above, in plant control in many industries the state of the plant changes with disturbances and environmental conditions. These plants often use disturbance-rejection control, which varies the manipulated variables to keep the plant in an optimal state in the face of disturbances, or scheduling control, which switches the control according to environmental conditions. In some cases adaptive control, which adapts to the environmental conditions, is employed.

As a representative control method for keeping a plant in an optimal state, a technique called extreme value control (ESC: Extremum Seeking Control) has been attracting attention and is beginning to be used in industry. In extreme value control, an evaluation function expressing the optimality to be minimized or maximized, such as the operating cost or power output of a plant, is defined, and the extreme value (= local minimum or local maximum) of the evaluation value is searched for by continuously varying the manipulated variable while the evaluation value is measured in real time. Extreme value control is attracting attention as a practical optimal control technique for processes that involve complex phenomena that are difficult to model mathematically (for example, sewage treatment processes, combustion processes, and petrochemical processes).

Extreme value control calculates the value of the evaluation function to be optimized from information acquired by online sensors that can measure it directly, and searches adaptively while varying the manipulated variable so that the evaluation function value is kept at its optimal value (minimum or maximum). In other words, because extreme value control searches for the optimal value by varying the manipulated variable without using a complex mathematical model, it is attractive in that it is easy to implement on controllers such as PLCs (Programmable Logic Controllers).

On the other hand, with extreme value control it is generally difficult to improve control performance (the speed of convergence to the extreme value, the error between the final value and the extreme value, and so on) while maintaining stability (the ability to search for the extreme value). Achieving both stability and control performance (convergence speed plus optimality, that is, the error from the optimal value) is an essential issue when applying extreme value control to real controlled objects (plants).

International Publication No. 2020/241657
Japanese Patent Application Publication No. 2019-83030

Yamanaka, Osamu, et al. "Extremum Seeking Based on Approximated Sign of Gradient of Unknown Plant Maps." 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), 2020.
B. G. B. Hunnekens, M. A. M. Haring, N. van de Wouw, and H. Nijmeijer, "A dither-free Extremum-seeking control approach using 1st-order least-squares fits for gradient estimation," in 53rd Conference on Decision and Control, 2014, pp. 2679-2684.
Zengin, Nursefa, and Baris Fidan. "Adaptive extremum seeking using recursive least squares." arXiv preprint arXiv:2003.03891 (2020).
Onishi, Yuta, et al. "Extremum Seeking Control for Wastewater Treatment Plant with Prioritized Output Constraints." 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), 2020.
Y. Tan, D. Nesic and I.M.Y. Mareels, "On non-local stability properties of extremum seeking control", Automatica, vol. 42 (2006), pp. 889-903.

Embodiments of the present invention have been made in view of the above circumstances, and aim to provide an optimal control device, an optimal control method, and a computer program that search for an optimal value in a way that raises control performance as far as possible while maintaining stability.

The optimal control device according to the embodiment manipulates the manipulated variable of a controlled process in real time, based on that manipulated variable and on the evaluation value of an evaluation function based on a controlled variable that changes according to the manipulated variable, and searches for the optimal value of the evaluation value. The device includes: a process evaluation value calculation unit that calculates the evaluation value of the evaluation function using measured values acquired in the controlled process; a process phase lag estimation unit that uses the manipulated variable and the evaluation value to calculate an estimated value of the phase lag from the manipulated variable to the evaluation value; an evaluation function gradient estimation unit that uses information on the estimated phase lag and information on the evaluation value to calculate an estimated value of the rate of change of the evaluation value with respect to the manipulated variable; an extreme value search unit that determines the direction and amount in which the manipulated variable should move by integrating the estimated rate of change; and a manipulated variable output unit that outputs, to the controlled process, the manipulated variable based on the direction and amount determined by the extreme value search unit.

FIG. 1 is a diagram for explaining an example of a control target of the optimal control device, optimal control method, and computer program according to an embodiment.
FIG. 2 is a diagram for explaining the principle of the extreme value (local optimum) search in extreme value control.
FIG. 3 is a block diagram schematically showing a configuration example of a real-time process optimal control system to which the optimal control device of the first embodiment is applied.
FIG. 4 is a diagram schematically showing a configuration example of a part of the optimal control device of the first embodiment.
FIG. 5 is a diagram schematically showing an example of the relationship between time-series data of the manipulated variable and time-series data of the evaluation quantity when the controlled object has dynamics.
FIG. 6 is a diagram schematically showing an example of time-series data of the manipulated variable and time-series data of the evaluation value before and after phase lag compensation.
FIG. 7 is a diagram for explaining an example of the relationship between control stability and control performance when the frequency of the periodic signal used for extreme value control is changed.
FIG. 8 is a diagram for explaining an example of the relationship between control stability and control performance when the frequency of the periodic signal used for extreme value control is changed.
FIG. 9 is a diagram for explaining an example of the relationship between control stability and control performance when the frequency of the periodic signal used for extreme value control is changed.
FIG. 10 is a diagram for explaining an example of the relationship between control stability and control performance when the frequency of the periodic signal used for extreme value control is changed.
FIG. 11 is a diagram for explaining an example of the effect of the optimal control device of the embodiment.
FIG. 12 is a diagram for explaining an example of the effect of the optimal control device of the embodiment.
FIG. 13 is a diagram for explaining an example of the effect of the optimal control device of the embodiment.
FIG. 14 is a diagram for explaining another example of the effect of the optimal control device of the embodiment.
FIG. 15 is a diagram for explaining another example of the effect of the optimal control device of the embodiment.
FIG. 16 is a diagram for explaining another example of the effect of the optimal control device of the embodiment.
FIG. 17 is a diagram for explaining another example of the effect of the optimal control device of the embodiment.
FIG. 18 is a diagram schematically showing another configuration example of a part of the optimal control device of the first embodiment.
FIG. 19 is a diagram schematically showing a configuration example of the optimal control device of the second embodiment.
FIG. 20 is a diagram schematically showing a configuration example of a part of the optimal control device of the second embodiment.
FIG. 21 is a diagram schematically showing a modification of the configuration of the optimal control device of the second embodiment.
FIG. 22 is a diagram schematically showing a modification of the configuration of a part of the optimal control device of the second embodiment.

Hereinafter, an optimal control device, an optimal control method, and a computer program according to embodiments will be described in detail with reference to the drawings.
FIG. 1 is a diagram for explaining an example of a control target of the optimal control device, optimal control method, and computer program according to an embodiment.
The optimal control device of this embodiment manipulates the manipulated variable of a controlled process in real time, based on that manipulated variable and on the evaluation value of an evaluation function based on a controlled variable that changes according to the manipulated variable, and searches for the optimal value of the evaluation value.
Here, the coagulant (PAC) injection process of a water treatment plant (for example, a water purification plant) is described as an example of a control target of the optimal control device of this embodiment. The control target of the optimal control device of the embodiment may be any process and is not limited to the process shown in FIG. 1.

The water treatment plant includes, for example, a mixing basin, a flocculation basin, a settling basin, a sludge basin, a sand filtration basin, and a water purification basin.
Coagulant is injected in the mixing basin into the raw water that flows into the water treatment plant, and the water is stirred in the flocculation basin so that solids called flocs form. Part of the treated water containing flocs is discharged to the sludge basin, and part settles in the settling basin. The treated water discharged from the flocculation basin is sent to the sand filtration basin, and the filtered water is discharged to the water purification basin.

In the coagulant injection process of this water treatment plant, flocs are formed by injecting the coagulant (PAC). The flocs move when an electric field is applied. The optimal control device of this embodiment measures the movement speed of the flocs (PV) by image processing and adjusts the injection rate (MV) by PI control so that the movement speed follows a target value (SV) near zero.

The target value SV of the movement speed PV should be near zero, but its optimal value is thought to vary with water temperature, inflow water quality, and other factors. For this reason, when the optimal control device of this embodiment controls the coagulant injection process, for example, it adjusts the target value SV of the floc movement speed by extreme value control.
The control system shown in FIG. 1 has a configuration called cascade control: it is a two-stage cascade in which the target value SV of the PI control is the manipulated variable MV of the extreme value control.

In this case, the evaluation function in the optimal control device of this embodiment is, for example, the sum of the PAC chemical cost, the sludge disposal cost, and the cleaning cost (the operating cost), with constraints placed on the turbidity of the settled water and the filtered water. A constraint can be converted into an evaluation function by, for example, the method shown in Non-Patent Document 5 or a method called the penalty function method. In the following, for simplicity, the constraint is assumed to be incorporated into the evaluation function as a penalty function, and the evaluation function obtained by incorporating this water quality constraint into the operating cost is called the total cost.
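The penalty-function idea can be sketched as follows. This is only an illustration: the quadratic penalty shape, the function name `total_cost`, the weight, and the numeric values in the usage lines are assumptions made for the example, not details taken from the embodiment.

```python
def total_cost(operating_cost, turbidities, t_lim, penalty_weight):
    """Operating cost plus a penalty for turbidity above the limit t_lim.

    Assumed form: a quadratic penalty on the amount by which each measured
    turbidity exceeds t_lim. Values at or below the limit add nothing.
    """
    penalty = sum(max(0.0, t - t_lim) ** 2 for t in turbidities)
    return operating_cost + penalty_weight * penalty


# Illustrative values only: settled-water and filtered-water turbidity.
within_limit = total_cost(100.0, [0.5, 0.7], t_lim=0.8, penalty_weight=1000.0)
over_limit = total_cost(100.0, [1.0, 0.5], t_lim=0.8, penalty_weight=1000.0)
```

With a penalty of this kind, a violation of the water quality constraint raises the total cost, so a search that minimizes the total cost is pushed back toward operating points that satisfy the constraint.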

FIG. 2 is a diagram for explaining the principle of the extreme value (local optimum) search in extreme value control.
In FIG. 2, the horizontal axis is the manipulated variable operated by extreme value control, which in this embodiment is the target value (SV) of the movement speed. The vertical axis is the evaluation function (evaluation quantity) to be optimized (minimized) by extreme value control, which in this embodiment is the total cost.

It is assumed in advance that some relationship exists between the manipulated variable and the evaluation function and that this relationship has an extreme value (local minimum). FIG. 2 assumes a downward-convex functional relationship between the manipulated variable and the evaluation function.

While extreme value control is actually running, the value of the evaluation quantity for the manipulated variable at each moment can be obtained in real time, but the overall shape of the evaluation function cannot be grasped in real time and remains unknown. In this situation, extreme value control provides a control algorithm for searching for the extreme value in the figure (a local optimum; in the example shown, the function is unimodal and downward convex, so it is also the global minimum).

According to FIG. 2, when the manipulated variable is driven by a periodic dither signal such as a sine wave and the manipulated variable lies to the right of the extreme value (the side on which the manipulated variable is larger), the dither-driven manipulated variable and the evaluation quantity (evaluation value) acquired in real time move in synchronization, that is, in phase.

On the other hand, when the manipulated variable is driven by a periodic dither signal such as a sine wave and the manipulated variable lies to the left of the extreme value (the side on which the manipulated variable is smaller), the dither-driven manipulated variable and the evaluation quantity (evaluation value) acquired in real time move in opposite directions. That is, to the left of the extreme value, the two move in antiphase.

Using this information, the optimal control device can determine on which side of the extreme value the manipulated variable lies during the control operation. The extreme value search becomes possible by decreasing the manipulated variable when it lies to the right of the extreme value and increasing it when it lies to the left.
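The in-phase and antiphase behavior can be checked numerically with a small sketch. It assumes a static, downward-convex cost curve with no process dynamics (so no phase lag), and the names `dither_correlation` and `example_cost`, the dither amplitude, and the sample count are all hypothetical choices for illustration.

```python
import math


def dither_correlation(cost, u0, amplitude=0.1, n=200):
    """Average product of a sine dither and the evaluation value over one period.

    A positive result means the evaluation value moves in phase with the
    dither (operating point right of the minimum); a negative result means
    antiphase (operating point left of the minimum).
    """
    dither = [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]
    values = [cost(u0 + d) for d in dither]
    mean = sum(values) / n  # remove the constant part of the evaluation value
    return sum(d * (v - mean) for d, v in zip(dither, values)) / n


def example_cost(u):
    """Assumed unimodal cost with its minimum at u = 2."""
    return (u - 2.0) ** 2 + 1.0


right_side = dither_correlation(example_cost, 3.0)  # operating point right of the minimum
left_side = dither_correlation(example_cost, 1.0)   # operating point left of the minimum
```

The sign of the correlation tells the controller which way to move: decrease the manipulated variable when the sign is positive and increase it when the sign is negative.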

Note that an analysis of the equations of the dither-driven extreme value control algorithm shows that the optimal control device is in fact estimating the average gradient at the operating point, such as point A in FIG. 2. It uses this gradient information to decide whether to decrease or increase the manipulated variable during the control operation (the direction in which the manipulated variable should move) and by how much (the amount by which it should move).
The optimal control device of this embodiment improves the stability and control performance of the extreme value control described above.
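A minimal sketch of a dither-driven search loop is given below to make the gradient-estimate-then-integrate idea concrete. It is a simplified variant chosen for clarity, not the algorithm of the embodiment: it assumes a static cost map without dynamics (so no phase lag compensation is needed), holds the operating point fixed during each dither period, and uses hypothetical names and tuning values (`extremum_seek`, `amplitude`, `gain`).

```python
import math


def extremum_seek(cost, u_start, amplitude=0.1, gain=0.2,
                  steps_per_period=20, periods=50):
    """Dither-based search for a minimum of a static cost map."""
    u_hat = u_start  # current estimate of the optimal manipulated variable
    sines = [math.sin(2 * math.pi * k / steps_per_period)
             for k in range(steps_per_period)]
    for _ in range(periods):
        # Modulation: apply u_hat plus the dither and record the evaluation value.
        values = [cost(u_hat + amplitude * s) for s in sines]
        # Demodulation: correlating with the dither over one full period
        # yields an estimate of the average gradient at u_hat.
        gradient = (2.0 / (amplitude * steps_per_period)
                    * sum(s * v for s, v in zip(sines, values)))
        # Integration: accumulate steps against the gradient (minimization).
        u_hat -= gain * gradient
    return u_hat


# Assumed cost curve with its minimum at u = 2.
found = extremum_seek(lambda u: (u - 2.0) ** 2 + 1.0, u_start=0.0)
```

The sign of the gradient estimate sets the direction of movement and its magnitude sets the step size, which mirrors the two decisions described above. When the process has dynamics, the evaluation value lags the dither, and the correlation step would no longer line up without some form of compensation.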

FIG. 3 is a block diagram schematically showing a configuration example of a real-time process optimal control system to which the optimal control device of the first embodiment is applied.
The real-time process optimal control system shown in FIG. 3 includes the optimal control device and a controlled object 200.

The controlled object 200 is any process in any plant that has a manipulated variable input U and a process output Y. In this embodiment, the controlled object 200 is, for example, the coagulant injection process of the water treatment plant shown in FIG. 1, and the manipulated variable input U is the movement speed target value (SV) in the coagulant injection process. The actual manipulated variable in the coagulant injection process is the PAC injection rate, but because this control system is a two-stage cascade as described above, the manipulated variable adjusted by extreme value control is the target value of the movement speed.
The water treatment plant is also equipped with various sensors that detect the values of the output Y. The output Y of the coagulant injection process is, for example, the coagulant (PAC) injection rate, the sludge concentration in the sludge basin, the sludge flow rate, the washing frequency of the grit chamber, the sludge concentration at the settling basin outlet, and the sludge concentration at the water purification basin outlet.

The optimal control device of this embodiment is, for example, a computing device that includes at least one processor and a memory in which a program executed by the processor is recorded. The memory of the optimal control device may be provided with a storage area (storage unit) for temporarily storing data (for example, time-series data of measured values and evaluation values) while the program is running. The optimal control device can implement the various functions described below by software or by a combination of software and hardware.

The optimal control device includes a process phase lag estimation unit 400, a process measurement value acquisition unit 110, a process evaluation value calculation unit 120, a demodulation dither signal generation unit 130, an evaluation function gradient estimation unit 140, a normalization signal generation unit 150, a gradient estimate normalization unit 160, an extreme value search unit (optimal manipulated variable adaptive adjustment unit) 170, a modulation dither signal generation unit 180, and a manipulated variable output unit 190. The process phase lag estimation unit 400 includes a phase lag parameter estimation unit 300 and a phase lag estimation unit 1100.

The process measurement value acquisition unit 110 acquires from the controlled object 200 the measurement information needed to calculate the evaluation quantity and the constraints.
As an example, when the operating cost, defined as the sum of the chemical cost, the sludge disposal cost, and the filter cleaning cost, is set as the evaluation function, the process measurement value acquisition unit 110 acquires, for example, the PAC injection rate, the sludge concentration in the sludge basin, the sludge flow rate, and the washing frequency of the grit chamber as measurement information. When a constraint is placed on the sludge concentration at the settling basin outlet or the water purification basin outlet, the process measurement value acquisition unit 110 also measures the constrained sludge concentration at a predetermined period and stores it as time-series data in a predetermined format. The process measurement value acquisition unit 110 can also acquire and store filter cleaning history data in a predetermined format.

The process evaluation value calculation unit 120 uses the information acquired by the process measurement value acquisition unit 110 (the measured values of the controlled process) to calculate the evaluation value of the preset evaluation function in real time.
The operating cost is defined as the sum of the chemical cost (PAC cost), the sludge disposal cost, and the filter cleaning cost.
The chemical cost can be calculated by multiplying the time-series data of the PAC injection amount by coefficients such as the chemical unit price and the dilution rate, and is thus obtained as time-series data.
The sludge disposal cost can be calculated by taking the product of the sludge concentration in the sludge basin and the sludge flow rate to obtain the amount of sludge generated, and multiplying that amount by the sludge disposal unit price. It is likewise obtained as time-series data.

Filter basin cleaning is normally performed when the filtration resistance exceeds a predetermined threshold, so the timing of each cleaning can be determined from the time-series data of the filtration resistance or from the cleaning history data, and the cleaning cost incurred at that timing can be obtained from the history data. Because a cleaning can be regarded as a cost attributable to the period between the previous cleaning and the next one, the cleaning cost can be converted into time-series data by distributing it evenly over that period.
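The even distribution of the cleaning cost can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `spread_cleaning_cost`, the one-hour step, and the cost figures are hypothetical and do not come from the embodiment.

```python
def spread_cleaning_cost(wash_times, wash_costs, step):
    """Spread each cleaning cost evenly over the interval since the previous cleaning.

    wash_times: ascending cleaning timestamps (assumed unit: hours).
    wash_costs: wash_costs[i] is the cost paid at wash_times[i] (index 0 is unused).
    Returns a list of (time, cost_per_step) samples.
    """
    series = []
    for i in range(1, len(wash_times)):
        start, end = wash_times[i - 1], wash_times[i]
        n = int(round((end - start) / step))      # number of samples in the interval
        per_step = wash_costs[i] / n              # equal share for every sample
        series.extend((start + k * step, per_step) for k in range(n))
    return series

# Hypothetical history: cleanings at t = 0 h, 4 h and 10 h.
wash_times = [0.0, 4.0, 10.0]
wash_costs = [0.0, 100.0, 120.0]
series = spread_cleaning_cost(wash_times, wash_costs, step=1.0)
```

The total of the distributed samples equals the total cleaning cost, so adding this series to the other cost series does not change the overall cost.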

This conversion is straightforward for past performance data. In real time, however, the timing of the next cleaning is not yet known, so the next cleaning timing may be estimated while monitoring the rate of change of the filtration resistance, and the cleaning cost may then be converted into real-time time-series data based on that estimate.

The operating cost is the sum of the individual costs converted into time-series data. The process evaluation value calculation unit 120 can therefore calculate the operating cost in real time using the costs obtained by the methods above.
For the water quality constraint, on the other hand, the turbidity at the sedimentation basin outlet and at the water purification basin outlet in FIG. 1 is acquired as time-series data, and a constraint condition on the turbidity is incorporated, for example, Tlim (turbidity upper limit) = 0.8 degrees or less.

Because extreme value control cannot handle constraint conditions directly, the constraints can be converted into a form that can be treated as an evaluation function, for example by the method of Non-Patent Document 5. Here, they are treated as an evaluation function using the penalty function approach, which is well known in the field of optimization. For example, the water quality cost is expressed by equation (1) below.

Wcost = max(0, a×(exp(T-Tlim)-1)) …………… (1)
Here, T is the measured turbidity, Tlim is the turbidity upper limit, and a is a design parameter greater than zero (a > 0).
The cost Wcost is the converted evaluation function. In the optimal control device of this embodiment, the conversion of equation (1) is applied to both the turbidity at the sedimentation basin outlet and the turbidity at the water purification basin outlet, and the sum of the cost Wcost for the sedimentation basin outlet turbidity and the cost Wcost for the water purification basin outlet turbidity is taken as the water quality cost.

Equation (1) means that the cost Wcost is 0 when the measured turbidity T is at or below the turbidity upper limit Tlim (T ≦ Tlim), and that Wcost rises rapidly (exponentially) once T exceeds Tlim (T > Tlim). It is a kind of so-called penalty function.
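The behavior of equation (1) can be checked with a short sketch. The function names and the value a = 1.0 are assumptions made for illustration; only Tlim = 0.8 is taken from the example in the text.

```python
import math

def water_quality_cost(turbidity, t_lim=0.8, a=1.0):
    """Penalty of equation (1): zero up to t_lim, exponential growth above it."""
    return max(0.0, a * (math.exp(turbidity - t_lim) - 1.0))

def total_water_quality_cost(t_sedimentation, t_purification, t_lim=0.8, a=1.0):
    """Sum of the penalties for the two outlets, as described for this embodiment."""
    return (water_quality_cost(t_sedimentation, t_lim, a)
            + water_quality_cost(t_purification, t_lim, a))

below = water_quality_cost(0.5)      # under the limit: no penalty
at_limit = water_quality_cost(0.8)   # exactly at the limit: no penalty
above = water_quality_cost(1.8)      # one degree over the limit: a * (e - 1)
```

Because the penalty is continuous at T = Tlim, the converted evaluation function has no jump at the constraint boundary.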

By adding this water quality cost to the operating cost described earlier, the total cost J can be defined as shown in equation (2) below.
Total cost J = chemical cost + sludge disposal cost + filter basin cleaning cost + water quality cost … (2)
In the flocculant injection control of the optimal control device of this embodiment, the evaluation function is set as described above. Depending on the problem setting, however, it may not be possible to separate the target plant (controlled object) 200 from the process evaluation value calculation unit 120.

For example, when extreme value control is applied in a wind power plant to maximize power generation by turning the wind turbine blades to follow the wind direction, the evaluation function J is the amount of power generated and the manipulated variable U is the rotation angle of the blades. In this example there is no particular need to define the plant output Y. In such problem settings the output Y and the evaluation function J are sometimes not distinguished, and the output Y is not necessarily defined and measured separately from the evaluation function J.

Even when the target plant (controlled object) 200 and the process evaluation value calculation unit 120 cannot be separated, extreme value control may still be applicable if the optimization evaluation function is defined individually, as in the optimal control device of this embodiment.
In either case, once the evaluation function is set appropriately, the evaluation quantity, which changes from moment to moment, can be obtained by measuring and calculating the value of the evaluation function in real time at a predetermined control period.

When the process evaluation value is used as the signal corresponding to the output Y(s) of the transfer function model of equation (6) or (7) below, which is derived in the phase lag parameter estimation unit 300 described later, the process evaluation value must be obtained by the process evaluation value calculation unit 120 before the parameter estimates are computed. In this case, the process evaluation value calculation unit 120 supplies the process evaluation value to the phase lag parameter estimation unit 300.

The evaluation function gradient estimation unit 140 computes an estimate of the gradient (rate of change) of the process evaluation value with respect to the manipulated variable.
FIG. 4 is a diagram schematically showing a configuration example of part of the optimal control device of the first embodiment. FIG. 4 shows only the configuration needed for the following description, and details of the other components are omitted.

The evaluation function gradient estimation unit 140 includes a high-pass filter (HPF) 141, a low-pass filter (LPF) 143, and an adder 142.
The evaluation value (time-series data) output from the process evaluation value calculation unit 120 is input to the high-pass filter (HPF) 141, which outputs the components of the evaluation value at or above a predetermined frequency. The high-pass filter 141 is introduced to force the extreme value (approximately) to 0 (zero), under the assumption that the unknown extreme value (local optimum) does not change, or changes only very slowly if it does. Providing the high-pass filter 141 in the evaluation function gradient estimation unit 140 improves control performance. The high-pass filter 141 is not essential to the theoretical structure of extreme value control and may be omitted.

The adder 142 receives the value output from the high-pass filter 141 and a demodulation dither signal (information on the estimated phase lag), which is described later. The adder 142 outputs the sum of its inputs to the low-pass filter 143.
The low-pass filter 143 outputs the components of the signal supplied from the adder 142 that are below a predetermined frequency. Theoretical analysis shows that the periodic average of the signal output from the low-pass filter 143 is proportional to the gradient of the evaluation function, so the output of the low-pass filter 143 can be regarded as the gradient of the evaluation function value (strictly, a signal proportional to the gradient). The gradient estimate carries information on, for example, the direction (sign, 1 or -1) and the magnitude of the gradient.
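The filter chain described above can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the embodiment: a static quadratic cost, first-order discrete filters, and hypothetical parameter values. In particular, the sketch combines the high-pass output with the demodulation signal by multiplication, as in classical perturbation-based extremum seeking, whereas the text describes element 142 as an adder.

```python
import math

def estimate_gradient(cost, u0, amp=0.1, omega=1.0, dt=0.01, t_end=200.0,
                      omega_h=0.2, omega_l=0.05):
    """Return a signal proportional to dJ/du at u0 (about amp/2 times the slope)."""
    j_avg = cost(u0)   # slow average of J, used to build the high-pass filter
    grad = 0.0         # low-pass filter state
    for k in range(int(t_end / dt)):
        dither = math.sin(omega * k * dt)
        j = cost(u0 + amp * dither)                # evaluation value under dithering
        j_avg += dt * omega_h * (j - j_avg)
        j_hp = j - j_avg                           # high-pass output (HPF 141)
        demod = j_hp * dither                      # ASSUMPTION: product, not sum
        grad += dt * omega_l * (demod - grad)      # low-pass output (LPF 143)
    return grad

def cost(u):
    return (u - 2.0) ** 2   # hypothetical evaluation function, minimum at u = 2

g_right = estimate_gradient(cost, u0=3.0)   # right of the minimum: positive slope
g_left = estimate_gradient(cost, u0=1.0)    # left of the minimum: negative slope
```

With these values the output settles near (amp/2) × dJ/du, so its sign gives the side of the extreme value on which the operating point lies.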

The normalization signal generation unit 150 generates a normalization signal. Its operation is the same as that of the regularization signal generation unit described in, for example, Patent Document 1, and various methods can be used. In the optimal control device of this embodiment, the "normalization signal" is a signal for converting the gradient estimate G(t) of the evaluation function into a signal Gn(t) that satisfies properties [1] to [4] below.

[1] G(t) = 0 ⇔ Gn(t) = 0 (Gn(t) is 0 if and only if G(t) is 0)
[2] G(t) is positive (negative) ⇔ Gn(t) is positive (negative) (G(t) and Gn(t) have the same sign)
[3] G(t) < ∞ ⇔ Gn(t) < ∞ (when G(t) is finite, Gn(t) also stays finite, and no division by zero or the like occurs)
[4] As G(t) → ∞, Gn(t) does not tend to ∞ but approaches a positive finite value. That is, there exists some 0 < k < ∞ such that Gn(t) → k.

The normalization signal generation unit 150 can use a known signal generation method, and any signal that satisfies the above properties can be adopted as the normalization signal.
For example, using the direction of the gradient (1 or -1) for extreme value control, rather than the gradient itself, makes it easier to tune the various parameters that raise the control performance (speed up convergence) of extreme value control.

The shape of the evaluation function, which expresses the relationship between the manipulated variable and the evaluation quantity, is unknown. In practice it is not necessarily a downward-convex function like the one shown in FIG. 2, and depending on the operating point the gradient may be extremely steep or extremely gentle. Because the shape of the evaluation function cannot be known in advance when the control is executed, extreme value control estimates the gradient of the evaluation function in real time.

For example, when the estimated gradient is gentle, the control action can be made strong, but when the estimated gradient is steep, a strong control action risks loss of stability. When extreme value control is based on the estimated gradient itself, the extreme value control parameters must therefore be tuned for the steepest gradient that is expected to occur. The control parameters then have to be set conservatively so as not to impair stability, and as a result it becomes difficult to improve control performance (increase the convergence speed).

To perform an extreme value search, however, it is not strictly necessary to estimate the gradient itself. In principle, the search can be carried out as long as the direction of the gradient, that is, its sign (whether the manipulated variable should be raised or lowered), is available. From this viewpoint, the following normalization signal is considered the most appropriate.
RS = 1/|G(t)| …………………… (3)
Here, RS denotes the normalization signal and |G(t)| is the absolute value of the gradient estimate of the evaluation function. With this normalization signal, the signal Gn(t) obtained by normalizing G(t) becomes the sign function of the gradient.

If this function is used as is, however, it is discontinuous at the origin of the plane that relates G(t) and Gn(t), and condition [1] above is no longer satisfied. It is therefore preferable to smooth equation (3) slightly. For example, introducing a regularization constant δ > 0 gives the following normalization signal.
RS = 1/(δ+|G(t)|) …………………… (4)
The normalization signal generation unit 150 outputs the normalization signal obtained from equation (4) to the gradient estimate normalization unit 160.
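As a quick check that equation (4) satisfies properties [1] to [4], the normalized gradient Gn(t) = RS × G(t) can be evaluated numerically. The function name and the value δ = 0.001 are assumptions chosen for illustration.

```python
def normalize_gradient(g, delta=1e-3):
    """Apply the normalization signal of equation (4): Gn = G / (delta + |G|)."""
    rs = 1.0 / (delta + abs(g))   # delta > 0 keeps the denominator away from zero
    return g * rs

zero = normalize_gradient(0.0)        # property [1]: zero maps to zero
positive = normalize_gradient(5.0)    # property [2]: sign is preserved
negative = normalize_gradient(-5.0)
huge = normalize_gradient(1e9)        # property [4]: bounded, approaches k = 1
```

For |G(t)| much larger than δ the result is close to the sign of the gradient, while near G(t) = 0 it passes through zero continuously.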

The gradient estimate normalization unit 160 normalizes the gradient estimate using the normalization signal generated by the normalization signal generation unit 150. That is, the gradient estimate normalization unit 160 applies, for example, the normalization signal of equation (4) to the gradient estimate G(t) produced by the evaluation function gradient estimation unit 140 to generate the normalized gradient estimate Gn(t). The normalized gradient estimate contains information on the direction (sign) of the gradient. The gradient estimate normalization unit 160 outputs the normalized gradient estimate Gn(t) to the extreme value search unit 170.

The extreme value search unit 170 includes an integrator 171 and a gain multiplication unit 172, and determines the direction and amount by which the manipulated variable should move by integrating the gradient information (the estimated rate of change).
The integrator 171 outputs the integral of the gradient estimate or of the normalized gradient estimate.
The gain multiplication unit 172 multiplies the value output from the integrator 171 by an integral gain K and outputs the result. The integral gain K is an important parameter that determines the convergence speed of the extreme value search. By introducing the phase lag compensation described later, K can be made larger than in conventional extreme value control, which in turn improves control performance.

The control algorithm formed by the integrator 171 and the gain multiplication unit 172 is essentially the same as the optimization algorithm known as the gradient method, and the integral gain K corresponds to the parameter called the learning rate in the gradient method.
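The correspondence with the gradient method can be made concrete with a discrete-time sketch. The following assumes a minimization problem, an exactly known gradient of a hypothetical quadratic cost, and illustrative values for K, δ, and the step size. None of these are specified in the text.

```python
def normalize(g, delta=0.1):
    """Normalized gradient in the form of equation (4)."""
    return g / (delta + abs(g))

def search(cost_gradient, u0, gain_k=0.5, dt=0.1, steps=200):
    """Integrate the normalized gradient and scale by K (minimization assumed)."""
    u = u0
    for _ in range(steps):
        g_n = normalize(cost_gradient(u))
        u -= gain_k * g_n * dt   # one Euler step of the integrator with gain K
    return u

# Hypothetical cost J(u) = (u - 2)^2, whose gradient is 2 (u - 2).
u_opt = search(lambda u: 2.0 * (u - 2.0), u0=5.0)
```

Each step moves the manipulated variable by at most K × dt regardless of how steep the cost is, which is why the normalized form is easier to tune than the raw gradient.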

The modulation dither signal generation unit 180 generates a modulation dither signal (second periodic signal).
The modulation dither signal may be any periodic signal, but it must have the same waveform as the signal of the demodulation dither signal generation unit 130 described later. From the viewpoint of parameter identification, a periodic dither signal that combines multiple frequencies, such as a rectangular wave, is preferable to a single-frequency sine wave. When a sine wave corresponding to the demodulation dither signal described later (equation (9)) is used, the modulation dither signal generation unit 180 uses the following modulation dither signal.
M(t) = A sin(ωt) …………………… (5)
Here, A is the amplitude of the dither signal and is an adjustable parameter. The modulation dither signal is an essential element of dither-driven extreme value search control.

The manipulated variable output unit 190 includes an adder and outputs to the controlled object 200 a signal (manipulated variable command signal) obtained by adding the output of the extreme value search unit 170 and the modulation dither signal.
In the optimal control device of this embodiment, using the sum with the modulation dither signal as the manipulated variable command signal allows the manipulated variable to be driven periodically by the modulation dither signal while the extreme value is being searched for.
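How the dithered command and the search interact can be seen in a toy closed loop. This sketch is not the embodiment's implementation: it assumes a static plant with a hypothetical quadratic cost, first-order filters, demodulation by multiplication as in classical extremum seeking, no normalization, and illustrative parameter values.

```python
import math

def run_extremum_search(cost, u_start, amp=0.2, omega=5.0, gain_k=0.5,
                        omega_h=1.0, omega_l=1.0, dt=0.01, t_end=100.0):
    u_hat = u_start          # output of the extreme value search
    j_avg = cost(u_start)    # slow average inside the high-pass filter
    grad = 0.0               # low-pass filter state (gradient estimate)
    for k in range(int(t_end / dt)):
        dither = math.sin(omega * k * dt)
        u = u_hat + amp * dither          # command = search output + M(t), eq. (5)
        j = cost(u)                       # ASSUMPTION: plant without dynamics
        j_avg += dt * omega_h * (j - j_avg)
        j_hp = j - j_avg                  # high-pass filtered evaluation value
        grad += dt * omega_l * (j_hp * dither - grad)
        u_hat -= gain_k * grad * dt       # integrator with gain K (minimization)
    return u_hat

u_final = run_extremum_search(lambda u: (u - 2.0) ** 2, u_start=5.0)
```

In this setting the search output drifts from 5 toward the minimum at u = 2 while the command keeps oscillating with amplitude A around it.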

In practice, when the controlled object has dynamics, there is always a time delay caused by the process dynamics (a phase lag when the manipulated variable is driven by a sine wave) between the time-series data of the manipulated variable and the time-series data of the evaluation value.

FIG. 5 is a diagram schematically showing an example of the relationship between the time-series data of the manipulated variable and the time-series data of the evaluation quantity when the controlled object has dynamics.
When the controlled object has dynamics, for example in FIG. 5, even when the manipulated variable during control operation is to the right of the extreme value, the manipulated variable and the evaluation quantity do not move in phase. Instead, the evaluation quantity changes with a phase lag relative to the manipulated variable. Likewise, even when the manipulated variable is to the left of the extreme value, the two do not move in exactly opposite phase, and the evaluation quantity again changes with a phase lag relative to the manipulated variable.

For example, when the phase lag of the evaluation quantity relative to the manipulated variable reaches 90 degrees, the manipulated variable and the evaluation quantity are in opposite phase when the manipulated variable during control operation is to the right of the extreme value, and in phase when it is to the left.
If optimization by extreme value control is performed while the evaluation quantity lags the manipulated variable in this way, the manipulated variable may take a long time to converge.

To keep the phase lag of the evaluation quantity relative to the manipulated variable as small as possible, the controlled object must be a process fast enough to be regarded as almost static. If the controlled object responds quickly enough to be treated as an approximately static process, the phase lag can be made small. An actual controlled object, however, is not necessarily a process that responds that quickly.

Noting that a fast or slow response is a relative notion, an actual controlled object can be treated as approximately static by adjusting the frequency of the dither signal. Because the response speed of an actual controlled object is a characteristic inherent to it, the dither frequency must be designed low (slow) enough that the response can be judged sufficiently fast by comparison. In this way, even though the response delay of the controlled object is given, it can be made sufficiently small when viewed as a phase lag. A low dither frequency, however, slows the search, which gives rise to a trade-off between "stability" and "control performance" in extreme value control.

The optimal control device of this embodiment therefore performs phase lag compensation as follows.
The phase lag parameter estimation unit 300 computes estimates of parameters related to the phase lag in order to obtain phase lag information of the process evaluation value relative to the manipulated variable (= moving speed target value). The phase lag parameter estimation unit 300 may perform the parameter estimation either offline or online.

In the optimal control device of this embodiment, a linear transfer function model such as equation (6) below can be assumed as a suitable model for defining the phase lag in the phase lag parameter estimation unit 300.
Y(s) = G(s)U(s), G(s) = ((b_m s^m + … + b_1 s + b_0)/(a_n s^n + … + a_1 s + a_0))×exp(-Ls) …………… (6)
Here, s is the Laplace operator; a_i (i = 0, 1, 2, …, n) and b_i (i = 0, 1, 2, …, m) are the coefficients of the denominator and numerator polynomials of the transfer function G(s), respectively; exp(-Ls) is the transfer function of the dead time (delay time); and L is the delay time. U(s) is the Laplace transform of the manipulated variable, and Y(s) is the Laplace transform of the process measurement value.

Because the process measurement values can include values for several items, the phase lag parameter estimation unit 300 may use a linear transfer function model corresponding to equation (6) for each process measurement value. In general, the response of a control system is rate-limited by its slowest element, so in this case, for example, the Laplace transform of the water purification basin turbidity may be used as a representative Y(s). Alternatively, the process measurement values may first be converted into the process evaluation value, and the Laplace transform of the process evaluation value may be used as Y(s).

Once U(s) and Y(s) are defined, the parameters of the transfer function model of equation (6), namely a_i (i = 0, 1, 2, …, n), b_i (i = 0, 1, 2, …, m), and L, can be identified. Equation (6) is a general continuous-time linear transfer function model, but because the phase lag parameter estimation unit 300 does not need static information about the process, the static gain may be fixed at 1 in advance. In that case, the data corresponding to U(s) and Y(s) must be normalized beforehand so that the static gain from U(s) to Y(s) is 1. In addition, a_0 = b_0 must be set in advance so that the gain of equation (6) is 1.

When the phase lag parameter estimation unit 300 performs parameter estimation offline, these parameters can be identified by applying system identification methods well known in control theory to previously accumulated time-series data corresponding to U(s) and Y(s).

When the phase lag parameter estimation unit 300 performs parameter estimation online, these parameters can be identified by borrowing, for example, the concept of an adaptive observer known from adaptive control. For example, when the controlled object 200 is strongly excited (driven) by disturbances, an adaptive observer can be used as is.

When a system identification method or an adaptive observer is applied, it is more common to use a discrete transfer function model obtained by discretizing equation (6) than to use equation (6) directly. The conditions under which parameters can be correctly identified (identifiability conditions) are known for these theories, so data that satisfy the identifiability conditions must be used.
Specifically, in the offline case, it is preferable to collect in advance data in which the manipulated variable (= moving speed target value SV) is varied using a signal called an M-sequence (MLS).
As described above, the phase lag parameter estimation unit 300 can systematically identify the coefficients of the transfer function model based on system identification theory or adaptive control theory.
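For reference, an M-sequence (maximum length sequence) test signal of the kind mentioned above can be generated with a linear feedback shift register. The sketch below uses a 4-bit register with feedback taps (4, 3), which gives a period of 15. The register length, the ±1 levels, and the function name are assumptions chosen for illustration, and other register lengths require their own primitive tap sets.

```python
def m_sequence(length, nbits=4, taps=(4, 3)):
    """Generate `length` bits from a Fibonacci LFSR (period 2**nbits - 1 for valid taps)."""
    state = [1] * nbits              # any non-zero initial state works
    out = []
    for _ in range(length):
        out.append(state[-1])        # output the last register bit
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]   # shift and insert the feedback bit
    return out

bits = m_sequence(30)
# Map bits to two hypothetical manipulated-variable levels around the operating point.
levels = [1.0 if b else -1.0 for b in bits]
```

One period contains eight ones and seven zeros, so the two-level signal is almost balanced around the operating point while still exciting a wide band of frequencies.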

It is also widely known in the fields of system identification and adaptive control that the identifiability conditions become stricter as the number of parameters to be identified increases. In process control fields such as flocculant injection control, applying large perturbations to the plant is often undesirable, and in practice a step response test is frequently the most that can be carried out. This is a strong constraint that cannot be ignored when dealing with an actual controlled object. Because of such practical constraints, simple approaches such as step response tests and PID control tend to be preferred at actual sites over system identification and adaptive control.

For the phase lag parameter estimation unit 300 as well, using the general form of equation (6) is often not practical, and a model such as equation (7) can be adopted as a more practical alternative.
G(s) = exp(-Ls)/(Ts+1) …………… (7)
The model of equation (7) is a special case of equation (6), namely a first-order lag plus dead time model. Equation (7) contains only two parameters, the delay time L and the time constant T. Both can be obtained directly from a step response test, so they are easy to acquire in practice.
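One common way to read L and T off a unit-gain step response is the two-point method: for a first-order lag plus dead time response, the output reaches 1 - exp(-1/3) (about 28.3%) at t = L + T/3 and 1 - exp(-1) (about 63.2%) at t = L + T. The sketch below applies this to synthetic data. The method choice, the function names, and the values L = 2 and T = 5 are assumptions and are not prescribed by the embodiment.

```python
import math

def crossing_time(times, values, level):
    """Linearly interpolated time at which the response first reaches `level`."""
    for k in range(1, len(values)):
        if values[k - 1] < level <= values[k]:
            frac = (level - values[k - 1]) / (values[k] - values[k - 1])
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("response never reaches the requested level")

def fit_fopdt(times, values):
    """Estimate (L, T) from a normalized (unit-gain) step response."""
    t1 = crossing_time(times, values, 1.0 - math.exp(-1.0 / 3.0))  # t = L + T/3
    t2 = crossing_time(times, values, 1.0 - math.exp(-1.0))        # t = L + T
    time_constant = 1.5 * (t2 - t1)
    dead_time = t2 - time_constant
    return dead_time, time_constant

# Synthetic step response with assumed true values L = 2, T = 5.
DT = 0.01
times = [k * DT for k in range(4000)]
values = [0.0 if t < 2.0 else 1.0 - math.exp(-(t - 2.0) / 5.0) for t in times]
dead_time, time_constant = fit_fopdt(times, values)
```

On measured data the response would first have to be normalized to unit gain and smoothed, since noise can trigger an early level crossing.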

In the optimal control device of this embodiment, the time-series data of the manipulated variable is multiplied by the transfer function of equation (6) or (7), which delays its phase. This eliminates the phase difference between the time-series data of the phase-lagged manipulated variable, which incorporates the phase lag information, and the time-series data of the evaluation value.

Note that equation (7) is a special case of the typical model used for tuning PID control. PID tuning normally uses the model K exp(-Ls)/(Ts+1) rather than exp(-Ls)/(Ts+1), and also identifies a further parameter K called the process gain. K is omitted from equation (7) because, as described later, static information about the plant has no bearing on the phase lag.

When the phase lag parameter estimation unit 300 computes the estimates of the delay time L and the time constant T online, the adaptive observer approach described above may be reused.
When computing the estimates of these two parameters, the phase lag parameter estimation unit 300 can make use of information on the dither signal. For example, when a rectangular wave rather than a sine wave is used as the dither signal, each transition of the rectangular wave can be regarded as a single step, so the phase lag parameter estimation unit 300 can identify the delay time L and the time constant T by directly applying the step response approach.
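As an illustration of the step response approach, the following is a minimal sketch that recovers L and T from a recorded step response. The text does not say which identification rule is used. The sketch assumes the common two-point rule based on the times at which the normalized response reaches 1 - e^(-1/3) (about 28.3%) and 1 - e^(-1) (about 63.2%). The function names and the synthetic test data are hypothetical.

```python
import math


def fopdt_step(t, L, T):
    """Unit-step response of exp(-L s) / (T s + 1) at time t."""
    return 0.0 if t < L else 1.0 - math.exp(-(t - L) / T)


def first_crossing(times, values, level):
    """Linearly interpolated time at which `values` first reaches `level`."""
    for k in range(1, len(values)):
        if values[k - 1] < level <= values[k]:
            frac = (level - values[k - 1]) / (values[k] - values[k - 1])
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("level not reached in the recorded response")


def identify_fopdt(times, values):
    """Two-point estimate of (L, T); assumes the record starts at the step
    and ends after the response has settled."""
    y0, y_inf = values[0], values[-1]
    norm = [(v - y0) / (y_inf - y0) for v in values]
    t28 = first_crossing(times, norm, 1.0 - math.exp(-1.0 / 3.0))  # t = L + T/3
    t63 = first_crossing(times, norm, 1.0 - math.exp(-1.0))        # t = L + T
    T = 1.5 * (t63 - t28)
    L = t63 - T
    return L, T


# Synthetic data standing in for one edge of a rectangular dither signal.
dt = 0.001
times = [k * dt for k in range(10000)]
values = [fopdt_step(t, 0.67, 0.67) for t in times]
L_hat, T_hat = identify_fopdt(times, values)
```

With measured data, noise would normally call for filtering or averaging over several edges of the rectangular wave before the crossing times are read off.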

When a sine wave is used as the dither signal, for example, identifiability deteriorates and it may be difficult to obtain the parameters directly. In that case, the phase lag parameter estimation unit 300 can change the frequency of the sinusoidal dither signal in two steps (or more), directly estimate the phase difference between the two time series corresponding to U(s) and Y(s) at each frequency, and back-calculate the delay time L and the time constant T.

The phase of a first-order lag plus dead time model is expressed as a function of the delay time L and the time constant T, so once two phase values are available, L and T can be obtained by back-calculation. When three or more frequencies are used, L and T can be obtained by parameter optimization under a criterion such as error minimization.
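The back-calculation from two phase values can be sketched as follows, assuming the first-order lag plus dead time phase formula φ = -ωL - tan⁻¹(ωT) and phases that are already unwrapped (not folded into ±π). Eliminating L between the two measurements leaves a single equation in T that increases monotonically when ω2 > ω1, so plain bisection is enough. The function names and the search bound `T_max` are hypothetical.

```python
import math


def fopdt_phase(omega, L, T):
    """Phase (rad) of exp(-L s) / (T s + 1) at angular frequency omega."""
    return -omega * L - math.atan(omega * T)


def solve_L_T(omega1, phi1, omega2, phi2, T_max=1.0e4, iters=200):
    """Recover (L, T) from unwrapped phases measured at omega1 < omega2."""
    if not 0.0 < omega1 < omega2:
        raise ValueError("require 0 < omega1 < omega2")
    ratio = omega2 / omega1
    target = phi2 - ratio * phi1

    def residual(T):
        # Obtained by eliminating L from the two phase equations.
        return ratio * math.atan(omega1 * T) - math.atan(omega2 * T) - target

    lo, hi = 0.0, T_max
    if residual(lo) > 0.0 or residual(hi) < 0.0:
        raise ValueError("phases are inconsistent with the assumed model")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    T = 0.5 * (lo + hi)
    L = (-phi1 - math.atan(omega1 * T)) / omega1
    return L, T


# Synthetic check: generate two phases from known parameters and invert them.
phi_a = fopdt_phase(1.0, 0.67, 0.67)
phi_b = fopdt_phase(2.0, 0.67, 0.67)
L_hat, T_hat = solve_L_T(1.0, phi_a, 2.0, phi_b)
```

With three or more frequencies, the same residual could be squared and summed over all frequency pairs and minimized instead of being solved exactly.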

On the other hand, when a process model of the controlled process (controlled process model) or a process simulator already exists, it is not always necessary to identify a linear transfer function such as equation (6) or its special case, the first-order lag plus dead time model of equation (7); the process simulator can be used directly.

In the optimal control device of this embodiment, when a coagulation process model or coagulation process simulator is available in which PAC injection at a water purification plant changes quantities such as the moving speed and the turbidity at the outlet of the sedimentation basin or clear water basin, identifying the various parameters contained in the process simulator corresponds to the phase lag parameter estimation process.

Various methods can be used to estimate the parameters contained in the process simulator. When the process simulator is a nonlinear model, an optimization problem may be solved that minimizes the error between the actual process output and the simulator output for the same input, using for example the squared error as the criterion, so that the input-output behavior of the simulator matches that of the actual process as closely as possible. Alternatively, a method known as data assimilation may be applied. In this way, the phase lag parameter estimation unit 300 can identify the parameters of the controlled process model.

As described above, the phase lag parameter estimation unit 300 can identify the phase lag parameters either offline or online. The phase lag parameter estimation unit 300 supplies the estimates of the identified parameters to the phase lag estimation unit 1100.

The phase lag estimation unit 1100 can determine the phase lag from the manipulated variable to the evaluation value using the parameter estimates supplied from the phase lag parameter estimation unit 300.
In the optimal control device of this embodiment, the phase lag estimation unit 1100 calculates the phase lag of a transfer function model, for example that of equation (6) or (7), so that the phase lag information can be incorporated into the demodulation dither signal. The phase lag can be calculated by directly applying the widely known phase calculation used, for instance, when drawing the Bode diagram of a linear transfer function model. For simplicity, equation (8) gives the specific phase lag formula for the transfer function model of equation (7).
$$\angle G(s) = -\omega L - \tan^{-1}(\omega T) \tag{8}$$
Here, ω is the frequency. Because the optimal control device of this embodiment forcibly moves the manipulated variable with a dither signal, when a sine wave is used as the dither signal, for example, its frequency can be used as the value of ω.
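For reference, the phase formula follows from evaluating the first-order lag plus dead time model on the imaginary axis. The short derivation below assumes that ω is an angular frequency in rad/s and that the model is exp(-Ls)/(Ts+1) as quoted in the text. The numerical values are illustrative only.

```latex
\begin{align*}
G(j\omega) &= \frac{e^{-j\omega L}}{1 + j\omega T}, \\
\angle G(j\omega) &= \angle e^{-j\omega L} - \angle (1 + j\omega T)
                   = -\omega L - \tan^{-1}(\omega T).
\end{align*}
% Illustrative numbers: L = T = 0.67 s and omega = 1 rad/s give
% -0.67 - \tan^{-1}(0.67) \approx -0.67 - 0.59 = -1.26 rad.
```

A positive process gain K would multiply G(jω) by a positive real number and leave the phase unchanged, which is consistent with K being omitted from the model.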

When the phase lag estimation unit 1100 calculates the phase lag using the transfer function model of equation (6), the calculation is more complicated than with the model of equation (7), but the phase lag can still be calculated from the frequency ω and the parameter values of the linear transfer function model.
When representation by a linear transfer function model is not appropriate, the phase can instead be estimated in advance by directly using a process simulator (a simulation model, written in terms of physical and chemical equations, that simulates the controlled process). Because extreme value control requires some evaluation function as the output, the phase estimation in this case needs a simulation model that combines the process simulator (the simulation model of the controlled process) with an evaluation function model.
For example, a process simulator describing the treatment can be built for a water purification process or a sewage treatment process, and its output is, for example, water quality. When applying extreme value control, it is necessary to consider not only water quality but also the overall balance between good water quality and the cost of achieving it. In this embodiment, therefore, a quantity such as the total cost is defined as the evaluation function model used to assess optimality. A model composed of the process simulator and the evaluation function model can thus be used directly to perform the phase estimation for extreme value control.

When a process simulator is used, the relationship between frequency and phase generally cannot be expressed explicitly by a formula such as equation (8). Instead, sine waves of many different frequencies are input to the model composed of the process simulator and the evaluation function model, and the phase lag is evaluated from the shift between the waveforms of the simulation input (manipulated variable) and output (evaluation function value). The phase lag of the output waveform is measured for each input sine wave, and an approximate function that expresses the phase as a function of the frequency ω, corresponding to equation (8), is prepared in the phase lag estimation unit 1100.
In other words, because phase is defined with respect to a frequency ω, the input frequency ω is varied, the amount by which the output phase lags at each frequency is evaluated by simulation, and a table representing the resulting function, with frequency ω on the horizontal axis and phase lag on the vertical axis, is created in advance. The phase lag estimation unit 1100 can calculate the estimate of the phase lag using this function (table).
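The frequency sweep can be sketched as follows. Since no actual process simulator is given in the text, an Euler-discretized first-order lag plus dead time model stands in for the combined simulator and evaluation function model. The phase at each frequency is measured by correlating the output with in-phase and quadrature sinusoids, which is an assumed measurement method, not one stated in the text. All names and numerical values are hypothetical.

```python
import math
from collections import deque


def simulate(u_seq, dt, T=0.67, L=0.67):
    """Stand-in simulator: Euler-discretized first-order lag plus dead time."""
    buf = deque([0.0] * int(round(L / dt)))  # dead-time buffer
    y, out = 0.0, []
    for u in u_seq:
        buf.append(u)
        y += dt * (buf.popleft() - y) / T
        out.append(y)
    return out


def measure_phase(omega, dt=0.001, settle_periods=3, meas_periods=4):
    """Phase (rad) of the output relative to a sin(omega t) input.
    atan2 wraps to (-pi, pi], so larger lags would need unwrapping."""
    n_per = int(round(2.0 * math.pi / omega / dt))
    n_total = (settle_periods + meas_periods) * n_per
    u = [math.sin(omega * k * dt) for k in range(n_total)]
    y = simulate(u, dt)
    in_phase = quadrature = 0.0
    for k in range(settle_periods * n_per, n_total):
        t = (k + 1) * dt  # y[k] is the state after the k-th Euler step
        in_phase += y[k] * math.sin(omega * t)
        quadrature += y[k] * math.cos(omega * t)
    return math.atan2(quadrature, in_phase)


def build_phase_table(omegas):
    """Table of (frequency, phase) pairs sorted by frequency."""
    return [(w, measure_phase(w)) for w in sorted(omegas)]


def lookup_phase(table, omega):
    """Linear interpolation in the table, clamped at both ends."""
    if omega <= table[0][0]:
        return table[0][1]
    for (w0, p0), (w1, p1) in zip(table, table[1:]):
        if omega <= w1:
            return p0 + (p1 - p0) * (omega - w0) / (w1 - w0)
    return table[-1][1]


table = build_phase_table([0.5, 1.0, 1.5, 2.0])
phi_at_1_25 = lookup_phase(table, 1.25)
```

For a nonlinear simulator the measured phase also depends on the operating point and the dither amplitude, so the sweep would be run at the conditions under which the controller is expected to operate.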

Alternatively, the phase lag estimation unit 1100 may analytically derive, from the controlled process model, a linear approximation model in the vicinity of the actual manipulated variable, express it as a linear transfer function, and calculate the phase lag by a simulation that applies the phase formula of the resulting linear transfer function model.
For example, a sewage treatment process simulator is described by nonlinear differential equations. A linear approximation model of these equations can be derived in the vicinity of an operating point (the state of the water quality and of the manipulated variable at the time of interest). The linear approximation model is a linear differential equation that approximates the original nonlinear one, and in this embodiment the term refers to that linear differential equation. A linear differential equation can also be written in the form of a linear transfer function such as equation (6). Once the linear approximation model is written as a linear transfer function, a phase lag formula corresponding to equation (8) can be obtained analytically. The phase lag estimation unit 1100 can calculate the estimate of the phase lag with the formula obtained in this way.
The phase lag estimation unit 1100 supplies the calculated phase lag value to the demodulation dither signal generation unit 130.

The demodulation dither signal generation unit 130 generates the first periodic signal (demodulation dither signal) required in extreme value control to estimate the gradient (first derivative, Jacobian) of the evaluation function.
A sine wave is typically used as the dither signal, but any periodic signal, such as a rectangular or triangular wave, may be used. The demodulation dither signal has the same period and the same waveform shape as the modulation dither signal.
Equation (9) shows the demodulation dither signal when a sine wave is used as the dither signal.
$$D(t) = \sin(\omega t + \phi) \tag{9}$$

Here, ω is the frequency of the dither signal, and φ is the estimate of the phase lag calculated by the phase lag estimation unit 1100. That is, when the phase lag parameter estimation unit 300 uses the model of equation (7), φ corresponds to equation (8) and can be expressed by equation (10) below.
$$\phi := -\omega L - \tan^{-1}(\omega T) \tag{10}$$
In the optimal control device of this embodiment, the phase lag information is thus incorporated into the demodulation dither signal generated by the demodulation dither signal generation unit 130.
The demodulation dither signal generation unit 130 supplies the demodulation dither signal containing the phase lag φ to the evaluation function gradient estimation unit 140.
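Why the phase term matters can be seen from a small numerical sketch. It assumes that, near the operating point, the dither component of the evaluation value is a sinusoid lagged by the plant phase, and that gradient estimation amounts to averaging the product of that component with the demodulation signal. The averaged product equals 0.5·cos(plant phase minus demodulation phase), so an uncompensated demodulation signal attenuates the gradient estimate and flips its sign once the lag exceeds 90 degrees. The function names and numerical values are hypothetical.

```python
import math


def phase_lag(omega, L, T):
    """Phase lag (rad) of the first-order lag plus dead time model."""
    return -omega * L - math.atan(omega * T)


def demod_dither(t, omega, phi):
    """Sinusoidal demodulation dither signal D(t) = sin(omega t + phi)."""
    return math.sin(omega * t + phi)


def averaged_product(omega, phi_plant, phi_demod, n=10000):
    """Average over one dither period of the lagged response times D(t)."""
    dt = 2.0 * math.pi / omega / n
    total = 0.0
    for k in range(n):
        t = k * dt
        response = math.sin(omega * t + phi_plant)  # dither seen in the evaluation value
        total += response * demod_dither(t, omega, phi_demod)
    return total / n


omega = 2.0                               # rad/s, illustrative
phi = phase_lag(omega, L=0.67, T=0.67)    # lag larger than 90 degrees here
compensated = averaged_product(omega, phi, phi)
uncompensated = averaged_product(omega, phi, 0.0)
```

In this example the compensated average stays at 0.5, while the uncompensated one is negative, meaning the estimated gradient would point in the wrong direction.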

FIG. 6 is a diagram schematically showing an example of the time-series data of the manipulated variable and of the evaluation value before and after phase lag compensation.
By having the evaluation function gradient estimation unit 140 calculate the gradient of the evaluation function using the demodulation dither signal containing the phase lag φ as described above, the phase difference between the time-series data u'(t) of the phase-lagged manipulated variable and the time-series data y'(t) of the evaluation value can be made substantially zero.

Although a sine wave is used as an example of the dither signal in the optimal control device of this embodiment, a periodic signal such as a rectangular or triangular wave can be used instead. In that case, the value obtained from equation (10) cannot be used as the phase lag φ as it is. The phase lag compensation of equation (9) is a correction that makes the phase difference between the demodulation dither signal of equation (9) and the modulation dither signal of equation (5) zero. It therefore suffices to determine what fraction of the dither period the phase lag φ represents and to use, as the demodulation dither signal, the modulation dither signal shifted by that fraction of the period.
To carry out this compensation mechanically, the demodulation dither signal generation unit 130 may, for example, expand the periodic signal such as a rectangular wave into a Fourier series of sine waves and substitute a phase lag corresponding to equation (10) into each term.
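The Fourier series approach can be sketched for a unit rectangular wave, whose series contains only odd harmonics with amplitudes 4/(πk). Each harmonic kω receives the phase lag of the first-order lag plus dead time model evaluated at kω. This sketch shifts phases only and does not model the amplitude attenuation of each harmonic. The truncation at 200 harmonics is an arbitrary choice, and the function names are hypothetical.

```python
import math


def phase_lag(omega, L, T):
    """Phase lag (rad) of the first-order lag plus dead time model."""
    return -omega * L - math.atan(omega * T)


def square_demod(t, omega, L, T, n_harmonics=200):
    """Truncated Fourier series of a unit rectangular wave in which every
    odd harmonic k*omega is shifted by the model phase lag at k*omega."""
    total = 0.0
    for m in range(n_harmonics):
        k = 2 * m + 1
        total += math.sin(k * omega * t + phase_lag(k * omega, L, T)) / k
    return 4.0 / math.pi * total


omega = 2.0
quarter = 2.0 * math.pi / omega / 4.0

# Without any lag the series reproduces the plateau of the rectangular wave.
plateau = square_demod(quarter, omega, L=0.0, T=0.0)

# With dead time only (T = 0), every harmonic is delayed by the same time L,
# so the result is the original wave shifted by L.
shifted = square_demod(quarter + 0.3, omega, L=0.3, T=0.0)
```

When T is nonzero, the harmonics are delayed by different amounts, so the compensated waveform is no longer a pure time shift of the rectangular wave.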

Next, the effect obtained by performing the above phase lag compensation in the optimal control device of this embodiment is described.
Regarding the relationship between stability and control performance in extreme value control, stability analyses of extreme value control, for example, have pointed out that the parameters must be set appropriately to guarantee stability. Qualitatively, they show that stability is lost unless the control parameters (the amplitude and frequency of the dither signal, the integrator gain (integral gain), the cutoff frequencies of the low-pass and high-pass filters, and so on) are made sufficiently small.

Setting the control parameters "sufficiently small" means making the control action "slow" and "weak". In other words, stability and control performance are in a trade-off: maintaining stability inherently requires sacrificing control performance, and conversely, setting the parameters large to improve control performance degrades stability and raises the risk that the control fails.

In extreme value control by the optimal control device of this embodiment, there is a phase difference φ between the modulation dither signal of equation (5) and the demodulation dither signal of equation (9), which compensates for the phase lag of the evaluation value time series relative to the manipulated variable time series. This phase lag compensation makes it possible to set the frequency ω of the dither signal higher (that is, its period shorter) and the integral gain K larger, which improves the control performance (convergence speed) of the extreme value search.

FIGS. 7 to 10 are diagrams for explaining an example of the relationship between control stability and control performance when the frequency of the periodic signal used for extreme value control is changed.
In the coagulant injection process, the response from the target process to the turbidity at the sedimentation basin outlet or the clear water basin outlet is delayed by several hours. The figures show simulation results for the time variation of the target value, the sedimentation basin turbidity, the PAC injection amount, and the total cost when, for a target process whose response is delayed on the order of several hours, the manipulated variable is driven by dither signals with periods of 1500, 1000, and 500 minutes without the above phase lag compensation.

According to FIGS. 7 to 10, with a dither period of 1500 minutes all the evaluation values converged, but it took a long time for them to settle near the optimum. With a period of 1000 minutes the time to settle near the optimum was shorter than with 1500 minutes, but with a period of 500 minutes the evaluation value diverged instead of converging to the optimum.

FIGS. 11 to 13 are diagrams for explaining an example of the effect of the optimal control device of the embodiment.
They show an example of the simulated time variation of the manipulated variable for the optimal control device of this embodiment, with plant time constant T = 0.67 s, plant dead time L = 0.67 s, optimal manipulated variable U* = 2, and the dither period set to two to five times the sum (T + L) of the plant time constant and the plant dead time.
FIGS. 11 to 13 also show an example of the simulated time variation of the manipulated variable for the optimal control device of a first comparative example, which operates under the same conditions as the optimal control device of this embodiment except that phase lag compensation is not applied.

According to the simulation results, with the optimal control device of this embodiment, the shorter the dither period, the shorter the time for the manipulated variable to converge to the optimum. With the optimal control device of the first comparative example, in contrast, shortening the dither period tended to lengthen the time to convergence, and when the dither period was set to 2.51 s ((T + L) × 2), the manipulated variable diverged and did not converge to the optimum.

As described above, the optimal control device of this embodiment introduces a function that compensates for the phase difference between the manipulated variable and the process evaluation value, which makes it possible to improve the performance (convergence speed) of extreme value control. This mitigates the degradation of extreme value search performance caused by slow convergence.

FIGS. 14 to 17 are diagrams for explaining another example of the effect of the optimal control device of the embodiment.
They show an example of simulation results, for the optimal control device of a third comparative example and that of this embodiment, when a disturbance (daily variation of the inflow) is present.
FIGS. 14 and 15 show an example of the simulation result of extreme value control by the optimal control device of the third comparative example, together with an example of the simulation result when the target value SV is held constant. The optimal control device of the third comparative example uses a dither period of 1500 minutes (close to 1440 minutes, that is, one day) and computes the target value by extreme value control without phase lag compensation.
The simulation results for the third comparative example show that the control becomes unstable under the influence of the daily variation of the inflow.

FIGS. 16 and 17 show an example of the simulation result of extreme value control by the optimal control device of this embodiment, together with an example of the simulation result when the target value SV is held constant. The optimal control device of this embodiment uses a dither period of 900 minutes and computes the target value by extreme value control with phase lag compensation.
In contrast to the third comparative example, the simulation results for the optimal control device of this embodiment show that the influence of the daily variation of the inflow (disturbance) is suppressed and the control is stable.

As described above, this embodiment overcomes the difficulty of reconciling stability and control performance (convergence speed) in extreme value control by means of phase compensation. It can thus provide an optimal control device, an optimal control method, and a computer program that search for a (local) optimum while keeping the control stable and raising control performance as far as possible.

Furthermore, with the optimal control device of this embodiment, the means for phase lag compensation can be configured simply by carrying out a simple response test, such as the step response test used when tuning the PID control widely employed in industry, so an extreme value control system that is easy to put into practice can be built.
In addition, providing the optimal control device of this embodiment with a configuration that normalizes the gradient estimate makes it possible to build an extreme value control system with a faster convergence speed.
Moreover, when the phase lag parameter estimation unit 300 calculates the parameter estimates online, a slight modification to how the dither signal is applied makes it possible to build a fast-converging extreme value control system that does not even require tests such as a step response test.

FIG. 18 is a diagram schematically showing another configuration example of part of the optimal control device of the first embodiment.
As shown in FIG. 18, the normalization signal generation unit 150 and the gradient estimate normalization unit 160 are not essential and may be omitted. In that case, the output value of the evaluation function gradient estimation unit 140 is supplied to the extreme value search unit 170.
Even in a configuration that omits the normalization signal generation unit 150 and the gradient estimate normalization unit 160, it is possible to provide an optimal control device, an optimal control method, and a computer program that search for a (local) optimum while keeping the control stable and raising control performance as far as possible.

Next, an optimal control device, an optimal control method, and a computer program according to a second embodiment are described in detail with reference to the drawings.
In the following description, components that are the same as in the first embodiment described above are given the same reference numerals, and their description is omitted.
FIG. 19 is a diagram schematically showing a configuration example of the optimal control device of the second embodiment.
The optimal control device of this embodiment differs from the first embodiment described above in that it incorporates the phase lag information at the time of gradient estimation, without using a demodulation dither signal.

The optimal control device of this embodiment includes a process measurement value acquisition unit 110, a process evaluation value calculation unit 120, an evaluation function gradient estimation unit 140, an extreme value search unit (optimal manipulated variable adaptive adjustment unit) 170, a manipulated variable output unit 190, and a process phase lag estimation unit 400. The process phase lag estimation unit 400 includes a phase lag parameter estimation unit 300 and a phase lag estimation unit 1100.

FIG. 20 is a diagram schematically showing a configuration example of a part of the optimal control device according to the second embodiment.
The phase lag estimation unit 1100 does not estimate the phase lag explicitly in the form of equation (8). Instead, it feeds the manipulated variable data in real time into the model of equation (6) or (7) and outputs the resulting time-series data (hereinafter referred to as phase lag corrected manipulated variable data).
Likewise, when a process simulator is used, the phase lag estimation unit 1100 can output the time-series data obtained by feeding the manipulated variable data into the process simulator in real time.
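The real-time filtering described above might look like the following minimal sketch. Equations (6) and (7) are not reproduced in this part of the description, so the sketch assumes a first-order lag plus dead time, G(s) = exp(-Ls) / (Ts + 1), discretized with a backward-Euler step. The class name, the sampling period, and the discretization are assumptions made for illustration.

```python
from collections import deque


class FirstOrderLagDeadTime:
    """Discrete first-order lag plus dead time, G(s) = exp(-L s) / (T s + 1)."""

    def __init__(self, time_constant: float, dead_time: float, dt: float,
                 u0: float = 0.0):
        self.alpha = dt / (time_constant + dt)   # backward-Euler smoothing factor
        n_delay = int(round(dead_time / dt))
        self.buffer = deque([u0] * n_delay)      # FIFO that delays the input
        self.state = u0

    def step(self, u: float) -> float:
        """Feed one manipulated variable sample and return the filtered sample U_f."""
        self.buffer.append(u)
        delayed = self.buffer.popleft()
        self.state += self.alpha * (delayed - self.state)
        return self.state


# Usage: response to a unit step in the manipulated variable, with
# T = 0.67 s and L = 0.67 s sampled every 0.01 s.
model = FirstOrderLagDeadTime(time_constant=0.67, dead_time=0.67, dt=0.01)
u_f = [model.step(1.0) for _ in range(1000)]
```

The output stays at its initial value during the dead time and then approaches the input along a first-order response, which is the shape the evaluation value is expected to follow if the model matches the controlled object.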

This makes it possible to correct the phase shift between the manipulated variable and the evaluation value. That is, a phase lag always exists between the manipulated variable and the evaluation value, because the signal passes through the controlled object 200, which has dynamics. In the optimal control device of this embodiment, the phase lag corrected manipulated variable is used in place of the manipulated variable, so that the phase lag between the phase lag corrected manipulated variable and the evaluation value becomes extremely small. In principle, if there were no error between the linear transfer function model and the actual controlled object, the phase lag between the phase lag corrected manipulated variable and the evaluation value could be eliminated.

In this embodiment, the evaluation function gradient estimation unit 140 directly calculates the slope (gradient) of the evaluation function at the operating point without using a demodulation dither signal.
As described earlier, the overall shape of the evaluation function shown in FIG. 2, for example, is unknown while extreme value control is actually being carried out, but values near the operating point can be acquired in real time during control. Therefore, by using data near the operating point, that is, time-series data of the most recent manipulated variable and evaluation amount during control, the slope of the straight line representing the gradient of the evaluation function as shown in FIG. 2 can be estimated.

For example, past time-series data of the manipulated variable and of the evaluation amount are accumulated for a predetermined period during control, and the coefficients of the straight line in the following equation (11) can be obtained from the accumulated time-series data by a regression algorithm such as the least squares method or the recursive least squares method.
J = cU + d    (11)
Here, J is the evaluation amount (process evaluation value) in FIG. 2, and U is the manipulated variable in FIG. 2. c and d are the coefficients of the linear regression and can be obtained using an algorithm such as the least squares method.
Once the coefficients of the straight line in equation (11) are obtained, the slope of the line can be regarded as the gradient of the evaluation function, so the coefficient c, which corresponds to the slope, can be used as the gradient information (gradient estimate) of the evaluation function.
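The regression of equation (11) over a sliding window of recent data could be sketched as follows. The class name, the window handling, and the choice to return 0 when the manipulated variable shows no variation are assumptions for illustration. A recursive least squares implementation would serve the same purpose.

```python
from collections import deque


class SlidingWindowSlope:
    """Least-squares fit of J = c*U + d over the most recent samples."""

    def __init__(self, window: int):
        self.u = deque(maxlen=window)   # stored manipulated variable samples
        self.j = deque(maxlen=window)   # stored evaluation value samples

    def update(self, u: float, j: float) -> float:
        """Store one (U, J) pair and return the current slope estimate c."""
        self.u.append(u)
        self.j.append(j)
        n = len(self.u)
        mean_u = sum(self.u) / n
        mean_j = sum(self.j) / n
        s_uu = sum((x - mean_u) ** 2 for x in self.u)
        s_uj = sum((x - mean_u) * (y - mean_j) for x, y in zip(self.u, self.j))
        # Without variation in U the slope is undefined; 0.0 is a placeholder.
        return s_uj / s_uu if s_uu > 1e-12 else 0.0


# Usage: three samples of a hypothetical evaluation function J = (U - 2)^2
# taken symmetrically around the operating point U = 1.
estimator = SlidingWindowSlope(window=3)
for u_sample in (0.9, 1.0, 1.1):
    c = estimator.update(u_sample, (u_sample - 2.0) ** 2)
```

For this quadratic example the fitted slope equals the derivative at U = 1, which is -2.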

In this embodiment, the evaluation function gradient estimation unit 140 includes a data storage unit (not shown) that accumulates past time-series data of the manipulated variable and of the evaluation amount for a predetermined period during control. Using the accumulated data, it obtains the coefficient c as the gradient estimate from the following equation (12) instead of equation (11).
J = cU_f + d    (12)
Here, U_f is not the actual manipulated variable but the phase lag manipulated variable, in which information on the estimated phase lag is added to the manipulated variable. It is the manipulated variable passed through the linear transfer function shown in equation (6) or (7). That is, U_f is the value of the output signal of the linear transfer function model of the phase lag estimation unit 1100.
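To illustrate why regressing against U_f rather than U can matter, the following sketch compares the two fits on simulated data. It assumes a plant made of a first-order lag (no dead time) followed by a static quadratic map J = (x - 2)^2, a model identical to the plant's lag, and a sinusoidal variation of U whose frequency satisfies ωT = 1. None of these choices come from this description; they are picked so that the effect is easy to see.

```python
import numpy as np

DT = 0.001                 # simulation step [s]
T = 0.67                   # lag time constant [s]
OMEGA = 1.0 / T            # variation frequency [rad/s], so that OMEGA * T = 1
ALPHA = DT / (T + DT)      # backward-Euler factor of the first-order lag

n_period = int(round(2.0 * np.pi / OMEGA / DT))   # samples per period
n_total = 12 * n_period

t = np.arange(n_total) * DT
u = 0.2 * np.sin(OMEGA * t)        # manipulated variable varying around U = 0

# Plant: first-order lag followed by the static map J = (x - 2)^2.
# Model: the same first-order lag, so U_f equals the plant's internal state x.
x = np.zeros(n_total)
for k in range(1, n_total):
    x[k] = x[k - 1] + ALPHA * (u[k] - x[k - 1])
j = (x - 2.0) ** 2
u_f = x

# Fit over the last two full periods, after the initial transient has decayed.
window = slice(n_total - 2 * n_period, n_total)
slope_raw = np.polyfit(u[window], j[window], 1)[0]      # J = c U + d
slope_comp = np.polyfit(u_f[window], j[window], 1)[0]   # J = c U_f + d
```

Under these assumptions the fit against U_f should recover the local gradient of about -4, while the fit against U is attenuated by the factor 1 / (1 + ω²T²) to roughly -2. With additional dead time the uncompensated estimate can degrade further.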

By using the value obtained by passing the manipulated variable through the linear transfer function model, rather than the manipulated variable itself, the gradient of the evaluation function can be estimated in a state in which the manipulated variable has no phase lag, as shown in FIG. 2, and the accuracy of the gradient estimation can be expected to improve.
Ideally, if the phase lag in the controlled object 200 is accurately estimated by the linear transfer function model of equation (6) or (7), the phase lag of the manipulated variable can be reduced to zero. The frequency of the dither signal can then be made arbitrarily high, which resolves the trade-off between "stability" and "control performance" and makes it possible to improve control performance (increase the convergence speed) while maintaining control stability.

In practice, it is impossible to identify the phase lag in the controlled object exactly with a linear transfer function model, so there is a limit to the achievable control performance (convergence speed). Nevertheless, by eliminating the phase difference between the time-series data of the manipulated variable and the time-series data of the evaluation value, control performance can be markedly improved without impairing stability.

In this embodiment, the method of calculating the gradient estimate is not limited to the above. For example, the "correlation coefficient" between the manipulated variable and the evaluation value may be regarded as the gradient estimate. This is because the sign of the gradient of the function relating the manipulated variable to the evaluation value matches the sign of the correlation coefficient between the manipulated variable and the evaluation value, and the correlation coefficient serves as a substitute for the gradient that does not depend on the magnitude of the gradient. This rests on the following relationship between the regression coefficient of a linear regression of variables X and Y and their correlation coefficient: "when variable X and variable Y are each normalized by their mean and standard deviation and a regression line is then obtained, the coefficient c of the regression line matches the correlation coefficient r, and the bias b = 0." Using the correlation coefficient r in place of the regression coefficient c is therefore equivalent to obtaining the regression coefficient after normalizing the manipulated variable time-series data and the evaluation value time-series data.
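The relationship between the regression coefficient of normalized data and the correlation coefficient can be checked numerically. The sketch below uses synthetic data with an arbitrary negative slope and noise level. The data, the random seed, and the helper name `zscore` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)                                # stand-in for variable X
y = -3.0 * x + 5.0 + rng.normal(scale=2.0, size=500)    # stand-in for variable Y


def zscore(v: np.ndarray) -> np.ndarray:
    """Normalize a series by its mean and (population) standard deviation."""
    return (v - v.mean()) / v.std()


# Regression line fitted to the normalized series: slope and bias.
slope, intercept = np.polyfit(zscore(x), zscore(y), 1)

# Correlation coefficient of the original, unnormalized series.
r = np.corrcoef(x, y)[0, 1]
```

Here `slope` agrees with `r` up to rounding error and `intercept` is numerically zero. The sign of `r` is negative, matching the sign of the underlying slope, while its magnitude stays within [-1, 1] regardless of how steep the relationship is.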

In the optimal control device of this embodiment as well, the evaluation function gradient estimation unit 140 can calculate the gradient estimate as the correlation coefficient between the evaluation amount J and the phase lag manipulated variable U_f. Using the correlation coefficient instead of the regression coefficient as the gradient information provides the same effect as the normalization operation in the optimal control device of the first embodiment.

The extreme value control configuration in FIG. 18, in which the gradient is estimated by linear approximation, has a structure similar to internal model control (Internal Model Control) and Smith compensation control (a type of internal model control), which are widely known in ordinary feedback control of the target value tracking type.
In target value tracking feedback control, internal model control places the controlled object and a controlled object process model representing it in parallel, and feeds back the error between the output of the controlled object and the output of the process model to improve control performance. It can also be regarded as a type of predictive control and is widely known to be particularly effective for processes with long dead times.

The optimal control device of this embodiment can be regarded as introducing the same idea as internal model control into feedback control of the extreme value search type. That is, when the delay is long, a model that predicts it is placed in parallel with the controlled object to perform a kind of prediction, and optimization (gradient estimation) is carried out based on the manipulated variable in which the delay in the controlled object has been compensated.
The extreme value search type feedback control in the optimal control device of this embodiment is similar to internal model control and Smith compensation control in that prediction is performed using a model of the controlled object.
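The structure described above, a model running in parallel with the controlled object and a gradient estimate computed from the model output, can be put together in a small closed-loop sketch. It assumes a plant consisting of a first-order lag with dead time followed by a static cost J = (x - 2)^2, a parallel model that matches the plant dynamics exactly, a sinusoidal variation added to the manipulated variable, and a minimization objective. The gain, amplitude, sampling period, and window length are hypothetical tuning values, not values from this description.

```python
import math
from collections import deque

DT = 0.02                          # control period [s] (assumed)
T, L = 0.67, 0.67                  # plant time constant and dead time [s]
PERIOD = 2.0 * (T + L)             # period of the added variation [s]
WINDOW = int(round(PERIOD / DT))   # regression window of one period
GAIN = 0.1                         # search gain (hypothetical tuning)
AMP = 0.1                          # amplitude of the added variation (hypothetical)
U_OPT = 2.0                        # optimum of the assumed cost J = (x - U_OPT)^2


def ls_slope(xs, ys):
    """Least-squares slope of ys against xs; 0.0 while xs has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sxx if sxx > 1e-12 else 0.0


def run(t_end=100.0):
    """Simulate the loop and return the final search output (variation excluded)."""
    delay = deque([0.0] * int(round(L / DT)))   # dead-time buffer
    alpha = DT / (T + DT)                       # backward-Euler lag factor
    x = 0.0                                     # lagged and delayed manipulated variable
    u_search = 0.0                              # integrator state
    uf_hist = deque(maxlen=WINDOW)
    j_hist = deque(maxlen=WINDOW)
    for k in range(int(round(t_end / DT))):
        u = u_search + AMP * math.sin(2.0 * math.pi * k * DT / PERIOD)
        delay.append(u)
        x += alpha * (delay.popleft() - x)      # plant dynamics
        j = (x - U_OPT) ** 2                    # evaluation value
        u_f = x                                 # parallel model, assumed perfect
        uf_hist.append(u_f)
        j_hist.append(j)
        c_hat = ls_slope(uf_hist, j_hist)       # gradient estimate, J = c U_f + d
        u_search -= GAIN * c_hat * DT           # integrate toward the minimum
    return u_search


u_final = run()
```

With a perfect model the regression sees a static relationship between U_f and J, so the search output is expected to settle in the neighborhood of the assumed optimum. With model error the estimate would be biased and the usable gain would be lower.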

The differences, on the other hand, are as follows. (1) Internal model control uses a controlled object process model, whereas this embodiment does not need the static characteristics of the controlled object. Although a controlled object process model may be used, it is sufficient to use a model representing the phase characteristics, excluding the static characteristics (static nonlinear elements) of the controlled object process model. (2) Internal model control feeds back the error between the output value of the controlled object and the predicted value of the controlled object process model, whereas this embodiment estimates the gradient representing the relationship between the predicted value of the phase characteristics of the process and the output value of the controlled object (= evaluation function value).
The effectiveness of internal model control is already well established, and the effect of this embodiment can also be inferred from the similarity between internal model control and the feedback control in the optimal control device of this embodiment described above.

FIGS. 11 to 13 show an example of simulation results for the temporal change of the manipulated variable with the optimal control devices of the second embodiment and of a second comparative example.
The results are for the optimal control device of this embodiment with a plant time constant T = 0.67 s, a plant dead time L = 0.67 s, an optimal value of the manipulated variable U^* = 2, and a dither signal period of two to five times the sum of the plant time constant and the plant dead time (T + L).
The optimal control device of the second comparative example operates under the same conditions as the optimal control device of this embodiment, except that phase lag compensation is not applied.

According to the simulation results, with both the optimal control device of this embodiment and that of the second comparative example, the time required for the manipulated variable to converge to the optimal value tended to become shorter as the period of the dither signal became shorter. In addition, for every dither signal period, the manipulated variable converged to the optimal value in a shorter time with the optimal control device of this embodiment than with that of the second comparative example.

As described above, the optimal control device of this embodiment introduces a function that compensates for the phase difference between the manipulated variable and the process evaluation value, which makes it possible to improve the performance of extreme value control (speed up convergence). This alleviates the degradation in extreme value search performance caused by slow convergence.

That is, according to this embodiment, it is possible to provide an optimal control device, an optimal control method, and a computer program that realize a search for a (local) optimal value that raises control performance as far as possible while maintaining stability, without applying a forced dither signal.

FIG. 21 is a diagram schematically showing a modification of the configuration of the optimal control device according to the second embodiment.
This example differs from the examples shown in FIGS. 17 and 18 in that the optimal control device includes a modulation dither signal generation unit 180.
FIG. 22 is a diagram schematically showing a modification of the configuration of a part of the optimal control device according to the second embodiment.
In this embodiment, a modulation dither signal is incorporated into the manipulated variable so that the data used for regression estimation or correlation estimation are forced to vary. For example, when the controlled object 200 is not sufficiently excited (driven) by disturbances or the like, the reliability of the estimated regression parameters or correlation coefficient may deteriorate, so it is preferable to add a modulation dither signal to the manipulated variable.

In this case, however, the modulation dither signal need not be a periodic signal, nor need it be a sine wave. For example, a random-number-like signal may be used as the modulation dither signal and applied to the controlled object.
When the controlled object is sufficiently driven by disturbances or the like, an extreme value control system without the modulation dither signal generation unit 180 can also be configured, as described above.
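A non-periodic modulation dither of the kind mentioned above could be generated as in the following sketch. The uniform distribution, the amplitude, the seed, and the function names are assumptions for illustration. Any signal that excites the controlled object sufficiently would serve the same purpose.

```python
import random


def random_dither(amplitude: float, rng: random.Random) -> float:
    """Return one sample of a non-periodic dither, uniform in [-amplitude, amplitude]."""
    return rng.uniform(-amplitude, amplitude)


def apply_dither(u_search: float, amplitude: float, rng: random.Random) -> float:
    """Add the dither to the manipulated variable determined by the extreme value search."""
    return u_search + random_dither(amplitude, rng)


# Usage: 1000 output samples around a search output of 1.5 with amplitude 0.1.
rng = random.Random(0)
u_out = [apply_dither(1.5, 0.1, rng) for _ in range(1000)]
```

The amplitude trades off estimation reliability against the disturbance deliberately injected into the process, so it would normally be kept as small as the regression or correlation estimate allows.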

Apart from the above, this modification has the same configuration as the optimal control device of FIGS. 19 and 20. Therefore, according to this modification, it is possible to provide an optimal control device, an optimal control method, and a computer program that realize a search for a (local) optimal value that raises control performance as far as possible while maintaining stability, by applying only the minimum necessary external input.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS
100…control system, 110…process measurement value acquisition unit, 120…process evaluation value calculation unit, 130…demodulation dither signal generation unit, 140…evaluation function gradient estimation unit, 150…normalization signal generation unit, 160…gradient estimate normalization unit, 170…extreme value search unit, 171…integrator, 172…gain multiplication unit, 180…modulation dither signal generation unit, 190…manipulated variable output unit, 200…controlled object (target plant), 300…phase lag parameter estimation unit, 400…process phase lag estimation unit, 1100…phase lag estimation unit

Claims

An optimal control device that searches for an optimal value of an evaluation value by manipulating a manipulated variable in real time, based on the manipulated variable of a controlled process and the evaluation value of an evaluation function based on a controlled variable that changes according to the manipulated variable, the optimal control device comprising:
a process evaluation value calculation unit that calculates the evaluation value of the evaluation function using a measurement value acquired in the controlled process;
a process phase lag estimation unit that uses the manipulated variable and the evaluation value to calculate an estimated value of a phase lag from the manipulated variable to the evaluation value;
an evaluation function gradient estimation unit that calculates an estimated value of a rate of change of the evaluation value with respect to the manipulated variable using information on the estimated value of the phase lag and information on the evaluation value;
an extreme value search unit that determines the direction and amount of movement of the manipulated variable by integrating the estimated value of the rate of change; and
a manipulated variable output unit that outputs, to the controlled process, the manipulated variable based on the information on the direction and amount of movement of the manipulated variable determined by the extreme value search unit.

The optimal control device according to claim 1, wherein the process phase lag estimation unit expresses the transfer function from the manipulated variable of the controlled process to the evaluation value by a linear transfer function model excluding static nonlinear elements, and calculates the estimated value of the phase lag by phase calculation of the linear transfer function model.

The optimal control device according to claim 2, wherein the process phase lag estimation unit includes a phase lag parameter estimation unit that identifies parameters of the linear transfer function model online.

The optimal control device according to claim 1, wherein the process phase lag estimation unit constructs, from a simulation model of the controlled process and an evaluation function model, a model from the evaluation amount of the controlled process to the evaluation value of the evaluation function, and calculates the estimated value of the phase lag by simulating the model.

The optimal control device according to claim 2 or 4, further comprising:
a demodulation dither signal generation unit that generates a first periodic signal having a predetermined period; and
a modulation dither signal generation unit that generates a second periodic signal having the same period as the first periodic signal, wherein
the manipulated variable output unit outputs a signal obtained by adding the second periodic signal to the information on the direction and amount of movement of the manipulated variable determined by the extreme value search unit,
the evaluation function gradient estimation unit calculates the estimated value of the rate of change of the evaluation value with respect to the manipulated variable using the second periodic signal and the information on the evaluation value, and
the first periodic signal is a signal obtained by adding the estimated value of the phase lag to the second periodic signal.

The optimal control device according to claim 5, further comprising a normalization unit that normalizes the estimated value of the rate of change and outputs information on the direction of the estimated value of the rate of change to the extreme value search unit, wherein
the extreme value search unit determines the direction and amount of movement of the manipulated variable by integrating the sign of the estimated value of the rate of change.

The optimal control device according to claim 4, wherein the process phase lag estimation unit expresses a linear approximation model in the vicinity of the manipulated variable of the controlled process model by a linear transfer function model, and calculates the estimated value of the phase lag by phase calculation of the linear transfer function model.

The optimal control device according to claim 2 or 7, wherein
the process phase lag estimation unit outputs, as the estimated value of the phase lag, a phase lag manipulated variable obtained by passing the manipulated variable through the linear transfer function model, and
the evaluation function gradient estimation unit uses time-series data of the phase lag manipulated variable and of the evaluation value over a predetermined period to calculate, by regression, the coefficients of a straight line having the phase lag manipulated variable as input and the evaluation value as output, and uses the coefficient of the first-order term of the straight line as the estimated value of the rate of change.

The optimal control device according to claim 2 or 7, wherein
the process phase lag estimation unit outputs, as the estimated value of the phase lag, a phase lag manipulated variable obtained by passing the manipulated variable through the linear transfer function model, and
the evaluation function gradient estimation unit calculates a correlation coefficient between the phase lag manipulated variable and the evaluation value as the estimated value of the rate of change.

An optimal control method that searches for an optimal value of an evaluation value by manipulating a manipulated variable in real time, based on the manipulated variable of a controlled process and the evaluation value of an evaluation function based on a controlled variable that changes according to the manipulated variable, the optimal control method comprising:
calculating the evaluation value of the evaluation function using a measurement value acquired in the controlled process;
calculating, using the manipulated variable and the evaluation value, an estimated value of a phase lag from the manipulated variable to the evaluation value;
calculating an estimated value of a rate of change of the evaluation value with respect to the manipulated variable using information on the estimated value of the phase lag and information on the evaluation value;
determining the direction and amount of movement of the manipulated variable by integrating the estimated value of the rate of change; and
outputting, to the controlled process, the manipulated variable based on the information on the determined direction and amount of movement of the manipulated variable.

A computer program for causing a computer to perform the method according to claim 10.