JP2013168020A

JP2013168020A - State prediction method for process

Info

Publication number: JP2013168020A
Application number: JP2012030901A
Authority: JP
Inventors: Norihiro Tanaka; 規博田中; Hidehiko Furuya; 秀彦古家; Yuhei Akaike; 裕平赤池; Masatoshi Ogawa; 雅俊小川; Harutoshi Okai; 晴俊大貝
Original assignee: Waseda University; Nippon Steel and Sumikin Engineering Co Ltd
Current assignee: Waseda University; Nippon Steel Engineering Co Ltd
Priority date: 2012-02-15
Filing date: 2012-02-15
Publication date: 2013-08-29

Abstract

PROBLEM TO BE SOLVED: To provide a process state prediction method that constructs an appropriate local model by jointly optimizing the number of explanatory variables and the number of past cases. SOLUTION: The method creates a database that accumulates pairs of input vectors and output vectors composed of observation data representing the operating state of a process, acquires from the database neighborhood data vectors similar to a request point vector, which consists of the input vector corresponding to the output vector at the time point to be predicted, and constructs a local model from the neighborhood data vectors to predict the output vector at that time point. The method includes: constructing a plurality of local models using the number M of explanatory variables constituting the input vector and the maximum number NN_MAX of neighborhood data vectors as parameters; and selecting the local model that minimizes the error between its predicted value and the measured value.

Description

The present invention relates to a method for predicting the state of a process in plant equipment.

When the state of a process must be known, ordinary measuring instruments may take so long to analyze that the process state cannot be obtained in real time. Depending on the measurement environment and the measurement target, installing the instruments may itself be difficult. If an explicit physical model of the process state is available, highly accurate predicted values can be calculated. However, processes in plant equipment mostly arise from a combination of complex physicochemical phenomena and therefore often cannot be expressed by a physical model.

Against this background, and because advances in computer hardware and database system technology have made it possible to store large amounts of data and search them at high speed, a local modeling technique called "Just-In-Time (JIT) modeling" has attracted attention in recent years. In JIT modeling, observed data are accumulated in a database. Whenever a prediction of the system or the like is needed, data vectors closely related to the input "request point vector" are retrieved from the database as neighborhood data vectors, a local model that interpolates the outputs of the retrieved neighborhood data vectors is constructed, and the output for the request point vector is predicted. In this technique, each time further observation data are accumulated, the existing local model is discarded and a new local model is constructed.

Because JIT modeling searches the database for data vectors similar to the request point vector every time a prediction is made, the computational load becomes excessive when the database is large. For this reason, a technique called large-scale database online modeling (LOM) has been developed, which applies a stepwise method to JIT modeling to reduce the dimensionality of the variables. For example, in Patent Documents 1 and 2, for a large-scale database of thermal reactor operation data, a stepwise method is used to select variables with a high contribution rate to the furnace top gas temperature, a new database composed of those variables is created, and the furnace top gas temperature is predicted using a local model constructed from neighborhood data vectors acquired from the new database.

In this specification, to make clear that the "request point" and the "neighborhood data" are vector quantities, the "request point" is written as the "request point vector" and the "neighborhood data" as the "neighborhood data vector". A "data vector set", that is, a set of data vectors, may also be called a "data set".

Patent Document 1: JP 2009-076036 A
Patent Document 2: JP 2009-076037 A

In the process state prediction methods disclosed in Patent Documents 1 and 2, a stepwise method is used to select explanatory variables with a high contribution rate to the objective variable. Specifically, a limit value is set in advance for the F value, which is the index of the contribution rate, and explanatory variables are selected so that the F value is equal to or greater than the limit value. Setting the limit value is therefore important in the stepwise method, but there is no theoretical way to determine it, and the limit value is determined empirically.
Likewise, for the number of past cases that make up the database, the optimum number of past cases changes when the number of explanatory variables adopted changes. To construct an appropriate local model, the number of explanatory variables and the number of past cases must therefore be optimized together.

The present invention has been made in view of these circumstances, and its object is to provide a process state prediction method that can construct an appropriate local model by optimizing the number of explanatory variables and the number of past cases together.

To achieve the above object, the present invention provides a process state prediction method that creates a database accumulating data vectors, each a pair of an input vector and an output vector composed of observation data indicating the operating state of a process in plant equipment; acquires from the database, as neighborhood data vectors, at least one data vector similar to a request point vector consisting of the input vector corresponding to the output vector at the time point to be predicted; and constructs a local model from the neighborhood data vectors to predict the output vector at that time point, the method comprising:
a step of creating, with the number M of explanatory variables constituting the input vector and the maximum number NN_MAX of neighborhood data vectors as parameters, a plurality of neighborhood data vector sets in which the neighborhood data vectors are stored, performing principal component analysis on each neighborhood data vector set, constructing for each neighborhood data vector set the local model for which the Q statistic with respect to the request point vector is minimized, and calculating the error between the value predicted by the local model and the measured value; and
a step of selecting, from among the plurality of local models constructed with the number M of explanatory variables and the maximum number NN_MAX of neighborhood data vectors as parameters, the local model for which the error is minimized.

In the present invention, a local model is constructed for each value of M and each value of NN_MAX while the number M of explanatory variables constituting the input vector and the maximum number NN_MAX of neighborhood data vectors (the number of past cases) are varied, and the local model with the smallest error between its predicted value and the measured value is selected. The number of explanatory variables and the number of past cases are thus optimized together, and an appropriate local model can be constructed.

The principal component analysis and the Q statistic used in constructing the local model have the following characteristics.
To capture the correlation between variables, principal component analysis creates new composite variables, called principal components, as linear combinations of the variables. The principal components yield the subspace that best represents the characteristics of the target data vector set. The Q statistic represents the part that cannot be expressed in the subspace spanned by the principal components. In other words, the Q statistic expresses how dissimilar the correlation structure of the request point vector is to that of the target data vector set: the smaller the Q statistic, the more similar the data vector set is judged to be to the request point vector.

FIG. 1 is a schematic diagram of the correlation between the request point vector and the neighborhood data vectors. FIG. 1(A) shows the case of JIT modeling and LOM: because neighborhood data vectors are selected on the basis of the distance between vectors, neighborhood data vectors with a different correlation may be selected. FIG. 1(B) shows the case of the state prediction method using principal component analysis and the Q statistic: because the Q statistic measures the correlation between the request point vector and a data vector set (data set), only the highly correlated data vector set marked with circles is selected.

In the process state prediction method according to the present invention, the error between the value predicted by the local model and the measured value may be calculated as a root mean square error, in which case the error over the entire evaluation interval is expressed as a single number.

In the process state prediction method according to the present invention, principal component analysis is performed on each of a plurality of neighborhood data vector sets created with the number M of explanatory variables constituting the input vector and the maximum number NN_MAX of neighborhood data vectors (the number of past cases) as parameters. For each neighborhood data vector set, the local model for which the Q statistic with respect to the request point vector is minimized is constructed, and the local model with the smallest error between its predicted value and the measured value is selected. The number of explanatory variables and the number of past cases are thus optimized together, and an appropriate local model can be constructed.

A schematic diagram of the correlation between the request point vector and the neighborhood data vectors, in which (A) shows the case of JIT modeling and LOM and (B) shows the case of the state prediction method using principal component analysis and the Q statistic.
A flowchart for explaining a process state prediction method according to an embodiment of the present invention.
A flowchart for explaining the same process state prediction method.
A table showing the structure of a data set.
A table showing the structure of a request point vector.
A table showing the structure of neighborhood data set A, in which the number of neighborhood data is NN_MAX.
A table showing the structure of neighborhood data set B_0, in which the number of neighborhood data is NN_MIN.
A Q value table in which Q statistics are stored.
A table listing RMSE_TOTAL with the number of explanatory variables and the number of past cases as parameters.
A time-history graph comparing values predicted by the local model constructed by the same process state prediction method with measured values.
A time-history graph comparing values predicted by a local model constructed by a conventional prediction method with measured values.
A time-history graph comparing values predicted by a local model constructed by a conventional prediction method with measured values.

Next, an embodiment of the present invention is described with reference to the accompanying drawings to aid understanding of the invention.

[Outline of the process state prediction method]
First, the outline of the process state prediction method according to an embodiment of the present invention is as follows.
(A1) For a database accumulating data vectors, each a pair of an input vector and an output vector composed of observation data indicating the operating state of a process in plant equipment, set the range of the number M of explanatory variables constituting the input vector and the range of the maximum number NN_MAX (the number of past cases) of neighborhood data vectors similar to the request point vector, which consists of the input vector corresponding to the output vector at the time point to be predicted.

(A2) Within the ranges set for the number M of explanatory variables and the maximum number NN_MAX of neighborhood data vectors, vary M and NN_MAX and perform the following processing for each selected pair of M and NN_MAX.
(A2-1) Create a plurality of neighborhood data sets (neighborhood data vector sets) storing neighborhood data vectors, each with a different number of neighborhood data vectors (with NN_MAX as the upper limit).
(A2-2) Perform principal component analysis on the created neighborhood data sets, calculate the Q statistic with respect to the request point vector for each neighborhood data set, and select the neighborhood data set with the smallest Q statistic to construct a local model.
(A2-3) Using the constructed local model, obtain the predicted value of the output vector at the time point to be predicted, and calculate its error against the measured value.

(A3) From among the local models constructed within the ranges of the number M of explanatory variables and the maximum number NN_MAX of neighborhood data vectors, select the local model with the smallest error against the measured values.

The main techniques that make up the process state prediction method according to the present embodiment are first described briefly.
[JIT modeling]
If behavior similar to the current behavior was observed in the past, the way the current behavior develops can be expected to resemble what happened then. Just-In-Time (JIT) modeling is one prediction technique that embodies this idea. Instead of holding a fixed model, JIT modeling keeps past data vectors as they are in a database. When a prediction of the process is needed, data vectors highly similar to the request point vector are retrieved from the database of accumulated past data, a local model is constructed, and the output is predicted.

When the target process is nonlinear and dynamic, it can be represented by the following regression model.

Here, the input vector x^k and the output vector y^k of the process are defined as follows. That is, the output vector y^k is the output at time (k + p) for the input vector x^k at time k, namely the value to be predicted.

As time passes, pairs of input and output data vectors, (x^1, y^1), (x^2, y^2), ..., are obtained in large numbers from the target process and accumulated in the database as the data vector set {(x^k, y^k)} (k = 1, 2, ...). Here, k is the discretized time.

The input vector x^kq corresponding to the output vector y^kq at the time point to be predicted is taken as the request point vector, and neighborhood data vectors highly similar to the request point vector are acquired from the database. As an index for selecting such neighborhood data vectors, a distance between vectors such as the Euclidean distance given by the following equation can be used.

Once the group of neighborhood data vectors {(x^ki, y^ki)} (i = 1, 2, ..., m) has been acquired, a local model is constructed from it and the output vector y^kq is predicted. As the local model, a multiple regression model, or the arithmetic mean method or weighted linear average method shown below, is used.
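The neighbor search and local averaging described above can be sketched as follows. This is a minimal illustration, not the patent's formulas: the function names are hypothetical, the output is assumed to be a single scalar, and the inverse-distance weighting is one common choice assumed here for the weighted average.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_neighbors(database, query, m):
    """Return the m (distance, x, y) triples closest to the query vector."""
    scored = [(euclidean(x, query), x, y) for x, y in database]
    scored.sort(key=lambda item: item[0])
    return scored[:m]

def predict_mean(neighbors):
    """Arithmetic-mean local model: average the neighbor outputs."""
    return sum(y for _, _, y in neighbors) / len(neighbors)

def predict_weighted(neighbors, eps=1e-12):
    """Weighted average of neighbor outputs.

    Inverse-distance weights are an assumption made for this sketch.
    """
    weights = [1.0 / (d + eps) for d, _, _ in neighbors]
    total = sum(weights)
    return sum(w * y for w, (_, _, y) in zip(weights, neighbors)) / total

# Toy database of (input vector, scalar output) pairs.
database = [
    ([0.0, 0.0], 1.0),
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], 3.0),
    ([5.0, 5.0], 10.0),
]
query = [0.1, 0.1]
neighbors = nearest_neighbors(database, query, m=3)
print(predict_mean(neighbors), predict_weighted(neighbors))
```

The distant point `[5.0, 5.0]` is excluded by the search, so it has no influence on either prediction; the weighted variant pulls the estimate toward the output of the closest neighbor.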

[Principal component analysis]
Principal component analysis is a multivariate analysis technique for feature extraction and dimensionality reduction. To capture the correlation between variables, it uses composite variables called principal components, obtained as linear combinations of the variables. The first principal component is set in the direction that best represents the data. The second principal component is then set, in the space orthogonal to the first, in the direction that best represents the variation in the data that the first principal component cannot express, and further principal components are set one after another in the same way. Here, the direction that best represents the data means the direction in which the variance of the principal component scores is maximized. A principal component score is the value obtained by projecting the data onto the subspace spanned by the principal components.

[Q statistic]
The Q statistic represents the part of a data vector that cannot be expressed in the subspace spanned by the principal components. It is also called the squared prediction error and is defined as follows.
Assume a data matrix X of I rows by J columns, where J is the number of variables, I is the number of samples, and each variable is standardized.
Singular value decomposition of the data matrix X gives the following equation.

U and V are orthogonal matrices, and the singular values s_r are arranged in descending order on the diagonal of the diagonal matrix S. If R principal components are adopted, the r-th principal component is given by the r-th column v_r of the loading matrix V_R.
The r-th principal component score t_r is given by equation (9), and expressing the scores up to the R-th principal component together gives equation (10).

Expressing T_R in the coordinates of the original J-dimensional space gives the reconstructed data matrix X̂ as follows.
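For orientation, the standard principal component analysis relations that match the definitions above can be written as below. This is a sketch in conventional notation, assuming that V_R consists of the first R columns of V; the patent's own equations and their numbering may differ in form.

```latex
\begin{aligned}
X &= U S V^{\mathsf{T}} \\
t_r &= X v_r \\
T_R &= X V_R \\
\hat{X} &= T_R V_R^{\mathsf{T}} = X V_R V_R^{\mathsf{T}}
\end{aligned}
```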

The Q statistic is then given by the following equation.
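As a rough illustration of the quantity just described, the sketch below computes the squared residual of a query vector after projection onto the first R principal directions of a data set. It assumes NumPy is available, uses a hypothetical function name, and assumes the query is standardized with the mean and standard deviation of the data set; the patent's exact formula is not reproduced here.

```python
import numpy as np

def q_statistic(data, query, n_components):
    """Squared prediction error of `query` against the PCA subspace of `data`.

    data: (I, J) array of input vectors; query: (J,) array.
    """
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Standardize each variable with the statistics of the data set (assumption).
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0  # guard against constant columns (assumption)
    X = (data - mean) / std
    x = (query - mean) / std
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_R = Vt[:n_components].T        # (J, R) loading matrix
    x_hat = x @ V_R @ V_R.T          # projection onto the principal subspace
    return float(np.sum((x - x_hat) ** 2))

# Data lying exactly on the line x2 = 2 * x1.
data = [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8]]
q_on = q_statistic(data, [2.5, 5.0], n_components=1)   # follows the correlation
q_off = q_statistic(data, [2.5, 0.0], n_components=1)  # violates the correlation
print(q_on, q_off)
```

A query that follows the correlation of the data set yields a Q value near zero, while a query that breaks it yields a clearly positive value, which is the property used to judge similarity.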

[Error evaluation method]
The error between the value predicted by the local model and the measured value is evaluated by the root mean square error (hereinafter sometimes called "RMSE"). The definition of RMSE is given by equation (13).

In the present embodiment, the root mean square error RMSE_I over the period from time t = t_1 to t = t_MAX is calculated (see equation (14)). To average out the bias of individual data groups, RMSE_I is calculated for each of H data groups and the evaluation is made with their total, RMSE_TOTAL (see equation (15)). For example, RMSE_I may be the error calculated over 24 hours and RMSE_TOTAL the error calculated over H days.
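A small sketch of this two-level evaluation follows. The `rmse` function is the usual root mean square error; `rmse_total` aggregates the per-group values by averaging, which is an assumption made here because equation (15) is not reproduced in this text. Both function names are hypothetical.

```python
import math

def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_total(groups):
    """Aggregate per-group RMSE values.

    Averaging over the groups is an assumption of this sketch.
    """
    per_group = [rmse(pred, meas) for pred, meas in groups]
    return sum(per_group) / len(per_group)

# Two toy data groups of (predicted, measured) series, e.g. two days.
groups = [
    ([0.0, 0.0], [3.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), rmse_total(groups))
```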

[Detailed procedure of the process state prediction method]
Next, the procedure of the process state prediction method according to the present embodiment is described in detail with reference to the flowcharts of FIGS. 2 and 3.
(C1) Create a large-scale database 10 accumulating the pairs of data vectors (x^k, y^k) (k = 1, 2, ...) of the input vector x^k and the output vector y^k composed of observation data indicating the operating state of the process in the plant equipment.
(C2) Set the minimum value M_MIN, maximum value M_MAX, and increment M_INC of the number M of explanatory variables constituting the input vector, and the minimum value N_MIN, maximum value N_MAX, and increment N_INC of the maximum number NN_MAX (the number of past cases) of neighborhood data vectors (ST10).

(C3) Set the initial value of the number M of explanatory variables to M_MIN and the initial value of the maximum number NN_MAX of neighborhood data vectors to N_MIN (ST11).
(C4) Select the top M explanatory variables that have a large simple correlation coefficient with the objective variable constituting the output vector, that is, a large contribution rate to the objective variable (ST12), and create from the large-scale database 10 a new database 11 composed of those variables. If a time delay may exist between the objective variable and an explanatory variable, the variable with the largest expected time delay is also added to the explanatory variables.
FIG. 4 shows the structure of the database 11 that is created. In this database 11, there are M input variables, L output variables, and K samples of each variable. Each data item is given an ID corresponding to its date and time, and data items with the same ID are handled as one data vector.
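The variable selection in step (C4) can be sketched as a ranking by correlation with the objective variable. The function names and the toy columns below are hypothetical, and ranking by the absolute value of the correlation coefficient is an assumption; the text only says the coefficient should be large.

```python
import math

def pearson(xs, ys):
    """Simple (Pearson) correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def select_top_m(columns, target, m):
    """Return the names of the m columns most correlated with the target.

    Ranking by |r| is an assumption of this sketch.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:m]

target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated
}
selected = select_top_m(columns, target, m=2)
print(selected)
```

Unlike the stepwise method of the prior art, this ranking needs no F-value limit: the only tuning parameter is M itself, which the method then sweeps.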

(C5) Set the request point vector consisting of the input vector X^q corresponding to the output vector Y^q at the time point to be predicted (ST13). FIG. 5 shows the structure of the request point vector.
(C6) Calculate the distance between each data vector stored in the database 11 and the request point vector using equation (4) or (5), and collect the NN_MAX neighborhood data vectors with the smallest distances. Store the collected neighborhood data vectors, in order of increasing distance, as neighborhood data set A (ST14). FIG. 6 shows the structure of neighborhood data set A. In FIG. 6, "No." indicates the neighborhood data number.
(C7) From neighborhood data set A, select the neighborhood data vectors whose neighborhood data numbers (No.) are 1 to NN_MIN to create neighborhood data set B_0 (ST15). That is, select the NN_MIN neighborhood data vectors closest to the request point vector. FIG. 7 shows the structure of neighborhood data set B_0.

(C8) Perform principal component analysis on neighborhood data set B_0 to obtain the loading matrix V_R (ST16). Specifically, singular value decomposition is applied with neighborhood data set B_0 as the data matrix X.
(C9) If the request point vector x^q is expressed by equation (16), the reconstructed vector x̂^q obtained by reconstructing x^q is calculated from equation (17) using the loading matrix V_R. The Q statistic for neighborhood data set B_0 can therefore be obtained from equation (18) (ST17). The calculated Q statistic is stored in the Q value table shown in FIG. 8.

(C10) Determine whether the number of neighborhood data in the neighborhood data set B_0 for which the Q statistic has been calculated is NN_MAX or more (ST18). If it is less than NN_MAX, select S further neighborhood data vectors from those in neighborhood data set A that are not yet included in B_0, starting from the smallest neighborhood data number (No.), that is, from those closest to the request point vector, and add them to B_0 to create a new neighborhood data set B_1 (ST19). Then return to step ST16.
(C11) When the number of neighborhood data reaches NN_MAX or more, select from data set A, on the basis of the Q value table, the data set B_k for which the Q statistic was smallest. Then acquire the output vectors corresponding to data set B_k from the database 11 on the basis of the IDs of data set B_k, and construct a local model using a multiple regression model, the weighted linear average method, or the like (ST20).
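Steps (C6) to (C11) amount to growing a candidate set from the nearest neighbors and keeping the size with the smallest Q statistic. The sketch below illustrates that loop under stated assumptions: NumPy is available, the function names are hypothetical, the query is standardized with each candidate set's own statistics, and `step` plays the role of S in the text.

```python
import numpy as np

def q_statistic(X, x, n_components):
    """Squared residual of x after projection onto the top principal directions of X."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    std = np.where(std == 0.0, 1.0, std)  # guard for constant columns (assumption)
    Xs, xs = (X - mean) / std, (x - mean) / std
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V_R = Vt[:n_components].T
    return float(np.sum((xs - xs @ V_R @ V_R.T) ** 2))

def select_neighbor_set(inputs, query, nn_min, nn_max, step, n_components):
    """Return (indices, q) of the candidate set B_k with the smallest Q statistic."""
    nn_max = min(nn_max, len(inputs))
    distances = np.linalg.norm(inputs - query, axis=1)
    order = np.argsort(distances)[:nn_max]      # data set A, nearest first
    best_q, best_size = np.inf, nn_min
    size = nn_min
    while True:
        q = q_statistic(inputs[order[:size]], query, n_components)  # ST16, ST17
        if q < best_q:
            best_q, best_size = q, size
        if size >= nn_max:                       # ST18
            break
        size = min(size + step, nn_max)          # ST19: add `step` more vectors
    return order[:best_size], best_q

rng = np.random.default_rng(0)                   # synthetic data for illustration
inputs = rng.normal(size=(50, 3))
query = np.zeros(3)
indices, q_min = select_neighbor_set(
    inputs, query, nn_min=5, nn_max=20, step=5, n_components=2
)
print(len(indices), q_min)
```

The returned indices identify the rows whose outputs would then be fetched to build the local model in ST20.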

(C12) The root mean square error RMSE_I over the period from time t = t_1 to t = t_MAX is calculated by equation (14), and RMSE_TOTAL, the total over H days, is calculated by equation (15) (ST21).
(C13) It is checked whether NN_MAX is greater than or equal to N_MAX (ST22). If NN_MAX is less than N_MAX, NN_MAX is updated to NN_MAX + N_INC (ST25), and the process returns to ST14.
(C14) If NN_MAX is greater than or equal to N_MAX, it is checked whether M is greater than or equal to M_MAX (ST23). If M is less than M_MAX, M is updated to M + M_INC (ST26), and the process returns to ST12.
(C15) If M is greater than or equal to M_MAX, the local model with the smallest RMSE_TOTAL is selected from among the plurality of constructed local models (ST24).
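The nested parameter loop of steps ST22 to ST26 can be sketched as follows. This is an illustrative sketch, not the embodiment's implementation: the function name `search_parameters` and the callable `rmse_total` (a stand-in for building the local models and evaluating RMSE_TOTAL for one parameter pair) are hypothetical. It also assumes that NN_MAX is reset to its minimum value each time M is incremented.

```python
from typing import Callable, Dict, Tuple


def search_parameters(
    m_min: int, m_max: int, m_inc: int,
    nn_min: int, n_max: int, n_inc: int,
    rmse_total: Callable[[int, int], float],
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Evaluate RMSE_TOTAL for every (M, NN_MAX) pair visited by the
    flowchart loop and return the best pair plus the full score table."""
    if m_inc < 1 or n_inc < 1:
        raise ValueError("increments must be positive")

    scores: Dict[Tuple[int, int], float] = {}
    m = m_min
    while True:
        nn_max = nn_min  # assumption: NN_MAX restarts for each new M
        while True:
            scores[(m, nn_max)] = rmse_total(m, nn_max)  # stands in for ST14-ST21
            if nn_max >= n_max:   # ST22
                break
            nn_max += n_inc       # ST25
        if m >= m_max:            # ST23
            break
        m += m_inc                # ST26

    # ST24: choose the parameter pair with the smallest RMSE_TOTAL.
    best = min(scores, key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy error surface for demonstration only.
    toy = lambda m, nn: (m - 4) ** 2 + (nn - 20) ** 2 / 100
    print(search_parameters(2, 6, 2, 10, 30, 10, toy)[0])
```

The comparisons use "greater than or equal to", as in the flowchart, so the loop also terminates when an increment steps past the maximum value.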

Although one embodiment of the present invention has been described above, the present invention is not limited to the configuration described in that embodiment, and also covers other embodiments and modifications conceivable within the scope of the matters described in the claims. For example, in the above embodiment the root mean square error is used to evaluate the error between the value predicted by the local model and the measured value, but other evaluation measures, such as the mean squared error or the mean absolute error, may be used instead.
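The three error measures named above can be written out as follows. This is a minimal sketch with hypothetical function names; it uses the textbook definitions and does not reproduce equations (14) and (15) of the description.

```python
import math
from typing import Sequence


def mse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean squared error between predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, measured)) / len(predicted)


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, measured))


def mae(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute error between predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, measured)) / len(predicted)


if __name__ == "__main__":
    pred, meas = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
    print(mse(pred, meas), rmse(pred, meas), mae(pred, meas))
```

RMSE and MSE penalize large deviations more strongly than MAE, so the choice of measure can change which parameter pair is ranked best.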

To verify the effect of the process state prediction method according to the present embodiment, the furnace top gas temperature of a thermal reactor was predicted and compared with measured values.

The data used for the verification are observation data measured in a waste treatment process over two years. To remove noise, the acquired data were smoothed with a one-hour moving average filter. The sampling interval is 20 minutes, and the total number of data points is 38,809.
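With a 20-minute sampling interval, a one-hour moving average spans three samples. The sketch below illustrates such a filter. The function name `moving_average` is hypothetical, and the trailing window (averaging the current sample with the preceding ones) is an assumption, since the description does not state whether the window is trailing or centered.

```python
from typing import List, Sequence

SAMPLING_MINUTES = 20
WINDOW_MINUTES = 60
WINDOW = WINDOW_MINUTES // SAMPLING_MINUTES  # 3 samples per one-hour window


def moving_average(samples: Sequence[float], window: int = WINDOW) -> List[float]:
    """Trailing moving average. The first window-1 outputs average only
    the samples available so far."""
    if window < 1:
        raise ValueError("window must be positive")
    smoothed: List[float] = []
    running = 0.0
    for i, value in enumerate(samples):
        running += value
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    print(moving_average([3.0, 6.0, 9.0, 12.0]))
```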

FIG. 9 lists the RMSE_TOTAL values calculated for the number M of explanatory variables and the number of past cases NN_MAX set in the verification. In this verification, RMSE_TOTAL was calculated with the number M of explanatory variables ranging from a minimum of 20 to a maximum of 46 in increments of 2, and the number of past cases NN_MAX ranging from a minimum of 100 to a maximum of 200 in increments of 50. As a result, RMSE_TOTAL was smallest when the number M of explanatory variables was 30 and the number of past cases NN_MAX was 150.
A multiple regression model was used to construct the local models.
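The parameter ranges stated for the verification imply the following grid. The snippet is only a worked count under the assumption that every combination of M and NN_MAX was evaluated; the variable names are introduced here for illustration.

```python
# Parameter ranges as stated for the verification (inclusive bounds).
m_values = list(range(20, 46 + 1, 2))       # M = 20, 22, ..., 46
nn_values = list(range(100, 200 + 1, 50))   # NN_MAX = 100, 150, 200

# Assumption: every (M, NN_MAX) combination is evaluated once.
grid = [(m, nn) for m in m_values for nn in nn_values]

print(len(m_values), len(nn_values), len(grid))
```

Under that assumption, 14 values of M and 3 values of NN_MAX give 42 candidate local model configurations, of which (M, NN_MAX) = (30, 150) is the one reported as best.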

FIG. 10 shows a time history graph comparing measured values with the values predicted by the local model (M = 30, NN_MAX = 150) constructed by the process state prediction method according to the present embodiment. FIG. 11 shows the corresponding graph for a local model (M = 20, NN_MAX = 150) constructed by a conventional prediction method, and FIG. 12 shows the corresponding graph for a local model (M = 40, NN_MAX = 150) constructed by a conventional prediction method.
These figures show that the values predicted by the local model constructed by the process state prediction method according to the present embodiment are closest to the measured values, whereas for the local models constructed by the conventional prediction method the error gradually increases.

10: Large-scale database, 11: Database

Claims

1. A process state prediction method in which a database is created that accumulates data vectors, each being a pair of an input vector and an output vector consisting of observation data indicating the operating state of a process in plant equipment; at least one data vector similar to a request point vector, which consists of the input vector corresponding to the output vector at a time point to be predicted, is acquired from the database as a neighboring data vector; a local model is constructed from the neighboring data vectors; and the output vector at the time point to be predicted is predicted, the method comprising:
a step of creating, using the number M of explanatory variables constituting the input vector and the maximum number NN_MAX of the neighboring data vectors as parameters, a plurality of neighboring data vector sets in which the neighboring data vectors are stored, performing principal component analysis on each neighboring data vector set, constructing, for each neighboring data vector set, the local model for which the Q statistic with respect to the request point vector is smallest, and calculating an error between a predicted value and a measured value of the local model; and
a step of selecting, from among the plurality of local models constructed using the number M of explanatory variables and the maximum number NN_MAX of the neighboring data vectors as parameters, the local model for which the error is smallest.

2. The process state prediction method according to claim 1, wherein an error between the predicted value and the actual measurement value of the local model is calculated by a root mean square error.