JP5868216B2

JP5868216B2 - Clustering apparatus and clustering program

Info

Publication number: JP5868216B2
Application number: JP2012040134A
Authority: JP
Inventors: 今村 誠; 齋藤 裕
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2012-02-27
Filing date: 2012-02-27
Publication date: 2016-02-24
Anticipated expiration: 2032-02-27
Also published as: JP2013175108A

Description

The present invention relates to a plant abnormality detection apparatus that detects signs of abnormality, such as failure or performance deterioration, in the devices that make up a plant.

Power plants (thermal, hydroelectric, nuclear), chemical plants, steel plants, water and sewage plants and similar facilities use instrumentation systems to control their processes. These instrumentation systems accumulate various time-series data acquired by sensors attached to the equipment. There is demand to use this time-series data for plant monitoring and maintenance.

For example, Patent Document 1 (listed below) describes a method that detects anomalies by outputting how far the observation data deviate. The deviation comes from the similarity between past plant sensor-signal data and the observation data. However, past plant sensor-signal data vary widely for several reasons:
- operation modes such as startup, steady state and shutdown;
- differences in heat generation efficiency caused by fuel composition;
- equipment deterioration.
The sensor-signal data used for learning must therefore be collected separately for each operating condition, which is a heavy workload.

Patent Document 2 (listed below) describes a method for solving this problem. It divides the trajectory of the time-series data in data space into a plurality of trajectory segments based on temporal changes, and models the target separately for each segment. This allows the performance-deterioration state of the plant equipment to be evaluated continuously.

Patent Document 1: JP 2004-531815, "Diagnostic system and method for predictive condition monitoring"
Patent Document 2: JP 2010-92355, "Abnormality detection method and system"

The method of Patent Document 2 divides the target data along the time axis. A data point goes into a new cluster if its distance from the preceding data exceeds a predetermined threshold, and into the same cluster otherwise. This captures the relationships among sensor signals that arise from operating conditions such as startup, steady state and shutdown, so past data no longer have to be collected for each operating condition.

However, abnormality is judged only by whether a similar segment existed in the past. The method therefore cannot assess statistical bias, such as how rarely the behavior in a given segment occurs. If the collected data contain abnormal data caused by a sensor failure, or data recorded just before an abnormality, those data may degrade the accuracy of abnormality detection.
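The following Python sketch shows one reading of the threshold rule attributed to Patent Document 2. Two details are assumptions for illustration and are not taken from that document: the distance is Euclidean, and it is measured between consecutive samples. The function name `threshold_segments` is hypothetical. The rule only asks whether a neighbor is far away. It records nothing about how often each segment's behavior occurs, which is the statistical information the paragraph above says is missing.

```python
import numpy as np


def threshold_segments(data, threshold):
    """Assign a segment label to each sample of a multivariate time series.

    Assumed rule (illustrative only): a new segment starts whenever the
    Euclidean distance between consecutive samples exceeds `threshold`;
    otherwise the sample stays in the current segment.
    """
    data = np.asarray(data, dtype=float)
    labels = np.zeros(len(data), dtype=int)
    for t in range(1, len(data)):
        step = np.linalg.norm(data[t] - data[t - 1])
        labels[t] = labels[t - 1] + (1 if step > threshold else 0)
    return labels
```

For example, the samples (0, 0), (0.1, 0), (5, 5), (5.1, 5) with a threshold of 1.0 fall into two segments, labeled 0, 0, 1, 1.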

Abnormality detection based on statistical processing never reaches 100% accuracy. Before acting, plant maintenance staff and operators therefore need a function that clearly explains why something was judged abnormal. In the prior art (Patent Document 2), however, the plant system is modeled with the subspace method on the segmented trajectories. The abnormality judgment is made on mathematically transformed results, which makes it hard to explain the relationships among sensor signals and the grounds for the judgment.

An object of the present invention is to improve the accuracy of an apparatus that detects signs of abnormality in the facilities and equipment of a plant, such as failure or performance deterioration. It does so by using the time-series data accumulated by the instrumentation system.

The clustering apparatus of the present invention comprises the following two units.

A local time-series data extraction unit. From a plurality of time-series data of different types, it extracts the data belonging to each of N different time ranges, from a first time range to an Nth time range (N is an integer of 2 or more). It then generates N pieces of local time-series data, each consisting of the set of time-series data within one time range.

A local time-series data clustering unit, which performs the following steps:
- It divides the N pieces of local time-series data into a preset number of initial clusters, according to a preset initial cluster division rule.
- For each initial cluster, it generates representative information that characterizes the cluster.
- It re-clusters the N pieces of local time-series data by distributing them among the representative information, according to a preset re-clustering rule.
- It regenerates the representative information for each re-clustered cluster, then re-clusters the N pieces of local time-series data against the regenerated representative information.

The clustering unit repeats the re-clustering and regeneration in the same way. Each time it regenerates the representative information, it checks whether the new representative information differs from the previous one. If it has changed, the unit continues with the next regeneration. If it has not changed, the unit stops re-clustering and regenerating.
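The iterative loop in the local time-series data clustering unit has the same shape as k-means. The sketch below makes several assumptions that the claim does not fix:
- each piece of local time-series data has already been summarized as a feature vector, for example its regression coefficients;
- the initial cluster division rule is round-robin assignment;
- the representative information is the cluster mean;
- the re-clustering rule assigns each item to its nearest representative.

The function name `recluster` is hypothetical.

```python
import numpy as np


def recluster(params, n_clusters, max_iter=100):
    """Re-cluster N local items until the representatives stop changing.

    params: (N, d) array, one feature vector per piece of local time-series
    data (assumed here to be its estimated model parameters). Requires
    N >= n_clusters so that every initial cluster is non-empty.
    """
    params = np.asarray(params, dtype=float)
    # Assumed initial cluster division rule: round-robin assignment.
    labels = np.arange(len(params)) % n_clusters
    # Representative information: mean feature vector of each cluster.
    reps = np.array([params[labels == k].mean(axis=0) for k in range(n_clusters)])
    for _ in range(max_iter):
        # Assumed re-clustering rule: each item goes to its nearest representative.
        dists = np.linalg.norm(params[:, None, :] - reps[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Regenerate the representatives; an emptied cluster keeps its old one.
        new_reps = np.array([
            params[labels == k].mean(axis=0) if np.any(labels == k) else reps[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_reps, reps):  # no change: stop regenerating
            break
        reps = new_reps
    return labels, reps
```

Take K = 2 and the scalar features 1.0, 1.1, 0.5, 0.45, 1.05, 0.55. After two passes they settle into a high group (mean 1.05) and a low group (mean 0.5). The loop then stops because the regenerated means no longer change.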

The present invention improves detection accuracy in an apparatus that detects signs of abnormality, such as failure or performance deterioration, in the facilities and equipment of a plant.

FIG. 1 is a block diagram showing the configuration of the plant abnormality detection apparatus 100 in Embodiment 1.
FIG. 2 is a diagram explaining time-series data in Embodiment 1.
FIG. 3 is an explanatory diagram of local time-series data extraction in Embodiment 1.
FIG. 4 is an explanatory diagram showing an example of a change in the correlation between sensor signals (continuous change) in Embodiment 1.
FIG. 5 is an explanatory diagram showing an example of a change in the correlation between sensor signals (discontinuous change) in Embodiment 1.
FIG. 6 is an explanatory diagram showing an example of a change in the correlation between sensor signals (dependence on the value range) in Embodiment 1.
FIG. 7 is an explanatory diagram showing an example of local time-series data clustering in Embodiment 1.
FIG. 8 is an explanatory diagram showing an example of estimating the global time-series data model in Embodiment 1.
FIG. 9 is a flowchart explaining the overall processing flow of the plant abnormality detection apparatus 100 in Embodiment 1.
FIG. 10 is a flowchart explaining the processing flow of the local time-series data clustering unit 104 in Embodiment 1.
FIG. 11 is a diagram explaining local time-series data divided by data-value range in Embodiment 1.
FIG. 12 is a diagram conceptually showing the processing of the local time-series data model estimation unit 103 in Embodiment 1.
FIG. 13 is a diagram conceptually showing the "N × K" local structures S(L_ki) in Embodiment 1.
FIG. 14 is a diagram showing the meaning of (Equation 5) in Embodiment 1.
FIG. 15 is a diagram conceptually showing the processing of S1004 in FIG. 10.
FIG. 16 is a diagram conceptually showing the processing of S1005 in FIG. 10.
FIG. 17 is a diagram explaining the return from the first S1006 to S1004 in FIG. 10.
FIG. 18 is a conceptual diagram showing the state after execution for k = 2 and k = 3 in FIG. 10.
FIG. 19 is a conceptual diagram showing the state after execution for k = 1 to K in FIG. 10.
FIG. 20 is a diagram showing an overview of the entire processing of FIG. 10.
FIG. 21 is a flowchart explaining the processing flow of the global time-series data model estimation unit 105 in Embodiment 1.
FIG. 22 is a diagram showing an example of the external appearance of the plant abnormality detection apparatus 100 in Embodiment 2.
FIG. 23 is a diagram showing an example hardware configuration of the plant abnormality detection apparatus 100 in Embodiment 2.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an example of the plant abnormality detection apparatus 100 according to Embodiment 1. The functions of each component are outlined below. In the following description of the embodiments, t denotes time.
(1) The plant time-series database 101 stores a plurality of time-series data. The data are obtained by observing the plant equipment and other targets of abnormality detection sequentially over time.
(2) The local time-series data extraction unit 102 takes as input the multidimensional time-series data in the plant time-series database 101. An example is the set (y(t), x_1(t), x_2(t)) shown in FIG. 2 (described later): the heat generation y(t) from the input fuel, the fuel input amount x_1(t) and the temperature x_2(t). The unit divides the input data by "time" or by "data value" according to how the data change over time, and extracts the resulting segmented time-series data. This segmented data is called "local time-series data 301".
(3) The local time-series data model estimation unit 103 estimates a model of each piece of local time-series data 301 extracted by the local time-series data extraction unit 102, using multivariate analysis or a time-series analysis method. Model estimation is, for example, the process of obtaining a regression equation for each piece of local time-series data 301.
(4) The local time-series data clustering unit 104 divides into clusters the set of models of the local time-series data 301 estimated by the local time-series data model estimation unit 103. For each cluster, it estimates (calculates) the "representative local parameters" that represent that cluster.
(5) The global time-series data model estimation unit 105 estimates a global representative time-series data model by connecting the models estimated by the local time-series data clustering unit 104.
(6) The outlier detection unit 106 works on separately supplied segment data. It compares them with the set of representative local time-series data models obtained by the local time-series data clustering unit 104 or the global time-series data model estimation unit 105, and detects as abnormal any data with a large outlier degree.
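Model estimation in unit 103 is described only as obtaining, for example, a regression equation for each piece of local time-series data 301. The Python sketch below shows that step under two assumptions:
- each model is an ordinary least-squares fit with an intercept;
- the segments are identified by integer labels, such as those produced by the extraction unit.

Both function names are hypothetical.

```python
import numpy as np


def fit_local_regression(y, X):
    """Least-squares fit of y = b0 + b1*x1 + ... + bM*xM for one segment."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single explanatory series becomes one column
    A = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def fit_all_segments(y, X, labels):
    """Return {segment label: coefficient vector [b0, b1, ..., bM]}."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(s): fit_local_regression(y[labels == s], X[labels == s])
            for s in np.unique(labels)}
```

The later clustering step would group the coefficient vectors returned here. Choosing a linear regression form is an assumption; the text also allows time-series models such as AR models.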

(Example of multidimensional time-series data)
FIG. 2 is an explanatory diagram of the time-series data input to the local time-series data extraction unit 102. Time-series data can be expressed as a function x(t) that maps each time t to the sensor signal value at that time. Time t may be continuous, or values may be recorded at regular intervals by sampling. As described above, this specification handles sets of several time series (multidimensional time-series data).
FIG. 2 shows an example of three time series:
(a) time-series data y(t),
(b) time-series data x_1(t),
(c) time-series data x_2(t).
A set of several time series can be regarded as a vector-valued function (y(t), x_1(t), x_2(t)) of time t.

(Example of time-series data segmentation: Part 1)
FIG. 3 shows an example of time-series data segmentation by the local time-series data extraction unit 102, illustrating one way the unit operates.
(a) is an example of time-series data x(t), the original data stored in the plant time-series database 101.
(b) is an example of smoothed time-series data, obtained when the local time-series data extraction unit 102 smooths the time-series data (a).
(c) is an example of time-series data whose values are the time differences x(t_{i+1}) - x(t_i) of the time-series data (b). That is, (c) represents Δx/Δt.
(d) is an example of local time-series data 301. It is obtained by dividing the original signal data (a) at the times when the absolute value of (c) is at or above a threshold. In (d), the original signal data (a) are divided into eight sections; that is, the time-series data x(t) of (a) become eight pieces of local time-series data 301.
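Steps (a) to (d) can be sketched in Python as follows. Several choices are assumptions made for illustration, because the text specifies neither the smoothing method nor exactly where the boundaries fall:
- moving-average smoothing;
- its window width;
- the threshold value;
- treating each run of large differences as a segment of its own.

The function name `segment_by_slope` is hypothetical.

```python
import numpy as np


def segment_by_slope(x, window=3, threshold=1.0):
    """Label each sample of a 1-D series with a segment number.

    Mirrors FIG. 3 under assumed parameters: (b) moving-average smoothing
    with an odd `window`, (c) first difference of the smoothed series,
    (d) a new segment whenever |difference| >= threshold switches on or off.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, window // 2, mode="edge")  # repeat the end values
    smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")
    diff = np.diff(smoothed)  # x_s(t_{i+1}) - x_s(t_i)
    # flag[i] is True when sample i was reached through a large change.
    flag = np.concatenate([[False], np.abs(diff) >= threshold])
    # Start a new segment each time the flag switches between True and False.
    return np.concatenate([[0], np.cumsum(flag[1:] != flag[:-1])])
```

For a step from 0 to 10, this yields three segments: the low plateau, the ramp introduced by smoothing, and the high plateau.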

(Example of time-series data segmentation: Part 2)
FIG. 4 shows another example of time-series data segmentation by the local time-series data extraction unit 102: a continuous change in the correlation between sensor signals.
- (a) and (b) are examples of time-series data; let (a) be y(t) and (b) be x(t).
- (c) is an example of time-series data indicating the correlation between y(t) and x(t). The correlation is expressed as the coefficient obtained by regressing y(t) on x(t), and its value decreases gradually and continuously over time.

Such a continuous change occurs when the efficiency of the equipment declines little by little, for example through deterioration. If y(t) is the heat generation and x(t) is the fuel input, (c) shows the equipment's efficiency gradually declining as it deteriorates. The local time-series data extraction unit 102 can treat each of the five sections in FIG. 4(c) as local time-series data 301.
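The correlation measure in panel (c) can be approximated by fitting a straight line within each section and reading off its slope. The Python sketch below assumes equal-length consecutive sections; with real plant data, the sections would come from the extraction unit instead. The function name is hypothetical.

```python
import numpy as np


def section_slopes(y, x, n_sections):
    """Regression slope of y on x within each of n_sections consecutive sections."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    slopes = []
    for ys, xs in zip(np.array_split(y, n_sections), np.array_split(x, n_sections)):
        slope, _intercept = np.polyfit(xs, ys, 1)  # degree-1 least-squares fit
        slopes.append(slope)
    return np.array(slopes)
```

Suppose y = e·x within each section, with the efficiency e falling from 1.0 to 0.6 in steps of 0.1 over five sections. The returned slopes then fall in the same steps, which is the continuous decline pictured in FIG. 4(c).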

(Example of time-series data segmentation: Part 3)
FIG. 5 shows another example of time-series data segmentation by the local time-series data extraction unit 102: a discontinuous change in the correlation between sensor signals.
- (a) and (b) are examples of time-series data, denoted y(t) and x(t), respectively.
- (c) is an example of time-series data indicating the correlation between y(t) and x(t). The correlation is expressed as the coefficient obtained by regressing y(t) on x(t), and over time it switches discontinuously between two values. Sections 1, 2 and 5 have high values; sections 3, 4 and 6 have low values.

If y(t) is the heat generation and x(t) is the fuel input, this pattern arises when the equipment's efficiency depends on the type of fuel. Sections 1, 2 and 5 use one fuel and sections 3, 4 and 6 use another. The first group produces more heat than the second for the same amount of fuel. In (c), the local time-series data extraction unit 102 can separate two pieces of local time-series data 301: one made up of sections 1, 2 and 5, and one made up of sections 3, 4 and 6.

(Example of further dividing time-segmented time-series data by value range)
FIG. 6 shows another example of time-series data segmentation by the local time-series data extraction unit 102: a change in the correlation between sensor signals that depends on the value range. Here, time-series data already divided by time are further divided by the value range of the time-series data.
- (a) and (b) are examples of time-series data, denoted y(t) and x(t), respectively.
- (c) is an example of time-series data indicating the correlation between y(t) and x(t). The correlation is expressed as the coefficient obtained by regressing y(t) on x(t), and its trend changes over time. In sections 1 and 3 the coefficient is constant. In section 2 it changes continuously between those constant values.

This change in correlation can be read as depending not on time but on the value of the time-series data x(t). Suppose y(t) is the heat generation and x(t) is the fuel input. The behavior appears when the equipment's control system keeps the heat generation from rising above a certain level even when more fuel is supplied. To capture the relationship between sensor signals that behave this way, it helps to divide the time-series data (y(t) or x(t)) by value range.

(d) shows the local signal data obtained when the points where the correlation value changes are taken as the boundaries of ranges of the value y. In (d), the time-series data y are divided into value ranges A, B and C, and sections 1, 2 and 3 each become local time-series data 301. The local time-series data extraction unit 102 then further divides the time-segmented local time-series data 301 by the range of a time-series value; in this example it uses the value of y(t) in (a). As a result, the local time-series data 301 of section 2 are split into the data of section 2-1 and section 2-2.

(Local clusters and global clusters)
FIG. 7 is an explanatory diagram showing an example of local time-series data clustering. It conceptually shows the processing results of the local time-series data model estimation unit 103, the local time-series data clustering unit 104 and the global time-series data model estimation unit 105. The graph in FIG. 7 is a scatter diagram with the heat generation y on the vertical axis and the fuel input x on the horizontal axis.
A "scatter diagram" plots the pair of values x(t) and y(t) at each time as a point <x(t), y(t)> on a two-dimensional graph. When the signals are related as shown in FIG. 5, the data pairs in the scatter diagram can be classified into two clusters, cluster 701 and cluster 702. FIG. 7 is described below.

FIG. 7 shows small clusters 703 and 704 and large clusters 701 and 702. Small clusters such as 703 and 704 are called local clusters (also local time-series clusters). Large clusters such as 701 and 702 are called global clusters. Global cluster 701 corresponds to the pairs of x and y values obtained with an efficient fuel, and global cluster 702 to those obtained with a less efficient fuel. The upward trend of global cluster 701 levels off, whereas global cluster 702 is a straight line. This is an example in which the signal relationship shown in FIG. 6 holds.

The ranges (frames) of local clusters 703 and 704 are examples of clusters produced when the local time-series data clustering unit 104 clusters the local time-series data 301. They correspond to local time-series data 301 divided by time section and by value range. For these local time-series data 301, regression analysis, an AR model or a similar method gives regression equations 705 and 706, respectively (the representative local parameters).

Abnormality determination target data 707 and 708 are examples of time-series data subject to abnormality determination. In abnormality detection, data whose distance from the nearest regression equation is at or above a threshold are judged abnormal. Data 707 lies close to the nearest regression equation (that of local cluster 704) and is judged normal. Data 708 lies more than a certain distance from that regression equation and is judged abnormal.
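The decision rule for 707 and 708 reduces to a nearest-line test. The Python sketch below assumes that each representative regression equation is a straight line y = a + b·x. It ignores the x-range over which each local cluster is valid and measures distance as the vertical residual; both simplifications are assumptions. The function name is hypothetical.

```python
def is_abnormal(point, lines, threshold):
    """Return True if (x, y) is at least `threshold` away from every line.

    lines: iterable of (intercept, slope) pairs describing y = a + b*x.
    Distance is taken as the vertical residual |y - (a + b*x)|, which is an
    assumption; a perpendicular distance would also fit the description.
    """
    x, y = point
    nearest = min(abs(y - (a + b * x)) for a, b in lines)
    return nearest >= threshold
```

Take the lines y = 2x and y = x and a threshold of 1.0. The point (3, 6.1) is about 0.1 from the nearest line and is judged normal. The point (3, 4.5) is 1.5 from both lines and is judged abnormal.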

FIG. 8 is an explanatory diagram showing an example of estimating the global time-series data model.
- In (a), 801, 802, 803 and 804 are representative regression equations obtained by clustering the set of local time-series data 301.
- In (b), 809 is the global representative regression equation obtained by connecting 801, 802, 803 and 804 of (a). Similarly, 810 is the global representative regression equation obtained by connecting 805, 806, 807 and 808 of (a).
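The text does not say how the local regression equations are connected. One simple interpretation treats each representative line as valid from its own starting x value up to the start of the next line. This gives a piecewise-linear global model that can level off, as cluster 701 does. The Python sketch below follows that interpretation; the tuple layout `(x_start, intercept, slope)` and the function name are assumptions.

```python
from bisect import bisect_right


def make_global_model(pieces):
    """Connect local regression lines into one piecewise-linear model.

    pieces: list of (x_start, intercept, slope); each line is assumed to apply
    from its x_start up to the next piece's x_start (hypothetical layout).
    """
    pieces = sorted(pieces)
    starts = [p[0] for p in pieces]

    def model(x):
        # Index of the last piece whose start is <= x (clamped to the first piece).
        i = max(bisect_right(starts, x) - 1, 0)
        _, a, b = pieces[i]
        return a + b * x

    return model
```

Consider the pieces (0, 0.0, 2.0), (5, 5.0, 1.0) and (10, 15.0, 0.0). They give a curve that rises with slope 2, then slope 1, then stays flat at 15, and it is continuous at x = 5 and x = 10.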

(Description of operation)
The operation of Embodiment 1 is described below with reference to the flowcharts and diagrams of FIGS. 9 to 21. FIG. 9 is a flowchart of the overall processing flow of the plant abnormality detection apparatus. FIG. 10 is a flowchart of the processing flow of the local time-series data clustering unit 104. FIGS. 11 to 20 supplement the description of the processing in FIG. 10.
FIG. 21 is a flowchart for explaining the processing flow of the global time-series data model estimation unit 105.

(S901, S902: Processing of the local time-series data extraction unit 102)
S901 is the local time-series data extraction process executed by the local time-series data extraction unit 102. In S901, a set of several time series is taken as input and divided into sections according to how the input data change over time. The procedure for obtaining the sections is a predetermined local time-series data generation rule. For example, one input series is taken as the objective variable and the others as explanatory variables, and one of the following is applied:
(1) the "piecewise regression analysis" described in the reference (河口至商, Multivariate Analysis 2, pp. 60-64, Morikita Publishing);
(2) the locally stationary AR model described in the reference (Genshiro Kitagawa, Introduction to Time Series Analysis, pp. 113-124, Iwanami Shoten);
(3) alternatively, a more elementary approach, shown in FIG. 3: smoothing and time differencing are applied to the time-series data, and the sections are then extracted with a threshold.
S902 is also executed by the local time-series data extraction unit 102.
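Methods (1) and (2) both decide where to split by comparing a single model with separate models on each side of a candidate split point. The Python sketch below shows that idea for one split point, using straight-line regressions and AIC as the comparison criterion. It is a simplified stand-in, not the procedure of either reference; a locally stationary AR model would fit AR models to each side instead. The function names and the minimum segment length are assumptions.

```python
import numpy as np


def line_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    coef = np.polyfit(x, y, 1)
    return float(np.sum((y - np.polyval(coef, x)) ** 2))


def gaussian_aic(rss, n, k):
    """AIC of a Gaussian regression up to a constant; the floor avoids log(0)."""
    return n * np.log(max(rss, 1e-12) / n) + 2 * k


def best_split(x, y, min_len=3):
    """Return the index that splits x, y into two lines, or None if one line wins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # One line: intercept, slope and noise variance (k = 3).
    best_index, best_aic = None, gaussian_aic(line_rss(x, y), n, 3)
    for s in range(min_len, n - min_len + 1):
        rss = line_rss(x[:s], y[:s]) + line_rss(x[s:], y[s:])
        # Two lines with a shared noise variance (k = 5).
        candidate = gaussian_aic(rss, n, 5)
        if candidate < best_aic:
            best_index, best_aic = s, candidate
    return best_index
```

For 12 samples that follow y = x up to index 5 and y = 20 - 2x from index 6 onward, the function returns 6. For data on a single straight line it returns None, because the two extra parameters do not pay for themselves.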

In S902, the local time-series data extraction unit 102 further divides the time-segmented local time-series data from S901 by ranges of data values, and extracts the results as new time-series data. Specifically, it takes as input the set of time-series data divided by time

(Equation 1)  $\{(y_i(t), x_{i1}(t), \ldots, x_{iM}(t)) \mid T_{is} < t \le T_{ie}\}, \quad i = 1, \ldots, N$

It then divides the range of the variable y_i used for segmentation,

(Equation 2)  $Y_1 < Y_2 < \cdots < Y_{K+1}$

and extracts the set of time-series data produced by this division,

(Equation 3)  $L_{ki} = \{(y_i(t), x_{i1}(t), \ldots, x_{iM}(t)) \mid T_{is} < t \le T_{ie},\ Y_k < y_i(t) \le Y_{k+1}\}, \quad k = 1, \ldots, K,\ i = 1, \ldots, N$

Here (T_is, T_ie] is the time section obtained in S901.
Below, L_ki is called local time-series data L_ki (local time-series data 301). Data divided only by time range, without any division by data value, are also local time-series data in the broad sense.

With reference to FIG. 11, the meaning of (Formula 1) to (Formula 3) above is explained concretely. FIG. 11 shows an example with three time series: y(t), x_1(t), and x_2(t). Hereinafter, the time-series data y(t) may be written simply as y(t). Since there are two explanatory series, x_1(t) and x_2(t), M = 2 in (Formula 1). (Formula 1) represents the set of N time-series data indexed by the time section i = 1, ..., N; for i = 1, the time-series data given by (Formula 1) are the portions of y(t), x_1(t), and x_2(t) within the range T_1s < t ≤ T_1e in FIG. 11.
The division by data-value range in (Formula 2) means dividing the vertical axis of y(t) in FIG. 11 into ranges such as Y1 to Y2 and Y2 to Y3. The range Y1 to Y2 of y(t) corresponds to k = 1 in (Formula 3), that is, to the case
Y1 < y(t) ≤ Y2 within time section i = 1.
An example for k = 1 and i = 1 in (Formula 3), that is, L_11, is described below. For i = 1, the relevant parts are the portions of the graphs of y(t), x_1(t), and x_2(t) within the vertical time band T_1s < t ≤ T_1e in FIG. 11. For k = 1 under i = 1, the part of the graph of y(t) whose values lie in the range Y1 to Y2 belongs to L_11. This is shown in FIG. 11 as the thick-line portions of y(t) (Y left side and Y right side, on both sides of the time range). The portions of x_1(t) and x_2(t) belonging to L_11 are determined by the times at which y(t) belongs to L_11; in FIG. 11 these are x_1 left side, x_1 right side, x_2 left side, and x_2 right side, respectively.
Similarly, L_21 for k = 2 and i = 1 is the hatched portion of the graph shown in FIG. 11.
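As a rough illustration of (Formula 1) to (Formula 3), the sketch below splits sampled series into the local time-series data L_ki by time section and by y-value interval. It is a minimal Python sketch, not the apparatus's implementation; the function name `extract_local_series`, the list-based data layout, and passing the boundaries as `time_sections` and `y_edges` = [Y_1, ..., Y_{K+1}] are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# One sample of L_ki: (t, y(t), (x_1(t), ..., x_M(t)))
Sample = Tuple[float, float, Tuple[float, ...]]


def extract_local_series(
    t: Sequence[float],
    y: Sequence[float],
    xs: Sequence[Sequence[float]],
    time_sections: Sequence[Tuple[float, float]],
    y_edges: Sequence[float],
) -> Dict[Tuple[int, int], List[Sample]]:
    """Return {(k, i): L_ki} following (Formula 3).

    time_sections[i-1] = (T_is, T_ie) from S901; y_edges = [Y_1, ..., Y_{K+1}].
    Both kinds of interval are open on the left and closed on the right: (lo, hi].
    """
    local: Dict[Tuple[int, int], List[Sample]] = {}
    for i, (t_s, t_e) in enumerate(time_sections, start=1):
        for k in range(1, len(y_edges)):
            y_lo, y_hi = y_edges[k - 1], y_edges[k]  # (Y_k, Y_{k+1}]
            local[(k, i)] = [
                (t[n], y[n], tuple(x[n] for x in xs))
                for n in range(len(t))
                if t_s < t[n] <= t_e and y_lo < y[n] <= y_hi
            ]
    return local
```

With N time sections and K value intervals the dictionary has N × K entries, some of which may be empty.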

(S903: Processing of the local time-series data model estimation unit 103)
In S903, the local time-series data model estimation unit 103 (an example of a local time-series data regression equation generation unit) estimates a model for each local time-series data L_ki by multivariate analysis or a time-series analysis method. "Model estimation" is, for example, the process of obtaining a regression equation. For example, for each section k obtained in S902 by dividing the value range of the sensor signal (variable of interest y), the local time-series data 301 has N elements,

$$L_{k1}(t), \dots, L_{kN}(t)$$

and a model is estimated for every one of them by multivariate analysis or time-series analysis.
In the following, linear regression analysis is used as the example of multivariate analysis, but factor analysis, singular value decomposition, an AR model, a state space model, or the like may be used instead. Performing regression analysis on the time-series data L_ki(t) yields
the regression equation y(t) = F_ki(x_1, x_2, ..., x_M) and the residual sum of squares E_ki.
Below, (x_1, x_2, ..., x_M) is written as the vector x, and
F_ki(x_1, x_2, ..., x_M) is written as F_ki(x).
The quadruple of
the time-series data L_ki(t),
the interval (Yk, Yk+1] of y,
the regression equation F_ki(x), and
the residual sum of squares E_ki,
that is, (L_ki(t), (Yk, Yk+1], F_ki(x), E_ki), is called the local structure S(L_ki) of the local time-series data 301, L_ki.

(Local structure)
That is,
Local structure S(L_ki) = {L_ki(t), (Yk, Yk+1], F_ki(x), E_ki}.
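A local structure can be computed with ordinary least squares. The sketch below assumes a single explanatory variable (M = 1) so that the fit has a closed form; the names `LocalStructure`, `fit_line`, and `local_structure` are hypothetical, and a system with M > 1 would solve the full normal equations instead.

```python
from typing import List, NamedTuple, Sequence, Tuple


class LocalStructure(NamedTuple):
    """Hypothetical container for S(L_ki) = {L_ki(t), (Y_k, Y_k+1], F_ki(x), E_ki}."""
    data: List[Tuple[float, float]]  # (x(t), y(t)) samples of L_ki
    y_interval: Tuple[float, float]  # (Y_k, Y_k+1]
    coef: Tuple[float, float]        # F_ki(x) = coef[0] + coef[1] * x
    rss: float                       # residual sum of squares E_ki


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with one explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # flat slope if x does not vary
    return (my - b * mx, b)


def local_structure(data: Sequence[Tuple[float, float]],
                    y_interval: Tuple[float, float]) -> LocalStructure:
    """Fit F_ki on L_ki and bundle it with E_ki into a local structure."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return LocalStructure(list(data), y_interval, (a, b), rss)
```

For example, `local_structure([(0, 1), (1, 3), (2, 5)], (0.0, 10.0))` lies exactly on y = 1 + 2x, so its coefficients are (1.0, 2.0) and its E_ki is 0.0.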

FIG. 12 conceptually shows the processing of the local time-series data model estimation unit 103 described above. The local time-series data model estimation unit 103 associates S(L_ki) with each local time-series data L_ki for a given (k, i) extracted by the local time-series data extraction unit 102. Here, the time-section index i takes N values, 1 to N, as shown in (Formula 1). The data-section index k for the variable of interest y(t) (designated time-series data) takes values 1 to K (corresponding to m = 1 to K in (Formula 2)).
That is, since i = 1 to N and k = 1 to K,
N × K local time-series data L_ki are generated,
and hence N × K local structures S(L_ki) are formed. FIG. 13 conceptually shows the N × K local structures S(L_ki), with the time-section index i on the horizontal axis and the data-section index k on the vertical axis. Each cell corresponds to one S(L_ki).

(S904: Operation of the local time-series data clustering unit 104)
S904 is the local data clustering process executed by the local time-series data clustering unit 104. In S904, the set of local time-series data models estimated by the local time-series data model estimation unit 103 (that is, the N·K local structures S(L_ki)) is divided into clusters, and for each cluster a representative local parameter representing that cluster is estimated.
FIG. 10 is a flowchart showing the details of the processing of S904 executed by the local time-series data clustering unit 104. The subject of each operation in FIG. 10 is the local time-series data clustering unit 104, but it is omitted below to avoid repetition. S904 takes as input the set of local structures obtained in S903,
S(L_i) = (L_i(t), (Yk, Yk+1], F_i(x), E_i).
Because the processing is executed separately for each k (each data section), the subscript k of L, F, and E is omitted for brevity. The capital letter N denotes the number of local time-series data L_i (that is, the number of time ranges), and, as above, the time-section index i takes values from 1 to N.

In terms of FIG. 13, executing S904 on S(L_i) for each k means, for example for k = 2, processing S(L_1) to S(L_N), which are the data in the hatched row of FIG. 13.

(S1001)
In S1001, the S_i with the smallest E_i is found. Suppose that it is S_3. For k = 2, the local time-series data clustering unit 104 searches S_1 to S_N in FIG. 13 for the S_i with the smallest E_i.
Next, F_i is assigned to the candidate variable m_1 of the representative local parameter. In this case, F_3 (a regression equation) belonging to S_3, which has the smallest E_i, is assigned to m_1:
m_1 = F_3.
Next, 1 is assigned to the variable c:
c = 1.
Note that the preset number of local clusters used in S1003, described later, is written C* to distinguish it from the variable c.

(S1002)

$$i^{*} = \mathop{\arg\max}_{i = 1, \dots, N} \mathrm{dist}_{lm}\bigl(S(L_i), \{m_1, \dots, m_c\}\bigr), \qquad \mathrm{dist}_{lm}\bigl(S(L_i), \{m_1, \dots, m_c\}\bigr) = \sum_{j=1}^{c} \mathrm{dist}_r(F_i, m_j) \qquad \text{(Formula 5)}$$

Here, dist_lm(S(L_i), {m_1, ..., m_c}) denotes the distance between S(L_i) and {m_1, ..., m_c}, and the distance dist_r between regression equations is the distance between their coefficient vectors. The distance dist(F_i(x), m_1) between F_i and m_1 is computed by a preset formula. The regression equation F_i(x) is also written simply as F_i. "c = 1" in FIG. 14 illustrates (Formula 5) for c = 1, as set in S1001.
When c = 1,
the largest of the N distances dist(F_1, m_1) to dist(F_N, m_1) is found. Suppose, for example, that dist(F_5, m_1) is the largest (i = 5), that is,
max = dist(F_5, m_1).
Next, the F_i(x) for this i is assigned to m_{c+1}, and then c + 1 is assigned to c.
In this example,
m_{1+1} = m_2 = F_5,
c = 1 + 1 = 2.

(S1003)
In S1003, it is determined whether the variable c equals the constant C*.
As noted above, the constant C* is given in advance as a parameter specifying the number of local clusters. If c equals C*, the process proceeds to S1004; otherwise, it returns to S1002.

In this example, c = 2 at this point, so the process returns to S1002.
On returning to S1002,
c = 2,
m_2 = F_5,
and, from S1001,
m_1 = F_3.
Then, as in the first pass of S1002 (c = 1), the largest distance is found based on (Formula 5).
"c = 2" in FIG. 14 illustrates (Formula 5) for c = 2 in S1002.
When c = 2,
the largest of the N distances "dist(F_1, m_1) + dist(F_1, m_2)" to "dist(F_N, m_1) + dist(F_N, m_2)" is found.
The remaining operation is the same as in the previous S1002.
When c = C*, the process proceeds to S1004.
In this example, that happens when c = 20 (the number of local clusters).
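The initialization in S1001 to S1003 resembles a farthest-point selection on the coefficient vectors. The sketch below assumes that dist_r is the squared Euclidean distance between coefficient vectors (consistent with the sum-of-squared-error criterion mentioned for S1002, but still an assumption) and that each F_i is given as a coefficient tuple; `initial_centers` is a hypothetical name, and skipping indices that were already chosen is a safeguard added here.

```python
from typing import List, Sequence, Tuple

Coef = Tuple[float, ...]


def dist_r(a: Coef, b: Coef) -> float:
    """Assumed dist_r: squared Euclidean distance between coefficient vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def initial_centers(coefs: Sequence[Coef], rss: Sequence[float],
                    c_star: int) -> List[int]:
    """Return 0-based indices i whose F_i become m_1, ..., m_C*.

    coefs[i] holds the coefficients of F_i and rss[i] holds E_i for one k.
    """
    if not 1 <= c_star <= len(coefs):
        raise ValueError("C* must be between 1 and N")
    # S1001: m_1 is the regression equation with the smallest E_i.
    chosen = [min(range(len(coefs)), key=lambda i: rss[i])]
    # S1003: repeat S1002 until c == C*.
    while len(chosen) < c_star:
        # S1002 / (Formula 5): pick the F_i with the largest summed distance
        # to m_1..m_c; indices already chosen are skipped (added safeguard).
        candidates = [i for i in range(len(coefs)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: sum(dist_r(coefs[i], coefs[j]) for j in chosen))
        chosen.append(best)
    return chosen
```

With the summed distance, each new m_{c+1} tends to lie far from all current candidates at once rather than from m_1 alone.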

Through S1002 and S1003 above, C* values of m are obtained; below they are written
m_j, j = 1, 2, ..., C*.

Hereinafter, D_j, j = 1, 2, ..., C*, described later, denote the C* clusters obtained by clustering the local structures.

(S1004)
In S1004, the clusters D_j are initialized. For example, for each of the N local time-series data L_i in FIG. 13 (k = 2) (each S_i contains L_i as an element), the m_j among the C* candidates that minimizes dist_lm(L_i, m_j) (a predetermined distance definition) is found.
Next, L_i is assigned to cluster D_j.
FIG. 15 conceptually shows this processing.
Consider, for example, L_1. The distance dist_lm(L_1, m_j) between L_1 and each of m_1 to m_C* is calculated, and the m_j with the smallest distance is found. The distance may be the distance between the coefficient vectors of the regression equations, as used in S1002, or another formula may be used.
Here, the distance dist_lm(L_i, m_j) is evaluated using the regression equation F_i that belongs to the same local structure S_i as the time-series data L_i, as with the distance in (Formula 5).
That is,
dist_lm(L_i, m_j) = dist_lm(F_i, m_j),
and since both F_i and m_j are regression equations, this is a distance between regression equations.
It is nevertheless written with L_i because the objects being clustered are time-series data.
For example, if L_1 (F_1) is closest to m_1, the time-series data L_1 belongs to cluster D_1.
Similarly, if L_2 (F_2) is also closest to m_1, as in FIG. 15, the time-series data L_2 also belongs to cluster D_1.
Likewise, if L_3 (F_3) is closest to m_2, the time-series data L_3 belongs to cluster D_2.
The same applies to L_4 through L_N.
Through the processing of S1004, each of L_1 to L_N belongs to one of the clusters D_1 to D_C*.
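The nearest-candidate assignment of S1004 can be sketched as follows, again assuming that regression equations are compared through the squared Euclidean distance of their coefficient vectors. The helper names `dist_lm`, `assign_clusters`, and `members` are hypothetical.

```python
from typing import List, Sequence, Tuple

Coef = Tuple[float, ...]


def dist_lm(f_i: Coef, m_j: Coef) -> float:
    """Assumed dist_lm(L_i, m_j) = dist_lm(F_i, m_j): squared Euclidean distance
    between the coefficient vectors of F_i and m_j."""
    return sum((p - q) ** 2 for p, q in zip(f_i, m_j))


def assign_clusters(coefs: Sequence[Coef], centers: Sequence[Coef]) -> List[int]:
    """S1004: return, for each L_i, the 0-based index j of its nearest m_j."""
    return [min(range(len(centers)), key=lambda j: dist_lm(f, centers[j]))
            for f in coefs]


def members(labels: Sequence[int], c_star: int) -> List[List[int]]:
    """Collect the 0-based indices of the L_i that belong to each cluster D_j."""
    groups: List[List[int]] = [[] for _ in range(c_star)]
    for i, j in enumerate(labels):
        groups[j].append(i)
    return groups
```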

(S1005)
S1005 onward (the loop S1005, S1006, S1004) re-clusters the local time-series data L_i (i = 1 to N), starting from the initial clusters set in S1003 → S1004. In S1005, for each of the C* clusters D_j, regression analysis is performed on the union ∪ L_jk ∈ D_j of the local time-series data L_jk (k = 2 in this example) belonging to cluster D_j. This regression analysis yields the regression equation F_j(x) of cluster D_j.
Next, the obtained regression equation F_j(x) is assigned to m_j, the candidate representative regression equation of cluster D_j.
FIG. 16 conceptualizes the processing of S1005 and shows the re-clustering rule.
Suppose that, when the first S1004 is completed,
cluster D_1 contains local time-series data L_1 and L_2,
cluster D_2 contains local time-series data L_3 to L_5,
and so on.
Then,
for cluster D_1, the regression equation F_{j=1} is obtained for the union L_1 ∪ L_2, and
for cluster D_2, the regression equation F_{j=2} is obtained for the union L_3 ∪ L_4 ∪ L_5. The same applies to the other clusters.
This processing determines a regression equation for each of the C* clusters D_1 to D_C*.
These C* regression equations become the new m_1 to m_C*, replacing the m_1 to m_C* used in FIG. 15 (obtained in S1002 and S1003).
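The refit in S1005 pools the samples of all L_i in a cluster and regresses once on the union. The sketch below assumes one explanatory variable (coefficients (intercept, slope)); keeping the previous m_j for a cluster with fewer than two pooled samples is an added safeguard that the text does not describe, and `refit_centers` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) sample; single explanatory variable assumed
Coef = Tuple[float, float]   # (intercept, slope)


def fit_line(points: Sequence[Point]) -> Coef:
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return (my - b * mx, b)


def refit_centers(local_data: Sequence[Sequence[Point]], labels: Sequence[int],
                  old_centers: Sequence[Coef]) -> List[Coef]:
    """S1005: regress on the union of the L_i in each cluster D_j to get the new m_j."""
    new_centers: List[Coef] = []
    for j, old in enumerate(old_centers):
        # Union of the samples of every L_i currently assigned to D_j.
        pooled = [p for data, lab in zip(local_data, labels) if lab == j for p in data]
        new_centers.append(fit_line(pooled) if len(pooled) >= 2 else tuple(old))
    return new_centers
```

Regressing on the pooled samples, rather than averaging the members' coefficients, weights each L_i by how many samples it contributes.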

In S1006, it is determined whether none of the C* values m_j has changed.
In the first S1006, the previous m_1 to m_C* are the initial values created by the loop of S1002 and S1003. Therefore, they usually differ from the m_1 to m_C* obtained in the first S1005.
If any has changed, the process returns to S1004.
If none has changed, the process ends.
The C* clusters D_1, D_2, ..., D_C* at the end are the local time-series clusters. The values m_1 to m_C* at the end are written with capital letters as M_1, M_2, ..., M_C*; these are the representative local parameters of the local time-series clusters D_1 to D_C*, respectively.
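Putting S1004, S1005, and S1006 together gives a k-means-style loop in which the cluster centers are regression equations. The following is a minimal sketch under the same assumptions as above (single explanatory variable, squared Euclidean distance between coefficient vectors); the tolerance `tol` used to decide that no m_j has changed and the iteration cap `max_iter` are assumed parameters, not values from the text, and `local_clustering` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y); single explanatory variable assumed
Coef = Tuple[float, float]   # (intercept, slope)


def fit_line(points: Sequence[Point]) -> Coef:
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return (my - b * mx, b)


def dist(a: Coef, b: Coef) -> float:
    # Assumed distance: squared Euclidean distance between coefficient vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b))


def local_clustering(local_data: Sequence[Sequence[Point]], centers: Sequence[Coef],
                     tol: float = 1e-12, max_iter: int = 100):
    """Iterate S1004 -> S1005 -> S1006 until no m_j changes (within tol).

    Returns (labels, representative parameters M_1..M_C*).
    """
    coefs = [fit_line(d) for d in local_data]  # F_i of each L_i
    centers = [tuple(c) for c in centers]      # m_1..m_C* from S1001-S1003
    labels: List[int] = []
    for _ in range(max_iter):
        # S1004: assign each L_i to the cluster of its nearest m_j.
        labels = [min(range(len(centers)), key=lambda j: dist(f, centers[j]))
                  for f in coefs]
        # S1005: refit each m_j on the union of its members (keep it if too few samples).
        new_centers = []
        for j, old in enumerate(centers):
            pooled = [p for d, lab in zip(local_data, labels) if lab == j for p in d]
            new_centers.append(fit_line(pooled) if len(pooled) >= 2 else old)
        # S1006: stop when no m_j has changed.
        if all(dist(a, b) <= tol for a, b in zip(new_centers, centers)):
            return labels, new_centers
        centers = new_centers
    return labels, centers
```

Starting from m_1 = (0, 1) and m_2 = (5, -2), two series with slope about 1 and two with slope about -2 settle into two clusters on the second pass, when the refitted m_j stop changing.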

(Second and subsequent S1004)
The case where the process returns from the first S1006 to S1004 is described. FIG. 17 is a conceptual diagram of the second pass (the third and later passes are the same). The second pass differs from the first S1004 only in that m_1 to m_C* are now the new m_1 to m_C* obtained in S1005. That is, in the second and subsequent S1004, the new m_1 to m_C* obtained in the immediately preceding S1005 are used to re-cluster L_1 to L_N.

Since FIG. 10 shows processing for each k, when it is executed in the order k = 2, k = 3, ..., S_i (i = 1 to N) is processed for k = 2 and then S_i (i = 1 to N) is processed for k = 3, as shown in FIG. 18. Thus, as shown in FIG. 18, local clusters D_1 to D_C* are determined for k = 2, and local clusters D_1 to D_C* are determined for k = 3. Accordingly, executing the processing for k = 1 to K determines local clusters D_1 to D_C* for each k, as shown in FIG. 19, and a representative local parameter is determined for each of these local clusters. This is illustrated in FIG. 7, where clusters 703, 704, and so on are local clusters, and regression equations 705 and 706 are the representative local parameters of the respective local clusters. The local clusters and representative local parameters for the different values of k in FIG. 19 would each be displayed in FIG. 7, but FIG. 7 does not depict the distinction between values of k.

(Outlier detection unit 106)
The outlier detection unit 106 determines, with respect to the local clusters and representative local parameters shown in FIG. 7, whether separately provided segment data is an outlier. Specifically, the outlier detection unit 106 takes evaluation target data provided separately for evaluation: data for a predetermined period consisting of a set of time-series data of the same types as the different types of time-series data from which the local time-series data were generated. Based on the representative local parameters determined by the local time-series data clustering unit 104, it detects whether a value defined as a distance between this evaluation target data and the representative local parameters exceeds a threshold with respect to any of the representative local parameters. When the threshold is exceeded, the outlier detection unit 106 determines that the evaluation target data (the abnormality determination target data 708 judged abnormal in FIG. 7) is an outlier.
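One possible reading of this check is sketched below: fit a regression to the evaluation target data and flag it when its coefficient vector is farther than a threshold from every representative local parameter. The nearest-representative rule, the Euclidean metric on (intercept, slope), the single explanatory variable, and the name `is_outlier` are assumptions for illustration; the text only states that a distance-defined value is compared against a threshold.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) sample of the evaluation target data
Coef = Tuple[float, float]   # (intercept, slope); one explanatory variable assumed


def fit_line(points: Sequence[Point]) -> Coef:
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return (my - b * mx, b)


def is_outlier(eval_points: Sequence[Point], representatives: Sequence[Coef],
               threshold: float) -> bool:
    """Flag the evaluation data if even the nearest representative M_j is too far."""
    f = fit_line(eval_points)
    nearest = min(math.dist(f, m) for m in representatives)
    return nearest > threshold
```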

In the above description of local clustering (FIG. 10), the variable of interest was y and the data were segmented by data value; however, segmenting by data value is not essential. The case without data-value segmentation (local time-series data in the broad sense) corresponds, for example, to the case of k = 1 only in FIG. 13, FIG. 19, and so on.

Note that S1001, S1002, and S1003 show one example of a method (initial cluster division rule) for setting the initial clusters of local time-series data clustering. This initial cluster selection may be replaced with a known clustering initialization method, such as random selection.

The case of random selection is described with reference to the flowcharts of FIG. 20 and FIG. 10.
The initial clusters are written D_j^(0) (j = 1 to C*).
In FIG. 10, k = 2.
For simplicity, assume that there are 10 local time-series data L_i
and that the preset number of local clusters C* is 3.
With random selection, the local time-series data clustering unit 104
divides the local time-series data L_1 to L_10 into initial clusters, for example as follows (S01, S02):
D_1^(0) = L_1 to L_3,
D_2^(0) = L_4 to L_6,
D_3^(0) = L_7 to L_10.
This is the state after the first S1004 in FIG. 10 has completed.
Next, in the first S1005, the regression equations of D_1^(0) to D_3^(0) are obtained and taken as m_1^(1) to m_3^(1) (the first regression equations, which are representative information) (S02).
Next, in the second S1004, for each L_i, the m_j^(1) whose distance to L_i is smallest according to the definition of S1004 (the predetermined distance definition) is identified (S03).
Then, in the second S1005, first clusters D_1^(1) to D_3^(1) are generated in correspondence with m_1^(1) to m_3^(1); each first cluster consists of the local time-series data whose time-series data regression equations (F(x) belonging to S(L_ki)) share the same identified m_j^(1) (first regression equation). For example:
D_1^(1) = L_2 to L_4,
D_2^(1) = L_5 to L_7,
D_3^(1) = L_8 to L_10, L_1.
Then, regression analysis is performed on the first clusters D_1^(1) to D_3^(1) to generate m_1^(2) to m_3^(2) (the second regression equations, which are representative information), one per first cluster (S04).
In S1006, it is determined whether the newly generated m_1^(2) to m_3^(2) have changed from the previously generated m_1^(1) to m_3^(1). If not, the process ends; if so, the process proceeds to the third S1004.
In the third S1004, for each L_i, the m_j^(2) among m_1^(2) to m_3^(2) whose distance to L_i is smallest according to the definition of S1004 is identified (S05).
Then, in the third S1005, second clusters D_1^(2) to D_3^(2) are generated in correspondence with m_1^(2) to m_3^(2); each second cluster consists of the local time-series data whose time-series data regression equations (F(x) belonging to S(L_ki)) share the same identified m_j^(2) (second regression equation). For example:
D_1^(2) = L_3 to L_5,
D_2^(2) = L_6 to L_8,
D_3^(2) = L_9 to L_10, L_1 to L_2.
Then, regression analysis is performed on the second clusters D_1^(2) to D_3^(2) to generate m_1^(3) to m_3^(3) (the third regression equations, which are representative information), one per second cluster (S06).
In S1006, it is determined whether the newly generated m_1^(3) to m_3^(3) have changed from the previously generated m_1^(2) to m_3^(2). If not, the process ends;
if so, the process proceeds to the fourth S1004.
After the fourth S1004 (S07), the current m_1^(4) to m_3^(4) are generated in the fourth S1005 in the same manner as above (S08).
In S1006, the process ends if m_1^(4) to m_3^(4) have not changed from the previously generated m_1^(3) to m_3^(3); assume here that they have not changed, so the process ends.
In this case, the third clusters D_1^(3) to D_3^(3) at the end of the process are the local clusters,
and m_1^(4) to m_3^(4) are the representative local parameters (local cluster representative information) representing the respective local clusters.
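Random initialization only changes how D_j^(0) is formed; the S1004 to S1006 loop is unchanged. A minimal sketch, assuming a shuffle followed by round-robin dealing so that no initial cluster is empty (the text does not say how empty clusters are avoided); `random_initial_partition` and its `seed` parameter are hypothetical.

```python
import random
from typing import List


def random_initial_partition(n: int, c_star: int, seed: int = 0) -> List[int]:
    """Assign each of L_1..L_n to one of C* initial clusters D_j^(0) at random.

    Shuffling the indices and dealing them round-robin guarantees that every
    cluster receives at least one member (requires n >= c_star).
    """
    if not 1 <= c_star <= n:
        raise ValueError("need 1 <= C* <= N")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    labels = [0] * n
    for pos, i in enumerate(order):
        labels[i] = pos % c_star  # 0-based cluster index j
    return labels
```

The returned labels play the role of D_j^(0); the first S1005 then regresses on each cluster to obtain m_j^(1).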

In FIG. 10, S1001, S1002, S1003, S1004, and S1005 constitute the regression equation generation process, which generates regression equations. S1006 is the determination process: each time a new regression equation is generated, it determines whether the equation has changed from the previously generated one. If it has, the regression equation generation process continues with the next new regression equation; if it has not, the regression equation generation process ends without generating a further one.

The definition of dist_lm(S(L_i), {m_1, ..., m_C}) in S1002 is also only one example. This distance uses the sum-of-squared-error criterion known in the clustering field, but the minimum variance, scatter, trace, determinant, or invariant criteria, among others, may be used instead (reference: Richard O. Duda et al., Japanese translation supervised by Morio Onoe, Pattern Classification, pp. 543-548, New Technology Communications Co., Ltd.).

(S905: Operation of the global time-series data model estimation unit 105)
S905 is the global data model estimation executed by the global time-series data model estimation unit 105. In S905, a global representative time-series data model is estimated by connecting the models estimated by the local time-series data clustering unit 104.
FIG. 21 is a flowchart showing the details of the processing of S905.

In S1101, an initial set G of global time-series data candidates is created, and from S1102 onward the elements of this set are merged to obtain the final global data estimation model. S1101 takes, in turn, the clusters D_i obtained by local time-series data clustering and creates the initial set G of global time-series data candidates. The initial set G is the set of local structures S(D_i) of the clusters D_i obtained in S904,
G = {S(D_1), S(D_2), ..., S(D_N)}.
There are C_l local time-series data clusters for each interval l of y,
so there are N = Σ_l C_l in total. Below, so that it can also represent the local structure after clusters are merged, S(D_i) is defined as the five-tuple of
the interval of the objective variable (y_is, y_ie],
the set of cluster representative regression equations {F_ik(x)},
the residual sums of squares {E_ik},
the local time-series data L_i, and
the time sections {T_il} over which L_i is defined:
S(D_i) = ((y_is, y_ie], {F_ik(x)}, {E_ik}, L_i, {T_il}).
Here, the interval (y_is, y_ie] of the objective variable
denotes y_is < y ≤ y_ie.

In S1102, it is determined whether to end the connection process of the global time-series estimation. The connection process is executed when the objective-variable sections of two clusters are adjacent and the clusters contain a combination in which the time sections of the local time-series data 301 (the cluster elements) and the representative regression functions are close.

For example, it is determined, for all pairs of elements D_i, D_j of the set G, whether the condition
Dist(S(D_i), S(D_j)) < δ
is satisfied. If the condition is satisfied, the process proceeds to S1105; if not, it proceeds to S1103.
Here, Dist(S(D_i), S(D_j)) is defined, for example, as follows:

$$\mathrm{Dist}\bigl(S(D_i), S(D_j)\bigr) = \min_{k,\,l} \bigl\| F_{ik} - F_{jl} \bigr\| \qquad (\text{defined only when } y_{ie} = y_{js})$$

where ||x|| is the Euclidean distance. Since S(D_i) may contain multiple local structures, there may be multiple representative regression equations F_ik; Dist between S(D_i) and S(D_j) is therefore defined as the minimum over all pairs (ik, jl).
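A direct transcription of this distance could look like the sketch below. It assumes that each representative regression equation F_ik is stored as a coefficient tuple and that non-adjacent candidates get an infinite distance, so they are never merged; `global_dist` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Coef = Tuple[float, ...]


def global_dist(y_i: Tuple[float, float], coefs_i: Sequence[Coef],
                y_j: Tuple[float, float], coefs_j: Sequence[Coef]) -> float:
    """Dist(S(D_i), S(D_j)): minimum Euclidean distance over all (F_ik, F_jl)
    pairs of coefficient vectors; infinite unless y_ie == y_js."""
    if y_i[1] != y_j[0]:  # (y_is, y_ie] and (y_js, y_je] are not adjacent
        return math.inf
    return min(math.dist(f, g) for f in coefs_i for g in coefs_j)
```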

In S1103, the merge step of the global time-series estimation process is executed. For example, among all pairs Di, Dj in the set G, the pair that minimizes Dist(S(Di), S(Dj)) is found.

In S1104, the candidate set G of global time-series data is updated: S(Di) and S(Dj) are removed from G, and S(Di + Dj) is added. S(Di + Dj) is defined, for example, as follows.

Because Dist(S(Di), S(Dj)) is defined only when $y_{ie} = y_{js}$, the y-interval after merging is the single contiguous interval $(y_{is}, y_{je}]$.
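The S1102 to S1104 loop can be sketched as a greedy agglomerative merge. This is a minimal sketch under stated assumptions, not the patent's method: the reduced record `Candidate` and the helpers `dist`, `merge` and `connect` are hypothetical; the merged structure simply keeps the union of representative regressions because its exact definition is not reproduced here; and the stopping rule (keep merging while the closest pair satisfies Dist < δ, otherwise go on to S1105) is one assumed reading of S1102.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coeffs = Tuple[float, float]   # hypothetical (slope, intercept) of one F_ik


@dataclass
class Candidate:
    # Reduced form of S(D_i): only the y-interval and the regressions are kept.
    y_start: float
    y_end: float
    regressions: List[Coeffs]


def dist(a: Candidate, b: Candidate) -> float:
    # Assumed Dist: min coefficient distance, finite only when y_ie == y_js.
    if a.y_end != b.y_start:
        return math.inf
    return min(math.dist(f, g) for f in a.regressions for g in b.regressions)


def merge(a: Candidate, b: Candidate) -> Candidate:
    # S(D_i + D_j) covers the contiguous interval (y_is, y_je].
    # Keeping the union of regressions is an assumption.
    return Candidate(a.y_start, b.y_end, a.regressions + b.regressions)


def connect(G: List[Candidate], delta: float) -> List[Candidate]:
    G = list(G)
    while len(G) > 1:
        # S1103: the ordered pair (D_i, D_j) that minimises Dist.
        best, i, j = min((dist(a, b), i, j) for i, a in enumerate(G)
                         for j, b in enumerate(G) if i != j)
        if best >= delta:          # assumed stopping rule -> S1105
            break
        merged = merge(G[i], G[j])
        # S1104: remove S(D_i) and S(D_j), add S(D_i + D_j).
        G = [c for k, c in enumerate(G) if k not in (i, j)] + [merged]
    return G
```

With three candidates on $(0, 1]$, $(1, 2]$ and $(2, 3]$ whose slopes are 1.0, 1.1 and 5.0, and δ = 0.5, only the first two are connected, leaving the intervals $(0, 2]$ and $(2, 3]$.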

In S1105, piecewise regression analysis is performed on every Di in the set G. The number of pieces may be chosen freely, or it may be set to the number of clusters, as of the initialization of G, that make up cluster Di (that is, the number of representative regression equations contained in S(Di)). The piecewise regression equations obtained in S1105 constitute the estimated global time-series data model.
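The text does not say how the piecewise regression of S1105 is computed. One standard technique, shown here as a substitute rather than as the patent's procedure, is segmented least squares: sort the points by x and use dynamic programming to place breakpoints so that the total residual sum of squares over a fixed number of line segments is minimal. A single explanatory variable x and the names `fit_line` and `piecewise_regression` are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(pts: List[Point]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a*x + b; returns (a, b, residual sum of squares)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    return a, b, sum((y - (a * x + b)) ** 2 for x, y in pts)


def piecewise_regression(pts: List[Point], n_segments: int, min_size: int = 2):
    """Best n_segments-piece linear fit; returns [(x_first, x_last, a, b), ...]."""
    pts = sorted(pts)
    n = len(pts)
    if n < n_segments * min_size:
        raise ValueError("not enough points for the requested number of segments")
    INF = float("inf")
    # best[k][j]: minimal total RSS when the first j points form k segments.
    best = [[INF] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k * min_size, n + 1):
            for i in range((k - 1) * min_size, j - min_size + 1):
                if best[k - 1][i] == INF:
                    continue
                cost = best[k - 1][i] + fit_line(pts[i:j])[2]
                if cost < best[k][j]:
                    best[k][j], back[k][j] = cost, i
    # Walk back through the stored breakpoints to recover each segment's line.
    segments, j = [], n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        a, b, _ = fit_line(pts[i:j])
        segments.append((pts[i][0], pts[j - 1][0], a, b))
        j = i
    return list(reversed(segments))
```

Choosing `n_segments` equal to the number of representative regression equations in S(Di) corresponds to the second option mentioned above. The triple loop is cubic in the number of points, which is acceptable for a sketch but would need prefix sums for large data.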

(S906: Operation of Outlier Detection Unit 106)
S906 is the outlier detection process executed by the outlier detection unit 106. For separately supplied segment data, any segment whose outlier degree relative to the set of representative local time-series data models obtained by the global time-series data model estimation unit 105 is large is detected as an anomaly.
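A minimal sketch of S906 follows, assuming that the "outlier degree" is the distance from the segment's own fitted line to the nearest representative model, in the spirit of the distance-based outlier detection in the claims. The coefficient-space distance, the threshold value and the function names are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(pts: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    # Least-squares (slope, intercept) of the segment under evaluation.
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def outlier_score(segment, representatives: List[Tuple[float, float]]) -> float:
    # Assumed outlier degree: distance to the nearest representative model.
    coeffs = fit_line(segment)
    return min(math.dist(coeffs, r) for r in representatives)


def is_anomalous(segment, representatives, threshold: float) -> bool:
    # A segment far from every representative model is flagged as an anomaly.
    return outlier_score(segment, representatives) > threshold
```

With representatives (1.0, 0.0) and (2.0, 0.0) and a threshold of 0.5, a segment lying on y = x scores 0 and is normal, while a segment on y = 5x is 3.0 away from the nearest representative and is flagged.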

As described above, in the plant abnormality detection device 100 of Embodiment 1, the processing of S904 clusters the set of local time-series data segmented by time and by sensor-signal value, so infrequent local time-series data have little influence on the representative local parameters. Consequently, even if the collected data contain variation caused by equipment deterioration, or data recorded just before an abnormality, anomaly-detection accuracy does not degrade as long as such data are infrequent.
Further, connecting the models estimated in S904 by the local time-series data clustering unit 104 into a global representative time-series data model yields a global relational expression between the sensor signals. Showing the user that an abnormality was judged from the deviation from this global relationship graph makes the basis of the judgment easy to understand.
As shown in FIG. 8, the processing that obtains this global relational expression connects locally linear relationships, so a global relational expression can be obtained even when the relationship between the signals is nonlinear and not known in advance.

(1) The plant abnormality detection apparatus of this embodiment clusters the trajectory segments obtained by dividing the data into time segments, and thereby extracts trajectory segments that are representative in the sense of occurring frequently. Reducing the influence of rarely occurring trajectory segments improves anomaly-detection accuracy.
(2) The global time-series data model estimation unit 105 connects these representative trajectory segments to generate a global representative trajectory. A global relationship graph between the sensor signals can thus be obtained, and the user can be shown that an abnormality was determined from the deviation from this graph, which makes the basis of the judgment easy to understand.

Embodiment 2.
Embodiment 2 is described with reference to FIGS. 22 and 23. It describes the hardware configuration of the plant abnormality detection device 100, which is a computer. FIG. 22 shows an example of the external appearance of the plant abnormality detection device 100. FIG. 23 shows an example of the hardware resources of the plant abnormality detection device 100.

As shown in FIG. 22, the plant abnormality detection device 100 includes hardware resources such as a system unit 830, a display device 813 with a CRT (Cathode Ray Tube) or LCD (liquid crystal display) screen, a keyboard 814 (K/B), a mouse 815, and a compact disc drive 818 (CDD), which are connected by cables or signal lines. The system unit 830 is connected to a network.

As shown in FIG. 23, the plant abnormality detection device 100 includes a CPU 810 (Central Processing Unit) that executes programs. The CPU 810 is connected via a bus 825 to a ROM (Read Only Memory) 811, a RAM (Random Access Memory) 812, the display device 813, the keyboard 814, the mouse 815, a communication board 816, the CDD 818, and a magnetic disk device 820, and controls these hardware devices. A storage device such as an optical disc drive or flash memory may be used instead of the magnetic disk device 820.

The RAM 812 is an example of volatile memory. Storage media such as the ROM 811, the CDD 818, and the magnetic disk device 820 are examples of nonvolatile memory. These are examples of a "storage device", storage unit, or buffer. The communication board 816, the keyboard 814, and the like are examples of an input unit or input device, and the communication board 816, the display device 813, and the like are examples of an output unit or output device. The communication board 816 is connected to the network.

The magnetic disk device 820 stores an operating system (OS) 821, a window system 822, a program group 823, and a file group 824. The programs in the program group 823 are executed by the CPU 810, the operating system 821, and the window system 822.

The OS 821 and the program group 823 store programs that execute the functions described as "... unit" in the above embodiments. These programs are read and executed by the CPU 810.

In the above embodiments, the information, data, signal values, variable values, parameters, and the like described as "result of determining ...", "result of calculating ...", "result of extracting ...", "result of generating ...", and "result of processing ..." are stored in the file group 824 as items of "... files" and "... databases". The "... files" and "... databases" (for example, the plant time-series database 101) are stored on recording media such as disks and memories. Information, data, signal values, variable values, and parameters stored on such media are read into main memory or cache memory by the CPU 810 via a read/write circuit and used for CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, output, printing, and display. During these operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, cache memory, or buffer memory.

In the above embodiments, data and signal values are recorded on recording media such as the RAM 812, the compact disc in the CDD 818, the magnetic disk in the magnetic disk device 820, and other optical discs, MiniDiscs, and DVDs (Digital Versatile Discs). Data and signals are transmitted online over the bus 825, signal lines, cables, and other transmission media.

In the above embodiments, anything described as a "... unit" may instead be a "... means", "... step", "... procedure", or "... process". That is, anything described as a "... unit" may be implemented by software alone, by a combination of software and hardware, or further in combination with firmware. Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical discs, compact discs, MiniDiscs, and DVDs. The programs are read and executed by the CPU 810. In other words, a program causes a computer to function as the "... units" described above, or causes a computer to execute the procedures and methods of those "... units".

Although the above embodiments describe the plant abnormality detection device 100, it follows from the above description that its operation can also be regarded as a program that causes a computer to execute that operation. The operation of the plant abnormality detection device 100 can likewise be regarded as a detection method performed by its units.

The above embodiments describe a plant abnormality detection apparatus comprising:
a local time-series data extraction unit that takes as input data a set of multiple time-series data obtained by sequential observation over time, and extracts time-segmented time-series data by dividing the input data in time according to how the input data change over time;
a local time-series data model estimation unit that estimates a model of each segment by multivariate analysis or a time-series analysis method;
a local time-series data clustering unit that divides the set of local time-series data models estimated above into clusters and estimates, for each cluster, representative local parameters representing that cluster; and
an outlier detection unit that, for separately supplied segment data, detects as an anomaly any segment with a large outlier degree relative to the set of representative local time-series data models obtained above.
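The clustering unit above, as detailed in claim 1, behaves like k-means with regression lines as representatives: fit a regression to the union of each cluster's data, reassign every local segment to the representative closest to its own regression equation, and stop once the representatives no longer change. The sketch below makes several assumptions that are not in the text: a single explanatory variable, coefficient-space Euclidean distance as the re-clustering rule, round-robin initial assignment as the initial cluster division rule, and the hypothetical names `fit_line` and `regression_clustering`.

```python
import math
from typing import List, Sequence, Tuple

Segment = List[Tuple[float, float]]   # one piece of local time-series data as (x, y)


def fit_line(pts: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    # Least-squares (slope, intercept).
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def regression_clustering(segments: List[Segment], k: int, max_iter: int = 100):
    # Local time-series data regression equation of every segment.
    own = [fit_line(s) for s in segments]
    # Assumed initial cluster division rule: round-robin assignment.
    groups = [[i for i in range(len(segments)) if i % k == c] for c in range(k)]
    reps = [fit_line([p for i in g for p in segments[i]]) for g in groups if g]
    for _ in range(max_iter):
        # Re-clustering rule: join the representative nearest to the segment's own line.
        nearest = [min(range(len(reps)), key=lambda r: math.dist(own[i], reps[r]))
                   for i in range(len(segments))]
        groups = [[i for i in range(len(segments)) if nearest[i] == r]
                  for r in range(len(reps))]
        groups = [g for g in groups if g]   # only representatives that were chosen survive
        # Regenerate representatives by regression on each cluster's union.
        new_reps = [fit_line([p for i in g for p in segments[i]]) for g in groups]
        if new_reps == reps:                # determination process: no change, stop
            break
        reps = new_reps
    return reps, groups
```

For six segments on x = 0, 1, 2, three lying on y = x and three on y = 3x, two clusters are recovered with representative slopes 1 and 3 after two iterations. As with k-means, the result depends on the initial division, and coefficient distances are sensitive to the relative scale of slope and intercept.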

The above embodiments also describe a plant abnormality detection apparatus whose local time-series data extraction unit further divides the time-segmented time-series data by ranges of the time-series data values and extracts the resulting time-series data.
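The two-stage segmentation (first by time, then by the value range of one designated signal) can be sketched as follows. This is an illustration under simplifying assumptions: time segments are fixed-length windows counted in samples, whereas the text segments according to how the data change over time; the value-range edges are supplied explicitly; and `extract_local_data` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def extract_local_data(times: Sequence[float], signals: Sequence[Sequence[float]],
                       window: int, y_edges: Sequence[float], y_index: int = 0
                       ) -> Dict[Tuple[int, int], List[Tuple[float, ...]]]:
    """Group samples by (time-window number, value-range number).

    Each output sample is (t, s_1, ..., s_m). Samples whose designated value
    falls outside every range are dropped.
    """
    out: Dict[Tuple[int, int], List[Tuple[float, ...]]] = {}
    for n, (t, row) in enumerate(zip(times, signals)):
        w = n // window                  # time segment (fixed length, assumed)
        y = row[y_index]                 # value of the designated signal
        for r in range(len(y_edges) - 1):
            # Half-open range (y_edges[r], y_edges[r + 1]], matching y_is < y <= y_ie.
            if y_edges[r] < y <= y_edges[r + 1]:
                out.setdefault((w, r), []).append((t, *row))
                break
    return out
```

For instance, four samples with window size 2 and range edges [0, 1, 2] yield four (window, range) groups of one sample each when the designated values alternate between the two ranges.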

The above embodiments also describe a plant abnormality detection apparatus comprising a global time-series data model estimation unit that generates candidates for global representative time-series data by connecting the estimated models of the representative local time-series data.

Reference Signs List: 100 plant abnormality detection device, 101 plant time-series database, 102 local time-series data extraction unit, 103 local time-series data model estimation unit, 104 local time-series data clustering unit, 105 global time-series data model estimation unit, 106 outlier detection unit, 301 local time-series data, 701 cluster, 702 cluster, 703 cluster, 704 cluster, 705 regression equation (representative local parameter), 706 regression equation (representative local parameter), 707, 708 abnormality determination target data, 901, 902 corresponding ranges of local time-series data.

Claims

A clustering apparatus comprising:
a local time-series data extraction unit that, for each of N different time ranges (N is an integer of 2 or more) from a first time range to an N-th time range, extracts the time-series data belonging to that time range from a plurality of different types of time-series data, thereby generating N pieces of local time-series data, each consisting of a plurality of sets of time-series data within its time range;
a local time-series data clustering unit that divides the N pieces of local time-series data extracted by the local time-series data extraction unit into a preset number of initial clusters according to an initial cluster division rule set in advance; generates, for each initial cluster, representative information indicating the characteristics of that initial cluster; re-clusters the N pieces of local time-series data by distributing them among the generated pieces of representative information according to a re-clustering rule set in advance; regenerates representative information for each re-clustered cluster; re-clusters the N pieces of local time-series data extracted by the local time-series data extraction unit for each piece of regenerated representative information; and similarly repeats the re-clustering of the N pieces of local time-series data and the regeneration of the representative information, determining, each time the representative information is regenerated, whether the newly generated representative information has changed from the representative information generated immediately before, continuing with the next regeneration when there is a change, and ending the processing without further re-clustering the N pieces of local time-series data or regenerating the representative information when there is no change; and
a local time-series data regression equation generation unit that generates, for each of the N pieces of local time-series data generated by the local time-series data extraction unit, a corresponding local time-series data regression equation,
wherein
the local time-series data clustering unit
executes a regression equation generation process that performs regression analysis on the union of the local time-series data belonging to each divided initial cluster to generate a first regression equation as representative information for each initial cluster; identifies, for each of the local time-series data regression equations of the N pieces of local time-series data, the first regression equation, among those generated for the initial clusters, whose distance calculated according to a predetermined distance definition equation serving as the re-clustering rule is the shortest; re-clusters by generating, for each distinct first regression equation, a first cluster consisting of the local time-series data underlying the regression equations for which that same first regression equation was identified; and performs regression analysis on the union of the local time-series data belonging to each first cluster to generate a second regression equation as representative information for each first cluster;
similarly executes the regression equation generation process that identifies, for each of the local time-series data regression equations of the N pieces of local time-series data, the (p+1)-th regression equation, among those generated as representative information for each p-th cluster (p is an integer of 1 or more), whose distance calculated according to the predetermined distance definition equation is the shortest; re-clusters by generating, for each distinct (p+1)-th regression equation, a (p+1)-th cluster consisting of the local time-series data underlying the regression equations for which that same (p+1)-th regression equation was identified; and performs regression analysis on the union of the local time-series data belonging to each (p+1)-th cluster to generate a (p+2)-th regression equation as representative information for each (p+1)-th cluster; and
executes a determination process that, each time a new (p+1)-th regression equation is generated, determines whether it has changed from the previously generated p-th regression equation, continues with the regression equation generation process for the next new (p+2)-th regression equation when there is a change, and ends the regression equation generation process without generating the next (p+2)-th regression equation when there is no change;
the local time-series data extraction unit
divides the range of data values of designated time-series data, specified in advance among the plurality of different types of time-series data, into K different data ranges (K is an integer of 2 or more) from a first data range to a K-th data range, and generates N pieces of local time-series data for each of the K divided data ranges using a predetermined local time-series data generation rule;
the local time-series data regression equation generation unit
generates a local time-series data regression equation for each of the K × N pieces of local time-series data generated by the local time-series data extraction unit over the K divided data ranges; and
the local time-series data clustering unit
executes the regression equation generation process and the determination process for each of the K data ranges divided by the local time-series data extraction unit, using the local time-series data generated by the local time-series data extraction unit for that data range and the local time-series data regression equations generated by the local time-series data regression equation generation unit for the local time-series data of the same data range.

The clustering apparatus according to claim 1, wherein the local time-series data clustering unit,
when it determines in the determination process that the regression equation generation process is to end, determines the clusters from which the last-generated regression equations were derived as local clusters, and determines the regression equation corresponding to each determined local cluster as local cluster representative information representing that cluster.

The clustering apparatus according to claim 1 or claim 2, further comprising:
an outlier detection unit that, for evaluation target data separately given as an evaluation target and consisting of a plurality of sets of time-series data of the same types as the plurality of different types of time-series data from which the local time-series data were generated, detects, on the basis of the local cluster representative information determined by the local time-series data clustering unit, whether the value defined as the distance between the evaluation target data and every piece of local cluster representative information exceeds a threshold.

A clustering program that executes:
local time-series data extraction processing that, for each of N different time ranges (N is an integer of 2 or more) from a first time range to an N-th time range, extracts the time-series data belonging to that time range from a plurality of different types of time-series data, thereby generating N pieces of local time-series data, each consisting of a plurality of sets of time-series data within its time range;
local time-series data clustering processing that divides the N pieces of local time-series data extracted by the local time-series data extraction processing into a preset number of initial clusters according to an initial cluster division rule set in advance; generates, for each initial cluster, representative information indicating the characteristics of that initial cluster; re-clusters the N pieces of local time-series data by distributing them among the generated pieces of representative information according to a re-clustering rule set in advance; regenerates representative information for each re-clustered cluster; re-clusters the N pieces of local time-series data extracted by the local time-series data extraction processing for each piece of regenerated representative information; and similarly repeats the re-clustering of the N pieces of local time-series data and the regeneration of the representative information, determining, each time the representative information is regenerated, whether the newly generated representative information has changed from the representative information generated immediately before, continuing with the next regeneration when there is a change, and ending the processing without further re-clustering the N pieces of local time-series data or regenerating the representative information when there is no change; and
local time-series data regression equation generation processing that generates, for each of the N pieces of local time-series data generated by the local time-series data extraction processing, a corresponding local time-series data regression equation,
wherein
the local time-series data clustering processing
executes a regression equation generation process that performs regression analysis on the union of the local time-series data belonging to each divided initial cluster to generate a first regression equation as representative information for each initial cluster; identifies, for each of the local time-series data regression equations of the N pieces of local time-series data, the first regression equation, among those generated for the initial clusters, whose distance calculated according to a predetermined distance definition equation serving as the re-clustering rule is the shortest; re-clusters by generating, for each distinct first regression equation, a first cluster consisting of the local time-series data underlying the regression equations for which that same first regression equation was identified; and performs regression analysis on the union of the local time-series data belonging to each first cluster to generate a second regression equation as representative information for each first cluster;
similarly executes the regression equation generation process that identifies, for each of the local time-series data regression equations of the N pieces of local time-series data, the (p+1)-th regression equation, among those generated as representative information for each p-th cluster (p is an integer of 1 or more), whose distance calculated according to the predetermined distance definition equation is the shortest; re-clusters by generating, for each distinct (p+1)-th regression equation, a (p+1)-th cluster consisting of the local time-series data underlying the regression equations for which that same (p+1)-th regression equation was identified; and performs regression analysis on the union of the local time-series data belonging to each (p+1)-th cluster to generate a (p+2)-th regression equation as representative information for each (p+1)-th cluster; and
executes a determination process that, each time a new (p+1)-th regression equation is generated, determines whether it has changed from the previously generated p-th regression equation, continues with the regression equation generation process for the next new (p+2)-th regression equation when there is a change, and ends the regression equation generation process without generating the next (p+2)-th regression equation when there is no change;
the local time-series data extraction processing
divides the range of data values of designated time-series data, specified in advance among the plurality of different types of time-series data, into K different data ranges (K is an integer of 2 or more) from a first data range to a K-th data range, and generates N pieces of local time-series data for each of the K divided data ranges using a predetermined local time-series data generation rule;
the local time-series data regression equation generation processing
generates a local time-series data regression equation for each of the K × N pieces of local time-series data generated by the local time-series data extraction processing over the K divided data ranges; and
the local time-series data clustering processing
executes the regression equation generation process and the determination process for each of the K data ranges divided by the local time-series data extraction processing, using the local time-series data generated by the local time-series data extraction processing for that data range and the local time-series data regression equations generated by the local time-series data regression equation generation processing for the local time-series data of the same data range.