JP2007164346A

JP2007164346A - Decision tree changing method, abnormality determination method, and program

Info

Publication number: JP2007164346A
Application number: JP2005357725A
Authority: JP
Inventors: Kazuto Kubota (久保田和人); Toshiaki Hatano (波田野寿昭); Tsuneo Watanabe (渡辺経夫)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-12-12
Filing date: 2005-12-12
Publication date: 2007-06-28

Abstract

PROBLEM TO BE SOLVED: To provide a decision tree changing method that improves the accuracy of abnormality determination.

SOLUTION: In one configuration, the decision tree changing method includes inputting each data item of multidimensional time-series data, which is a set of data items each having n-dimensional attribute values and a one-dimensional class, to a decision tree that was generated from the data and predicts the class from one or more attributes; generating, for each leaf of the decision tree, frequency distribution information showing the frequency distribution of classes among the data items classified into that leaf; selecting one or more classes based on the frequency distribution information; and updating the class of the corresponding leaf of the decision tree with the selected classes.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a decision tree changing method, an abnormality determination method, and a program, and for example to a technique for determining whether multidimensional time-series data is abnormal.

Some recent plant systems include a system that detects plant abnormalities by monitoring whether the values of sensors attached to the individual devices of the plant stay within an appropriate range. The appropriate range for each sensor value is set in advance, and an abnormality warning is issued when a value falls outside that range. As the number of sensors grows, automating the setting of appropriate ranges has become desirable.

An appropriate range is typically given by, for example, an upper limit and a lower limit, but this yields only mediocre accuracy. The appropriate range of a given sensor (hereinafter, the target sensor) can be determined more accurately by using one or more other sensors whose values move in association with it (hereinafter, explanatory sensors).

More specifically, a plurality of models that predict the value of the target sensor from the values of the explanatory sensors are first constructed. To determine whether a given data item is abnormal, the model to be used is selected from among these models, and the data item is input to it. Abnormality determination is then performed by comparing the predicted value obtained from the selected model with the actual value of the target sensor. If the two differ greatly, the actual value is likely to be abnormal.
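The comparison between the predicted and the actual target sensor value can be sketched as follows. This is only an illustration: the linear model, the function names, and the fixed `tolerance` are assumptions, since the text says only that the two values "differ greatly" and does not give a concrete criterion.

```python
def is_abnormal(predict, explanatory_value, actual_value, tolerance):
    """Flag the target reading when it is far from the model's prediction.

    `tolerance` is an assumed, user-chosen bound on the residual.
    """
    return abs(actual_value - predict(explanatory_value)) > tolerance


# Hypothetical model: pump pressure as a linear function of power output.
def model_b(power_output):
    return 0.5 * power_output + 1.0


print(is_abnormal(model_b, 100.0, 51.2, tolerance=2.0))  # False: residual is about 0.2
print(is_abnormal(model_b, 100.0, 60.0, tolerance=2.0))  # True: residual is 9.0
```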

However, even when abnormality determination is performed in this way, normal data may be judged abnormal if the model used for prediction is not selected appropriately.
JP 2004-29922 A

The present invention provides a decision tree changing method, an abnormality determination method, and a program that make it possible to improve the accuracy of abnormality determination.

A decision tree changing method according to one aspect of the present invention inputs each data item of multidimensional time-series data, which is a set of data items each having n-dimensional attribute values and a one-dimensional class, to a decision tree that was generated from that data and predicts the class from one or more of the attributes; generates, for each leaf of the decision tree, frequency distribution information representing the frequency distribution of classes among the data items classified into that leaf; selects one or more classes based on the frequency distribution information; and updates the class of the leaf corresponding to the frequency distribution information with the selected classes.

An abnormality determination method according to one aspect of the present invention clusters first multidimensional time-series data, which is a set of data items having n-dimensional attribute values; assigns a class to each cluster; creates, for each cluster, a model that predicts the value of one attribute from the other attributes; assigns to each data item in the first multidimensional time-series data the class of the cluster to which it belongs; generates, from the first multidimensional time-series data with classes assigned, a decision tree that predicts the class from the other attributes; and changes the generated decision tree by the decision tree changing method described above. A data item to be tested in second multidimensional time-series data, which is a set of data items having n-dimensional attribute values, is then input to the changed decision tree to predict a class; the model corresponding to the predicted class is selected; the values of the other attributes in the test data item are input to the selected model; and the abnormality of the test data item is determined from the value of the one attribute in the test data item and the output of the model.

An abnormality determination method according to one aspect of the present invention clusters first multidimensional time-series data, which is a set of data items having n-dimensional attribute values; assigns a class to each cluster; creates, for each cluster, a model that predicts the value of one attribute from the other attributes; assigns to each data item in the first multidimensional time-series data the class of the cluster to which it belongs; redetermines the class to be assigned to each data item based on the class assigned to that item and the classes assigned to the data items within a predetermined time range of it, and updates the item's class with the redetermined class; and generates, from the first multidimensional time-series data with the updated classes, a decision tree that predicts the class from the other attributes. A data item to be tested in second multidimensional time-series data, which is a set of data items having n-dimensional attribute values, is then input to the generated decision tree to predict a class; the model corresponding to the predicted class is selected; the values of the other attributes in the test data item are input to the selected model; and the abnormality of the test data item is determined from the value of the one attribute in the test data item and the output of the model.

An abnormality determination method according to one aspect of the present invention clusters first multidimensional time-series data, which is a set of data items having n-dimensional attribute values; creates, for each cluster, a model that predicts the value of one attribute from the other attributes; and determines, for each model, an allowable variation range of the other attributes. A model is then selected for which the other attributes of a data item to be tested in second multidimensional time-series data, which is a set of data items having n-dimensional attribute values, fall within the allowable variation range; the values of the other attributes in the test data item are input to the selected model; and the abnormality of the test data item is determined from the value of the one attribute in the test data item and the output of the model.
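Selecting models by allowable variation range can be pictured with a minimal sketch. It assumes a single explanatory attribute and a simple closed interval per model; the `RangedModel` type, the ranges, and the model functions are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RangedModel:
    name: str
    low: float    # lower end of the allowable range of the explanatory attribute
    high: float   # upper end of the allowable range
    predict: Callable[[float], float]


def select_models(models: List[RangedModel], x: float) -> List[RangedModel]:
    """Return every model whose allowable range contains the test value x."""
    return [m for m in models if m.low <= x <= m.high]


# Hypothetical models and ranges, for illustration only.
models = [
    RangedModel("A", 0.0, 100.0, lambda x: 1.0),
    RangedModel("B", 0.0, 50.0, lambda x: 0.5 * x + 1.0),
    RangedModel("C", 50.0, 100.0, lambda x: 0.2 * x + 16.0),
]
print([m.name for m in select_models(models, 30.0)])  # ['A', 'B']
```

More than one model can qualify for the same test value, which is why the selection returns a list rather than a single model.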

An abnormality determination method according to one aspect of the present invention clusters first multidimensional time-series data, which is a set of data items having n-dimensional attribute values; assigns a class to each cluster; creates, for each cluster, a model that predicts the value of one attribute from the other attributes; assigns to each data item in the first multidimensional time-series data the class of the cluster to which it belongs; generates, from the first multidimensional time-series data with classes assigned, a decision tree that predicts the class from the other attributes; and changes the generated decision tree by the decision tree changing method described above. A data item to be tested in second multidimensional time-series data, which is a set of data items having n-dimensional attribute values, is then input to the changed decision tree to predict a class. Separately, an allowable variation range of the other attributes is determined for each model, and the models for which the other attributes of the test data item fall within the allowable variation range are selected. The models common to the selected models and the models corresponding to the predicted class are then chosen; the values of the other attributes in the test data item are input to the chosen models; and the abnormality of the test data item is determined from the value of the one attribute in the test data item and the output of the models.

An abnormality determination method according to one aspect of the present invention clusters first multidimensional time-series data, which is a set of data items having n-dimensional attribute values; assigns a class to each cluster; creates, for each cluster, a model that predicts the value of one attribute from the other attributes; assigns to each data item in the first multidimensional time-series data the class of the cluster to which it belongs; redetermines the class to be assigned to each data item based on the class assigned to that item and the classes assigned to the data items within a predetermined time range of it, and updates the item's class with the redetermined class; and generates, from the first multidimensional time-series data with the updated classes, a decision tree that predicts the class from the other attributes. A data item to be tested in second multidimensional time-series data, which is a set of data items having n-dimensional attribute values, is then input to the generated decision tree to predict a class. Separately, an allowable variation range of the other attributes is determined for each model, and the models for which the other attributes of the test data item fall within the allowable variation range are selected. The models common to the selected models and the models corresponding to the predicted class are then chosen; the values of the other attributes in the test data item are input to the chosen models; and the abnormality of the test data item is determined from the value of the one attribute in the test data item and the output of the models.

A program according to one aspect of the present invention causes a computer to execute each step of the decision tree changing method described above.

A program according to one aspect of the present invention causes a computer to execute each step of the abnormality determination method described above.

According to the present invention, the model used when performing abnormality determination with a plurality of models can be chosen flexibly, and the accuracy of abnormality determination can be improved.

As a prior application by the present applicant, there is Japanese Patent Application No. 2005-176700, which was unpublished at the time of filing of the present application. The circumstances that led the inventors to the present invention are described below with reference to this prior application.

FIG. 2 shows an example, for a group of sensors in a power plant, in which the target sensor is pump pressure, the explanatory sensor is power generation output, and models that predict the target sensor value from the explanatory sensor value have been generated. In the figure, the "general management value" corresponds to the upper and lower limits described in the background art section. Such prediction models can be created from time-series sensor data collected in the past (training data); one specific example of how to create them is described in the prior application.

For abnormality detection, it is desirable to be able to create a model in which the target sensor value is uniquely determined once the explanatory sensor values are determined. In general, however, a plant may have multiple operation modes, so the value is not necessarily unique. FIG. 2 shows models corresponding to each of two operating states (operating and non-operating). A power plant has multiple pumps, and at the same power generation output a given pump may or may not be operating. When the pump is operating, the pressure follows models B and C as the power generation output increases; when it is not operating, the pressure takes the constant small value of model A. Therefore, unless the current operation mode can be estimated, it is unclear which model should be used for abnormality detection. If a sensor value indicating the operation mode exists, the model can be chosen from that value, but such a sensor value is not always available.

The method of the prior application addresses this problem by generating, together with the multiple models, a classifier that selects among them. In that method, the training data is clustered when the models are generated, so the data item at each time in the training data belongs to the cluster corresponding to one of the models. This information is assigned to the data item at each time as its class. FIG. 3 shows an example of training data with classes assigned. Next, a decision tree that predicts the class is generated using the sensors other than the target sensor (X1 in this example). FIG. 4 shows an example of the generated decision tree. To find abnormal values in the X1 series (pump pressure) of the multidimensional time-series data to be tested shown in FIG. 5 (hereinafter, test data), the decision tree of FIG. 4 is first used to select the model to apply to the data item at each time, and the selected model is then used to determine whether an abnormality is present.

This method runs into trouble when the model selection decision tree is inaccurate. FIG. 6 shows an example of the result of predicting, with the model selection decision tree, the class of the data item at each time in the training data. The triple [na, nb, nc] attached to each leaf is the distribution of the classes that the data items classified into that leaf actually had: na is the number of class A items, nb the number of class B items, and nc the number of class C items. For example, 1002 time points of data fall into leaf L1, of which 1000 have class A and 2 have class C. Suppose the data item at time P in the test data of FIG. 5 is classified into leaf L1 of FIG. 6. If that item properly belongs to class A, abnormality determination is performed with the correct model A. If it properly belongs to class C, however, judging it with model A is a mistake; data that should have been normal may be determined to be abnormal.

Thus, in the method of the prior application, when abnormality determination is performed using multiple models, the model may be selected incorrectly and normal data may be judged abnormal.

The inventors arrived at the present invention after considerable effort to solve this problem. Embodiments of the present invention are described below. One of their features is that, when abnormality detection is performed using multiple models, abnormality determination uses a single model if the validity of using that one model is high, and otherwise uses two or more models. This makes it possible to provide an abnormal value determination method that is more accurate than the detection method based on general management values described in the background art and that produces fewer erroneous determinations than the method of the prior application. The first to third embodiments describe how to generate the decision tree that selects the model to be used for abnormality determination (more precisely, how to change a decision tree generated by the method of the prior application); the fourth and subsequent embodiments describe abnormality determination in detail.

(First embodiment)
FIG. 1 is a flowchart explaining the decision tree changing method according to the first embodiment.

First, the training data (first multidimensional time-series data; see FIG. 3) is applied to the decision tree generated from that training data, and a class distribution table (see FIG. 7), described later, is generated (S11).

Next, a threshold is input by the user (S12).

Next, the following processing is performed for each leaf in the class distribution table.

First, the leaf's label is deleted from the class distribution table (S13). Then, from the class frequency distribution table included in the class distribution table, every class whose frequency is greater than the threshold is added to the class distribution table as a label of the leaf, and the label of the corresponding leaf of the decision tree is updated with the added labels (S14).

A detailed example follows.

The leaves of the decision tree in FIG. 6 are annotated with the result of classifying the training data of FIG. 3 using attributes X2 through Xn. The class distribution table shown in FIG. 7 is created from this classification result. The class distribution table contains the name of each leaf, the leaf label indicating the class assigned to the leaf, and a frequency distribution table giving the class distribution of the data items classified into each leaf (L1 to L3). The class distribution of the data items classified into one leaf corresponds to the frequency distribution information.
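Building the per-leaf frequency distribution amounts to routing each training item to a leaf and counting its class there. The sketch below illustrates this under assumptions: `toy_tree`, its split values, and the sample rows are hypothetical stand-ins for the decision tree of FIG. 6 and the training data of FIG. 3, which are not reproduced in the text.

```python
from collections import Counter, defaultdict


def build_class_distribution(rows, assign_leaf):
    """Count, for every leaf, how many training items of each class reach it.

    rows: iterable of (attributes, class_label) pairs.
    assign_leaf: function mapping attributes to a leaf name (the decision tree).
    """
    table = defaultdict(Counter)
    for attributes, class_label in rows:
        table[assign_leaf(attributes)][class_label] += 1
    return {leaf: dict(counts) for leaf, counts in table.items()}


def toy_tree(attributes):
    # Hypothetical two-level tree over explanatory attributes X2 and X3.
    if attributes["X2"] < 10:
        return "L1"
    return "L2" if attributes["X3"] < 5 else "L3"


rows = [
    ({"X2": 3, "X3": 0}, "A"),
    ({"X2": 4, "X3": 9}, "A"),
    ({"X2": 7, "X3": 1}, "C"),
    ({"X2": 20, "X3": 2}, "B"),
    ({"X2": 30, "X3": 8}, "C"),
]
table = build_class_distribution(rows, toy_tree)
print(table)  # {'L1': {'A': 2, 'C': 1}, 'L2': {'B': 1}, 'L3': {'C': 1}}
```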

Next, the label of each leaf is updated with reference to the frequency distribution table. First, the user sets a threshold; here it is assumed to be set to 0. All leaf labels in the class distribution table are then deleted: label A of leaf L1, label B of leaf L2, and label C of leaf L3. Then, for one leaf at a time, each class is examined in turn, and any class whose frequency is greater than the threshold is added to the leaf's label. This is done for every leaf.

In leaf L1, the frequencies of classes A and C are greater than 0, so the new leaf label is A, C. Likewise, the new label is A, B for leaf L2 and C for leaf L3. The labels attached to the leaves of the decision tree in FIG. 6 are then updated with the labels of each leaf in the updated class distribution table.
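The relabeling with an absolute threshold can be sketched as below. The counts are reconstructed from the figures quoted in the text for leaves L1 to L3 (FIG. 7 itself is not reproduced here), and the function and variable names are illustrative assumptions.

```python
def relabel_by_threshold(class_table, threshold):
    """For each leaf, keep every class whose frequency exceeds the threshold."""
    return {
        leaf: [cls for cls, freq in counts.items() if freq > threshold]
        for leaf, counts in class_table.items()
    }


# Frequencies [na, nb, nc] per leaf, as quoted in the text.
class_table = {
    "L1": {"A": 1000, "B": 0, "C": 2},
    "L2": {"A": 10, "B": 500, "C": 0},
    "L3": {"A": 0, "B": 0, "C": 5},
}
print(relabel_by_threshold(class_table, 0))
# {'L1': ['A', 'C'], 'L2': ['A', 'B'], 'L3': ['C']}
```

Raising the threshold trades coverage for selectivity: a larger value drops rarely seen classes from a leaf and so reduces the number of candidate models for items reaching it.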

The threshold may also be given as a proportion of the total frequency in the leaf. Suppose the user sets 1% as the value from which the threshold is determined. For leaf L1, the threshold is (1000 + 2) × 0.01 = 10.02; only class A (1000) exceeds it, so the new leaf label is A. For leaf L2, the threshold is (10 + 500) × 0.01 = 5.1, and the new leaf label is A, B. For leaf L3, the threshold is 5 × 0.01 = 0.05, and the new leaf label is C.
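When the threshold is given as a proportion, it is first converted into a per-leaf count and then applied as before. The following is a small sketch of that conversion; the function name and the sample leaf are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def relabel_by_ratio(counts, ratio):
    """Derive the leaf's threshold from its total frequency, then relabel.

    Returns the derived threshold together with the selected classes.
    """
    threshold = sum(counts.values()) * ratio
    labels = [cls for cls, freq in counts.items() if freq > threshold]
    return threshold, labels


# Hypothetical leaf with 100 items; a 5% ratio gives a threshold of 5 items.
threshold, labels = relabel_by_ratio({"A": 90, "B": 9, "C": 1}, 0.05)
print(threshold, labels)
```

Because the threshold scales with the leaf size, the same ratio is stricter in a large leaf than in a small one in terms of absolute counts.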

FIG. 8 is a diagram showing the configuration of a data processing apparatus according to the first embodiment.

The storage device 11 stores the program executed by the CPU 12, the training data, and the decision tree created from the training data.

The CPU 12 loads the program from the storage device 11 into the memory 13 and executes it. The processing performed by the CPU 12 is described in detail below.

The CPU 12 reads the decision tree from the storage device 11 into the memory 13.

The CPU 12 then reads the training data sequentially into the memory 13 and applies the data item at each time in the training data to the decision tree in the memory 13. Each data item is classified into one of the leaves of the decision tree. For each leaf, the frequency of each class among the data items classified into that leaf is counted.

A class distribution table is created from these results.

Using the class distribution table and the threshold, which is given in advance by the user and stored in the memory 13 or the storage device 11, each leaf is relabeled and a new decision tree is generated.

The CPU 12 outputs the new decision tree to the display device 14.

As described above, according to this embodiment, attaching multiple labels to a leaf of the decision tree reduces the probability of selecting an inappropriate model. Abnormal value determination using the decision tree generated in this embodiment is described in detail in the fourth and subsequent embodiments.

(Second embodiment)
FIG. 9 is a flowchart explaining the decision tree changing method according to the second embodiment.

First, as in the first embodiment, the training data is applied to the decision tree and a class distribution table is generated (S21).

Next, a threshold is input by the user (S22). The following processing is then performed for each leaf in the class distribution table.

First, the leaf's label is deleted (S23). Then the processing of S24 is performed on the classes in descending order of their values in the frequency distribution table.

In S24, the current class is added as a label of the leaf, and the frequencies of the classes added so far as labels are summed. If the sum exceeds the threshold, processing ends. Otherwise, the class with the next largest value is selected and processing continues.

A detailed example is described below with reference to FIGS. 6 and 7.

First, the class distribution table of FIG. 7 is generated, and the user then sets the threshold to 99%. Next, the leaf labels in the class distribution table are deleted. Classes are then taken in descending order of frequency and added as leaf labels. This continues until the ratio of the summed frequency of the classes added as labels to the total frequency of all classes in the leaf exceeds the threshold.

For leaf L1, the ratio exceeds the threshold as soon as class A is added (99.8%), so the leaf label is A only.

For leaf L2, adding class B alone does not bring the ratio above the threshold (98.0%), so class A is added as well; the ratio then exceeds the threshold (100%), and the new label is A, B.

For leaf L3, only class C is present, so the ratio reaches 100% when class C is added and the leaf label is C.

The labels attached to the leaves of the decision tree in FIG. 6 are then updated with the labels of each leaf in the updated class distribution table.
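The cumulative selection of the second embodiment can be sketched as follows, using the per-leaf counts quoted in the text and the 99% threshold from the example. The function name and the handling of an empty leaf are assumptions made for the sketch.

```python
def relabel_cumulative(counts, ratio_threshold):
    """Add classes in descending order of frequency until their combined
    share of the leaf's items exceeds ratio_threshold."""
    total = sum(counts.values())
    if total == 0:
        return []  # assumption: an empty leaf gets no label
    labels, covered = [], 0
    for cls, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        labels.append(cls)
        covered += freq
        if covered / total > ratio_threshold:
            break
    return sorted(labels)


class_table = {
    "L1": {"A": 1000, "B": 0, "C": 2},
    "L2": {"A": 10, "B": 500, "C": 0},
    "L3": {"A": 0, "B": 0, "C": 5},
}
for leaf, counts in class_table.items():
    print(leaf, relabel_cumulative(counts, 0.99))
# L1 ['A']
# L2 ['A', 'B']
# L3 ['C']
```

Unlike the per-class threshold of the first embodiment, this rule always keeps the most frequent class and adds further classes only as far as needed to reach the requested coverage.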

The data processing apparatus according to the second embodiment is described below with reference to FIG. 8, which was used for the first embodiment.

(Third embodiment)
FIG. 10 is a flowchart explaining the decision tree changing method according to the third embodiment.

First, the training data is applied to the decision tree and a class distribution table is generated (S31).

Next, a confidence level CF and a threshold are input by the user (S32).

The following processing is performed for each leaf in the class distribution table.

First, the leaf's label is deleted (S33). Then the processing of S34 is performed on the classes in descending order of their values in the frequency distribution table.

In S34, the current class is added as a label of the leaf. The frequencies of the classes added as labels are summed, and the sum is subtracted from the total frequency N of all classes in the leaf; the result is the difference e. From CF, N, and e, the proportion p of data items in the population that belong to the classes in the leaf label is estimated. This can be computed by solving the equation given in FIG. 11 for p. If p exceeds the threshold, processing ends; otherwise the next class is selected and processing continues. If p still does not exceed the threshold after all classes have been selected, all classes are given as the label and processing ends.

以下に前述した図６および図７を用いて詳細例を説明する。 A detailed example will be described below with reference to FIGS. 6 and 7 described above.

まず、図７のクラス分布表を生成し、次いで、ユーザが信頼度ＣＦと閾値とを設定する。ここでは、信頼度ＣＦが０．２５、閾値が０．９９に設定されたとする。次いで度数分布表における葉のラベルを消去する。次いで、クラスを度数の大きい順に取り出し葉のラベルとして追加する。葉のラベルとして追加されたクラスの度数を合計し、葉におけるクラス全体の度数から合計値を引き、差ｅを求める。 First, the class distribution table of FIG. 7 is generated, and then the user sets the reliability CF and the threshold value. Here, it is assumed that the reliability CF is set to 0.25 and the threshold value is set to 0.99. Next, the label of the leaf in the frequency distribution table is deleted. Next, the classes are extracted in the descending order of frequency and added as leaf labels. The frequencies of the classes added as leaf labels are summed, and the total value is subtracted from the frequency of the entire class in the leaf to obtain a difference e.

葉Ｌ１では、クラス全体の度数はＮ＝１００２である。まず、度数が最も大きいクラスＡが選択され、この度数が１０００なので差ｅは２である。ＣＦ＝０．２５、ｅ＝２、Ｎ＝１００２を用いて、図１１の式より割合ｐを計算するとｐ＝０．９９６２となる。この値は閾値（０．９９）より大きいため、この時点で処理は終了し、この結果、葉のラベルはＡとなる。 In leaf L1, the total frequency over all classes is N = 1002. First, class A, which has the highest frequency, is selected; since its frequency is 1000, the difference e is 2. Calculating the ratio p from the equation of FIG. 11 with CF = 0.25, e = 2, and N = 1002 gives p = 0.9962. Since this value is larger than the threshold (0.99), the processing ends at this point, and the leaf label becomes A.

葉Ｌ２では、Ｎ＝５１０である。まず、度数が最も大きいクラスＢが選択され、この度数は５００なので差ｅは１０となる。ＣＦ＝０．２５、ｅ＝１０、Ｎ＝５１０を用いて、図１１の式よりｐを計算するとｐ＝０．９７４１となる。この値は閾値（０．９９）を超えないため、処理が継続される。つぎにクラスＡが選択される。このときｅ＝０となる。ｐを計算するとｐ＝０．９９８６となり、この値は閾値を超えるため、この時点で処理は終了し、この結果、葉のラベルはＡ、Ｂとなる。 In leaf L2, N = 510. First, the class B having the highest frequency is selected. Since this frequency is 500, the difference e is 10. When p is calculated from the equation of FIG. 11 using CF = 0.25, e = 10, and N = 510, p = 0.9741. Since this value does not exceed the threshold value (0.99), the processing is continued. Next, class A is selected. At this time, e = 0. When p is calculated, p = 0.9986, and this value exceeds the threshold value. Therefore, the processing is terminated at this point, and as a result, the leaf labels are A and B.

葉Ｌ３では、まず、クラスＣが選択される。Ｎ＝５であり、ｅ＝０である。ｐを計算すると０．７６７９なので処理が継続される。次にクラスＡが選択される。ｅ＝０である。ｐ＝０．７６７９となり処理が継続される。次にクラスＢが選択される。全てのクラスが選択されたため、この時点で処理は終了し、この結果、葉のラベルはＡ、Ｂ、Ｃとなる。 In leaf L3, class C is first selected. N = 5 and e = 0. Since p is calculated as 0.7679, the processing is continued. Next, class A is selected. e = 0. The process continues with p = 0.7679. Next, class B is selected. Since all the classes have been selected, the processing ends at this point, and as a result, the leaf labels are A, B, and C.
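
The relabeling loop of S33 and S34, applied to the three leaves above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `relabel_leaf`, `lookup_p`, and `QUOTED_P` are hypothetical, and because the equation of FIG. 11 is not reproduced in the text, the estimator here simply looks up the p values quoted in the worked example. The split of the two minority samples in leaf L1 between classes B and C is also assumed, since only their sum (e = 2) is given.

```python
from typing import Callable, Dict, List

# p values quoted in the worked example, keyed by (N, e).
# Stand-in for solving the FIG. 11 equation, which is not reproduced here.
QUOTED_P = {(1002, 2): 0.9962, (510, 10): 0.9741, (510, 0): 0.9986, (5, 0): 0.7679}


def lookup_p(cf: float, n: int, e: int) -> float:
    """Hypothetical estimator: returns the quoted p for (N, e); cf is unused."""
    return QUOTED_P[(n, e)]


def relabel_leaf(
    counts: Dict[str, int],
    cf: float,
    threshold: float,
    estimate_p: Callable[[float, int, int], float],
) -> List[str]:
    """Add classes in descending frequency until the estimated p exceeds the threshold."""
    n = sum(counts.values())
    # Descending frequency; ties are broken by class name (an assumption).
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    labels: List[str] = []
    covered = 0
    for cls in ordered:
        labels.append(cls)
        covered += counts[cls]
        e = n - covered  # samples not covered by the current label set
        if estimate_p(cf, n, e) > threshold:
            break
    # If the threshold is never exceeded, every class ends up in the label.
    return labels


leaves = {
    "L1": {"A": 1000, "B": 2, "C": 0},  # B/C split assumed; only e = 2 is given
    "L2": {"A": 10, "B": 500, "C": 0},
    "L3": {"A": 0, "B": 0, "C": 5},
}
for name, counts in leaves.items():
    print(name, relabel_leaf(counts, cf=0.25, threshold=0.99, estimate_p=lookup_p))
```

With the quoted p values, the sketch selects A for leaf L1, B and A for leaf L2, and all three classes for leaf L3, which agrees with the labels derived in the text. Replacing `lookup_p` with a function that actually solves the FIG. 11 equation would leave the loop unchanged.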

更新後のクラス分布表における各葉のラベルによって、図６における決定木の葉に付されたラベルを更新する。 The label attached to the leaf of the decision tree in FIG. 6 is updated with the label of each leaf in the updated class distribution table.

以下、第１の実施形態で用いた図８に基づき、第３の実施形態に係わるデータ処理装置について説明する。 The data processing apparatus according to the third embodiment will be described below based on FIG. 8 used in the first embodiment.

作成したクラス分布表と、あらかじめユーザから与えられたメモリ１３または記憶装置１１に格納された信頼度ＣＦおよび閾値とを用いて、個々の葉についてラベルの付け直しを行い、新しい決定木を生成する。 Using the created class distribution table together with the reliability CF and the threshold, which are given in advance by the user and stored in the memory 13 or the storage device 11, each leaf is relabeled to generate a new decision tree.

（第４の実施形態） (Fourth embodiment)
第４の実施形態は、テストデータ（第２の多次元時系列データ）の異常検出に関わるものである。図１２に示すｎ次元の属性を持つ訓練データを用いて、図１３に示すｎ次元の属性を持つテストデータからターゲットセンサの異常を発見したいものとする。ターゲットセンサはＸ１であるとする。 The fourth embodiment relates to abnormality detection in test data (second multidimensional time series data). Assume that abnormalities of a target sensor are to be found in the test data with n-dimensional attributes shown in FIG. 13, using the training data with n-dimensional attributes shown in FIG. 12. The target sensor is assumed to be X1.

図１４は本実施形態に係わるテストデータの異常性判定方法を説明するフローチャートである。 FIG. 14 is a flowchart for explaining a test data abnormality determination method according to this embodiment.

まず、先願の手法を用いて訓練データをクラスタリングし、クラスタ毎に、説明センサの値からターゲットセンサの値を推測するモデルとモデルに対する訓練データの標準偏差σとを先願の方法または既知の手法により生成する（Ｓ４１）。そして、各クラスタにクラスを割り当てる（Ｓ４２）。 First, the training data is clustered using the method of the prior application, and, for each cluster, a model that infers the value of the target sensor from the values of the explanatory sensors and the standard deviation σ of the training data with respect to that model are generated by the method of the prior application or a known method (S41). Then, a class is assigned to each cluster (S42).

次に、訓練データにおける各時刻のデータに、それぞれが属するクラスタのクラスを割り当てる（Ｓ４３）。各時刻のデータにクラスを割り当てた後の訓練データの例を図１５に示す。 Next, the class of the cluster to which each belongs is assigned to the data at each time in the training data (S43). An example of training data after assigning a class to the data at each time is shown in FIG.

次に、訓練データのうちターゲットセンサＸ１を除いた属性を用いてクラスを予測する決定木を生成する（Ｓ４４）。 Next, a decision tree for predicting a class is generated using the attributes excluding the target sensor X1 in the training data (S44).

つづいて、この決定木を第１〜第３の実施形態のいずれかの方法を用いて変更し（Ｓ４５）、Ｓ４６に進む。 Subsequently, the decision tree is changed using any one of the methods of the first to third embodiments (S45), and the process proceeds to S46.

Ｓ４６では、図１３のテストデータにおいてＰ時刻のＸ１センサの異常性を判定する場合、Ｐ時刻における属性Ｘ２〜Ｘｎの値を、変更後の決定木に適用してクラス名、すなわち異常性判定に利用するモデル（単数または複数）を決定する。次いで、決定された個々のモデルに時刻Ｐにおける属性Ｘ２〜Ｘｎの値を入力して、モデルによるＸ１の予測値を求める。時刻ＰにおけるＸ１の値がモデルの標準偏差から求まる正常範囲（例えば±３σ、ここでσは標準偏差）であれば正常とみなし、そうでなければ異常とみなす。より詳細には、いずれか１つのモデルでも正常と判定されれば正常、全てのモデルで異常と判定されたならば異常とみなす。 In S46, to determine the abnormality of sensor X1 at time P in the test data of FIG. 13, the values of attributes X2 to Xn at time P are applied to the changed decision tree to determine the class name, that is, the model or models to be used for abnormality determination. Next, the values of attributes X2 to Xn at time P are input to each determined model to obtain that model's predicted value of X1. If the value of X1 at time P falls within the normal range obtained from the model's standard deviation (for example, the predicted value ±3σ, where σ is the standard deviation), it is regarded as normal; otherwise it is regarded as abnormal. More specifically, the data is regarded as normal if at least one model judges it normal, and as abnormal only if all models judge it abnormal.

以下に詳細例を示す。 Detailed examples are shown below.

図１６は、図１２に示す訓練データの属性Ｘ１の異常性を判定するために生成した、Ｘ２からＸ１を予測するモデルの例を示す。 FIG. 16 shows an example of models that predict X1 from X2, generated to determine the abnormality of attribute X1 of the training data shown in FIG. 12.

Ｘ２が説明センサであり、Ａ，Ｂ，Ｃの３つのモデルが生成されている。モデルＡはＸ１＝１０、σ＝１である。モデルＢはＸ１＝Ｘ２×３＋１、σ＝１である。モデルＣはＸ１＝３０、σ＝２である。 X2 is the explanatory sensor, and three models A, B, and C have been generated. Model A is X1 = 10 with σ = 1. Model B is X1 = X2 × 3 + 1 with σ = 1. Model C is X1 = 30 with σ = 2.

モデルＡ、Ｂ、Ｃを生成する元となった訓練データ（図１２）にクラスを割り振る。ここでは、図１５のようにクラスが割り振られたとする。 Classes are assigned to the training data (FIG. 12) from which models A, B, and C are generated. Here, it is assumed that a class is allocated as shown in FIG.

この訓練データを用いて、Ｘ１を除く属性からクラスを予測する決定木を生成する。ここでは、図１７に示す、属性Ｘ３，Ｘ４からクラス（モデル）を予測する決定木が生成されたとする。 Using this training data, a decision tree that predicts the class from the attributes other than X1 is generated. Here, it is assumed that the decision tree shown in FIG. 17, which predicts the class (model) from attributes X3 and X4, has been generated.

この決定木を第１〜第３の実施形態の方法により変更する。ここでは第１の実施形態の方法を用いるものとするが、第２または第３の実施形態の方法を用いてもよい。変更後の決定木の例を図１８に示す。 This decision tree is changed by the method of the first to third embodiments. Here, the method of the first embodiment is used, but the method of the second or third embodiment may be used. An example of the decision tree after the change is shown in FIG.

ここで、図１３のテストデータにおいて時刻Ｐの異常性判定を行う。 Here, abnormality determination is performed for time P in the test data of FIG. 13.

まず、図１８の決定木を用いて、利用すべきモデルを決定する。時刻ＰにおけるＸ３は２０、Ｘ４は１であり、したがって図１８の決定木から、利用すべきモデルは葉Ｌ２に割り振られたモデルＡ、Ｂと決定される。 First, the model to be used is determined using the decision tree of FIG. 18. At time P, X3 is 20 and X4 is 1; therefore, from the decision tree of FIG. 18, the models to be used are determined to be models A and B, which are assigned to leaf L2.

次に、それぞれのモデルＡ、Ｂにより異常性判定を行う。 Next, abnormality determination is performed by each of models A and B.

モデルＡでは、図１６に示す通り、Ｘ１＝１０、σ＝１である。正常範囲を±３σとすると、正常範囲は７〜１３となる。時刻ＰにおけるＸ１は１７なので、モデルＡからは、時刻Ｐにおけるターゲットセンサの値は異常と判定される。 In model A, as shown in FIG. 16, X1 = 10 and σ = 1. When the normal range is ± 3σ, the normal range is 7 to 13. Since X1 at time P is 17, it is determined from model A that the value of the target sensor at time P is abnormal.

続いてモデルＢで判定する。時刻ＰにおけるＸ２は５なので、モデルＢによる予測値は５×３＋１＝１６となる。σ＝１なので正常範囲を±３σとすると、正常範囲は１３〜１９となる。時刻ＰにおけるＸ１は１７なので、モデルＢからは、時刻Ｐにおけるターゲットセンサの値は正常と判定される。 Subsequently, the determination is made by model B. Since X2 at time P is 5, the predicted value by model B is 5 × 3 + 1 = 16. Since σ = 1, if the normal range is ± 3σ, the normal range is 13-19. Since X1 at time P is 17, it is determined from model B that the value of the target sensor at time P is normal.

２つのモデルのうちの一方のモデルで正常と判定されたので、時刻ＰにおけるＸ１は最終的に正常と判定される。 Since one of the two models is determined to be normal, X1 at time P is finally determined to be normal.
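
The determination at time P can be sketched in a few lines. This is an illustrative sketch only: the names `MODELS`, `judged_normal`, and `is_normal` are hypothetical, the models are those of FIG. 16 as quoted in the text, and treating the ±3σ bounds as inclusive is an assumption.

```python
from typing import Callable, Dict, Iterable, Tuple

# Each model is (function predicting X1 from X2, standard deviation sigma),
# using the values quoted for FIG. 16.
Model = Tuple[Callable[[float], float], float]

MODELS: Dict[str, Model] = {
    "A": (lambda x2: 10.0, 1.0),
    "B": (lambda x2: 3.0 * x2 + 1.0, 1.0),
    "C": (lambda x2: 30.0, 2.0),
}


def judged_normal(model: Model, x1: float, x2: float, k: float = 3.0) -> bool:
    """True if x1 lies within k*sigma of the model's prediction (bounds inclusive, assumed)."""
    predict, sigma = model
    return abs(x1 - predict(x2)) <= k * sigma


def is_normal(selected: Iterable[str], x1: float, x2: float, k: float = 3.0) -> bool:
    """Normal if at least one selected model judges the value normal."""
    return any(judged_normal(MODELS[name], x1, x2, k) for name in selected)


# Time P in the example: X2 = 5, X1 = 17, models A and B selected via leaf L2.
print(judged_normal(MODELS["A"], x1=17, x2=5))  # model A alone
print(judged_normal(MODELS["B"], x1=17, x2=5))  # model B alone
print(is_normal(["A", "B"], x1=17, x2=5))       # combined decision
```

Model A alone rejects the value (17 is outside 7 to 13) while model B accepts it (17 is inside 13 to 19), so the combined decision is normal, as in the text.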

図１９は、第４の実施形態に係わるデータ処理装置の構成を示す図である。 FIG. 19 is a diagram illustrating a configuration of a data processing apparatus according to the fourth embodiment.

記憶装置２１は、ＣＰＵ２２が実行するプログラムと、訓練データとを格納している。 The storage device 21 stores a program executed by the CPU 22 and training data.

ＣＰＵ２２は、記憶装置２１内のプログラムをメモリ２３にロードして実行する。以下、ＣＰＵ２２の処理の詳細を示す。 The CPU 22 loads the program in the storage device 21 into the memory 23 and executes it. Details of the processing of the CPU 22 will be described below.

訓練データが記憶装置２１からメモリ２３に読み出され、ＣＰＵ２２によって１つ以上のクラスタに分割される。 The training data is read from the storage device 21 to the memory 23 and is divided into one or more clusters by the CPU 22.

個々のクラスタごとに、ある一つの属性の値を、他の属性の値から予測するモデルがＣＰＵ２２によって生成される。 For each cluster, the CPU 22 generates a model for predicting a value of one attribute from values of other attributes.

訓練データにおける個々の時刻のデータが属するクラスタを示すクラスがＣＰＵ２２によって訓練データに割り当てられる。 A class indicating a cluster to which individual time data belongs in the training data is assigned to the training data by the CPU 22.

クラスが割り当てられた訓練データを用いて、上記他の属性から、利用すべきクラス（モデル）を推測する決定木がＣＰＵ２２によって生成される。 A decision tree for estimating a class (model) to be used is generated by the CPU 22 from the other attributes using the training data to which the class is assigned.

この決定木における葉のラベルが、ＣＰＵ２２により、第１〜第３の実施形態のいずれかの方法によって変更される。 The leaf label in the decision tree is changed by the CPU 22 by any of the methods of the first to third embodiments.

インターフェース２５経由で、テストデータの１時刻分がメモリ２３上にロードされる。 One time of test data is loaded onto the memory 23 via the interface 25.

ＣＰＵ２２により、このテストデータが決定木に適用され、異常性判定に用いるモデルが１つ以上選択される。 The CPU 22 applies this test data to the decision tree and selects one or more models to be used for abnormality determination.

選択されたモデルを用いて、１時刻分のテストデータにおける上記他の属性の値から、上記ある一つの属性を予測する。そして、予測値と、テストデータにおける上記ある１つの属性の値との乖離から異常性判定が行われる。 Using the selected model, the certain attribute is predicted from the values of the other attributes in the test data for one time. Then, the abnormality determination is performed from the difference between the predicted value and the value of the one attribute in the test data.

ＣＰＵ２２は、すべてのモデルで異常が検出された場合は、上記１時刻分のテストデータは異常であると判定し、１つでも正常と判定したモデルが存在した場合は、正常と判定する。 When abnormality is detected in all models, the CPU 22 determines that the test data for one time is abnormal, and determines that it is normal when there is at least one model determined to be normal.

ＣＰＵ２２は、判定の結果を表示装置２４に表示する。 The CPU 22 displays the determination result on the display device 24.

以上のように、本実施形態により、テストデータの異常発見において、モデルを１つしか利用しないことで誤って異常と判定してしまう可能性を低くすることができる。 As described above, in finding abnormalities in test data, this embodiment reduces the possibility of erroneously judging data as abnormal as a result of using only one model.

（第５の実施形態） (Fifth embodiment)
図２０は本実施形態に係わるテストデータの異常性判定方法を説明するフローチャートである。 FIG. 20 is a flowchart for explaining a test data abnormality determination method according to this embodiment.

このフローチャートは、第４の実施形態で用いた図１４におけるＳ４３とＳ４４との間に、クラス割り当て変更処理Ｓ４３−１が入り、Ｓ４５の決定木の変更処理を除いたものである。ただし、Ｓ４５を除かずに処理を行うことも可能である。Ｓ４３−１では、ある時刻について訓練データの前後ｎステップ（所定の時刻範囲）のクラスの個数を集計する。集計値が最も大きいクラスをその時刻のクラスとする。以下に詳細例を示す。 This flowchart is obtained from FIG. 14 used in the fourth embodiment by inserting a class assignment change process S43-1 between S43 and S44 and removing the decision tree change process of S45. However, the processing can also be performed without removing S45. In S43-1, for a given time, the number of occurrences of each class is counted over the training data within n steps before and after that time (a predetermined time range). The class with the largest count is taken as the class for that time. A detailed example is shown below.

図２１の左側は、図２０に示すＳ４３の直後における訓練データである。ここで、各時刻について前後ｎステップのクラスの個数を集計する。ｎ＝１として、クラスの個数を集計した表を図２１の中央に示す。例えば、時刻３では、１つ前の時刻２のクラスはＡ、自身のクラスはＢ、１つ先の時刻４のクラスはＡであるため、（Ａ、Ｂ、Ｃ）＝（２、１、０）となる。クラスの個数を集計する際、現在時刻に近いデータの重みを高くしても良い。 The left side of FIG. 21 is training data immediately after S43 shown in FIG. Here, the number of classes of n steps before and after each time is totalized. A table in which the number of classes is tabulated with n = 1 is shown in the center of FIG. For example, at time 3, the previous class at time 2 is A, its own class is B, and the next class at time 4 is A, so (A, B, C) = (2, 1, 0). When counting the number of classes, the weight of data close to the current time may be increased.

図２１の右側は、中央に示した集計表から決定した最終的なクラスである。例えば時刻３では、（Ａ、Ｂ、Ｃ）＝（２、１、０）から、クラスＡの個数が最も多いため、時刻３のクラスはＡとされる。本実施形態では、変更後のクラスを利用して、異常性判定に用いるモデルの選択を行う決定木を生成する。 The right side of FIG. 21 is the final class determined from the tabulation table shown in the center. For example, at time 3, since the number of classes A is the largest since (A, B, C) = (2, 1, 0), the class at time 3 is A. In this embodiment, a decision tree for selecting a model to be used for abnormality determination is generated using the changed class.
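
The class rewriting of S43-1 amounts to a sliding-window majority vote, which can be sketched as below. This is a hedged sketch: the function name `smooth_classes` is hypothetical, the input sequence is illustrative (only the classes at times 2 to 4 are taken from the text), and two details the text leaves open are assumed: the window is truncated at the ends of the data, and on a tie the original class is kept if it is among the tied classes.

```python
from collections import Counter
from typing import List


def smooth_classes(classes: List[str], n: int = 1) -> List[str]:
    """Replace each class by the most frequent class within n steps before and after it."""
    smoothed = []
    for t in range(len(classes)):
        # Window of up to 2n + 1 entries, truncated at the data boundaries (assumption).
        window = classes[max(0, t - n): t + n + 1]
        counts = Counter(window)
        best = max(counts.values())
        tied = sorted(c for c, v in counts.items() if v == best)
        # Tie handling is an assumption: prefer the original class, else the first by name.
        smoothed.append(classes[t] if classes[t] in tied else tied[0])
    return smoothed


# Illustrative sequence for times 1..5; time 3 is the isolated B from the example.
print(smooth_classes(["A", "A", "B", "A", "A"], n=1))
```

For the third element the window is (A, B, A), giving counts (A, B, C) = (2, 1, 0), so B is rewritten to A. The optional weighting of nearer time points mentioned in the text is not implemented here.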

本実施形態では以上のようにして、訓練データに付されたクラスの書き換えを行うがこれについてさらに説明すると以下の通りである。 In the present embodiment, the class attached to the training data is rewritten as described above. This will be further described as follows.

プラント等のセンサデータが図２に示したような挙動を示すのは、プラントが複数の運転モードを持つ場合があるためであるが、運転モードが頻繁に切り替わることは少なく、同じモードが続くことが多い。図２１における時刻３のデータは、Ｓ４３の処理ではＢというクラスが割り当てられているが、これは時系列的連続性を考えるとＡである可能性が高い。従ってここでは時刻３のクラスをＢからＡに書き換える。 Sensor data of a plant or the like behaves as shown in FIG. 2 because the plant may have a plurality of operation modes; however, the operation mode rarely switches frequently, and the same mode usually continues. The data at time 3 in FIG. 21 is assigned class B by the process of S43, but, considering time-series continuity, it is highly likely to be A. Therefore, the class at time 3 is rewritten from B to A here.

以下、第４の実施形態で用いた図１９に基づき、第５の実施形態に係わるデータ処理装置について説明する。 The data processing apparatus according to the fifth embodiment will be described below based on FIG. 19 used in the fourth embodiment.

このクラスが、前後の時刻におけるクラスの内容に応じてＣＰＵによって書き換えられる。 This class is rewritten by the CPU according to the contents of the class at the previous and subsequent times.

書き換え後の訓練データを用いて、上記他の属性から、利用すべきクラス（モデル）を推測する決定木がＣＰＵ２２によって生成される。 Using the rewritten training data, the CPU 22 generates a decision tree for estimating a class (model) to be used from the other attributes.

以上のように、本実施形態により、テストデータの異常発見において、訓練データに付されたクラスの時系列性を考慮することで、異常発見に利用するモデルを選択する決定木の精度を向上させることができる。また、この効果に加え、モデルを一つしか利用しないことで誤って異常と判定してしまう可能性を低くすることができる。 As described above, in finding abnormalities in test data, this embodiment improves the accuracy of the decision tree that selects the model used for abnormality detection by taking into account the time-series nature of the classes attached to the training data. In addition to this effect, it reduces the possibility of erroneously judging data as abnormal as a result of using only one model.

（第６の実施形態） (Sixth embodiment)
図２２は本実施形態に係わるテストデータの異常性判定方法を説明するフローチャートである。 FIG. 22 is a flowchart for explaining a test data abnormality determination method according to this embodiment.

まず、先願の手法を用いて訓練データ（図１２参照）をクラスタリングしクラスタ毎にモデルを生成する（Ｓ５１）。 First, the training data (see FIG. 12) is clustered using the method of the prior application, and a model is generated for each cluster (S51).

つづいて、モデル毎にモデルを構成する説明変数（説明センサ）が変動を許容される範囲を計算する（Ｓ５２）。 Subsequently, the range in which the explanatory variable (explanatory sensor) constituting the model is allowed to vary is calculated for each model (S52).

つづいて、テストデータのＰ時刻におけるＸ１の異常性判定を行うとする。まず、モデルを１つ選び、選んだモデルの説明センサ（Ｘ２とする）が変動を許容される範囲に、時刻ＰのＸ２が入っているかを調べる。入っているならばそのモデルで異常性判定を行う。この処理を全てのモデルについて行う。 Next, suppose that the abnormality of X1 at time P in the test data is to be determined. First, one model is selected, and it is checked whether X2 at time P falls within the range in which the explanatory sensor (here, X2) of the selected model is allowed to vary. If it does, abnormality determination is performed with that model. This process is performed for all models.

以下に詳細例を示す。 Detailed examples are shown below.

図２２のＳ５１において、図２３に示すようなモデルが生成されたとする。 Assume that the models shown in FIG. 23 are generated in S51 of FIG. 22.

次に、Ｓ５２において、説明変数（Ｘ２）の変動許容範囲を求める。ここでは、モデルに対応するクラスタの説明変数の変動範囲を利用する。よって、モデルＡに関しては３以上１６以下、モデルＢに関しては１以上１０以下、モデルＣに関しては１１以上２２以下が変動許容範囲となる。 Next, in S52, the allowable variation range of the explanatory variable (X2) is obtained. Here, the variation range of the explanatory variable in the cluster corresponding to each model is used. Thus, the allowable variation range is 3 to 16 inclusive for model A, 1 to 10 inclusive for model B, and 11 to 22 inclusive for model C.

次に、Ｓ５３において、テストデータにおける時刻Ｐの異常性判定を行う。この時、時刻ＰにおけるＸ２が３ならば、３を変動許容範囲に持つモデルＡ、Ｂが異常性判定に用いるモデルとして選択される。Ｘ２が１３ならば、モデルＡ、Ｃが選択される。 Next, in S53, the abnormality determination at time P in the test data is performed. At this time, if X2 at time P is 3, models A and B having 3 in the allowable variation range are selected as models used for abnormality determination. If X2 is 13, models A and C are selected.

なお、全てのモデルの変動許容範囲にＸ２が入らなかった場合は、Ｘ２と距離が近い変動範囲を持つモデルを、ユーザにより指定された個数、利用する方法が考えられる。また、モデル毎の変動許容範囲は、クラスタに含まれる点の平均と分散とから決定してもよい。 If X2 does not fall within the variation allowable range of all models, a method of using a model having a variation range close to X2 in the number specified by the user can be considered. Further, the variation allowable range for each model may be determined from the average and variance of points included in the cluster.
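
Model selection by allowable variation range, including the fallback for values outside every range, might look like the following sketch. The names `RANGES`, `distance_to_range`, and `select_by_range` are hypothetical; the ranges are those given for models A, B, and C above, and measuring "closeness" as the distance from X2 to the nearest end of a range is an assumption, since the text does not define the distance.

```python
from typing import Dict, List, Tuple

# Allowable variation ranges of X2 per model (inclusive), as stated for S52.
RANGES: Dict[str, Tuple[float, float]] = {"A": (3, 16), "B": (1, 10), "C": (11, 22)}


def distance_to_range(x: float, lo: float, hi: float) -> float:
    """Distance from x to the interval [lo, hi]; zero when x lies inside it."""
    return max(lo - x, x - hi, 0.0)


def select_by_range(
    ranges: Dict[str, Tuple[float, float]], x2: float, fallback_count: int = 1
) -> List[str]:
    """Select models whose range contains x2; otherwise the nearest fallback_count models."""
    inside = [name for name, (lo, hi) in ranges.items() if lo <= x2 <= hi]
    if inside:
        return inside
    # Fallback: user-specified number of models with the closest ranges (distance assumed).
    nearest = sorted(ranges, key=lambda name: distance_to_range(x2, *ranges[name]))
    return nearest[:fallback_count]


print(select_by_range(RANGES, 3))    # X2 = 3
print(select_by_range(RANGES, 13))   # X2 = 13
print(select_by_range(RANGES, 25))   # outside every range, fallback applies
```

For X2 = 3 the sketch selects models A and B, and for X2 = 13 models A and C, as in the text. The X2 = 25 call only illustrates the fallback and is not an example from the source.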

以下、第４の実施形態で用いた図１９に基づき、第６の実施形態に係わるデータ処理装置について説明する。 The data processing apparatus according to the sixth embodiment will be described below based on FIG. 19 used in the fourth embodiment.

ＣＰＵ２２によって、モデルごとに、そのモデルを構成する属性の変動許容範囲が計算される。 For each model, the CPU 22 calculates an allowable variation range of attributes constituting the model.

ＣＰＵ２２により、個々のモデルの変動許容範囲に、このテストデータにおける他の属性の値が含まれるか否かが判定される。入る場合、そのモデルは異常性判定に用いられるものとして選択される。 The CPU 22 determines whether or not the value of another attribute in the test data is included in the allowable variation range of each model. If so, the model is selected for use in anomaly determination.

ＣＰＵ２２は、すべてのモデルで異常と判定されたときは、１時刻分のテストデータは異常と判定し、正常と判定したモデルが１つでも存在した場合は正常と判定する。 When all models judge the data abnormal, the CPU 22 determines that the test data for the one time point is abnormal; when at least one model judges it normal, the CPU 22 determines that it is normal.

以上のように、本実施形態により、テストデータの異常発見において、テストデータにおける属性値の変動許容範囲を考慮することで、異常発見に利用するモデルを選択する精度を向上させることができる。 As described above, according to the present embodiment, the accuracy of selecting a model to be used for abnormality detection can be improved by considering the allowable variation range of the attribute value in the test data in the abnormality detection of the test data.

（第７の実施形態） (Seventh embodiment)
第７の実施形態では、第４または第５の実施形態の方法と、第６の実施形態の方法とを用いてそれぞれから異常性判定を行うためのモデルを選び、両方の方法から選ばれたモデルから、最終的なモデルを選択して異常性判定を行う。例えば、両方の方法から選ばれたモデルのアンド（ＡＮＤ）をとったものを最終的なモデルとする。ＡＮＤが空集合なら例えば第６の実施形態で選択したモデルを利用する。 In the seventh embodiment, models for abnormality determination are selected by the method of the fourth or fifth embodiment and, separately, by the method of the sixth embodiment, and the final models used for abnormality determination are selected from the models chosen by the two methods. For example, the intersection (AND) of the models chosen by the two methods is taken as the final models. If the AND is the empty set, the models selected in the sixth embodiment are used, for example.

仮に、第４または第５の実施形態で求まったモデルがＡ、Ｂ、Ｃで、第６の実施形態で求まったモデルがＡ、Ｃであったとすると、両方のＡＮＤをとったモデルＡ、Ｃを最終的なモデルとして採用する。これは、例えば、図２３のモデルにより異常性判定を行いたいデータのＸ２が１３であったとすると、１３を変動許容範囲に持たないモデルＢを使うのは適切でない可能性があるためである。 Suppose that the models obtained in the fourth or fifth embodiment are A, B, and C and the models obtained in the sixth embodiment are A and C; then models A and C, the AND of the two, are adopted as the final models. This is because, for example, if X2 of the data to be judged with the models of FIG. 23 is 13, it may not be appropriate to use model B, whose allowable variation range does not include 13.

また、第４または第５の実施形態で求まったモデルがＡで、第６の実施形態で求まったモデルがＡ、Ｃであったとすると、両方のＡＮＤをとったモデルＡを最終的なモデルとして採用する。これは、第４または第５の実施形態のモデル選択でクラス（運転モード）がＡであるという確信が高いため、モデルＣを外すのが妥当だと考えられるからである。 Likewise, if the model obtained in the fourth or fifth embodiment is A and the models obtained in the sixth embodiment are A and C, model A, the AND of the two, is adopted as the final model. This is because the model selection of the fourth or fifth embodiment gives high confidence that the class (operation mode) is A, so removing model C is considered reasonable.
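
The combination rule of this embodiment reduces to a set intersection with a fallback. A minimal sketch, in which the function name `combine_selections` and its parameter names are hypothetical and the fallback follows the "for example" option stated above (use the sixth embodiment's selection when the intersection is empty):

```python
from typing import Set


def combine_selections(tree_models: Set[str], range_models: Set[str]) -> Set[str]:
    """AND of the decision-tree selection and the range-based selection.

    tree_models:  models chosen by the fourth or fifth embodiment.
    range_models: models chosen by the sixth embodiment.
    """
    both = tree_models & range_models
    # Empty intersection: fall back to the range-based selection.
    return both if both else set(range_models)


print(combine_selections({"A", "B", "C"}, {"A", "C"}))  # first case in the text
print(combine_selections({"A"}, {"A", "C"}))            # second case in the text
print(combine_selections({"B"}, {"A", "C"}))            # empty AND, fallback
```

The first two calls correspond to the two cases discussed in the text; the third shows the fallback path and is not taken from the source.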

以下、第４の実施形態で用いた図１９に基づき、第７の実施形態に係わるデータ処理装置について説明する。 The data processing apparatus according to the seventh embodiment will be described below based on FIG. 19 used in the fourth embodiment.

ＣＰＵ２２により、個々のモデルの変動許容範囲に、このテストデータにおける他の属性の値が含まれるか否かが計算される。入る場合、そのモデルを選択する。 The CPU 22 calculates whether or not the values of other attributes in the test data are included in the variation allowable range of each model. If so, select that model.

選択されたモデルと、第４または第５の実施形態を用いて選択されたモデルとのANDをとったモデルを、新たに異常性判定に用いるモデルとする。 A model obtained by ANDing the selected model and the model selected using the fourth or fifth embodiment is a model used for the abnormality determination.

以上のように、本実施形態により、テストデータの異常発見において、訓練データに付されたクラスの時系列性や、テストデータにおける属性値の変動許容範囲を考慮することで、異常発見に利用するモデルを選択する分類規則（決定木）の精度、および、異常発見に利用するモデルの選択精度を向上させることができる。さらに、テストデータの異常発見において、モデルを１つしか利用しないことで誤って異常と判定してしまう可能性を低くすることができる。 As described above, in finding abnormalities in test data, this embodiment takes into account the time-series nature of the classes attached to the training data and the allowable variation range of attribute values in the test data, and thereby improves both the accuracy of the classification rule (decision tree) that selects the models used for abnormality detection and the accuracy of model selection. Furthermore, it reduces the possibility of erroneously judging data as abnormal as a result of using only one model.

（第８の実施形態） (Eighth embodiment)
第８の実施形態は、第４〜第７の実施形態のいずれかを用いて異常検出を行う場合に、利用するモデルの修正を図るものである。テストデータの過去ｍ時刻前までに利用されたモデルと、現在の時刻について決定されたモデルとから、利用するモデルを決定する。ここでは２つの手法を示す。 The eighth embodiment aims to correct the models to be used when abnormality detection is performed using any of the fourth to seventh embodiments. The models to be used are determined from the models used in the test data up to m times in the past and the models determined for the current time. Two methods are shown here.

［第１の手法］ある時刻（現時刻）で異常発見に利用するモデルは、過去ｍ時刻内の各時刻において第４〜第７の実施形態のいずれかにより決定されたモデルと、現在の時刻において第４〜第７の実施形態のいずれかにより決定されたモデルとを合わせたもの（論理和：ＯＲ）とする。以下、図２４を用いて詳細例を説明する。ただし、ｍ＝２とする。 [First method] The models used for abnormality detection at a certain time (the current time) are the union (logical sum: OR) of the models determined by any of the fourth to seventh embodiments at each time within the past m times and the models determined by any of the fourth to seventh embodiments at the current time. A detailed example is described below with reference to FIG. 24, with m = 2.

時刻２では第４〜第７の実施形態のいずれかにより決定されたモデルがＡ、Ｃ、時刻３でも第４〜第７の実施形態のいずれかにより決定されたモデルがＡ、Ｃである。時刻４では第４〜第７の実施形態のいずれかにより決定されたモデルがＢである。よって、時刻４では、時刻２のモデルＡ、Ｃ、時刻３のモデルＡ、Ｃ、時刻４のモデルＢのＯＲをとったモデルＡ、Ｂ、Ｃが、実際に利用するモデルとなる。 The models determined by any of the fourth to seventh embodiments at time 2 are A and C, and the models determined by any of the fourth to seventh embodiments at time 3 are A and C. At time 4, the model determined by any one of the fourth to seventh embodiments is B. Therefore, at time 4, models A, B, and C obtained by ORing models A and C at time 2, models A and C at time 3, and model B at time 4 are actually used models.

また、時刻３では第４〜第７の実施形態のいずれかにより決定されたモデルがＡ，Ｃ、時刻４では第４〜第７の実施形態のいずれかにより決定されたモデルがＢ、時刻５では第４〜第７の実施形態のいずれかにより決定されたモデルがＢである。よって、時刻５では、時刻３のモデルＡ、Ｃ、時刻４のモデルＢ、時刻５のモデルＢのＯＲをとったモデルＡ、Ｂ、Ｃが実際に利用するモデルとなる。 Similarly, the models determined by any of the fourth to seventh embodiments are A and C at time 3, B at time 4, and B at time 5. Therefore, at time 5, models A, B, and C, obtained by ORing models A and C of time 3, model B of time 4, and model B of time 5, are the models actually used.

また、時刻４、５、６では、第４〜第７の実施形態のいずれかにより決定されたモデルはいずれもＢである。よって、時刻６では、モデルＢが、実際に利用するモデルとなる。 At times 4, 5, and 6, the model determined by any of the fourth to seventh embodiments is B. Therefore, at time 6, model B becomes a model that is actually used.

［第２の手法］ある時刻（現時刻）で異常発見に利用するモデルは、過去ｍ時刻内で異常性判定を行い正常と判定されたモデルと、現時刻において第４〜第７の実施形態のいずれかにより決定されたモデルとを合わせたもの（ＯＲ）とする。以下、図２５を用いて詳細例を説明する。ただし、ｍ＝２とし、図中、“○”がついているモデルは異常性判定に使われかつ正常と判定したモデルである。 [Second method] The models used for abnormality detection at a certain time (the current time) are the union (OR) of the models that were used for abnormality determination within the past m times and judged the data normal, and the models determined by any of the fourth to seventh embodiments at the current time. A detailed example is described below with reference to FIG. 25, with m = 2; in the figure, a model marked with "○" is one that was used for abnormality determination and judged the data normal.

時刻２、３ではモデルＣを用いたときに正常と判定されている。一方、時刻４において第４〜第７の実施形態のいずれかにより決定されたモデルはＢである。よって、時刻４では、モデルＢ，Ｃが実際に利用するモデルとなる。 At time 2 and 3, it is determined to be normal when model C is used. On the other hand, the model determined by any of the fourth to seventh embodiments at time 4 is B. Therefore, at time 4, models B and C are actually used models.

また、時刻３ではモデルＣ、時刻４ではモデルＢを用いたときに正常と判定されている。一方、時刻５において第４〜第７の実施形態のいずれかにより決定されたモデルはＢである。よって，時刻５では、モデルＢ、Ｃが実際に利用するモデルとなる。 Further, it is determined to be normal when model C is used at time 3 and model B is used at time 4. On the other hand, the model determined by any of the fourth to seventh embodiments at time 5 is B. Therefore, at time 5, models B and C are actually used models.

また、時刻４ではモデルＢ、時刻５でもモデルＢを用いたときに正常と判定されている。一方、時刻６において第４〜第７の実施形態のいずれかにより決定されたモデルはＢである。よって，時刻６では、モデルＢが実際に利用するモデルとなる。 Likewise, the data was judged normal when model B was used at time 4 and again when model B was used at time 5. Meanwhile, the model determined by any of the fourth to seventh embodiments at time 6 is B. Therefore, at time 6, model B is the model actually used.
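
Both methods take the OR of the current selection with a per-time history over the past m times; they differ only in what the history records. The sketch below uses a single hypothetical function, `models_to_use`, for both: for the first method the history holds the models selected at each time, and for the second method it holds the models that judged the data normal. The per-time sets are those quoted in the text for FIG. 24 and FIG. 25; times not mentioned in the text are simply absent from the history.

```python
from typing import Dict, Set


def models_to_use(history: Dict[int, Set[str]], current: Set[str], t: int, m: int) -> Set[str]:
    """OR of the current selection with the history entries of times t-m .. t-1."""
    used = set(current)
    for past in range(t - m, t):
        used |= history.get(past, set())  # times with no record contribute nothing
    return used


B = {"B"}

# First method (FIG. 24): history = models selected at each time.
selected = {2: {"A", "C"}, 3: {"A", "C"}, 4: B, 5: B, 6: B}
for t in (4, 5, 6):
    print("method 1, time", t, sorted(models_to_use(selected, selected[t], t, m=2)))

# Second method (FIG. 25): history = models that judged the data normal.
judged_normal = {2: {"C"}, 3: {"C"}, 4: B, 5: B}
for t in (4, 5, 6):
    print("method 2, time", t, sorted(models_to_use(judged_normal, B, t, m=2)))
```

With m = 2, the first method yields A, B, C at times 4 and 5 and B at time 6; the second yields B, C at times 4 and 5 and B at time 6, matching the walkthroughs above.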

以下、第４の実施形態で用いた図１９に基づき、第８の実施形態に係わるデータ処理装置について説明する。 The data processing apparatus according to the eighth embodiment will be described below with reference to FIG. 19 used in the fourth embodiment.

ＣＰＵ２２により、第４〜第７の実施形態のいずれかを用いて、異常性判定に用いるモデルを決定する。 The CPU 22 determines a model to be used for abnormality determination using any of the fourth to seventh embodiments.

テストデータにおける過去の時刻で第４〜第７の実施形態のいずれかにより選択されたモデルの情報をメモリ２３上に保存しておき、その情報と、上記異常性判定に用いるモデルとから、最終的に用いるモデル（単数または複数）を決定する。 Information on the model selected by any of the fourth to seventh embodiments at the past time in the test data is stored in the memory 23, and the final information is obtained from the information and the model used for the abnormality determination. Determine the model (s) to be used.

以上のように、本実施形態により、テストデータの異常発見において、異常発見に利用するモデルの選択精度を向上させることができる。 As described above, according to the present embodiment, it is possible to improve the accuracy of selecting a model used for finding an abnormality in test data.

FIG. 1: 第１の実施形態に係わる決定木の変更方法を説明するフローチャート。 A flowchart explaining the decision tree changing method according to the first embodiment.
FIG. 2: プラントにおけるセンサ間の関係および一般的な管理値の設定を説明する図。 A diagram explaining the relationship between sensors in a plant and the setting of general management values.
FIG. 3: 訓練データの例を示す図。 A diagram showing an example of training data.
FIG. 4: モデルを選択するための決定木の例を示す図。 A diagram showing an example of a decision tree for selecting a model.
FIG. 5: テストデータの例を示す図。 A diagram showing an example of test data.
FIG. 6: 決定木に訓練データの分類結果を加えた状態を示す図。 A diagram showing a decision tree with the classification results of the training data added.
FIG. 7: クラス分布表の例を示す図。 A diagram showing an example of a class distribution table.
FIG. 8: データ処理装置の構成例を示す図。 A diagram showing a configuration example of a data processing apparatus.
FIG. 9: 第２の実施形態に係わる決定木の変更方法を説明するフローチャート。 A flowchart explaining the decision tree changing method according to the second embodiment.
FIG. 10: 第３の実施形態に係わる決定木の変更方法を説明するフローチャート。 A flowchart explaining the decision tree changing method according to the third embodiment.
FIG. 11: 母集団の分布を計算するための方程式の例を示す図。 A diagram showing an example of an equation for calculating the distribution of a population.
FIG. 12: 訓練データの例を示す図。 A diagram showing an example of training data.
FIG. 13: テストデータの例を示す図。 A diagram showing an example of test data.
FIG. 14: 第４の実施形態に係わるテストデータの異常性判定方法を説明するフローチャート。 A flowchart explaining the test data abnormality determination method according to the fourth embodiment.
FIG. 15: クラスが付加された訓練データの例を示す図。 A diagram showing an example of training data to which classes have been added.
FIG. 16: モデルとモデルを表す数式とを示す図。 A diagram showing models and the formulas representing them.
FIG. 17: 決定木に訓練データの分類結果を加えた状態を示す図。 A diagram showing a decision tree with the classification results of the training data added.
FIG. 18: 変更後の決定木の例を示す図。 A diagram showing an example of the decision tree after the change.
FIG. 19: 他のデータ処理装置の構成例を示す図。 A diagram showing a configuration example of another data processing apparatus.
FIG. 20: 第５の実施形態に係わるテストデータの異常性判定方法を説明するフローチャート。 A flowchart explaining the test data abnormality determination method according to the fifth embodiment.
FIG. 21: 訓練データに付されたクラスの変更処理を説明する図。 A diagram explaining the process of changing the classes attached to the training data.
FIG. 22: 第６の実施形態に係わるテストデータの異常性判定方法を説明するフローチャート。 A flowchart explaining the test data abnormality determination method according to the sixth embodiment.
FIG. 23: 変動許容範囲を説明する図。 A diagram explaining the allowable variation range.
FIG. 24: 第１の手法を説明する図。 A diagram explaining the first method.
FIG. 25: 第２の手法を説明する図。 A diagram explaining the second method.

Explanation of symbols

１１、２１：記憶装置 11, 21: Storage device
１２、２２：ＣＰＵ 12, 22: CPU
１３、２３：メモリ 13, 23: Memory
１４、２４：表示装置 14, 24: Display device
２５：インターフェース 25: Interface

Claims

A decision tree changing method comprising:
inputting each item of data into a decision tree that is generated from multidimensional time-series data, which is a set of data having n-dimensional attribute values and a one-dimensional class, and that predicts the class from one or more of the attributes;
generating, for each leaf of the decision tree, frequency distribution information representing the frequency distribution of the classes in the data group classified into that leaf;
selecting one or more classes based on the frequency distribution information; and
updating the class of the leaf of the decision tree corresponding to the frequency distribution information with the selected class or classes.
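
The steps of this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the decision tree is represented by a hypothetical `leaf_of` function that maps the attributes of one record to a leaf identifier, `select` stands in for whichever class-selection rule is used, and the toy data are invented for the example.

```python
from collections import Counter, defaultdict

def leaf_class_frequencies(records, leaf_of):
    """Count, for every leaf, how often each class occurs among the
    records routed to that leaf.

    records: iterable of (attributes, cls) pairs.
    leaf_of: hypothetical stand-in for the decision tree; maps the
             attributes of one record to a leaf identifier.
    """
    freq = defaultdict(Counter)
    for attributes, cls in records:
        freq[leaf_of(attributes)][cls] += 1
    return dict(freq)

def update_leaf_classes(freq, select):
    """Replace each leaf's class label with the set of classes that
    `select` picks from that leaf's frequency distribution."""
    return {leaf: set(select(counts)) for leaf, counts in freq.items()}

# Toy tree with a single split on the first attribute (assumed data).
def toy_leaf_of(attributes):
    return "left" if attributes[0] < 5.0 else "right"

records = [((1.0,), "A"), ((2.0,), "A"), ((3.0,), "B"),
           ((6.0,), "B"), ((7.0,), "B"), ((8.0,), "C")]

freq = leaf_class_frequencies(records, toy_leaf_of)
# Placeholder rule: keep every class that was observed in the leaf.
leaf_classes = update_leaf_classes(freq, lambda counts: counts.keys())
```

After the update a leaf may carry more than one class, which is what later allows several candidate models to be tried for one test record.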

The decision tree changing method according to claim 1, wherein a class whose frequency satisfies a threshold in the frequency distribution information is selected.
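
One possible reading of this selection rule is sketched below. It assumes that "satisfies a threshold" means an absolute count of at least `min_count`; the claim does not fix whether the threshold is absolute or relative, so both the parameter name and that interpretation are assumptions.

```python
from collections import Counter

def select_by_frequency(counts, min_count):
    """Return the classes whose frequency in one leaf is at least
    `min_count` (assumed meaning of 'satisfies a threshold')."""
    return {cls for cls, n in counts.items() if n >= min_count}

# Invented frequency distribution for a single leaf.
leaf_counts = Counter({"A": 40, "B": 9, "C": 1})
selected = select_by_frequency(leaf_counts, min_count=5)
```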

The decision tree changing method according to claim 1, wherein
classes are selected in descending order of frequency in the frequency distribution information, and
the selection is terminated when the ratio of the sum of the frequencies of the selected classes to the sum of the frequencies of all classes satisfies a threshold.
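
The cumulative-ratio rule of this claim can be illustrated with a short sketch. It assumes that "satisfies the threshold" means the covered ratio is greater than or equal to `min_ratio`; the function name, the tie-breaking by class name, and the sample counts are hypothetical.

```python
def select_by_cumulative_ratio(counts, min_ratio):
    """Pick classes from most to least frequent until the picked classes
    cover at least `min_ratio` of all records in the leaf."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Sort by descending frequency; ties are broken by class name so
    # that the result is deterministic.
    for cls, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        selected.append(cls)
        covered += n
        if covered / total >= min_ratio:
            break
    return selected

leaf_counts = {"A": 40, "B": 9, "C": 1}
print(select_by_cumulative_ratio(leaf_counts, 0.95))  # A covers 0.80, A+B covers 0.98
```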

The decision tree changing method according to claim 1, wherein
classes are selected in descending order of frequency in the frequency distribution information,
the distribution of the population classified into the leaf is estimated based on the sum of the frequencies of the selected classes, the sum of the frequencies of all classes, and a reliability, and
the selection is terminated when the share of the selected classes in the estimated population distribution satisfies a threshold.
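
The equation the patent uses to estimate the population distribution appears only in a figure and is not reproduced in this section. The sketch below therefore substitutes a different, well-known estimator, the one-sided Wilson score lower bound for a binomial proportion, and treats "reliability" as a confidence level. Both choices are assumptions made purely for illustration, as are the function names and sample counts.

```python
from math import sqrt
from statistics import NormalDist

def wilson_lower_bound(successes, n, confidence):
    """One-sided Wilson score lower bound for a binomial proportion.
    Swapped-in estimator; not the equation shown in the patent figure."""
    z = NormalDist().inv_cdf(confidence)
    p = successes / n
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / (1 + z * z / n)

def select_by_population_estimate(counts, threshold, confidence):
    """Pick classes from most to least frequent until the estimated
    population share of the picked classes reaches `threshold`."""
    total = sum(counts.values())
    selected, covered = [], 0
    for cls, n in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        selected.append(cls)
        covered += n
        if wilson_lower_bound(covered, total, confidence) >= threshold:
            break
    return selected

leaf_counts = {"A": 40, "B": 9, "C": 1}
chosen = select_by_population_estimate(leaf_counts, threshold=0.9, confidence=0.95)
```

Compared with the plain cumulative-ratio rule, a bound of this kind is more conservative for leaves that contain few records, because the same observed ratio yields a lower estimated population share when the sample is small.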

An abnormality determination method comprising:
clustering first multidimensional time-series data, which is a set of data having n-dimensional attribute values;
assigning a class to each cluster;
creating, for each cluster, a model that predicts the value of one attribute using the other attributes;
assigning to each item of data in the first multidimensional time-series data the class of the cluster to which it belongs;
generating, using the first multidimensional time-series data to which the classes have been assigned, a decision tree that predicts the class from the other attributes;
changing the generated decision tree by the method according to any one of claims 1 to 4;
then inputting data to be tested, from second multidimensional time-series data that is a set of data having n-dimensional attribute values, into the changed decision tree to predict a class;
selecting the model corresponding to the predicted class and inputting the values of the other attributes of the data to be tested into the selected model; and
determining the abnormality of the data to be tested from the value of the one attribute in the data to be tested and the output of the model.
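
The test-time part of this claim (predict a class, select the matching model, compare its output with the observed value) might look like the sketch below. Everything concrete in it is hypothetical: the one-split tree `predict_classes`, the two linear models in `MODELS`, and the rule that a record is judged abnormal by a model when the absolute residual exceeds `tolerance`. The claim itself only states that the determination uses the observed value and the model output.

```python
def predict_classes(other_attrs):
    """Hypothetical changed decision tree: a leaf may carry several classes."""
    x = other_attrs[0]
    return {"low"} if x < 20.0 else {"low", "high"}

# Hypothetical per-cluster models predicting the one attribute y from x.
MODELS = {
    "low": lambda attrs: 2.0 * attrs[0],
    "high": lambda attrs: 0.5 * attrs[0] + 30.0,
}

def judge(other_attrs, target_value, tolerance):
    """Return, for every model selected through the tree, whether that
    model judges the record abnormal (True) or normal (False)."""
    verdicts = {}
    for cls in predict_classes(other_attrs):
        estimate = MODELS[cls](other_attrs)
        # Assumed criterion: abnormal when the residual exceeds tolerance.
        verdicts[cls] = abs(target_value - estimate) > tolerance
    return verdicts

print(judge((40.0,), 50.2, tolerance=1.0))
```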

An abnormality determination method comprising:
clustering first multidimensional time-series data, which is a set of data having n-dimensional attribute values;
assigning a class to each cluster;
creating, for each cluster, a model that predicts the value of one attribute using the other attributes;
assigning to each item of data in the first multidimensional time-series data the class of the cluster to which it belongs;
re-determining the class assigned to each item of data based on that class and on the classes assigned to the data within a predetermined time range of that item, and updating the class of the item with the re-determined class;
generating, using the first multidimensional time-series data to which the updated classes have been assigned, a decision tree that predicts the class from the other attributes;
inputting data to be tested, from second multidimensional time-series data that is a set of data having n-dimensional attribute values, into the generated decision tree to predict a class;
selecting the model corresponding to the predicted class and inputting the values of the other attributes of the data to be tested into the selected model; and
determining the abnormality of the data to be tested from the value of the one attribute in the data to be tested and the output of the model.
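
The class re-determination step of this claim is stated only in general terms. One plausible realisation, assumed here for illustration, is a majority vote over a symmetric time window, with a record keeping its own class when that class ties for the majority. The function name and the `half_window` parameter are hypothetical.

```python
from collections import Counter

def smooth_classes(classes, half_window):
    """Re-determine each record's class by majority vote over the records
    within `half_window` time steps on either side (assumed rule)."""
    smoothed = []
    for i, own in enumerate(classes):
        lo = max(0, i - half_window)
        hi = min(len(classes), i + half_window + 1)
        counts = Counter(classes[lo:hi])
        best = max(counts.values())
        if counts[own] == best:
            # The record's own class is (one of) the most frequent: keep it.
            smoothed.append(own)
        else:
            smoothed.append(max(counts, key=counts.get))
    return smoothed

labels = ["A", "A", "B", "A", "A", "B", "B", "B"]
print(smooth_classes(labels, half_window=1))
```

In this example the isolated "B" at the third position is relabelled "A", which removes a one-step class flip before the decision tree is generated.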

The abnormality determination method according to claim 6, wherein the generated decision tree is changed using the method according to claim 1, and the class is predicted using the changed decision tree.

An abnormality determination method comprising:
clustering first multidimensional time-series data, which is a set of data having n-dimensional attribute values;
creating, for each cluster, a model that predicts the value of one attribute using the other attributes;
determining, for each of the models, a variation tolerance range of the other attributes;
selecting, for data to be tested from second multidimensional time-series data that is a set of data having n-dimensional attribute values, a model for which the other attributes of the data to be tested fall within the variation tolerance range;
inputting the values of the other attributes of the data to be tested into the selected model; and
determining the abnormality of the data to be tested from the value of the one attribute in the data to be tested and the output of the model.
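
Model selection by variation tolerance range can be sketched as below. The way the range is derived is an assumption: here it is the minimum and maximum of each other attribute within a cluster, widened by a hypothetical `margin`. The cluster names and values are invented.

```python
def tolerance_ranges(cluster_rows, margin=0.0):
    """Compute, per cluster, one (low, high) range for each other attribute.

    cluster_rows: dict mapping a cluster/model id to a list of tuples
                  holding the other-attribute values of its records.
    margin: assumed widening applied to both ends of every range.
    """
    ranges = {}
    for model_id, rows in cluster_rows.items():
        columns = list(zip(*rows))  # transpose rows into per-attribute columns
        ranges[model_id] = [(min(c) - margin, max(c) + margin) for c in columns]
    return ranges

def models_in_range(other_attrs, ranges):
    """Return the ids of all models whose ranges contain the test record."""
    return {
        model_id
        for model_id, bounds in ranges.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(other_attrs, bounds))
    }

cluster_rows = {
    "m1": [(1.0, 10.0), (3.0, 14.0)],
    "m2": [(2.5, 12.0), (6.0, 20.0)],
}
ranges = tolerance_ranges(cluster_rows, margin=0.5)
```

A test record that falls in no range selects no model at all; how that case is handled is not specified in this claim.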

An abnormality determination method comprising:
clustering first multidimensional time-series data, which is a set of data having n-dimensional attribute values;
assigning a class to each cluster;
creating, for each cluster, a model that predicts the value of one attribute using the other attributes;
assigning to each item of data in the first multidimensional time-series data the class of the cluster to which it belongs;
generating, using the first multidimensional time-series data to which the classes have been assigned, a decision tree that predicts the class from the other attributes;
changing the generated decision tree by the method according to any one of claims 1 to 4;
then inputting data to be tested, from second multidimensional time-series data that is a set of data having n-dimensional attribute values, into the changed decision tree to predict a class;
separately determining, for each individual model, a variation tolerance range of the other attributes;
selecting a model for which the other attributes of the data to be tested fall within the variation tolerance range;
selecting a model that is common to the model thus selected and the model corresponding to the predicted class;
inputting the values of the other attributes of the data to be tested into the selected model; and
determining the abnormality of the data to be tested from the value of the one attribute in the data to be tested and the output of the model.

An abnormality determination method comprising:
clustering first multidimensional time-series data, which is a set of data having n-dimensional attribute values;
assigning a class to each cluster;
creating, for each cluster, a model that predicts the value of one attribute using the other attributes;
assigning to each item of data in the first multidimensional time-series data the class of the cluster to which it belongs;
re-determining the class assigned to each item of data based on that class and on the classes assigned to the data within a predetermined time range of that item, and updating the class of the item with the re-determined class;
generating, using the first multidimensional time-series data to which the updated classes have been assigned, a decision tree that predicts the class from the other attributes;
inputting data to be tested, from second multidimensional time-series data that is a set of data having n-dimensional attribute values, into the generated decision tree to predict a class;
separately determining, for each individual model, a variation tolerance range of the other attributes;
selecting a model for which the other attributes of the data to be tested fall within the variation tolerance range;
selecting a model that is common to the model thus selected and the model corresponding to the predicted class;
inputting the values of the other attributes of the data to be tested into the selected model; and
determining the abnormality of the data to be tested from the value of the one attribute in the data to be tested and the output of the model.

The abnormality determination method according to claim 10, wherein the generated decision tree is changed using the method according to claim 1, and the class is predicted using the changed decision tree.

The abnormality determination method according to any one of claims 5 to 11, wherein, when a plurality of models are selected or chosen to determine the abnormality of the data to be tested, the data to be tested is determined to be normal if at least one of the models determines it to be normal, and is determined to be abnormal if all of the models determine it to be abnormal.
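
The combination rule of this claim reduces to a single logical test: a record is abnormal only when every selected model says so. The sketch below assumes the per-model verdicts are available as a mapping from a model identifier to a boolean, and it raises an error when no model was selected, a case the claim does not address. Both choices are assumptions.

```python
def is_abnormal(verdicts):
    """Combine per-model verdicts (True = abnormal) into one decision:
    normal if at least one model says normal, abnormal if all say abnormal."""
    if not verdicts:
        # Assumed handling; the claim does not cover the empty case.
        raise ValueError("no model was selected for this record")
    return all(verdicts.values())

print(is_abnormal({"m1": True, "m2": False}))  # one model accepts the record
```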

The abnormality determination method according to claim 5, wherein the abnormality of the data to be tested at a certain time is determined using the models obtained as the logical sum (OR) of the model selected or chosen to determine the abnormality of the data to be tested at the certain time and the models selected or chosen for each time up to m time steps before the certain time.
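
Taking the logical sum of the models selected now and over the preceding m time steps amounts to a set union over a sliding window. The sketch below is one way to keep that window; the class name `ModelHistory` and its interface are hypothetical.

```python
from collections import deque

class ModelHistory:
    """Remember the model sets selected at the last m time steps."""

    def __init__(self, m):
        # A deque with maxlen=m drops the oldest entry automatically.
        self._past = deque(maxlen=m)

    def candidates(self, current):
        """Return the union of the current selection and the selections
        of up to m previous time steps, then record the current one."""
        union = set(current)
        for past in self._past:
            union |= past
        self._past.append(set(current))
        return union

history = ModelHistory(m=2)
steps = [history.candidates(s) for s in ({"m1"}, {"m2"}, {"m3"}, {"m3"})]
```

At the fourth step the selection from the first step has left the window, so "m1" is no longer among the candidates.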

The abnormality determination method according to any one of claims 5 to 12, wherein the abnormality of the data to be tested at a certain time is determined using the models obtained as the logical sum (OR) of the model selected or chosen to determine the abnormality of the data to be tested at the certain time and those models, among the models selected or chosen for each time up to m time steps before the certain time, that were determined to be normal.

A program for causing a computer to execute each step according to any one of claims 1 to 4.

A program for causing a computer to execute each step according to any one of claims 5 to 14.