JP2008140100A

JP2008140100A - Information processor, data determination method and program

Info

Publication number: JP2008140100A
Application number: JP2006325201A
Authority: JP
Inventors: Rika Kawabata; 理華河端; Norio Hirai; 規郎平井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-12-01
Filing date: 2006-12-01
Publication date: 2008-06-19

Abstract

PROBLEM TO BE SOLVED: To distinguish between an abnormality and spike-like noise in time-series data.
SOLUTION: A time-series data generation part 102 creates time-series data from data accumulated in a log storage part 101. An abnormality detection part 103 detects, in the time-series data, non-normal data whose data value is not normal. Taking a plurality of data that follow the non-normal data within a fixed time as prediction object data, a spike determination part 106 calculates a prediction range of the data value for each prediction object data, compares the actual data value of each prediction object data with its prediction range, calculates the probability that the actual data values are contained in the prediction ranges, and determines, based on the calculated probability, whether the non-normal data is abnormal data or spike-like noise. When the data is determined to be spike-like noise, a data conversion part 107 replaces the data in the log storage part 101 with a normal value that substitutes for the value of the spike-like noise.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to, for example, a technique for removing noise mixed into time-series data arranged along a time axis.

Conventionally, when noise is mixed into time-series data, the noise is removed to improve detection accuracy and measurement accuracy, as described in Japanese Patent Laid-Open No. 5-12240 and Japanese Patent Laid-Open No. 2003-10188.
Japanese Patent Laid-Open No. 5-12240
Japanese Patent Laid-Open No. 2003-10188

In network abnormality detection, time-series data generated from, for example, access counts aggregated from network logs contains three kinds of change points: abnormalities caused by attacks such as worms, changes in the normal state, and spike noise.

Spike noise is a sudden value as high as, or higher than, an abnormal value, after which the data immediately returns to normal values.
As shown in (1) of FIG. 2, when a value rises sharply, it is unclear at that moment whether the rise is spike noise or an abnormality.
As shown in (2) of FIG. 2, the subsequent course of the values may later reveal that the sharp rise in (1) of FIG. 2 was the onset of an abnormality.

On the other hand, as shown in (3) of FIG. 2, the subsequent course of the values may show that the sharp rise in (1) of FIG. 2 was spike noise. If this spike noise is left in the data, abnormality detection becomes less accurate, for example because a later abnormality is judged to be normal, so the spike noise needs to be removed.

In addition, network abnormality detection needs to detect an abnormal state as quickly as possible. If spike noise is left in the data, however, its high value may delay detection of the onset of an abnormality ((3) in FIG. 2).
For example, when the normal state changes gradually under the influence of a worm or the like, the effect on detecting the onset of the abnormal state is large.

Furthermore, even when the number of accesses keeps increasing and is judged abnormal, the increase may turn out to be a change in the normal state ((4) in FIG. 2), so it is not possible to simply remove all data judged not to be normal.

With conventional noise removal techniques, however, data is classified only as normal or abnormal, or as normal or noise, so abnormalities and spike noise in network data cannot be told apart.

The main object of the present invention is to solve the problems described above and to perform highly accurate abnormality detection.

An information processing apparatus according to the present invention includes:
a non-normal data detection unit that monitors time-series data arranged along a time axis and detects, in the time-series data, non-normal data whose data value is not normal; and
a non-normal data determination unit that takes each of a plurality of data lying within a fixed time of the non-normal data on the time axis as prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range for each prediction target data, and determines, based on the comparison results, whether the non-normal data is abnormal data or noise.

According to the present invention, when non-normal data is detected, it is determined whether the detected non-normal data is an abnormality or sudden noise, so highly accurate abnormality detection can be performed.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a log analysis device and related devices according to Embodiment 1 of the present invention.
In FIG. 1, a log analysis device 10 (information processing apparatus) analyzes the logs output by a log collection device 20 and reports the analysis results.
The log collection device 20 monitors the network, collects logs, and outputs them.
A display device 105 displays the analysis results of the log analysis device 10.

In the log analysis device 10, a log storage unit 101 stores the logs output from the log collection device 20.
A time-series data generation unit 102 arranges the data accumulated in the log storage unit 101 along the time axis to create time-series data.
An abnormality detection unit 103 monitors the time-series data arranged along the time axis and detects non-normal data, that is, data whose value is not normal. The abnormality detection unit 103 is an example of a non-normal data detection unit.
A notification unit 104 reports the detection of non-normal data.
When non-normal data is detected, a spike determination unit 106 determines whether it is spike noise or abnormal data. The spike determination unit 106 is an example of a non-normal data determination unit.
A data conversion unit 107 calculates a normal value to substitute for the value of the non-normal data and replaces the data in the log storage unit 101. The data conversion unit 107 is an example of a data rewriting unit.

The spike determination unit 106 takes each of a plurality of subsequent data lying within a fixed time of the non-normal data on the time axis as prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range for each prediction target data, calculates the probability that the actual data values of the prediction target data fall within their prediction ranges, and determines, based on the calculated probability, whether the non-normal data is abnormal data or noise.

More specifically, the spike determination unit 106 calculates a prediction range of the data value for each prediction target data with the non-normal value of the non-normal data reflected, and uses it as the first prediction range of that prediction target data. It also calculates a prediction range for each prediction target data with a normal data value reflected in place of the non-normal value, and uses it as the second prediction range.
The spike determination unit 106 then compares the actual data value with the first prediction range for each prediction target data and calculates, as a first probability, the probability that the actual data values fall within the first prediction ranges. Likewise, it compares the actual data value with the second prediction range for each prediction target data and calculates, as a second probability, the probability that the actual data values fall within the second prediction ranges. It compares the first and second probabilities to determine whether the non-normal data is abnormal data or noise.
Detailed operation examples of the spike determination unit 106 and the other elements are described later.

Next, an example hardware configuration of the log analysis device 10 described in this embodiment and the embodiments below is described.
FIG. 6 shows an example of the hardware resources of the log analysis device 10 described in this embodiment and the embodiments below. The configuration in FIG. 6 is only one example of the hardware configuration of the log analysis device 10; the hardware configuration is not limited to that shown in FIG. 6 and may be a different configuration.

In FIG. 6, the log analysis device 10 includes a CPU 911 (Central Processing Unit; also called a central processing device, processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs. The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices. The CPU 911 may also be connected to an FDD 904 (Flexible Disk Drive), a compact disk device 905 (CDD), a printer device 906, and a scanner device 907. A storage device such as an optical disk device or a memory card read/write device may be used instead of the magnetic disk device 920.
The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of a storage device or storage unit.
The communication board 915, the keyboard 902, the scanner device 907, the FDD 904, and the like are examples of an input unit or input device.
The communication board 915, the display device 901, the printer device 906, and the like are examples of an output unit or output device.

The communication board 915 is connected to, for example, a LAN (Local Area Network), the Internet, or a WAN (Wide Area Network).
The magnetic disk device 920 stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the operating system 921, and the window system 922.

The program group 923 stores programs that execute the functions described as "... unit" or "... means" in this embodiment and the embodiments below. The programs are read and executed by the CPU 911.
The file group 924 stores, as items of "... file" or "... database", the information, data, signal values, variable values, and parameters that represent the results of the processing described below as "determination of ...", "calculation of ...", "comparison of ...", "generation of ...", "replacement of ...", "detection of ...", "setting of ...", and so on. A "... file" or "... database" is stored on a recording medium such as a disk or memory. The information, data, signal values, variable values, and parameters stored on such a medium are read into main memory or cache memory by the CPU 911 via a read/write circuit and used for CPU operations such as extraction, search, reference, comparison, computation, calculation, processing, editing, output, printing, and display. During these CPU operations, the information, data, signal values, variable values, and parameters are held temporarily in main memory, registers, cache memory, buffer memory, or the like.
The arrows in the flowcharts described below mainly indicate the input and output of data and signals. The data and signal values are recorded on recording media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disk of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical disks, mini disks, and DVDs. Data and signals are also transmitted online via the bus 912, signal lines, cables, and other transmission media.

What is described as a "... unit" or "... means" in this embodiment and the embodiments below may be a "... circuit", "... device", "... equipment", or "... means", and may also be a "... step", "... procedure", or "... process". That is, what is described as a "... unit" or "... means" may be implemented as firmware stored in the ROM 913. Alternatively, it may be implemented in software only, in hardware only (elements, devices, boards, wiring, and the like), in a combination of software and hardware, or in a further combination with firmware. Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical disks, compact disks, mini disks, and DVDs. The programs are read and executed by the CPU 911. In other words, the programs cause a computer to function as the "... unit" or "... means" of this embodiment and the embodiments below, or cause a computer to execute the procedures and methods of those "... unit" or "... means" elements.

Thus, the log analysis device 10 described in this embodiment and the embodiments below is a computer that includes a CPU as a processing device; memory, a magnetic disk, and the like as storage devices; a keyboard, a mouse, a communication board, and the like as input devices; and a display device, a communication board, and the like as output devices. As described above, it realizes the functions presented as "... unit" or "... means" by using these processing, storage, input, and output devices.

Next, an operation example of the log analysis device 10 (data determination method) is described with reference to FIG. 3.
In the log analysis device 10, the time-series data generation unit 102 retrieves data from the log storage unit 101 and arranges the retrieved data along the time axis to generate time-series data (step S1).
The abnormality detection unit 103 analyzes the time-series data (step S2).
Based on the analysis, it determines (step S3) whether the data is normal (step S4) or not (step S5) (non-normal data detection step).
Methods for this detection include decomposing the time-series data by singular value decomposition to calculate features and then detecting abnormalities by applying an index such as the Mahalanobis distance to the features corresponding to each time point, and using a time-series model such as AR (AutoRegressive) or ARMA (AutoRegressive Moving Average).
When the abnormality detection unit 103 detects data that is not normal at time point N, it reports the detection of the non-normal state via the notification unit 104 and displays it on the display device 105 (step S6).
It also notifies the spike determination unit 106 that a non-normal state has been detected (step S7).
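Step S1 can be illustrated with a minimal sketch, assuming the log has already been reduced to a list of numeric record timestamps and that the time series is an access count per fixed-width interval. The function name `build_time_series` and its parameters are hypothetical and do not appear in the source.

```python
from collections import Counter

def build_time_series(timestamps, start, interval, n_points):
    """Count log records per fixed-width interval, ordered along the time axis.

    Assumption: each log record contributes one timestamp; records outside
    the covered range are ignored.
    """
    counts = Counter()
    for ts in timestamps:
        index = int((ts - start) // interval)
        if 0 <= index < n_points:
            counts[index] += 1
    # Counter returns 0 for intervals that received no records.
    return [counts[i] for i in range(n_points)]

series = build_time_series([0.5, 1.2, 1.9, 3.1, 3.2, 3.3],
                           start=0, interval=1, n_points=4)
print(series)  # [1, 2, 0, 3]
```

Intervals with no records are kept as zero so that the positions in the list stay aligned with the time axis.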

Next, an operation example of the spike determination unit 106 (non-normal data determination step) is described with reference to FIG. 4.
The spike determination unit 106 acquires, from the log storage unit 101 via the time-series data generation unit 102, the time-series data for the period T1 immediately before time point N (step S8).
Writing X(i) for the data at time point i, the time-series data acquired here is X(N-T1), X(N-T1+1), ..., X(N-1).
To this time-series data with X(N) appended, that is, X(N-T1), X(N-T1+1), ..., X(N-1), X(N), the spike determination unit 106 applies a time-series model such as AR or ARMA to calculate the predicted value P(N+1) and the prediction error PE(N+1) for time point N+1 (step S9).
Next, it obtains the value (measured value) X(N+1) at time point N+1 from the log storage unit 101 via the time-series data generation unit 102 (step S10).
It then checks whether the measured value X(N+1) satisfies the condition of expression (1) (step S11).
P(N+1) - PE(N+1) < X(N+1) < P(N+1) + PE(N+1)   (1)
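Steps S9 to S11 can be sketched as follows. This is only an illustration under stated assumptions: the source names AR or ARMA without fixing the model order or the definition of the prediction error, so the sketch assumes a first-order AR model fitted by least squares and takes the root-mean-square in-sample residual as PE. The function names are hypothetical.

```python
import math

def fit_ar1(window):
    """Least-squares fit of x[t] = c + phi * x[t-1] on the window.

    Returns (c, phi, error); error is the RMS in-sample residual, which is
    an assumed stand-in for the prediction error PE.
    """
    prev, curr = window[:-1], window[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        phi = 0.0  # constant history: fall back to predicting the mean
    else:
        cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
        phi = cov / var_prev
    c0 = mean_curr - phi * mean_prev
    residuals = [c - (c0 + phi * p) for p, c in zip(prev, curr)]
    error = math.sqrt(sum(r * r for r in residuals) / n)
    return c0, phi, error

def predict_next(window):
    """Return (P, PE) for the time point following the window."""
    c0, phi, error = fit_ar1(window)
    return c0 + phi * window[-1], error

def within_range(actual, predicted, error):
    """Expression (1): P - PE < X < P + PE, with strict inequalities."""
    return predicted - error < actual < predicted + error
```

In use, `window` would hold X(N-T1) through X(N), and `within_range(X(N+1), *predict_next(window))` gives the outcome of step S11 for time point N+1.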

Next, using the data from X(N-T1+1) to X(N+1), the predicted value P(N+2) and the prediction error PE(N+2) for time point N+2 are calculated. The value (measured value) X(N+2) at time point N+2 is then obtained from the log storage unit 101 via the time-series data generation unit 102, and, as for time point N+1, it is checked whether the measured value X(N+2) lies within the range of predicted value P(N+2) ± prediction error PE(N+2).

These ranges P(N+i) ± PE(N+i) are the ranges of the predicted data values at time N+i calculated with the non-normal value (measured value) of the non-normal data X(N) reflected, and correspond to the first prediction range.

The operations of steps S9 to S11 are repeated over a period T2, that is, up to X(N+T2) (steps S12 and S23), and the probability R1 that the measured value falls within the range of predicted value ± prediction error is obtained over all data in the period T2, that is, all data from X(N+1) to X(N+T2) (step S13).

The data in the period T2 (X(N+1), ..., X(N+T2)) is an example of prediction target data.
The probability R1 is obtained by comparing the actual data value with the first prediction range for each prediction target data. It represents the probability that the actual data values of the prediction target data fall within the first prediction range, and is an example of the first probability.
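One plausible reading of step S13 is that R1 is the fraction of the T2 prediction target points whose measured value satisfies expression (1). The sketch below assumes that reading; the function name `inclusion_rate` and the sample numbers are illustrative only.

```python
def inclusion_rate(actuals, predictions, errors):
    """Fraction of points whose actual value lies strictly inside P ± PE."""
    hits = sum(
        1
        for x, p, pe in zip(actuals, predictions, errors)
        if p - pe < x < p + pe
    )
    return hits / len(actuals)

# Three prediction target points: the first prediction was pulled up by a
# spike at time N, so only two of the three measured values fall in range.
r1 = inclusion_rate(actuals=[10, 11, 9],
                    predictions=[50, 10, 11],
                    errors=[4, 4, 4])
print(r1)  # 2 of 3 points are inside their range
```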

Meanwhile, the spike determination unit 106 calls the data conversion unit 107, acquires the normal data X'(N) corresponding to time point N (step S14), and replaces the non-normal data X(N) with X'(N) (step S15).
Next, using the time-series data in which X(N) has been replaced by X'(N), the spike determination unit 106 calculates the predicted value and the prediction error at each time point N+i by the same method as in steps S9 to S11 (step S16), compares them with the measured value, and checks whether the measured value lies within the range of predicted value ± prediction error (steps S17 and S18).

The range of predicted value ± prediction error for the data at time N+i calculated in step S16 is the range of the predicted data value at time N+i calculated with a normal data value (the value of X'(N)) reflected in place of the non-normal value of the non-normal data X(N), and corresponds to the second prediction range.

The spike determination unit 106 then repeats the operations of steps S16 to S18 over the period T2, that is, up to X(N+T2) (steps S19 and S24), and obtains the probability R2 that the measured value falls within the range of predicted value ± prediction error over all data in the period T2, that is, all data from X(N+1) to X(N+T2) (step S20).

The probability R2 is obtained by comparing the actual data value with the second prediction range for each prediction target data. It represents the probability that the actual data values of the prediction target data fall within the second prediction range, and is an example of the second probability.

Next, the spike determination unit 106 compares the probability R1 with the probability R2 (step S21).
If R1 < R2, that is, if the measured values of the prediction target data are closer to the prediction made with the value at time point N replaced by the normal data X'(N), the spike determination unit 106 judges the data at time point N that is not normal to be sudden spike noise and notifies the data conversion unit 107.
On receiving the notification from the spike determination unit 106, the data conversion unit 107 replaces the data at time point N in the log storage unit 101 with the normal data X'(N) corresponding to time point N (step S22).
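The comparison of R1 and R2 can be sketched end to end as below. The forecaster here is a deliberate simplification, not the source's model: it predicts the last observed value with a fixed error band (`tolerance`), standing in for an AR or ARMA fit so that the example stays short and easy to trace by hand. All function names and the sample series are hypothetical, and treating R1 >= R2 as "abnormal" is an assumption, since the source only states the R1 < R2 case.

```python
def last_value_forecast(window, tolerance):
    """Stand-in for AR/ARMA: predict the last observed value, fixed error band."""
    return window[-1], tolerance

def hit_rate(series, n, t1, t2, tolerance):
    """Fraction of X(N+1)..X(N+T2) falling strictly inside P(N+i) ± PE(N+i)."""
    hits = 0
    for i in range(1, t2 + 1):
        window = series[n - t1 + i - 1 : n + i]  # X(N-T1+i-1) .. X(N+i-1)
        predicted, error = last_value_forecast(window, tolerance)
        if predicted - error < series[n + i] < predicted + error:
            hits += 1
    return hits / t2

def classify(series, n, normal_value, t1, t2, tolerance):
    """Return "spike" when R1 < R2, otherwise "abnormal" (assumed default)."""
    r1 = hit_rate(series, n, t1, t2, tolerance)   # X(N) kept as measured
    replaced = list(series)
    replaced[n] = normal_value                    # X(N) replaced by X'(N)
    r2 = hit_rate(replaced, n, t1, t2, tolerance)
    return "spike" if r1 < r2 else "abnormal"

spike = [10, 11, 9, 10, 50, 10, 11, 9]    # value returns to normal after N
shift = [10, 11, 9, 10, 50, 52, 49, 51]   # value stays high after N
print(classify(spike, n=4, normal_value=10, t1=4, t2=3, tolerance=4))  # spike
print(classify(shift, n=4, normal_value=10, t1=4, t2=3, tolerance=4))  # abnormal
```

For the first series R1 is 2/3 and R2 is 1, so the point at N is treated as spike noise; for the second series R1 is 1 and R2 is 2/3, so it is kept as an abnormality.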

The following describes how the data conversion unit 107 calculates the normal data X'(N) for the data X(N) at time point N that is not normal.
The data conversion unit 107 acquires, from the log storage unit 101 via the time-series data generation unit 102, the time-series data for the period T3 immediately before time point N: X(N-T3), X(N-T3+1), ..., X(N-1).
It applies a time-series model such as AR or ARMA to this time-series data to calculate the predicted value P(N) for time point N, and uses this as the normal data for time point N.
In step S14 of FIG. 4, the data conversion unit 107 passes the normal data X'(N) to the spike determination unit 106.
In step S22 of FIG. 4, the data conversion unit 107 rewrites the data in the log storage unit 101 with the normal data X'(N).
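The calculation of X'(N) can be sketched as a one-step forecast from the T3 points preceding N. The sketch assumes a first-order AR model about the window mean, with the coefficient taken as the lag-1 sample autocorrelation (the Yule-Walker estimate for AR(1)); the source leaves the model order open. Function names are hypothetical, and the replacement is made on a copy of a list rather than in a real log store.

```python
def ar1_forecast(window):
    """One-step AR(1) forecast about the window mean.

    phi is the lag-1 sample autocorrelation; a constant window gives phi = 0.
    """
    mean = sum(window) / len(window)
    dev = [x - mean for x in window]
    denom = sum(d * d for d in dev)
    phi = sum(a * b for a, b in zip(dev[1:], dev[:-1])) / denom if denom else 0.0
    return mean + phi * dev[-1]

def replace_with_forecast(series, n, t3):
    """Return a copy of series with X(N) replaced by X'(N).

    X'(N) is forecast from X(N-T3) .. X(N-1) only, so the non-normal
    value itself never influences its own replacement.
    """
    cleaned = list(series)
    cleaned[n] = ar1_forecast(series[n - t3 : n])
    return cleaned

cleaned = replace_with_forecast([10, 11, 9, 10, 50, 10], n=4, t3=4)
print(cleaned)  # the 50 at index 4 becomes 10.0
```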

As described above, when data that is not normal is detected, it is determined whether the data is an abnormality or sudden spike noise, and only spike noise is removed, which improves the accuracy of subsequent abnormality detection.

In the description above, the probability R2 that the measured values fall within the predicted value ± prediction error reflecting the normal data X'(N) is calculated and compared with the probability R1. Alternatively, without calculating R2, a threshold for R1 may be set in advance, and whether the data is an abnormality or spike noise may be determined by whether R1 exceeds the threshold.

As described above, this embodiment has explained a log analysis device and related devices that include:
(a) means for collecting network log items such as the number of packets and the number of packets per flag;
(b) means for storing the collected logs;
(c) means for generating, from the logs saved by the storage means, time-series data that changes along the time axis;
(d) means for analyzing the time-series data and detecting data that is not normal;
(e) means for reporting that data that is not normal has been detected;
(f) means for determining whether the detected data that is not normal is an abnormality or spike noise; and
(g) means for converting the value of spike noise into a value estimated from the values in a fixed period preceding it,
wherein the means for determining whether the detected data is an abnormality or spike noise
(h) uses a time-series model such as AR or ARMA, and
(i) determines whether the data is an abnormality or spike noise from the probability that the measured values fall within the confidence interval.

This embodiment has also explained a log analysis device in which the means for estimating and converting the value that replaces spike noise estimates that value by applying a time-series model such as AR or ARMA to the data preceding the data judged to be spike noise.

Embodiment 2.
In Embodiment 1, whether the data that is not normal is spike noise was determined by whether the measured values are closer to the prediction made with the non-normal data used as is or to the prediction made with it replaced by normal data.
In this embodiment, the spike determination unit 106 takes a plurality of data following the non-normal data on the time axis as prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range for each prediction target data, and determines whether the non-normal data is abnormal data or noise based on how well the trajectory of the actual data values of the prediction target data matches the trajectory of the prediction ranges.

Specifically, in this embodiment, the spike determination unit 106 uses the data of period T4 immediately before time N (the data at times N−T4, N−T4+1, ..., N−1) to calculate predicted values for the data at times N+1, N+2, ..., N+T5 in period T5 following time N, and checks whether the measured values at times N, N+1, N+2, ..., N+T5 converge to the calculated predicted values.
If they converge, the non-normal data is determined to be spike-like noise.
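The convergence check can be sketched as follows. The embodiment does not define convergence numerically, so this example assumes one possible criterion: the gap between measured and predicted values stays within a tolerance over the last few points of period T5. The function name `converges_to_prediction` and the parameters `tol` and `tail` are hypothetical.

```python
def converges_to_prediction(actual, predicted, tol, tail=3):
    """Return True if the last `tail` gaps |actual - predicted| are all <= tol.

    `actual` and `predicted` hold the values for times N+1 .. N+T5.
    The criterion itself is an assumption, not taken from the source.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not 0 < tail <= len(actual):
        raise ValueError("tail must be between 1 and the series length")
    gaps = [abs(a - p) for a, p in zip(actual, predicted)]
    return all(gap <= tol for gap in gaps[-tail:])


predicted = [20.0, 20.0, 20.0, 20.0, 20.0]
spike_like = [26.0, 22.0, 20.5, 20.2, 19.9]   # returns to the forecast
level_shift = [60.0, 62.0, 61.0, 63.0, 64.0]  # stays away from the forecast

print(converges_to_prediction(spike_like, predicted, tol=1.0))   # True
print(converges_to_prediction(level_shift, predicted, tol=1.0))  # False
```

Under this assumed criterion, the first series would be treated as following a spike and the second as a genuine abnormality.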

FIG. 5 illustrates the determination method of the spike determination unit 106 according to this embodiment.
In the upper part of FIG. 5, over the range N+1 to N+T5, the trajectories of the predicted values (broken line) and the measured values (solid line) tend to coincide, and the measured values converge to the predicted values, so the data at time N can be determined to be spike-like noise.
In the lower part of FIG. 5, by contrast, over the range N+1 to N+T5, the trajectories of the predicted values (broken line) and the measured values (solid line) do not tend to coincide, and the measured values do not converge to the predicted values, so the data at time N can be determined not to be spike-like noise.
Note that FIG. 5 uses predicted values without prediction error as an example of the prediction range. As in Embodiment 1, the predicted value ± prediction error may instead be used as the prediction range and compared with the measured values.
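When the prediction range is taken as predicted value ± prediction error, the comparison amounts to checking how many measured values fall inside the band. The sketch below assumes a single constant prediction error for all points and a hypothetical function name `fraction_in_range`; how the error is obtained and what fraction counts as a match are not specified here.

```python
def fraction_in_range(actual, predicted, error):
    """Fraction of measured values inside [predicted - error, predicted + error]."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    inside = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= error)
    return inside / len(actual)


predicted = [20.0, 20.0, 20.0, 20.0, 20.0]
measured = [26.0, 22.0, 20.5, 20.2, 19.9]

# Only the first point (26.0) lies outside 20.0 +/- 2.5.
print(fraction_in_range(measured, predicted, error=2.5))  # 0.8
```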

In this embodiment as well, when the spike determination unit 106 detects spike-like noise, the data conversion unit 107 replaces the spike-like noise data in the log storage unit 101 with a normal data value.
The configuration of the log analysis apparatus 10 and related components in this embodiment is also the same as in FIG. 1.

Thus, according to this embodiment, whether the data is spike-like noise is judged by whether the measured values over a fixed period after detection of the non-normal data converge to the predicted values, so spike-like noise can be determined with a small amount of computation.

As described above, this embodiment has described a log analysis apparatus that uses a time-series model such as AR or ARMA and determines whether detected non-normal data is an abnormality or spike-like noise according to whether the measured values converge to the estimated values.

Embodiment 3.
In Embodiment 1, the spike determination unit 106 calculated the second prediction range using, as the normal data in place of the non-normal data X(N) at time N, data calculated with a time-series model such as AR or ARMA.
In this embodiment, by contrast, the spike determination unit 106 calculates the second prediction range of each prediction target data using, as the normal data value in place of the non-normal data value, the average of the data values of a plurality of data preceding the non-normal data on the time axis.

That is, in this embodiment, the data conversion unit 107 calculates the average of the data in period T6 immediately before time N (the data at times N−T6, N−T6+1, ..., N−1) as the normal value X'(N) for time N.
The spike determination unit 106 then uses this normal value X'(N) to calculate the second prediction range, and thereafter determines whether the data at time N is spike-like noise by the procedure shown in Embodiment 1.
In this embodiment as well, when the spike determination unit 106 detects spike-like noise, the data conversion unit 107 replaces the spike-like noise data in the log storage unit 101 with a normal data value. The normal data value in this case is the average of the data in period T6 described above.
The configuration of the log analysis apparatus 10 and related components in this embodiment is also the same as in FIG. 1.
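The averaging step for X'(N) can be illustrated with a short sketch. The function names `normal_value_by_mean` and `replace_spike`, the list-based series, and the sample values are assumptions for illustration; the log storage unit 101 is stood in for by a plain Python list.

```python
from statistics import fmean


def normal_value_by_mean(series, n, t6):
    """Return X'(N): the mean of the t6 samples at times N-T6 .. N-1."""
    if not 0 < t6 <= n:
        raise ValueError("t6 must be positive and fit before index n")
    return fmean(series[n - t6:n])


def replace_spike(series, n, t6):
    """Return a copy of the series with series[n] replaced by X'(N)."""
    cleaned = list(series)
    cleaned[n] = normal_value_by_mean(series, n, t6)
    return cleaned


# Hypothetical log values; index 4 was judged to be spike-like noise.
series = [10, 11, 9, 10, 50, 10]
print(normal_value_by_mean(series, n=4, t6=4))  # 10.0
print(replace_spike(series, n=4, t6=4))         # [10, 11, 9, 10, 10.0, 10]
```

Compared with fitting a time-series model, the mean needs only one pass over the T6 window, which is in line with the reduced computation this embodiment aims for.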

Thus, according to this embodiment, the normal value used in place of the non-normal data at time N is the average of a plurality of data preceding time N, so spike-like noise can be determined with a small amount of computation.

As described above, this embodiment has described a log analysis apparatus that uses, as the value to replace spike-like noise, the average of the data preceding the data determined to be spike-like noise.

FIG. 1 is a diagram showing a configuration example of the log analysis apparatus according to Embodiment 1.
FIG. 2 is a diagram showing the relationship between spike-like noise, the normal state, and abnormality.
FIG. 3 is a flowchart showing an operation example of the log analysis apparatus according to Embodiment 1.
FIG. 4 is a flowchart showing an operation example of the log analysis apparatus according to Embodiment 1.
FIG. 5 is a diagram showing an example of the spike determination method of the log analysis apparatus according to Embodiment 2.
FIG. 6 is a diagram showing a hardware configuration example of the log analysis apparatus according to Embodiments 1 to 3.

Explanation of symbols

10 log analysis apparatus, 20 log collection apparatus, 101 log storage unit, 102 time-series data generation unit, 103 abnormality detection unit, 104 notification unit, 105 display device, 106 spike determination unit, 107 data conversion unit.

Claims

1. An information processing apparatus comprising:
a non-normal data detection unit that monitors time-series data arranged along a time axis and detects, in the time-series data, non-normal data whose data value is not normal; and
an abnormal data determination unit that takes a plurality of data within a fixed time from the non-normal data on the time axis as prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range of the data value for each prediction target data, and determines, based on the comparison result, whether the non-normal data is abnormal data or noise.

2. The information processing apparatus according to claim 1, wherein the abnormal data determination unit
compares, for each prediction target data, the actual data value with the prediction range of the data value, calculates the probability that the actual data value of each prediction target data is included in the prediction range, and determines, based on the calculated probability, whether the non-normal data is abnormal data or noise.

3. The information processing apparatus according to claim 1, wherein the abnormal data determination unit
calculates, for each prediction target data, a prediction range of the data value that reflects the non-normal data value of the non-normal data, as a first prediction range of each prediction target data,
calculates, for each prediction target data, a prediction range of the data value that reflects a normal data value in place of the non-normal data value of the non-normal data, as a second prediction range of each prediction target data,
compares, for each prediction target data, the actual data value with the first prediction range, and calculates, as a first probability, the probability that the actual data value of each prediction target data is included in the first prediction range,
compares, for each prediction target data, the actual data value with the second prediction range, and calculates, as a second probability, the probability that the actual data value of each prediction target data is included in the second prediction range, and
compares the calculated first probability with the calculated second probability to determine whether the non-normal data is abnormal data or noise.

4. The information processing apparatus according to claim 3, wherein the abnormal data determination unit
calculates the second prediction range of each prediction target data using, as the normal data value in place of the non-normal data value, a data value calculated by applying a time-series model to a plurality of data preceding the non-normal data on the time axis.

5. The information processing apparatus according to claim 3, wherein the abnormal data determination unit
calculates the second prediction range of each prediction target data using, as the normal data value in place of the non-normal data value, the average of the data values of a plurality of data preceding the non-normal data on the time axis.

6. The information processing apparatus according to claim 1, wherein the abnormal data determination unit
takes a plurality of data following the non-normal data on the time axis as the prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range of the data value for each prediction target data, and determines whether the non-normal data is abnormal data or noise based on the degree of agreement between the trajectory of the actual data values of the prediction target data and the trajectory of the prediction range.

7. The information processing apparatus according to claim 1, further comprising:
a log storage unit that stores a log from which the time-series data is generated; and
a data rewriting unit that, when the abnormal data determination unit determines that the non-normal data is noise, rewrites the log in the log storage unit from the non-normal data value of the non-normal data to a normal data value.

8. The information processing apparatus according to claim 7, wherein the data rewriting unit
calculates the normal data value by applying a time-series model to a plurality of data preceding the non-normal data on the time axis, and rewrites the log in the log storage unit with the calculated normal data value.

9. A data determination method comprising:
a non-normal data detection step in which a computer monitors time-series data arranged along a time axis and detects, in the time-series data, non-normal data whose data value is not normal; and
an abnormal data determination step in which the computer takes a plurality of data within a fixed time from the non-normal data on the time axis as prediction target data, calculates a prediction range of the data value for each prediction target data, compares the actual data value with the prediction range of the data value for each prediction target data, and determines, based on the comparison result, whether the non-normal data is abnormal data or noise.

10. A program that causes a computer to execute:
a non-normal data detection process of monitoring time-series data arranged along a time axis and detecting, in the time-series data, non-normal data whose data value is not normal; and
an abnormal data determination process of taking a plurality of data within a fixed time from the non-normal data on the time axis as prediction target data, calculating a prediction range of the data value for each prediction target data, comparing the actual data value with the prediction range of the data value for each prediction target data, and determining, based on the comparison result, whether the non-normal data is abnormal data or noise.