JP2008154010A

JP2008154010A - Data processor, and data processing method and program

Info

Publication number: JP2008154010A
Application number: JP2006340621A
Authority: JP
Inventors: Kazuhiro Ono; 一広大野
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-12-19
Filing date: 2006-12-19
Publication date: 2008-07-03
Anticipated expiration: 2026-12-19
Also published as: JP4723466B2

Abstract

PROBLEM TO BE SOLVED: To preferentially smooth the characteristic portions of time-series data.

SOLUTION: A data input/processing part 420 aggregates input data 410 per unit time. A feature analysis part 430 divides the aggregated data into prescribed areas and performs principal component analysis on each area to calculate a feature vector. A protruding point determining part 440 arranges the feature vectors of the areas on a two-dimensional plane and determines a protruding rate from their distribution. A smoothing coefficient calculating part 450 assigns to each aggregated data item from the data input/processing part 420 the protruding rate of its area, calculates the item's relative evaluation value within the area, and calculates a smoothing coefficient such that the relative evaluation value, the protruding rate of the area, and the number of data used in the moving average calculation by a smoothing part 460 are in a proportional relation. The smoothing part 460 determines the number of data for the moving average from the smoothing coefficient and smooths each aggregated data item from the data input/processing part 420 by the moving average calculation.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for smoothing time-series data.

In unauthorized access detection, one approach detects anomalies by analyzing time-series data generated from collected packet logs.
In this approach, the time-series data is compared with learning data. The learning data serves as the reference for measuring the amount of change in the time-series data.

FIG. 1 shows a configuration example of the unauthorized access analysis system 100 described in, for example, Non-Patent Document 1.
As shown in FIG. 2, the unauthorized access analysis system 100 of FIG. 1 monitors a network belonging to a specific organization such as a company. Packet logs (fixed-point observation data) from a firewall (F/W), an S-NIDS (Signature-based Network IDS (Intrusion Detection System)), and a packet collection device are input to the unauthorized access analysis system 100 and analyzed in real time.

In FIG. 1, the information collection unit 6 periodically collects packet logs from the F/W, the S-NIDS, and the packet collection device.
The log information aggregation unit 5 aggregates, from the packet logs collected by the information collection unit 6, the packet information needed to detect unauthorized access. For example, it aggregates the number of packets per source IP address per unit time, the number of packets per destination port, or the packet length.
The abnormality detection unit 4 detects abnormal network traffic based on the data aggregated by the log information aggregation unit 5 and outputs an early alert.
When the abnormality detection unit 4 detects an abnormal traffic state, the unauthorized access determination unit 3 determines whether unauthorized access is the cause. The log information aggregation unit 5 aggregates the data from a plurality of analysis viewpoints. The unauthorized access determination unit 3 then evaluates the detection results of the abnormality detection unit 4 for all viewpoints together to confirm that unauthorized access is the cause. Known vulnerability information stored in a security information database (not shown) is also used for the determination. For example, suppose the abnormality detection unit 4 detects an abnormality in the analysis of packets sent to a specific service (port), and a vulnerability in that service was recently disclosed. It can then be determined that unauthorized access exploiting that vulnerability may be occurring.
If the detection is determined to be a false detection, this information is fed back to the abnormality detection unit 4 as a normal state.
The security information database is, for example, a database that manages the latest vulnerability and patch information for software.
When the unauthorized access determination unit 3 confirms unauthorized access, the countermeasure unit 2 outputs countermeasure guidelines, such as instructions to restrict access to a specific port or to apply a patch. The network administrator takes countermeasures with reference to this output.
A GUI (Graphical User Interface) 1 displays early alerts, the cause of unauthorized access, countermeasure information, and the like.
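The following minimal Python sketch pictures the aggregation performed by the log information aggregation unit 5. It counts packets per source IP address per unit time. The input format of (timestamp, source IP) pairs, the function name `count_packets_per_source`, and the 300-second default interval are assumptions for illustration only; the document does not specify the log format.

```python
from collections import Counter
from datetime import datetime, timezone


def count_packets_per_source(packets, unit_seconds=300):
    """Count packets per (source IP, unit-time bucket).

    packets: iterable of (timestamp, src_ip) tuples; timestamps should be
    timezone-aware datetimes so that bucket boundaries are unambiguous.
    Returns a Counter keyed by (src_ip, bucket_start), where bucket_start
    is in POSIX seconds.
    """
    counts = Counter()
    for ts, src_ip in packets:
        epoch = int(ts.timestamp())
        bucket_start = epoch - epoch % unit_seconds  # floor to the interval start
        counts[(src_ip, bucket_start)] += 1
    return counts


if __name__ == "__main__":
    utc = timezone.utc
    log = [
        (datetime(2006, 7, 1, 0, 0, 20, tzinfo=utc), "192.0.2.1"),
        (datetime(2006, 7, 1, 0, 3, 4, tzinfo=utc), "192.0.2.1"),
        (datetime(2006, 7, 1, 0, 6, 0, tzinfo=utc), "192.0.2.7"),
    ]
    for (ip, start), n in sorted(count_packets_per_source(log).items()):
        print(ip, datetime.fromtimestamp(start, utc).isoformat(), n)
```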

Next, as a conventional technique for analyzing time-series data, an example of the abnormality detection unit 4 that uses principal component analysis is described.
This method determines whether a variation has occurred in the time-series data, using the following procedure.
FIG. 26 shows details of the abnormality detection unit 4.
The data acquisition unit 43 inputs the time-series data (input data 42) and defines the learning data 41.
The analysis unit 44 calculates the feature amounts of the time-series data.
The determination unit 45 determines abnormal values in the time-series data.

The data acquisition unit 43 inputs the time-series data in which abnormalities are to be measured and defines the learning data.
As described above, the learning data is the reference for measuring the amount of change in the time-series data. It can be defined either by using part of the input time-series data as learning data or by creating it from some model.
In the example of FIG. 26, the learning data is defined as a continuous range of fixed length within the time-series data.

The analysis unit 44 analyzes the time-series data.
Here, feature amounts of the time-series data are calculated.
In one example of the analysis method, the time-series data obtained from the data acquisition unit 43 is divided, per unit time, into segments of a fixed size.
Each segment is analyzed and converted into a small number of feature amounts.
In this way, the multi-dimensional time-series information generated over a given period is compressed into lower-dimensional information, which allows abnormalities to be analyzed faster.

The determination unit 45 compares the feature amounts of the time-series data obtained by the analysis unit 44 with the previously defined learning data 41.
If the comparison shows that the input data 42 differs from the learning data 41, the input data 42 is determined to be abnormal.
One comparison method defines a feature space and places in it the feature amounts obtained by analyzing the input data 42. The distribution of the input data 42 is then examined. Any data that deviates from the main group of the distribution by more than a certain amount is regarded as abnormal.
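The text leaves the deviation measure open. One common choice, and the one the prior-art example of FIG. 26 mentions, is the Mahalanobis generalized distance. The sketch below is an illustrative assumption, not the method of Non-Patent Document 1. It measures how far a two-dimensional feature vector lies from the learning-data distribution. The function names and the cut-off of 3.0 are hypothetical.

```python
import math


def mahalanobis_2d(learning, point):
    """Mahalanobis distance of a 2-D point from the learning-data distribution."""
    n = len(learning)
    mx = sum(p[0] for p in learning) / n
    my = sum(p[1] for p in learning) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in learning) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in learning) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in learning) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = point[0] - mx, point[1] - my
    d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative rounding


def is_anomalous(learning, point, threshold=3.0):
    """Flag a point whose distance from the learning distribution exceeds threshold."""
    return mahalanobis_2d(learning, point) > threshold
```

Unlike plain Euclidean distance, this measure scales each direction by the spread of the learning data. A deviation along a direction in which the learning data barely varies therefore counts for more.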

The above procedure is repeated. When new time-series data is analyzed, the learning data is defined again.

Conventional techniques that analyze time-series data after smoothing it include, for example, those described in Patent Document 1 and Non-Patent Document 2.
These techniques detect change points in time-series data.
They smooth the time-series data by moving-average processing, but the smoothing is applied to every region of the time-series data.
That is, the entire target time-series data is smoothed uniformly.
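For comparison, uniform smoothing of the kind described above applies one fixed moving-average window to every point. In this sketch the centered window and the truncation at the series ends are assumptions; the cited documents may use a different variant.

```python
def moving_average(values, window):
    """Centered moving average with the same window for every point.

    Near the ends, the window is truncated to the points that exist.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(values)
    out = []
    for i in range(n):
        lo = i - (window - 1) // 2  # window start (may be negative)
        hi = lo + window            # window end, exclusive
        segment = values[max(lo, 0):min(hi, n)]
        out.append(sum(segment) / len(segment))
    return out
```

Applied to [0, 0, 9, 0, 0] with a window of 3, the isolated value 9 is spread into [0.0, 3.0, 3.0, 3.0, 0.0], and its neighbors change as well. Every region is smoothed equally, whether or not it is characteristic.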
Patent Document 1: Japanese Patent Laid-Open No. 2000-213948.
Non-Patent Document 1: Hiroyuki Sakakibara, Seiji Fujii, Shigeki Kitazawa, Norio Hirai, Rika Kashima, Shinsuke Higashi, "Proposal of Unauthorized Access Analysis System by Fixed-Point Observation", IPSJ 68th National Convention, IPSJ, 2006.
Non-Patent Document 2: Junichi Takeuchi, Kenji Yamanishi, "Unified Treatment of Outlier Detection and Change-Point Detection Using a Forgetting Learning Algorithm", 2000 Workshop on Information-Based Induction Sciences, 2002.

Conventional methods for smoothing time-series data apply the smoothing to the entire time-series data.
Consequently, when a conventional method is used for network anomaly detection, information that is actually needed is smoothed away as well, and detection performance deteriorates.
In anomaly detection methods for time-series data such as the one described above, detection performance can also suffer depending on how the learning data is defined.
One example is when the learning data contains noise.
If part of the learning data contains outstanding values whose tendency differs from the rest, those values strongly affect the abnormality determination. FIG. 26 shows an example of detection using learning data in the prior art.
In the example of FIG. 26, the method determines whether a large variation has occurred in the time-series data. It does so by comparing Mahalanobis generalized distance values with the distribution of the learning data area.
The conventional abnormality determination method uses learning data (steady-state area data) as the reference against which anomalies are compared.
In the determination process, the information in the learning data is analyzed to determine the threshold for abnormality determination. If the learning data contains information with a different tendency, such as noise, the conventional method produces a threshold that incorporates the noise, which delays the abnormality determination.

Conventionally, the following threshold condition is used to determine that a variation has occurred in the time-series data:
Data value at the time of detection > a × (maximum value in the learning data)   (a is a constant)
With this method, a learning-data value larger than the others strongly influences the threshold. As a result, when a variation smaller than this inflated threshold occurs in the data at the time of detection, the abnormality is likely to be overlooked.
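The following small numeric illustration of this weakness uses made-up values. The rule flags a value x when x > a × max(learning data), so a single spike in the learning data raises the threshold for every later value. The function names are illustrative.

```python
def max_rule_threshold(learning, a):
    """Conventional threshold: a times the maximum value in the learning data."""
    return a * max(learning)


def is_detected(value, learning, a):
    """Return True if value exceeds the max-based threshold."""
    return value > max_rule_threshold(learning, a)


if __name__ == "__main__":
    clean = [10, 12, 11, 9, 10]
    noisy = [10, 12, 11, 60, 10]  # the same data with one noise spike
    a = 1.5
    print(max_rule_threshold(clean, a), is_detected(40, clean, a))
    print(max_rule_threshold(noisy, a), is_detected(40, noisy, a))
```

With a = 1.5, the clean learning data gives a threshold of 18.0, so a value of 40 is detected. Adding one spike of 60 raises the threshold to 90.0, and the same value of 40 is missed.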

One of the main objects of the present invention is to solve these problems. It preferentially smooths the characteristic portions of the learning data and thereby improves the accuracy of the subsequent abnormality detection processing.

The data processing apparatus according to the present invention comprises:
a divergence value setting unit that analyzes a plurality of data, each having a data value, and sets for each data, as a divergence value, the degree to which its data value deviates from the data values of the other data;
a smoothing coefficient calculation unit that calculates, for each data, a smoothing coefficient for smoothing that reflects the divergence value set by the divergence value setting unit; and
a smoothing unit that smooths each data using the smoothing coefficient calculated for it by the smoothing coefficient calculation unit.

According to the present invention, the degree of smoothing is increased for data whose values deviate from the other data, so noise can be removed and the accuracy of abnormality detection can be improved.

Embodiment 1.
In this embodiment, the time-series data used as learning data is smoothed to remove information that would hinder abnormality detection. The smoothing is applied more strongly to regions where the data shows a characteristic tendency.

FIG. 3 shows a configuration example of the abnormality detection unit 4 (data processing apparatus) according to this embodiment.
The abnormality detection unit 4 according to this embodiment is part of the unauthorized access analysis system 100 shown in FIG. 1. The other elements of the unauthorized access analysis system 100 are as described above, so their description is omitted.
The unauthorized access analysis system 100 may be implemented as a whole on a single computer. Alternatively, its elements may be implemented on different computers connected via a network.

As described above, the unauthorized access analysis system 100 including the abnormality detection unit 4 according to this embodiment monitors a network belonging to a specific organization such as a company, for example as shown in FIG. 2. Packet logs (fixed-point observation data) from the firewall (F/W), the S-NIDS, and the packet collection device are input to the unauthorized access analysis system 100 and analyzed in real time.

In FIG. 3, the data input/processing unit 420 stores the counts obtained by aggregating the input data 410, which is time-series data, per unit time. The input data 410 is the time-series data used as learning data; hereinafter it is also referred to simply as time-series data.
The feature amount analysis unit 430 calculates principal component scores from the time-series data aggregated by the data input/processing unit 420 and organizes them into groups of feature regions.
The protruding point determination unit 440 (divergence value setting unit) examines the groups of feature regions obtained by the feature amount analysis unit 430 and scores each region by comparing it with the other regions. In other words, the protruding point determination unit 440 analyzes a plurality of data, each having a data value. For each data, it sets as a protrusion rate (divergence value) the degree to which the data value deviates from the data values of the other data. Specifically, as described later, the protruding point determination unit 440 groups the data into predetermined regions and determines the protrusion rate by analyzing the degree of deviation region by region.
The smoothing coefficient calculation unit 450 determines the smoothing parameters according to the feature region scores from the protruding point determination unit 440. That is, the smoothing coefficient calculation unit 450 calculates, for each data, a smoothing coefficient that reflects the protrusion rate determined by the protruding point determination unit 440.
The smoothing unit 460 smooths the time-series data according to the parameters from the smoothing coefficient calculation unit 450. As described in detail later, the smoothing unit 460 smooths each target data by a moving average calculation over an arbitrary number of data.

An operation example (data processing method) of the abnormality detection unit 4 (data processing apparatus) according to this embodiment is now outlined with reference to the flowchart of FIG. 24.
In this embodiment, part of the time-series data to be inspected is used as the learning data. When the time-series data to be inspected is input, the processing shown in the flowchart of FIG. 24 starts, and the learning data is smoothed.

First, the data input/processing unit 420 inputs the input data 410, which is the time-series data to be smoothed (S2401). As described above, part of the time-series data subject to abnormality detection is used as learning data, so the data input/processing unit 420 inputs that part as the input data 410.
The data input/processing unit 420 then aggregates the input data 410 per predetermined unit time (S2402).
It then outputs the aggregated data to each of the feature amount analysis unit 430, the protruding point determination unit 440, and the smoothing coefficient calculation unit 450.

Next, the feature amount analysis unit 430 receives the data output from the data input/processing unit 420, divides it into predetermined regions, and calculates a feature amount for each region (S2403).
The data from the data input/processing unit 420 is arranged in a predetermined order. The data is grouped into a plurality of regions (groups) in this order, and principal component analysis is applied to the data values in each region to calculate the feature amount of each group.
The feature amount analysis unit 430 then outputs data indicating the feature amount of each region to the protruding point determination unit 440.

The protruding point determination unit 440 arranges the feature amounts of the regions on a two-dimensional plane and determines a protrusion rate (divergence value) from their distribution (S2404) (divergence value setting step).
That is, for each region grouped by the feature amount analysis unit 430, the protruding point determination unit 440 sets a protrusion rate. This rate is the degree to which the region's feature amount deviates from the feature amounts of the other regions. The protrusion rate is described in detail later.
The protruding point determination unit 440 then outputs data indicating the protrusion rate of each region to the smoothing coefficient calculation unit 450.
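The protrusion rate is defined later in the document, not in this section, so the following is only a placeholder sketch under an assumed definition. For each region, it takes the distance from the centroid of all regions in the plane of (first, second) principal component scores and divides by the mean of those distances. The function name `protrusion_rates` and this formula are hypothetical.

```python
import math


def protrusion_rates(features):
    """Hypothetical protrusion rate for each region.

    features: list of (pc1, pc2) score pairs, one per region.
    ASSUMPTION: rate = Euclidean distance from the centroid of all regions,
    divided by the mean of those distances. 1.0 means an average spread;
    larger values mean the region sticks out more.
    """
    n = len(features)
    cx = sum(f[0] for f in features) / n
    cy = sum(f[1] for f in features) / n
    dists = [math.hypot(f[0] - cx, f[1] - cy) for f in features]
    mean_d = sum(dists) / n
    if mean_d == 0:
        return [1.0] * n  # all regions coincide; none sticks out
    return [d / mean_d for d in dists]
```

Dividing by the mean distance makes the rate dimensionless, so it can be multiplied directly into a smoothing coefficient regardless of the scale of the principal component scores.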

The smoothing coefficient calculation unit 450 receives the data aggregated per unit time by the data input/processing unit 420. It also receives the data indicating the protrusion rate of each region from the protruding point determination unit 440.
The smoothing coefficient calculation unit 450 assigns to each data item from the data input/processing unit 420 the protrusion rate of its region and calculates the item's relative evaluation value within that region. It then calculates a smoothing coefficient for each data item that reflects both the item's relative evaluation value and the protrusion rate of its region (S2405) (smoothing coefficient calculation step).
Here, the relative evaluation value indicates how a given data value ranks compared with the other data in the same region. The relative evaluation value is also described in detail later.
The smoothing coefficient calculation unit 450 calculates a smoothing coefficient such that the degree of smoothing by the smoothing unit 460 is proportional to the relative evaluation value of each data and the protrusion rate of its region.
Specifically, it calculates a smoothing coefficient such that the number of data used in the moving average calculation by the smoothing unit 460 is proportional to the relative evaluation value and the protrusion rate.
In this way, for data with a large relative evaluation value or protrusion rate, the number of data included in the moving average is increased, which increases the degree of smoothing.
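The exact formulas for the relative evaluation value and the smoothing coefficient are given later in the document, not here. The sketch below makes two hypothetical assumptions. First, the relative evaluation value is a data value divided by its region's mean. Second, the coefficient is base × relative value × protrusion rate, rounded to an integer of at least 1. Together these keep the stated property that the moving-average window grows in proportion to both factors. The names and the `base` parameter are illustrative.

```python
def relative_values(region):
    """Hypothetical relative evaluation value: each value divided by the region mean.

    Assumes non-negative counts; a region of all zeros gets 1.0 everywhere
    (every value equals the mean).
    """
    mean = sum(region) / len(region)
    if mean == 0:
        return [1.0] * len(region)
    return [v / mean for v in region]


def smoothing_coefficients(regions, rates, base=3):
    """Hypothetical smoothing coefficient (moving-average window size) per data.

    ASSUMPTION: window = base * relative value * protrusion rate, rounded
    and kept >= 1, so it grows in proportion to both factors.
    regions: list of lists of aggregated counts, in time order.
    rates:   one protrusion rate per region.
    """
    coeffs = []
    for region, rate in zip(regions, rates):
        for rel in relative_values(region):
            coeffs.append(max(1, round(base * rel * rate)))
    return coeffs
```

The coefficients are returned in the same order as the concatenated region data, so they pair one-to-one with the aggregated time series.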

Finally, the smoothing unit 460 receives the data aggregated per unit time by the data input/processing unit 420 and the smoothing coefficient of each data from the smoothing coefficient calculation unit 450. It then smooths each data according to its smoothing coefficient (S2406) (smoothing step).
The smoothing unit 460 determines, according to the smoothing coefficient, the number of data to be used in the moving average calculation. It then smooths the data by a moving average over that number of data.
In the example of this embodiment, the moving average is calculated using as many data as the value of the smoothing coefficient.
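The following is a minimal sketch of a moving average whose window size differs per data point. The window for point i is taken to be its smoothing coefficient, as stated above. Centering the window on the point and truncating it at the series ends are assumptions; the document does not say how the window is positioned.

```python
def adaptive_moving_average(values, coeffs):
    """Smooth values[i] with a centered moving average over coeffs[i] points."""
    if len(values) != len(coeffs):
        raise ValueError("values and coeffs must have the same length")
    n = len(values)
    out = []
    for i, window in enumerate(coeffs):
        if window < 1:
            raise ValueError("each coefficient must be at least 1")
        lo = i - (window - 1) // 2  # window start (may be negative)
        hi = lo + window            # window end, exclusive
        segment = values[max(lo, 0):min(hi, n)]
        out.append(sum(segment) / len(segment))
    return out
```

With coefficients [1, 1, 3, 1, 1], only the spike in [0, 0, 9, 0, 0] is averaged (to 3.0), and the other points keep their values. Uniform smoothing of the whole series, by contrast, would alter every point.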

Next, the operation of the abnormality detection unit 4 according to this embodiment is described in detail.

The data input/processing unit 420 aggregates the input data 410 to be analyzed per unit time. The initial setting parameter is as follows.
Aggregation unit time: the unit time over which the observed time-series data is aggregated

The format of the input data 410 is shown in FIG. 4.
The serial numbers shown in FIG. 4 identify each data item for the purpose of explanation; they do not exist in the actual data.
The input data 410 is, for example, data on the number of packets per source IP address. Such input data 410 usually occurs at irregular times, so the data input/processing unit 420 groups it by a pre-specified aggregation unit time.
In FIG. 4, the event occurrence dates and times (pre-aggregation event occurrence dates and times) are at irregular intervals.

FIG. 5 shows an example of the input data after aggregation.
In FIG. 5, the event occurrence date and time (post-aggregation event occurrence date and time) is the first time within each aggregation unit time. The event occurrence count (post-aggregation event occurrence count) is the total of the pre-aggregation event occurrence counts within that unit time.
Suppose the input data is divided into the unit times {T_1, T_2, T_3}, {T_4, T_5}, and {T_6, T_7}. The aggregation then yields three records. The result of aggregating the data in the unit time {T_1, T_2, T_3} is serial number a_1. Its post-aggregation event occurrence date and time is T_1, and its post-aggregation event occurrence count is C_1 + C_2 + C_3.
As in FIG. 4, the serial numbers in FIG. 5 are added for explanation and do not exist in the actual data.
As shown in FIG. 3, the data of FIG. 5 is output to each of the feature amount analysis unit 430, the protruding point determination unit 440, and the smoothing coefficient calculation unit 450.

FIG. 14 shows an example in which the input data 410 is aggregated at 5-minute intervals.
The first eight events of the input data 410 are aggregated into five events.
The input events at 2006/07/01 0:00:20, 2006/07/01 0:01:13, and 2006/07/01 0:03:04 occurred within the first 5 minutes, so they are combined into one event.
The combined event takes the occurrence date and time of the earliest event (2006/07/01 0:00:20). Its event occurrence count is the total of the three, 17 (4 + 8 + 5).
Similarly, the events at 2006/07/01 0:10:33 and 2006/07/01 0:11:30 are combined into one event, as are those at 2006/07/01 0:16:22 and 2006/07/01 0:19:54.
If only one event occurs within an aggregation interval (2006/07/01 0:22:43), it is kept as is. If no event occurs within an interval, the event occurrence time is set to the start of that interval (2006/07/01 0:05:00 in FIG. 14) and the event occurrence count is set to 0.
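The aggregation rule illustrated by FIG. 14 can be sketched in Python as follows. The function name, the (timestamp, count) input format, and the choice of interval start (0:00:00) are assumptions. In the usage example, the counts 4, 8, and 5 come from the text. The remaining counts are placeholders because the full contents of FIG. 14 are not reproduced here.

```python
from datetime import datetime, timedelta


def aggregate_events(events, start, unit=timedelta(minutes=5)):
    """Aggregate irregular (timestamp, count) events into fixed intervals.

    For each interval [start + j*unit, start + (j+1)*unit):
      - if it contains events, keep the earliest timestamp and sum the counts;
      - if it is empty, use the interval start as the timestamp and 0 as the count.
    All timestamps are assumed to be >= start.
    """
    events = sorted(events)
    if not events:
        return []
    n_intervals = (events[-1][0] - start) // unit + 1  # timedelta // timedelta -> int
    buckets = [[] for _ in range(n_intervals)]
    for ts, count in events:
        buckets[(ts - start) // unit].append((ts, count))
    result = []
    for j, bucket in enumerate(buckets):
        if bucket:
            result.append((bucket[0][0], sum(c for _, c in bucket)))
        else:
            result.append((start + j * unit, 0))
    return result


if __name__ == "__main__":
    def t(h, m, s):
        return datetime(2006, 7, 1, h, m, s)

    # Counts 4, 8, 5 come from the text; the other counts are placeholders.
    events = [(t(0, 0, 20), 4), (t(0, 1, 13), 8), (t(0, 3, 4), 5),
              (t(0, 10, 33), 1), (t(0, 11, 30), 1), (t(0, 16, 22), 1),
              (t(0, 19, 54), 1), (t(0, 22, 43), 1)]
    for ts, n in aggregate_events(events, start=t(0, 0, 0)):
        print(ts, n)
```

This yields five aggregated records, matching the text. They are 0:00:20 with count 17, 0:05:00 with count 0, and one record each starting at 0:10:33, 0:16:22, and 0:22:43.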

The feature amount analysis unit 430 calculates principal component scores from the time-series data aggregated by the data input/processing unit 420 and then converts them into a time series of principal component scores. The initial setting parameter is as follows.
Number of principal component target dimensions: the number of dimensions over which principal component analysis is calculated

The number of principal component target dimensions is the number of data points, taken from the time-series data received from the data input/processing unit 420, that are analyzed together. It equals the number of columns of the matrix to which principal component analysis is applied.
The feature amount analysis unit 430 takes that number of data points from the beginning of the time-series data and applies principal component analysis to them.
FIG. 6 shows an example of the input data of the feature amount analysis unit 430.
The data of FIG. 6, which is the input to the feature amount analysis unit 430, is the same as the data of FIG. 5, which is the output of the data input/processing unit 420.
FIG. 5 and FIG. 6 use different notation for convenience in the following description. Serial number a_1 in FIG. 5 has post-aggregation event occurrence date and time T_1 and count C_1 + C_2 + C_3; these correspond to the event occurrence date and time T_1 and count C_1 of serial number d_1 in FIG. 6. Likewise, serial number a_2 in FIG. 5 has post-aggregation event occurrence date and time T_4 and count C_4 + C_5; these correspond to the event occurrence date and time T_2 and count C_2 of serial number d_2 in FIG. 6. The same applies to the subsequent rows.

Here, letting k be the number of principal component target dimensions, the time series is grouped into consecutive blocks of k items from the beginning, and processing is performed for each group (region). In the example of FIG. 6, a 1-row, k-column matrix is created from the event counts of d_1 through d_k, and principal component analysis is performed treating the elements of this matrix as one group (region). The matrix handled is:
(C_1, C_2, ..., C_k)
The next k items are then taken from the time series, a matrix is created in the same way, and principal component analysis is performed. This process is repeated sequentially.
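The grouping into regions can be sketched in Python as follows. This is an illustrative sketch only. The function name `split_into_regions` is hypothetical. The text does not say what happens to a trailing block shorter than k, so this sketch simply drops it.

```python
def split_into_regions(counts, k):
    """Split an aggregated event-count series into consecutive regions of k items.

    Hypothetical helper: a trailing block shorter than k is dropped (assumption).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    n_regions = len(counts) // k
    return [counts[j * k:(j + 1) * k] for j in range(n_regions)]


counts = [3, 5, 4, 6, 2, 30, 3, 4, 5, 5]   # illustrative event counts C_1..C_10
print(split_into_regions(counts, 4))       # [[3, 5, 4, 6], [2, 30, 3, 4]]
```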

As a result of the principal component analysis, a time series of principal component scores representing each block of k items is obtained. Several scores (first, second, ...) are produced, and the first two are used in the subsequent steps.
FIG. 7 shows the relationship between the arrays created from the time series and the features obtained by principal component analysis.

In FIG. 7, PC_1_1 and PC_2_1 are the features representing the array (C_1, C_2, ..., C_k) created from the input time series. The same applies to the subsequent arrays.

FIG. 15 illustrates the above procedure of the feature analysis unit 430 on time-series data.
First, the time series (after aggregation by the data input/processing unit 420) is divided from the beginning into n partial time series (regions) of k elements each.
Next, principal component analysis is performed on each partial time series.
FIG. 16 shows the concept of principal component analysis.
As a result, two principal component scores are obtained per partial time series.
As the output of this step, the feature analysis unit 430 creates the data shown in FIG. 8, which describes the event occurrence times and features, and outputs it to the protruding point determination unit 440.
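The text does not explain how principal component analysis is applied to a single 1-row, k-column matrix. The sketch below assumes one common reading:

- each region is treated as one k-dimensional observation;
- PCA is fitted across all regions;
- the first two scores are kept for each region.

This gives one point per region in the two-dimensional space of FIG. 17. The helper name `pca_scores` is hypothetical. Power iteration is used only to stay within the Python standard library. Neither choice is the patent's implementation.

```python
import math
import random


def pca_scores(regions, n_components=2, iters=500):
    """Project each region vector (length k) onto the leading principal components.

    Assumptions: each region is one observation, PCA is fitted across all
    regions, and components are found by power iteration with deflation.
    """
    n, k = len(regions), len(regions[0])
    mean = [sum(r[d] for r in regions) / n for d in range(k)]
    centered = [[r[d] - mean[d] for d in range(k)] for r in regions]
    # Population covariance matrix (k x k).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(k)]
           for a in range(k)]
    rng = random.Random(0)  # fixed seed so results are reproducible
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k))
        components.append(v)
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(k)]
               for a in range(k)]
    # Score of each region on each component (PC1, PC2, ...).
    return [[sum(c[d] * comp[d] for d in range(k)) for comp in components]
            for c in centered]


regions = [[3, 5, 4, 6], [4, 5, 3, 5], [30, 2, 3, 4], [5, 4, 4, 6], [3, 4, 5, 5]]
for scores in pca_scores(regions):
    print(scores)
```

The signs of principal component scores are arbitrary. Only their relative positions matter for the later deviation check.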

The protruding point determination unit 440 receives data such as that shown in FIG. 9, examines the group of feature regions obtained by the feature analysis unit 430, and scores each region by comparison with the other regions. The serial numbers in FIG. 9 are added for convenience of explanation only; they are absent from the actual data, which is input in the same format as FIG. 8.
Specifically, the protruding point determination unit 440 takes the first and second features from the output of the feature analysis unit 430 and places them on a two-dimensional plane, for example with the first feature as the Y coordinate and the second feature as the X coordinate.

FIG. 17 shows the input data from the feature analysis unit 430 (FIG. 9) placed in a two-dimensional feature space (principal component space).
Among the features with serial numbers (a) to (f), the feature of (c) clearly deviates from the group.

Next, the protruding point determination unit 440 calculates each region's deviation from the group based on the distribution in the feature space (principal component space).
The value indicating this deviation is defined as the protrusion rate. It takes a value from 0 to 1 and indicates the degree of deviation from the centroid of the group.
The centroid of the group can be obtained, for example, as the population mean, and the deviation from the group can be calculated, for example, as the Mahalanobis generalized distance.
FIG. 10 shows the output data of the protruding point determination unit 440, which attaches a protrusion rate P to each event occurrence time in the time series.
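Below is a minimal sketch of the protrusion rate. It uses the population mean as the centroid and the Mahalanobis distance as the deviation, as suggested above. The text only states that the rate lies between 0 and 1. Dividing each distance by the largest one is an assumption made here, as is the function name `protrusion_rates`.

```python
import math


def protrusion_rates(points):
    """Protrusion rate in [0, 1] for each 2-D feature point (PC1, PC2).

    Centroid: population mean. Deviation: Mahalanobis distance under the
    population covariance. Scaling by the largest distance is an assumption.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    det = sxx * syy - sxy * sxy
    if det <= 1e-12:
        raise ValueError("covariance is singular; Mahalanobis distance undefined")
    # Entries of the inverse 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        dists.append(math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy))
    top = max(dists)
    return [d / top if top > 0 else 0.0 for d in dists]


# Six regions (a)..(f); region (c), index 2, lies far from the others.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 4.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
print(protrusion_rates(features))
```

With only a handful of regions, the Mahalanobis distance is bounded and sensitive to small spreads. The scaling step is therefore mainly a way to keep the result within the stated 0 to 1 range.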

FIG. 18 shows the concept of setting the protrusion rate from the input data received from the feature analysis unit 430.
Examining the feature-space distribution of the individual partial time series shows that the feature of (c) deviates more than the others. The protruding point determination unit 440 therefore deliberately sets a higher protrusion rate for region (c), whose degree of deviation is large, than for the other regions.

Here the protrusion rate, which is a ratio, is used as the value indicating the degree of deviation of each region. However, any value that indicates the degree of deviation may be used, even if it is not a ratio.

The smoothing coefficient calculation unit 450 uses the region scores from the protruding point determination unit 440 to calculate, for each point of the time series, a coefficient used in the smoothing process. The initial-setting parameter is as follows.
Aggregation unit time: the unit time over which the observed time-series data is aggregated

FIG. 11 shows the data that the smoothing coefficient calculation unit 450 receives from the protruding point determination unit 440.
The region column in FIG. 11 only indicates the positional relationship of the items and does not exist in the actual data, so the actual input is the same as in FIG. 8. The regions in FIG. 11 are the partial time series described with reference to FIG. 15.

As shown in FIG. 3, the smoothing coefficient calculation unit 450 also receives the per-unit-time time series from the data input/processing unit 420.
This data is the same as in FIG. 6.
For each data item in FIG. 6, the smoothing coefficient calculation unit 450 sets the region assigned by the protruding point determination unit 440, as shown in FIG. 12.
By assigning the corresponding region to each data item from the data input/processing unit 420 (FIG. 6), the smoothing coefficient calculation unit 450 gives each data item the protrusion rate of its region.
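In the assignment of FIG. 12, each data item inherits the protrusion rate of its region. This reduces to integer division of the item index by k. The sketch below uses a hypothetical helper name. Items beyond the last full region are marked `None` here, which is an assumption.

```python
def rates_per_item(n_items, k, region_rates):
    """Give data item i (0-based) the protrusion rate of region i // k.

    Region j covers items j*k .. j*k + k - 1. Items past the last region get
    None (assumption; the text does not cover a partial trailing region).
    """
    rates = []
    for i in range(n_items):
        j = i // k
        rates.append(region_rates[j] if j < len(region_rates) else None)
    return rates


print(rates_per_item(7, 3, [0.1, 0.9]))  # [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, None]
```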

When the number of principal component target dimensions used by the feature analysis unit 430 is k, one region contains k time-series data items. That is, in the example of FIG. 9, region r_1 contains the k items from the pair (T_1, C_1) to the pair (T_k, C_k). Letting P_i be the protrusion rate for data item i, the smoothing coefficient M_i (i = 1, ..., n) is calculated for all data items i in each region r_j (j = 1, ..., m) by the following algorithm.

In Equation 1, max(r_j) denotes the maximum event count C among the k data items in region r_j.
Similarly, min(r_j) denotes the minimum event count C among the k data items in region r_j.
The third term on the right-hand side, (C_i - min(r_j)) / (max(r_j) - min(r_j)), expresses where each event count C_i in region r_j stands relative to the other event counts in the same region; that is, it computes the relative evaluation value of C_i.
Thus the smoothing coefficient M_i is based on the relative evaluation value of each data item and the protrusion rate of the region to which the item belongs. As described later, the relative evaluation value and the protrusion rate are proportional to the number of data items used in the moving average calculation by the smoothing unit 460.
The output data of the smoothing coefficient calculation unit 450 is shown in FIG. 13; it is the data of FIG. 12 with the smoothing coefficient M_i added.
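Equation 1 itself is not reproduced in this text. The sketch below therefore computes only the relative evaluation value exactly as described: (C_i - min(r_j)) / (max(r_j) - min(r_j)).

It then combines this value with the region's protrusion rate through a hypothetical product, `round(scale * P * rel)`. The following are all assumptions:

- the `scale` parameter;
- rounding to an integer count;
- returning 0 when all counts in a region are equal.

The form only respects the stated proportionality between M_i, the relative evaluation value, and the protrusion rate.

```python
def relative_values(region):
    """Relative evaluation value (C_i - min(r_j)) / (max(r_j) - min(r_j)) per count.

    If every count in the region is equal the denominator is zero; 0.0 is
    returned for all items in that case (assumption).
    """
    lo, hi = min(region), max(region)
    if hi == lo:
        return [0.0] * len(region)
    return [(c - lo) / (hi - lo) for c in region]


def smoothing_coefficients(counts, k, region_rates, scale=10):
    """Hypothetical stand-in for Equation 1: M_i = round(scale * P * rel_i).

    M_i rises with both the region's protrusion rate P and the item's relative
    evaluation value; `scale` and the exact form are assumptions.
    """
    coeffs = []
    for j, rate in enumerate(region_rates):
        region = counts[j * k:(j + 1) * k]
        coeffs.extend(round(scale * rate * rel) for rel in relative_values(region))
    return coeffs


counts = [2, 4, 6, 1, 9, 5]                           # two regions, k = 3
print(smoothing_coefficients(counts, 3, [0.0, 1.0]))  # [0, 0, 0, 0, 10, 5]
```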

Next, the smoothing unit 460 smooths the time series based on the smoothing coefficients.
As shown in FIG. 3, the smoothing unit 460 receives the per-unit-time time series from the data input/processing unit 420.
This data is the same as in FIG. 6.
The smoothing unit 460 then smooths this time series using the output data of the smoothing coefficient calculation unit 450 (FIG. 13).
Specifically, a moving average with a variable number of samples is used.
The moving average is defined as follows. Suppose there is a time series consisting of x_i and the q points before and after it, centered on x_i.

The moving average calculation performed by the smoothing unit 460 according to this embodiment is as follows.

In Equation 3, y_i is the smoothed value of the event count C_i.
In Equation 3, x_i is the event count C_i.
m_i is called the moving average value; it indicates the number of data items used in the moving average calculation by the smoothing unit 460.
The moving average value m_i is equal to the smoothing coefficient M_i.
Normally the moving average value is constant (fixed at q in Equation 2 above), but in this embodiment m_i is linked to the smoothing coefficient M_i calculated by the smoothing coefficient calculation unit 450.
In other words, the number of data items used in the moving average calculation by the smoothing unit 460 varies with M_i.
The larger M_i is, that is, the larger at least one of the relative evaluation value of C_i and the protrusion rate P_i is, the more data items enter the moving average; more event counts are reflected and the degree of smoothing increases. Data with a large M_i tends to stand out compared with the preceding and following data or regions, so it is leveled by stronger smoothing.
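Equations 2 and 3 are not reproduced in this text either. The sketch below makes two assumptions:

- m_i takes the place of the fixed q, so the average runs over m_i points on each side of x_i;
- the window is truncated at the ends of the series.

With m_i = 0 a point is left unchanged. This matches the statement that points with a small m_i keep their original values. The function name is hypothetical.

```python
def variable_moving_average(x, m):
    """Centered moving average whose half-width at point i is m[i].

    Assumption: m[i] replaces the fixed q of the ordinary definition, so the
    window is x[i - m[i]] .. x[i + m[i]], truncated at both ends of the series.
    """
    n = len(x)
    y = []
    for i in range(n):
        lo = max(0, i - m[i])
        hi = min(n, i + m[i] + 1)
        window = x[lo:hi]
        y.append(sum(window) / len(window))
    return y


x = [1, 1, 10, 1, 1]   # illustrative counts with one spike
m = [0, 0, 2, 0, 0]    # only the spike gets a wide window
print(variable_moving_average(x, m))  # [1.0, 1.0, 2.8, 1.0, 1.0]
```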

As described above, in this embodiment the value at each point of the stationary region is obtained by a moving average, with the moving average value m_i determined from the smoothing coefficient M_i.
If a point in the stationary region belongs to a region included in the group in the feature space, m_i is small and the original information is retained; the value does not change much.
If a point in the time series belongs to a region not included in the group in the feature space, m_i becomes large and the protruding information is smoothed.

FIG. 19 shows the concept of smoothing the time series.
In the preceding steps, the smoothing coefficient for each region is set so that the larger the time-series value, the larger the moving average value. In other words, noise removal acts more strongly where the values protrude more. In FIG. 19, region (b) is smoothed strongly.

The time series smoothed in this way is used as learning data, and the abnormality detection unit 4 performs anomaly detection with this learning data by means not shown in FIG. 3.

Thus, in this embodiment, applying principal component analysis and partial moving average processing to time series contaminated with noise smooths the data in a way that is effective for network anomaly detection based on learning data.

In the above description, two features are used in the feature analysis, but the number is not limited to two; one, or three or more, may be used.
Also, in the above description, to speed up calculation, the time series is divided into regions, features are calculated for each region, and the protrusion rate of each region is determined from those features. Alternatively, the protrusion rate may be determined for each data item from its own value, without dividing the series into regions.

This embodiment has described an abnormality detection unit (data processing device) comprising: data input/processing means that stores counts of time-series data aggregated per unit time; principal component analysis means that calculates principal component scores from the time series aggregated by the data input/processing means and converts them into a time series of principal component scores; region editing means that groups the time series of principal component scores obtained by the principal component analysis means, from the beginning, into feature regions of a fixed number of items; protruding point determination means that examines the group of feature regions obtained by the region editing means and scores each region by comparison with the other regions; smoothing coefficient determination means that sets the smoothing parameters according to the region scores from the protruding point determination means; and smoothing means that smooths the time series according to the parameters from the smoothing coefficient determination means.

Embodiment 2.
One case that smoothing must handle is time-series data with large fluctuations.
As shown in the upper part of FIG. 20, when the trend of the time series changes between the first and second halves, the feature space (principal component space) obtained by principal component analysis splits into two main groups.
In the example of FIG. 20, regions (a) to (c) form one group and regions (e) to (f) form another.
In such a case, the protrusion rate cannot be determined accurately, and it becomes difficult to determine the smoothing coefficients of the time series.

In this embodiment, the following function is added to the protruding point determination unit 440 to solve this problem.
The configuration of the abnormality detection unit 4 according to this embodiment is the same as that shown in FIG. 3, and each element processes data as in Embodiment 1 except for the protruding point determination unit 440.

As shown in the upper part of FIG. 20, the protruding point determination unit 440 of Embodiment 1 uses all input data when calculating the protrusion rate. As a result, the principal component space contains one group formed by regions (a) to (c) and another formed by regions (e) to (f).
In this embodiment, the protruding point determination unit 440 copes with fluctuations in the time series by calculating the protrusion rate from neighboring regions only.

The lower part of FIG. 20 is a conceptual diagram of the protrusion rate determination method of the protruding point determination unit 440 according to Embodiment 2.
When calculating the protrusion rate of region (a), the regions on both sides, (z) to (b), are used. That is, the features of regions (z) to (b) are placed in the feature space and compared with one another to determine the protrusion rate.
Similarly, the protrusion rate of region (b) is calculated from regions (a) to (c).

The choice between the protrusion rate determination method of Embodiment 1 (upper part of FIG. 20) and that of Embodiment 2 (lower part of FIG. 20) can be made, for example, by using the Embodiment 2 method when the amount of time-series data is at or above a given threshold and the Embodiment 1 method when it is below that threshold.

As described above, in this embodiment the protruding point determination unit 440 sets the protrusion rate (deviation value) of each region (group) based on the relationship between the features of that region and the features of an arbitrary number of regions near it.
In the above description, only the two regions adjacent to the target region are used, but any number of nearby regions may be used; for example, five regions before and five after, or only the five preceding regions.
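Below is a sketch of the neighborhood-based rate of Embodiment 2, with `w` neighbors on each side:

- w = 1 corresponds to using only the two adjacent regions;
- w = 5 corresponds to five regions before and after.

A covariance estimated from three points is easily singular. This sketch therefore substitutes the Euclidean distance from the neighborhood centroid for the Mahalanobis distance. That substitution, the scaling to [0, 1], and the function name are assumptions.

```python
import math


def local_protrusion_rates(points, w=1):
    """Rate of region j judged only against regions j-w .. j+w (Embodiment 2 sketch).

    points[j] is the (PC1, PC2) feature of region j. Assumptions: Euclidean
    distance from the neighborhood centroid replaces the Mahalanobis distance,
    and distances are scaled to [0, 1] by the largest one.
    """
    n = len(points)
    dists = []
    for j in range(n):
        hood = points[max(0, j - w):min(n, j + w + 1)]  # truncated at the ends
        cx = sum(p[0] for p in hood) / len(hood)
        cy = sum(p[1] for p in hood) / len(hood)
        dists.append(math.hypot(points[j][0] - cx, points[j][1] - cy))
    top = max(dists)
    return [d / top if top > 0 else 0.0 for d in dists]


# Features drifting steadily (a trend), with one local spike at index 2.
trend = [(0, 0), (1, 1), (8, 8), (3, 3), (4, 4), (5, 5), (6, 6)]
print(local_protrusion_rates(trend))
```

Regions that follow the trend (for example index 4) get a rate of 0 because they sit at their neighborhood's centroid. The spike gets the highest rate even though the series as a whole drifts.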

Thus, according to this embodiment, when the time series to be smoothed fluctuates, narrowing the range of regions used to obtain the protrusion rate reduces the influence of the fluctuation. This makes it possible to detect protruding parts anywhere in the time series.

Embodiment 3.
As a variation of the case in Embodiment 2, the time series to be smoothed may contain protruding features frequently. In this case, depending on the number of such features, judging by groups in the feature space may invert which group is dominant and which is subordinate, so the protrusion rate is determined incorrectly.

FIG. 21 shows an example in which the protrusion rate is determined incorrectly.
In FIG. 21, many protruding regions ((a), (c) to (f)) appear in the time series.
When this data is subjected to principal component analysis and placed in the feature space, the protruding regions make up the larger share of the group.
If the protrusion rate is calculated from the relative sizes of the regions forming the group, region (b) is judged to be the one deviating from the group, and the relationship used to calculate the protrusion rate is reversed.
As a result, region (b) receives a high protrusion rate and the other regions receive low ones, which is far from the actual situation.

To correct this, the protruding point determination unit 440 according to this embodiment examines the actual values of the regions.
Judging from the feature-space distribution alone, regions (a) and (c) to (f) appear to form the center.
However, computing the average value of each region shows that region (b) has a smaller average than the others, so in this case the deviation in the feature space must be measured from region (b).
In this way, the protruding point determination unit 440 according to this embodiment calculates the average of the data values in each region and determines protrusion rates that better reflect the actual situation.
The configuration of the abnormality detection unit 4 according to this embodiment is the same as that shown in FIG. 3, and each element processes data as in Embodiment 1 except for the protruding point determination unit 440.

As described above, the protruding point determination unit according to this embodiment calculates, for each region (group), the average of the data in that region, and sets the protrusion rate (deviation value) of each region based on the regional averages and on the relationship between each region's features and those of the other regions.
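Below is one possible reading of Embodiment 3, sketched under two explicit assumptions:

- the region with the smallest average count is taken as the normal reference, like region (b) in FIG. 21;
- each region's protrusion rate is its Euclidean distance from that reference in the feature space, scaled to [0, 1].

The text does not give the exact rule, so the reference choice, the distance, and the function name are hypothetical. The feature values below are illustrative and were not produced by an actual PCA.

```python
import math


def mean_referenced_rates(points, regions):
    """Embodiment 3 sketch: measure deviation from the region with the lowest mean.

    points[j] is the (PC1, PC2) feature of region j and regions[j] its raw counts.
    Assumptions: the lowest-mean region is the normal reference, and Euclidean
    distance from its feature point, scaled to [0, 1], is the protrusion rate.
    """
    means = [sum(r) / len(r) for r in regions]
    ref = means.index(min(means))
    rx, ry = points[ref]
    dists = [math.hypot(x - rx, y - ry) for x, y in points]
    top = max(dists)
    return [d / top if top > 0 else 0.0 for d in dists]


# Most regions protrude (high counts); only region 1 is quiet, as (b) in FIG. 21.
regions = [[20, 22, 21], [2, 3, 2], [19, 21, 20], [22, 20, 21]]
points = [(4.0, 1.0), (-12.0, 0.0), (4.2, -0.5), (3.8, -0.5)]
print(mean_referenced_rates(points, regions))
```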

As described above, according to this embodiment, when protruding points are hard to determine from the feature-space distribution, obtaining the average value of each region prevents incorrect determination.

Embodiment 4.
This embodiment describes an example of accumulating learning data in the anomaly detection system.
That is, the learning data is smoothed prior to anomaly detection, the smoothed learning data is accumulated, and the accumulated learning data is used at the time of anomaly detection.

The left side of FIG. 22 outlines the processing flow of a conventional anomaly detection system.
Data obtained from the input data is analyzed by the anomaly detection system, and a warning is issued if an anomaly is found.
Conventionally, to judge whether input data is abnormal, a normal region of the input data was used as learning data. This learning data was redefined every time the input data was updated. Accumulating learning data instead would allow past results to be used as well, which is expected to improve detection accuracy. However, because the input data contains abnormal values, it is smoothed by the procedure of Embodiment 1 before being accumulated as learning data.
The right side of FIG. 22 outlines this processing flow.

In this embodiment, the smoothing is used in the learning-data accumulation process of the anomaly detection system.
Unlike ordinary smoothing, it smooths the characteristic parts preferentially, so input data can be used as learning data even when it contains noise.

Embodiment 5.
This embodiment describes the use of the smoothing in a time-series information search system.
FIG. 23 outlines the processing flow of such a search system.
In the conventional system shown on the left of FIG. 23, when time-series information is input, the dictionary database refers to its dictionary data and selects similar patterns.
However, because time-series information varies widely in its features, finding identical information is difficult.
Searching for related, similar information requires smoothing, but simple smoothing loses information in the input data, which also makes the search difficult. Moreover, if the input data contains noise, the search cannot be performed on it as is.
Here too, the method of Embodiment 1, which smooths noise preferentially, is effective.

The right side of FIG. 23 outlines the processing flow of the method according to this embodiment.
Instead of using the input data directly as the search key, it is first smoothed by the method of Embodiment 1. This widens the search range and copes with noise.
That is, the data (detection pattern) to be matched against the dictionary data in the dictionary database is smoothed by the smoothing unit of Embodiment 1, and the smoothed data (detection pattern) is output to the dictionary database.
The dictionary database searches for dictionary data that matches the smoothed data (detection pattern) and returns the search result.
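The search flow on the right of FIG. 23 can be sketched as a nearest-neighbor lookup. The following are all assumptions:

- the matching criterion (smallest Euclidean distance between equal-length series);
- the dictionary layout;
- the function name.

The smoothing step is passed in as a callable, so any implementation of the Embodiment 1 smoothing can be plugged in. The median-of-three filter in the usage lines is only a stand-in for that smoothing, not the patent's method.

```python
import math


def search_dictionary(pattern, dictionary, smooth):
    """Smooth the detection pattern, then return the name of the closest entry.

    Hypothetical sketch: `dictionary` maps names to series, "matching" means the
    smallest Euclidean distance, and entries whose length differs from the
    smoothed pattern are skipped. Returns None if nothing is comparable.
    """
    key = smooth(pattern)
    best_name, best_dist = None, math.inf
    for name, entry in dictionary.items():
        if len(entry) != len(key):
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(key, entry)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def median3(series):
    """Stand-in smoother (median of each point and its neighbors), not Embodiment 1."""
    out = []
    for i in range(len(series)):
        window = sorted(series[max(0, i - 1):i + 2])
        out.append(window[len(window) // 2])
    return out


dictionary = {"flat": [1, 1, 1, 1, 1], "ramp": [1, 2, 3, 4, 5]}
noisy = [1, 1, 9, 1, 1]
print(search_dictionary(noisy, dictionary, lambda s: s))  # ramp
print(search_dictionary(noisy, dictionary, median3))      # flat
```

Without smoothing, the single spike makes the ramp entry the nearest match. Once the spike is removed, the flat entry is found.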

In this embodiment, the smoothing process is used in the search process of a time-series information search system.
Because the time-series information other than the noise is preserved when the noise is removed, search accuracy can be improved.

Finally, an example hardware configuration of the unauthorized access analysis system 100 and the abnormality detection unit 4 described in Embodiments 1 to 5 is described.

FIG. 25 shows an example of the hardware resources of the unauthorized access analysis system 100 and the abnormality detection unit 4 described in Embodiments 1 to 5. The configuration of FIG. 25 is only an example; the hardware configuration of the unauthorized access analysis system 100 and the abnormality detection unit 4 is not limited to it, and other configurations may be used.

In FIG. 25, the unauthorized access analysis system 100 and the abnormality detection unit 4 include a CPU 911 (Central Processing Unit, also called a processing device, arithmetic unit, microprocessor, microcomputer, or processor) that executes programs. The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices. The CPU 911 may further be connected to an FDD 904 (Flexible Disk Drive), a compact disk device 905 (CDD), a printer device 906, and a scanner device 907. Instead of the magnetic disk device 920, a storage device such as an optical disk device or a memory card reader/writer may be used.
The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of a storage device or storage unit.
The communication board 915, the keyboard 902, the scanner device 907, the FDD 904, and the like are examples of an input unit or input device.
The communication board 915, the display device 901, the printer device 906, and the like are examples of an output unit or output device.

The communication board 915 may be connected to, for example, a LAN (Local Area Network), the Internet, or a WAN (Wide Area Network).
The magnetic disk device 920 stores an operating system (OS) 921, a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the operating system 921, and the window system 922.

The program group 923 stores programs that execute the functions described as "... unit" and "... means" in the first to fifth embodiments. The programs are read and executed by the CPU 911.
The file group 924 stores, as items of "... file" and "... database", the information, data, signal values, variable values, and parameters indicating the results of the processes described in the first to fifth embodiments as "determination of ...", "calculation of ...", "comparison of ...", "evaluation of ...", "judgment of ...", "setting of ...", "totaling of ...", and so on. The "... files" and "... databases" are stored on a recording medium such as a disk or memory. Information, data, signal values, variable values, and parameters stored on such a medium are read by the CPU 911 into main memory or cache memory via a read/write circuit and are used for CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, editing, output, printing, and display. During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, registers, cache memory, buffer memory, and the like.
The arrows in the flowcharts described in the first to fifth embodiments mainly indicate the input and output of data and signals. Data and signal values are recorded on recording media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disk of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical disks, mini disks, and DVDs. Data and signals are transmitted online via the bus 912, signal lines, cables, or other transmission media.

In the first to fifth embodiments, what is described as a "... unit" or "... means" may instead be a "... circuit", "... device", or "... equipment", or a "... step", "... procedure", or "... process". That is, what is described as a "... unit" or "... means" may be realized by firmware stored in the ROM 913, or implemented by software alone, by hardware alone (elements, devices, boards, wiring, and so on), by a combination of software and hardware, or by a combination that further includes firmware. Firmware and software are stored as programs on recording media such as a magnetic disk, flexible disk, optical disk, compact disk, mini disk, or DVD. The programs are read and executed by the CPU 911. In other words, a program causes a computer to function as the "... unit" and "... means" of the first to fifth embodiments, or causes a computer to execute the procedures and methods of the "... unit" and "... means" of the first to fifth embodiments.

As described above, the unauthorized access analysis system 100 and the abnormality detection unit 4 shown in the first to fifth embodiments are computers that include a CPU as a processing device; a memory, magnetic disk, and the like as storage devices; a keyboard, mouse, communication board, and the like as input devices; and a display device, communication board, and the like as output devices. They implement the functions indicated as "... unit" and "... means" using these processing, storage, input, and output devices.

FIG. 1 is a diagram illustrating a configuration example of the unauthorized access analysis system according to Embodiment 1.
FIG. 2 is a diagram illustrating the relationship between the unauthorized access analysis system according to Embodiment 1 and a monitoring target.
FIG. 3 is a diagram illustrating a configuration example of the abnormality detection unit according to Embodiment 1.
FIG. 4 is a diagram illustrating an example of input data before totaling in the data input/processing unit according to Embodiment 1.
FIG. 5 is a diagram illustrating an example of input data after totaling in the data input/processing unit according to Embodiment 1.
FIG. 6 is a diagram illustrating an example of input data of the feature quantity analysis unit according to Embodiment 1.
FIG. 7 is a diagram illustrating an example of the relationship between time-series data and feature quantities in the feature quantity analysis unit according to Embodiment 1.
FIG. 8 is a diagram illustrating an example of the output format of the feature quantity analysis unit according to Embodiment 1.
FIG. 9 is a diagram illustrating an example of input data of the protruding point determination unit according to Embodiment 1.
FIG. 10 is a diagram illustrating an example of output data of the protruding point determination unit according to Embodiment 1.
FIG. 11 is a diagram illustrating an example of input data supplied from the protruding point determination unit to the smoothing coefficient calculation unit according to Embodiment 1.
FIG. 12 is a diagram illustrating an example of input data supplied from the data input/processing unit to the smoothing coefficient calculation unit according to Embodiment 1.
FIG. 13 is a diagram illustrating an example of output data of the smoothing coefficient calculation unit according to Embodiment 1.
FIG. 14 is a diagram illustrating a specific example of the data totaling process of the data input/processing unit according to Embodiment 1.
FIG. 15 is a diagram illustrating a specific example of the region division process of the feature quantity analysis unit according to Embodiment 1.
FIG. 16 is a diagram illustrating a specific example of the principal component analysis process of the feature quantity analysis unit according to Embodiment 1.
FIG. 17 is a diagram illustrating a specific example of the process of arranging data in the principal component space by the protruding point determination unit according to Embodiment 1.
FIG. 18 is a diagram illustrating a specific example of the protruding point determination process of the protruding point determination unit according to Embodiment 1.
FIG. 19 is a diagram illustrating a specific example of the smoothing process of the smoothing unit according to Embodiment 1.
FIG. 20 is a diagram illustrating a specific example of the protruding point determination process of the protruding point determination unit according to Embodiment 2.
FIG. 21 is a diagram illustrating a specific example of the protruding point determination process of the protruding point determination unit according to Embodiment 3.
FIG. 22 is a diagram illustrating the method according to Embodiment 4.
FIG. 23 is a diagram illustrating the method according to Embodiment 5.
FIG. 24 is a flowchart illustrating an operation example of the abnormality detection unit according to Embodiment 1.
FIG. 25 is a diagram illustrating a hardware configuration example of the unauthorized access analysis system and the abnormality detection unit according to Embodiments 1 to 5.
FIG. 26 is a diagram explaining the prior art.

Explanation of symbols

1 GUI, 2 countermeasure unit, 3 unauthorized access determination unit, 4 abnormality detection unit, 5 log information totaling unit, 6 information collection unit, 100 unauthorized access analysis system, 410 input data, 420 data input/processing unit, 430 feature quantity analysis unit, 440 protruding point determination unit, 450 smoothing coefficient calculation unit, 460 smoothing unit.

Claims

A data processing apparatus comprising:
a divergence value setting unit that analyzes a plurality of data items each having a data value and sets, for each data item, the degree to which its data value deviates from the data values of the other data items as a divergence value;
a smoothing coefficient calculating unit that calculates, for each data item, a smoothing coefficient for smoothing that reflects the divergence value of the data item set by the divergence value setting unit; and
a smoothing unit that smooths each data item using the smoothing coefficient of that data item calculated by the smoothing coefficient calculating unit.

The data processing apparatus according to claim 1, wherein the smoothing coefficient calculating unit calculates the smoothing coefficient such that the degree of smoothing by the smoothing unit changes in accordance with the divergence value set by the divergence value setting unit.

The data processing apparatus, wherein
the divergence value setting unit analyzes a plurality of data items arranged in a predetermined order,
the smoothing unit smooths a data item to be smoothed by a moving average calculation using an arbitrary number of data items, and
the smoothing coefficient calculating unit calculates a smoothing coefficient such that the divergence value set by the divergence value setting unit and the number of data items used in the moving average calculation by the smoothing unit are proportional to each other.
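One way to read the claim above is as a moving average whose length grows in proportion to each point's divergence value. The Python sketch below assumes one divergence value per data point and a hypothetical proportionality constant `scale`; the function name `adaptive_moving_average`, the rounding to a whole number of samples, and the truncation of windows at the ends of the series are illustrative choices, not taken from the source.

```python
from typing import List, Sequence


def adaptive_moving_average(
    values: Sequence[float], divergence: Sequence[float], scale: float
) -> List[float]:
    """Smooth each point with a moving average whose length is proportional to its divergence.

    `scale` is a hypothetical constant converting a divergence value into a
    number of samples; points whose window length rounds to 1 are left unchanged.
    """
    if len(values) != len(divergence):
        raise ValueError("values and divergence must have the same length")
    smoothed = []
    for i, d in enumerate(divergence):
        n = max(1, round(scale * d))  # samples averaged, proportional to the divergence
        start = i - (n - 1) // 2  # roughly centre the window on point i
        lo, hi = max(0, start), min(len(values), start + n)  # truncate at the series edges
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    series = [1, 1, 10, 1, 1]
    div = [0.0, 0.0, 1.0, 0.0, 0.0]  # only the spike is judged divergent
    print(adaptive_moving_average(series, div, scale=3))  # [1.0, 1.0, 4.0, 1.0, 1.0]
```

Because the window length tracks the divergence, a strongly protruding point is averaged over more neighbours, while points with zero divergence keep their original values; this is the "preferential smoothing of feature places" that the abstract describes.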

The data processing apparatus according to claim 1, further comprising
a feature quantity analysis unit that divides a plurality of data items arranged in a predetermined order into a plurality of groups according to that order, performs principal component analysis on the data values of the data items included in each group, and calculates a feature quantity of each group, wherein
the divergence value setting unit sets, for each group formed by the feature quantity analysis unit, the degree to which the feature quantity of that group deviates from the feature quantities of the other groups as a divergence value, and
the smoothing coefficient calculating unit calculates a relative evaluation value of the data value of each data item within the group to which the data item belongs, and calculates a smoothing coefficient for smoothing each data item that reflects the relative evaluation value of the data item and the divergence value of the group to which it belongs.
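The claim leaves the exact feature quantity open. One plausible reading, suggested by the abstract's description of arranging each region's feature vector on a two-dimensional plane, is to treat each group of `group_size` consecutive values as a vector and project it onto the first two principal components computed across all groups. The pure-Python sketch below follows that assumption; the names `group_features` and `group_size`, the use of power iteration for the eigenvectors, and the dropping of an incomplete final group are all illustrative choices.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _unit(v: List[float], orth: Optional[List[float]]) -> Optional[List[float]]:
    """Remove the component along `orth` (if given) and normalise; None if nothing is left."""
    if orth is not None:
        p = _dot(v, orth)
        v = [a - p * b for a, b in zip(v, orth)]
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 1e-12 else None


def _leading_eigenvector(m: List[List[float]], orth: Optional[List[float]] = None,
                         iters: int = 500) -> List[float]:
    """Power iteration on a symmetric PSD matrix, optionally restricted to the complement of `orth`."""
    v = _unit([float(i + 1) for i in range(len(m))], orth)  # non-uniform start vector
    for _ in range(iters):
        if v is None:
            break
        v = _unit([_dot(row, v) for row in m], orth)
    return v if v is not None else [0.0] * len(m)  # zero vector: no variance left


def group_features(values: Sequence[float], group_size: int) -> List[Tuple[float, float]]:
    """Split the series into consecutive groups and project each onto the first two principal components."""
    m = len(values) // group_size  # an incomplete trailing group is dropped
    if m < 2:
        raise ValueError("need at least two complete groups")
    rows = [list(values[g * group_size:(g + 1) * group_size]) for g in range(m)]
    mean = [sum(col) / m for col in zip(*rows)]
    centered = [[x - mu for x, mu in zip(row, mean)] for row in rows]
    # Sample covariance matrix (group_size x group_size) of the group vectors.
    cov = [[sum(r[i] * r[j] for r in centered) / (m - 1) for j in range(group_size)]
           for i in range(group_size)]
    pc1 = _leading_eigenvector(cov)
    pc2 = _leading_eigenvector(cov, orth=pc1)  # second component, orthogonal to the first
    return [(_dot(r, pc1), _dot(r, pc2)) for r in centered]
```

For example, `group_features([1, 1, 1, 1] * 4 + [1, 9, 1, 1], 4)` places the four identical groups together at about -1.6 on the first axis and the last group far from them at about 6.4, so that group would be the natural candidate for a high divergence value. In practice a linear-algebra library would normally replace the hand-written power iteration.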

The data processing apparatus according to claim 4, wherein the smoothing coefficient calculating unit calculates a smoothing coefficient such that the relative evaluation value and the divergence value are each proportional to the degree of smoothing by the smoothing unit.
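The claim above states only that the coefficient is proportional to both the relative evaluation value and the divergence value; the simplest form with that property is their product. The sketch below assumes that form and, since the relative evaluation value is not defined here, takes it hypothetically as the item's absolute deviation from its group mean divided by the largest such deviation in the group. `smoothing_coefficients`, `group_size`, and `group_divergence` are illustrative names.

```python
from typing import List, Sequence


def smoothing_coefficients(
    values: Sequence[float], group_size: int, group_divergence: Sequence[float]
) -> List[float]:
    """Coefficient of each data item = relative evaluation value x divergence of its group.

    Relative evaluation value (an assumption): the item's absolute deviation from
    its group mean, normalised by the largest deviation in the group (range 0..1).
    """
    coefficients = []
    for g, divergence in enumerate(group_divergence):
        group = values[g * group_size:(g + 1) * group_size]
        if not group:
            raise ValueError("group_divergence has more entries than there are groups")
        mean = sum(group) / len(group)
        deviations = [abs(x - mean) for x in group]
        peak = max(deviations)
        for dev in deviations:
            relative = dev / peak if peak > 0 else 0.0  # flat group: nothing stands out
            coefficients.append(relative * divergence)
    return coefficients
```

With `values = [1, 1, 1, 1, 1, 9, 1, 1]`, `group_size = 4`, and divergences `[0.0, 2.0]`, every item in the first group gets a coefficient of 0, while the spike at index 5 gets the largest coefficient, 2.0, so it would receive the strongest smoothing.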

The data processing apparatus according to claim 4, wherein the smoothing unit smooths a data item to be smoothed by a moving average calculation using an arbitrary number of data items, determines the number of data items used in the moving average calculation according to the smoothing coefficient calculated by the smoothing coefficient calculating unit, and smooths the data by a moving average calculation over the determined number of data items.

The data processing apparatus according to claim 4, wherein the divergence value setting unit sets the divergence value of each group based on the relationship between the feature quantity of that group and the feature quantities of an arbitrary number of groups close to it.
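The claim above bases a group's divergence on the groups nearest to it in feature space. A common way to turn that into a number is a k-nearest-neighbour outlier score; the sketch below assumes the mean Euclidean distance to the `k` nearest other groups, which is one choice among several (the k-th distance alone would also fit the wording). `knn_divergence` and `k` are illustrative names, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence


def knn_divergence(features: Sequence[Sequence[float]], k: int) -> List[float]:
    """Divergence of each group: mean Euclidean distance from its feature
    vector to the feature vectors of its k nearest other groups."""
    if not 1 <= k < len(features):
        raise ValueError("k must satisfy 1 <= k < number of groups")
    scores = []
    for i, f in enumerate(features):
        distances = sorted(math.dist(f, g) for j, g in enumerate(features) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    print(knn_divergence(pts, k=2))  # the isolated group at (10, 10) scores highest
```

Using only the nearest groups, rather than all of them, keeps a cluster of similar groups from inflating one another's scores, so an isolated group stands out even when most groups are tightly packed.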

The data processing apparatus according to claim 4, wherein the divergence value setting unit calculates, for each group, the average value of the data in that group, and sets the divergence value of each group based on the average value of the group and on the relationship between the feature quantity of the group and the feature quantities of the other groups.

The data processing apparatus according to claim 1, wherein the data processing apparatus generates, through the smoothing of data by the smoothing unit, learning data for detecting an abnormality in predetermined inspection target data, and the divergence value setting unit starts setting divergence values when the inspection target data is input.

The data processing apparatus according to claim 1, wherein the data processing apparatus generates, through the smoothing of data by the smoothing unit, learning data for detecting an abnormality in predetermined inspection target data, and the smoothing of data by the smoothing unit and the generation of the learning data are completed before the inspection target data is input.

The data processing apparatus according to claim 1, wherein the data processing apparatus is connected to a dictionary database that stores predetermined dictionary data, and the smoothing unit smooths data to be collated with the dictionary data in the dictionary database and outputs the smoothed data to the dictionary database.

A data processing method comprising:
a divergence value setting step in which a computer analyzes a plurality of data items each having a data value and sets, for each data item, the degree to which its data value deviates from the data values of the other data items as a divergence value;
a smoothing coefficient calculating step in which the computer calculates, for each data item, a smoothing coefficient for smoothing that reflects the divergence value of the data item set in the divergence value setting step; and
a smoothing step in which the computer smooths each data item using the smoothing coefficient of that data item calculated in the smoothing coefficient calculating step.

A program for causing a computer to execute:
a divergence value setting process of analyzing a plurality of data items each having a data value and setting, for each data item, the degree to which its data value deviates from the data values of the other data items as a divergence value;
a smoothing coefficient calculation process of calculating, for each data item, a smoothing coefficient for smoothing that reflects the divergence value of the data item set by the divergence value setting process; and
a smoothing process of smoothing each data item using the smoothing coefficient of that data item calculated by the smoothing coefficient calculation process.