JP2021140675A

JP2021140675A - Performance analysis device, performance analysis method, and performance analysis program

Info

Publication number: JP2021140675A
Application number: JP2020040198A
Authority: JP
Inventors: Jana Backhus; Yosuke Himura; Mineyoshi Masuda
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2021-09-16
Anticipated expiration: 2040-03-09
Also published as: JP7285798B2

Abstract

To reduce the load on humans and to appropriately analyze the performance of a system. SOLUTION: An abnormality detection device 100 analyzes performance using performance data that includes a plurality of entries, each containing time information, specification information on a plurality of specifications indicating a context, and performance information. The abnormality detection device includes: a data instance generation unit 110 that divides the performance data into a plurality of data instances 120-1 to 120-N based on the specification information for at least one specification of the entries of the performance data; and a data instance labeling unit 130 that evaluates the data characteristics of the data instances 120-1 to 120-N, identifies a performance analysis method according to the evaluated data characteristics, performs performance analysis on the entries belonging to the data instances 120-1 to 120-N using the identified method, and attaches labels indicating the performance analysis results. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for analyzing the performance of a system using performance data that has time information.

In recent years, there has been growing interest in automating IT operations management to improve the reliability, availability, and security of IT systems. The main task of IT operations management is monitoring and maintaining the health of IT systems, which includes work that is currently performed mainly by human operators.

If part of an IT system malfunctions, a human operator must determine the current extent of the problem as well as its cause. This can be a very time-consuming process if done solely by manual search.

In recent years, the number of IT systems in organizations has steadily increased, and the use of large-scale IT system entities such as data centers has become increasingly common. Rapid problem recognition therefore becomes increasingly difficult, yet it also becomes more important as human dependence on IT systems grows in all areas of life. Automatically detecting IT system health problems by analyzing IT system data with a computational approach such as machine learning can alleviate this problem.

Another important aspect is the reusability of a designed problem detection approach. As the number of IT systems grows, IT system health problems must be considered under many different problem contexts, but designing a new solution for each change in problem context requires great human effort and knowledge of the specific machine learning domain. An anomaly detection approach that can easily be replicated for different problem contexts therefore brings relief to IT operations managers, especially when no explicit machine learning domain knowledge is required.

For example, Patent Document 1 discloses, as a related technique, a method of determining a threat risk score for groups of clustered outliers. In this method, different features are used to obtain hints about the outlier type, and outliers are identified and assigned threat risk scores by applying data-dependent mathematical models and ML models, respectively.

Patent Document 1: U.S. Patent Application Publication No. 2019/0260793

IT operations managers face the challenge of finding problems in the performance of IT systems. Analyzing IT system performance data supports the detection of anomalies and their initial causes in different problem contexts, that is, with respect to attributes of the IT system. At present, most proposed analysis methods require substantial human setup effort for each change in problem context.

The technique of Patent Document 1 establishes a two-step outlier detection method that can detect outliers and assign a score to each outlier for different problem contexts. However, it requires substantial human modeling effort for each new problem context, including decisions on feature engineering, on the selection of machine learning or mathematical models, and on model tuning (for example, parameters). Sufficient training data must also be secured.

The present invention has been made in view of the above circumstances, and its object is to provide a technique that can reduce the load on humans and appropriately analyze the performance of a system.

To achieve the above object, a performance analysis device according to one aspect analyzes performance using performance data that includes a plurality of data elements, each containing time information, specification information on a plurality of specifications indicating a context, and performance information. The performance analysis device includes: a data instance generation unit that divides the performance data into a plurality of data instances based on the specification information for at least one specification of the data elements of the performance data; and a labeling unit that evaluates the data characteristics of each data instance, identifies a performance analysis method according to the evaluated data characteristics, performs performance analysis on the data elements belonging to the data instance using the identified performance analysis method, and attaches labels indicating the performance analysis results.

According to the present invention, the load on humans can be reduced, and the performance of a system can be appropriately analyzed.

FIG. 1 is an overall configuration diagram of an abnormality detection system including an abnormality detection device according to an embodiment.
FIG. 2 is a configuration diagram of a performance data database according to an embodiment.
FIG. 3 is a flowchart of the data instance generation process performed by the data instance generation unit according to an embodiment.
FIG. 4 is a flowchart of the data context selection process performed by the data context selection unit according to an embodiment.
FIG. 5 is a data structure diagram of a data instance according to an embodiment.
FIG. 6 is a flowchart of the data instance labeling process performed by the data instance labeling unit according to an embodiment.
FIG. 7 is a configuration diagram of a method pool according to an embodiment.
FIG. 8 is a configuration diagram of a data instance label database according to an embodiment.
FIG. 9 is a flowchart of the label data scoring process performed by the label data scoring unit according to an embodiment.
FIG. 10 is a configuration diagram of a context score database according to an embodiment.
FIG. 11 is a flowchart of the visualization process performed by the visualization processing unit according to an embodiment.
FIG. 12 is a hardware configuration diagram of the abnormality detection device according to an embodiment.
FIG. 13 is a diagram showing an example of a GUI screen according to an embodiment.

An embodiment will be described with reference to the drawings. The embodiment described below does not limit the invention according to the claims, and not all of the elements and combinations thereof described in the embodiment are necessarily essential to the solution of the invention.

In the following description, information may be described using the expression "AAA table", but the information may be expressed in any data structure. That is, to show that the information does not depend on the data structure, an "AAA table" can also be called "AAA information".

In the following description, data context specifications are a problem context defined in the form of several columns in the data set of performance data to be analyzed.

Applied data context specifications are the data context specifications that are selected for use in the abnormality detection process and are used (applied) when dividing the performance data into data instances.

A data instance is one of the smaller entities obtained by dividing a data set of performance data according to the configuration of the applied data context specifications.

FIG. 1 is an overall configuration diagram of an abnormality detection system including an abnormality detection device according to an embodiment.

The abnormality detection system includes an abnormality detection device 100 as an example of a performance analysis device, a performance data database (DB) 200, a console 300, and a display 400.

The performance data DB 200 stores performance data tables 202 (see FIG. 2), which contain the data sets of performance data (performance data sets) analyzed by the abnormality detection device 100, and a data context table 201, which contains data contexts 201a (entries of the data context table 201) that define information about the attributes of each performance data set. In the present embodiment, the performance data DB 200 is assumed to be provided outside the abnormality detection device 100, for example in a device connected via a network (not shown), but it may instead be provided inside the abnormality detection device 100. Details of the performance data DB 200 will be described later with reference to FIG. 2.

The abnormality detection device 100 identifies abnormalities in a performance data set of the performance data DB 200 by dividing the performance data set into data instances 120 (120-1 to 120-N) according to the data context 201a and assigning an event label to each piece of data (data element) in each data instance 120. The abnormality detection device 100 also calculates abnormality scores based on the labeled data in the data instance label DB 160 and the data context, and identifies the abnormality of each event label.

The console 300 is an input device connected to the abnormality detection device 100. The console 300 enables management tasks by the administrator of the abnormality detection device 100. Specifically, the console 300 accepts settings for the abnormality detection device 100 from the administrator and accepts changes to the visualized content (screen) on the display 400 from the user of the abnormality detection device 100.

The display 400 is an output device that can visualize the results of the abnormality detection device 100 using a GUI. In the present embodiment, the display 400 shows abnormality detection results, such as abnormality scores, in the GUI. A display example of the GUI on the display 400 will be described later with reference to FIG. 13.

The abnormality detection device 100 includes a data instance generation unit 110, a data instance labeling unit 130 as an example of a labeling unit, a recursive cause identification unit 140, a method pool 150, a data instance label database (DB) 160, and a context score database (DB) 170.

The present embodiment describes the case in which the abnormality detection device 100 acquires (receives), as the processing target, one data context of a data context table 201 (one entry, referred to as the target entry; in this example, the entry in the first row) and the one performance data set corresponding to it (the performance data table 202 corresponding to the entry).

In the abnormality detection device 100, the data instance generation unit 110 preprocesses the performance data set received together with the data context 201a. The data instance generation unit 110 has a data context selection unit 111 as an example of a selection unit.

This preprocessing step includes the selection of data context specifications by the data context selection unit 111 and the formatting of the data of those data context specifications. Information on data formatting can be obtained by referring to the method pool 150.

The applied data context specifications selected by the data context selection unit 111 are stored in the performance data DB 200. They are subsequently used to divide the performance data into several data instances 120. Details will be described later with reference to FIGS. 3, 4, and 5.

The data instances 120 obtained in this way are then further processed by the data instance labeling unit 130, whose task is to assign event labels to each data instance by selecting the best method from the method pool 150. The labeled data instances are then stored in the data instance label DB 160. Details will be described later with reference to FIGS. 6, 7, and 8.

The labeled data instances from the data instance label DB 160 are further processed by the recursive cause identification unit 140. The recursive cause identification unit 140 includes a label data scoring unit 141 as an example of a scoring unit, and a visualization processing unit 142. The task of the recursive cause identification unit 140 is to use the applied data context specification information received from the data context selection unit 111 for aggregation, to calculate an abnormality score for each event label in the label data scoring unit 141, and to store the score information in the context score DB 170. In addition, the visualization processing unit 142 prepares the aggregation and score results from the context score DB 170 for display on the display 400. When new input is received from the user of the abnormality detection device 100 via the console 300, a new visualization by the visualization processing unit 142 or a recalculation of the scores by the label data scoring unit 141 is triggered. Further details will be described later with reference to FIGS. 9, 10, and 11.

Next, the performance data DB 200 will be described.

FIG. 2 is a configuration diagram of a performance data database according to an embodiment.

The performance data DB 200 includes two types of tables: a data context table 201 and performance data tables 202 (202-1 to 202-n).

The data context table 201 contains one entry per data context. Each entry of the data context table 201 includes the columns data ID D20101, performance data table ID D20102, data context specifications D20103, applied data context specifications D20104, format dictionary D20105, and labeling goal D20106.

The data ID D20101 stores a data ID, which is a unique value associated with a performance data set (the data set stored in one of the performance data tables 202). The performance data table ID D20102 stores a pointer to the one of the performance data tables 202-1 to 202-n that stores the performance data set. The data context specifications D20103 stores the names of the columns (specifications, or items) of the performance data table that are regarded as data context information.

The applied data context specifications D20104 stores the names of the columns (specifications) of the performance data table that the data context selection unit 111 has selected as applied data context specifications. The format dictionary D20105 stores an optional dictionary that associates the name of a conversion program that performs formatting with the name of the column (specification) of the performance data table to which it applies. This dictionary is defined, for example, by the user of the abnormality detection device 100. The labeling goal D20106 stores the goal of labeling the performance data set (labeling goal), which the data instance labeling unit 130 needs in order to select the correct labeling method.
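As a reading aid, one entry of the data context table 201 can be pictured as a small record type. The following is only a sketch: the field names, the example values, and the script name `mask_ip.py` are assumptions for illustration and do not appear in the publication.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataContextEntry:
    """One entry (row) of the data context table 201; field names are hypothetical."""
    data_id: str                     # D20101: unique ID of the performance data set
    performance_table_id: str        # D20102: pointer to one performance data table 202
    context_specs: List[str]         # D20103: column names treated as data context
    # D20104: left empty until the data context selection unit 111 fills it in
    applied_context_specs: List[str] = field(default_factory=list)
    # D20105: optional mapping from column name to conversion program name
    format_dictionary: Dict[str, str] = field(default_factory=dict)
    labeling_goal: str = ""          # D20106: goal used to pick a labeling method


# Example values are invented for illustration only.
entry = DataContextEntry(
    data_id="data-1",
    performance_table_id="202-1",
    context_specs=["URI", "source IP", "HTTP method"],
    format_dictionary={"source IP": "mask_ip.py"},
    labeling_goal="anomaly detection",
)
```

Keeping the applied specifications separate from the declared ones mirrors the table layout, in which D20104 is written only after the selection process has run.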

Each performance data table 202 (202-1 to 202-n) stores one performance data set. The structure of a performance data table 202 depends on the type of performance data set it stores. A performance data table 202 stores one entry (row, or data element) per piece of performance data in the performance data set. The following description of this table uses, as an example, performance data on accesses to several websites.

An entry (data element) of the performance data table 202 includes, for example, the columns time D20201, URI D20202, source IP D20203, HTTP method D20204, and performance indexes 1 to N D20205 to D202N. In this example, the information in URI D20202, source IP D20203, and HTTP method D20204 is an example of specification information, and performance indexes 1 to N D20205 to D202N are an example of performance information.

The time D20201 stores time information (for example, year, month, day, hour, minute, and second; an example of time information) for the data of each entry of the performance data set. The URI D20202 stores the URI (Uniform Resource Identifier, that is, the web address) of the destination website indicated by the performance data corresponding to the entry. In the present embodiment, this URI is one example of a data context specification. The source IP D20203 stores the IP address of the sender (source IP) of the communication indicated by the performance data corresponding to the entry. This IP address is one example of a data context specification.

The HTTP method D20204 stores the HTTP request method used when accessing the website indicated by the performance data. This request method is one example of a data context specification. Performance indexes 1 to N D20205 to D202N store the performance indexes (metric values, generally numerical) of the performance data. The number of types of performance index is arbitrary, and as many columns are provided as there are types.

Next, the data instance generation process performed by the data instance generation unit 110 will be described.

FIG. 3 is a flowchart of the data instance generation process performed by the data instance generation unit according to an embodiment.

The data instance generation unit 110 receives from the performance data DB 200 the performance data set (one of the performance data tables) together with the data context information to be processed (an entry of the data context table 201) (step S11001).

Next, the data instance generation unit 110 acquires the applied data context specifications from the data context selection unit 111 (step S11002). The data context selection unit 111 selects the applied data context specifications through the data context selection process shown in FIG. 4 and sends them to the data instance generation unit 110. The data context selection process will be described later with reference to FIG. 4.

Next, the data instance generation unit 110 creates a list containing the combinations of the unique values of the applied data context specifications (step S11003). For example, if the specification "URI" included in the applied data context specifications has the unique values "URI1" and "URI2", and the specification "source IP" included in the applied data context specifications has the unique value "10.0.*.*", the data instance generation unit 110 creates a list containing the two combinations of these values, "(URI1, 10.0.*.*), (URI2, 10.0.*.*)".
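Step S11003 can be sketched as follows. The example in the text is consistent with taking the Cartesian product of each specification's unique values, so this sketch does that; whether the embodiment instead lists only the combinations that actually occur in the data is not stated, and the function and variable names are hypothetical.

```python
from itertools import product


def unique_value_combinations(rows, applied_specs):
    """Return every combination of the unique values of the applied specifications."""
    unique_per_spec = []
    for spec in applied_specs:
        seen = []
        for row in rows:
            if row[spec] not in seen:
                seen.append(row[spec])  # keep first-seen order for readability
        unique_per_spec.append(seen)
    # Cartesian product across specifications (an assumption, see lead-in).
    return list(product(*unique_per_spec))


# Toy performance data rows mirroring the example in the text.
rows = [
    {"URI": "URI1", "source IP": "10.0.*.*"},
    {"URI": "URI2", "source IP": "10.0.*.*"},
    {"URI": "URI1", "source IP": "10.0.*.*"},
]
combinations = unique_value_combinations(rows, ["URI", "source IP"])
print(combinations)
```

With the toy rows above, the list contains the two combinations named in the text, (URI1, 10.0.*.*) and (URI2, 10.0.*.*).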

Next, the data instance generation unit 110 executes the processing of loop 1 (steps S11004 and S11005) for each combination in the list. The combination being processed is referred to as the target combination.

In the processing of loop 1, the data instance generation unit 110 extracts, from the performance data table 202 (performance data set) of the performance data DB 200 that corresponds to the applied data context specifications, the entries (rows) containing the values of the target combination, and generates the data instance corresponding to the target combination (step S11004).

Next, the data instance generation unit 110 determines the ideal time window size for the data instance obtained in step S11004 by taking into account the sparseness of the data instance over time (step S11005). For a highly sparse data instance (for example, one whose sparseness exceeds a predetermined threshold), a larger time window size is chosen so that the sparseness falls to or below a predetermined level, in order to achieve more accurate labeling. This allows the number of entries in the data instance to be adjusted to a number suitable for processing.
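The publication does not specify how sparseness is measured or how the window is enlarged. The sketch below shows one possible reading under stated assumptions: sparseness is taken to be the share of empty time windows, and the window is doubled until that share is at or below a threshold. The function name, the parameters, and all default values are hypothetical.

```python
def choose_window_size(timestamps, base_window=60, max_empty_ratio=0.5,
                       max_window=86400):
    """Pick a time window (seconds) so that the share of empty windows is small.

    Sparseness is measured here as the fraction of windows holding no entry;
    this measure and the doubling strategy are assumptions of this sketch.
    """
    if not timestamps:
        return max_window
    start, end = min(timestamps), max(timestamps)
    window = base_window
    while window < max_window:
        n_windows = int((end - start) // window) + 1
        occupied = {int((t - start) // window) for t in timestamps}
        empty_ratio = 1 - len(occupied) / n_windows
        if empty_ratio <= max_empty_ratio:
            break  # sparseness is at or below the chosen level
        window *= 2  # sparse instance: try a larger window
    return min(window, max_window)


dense = choose_window_size([0, 60, 120, 180])   # one entry per minute
sparse = choose_window_size([0, 600, 1200])     # one entry every ten minutes
print(dense, sparse)
```

In this sketch the dense series keeps the base window, while the sparse series ends up with a larger one, which matches the direction described in the text.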

After performing the processing of loop 1 for one target combination, the data instance generation unit 110 performs it for the next unprocessed combination. Once the processing of loop 1 has been performed for all combinations in the list, the data instance generation unit 110 exits loop 1 and ends the data instance generation process.
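The extraction part of loop 1 (step S11004) amounts to filtering the performance data rows by each combination. A minimal sketch, assuming rows are plain dictionaries and using hypothetical names:

```python
def generate_data_instances(rows, applied_specs, combinations):
    """Split rows into one data instance per combination (sketch of step S11004)."""
    instances = {}
    for combo in combinations:  # loop 1: one pass per target combination
        instances[combo] = [
            row for row in rows
            if tuple(row[spec] for spec in applied_specs) == combo
        ]
    return instances


# Toy rows; the "time" values are invented for illustration.
rows = [
    {"time": 1, "URI": "URI1", "source IP": "10.0.*.*"},
    {"time": 2, "URI": "URI2", "source IP": "10.0.*.*"},
    {"time": 3, "URI": "URI1", "source IP": "10.0.*.*"},
]
instances = generate_data_instances(
    rows,
    ["URI", "source IP"],
    [("URI1", "10.0.*.*"), ("URI2", "10.0.*.*")],
)
```

Each value in `instances` is the list of entries belonging to one data instance; the time window selection of step S11005 would then be applied per instance.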

Next, the data context selection process (step S11002) will be described.

FIG. 4 is a flowchart of the data context selection process performed by the data context selection unit according to an embodiment.

The data context selection unit 111 executes the processing of loop 2 (steps S11101 to S11105) for each specification of the data context specifications being processed. Here, the data context specifications being processed are referred to as the target data context specifications, and the specification currently being processed in loop 2 is referred to as the target specification.

In loop 2, the data context selection unit 111 acquires a format rule for changing the format of the values of the target specification from the format dictionary D20105 of the entry of the data context table 201 in the performance data DB 200 that corresponds to the target data context specifications (step S11101). In the present embodiment, format rules take the form of a dictionary in which the name of a specification is associated with a pointer to an applicable program.

Next, the data context selection unit 111 determines whether the rule acquired in step S11101 is available for the target specification (step S11102).

If a format rule is available for the target specification (S11102: Yes), the data context selection unit 111 advances the process to step S11105.

If no format rule is available for the target specification (S11102: No), the data context selection unit 111 advances the process to step S11103.

In step S11103, the data context selection unit 111 acquires the format rule (here, an entry) corresponding to the target specification that is defined in the context formatting table 152 (see FIG. 7) of the method pool 150.

Next, the data context selection unit 111 determines whether to apply the format rule to the values of the target specification by checking whether those values (data) match the expected regular expression given in the expected regular expression format D15202 of the context formatting table 152 (step S11104).

この結果、対象諸元の値が期待正規表現である場合（Ｓ１１１０４：Ｎｏ）には、データフォーマットを変更しなくてもよいことを意味しているので、データコンテキスト選択部１１１は、処理をループ２の終わりに進める。 As a result, when the value of the target specification matches the expected regular expression (S11104: No), the data format does not need to be changed, so the data context selection unit 111 advances the process to the end of loop 2.

一方、対象諸元が期待正規表現でない場合（Ｓ１１１０４：Ｙｅｓ）には、データフォーマットを変更する必要があることを意味しているので、データコンテキスト選択部１１１は、取得したエントリのフォーマット処理Ｄ１５２０３からフォーマットを実行するプログラム（スクリプト）のポインタを取得し、処理をステップＳ１１１０５に進める。 On the other hand, when the value of the target specification does not match the expected regular expression (S11104: Yes), the data format needs to be changed, so the data context selection unit 111 acquires, from the format processing D15203 of the acquired entry, the pointer to the program (script) that executes the formatting, and advances the process to step S11105.

ステップＳ１１１０５では、データコンテキスト選択部１１１は、ステップＳ１１１０２で取得されたフォーマットルール又はステップＳ１１１０４で取得されたプログラムに従って、対象諸元の値をフォーマットする。 In step S11105, the data context selection unit 111 formats the values of the target specifications according to the format rule acquired in step S11102 or the program acquired in step S11104.

データコンテキスト選択部１１１は、１つの対応諸元に対してループ２の処理を終えた後には、他の諸元を新たな処理対象としてループ２の処理を実行し、全ての諸元を処理対象とした後に、ループ２を抜けて、処理をステップＳ１１１０６に進める。 After finishing the processing of loop 2 for one specification, the data context selection unit 111 executes loop 2 with another specification as the new processing target. After all the specifications have been processed, it exits loop 2 and advances the process to step S11106.
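
The per-specification formatting decision of loop 2 (S11101 to S11105) can be sketched as follows. This is a minimal illustration only, not the embodiment's implementation: the table contents, the `format_value` helper, the example URI normalizer, and the choice to leave a value unchanged when no table entry exists are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the context formatting table 152:
# specification type -> (expected regular expression, formatting program).
CONTEXT_FORMATTING_TABLE: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "URI": (
        r"[a-z0-9.-]+",
        # Assumed normalizer: lower-case, drop the scheme, keep only the host.
        lambda v: re.sub(r"^[a-z]+://", "", v.strip().lower()).split("/")[0],
    ),
}


def format_value(spec_name: str, value: str,
                 format_dictionary: Dict[str, Callable[[str], str]]) -> str:
    """Return the value of one specification after the loop 2 decision."""
    # S11101/S11102: prefer a rule from the data set's own format dictionary.
    rule = format_dictionary.get(spec_name)
    if rule is not None:
        return rule(value)  # S11105
    # S11103: otherwise look up the entry in the context formatting table.
    entry = CONTEXT_FORMATTING_TABLE.get(spec_name)
    if entry is None:
        return value  # assumption: no rule anywhere, leave the value as-is
    expected, program = entry
    # S11104: a value already in the expected format needs no change.
    if re.fullmatch(expected, value):
        return value
    return program(value)  # S11105
```

For example, `format_value("URI", "HTTP://Example.com/path", {})` is reformatted to the host name, while a value that already matches the expected pattern is returned unchanged.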

ステップＳ１１１０６では、データコンテキスト選択部１１１は、性能データＤＢ２００のデータコンテキストテーブル２０１の対象データコンテキスト諸元に対応するエントリにおけるデータコンテキスト諸元Ｄ２０１０３のすべてのデータコンテキスト諸元（諸元のリスト）を、このエントリの適用データコンテキスト諸元Ｄ２０１０４にコピーする。 In step S11106, the data context selection unit 111 copies all the data context specifications (the list of specifications) in the data context specifications D20103 of the entry corresponding to the target data context specifications in the data context table 201 of the performance data DB 200 into the applied data context specifications D20104 of the same entry.

次いで、データコンテキスト選択部１１１は、現在、適用データコンテキスト諸元とされているすべての諸元に基づいて、性能データセットのデータを分割し、分割によって得たデータインスタンスの疎性を評価する（Ｓ１１１０７）。 Next, the data context selection unit 111 divides the data of the performance data set based on all the specifications currently set as the applied data context specifications, and evaluates the sparseness of the data instances obtained by the division (S11107).

次いで、データコンテキスト選択部１１１は、データインスタンスについてのデータの分割が疎すぎる（例えば、或る量のデータ行を超えるデータインスタンスがない)か否かを判定する（Ｓ１１１０８）。 Next, the data context selection unit 111 determines whether or not the data division for the data instance is too sparse (for example, there is no data instance exceeding a certain amount of data rows) (S11108).

この結果、データ分割が疎すぎる場合（Ｓ１１１０８：Ｙｅｓ）には、データコンテキスト選択部１１１は、処理をステップＳ１１１０９に進める。一方、データ分割が疎すぎない場合（Ｓ１１１０８：Ｎｏ）には、データ分割が適切に行われたことを意味しているので、データコンテキスト選択部１１１は、処理をステップＳ１１１１０に進める。 As a result, when the data division is too sparse (S11108: Yes), the data context selection unit 111 advances the process to step S11109. On the other hand, when the data division is not too sparse (S11108: No), the data division has been performed appropriately, so the data context selection unit 111 advances the process to step S11110.

ステップＳ１１１０９では、データコンテキスト選択部１１１は、適用データコンテキスト諸元の中から最も不均一な分布を有する諸元を検出し、検出した諸元を適用データコンテキスト諸元から落とし（削除し）、処理をステップＳ１１１０７に進める。例えば、データコンテキストテーブル２０１の１行目のエントリを処理対象としている場合には、データコンテキスト諸元のうちの諸元「ＨＴＴＰｍｅｔｈｏｄ」は、ほとんどの値が方法タイプ「ｃｏｎｎｅｃｔ」である不均一な分布を有する傾向がある。この場合には、このステップにおいては、諸元「ＨＴＴＰｍｅｔｈｏｄ」が適用データコンテキスト諸元のリストから落とされることとなる。これにより、分析処理に適していない諸元を適切にのぞくことができる。 In step S11109, the data context selection unit 111 detects, among the applied data context specifications, the specification having the most non-uniform distribution, drops (deletes) the detected specification from the applied data context specifications, and advances the process to step S11107. For example, when the entry in the first row of the data context table 201 is the processing target, the specification "HTTPmethod" among the data context specifications tends to have a non-uniform distribution in which most of the values are the method type "connect". In this case, the specification "HTTPmethod" is dropped from the list of applied data context specifications in this step. As a result, specifications that are not suitable for the analysis process can be appropriately excluded.

上記したステップＳ１１１０７〜Ｓ１１１０９の処理を繰り返し実行することにより、疎すぎないデータインスタンスを生成することができる適用データコンテキスト諸元を特定することができる。 By repeatedly executing the processes of steps S11107 to S11109 described above, it is possible to specify the applicable data context specifications that can generate a data instance that is not too sparse.

ステップＳ１１１１０では、データコンテキスト選択部１１１は、データ分割が疎すぎない場合（Ｓ１１１０８：Ｎｏ）、すなわち、データ分割が適切に行われている場合における適用データコンテキスト諸元を、データコンテキストテーブル２０１の対応するエントリの適用データコンテキスト諸元Ｄ２０１０４に格納する。 In step S11110, the data context selection unit 111 stores the applied data context specifications obtained when the data division is not too sparse (S11108: No), that is, when the data division is performed appropriately, in the applied data context specifications D20104 of the corresponding entry in the data context table 201.

上記したデータコンテキスト選択処理によると、データ分割が適切に行われるデータコンテキストの諸元（適用データコンテキスト諸元）を適切に選択することができる。 According to the above-mentioned data context selection process, it is possible to appropriately select the specifications of the data context (applied data context specifications) in which the data division is appropriately performed.
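
The iteration over steps S11106 to S11110 can be sketched as below. This is a minimal sketch under stated assumptions: the description does not fix how non-uniformity is measured, so the share of the most frequent value is used here as one possible measure, and the function names and the `min_rows` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def split_into_instances(rows, specs):
    """Group rows (dicts) by the values of the given specifications (S11107)."""
    instances = defaultdict(list)
    for row in rows:
        instances[tuple(row[s] for s in specs)].append(row)
    return instances


def is_too_sparse(instances, min_rows):
    """Example criterion from S11108: no data instance exceeds min_rows rows."""
    return not any(len(r) > min_rows for r in instances.values())


def most_nonuniform(rows, specs):
    """Assumed measure: the specification whose top value has the largest share."""
    def dominance(spec):
        counts = Counter(row[spec] for row in rows)
        return max(counts.values()) / len(rows)
    return max(specs, key=dominance)


def select_applied_specs(rows, all_specs, min_rows):
    applied = list(all_specs)                            # S11106
    while applied:
        instances = split_into_instances(rows, applied)  # S11107
        if not is_too_sparse(instances, min_rows):       # S11108: No
            break
        applied.remove(most_nonuniform(rows, applied))   # S11109
    return applied                                       # S11110
```

With rows in which "HTTPmethod" is mostly "connect", the first pass is too sparse, "HTTPmethod" is dropped, and the remaining specifications produce sufficiently large data instances.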

次に、データインスタンス１２０について説明する。 Next, the data instance 120 will be described.

図５は、一実施形態に係るデータインスタンスのデータ構成図である。 FIG. 5 is a data structure diagram of a data instance according to an embodiment.

データインスタンス１２０（１２０−１〜ｎ）は、データインスタンス生成部１１０から得られる。データインスタンス生成部１１０から得られるデータインスタンスの数は、適用データコンテキスト諸元によって変わる。 The data instance 120 (120-1 to n) is obtained from the data instance generation unit 110. The number of data instances obtained from the data instance generation unit 110 varies depending on the applied data context specifications.

データインスタンス１２０は、同一のデータコンテキスト（すなわち、適用データコンテキスト諸元の各諸元の値が同一であるもの）についての所定の時間区間ごとのエントリ（行：データ要素）を格納する。データインスタンス１２０のエントリは、時刻Ｄ１２００１、時間窓サイズＤ１２００２、ＵＲＩＤ１２００３、ソースＩＰＤ１２００４、性能指標１〜ＮＤ１２００５〜Ｄ１２０Ｎのカラムを含む。 The data instance 120 stores an entry (row: data element) for each predetermined time interval for the same data context (that is, the value of each specification of the applied data context specification is the same). The entry for the data instance 120 includes columns for time D12001, time window size D12002, URI D12003, source IP D12004, performance indicators 1-N D12005-D120N.

時刻Ｄ１２００１には、データインスタンスのエントリに対応する時間窓の代表時刻に対応する時刻情報（例えば、年月日時分秒）が格納される。時間窓サイズＤ１２００２には、データインスタンスのラベル付けに使用されるべき時間窓サイズ（推奨時間窓サイズ）に関する時間差情報を格納する。ＵＲＩＤ１２００３には、エントリに対応するデータコンテキストの諸元の１つである通信先のウェブサイトのＵＲＩが格納される。ソースＩＰＤ１２００４には、エントリに対応するデータコンテキストの諸元の１つである通信の送信元のＩＰアドレスが格納される。 The time D12001 stores time information (for example, year, month, day, hour, minute, second) corresponding to the representative time of the time window corresponding to the entry of the data instance. The time window size D12002 stores time difference information regarding the time window size (recommended time window size) to be used for labeling the data instance. The URI D12003 stores the URI of the communication destination website, which is one of the specifications of the data context corresponding to the entry. The source IP D12004 stores the IP address of the source of communication, which is one of the specifications of the data context corresponding to the entry.

性能指標１〜ＮＤ１２００５〜Ｄ１２０Ｎには、エントリに対応するデータインスタンスについての性能指標（メトリック値という、一般的には数値）が格納される。なお、性能指標の種類の数は任意でよく、その種類の数に応じたカラムが使用されることとなる。 Performance indexes 1 to N D12005 to D120N store performance indexes (metric values, generally numerical values) for data instances corresponding to entries. The number of types of performance indicators may be arbitrary, and columns corresponding to the number of types will be used.

次に、データインスタンスラベル付部１３０によるデータインスタンスラベル付処理について、説明する。 Next, the data instance labeling process by the data instance labeling unit 130 will be described.

図６は、一実施形態に係るデータインスタンスラベル付部のデータインスタンスラベル付処理のフローチャートである。 FIG. 6 is a flowchart of the data instance labeling process of the data instance labeling portion according to the embodiment.

まず、データインスタンスラベル付部１３０は、性能データＤＢ２００のデータコンテキストテーブル２０１の対象エントリのラベリング目標Ｄ２０１０６からラベリング目標を取得する（Ｓ１３００１）。 First, the data instance labeling unit 130 acquires the labeling target from the labeling target D20106 of the target entry in the data context table 201 of the performance data DB 200 (S13001).

次いで、データインスタンスラベル付部１３０は、データインスタンス生成部１１０で生成されたデータインスタンス１２０を受信する（Ｓ１３００２）。 Next, the data instance labeling unit 130 receives the data instance 120 generated by the data instance generation unit 110 (S13002).

次いで、データインスタンスラベル付部１３０は、各データインスタンス１２０に対してループ３の処理（Ｓ１３００３〜Ｓ１３００６）を実行する。ここで、処理対象のデータインスタンスを対象データインスタンスという。 Next, the data instance labeling unit 130 executes the loop 3 processing (S13003 to S13006) for each data instance 120. Here, the data instance to be processed is referred to as a target data instance.

ループ３の処理において、データインスタンスラベル付部１３０は、対象データインスタンスについて統計的特性を算出する（Ｓ１３００３）。例えば、データインスタンスラベル付部１３０が算出する統計的特性は、データインスタンスに含まれるエントリの性能指標の最大値及び最小値、性能指標についてのパーセンタイル、標準偏差、又はエントリの件数の少なくとも一つを含んでもよい。 In the processing of the loop 3, the data instance labeling unit 130 calculates the statistical characteristics of the target data instance (S13003). For example, the statistical characteristics calculated by the data instance labeling unit 130 include at least one of the maximum and minimum values of the performance indicators of the entries contained in the data instance, the percentiles for the performance indicators, the standard deviation, or the number of entries. It may be included.
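
The statistical characteristics listed for S13003 can be computed with the Python standard library, as in the following sketch. The function name, the returned keys, and the choice of the 50th and 95th percentiles are assumptions for illustration; the description only requires at least one such characteristic.

```python
import statistics


def statistical_characteristics(values):
    """Summarize one performance indicator of a data instance (needs >= 2 values)."""
    ordered = sorted(values)
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    percentiles = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentiles[49],
        "p95": percentiles[94],
        "std": statistics.pstdev(ordered),
    }
```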

次いで、データインスタンスラベル付部１３０は、Ｓ１３００３で算出した統計的特性と、Ｓ１３００１で取得したラベリング目標とに基づいて、対象データインスタンスに対してラベリングするために適用すべきラベリング方法を方法プール１５０から選択する（Ｓ１３００４）。具体的には、データインスタンスラベル付部１３０は、方法プール１５０から、ラベリング目標がラベリング目標Ｄ１５１Ｎ＋１に設定され、統計的特性の値が、データ属性１〜ＮＤ１５１０２〜Ｄ１５１Ｎの条件を満たすエントリを特定し、そのエントリのラベリング方法Ｄ１５１０１に設定されているラベリング方法を選択する。 Next, the data instance labeling unit 130 selects, from the method pool 150, the labeling method to be applied for labeling the target data instance, based on the statistical characteristics calculated in S13003 and the labeling target acquired in S13001 (S13004). Specifically, the data instance labeling unit 130 identifies, in the method pool 150, an entry whose labeling target D151N+1 contains the acquired labeling target and whose conditions in the data attributes 1 to N (D15102 to D151N) are satisfied by the values of the statistical characteristics, and selects the labeling method set in the labeling method D15101 of that entry.

次いで、データインスタンスラベル付部１３０は、データインスタンス１２０の各データ行（エントリ）に対して、Ｓ１３００４で選択したラベリング方法に従ってイベントラベルを割り当てる（Ｓ１３００５）。例えば、ラベリング目標が外れ値検出（ＯｕｔｌｉｅｒＩｄｅｎｔｉｆｉｃａｔｉｏｎ）である場合には、データインスタンスラベル付部１３０は、ラベリング方法によって、データインスタンス１２０の各データ行に対して、イベントラベルとして、外れ値又は非外れ値を示すイベントラベルを割り当てる。ここで、最良のラベリング方法は、データインスタンス１２０の統計的特性に依存して異なる傾向がある。そこで、本実施形態では、使用するラベリング方法を、方法プール１５０における統計的特性に対する条件に従って選択するようにしている。ラベリング方法を選択するための統計的特性は、データインスタンスの時間、時間窓サイズ、および性能値に基づいて生成することができる。 Next, the data instance labeling unit 130 assigns an event label to each data row (entry) of the data instance 120 according to the labeling method selected in S13004 (S13005). For example, when the labeling target is Outlier Identification, the data instance labeling unit 130 uses an outlier or non-outlier as an event label for each data row of the data instance 120, depending on the labeling method. Assign an event label that indicates the value. Here, the best labeling method tends to be different depending on the statistical characteristics of the data instance 120. Therefore, in the present embodiment, the labeling method to be used is selected according to the conditions for the statistical characteristics in the method pool 150. Statistical characteristics for choosing a labeling method can be generated based on the time, time window size, and performance values of the data instance.

次いで、データインスタンスラベル付部１３０は、データインスタンス１２０について、各データ行の割り当てられたイベントラベルと共に、データインスタンスラベルＤＢ１６０のデータインスタンステーブル１６２として格納する。 Next, the data instance labeling unit 130 stores the data instance 120 together with the event label assigned to each data row as the data instance table 162 of the data instance label DB 160.
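
The selection and labeling steps (S13004 and S13005) can be sketched as a lookup over a small method pool. Everything concrete here is an assumption for illustration: the two outlier detectors (z-score and interquartile range), the row-count condition of 30, and the pool layout are not taken from the embodiment. The -1/1 label values follow the event label example given for the data instance table 162.

```python
import statistics

NORMAL, OUTLIER = -1, 1


def label_by_zscore(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [NORMAL] * len(values)
    return [OUTLIER if abs(v - mean) / std > threshold else NORMAL for v in values]


def label_by_iqr(values):
    """Flag values outside 1.5 x IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [OUTLIER if v < low or v > high else NORMAL for v in values]


# Hypothetical stand-in for the labeling method attribute table 151.
METHOD_POOL = [
    {"method": "zscore", "program": label_by_zscore,
     "condition": lambda s: s["count"] >= 30,
     "targets": {"OutlierIdentification"}},
    {"method": "iqr", "program": label_by_iqr,
     "condition": lambda s: s["count"] < 30,
     "targets": {"OutlierIdentification"}},
]


def select_labeling_method(stats, target):
    """S13004: first pool entry matching the target and the data conditions."""
    for entry in METHOD_POOL:
        if target in entry["targets"] and entry["condition"](stats):
            return entry
    return None
```

A caller would compute the statistics of a data instance, pick an entry with `select_labeling_method`, and then call `entry["program"]` on the performance values to obtain one event label per row.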

次に、方法プール１５０について説明する。 Next, the method pool 150 will be described.

図７は、一実施形態に係る方法プールの構成図である。 FIG. 7 is a block diagram of a method pool according to an embodiment.

方法プール１５０は、２つの種類のテーブル、すなわち、ラベリング方法属性テーブル１５１と、コンテキストフォーマッティングテーブル１５２とを含む。 The method pool 150 includes two types of tables, namely a labeling method attribute table 151 and a context formatting table 152.

ラベリング方法属性テーブル１５１は、ラベリング方法毎のエントリを格納する。ラベリング方法属性テーブル１５１のエントリは、ラベリング方法Ｄ１５１０１と、１以上のデータ属性１〜ＮＤ１５１０２〜Ｄ１５１Ｎと、ラベリング目標Ｄ１５１Ｎ＋１とのカラムを含む。 The labeling method attribute table 151 stores entries for each labeling method. The entry in the labeling method attribute table 151 includes columns for the labeling method D15101, one or more data attributes 1 to N D15102 to D151N, and a labeling target D151N + 1.

ラベリング方法Ｄ１５１０１には、エントリに対応するラベリング方法の名称と、そのラベリング方法を実行するプログラムへのポインタとが格納される。データ属性１〜ＮＤ１５１０２〜Ｄ１５１Ｎには、最良のラベリング方法を選択するために考慮すべき可能性のある統計的特性（属性）についての条件が格納される。ラベリング目標Ｄ１５１Ｎ＋１には、エントリに対応するラベリング方法を使用することができる１または複数のラベリング目標が格納される。ラベリング目標Ｄ１５１Ｎ＋１には、例えば、性能分析のうちの異常検出（外れ値検出）を行う場合には、「ＯｕｔｌｉｅｒＩｄｅｎｔｉｆｉｃａｔｉｏｎ」が格納される。 The labeling method D15101 stores the name of the labeling method corresponding to the entry and a pointer to the program that executes the labeling method. Data attributes 1 to N D15102 to D151N store conditions for statistical characteristics (attributes) that may be considered in order to select the best labeling method. The labeling target D151N + 1 stores one or more labeling targets that can use the labeling method corresponding to the entry. In the labeling target D151N + 1, for example, when performing abnormality detection (outlier detection) in the performance analysis, "Outlier Identification" is stored.

コンテキストフォーマッティングテーブル１５２は、データコンテキスト諸元のタイプ（データコンテキストタイプ）ごとのエントリを格納する。コンテキストフォーマッティングテーブル１５２のエントリは、データコンテキスト諸元タイプＤ１５２０１、期待正規表現フォーマットＤ１５２０２、フォーマット処理Ｄ１５２０３のカラムを含む。 The context formatting table 152 stores entries for each type of data context specification (data context type). The entries in the context formatting table 152 include columns for data context specification type D15201, expected regular expression format D15202, and formatting D15203.

データコンテキスト諸元タイプＤ１５２０１には、エントリに対応するフォーマットルールが提供されるデータコンテキスト諸元のタイプ（種類）の名前が格納される。期待正規表現フォーマットＤ１５２０２には、エントリに対応するデータコンテキスト諸元のタイプに適合するすべてのデータコンテキスト諸元の値を抽出可能とする正規表現が格納される。フォーマット処理Ｄ１５２０３には、プログラム（スクリプトも含む）で定義されたルールに従ってデータコンテキスト諸元のデータを正規表現に再フォーマットするためのプログラムへのポインタが格納される。 The data context specification type D15201 stores the name of the type of data context specification for which the format rule corresponding to the entry is provided. The expected regular expression format D15202 stores a regular expression that allows the values of all data context specifications that match the type of data context specification corresponding to the entry to be extracted. The format process D15203 stores a pointer to the program for reformatting the data of the data context specifications into a regular expression according to the rules defined in the program (including the script).

次に、データインスタンスＤＢ１６０について説明する。 Next, the data instance label DB 160 will be described.

図８は、一実施形態に係るデータインスタンスラベルデータベースの構成図である。 FIG. 8 is a configuration diagram of a data instance label database according to an embodiment.

データインスタンスラベルデータＤＢ１６０は、２つの種類のテーブル、すなわち、データインスタンス管理テーブル１６１と、データインスタンステーブル１６２（１６２−１〜Ｎ）とを含む。 The data instance label DB 160 includes two types of tables, namely the data instance management table 161 and the data instance tables 162 (162-1 to 162-N).

データインスタンス管理テーブル１６１は、データインスタンス毎のエントリを格納する。データインスタンス管理テーブル１６１のエントリは、データインスタンスＩＤＤ１６１０１、ＵＲＩＤ１６１０２、ソースＩＰＤ１６１０３、データインスタンステーブルＤ１６１０４のカラムを含む。 The data instance management table 161 stores an entry for each data instance. The entries in the data instance management table 161 include columns for the data instance ID D16101, URI D16102, source IP D16103, and data instance table D16104.

データインスタンスＩＤＤ１６１０１には、エントリに対応するデータインスタンスを識別する値（データインスタンスＩＤ）が格納される。ＵＲＩＤ１６１０２及びソースＩＰＤ１６１０３は、適用データコンテキスト諸元に対応するカラムであり、適用データコンテキスト諸元に含まれる諸元によって、異なるカラムとなる。ＵＲＩＤ１６１０２には、エントリに対応するデータインスタンスについての適用データコンテキスト諸元であるＵＲＩの値、すなわち、通信先のウェブサイトのＵＲＩ（ウェブアドレス）が格納される。ソースＩＰＤ１６１０３には、エントリに対応するデータインスタンスについての適用データコンテキスト諸元であるソースＩＰの値、すなわち、通信の送信元のＩＰアドレス（ソースＩＰ）が格納される。データインスタンステーブルＤ１６１０４には、エントリに対応するデータインスタンスに対応するデータインスタンステーブル１６２（１６２−１〜Ｎのいずれか）へのポインタが格納される。 The data instance ID D16101 stores a value (data instance ID) that identifies the data instance corresponding to the entry. The URI D16102 and the source IP D16103 are columns corresponding to the applied data context specifications, and these columns differ depending on which specifications are included in the applied data context specifications. The URI D16102 stores the value of the URI, which is one of the applied data context specifications for the data instance corresponding to the entry, that is, the URI (web address) of the destination website. The source IP D16103 stores the value of the source IP, which is one of the applied data context specifications for the data instance corresponding to the entry, that is, the IP address of the source of the communication (source IP). The data instance table D16104 stores a pointer to the data instance table 162 (one of 162-1 to 162-N) for the data instance corresponding to the entry.

データインスタンステーブル１６２−１〜Ｎのそれぞれは、データインスタンス毎に設けられ、各データインスタンスに対応するエントリ（データ要素）を格納する。データインスタンステーブル１６２のエントリは、時刻Ｄ１６２０１、性能指標１〜ＮＤ１６２０２〜Ｄ１６２Ｎ、イベントラベルＤ１６２Ｎ＋１のカラムを含む。 Each of the data instance tables 162-1 to 162-N is provided for one data instance and stores the entries (data elements) of that data instance. An entry in the data instance table 162 includes columns for time D16201, performance indicators 1 to N (D16202 to D162N), and event label D162N+1.

時刻Ｄ１６２０１には、エントリに対応するデータについての時刻情報（例えば、年月日時分秒）が格納される。性能指１〜ＮＤ１６２０２〜Ｄ１６２Ｎには、エントリに対応するデータについての性能指標（メトリック値という、一般的には数値）が格納される。イベントラベルＤ１６２Ｎ＋１には、エントリのデータに対して、データインスタンスラベル付部１３０によって割り当てられたイベントラベルが格納される。イベントラベルＤ１６２Ｎ＋１には、例えば、異常検出の対象のデータインスタンスについては、エントリのデータが正常である場合には、正常を示す「−１」が格納され、異常である場合には、異常を示す「１」が格納される。 Time D16201 stores time information (for example, year, month, day, hour, minute, second) for the data corresponding to the entry. Performance indexes 1 to N D16202 to D162N store performance indexes (metric values, generally numerical values) for the data corresponding to the entries. The event label D162N + 1 stores the event label assigned by the data instance labeling unit 130 for the entry data. For example, for the data instance targeted for abnormality detection, the event label D162N + 1 stores "-1" indicating normality when the entry data is normal, and indicates an abnormality when it is abnormal. "1" is stored.

次に、ラベルデータ採点部１４１によるラベルデータ採点処理について説明する。 Next, the label data scoring process by the label data scoring unit 141 will be described.

図９は、一実施形態に係るラベルデータ採点部によるラベルデータ採点処理のフローチャートである。 FIG. 9 is a flowchart of label data scoring processing by the label data scoring unit according to the embodiment.

ラベルデータ採点部１４１は、データインスタンスラベルＤＢ１６０からラベル付けされたデータインスタンス（データインスタンステーブル１６２）を取得し、一意のイベントラベルの値（ラベル値）をすべての抽出する（Ｓ１４１０１）。 The label data scoring unit 141 acquires the labeled data instance (data instance table 162) from the data instance label DB 160, and extracts all the unique event label values (label values) (S14101).

次いで、ラベルデータ採点部１４１は、データインスタンスの生成に使用したデータコンテキストの適用データコンテキスト諸元の各諸元についての全ての可能な組合せ（諸元組合せ）を含むリストを作成する（Ｓ１４１０２）。例えば、適用データコンテキスト諸元が、「ＵＲＩ」，「ソースＩＰ」である場合には、ラベルデータ採点部１４１は、（「ＵＲＩ」，「ソースＩＰ」），（「ＵＲＩ」），（「ソースＩＰ」）の３つの諸元組合せを含むリストを生成する。 Next, the label data scoring unit 141 creates a list containing all possible combinations (specification combinations) of the applied data context specifications of the data context used to generate the data instances (S14102). For example, when the applied data context specifications are "URI" and "source IP", the label data scoring unit 141 generates a list containing three specification combinations: ("URI", "source IP"), ("URI"), and ("source IP").
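
The list built in S14102 is the set of all non-empty subsets of the applied data context specifications. A minimal sketch using `itertools.combinations` follows; the function name and the largest-first ordering are assumptions chosen to mirror the example in the text.

```python
from itertools import combinations


def specification_combinations(applied_specs):
    """Return every non-empty combination of specifications, largest first."""
    combos = []
    for size in range(len(applied_specs), 0, -1):
        combos.extend(combinations(applied_specs, size))
    return combos
```

For n applied specifications this yields 2^n - 1 combinations, so the number of passes through loop 5 grows quickly as specifications are added.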

必要に応じて、ラベルデータ採点部１４１は、現在与えられている集約時間窓（例えば、デフォルトとして、又は、後述する表示画面を介して与えられている集約時間窓）に再サンプリングする（Ｓ１４１０３）。例えば、集約時間窓のサイズが１時間である場合には、ラベルデータ採点部１４１は、１分間の時間窓のデータを、加算し、又はカウントすることによって、１時間の時間窓のデータに再サンプリングする。 If necessary, the label data scoring unit 141 resamples the data to the currently given aggregation time window (for example, the default window or one given via the display screen described later) (S14103). For example, when the size of the aggregation time window is one hour, the label data scoring unit 141 resamples the data of one-minute time windows into data of a one-hour time window by adding or counting them.
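
Resampling by addition (S14103) can be sketched as bucketing timestamps into windows aligned to a fixed origin. The `resample_sum` helper, the `(timestamp, value)` row shape, and the alignment to the Unix epoch are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def resample_sum(rows, window):
    """Sum (timestamp, value) rows into buckets of length `window`.

    Each bucket is keyed by its start time, aligned to EPOCH.
    """
    buckets = defaultdict(float)
    for timestamp, value in rows:
        index = (timestamp - EPOCH) // window  # timedelta // timedelta -> int
        buckets[EPOCH + index * window] += value
    return dict(sorted(buckets.items()))
```

Calling `resample_sum(rows, timedelta(hours=1))` on one-minute rows gives one summed value per hour; counting instead of adding would replace `+= value` with `+= 1`.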

次いで、ラベルデータ採点部１４１は、ステップＳ１４１０１で抽出した各ラベル値についてループ４の処理（Ｓ１４１０４〜Ｓ１４１０８）を実行する。ここで、処理の対象となっているラベル値を対象ラベル値という。 Next, the label data scoring unit 141 executes loop 4 processing (S14104 to S14108) for each label value extracted in step S14101. Here, the label value to be processed is referred to as a target label value.

ループ４の処理においては、ラベルデータ採点部１４１は、ステップＳ１４１０２で取得された各諸元組合せについてループ５の処理（Ｓ１４１０４〜Ｓ１４１０６）を実行する。ここで、処理対象となっている諸元組合せを対象諸元組合せという。 In the processing of the loop 4, the label data scoring unit 141 executes the processing of the loop 5 (S14104 to S14106) for each specification combination acquired in step S14102. Here, the specification combination to be processed is referred to as a target specification combination.

ループ５の処理では、ラベルデータ採点部１４１は、データインスタンスについて、対象諸元組合せ、対象ラベル値、及び与えられている集約時間窓の集合に従ってデータを集約する（Ｓ１４１０４）。例えば、対象諸元組合せが「ＵＲＩ」である場合には、適用データコンテキスト諸元の他の諸元については考慮せずに、対象諸元組合せの諸元の値（同じＵＲＩの値）及び対象ラベル値を有するデータ行について、対象ラベル値の加算またはデータ行の数をカウントすることにより、データの集約を行う。 In the processing of the loop 5, the label data scoring unit 141 aggregates the data of the data instance according to the target specification combination, the target label value, and the set of the given aggregation time windows (S14104). For example, when the target specification combination is "URI", the value of the specification of the target specification combination (value of the same URI) and the target without considering other specifications of the applied data context specifications. For data rows having label values, data is aggregated by adding the target label values or counting the number of data rows.

次いで、ラベルデータ採点部１４１は、集約されたデータ（集約データ）に対する性能評価結果（ここでは、異常）についてのスコアを、現在の時間窓内のデータと、同様のデータについての過去（所定の時間前、例えば、１週間前）の時間窓内のデータとを比較することによって計算する（Ｓ１４１０５）。例えば、ラベルデータ採点部１４１は、過去からの集約データの変化量に基づいて、ランク付けし、例えば、最大の変化を有する集約データに対して最高のスコアを与える。具体的には、例えば、ランク付けは、変化量が小さいほど低いランク（数値が小さいランク）とし、変化量とランクとを乗算した結果をスコアとする。本実施形態では、スコアが大きいほど異常が発生している可能性が高いことを示す。 Next, the label data scoring unit 141 calculates a score for the performance evaluation result (here, an abnormality) of the aggregated data by comparing the data in the current time window with the data in a past time window (a predetermined time earlier, for example, one week earlier) for the same kind of data (S14105). For example, the label data scoring unit 141 ranks the aggregated data based on the amount of change from the past and gives, for example, the highest score to the aggregated data with the largest change. Specifically, for example, a smaller amount of change is given a lower rank (a rank with a smaller numerical value), and the product of the amount of change and the rank is used as the score. In the present embodiment, a larger score indicates a higher possibility that an abnormality has occurred.
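
The example scoring rule of S14105 (amount of change multiplied by its rank) can be sketched as follows. The function name, the dictionary inputs, treating a key missing from the past window as 0, and breaking ties by sort order are assumptions for illustration.

```python
def score_changes(current, past):
    """Score each aggregated value as |current - past| times the rank of that change.

    Rank 1 is the smallest change, so the largest change gets the highest score.
    """
    changes = {key: abs(current[key] - past.get(key, 0)) for key in current}
    ordered = sorted(changes, key=lambda key: changes[key])
    return {key: changes[key] * rank for rank, key in enumerate(ordered, start=1)}
```

Multiplying by the rank widens the gap between the largest changes and the rest, which makes the most changed aggregates stand out when scores are later compared against a threshold.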

次いで、ラベルデータ採点部１４１は、ステップＳ１４１０４での集約データと、ステップＳ１４１０５で計算したスコアとを、コンテキストスコアＤＢ１７０に格納する（Ｓ１４１０６）。 Next, the label data scoring unit 141 stores the aggregated data in step S14104 and the score calculated in step S14105 in the context score DB 170 (S14106).

ラベルデータ採点部１４１は、ループ５の処理を全ての諸元組合せを対象に実行し、すべての諸元組合せに対してループ５の処理を終了した場合には、ループ５を抜ける。 The label data scoring unit 141 executes the processing of the loop 5 for all the specification combinations, and exits the loop 5 when the processing of the loop 5 is completed for all the specification combinations.

ループ５を抜けると、ラベルデータ採点部１４１は、ループ５の処理において得られた各諸元組合せの集約データに対して得られたスコアに基づいて、対応するデータコンテキストに対する総合スコアを算出する（Ｓ１４１０７）。本実施形態では、総合スコアは、例えば、各スコアを合計した値としている。 After exiting loop 5, the label data scoring unit 141 calculates the total score for the corresponding data context based on the scores obtained for the aggregated data of each specification combination in loop 5 (S14107). In the present embodiment, the total score is, for example, the sum of the individual scores.

次いで、ラベルデータ採点部１４１は、ステップＳ１４１０７で算出した総合スコアをコンテキストスコアＤＢ１７０のデータコンテキストスコアテーブル１７２の総合スコアＤ１７２１０に格納する（Ｓ１４１０８）。 Next, the label data scoring unit 141 stores the total score calculated in step S14107 in the total score D17210 of the data context score table 172 of the context score DB170 (S14108).

ラベルデータ採点部１４１は、ループ４の処理を全てのラベル値を対象に実行し、すべてのラベル値に対してループ４の処理を終了した場合には、ループ４を抜け、ラベルデータ採点処理を終了する。 The label data scoring unit 141 executes the processing of loop 4 for all the label values, and when loop 4 has been completed for all the label values, it exits loop 4 and ends the label data scoring process.

次に、コンテキストスコアＤＢ１７０について説明する。 Next, the context score DB 170 will be described.

図１０は、一実施形態に係るコンテキストスコアデータベースの構成図である。 FIG. 10 is a configuration diagram of a context score database according to an embodiment.

コンテキストスコアＤＢ１６０は、２つの種類のテーブル、すなわち、データコンテキスト集約テーブル１７１と、データコンテキストスコアテーブル１７２とを含む。 The context score DB 170 includes two types of tables, namely a data context aggregation table 171 and a data context score table 172.

データコンテキスト集約テーブル１７１は、適用データコンテキスト諸元の値毎に所定の集約時間で集約した集約データセット毎のエントリ（データ要素）を格納する。データコンテキスト集約テーブル１７１のエントリは、時刻Ｄ１７１０１、データコンテキスト（ＵＲＩＤ１７１０２、ソースＩＰＤ１７１０３）、及びデータコンテキストベースの集約（集約ＵＲＩＤ１７１０４、集約ソースＩＰＤ１７１０５、集約ＵＲＩ×ソースＩＰＤ１７１０６）のカラムを含む。 The data context aggregation table 171 stores an entry (data element) for each aggregated data set, aggregated over a predetermined aggregation time for each value of the applied data context specifications. An entry in the data context aggregation table 171 includes columns for time D17101, data context (URI D17102, source IP D17103), and data context-based aggregation (aggregated URI D17104, aggregated source IP D17105, aggregated URI x source IP D17106).

時刻Ｄ１７１０１には、エントリに対応する集約データセットの集約時間の基準となる代表時刻（例えば、集約時間の最初の時刻）についての時刻情報（例えば、年月日時分秒）が格納される。 Time D17101 stores time information (for example, year, month, day, hour, minute, second) about a representative time (for example, the first time of the aggregation time) that serves as a reference for the aggregation time of the aggregation data set corresponding to the entry.

データコンテキスト（ＵＲＩＤ１７１０２、ソースＩＰＤ１７１０３）には、エントリに対応する集約データセットにおける適用データコンテキスト諸元ごとの値（データコンテキスト値）が格納される。ＵＲＩＤ１７１０２には、エントリに対応する集約データセットについてのＵＲＩの値、すなわち、通信先のウェブサイトのＵＲＩ（ウェブアドレス）が格納される。ソースＩＰＤ１７１０３には、エントリに対応する集約データセットについてのソースＩＰの値、すなわち、通信の送信元のＩＰアドレス（ソースＩＰ）が格納される。 The data context (URI D17102, source IP D17103) stores a value (data context value) for each applicable data context specification in the aggregated dataset corresponding to the entry. The URI D17102 stores the URI value for the aggregated data set corresponding to the entry, that is, the URI (web address) of the website to communicate with. The source IP D17103 stores the value of the source IP for the aggregated data set corresponding to the entry, that is, the IP address (source IP) of the source of the communication.

データコンテキストベースの集約（集約ＵＲＩＤ１７１０４、集約ソースＩＰＤ１７１０５、集約ＵＲＩ×ソースＩＰＤ１７１０６）には、所定の集約時間に対応するデータセットにおける適用データコンテキスト諸元についての諸元の組合せごとのデータの集約値が格納される。集約ＵＲＩＤ１７１０４には、エントリに対応するデータセットにおけるＵＲＩの値が共通するデータの数が格納される。集約ソースＩＰＤ１７１０５には、エントリに対応するデータセットにおけるソースＩＰの値が共通するデータの数が格納される。集約ＵＲＩ×ソースＩＰＤ１７１０６には、エントリに対応するデータセットにおけるＵＲＩの値及びソースＩＰの値が共通するデータの数が格納される。 Data context-based aggregations (aggregate URI D17104, aggregate source IP D17105, aggregate URI x source IP D17106) include data for each combination of specifications for the applicable data context specifications in the dataset corresponding to the given aggregation time. The aggregated value is stored. The aggregated URI D17104 stores the number of data in which the URI values are common in the dataset corresponding to the entry. The aggregate source IP D17105 stores the number of data having a common source IP value in the dataset corresponding to the entry. Aggregate URI x source IP D17106 stores the number of data in which the URI value and the source IP value in the data set corresponding to the entry are common.
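
The three aggregation columns can be sketched as counts over the labeled rows that fall in one aggregation time window. The function name, the `(uri, source_ip)` tuple shape of the input, and the output keys are assumptions for illustration.

```python
from collections import Counter


def build_aggregation_rows(rows):
    """Build one entry per (URI, source IP) pair with per-combination counts.

    `rows` is a list of (uri, source_ip) tuples for one aggregation window.
    """
    by_uri = Counter(uri for uri, _ in rows)
    by_ip = Counter(ip for _, ip in rows)
    by_pair = Counter(rows)
    return [
        {
            "uri": uri,
            "source_ip": ip,
            "agg_uri": by_uri[uri],        # rows sharing this URI
            "agg_source_ip": by_ip[ip],    # rows sharing this source IP
            "agg_uri_x_source_ip": count,  # rows sharing both values
        }
        for (uri, ip), count in sorted(by_pair.items())
    ]
```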

データコンテキストスコアテーブル１７２は、適用データコンテキスト諸元の値毎に所定の集約時間で集約した集約データセット毎のエントリを格納する。データコンテキストスコアテーブル１７２のエントリは、時刻Ｄ１７２０１、ＵＲＩＤ１７２０２、ソースＩＰＤ１７２０３、ＵＲＩ差（ランク）Ｄ１７２０４、ソースＩＰ差（ランク）Ｄ１７２０５、ＵＲＩ×ソースＩＰ差（ランク）Ｄ１７２０６、ＵＲＩスコアＤ１７２０７、ソースＩＰスコアＤ１７２０８、ＵＲＩ×ソースＩＰスコアＤ１７２０９、及び総合スコアＤ１７２１０のカラムを含む。 The data context score table 172 stores an entry for each aggregated data set, aggregated over a predetermined aggregation time for each value of the applied data context specifications. An entry in the data context score table 172 includes columns for time D17201, URI D17202, source IP D17203, URI difference (rank) D17204, source IP difference (rank) D17205, URI x source IP difference (rank) D17206, URI score D17207, source IP score D17208, URI x source IP score D17209, and total score D17210.

時刻Ｄ１７２０１には、エントリに対応する集約データセットの集約時間の基準となる代表時刻（例えば、集約時間の最初の時刻）についての時刻情報（例えば、年月日時分秒）が格納される。ＵＲＩＤ１７２０２、ソースＩＰＤ１７２０３には、エントリに対応する集約データセットの適用データコンテキスト諸元の各諸元の値が格納される。 The time D17201 stores time information (for example, year, month, day, hour, minute, second) about a representative time (for example, the first time of the aggregation time) that is a reference for the aggregation time of the aggregation data set corresponding to the entry. In the URI D17202 and the source IP D17203, the values of the applicable data context specifications of the aggregated data set corresponding to the entries are stored.

ＵＲＩ差（ランク）Ｄ１７２０４、ソースＩＰ差（ランク）Ｄ１７２０５、ＵＲＩ×ソースＩＰ差（ランク）Ｄ１７２０６には、集約データセットにおける各諸元組合せについての現在（例えば、今週）の値と過去（例えば、先週）の値との絶対値の差と、集約データセット間での各諸元組合せの絶対値差のランクとが格納される。これらのカラムの情報は、スコアを計算するために用いることができ、例えば、絶対値差、絶対値差のランク、現在の値と過去の値とのランクの絶対値の差等を用いることができる。 The URI difference (rank) D17204, the source IP difference (rank) D17205, and the URI x source IP difference (rank) D17206 store, for each specification combination in the aggregated data set, the absolute difference between the current value (for example, this week) and the past value (for example, last week), and the rank of that absolute difference among the aggregated data sets. The information in these columns can be used to calculate the score; for example, the absolute difference, the rank of the absolute difference, or the absolute difference between the ranks of the current value and the past value can be used.

ＵＲＩスコアＤ１７２０７、ソーススコアＤ１７２０８、ＵＲＩ×ソースＩＰスコアＤ１７２０９には、適用データコンテキスト諸元の各諸元組合せについてのスコアが格納される。総合スコアＤ１７２１０には、カラムＤ１７２０７〜Ｄ１７２０９のスコアを用いて、所定の計算式（例えば、スコアの加算）を実行することにより得られる総合スコアが格納される。 The URI score D17207, the source IP score D17208, and the URI x source IP score D17209 store the score for each specification combination of the applied data context specifications. The total score D17210 stores the total score obtained by applying a predetermined formula (for example, addition of the scores) to the scores in columns D17207 to D17209.
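
Using addition as the predetermined formula, the total score column can be derived from the three per-combination scores as in this small sketch. The dictionary keys and the function name are hypothetical stand-ins for the columns D17207 to D17210.

```python
SCORE_COLUMNS = ("uri_score", "source_ip_score", "uri_x_source_ip_score")


def with_total_score(entry):
    """Return a copy of a score table entry with the total score added."""
    total = sum(entry[column] for column in SCORE_COLUMNS)
    return {**entry, "total_score": total}
```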

次に、可視化処理部１４２による可視化処理について説明する。 Next, the visualization process by the visualization processing unit 142 will be described.

図１１は、一実施形態に係る可視化処理部による可視化処理のフローチャートである。 FIG. 11 is a flowchart of visualization processing by the visualization processing unit according to the embodiment.

可視化処理部１４２は、コンテキストスコアＤＢ１７０から集約データとスコア情報を取得する（Ｓ１４２０１）。 The visualization processing unit 142 acquires aggregated data and score information from the context score DB 170 (S14201).

Next, for each score corresponding to the aggregated data, the visualization processing unit 142 determines whether any score exceeds the threshold predefined for that score (S14202).

If there is a score larger than its threshold (S14202: Yes), the visualization processing unit 142 sends an alert to the operator (S14203) and advances the processing to step S14204. If no score is larger than its threshold (S14202: No), the processing advances directly to step S14204.
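As a rough illustration of steps S14202 and S14203, the following Python sketch compares each score with its own threshold and calls an alert callback when any score is strictly larger. The dictionary layout, the function names, and the `send_alert` callback are assumptions made for this example only.

```python
def scores_over_threshold(scores, thresholds):
    """Return the names of scores that exceed their predefined threshold."""
    return [name for name, value in scores.items()
            if name in thresholds and value > thresholds[name]]


def check_and_alert(scores, thresholds, send_alert):
    """S14202: compare scores; S14203: alert the operator if any exceeds."""
    exceeded = scores_over_threshold(scores, thresholds)
    if exceeded:
        send_alert(exceeded)
    # Processing continues with visualization (S14204) in either case.
    return exceeded


# Hypothetical scores and per-score thresholds.
alerts = []
exceeded = check_and_alert(
    {"uri": 5, "source_ip": 1, "total": 9},
    {"uri": 4, "source_ip": 2, "total": 9},
    alerts.append,
)
```

A score equal to its threshold does not trigger an alert here, matching the "larger than" wording of the step.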

In step S14204, the visualization processing unit 142 visualizes the aggregated data and score information acquired in step S14201. Specifically, the visualization processing unit 142 generates display screen data from the aggregated data and score information and causes the display 400 to show the display screen (see FIG. 13).

Next, the visualization processing unit 142 waits for input from the console 300 by a user of the abnormality detection device 100 (S14205). Inputs to the console 300 include instructions to change the displayed content, such as changing the aggregation period or setting specification values that should not be displayed.

Next, the visualization processing unit 142 determines whether the data needed to display the screen corresponding to the change instruction input in step S14205 is available in the context score DB 170 (S14206).

If the necessary data is available in the context score DB 170 (S14206: Yes), the visualization processing unit 142 acquires the necessary aggregated data and score information from the context score DB 170 (S14207) and advances the processing to step S14204.

If the necessary data is not available in the context score DB 170 (S14206: No), the visualization processing unit 142 updates the parameters related to aggregation and scoring (for example, the time window size or the exclusion of specific data context specification values from display) based on the user's input (S14208).

Next, the visualization processing unit 142 sends the parameters related to aggregation and scoring to the label data scoring unit 141, causes the label data scoring unit 141 to execute the label data scoring processing with the new parameters (S14209), and ends the visualization processing. After the label data scoring processing with the new parameters has been executed, the visualization processing is executed again and the display screen is displayed.
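The branch in steps S14206 to S14209 can be sketched in Python as follows. This is only an illustration under stated assumptions: the context score DB is modeled as a dictionary keyed by a hypothetical `(window, excluded values)` pair, and `rescore` stands in for the request sent to the label data scoring unit 141.

```python
def handle_change_request(request, score_db, params, rescore):
    """Handle one display-change instruction (steps S14206 to S14209)."""
    window = request.get("window", params["window"])
    excluded = frozenset(request.get("excluded", params["excluded"]))
    key = (window, excluded)  # hypothetical lookup key for the score DB

    if key in score_db:
        # S14206: Yes -> S14207: reuse stored data, then redisplay (S14204).
        return "display", score_db[key]

    # S14206: No -> S14208: update aggregation and scoring parameters.
    new_params = {**params, "window": window, "excluded": sorted(excluded)}
    # S14209: ask the scoring unit to rerun with the new parameters.
    rescore(new_params)
    return "rescored", new_params


# Hypothetical stored result for a one-hour window with nothing excluded.
db = {("1h", frozenset()): {"total": 3}}
params = {"window": "1h", "excluded": []}
requests_sent = []
cached = handle_change_request({"window": "1h"}, db, params, requests_sent.append)
fresh = handle_change_request(
    {"window": "1d", "excluded": ["URI1"]}, db, params, requests_sent.append
)
```

In this sketch a rescoring request is issued only when the stored results cannot serve the requested view, which mirrors the flowchart's preference for reusing data already held in the context score DB 170.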

Next, the hardware configuration of the abnormality detection device 100 will be described.

FIG. 12 is a hardware configuration diagram of the abnormality detection device according to one embodiment.

The abnormality detection device 100 is, for example, a general-purpose computer and includes a CPU (Central Processing Unit) 601, a memory 602, an auxiliary storage device 603, a communication interface 604, a medium interface 605, and an input/output interface 606.

The CPU 601 executes programs stored in the memory 602 or the auxiliary storage device 603 and performs various kinds of processing using the data stored in the memory 602 or the auxiliary storage device 603. The memory 602 is, for example, a RAM (Random Access Memory) and stores programs executed by the CPU 601, data, and the like. The auxiliary storage device 603 is, for example, a hard disk drive, a flash memory, or a RAM, and stores programs executed by the CPU 601 and data used by the CPU 601.

The communication interface 604 is an interface for communicating with other devices via the network 608. The medium interface 605 allows an external storage medium 607 to be attached and detached, and mediates data input/output with the external storage medium 607. The input/output interface 606 can be connected to the console 300 and the display 400, which are operated by an administrator or user of the abnormality detection device 100; it inputs and outputs information to and from the console 300 and outputs display content to the display 400.

Each functional unit of the abnormality detection device 100 in FIG. 1 is realized, for example, by the CPU 601 executing a program (performance analysis program) stored in the memory 602 or the auxiliary storage device 603. The information managed by the functional units (method pool 150, data instance label DB 160, context score DB 170) is stored in the memory 602 or the auxiliary storage device 603, each of which is an example of a storage unit.

The program executed by the CPU 601 may be acquired from another device via the communication interface 604 as needed, or may be read from a storage medium available via the medium interface 605. The storage medium is, for example, a communication medium (that is, a wired, wireless, or optical network, or a carrier wave or digital signal propagating over such a network) or the external storage medium 607, which can be attached to and detached from the medium interface 605.

Next, an example of the GUI screen will be described.

FIG. 13 is a diagram showing an example of the GUI screen according to one embodiment. The screen in FIG. 13 shows the case where the context tab 401-1 for context 1 (URI), described later, is selected.

The screen 1300 shown on the display 400 includes context tabs 401 (401-1 to 401-7), a heat map 402 (402-1 in FIG. 13), a top outlier list 403 (403-1 in FIG. 13), and a threshold display area 404 (404-1 in FIG. 13). The content displayed on the screen 1300 is updated as appropriate based on the information sent from the visualization processing unit 142.

Each context tab 401 is a container for the visualization content of one specification combination of the applied data context specifications, and one tab is provided per specification combination. For example, when the applied data context specifications consist of three specifications, there are seven context tabs 401, namely context tabs 401-1 to 401-7. In the example of FIG. 13, the context tab 401-1 is selected and is therefore highlighted.
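The count of seven tabs for three specifications follows from enumerating every non-empty combination of the specifications (2^3 - 1 = 7). A short Python sketch of that enumeration is shown below; the function name is hypothetical and "method" is an assumed third specification used only for illustration.

```python
from itertools import combinations


def specification_combinations(specs):
    """List every non-empty combination of the given specifications."""
    return [combo
            for size in range(1, len(specs) + 1)
            for combo in combinations(specs, size)]


# "method" is an assumed third specification, not taken from the text.
tabs = specification_combinations(["URI", "source IP", "method"])
```

With only the two specifications URI and source IP, the same function yields three combinations: URI, source IP, and URI × source IP.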

The heat map 402 is a map of the scores for several specification values in the selected specification combination of the applied data context specifications (that is, the specification combination corresponding to the selected context tab 401). In the example of FIG. 13, the heat map 402-1 maps the scores for several URI values when the specification combination is URI. The heat map makes it easy to see at which value of the specification combination the largest abnormality occurred.

The top outlier list 403 is a list that visualizes a predetermined number of the highest-scoring values in the specification combination. In the example of FIG. 13, the top outlier list 403 lists the highest-scoring URIs for the URI specification combination in the order URI1, URI2, URI3, and so on. The top outlier list 403 in FIG. 13 also provides checkboxes 4031-1 and so on for excluding from the list specification values that have a high score but do not need to be visualized. In addition, the top outlier list 403 has an area that displays, and allows the user to set, the start time and end time to be considered and the aggregation time window size (Aggregation) used for visualization.
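The behavior of the top outlier list, including the exclusion checkboxes, can be approximated with the following Python sketch. The function name, the score values, and the tie-breaking by value name are assumptions for illustration.

```python
def top_outliers(scores, n, excluded=()):
    """Return the n highest-scoring (value, score) pairs, skipping exclusions.

    `scores` maps a specification value (e.g. a URI) to its score, and
    `excluded` models the values ticked off via the checkboxes.
    """
    hidden = set(excluded)
    candidates = [(value, score) for value, score in scores.items()
                  if value not in hidden]
    # Highest score first; ties are broken by the value name.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:n]


# Hypothetical scores for four URI values.
uri_scores = {"URI1": 9.0, "URI2": 7.5, "URI3": 4.0, "URI4": 1.0}
top3 = top_outliers(uri_scores, 3)
top3_without_uri2 = top_outliers(uri_scores, 3, excluded=["URI2"])
```

Excluding a value lets the next-highest value move up into the list, so the list length stays at `n` whenever enough values remain.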

The present invention is not limited to the embodiment described above and can be modified as appropriate without departing from the spirit of the invention.

For example, although the above embodiment uses an abnormality detection device that detects abnormalities as an example, the present invention is not limited to this and can also be applied to devices that analyze the performance of various kinds of equipment.

In the above embodiment, some or all of the processing performed by the CPU may instead be performed by a hardware circuit. The program in the above embodiment may be installed from a program source. The program source may be a program distribution server or a storage medium (for example, a portable storage medium).

The above embodiment describes a performance detection device intended for use in the operation management of IT systems, but the present invention is not limited to this. The performance analysis device may be used in any case where data is divided based on data context to generate data instances; for example, it may also be used in OT (Operational Technology).

100 ... Abnormality detection device, 110 ... Data instance generation unit, 111 ... Data context selection unit, 130 ... Data instance labeling unit, 140 ... Recursive cause identification unit, 141 ... Label data scoring unit, 142 ... Visualization processing unit, 150 ... Method pool, 160 ... Data instance label DB, 170 ... Context score DB, 200 ... Performance data DB

Claims

A performance analysis device that analyzes performance using performance data including a plurality of data elements, each data element including time information, specification information on a plurality of specifications indicating a context, and performance information, the performance analysis device comprising:
a data instance generation unit that divides the performance data into a plurality of data instances based on the specification information on at least one specification of the data elements of the performance data; and
a labeling unit that evaluates data characteristics of each data instance, specifies a performance analysis method according to the evaluated data characteristics, performs performance analysis on the data elements belonging to the data instance by the specified performance analysis method, and attaches a label indicating the performance analysis result.

The performance analysis device according to claim 1, further comprising a method pool that stores data characteristics in association with the performance analysis method to be used for performance analysis of a data instance having those data characteristics,
wherein the labeling unit specifies, from the method pool, the performance analysis method corresponding to the evaluated data characteristics.

The performance analysis device according to claim 2, wherein the data characteristics are statistical characteristics of the data instance.

The performance analysis device according to claim 1, further comprising a scoring unit that creates aggregated data by aggregating data elements with the same label within a predetermined aggregation time in the data instance, and calculates a score of the performance analysis result for the aggregated data.

The performance analysis device according to claim 4, wherein the scoring unit calculates the score for the aggregated data based on the difference between the number of data elements aggregated in the aggregated data and the number of data elements aggregated in past aggregated data created for the predetermined aggregation time at a predetermined past point in time, and on the rank of that difference among the plurality of data instances.

The performance analysis device according to claim 4, further comprising a visualization processing unit that displays information on the specifications of the data instance aggregated in the aggregated data together with the calculated score.

The performance analysis device according to claim 6, wherein the visualization processing unit accepts a change of the aggregation time from a user, causes the scoring unit to recalculate based on the changed aggregation time, and displays the result of the recalculation.

The performance analysis device according to claim 4, wherein, for each combination of one or more specifications used when the data instances were generated, the scoring unit identifies the number of data elements that have the same specification values for the combination and the same label within the predetermined aggregation time, calculates a score for each of the combinations, and calculates a total score based on the scores calculated for all the combinations.

The performance analysis device according to claim 1, wherein the data instance generation unit applies the specification information of one or more of the plurality of specifications of the performance data when dividing the performance data into the data instances.

The performance analysis device according to claim 9, further comprising a selection unit that applies a plurality of specifications of the performance data to divide the performance data into the data instances, evaluates the temporal sparseness of the data elements of the divided data instances, and, if the data elements of the divided data instances are too sparse in time, identifies the most non-uniform specification among the applied specifications and determines one or more specifications, excluding the identified specification, as the one or more specifications to be applied by the data instance generation unit.

The performance analysis device according to claim 1, wherein, when dividing the performance data into the plurality of data instances, the data instance generation unit determines the time window size over the data elements of the performance data that are to become data elements of a data instance so that the sparseness of the data elements of the data instance is equal to or less than a predetermined value.

A performance analysis method performed by a performance analysis device that analyzes performance using performance data including a plurality of data elements, each data element including time information, specification information on a plurality of specifications indicating a context, and performance information, the method comprising:
dividing the performance data into a plurality of data instances based on the specification information on at least one specification of the data elements of the performance data; and
evaluating data characteristics of each data instance, specifying a performance analysis method according to the evaluated data characteristics, performing performance analysis on the data elements belonging to the data instance by the specified performance analysis method, and attaching a label indicating the performance analysis result.

A performance analysis program to be executed by a computer, the program causing the computer to execute processing of:
dividing performance data, which includes a plurality of data elements each including time information, specification information on a plurality of specifications indicating a context, and performance information, into a plurality of data instances based on the specification information on at least one specification of the data elements of the performance data; and
evaluating data characteristics of each data instance, specifying a performance analysis method according to the evaluated data characteristics, performing performance analysis on the data elements belonging to the data instance by the specified performance analysis method, and attaching a label indicating the performance analysis result.