JP2022024277A

JP2022024277A - Analysis system, analysis device, and analysis method

Info

Publication number: JP2022024277A
Application number: JP2020120726A
Authority: JP
Inventors: Kazu Mimura; Toshiyuki Saito; Kozo Ikegami; Hiroki Yamaguchi
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-07-14
Filing date: 2020-07-14
Publication date: 2022-02-09

Abstract

To help a human determine which explanatory variables are not required in a prediction model formula.
SOLUTION: An analysis system comprises: an input data creation unit that creates a plurality of explanatory variables using information on events and creates input data including the explanatory variables and the results of the events; a prediction unit that creates a prediction model formula that predicts, from the explanatory variables, the value of an objective variable representing the result of an event; an explanatory variable evaluation unit that calculates exclusion candidate variables, which are candidate explanatory variables that need not be used in the prediction model formula; and an output unit that outputs the exclusion candidate variables.
SELECTED DRAWING: Figure 5

Description

The present invention relates to an analysis system, an analysis device, and an analysis method.

There is a growing need to respond to threats in cyberspace, such as cyber attacks including targeted attacks, DDoS (Distributed Denial of Service) attacks, and the spread of viruses. In cyberspace the attacker has a structural advantage, and attacks are becoming more sophisticated, more numerous, and more varied day by day. Moreover, the targets of attack are expanding beyond financial service providers and IT service providers to infrastructure operators and others. Meanwhile, corporate IT systems are becoming ever more complex and diverse owing to the wider use of cloud services, the spread of mobile devices such as smartphones and tablets, remote work, and so on. In such a situation, there is no telling when, or from where, someone may break into the network and leak information that is important to the enterprise.

Until now, engineers in IT departments who are familiar with servers and networks have dealt with security threats, but, together with multi-layered security measures, there is a growing need for a security operation center (SOC) in which specialized staff monitor systems around the clock. However, there has been concern that the required number of security specialists cannot be secured, which would hinder the work of monitoring for security incidents in information systems and control systems. In the social infrastructure business in particular, the monitoring target covers the entire system, so a significant improvement in the operational capacity of the SOC has been desired.

The largest share of operational work at an SOC is evaluating the importance of security alerts reported by an FW (Firewall), an IPS (Intrusion Prevention System), and the like. In other words, a security expert assesses whether an alert corresponds to an incident or to something that needs no attention, such as a false positive. SOC security experts refer to the logs of each device in the monitored system, external threat information, malware risk assessments, and so on, and judge the importance of alerts based on their own knowledge and experience.

To keep SOC operations running stably into the future in the face of ever-increasing cyber attacks and ever-larger monitored systems, it is desirable to automate, or to support, the work of assessing the risk of security alerts. This problem is not limited to security; it applies equally to alerts caused by failures of devices and the like.

Failure monitoring of an entire information system, including server devices and network devices, is performed by an operation center staffed by specialists (called, for example, an information systems department or a network operation center (NOC)). There, the log information generated by each device must be linked with alert information, such as alerts caused by performance degradation, to determine whether an alert is transient, caused by a service change, or caused by a device failure. As with an SOC, the failure-monitoring specialists judge the importance of alerts based on their own knowledge and experience.

As business automation advances, more and more services are being added to information systems, so the volume of such log information and alert information keeps growing. A significant improvement in the operational capacity of operation centers is therefore desired, and to keep operations running stably into the future it is desirable, as with SOC work, to automate or support the work of evaluating the importance of alerts.

Against this background, a method is known that applies statistical techniques or machine learning, using past alert-importance judgments as the objective variable and, as explanatory variables, log statistics calculated from each device's past logs, externally published information, and the like. The method extracts the factors that influenced the importance judgments, generates a prediction model formula, and uses that formula to automatically predict the importance of newly generated alerts. Given that monitored systems keep growing and, in the case of security, that cyber attack techniques change and multiply daily, the device log items and externally published information items that must be consulted are extremely diverse and keep changing.

Patent Document 1 discloses an analysis device having a processor and a storage device that stores a prediction model formula for predicting results for factors of an event group. The processor executes: a prediction error calculation process that calculates the prediction error of a first predicted value, based on the first predicted value, obtained by giving the prediction model formula a first appearance frequency for the factors of a first event in the event group, and on the result corresponding to the first appearance frequency; and an error factor extraction process that extracts, from among the factors of the first event, the error factor of the prediction error, based on the correlation between a second appearance frequency for the factors of a second event in the event group and the prediction error calculated by the prediction error calculation process.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-144970

The technique disclosed in Patent Document 1 may end up including explanatory variables that do not contribute to prediction.

An analysis system according to a first aspect of the present invention includes: an input data creation unit that creates a plurality of explanatory variables using information on events and creates input data including the explanatory variables and the results of the events; a prediction unit that creates a prediction model formula that predicts, from the explanatory variables, the value of an objective variable representing the result of an event; an explanatory variable evaluation unit that calculates exclusion candidate variables, which are candidate explanatory variables that need not be used in the prediction model formula; and an output unit that outputs the exclusion candidate variables.
An analysis device according to a second aspect of the present invention includes: an input data creation unit that creates a plurality of explanatory variables using information on events and creates input data including the explanatory variables and the results of the events; a prediction unit that creates a prediction model formula that predicts, from the explanatory variables, the value of an objective variable representing the result of an event; an explanatory variable evaluation unit that calculates exclusion candidate variables, which are candidate explanatory variables that need not be used in the prediction model formula; and an output unit that outputs the exclusion candidate variables.
An analysis method according to a third aspect of the present invention is executed by an analysis device and includes: creating a plurality of explanatory variables using information on events and creating input data including the explanatory variables and the results of the events; creating a prediction model formula that predicts, from the explanatory variables, the value of an objective variable representing the result of an event; calculating exclusion candidate variables, which are candidate explanatory variables that need not be used in the prediction model formula; and outputting the exclusion candidate variables.

According to the present invention, a human can be assisted in deciding which explanatory variables not to use in the prediction model formula.

FIG. 1 is a block diagram showing an example system configuration of the analysis system according to the first embodiment.
FIG. 2 is a block diagram showing an example hardware configuration of the computers shown in FIG. 1.
FIG. 3 is a block diagram showing an example functional configuration of the integrated management device in the first embodiment.
FIG. 4 is a block diagram showing an example functional configuration of the alert analyzer in the first embodiment.
FIG. 5 is a sequence diagram showing an example of the input data creation operation.
FIG. 6 is an explanatory diagram showing an example of the alert collection table.
FIG. 7 is an explanatory diagram showing an example of the log collection table.
FIG. 8 is an explanatory diagram showing an example of the calculation resource amount table.
FIG. 9 is an explanatory diagram showing an example of the input data table.
FIG. 10 is an explanatory diagram showing an example of the settings screen of the alert analyzer.
FIG. 11 is a sequence diagram showing an example of the operation of extracting explanatory variables to be excluded from creation.
FIG. 12 is an explanatory diagram showing an example of the correlation evaluation table.
FIG. 13 is an explanatory diagram showing an example of the explanatory variable evaluation table.
FIG. 14 is an explanatory diagram showing an example of the cross-correlation evaluation table.
FIG. 15 is an explanatory diagram showing an example of the within-group evaluation table for explanatory variables.
FIG. 16 is an explanatory diagram showing an example of the output screen of the integrated management device or the alert analyzer.
FIG. 17 is a flowchart showing an example of the predictive impact evaluation procedure of the alert analyzer.
FIG. 18 is a block diagram showing an example system configuration of the analysis system according to the second embodiment.
FIG. 19 is a block diagram showing an example functional configuration of the alert analyzer according to the second embodiment.

-First embodiment-
Hereinafter, the first embodiment of the analysis system will be described with reference to FIGS. 1 to 17.

FIG. 1 is a diagram showing the configuration of an analysis system 1 according to the first embodiment. The analysis system 1 includes an SOC 130 and an alert analyzer 135. The analysis system 1 performs analysis for defending one or more monitored systems against security threats such as cyber attacks. This embodiment describes the case where the analysis system 1 monitors a first monitored system 100a. Although security monitoring is described below, the analysis system 1 may also monitor other alerts, for example hardware failures or system malfunctions. The SOC 130 includes an alert management device 131, a log management device 132, an integrated management device 133, and a second network 134.

The first monitored system 100a includes a first network 110, one or more client terminals 111, a business server 112, a network monitoring device 113, a firewall 114, and a proxy server 115. The firewall 114 may have an intrusion prevention function. The present invention does not limit what the first monitored system 100a processes; the first monitored system 100a is, for example, a computer system in the financial field or in the IT field, but it may also relate to social infrastructure such as a power supply system or a water and sewerage management system. The first monitored system 100a monitors its hardware and software and, when it detects an attack on a server or malicious activity such as a virus, creates an alert and notifies the SOC 130.

The first network 110 is, for example, a bus, over which the client terminals 111, the business server 112, the network monitoring device 113, the firewall 114, the proxy server 115, and the SOC 130 are communicably connected to one another. The firewall 114 is connected to an external network 116, which is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet.

The second network 134 is, for example, a bus, to which the alert management device 131, the log management device 132, the integrated management device 133, and the external network 116 are connected. The alert analyzer 135 and an external threat information database 136 are also connected to the SOC 130 via the external network 116. The alert analyzer 135 and the SOC 130 are typically connected by secure communication such as a VPN (Virtual Private Network). The external threat information database 136 is a Web server that provides information on threats.

When the alert management device 131 receives, as an alert, an event from the first monitored system 100a such as a virus detection, detection of abnormal system behavior, detection of a connection to an unregistered device, a failure, or a fault, it registers management information for identifying the alert. The alert management information includes the date and time the alert occurred, the type of the alert target, and the address of the alert target. The "type of alert target" may also be called the "type of alert source".

A security expert at the SOC (hereinafter, an expert) refers to the external threat information database 136 and evaluates each alert. Based on the evaluation result, the expert assigns each alert an alert classification, which is classification information. The alert classification is, for example, either flag 0 or flag 1. An alert given flag 0 is, for example, one with a high risk of leading to a security incident such as an information leak or a system runaway. An alert given flag 1 is, for example, one that is not high-risk and needs little attention, because countermeasures have already been taken or the alert is merely a false alarm. The alert classifications assigned by experts are used in the processing described later.
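The alert management information and the expert-assigned classification described above can be pictured as a small record type. The following Python sketch is illustrative only: the names `Alert`, `AlertClass`, `occurred_at`, `target_type`, `target_address`, and `label` are hypothetical and not part of the specification; only the three management items and the meaning of flags 0 and 1 come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from typing import Optional


class AlertClass(IntEnum):
    """Expert-assigned alert classification (flag values as in the text)."""
    HIGH_RISK = 0  # flag 0: may lead to a security incident
    LOW_RISK = 1   # flag 1: already handled, a false alarm, etc.


@dataclass
class Alert:
    """Alert management information plus the expert's classification."""
    occurred_at: datetime     # date and time the alert occurred
    target_type: str          # type of alert target (alert source); values are hypothetical
    target_address: str       # address of the alert target
    label: Optional[AlertClass] = None  # None until an expert classifies it

    def needs_attention(self) -> bool:
        # In this sketch, unclassified alerts are treated as needing attention.
        return self.label is None or self.label == AlertClass.HIGH_RISK


alert = Alert(datetime(2020, 7, 14, 9, 30), "proxy", "192.0.2.10")
print(alert.needs_attention())  # True: not yet classified
alert.label = AlertClass.LOW_RISK
print(alert.needs_attention())  # False
```

Using `IntEnum` keeps the flags comparable with the plain integers 0 and 1 that a learning step would consume as the objective variable.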

The log management device 132 acquires access logs, authentication logs, and the like from the first monitored system 100a, and records and manages the history of past logs. A log is history information indicating, for example, when, which computer in the first monitored system 100a sent or received what data to or from which communication partner.

The integrated management device 133 holds an alert collection table 600, a log collection table 700, a calculation resource amount table 800, and an input data table 900. The integrated management device 133 collects alert management information from the alert management device 131 and registers it in the alert collection table 600, described later. It also collects log information from the log management device 132 and registers it in the log collection table 700, described later. Further, based on the information in the alert collection table 600 and the log collection table 700, it creates the input data table 900, described later, and transmits the input data table 900 to the alert analyzer 135. The calculation resource amount table 800 is described later.

The alert analyzer 135 learns combinations of alert content and the alert classification determined by the expert, and, based on the learning result, creates a model for evaluating other alerts. Using this model, the alert analyzer 135 analyzes an evaluation target alert, that is, a newly generated alert, calculates a predicted value for determining its alert classification, and notifies the integrated management device 133 of the predicted value. An expert viewing the display of the integrated management device 133 can check this predicted value and decide the formal alert classification of the evaluation target alert.

The external threat information database 136 publishes threat information on the Internet. The threat information may include malware, program vulnerabilities, spam, and malicious URLs (Uniform Resource Locators). However, the external threat information database 136 need not be located outside the SOC 130; it may instead be included in the SOC 130.

FIG. 2 is a hardware block diagram of a computer 200 applied to the system shown in FIG. 1. The computer 200 is applied to each of the client terminal 111, the business server 112, the network monitoring device 113, the firewall 114, the proxy server 115, the alert management device 131, the log management device 132, the integrated management device 133, and the alert analyzer 135. The computer 200 includes a processor 201, a storage device 202, an input device 203, an output device 204, and a communication interface 205, which are connected to one another by a bus 206. The storage device 202 may be a non-transitory or transitory recording means that stores programs and data.

The configuration shown in FIG. 2 is an example; for instance, a rewritable logic circuit or an application-specific integrated circuit may be used instead of the processor 201. Also, the client terminal 111, the business server 112, the network monitoring device 113, the firewall 114, the proxy server 115, the alert management device 131, the log management device 132, the integrated management device 133, and the alert analyzer 135 need not have identical configurations.

The storage device 202 may receive information, such as operation programs for implementing each process, from outside via a storage medium 291. The computer 200 may also have a function of connecting to an external server 292 via the communication interface 205. The external server 292 is a server computer that provides information such as operation programs for implementing each process, and stores that information on a recording medium such as a storage device 293. A communication line 299 is, for example, the Internet, a personal computer communication line, or a dedicated communication line.

The external server 292 reads information such as the operation programs from the storage device 293 and transmits it to the computer 200 via the communication line 299; that is, the program is transmitted as a data signal on a carrier wave over the communication line 299. In this way, the program for operating the computer 200 can be supplied as a computer-readable computer program product in various forms, such as a recording medium or a data signal (carrier wave).

FIG. 3 is a block diagram showing an example functional configuration of the integrated management device 133 according to the first embodiment. The integrated management device 133 includes an alert collection unit 301, a log collection unit 302, an input data creation unit 303, and an output data display unit 304. Each of these blocks is a functional module implemented by the processor 201 executing a program stored in the storage device 202. Accordingly, the alert collection unit 301 and the other units may also be called an alert collection module, alert collection means, alert collection circuit, alert collection element, and so on. Each functional module may also be implemented in hardware.

The alert collection unit 301 receives alert management information from the alert management device 131 and registers it in the alert collection table 600, described later. The log collection unit 302 refers to the information registered in the alert collection table 600, calculates log statistics that serve as explanatory variables from the log information received from the log management device 132 and the external threat information received from the external threat information database 136, and registers them in the log collection table 700, described later. The log collection unit 302 also registers the amount of calculation resources required to calculate the log statistics, namely the calculation time, in the calculation resource amount table 800, described later.
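As a rough illustration of how the log collection unit 302 might turn raw logs into per-alert log statistics while also recording the calculation time of each statistic type, consider the sketch below. The log record layout (timestamp, source address, destination address), the one-hour window, and the two statistics `sent_count` and `distinct_destinations` are assumptions made for illustration; the specification does not define which statistics are computed.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical log record layout: (timestamp, source address, destination address).
LogRecord = Tuple[datetime, str, str]
Statistic = Callable[[str, datetime, List[LogRecord], timedelta], int]


def sent_count(addr: str, when: datetime, logs: List[LogRecord], window: timedelta) -> int:
    """Number of records sent by the alert address in the window before the alert."""
    return sum(1 for ts, src, _ in logs if src == addr and when - window <= ts <= when)


def distinct_destinations(addr: str, when: datetime, logs: List[LogRecord],
                          window: timedelta) -> int:
    """Number of distinct destinations contacted by the alert address in the window."""
    return len({dst for ts, src, dst in logs if src == addr and when - window <= ts <= when})


STATISTICS: Dict[str, Statistic] = {
    "sent_count": sent_count,
    "distinct_destinations": distinct_destinations,
}


def compute_log_statistics(alerts: List[Tuple[str, datetime]], logs: List[LogRecord],
                           window: timedelta = timedelta(hours=1)):
    """Return per-alert statistic values and the computation time (s) per statistic type."""
    values: List[Dict[str, int]] = [{} for _ in alerts]
    cost: Dict[str, float] = {}
    for name, stat in STATISTICS.items():
        start = time.perf_counter()
        for row, (addr, when) in zip(values, alerts):
            row[name] = stat(addr, when, logs, window)
        # Elapsed time plays the role of an entry in the resource amount table.
        cost[name] = time.perf_counter() - start
    return values, cost
```

Timing each statistic type separately, rather than the whole pass, is what lets an expert later weigh a statistic's cost against its usefulness.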

The input data creation unit 303 creates the input data table 900 by referring to the alert collection table 600 and the log collection table 700, and transmits the input data table 900 to the alert analyzer 135. The output data display unit 304 displays the output data received from the alert analyzer 135 together with the contents of the calculation resource amount table 800. The security expert 310 can check the display screen of the output data display unit 304 and freely set the explanatory variables to be created, that is, the types of log statistics.
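A minimal sketch of the join performed by the input data creation unit 303, assuming each alert already has a dictionary of log statistics and an expert-assigned flag. The optional `selected` argument is a hypothetical stand-in for the expert's choice of which statistic types to create; the function name and table layout are illustrative, not taken from the specification.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_input_table(stat_rows: Sequence[Dict[str, float]],
                      labels: Sequence[int],
                      selected: Optional[Sequence[str]] = None
                      ) -> Tuple[List[str], List[list]]:
    """Join per-alert log statistics (explanatory variables) with expert flags
    (objective variable) into a header and rows, one row per alert."""
    if len(stat_rows) != len(labels):
        raise ValueError("exactly one label per alert is required")
    if selected is not None:
        names = list(selected)        # only the statistic types the expert chose
    elif stat_rows:
        names = sorted(stat_rows[0])  # default: every statistic, in a stable order
    else:
        names = []
    header = names + ["alert_class"]
    rows = [[row[n] for n in names] + [label] for row, label in zip(stat_rows, labels)]
    return header, rows
```

For example, `build_input_table([{"b": 2, "a": 1}, {"b": 5, "a": 0}], [0, 1])` yields the header `["a", "b", "alert_class"]` and the rows `[[1, 2, 0], [0, 5, 1]]`.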

FIG. 4 is a block diagram showing an example functional configuration of the alert analyzer 135 according to the first embodiment. The alert analyzer 135 first creates an alert evaluation model using, for example, machine learning by artificial intelligence such as predictive coding. The alert analyzer 135 then updates the evaluation model through continued learning, and calculates a predicted value of the classification of an alert based on the evaluation model. The alert analyzer 135 learns the relationship between the classification flags and the data elements that constitute and characterize each alert. The data elements are log information related to the alert. The evaluation model is, for example, a set of weights expressing how each of a plurality of data elements relates to the flag, or a mathematical formula.
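The text describes the evaluation model as, for example, a set of weights relating each data element to the flag. One common way to obtain such weights is logistic regression, and the sketch below fits one by plain batch gradient descent. This is a swapped-in technique chosen for illustration (the specification mentions predictive coding and does not prescribe a learning algorithm), and the learning rate and epoch count are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weights(X: Sequence[Sequence[float]], y: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Learn one weight per data element plus a bias by batch gradient descent
    on the logistic (cross-entropy) loss; y holds the flags 0 and 1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the alert belongs to flag 1 (low risk)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Because flag 1 is the positive class here, a data element with a negative learned weight pushes an alert toward flag 0, the high-risk class.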

The alert analyzer 135 includes a model unit 400, a prediction unit 401, an explanatory variable evaluation unit 402, a predictive impact evaluation unit 403, and an output unit 404. Each of these blocks is a functional module implemented by the processor 201 executing a program stored in the storage device 202. Accordingly, the prediction unit 401 and the other units may also be called a prediction module, prediction means, prediction circuit, prediction element, and so on. The functional modules may also be implemented in hardware.

The prediction unit 401 calculates a predicted value for a new alert based on the prediction model formula. The predicted value is transmitted to the integrated management device 133 via the output unit 404. Because the expert 310 can consult the predicted value before actually handling a new alert, the expert can process alerts efficiently and avoid overlooking high-importance alerts.
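Given a prediction model formula, computing predicted values for new alerts and presenting the riskiest ones first could look like the following. The weights and bias are assumed to come from a previously fitted logistic model (the values in the example are hypothetical), and, because flag 0 denotes a high-risk alert in this document, alerts with the lowest predicted value are listed first. The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predicted_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Logistic prediction model formula: close to 1 means flag 1 (low risk)."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triage(alerts: Dict[str, Sequence[float]], weights: Sequence[float],
           bias: float) -> List[Tuple[str, float]]:
    """Score new alerts and list the ones most likely to be flag 0 (high risk) first."""
    scored = [(alert_id, predicted_value(weights, bias, x)) for alert_id, x in alerts.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical weights for two log statistics and three new alerts.
queue = triage({"a1": [4.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]},
               weights=[-1.0, 0.5], bias=1.0)
print([alert_id for alert_id, _ in queue])  # ['a1', 'a3', 'a2']
```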

The explanatory variable evaluation unit 402 evaluates the explanatory variables using one or more predetermined extraction methods and calculates exclusion candidate variables, that is, candidate explanatory variables that need not be used in the prediction model formula. In the present embodiment, the evaluation is not restricted to particular explanatory variables: all explanatory variables are evaluated, and those meeting the predetermined conditions are output as exclusion candidate variables.

The prediction impact evaluation unit 403 evaluates the prediction impact of removing the explanatory variables extracted by the explanatory variable evaluation unit 402 from the creation target. The prediction impact is the accuracy obtained when a particular explanatory variable is not used in the prediction model formula. For example, the prediction impact is defined as the correct-answer rate on test alerts of a prediction model formula that does not use that explanatory variable. Concretely, if such a prediction model formula classifies 9 out of 10 test alerts correctly, the prediction impact is "90%".
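The prediction impact defined above is a correct-answer rate computed with one explanatory variable withheld. The Python sketch below illustrates that calculation under stated assumptions: alerts are plain dicts of explanatory-variable values, and `predict` is a hypothetical stand-in for a prediction model formula built without the excluded variable; neither name comes from the source.

```python
from typing import Callable, Dict, Sequence

# Hypothetical representation: explanatory variable name -> value.
Alert = Dict[str, float]


def predictive_impact(
    predict: Callable[[Alert], int],
    test_alerts: Sequence[Alert],
    labels: Sequence[int],
    excluded: str,
) -> float:
    """Correct-answer rate on test alerts when one explanatory variable is withheld."""
    if not test_alerts or len(test_alerts) != len(labels):
        raise ValueError("need a non-empty, aligned set of test alerts and labels")
    correct = 0
    for alert, label in zip(test_alerts, labels):
        # Remove the excluded variable before the (hypothetical) model sees the alert.
        reduced = {name: value for name, value in alert.items() if name != excluded}
        correct += int(predict(reduced) == label)
    return correct / len(test_alerts)
```

With 9 of 10 test alerts classified correctly the function returns 0.9, the "90%" of the example above. In practice `predict` would presumably be a model formula created without the variable, rather than an already-fitted model that simply has the variable hidden from it.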

The evaluation results of the explanatory variable evaluation unit 402 and the prediction impact evaluation unit 403 are transmitted to the integrated management device 133 via the output unit 404. The expert 310 can review these evaluation results on a display screen and freely set which explanatory variables, that is, which types of log statistics, are to be created. The output unit 404 collects the outputs of the prediction unit 401, the explanatory variable evaluation unit 402, and the prediction impact evaluation unit 403 and transmits them together to the integrated management device 133.

Next, the operation of the analysis system 1 will be described. FIG. 5 is a sequence diagram showing an example of the operation in which the integrated management device 133 creates input data. The alert collection unit 301 determines the range of alert management information to be collected from the alert management device 131 for alert learning (step S501). The alert management device 131 accumulates alert management information each time it receives an alert from the first monitored system 100a.

The alert collection unit 301 collects the alert management information from the alert management device 131 (step S502). For example, starting from the alert with the oldest occurrence date and time among the alerts not yet captured, the alert collection unit 301 collects the management information of the alerts that occurred within the collection period, for example 30 days. The alert collection unit 301 registers the collected alert management information in the alert collection table 600 (step S503). The alert collection table 600 is described next.
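The range selection in step S502 can be sketched as follows. This is an illustrative assumption, not the device's actual logic: alerts are (identifier, occurrence time) pairs, already-captured identifiers are held in a set, the function name is hypothetical, and the 30-day window is treated as half-open.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple


def select_collection_range(
    alerts: List[Tuple[str, datetime]],
    captured: Set[str],
    period: timedelta = timedelta(days=30),
) -> List[Tuple[str, datetime]]:
    """Return uncaptured alerts within `period` of the oldest uncaptured alert."""
    pending = sorted((a for a in alerts if a[0] not in captured), key=lambda a: a[1])
    if not pending:
        return []
    start = pending[0][1]  # oldest alert not yet captured
    end = start + period   # window is [start, end)
    return [a for a in pending if start <= a[1] < end]
```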

FIG. 6 is an explanatory diagram showing an example of the alert collection table 600. The alert collection table 600 is stored in a predetermined storage area of the integrated management device 133. It includes the fields alert identifier 601, occurrence date and time 602, system 603, alert target 604, communication partner 605, determination result 606, and alert classification 607. The information in all fields other than the determination result 606 and the alert classification 607 is obtained by the alert management device 131 analyzing the alerts it acquires from the first monitored system 100a.

The alert identifier 601 stores identification information that uniquely identifies the alert. The occurrence date and time 602 stores the date and time at which the alert occurred. The system 603 stores identification information of the monitored system in which the alert occurred. The alert target 604 stores information identifying the source of the alert, for example information identifying a device; that is, the alert target 604 gives a more detailed location than the system 603. The communication partner 605 stores information indicating either the destination of data transmitted by the target identified by the alert target 604 or the source of data transmitted to that target.

The determination result 606 stores information on the response the expert took to the alert. The determination result is, for example, one of "judged to be a false positive", "judged to be an unaddressed attack", "judged to be an addressed attack", and "unprocessed". The alert classification 607 stores information, entered by the expert on the basis of the determination result 606, indicating the importance of the alert. The alert classification 607 holds "1" or "0": "1" indicates that the alert is important and the risk is high, and "0" indicates that the alert is neither important nor dangerous and needs no attention. "0" is selected, for example, when the alert is a false alarm or when it has already been addressed.
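A row of the alert collection table 600 could be modeled as the Python data class below. The class and field names are hypothetical; the four determination values and the 0/1 classification flag follow the description above, and `None` is an assumed marker for a classification the expert has not yet entered.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Determination(Enum):
    FALSE_POSITIVE = "judged to be a false positive"
    UNADDRESSED_ATTACK = "judged to be an unaddressed attack"
    ADDRESSED_ATTACK = "judged to be an addressed attack"
    UNPROCESSED = "unprocessed"


@dataclass
class AlertRecord:
    alert_id: str               # alert identifier 601
    occurred_at: datetime       # occurrence date and time 602
    system: str                 # system 603
    alert_target: str           # alert target 604
    communication_partner: str  # communication partner 605
    determination: Determination = Determination.UNPROCESSED  # 606, entered by the expert
    classification: Optional[int] = None  # 607: 1 = important/high risk, 0 = no attention needed

    def __post_init__(self) -> None:
        if self.classification not in (None, 0, 1):
            raise ValueError("alert classification must be 0 or 1")
```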

Returning to FIG. 5, the description continues. When registration in the alert collection table 600 is complete, the alert collection unit 301 notifies the log collection unit 302 of the alert information (step S504). The log collection unit 302 accesses the log management device 132, refers to its log information, and extracts the log information related to the alert (S505). The log collection unit 302 also accesses the external threat information database 136 and extracts the external threat information related to the alert (S506). Based on the log information and the external threat information, the log collection unit 302 calculates the log statistics related to the alert, that is, the explanatory variables, and registers them in the log collection table 700. The log collection unit 302 then stores the computation time required to calculate each log statistic in the calculation resource amount table 800 (S508).

FIG. 7 is an explanatory diagram showing an example of the log collection table 700. The log collection table 700 is recorded in a predetermined storage area of the integrated management device 133. Each entry of the log collection table 700 has the fields alert identifier 601, aggregation date and time 702, proxy server log 703, business server log 704, and external threat information 705. The log collection table 700 may also include logs of other computers in the first monitored system 100a, such as the client terminal 111, the firewall 114, and the network monitoring device 113.

The log statistics described below, which are statistical information on the logs in the log collection table 700, correspond to the explanatory variables. In the example shown in FIG. 7, the explanatory variables are the number of cache misses 731, the number of abnormal responses 732, the number of abnormal responses 741, the number of accesses 742, the IP address risk level 751, and the URL risk level 752.

As detailed below, the log collection table 700 stores statistics of one or more logs for each alert. In other words, rather than one log per alert, the table stores information aggregated from multiple logs around the time the alert was issued. Specifically, starting a predetermined time before the alert was issued, statistics such as log counts are aggregated at predetermined time intervals and stored in the log collection table 700. In the example shown in FIG. 7, the predetermined time is "1 hour" and the time interval is "10 minutes", so seven sets of statistics are stored for each alert.

The alert identifier 601 stores the identifier of the alert to which the statistics of the entry correspond. The aggregation date and time 702 stores the time of the last log used to create the statistics in the entry. Since the time interval in the example of FIG. 7 is "10 minutes" and the aggregation date and time 702 of the first entry is "2018/10/10 12:57", that entry aggregates the logs received during the 10 minutes from "12:48" to "12:57" on October 10, 2018.

The data elements that make up an alert may be defined as the logs in a predetermined range before the occurrence date and time of the alert. That is, the log information 703 and 704 at the aggregation dates and times of an alert with a given identifier constitutes the log statistics related to that alert, i.e., the data elements that make up the alert. The external threat information 705 may also be treated as part of the alert's constituent information.

As shown in the example of FIG. 6, the occurrence date and time 602 of the alert whose alert identifier 601 is "Alert_001" is "2018/10/10 13:57". In the following description of FIG. 7, dates are omitted and only times are given. The aggregation dates and times 702 of the entries whose alert identifier 601 is "Alert_001" are "12:57", one hour before the occurrence time "13:57", followed by "13:07", "13:17", "13:27", "13:37", "13:47", and "13:57" in 10-minute steps.
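The aggregation times in this example follow mechanically from the occurrence time, the look-back period, and the interval. The sketch below reproduces them; the function name is hypothetical, and each window is assumed to be left-open and right-closed, which at one-minute resolution matches the "12:48 to 12:57" span given for the 12:57 entry.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def aggregation_windows(
    occurred_at: datetime,
    lookback: timedelta = timedelta(hours=1),
    interval: timedelta = timedelta(minutes=10),
) -> List[Tuple[datetime, datetime]]:
    """Return (window_start, aggregation_time) pairs; each window is (start, end]."""
    windows: List[Tuple[datetime, datetime]] = []
    t = occurred_at - lookback
    while t <= occurred_at:
        windows.append((t - interval, t))
        t += interval
    return windows
```

For "Alert_001", `aggregation_windows(datetime(2018, 10, 10, 13, 57))` yields seven windows whose aggregation times run from 12:57 to 13:57 in 10-minute steps.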

The proxy server log 703 has the subfields number of cache misses 731 and number of abnormal responses 732. The number of cache misses 731 stores the number of cache misses of the proxy server 115 in the aggregation period ending at the aggregation date and time 702. The number of abnormal responses 732 stores the number of abnormal responses received by the proxy server 115 in that period. The subfields of the proxy server log 703 are not limited to these two and may include others, for example the number of bytes communicated.

The business server log 704 has the subfields number of abnormal responses 741 and number of accesses 742. The number of abnormal responses 741 stores the number of abnormal responses received by the business server 112 in the aggregation period ending at the aggregation date and time 702. The number of accesses 742 stores the number of times the business server 112 was accessed by other computers 200 during the fixed-interval aggregation period identified by the aggregation date and time 702. The subfields of the business server log 704 are not limited to these two and may include others, for example the number of authentication failures.
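Counts such as the cache misses 731 or the abnormal responses 741 are per-window tallies of log events. A minimal sketch, assuming raw log events are available as (timestamp, event kind) pairs with hypothetical kind labels such as "cache_miss", and using left-open, right-closed windows that end at each aggregation time:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def count_per_window(
    events: List[Tuple[datetime, str]],
    aggregation_times: List[datetime],
    interval: timedelta = timedelta(minutes=10),
) -> Dict[datetime, Dict[str, int]]:
    """Count log events of each kind in the window (t - interval, t] for every aggregation time t."""
    counts: Dict[datetime, Dict[str, int]] = {t: {} for t in aggregation_times}
    for ts, kind in events:
        for t in aggregation_times:
            if t - interval < ts <= t:
                counts[t][kind] = counts[t].get(kind, 0) + 1
    return counts
```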

The external threat information 705 has the subfields IP address risk level 751 and URL risk level 752. When the communication partner 605 of the alert target 604 at the aggregation date and time 702 is identified by an IP address, the IP address risk level 751 stores an index value, taken from the external threat information database 136, that grades the risk of that IP address in steps. The index value has, for example, six levels from "0" to "5", with "5" indicating the highest risk.

When the communication partner 605 of the alert target 604 at the aggregation date and time 702 is identified by a URL, the URL risk level 752 stores an index value, taken from the external threat information database 136, that grades the risk of that URL in steps. The index value has, for example, six levels from "0" to "5", with "5" indicating the highest risk. The subfields of the external threat information 705 are not limited to the IP address risk level 751 and the URL risk level 752 and may include others, for example the vulnerability level of the terminal.

FIG. 8 is an explanatory diagram showing an example of the calculation resource amount table 800. The calculation resource amount table 800 is recorded in a predetermined storage area of the integrated management device 133. It has the fields explanatory variable name 801 and computation time required 802.

The explanatory variable name 801 stores the item name of each explanatory variable, such as the number of cache misses 731 of the proxy server log 703 in the log collection table 700 of FIG. 7. The computation time required 802 stores the time needed to calculate the log statistic from the log information and other sources; for example, the time taken by each calculation is recorded and the average is stored. The entry whose explanatory variable name 801 is "proxy server cache miss count" indicates that the computation time required 802 was "50.3 seconds".
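Recording the mean computation time per explanatory variable might look like the sketch below. The class name is hypothetical, and timing with `time.perf_counter` is an assumption; the source says only that the time of each calculation is recorded and averaged.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class CalcResourceTable:
    """Keeps the mean computation time per explanatory variable (hypothetical analogue of table 800)."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = {}

    def record(self, name: str, seconds: float) -> None:
        self._samples.setdefault(name, []).append(seconds)

    def timed(self, name: str, compute: Callable[[], T]) -> T:
        """Run `compute`, record its wall-clock duration under `name`, and return its result."""
        start = time.perf_counter()
        result = compute()
        self.record(name, time.perf_counter() - start)
        return result

    def mean_seconds(self, name: str) -> float:
        samples = self._samples[name]
        return sum(samples) / len(samples)
```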

Returning to FIG. 5, the input data creation unit 303 creates input data by referring to the alert collection table 600 and the log collection table 700 registered by the alert collection unit 301 and the log collection unit 302, and registers the input data in the input data table 900 (S511). Through the above procedure, the input data for the alert analyzer 135 is created.

FIG. 9 is an explanatory diagram showing an example of the input data table 900. The input data table 900 is recorded in a predetermined storage area of the integrated management device 133. It has the fields alert identifier 601, data type 902, determination result classification 903, aggregation date and time 904, system 905, proxy server log 906, business server log 907, and external threat information 908.

The data type 902 stores either "learning" or "test", indicating how the captured alert is to be used as data. Hereinafter, an alert set to "learning" is called a "learning alert" and an alert set to "test" is called a "test alert". Learning alerts are used to evaluate the association between the data elements of an alert and its classification, and to create and update the prediction model formula that predicts alert classification. Test alerts are used to test the created prediction model formula.

Whether a received alert is set to "learning" or "test" is arbitrary. The input data creation unit 303 may decide, based on a predetermined rule, which alerts are set to "learning" and which to "test". The ratio of "learning" to "test" is not particularly limited and may be set as appropriate by the input data creation unit 303.
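Because the source leaves the learning/test assignment rule open, the following is only one possible rule: a seeded random shuffle with a configurable test ratio. The function name, the default ratio of 0.2, and the use of random shuffling are all assumptions.

```python
import random
from typing import Dict, Sequence


def assign_data_types(
    alert_ids: Sequence[str], test_ratio: float = 0.2, seed: int = 0
) -> Dict[str, str]:
    """Label each alert "learning" or "test" using a seeded shuffle (hypothetical rule)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must lie in [0, 1]")
    shuffled = list(alert_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return {aid: ("test" if i < n_test else "learning") for i, aid in enumerate(shuffled)}
```

A fixed seed keeps the split reproducible between runs, which makes prediction impact values comparable when different explanatory variables are excluded.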

The determination result classification 903 corresponds to the alert classification 607 in the alert collection table 600 illustrated in FIG. 6 and stores the same value. The information in the determination result classification 903 is the objective variable of the input data. The aggregation date and time 904 corresponds to the aggregation date and time 702 in the log collection table 700 illustrated in FIG. 7 and stores the same value. The system 905 corresponds to the system 603 of the alert collection table 600 illustrated in FIG. 6, stores the same value, and is recorded for each aggregation date and time 904.

The proxy server log 906, the business server log 907, and the external threat information 908 correspond to the proxy server log 703, the business server log 704, and the external threat information 705 of the log collection table 700, respectively; their subfields correspond in the same way and store the same values. The system 905, the proxy server log 906, the business server log 907, and the external threat information 908 are the explanatory variables.
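The input data table 900 is essentially the log collection table 700 joined with the per-alert fields of the alert collection table 600 on the alert identifier. The sketch below shows that join with plain dicts; the key names (`alert_id`, `classification`, `system`, `data_type`) are hypothetical stand-ins for the fields 601, 903, 905, and 902.

```python
from typing import Any, Dict, List


def build_input_rows(
    alerts: Dict[str, Dict[str, Any]],
    log_rows: List[Dict[str, Any]],
    data_types: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Join each log-statistics row with its alert's classification, system, and data type."""
    rows: List[Dict[str, Any]] = []
    for log_row in log_rows:
        alert_id = log_row["alert_id"]
        alert = alerts[alert_id]
        rows.append({
            "alert_id": alert_id,
            "data_type": data_types[alert_id],          # "learning" or "test" (902)
            "classification": alert["classification"],  # objective variable (903)
            "system": alert["system"],                  # repeated per aggregation time (905)
            # Remaining keys: aggregation time and log statistics (904, 906 to 908).
            **{k: v for k, v in log_row.items() if k != "alert_id"},
        })
    return rows
```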

Next, the setting items of the alert analyzer 135 will be described. FIG. 10 is an explanatory diagram showing an example of the settings screen of the alert analyzer 135. In the settings tab 1001, the screen 1000 shows two setting items: extraction method 1010 and threshold 1020. The extraction method 1010 sets the method for extracting the explanatory variables to be excluded from the creation target. The threshold 1020 sets various thresholds.

Under extraction method 1010, each of the first method 1011, the second method 1012, and the third method 1013 can be enabled or disabled independently. Because this item specifies how to extract the explanatory variables to exclude from the creation target, the explanatory variables extracted by the selected methods are excluded from the creation target. The first method 1011 extracts explanatory variables whose variance is 0. The second method 1012 extracts explanatory variables whose degree of correlation with the objective variable is at or below a threshold. The third method 1013 extracts variables from groups of explanatory variables whose mutual degree of correlation (hereinafter, "cross-correlation degree") is at or above a threshold.

Under threshold 1020, the no-correlation threshold 1021, the p threshold 1022, and the strong correlation threshold 1023 can each be set numerically. The no-correlation threshold 1021 is the threshold for deeming a correlation absent: when the degree of correlation is smaller than the no-correlation threshold 1021, it is judged that there is no correlation. The p threshold 1022 is the p-value threshold used as the significance level. The p-value is an index commonly used in statistics: the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the observed value. The strong correlation threshold 1023 is the cross-correlation threshold for a strong correlation: when the cross-correlation degree is larger than the value set as the strong correlation threshold 1023, the variables are judged to be strongly correlated.
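The three extraction methods and their thresholds can be sketched together. This is a hedged illustration, not the patented procedure: the function names are hypothetical, correlations are plain Pearson coefficients compared in absolute value, the p threshold 1022 is not used, and because the source does not say which member of a strongly correlated group to drop, this sketch keeps the earlier variable of each strongly correlated pair and excludes the later one.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant series; treated here as uncorrelated
    return sxy / sqrt(sxx * syy)


def exclusion_candidates(
    X: Dict[str, List[float]],  # explanatory variable name -> values
    y: List[float],             # objective variable
    use_method1: bool = True,
    use_method2: bool = True,
    use_method3: bool = True,
    no_corr_threshold: float = 0.1,     # plays the role of threshold 1021
    strong_corr_threshold: float = 0.9,  # plays the role of threshold 1023
) -> Set[str]:
    excluded: Set[str] = set()
    if use_method1:
        # Method 1: variance == 0, i.e. every value identical.
        excluded |= {n for n, v in X.items() if len(set(v)) <= 1}
    if use_method2:
        # Method 2: |correlation with the objective variable| at or below the threshold.
        excluded |= {n for n, v in X.items()
                     if n not in excluded and abs(pearson(v, y)) <= no_corr_threshold}
    if use_method3:
        # Method 3: for each strongly cross-correlated pair, keep the earlier variable.
        names = [n for n in X if n not in excluded]
        for a, b in combinations(names, 2):
            if a in excluded or b in excluded:
                continue
            if abs(pearson(X[a], X[b])) >= strong_corr_threshold:
                excluded.add(b)
    return excluded
```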

Next, the operation of the alert analyzer 135 will be described. FIG. 11 is a sequence diagram showing an example of the operation of extracting explanatory variables to be excluded from the creation target. The input data creation unit 303 of the integrated management device 133 transmits the registered input data table 900 to the alert analyzer 135. The input data table 900 is shared by the prediction unit 401, the explanatory variable evaluation unit 402, and the prediction impact evaluation unit 403 (S1101, S1102, S1103). The prediction unit 401 selects the identifiers of the alerts whose data type 902 in the input data table 900 is "learning", and extracts the items corresponding to the objective variable and the explanatory variables from those entries. The prediction unit 401 then calculates the degree of correlation between the objective variable and the appearance frequency of each explanatory variable, and registers it in the correlation degree evaluation table 1200 (S1104). Alternatively, the prediction unit 401 may simply calculate the degree of correlation between the objective variable and the explanatory variable itself. The processing of S1104 is described in detail with reference to FIG. 12.

FIG. 12 is an explanatory diagram showing an example of the correlation degree evaluation table 1200. The correlation degree evaluation table 1200 is registered in a predetermined storage area of the alert analyzer 135. It has the fields explanatory variable name 1201, value range 1202, correlation degree 1203, and p-value 1204. The explanatory variable name 1201 stores the name of an explanatory variable of the alerts labeled "learning", corresponding, for example, to the proxy server cache miss count 961 of the input data table 900.

The value range 1202 stores a range of values that the explanatory variable named in 1201 can take; for example, predetermined ranges are stored. A data element of an alert is the combination of an explanatory variable name 1201 and a value range 1202. The correlation degree 1203 stores an index evaluating the correlation between the data element and the determination result classification 903, which is the objective variable, that is, a value representing the degree of correlation. This is described concretely below.

The value stored in the correlation degree 1203 is, for example, the correlation coefficient R1 between the appearance frequency p of the value range 1202 and the classification degree q given by the determination result classification 903. The appearance frequency p is the number of times a value within the range 1202 appears in a feature (data element) of a learning alert, divided by the number of aggregations of that feature. The correlation coefficient R1 is calculated by the following (Formula 1) from the standard deviation σp of the appearance frequency p, the standard deviation σq of the classification degree q, and the covariance Spq of p and q.

R1 = Spq / (σp × σq) ... (Formula 1)
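Formula 1 is the Pearson correlation coefficient. Below is a direct transcription into Python, assuming population (divide-by-n) statistics for σp, σq, and Spq. Using sample statistics instead scales the numerator and the denominator by the same factor and leaves R1 unchanged. The function name is hypothetical.

```python
from math import sqrt
from typing import Sequence


def correlation_r1(p: Sequence[float], q: Sequence[float]) -> float:
    """Formula 1: R1 = Spq / (sigma_p * sigma_q), with population statistics."""
    n = len(p)
    if n < 2 or n != len(q):
        raise ValueError("p and q must be aligned and contain at least two alerts")
    mean_p, mean_q = sum(p) / n, sum(q) / n
    s_pq = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / n
    sigma_p = sqrt(sum((a - mean_p) ** 2 for a in p) / n)
    sigma_q = sqrt(sum((b - mean_q) ** 2 for b in q) / n)
    if sigma_p == 0.0 or sigma_q == 0.0:
        raise ValueError("R1 is undefined when p or q is constant")
    return s_pq / (sigma_p * sigma_q)
```

Here `p` holds one appearance frequency per learning alert and `q` the corresponding 0/1 classification degrees.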

The calculation of the appearance frequency p and the classification degree q is described concretely below, using the example of the input data table 900 shown in FIG. 9.

Consider the alert whose alert identifier 601 is "Alert_001" and the explanatory variable name 1201 "proxy server cache miss count". According to FIG. 9, the cache miss count 961 of the proxy server log 906 for "Alert_001" is "3" at "12:57", "4" at "13:07", and "4" at "13:57". Assume that the cache miss counts 961 at the aggregation dates and times 904 from "13:17" to "13:47", which are omitted from FIG. 9, take values other than "3" and "4". In this case, values in the range 1202 of "3" to "4" appear three times in the cache miss count 961 of the proxy server log 906 for "Alert_001".

As described above, the cache miss count 961 of the proxy server log 906 for "Alert_001" is aggregated seven times. Therefore, the appearance frequency p of the range 1202 "3" to "4" in that cache miss count 961 is "3/7". The classification degree q is the value of the determination result classification 903; for the alert whose alert identifier 601 is "Alert_005", for example, q is "0". This is how the appearance frequency p and the classification degree q are calculated.
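The appearance frequency p reduces to counting in-range values over the aggregation windows. A minimal sketch; the function name is hypothetical, and returning an exact `Fraction` is a presentation choice so that the result reads as "3/7".

```python
from fractions import Fraction
from typing import Sequence


def appearance_frequency(values: Sequence[float], low: float, high: float) -> Fraction:
    """Share of aggregation windows whose value lies in the closed range [low, high]."""
    if not values:
        raise ValueError("no aggregation windows")
    hits = sum(1 for v in values if low <= v <= high)
    return Fraction(hits, len(values))
```

For example, with the observed values 3, 4, and 4 at 12:57, 13:07, and 13:57 and hypothetical values 1, 6, 7, and 2 for the omitted windows, `appearance_frequency([3, 4, 1, 6, 7, 2, 4], 3, 4)` returns `Fraction(3, 7)`.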

For each alert identifier 601 whose data type 902 is "learning", the prediction unit 401 obtains the pair of appearance frequency p and classification degree q. The prediction unit 401 then obtains the standard deviation σp from the appearance frequencies p of the item, the standard deviation σq from the classification degrees q of the alerts, and the covariance Spq. Using (Formula 1) above, the prediction unit 401 calculates, over the learning alerts, the correlation coefficient for the range 1202 "3" to "4" of the explanatory variable "proxy server cache miss count" (R1 = -0.54).

A positive correlation coefficient R1 (R1 > 0) means that the alert is correct, that is, that it indicates a danger such as a virus attack; the larger R1, the higher the degree of danger. A negative correlation coefficient R1 (R1 < 0) indicates that the alert carries little danger, for example because it is based on a false alarm; the smaller R1, that is, the larger its absolute value when negative, the more likely the alert is a false alarm. Because a correlation coefficient is obtained for each of the multiple data elements of an alert, integrating the correlation coefficients of the data elements yields a score for the alert, from which the importance of the alert itself can be evaluated. "Integration" here may also be read as fusion, combination, unification, synthesis, or the like.

The p-value 1204 is an index for testing for no correlation. It is a standard statistical index: the probability, under the null hypothesis, that the test statistic takes a value at least as extreme as the observed one. The test statistic t is calculated from the correlation coefficient R1 and the number of samples m by the following (Formula 2).

$$t = \frac{R_1 \sqrt{m - 2}}{\sqrt{1 - R_1^{2}}} \qquad \text{(Formula 2)}$$

The p-value 1204 is calculated as the probability, in the t distribution with m − 2 degrees of freedom, of a value whose absolute value is equal to or greater than that of the test statistic t. When the p-value is at or above a threshold, for example 0.05, the correlation coefficient R1 calculated beforehand is not significant at the 5% level, and the two variables can be regarded as uncorrelated.
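A dependency-free sketch of (Formula 2) and the two-sided p-value follows. It uses the standard fact that t has m − 2 degrees of freedom under the null hypothesis and integrates the t density with composite Simpson's rule, which is adequate for moderate |t|; in practice `scipy.stats.t.sf` would be the usual choice. The sample count m = 20 and the threshold value 0.05 are assumptions for illustration; only R1 = −0.54 comes from the text.

```python
import math

def t_statistic(r: float, m: int) -> float:
    """Formula 2: t = R1 * sqrt(m - 2) / sqrt(1 - R1^2)."""
    if m < 3:
        raise ValueError("at least 3 samples are needed")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt(m - 2) / math.sqrt(1.0 - r * r)

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t: float, df: int, steps: int = 10_000) -> float:
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|].

    The integral uses composite Simpson's rule; `steps` must be even.
    """
    a = abs(t)
    if math.isinf(a):
        return 0.0
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

m = 20                       # hypothetical number of samples
t = t_statistic(-0.54, m)
p_value = two_sided_p_value(t, m - 2)
P_THRESHOLD = 0.05           # example value for the p threshold
uncorrelated = p_value >= P_THRESHOLD
```

With these assumed inputs the correlation is significant, so it would be kept rather than treated as uncorrelated.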

Returning to FIG. 11. Based on the appearance frequency p and the classification degree q, the prediction unit 401 creates and stores a prediction model formula that calculates a predicted value of the classification degree (S1106). In this prediction model formula, the objective variable Y is the judgment result classification 903, and the features Xi (i = 1, 2, ..., n) are "P(explanatory variable name 1201, range 1202)", where P(Z) denotes the appearance frequency p of an event Z. Referring to the correlation degree evaluation table 1200, the features are, for example, X1 = P(proxy server cache miss count, "0" to "3"), X2 = P(proxy server cache miss count, "3" to "4"), and so on.

As an example of a prediction model formula, the prediction unit 401 creates a multiple regression equation such as (Formula 3) below. "n" is the total number of combinations of explanatory variable name 1201 and range 1202, that is, the total number of events Z.

Let "y_j" be the value of the objective variable Y for entry j of the alerts that make up the input data. Entry j is, for example, the combination of the objective variable Y and the features Xi for "Alert_001". Let the values of the features X1, X2, ..., Xn for that entry be "x1_j", "x2_j", ..., "xn_j". The coefficients "b0", "b1", "b2", ..., "bn" of (Formula 3) can then be obtained, for example, from a matrix equation such as (Formula 4) below, in which "j" denotes any of the alert entries "1" to "k" in the input data.
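(Formula 3) and (Formula 4) are not reproduced in this excerpt. Assuming (Formula 3) is the usual linear form Y = b0 + b1·X1 + ... + bn·Xn and (Formula 4) is the ordinary least-squares normal equations (XᵀX)b = Xᵀy over the entries j = 1 to k, the coefficients can be obtained as sketched below. The sketch solves the system with plain Gaussian elimination to stay self-contained; the function names and the data are hypothetical.

```python
from typing import List, Sequence

def fit_linear(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]   # leading 1.0 -> intercept b0
    n = len(rows[0])
    # Normal equations (X^T X) b = X^T y, written as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: explanatory variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b

def predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Predicted value y_j = b0 + b1*x1_j + ... + bn*xn_j."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

# Hypothetical entries j = 1..5 with two features, generated from known coefficients.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.25]]
ys = [0.2 + 0.5 * x1 - 0.3 * x2 for x1, x2 in xs]
b = fit_linear(xs, ys)
```

Strongly cross-correlated explanatory variables make XᵀX close to singular, so their individual coefficients become unstable in a least-squares fit.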

The prediction model formula of (Formula 3) is one example of a prediction model, and this way of creating it is likewise only an example; the model may instead be derived with known techniques such as regularization, decision trees, ensemble learning, neural networks, or Bayesian networks. As described above, the alert classification is set to "0" or "1", whereas the predicted classification value is a value from "0" to "1" inclusive. The prediction unit 401 may determine the degree of classification from this predicted value and make it available for an expert to consult. The description of FIG. 11 continues.

Using the prediction model formula described above, the prediction unit 401 calculates predicted values and the accuracy for the learning alerts and test alerts in the input data table 900 (S1107), and sends the calculated predicted values and accuracy to the output unit 404 (S1108). The prediction unit 401 calculates the predicted value "y_j" of an alert by substituting that alert's features "x1_j", "x2_j", ..., "xn_j" into the prediction model formula.

The prediction unit 401 determines a prediction threshold either as specified by a person or according to a predetermined rule. The prediction threshold is used to classify each calculated predicted value as either "0" or "1": the prediction unit 401 assigns "0" if the predicted value is smaller than the threshold and "1" if it is equal to or greater than the threshold. The prediction unit 401 then compares these results with the judgment result classification 903 in the input data table 900 and calculates the accuracy of the predictions. In security monitoring, where dangerous alerts must not be overlooked, the prediction threshold may be adjusted so that the hit rate for alerts classified as "1" is 100%, and the hit rate for alerts classified as "0" at that threshold may then be used as the accuracy.
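The thresholding and the security-monitoring rule can be sketched as follows. This encodes one reading of the rule: the threshold is set as high as possible while every alert whose judgment result classification 903 is "1" is still predicted "1", and the share of "0" alerts that are then predicted "0" is reported. The function names and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

def classify(pred: Sequence[float], threshold: float) -> List[int]:
    """'1' when the predicted value is >= threshold, otherwise '0'."""
    return [1 if y >= threshold else 0 for y in pred]

def accuracy(pred: Sequence[float], truth: Sequence[int], threshold: float) -> float:
    """Share of alerts whose thresholded prediction matches classification 903."""
    labels = classify(pred, threshold)
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)

def no_miss_threshold(pred: Sequence[float], truth: Sequence[int]) -> Tuple[float, float]:
    """Highest threshold that still labels every true '1' alert as '1'.

    Returns (threshold, share of true '0' alerts that are labelled '0').
    """
    positives = [y for y, t in zip(pred, truth) if t == 1]
    negatives = [y for y, t in zip(pred, truth) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    threshold = min(positives)
    filtered = sum(y < threshold for y in negatives) / len(negatives)
    return threshold, filtered

pred = [0.1, 0.4, 0.35, 0.8, 0.9]   # hypothetical predicted values y_j
truth = [0, 0, 1, 1, 1]             # hypothetical judgment result classification 903
thr, filtered = no_miss_threshold(pred, truth)
```

Any lower threshold would also avoid misses, but it could only label fewer "0" alerts correctly, which is why the minimum over the true "1" alerts is used.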

Next, the explanatory variable evaluation unit 402 evaluates the explanatory variables according to the setting of the extraction method 1010 in FIG. 10. First, it refers to the values of each explanatory variable item in the input data table 900, for example the proxy server log cache miss count 961, calculates the variance of each explanatory variable and its degree of correlation with the objective variable, and registers them in the explanatory variable evaluation table 1300 (S1109).

FIG. 13 is an explanatory diagram showing an example of the explanatory variable evaluation table 1300, which is registered in a predetermined storage area of the alert analyzer 135. The table has the fields explanatory variable name 1301, variance 1302, and degree of correlation with the objective variable 1303. Explanatory variable name 1301 stores the name of each explanatory variable in the input data table 900. Variance 1302 stores the value obtained by referring to the values of each explanatory variable item in the input data table 900, for example the proxy server log cache miss count 961, and calculating their mean and variance on the assumption that they follow a normal distribution.

Degree of correlation with the objective variable 1303 stores, among the rows of the correlation degree evaluation table 1200 with the same explanatory variable name 1201, the correlation degree 1203 whose absolute value is largest. For example, for the entry whose explanatory variable name 1301 is "proxy server cache miss count", the variance 1302 is calculated as "54.2" and the degree of correlation with the objective variable 1303 as "−0.54".

However, any correlation degree 1203 that is equal to or less than the uncorrelated threshold 1021 of the thresholds 1020 in FIG. 10 is treated as no correlation and set to 0. Likewise, a correlation degree whose p-value 1204 is equal to or greater than the p threshold 1022 of the thresholds 1020 is unreliable, so it is also set to 0. For example, for the entry whose explanatory variable name 1301 is "external threat information IP address risk", the degree of correlation with the objective variable 1303 is calculated as "0" because of these thresholds.
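The rule for the degree of correlation with the objective variable 1303 can be sketched as below. Two assumptions are made: the uncorrelated threshold 1021 is compared with the absolute value of the correlation (otherwise every negative correlation, such as −0.54, would be discarded), and the threshold values, which FIG. 10 holds but this excerpt does not show, are placeholders. Only −0.54 comes from the text; the other numbers are invented.

```python
from typing import Iterable, Tuple

def correlation_degree(per_range: Iterable[Tuple[float, float]],
                       uncorrelated_threshold: float,
                       p_threshold: float) -> float:
    """Degree of correlation with the objective variable 1303 for one variable.

    per_range holds (correlation degree 1203, p-value 1204) for each range 1202.
    Weak (|r| <= uncorrelated_threshold) or unreliable (p >= p_threshold)
    values are set to 0; the value with the largest absolute value is returned.
    """
    best = 0.0
    for r, p in per_range:
        if abs(r) <= uncorrelated_threshold or p >= p_threshold:
            r = 0.0
        if abs(r) > abs(best):
            best = r
    return best

UNCORRELATED_THRESHOLD = 0.2   # placeholder for uncorrelated threshold 1021
P_THRESHOLD = 0.05             # placeholder for p threshold 1022

cache_miss = [(0.12, 0.40), (-0.54, 0.01)]   # ranges "0" to "3" and "3" to "4"
degree = correlation_degree(cache_miss, UNCORRELATED_THRESHOLD, P_THRESHOLD)
```

The sign is preserved so that the table can still distinguish the "danger" and "false alarm" directions.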

Returning to FIG. 11. The explanatory variable evaluation unit 402 refers to the variance 1302 in the explanatory variable evaluation table 1300, extracts the explanatory variables whose variance is zero, and sends their names to the output unit 404 (S1110). A variance of zero means that the explanatory variable never changes and always takes the same value, so it cannot be correlated with the objective variable.
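The variance part of S1109 and the extraction of S1110 can be sketched as follows. The population variance is used, which is the maximum-likelihood estimate under the normal-distribution assumption mentioned for variance 1302; using the sample variance instead would not change which variables have zero variance. Variable names and values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence

def variance_table(columns: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Variance 1302 per explanatory variable (population form)."""
    return {name: pvariance(values) for name, values in columns.items()}

def zero_variance_variables(columns: Dict[str, Sequence[float]]) -> List[str]:
    """S1110: names of explanatory variables that always take the same value."""
    return [name for name, var in variance_table(columns).items() if var == 0]

columns = {"A": [2, 2, 2, 2], "B": [1, 3, 2, 6]}   # hypothetical columns
candidates = zero_variance_variables(columns)
```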

Next, the explanatory variable evaluation unit 402 extracts the explanatory variables whose degree of correlation with the objective variable is equal to or less than a threshold (S1111) and sends their names to the predictive impact evaluation unit 403 (S1112). To do so, it refers to the degree of correlation with the objective variable 1303 in the explanatory variable evaluation table 1300 and extracts the explanatory variables whose value is zero. Because every correlation degree 1203 at or below the uncorrelated threshold 1021 has already been set to zero, extracting the explanatory variables with zero correlation is equivalent to extracting those whose degree of correlation with the objective variable is at or below the threshold.

Next, for the explanatory variable names received in S1112, the predictive impact evaluation unit 403 calculates the predictive impact of excluding them (S1113) and sends the predictive impact together with the variable names to the output unit 404 (S1114). The predictive impact calculated in S1113 is the prediction accuracy obtained by removing the explanatory variables notified in S1112 from the input data table 900 and then applying the same procedure as in S1106 and S1107. The explanatory variable evaluation unit 402 then calculates the cross-correlation degree, that is, the degree of correlation between explanatory variables, and registers it in the cross-correlation degree evaluation table 1400 (S1115).
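The predictive impact is the accuracy of a model retrained without the excluded variables. In the sketch below the retraining of S1106 and S1107 is passed in as a callable, so `train_and_score` and the toy learner are hypothetical stand-ins rather than the patent's model.

```python
from typing import Callable, Dict, Sequence

Table = Dict[str, Sequence[float]]                  # explanatory variable -> column
TrainAndScore = Callable[[Table, Sequence[int]], float]

def predictive_impact(columns: Table, labels: Sequence[int],
                      excluded: Sequence[str], train_and_score: TrainAndScore) -> float:
    """Accuracy of a model rebuilt without the `excluded` explanatory variables."""
    drop = set(excluded)
    kept = {name: col for name, col in columns.items() if name not in drop}
    if not kept:
        raise ValueError("every explanatory variable would be excluded")
    return train_and_score(kept, labels)

def toy_scorer(cols: Table, labels: Sequence[int]) -> float:
    """Stand-in learner: predicts '1' when the first kept column is > 0."""
    first = next(iter(cols.values()))
    preds = [1 if v > 0 else 0 for v in first]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

columns = {"A": [0, 0, 1, 1], "B": [1, 0, 1, 0]}   # hypothetical data
labels = [0, 0, 1, 1]
impact_without_a = predictive_impact(columns, labels, ["A"], toy_scorer)
```

Comparing the result with the accuracy of the full model shows how much the excluded variables contributed.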

FIG. 14 is an explanatory diagram showing an example of the cross-correlation degree evaluation table 1400, which is registered in a predetermined storage area of the alert analyzer 135. The table lists explanatory variable names along both the row direction 1401 and the column direction 1402; in each case these are the names of the explanatory variables in the input data table 900. Each cell of the table holds the correlation coefficient R2 between the two explanatory variables. For example, R2 is calculated by the following (Formula 5) from the standard deviation σα of the values α of explanatory variable A, the standard deviation σβ of the values β of explanatory variable B, and the covariance Sαβ of α and β.

$$R_2 = \frac{S_{\alpha\beta}}{\sigma_{\alpha}\,\sigma_{\beta}} \qquad \text{(Formula 5)}$$
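Applying (Formula 5) to every pair of explanatory variables fills the cross-correlation degree evaluation table 1400. Because the table is symmetric, the sketch below computes only one triangle. The normalization factor of the covariance and standard deviations cancels, so raw sums are used; returning 0 for a constant column is an assumed convention. Names and data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Formula 5: R2 = S_ab / (sigma_a * sigma_b); 0.0 if either column is constant."""
    k = len(a)
    ma, mb = sum(a) / k, sum(b) / k
    s_ab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    s_aa = sum((x - ma) ** 2 for x in a)
    s_bb = sum((y - mb) ** 2 for y in b)
    if s_aa == 0 or s_bb == 0:
        return 0.0
    return s_ab / math.sqrt(s_aa * s_bb)

def cross_correlation_table(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Upper triangle of table 1400: R2 for each unordered pair of variables."""
    return {(u, v): correlation(columns[u], columns[v])
            for u, v in combinations(columns, 2)}

columns = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}  # hypothetical
table = cross_correlation_table(columns)
```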

In the cross-correlation degree evaluation table 1400, for example, the correlation coefficient between the explanatory variables "proxy server cache miss count" and "proxy server abnormal response count" is "0.87".

Returning to FIG. 11. The explanatory variable evaluation unit 402 refers to the cross-correlation degree evaluation table 1400 and extracts the pairs of explanatory variables whose cross-correlation degree is equal to or greater than the strong correlation threshold 1023 of the thresholds 1020 in FIG. 10. It then forms groups of explanatory variables, registers them in the in-group evaluation table 1500 (S1116), and sends the contents of the in-group evaluation table 1500 to the predictive impact evaluation unit 403 (S1117).

FIG. 15 is an explanatory diagram showing an example of the in-group evaluation table 1500 for explanatory variables, which is registered in a predetermined storage area of the alert analyzer 135. The table has the fields group 1501, explanatory variable name 1502, and degree of correlation with the objective variable 1503. Group 1501 stores the name of the group, for example a serial number starting from 1. Explanatory variable name 1502 stores the names of the explanatory variables belonging to the same group, that is, names taken from the row direction 1401 or the column direction 1402 of the cross-correlation degree evaluation table 1400. Degree of correlation with the objective variable 1503 stores each explanatory variable's degree of correlation with the objective variable, for example the value of the degree of correlation 1303 for the corresponding explanatory variable name 1301 in the explanatory variable evaluation table 1300.

The in-group evaluation table 1500 is created by referring to the cross-correlation degree evaluation table 1400 and extracting the pairs of explanatory variables whose cross-correlation degree is equal to or greater than the strong correlation threshold 1023 of the thresholds 1020. Pairs that share a variable are merged: for example, if the pairs (A, B) and (B, C) are extracted, explanatory variables A, B, and C form one group.
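Merging pairs that share a variable is a connected-components problem, so a small union-find structure fits naturally. This is a sketch under two assumptions: the strong correlation threshold 1023 is compared with |R2| (so strongly negative pairs also group), and the threshold value 0.8 is a placeholder. The first two pairs reproduce the A, B, C example above; the rest of the data is invented.

```python
from typing import Dict, List, Tuple

def group_variables(r2: Dict[Tuple[str, str], float], strong_threshold: float) -> List[List[str]]:
    """Group explanatory variables linked, directly or transitively, by |R2| >= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), r in r2.items():
        if abs(r) >= strong_threshold:
            parent[find(a)] = find(b)      # union the two groups

    groups: Dict[str, List[str]] = {}
    for x in parent:                       # only variables with a strong pair appear
        groups.setdefault(find(x), []).append(x)
    return [sorted(g) for g in groups.values()]

r2 = {("A", "B"): 0.90, ("B", "C"): 0.85, ("C", "D"): 0.10, ("D", "E"): 0.95}
groups = group_variables(r2, strong_threshold=0.8)
```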

Returning to FIG. 11. The predictive impact evaluation unit 403 refers to the in-group evaluation table 1500, selects one explanatory variable name 1502 in each group 1501, and calculates the predictive impact of excluding the other variables in that group (S1118). It then sends the predictive impact together with the in-group evaluation table 1500 to the output unit 404 (S1119). The predictive impact here is the prediction accuracy obtained by removing from the input data table 900 all explanatory variables in the group other than the one selected in S1118 and then applying the same procedure as in S1106 and S1107.

Finally, the output unit 404 sends the predicted values and evaluation results received in S1108, S1110, S1114, and S1119 to the output data display unit 304 of the integrated management device 133 (S1120). This concludes the description of FIG. 11.

FIG. 16 is an explanatory diagram showing an example of the output screen of the integrated management device 133. On the explanatory variable evaluation tab 1601, the screen 1600 displays the learning and test data prediction results 1610 and the exclusion candidate variables and predictive impact 1620. As described above, exclusion candidate variables are explanatory variables that are candidates for exclusion from the prediction model formula. The learning and test data prediction results 1610 show a list 1611 of predicted values per alert identifier and the prediction accuracy 1614 for each data type 1612. The prediction accuracy 1614 is the value notified in S1108 of FIG. 11.

The exclusion candidate variables and predictive impact area 1620 displays three sections: explanatory variables with zero variance 1621, explanatory variables whose correlation with the objective variable is at or below the threshold 1630, and grouped explanatory variables 1640. These correspond to the three extraction methods that can be set in the extraction method 1010 of FIG. 10. In the example of FIG. 10 all three methods are selected, so all three sections appear in FIG. 16. If, for example, the second method 1012 is not selected in the extraction method 1010, section 1630 is not displayed in FIG. 16.

The zero-variance section 1621 displays the explanatory variable names 1622 notified in S1110 of FIG. 11 and the calculation resource reduction 1623 achieved by excluding each of those variables from creation. The calculation resource reduction 1623 shows the calculation time 802 recorded for the corresponding explanatory variable name in the calculation resource amount table 800 of FIG. 8.

Section 1630, for explanatory variables whose correlation with the objective variable is at or below the threshold, displays the explanatory variable names 1631 notified in S1114 of FIG. 11 and their predictive impact 1632. Its calculation resource reduction 1633 shows the calculation time 802 recorded for the corresponding explanatory variable name in the calculation resource amount table 800 of FIG. 8. The value of the predictive impact 1632 can also be regarded as an accuracy.

The grouped explanatory variables section 1640 displays the registered contents of the in-group evaluation table 1500 notified in S1119 of FIG. 11, namely group 1641, explanatory variable name 1642, and degree of correlation with the objective variable 1643, together with the predictive impact when selected 1644 and the calculation resource reduction when selected 1645. The predictive impact when selected 1644 shows the predictive impact, in other words the accuracy, when one explanatory variable in the group is selected and the others are excluded. The calculation resource reduction when selected 1645 shows, for that same case, the sum of the calculation times 802 recorded in the calculation resource amount table 800 of FIG. 8 for the excluded explanatory variables.
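The calculation resource reduction when selected 1645 is a sum over the excluded members of a group. The sketch below computes it for every possible choice of the retained variable so the choices can be compared side by side; the function names are hypothetical, and the times are invented values standing in for calculation time 802 in table 800, in whatever unit that table uses.

```python
from typing import Dict, List

def reduction_if_kept(group: List[str], keep: str, calc_time: Dict[str, float]) -> float:
    """Total calculation time 802 of the variables excluded when only `keep` is retained."""
    if keep not in group:
        raise ValueError(f"{keep!r} is not a member of the group")
    return sum(calc_time[name] for name in group if name != keep)

def reductions_per_choice(group: List[str], calc_time: Dict[str, float]) -> Dict[str, float]:
    """Reduction for every possible choice of the variable to keep."""
    return {keep: reduction_if_kept(group, keep, calc_time) for keep in group}

calc_time = {"A": 1.5, "B": 2.0, "C": 0.5}   # hypothetical calculation times 802
options = reductions_per_choice(["A", "B", "C"], calc_time)
```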

By checking the screen 1600, the expert 310 can see which explanatory variables may be excluded from creation, together with the predictive impact (accuracy) and the calculation resource reduction that excluding them would bring. The expert 310 can thus identify explanatory variables that have little or no effect on alert importance prediction and may be dropped, review the explanatory variable settings accordingly, and estimate how much the calculation resources consumed in computing log statistics would be reduced.

FIG. 17 is a flowchart showing an example of the predictive impact evaluation procedure in the alert analyzer 135. When processing starts, the alert analyzer 135 selects the alert entries of the input data table 900 whose data type 902 is "learning". It extracts the items corresponding to the objective variable and the explanatory variables from those entries and calculates the degree of correlation between the objective variable and the appearance frequencies of the explanatory variables (S1701). Alternatively, the alert analyzer 135 may simply calculate the degree of correlation between the objective variable and the explanatory variables themselves. Next, the alert analyzer 135 creates a prediction model formula and uses it to calculate predicted values and the accuracy for the learning alerts and test alerts in the input data (S1702).

The alert analyzer 135 refers to the extraction method 1010 of FIG. 10 and determines whether the first method 1011 is selected (S1703). If it is, the alert analyzer 135 calculates the variance of each explanatory variable in the input data table 900, extracts the explanatory variables whose variance is 0 (S1704), and proceeds to S1705. If the first method 1011 is not selected, the process proceeds directly to S1705.

Next, the alert analyzer 135 refers to the extraction method 1010 and determines whether the second method 1012 is selected (S1705). If it is, the alert analyzer 135 extracts the explanatory variables whose degree of correlation with the objective variable, calculated in S1701, is equal to or less than the uncorrelated threshold 1021 (S1706). It then calculates the predictive impact of excluding the explanatory variables extracted in S1706 from creation, using the same procedure as in S1702 (S1707), and proceeds to S1708. If the second method 1012 is not selected, the process proceeds directly to S1708.

Next, the alert analyzer 135 refers to the extraction method 1010 and determines whether the third method 1013 is selected (S1708). If it is, the alert analyzer 135 calculates the degree of correlation between the explanatory variables of the input data table 900, that is, the cross-correlation degree (S1709), and groups together explanatory variables whose cross-correlation degree is equal to or greater than the strong correlation threshold 1023 (S1710).

For each group created, the alert analyzer 135 selects one explanatory variable, calculates the predictive impact of excluding the other explanatory variables using the same procedure as in S1702 (S1711), and proceeds to S1712. If the third method 1013 is not selected, the process proceeds directly to S1712. Finally, the alert analyzer 135 sends the information needed for the output screen 1600 to the integrated management device 133 and ends the process (S1712).
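The branching at S1703, S1705, and S1708 can be sketched as a function that runs only the selected extraction methods. It takes already-computed inputs (variance 1302, the correlation degree 1303 after thresholding, and the groups from S1710) and omits the predictive-impact recomputation of S1707 and S1711; the function name and parameters are hypothetical.

```python
from typing import Dict, List, Set

def extract_candidates(selected: Set[int],
                       variances: Dict[str, float],
                       target_corr: Dict[str, float],
                       groups: List[List[str]]) -> Dict[str, object]:
    """Run only the extraction methods chosen in extraction method 1010.

    variances:   variance 1302 per explanatory variable
    target_corr: correlation degree 1303, already zeroed by thresholds 1021/1022
    groups:      groups of strongly cross-correlated variables (S1709-S1710)
    """
    out: Dict[str, object] = {}
    if 1 in selected:  # first method 1011 (S1703-S1704)
        out["zero_variance"] = [v for v, s2 in variances.items() if s2 == 0]
    if 2 in selected:  # second method 1012 (S1705-S1706)
        out["weak_correlation"] = [v for v, r in target_corr.items() if r == 0]
    if 3 in selected:  # third method 1013: one variable kept per group (S1711)
        out["groups"] = [list(g) for g in groups if len(g) > 1]
    return out

result = extract_candidates({1, 3},
                            variances={"A": 0.0, "B": 2.5},
                            target_corr={"A": 0.0, "B": 0.4},
                            groups=[["B", "C"]])
```

Unselected methods simply produce no entry, which mirrors how an unselected method's section is omitted from the screen 1600.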

According to the first embodiment described above, the following effects are obtained.

(1) The analysis system 1 includes: an input data creation unit 303 that creates a plurality of explanatory variables using information about events and creates input data containing the explanatory variables and the results of the events; a prediction unit 401 that creates a prediction model formula predicting, from the explanatory variables, the value of an objective variable representing the result of an event; an explanatory variable evaluation unit 402 that calculates exclusion candidate variables, that is, candidate explanatory variables that need not be used in the prediction model formula; and an output unit 404 that outputs the exclusion candidate variables. The analysis system 1 can therefore help a person decide which explanatory variables not to use in the prediction model formula. This in turn helps reduce the number of explanatory variables created and, as a result, the calculation resources consumed in computing log statistics.

(2) The explanatory variable evaluation unit 402 identifies explanatory variables with zero variance. Such a variable has no correlation at all with the objective variable, which makes it the most suitable kind of explanatory variable to leave out of the prediction model formula. Outputting zero-variance explanatory variables is therefore useful when deciding which explanatory variables not to use in the prediction model formula.

(3) The explanatory variable evaluation unit 402 identifies explanatory variables whose degree of correlation with the objective variable is equal to or less than a predetermined threshold. An explanatory variable with a low degree of correlation with the objective variable is a good candidate to leave out of the prediction model formula, so outputting such variables is useful when deciding which explanatory variables not to use in the prediction model formula.

(4) The analysis system 1 includes a predictive impact evaluation unit 403 that calculates information on the accuracy of the prediction model formula when the exclusion candidate variables are not used, namely the predictive impact 1632 in FIG. 16. The output unit 404 outputs this predictive impact 1632. An expert can therefore refer to the value of the predictive impact 1632 when deciding whether to use an exclusion candidate variable in the prediction model formula.
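One way to compute a predictive impact as in (4) is sketched below. It assumes the impact is the accuracy without the excluded variables expressed as a percentage of the accuracy with all variables (consistent with a zero-variance variable yielding 100%), and it stands in a simple nearest-centroid classifier for the unspecified prediction model formula. All function names are hypothetical.

```python
from math import dist

Row = dict[str, float]


def nearest_centroid_accuracy(train: list[Row], train_y: list[int],
                              test: list[Row], test_y: list[int],
                              features: list[str]) -> float:
    """Fit a nearest-centroid classifier on `features` and return test accuracy."""
    centroids = {}
    for label in sorted(set(train_y)):  # sorted: deterministic tie-breaking
        rows = [r for r, y in zip(train, train_y) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for row, y in zip(test, test_y):
        point = [row[f] for f in features]
        # Predict the label whose centroid is closest (first label wins ties).
        predicted = min(centroids, key=lambda lbl: dist(point, centroids[lbl]))
        correct += predicted == y
    return correct / len(test)


def predictive_impact(train: list[Row], train_y: list[int],
                      test: list[Row], test_y: list[int],
                      features: list[str], excluded: set[str]) -> float:
    """Accuracy without `excluded`, as a percentage of accuracy with all features."""
    baseline = nearest_centroid_accuracy(train, train_y, test, test_y, features)
    if baseline == 0:
        raise ValueError("baseline accuracy is zero; impact ratio is undefined")
    kept = [f for f in features if f not in excluded]
    reduced = nearest_centroid_accuracy(train, train_y, test, test_y, kept)
    return 100.0 * reduced / baseline
```

Any model that can be retrained on a feature subset could replace the nearest-centroid stand-in; only the with/without comparison matters here.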

(5) The output unit 404 outputs the amount of calculation resources that a prediction model formula not using the exclusion candidate variables can save compared with a prediction model formula that uses them, as indicated by reference numerals 1623, 1633, and 1645 in FIG. 16. This is useful auxiliary information for deciding which explanatory variables not to use in the prediction model formula.
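The source does not say how the resource saving in (5) is estimated. The sketch below assumes a simple additive cost model in which each explanatory variable has a known, independent cost for computing its log statistics (for example CPU-seconds per day); the function name and cost units are hypothetical.

```python
def resource_reduction(cost_per_variable: dict[str, float],
                       excluded: set[str]) -> tuple[float, float]:
    """Return (absolute saving, saving as % of total) from dropping `excluded`.

    Assumes each variable's log-statistics cost is known and independent, so
    savings simply add up; log scans shared between variables would break
    this assumption.
    """
    total = sum(cost_per_variable.values())
    saved = sum(c for name, c in cost_per_variable.items() if name in excluded)
    percent = 100.0 * saved / total if total else 0.0
    return saved, percent
```

With costs of 30, 10, and 60 units for three variables, excluding the first two saves 40 units, or 40% of the total.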

(6) The explanatory variable evaluation unit 402 extracts exclusion candidate variables, which are candidates for explanatory variables that do not need to be used in the prediction model formula, using one or more extraction methods, and the output unit 404 outputs identification information of the extraction method. When exclusion candidate variables are extracted with a plurality of extraction methods, an expert viewing the output of the output unit 404 can therefore tell which extraction method produced which exclusion candidate variable.
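Keeping the correspondence between extraction methods and candidates, as in (6), can be as simple as tagging each candidate with the IDs of the methods that flagged it. This is a sketch under the assumption that every method is a function taking the variable columns and the objective variable and returning candidate names; the type aliases and method IDs are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Columns = dict[str, list[float]]
ExtractionMethod = Callable[[Columns, list[float]], list[str]]


def extract_with_method_ids(columns: Columns, target: list[float],
                            methods: dict[str, ExtractionMethod]) -> dict[str, list[str]]:
    """Map each exclusion candidate to the IDs of the methods that flagged it."""
    tagged = defaultdict(list)
    for method_id, method in methods.items():
        for name in method(columns, target):
            tagged[name].append(method_id)
    return dict(tagged)
```

A variable flagged by several methods keeps all of their IDs, so the output can show both the candidate and why it was proposed.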

(7) The explanatory variable evaluation unit 402 creates groups based on the degree of correlation between explanatory variables and, for each group, calculates the predictive impact at selection 1644 shown in FIG. 16, which is information on the accuracy of the prediction model formula when only one of the explanatory variables in the group is used and the others are excluded. The output unit 404 outputs this value. It can therefore serve as a reference when deciding which variable to keep within a group of highly correlated explanatory variables.

(Modification 1)
The output unit 404 may output only the exclusion candidate variables. Specifically, in the example shown in FIG. 16, the training and test data prediction results 1610 need not be displayed at all, and only the explanatory variable names 1622, 1631, and 1642 in the exclusion candidate variables and predictive impact area 1620 may be output. The item names need not be displayed either. As described above, the display in FIG. 16 is linked to the settings shown in FIG. 10. For example, when only the first method 1011 is selected under extraction method 1010 in FIG. 10, only the explanatory variable name 1622, that is, the explanatory variable with zero variance, is displayed in FIG. 16. In that case, the output of the output unit 404 on screen 1600 may consist solely of "External threat information URL risk".

(Modification 2)
Instead of the predictive impact, the predictive impact evaluation unit 403 may output an accuracy calculated using the training alerts or the test alerts. In this case, that accuracy is displayed in place of the predictive impact 1632 and the predictive impact at selection 1644 in FIG. 16. The expert can then obtain the same information as the predictive impact by comparing the displayed accuracy with the value shown as accuracy 1614 in FIG. 16.

(Modification 3)
In the first embodiment described above, candidates for explanatory variables that do not need to be used in the prediction model formula are presented. However, the analysis system 1 may instead change the prediction model formula itself according to a predetermined policy. For example, the analysis system 1 may change the prediction model formula so that it does not use explanatory variables having a variance of zero.

According to Modification 3, the prediction model formula can be updated automatically within a predetermined range. This is especially effective when the output of the analysis system 1 is viewed by a non-expert, or when the amount of expert judgment required is to be reduced.
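A minimal sketch of a predetermined policy for Modification 3, assuming candidates have already been tagged with the IDs of the extraction methods that flagged them. Candidates flagged by an "auto" method (here a hypothetical `"zero_variance"` ID) are dropped from the model's feature list automatically; the rest are left for an expert. The function name and method IDs are illustrative assumptions.

```python
def auto_update_features(features: list[str],
                         tagged_candidates: dict[str, list[str]],
                         auto_methods: set[str]) -> tuple[list[str], list[str]]:
    """Apply a predetermined exclusion policy to the model's feature list.

    Candidates flagged by any method in `auto_methods` are removed
    automatically; all other candidates are returned for expert review.
    """
    auto_excluded = {name for name, ids in tagged_candidates.items()
                     if auto_methods.intersection(ids)}
    new_features = [f for f in features if f not in auto_excluded]
    needs_review = sorted(set(tagged_candidates) - auto_excluded)
    return new_features, needs_review
```

The prediction model formula would then be retrained on `new_features`, which bounds the automatic change to the range the policy allows.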

(Modification 4)
To keep the output format uniform on screen 1600, the output unit 404 may also output a predictive impact for the zero-variance explanatory variables 1621. Because an explanatory variable with zero variance has no effect on the objective variable, the predictive impact output in this case is "100%".

(Modification 5)
In the embodiment described above, the analysis system 1 performs processing using the extraction method and threshold specified by an expert, that is, a human, through the interface shown in FIG. 10. However, at least one of the extraction method and the threshold may be predetermined. For example, all extraction methods and thresholds may be predetermined, in which case the settings tab 1001 need not exist.

-Second embodiment-
A second embodiment of the analysis system will be described with reference to FIGS. 18 and 19. In the following description, the same components as in the first embodiment are given the same reference numerals, and mainly the differences are described. Points not specifically described are the same as in the first embodiment. This embodiment differs from the first embodiment mainly in that the alert analysis device is installed at a different location and also serves as the integrated management device.

FIG. 18 is a diagram showing the configuration of the analysis system 1A according to the second embodiment. The analysis system 1A is an SOC 130A. The SOC 130A includes an alert management device 131, a log management device 132, an alert analysis device 1801, and a second network 134. The configurations and functions of the alert management device 131, the log management device 132, and the second network 134 are the same as in the first embodiment, so their description is omitted. The alert analysis device 1801 combines the functions of the integrated management device 133 and the alert analysis device 135 of the first embodiment.

FIG. 19 is a block diagram showing an example functional configuration of the alert analysis device 1801 according to the second embodiment. The alert analysis device 1801 includes an alert collection unit 301, a log collection unit 302, an input data creation unit 303, an output data display unit 304, a prediction unit 401, an explanatory variable evaluation unit 402, a predictive impact evaluation unit 403, and an output unit 404. In other words, FIG. 19 contains both the configuration shown in FIG. 3 and the configuration shown in FIG. 4 of the first embodiment. The operation of these units is as described in the first embodiment, so its description is omitted.

The second embodiment described above provides the same effects as the first embodiment.

Although the above disclosure has been described with reference to representative embodiments, those skilled in the art will understand that various changes and modifications can be made in form and detail without departing from the spirit or scope of the disclosed subject matter.

For example, the external threat information database 136 may be a failure information database, or an external information database that includes these; the risk level of an alert may be a broader importance level of the alert that encompasses the risk level; and the security operation center (SOC) may be an operations center that handles not only security alerts but a wider range of alerts, such as failures.

Note that the present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above have been described in detail to explain the present invention clearly, and the present invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. Furthermore, for part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

In each of the embodiments and modifications described above, the configuration of the functional blocks is merely an example. Several functions shown as separate functional blocks may be configured as one, and a configuration represented by one functional block may be divided into two or more functions. Part of the functions of one functional block may also be provided by another functional block.

The embodiments and modifications described above may be combined with one another. Although various embodiments and modifications have been described above, the present invention is not limited to them. Other aspects conceivable within the scope of the technical idea of the present invention are also included in the scope of the present invention.

1 ... Analysis system
100a ... Monitored system
133 ... Integrated management device
135 ... Alert analysis device
301 ... Alert collection unit
302 ... Log collection unit
303 ... Input data creation unit
304 ... Output data display unit
401 ... Prediction unit
402 ... Explanatory variable evaluation unit
403 ... Predictive impact evaluation unit
404 ... Output unit

Claims

1. An analysis system comprising:
an input data creation unit that creates a plurality of explanatory variables using information about an event and creates input data including the plurality of explanatory variables and the result of the event;
a prediction unit that uses the plurality of explanatory variables to create a prediction model formula that predicts the value of an objective variable, which is the result of the event;
an explanatory variable evaluation unit that calculates exclusion candidate variables, which are candidates for explanatory variables that do not need to be used in the prediction model formula; and
an output unit that outputs the exclusion candidate variables.

2. The analysis system according to claim 1, wherein
the explanatory variable evaluation unit calculates explanatory variables having a variance of zero.

3. The analysis system according to claim 1, wherein
the explanatory variable evaluation unit calculates explanatory variables whose degree of correlation with the objective variable is equal to or less than a predetermined threshold.

4. The analysis system according to claim 1, further comprising
a predictive impact evaluation unit that calculates predictive impact information, which is information on the accuracy of the prediction model formula when the exclusion candidate variables are not used,
wherein the output unit further outputs the predictive impact information.

5. The analysis system according to claim 1, wherein
the output unit further outputs the amount of calculation resources that a prediction model formula not using the exclusion candidate variables can save compared with a prediction model formula that uses the exclusion candidate variables.

6. The analysis system according to claim 1, wherein
the explanatory variable evaluation unit extracts, using one or more extraction methods, exclusion candidate variables that are candidates for explanatory variables that do not need to be used in the prediction model formula, and
the output unit further outputs identification information of the extraction method.

7. The analysis system according to claim 1, wherein
the explanatory variable evaluation unit creates groups based on the degree of correlation between the explanatory variables and, for each group, calculates predictive impact information, which is information on the accuracy of the prediction model formula when only one of the explanatory variables included in the group is used in the prediction model formula, and
the output unit further outputs the predictive impact information.

8. An analysis device comprising:
an input data creation unit that creates a plurality of explanatory variables using information about an event and creates input data including the plurality of explanatory variables and the result of the event;
a prediction unit that uses the plurality of explanatory variables to create a prediction model formula that predicts the value of an objective variable, which is the result of the event;
an explanatory variable evaluation unit that calculates exclusion candidate variables, which are candidates for explanatory variables that do not need to be used in the prediction model formula; and
an output unit that outputs the exclusion candidate variables.

9. An analysis method executed by an analysis device, the method comprising:
creating a plurality of explanatory variables using information about an event, and creating input data including the plurality of explanatory variables and the result of the event;
creating, using the plurality of explanatory variables, a prediction model formula that predicts the value of an objective variable, which is the result of the event;
calculating exclusion candidate variables, which are candidates for explanatory variables that do not need to be used in the prediction model formula; and
outputting the exclusion candidate variables.