JP2017004097A

JP2017004097A - Information analysis system and information analysis method

Info

Publication number: JP2017004097A
Application number: JP2015114777A
Authority: JP
Inventors: Naoki Hayashi (林 直樹); Hiroshi Nakakoji (仲小路 博史); Junya Kusumi (楠美 淳弥)
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2015-06-05
Filing date: 2015-06-05
Publication date: 2017-01-05
Anticipated expiration: 2035-06-05
Also published as: JP6523799B2; WO2016194752A1

Abstract

PROBLEM TO BE SOLVED: To provide a system and method that make analysis results usable even in important decision-making by presenting information strongly related to a search query together with the grounds for that relation.
SOLUTION: An information analysis system of the present invention comprises: a receiving unit that receives input of information to be analyzed and an analysis type indicating the type of that information; a relationship information generation unit that generates structured information defining nodes, each being one piece of information or another piece of information, and edges indicating the relationships between the nodes; an origin information search unit that extracts, from the structured information, information that includes the analysis type received by the receiving unit and outputs the extracted information as an origin node; a classification/reachability analysis unit that clusters the graph structure expressed by the extracted information and extracts a subgraph structure that includes the origin node; and a connectivity analysis unit that searches the subgraph structure for extraction target nodes, each being an end point for the origin node and corresponding to the analysis type, and outputs the extraction target node having the largest number of independent paths to the origin node as the node most related to the origin node.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information analysis system and an information analysis method for analyzing information.

In recent years, an organization's IT system has become a complex whole containing a large number and wide variety of devices and applications. As a result, when an abnormality occurs, investigating and understanding its nature is difficult and takes a long time. Here, an abnormality means, for example, a physical failure of the system during operation, an application stopping because of an implementation defect, or a malfunction caused by an external attack targeting the organization.

In particular, outsourcing of operations and security operations for organizational and commercial systems has advanced in recent years. Because the contractor must respond to abnormalities in huge and diverse systems, the need to respond faster and at lower cost is growing. Abnormalities can often be handled efficiently by using accumulated know-how such as similar past cases. In other words, efficiency can be improved with a technique that finds appropriate similar cases in a huge volume of logs, reports, and public information. However, logs and reports take various forms depending on the information source, and each abnormality comes to light through different information, so similar cases are difficult to find by simple search.
Here, simple search refers to search in which information is split and stored in advance under specific keys and analyzed by matching those keys.

One technique that addresses the problems described above is described in Patent Document 1. That document discloses a method that holds information as a graph structure, sets a specified initial value on the node that is the starting point of the search condition, propagates the value while decreasing it at a constant rate, and outputs as the search result the set of nodes whose final value is at or above a threshold. This efficiently retrieves information that an existing search would not easily reach.
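The propagation idea summarized above can be sketched as follows. This is only an illustration of the general technique as this passage describes it, not the actual method of Patent Document 1; the function name `propagate`, the adjacency-dict graph format, and the parameter values (`initial`, `decay`, `threshold`) are all assumptions made for the example.

```python
from collections import deque

def propagate(graph, origin, initial=1.0, decay=0.5, threshold=0.2):
    """Spread a score outward from `origin`, multiplying it by `decay`
    at every hop, and return the nodes whose score stays >= threshold.
    `graph` maps a node to the list of nodes it points to (assumed format)."""
    score = {origin: initial}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        passed = score[node] * decay      # value handed to each neighbour
        if passed < threshold:
            continue                      # too weak to qualify any neighbour
        for nxt in graph.get(node, ()):
            if passed > score.get(nxt, 0.0):
                score[nxt] = passed       # keep the strongest value seen
                queue.append(nxt)
    return {n for n, s in score.items() if s >= threshold}

# Hypothetical chain a -> b -> c -> d
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
result = propagate(chain, "a")
```

With these illustrative values, the score reaches 0.25 at the third node and would fall to 0.125 at the fourth, so the fourth node is left out of the result set.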

Patent Document 1: JP 2010-191902 A

According to Patent Document 1, the relationships between pieces of information can be held as a graph structure and clustered. Information that has some relationship to the search target can therefore be extracted without the user having to be aware of that relationship before searching, which has the advantage that accumulated information can be used more effectively than with existing search methods.

However, consider applying this to case analysis in operations or security operations, for example when investigating similar past reports about a communication suspected of being malicious. If the search result is wrong, there is a risk of mistakenly blocking legitimate communication.

When making such an important decision, one that can adversely affect system operation, the person deciding on a countermeasure must be able to make a responsible decision based on the search result. The information presented as the search result is itself important for this, but the grounds for adopting that result as reference information, namely why the result is related to the search target, are also indispensable.

Here, being able to make a responsible decision means, specifically, that the person deciding on the countermeasure can explain why he or she made that decision to a third party who has no knowledge of the search system's search logic.

Similarly, if the person deciding on the countermeasure cannot understand why a search result is related to the search target, then when several different pieces of information are output as related cases or countermeasures, that person cannot even judge which of them to use.

The prior art does not specifically address these problems. It is therefore difficult, with a prior-art system, to use information that the operator retrieved without anticipating it as material for a decision that carries responsibility.

The present invention has been made in view of these problems, and its object is to make search results usable even in important decision-making by presenting to the user information strongly related to a search query together with the grounds for that relation.

To solve the above problems, for example, the configurations described in the claims are adopted.

The present application includes a plurality of means for solving the above problems. One example is an information analysis system comprising: a receiving unit that receives input of information to be analyzed and an analysis type indicating the type of that information; a relationship information generation unit that, based on relationship analysis information indicating the relationship between one piece of information and another among a plurality of pieces of information contained in an information source, generates structured information defining nodes of a graph structure, which are the one piece of information and the other piece of information, and edges indicating the relationships between the nodes; an origin information search unit that extracts, from the structured information, information that includes the analysis type received by the receiving unit and outputs the extracted information as an origin node; a classification/reachability analysis unit that clusters the graph structure expressed by the extracted information and extracts a subgraph structure that includes the origin node; and a connectivity analysis unit that searches the subgraph structure for extraction target nodes, each being an end point for the origin node and corresponding to the analysis type, calculates the number of independent paths between the origin node and each extraction target node, and outputs the extraction target node with the largest number of independent paths as the node most related to the origin node.

The present invention can also be understood as an information analysis method performed by the above information analysis system.

According to the present invention, when information is analyzed, information strongly related to the analysis query can be presented to the user together with the grounds for that relation. The user can then decide, with an understanding of those grounds, whether or not to use the analysis result, so the result can be used even when making important decisions that involve risk.

An example of a configuration diagram of a system to which the present invention is applied.
A diagram showing an example of the configuration of a computer.
An example of structured information.
An example of relationship analysis logic.
An example of branch weight information.
An example of extraction target designation information.
An example of output generation logic.
An example of the process that forms structured information.
An example of the process that analyzes structured information.
An example showing the structure of structured information.
A processing example of classification and reachability analysis.
An example showing a reachability analysis result.
An example showing connectivity analysis.
A display example of a result.

Embodiments are described below with reference to the drawings.

FIG. 1 is an example of a configuration diagram of an information analysis system to which the technology of the present application is applied.

The information analysis system 1000 is a system to which the information analysis system and information analysis method according to the present invention are applied. It acquires various kinds of information from the information source 1100, analyzes the relationships, and stores the information in the structured information 1051. It also analyzes the stored information in response to an analysis request from the analysis requester 1200 and returns the result. Each process is described in detail later.

The information acquisition unit 1001 and the relationship information generation unit 1002 are both functions used in the structured information formation process shown in FIG. 8, described later.

The information acquisition unit 1001 acquires various kinds of information from the information source 1100 and inputs that information to the relationship information generation unit 1002.

When information is acquired from the information source 1100 automatically, the unit is provided with an API (Application Programming Interface) for each information source 1100.

In that case, the format of the acquired information must be one of the types defined in the input information type 4001 of the relationship analysis logic 1052, described later.

When the information source 1100 is manual input by an operator of this system, the information acquisition unit 1001 must provide a UI (User Interface) for manual input. This UI may be a mechanism for entering any of the types defined in the input information type 4001 of the relationship analysis logic 1052, or a mechanism for directly editing the structured information 1051, described later.

Accepting manual input as an information source 1100 makes it possible to explicitly edit the relationship between one specific piece of already-entered information and another, for example when "it later turned out that alert A, raised earlier, was in fact related to malware B". As a result, the analysis process described later can combine mechanically analyzed information with information that only a person can judge.

The information source 1100 is not limited to any single one of the sources described above. Multiple kinds of devices, websites, and so on may be used as information sources 1100, and a corresponding information acquisition unit 1001 may be provided for each.

The relationship information generation unit 1002 uses the relationship analysis logic 1052 to extract, from the information received from the information acquisition unit 1001, both the pieces of information and the relationships between them, and stores the extraction results in the structured information 1051. The processing is described in detail later.

The six functions, namely the analysis reception/response interface 1003, the origin information search unit 1004, the branch importance determination unit 1005, the classification/reachability analysis unit 1006, the connectivity analysis unit 1007, and the output generation unit 1008, are all used in the structured information analysis process shown in FIG. 9, described later.

The analysis reception/response interface 1003 is an interface that receives the information to be analyzed and the analysis type from the analysis requester 1200, described later, and returns the result of the analysis process, also described later, to that analysis requester 1200.

The origin information search unit 1004 searches the structured information 1051 for information matching the analysis target information received from the analysis requester 1200 via the analysis reception/response interface 1003, and returns it as origin information, that is, the origin node. The processing is described in detail later.

The branch importance determination unit 1005 performs preprocessing for the classification/reachability analysis unit 1006: it assigns a weight to each branch by checking the analysis type information entered by the analysis requester 1200 against the branch weight information 1054.

This processing is not strictly required for the classification/reachability analysis unit 1006 to operate. If it is omitted, every branch can simply be treated as having weight 1 regardless of the analysis type.

Performing it, however, makes it possible, for each analysis type, to emphasize a specific type of information or relationship, or conversely to discount the presence of a specific type of relationship, which improves the accuracy of the analysis.

The classification/reachability analysis unit 1006 first clusters the information structure, taking the branch weights into account, and extracts the cluster that contains the origin information produced by the origin information search unit 1004 together with the extraction target information, that is, the extraction target nodes reachable from the origin information. From the extracted clusters it then keeps only those that contain one or more of the analysis types included in the extraction target designation information 1053. The processing is described in detail later; its result is a list of strongly related information suited to the analysis type.
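A rough sketch of this two-stage selection is given below. The clustering algorithm is not specified in this passage, so the sketch substitutes a deliberately simple stand-in: edges whose weight falls below a cutoff are dropped, and the connected component containing the origin is taken as its cluster. The function names, the tuple layout of `edges`, the `min_weight` cutoff, and the sample type names are all assumptions for illustration.

```python
from collections import deque

def reachable_cluster(edges, weights, origin, min_weight=1.0):
    """Return the set of nodes connected to `origin` through edges whose
    weight is at least `min_weight`. `edges` is a list of
    (from_id, to_id, edge_type); edge types missing from `weights`
    default to weight 1. Edges are treated as undirected here."""
    adj = {}
    for src, dst, etype in edges:
        if weights.get(etype, 1.0) >= min_weight:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def extraction_targets(cluster, node_types, target_types, origin):
    """Keep the cluster members (other than the origin) that carry at
    least one of the designated target types."""
    return sorted(n for n in cluster
                  if n != origin and node_types.get(n, set()) & target_types)

# Hypothetical data: ids follow the style of the examples in this document.
edges = [("0006", "0001", "create"), ("0006", "0002", "connect"),
         ("0003", "0004", "weak")]
node_types = {"0001": {"file"}, "0002": {"ip"}, "0006": {"mw"}}
cluster = reachable_cluster(edges, {"weak": 0.1}, "0001")
targets = extraction_targets(cluster, node_types, {"mw"}, "0001")
```

In this toy data the low-weight edge is ignored, the cluster around node 0001 contains 0001, 0002, and 0006, and only 0006 carries the designated type.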

The connectivity analysis unit 1007 is the particularly characteristic function of this embodiment. Taking the result of the classification/reachability analysis unit 1006 as input, it calculates the number of independent paths from the origin information to each piece of extraction target information. Details are described later.
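One way to count independent paths is sketched below. This passage does not define "independent", so the sketch assumes it means paths that share no edge, on an undirected graph, and counts them with a unit-capacity maximum flow (breadth-first augmenting paths). The function names and the edge-list format are illustrative assumptions, not the embodiment's actual procedure.

```python
from collections import deque

def count_independent_paths(edges, source, target):
    """Count edge-disjoint paths between source and target in an
    undirected graph given as a list of (u, v) pairs."""
    if source == target:
        return 0
    cap = {}                                  # residual capacities
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            cap.setdefault(a, {})
            cap[a][b] = cap[a].get(b, 0) + 1
    paths = 0
    while True:
        parent = {source: None}               # BFS tree in the residual graph
        queue = deque([source])
        while queue and target not in parent:
            node = queue.popleft()
            for nxt, c in cap.get(node, {}).items():
                if c > 0 and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        if target not in parent:
            return paths                      # no augmenting path remains
        node = target
        while parent[node] is not None:       # push one unit along the path
            prev = parent[node]
            cap[prev][node] -= 1
            cap[node][prev] += 1
            node = prev
        paths += 1

def most_related(edges, origin, targets):
    """Return the target with the most independent paths, plus all counts."""
    counts = {t: count_independent_paths(edges, origin, t) for t in targets}
    return max(counts, key=counts.get), counts

# Hypothetical graph: X is reachable from A by two separate routes, Y by one.
sample = [("A", "B"), ("B", "X"), ("A", "C"), ("C", "X"), ("X", "Y")]
best, counts = most_related(sample, "A", ["X", "Y"])
```

The individual paths found during augmentation could also be kept and shown to the requester as the grounds for the ranking; this sketch returns only the counts.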

The output generation unit 1008 uses the result of the connectivity analysis unit 1007 and the output generation logic 1055 to generate the output that is finally returned to the analysis requester 1200 via the analysis reception/response interface 1003. Details are described later.

The structured information 1051 is the information that the information analysis system 1000 analyzes. The details of the information it holds are described later with reference to FIG. 3.

The relationship analysis logic 1052 is information used by the relationship information generation unit 1002. The details of the information it holds are described later with reference to FIG. 4.

The extraction target designation information 1053 is information used by the classification/reachability analysis unit 1006. The details of the information it holds are described later with reference to FIG. 6.

The branch weight information 1054 is information used by the branch importance determination unit 1005. The details of the information it holds are described later with reference to FIG. 5.

The output generation logic 1055 is information used by the output generation unit 1008. The details of the information it holds are described later with reference to FIG. 7.

The information source 1100 is the source from which this system obtains the information it handles. For example, the information source 1100 may be a network device, a server device, or a security device, which passes logs and alerts to the information acquisition unit 1001 as information. The information source 1100 may also be information entered by an operator of this system; information entered manually through the input interface is handed to the information acquisition unit 1001. The information source 1100 is not limited to any single one of these and may be a combination of several.

The analysis requester 1200 is a user of the information analysis system 1000. The requester enters the information to be analyzed and the desired analysis type into the system via the analysis reception/response interface 1003 and receives the result through the same interface.

FIG. 2 is a diagram illustrating the configuration of each component in FIG. 1.

Each of these devices 2000 comprises a CPU 2001, a memory 2002, a communication device 2004 for communicating with other devices via the Internet or a LAN, an input device 2005 such as a keyboard or mouse, an output device 2006 such as a monitor or printer, a reading device 2007, and an external storage device 2003 such as a hard disk, all connected via an interface 2008. A portable storage medium 2009 such as an IC card or USB memory can be connected to the reading device 2007.

The devices that implement the information analysis system 1000 in this embodiment are realized by loading programs that implement these functions into the memory 2002 and executing them on the CPU 2001. These programs may be stored in advance in the external storage device 2003 of the device 2000, or may be installed into the external storage device 2003 from another device when needed, via the reading device 2007 or the communication device 2004 and a medium usable by the device 2000.

A medium usable by the device 2000 is, for example, the storage medium 2009 attachable to and detachable from the reading device 2007, the network 2010 connectable to the communication device 2004, or a carrier wave or digital signal propagating through the network 2010. A program may be stored once in the external storage device 2003 and then loaded from there into the memory 2002 and executed by the CPU 2001, or it may be loaded directly into the memory 2002 and executed by the CPU 2001 without being stored in the external storage device 2003.

FIG. 3 is a diagram showing an example of the structured information 1051 held by the information analysis system 1000.

As the structured information 1051, the system needs to hold two broad kinds of information: accumulated information 3000 and relationship information 3100.

The accumulated information 3000 is the information that the system stores. In this embodiment, as illustrated, each entry is held as a set of three fields: id 3001, type 3002, and content 3003.

The id 3001 is an ID (Identifier) that uniquely identifies each piece of information. It is not essential, but holding it allows the processing to be described concisely.

The type 3002 identifies the category to which the piece of information held by a node belongs. In this embodiment, using it in combination with the extraction target designation information 1053, the branch weight information 1054, and the output generation logic 1055 improves analysis accuracy and allows supplementary information to be added to the output. Details are described later.

A piece of information is not limited to a single type 3002; it may hold several types at once. In that case, the condition "when the types match" in the processing described later can simply be read as "when at least one of the types matches".
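The relaxed matching rule can be written as a one-line set test. This is a minimal sketch; the function name `type_matches` and the sample type names are assumptions, and both arguments are expected to be collections of type names rather than bare strings.

```python
def type_matches(node_types, wanted_types):
    """True when the node carries at least one of the wanted types.
    Both arguments are collections of type names (for example sets)."""
    return not set(node_types).isdisjoint(wanted_types)

# A node tagged with two hypothetical types matches a query for either.
both = type_matches({"file", "ioc"}, {"ioc"})
none = type_matches({"file"}, {"ip"})
```

A node with a single type is simply the one-element case, so the same test covers both situations.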

The types that can be specified as the type 3002 are not limited to those illustrated; any type may be specified when information is entered. Even a type that has never appeared before may be added without any other processing. However, to maintain the accuracy of the analysis in this embodiment, information of the same kind must be given the same type name. For example, entries whose content represents employee names should all include at least one common type such as "employee".

The content 3003 holds the substance of the information. In this embodiment the content is expressed as arbitrary key-value pairs, but it is not limited to a key-value format.

As an example of the accumulated information 3000, the first row shows that the information "a file named hoge.exe" is registered with ID 0001 and type file. The other rows are read in the same way.

The relationship information 3100 holds how the individual pieces of information in the accumulated information 3000 are related. In this embodiment it holds, for example, id 3101, from-id 3102, to-id 3103, and type 3104.

The id 3101 is an ID that uniquely identifies each relationship, that is, each edge representing a path between nodes. It is not essential, but holding it allows the processing to be described concisely.

The from-id 3102, to-id 3103, and type 3104 are the information carried by an edge, expressing which information in the accumulated information 3000 is related to which other information and in what way. Specifically, the from-id 3102 is the ID of the node at which the edge starts, and the to-id 3103 is the ID of the node at which it ends. The type 3104 identifies the category of the relationship that the edge represents.

As an example of the relationship information 3100, the id "r0001" in the first row means that the mw (malware) with hash value "abcde1234", which is the information represented by id "0006", creates the file named "hoge.exe" represented by id "0001". The other rows are read in the same way.

In this embodiment the relationships are directed, through the from-id 3102 and to-id 3103, but direction may be ignored and undirected relationships held instead.
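The two tables of structured information can be modeled roughly as below. This is a sketch only: the class and field names, the content keys (`name`, `hash`), and the edge type label `create` are assumptions chosen for illustration, while the ids and values follow the example rows described in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One row of accumulated information 3000."""
    id: str
    types: set                       # a node may carry several types
    content: dict = field(default_factory=dict)   # arbitrary key-value pairs

@dataclass
class Edge:
    """One row of relationship information 3100."""
    id: str
    from_id: str
    to_id: str
    type: str

def neighbors(edges, node_id, directed=True):
    """Ids adjacent to node_id; with directed=False, edge direction is ignored."""
    out = []
    for e in edges:
        if e.from_id == node_id:
            out.append(e.to_id)
        elif not directed and e.to_id == node_id:
            out.append(e.from_id)
    return out

nodes = [
    Node("0001", {"file"}, {"name": "hoge.exe"}),
    Node("0006", {"mw"}, {"hash": "abcde1234"}),
]
edges = [Edge("r0001", "0006", "0001", "create")]
```

Calling `neighbors(edges, "0001")` follows direction and finds nothing, while `neighbors(edges, "0001", directed=False)` returns the malware node, which corresponds to the undirected variant mentioned above.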

FIG. 4 is a diagram showing an example of the relationship analysis logic 1052 held by the information analysis system 1000.

The relationship analysis logic 1052 holds, in advance, information defining the logic for mechanically analyzing information acquired from the information source 1100 and storing it in the structured information 1051.

Specifically, in this embodiment this is realized by holding an input information type 4001, which defines the type of information acquired from the information source 1100, and the analysis logic 4002 for that type.

For example, the first row means that a web page acquired from the website "hoge.security.com" can be turned into accumulated information 3000 and relationship information 3100 by applying the XPath given in the same row.

FIG. 5 shows an example of the branch weight information 1054 held by the information analysis system 1000.

The branch weight information 1054 defines the importance of relationships: depending on the analysis type specified by the analysis requester 1200, it states which types of information and which types of relationship are emphasized or de-emphasized.

Specifically, in this embodiment it holds an analysis type 5001, a category 5002, a node-or-edge 5003, and a weight 5004.

The analysis type 5001 is one of the analysis types that the analysis requester 1200 can request from the system.

The category 5002 is either a value of type 3002 of the accumulated information 3000 (that is, a node type), a value of type 3104 of the relationship information (that is, an edge type), or "(other)", which stands for any relationship.

The node-or-edge 5003 holds "node" when the category 5002 is a value of type 3002 and "edge" when it is a value of type 3104. When the category 5002 is "(other)", it holds "-", because both edges and nodes may be covered.

The weight 5004 is an index of how much emphasis is placed on a relationship: 1 means the greatest emphasis, and 0 means that the relationship is treated as nonexistent in that analysis.

For example, when analyzing a "cause", relationships of type "communication" are given weight 1, whereas all relationships leaving a node of type "analyst" are given weight 0.1 and are therefore hardly taken into account.
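The weight lookup described for FIG. 5 might look like the sketch below. Several points are assumptions rather than anything the text states: the precedence among matching rows (node rule first, then edge rule, then "(other)"), the default of 1.0 when no row matches, and the 0.5 value on the "(other)" row. The function and variable names are hypothetical.

```python
# Each row: (analysis type 5001, category 5002, node-or-edge 5003, weight 5004).
BRANCH_WEIGHTS = [
    ("cause", "communication", "edge", 1.0),
    ("cause", "analyst", "node", 0.1),
    ("cause", "(other)", "-", 0.5),  # placeholder value, not from the source
]

def branch_weight(analysis_type, edge_type, from_node_type, default=1.0):
    """Return the weight of one edge for the given analysis type."""
    rows = [r for r in BRANCH_WEIGHTS if r[0] == analysis_type]
    # Assumed precedence 1: a rule on the type of the node the edge leaves.
    for _, category, kind, weight in rows:
        if kind == "node" and category == from_node_type:
            return weight
    # Assumed precedence 2: a rule on the edge type itself.
    for _, category, kind, weight in rows:
        if kind == "edge" and category == edge_type:
            return weight
    # Assumed precedence 3: the catch-all "(other)" row.
    for _, category, _, weight in rows:
        if category == "(other)":
            return weight
    return default
```

A different precedence, such as taking the minimum of all matching rows, would be equally consistent with the description.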

FIG. 6 shows an example of the extraction target designation information 1053 held by the information analysis system 1000.

The extraction target designation information 1053 defines in advance the categories of information to be output as the result of an analysis.

Specifically, in this embodiment it holds an analysis type 6001 and an extraction target category 6002.

In this example, when the "cause" type is analyzed, information in the "malware" or "vulnerability" category is output as the final result.

FIG. 7 shows an example of the output generation logic 1055 held by the information analysis system 1000.

The output generation logic 1055 defines what information is displayed when the analysis result is finally presented to the analysis requester.

In this embodiment it holds an information type 7001, additional information content 7002 to be displayed together with information of that type, and documenting logic 7003 for expressing, as sentences, the grounds on which information of that type was extracted as an analysis result.

In this example, for information of the "countermeasure" type, a download link for the countermeasure information is displayed together with the result, and the standard documenting logic is used.

In this embodiment, the standard documenting logic concatenates subject, object, and predicate in that order (the Japanese word order): "<information of from-id 3102> <information of to-id 3103> <type 3104>", meaning that the former performs the action named by type 3104 on the latter.

An exceptional documenting method may also be specified, as in the "employee" type shown in the example. There, when the relationship type is "communication history" or "holding", the phrase "'s terminal" is appended after <information of from-id 3102>.
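The standard documenting logic and the "employee" exception can be sketched as below. The sentence is rendered in English word order (subject, verb, object) rather than the Japanese order given in the text, and the verb phrases in `VERBS`, the function name, and the sample values are all hypothetical.

```python
# Hypothetical English verb phrases for relationship types (type 3104).
VERBS = {
    "generate": "generates",
    "communication": "communicates with",
    "communication history": "has a communication history with",
    "holding": "holds",
}

def verbalize(from_info, to_info, rel_type, from_type=None):
    """Turn one edge into a sentence describing the relationship."""
    subject = from_info
    # Exception for the "employee" type: the subject is the employee's terminal.
    if from_type == "employee" and rel_type in ("communication history", "holding"):
        subject += "'s terminal"
    verb = VERBS.get(rel_type, rel_type)
    return f"{subject} {verb} {to_info}"
```

For instance, `verbalize("abcde1234", "hoge.exe", "generate", "mw")` yields "abcde1234 generates hoge.exe".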

Hereinafter, the basic processing flow of this analysis system is illustrated with reference to FIGS. 8 and 9, and how the structured information is processed is explained with reference to FIGS. 10 to 13.

FIG. 8 is an example of a flow for forming the structured information shown in FIG. 3.

First, in process 8001, the information acquisition unit 1001 acquires information from the information source 1100 and records the type of information it has acquired.

The information type is one of those defined by the input information type 4001 of the relationship analysis logic 1052.

When the information source 1100 is manual input, the structured information 1051 may be edited directly, as explained in the description of FIG. 1, without proceeding to the subsequent processes.

Next, in process 8002, the relationship information generation unit 1002 analyzes the information acquired in process 8001 according to the analysis logic 4002 of the relationship analysis logic 1052.

Finally, in process 8003, the relationship information generation unit 1002 stores the analysis result of process 8002 as accumulated information 3000 and relationship information 3100.

FIG. 10 shows an example of the structured information 1051 formed in this way, represented as a graph structure.

For example, 10001 and 10003 represent the information with id "0001" and the information with id "0006" in the example of the accumulated information 3000, and 10002, the edge between these two nodes, represents the relationship with id "r0001" in the first row of the example of the relationship information 3100. The same applies to the other nodes and edges. A gray region enclosed by a broken line, such as 10004, indicates that further graph structure of the same kind is connected there.

FIG. 9 shows an example of a flow that uses the structured information 1051 formed by the processing of FIG. 8 to analyze information according to the analysis conditions and analysis type received from the analysis requester 1200.

First, in process 9001, the information analysis system 1000 receives the information to be analyzed and the analysis type from the analysis requester 1200 via the analysis reception/response interface 1003.

In this embodiment the information to be analyzed is a character string. It may be, for example, a host name or IP address given in an alert raised by a security device, or the hash value of a suspicious file.

The analysis type is specified by the analysis requester 1200 to state what they want to know, and it must match one of the analysis types 5001 in the branch weight information 1054.

Next, in process 9002, the origin information search unit 1004 searches for accumulated information 3000 whose content matches the character string received in process 9001 and returns the result as origin information.

In this embodiment, "match" means that the character strings match exactly or partially. If there are multiple matches, all of them are returned. The case where the analysis target matches one of the communication destinations is shown as the double-lined node 11002 in FIG. 11.
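The matching rule of process 9002 can be sketched as a substring search. Treating "partial match" as "the query is a substring of the stored content" is an assumption, as are the function name and the content assigned to id "0012" in the sample data.

```python
def find_origins(query, contents):
    """Return ids of all nodes whose content equals or contains the query.

    contents maps node id -> content string of accumulated information 3000.
    """
    return [node_id for node_id, text in contents.items() if query in text]

# Sample data; the content of "0012" is a hypothetical value.
contents = {
    "0001": "hoge.exe",
    "0006": "abcde1234",
    "0012": "example.com",
}
```

All matches are returned, so a short query can yield several origin nodes, each of which is analyzed separately in the later steps.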

Next, in process 9003, the branch importance determination unit 1005 checks the analysis type acquired in process 9001 against the branch weight information 1054 and sets an importance for each branch.

In this embodiment, for the case where the analysis type is "cause", relationships with an importance of 0.1 or less are treated as almost nonexistent and are drawn as broken-line arrows, such as 11001 in FIG. 11.

Next, in process 9004, the classification/reachability analysis unit 1006 clusters the graph, taking into account the branch weights set in process 9003. Many methods for clustering graph data are known. One that is particularly suitable for this embodiment is a method called community classification.

Specifically, community classification expresses the graph structure in a matrix form called the graph Laplacian (also called the Laplacian matrix) and computes its zero eigenvalues (or eigenvalues close to zero) and the eigenvectors corresponding to them. Finally, using the inverse of the mapping from the graph structure to the graph Laplacian, the eigenvectors are pulled back to the node set of the graph, which divides the graph into several subgraphs. The subgraphs obtained in this way are known to be a good division of the whole graph, in that parts where the branch weights are large and such branches are densely connected are preferentially kept together.
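For reference, one common formulation of the graph Laplacian and its use for partitioning is written out below. This is the standard unnormalized form with symmetric weights. The text does not say which variant (normalized or unnormalized) is intended, so the notation here is an assumption.

```latex
% W: symmetric weight matrix, w_{ij} \ge 0 is the weight of the branch between nodes i and j
% D: diagonal degree matrix
D_{ii} = \sum_{j} w_{ij}, \qquad L = D - W

% Quadratic form: it is small when strongly linked nodes receive similar entries
x^{\top} L x = \frac{1}{2} \sum_{i,j} w_{ij}\,(x_i - x_j)^2 \;\ge\; 0

% Eigen-decomposition, eigenvalues in ascending order
L v_k = \lambda_k v_k, \qquad 0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n
```

The multiplicity of the eigenvalue 0 equals the number of connected components, and the corresponding eigenvectors can be chosen to be constant on each component. Eigenvalues close to zero indicate groups joined only by light or sparse branches. For a connected graph, a two-way split is obtained from the signs of the entries of $v_2$.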

In this process, the classification/reachability analysis unit 1006 extracts, from the subgraphs produced by the classification, the subgraph that contains the node serving as origin information, and returns it as the processing result. The double lines such as 11003 in FIG. 11 show the dividing lines of the whole graph that result from community classification in this embodiment. The result of this process is therefore the subgraph that contains the origin information 11002, that is, the graph shown in FIG. 12. When there are multiple pieces of origin information, a subgraph is returned for each of them.

Next, in process 9005, the connectivity analysis unit 1007 refers to the extraction target designation information 1053 for the subgraph resulting from process 9004 (for each of them if there are several), finds the row whose analysis type 6001 matches the one acquired in process 9001, and searches for all accumulated information 3000 (that is, nodes) whose type matches the extraction target category 6002 of that row. This identifies the extraction target information that serves as end points relative to the origin information. Only extraction target information belonging to the category that corresponds to the analysis type input in process 9001 is retrieved, and it is ultimately output by the output generation unit 1008. In this embodiment, the two gray-filled mw-type nodes, such as 12001 in FIG. 12, are an example of the result.
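The category filter in this step can be sketched as a simple lookup. The table contents follow the "cause" example of FIG. 6, with "mw" used as the node type label for malware. The function name and the sample subgraph are hypothetical.

```python
# analysis type 6001 -> extraction target categories 6002
EXTRACTION_TARGETS = {"cause": {"mw", "vulnerability"}}

def extraction_targets(analysis_type, node_types):
    """Return ids of nodes in the subgraph whose type is an extraction target.

    node_types maps node id -> node type (type 3002) for one subgraph.
    """
    wanted = EXTRACTION_TARGETS.get(analysis_type, set())
    return sorted(n for n, t in node_types.items() if t in wanted)

# Hypothetical subgraph containing an origin node and two mw nodes.
subgraph = {"0012": "destination", "0006": "mw", "0007": "mw", "0001": "file"}
```

An analysis type with no row in the table yields an empty result here. How the actual system handles that case is not stated in the text.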

The processing so far takes the analysis target information as input and yields a list of information that can be regarded as strongly related. A feature of this embodiment is that it goes on to analyze the grounds for that information, which is done by the following processing.

Further, in process 9005, the connectivity analysis unit 1007 calculates, ignoring edge direction, how many independent paths exist from the origin information (the double-framed node in FIG. 12) to each piece of extraction target information (the gray-filled nodes in FIG. 12).

Here, two paths from one node to another are independent if they share no edge along the way.

For example, in the graph shown in FIG. 13 (13-a), every path from node a to node d passes through edge b-c, so there is only one independent path.

In the graph shown in FIG. 13 (13-b), on the other hand, no edge is shared, and there are three independent paths from node a to node g: a-b-d-g, a-d-e-g, and a-f-g.
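Counting independent (edge-disjoint) paths is a maximum-flow problem with unit capacities, so one way to compute it is the sketch below, which uses breadth-first augmenting paths. The text does not name an algorithm, so this choice is an assumption. The edge list for 13-b is read off the three paths listed above, while the edge list for 13-a is a hypothetical graph in which every route passes through b-c.

```python
from collections import defaultdict, deque

def count_independent_paths(edges, source, target):
    """Count edge-disjoint paths between source and target, ignoring direction."""
    if source == target:
        return 0
    cap = defaultdict(int)   # residual capacity per ordered node pair
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1     # an undirected edge can be used once, in either direction
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return count
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        count += 1

# Hypothetical stand-in for FIG. 13 (13-a): all routes share edge b-c.
graph_13a = [("a", "b"), ("a", "e"), ("e", "b"), ("b", "c"), ("c", "d")]
# Edges implied by the three paths listed for FIG. 13 (13-b).
graph_13b = [("a", "b"), ("b", "d"), ("d", "g"), ("a", "d"),
             ("d", "e"), ("e", "g"), ("a", "f"), ("f", "g")]
```

With these inputs the function returns 1 for a to d in `graph_13a` and 3 for a to g in `graph_13b`, matching the counts given in the text.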

In this embodiment, the number of independent paths expresses how many mutually independent pieces of surrounding information point to the target information as related. In other words, it can be regarded as the number of pieces of circumstantial evidence.

Conversely, as in FIG. 13 (13-a), even when there are many paths, if there are few independent paths the relationship ultimately depends on only a few pieces of information, and if those are wrong the relationship is lost all at once. Therefore, the fewer common paths or the more independent paths there are, the better supported the result is and the more angles from which the information can be viewed, which makes for stronger and more useful circumstantial evidence.

In summary, process 9005 calculates the number of independent paths from the origin information to each piece of extraction target information and returns target nodes with more independent paths as better-supported information, with each independent path returned as circumstantial evidence.

In this embodiment there is mw-type information with one independent path, as in FIG. 13 (13-c), and mw-type information with two independent paths, as in FIG. 13 (13-d), so the latter is returned as the better-supported information. As a further refinement of process 9005, independent paths whose constituent edges are also of different types may be judged to be even stronger circumstantial evidence.
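The ranking described here, including the optional refinement based on edge types, might be sketched as follows. Using the number of distinct edge types only as a tie-breaker is an assumption, since the text says only that such paths may be judged stronger. The function name, the id "MW-other", and the edge types in the sample data are hypothetical.

```python
def rank_targets(paths_by_target):
    """Order extraction targets from best to least supported.

    paths_by_target maps a target id to its independent paths, each path
    given as the list of edge types (type 3104) along it.
    """
    def score(item):
        _, paths = item
        distinct_types = len({t for path in paths for t in path})
        # Primary key: number of independent paths.
        # Assumed tie-breaker: variety of edge types across those paths.
        return (len(paths), distinct_types)

    ranked = sorted(paths_by_target.items(), key=score, reverse=True)
    return [target for target, _ in ranked]

# Hypothetical data: one target with two independent paths, one with a single path.
example = {
    "MW-other": [["communication"]],
    "MW0007": [["communication"], ["generate", "holding"]],
}
```

Here `rank_targets(example)` places "MW0007" first, because it is supported by two independent paths.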

Finally, in process 9006, the output generation unit 1008 refers to the output generation logic 1055 and, according to the information type 7001 being returned as the response, generates the additional information specified by the additional information content 7002. Using the documenting logic 7003, which turns the grounds for extraction into sentences based on the graph structure, it can also express the grounds as text and notify the analysis requester 1200.

A display example of the output of this embodiment is shown in FIG. 14. 14001 is the analysis target input by the analysis requester 1200. In FIG. 14, the character string "example.com" written in the file "hoge.exe" is the analysis target. 14002 is the analysis type input by the analysis requester 1200. In FIG. 14, "cause" was input. 14003 is the total number of pieces of extraction target information returned by process 9005. In FIG. 14, two results (13-c and 13-d in FIG. 13) were obtained.

14004 is a function for specifying the order in which results are displayed. In this embodiment, results with more independent paths are more important, so it is desirable to display that extraction target information first, but the order can also be changed, for example to the chronological order in which the information was registered. In FIG. 14 the results are displayed in descending order of the amount of circumstantial evidence. 14005 indicates that the information displayed below it concerns MW0007. 14006 depicts the independent paths between communication destination 0012, the origin information, and MW0007, the extraction target information. From this the analysis requester 1200 can understand on the basis of what relationships the information was extracted. Additional behavior is also possible, such as dynamically generating and displaying additional information appropriate to the information type when an item in 14006 is selected.

14007 shows the grounds on which the information was extracted: why the origin information was selected and why this result is the best supported among the extraction target information. As shown in 14008 and 14009, the specific grounds can be explained directly by expressing the graph structure as sentences. For example, situation 1 gives as grounds that MW0007 communicates with communication destination 0012. 14010 is information presented in addition to the extraction target information. For the MW type, for instance, commonly needed information such as whether it can be detected by antivirus software can be presented as well. 14011 is a paginator for switching the displayed analysis result.

The configuration illustrated above makes it possible to obtain the effects of the present application described earlier.

1000 Information analysis system
1001 Information acquisition unit
1002 Relationship information generation unit
1003 Analysis reception/response interface
1004 Origin information search unit
1005 Branch importance determination unit
1006 Classification/reachability analysis unit
1007 Connectivity analysis unit
1008 Output generation unit
1051 Structured information
1052 Relationship analysis logic
1053 Extraction target designation information
1054 Branch weight information
1055 Output generation logic
1100 Information source
1200 Analysis requester
2000 Hardware components of each constituent element

Claims

An information analysis system comprising:
a reception unit that receives input of information to be analyzed and an analysis type indicating the type of the information;
a relationship information generation unit that generates, based on relationship analysis information indicating the relationship between one piece of information and another piece of information included in an information source, structured information that defines, in a graph structure, nodes that are the one piece of information and the other piece of information and edges indicating the relationships between the nodes;
an origin information search unit that extracts, from the structured information, information including the analysis type received by the reception unit as an origin node, and outputs the extracted information;
a classification/reachability analysis unit that clusters the graph structure represented by the extracted information and extracts a subgraph structure including the origin node; and
a connectivity analysis unit that searches the subgraph structure for extraction target nodes that are end points for the origin node corresponding to the analysis type, calculates the number of independent paths between the origin node and each extraction target node, and outputs the extraction target node having the largest number of independent paths as the node most strongly related to the origin node.

The information analysis system according to claim 1, wherein the classification/reachability analysis unit clusters the graph structure based on weight information indicating the importance of the relationship between a predetermined node type or edge type and the analysis type, and extracts the subgraph structure.

The information analysis system according to claim 1, wherein the connectivity analysis unit outputs the node as stronger circumstantial evidence the more independent paths there are, or the fewer common paths there are, between the origin node and the extraction target node.

The information analysis system according to claim 1, wherein the connectivity analysis unit searches for nodes belonging to the category corresponding to the analysis type, based on extraction target designation information that associates a predetermined analysis type with an output category of nodes, and outputs the retrieved nodes as the nodes having the relationship.

An information analysis method comprising:
a reception step of receiving input of information to be analyzed and an analysis type indicating the type of the information;
a relationship information generation step of generating, based on relationship analysis information indicating the relationship between one piece of information and another piece of information included in an information source, structured information that defines, in a graph structure, nodes that are the one piece of information and the other piece of information and edges indicating the relationships between the nodes;
an origin information search step of extracting, from the structured information, information including the analysis type received by the reception unit as an origin node, and outputting the extracted information;
a classification/reachability analysis step of clustering the graph structure represented by the extracted information and extracting a subgraph structure including the origin node; and
a connectivity analysis step of searching the subgraph structure for extraction target nodes that are end points for the origin node corresponding to the analysis type, calculating the number of independent paths between the origin node and each extraction target node, and outputting the extraction target node having the largest number of independent paths as the node most strongly related to the origin node.

The information analysis method according to claim 5, wherein, in the classification/reachability analysis step, the graph structure is clustered based on weight information indicating the importance of the relationship between a predetermined node type or edge type and the analysis type, and the subgraph structure is extracted.

The information analysis method according to claim 5, wherein, in the connectivity analysis step, the node is output as stronger circumstantial evidence the more independent paths there are, or the fewer common paths there are, between the origin node and the extraction target node.

The information analysis method according to claim 5, wherein, in the connectivity analysis step, nodes belonging to the category corresponding to the analysis type are searched for based on extraction target designation information that associates a predetermined analysis type with an output category of nodes, and the retrieved nodes are output as the nodes having the relationship.