JP6557774B2

JP6557774B2 - Graph-based intrusion detection using process trace

Info

Publication number: JP6557774B2
Application number: JP2018502363A
Authority: JP
Inventors: Zhengzhang Chen; LuAn Tang; Boxiang Dong; Guofei Jiang; Haifeng Chen
Original assignee: NEC Laboratories America Inc
Current assignee: NEC Laboratories America Inc
Priority date: 2015-07-24
Filing date: 2016-07-20
Publication date: 2019-08-07
Anticipated expiration: 2036-07-20
Also published as: WO2017019391A1; DE112016002806T5; JP2018526728A

Description

This application is a continuation-in-part of U.S. Patent Application No. 15/098,861, filed April 14, 2016, which claims priority to U.S. Patent Application No. 62/148,232, filed April 16, 2015. This application further claims priority to U.S. Patent Application No. 62/196,404, filed July 24, 2015, and U.S. Patent Application No. 62/360,572, filed July 11, 2016, the entire disclosures of which are incorporated herein.

The present invention relates to computer and information security and, more particularly, to host-level intrusion detection on large-scale process traces.

Enterprise networks are critically important systems in an enterprise and handle most of its mission-critical information. Because of this importance, these networks are often the targets of attack. To ensure information security in a computer network, an intrusion detection system needs to monitor the operational status of the entire network and identify scenarios associated with possible attacks or malicious behavior.

At the host level, a detection system collects rich information about process and program events on a particular host or machine (for example, when a program opens a file). Although this information enables intrusion detection systems to monitor intrusion behavior accurately, signature-based detection techniques fail to detect new threats, while anomaly-based detection techniques either focus on detecting a single abnormal process or need an offline model built from training data with purely normal events.

More importantly, to determine what state a given system is in, intrusion detection systems often rely on the co-occurrence or sequence of various system events rather than on independent actions. Typically, system monitoring data consists of low-level process events, or interactions between various system entities such as processes, files, and sockets (for example, when a program opens a file or connects to a server), each with an exact timestamp, whereas an intrusion attempt is usually a higher-level activity that involves multiple different process events. For example, a network attack called an Advanced Persistent Threat (APT) is composed of a set of stealthy and continuous computer hacking processes. An APT first attempts to gain a foothold in the environment. It then uses the compromised systems as access into the target network and deploys additional tools that help fulfill the attack objective. The gap between the level of process events and the level of intrusion activities makes it hard to infer which process events are actually related to malicious activity, especially considering the large number of "noisy" process events that occur among them. Thus, conventional attack detection techniques that identify individual suspicious process events are inadequate to address this scenario.

A method for detecting malicious processes includes modeling system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities. Each edge comprises one or more timestamps corresponding to respective events between two system entities. A set of valid path patterns that relate to potential attacks is generated. One or more event sequences in the system are determined to be suspicious based on the graph and the valid path patterns, using a random walk on the graph.

A system for detecting malicious processes includes a modeling module configured to model system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities. Each edge comprises one or more timestamps corresponding to respective events between two system entities. A malicious process path discovery module includes a processor configured to generate a set of valid path patterns that relate to potential attacks and to determine that one or more event sequences in the system are suspicious based on the graph and the valid path patterns, using a random walk on the graph.

These and other features and advantages of the present invention will become apparent to those skilled in the art from the following detailed description and the accompanying drawings.

The present disclosure describes preferred embodiments in detail below with reference to the following drawings.

FIG. 1 is a diagram of a network graph representing communities and roles of nodes in accordance with the present principles.

FIG. 2 is a block/flow diagram of a method for discovering community and role memberships and detecting anomalies in accordance with the present principles.

FIG. 3 is a block diagram of a host level analysis module in accordance with the present principles.

FIG. 4 is a block/flow diagram of a method for detecting suspicious host-level event sequences in accordance with the present principles.

FIG. 5 is a segment of pseudo-code for detecting suspicious host-level event sequences in accordance with the present principles.

FIG. 6 is a block diagram of a processing system in accordance with the present principles.

In accordance with the present principles, the present embodiments provide malicious process path discovery by detecting abnormal process paths that are related to intrusion activities. This is accomplished using process traces. A set of valid sequence patterns is generated, and a random-walk-based process is used to learn system functions and discover suspicious process sequences. To eliminate score bias arising from path length, a Box-Cox power transformation is applied to normalize the anomaly scores of process sequences.

The present embodiments thereby provide complete evidence of an attacker's activity trace (i.e., the process path) after an attack has occurred. In addition, the present embodiments detect malicious process paths more accurately, reducing the number of false positives and false negatives in less time and with less computational complexity. A compact graph structure may be used to reduce memory load, a set of valid sequence patterns may be generated and used to reduce the search space, and a random walk approach may be used to reduce computational cost. Furthermore, because the present embodiments need no training data, they can detect new attacks.

Referring now in detail to the figures, in which like numerals represent the same or similar elements, and initially to FIG. 1, an automatic security intelligence (ASI) system architecture is shown. The ASI system includes three major components: an agent 10 is installed on each machine of an enterprise network to collect operational data; a backend server 20 receives data from the agents 10, pre-processes the data, and sends the pre-processed data to an analysis server 30; and the analysis server 30 runs security application programs to analyze the data.

Each agent 10 includes an agent manager 11, an agent updater 12, and agent data 13, which in turn may include information regarding active processes, file access, net sockets, the number of instructions per cycle, and host information. The backend server 20 includes an agent updater server 21 and a monitoring data storage. The analysis server 30 includes intrusion detection 31, security policy compliance assessment 32, incident backtrack and system recovery 33, and centralized threat search and query 34.

Referring now to FIG. 2, additional detail on intrusion detection 31 is shown. The intrusion detection engine has five modules: a data distributor 41 that receives data from the backend server 20 and distributes the corresponding data to the network analysis module 42 and the host level analysis module 43; a network analysis module 42 that processes network communications (including TCP and UDP) and detects abnormal communication events; a host level analysis module 43 that processes host level events, including user-to-process events, process-to-file events, and user-to-registry events; an anomaly fusion module 44 that integrates network level anomalies and host level anomalies and refines the results into trustworthy intrusion events; and a visualization module 45 that outputs the detection results to end users.

Referring now to FIG. 3, additional detail on the host level analysis module 43 is shown. The host level analysis module 43 includes a hardware processor 312 and a memory 314. In addition, in one embodiment, the host level analysis module 43 includes one or more functional modules that are stored in the memory 314 and executed by the hardware processor 312. In alternative embodiments, the functional modules may be implemented as one or more discrete hardware components, for example in the form of application-specific integrated chips or field programmable gate arrays.

The host level analysis module 43 includes a number of different analysis and detection functions. A process-to-file anomaly detection module 302 takes host level process-to-file events from the data distributor 41 as input and discovers abnormal process-to-file events. These events may include, for example, reads from or writes to a file. A user-to-process anomaly detection module 304 takes all streaming process events from the data distributor 41 as input, models each user's behavior at the process level, and identifies suspicious processes run by each user. A USB event anomaly detection module 306 examines the streaming process events and identifies all USB-device-related events to detect anomalous device activity. A process signature anomaly detection module 308 takes process names and signatures from the data distributor 41 as input and detects processes with suspicious signatures. Finally, a malicious process path discovery module 310 takes the currently active processes from the data distributor 41 as starting points and tracks all potentially risky process paths by combining the incoming events with earlier events in a time window. The malicious process path discovery module 310 detects anomalous process sequences (paths), as described in greater detail below.

Referring now to FIG. 4, a method for malicious process path detection is shown. A blueprint graph is used as input. The blueprint graph is a heterogeneous graph constructed from a historical dataset of communications in a network, where each node of the blueprint graph represents a physical device on the enterprise network and each edge reflects the normal communication patterns between nodes. Block 402 performs graph modeling using a compact graph structure that captures the complex interactions between system entities as an acyclic multipartite graph. Block 404 then generates a set of valid sequence patterns based on a maximum sequence length. The maximum sequence length may be set by a user, or an optimal value may be determined automatically. Forming valid sequence patterns greatly reduces the size of the search space in the graph.

Block 406 then scans the graph to determine candidate event sequences that match the patterns. A "pattern" refers to an ordered set of system entity types, whereas a "sequence" refers to an ordered set of specific system entities. Thus, a sequence matches a pattern if the type of each of its system entities, in order, agrees with the pattern. Sequences are also referred to herein as "paths."

Block 408 applies a random walk to extract the characteristics of every entity. Based on the discovered entity characteristics, block 408 calculates an anomaly score for each candidate process and evaluates how abnormal the process is. Because there are multiple sequence patterns of different lengths, and the scores of two paths of different lengths are not directly comparable, the anomaly score distribution of each sequence pattern is transformed into a single distribution at block 410 using, for example, a Box-Cox power transformation. Block 410 measures the deviation between the suspicious sequences and the normal sequences and reports those sequences whose deviation exceeds a threshold.

Attack behavior often involves multiple system entities (e.g., processes, files, sockets, etc.). The present embodiments therefore incorporate the interactions between multiple different system entities.

The amount of information provided by system monitoring can be extremely large, making direct storage of and access to that information in memory impractical. However, the information has a high degree of redundancy. First, redundancy is found in attributes: each event record contains not only the entities involved but also the attributes of those entities, so storing the attributes repeatedly for each event is redundant. Second, it is redundant to save repeatedly the events that involve the same entities when only the timestamps differ between them. Third, storing records that are irrelevant to the detection of intrusion attacks is unnecessary.

A graph model is therefore generated by block 402 by capturing the meaningful information from the monitoring data in compressed form. The graph model is represented as a directed graph $G = (V, E, T)$, where $T$ is a set of timestamps, $E \subset V \times V \times T$ is the set of edges, and $V = F \cup P \cup U \cup S$ is the set of vertices. $F$ is the set of files residing in the computer system, $P$ is the set of processes, $U$ is the set of UNIX (registered trademark) sockets, and $S$ is the set of Internet sockets. For a specific edge $(v_i, v_j)$ in $E$, $T(v_i, v_j)$ denotes the set of timestamps on the edge. For each event $e$ with timestamp $t$, if a corresponding edge already exists in $G$, $t$ is added to the timestamps of that edge. Otherwise, block 402 builds such an edge in $G$ with the timestamp set $T(v_i, v_j) = \{t\}$. In this structure, the attribute values of each unique entity are stored only once. For each event sequence of length $l$, there is a corresponding path through $G$ consisting of $l$ edges.
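The edge-merging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the event tuple layout, the `build_graph` name, and the `TYPE:name` entity labels are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event records: (source entity, destination entity, timestamp).
# The "F:", "P:", "S:" prefixes are an assumed labelling for files,
# processes, and Internet sockets.
events = [
    ("F:/etc/passwd", "P:sshd", 10),
    ("P:sshd", "S:10.0.0.5:22", 12),
    ("F:/etc/passwd", "P:sshd", 15),  # same entities again: only the timestamp is new
]

def build_graph(events):
    """Return {(v_i, v_j): sorted timestamps}, i.e. T(v_i, v_j) for every edge."""
    timestamps = defaultdict(set)
    for src, dst, t in events:
        # An existing edge gains one timestamp; a new edge starts as {t}.
        timestamps[(src, dst)].add(t)
    return {edge: sorted(ts) for edge, ts in timestamps.items()}

graph = build_graph(events)
```

Three events collapse into two edges here, which is the redundancy reduction the paragraph describes.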

A natural way to extract the most suspicious paths from the graph G is to examine all of the existing paths. However, enumerating all possible paths in a densely connected graph is impractical. To provide guidance for the candidate search in block 406, block 404 generates a set of valid path patterns B. Only those paths that match a valid path pattern are related to potential attacks; the others may be discarded.

Each path pattern B of length l includes l entities and/or entity types. A path pattern B may therefore contain both specific entities (e.g., particular files) and generic entity types in the same path. B is determined to be a valid path pattern only if there exists at least one path p in G that matches B. Considering that l may be a small number, it is feasible to enumerate all of the possible paths. Searching the graph G thus makes it possible to extract all of the valid patterns.

The valid path patterns B may be generated by experts using, for example, their experience with previous intrusion attacks. However, it may be difficult to obtain an accurate and complete set of valid path patterns from such experts. Path patterns may therefore be generated automatically. Each entity is assigned a specific system entity type. All paths that correspond to information leakage must begin with a file entity (F) and end with an Internet socket entity (I, denoted S in the vertex set above). Given a path $p \in G$ and a path pattern $B$ of $G$, $p[i]$ and $B[i]$ denote the $i$-th node in $p$ and $B$, respectively. The path $p$ matches $B$, written $p \prec B$, if $p$ and $B$ have the same length and $p[i] \in B[i]$ for every $i$ (that is, the specific entity $p[i]$ belongs to the entity type $B[i]$). $B$ is a valid path pattern if there exists at least one path $p$ in $G$ such that $p \prec B$. In one example that follows the above constraints, there are four potential path patterns of length three over the four entity types: {F, F, I}, {F, P, I}, {F, U, I}, and {F, I, I}. Of these, only {F, P, I} is valid, because only a process node can connect a file node to an Internet socket node. In this manner, all valid patterns in G can be discovered.
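As a rough sketch of how valid patterns could be found automatically, the Python below enumerates every type sequence that starts with F and ends with I and keeps those matched by at least one path in a toy graph. The `TYPE:name` naming convention and all function names are assumptions for illustration; the brute-force enumeration is only practical because the pattern length is small.

```python
from itertools import product

ENTITY_TYPES = ("F", "P", "U", "I")  # file, process, UNIX socket, Internet socket

def entity_type(entity):
    """Assumed naming convention: 'TYPE:name', e.g. 'P:zip'."""
    return entity.split(":", 1)[0]

def matches(path, pattern):
    """p matches B when the lengths agree and each p[i] has type B[i]."""
    return len(path) == len(pattern) and all(
        entity_type(v) == b for v, b in zip(path, pattern)
    )

def paths_of_length(edges, length):
    """Enumerate vertex sequences with `length` nodes (length >= 2) that follow edges."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    frontier = [[v] for v in succ]
    for _ in range(length - 1):
        frontier = [p + [n] for p in frontier for n in succ.get(p[-1], [])]
    return frontier

def valid_patterns(edges, length):
    """Keep the F...I patterns that at least one path in the graph matches."""
    paths = paths_of_length(edges, length)
    candidates = [
        ("F",) + mid + ("I",) for mid in product(ENTITY_TYPES, repeat=length - 2)
    ]
    return [b for b in candidates if any(matches(p, b) for p in paths)]

# Toy graph (hypothetical entities): a file read by a process that then
# writes to an Internet socket and to another file.
edges = [
    ("F:secret.doc", "P:zip"),
    ("P:zip", "I:203.0.113.7:443"),
    ("P:zip", "F:out.zip"),
]
patterns = valid_patterns(edges, 3)
```

On this toy graph the only surviving pattern is (F, P, I), in line with the observation that only a process node links a file to an Internet socket.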

Based on the generated valid path patterns B, block 406 searches the multipartite graph for paths that satisfy the patterns. Given an event sequence $seq = \{e_1, e_2, \ldots, e_r\}$, there must be an equivalent path $p = \{v_1, v_2, \ldots, v_{r+1}\}$ in the graph $G$. Because events occur in time order, a time order constraint is applied to the search for candidate paths. By applying the path patterns and the time order constraint to a breadth-first search, the candidate paths can be discovered in a one-time scan of $G$. The set of candidate paths $C$ is defined as

$$C = \{\, p \in G \mid p \prec B \text{ for some valid path pattern } B, \text{ and the events along } p \text{ occur in time order} \,\},$$

where $p \prec B$ means that the path $p$ matches the pattern $B$.
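A minimal sketch of a pattern-guided breadth-first search with a time order constraint follows. It is one possible reading of the step above, not the patent's own procedure: the graph layout `{(src, dst): timestamps}`, the `TYPE:name` labels, and the rule that each event must be strictly later than the previous one are assumptions, and the sketch makes no claim to the one-time-scan efficiency mentioned in the text.

```python
from collections import deque

def entity_type(entity):
    """Assumed naming convention: 'TYPE:name'."""
    return entity.split(":", 1)[0]

def candidate_paths(graph, patterns):
    """graph: {(src, dst): [timestamps]}; patterns: iterable of type tuples.

    Returns vertex paths that match some pattern and admit a strictly
    increasing choice of one timestamp per edge.
    """
    succ = {}
    for (src, dst), ts in graph.items():
        succ.setdefault(src, []).append((dst, sorted(ts)))
    vertices = sorted({v for edge in graph for v in edge})
    found = []
    for pattern in patterns:
        # Queue items: (path so far, time of the last event on that path).
        queue = deque(
            ([v], float("-inf")) for v in vertices if entity_type(v) == pattern[0]
        )
        while queue:
            path, last_t = queue.popleft()
            if len(path) == len(pattern):
                found.append(tuple(path))
                continue
            wanted = pattern[len(path)]
            for dst, ts in succ.get(path[-1], []):
                later = [t for t in ts if t > last_t]
                if entity_type(dst) == wanted and later:
                    # Taking the earliest feasible timestamp keeps the most
                    # options open for the remaining edges.
                    queue.append((path + [dst], later[0]))
    return found

graph = {
    ("F:a", "P:x"): [5],
    ("P:x", "I:net"): [3, 9],
    ("F:b", "P:x"): [10],   # read happens after the last network write
}
paths = candidate_paths(graph, [("F", "P", "I")])
```

Here `F:b -> P:x -> I:net` is rejected because no network event follows the read at time 10, while `F:a -> P:x -> I:net` is kept.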

Even with the filtering policy based on path patterns and the time order constraint, a large number of candidate paths may still remain in the graph G, most of which are related to normal behavior. The present embodiments therefore extract the suspicious paths from the larger set of candidate paths.

A candidate path is determined by block 408 to be suspicious if the entities involved in the path behave differently from their normal roles. In a computer system, information senders and receivers are identified as entity roles. Because the sender and receiver scores are used to establish the profile of normal behavior, they should be learned accurately from the computer system. To achieve this, a random walk is applied to the graph $G$. From $G$, an $N \times N$ square transition matrix $A$ is calculated as

$$A[i][j] = \frac{|T(v_i, v_j)|}{\sum_{k=1}^{N} |T(v_i, v_k)|},$$

where $N$ is the total number of entities and $T(v_i, v_j)$ is the set of timestamps of the events that have occurred so far between $v_i$ and $v_j$. $A[i][j]$ denotes the probability that information flows from $v_i$ to $v_j$ in $G$.
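The transition matrix can be illustrated with a short Python sketch, assuming that each entry is the number of timestamps on the edge divided by the total number of timestamps leaving the source vertex. The function name, the graph layout, and the choice to leave rows without outgoing events at zero are assumptions for the example.

```python
def transition_matrix(vertices, graph):
    """A[i][j] = |T(v_i, v_j)| / sum_k |T(v_i, v_k)|.

    graph: {(src, dst): [timestamps]}. Rows for vertices with no outgoing
    events are left as all zeros (an assumption of this sketch).
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), ts in graph.items():
        A[index[src]][index[dst]] = float(len(ts))
    for row in A:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return A

vertices = ["P:sshd", "F:/etc/passwd", "S:10.0.0.5:22"]
graph = {
    ("F:/etc/passwd", "P:sshd"): [10, 15],
    ("P:sshd", "S:10.0.0.5:22"): [12],
    ("P:sshd", "F:/etc/passwd"): [20, 21, 22],
}
A = transition_matrix(vertices, graph)
```

The process row becomes [0, 0.75, 0.25]: three of its four outgoing events go to the file and one to the socket.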

Because $A$ is the matrix representation of the multipartite graph $G$, it can also be written in block form, with rows and columns ordered as processes, files, UNIX (registered trademark) sockets, and Internet sockets:

$$A = \begin{pmatrix} 0 & P \to F & P \to U & P \to S \\ F \to P & 0 & 0 & 0 \\ U \to P & 0 & 0 & 0 \\ S \to P & 0 & 0 & 0 \end{pmatrix}.$$

Here, 0 denotes a zero submatrix and the arrow operator indicates the direction in which information flows. For example, $P \to F$ denotes the flow of information from processes to files. Note that, because process-to-process interactions carry no information flow, the non-zero submatrices of $A$ appear only between processes and files and between processes and sockets, not between processes themselves. These are constraints set by the UNIX (registered trademark) system.

Let $X$ be the sender score vector, with $X[i]$ denoting the sender score of $v_i$, and let $Y$ be the receiver score vector. With the initial vectors $X_0$ and $Y_0$ randomly generated and $m$ denoting the current iteration number, the sender and receiver scores of each entity can be generated iteratively as

$$X_{m+1} = A \times Y_m, \qquad Y_{m+1} = A^{T} \times X_m.$$

Intuitively, an entity that sends information to a large number of entities with high receiver scores is itself an important information sender, and an entity that receives information from a large number of entities with high sender scores is an important information receiver. The sender and receiver scores of an entity are therefore calculated iteratively by accumulating the receiver and sender scores related to that entity. For example, the file /etc/passwd on a UNIX (registered trademark) system has a high sender score and a low receiver score, because it is sent to many processes for checking access permissions but is rarely modified.
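The mutual reinforcement between sender and receiver scores can be sketched as follows, assuming the update sends each sender score to the weighted sum of the receiver scores it points to, and vice versa. For reproducibility this sketch starts from all-ones vectors rather than random ones and applies no normalization; both choices, like the function names, are assumptions.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def sender_receiver_scores(A, iterations=10):
    """Iterate X <- A Y and Y <- A^T X, both computed from the previous step."""
    n = len(A)
    At = transpose(A)
    X = [1.0] * n  # sender scores (assumed deterministic start)
    Y = [1.0] * n  # receiver scores
    for _ in range(iterations):
        # The right-hand side is evaluated before assignment, so both
        # updates use the scores from iteration m.
        X, Y = mat_vec(A, Y), mat_vec(At, X)
    return X, Y

# Two entities: v0 only sends to v1.
A = [[0.0, 1.0],
     [0.0, 0.0]]
X, Y = sender_receiver_scores(A)
```

In this two-node case the pure sender ends with sender score 1 and receiver score 0, and the pure receiver with the opposite.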

As a result of this iterative refinement, the learned score values depend on the initial score values. However, the effect of the initial score values can be eliminated using the steady-state property of the matrix. Given a general square matrix $M$ and a general vector $\pi$, the vector can be updated repeatedly as

$$\pi_{m+1} = M \times \pi_m.$$

A convergence state, in which $\pi_{m+1} \approx \pi_m$ for sufficiently large values of $m$, is possible. In this case, there is a single unique value that reaches the convergence state:

$$\pi = M \times \pi.$$

The convergence state has the property that the converged vector depends only on the matrix $M$ and is independent of the initial vector value $\pi_0$.

To reach the convergence state, the matrix M needs to satisfy two conditions: irreducibility and aperiodicity. A graph G is irreducible if and only if, for any two nodes, there exists at least one path between them. The period of a node is the minimum length of a path from the node back to itself, and the period of a graph is the greatest common divisor of all the node period values. A graph G is aperiodic if and only if it is irreducible and the period of G is 1.

Because the system graph $G$ is not always strongly connected, the above iteration does not always reach convergence. To ensure convergence, a restart matrix $R$ is added, which is an $N \times N$ square matrix with each cell value equal to $1/N$. The new transition matrix $\bar{A}$ is defined as

$$\bar{A} = (1 - c) \times A + c \times R,$$

where $c$ is a value between 0 and 1 called the restart ratio. $\bar{A}$ is guaranteed to be irreducible and aperiodic, which yields converged sender score and receiver score vectors. The convergence rate can be controlled by adjusting the restart ratio. One exemplary value for the number of iterations used to ensure convergence is about 10.
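A small sketch of the restart construction, assuming the restart matrix has every cell equal to 1/N and is blended with the transition matrix using weight c. The default `c=0.15` and the function name are illustrative assumptions, not values from this document.

```python
def with_restart(A, c=0.15):
    """Blend A with a uniform restart matrix R (every cell 1/N).

    Returns (1 - c) * A + c * R. Every entry of the result is strictly
    positive for 0 < c <= 1, so any node can reach any other in one step.
    """
    n = len(A)
    r = 1.0 / n
    return [[(1.0 - c) * a_ij + c * r for a_ij in row] for row in A]

# The chain v0 -> v1 is not strongly connected on its own.
A = [[0.0, 1.0],
     [0.0, 0.0]]
A_bar = with_restart(A, c=0.2)
```

Because all entries become positive, the blended matrix has a path between every pair of nodes and a self-loop at every node, which is what makes it irreducible and aperiodic.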

Given a path $p$, the anomaly score for the path is calculated from the sender and receiver scores as

$$\mathrm{Score}(p) = 1 - NS(p),$$

where the term $NS(p)$ is itself computed from the sender and receiver scores of the entities along the path.
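The text here does not give the expression for NS(p), so the sketch below has to assume one. It takes NS(p) to be the product, over the edges of the path, of the source's sender score, the transition probability, and the destination's receiver score. That product form, like every name in the code, is an assumption chosen for illustration and should not be read as the document's definition.

```python
def normality_score(path, X, Y, A):
    """ASSUMED form of NS(p): product over edges (v_i, v_{i+1}) of
    X[v_i] * A[v_i][v_{i+1}] * Y[v_{i+1}]. Not taken from the source text."""
    ns = 1.0
    for i, j in zip(path, path[1:]):
        ns *= X[i] * A[i][j] * Y[j]
    return ns

def anomaly_score(path, X, Y, A):
    """Score(p) = 1 - NS(p), as stated in the text."""
    return 1.0 - normality_score(path, X, Y, A)

# Toy values for three entities indexed 0, 1, 2 (hypothetical scores).
X = [0.5, 0.5, 0.0]            # sender scores
Y = [0.0, 0.5, 1.0]            # receiver scores
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
score = anomaly_score([0, 1, 2], X, Y, A)
```

Under this assumed form, a path through entities acting in their usual sender and receiver roles gets a high NS(p) and hence a low anomaly score. Longer paths multiply more factors below 1, which gives one intuition for the length bias that normalization later removes.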

As described above, the anomaly scores of paths of different lengths have different distributions. To compare the suspiciousness of paths of different lengths, a transformation is therefore applied that puts paths of different lengths on an equal footing. The path anomaly scores can have an arbitrary distribution, which in general is not a normal distribution. The suspiciousness of a path $p$ of length $r$ is defined in terms of its normalized anomaly score $T(\mathrm{Score}(p))$, where $T$ is a normalization function.

The top k suspicious paths are those with the largest suspiciousness scores. Mathematically, a transformation that converts a normal distribution to any other distribution is feasible, but its inverse function is difficult to obtain. To solve this problem, the Box-Cox power transformation is used as the normalization function. In particular, let Q(r) denote the set of anomaly scores calculated from paths of length r. For each score q ∈ Q(r),

$$T(q, \lambda) = \begin{cases} \dfrac{q^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log q, & \lambda = 0 \end{cases}$$

where λ is a normalization parameter. Different values of λ yield different transformed distributions. The goal is to select the value of λ that yields a normalized distribution as close as possible to the normal distribution (i.e., T(B, λ) ~ N(μ, σ²)).
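The selection of λ can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not the method prescribed by the source: it assumes the anomaly scores are strictly positive (the Box-Cox transformation is undefined otherwise) and picks λ from a fixed grid by maximizing the standard Box-Cox profile log-likelihood. The grid and the function names are hypothetical.

```python
import math
from statistics import pvariance


def box_cox(q, lam):
    """Box-Cox power transformation of a single positive score q."""
    if q <= 0:
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return math.log(q)
    return (q ** lam - 1.0) / lam


def log_likelihood(scores, lam):
    """Profile log-likelihood of lam under a normal model for the transformed scores."""
    n = len(scores)
    transformed = [box_cox(q, lam) for q in scores]
    jacobian = (lam - 1.0) * sum(math.log(q) for q in scores)
    return -0.5 * n * math.log(pvariance(transformed)) + jacobian


def choose_lambda(scores, grid=None):
    """Return the grid value of lam whose transformed scores look most normal."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 ... 2.0, includes 0.0
    return max(grid, key=lambda lam: log_likelihood(scores, lam))


# Scores whose logarithms are symmetric around zero favor lam near 0 (log transform).
scores = [math.exp(k / 2) for k in range(-4, 5)]
best = choose_lambda(scores)
normalized = [box_cox(q, best) for q in scores]
```

A grid search keeps the sketch dependency-free; a numerical optimizer over λ would serve the same purpose.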

The top k suspicious paths are not considered to be related to an intrusion attack unless they are sufficiently distinct from the normal paths. To measure the deviation of the suspicious paths from the normal paths, a t-value is calculated between the two groups of paths. Because the normal paths are numerous, an efficient solution based on Monte Carlo simulation is used, which calculates the expectation and variance from a relatively small sample without computing the full summation.
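The sampling idea can be sketched in Python as follows. The source does not state which t-statistic is used, so the sketch assumes Welch's unequal-variance form. It also assumes the normal-path scores are available as a list from which a random subset can be drawn. The function names, sample size, and seed are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic between two groups of scores (assumed form)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def t_value_vs_sampled_normal(suspicious_scores, normal_scores, sample_size=100, seed=0):
    """Estimate the t-value using a small random sample of the normal paths.

    Sampling avoids summing over every normal path; the mean and variance
    of the normal group are estimated from the sample only.
    """
    rng = random.Random(seed)
    sample = rng.sample(normal_scores, min(sample_size, len(normal_scores)))
    return welch_t(suspicious_scores, sample)


# Hypothetical scores: three suspicious paths against a large pool of normal paths.
normal_scores = [0.1 + 0.001 * i for i in range(1000)]
suspicious_scores = [5.0, 5.1, 5.2]
t_value = t_value_vs_sampled_normal(suspicious_scores, normal_scores)
```

A large t-value indicates that the suspicious group is well separated from the sampled normal paths; any threshold applied to it would be a deployment choice not given in the source.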

When a suspicious path is detected, the host-level analysis module 43 provides information about the anomaly and generates a report that includes one or more alerts. The anomaly fusion module 44 integrates these host-level alerts with other host-level and network-level anomalies and automatically filters out false alarms. The resulting list of anomalies is provided to the user via the visualization module 45. In alternative embodiments, obvious anomalies or classes of anomalies may be addressed automatically, for example by deploying security countermeasures or mitigations. In one specific example, the automatic response to a detected anomaly may be to shut down the device exhibiting the anomalous behavior until an administrator can inspect it.

Referring now to FIG. 5, pseudocode for finding the top k suspicious paths SP is shown. The sender and receiver score vectors X and Y are generated using the random walk process. A queue of files F_X is sorted by descending X, and a queue of files F_Y is sorted by descending Y. Similarly, queues of processes P_X and P_Y, queues of UNIX (registered trademark) sockets U_X and U_Y, and queues of Internet sockets S_X and S_Y are created. The paths are then processed to find the set of paths that match the event sequence patterns and the time constraints.
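The queue construction described for FIG. 5 can be sketched in Python. The pseudocode of FIG. 5 is not reproduced here, so the data layout is an assumption: entities are given as a mapping from name to a type code ("F" file, "P" process, "U" UNIX socket, "S" Internet socket), and the scores as mappings from name to value. The time check reflects one possible reading of the time constraint (event timestamps along a path must not decrease). All names are hypothetical.

```python
from collections import defaultdict


def build_sorted_queues(entities, sender_score, receiver_score):
    """Build one queue per entity type and score vector, sorted by descending score.

    Returns a dict keyed by (type, "X") for sender-score order and
    (type, "Y") for receiver-score order.
    """
    by_type = defaultdict(list)
    for name, entity_type in entities.items():
        by_type[entity_type].append(name)

    queues = {}
    for entity_type, names in by_type.items():
        queues[(entity_type, "X")] = sorted(
            names, key=lambda n: sender_score[n], reverse=True
        )
        queues[(entity_type, "Y")] = sorted(
            names, key=lambda n: receiver_score[n], reverse=True
        )
    return queues


def is_time_ordered(event_timestamps):
    """Assumed time constraint: timestamps along a path are non-decreasing."""
    return all(a <= b for a, b in zip(event_timestamps, event_timestamps[1:]))


# Hypothetical entities and scores.
entities = {"a.txt": "F", "b.txt": "F", "sshd": "P", "bash": "P"}
X = {"a.txt": 0.2, "b.txt": 0.5, "sshd": 0.1, "bash": 0.2}
Y = {"a.txt": 0.4, "b.txt": 0.1, "sshd": 0.3, "bash": 0.2}
queues = build_sorted_queues(entities, X, Y)
```

Keeping the queues pre-sorted lets a search visit the highest-scoring entities of each type first, which matches the stated goal of finding only the top k paths.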

The embodiments described herein may be implemented entirely in hardware, entirely in software, or with both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.

Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer-readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium may be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. The medium may include a computer-readable medium such as a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, or an optical disk.

Each computer program is stored in a machine-readable storage medium or device (e.g., program memory or magnetic disk) readable by a general-purpose or special-purpose programmable computer. The program configures and controls the operation of the computer when the storage medium or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium containing a computer program configured to cause a computer to operate in a specific and predefined manner to perform the functions described herein.

A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements may include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet (registered trademark) cards are just a few of the currently available types of network adapters.

Referring now to FIG. 6, an example of a processing system 600 is shown that may be used for the analysis server 30, the intrusion detection system 31, and/or the host-level analysis module 43. The processing system 600 includes at least one processor (CPU) 604 operatively coupled to other components via a system bus 602. A cache 606, a read-only memory (ROM) 608, a random access memory (RAM) 610, an input/output (I/O) adapter 620, a sound adapter 630, a network adapter 640, a user interface adapter 650, and a display adapter 660 are operatively coupled to the system bus 602.

A first storage device 622 and a second storage device 624 are operatively coupled to the system bus 602 by the I/O adapter 620. The storage devices 622 and 624 may be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid-state magnetic device, and so forth. The storage devices 622 and 624 may be the same type of storage device or different types of storage devices.

A speaker 632 is operatively coupled to the system bus 602 by the sound adapter 630. A transceiver 642 is operatively coupled to the system bus 602 by the network adapter 640. A display device 662 is operatively coupled to the system bus 602 by the display adapter 660.

A first user input device 652, a second user input device 654, and a third user input device 656 are operatively coupled to the system bus 602 by the user interface adapter 650. The user input devices 652, 654, and 656 may be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Other types of input devices may also be used while maintaining the spirit of the present principles. The user input devices 652, 654, and 656 may be the same type of user input device or different types of user input devices. The user input devices 652, 654, and 656 are used to input information to and output information from the system 600.

The processing system 600 may also include other elements (not shown), as readily contemplated by one of skill in the art, and may omit certain elements. For example, various other input devices and/or output devices may be included in the processing system 600, depending upon the particular implementation. For example, various types of wireless and/or wired input and/or output devices may be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations, may also be used. These and other variations of the processing system 600 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein.

The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive. The scope of the invention disclosed herein is not to be determined from the detailed description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. The embodiments shown and described herein are only illustrative of the principles of the present invention, and those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention with the details and particularity required by the patent laws, what is claimed and desired to be protected by Letters Patent is set forth in the appended claims.

Claims

A method for detecting malicious processes, wherein a computer:
models system data as a graph having vertices that represent system entities and edges that represent events between the respective system entities, each edge having one or more timestamps corresponding to respective events between two system entities;
generates a set of valid path patterns relating to possible attacks; and
determines, using a random walk on the graph, that one or more event sequences in the system are suspicious, based on the graph and the valid path patterns.

The method of claim 1, wherein the system entities comprise files in the system, processes in the system, UNIX (registered trademark) sockets in the system, and Internet sockets in the system.

The method of claim 1, wherein the valid pattern is determined based on an entity type characteristic of the system.

The method of claim 3, wherein the valid path pattern is determined based on a definition provided by a security expert through experience based on past intrusion attacks.

The method of claim 1, wherein determining that the one or more event sequences are suspicious comprises the computer performing a breadth-first search for candidate paths in the graph.

The method of claim 5, wherein the breadth-first search comprises a time order constraint based on a time stamp of the edge.

The method of claim 1, wherein determining that the one or more event sequences are suspicious comprises the computer determining that the entities on an edge deviate from the normal roles of those entities.

The method of claim 7, wherein determining that the entities deviate from their normal roles comprises the computer determining a sender score for a sender entity and a receiver score for a receiver entity.

The method of claim 8, wherein determining that the one or more event sequences are suspicious comprises the computer calculating an anomaly score based on the sender score and the receiver score of each entity in each event sequence.

The method of claim 9, wherein determining that the one or more event sequences are suspicious comprises the computer normalizing the anomaly score using a Box-Cox power transformation.

A system for detecting malicious processes, comprising:
a modeling module configured to model system data as a graph having vertices that represent system entities and edges that represent events between the respective system entities, each edge having one or more timestamps corresponding to respective events between two system entities; and
a malicious process path discovery module comprising a processor configured to generate a set of valid path patterns relating to possible attacks and to determine, using a random walk on the graph, that one or more event sequences in the system are suspicious, based on the graph and the valid path patterns.

The system of claim 11, wherein the system entities comprise files in the system, processes in the system, UNIX (registered trademark) sockets in the system, and Internet sockets in the system.

The system of claim 11, wherein the malicious process path discovery module is further configured to determine a valid pattern based on characteristics of an entity type of the system.

The system of claim 13, wherein the malicious process path discovery module is further configured to determine a valid path pattern based on definitions provided by a security expert from experience with past intrusion attacks.

The system of claim 11, wherein the malicious process path discovery module is further configured to perform a breadth-first search of candidate paths within the graph.

The system of claim 15, wherein the breadth-first search comprises a time order constraint based on a time stamp of the edge.

The system of claim 11, wherein the malicious process path discovery module is further configured to determine whether the entities on an edge have deviated from their normal roles with respect to those entities.

The system of claim 17, wherein the malicious process path discovery module is further configured to determine a sender score of a sender entity and a receiver score of a recipient entity.

The system of claim 18, wherein the malicious process path discovery module is further configured to calculate an anomaly score based on the sender score and receiver score of each entity in each event sequence.

The system of claim 19, wherein the malicious process path discovery module is further configured to normalize the anomaly score using a Box-Cox power transformation.