JP6297425B2

JP6297425B2 - Attack code detection apparatus, attack code detection method, and program

Info

Publication number: JP6297425B2
Application number: JP2014130741A
Authority: JP
Inventors: 恭之田中
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2014-06-25
Filing date: 2014-06-25
Publication date: 2018-03-20
Anticipated expiration: 2034-06-25
Also published as: JP2016009405A

Description

The present invention relates to a technique for detecting malicious programs that target computer vulnerabilities.

Attacks by malicious programs that target computer vulnerabilities have become a serious problem. Normally, once such an attack reveals a vulnerability in a computer system, a patch that eliminates the vulnerability is released and distributed to users. However, zero-day attacks, in which attack techniques and tools become public and attacks are carried out before the patch is released, continue unabated.

A characteristic of these attacks is that code reuse techniques, typified by ROP (Return Oriented Programming), are frequently used to bypass computer-side defenses such as DEP (Data Execution Prevention). By its nature, capturing code reuse accurately requires knowledge of the memory state on the computer. For example, Patent Literature 1 discloses a technique that avoids attacks by grasping the memory state on the host computer.

In recent targeted attacks, a malicious document file with embedded attack code is often sent to the victim; when the victim opens the file, a malware infection or the like is triggered and information is stolen. The most recent malicious document files have begun to include ROP attack code. For example, Patent Literature 2 describes a technique for dynamically detecting malicious shellcode and the like in a document file.

Patent Literature 1: JP 2011-258019 A
Patent Literature 2: JP 2013-239149 A
Non-Patent Literature 1: Mamoru Mimura, Hidehiko Tanaka: Handy Scissors: Automatic extraction tool for executable files embedded in malicious document files, Journal of Information Processing Society of Japan, Vol.54, No.3, pp.1211-1219 (Mar. 2013).
Non-Patent Literature 2: Yuhei Otsubo, Mamoru Mimura, Hidehiko Tanaka: Detection of malicious MS document file detection method by file structure inspection, IPSJ SIG Technical Report, Vol.2013-IOT-22, No.16 (2013).
Non-Patent Literature 3: Boldewin, F.: Analyzing MSOffice malware with OfficeMalScanner (2009), http://www.reconstructer.org/code.html

However, a technique that detects attack code (malicious shellcode and the like) by dynamic analysis, such as the one described in Patent Literature 2, is dangerous because the attack code may damage the test environment. It is also laborious: to make the exploit code run, a specific environment must be prepared in which the versions and patch levels of the OS and applications are matched appropriately.

The present invention has been made in view of the above points, and its object is to provide a technique for detecting attack code in input data such as a document file by static analysis.

According to an embodiment of the present invention, there is provided an attack code detection device for detecting attack code from input data, the device comprising:
acquisition means for extracting, from the input data, a plurality of partial data strings each having a predetermined data length;
evaluation means for evaluating, for the plurality of partial data strings, the degree to which the numerical values of the partial data strings are concentrated in a predetermined range; and
output means for outputting a detection result of the attack code in the input data based on the evaluation result of the evaluation means.

According to an embodiment of the present invention, there is also provided an attack code detection method executed by an attack code detection device that detects attack code from input data, the method comprising:
an acquisition step in which the attack code detection device extracts, from the input data, a plurality of partial data strings each having a predetermined data length;
an evaluation step in which the attack code detection device evaluates, for the plurality of partial data strings, the degree to which the numerical values of the partial data strings are concentrated in a predetermined range; and
an output step in which the attack code detection device outputs a detection result of the attack code in the input data based on the evaluation result of the evaluation step.

A technique for detecting attack code in input data such as a document file by static analysis can be provided.

FIG. 1 is a diagram for explaining the operation of a malicious document file.
FIG. 2 is a diagram showing an example of the contents of a malicious document file containing ROP code.
FIG. 3 is a functional configuration diagram of the malicious document file detection device 100 according to the embodiment of the present invention.
FIG. 4 is a diagram for explaining the ROP code in a malicious document file.
FIG. 5 is a diagram showing an example of ROP code.
FIG. 6 is a diagram showing an example of a memory area.
FIG. 7 is a flowchart showing an operation example of the attack code determination unit 103.
FIG. 8 is a diagram showing an example of comparison between double words.
FIG. 9 is a diagram showing pseudo code example 1.
FIG. 10 is a diagram showing pseudo code example 2.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is only an example, and the embodiments to which the present invention applies are not limited to it. For example, the present embodiment describes detection of ROP code, but the invention is not limited to this and can also be applied to other attack code that has characteristics similar to ROP code.

(Attack code and defense mechanisms)
The embodiment of the present invention provides a technique for statically identifying ROP code in a document file. To aid understanding of the technique, existing attack code, defense mechanisms, and the methods used to bypass them are described first.

FIG. 1(a) shows how each piece of code executes when a user opens an ordinary malicious document file on a computer. As shown in FIG. 1(a), when the user opens the malicious document file, exploit code that attacks a vulnerability in the viewer software runs first. The role of the exploit code is to gain control of the viewer software. Once control has been taken, the shellcode is executed. The shellcode extracts the executable file (malware) embedded in the document file and runs it. When the malware runs, user information is stolen, for example.

DEP (Data Execution Prevention) is a technique for preventing execution of such malicious shellcode. Shellcode that is placed after the exploit code has taken control resides not in the program area of memory but in data areas called the stack and the heap. DEP is a function that prevents execution of code placed in such data areas. It works by marking specific portions of memory as intended only for holding data, so that the processor recognizes those regions as non-executable.

ROP is one technique for bypassing DEP. ROP achieves the desired processing by chaining together code fragments that end in a return (ret instruction), called ROP gadgets. The addresses of the code to be executed are stacked on the stack and arranged so that execution jumps from one to the next, which lets the attacker run a variety of intended code. Because the code executes in an ordinary code area, the protection provided by DEP does not apply.

However, it is difficult to perform complex processing with ROP alone. ROP is therefore used to execute an instruction that changes the execution permission of a data area, that is, an instruction that disables DEP (for example, the VirtualProtect function). This makes the shellcode placed in the data area executable, and control is then transferred to that shellcode.

ASLR (Address Space Layout Randomization) is a technique for countering ROP. ASLR randomizes the address space when the OS starts, which counters attackers who rely on fixed, known addresses of API functions, the stack, and the heap.

However, when the OS is a 32-bit OS, for example, an attacker can bypass ASLR within a practical amount of time and number of attempts by scanning the randomized address space to find the required addresses. In addition, some DLLs (Dynamic Link Libraries) are not randomized, and techniques that abuse them are in use. Even when ASLR randomization is applied, there are techniques that use a vulnerability to obtain the base address of a specific DLL and assemble the ROP code dynamically.

FIG. 1(b) is a diagram for explaining an operation example of a malicious document file containing ROP code that bypasses defense mechanisms such as DEP.

FIG. 1(b) differs from FIG. 1(a) in that it includes execution of ROP code and of decryption code. In the case of FIG. 1(b), encrypted shellcode is used, and the shellcode is executed after being decrypted by the decryption code. To run the decryption code, the memory area in which it is placed must have execution permission, so ROP code is executed to bypass DEP. To make the attack succeed reliably, SEH (Structured Exception Handling) code is often executed in addition to (or instead of) the ROP code.

(Detection target code in the present embodiment)
FIG. 2 shows an example of the contents of a malicious document file containing malicious code such as ROP code. The ROP (including SEH) code section in the malicious document file shown in FIG. 2 serves, as described above, to bypass DEP and stabilize operation. It is placed outside the decryption code section that follows it and grants execution permission to the decryption code. Because the ROP code is placed outside the decryption code section, it is not encrypted. The ROP code is about 100 bytes long and the SEH code about 10 bytes; the SEH code is so short that capturing its characteristics is considered difficult. In the present embodiment, therefore, the characteristics of the ROP code are captured by static analysis to determine whether ROP code is present in a document file, and thus whether the document file is malicious. Because ROP code cannot be placed inside the decryption code and appears in plaintext, it can be detected quickly by static analysis.

To identify a document file as malicious, it is sufficient to detect any one piece of malicious code in it. For the reasons explained below, the present embodiment targets ROP code among the various kinds of malicious code contained in a malicious document file.

The exploit code section is the code that triggers the vulnerability. Its characteristics tend to be visible and deliberate encryption is difficult, so it is easy to identify; however, it cannot be identified when the vulnerability is unknown, as in a zero-day attack. The decryption code section exists to decrypt the encrypted shellcode and consists of only several tens of bytes. It is often simple code using logical operations such as XOR and is difficult to distinguish from normal code. With an encoder that generates polymorphic code, the section becomes larger and its characteristics might be captured, but it is not targeted in the present embodiment.

The shellcode section has several common characteristics. In particular, much shellcode refers to the PEB (Process Environment Block) to resolve API function addresses by itself, which makes detection possible. However, shellcode is easily encrypted or obfuscated, and detection becomes difficult especially when multi-encoding, a technique that applies encryption several times, is used. It is therefore not targeted.

In contrast, as described above, ROP code cannot be placed inside the decryption code and appears in plaintext, so it can be detected quickly by static analysis. Moreover, the technique of the present embodiment can detect ROP code built on unknown DLLs without relying on known strings such as the arguments of specific functions.

(Device configuration)
FIG. 3 shows the functional configuration of the malicious document file detection device 100 according to the embodiment of the present invention. The malicious document file detection device 100 may be installed offline and detect malicious document files from document files (input data) that are supplied manually for inspection, or it may be installed on a network and detect malicious document files by acquiring document files from the network. The malicious document file detection device 100 may also be called an attack code detection device. The "document file" in the present embodiment is not limited to a specific type and may be of any kind. The target of attack code detection is also not limited to document files; the technique according to the present invention can detect attack code in arbitrary input data.

As shown in FIG. 3, the malicious document file detection device 100 includes a document file input unit 101, a document file storage unit 102, an attack code determination unit 103, and a determination result output unit 104.

The document file input unit 101 receives the document file to be inspected for maliciousness. The received document file is stored in the document file storage unit 102. The attack code determination unit 103 statically analyzes the target document file to determine whether it contains attack code. In the present embodiment, the attack code whose presence is determined is ROP code. The processing performed by the attack code determination unit 103 is described later.

The determination result output unit 104 outputs the result of the attack code determination unit 103's determination of whether attack code is present. When the result indicates that attack code is present, the target document file can be judged to be a malicious document file.
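The relationship among units 101 to 104 can be sketched as follows. This is only an illustrative outline, not the implementation described in this document: the class name, method names, and report strings are hypothetical, and the determination logic is passed in as a placeholder function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MaliciousDocumentDetector:
    """Hypothetical outline of device 100 (all names are illustrative)."""

    # Stands in for the attack code determination unit 103.
    contains_attack_code: Callable[[bytes], bool]
    # Stands in for the document file storage unit 102.
    store: Dict[str, bytes] = field(default_factory=dict)

    def input_file(self, name: str, data: bytes) -> None:
        """Document file input unit 101: accept a file and store it."""
        self.store[name] = data

    def report(self, name: str) -> str:
        """Determination result output unit 104: output the verdict."""
        verdict = self.contains_attack_code(self.store[name])
        return f"{name}: {'malicious' if verdict else 'no attack code found'}"
```

Passing the determination logic as a callable keeps the outline independent of any particular scoring rule.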

The malicious document file detection device 100 according to the present embodiment can be realized, for example, by having one or more computers execute a program that describes the processing explained in the present embodiment. That is, the functions of the malicious document file detection device 100 can be realized by executing a program corresponding to the processing performed by the device, using hardware resources built into the computer such as a CPU, memory, and hard disk. The program can be recorded on a computer-readable recording medium (such as portable memory) for storage or distribution. The program can also be provided over a network, such as the Internet or electronic mail.

(Internal structure of ROP code)
The processing performed by the attack code determination unit 103 in the present embodiment is closely tied to the internal structure of ROP code, so that structure is described here.

As described above, ROP code bypasses DEP and makes arbitrary code executable by chaining together pieces of code located in memory areas that have execution permission.

The inventor's investigation found that when ROP code is used as attack code, it is not used to write arbitrary, highly flexible shellcode; the only ROP code observed is code that grants execution permission to the code area that follows it. Examples of functions that can grant execution permission include VirtualProtect, VirtualAlloc, HeapCreate, SetProcessDEPPolicy, WriteProcessMemory, and NtSetInformationProcess.

ROP code contains ROP gadget code related to API functions that control DEP (for example, the VirtualProtect function), ROP gadget code used to prepare appropriate arguments and the like for those functions (called ordinary ROP gadget code), and the arguments passed to the functions. The attacker places these two kinds of ROP gadget code and the arguments on the stack in the appropriate order, executes the ROP code, grants execution permission to the memory area where the shellcode is placed, and thereby bypasses DEP. Once DEP has been bypassed, the next processing, such as execution of the decryption code, begins.

To make the attack succeed reliably, the attacker tries to use ROP gadget code that exists at fixed locations in the memory space. This becomes difficult when ASLR is functioning, but the bypass methods described above exist. The inventor's investigation found that many attack methods use the physical addresses of modules that do not support ASLR. Conversely, the number of modules that do not support ASLR is limited, so the number of ROP gadget codes actually used is limited, and a detection technique that uses this characteristic of the physical addresses is effective.

To create the stack state described above, the ROP code in a malicious document file is structured, for example, as shown in FIG. 4. Each square in FIG. 4 represents one byte. In a 32-bit environment, each ROP gadget code related to a DEP-controlling API function (indicated by B) and each ordinary ROP gadget code (indicated by A) is a 4-byte physical memory address.

Using these characteristics, one could detect ROP code by detecting the known addresses that attackers use. However, there are attack techniques that use unknown physical addresses of modules that do not support ASLR, and attack techniques that, even for ASLR-enabled DLLs, dynamically obtain the base address of a specific DLL within the attack code and use it. Against such techniques, detecting ROP code by looking for known addresses becomes difficult.

The present embodiment therefore uses a technique that detects ROP code in a way that does not depend on the physical addresses of any specific DLL or the like.

FIG. 5 shows an example of ROP code built from a DLL that does not support ASLR. In FIG. 5, the addresses are laid out in little-endian order. Because this DLL is not subject to ASLR, its base address is always fixed, and it is loaded at 0x7C340000 to 0x7C396000. As a result, the upper part of every ROP gadget address created from it is 0x7C3, and this value appears with a 4-byte period, as shown by the shading in FIG. 5.
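The 4-byte periodicity can be illustrated with a short sketch. The four gadget addresses below are invented for illustration (only the load range 0x7C340000 to 0x7C396000 comes from the text), and the function name is hypothetical.

```python
import struct


def dwords_le(data: bytes, offset: int = 0) -> list:
    """Decode consecutive little-endian 32-bit values starting at offset."""
    count = (len(data) - offset) // 4
    return list(struct.unpack_from(f"<{count}I", data, offset))


# Hypothetical chain of four gadget addresses inside the DLL's load range.
chain = struct.pack("<4I", 0x7C341234, 0x7C35ABCD, 0x7C346C0A, 0x7C37653D)

values = dwords_le(chain)
# The upper 12 bits of every address are 0x7C3.
top12 = [v >> 20 for v in values]
# In the raw bytes, the most significant byte 0x7C recurs every 4 bytes.
high_bytes = chain[3::4]
```

Because the file stores each address least significant byte first, the shared upper byte sits at the fourth position of each 4-byte group, which is what produces the periodic pattern.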

FIG. 6 shows the memory space of a 32-bit OS, including the area into which the DLL described above is loaded. The DLL used as the example in FIG. 5 is loaded into the hatched portion of FIG. 6 and abused as a source of ROP gadgets. Most abused DLLs are loaded into the user area rather than the kernel area, so the physical addresses in ROP code that uses DLLs are expected to be at most 0x7FFFFFFF. In other words, addresses in the kernel area from 0x80000000 to 0xFFFFFFFF, shown in gray in FIG. 6, are not expected to be used.

Among the parts shown in white in FIG. 5, the value at bytes 40 through 47 is 0x40000000, which is the argument required to grant execution permission with the VirtualProtect function. Another characteristic visible in FIG. 5 is that the white (unshaded) bytes take scattered values, rather than values such as 0x00 and 0xFF that are common in ordinary document files.

As in the example above, an attacker using ROP code finds a DLL that is not subject to ASLR and uses its fixed addresses, so ROP code has the characteristic that physical addresses concentrated in a specific address space appear in succession. Even with an ASLR-enabled DLL, fixed addresses can be used once a vulnerability has revealed the DLL's base address, so in that case too ROP code is expected to contain a succession of physical addresses concentrated in a specific address space.

(Operation example of the attack code determination unit 103)
The attack code determination unit 103 determines whether ROP code is present by detecting the characteristics described above in the document file.

An outline of the processing of the attack code determination unit 103 is described with reference to the flowchart of FIG. 7. The processing shown in FIG. 7 is performed in order from the first byte to the last byte of the document file under inspection, advancing one byte at a time. The processing may end at the point where 25 double words can no longer be obtained. It may also end as soon as a determination result indicating that ROP code has been detected is obtained.
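The scanning order and the two optional stopping conditions can be sketched as follows. This is a minimal outline under assumptions: the function and parameter names are hypothetical, and the per-window decision is supplied by the caller rather than taken from this document.

```python
from typing import Callable, Optional

WINDOW_BYTES = 25 * 4  # 25 double words of 4 bytes each


def scan(data: bytes, looks_like_rop: Callable[[bytes], bool]) -> Optional[int]:
    """Slide a 25-double-word window over data one byte at a time.

    Returns the first byte offset at which looks_like_rop() is true,
    or None if no window matches.
    """
    # The range ends where a full window of 25 double words no longer fits.
    for pos in range(len(data) - WINDOW_BYTES + 1):
        if looks_like_rop(data[pos:pos + WINDOW_BYTES]):
            return pos  # stop as soon as a detection is made
    return None
```

Advancing by one byte rather than four matters because the ROP code may start at any alignment inside the file.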

As a precondition for the processing of FIG. 7, the document file to be inspected for maliciousness has been input through the document file input unit 101 and stored in the document file storage unit 102. The attack code determination unit 103 reads the document file data from the document file storage unit 102 and executes the processing described below.

In step 101, the attack code determination unit 103 obtains a double word, that is, 4 bytes of data, starting at the current byte position (the first byte of the document file when processing begins). In this example, values are assumed to be stored in the document file in little-endian order, and the double word is formed with that in mind. The same applies when the 25 double words described below are obtained.

In step 102, the attack code determination unit 103 determines whether the value of the obtained double word is larger than a predetermined address. If it is larger, the processing proceeds to step 103; otherwise it proceeds to step 104. In step 103, the attack code determination unit 103 advances the byte of interest in the document file by one byte and repeats the processing from step 101.

In the present embodiment, the predetermined address is the maximum address of the user area described above, 0x7FFFFFFF. This example assumes that ROP code uses programs loaded in the user area as ROP gadgets, so addresses larger than the maximum user-area address are treated as not being part of ROP code.
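Steps 101 to 103 can be sketched as a filter over byte offsets. The function name is hypothetical; the threshold 0x7FFFFFFF and the little-endian reading follow the text.

```python
import struct

USER_SPACE_MAX = 0x7FFFFFFF  # maximum user-area address given in the text


def candidate_offsets(data: bytes):
    """Yield (offset, value) for each position that passes step 102."""
    for pos in range(len(data) - 3):
        # Step 101: read 4 bytes at pos as a little-endian double word.
        (value,) = struct.unpack_from("<I", data, pos)
        # Step 102: values above the user area cannot be gadget addresses.
        if value > USER_SPACE_MAX:
            continue  # Step 103: advance by one byte and try again.
        yield pos, value
```

For example, in the byte sequence `34 12 34 7C FF FF FF FF` only offset 0 survives, because every later offset yields a double word whose most significant byte is 0xFF.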

In step 104, the attack code determination unit 103 obtains 25 consecutive double words from the document file, starting at the current byte position, and calculates a score from comparisons among the 25 double words and from whether a predetermined value is present among them.

An example of the comparison processing among the 25 double words will be described with reference to FIG. 8. In the example of FIG. 8, the first double word is compared with each of the other 24 double words. In this example, the comparison processing determines whether the absolute value of the numerical difference between the two double words being compared is smaller than a predetermined threshold ("smaller than or equal to" is also acceptable) and, if so, adds to the score. This processing is one example of evaluating the degree to which the numerical values of the partial data strings are concentrated in a predetermined range. The processing in FIG. 8 is only an example. For instance, every double word may be compared with every other double word (25:25). Alternatively, to evaluate the degree to which the numerical values of the partial data strings (double words) are concentrated in a predetermined range, the variance, entropy, or a similar measure of the partial data strings may be calculated and compared with a predetermined threshold.
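The first-versus-rest comparison can be illustrated with a short Python sketch. The function name `concentration_score` is hypothetical, a score increment of 1 per match is an assumption, and the threshold 0x10000000 is the example value the description gives for FIG. 9.

```python
DIFF_THRESHOLD = 0x10000000  # example threshold mentioned for FIG. 9


def concentration_score(dwords) -> int:
    """Count how many of the remaining double words lie close to the first one.

    `dwords` is a sequence of unsigned integers; the first element is
    compared with every other element (the 1:24 comparison of FIG. 8).
    """
    first = dwords[0]
    return sum(1 for other in dwords[1:] if abs(first - other) < DIFF_THRESHOLD)
```

A window of gadget addresses taken from a single module would score close to 24, whereas unrelated data would usually score much lower.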

As described above, ROP code is characterized by the appearance of physical addresses concentrated in a specific address space, and the processing of step 104 is intended to find this characteristic.

The predetermined value checked for among the 25 double words is, for example, an argument (e.g., 0x40000000) required to change memory permissions (e.g., to grant execute permission) in the function for changing memory permissions described above (e.g., the VirtualProtect function). If the predetermined value is present among the 25 double words, the document file is more likely to contain ROP code, so a large score is added. Alternatively, the presence of the predetermined value may be made a condition for determining that ROP code exists. The check for the predetermined value is not essential; the presence or absence of ROP code can also be determined solely by evaluating the degree to which the numerical values of the partial data strings are concentrated in a predetermined range.

In step 105, the attack code determination unit 103 determines, from the inspection result for the 25 double words, whether a predetermined detection condition is satisfied. If it is satisfied, the process proceeds to step 106; if not, the process proceeds to step 103. The predetermined detection condition is, for example, that the score is larger than a predetermined threshold, or that the score is larger than the predetermined threshold and the predetermined value described above is present.
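The two example detection conditions of step 105 can be written as a single predicate. This is a sketch under assumptions: the name `meets_condition`, the `require_arg` switch, and the use of a set of argument patterns are illustrative choices; only the value 0x40000000 is taken from the text.

```python
PROTECT_ARGS = frozenset({0x40000000})  # example argument value from the text


def meets_condition(score: int, dwords, score_threshold: int,
                    require_arg: bool) -> bool:
    """Evaluate the detection condition for one 25-double-word window.

    With require_arg=False only the score is checked; with require_arg=True
    one of the predetermined argument values must also be present.
    """
    if score <= score_threshold:
        return False
    if require_arg:
        return any(d in PROTECT_ARGS for d in dwords)
    return True
```

Keeping the argument patterns in a set makes it straightforward to register further patterns if others are identified.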

In step 106, the attack code determination unit 103 notifies the determination result output unit 104 of a determination result indicating that ROP code has been detected, and the determination result output unit 104 outputs that determination result.

FIG. 9 shows pseudo code example 1, which corresponds to the processing executed by the attack code determination unit 103. Pseudo code example 1 represents basically the same processing as FIG. 7, but in more detail.

In the processing according to pseudo code example 1, the attack code determination unit 103 treats the target document file as binary data and checks it one byte at a time from the beginning (lines 01 and 19). The attack code determination unit 103 reads the first double word (4 bytes) (line 02) and, if it satisfies the requirement of lying within the user area (line 02), inspects 25 double words (100 bytes) consisting of that first double word and the 24 double words that follow it (lines 07 to 15).

In this inspection, the difference between double words is taken as shown in FIG. 8, and the score is raised when the difference falls below the threshold (lines 08 and 09). In the example of FIG. 9, if the 25 double words contain an argument required to grant execute permission with the VirtualProtect function, the ROP flag is set (lines 12 and 13). In the example of FIG. 9, ROP code is determined to have been found if the ROP flag is set and the score exceeds the threshold (lines 16 and 17).
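The flow described for pseudo code example 1 can be approximated by the following Python sketch. It is not a reproduction of FIG. 9: the function name `find_rop` and the value of `SCORE_THRESHOLD` are hypothetical, and for simplicity the sketch only considers offsets at which a full 100-byte window is available.

```python
import struct

USER_AREA_MAX = 0x7FFFFFFF    # user-area maximum from the text
DIFF_THRESHOLD = 0x10000000   # example difference threshold from the text
PROTECT_ARG = 0x40000000      # example argument value from the text
WINDOW = 25                   # double words inspected per position
SCORE_THRESHOLD = 10          # hypothetical; the text gives no value


def find_rop(data: bytes):
    """Return the first byte offset satisfying the detection condition, or None."""
    for offset in range(len(data) - WINDOW * 4 + 1):
        dwords = struct.unpack_from(f"<{WINDOW}I", data, offset)
        first = dwords[0]
        if first > USER_AREA_MAX:
            continue  # outside the user area: move on by one byte
        # Raise the score for each double word close to the first one.
        score = sum(1 for d in dwords[1:] if abs(first - d) < DIFF_THRESHOLD)
        # Set the flag if the permission-changing argument is in the window.
        rop_flag = PROTECT_ARG in dwords
        if rop_flag and score > SCORE_THRESHOLD:
            return offset
    return None
```

For example, a buffer that holds 24 nearby addresses followed by 0x40000000 is reported at the offset where the address run starts, while a buffer of zero bytes is not reported because the flag is never set.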

The threshold at line 08 in the example of FIG. 9 (0x10000000) was obtained from the inventor's investigation of representative DLLs used for ROP. It is only an example, and other thresholds may be used.

It has also been confirmed that, besides 0x40000000, there are several other patterns for the argument used to decide whether to set the ROP flag, but not many.

In pseudo code example 1 shown in FIG. 9, an argument such as 0x40000000 is a deciding factor in the ROP code determination. Instead, it may be treated not as a deciding factor but as grounds for raising the score. FIG. 10 shows pseudo code example 2, which is an example of pseudo code for that case.

As shown in lines 12 and 13 of FIG. 10, the score is raised substantially when a double word matches the argument. The rest of the processing is the same as in FIG. 9.
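The score-only approach of pseudo code example 2 might look like the sketch below. The names `window_score` and `is_rop_window`, and the values of `ARG_BONUS` and `SCORE_THRESHOLD`, are assumptions made for illustration; the text only says that a match raises the score substantially.

```python
DIFF_THRESHOLD = 0x10000000  # example difference threshold from the text
PROTECT_ARG = 0x40000000     # example argument value from the text
ARG_BONUS = 10               # hypothetical size of the "large" increment
SCORE_THRESHOLD = 15         # hypothetical detection threshold


def window_score(dwords) -> int:
    """Score one window: closeness to the first double word plus argument bonus."""
    first = dwords[0]
    score = sum(1 for d in dwords[1:] if abs(first - d) < DIFF_THRESHOLD)
    score += ARG_BONUS * sum(1 for d in dwords if d == PROTECT_ARG)
    return score


def is_rop_window(dwords) -> bool:
    """Decide from the score alone, with no separate ROP flag."""
    return window_score(dwords) > SCORE_THRESHOLD
```

Setting `ARG_BONUS` to 0 gives the variant in which the argument is ignored entirely.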

As in the example of FIG. 10, ROP code can be detected from the score alone. It is also possible in FIG. 10 to detect ROP code without adding any score for the argument. A method that detects by score alone can thus also detect ROP code that uses an execute-permission-granting function whose specification is unknown.

Note that with the method described above, in which the score is raised when the difference between double words falls below a predetermined threshold, a benign document file containing double words with such a relationship may be falsely detected as a malicious document file. Typical sources of such false detections are character strings frequently used in benign document files as terminators and the like (e.g., 0x00, 0x****0000) and the same character string appearing with a 1-byte period. Neither can form ROP code, so false detections can be avoided by excluding them from the evaluation.
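One possible reading of these exclusions is sketched below. The interpretation is an assumption: 0x00 is taken to mean an all-zero double word, 0x****0000 a double word whose low 16 bits are zero, and the 1-byte period a double word made of one repeated byte. The function names are hypothetical.

```python
DIFF_THRESHOLD = 0x10000000  # example difference threshold from the text


def is_excluded(dword: int) -> bool:
    """Return True for patterns treated here as benign filler, not addresses."""
    if dword & 0xFFFF == 0:  # 0x****0000, which also covers 0x00000000
        return True
    low_byte = dword & 0xFF
    return dword == low_byte * 0x01010101  # same byte repeated four times


def filtered_score(dwords) -> int:
    """Concentration score that skips excluded double words."""
    first = dwords[0]
    if is_excluded(first):
        return 0
    return sum(1 for d in dwords[1:]
               if not is_excluded(d) and abs(first - d) < DIFF_THRESHOLD)
```

With this filter, runs of zero padding or repeated fill bytes no longer contribute to the score.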

Furthermore, because ROP code always contains meaningless junk codes (e.g., FFFFFFFF), one or more junk codes may be defined, and the number of junk codes contained in the partial data strings (double words) under inspection may be added to the determination conditions. For example, the score can be increased if at least a predetermined number of junk codes are present.
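The junk-code condition could be added to the scoring as in this sketch. The names and the values of `MIN_JUNK` and `JUNK_BONUS` are hypothetical; only the example junk value FFFFFFFF comes from the text.

```python
JUNK_CODES = frozenset({0xFFFFFFFF})  # example junk value from the text
MIN_JUNK = 2      # hypothetical "predetermined number"
JUNK_BONUS = 5    # hypothetical score increment


def junk_bonus(dwords) -> int:
    """Return a score increment when enough junk codes appear in the window."""
    junk_count = sum(1 for d in dwords if d in JUNK_CODES)
    return JUNK_BONUS if junk_count >= MIN_JUNK else 0
```

The returned value would simply be added to the concentration score for the same window.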

Although the present embodiment assumes a 32-bit environment, the technique according to the present embodiment can be applied in the same way to a 64-bit environment. In a 64-bit environment, the document file (input data) is divided into 8-byte units for evaluation.
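Switching between 4-byte and 8-byte units can be illustrated by parameterizing the word size when the window is read. This is a sketch; the function name `read_words` and its parameters are hypothetical, and thresholds for the 64-bit case are not given in the text and are therefore left out.

```python
import struct


def read_words(data: bytes, offset: int, count: int, word_size: int = 4):
    """Read `count` little-endian unsigned words of 4 or 8 bytes each."""
    fmt = {4: "I", 8: "Q"}[word_size]  # 32-bit or 64-bit unsigned
    return struct.unpack_from(f"<{count}{fmt}", data, offset)
```

The same scoring logic can then be run over the returned tuple regardless of the word size.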

(Summary of the embodiment, effects, etc.)
As described above, the present embodiment provides an attack code detection device that detects an attack code from input data, comprising: acquisition means for extracting a plurality of partial data strings having a predetermined data length from the input data; evaluation means for evaluating, for the plurality of partial data strings, the degree to which the numerical values of the partial data strings are concentrated in a predetermined range; and output means for outputting a detection result of the attack code in the input data based on the evaluation result of the evaluation means.

The evaluation means may further evaluate whether the plurality of partial data strings include a partial data string corresponding to a specific numerical value. The specific numerical value is, for example, an argument of a predetermined function that controls the right to execute code in memory.

As the evaluation of the degree to which the numerical values of the partial data strings are concentrated in a predetermined range, the evaluation means evaluates, for example, whether the magnitude of the difference between partial data strings is smaller than a predetermined threshold. The evaluation means may also exclude a partial data string acquired by the acquisition means from the evaluation when that partial data string indicates an address outside the user area in the memory space of a computer. The attack code is, for example, ROP code.

The present embodiment makes it possible to identify ROP code by static analysis of a document file. Because the analysis is static, it has several advantages: the analysis time is short, there is no need for an analysis environment prepared with multiple versions of the OS, document viewing software, and so on, and it is safe because the file is never executed. In addition, by detecting ROP code, even malicious document files embedding attack code that exploits unknown vulnerabilities, such as zero-day vulnerabilities, can be detected.

Existing techniques for detecting malicious code embedded in a malicious document file by static analysis include those described in Non-Patent Documents 1 to 3. Non-Patent Document 1 proposes HandyScissors, which automatically extracts an executable file (the malware body) embedded in a malicious document file by supporting multiple encoding schemes and searching for keys by brute force.

Non-Patent Document 2 proposes a method of detecting malicious document files by inspecting information about the size and structure of the document file. Non-Patent Document 3 discloses OfficeMalScanner, a tool with a key search function.

All of these offer better malicious document detection performance than existing antivirus software, but none of them presents a static analysis detection method that focuses on the characteristics of ROP code as described in the present embodiment. By using the technique according to the present embodiment, malicious document files containing ROP code that cannot be detected by the techniques described in Non-Patent Documents 1 to 3 can be expected to be detected.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
100 Malicious document file detection apparatus
101 Document file input unit
102 Document file storage unit
103 Attack code determination unit
104 Determination result output unit

Claims

An attack code detection device for detecting an attack code from input data, comprising:
acquisition means for extracting a plurality of partial data strings having a predetermined data length from the input data;
evaluation means for evaluating, for the plurality of partial data strings, the degree to which the numerical values of the partial data strings are concentrated in a predetermined range; and
output means for outputting a detection result of the attack code in the input data based on an evaluation result of the evaluation means.

The attack code detection device according to claim 1, wherein the evaluation means further evaluates whether the plurality of partial data strings include a partial data string corresponding to a specific numerical value.

The attack code detection device according to claim 2, wherein the specific numerical value is an argument of a predetermined function that controls the right to execute code in memory.

The attack code detection device according to any one of claims 1 to 3, wherein the evaluation means evaluates, as the evaluation of the degree to which the numerical values of the partial data strings are concentrated in a predetermined range, whether the magnitude of the difference between partial data strings is smaller than a predetermined threshold.

The attack code detection device according to any one of claims 1 to 4, wherein the evaluation means excludes a partial data string acquired by the acquisition means from the evaluation when that partial data string indicates an address outside a user area in a memory space of a computer.

The attack code detection device according to claim 1, wherein the attack code is ROP code.

A program for causing a computer to function as each of the means of the attack code detection device according to any one of claims 1 to 6.

An attack code detection method executed by an attack code detection device that detects an attack code from input data, the method comprising:
an acquisition step in which the attack code detection device extracts a plurality of partial data strings having a predetermined data length from the input data;
an evaluation step in which the attack code detection device evaluates, for the plurality of partial data strings, the degree to which the numerical values of the partial data strings are concentrated in a predetermined range; and
an output step in which the attack code detection device outputs a detection result of the attack code in the input data based on an evaluation result of the evaluation step.