JP2013084064A

JP2013084064A - Analyzer, analysis method, and analysis program

Info

Publication number: JP2013084064A
Application number: JP2011222241A
Authority: JP
Inventors: Yuhei Kawakoya; 裕平川古谷; Makoto Iwamura; 誠岩村; Takeo Hario; 剛男針生
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-10-06
Filing date: 2011-10-06
Publication date: 2013-05-09
Anticipated expiration: 2031-10-06
Also published as: JP5687593B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of packer identification. SOLUTION: On receiving a signature generation instruction from an input unit 11, an analysis apparatus 10 acquires a packed executable file stored in a packed executable file storage unit 14b, runs it, and acquires an execution trace of the unpacking code of the packed executable file. A signature generation unit 13a then divides the execution trace into blocks and generates sets of code blocks. The analysis apparatus 10 then compares the sets of code blocks of the execution traces with one another, extracts the set of code blocks that appear in common, and generates a signature.

Description

The present invention relates to an analysis apparatus, an analysis method, and an analysis program.

Conventionally, the program code of malware, that is, of malicious programs such as computer viruses and spyware, is often obfuscated to hinder analysis of the malware's behavior and functions. One form of such obfuscation uses a reversible algorithm such as XOR encoding or encryption with a specific key. Concealing the original program code (the original code) by obfuscating the malware's program code is called packing, and the tools that perform packing are collectively called packers.

Packed malware holds the obfuscated program code as data, together with unpacking code for restoring the original code. Specifically, when packed malware is executed, the unpacking code runs first, deobfuscates the obfuscated program code, and unpacks the original code into memory. Once unpacking of the original code is complete, the unpacked original code is executed.

As a method of identifying the packer used for such packed malware, a technique is known that performs matching with signatures to identify the packer used on an executable file under inspection. For example, characteristic byte sequences are extracted manually from the unpacking code portion and the data portion of a packed file, and matching is performed with the extracted byte sequences as signature patterns to identify the packer used on the executable file under inspection.

“The new signature generation method based on an unpacking algorithm and procedure for a packer detection”, International Journal of Advanced Science and Technology, Vol. 27, February 2011
“A Survey of Malware Detection Techniques”, Technical report, Department of Computer Science, Purdue University, February 2007

However, because the conventional technique described above extracts the characteristic byte sequences of the unpacking code portion and the data portion manually, it is costly, and creating a large number of signatures is difficult. In addition, taking the x86 architecture as an example, accurate disassembly is difficult, so highly accurate signatures cannot be obtained. As a result, there has been a problem that packer identification accuracy cannot be improved.

The present invention has been made to solve the above problems of the prior art, and its object is to improve packer identification accuracy.

To solve the above problems and achieve the object, the analysis apparatus disclosed in the present application includes: a division unit that runs each executable file obtained by obfuscating a plurality of executable files with each of a plurality of obfuscation tools, and divides the instruction code obtained as the result of running each executable file into a plurality of blocks; a generation unit that extracts, from the blocks divided by the division unit, the blocks that appear in common among executable files obfuscated with the same obfuscation tool, and generates information on the features of the extracted blocks; and an identification unit that uses the information on block features generated by the generation unit to identify the obfuscation tool used to obfuscate an executable file under inspection.

The analysis apparatus disclosed in the present application has the effect of improving packer identification accuracy.

FIG. 1 is a block diagram illustrating the configuration of the analysis apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating the process of generating signatures from packed executable files.
FIG. 3 is a diagram illustrating the process of generating signatures from packed executable files together with the executable files before packing.
FIG. 4 is a diagram illustrating the identification process for identifying a packer.
FIG. 5 is a flowchart illustrating the procedure of the signature generation process of the analysis apparatus according to the first embodiment.
FIG. 6 is a flowchart illustrating the procedure of the packer identification process of the analysis apparatus according to the first embodiment.
FIG. 7 is a diagram illustrating a computer that executes the analysis program.

Embodiments of the analysis apparatus, analysis method, and analysis program according to the present invention are described in detail below with reference to the accompanying drawings. The present invention is not limited by these embodiments.

[Configuration of the analysis apparatus]
First, the analysis apparatus according to the first embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a configuration example of the analysis apparatus 10 according to the first embodiment. The analysis apparatus 10 shown in FIG. 1 is implemented on a computer that executes a predetermined program.

An example of how a computer executes a program is as follows. When a program is executed, the CPU (Central Processing Unit) inside the computer loads the program, which is stored on a hard disk device or the like, into a memory space such as RAM (Random Access Memory). Specifically, the CPU loads data such as instructions and values into the memory space. The program thereby functions as a process. After the process has been created in this way, the CPU executes instructions using the data loaded into the process memory space.

The analysis apparatus 10 includes an input unit 11, an output unit 12, a control unit 13, and a storage unit 14. The analysis apparatus 10 generates signatures and uses the generated signatures to identify the packer that packed an executable file under inspection.

The input unit 11 is used to input signature generation instructions, executable files to be inspected, and the like, and includes a keyboard, a mouse, a microphone, and so on. The output unit 12 displays, for example, the packer that packed an executable file, and includes a monitor and a speaker.

The storage unit 14 stores the data and programs required for the various processes of the control unit 13. As components closely related to the present invention, it includes an executable file storage unit 14a, a packed executable file storage unit 14b, and a signature storage unit 14c. The storage unit 14 is a semiconductor memory element such as RAM (Random Access Memory), ROM (Read Only Memory), or flash memory, or a storage device such as a hard disk or an optical disk.

The executable file storage unit 14a stores a plurality of executable files before packing. For example, the executable file storage unit 14a stores executable files 1 to n. The packed executable file storage unit 14b stores packed executable files. For example, the packed executable file storage unit 14b stores each executable file obtained by packing executable files 1 to n with each of packers 1 to m (that is, n × m executable files). The signature storage unit 14c stores the signatures generated by the signature generation unit 13a described later.

The control unit 13 has an internal memory for storing programs that define various processing procedures and the required data, and executes various processes with them. As components closely related to the present invention, it includes a signature generation unit 13a and an obfuscation tool identification unit 13b. A CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like is used as the control unit 13.

The signature generation unit 13a extracts the code blocks characteristic of each packer and generates, from the extracted code blocks, a signature to be used for packer identification. The signature generation unit 13a includes a division unit 131 and a generation unit 132.

The division unit 131 runs each executable file obtained by packing a plurality of executable files with each of a plurality of packers, and divides the execution trace of the unpacking code obtained as the result of running each executable file into a plurality of blocks. Specifically, on receiving a signature generation instruction from the input unit 11, the division unit 131 first acquires the packed executable files from the packed executable file storage unit 14b. In the example of FIG. 2, the division unit 131 acquires from the packed executable file storage unit 14b the n × m packed executable files packed with packers 1 to m (shown in FIG. 2 as packer 1 (executable files 1 to n), packer 2 (executable files 1 to n), ..., packer m (executable files 1 to n)).

The division unit 131 then runs each packed executable file and acquires an execution trace of the unpacking code of the packed executable file. Here, an execution trace is the sequence of instructions that the program executed. Taking x86 instructions as an example, it can be described as a sequence of pairs, each consisting of the opcode of an executed instruction and its operands. An execution trace can be acquired in several ways, for example with a debugger, with binary instrumentation, with an emulator, or with a virtual machine.

More specifically, the execution trace of the unpacking code is the instruction sequence from the start of the packed executable file up to the original entry point, that is, the entry point the executable file had before packing. Instead of the original entry point, the file may also be run until some independently defined event occurs. It is sufficient to run the file up to around the point where an event suggesting that the unpacking code has finished appears: for example, until a call to an API that the original code is likely to use is observed, until network communication occurs, or until a fixed amount of time has elapsed. Many packers have this unpacking code in multiple stages. In a two-stage case, the first-stage unpacking code, when executed, unpacks the second-stage unpacking code and data into memory and then jumps to the second-stage unpacking code, which reads the previously unpacked data and unpacks the original code. This is a two-stage example, but many packers use more stages.
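The stopping rules above can be sketched as follows. This is only an illustration under stated assumptions: the record type `TraceEntry`, its fields, and the function `unpacking_trace` are hypothetical names, and an instruction-count limit stands in for the elapsed-time limit mentioned in the text.

```python
from typing import NamedTuple, Optional

class TraceEntry(NamedTuple):
    """One executed instruction (hypothetical record layout)."""
    address: int                    # address of the executed instruction
    opcode: str                     # mnemonic, e.g. "mov"
    operands: str                   # operand text, e.g. "eax, ebx"
    api_call: Optional[str] = None  # API invoked by this instruction, if any

def unpacking_trace(trace, oep=None, stop_apis=frozenset(), max_instructions=None):
    """Return the prefix of `trace` attributed to the unpacking code.

    Tracing stops at the original entry point (if known), at the first call
    to an API in `stop_apis`, or after `max_instructions` entries, whichever
    comes first.
    """
    result = []
    for entry in trace:
        if oep is not None and entry.address == oep:
            break  # reached the original entry point
        if entry.api_call is not None and entry.api_call in stop_apis:
            break  # an API typical of the original code was called
        if max_instructions is not None and len(result) >= max_instructions:
            break  # budget exhausted (assumed proxy for a time limit)
        result.append(entry)
    return result

sample = [
    TraceEntry(0x1000, "pushad", ""),
    TraceEntry(0x1001, "xor", "eax, eax"),
    TraceEntry(0x401000, "push", "ebp"),  # assumed original entry point
]
stub_only = unpacking_trace(sample, oep=0x401000)
```

In this sample, `stub_only` holds the two instructions executed before the assumed original entry point.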

As the byte sequence used for a signature, the code portion of an executable file is preferable to the data portion. The data portion often varies widely depending on the obfuscation algorithm, so it is poorly suited for use as a characteristic pattern of a specific program. For example, when XOR encoding is used, changing the value that is XORed with the binary code changes the generated bytes substantially. A packed executable file, on the other hand, consists mainly of unpacking code and data. Because the data portion is unsuitable as a signature pattern, as described above, using the byte sequence of the unpacking code portion as the signature pattern gives higher identification accuracy.
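A small illustration of why the data portion is unstable: the same original bytes encoded with two different single-byte XOR keys share no byte values at any position, although decoding with the right key restores the original. The helper `xor_encode`, the keys, and the sample bytes are hypothetical and chosen only for this sketch.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the same call also decodes)."""
    return bytes(b ^ key for b in data)

# A few bytes standing in for original code before packing.
original = b"\x55\x8b\xec\x83\xec\x10"

packed_a = xor_encode(original, 0x41)  # data portion produced with key 0x41
packed_b = xor_encode(original, 0x7F)  # same code, different key

# Number of positions at which the two encoded data portions agree.
matching_positions = sum(x == y for x, y in zip(packed_a, packed_b))
restored = xor_encode(packed_a, 0x41)
```

Because the two keys differ, `matching_positions` is zero, so a byte pattern taken from one encoded data portion would not match the other.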

Next, the division unit 131 divides the execution trace into blocks and creates a set of instruction sequences (hereinafter, code blocks). Possible block units are basic blocks, fixed numbers of instructions (n-grams), functions, or n-grams of basic blocks, of fixed-length instruction groups, or of functions. A hash value is computed for each block. When computing the hash value, taking x86 as an example, either only the opcode portion of each instruction may be used, or the operand portion may be included as well. The hash values serve to simplify the comparison of code block sets described later; if computation cost is not a concern, computing hash values is not strictly necessary.
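The fixed-instruction-count (n-gram) division and the hashing step might look like the sketch below. It assumes a trace given as a list of `(opcode, operands)` pairs, a sliding window as the n-gram definition, and SHA-256 as the hash; none of these choices is specified in the text, and the function names are hypothetical.

```python
import hashlib

def split_ngrams(trace, n):
    """Slide a window of n consecutive instructions over the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def block_hash(block, use_operands=False):
    """Hash a code block from its opcodes, optionally including operands."""
    parts = [f"{op} {args}" if use_operands else op for op, args in block]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

def code_block_set(trace, n=3, use_operands=False):
    """Set of block hashes for one execution trace."""
    return {block_hash(b, use_operands) for b in split_ngrams(trace, n)}

trace = [
    ("push", "ebp"),
    ("mov", "ebp, esp"),
    ("sub", "esp, 16"),
    ("xor", "eax, eax"),
]
blocks = code_block_set(trace, n=3)
```

Hashing opcodes only makes two blocks compare equal even when registers or immediates differ; including operands gives a stricter match.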

Next, the generation unit 132 extracts, from the blocks divided by the division unit 131, the code blocks that appear in common among executable files packed with the same packer, and generates a signature from the extracted code blocks. For example, the generation unit 132 compares the sets of code block hash values of the execution traces of executable files 1 to n packed with packer 1, and obtains the hash values that appear in common. In other words, it searches for the instruction code that is common to all of the execution traces.

The process of obtaining the commonly appearing code blocks is defined by Equation (1). That is, letting B_{q,i} be the set of blocks in the execution trace of the i-th (0 ≤ i ≤ n) executable file packed with packer q, the blocks that appear in common across the n executable files are defined by Equation (1). As illustrated in FIG. 2, the generation unit 132 extracts, for every packer 1 to m, the code blocks that appear in common in the execution traces of the unpacking code of the packed executable files.
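Equation (1) itself does not survive in this text. Read from the prose alone, the blocks common to packer q are presumably the intersection of the sets B_{q,i} over all executables packed with q. The sketch below illustrates that reading with Python sets; the function name `common_blocks` and the sample block labels are hypothetical.

```python
def common_blocks(block_sets):
    """Blocks present in every trace of one packer (intersection of all sets)."""
    sets = [set(s) for s in block_sets]
    return set.intersection(*sets) if sets else set()

# Block hashes (shortened to labels) for three executables packed by one packer.
b_q = [
    {"stub_a", "stub_b", "loop_1"},
    {"stub_a", "stub_b", "loop_2"},
    {"stub_b", "stub_a"},
]
c_q = common_blocks(b_q)
```

Here `c_q` contains only `stub_a` and `stub_b`, the blocks present in all three traces.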

The processing above describes the case where the generation unit 132 generates a signature from code blocks that appear in common among executable files packed with the same packer. As described with Equations (2) to (5), the signature may instead be generated from code blocks that appear in common among executable files packed with the same packer and that do not appear in executable files packed with other packers.

For example, once the generation unit 132 has computed, for every packer 1 to m, the code blocks that appear in common in the execution traces of the unpacking code of the packed executable files, it next obtains, for a given packer, the set of code blocks that appear in common for that packer but do not appear among the common code blocks of any other packer, and uses this set as the signature for that packer. This is defined by Equation (2), where P is the set of packers and S_{q_j} is the signature of packer q_j.

The generation unit 132 may also obtain the signature S_{q_j} by removing, from the code blocks common to a given packer, the code blocks that appear in the common code blocks of all of the other packers. This is defined by Equation (3).

Alternatively, the generation unit 132 may use as the signature S_{q_j} the set of code blocks obtained by removing, from the code blocks common to a given packer, every block that appears even once in a code block of any execution trace of any other packer. This is defined by Equation (4).

Furthermore, the generation unit 132 may create the signature S_{q_j} by removing, from the code blocks common to a given packer, the code blocks that appear for every packer, even if they do not appear in all of the executable files within each packer. This is defined by Equation (5).
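Equations (2) to (5) are not reproduced in this text. The sketch below shows one reading of the four variants as set operations, inferred only from the prose; the function `signature`, its `variant` parameter, and the sample data are hypothetical.

```python
def intersect_all(sets):
    """Intersection of all given sets (empty set if none are given)."""
    sets = [set(s) for s in sets]
    return set.intersection(*sets) if sets else set()

def union_all(sets):
    """Union of all given sets."""
    return set().union(*sets)

def signature(traces, q, variant):
    """Signature of packer q; `traces` maps packer -> list of block sets."""
    common = {p: intersect_all(b) for p, b in traces.items()}  # in every trace
    seen = {p: union_all(b) for p, b in traces.items()}        # in any trace
    others = [p for p in traces if p != q]
    if variant == 2:    # drop blocks common to any other packer
        excluded = union_all(common[p] for p in others)
    elif variant == 3:  # drop blocks common to all other packers
        excluded = intersect_all(common[p] for p in others)
    elif variant == 4:  # drop blocks seen in any trace of any other packer
        excluded = union_all(seen[p] for p in others)
    elif variant == 5:  # drop blocks seen at least once under every packer
        excluded = intersect_all(seen[p] for p in traces)
    else:
        raise ValueError("variant must be 2, 3, 4, or 5")
    return common[q] - excluded

traces = {
    "P1": [{"a", "b", "k", "x", "y"}, {"a", "b", "k", "x", "z"}],
    "P2": [{"c", "k", "x", "b"}, {"c", "k", "x", "y"}],
    "P3": [{"d", "x"}, {"d", "x", "e"}],
}
signatures_p1 = {v: signature(traces, "P1", v) for v in (2, 3, 4, 5)}
```

On this sample, variant 4 is the strictest (it keeps only `a`), variant 2 keeps `a` and `b`, and variants 3 and 5 additionally keep `k`, which is shared with P2 but not with P3.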

The signature generation unit 13a may also generate a signature from blocks that, among the blocks divided by the division unit, appear in common among executable files packed with the same packer and do not appear in executable files that have not been packed.

For example, as shown in FIG. 3, the signature generation unit 13a acquires not only the executable files packed by the packers but also the executable files before packing, from the executable file storage unit 14a. When generating the signature for a particular packer, the signature generation unit 13a may then remove from the set of code blocks common to that packer not only the code blocks that appear for other packers but also the code blocks that appear in the unpacked executable files.

This removes from the packer's signature the code blocks that tend to appear in ordinary executable files, which is expected to yield a signature more specific to the packer.
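The exclusion of blocks from unpacked executables can be sketched as one more set difference. This is a minimal illustration; the function names and the block labels (such as `crt_init` for runtime start-up code shared by ordinary programs) are hypothetical.

```python
def common_blocks(block_sets):
    """Blocks present in every given trace."""
    sets = [set(s) for s in block_sets]
    return set.intersection(*sets) if sets else set()

def packer_signature(own_traces, other_packer_traces, original_traces):
    """Common blocks of one packer, minus blocks seen elsewhere.

    own_traces:          block sets of executables packed by the target packer
    other_packer_traces: block sets of executables packed by other packers
    original_traces:     block sets of the executables before packing
    """
    excluded = set()
    for blocks in list(other_packer_traces) + list(original_traces):
        excluded |= set(blocks)
    return common_blocks(own_traces) - excluded

sig = packer_signature(
    own_traces=[{"stub_1", "stub_2", "crt_init"},
                {"stub_1", "stub_2", "crt_init", "extra"}],
    other_packer_traces=[{"stub_9"}],
    original_traces=[{"crt_init", "main"}],
)
```

In this sample `crt_init` is dropped because it also occurs in an unpacked executable, leaving `stub_1` and `stub_2`.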

As shown in FIG. 4, the obfuscation tool identification unit 13b uses the signatures generated by the signature generation unit 13a to identify the packer used on an input executable file under inspection, and outputs the result. Specifically, on receiving the executable file under inspection via the input unit 11, the obfuscation tool identification unit 13b runs it and acquires an execution trace. Because the executable file under inspection is actually run, the unpacking code of the second and later stages also appears in the trace, and this part is likewise compared with the signatures and used in packer identification. The unpacking code of the second and later stages thus contributes to identification, which enables highly accurate packer identification.

The obfuscation tool identification unit 13b then divides the execution trace into blocks and creates a set of code blocks. Next, it compares the created code blocks with each signature S_j (0 ≤ j ≤ m) stored in the signature storage unit 14c and calculates a score.

The score is calculated as follows. The obfuscation tool identification unit 13b acquires the execution trace of the input executable file under inspection in the same way as the signature generation unit 13a, compares the set T of code blocks obtained from this execution trace with each signature created by the signature generation unit 13a, and calculates the score for each signature with Equation (6). This equation shows how many of the code blocks in each signature are contained in the execution trace of the executable file under inspection, that is, how well the signature matches.

This score is calculated for all packers 1 to m, and the packer with the maximum score (see Equation (7)) is taken to be the packer that packed the executable file. The obfuscation tool identification unit 13b then outputs, from the output unit 12, the packer corresponding to the signature with the maximum score as the packer that packed the executable file under inspection. Instead of taking the maximum score, a packer may be identified as the one that packed the executable file when its score is at or above a given threshold.
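Equations (6) and (7) are not reproduced in this text. The sketch below assumes the score is the fraction of a signature's code blocks that are found in T, and that the identified packer is the one with the highest score; the exact normalization is an assumption, as are the function names and the optional `threshold` parameter.

```python
def score(trace_blocks, signature):
    """Fraction of the signature's blocks found in the inspected trace."""
    if not signature:
        return 0.0
    return len(set(trace_blocks) & set(signature)) / len(signature)

def identify_packer(trace_blocks, signatures, threshold=None):
    """Return (best packer or None, all scores).

    signatures maps packer name -> set of signature blocks. If `threshold`
    is given, the best packer is reported only when its score reaches it.
    """
    scores = {name: score(trace_blocks, sig) for name, sig in signatures.items()}
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None, scores
    return best, scores

signatures = {"P1": {"a", "b"}, "P2": {"c", "d", "e"}}
t = {"a", "b", "c", "z"}
best, scores = identify_packer(t, signatures)
```

For this sample, both blocks of the P1 signature occur in `t` while only one of three P2 blocks does, so `best` is `"P1"`.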

Packed files are difficult to disassemble accurately. Taking x86 code as an example, the architecture allows data to be placed inside code regions, for reasons such as code optimization and better cache efficiency. Packers often abuse this property with anti-analysis features that make disassembly difficult, for example by embedding data in the middle of code blocks. As a result, it is hard to locate the unpacking code of a packed executable file automatically, and manual intervention is often needed to obtain an accurate disassembly. In the analysis apparatus 10, by contrast, the execution trace is based on the instruction code that the CPU actually executed, so its contents are unambiguously code, and the problem of being unable to tell data from code during disassembly does not arise.

[Processing by the analysis apparatus]
Next, the processing performed by the analysis apparatus 10 according to the first embodiment is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart illustrating the procedure of the signature generation process of the analysis apparatus according to the first embodiment. FIG. 6 is a flowchart illustrating the procedure of the packer identification process of the analysis apparatus according to the first embodiment.

As shown in FIG. 5, when the signature generation unit 13a of the analysis apparatus 10 receives a signature generation instruction from the input unit 11 (Yes in step S101), it acquires the packed executable files stored in the packed executable file storage unit 14b and runs them (step S102).

The signature generation unit 13a then acquires the execution trace of the unpacking code of each packed executable file (step S103). Next, it divides the execution trace into blocks and generates a set of code blocks (step S104).

The signature generation unit 13a then compares the sets of code blocks of the execution traces with one another (step S105), extracts the set of code blocks that appear in common, and generates a signature (step S106). Specifically, the signature generation unit 13a obtains the set of code blocks that appear in common for a given packer but do not appear among the common code blocks of the other packers, generates this set as the signature for that packer, stores it in the signature storage unit 14c, and ends the process.

Next, the packer identification process is described with reference to FIG. 6. As shown in FIG. 6, when the obfuscation tool identification unit 13b of the analysis apparatus 10 receives an executable file to be inspected via the input unit 11 (step S201), it acquires the execution trace of that file (step S202).

The obfuscation tool identification unit 13b then divides the execution trace into blocks and creates a set of code blocks (step S203). Next, it compares the created code blocks with each signature stored in the signature storage unit 14c and calculates a score (step S204).

Specifically, the obfuscation tool identification unit 13b acquires the execution trace of the input executable file to be inspected in the same manner as the signature generation unit 13a, compares the set T of code blocks obtained from this execution trace with each signature created by the signature generation unit 13a, and calculates a score for each signature. Then, the obfuscation tool identification unit 13b outputs, from the output unit 12, the packer corresponding to the signature with the maximum score as the packer used for the executable file to be inspected (step S205).
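
Steps S204 and S205 can be sketched as follows. The scoring formula is not given in this passage, so the sketch assumes, purely for illustration, that the score of a signature is the fraction of its code blocks that also appear in the set T. The names `score` and `identify_packer` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Optional, Tuple

BlockSet = FrozenSet[Hashable]


def score(trace_blocks: BlockSet, signature: BlockSet) -> float:
    """Assumed score: share of the signature's blocks found in the inspected trace."""
    if not signature:
        return 0.0
    return len(trace_blocks & signature) / len(signature)


def identify_packer(
    trace_blocks: BlockSet, signatures: Dict[str, BlockSet]
) -> Tuple[Optional[str], float]:
    """Return the packer whose signature scores highest, with that score."""
    best_packer: Optional[str] = None
    best_score = 0.0
    for packer, signature in signatures.items():
        s = score(trace_blocks, signature)
        if s > best_score:
            best_packer, best_score = packer, s
    return best_packer, best_score
```

Returning `None` when every score is zero is a design choice of this sketch; it lets a caller distinguish "no signature matched" from a weak match.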

[Effects of Embodiment 1]
As described above, the analysis apparatus 10 runs each of the executable files obtained by packing a plurality of executable files with each of a plurality of packers, and divides the instruction code obtained as a result of running each executable file into a plurality of blocks. The analysis apparatus 10 then extracts, from the divided blocks, the code blocks that appear in common among executable files packed with the same packer, and generates a signature from the extracted code blocks. The analysis apparatus 10 then uses the generated signature to identify the packer used to pack the executable file to be inspected. Because a highly accurate signature can be generated in this way, the packer identification accuracy can be improved.

Further, according to Embodiment 1, a signature is generated from those divided code blocks that appear in common among executable files packed with the same packer and do not appear in executable files packed with other packers. Because a signature more specific to the packer can be generated in this way, the packer identification accuracy can be improved.

Further, according to Embodiment 1, information about the features of code blocks is generated from those divided code blocks that appear in common among executable files packed with the same packer and do not appear in executable files that are not packed. Because a signature more specific to the packer can be generated in this way, the packer identification accuracy can be improved.
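
The exclusion of code blocks that also appear in unpacked executable files can be sketched as a set difference. This is an illustrative sketch that assumes the candidate signature and the unpacked traces are already available as sets of hashable code blocks; the name `remove_unpacked_blocks` is hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable

BlockSet = FrozenSet[Hashable]


def remove_unpacked_blocks(
    candidate: BlockSet, unpacked_block_sets: Iterable[BlockSet]
) -> BlockSet:
    """Drop candidate blocks that appear in any trace of an unpacked executable."""
    seen_unpacked = frozenset().union(*unpacked_block_sets)
    return candidate - seen_unpacked
```

Blocks that any ordinary program executes, such as compiler-generated start-up code, are removed by this step, so they cannot be mistaken for features of the packer.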

Further, according to Embodiment 1, the executable file to be inspected is run, the degree of matching between the instruction code obtained as a result of running that file and each generated signature is calculated, and the packer used to pack the executable file to be inspected is identified according to the calculated result. The packer identification accuracy can therefore be improved.

[Analysis Program]
FIG. 7 is a diagram illustrating that the processing by the analysis program is concretely realized using a computer. As illustrated in FIG. 7, the computer 1000 includes, for example, a memory 1001, a CPU 1002, a hard disk drive interface 1003, a disk drive interface 1004, a serial port interface 1005, a video adapter 1006, and a network interface 1007, and these units are connected by a bus 1008.

As illustrated in FIG. 7, the memory 1001 includes a ROM (Read Only Memory) 1001a and a RAM (Random Access Memory) 1001b. The ROM 1001a stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1003 is connected to a hard disk drive 1009. The disk drive interface 1004 is connected to a disk drive 1010, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. The serial port interface 1005 is connected to, for example, a mouse 1011 and a keyboard 1012. The video adapter 1006 is connected to, for example, a display 1013.

Here, as illustrated in FIG. 7, the hard disk drive 1009 stores, for example, an OS 1009a, an application program 1009b, a program module 1009c, and program data 1009d. That is, the analysis program is stored, for example in the hard disk drive 1009, as the program module 1009c in which the instructions to be executed by the computer 1000 are described. Specifically, the program module 1009c stored in the hard disk drive 1009 describes a signature generation procedure that performs the same processing as the signature generation unit 13a described in the above embodiment and an obfuscation tool identification procedure that performs the same processing as the obfuscation tool identification unit 13b. Data used in the processing by the analysis program is stored as the program data 1009d, for example in the hard disk drive 1009. The CPU 1002 then reads the program module 1009c and the program data 1009d stored in the hard disk drive 1009 into the RAM 1001b as necessary and executes the signature generation procedure and the obfuscation tool identification procedure.

Note that the program module 1009c and the program data 1009d of the analysis program are not limited to being stored in the hard disk drive 1009. For example, they may be stored in a removable storage medium and read by the CPU 1002 via the disk drive 1010 or the like. Alternatively, they may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)) and read by the CPU 1002 via the network interface 1007.

DESCRIPTION OF SYMBOLS
10 Analysis apparatus
11 Input unit
12 Output unit
13 Control unit
13a Signature generation unit
13b Obfuscation tool identification unit
14 Storage unit
14a Executable file storage unit
14b Packed executable file storage unit
14c Signature storage unit

Claims

An analysis apparatus comprising:
a division unit that runs each of the executable files obtained by obfuscating a plurality of executable files with each of a plurality of obfuscation tools, and divides the instruction code obtained as a result of running each executable file into a plurality of blocks;
a generation unit that extracts, from the blocks divided by the division unit, blocks that appear in common among executable files obfuscated by the same obfuscation tool, and generates information about the features of the extracted blocks; and
a specifying unit that specifies, using the information about the features of the blocks generated by the generation unit, the obfuscation tool used to obfuscate an executable file to be inspected.

The analysis apparatus according to claim 1, wherein the generation unit generates information about the features of blocks that, among the blocks divided by the division unit, appear in common among executable files obfuscated by the same obfuscation tool and do not appear in executable files obfuscated by other obfuscation tools.

The analysis apparatus according to claim 1, wherein the generation unit generates information about the features of blocks that, among the blocks divided by the division unit, appear in common among executable files obfuscated by the same obfuscation tool and do not appear in executable files that are not obfuscated.

The analysis apparatus according to claim 1, wherein the specifying unit runs the executable file to be inspected, calculates the degree of matching between the instruction code obtained as a result of running that executable file and the information about the features generated by the generation unit, and specifies, according to the calculated result, the obfuscation tool used to obfuscate the executable file to be inspected.

An analysis method comprising:
a division step of running each of the executable files obtained by obfuscating a plurality of executable files with each of a plurality of obfuscation tools, and dividing the instruction code obtained as a result of running each executable file into a plurality of blocks;
a generation step of extracting, from the blocks divided in the division step, blocks that appear in common among executable files obfuscated by the same obfuscation tool, and generating information about the features of the extracted blocks; and
a specifying step of specifying, using the information about the features of the blocks generated in the generation step, the obfuscation tool used to obfuscate an executable file to be inspected.

An analysis program for causing a computer to execute:
a division step of running each of the executable files obtained by obfuscating a plurality of executable files with each of a plurality of obfuscation tools, and dividing the instruction code obtained as a result of running each executable file into a plurality of blocks;
a generation step of extracting, from the blocks divided in the division step, blocks that appear in common among executable files obfuscated by the same obfuscation tool, and generating information about the features of the extracted blocks; and
a specifying step of specifying, using the information about the features of the blocks generated in the generation step, the obfuscation tool used to obfuscate an executable file to be inspected.