JP5839967B2

JP5839967B2 - Malware analysis system

Info

Publication number: JP5839967B2
Application number: JP2011263186A
Authority: JP
Inventors: 河内 清人
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2011-12-01
Filing date: 2011-12-01
Publication date: 2016-01-06
Anticipated expiration: 2031-12-01
Also published as: JP2013114637A

Description

The present invention relates to a technique for identifying the encryption key used by malware, which is needed to determine what information the malware leaked to the outside, when an information processing terminal connected to the Internet is found to be infected with malware.

As a conventional technique for analyzing malware, Patent Document 1, for example, discloses a method in which malware is executed in a special experimental environment and its system calls, its usage history of computer resources, and the messages it sends to remote terminals are recorded.

Patent Document 1: JP 2009-181335 A

In recent years, large-scale information leakage incidents caused by malware have attracted attention. When a malware infection is discovered, the information that the malware leaked to the outside must be identified. To do so, communication between the Internet and the information processing terminals on the organization's internal network must be recorded routinely, and when a malware infection is discovered on a terminal, the communication information uploaded from that terminal must be extracted from the communication record and analyzed.
Such malware contains encryption functions such as RC4 or AES and encrypts the information it uploads before sending it. Unless the malware is analyzed and its encryption key identified, the content of the information that was actually leaked cannot be determined.

With the conventional technique, however, although the behavior of malware can be recorded in an experimental environment, there is no method for extracting the encryption key needed for decryption from that record. As a result, the communication record could not be decrypted and the leaked information could not be identified.
The present invention has been made to solve this problem. Its object is to identify the encryption key from among the various data that malware writes to memory by analyzing the malware's memory operations in detail.

A malware analysis system according to the present invention is
a malware analysis system that identifies, for an information processing terminal connected to the Internet and infected with malware, the encryption key of the malware needed to identify information leaked to the outside, the system comprising:
communication record storage means in which communication information between the information processing terminal and the Internet is recorded;
a malware execution device having malware execution means that stores the infecting malware and executes its program, and execution trace recording means that records execution trace information each time the malware executes a machine language instruction on the malware execution means;
a message transmission device that, in response to a connection request from the malware, selects a message from the past communication records in the communication record storage means and returns it to the malware; and
an execution trace analysis device having key specifying means that scans the execution trace information from the execution trace recording means, locates the call of the network reception function and its call arguments, obtains from those arguments the address of the reception buffer and the reception buffer length, and stores them in the variables Pmsg and Lmsg,
that treats data referenced in data calculations during execution of the malware program as dependent data and updates, following the execution of the malware program, data dependency history information recording the correspondence between the dependent data and the malware execution trace information,
that adds to the dependent data information, up to the last execution trace line of the malware program, each group of data on which a bit operation or arithmetic operation was performed with data in the message received by the malware,
and that groups the dependent data information to which data has been added according to a predetermined criterion and extracts and outputs a group of key storage buffer candidates,
and decryption result determination means that, when the key specifying means extracts a plurality of encryption key candidates, decrypts data with each extracted candidate, compares the data before and after decryption, and selects the encryption key with which decryption succeeded.

According to the malware analysis system of the present invention, the malware under analysis is made to decrypt messages accumulated in the past while execution trace information is recorded, and the key is identified by analyzing that execution trace information. This has the effect that information uploaded in the past can be decrypted and its content extracted.
Furthermore, when key candidates are identified in the key storage buffer candidate group extraction procedure, addresses are grouped according to whether the machine language instructions that referenced them have nearby addresses. This prevents data other than key data from being extracted as part of the key data even when such data lies next to the key in contiguous memory.
Furthermore, when key candidates are extracted, the analysis is narrowed to the addresses of data on which bit operations or arithmetic operations are performed with data originating from the reception buffer, which reduces the number of key candidates extracted.

FIG. 1 is a block diagram showing a malware analysis system according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram of the execution trace information recorded by the execution trace recording means.
FIG. 3 is an explanatory diagram of the information recorded in the communication record storage means.
FIG. 4 is an explanatory diagram of the data dependency history information.
FIG. 5 is a flowchart outlining the processing of the key specifying means.
FIG. 6 is a flowchart of the data dependency history information update procedure up to the point where the union of dependent data information is stored in the variable D.
FIG. 7 is a flowchart of the data dependency history information update procedure from the point where the generated copy of lastFlow is stored in the variable newFlow.
FIG. 8 is a flowchart of the data dependency confirmation procedure.
FIG. 9 is a flowchart of the key storage buffer candidate group extraction procedure.
FIG. 10 is a flowchart of the dependent address extraction procedure.
FIG. 11 is a block diagram showing a malware analysis system according to Embodiment 2 of the present invention.

Embodiment 1.
FIG. 1 is a block diagram of the devices that make up a malware analysis system according to the present invention.
The system consists of a message transmission device 101, a malware execution device 104, an execution trace analysis device 108, and communication record storage means 116. The message transmission device 101 and the malware execution device 104 are connected by a network 118. The other devices, and the communication record storage means 116, may be connected in any manner that allows information to be input and output.

Next, each device constituting the malware analysis system is described.
The message transmission device 101 generates and transmits messages to be input to the analysis target malware 105 running on the malware execution device 104, and consists of message transmission means 102 and message generation means 103. The message transmission means 102 transmits the messages generated by the message generation means 103 to the malware execution device 104 over the network 118. The message generation means 103 generates the messages to be sent to the malware execution device 104 from the communication records stored in the communication record storage means 116.

The malware execution device 104 executes the analysis target malware 105 and records an execution trace, which is its operation history. It consists of malware execution means 106 and execution trace recording means 107. The malware execution means 106 is an emulator that executes the analysis target malware 105 one machine language instruction at a time. The execution trace recording means 107 records an execution trace entry each time the analysis target malware 105 executes one machine language instruction on the malware execution means 106.

The execution trace analysis device 108 analyzes the execution trace recorded by the execution trace recording means 107, identifies the encryption key used by the analysis target malware 105, and then decrypts the upstream communication information from the analysis target malware 105 recorded in the communication record storage means 116. It consists of communication record input means 109, decryption function specifying means 110, execution trace input means 111, key specifying means 112, communication record decryption means 113, analysis result output means 114, and decryption result determination means 115.

The communication record input means 109 acquires the communication record information to be analyzed from the communication record storage means 116. The execution trace input means 111 receives, from the execution trace recording means 107 of the malware execution device 104, the execution trace information of the analysis target malware 105 that the malware execution means 106 executed one machine language instruction at a time. The decryption function specifying means 110 identifies, from the execution trace information received by the execution trace input means 111, the address of the decryption function in the analysis target malware 105 and the algorithm that function uses. The key specifying means 112 identifies, from the received execution trace information, the encryption key that the analysis target malware 105 used for decryption. The communication record decryption means 113 decrypts the upstream communication information from the analysis target malware 105, which the communication record input means 109 acquired from the communication record storage means 116, using the encryption key identified by the key specifying means 112 and the decryption algorithm identified by the decryption function specifying means 110. The decryption result determination means 115 compares the result produced by the communication record decryption means 113 with the communication record before decryption and determines whether decryption succeeded. The analysis result output means 114 outputs the analysis result to the user 117.
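The interplay of the communication record decryption means 113 and the decryption result determination means 115 can be sketched as follows, assuming the identified algorithm is RC4 (one of the ciphers named in the background). The text does not state how success is judged, so the printable-byte heuristic, the threshold, and the names `rc4`, `printable_ratio`, and `pick_key` are all illustrative assumptions, not the patented method.

```python
import string
from typing import Iterable, Optional


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same routine encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (assumed success measure)."""
    if not data:
        return 0.0
    printable = set(string.printable.encode())
    return sum(b in printable for b in data) / len(data)


def pick_key(candidates: Iterable[bytes], ciphertext: bytes,
             threshold: float = 0.95) -> Optional[bytes]:
    """Try each candidate and compare the data before and after decryption."""
    before = printable_ratio(ciphertext)
    for key in candidates:
        after = printable_ratio(rc4(key, ciphertext))
        if after >= threshold and after > before:
            return key
    return None


# Hypothetical upstream data and candidate keys.
secret = b"hypothetical-key"
upload = rc4(secret, b"POST /upload HTTP/1.1\r\nuser=alice&file=report.txt")
chosen = pick_key([b"wrong-key-000000", secret], upload)
```

A text-oriented measure suits HTTP payloads; binary uploads would need a different criterion, such as a known file header.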

Next, the operation is described.
In this system, HTTP (Hypertext Transfer Protocol) communication information between each terminal on the organization's internal network and the Internet is recorded in the communication record storage means 116. FIG. 3 shows the communication information recorded. As shown in FIG. 3, the communication record storage means 116 records, for each request sent from a source, the source IP address (src ip 301), the destination URL 302, the upstream data 303, and the downstream data 304.
Suppose that a malware program is found on an information processing terminal on the internal network. The user 117 stores the malware program in the malware execution device 104 as the analysis target malware 105, enters the IP address of the terminal on which the infection was confirmed into memory (not shown) of the message transmission device 101 and of the execution trace analysis device 108 as a run-time parameter, and then starts the analysis.

The program of the analysis target malware 105 is started by the malware execution means 106, which single-steps the analysis target malware 105 one machine language instruction at a time. Such means can be built from existing technology, for example the single-step function of a debugger or a CPU emulator such as QEMU, described in Non-Patent Document 1 [http://wiki.qemu.org/Main_Page], so the details of its implementation are omitted.

Each time the malware execution means 106 executes one machine language instruction of the analysis target malware 105, the execution trace recording means 107 records it as execution trace information. As shown in FIG. 2, the recorded execution trace information consists of the execution address 201, the executed instruction 202, the memory addresses referenced and the values stored there 203, the memory addresses written and the values written 204, and the register values immediately before execution 205.
When the program of the analysis target malware 105 starts on the malware execution means 106, it attempts to connect through the network 118 to a command server on the Internet. Because the message transmission device 101 is configured to appear to the analysis target malware 105 as an HTTP proxy server, the program sends its HTTP request to the message transmission device 101.
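The execution trace record of FIG. 2 might be modeled as below. This is a minimal sketch: the class names, field names, and the sample instruction are assumptions chosen for illustration, and only the five fields 201 to 205 come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TraceEntry:
    """One execution trace line, mirroring the fields of FIG. 2."""
    exec_addr: int                                                    # execution address 201
    instruction: str                                                  # executed instruction 202
    mem_reads: List[Tuple[int, int]] = field(default_factory=list)    # 203: (address, value)
    mem_writes: List[Tuple[int, int]] = field(default_factory=list)   # 204: (address, value)
    registers: Dict[str, int] = field(default_factory=dict)           # 205: values before execution


class TraceRecorder:
    """Appends one entry per executed instruction; the list index is the line number."""

    def __init__(self) -> None:
        self.entries: List[TraceEntry] = []

    def record(self, entry: TraceEntry) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1


recorder = TraceRecorder()
line = recorder.record(TraceEntry(
    exec_addr=0x401000,
    instruction="xor eax, [ebx]",
    mem_reads=[(0x500000, 0x41)],
    registers={"eax": 0x12, "ebx": 0x500000},
))
```

Keeping the register values from just before execution lets a later analysis recompute operand addresses without re-running the malware.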

In the message transmission device 101 that receives the HTTP request, the message generation means 103 searches the communication record storage means 116 using as keys the destination URL contained in the HTTP request from the analysis target malware 105 and the IP address of the infected terminal that the user 117 entered beforehand. It selects one of the matching communication records, extracts its downstream data 304, and passes it to the message transmission means 102. The message transmission means 102 returns the message received from the message generation means 103 to the analysis target malware 105 through the network 118.
After several such exchanges, the malware execution means 106 stops executing the analysis target malware 105 and sends the execution trace information recorded so far by the execution trace recording means 107 to the execution trace analysis device 108.
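The lookup performed by the message generation means 103 can be sketched as a filter over stored records. The record fields follow FIG. 3; the names `CommRecord` and `select_downstream`, the sample addresses, and the choice of the first matching record are assumptions, since the text only says that one matching record is selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommRecord:
    src_ip: str        # src ip 301
    dest_url: str      # destination URL 302
    upstream: bytes    # upstream data 303
    downstream: bytes  # downstream data 304


def select_downstream(records: List[CommRecord], infected_ip: str,
                      url: str) -> Optional[bytes]:
    """Return the downstream data of one record matching the infected IP and URL.

    Picking the first match is an assumption made for this sketch.
    """
    for rec in records:
        if rec.src_ip == infected_ip and rec.dest_url == url:
            return rec.downstream
    return None


# Hypothetical stored records.
records = [
    CommRecord("10.0.0.7", "http://c2.example/cmd", b"\x01\x02", b"\xaa\xbb"),
    CommRecord("10.0.0.9", "http://c2.example/cmd", b"\x03", b"\xcc"),
]
reply = select_downstream(records, "10.0.0.9", "http://c2.example/cmd")
```

Replaying a response that the real command server once sent is what causes the malware to run its own decryption routine under the tracer.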

In the execution trace analysis device 108, the execution trace input means 111 receives the execution trace information that was recorded by the execution trace recording means 107 and sent from the malware execution device 104. This information is first input to the decryption function specifying means 110, which identifies from it the address of the decryption function and the name of the decryption algorithm the function uses. This can be realized with the techniques described in, for example, Non-Patent Document 2 [Z. Wang, X. Jiang, W. Cui, and X. Wang. ReFormat: Automatic reverse engineering of encrypted messages. In European Symposium on Research in Computer Security, Saint-Malo, France, September 2009.] and Non-Patent Document 3 [Felix Grobert, Carsten Willems, and Thorsten Holz. Automated Identification of Cryptographic Primitives in Binary Programs. 14th International Symposium on Recent Advances in Intrusion Detection (RAID)].

After the decryption function specifying means 110 has identified the address of the decryption function and the name of the decryption algorithm in use, the execution trace information is input to the key specifying means 112, which identifies the encryption key used for decryption. The operation of the key specifying means 112 is described in detail with reference to FIGS. 4 to 10.
First, an outline of the processing of the key specifying means 112 is given with reference to FIG. 5. The key specifying means 112 scans the input execution trace information and skips forward to the call of the network reception function (S501). The network reception function is provided as standard by the OS (operating system), so the arguments required to call it are known. Of the arguments passed to the network reception function, the address of the reception buffer and the reception buffer length are therefore obtained from the execution trace information of the stack manipulation instructions, such as PUSH, recorded immediately before the function call, and are stored in the variables Pmsg and Lmsg in the memory of the key specifying means 112 (S502).
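Steps S501 and S502 can be sketched as a backward walk over the PUSH lines that precede the call. The sketch assumes a receive function with the signature recv(socket, buf, len, flags) whose arguments are pushed right to left; the simplified trace line format and the name `find_recv_buffer` are also assumptions.

```python
from typing import List, Optional, Tuple

# Simplified trace line: (instruction text, value written to memory, or None).
TraceLine = Tuple[str, Optional[int]]


def find_recv_buffer(trace: List[TraceLine],
                     recv_call: str = "call recv") -> Tuple[int, int, int]:
    """Return (call_line, Pmsg, Lmsg) for the first receive-function call."""
    for i, (insn, _) in enumerate(trace):
        if insn != recv_call:
            continue                          # S501: skip up to the call
        pushed: List[int] = []
        j = i - 1
        while j >= 0 and len(pushed) < 4:     # walk back over the PUSH lines
            text, written = trace[j]
            if text.startswith("push") and written is not None:
                pushed.append(written)
            j -= 1
        if len(pushed) < 4:
            raise ValueError("could not recover all four arguments")
        # Nearest push first: socket, buffer address, length, flags.
        _sock, p_msg, l_msg, _flags = pushed
        return i, p_msg, l_msg                # S502
    raise ValueError("no receive call in trace")


# Hypothetical trace fragment.
trace = [
    ("mov ebp, esp", None),
    ("push 0", 0),               # flags
    ("push 0x400", 0x400),       # len
    ("push eax", 0x0012F000),    # buf
    ("push esi", 0x54),          # socket
    ("call recv", 0x401234),     # the call writes the return address
]
call_line, Pmsg, Lmsg = find_recv_buffer(trace)
```

Reading the pushed value from the recorded memory write avoids having to evaluate the operand, which may be a register as in `push eax`.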

Next, the execution trace information is skipped to the address following execution of the network reception function, and the execution trace line at that point is stored in the variable cur of the key specifying means 112 (S503). The variables P and Flows in the memory of the key specifying means 112 are also initialized (S504, S505).
The data stored in P and Flows is described here, beginning with Flows. As analysis of the analysis target malware 105 proceeds, information called data dependency history information accumulates in Flows. Data dependency history information records, for the point at which a given instruction in the execution trace was executed, each register or memory location and the data on which the data written there depends. Here, the data on which a value depends means the data in memory that was referenced, either directly or indirectly through registers, when the value was computed.

Data dependency history information has the form shown in FIG. 4 and consists of an execution trace line number 401, an execution address 402, and dependency information 403. The dependency information 403 in turn consists of a data storage location 404 and dependent data information 405, and the dependent data information 405 consists of an address 406 and a valid execution trace line number 407.
The valid execution trace line number 407 is the execution trace line number indicating at which point in the execution trace the value stored at the address 406 listed in the dependent data information 405 was referenced, as of the execution of the instruction indicated by the execution trace line number 401. This information is needed to track which write produced the data when the same memory address has been written more than once. In FIG. 4, the character strings in parentheses after each information name are the abbreviations used for those names in the description of operation below.
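One possible in-memory shape for the data dependency history information of FIG. 4 is sketched below. The type aliases, the class name `HistoryEntry`, and the sample addresses are assumptions; the field roles follow reference numerals 401 to 407.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple, Union

DepData = Tuple[int, int]                # (address 406, valid execution trace line number 407)
Store = Union[str, int]                  # data storage location 404: register name or address
Flow = Dict[Store, FrozenSet[DepData]]   # dependency information 403


@dataclass
class HistoryEntry:
    line: int                                   # execution trace line number 401
    exec_addr: int                              # execution address 402
    flow: Flow = field(default_factory=dict)    # dependency information 403


# Address 0x500000 is read at trace line 10, overwritten, then read again at
# line 42. The line number keeps the two values apart although the address is equal.
first_read: DepData = (0x500000, 10)
second_read: DepData = (0x500000, 42)

history: List[HistoryEntry] = [
    HistoryEntry(line=10, exec_addr=0x401000, flow={"eax": frozenset({first_read})}),
    HistoryEntry(line=42, exec_addr=0x401020,
                 flow={"eax": frozenset({first_read}),
                       "ebx": frozenset({second_read})}),
]
```

Storing the pair rather than the bare address is what allows a reused buffer to be told apart from the key material that once occupied it.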

Next, the information stored in the variable P is described. P holds, in the form of dependent data information (405 in FIG. 4), the data on which the analysis target malware 105 performed bit operations or arithmetic operations with data in the message while decrypting a received message. Data combined with message data in this way may be the encryption key or data derived from it, such as an expanded key, so it is recorded as a candidate for the encryption key.
Ultimately, the purpose of the key specifying means 112 is to identify the encryption key by using Flows to trace back the data on which each item recorded in P depends.

Returning to FIG. 5, the operation is described further. The analysis runs in a loop until cur of the key specifying means 112 reaches the end of the decryption function (S506). Inside the loop, the data dependency history information update procedure of the key specifying means 112 is first called with the contents of the execution trace line that cur currently indicates, and the data dependency history information is updated (S507). Next, if the instruction executed at cur is a bit operation or arithmetic operation (S508), the data dependency confirmation procedure of the key specifying means 112 is called to check whether the registers or memory referenced by the operation contain data that depends on the received message (S509). If data depending on the received message was referenced (S510), the dependent data is added to P (S511). cur is then advanced to the next execution trace line, and processing returns to S506 at the top of the loop (S512).

When the determination in S506 is true, that is, when cur has reached the end of the decryption function, the key storage buffer candidate group extraction procedure of the key specifying means 112 is called in S513. It extracts, from the data contained in P, the memory areas that are candidates for the encryption key and assigns them to the variable K, the set of key candidate buffers. Finally, K is output as the result of the key specifying means (S514).
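The loop of S506 to S512 can be sketched as follows. The two procedures it calls are passed in as callables because their internals are described separately; the mnemonic set, the string form of trace lines, and the function names are assumptions, and the extraction of K in S513 is left out.

```python
from typing import Callable, Iterable, List, Set, Tuple

DepData = Tuple[int, int]   # (address, trace line at which the value was referenced)

# Assumed set of mnemonics treated as bit or arithmetic operations.
ARITH_BIT_OPS = {"add", "sub", "xor", "and", "or", "not", "shl", "shr", "rol", "ror"}


def collect_candidates(
    trace: List[str],
    start: int,
    end: int,
    update_history: Callable[[int, str], None],
    confirm_dependency: Callable[[int, str], Iterable[DepData]],
) -> Set[DepData]:
    """Walk the trace from start up to (not including) end and build P."""
    p: Set[DepData] = set()
    cur = start
    while cur < end:                                  # S506
        insn = trace[cur]
        update_history(cur, insn)                     # S507
        if insn.split()[0] in ARITH_BIT_OPS:          # S508
            found = set(confirm_dependency(cur, insn))    # S509
            if found:                                 # S510
                p |= found                            # S511
        cur += 1                                      # S512
    return p


# Toy stand-ins for the two procedures.
seen: List[int] = []


def update(cur: int, insn: str) -> None:
    seen.append(cur)


def confirm(cur: int, insn: str) -> Set[DepData]:
    # Pretend [ebx] holds data combined with the received message.
    return {(0x500000, cur)} if "[ebx]" in insn else set()


trace = ["mov eax, [esi]", "xor eax, [ebx]", "mov [edi], eax", "ret"]
P = collect_candidates(trace, 0, 3, update, confirm)
```

The history is updated for every line, while P only grows on bit or arithmetic operations, which is what keeps the candidate set small.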

Next, the data dependency history information update procedure of the key specifying means 112 is described with reference to FIGS. 6 and 7. The procedure receives as input one line of execution trace information, trace, of the analysis target malware 105 and the current data dependency history information F. It updates F according to the contents of trace and returns the updated F, which is then recorded.
First, variables are initialized in S601 to S604. In S601, the dependency information computed as the result of the line immediately before trace (that is, the line whose execution trace line number is one less than that of trace) is taken from Flows and stored in the variable lastFlow. Next, the instruction recorded in trace is parsed, and its source operand and destination operand are stored in the variables Op_s and Op_d (S602, S603). Some instructions have no source operand, in which case a null value is assigned to Op_s. Finally, the set D of dependent data information (see 405 in FIG. 4) is initialized to the empty set (S604).

After initialization, the actual processing is performed. First, S605 determines whether Op_s is null or an immediate value. If so, processing skips to S701 in FIG. 7. If not, S606 determines whether Op_s refers to a register. If it does, lastFlow[ store = Op_s ] is assigned to D in S612. Here the notation xxx[ yyy = zzz ] means selecting from the dependency information xxx every entry whose field yyy equals zzz and taking out their dependent data information (405 in FIG. 4) as a set.

S606の判定が偽だった場合、Op_sはメモリ参照を表している。インテル社の80386(またはその後継)では、メモリ参照はベース、インデックス、スケール、ディスプレースメントの4パラメータを組み合わせてメモリアドレスが決定される。ベース、インデックスはレジスタ名を指定する。スケールは定数であり、ディスプレースメントはメモリアドレスで指定される。スケールを除いた3パラメータと、実際に参照されたメモリアドレスがS607〜S610で対応する変数(base, index, disp, addr)に格納される。
次に、S611において、base、index、dispに対し、対応する依存データ情報が取り出され、それらの和集合が依存データ情報の集合Dに格納される。さらに変数である集合Dには新たな依存データ情報として、address=addr、d_line=traceの行番号が追加される。 If the determination in S606 is false, Op_s represents a memory reference. In Intel's 80386 (or its successor), the memory reference is determined by combining four parameters: base, index, scale, and displacement. The base and index specify the register name. The scale is a constant, and the displacement is specified by a memory address. The three parameters excluding the scale and the actually referenced memory address are stored in the corresponding variables (base, index, disp, addr) in S607 to S610.
Next, in S611, the dependency data information corresponding to base, index, and disp is retrieved, and the union of these sets is stored in the set D of dependency data information. In addition, a new dependency data information entry with address = addr and d_line = the line number of trace is added to D.

The subsequent processing is described with reference to FIG. 7. In S701, a copy of lastFlow is created as new dependency information and stored in the variable newFlow. The processing then branches on the type of Op_d. If Op_d is a register (S702, yes branch), the register indicated by Op_d is assigned as-is to the variable dest (S707). Otherwise (S702, no branch), Op_d is a memory reference, so the memory address it refers to is computed and assigned to dest (S703).

Next, S704 checks whether the instruction in the execution trace information trace currently being analyzed is a mov instruction. If it is, the contents of the set D of dependency data information are assigned to newFlow[ store = dest ] (S708). This operation removes from newFlow the dependency data information for store = dest and puts the set D in its place.
If the determination in S704 is false, newFlow[ store = dest ] is updated with the union of the current newFlow[ store = dest ] and D (S705). Finally, in S706, an entry consisting of (line = line number of trace, exec = instruction address in trace, dependency information = newFlow) is added to F, the current data dependency history information F is returned, and the procedure ends.
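The update procedure of FIGS. 6 and 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the operand classes (Reg, Mem, Imm), the TraceEntry record, and the representation of dependency information as a dictionary from a store (register name or memory address) to a set of (address, d_line) pairs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Mem:
    base: Optional[str]   # base register name, if any
    index: Optional[str]  # index register name, if any
    disp: Optional[int]   # displacement, treated as a memory address
    addr: int             # address actually referenced in the trace


@dataclass(frozen=True)
class Imm:
    value: int


Operand = Union[Reg, Mem, Imm, None]


@dataclass(frozen=True)
class TraceEntry:
    line: int        # execution trace line number
    exec_addr: int   # address of the executed instruction
    mnemonic: str
    src: Operand     # Op_s (None when the instruction has no source)
    dst: Operand     # Op_d (Reg or Mem)


def source_deps(last_flow, trace):
    """Build the set D for the source operand (S605 to S612)."""
    src = trace.src
    if src is None or isinstance(src, Imm):       # S605
        return set()
    if isinstance(src, Reg):                      # S606 -> S612
        return set(last_flow.get(src.name, ()))
    deps = set()                                  # S607 to S611
    for key in (src.base, src.index, src.disp):
        if key is not None:
            deps |= last_flow.get(key, set())
    deps.add((src.addr, trace.line))              # address = addr, d_line = line
    return deps


def update_flow(last_flow, trace):
    """Return newFlow for one trace entry (S701 to S708)."""
    d = source_deps(last_flow, trace)
    new_flow = {k: set(v) for k, v in last_flow.items()}    # S701
    dst = trace.dst
    dest = dst.name if isinstance(dst, Reg) else dst.addr   # S707 / S703
    if trace.mnemonic == "mov":                             # S704 -> S708
        new_flow[dest] = d
    else:                                                   # S705
        new_flow[dest] = new_flow.get(dest, set()) | d
    return new_flow
```

Calling update_flow once per trace line and recording each result under its line number would yield the data dependency history information that the later procedures consult.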

Next, the details of the data dependency check procedure are described with reference to FIG. 8. This procedure determines whether the data given as an argument depends on the receive buffer.
In S801, the information given as arguments is stored in variables. In S802, the history entry with line = tr is obtained from FS and assigned to the current data dependency history information F, and F[ store = s ] is then assigned to the set D of dependency data information (S803). Next, in the loop S804 to S810, the processing of S805 to S809 is applied to each dependency data information entry in D. In S805 and S806, the entry's address (FIG. 4, 406) and d_line (FIG. 4, 407) are assigned to the variables m and i. If m points inside the receive buffer indicated by Pmsg (S807 yes), the procedure returns true and ends.
Otherwise, the procedure recursively checks whether the memory on which the data stored at m depends in turn depends on the receive buffer (S808). If the result is true (S809 yes), the procedure returns true and ends; if not, the same check is applied to the next entry.
Finally, if none of the entries in D depends on the receive buffer, the procedure returns FALSE and ends.
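The check of FIG. 8 amounts to a depth-first search over the recorded dependencies. The Python sketch below assumes that FS is a dictionary from execution trace line number to dependency information, which is itself a dictionary from store to a set of (address, d_line) pairs. The function name and the `_seen` guard against revisiting the same (store, line) pair are additions for the example and do not appear in the source.

```python
def depends_on_receive_buffer(s, tr, fs, p_msg, l_msg, _seen=None):
    """Return True if the data held in store s at trace line tr derives
    from the receive buffer [p_msg, p_msg + l_msg)."""
    seen = set() if _seen is None else _seen
    if (s, tr) in seen:               # guard added for the sketch
        return False
    seen.add((s, tr))

    flow = fs.get(tr, {})             # S802: F = FS[line = tr]
    d = flow.get(s, set())            # S803: D = F[store = s]
    for m, i in d:                    # S804 to S810
        if p_msg <= m < p_msg + l_msg:                                # S807
            return True
        if depends_on_receive_buffer(m, i, fs, p_msg, l_msg, seen):   # S808, S809
            return True
    return False
```

Returning False on a revisited pair is safe here because a pair already explored cannot contribute a new path to the receive buffer.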

Next, the details of the key storage buffer candidate group extraction procedure are described with reference to FIG. 9. This procedure traces the dependencies of the dependency data information given as an argument and returns the set of buffer regions regarded as key candidates.
First, as initialization, the information given as arguments is assigned to variables in S901, and the key candidate buffer set K is initialized to the empty set (S902).
Next, in S903, for each element (address, d_line) of the dependency data information set P, the execution address (exec) corresponding to the effective execution trace line number (d_line) is looked up in the set FS of data dependency history information, which records the correspondence between the dependency data and the malware execution trace information, and a three-element tuple (address, d_line, exec) is formed.

Next, in S904, the generated set of three-element tuples is partitioned into groups by collecting elements whose execution address exec values are close to one another. Closeness is judged against a predefined threshold T1. To decide whether a tuple belongs to a given group, the member of that group whose exec value is closest to that of the target tuple is selected; if the difference between the two exec values is within T1, the tuple is judged to belong to the group. If the tuple belongs to none of the existing groups, a new group is created.
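The grouping in S904 can be illustrated with the short Python sketch below. The source does not state the order in which tuples are examined or whether groups are merged afterward, so the sketch assumes tuples are processed in the order given, that a tuple joins the first group that accepts it, and that groups are never merged. The function name is hypothetical.

```python
def group_by_exec(tuples, t1):
    """Group (address, d_line, exec) tuples whose exec values lie within t1
    of the nearest member of an existing group."""
    groups = []
    for tup in tuples:
        exec_addr = tup[2]
        for group in groups:
            # Member of this group whose exec value is closest to the target.
            nearest = min(group, key=lambda member: abs(member[2] - exec_addr))
            if abs(nearest[2] - exec_addr) <= t1:
                group.append(tup)
                break
        else:
            # No existing group accepted the tuple: start a new one.
            groups.append([tup])
    return groups
```

Because membership is decided against the nearest member rather than a group centre, a group can grow along a run of nearby instruction addresses, which fits a key being read by one loop of the decryption routine.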

Next, the generated groups (G1 to GN) are taken out one at a time by a loop (S905), and the processing of S906 to S912 is executed. S906 to S912 form a loop that takes each element (address, d_line, exec) of the selected group Gi in turn and executes S907 to S911. In S907, the group of addresses on which the memory indicated by address and d_line depends (a set of tuples of address and effective execution trace line number) is first obtained using the dependent address extraction procedure and stored in A. Next, in S908, a memory map is created from the extracted addresses and the execution trace information, and contiguous regions on the map are extracted as memory groups. The memory map is a set of (address, value) pairs, generated by using the addresses and effective execution trace line numbers stored in A as keys to look up the referenced memory address and value in the execution trace information. Each generated group Mm (m = 1 to M) is a candidate key storage buffer, so the data stored in each buffer region is registered in the key candidate buffer set K (S910).
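Extracting contiguous regions from the memory map (S908 to S910) can be sketched as follows. The sketch assumes a byte-granular memory map held as a dictionary from address to byte value; the source only says the map is a set of (address, value) pairs, so the granularity and the function name are assumptions.

```python
def contiguous_regions(memory_map):
    """Split an {address: byte} map into runs of consecutive addresses.

    Returns a list of (start_address, data) pairs, one per run; each run is
    a candidate key storage buffer."""
    regions = []
    run_start = None
    prev = None
    run = bytearray()
    for addr in sorted(memory_map):
        if prev is not None and addr == prev + 1:
            run.append(memory_map[addr])       # extends the current run
        else:
            if run:
                regions.append((run_start, bytes(run)))
            run_start = addr                   # a gap: start a new run
            run = bytearray([memory_map[addr]])
        prev = addr
    if run:
        regions.append((run_start, bytes(run)))
    return regions
```

The data part of each returned pair is what would be registered in the key candidate buffer set K.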

Finally, the details of the dependent address extraction procedure (see FIG. 9, S907) are described with reference to FIG. 10. This procedure follows the addresses on which the data at a given address depends and returns the set of addresses that depend on nothing else.
First, in S1001, the information given as arguments is assigned to variables. Next, in S1002, the dependency information corresponding to the specified execution trace line number is retrieved from the set FS of data dependency history information and assigned to the current data dependency history information F, and F[ store = s ] is assigned to the set D of dependency data information (S1003). If D is the empty set (S1004 yes), the given address s depends on nothing else, so (s, trace) is added to the set A (S1011) and the procedure ends.

If D is not the empty set, the loop S1005 to S1010 recursively invokes the dependent address extraction procedure on each entry in D, building up in A the set of addresses that depend on nothing else. Once the procedure has completed for every entry, the resulting set A is returned and the procedure ends.
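The dependent address extraction procedure of FIG. 10 can be sketched in Python as a recursive walk that collects the leaves of the dependency graph. As in the earlier description, FS is assumed to be a dictionary from execution trace line number to a dictionary from store to a set of (address, d_line) pairs. The function name, the accumulator arguments, and the `_seen` guard are hypothetical additions for the example.

```python
def independent_addresses(s, tr, fs, _acc=None, _seen=None):
    """Return the set A of (address, line) pairs that the data in store s at
    trace line tr ultimately depends on and that depend on nothing else."""
    acc = set() if _acc is None else _acc
    seen = set() if _seen is None else _seen
    if (s, tr) in seen:               # guard added for the sketch
        return acc
    seen.add((s, tr))

    flow = fs.get(tr, {})             # S1002: F = FS[line = tr]
    d = flow.get(s, set())            # S1003: D = F[store = s]
    if not d:                         # S1004 yes -> S1011
        acc.add((s, tr))
        return acc
    for m, i in d:                    # S1005 to S1010
        independent_addresses(m, i, fs, acc, seen)
    return acc
```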
This concludes the description of the operation of the key specifying means; the description of the overall operation now resumes.

Decryption is attempted by the communication record decryption means 113 with each of the key candidates produced by the key specifying means 112. From the communication records stored in the communication record storage means, the communication record decryption means 113 obtains the upstream data information (FIG. 3, 303) contained in those records that match the infected terminal's IP address and the destination URL, and decrypts each of them using a key candidate and the identified algorithm. The decryption result determination means 115 is then invoked with the decryption result and the original upstream data as input.
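The source does not specify how the decryption attempt is implemented. As an illustration only, the Python sketch below assumes the decryption function has been identified as RC4 (one of the ciphers the description mentions) and pairs every key candidate with every recorded upload. The names `rc4` and `decrypt_with_candidates` are hypothetical.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_with_candidates(candidates, uploads, decrypt=rc4):
    """Yield (key, original, decrypted) for every candidate key and upload.

    Each triple is what would be handed to the decryption result
    determination step."""
    for key in candidates:
        if not key:          # an empty buffer cannot be a key
            continue
        for body in uploads:
            yield key, body, decrypt(key, body)
```

Passing the decryption function as a parameter keeps the loop independent of the cipher, so another identified algorithm could be substituted without changing it.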

The decryption result determination means 115 judges whether the original data was decrypted successfully by measuring the randomness of the data. The original data and the decrypted data are each compressed with the same compression algorithm; if the compressed size of the decrypted data is smaller than the compressed size of the pre-decryption data by a significant margin (for example, 10% or more), decryption is deemed successful.
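The compression-based judgment can be sketched in a few lines of Python. The source only requires a common compression algorithm; the use of zlib here, the function name, and the `margin` parameter (set to the 10% given as an example) are assumptions for illustration.

```python
import zlib


def looks_decrypted(original: bytes, decrypted: bytes, margin: float = 0.10) -> bool:
    """Deem decryption successful when the decrypted data compresses to a
    size at least `margin` smaller than the compressed original."""
    size_before = len(zlib.compress(original))
    size_after = len(zlib.compress(decrypted))
    return size_after <= size_before * (1.0 - margin)
```

Ciphertext is close to random and barely compresses, while typical plaintext does, so a wrong key leaves both sizes nearly equal and the check fails.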
If the decryption result determination means 115 judges that decryption succeeded, the decryption result and the key used are reported to the user through the analysis result output means, and the operation of the system ends.
Although this embodiment has been described with the communications stored in the communication record storage means 116 restricted to HTTP, communication records of other protocols can of course be analyzed as well.

As described above, by having the malware under analysis decrypt previously stored messages while execution trace information is recorded, and by analyzing the execution trace to identify the key, information uploaded in the past can be decrypted and its contents recovered.
Furthermore, when multiple key candidates appear, comparing the randomness of the data before and after decryption allows the correct key to be selected without visual inspection by the user.

Furthermore, in the key storage buffer candidate group extraction procedure, grouping addresses according to whether the machine language instruction addresses that referenced them are close to one another prevents data other than the key from being extracted as part of the key data, even when such data is adjacent to the key in contiguous memory.
Furthermore, restricting the analysis during key candidate extraction to the addresses of data that undergo bitwise or arithmetic operations with data derived from the receive buffer reduces the number of key candidates extracted.

Embodiment 2.
In Embodiment 1 above, the decryption result determination means uses the change in compression ratio. In this embodiment, the determination is instead based on whether the decrypted data matches a known file format. Information about file formats is stored in the format definition storage means 119.
As described above, using conformance to a known file format as the criterion allows the success or failure of decryption to be judged correctly even if the decrypted data is itself compressed.
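One simple way to test conformance to a known file format is to compare the leading bytes of the decrypted data with well-known file signatures. The source does not describe the contents of the format definition storage means 119, so the signature table and the function name below are assumptions for illustration.

```python
# Hypothetical stand-in for the format definitions: leading "magic" bytes
# of a few common file formats.
FORMAT_SIGNATURES = {
    "pdf": [b"%PDF-"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "zip": [b"PK\x03\x04"],
    "gzip": [b"\x1f\x8b"],
    "gif": [b"GIF87a", b"GIF89a"],
}


def match_format(data: bytes, signatures=FORMAT_SIGNATURES):
    """Return the name of the first format whose signature prefixes data,
    or None if no known format matches."""
    for name, magics in signatures.items():
        if any(data.startswith(magic) for magic in magics):
            return name
    return None
```

A decryption attempt would then be accepted when match_format returns a format name. Unlike the compression test, this still works for compressed payloads such as ZIP or gzip archives.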

The malware analysis system according to the present invention is applicable, for example, to information processing terminals on an intra-organization network connected to the Internet. It can identify, using the encryption key, the information leaked from an information processing terminal infected with malware, and can therefore be used in countermeasures against information leakage caused by malware infection.

101: message transmission device, 102: message transmission means, 103: message generation means, 104: malware execution device, 105: malware under analysis, 106: malware execution means, 107: execution trace recording means, 108: execution trace analysis device, 109: communication record input means, 110: decryption function specifying means, 111: execution trace input means, 112: key specifying means, 113: communication record decryption means, 114: analysis result output means, 115: decryption result determination means, 116: communication record storage means, 117: user, 118: network, 119: format definition storage means.

Claims

A malware analysis system for identifying an encryption key of malware, the key being needed to identify information leaked from an information processing terminal that is connected to the Internet and infected with the malware, the system comprising:
communication record storage means in which communication information between the information processing terminal and the Internet is recorded;
a malware execution device having malware execution means for storing the infecting malware and executing the malware program, and execution trace recording means for recording execution trace information each time the malware executes a machine language instruction on the malware execution means;
a message transmission device that, in response to a connection request from the malware, selects a message from the past communication records in the communication record storage means and returns it to the malware; and
an execution trace analysis device having key specifying means that scans the execution trace information from the execution trace recording means, finds a call to the network receive function, and stores the address of the receive buffer and the receive buffer length found among the call arguments in the variables Pmsg and Lmsg,
updates, following the execution of the malware program, the data dependency history information that records the correspondence between dependency data and malware execution trace information,
repeats, up to the last line of the execution trace of the malware program, the process of adding to the dependency data information the data that undergo bitwise or arithmetic operations with data in the message received by the malware, and
extracts and outputs a key storage buffer candidate group by grouping the dependency data information to which the data have been added according to a predetermined criterion,
and decryption result determination means that, when a plurality of encryption key candidates are extracted by the key specifying means, decrypts data with each extracted encryption key candidate, compares the data before decryption with the data after decryption, and selects the encryption key with which decryption succeeded.

The malware analysis system according to claim 1, wherein, in the key storage buffer candidate group extraction processing, the key specifying means groups the dependency data information to which the data have been added by grouping addresses according to whether the machine language instruction addresses of the malware program corresponding to the effective execution trace line numbers of the dependency data information are close to one another.

A set of data dependency history information is provided in which the correspondence between a plurality of dependency data and a plurality of pieces of malware execution trace information is recorded,
The key specifying means
scans the execution trace information from the execution trace recording means, finds a call to the network receive function, and stores the address of the receive buffer and the receive buffer length found among the call arguments in the variables Pmsg and Lmsg,
extracts from the set of data dependency history information, as the current data dependency history information, the data dependency information corresponding to the execution trace line number during execution of the malware program, executes the process of assigning the dependency data information of the current data dependency history information to a set D up to the last line of the execution trace of the malware program, and, if the set D is an empty set, outputs the set of addresses of the receive buffer as addresses that do not depend on anything else,
2. If the set D is not an empty set, dependent address extraction procedure processing is performed for each entry in the set D, and a set of addresses that do not depend anywhere else is generated and output. Malware analysis system described in.

The malware analysis system according to claim 1, wherein the decryption result determination means
decrypts data with the encryption key candidates extracted by the key specifying means,
compresses the original data and the decrypted data with a common compression algorithm, and identifies the encryption key with which decryption succeeded by comparing the compressed size of the decrypted data with the compressed size of the data before decryption.

The malware analysis system according to claim 1, further comprising format definition storage means for storing information on file formats,
wherein the decryption result determination means decrypts data with the encryption key candidates extracted by the key specifying means and identifies the encryption key on the condition that the decrypted data matches a file format stored in the format definition storage means.