JP2017162244A

JP2017162244A - Information processing device, information processing method, program, and computer-readable recording medium with program recorded

Info

Publication number: JP2017162244A
Application number: JP2016046762A
Authority: JP
Inventors: 友輔岡野; Yusuke Okano
Original assignee: FFRI Inc
Current assignee: FFRI Inc
Priority date: 2016-03-10
Filing date: 2016-03-10
Publication date: 2017-09-14
Anticipated expiration: 2036-03-10
Also published as: JP5982597B1

Abstract

PROBLEM TO BE SOLVED: To make it easy to determine that a file with a disguised icon is malware even when the icon image changes only slightly. SOLUTION: An information processing device according to one embodiment of the present invention may comprise: a feature information extraction part for extracting a binary of an icon image from a resource of a predetermined file; a feature vector generation part for generating a feature vector from the binary of the extracted icon image; and a determination part for determining, by machine learning using the feature vector, whether or not the predetermined file is malware. Also, the program may cause a computer to extract the binary of the icon image from the resource of the predetermined file, to generate the feature vector from the binary of the extracted icon image, and to determine, by machine learning using the feature vector, whether or not the predetermined file is malware. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, a program, and a computer-readable recording medium on which the program is recorded. In particular, the present invention relates to an information processing apparatus, an information processing method, a program, and a computer-readable recording medium on which the program is recorded, each for detecting malware.

In recent years, a problem has arisen in which malicious software or malicious code, created with the intent of performing unauthorized and harmful operations, is sent to users with its icon disguised as the icon of other, seemingly harmless software or files, and users execute it without realizing what it is. Here, malicious software or malicious code created with the intent of performing unauthorized and harmful operations is referred to as malware (Malicious Software).

A computer infected with malware characteristically performs unauthorized or harmful operations against other computers connected to the network. For example, it is used as a tool for malicious actions such as mass sending of spam e-mail or denial-of-service attacks through unauthorized mass access to a server. The threat of malware is not limited to attacks on external targets: some malware extracts personal information such as credit card numbers and address books from the infected computer and sends it to an external computer. Preventing such damage requires a technique for detecting the malware itself or the communications that transmit and receive it.

Some such malware carries the very icon associated with a document file such as a Word document, or an icon that looks like one at first glance, so that the user mistakes the malware for a document file. Other malware uses the icon of an image file, a folder, or the like, or an icon that looks like one at first glance, to induce the user to execute it. Here, these icons are called "disguised icons".

Conventionally, as shown in FIG. 12, one method of detecting malware that has a disguised icon extracts the hash value of the icon image from the resources of an executable file, matches this hash value against a list of hash values of disguised icons held in advance, and determines that the file holds a disguised icon if a match is found. This method can handle disguised icons that are held in advance, but if the icon image changes even slightly, the hash value differs and the icon can no longer be identified as a disguised icon. The icon images in FIG. 13 differ only in the upper-left pixel, yet their hash values differ.
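The weakness of exact hash matching described above can be illustrated with a minimal sketch. The source does not name the hash function, so SHA-256 is an assumption here, and the icon bytes and helper names (`icon_hash`, `is_known_disguised_icon`) are hypothetical.

```python
import hashlib


def icon_hash(icon_bytes):
    """Return a hex digest of the raw icon bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(icon_bytes).hexdigest()


def is_known_disguised_icon(icon_bytes, known_hashes):
    """Exact-match lookup against a pre-held list of disguised-icon hashes."""
    return icon_hash(icon_bytes) in known_hashes


# Hypothetical 4x4 icon with 3 bytes (R, G, B) per pixel.
original = bytes([0xD4, 0xD0, 0xC8] * 16)
known_hashes = {icon_hash(original)}

# The same icon with only one byte of the upper-left pixel changed.
modified = bytes([0xD5]) + original[1:]

print(is_known_disguised_icon(original, known_hashes))  # the pre-held icon matches
print(is_known_disguised_icon(modified, known_hashes))  # the altered icon does not
```

Because a cryptographic hash changes completely when a single byte changes, the list lookup fails for the modified icon even though the two images are visually almost identical.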

On the other hand, the technique disclosed in Patent Document 1 calculates the similarity between the feature information of the icon image of a target file, which is the subject of risk determination, and the feature information of a basic icon image, which is an icon image generally well known as the icon of non-executable files such as document files, and determines the risk of the target file based on the calculated similarity.

Patent Document 1: JP2015-191458A

However, the technique disclosed in Patent Document 1 only calculates the similarity with the feature information of the basic icon image, so it may erroneously determine that a file has a disguised icon even though the file is in fact legitimate.

The present invention is intended to solve the problems of the prior art described above, and its object is to make it easy to determine that a disguised icon belongs to malware even when the icon image has changed only slightly.

Another object of the present invention is to suppress false detections in which a file that is in fact legitimate is erroneously determined to have a disguised icon.

According to an embodiment of the present invention, there is provided an information processing apparatus comprising: a feature information extraction unit that extracts the binary of an icon image from the resources of a predetermined file; a feature vector generation unit that generates a feature vector from the extracted binary of the icon image; and a determination unit that determines, by machine learning using the feature vector, whether the predetermined file is malware.

In the information processing apparatus, the feature information extraction unit may extract a numerical value from the icon image, and the apparatus may comprise: a numerical value storage unit that stores numerical values obtained by digitizing disguised icon images; and a general determination unit that determines whether the predetermined file is malware based on the numerical value extracted by the feature information extraction unit and the numerical values stored in the numerical value storage unit.

The information processing apparatus may further comprise: an initial point setting unit that sets predetermined malware initial points when at least one of the determination unit and the general determination unit determines that the predetermined file is malware; a point threshold storage unit that stores a point threshold serving as an index for determining whether the predetermined file is malware; a point adjustment unit that adds predetermined points to, or subtracts them from, the malware initial points when the predetermined file satisfies a predetermined condition; and a threshold determination unit that determines whether the points calculated through addition or subtraction by the point adjustment unit exceed the point threshold stored in the point threshold storage unit and, when they do, determines that the predetermined file is malware.
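The point-based determination above can be sketched as follows. All concrete values (initial points 50, threshold 70, the two example rules and their point values) are hypothetical, as are the names; the source only speaks of "predetermined" points, conditions, and threshold. Returning "not malware" when neither determination unit flags the file is also an assumption of this sketch.

```python
def is_malware_by_points(sample, flagged_by_ml, flagged_by_general,
                         rules, initial_points, threshold):
    """Apply point adjustments and compare the total with the point threshold.

    rules is a list of (condition, delta) pairs: condition takes the sample
    and returns True or False; delta is added when the condition holds
    (a negative delta subtracts points).
    """
    # Initial points are set only if at least one determination flagged the file.
    if not (flagged_by_ml or flagged_by_general):
        return False  # assumption: no initial points means no malware verdict
    points = initial_points
    for condition, delta in rules:
        if condition(sample):
            points += delta
    # The file is determined to be malware only if the total exceeds the threshold.
    return points > threshold


# Hypothetical rules modelled on conditions listed later in the description.
rules = [
    (lambda s: s["icon_count"] < 2, 10),  # few icons held in the resources
    (lambda s: s["packed"], 20),          # packer information was found
]

suspicious = {"icon_count": 1, "packed": True}
benign_looking = {"icon_count": 5, "packed": False}

print(is_malware_by_points(suspicious, True, False, rules, 50, 70))
print(is_malware_by_points(benign_looking, True, False, rules, 50, 70))
```

The second call shows how the threshold can suppress a false detection: the file was flagged by one determination, but without supporting conditions its points stay below the threshold.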

The information processing apparatus may further comprise a digitizing unit that digitizes the icon image of the predetermined file determined to be malware, and the numerical value storage unit may store the numerical value produced by the digitizing unit.

According to an embodiment of the present invention, there is provided an information processing method in which a computer extracts the binary of an icon image from the resources of a predetermined file, generates a feature vector from the extracted binary of the icon image, and determines, by machine learning using the feature vector, whether the predetermined file is malware.

According to an embodiment of the present invention, there is provided a program for causing a computer to extract the binary of an icon image from the resources of a predetermined file, to generate a feature vector from the extracted binary of the icon image, and to determine, by machine learning using the feature vector, whether the predetermined file is malware.

The program may cause the computer to extract the binary at predetermined intervals.

The program may further cause the computer to extract a numerical value from the icon image, to store numerical values obtained by digitizing disguised icon images, and to determine whether the predetermined file is malware based on the extracted numerical value and the stored numerical values.

The program may cause the computer to set predetermined malware initial points when at least one of the determination by machine learning using the feature vector and the determination based on the extracted numerical value and the stored numerical values finds that the predetermined file is malware; to store a point threshold serving as an index for determining whether the predetermined file is malware; to add predetermined points to, or subtract them from, the malware initial points when the predetermined file satisfies a predetermined condition; to determine whether the points calculated through the addition or subtraction exceed the stored point threshold; and, when they do, to determine that the predetermined file is malware.

The program may cause the computer to digitize the icon image of the predetermined file determined to be malware and to store the resulting numerical value.

The program may cause the computer to determine whether the icon image of the predetermined file is a legitimate icon image and, when it determines that the icon image is not legitimate, to add predetermined points to the malware initial points.

The program may cause the computer to extract the number of icons held in the resources of the predetermined file and, when the extracted number of icons is below a predetermined number, to add predetermined points to the malware initial points.

The program may cause the computer to extract version information from the predetermined file and, when the icon image of the predetermined file does not correspond to the version information, to add predetermined points to the malware initial points.

The program may cause the computer to extract programming language information from the predetermined file and, when the extracted programming language information is not the programming language information corresponding to the icon image held by the predetermined file, to add predetermined points to the malware initial points.

The program may cause the computer to extract compiler information from the predetermined file and, when the extracted compiler information is not the compiler information corresponding to the icon image held by the predetermined file, to add predetermined points to the malware initial points.

The program may cause the computer to attempt to extract packer information from the predetermined file and, when packer information is extracted, to add predetermined points to the malware initial points.

The program may cause the computer to attempt to extract self-extracting archive information from the predetermined file and, when self-extracting archive information is extracted, to add predetermined points to the malware initial points.

The program may cause the computer to extract a file name from the predetermined file and, when the number of characters in the extracted file name exceeds a predetermined number of characters, to add predetermined points to the malware initial points.

The program may cause the computer to extract a file name from the predetermined file and, when the extracted file name contains a Unicode control character, to add predetermined points to the malware initial points.

The program may cause the computer to extract a file name from the predetermined file and, when the extracted file name contains a plurality of extensions, to add predetermined points to the malware initial points.

The program may cause the computer to extract a file name from the predetermined file and, when the extracted file name contains a double-byte character, to add predetermined points to the malware initial points.
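The four file name conditions above can be sketched as simple checks. The length limit (64) and the points per rule (10) are hypothetical, since the source only says "predetermined". Treating Unicode categories Cc and Cf as "control characters", and approximating "double-byte characters" by East Asian wide or fullwidth characters, are assumptions of this sketch rather than definitions from the source.

```python
import unicodedata
from pathlib import PurePath

# Hypothetical values; the source only says "predetermined".
MAX_NAME_LENGTH = 64
POINTS_PER_RULE = 10


def filename_points(name):
    """Return the points to add to the malware initial points for a file name."""
    points = 0
    # Condition 1: the file name is longer than the predetermined length.
    if len(name) > MAX_NAME_LENGTH:
        points += POINTS_PER_RULE
    # Condition 2: control or format characters, e.g. U+202E (right-to-left
    # override), which can visually reverse part of a file name.
    if any(unicodedata.category(ch) in ("Cc", "Cf") for ch in name):
        points += POINTS_PER_RULE
    # Condition 3: more than one extension, e.g. "report.pdf.exe".
    if len(PurePath(name).suffixes) > 1:
        points += POINTS_PER_RULE
    # Condition 4: double-byte characters, approximated by East Asian
    # wide ("W") or fullwidth ("F") characters.
    if any(unicodedata.east_asian_width(ch) in ("W", "F") for ch in name):
        points += POINTS_PER_RULE
    return points


for candidate in ["report.txt", "report.pdf.exe", "invoice\u202efdp.exe"]:
    print(repr(candidate), filename_points(candidate))
```

Each condition contributes independently, so a name that trips several conditions accumulates more points toward the threshold.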

According to an embodiment of the present invention, a computer-readable recording medium on which the above program is recorded may be provided.

According to the present invention, a disguised icon can be readily determined to belong to malware even when the icon image has changed only slightly.

FIG. 1 is a conceptual diagram of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining how the information processing apparatus according to an embodiment of the present invention generates a feature vector.
FIG. 3 is a diagram for explaining the flow of generating the determination unit (model) of the information processing apparatus according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram for explaining porting of the model generated in FIG. 3.
FIG. 5 is a diagram for explaining the flow in which the information processing apparatus according to an embodiment of the present invention determines malware.
FIG. 6 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention.
FIG. 7 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware.
FIG. 8 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware.
FIG. 9 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention.
FIG. 10 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware.
FIG. 11 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention.
FIG. 12 is a diagram for explaining a prior-art flow for detecting malware that has a disguised icon.
FIG. 13 is a diagram showing icon images and their corresponding hash values, for understanding the problem of the prior art.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. The embodiments shown below are examples of embodiments of the present invention, and the present invention is not limited to them. In the drawings referred to in the embodiments, identical parts or parts with similar functions are given the same reference numerals or similar reference numerals (numerals followed only by A, B, or the like), and repeated description of them may be omitted. In addition, the dimensional ratios in the drawings may differ from the actual ratios for convenience of explanation, and part of a configuration may be omitted from a drawing.

<First Embodiment>
[Configuration of the information processing apparatus]
The information processing apparatus 1 will be described with reference to FIGS. 1 to 4. FIG. 1 is a conceptual diagram of an information processing apparatus according to an embodiment of the present invention.

The information processing apparatus 1 is connected to user terminals 30a and 30b and to a server 33 via a network 27. When the user terminals need not be distinguished from one another, they are referred to as "user terminal 30".

Here, the network 27 is, for example, a LAN (local area network) or the Internet. Any network environment in which the user terminal 30 can connect to the information processing apparatus 1 may be used, whether wireless or wired, a dedicated line, or the like.

The user terminal 30 may be a multi-function mobile phone, a mobile communication terminal device such as a mobile phone or a PDA (Personal Digital Assistant), or an information processing terminal device with communication and computing functions such as a personal computer. The user terminal 30 has a browser as a display control function for displaying screens, and includes a CPU, a memory, a communication control unit that controls communication with the information processing apparatus 1, and the like. It may further include a display device and operation input devices such as a mouse, a keyboard, and a touch panel.

The information processing apparatus 1 includes a feature information extraction unit 10, a feature vector generation unit 11, and a determination unit 12.

The feature information extraction unit 10 extracts the binary of an icon image from the resources of a predetermined file (an executable file or the like) that is the sample under examination. The extracted icon image may be processed here, for example by reduction or normalization.

In this example, the binary of the icon image is extracted from all pixels. However, the extraction method is not limited to this, and the binary may be extracted at predetermined intervals. For example, taking the upper-left pixel as (1, 1), the binary of the odd-numbered pixels may be extracted, as in (1, 1), (1, 3), (1, 5), ..., (2, 1), (2, 3), ..., or the binary of the even-numbered pixels may be extracted. Instead of odd-numbered or even-numbered pixels, the binary may also be extracted at larger intervals.
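Extraction at a predetermined interval can be sketched as below. Reading the coordinates as (row, column) with 1-based indices is an assumption based on the order of the examples in the text, and the function name and parameters are hypothetical.

```python
def sample_pixels(rows, step=2, offset=0):
    """Keep every `step`-th pixel in each row, starting at index `offset`.

    rows is a list of rows, each a list of pixel values. With step=2,
    offset=0 keeps the odd-numbered pixels (1-based) and offset=1 keeps
    the even-numbered ones; a larger step widens the interval.
    """
    return [row[offset::step] for row in rows]


# A 2x6 grid whose "pixels" are their own 1-based (row, column) coordinates.
grid = [[(r, c) for c in range(1, 7)] for r in range(1, 3)]

print(sample_pixels(grid, step=2, offset=0))  # odd-numbered pixels per row
print(sample_pixels(grid, step=2, offset=1))  # even-numbered pixels per row
```

Sampling reduces the length of the resulting feature vector, at the cost of ignoring changes confined to the skipped pixels.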

The feature vector generation unit 11 generates a feature vector from the binary of the icon image extracted by the feature information extraction unit 10. Generation of the feature vector will be described with reference to FIG. 2. FIG. 2 is a diagram for explaining how the information processing apparatus according to an embodiment of the present invention generates a feature vector.

The feature vector generation unit 11 forms a vector from the RGB values of the pixels of the icon image. In this example, the upper-left pixel of the icon image in FIG. 2 has "R: 0xD4", "G: 0xD0", and "B: 0xC8", that is, "0x00D4D0C8". Expressed in decimal, the hexadecimal number "D4D0C8" is "13947080". Each pixel of the icon image in FIG. 2 is likewise converted to a decimal number, and these numbers form the vector. In this example, feature vector = {13947080, 13947080, ..., 8293013, ...}.
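The conversion from per-pixel RGB values to vector elements can be sketched as follows. Packing the channels as 0x00RRGGBB matches the combined value 0x00D4D0C8 given in the text; the function names are hypothetical, and the RGB triple used for 8293013 is simply the decomposition of that number, not a value stated in the source.

```python
def pixel_to_int(r, g, b):
    """Pack 8-bit R, G, B channels into one integer laid out as 0x00RRGGBB."""
    return (r << 16) | (g << 8) | b


def feature_vector(pixels):
    """Turn a sequence of (R, G, B) tuples into a list of packed integers."""
    return [pixel_to_int(r, g, b) for r, g, b in pixels]


# 0xD4D0C8 is the upper-left pixel from the text; (0x7E, 0x8A, 0x95) is the
# triple that packs to 8293013.
pixels = [(0xD4, 0xD0, 0xC8), (0xD4, 0xD0, 0xC8), (0x7E, 0x8A, 0x95)]
print(feature_vector(pixels))  # [13947080, 13947080, 8293013]
```

In practice such raw integers would likely be scaled before being fed to a learning model, but the source does not describe any scaling step.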

The determination unit 12 determines, by machine learning using the feature vector, whether the predetermined file under examination is malware. The determination unit (model) 12 is generated in advance. Generation of the determination unit (model) 12 will be described with reference to FIG. 3. FIG. 3 is a diagram for explaining the flow of generating the determination unit (model) of the information processing apparatus according to an embodiment of the present invention. FIG. 4 is a conceptual diagram for explaining porting of the model generated in FIG. 3.

A plurality of icon images extracted from malware with disguised icons and icon images extracted from legitimate software are labeled and prepared, and these form the learning data set (teacher data). This learning data is fed into the model (S101). In this example, the weights wi of the model that receives the learning data are set arbitrarily (randomly). However, the values of the weights wi may instead be determined by a known method such as pre-training.

Once the learning data has been fed into the model, learning is performed according to a deep learning algorithm (S102). As a result of this learning, the internal structure of the model changes; specifically, the values of the model weights wi change. The activation function is fixed. The deep learning is, for example, a CNN (Convolutional Neural Network). Although deep learning is used in this example, the invention is not limited to this, and other algorithms such as error backpropagation (Backpropagation), the Boltzmann machine, TWDRLS, or the Cognitron may be used.

After learning has changed the internal structure, it is determined whether the classification accuracy of the model has reached a certain level (S103). If it has (Yes in S103), generation of the model is complete (S104). If the accuracy is below that level (No in S103), S101 to S103 are repeated. When generation of the model is complete, the generated model is ported to the malware detection engine. This ported model is the determination unit 12.
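The loop S101 to S104 (feed labeled data, update the weights, check accuracy, repeat until the target is met) can be sketched in miniature. This is not the deep learning model of the embodiment: a single-layer logistic model with a fixed sigmoid activation is swapped in purely to keep the sketch short and dependency-free. The toy two-element feature vectors, the learning rate, the target accuracy of 0.95, and the round limit are all hypothetical.

```python
import math
import random


def predict(weights, bias, x):
    """Sigmoid output of a single-layer model (the fixed activation function)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, data):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


def train(data, target_accuracy=0.95, lr=0.5, max_rounds=1000, seed=0):
    rng = random.Random(seed)
    # The initial weights w_i are chosen randomly, as in the described example.
    weights = [rng.uniform(-0.1, 0.1) for _ in data[0][0]]
    bias = 0.0
    for _ in range(max_rounds):
        # S101-S102: feed the labeled data and update the weights.
        for x, y in data:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
        # S103: stop once the classification accuracy reaches the target (S104).
        if accuracy(weights, bias, data) >= target_accuracy:
            break
    return weights, bias


# Toy labeled set: 1 = icon from malware with a disguised icon, 0 = legitimate.
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
weights, bias = train(data)
print(accuracy(weights, bias, data))
```

The trained weights and bias play the role of the generated model, which would then be ported to the detection engine as the determination unit.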

[Malware determination flow]
The flow in which the information processing apparatus determines whether a sample file is malware will be described with reference to FIG. 5. FIG. 5 is a diagram for explaining the flow in which the information processing apparatus according to an embodiment of the present invention determines malware.

First, the feature information extraction unit 10 extracts the binary of the icon image from the resources of the sample file (an executable file or the like) (S201). Next, the feature vector generation unit 11 generates a feature vector from the extracted binary of the icon image (S202). The determination unit 12 then determines, by machine learning using the feature vector, whether the sample file is malware (S203). The flow ends once the determination unit 12 has determined that the file is malware (S204a) or that it is normal (S204b).

In the present embodiment, the determination unit 12 is a ported model whose internal structure has changed from that of the initial model through deep learning and which has been determined to have a certain classification accuracy. The determination unit 12 determines by machine learning whether the sample file is malware. This provides the effect that a sample file can be determined to be malware with higher accuracy than with the prior art.

<Second Embodiment>
The method of the first embodiment, which determines by machine learning whether a sample file is malware, is effective in that it can identify malware with higher accuracy than the prior art. With this method, however, a sample file may be erroneously determined to be malware even though it is in fact legitimate software. Such a determination is called a false detection (erroneous determination). The present inventor recognized the need for a method of suppressing such false detections and, after intensive study, arrived at a method that uses a statistical approach focusing on the number of occurrences of each color and on the grayscale histogram of the icon image of the sample file.

[Configuration of the information processing apparatus]
The information processing apparatus 2 will be described with reference to FIG. 6. FIG. 6 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention. In addition to the configuration of the first embodiment, the information processing apparatus 2 according to the present embodiment includes a general determination unit 13 and a numerical value storage unit 21. The differences from the first embodiment are described in detail here.

特徴情報抽出部１０は、検体であるファイルのリソース内からアイコン画像のバイナリを抽出する。加えて、特徴情報抽出部１０は、アイコン画像から数値を抽出する。 The feature information extraction unit 10 extracts an icon image binary from the resource of a file that is a sample. In addition, the feature information extraction unit 10 extracts a numerical value from the icon image.

数値記憶部２１は、この例では、偽装アイコン画像を数値化した数値を予め記憶し、保持している。ここで、偽装アイコン画像は、すでに偽装アイコンと判明しているアイコンの画像である。このような画像を多数、数値化して、数値を記憶し、保持する。 In this example, the numerical value storage unit 21 stores in advance and holds numerical values obtained by digitizing the camouflaged icon image. Here, the camouflaged icon image is an icon image that has already been determined to be a camouflaged icon. A large number of such images are digitized to store and hold numerical values.

The general determination unit 13 determines whether the sample file is malware based on the numerical value that the feature information extraction unit 10 extracted from the icon image in the resources of the sample file and on the numerical values held in the numerical value storage unit 21. One example of the method used by the general determination unit 13 is Average Hash. With Average Hash, the feature information extraction unit 10 reduces the icon image (for example, to 8 × 8 pixels) and converts it to grayscale. It then calculates the average color value over all pixels of the image and checks the shade of each pixel, setting “1” if the pixel is darker than the average and “0” if it is lighter. This yields a 64-bit (8 × 8) bit string. This bit string is the numerical value extracted by the feature information extraction unit 10.

Bit strings are generated in the same way for a large number of icon images already known to be fake icons, and these bit strings (numerical values) are stored in the numerical value storage unit 21 in advance. The general determination unit 13 then compares, bit by bit, the bit string (numerical value) extracted from the sample file with the stored bit strings (numerical values) of the known fake icon images, and calculates a similarity. If the similarity exceeds a predetermined similarity set in advance, the general determination unit 13 determines that the sample file is malware.
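The Average Hash comparison described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: it assumes the icon has already been reduced to an 8 × 8 grayscale grid with 0 as black and 255 as white, it treats "darker than the average" as a lower intensity value, and the function names and the 0.9 similarity threshold are hypothetical.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Build a 64-bit value from an 8x8 grayscale grid (0 = black, 255 = white)."""
    pixels = [p for row in gray for p in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(pixels) / 64
    bits = 0
    for p in pixels:
        # Assumption: "darker than the average" means a lower intensity value.
        bits = (bits << 1) | (1 if p < mean else 0)
    return bits


def similarity(a: int, b: int) -> float:
    """Fraction of the 64 bit positions on which two hashes agree."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / 64


def matches_known_fake(sample: int, known: List[int], threshold: float = 0.9) -> bool:
    """True if the sample hash is more similar than the threshold to any stored hash."""
    return any(similarity(sample, k) > threshold for k in known)
```

Because the comparison counts agreeing bits rather than requiring equality, an icon that differs from a stored fake icon in only a few cells of the grid still exceeds the threshold.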

The general determination unit 13 is not limited to Average Hash; a Jaccard coefficient, TF/IDF, Fuzzy Hash, SAD (Sum of Absolute Difference), or the like may also be used. Using any of these, the numerical value extracted from the sample file is compared with the numerical values of the many icon images already known to be fake icons, and a similarity is calculated. If the similarity exceeds a predetermined similarity set in advance, the general determination unit 13 determines that the sample file is malware. The general determination unit 13 may also combine two or more of these determinations (Average Hash, Jaccard coefficient, TF/IDF, Fuzzy Hash, SAD, and so on), because they are independent of one another.
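Two of the alternative measures named above can be sketched as follows. The text does not say what they are computed over, so this sketch assumes the Jaccard coefficient is taken over sets of distinct color values and SAD over equal-length sequences of grayscale pixel values; the function names and the normalization of SAD into a similarity are hypothetical.

```python
from typing import Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical in this sketch
    return len(a & b) / len(a | b)


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b))


def sad_similarity(a: Sequence[int], b: Sequence[int], max_value: int = 255) -> float:
    """Map SAD onto a 0..1 similarity (1.0 means identical); an assumed normalization."""
    return 1.0 - sad(a, b) / (max_value * len(a))
```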

[Malware judgment flow]
The flow in which the information processing apparatus determines whether a sample file is malware will be described with reference to FIG. 7. FIG. 7 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware. S301, S302, and S303 of the present embodiment correspond to S201, S202, and S203 of the first embodiment. The steps from S303 onward are described in detail here.

If the machine-learning determination finds that the sample file is normal software rather than malware (No in S303), the sample file is determined to be normal software (S306). If, on the other hand, the machine-learning determination finds that the sample file is malware (Yes in S303), the general determination unit 13 further determines whether the sample file is malware (S304). The specific determination method is as described above.

If the general determination unit 13 determines that the sample file is normal software rather than malware (No in S304), the sample file is determined to be normal software (S306), and the flow ends. If, on the other hand, the general determination unit 13 determines that the sample file is malware (Yes in S304), the sample file is determined to be malware (S305), and the flow ends.
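The two-stage flow of S303 to S306 can be sketched as a simple cascade. The two determinations are passed in as callables because their internals are described elsewhere; the function name and the "normal" and "malware" return values are hypothetical.

```python
from typing import Callable


def judge_second_embodiment(
    sample: bytes,
    ml_is_malware: Callable[[bytes], bool],
    general_is_malware: Callable[[bytes], bool],
) -> str:
    """Run the machine-learning check first, then the general check."""
    if not ml_is_malware(sample):        # S303: No
        return "normal"                  # S306
    if not general_is_malware(sample):   # S304: No
        return "normal"                  # S306
    return "malware"                     # S305
```

A file is reported as malware only when both checks agree, which is how the second check can suppress false detections made by the first.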

In the present embodiment, as in the first embodiment, the determination unit 12 determines by machine learning whether the sample file is malware. Malware can therefore be identified with higher accuracy than with the conventional technology.

In the present embodiment, the general determination unit 13 further examines any file that the determination unit 12 has determined to be malware. On rare occasions, the determination unit 12 erroneously determines a sample file to be malware even though it is actually normal software. Such a determination is called a false detection (erroneous determination). The general determination unit 13 has the effect of suppressing false detections by the determination unit 12.

<Third Embodiment>
[Malware judgment flow]
The flow in which the information processing apparatus determines whether a sample file is malware will be described with reference to FIG. 8. FIG. 8 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware.

The third embodiment differs from the second embodiment in the order of part of the flow. In the second embodiment, the determination by the general determination unit 13 is performed after the determination by the determination unit 12, whereas in the third embodiment, the determination by the determination unit 12 is performed after the determination by the general determination unit 13.

The present embodiment provides the same effects as the second embodiment.

<Fourth Embodiment>
As in the second and third embodiments, performing a general determination in addition to the machine-learning determination makes it possible to determine with higher accuracy whether a sample file is malware. Even these methods, however, cannot prevent all false detections. The present inventor therefore studied intensively how to suppress false detections further and arrived at a method that, in addition to the second and third embodiments, takes the characteristics of malware into account. Eleven example characteristics of malware are described below, but the present embodiment is not limited to these.

[Configuration of information processing device]
The information processing apparatus 3 will be described with reference to FIG. 9. FIG. 9 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention. In addition to the configuration of the second embodiment, the information processing apparatus 3 according to the present embodiment includes an initial point setting unit 14, a point adjustment unit 15, a threshold determination unit 16, a falsified icon determination unit 17, and a point threshold storage unit 22. The differences from the second embodiment are described in detail here.

The initial point setting unit 14 sets a predetermined malware initial point when at least one of the determination unit 12 and the general determination unit 13 determines that the predetermined sample file is malware. This may be the case where the determination unit 12 determines that the sample file is malware, the case where the general determination unit 13 does so, or the case where both do so. In each of these cases, the initial point setting unit 14 sets the malware initial point. In this example, the malware initial point is 90 points. The malware initial point may be stored in the storage unit 20 in advance.

The point threshold storage unit 22 stores a point threshold that serves as an index for determining whether the predetermined file is malware. In this example, the point threshold is 95 points.

The point adjustment unit 15 adds a predetermined number of points to, or subtracts them from, the malware initial point set by the initial point setting unit 14 when the sample file satisfies a predetermined condition. The number of points can be set as appropriate. Eleven items serving as the “predetermined condition” are described later. The number of points may differ from item to item, and the points for each item may be stored in the storage unit 20 in advance.

The threshold determination unit 16 determines whether the points calculated through addition or subtraction by the point adjustment unit 15 exceed the point threshold stored in the point threshold storage unit 22 and, if they do, determines that the sample file is malware. For example, a file that the determination unit 12 and the general determination unit 13 have determined to be malware is given 90 points as its malware initial point. When this file satisfies a predetermined condition, the point adjustment unit 15 adds points to or subtracts points from the 90 points. If the resulting total exceeds 95 points, the sample file is determined to be malware.
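The point arithmetic can be sketched as follows, using the 90-point initial value and the 95-point threshold from this example. The per-item adjustment values in the usage note are hypothetical, since the text leaves them to be set as appropriate, and the function names are assumptions.

```python
from typing import Iterable

INITIAL_POINTS = 90    # malware initial point used in this example
POINT_THRESHOLD = 95   # point threshold used in this example


def final_points(adjustments: Iterable[int], initial: int = INITIAL_POINTS) -> int:
    """Apply per-feature additions (positive) and subtractions (negative)."""
    return initial + sum(adjustments)


def is_malware(points: int, threshold: int = POINT_THRESHOLD) -> bool:
    """Malware only if the total strictly exceeds the threshold."""
    return points > threshold
```

With hypothetical adjustments of +3 and +4 the total is 97, which exceeds 95, so the file is judged to be malware. A single adjustment of +5 gives exactly 95, which does not exceed the threshold.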

Eleven specific features of malware are described below. When the sample file matches a specific feature of malware, the point adjustment unit 15 adds a predetermined number of points to the malware initial point set by the initial point setting unit 14. The specific features are extracted by the feature information extraction unit 10.

(Specific feature 1)
The falsified icon determination unit 17 determines whether the icon image of the sample file is a genuine icon image. For a genuine icon image, a completely identical icon may be held inside the operating system. Commonly known genuine icon images can also be recognized by storing them in the storage unit 20. If, on the other hand, a genuine icon image has been tampered with, the file is likely to be malware. Accordingly, when the falsified icon determination unit 17 determines that the icon image of the sample file is not a genuine icon image, the point adjustment unit 15 adds a predetermined number of points to the malware initial point. Conversely, when the falsified icon determination unit 17 determines that the icon image is a genuine icon image, the point adjustment unit 15 subtracts a predetermined number of points from the malware initial point.
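One way to test whether an icon is completely identical to a known genuine icon is to compare cryptographic digests of the icon binaries. The text does not specify how the comparison is made, so the use of SHA-256, the function names, and the 5-point adjustments below are all assumptions made for illustration.

```python
import hashlib
from typing import Set


def icon_digest(icon: bytes) -> str:
    """SHA-256 digest of the raw icon binary, as a hex string."""
    return hashlib.sha256(icon).hexdigest()


def feature1_adjustment(
    icon: bytes, known_genuine: Set[str], add: int = 5, subtract: int = 5
) -> int:
    """Return a negative adjustment for a byte-identical genuine icon, else a positive one."""
    if icon_digest(icon) in known_genuine:
        return -subtract  # genuine icon: subtract points
    return add            # not a genuine icon: add points
```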

(Specific feature 2)
Legitimate applications tend to contain many icon resources, whereas malware tends to contain extremely few. The feature information extraction unit 10 therefore extracts the number of icon segments from the resource file of the sample file. If the number of extracted icon segments is below a predetermined threshold, the point adjustment unit 15 adds a predetermined number of points to the malware initial point. If the number is equal to or greater than the threshold, the point adjustment unit 15 subtracts a predetermined number of points from the malware initial point. The points added and the points subtracted may differ. The threshold may be stored in the storage unit 20 in advance.

(Specific feature 3)
An application that has a Microsoft (registered trademark) icon, such as that of Microsoft Word, is basically an application made by Microsoft (registered trademark). The feature information extraction unit 10 therefore extracts version information from the sample file. The point adjustment unit 15 adds a predetermined number of points to the malware initial point when the icon image of the sample file does not correspond to the version information. For example, if the sample file holds a Microsoft (registered trademark) icon but the extracted version information does not indicate Microsoft (registered trademark), the point adjustment unit 15 adds a predetermined number of points to the malware initial point.

(Specific feature 4)
For example, when the sample file has a Microsoft (registered trademark) icon such as that of Microsoft Word and the icon is genuine, the application holding the icon is basically a native application written in C++ or the like. If, on the other hand, the binary was created with “.NET” or “Visual Basic (registered trademark)”, it is likely to be malware. Specifically, the feature information extraction unit 10 extracts programming language information from the PE header of the file. If the sample file has a Microsoft (registered trademark) icon such as that of Microsoft Word and the extracted programming language information is not C++ or the like, the point adjustment unit 15 adds a predetermined number of points to the malware initial point.

(Specific feature 5)
For example, when the sample file has a Microsoft (registered trademark) icon such as that of Microsoft Word and the icon is genuine, the compiler is basically “Visual Studio (registered trademark)”. If there are signs that another compiler was used, for example differences in the APIs (Application Programming Interfaces) used, the sample file is likely to be malware, so a predetermined number of points is added. Specifically, the feature information extraction unit 10 extracts compiler information from the sample file. The point adjustment unit 15 adds a predetermined number of points to the malware initial point when the extracted compiler information is not the compiler information corresponding to the icon image held by the file.

(Specific feature 6)
When a file holds a fake icon and a packer such as UPX (Ultimate Packer for eXecutables) has been used, the sample file is likely to be malware, so a predetermined number of points is added. Specifically, the feature information extraction unit 10 extracts packer-related information from the section information of the file. The point adjustment unit 15 adds a predetermined number of points to the malware initial point when the feature information extraction unit 10 has extracted packer-related information from the predetermined file.
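A packer check based on section information can be sketched as follows. It assumes the section names have already been read from the file and relies on UPX-packed executables typically carrying sections named UPX0 and UPX1; the prefix list and the function name are hypothetical, and other packers would need their own markers.

```python
from typing import Iterable, Tuple

# Assumed markers: UPX-packed executables typically contain sections
# named "UPX0" and "UPX1".
PACKER_SECTION_PREFIXES: Tuple[str, ...] = ("UPX",)


def packer_detected(section_names: Iterable[str]) -> bool:
    """True if any section name starts with a known packer prefix."""
    return any(
        name.rstrip("\x00").upper().startswith(PACKER_SECTION_PREFIXES)
        for name in section_names
    )
```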

(Specific feature 7)
Self-extracting archives normally have a specific icon. When a fake icon takes the form of this icon, it is likely intended to induce the user to execute the file, so points are added in this case. Specifically, when the feature information extraction unit 10 extracts self-extracting archive information from the sample file, a predetermined number of points is added to the malware initial point.

(Specific feature 8)
The file names of legitimate applications tend to consist only of alphanumeric characters and to be short. When the sample file has a very long file name and also holds a fake icon, the aim is presumably to make the user mistake it for a document file, so a predetermined number of points is added. A threshold is set for the number of characters in the file name. This threshold can be set as appropriate and may be stored in the storage unit 20 in advance. Specifically, the feature information extraction unit 10 extracts the file name from the header information of the sample file. The point adjustment unit 15 then adds a predetermined number of points to the malware initial point when the number of characters in the extracted file name exceeds the predetermined number of characters (threshold).

(Specific feature 9)
Part of a file name may be reversed from some midpoint so that the extension is misread, and this technique is sometimes combined with a fake icon. Accordingly, points are added when the file name contains a Unicode control character such as “\x202e” (RLO, Right-to-Left Override) and the file also holds a fake icon. Specifically, the feature information extraction unit 10 extracts the file name from the header information of the sample file. The point adjustment unit 15 then adds a predetermined number of points to the malware initial point when the extracted file name contains a Unicode control character.
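A check for such control characters in a file name can be sketched as follows. The text names only RLO (U+202E); treating the other Unicode bidirectional formatting characters as suspicious too is an assumption of this sketch, as is the function name.

```python
# U+202E is the RLO character named in the text; the other bidirectional
# formatting characters are included here as an assumption.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def has_bidi_control(file_name: str) -> bool:
    """True if the file name contains a bidirectional control character."""
    return any(ch in BIDI_CONTROLS for ch in file_name)
```

For example, the name "invoice" followed by U+202E and "xcod.exe" is rendered with its tail reversed, so it appears to end in "exe.docx" even though the real extension is ".exe".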

(Specific feature 10)
Adding a double extension such as “.doc.exe” to a file name can cause the actual file format to be misidentified, and this technique is sometimes combined with a fake icon. Accordingly, points are added when the file name contains a double extension and the file also holds a fake icon. Specifically, the feature information extraction unit 10 extracts the file name from the header information of the sample file. The point adjustment unit 15 then adds a predetermined number of points to the malware initial point when the extracted file name contains multiple extensions.
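A double-extension check can be sketched as follows. The text only requires that the name contain multiple extensions; this sketch narrows that to a document-type extension followed by an executable extension so that names such as "archive.tar.gz" are not flagged. The two extension lists and the function name are hypothetical.

```python
import os

# Hypothetical extension lists chosen for illustration only.
DOCUMENT_EXTS = {".doc", ".docx", ".pdf", ".xls", ".xlsx", ".txt", ".jpg"}
EXECUTABLE_EXTS = {".exe", ".scr", ".com"}


def has_double_extension(file_name: str) -> bool:
    """True for names like 'report.doc.exe' (document extension, then executable)."""
    stem, last = os.path.splitext(file_name.lower())
    _, inner = os.path.splitext(stem)
    return last in EXECUTABLE_EXTS and inner in DOCUMENT_EXTS
```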

(Specific feature 11)
When malware is disguised as a document file, an attention-grabbing file name tends to be used. In attacks carried out in Japan in particular, Japanese file names tend to be used. Accordingly, points are added when the file name contains double-byte characters and the file also holds a fake icon. Specifically, the feature information extraction unit 10 extracts the file name from the header information of the sample file. The point adjustment unit 15 then adds a predetermined number of points to the malware initial point when the extracted file name contains a double-byte character.
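A double-byte character check can be sketched as follows. The text does not say which encoding defines "double-byte", so this sketch assumes the Windows Japanese code page (cp932) and flags any character that encodes to exactly two bytes there; the function name is hypothetical, and characters that cp932 cannot represent are ignored.

```python
def has_double_byte_char(file_name: str, encoding: str = "cp932") -> bool:
    """True if any character of the name encodes to two bytes in the given encoding."""
    for ch in file_name:
        try:
            if len(ch.encode(encoding)) == 2:
                return True
        except UnicodeEncodeError:
            continue  # not representable in this encoding; ignored in this sketch
    return False
```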

The eleven specific features of a file have been described above. The feature information extraction unit 10 may extract all eleven items or any combination of some of them. The point threshold may be set as appropriate according to the number of items used.

[Malware judgment flow]
The flow in which the information processing apparatus determines whether a sample file is malware will be described with reference to FIG. 10. FIG. 10 is a diagram for explaining the flow in which an information processing apparatus according to another embodiment of the present invention determines malware. S501, S502, S503, and S504 of the present embodiment correspond to S301, S302, S303, and S304 of the second embodiment. The steps from S504 onward are described in detail here.

If the general determination unit 13 determines that the sample file is malware (Yes in S504), the initial point setting unit 14 sets the malware initial point (S505). Here, the malware initial point is assumed to be 90 points.

Next, it is determined whether the sample file satisfies a predetermined condition (S506), that is, whether the sample file has any of the specific features of malware described above. If the condition is satisfied (Yes in S506), the point adjustment unit 15 adds a predetermined number of points to the malware initial point of 90 points (S507). If the condition is not satisfied (No in S506), no points are added, so the total remains at the malware initial point of 90 points. This example assumes that the point threshold is larger than the malware initial point. Therefore, when no points are added, the total is below the point threshold, and the sample file is determined to be normal software (S510).

If the total after points are added to the malware initial point exceeds the point threshold (95 points here) (Yes in S508), the sample file is determined to be malware (S509), and the flow ends. If the total is equal to or less than the point threshold (No in S508), the sample file is determined to be normal software (S510), and the flow ends.

The present embodiment provides the same effects as the first to third embodiments.

On rare occasions, a file that the determination unit 12 and the general determination unit 13 have determined to be malware is actually normal software that has been misjudged. A malware initial point is therefore set for any file that the determination unit 12 and the general determination unit 13 have determined to be malware. Each piece of feature information is then extracted from the file, and points are added to the malware initial point when the file has a specific feature of malware. Only when the resulting total exceeds the point threshold is the sample file determined to be malware. The present embodiment thus has the effect of suppressing false detections by the determination unit 12 and the general determination unit 13.

<Fifth Embodiment>
[Configuration of information processing device]
The information processing apparatus 4 will be described with reference to FIG. 11. FIG. 11 is a conceptual diagram of an information processing apparatus according to another embodiment of the present invention. In addition to the configuration of the third embodiment, the information processing apparatus 4 according to the present embodiment includes a digitizing unit 18. The differences from the third embodiment are described in detail here.

The digitizing unit 18 digitizes the icon image of a predetermined file that has been determined to be malware. In this example, the digitizing unit 18 digitizes the icon image of a file that the threshold determination unit 16 has determined to be malware, and the numerical value storage unit 21 stores the resulting numerical value. The digitizing unit 18 is not limited to this, however, and may digitize the icon image of a file that the determination unit 12 or the general determination unit 13 has determined to be malware.

As a result, numerical values obtained by digitizing the icon images of previously unknown files that the threshold determination unit 16 has determined to be malware are also stored. In other words, the icon image of an unknown file determined to be malware also becomes data against which the numerical value extracted from a sample file is compared. There is, however, a slight possibility that an unknown file determined to be malware is actually normal software. To ensure that the stored data is suitable as comparison data, an extremely high point threshold may therefore be applied to the files that the digitizing unit 18 digitizes.
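The feedback of newly detected icons into the stored comparison data can be sketched as follows. The stricter registration threshold of 120 points, the function name, and the use of a plain list as the store are hypothetical; the text says only that an extremely high point threshold may be used.

```python
from typing import List


def register_if_confident(
    icon_value: int,
    points: int,
    store: List[int],
    registration_threshold: int = 120,  # hypothetical "extremely high" threshold
) -> bool:
    """Add the digitized icon to the store only when the score is very high."""
    if points > registration_threshold and icon_value not in store:
        store.append(icon_value)
        return True
    return False
```

Keeping the registration threshold separate from the detection threshold means that a file can be reported as malware without its icon being promoted to comparison data.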

The present embodiment provides the same effects as the first to fourth embodiments.

In the present embodiment, the icon image of an unknown file determined to be malware also becomes data against which the numerical value extracted from a sample file is compared. The number of numerical values stored in the numerical value storage unit 21 therefore increases, so that a sample can be checked against a larger body of comparison data.

The methods according to the above embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may contain program instructions, data files, data structures, and the like, alone or in combination. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks (registered trademark), and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine-language code such as that generated by a compiler but also high-level language code executed by a computer using an interpreter or the like.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from its spirit.

1, 2, 3: Information processing apparatus
10: Feature information extraction unit
11: Feature vector generation unit
12: Determination unit
13: General determination unit
14: Initial point setting unit
15: Point addition/subtraction unit
16: Threshold determination unit
17: Tampered icon determination unit
18: Digitizing unit
20: Storage unit
21: Numerical value storage unit
22: Point threshold storage unit
27: Network
30: User terminal
33: Server

Claims

An information processing apparatus comprising:
a feature information extraction unit that extracts the binary of an icon image from a resource of a predetermined file;
a feature vector generation unit that generates a feature vector from the extracted binary of the icon image; and
a determination unit that uses the feature vector to determine, by machine learning, whether the predetermined file is malware.

The information processing apparatus according to claim 1, wherein the feature information extraction unit extracts a numerical value from the icon image, the apparatus further comprising:
a numerical value storage unit that stores numerical values obtained by digitizing disguised icon images; and
a general determination unit that determines whether the predetermined file is malware based on the numerical value extracted by the feature information extraction unit and the numerical values stored in the numerical value storage unit.

The information processing apparatus according to claim 1, further comprising:
an initial point setting unit that sets a predetermined malware initial point when at least one of the determination unit and the general determination unit determines that the predetermined file is malware;
a point threshold storage unit that stores a point threshold serving as an index for determining whether the predetermined file is malware;
a point addition/subtraction unit that adds a predetermined point to, or subtracts a predetermined point from, the malware initial point when the predetermined file satisfies a predetermined condition; and
a threshold determination unit that determines whether the points calculated by the point addition/subtraction unit exceed the point threshold stored in the point threshold storage unit, and determines that the predetermined file is malware when the point threshold is exceeded.

The information processing apparatus according to claim 1, further comprising a digitizing unit that digitizes the icon image of a predetermined file determined to be malware,
wherein the numerical value storage unit stores the numerical value produced by the digitizing unit.

An information processing method in which a computer:
extracts the binary of an icon image from a resource of a predetermined file;
generates a feature vector from the extracted binary of the icon image; and
uses the feature vector to determine, by machine learning, whether the predetermined file is malware.

A program that causes a computer to:
extract the binary of an icon image from a resource of a predetermined file;
generate a feature vector from the extracted binary of the icon image; and
use the feature vector to determine, by machine learning, whether the predetermined file is malware.

The program according to claim 6, which causes the computer to extract the binary at predetermined intervals.
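One possible reading of extraction at a predetermined interval is to sample every n-th byte of the icon binary and use the scaled values as the feature vector. The sketch below illustrates that reading only. The function name `feature_vector`, the default interval of 16, the fixed output length, and the zero padding are assumptions made for illustration and do not appear in the claims.

```python
from typing import List


def feature_vector(icon_binary: bytes, interval: int = 16, length: int = 64) -> List[float]:
    """Sample every `interval`-th byte, scale to [0, 1], then pad or truncate to `length`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Slicing with a step picks bytes at offsets 0, interval, 2 * interval, ...
    sampled = [b / 255.0 for b in icon_binary[::interval]][:length]
    # Zero padding gives the classifier a fixed-size input (an assumed design choice).
    return sampled + [0.0] * (length - len(sampled))
```

A fixed-length vector lets icons of different sizes be passed to the same classifier.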

The program according to claim 6 or 7, which further causes the computer to:
extract a numerical value from the icon image;
store numerical values obtained by digitizing disguised icon images; and
determine whether the predetermined file is malware based on the extracted numerical value and the stored numerical values.

The program according to claim 6 or claim 8, which further causes the computer to:
set a predetermined malware initial point when at least one of the determination by machine learning using the feature vector and the determination based on the extracted numerical value and the stored numerical values finds that the predetermined file is malware;
store a point threshold serving as an index for determining whether the predetermined file is malware;
add a predetermined point to, or subtract a predetermined point from, the malware initial point when the predetermined file satisfies a predetermined condition; and
determine whether the points calculated by the addition or subtraction exceed the stored point threshold, and determine that the predetermined file is malware when the point threshold is exceeded.
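The point-based determination recited above can be illustrated with a short sketch. Everything concrete in it is an assumption: the name `is_malware`, the initial point of 50, the threshold of 100, the per-condition point values, and the `signed` metadata key (included only to show a subtraction) are hypothetical. Treating a file that neither first-stage check flags as not malware is also an assumption.

```python
from typing import Callable, Dict, Iterable, Tuple

# (predicate over file metadata, points to add; a negative value subtracts)
Condition = Tuple[Callable[[Dict], bool], int]


def is_malware(
    info: Dict,
    ml_verdict: bool,
    lookup_verdict: bool,
    conditions: Iterable[Condition],
    initial_points: int = 50,  # assumed value
    threshold: int = 100,      # assumed value
) -> bool:
    # The initial point is set only when at least one first-stage check flags the file.
    if not (ml_verdict or lookup_verdict):
        return False  # assumption: unflagged files are treated as not malware
    points = initial_points
    for predicate, delta in conditions:
        if predicate(info):
            points += delta
    # The file is judged malware only when the total exceeds the stored threshold.
    return points > threshold


# Hypothetical conditions loosely modelled on the dependent claims.
CONDITIONS = [
    (lambda f: f.get("packer") is not None, 30),  # packer information found
    (lambda f: f.get("icon_count", 0) < 2, 30),   # fewer icons than expected
    (lambda f: f.get("signed", False), -40),      # illustrative subtraction only
]
```

For example, `is_malware({"packer": "UPX", "icon_count": 1}, True, False, CONDITIONS)` totals 110 points and returns `True` under these assumed values.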

The program according to claim 9, which further causes the computer to:
digitize the icon image of the predetermined file determined to be malware; and
store the resulting numerical value.

The program according to claim 9, which further causes the computer to:
determine whether the icon image of the predetermined file is a legitimate icon image; and
add a predetermined point to the malware initial point when the icon image of the predetermined file is determined not to be a legitimate icon image.

The program according to claim 9, which further causes the computer to:
extract the number of icons held in the resource of the predetermined file; and
add a predetermined point to the malware initial point when the extracted number of icons is less than a predetermined number.

The program according to claim 9, which further causes the computer to:
extract version information from the predetermined file; and
add a predetermined point to the malware initial point when the icon image of the predetermined file does not correspond to the version information.

The program according to claim 9, which further causes the computer to:
extract programming language information from the predetermined file; and
add a predetermined point to the malware initial point when the extracted programming language information does not correspond to the icon image held in the predetermined file.

The program according to claim 9, which further causes the computer to:
extract compiler information from the predetermined file; and
add a predetermined point to the malware initial point when the extracted compiler information does not correspond to the icon image held in the predetermined file.

The program according to claim 9, which further causes the computer to:
extract information about a packer from the predetermined file; and
add a predetermined point to the malware initial point when information about a packer is extracted from the predetermined file.

The program according to claim 9, which further causes the computer to:
extract self-extracting archive information from the predetermined file; and
add a predetermined point to the malware initial point when self-extracting archive information is extracted from the predetermined file.

The program according to claim 9, which further causes the computer to:
extract the file name from the predetermined file; and
add a predetermined point to the malware initial point when the number of characters in the extracted file name exceeds a predetermined number of characters.

The program according to claim 9, which further causes the computer to:
extract the file name from the predetermined file; and
add a predetermined point to the malware initial point when the extracted file name contains a Unicode control character.

The program according to claim 9, which further causes the computer to:
extract the file name from the predetermined file; and
add a predetermined point to the malware initial point when the extracted file name contains a plurality of extensions.

The program according to claim 9, which further causes the computer to:
extract the file name from the predetermined file; and
add a predetermined point to the malware initial point when the extracted file name contains a two-byte character.
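The four file-name conditions recited above (length, Unicode control characters, multiple extensions, two-byte characters) might be checked along the following lines. This is a sketch under assumptions: `filename_points`, the 64-character limit, the extension list, and the 10 points per hit are hypothetical. Treating Unicode category `Cf` as "control characters" and East Asian wide or fullwidth characters as "two-byte characters" are approximations chosen for illustration.

```python
import unicodedata

MAX_NAME_LENGTH = 64  # assumed limit; the claims only say "a predetermined number"
# Illustrative extension list, not taken from the source.
KNOWN_EXTENSIONS = {"exe", "scr", "pdf", "doc", "docx", "txt", "jpg", "zip"}


def filename_points(name: str, points_per_hit: int = 10) -> int:
    """Return the points to add for suspicious file-name traits."""
    hits = 0
    # 1. Overlong names can push the real extension out of view.
    if len(name) > MAX_NAME_LENGTH:
        hits += 1
    # 2. Format-control characters (category Cf) include U+202E RIGHT-TO-LEFT
    #    OVERRIDE, which can visually reverse part of a name.
    if any(unicodedata.category(ch) == "Cf" for ch in name):
        hits += 1
    # 3. More than one recognised extension, e.g. "invoice.pdf.exe".
    parts = name.lower().split(".")
    if sum(1 for p in parts[1:] if p in KNOWN_EXTENSIONS) >= 2:
        hits += 1
    # 4. Wide or fullwidth characters as a proxy for two-byte characters.
    if any(unicodedata.east_asian_width(ch) in ("W", "F") for ch in name):
        hits += 1
    return hits * points_per_hit
```

The returned value would then be added to the malware initial point before the threshold comparison.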

A computer-readable recording medium on which the program according to any one of claims 6 to 21 is recorded.