JP2008192122A - Malicious mail detector, detecting method and program

Info

Publication number: JP2008192122A
Application number: JP2007137453A
Authority: JP
Inventors: Takahide Sugita; 貴英杉田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-01-09
Filing date: 2007-05-24
Publication date: 2008-08-21

Abstract

PROBLEM TO BE SOLVED: To provide a malicious mail detection technology that detects spam mail and malware even when part of the data included in a packet has been intentionally modified.

SOLUTION: The malicious mail detector 50 has a storage means 170 that stores the feature amount of a character string pattern included in mail used for comparison; a computing means 120 that computes the feature amount of the character string pattern within a predefined computation range of received mail; and determining means 130, 140 that determine whether the mail is malicious mail from the similarity between the feature amount of the character string pattern of the received mail computed by the computing means and the feature amount of the stored character string pattern.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a malicious mail detection device that detects mail sent with malicious intent, and in particular to a malicious mail detection technology that can detect malicious mail even when part of the data included in a packet has been intentionally changed.

With the spread of the Internet, the damage caused by spam mail has become serious. Spam mail is so-called junk mail that is sent unilaterally without regard for the convenience of the recipient. When spam mail is received, resources of the receiving computer such as hard disk space and memory are consumed needlessly, and a load is also placed on the network to which the computer is connected. Spam mail is now one of the threats on networks, and various countermeasures have been studied.

One conventional countermeasure is pattern matching, in which received mail is checked against patterns registered in advance. This method (URL filtering) judges a received mail to be spam when it contains a specific character string such as a URL. It exploits the fact that spam mail often shows the URL of a site to which the sender wants to lead the recipient. Specifically, a URL included in the mail is compared with URLs registered in advance in a database, and the mail is regarded as spam when the URLs match exactly.

However, such conventional methods require ongoing maintenance, for example because the administrator must manage the database used for matching, and therefore incur operating costs. The spam detection rate was also low.

Against this background, several techniques have been proposed to raise the spam detection rate and reduce the burden on administrators.

For example, a technique has been proposed that judges a mail to be junk mail when the mail flow count exceeds a predetermined threshold or when part of a judgment word is contained in the body of the received mail, and that registers the URL in a mail judged to be junk as a judgment word candidate (Patent Document 1). According to the invention of Patent Document 1, a mail can be judged to be junk not only when a URL matches exactly but also when part of a judgment word matches, so detection accuracy improves. In addition, because the URL registration means extracts URLs from junk mail and generates the blacklist directly, the burden on the administrator is reduced.

A spam detection technique has also been proposed that detects and prevents spam by creating additional features to facilitate spam detection (for example, character N-gram features based on N-grams), creating features for the entropy of character sequences, and training a filter with a machine learning system (Patent Document 2). The invention of Patent Document 2 improves spam detection accuracy by including additional features beyond those generally used by prior-art spam filters.

Patent Document 1: JP 2005-208780 A
Patent Document 2: JP 2005-018745 A

However, even with these proposals, it has been difficult to detect spam mail reliably. For example, the invention of Patent Document 1 can detect a mail that contains part of a judgment word, but it can no longer detect the mail once part of the URL has been intentionally changed. The invention of Patent Document 2 improves spam detection accuracy by training a filter with a machine learning system, but it cannot detect spam mail in which part of the data included in a packet has been intentionally changed.

In other words, conventional detection methods, including the inventions of Patent Document 1 and Patent Document 2, presuppose that the time needed to register a URL contained in spam mail is shorter than the cycle at which new spam mail appears. Consequently, when spam mail arrives in which a known URL has been intentionally altered in part while the body is left almost unchanged, URL registration in the database and filter updates cannot keep up, and the spam mail slips through URL filtering.

The problem that detection is evaded by intentionally changing part of the data included in a packet is not limited to spam mail. For example, conventional detection methods, including the inventions of Patent Document 1 and Patent Document 2, also have difficulty detecting malware (malicious software). There are several reasons for this.

The first reason is that variants of malware appear one after another, so the creation of pattern files cannot keep up.

The second reason is that, although there are many kinds of variants, the absolute number of samples of each variant is small. Samples are therefore hard to obtain, which in turn makes pattern files hard to create.

The third reason is that malware replicates itself during infection and sends the copy to the computer to be infected, and in doing so it changes part of its own data to avoid detection. Such copies are difficult to detect with conventional methods, including the inventions of Patent Document 1 and Patent Document 2.

Furthermore, as described above, a computer that receives spam mail has resources such as hard disk space and memory consumed needlessly, and a load is placed on the network to which it is connected. For example, when spam mail is to be detected in an environment with heavy traffic, managing the sessions of received mail and inspecting mail bodies require a large amount of memory, which raises processing costs. Raising the hardware specifications to solve this in turn raises the cost of the device.

Accordingly, the problem to be solved by the present invention is to provide a malicious mail detection technology that can detect spam mail and malware even when part of the data included in a packet has been intentionally changed.

A further problem to be solved by the present invention is to provide a malicious mail detection technology that makes the detection processing more efficient and keeps processing costs down.

A first invention for solving the above problems is characterized by comprising: storage means for storing the feature amount of a character string pattern included in mail used for comparison; calculation means for calculating the feature amount of a character string pattern in a predefined calculation target range of received mail; and determination means for determining whether the mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated by the calculation means and the stored feature amount of the character string pattern.

A second invention for solving the above problems is characterized by comprising: a calculation step of calculating the feature amount of a character string pattern in a predefined calculation target range of received mail; and a determination step of determining whether the mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated in the calculation step and a stored feature amount of a character string pattern.

A third invention for solving the above problems is a program for an information processing apparatus, characterized in that the program causes the information processing apparatus to execute: a calculation process of calculating the feature amount of a character string pattern in a predefined calculation target range of received mail; and a determination process of determining whether the mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated in the calculation process and a stored feature amount of a character string pattern.

According to the present invention, malicious mail can be detected even when part of the data has been intentionally changed. This is because the malicious mail detection device of the present invention comprises storage means for storing the feature amount of a character string pattern included in mail used for comparison, calculation means for calculating the feature amount of a character string pattern included in received mail, and determination means for determining whether the received mail is malicious mail based on the similarity between the feature amount calculated by the calculation means and the feature amount stored in advance.

Next, a first embodiment of the present invention will be described with reference to FIG. 1, FIG. 2, and FIG. 3, which are block diagrams of the first embodiment.

FIG. 1 shows an example of a communication system using the malicious mail detection method of this embodiment. The detection device 50 is connected to terminals 51, 52, and 53 on the same network segment. The detection device 50 receives all packets flowing on that segment; on a typical network interface card this is achieved with a setting called promiscuous mode. The detection device 50 analyzes layer 7 (the seventh layer of the OSI model, hereinafter also "L7") of the packets of terminals 51, 52, and 53, and records the analysis results in a log.

FIG. 2 shows another example of a communication system using the malicious mail detection method of this embodiment. The communication device 60 and terminals 61, 62, and 63 are connected on the same network segment. The communication device 60 is a device that provides L7-level services, such as a proxy or a load balancer. The communication device 60 may instead be a switch or hub that performs L2-level communication control, a router that performs L3-level communication control, or a firewall that performs L4-level communication control. The communication device 60 restricts packets (discards or blocks them) according to the analysis results and records the restricted packets in a log.

FIG. 3 is a block diagram showing the configuration of the first embodiment of the present invention. This embodiment may be applied to either of the communication systems of FIG. 1 and FIG. 2.

Referring to FIG. 3, the malicious mail detection method according to the first embodiment of the present invention comprises a protocol determination unit 100, a specifying unit 110, a target character instruction unit 160, an n-gram processing unit 120, a determination unit 130, a detection unit 140, a storage unit 150, a reference pattern storage unit 170, and an action processing unit 180.

The storage unit 150 is a storage medium that stores the analysis targets the user has specified in advance for judging the similarity of packets. The storage unit 150 contains a definition table 151. The definition table 151 specifies in advance, for each protocol, what kind of similarity the detection device 50 or the communication device 60 analyzes. For example, for packets on TCP port 25 (SMTP), it specifies whether to examine the similarity of URLs contained in the mail, the similarity of attached files for malware detection, or the similarity of the mail itself by identifying the mail from the appearance frequency of characters. The definition table 151 also specifies the number of characters (n) in the counting unit, that is, the number of characters into which the character string to be counted is divided. For example, n = 1 is set when the character string is to be divided and counted one character at a time.
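As an illustration only, the definition table 151 could be modelled as a per-protocol record like the one below. The field names, the `DefinitionEntry` class, and the string values are hypothetical; the source states only that the table specifies the kind of similarity to analyze, the characters to count, and the counting unit n for each protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionEntry:
    """One row of a hypothetical definition table (field names are assumptions)."""
    protocol: str          # L7 protocol the entry applies to, e.g. "SMTP"
    content_type: str      # Content-type that triggers this analysis
    analysis_target: str   # "url", "attachment", or "char_frequency"
    target_chars: str      # which characters are counted
    n: int                 # number of characters per counting unit


# Assumed example: analyze URL similarity in plain-text SMTP mail with n = 1.
DEFINITION_TABLE = {
    "SMTP": DefinitionEntry(
        protocol="SMTP",
        content_type="text/plain",
        analysis_target="url",
        target_chars="alphabet+symbols",
        n=1,
    ),
}

entry = DEFINITION_TABLE["SMTP"]
print(entry.analysis_target, entry.n)
```

Keying the table by protocol mirrors the statement that the analysis is specified per protocol; a real table might need several entries per protocol, which this sketch does not cover.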

On receiving a packet, the protocol determination unit 100 determines the L7 protocol of the packet. The protocol can also be determined from stream data reconstructed from packets, provided that the information needed for the determination, such as the TCP port number, is attached. TCP port 80 is determined to be HTTP, and TCP port 25 is determined to be SMTP.
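The port-based determination can be sketched as a simple lookup. This is a minimal sketch that assumes only the two ports named in the text are known; the function name and the `None` result for unknown ports are illustrative choices, not part of the source.

```python
from typing import Optional

# Only the two mappings stated in the text; anything else is treated as unknown.
L7_PROTOCOL_BY_TCP_PORT = {80: "HTTP", 25: "SMTP"}


def determine_l7_protocol(tcp_port: int) -> Optional[str]:
    """Return the L7 protocol name for a TCP port, or None if it is not listed."""
    return L7_PROTOCOL_BY_TCP_PORT.get(tcp_port)


print(determine_l7_protocol(25))
print(determine_l7_protocol(80))
```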

The specifying unit 110 refers to the result from the protocol determination unit 100 and to the definition table 151, and parses the data contained in the packet. Specifically, when the definition table 151 specifies SMTP as the protocol and text/plain as the Content-type, the specifying unit 110 concludes that the similarity of URLs contained in the mail is to be analyzed. It then searches for the characters "http" and identifies the start and end positions of the URL. The resulting data range to be counted is passed to the n-gram processing unit 120, which performs the actual counting.
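One way to picture the range identification is the sketch below. It assumes that the Content-type has already been extracted from the headers and that a URL ends at the first whitespace character; the source says only that the unit searches for "http" and identifies the start and end positions, so the end-of-URL rule and the function name are assumptions.

```python
import re
from typing import Optional, Tuple

# Assumption: a URL starts at "http" and runs until the next whitespace.
URL_PATTERN = re.compile(r"http\S*")


def locate_url(content_type: str, body: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first URL in body, or None.

    Only text/plain bodies are examined, matching the example in the text.
    """
    if content_type.strip().lower() != "text/plain":
        return None
    match = URL_PATTERN.search(body)
    if match is None:
        return None
    return match.start(), match.end()


body = "Visit http://example3.com/ex1/top.html now"
span = locate_url("text/plain", body)
if span is not None:
    start, end = span
    print(body[start:end])
```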

Here, the counting process is the process of counting the character strings contained in a character string pattern. A character string pattern is a phrase made up of character strings or characters, and falls into one of two broad classes: a "character string pattern included in received mail" and a "character string pattern registered in advance by the user". A "character string pattern included in received mail" is a phrase specified by a predefined range in the body of the received mail or in a file attached to it. A "character string pattern registered in advance by the user" is a characteristic phrase contained in mail that the user has registered in advance for comparison. A character string is a unit of one or more characters obtained by dividing a character string pattern every n characters. Strictly speaking, when n = 1 each unit is a character rather than a character string, but in the following description of the embodiments such a unit may still be called a character string for convenience.

The target character instruction unit 160 refers to the definition table 151 and identifies the characters to be counted. If the definition table 151 specifies alphabetic characters and symbols as the counting targets, the target character instruction unit 160 identifies alphabetic characters and symbols as the characters to be counted. The identified characters are passed to the n-gram processing unit 120.

The n-gram processing unit 120 calculates the feature amount of the character string pattern contained in the received mail, based on the characters to be counted, the counting range, and the unit (n) for dividing character strings, as notified by the specifying unit 110 and the target character instruction unit 160. Here, the feature amount is the set of appearance frequencies of the character strings of one or more characters that make up the character string pattern.

The n-gram processing unit 120 calculates the feature amount as follows. First, the character string pattern contained in the received mail is divided into strings of n adjacent characters, and for each resulting string the number of times it occurs in the character string pattern is counted. Each count is then divided by the total number of strings obtained by the division, which gives the appearance frequency of each string. For example, when n = 1 and the character string pattern is the URL http://example3.com/ex1/top.html, the n-gram processing unit 120 outputs the calculated feature amount as a counting table such as the one shown in the example of FIG. 7.
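The calculation just described can be sketched as follows. This is a minimal sketch under two assumptions that the text does not settle: strings of n adjacent characters are taken as overlapping windows (the usual n-gram convention), and the filtering by target characters is left out. The function name is hypothetical. For n = 1 the window choice makes no difference.

```python
from collections import Counter
from typing import Dict


def ngram_frequencies(pattern: str, n: int = 1) -> Dict[str, float]:
    """Return the appearance frequency of each n-character string in pattern."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: overlapping windows of n adjacent characters.
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)  # total number of strings obtained by the division
    return {gram: count / total for gram, count in counts.items()}


url = "http://example3.com/ex1/top.html"
features = ngram_frequencies(url, n=1)
print(features["e"])  # "e" occurs 3 times among 32 characters
```

Strings that never occur are simply absent from the returned mapping, which corresponds to an appearance frequency of 0 in the counting table.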

This embodiment adopts the n-gram method, in which the character string to be counted is divided into strings of n characters, and examines, for each resulting string, the rate at which it appears. The unit (n) for dividing character strings is specified by the user in the definition table.

The reference pattern storage unit 170 is a storage area for registering character string patterns contained in malicious mail. Specifically, the user registers in advance the feature amounts of character string patterns contained in mail that the user has identified as malicious. The appearance frequency of each character making up the character string pattern is also calculated and registered beforehand. That is, as shown in the example of FIG. 5, a plurality of reference patterns, one per character string pattern, are stored, each associating the character strings that make up the pattern with their appearance frequencies.

The determination unit 130 searches the reference pattern storage unit 170 for a reference pattern whose appearance pattern contains character strings that have been observed in the counting table. Here, a "character string that has been observed" is one whose appearance frequency is greater than 0, and an "appearance pattern" is the combination of characters whose appearance frequency is greater than 0. For example, the appearance pattern in the counting table of FIG. 7 is [a, c, e], so reference patterns that contain at least "a", "c", or "e" are extracted from the reference pattern storage unit 170.
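The candidate search can be illustrated with the sketch below. It reads the text as "extract every reference pattern that shares at least one observed string with the counting table"; that reading, the function names, and the sample reference patterns are assumptions made for illustration.

```python
from typing import Dict, List, Set

FrequencyTable = Dict[str, float]


def appearance_pattern(table: FrequencyTable) -> Set[str]:
    """Strings whose appearance frequency is greater than 0."""
    return {s for s, freq in table.items() if freq > 0}


def candidate_references(
    count_table: FrequencyTable, references: Dict[str, FrequencyTable]
) -> List[str]:
    """Names of reference patterns sharing at least one observed string."""
    observed = appearance_pattern(count_table)
    return [
        name
        for name, ref in references.items()
        if observed & appearance_pattern(ref)
    ]


# Counting table restricted to a..e, as in the [a, c, e] example.
count_table = {"a": 1 / 32, "b": 0.0, "c": 1 / 32, "d": 0.0, "e": 3 / 32}
# Hypothetical reference patterns, not taken from FIG. 5.
references = {
    "ref_1": {"a": 0.1, "e": 0.2},
    "ref_2": {"b": 0.5, "d": 0.5},
}
print(candidate_references(count_table, references))
```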

When the determination unit 130 finds a reference pattern containing a character string observed in the counting table, the detection unit 140 then compares the counting table with the reference pattern stored in the reference pattern storage unit 170 and determines whether the feature amount of the character string pattern of the received mail is similar to the pre-registered feature amount of a character string pattern contained in mail identified as malicious.

Specifically, the detection unit 140 calculates, for each character string, the difference between its appearance frequency in the counting table and its appearance frequency in the reference pattern. It then compares the average of these per-string differences with a threshold held by the detection unit. If the average difference is smaller than the threshold, the feature amounts are regarded as similar, and action processing such as restricting the packet or recording it in a log is performed.
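The comparison can be sketched as below. The source speaks only of a "difference" per character string; this sketch assumes the absolute difference, averages over the union of strings present in either table (treating a missing string as frequency 0), and uses made-up frequencies and threshold values. The function names are hypothetical.

```python
from typing import Dict

FrequencyTable = Dict[str, float]


def mean_difference(count_table: FrequencyTable, reference: FrequencyTable) -> float:
    """Average absolute difference in appearance frequency per string."""
    keys = set(count_table) | set(reference)
    if not keys:
        raise ValueError("both tables are empty")
    total = sum(
        abs(count_table.get(k, 0.0) - reference.get(k, 0.0)) for k in keys
    )
    return total / len(keys)


def is_similar(
    count_table: FrequencyTable, reference: FrequencyTable, threshold: float
) -> bool:
    """True when the average difference is strictly below the threshold."""
    return mean_difference(count_table, reference) < threshold


# Illustrative values only.
received = {"a": 0.25, "b": 0.75}
registered = {"a": 0.5, "b": 0.5}
print(mean_difference(received, registered))
print(is_similar(received, registered, threshold=0.3))
```

Because a small change to a URL shifts only a few frequencies slightly, the average stays below the threshold, which is the property the detection relies on.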

Next, the operation of the malicious mail detection device of the present invention will be described with reference to the flowchart of FIG. 4, which shows the processing of the malicious mail detection method.

The description here takes as an example the case in which malicious mail is detected from the similarity between the character string pattern of a URL registered in advance as malicious and a URL contained in a received packet (mail).

It is also assumed that the URL contained in the mail is the character string pattern shown in FIG. 6, that the number of characters n in the counting unit is specified as 1 (n = 1), and that the character string pattern is therefore divided one character at a time.

When the detection device receives a packet flowing on the network (step S1), the protocol determination unit 100 performs protocol determination at the L7 level (step S2). In this example, the protocol determination unit 100 determines that the packet uses the SMTP protocol. The protocol determination may instead be performed at the L2, L3, or L4 level.

Next, the specifying unit 110 analyzes the data contained in the packet and identifies the counting range. Specifically, the specifying unit 110 refers to the definition table 151, and when the data contains the line "Content-type: text/plain" as shown in FIG. 6, it searches for the characters "http" based on the information defined in the definition table 151 and notifies the n-gram processing unit 120 of the start and end positions of the URL (step S3). It also notifies the n-gram processing unit 120 of the number of characters in the counting unit (n = 1).

Next, the target character instruction unit 160 identifies the characters to be counted. In this case, the definition table 151 specifies alphabetic characters and symbols as the counting targets, so alphabetic characters and symbols are identified as the characters to be counted. This information is passed to the n-gram processing unit 120 (step S4).

Subsequently, the n-gram processing unit 120 counts the characters in response to the instructions from the specifying unit 110 and the target character instruction unit 160 (step S5). Specifically, because the URL contained in the received mail is http://example3.com/ex1/top.html as shown in FIG. 6, the characters are counted in order from the beginning of the URL: h: +1 (total 1), t: +1 (total 1), t: +1 (total 2), p: +1 (total 1), and so on. The final counts are, for example, a: total 1, b: total 0, c: total 1, d: total 0, e: total 3.

In the end, the number of occurrences (counter value) of each character is as shown in the counter column of the counting table in FIG. 7. For normalization, the number of occurrences of each character is then divided by the total number of characters in the character string pattern being counted, which gives the appearance frequency. In this embodiment, the URL of FIG. 6 has 32 characters in total, so each count is divided by 32 to obtain the per-string appearance frequencies shown in the "appearance frequency" column of the counting table in FIG. 7. The calculated numbers of occurrences and appearance frequencies are stored in the counting table.

Next, the determination unit 130 searches the reference pattern storage unit 170 for a reference pattern whose appearance pattern contains at least one of the character strings that are stored in the counting table and have been observed. In this case, as shown in FIG. 7, the observed character strings (characters) are [a, c, e], so the reference pattern storage unit 170 is searched for reference patterns that contain at least one of "a", "c", and "e". Here, the reference pattern shown in FIG. 5 is extracted.

After the determination unit 130 finishes, if a reference pattern close to the appearance pattern in the count table has been found, the detection unit 140 calculates, for each character string, the difference between the appearance frequency in the count table and the appearance frequency in the reference pattern (step S7). The detection unit 140 then compares the average of these per-string differences with a threshold; if the average is smaller than the threshold (step S8), it determines that the patterns are similar, that is, that malicious mail has been detected. The action processing unit 380 then performs action processing such as restricting packets or recording to a log (step S9).
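The comparison in steps S7 and S8 can be sketched as below. The description does not say whether the difference is signed or absolute, nor which characters enter the average; this sketch assumes the absolute difference taken over every character present in either pattern. The function names, sample frequencies, and threshold values are hypothetical.

```python
def mean_abs_difference(observed, reference):
    """Average of per-character absolute frequency differences.

    Assumption: characters missing from one side count as frequency 0.
    """
    keys = set(observed) | set(reference)
    if not keys:
        return 0.0
    total = sum(abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)
    return total / len(keys)


def is_similar(observed, reference, threshold):
    """Similar (malicious mail detected) when the average is below the threshold."""
    return mean_abs_difference(observed, reference) < threshold


observed = {"a": 1 / 32, "c": 1 / 32, "e": 3 / 32}
reference = {"a": 1 / 32, "c": 2 / 32, "e": 3 / 32}
print(is_similar(observed, reference, threshold=0.02))  # True: average is 1/96
print(is_similar(observed, reference, threshold=0.01))  # False
```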

The malicious mail detection device configured as described above can detect malicious mail even when part of the URL has been intentionally changed.

In the first embodiment, the determination unit 130 first extracts a reference pattern that includes the appearance pattern of the character string pattern in the mail, and the detection unit 140 then calculates the differences between the frequency column of the count table and the frequency column of the reference pattern and decides whether the mail is malicious from the magnitude relationship between the average of those differences and a stored threshold. Alternatively, the reference pattern extraction may be omitted and the determination unit 130 and the detection unit 140 made to operate at the same time. That is, the differences between the frequency column of the count table and the frequency column of the reference pattern may be calculated, and the similarity between the character string pattern of the received mail and the stored character string pattern judged directly from the magnitude relationship between the average difference and the stored threshold.

Also, in the first embodiment, the difference between the appearance frequency in the count table and the appearance frequency in the reference pattern is calculated for each character string, the average of these differences is compared with a threshold, and similarity is found when the average is smaller than the threshold. However, the invention is not limited to this. After the per-string differences are calculated, their maximum may instead be compared with a threshold, with similarity found when the maximum is smaller than the threshold. Similarity may also be found when the degree of deviation of the difference values is smaller than a predetermined threshold.
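The two alternative criteria can be sketched as follows. The description does not define the "degree of deviation"; this sketch assumes the population standard deviation of the absolute differences, which is only one possible interpretation. All function names and sample values are hypothetical.

```python
import statistics


def differences(observed, reference):
    """Per-character absolute frequency differences (missing entries count as 0)."""
    keys = sorted(set(observed) | set(reference))
    return [abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys]


def similar_by_max(observed, reference, threshold):
    """Similar when the largest per-character difference is below the threshold."""
    return max(differences(observed, reference), default=0.0) < threshold


def similar_by_deviation(observed, reference, threshold):
    """Similar when the spread of the differences is below the threshold.

    Assumption: spread is measured with the population standard deviation.
    """
    diffs = differences(observed, reference)
    spread = statistics.pstdev(diffs) if diffs else 0.0
    return spread < threshold


observed = {"a": 0.1, "b": 0.2}
reference = {"a": 0.1, "b": 0.5}
print(similar_by_max(observed, reference, 0.4))        # True: largest difference is about 0.3
print(similar_by_deviation(observed, reference, 0.1))  # False: spread is about 0.15
```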

The first embodiment focused on URLs as the character string pattern contained in malicious mail and described, as an example, deciding whether mail is spam from the similarity between registered URLs and the URL contained in the mail. According to the present invention, malware can also be detected from the similarity of attached files.

Next, a second embodiment of the present invention will be described with reference to the block diagram of FIG. 9.

The malicious mail detection device of the second embodiment includes a specifying unit 210, an n-gram processing unit 220, a determination unit 230, a detection unit 240, a target character instruction unit 260, a reference pattern storage unit 270, and an action processing unit 280.

The specifying unit 210 refers to the definition table 251 and indicates to the n-gram processing unit 220 the predefined count target range within the attached file of the received mail.

The n-gram processing unit 220 calculates a feature amount for the character string pattern in the calculation target range indicated by the specifying unit 210 and the target character instruction unit 260.

The target character instruction unit 260 refers to the definition table 251 and specifies the characters to be counted in the data contained in the attached file.

The reference pattern storage unit 270 is a storage area for registering character string patterns contained in malicious mail (malware). A registered character string pattern should preferably be a distinctive one that appears only in malware and not in ordinary mail. The appearance frequency of each character making up the character string pattern is also calculated and registered in advance. The functions of the other components are the same as in the first embodiment, so a detailed description is omitted.

Next, the operation of the second embodiment will be described.

Here, it is assumed that the definition file specifies that, when the conditions "protocol: SMTP" and "Content-type: application" hold, "the entire range of the attached file is to be counted" and "all characters are to be counted". As before, the case where the number of characters n in the counting unit is specified as 1 (n = 1) is described as an example.

When the protocol determination unit 200 has determined that the protocol is SMTP and the specifying unit 210 has confirmed that the Content-type contains "application", the specifying unit 210 refers to the definition file and indicates to the n-gram processing unit 220 the count target range (which range should be counted) within the data corresponding to the attached file.
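A definition lookup of this kind could be modeled as a table keyed by protocol and Content-type. The sketch below is an assumption about how such a table might be organized, not the format used by the device. It contains the rule described here (SMTP with an application type: whole attachment, all characters) together with the SMTP text/plain rule that the third embodiment describes (1st to 100th byte). The names DEFINITIONS, lookup_rule, and select_range are hypothetical.

```python
# Hypothetical definition table: (protocol, content type) -> counting rule.
# byte_range None means the whole data; targets None means all characters.
DEFINITIONS = {
    ("SMTP", "application"): {"byte_range": None, "targets": None},
    ("SMTP", "text/plain"): {"byte_range": (0, 100), "targets": None},
}


def lookup_rule(protocol, content_type):
    """Find a rule by full media type first, then by top-level type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    top_level = media_type.split("/", 1)[0]
    for key in (media_type, top_level):
        rule = DEFINITIONS.get((protocol.upper(), key))
        if rule is not None:
            return rule
    return None


def select_range(data, rule):
    """Cut the count target range out of the data according to the rule."""
    if rule["byte_range"] is None:
        return data
    start, end = rule["byte_range"]
    return data[start:end]


rule = lookup_rule("SMTP", "application/octet-stream")
print(rule)  # {'byte_range': None, 'targets': None}
```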

The target character instruction unit 260 instructs the n-gram processing unit 220 to count all characters. The n-gram processing unit 220 calculates a feature amount for the character string pattern in the calculation target range of the received mail's attached file and stores the result in the count table.

Next, the determination unit 230 searches the reference pattern storage unit 270 for a reference pattern whose appearance pattern includes the character strings that are stored in the count table and confirmed to have appeared.

After the determination unit 230 finishes, if a reference pattern close to the appearance pattern in the count table has been found, the detection unit 240 calculates, for each character string, the difference between the appearance frequency in the count table and the appearance frequency in the reference pattern. The detection unit 240 then compares the average of these per-string differences with a threshold and, if the average is smaller than the threshold, determines that the patterns are similar (malware detected). The action processing unit 280 then performs action processing such as restricting packets or recording to a log.

In the second embodiment configured as described above, the set of appearance frequencies of the character strings making up a character string pattern contained in mail with malware attached is stored in advance in the reference pattern storage unit, and determination means judge whether mail is malicious from its similarity to the set of appearance frequencies of the character strings making up the character string pattern in the attached file of the received mail. Malware can therefore be detected even when part of the data of the attached file has been changed.

The appearance frequency of characters has distinctive features for each type of mail. By exploiting these features, the detection device of the present invention can also be made to function as a mail identification device.

Next, a third embodiment of the present invention will be described with reference to the block diagram of FIG. 10.

The third embodiment includes a specifying unit 310, an n-gram processing unit 320, a determination unit 330, a detection unit 340, a target character instruction unit 360, a reference pattern storage unit 370, and an action processing unit 380.

The specifying unit 310 refers to the definition table 351 and indicates to the n-gram processing unit 320 the predefined count target range within the received mail.

The target character instruction unit 360 refers to the definition table 351 and specifies the characters to be counted in the data contained in the attached file.

The n-gram processing unit 320 calculates a feature amount for the character string pattern in the calculation target range indicated by the specifying unit 310 and the target character instruction unit 360.

The reference pattern storage unit 370 is a storage area for registering in advance character string patterns distinctive enough to identify the mail type. A registered character string pattern should preferably be one that is well suited to identifying the type of mail. The appearance frequency of each character string making up the character string pattern is also calculated and registered in advance. The functions of the other components are the same as in the first embodiment, so a detailed description is omitted.

In the first and second embodiments, mail already identified as malicious is registered in advance, the difference between the appearance frequency of each character in the character string pattern of the received mail and the appearance frequency of each character in the stored character string pattern is calculated, and the received mail is judged malicious when the average of these per-character differences is smaller than a predetermined threshold. However, the invention is not limited to this. It is of course also possible to register in advance mail identified as not malicious and to judge the received mail malicious when the average difference between the appearance frequency of each character in its character string pattern and the appearance frequency of each character in the stored character string pattern is larger than a predetermined threshold.
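The two registration policies differ only in the direction of the threshold comparison, which the following sketch makes explicit. It assumes absolute differences averaged over the characters present in either pattern; the function names, the reference_is_malicious parameter, and the sample values are hypothetical.

```python
def mean_abs_difference(observed, reference):
    """Average per-character absolute frequency difference (missing entries count as 0)."""
    keys = set(observed) | set(reference)
    if not keys:
        return 0.0
    return sum(abs(observed.get(k, 0.0) - reference.get(k, 0.0)) for k in keys) / len(keys)


def is_malicious(observed, reference, threshold, reference_is_malicious=True):
    """Decide from the average difference.

    With a registered malicious pattern, a small difference means malicious.
    With a registered non-malicious pattern, a large difference means malicious.
    """
    distance = mean_abs_difference(observed, reference)
    if reference_is_malicious:
        return distance < threshold
    return distance > threshold


observed = {"a": 1 / 32, "c": 1 / 32, "e": 3 / 32}
reference = {"a": 1 / 32, "c": 2 / 32, "e": 3 / 32}
print(is_malicious(observed, reference, 0.02))                                # True
print(is_malicious(observed, reference, 0.02, reference_is_malicious=False))  # False
```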

Next, the operation of the third embodiment will be described.

Here, it is assumed that the definition file specifies that, when the conditions "protocol: SMTP" and "Content-type: text/plain" hold, "the 1st to 100th bytes of the data contained in the mail are to be counted" and "all characters are to be counted". As before, the case where the number of characters n in the counting unit is specified as 1 (n = 1) is described as an example.

When the protocol determination unit 300 has determined that the protocol is SMTP and the specifying unit 310 has confirmed that the Content-type is text/plain, as registered in the definition file 351, the specifying unit 310 refers to the definition file and instructs the n-gram processing unit 320 to use the 1st to 100th bytes of the data contained in the mail as the count range.

The target character instruction unit 360 instructs the n-gram processing unit 320 to count all characters (letters, digits, symbols, kanji, kana, and so on). The n-gram processing unit 320 stores the counting result in the count table.

Next, the determination unit 330 searches the reference pattern storage unit 370 for a reference pattern whose appearance pattern includes the character strings that are stored in the count table and confirmed to have appeared.

After the determination unit 330 finishes, if a reference pattern close to the appearance pattern in the count table has been found, the received mail can be identified as being of the same type as the mail type registered for that reference pattern.
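Identification by mail type can be sketched as a nearest-pattern search over the first 100 bytes. Several points below are assumptions rather than statements from the description: the unit of counting is a byte (the description speaks of characters, including kanji and kana), closeness is the average absolute frequency difference, and the reference patterns are built from made-up sample text. The names byte_frequencies, classify, and the mail type labels are hypothetical.

```python
from collections import Counter


def byte_frequencies(data, start=0, end=100):
    """Frequencies of single bytes (n = 1) within data[start:end]."""
    window = data[start:end]
    if not window:
        return {}
    counts = Counter(window)
    return {b: n / len(window) for b, n in counts.items()}


def classify(observed, references, threshold):
    """Return the mail type of the closest reference pattern, or None
    when no pattern is closer than the threshold."""
    best_type, best_score = None, None
    for mail_type, ref in references.items():
        keys = set(observed) | set(ref)
        score = sum(abs(observed.get(k, 0.0) - ref.get(k, 0.0)) for k in keys) / len(keys) if keys else 0.0
        if best_score is None or score < best_score:
            best_type, best_score = mail_type, score
    if best_score is not None and best_score < threshold:
        return best_type
    return None


# Hypothetical reference patterns built from sample text.
references = {
    "newsletter": byte_frequencies(b"weekly news digest for all subscribers"),
    "invoice": byte_frequencies(b"invoice number total amount due"),
}
mail = b"invoice number total amount due"
print(classify(byte_frequencies(mail), references, threshold=0.01))  # invoice
```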

In the third embodiment configured as described above, a character string pattern capable of identifying the type of mail is stored in advance in the reference pattern storage unit in association with the set of appearance frequencies of the character strings making up that pattern, and determination means examine its similarity to the set of appearance frequencies of the character strings making up the character string pattern in the received mail. The type of the received mail can therefore be identified.

Next, a fourth embodiment will be described with reference to the overall block diagram of FIG. 11 and the block diagram of FIG. 12.

As shown in FIG. 11, the configuration of the fourth embodiment adds a memory release function unit 490 to the configurations of the first to third embodiments. The memory release function unit 490 constructs one or more streams from the received mail according to a predetermined stream construction rule. Here, a "predetermined stream construction rule" is a rule defined in advance for constructing streams, for example "concatenate the packets of the received mail until a fixed size is reached" or "keep building the stream until the line feed code contained in an smtp command of the received data is found". The unit manages the constructed stream and, when the n-gram processing unit 420 has finished calculating the feature amount, releases the memory area used to manage that stream.

The stream constructed here is the processing unit both for the syntax analysis of the data by the specifying unit 410 and for the counting and feature amount calculation by the n-gram processing unit 420.

On receiving an instruction from the memory release function unit 490, the specifying unit 410 starts syntax analysis, using the stream data passed from the memory release function unit 490 as the processing unit. After the syntax analysis, the specifying unit 410 passes the stream, together with the analysis result, to the n-gram processing unit 420.

The n-gram processing unit 420 calculates a feature amount for each stream passed from the specifying unit 410. When the calculation for a stream is finished, it notifies the memory release function unit 490. The specifying unit 410 and the n-gram processing unit 420 otherwise have the functions described in the first to third embodiments, and the other components also function as in those embodiments, so a detailed description is omitted here.

Next, the internal configuration of the memory release function unit 490 will be described in detail with reference to the block diagram of FIG. 12.

The memory release function unit 490 includes a session management unit 4000, a stream reconstruction processing unit 4001, a protocol state transition management unit 4002, an instruction unit 4003, a session management table 4010, a stream management table 4011, and a protocol state transition management table 4012.

On receiving a mail packet via the protocol determination unit 400, the session management unit 4000 determines whether the session of the received mail is new. If the session is determined to be new, it assigns a session ID, which is identification information for identifying the session of that received mail.

The session management unit 4000 also registers the connection information of the new session of the received mail packet in a table in association with the session ID, and deletes from the table the connection information of sessions that are no longer needed. Here, connection information is information for identifying a session (connection), such as the source and destination IP addresses, the source and destination port numbers, and the protocol. Connection information is registered in and deleted from the session management table 4010 held by the session management unit 4000.

The session management unit 4000 also passes packets for which registration in or deletion from the session management table 4010 has been completed to the stream reconstruction processing unit 4001.

Whether a session is new is determined by referring to at least the 5-tuple of source IP address, destination IP address, source port number, destination port number, and protocol. That is, the session management unit 4000 refers to the 5-tuple of an arriving packet and, if it is not yet registered in the session management table 4010, registers the connection information of the received packet as a new session. If the session is already registered, no registration is performed. In addition, on instruction from the instruction unit 4003, it deletes the connection information of sessions that are no longer needed and releases the memory area.
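A session management table keyed by the 5-tuple could look like the following minimal sketch. The class name SessionTable, its methods, and the sequential session ID scheme are hypothetical; the description only states that a session ID is assigned to new sessions and that entries are deleted on instruction.

```python
import itertools


class SessionTable:
    """Maps a 5-tuple (src IP, dst IP, src port, dst port, protocol) to a session ID."""

    def __init__(self):
        self._ids = {}
        self._next_id = itertools.count(1)  # assumed ID scheme: 1, 2, 3, ...

    def lookup_or_register(self, five_tuple):
        """Return (session_id, is_new). Registers only unseen 5-tuples."""
        if five_tuple in self._ids:
            return self._ids[five_tuple], False
        session_id = next(self._next_id)
        self._ids[five_tuple] = session_id
        return session_id, True

    def release(self, session_id):
        """Delete the connection information of a session that is no longer needed."""
        for key, sid in list(self._ids.items()):
            if sid == session_id:
                del self._ids[key]

    def __len__(self):
        return len(self._ids)


table = SessionTable()
packet = ("192.0.2.1", "192.0.2.2", 40000, 25, "tcp")
print(table.lookup_or_register(packet))  # (1, True)
print(table.lookup_or_register(packet))  # (1, False)
table.release(1)
print(len(table))  # 0
```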

The stream reconstruction processing unit 4001 constructs one or more new streams from the mail packets received from the session management unit 4000, according to the stream construction rule. It assigns each constructed stream a stream ID, which is identification information for identifying the stream, and manages the streams to which IDs have been assigned using the stream management table 4011 provided for stream management. The stream reconstruction processing unit 4001 registers the assigned stream ID in the stream management table 4011 in association with the stream information (stream data size, head pointer address, and so on) and also manages it in association with the session ID.

In response to an instruction from the instruction unit 4003 and notification of a stream ID, the stream reconstruction processing unit 4001 also releases the memory area used to manage the constructed stream. Specifically, it deletes the stream information corresponding to the notified stream ID from the stream management table 4011.

FIG. 13 shows an example of the session management table, and FIG. 14 shows an example of the stream management table. On receiving mail, the stream reconstruction processing unit 4001 refers to the session management table 4010 illustrated in FIG. 13 and looks up the session ID from the connection information of the received mail packet. It then refers to the stream management table 4011 illustrated in FIG. 14 and constructs a stream according to the value of the "stream construction flag" (hereinafter also called the flag value) corresponding to the session ID obtained.

The stream construction flag is an identifier referred to when deciding the unit in which streams are constructed. Until the mail body data is received, the flag value is set to "0"; once the mail body data is received, it is set to "1". The flag is set by the stream reconstruction processing unit 4001 on being notified of the smtp command transition.

When the stream construction flag is "0", the stream reconstruction processing unit 4001 builds a stream up to the line feed code; when it is "1", it builds a stream up to a predetermined size or up to the end of the mail body (reception of the "." line). It then passes the constructed stream to the protocol state transition management unit 4002.
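The flag-dependent construction rule can be sketched as a function that cuts the next stream off the front of a receive buffer. This is a simplified sketch under stated assumptions: the line feed code is taken to be CRLF, the end of the body is taken to be the byte sequence CRLF "." CRLF, and a terminator that straddles the size boundary or an empty body is not handled. The function name next_stream and the max_size parameter are hypothetical.

```python
def next_stream(buffer, flag, max_size):
    """Return (stream, rest), or (None, buffer) when more data is needed.

    flag 0: command phase, one stream per line (up to CRLF).
    flag 1: body phase, one stream per max_size bytes, or up to the
            end-of-body marker if it falls within max_size.
    """
    if flag == 0:
        end = buffer.find(b"\r\n")
        if end == -1:
            return None, buffer
        return buffer[:end + 2], buffer[end + 2:]

    terminator = buffer.find(b"\r\n.\r\n")
    if terminator != -1 and terminator + 5 <= max_size:
        # The "." line was received before the size limit was reached.
        return buffer[:terminator + 5], buffer[terminator + 5:]
    if len(buffer) >= max_size:
        return buffer[:max_size], buffer[max_size:]
    return None, buffer


print(next_stream(b"HELO a\r\nMAIL", 0, 8))      # (b'HELO a\r\n', b'MAIL')
print(next_stream(b"abcdefghij", 1, 8))          # (b'abcdefgh', b'ij')
print(next_stream(b"hi\r\n.\r\nQUIT", 1, 64))    # (b'hi\r\n.\r\n', b'QUIT')
```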

On receiving a stream constructed by the stream reconstruction processing unit 4001, the protocol state transition management unit 4002 tracks the state transitions of the protocol, manages their status, and passes the received stream to the specifying unit 410. Specifically, when it receives from the stream reconstruction processing unit 4001 a stream constructed with flag value 0, it starts tracking the protocol state transitions.

The transitions are tracked using the protocol state transition management table 4012, in which protocol transition information (smtp commands) is registered. Specifically, the protocol state transition management unit 4002 tracks transitions according to whether the smtp command contained in the received data matches an smtp command registered in advance in the protocol state transition management table 4012. If they match, it continues tracking the transitions of the protocol state (the smtp command state). If they do not match, it stops tracking and also stops constructing streams from that received mail. Streams produced by invalid protocol commands are thereby excluded from analysis, because there is no need to construct or analyze streams that result from obviously invalid protocol commands.
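The match-or-abort tracking can be sketched as a small state machine. The transition table below is a simplified, hypothetical subset of SMTP command ordering chosen for illustration; the description does not list the contents of the protocol state transition management table. The names EXPECTED_COMMANDS, NEXT_STATE, and track are also hypothetical.

```python
# Hypothetical transition table: state -> commands accepted in that state.
EXPECTED_COMMANDS = {
    "START": {"HELO", "EHLO"},
    "GREETED": {"MAIL"},
    "MAIL": {"RCPT"},
    "RCPT": {"RCPT", "DATA"},
}
NEXT_STATE = {"HELO": "GREETED", "EHLO": "GREETED", "MAIL": "MAIL", "RCPT": "RCPT", "DATA": "DATA"}


def track(state, line):
    """Return the next state, or None when the command does not match
    the table (tracking and stream construction would then stop)."""
    words = line.decode("ascii", errors="replace").split()
    command = words[0].upper() if words else ""
    if command not in EXPECTED_COMMANDS.get(state, set()):
        return None
    return NEXT_STATE[command]


state = "START"
for line in [b"EHLO client.example\r\n", b"MAIL FROM:<a@example.com>\r\n",
             b"RCPT TO:<b@example.com>\r\n", b"DATA\r\n"]:
    state = track(state, line)
print(state)                        # DATA
print(track("START", b"XYZZY\r\n"))  # None
```

Once the state reaches DATA, a caller would switch the stream construction flag to 1 and stop calling track, in line with the flag value 1 behavior.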

On the other hand, when the protocol state transition management unit 4002 receives from the stream reconstruction processing unit 4001 a stream constructed with flag value 1, it passes the stream to the specifying unit 410 without tracking the transitions of the protocol state (the smtp command state). With flag value 1 it is self-evident that the state has already moved to the "DATA" command, so there is no need to track the transitions.

The protocol state transition management unit 4002 also checks the stream data against the information in the protocol state transition management table 4012 and, on confirming the transition to "DATA", notifies the stream reconstruction processing unit 4001 that the transition is complete. This notification is made via the instruction unit 4003. When tracking is finished, it deletes the information indicating the status of the protocol state transitions from the protocol state transition management table 4012 in response to a notification from the instruction unit.

The instruction unit 4003 controls memory release in cooperation with the other units (the session management unit, the stream reconstruction processing unit, and the protocol state transition management unit). It also instructs each unit to start stream construction or syntax analysis. Specifically, on being notified by the protocol state transition management unit 4002 that the transition to "DATA" is complete, it instructs the stream reconstruction processing unit 4001 to build streams up to a predetermined size (or up to the end of the mail body, that is, reception of the "." line).

When the instruction unit 4003 is notified by the stream reconstruction processing unit 4001 that a stream has been built up to the predetermined size, it instructs the specifying unit 410 to start syntax analysis. When it is then notified by the n-gram processing unit 420 that the feature amount calculation has finished, it responds by instructing the stream reconstruction processing unit 4001 to release the session of the constructed stream. The stream ID assigned to the constructed stream is notified together with this instruction.

On receiving this instruction, the stream reconstruction processing unit 4001 deletes from the stream management table 4011 the information related to the session ID that corresponds to the notified stream ID.

Further, on receiving notification from the stream reconstruction processing unit 4001 that the information associated with the session ID containing the stream ID of the constructed stream has been deleted, the instruction unit 4003 instructs the session management unit 4000 to delete the connection information of the received mail packets from the session management table 4010. Likewise, it instructs the protocol state transition management unit 4002 to delete the protocol transition information from the protocol state transition management table 4012. The stream ID is sent together with these instructions as well.

On receiving these instructions, the session management unit 4000 and the protocol state transition management unit 4002 delete the information registered in association with the notified stream ID from the session management table 4010 and the protocol state transition management table 4012, respectively.

Next, the operation of the malicious mail detection device according to the fourth embodiment will be described with reference to the flowcharts of FIGS. 15 and 16, which show the memory release processing.

This embodiment is described using an example in which the device, installed in a network environment where mail packets and non-mail packets are mixed, processes SMTP packets as the analysis target.

In this embodiment, the received mail (hereinafter also referred to as received mail A) is assumed to be assigned the session ID "0001". In the following description, the source IP address, destination IP address, source port number, destination port number, and protocol number are assumed to be "s1", "d1", "sp1", "dp1", and "TCP25", respectively.

This embodiment is also described using an example in which, once reception of the mail body data begins, a stream corresponding to 500 bytes of data is constructed and the feature amount is calculated with this 500-byte stream as the processing unit.

In the following, the stream IDs "stream1" to "stream4" are assigned to the streams constructed "until a line feed code is confirmed", and the stream ID "stream5" is assigned to the stream constructed "until a fixed size (500 bytes) is reached". The fixed size is described here as 500 bytes but need not be limited to 500 bytes. For example, the data size may be set to 300 bytes in advance, a stream corresponding to 300 bytes of data constructed, and the memory area released after the feature amount calculation for this 300-byte processing unit has finished.

This embodiment may be applied to either the detection device shown in the example of FIG. 1 or the communication device shown in the example of FIG. 2.

When a stream of packets flows into the device, the protocol determination unit 400 determines the L7 protocol of each packet. Packets of a stream determined to be SMTP are passed to the memory release function unit 490. When the session management unit 4000 of the memory release function unit 490 receives such a packet (step T2), it determines whether the session to which the packet belongs is new. Specifically, it refers to the 5-tuple of the arriving packet (source IP, destination IP, source port, destination port, and protocol; hereinafter also referred to as connection information), and if that 5-tuple is not registered in the session management table 4010 (step T3: No), the session is determined to be new.

When the session is determined to be new (step T3: No), the session management unit 4000 assigns it a new session ID; here, "0001" is assigned. In addition, the stream reconstruction processing unit 4001 sets the stream construction flag corresponding to session ID "0001" to "0" (step T5).

In this way, as shown in FIG. 13, the assigned session ID "0001" is associated with the connection information of the session and registered in the session management table 4010. If the connection information is already registered, no registration is performed (step T3: Yes).
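The session lookup of steps T3 to T5 can be illustrated with a minimal sketch. It is not the implementation of this specification: the names `FiveTuple` and `SessionTable`, the dictionary storage, and the zero-padded sequential ID format are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """Connection information of a packet (all fields kept as strings here)."""
    src_ip: str
    dst_ip: str
    src_port: str
    dst_port: str
    protocol: str


class SessionTable:
    """Hypothetical stand-in for the session management table 4010."""

    def __init__(self) -> None:
        self._session_ids: Dict[FiveTuple, str] = {}
        self._next_number = 1

    def lookup_or_register(self, conn: FiveTuple) -> Tuple[str, bool]:
        """Return (session_id, is_new) for the given connection information."""
        if conn in self._session_ids:
            # Already registered (step T3: Yes): no new registration.
            return self._session_ids[conn], False
        # Not registered (step T3: No): assign a new session ID and register it.
        session_id = f"{self._next_number:04d}"
        self._next_number += 1
        self._session_ids[conn] = session_id
        return session_id, True


table = SessionTable()
conn = FiveTuple("s1", "d1", "sp1", "dp1", "TCP25")
first = table.lookup_or_register(conn)
second = table.lookup_or_register(conn)
```

With the example values used in this embodiment, the first packet of received mail A registers session ID "0001", and later packets with the same 5-tuple find the existing entry.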

Next, the stream reconstruction processing unit 4001 refers to the stream construction flag and starts stream construction. Because the flag value is "0" here, a stream is constructed until a line feed code appears in the received data (step T8).

The stream constructed here is passed to the protocol state transition management unit 4002, which checks whether the received mail packet contains a legitimate protocol command (steps T9 and T10). Specifically, the check determines whether the SMTP command contained in the received data matches an SMTP command registered in advance in the protocol state transition management table 4012.

If the command does not match a registered SMTP command, the protocol state transition management unit 4002 judges that the received mail packet contains an invalid protocol command. Stream construction is then terminated, and processing moves on to reception of the next packet (step T10: No, step T30).

If the command matches, the protocol state transition management unit 4002 judges that the received mail packet contains a valid protocol command. It then instructs the stream reconstruction processing unit 4001 to construct a stream with reference to the session management table 4010. Because the stream construction flag is "0" here, the packets of received mail A are concatenated until a line feed code is confirmed, and a stream is constructed (step T11). The stream reconstruction processing unit 4001 assigns a stream ID to the constructed stream (step T12); here, the stream ID "stream1" is assigned.

The stream constructed at this point is passed to the protocol state transition management unit 4002, and the protocol state transition is tracked (step T13).

A stream is constructed for each line feed code so that the protocol state transition management unit 4002 can recognize protocol commands correctly. In SMTP, a line feed code follows the character string of each command. Therefore, if the stream is constructed up to the line feed code, the protocol command can be recognized accurately even when an SMTP command spans multiple packets. For example, if the received packets are packet 1: [HE] and packet 2: [LO example.co.jp "line feed code"], constructing the stream up to the line feed code concatenates the first and second packets and yields the data HELO example.co.jp followed by the line feed code. This confirms the presence of the SMTP command "HELO". When an SMTP command fits within a single packet, the packets are not concatenated, because the protocol command can be confirmed without concatenation.
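The concatenation up to the line feed code can be sketched as follows. This is an illustrative assumption rather than the method of this specification: the function name `build_line_streams` is hypothetical, and the line feed code is assumed to be the CRLF sequence used by SMTP.

```python
from typing import Iterable, Iterator


def build_line_streams(payloads: Iterable[bytes],
                       newline: bytes = b"\r\n") -> Iterator[bytes]:
    """Concatenate packet payloads and yield one stream per line feed code."""
    buffer = b""
    for payload in payloads:
        buffer += payload
        # A single payload may complete zero, one, or several lines.
        while newline in buffer:
            line, buffer = buffer.split(newline, 1)
            yield line + newline


# The HELO command arrives split across two packets, as in the example above.
streams = list(build_line_streams([b"HE", b"LO example.co.jp\r\n"]))
```

Here `streams` holds a single entry containing the whole HELO line, so the command can be matched even though neither packet carried it in full.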

In this way, the process of constructing a stream from the received packets up to each line feed code is repeated until the "DATA" command is received (steps T8 to T11). As a result, the stream reconstruction processing unit 4001 here constructs four streams, containing the "HELO", "MAIL FROM", "RCPT TO", and "DATA" commands, respectively. The stream reconstruction processing unit 4001 assigns the stream IDs "stream1" to "stream4" to these streams, which are managed per session ID in the stream management table 4011. Specifically, "stream1" to "stream4" are associated with session ID "0001" and managed in the stream management table together with the information of other streams. At this point, the flag value corresponding to session ID "0001" is 0.

The protocol state transition management unit 4002 tracks the protocol state transitions (step T13) and checks whether the SMTP command has transitioned to the "DATA" command (step T14).

Specifically, the protocol state transition management unit 4002 confirms the transition to the "DATA" command as follows.

Protocol transition information is registered in advance in the protocol state transition management table 4012. For example, the command sequence "HELO" -> "MAIL FROM" -> "RCPT TO" -> "DATA" is registered as the transition information of the protocol to be tracked. The protocol state transition management unit 4002 checks whether the command information registered in the protocol state transition management table 4012 matches the command information in the data carried by the packets that make up the traffic, and thereby confirms whether the protocol state has transitioned to the "DATA" command.
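A minimal sketch of this transition check is given below. The class name `SmtpStateTracker` and the prefix-matching rule are assumptions made for illustration. The sketch follows only the four-command sequence given as an example above and does not attempt to cover other sequences that SMTP permits, such as EHLO or repeated RCPT TO commands.

```python
# Command sequence registered in advance, taken from the example above.
EXPECTED_COMMANDS = ("HELO", "MAIL FROM", "RCPT TO", "DATA")


class SmtpStateTracker:
    """Hypothetical tracker for the registered command sequence."""

    def __init__(self) -> None:
        self._position = 0  # index of the next expected command

    @property
    def reached_data(self) -> bool:
        """True once the whole sequence up to DATA has been observed."""
        return self._position == len(EXPECTED_COMMANDS)

    def feed(self, line: bytes) -> bool:
        """Return True if the line carries the next expected command."""
        if self.reached_data:
            return False
        text = line.decode("ascii", errors="replace").strip().upper()
        if not text.startswith(EXPECTED_COMMANDS[self._position]):
            return False  # mismatch: treated as an invalid protocol command
        self._position += 1
        return True


tracker = SmtpStateTracker()
results = [tracker.feed(line) for line in (
    b"HELO example.co.jp\r\n",
    b"MAIL FROM:<a@example.co.jp>\r\n",
    b"RCPT TO:<b@example.co.jp>\r\n",
    b"DATA\r\n",
)]
```

After the four lines are fed in order, `tracker.reached_data` becomes true, which corresponds to the confirmation of step T14.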

Until the transition to the "DATA" command is confirmed, the flag value remains 0, and the stream reconstruction processing unit 4001 continues to construct each stream until a line feed code is confirmed in the received mail.

When the protocol state transition management unit 4002 confirms the transition to the "DATA" command (step T14: Yes), it notifies the stream reconstruction processing unit 4001, via the instruction unit 4003, that the transition to the "DATA" command is complete. On receiving this notification, the stream reconstruction processing unit 4001 recognizes that reception of the mail body data has begun and updates the stream construction flag from 0 to 1 (step T15).

Having set the stream construction flag to 1, the stream reconstruction processing unit 4001 then changes the size of the stream to be constructed. Processing thereby switches from "constructing a stream up to the line feed code" to "constructing a stream up to 500 bytes or the end of the mail body (until the "." line is received)" (step T16).

The purpose of switching the construction processing and changing the stream size in this way is as follows. The data received after the "DATA" command is the mail body, but if construction continued up to each line feed code, a stream could be constructed for only one line of the mail body at a time. After the "DATA" command is received, a stream must be constructed up to a fixed size (500 bytes in this embodiment) or up to the end of the mail body (until the "." line is received) so that the processing unit needed for analysis is secured.
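The construction rule that applies once the flag is 1 can be sketched as follows. The function name `take_processing_unit` is hypothetical, and the end of the mail body is assumed to appear as a "." line delimited by CRLF on both sides. An empty body that consists only of the "." line is not handled in this sketch.

```python
from typing import Optional, Tuple

# Assumed byte form of the "." line that ends the mail body.
END_OF_BODY = b"\r\n.\r\n"


def take_processing_unit(buffer: bytes,
                         unit_size: int = 500) -> Tuple[Optional[bytes], bytes]:
    """Return (unit, remaining); unit is None while more data is needed."""
    end = buffer.find(END_OF_BODY)
    if end != -1 and end + len(END_OF_BODY) <= unit_size:
        # The body ends before the fixed size is reached.
        cut = end + len(END_OF_BODY)
        return buffer[:cut], buffer[cut:]
    if len(buffer) >= unit_size:
        # The fixed size has been reached.
        return buffer[:unit_size], buffer[unit_size:]
    return None, buffer


short_unit, short_rest = take_processing_unit(b"hello\r\n.\r\n")
full_unit, full_rest = take_processing_unit(b"a" * 600)
```

Passing `unit_size=300` gives the 300-byte variant mentioned earlier in this embodiment.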

When the stream reconstruction processing unit 4001 finishes constructing the stream up to 500 bytes or the end of the mail body (reception of the "." line) (step T17: Yes), the stream ID "stream5" is assigned to the constructed stream (step T18). The stream assigned "stream5" is passed to the specifying unit 410 via the protocol state transition management unit 4002. In addition, a notification that stream construction up to 500 bytes has finished is sent to the instruction unit 4003 together with the stream ID "stream5". The registration status of the stream management table at this point is shown in the example of FIG. 14.

On receiving this notification, the instruction unit 4003 sends the specifying unit 410 an instruction to start syntax analysis of the stream data, together with the stream ID "stream5" (step T19). In response, the specifying unit 410 starts syntax analysis of the data, using the 500-byte stream as the processing unit (step T20). The n-gram processing unit 420 then calculates the feature amount of the stream with the notified stream ID "stream5", again using the 500-byte stream as the processing unit (step T22).
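As a rough illustration of a feature amount calculated over one processing unit, the sketch below computes single-character appearance frequencies and the average per-character difference between two frequency sets, in the spirit of the claims. The function names are hypothetical. Taking the absolute value of each difference and treating each byte as one character are assumptions that this section does not state.

```python
from collections import Counter
from typing import Dict


def char_frequencies(unit: bytes) -> Dict[int, float]:
    """Appearance frequency of each byte value relative to all bytes in the unit."""
    if not unit:
        return {}
    total = len(unit)
    return {byte: count / total for byte, count in Counter(unit).items()}


def mean_abs_difference(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Average over all observed characters of the absolute frequency difference."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys) / len(keys)


received = char_frequencies(b"aab")
reference = char_frequencies(b"aab")
difference = mean_abs_difference(received, reference)
```

A determination step would then compare `difference` with a predetermined threshold; the direction of the comparison depends on whether the reference pattern was taken from malicious or non-malicious mail.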

When the n-gram processing unit 420 finishes the feature amount calculation (step T23: Yes), it sends a processing completion notification to the stream reconstruction processing unit 4001 via the instruction unit 4003 (step T24). The stream ID "stream5" is also notified to the stream reconstruction processing unit 4001 at this time (step T25).

On receiving this processing completion notification, the stream reconstruction processing unit 4001 releases the memory area used to manage the notified stream ID "stream5" (step T26). At the same time, the stream management memory area is also released for the other streams that have the same session ID "0001" as the stream corresponding to "stream5", namely the streams corresponding to stream IDs "stream1" to "stream4" (step T27). As a result, the stream information for stream IDs "stream1" to "stream5", which are associated with session ID "0001", is deleted from the stream management table 4011.

The stream reconstruction processing unit 4001 then returns to the instruction unit 4003 a completion notification, together with the stream ID, indicating that the memory release is complete.

On receiving the memory release completion notification, the instruction unit 4003 instructs the session management unit 4000 and the protocol state transition management unit 4002 to delete the entries for the corresponding session from the remaining tables, namely the session management table 4010 and the protocol state transition management table 4012.

On receiving this instruction, the session management unit 4000 releases the memory area used to manage the session of the stream with the notified stream ID (step T28). Specifically, session ID "0001" and the connection information associated with it are deleted from the session management table 4010.

Likewise, on receiving the instruction from the instruction unit 4003, the protocol state transition management unit 4002 releases the memory area used to manage the stream with the notified stream ID (step T28). Specifically, the information indicating the protocol state transition status tracked for the packets that make up the constructed stream is deleted from the protocol state transition management table 4012. This completes the release of the memory area (step T29).
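The release sequence of steps T26 to T29 can be sketched with three dictionaries standing in for the three tables. The function name `release_session` and the table layouts, each keyed by session ID, are assumptions made for illustration and are not taken from FIGS. 13 and 14.

```python
from typing import Dict, List, Optional, Tuple


def release_session(stream_id: str,
                    stream_table: Dict[str, List[str]],
                    session_table: Dict[str, Tuple[str, ...]],
                    protocol_table: Dict[str, str]) -> Optional[str]:
    """Delete every entry of the session that owns stream_id; return its ID."""
    session_id = next(
        (sid for sid, stream_ids in stream_table.items() if stream_id in stream_ids),
        None,
    )
    if session_id is None:
        return None  # unknown stream ID: nothing to release
    del stream_table[session_id]           # steps T26 and T27: all streams of the session
    session_table.pop(session_id, None)    # step T28: connection information
    protocol_table.pop(session_id, None)   # step T28: protocol transition status
    return session_id


stream_table = {"0001": ["stream1", "stream2", "stream3", "stream4", "stream5"],
                "0002": ["stream6"]}
session_table = {"0001": ("s1", "d1", "sp1", "dp1", "TCP25"),
                 "0002": ("s2", "d1", "sp2", "dp1", "TCP25")}
protocol_table = {"0001": "DATA", "0002": "HELO"}
released = release_session("stream5", stream_table, session_table, protocol_table)
```

Only the entries of session "0001" are removed; the unrelated session "0002" stays in all three tables.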

In the fourth embodiment configured as described above, syntax analysis of the data can start as soon as the data of a stream constructed from the received mail according to the stream construction rules is ready, without waiting for the data of an entire mail message to be acquired. The time required to detect malicious mail can therefore be shortened further, making the malicious mail detection processing more efficient.

Further, in the fourth embodiment, syntax analysis starts as soon as the stream data is ready, without waiting for the data of an entire mail message, and the memory area used to manage the stream of that processing unit is released after the n-gram processing unit finishes calculating the feature amount. The capacity of the installed memory can therefore be reduced, which keeps down the cost of the device.

Further, in the fourth embodiment, the protocol state transition management unit of the memory release means tracks the protocol state transitions, and a stream is constructed up to the fixed size only after the SMTP command is confirmed to have transitioned to the "DATA" command. Packets that violate the protocol are therefore excluded from analysis, and only packets that follow the protocol are analyzed. This narrows the analysis targets and makes the processing still more efficient.

In this embodiment, for processing efficiency, only the stream constructed from the mail body data is used for feature amount calculation, but this is not a limitation. The streams constructed up to each line feed code may also be included in the calculation so that the mail header is analyzed as well. For example, the feature amount may be calculated not only for stream5 but for all of stream1 to stream5, with the memory area used to manage these streams released after the calculation.

Also, in this embodiment, streams are first constructed up to each line feed code and a stream is then constructed until the mail body data reaches the fixed size, but the line-feed-delimited stream construction may be omitted. For example, the device may monitor only whether the "DATA" command has been received and, on receiving it, construct a stream until the mail body data reaches the fixed size.

In this embodiment, after the n-gram processing unit finishes calculating the feature amount, the stream information related to the session ID in the stream management table, the connection information of the received mail packets in the session management table, and the protocol transition information of the received mail packets in the protocol state transition management table are all deleted, and the memory area used to manage the constructed stream is released. However, this is not a limitation. At least one of the stream information, the connection information, and the protocol transition information related to the session ID may be deleted. Information other than the above may also be deleted to release memory, provided it is information used to manage the data of the constructed stream.

Also, the fourth embodiment provides three tables, which manage the stream information, the connection information, and the protocol transition information related to the session ID, respectively, but this information may instead be managed centrally in a single table.

Also, the fourth embodiment has been described using a configuration that adds a memory release function unit to the configurations of the first to third embodiments, but this is not a limitation. The protocol determination unit may be given the functions of the memory release function unit.

In the first to fourth embodiments, the terminal and the detection device of the present invention have been described as separate hardware for convenience of explanation, but the units of the malicious mail detection device of the present invention may be incorporated into the terminal so that the malicious mail detection function is realized on the terminal.

Also, in the first to fourth embodiments, each unit of the device is implemented in hardware, but some or all of the processing of each unit may be implemented as a program executed by an information processing device.

FIG. 1 is a diagram showing an example of a communication system using the malicious mail detection device of the embodiments.
FIG. 2 is a diagram showing another example of a communication system using the malicious mail detection device of the embodiments.
FIG. 3 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 4 is a flowchart showing the processing flow in the malicious mail detection device of the present invention.
FIG. 5 is a diagram showing an example of a reference pattern.
FIG. 6 is a diagram showing an example of mail.
FIG. 7 is a diagram showing an example of a counting table.
FIG. 8 is a diagram showing an example of calculating the difference in the appearance frequency of character strings.
FIG. 9 is a block diagram showing the configuration of the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the third embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the fourth embodiment of the present invention.
FIG. 12 is a block diagram showing a configuration example of the memory release function unit in the fourth embodiment.
FIG. 13 is a diagram showing an example of the session management table in the fourth embodiment.
FIG. 14 is a diagram showing an example of the stream management table in the fourth embodiment.
FIG. 15 is a flowchart showing the processing flow in the fourth embodiment.
FIG. 16 is a flowchart showing the processing flow in the fourth embodiment.

Explanation of symbols

30 Network
40 Network
50 Detection device
51 Terminal
52 Terminal
53 Terminal
60 Communication device
61 Terminal
62 Terminal
63 Terminal
100 Protocol determination unit
110 Specifying unit
120 n-gram processing unit
130 Determination unit
140 Detection unit
150 Storage unit
160 Counting target character instruction unit
170 Reference pattern storage unit
180 Action processing unit
200 Protocol determination unit
210 Specifying unit
220 n-gram processing unit
230 Determination unit
240 Detection unit
250 Storage unit
260 Counting target character instruction unit
270 Reference pattern storage unit
280 Action processing unit
300 Protocol determination unit
310 Specifying unit
320 n-gram processing unit
330 Determination unit
340 Detection unit
350 Storage unit
351 Definition table
360 Counting target character instruction unit
370 Reference pattern storage unit
380 Action processing unit
400 Protocol determination unit
410 Specifying unit
420 n-gram processing unit
430 Determination unit
440 Detection unit
450 Storage unit
451 Definition table
460 Counting target character instruction unit
470 Reference pattern storage unit
480 Action processing unit
490 Memory release function unit
4000 Session management unit
4001 Stream reconstruction processing unit
4002 Protocol state transition management unit
4003 Instruction unit
4010 Session management table
4011 Stream management table
4012 Protocol state transition management table

Claims

A malicious mail detection device comprising:
storage means for storing a feature amount of a character string pattern included in mail used as a comparison target;
calculation means for calculating a feature amount of a character string pattern within a predefined calculation target range in received mail; and
determination means for determining whether the received mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated by the calculation means and the feature amount of the stored character string pattern.

The malicious mail detection device according to claim 1, wherein the feature amount is a set of appearance frequencies of one or more character strings constituting a character string pattern.

The malicious mail detection device according to claim 2, wherein, when the mail used as the comparison target is a mail certified as malicious mail and each character string constituting the character string pattern is a single character, the determination means calculates, for each character, the difference between the appearance frequency of that character relative to all characters constituting the character string pattern of the received mail, as calculated by the calculation means, and the appearance frequency of that character relative to all characters constituting the stored character string pattern, and determines that the received mail is malicious mail when the average of the difference values calculated for the characters is smaller than a predetermined threshold.

The malicious mail detection device according to claim 2, wherein, when the mail used as the comparison target is a mail certified as not being malicious mail and each character string constituting the character string pattern is a single character, the determination means calculates, for each character, the difference between the appearance frequency of that character relative to all characters constituting the character string pattern of the received mail, as calculated by the calculation means, and the appearance frequency of that character relative to all characters constituting the stored character string pattern, and determines that the received mail is malicious mail when the average of the difference values calculated for the characters is larger than a predetermined threshold.
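
The two threshold rules in the claims above can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the difference is taken as an absolute value, the average runs over the union of characters seen in either pattern, and all names (`char_frequencies`, `mean_frequency_difference`, `is_malicious`, `reference_is_malicious`) are hypothetical.

```python
from collections import Counter


def char_frequencies(text: str) -> dict[str, float]:
    """Relative appearance frequency of each character in `text`."""
    total = len(text)
    if total == 0:
        return {}
    return {ch: count / total for ch, count in Counter(text).items()}


def mean_frequency_difference(received: dict[str, float],
                              reference: dict[str, float]) -> float:
    """Average per-character frequency difference (absolute value assumed)."""
    chars = set(received) | set(reference)
    if not chars:
        return 0.0
    total = sum(abs(received.get(ch, 0.0) - reference.get(ch, 0.0))
                for ch in chars)
    return total / len(chars)


def is_malicious(received_text: str, reference: dict[str, float],
                 threshold: float, reference_is_malicious: bool) -> bool:
    """Apply the threshold rule that matches the kind of reference mail."""
    diff = mean_frequency_difference(char_frequencies(received_text), reference)
    if reference_is_malicious:
        # Reference is known malicious mail: a small difference means a match.
        return diff < threshold
    # Reference is known legitimate mail: a large difference is suspicious.
    return diff > threshold
```

Because the comparison is made on frequency distributions rather than exact strings, altering a few characters of a known pattern shifts the average only slightly, which is the property the description relies on for detecting intentionally modified mail.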

The malicious mail detection device according to any one of claims 1 to 4, wherein the character string pattern to be calculated by the calculation means is a URL included in the received mail, and the stored character string pattern is a URL extracted from spam mail.
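
For the URL case in the claim above, a minimal sketch might extract each URL from the mail body and compare its character frequencies against stored spam URL features. The regular expression, the absolute-difference average, and the names `extract_urls` and `has_spam_like_url` are assumptions made for illustration only.

```python
import re
from collections import Counter

# Simplified URL matcher (assumption): http/https up to whitespace or < > ".
URL_RE = re.compile(r'https?://[^\s<>"]+')


def extract_urls(body: str) -> list[str]:
    """Return all URLs found in the mail body."""
    return URL_RE.findall(body)


def char_frequencies(text: str) -> dict[str, float]:
    """Relative appearance frequency of each character in `text`."""
    total = len(text)
    if total == 0:
        return {}
    return {ch: count / total for ch, count in Counter(text).items()}


def mean_difference(a: dict[str, float], b: dict[str, float]) -> float:
    """Average absolute per-character frequency difference."""
    chars = set(a) | set(b)
    if not chars:
        return 0.0
    return sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in chars) / len(chars)


def has_spam_like_url(body: str, spam_url_features: list[dict[str, float]],
                      threshold: float) -> bool:
    """True if any URL in the body is close to a stored spam URL feature."""
    for url in extract_urls(body):
        features = char_frequencies(url)
        if any(mean_difference(features, ref) < threshold
               for ref in spam_url_features):
            return True
    return False
```

Restricting the calculation target range to the URL keeps the comparison focused on the part of a spam mail that the sender has the least freedom to change.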

The malicious mail detection device according to claim 1, wherein the character string pattern to be calculated by the calculation means is a character string pattern within a predefined calculation target range in a file attached to the received mail, and the stored character string pattern is a character string pattern extracted from malware.

The malicious mail detection device according to claim 1, wherein the character string pattern to be calculated by the calculation means is a character string pattern within a predefined calculation target range of the received mail, and the stored character string pattern is a character string pattern extracted from spam mail.

The malicious mail detection device according to any one of claims 1 to 7, further comprising processing means for discarding, blocking, or recording the received mail when the determination means determines that the received mail is malicious mail.

The malicious mail detection device according to claim 1, further comprising:
construction means for constructing at least one stream from the received mail according to a predetermined rule; and
memory release means that manages each stream constructed by the construction means and that, upon completion of the calculation of the feature amount by the calculation means, releases the memory area used to manage the stream on a per-stream basis.

The malicious mail detection device according to claim 9, wherein the calculation means calculates a feature amount for each stream constructed by the construction means.

The malicious mail detection device according to claim 9, wherein the information stored in the memory area includes at least one of: connection information of the packets of the received mail, the protocol state transition of the packets of the received mail, and identification information for specifying the stream corresponding to a processing unit.
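
The per-stream management and memory release described in the three claims above might be sketched as follows. This is an illustrative assumption about one possible data layout, not the device's actual tables: `StreamEntry`, `StreamTable`, the connection tuple layout, and the `"INIT"` state label are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable


@dataclass
class StreamEntry:
    """Management information held for one stream (hypothetical layout)."""
    connection: tuple              # e.g. (src_ip, src_port, dst_ip, dst_port)
    protocol_state: str = "INIT"   # protocol state transition marker
    data: bytearray = field(default_factory=bytearray)


class StreamTable:
    """Keeps one entry per stream and frees it once features are computed."""

    def __init__(self) -> None:
        self._entries: dict[Hashable, StreamEntry] = {}

    def append(self, stream_id: Hashable, connection: tuple,
               payload: bytes) -> None:
        # Create the entry on first sight of the stream, then accumulate data.
        entry = self._entries.setdefault(stream_id, StreamEntry(connection))
        entry.data.extend(payload)

    def finish(self, stream_id: Hashable,
               compute_features: Callable[[bytes], object]) -> object:
        # Compute the feature amount for this stream, then release its entry
        # even if the computation raises.
        entry = self._entries[stream_id]
        try:
            return compute_features(bytes(entry.data))
        finally:
            del self._entries[stream_id]

    def __len__(self) -> int:
        return len(self._entries)
```

Releasing each entry as soon as its feature amount has been computed keeps memory use proportional to the number of streams still in progress rather than to the total traffic seen.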

A malicious mail detection method comprising:
a calculation step of calculating a feature amount of a character string pattern within a predefined calculation target range of a received mail; and
a determination step of determining whether or not the received mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated in the calculation step and the feature amount of a stored character string pattern.

The malicious mail detection method according to claim 12, wherein the feature amount is a set of appearance frequencies of one or more character strings constituting a character string pattern.

The malicious mail detection method according to claim 13, wherein, in the determination step, when the mail used as the comparison target is a mail certified as malicious mail and each character string constituting the character string pattern is a single character, the difference between the appearance frequency of each character relative to all characters constituting the character string pattern of the received mail, as calculated in the calculation step, and the appearance frequency of that character relative to all characters constituting the stored character string pattern is calculated for each character, and the received mail is determined to be malicious mail when the average of the difference values calculated for the characters is smaller than a predetermined threshold.

The malicious mail detection method according to claim 13, wherein, in the determination step, when the mail used as the comparison target is a mail certified as not being malicious mail and each character string constituting the character string pattern is a single character, the difference between the appearance frequency of each character relative to all characters constituting the character string pattern of the received mail, as calculated in the calculation step, and the appearance frequency of that character relative to all characters constituting the stored character string pattern is calculated for each character, and the received mail is determined to be malicious mail when the average of the difference values calculated for the characters is larger than a predetermined threshold.

The malicious mail detection method according to any one of claims 12 to 15, wherein the character string pattern to be calculated in the calculation step is a URL included in the received mail, and the stored character string pattern is a URL extracted from spam mail.

The malicious mail detection method according to claim 12, wherein the character string pattern to be calculated in the calculation step is a character string pattern within a predefined calculation target range in a file attached to the received mail, and the stored character string pattern is a character string pattern extracted from malware.

The malicious mail detection method according to any one of claims 12 to 15, wherein the character string pattern to be calculated in the calculation step is a character string pattern within a predefined calculation target range of the received mail, and the stored character string pattern is a character string pattern extracted from spam mail.

The malicious mail detection method according to any one of claims 12 to 18, further comprising a processing step of discarding, blocking, or recording the received mail when the received mail is determined to be malicious mail in the determination step.

The malicious mail detection method according to claim 12, further comprising a step of constructing at least one stream from the received mail in accordance with a predetermined rule, managing each constructed stream, and, upon completion of the calculation of the feature amount in the calculation step, releasing the memory area used to manage the stream on a per-stream basis.

The malicious mail detection method according to claim 20, wherein the calculation step calculates a feature amount for each constructed stream.

The malicious mail detection method according to claim 20, wherein the information stored in the memory area includes at least one of: connection information of the packets of the received mail, the protocol state transition of the packets of the received mail, and identification information for specifying the stream corresponding to a processing unit.

A program for an information processing apparatus, the program causing the information processing apparatus to execute:
a calculation process of calculating a feature amount of a character string pattern within a predefined calculation target range of a received mail; and
a determination process of determining whether or not the received mail is malicious mail based on the similarity between the feature amount of the character string pattern of the received mail calculated in the calculation process and the feature amount of a stored character string pattern.