JP2012003463A

JP2012003463A - Supporting device, method and program for supporting signature generation

Info

Publication number: JP2012003463A
Application number: JP2010137111A
Authority: JP
Inventors: Akira Yamada; 山田　　明; Masaru Miyake; 優三宅
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2010-06-16
Filing date: 2010-06-16
Publication date: 2012-01-05

Abstract

PROBLEM TO BE SOLVED: To provide a support device, method, and program capable of efficiently generating signatures for text-format samples in an intrusion detection system. SOLUTION: A signature generation server 10 comprises: a text determination unit 112 that determines whether a sample of attack data or non-attack data is in a text format; a character class classification unit 114 that classifies each character in a sample determined to be in a text format into one of the character classes of a regular expression; and a character string extraction unit 115 that extracts, as signature candidates, those character strings or sequences of character strings, among character strings made up of consecutive characters classified as alphabetic, that appear more frequently in the attack data than in the non-attack data.

Description

The present invention relates to a support device, method, and program for supporting the generation of signatures for detecting the intrusion of unauthorized data.

Servers that provide services over the Internet, such as Web servers, are exposed to attacks (unauthorized access) from the Internet. For this reason, firewalls, intrusion detection systems, and the like are commonly installed. A firewall prevents attacks on unintended ports of a server by passing only specific protocols, IP addresses, and port numbers. An intrusion detection system detects attacks from the contents of packets passing through the permitted ports and notifies the operator.

Such intrusion detection systems often adopt what is called a signature-based method. In this method, bit strings characteristic of attacks are registered in a database in advance, and an attack is judged to be under way when the content of a communication matches one of these bit strings. Because new attack methods appear one after another, however, the signatures must be updated to keep up with them. Signatures are generated and distributed by vendors of intrusion detection systems and by operator communities; typically, an expert extracts an attack-specific bit string from the traffic generated during an attack or from vulnerability information, and registers it as a signature.

To reduce the time and cost of generating such signatures, methods have been proposed for generating signatures automatically from communication contents and information about attack methods (see, for example, Patent Documents 1 to 4 and Non-Patent Documents 1 to 3).

When an attack takes the form of a binary executable file, the file can be converted into a different bit string, while remaining executable, by compression, encryption, obfuscation, or optimization. An attacker who applies such conversions to attack packets may therefore evade signature-based detection. Attack samples produced by such conversions are called polymorphic malware. Polymorphic techniques are applied not only to executable files but also to attack communications.

Methods have therefore been proposed in which multiple polymorphic variants are generated from a single attack sample and a bit string common to all the variants is generated as a signature (see, for example, Non-Patent Document 4 or 5).

Japanese Patent No. 4265163
JP 2004-348740 A
JP 2007-058514 A
JP 2007-242002 A

Hyang-ah Kim, "Autograph: Toward automated, distributed worm signature detection," Proceedings of the 13th Usenix Security Symposium, 2004
Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage, "Automated Worm fingerprinting," Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, 2004
Christian Kreibich and Jon Crowcroft, "Honeycomb - Creating Intrusion Detection Signatures Using Honeypots," Proceedings of the Second Workshop on Hot Topics in Networks (HotNets-II), 2003
James Newsome, Brad Karp and Dawn Song, "Polygraph: Automatically Generating Signatures for Polymorphic Worms," Proceedings of the 2005 IEEE Symposium on Security and Privacy, 2005
Zhichun Li, Manan Sanghi, Yan Chen, Ming-yang Kao and Brian Chavez, "Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience," Proceedings of the 2006 IEEE Symposium on Security and Privacy, 2006

In recent years, the route of malware infection has been shifting from active routes, in which the attack arrives over the network, to passive routes, in which it arrives through a Web browser from a Web site being viewed. As a result, samples in text formats that Web browsers can interpret, such as HTML and JavaScript (registered trademark), are increasing.

Moreover, because attacks on Web browsers work by using HTML or JavaScript (registered trademark) to lead the browser to an attack Web site, an intrusion detection system installed at the Web server may fail to detect them. Efficient generation of signatures from text-format samples is therefore desired.

However, the signature generation methods described above, which target binary samples, cannot process text-format samples efficiently. Text-format scripts are more flexible in expression than binary formats and are easy to transform so as to evade a signature, so the number of samples to be covered becomes enormous, and an exhaustive search for effective signatures takes a great deal of time. For example, HTML and JavaScript (registered trademark) ignore runs of spaces and line-feed codes, so a different attack script can be produced merely by adding a space.
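As a minimal illustration of why exact matching is fragile for text formats, the following Python sketch compares a fixed byte-string signature against a script and against a variant with one inserted space. The signature and both sample strings are invented for illustration and do not come from the source.

```python
# Hypothetical fixed signature and samples, for illustration only.
SIGNATURE = b"document.write(unescape("

ORIGINAL = b"<script>document.write(unescape('%3C'))</script>"
# Same script with one extra space, which a browser would treat identically.
VARIANT = b"<script>document.write( unescape('%3C'))</script>"


def matches(signature: bytes, data: bytes) -> bool:
    """Exact substring match, as a fixed bit-string signature would do."""
    return signature in data


if __name__ == "__main__":
    print(matches(SIGNATURE, ORIGINAL))
    print(matches(SIGNATURE, VARIANT))
```

The inserted space is enough to defeat the exact match, which is the situation the text describes.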

An object of the present invention is therefore to provide a support device, method, and program capable of efficiently generating signatures for text-format samples in an intrusion detection system.

The present invention provides the following solutions.

(1) A support device for supporting the generation of a signature for detecting the intrusion of attack data, comprising: a text determination unit that determines whether samples of attack data and non-attack data are in a text format; a class classification unit that classifies each character contained in a sample determined by the text determination unit to be in a text format into one of the character classes of a regular expression; and a character string extraction unit that extracts, as candidates for the signature, those character strings or sequences of character strings, among character strings consisting of consecutive characters classified into a predetermined character class by the class classification unit, that appear more frequently in the attack data than in the non-attack data.

With this configuration, the support device extracts, from text-format samples, character strings or sequences of character strings that consist of characters of a predetermined type and that appear more often in attack data than in non-attack data. By attending only to characters of the predetermined type, the support device can efficiently extract signature candidates characteristic of text-format attacks. As a result, the processing load is reduced, and signature candidates can be searched for exhaustively across a large number of samples.

(2) The support device according to (1), further comprising a regular expression generation unit that generates a signature in regular expression form representing the character string or sequence of character strings extracted by the character string extraction unit.

With this configuration, the support device generates signatures as regular expressions, so a single signature can cover variations in the text data. As a result, the accuracy of attack data detection can be improved.

(3) The support device according to (2), wherein the regular expression generation unit generates a signature in regular expression form in which a character or character string classified into a character class other than the predetermined character class may be replaced by any other character or character string classified into a character class other than the predetermined character class.

With this configuration, a single signature also covers text data that has been obfuscated, for example by inserting spaces. As a result, the accuracy of attack data detection can be improved.

(4) The support device according to (2) or (3), further comprising an effectiveness determination unit that obtains, for each signature generated by the regular expression generation unit, an effectiveness index based on predetermined criteria set in advance, and selects signatures on the basis of the effectiveness index.

With this configuration, the support device uses the effectiveness index to select signatures that detect attack data more accurately, so an intrusion detection system using these signatures can be expected to detect attack data more accurately and to process it more efficiently.

(5) The support device according to any one of (1) to (4), further comprising a sample collection unit that collects samples of the attack data or the non-attack data.

With this configuration, the support device automatically collects the samples from which signatures are extracted, reducing the operator's workload. In addition, a larger number of samples raises the likelihood that effective signatures can be extracted.

(6) The support device according to any one of (1) to (5), further comprising a sample determination unit that determines whether a sample is attack data or non-attack data by checking URLs contained in the sample against a predetermined list.

With this configuration, the support device can check the URL to which a browser viewing the sample would be redirected against a list of sites already known to be attack sites, and can thus make a simple determination of whether the sample is attack data or non-attack data. The reliability of signatures generated from the samples is thereby improved.

(7) The support device according to (6), wherein the sample determination unit also checks URLs generated by executing a script contained in the sample against the predetermined list to determine whether the sample is attack data or non-attack data.

With this configuration, the support device can detect the requests actually issued when the script in the sample is executed and obtain the redirect destination URL. Even when the URL does not appear explicitly in the text, the support device can therefore make a simple determination of whether the sample is attack data.

(8) A method by which a computer supports the generation of a signature for detecting the intrusion of attack data, comprising: a text determination step of determining whether samples of attack data and non-attack data are in a text format; a class classification step of classifying each character contained in a sample determined in the text determination step to be in a text format into one of the character classes of a regular expression; and a character string extraction step of extracting, as candidates for the signature, those character strings or sequences of character strings, among character strings consisting of consecutive characters classified into a predetermined character class in the class classification step, that appear more frequently in the attack data than in the non-attack data.

With this configuration, the same effects as in (1) can be expected when a computer executes the method.

(9) A program that causes a computer to support the generation of a signature for detecting the intrusion of attack data, the program causing the computer to execute: a text determination step of determining whether samples of attack data and non-attack data are in a text format; a class classification step of classifying each character contained in a sample determined in the text determination step to be in a text format into one of the character classes of a regular expression; and a character string extraction step of extracting, as candidates for the signature, those character strings or sequences of character strings, among character strings consisting of consecutive characters classified into a predetermined character class in the class classification step, that appear more frequently in the attack data than in the non-attack data.

With this configuration, the same effects as in (1) can be expected when a computer is made to execute the program.

According to the present invention, signatures for text-format samples in an intrusion detection system can be generated efficiently.

A diagram showing the overall configuration of the system environment according to an embodiment of the present invention.
A block diagram showing the functional configuration of the signature generation server according to the embodiment of the present invention.
A diagram showing the character classes of a regular expression according to the embodiment of the present invention.
A diagram showing an example of a table storing signature candidates according to the embodiment of the present invention.
A flowchart showing the processing of the signature generation server according to the embodiment of the present invention.

An example of an embodiment of the present invention is described below.
In this embodiment, a signature generation server 10 is described as an example of the support device.

FIG. 1 is a diagram showing the overall configuration of a system environment including the signature generation server 10 and an intrusion detection system 6 according to this embodiment.

A hosting server group 1 publishes Web sites on the Internet 2 and is accessed from a site management terminal 3 or a site browsing terminal 4. An attacker 5 takes control of the management terminal 3 and uploads to the hosting server group 1 attack data (text data such as HTML or JavaScript (registered trademark)) that leads visitors to an attack site. The browsing terminal 4, on viewing a Web site containing the attack data, is then redirected to the attack site prepared by the attacker 5 and comes under attack.

The intrusion detection system 6 detects the intrusion of attack data into the hosting server group 1 and stores signatures for identifying such attack data in a database. An operator 7 of the hosting server group 1 learns of an intrusion of attack data when the intrusion detection system 6 issues a warning. The operator 7 also keeps up with new attack data by updating the signatures as appropriate.

The signature generation server 10 is a device that supports signature generation. In response to instructions from the operator 7, it generates signatures from samples collected from the hosting server group 1 and the Internet 2 and registers them in the database of the intrusion detection system 6.

FIG. 2 is a block diagram showing the functional configuration of the signature generation server 10 according to this embodiment.
The signature generation server 10 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, and an output unit 15.

The control unit 11 controls the signature generation server 10 as a whole. By reading and executing the various programs stored in the storage unit 12 as needed, it cooperates with the hardware described above to realize the functions of this embodiment. The control unit 11 may be a CPU (Central Processing Unit). The functions of the units within the control unit 11 are described later.

The storage unit 12 is a storage area for the various programs and data that make the hardware function as the signature generation server 10, and may be a hard disk drive (HDD). Specifically, the storage unit 12 stores the programs that the control unit 11 executes to realize the functions of this embodiment.

The communication unit 13 is a network adapter through which the signature generation server 10 exchanges information with other devices. It collects samples of attack data or non-attack data over the network (the Internet 2) and supplies them to the control unit 11. The communication unit 13 also receives instruction data from the terminal of the operator 7 and transmits signatures to the intrusion detection system 6.

The input unit 14 is an interface device that accepts instructions from a user to the signature generation server 10. It comprises, for example, a keyboard, a mouse, and a touch panel.

The output unit 15 includes a display device that shows screens for accepting data input from the user and screens presenting the results of processing by the signature generation server 10. In addition to display devices such as a cathode-ray tube (CRT) display or a liquid crystal display (LCD), the output unit 15 may include various other output devices such as a printer.

Next, the functions of the control unit 11 are described in detail.
The control unit 11 includes an attack sample collection unit 111a, a non-attack sample collection unit 111b, a text determination unit 112, a sample determination unit 113, a character class classification unit 114, a character string extraction unit 115, a regular expression generation unit 116, and an effectiveness determination unit 117. Each unit is a functional block realized by executing a program stored in the storage unit 12.

The attack sample collection unit 111a collects samples of attack data. Specifically, it accepts from the operator 7 the URL of a Web site that is known to have been attacked, or is highly likely to have been, and collects the page data of the Web page identified by that URL.

The non-attack sample collection unit 111b collects samples of non-attack data. Specifically, it takes as input the URLs of the hosting server group 1 monitored by the intrusion detection system 6 and acquires page data. It then follows the links contained in the acquired page data (HTML) to acquire further page data, and repeats this process. Non-attack samples are thereby collected efficiently.
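The link-following collection described above can be sketched in Python as a breadth-first crawl. This is only an illustration under stated assumptions: the `fetch` callable (URL in, HTML text or `None` out), the restriction to the starting host, and the `max_pages` limit are all hypothetical and are not specified in the source.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collects href values of <a> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_pages(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns {url: html}.

    fetch is an assumed callable returning HTML text, or None on failure.
    Only links on the same host as start_url are followed (an assumption).
    """
    host = urlparse(start_url).hostname
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).hostname == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real collector would wrap an HTTP client there.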

The text determination unit 112 determines whether a collected sample is in a text format. Specifically, it reads the first several bytes of the sample file and, if this byte sequence is a specific character string (for example, "<!doctype html"), determines that the file is in a particular text format (for example, HTML).
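A minimal Python sketch of this leading-bytes check follows. The source gives only "<!doctype html" as an example; the second marker, the `probe_len` value, the case-insensitive comparison, and the skipping of leading whitespace are assumptions made for illustration.

```python
def looks_like_html(data: bytes, probe_len: int = 64) -> bool:
    """Return True if the first bytes of a sample look like an HTML document."""
    # Marker list is hypothetical apart from "<!doctype html".
    markers = (b"<!doctype html", b"<html")
    # Inspect only the head of the file, ignoring leading whitespace and case.
    head = data[:probe_len].lstrip().lower()
    return head.startswith(markers)
```

Samples that fail the check (for example, binary executables) would simply be excluded from the later text-oriented steps.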

The sample determination unit 113 makes a simple determination of whether a collected sample (HTML file) is attack data or non-attack data by checking the URLs contained in it against a predetermined list, namely a publicly available blacklist. The sample determination unit 113 also covers the HTTP requests issued when JavaScript (registered trademark) contained in the sample is executed: URLs generated by execution are likewise checked against the blacklist.
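The static part of this check can be sketched in Python as follows. The URL pattern, the host-level comparison, and the example host names used with it are assumptions; URLs that appear only when a script is executed would need a script engine and are not covered by this sketch.

```python
import re
from urllib.parse import urlparse

# Rough pattern for absolute http(s) URLs embedded in HTML text (an assumption).
URL_RE = re.compile(r"""https?://[^\s"'<>)]+""", re.IGNORECASE)


def is_attack_sample(html: str, blacklist: set) -> bool:
    """Return True if any URL in the sample points at a blacklisted host."""
    for url in URL_RE.findall(html):
        host = urlparse(url).hostname  # hostname is returned in lower case
        if host and host in blacklist:
            return True
    return False
```

For example, `is_attack_sample(page, {"evil.example"})` would flag any page that references a URL on the hypothetical host `evil.example`.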

The character class classification unit 114 classifies each character contained in a sample determined by the text determination unit 112 to be in a text format into one of the character classes of a regular expression.

FIG. 3 is a diagram showing the character classes of a regular expression according to this embodiment.
In this embodiment, each character is classified into one of three character classes: alphabetic characters, digit characters, or non-word characters. Non-word characters are further classified into punctuation characters, whitespace characters, or control characters.

In regular expressions, alphabetic characters are written "[:alpha:]", digits "[:digit:]", punctuation characters "[:punct:]", whitespace characters "[:space:]", and control characters "[:cntrl:]". This embodiment does not distinguish among punctuation, whitespace, and control characters, and instead uses the regular expression "\W", which represents a non-word character.
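As an illustration of this classification, the following Python sketch maps a single character to one of the class names used above. It handles ASCII only, checks whitespace before control characters, and returns "other" for anything else; these choices are assumptions, not part of the source.

```python
import string


def char_class(ch: str) -> str:
    """Classify one character (a string of length 1) into a POSIX-style class name."""
    if ch in string.ascii_letters:
        return "alpha"
    if ch in string.digits:
        return "digit"
    if ch in string.punctuation:
        return "punct"
    if ch in string.whitespace:
        return "space"
    if ord(ch) < 32 or ord(ch) == 127:
        return "cntrl"
    return "other"  # non-ASCII characters are outside this sketch


def is_non_word(ch: str) -> bool:
    """Punctuation, whitespace, and control characters are treated alike."""
    return char_class(ch) in ("punct", "space", "cntrl")
```

Only the distinction between "alpha" and everything else matters for the extraction step that follows.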

The character string extraction unit 115 extracts character strings consisting of consecutive characters classified as alphabetic (the predetermined character class) by the character class classification unit 114. From these, it extracts as signature candidates the character strings, or sequences of character strings, that appear more frequently in attack data samples than in non-attack data samples. Specifically, the character string extraction unit 115 may extract character strings or sequences of character strings that appear in at least a certain threshold proportion of the attack data samples and in no more than another threshold proportion of the non-attack data samples.

A character string that appears often in attack data samples but rarely in non-attack data samples is a strong signature candidate for detecting attack data. Moreover, because the character string extraction unit 115 extracts only alphabetic strings delimited by non-alphabetic characters, the number of signature candidates is smaller than when bit strings are extracted from binary samples. When bit strings are extracted without fixed split positions, bit strings of every length become candidates, whereas the character string extraction unit 115 considers only alphabetic strings that are clearly delimited by characters of other character classes.
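The two-threshold extraction can be sketched in Python as below. The threshold values are hypothetical, and the sketch considers only single alphabetic strings; the sequences of character strings mentioned in the text are not handled here.

```python
import re
from collections import Counter

# A token is a maximal run of ASCII letters, i.e. a string delimited by
# characters of any other class.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def doc_frequency(samples):
    """Count, for each token, the number of samples that contain it."""
    counts = Counter()
    for text in samples:
        counts.update(set(TOKEN_RE.findall(text)))
    return counts


def candidate_tokens(attack, benign, min_attack=0.8, max_benign=0.1):
    """Tokens frequent in attack samples and rare in non-attack samples.

    min_attack and max_benign are illustrative thresholds, not source values.
    """
    atk = doc_frequency(attack)
    ben = doc_frequency(benign)
    result = []
    for token, n in atk.items():
        true_rate = n / len(attack)
        false_rate = ben[token] / len(benign) if benign else 0.0
        if true_rate >= min_attack and false_rate <= max_benign:
            result.append(token)
    return sorted(result)
```

Because tokens are split at every non-alphabetic character, the candidate set grows with the vocabulary of the samples rather than with every possible substring.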

The regular expression generation unit 116 generates a signature in regular expression form representing the character string or sequence of character strings extracted by the character string extraction unit 115, and stores it in the storage unit 12. In doing so, it generates the signature so that any character or character string classified into a non-alphabetic character class may be replaced by any other character or character string likewise classified into a non-alphabetic character class.

FIG. 4 is a diagram showing an example of a table storing signature candidates according to this embodiment.
The Token field holds the character string or sequence of character strings extracted by the character string extraction unit 115, and the RegExp Signature field holds the signature in regular expression form generated by the regular expression generation unit 116.

The True+ field is the proportion of attack-data samples that contain the signature, and the False+ field is the proportion of non-attack-data samples that contain the signature. The Length field is the length of the alphabetic character string, or sequence of character strings, that the signature represents. These fields are used to determine the validity of the generated signature.

In the regular expressions, a delimiter between character strings is written as "\([[:digit:]]\|\w\)\+", which denotes an arbitrary character string made up of characters belonging to character classes other than the alphabet. Consequently, the signature still matches when digits or non-word characters in the attack data are changed into a different string of digits or non-word characters (for example, when spaces are inserted).
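As a rough illustration of how such a delimiter lets one signature absorb inserted spaces, the sketch below joins alphabetic tokens with a delimiter pattern. It rests on several assumptions: Python's `re` module does not support POSIX bracket classes such as `[[:digit:]]`, so `\d` is used in its place, and `\W` is assumed for the non-word alternative that the text describes. The names `DELIMITER` and `build_signature` are hypothetical.

```python
import re

# Assumed Python equivalent of the delimiter: one or more digits or
# non-word characters between two alphabetic tokens.
DELIMITER = r"(?:\d|\W)+"


def build_signature(tokens: list[str]) -> str:
    """Join alphabetic tokens with the delimiter pattern."""
    return DELIMITER.join(re.escape(token) for token in tokens)


signature = build_signature(["document", "write"])
pattern = re.compile(signature)

# The same signature matches the plain form and a space-padded variant.
plain_hit = pattern.search("document.write(x)") is not None
padded_hit = pattern.search("document . write(x)") is not None
```

Note that in Python the underscore counts as a word character, so it is not covered by this delimiter; whether the original treats it as a delimiter is not stated in the text.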

In the example of FIG. 4, an extracted character string of two characters or fewer is likely to cause erroneous detection in the intrusion detection system 6, for example because of differences in variable names, and is therefore expressed as any two characters.

The validity determination unit 117 obtains, for each signature generated by the regular expression generation unit 116, validity indices based on predetermined criteria, and selects signatures on the basis of these indices. Specifically, as shown in FIG. 4 for example, the validity indices include the proportion of attack-data samples that contain the signature (True+), the proportion of non-attack-data samples that contain the signature (False+), and the character string length (Length).

As described above, a larger True+ is better and a smaller False+ is better. In addition, the longer the Length, the less likely erroneous detection becomes. The validity determination unit 117 therefore selects valid signatures according to preset priorities, for example preferring a large True+, a small False+, a large (True+) − (False+), or a large Length.
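One possible way to compute and rank these indices is sketched below. The text leaves the priority order and any cut-off values to configuration, so the `max_false` threshold, the sort key (gap between True+ and False+, then Length), and all names here are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredSignature:
    signature: str
    true_rate: float   # share of attack samples matched (True+)
    false_rate: float  # share of non-attack samples matched (False+)
    length: int        # total length of the alphabetic tokens (Length)


def match_rate(pattern: re.Pattern, samples: list[str]) -> float:
    """Fraction of samples in which the pattern occurs at least once."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if pattern.search(s)) / len(samples)


def score(signature: str, length: int,
          attack: list[str], benign: list[str]) -> ScoredSignature:
    pattern = re.compile(signature)
    return ScoredSignature(signature, match_rate(pattern, attack),
                           match_rate(pattern, benign), length)


def select(scored: list[ScoredSignature], max_false: float = 0.1,
           top_n: int = 10) -> list[ScoredSignature]:
    """Drop signatures that match too many non-attack samples, then rank
    the rest by (True+ - False+) and, as a tie-breaker, by Length."""
    kept = [s for s in scored if s.false_rate <= max_false]
    kept.sort(key=lambda s: (s.true_rate - s.false_rate, s.length),
              reverse=True)
    return kept[:top_n]
```

A different priority order, such as ranking by Length first, only requires changing the sort key.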

Furthermore, because a meaningful character string such as a function name is more likely to be a valid signature than a random character string, the validity determination unit 117 may give priority to such signatures.

FIG. 5 is a flowchart showing the processing of the signature generation server 10 according to the present embodiment.

In step S1, the control unit 11 (attack sample collection unit 111a and non-attack sample collection unit 111b) collects samples of attack data and non-attack data.

In step S2 (text determination step), the control unit 11 (text determination unit 112) determines whether each sample collected in step S1 is in text format. The control unit 11 excludes samples determined not to be in text format and proceeds to the next step.

In step S3, the control unit 11 (sample determination unit 113) performs a simple determination of whether each sample determined to be in text format in step S2 is attack data or non-attack data. For a sample whose determined category differs from its category at collection time (attack data or non-attack data), the control unit 11 may either exclude the sample from the subsequent processing or switch its category.

In step S4 (class classification step), the control unit 11 (character class classification unit 114) classifies the characters contained in the samples not excluded by step S3 into character classes used in regular expressions.
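The classification in step S4 might look like the following sketch. The three class labels (`alpha`, `digit`, `nonword`) and the treatment of every other character, including the underscore and non-ASCII characters, as `nonword` are assumptions made for illustration; the text itself only distinguishes the alphabet from the other classes.

```python
from itertools import groupby


def char_class(ch: str) -> str:
    """Assign a single character to one of three assumed classes."""
    if ch.isascii() and ch.isalpha():
        return "alpha"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "nonword"


def classify(sample: str) -> list[tuple[str, str]]:
    """Split a sample into maximal runs of characters of the same class."""
    return [(cls, "".join(run)) for cls, run in groupby(sample, key=char_class)]
```

The `alpha` runs produced this way are the character strings that the later extraction step works on, while the `digit` and `nonword` runs are the parts a signature is allowed to vary.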

In step S5 (character string extraction step), the control unit 11 (character string extraction unit 115) extracts character strings, or sequences of character strings, made up of characters classified as alphabetic in step S4, and from these further extracts, as signature candidates, those that appear more frequently in the attack data than in the non-attack data.

In step S6, the control unit 11 (regular expression generation unit 116) generates a regular expression for detecting each signature-candidate character string, or sequence of character strings, extracted in step S5.

In step S7, the control unit 11 (validity determination unit 117) determines the validity of the regular expression signature candidates generated in step S6 and selects those determined to be valid as signatures.

As described above, according to the present embodiment, the signature generation server 10 extracts, from the text-format samples, alphabetic character strings, or sequences of character strings, that appear more frequently in attack data than in non-attack data. The signature generation server 10 can therefore efficiently extract signature candidates specific to text-format attacks by focusing only on sequences of alphabetic characters. As a result, the signature generation server 10 can reduce the processing load and can exhaustively search a large number of samples for signature candidates.

Furthermore, because the signature generation server 10 generates signatures as regular expressions, a single signature can cover variations in obfuscated text data, such as inserted spaces. As a result, the signature generation server 10 can improve the accuracy of detecting attack data.

In addition, because the signature generation server 10 selects signatures that can detect attack data more accurately on the basis of validity indices such as appearance frequency and character string length, the intrusion detection system 6 that uses these signatures can be expected to detect attack data more accurately and to process data more efficiently.

In addition, because the signature generation server 10 automatically collects the samples from which signatures are extracted, the workload of the operator is reduced. Moreover, as the number of samples increases, valid signatures become more likely to be extracted.

In addition, the signature generation server 10 can check the URL to which a browser viewing the sample is redirected against a blacklist of sites already known to be attack sites, and can thereby perform a simple determination of whether the sample is attack data or non-attack data. Furthermore, the signature generation server 10 can detect the requests that actually occur when a script in the sample is executed and thereby obtain the redirect-destination URL. The signature generation server 10 can therefore perform the simple determination of whether the sample is attack data even when the URL does not appear explicitly as text. As a result, the signature generation server 10 can improve the reliability of the signatures generated from the samples.
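The blacklist comparison can be illustrated with the sketch below, which covers only URLs that appear as plain text in a sample. Discovering URLs by executing a script, as described above, is outside its scope. The URL pattern, the host-level comparison, and the function names are assumptions, and the host names in the usage note are placeholders.

```python
import re
from urllib.parse import urlparse

# Simplified pattern for http(s) URLs that appear literally in the text.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")


def hosts_in(sample: str) -> set[str]:
    """Return the host names of all URLs written out in the sample."""
    hosts = {urlparse(url).hostname for url in URL_RE.findall(sample)}
    return {host for host in hosts if host}


def looks_like_attack(sample: str, blacklist: set[str]) -> bool:
    """Simple determination: True if any URL host is on the blacklist."""
    return not hosts_in(sample).isdisjoint(blacklist)
```

For example, with a blacklist of `{"evil.example"}`, a sample containing `<script src="http://evil.example/x.js">` would be categorized as attack data, while a sample that links only to other hosts would not.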

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described in the present embodiment merely list the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the present embodiment.

Description of symbols
10 Signature generation server (support device)
11 Control unit
12 Storage unit
13 Communication unit
14 Input unit
15 Output unit
111a Attack sample collection unit
111b Non-attack sample collection unit
112 Text determination unit
113 Sample determination unit
114 Character class classification unit
115 Character string extraction unit
116 Regular expression generation unit
117 Validity determination unit

Claims

A support device that supports generation of a signature for detecting intrusion of attack data, the support device comprising:
a text determination unit that determines whether or not a sample of attack data or non-attack data is in a text format;
a class classification unit that classifies characters included in the sample determined to be in the text format by the text determination unit into any of the character classes in a regular expression; and
a character string extraction unit that extracts, as a candidate for the signature, a character string or a sequence of character strings that appears more frequently in the attack data than in the non-attack data, from among character strings consisting of consecutive characters classified into a predetermined character class by the class classification unit.

The support device according to claim 1, further comprising a regular expression generation unit that generates a signature in a regular expression format representing the character string or the sequence of character strings extracted by the character string extraction unit.

The support device according to claim 2, wherein the regular expression generation unit generates the signature in a regular expression format in which a character or character string classified into a character class other than the predetermined character class can be replaced with another character or character string classified into a character class other than the predetermined character class.

The support device according to claim 2 or claim 3, further comprising a validity determination unit that obtains, for each signature generated by the regular expression generation unit, a validity index based on a predetermined criterion and selects signatures on the basis of the validity index.

The support device according to any one of claims 1 to 4, further comprising a sample collection unit that collects the sample of the attack data or the non-attack data.

The support device according to any one of claims 1 to 5, further comprising a sample determination unit that determines whether the sample is attack data or non-attack data by comparing a URL included in the sample with a predetermined list.

The support device according to claim 6, wherein the sample determination unit determines whether the sample is attack data or non-attack data by comparing a URL generated by executing a script included in the sample with the predetermined list.

A method by which a computer supports generation of a signature for detecting intrusion of attack data, the method comprising:
a text determination step of determining whether or not a sample of attack data or non-attack data is in a text format;
a class classification step of classifying characters included in the sample determined to be in the text format in the text determination step into any of the character classes in a regular expression; and
a character string extraction step of extracting, as a candidate for the signature, a character string or a sequence of character strings that appears more frequently in the attack data than in the non-attack data, from among character strings consisting of consecutive characters classified into a predetermined character class in the class classification step.

A program for causing a computer to support generation of a signature for detecting intrusion of attack data, the program causing the computer to execute:
a text determination step of determining whether or not a sample of attack data or non-attack data is in a text format;
a class classification step of classifying characters included in the sample determined to be in the text format in the text determination step into any of the character classes in a regular expression; and
a character string extraction step of extracting, as a candidate for the signature, a character string or a sequence of character strings that appears more frequently in the attack data than in the non-attack data, from among character strings consisting of consecutive characters classified into a predetermined character class in the class classification step.