JP2007074339A

JP2007074339A - Spread unauthorized access detection method and system

Info

Publication number: JP2007074339A
Application number: JP2005258812A
Authority: JP
Inventors: Yuji Izumi; 勇治和泉; Yutaka Tsunoda; 裕角田; Yoshiaki Nemoto; 義章根元
Original assignee: Tohoku University NUC
Current assignee: Tohoku University NUC
Priority date: 2005-09-07
Filing date: 2005-09-07
Publication date: 2007-03-22

Abstract

PROBLEM TO BE SOLVED: To provide a system that detects spreading-type unauthorized access at an early stage, using a detection method that does not require a detection database.

SOLUTION: The spreading-type unauthorized access detection system comprises: a packet collecting section 101 that collects packets from network traffic, aggregating them per packet or per flow; a storage section 102 that stores the collected packet information; a feature amount calculating section 103 that calculates a feature amount by quantifying the packet payload from the packet information aggregated per packet or per flow; a similarity determining section 104 that judges the similarity of the communication contents of packets or flows based on the calculated feature amounts; an information transmitting section 105 that, when a plurality of similar communication contents are detected within a predetermined period, notifies other observation points of the feature amount of that communication content as a candidate for spreading-type unauthorized access; an information receiving section 106 that receives the feature amounts of candidates notified from other observation points; and an abnormality judging section 107 that detects spreading-type unauthorized access by judging the similarity of communication contents based on the notified feature amounts.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a technique for detecting spreading-type unauthorized access, such as worms, that attempts to infect a large number of unspecified hosts.

In recent years, unauthorized access such as worms and viruses that propagate over networks has become a major social problem. In particular, as networks and computers have become faster, the speed at which such access spreads has increased sharply, and worldwide damage has been reported (Non-Patent Documents 1 and 2). Suppressing the damage caused by spreading-type unauthorized access such as worms first requires detecting it, but current detection systems require a detection database to be created and updated, and the time needed to prepare that database contributes to the spread of damage. As exemplified by so-called zero-day attacks, unauthorized access that exploits a software security vulnerability has appeared even before the existence of the vulnerability is widely disclosed. Establishing technology that enables a rapid response to unauthorized access is therefore an urgent social need.

To address this problem, anomaly detection methods, which do not require a detection database, have been actively studied (Non-Patent Documents 3 to 9). An anomaly detection method expresses the traffic of the managed network as numerical values, based on, for example, the frequency of access to servers or the frequency with which packet header field values appear, and detects an abnormal state by determining how far those values deviate from preset normal values. Because it requires no prior knowledge such as a detection database, anomaly detection can in principle detect unknown worms. However, no established method exists for quantifying network traffic or for designing a model that properly represents the normal state from those values, so many missed detections and false detections occur.
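As a rough illustration of the deviation test described above, the following sketch scores a single traffic metric by how many standard deviations it lies from a baseline of normal observations. The metric, the baseline values, and the threshold of 3 are hypothetical assumptions; the cited anomaly detectors use considerably richer models of normal traffic.

```python
from statistics import mean, stdev


def anomaly_score(baseline: list[float], observed: float) -> float:
    """Return how many standard deviations `observed` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag a value whose deviation from the normal baseline exceeds the threshold."""
    return anomaly_score(baseline, observed) > threshold


# Hypothetical per-minute request counts to one server under normal conditions.
normal_counts = [100, 104, 98, 101, 97, 103, 99, 102]
print(is_anomalous(normal_counts, 105))  # False
print(is_anomalous(normal_counts, 400))  # True
```

The weakness noted in the text shows up directly here: everything depends on how the metric is chosen and how well the baseline represents normal traffic.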

Other studies aimed at early detection of unauthorized access use the error messages produced by the scans that accompany infection attempts, or accesses to unused IP addresses. However, some worms that spread via e-mail use the list of e-mail addresses held on the infected host as their infection targets. In that case no scan-related errors occur, so a detection method that does not presuppose scanning is considered necessary.

Furthermore, spreading-type unauthorized access that attempts to infect many unspecified hosts, such as a worm, characteristically sends copies of itself to many hosts when infecting them, and therefore generates many communications with similar content. A method that evaluates the similarity of communication contents to achieve early detection is therefore considered effective.

Various methods have been proposed for the early detection of spreading-type unauthorized access. Non-Patent Documents 10 and 11 use intrusion detection systems, firewalls, and the like as sensors to monitor, on a global scale, unauthorized access that spreads worldwide.

Detection techniques that do not use a detection database include methods that observe accesses to unused IP addresses using network probes (Non-Patent Documents 12 and 13) or honeypots (Non-Patent Document 14), and a method that observes ICMP Unreachable messages (Non-Patent Document 15). Accesses to unused IP addresses arise from causes such as DoS (Denial of Service) attacks with spoofed source addresses and scans, so observing them is considered an effective way to detect unauthorized access without prior knowledge such as a detection database.
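A minimal sketch of the unused-address idea, assuming a hypothetical dark block (192.0.2.0/24, a range reserved for documentation) and packets reduced to (source, destination) address pairs. Real network telescopes and honeypots capture and analyze far more than this; the sketch only shows the core membership test.

```python
import ipaddress

# Hypothetical unused ("dark") address block monitored by a network probe.
DARK_BLOCK = ipaddress.ip_network("192.0.2.0/24")


def dark_sources(packets):
    """Return the set of source addresses that sent traffic to unused addresses.

    `packets` is an iterable of (src_ip, dst_ip) string pairs.
    """
    sources = set()
    for src, dst in packets:
        # No legitimate host should address the dark block, so any sender is suspect.
        if ipaddress.ip_address(dst) in DARK_BLOCK:
            sources.add(src)
    return sources


observed = [
    ("198.51.100.7", "192.0.2.15"),  # probe into the dark block
    ("198.51.100.7", "192.0.2.16"),
    ("203.0.113.9", "10.0.0.5"),     # ordinary traffic elsewhere
]
print(dark_sources(observed))  # {'198.51.100.7'}
```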

Research modeling the spread of spreading-type unauthorized access has also been conducted (Non-Patent Documents 16, 17, and 18). These studies mainly target worms that scan to find infection targets. From quantities such as the number of infected hosts, the number of vulnerable hosts, and the volume of scan traffic generated by the worm, they derive recurrence formulas for the number of infected hosts and estimate how it changes over time. They also estimate the time until detection becomes possible with detection methods that use unused IP addresses, and examine the relationship between the number of available unused IP addresses and the time required for detection. Although these studies provide much useful insight for the design and operation of detection systems, they have not yet achieved early detection of spreading-type unauthorized access.
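The kind of recurrence these studies derive can be illustrated with a generic discrete-time random-scanning model; it is not necessarily the exact formulation of Non-Patent Documents 16 to 18. With N vulnerable hosts, an address space of size Omega, s scans per infected host per time step, and n(t) infected hosts at step t, each uninfected vulnerable host escapes all s * n(t) scans with probability (1 - 1/Omega)^(s * n(t)), so the expected count evolves as n(t+1) = n(t) + (N - n(t)) * [1 - (1 - 1/Omega)^(s * n(t))]. The sketch iterates this recurrence; all parameter values in the usage are hypothetical.

```python
def simulate_random_scanning(N, omega, scan_rate, n0, steps):
    """Expected infected-host counts under a simple discrete random-scanning model.

    N         -- number of vulnerable hosts
    omega     -- size of the scanned address space (2**32 for IPv4)
    scan_rate -- scans sent by each infected host per time step
    n0        -- initially infected hosts
    steps     -- number of time steps to simulate
    """
    n = float(n0)
    counts = [n]
    for _ in range(steps):
        # Probability that a given uninfected vulnerable host is hit by at least
        # one of the scan_rate * n scans sent during this step.
        p_hit = 1.0 - (1.0 - 1.0 / omega) ** (scan_rate * n)
        n = n + (N - n) * p_hit
        counts.append(n)
    return counts


counts = simulate_random_scanning(N=100_000, omega=2**32, scan_rate=4000, n0=10, steps=400)
print(counts[-1] > 0.99 * 100_000)  # True: the vulnerable population is nearly saturated
```

The slow start followed by rapid, near-exponential growth is why detection has to happen while the infected count is still small.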

As detection methods for spreading-type unauthorized access that do not presuppose scanning of unused IP addresses or similar behavior, there are methods that extract parts of communication contents and, when a part is observed frequently, detect the corresponding communications as candidates for spreading-type unauthorized access. In Non-Patent Documents 19 and 20, when many communications that partially share the same content are found between different hosts, the communications containing that common part are detected as having been generated by a worm or the like. Non-Patent Document 19 uses "substrings" as the unit for extracting common parts and maintains a histogram of the appearance frequencies of their hashes. Non-Patent Document 20 divides the payload of a flow into byte strings at a certain delimiter and, using a histogram of their appearance frequencies, detects frequently appearing byte strings as being caused by a worm. Because these methods use the appearance frequency of parts of packet or flow contents as the detection criterion and do not presuppose specific behavior such as scanning, they have the advantage of also being able to detect viruses that select infection targets from a list of e-mail addresses. However, because the payload is decomposed according to a fixed rule and the resulting byte strings serve as the basis for counting appearance frequencies, inserting byte strings that do not affect the operation of the unauthorized access can prevent the appearance frequencies from being calculated correctly.
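The weakness described above can be demonstrated with a simplified delimiter-based splitter. The delimiter b"\r\n" and the payloads are hypothetical, and this is not the exact partitioning rule of Non-Patent Document 20; it only shows how inserting one different junk byte into each copy keeps the worm-specific chunk from ever recurring.

```python
from collections import Counter


def split_on_delimiter(payload: bytes, delimiter: bytes = b"\r\n") -> list[bytes]:
    """Split a payload into byte strings at a fixed delimiter (hypothetical rule)."""
    return [chunk for chunk in payload.split(delimiter) if chunk]


def chunk_frequencies(payloads) -> Counter:
    """Count in how many payloads each delimiter-bounded chunk occurs."""
    counts = Counter()
    for payload in payloads:
        # A set, so each chunk is counted at most once per payload.
        counts.update(set(split_on_delimiter(payload)))
    return counts


# Ten hypothetical worm copies; each inserts a different junk byte (A..J) that
# does not affect the exploit, so the exploit chunk differs in every copy.
copies = [b"GET /x HTTP/1.0\r\nEXPLOIT" + bytes([65 + i]) + b"BODY\r\n" for i in range(10)]
freq = chunk_frequencies(copies)
print(freq[b"GET /x HTTP/1.0"])  # 10: the shared header chunk recurs in every copy
print(max(c for chunk, c in freq.items() if chunk.startswith(b"EXPLOIT")))  # 1
```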

Non-Patent Document 1: CERT, "CERT/CC advisories," http://www.cert.org/advisories/.
Non-Patent Document 2: D. Moore, C. Shannon, and J. Brown, "Code-Red: a case study on the spread and victims of an Internet worm," In Proceedings of the Second ACM SIGCOMM Workshop on Internet Measurement, November 2002.
Non-Patent Document 3: M. V. Mahoney and P. K. Chan, "Trajectory Boundary Modeling of Time Series for Anomaly Detection," Data Mining Methods for Anomaly Detection Workshop, KDD, Chicago, 2005.
Non-Patent Document 4: P. K. Chan and M. V. Mahoney, "Detecting Novel Attacks by Identifying Anomalous Network Packet Headers," Florida Tech. Technical Report CS-2001-2, 2001.
Non-Patent Document 5: M. V. Mahoney and P. K. Chan, "PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic," Florida Tech. Technical Report CS-2001-4, 2001.
Non-Patent Document 6: M. V. Mahoney and P. K. Chan, "Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks," Proc. Eighth Intl. Conf. on Knowledge Discovery and Data Mining, pp. 376-385, 2002.
Non-Patent Document 7: M. V. Mahoney and P. K. Chan, "Learning Models of Network Traffic for Detecting Novel Attacks," Florida Institute of Technology Technical Report CS-2002-08, 2002.
Non-Patent Document 8: M. V. Mahoney, "Network Traffic Anomaly Detection Based on Packet Bytes," Proc. ACM-SAC, Melbourne, FL, pp. 346-350, 2003.
Non-Patent Document 9: D. Barbara, J. Couto, S. Jajodia, L. Popyack, and N. Wu, "ADAM: Detecting Intrusions by Data Mining," Proc. 2001 IEEE Workshop on Information Assurance and Security, pp. 11-16, June 2001.
Non-Patent Document 10: Symantec Corp., "Symantec early warning solutions," http://enterprisesecurity.symantec.com/SecurityServices/content.cfm?ArticleID=1522.
Non-Patent Document 11: "Internet Storm Center," http://isc.incidents.org.
Non-Patent Document 12: D. Moore, C. Shannon, G. M. Voelker, and S. Savage, "Network telescopes," Technical Report TR-2004-04, CAIDA, 2004.
Non-Patent Document 13: R. Pang, V. Yegneswaran, P. Barford, V. Paxson, and L. Peterson, "Characteristics of Internet background radiation," In Proceedings of the Internet Measurement Conference (IMC), October 2004.
Non-Patent Document 14: Honeynet Project, "Know your enemy: Honeynets," http://project.honeynet.org/papers/honeynet.
Non-Patent Document 15: V. H. Berk, R. S. Gray, and G. Bakos, "Using sensor networks and data fusion for early detection of active worms," In Proceedings of SPIE AeroSense, 2003.
Non-Patent Document 16: Z. Chen, L. Gao, and K. Kwiat, "Modeling the spread of active worms," In Proceedings of IEEE INFOCOM, March 2003.
Non-Patent Document 17: C. C. Zou, W. Gong, and D. Towsley, "Code Red worm propagation modeling and analysis," In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS '02), October 2002.
Non-Patent Document 18: S. Staniford, V. Paxson, and N. Weaver, "How to own the Internet in your spare time," In Proceedings of the USENIX Security Symposium, August 2002.
Non-Patent Document 19: S. Singh, C. Estan, G. Varghese, and S. Savage, "Automated worm fingerprinting," In Proceedings of the 6th ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), December 2004.
Non-Patent Document 20: H. Kim and B. Karp, "Autograph: Toward automated, distributed worm signature detection," In Proceedings of the 13th USENIX Security Symposium, August 2004.

To solve the above problem, the present invention does not divide the payload of a flow at a delimiter. Instead, it divides the payload of each packet constituting a flow into 8-bit units, treats each 8-bit unit as one code, and uses as the feature amount a histogram of the appearance frequencies of those codes. An object of the present invention is to provide a method for evaluating the similarity of the communication contents of packets and flows that, in this way, avoids problems such as byte-string insertion.
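Why a byte histogram tolerates insertion can be checked with a short calculation, assuming the histogram is normalized to relative frequencies and compared with the L1 distance (both are assumptions; the text specifies neither). If k bytes are inserted into an n-byte payload with relative-frequency vector p, the new vector is q = (n * p + a) / (n + k), where a counts the inserted bytes and sums to k. Then q - p = (a - k * p) / (n + k), so the L1 distance is at most (k + k) / (n + k) = 2k / (n + k), which stays small when k is small relative to n. The sketch checks this bound on hypothetical payloads.

```python
def byte_histogram(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values (8-bit codes)."""
    if not payload:
        return [0.0] * 256
    counts = [0] * 256
    for code in payload:
        counts[code] += 1
    return [c / len(payload) for c in counts]


def l1_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute per-code differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


original = b"EXPLOITBODY" * 20                              # n = 220 bytes
padded = original[:100] + b"\x90" * 11 + original[100:]     # k = 11 inserted bytes
benign = b"Hello, this is ordinary mail text. " * 10        # shares no byte values

n, k = len(original), len(padded) - len(original)
bound = 2 * k / (n + k)
print(l1_distance(byte_histogram(original), byte_histogram(padded)) <= bound + 1e-12)  # True
print(l1_distance(byte_histogram(original), byte_histogram(benign)))  # about 2.0, the maximum
```

The padded copy stays within about 0.095 of the original, while unrelated content sits at the maximum possible distance, which is the separation the invention relies on.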

Another object of the present invention is to provide a method for detecting spreading-type unauthorized access at an early stage by exchanging information on the feature amounts among a plurality of observation points and determining the similarity of flows based on that information.

To achieve the above objects, the spreading-type unauthorized access detection method according to claim 1 comprises: a feature amount calculation step of calculating a feature amount by quantifying the payload of packets based on packet information collected from network traffic; a similarity determination step of determining the similarity of the communication contents of packets or flows based on the feature amounts; and an abnormality determination step of detecting spreading-type unauthorized access by exchanging information on the feature amounts among a plurality of observation points and determining the similarity of the communication contents of packets or flows based on that information. Here, a flow is defined as a unit of communication in which all five elements, namely the source IP address, destination IP address, source port number, destination port number, and protocol, are identical.

In the feature amount calculation step according to claim 2, the packet payload is divided into 8-bit units based on packet information aggregated per packet or per flow, each 8-bit unit is treated as one code, the appearance frequencies of the 256 code classes are calculated, and those appearance frequencies are represented as a histogram.
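A minimal sketch of the feature described in claim 2: per-packet and per-flow 256-class code histograms. Treating each byte as one 8-bit code follows the text; summing the per-packet histograms of a flow is one reasonable reading of "aggregated per flow" and is an assumption, as are the function names.

```python
from collections.abc import Iterable


def code_histogram(payload: bytes) -> list[int]:
    """Count occurrences of each 8-bit code (0-255) in one packet payload."""
    counts = [0] * 256
    for code in payload:  # iterating over bytes yields ints in the range 0..255
        counts[code] += 1
    return counts


def flow_histogram(payloads: Iterable[bytes]) -> list[int]:
    """Sum the per-packet histograms of all packets belonging to one flow."""
    total = [0] * 256
    for payload in payloads:
        for code, count in enumerate(code_histogram(payload)):
            total[code] += count
    return total


packets = [b"\x00\x01\x01", b"\x01\xff"]
h = flow_histogram(packets)
print(h[0x00], h[0x01], h[0xff], sum(h))  # 1 3 1 5
```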

In the similarity determination step according to claim 3, to determine the similarity of the communication contents of packets or flows, the histogram obtained as the feature amount is represented as a 256-dimensional vector, the distance between the vectors of the communication contents being compared is calculated, and the calculated distance value is evaluated to determine the similarity of the communication contents.
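The text does not name the distance measure or say whether histograms are normalized. The sketch below assumes relative-frequency normalization (so payloads of different lengths are comparable) and Euclidean distance, with a hypothetical threshold of 0.05 for calling two contents similar.

```python
import math


def histogram(payload: bytes) -> list[int]:
    """256-class count of 8-bit codes in a payload."""
    counts = [0] * 256
    for code in payload:
        counts[code] += 1
    return counts


def to_vector(hist: list[int]) -> list[float]:
    """Normalize a count histogram to relative frequencies (an assumption)."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [c / total for c in hist]


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two 256-dimensional vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_similar(p1: bytes, p2: bytes, threshold: float = 0.05) -> bool:
    """Hypothetical decision rule: similar if the vector distance is below threshold."""
    return euclidean(to_vector(histogram(p1)), to_vector(histogram(p2))) < threshold


copy_a = b"EXPLOITBODY" * 20
copy_b = b"EXPLOITBODY" * 19 + b"EXPLOITB0DY"   # one byte changed
benign = b"Hello, this is ordinary mail text. " * 10
print(is_similar(copy_a, copy_b))  # True
print(is_similar(copy_a, benign))  # False
```

A smaller distance means more similar contents, matching the statement that payloads of the same unauthorized access lie closer together than normal communications.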

In the similarity determination step according to claim 4, a clustering technique is used to evaluate the distance values.
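The clustering technique is not specified. One simple possibility, shown here purely as an assumption, is threshold-based single-linkage clustering: any two vectors closer than a threshold are joined, and the resulting connected groups form clusters. Large clusters of mutually close payloads would then be candidates. The 2-dimensional vectors in the usage are stand-ins for 256-dimensional feature vectors.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage_clusters(vectors, threshold):
    """Group vectors whose pairwise distance chains stay below `threshold`.

    Union-find over all pairs (O(n^2) distance computations): adequate
    for a sketch, not for line-rate traffic.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


vectors = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0), (0.02, 0.0)]
print(single_linkage_clusters(vectors, 0.05))  # [[0, 1, 3], [2]]
```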

The abnormality determination step according to claim 5 comprises: a procedure in which, when a plurality of similar communication contents are detected at one observation point within a certain period, those communication contents are treated as candidates for spreading-type unauthorized access and their feature amounts are notified to other observation points; and a procedure in which each observation point that receives the notification determines the similarity of its own communication contents using the notified feature amounts as a reference, and detects communication contents determined to be similar as spreading-type unauthorized access.
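The two procedures of this step can be sketched as a small in-process simulation. The class name, the distance threshold, the 60-second period, the requirement of three similar contents, and the use of direct method calls in place of real network messages between observation points are all hypothetical assumptions; the text specifies none of them.

```python
import math
from dataclasses import dataclass, field


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


@dataclass
class ObservationPoint:
    """Hypothetical observation point running the two procedures of the step."""
    name: str
    threshold: float = 0.05   # distance below which two contents count as similar
    min_similar: int = 3      # similar contents needed to raise a candidate
    window: float = 60.0      # the "certain period", in seconds
    peers: list = field(default_factory=list)
    recent: list = field(default_factory=list)      # (timestamp, vector) pairs
    received: list = field(default_factory=list)    # candidates from other points
    detections: list = field(default_factory=list)  # contents judged as spreading access

    def observe(self, timestamp: float, vector: list) -> None:
        # Procedure 2: compare local content against candidates notified by peers.
        if any(euclidean(vector, c) < self.threshold for c in self.received):
            self.detections.append((timestamp, vector))
        # Procedure 1: count similar contents seen within the period.
        self.recent = [(t, v) for t, v in self.recent if timestamp - t <= self.window]
        similar = sum(1 for _, v in self.recent if euclidean(v, vector) < self.threshold)
        self.recent.append((timestamp, vector))
        if similar + 1 >= self.min_similar:
            for peer in self.peers:
                peer.notify(vector)

    def notify(self, vector: list) -> None:
        # Store the candidate unless a near-identical one is already held.
        if all(euclidean(vector, c) >= self.threshold for c in self.received):
            self.received.append(vector)


a, b = ObservationPoint("A"), ObservationPoint("B")
a.peers.append(b)
worm, benign = [1.0, 0.0], [0.0, 1.0]  # 2-D stand-ins for 256-D feature vectors
for t in range(3):
    a.observe(float(t), worm)           # the third similar copy makes A notify B
b.observe(10.0, [1.0, 0.001])           # close to the notified candidate
b.observe(11.0, benign)                 # unrelated content
print(len(b.detections))  # 1
```

Requiring local repetition before notifying, and confirmation at a second point before detecting, is what lets the cooperating points separate worm copies from content that is merely common at one site.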

The spreading-type unauthorized access detection system according to claim 6 comprises: feature amount calculation means for calculating a feature amount by quantifying the payload of packets based on packet information collected from network traffic; similarity determination means for determining the similarity of the communication contents of packets or flows based on the feature amounts; and abnormality determination means for detecting spreading-type unauthorized access by exchanging information on the feature amounts among a plurality of observation points and determining the similarity of the communication contents of packets or flows based on that information. Here, a flow is defined as a unit of communication in which all five elements, namely the source IP address, destination IP address, source port number, destination port number, and protocol, are identical.

The feature amount calculation means according to claim 7 divides the packet payload into 8-bit units based on packet information aggregated per packet or per flow, treats each 8-bit unit as one code, calculates the appearance frequencies of the 256 code classes, and represents those appearance frequencies as a histogram.

To determine the similarity of the communication contents of packets or flows, the similarity determination means according to claim 8 represents the histogram obtained as the feature amount as a 256-dimensional vector, calculates the distance between the vectors of the communication contents being compared, and evaluates the calculated distance value to determine the similarity of the communication contents.

The similarity determination means according to claim 9 uses a clustering technique to evaluate the distance values.

The abnormality determination means according to claim 10 comprises: means for, when a plurality of similar communication contents are detected at one observation point within a certain period, treating those communication contents as candidates for spreading-type unauthorized access and notifying other observation points of their feature amounts; and means for determining, at each observation point that receives the notification, the similarity of its own communication contents using the notified feature amounts as a reference, and detecting communication contents determined to be similar as spreading-type unauthorized access.

According to the invention of claim 1 or claim 6, a feature amount is calculated by quantifying the packet payload based on packet information collected from network traffic, the similarity of the communication contents of packets or flows is determined based on the feature amounts, and information on the feature amounts is exchanged among a plurality of observation points so that similarity can be determined based on that information. Spreading-type unauthorized access can therefore be detected by a detection method that does not require a detection database.

According to the invention of claim 2 or claim 7, the appearance frequency of byte codes in the payload portion of packet information aggregated per packet or per flow is represented as a histogram, so the similarity of the communication contents of packets and flows can be evaluated quantitatively.

According to the inventions of claims 3 and 4, or claims 8 and 9, the histogram obtained as the feature amount is represented as a 256-dimensional vector, the distance between the vectors of the communication contents being compared is calculated, and the calculated distance value is evaluated to determine similarity, so the similarity of the communication contents of packets and flows can be evaluated quantitatively. Furthermore, because a smaller distance indicates more similar communication contents, and because the distance between payloads of the same type of unauthorized access is smaller than that between normal communications, candidates for spreading-type unauthorized access can be extracted.

According to the invention of claim 5 or claim 10, information on the feature amounts of spreading-type unauthorized access is exchanged among a plurality of observation points and the similarity of the communication contents of packets and flows is determined based on that information, so spreading-type unauthorized access can be detected by a detection method that does not require a detection database. In addition, because multiple observation points cooperate to distinguish normal communication from unauthorized access, false positives can be prevented.

Next, a spreading-type unauthorized access detection system according to an embodiment of the present invention will be described with reference to the drawings. The present invention is not limited by this embodiment.

FIG. 1 is a block diagram showing the configuration of the spreading-type unauthorized access detection system according to the embodiment of the present invention. As shown in FIG. 1, the system comprises: a packet collection unit 101 that collects packets passing through an observation point in the network traffic, aggregating them per packet or per flow; a storage unit 102 that stores the collected packet information; a feature amount calculation unit 103 that calculates feature amounts by quantifying packet payloads from the packet information aggregated per packet or per flow; a similarity determination unit 104 that determines the similarity of the communication contents of packets or flows based on the calculated feature amounts; an information transmission unit 105 that, when a plurality of similar communication contents are detected within a certain period, notifies other observation points of the feature amounts of those communication contents as candidates for spreading-type unauthorized access; an information reception unit 106 that receives the feature amounts of candidates notified from other observation points; and an abnormality determination unit 107 that detects spreading-type unauthorized access by determining the similarity of communication contents based on the notified feature amounts.

The packet collection unit 101 collects packet information on the packets passing through an observation point in the network traffic, aggregating it per packet or per flow, and stores the collected packet information in the storage unit 102. Here, a flow is defined as a unit of communication in which all five elements, namely the source IP address, destination IP address, source port number, destination port number, and protocol, are identical.
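Aggregation by the five-tuple can be sketched as follows. The Packet type and the function name are hypothetical, flow timeouts are ignored, and under this definition the two directions of a connection form separate flows because source and destination are swapped.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    payload: bytes


def aggregate_flows(packets):
    """Group packet payloads by the five-tuple that defines a flow."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p.payload)  # keep arrival order within the flow
    return dict(flows)


packets = [
    Packet("198.51.100.7", "203.0.113.5", 40000, 80, "TCP", b"GET /"),
    Packet("198.51.100.7", "203.0.113.5", 40000, 80, "TCP", b" HTTP/1.0"),
    Packet("198.51.100.7", "203.0.113.6", 40001, 80, "TCP", b"GET /"),
]
flows = aggregate_flows(packets)
print(len(flows))  # 2
```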

Aggregating network traffic per flow has several advantages: unlike collection methods based on time slots, it extracts the independent information of each point-to-point communication without depending on the observation time window, and it makes the distinction between communication services clear. Furthermore, when an anomaly is detected from flow information, information on the hosts involved in the anomaly can be obtained quickly.

From the packet information stored in the storage unit 102, the feature amount calculation unit 103 divides packet payloads into 8-bit units based on the packet information aggregated per packet or per flow, treats each 8-bit unit as one code, calculates the appearance frequencies of the 256 code classes, and uses the histogram of those frequencies as the feature amount. The calculated feature amounts are output to the similarity determination unit 104 and the abnormality determination unit 107.

To determine the similarity of the communication contents of packets or flows, the similarity determination unit 104 represents each feature amount calculated by the feature amount calculation unit 103 as a 256-dimensional vector, calculates the distance between the vectors of the communication contents being compared, and evaluates the calculated distance value to determine their similarity. When similar communication contents are detected, it outputs information on their feature amounts to the information transmission unit 105.

When several similar communication contents are detected within a certain period, the information transmission unit 105 treats them as candidates for diffusion-type unauthorized access. It then notifies other observation points of the feature amount information for those contents.
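The "plurality within a certain period" rule amounts to a sliding-window counter over detections of one group of similar contents. In the minimal sketch below, `window` (in seconds) and `min_count` are hypothetical parameters. They stand in for the unspecified period and count. Grouping contents by similarity is assumed to happen elsewhere.

```python
from __future__ import annotations

from collections import deque


class CandidateTrigger:
    """Decide when one group of similar communication contents becomes a candidate.

    window and min_count are hypothetical parameters standing in for the
    source's "a certain period" and "a plurality".
    """

    def __init__(self, window: float, min_count: int = 2) -> None:
        self.window = window
        self.min_count = min_count
        self._times: deque[float] = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one similar detection; return True if the group should be reported."""
        self._times.append(timestamp)
        # Drop detections that fell out of the observation window.
        while self._times and timestamp - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) >= self.min_count
```

One trigger would be kept per group of similar contents. A `True` result is the point at which the information transmission unit would notify other observation points.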

The information reception unit 106 receives feature amount information on diffusion-type unauthorized access candidates from other observation points. It outputs this information to the abnormality determination unit 107.

The abnormality determination unit 107 takes the feature amount of a candidate received from the information reception unit 106 as a reference. It compares this reference with the feature amounts calculated by the feature amount calculation unit 103 to judge similarity. It detects the communication contents judged to be similar as diffusion-type unauthorized access. As in the similarity determination unit, each feature amount is represented as a 256-dimensional vector, the distance between the compared vectors is calculated, and the distance value is evaluated.

Next, the operation of the diffusion-type unauthorized access automatic detection system according to the embodiment of the present invention is described. FIGS. 2 and 3 are flowcharts of this operation. FIG. 2 shows the process for detecting diffusion-type unauthorized access candidates. FIG. 3 shows the process for detecting diffusion-type unauthorized access.

Referring first to FIG. 2, packets passing through an observation point are aggregated per packet or per flow, and their packet information is collected (S01). A flow is defined here as a unit of communication in which five elements are identical: source IP address, destination IP address, source port number, destination port number, and protocol.
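The five-element flow definition maps directly onto a dictionary keyed by the 5-tuple. The sketch below groups captured packets this way. The `Packet` record and `aggregate_flows` are hypothetical names for illustration, not part of the source.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical record for one captured packet."""

    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    payload: bytes


def aggregate_flows(packets: list[Packet]) -> dict[tuple, bytes]:
    """Group packets by the 5-tuple and concatenate their payloads in arrival order.

    The key is direction-sensitive: A->B and B->A are separate flows,
    because all five elements must be identical.
    """
    flows: dict[tuple, list[bytes]] = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p.payload)
    return {key: b"".join(parts) for key, parts in flows.items()}
```

The concatenated payload of each flow can then be fed to a byte-histogram step to obtain the per-flow feature amount.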

Next, using the packet information aggregated per packet or per flow, each packet payload is divided into 8-bit units, with each unit treated as one code. The appearance frequencies of the 256 code classes are counted, and these frequencies, expressed as a histogram, are calculated as the feature amount (S02).

The feature amount calculated above is represented as a 256-dimensional vector. To determine how similar the communication contents of packets or flows are, the distance between the vectors of the compared contents is calculated (S03). For two communication contents with histograms h_i and h_j, the distance D(h_i, h_j) is calculated by equation (1), in which h_{i,k} denotes the k-th class of histogram h_i.

The smaller D(h_i, h_j) is, the more similar the communication contents are, so this equation allows their similarity to be evaluated quantitatively. The simulation results described later show that this distance is particularly small between worms of the same type.
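Equation (1) itself is not available in this text. The sketch below assumes the Euclidean distance over the 256 classes, D(h_i, h_j) = sqrt(sum over k of (h_{i,k} - h_{j,k})^2). This matches the description (a smaller value means more similar content) but may differ from the formula actually used in the patent.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def histogram_distance(h_i: Sequence[float], h_j: Sequence[float]) -> float:
    """Assumed form of D(h_i, h_j): Euclidean distance between two histograms.

    h_i[k] plays the role of h_{i,k}, the k-th class of histogram h_i.
    """
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same number of classes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h_i, h_j)))
```

If equation (1) turns out to be a different metric, such as an L1 or chi-square distance, only this one function needs to change.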

Next, step S04 is described. As noted above, the communication contents of same-type worms are highly similar. However, the simulations described later also showed that some payload pairs of the same worm type are far apart. The simulations indicated the cause: the payload distribution consists of several clusters, meaning that the payloads of one worm are spread over multiple clusters. Therefore, after the distances between vectors have been calculated with equation (1) for multiple communication contents, the data are classified by a clustering method (S04). Here, k-means clustering is used. The simulations showed that with this method, the variance of the distances between same-type worms becomes very small when there are 3 or more clusters.
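Step S04 can be illustrated with a plain implementation of Lloyd's k-means over the histogram vectors, followed by a per-cluster variance. This is a sketch under stated assumptions. It clusters the vectors directly. It takes "variance of the distances" within a cluster to mean the mean squared distance of the members to the centroid. It seeds with the first k vectors for determinism. None of these choices are specified by the source.

```python
from __future__ import annotations


def _sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=100):
    """Plain Lloyd's k-means; returns (labels, centroids).

    The first k vectors are the initial centroids, a simplification;
    practical code would use random or k-means++ seeding.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(map(float, v)) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = _mean(members)
    return labels, centroids


def cluster_variances(vectors, labels, centroids):
    """Mean squared distance of each cluster's members to its centroid (empty clusters skipped)."""
    variances = {}
    for c, centroid in enumerate(centroids):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if members:
            variances[c] = sum(_sq_dist(v, centroid) for v in members) / len(members)
    return variances
```

Running `kmeans` for k = 1, 2, 3, ... and averaging `cluster_variances` reproduces the kind of curve the source describes for the number of clusters versus the per-cluster variance.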

Next, step S05 is described. As stated above, clustering makes the variance of the distances between same-type worms very small. In step S05, communication contents are therefore classified by whether the variance of the distances obtained by clustering is smaller than a preset threshold. If it is smaller, the contents are judged to be an abnormality candidate (S06). If it is equal to or greater than the threshold, they are judged to be normal (S08).
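Step S05 reduces to a threshold test on each cluster's variance. In the minimal sketch below, the variance is again assumed to be the mean squared distance to the centroid. The `min_members` guard is a hypothetical addition: a single member always has zero variance, while the method looks for a plurality of similar contents.

```python
from __future__ import annotations


def cluster_variance(members: list[list[float]]) -> float:
    """Mean squared Euclidean distance of cluster members to their centroid (assumed definition)."""
    n = len(members)
    centroid = [sum(column) / n for column in zip(*members)]
    return sum(sum((x - c) ** 2 for x, c in zip(v, centroid)) for v in members) / n


def judge_clusters(clusters, threshold, min_members=2):
    """S05: label each cluster 'candidate' (S06) or 'normal' (S08).

    clusters maps a cluster id to its member histograms; min_members is a
    hypothetical parameter, not taken from the source.
    """
    verdicts = {}
    for cluster_id, members in clusters.items():
        if len(members) >= min_members and cluster_variance(members) < threshold:
            verdicts[cluster_id] = "candidate"
        else:
            verdicts[cluster_id] = "normal"
    return verdicts
```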

Next, in step S07, the feature amount information of the communication contents judged to be an abnormality candidate in step S06 is sent to other observation points. The method assumes that diffusion-type unauthorized access generates many communications with highly similar contents. However, some normal communications that are not unauthorized access were also found to have highly similar contents. Cooperative detection was therefore judged necessary to distinguish normal communication from unauthorized access. Whether highly similar communications are found at multiple observation points should make it possible to tell diffusion-type unauthorized access from normal communication.
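The source does not specify how feature amounts are exchanged between observation points. One possibility is a small JSON message per candidate. The field names `observer`, `detected_at` and `histogram`, and both functions below, are hypothetical.

```python
from __future__ import annotations

import json


def encode_candidate(observer_id: str, histogram: list[float], detected_at: float) -> bytes:
    """Serialize one candidate's feature amount for transmission (hypothetical format)."""
    if len(histogram) != 256:
        raise ValueError("expected a 256-class histogram")
    message = {"observer": observer_id, "detected_at": detected_at, "histogram": list(histogram)}
    return json.dumps(message).encode("utf-8")


def decode_candidate(data: bytes) -> dict:
    """Parse a received candidate message and check the histogram length."""
    message = json.loads(data.decode("utf-8"))
    if len(message["histogram"]) != 256:
        raise ValueError("expected a 256-class histogram")
    return message
```

Only the 256-class histogram needs to be shared, not the payload itself. This keeps the message small and avoids forwarding the original traffic contents.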

Next, the process of detecting diffusion-type unauthorized access is described with reference to FIG. 3. Feature amount information on a diffusion-type unauthorized access candidate is first received from another observation point (S11). The local observation point must then determine whether any of its own packet or flow communication contents resembles the notified candidate. To do so, it digitizes the payloads of its own communication contents and calculates their feature amounts (S12). Each payload, taken from packet information aggregated per packet or per flow, is divided into 8-bit units, with each unit treated as one code. The appearance frequencies of the 256 code classes are counted, and these frequencies, expressed as a histogram, form the feature amount.

Next, the feature amount calculated above is represented as a 256-dimensional vector, and its distance from the vector of the notified candidate is calculated (S13). For the compared communication contents, the distance D(h_i, h_j) between histograms h_i and h_j is calculated by equation (1). The smaller this distance, the more similar the communication contents. Equation (1) therefore allows diffusion-type unauthorized access to be evaluated quantitatively.

Next, in step S14, the presence or absence of diffusion-type unauthorized access is determined by whether the distance calculated by equation (1) is smaller than a preset threshold. If it is smaller, the communication is judged to be diffusion-type unauthorized access (S15). If it is equal to or greater than the threshold, it is judged to be normal access (S16).
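Steps S13 to S16 can be sketched as a single check of a local histogram against the candidates received so far. Equation (1) is again assumed to be the Euclidean distance, and `is_diffusion_access` is a hypothetical name. The strict `<` mirrors the rule that a distance equal to the threshold counts as normal.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance, assumed here as the form of equation (1).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_diffusion_access(local_hist, candidate_hists, threshold) -> bool:
    """S14: True (S15) if the local content is closer than threshold to any
    notified candidate; False (S16, normal access) otherwise."""
    return any(_distance(local_hist, cand) < threshold for cand in candidate_hists)
```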

The processes shown in FIGS. 2 and 3 thus make it possible to detect diffusion-type unauthorized access.

Next, this section describes simulations of the diffusion-type unauthorized access detection system according to the embodiment, run on traffic data from an actual network, and their results.

FIGS. 4 and 5 show feature amounts, as histograms, for two kinds of traffic. FIG. 4 covers Beagle.AV traffic detected with the Bleeding Snort signatures (Non-patent Document 21). FIG. 5 covers mail carrying a MIME-encoded attachment. Each figure shows the histograms of two samples. The two Beagle.AV histograms have almost the same shape, whereas the two normal-mail histograms differ. In this experiment, each histogram is treated as a 256-dimensional vector, and the difference in shape is quantified as the distance between the vectors. The similarity of communication contents is evaluated on that basis. All unauthorized accesses used in the experiments below were detected with the Bleeding Snort signatures.

http://www.bleedingsnort.org/

The distance D(h_i, h_j) between any two histograms h_i and h_j is calculated by equation (1) and used as the measure of similarity between flows. FIGS. 6 to 9 show the distributions of distances in three cases: between worms of the same type, between worms and normal mail (hereinafter "Normal"), and between worms of different types. Beagle and NetSky spread by e-mail, so only mail traffic is used for Normal. There are 30 samples each of Beagle.AV and NetSky.P, 27 of NetSky.C, and 56 of Normal. The horizontal axis of each figure covers all pairwise combinations, with the distances sorted in ascending order.

FIGS. 6 to 9 show that distances between worms of the same type are small. Distances between worms and Normal, and between worms of different types, are large. About three of the Normal mails used in the experiment carried attachments resembling worms, and these produce a few worm-to-Normal distances as small as those between worms. Nevertheless, most same-type worm pairs are closer together than worm-Normal pairs. The distance between histograms of byte-code appearance frequencies in the payload therefore appears usable both to evaluate the similarity of communication contents quantitatively and to distinguish worms from Normal.

The evaluation above used pairs of arbitrary payloads to compare distance distributions between same-type worms, between different worm types, and between worms and Normal. It confirmed that same-type worms have highly similar communication contents. However, some payload pairs of the same worm type were found to be far apart. A likely cause is that some payloads of a given worm lie away from the main mass of its other payloads. The payload distribution can therefore be presumed to consist of several clusters. To verify this, k-means clustering is applied, and the change in each cluster's variance is evaluated as the number of clusters is varied.

FIG. 10 shows the average variance per cluster as the number of clusters (centroids) in k-means clustering is varied, using the same data as above. As the number of clusters increases, the variance for worms falls to nearly 0, whereas Normal keeps a large variance and is spread over a wide range. With 3 or more clusters in particular, the variance of the distances between worms becomes very small. This supports the view that each worm's payloads are distributed over multiple clusters. The same approach should also apply to a new worm: the similarity of communication contents is evaluated by the distance between the byte-code histograms of their payloads. Communications that are highly similar, that is, close to one another, are then identified. This suggests that unauthorized access could be detected without a detection database.

Next, clustering based on a payload sample is described. Above, k-means clustering was used to examine the variance of distances, which showed that worms of the same type are locally concentrated. However, algorithms such as k-means must learn the vectors that serve as cluster centers, which makes them hard to use for real-time detection. Here, it is therefore assumed that a communication content deviating from the other payloads has been detected. That sample (hereinafter the reference sample) is used as the basis for calculating distances to each subsequently observed communication, and the resulting distance distribution is examined.
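The reference-sample approach resembles single-pass "leader" clustering. The sketch below is one such interpretation, not necessarily the patent's exact procedure. Each new histogram joins the nearest existing reference sample within `radius`, a hypothetical parameter. Otherwise it becomes a new reference sample itself. Unlike k-means, nothing has to be learned before samples can be assigned.

```python
from __future__ import annotations

import math


def _distance(a, b):
    # Euclidean distance, assumed as the form of equation (1).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ReferenceClusterer:
    """Single-pass clustering around reference samples (illustrative sketch)."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.references: list[list[float]] = []  # first sample of each cluster
        self.counts: list[int] = []  # members per cluster

    def add(self, hist) -> int:
        """Assign one observed histogram and return its cluster index."""
        best, best_d = None, None
        for idx, ref in enumerate(self.references):
            d = _distance(hist, ref)
            if d < self.radius and (best_d is None or d < best_d):
                best, best_d = idx, d
        if best is None:
            # Deviates from every existing reference: it becomes a new one.
            self.references.append(list(hist))
            self.counts.append(1)
            return len(self.references) - 1
        self.counts[best] += 1
        return best
```

A cluster whose count grows quickly is a natural input to the time-window check on similar contents.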

FIGS. 11 to 13 show the distributions of distances from a reference sample to other samples. The vertical axis is the distance from the reference sample. The horizontal axis lists, in order of observation, the distances to worms of the same type and to Normal samples. In all three figures, distances from the reference sample to worms of the same type are small, whereas distances to Normal are large. This suggests a way to cluster and detect similar communications: take a deviating sample as the reference and compute, one by one, the distances of samples observed afterwards.

This experiment assumes that diffusion-type unauthorized access generates many communications with highly similar contents. However, some normal communications that are not unauthorized access were also found to have highly similar contents. These cause false positives, so an algorithm that distinguishes them from unauthorized access is needed.

FIG. 14 shows an example of normal communication with high similarity. As FIG. 14 shows, the protocol headers and message portions are what make such communications similar. Some communications access a specific server periodically, or frequently fetch a mail list over POP. These tend to have almost identical contents in which only the access date and time change. Such partial changes can be smaller than the changes diffusion-type unauthorized access makes when it varies the destination IP address or e-mail address. A criterion other than the distance between histograms is therefore needed to tell them apart.

One candidate criterion is the size of the transmitted and received data. As noted above, normal communications with highly similar contents are mostly short, for example exchanges consisting only of protocol headers. Self-spreading diffusion-type unauthorized access, by contrast, is likely to be a program of some size. It may select infection targets by scanning or from databases on the infected host, or implement its own mail server. The data it sends and receives should therefore be larger than in normal communication, making data size a candidate criterion for distinguishing the two.
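The data-size criterion can be sketched as a filter applied after similarity grouping. The `SimilarGroup` structure, the use of the median flow size, and the `min_bytes` threshold are illustrative assumptions. The source only proposes data size as a candidate criterion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimilarGroup:
    """Mutually similar communications found at one observation point (hypothetical)."""

    flow_sizes: list[int]  # bytes transferred per flow


def passes_size_filter(group: SimilarGroup, min_bytes: int) -> bool:
    """Keep the group as a candidate only if its typical flow is large.

    Short, header-only exchanges (periodic polling, POP list requests)
    fall below min_bytes. The median is the upper median for even counts.
    """
    if not group.flow_sizes:
        return False
    sizes = sorted(group.flow_sizes)
    median = sizes[len(sizes) // 2]
    return median >= min_bytes
```

Large mailing-list messages with attachments would still pass this filter, so size alone is not sufficient.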

However, some normal communications cannot be distinguished by data size, such as mail distributed from a mailing list. When a file above a certain size is attached, the same content is sent to many destinations in a short time, so data size alone is insufficient for discrimination.

To address this problem, a detection algorithm based on cooperation among multiple networks is examined. FIG. 15 outlines this cooperative detection algorithm. Similarity evaluation systems installed in LANs or similar networks extract information on highly similar communications. This information is exchanged among the individual LANs, and each LAN compares it with its own.

Diffusion-type unauthorized access that causes worldwide damage should try to spread by infecting a very large number of networks and hosts. Many communications with similar contents should therefore be observed at multiple observation points on the network. The faster an unauthorized access spreads, the sooner this tendency becomes pronounced. The method exploits this characteristic of communications that try to infect many networks. When the exchange of similarity information reveals mutually similar communications at several observation points, those communications can be detected as diffusion-type unauthorized access. A mailing list, by contrast, rarely sends such mail to many observation points at the same time. Whether highly similar communications appear at multiple observation points should therefore distinguish diffusion-type unauthorized access from mailing-list mail.
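The cooperative decision can be sketched as counting distinct observation points that recently reported content similar to a local communication. The `Report` structure, the time `window`, `min_observers` and the Euclidean form of equation (1) are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Report:
    """A similarity report received from another observation point (hypothetical structure)."""

    observer: str
    timestamp: float
    histogram: list[float]


def _distance(a, b):
    # Euclidean distance, assumed as the form of equation (1).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cooperative_verdict(local_hist, reports, distance_threshold, min_observers, window, now):
    """Flag local content as diffusion-type access if at least min_observers
    distinct observation points reported similar content in the last `window` seconds."""
    observers = set()
    for report in reports:
        if now - report.timestamp > window:
            continue  # stale report
        if _distance(local_hist, report.histogram) < distance_threshold:
            observers.add(report.observer)
    return len(observers) >= min_observers
```

Counting distinct observers rather than individual reports keeps a single busy network, such as one running a mailing list, from triggering detection on its own.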

FIG. 1 is a block diagram showing the configuration of the diffusion-type unauthorized access detection system according to the embodiment of the present invention.
FIG. 2 is a flowchart showing the process of detecting diffusion-type unauthorized access candidates in the diffusion-type unauthorized access detection system according to the embodiment.
FIG. 3 is a flowchart showing the process of detecting diffusion-type unauthorized access in the diffusion-type unauthorized access detection system according to the embodiment.
FIG. 4 shows histograms representing the feature amounts of Beagle.AV.
FIG. 5 shows histograms representing the feature amounts of normal mail.
FIG. 6 shows the distribution of distances between Beagle.AV samples and between Beagle.AV and Normal (sorted in ascending order).
FIG. 7 shows the distribution of distances between NetSky.C samples and between NetSky.C and Normal (sorted in ascending order).
FIG. 8 shows the distribution of distances between NetSky.P samples and between NetSky.P and Normal (sorted in ascending order).
FIG. 9 shows the distribution of distances between Beagle.AV samples, between NetSky.P samples, and between Beagle.AV and NetSky.P (sorted in ascending order).
FIG. 10 shows the relationship between the number of clusters and the variance per cluster.
FIG. 11 shows the distances from one Beagle.AV sample to other Beagle.AV samples and to Normal.
FIG. 12 shows the distances from one NetSky.P sample to other NetSky.P samples and to Normal.
FIG. 13 shows the distances from one NetSky.C sample to other NetSky.C samples and to Normal.
FIG. 14 shows an example of normal communication with high similarity.
FIG. 15 outlines the cooperative detection algorithm.

Explanation of symbols

101 Packet collection unit
102 Storage unit
103 Feature amount calculation unit
104 Similarity determination unit
105 Information transmission unit
106 Information reception unit
107 Abnormality determination unit

Claims

A diffusion-type unauthorized access detection method comprising: a feature amount calculation step of calculating a feature amount by digitizing the payload of packets based on packet information collected from network traffic; a similarity determination step of determining the similarity of communication contents of packets or flows based on the feature amount; and an abnormality determination step of detecting diffusion-type unauthorized access by exchanging information on the feature amount among a plurality of observation points and determining the similarity of communication contents of packets or flows based on that information.

The diffusion-type unauthorized access detection method according to claim 1, wherein the feature amount calculation step divides the packet payload into 8-bit units based on packet information aggregated per packet or per flow, calculates the appearance frequencies of 256 code classes with each 8-bit unit as one code, and expresses the appearance frequencies as a histogram.

The diffusion-type unauthorized access detection method according to claim 1 or 2, wherein, to determine the similarity of communication contents of packets or flows, the similarity determination step represents the histogram expressed as the feature amount as a 256-dimensional vector, calculates the distance between the vectors of the communication contents being compared, and determines the similarity of the communication contents by evaluating the calculated distance value.

The diffusion type unauthorized access detection method according to claim 3, wherein the similarity determination step uses a clustering technique for the evaluation of the distance value.

The diffusion-type unauthorized access detection method according to claim 1, wherein the abnormality determination step comprises: a procedure of, when a plurality of similar communication contents are detected at a single observation point within a certain period, notifying other observation points of the feature amount of those communication contents as a diffusion-type unauthorized access candidate; and a procedure of, at a notified observation point, determining the similarity of the communication contents at that observation point based on the notified feature amount and detecting communication contents determined to be similar as diffusion-type unauthorized access.

A diffusion-type unauthorized access detection system comprising: feature amount calculation means for calculating a feature amount by digitizing the payload of packets based on packet information collected from network traffic; similarity determination means for determining the similarity of communication contents of packets or flows based on the feature amount; and abnormality determination means for detecting diffusion-type unauthorized access by exchanging information on the feature amount among a plurality of observation points and determining the similarity of communication contents of packets or flows based on that information.

The diffusion-type unauthorized access detection system according to claim 6, wherein the feature amount calculation means divides the packet payload into 8-bit units based on packet information aggregated per packet or per flow, calculates the appearance frequencies of 256 code classes with each 8-bit unit as one code, and expresses the appearance frequencies as a histogram.

The diffusion-type unauthorized access detection system according to claim 6, wherein, to determine the similarity of communication contents of packets or flows, the similarity determination means represents the histogram expressed as the feature amount as a 256-dimensional vector, calculates the distance between the vectors of the communication contents being compared, and determines the similarity of the communication contents by evaluating the calculated distance value.

The diffusion type unauthorized access detection system according to claim 8, wherein the similarity determination unit uses a clustering technique for the evaluation of the distance value.

The diffusion-type unauthorized access detection system according to claim 6, wherein the abnormality determination means comprises: means for, when a plurality of similar communication contents are detected at a single observation point within a certain period, notifying other observation points of the feature amount of those communication contents as a diffusion-type unauthorized access candidate; and means for, at a notified observation point, determining the similarity of the communication contents at that observation point based on the notified feature amount and detecting communication contents determined to be similar as diffusion-type unauthorized access.