JP2007228489A - Identification system and method therefor, and program for application

Info

Publication number: JP2007228489A
Application number: JP2006049896A
Authority: JP
Inventors: Takayuki Shizuno; 隆之静野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-02-27
Filing date: 2006-02-27
Publication date: 2007-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a technology capable of identifying applications even in an environment where part of the traffic is encrypted in end-to-end communications and the flows of two or more applications are intermingled in the traffic.

SOLUTION: An application identification system for identifying the application of a packet of interest, which is the identification target, comprises a statistic calculating means for computing statistics, based on information on the received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest; and a probability calculating means for calculating the transition probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest by comparing the statistics calculated by the statistic calculating means with the application information of the application candidates. The application of the packet of interest is identified on the basis of the transition probability.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an application identification system, an application identification method, and a program, and more particularly to an application identification system, method, and program for identifying the types of applications whose data flows through a network in a mixed state.

With the full-scale spread of electronic commerce, streaming distribution, and similar services over the Internet, the flows of an ever-increasing number of applications are mixed on the Internet.

There are also many applications on the Internet that use peer-to-peer technology to consume as much bandwidth as possible and that transfer large amounts of data while spoofing TCP/UDP port numbers. At an Internet service provider whose network is used by such applications, most of the available bandwidth ends up being consumed by a small number of users. For this reason, it is not uncommon for Internet service providers and the like to limit, in accordance with their network management policy, the bandwidth of only those passing flows that generate large volumes of traffic over long periods.

In doing so, it is undesirable to limit the bandwidth of a legitimate flow that must not be bandwidth-limited, because this may affect service through degraded response, dropped sessions, and the like. Application identification is therefore needed to determine whether an affected flow is a legitimate flow whose bandwidth must not be limited or an illegitimate flow whose bandwidth should be limited.

Here, an application flow means a series of application data exchanged between a user and a system, and may be abbreviated below simply as a flow. The term "traffic" may be used to mean "the overall stream of data containing multiple application flows".

Several techniques have been proposed so far for identifying applications.

For example, in one technique, combinations of source and destination IP addresses of legitimate application flows and the contents of the packets constituting those flows are stored in advance as legitimate-flow patterns, and applications are identified according to whether a received packet matches a stored legitimate-flow pattern (Patent Document 1).

In another proposed technique, the bit patterns of packets constituting illegitimate flows are stored in advance, and applications are identified according to whether a received packet matches one of those bit patterns (Patent Document 2).

However, these methods had several problems.

For example, applications could no longer be identified once TCP/UDP port numbers were spoofed.

In addition, the bit pattern had to be known in order to identify an application. Each time a new illegitimate flow appeared, maintenance staff had to specify and register its bit pattern, which burdened administrators. Extra storage for registering the bit patterns was also required on the system.

As a technique for solving these problems, the following has recently been proposed. Flow characteristics that do not depend on header information, defined by statistical features consisting of the mean and variance of packet length and the mean and variance of the packet arrival interval, are stored in advance, and applications are identified according to whether a received flow matches the stored statistical features (Non-Patent Document 3).

Patent Document 1: JP 2004-38557 A
Patent Document 2: JP 2004-140618 A
Non-Patent Document 3: IEICE 2005 General Conference, B-6-43

However, the techniques proposed above also had a major problem.

Namely, applications cannot be detected when part of the traffic is encrypted end-to-end.

The reason is that the inventions described in Patent Document 1, Patent Document 2, and Non-Patent Document 3 cannot detect applications unless the packets flowing on the network are divided by combination of source and destination port numbers and separated into per-application flows. For example, when all traffic is encrypted end-to-end using IPsec, the conventional techniques cannot identify applications.

Accordingly, the problem to be solved by the present invention is to provide a technique that can identify applications, in a network where the flows of various applications are carried mixed together in the traffic, even when part of the traffic is encrypted end-to-end.

A first invention for solving the above problem is an application identification system for identifying the application corresponding to a packet, comprising:
probability calculation means for calculating, when an application candidate is associated with each of at least two received packets, the transition probability that the application candidate changes between the received packets, based on information on the received packets and the application candidates; and
application identifying means for identifying the application corresponding to a received packet based on the transition probability.

A second invention for solving the above problem is an application identification system for identifying the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, comprising:
statistic calculation means for calculating statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
probability calculation means for calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated by the statistic calculation means with the application information of the application candidates; and
application identifying means for identifying the application of the packet of interest based on the transition probability.

A third invention for solving the above problem is an application identification system for identifying the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, comprising:
packet information extraction means for extracting the packet information of a packet sequence;
packet length statistic calculation means for hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet length statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
packet arrival interval statistic calculation means for hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet arrival interval statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
probability calculation means for calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated by the statistic calculation means with the application information of the application candidates;
maximum likelihood sequence estimation means for identifying applications by performing maximum likelihood sequence estimation based on the transition probability;
application candidate dynamic control means for dynamically controlling the types and number of the application candidates using the identification result; and
repetitive operation management means for managing repetition of the application identification operation by advancing the packet of interest in chronological order.

A fourth invention for solving the above problem is an application identification method for identifying the application corresponding to a packet, comprising:
a probability calculation step of associating an application candidate with each of at least two received packets and calculating the transition probability that the application candidate changes between the received packets, based on information on the received packets and the application candidates; and
an application identification step of identifying the application corresponding to a received packet based on the transition probability.

A fifth invention for solving the above problem is an application identification method for identifying the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, comprising:
a statistic calculation step of calculating statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
a transition probability calculation step of calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated in the statistic calculation step with the application information of the application candidates; and
an application identification step of identifying the application of the packet of interest based on the transition probability.

A sixth invention for solving the above problem is an application identification method for identifying the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, comprising:
a packet information extraction step of extracting the packet information of a packet sequence;
a packet length statistic calculation step of hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet length statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
a packet arrival interval statistic calculation step of hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet arrival interval statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
a probability calculation step of calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated in the statistic calculation steps with the application information of the application candidates;
a maximum likelihood sequence estimation step of identifying applications by performing maximum likelihood sequence estimation based on the transition probability;
an application candidate dynamic control step of dynamically controlling the types and number of the application candidates using the identification result; and
a repetitive operation management step of managing repetition of the application identification operation by advancing the packet of interest in chronological order.

A seventh invention for solving the above problem is a program for an application identification system that identifies the application corresponding to a packet, the program causing the application identification system to function as:
probability calculation means for calculating, when an application candidate is associated with each of at least two received packets, the transition probability that the application candidate changes between the received packets, based on information on the received packets and the application candidates; and
application identifying means for identifying the application corresponding to a received packet based on the transition probability.

An eighth invention for solving the above problem is a program for an application identification system that identifies the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, the program causing the system to function as:
statistic calculation means for calculating statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
probability calculation means for calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated by the statistic calculation means with the application information of the application candidates; and
application identifying means for identifying the application of the packet of interest based on the transition probability.

A ninth invention for solving the above problem is a program for an application identification system that identifies the application of a packet of interest, which is the identification target, in a network environment in which the data flows of multiple applications are mixed, the program causing the application identification system to function as:
packet information extraction means for extracting the packet information of a packet sequence;
packet length statistic calculation means for hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet length statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
packet arrival interval statistic calculation means for hypothesizing, based on the packet information, that the packet of interest and its neighboring packets in the packet sequence each belong to some application candidate, and calculating packet arrival interval statistics, based on the packet information of received packets, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet of the packet of interest;
probability calculation means for calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest, by comparing the statistics calculated by the statistic calculation means with the application information of the application candidates;
maximum likelihood sequence estimation means for identifying applications by performing maximum likelihood sequence estimation based on the transition probability;
application candidate dynamic control means for dynamically controlling the types and number of the application candidates using the identification result; and
repetitive operation management means for managing repetition of the application identification operation by advancing the packet of interest in chronological order.

With the above configuration, the present invention makes it possible to identify the application of a received packet even when part of the traffic in which the flows of multiple applications are mixed is encrypted.

The reason is that, for a packet sequence in which application flows are mixed, the packet of interest and its neighboring packets are each hypothesized, based on the associated packet information, to belong to some application candidate, and packet length statistics and packet arrival interval statistics are calculated for each such combination.

The packet length statistics and packet arrival interval statistics obtained from these calculations are then compared with the application information for each application to calculate the transition probability of each combination of application candidates, which assigns the packets of the sequence to per-application flows. Repeating this operation while advancing the packet of interest in chronological order identifies the applications of the packet sequence.

Furthermore, by making multiple application candidates for a single application subject to the probability calculation, the applications of the packet sequence can be identified even when multiple flows of the same application are mixed.

Embodiments of the present invention will be described with reference to the drawings.

FIGS. 1 to 3 illustrate a first embodiment of the application identification system according to the present invention. FIG. 1 is an overall block diagram, FIG. 2 is an explanatory diagram showing an example of packet information, and FIG. 3 is a flowchart.

Referring to FIG. 1, A is a receiving device that receives a packet sequence of traffic in which the flows of multiple applications are mixed. B is an application identification device that identifies the application of each packet constituting the received packet sequence and thereby identifies the applications of the packet sequence. C is a storage device that stores information on received packets, information on applications, and the like. D is a display device that displays the application identification results produced by the application identification device B.

The configuration of each device is described below with reference to FIG. 1 and, as appropriate, FIGS. 2, 4, and 7.

The receiving device A has a receiving unit 1. When it receives a packet sequence of the traffic input to it, it sends the packet sequence to the application identification device B. At this time, each packet in the sequence is tagged with its packet arrival time, that is, the time at which the receiving device A received the packet.

The application identification device B has a statistic calculation unit 20, a packet information extraction unit 21, a probability calculation unit 24, a maximum likelihood sequence estimation unit 25, an application candidate dynamic control unit 26, and a repetitive operation management unit 27.

The packet information extraction unit 21 extracts the packet length and packet arrival time of each packet constituting the packet sequence received by the receiving device A, and sends the extracted packet information to the storage device C.

The statistic calculation unit 20 calculates packet statistics, such as the mean packet length and the mean packet arrival interval, based on the packet information of the received packets. Specifically, it has a packet length statistic calculation unit 22 and a packet arrival interval statistic calculation unit 23. Here, packet information is information about a packet, such as the packet number indicating its order of arrival, the packet length, the packet arrival time, and the application identification result. Details of the packet information are given later.

The packet length statistic calculation unit 22 calculates the packet length statistics of the packet of interest using the packet information received from the storage device C. Here, the packet of interest is the packet whose application is to be identified. The packet length statistics consist of at least one of the packet length distribution, the moving-average packet length distribution, the moving-median packet length distribution, the moving-variance packet length distribution, and the moving-standard-deviation packet length distribution.

The packet arrival interval statistic calculation unit 23 calculates the packet arrival interval statistics of the packet of interest using the packet information received from the storage device C. Here, the packet arrival interval statistics consist of at least one of the packet arrival interval distribution, the moving-average packet arrival interval distribution, the moving-median packet arrival interval distribution, the moving-variance packet arrival interval distribution, and the moving-standard-deviation packet arrival interval distribution.
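As a rough illustration of arrival interval statistics, the following Python sketch computes the intervals between packets that are hypothesized to belong to the same application candidate. The data layout (a list of `(arrival_time, candidate)` tuples), the labels "A" and "B", and the function names are hypothetical and are not taken from the description.

```python
from statistics import mean, pvariance

def candidate_intervals(packets, candidate):
    """Arrival intervals between consecutive packets hypothesized as `candidate`."""
    times = [t for t, c in packets if c == candidate]
    return [later - earlier for earlier, later in zip(times, times[1:])]

def interval_stats(packets, candidate):
    """Mean and population variance of the intervals, or None if fewer than 2 packets."""
    gaps = candidate_intervals(packets, candidate)
    if not gaps:
        return None
    return {"mean": mean(gaps), "variance": pvariance(gaps)}

# Hypothetical (arrival time in seconds, hypothesized candidate) pairs.
packets = [(0.0, "A"), (0.25, "B"), (0.5, "A"), (1.5, "A"), (1.75, "B")]
stats_a = interval_stats(packets, "A")
stats_b = interval_stats(packets, "B")
```

Changing the hypothesized candidate of a single packet changes which intervals are formed, which is why the statistics have to be recomputed for each combination of candidates.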

The probability calculation unit 24 calculates a transition probability for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet close to the packet of interest. To do so it uses the packet length statistics and packet arrival interval statistics of the packet of interest, received from the packet length statistic calculation unit 22 and the packet arrival interval statistic calculation unit 23, together with the application information characterizing each application candidate, which is stored in advance in the storage device C. Here, an application candidate is a candidate for the application to be identified. The transition probability is the probability that the application candidate changes from one received packet to another when an application candidate is associated with each of two or more received packets.
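This passage does not give a formula for comparing the statistics with the application information, so the following Python sketch is only one possible reading. It assumes that each application is characterized by an expected mean packet length and a standard deviation, scores each candidate pair with a Gaussian density, and normalizes the scores over the candidates of the packet of interest. The Gaussian model, the profile values, and all names are assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def transition_probabilities(observed, profiles):
    """observed: {(prev, cur): statistic computed under that hypothesis}
    profiles: {app: (expected value, standard deviation)}  (assumed format)
    Returns {(prev, cur): probability}, normalized over cur for each prev."""
    scores = {pair: gaussian_pdf(x, *profiles[pair[1]])
              for pair, x in observed.items()}
    totals = {}
    for (prev, _cur), s in scores.items():
        totals[prev] = totals.get(prev, 0.0) + s
    return {pair: (s / totals[pair[0]] if totals[pair[0]] > 0 else 0.0)
            for pair, s in scores.items()}

# Hypothetical application information and observed mean packet lengths.
profiles = {"A": (200.0, 50.0), "B": (1400.0, 100.0)}
observed = {("A", "A"): 210.0, ("A", "B"): 800.0,
            ("B", "A"): 700.0, ("B", "B"): 1390.0}
probs = transition_probabilities(observed, profiles)
```

A fuller version would combine the packet length and arrival interval statistics, for example by multiplying their scores, but how the two are combined is not stated here.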

As shown in FIG. 7, the maximum likelihood sequence estimation unit 25 performs maximum likelihood sequence estimation on the calculated transition probabilities. Here, maximum likelihood sequence estimation means estimating, among the multiple paths leading from the application candidates of the neighboring packets to the application candidates of the packet of interest, the path most likely to be taken. This most likely path is called the survivor path. The Viterbi algorithm is one example of a method for finding the survivor path. The probability of taking each path is called the path probability. The survivor path is estimated by calculating the path probability of each path from the transition probabilities, keeping the path with the highest path probability, and discarding the others.
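The path selection described above can be sketched with a textbook Viterbi recursion. The Python below is a generic illustration, not the procedure of FIG. 7: the initial probabilities, the per-step transition tables, and the candidate labels are hypothetical, and log probabilities are used only to avoid numerical underflow.

```python
import math

def viterbi(initial, steps):
    """initial: {candidate: probability} for the first packet.
    steps: list of {(prev, cur): transition probability}, one table per later packet.
    Returns the candidate sequence with the highest path probability."""
    # best[state] = (log path probability, surviving path ending in state)
    best = {s: (math.log(p), [s]) for s, p in initial.items() if p > 0}
    for table in steps:
        nxt = {}
        for (prev, cur), p in table.items():
            if p <= 0 or prev not in best:
                continue
            score = best[prev][0] + math.log(p)
            # Keep only the most probable path into each candidate.
            if cur not in nxt or score > nxt[cur][0]:
                nxt[cur] = (score, best[prev][1] + [cur])
        best = nxt
    if not best:
        raise ValueError("no path with non-zero probability")
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical probabilities for a three-packet sequence.
initial = {"A": 0.6, "B": 0.4}
steps = [
    {("A", "A"): 0.9, ("A", "B"): 0.1, ("B", "A"): 0.2, ("B", "B"): 0.8},
    {("A", "A"): 0.3, ("A", "B"): 0.7, ("B", "A"): 0.1, ("B", "B"): 0.9},
]
path = viterbi(initial, steps)  # → ['A', 'A', 'B']
```

In this example the path A, A, B has path probability 0.6 × 0.9 × 0.7 = 0.378, which exceeds every other path, so it is the one retained.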

The application candidate dynamic control unit 26 uses the result of the maximum likelihood sequence estimation unit 25 to dynamically control the types of application candidates and the number of flows. Specifically, to handle cases such as a new flow of some application candidate appearing, it adds a further instance of application candidate A as a new application candidate, separate from the existing application candidate A.

The repetitive operation management unit 27 changes the packet of interest in chronological order and manages the application identification operation so that it is repeated until application identification has finished for every packet in the received packet sequence.

Next, the storage device C includes a packet information storage unit 31 and an application information storage unit 32.

The packet information storage unit 31 stores packet information for each packet of the received packet sequence. Packet information uniquely identifies a packet and indicates its characteristics. As shown in FIG. 2, it includes, for each packet, the packet length and packet arrival time extracted by the packet information extraction unit 21, a packet number that the packet information extraction unit 21 assigns based on the packet arrival time to represent the reception order, and the application identification result produced by the application identification device B.
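The per-packet record described above can be sketched as a small data structure. This is only an illustration: the class name `PacketInfo`, its field names, and the helper `assign_numbers` are hypothetical and do not appear in the specification, and numbering from 0 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PacketInfo:
    number: int                       # reception order, derived from arrival time
    length: int                       # packet length in bytes
    arrival_time: float               # arrival time in seconds
    app_result: Optional[str] = None  # identification result; None until identified


def assign_numbers(observations):
    """Sort (arrival_time, length) pairs by arrival time and number them from 0."""
    ordered = sorted(observations, key=lambda o: o[0])
    return [PacketInfo(number=i, length=length, arrival_time=t)
            for i, (t, length) in enumerate(ordered)]
```

The identification result starts empty and would be filled in once the packet has been assigned to an application.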

The application information storage unit 32 stores application information, that is, information characterizing each application that serves as an application candidate. It is prepared in advance by performing the statistic calculation on packets observed while only the flow of that application is present on the network. The application information includes a packet length statistic, a packet arrival interval statistic, an application type, and similar information. The application type is information for specifying the application: a concrete application name such as Netmeeting or Winny. Information that can identify an application, such as a TCP/UDP port number, may be used instead. FIG. 4 shows packet length statistics and packet arrival interval statistics as examples of application information. In the example of FIG. 4, each distribution is normalized so that it integrates to 1.
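As a rough sketch of how such a normalized distribution might be prepared from a capture containing a single application, the following builds a fixed-width histogram whose area is 1. The function name, the fixed bin width, and the dictionary representation are assumptions made for illustration; the specification does not say how the distributions are binned or stored.

```python
def normalized_histogram(values, bin_width):
    """Return {bin_start: density} so that the sum of density * bin_width is 1."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width  # left edge of the bin holding v
        counts[start] = counts.get(start, 0) + 1
    total = len(values)
    return {start: c / (total * bin_width) for start, c in counts.items()}
```

The same helper could be applied to packet lengths or to inter-arrival times, each with a bin width suited to its unit.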

The display unit 4 of the display device D is a liquid crystal display (LCD) that displays the application identification result. A display device other than an LCD may be used as long as it has display means.

Next, the operation of the application identification system configured as described above will be described following the flowchart of FIG. 3, with reference to FIGS. 2 to 7 as appropriate.

The following description assumes three application candidates, A, B, and C. It also takes as an example the case where the packet length distribution and the packet length moving average distribution are calculated as packet length statistics, and the packet arrival interval distribution and the packet arrival interval moving average distribution are calculated as packet arrival interval statistics.

When the receiving unit 1 of the receiving device A of the application identification system receives a packet sequence of traffic in which the data flows of applications A, B, and C are mixed (step S1), the packet arrival time, that is, the time at which the receiving device A received the packet, is attached to each packet.

The packet information extraction unit 21 then extracts the packet length and packet arrival time of each packet and stores them in the packet information storage unit 31 of the storage device C. At this time, the packet information extraction unit 21 also generates a packet number indicating the arrival order of the packet and stores it in the packet information storage unit 31 of the storage device C (step S2).

Next, based on the packet information stored in the storage device C in step S2, the statistic calculation unit 20 assumes an application candidate for the packet of interest and for the packet immediately preceding it in the received packet sequence. Then, for each pair of an application candidate of the packet of interest and an application candidate of a neighboring packet, it calculates the packet length statistic and the packet arrival interval statistic based on the packet information of the received packets.

Specifically, the packet length statistic calculation unit 22 first uses the packet length information of each packet stored in the packet information storage unit 31 to calculate, for the packet of interest and the packet immediately preceding it, the packet length distribution and the packet length average as packet length statistics (step S4-1).

Although the packet length distribution and the packet length moving average distribution are calculated here as packet length statistics, only one of them may be calculated, or other packet length statistics such as the packet length moving median distribution, the packet length moving variance distribution, or the packet length moving standard deviation distribution may be calculated.

The packet arrival interval statistic calculation unit 23 uses the packet arrival time information stored in the packet information storage unit 31 to calculate, for the packet of interest and its neighboring packets, the packet arrival interval distribution and the packet arrival interval moving average distribution as packet arrival interval statistics (step S4-2).

Although the packet arrival interval distribution and the packet arrival interval moving average distribution are calculated here as packet arrival interval statistics, only one of them may be calculated, or the packet arrival interval moving median distribution, the packet arrival interval moving variance distribution, or the packet arrival interval moving standard deviation distribution may be calculated.

FIG. 2 shows packet information for packet numbers n through (n-6). Assume that application identification has finished for the packets before packet number n, except that multiple application candidates remain possible for packet (n-1), the packet immediately before packet number n. Multiple candidates are retained because keeping only the most probable application candidate would make recovery difficult after a misidentification.

First, the packet immediately before packet number n (the packet of interest) is assumed to belong to some application candidate, for example application candidate B. Then the packet of interest, packet number n, is likewise assumed to belong to some application candidate, for example application candidate A. Under these assumptions, the path is traced back from packet number n, and the packet length statistic and the packet arrival interval statistic are calculated from the relationship with the packets previously identified as application candidate A. Specifically, since packet numbers n-4 and n-6 have been identified as application candidate A, the average of l_(n), l_(n-4), and l_(n-6) is the packet length moving average. Likewise, the average of the difference between t_(n) and t_(n-4) and the difference between t_(n-4) and t_(n-6) is the packet arrival interval moving average.
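The moving-average calculation in this example can be sketched as follows. The function name `moving_averages`, the tuple layout, and the `window` parameter (how many packets of the same candidate are averaged) are illustrative assumptions; the numeric values in the usage example are invented and only mirror the pattern of packets n-6 and n-4 being identified as A.

```python
def moving_averages(history, current, candidate, window=3):
    """history: list of (length, arrival_time, app) in arrival order, already identified.
    current: (length, arrival_time) of the packet of interest, assumed to be `candidate`.
    Returns (length moving average, arrival-interval moving average or None)."""
    same = [(l, t) for (l, t, app) in history if app == candidate]
    chain = (same + [current])[-window:]       # most recent packets of this candidate
    lengths = [l for l, _ in chain]
    times = [t for _, t in chain]
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_len = sum(lengths) / len(lengths)
    mean_gap = sum(gaps) / len(gaps) if gaps else None  # undefined for a single packet
    return mean_len, mean_gap


# Packets n-6 .. n-1, with n-6 and n-4 identified as A (values are made up).
history = [(100, 0.0, "A"), (500, 0.1, "B"), (120, 0.4, "A"),
           (480, 0.5, "B"), (1500, 0.6, "C"), (510, 0.7, "B")]
mean_len, mean_gap = moving_averages(history, (110, 1.0), "A")
```

Here the length average is taken over three packets and the interval average over the two gaps between them, matching the l_(n), l_(n-4), l_(n-6) example in the text.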

How many past packets to include in these statistics is a parameter. Including more packets increases the statistical information and so improves reliability. On the other hand, if a packet has been misidentified, that error may persist in the statistics for a long time. This parameter should be tuned to the characteristics of the traffic to be identified.

These calculations are repeated while changing the application candidate assumed for the packet of interest (packet number n), until all application candidates have been covered. The assumption for the packet immediately before packet number n is also changed, so that the calculation is performed for every pair between which a transition is possible. The order in which the assumptions are made is not constrained; the calculation may start from any application candidate.

Next, the packet length statistic and the packet arrival interval statistic calculated by the statistic calculation unit 20 are sent to the probability calculation unit 24. The probability calculation unit 24 compares these statistics with the application information of the application candidate of the packet of interest, and thereby calculates the transition probability, that is, the probability of a transition from the application candidate of the packet immediately before the packet of interest to the application candidate of the packet of interest (step S6).

The specific operation for calculating this transition probability is as follows.

First, the packet length statistics (packet length distribution, packet length moving average distribution) and packet arrival interval statistics (arrival interval distribution, arrival interval moving average distribution) of the packet of interest are compared with the corresponding packet length statistics and packet arrival interval statistics in the application information, and a fitness, a probability indicating how well they match, is calculated (step S5). FIG. 5 shows an example in which the statistic of the packet of interest is compared with the application information to obtain the fitness. In this example, a packet length of 120 bytes gives a fitness of 0.3.
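One simple way to read a fitness off a stored distribution is a histogram lookup, sketched below. The function name `fitness`, the fixed bin width, and the example distribution are assumptions for illustration; only the pairing of 120 bytes with a fitness of 0.3 comes from the text, and the specification does not state how the lookup is actually performed.

```python
def fitness(app_distribution, bin_width, value):
    """Return the stored probability for the bin containing `value` (0.0 if absent)."""
    start = int(value // bin_width) * bin_width  # left edge of the bin holding value
    return app_distribution.get(start, 0.0)


# Hypothetical packet length distribution for one application candidate.
length_distribution = {0: 0.1, 100: 0.3, 200: 0.6}
score = fitness(length_distribution, 100, 120)
```

Values that fall outside every stored bin get a fitness of 0.0 in this sketch, which is one possible convention rather than something the text prescribes.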

This fitness calculation (step S5) is performed for every pair of an application candidate of the packet of interest and an application candidate of a neighboring packet.

Next, the transition probability corresponding to each calculated fitness is obtained. Specifically, as shown in FIG. 6, the transition probability of each transition is the ratio of the fitness total to the total fitness. Here, the fitness total is the sum of the fitness values for one application candidate, and the total fitness is the sum of all fitness values across applications A, B, and C. The higher the fitness with the application information, the higher the transition probability.
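The ratio described here can be sketched in a few lines. The function name, the dictionary layout, and the uniform fallback when every fitness is zero are assumptions added for illustration; the fitness values in the usage example are invented.

```python
def transition_probabilities(fitness_table):
    """fitness_table[j] lists the fitness values (one per statistic) obtained when the
    packet of interest is assumed to be candidate j, for one fixed assumption about
    the preceding packet. Returns {j: transition probability}."""
    totals = {j: sum(vals) for j, vals in fitness_table.items()}  # fitness total per candidate
    grand = sum(totals.values())                                  # total fitness
    if grand == 0:
        # Assumption, not from the text: fall back to a uniform distribution.
        return {j: 1 / len(totals) for j in totals}
    return {j: t / grand for j, t in totals.items()}


probs = transition_probabilities({
    "A": [0.3, 0.2, 0.1, 0.4],
    "B": [0.1, 0.1, 0.1, 0.2],
    "C": [0.1, 0.1, 0.1, 0.2],
})
```

Because each value is divided by the same total, the probabilities for one preceding-packet assumption sum to 1.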

Here, the calculation for obtaining the transition probability is expressed by equation (1).

Let A be the number of application candidates and B the number of statistic calculation items. Let n be the index of the packet of interest (the packet being processed), i (1 ≤ i ≤ A) the number of the application candidate assumed for the packet immediately before the packet of interest, and j (1 ≤ j ≤ A) the number of the application candidate assumed for the packet of interest. The transition probability from i to j, P_{n,i→j}, is then given by equation (1).

Here, C_{n,ikl} is the fitness when the (n-1)-th packet is assumed to be i, the n-th packet is assumed to be k, and the statistic calculation item index is l; for example, l may denote the packet length distribution. This completes the description of how the probability calculation unit 24 obtains the transition probability.
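The body of equation (1) does not appear in this text. One form consistent with the symbol definitions and with the ratio of fitness total to total fitness described for FIG. 6 is sketched below; it is a reconstruction under the assumption that the total fitness sums over the candidates k of the n-th packet for a fixed i, not the published formula.

```latex
P_{n,\,i \to j} \;=\;
  \frac{\displaystyle\sum_{l=1}^{B} C_{n,\,ijl}}
       {\displaystyle\sum_{k=1}^{A}\sum_{l=1}^{B} C_{n,\,ikl}}
\qquad (1)
```

Under this reading, the numerator is the fitness total for candidate j and the denominator is the total fitness, so the probabilities for a fixed i sum to 1 over j.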

Next, the maximum likelihood sequence estimation unit 25 performs maximum likelihood sequence estimation based on the transition probabilities, received from the probability calculation unit 24, for the application candidate pairs formed by the packet of interest and the packet immediately before it (step S7).

Here, the calculation for obtaining the survival path is expressed by equation (2). Let n be the index of the packet of interest (the packet being processed), j the application candidate number, and P_{n,i→j} the transition probability from i to j. The survival path probability Q_{n,j} is then given by equation (2).

Here, max{f(x, y) | m ≤ y ≤ n} denotes the maximum value of f(x, y) over the range m ≤ y ≤ n.
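The body of equation (2) does not appear in this text either. Assuming the usual Viterbi recursion, which matches the description of keeping only the path with the highest path probability, it would take roughly the following form; this is a reconstruction, not the published formula.

```latex
Q_{n,\,j} \;=\; \max\bigl\{\, Q_{n-1,\,i} \cdot P_{n,\,i \to j} \;\bigm|\; 1 \le i \le A \,\bigr\}
\qquad (2)
```

Here Q_{n-1,i} is the survival path probability ending in candidate i at the preceding packet, and A is the number of application candidates as defined for equation (1).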

FIG. 7 shows an example of maximum likelihood sequence estimation. As shown in FIG. 7, maximum likelihood sequence estimation assigns each packet to an application candidate, following the packet sequence, much like coloring the packets by application. Through maximum likelihood sequence estimation, the multiple application candidates that had been retained to guard against misidentification are deleted, and the identification result converges to a single path.
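A single step of this estimation can be sketched as a standard Viterbi update. The function name, the dictionary-based representation, and the numbers in the usage example are assumptions for illustration; the specification names the Viterbi algorithm only as one example of how a survival path may be found.

```python
def viterbi_step(prev_prob, prev_paths, trans):
    """prev_prob[i]: survival path probability ending in candidate i at packet n-1.
    prev_paths[i]: the candidate sequence of that path.
    trans[i][j]: transition probability from i (packet n-1) to j (packet n).
    Returns the survival path probabilities and paths ending at packet n."""
    candidates = sorted({j for row in trans.values() for j in row})
    new_prob, new_paths = {}, {}
    for j in candidates:
        def score(i):
            return prev_prob[i] * trans.get(i, {}).get(j, 0.0)
        best_i = max(prev_prob, key=score)       # predecessor on the survival path
        new_prob[j] = score(best_i)
        new_paths[j] = prev_paths[best_i] + [j]  # all other paths into j are dropped
    return new_prob, new_paths


prob, paths = viterbi_step(
    {"A": 0.6, "B": 0.4},
    {"A": ["A"], "B": ["B"]},
    {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.2, "B": 0.8}},
)
```

Repeating this step packet by packet keeps one surviving path per candidate, and tracing back the path with the highest final probability gives the per-packet labels.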

Next, the application candidate dynamic control unit 26 dynamically controls the application candidates (step S8). That is, if the maximum likelihood sequence estimation shows that new flows have appeared, the number of candidates for the corresponding application candidate is increased; conversely, if flows have disappeared, the number of candidates for the corresponding application candidate is reduced. For example, when the number of flows of application candidate A is identified as having increased, the number of candidates for application candidate A is increased in anticipation of further growth. When the number of flows of application B, of which there were several, is identified as having decreased, the number of candidates for application candidate B is reduced. One such control is to judge that a flow has ended if no packet of that flow is received for a certain period of time.
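A minimal sketch of this kind of control is given below, assuming a timeout-based rule for ended flows and one spare candidate slot per application type to absorb a newly appearing flow. The function name, the data layout, and the "one spare slot" policy are illustrative assumptions; the specification leaves the exact policy open.

```python
def control_candidates(flows, now, timeout):
    """flows: dict flow_id -> (app_type, time of the last packet of that flow).
    Returns (flows still considered active, candidate slots per application type)."""
    # A flow with no packet for longer than `timeout` is treated as ended.
    active = {fid: (app, t) for fid, (app, t) in flows.items() if now - t <= timeout}
    slots = {}
    for app, _ in active.values():
        slots[app] = slots.get(app, 0) + 1
    # Assumption: keep one spare slot per known application type for a new flow.
    for app in {app for app, _ in flows.values()}:
        slots[app] = slots.get(app, 0) + 1
    return active, slots


active, slots = control_candidates(
    {1: ("A", 9.0), 2: ("A", 9.5), 3: ("B", 2.0)}, now=10.0, timeout=5.0)
```

In this example the two A flows stay active and A gets three candidate slots, while the idle B flow is dropped and B keeps a single spare slot.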

Next, the repetitive operation management unit 27 changes the packet of interest. At that point, if identification of the entire packet sequence has finished, the application identification result for the packet sequence is displayed by the display unit 4 and processing ends (step S10). If identification of the packet sequence has not finished, the packet of interest is changed and processing returns to step S3. When the packet of interest is changed, the latest packet information is stored in the packet information storage unit 31 (step S9).

With the application identification system configured as described above, the received packet sequence is sorted into application flows, as shown in FIG. 7.

This embodiment takes as an example the case where the packet length statistic and the packet arrival interval statistic of the packet of interest are used to calculate the transition probability for application candidate pairs formed with the packet immediately before the packet of interest. The transition probability can also be calculated using the packet length statistics and packet arrival interval statistics across three packets: the packet of interest, the packet immediately before it, and the packet two before it. More generally, the transition probability can be calculated using the packet length statistics and packet arrival interval statistics across three or more packets.

In addition, to allow for receiving an application whose application information is not stored in the application information storage unit 32, an application candidate representing applications without application information can be provided among the application candidates.

FIG. 1 is a block diagram of a system according to the present invention.
FIG. 2 is a diagram showing an example of packet information.
FIG. 3 is a flowchart of application identification according to the present invention.
FIG. 4 is a diagram showing an example of application information.
FIG. 5 is a diagram showing an example of fitness calculation.
FIG. 6 is a diagram showing an example of transition probability calculation.
FIG. 7 is a diagram showing an example of maximum likelihood sequence estimation.

Explanation of symbols

A: receiving device
B: application identification device
C: storage device
D: display device
1: receiving unit
20: statistic calculation unit
21: packet information extraction unit
22: packet length statistic calculation unit
23: packet arrival interval statistic calculation unit
24: probability calculation unit
25: maximum likelihood sequence estimation unit
26: application candidate dynamic control unit
27: repetitive operation management unit
31: packet information storage unit
32: application information storage unit
4: display unit
Patent applicant: NEC Corporation
Representative: Katsumi Udaka

Claims

An application identification system for identifying an application corresponding to a packet,
When the application candidate is associated with each of at least two or more received packets, probability calculation means for calculating a transition probability that the application candidate transitions between the received packets based on the information of the received packet and the application candidate;
And an application identification unit that identifies an application corresponding to the received packet based on the transition probability.

An application identification system for identifying an application of a target packet to be identified in a network environment in which data flows of a plurality of applications are mixed,
Statistic calculation means for calculating a statistic based on packet information of a received packet for each set of application candidates of the packet of interest and application candidates of neighboring packets of the packet of interest;
A probability calculation for calculating a transition probability that is a probability of transition from the application candidate of the neighboring packet to the application candidate of the packet of interest by comparing the statistic calculated by the statistic calculation means and the application information of the application candidate Means,
And an application identification unit that identifies an application of the packet of interest based on the transition probability.

The application identification system according to claim 2, wherein the application identification means identifies the application of the packet of interest while changing the packet of interest in chronological order through the packet sequence, and thereby sorts the packets into a flow for each application.

The application identification system according to claim 2, wherein the statistic calculation means includes:
packet length statistic calculation means for calculating a packet length statistic as the statistic based on packet information of the received packet; and
packet arrival interval statistic calculation means for calculating a packet arrival interval statistic as the statistic based on packet information of the received packet.

The application identification system according to claim 4, wherein the packet information includes a packet number indicating a packet arrival order, a packet length, a packet arrival time, and an application identification result that is a result of application identification by the application identification means.

The application identification system according to claim 5, wherein the application information is information characterizing an application, obtained by performing a statistic calculation on packets in a situation where only packets of the application that is the application candidate flow on the network.

The application identification system according to claim 6, wherein the information characterizing the application includes at least one of a packet length statistic, a packet arrival interval statistic, and an application type.

The application identification system according to claim 7, wherein the packet length statistic of the application information includes at least one of a packet length distribution, a packet length moving average distribution, a packet length moving median distribution, a packet length moving variance distribution, and a packet length moving standard deviation distribution.

The application identification system according to claim 7, wherein the packet arrival interval statistic of the application information includes at least one of a packet arrival interval distribution, a packet arrival interval moving average distribution, a packet arrival interval moving median distribution, a packet arrival interval moving variance distribution, and a packet arrival interval moving standard deviation distribution.

The application identification system according to claim 2, wherein the application candidate is identified by identification information.

An application identification system for identifying an application of a target packet to be identified in a network environment in which data flows of a plurality of applications are mixed,
Packet information extraction means for extracting packet information of a packet sequence;
Based on the packet information, the packet of interest is assumed to be a certain application candidate for the packet sequence, and the application candidate of the packet of interest and the application candidate of the neighboring packet of the packet of interest A packet length statistic calculating means for calculating a packet length statistic based on packet information of a received packet for each set;
Based on the packet information, assume that a packet of interest and a packet in the vicinity of the packet of interest for each packet series are application candidates,
A packet arrival interval statistic calculating means for calculating a packet arrival interval statistic based on packet information of a received packet for each set of application candidates of the packet of interest and application candidates of neighboring packets of the packet of interest;
A probability calculation for calculating a transition probability that is a probability of transition from the application candidate of the neighboring packet to the application candidate of the packet of interest by comparing the statistic calculated by the statistic calculation means and the application information of the application candidate Means,
Maximum likelihood sequence estimation means for performing application identification based on maximum likelihood sequence estimation based on the transition probability;
Application candidate dynamic control means for dynamically controlling the type and number of candidates for the application candidates using the identification result;
An application identification system comprising: repetitive operation management means for managing a repetitive operation of application identification by changing a packet of interest in chronological order.

The application identification system according to claim 11, wherein the packet information extraction means extracts the packet length and packet arrival time of each packet constituting the received packet sequence.

The application identification system according to claim 11, wherein the maximum likelihood sequence estimation means performs maximum likelihood sequence estimation from the transition probability and sorts the packet sequence into a flow for each application.

The application identification system according to any one of the above, wherein the application candidate dynamic control means dynamically controls the type and number of flows of the application candidates using a result of application identification by the maximum likelihood sequence estimation means.

The application identification system according to claim 11, wherein the repetitive operation management means manages the application identification operation so that it is repeated until application identification of all packets is completed.

An application identification method for identifying an application corresponding to a packet,
A probability calculating step of associating an application candidate with each of at least two or more received packets, and calculating a transition probability that the application candidate transitions between received packets based on the information of the received packet and the application candidate;
An application identification step of identifying an application corresponding to the received packet based on the transition probability.

An application identification method for identifying an application of a target packet to be identified in a network environment in which data flows of multiple applications are mixed,
A statistic calculation step for calculating a statistic based on packet information of a received packet for each set of application candidates of the packet of interest and application candidates of neighboring packets of the packet of interest;
A transition probability that compares the statistic calculated in the statistic calculation step with the application information of the application candidate and calculates a transition probability that is a probability of transition from the application candidate of the neighboring packet to the application candidate of the packet of interest. A computation step;
An application identification step of identifying an application of the packet of interest based on the transition probability.

The application identifying step performs the operation of identifying the application of the packet of interest by changing the packet of interest in chronological order with respect to the packet sequence, and divides the packet of interest into flows for each application, thereby The application identification method according to claim 17, wherein the application identification method is assigned to an application flow.

19. The application identification method according to claim 17, wherein the statistic calculation step includes:
A packet length statistic calculating step of calculating a packet length statistic as the statistic based on packet information of the received packet; and
A packet arrival interval statistic calculating step of calculating a packet arrival interval statistic as the statistic based on packet information of the received packet.

The application identification method according to claim 19, wherein the packet information includes a packet number indicating the packet arrival order, a packet length, a packet arrival time, and an application identification result that is the result of application identification in the application identification step.

The application identification method according to claim 20, wherein the application information is information that characterizes an application and is obtained by performing a statistic calculation on packets in a situation where only packets of the application that is the application candidate flow on the network.

The application identifying method according to claim 21, wherein the information characterizing the application includes at least one of a packet length statistic, a packet arrival interval statistic, and an application type.
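
A minimal sketch of how such application information might be assembled from a trace in which only one application's packets flow is given below. The claims do not prescribe a data format, so the dictionary layout, the histogram bin widths, and the names `normalised_histogram` and `build_application_info` are assumptions made for illustration.

```python
from collections import Counter


def normalised_histogram(values, bin_width):
    """Map each bin's lower edge to the fraction of values falling in it."""
    if not values:
        return {}
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    total = sum(counts.values())
    return {edge: count / total for edge, count in sorted(counts.items())}


def build_application_info(app_type, lengths, arrival_times,
                           len_bin=100, interval_bin_ms=10):
    """Characterise one application from a single-application trace.

    lengths: packet lengths in bytes; arrival_times: seconds, ascending.
    """
    # Arrival intervals between consecutive packets, in whole milliseconds.
    intervals_ms = [round((b - a) * 1000)
                    for a, b in zip(arrival_times, arrival_times[1:])]
    return {
        "application_type": app_type,
        "packet_length_distribution": normalised_histogram(lengths, len_bin),
        "arrival_interval_distribution": normalised_histogram(intervals_ms, interval_bin_ms),
    }
```

Each distribution is stored as a normalised histogram so that it can later be compared directly with statistics computed on mixed traffic.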

23. The application identification method according to claim 22, wherein the packet length statistic of the application information includes at least one of a packet length distribution, a packet length moving average value distribution, a packet length moving median value distribution, a packet length moving variance value distribution, and a packet length moving standard deviation distribution.

The application identification method according to claim 22, wherein the packet arrival interval statistic of the application information includes at least one of a packet arrival interval distribution, a packet arrival interval moving average distribution, a packet arrival interval moving median distribution, a packet arrival interval moving variance distribution, and a packet arrival interval moving standard deviation distribution.
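
The moving statistics named in the two preceding claims can be sketched as below, applied either to packet lengths or to arrival intervals. The window size and the choice of population (rather than sample) variance are assumptions; the claims do not specify them. The function names are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def arrival_intervals(arrival_times):
    """Intervals between consecutive packet arrival times."""
    return [b - a for a, b in zip(arrival_times, arrival_times[1:])]


def moving_statistics(values, window):
    """Moving average, median, variance and standard deviation.

    Returns one dict per window position; the list is empty if there are
    fewer values than the window size. Population variance is an assumption.
    """
    result = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        result.append({
            "mean": mean(chunk),
            "median": median(chunk),
            "variance": pvariance(chunk),
            "stdev": pstdev(chunk),
        })
    return result
```

A distribution of any of these quantities can then be formed by collecting the per-window values into a histogram.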

The application identification method according to any one of claims 17 to 24, wherein the application candidate is identified by identification information.

An application identification method for identifying an application of a target packet to be identified in a network environment in which data flows of multiple applications are mixed,
A packet information extraction step for extracting packet information of the packet sequence;
A packet length statistic calculating step of assuming, based on the packet information, that the packet of interest in the packet sequence is a certain application candidate, and calculating a packet length statistic based on packet information of the received packet for each set of the application candidate of the packet of interest and the application candidate of a neighboring packet of the packet of interest;
A packet arrival interval statistic calculating step of assuming, based on the packet information, that the packet of interest and a neighboring packet of the packet of interest in the packet sequence are application candidates, and calculating a packet arrival interval statistic based on packet information of the received packet for each set of the application candidate of the packet of interest and the application candidate of the neighboring packet of the packet of interest;
A probability calculation step of comparing the statistic calculated in the statistic calculation step with the application information of the application candidate and calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest;
A maximum likelihood sequence estimation step of performing application identification by maximum likelihood sequence estimation based on the transition probability;
An application candidate dynamic control step of dynamically controlling the type and number of the application candidates using an identification result; and
A repetitive operation management step of managing the repetitive operation of application identification by changing the packet of interest in time-series order.

27. The application identification method according to claim 26, wherein the packet information extraction step extracts a packet length and a packet arrival time of each packet constituting the received packet series.

The application identification method according to claim 26 or 27, wherein the maximum likelihood sequence estimation step performs maximum likelihood sequence estimation from the transition probability and distributes the packet sequence to a flow for each application.
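
One common way to perform maximum likelihood sequence estimation over transition probabilities is the Viterbi algorithm; the claims do not name a specific algorithm, so its use here is an assumption. The sketch below takes a caller-supplied function `trans_prob(t, prev, cur)` (a hypothetical interface), finds the most probable sequence of application candidates, and then distributes packet indices to one flow per application. The toy transition function and packet lengths are illustrative only.

```python
from math import log

FLOOR = 1e-300  # avoids log(0) for impossible transitions


def viterbi(n_packets, candidates, trans_prob):
    """Most probable candidate sequence.

    trans_prob(t, prev, cur) gives the probability of moving from candidate
    prev (packet t-1) to candidate cur (packet t); prev is None for t == 0.
    """
    if n_packets == 0:
        return []
    score = {c: log(max(trans_prob(0, None, c), FLOOR)) for c in candidates}
    back = []
    for t in range(1, n_packets):
        new_score, pointers = {}, {}
        for cur in candidates:
            step = {p: score[p] + log(max(trans_prob(t, p, cur), FLOOR))
                    for p in candidates}
            best_prev = max(candidates, key=lambda p: step[p])
            new_score[cur] = step[best_prev]
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final candidate.
    path = [max(candidates, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def split_into_flows(path):
    """Distribute packet indices to one flow per identified application."""
    flows = {}
    for index, app in enumerate(path):
        flows.setdefault(app, []).append(index)
    return flows


# Toy example: short packets look like "voip", long ones like "web".
LENGTHS = [200, 1200, 210, 1180]


def toy_trans_prob(t, prev, cur):
    looks_like_voip = LENGTHS[t] < 600
    return 0.9 if (cur == "voip") == looks_like_voip else 0.1


PATH = viterbi(len(LENGTHS), ["voip", "web"], toy_trans_prob)
FLOWS = split_into_flows(PATH)
```

Working in log-probabilities keeps long packet sequences from underflowing, which is why the sketch sums logarithms instead of multiplying probabilities.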

29. The application identification method according to any one of the above, wherein the application candidate dynamic control step dynamically controls the type and number of flows of the application candidates using a result of the application identification performed in the maximum likelihood sequence estimation step.

30. The application identification method according to claim 26, wherein the repetitive operation management step performs management so that the application identification operation is repeated until application identification of all packets is completed.

A program of an application identification system for identifying an application corresponding to a packet, the program causing the application identification system to function as:
Probability calculation means for, with an application candidate associated with each of at least two received packets, calculating a transition probability that the application candidate transitions between the received packets based on the information of the received packets and the application candidates; and
Application identification means for identifying an application corresponding to a received packet based on the transition probability.

A program of an application identification system for identifying an application of a packet of interest to be identified in a network environment in which data flows of a plurality of applications are mixed, the program causing the system to function as:
Statistic calculation means for calculating a statistic based on packet information of a received packet for each set of the application candidate of the packet of interest and the application candidate of a neighboring packet of the packet of interest;
Probability calculation means for comparing the statistic calculated by the statistic calculation means with the application information of the application candidate and calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest; and
Application identification means for identifying the application of the packet of interest based on the transition probability.

The program according to claim 32, wherein the application identification means performs the operation of identifying the application of the packet of interest while changing the packet of interest in chronological order over the packet sequence, and divides the packets of interest into flows for each application, thereby distributing the packets to application flows.

The program according to claim 32, wherein the statistic calculation means includes:
Packet length statistic calculating means for calculating a packet length statistic as the statistic based on packet information of the received packet; and
Packet arrival interval statistic calculating means for calculating a packet arrival interval statistic as the statistic based on packet information of the received packet.

35. The program according to claim 34, wherein the packet information includes a packet number indicating the packet arrival order, a packet length, a packet arrival time, and an application identification result that is the result of application identification by the application identification means.

36. The program according to claim 35, wherein the application information is information that characterizes an application and is obtained by performing a statistic calculation on packets in a situation where only packets of the application that is the application candidate flow on the network.

The program according to claim 36, wherein the information characterizing the application includes at least one of a packet length statistic, a packet arrival interval statistic, and an application type.

The program according to claim 37, wherein the packet length statistic of the application information includes at least one of a packet length distribution, a packet length moving average value distribution, a packet length moving median value distribution, a packet length moving variance value distribution, and a packet length moving standard deviation distribution.

The program according to claim 37, wherein the packet arrival interval statistic of the application information includes at least one of a packet arrival interval distribution, a packet arrival interval moving average distribution, a packet arrival interval moving median distribution, a packet arrival interval moving variance distribution, and a packet arrival interval moving standard deviation distribution.

40. The program according to claim 32, wherein the application candidate is identified by identification information.

A program of an application identification system for identifying an application of a packet of interest to be identified in a network environment in which data flows of a plurality of applications are mixed, the program causing the application identification system to function as:
Packet information extraction means for extracting packet information of a packet sequence;
Packet length statistic calculating means for assuming, based on the packet information, that the packet of interest in the packet sequence is a certain application candidate, and calculating a packet length statistic based on packet information of a received packet for each set of the application candidate of the packet of interest and the application candidate of a neighboring packet of the packet of interest;
Packet arrival interval statistic calculating means for assuming, based on the packet information, that the packet of interest and a neighboring packet of the packet of interest in the packet sequence are application candidates, and calculating a packet arrival interval statistic based on packet information of a received packet for each set of the application candidate of the packet of interest and the application candidate of the neighboring packet of the packet of interest;
Probability calculation means for comparing the statistic calculated by the statistic calculation means with the application information of the application candidate and calculating a transition probability, which is the probability of a transition from the application candidate of the neighboring packet to the application candidate of the packet of interest;
Maximum likelihood sequence estimation means for performing application identification by maximum likelihood sequence estimation based on the transition probability;
Application candidate dynamic control means for dynamically controlling the type and number of the application candidates using the identification result; and
Repetitive operation management means for managing the repetitive operation of application identification by changing the packet of interest in time-series order.

42. The program according to claim 41, wherein the packet information extraction means extracts a packet length and a packet arrival time of each packet constituting the received packet sequence.

43. The program according to claim 41 or claim 42, wherein the maximum likelihood sequence estimation means performs maximum likelihood sequence estimation from the transition probability, and distributes the packet sequence to a flow for each application.

44. The application candidate dynamic control means dynamically controls the type and number of flows of the application candidates using the result of application identification by the maximum likelihood sequence estimation means. The program according to any one.

The program according to any one of claims 41 to 44, wherein the repetitive operation managing means manages the application identifying operation to be repeated until application identification of all packets is completed.