JP6410309B2

JP6410309B2 - Communication identification method and apparatus

Info

Publication number: JP6410309B2
Application number: JP2014265284A
Authority: JP
Inventors: 秀行小頭
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-12-26
Filing date: 2014-12-26
Publication date: 2018-10-24
Anticipated expiration: 2034-12-26
Also published as: JP2016127361A

Description

The present invention relates to a communication identification method and apparatus, and in particular to a communication identification method and apparatus for distinguishing communication that a communication terminal performs in the foreground from communication that it performs in the background.

Techniques for identifying the content of communication performed by a communication terminal have been studied. Patent Document 1 discloses a technique that calculates periodicity from time-series changes in communication traffic volume or from data, using a Fourier transform or the like.

Patent Document 2 discloses a technique that measures and aggregates communication traffic for each flow and identifies the application (type of communication service) of a flow by applying cepstrum analysis, a frequency analysis method, to the traffic characteristics of each flow.

Patent Document 3 discloses a technique that measures and aggregates communication traffic for each flow, calculates feature values from the traffic characteristics of each flow, and identifies the type of communication service (application) of a flow by applying machine learning.

Patent Document 1: JP 2010-283668 A
Patent Document 2: JP 2012-105043 A
Patent Document 3: JP 2013-127504 A
Patent Document 4: Japanese Patent Application No. 2014-037083

With the explosive spread of mobile terminals in recent years, a wide variety of services have emerged. In addition to foreground communication, which is triggered by a communication request operation of the terminal user, background communication now occurs, in which an application communicates with a server or the like at any timing convenient to itself, independently of the terminal user's communication request operations.

The two kinds of communication differ in importance, urgency, data size, tolerance for delay and jitter, effect on the terminal user's subjective evaluation, and so on. If they can be distinguished from each other, the result can be put to use in a variety of applications.

However, the purpose of Patent Document 1 is anomaly detection, that is, detection of a change in trend relative to normal conditions, and it cannot distinguish foreground from background communication. Patent Documents 2 and 3 are techniques aimed at identifying applications and communication services, and they cannot distinguish foreground from background communication either.

To address this technical problem, the inventors of the present invention devised a communication identification device that classifies each session observed on the network into an SD group specific to the combination of its source information and destination information, calculates for each SD group the autocorrelation of the occurrence timing of its sessions, and identifies sessions with a high autocorrelation coefficient as background communication. A patent application was filed for this device (Patent Document 4).

However, in a CDN (Contents Delivery Network) optimized for delivering Web content over the Internet, or in the cloud services provided by Google, content may be distributed across multiple servers, and the delivery source of the content may be dynamically changed and optimized according to the location of the user terminal.

In such cases, sessions that have the same source (client or user terminal) and belong to the same service or application are nevertheless classified into separate SD groups because their destinations differ. As a result, the communication type could not be identified based on the relationship among those sessions.

An object of the present invention is to solve the above technical problem and to provide a communication identification method and apparatus that can classify sessions as foreground or background communication by treating sessions that have the same source information and whose traffic characteristics are similar or highly correlated as sessions of the same service or application, even when their destination information differs.

To achieve the above object, the present invention provides a communication identification device that identifies a session established between a communication terminal and its communication partner as either foreground communication or background communication, and is characterized by the following configuration.

(1) The device comprises: means for classifying each session into an SD group specific to the combination of its source information and destination information; means for aggregating a plurality of SD groups that have the same source information and different destination information, based on the correlation of the traffic characteristics of their sessions; means for calculating an occurrence timing characteristic of the sessions for each SD group; and means for identifying the sessions of an SD group based on the occurrence timing characteristic.

(2) The means for calculating the occurrence timing characteristic calculates, for each SD group, the autocorrelation of the occurrence timing of its sessions, and the sessions of an SD group whose autocorrelation exceeds a predetermined threshold are identified as background communication.

(3) The means for calculating the occurrence timing characteristic comprises: means for generating, for each SD group, an occurrence timing vector whose elements are bin positions, where the occurrence timing of each session is represented by the position of a bin of a predetermined time width; means for calculating a difference for each combination of distinct elements of the occurrence timing vector; and means for calculating, for each set of element combinations that share the same difference, the reliability that the difference is the occurrence period of the sessions. Sessions are identified based on the reliability of each occurrence period.

(4) The session aggregation means comprises means for calculating the cross-correlation of traffic characteristics between SD groups that have the same source information and different destination information, and SD groups whose cross-correlation exceeds a predetermined threshold are aggregated into a single SD group.

In addition, the session aggregation means comprises characteristic sequence generation means that discretizes the time-series variation of a traffic characteristic into bins of unit time width to generate a traffic characteristic sequence, and the means for calculating the cross-correlation calculates the cross-correlation of these traffic characteristic sequences.

According to the present invention, the following effects are achieved.

(1) Even when sessions have different destination information, sessions that have the same source information and whose traffic characteristics are similar or highly correlated are aggregated into the same SD group as sessions of the same service or application, and are then subject to communication type determination based on the occurrence timing characteristic of the sessions.

Therefore, even for sessions of a service or application whose content is distributed across multiple servers, or whose content delivery source is dynamically changed and optimized according to the location of the user terminal, user-originated, user-driven foreground communication can be accurately distinguished from background communication.

(2) Among the communications performed by a communication terminal, sessions whose occurrence timing has high autocorrelation are likely to be background communication that the OS or an application performs automatically and mechanically, independently of user operations. Calculating the autocorrelation coefficient of the occurrence timing therefore makes it possible to accurately distinguish foreground communication from background communication.

(3) When evaluating an extremely sparse data sequence, such as the occurrence timing of sessions in background communication, the reliability calculation is performed not on the bit sequence T, which contains long runs of consecutive "0"s corresponding to the sparse intervals, but on the occurrence timing vector generated from the bit sequence T. The sequence length after this projection is shorter, so the amount of computation can be greatly reduced.
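The projection from a sparse bit sequence T to an occurrence timing vector, and the grouping of element pairs by their difference, can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the 0-based bin positions, and the sample sequence are assumptions, and the reliability calculation itself is not shown because it is not specified in this part of the text.

```python
from collections import defaultdict
from itertools import combinations


def timing_vector(bits):
    """Return the 0-based bin positions at which a session occurred."""
    return [i for i, b in enumerate(bits) if b]


def difference_table(vector):
    """Group every pair of distinct elements by the difference between them."""
    table = defaultdict(list)
    for a, b in combinations(vector, 2):
        table[b - a].append((a, b))
    return dict(table)


# Hypothetical sparse bit sequence: sessions observed in bins 0, 3 and 6.
T = [1, 0, 0, 1, 0, 0, 1]
vector = timing_vector(T)
table = difference_table(vector)
```

Here the vector has three elements instead of the seven in T, and the difference 3 is supported by two pairs, which makes it a candidate occurrence period.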

A diagram showing the configuration of a network to which the communication identification method of the present invention is applied.
A functional block diagram showing the configuration of the main part of the capture device.
A diagram schematically showing a method of aggregating SD groups that differ only in destination.
A diagram showing a method of measuring traffic characteristics in a TCP connection.
A diagram showing a method of measuring traffic characteristics in an HTTP session.
A flowchart showing a method of calculating the autocorrelation coefficient.
A functional block diagram of the capture device according to the first embodiment of the present invention.
A flowchart showing a method of calculating the cross-correlation coefficient.
A functional block diagram of the capture device according to the second embodiment of the present invention.
A diagram for explaining the function of the bit sequence generation unit 105B1.
A diagram for explaining the function of the difference calculation unit 105C1.
A diagram for explaining the function of the element set table generation unit 105C2.
A flowchart showing the operation of the reliability calculation unit 105C3.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a network to which the communication identification method of the present invention is applied.

A radio base station BS is installed in each area of the service coverage, and the wireless mobile terminals MN (for example, smartphones and tablet terminals) in an area are accommodated by that area's radio base station BS. Each radio base station BS is connected to a radio access network RAN, and the radio access network RAN is connected to a gateway (GW) of the core network. The core network is connected to the Internet at an Internet exchange (IX).

Various servers that provide services in response to requests from the MNs are connected to the Internet. In this embodiment, a capture device 1 serving as the communication identification device is connected to the line L that connects the radio access network RAN and the core network, since this line aggregates the traffic between the MNs and the servers.

FIG. 2 is a functional block diagram showing the configuration of the main part of the capture device 1. Components not needed to explain the present invention are omitted from the figure. The capture device 1 can be built by installing applications (programs) that implement each function on a general-purpose computer or server. Alternatively, it can be built as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

The communication traffic measurement unit 101 captures the packets sent and received on the line L and records various information in the log information management unit 102, including the session type (TCP, HTTP, UDP, etc.), occurrence timing, source information, and destination information.

The session classification unit 103 classifies each session into an SD group based on the combination of its source information (S) and destination information (D). That is, sessions whose source information and destination information are both identical are classified into the same SD group, and sessions for which either differs are classified into different SD groups.

In this embodiment, the source IP address is used as the source information, and the destination IP address and port number are used as the destination information. The occurrence timing of a session is represented by the arrival time of the HTTP Request packet for an HTTP session, and by the arrival time of the SYN packet exchanged in the three-way handshake performed at session establishment for a TCP session. For UDP, it is represented by the arrival time of the first packet observed in the session.
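The classification into SD groups described above can be sketched as follows. This is an illustrative sketch only: the `Session` record, its field names, and the sample addresses are assumptions introduced here, and the key follows the text (source IP address, destination IP address, destination port).

```python
from collections import defaultdict
from typing import NamedTuple


class Session(NamedTuple):
    """Hypothetical log record for one observed session."""
    src_ip: str
    dst_ip: str
    dst_port: int
    start_time: float  # occurrence timing, in seconds


def classify_sd_groups(sessions):
    """Map each (source, destination) key to the sorted occurrence times of its sessions."""
    groups = defaultdict(list)
    for s in sessions:
        groups[(s.src_ip, s.dst_ip, s.dst_port)].append(s.start_time)
    return {key: sorted(times) for key, times in groups.items()}


# Sample data using documentation address ranges.
sessions = [
    Session("192.0.2.10", "198.51.100.1", 443, 0.0),
    Session("192.0.2.10", "198.51.100.1", 443, 60.0),
    Session("192.0.2.10", "198.51.100.2", 443, 30.0),
]
sd_groups = classify_sd_groups(sessions)
```

The third session has the same source but a different destination address, so it falls into its own SD group even if it belongs to the same service.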

The session aggregation unit 104 targets SD groups that have the same source information and different destination information and, based on an analysis of the pattern of traffic characteristics (traffic pattern) of their sessions, aggregates the SD groups that belong to the same service or application into a single SD group.

FIG. 3 schematically shows how the session aggregation unit 104 calculates the similarity of traffic patterns between SD groups that differ only in destination and aggregates the SD groups whose traffic patterns are similar.

Here, a series of traffic pattern P1 is observed in association with a session of SD group #i that occurred at time t1. Similarly, a series of traffic pattern P2 is observed in association with a session that occurred at time t2 in SD group #i+1, which differs only in destination, and a series of traffic pattern P3 is observed in association with a session that occurred at time t3 in SD group #i+2, which also differs only in destination.

The session aggregation unit 104 evaluates the similarity of the traffic patterns P1, P2, and P3. The similarity can be evaluated using, as an index, the result of classification by a learning and discrimination algorithm such as Naive Bayes or clustering. Alternatively, as described later with reference to FIG. 6, it may be evaluated based on the calculated cross-correlation of the traffic characteristics.

If the similarity is judged to be sufficiently high, the three SD groups #i, #i+1, and #i+2 are aggregated, and the arithmetic sum (logical OR) of the appearance time patterns of the traffic patterns P1, P2, and P3 is taken as the traffic pattern after aggregation.

In the illustrated example, the appearance time pattern of traffic pattern P1 is [1/0/0/0/0], and the appearance time patterns of traffic patterns P2 and P3 on the same time axis are [0/0/1/0/0] and [0/0/0/0/1], respectively, so the appearance time pattern after aggregation is [1/0/1/0/1].
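The aggregation in this example amounts to an element-wise logical OR of the appearance time patterns. A minimal sketch, in which the function name is an assumption and the patterns are the ones given in the text:

```python
def aggregate_patterns(patterns):
    """Element-wise logical OR of equal-length 0/1 appearance time patterns."""
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("patterns must be non-empty and share one length")
    return [int(any(column)) for column in zip(*patterns)]


p1 = [1, 0, 0, 0, 0]
p2 = [0, 0, 1, 0, 0]
p3 = [0, 0, 0, 0, 1]
merged = aggregate_patterns([p1, p2, p3])  # [1, 0, 1, 0, 1]
```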

The aggregated appearance time pattern [1/0/1/0/1] is then subjected to the autocorrelation calculation by the session occurrence timing characteristic calculation unit 105, described later, and the sessions of an SD group whose autocorrelation exceeds a predetermined threshold are identified as background communication.
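The threshold test on the autocorrelation can be sketched as follows. The text at this point does not give the autocorrelation formula or the threshold value, so both are assumptions here: the autocorrelation is taken as the sum of lagged products normalized by the lag-0 value, and 0.5 is a hypothetical threshold chosen only for illustration.

```python
def autocorrelation(pattern, lag):
    """Sum of lagged products, normalized by the lag-0 value (assumed form)."""
    energy = sum(v * v for v in pattern)
    if energy == 0:
        return 0.0
    lagged = sum(pattern[i] * pattern[i + lag] for i in range(len(pattern) - lag))
    return lagged / energy


def is_background(pattern, threshold=0.5):
    """True if any non-zero lag has autocorrelation above the (hypothetical) threshold."""
    peak = max(
        (autocorrelation(pattern, lag) for lag in range(1, len(pattern))),
        default=0.0,
    )
    return peak > threshold


periodic = is_background([1, 0, 1, 0, 1])   # peak at lag 2
isolated = is_background([1, 0, 0, 0, 0])   # no repetition at any lag
```

Under these assumptions the aggregated pattern [1/0/1/0/1] peaks at a lag of two bins, whereas a single isolated occurrence shows no peak at any lag.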

In this embodiment, in order to evaluate the similarity of the traffic patterns of the SD groups based on the cross-correlation of traffic characteristics, the session aggregation unit 104 includes a traffic characteristic analysis unit 104a, a characteristic sequence generation unit 104b, and a correlation calculation unit 104c.

The traffic characteristic analysis unit 104a targets the SD groups stored in the log information management unit 102 that have the same source information and different destination information, and calculates unit traffic characteristics by totaling the traffic information of each group for each predetermined time width (bin).
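Totaling a traffic quantity per bin can be sketched as follows. The event format (timestamp, value), the function name, and the sample numbers are assumptions for illustration; the value could be, for example, the number of downloaded bytes carried by a packet.

```python
def bin_traffic(events, bin_width, t_start, n_bins):
    """Sum event values into n_bins consecutive bins of bin_width seconds each."""
    series = [0.0] * n_bins
    for timestamp, value in events:
        index = int((timestamp - t_start) // bin_width)
        if 0 <= index < n_bins:  # ignore events outside the observation window
            series[index] += value
    return series


# Hypothetical (timestamp in seconds, bytes) observations for one SD group.
events = [(0.5, 100), (1.2, 50), (1.9, 25), (4.0, 10)]
series = bin_traffic(events, bin_width=1.0, t_start=0.0, n_bins=4)
```

The last event falls outside the four-bin window and is dropped, so the resulting sequence has one value per bin and can be compared across SD groups on a common time axis.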

In this embodiment, the following can be used as unit traffic characteristics: (1) the download (DL) data transfer amount or packet count, (2) the upload (UL) data transfer amount or packet count, (3) throughput characteristics, (4) various traffic delay characteristics (server-side RTT delay, client-side RTT delay, TCP connection delay, etc.), and (5) the inter-arrival time between packets.

In addition, (6) the ratio of DL transfer data amount to UL transfer data amount (DL/UL) and (7) the ratio of DL packet count to UL packet count (DL/UL) can also be used as unit traffic characteristics. In user-driven foreground communication such as Web browsing, DL for viewing content dominates, so DL/UL >> 1. Even in user-driven communication, on a social networking service (SNS) uploading images and the like dominates, so DL/UL << 1. In background communication, by contrast, the exchange of small amounts of data dominates, for example, so DL/UL ≈ 1. These traffic characteristics can therefore serve as highly discriminative indices.

Furthermore, (8) the ratio of hold time (HoldTime) to response time (RespTime), (9) the time-series variation trend of the various traffic characteristics and its statistics (the sum, mean, variance, maximum, minimum, etc. of the values over several or all of the observed bins), and (10) the number of session occurrences may also be used as unit traffic characteristics.

In user-driven foreground communication such as Web browsing, the DL time after connection is longer than the connection delay, so the ratio of hold time to response time (HoldTime/RespTime) is much greater than 1. Also, in Web browsing, content of many different types and sizes is accessed and retrieved, so the time-series variation of the traffic characteristics is large, whereas background communication consists mainly of exchanging small amounts of data and waiting for push notifications, so the time-series variation is small. These traffic characteristics can therefore also serve as highly discriminative indices.

FIG. 4 is a diagram for explaining how the traffic characteristic analysis unit 104a analyzes traffic characteristics. The explanation uses as an example a connection that was captured from the SYN packet of the TCP three-way handshake performed between client and server when the TCP connection is established.

For a TCP connection, the server-side RTT (round-trip) delay is calculated from the difference (t2-t1) between the arrival time t1 of the SYN packet first sent from the terminal MH to the server (the connection occurrence time) and the arrival time t2 of the SYN+ACK packet returned from the server to the terminal MH.

The client-side RTT delay is calculated from the difference (t3-t2) between the arrival time t2 of the SYN+ACK packet and the arrival time t3 of the final ACK packet sent from the terminal MH to the server. The TCP connection delay is calculated from the difference (t4-t1) between the arrival time t1 of the first SYN packet and the arrival time t4 of the first data packet sent from the terminal MH to the server after the three-way handshake. Such delay characteristics can be discretized per bin into a characteristic sequence, and the correlation coefficient of that sequence can be used as one of the indices for foreground and background communication determination.

The throughput characteristic of the TCP connection is calculated from the difference (t5-t4) between the arrival time t4 of the first data sent from the terminal MH after the three-way handshake and the arrival time t5 of the FIN or RST packet, together with the amount of data sent and received that was captured within that interval.

Such throughput characteristics can likewise be discretized per bin of a predetermined width into a characteristic sequence, and the correlation coefficient of that sequence can be used as one of the indices for foreground and background communication determination.

The DL or UL transfer data amount, the packet count, or the DL-to-UL ratio observed between the data arrival time t4 and the arrival time t5 of the FIN or RST packet can also be discretized per bin of a predetermined width into a characteristic sequence, and the correlation coefficient of that sequence can be used as one of the indices for foreground and background communication determination. The transfer data amount and packet count used as traffic characteristic indices are not limited to payload data as described above and may be the total transfer data amount or total packet count including the various control packets.

When packet capture starts partway through a connection, only the analyses that are possible with the arrival times obtained are performed. For example, if the capture starts at the SYN+ACK packet, only the client-side RTT delay is calculated, from the difference (t3-t2) between its arrival time t2 and the arrival time t3 of the ACK packet.
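The TCP measurements described above can be sketched as follows. The function and key names are assumptions introduced for illustration; a timestamp that was not captured is passed as `None`, and any metric that depends on it is reported as `None`, which mirrors the partial-capture case.

```python
def _diff(later, earlier):
    """Return later - earlier, or None if either timestamp was not captured."""
    if later is None or earlier is None:
        return None
    return later - earlier


def tcp_metrics(t1, t2, t3, t4, t5, transferred_bytes):
    """Delay and throughput metrics from the packet arrival times t1..t5 (seconds)."""
    duration = _diff(t5, t4)
    throughput = transferred_bytes / duration if duration else None
    return {
        "server_rtt": _diff(t2, t1),     # SYN -> SYN+ACK
        "client_rtt": _diff(t3, t2),     # SYN+ACK -> ACK
        "connect_delay": _diff(t4, t1),  # SYN -> first data packet
        "throughput": throughput,        # bytes per second between t4 and t5
    }


full = tcp_metrics(0.0, 0.25, 0.5, 0.75, 2.75, transferred_bytes=4000)
# Capture started at the SYN+ACK packet: t1 is unknown.
partial = tcp_metrics(None, 0.25, 0.5, 0.75, 2.75, transferred_bytes=4000)
```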

The throughput characteristic and the TCP connection time of a TCP connection depend not only on client-side delay but also on server-side delay, so values calculated while the server-side delay is large do not accurately represent the communication quality on the client side. It is therefore desirable to exclude from quality analysis any throughput characteristic or TCP connection time calculated while the server-side RTT delay exceeds a predetermined threshold, or while the arrival interval of packets that can represent server-side delay, such as data or ACK packets, exceeds a predetermined threshold.

FIG. 5 is a sequence flow for explaining a method of measuring quality characteristics for an HTTP session. The time difference (t2-t1) between the arrival time t1 of the HTTP request (#1) packet and the arrival time t2 of the HTTP response (#1) packet returned for that request is taken as the HTTP response delay (RespTime). The time difference (t3-t1) between the arrival time t1 of the first HTTP request (#1) packet and the arrival time t3 of the last HTTP response (#1) packet is taken as the HTTP hold time (HoldTime). The ratio of RespTime to HoldTime can also be discretized per bin into a characteristic sequence, and the correlation coefficient of that sequence can be used as one of the indices for foreground and background communication determination.
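The two HTTP timings and their ratio can be sketched as follows. The function name and the sample arrival times are assumptions for illustration; the ratio is expressed as HoldTime/RespTime.

```python
def http_timing(t1, t2, t3):
    """RespTime, HoldTime and their ratio from arrival times t1 <= t2 <= t3 (seconds)."""
    resp_time = t2 - t1  # request -> first response packet
    hold_time = t3 - t1  # request -> last response packet
    ratio = hold_time / resp_time if resp_time > 0 else None
    return resp_time, hold_time, ratio


# Hypothetical arrival times for one request/response exchange.
resp_time, hold_time, ratio = http_timing(0.0, 0.5, 2.0)
```

With these sample times the download phase dominates, giving a ratio of 4.0.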

The time difference (t3-t2) between the arrival time t2 of the first HTTP response (#1) packet and the arrival time t3 of the last HTTP response (#1) packet is taken as the HTTP response hold time, and the total amount of data downloaded during that time is taken as the HTTP communication data amount. Likewise, the time difference (t5-t4) between the arrival time t4 of the first HTTP request (#2) packet and the arrival time t5 of the last HTTP request (#2) packet is taken as the HTTP request hold time, and the amount of data during that time is taken as the HTTP request data amount. The HTTP communication data amount and HTTP request data amount can also be discretized per bin into characteristic sequences, and the correlation coefficients of those sequences can be used as indices for foreground and background communication determination.

Returning to FIG. 2, the characteristic number sequence generation unit 104b generates, for each SD group, a characteristic number sequence whose time-series elements are the unit traffic characteristics obtained for each bin. The correlation calculation unit 104c determines, for SD groups having the same source information but different destination information, whether to aggregate them into one SD group based on the cross-correlation of their traffic characteristics.

FIG. 6 is a flowchart showing a method of evaluating the similarity of the traffic patterns of SD groups based on the cross-correlation of their traffic characteristics. The procedure is repeated for every pair of SD groups having the same source information but different destination information.

In step S101, one traffic characteristic of each of two SD groups SD1 and SD2 with different destination information is discretized (binned) at a predetermined period τ, and the values observed in each period are arranged into a time-series number sequence. In step S102, the lag value j used to calculate the cross-correlation is set to τ seconds.

In step S103, the cross-correlation coefficient CC(j) corresponding to the current lag value j is calculated based on equation (1). In this embodiment, the cross-correlation coefficient is obtained by shifting the two number sequences relative to each other in steps of τ and summing the products.

In step S104, the cross-correlation coefficient CC(j) is normalized, as shown in equation (2), by dividing it by the square root of the product of the autocorrelation coefficients AC1(0) and AC2(0) of the two sequences at lag value j=0.
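Equations (1) and (2) are not reproduced in this text. A form consistent with the description of steps S103 and S104 is sketched below; it is a reconstruction under assumptions, not the published equations. Here SD_1(i) and SD_2(i) denote the i-th elements of the two characteristic number sequences, N is the sequence length, and the lag j is counted in bins of width τ.

```latex
CC(j) = \sum_{i=1}^{N-j} SD_1(i)\, SD_2(i+j) \tag{1}

\widehat{CC}(j) = \frac{CC(j)}{\sqrt{AC_1(0)\, AC_2(0)}},
\qquad AC_k(0) = \sum_{i=1}^{N} SD_k(i)^2 \tag{2}
```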

In step S105, it is determined whether the lag value j has reached a predetermined upper limit jmax. If j ≥ jmax does not hold, the process proceeds to step S106, where the lag value j is extended by the period τ seconds, and then returns to step S103, where the calculation and normalization of the cross-correlation coefficient CC(j) are repeated for the updated lag value j.

When other SD groups with the same source information and different destination information exist, this calculation of the cross-correlation coefficient CC(j) is repeated while switching the calculation target to those SD groups.

In this embodiment, SD groups whose traffic-characteristic cross-correlation exceeds a predetermined threshold are judged to have similar traffic patterns and are aggregated into one SD group.
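The loop of steps S101 to S106 and the aggregation decision can be sketched as follows. This is a minimal illustration under stated assumptions: lags are counted in bins rather than seconds, the two sequences are truncated to a common length, and two SD groups are aggregated when the normalized cross-correlation exceeds the threshold at any lag from 1 to `max_lag` (the text does not say which lag is compared with the threshold). The function names and the threshold value are hypothetical.

```python
import math


def cross_correlation(a, b, lag):
    """Sum of products a[i] * b[i + lag] over the overlapping part."""
    n = min(len(a), len(b))
    return sum(a[i] * b[i + lag] for i in range(n - lag))


def normalized_cc(a, b, lag):
    """Cross-correlation divided by sqrt(AC1(0) * AC2(0)); 0.0 if either sequence is all zero."""
    denom = math.sqrt(cross_correlation(a, a, 0) * cross_correlation(b, b, 0))
    return cross_correlation(a, b, lag) / denom if denom > 0 else 0.0


def should_aggregate(a, b, max_lag, threshold):
    """Assumed rule: aggregate if any lag in 1..max_lag exceeds the threshold."""
    return any(normalized_cc(a, b, lag) > threshold
               for lag in range(1, max_lag + 1))
```

For example, a sequence and a copy of it delayed by one bin reach a normalized value of 1.0 at lag 1, so the pair would be aggregated for any threshold below 1.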

Returning to FIG. 2, the session occurrence timing characteristic calculation unit 105 calculates, for each classified or aggregated SD group, a characteristic relating to the occurrence timing of its sessions, as described in detail below. The communication identification unit 106 identifies the sessions of each SD group as foreground communication or background communication based on that occurrence timing characteristic.

[First embodiment]
FIG. 7 is a diagram showing the configuration of the main part of the capture device according to the first embodiment of the present invention. The same reference numerals as above denote the same or equivalent parts, so their description is omitted. This embodiment is characterized in that an autocorrelation calculation unit 105A, which calculates for each SD group the autocorrelation of the occurrence timing of the sessions classified or aggregated into that group, is adopted as the session occurrence timing characteristic calculation unit 105.

FIG. 8 is a flowchart showing the procedure by which the autocorrelation calculation unit 105A calculates the autocorrelation within an SD group. The procedure is repeated for all SD groups.

In step S201, the occurrence timing of each session classified into the SD group to be identified (SD1) is discretized (binned) at a predetermined period of τ seconds, and the number of sessions that occurred within each period τ is arranged into a time-series number sequence.

In step S202, the lag value (delay time) j used to calculate the autocorrelation is set to τ seconds, the same as the period. In step S203, the autocorrelation coefficient AC(j) corresponding to the current lag value j is calculated based on equation (3). In this embodiment, AC(j) is obtained by shifting the same number sequence by j and summing the products. Here i is the element index of the sequence, SD(i) is the value of its i-th element, and N is the sequence length.

In step S204, the autocorrelation coefficient AC(j) is normalized, as shown in equation (4), by dividing it by the autocorrelation coefficient AC(0) at lag value j=0.

In step S205, it is determined whether the lag value j has reached a predetermined upper limit jmax. If j ≥ jmax does not hold, the process proceeds to step S206, where the lag value j is extended by the period τ seconds, and then returns to step S203, where the calculation and normalization of the autocorrelation coefficient AC(j) are repeated for the updated lag value j.
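Steps S201 to S206 can be sketched as follows. Equations (3) and (4) are not reproduced in this text, so the sum-of-products form below follows the prose description only. Lags are counted in bins, and the function names are hypothetical.

```python
def bin_counts(start_times, tau, n_bins):
    """Step S201: count the sessions that start within each bin of width tau."""
    counts = [0] * n_bins
    for t in start_times:
        idx = int(t // tau)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def autocorrelation(seq, lag):
    """Step S203: sum of products of the sequence with itself shifted by lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))


def normalized_ac(seq, max_lag):
    """Steps S204 to S206: AC(j) / AC(0) for every lag j in 1..max_lag."""
    ac0 = autocorrelation(seq, 0)
    if ac0 == 0:
        return {}  # no sessions observed, nothing to normalize
    return {lag: autocorrelation(seq, lag) / ac0
            for lag in range(1, max_lag + 1)}
```

A session that recurs every 30 seconds, binned at τ = 10 seconds, produces a peak at a lag of 3 bins and zeros at lags 1 and 2, which is the kind of pattern a threshold on the autocorrelation would pick up as periodic.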

According to this embodiment, even sessions with different destination information are aggregated into the same SD group, as sessions belonging to the same service or application, when their source information is identical and their traffic characteristics are similar or highly correlated. The aggregated group can then be used as the target of communication type determination based on session occurrence timing characteristics.

Therefore, even for sessions of a service or application whose content is distributed across multiple servers, or whose content source is dynamically changed and optimized according to the location of the user terminal, analyzing the autocorrelation of the session occurrence timing makes it possible to accurately distinguish user-initiated foreground communication from background communication.

Furthermore, according to this embodiment, foreground communication and background communication can be distinguished by passive packet capture, at low cost and with only a small number of measurement points.

[Second embodiment]
FIG. 9 is a functional block diagram showing the configuration of the main part of the capture device 1 according to the second embodiment of the present invention. The same reference numerals as above denote the same or equivalent parts, so their description is omitted. This embodiment is characterized in that the session occurrence timing characteristic calculation unit 105 is composed of an occurrence timing vector generation unit 105B and a periodicity evaluation unit 105C.

The occurrence timing vector generation unit 105B includes a bit number sequence generation unit 105B1. For each SD group, it represents the occurrence timing of each session by the position of a bin having a predetermined time width, and generates an occurrence timing vector whose elements are those bin positions.

FIG. 10 is a diagram explaining the function of the bit number sequence generation unit 105B1. Let the width of each bin on the time axis, in which the occurrence of a session is monitored, be Δt, and let the monitoring period be n×Δt. If, for example, the occurrence of a session is detected in the 1st, 2nd, 5th, 8th, 9th, 11th, 12th, and 14th bins, a bit number sequence T of length n is generated as in equation (5), with "1" set for each bin in which an occurrence was detected and "0" set for each bin in which none was detected.

The occurrence timing vector generation unit 105B expresses each bin whose bit is set in the bit number sequence T as a distance (number of bins) from the start bin, which is regarded as the starting point of the period, and generates an occurrence timing vector OCV (occurrence vector) with these distances as its time-series elements. If the start bin is the first bin, the occurrence timing vector OCV of equation (6) is generated from the bit number sequence T of equation (5), and its length m is 8.
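The conversion from the bit number sequence T to the occurrence timing vector OCV can be sketched as follows. Equations (5) and (6) are not reproduced in this text; the sketch rebuilds them from the bins listed in the prose. The sequence length n = 16 used in the usage note is an assumption (the text only implies a value close to that), and the function names are hypothetical. Bin positions are 1-based, matching the prose.

```python
def bit_sequence(occupied_bins, n):
    """Bit number sequence T: 1 for each 1-based bin where a session started."""
    occupied = set(occupied_bins)
    return [1 if b in occupied else 0 for b in range(1, n + 1)]


def occurrence_vector(bits):
    """Occurrence timing vector OCV: 1-based positions of the set bits."""
    return [pos for pos, bit in enumerate(bits, start=1) if bit == 1]
```

With `bits = bit_sequence([1, 2, 5, 8, 9, 11, 12, 14], 16)`, calling `occurrence_vector(bits)` gives the eight positions listed in the text, so m = 8 while n = 16.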

In the periodicity evaluation unit 105C, the difference calculation unit 105C1 calculates the difference for each combination of elements of the occurrence timing vector OCV.

FIG. 11 is a diagram explaining the function of the difference calculation unit 105C1. In this embodiment, the elements i of the occurrence timing vector OCV are arranged in time-series order along both the row and column (x, y) directions, and the difference between each pair of elements is entered in the corresponding cell to derive a matrix table.

Since the difference between an element and itself need not be calculated, the combinations on the diagonal of the matrix table are excluded. Moreover, once the difference between the x-th and y-th vector elements is known, the difference between the y-th and x-th need not be calculated, so all combinations below the diagonal are excluded as well.

Furthermore, if the monitoring period is n times the bin width, that is, if the bit number sequence T has a bit length of n, an occurrence period exceeding half of that, n/2, cannot be determined, so here periods of 8 bits or more are excluded from the calculation. In this embodiment, performing these exclusions in advance reduces the number of differences to be calculated and hence the amount of computation.

Returning to FIG. 9, the element set table generation unit 105C2 generates a table of element sets, one for each group of combinations that share the same difference.

FIG. 12 is a diagram explaining the function of the element set table generation unit 105C2. In this embodiment, the element sets of combinations sharing the same difference are constructed in advance. For example, looking up the difference "2" in the matrix table of FIG. 11, the interval between the 9th and 11th bits and the interval between the 12th and 14th bits are both 2, so the element set for the difference "2" is {9, 11, 12, 14}.

Similarly, looking up the difference "3" in the matrix table, the interval between the 2nd and 5th bits, the interval between the 5th and 8th bits, the interval between the 9th and 12th bits, and the interval between the 11th and 14th bits are all 3, so the element set for the difference "3" is {2, 5, 8, 9, 11, 12, 14}. Element sets are constructed in the same way for the differences "4", "5", "6", and "7", completing the element set table shown in FIG. 12.
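The difference matrix and the element set table can be sketched together as follows. This is a minimal illustration with hypothetical names. `itertools.combinations` visits only pairs above the diagonal, which matches the exclusions described for FIG. 11, and `max_diff` stands for the cut-off of 8 bins. The text tabulates differences 2 to 7 and does not say how a difference of 1 is treated; this sketch keeps it.

```python
from itertools import combinations


def element_sets(ocv, max_diff):
    """Map each difference d < max_diff to the sorted elements of all pairs d apart."""
    table = {}
    for x, y in combinations(ocv, 2):  # x precedes y: upper triangle only
        d = y - x
        if d < max_diff:               # periods of max_diff bins or more are skipped
            table.setdefault(d, set()).update((x, y))
    return {d: sorted(s) for d, s in sorted(table.items())}
```

For the OCV of the example, `element_sets([1, 2, 5, 8, 9, 11, 12, 14], 8)` gives `[9, 11, 12, 14]` for a difference of 2 and `[2, 5, 8, 9, 11, 12, 14]` for a difference of 3, in agreement with the sets quoted from FIG. 12.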

Based on the element set obtained for each difference, the reliability calculation unit 105C3 calculates the reliability of regarding that difference as the occurrence period of the sessions. The communication identification unit 106 identifies each session as either foreground communication or background communication based on the reliability of its occurrence timing.

FIG. 13 is a flowchart showing the procedure by which the reliability calculation unit 105C3 calculates the reliability. In this embodiment, for the vector element sequence obtained for each difference Δd, the reliability is calculated, while switching the start bin, from the proportion of expected positions spaced Δd apart at which a session occurrence was actually detected.

In step S301, the element set with the largest number of elements is selected. In this embodiment, the element set {2, 5, 8, 9, 11, 12, 14} extracted for the difference Δd = 3 has 7 elements, the most of any set, so this element set is selected. In step S302, the start bin, regarded as the starting point of the session occurrence period, is set to its initial value of 1.

In step S303, for the selected element set {2, 5, 8, 9, 11, 12, 14}, an expected element sequence is obtained by successively adding the difference Δd to the element at the current start bin. Here the start bin is 1, so the difference 3 is successively added to the first element in time-series order, 2, giving the expected element sequence {2, 5, 8, 11, 14}.

In step S304, the selected element set {2, 5, 8, 9, 11, 12, 14} is compared with the expected element sequence {2, 5, 8, 11, 14}, and a match sequence OCV' is obtained that indicates whether each element of the expected element sequence matches an element of the element set. Here every element of the expected element sequence matches some element of the element set, so the match sequence OCV' is [11111]. In step S305, the reliability c of the difference 3 at start bin 1 is calculated based on equation (7).

Here, Π_{Δd,l}(OCV') is the length (bit length) of the match sequence OCV', which is 5 in the above example. z is the total number of bins in which an occurrence should have been detected but was not, that is, the number of 0s in the match sequence OCV'. The z term of equation (7) is therefore the number of 0s if OCV' contains any, and 0 otherwise. Since the match sequence [11111] contains no 0, the z term is 0, and the reliability c of period 3 at start bin 1 is (5-1-0)/(5-1) = 100%.

As a further example, take the element set {9, 11, 12, 14} for the difference "2" in FIG. 12 with the start bin set to 1. In step S303, the difference 2 is successively added to the first element in time-series order, 9, giving the expected element sequence {9, 11, 13, 15}.

In step S304, the selected element set {9, 11, 12, 14} is compared with the expected element sequence {9, 11, 13, 15}. The expected elements 9 and 11 are found in the element set, while 13 and 15 are not, so the match sequence OCV' is [1100] and z is 2. In step S305, applying these values to equation (7) gives the reliability c of period 2 at start bin 1 as (4-1-2)/(4-1) = 33%.
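The reliability calculation of steps S303 to S305, together with the search over start bins, can be sketched as follows. Equation (7) is not reproduced in this text; the form c = (L - 1 - z) / (L - 1), with L the length of the match sequence OCV' and z its number of 0s, is inferred from the two worked examples. Further assumptions: the expected element sequence runs up to the monitoring length n (taken as 16 here), start bin k means the k-th element of the element set, every element is tried as a start bin, and a match sequence of length 1 yields a reliability of 0. All names are hypothetical.

```python
def reliability(elements, diff, start_index, n):
    """Reliability of period diff, starting from the start_index-th element (1-based)."""
    # Step S303: expected positions start, start + diff, ... up to bin n.
    expected = range(elements[start_index - 1], n + 1, diff)
    present = set(elements)
    # Step S304: match sequence OCV', 1 where an expected position was observed.
    match = [1 if e in present else 0 for e in expected]
    length = len(match)
    if length < 2:
        return 0.0  # assumption: a single expected position shows no periodicity
    z = match.count(0)
    # Step S305: inferred form of equation (7).
    return (length - 1 - z) / (length - 1)


def best_reliability(elements, diff, n):
    """Highest reliability over all start bins (compare step S308)."""
    return max(reliability(elements, diff, k, n)
               for k in range(1, len(elements) + 1))
```

With n = 16, `reliability([2, 5, 8, 9, 11, 12, 14], 3, 1, 16)` evaluates to 1.0 and `reliability([9, 11, 12, 14], 2, 1, 16)` to 1/3, matching the 100% and 33% of the worked examples.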

Thus, when evaluating an extremely sparse data sequence, such as the occurrence timing of sessions established periodically in the background, this embodiment performs the reliability calculation not on the bit number sequence T, which contains long runs of consecutive 0s in the sparse intervals, but on the occurrence timing vectors OCV and OCV' generated from it. The sequence after this projection is much shorter, so the amount of computation can be greatly reduced.

That is, if the bit number sequence T to be analyzed has length n, an autocorrelation or cross-correlation is calculated over that sequence, so its computational cost is O(n²) with a naive algorithm and O(n log n) even when accelerated by FFT or similar methods.

In this embodiment, by contrast, calculating the OCV requires one full scan of the n elements of the bit number sequence T of length n, at a cost of O(n). When communication traffic is the object of analysis and the vector is calculated from session occurrence times, the number of occurrence times is small relative to the overall length n of the traffic data, so the number of elements m is small and the vector is sparse, with n >> m.

All calculations on the OCV operate on its m elements, at a cost of O(m²) or O(m). Since n >> m, O(m²) and O(m) are far smaller than O(n²) and O(n log n). According to this embodiment, therefore, the cost is O(m) particularly when the bit number sequence T is sparse, and the amount of computation can be made far smaller than with other methods.

Returning to FIG. 13, the start bin is incremented in step S306. In step S307, it is determined whether the start bin has exceeded a predetermined upper limit position. If it has not, the process returns to step S303, and the above processing is repeated for the updated start bin (here, 2).

If, on the other hand, the start bin is determined to have reached the upper limit, the process proceeds to step S308, where the highest of the reliabilities obtained for the individual start bins is adopted as the reliability.

If, as in this example, the reliability is already 100% when the start bin has its initial value of 1, the subsequent calculations are unnecessary, so the processing may be terminated at that point with the reliability set to 100%.

If no reliability exceeding a predetermined threshold is obtained even after repeating the reliability calculation while changing the start bin, the process may return to step S301, select the element set with the next largest number of elements, and repeat the above processing.

The communication identification unit 106 identifies the sessions of each SD group as either background communication or foreground communication based on the calculated reliability. In this embodiment, a threshold of, for example, about 80% is set for the reliability. If a period exists whose reliability exceeds the threshold, the sessions of interest in that SD group are identified as background communication; otherwise they are identified as foreground communication. Alternatively, the 90th percentile of the population of reliabilities obtained across the SD groups may be used as the threshold, or a likelihood-based determination may be adopted.
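The fixed-threshold rule described above can be sketched as follows. This is a minimal illustration: the function name and the label strings are hypothetical, and the default of 0.8 corresponds to the "about 80%" given as an example in the text.

```python
def classify(reliabilities, threshold=0.8):
    """Label an SD group from the reliabilities of its candidate periods.

    The group is treated as background communication if any candidate
    period has a reliability above the threshold, and as foreground
    communication otherwise.
    """
    if any(c > threshold for c in reliabilities):
        return "background"
    return "foreground"
```

A group whose best candidate period reaches a reliability of 1.0 is labeled background, while one whose candidates stay at 0.5 or below is labeled foreground.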

In the second embodiment described above, each bin width is strictly fixed, and "1" is set only for a bin within whose width an occurrence was detected. The present invention is not limited to this. With each bin width kept as it is, an additional period of a predetermined width may be set before and after each bin, and "1" may be set for a bin when an occurrence is detected within the bin width including the additional period.

In this way, even when delay or jitter in the network shifts the session occurrence timing, which would otherwise distort the calculated number sequence and its period and lower the reliability, the period can be determined stably despite such disturbances, and foreground and background communication can consequently be identified accurately.
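The additional period can be sketched as follows. This is a minimal illustration under assumptions: the same margin is applied before and after every bin, bins are indexed from 0 in the code, and session start times are measured in seconds from the start of the monitoring period. The function name and parameters are hypothetical.

```python
def bit_sequence_with_margin(start_times, bin_width, n, margin):
    """Bit number sequence in which each bin is widened by margin on both sides.

    Bin b nominally covers [b * bin_width, (b + 1) * bin_width). A session
    start that falls within margin of either edge also sets the bit.
    """
    bits = [0] * n
    for b in range(n):
        lo = b * bin_width - margin
        hi = (b + 1) * bin_width + margin
        if any(lo <= t < hi for t in start_times):
            bits[b] = 1
    return bits
```

A session that starts at 9.5 seconds with 10-second bins sets only the first bit when the margin is 0, but sets the first two bits when the margin is 1 second, so a session that arrives slightly early because of jitter still counts toward the bin where it was expected.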

Therefore, even for sessions of a service or application whose content is distributed across multiple servers, or whose content source is dynamically changed and optimized according to the location of the user terminal, analyzing the periodicity of the session occurrence timing makes it possible to identify them as foreground communication or background communication.

Also, according to this embodiment, when evaluating an extremely sparse data sequence, such as the occurrence timing of sessions in background communication, the reliability calculation is performed not on the bit number sequence T, which contains long runs of consecutive 0s in the sparse intervals, but on the occurrence timing vector OCV generated from it. The sequence after this projection is much shorter, so the amount of computation can be greatly reduced.

Furthermore, according to this embodiment, foreground communication and background communication can be distinguished by passive packet capture, at low cost and with only a small number of measurement points.

DESCRIPTION OF SYMBOLS: 101 ... communication traffic measurement unit, 102 ... log information management unit, 103 ... session classification unit, 104 ... session aggregation unit, 104a ... traffic characteristic analysis unit, 104b ... characteristic number sequence generation unit, 104c ... correlation calculation unit, 105 ... session occurrence timing characteristic calculation unit, 105A ... autocorrelation calculation unit, 105B ... occurrence timing vector generation unit, 105C ... periodicity evaluation unit, 106 ... communication identification unit

Claims

1. A communication identification device that identifies a session established between a communication terminal and a communication partner as either foreground communication or background communication, comprising:
session classification means for classifying each session into an SD group specific to a combination of source information and destination information;
session aggregation means for aggregating a plurality of SD groups having the same source information and different destination information based on the similarity of the traffic characteristics of their sessions;
means for calculating occurrence timing characteristics of the sessions for each SD group; and
communication identification means for identifying the sessions of each SD group based on the occurrence timing characteristics of the sessions,
wherein the session aggregation means includes means for calculating a cross-correlation of traffic characteristics between SD groups having the same source information and different destination information, and aggregates SD groups whose cross-correlation exceeds a predetermined threshold into one SD group, and
wherein the session aggregation means includes characteristic sequence generation means for generating a traffic characteristic sequence by discretizing time-series fluctuations of the traffic characteristics for each bin of unit time width, and the means for calculating the cross-correlation calculates the cross-correlation of the traffic characteristic sequences.

2. The communication identification device according to claim 1, wherein the characteristic sequence generation means calculates, for each bin, as a time-series element of the characteristic sequence, at least one of: download transfer data amount, upload transfer data amount, ratio of download to upload transfer data amounts, number of download packets, number of upload packets, ratio of download to upload packet counts, throughput characteristics, traffic delay characteristics, ratio of hold time to response time, time-series fluctuation tendency of traffic characteristics, and number of session occurrences.

3. The communication identification device according to claim 1, wherein the means for calculating the occurrence timing characteristics calculates, for each SD group, an autocorrelation of the occurrence timing of its sessions, and
the communication identification means identifies the sessions of an SD group whose autocorrelation exceeds a predetermined threshold as background communication.

4. The communication identification device according to claim 1, wherein the means for calculating the occurrence timing characteristics comprises:
means for representing, for each SD group, the occurrence timing of each session by the position of a bin having a predetermined time width, and generating an occurrence timing vector having the bin positions as elements;
means for calculating a difference for each combination of different elements of the occurrence timing vector; and
means for calculating, for each element set of the combinations sharing the same difference, a reliability that the difference is the occurrence period of the sessions,
and wherein the communication identification means identifies the sessions based on the reliability of each occurrence period.

5. The communication identification device according to claim 4, wherein the means for generating the occurrence timing vector comprises means for generating a bit number sequence indicating, for each bin, whether it is an occurrence timing of a session, and
the time series of the number of bins from the start bin of the bit number sequence to each occurrence timing bin is used as the elements of the occurrence timing vector.

6. The communication identification device according to claim 4 or 5, wherein the means for calculating the reliability calculates the reliability based on the number of bins in which an occurrence timing of a session was detected relative to the number of bins in which an occurrence timing of a session was expected to be detected within a predetermined monitoring period.

7. The communication identification device according to claim 4, wherein the means for calculating the reliability does not treat, as a reliability calculation target, an element set of combinations whose difference is approximately half or more of the sampling period of the occurrence timing.

8. The communication identification device according to claim 4, wherein the means for generating the occurrence timing vector sets an additional period of a predetermined width before and after each bin, and generates the occurrence timing vector based on the occurrence timings detected within the bin width including the additional period.

9. A communication identification method in which a computer identifies a session established between a communication terminal and a communication partner as either foreground communication or background communication, comprising:
a procedure for classifying each session into an SD group specific to a combination of source information and destination information;
a procedure for aggregating a plurality of SD groups having the same source information and different destination information based on the similarity of the traffic characteristics of their sessions;
a procedure for calculating occurrence timing characteristics of the sessions for each SD group;
a procedure for identifying the sessions of each SD group based on the occurrence timing characteristics of the sessions;
a procedure for calculating a cross-correlation of traffic characteristics between SD groups having the same source information and different destination information, and aggregating SD groups whose cross-correlation exceeds a predetermined threshold into one SD group; and
a procedure for generating a traffic characteristic sequence by discretizing time-series fluctuations of the traffic characteristics for each bin of unit time width, the cross-correlation being calculated on the traffic characteristic sequences.