JP2013174959A

JP2013174959A - Application inspection system

Info

Publication number: JP2013174959A
Application number: JP2012037805A
Authority: JP
Inventors: Hiroki Kuzuno; 弘樹葛野
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2012-02-23
Filing date: 2012-02-23
Publication date: 2013-09-05
Anticipated expiration: 2032-02-23
Also published as: JP5806630B2

Abstract

PROBLEM TO BE SOLVED: To provide an application inspection system that can show a user the security risk even of an application such as grayware. SOLUTION: Cluster extracting means 225 uses a cluster classification result 217, which stores analyzed clusters created by hierarchical clustering, based on transmission-destination similarity, of communication packets extracted from communication logs obtained by executing a plurality of applications. With it, the cluster extracting means 225 extracts the lowest-level clusters to which the communication packets of an application to be inspected are allocated, counts, as a common cluster number, the number of clusters at an arbitrary hierarchical level between the highest and lowest levels of the analyzed clusters that each contain one or more of the extracted lowest-level clusters, and outputs this count as an evaluation value. Evaluation means 226 evaluates the degree of influence of the application to be inspected as higher the larger the evaluation value output from the cluster extracting means 225. Result output means 227 outputs the evaluation made by the evaluation means 226 as the evaluation result of the application to be inspected.

Description

The present invention relates to an application inspection system that inspects terminal applications running on a mobile terminal.

Mobile terminals such as smartphones, personal digital assistants, and mobile phones store a wide range of personal information, including the user's telephone number and e-mail address as well as location information, address books, and schedules. At the same time, users freely install and use terminal application programs for mobile terminals (hereinafter simply "applications"), such as mail software, game software, and utility tools developed by various vendors.

Because such applications are developed independently by various vendors, an application may, for example, transmit personal information stored on the mobile terminal to an external destination without asking the user's permission while the user is using it. It may also transmit data externally even when the user is not operating the mobile terminal. Thus, although there is a latent risk that personal information will be transmitted in situations the user does not intend, users install and use applications on their mobile terminals for the convenience of the functions and services that the applications provide as their main purpose.

Meanwhile, service providers that offer a wide variety of applications inspect each application before release to check that it is not known malware or spyware, and techniques for inspecting applications are known (for example, Patent Document 1).

Patent Document 1 proposes a program analysis method that, in order to profile many programs easily and quickly, executes a program on a virtual operating system, identifies behavioral characteristics from the program's operation history, and classifies the program into a type such as "macro virus", "bot", "P2P worm", "spyware", "Trojan horse", or "non-malicious program".

Patent Document 1: JP 2008-129707 A

However, it has been difficult for conventional methods to handle grayware. Here, grayware refers to an application that operates normally within the scope the user has agreed to (that is, the functions and services the application provides as its main purpose) while also transmitting personal information on the terminal to an external destination regardless of the user's intent, so that the distinction from spyware, whose main purpose is information collection, is blurred. Such grayware transmits to multiple different external destinations, because it performs both the external transmissions needed to receive the legitimate service the user agreed to and the external transmissions that send personal information on the terminal without the user's intent. Even though such an application cannot be categorically judged harmful, it is desirable to treat it as an application that may be disadvantageous to some users.

Accordingly, an object of the present invention is to evaluate even such grayware in a way that shows what security impact a user may be exposed to when running the application on a terminal.

To solve this problem, the present invention provides an application inspection system comprising: a storage unit that stores analyzed clusters created by hierarchical clustering, based on transmission-destination similarity, of communication packets extracted from communication logs obtained by executing a plurality of applications; cluster extraction means that extracts the lowest-level clusters in the analyzed clusters to which the respective communication packets extracted from the communication log of an executed inspection target application are assigned, counts, as a common cluster number, the number of clusters at an arbitrary level between the highest and lowest levels of the analyzed clusters that contain one or more of those lowest-level clusters, and uses the common cluster number as an evaluation value; evaluation means that evaluates the degree of influence of the application as higher the larger the evaluation value; and a result output unit that outputs the evaluation as the evaluation result of the inspection target application. In this way, using the result of statistically classifying a plurality of applications into groups of similar applications based on transmission destination, the degree of influence of a single inspection target application can be calculated as higher the more that application spans different application groups.

In this application inspection system, the cluster extraction means preferably counts the number of lowest-level clusters extracted for the inspection target application as a lowest cluster number and adds it to the evaluation value. As a result, the degree of influence can be calculated as higher for an application that appears more often within the application groups to which it belongs.

In this application inspection system, the analyzed clusters are also preferably created by hierarchical clustering that additionally uses the similarity of the transmission contents of the communication packets. As a result, the inspection target application can be evaluated using a classification into groups of similar applications based not only on transmission destination but also on transmission content.

According to the present invention, it is possible to provide an application inspection system that evaluates even grayware so as to show the security risk, that is, what security impact the user may be exposed to when running the application on a terminal.

FIG. 1 is a diagram explaining the application inspection system according to the present invention.

FIG. 2 is a conceptual image of the application classification result according to the present invention.

FIG. 3 is a flowchart of the process that classifies a plurality of applications and generates the analyzed clusters.

FIG. 4 is a flowchart of the process that generates the analyzed clusters when a new application is added.

FIG. 5 is a flowchart of the process that inspects an inspection target application using the analyzed clusters.

An application inspection system according to the present invention is described below with reference to the drawings. FIG. 1 is a configuration diagram of the application inspection system according to the present invention. The application inspection system 1 shown in FIG. 1 comprises an application inspection server device 2, an external server 4, and a mobile terminal 5 connected via a network 3.

The application inspection server device 2 is a device that inspects applications to be inspected, and has a storage unit 21, a control unit 22, a communication unit 23, and a display operation unit 24. The application inspection server device 2 may be a personal computer or server that runs applications by emulating a mobile terminal on which applications are installed and executed.

The storage unit 21 is a storage device such as a ROM, a RAM, or a magnetic hard disk. It stores various programs and data, and has storage areas for the application program 211, unique information 212, communication log 213, operation log 214, cluster 215, analyzed cluster 216, and cluster classification result 217.

The application program 211 is an area that stores the plurality of applications to be inspected. For example, when N applications are inspected, the N applications, application 1 to application N, are stored in the application program 211.

The unique information 212 is an area that stores multiple items of terminal-specific unique information held on a mobile terminal. The unique information includes, for example, the terminal unique number assigned to identify the mobile terminal, various other terminal identification information, location information indicating the current position as latitude and longitude, the address book, mail information, and schedule information.

The communication log 213 is an area that stores the communication logs generated when the applications to be inspected are executed.

The operation log 214 is an area that stores the operation logs generated when the applications to be inspected are executed. The operation log includes a user operation log recording the inputs the user made in response to the application's input prompts.

The cluster 215 is an area that stores the clusters used in the clustering process for classification evaluation described later. Communication logs are taken from the communication log 213, assigned as clusters, and stored in the cluster 215.

The analyzed cluster 216 is an area that hierarchically stores the clusters that were taken out of the cluster 215 and analyzed during the clustering process for classification evaluation described later, together with the new clusters created by merging each such cluster with the cluster at the smallest inter-cluster distance from it.

The cluster classification result 217 is an area that stores, as the cluster classification result, the hierarchical structure of the clusters in the analyzed cluster 216 at the time the clustering process for classification evaluation is completed. FIG. 2 shows an example of the output of the cluster classification result 217. The terminal clusters on the outside are the clusters that were first assigned from the communication log and stored in the cluster 215. As a result of the clustering process, they are merged into a single cluster at the center.

The control unit 22 comprises a CPU, an MPU, or the like, and performs the overall control of the application inspection server device 2 in accordance with a program. It has communication log acquisition means 221, operation log acquisition means 222, cluster creation means 223, inter-cluster distance calculation means 224, cluster extraction means 225, evaluation means 226, and result output means 227.

While each application stored in the application program 211 is running, the communication log acquisition means 221 acquires a communication log of the transmissions and receptions made via the communication unit 23 whenever data is sent to or received from the external server 4, and stores it in the communication log 213.

The operation log acquisition means 222 stores, in the operation log 214, a log of the operations the user performed while each application stored in the application program 211 was running. In addition, when an application reads from the unique information 212 during execution, the operation log acquisition means 222 records in the operation log 214 which information was acquired and when.

When creating clusters, the cluster creation means 223 takes the communication logs of external transmissions from the communication log 213 and stores each as a cluster in the cluster 215 of the storage unit 21. The cluster creation means 223 also drives the clustering process: it applies hierarchical clustering to the clusters stored in the cluster 215 to classify the applications, stores each new cluster created during classification back in the cluster 215, and stores the clusters in the analyzed cluster 216. Specifically, the cluster creation means 223 takes one cluster out of the cluster 215 as the processing target and has the inter-cluster distance calculation means 224 calculate the inter-cluster distances between it and the other clusters stored in the cluster 215. It then merges the processing target cluster with the cluster nearest to it to create a new cluster, deletes the two pre-merge clusters from the cluster 215, stores the new cluster in the cluster 215, and stores the pre-merge clusters and the new cluster hierarchically in the analyzed cluster 216. When clustering ends, the contents of the analyzed cluster 216 at that point are stored in the cluster classification result 217 as the classification result for the plurality of applications.

At the request of the cluster creation means 223, the inter-cluster distance calculation means 224 calculates the inter-cluster distances between a specified cluster and the clusters stored in the cluster 215, and outputs the cluster with the smallest inter-cluster distance to the specified cluster.

From the classification result for the plurality of applications stored in the cluster classification result 217, the cluster extraction means 225 extracts the lowest-level clusters belonging to the specific application designated as the inspection target, and counts the number of extracted lowest-level clusters as the lowest cluster number. It also counts, as the common cluster number, the number of clusters at an arbitrary upper level that contain the extracted lowest-level clusters. The common cluster number is an evaluation value indicating whether the lowest-level clusters appear across different application groups at the designated upper level. The upper level used to obtain the common cluster number excludes the highest and lowest levels; it is designated as an arbitrary level between them. The highest level is excluded because there all clusters have been merged into one, which therefore contains every lowest-level cluster.

The evaluation means 226 evaluates the degree of influence of the application to be inspected. To do so, the evaluation means 226 takes the lowest cluster number output by the cluster extraction means 225 as the evaluation value and evaluates the degree of influence of the application as higher the larger the value. The evaluation means 226 likewise takes the common cluster number output by the cluster extraction means 225 as the evaluation value and evaluates the degree of influence as higher the larger the value. The evaluation means 226 may also use the sum of the lowest cluster number and the common cluster number as the evaluation value.
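The counting performed by the cluster extraction means 225 and the evaluation means 226 can be illustrated with a minimal Python sketch. The data layout is an assumption made for illustration only: each cluster at the chosen intermediate level is represented as the set of lowest-level cluster IDs it contains, and the function and key names are hypothetical.

```python
from typing import Dict, List, Set


def evaluate_application(level_clusters: List[Set[str]],
                         app_lowest: Set[str]) -> Dict[str, int]:
    """Count lowest-level and common clusters for one inspected application.

    level_clusters: clusters at the chosen intermediate level, each given as
        the set of lowest-level cluster IDs it contains (hypothetical layout).
    app_lowest: IDs of the lowest-level clusters assigned to the application.
    """
    lowest_count = len(app_lowest)
    # A cluster counts as "common" if it holds at least one of the app's
    # lowest-level clusters.
    common_count = sum(1 for cluster in level_clusters if cluster & app_lowest)
    return {
        "lowest": lowest_count,
        "common": common_count,
        "total": lowest_count + common_count,  # optional summed evaluation value
    }


# Three clusters at the chosen level; the application touches two of them.
level = [{"c1", "c2"}, {"c3"}, {"c4", "c5"}]
print(evaluate_application(level, {"c1", "c2", "c4"}))
# {'lowest': 3, 'common': 2, 'total': 5}
```

In this example the application's packets fall into three lowest-level clusters spread over two intermediate clusters, so an application whose traffic spans more groups would receive a larger value.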

The result output means 227 displays the degree of influence of the inspection target application output by the evaluation means 226 on the display operation unit 24 and, when a request is received from outside via the network 3, transmits it to the requester. Using the classification result for the plurality of applications stored in the cluster classification result 217, it may also output the classification result for a designated application as an appearance-frequency view, in which the lowest-level clusters extracted by the cluster extraction means 225 are overlaid on the cluster classification result 217 so that they can be identified.

The communication unit 23 is an interface for communicating with the external server 4 via the network 3, and is compatible with the network to which it connects.

The display operation unit 24 is an input/output interface such as a liquid crystal display with a touch panel; it displays screens and accepts confirmation operations while applications run. The display operation unit 24 may instead consist of a display-only device and an input device such as operation buttons.

The network 3 connects the application inspection server device 2 and the external server 4, and is a network such as a wide-area network like the Internet or a closed network. It may also be a general telephone network.

The external server 4 is a server that, via the network 3, either delivers the functions and services the application provides as its main purpose or collects information when the application transmits personal information on the terminal to the outside without permission.

The mobile terminal 5 is a device carried by the user, such as a smartphone, personal digital assistant, or mobile phone, on which applications are installed in light of the evaluation results.

The method of calculating the inter-cluster distance for communication logs is described next. The similarity of the HTTP packets contained in the communication logs is computed and used as the inter-cluster distance for clustering. The similarity of the HTTP packets is obtained from the HTTP packet destination similarity and the HTTP packet transmission-content similarity.

First, the HTTP packet destination similarity is described. It is calculated from the similarity of the destination IP addresses, the similarity of the port numbers, and the similarity of the Host header in the HTTP header fields (hereinafter simply "Host header"). The destination IP address similarity uses the longest match of the high-order bits of the IP addresses, the port number similarity uses whether the values are identical, and the Host header similarity uses the edit distance, a measure of the distance between strings.

For example, consider obtaining the HTTP packet destination similarity of two HTTP packets P_x and P_y. Here, the destination of an arbitrary HTTP packet P_n is represented as P_n = {ip_n; port_n; host_n}, using the destination IP address ip_n, the port number port_n, and the Host header host_n.

First, the destination IP address similarity is described. Assuming IPv4, an IP address lies in a 2^32 address space, and each successive group of 8 bits from the top delimits a range of IP addresses. Because IP addresses are allocated to users in fixed ranges, addresses within the same organization or network are likely to share their high-order bits. Therefore, for HTTP packets P_x and P_y, the IP address similarity is obtained as the number of leading bits on which the two IP addresses ip_x and ip_y agree (the longest match from the most significant bit).
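As an illustration of the longest-match bit count described above, the following Python sketch XORs the two 32-bit addresses and counts the leading bits that agree. The function name is hypothetical; only the standard-library `ipaddress` module is used.

```python
import ipaddress


def common_prefix_bits(ip_x: str, ip_y: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    x = int(ipaddress.IPv4Address(ip_x))
    y = int(ipaddress.IPv4Address(ip_y))
    # XOR is zero wherever the bits agree; the position of the highest
    # set bit marks the first disagreement.
    diff = x ^ y
    return 32 - diff.bit_length()


print(common_prefix_bits("192.168.1.10", "192.168.1.20"))  # 27
print(common_prefix_bits("10.0.0.1", "10.0.0.1"))          # 32
```

Two addresses in the same /24 share at least 24 leading bits, so a larger count suggests destinations in the same allocated range.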

Next, the port number similarity is described. Port numbers lie in a 2^16 space, and certain port numbers are in most cases registered to and used by specific services. Therefore, for HTTP packets P_x and P_y, the port number similarity is obtained by determining whether the two port numbers port_x and port_y match.

Next, the Host header similarity is described. The Host header is a string consisting of a host name and a domain name, that is, an FQDN. Using the edit distance as the distance between arbitrary strings, the Host header similarity for HTTP packets P_x and P_y is obtained as the edit distance between the two Host headers host_x and host_y.
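The edit distance between two Host header strings can be computed with the standard dynamic-programming recurrence. The sketch below assumes the Levenshtein variant (unit-cost insertion, deletion, and substitution); the source says only "edit distance", so this choice and the host names in the example are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, using two rolling rows."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


print(edit_distance("api.example.com", "ads.example.com"))  # 2
```

Hosts under the same domain differ in only a few characters, so their edit distance stays small relative to the string length.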

The HTTP packet destination similarity of the two HTTP packets P_x and P_y can then be obtained from the destination IP address similarity, port number similarity, and Host header similarity computed for them, for example by simple addition.
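One way to combine the three components is sketched below. The text states only that they may be simply added; because the matching-bit count grows with similarity while the edit distance shrinks with it, this sketch assumes each component is first converted to a distance in the range 0 to 1 before summing. That normalization, and the function and parameter names, are assumptions for illustration and are not taken from the source.

```python
def destination_distance(common_prefix_bits: int,
                         port_x: int, port_y: int,
                         host_edit_distance: int,
                         host_max_len: int) -> float:
    """Sum of three per-component distances, each scaled to [0, 1].

    The scaling is an assumption of this sketch; the source only says
    the component similarities may be simply added.
    """
    ip_term = 1.0 - common_prefix_bits / 32          # 0 when all 32 bits match
    port_term = 0.0 if port_x == port_y else 1.0     # match / no match
    host_term = host_edit_distance / host_max_len if host_max_len else 0.0
    return ip_term + port_term + host_term


# Same /16, same port, identical Host header of length 10.
print(destination_distance(16, 80, 80, 0, 10))  # 0.5
```

With this scaling, identical destinations give 0.0 and completely different ones give 3.0, so the result can be used directly as a distance for clustering.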

Next, the HTTP packet transmission-content similarity is described. It is calculated from the similarities of the Request-Line, Cookie, and Message-Body of the HTTP packet. The similarity of each element is obtained with the normalized compression distance (NCD), which is based on Kolmogorov complexity.

For example, consider obtaining the HTTP packet transmission-content similarity of two HTTP packets P_x and P_y. Here, the transmission content of an arbitrary HTTP packet P_n is represented as P_n = {rline_n; cookie_n; body_n}, where rline_n is the Request-Line, cookie_n the Cookie, and body_n the Message-Body contained in the HTTP packet.

The Request-Line, Cookie, and Message-Body are all strings whose values differ by application and service. Their similarity can therefore be obtained with the NCD, which uses Kolmogorov complexity to measure distance as an amount of information, independent of the context of the strings. The Request-Line similarity of HTTP packets P_x and P_y is obtained as the NCD of rline_x and rline_y. Likewise, the Cookie similarity is obtained as the NCD of cookie_x and cookie_y, and the Message-Body similarity as the NCD of body_x and body_y.

The HTTP packet transmission-content similarity of the two HTTP packets P_x and P_y can then be obtained from the Request-Line similarity, Cookie similarity, and Message-Body similarity computed for them, for example by simple addition.
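The NCD approximates Kolmogorov complexity with the output size of a real compressor. The sketch below uses the common formulation NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) with `zlib` as the compressor; the source does not name a compressor, so that choice, the dictionary keys, and the sample packets are assumptions for illustration.

```python
import zlib
from typing import Dict


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def content_distance(p_x: Dict[str, bytes], p_y: Dict[str, bytes]) -> float:
    """Simple sum of the NCDs of Request-Line, Cookie, and Message-Body."""
    return sum(ncd(p_x[k], p_y[k]) for k in ("rline", "cookie", "body"))


# Hypothetical sample packets.
p_a = {"rline": b"GET /api/v1/track?id=12345&lat=35.68&lon=139.76 HTTP/1.1",
       "cookie": b"session=abcdef0123456789", "body": b""}
p_b = {"rline": b"POST /login HTTP/1.1",
       "cookie": b"lang=ja; theme=dark", "body": b"user=alice&token=zyxwvu"}

# A packet is much closer to itself than to an unrelated packet.
print(content_distance(p_a, p_a) < content_distance(p_a, p_b))
```

Because short strings compress poorly, the NCD of very short fields is noisy; the values are best read as relative distances rather than absolute ones.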

Next, the clustering method is described. Clusters are formed per application or per HTTP packet. Since an application, when used as the unit, is a set of HTTP packets, hierarchical clustering of HTTP packets is performed. In the hierarchical clustering of HTTP packets, the inter-cluster distance is, as described above, the HTTP packet similarity distance based on the HTTP packet destination similarity and the HTTP packet transmission-content similarity; clusters that are close to each other are merged step by step, and the group average method is used for clustering.

With reference to FIG. 3, the processing flow by which the application inspection server device 2 performs classification evaluation on all of applications 1 to N stored in the application program 211 is described. The processing flow described below is controlled by a program executed by the control unit 22.

Processing starts when an instruction to begin classification evaluation of the plurality of applications is received, and all the applications to be inspected are executed (step S1). During this step, the communication log acquisition means 221 acquires each application's communication log and stores it in the communication log 213, and the operation log acquisition means 222 acquires each application's operation log and stores it in the operation log 214.

Next, the cluster creation means 223 extracts communication packets, such as HTTP packets, from the communication log 213 (step S2). The cluster creation means 223 then assigns a cluster to each extracted communication packet and stores the assigned clusters in the cluster 215 (step S3).

The cluster creation means 223 checks the number of clusters stored in the cluster 215 (step S4). If the number of clusters in the cluster 215 is 1 (step S4-Yes), the process proceeds to step S8. If it is not 1 (step S4-No), one arbitrary cluster is taken out of the cluster 215 as the processing target (step S5). Taking out this cluster reduces the number of clusters in the cluster 215 by one.

Next, the cluster creation means 223 calls the inter-cluster distance calculation means 224 to calculate the inter-cluster distances between the processing target cluster and the other clusters. The inter-cluster distance calculation means 224 calculates the inter-cluster distance between the processing target cluster and every other cluster stored in the cluster 215 (step S6). Once the distances are obtained, the cluster creation means 223 merges the processing target cluster and the cluster closest to it into one new cluster, stores this new cluster in both the cluster 215 and the analyzed cluster 216, and returns to step S4 (step S7). Because the cluster closest to the processing target is also taken out of the cluster 215, storing the merged cluster back into the cluster 215 leaves its cluster count unchanged from the count after step S5. The analyzed cluster 216 thus records hierarchically how the two clusters taken out of the cluster 215 (that is, the cluster taken out as the processing target in step S5 and the cluster taken out in step S7 as the one closest to it) were merged into one new cluster.

In step S8, the clusters stored in the analyzed cluster 216 are saved in the cluster classification result 217 as the classification result for the plurality of applications, and the process ends.

In this way, a plurality of applications can be newly classified by the similarity of their transmission destinations and transmission contents, based on their communication logs.
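The loop of steps S3 to S7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: clusters are modeled as tuples of packets, the list `pool` stands in for the cluster 215, the list `analyzed` stands in for the analyzed cluster 216, "one arbitrary cluster" is taken to be the first one in the pool, and the function names and toy numeric packets are hypothetical.

```python
from itertools import product


def group_average(a, b, dist):
    """Group average distance: mean of dist over all cross-cluster pairs."""
    return sum(dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def build_hierarchy(packets, dist):
    """Return the merge history as a list of (target, nearest, merged) tuples."""
    pool = [(p,) for p in packets]   # S3: one cluster per extracted packet
    analyzed = []                    # assumed stand-in for analyzed cluster 216
    while len(pool) > 1:             # S4: stop once a single cluster remains
        target = pool.pop(0)         # S5: take out one cluster (count drops by one)
        # S6: distance from the target to every cluster still in the pool
        nearest = min(pool, key=lambda c: group_average(target, c, dist))
        pool.remove(nearest)         # S7: the nearest cluster is taken out as well
        merged = target + nearest
        pool.append(merged)          # merged cluster goes back into the pool
        analyzed.append((target, nearest, merged))
    return analyzed


# Toy data: numbers instead of HTTP packets, absolute difference as distance.
history = build_hierarchy([0.0, 1.0, 10.0, 11.0], lambda x, y: abs(x - y))
```

Each entry of `history` records which two clusters were merged, so the hierarchy can be rebuilt by replaying the list from first to last.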

Next, with reference to FIG. 4, the processing flow for performing classification evaluation after adding a further application program to the plurality of applications stored in the application program 211 is described. The purpose of adding a new application is to calculate the distances between the clusters of the new application's communication log and the clusters already classified by the clustering of FIG. 3, and thereby obtain a new classification result. The clusters stored in the analyzed cluster 216 are therefore used as the classified clusters. The hierarchy level to which the new clusters are to be added is specified in advance before processing starts.

First, the new application is executed, and the communication log acquisition means 221 acquires its communication log and stores it in the communication log 213 (step S11). The cluster creation means 223 then extracts the HTTP packets from among the communication packets in the communication log 213 (step S12), assigns a cluster to each extracted HTTP packet, and stores the clusters in the cluster 215 (step S13).

The cluster creation means 223 checks the number of clusters stored in the cluster 215 (step S14). If the number is 0 (step S14-Yes), the process proceeds to step S18. If it is not 0 (step S14-No), one arbitrary cluster that is still unprocessed is taken out of the cluster 215 from among the new clusters added in step S13 (step S15). This reduces the number of clusters in the cluster 215 by one.

The inter-cluster distance calculation means 224 calculates the inter-cluster distance between the processing target cluster taken out in step S15 and every cluster, stored in the analyzed cluster 216, at the hierarchy level designated for the addition (step S16).

Next, the cluster creation means 223 merges the processing target cluster and the cluster closest to it into one new cluster, stores this new cluster in the analyzed cluster 216, and returns to step S14 (step S17). The analyzed cluster 216 thus records hierarchically where the cluster taken out in step S15 was added and merged.

In step S18, a new classification result is generated from the clusters of each level stored in the analyzed cluster 216 and newly saved in the cluster classification result 217, and the process ends. The result saved in the cluster classification result 217 is the classification result for the plurality of applications including the new application.

In this way, a classification result can be obtained even when a new application is added to the existing classified clusters. Because the processing flow of FIG. 4 clusters only by the distances to the clusters at the designated hierarchy level, its classification result is less accurate than re-clustering everything, including the added applications, with the processing flow of FIG. 3. It requires less processing, however, and is therefore effective when a classification result is needed quickly.
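The incremental flow of steps S13 to S17 can be sketched as below. The representation is an assumption for illustration: clusters at the designated hierarchy level are given as a non-empty list of tuples, the list `pending` stands in for the new clusters placed in the cluster 215, and `add_to_level` and the toy numeric packets are hypothetical names and data.

```python
from itertools import product


def group_average(a, b, dist):
    """Group average distance: mean of dist over all cross-cluster pairs."""
    return sum(dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def add_to_level(level_clusters, new_packets, dist):
    """Attach each new packet to the nearest cluster at the designated level.

    Returns the updated clusters and a list of (packet, cluster_index) placements.
    """
    clusters = [tuple(c) for c in level_clusters]
    placements = []
    pending = [(p,) for p in new_packets]    # S13: one cluster per new packet
    while pending:                           # S14: finish when none remain
        target = pending.pop(0)              # S15: take out an unprocessed cluster
        # S16: distance to every cluster at the designated level
        idx = min(
            range(len(clusters)),
            key=lambda i: group_average(target, clusters[i], dist),
        )
        clusters[idx] = clusters[idx] + target   # S17: merge into the nearest one
        placements.append((target[0], idx))
    return clusters, placements


updated, placed = add_to_level(
    [(0.0, 1.0), (10.0, 11.0)], [9.0, 2.0], lambda x, y: abs(x - y)
)
```

Only the distances to the clusters at one level are computed, which is why this path is cheaper than re-running the full clustering, at the cost of never revising the existing merges.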

Next, with reference to FIG. 5, the processing flow for extracting from the classification result only the classification of the application to be inspected and outputting the result is described. This processing is executed, for example, when individual applications are evaluated after the classification results for the plurality of applications have been obtained in FIG. 3 and FIG. 4, or when responding to an inquiry from the mobile terminal 5 for the individual evaluation result of an application that the user plans to use.

The lowest-level clusters of the hierarchical clustering can be distinguished by application. Therefore, when the cluster extraction means 225 extracts the lowest-level clusters of each application from the classification result stored in the cluster classification result 217, it can grasp where that application appears. Because the cluster classification result groups applications with similar transmission destinations and applications with similar transmission contents, the frequency with which the same application appears in different application groups can be obtained. This frequency is obtained for all applications and the applications are sorted by it. An application with a relatively high frequency, that is, one that belongs to many different application groups, can be regarded as more grayware-like and as having a greater degree of influence.

In FIG. 5, the inspection process starts with the application to be inspected and the hierarchy level to be used for the evaluation value specified. First, the cluster extraction means 225 initializes the lowest cluster number to 0 and reads the classification result for the plurality of applications from the cluster classification result 217 (step S21). Next, the cluster extraction means 225 selects one of the lowest-level clusters in the read classification result as the processing target (step S22).

It is then determined whether the lowest-level cluster selected as the processing target is a cluster of the application to be inspected (step S23). If it is (step S23-Yes), the processing target cluster is extracted, the lowest cluster number is incremented by one (step S24), and the process proceeds to step S25. If it is not (step S23-No), the process proceeds directly to step S25.

In step S25, it is determined whether all lowest-level clusters have been checked. If they have (step S25-Yes), the cluster extraction means 225 has completed extracting the lowest-level clusters of the application to be inspected and counting the lowest cluster number. It then extracts, from the clusters at the specified upper hierarchy level, those that contain one or more of the extracted lowest-level clusters, and counts them as the common cluster number (step S26). If not all lowest-level clusters have been checked (step S25-No), the process returns to step S22 and continues extracting the lowest-level clusters of the application to be inspected and counting the lowest cluster number.

Next, the evaluation means 226 evaluates the degree of influence using the lowest cluster number and the common cluster number output by the cluster extraction means 225 as evaluation values (step S27). The greater the lowest cluster number, the higher the degree of influence is rated; likewise, the greater the common cluster number, the higher the degree of influence is rated. The result output means 227 then outputs the degree of influence from the evaluation means 226 as the evaluation result of the application to be inspected (step S28), and the process ends.
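The counting in steps S21 to S26 can be sketched as follows. The data model is an assumption for illustration: each lowest-level cluster is a record holding the application it belongs to and the identifier of the cluster that contains it at the specified upper hierarchy level. The names `LowestCluster`, `upper_id`, and `evaluate` are hypothetical, and how the two counts are combined into a single degree of influence is left open because the description does not specify it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class LowestCluster:
    cluster_id: int
    app: str        # application whose packet this lowest-level cluster holds
    upper_id: int   # assumed: id of the containing cluster at the chosen level


def evaluate(clusters: Iterable[LowestCluster], target_app: str) -> Tuple[int, int]:
    """Return (lowest cluster number, common cluster number) for target_app."""
    lowest_count = 0                    # S21: initialize the counter to 0
    upper_ids = set()
    for cluster in clusters:            # S22 and S25: visit every lowest cluster
        if cluster.app == target_app:   # S23: does it belong to the target app?
            lowest_count += 1           # S24: count it
            upper_ids.add(cluster.upper_id)
    common_count = len(upper_ids)       # S26: distinct upper-level clusters hit
    return lowest_count, common_count


sample = [
    LowestCluster(1, "app_a", 10),
    LowestCluster(2, "app_a", 10),
    LowestCluster(3, "app_a", 20),
    LowestCluster(4, "app_b", 10),
]
result_a = evaluate(sample, "app_a")
result_b = evaluate(sample, "app_b")
```

In this toy data, `app_a` spreads over two upper-level clusters while `app_b` stays in one, so `app_a` would be rated as having the higher degree of influence under either count.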

When a grayware application acquires terminal information for purposes such as advertisement distribution or usage statistics, multiple applications on the mobile terminal may communicate with the same server. Applications can therefore be classified into application groups by their similarity to other applications, and the result can be used to judge how grayware-like an application is. By grasping the statistical tendencies obtained from a plurality of applications in this way, transmission intentions and uses that were previously unknown can be extracted and reflected back into the evaluation of an individual application.

The present invention is not limited to the above embodiment, and many changes and modifications are possible. For example, when HTTP packets are extracted from among the communication packets in the communication log 213 in steps S2 and S12, the operation log 214 acquired by the operation log acquisition means 222 may also be consulted, and information indicating whether each communication packet resulted from the user's intention may be attached to it. Using this attached information, HTTP packets that are clearly communication packets resulting from the user's intention may be left without a cluster assignment in steps S3 and S13 and excluded from classification evaluation. In addition, destinations whose purpose is clear and to which communication packets are sent in accordance with the user's intention even without a user operation may be stored in advance as a whitelist and excluded from the communication packets extracted in steps S2 and S12. This reduces the number of clusters to be processed, so that clustering can be performed efficiently, and it also reduces the lowest cluster number that makes up the evaluation value.
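The two exclusions described in this modification can be sketched as a simple filter applied before cluster assignment. The `Packet` record, its `user_initiated` flag (standing in for the information derived from the operation log 214), and the example host names are all hypothetical; the description does not define how the flag or the whitelist is represented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Packet:
    host: str              # transmission destination of the HTTP packet
    user_initiated: bool   # assumed flag derived from the operation log


def select_for_clustering(packets: Iterable[Packet], whitelist: Set[str]) -> List[Packet]:
    """Drop packets that clearly follow the user's intention or target whitelisted hosts."""
    return [
        p for p in packets
        if not p.user_initiated and p.host not in whitelist
    ]


packets = [
    Packet("ads.example.com", False),
    Packet("mail.example.com", True),      # sent because the user acted
    Packet("update.example.com", False),   # destination with a known purpose
]
kept = select_for_clustering(packets, {"update.example.com"})
```

Filtering before step S3 or S13 means fewer initial clusters, which shortens the distance calculations in every later iteration.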

Description of Symbols

1 ... Application inspection system
2 ... Application inspection server apparatus
21 ... Storage unit
211 ... Application program
212 ... Specific information
213 ... Communication log
214 ... Operation log
215 ... Cluster
216 ... Analyzed cluster
217 ... Cluster classification result
22 ... Control unit
221 ... Communication log acquisition means
222 ... Operation log acquisition means
223 ... Cluster creation means
224 ... Inter-cluster distance calculation means
225 ... Cluster extraction means
226 ... Evaluation means
227 ... Result output means
23 ... Communication unit
24 ... Display operation unit
3 ... Network
4 ... External server
5 ... Mobile terminal

Claims

1. An application inspection system comprising:
a storage unit that stores analyzed clusters created by hierarchical clustering, based on transmission destination similarity, of communication packets extracted from communication logs obtained by executing a plurality of applications;
a cluster extraction unit that extracts the lowest-level clusters in the analyzed clusters to which the communication packets extracted from the communication log obtained by executing an inspection target application are assigned, counts, as the number of common clusters, the number of clusters at an arbitrary hierarchy level between the highest and lowest levels of the analyzed clusters that contain one or more of the extracted lowest-level clusters, and uses the number of common clusters as an evaluation value;
an evaluation means that evaluates the degree of influence of the inspection target application as higher the greater the evaluation value is; and
a result output unit that outputs the evaluation result as the evaluation result of the inspection target application.

2. The application inspection system according to claim 1, wherein the cluster extraction unit counts the number of the lowest clusters extracted for the inspection target application as the lowest cluster number, and adds the counted number to the evaluation value.

3. The application inspection system according to claim 1, wherein the analyzed cluster is created by further performing hierarchical clustering using a similarity of transmission contents of communication packets.