JP2021129203A

JP2021129203A - Communication analysis device, communication analysis program, and communication analysis method

Info

Publication number: JP2021129203A
Application number: JP2020022363A
Authority: JP
Inventors: 信之中村; Nobuyuki Nakamura; 信吾阿多; Shingo Ata; 圭脩兎本; Keishu Umoto
Original assignee: Oki Electric Industry Co Ltd; University Public Corporation Osaka
Current assignee: Oki Electric Industry Co Ltd; University Public Corporation Osaka
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2021-09-02

Abstract

To analyze communication abnormalities efficiently and in detail. SOLUTION: A communication analysis device includes: encoding processing means that acquires flow data by dividing the traffic data to be analyzed into flows, determines the protocol used for the communication, and generates a flow code string by converting each signal of the communication indicated in the flow data into a code; label assigning means that assigns, to the flow code string or a code string based on the flow code string, the label of the similar item among multiple pieces of teacher data; and output means that performs a predetermined output corresponding to the label for the labeled flow code string or code string based on the flow code string. SELECTED DRAWING: Figure 1

Description

The present invention relates to a communication analysis device, a communication analysis program, and a communication analysis method, and can be applied, for example, to processing that analyzes an abnormality and the type of that abnormality from communication content.

In general, identifying a communication failure has often relied on the experience of engineers, who analyze detailed information such as packet dumps of the communication using model-dependent knowledge such as vendor extensions and variations in protocol implementation. Meanwhile, in recent years there has been technological progress in using machine learning such as AI to match events against past occurrences by pattern recognition, but machine learning is known to depend heavily on how comprehensively the data covers previously occurring events.

In a protocol that includes procedures arising from vendor-dependent implementation variations, a typical normal sequence can be learned by machine learning. However, messages are inserted as needed when a session is maintained or confirmed, or when an error or warning is notified, and these are difficult to learn in advance. In addition, when an error occurs in actual communication, sequences of various unexpected patterns arise, so it has been difficult to collect data on all normal and abnormal patterns for machine learning.

For example, in a TCP (Transmission Control Protocol) sequence in which data is transferred from a first terminal to a second terminal, after the TCP session is successfully established, "PSH+ACK" messages from the first terminal and "ACK" messages from the second terminal may repeat according to the amount of data transferred, and there is no rule on the number of repetitions. TCP also has exceptional extension procedures in normal communication, such as Selective ACK (selective acknowledgment), so normal sequences themselves vary widely. Abnormal sequences vary as well: an RST may occur suddenly because of an error such as packet loss, or a connection may not be closed with FIN even though the data transfer finished normally.

The same holds for other protocols. For example, SIP (Session Initiation Protocol), used for VoIP call control, has a typical sequence just as TCP does: after the start message INVITE is received, 100 Trying, 180 Ringing, and 200 OK follow, and an ACK is returned in response. Here too, depending on the vendor, variations such as PRACK messages inserted at irregular intervals can cause a normal sequence to be judged abnormal.

As prior art addressing such problems, there is, for example, the technique described in Patent Document 1.

Patent Document 1 describes a method of determining, based on packet header information, whether a communication is an attack (scan) or normal operation. It mainly uses discrimination based on thresholds on statistical values, but it also discloses a distinctive method of determining network distance using the Levenshtein distance between IP addresses.

Patent Document 1: JP-A-2017-76841

Suppose the method described in Patent Document 1 is used to identify which type of error (the error cause) has occurred when an error (abnormality) arises in a communication flow. That method counts packets based on changes in a counter value over a fixed time and uses secondary information to determine whether the communication is normal. It can therefore distinguish a true error from an error that can be ignored, but it cannot be used to identify the error type and take countermeasures.

With the prior art, it is possible to detect that an error (abnormality) has occurred despite the problems above. In real environments, however, processing rarely completes with a typical normal sequence, and a wide variety of communication sequence patterns exists, both normal and abnormal. Consequently, even with the prior art, it remains difficult to investigate the cause by pattern learning such as machine learning when, for example, a connection fails.

In view of the above problems, a communication analysis device, a communication analysis program, and a communication analysis method capable of analyzing communication abnormalities efficiently and in detail are desired.

The communication analysis device of the first aspect of the present invention comprises: (1) encoding processing means that divides the traffic data to be analyzed into flows to acquire flow data, and generates a flow code string by an encoding process that converts each communication signal indicated in the flow data into a code according to the protocol used; (2) label assigning means that assigns, to the flow code string or a code string based on the flow code string, a label corresponding to the similar item among a plurality of pieces of teacher data; and (3) output means that performs a predetermined output according to the label for the labeled flow code string or code string based on the flow code string.

The communication analysis program of the second aspect of the present invention causes a computer to function as: (1) encoding processing means that divides the traffic data to be analyzed into flows to acquire flow data, and generates a flow code string by an encoding process that converts each communication signal indicated in the flow data into a code according to the protocol used; (2) label assigning means that assigns, to the flow code string or a code string based on the flow code string, a label corresponding to the similar item among a plurality of pieces of teacher data; and (3) output means that performs a predetermined output according to the label for the labeled flow code string or code string based on the flow code string.

The third aspect of the present invention is a communication analysis method performed by a communication analysis device, wherein: (1) the device includes encoding processing means, label assigning means, and output means; (2) the encoding processing means divides the traffic data to be analyzed into flows to acquire flow data, and generates a flow code string by an encoding process that converts each communication signal indicated in the flow data into a code according to the protocol used; (3) the label assigning means assigns, to the flow code string or a code string based on the flow code string, a label corresponding to the similar item among a plurality of pieces of teacher data; and (4) the output means performs a predetermined output according to the label for the labeled flow code string or code string based on the flow code string.

According to the present invention, it is possible to provide a communication analysis device, a communication analysis program, and a communication analysis method that analyze communication abnormalities efficiently and in detail.

A block diagram showing the connection configuration of the devices according to the first embodiment.
A block diagram showing an example of the connection configuration around the error cause analysis device according to the first embodiment.
A diagram showing an example of replacement character definition information for TCP held by the error cause analysis device according to the first embodiment.
A diagram showing an example of a sequence contained in a TCP flow to be analyzed by the error cause analysis device according to the first embodiment.
A diagram showing the process by which the error cause analysis device according to the first embodiment converts a TCP flow into a character string.
A diagram showing an example of replacement character definition information for SIP held by the error cause analysis device according to the first embodiment.
A diagram showing an example of a sequence contained in a SIP flow to be analyzed by the error cause analysis device according to the first embodiment.
A diagram showing the process by which the error cause analysis device according to the first embodiment converts a SIP flow into a character string.
A diagram showing an example of processing for detecting repeating character strings in the error cause analysis device according to the first embodiment.
A table showing an example of the edit distances between multiple teacher character strings (correct-answer data) and multiple input deletion-processed flow character strings in the error cause analysis device according to the first embodiment.
A block diagram showing an example of the connection configuration around the error cause analysis device according to the second embodiment.

(A) First Embodiment
A first embodiment of the communication analysis device, communication analysis program, and communication analysis method according to the present invention is described in detail below with reference to the drawings. In this embodiment, the communication analysis device, communication analysis program, and communication analysis method of the present invention are applied to an error cause analysis device and an error cause analysis program.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the connection configuration of the devices related to the first embodiment. Reference numerals in parentheses in FIG. 1 are those used in the second embodiment described later.

FIG. 2 is an explanatory diagram showing an example of the connection configuration of the error cause analysis device 10 according to the first embodiment.

The error cause analysis device 10 monitors the state of communication on the network N and performs processing that includes detecting errors (abnormalities) and analyzing the cause of each detected error (hereinafter, "communication analysis processing").

In the example of the first embodiment, at least a PC terminal 20 and a server 30 are connected to the network N analyzed by the error cause analysis device 10.

The PC terminal 20 is, for example, a terminal used by a user (subscriber) who connects to the network N.

The server 30 is a device that communicates with the PC terminal 20 to provide a predetermined service. Its configuration is not limited; for example, it may be a server (computer) that provides various services (for example, a Web service or a database service) to the PC terminal 20.

For simplicity, this embodiment describes an example in which the error cause analysis device 10 analyzes communication between the PC terminal 20 and the server 30. However, the configuration of the network N and attributes such as the number and types (functions) of the terminals and servers connected to it are not limited.

Next, the internal configuration of the error cause analysis device 10 is described with reference to FIG. 1.

The error cause analysis device 10 has a receiving unit 11, a protocol analysis / character string processing unit 12, a repeating character string deletion unit 13, a label assignment processing unit 14, a cause output unit 15, a teacher data holding unit 16, and a repeating character string holding unit 17.

The error cause analysis device 10 may be realized, for example, by installing a program (including the communication analysis program according to the embodiment) on a computer having a processor and memory. Part or all of its processing may instead be realized in hardware (for example, a dedicated semiconductor chip).

The receiving unit 11 acquires information on each flow of traffic on the network N (hereinafter, "flow data"). In this embodiment, the acquired flow data includes at least flow data on traffic between the PC terminal 20 and the server 30. How the receiving unit 11 acquires flow data is not limited, and various configurations can be applied. For example, flow data may be collected online in real time from a gateway device (not shown; for example, a network device such as a router or proxy) placed on the path between the PC terminal 20 and the server 30, or flow data captured by that gateway device may be acquired offline (for example, by reading data recorded in storage).

The protocol analysis / character string processing unit 12 classifies the flow data received by the receiving unit 11 by flow and analyzes the protocol of each flow. It then performs a process (hereinafter, "character string conversion process") that converts the communication sequence contained in each flow (the time-ordered series of signals, that is, communication procedures) into a character string corresponding to the protocol of that flow (hereinafter, "flow character string"). In other words, it replaces each signal (procedure) in the communication sequence with a corresponding character, turning the whole sequence into a character string. The details of this processing are described later. The character system (code system) used in the character string conversion process is not limited; for example, characters expressible in ASCII or Unicode may be used. In this specification, "character" includes every character expressible in a character code (for example, ASCII), including symbols and digits.

Based on the data registered in the repeating character string holding unit 17, the repeating character string deletion unit 13 performs a process (hereinafter, "repeating character string deletion process") that deletes character strings occurring repeatedly (redundantly or cyclically; hereinafter, "repeating character strings") from the flow character string produced by the protocol analysis / character string processing unit 12. The repeating character string deletion unit 13 may also supply newly found deletable character strings to the repeating character string holding unit 17 for registration. A flow character string that has undergone this deletion process is hereinafter called a "deletion-processed flow character string". The details of this processing are described later.
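The source defers the details of the repeating character string deletion process to a later passage. As one possible reading only, the sketch below collapses each consecutive run of a registered pattern to a single occurrence. That choice, the function name `collapse_repeats`, and the pattern `"PA"` (standing for a repeated data and acknowledgment exchange under a hypothetical one-character encoding) are assumptions made for illustration, not the procedure of the embodiment.

```python
import re

def collapse_repeats(flow_string: str, patterns: list[str]) -> str:
    """Collapse each consecutive run of a registered pattern to one occurrence.

    `patterns` plays the role of the list held by the repeating character
    string holding unit (hypothetical representation).
    """
    # Try longer patterns first so a short pattern does not break up a long one.
    for pat in sorted(patterns, key=len, reverse=True):
        run = re.compile("(?:" + re.escape(pat) + "){2,}")
        # A function replacement avoids backslash processing in the pattern text.
        flow_string = run.sub(lambda _match: pat, flow_string)
    return flow_string
```

With this reading, a flow string containing three consecutive "PA" exchanges is reduced to one, so flows that differ only in how much data was transferred map to the same string.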

The repeating character string holding unit 17 holds the list of repeating character strings used in the repeating character string deletion process and provides it to the repeating character string deletion unit 13. It may manage the repeating character strings classified by protocol.

For each deletion-processed flow character string, the label assignment processing unit 14 calculates a parameter indicating its degree of similarity to the teacher character strings held by the teacher data holding unit 16 (hereinafter, "teacher character strings"). Here, the label assignment processing unit 14 calculates the edit distance as that parameter (hereinafter, "edit distance calculation process"). The edit distance is the minimum number of single-character insertions, deletions, or substitutions needed to transform one character string into the other; the technique is also used, for example, in plagiarism detection. In this embodiment, the Levenshtein distance is used as the edit distance.
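The edit distance described above can be sketched with the standard dynamic-programming formulation of the Levenshtein distance. The function name `levenshtein` and the sample strings are illustrative assumptions; the source does not specify an implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, two flow strings that differ only in their final character, such as "STAPAF" and "STAPAR", are at distance 1.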

The label assignment processing unit 14 then selects a plausible teacher character string whose edit distance is small and assigns the label associated with that teacher character string to the deletion-processed flow character string (hereinafter, "label assignment process"). Here, a label indicates, for the protocol being analyzed, the content of the communication represented by a character string (a deletion-processed flow character string or a teacher character string), for example whether the communication is normal or abnormal and, if abnormal, its cause.

For example, the label assignment processing unit 14 may select the teacher character string with the smallest edit distance from the deletion-processed flow character string. It may also select multiple teacher character strings for one deletion-processed flow character string and assign all of their labels to it. For example, when several teacher character strings tie for the smallest edit distance, it may select the labels of all of them. Alternatively, it may select every teacher character string whose edit distance is at or below a threshold. The details of this processing are described later.
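The selection rules above (nearest teacher string, all tied nearest strings, or all strings within a threshold) might be sketched as follows. The function `assign_labels`, the teacher strings, and the label texts are hypothetical; the source gives no concrete teacher data in this passage.

```python
from typing import Optional

def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def assign_labels(flow: str,
                  teachers: list[tuple[str, str]],
                  threshold: Optional[int] = None) -> list[str]:
    """Return the labels of the teacher strings selected for `flow`.

    Without a threshold, every teacher string tied for the smallest edit
    distance is selected; with a threshold, every teacher string whose
    distance is at or below it is selected.
    """
    scored = [(levenshtein(flow, text), label) for text, label in teachers]
    if not scored:
        return []
    limit = min(d for d, _ in scored) if threshold is None else threshold
    return [label for d, label in scored if d <= limit]

# Hypothetical teacher data: (teacher character string, label).
TEACHERS = [("STAPAF", "normal"), ("STAR", "reset after handshake"),
            ("SSS", "no response to SYN")]
```

Calling `assign_labels("STAPAR", TEACHERS)` selects only the nearest teacher string, while passing `threshold=2` also admits the second-nearest one.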

The teacher data holding unit 16 holds data (hereinafter, "teacher data") in which each teacher character string supplied to the label assignment processing unit 14 is associated with a label (the label corresponding to the correct answer). The teacher data may consist, for example, of deletion-processed flow character strings obtained in advance from flow data or the like, together with the correct answers assigned to them by a person (for example, an operator or designer). The teacher data holding unit 16 may hold multiple pieces of teacher data classified by protocol.

The cause output unit 15 refers to the label assigned to each deletion-processed flow character string and performs output processing according to that label. For example, it may output each deletion-processed flow character string together with its assigned label. It may also attach information such as the date and time (timestamp) at which the corresponding flow occurred, the source and/or destination host of the flow (for example, a terminal name or IP address), and the protocol of the flow. The details of this processing are described later.

(A-2) Operation of the First Embodiment
Next, the operation of the error cause analysis device 10 of the first embodiment configured as above is described.

On receiving flow data, the receiving unit 11 supplies it to the protocol analysis / character string processing unit 12.

When flow data is supplied, the protocol analysis / character string processing unit 12 separates (classifies) it by flow and determines the protocol of each flow. Various separation (classification) methods can be applied to split the traffic data into flows. For example, the unit may split flows using the so-called 5-tuple (the combination of source address and port, destination address and port, and protocol number) as the key.
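Splitting traffic by 5-tuple can be sketched as below. The `Packet` record and its field names are hypothetical. The sketch also sorts the two endpoints so that both directions of a conversation fall into the same flow; that normalization is an assumption, made because the sequences discussed in this document contain messages from both terminals.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    # Hypothetical packet record; only the 5-tuple and a message name are kept.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: int   # IP protocol number (6 = TCP, 17 = UDP)
    message: str

def flow_key(p: Packet) -> tuple:
    """Direction-independent 5-tuple key (endpoint sorting is an assumption)."""
    first, second = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (first, second, p.proto)

def split_flows(packets: list[Packet]) -> dict[tuple, list[Packet]]:
    """Group packets by flow key, preserving arrival order within each flow."""
    flows: dict[tuple, list[Packet]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    return dict(flows)
```

Each value of the returned dictionary is then the time-ordered packet list of one flow, ready for protocol determination and character string conversion.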

The protocol analysis / character string processing unit 12 then determines the protocol of the flow data of each classified flow. The method is not limited, and various methods can be applied. For example, the protocol of each flow may be determined from the protocol number, port number, and the like of the packets that make up the flow.
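A minimal sketch of determining the protocol from the protocol number and port number follows. The lookup tables contain only a few well-known assignments (IP protocol numbers 6 and 17, and port 5060 for SIP); the function name `guess_protocol` and the rule that a known port takes precedence are assumptions for illustration.

```python
IP_PROTO_NAMES = {6: "TCP", 17: "UDP"}   # IANA-assigned IP protocol numbers
WELL_KNOWN_PORTS = {5060: "SIP"}         # IANA-registered SIP port

def guess_protocol(proto: int, src_port: int, dst_port: int) -> str:
    """Guess the protocol to analyze for a flow.

    A known application port on either side takes precedence over the
    transport protocol number (an assumption of this sketch).
    """
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return IP_PROTO_NAMES.get(proto, "unknown")
```

A port-based guess is only a heuristic; a deployment that runs services on non-standard ports would need additional inspection.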

The protocol analysis / character string processing unit 12 then analyzes what each signal (message) in the sequence of each flow (the communication sequence contained in the flow data) means within the protocol, and performs the character string conversion process, replacing each signal with a character according to the protocol determined for that flow.

In a TCP packet, the control bits of the TCP header indicate the meaning (processing or function) of the packet. The control bits comprise flags (bits) for "URG", indicating that the urgent pointer field is valid; "ACK", indicating that the acknowledgment number field is valid; "PSH", requesting that buffered data be delivered (pushed); "RST", requesting a reset of the connection; "SYN", requesting synchronization of sequence numbers; and "FIN", indicating the end of transmission. The combination of these control bits indicates the meaning (processing content or function) of the packet. Below, each TCP packet is referred to by the name (message) corresponding to the flags set in it. For example, a TCP packet with the SYN flag set (set to 1) is called "SYN".
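Naming a TCP packet after its control bits can be sketched as follows. The bit masks are the standard positions in the TCP header flags field; the function name `flags_to_name`, the order in which flag names are joined (chosen so that the results read "SYN+ACK" and "PSH+ACK" as in this document), and the "NONE" fallback are assumptions.

```python
# (flag name, bit mask in the TCP header flags field), listed in the order
# used when joining names. The order is an assumption of this sketch.
TCP_FLAG_BITS = [
    ("SYN", 0x02),
    ("FIN", 0x01),
    ("RST", 0x04),
    ("PSH", 0x08),
    ("URG", 0x20),
    ("ACK", 0x10),
]

def flags_to_name(flags: int) -> str:
    """Return a message name such as "SYN" or "SYN+ACK" for a flags value."""
    names = [name for name, bit in TCP_FLAG_BITS if flags & bit]
    return "+".join(names) if names else "NONE"
```

For instance, a flags value of 0x12 has both the SYN and ACK bits set and is named "SYN+ACK".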

In other words, when analyzing a flow (sequence) of the TCP protocol, the protocol analysis / character string processing unit 12 maps each packet to information indicating whether it is a session start (SYN), a response (ACK), or a data transmission (PSH). Further, as the character string conversion process, the protocol analysis / character string processing unit 12 defines one replacement character for each meaning (defined by a dictionary) and converts the communication (sequence) that occurs in time series into a character string. That is, as the character string conversion process, the protocol analysis / character string processing unit 12 converts the communication flow into a sequence of meaningful characters.

次に、例として、あるフローについて判別したプロトコルがＴＣＰであった場合における文字列化処理の具体例について図３〜図５を用いて説明する。 Next, as an example, a specific example of the character string conversion process when the protocol determined for a certain flow is TCP will be described with reference to FIGS. 3 to 5.

Here, it is assumed that the protocol analysis / character string processing unit 12 holds, for each protocol subject to the character string conversion process, information (hereinafter referred to as "replacement character definition information") that defines the character (hereinafter also simply referred to as the "replacement character") with which each name (type) of original signal (message) is replaced.

FIG. 3 is a diagram showing an example of replacement character definition information for TCP.

As shown in FIG. 3, the replacement character definition information defines (describes) a replacement character for each name (type) of original signal (message). Specifically, in the TCP replacement character definition information shown in FIG. 3, the following replacement characters are assumed to be defined: "S" for "SYN", "T" for "SYN+ACK", "A" for "ACK", "P" for "PSH+ACK", "F" for "FIN", "R" for "RST", "U" for "URG", and so on.

FIG. 4 is a sequence diagram showing a series of packets transmitted and received in a TCP sequence between the PC terminal 20 and the server 30. As shown in FIG. 4, in this flow, TCP packets P101 to P105, ..., P201 to P204 flow between the PC terminal 20 and the server 30. As shown in FIG. 4, the TCP packets P101 to P105 represent the messages "SYN", "SYN+ACK", "ACK", "PSH+ACK", and "ACK", respectively. The TCP packets P201 to P204 represent the messages "FIN", "ACK", "FIN", and "ACK", respectively.

FIG. 5 is a diagram showing the process of converting the flow shown in the sequence of FIG. 4 into a character string based on the replacement character definition information shown in FIG. 3.

図５（ａ）は、図４のシーケンスにおける各ＴＣＰパケットのメッセージを時系列順に並べたリストである。そして、図５（ｂ）は、図５（ａ）に示すメッセージのリストを文字列化処理したフロー文字列を示している。 FIG. 5A is a list in which the messages of each TCP packet in the sequence of FIG. 4 are arranged in chronological order. Then, FIG. 5B shows a flow character string obtained by converting the list of messages shown in FIG. 5A into a character string.

図５（ｂ）に示すように、ＴＣＰパケットＰ１０１〜Ｐ１０５、・・・Ｐ２０１〜Ｐ２０４を文字列化すると、「ＳＴＡＰＡ・・・ＦＡＦＡ」となる。 As shown in FIG. 5B, when the TCP packets P101 to P105, ... P201 to P204 are converted into character strings, it becomes "STAPA ... FAFA".
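The conversion from FIG. 5(a) to FIG. 5(b) amounts to a dictionary lookup per message. The following sketch uses the replacement characters listed for FIG. 3; the function name and the "?" placeholder for messages missing from the table are assumptions, since the specification does not say how undefined messages are handled.

```python
# Replacement character definition information for TCP, as listed for FIG. 3.
TCP_REPLACEMENTS = {
    "SYN": "S",
    "SYN+ACK": "T",
    "ACK": "A",
    "PSH+ACK": "P",
    "FIN": "F",
    "RST": "R",
    "URG": "U",
}


def encode_flow(messages, table, unknown="?"):
    """Replace each message name, in time order, with its replacement character."""
    return "".join(table.get(message, unknown) for message in messages)


opening = ["SYN", "SYN+ACK", "ACK", "PSH+ACK", "ACK"]  # P101 to P105
closing = ["FIN", "ACK", "FIN", "ACK"]                 # P201 to P204
print(encode_flow(opening, TCP_REPLACEMENTS))  # STAPA
print(encode_flow(closing, TCP_REPLACEMENTS))  # FAFA
```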

次に、例として判別したプロトコルがＳＩＰであった場合における文字列化処理の具体例について図６〜図８を用いて説明する。 Next, a specific example of the character string conversion process when the protocol determined as an example is SIP will be described with reference to FIGS. 6 to 8.

通常、ＳＩＰのメッセージが設定されるパケットには、ＵＤＰ（ＵｓｅｒＤａｔａｇｒａｍＰｒｏｔｏｃｏｌ）の所定のポート番号（標準では５０６０）が設定されている。例えば、プロトコル分析・文字列化処理部１２では、予めネットワークＮ上で用いられるＳＩＰ用のポート番号が登録されていれば、ＳＩＰのフローを判別することができる。また、プロトコル分析・文字列化処理部１２は、ＳＩＰのメッセージのヘッダ部分を参照してメッセージの種類（名称）を認識するようにしてもよい。 Normally, a predetermined port number (standard: 5060) of UDP (User Datagram Protocol) is set in the packet in which the SIP message is set. For example, in the protocol analysis / character string processing unit 12, if the port number for SIP used on the network N is registered in advance, the SIP flow can be determined. Further, the protocol analysis / character string processing unit 12 may recognize the message type (name) by referring to the header portion of the SIP message.

FIG. 6 is a diagram showing an example of replacement character definition information for SIP.

As shown in FIG. 6, the replacement character definition information defines (describes) a replacement character for each type (name) of original message. In the SIP replacement character definition information shown in FIG. 6, the following replacement characters are defined: "I" for "INVITE", "B" for "BYE", "A" for "ACK", "P" for "PRACK", "C" for "CANCEL", "J" for "INFO", "M" for "MESSAGE", "E" for "REGISTER", "T" for "100 Trying", "R" for "180 Ringing", "O" for "200 OK", and so on.

図７は、ＰＣ端末２０とサーバ３０との間で送受信される一連のＳＩＰメッセージ（ＳＩＰのメッセージが設定されたパケット）の流れについて示したシーケンス図である。図７に示すように、当該シーケンスでは、ＰＣ端末２０とサーバ３０との間で、ＳＩＰメッセージのパケットＰ３０１〜Ｐ３０５、・・・Ｐ４０１、Ｐ４０２が流れている。図７に示すように、Ｐ３０１〜Ｐ３０５のパケット（ＳＩＰメッセージ）は、それぞれ「ＩＮＶＩＴＥ」、「１００Ｔｒｙｉｎｇ」、「１８０Ｒｉｎｇｉｎｇ」、「２００ＯＫ」、「ＡＣＫ」のメッセージを示している。また、図７に示すように、Ｐ４０１、Ｐ４０２のパケット（ＳＩＰメッセージ）は、それぞれ「ＢＹＥ」、「２００ＯＫ」のメッセージを示している。 FIG. 7 is a sequence diagram showing a flow of a series of SIP messages (packets in which SIP messages are set) transmitted and received between the PC terminal 20 and the server 30. As shown in FIG. 7, in the sequence, SIP message packets P301 to P305, ... P401, P402 are flowing between the PC terminal 20 and the server 30. As shown in FIG. 7, the packets (SIP messages) of P301 to P305 indicate the messages of "INVITE", "100 Trying", "180 Ringing", "200 OK", and "ACK", respectively. Further, as shown in FIG. 7, the packets (SIP messages) of P401 and P402 indicate the messages of "BYE" and "200 OK", respectively.

FIG. 8 is a diagram showing the process of converting the flow shown in the sequence of FIG. 7 into a character string based on the replacement character definition information shown in FIG. 6.

図８（ａ）は、図７のシーケンスにおける各ＳＩＰメッセージを時系列順に並べたリストである。そして、図８（ｂ）は、図８（ａ）に示すＳＩＰメッセージのリストを文字列化処理したフロー文字列を示している。 FIG. 8A is a list in which each SIP message in the sequence of FIG. 7 is arranged in chronological order. Then, FIG. 8B shows a flow character string obtained by converting the list of SIP messages shown in FIG. 8A into a character string.

図８（ｂ）に示すように、パケットＰ３０１〜Ｐ３０５、・・・Ｐ４０１、Ｐ４０２を文字列化すると、「ＩＴＲＯＡ・・・ＢＯ」となる。 As shown in FIG. 8B, when the packets P301 to P305, ... P401, and P402 are converted into character strings, they become "ITROA ... BO".
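For SIP, the message type can be read from the first line of each message before the same kind of lookup is applied. The sketch below assumes SIP/2.0 start lines (a request line "METHOD Request-URI SIP/2.0" or a status line "SIP/2.0 code reason"); the function names, the example URIs, and the "?" placeholder are hypothetical.

```python
# Replacement character definition information for SIP, as listed for FIG. 6.
SIP_REPLACEMENTS = {
    "INVITE": "I", "BYE": "B", "ACK": "A", "PRACK": "P", "CANCEL": "C",
    "INFO": "J", "MESSAGE": "M", "REGISTER": "E",
    "100 Trying": "T", "180 Ringing": "R", "200 OK": "O",
}


def sip_message_name(start_line: str) -> str:
    """Return the method of a request or "<code> <reason>" of a response."""
    line = start_line.strip()
    if line.startswith("SIP/2.0 "):
        return line[len("SIP/2.0 "):]  # status line: keep "200 OK" etc.
    return line.split(" ", 1)[0]       # request line: keep the method


def encode_sip(start_lines, unknown="?"):
    """Convert a time-ordered list of SIP start lines into a flow character string."""
    return "".join(
        SIP_REPLACEMENTS.get(sip_message_name(line), unknown) for line in start_lines
    )


setup = [
    "INVITE sip:bob@example.com SIP/2.0",
    "SIP/2.0 100 Trying",
    "SIP/2.0 180 Ringing",
    "SIP/2.0 200 OK",
    "ACK sip:bob@example.com SIP/2.0",
]
teardown = ["BYE sip:bob@example.com SIP/2.0", "SIP/2.0 200 OK"]
print(encode_sip(setup))     # ITROA
print(encode_sip(teardown))  # BO
```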

As described above, the protocol analysis / character string processing unit 12 generates, for each separated flow, a flow character string by applying the character string conversion process to the communication sequence.

次に、繰返し文字列削除部１３は、フロー文字列から冗長な部分（繰返し文字列）を削除する処理（繰返し文字列削除処理）を行う。 Next, the repeating character string deletion unit 13 performs a process (repeated character string deletion process) of deleting a redundant portion (repeated character string) from the flow character string.

For example, when a TCP sequence is converted into a character string, the result may contain redundant character strings that depend on factors such as the amount of content (data) transmitted, for instance PSH+ACK (data transmission) and ACK (data reception response) repeated an arbitrary number of times according to the amount of data to be communicated. Likewise, when a SIP sequence is converted into a character string, PRACK (provisional response acknowledgment) may be inserted as appropriate, for reasons such as stabilizing the communication flow, and the characters generated by this PRACK may be redundant. The repeating character string deletion unit 13 deletes such redundant character strings from the flow character string as "repeating character strings".

上述の通り、繰返し文字列削除部１３で繰返し文字列削除処理の際に参照される繰返し文字列は、繰返し文字列保持部１７に保持（登録）されている。繰返し文字列保持部１７に保持させる繰返し文字列は、例えば、予め人間（例えば、オペレータや設計者）が作成して設定したものでもよい。しかしながら、繰返し文字列のバリエーションは、増える可能性があるため、繰返し文字列削除部１３が、供給されるフロー文字列から新規に出現した繰返し文字を検出して繰返し文字列保持部１７に登録するようにしてもよい。 As described above, the repeated character string referred to in the repeated character string deletion process in the repeated character string deletion unit 13 is held (registered) in the repeated character string holding unit 17. The repeating character string to be held by the repeating character string holding unit 17 may be, for example, one created and set in advance by a human being (for example, an operator or a designer). However, since the variation of the repeating character string may increase, the repeating character string deleting unit 13 detects a newly appearing repeating character from the supplied flow character string and registers it in the repeating character string holding unit 17. You may do so.

The method by which the repeating character string deletion unit 13 detects a new repeating character string in the supplied flow character string is not limited, and various algorithms for detecting a cycle (repetition) appearing in an arbitrary sequence can be used. For example, the repeating character string deletion unit 13 may use Floyd's cycle detection method.

FIG. 9 is a diagram showing an example in which the repeating character string deletion unit 13 performs the repeating character string detection process and the repeating character string deletion process on input flow character strings.

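FIG. 9 shows an example of performing the repeating character string detection process and the repeating character string deletion process on each of four flow character strings: "ITRiAIiAaIiAaIiAaIiAaIiAaIiAaIiAiABb", "ITSiAIiAIiAIiABb", "ITRiAITiAITiABb", and "ITRiAJjJjJjJjBb".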

Here, it is assumed that the repeating character string deletion unit 13 uses Floyd's cycle detection method to simply detect every character string (cycle) that appears multiple times (twice or more). Then, as shown in FIG. 9, the repeating character string deletion unit 13 detects the repeating character strings "iAI", "iAI", "iAIT", and "Jj" for "ITRiAIiAaIiAaIiAaIiAaIiAaIiAaIiAiABb", "ITSiAIiAIiAIiABb", "ITRiAITiAITiABb", and "ITRiAJjJjJjJjBb", respectively. When detecting repeating character strings, the repeating character string deletion unit 13 may apply restrictions such as detecting only character strings (cycles) that appear N times or more (N is an integer of 2 or more), detecting only character strings (cycles) of M characters or more (M is an integer of 1 or more), or detecting only character strings (cycles) of J characters or fewer. A plurality of restrictions may be set at the same time. Further, instead of deleting every occurrence of a repeating character string, the repeating character string deletion unit 13 may leave one occurrence (deleting only the redundant part). The repeating character string deletion unit 13 is not limited to Floyd's cycle detection method.

As shown in FIG. 9, when the flow character string "ITRiAIiAaIiAaIiAaIiAaIiAaIiAaIiAiABb" is input, the repeating character string deletion unit 13 deletes every occurrence of the repeating character string "iAI" from the flow character string and outputs the deleted flow character string "ITRiABb". Similarly, for "ITSiAIiAIiAIiABb", "ITRiAITiAITiABb", and "ITRiAJjJjJjJjBb", the repeating character string deletion unit 13 performs the repeating character string deletion process using the repeating character strings "iAI", "iAIT", and "Jj", respectively, and outputs the deleted flow character strings "ITSiABb", "ITRiABb", and "ITRiABb".
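The second to fourth examples of FIG. 9 can be reproduced with a short sketch. Note that the detection step here uses a regular-expression back-reference to find an immediately repeated substring; this is a substitute chosen for brevity, not Floyd's cycle detection method, and the function names and the `min_len` parameter (playing the role of the restriction "M characters or more") are assumptions.

```python
import re


def detect_repeat(flow: str, min_len: int = 2):
    """Return the first substring of at least min_len characters that is
    immediately followed by a copy of itself, or None if there is none."""
    match = re.search(r"(.{%d,}?)\1+" % min_len, flow)
    return match.group(1) if match else None


def delete_repeats(flow: str, repeats) -> str:
    """Delete every occurrence of each registered repeating character string."""
    for repeat in repeats:
        flow = flow.replace(repeat, "")
    return flow


for flow in ["ITSiAIiAIiAIiABb", "ITRiAITiAITiABb", "ITRiAJjJjJjJjBb"]:
    repeat = detect_repeat(flow)
    print(flow, repeat, delete_repeats(flow, [repeat]))
# ITSiAIiAIiAIiABb iAI ITSiABb
# ITRiAITiAITiABb iAIT ITRiABb
# ITRiAJjJjJjJjBb Jj ITRiABb
```

In a fuller implementation the detected strings would be registered in the repeating character string holding unit 17 and reused for later flows, rather than detected and applied per flow as in this loop.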

以上のように、繰返し文字列削除部１３は、各フロー文字列について繰返し文字列削除処理を行う。 As described above, the repetitive character string deletion unit 13 performs the repetitive character string deletion process for each flow character string.

Next, the labeling processing unit 14 calculates the editing distance between the deleted flow character string and each teacher character string (teacher data) held by the teacher data holding unit 16, selects a plausible teacher character string whose editing distance is small, and assigns the label of that teacher character string to the deleted flow character string.

As described above, in the example of this embodiment, the labeling processing unit 14 is described as using the Levenshtein distance as the editing distance between the deleted flow character string and the teacher character string (teacher data).
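For reference, the Levenshtein distance between strings $a$ (length $m$) and $b$ (length $n$) is commonly defined by the following recurrence, in which insertions, deletions, and substitutions each cost 1. The symbols $d_{i,j}$, $a_i$, and $b_j$ are notation introduced here, not taken from the specification.

```latex
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + 1 & \text{(delete } a_i\text{)} \\
d_{i,j-1} + 1 & \text{(insert } b_j\text{)} \\
d_{i-1,j-1} + [a_i \neq b_j] & \text{(match or substitute)}
\end{cases}
\qquad \operatorname{lev}(a, b) = d_{m,n}
```

Here $[a_i \neq b_j]$ is 1 when the two characters differ and 0 otherwise.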

FIG. 10 is a table showing the editing distances between a plurality of teacher character strings (correct answer data) and a plurality of input deleted flow character strings (sample data).

In FIG. 10, the teacher character strings are arranged in the column direction and identified by label ("pseudo network label" in the figure), and the deleted flow character strings are arranged in the row direction and identified by their actual type ("real network label" in the figure).

In FIG. 10, "normal (early dialog)", "normal", "Busy error", "CANCEL", and "Message, 200 OK" in SIP are shown as the real network labels (deleted flow character strings). The pseudo network labels (teacher character strings) shown are "0", "1", "2", "3", "4", "5_1", "5_2", "6", "7", "8_1", "8_2", and "8_3". Of the pseudo network labels in FIG. 10, only "0" indicates a normal sequence, and all others indicate an abnormality (error). Labels with a subscript, such as "5_1" and "5_2", indicate the same cause with a different teacher character string pattern for each subscript. The label notation is not limited to this form, and any form in which each label is unique may be used. Further, in FIG. 10, to simplify the explanation, there is only one normal pseudo network label, "0", but a plurality may of course be set (for example, "0_1", "0_2", and so on).

Surveying the distances in the table of FIG. 10 between each real network label that requires determination and each pseudo network label, an editing distance of less than 4 can be presumed to indicate a relatively close match. Accordingly, the labeling processing unit 14 may be configured to assign to each deleted flow character string (real network label) every pseudo network label whose editing distance is at or below a predetermined threshold Th, as labels that possibly apply. In the example of FIG. 10, as described above, Th = 4 can be presumed to be appropriate. The threshold Th may, for example, be set in advance in the repeating character string deletion unit 13.

For example, reading the rows (real network labels) of the table in FIG. 10 from the top, for the input "normal (early dialog)" in the first row, the smallest editing distance is 2 and belongs to the normal label (0), and all other pseudo network labels have an editing distance greater than 4. For the input "normal" in the second row, the smallest editing distance is 4 and belongs to the normal label (0), and all other pseudo network labels have an editing distance greater than 4. For the third input, "Busy error", many small distances of 2 and 3 occur. In such a state, several candidates have been found among the errors (abnormal pseudo network labels) with small distances, any of which the input may be close to, and it is appropriate to assign every label that possibly applies. Next, for the fourth input, "CANCEL", only one entry has the smallest distance, 2, so the input can be identified as the error (abnormality) of pseudo network label 6.

Finally, for the fifth input from the top, "Message, 200 OK", no teacher character string with a small editing distance (for example, at or below the threshold Th = 4) is found, so the input can be determined not to be close to any of the pseudo network labels currently held as teacher data. In this case, the labeling processing unit 14 supplies the input (the fifth input, "Message, 200 OK") to the cause output unit 15 without assigning the label of any teacher data. The cause output unit 15 can then output that the input is an unknown error because it does not correspond to any teacher data. Further, as a similar process, for the editing of the fourth distance, which may differ greatly in meaning within the deleted flow character string, processing that increases the editing distance is possible.

以上のように、ラベル付与処理部１４では、実網ラベル（削除処理済フロー文字列）と各疑似網ラベル（教師用文字列）との編集距離の算出結果に応じて、実網ラベル（削除処理済フロー文字列）に疑似網ラベル（教師用文字列のラベル）を付与して、原因出力部１５に供給する。 As described above, in the label assigning processing unit 14, the real network label (deleted) according to the calculation result of the editing distance between the real network label (deleted flow character string) and each pseudo network label (teacher character string). A pseudo-net label (label of a teacher's character string) is attached to the processed flow character string), and the label is supplied to the cause output unit 15.
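The threshold-based labeling described above might look like the following sketch. The teacher character strings and labels in `TEACHER` are invented for illustration and are not the contents of FIG. 10; the function names are likewise hypothetical. The sketch adopts the reading "at or below Th" for the threshold comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def assign_labels(flow: str, teacher: dict, th: int = 4) -> list:
    """Return every teacher label whose string is within distance th of flow.
    An empty list means no teacher data is close: an unknown error."""
    return sorted(label for label, text in teacher.items()
                  if levenshtein(flow, text) <= th)


# Hypothetical teacher data: label -> teacher character string.
TEACHER = {"0": "ITROABO", "1": "ITRCO"}

print(assign_labels("ITROBO", TEACHER))        # ['0', '1'] (several candidates)
print(assign_labels("ITROBO", TEACHER, th=1))  # ['0']
print(assign_labels("MOMOMOMOMO", TEACHER))    # [] (unknown error)
```

Returning all labels within the threshold, rather than only the nearest one, mirrors the "Busy error" case in which several candidates are reported.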

そして、原因出力部１５は、供給された削除処理済フロー文字列のラベルを参照し、そのラベルの内容に応じた出力処理を行う。例えば、異常（エラー）のラベルが付与された削除処理済フロー文字列が発生した場合、当該異常（エラー）を示す情報（例えば、ラベルに対応するコードや詳細を示す文字列）と、当該削除処理済フロー文字列に対応するフローを特定する情報（例えば、当該フローの送信元と送信先のＩＰアドレスやプロトコル名等）を出力（例えば、オペレータや通信障害対応者に対して情報提供するための出力）するようにしてもよい。原因出力部１５が出力する際の形式については限定されないものであり、例えば、通信により出力（例えば、電子メール送信、所定のプロトコルに基づく電文送信や、チャットアプリケーションを用いたメッセージ送信等）するようにしてもよいし、図示しない表示装置に表示出力したり表音装置（スピーカ）から表音出力するようにしてもよい。 Then, the cause output unit 15 refers to the label of the supplied deleted flow character string, and performs output processing according to the content of the label. For example, when a deleted flow character string with an error label is generated, information indicating the error (error) (for example, a character string indicating the code or details corresponding to the label) and the deletion are performed. Output information that identifies the flow corresponding to the processed flow character string (for example, the IP address and protocol name of the source and destination of the flow) (for example, to provide information to the operator and the person who responds to the communication failure). (Output of) may be performed. The format when the cause output unit 15 outputs is not limited, and for example, it is output by communication (for example, e-mail transmission, telegram transmission based on a predetermined protocol, message transmission using a chat application, etc.). Alternatively, the display may be output to a display device (not shown) or the sound may be output from a sound device (speaker).

(A-3) Effects of the First Embodiment
According to the first embodiment, the following effects can be obtained.

The error cause analysis device 10 of the first embodiment converts flow data into character strings, assigns the label of a similar teacher character string, and outputs the result, so that when a communication abnormality (error) occurs, information identifying its cause can be output automatically. Conventionally, because each communication service uses different protocols and the variations of errors are innumerable depending on vendors and combinations, communication failures were analyzed manually by skilled personnel. The first embodiment has the effect of automating that analysis.

Further, the error cause analysis device 10 of the first embodiment can assign only the labels (errors) of teacher data whose editing distance is at or below the threshold. As a result, the error cause analysis device 10 can output error candidates even when similar errors exist, and when an unknown error occurs (when the distance to every teacher data item is equal to or greater than the threshold), it can output the error as unknown (output a result indicating that no similar teacher data exists). Human effort can therefore be spent only on the parts that really require human analysis. That is, the error cause analysis device 10 of the first embodiment can be used as an efficient analysis support tool.

Furthermore, the error cause analysis device 10 of the first embodiment performs its evaluation on flow character strings from which redundant repeating character strings have been deleted, which reduces the number of teacher data variations needed to improve the accuracy of error cause analysis. In other words, because redundant information is deleted from the flow character string before evaluation, completeness can be ensured with less teacher data.

(B) Second Embodiment
Hereinafter, a second embodiment of the communication analysis device, the communication analysis program, and the communication analysis method according to the present invention will be described in detail with reference to the drawings. This embodiment describes an example in which the communication analysis device, the communication analysis program, and the communication analysis method of the present invention are applied to an error cause analysis device and an error cause analysis program.

(B-1) Configuration and Operation of the Second Embodiment
The connection configuration of the devices related to the second embodiment can also be shown using FIG. 1. In FIG. 1, the reference numerals in parentheses are those used in the second embodiment.

図１に示すように第２の実施形態では、エラー原因解析装置１０がエラー原因解析装置１０Ａに置き換わっている点で第１の実施形態と異なっている。 As shown in FIG. 1, the second embodiment is different from the first embodiment in that the error cause analysis device 10 is replaced with the error cause analysis device 10A.

また、第２の実施形態のエラー原因解析装置１０Ａでは、ラベル付与処理部１４と教師データ保持部１６が、ラベル付与処理部１４Ａに置き換わっている点で第１の実施形態と異なっている。 Further, the error cause analysis device 10A of the second embodiment is different from the first embodiment in that the labeling processing unit 14 and the teacher data holding unit 16 are replaced with the labeling processing unit 14A.

第１の実施形態のラベル付与処理部１４では、削除処理済フロー文字列との編集距離に応じた教師データ（教師用文字列）を選択し、選択した教師データのラベルを付与していた。これに対して、第２の実施形態のラベル付与処理部１４Ａでは、予め機械学習により学習した学習モデルを適用した識別器１４１を用いて、削除処理済フロー文字列に類似する教師データ（教師用文字列）を選択し、選択した教師データのラベルを付与する処理を行う。 The label assignment processing unit 14 of the first embodiment selects teacher data (teacher character string) according to the editing distance from the deleted flow character string, and assigns a label to the selected teacher data. On the other hand, in the labeling processing unit 14A of the second embodiment, teacher data (for teachers) similar to the deleted flow character string is used by using the classifier 141 to which the learning model learned in advance by machine learning is applied. (Character string) is selected, and the process of assigning the label of the selected teacher data is performed.

識別器１４１は、削除処理済フロー文字列を入力すると対応するラベル（疑似網ラベル）を識別して出力する手段である。識別器１４１は、予め設定された学習モデル（学習済みのパターン認識モデル）に基づき、入力された削除処理済フロー文字列が、どの教師用文字列に一番類似するかを自動的に判断する処理を行う。 The classifier 141 is a means for identifying and outputting the corresponding label (pseudo-net label) when the deleted flow character string is input. The classifier 141 automatically determines which teacher character string the input deleted flow character string is most similar to, based on a preset learning model (learned pattern recognition model). Perform processing.

The classifier 141 is, for example, an identification means to which a learning model is applied, the learning model having been obtained in advance by a learner (not shown) through machine learning on teacher data that includes pairs of a teacher character string and a label (the pseudo network label serving as correct answer data). As the teacher data applied to this learner, for example, the teacher data held in the teacher data holding unit 16 of the first embodiment can be used. The labeling processing unit 14A may further include a learner that generates, by machine learning, the learning model applied to the classifier 141. Various AI systems can be applied as the method of generating the learning model and as the method of performing identification based on the learning model in the labeling processing unit 14A, so a detailed description is omitted.
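The specification leaves the learning method open. Purely as one possible illustration, the sketch below trains a small naive Bayes classifier on character bigrams of teacher character strings using only the Python standard library. The class name, the bigram features, the smoothing, and the training data are all assumptions and are not taken from the specification.

```python
import math
from collections import Counter, defaultdict


def bigrams(flow: str):
    """Character bigrams, with ^ and $ marking the start and end of the flow."""
    padded = "^" + flow + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramNaiveBayes:
    def fit(self, samples):
        """samples: iterable of (teacher character string, label) pairs."""
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.class_totals = Counter()       # label -> number of training strings
        self.vocab = set()
        for flow, label in samples:
            grams = bigrams(flow)
            self.counts[label].update(grams)
            self.class_totals[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, flow: str) -> str:
        """Return the single most probable label (add-one smoothing)."""
        n = sum(self.class_totals.values())
        best_label, best_score = None, -math.inf
        for label, seen in self.class_totals.items():
            total = sum(self.counts[label].values())
            score = math.log(seen / n)
            for gram in bigrams(flow):
                score += math.log((self.counts[label][gram] + 1)
                                  / (total + len(self.vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical teacher data: (teacher character string, pseudo network label).
model = BigramNaiveBayes().fit([("ITROABO", "0"), ("ICO", "6")])
print(model.predict("ITROABO"))  # 0
print(model.predict("ICO"))      # 6
```

Unlike the threshold-based approach of the first embodiment, a classifier of this kind always returns one of the known labels, even for an input that resembles none of the teacher data.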

(B-2) Effects of the Second Embodiment
According to the second embodiment, the following effects can be obtained in comparison with the first embodiment.

Unlike the first embodiment, the error cause analysis device 10A of the second embodiment always assigns exactly one of the known labels (teacher data). The cause of the error is therefore explicit, which has the effect that countermeasures can be taken without the involvement of skilled personnel.

(C) Other Embodiments
The present invention is not limited to the above embodiments, and modified embodiments such as those exemplified below are also possible.

(C-1) In the first embodiment, the labeling processing unit 14 calculates the Levenshtein distance. Because the editing distance calculated as the Levenshtein distance is unweighted, there is a problem that the editing distance to a character string with the opposite meaning can be short. For example, suppose that "S" denotes the start of communication, "E" the end of communication, "O" the success of processing, and "X" the failure of processing. The editing distance between "SOE" and "SXE" is then 2, which places the two strings very close together, yet their meanings differ greatly.

Therefore, when calculating the editing distance to a teacher character string, the labeling processing unit 14 may apply a larger weight to edits that change the meaning of the character string (edits in which the characters before and after the edit match a predetermined pattern), so that the finally calculated editing distance becomes larger.

For example, edit patterns that are to be weighted more heavily may be registered in the labeling processing unit 14 in advance. For a TCP flow character string in the format shown in FIGS. 3 to 5 above, the unit may be configured so that an edit changing A (ACK), which indicates the normal case, into R (RST) or U (URG) incurs an edit distance of about five times the usual weight. Thus, even when two TCP flow character strings in that format differ only in the final character, as with "STAPA...FAFA" and "STAPA...FAFU", the change from "A" to "U" may be made to incur a heavily weighted edit distance equivalent to five characters' worth of other edits (five times the usual value).
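
One possible reading of the pattern-based weighting above can be sketched in Python. The sketch assumes that ordinary insertions, deletions, and substitutions each cost 1; it first finds a minimal edit script under those unit costs and then sums one weight per edit, counting registered substitutions five times. The table `HEAVY_SUBSTITUTIONS`, the function name, the two-stage approach, and the short strings in the usage note are illustrative assumptions, not details taken from the embodiment.

```python
# Hypothetical pattern table: (character before edit, character after edit) -> weight.
HEAVY_SUBSTITUTIONS = {("A", "R"): 5.0, ("A", "U"): 5.0}


def weighted_edit_distance(source, target, heavy=HEAVY_SUBSTITUTIONS):
    """Sum per-edit weights along one minimal unit-cost edit script."""
    m, n = len(source), len(target)
    # Standard Levenshtein table with unit costs.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)

    # Backtrace, preferring match/substitution, and add up the weights.
    total, i, j = 0.0, m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s, t = source[i - 1], target[j - 1]
            sub = 0 if s == t else 1
            if d[i][j] == d[i - 1][j - 1] + sub:
                if sub:
                    # Registered patterns weigh more; other substitutions weigh 1.
                    total += heavy.get((s, t), 1.0)
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            total += 1.0  # deletion of a source character
            i -= 1
        else:
            total += 1.0  # insertion of a target character
            j -= 1
    return total
```

Under these assumptions, `weighted_edit_distance("FAFA", "FAFU")` evaluates to 5.0 although only the final character differs, while `weighted_edit_distance("FAFA", "FAFP")` stays at 1.0. The weights are applied after alignment because folding a weight of 5 directly into the dynamic program would let the minimization replace the heavy substitution with one deletion plus one insertion at a total cost of 2.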

In addition, a communication sequence generally proceeds as "start → process 1 → process 2 → ... → end", so the routine processing at the start and end plays a different role from the variation in the middle. The same holds for a flow character string: the first and last characters carry a different significance from the characters in the middle. The labeling processing unit 14 may therefore be configured so that, when calculating the edit distance, edits to the characters at both ends of the flow character string (the first and last characters) are weighted more heavily than edits to characters in the middle (for example, a setting in which the weight varies along a quadratic curve from the head of the flow character string).
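
The position-dependent weighting can be sketched in the same spirit. The parabola below, which gives weight `edge_weight` at both ends and 1 at the midpoint, is only one assumed shape for the quadratic curve; the function names and the default `edge_weight=3.0` are likewise hypothetical. Deletions and substitutions are charged at the weight of the source position, and insertions at the weight of the target position.

```python
def position_weight(index, length, edge_weight=3.0):
    """Quadratic weight: edge_weight at both ends, 1.0 at the midpoint."""
    if length <= 1:
        return edge_weight
    x = 2.0 * index / (length - 1) - 1.0  # -1 at the head, 0 in the middle, +1 at the tail
    return 1.0 + (edge_weight - 1.0) * x * x


def position_weighted_distance(source, target, edge_weight=3.0):
    """Edit distance in which each edit costs the weight of the position it touches."""
    m, n = len(source), len(target)
    # First row: insert every target character.
    prev = [0.0]
    for j in range(1, n + 1):
        prev.append(prev[-1] + position_weight(j - 1, n, edge_weight))
    for i in range(1, m + 1):
        w_src = position_weight(i - 1, m, edge_weight)
        curr = [prev[0] + w_src]  # first column: delete every source character
        for j in range(1, n + 1):
            w_tgt = position_weight(j - 1, n, edge_weight)
            sub = 0.0 if source[i - 1] == target[j - 1] else w_src
            curr.append(min(prev[j] + w_src,      # delete source[i - 1]
                            curr[j - 1] + w_tgt,  # insert target[j - 1]
                            prev[j - 1] + sub))   # match or substitute
        prev = curr
    return prev[n]
```

With these assumed settings, changing the first character of a five-character string costs 3.0, whereas changing its middle character costs 1.0, so a mismatch in the opening or closing routine processing moves a flow further from a teacher string than a mismatch in the middle.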

As described above, the labeling processing unit 14 may vary the weight of the edit distance according to the pattern of the characters before and after an edit and the position of the edited character.

(C-2) Each of the above embodiments describes an example in which the protocol analysis / character string conversion processing unit 12 replaces a flow (sequence) with a string of characters such as letters of the alphabet. The conversion is not limited to character strings, however, and may instead produce codes that include numerical values, symbols, and the like (hereinafter collectively called a "code string"). Choosing codes that are easy to process may also make it possible to accommodate protocols with many variations.
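
As a minimal illustration of why the choice of code is free, the unweighted edit distance can be written over any sequence of comparable codes rather than over characters only. The integer codes below are hypothetical placeholders and do not correspond to any signal assignment in the embodiments.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost Levenshtein distance over arbitrary code sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical numeric codes: one integer per signal type.
flow_codes = [101, 102, 200, 200, 301]
teacher_codes = [101, 102, 200, 301]
distance = edit_distance(flow_codes, teacher_codes)
```

Because the function only compares elements for equality, the same routine accepts strings, lists of integers, or tuples of symbols, which is convenient when a protocol has more signal types than a single alphabet can label.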

10 ... error cause analysis device, 11 ... receiving unit, 12 ... protocol analysis / character string conversion processing unit, 13 ... repeated character string deletion unit, 14 ... labeling processing unit, 15 ... cause output unit, 16 ... teacher data holding unit, 17 ... repeated character string holding unit.

Claims

A communication analysis device comprising:
coding processing means for dividing data of traffic to be analyzed into individual flows to acquire flow data, and for generating a flow code string by performing coding processing that converts each communication signal indicated in the flow data into a code according to the protocol used;
labeling means for assigning, to the flow code string or a code string based on the flow code string, a label corresponding to a similar one among a plurality of pieces of teacher data; and
output means for performing a predetermined output according to the label for the flow code string, or the code string based on the flow code string, to which the label has been assigned.

The communication analysis device according to claim 1, further comprising repeating code string deleting means for deleting, from the flow code string generated by the coding processing, a repeating code string that is repeated a plurality of times, thereby generating a repeatedly deleted code string,
wherein the labeling means assigns, to the repeatedly deleted code string, a label corresponding to a similar one among the plurality of pieces of teacher data.

The communication analysis device according to claim 2, wherein the labeling means selects, from the plurality of pieces of teacher data, teacher data similar to the repeatedly deleted code string and assigns the label corresponding to the selected teacher data.

The communication analysis device according to claim 3, wherein the labeling means selects teacher data at a smaller edit distance from the repeatedly deleted code string as teacher data having a higher degree of similarity to that code string.

The communication analysis device according to claim 4, wherein the labeling means varies the weight of the edit distance according to the pattern of the codes before and after an edit.

The communication analysis device according to claim 4, wherein the labeling means varies the weight of the edit distance according to the position of the code being edited.

The communication analysis device according to claim 3, wherein the labeling means uses a classifier, to which a learning model machine-learned in advance on the basis of the teacher data is applied, to select teacher data having a closer edit distance as teacher data having a high degree of similarity to the repeatedly deleted code string.

A communication analysis program that causes a computer to function as:
coding processing means for dividing data of traffic to be analyzed into individual flows to acquire flow data, and for generating a flow code string by performing coding processing that converts each communication signal indicated in the flow data into a code according to the protocol used;
labeling means for assigning, to the flow code string or a code string based on the flow code string, a label corresponding to a similar one among a plurality of pieces of teacher data; and
output means for performing a predetermined output according to the label for the flow code string, or the code string based on the flow code string, to which the label has been assigned.

A communication analysis method performed by a communication analysis device that comprises coding processing means, labeling means, and output means, wherein:
the coding processing means divides data of traffic to be analyzed into individual flows to acquire flow data, and generates a flow code string by performing coding processing that converts each communication signal indicated in the flow data into a code according to the protocol used;
the labeling means assigns, to the flow code string or a code string based on the flow code string, a label corresponding to a similar one among a plurality of pieces of teacher data; and
the output means performs a predetermined output according to the label for the flow code string, or the code string based on the flow code string, to which the label has been assigned.