JP6310405B2

JP6310405B2 - Service impact cause estimation apparatus, service impact cause estimation program, and service impact cause estimation method

Info

Publication number: JP6310405B2
Application number: JP2015022516A
Authority: JP
Inventors: 愛子尾居; 浩行大西; 高明森谷; 大己遠藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-02-06
Filing date: 2015-02-06
Publication date: 2018-04-11
Anticipated expiration: 2035-02-06
Also published as: JP2016146555A

Description

The present invention relates to a service impact cause estimation apparatus, a service impact cause estimation program, and a service impact cause estimation method.

Techniques are known for estimating the location of a failure or quality degradation in a communication network when a service is provided over that network.

Patent Document 1 discloses a technique in which, triggered by a user's report of trouble on the communication network, an equipment model is generated that consists of the multiple service components (physical equipment) placed between the user terminal and the information source (corresponding to a server). A communication sequence for the case where all service components are normal and a communication sequence for the case where each service component has failed are then generated, and these sequences are compared with the information observed at the time of the failure report to estimate the failed service component (details are given later as the "first comparative example").

A technique is also known in which packets exchanged between devices on the communication network are captured and analyzed to calculate various characteristic values that affect the user's quality of experience (reorder width, traffic volume, RTT (Round-Trip Time), packet loss, jitter, session establishment rate, window size, and server response time). The normality of each characteristic value is judged from the calculated value, and the location causing the degradation of communication quality between the devices is estimated (details are given later as the "second comparative example").

Furthermore, Non-Patent Document 1 discloses a technique that uses SFC (Service Function Chaining), a flexible routing control technology that enables selective use of virtualized communication network functions. A test packet is sent on each flow, the IDs of the forwarding function units (corresponding to physical/virtual routers or physical/virtual switches) and of the applications that the test packet passes through are stored in a list carried by the test packet, and the list of stored IDs is compared with information set in advance to estimate the failure location (details are given later as the "third comparative example").

Patent Document 1: Japanese Patent Laid-Open No. 10-200527

Non-Patent Document 1: Y. Jiang et al., "Fault Management in Service Function Chaining", [online], October 27, 2014, The Internet Engineering Task Force, [retrieved January 27, 2015], Internet <URL: https://datatracker.ietf.org/doc/draft-jxc-sfc-fm/?include_text=1>

However, the techniques of the first to third comparative examples have the following drawbacks: an abnormality cannot be detected when the failure or degradation occurs in software such as virtualized equipment or application software (Patent Document 1); failure diagnosis cannot be performed for equipment to which an ID cannot be assigned (Non-Patent Document 1); the work becomes complicated; the apparatus becomes large in scale; and the cause location cannot be estimated when service quality has degraded.
An object of the present invention is therefore to provide a service impact cause estimation apparatus, a service impact cause estimation program, and a service impact cause estimation method that can also detect software failures and degradation, are simple to operate, require only a relatively small-scale apparatus, and can also estimate the location causing a degradation in service quality.

The present invention is a service impact cause estimation apparatus comprising: a storage unit that, for each flow configured using at least one of physical equipment and software through which data is passed on a communication network, stores in association with one another a flow ID identifying the flow, a physical equipment ID identifying the physical equipment, a server ID identifying a server that is the physical equipment, and a software ID identifying the software used on each server; a model generation unit that refers to the storage unit and generates, for each flow, a flow model, which is a model specifying the IDs of the physical equipment and software through which the data flows and the order in which the data flows; and an estimation unit that compares the flow models with one another and, from the comparison result, estimates the physical equipment or software causing a service impact on the communication network.

The estimation unit comprises: a list generation unit that generates a list capable of storing the software IDs; a test packet generation unit that generates a test packet carrying the list; a packet transmission unit that transmits a predetermined number of the test packets for each flow within a predetermined time; a packet reception unit that receives a reply packet to the test packet, in which the IDs of the software the test packet passed through are stored in the list; and a reply storage unit that stores the received reply packets. The estimation unit further comprises a group configuration unit that classifies each flow into a normal group or an abnormal group based on measurement results for the reception of the reply packets stored in the reply storage unit.

The group configuration unit classifies into the normal group each flow for which the response time and the reply count obtained as the measurement results are each within their respective predetermined ranges and for which, when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, all software IDs match; it classifies all other flows into the abnormal group. Within the abnormal group, it classifies into a failure group each flow for which the measured response time and the count are each within their respective predetermined ranges, or for which the comparison of the software IDs acquired from the storage unit with the software IDs stored in the list of the reply packet shows that at least one software ID does not match; it classifies the remaining flows into a performance degradation group.

The estimation unit also comprises: a first cause identification unit that compares the flow models of the flows within the performance degradation group or within the failure group, extracts the physical equipment or software common to them, and estimates the extracted physical equipment or software as the cause of the service impact; and a second cause identification unit that compares the flow models of the flows between the performance degradation group or the failure group and the normal group, excludes the physical equipment or software common to both from the candidates for the cause of the service impact, extracts the remaining physical equipment or software, and estimates the extracted physical equipment or software as the cause of the service impact.

According to the present invention, software failures and degradation can also be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can also be estimated.
Moreover, because the apparatus only sends test packets and receives reply packets, software failures and degradation can be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can be estimated.
Furthermore, flows can be classified into normal flows and abnormal flows.
In addition, flows can be classified easily into normal flows and abnormal flows using the response time and the reply count.
Also, abnormal flows can be classified into those with a failure and those with degradation.
Furthermore, the cause of the service impact can be estimated by appropriate means.
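As a non-limiting illustration, the grouping performed by the group configuration unit might be sketched as follows in Python. The names `FlowResult`, `in_range`, and `classify`, the representation of each predetermined range as a `(low, high)` pair, and the use of ordered list equality to decide that all software IDs match are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class FlowResult:
    """Measurement result for one flow (hypothetical structure)."""
    flow_id: str
    expected_software_ids: list  # software IDs registered in the storage unit
    reply_software_ids: list     # software IDs recorded in the reply packet's list
    response_time_ms: float      # measured response time
    reply_count: int             # replies counted within the measurement window


def in_range(value, bounds):
    """True when value lies inside the inclusive (low, high) range."""
    low, high = bounds
    return low <= value <= high


def classify(flow, rt_bounds, count_bounds):
    """Return 'normal', 'failure', or 'degradation' for one flow."""
    metrics_ok = (in_range(flow.response_time_ms, rt_bounds)
                  and in_range(flow.reply_count, count_bounds))
    # Assumption: the IDs match only if the same IDs appear in the same order.
    ids_match = flow.expected_software_ids == flow.reply_software_ids

    if metrics_ok and ids_match:
        return "normal"
    # From here on the flow belongs to the abnormal group.
    if metrics_ok or not ids_match:
        return "failure"
    return "degradation"
```

Under these assumptions, a flow whose reply list is missing a software ID falls into the failure group regardless of its timing, while a flow whose IDs all match but whose response time or reply count is out of range falls into the performance degradation group.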

In this case, the predetermined value that the group configuration unit uses for the response time may be obtained by averaging, or by applying a statistical method to, the response times for the predetermined number of test packets transmitted within the predetermined time, and the predetermined value used for the count may be obtained by averaging, or by applying a statistical method to, the counts for the predetermined number of test packets transmitted within the predetermined time.
According to the present invention, the predetermined values used for the response time and the reply count can be determined appropriately.
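One possible way of deriving such a predetermined range from measured samples is sketched below in Python. The specification states only that an average or a statistical method is used; the choice of the mean plus or minus `k` population standard deviations, the default `k = 3.0`, and the function name `threshold_range` are assumptions of this sketch.

```python
from statistics import mean, pstdev


def threshold_range(samples, k=3.0):
    """Return (low, high) as mean -/+ k population standard deviations.

    samples: response times or reply counts measured for the test packets
             sent within the measurement window.
    k:       width of the band in standard deviations (assumed value).
    """
    center = mean(samples)
    spread = pstdev(samples)
    return (center - k * spread, center + k * spread)
```

For example, `threshold_range([8.0, 12.0], k=1.0)` gives a range centered on the average 10.0 with a half-width of one standard deviation (2.0).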

In this case, the estimation unit may estimate the cause of the service impact using the first cause identification unit when the number of flows assigned to the failure group or the performance degradation group is equal to or greater than a predetermined threshold for that group, and using the first cause identification unit followed by the second cause identification unit when the number is below the predetermined threshold.
According to the present invention, the first cause identification unit or the second cause identification unit can be selected appropriately.
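The selection between the two cause identification units might be sketched as follows in Python. Each flow model is represented here as a list of equipment and software IDs; the function names, the parameter `min_flows`, and the choice to apply the second unit to the candidates left by the first unit are assumptions of this sketch rather than details given in the specification.

```python
def common_elements(flow_models):
    """First cause identification: IDs shared by every flow model in the group."""
    sets = [set(model) for model in flow_models]
    return set.intersection(*sets) if sets else set()


def exclude_normal(candidates, normal_models):
    """Second cause identification: drop candidates that normal flows also use."""
    normal_elements = set().union(*(set(model) for model in normal_models))
    return candidates - normal_elements


def estimate_causes(abnormal_models, normal_models, min_flows):
    """Pick the identification path from the size of the abnormal group."""
    candidates = common_elements(abnormal_models)
    if len(abnormal_models) >= min_flows:
        # Enough abnormal flows: the first unit alone is used.
        return candidates
    # Too few abnormal flows: refine the candidates with the normal group.
    return exclude_normal(candidates, normal_models)
```

With two abnormal flows `["r1", "s1", "swA"]` and `["r2", "s1", "swA"]` and one normal flow `["r3", "s1", "swB"]`, a threshold of 2 leaves `s1` and `swA` as candidates, whereas a threshold of 3 additionally removes `s1` because the normal flow also passes through it.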

In the above case, the estimation unit may compare the flow models of the flows within the performance degradation group or within the failure group and count, for each item of physical equipment or software, how many times it is common; then compare the flow models of the flows between that performance degradation group or failure group and the normal group and set the count to 0 for any physical equipment or software common to both; and finally extract the physical equipment or software with the largest count and estimate it as the cause of the service impact.
According to the present invention, the cause of the service impact can be estimated accurately.
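The count-based variant might be sketched as follows in Python. Here the count of an element is taken to be the number of abnormal flows whose flow model contains it, which is one reading of the text above; the function name `rank_causes` and the list-of-IDs representation of a flow model are likewise assumptions of this sketch.

```python
from collections import Counter


def rank_causes(abnormal_models, normal_models):
    """Return the IDs with the largest count after zeroing normal-flow elements."""
    counts = Counter()
    for model in abnormal_models:
        # Count each element once per abnormal flow that contains it.
        counts.update(set(model))

    # Elements that also appear in a normal flow have their count set to 0.
    normal_elements = set().union(*(set(model) for model in normal_models))
    for element in normal_elements:
        if element in counts:
            counts[element] = 0

    if not counts:
        return []
    top = max(counts.values())
    if top == 0:
        return []  # every candidate was excluded by the normal group
    return sorted(element for element, n in counts.items() if n == top)
```

When several elements share the largest count, all of them are returned, which is the situation in which a further prioritization of the counts becomes useful.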

In this case, the storage unit may hold, as the physical equipment ID and the software ID, information indicating the correlation between the physical equipment or software indicated by the ID and other physical equipment, software, or servers that are in a parent-child relationship or connection relationship with it. When a plurality of items of physical equipment or software are estimated as the cause of the service impact, and a parent-child relationship or connection relationship exists among those items, or between the physical equipment or software estimated in the previous estimation and that estimated in the current estimation, the estimation unit may assign priorities to the counts of the plurality of items estimated this time.
According to the present invention, priorities can be assigned to the counts based on the parent-child relationship or connection relationship.

Another aspect of the present invention is a computer-readable service impact cause estimation program that causes a computer to execute: a model generation process that refers to a storage unit, which, for each flow configured using at least one of physical equipment and software through which data is passed on a communication network, stores in association with one another a flow ID identifying the flow, a physical equipment ID identifying the physical equipment, a server ID identifying a server that is the physical equipment, and a software ID identifying the software used on each server, and that generates, for each flow, a flow model, which is a model specifying the IDs of the physical equipment and software through which the data flows and the order in which the data flows; and an estimation process that compares the flow models with one another and, from the comparison result, estimates the physical equipment or software causing a service impact on the communication network.

The estimation process causes the computer to execute: a list generation process that generates a list capable of storing the software IDs; a test packet generation process that generates a test packet carrying the list; a packet transmission process that transmits a predetermined number of the test packets for each flow within a predetermined time; a packet reception process that receives a reply packet to the test packet, in which the IDs of the software the test packet passed through are stored in the list; and a reply storage process that stores the received reply packets. The program further causes the computer to execute a group configuration process that classifies each flow into a normal group or an abnormal group based on measurement results for the reception of the reply packets stored by the reply storage process.

The group configuration process classifies into the normal group each flow for which the response time and the reply count obtained as the measurement results are each within their respective predetermined ranges and for which, when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, all software IDs match; it classifies all other flows into the abnormal group. Within the abnormal group, it classifies into a failure group each flow for which the measured response time and the count are each within their respective predetermined ranges, or for which the comparison of the software IDs acquired from the storage unit with the software IDs stored in the list of the reply packet shows that at least one software ID does not match; it classifies the remaining flows into a performance degradation group.

The estimation process also causes the computer to execute: a first cause identification process that compares the flow models of the flows within the performance degradation group or within the failure group, extracts the physical equipment or software common to them, and estimates the extracted physical equipment or software as the cause of the service impact; and a second cause identification process that compares the flow models of the flows between the performance degradation group or the failure group and the normal group, excludes the physical equipment or software common to both from the candidates for the cause of the service impact, extracts the remaining physical equipment or software, and estimates the extracted physical equipment or software as the cause of the service impact.

According to the present invention, software failures and degradation can also be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can also be estimated.
Moreover, because the program only sends test packets and receives reply packets, software failures and degradation can be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can be estimated.
Furthermore, flows can be classified into normal flows and abnormal flows.
In addition, flows can be classified easily into normal flows and abnormal flows using the response time and the reply count.
Also, abnormal flows can be classified into those with a failure and those with degradation.
Furthermore, the cause of the service impact can be estimated by appropriate means.

Another aspect of the present invention is a service impact cause estimation method comprising: a model generation step of referring to a storage unit, which, for each flow configured using at least one of physical equipment and software through which data is passed on a communication network, stores in association with one another a flow ID identifying the flow, a physical equipment ID identifying the physical equipment, a server ID identifying a server that is the physical equipment, and a software ID identifying the software used on each server, and generating, for each flow, a flow model, which is a model specifying the IDs of the physical equipment and software through which the data flows and the order in which the data flows; and an estimation step of comparing the flow models with one another and, from the comparison result, estimating the physical equipment or software causing a service impact on the communication network.

The estimation step comprises: a list generation step of generating a list capable of storing the software IDs; a test packet generation step of generating a test packet carrying the list; a packet transmission step of transmitting a predetermined number of the test packets for each flow within a predetermined time; a packet reception step of receiving a reply packet to the test packet, in which the IDs of the software the test packet passed through are stored in the list; and a reply storage step of storing the received reply packets. The method further comprises a group configuration step of classifying each flow into a normal group or an abnormal group based on measurement results for the reception of the reply packets stored in the reply storage step.

The group configuration step classifies into the normal group each flow for which the response time and the reply count obtained as the measurement results are each within their respective predetermined ranges and for which, when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, all software IDs match; it classifies all other flows into the abnormal group. Within the abnormal group, it classifies into a failure group each flow for which the measured response time and the count are each within their respective predetermined ranges, or for which the comparison of the software IDs acquired from the storage unit with the software IDs stored in the list of the reply packet shows that at least one software ID does not match; it classifies the remaining flows into a performance degradation group.

The estimation step also comprises: a first cause identification step of comparing the flow models of the flows within the performance degradation group or within the failure group, extracting the physical equipment or software common to them, and estimating the extracted physical equipment or software as the cause of the service impact; and a second cause identification step of comparing the flow models of the flows between the performance degradation group or the failure group and the normal group, excluding the physical equipment or software common to both from the candidates for the cause of the service impact, extracting the remaining physical equipment or software, and estimating the extracted physical equipment or software as the cause of the service impact.

According to the present invention, software failures and degradation can also be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can also be estimated.
Moreover, because the method only sends test packets and receives reply packets, software failures and degradation can be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can be estimated.
Furthermore, flows can be classified into normal flows and abnormal flows.
In addition, flows can be classified easily into normal flows and abnormal flows using the response time and the reply count.
Also, abnormal flows can be classified into those with a failure and those with degradation.
Furthermore, the cause of the service impact can be estimated by appropriate means.

According to the present invention, software failures and degradation can also be detected, the work is simple, the apparatus is relatively small in scale, and the location causing a degradation in service quality can also be estimated.

FIG. 1 is an overall configuration diagram of a system according to an embodiment of the present invention.
FIG. 2 is a block diagram outlining the hardware configuration of the service impact cause estimation apparatus according to an embodiment of the present invention.
FIG. 3 is a functional block diagram illustrating the functions that the central processing unit executes based on the service impact cause estimation program according to an embodiment of the present invention.
FIG. 4 is an explanatory diagram of the data structure registered in the equipment information DB of the service impact cause estimation apparatus according to an embodiment of the present invention.
FIG. 5 is an explanatory diagram of the data structure registered in the software information DB of the service impact cause estimation apparatus according to an embodiment of the present invention.
FIG. 6 is an explanatory diagram showing an example of a flow model according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of a test packet according to an embodiment of the present invention.
FIG. 8 is a flowchart of the process of grouping the flows according to an embodiment of the present invention.
FIG. 9 is a state transition diagram showing the decisions made in the grouping of FIG. 8.
FIG. 10 is an explanatory diagram illustrating the service impact cause estimation process in an embodiment of the present invention.
FIG. 11 is a flowchart for selecting whether to use the first cause identification unit or the second cause identification unit in an embodiment of the present invention.
FIG. 12 is an explanatory diagram illustrating a modification of the service impact cause estimation process in an embodiment of the present invention.
FIG. 13 is an explanatory diagram illustrating a modification of the service impact cause estimation process in an embodiment of the present invention.
FIG. 14 is an explanatory diagram illustrating a modification of the service impact cause estimation process in an embodiment of the present invention.
FIG. 15 is an explanatory diagram illustrating the technical content of the first comparative example.
FIG. 16 is an explanatory diagram illustrating the technical content of the second comparative example.
FIG. 17 is an explanatory diagram illustrating the technical content of the third comparative example.

First, before the present embodiment is described, several comparative examples are described.
[Comparative examples]
(First comparative example)
In this specification, service quality is said to have "degraded" when an abnormality has occurred for which the network administrator cannot confirm an alarm indicating the abnormality, and service quality is said to have "failed" when an abnormality has occurred for which the network administrator can confirm such an alarm.

FIG. 15 is an explanatory diagram illustrating the technical content of the first comparative example (Patent Document 1). This network service failure diagnosis method and apparatus includes an equipment information database (DB) 201, which describes the role of each of the multiple service components placed between the user's terminal and the information source when the user uses a service, and a design information database (DB) 202, which describes the normal and failure behavior of each service component. An example of the information registered in the equipment information DB 201 is indicated by reference numeral 203.

First, when a search by user ID and service name is requested in response to a report from a user (user A) of the communication network (S211), an end-to-end facility model for user A is output, in units of service components (physical devices), based on the facility information DB 201 (S212). Then, with reference to the design information DB 202, the communication sequence for the case where all service components are normal is generated from the facility model (S213). The communication sequence for the case where each service component has failed is likewise generated with reference to the design information DB 202 (S214).

Next, the normal communication sequence (S213) and the failure communication sequences (S214) are compared with the observation information (information entered by the maintainer of the communication system), and the communication sequences that contain information in common with the observation information are extracted. Then, among the service components, those that did not fulfill their intended role are extracted (S215, S216). In this example, S215 shows the communication sequences for a failure of NIC (Network Interface Controller) No. 48 as communication sequence numbers 1 to 4. These are example communication sequences for various failure patterns. The information entered by the maintainer in S216 is that the warning “Alarm A” was displayed, that the message “Message B” (the message “No response from host” in S214) was displayed, and that user A reported that “XX does not work”.
In this example, comparing the normal and failure communication sequences with the observation information yields a match for sequence numbers 1 and 3, which are extracted. Then, for the communication sequences of sequence numbers 1 and 3, it is determined which service component fails to fulfill its intended role, and the abnormal location is estimated.
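The matching step of the first comparative example can be sketched as follows. This is only an illustration: it assumes that each failure-time communication sequence can be reduced to the set of symptoms it produces and that a sequence matches when it contains every observed symptom. The symptom sets and the names `FAILURE_SEQUENCES` and `matching_sequences` are hypothetical and do not come from Patent Document 1.

```python
from typing import Dict, FrozenSet, List

# Hypothetical failure-time communication sequences for one service component,
# each reduced to the set of symptoms it would produce (illustrative data only).
FAILURE_SEQUENCES: Dict[int, FrozenSet[str]] = {
    1: frozenset({"Alarm A", "Message B"}),
    2: frozenset({"Alarm A"}),
    3: frozenset({"Alarm A", "Message B", "Alarm C"}),
    4: frozenset({"Message D"}),
}


def matching_sequences(
    sequences: Dict[int, FrozenSet[str]], observed: FrozenSet[str]
) -> List[int]:
    """Return the numbers of the sequences that contain every observed symptom."""
    return sorted(n for n, symptoms in sequences.items() if observed <= symptoms)


# Observation information entered by the maintainer (assumed form).
observed = frozenset({"Alarm A", "Message B"})
matched = matching_sequences(FAILURE_SEQUENCES, observed)
print(matched)
```

With this illustrative data, the sequences numbered 1 and 3 are selected, mirroring the example in the text.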

However, the first comparative example requires communication sequences for failures to be prepared exhaustively in advance, which makes the work cumbersome. In addition, the first comparative example excludes software such as virtualized facilities and application software from the service components (only physical devices are covered). Consequently, when, for example, multiple virtualized facilities or pieces of application software are set up in the same server, converting them into a facility model gives no way to tell whether a given virtualized facility or piece of application software shares a physical device with others or is the only one on that device, so each one must be checked by hand, which again makes the work cumbersome. Furthermore, because the first comparative example judges abnormalities in the communication sequence, it cannot estimate the service component responsible when the communication sequence is normal but the service quality is degraded.

(Second comparative example)
FIG. 16 is an explanatory diagram for explaining the technical content of the second comparative example. In this quality degradation cause estimation method, as shown in FIG. 16(a), a user terminal 301 and a server 302 are connected via a network 303. A quality degradation cause estimation device 304 provided in the network 303 captures packets P311 exchanged between the user terminal 301 and the server 302 and analyzes them to calculate various characteristic values that affect the user's quality of experience (reorder width, traffic volume, RTT (Round-Trip Time), packet loss, jitter, session establishment rate, window size, and server response time). It then judges whether each characteristic value is normal based on the calculated value and estimates the location causing the communication quality degradation between the user terminal 301 and the server 302.

FIG. 16(b) is a flowchart of the determination process for the case where the characteristic value is the session establishment rate. In this case, the quality degradation cause estimation device 304 first determines whether the session establishment failure rate is greater than a predetermined threshold (S321). If it is not (N in S321), the quality degradation cause estimation device 304 determines that the network 303 is normal (S322). If it is (Y in S321), the quality degradation cause estimation device 304 determines whether a session termination device is present (S323). If one is present (Y in S323), the cause of the abnormality may lie in the session termination device, so the quality degradation cause estimation device 304 determines that the log of the session termination device needs to be checked (S324). If none is present (N in S323), the quality degradation cause estimation device 304 determines that the cause lies in the server 302 (S325).

FIG. 16(c) is a flowchart of the determination process for the case where the characteristic value is the window size. In this case, the quality degradation cause estimation device 304 first determines whether the window size is smaller than a predetermined threshold (S331). If it is equal to or greater than the threshold (N in S331), the quality degradation cause estimation device 304 determines that the network 303 is normal (S332). If it is smaller than the threshold (Y in S331), the quality degradation cause estimation device 304 determines whether a bandwidth control device is present (S333). If one is present (Y in S333), the cause of the abnormality may lie in the bandwidth control device, so the quality degradation cause estimation device 304 determines that the log of the bandwidth control device needs to be checked (S334). If none is present (N in S333), the quality degradation cause estimation device 304 determines that the cause lies in the server 302 (S335).
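The determination flows of FIG. 16(b) and FIG. 16(c) have the same shape: a threshold test, followed by a check for an intermediate device. A minimal sketch under that reading is shown below. The names `Verdict`, `judge`, `judge_session_establishment`, and `judge_window_size` are hypothetical, and the numeric values in the usage lines are arbitrary.

```python
from enum import Enum


class Verdict(Enum):
    NETWORK_NORMAL = "network is normal"              # S322 / S332
    CHECK_DEVICE_LOG = "check the log of the device"  # S324 / S334
    SERVER_SUSPECTED = "cause lies in the server"     # S325 / S335


def judge(threshold_violated: bool, intermediate_device_present: bool) -> Verdict:
    """Shared shape of both flowcharts: threshold test, then device check."""
    if not threshold_violated:
        return Verdict.NETWORK_NORMAL
    if intermediate_device_present:
        return Verdict.CHECK_DEVICE_LOG
    return Verdict.SERVER_SUSPECTED


def judge_session_establishment(
    failure_rate: float, threshold: float, has_session_termination_device: bool
) -> Verdict:
    # S321: is the session establishment failure rate greater than the threshold?
    return judge(failure_rate > threshold, has_session_termination_device)


def judge_window_size(
    window_size: int, threshold: int, has_bandwidth_control_device: bool
) -> Verdict:
    # S331: is the window size smaller than the threshold?
    return judge(window_size < threshold, has_bandwidth_control_device)


print(judge_session_establishment(0.30, 0.10, True))
print(judge_window_size(65535, 8192, False))
```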

However, when the second comparative example is used to estimate the cause location end-to-end for each flow, including the individual devices in the network 303, the number of capture points that must be provided at devices in the network 303 grows, and both the volume of captured packet P311 data to be stored and the load of the determination process increase. The work therefore becomes cumbersome and the apparatus becomes large in scale, which makes the approach difficult to apply to a large-scale network such as that of a telecommunications carrier.

(Third comparative example)
FIG. 17 is an explanatory diagram for explaining the technical content of the third comparative example. In this failure diagnosis method, which covers virtualization mechanisms, a plurality of servers 402 and a plurality of switches (network switches) 403 are arranged in a network 401, as shown in FIG. 17(a). The detection node 404 is one of the nodes in the network 401. Each server 402 includes a transfer function unit (corresponding to a virtual switch) 411 that transfers data, and application software 412. SFF1 to SFF3 are the IDs of the transfer function units 411, and SF1 to SF5 are the IDs of the pieces of application software 412.

This technique uses SFC (Service Function Chaining), a flexible route control technique that enables selective use of virtualized network functions. The detection node 404 transmits one test packet for each flow. The test packet carries a list for storing the IDs of the transfer function units 411 and the application software 412. A server 402 that receives the test packet stores the ID of its own transfer function unit 411 in the list of the test packet, copies the list after storing the ID, and returns the copied list as a reply to the detection node 404 that sent the packet. The test packet carrying the original list is then transferred to the application software 412 on the same server 402, which, like the transfer function unit 411, copies the list and returns the copy as a reply to the detection node 404. The same applies when the test packet is transferred on to another server 402.

FIG. 17(b) shows an example of the setting information 421 stored in advance in the detection node 404. The setting information 421 lists, from top to bottom in order of passage, the IDs of the elements on the route taken by the test packet transferred along a given flow. In this example, the test packet is transferred to the server 402 that has the transfer function unit 411 with ID SFF1 and passes through that transfer function unit 411 and the application software 412 in turn. It is then transferred to the server 402 that has the transfer function unit 411 with ID SFF2 and passes through that transfer function unit 411 and the application software 412 in turn. The IDs of the elements through which the test packet is scheduled to pass are therefore, in order, “SFF1→SF1→SFF1→SFF2→SF2→SFF2→SF3→SFF2”.

FIG. 17(c) shows an example of the lists returned to the detection node 404 in response to the test packet. List 1 was returned by the transfer function unit 411 with ID SFF1, list 2 by the application software 412 with ID SF1, list 3 by the transfer function unit 411 with ID SFF1, and list 4 by the transfer function unit 411 with ID SFF2.

Comparing these lists with the setting information 421 makes it possible to determine which of the transfer function units 411 and pieces of application software 412 has an abnormality. Comparison with the setting information 421 shows that lists 1 to 4 were all correctly returned to the detection node 404. That is, the ID in the bottom row of each of lists 1 to 4 is present in the setting information 421, and that ID identifies the sender of the list, so the transfer function unit 411 or application software 412 that sent the reply can be judged to be operating normally.
If, on the other hand, the transfer function unit 411 with ID SFF2 had an abnormality, list 4, whose bottom row records the ID SFF2, would not be returned, and the absence of that reply would lead to the determination that the transfer function unit 411 with ID SFF2 is abnormal.
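The reply-list comparison of the third comparative example can be illustrated with a small simulation. The sketch assumes that an abnormal element neither replies nor forwards the test packet; the text only states that its reply is absent, so the forwarding behavior is an assumption. The function names `send_test_packet` and `first_silent_element` are hypothetical.

```python
from typing import List, Optional, Set

# Scheduled route from the setting information 421 in FIG. 17(b).
EXPECTED_PATH = ["SFF1", "SF1", "SFF1", "SFF2", "SF2", "SFF2", "SF3", "SFF2"]


def send_test_packet(path: List[str], faulty: Set[str]) -> List[List[str]]:
    """Simulate one test packet: each healthy hop appends its ID to the list
    carried by the packet and replies with a copy of that list."""
    carried: List[str] = []
    replies: List[List[str]] = []
    for hop in path:
        if hop in faulty:
            break  # assumption: a faulty element neither replies nor forwards
        carried.append(hop)
        replies.append(list(carried))
    return replies


def first_silent_element(path: List[str], replies: List[List[str]]) -> Optional[str]:
    """Return the first element of the scheduled route whose reply is missing
    or does not match the route so far, or None if every reply is as expected."""
    for index, hop in enumerate(path):
        if index >= len(replies) or replies[index] != path[: index + 1]:
            return hop
    return None


replies = send_test_packet(EXPECTED_PATH, faulty={"SFF2"})
suspect = first_silent_element(EXPECTED_PATH, replies)
print(len(replies), suspect)
```

In this run, lists 1 to 3 are returned and list 4 is missing, so the element with ID SFF2 is reported, which corresponds to the case described above.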

However, the third comparative example covers only the transfer function units 411 and the application software 412, to which IDs can be assigned, so it cannot diagnose abnormalities in other physical devices or software facilities to which IDs cannot be assigned. In addition, merely comparing the setting information 421 with the lists storing the IDs of the transfer function units 411 and the application software 412 does not allow the cause location to be estimated when the service quality is degraded.

[Embodiment]
Next, the technical content of the present embodiment, which resolves the problems of the first to third comparative examples, is described.
(Overview of system configuration)
FIG. 1 is an overall system configuration diagram of the present embodiment. A plurality of servers 11 are installed on a communication network 10 such as the Internet, and the servers 11 are connected via switches (network switches) 12 and links 13. Each server 11 is provided with a virtual switch (vSW) 21 and application software (APL) 22, both of which are software. Each component in the communication network 10 is labeled in the figure with its ID, for example “ID: sv01”.

In this example, the service influence cause estimation device 1 is connected to a server 11. However, the present invention is not limited to this arrangement, and the service influence cause estimation device 1 may be placed in the communication network 10 independently of the servers 11.

FIG. 2 is a block diagram showing an outline of the hardware configuration of the service influence cause estimation device 1. The service influence cause estimation device 1 includes a central processing unit (CPU) 31 that performs various calculations and control, a main storage device 32 that serves as the work area of the central processing unit 31, an auxiliary storage device (such as an HDD) 33 that stores various data, and a communication interface (I/F) 34 that communicates with the communication network 10. The auxiliary storage device 33 stores a service influence cause estimation program 45, which is the program for executing the characteristic processing of the service influence cause estimation device 1 described below.

FIG. 3 is a functional block diagram explaining the functions that the central processing unit 31 executes based on the service influence cause estimation program 45.
The service influence cause estimation device 1 includes a storage unit 50, a processing unit 60, and a management unit 70.
The storage unit 50 holds a facility information database (DB) 51 and a software information database (DB) 52. The detailed functions of these are described later.
The processing unit 60 performs processing related to the flow model described later. The processing unit 60 includes a model generation unit 61, a component method determination unit 62, a component extraction unit 63, and an extracted element storage unit 64. The component extraction unit 63 includes a first cause specifying unit 631 and a second cause specifying unit 632. The detailed functions of these units are described later.

The management unit 70 performs processing related to the management of various data. The management unit 70 includes a setting information management unit 71, a group management unit 72, a threshold management unit 73, a test packet management unit 74, and a recording unit 75. The group management unit 72 includes a group configuration unit 721 and a group storage unit 722. The threshold management unit 73 includes a response time threshold storage unit 731, a reply count number threshold storage unit 732, a failure flow number threshold storage unit 733, and a performance degradation flow number threshold storage unit 734. The test packet management unit 74 includes a test packet generation unit 741, a list generation unit 742, a packet transmission unit 743, a packet reception unit 744, and a reply storage unit 745. The recording unit 75 includes a response time storage unit 751, a reply count number storage unit 752, a failure flow number storage unit 753, and a performance degradation flow number storage unit 754. The detailed functions of these units are described later.

The service influence cause estimation method, that is, the processing executed by the service influence cause estimation device 1, is described step by step below.
(Overview of the service influence cause estimation method)
FIG. 4 is an explanatory diagram of the data structure registered in the facility information DB 51 (FIG. 3). In the facility information DB 51, flow IDs and physical facility IDs are registered in association with each other. A flow ID is an identifier of a flow in the communication network 10 that is composed of at least one of physical facilities such as transfer devices and communication cables, virtualized facilities such as virtual machines and virtual switches, and software such as application software (in the example of FIG. 1, the servers 11, switches 12, links 13, virtual switches 21, and application software 22). A physical facility ID is an identifier of a physical facility in each flow in the communication network 10 (in the example of FIG. 1, a server 11, switch 12, or link 13). The physical facility IDs of a flow are written as the IDs of the physical facilities through which data flows, joined from left to right in the order in which the data flows.

FIG. 5 is an explanatory diagram of the data structure registered in the software information DB 52 (FIG. 3). In the software information DB 52, the flow IDs described above, server IDs, and software IDs are registered in association with one another. A server ID is an identifier of a server 11. A software ID is an identifier of software, namely a virtualized facility such as a virtual machine or virtual switch, or application software, installed on the server 11 indicated by a server ID (in the example of FIG. 1, a virtual switch 21 or application software 22). The software IDs are written as the IDs of the software through which data flows, joined from left to right in the order in which the data flows. Each set of software IDs is associated with the server ID of a server 11 and lists only the IDs of the software in the server 11 indicated by that server ID.

FIG. 6 is an explanatory diagram showing an example of flow models. A flow model is also identified by its flow ID, and FIG. 6 shows an example flow model for each flow ID. The flow models are generated by the model generation unit 61. When creating the flow model for a given flow ID, the model generation unit 61 refers to the facility information DB 51 (FIG. 4) and the software information DB 52 (FIG. 5) and joins, from left to right in the order in which data flows, the physical facilities indicated by the physical facility IDs and the software indicated by the server IDs and software IDs associated with that flow ID. In other words, a flow model is a model that specifies, for each flow, the IDs of the physical facilities and software through which data flows and the order in which the data flows through them.
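One possible way to assemble a flow model from the two databases is sketched below. The dictionary layouts, the function name `build_flow_model`, and every ID other than “sv01” are hypothetical. Placing the software IDs of a server immediately after that server's ID is also an assumption about how the order of data flow is represented, not something the specification states.

```python
from typing import Dict, List, Tuple

# Hypothetical contents of the facility information DB 51:
# flow ID -> physical facility IDs in the order in which data flows.
FACILITY_DB: Dict[str, List[str]] = {
    "flow01": ["sv01", "link01", "sw01", "link02", "sv02"],
}

# Hypothetical contents of the software information DB 52:
# (flow ID, server ID) -> software IDs in the order in which data flows.
SOFTWARE_DB: Dict[Tuple[str, str], List[str]] = {
    ("flow01", "sv01"): ["vsw01", "apl01"],
    ("flow01", "sv02"): ["vsw02", "apl02"],
}


def build_flow_model(
    flow_id: str,
    facility_db: Dict[str, List[str]],
    software_db: Dict[Tuple[str, str], List[str]],
) -> List[str]:
    """Join physical facility IDs and software IDs in the order of data flow."""
    model: List[str] = []
    for facility_id in facility_db[flow_id]:
        model.append(facility_id)
        # Only servers have entries in the software DB; other facilities add nothing.
        model.extend(software_db.get((flow_id, facility_id), []))
    return model


flow_model = build_flow_model("flow01", FACILITY_DB, SOFTWARE_DB)
print(" -> ".join(flow_model))
```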

In the present embodiment, the processing unit 60 and the management unit 70 correspond to an estimation unit. Through the processing they execute, the flow models are compared with one another, and the physical facility or software causing a service influence on the communication network 10, such as degradation or failure, is estimated from the comparison result. The following describes the detailed processing executed by the service influence cause estimation device 1, in particular the specific processing executed by the processing unit 60 and the management unit 70 serving as the estimation unit.

(Test packet)
An example of the test packet used to estimate the physical facility or software causing a service influence on the communication network 10, such as degradation or failure, is described.
First, the list generation unit 742 generates a list, such as that shown in FIG. 7(a), that can store software IDs. The flow ID of the target flow model and the software IDs are written into this list as the test packet passes through each piece of software. The test packet generation unit 741 then generates a test packet carrying the list. The header of the test packet stores the physical facility IDs of the physical facilities and the software IDs of the software through which the test packet is to pass, obtained by referring to the facility information DB 51 and the software information DB 52 for the flow concerned.
The packet transmission unit 743 transmits a predetermined number of the generated test packets within a predetermined time for each flow. Specifically, a plurality of identical test packets are transmitted per flow.

As a result, in each flow, the last component of the flow model generates and transmits a reply packet. Attached to the reply packet is a list, such as that shown in FIG. 7(b), storing the IDs of the software through which the corresponding test packet passed. The reply packet is received by the packet reception unit 744, and the reply storage unit 745 stores the list of the reply packet.
The measured response time of the test packets is stored in the response time storage unit 751, and the count of reply packets returned for the test packets is stored in the reply count number storage unit 752.
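The per-flow values that are recorded after a burst of test packets can be pictured as in the sketch below. It assumes that the measured response time of a flow is the mean over the replies actually received; the text does not state how individual response times are aggregated, so this is an illustrative choice. The names `FlowMeasurement` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class FlowMeasurement:
    flow_id: str
    response_time: Optional[float]  # None when no reply packet came back
    reply_count: int                # number of reply packets received


def summarize(flow_id: str, reply_times: Sequence[Optional[float]]) -> FlowMeasurement:
    """Summarize one burst of identical test packets for a flow.

    reply_times holds one entry per test packet sent; None marks a test packet
    for which no reply packet was received.
    """
    received = [t for t in reply_times if t is not None]
    response_time = mean(received) if received else None
    return FlowMeasurement(flow_id, response_time, len(received))


# Four test packets sent, three replies received (times in milliseconds).
measurement = summarize("flow01", [10.0, 14.0, None, 12.0])
print(measurement)
```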

(Grouping)
Next, the flows are divided into groups based on the results of transmitting the test packets. The flows are classified into a “normal group”, for flows judged to have no abnormality, and the “abnormal groups”, for flows judged to have an abnormality, which consist of a “performance degradation group” and a “failure group”. The group configuration unit 721 performs this grouping into the normal group and the abnormal groups (the performance degradation group and the failure group). The group storage unit 722 stores the result of the grouping. The setting information management unit 71 takes in the information registered in the facility information DB 51 and the software information DB 52 as setting information.

FIG. 8 is a flowchart of the process of grouping the flows. This process uses N1 and N2 (N1 < N2) as thresholds for the measured response time of the test packets stored in the response time storage unit 751, and C1, C2, and C3 (C1 > C2 > C3) as thresholds for the count of reply packets returned for the test packets, which is stored in the reply count number storage unit 752.

The thresholds N1 and N2, which are the predetermined values used for the response time and are stored in the response time threshold storage unit 731, are obtained by averaging, or applying a predetermined statistical method to, the response times for the predetermined number of test packets transmitted within the predetermined time as described above. Similarly, the thresholds C1, C2, and C3, which are the predetermined values used for the reply packet count and are stored in the reply count number threshold storage unit 732, are obtained by averaging, or applying a predetermined statistical method to, the reply packet counts for the predetermined number of test packets transmitted within the predetermined time.

The process shown in FIG. 8 is performed for each flow. The group configuration unit 721 first determines whether the measured response time stored in the response time storage unit 751 is smaller than the threshold N1 (S1). If it is (Yes in S1), the group configuration unit 721 determines whether the reply packet count stored in the reply count number storage unit 752 is equal to or greater than the threshold C1 (S2). If it is (Yes in S2), the group configuration unit 721 determines whether the reply packet list stored in the reply storage unit 745 completely matches the setting information stored in the setting information management unit 71 (S3). If they match completely (Yes in S3), the response is fast, a sufficient number of reply packets have been returned, and the test packets have passed normally through all the physical facilities and software in the flow, so the group configuration unit 721 classifies the flow into the normal group (S4).

On the other hand, if the reply packet count is below the threshold C1 (No in S2), the group configuration unit 721 determines, as in S3, whether the reply packet list stored in the reply storage unit 745 completely matches the setting information stored in the setting information management unit 71 (S5). If they match completely (Yes in S5), an insufficient number of reply packets have been returned, but the test packets have passed normally through all the physical facilities and software in the flow, so the group configuration unit 721 classifies the flow into the performance degradation group (S6). If there is any mismatch between the reply packet list and the setting information (No in S5), an insufficient number of reply packets have been returned and the test packets have not passed normally through some physical facility or software in the flow, so the group configuration unit 721 classifies the flow into the failure group (S7). Likewise, if there is any mismatch between the reply packet list and the setting information in S3 (No in S3), a sufficient number of reply packets have been returned but the test packets have not passed normally through some physical facility or software in the flow, so the group configuration unit 721 classifies the flow into the failure group (S7).
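The branch of FIG. 8 described so far, in which the measured response time is below N1, can be written as a small decision function. This is a sketch under stated assumptions: the names `Group` and `classify_fast_flow` are hypothetical, the "complete match" of S3 and S5 is modeled as plain list equality, and the function returns `None` when the response time is N1 or greater because that case belongs to the steps from S8 onward.

```python
from enum import Enum
from typing import Optional, Sequence


class Group(Enum):
    NORMAL = "normal group"                    # S4
    DEGRADED = "performance degradation group"  # S6
    FAILURE = "failure group"                  # S7


def classify_fast_flow(
    response_time: float,
    reply_count: int,
    reply_list: Sequence[str],
    setting_list: Sequence[str],
    n1: float,
    c1: int,
) -> Optional[Group]:
    """Steps S1 to S7 of FIG. 8 for a flow whose response time is below N1."""
    if not response_time < n1:  # S1: No
        return None             # continues at S8, outside this sketch
    # S3 / S5: does the reply list completely match the setting information?
    lists_match = list(reply_list) == list(setting_list)
    if reply_count >= c1:       # S2: Yes
        return Group.NORMAL if lists_match else Group.FAILURE
    # S2: No
    return Group.DEGRADED if lists_match else Group.FAILURE


route = ["vsw01", "apl01", "vsw02", "apl02"]  # hypothetical software IDs
print(classify_fast_flow(5.0, 10, route, route, n1=20.0, c1=8))
print(classify_fast_flow(5.0, 3, route, route, n1=20.0, c1=8))
print(classify_fast_flow(5.0, 3, route[:2], route, n1=20.0, c1=8))
```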

When the measured response time is equal to or greater than the threshold N1 (No in S1), the group configuration unit 721 determines whether the measured response time is equal to or greater than the threshold N1 and less than the threshold N2 (S8). When it is (Yes in S8), the group configuration unit 721 determines whether the count of reply packets stored in the reply count storage unit 752 is equal to or greater than the threshold C2 (S9). When the count of reply packets is equal to or greater than the threshold C2 (Yes in S9), the group configuration unit 721 determines whether the list in the reply packets stored in the reply storage unit 745 completely matches the setting information (the information in the storage unit 50) stored in the setting information management unit 71 (S10). When the list and the setting information match completely (Yes in S10), several reply packets with a mediocre response time have been returned and the test packets have passed normally through all the physical equipment and software in the flow, so the group configuration unit 721 classifies the flow into the performance degradation group (S6). When some entry of the list does not match the setting information (No in S10), several reply packets with a mediocre response time have been returned and the test packets have not passed normally through some of the physical equipment or software in the flow, so the group configuration unit 721 classifies the flow into the failure group (S7). When the count of reply packets is less than the threshold C2 (No in S9), the group configuration unit 721 sorts the flow into the normal group or the performance degradation group according to the determination of S3.

When the measured response time is not in the range of N1 or more and less than N2 (No in S8), the group configuration unit 721 determines whether the measured response time is equal to or greater than the threshold N2 (S11). When it is (Yes in S11), the group configuration unit 721 determines whether the count of reply packets is equal to or greater than the threshold C3 (S12). When the count of reply packets is equal to or greater than the threshold C3 (Yes in S12), at least the predetermined number of reply packets with a poor response time have been returned, so the group configuration unit 721 classifies the flow into the failure group (S7). When the count of reply packets is less than the threshold C3 (No in S12), the group configuration unit 721 sorts the flow into the performance degradation group or the failure group according to the determination of S10.
The results of the above grouping are stored in the group storage unit 722.
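The decision flow of FIG. 8 can be sketched in Python as follows. This is only an illustrative reading of steps S1 to S12, not the patented implementation: the names `Group`, `Thresholds`, and `classify_flow` are hypothetical, "complete match" is assumed to mean equality of the two ID lists, and the "No in S9" and "No in S12" branches are assumed to be decided by the same list-match check as S3 and S10.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    NORMAL = "normal"
    DEGRADED = "performance degradation"
    FAILURE = "failure"


@dataclass
class Thresholds:
    n1: float  # response-time threshold N1
    n2: float  # response-time threshold N2 (assumed N1 < N2)
    c1: int    # reply-count threshold C1
    c2: int    # reply-count threshold C2
    c3: int    # reply-count threshold C3


def classify_flow(response_time, reply_count, reply_ids, expected_ids, th):
    """Classify one flow following the S1-S12 decisions described above."""
    # S3 / S5 / S10: does the reply list completely match the setting information?
    match = list(reply_ids) == list(expected_ids)

    if response_time < th.n1:                      # S1: Yes
        if not match:                              # S3 or S5: No
            return Group.FAILURE                   # S7
        # S2: enough replies -> normal (S4), too few -> degraded (S6)
        return Group.NORMAL if reply_count >= th.c1 else Group.DEGRADED

    if response_time < th.n2:                      # S8: Yes
        if reply_count >= th.c2:                   # S9: Yes, then S10
            return Group.DEGRADED if match else Group.FAILURE
        # S9: No. Assumption: match -> normal, mismatch -> degraded.
        return Group.NORMAL if match else Group.DEGRADED

    # S11: Yes (response_time >= N2)
    if reply_count >= th.c3:                       # S12: Yes
        return Group.FAILURE                       # S7
    # S12: No. Assumption: decided as in S10.
    return Group.DEGRADED if match else Group.FAILURE
```

The threshold values themselves are not given in this passage, so any values passed in `Thresholds` are placeholders chosen by the caller.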

FIG. 9 is a state transition diagram showing the determinations made in the grouping of FIG. 8. In items 81 to 89, each of which leads to the normal group, the performance degradation group, or the failure group, (1) denotes the response time determination, (2) the reply packet count determination, and (3) the check of whether the list matches the setting information.

(Estimation of the cause of service influence)
The component extraction unit 63 includes a first cause identifying unit 631 and a second cause identifying unit 632. The component extraction unit 63 compares flow models with one another and, from the comparison result, estimates the physical equipment or software that causes the service influence on the communication network 10. The first cause identifying unit 631 and the second cause identifying unit 632 perform this estimation by different methods.
First, the first cause identifying unit 631 compares the flow models of the flows within the performance degradation group or within the failure group classified as described above, extracts the physical equipment or software common to them, and estimates the extracted physical equipment or software as the cause of the service influence.
In the example of FIG. 10(a), the flow models with flow IDs P11 and P12 are compared within the performance degradation group or the failure group, and the common element, the switch 12 with physical equipment ID "ps95", is extracted.
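The extraction of common elements by the first cause identifying unit 631 amounts to a set intersection over flow models. A minimal sketch, assuming each flow model is represented as an ordered list of IDs; the function name `common_elements` and every ID in the usage below other than "ps95" are hypothetical:

```python
def common_elements(flow_models):
    """Return the IDs that appear in every flow model, in first-model order."""
    if not flow_models:
        return []
    first, *rest = flow_models
    # Intersect the first model with all remaining models.
    shared = set(first).intersection(*map(set, rest))
    return [element for element in first if element in shared]


# Two abnormal flows that share only the switch "ps95".
p11 = ["sv01", "l1", "ps95", "l3"]
p12 = ["sv02", "l2", "ps95", "l4"]
suspects = common_elements([p11, p12])
```

Here `suspects` holds the single shared ID, which corresponds to the switch extracted in the FIG. 10(a) example.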

The second cause identifying unit 632 compares the flow models of the flows in the performance degradation group or the failure group, classified as described above, with those in the normal group. It excludes the physical equipment or software common to both from the candidates for the cause of the service influence, extracts the physical equipment or software that remains, and estimates the extracted physical equipment or software as the cause of the service influence.

In the example of FIG. 10(b), the flow model with flow ID P21 in the performance degradation group or the failure group is compared with the flow model with flow ID P22 in the normal group. The common elements, namely the application software 22 with software ID "bk2" and the switch 12 with physical equipment ID "ps95", are excluded, and the remaining physical equipment or software is extracted.
The components extracted in this way by the first cause identifying unit 631 or the second cause identifying unit 632 are stored in the extracted element storage unit 64.
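The exclusion performed by the second cause identifying unit 632 can be read as a set difference: elements that also appear in a normal flow are dropped from the candidates. A minimal sketch under the same list-of-IDs assumption; `exclude_normal_elements` and the IDs other than "bk2" and "ps95" are hypothetical:

```python
def exclude_normal_elements(abnormal_model, normal_models):
    """Keep only the IDs of an abnormal flow that no normal flow passes through."""
    seen_in_normal = set().union(*map(set, normal_models))
    return [element for element in abnormal_model if element not in seen_in_normal]


# P21 is abnormal, P22 is normal; "bk2" and "ps95" are common to both.
p21 = ["bk2", "vs7", "ps95", "ps96"]
p22 = ["bk2", "vs8", "ps95", "ps97"]
remaining = exclude_normal_elements(p21, [p22])
```

In this sketch `remaining` contains only the elements unique to the abnormal flow, which are the candidates the text describes as "remaining".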

FIG. 11 is a flowchart for selecting whether to use the first cause identifying unit 631 or the second cause identifying unit 632 of the component extraction unit 63.
Specifically, through the processing of FIG. 11, the component extraction method determination unit 62 selects whether to use only the first cause identifying unit 631 or both the first cause identifying unit 631 and the second cause identifying unit 632. For this purpose, a threshold D1 on the number of failure flows, described later, is stored in the failure flow count threshold storage unit 733, and a threshold D2 on the number of performance degradation flows, also described later, is stored in the performance degradation flow count threshold storage unit 734. This processing uses both thresholds.

The component extraction method determination unit 62 executes the processing of FIG. 11 for each flow. First, when the flow is classified into the performance degradation group (Yes in S21), the component extraction method determination unit 62 determines whether the number of performance degradation flows, that is, the number of flows classified into the performance degradation group, is equal to or greater than its threshold D1 (S22). When the number of performance degradation flows is equal to or greater than the threshold D1 (Yes in S22), the component extraction method determination unit 62 uses the first cause identifying unit 631 (S23). When the number of performance degradation flows is less than the threshold D1 (No in S22), the component extraction method determination unit 62 uses the first cause identifying unit 631 (S24) and then the second cause identifying unit 632 (S25).

On the other hand, when the flow is classified into the failure group (Yes in S26), the component extraction method determination unit 62 determines whether the number of failure flows, that is, the number of flows classified into the failure group, is equal to or greater than its threshold D2 (S27). When the number of failure flows is equal to or greater than the threshold D2 (Yes in S27), the component extraction method determination unit 62 uses the first cause identifying unit 631 (S23). When the number of failure flows is less than the threshold D2 (No in S27), the component extraction method determination unit 62 uses the first cause identifying unit 631 (S24) and then the second cause identifying unit 632 (S25).
When the first cause identifying unit 631 or the second cause identifying unit 632 has been used in this way and the physical equipment or software has been extracted as described above, the extracted component is estimated to be the specific location of the cause of the service influence (S28).
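The selection logic of FIG. 11 can be sketched as below. The function `select_methods`, the group labels, and the threshold values are hypothetical. Because the text attaches D1 and D2 to the groups in two different ways, the sketch takes the thresholds as a mapping keyed by group rather than fixing that assignment.

```python
def select_methods(group, flow_count, thresholds):
    """Return which cause-identifying units to apply, in order.

    group      -- "degraded" or "failure" (hypothetical labels)
    flow_count -- number of flows classified into that group
    thresholds -- mapping from group label to its flow-count threshold
    """
    if group not in thresholds:
        return []                      # normal flows need no cause estimation
    if flow_count >= thresholds[group]:
        return ["first"]               # S23: first cause identifying unit only
    return ["first", "second"]         # S24 then S25


# Illustrative thresholds only; the patent text gives no concrete values.
limits = {"degraded": 10, "failure": 5}
many = select_methods("failure", 8, limits)
few = select_methods("degraded", 3, limits)
```

With many abnormal flows the intersection alone is used, while with few flows the exclusion step is added afterwards, as steps S23 to S25 describe.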

(Modification of the service influence cause estimation)
A modification of the service influence cause estimation processing described above will be described with reference to FIGS. 12 to 14.

FIG. 12 is an explanatory diagram of this modification. FIG. 12(a) shows examples of flow models of the normal group and the failure group (performance degradation group). In this modification, the component extraction unit 63 first compares the flow models of the flows within the performance degradation group or within the failure group and counts, for each physical equipment or software, the number of times it is shared. FIG. 12(b) shows an example of the count result: the count is "21" for the application software 22 with software ID "a1" and "17" for the switch 12 with physical equipment ID "vs3".

The component extraction unit 63 then compares the flow models of the flows in the performance degradation group or failure group compared in this way with those in the normal group and, as illustrated in FIG. 12(c), sets the count to 0 for any physical equipment or software common to both. In the example of FIG. 12(c), the application software 22 with software ID "a1" is common to the performance degradation group or failure group and the normal group, so its count is changed from "21" to "0". The switch 12 with physical equipment ID "vs3" is not common to the performance degradation group or failure group and the normal group, so its count remains "17" (both values are the "total").

Finally, the component extraction unit 63 extracts the physical equipment or software with the largest count and estimates it as the cause of the service influence. In the example of FIG. 12(c), the largest count is "45", for the switch 12 with physical equipment ID "s3", and this is the "final result". The component extraction unit 63 therefore estimates that the switch 12 with physical equipment ID "s3" is the cause of the service influence.
In this processing, the extraction of the physical equipment or software with the largest count may yield more than one element.
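The counting variant of FIG. 12 can be sketched with a counter. This is an assumption-laden illustration: the count of an element is taken to be the number of abnormal flows whose model contains it, the function name `rank_candidates` is hypothetical, and the small flow models in the usage are invented rather than taken from the figure.

```python
from collections import Counter


def rank_candidates(abnormal_models, normal_models):
    """Count elements over abnormal flows, zero those seen in normal flows,
    and return (elements with the largest count, all counts)."""
    counts = Counter()
    for model in abnormal_models:
        counts.update(set(model))          # one count per flow containing the ID

    seen_in_normal = set().union(*map(set, normal_models))
    for element in list(counts):
        if element in seen_in_normal:
            counts[element] = 0            # common with the normal group

    top = max(counts.values(), default=0)
    if top == 0:
        return [], counts                  # nothing left to suspect
    return sorted(e for e, c in counts.items() if c == top), counts


abnormal = [["a1", "vs3", "s3"], ["a1", "s3"], ["vs3", "s3", "l9"]]
normal = [["a1", "s1"]]
top_elements, counts = rank_candidates(abnormal, normal)
```

Returning a list rather than a single ID reflects the remark above that several elements can tie for the largest count.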

The handling of the plural extracted physical equipment or software in this case is described next. As described above with reference to FIG. 1, IDs (physical equipment IDs, server IDs, and software IDs) are assigned to the parts of the communication network 10.
Here, the physical equipment IDs and software IDs are chosen so that each ID indicates the correlation between the physical equipment or software it identifies and the other physical equipment, software, or servers that have a parent-child or connection relationship with it.

FIG. 13 is a diagram showing the example of FIG. 1 with such correlated IDs used as the physical equipment IDs and software IDs. For example, for the link 13 whose physical equipment ID is "l1:sv01:s1", the "l1" part of the ID denotes the link 13 itself, the following "sv01" part is the ID of the server 11 connected to the link 13, and the "s1" part is the ID of the switch 12 connected to the link 13.
With such physical equipment IDs and software IDs, the other physical equipment, software, or servers that have a parent-child or connection relationship with the identified element can be recognized from the ID itself.
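Reading relationships out of such an ID reduces to splitting on the colon. A minimal sketch, assuming the first field is the element's own ID and the remaining fields are its related elements, as in the "l1:sv01:s1" example; the helper names `parse_correlated_id` and `are_related` are hypothetical:

```python
def parse_correlated_id(correlated_id):
    """Split "own:rel1:rel2" into the element's own ID and its related IDs."""
    own, *related = correlated_id.split(":")
    return own, related


def are_related(id_a, id_b):
    """True if either correlated ID names the other element as related."""
    own_a, related_a = parse_correlated_id(id_a)
    own_b, related_b = parse_correlated_id(id_b)
    return own_a in related_b or own_b in related_a


link = parse_correlated_id("l1:sv01:s1")
pair = are_related("b2:vs2", "vs2:b1:b2")
```

The sketch does not distinguish a parent-child relationship from a connection relationship, since the ID format shown here lists related elements without labeling the kind of relation.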

FIG. 14 is an explanatory diagram of the processing executed using the IDs of FIG. 13. In this example, the current result obtained by the processing of FIG. 12 is that the counts for the physical equipment or software whose physical equipment ID or software ID is "b2", "vs2", and "l2" (FIG. 14 shows only the part of each ID that corresponds to the example of FIG. 1) are all "25". In other words, plural physical equipment or software have been estimated as the cause of the service influence. In this case, when a parent-child or connection relationship exists among the plural physical equipment or software estimated this time, or between the physical equipment or software estimated in the previous processing and those estimated in the current processing, the component extraction unit 63 assigns priorities to the counts of the plural physical equipment or software estimated this time.

In the example of FIG. 14, the application software 22 with software ID "b2" ("b2:vs2" in FIG. 13) and the virtual switch 21 with software ID "vs2" ("vs2:b1:b2" in FIG. 13) have a parent-child relationship (the former is the child and the latter the parent), so the priority of the parent, the virtual switch 21 with software ID "vs2", is raised by 1. In addition, the previous result for the link 13 with physical equipment ID "l2" ("l2:sv02:s1" in FIG. 13) and the current result for the virtual switch 21 with software ID "vs2" have a direct connection relationship. The influence of the link 13 with physical equipment ID "l2" is therefore presumed to have spread, and the priority of the current result for the virtual switch 21 with software ID "vs2" is lowered by 0.5.
For the switch 12 with physical equipment ID "s3" ("s3:l4:l5" in FIG. 13), there is no parent-child or connection relationship between the previous result and the current result, so the switch 12 is presumed to have failed on its own and its priority is raised by 1.

As a result, the final result is "25" for the application software 22 with software ID "b2" ("b2:vs2" in FIG. 13), "25.5" for the virtual switch 21 with software ID "vs2" ("vs2:b1:b2" in FIG. 13), and "26" for the switch 12 with physical equipment ID "s3" ("s3:l4:l5" in FIG. 13).

Through the above processing, the physical equipment and software are searched in descending order of final result: first the switch 12 with physical equipment ID "s3" ("s3:l4:l5" in FIG. 13), then the virtual switch 21 with software ID "vs2" ("vs2:b1:b2" in FIG. 13), and then the application software 22 with software ID "b2" ("b2:vs2" in FIG. 13).
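The arithmetic of the FIG. 14 example can be checked with a short sketch. It only applies the priority adjustments stated in the text (+1, -0.5, +1) to the tied counts and sorts the result; it does not derive the adjustments from the IDs, and the function name `search_order` is hypothetical.

```python
def search_order(counts, adjustments):
    """Apply priority adjustments to the counts and sort in descending order."""
    final = dict(counts)
    for element, delta in adjustments:
        final[element] = final.get(element, 0) + delta
    order = sorted(final, key=lambda element: final[element], reverse=True)
    return order, final


# Tied counts of 25 and the three adjustments described above.
tied_counts = {"b2": 25, "vs2": 25, "s3": 25}
adjustments = [
    ("vs2", +1.0),   # parent of the application software "b2"
    ("vs2", -0.5),   # directly connected to the previous result "l2"
    ("s3", +1.0),    # no relationship: presumed standalone failure
]
order, final = search_order(tied_counts, adjustments)
```

The resulting values are 25 for "b2", 25.5 for "vs2", and 26 for "s3", which gives the search order s3, vs2, b2 stated in the text.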

The present embodiment described above provides a service influence cause estimation device 1, a service influence cause estimation program 45, and a service influence cause estimation method that can also detect software failures and degradation, are simple to operate, require only a relatively small-scale apparatus, and can estimate the location causing degradation of service quality.

1 Service influence cause estimation device
45 Service influence cause estimation program
50 Storage unit
60 Processing unit (estimation unit)
61 Model generation unit
70 Management unit (estimation unit)
631 First cause identifying unit
632 Second cause identifying unit
721 Group configuration unit
741 Test packet generation unit
742 List generation unit
743 Packet transmission unit
744 Packet reception unit
745 Reply storage unit

Claims

For a flow configured using at least one or more of physical equipment and software for transferring data on a communication network, a flow ID for identifying the flow, a physical equipment ID for identifying the physical equipment, and the physical equipment A storage unit that associates and stores a server ID that identifies the server and a software ID that identifies the software used in each server;
With reference to the storage unit, a model generation unit that generates a flow model that is a model for specifying the ID of the physical facility and software through which data flows for each flow and the order in which the data flows;
An estimation unit that compares the flow models and estimates the physical equipment or the software that causes a service influence on the communication network from the comparison result, and
The estimation unit includes
A list generation unit for generating a list capable of storing the software ID;
A test packet generator for generating a test packet having the list;
A packet transmitter that transmits a predetermined number of the test packets within a predetermined time for each flow;
A packet receiving unit that receives a reply packet of the test packet in which the ID of the software that the test packet has passed is stored in the list;
A reply storage unit for storing the received reply packet;
With
A group configuration unit that classifies each flow into a normal group and an abnormal group based on a measurement result of reception of the reply packet stored in the reply storage unit;
The group component is
Classifies, as a normal group, the flows for which the measured response time and the count of replies obtained as the measurement result are each within their respective predetermined values and for which all software IDs match when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, classifies the other flows as an abnormal group, and, for the abnormal group, classifies into a failure group the flows in which at least one software ID does not match, either when the measured response time and the count are each within the range of their respective predetermined values or when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, and classifies the other flows into a performance degradation group,
The estimation unit includes
Compare the flow models of each flow within the performance degradation group or within the failure group, extract the common physical equipment or software, and estimate the extracted physical equipment or software as the cause of the service impact A first cause identifying unit;
The flow model of each flow is compared between the performance degradation group or the failure group and the normal group, and the common physical equipment or the software is excluded from candidates for the cause of the service influence and remains. Extracting the physical equipment or the software, and estimating the extracted physical equipment or the software as a cause of the service influence;
A service influence cause estimation device characterized by comprising:

The service influence cause estimation device according to claim 1, wherein the predetermined value used for the response time in the group configuration unit is obtained by taking an average, or by applying a statistical method, to the response times of the predetermined number of test packets transmitted within the predetermined time, and the predetermined value used for the count is obtained by taking an average, or by applying a statistical method, to the counts for the predetermined number of test packets transmitted within the predetermined time.

The service influence cause estimation device according to claim 1, wherein the estimation unit uses the first cause identifying unit when the number of flows assigned to the failure group or the performance degradation group is equal to or greater than a predetermined threshold for that group, and, when the number is less than the predetermined threshold, uses the first cause identifying unit and then estimates the cause of the service influence using the second cause identifying unit.

The service influence cause estimation device according to one item, wherein the estimation unit compares the flow models of the flows within the performance degradation group or within the failure group and counts the number of the common physical equipment or software, then compares the flow models of the flows between the performance degradation group or failure group so compared and the normal group and sets the count to 0 for the common physical equipment or software, and finally extracts the physical equipment or the software having the largest count and estimates the extracted physical equipment or software as the cause of the service influence.

The service influence cause estimation device according to claim 4, wherein the storage unit uses, as the physical equipment ID and the software ID, IDs that indicate the correlation between the physical equipment or software identified by the ID and other physical equipment, software, or servers that have a parent-child relationship or a connection relationship with that physical equipment or software, and
the estimation unit, when a plurality of the physical equipment or software are estimated as the cause of the service influence, assigns a priority to the counts of the plurality of physical equipment or software estimated this time if a parent-child relationship or a connection relationship exists among the plurality of physical equipment or software, or between the physical equipment or software estimated in the previous estimation and the physical equipment or software estimated in the current estimation.

For a flow configured using at least one or more of physical equipment and software for transferring data on a communication network, a flow ID for identifying the flow, a physical equipment ID for identifying the physical equipment, and the physical equipment The physical facility and the software in which data flows for each flow with reference to a storage unit that associates and stores a server ID that identifies the server and a software ID that identifies the software used in each server A model generation process for generating a flow model which is a model for specifying the ID and the order in which the data flows;
An estimation process for comparing the flow models and estimating the physical equipment or the software that causes a service influence on the communication network from the comparison result;
To the computer,
The estimation process includes
A list generation process for generating a list capable of storing the software ID;
A test packet generation process for generating a test packet with the list;
Packet transmission processing for transmitting a predetermined number of the test packets within a predetermined time for each flow;
A packet reception process for receiving a reply packet of the test packet in which the ID of the software that the test packet has passed is stored in the list;
A reply storing process for storing the received reply packet;
To the computer,
Causing a computer to execute a group configuration process for classifying each flow into a normal group and an abnormal group based on a measurement result of reception of the reply packet stored in the reply storage process;
The group configuration process includes:
Classifies, as a normal group, the flows for which the measured response time and the count of replies obtained as the measurement result are each within their respective predetermined values and for which all software IDs match when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, classifies the other flows as an abnormal group, and, for the abnormal group, classifies into a failure group the flows in which at least one software ID does not match, either when the measured response time and the count are each within the range of their respective predetermined values or when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, and classifies the other flows into a performance degradation group,
The estimation process includes
Compare the flow models of each flow within the performance degradation group or within the failure group, extract the common physical equipment or software, and estimate the extracted physical equipment or software as the cause of the service impact A first cause identification process;
The flow model of each flow is compared between the performance degradation group or the failure group and the normal group, and the common physical equipment or the software is excluded from candidates for the cause of the service influence and remains. Extracting the physical equipment or the software, and estimating the extracted physical equipment or the software as a cause of the service influence;
A computer-readable service influence cause estimation program characterized by causing a computer to execute.

For a flow configured using at least one or more of physical equipment and software for transferring data on a communication network, a flow ID for identifying the flow, a physical equipment ID for identifying the physical equipment, and the physical equipment The physical facility and the software in which data flows for each flow with reference to a storage unit that associates and stores a server ID that identifies the server and a software ID that identifies the software used in each server A model generation step for generating a flow model, which is a model for specifying the ID and the order in which the data flows;
An estimation step of comparing the flow models and estimating the physical equipment or the software that causes a service influence on the communication network from the comparison result;
With
The estimation step includes
A list generation step of generating a list capable of storing the software ID;
A test packet generating step for generating a test packet with the list;
A packet transmission step of transmitting a predetermined number of the test packets for each flow within a predetermined time;
A packet receiving step of receiving a reply packet to the test packet, the list of the reply packet storing the software IDs of the software through which the test packet has passed;
A reply storing step for storing the received reply packet;
and
A group configuration step of classifying each flow into a normal group and an abnormal group based on a measurement result of reception of the reply packet stored in the reply storage step;
The group configuration step includes:
Classifying, into a normal group, each flow for which the response time and the reply count obtained as the measurement results are each within a respective predetermined range and for which, when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, all of the software IDs match, and classifying the other flows into an abnormal group; and
Classifying, within the abnormal group, each flow for which the measured response time and reply count are each within a respective predetermined range, or for which at least one software ID does not match when the software IDs acquired from the storage unit are compared with the software IDs stored in the list of the reply packet, into a failure group, and classifying the other flows into a performance degradation group,
The estimation step includes
A first cause identification step of comparing the flow models of the flows within the performance degradation group or within the failure group, extracting the physical equipment or the software common to those flows, and estimating the extracted physical equipment or software as the cause of the service impact;
A second cause identification step of comparing the flow models of the flows between the performance degradation group or the failure group and the normal group, excluding the physical equipment or the software common to both from the candidates for the cause of the service impact, extracting the physical equipment or the software that remains, and estimating the extracted physical equipment or software as the cause of the service impact;
A service impact cause estimation method characterized by comprising the steps set out above.
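The model generation step and the test packet list recited in the method above can be illustrated with a short Python sketch. It rests on assumptions rather than on the claimed implementation: the row layout of the storage unit (including the hop order column), the names `generate_flow_model`, `stored_software_ids`, `ProbePacket`, and `send_test_packet`, and all IDs are hypothetical. The network is simulated, and the transmission of a predetermined number of packets within a predetermined time and the measurement of response time are omitted.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical storage-unit row:
# (flow ID, hop order, physical equipment ID, server ID or None, software ID or None)
Row = Tuple[str, int, str, Optional[str], Optional[str]]

STORAGE: List[Row] = [
    ("flow-1", 2, "SV-1", "SV-1", "soft-A"),
    ("flow-1", 1, "L2SW-1", None, None),
    ("flow-1", 3, "SV-2", "SV-2", "soft-B"),
    ("flow-2", 1, "L2SW-1", None, None),
]


def _hops(rows: List[Row], flow_id: str) -> List[Row]:
    """Rows of one flow, sorted by the order in which the data flows."""
    return sorted((r for r in rows if r[0] == flow_id), key=lambda r: r[1])


def generate_flow_model(rows: List[Row], flow_id: str) -> List[str]:
    """Ordered IDs of the equipment and software that the flow traverses."""
    model: List[str] = []
    for _, _, equipment_id, _, software_id in _hops(rows, flow_id):
        model.append(equipment_id)
        if software_id is not None:
            model.append(software_id)
    return model


def stored_software_ids(rows: List[Row], flow_id: str) -> List[str]:
    """Software IDs that the storage unit associates with the flow, in order."""
    return [r[4] for r in _hops(rows, flow_id) if r[4] is not None]


@dataclass
class ProbePacket:
    """Test packet carrying a list that can store software IDs."""
    flow_id: str
    software_ids: List[str] = field(default_factory=list)


def send_test_packet(rows: List[Row], flow_id: str,
                     unresponsive: FrozenSet[str] = frozenset()) -> ProbePacket:
    """Simulate one test packet; each responsive software appends its ID."""
    packet = ProbePacket(flow_id)
    for software_id in stored_software_ids(rows, flow_id):
        if software_id not in unresponsive:
            packet.software_ids.append(software_id)
    return packet


if __name__ == "__main__":
    print(generate_flow_model(STORAGE, "flow-1"))
    reply = send_test_packet(STORAGE, "flow-1", frozenset({"soft-B"}))
    print(reply.software_ids == stored_software_ids(STORAGE, "flow-1"))
```

Comparing `reply.software_ids` with `stored_software_ids(...)` corresponds to the software ID comparison used in the group configuration step: a missing ID in the reply list is what marks the flow as belonging to the failure group.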