JP2004104540A

JP2004104540A - Support system for analyzing network performance fault

Info

Publication number: JP2004104540A
Application number: JP2002264894A
Authority: JP
Inventors: Yukio Ogawa; 小川　祐紀雄; Eiji Ohira; 大平　栄二; Satoshi Hasegawa; 長谷川　聡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-09-11
Filing date: 2002-09-11
Publication date: 2004-04-02

Abstract

PROBLEM TO BE SOLVED: To provide a network system performance fault analysis support method that can measure response times across an entire network, even in a large-scale network system, and that, when a delay occurs, narrows down the cause part and collects operational information on it.

SOLUTION: For a tree-type configuration part of a network system, the system comprises a means for measuring the IP packet response time on a path from a monitor, through a trunk part, to a network device of a branch line part; a means for estimating, from the measurement results, the response time on a path from a client device of one branch line part to a server device of another branch line part; a means for determining whether a response time delay has occurred; a means for automatically narrowing down the cause part of the delay by comparing the delay states of a plurality of paths; and a means for collecting operational information, taking the network devices located at the narrowed-down part as the collection targets.

COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、ネットワークシステム運用管理方法に関し、さらに詳しくは、ネットワーク機器の応答時間を監視することにより性能管理を行い、ネットワークシステム稼動状態の診断、障害部位分析、予防保守を行う方法に関する。
【０００２】
【従来の技術】
ネットワークシステムの管理者やシステムエンジニアは、ネットワークの性能を把握するために、通常、ネットワーク機器間における応答時間の測定を行う。応答時間の測定には、ＩＣＭＰ（Ｉｎｔｅｒｎｅｔ　Ｃｏｎｔｒｏｌ　Ｍｅｓｓａｇｅ　Ｐｒｏｔｏｃｏｌ）エコーの要求／応答時間（ｐｉｎｇコマンド）を利用したＩＰパケットの応答時間測定方法が、広く利用されている。
【０００３】
特許文献１には、ネットワークシステムにおいて遅延個所を分離するための方法が開示されている。
【０００４】
また、特許文献２には、ネットワークシステムにおいて遅延個所を自動的に調査するための方法が開示されている。
【０００５】
【特許文献１】
特開平１１−３４６２３８号公報
【特許文献２】
特開２００２−１５２２０３号公報
【０００６】
【発明が解決しようとする課題】
特許文献１では、遅延個所を分離するためにネットワークシステム上に多数の応答時間測定プローブを設置する必要がある。
【０００７】
また、特許文献２では、遅延個所を分離するために、経路を特定の上、経路上の各ノードからの応答時間を測定する必要がある。
【０００８】
これらの従来方法は、数千台以上のネットワーク機器からなるような大規模なネットワークシステム全体をカバーする応答時間の測定行い、遅延個所の絞込みを行う場合においては効率的な方法ではない。
【０００９】
【課題を解決するための手段】
本発明は、数千台以上のネットワーク機器からなるような大規模なネットワークシステムにおいても、システム全体におけるクライアントからサーバへ至る経路での応答時間を推定できるネットワーク性能障害分析支援方法を提供する。
【００１０】
また、本発明は、数千台以上のネットワーク機器からなるような大規模なネットワークシステムにおいても遅延発生時に原因部位を自動的に絞り込むことができるネットワーク性能障害分析支援方法を提供する。
【００１１】
また、本発明は、数千台以上のネットワーク機器からなるような大規模なネットワークシステムにおいても遅延発生時に原因部位のみの稼動情報を自動的に収集することができるネットワーク性能障害分析支援方法を提供する。
【００１２】
具体的には、本発明は、幹線部の中継拠点から支線部の複数のネットワーク機器に至るツリー型のネットワークの、幹線部に接続した監視装置から中継拠点を通り支線部のいずれかのネットワーク機器に至る経路におけるＩＰパケット応答時間および到達率の測定を行う応答時間測定手段と、支線部分に設置したクライアント機器から別の支線部分に設置したサーバ機器に至る経路における応答時間を推定するサーバ・クライアント応答時間推定手段を設けている。
【００１３】
また、本発明は、応答時間に遅延が生じているかを判定する応答時間確認手段と、遅延発生の原因部位を自動的に絞り込む遅延原因部位絞込み手段とをさらに設けている。
【００１４】
また、本発明は、自動的に絞り込んだ遅延発生の原因部位に設置されたネットワーク機器を稼動情報収集対象機器とし、さらに、収集する稼動情報の種別、収集周期、収集期間を決定して稼動情報収集のための設定ファイルを作成する稼動情報収集設定ファイル作成手段と、作成した設定ファイルを監視装置内の稼動情報収集手段に対して再設定する稼動情報収集設定ファイル変更手段と、稼動情報収集手段を起動する稼動情報収集手段起動手段と、設定ファイルに従ってネットワーク機器から稼動情報を収集する稼動統計情報収集手段と、収集した稼動情報を格納、保存する稼動情報保存手段を設けている。
【００１５】
さらに、本発明は、現在稼動中の稼動情報収集手段の設定ファイルに対して部分的な設定変更を行う稼動情報収集設定ファイル変更手段を設けている。
【００１６】
本発明は以上の構成を備えているので、大規模なネットワークシステムにおいても、システム全体に対し少ない監視経路でクライアントからサーバへ至る経路での応答時間を推定することができる。また、大規模なネットワークシステムにおいても、遅延発生時に原因部位を自動的に絞り込むことができる。また、大規模なネットワークシステムにおいても、遅延発生時に原因部位のみの稼動情報を自動的に収集することができる。
【００１７】
【発明の実施の形態】
以下、図を参照して本発明の実施形態を説明する。
【００１８】
図１は、本発明の一実施形態にかかるネットワーク性能障害分析支援システムの機能構成例である。図１を参照しながら、ネットワーク性能障害分析支援システムのハードウェア構成および機能構成を説明する。
【００１９】
ネットワーク機器１１９は、ルータ、ＡＴＭ交換機、スイッチングハブ、インテリジェントハブなどの機器であり、ＩＰパケット応答時間測定のためのＩＣＭＰエコー応答機能、および、稼動情報測定のためのＳＮＭＰ（Ｓｉｍｐｌｅ　Ｎｅｔｗｏｒｋ　Ｍａｎａｇｅｍｅｎｔ　Ｐｒｏｔｏｃｏｌ　）エージェント機能を備えている。また、サーバやクライアントが、ＩＣＭＰエコー応答機能およびＳＮＭＰエージェント機能を備えている場合は、ネットワーク機器と見なすことが出来る。
【００２０】
ネットワーク監視装置１２０、ネットワーク情報表示装置１２１は、一般的なパーソナルコンピュータであり、ＣＰＵ（Ｃｅｎｔｒａｌ　Ｐｒｏｃｅｓｓｉｎｇ　Ｕｎｉｔ　）がプログラム命令を解釈し実行することができる。
【００２１】
応答時間測定処理部１０２は、ネットワーク監視装置１２０内にあり、監視装置１２０からネットワーク機器１１９に至る経路における応答時間およびＩＰパケット到達率１０１を周期的に測定する。
【００２２】
サーバ・クライアント応答時間推定処理部１０３は、ネットワーク監視装置１２０内にあり、監視装置１２０からサーバに至る経路における応答時間、および、監視装置１２０からクライアントに至る経路における応答時間から、クライアントからサーバに至る経路における応答時間を推定する。推定方法の詳細については後述する。
【００２３】
応答時間格納処理部１０４は、ネットワーク監視装置１２０内にある記憶装置であり、応答時間測定処理部１０２により測定された応答時間情報およびパケット到達率情報１０１を格納、蓄積する。
【００２４】
応答時間表示処理部１０５は、ネットワーク監視装置１２０内にあり、応答時間測定処理部１０２により測定された応答時間情報およびパケット到達率情報１０１をネットワーク情報表示装置を通じて表示する。
【００２５】
応答時間確認処理部１０８は、ネットワーク監視装置１２０内にあり、監視装置１２０から個々のネットワーク機器１１９に至る経路における応答時間およびＩＰパケット到達率１０１が、それぞれに設定した閾値以上であるか判定する。
【００２６】
遅延部位絞込み処理部１０９は、ネットワーク監視装置１２０内にあり、監視装置１２０から個々のネットワーク機器１１９に至る経路における応答時間およびＩＰパケット到達率１０１において、閾値以上の値が検知された場合に、遅延の原因となるネットワーク部位を自動的に絞り込む。絞り込み方法の詳細については後述する。
【００２７】
稼動情報収集設定ファイル作成処理部１１０は、ネットワーク監視装置１２０内にあり、自動的に絞り込んだ遅延原因部位に設置されたネットワーク機器１１９を稼動情報収集対象機器とし、さらに、収集する稼動情報の種別、収集周期、収集期間を決定して稼動情報収集のための設定ファイルを作成する。
【００２８】
稼動情報収集設定ファイル変更処理部１１１は、ネットワーク監視装置１２０内にあり、作成した設定ファイルを監視装置１２０内の稼動情報収集処理部１１５に対して再設定する。また、稼動情報収集設定ファイル変更処理部１１１は、作成した設定ファイルをもとに現在稼動中の稼動情報収集処理部の設定ファイルに対して部分的な設定変更を行う。
【００２９】
稼動情報収集処理部起動処理部１１２は、ネットワーク監視装置１２０内にあり、監視装置１２０から個々のネットワーク機器１１９に至る経路における応答時間およびＩＰパケット到達率１０１において、閾値以上の値が検知された場合に、稼動情報収集処理部１１５を起動する。
【００３０】
稼動情報収集処理部１１５は、ネットワーク監視装置１２０内にあり、ネットワーク機器１１９が稼動情報測定処理部１１３により測定したネットワーク稼動情報１１４を、設定ファイルに従って、ネットワーク機器１１９から収集する。
【００３１】
稼動情報格納処理部１１６は、ネットワーク監視装置１２０内にある記憶装置であり、稼動情報収集処理部１１５により収集されたネットワーク稼動情報１１４を格納、蓄積する。
【００３２】
稼動情報表示処理部１１７は、ネットワーク監視装置１２０内にあり、稼動情報収集処理部１１５により収集された稼動情報１１４をネットワーク情報表示装置を通じて表示する。
【００３３】
表示処理部呼び出し処理部１０７は、ネットワーク情報表示装置１２１内にあり、ネットワーク監視装置１２０にある応答時間表示処理部１０５を呼び出すことにより、応答時間情報やパケット到達率情報１０６を表示する。また、ネットワーク監視装置１２０にある稼動情報表示処理部１１７を呼び出すことにより、ネットワーク稼動情報１１８を表示する。　上記各処理部は、上記ＣＰＵがプログラムを実行することにより具現化される。プログラムは、予め記憶装置に格納されていても良いし、記憶媒体または通信媒体を介して他の装置から導入されても良い。
【００３４】
次に、以上の機能構成を持つネットワーク監視装置１２０によるネットワーク性能障害分析支援の例を、図２のフローチャートを利用して説明する。
【００３５】
（ｓｔｅｐ　２０１）
応答時間測定処理部１０２により、監視装置１２０からネットワーク機器１１９に至る経路における応答時間およびＩＰパケット到達率を、１０分毎や５毎という設定周期に従い、測定する。
【００３６】
ここで、監視対象とする数千台以上のネットワーク機器からなるような大規模なネットワークシステムの代表的な構成例を、図３に示す。数百オーダのサーバと数千オーダ以上のクライアントからなる大規模なネットワークシステムおいては、ネットワークの拡張性や回線コストの観点から、ネットワークにハブとなる中継拠点３０３、３０４を設置し、ここで回線を集約するトポロジーとすることが多い。また、信頼性の観点から、中継拠点３０３、３０４を複数設置し、クライアント３３０〜３３７からサーバ３２０〜３２３までの経路が２重系となるようにしている。これは、サーバ３２０〜３２３の設置されたデータセンタ３０２、基幹となる中継拠点３０３、３０４、クライアント３３０〜３３７の設置された支店３０５〜３０８からなる階層型ネットワークトポロジーであるが、見方を変えれば、中継拠点３０３、３０４を幹線部としサーバ３２０〜３２３側を支線部とするツリー型構成と、中継拠点３０３、３０４を幹線部としクライアント３３０〜３３７側を支線部とするツリー型構成が組み合わさったネットワーク構成である。
【００３７】
クライアント３３０〜３３７からサーバ３２０〜３２３に至る経路においてＩＰパケットの応答時間を測定する場合に、通常、クライアントにおいてサーバをターゲットとしてＩＣＭＰエコーの要求／応答時間（ｐｉｎｇコマンド）を測定する。しかしながら、上記のような構成のネットワークにおいては、全サーバ３２０〜３２３から全クライアント３３０〜３３７に至る経路における応答時間を測定するためには、サーバ３２０〜３２３数をｍ台、クライアント３３０〜３３７数をｎ台としたときに、
ｍ×ｎ
だけの経路数の監視をする必要がある。この場合、監視トラフィック量も多くなるため、通常のトラフィックの妨げになる恐れがある。また、監視装置をクライアント３３０〜３３７毎あるいはサーバ３２０〜３２３毎に分散して設置しなければならないので、管理上の問題が生じる。
【００３８】
本実施例では、ネットワーク全体をカバーするための方法として、中継拠点３０３、３０４を中心とする監視方法を採用する。これは、図３において点線の矢印で示した監視経路３７０〜３７６のように、ネットワークのツリー型構成部分において、監視センタ３０１内の監視装置ＮＭＳ３１０から幹線部（中継拠点３０３、３０４）を通り支線部（データセンタ３０２、あるいは、支店３０５〜３０８）のネットワーク機器に至る経路におけるＩＰパケット応答時間および到達率の測定を行う方法である。監視対象数が多く監視装置（３１０）１台でカバーできない場合は、複数台で分担することも可能である。これら複数台の監視装置３１０は、監視センタに一括して設置する。
【００３９】
なお、図３の監視装置３１０は、図１におけるネットワーク監視装置１２０に相当し、応答時間測定処理部１０２、サーバ・クライアント応答時間推定処理部１０３、応答時間格納処理部１０４、応答時間表示処理部１０５、応答時間確認処理部１０８、遅延部位絞込み処理部１０９、稼動情報収集設定ファイル作成処理部１１０、稼動情報収集設定ファイル変更処理部１１１、稼動情報収集処理部起動処理部１１２、稼動情報収集処理部１１５、稼動情報格納処理部１１６、稼動情報表示処理部１１７の各処理部を有する。
【００４０】
また、図３のサーバ３２０〜３２３、クライアント３３０〜３３７、ルータ３４０〜３５９は、図１におけるネットワーク機器１１９に相当する。　この方法においては、クライアント３３０〜３３７からサーバ３２０〜３２３に至る経路におけるＩＰパケット応答時間、例えば、図３における支店３０５のクライアントＣＬ１（３３０）から中継拠点３０３のルータＲ５（３４４）を通りデータセンタ３０２のサーバＳＶ１（３２０）に至る経路におけるＩＰパケット応答時間ｔは、監視装置ＮＭＳ（３１０）から中継拠点のルータＲ５（３４４）を通り支店３０５のクライアントＣＬ１（３３０）に至る経路（監視経路Ｄ（３７３））におけるＩＰパケット応答時間をｔｃ、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通りデータセンタ３０２のサーバＳＶ１（３２０）に至る経路（監視経路Ｂ（３７１））におけるＩＰパケット応答時間をｔｓ、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）に至る経路（監視経路Ａ（３７０））におけるＩＰパケット応答時間をｔ０とした場合、
ｔ＝ｔｃ＋ｔｓ−２×ｔ０
により推定する。
【００４１】
サーバ・クライアント応答時間推定処理部１０３は、この方法に従い、それぞれのサーバ３２０〜３２３とクライアント３３０〜３３７の組み合わせに応じて、サーバ３２０〜３２３からクライアント３３０〜３３７に至る経路における応答時間を推定する。
【００４２】
この方法により、全サーバ３２０〜３２３から全クライアント３３０〜３３７に至る経路における応答時間を測定するためには、サーバ３２０〜３２３数をｍ台、クライアント３３０〜３３７数をｎ台としたときに、
ｍ＋ｎ
だけの経路数の監視で可能である。従って、監視トラフィック量を削減でき、また、監視センタ３０１のみに設置した監視装置３１０からの集中監視により、ネットワーク全体を監視できる。
【００４３】
（ｓｔｅｐ　２０２）
応答時間確認処理部１０８は、応答時間測定処理部１０２により測定した監視装置からネットワーク機器に至る経路における応答時間およびＩＰパケット到達率が、それぞれの監視経路に対して設定した閾値を超えているかどうか判定する。
【００４４】
閾値の設定基準は、以下のとおりである。
【００４５】
・ネットワークの各経路における応答時間設計値
・過去の測定結果における同一の時間帯の平均値、分散値
・過去の測定結果における同一の曜日、時間帯の平均値、分散値
・過去の測定結果における同一の週、曜日、時間帯の平均値、分散値
・過去の測定結果における同一の日付、時間帯の平均値、分散値
（ｓｔｅｐ　２０３）
遅延原因部位絞込み処理部１０９は、少なくとも一つの監視経路において応答時間およびＩＰパケット到達率が、それぞれの監視経路に対して設定した閾値を超えている場合には、遅延発生の原因部位を自動的に絞込み、その部位を稼動情報の収集対象とする。
【００４６】
図３および図４を用いて、遅延部位絞込み処理部１０９による、遅延発生の原因部位の絞込み方法を解説する。中継拠点３０３、３０４を幹線部とするツリー型構成において、基幹から支線部に至る経路での応答時間の監視を行っているとき、基幹部に近いネットワーク機器やインターフェースが原因で応答時間に遅延が発生した場合、基幹を通る複数の監視経路において、遅延が検知されるはずである。一方、支線部に近いネットワーク機器やインターフェースが原因で応答時間に遅延が発生した場合、支線を通る少数の監視経路においてのみ、遅延が検知されるはずである。従って、ツリー型構成における基幹から支線部に至る複数経路での応答時間の比較をおこない、応答時間の測定結果より原因個所の推定を行うことにより、遅延の原因部位を絞り込むことが可能である。
【００４７】
例えば、図３に示すクライアントＣＬ１（３３０）、クライアントＣＬ３（３３２）、クライアントＣＬ５（３３４）、クライアントＣＬ７（３３６）から、サーバＳＶ１（３２０）およびサーバＳＶ３（３２２）に至るそれぞれの通信経路における応答時間を、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）に至る経路（監視経路Ａ（３７０））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通りデータセンタ３０２のサーバＳＶ１（３２０）に至る経路（監視経路Ｂ（３７１））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通りデータセンタ３０２のサーバＳＶ３（３２２）に至る経路（監視経路Ｃ（３７２））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通り支店３０５のクライアントＣＬ１（３３０）に至る経路（監視経路Ｄ（３７３））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通り支店３０６のクライアントＣＬ３（３３２）に至る経路（監視経路Ｅ（３７４））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通り支店３０７のクライアントＣＬ５（３３４）に至る経路（監視経路Ｆ（３７５））における応答時間、監視装置ＮＭＳ（３１０）から中継拠点３０３のルータＲ５（３４４）を通り支店３０８のクライアントＣＬ７（３３６）に至る経路（監視経路Ｇ（３７６））における応答時間を測定し前述の方法に従うことにより推定しているとする。
【００４８】
いずれかの監視経路Ａ（３７０）〜Ｇ（３７６）において遅延が検知された場合、監視装置ＮＭＳ３１０は、各経路Ａ（３７０）〜Ｇ（３７６）の遅延状態を組み合わせ、比較する。各監視経路Ａ（３７０）〜Ｇ（３７６）における遅延検知状態の組み合わせを図４の表４０１に表す。
【００４９】
中継拠点３０３のルータＲ５（３４４）からデータセンタ３０２のサーバＳＶ１（３２０）およびサーバＳＶ３（３２２）に至る経路における遅延時間の比較において、表４０１の列４０３のように監視経路Ａ（３７０）において遅延が検知された場合、遅延の原因部位はルータＲ５（３４４）およびそのインタフェースＩＦ１（３６０）付近と絞り込むことができる。表４０１の列４０４のように監視経路Ａ（３７０）においては正常、監視経路Ｂ（３７１）および監視経路Ｃ（３７２）において遅延が検知された場合、遅延の原因部位はルータＲ５（３４４）およびそのインタフェースＩＦ２（３６１）付近と絞り込むことができる。表４０１の列４０５のように監視経路Ａ（３７０）および監視経路Ｃ（３０２）においては正常、監視経路Ｂ（３７１）において遅延が検知された場合、遅延の原因部位はルータＲ１（３４０）およびその全インタフェース付近と絞り込むことができる。表４０１の列４０６のように監視経路Ａ（３７０）および監視経路Ｂ（３７１）においては正常、監視経路Ｃ（３７２）において遅延が検知された場合、遅延の原因部位はルータＲ３（３４２）およびその全インタフェース付近と絞り込むことができる。
【００５０】
同様に、中継拠点３０３のルータＲ５（３４４）から各支店３０５〜３０８のクライアントＣＬ１（３３０）、クライアントＣＬ３（３３２）、クライアントＣＬ５（３３４）、クライアントＣＬ７（３３６）に至る経路における遅延時間の比較において、表４０１の列４０７のように監視経路Ａ（３７０）、監視経路Ｅ（３７４）、監視経路Ｆ（３７５）および監視経路Ｇ（３７６）においては正常、監視経路Ｄ（３７３）において遅延が検知された場合、遅延の原因部位はルータＲ１３（３５２）およびその全インタフェース付近と絞り込むことができる。表４０１の列４０８のように監視経路Ａ（３７０）、監視経路Ｆ（３７５）および監視経路Ｇ（３７６）においては正常、監視経路Ｄ（３７３）および監視経路Ｅ（３７４）において遅延が検知された場合、遅延の原因部位はルータＲ７（３４６）およびそのインタフェースＩＦ５（３６４）付近と絞り込むことができる。表４０１の列４０９のように監視経路Ａ（３７０）および監視経路Ｇにおいては正常、監視経路Ｄ、監視経路Ｅおよび監視経路Ｆにおいて遅延が検知された場合、遅延の原因部位はルータＲ７およびそのインタフェースＩＦ４付近と絞り込むことができる。表４０１の列４１０のように監視経路Ａ（３７０）においては正常、監視経路Ｄ（３７３）、監視経路Ｅ（３７４）、監視経路Ｆ（３７５）および監視経路Ｇ（３７６）において遅延が検知された場合、遅延の原因部位はルータＲ５（３４４）およびそのインタフェースＩＦ３（３６２）付近と絞り込むことができる。
【００５１】
以上のように、ネットワークのツリー型構成部分において、監視装置３１０から幹線部を通り支線部のネットワーク機器に至る複数経路での遅延状態を比較することにより、遅延発生の原因部位を絞り込むこが可能である。図４の各監視経路Ａ（３７０）〜Ｇ（３７６）における遅延状態の比較表４０１とそれに対応する原因部位の絞込み結果４０２を予め記述しておけば、遅延原因部位絞込み処理部１０９は、それらの表の対応関係を参照することにより、各監視経路Ａ（３７０）〜Ｇ（３７６）の遅延状態に応じて自動的に原因部位を絞り込むことができる。遅延原因の調査を行うため遅延原因部位の稼動情報を収集する必要があるが、遅延原因部位絞込み処理部１０９は、絞り込んだ遅延の原因部位を、稼動情報の収集対象に設定する。
【００５２】
（ｓｔｅｐ　２０４）
稼動情報収集設定ファイル作成処理部１１０は、稼動情報収集のための設定項目として収集情報種別を決定する。
【００５３】
ネットワーク稼動情報の収集情報種別は、ルータやレイヤー３スイッチやＡＴＭスイッチなどのネットワーク機器に対しては、ＣＰＵ利用率、空きメモリ量とする。またそれらのインターフェースに対しては、入出力トラフィック量、入出力パケット数、入出力パケット廃棄数、入出力エラーバケット数、コリジョン数とする。
【００５４】
（ｓｔｅｐ　２０５）
稼動情報収集設定ファイル作成処理部１１０は、稼動情報収集のための設定項目として収集周期を決定する。
【００５５】
ネットワーク稼動情報の収集周期は、１分というように予め設定した値を利用するか、通常の長期的傾向把握のための定期的な稼動情報収集の周期の１０分の１というように設定する。
【００５６】
（ｓｔｅｐ　２０６）
稼動情報収集設定ファイル作成処理部１１０は、稼動情報収集のための設定項目として収集期間を決定する。
【００５７】
ネットワーク稼動情報の収集周期は、３０分というように予め設定した値を利用するか、応答時間が閾値を超えていた監視経路において、その後の応答時間測定結果が閾値以下になるまでとする。
【００５８】
（ｓｔｅｐ　２０７）
稼動情報収集設定ファイル作成処理部１１０は、ｓｔｅｐ２０３からｓｔｅｐ２０６での決定事項に基づき、稼動情報収集処理部１１５の設定ファイルを作成する。
【００５９】
（ｓｔｅｐ　２０８）
稼動情報収集設定ファイル変更処理部１１１は、稼動情報収集設定ファイル作成処理部１１０が作成した設定ファイルを、稼動情報収集処理部１１５の設定ファイルに上書きするか、部分的に変更を加える。また、既にネットワークの定期的稼動情報収集のための処理部が動作している場合は、その設定ファイルに対して、部分的な変更を加えることも可能である。
【００６０】
（ｓｔｅｐ　２０９）
稼動情報収集処理部起動処理部１１２は、設定ファイルの変更をされた稼動情報収集処理部１１５を起動する。稼動情報収集処理部１１５は、再設定された設定ファイルに従い、ネットワーク稼動情報を収集する。
【００６１】
本実施例は以上のｓｔｅｐを監視装置において実施することにより、本実施例は以上の構成を備えているので、大規模なネットワークシステムにおいても、システム全体に対し少ない監視経路、少ない監視トラフィック量でクライアントからサーバへ至る経路での応答時間を推定することができる。さらに、大規模なネットワークシステムにおいても、遅延発生時に原因部位を自動的に絞り込むことができる。さらに、大規模なネットワークシステムにおいても、遅延発生時に原因部位のみの稼動情報を自動的に収集するというような効率的な情報収集が可能である。
【００６２】
【発明の効果】
本発明によれば、大規模なネットワークシステムにおいても、効率的に、応答時間の測定や遅延個所の絞込みを行うことが可能になる。
【図面の簡単な説明】
【図１】本実施形態のシステム構成図である。
【図２】本実施形態のネットワーク監視装置による性能障害分析支援処理の流れである。
【図３】本実施形態のネットワーク論理構成図および応答時間監視経路の例である。
【図４】本実施形態のネットワーク監視装置による応答時間測定結果からの遅延部位絞込みの例である。
【符号の説明】
１０１……応答時間情報・パケット到達率情報、１０２……応答時間測定処理部、１０３……サーバ・クライアント応答時間推定処理部、１０４……応答時間格納処理部、１０５……応答時間表示処理部、１０６……応答時間情報・パケット到達率情報、１０７……表示処理部呼び出し処理部、１０８……応答時間確認処理部、１０９……遅延部位絞込み処理部、１１０……稼動情報収集設定ファイル作成処理部、１１１……稼動情報収集設定ファイル変更処理部、１１２……稼動情報収集処理部起動処理部、１１３……稼動情報測定処理部、１１４……ネットワーク稼動情報、１１５……稼動情報収集処理部、１１６……稼動情報格納処理部、１１７……稼動情報表示処理部、１１８……ネットワーク稼動情報、１１９……ネットワーク機器、１２０……ネットワーク監視装置、１２１……ネットワーク情報表示装置。
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a network system operation management method, and more particularly, to a method for performing performance management by monitoring response times of network devices, diagnosing a network system operation state, performing a failure site analysis, and performing preventive maintenance.
[0002]
[Prior art]
Network system administrators and system engineers usually measure the response time between network devices to understand network performance. A widely used approach measures the IP packet response time from the request/response time of an ICMP (Internet Control Message Protocol) echo (the ping command).
[0003]
Patent Document 1 discloses a method for isolating a delay point in a network system.
[0004]
Patent Document 2 discloses a method for automatically investigating a delay point in a network system.
[0005]
[Patent Document 1]
JP-A-11-346238
[Patent Document 2]
JP-A-2002-152203
[0006]
[Problems to be solved by the invention]
The method of Patent Document 1 requires installing a large number of response time measurement probes on the network system in order to isolate a delay point.
[0007]
The method of Patent Document 2 requires identifying the route and then measuring the response time from each node on that route in order to isolate a delay point.
[0008]
These conventional methods are inefficient when the response time must be measured across an entire large-scale network system made up of thousands of network devices or more and the delay point must then be narrowed down.
[0009]
[Means for Solving the Problems]
The present invention provides a network performance failure analysis support method capable of estimating a response time on a path from a client to a server in a large-scale network system including thousands or more network devices.
[0010]
Further, the present invention provides a network performance failure analysis support method that can automatically narrow down a cause part when a delay occurs even in a large-scale network system including thousands or more network devices.
[0011]
Also, the present invention provides a network performance failure analysis support method that, when a delay occurs, can automatically collect operation information from the cause part only, even in a large-scale network system including thousands or more network devices.
[0012]
More specifically, in a tree-type network extending from a relay base in the trunk part to a plurality of network devices in the branch part, the present invention provides response time measuring means for measuring the IP packet response time and arrival rate on a path from a monitoring device connected to the trunk part, through the relay base, to any of the network devices in the branch part, and server/client response time estimating means for estimating the response time on a path from a client device installed in one branch part to a server device installed in another branch part.
[0013]
Further, the present invention further includes a response time confirming means for determining whether a delay has occurred in the response time, and a delay cause part narrowing means for automatically narrowing down the cause part of the delay.
[0014]
In addition, the present invention provides: operation information collection setting file creating means that treats the network devices installed at the automatically narrowed-down cause part of the delay as operation information collection target devices, determines the type of operation information to collect, the collection cycle, and the collection period, and creates a setting file for operation information collection; operation information collection setting file changing means that resets the created setting file in the operation information collecting means in the monitoring device; operation information collecting means activating means that starts the operation information collecting means; operation statistical information collecting means that collects operation information from the network devices according to the setting file; and operation information storing means that stores and retains the collected operation information.
[0015]
Further, the present invention includes an operation information collection setting file change unit for partially changing the setting file of the currently operating operation information collection unit.
[0016]
Since the present invention has the configuration described above, even in a large-scale network system, it is possible to estimate the response time in a path from a client to a server with a small number of monitoring paths for the entire system. Further, even in a large-scale network system, a cause part can be automatically narrowed down when a delay occurs. In addition, even in a large-scale network system, it is possible to automatically collect operation information of only a cause part when a delay occurs.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0018]
FIG. 1 shows an example of the functional configuration of a network performance failure analysis support system according to an embodiment of the present invention. The hardware configuration and the functional configuration of the system are described with reference to FIG. 1.
[0019]
The network device 119 is a device such as a router, an ATM switch, a switching hub, or an intelligent hub. It has an ICMP echo response function for measuring the IP packet response time and an SNMP (Simple Network Management Protocol) agent function for measuring operation information. A server or a client that has the ICMP echo response function and the SNMP agent function can also be regarded as a network device.
[0020]
The network monitoring device 120 and the network information display device 121 are general personal computers, and a CPU (Central Processing Unit) can interpret and execute program instructions.
[0021]
The response time measurement processing unit 102 is provided in the network monitoring device 120, and periodically measures a response time and an IP packet arrival rate 101 in a path from the monitoring device 120 to the network device 119.
[0022]
The server/client response time estimation processing unit 103 is provided in the network monitoring device 120. It estimates the response time on the path from a client to a server from the response time on the path from the monitoring device 120 to the server and the response time on the path from the monitoring device 120 to the client. Details of the estimation method are described later.
[0023]
The response time storage processing unit 104 is a storage device in the network monitoring device 120, and stores and accumulates the response time information and the packet arrival rate information 101 measured by the response time measurement processing unit 102.
[0024]
The response time display processing unit 105 is provided in the network monitoring device 120, and displays the response time information and the packet arrival rate information 101 measured by the response time measurement processing unit 102 through the network information display device.
[0025]
The response time confirmation processing unit 108 is located in the network monitoring device 120, and determines whether the response time and the IP packet arrival rate 101 on the path from the monitoring device 120 to each network device 119 are equal to or greater than the threshold set for each.
[0026]
The delay part narrowing processing unit 109 is provided in the network monitoring device 120. When a value equal to or greater than the threshold is detected in the response time or the IP packet arrival rate 101 on the path from the monitoring device 120 to any network device 119, it automatically narrows down the network part that is causing the delay. Details of the narrowing method are described later.
[0027]
The operation information collection setting file creation processing unit 110 is provided in the network monitoring device 120. It treats the network devices 119 installed at the automatically narrowed-down delay cause part as the operation information collection target devices, determines the type of operation information to collect, the collection cycle, and the collection period, and creates a setting file for collecting operation information.
[0028]
The operation information collection setting file change processing unit 111 is in the network monitoring device 120 and resets the created setting file to the operation information collection processing unit 115 in the monitoring device 120. The operation information collection setting file change processing unit 111 performs a partial setting change to the setting file of the currently operating operation information collection processing unit based on the created setting file.
[0029]
The operation information collection processing unit activation processing unit 112 is located in the network monitoring device 120. It starts the operation information collection processing unit 115 when a value equal to or greater than the threshold is detected in the response time or the IP packet arrival rate 101 on the path from the monitoring device 120 to any network device 119.
[0030]
The operation information collection processing unit 115 is provided in the network monitoring device 120. According to the setting file, it collects from the network device 119 the network operation information 114 that the network device 119 has measured with its operation information measurement processing unit 113.
[0031]
The operation information storage processing unit 116 is a storage device in the network monitoring device 120, and stores and accumulates the network operation information 114 collected by the operation information collection processing unit 115.
[0032]
The operation information display processing unit 117 is located in the network monitoring device 120, and displays the operation information 114 collected by the operation information collection processing unit 115 through the network information display device.
[0033]
The display processing unit call processing unit 107 is located in the network information display device 121 and displays the response time information and the packet arrival rate information 106 by calling the response time display processing unit 105 in the network monitoring device 120. Also, by calling the operation information display processing unit 117 in the network monitoring device 120, the network operation information 118 is displayed. Each of the processing units is embodied by the CPU executing a program. The program may be stored in a storage device in advance, or may be introduced from another device via a storage medium or a communication medium.
[0034]
Next, an example of network performance failure analysis support by the network monitoring device 120 having the above functional configuration is described with reference to the flowchart of FIG. 2.
[0035]
(Step 201)
The response time measurement processing unit 102 measures the response time and the IP packet arrival rate in the path from the monitoring device 120 to the network device 119 according to a set cycle of every 10 minutes or every 5 minutes.
[0036]
FIG. 3 shows a typical configuration example of a large-scale network system, consisting of thousands of network devices or more, that is to be monitored. In a large-scale network system with servers on the order of hundreds and clients on the order of thousands or more, relay bases 303 and 304 serving as hubs are often installed in the network, for reasons of network expandability and line cost, to form a topology in which lines are aggregated at those bases. For reliability, multiple relay bases 303 and 304 are installed so that the routes from the clients 330 to 337 to the servers 320 to 323 are duplicated. This is a hierarchical network topology made up of the data center 302 where the servers 320 to 323 are installed, the relay bases 303 and 304 forming the core, and the branches 305 to 308 where the clients 330 to 337 are installed. Viewed differently, it is a network configuration that combines a tree-type configuration whose trunk part is the relay bases 303 and 304 and whose branch part is the side of the servers 320 to 323 with a tree-type configuration whose trunk part is the relay bases 303 and 304 and whose branch part is the side of the clients 330 to 337.
[0037]
When the response time of IP packets is measured on the paths from the clients 330 to 337 to the servers 320 to 323, the usual approach is for each client to measure the ICMP echo request/response time (ping command) with the server as the target. In a network configured as above, however, measuring the response time on the paths between all the servers 320 to 323 and all the clients 330 to 337 requires monitoring the following number of paths, where m is the number of servers 320 to 323 and n is the number of clients 330 to 337:
m × n
In this case the volume of monitoring traffic also grows and may interfere with normal traffic. In addition, monitoring devices must be distributed and installed at each of the clients 330 to 337 or each of the servers 320 to 323, which creates a management problem.
[0038]
In the present embodiment, a monitoring method centered on the relay bases 303 and 304 is adopted to cover the entire network. As the monitoring routes 370 to 376 indicated by dotted arrows in FIG. 3 show, this method measures, in the tree-type part of the network, the IP packet response time and arrival rate on routes from the monitoring device NMS 310 in the monitoring center 301, through the trunk part (relay bases 303 and 304), to the network devices of the branch part (the data center 302 or the branches 305 to 308). When there are too many monitoring targets for a single monitoring device (310) to cover, the work can be divided among several monitoring devices. These monitoring devices 310 are all installed together in the monitoring center.
[0039]
The monitoring device 310 in FIG. 3 corresponds to the network monitoring device 120 in FIG. 1, and includes the following processing units: the response time measurement processing unit 102, the server/client response time estimation processing unit 103, the response time storage processing unit 104, the response time display processing unit 105, the response time confirmation processing unit 108, the delay part narrowing processing unit 109, the operation information collection setting file creation processing unit 110, the operation information collection setting file change processing unit 111, the operation information collection processing unit activation processing unit 112, the operation information collection processing unit 115, the operation information storage processing unit 116, and the operation information display processing unit 117.
[0040]
The servers 320 to 323, the clients 330 to 337, and the routers 340 to 359 in FIG. 3 correspond to the network device 119 in FIG. 1. In this method, the IP packet response time on a path from the clients 330 to 337 to the servers 320 to 323 is estimated from measured values. Take, for example, the IP packet response time t on the path in FIG. 3 from the client CL1 (330) of the branch 305, through the router R5 (344) of the relay base 303, to the server SV1 (320) of the data center 302. Let tc be the IP packet response time on the route from the monitoring device NMS (310), through the router R5 (344) of the relay base, to the client CL1 (330) of the branch 305 (monitoring route D (373)); let ts be the IP packet response time on the route from the monitoring device NMS (310), through the router R5 (344) of the relay base 303, to the server SV1 (320) of the data center 302 (monitoring route B (371)); and let t0 be the IP packet response time on the route from the monitoring device NMS (310) to the router R5 (344) of the relay base 303 (monitoring route A (370)). Then t is estimated by
t = tc + ts − 2 × t0
[0041]
Following this method, the server/client response time estimation processing unit 103 estimates the response time on the path from each of the servers 320 to 323 to each of the clients 330 to 337, for every combination of server and client.
[0042]
With this method, the response time on the paths between all the servers 320 to 323 and all the clients 330 to 337 can be obtained by monitoring only the following number of routes, where m is the number of servers 320 to 323 and n is the number of clients 330 to 337:
m + n
The amount of monitoring traffic can therefore be reduced, and the entire network can be monitored centrally from the monitoring device 310 installed only in the monitoring center 301.
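As a rough illustration of the difference between the two counts, the sketch below evaluates both for the configuration drawn in FIG. 3, which shows four servers (320 to 323) and eight clients (330 to 337). The function name is hypothetical.

```python
def monitored_route_counts(m: int, n: int):
    """Return (pairwise, centralized) route counts for m servers and n clients.

    pairwise:    every client pings every server directly (m * n)
    centralized: one route from the monitoring device to each server
                 and each client (m + n), as stated in the text
    """
    return m * n, m + n


# FIG. 3 shows servers 320-323 (m = 4) and clients 330-337 (n = 8).
pairwise, centralized = monitored_route_counts(m=4, n=8)
print(pairwise, centralized)  # 32 12
```

The gap widens quickly at the scale the text targets: with hundreds of servers and thousands of clients, the pairwise count grows with the product while the centralized count grows only with the sum.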
[0043]
(Step 202)
The response time confirmation processing unit 108 determines whether the response time and the IP packet arrival rate on the route from the monitoring device to each network device, as measured by the response time measurement processing unit 102, exceed the thresholds set for that monitoring route.
[0044]
The threshold setting criteria are as follows.
[0045]
・ Design value of the response time for each route of the network
・ Mean and variance for the same time slot in past measurement results
・ Mean and variance for the same day of the week and time slot in past measurement results
・ Mean and variance for the same week, day of the week, and time slot in past measurement results
・ Mean and variance for the same date and time slot in past measurement results
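The criteria above name the inputs to a threshold (a design value, or the mean and variance of past measurements for a comparable time slot) but do not state how they are combined. The sketch below assumes one common rule, mean plus k standard deviations; the rule itself, the multiplier k, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def threshold_from_history(samples, k=3.0):
    """Assumed rule: threshold = mean + k * standard deviation.

    samples: past response times (ms) for the same time slot
    k:       hypothetical multiplier, not specified in the source
    """
    return mean(samples) + k * pstdev(samples)


def exceeds_threshold(value, samples, k=3.0):
    """Return True when a new measurement is above the derived threshold."""
    return value > threshold_from_history(samples, k)


# Hypothetical past measurements for one monitoring route and time slot.
history_ms = [20.0, 22.0, 18.0, 20.0]
print(exceeds_threshold(30.0, history_ms))  # True
print(exceeds_threshold(22.0, history_ms))  # False
```

Keeping a separate history per monitoring route and per time slot matches the per-route thresholds described in Step 202.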
(Step 203)
If the response time and the IP packet arrival rate on at least one monitoring route exceed the thresholds set for that route, the delay cause part narrowing processing unit 109 automatically narrows down the part causing the delay and makes that part the target of operation information collection.
[0046]
The method by which the delay part narrowing processing unit 109 narrows down the part causing a delay is described with reference to FIGS. 3 and 4. Suppose response times are monitored on routes from the trunk to the branch lines in a tree-type configuration whose trunk part is the relay bases 303 and 304. If a network device or interface close to the trunk causes a response time delay, the delay should be detected on multiple monitoring routes passing through the trunk. If, on the other hand, a network device or interface close to a branch line causes the delay, it should be detected only on the small number of monitoring routes passing through that branch line. The part causing the delay can therefore be narrowed down by comparing the response times on multiple routes from the trunk to the branch lines and inferring the location of the cause from the measurement results.
[0047]
For example, suppose that the response times on the communication paths from the clients CL1 (330), CL3 (332), CL5 (334), and CL7 (336) shown in FIG. 3 to the servers SV1 (320) and SV3 (322) are estimated by the method described above from measurements of the response times on the following routes: the route from the monitoring device NMS (310) to the router R5 (344) of the relay base 303 (monitoring route A (370)); the route from the monitoring device NMS (310), through the router R5 (344) of the relay base 303, to the server SV1 (320) of the data center 302 (monitoring route B (371)); the route from the monitoring device NMS (310), through the router R5 (344), to the server SV3 (322) of the data center 302 (monitoring route C (372)); the route from the monitoring device NMS (310), through the router R5 (344), to the client CL1 (330) of the branch 305 (monitoring route D (373)); the route from the monitoring device NMS (310), through the router R5 (344), to the client CL3 (332) of the branch 306 (monitoring route E (374)); the route from the monitoring device NMS (310), through the router R5 (344), to the client CL5 (334) of the branch 307 (monitoring route F (375)); and the route from the monitoring device NMS (310), through the router R5 (344), to the client CL7 (336) of the branch 308 (monitoring route G (376)).
[0048]
When a delay is detected on any of the monitoring routes A (370) to G (376), the monitoring device NMS (310) compares the delay states of the routes A (370) to G (376) in combination. The combinations of delay detection states on the monitoring routes A (370) to G (376) are shown in table 401 of FIG. 4.
[0049]
In comparing the delay states on the routes from the router R5 (344) of the relay base 303 to the server SV1 (320) and the server SV3 (322) of the data center 302, if a delay is detected on the monitoring route A (370) as shown in column 403 of table 401, the cause of the delay can be narrowed down to the vicinity of the router R5 (344) and its interface IF1 (360). As shown in column 404 of table 401, if the monitoring route A (370) is normal and a delay is detected on the monitoring route B (371) and the monitoring route C (372), the cause of the delay can be narrowed down to the vicinity of the router R5 (344) and its interface IF2 (361). As shown in column 405 of table 401, if the monitoring route A (370) and the monitoring route C (372) are normal and a delay is detected on the monitoring route B (371), the cause of the delay can be narrowed down to the vicinity of the router R1 (340) and all its interfaces. As shown in column 406 of table 401, if the monitoring route A (370) and the monitoring route B (371) are normal and a delay is detected on the monitoring route C (372), the cause of the delay can be narrowed down to the vicinity of the router R3 (342) and all its interfaces.
[0050]
Similarly, the delay states are compared on the routes from the router R5 (344) of the relay base 303 to the client CL1 (330), the client CL3 (332), the client CL5 (334), and the client CL7 (336) of the branches 305 to 308. As shown in column 407 of table 401, if the monitoring route A (370), the monitoring route E (374), the monitoring route F (375), and the monitoring route G (376) are normal and a delay is detected on the monitoring route D (373), the cause of the delay can be narrowed down to the vicinity of the router R13 (352) and all its interfaces. As shown in column 408 of table 401, if the monitoring route A (370), the monitoring route F (375), and the monitoring route G (376) are normal and a delay is detected on the monitoring route D (373) and the monitoring route E (374), the cause of the delay can be narrowed down to the vicinity of the router R7 (346) and its interface IF5 (364). As shown in column 409 of table 401, if the monitoring route A (370) and the monitoring route G (376) are normal and a delay is detected on the monitoring route D (373), the monitoring route E (374), and the monitoring route F (375), the cause of the delay can be narrowed down to the vicinity of the router R7 (346) and its interface IF4. As shown in column 410 of table 401, if the monitoring route A (370) is normal and a delay is detected on the monitoring route D (373), the monitoring route E (374), the monitoring route F (375), and the monitoring route G (376), the cause of the delay can be narrowed down to the vicinity of the router R5 (344) and its interface IF3 (362).
[0051]
As described above, in the tree-type configuration of the network, the cause part of a delay can be narrowed down by comparing the delay states on a plurality of routes from the monitoring device 310 through the trunk line to the branch line network devices. If the comparison table 401 of the delay states on the monitoring routes A (370) to G (376) in FIG. 4 and the corresponding narrowing results 402 of the cause part are described in advance, the delay part narrowing down processing unit 109 can refer to the correspondence in the table and automatically narrow down the cause part according to the delay state of each of the monitoring routes A (370) to G (376). Investigating the cause of the delay requires collecting operation information of the cause part. The delay part narrowing down processing unit 109 therefore sets the narrowed-down cause part as the operation information collection target.
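As an illustration of such a table lookup, the correspondences of columns 403 to 410 can be written as a small Python sketch. The dictionary layout, the function name `narrow_down`, and the choice to let a delay on route A take precedence over the other routes are assumptions made here for illustration; the source only specifies the correspondences themselves.

```python
from typing import Dict, FrozenSet, List, Set

# Column 403: a delay on route A points at the trunk-side interface of R5.
ROUTE_A_RESULT = "router R5 (344), interface IF1 (360)"

# Columns 404-406: delayed routes among B and C (route A normal).
SERVER_SIDE: Dict[FrozenSet[str], str] = {
    frozenset({"B", "C"}): "router R5 (344), interface IF2 (361)",
    frozenset({"B"}): "router R1 (340), all interfaces",
    frozenset({"C"}): "router R3 (342), all interfaces",
}

# Columns 407-410: delayed routes among D to G (route A normal).
CLIENT_SIDE: Dict[FrozenSet[str], str] = {
    frozenset({"D"}): "router R13 (352), all interfaces",
    frozenset({"D", "E"}): "router R7 (346), interface IF5 (364)",
    frozenset({"D", "E", "F"}): "router R7 (346), interface IF4",
    frozenset({"D", "E", "F", "G"}): "router R5 (344), interface IF3 (362)",
}


def narrow_down(delayed: Set[str]) -> List[str]:
    """Map the set of delayed monitoring routes to the suspected cause parts."""
    if "A" in delayed:  # assumption: route A delay overrides the other columns
        return [ROUTE_A_RESULT]
    results = []
    for table, scope in ((SERVER_SIDE, {"B", "C"}),
                         (CLIENT_SIDE, {"D", "E", "F", "G"})):
        key = frozenset(delayed & scope)
        if key in table:
            results.append(table[key])
    return results  # empty when the combination is not listed in the table
```

Combinations that the text does not list, such as a delay on route E alone, return an empty list rather than a guess.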
[0052]
(Step 204)
The operation information collection setting file creation processing unit 110 determines a collection information type as a setting item for collecting operation information.
[0053]
The types of network operation information collected are the CPU utilization rate and the available memory amount for network devices such as routers, layer 3 switches, and ATM switches. For the interfaces of those devices, they are the input/output traffic volume, the number of input/output packets, the number of discarded input/output packets, the number of input/output error packets, and the number of collisions.
[0054]
(Step 205)
The operation information collection setting file creation processing unit 110 determines a collection cycle as a setting item for collecting operation information.
[0055]
The collection cycle of the network operation information is either a preset value such as one minute, or one tenth of the cycle used for the regular operation information collection that tracks normal long-term trends.
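The two options for the interval between successive collections can be sketched as follows. The function name, the parameter names, and the rule of preferring the one-tenth option whenever a regular interval is known are assumptions for illustration only.

```python
from typing import Optional


def collection_cycle_seconds(regular_cycle_seconds: Optional[float] = None,
                             preset_seconds: float = 60.0) -> float:
    """Return the interval between collections for the delay investigation."""
    if regular_cycle_seconds is None:
        # No regular collection is known: fall back to the preset (one minute).
        return preset_seconds
    # Otherwise collect ten times as often as the regular long-term collection.
    return regular_cycle_seconds / 10
```

For instance, a regular collection every 3000 seconds would yield an investigation interval of 300 seconds under this sketch.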
[0056]
(Step 206)
The operation information collection setting file creation processing unit 110 determines a collection period as a setting item for collecting operation information.
[0057]
The collection period of the network operation information is either a preset value such as 30 minutes, or lasts until a subsequent response time measurement on the monitoring route whose response time exceeded the threshold falls to or below the threshold.
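The two ways of ending the collection can be expressed as one stop condition. This is a minimal sketch: the function name, the millisecond and second units, and the use of `None` to select the threshold-based option are assumptions, not details given in the source.

```python
from typing import Optional


def collection_finished(elapsed_s: float,
                        latest_response_ms: float,
                        threshold_ms: float,
                        fixed_duration_s: Optional[float] = 30 * 60) -> bool:
    """Decide whether operation information collection should stop."""
    if fixed_duration_s is not None:
        # Option 1: stop after a preset duration (30 minutes by default).
        return elapsed_s >= fixed_duration_s
    # Option 2: stop once the response time is back at or below the threshold.
    return latest_response_ms <= threshold_ms
```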
[0058]
(Step 207)
The operation information collection setting file creation processing unit 110 creates a setting file of the operation information collection processing unit 115 based on the items determined in steps 203 to 206.
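A setting file assembled from the determined items might look like the following sketch. The source does not specify a file format, so the JSON layout, every field name, and the item identifiers are hypothetical; only the kinds of items (collection targets, information types, cycle, and period) come from the description.

```python
import json
from typing import Dict, Iterable

# Information types named in the description, with hypothetical identifiers.
DEVICE_ITEMS = ["cpu_utilization", "available_memory"]
INTERFACE_ITEMS = ["io_traffic", "io_packets", "io_discards",
                   "io_error_packets", "collisions"]


def build_setting(targets: Iterable[str],
                  cycle_s: int = 60,
                  period_s: int = 30 * 60) -> Dict[str, object]:
    """Combine the collection targets, types, cycle, and period into one record."""
    return {
        "targets": sorted(targets),          # narrowed-down cause parts
        "device_items": list(DEVICE_ITEMS),
        "interface_items": list(INTERFACE_ITEMS),
        "cycle_s": cycle_s,                  # interval between collections
        "period_s": period_s,                # total duration of the collection
    }


def to_text(setting: Dict[str, object]) -> str:
    """Serialize the record as the text of a setting file (JSON assumed)."""
    return json.dumps(setting, indent=2, sort_keys=True)
```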
[0059]
(Step 208)
The operation information collection setting file change processing unit 111 either overwrites the setting file of the operation information collection processing unit 115 with the setting file created by the operation information collection setting file creation processing unit 110, or changes only part of it based on the created file. When the processing unit for regular collection of network operation information is already running, the setting file can be changed partially.
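The difference between overwriting and partially changing the setting file can be sketched as below. The settings are modeled here as a dictionary keyed by collection target, which is an assumption, as is the rule of choosing the partial change whenever the collector is already running (the description only says that a partial change is possible in that case).

```python
from typing import Dict

Settings = Dict[str, Dict[str, object]]  # hypothetical: target -> its settings


def apply_settings(current: Settings, created: Settings,
                   collector_running: bool) -> Settings:
    """Return the new settings without modifying either input."""
    if not collector_running:
        # Overwrite: the created file replaces the current one entirely.
        return {target: dict(items) for target, items in created.items()}
    # Partial change: keep the regular-collection entries and replace or add
    # only the entries for the narrowed-down targets.
    merged = {target: dict(items) for target, items in current.items()}
    for target, items in created.items():
        merged[target] = dict(items)
    return merged
```

Under this sketch, regular collection on unaffected devices continues unchanged while the suspected device is sampled with the investigation settings.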
[0060]
(Step 209)
The operation information collection processing unit activation processing unit 112 activates the operation information collection processing unit 115 whose setting file has been changed. The operation information collection processing unit 115 collects network operation information according to the updated setting file.
[0061]
In the present embodiment, the above-described steps are performed by the monitoring device. With the configuration described above, the present embodiment can estimate the response time on the path from a client to a server. Further, even in a large-scale network system, the cause part can be automatically narrowed down when a delay occurs, and operation information can be collected efficiently, for example by automatically collecting operation information only from the cause part.
[0062]
[Effect of the invention]
According to the present invention, even in a large-scale network system, response times can be measured and delay points narrowed down efficiently.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram of an embodiment.
FIG. 2 is a flowchart of the performance failure analysis support process performed by the network monitoring device of the present embodiment.
FIG. 3 is an example of a network logical configuration diagram and response time monitoring routes according to the embodiment.
FIG. 4 is an example of narrowing down a delay portion from a response time measurement result by the network monitoring device of the present embodiment.
[Explanation of symbols]
101: Response time information / packet arrival rate information, 102: Response time measurement processing unit, 103: Server / client response time estimation processing unit, 104: Response time storage processing unit, 105: Response time display processing unit, 106: Response time information / packet arrival rate information, 107: Display processing unit call processing unit, 108: Response time confirmation processing unit, 109: Delay part narrowing down processing unit, 110: Operation information collection setting file creation processing unit, 111: Operation information collection setting file change processing unit, 112: Operation information collection processing unit activation processing unit, 113: Operation information measurement processing unit, 114: Network operation information, 115: Operation information collection processing unit, 116: Operation information storage processing unit, 117: Operation information display processing unit, 118: Network operation information, 119: Network device, 120: Network monitoring device, 121: Network information display device.

Claims

A network system performance failure analysis support method characterized in that, in a tree-type network extending from a trunk line relay base to a plurality of branch line network devices, an IP packet response time and an arrival rate are measured on a route from a monitoring device connected to the trunk line, through the relay base, to the network device of any of the branch lines, and a cause part of a delay is automatically narrowed down by comparing delay states of a plurality of routes.

A network performance failure analysis support system comprising:
response time measuring means for measuring, in a tree-type network extending from a trunk line relay base to a plurality of branch line network devices, an IP packet response time and an arrival rate on a route from a monitoring device connected to the trunk line, through the relay base, to the network device of any of the branch lines;
response time checking means for determining whether a delay has occurred in the response time; and
delay cause part narrowing means for automatically narrowing down a cause part of the delay by comparing delay states of a plurality of routes.

The network performance failure analysis support system according to claim 2,
further comprising server / client response time estimating means for estimating a response time on a path from a client device to a server device, based on the IP packet response time on the route from the monitoring device through the relay base to the client device in a branch line, the IP packet response time on the route from the monitoring device through the relay base to the server device in a branch line, and the IP packet response time on the route from the monitoring device to the network device of the relay base that serves as the center of the route from the server device to the client device.

The network performance failure analysis support system according to claim 2, further comprising:
operation information collection setting file creating means for setting a network device located at the automatically narrowed-down cause part of the delay as an operation information collection target device, determining the type of operation information to be collected, the collection cycle, and the collection period, and creating a setting file for collecting the operation information;
operation information collection setting file changing means for applying the created setting file to operation information collecting means;
operation information collecting means starting means for starting the operation information collecting means; and
the operation information collecting means for collecting operation information from the network device according to the setting file.

The network performance failure analysis support system according to claim 2, further comprising:
operation information collection setting file creating means for setting a network device located at the automatically narrowed-down cause part of the delay as an operation information collection target device, determining the type of operation information to be collected, the collection cycle, and the collection period, and creating a setting file for collecting the operation information; and
operation information collection setting file changing means for partially changing, based on the created setting file, the setting file of operation information collecting means that is currently in operation.