JP2004274289A

JP2004274289A - Method and device for load balancing of manager agent type information collection application

Info

Publication number: JP2004274289A
Application number: JP2003060847A
Authority: JP
Inventors: Toshimasa Arai; 敏正新井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-03-07
Filing date: 2003-03-07
Publication date: 2004-09-30

Abstract

PROBLEM TO BE SOLVED: To provide a method and a device for load balancing of a manager-agent type information collection application such that NE (Network Element) data can be reliably collected at a predetermined time and period even when congestion occurs.

SOLUTION: The device comprises agents 11 and 12, distributed across servers 10, which collect a plurality of pieces of traffic information held by a plurality of IP network devices (NEs); an agent collection item correspondence table 25 that associates an agent number uniquely identifying each of the agents 11 and 12 with collection item numbers uniquely identifying the items that the agent collects; and a collection item table 26 that defines, for each uniquely identifiable collection item, its name and the reference point, data type, and collection period of the information on the IP network device side.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はマネージャ−エージェント型情報収集アプリケーションのロードバランス方法及び装置に関し、更に詳しくはＩＰネットワークを構成するルータ、ブリッジ、コンピュータ装置等の複数のネットワーク装置（ＮＥ：ＮｅｔｗｏｒｋＥｌｅｍｅｎｔ：ネットワークサービスを実現する機能を持つ装置のこと）を管理するためのネットワーク管理システムにおけるマネージャ−エージェント型のデータ収集アプリケーションのロードバランス方法及び装置に関する。代表的なデータ収集機能の例としては、ルータが処理したＩＰパケットの入出力パケット数、パケットロス数、廃棄パケット数の定期収集等が考えられる。
【０００２】
ネットワーク管理システムの一部プロセスの輻輳に対して、データ収集機能の維持はネットワーク管理業務だけでなく、ＳＬＡに基づく顧客へのネットワークサービス品質情報開示にも関わる、ネットワーク管理システムに期待される課題の一つであるが、本発明はこのような輻輳時のデータ収集維持に関するものである。
【０００３】
図１１は、本発明の利用分野の説明図である。図において、１はＮＥであり、複数のＮＥが互いに接続しあって、ＩＰネットワーク２〜４を構成している。５はＮＥ１と接続され、ネットワークの監視を行なうネットワーク管理システムである。ネットワーク管理システム５は、ネットワーク情報をネットワーク管理者６に送信する。そして、該ネットワーク管理者６は、ネットワーク監視と品質確認を行なう。また、ネットワーク管理システム５は、営業・コールセンタ７にもネットワーク情報を送信する。営業・コールセンタ７では、ＳＬＡ情報（サービス品質情報）として顧客８に提示する。
【０００４】
【従来の技術】
この種のネットワーク監視システムとしては、集中管理・制御型ネットワークの負荷制御に関し、集中管理・制御装置の過負荷状態の発生要因に応じて過負荷制御を効率的に行なう技術がある（例えば、特許文献１参照）。
【０００５】
また、複数のマネージャノードと、複数のネットワーク資源を管理するエージェントノードから構成されるネットワーク管理システムにおいて、ネットワークの負荷に応じて管理トラフィック発生の頻度を自動的に最適化する技術がある（例えば特許文献２参照）。
【０００６】
ネットワーク管理システムは、複数のハードウェア（サーバと呼ぶ）から構成されており、その処理単位（プロセスと呼ぶ）は近年の分散オブジェクト環境（若しくはＣＯＲＢＡ（ＣｏｍｍｏｎＯｂｊｅｃｔＲｅｑｕｅｓｔＢｒｏｋｅｒＡｒｃｈｉｔｅｃｔｕｒｅ）と呼ぶ）等の既存技術により、任意に起動されるサーバを変更することができる。
【０００７】
上記ＩＰネットワークサービスを実現するＮＥ群が、ネットワーク上で稼働している状態で、データ収集アプリケーションを含むネットワーク管理システムは、
▲１▼ＳＬＡに基づく顧客に対するサービス品質情報開示に含まれるネットワークサービス品質保証、
▲２▼ネットワーク保守、ネットワーク障害検知、ネットワーク料金計算、設備投資予測、
等の観点から、ＮＥが保持するネットワーク性能情報を定期的に収集し、保存しておく機能が要求される。この時、ＩＰネットワークを構成するＮＥが保持するネットワーク性能情報とは、一般的にＳＮＭＰ（ＳｉｍｐｌｅＮｅｔｗｏｒｋＭａｎａｇｅｍｅｎｔＰｒｏｔｏｃｏｌ）通信による取得・設定可能なＭＩＢ（ＭａｎａｇｅｍｅｎｔＩｎｆｏｒｍａｔｉｏｎＢａｓｅ）に保持されているのが通常である。また、ＮＥはＭＩＢに保持しているネットワーク性能情報を随時更新している場合が多く、時間経過により加算される数値情報の場合にはＭＩＢが収容する最大値を超えた場合、情報は再び０から開始されるのが通常である。このため、データ収集アプリケーションは、各収集項目の利用目的毎に固有の収集周期によりデータ収集を行なう必要がある。
【０００８】
【特許文献１】
特開平９−８９０７号公報（第４頁、第５頁、図１）
【特許文献２】
特開平９−２７０７９４号公報（第４頁、図１）
【０００９】
【発明が解決しようとする課題】
しかしながら、以下の点において、このデータ収集が乱される場合がある。
▲１▼あるサーバに配置されたエージェントで、サーバが含む他の管理処理で輻輳が発生することで、そのエージェントに割り当てられるサーバの処理時間が少なくなり、ひいてはエージェントの収集処理に時間がかかってしまい、既定の収集周期でデータ収集が完了できず、既定の時刻に対する実際の収集時刻の遅延かつ又は収集データの欠落が起こる場合がある。
▲２▼あるサーバに配置されたエージェントで、サーバが含む他の管理処理で発生した輻輳状態は復旧したものの、収集周期の遅延により、実際の収集時刻と既定の収集時刻との間にずれが生じ、周期は同じであるが収集項目登録時とは異なる時刻に収集するようになる場合がある。
【００１０】
図１２は多量の故障情報を一度に受信した場合の動作説明図である。図１１と同一のものは、同一の符号を付して示す。図において、１はＮＥである。１０はＩＰネットワーク２〜４の状態を監視するサーバである。サーバ１０において、１１，１２はＮＥ１の状態を収集する収集エージェント、１３はＮＥ１の故障状態を管理する故障管理機能部、１４はネットワーク制御機能部である。１３、１４は一般的なネットワーク管理システムが保持する機能を指しており、同一サーバ内に配置される別機能であってもよい。１５は収集情報を記憶するために、サーバ１０内に設けられたデータベースである。該データベース１５の情報は、オペレータ１６，１７により必要に応じて読み出される。
【００１１】
このように構成されたシステムにおいて、収集エージェント１１，１２はＮＥ１に対して定期的に状態を収集する。そして、これら収集エージェント１１，１２から収集されたデータは、サーバ１０内のデータベース１５に記憶される。ここで、ＮＥ１から故障管理機能部１３の処理能力を超える大量の故障情報が同時に通知された場合、サーバ１０に輻輳が発生する。
【００１２】
サーバ１０に輻輳が発生すると、収集エージェント１２は既定周期かつ既定時刻でデータ収集を完遂不能な状況に陥ることになる。この結果、収集エージェント１２は既定周期かつ既定時刻にデータベース１５にデータを登録することができなくなり、データベース１５上にデータ欠落が発生する。
【００１３】
図１３は正常な場合と輻輳時におけるデータ収集のタイミングを示す図である。（ａ）は正常な場合のデータ収集タイミングを、（ｂ）は輻輳時のデータ収集タイミングをそれぞれ示す。収集項目はＡ〜Ｃまでであるものとする。周期Ｔａは収集項目Ａの収集周期を、周期Ｔｂは収集項目Ｂの収集周期を、周期Ｔｃは収集項目Ｃの収集周期をそれぞれ示す。周期のスタートは、一致する場合としない場合とがある。
【００１４】
（ｂ）は輻輳時の場合の収集タイミングを示す。図に示す時刻ｔ１で輻輳が発生したものとする。輻輳中は、既定の周期ではデータ収集できない。また、輻輳回復後１回目のデータ収集以後、既定周期での収集が再開するが、収集時刻について正常な場合との間にずれが生じる。
【００１５】
本発明はこのような課題を解決するものであって、輻輳が発生した場合でも確実にＮＥからデータ収集が既定の時刻及び周期で行なえるようなマネージャ−エージェント型情報収集アプリケーションのロードバランス方法及び装置を提供することを目的としている。
【００１６】
【課題を解決するための手段】
（１）請求項１記載の発明は、以下の通りである。図１は本発明方法の原理を示すフローチャートである。本発明は、ＩＰネットワークのトラフィック情報を各ＩＰネットワーク装置から収集するマネージャ−エージェント型のアプリケーションにおいて、ポリシーに基づき収集項目数，収集周期，データ処理能力から各エージェントの収集対象範囲を動的に変更させ（ステップ１）、各エージェントは前記変更された収集対象範囲に基づいてデータを収集する（ステップ２）ことを特徴とする。
【００１７】
ここで、ポリシーとは、ある条件に対して実行すべき行動かつ又は行動のために必要なパラメータを定義したものであって、ポリシーの適用主体もしくは適用主体を取り巻く環境が、前記の条件に適合した場合に、適用主体はその条件に対する行動かつ又は行動のために必要なパラメータを取得し、自身の行動を変化させるものである。
【００１８】
従って、輻輳状況をポリシー条件とし、エージェントの収集対象範囲の拡縮を行動として定義することで、輻輳が発生した場合でも確実にＮＥからデータ収集を行なうことができる。
（２）請求項２記載の発明は、以下の通りである。図２は本発明の原理ブロック図である。図１２と同一のものは、同一の符号を付して示す。図において、２０はサーバ１０を含むネットワーク管理システムである。サーバ１０はそれぞれが分散して配置されている。サーバ１０において、１１，１２はＮＥからデータを一定周期で取り込む収集エージェント、１３はＮＥ網の故障状態を管理する故障管理機能部、１４はネットワーク制御機能部である。
【００１９】
１３，１４は一般的なネットワーク管理システムが保持する機能を示しており、同一サーバ内に配置される別機能であってもよい。１５は収集情報を記憶するために、サーバ１０内に設けられたデータベースである。１６，１７はデータベース１５に蓄積されたデータを利用するオペレータである。
【００２０】
ネットワーク管理システム２０において、２５はエージェントを一意に識別するエージェント番号とエージェントが収集する項目を一意に識別する収集項目番号を対応付けて記憶しているエージェント収集項目対応テーブル、２６は一意に識別できる収集項目の名称、ＩＰネットワーク装置（ＮＥ）側情報の参照点・データ型・収集周期・開始時刻・運用状態等を定義している収集項目テーブルである。
【００２１】
ここで、エージェント収集項目対応テーブル２５と収集項目テーブル２６は、任意のサーバ１０の中に設けてもよいし、新たなサーバを追加して、その中に設置してもよい。
【００２２】
エージェント収集項目対応テーブル２５と収集項目テーブル２６を設けることにより、エージェントはエージェント番号から収集項目内容を取得することができ、収集項目内容に従って、データ収集を行なうことができる。
（３）請求項３記載の発明は、システムのハード的かつ又はソフト的な監視デバイスから出力されるサーバの処理性能情報を監視して、各サーバの輻輳発生及び輻輳からの復旧を検出する輻輳検出器２２と、各エージェントを一意に識別するエージェント番号と、サーバを一意に識別するサーバ番号を対応付けているエージェント／サーバ対応テーブル２１と、収集項目とエージェントの対応を輻輳状況により再編成するための様式を定義する収集ポリシー定義テーブル２４と、前記輻輳検出器２２の通知により、収集ポリシー定義テーブルから輻輳状況に適合するポリシーを取得し、収集項目とエージェントの対応を再編成する収集項目編成演算器２３と、を更に有することを特徴とする。本発明の実施の形態例を図３に示す。
【００２３】
このように構成すれば、ポリシーに基づき、収集項目数、収集周期、データ処理能力から各エージェントの収集対象範囲を動的に変更することができるので、輻輳が発生した場合でも確実にＮＥからデータ収集が既定の時刻及び周期で行なえるようなマネージャ−エージェント型情報収集アプリケーションを提供することができる。
【００２４】
本発明によれば、エージェントが配置されているサーバが輻輳状態に陥った場合でも、そのエージェントに割り当てられている収集項目をポリシーに基づき、他のサーバのエージェントに割り当てることで、各収集項目の既定の収集時刻及び収集周期を維持できるため、システムが輻輳状況においても収集周期の遅延やデータ欠落、輻輳復旧後の収集時刻のずれを防止することが可能となる。
【００２５】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態例を詳細に説明する。
【００２６】
先ず、図３に示す構成について更に説明する。図に示すシステムは、複数のＩＰネットワーク装置（ＮＥ）１を管理するネットワーク管理機能を提供する複数の処理（ネットワーク管理機能処理）を収容する複数のハードウェア（サーバ）１０で構成されるネットワーク管理システムを構成している。
【００２７】
このネットワーク管理システムには、データ収集機能として任意のサーバ１０にエージェント１１，１２を配置し、各エージェント１１，１２はＮＥ１に対して既定の収集項目を既定の時刻及び周期でデータ収集している。この時、エージェント１１，１２は自分が収集すべき収集項目をエージェント収集項目対応テーブル２５から自分のエージェント番号に対応する収集項目番号を参照することで取得する。
【００２８】
図４はエージェント収集項目対応テーブル２５の構成例を示す図である。エージェント収集項目対応テーブル２５は、エージェントを一意に識別するエージェント番号に対して、収集項目を一意に識別する収集項目番号の一覧により構成されているテーブルである。図に示すように、エージェント番号とそのエージェント番号に対応する収集項目番号の一覧より構成されていることが分かる。
【００２９】
また、各収集項目のＮＥ１側の参照点、データ型、及び収集の周期は、収集項目テーブル２６から収集項目番号に対応する収集項目内容を参照することで取得する。ここで、収集項目テーブル２６は、収集項目を一意に識別する収集項目番号に対して、収集項目の名称、ＮＥ側参照点（ＮＥのＩＰアドレス及びＭＩＢ−ＯＩＤ等のＮＥ側データの所在を示す情報）、データ型、収集開始時刻、収集周期、運用状態（運用中、休止等）を１つの単位として保持するテーブルである。図５は収集項目テーブル２６の構成例を示す図である。図に示すように、収集項目番号に対応する収集項目内容があり、収集項目内容は、名前、ＮＥ側参照点、データ型、開始時刻、周期、運用状態等から構成されている。
【００３０】
このように構成すれば、エージェント収集項目対応テーブル２５と収集項目テーブル２６を設けることにより、エージェント番号から収集項目内容を取得することができ、収集項目内容に従って、データ収集を行なうことができる。
【００３１】
本発明は、図３に示す通り更に任意のサーバ１０上に、システムのハード的かつ又はソフト的な監視デバイスから出力されるサーバの処理性能情報を監視して、各サーバの輻輳発生及び輻輳からの復旧を検出する輻輳検出器２２と、各エージェントを一意に識別するエージェント番号とサーバを一意に識別するサーバ番号を対応付けているエージェント／サーバ対応テーブル２１、収集項目とエージェントの対応を輻輳状況により再編成するための様式を定義する収集ポリシー定義テーブル２４、輻輳検出器２２の通知により収集ポリシー定義テーブル２４から輻輳状況に適合するポリシーを取得し、収集項目とエージェントの対応を再編成する収集項目編成演算器２３が設けられている。
【００３２】
図６はエージェント／サーバ対応テーブル２１の構成例を示す図である。図に示すように、サーバ番号とサーバ番号に対応するエージェント番号とにより構成されている。
【００３３】
図７は本発明装置の動作を示すフローチャートで、輻輳検出器２２と、収集項目編成演算器２３の動作を示している。フローの各処理は以下の通りである。
（Ｓ１）輻輳検出
輻輳検出器２２はサーバ１０の性能情報を監視する監視デバイス２７の出力を定期的かつ又は割り込みで受信する。
（Ｓ２）輻輳判定
輻輳検出器２２は、監視デバイス２７の出力からサーバの輻輳度を算出して輻輳の有無を判定する。この時、輻輳度は、各監視デバイス２７の出力値に対して１つ又は複数の段階的なしきい値を設け、これらしきい値を超える毎に輻輳度を強化していき、逆にしきい値を下回った場合には輻輳度を弱めていくことで算出してもよい。又は、各監視デバイス２７の出力値ｘに対して任意の数式ｆ（ｘ）を割り当て、監視デバイス２７の出力値ｘをｆ（ｘ）に適用した結果の値を輻輳度としてもよい。
（Ｓ１０）輻輳状況の検出
ステップＳ２で判定した輻輳度に基づいて輻輳状況を判断する。定常時又は輻輳度が変わらない場合又は輻輳度が下降している場合にはステップＳ１に戻る。輻輳を検出した場合又は輻輳度が上昇した場合には、以下のシーケンスに進む。
（Ｓ３）輻輳対象取得
輻輳検出器２２は、輻輳しているサーバ番号を元に、エージェント／サーバ対応テーブル２１から輻輳しているサーバ１０に配置されているエージェント番号を取得する。更に、収集項目編成演算器２３に対して、輻輳度とエージェント番号を含む輻輳通知を行なう。ここまでは、輻輳検出器２２の動作である。
（Ｓ４）収集項目取得
収集項目編成演算器２３は、輻輳検出器２２から通知されたエージェント番号を元にエージェント収集項目対応テーブル２５を参照し、該当する輻輳しているサーバのエージェントに割り当てられている収集項目番号を取得する。
（Ｓ５）ポリシー取得・演算
収集項目編成演算器２３は、輻輳度と収集項目番号の数として自明である収集項目数を元に収集ポリシー定義テーブル２４を参照し、適合するポリシーを取得する。図８は収集ポリシー定義テーブルの構成例を示す図である。図に示すように、ポリシー条件とこれに対応する処理方式とで構成されている。処理方式は、割り当て方式と付帯属性から構成されている。
【００３４】
ポリシー条件とは、処理を決定するために必要となる情報である。以下にポリシー条件を列挙する。
ａ）輻輳度、且つ又は輻輳の進行度合い（急激であるか、ゆるやかであるか）
ｂ）収集項目数、且つ又は長・中・短期間等の段階付けられた収集周期に対するそれぞれの収集項目数（短周期の収集項目が多い、長周期の収集項目が多い等）
ｃ）優先度：複数のポリシー条件に適合する場合の採用優先度等が考えられる。処理方式及び付帯属性とは、各エージェントに割り当てる収集項目の配分方式及び配分するために必要なパラメータである。例えば、
ｄ）１つの既定のエージェントに全ての収集項目を配分する方式。この時、１つのエージェントは自分のエージェント番号を付帯属性とすることができる。
ｅ）既定の複数のエージェントに対してある配分方式により収集項目を配分する方式。この時、複数のエージェントはエージェント番号を付帯属性として指定することができ、配分方式も付帯属性とすることができる。
ｆ）１つの任意のエージェントに全ての収集項目を配分する方式。この時、任意のエージェントを選択する条件としては、例えば
▲１▼各エージェントに割り当て可能な収集項目数Ｔｎと、そのエージェントに現在割り当てられている収集項目数Ｃｎの差を許容可能な追加割り当て項目数Ａｎとして、輻輳エージェントに割り当てられている収集項目数Ｃｃａと以下の関係を満たすエージェントを算出する。
【００３５】
Ｔｎ＝Ｃｎ＋Ａｎ（１）
Ａｎ＞Ｃｃａ（２）
▲２▼各エージェントに割り当て可能な収集作業量Ｔｗと、そのエージェントに現在割り当てられている収集項目数Ｃｎ及びある収集項目Ｃｉとその周期Ｔｃｉの演算の和から、許容できる追加割り当て収集作業量Ａｗを算出し、輻輳エージェントに割り当てられている収集作業量Ｃａｗと以下の関係を満たすエージェントを算出する。
【００３６】
【数１】
$$T_w = \sum_{i=1}^{C_n} f(C_i, T_{ci}) + A_w \qquad (3)$$
【００３７】
ここで、ｆ（Ｃｉ，Ｔｃｉ）は任意の演算式である。
【００３８】
Ａｗ＞Ｃａｗ（４）
これらの演算式は付帯属性とすることができる。
ｇ）任意の複数のエージェントに対してある配分式により収集項目を配分する方式。この時、任意の複数のエージェントを選択する条件としては、例えばｆ）の（２）式や（４）式を満たすｎ個のエージェントに対してＣｃａ／ｎや、Ｃａｗ／ｎをそれぞれ均等配分する方式が考えられるし、任意の配分式を用いることも可能である。
ｈ）上記ａ）〜ｇ）配分方式が満足されない場合（他エージェントに配分しても未割り当ての収集項目が残ってしまう場合）でも、未割り当てとする収集項目の処理方式を付帯属性とすることも可能である。未割り当ての収集項目の処理方式としては、例えば、
▲１▼最も単純な方式として、収集時刻が遅い収集項目は、未割り当て収集項目として輻輳しているサーバのエージェントが引き続き収集を行なうようにすることが考えられる。
▲２▼収集周期が長い収集項目は、未割り当て収集項目として輻輳しているサーバのエージェントが引き続き収集を行なうようにすることが考えられる。
▲３▼収集周期が長く、次回の収集時刻が遅い収集項目を未割り当て収集項目とする。未割り当ての収集項目は、輻輳しているサーバのエージェントが引き続き収集を行なうようにすることが考えられる（▲１▼と▲２▼の組み合わせである）。
▲４▼収集項目に優先度（重要度）属性を付加して、優先度の低い収集項目は未割り当て収集項目として輻輳エージェントが引き続き収集を行なうようにすることが考えられる。
▲５▼上記▲１▼と▲４▼、▲２▼と▲４▼、▲３▼と▲４▼の組み合わせ等が考えられる。
（Ｓ６）再編成（新しいエージェント収集項目対応テーブル作成）
ポリシーを輻輳しているサーバのエージェントに割り当てられている収集項目に適用した結果、輻輳・非輻輳サーバの各エージェントに対する新しい収集項目の割り当てを元に、エージェント収集項目対応テーブル２５を更新する。なお、ポリシーが収集項目内容も対象とする場合には、収集項目テーブル２６の内容も取得する。
【００３９】
輻輳復旧後の処理は、復旧しても収集項目のエージェントの割り当てを変更しなくてもよいし、輻輳前の割り当てに切り戻すこともできる。これらもポリシーに定義するか、システムで暗黙的にいずれかを指定しておくことが可能である。
【００４０】
輻輳前の割り当てに切り戻す場合の動作フローを図７のＳ１１以下に示す。切り戻しのために、エージェント収集項目対応テーブル２５には、収集項目番号に対して輻輳時の割り当て元だったエージェントのエージェント番号（元エージェント番号と呼ぶ）を追加する。図９は追加したエージェント収集項目対応テーブル２５の構成例を示す図である。図４に示すエージェント収集項目対応テーブル２５の構成と比較して、元エージェント番号が記載されている点が異なる。この元エージェント番号を元に、切り戻すことができる。
（Ｓ１１）対応表復元
収集項目編成演算器２３は、エージェント収集項目対応テーブル２５を参照し、元エージェント番号が関連付けられている収集項目番号を取得する。
（Ｓ１２）再編成
収集項目編成演算器２３は、取り出した収集項目を輻輳が復旧したサーバのエージェントに割り当てるよう、エージェント収集項目対応テーブル２５の割り当て先エージェントのテーブルから収集項目を削除し、該当エージェント番号に対応付けられているテーブルに収集項目を追加する。
【００４１】
次に、実施例について説明する。本実施例では、図３に示すような構成をとり、収集ポリシー定義テーブル２４には、図１０に示すポリシーが定義されているものとする。図１０には、ポリシー条件に対する各処理方式が具体的に示されている。各機能は、以下の状態及び情報を保持しているものとする。
・輻輳検出器２２
輻輳検出には、ＣＰＵ使用率を検出する監視デバイス２７の出力を取得して、輻輳判定にはＣＰＵ使用率を使用し、この値が４０％を超えた場合には輻輳と判定するものとする。従って、ここでは輻輳度はＣＰＵ使用率とする。
・エージェントは１１，１２とし、ネットワーク管理システム内に２つのエージェントが配置されており、各エージェント番号は１，２とする。また、エージェント１１にはｍ個の収集項目、エージェント１２にはｎ個の収集項目が割り当てられているとする。
・エージェント／サーバ対応テーブル２１には、以下の２つのレコードを保持している。
【００４２】
サーバＡ：エージェント１
サーバＢ：エージェント２
以上のような状態において、エージェント１１が配置されているサーバＡで輻輳が発生し、ＣＰＵ監視デバイス２７の出力は、ＣＰＵ使用率が７０％である場合について考える。
【００４３】
輻輳検出器２２は、ＣＰＵ監視デバイス２７の出力から、ＣＰＵ使用率が７０％となっていることを検出する。輻輳判定では、ＣＰＵ使用率が４０％を超えているため、輻輳検出器２２は、サーバＡが輻輳状態であると判断する。輻輳検出器２２は、エージェント／サーバ対応テーブル２１に対してサーバＡをキーとしてエージェント番号１を取得する。
【００４４】
これにより、サーバＡにエージェント番号１のエージェントが配置されていることが判明したので、輻輳検出器２２は、収集項目編成演算器２３に輻輳度７０％とエージェント番号１を付属した輻輳発生を通知する。
【００４５】
収集項目編成演算器２３は、輻輳度７０％と輻輳エージェント番号１を受信したので、輻輳エージェント番号１をキーに、エージェント収集項目対応テーブル２５から収集項目番号一覧を取得する。その収集項目数はｍである。
【００４６】
次に、収集ポリシー定義テーブル２４から輻輳度７０％の条件を満たすポリシーを取得する。この場合、輻輳度７０％はポリシー条件を満たすので、以下の処理方式を取得することになる。
【００４７】
割り当て方式：任意の１つのエージェントに一括割り当て
付帯属性：Ｔｎ＝Ｃｎ＋Ａｎ，Ａｎ＞Ｃｃａを満たすエージェントに割り当てる。
【００４８】
付帯属性：未割り当ての収集項目の処理方式は、長周期項目を優先的に残す。
【００４９】
付帯属性１の式により輻輳していないエージェント１２に適用するが、この時仮にエージェント１２のＡｎがＡｎ＞Ｃｃａを満たしている場合には、エージェント１１の収集項目数ｍの全てをエージェント１２に割り当てることになる。エージェント１２の収集項目割り当て後の収集項目数をｎ’とするとｎ’＝ｍ＋ｎとなる。もし、Ａｎ＞Ｃｃａを満たしていない場合には、付帯属性２により、エージェント１２のＣｎがＴｎ＝Ｃｎとなるまで、収集項目ｍ中の短周期の収集項目を割り当て、余った収集周期の長い収集項目はエージェント１１にそのまま割り当てる。
【００５０】
以上による割り当てにより、エージェント収集項目対応テーブル２５を更新することで処理が終了する。
【００５１】
このように、本発明によれば、ポリシーに基づき、収集項目数、収集周期、データ処理能力から各エージェントの収集対象範囲を動的に変更することができるので、輻輳が発生した場合でも確実にＮＥのデータ収集が既定の時刻及び周期で行なえるようなマネージャ−エージェント型情報収集アプリケーションを提供することができる。
【００５２】
【発明の効果】
以上説明したように、本発明によれば、以下の効果が得られる。
（１）請求項１記載の発明によれば、ポリシーに基づき、収集項目数，収集周期，データ処理能力から各エージェントの収集対象範囲を動的に変更させるようにしたので、輻輳が発生した場合でも確実にＮＥのデータ収集を行なうことができる。
（２）請求項２記載の発明によれば、エージェント収集項目対応テーブル２５と収集項目テーブル２６を設けることにより、エージェント番号から収集項目内容を取得することができ、収集項目内容に従って、データ収集を行なうことができる。
（３）請求項３記載の発明によれば、ポリシーに基づき、収集項目数、収集周期、データ処理能力から各エージェントの収集対象範囲を動的に変更することができるので、輻輳が発生した場合でも確実にＮＥのデータ収集が既定の時刻及び周期で行なえるようなマネージャ−エージェント型情報収集アプリケーションを提供することができる。
【００５３】
このように、本発明によれば、輻輳が発生した場合でも確実にＮＥのデータ収集が既定の時刻及び周期で行なえるようなマネージャ−エージェント型情報収集アプリケーションのロードバランス方法及び装置を提供することができる。
【図面の簡単な説明】
【図１】本発明方法の原理を示すフローチャートである。
【図２】本発明の原理ブロック図である。
【図３】本発明の実施の形態例を示すブロック図である。
【図４】エージェント収集項目対応テーブルの構成例を示す図である。
【図５】収集項目テーブルの構成例を示す図である。
【図６】エージェント／サーバ対応テーブルの構成例を示す図である。
【図７】本発明装置の動作を示すフローチャートである。
【図８】収集ポリシー定義テーブルの構成例を示す図である。
【図９】追加したエージェント収集項目対応テーブルの構成例を示す図である。
【図１０】収集ポリシー定義テーブルの具体的な構成例を示す図である。
【図１１】本発明の利用分野の説明図である。
【図１２】多量の故障情報を一度に受信した場合の動作説明図である。
【図１３】正常な場合と輻輳時におけるデータ収集のタイミングを示す図である。
【符号の説明】
１ＩＰネットワーク装置（ＮＥ）
２〜４ＩＰネットワーク
１０サーバ
１１，１２収集エージェント
１３故障管理機能部
１４ネットワーク制御機能部
１５データベース
１６，１７オペレータ
２０ネットワーク管理システム
２１エージェント／サーバ対応テーブル
２２輻輳検出器
２３収集項目編成演算器
２４収集ポリシー定義テーブル
２５エージェント収集項目対応テーブル
２６収集項目テーブル
[0001]
[Technical Field of the Invention]
The present invention relates to a load balancing method and apparatus for a manager-agent type information collection application. More specifically, it relates to a load balancing method and apparatus for a manager-agent type data collection application in a network management system that manages a plurality of network devices (NE: Network Element, a device having functions that realize a network service), such as routers, bridges, and computers, that constitute an IP network. Typical examples of the data collection function include periodic collection of the numbers of input and output packets, lost packets, and discarded packets among the IP packets processed by a router.
[0002]
Maintaining the data collection function when some processes of the network management system become congested is one of the challenges expected of a network management system, because it affects not only network management work but also the disclosure of network service quality information to customers under an SLA. The present invention concerns maintaining data collection during such congestion.
[0003]
FIG. 11 is an explanatory diagram of a field of use of the present invention. In the figure, reference numeral 1 denotes an NE; a plurality of NEs are connected to one another to form IP networks 2 to 4. Reference numeral 5 denotes a network management system that is connected to the NEs 1 and monitors the network. The network management system 5 sends network information to the network administrator 6, who monitors the network and checks its quality. The network management system 5 also sends network information to the sales/call center 7, which presents it to the customer 8 as SLA information (service quality information).
[0004]
[Prior art]
As a network monitoring system of this type, there is a technology, relating to load control of a centrally managed and controlled network, that performs overload control efficiently according to the cause of the overload state of the centralized management/control device (see, for example, Patent Document 1).
[0005]
In addition, for a network management system composed of a plurality of manager nodes and agent nodes that manage a plurality of network resources, there is a technology that automatically optimizes the frequency of management traffic according to the network load (see, for example, Patent Document 2).
[0006]
The network management system is composed of a plurality of pieces of hardware (called servers). The server on which each of its processing units (called processes) is started can be changed arbitrarily by means of existing technology such as a recent distributed object environment (or CORBA (Common Object Request Broker Architecture)).
[0007]
While the NEs that realize the IP network service are operating on the network, a network management system that includes a data collection application is required, from viewpoints such as
(1) the network service quality assurance covered by service quality information disclosure to customers under an SLA, and
(2) network maintenance, network failure detection, network charge calculation, and capital investment forecasting,
to provide a function that periodically collects and stores the network performance information held by the NEs. The network performance information held by the NEs that constitute an IP network is generally kept in an MIB (Management Information Base) that can be read and set through SNMP (Simple Network Management Protocol) communication. In many cases the NE updates the network performance information in the MIB continuously. For numerical information that accumulates over time, once the value exceeds the maximum that the MIB can hold, it normally starts again from 0. The data collection application therefore needs to collect each collection item at a collection cycle specific to the purpose for which that item is used.
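Because such a counter restarts from 0 after passing its maximum, a collector that takes the difference between two consecutive samples has to allow for the wrap. The following is a minimal sketch, assuming a 32-bit counter (as with the SNMP Counter32 type) and at most one wrap between samples. The function name `counter_delta` is illustrative and does not appear in the patent.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Return the increase between two samples of a wrapping counter.

    Assumes the counter wrapped at most once between the two samples,
    which is one reason the collection cycle must suit the item.
    """
    modulus = 1 << bits  # a 32-bit counter holds 0 .. 2**32 - 1
    return (current - previous) % modulus


print(counter_delta(100, 250))            # no wrap: 150
print(counter_delta(4_294_967_290, 4))    # wrapped past the maximum: 10
```

If the collection cycle is so long that the counter can wrap more than once between samples, the difference can no longer be recovered this way.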
[0008]
[Patent Document 1]
JP-A-9-8907 (page 4, page 5, FIG. 1)
[Patent Document 2]
JP-A-9-270794 (page 4, FIG. 1)
[0009]
[Problems to be solved by the invention]
However, this data collection may be disrupted in the following respects.
(1) For an agent placed on a certain server, congestion in other management processing on that server reduces the server processing time allotted to the agent, so the agent's collection processing takes longer. As a result, data collection cannot be completed within the predetermined collection cycle, and the actual collection time may be delayed relative to the predetermined time and/or collected data may be lost.
(2) For an agent placed on a certain server, even after congestion in other management processing on that server has recovered, the delay in the collection cycle may leave an offset between the actual collection time and the predetermined collection time. Collection then continues at the same cycle but at times different from those set when the collection item was registered.
[0010]
FIG. 12 is a diagram explaining the operation when a large amount of failure information is received at once. Components identical to those in FIG. 11 are denoted by the same reference numerals. In the figure, 1 is an NE. Reference numeral 10 denotes a server that monitors the status of the IP networks 2 to 4. In the server 10, 11 and 12 are collection agents that collect the state of the NE 1, 13 is a failure management function unit that manages the failure state of the NE 1, and 14 is a network control function unit. Reference numerals 13 and 14 denote functions that a typical network management system has, and they may be separate functions arranged in the same server. Reference numeral 15 denotes a database provided in the server 10 for storing collected information. The information in the database 15 is read by operators 16 and 17 as needed.
[0011]
In the system configured as described above, the collection agents 11 and 12 periodically collect the status of the NE 1. The data collected by the collection agents 11 and 12 is stored in the database 15 in the server 10. If the NE 1 simultaneously reports a volume of failure information that exceeds the processing capability of the failure management function unit 13, congestion occurs in the server 10.
[0012]
When congestion occurs in the server 10, the collection agent 12 can no longer complete data collection at the predetermined cycle and time. As a result, the collection agent 12 cannot register data in the database 15 at the predetermined cycle and time, and gaps appear in the data held in the database 15.
[0013]
FIG. 13 is a diagram showing the timing of data collection in the normal case and during congestion. (a) shows the data collection timing in the normal case, and (b) shows the data collection timing during congestion. The collection items are A to C. The cycle Ta is the collection cycle of collection item A, the cycle Tb is the collection cycle of collection item B, and the cycle Tc is the collection cycle of collection item C. The starts of the cycles may or may not coincide.
[0014]
(b) shows the collection timing during congestion. Assume that congestion occurs at time t1 shown in the figure. During congestion, data cannot be collected at the predetermined cycle. After the first data collection following recovery from congestion, collection at the predetermined cycle resumes, but the collection times are shifted relative to the normal case.
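The shift in collection times can be illustrated with a small simulation. It assumes, purely for illustration, that an agent schedules each collection one cycle after the previous collection actually ran; the patent does not specify the scheduling mechanism, and the names below are hypothetical.

```python
def collection_times(start, period, count, delays=None):
    """Simulate collection times when each run is scheduled one period
    after the previous run actually happened (illustrative assumption).

    delays maps a run index to the extra time by which congestion held
    up that run.
    """
    delays = delays or {}
    times = []
    planned = start
    for i in range(count):
        actual = planned + delays.get(i, 0)
        times.append(actual)
        planned = actual + period  # the delay carries into every later run
    return times


normal = collection_times(start=0, period=10, count=5)
congested = collection_times(start=0, period=10, count=5, delays={2: 4})
print(normal)     # [0, 10, 20, 30, 40]
print(congested)  # [0, 10, 24, 34, 44]
```

After the delayed third run the cycle is 10 again, but every later collection is 4 time units later than in the normal case.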
[0015]
The present invention solves these problems. Its object is to provide a load balancing method and apparatus for a manager-agent type information collection application that can reliably collect data from the NEs at the predetermined time and cycle even when congestion occurs.
[0016]
[Means for Solving the Problems]
(1) The invention described in claim 1 is as follows. FIG. 1 is a flowchart showing the principle of the method of the present invention. In a manager-agent type application that collects IP network traffic information from each IP network device, the present invention is characterized in that the collection target range of each agent is changed dynamically, based on a policy, according to the number of collection items, the collection cycle, and the data processing capability (step 1), and each agent collects data based on the changed collection target range (step 2).
[0017]
Here, a policy defines an action to be executed under a certain condition and/or the parameters needed for that action. When the subject to which the policy applies, or the environment surrounding that subject, satisfies the condition, the subject obtains the action and/or the parameters for that condition and changes its own behavior.
[0018]
Therefore, by defining the congestion status as the policy condition and the expansion or reduction of each agent's collection target range as the action, data can be reliably collected from the NEs even when congestion occurs.
(2) The invention described in claim 2 is as follows. FIG. 2 is a block diagram showing the principle of the present invention. The same components as those in FIG. 12 are denoted by the same reference numerals. In the figure, reference numeral 20 denotes a network management system including the server 10. The servers 10 are arranged in a distributed manner. In the server 10, reference numerals 11 and 12 denote collection agents that take in data from the NE at regular intervals, reference numeral 13 denotes a failure management function unit that manages a failure state of the NE network, and reference numeral 14 denotes a network control function unit.
[0019]
Reference numerals 13 and 14 indicate functions held by a general network management system, and may be different functions arranged in the same server. Reference numeral 15 denotes a database provided in the server 10 for storing collected information. Reference numerals 16 and 17 denote operators that use data stored in the database 15.
[0020]
In the network management system 20, reference numeral 25 denotes an agent collection item correspondence table that stores, in association with each other, an agent number uniquely identifying an agent and the collection item numbers uniquely identifying the items that the agent collects. Reference numeral 26 denotes a collection item table that defines, for each uniquely identifiable collection item, its name and the reference point, data type, collection cycle, start time, operation state, and so on of the IP network device (NE) side information.
[0021]
Here, the agent collection item correspondence table 25 and the collection item table 26 may be provided in an arbitrary server 10, or a new server may be added and installed therein.
[0022]
By providing the agent collection item correspondence table 25 and the collection item table 26, the agent can acquire the content of the collection item from the agent number, and can collect data according to the content of the collection item.
(3) The invention described in claim 3 further comprises: a congestion detector 22 that monitors server processing performance information output from hardware and/or software monitoring devices of the system and detects the occurrence of congestion on each server and recovery from it; an agent/server correspondence table 21 that associates an agent number uniquely identifying each agent with a server number uniquely identifying a server; a collection policy definition table 24 that defines the manner in which the correspondence between collection items and agents is reorganized according to the congestion status; and a collection item organization calculator 23 that, on notification from the congestion detector 22, obtains the policy matching the congestion status from the collection policy definition table and reorganizes the correspondence between collection items and agents. FIG. 3 shows an embodiment of the present invention.
[0023]
With this configuration, the collection target range of each agent can be changed dynamically, based on a policy, according to the number of collection items, the collection cycle, and the data processing capability. This makes it possible to provide a manager-agent type information collection application that reliably collects data from the NEs at the predetermined time and cycle even when congestion occurs.
[0024]
According to the present invention, even when the server on which an agent is placed becomes congested, the collection items assigned to that agent are reassigned, based on a policy, to agents on other servers, so the predetermined collection time and collection cycle of each collection item are maintained. This prevents delays in the collection cycle, data loss, and shifts in collection time after recovery from congestion, even while the system is congested.
[0025]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0026]
First, the configuration shown in FIG. 3 will be described further. The system shown in the figure is a network management system composed of a plurality of pieces of hardware (servers) 10 that host a plurality of processes (network management function processes) providing network management functions for managing a plurality of IP network devices (NEs) 1.
[0027]
In this network management system, agents 11 and 12 are placed on arbitrary servers 10 as the data collection function, and each of the agents 11 and 12 collects the predetermined collection items from the NEs 1 at the predetermined time and cycle. Each agent obtains the items it should collect by looking up, in the agent collection item correspondence table 25, the collection item numbers that correspond to its own agent number.
[0028]
FIG. 4 is a diagram showing a configuration example of the agent collection item correspondence table 25. The agent collection item correspondence table 25 holds, for each agent number uniquely identifying an agent, a list of the collection item numbers uniquely identifying collection items. As the figure shows, it consists of agent numbers and, for each agent number, the list of corresponding collection item numbers.
[0029]
The NE-side reference point, data type, and collection cycle of each collection item are obtained by looking up the collection item contents for the collection item number in the collection item table 26. The collection item table 26 holds, as one unit for each collection item number uniquely identifying a collection item, the item's name, NE-side reference point (information locating the data on the NE side, such as the IP address of the NE and the MIB OID), data type, collection start time, collection cycle, and operation state (in operation, suspended, etc.). FIG. 5 is a diagram showing a configuration example of the collection item table 26. As the figure shows, each collection item number has corresponding collection item contents, which consist of a name, NE-side reference point, data type, start time, cycle, operation state, and so on.
[0030]
With this configuration, by providing the agent collection item correspondence table 25 and the collection item table 26, the contents of the collection items can be acquired from the agent numbers, and data can be collected according to the contents of the collection items.
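The two-table lookup can be sketched as follows. This is an illustrative model only: the field names, the sample item numbers, and the sample rows (documentation IP addresses and interface counters from the standard interfaces MIB) are assumptions, not data from the patent.

```python
from dataclasses import dataclass


@dataclass
class CollectionItem:
    """One row of the collection item table (table 26)."""
    name: str
    ne_reference: str   # NE IP address plus MIB OID
    data_type: str
    start_time: str     # collection start time, "HH:MM"
    period_s: int       # collection cycle in seconds
    state: str          # "operating" or "suspended"


# Agent collection item correspondence table (table 25):
# agent number -> list of collection item numbers.
agent_items = {1: [101, 102], 2: [103]}

# Collection item table (table 26): item number -> item contents.
collection_items = {
    101: CollectionItem("ifInOctets", "192.0.2.1 / 1.3.6.1.2.1.2.2.1.10.1",
                        "Counter32", "00:00", 300, "operating"),
    102: CollectionItem("ifOutOctets", "192.0.2.1 / 1.3.6.1.2.1.2.2.1.16.1",
                        "Counter32", "00:00", 300, "operating"),
    103: CollectionItem("ifInDiscards", "192.0.2.2 / 1.3.6.1.2.1.2.2.1.13.1",
                        "Counter32", "00:05", 900, "suspended"),
}


def items_for_agent(agent_no):
    """Resolve an agent number to the contents of the items it collects."""
    return [collection_items[n] for n in agent_items.get(agent_no, [])]


for item in items_for_agent(1):
    print(item.name, item.period_s)
```

Keeping the assignment (table 25) separate from the item definitions (table 26) means that reassigning an item to another agent only touches the first table.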
[0031]
As shown in FIG. 3, the present invention further provides, on arbitrary servers 10: a congestion detector 22 that monitors server processing performance information output from hardware and/or software monitoring devices of the system and detects the occurrence of congestion on each server and recovery from it; an agent/server correspondence table 21 that associates an agent number uniquely identifying each agent with a server number uniquely identifying a server; a collection policy definition table 24 that defines the manner in which the correspondence between collection items and agents is reorganized according to the congestion status; and a collection item organization calculator 23 that, on notification from the congestion detector 22, obtains the policy matching the congestion status from the collection policy definition table 24 and reorganizes the correspondence between collection items and agents.
[0032]
FIG. 6 is a diagram showing a configuration example of the agent / server correspondence table 21. As shown in the figure, it is composed of a server number and an agent number corresponding to the server number.
[0033]
FIG. 7 is a flowchart showing the operation of the apparatus of the present invention, showing the operation of the congestion detector 22 and the collection item organization calculator 23. Each process of the flow is as follows.
(S1) Congestion detection
The congestion detector 22 receives, periodically and/or by interrupt, the output of the monitoring device 27 that monitors the performance information of the server 10.
(S2) Congestion determination
The congestion detector 22 calculates the congestion degree of the server from the output of the monitoring device 27 and determines whether congestion is present. The congestion degree may be calculated by setting one or more graded thresholds on the output value of each monitoring device 27, raising the congestion degree each time a threshold is exceeded and lowering it when the output falls below a threshold. Alternatively, an arbitrary expression f(x) may be assigned to the output value x of each monitoring device 27, and the value obtained by applying x to f(x) may be used as the congestion degree.
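Both ways of deriving the congestion degree can be sketched briefly. The threshold values and the example expression f(x) = x / 100 are assumptions chosen for illustration; the patent leaves them open.

```python
from bisect import bisect_left


def degree_from_thresholds(value, thresholds):
    """Congestion degree = number of graded thresholds that value exceeds.

    thresholds must be sorted in ascending order. The degree rises by one
    for each threshold exceeded and falls again as the value drops.
    """
    return bisect_left(thresholds, value)


def degree_from_function(value, f):
    """Alternative: apply an arbitrary expression f(x) to the output value."""
    return f(value)


cpu_thresholds = [40, 60, 80]   # hypothetical CPU-usage thresholds in percent
print(degree_from_thresholds(70, cpu_thresholds))    # exceeds 40 and 60: 2
print(degree_from_function(70, lambda x: x / 100))   # 0.7
```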
(S10) Detection of congestion status
The congestion status is determined from the congestion degree obtained in step S2. When the congestion degree is unchanged or falling, the process returns to step S1. When congestion is newly detected or the congestion degree has risen, the process proceeds to the following steps.
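Steps S2 and S10 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a congestion degree of 0 is taken to mean "no congestion", and the stepwise degree is taken to be the number of thresholds that the monitored value exceeds.

```python
from bisect import bisect_left
from typing import Callable, Sequence


def stepwise_degree(output: float, thresholds: Sequence[float]) -> int:
    """Congestion degree as the number of thresholds strictly exceeded."""
    return bisect_left(sorted(thresholds), output)


def formula_degree(output: float, f: Callable[[float], float]) -> float:
    """Congestion degree as f(x) for an arbitrary expression f."""
    return f(output)


def needs_reorganization(previous: float, current: float) -> bool:
    """S10: continue only when congestion appears or its degree rises.

    An unchanged or falling degree sends the flow back to S1.
    """
    return current > previous
```

With thresholds of 40, 60, and 80, a monitored value of 70 exceeds two thresholds, so the stepwise degree is 2; the same value passed through the identity expression gives a degree of 70.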
(S3) Congestion target acquisition
The congestion detector 22 acquires an agent number assigned to the congested server 10 from the agent / server correspondence table 21 based on the congested server number. Further, a congestion notification including the congestion degree and the agent number is sent to the collection item organization calculator 23. Up to this point, the operation of the congestion detector 22 has been described.
(S4) Acquisition of collection items
The collection item organization calculator 23 refers to the agent collection item correspondence table 25 based on the agent number notified from the congestion detector 22, and acquires the collection item numbers assigned to the agent of the congested server.
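The two lookups in steps S3 and S4 amount to keyed reads of the correspondence tables. A minimal sketch, assuming the tables are held as in-memory dictionaries; the table contents, item numbers, and function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Agent / server correspondence table 21: server number -> agent numbers.
AGENT_SERVER_TABLE: Dict[str, List[int]] = {"A": [1], "B": [2]}

# Agent collection item correspondence table 25: agent number -> item numbers.
AGENT_ITEM_TABLE: Dict[int, List[int]] = {1: [101, 102, 103], 2: [201]}


def congestion_notification(server: str, degree: float) -> dict:
    """S3: look up the agents on the congested server and build the notice."""
    agents = AGENT_SERVER_TABLE.get(server, [])
    return {"degree": degree, "agents": list(agents)}


def items_for(notification: dict) -> Dict[int, List[int]]:
    """S4: fetch the collection item numbers of each notified agent."""
    return {
        agent: list(AGENT_ITEM_TABLE.get(agent, []))
        for agent in notification["agents"]
    }
```

The notification carries both the congestion degree and the agent numbers, so the calculator needs no direct knowledge of which server is congested.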
(S5) Policy acquisition / operation
The collection item organization calculator 23 refers to the collection policy definition table 24 based on the congestion degree and on the number of collection items, which is evident from the number of collection item numbers acquired, and obtains a matching policy. FIG. 8 is a diagram illustrating a configuration example of the collection policy definition table. As shown in the figure, the table consists of policy conditions and the processing method corresponding to each policy condition. The processing method consists of an assignment method and attributes.
[0034]
The policy condition is information necessary for determining a process. The policy conditions are listed below.
a) The congestion degree and/or the rate at which congestion is progressing (sharp or gentle)
b) The number of items to be collected and/or the number of items to be collected in each graded collection-period class, such as long, medium, and short (for example, many short-period items or many long-period items)
c) Priority: the priority with which a policy is adopted when a plurality of policy conditions are satisfied.
The processing method and its supplementary attributes specify how the collection items are distributed among the agents and the parameters required for the distribution. For example,
d) A method of allocating all collection items to one predetermined agent. In this case, the agent number of that agent can be specified as an attribute.
e) A method of allocating collection items to a predetermined plurality of agents according to a certain distribution method. In this case, the agent numbers of those agents can be specified as attributes, and the distribution method can also be an attribute.
f) A method of allocating all collection items to one arbitrary agent. Conditions for selecting the agent include, for example, the following.
(1) For each agent, the number of additional items An that it can tolerate is calculated as the difference between the number of collection items Tn that can be assigned to the agent and the number of collection items Cn currently assigned to it, and an agent is selected for which An satisfies the following relationships with the number of collection items Cca assigned to the congested agent.
[0035]
Tn = Cn + An (1)
An> Cca (2)
(2) For each agent, the additional work amount Aw that it can tolerate is calculated from the total work amount Tw that can be allocated to the agent and the sum, over the Cn collection items currently allocated to it, of an operation on each collection item Ci and its collection cycle Tci, and an agent is selected for which Aw satisfies the following relationships with the collection work amount Caw assigned to the congested agent.
[0036]
$T_w = \sum_{i=1}^{C_n} f(C_i, T_{ci}) + A_w$ (3)
[0037]
Here, f (Ci, Tci) is an arbitrary operation expression.
[0038]
Aw> Caw (4)
These arithmetic expressions can be specified as attributes.
g) A method of allocating collection items to an arbitrary plurality of agents according to a certain distribution formula. As a condition for selecting the agents, one conceivable method is to distribute Cca/n items or Caw/n of work equally to each of n agents that satisfy expressions (2) and (4) of f); any other distribution formula can also be used.
h) When the distribution methods of a) to g) above cannot be fully satisfied (that is, when collection items remain that cannot be allocated even after distribution to other agents), the processing method for the resulting unallocated collection items can also be specified as an attribute. Examples of processing methods for unallocated collection items are as follows.
(1) As the simplest method, collection items whose next collection time is late are treated as unallocated collection items and continue to be collected by the agent of the congested server.
(2) Collection items whose collection cycle is long are treated as unallocated collection items and continue to be collected by the agent of the congested server.
(3) Collection items whose collection cycle is long and whose next collection time is late are treated as unallocated collection items and continue to be collected by the agent of the congested server (a combination of (1) and (2)).
(4) A priority (importance) attribute is added to each collection item, and low-priority collection items are treated as unallocated collection items that continue to be collected by the congested agent.
(5) Combinations of (1) and (4), (2) and (4), or (3) and (4) above are also conceivable.
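The selection conditions of method f) can be sketched as below. This is an illustration, not the patented implementation: the class and function names are hypothetical, the work function f(Ci, Tci) is assumed here to be the number of collections per minute, and the work-amount relation is assumed to mirror expression (1), so that Aw is Tw minus the sum of f(Ci, Tci) over the agent's current items.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Item:
    number: int
    period_s: int  # collection cycle Tci, in seconds


@dataclass
class Agent:
    number: int
    capacity_items: int      # Tn
    capacity_work: float     # Tw
    items: List[Item] = field(default_factory=list)  # Cn = len(items)


def per_minute(item: Item) -> float:
    """Assumed f(Ci, Tci): collections per minute for one item."""
    return 60.0 / item.period_s


def spare_items(agent: Agent) -> int:
    """An from expression (1): Tn = Cn + An."""
    return agent.capacity_items - len(agent.items)


def spare_work(agent: Agent, f: Callable[[Item], float] = per_minute) -> float:
    """Aw, assuming Tw = sum of f over current items + Aw."""
    return agent.capacity_work - sum(f(i) for i in agent.items)


def pick_by_count(candidates: List[Agent], cca: int) -> Optional[Agent]:
    """First agent satisfying expression (2): An > Cca."""
    return next((a for a in candidates if spare_items(a) > cca), None)


def pick_by_work(candidates: List[Agent], caw: float) -> Optional[Agent]:
    """First agent satisfying expression (4): Aw > Caw."""
    return next((a for a in candidates if spare_work(a) > caw), None)
```

Both conditions are strict inequalities, so an agent whose spare capacity exactly equals the congested agent's load is not selected. Returning None signals that no single agent qualifies, which is the situation handled by method g) or by the unallocated-item handling of h).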
(S6) Reorganization (creation of new agent collection item correspondence table)
As a result of applying the policy to the collection items assigned to the agents of the congested server, the agent collection item correspondence table 25 is updated to reflect the new assignment of collection items to each agent of the congested and non-congested servers. If the policy depends on the contents of the collection items, the contents of the collection item table 26 are also acquired.
[0039]
After recovery from congestion, the agent assignment of the collection items may either be left unchanged or be switched back to the assignment in effect before the congestion. Which of these is done can be defined in the policy, or the system can implicitly adopt one of them.
[0040]
The operation flow when switching back to the assignment before congestion is shown in S11 and subsequent steps in FIG. To allow switching back, the agent number of the agent that was the allocation source at the time of congestion (called the original agent number) is added to each collection item number in the agent collection item correspondence table 25. FIG. 9 is a diagram showing a configuration example of the agent collection item correspondence table 25 with this addition. It differs from the configuration of the agent collection item correspondence table 25 shown in FIG. in that the original agent number is recorded. Switching back is performed based on this original agent number.
(S11) Correspondence table restoration
The collection item organization calculator 23 refers to the agent collection item correspondence table 25 and acquires the collection item number associated with the original agent number.
(S12) Reorganization
The collection item organization calculator 23 reallocates the extracted collection items to the agent of the server that has recovered from congestion: in the agent collection item correspondence table 25, it deletes these collection items from the entries of the agents to which they had been assigned and adds them to the entry of the agent with the original agent number.
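Steps S11 and S12 can be sketched with a table whose rows carry an original agent number. The row layout and function names below are assumptions made for illustration; the source only states that the original agent number is recorded alongside the collection item number.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Row:
    item: int                              # collection item number
    agent: int                             # agent currently collecting it
    original_agent: Optional[int] = None   # set only while reassigned


def reassign(rows: List[Row], items: Iterable[int], to_agent: int) -> None:
    """On congestion: move items and remember where they came from."""
    wanted = set(items)
    for row in rows:
        if row.item in wanted and row.agent != to_agent:
            row.original_agent = row.agent
            row.agent = to_agent


def switch_back(rows: List[Row], recovered_agent: int) -> List[int]:
    """S11/S12: return items to the agent whose server has recovered."""
    moved = []
    for row in rows:
        if row.original_agent == recovered_agent:
            row.agent = recovered_agent
            row.original_agent = None
            moved.append(row.item)
    return moved
```

Clearing the original agent number after switching back keeps the table in the same state as before the congestion, so a later congestion event starts from a clean record.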
[0041]
Next, examples will be described. In the present embodiment, it is assumed that the configuration shown in FIG. 3 is adopted, and the policy shown in FIG. 10 is defined in the collection policy definition table 24. FIG. 10 specifically shows each processing method for the policy condition. Each function is assumed to hold the following status and information.
・ Congestion detector 22
For congestion detection, the output of the monitoring device 27, which measures the CPU usage rate, is obtained, and the CPU usage rate is used for congestion determination: when this value exceeds 40%, congestion is determined to have occurred. The congestion degree here is therefore the CPU usage rate.
Two agents, 11 and 12, are arranged in the network management system, with agent numbers 1 and 2, respectively. It is further assumed that m collection items are assigned to the agent 11 and n collection items to the agent 12.
The agent / server correspondence table 21 holds the following two records.
[0042]
Server A: Agent 1
Server B: Agent 2
In this state, consider the case where congestion occurs on server A, where the agent 11 is arranged, and the output of the CPU monitoring device 27 indicates a CPU usage rate of 70%.
[0043]
The congestion detector 22 detects from the output of the CPU monitoring device 27 that the CPU usage rate is 70%. In the congestion determination, since the CPU usage rate exceeds 40%, the congestion detector 22 determines that server A is congested. The congestion detector 22 then acquires agent number 1 from the agent / server correspondence table 21 using server A as the key.
[0044]
As a result, the agent with agent number 1 is found to be located on server A, and the congestion detector 22 notifies the collection item organization calculator 23 of the occurrence of congestion, attaching the congestion degree of 70% and the agent number 1.
[0045]
The collection item organization calculator 23 receives the congestion degree 70% and the congestion agent number 1, and thus obtains a collection item number list from the agent collection item correspondence table 25 using the congestion agent number 1 as a key. The number of collected items is m.
[0046]
Next, a policy that satisfies the condition of the congestion degree of 70% is acquired from the collection policy definition table 24. In this case, since the congestion degree of 70% satisfies the policy condition, the following processing method is obtained.
[0047]
Assignment method: Batch assignment to any one agent
Additional attribute 1: Tn = Cn + An; assign to an agent satisfying An > Cca.
[0048]
Additional attribute 2: As the processing method for unassigned collection items, long-period items are preferentially left behind.
[0049]
The expression of attribute 1 is applied to the agent 12, which is not congested. If An of the agent 12 satisfies An > Cca, all m collection items of the agent 11 are allocated to the agent 12. Letting n′ be the number of collection items of the agent 12 after the allocation, n′ = m + n. If An > Cca is not satisfied, then in accordance with attribute 2 the short-period collection items among the m collection items are allocated to the agent 12 until its Cn reaches Tn (Tn = Cn), and the remaining items stay assigned to the agent 11.
[0050]
With the above assignment, the agent collection item correspondence table 25 is updated, and the process ends.
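The embodiment's decision can be traced with a short sketch. The 40% threshold and the two attributes come from the example above; the function names, the representation of a collection item as an (item number, period in seconds) pair, and the sample values in the usage note are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[int, int]  # (collection item number, collection period in seconds)

CPU_THRESHOLD = 40  # percent; congestion when the CPU usage rate exceeds this


def is_congested(cpu_usage: float) -> bool:
    return cpu_usage > CPU_THRESHOLD


def reorganize(
    congested: List[Item], target: List[Item], target_capacity: int
) -> Tuple[List[Item], List[Item]]:
    """Return (items of the target agent, items left on the congested agent)."""
    spare = max(target_capacity - len(target), 0)   # An = Tn - Cn
    if spare > len(congested):                       # attribute 1: An > Cca
        return target + congested, []
    # Attribute 2: move short-period items first, leave long-period items.
    by_period = sorted(congested, key=lambda item: item[1])
    return target + by_period[:spare], by_period[spare:]
```

For instance, with three items on the congested agent (periods 60 s, 3600 s, and 300 s) and a target agent holding two items with room for four, only two items can move: the 60 s and 300 s items are reassigned and the 3600 s item stays on the congested agent. With room for ten, all three move and the target ends with m + n = 5 items.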
[0051]
As described above, according to the present invention, the collection target range of each agent can be dynamically changed, in accordance with the policy, based on the number of collection items, the collection cycle, and the data processing capacity. It is therefore possible to provide a manager-agent type information collection application that reliably collects NE data at the predetermined time and cycle even when congestion occurs.
[0052]
[Effects of the invention]
As described above, according to the present invention, the following effects can be obtained.
(1) According to the first aspect of the present invention, the collection target range of each agent is dynamically changed, in accordance with the policy, based on the number of collection items, the collection cycle, and the data processing capacity, so that NE data can be reliably collected even when congestion occurs.
(2) According to the second aspect of the present invention, by providing the agent collection item correspondence table 25 and the collection item table 26, the contents of the collection items can be obtained from the agent numbers, and data can be collected according to those contents.
(3) According to the third aspect of the present invention, the collection target range of each agent can be dynamically changed, in accordance with the policy, based on the number of collection items, the collection cycle, and the data processing capacity, so that it is possible to provide a manager-agent type information collection application that reliably collects NE data at the predetermined time and cycle even when congestion occurs.
[0053]
As described above, according to the present invention, it is possible to provide a load balancing method and apparatus for a manager-agent type information collection application that can reliably collect NE data at the predetermined time and cycle even when congestion occurs.
[Brief description of the drawings]
FIG. 1 is a flowchart showing the principle of the method of the present invention.
FIG. 2 is a principle block diagram of the present invention.
FIG. 3 is a block diagram showing an embodiment of the present invention.
FIG. 4 is a diagram showing a configuration example of an agent collection item correspondence table.
FIG. 5 is a diagram illustrating a configuration example of a collection item table.
FIG. 6 is a diagram illustrating a configuration example of an agent / server correspondence table.
FIG. 7 is a flowchart showing the operation of the device of the present invention.
FIG. 8 is a diagram illustrating a configuration example of a collection policy definition table.
FIG. 9 is a diagram illustrating a configuration example of an added agent collection item correspondence table.
FIG. 10 is a diagram illustrating a specific configuration example of a collection policy definition table.
FIG. 11 is an explanatory diagram of a field of use of the present invention.
FIG. 12 is an operation explanatory diagram when a large amount of failure information is received at once.
FIG. 13 is a diagram showing data collection timings in a normal case and during congestion.
[Explanation of symbols]
1 IP network equipment (NE)
2-4 IP network
10 Server
11,12 Collection agent
13 Failure management function section
14 Network control function section
15 Database
16, 17 Operator
20 Network Management System
21 Agent / server correspondence table
22 Congestion detector
23 Collection item organization calculator
24 Collection policy definition table
25 Agent collection item correspondence table
26 Collection item table

Claims

1. A load balancing method for a manager-agent type information collection application that collects IP network traffic information from each IP network device, wherein
the collection target range of each agent is dynamically changed, in accordance with a policy, based on the number of items to be collected, the collection period, and the data processing capability (step 1), and
each agent collects data based on the changed collection target range (step 2).

2. A load balancer for a manager-agent type information collection application in a network management system that comprises a plurality of pieces of hardware and manages a plurality of IP network devices, the system collecting, from the plurality of IP network devices by a manager-agent type application, traffic information used for an SLA (Service Level Agreement), for disclosure of service quality information based on the SLA, and for grasping the network operation status, the load balancer comprising:
agents that are distributed across servers and collect the plural pieces of traffic information held by the plurality of IP network devices;
an agent collection item correspondence table that associates an agent number uniquely identifying each agent with a collection item number uniquely identifying an item collected by that agent; and
a collection item table that defines, for each uniquely identifiable collection item, its name and the reference point, data type, and collection period of the information on the IP network device side.

3. The load balancer for a manager-agent type information collection application according to claim 2, further comprising:
a congestion detector that monitors processing performance information of the servers output from a hardware and/or software monitoring device of the system and detects the occurrence of, and recovery from, congestion on each server;
an agent / server correspondence table that associates an agent number uniquely identifying each agent with a server number uniquely identifying a server;
a collection policy definition table that defines the format for reorganizing the correspondence between collection items and agents according to congestion conditions; and
a collection item organization calculator that, upon notification from the congestion detector, acquires a policy matching the congestion situation from the collection policy definition table and reorganizes the correspondence between collection items and agents.