JPH06222937A - Automatic fault recovering system in network managing system

Info

Publication number: JPH06222937A
Application number: JP5025973A
Authority: JP
Inventors: Chisato Ohira; 千里大平
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-01-22
Filing date: 1993-01-22
Publication date: 1994-08-12
Anticipated expiration: 2010-06-07
Also published as: JPH0754474B2

Abstract

PURPOSE: To save labor in failure recovery and to speed up failure recovery in a network management system. CONSTITUTION: When a failure occurs in a managed object 4 of a network management system 1, an agent 3 receives from the managed object 4 the managed object information corresponding to that managed object 4 and the error type corresponding to the type of failure, and searches a database 5 using these data as keys. The agent 3 first obtains the recovery method with the highest priority together with its retry count limit, and tries that recovery method repeatedly until the failure recovery succeeds or the retry count limit is exceeded. When the number of retries exceeds the retry count limit, the search of the database 5 and the recovery attempts based on its result are repeated until no unretrieved recovery method remains for the same managed object information and error type.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an automatic failure recovery system in a network management system, and more particularly to an automatic failure recovery system that uses a database.

[0002]

[Prior Art] In a network management system modeled by a manager and agents, the manager issues management operations to and receives notifications from a plurality of agents, and each agent issues management operations to and receives notifications from a plurality of managed objects. When a managed object notifies an agent that a failure has occurred and failure recovery is to be performed on that managed object, conventionally either a person performs the recovery directly on the managed object, or the agent performs it automatically using a single predetermined method.

[0003]

[Problems to Be Solved by the Invention] This conventional technique presupposes that a single type of failure recovery method will reliably recover the failure. That is, when a failure recovery method is tried and fails, another failure recovery method cannot be tried automatically. Consequently, failure recovery requires frequent manual intervention, which hinders quick recovery.

[0004]

[Means for Solving the Problems] To save labor in failure recovery and to recover quickly, the present invention automatically and repeatedly tries a plurality of failure recovery methods stored in a database until the failure is actually recovered.

[0005] The automatic failure recovery system in network management according to the present invention uses a database in accordance with a fixed algorithm and leaves recovery at the time of a failure to the agent, thereby saving labor in failure recovery and achieving quick recovery.

[0006]

[Embodiment] Next, the present invention will be described with reference to the drawings.

[0007] FIG. 1(a) is an explanatory diagram showing the configuration of a network management system 1. The network management system 1 consists of one manager 2, a plurality of agents 3, managed objects 4, and databases 5.

[0008] The manager 2 issues management operations to and receives notifications from its subordinate agents 3, and each agent 3 issues management operations to and receives notifications from its subordinate managed objects 4. Each agent 3 can also search its subordinate database 5.

[0009] FIG. 1(b) is a diagram for explaining only the means that constitute the main part of the present invention.

[0010] FIG. 2 is an explanatory diagram showing the information held by the database 5. The database 5 stores the following information on failure recovery.

[0011]
(1) Managed object information 11
(2) Error type 12
(3) Recovery method 13
(4) Priority 14 of that recovery method
(5) Retry count limit 15 of that recovery method

A plurality of error types 12 correspond to each item of managed object information 11, and a plurality of recovery methods 13 correspond to each error type 12, whereas each recovery method 13 corresponds one-to-one with its priority 14 and its retry count limit 15.
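As an illustration only, the relationships among items 11 to 15 could be modeled as a mapping from a (managed object information, error type) key to a list of recovery entries. The patent does not specify a storage format, so the class name, the key layout, the example values, and the convention that a larger number means a higher priority are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryEntry:
    """One recovery method 13 with its 1:1 priority 14 and retry count limit 15."""
    method: str       # hypothetical identifier of the recovery method 13
    priority: int     # priority 14 (assumption: larger value = higher priority)
    retry_limit: int  # retry count limit 15


# Database 5, sketched as:
# (managed object information 11, error type 12) -> recovery entries
RecoveryDatabase = dict[tuple[str, str], list[RecoveryEntry]]


def build_example_database() -> RecoveryDatabase:
    """Return a small in-memory database with made-up example values."""
    return {
        # One managed object can have several error types ...
        ("router-01", "LINK_DOWN"): [
            # ... and each error type can have several recovery methods.
            RecoveryEntry("reset_interface", priority=2, retry_limit=3),
            RecoveryEntry("reboot_device", priority=1, retry_limit=1),
        ],
        ("router-01", "HIGH_TEMPERATURE"): [
            RecoveryEntry("increase_fan_speed", priority=1, retry_limit=2),
        ],
    }
```

Keying on the pair of managed object information and error type mirrors the way the agent is described as searching the database with both values as keys.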

[0012] FIG. 3 shows an example of the recovery algorithm. The example shows the case where the agent 3 attempts recovery in accordance with the recovery algorithm.

[0013] FIG. 4 is an explanatory diagram showing the sequence of a failure notification from the managed object 4 to the agent 3 and of a database search.

[0014] The operation of each means of the present invention shown in FIG. 1(b) will now be described with reference to the drawings.

[0015] First, when a failure occurs in a managed object 4, the managed object 4 sends a failure occurrence notification to the agent 3 by means of the failure notification means 101. The failure occurrence notification contains the managed object information 11, which corresponds one-to-one with that managed object 4, and the error type 12, which corresponds to the type of failure.

[0016] Using the managed object information 11 and the error type 12 as keys, the agent 3 searches the database 5 by means of the database search means 102 (database search operation). The agent 3 obtains from the database 5 the recovery method 13 having the highest priority 14, together with its retry count limit 15 (database search result notification).
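The search step might look like the following sketch, which returns the highest-priority entry for a given key. The function name `search_database`, the in-memory dictionary, and the `already_retrieved` parameter (used to skip methods returned by earlier searches, so the same function can serve later searches for the next-highest priority) are hypothetical; a larger number is again assumed to mean a higher priority.

```python
from typing import NamedTuple, Optional


class RecoveryEntry(NamedTuple):
    method: str       # recovery method 13 (hypothetical identifier)
    priority: int     # priority 14 (assumption: larger value = higher priority)
    retry_limit: int  # retry count limit 15


def search_database(database, managed_object_info, error_type,
                    already_retrieved=frozenset()) -> Optional[RecoveryEntry]:
    """Return the highest-priority entry not yet retrieved, or None if none remains."""
    candidates = [
        entry
        for entry in database.get((managed_object_info, error_type), [])
        if entry.method not in already_retrieved
    ]
    # max() with default=None covers the case of an empty candidate list.
    return max(candidates, key=lambda entry: entry.priority, default=None)


# Made-up example content for database 5.
EXAMPLE_DATABASE = {
    ("router-01", "LINK_DOWN"): [
        RecoveryEntry("reboot_device", priority=1, retry_limit=1),
        RecoveryEntry("reset_interface", priority=2, retry_limit=3),
    ],
}
```

Calling `search_database(EXAMPLE_DATABASE, "router-01", "LINK_DOWN")` yields the `reset_interface` entry first; passing `{"reset_interface"}` as `already_retrieved` yields `reboot_device`.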

[0017] The agent 3 attempts failure recovery on the managed object 4 using that recovery method 13, by means of the recovery trial means 103 (failure recovery operation).

[0018] FIG. 5 is an explanatory diagram showing the sequence of a failure recovery attempt that succeeds. When the failure recovery attempt by the agent 3 on the managed object 4 succeeds, the managed object 4 notifies the agent 3 that the failure recovery succeeded, and the failure recovery algorithm ends.

[0019] FIG. 6 is an explanatory diagram showing the sequence of a failure recovery attempt that fails. When failure recovery fails, the managed object 4 notifies the agent 3 that the failure recovery failed. On receiving the failure notification, the agent 3 repeatedly tries the same recovery method 13 on the same managed object 4 until the failure recovery succeeds or the retry count limit 15 is exceeded.
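The repetition of a single recovery method can be sketched as a small loop. This is an illustration under one stated assumption: the first attempt is not counted as a retry, so a retry count limit of n allows up to n + 1 attempts in total. The text does not spell out how the count is kept, and the function and parameter names are hypothetical.

```python
from typing import Callable


def try_recovery_method(attempt: Callable[[], bool], retry_limit: int) -> bool:
    """Repeat one recovery method until it succeeds or the retry limit is exceeded.

    `attempt` stands in for one recovery attempt on the managed object and
    returns True on a success notification, False on a failure notification.
    """
    retries = 0
    while True:
        if attempt():
            return True           # success notification: stop retrying
        retries += 1              # the next attempt would be retry number `retries`
        if retries > retry_limit:
            return False          # retry count limit exceeded: give up on this method
```

With `retry_limit=2`, a method that always fails is attempted three times before the function returns `False`.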

[0020] When the number of failure recovery retries exceeds the retry count limit 15, the agent 3 searches its subordinate database 5 again. If, for the same managed object information 11 and error type 12, there exists a priority 14 that is the next highest after those already retrieved, the agent 3 obtains the recovery method 13 and retry count limit 15 corresponding to that priority 14. In that case, the failure recovery attempts are repeated in the same manner as above.

[0021] The search of the database 5 and the failure recovery attempts based on its results continue until no unretrieved recovery method 13 remains for the same managed object information 11 and error type 12.

[0022] When all failure recovery attempts have failed, the agent 3 notifies the manager 2 that a failure has occurred, and the failure recovery algorithm ends.
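Putting the steps together, the overall flow might be sketched as below. This is not the algorithm of FIG. 3 itself, only a simplified reading of the text: the entries are sorted once by priority instead of searching the database again after each exhausted method, which gives the same order as long as the database does not change in the meantime. All names, the callback signatures, the larger-is-higher priority convention, and the assumption that a retry limit of n permits n + 1 attempts are hypothetical.

```python
from typing import NamedTuple


class RecoveryEntry(NamedTuple):
    method: str       # recovery method 13 (hypothetical identifier)
    priority: int     # priority 14 (assumption: larger value = higher priority)
    retry_limit: int  # retry count limit 15


def recover(database, managed_object_info, error_type, attempt, notify_manager):
    """Try recovery methods in descending priority; escalate if all of them fail.

    `attempt(method)` returns True when the managed object reports success.
    `notify_manager(info, error_type)` stands in for the failure occurrence
    notification sent to the manager. Returns the method that succeeded, or None.
    """
    entries = sorted(
        database.get((managed_object_info, error_type), []),
        key=lambda entry: entry.priority,
        reverse=True,
    )
    for entry in entries:
        # First attempt plus up to `retry_limit` retries of the same method.
        for _ in range(entry.retry_limit + 1):
            if attempt(entry.method):
                return entry.method
    # No unretrieved recovery method remains: report the failure to the manager.
    notify_manager(managed_object_info, error_type)
    return None
```

For example, with a `reset_interface` entry (priority 2, retry limit 2) and a `reboot_device` entry (priority 1, retry limit 1), a fault that only a reboot fixes leads to three `reset_interface` attempts followed by one successful `reboot_device` attempt, and the manager is not notified.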

[0023] In other words, on receiving a failure recovery failure notification from the managed object 4, the agent 3 follows the algorithm shown in FIG. 3 and does one of the following: (1) attempts failure recovery on the managed object 4 again, (2) searches the database 5 again, or (3) sends a failure occurrence notification to the manager 2 and ends failure recovery.
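The three-way choice made on each failure notification can also be expressed as a small decision function. The sketch assumes the agent tracks how many retries of the current method it has already made (not counting the first attempt) and whether any unretrieved recovery method remains; the enum and parameter names are hypothetical and do not come from FIG. 3.

```python
from enum import Enum


class NextAction(Enum):
    RETRY_SAME_METHOD = 1  # (1) attempt failure recovery on the managed object again
    SEARCH_DATABASE = 2    # (2) search the database again for the next method
    NOTIFY_MANAGER = 3     # (3) notify the manager and end failure recovery


def decide_next_action(retries_so_far: int, retry_limit: int,
                       unretrieved_methods_remain: bool) -> NextAction:
    """Choose what the agent does after receiving a recovery failure notification."""
    if retries_so_far < retry_limit:
        # One more retry would still stay within the retry count limit.
        return NextAction.RETRY_SAME_METHOD
    if unretrieved_methods_remain:
        # The limit would be exceeded, but a lower-priority method is available.
        return NextAction.SEARCH_DATABASE
    # Every recovery method has been exhausted.
    return NextAction.NOTIFY_MANAGER
```

For instance, `decide_next_action(0, 2, True)` selects a retry, `decide_next_action(2, 2, True)` selects a new database search, and `decide_next_action(2, 2, False)` selects notification of the manager.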

[0024]

[Effects of the Invention] As described above, the automatic failure recovery system in a network management system according to the present invention uses a database in accordance with a fixed algorithm, so that a plurality of failure recovery methods can be tried automatically and repeatedly until the failure is actually recovered. This saves labor in failure recovery and leads to quick recovery.

[Brief description of drawings]

[FIG. 1] An explanatory diagram for explaining the configuration of the network management system and the automatic failure recovery system of the present invention.

[FIG. 2] An explanatory diagram showing the information held by the database that stores the failure recovery information.

[FIG. 3] A flowchart showing an example of the failure recovery algorithm.

[FIG. 4] An explanatory diagram showing the sequence of the database search operation and the search result notification.

[FIG. 5] An explanatory diagram showing the sequence of a failure recovery attempt that succeeds.

[FIG. 6] An explanatory diagram showing the sequence of a failure recovery attempt that fails.

[Explanation of symbols]

1 Network management system
2 Manager
3 Agent
4 Managed object
5 Database
11 Managed object information
12 Error type
13 Recovery method
14 Priority
15 Retry count limit
101 Failure notification means
102 Database search means
103 Recovery trial means

Claims

[Claims]

1. An automatic failure recovery system in a network management system having a plurality of agents, a manager that manages the agents, a plurality of managed objects managed by the agents, and a database, the automatic failure recovery system comprising: failure notification means by which a managed object notifies an agent that a failure has occurred; database search means for searching the database, based on the notification information from the failure notification means, for data on a plurality of recovery methods corresponding to the managed object, their priorities, and their retry count limits; and recovery trial means for attempting failure recovery, from the recovery method with the highest priority to the recovery method with the lowest priority, within the limit on the number of attempts.

2. The automatic failure recovery system in a network management system according to claim 1, wherein the notification information from the failure notification means includes managed object information and error type data corresponding to the type of failure.