JP2004260729A

JP2004260729A - Operation system, and recovery processing method for communication device

Info

Publication number: JP2004260729A
Application number: JP2003051482A
Authority: JP
Inventors: Yukio Tono; 幸夫東野; Kosei Ono; 孝生大野; Satoshi Oyamada; 聡小山田; Nobuhiro Tanigawa; 延広谷川
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2003-02-27
Filing date: 2003-02-27
Publication date: 2004-09-16

Abstract

PROBLEM TO BE SOLVED: To avoid executing the same recovery process repeatedly on the same device in response to failure messages that are caused by a single fault and transmitted from a communication device at various times.
SOLUTION: In an automatic process execution operation system 1 that executes an automatic recovery process upon receiving a failure message, a history of the processes executed on each communication device is recorded in an operation history DB 13. Even when failure messages caused by a single fault are transmitted at various times, referring to the operation history DB 13 makes it possible to avoid executing overlapping processes on the same communication device.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はオペレーションシステム、及び、通信装置に対する回復措置方法に関し、特に通信網を構成する通信装置からの故障メッセージに対して回復措置を実行するのに好適なオペレーションシステム、及び、通信装置に対する回復措置方法に関する。
【０００２】
【従来の技術】
通信ネットワークを構成する通信装置に故障が発生した場合には、交換、伝送、無線などの装置種別、方式毎の通信装置を管理する各オペレーションシステムに、通信装置から故障メッセージが通知される。
オペレーションシステムに送信された故障メッセージを分析することにより、分析結果に基づいた回復措置の実行が可能になる。しかし、１つの装置故障に起因して発生する多くの故障メッセージから、該故障を特定することは困難であり、従来においては、この分析作業を保守者により行い、通信装置を管理する各オペレーションシステムを通じて保守者により故障回復措置を行う必要があった。
【０００３】
一方、近年においては、故障メッセージの通知を契機に自動的に回復措置を実施する自動措置実行オペレーションシステムについても検討されている（非特許文献１参照）。
【０００４】
【非特許文献１】
ＣＤ−ＲＯＭ「電子情報通信学会２００２年ソサイエティ大会講演論文集」、社団法人電子情報通信学会、平成１４年８月２０日、講演番号：Ｂ−６−７６、「通信移動網オペレーションにおける故障措置自動化の検討」
【０００５】
【発明が解決しようとする課題】
しかしながら、上述のような自動回復措置方式を実装するオペレーションシステムの実現においては、次のような故障メッセージの発生現象に基づく問題がある。
すなわち、自動的に回復措置を実施したとき、該当装置の回復過程において、或いは該当装置が未回復のままであった場合には、該当装置が再度故障メッセージを通知する。例えば、故障発生時には「故障発生」の故障メッセージが、回復過程においては措置の実行により装置の一部がリセットされたことにより「未実装」を表す故障メッセージ、というように内容は異なるものの故障が回復するまでの間、通信装置から故障メッセージが送信されることとなる。
【０００６】
このため、自動回復措置方式を実装するオペレーションシステムにおいて、故障回復措置により発生する故障メッセージや措置後に通知される故障メッセージを、通常のメッセージと同様に分析し措置実行することは意味のない回復措置を延々と繰り返しシステムの負荷となる。更にこれによって、該当装置だけでなく他の故障通信装置を早期に回復させる妨げとなるという問題がある。
本発明の目的は、上記の問題点を解決し、重複する措置の実行を回避可能なオペレーションシステム、及び、通信装置に対する回復措置方法を提供することにある。
【０００７】
【課題を解決するための手段】
本発明の請求項１によるオペレーションシステムは、通信媒体を介して接続される装置から通知されるメッセージに基づいて、該装置に対する処理を実行することにより、該装置を管理するオペレーションシステムであって、
実行された処理を記録する処理記録手段と、
前記処理記録手段に記録されている、メッセージの通知に基づいて同一の装置に対して既に実行された処理を参照して、新たなメッセージの通知に基づく該装置に対する処理を決定する処理決定手段と、を有することを特徴とする。
【０００８】
このように構成することにより、各タイミングにおいて通知されるメッセージに基づく処理の対応をとることができ、装置の効率的な管理が可能になる。
ここで、オペレーションシステムとは、上記機能を備えた端末、装置その他の管理システムである。
本発明の請求項２によるオペレーションシステムは、請求項１において、前記メッセージは、通信網を構成する通信装置における故障を通知する故障メッセージであることを特徴とする。
【０００９】
請求項１の構成のオペレーションシステムは、特に、通信網を構成する通信装置から送信される故障メッセージに基づく処理を行なう場合に、効果的である。
すなわち、通信装置は通信網によりお互いに接続されることから１の装置の故障に起因して複数の装置から故障メッセージがそれぞれのタイミングで送信され、あるいは、故障装置の回復過程等においても異なる内容の故障メッセージが送信されるなど一の故障に起因した送信タイミング及び故障メッセージの内容は様々である。これらを全て１の故障に起因すると関連付けることができなくても、送信タイミングが異なっても、通知される故障メッセージだけではなく、通信装置への措置にも着目することにより、正確な通信装置への制御を行うことができる。
【００１０】
ここで、請求項１に記載の新たなメッセージの通知とは、時間的なずれをさすものであり、新たな故障に起因する故障メッセージを意味するものではない。
本発明の請求項３によるオペレーションシステムは、請求項２において、前記処理決定手段は、メッセージの通知に基づいて同一の装置に対して既に同一の処理が実行されている場合には、再度の処理を行わない決定をすることを特徴とする。
【００１１】
これにより、同一の装置に対する重複した処理の実行を回避して、例えば、故障メッセージに対する回復措置の実行である場合には、ソフトの初期化等の回復措置により回復過程におかれた通信装置を再び初期化等することによる回復の遅延、無意味な措置を繰り返すことによるオペレーションシステムの負荷、これに基づく他の故障発生に対する措置の遅れ等を回避することができる。
【００１２】
本発明の請求項４によるオペレーションシステムは、請求項２又は３において、前記故障メッセージに対する回復措置が段階的に定義された措置テーブルを更に有し、
前記処理決定手段は、前記措置テーブルを参照して、未回復の通信装置に対して既に実行された措置内容より上位の措置内容の回復措置の実行を決定することを特徴とする。
【００１３】
このように措置履歴を参照して、動的に措置を更新して実行することにより、より、効率的な回復措置を講じることができる。
本発明の請求項５によるオペレーションシステムは、請求項２〜４のいずれか１項において、前記故障メッセージの通知に基づいて、該故障メッセージ発生の要因となった故障を特定する故障特定手段を、更に含み、
前記処理決定手段は、前記故障特定手段により特定された同一の故障に起因する故障メッセージの通知に基づいて既に実行された処理を参照して、新たなメッセージの通知に基づく処理を決定することを特徴とする。
【００１４】
本発明の請求項６による通信装置に対する回復措置方法は、通信装置から通知される故障メッセージに基づいて、該通信装置に対する回復措置を実行するための回復措置方法であって、
同一の前記通信装置に対して既に実行された回復措置の記録に基づいて、新たな故障メッセージの通知に基づく該通信装置に対する回復措置を決定、実行することを特徴とする。
【００１５】
本発明の請求項７による通信装置に対する回復措置方法は、請求項６において、メッセージの通知に基づいて同一の装置に対して既に同一の処理が実行されている場合には、再度の処理を行わないことを特徴とする。
本発明の請求項８による通信装置に対する回復措置方法は、請求項６又は７において、未回復の通信装置に対して既に実行された措置内容より上位の措置内容の回復措置を実行することを特徴とする。
【００１６】
【発明の実施の形態】
次に、図面を参照して本発明の実施の形態について説明する。なお、以下の説明において参照する各図においては、他の図と同等の部分が同一符号によって示されている。
図１には、本実施の形態にかかる自動措置実行オペレーションシステム１が管理するネットワークの構成が示されている。同図のネットワークは、交換系、伝送系及び無線系の各種のネットワークエレメント９（以下、「ＮＥ」と称す）を含んで構成されている。本実施の形態にかかる自動措置実行オペレーションシステム１は、ＮＥ９から各種別のＮＥ９を管理するＮＥ−ＯＰＳ８（ＮＥＯｐｅｒｅｔｉｏｎＳｙｓｔｅｍ）を介して故障メッセージを受信し、当該故障メッセージを分析し、故障回復のための措置を自動実行する。自動措置実行オペレーションシステム１は、たとえば、汎用計算機により構成する。
【００１７】
図２には本実施の形態の自動措置実行オペレーションシステム１の構成が、図３には自動措置実行オペレーションシステムの動作の一例を説明するフローチャートが示されている。以下、両図を用いて自動措置実行オペレーションシステム１の構成及び動作について説明する。
ＮＥに故障が発生した場合に図２中のＮＥ‐ＯＰＳを経由して通知された故障メッセージは、警報分析部１１において受信される（Ｓ２０１）。
【００１８】
警報分析部１１は、故障メッセージ毎に設定された回復措置シナリオ１２の中から、受信した故障メッセージに対応する回復措置シナリオ１２を選択し、格納する。この回復措置シナリオは、故障メッセージの通知に対する回復措置実行手順が示されたものであり、主要因を推定する推定ルール部１５と、回復措置を実行する措置ルール部１６と、において実施されることにより回復措置が実行される。
【００１９】
推定ルール部１５は、定義された動作ルールに沿って、故障メッセージ発生の主要因推定を実施する。そして、主要因推定により推定された主要因に応じた措置対象装置および措置内容を確定する（Ｓ２０２）。
措置ルール部１６は、確定された措置内容を実行する。この際、本発明においては、運用履歴ＤＢ１３を参照して、確定された措置内容と同様の措置の履歴があるか否かを判断し（Ｓ２０３）、これにより措置を実行するか否かを決定する。
【００２０】
措置の履歴があれば（図３中の「履歴有」）、同一の措置が実行済みであることを認識し、確定された措置を実行することなく終了する。これにより、以前に実行された回復措置により、回復過程にある対象装置に対する重複した措置の実行を回避することができる。
措置の履歴がなければ（図３中の「履歴なし」）、対象装置の状態確認を行う（Ｓ２０４）。状態確認は、例えば、ＮＥ−ＯＰＳを介して対象装置に対し、その状態情報（故障、回復など）を取得するためのコマンドを投入することにより行う。
【００２１】
状態確認の結果、対象装置が「回復」している場合には、運用履歴ＤＢから措置履歴の削除処理を行なう（Ｓ２０５）。これにより、新たに故障が発生した場合に、措置の実行が可能になる。
状態確認の結果、対象装置が「未回復」である場合には、措置テーブル１４を参照して、上位の措置が存在するか確認する（Ｓ２０６）。
【００２２】
上位の措置が存在する場合には（図３中の「有」）、上位の措置を運用履歴ＤＢに記録し（Ｓ２０７）、当該上位の措置を実行する（Ｓ２０８）。これを、装置の回復が確認されるまで繰り返す。
図４には、運用履歴ＤＢ１３の構成例が示されている。同図では、措置がなされたＮＥを特定する「故障ＮＥ」、通知された故障メッセージを特定する「メッセージＮｏ．」、実行された措置を特定する「措置内容」、措置の実行状態を表す「措置フラグ」、「措置レベル」の各項目より構成されている。これにより、対象装置において実行された措置、措置のレベルを参照して、新たなメッセージの通知に基づく措置内容を決定が可能になる。
【００２３】
図５には、措置テーブルの構成例が示されている。同図においては、故障メッセージの「メッセージＮｏ．」に措置及び措置レベルが対応付けられている。「メッセージＮｏ．」は、本実施の形態においては、故障メッセージにユニークに付される番号であり、これによりどのＮＥのどのような機能に障害が生じたのかを特定可能とするものである。
【００２４】
図６はエスカレーション措置の物理的な段階と、図７はエスカレーションの論理的な段階と、を説明する図である。上述のように、措置にもかかわらず装置が未回復のままである場合には、より上位のレベルの措置を行う。図６に示されるように、物理的なエスカレーションにおいては、レベル１で、まずＴ−ＰＡ（送信用パワーアンプ）等の機能の一部を担うカードＬ１１をリセットし、レベル２では、複数のカードＬ１１によって構成される機能ブロックＬ１２をリセットし、それでも回復しない場合は、レベル３において通信装置Ｌ１３全体をリセットする措置を実行する。図７に示されるように、論理的エスカレーションにおいては、ソフトの初期化Ｌ２１、ハードの初期化Ｌ２２、ハードの初期化及びプログラムを外部からロードする措置Ｌ２３の順にレベルが上位の措置となる。
【００２５】
図８〜図１２には、上述の構成及び動作を行う自動措置実行オペレーションシステムにおける処理シーケンスの例が示されている。
図８には、運用履歴ＤＢに措置記録がある場合のシーケンスが示されている。参照結果が措置履歴有（Ｓ８２）を確認後、更に措置テーブルを参照して上位の措置が存在するかを確認する（Ｓ８３）。既に実行したレベルの措置より上位の措置がなければ措置を実行せず終息する（Ｓ８５）。
【００２６】
図９においては、措置テーブルを参照し実行したレベルの回復措置より上位の措置候補が存在している（Ｓ９１）。この場合には、措置ルールで確定した措置内容を更新し（Ｓ９２）、エスカレーションさせる。
図１０は、措置内容が更新された場合、或いは、運用履歴ＤＢに措置履歴がない場合（Ｓ１０２）、のシーケンスであり、措置対象装置の状態を確認し（Ｓ１０３）、回復状態であれば措置を実行せず終息する（Ｓ１０４）。
【００２７】
図１１のシーケンスでは、運用履歴ＤＢに措置履歴がなく（Ｓ１１１）、更に措置対象の装置が未回復状態（Ｓ１１３）であれば、措置ルールは運用履歴ＤＢに実行する措置内容を記録した後（Ｓ１１４）、ＮＥ‐ＯＰＳへＮＥに対する措置実行を要求する（Ｓ１１５）。措置の結果装置が回復しなければ運用記録ＤＢに装置状態を記録し終息する（Ｓ１１７）。
【００２８】
図１２のシーケンスでは、措置実行（Ｓ１２１）の結果、ＮＥが回復した場合（Ｓ１２２）に、回復メッセージを契機に運用履歴ＤＢから記録を削除し（Ｓ１２３）、終息する。
このように運用履歴の参照と、措置対象装置への状態確認を実施することで、回復措置による故障メッセージの発生タイミングにとらわれず、同一装置に対する同一措置を回避することが可能となる。すなわち、意味のない重複措置の回避によってシステムの負荷軽減が可能となるのみでなく、エスカレーション等の措置履歴参照に基づく措置内容の更新により効率的な回復措置の実行が可能となる。
【００２９】
【発明の効果】
以上説明詳細に説明したように、通信装置に対して実行された措置の記録を参照して、措置を決定することにより、複雑に構成される通信ネットワークにおける通信装置に故障が発生した場合でも、同一装置に対する同一措置を回避しシステムの処理性能を低下させることなく回復措置を自動実行することが可能となり、保守業務を効率化させることができる。
【００３０】
更に措置内容を最小単位からエスカレーションさせることで、サービスへの影響を最低限にし故障装置を早期に回復させるための適切な措置を実行することが可能となる。
【図面の簡単な説明】
【図１】本実施の形態にかかる自動措置実行オペレーションシステム１が管理するネットワークの構成が示す図である。
【図２】本実施の形態の自動措置実行オペレーションシステム１の構成を示す図である。
【図３】自動措置実行オペレーションシステムの動作の一例を説明するフローチャートである。
【図４】運用履歴ＤＢの構成例が示す図である。
【図５】措置テーブルの構成例を示す図である。
【図６】エスカレーション措置の物理的な段階を説明する図である。
【図７】エスカレーションの論理的な段階を説明する図である。
【図８】運用履歴ＤＢに記録がある場合の処理シーケンス図である。
【図９】措置内容を変更する場合の処理シーケンス図である。
【図１０】措置実行前に通信装置が回復している場合の処理シーケンス図である。
【図１１】措置実行後も回復しない場合の処理シーケンス図である。
【図１２】措置実行後に回復した場合の処理シーケンス図である。
【符号の説明】
１自動措置実行オペレーションシステム
１１警報分析部
１２回復措置シナリオ
１３運用履歴ＤＢ
１４措置テーブル
１５推定ルール部
１６措置ルール部
９ネットワークエレメント

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an operation system and a recovery measure method for a communication device, and more particularly to an operation system and a recovery measure method for a communication device that are suitable for executing recovery measures in response to failure messages from communication devices constituting a communication network.
[0002]
[Prior art]
When a failure occurs in a communication device constituting the communication network, a failure message is notified from the communication device to each operation system that manages the communication device for each device type and system such as switching, transmission, and wireless.
By analyzing the failure messages sent to the operation system, recovery measures based on the analysis result can be executed. However, it is difficult to identify the underlying failure from the many failure messages generated by a single device failure. Conventionally, this analysis had to be performed by maintenance personnel, who then carried out the failure recovery measures through each operation system that manages the communication devices.
[0003]
On the other hand, in recent years, an automatic action execution operation system that automatically executes a recovery action upon notification of a failure message has been studied (see Non-Patent Document 1).
[0004]
[Non-patent document 1]
CD-ROM "Proceedings of the 2002 Society Conference of the Institute of Electronics, Information and Communication Engineers", The Institute of Electronics, Information and Communication Engineers, August 20, 2002, Lecture Number: B-6-76, "A Study on Automating Failure Measures in Mobile Communication Network Operation"
[0005]
[Problems to be solved by the invention]
However, in realizing an operation system that implements the above-described automatic recovery measure method, there is a problem based on the following failure message occurrence phenomenon.
That is, after a recovery measure has been executed automatically, the device concerned sends failure messages again, either during its recovery process or because it remains unrecovered. For example, a "failure occurred" message is sent when the failure occurs, and during the recovery process a failure message indicating "not mounted" is sent because part of the device has been reset by the measure. Although their contents differ, the communication device keeps transmitting failure messages until the failure is recovered.
[0006]
For this reason, in an operation system that implements the automatic recovery measure method, analyzing failure messages that are generated by a recovery measure, or notified after the measure, in the same way as ordinary messages and executing measures for them means repeating meaningless recovery measures endlessly, which places a load on the system. This in turn hinders the early recovery not only of the device concerned but also of other failed communication devices.
An object of the present invention is to solve the above-mentioned problems and to provide an operation system capable of avoiding execution of duplicate measures and a recovery measure method for a communication device.
[0007]
[Means for Solving the Problems]
An operation system according to claim 1 of the present invention manages a device connected via a communication medium by executing processing on the device based on messages notified from the device, and is characterized by having:
processing recording means for recording the executed processing; and
processing determining means for determining the processing to be performed on the device based on notification of a new message, by referring to processing that is recorded in the processing recording means and has already been executed on the same device based on notification of a message.
[0008]
With this configuration, the processing performed in response to messages notified at different times can be coordinated, and the device can be managed efficiently.
Here, the operation system is a terminal, device, or other management system having the above functions.
The operation system according to claim 2 of the present invention is characterized in that, in claim 1, the message is a failure message for notifying a failure in a communication device constituting a communication network.
[0009]
The operation system having the configuration of claim 1 is particularly effective when performing processing based on a failure message transmitted from a communication device forming a communication network.
That is, because communication devices are connected to each other through the communication network, the failure of one device may cause several devices to send failure messages, each at its own timing, and the failed device may also send failure messages with different contents during its recovery process. The transmission timing and contents of the failure messages arising from a single failure therefore vary. Even when these messages cannot all be associated with the single failure, and even when their transmission timings differ, the communication device can be controlled accurately by attending not only to the notified failure messages but also to the measures already taken on the communication device.
[0010]
Here, the "notification of a new message" in claim 1 refers to a difference in timing, and does not mean a failure message caused by a new failure.
The operation system according to claim 3 of the present invention is characterized in that, in claim 2, the processing determining means decides not to perform the processing again when the same processing has already been executed on the same device based on notification of a message.
[0011]
This avoids duplicate processing on the same device. For example, when the processing is a recovery measure for a failure message, it avoids delaying recovery by re-initializing a communication device that is already in the recovery process following a recovery measure such as software initialization, loading the operation system by repeating meaningless measures, and consequently delaying measures for other failures.
[0012]
The operation system according to claim 4 of the present invention is characterized in that, in claim 2 or 3, it further has a measure table in which recovery measures for the failure message are defined in stages, and
the processing determining means refers to the measure table and decides to execute, on an unrecovered communication device, a recovery measure at a higher level than the measure already executed.
[0013]
By referring to the measure history and dynamically updating the measure to be executed in this way, more efficient recovery measures can be taken.
The operation system according to claim 5 of the present invention is characterized in that, in any one of claims 2 to 4, it further includes failure identification means for identifying, based on notification of the failure message, the failure that caused the failure message, and
the processing determining means determines the processing based on notification of a new message by referring to processing already executed based on notification of failure messages caused by the same failure identified by the failure identification means.
[0014]
A recovery measure method for a communication device according to claim 6 of the present invention is a method for executing recovery measures on a communication device based on failure messages notified from the communication device, characterized in that
a recovery measure for the communication device based on notification of a new failure message is determined and executed on the basis of a record of the recovery measures already executed on the same communication device.
[0015]
The recovery measure method for a communication device according to claim 7 of the present invention is characterized in that, in claim 6, when the same processing has already been executed on the same device based on notification of a message, the processing is not performed again.
The recovery measure method for a communication device according to claim 8 of the present invention is characterized in that, in claim 6 or 7, a recovery measure at a higher level than the measure already executed on an unrecovered communication device is executed.
[0016]
BEST MODE FOR CARRYING OUT THE INVENTION
Next, an embodiment of the present invention will be described with reference to the drawings. In each of the drawings referred to in the following description, parts equivalent to those in other drawings are indicated by the same reference numerals.
FIG. 1 shows the configuration of a network managed by the automatic action execution operation system 1 according to the present embodiment. The network includes various network elements 9 (hereinafter referred to as “NE”) of the switching, transmission, and wireless systems. The automatic action execution operation system 1 receives failure messages from the NEs 9 via NE-OPSs 8 (NE Operation Systems), each of which manages one type of NE 9, analyzes the failure messages, and automatically executes measures for failure recovery. The automatic action execution operation system 1 is implemented, for example, on a general-purpose computer.
[0017]
FIG. 2 shows the configuration of the automatic action execution operation system 1 of the present embodiment, and FIG. 3 is a flowchart explaining an example of its operation. The configuration and operation of the automatic action execution operation system 1 are described below with reference to both figures.
A failure message notified via the NE-OPS in FIG. 2 when a failure occurs in the NE is received by the alarm analysis unit 11 (S201).
[0018]
The alarm analysis unit 11 selects, from the recovery measure scenarios 12 set for each failure message, the recovery measure scenario 12 corresponding to the received failure message, and stores it. A recovery measure scenario describes the procedure for executing recovery measures in response to notification of a failure message. The recovery measure is carried out when the scenario is run by the estimation rule unit 15, which estimates the main cause, and the measure rule unit 16, which executes the recovery measure.
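As an illustration only, the scenario selection described above could be sketched in Python as follows. The names RecoveryScenario and select_scenario, the message dictionary layout, and the use of the message number as the lookup key are assumptions made for this sketch and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class RecoveryScenario:
    """Hypothetical pairing of the two rule units named in the text."""
    # Estimation rule (unit 15): failure message -> (target device, measure content).
    estimate: Callable[[dict], Tuple[str, str]]
    # Measure rule (unit 16): carries out the measure on the target device.
    execute: Callable[[str, str], None]


def select_scenario(scenarios: Dict[str, RecoveryScenario],
                    message: dict) -> Optional[RecoveryScenario]:
    """Return the scenario registered for the message number, or None.

    Keying scenarios by message number is an assumption of this sketch.
    """
    return scenarios.get(message["message_no"])
```

A caller would run `estimate` first to fix the target device and measure content, then hand both to `execute`.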
[0019]
The estimation rule unit 15 estimates the main cause of the failure message according to the defined operation rules. It then determines the target device and the measure content corresponding to the estimated main cause (S202).
The measure rule unit 16 executes the determined measure. In the present invention, it first refers to the operation history DB 13 to judge whether there is a history of the same measure as the determined one (S203), and decides on that basis whether to execute the measure.
[0020]
If a history of the measure exists ("history exists" in FIG. 3), the unit recognizes that the same measure has already been executed and ends without executing the determined measure. This avoids executing a duplicate measure on a target device that is in the recovery process as a result of the earlier recovery measure.
If there is no history of the measure ("no history" in FIG. 3), the status of the target device is checked (S204). The status check is performed, for example, by issuing to the target device, via the NE-OPS, a command that acquires its status information (failure, recovery, etc.).
[0021]
As a result of the status check, if the target device is “recovered”, a process of deleting the measure history from the operation history DB is performed (S205). This makes it possible to execute measures when a new failure occurs.
As a result of the status check, if the target device is “unrecovered”, it is checked whether or not a higher-order measure exists by referring to the measure table 14 (S206).
[0022]
If there is a higher-level measure ("Yes" in FIG. 3), the higher-level measure is recorded in the operation history DB (S207), and the higher-level measure is executed (S208). This is repeated until the recovery of the device is confirmed.
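The flow of steps S203 to S208 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the function name handle_failure, the representation of the history as a set of (NE, measure) pairs, the ordered list `ladder` of measures from lowest to highest level, and the callbacks `get_status` and `execute` are all hypothetical. The sketch also assumes that the first pass executes the determined measure itself and that later passes move one level up.

```python
from typing import Callable, List, Set, Tuple


def handle_failure(ne: str,
                   decided: str,
                   ladder: List[str],
                   history: Set[Tuple[str, str]],
                   get_status: Callable[[str], str],
                   execute: Callable[[str, str], None]) -> str:
    """Decide and run recovery measures for one failure message."""
    # S203: the same measure was already executed on this NE -> do nothing.
    if (ne, decided) in history:
        return "skipped"

    # Raises ValueError if the decided measure is not in the ladder.
    idx = ladder.index(decided)
    while True:
        # S204: query the current state of the target device.
        if get_status(ne) == "recovered":
            # S205: clear the history so a future failure can be handled.
            for key in [k for k in history if k[0] == ne]:
                history.discard(key)
            return "recovered"
        # S206: is there still a measure left at this or a higher level?
        if idx >= len(ladder):
            return "unrecovered"
        # S207: record the measure before running it.
        history.add((ne, ladder[idx]))
        # S208: run it, then loop back to the status check.
        execute(ne, ladder[idx])
        idx += 1
```

Recording the measure before executing it means that failure messages arriving while the measure is in progress already hit the history check and are skipped.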
FIG. 4 shows a configuration example of the operation history DB 13. In the figure, each record consists of the following items: "failure NE", which identifies the NE on which the measure was taken; "message No.", which identifies the notified failure message; "measure content", which identifies the executed measure; "measure flag", which indicates the execution state of the measure; and "measure level". This makes it possible to determine the measure to take on notification of a new message by referring to the measures executed on the target device and their levels.
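One way to picture a record of the operation history DB is the following Python sketch. The five fields follow the items listed for FIG. 4, but the class name HistoryRecord, the field types, the flag values, and the helper find_record are assumptions for illustration. The specification does not state how records are stored or searched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRecord:
    failure_ne: str        # "failure NE": NE on which the measure was taken
    message_no: str        # "message No.": identifies the notified failure message
    measure_content: str   # "measure content": the executed measure
    measure_flag: str      # "measure flag": execution state (values assumed here)
    measure_level: int     # "measure level": escalation level of the measure


def find_record(history: List[HistoryRecord],
                failure_ne: str,
                measure_content: str) -> Optional[HistoryRecord]:
    """Return the first record of the same measure on the same NE, if any."""
    for record in history:
        if (record.failure_ne == failure_ne
                and record.measure_content == measure_content):
            return record
    return None
```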
[0023]
FIG. 5 shows a configuration example of the measure table. In the figure, measures and measure levels are associated with the "message No." of each failure message. In the present embodiment, the "message No." is a number uniquely assigned to each failure message, which makes it possible to identify which function of which NE has failed.
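A measure table of this kind, and the lookup of a higher-level measure, might look like the sketch below. The table contents, the message number "M001", the measure names, and the function next_higher_measure are hypothetical. The only property taken from the text is that each message number maps to measures with associated levels.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: message No. -> list of (measure level, measure).
MEASURE_TABLE: Dict[str, List[Tuple[int, str]]] = {
    "M001": [(1, "measure A"), (2, "measure B"), (3, "measure C")],
}


def next_higher_measure(table: Dict[str, List[Tuple[int, str]]],
                        message_no: str,
                        executed_level: int) -> Optional[Tuple[int, str]]:
    """Return the lowest measure above executed_level, or None if none exists."""
    for level, measure in sorted(table.get(message_no, [])):
        if level > executed_level:
            return level, measure
    return None
```

Passing an executed level of 0 returns the lowest defined measure, which is a convenient convention for a device with no history.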
[0024]
FIG. 6 illustrates the physical stages of escalation, and FIG. 7 the logical stages. As described above, if the device remains unrecovered despite a measure, a higher-level measure is taken. As shown in FIG. 6, physical escalation first resets, at level 1, a card L11 that performs part of a function, such as a T-PA (transmission power amplifier); at level 2 it resets the functional block L12 made up of a plurality of cards L11; and if the device still does not recover, at level 3 it resets the entire communication device L13. As shown in FIG. 7, in logical escalation the level rises in the order of software initialization L21, hardware initialization L22, and hardware initialization with the program loaded from outside L23.
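The two escalation axes can be written down as ordered enumerations, as in the following sketch. The level numbers 1 to 3 for the physical axis come from the description of FIG. 6. Numbering the logical stages 1 to 3 as well, and the names PhysicalLevel, LogicalLevel, and escalate, are assumptions of this sketch.

```python
from enum import IntEnum
from typing import Optional


class PhysicalLevel(IntEnum):
    CARD = 1             # reset a single card (L11)
    FUNCTION_BLOCK = 2   # reset the functional block made of several cards (L12)
    DEVICE = 3           # reset the entire communication device (L13)


class LogicalLevel(IntEnum):
    SOFTWARE_INIT = 1                     # software initialization (L21)
    HARDWARE_INIT = 2                     # hardware initialization (L22)
    HARDWARE_INIT_WITH_PROGRAM_LOAD = 3   # hardware init plus external program load (L23)


def escalate(level: IntEnum) -> Optional[IntEnum]:
    """Return the next level on the same axis, or None at the top level."""
    try:
        return type(level)(level + 1)
    except ValueError:
        return None
```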
[0025]
FIGS. 8 to 12 show examples of processing sequences in the automatic action execution operation system having the configuration and operation described above.
FIG. 8 shows the sequence when the operation history DB contains a measure record. After the lookup confirms that a measure history exists (S82), the measure table is consulted to check whether a higher-level measure exists (S83). If there is no measure above the level already executed, the process ends without executing a measure (S85).
[0026]
In FIG. 9, consulting the measure table shows that a candidate measure exists above the level of the recovery measure already executed (S91). In this case, the measure content determined by the measure rule is updated (S92) and thereby escalated.
FIG. 10 shows the sequence when the measure content has been updated or when there is no measure history in the operation history DB (S102). The state of the target device is checked (S103), and if it has recovered, the process ends without executing the measure (S104).
[0027]
In the sequence of FIG. 11, if there is no measure history in the operation history DB (S111) and the target device is in an unrecovered state (S113), the measure rule records the measure to be executed in the operation history DB (S114) and then requests the NE-OPS to execute the measure on the NE (S115). If the device does not recover as a result of the measure, the device state is recorded in the operation history DB and the process ends (S117).
[0028]
In the sequence of FIG. 12, when the NE recovers (S122) as a result of the measure execution (S121), the record is deleted from the operation history DB, triggered by the recovery message (S123), and the process ends.
In this way, by referring to the operation history and checking the status of the target device, the same measure on the same device can be avoided regardless of when the failure messages caused by the recovery measure arrive. That is, avoiding meaningless duplicate measures reduces the system load, and updating the measure content based on the measure history, for example by escalation, allows recovery measures to be executed efficiently.
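The branching across the sequences of FIGS. 8 to 11 can be condensed into a single decision function, sketched here in Python. The function name decide, its return values, and the convention that levels are consecutive integers starting at 1 are assumptions. The sketch covers only the decision made before a measure is executed, not the post-execution bookkeeping of FIGS. 11 and 12.

```python
from typing import Optional, Tuple


def decide(history_level: Optional[int],
           top_level: int,
           status: str) -> Tuple[str, Optional[int]]:
    """Decide what to do for a new failure message.

    history_level: level of the measure already recorded for this NE, or None.
    top_level:     highest level defined in the measure table.
    status:        "recovered" or "unrecovered", from the status check.
    """
    if history_level is not None:
        # FIG. 8: history exists and no higher-level measure is defined.
        if history_level >= top_level:
            return ("terminate", None)
        # FIG. 9: update the measure content to the next level (escalation).
        target = history_level + 1
    else:
        # No history: start from the lowest level (assumed to be 1).
        target = 1
    # FIG. 10: the device has already recovered, so no measure is needed.
    if status == "recovered":
        return ("terminate", None)
    # FIG. 11: record the measure and request its execution.
    return ("execute", target)
```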
[0029]
[Effects of the Invention]
As described in detail above, by determining each measure with reference to the record of the measures executed on the communication device, recovery measures can be executed automatically even when a failure occurs in a communication device of a complex communication network, while avoiding the same measure on the same device and without degrading the processing performance of the system. Maintenance work can thereby be made more efficient.
[0030]
Further, by escalating the contents of the measure from the minimum unit, it is possible to execute an appropriate measure for minimizing the influence on the service and recovering the failed device early.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a network managed by an automatic action execution operation system 1 according to an embodiment.
FIG. 2 is a diagram showing a configuration of an automatic action execution operation system 1 of the present embodiment.
FIG. 3 is a flowchart illustrating an example of an operation of the automatic action execution operation system.
FIG. 4 is a diagram illustrating a configuration example of an operation history DB.
FIG. 5 is a diagram showing a configuration example of a measure table.
FIG. 6 is a diagram illustrating a physical stage of an escalation measure.
FIG. 7 is a diagram illustrating a logical stage of escalation.
FIG. 8 is a processing sequence diagram when there is a record in the operation history DB.
FIG. 9 is a processing sequence diagram when the content of the measure is changed.
FIG. 10 is a processing sequence diagram in a case where the communication device has recovered before executing the measure.
FIG. 11 is a processing sequence diagram in a case where recovery is not performed even after execution of a measure.
FIG. 12 is a processing sequence diagram in a case where recovery is performed after execution of a measure.
[Explanation of symbols]
1 Automatic action execution operation system
11 Alarm analysis unit
12 Recovery measure scenario
13 Operation history DB
14 Measure table
15 Estimation rule unit
16 Measure rule unit
9 Network element

Claims

1. An operation system that manages a device connected via a communication medium by executing processing on the device based on messages notified from the device, the operation system comprising:
processing recording means for recording the executed processing; and
processing determining means for determining the processing to be performed on the device based on notification of a new message, by referring to processing that is recorded in the processing recording means and has already been executed on the same device based on notification of a message.

2. The operation system according to claim 1, wherein the message is a failure message for notifying a failure in a communication device forming a communication network.

3. The operation system according to claim 2, wherein the processing determining means decides not to perform the processing again when the same processing has already been executed on the same device based on notification of a message.

4. The operation system according to claim 2 or 3, further comprising a measure table in which recovery measures for the failure message are defined in stages,
wherein the processing determining means refers to the measure table and decides to execute, on an unrecovered communication device, a recovery measure at a higher level than the measure already executed.

5. The operation system according to any one of claims 2 to 4, further comprising failure identification means for identifying, based on notification of the failure message, the failure that caused the failure message,
wherein the processing determining means determines the processing based on notification of a new message by referring to processing already executed based on notification of failure messages caused by the same failure identified by the failure identification means.

6. A recovery measure method for executing recovery measures on a communication device based on failure messages notified from the communication device, the method comprising:
determining and executing a recovery measure for the communication device based on notification of a new failure message, on the basis of a record of the recovery measures already executed on the same communication device.

7. The recovery method for a communication device according to claim 6, wherein if the same process has already been performed on the same device based on the notification of the message, the process is not performed again.

8. The recovery method for a communication device according to claim 6 or 7, wherein a recovery measure at a higher level than the measure already executed on the unrecovered communication device is executed.