JP3342253B2

JP3342253B2 - Failure recovery method for distributed node system

Info

Publication number: JP3342253B2
Application number: JP22973595A
Authority: JP
Inventors: 俊郎中村; 英一岡; 理前側; 泰雄岡本
Original assignee: NEC Corp; Nippon Telegraph and Telephone Corp
Current assignee: NEC Corp; Nippon Telegraph and Telephone Corp
Priority date: 1995-09-07
Filing date: 1995-09-07
Publication date: 2002-11-05
Anticipated expiration: 2015-09-07
Also published as: JPH0983611A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a restart control scheme used when a failure occurs in a distributed node system. In particular, it relates to a failure recovery method for a distributed node system which, in a service control station forming part of a large-scale intelligent network, realizes restart processing that keeps the service interruption time at failure short, through module state control in a centralized control module that exploits the redundant configuration of the distributed modules and a backup mechanism.

[0002]

[Prior art] A switching system is used for the service control station of a conventional intelligent network. When a failure occurs, the service control station attempts early recovery from the failure by widening the scope of initialization step by step, according to the restart logic defined below.

[0003] PH1: When a failure occurs, initialize the resources related to all calls other than those in a stable state (call in progress), then restart.
PH2: Initialize the resources related to all calls, then restart.

[0004] PH3: Reload the system program, perform full initialization, then restart.
Switching systems generally adopt a duplex configuration, and each restart logic above is executed on system 0 and on system 1. The restart logic is therefore deepened in the order PH1 restart twice → PH2 restart twice → PH3 restart twice, and if the restart cannot be completed within a prescribed time (20 minutes), the failure is treated as a critical fault. While these restart logics are running on the failed module, communication services cannot be accepted, so the service is suspended.
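The escalation described in [0003] and [0004] can be sketched as follows. This is a minimal illustration, not the switching system's actual logic: the `attempt` callback, the function name `run_restart`, and the choice to try system 0 before system 1 within each phase are assumptions made for the example.

```python
# Escalation order from the prior-art restart logic: each phase is tried once
# per duplex side. Trying system 0 before system 1 is an assumption.
ESCALATION = [(phase, side) for phase in ("PH1", "PH2", "PH3") for side in (0, 1)]
TIME_LIMIT_S = 20 * 60  # prescribed limit of 20 minutes


def run_restart(attempt, limit_s=TIME_LIMIT_S):
    """Walk the escalation ladder until one restart succeeds.

    `attempt(phase, side)` is a hypothetical callback that returns
    (succeeded, elapsed_seconds) for one restart attempt.
    Returns the (phase, side) that completed the restart, or None when the
    failure has to be treated as a critical fault.
    """
    elapsed = 0.0
    for phase, side in ESCALATION:
        succeeded, took_s = attempt(phase, side)
        elapsed += took_s
        if elapsed > limit_s:
            return None  # time limit exceeded: critical fault
        if succeeded:
            return (phase, side)
    return None  # no attempt succeeded (assumed to end as a critical fault)
```

The service stays down for the whole walk up the ladder, which is the interruption the invention aims to shorten.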

[0005]

[Problem to be solved by the invention] As described in the "Prior art" section, in a service control station of an intelligent network that uses a conventional switching system, once a failure occurs the service provided by the module concerned stops until the restart is complete. This degraded the service quality of the network.

[0006] An object of the present invention is to make the service interruption time as short as possible even when a module failure occurs, through cooperation among the centralized control module, the failed module, and the other modules in a distributed node system that adopts a distributed module configuration, and to allow the service interruption time to be customized as needed.

[0007]

[Means for solving the problem] To solve the above problem, the present invention applies to a communication control node configured as a distributed node system that has a group of distributed modules for load distribution or function distribution, a centralized control module that supervises them, and a maintenance terminal. When the centralized control module detects a failure that has occurred in a distributed module, communication service access to the failed module is stopped immediately in cooperation with the other modules, and when the failed module completes its restart, communication service access is restored immediately.

[0008] Further, when a database system is installed on the function distribution modules of the distributed node system to manage the update data of a communication service, backup relationships are defined between modules in advance. When a failure occurs in a function distribution module, the centralized control module monitors the time taken for failure recovery, and if the failure is not recovered within a prescribed time, processing is handed over to the backup side.

[0009] The above method makes the following possible in the restart processing of the communication control node.
(1) For a failure of a load distribution module, the centralized control module and the other modules cooperate so that, using the load-distributed configuration, the other modules take on the traffic of the failed module. A failure recovery procedure without service interruption can thus be realized.

[0010] (2) For a failure of a function distribution module, the service to be performed by that module can continue with, at minimum, only the interruption needed to hand over service processing to the module that has a mutual backup relationship with the failed module.

[0011] In both (1) and (2), however, the processing load increases compared with before the module failure. In the function distribution module of (2) in particular, the processing load doubles compared with before the backup takes over. The value of the failure recovery monitoring timer is therefore determined by the trade-off between the service interruption time and the increase in processing load.

[0012]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the configuration of a service control node that uses a distributed module configuration. In the figure, 101 is a transmission node and 300 is a service control node adopting a distributed module configuration. 201 is a group of load distribution modules that handle communication control with the transmission node 101 (hereinafter, a module of this type is called load distribution module A); 202 is a group of load distribution modules that hold the service control logic (hereinafter, load distribution module B); 203 is a group of function distribution modules that hold, in a real-time database, the update data updated by the service control logic (hereinafter, function distribution module C); and 204 is a centralized control module that supervises the load distribution module groups 201 and 202 and the function distribution module group 203. The service control node 300 is made up of these distributed modules. 400 is a maintenance terminal.

[0013] The transmission node 101 controls the speech path corresponding to a call between terminals. When the transmission node 101 detects an intelligent network service call, it notifies the service control node 300. The service control node 300 controls that call through the cooperation of the distributed module groups 201 to 204.

[0014] FIG. 2 shows how the functions for failure recovery are deployed on the distributed modules in the service control node 300. As shown in FIG. 2, load distribution modules A and B and function distribution module C each have a failure detection/restart control mechanism 500, a failure/restart progress notification mechanism 501, and an other-module state management mechanism 502. The centralized control module 204 has a failure detection/restart control mechanism 500, an all-module state monitoring/control mechanism 503, and a failure recovery monitoring mechanism 504. The maintenance terminal 400 has a failure recovery monitoring timer setting function.

[0015] 1) Failure detection/restart control mechanism 500
The restart logic described in the "Prior art" section is applied as is to load distribution module B, which performs call processing by the service control logic in conjunction with call processing on the transmission node 101. In load distribution module A, which performs communication control with the transmission node 101, and in the centralized control module 204, on the other hand, there are no managed resources tied to call processing in the transmission system, so applying PH1 is meaningless. For these modules the restart logic therefore uses PH2 and PH3.

[0016] Likewise, function distribution module C, which implements data access processing from the service control logic, has processing to do only at database access time and holds no managed resources tied to call processing on the transmission node, so its restart logic uses PH2 and PH3.
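The assignment of restart phases to module types in [0015] and [0016] amounts to a small lookup table. The sketch below only restates that assignment; the dictionary keys and the helper `first_phase` are hypothetical names chosen for the example.

```python
# Restart phases applicable to each module type, per [0015] and [0016].
# Only load distribution module B holds call-related resources, so only it
# uses PH1. The key names are illustrative, not taken from the patent.
APPLICABLE_PHASES = {
    "load_distribution_A": ("PH2", "PH3"),
    "load_distribution_B": ("PH1", "PH2", "PH3"),
    "function_distribution_C": ("PH2", "PH3"),
    "centralized_control": ("PH2", "PH3"),
}


def first_phase(module_type):
    """Return the shallowest restart phase that applies to a module type."""
    return APPLICABLE_PHASES[module_type][0]
```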

[0017] 2) Failure/restart progress notification mechanism 501
Each distributed module has a function that detects a failure and autonomously executes the restart logic, and a function that notifies the centralized control module 204 of the failure occurrence and of progress information while restart processing is running. This notification must be possible in every case, whatever the state of the failed module, so this interface is implemented in hardware logic rather than software logic.

[0018] 3) All-module state monitoring/control mechanism 503
The centralized control module 204 receives the failure/restart progress notifications from each module and manages the state of all modules in the service control node. On receiving a failure notification from any module, the centralized control module 204 immediately sets the managed state of that module to failed and notifies all other modules that the state of that module has changed to failed.

[0019] The centralized control module 204 then watches for a recovery notification from the failed module. On receiving it, the centralized control module 204 sets the state of that module to normal and notifies all other modules that the state of that module has changed to normal.

[0020] 4) Other-module state management mechanism 502
Every distributed module manages the state of the modules other than itself. A module that receives the state of the failed module from the centralized control module 204, via the all-module state monitoring/control mechanism 503, sets its managed state for the failed module to failed. Whenever service processing on a module needs to access another module, this other-module state is consulted beforehand. If the state is not normal, the access to that module is cancelled and an alternative module is selected and accessed instead.

[0021] The alternative module is selected as follows. When the failed module is a load distribution module A or B, any other module of the same type is selected. When the failed module is a function distribution module C, the module that has a mutual backup relationship with it is selected after a fixed waiting time, based on the failure recovery monitoring mechanism 504 of the centralized control module 204.
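The selection rule in [0020] and [0021] can be sketched as a single lookup function. This is an illustration under assumptions: the function name, the dictionary-based state tables, the module identifiers, and the `takeover_done` set (standing in for "the waiting time has elapsed") are all hypothetical, and returning `None` to mean "no module can be accessed yet" is a choice made for the example.

```python
import random


def select_target(target, states, kinds, backup_of, takeover_done, rng=random):
    """Pick the module to access in place of `target`.

    states: module id -> "normal" or "failed" (other-module state table)
    kinds: module id -> "A", "B" or "C"
    backup_of: mutual backup partner for each type C module
    takeover_done: failed type C modules whose waiting time has elapsed
    Returns a module id, or None if no access is possible yet.
    """
    if states[target] == "normal":
        return target
    if kinds[target] in ("A", "B"):
        # Load distribution: any healthy module of the same type will do.
        peers = [m for m, k in kinds.items()
                 if k == kinds[target] and m != target and states[m] == "normal"]
        return rng.choice(peers) if peers else None
    # Function distribution: only the mutual backup, and only after the wait.
    backup = backup_of[target]
    if target in takeover_done and states[backup] == "normal":
        return backup
    return None
```

Random choice among peers is one reading of "any other module of the same type"; a round-robin or load-aware choice would satisfy the text equally well.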

[0022] 5) Failure recovery monitoring mechanism 504
When the centralized control module 204 detects a failure of a function distribution module C, it sets the managed state of that module to failed as described in 3), and performs time supervision with the failure recovery monitoring timer, using the timer value specified in advance through, for example, the failure recovery monitoring timer setting function of the maintenance terminal 400. When the timer expires, service processing is handed over to the module that has a mutual backup relationship with the failed module.

[0023] 6) Failure recovery monitoring timer setting function
The maintenance terminal 400 is provided with a function for setting the value of the failure recovery monitoring timer held on the centralized control module 204.

[0024] FIG. 3 shows an example of the procedure at failure occurrence and recovery, carried out through functional cooperation among the modules. The failed module 600 in FIG. 3 is whichever of load distribution modules A and B or function distribution module C has failed, and the other modules 700 are the remaining distributed modules, excluding the failed module 600 and the centralized control module 204.

[0025] The procedure at failure occurrence follows steps (1) to (5) shown in FIG. 3.
(1) First, in the failed module 600, the failure detection/restart control mechanism 500 detects the failure.

[0026] (2) The failure/restart progress notification mechanism 501 notifies the centralized control module 204 of the failure.
(3) The all-module state monitoring/control mechanism 503 of the centralized control module 204 sets the managed state of the failed module 600 to failed.

[0027] (4) The all-module state monitoring/control mechanism 503 then notifies all other modules 700 that the failed module 600 has failed.
(5) Each other module 700 that receives the notification sets its managed state for the failed module 600 to failed. From then on, access to the failed module 600 is prohibited for as long as it remains in the failed state.

[0028] The procedure at recovery from a failure follows steps (a) to (e) shown in FIG. 3.
(a) In the failed module 600, the failure detection/restart control mechanism 500 executes the prescribed restart logic.

[0029] (b) When the restart by the failure detection/restart control mechanism 500 is complete, the failure/restart progress notification mechanism 501 notifies the failure recovery monitoring mechanism 504 of the centralized control module 204 that the restart is complete.

[0030] (c) The failure recovery monitoring mechanism 504 notifies the all-module state monitoring/control mechanism 503 of the recovery of the failed module 600.
(d) The all-module state monitoring/control mechanism 503 sets the state of the failed module 600 to normal and notifies the other modules 700 of the recovery of the failed module 600.

[0031] (e) The other-module state management mechanism 502 in each other module 700 sets the state of the failed module 600 to normal. From then on, the recovered module 600 can be accessed again.
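The two FIG. 3 procedures, steps (1) to (5) at failure and (a) to (e) at recovery, reduce to a state table on the centralized control module that is mirrored on every other module. The sketch below models that propagation in software only; the class and method names are hypothetical, and the hardware notification path of mechanism 501 is represented by a plain method call.

```python
class DistributedModule:
    """A distributed module with an other-module state table (mechanism 502)."""

    def __init__(self, name):
        self.name = name
        self.peer_state = {}

    def on_state_change(self, module, state):
        self.peer_state[module] = state  # steps (5) and (e)

    def may_access(self, module):
        # Modules never reported as failed are assumed to be normal.
        return self.peer_state.get(module, "normal") == "normal"


class CentralControl:
    """Centralized control module with an all-module state table (mechanism 503)."""

    def __init__(self, modules):
        self.modules = {m.name: m for m in modules}
        self.state = {m.name: "normal" for m in modules}

    def _set_and_broadcast(self, module, state):
        self.state[module] = state                   # steps (3) and (d)
        for name, peer in self.modules.items():
            if name != module:                       # every module except the failed one
                peer.on_state_change(module, state)  # steps (4) and (d)

    def on_failure_notice(self, module):             # step (2)
        self._set_and_broadcast(module, "failed")

    def on_restart_complete(self, module):           # steps (b) and (c)
        self._set_and_broadcast(module, "normal")
```

A caller would check `may_access` before each inter-module request, which is how access to the failed module is blocked between step (5) and step (e).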

[0032] FIG. 4 shows the failure recovery procedure for the example case in which a failure occurs in a function distribution module. When a failure occurs in a function distribution module, a failure notification is sent to the centralized control module immediately.

[0033] The centralized control module sets the state of the failed module to failed, starts the failure recovery monitoring timer, and begins failure recovery monitoring. It also notifies the other modules of the state of the failed module.

[0034] The other modules prohibit access processing to the failed module. The failed module attempts autonomous restart processing. If the restart completes before the monitoring timer expires, the centralized control module detects the restart notification and instructs the other modules to return the state of that module to normal. Case 1 in FIG. 4 shows the case in which the restart was completed by a PH2 restart.

[0035] If, on the other hand, the restart has not completed by the time the monitoring timer expires, processing is handed over to the backup module that has a mutual backup relationship with the failed module. Case 2 in FIG. 4 shows the case in which the restart was completed by a PH3 restart, but the monitoring timer had already expired.
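Case 1 and Case 2 differ only in whether the restart completion notice arrives before the monitoring timer expires. The sketch below shows one way to model that race with explicit timestamps instead of a real timer. The class `RecoveryMonitor`, its method names, and the module identifiers are hypothetical, and treating a restart that completes exactly at expiry as Case 2 is an assumption. What happens to the failed module after a Case 2 takeover is not described in the text, so the sketch leaves the takeover in place.

```python
class RecoveryMonitor:
    """Failure recovery monitoring for function distribution modules."""

    def __init__(self, timer_s, backup_of):
        self.timer_s = timer_s      # value set from the maintenance terminal
        self.backup_of = backup_of  # mutual backup pairs (illustrative ids)
        self.deadline = {}          # failed module -> timer expiry time
        self.taken_over = {}        # failed module -> backup now serving it

    def on_failure(self, module, now_s):
        """Start the monitoring timer for a newly failed module."""
        self.deadline[module] = now_s + self.timer_s

    def expire(self, now_s):
        """Hand over every module whose timer has run out (Case 2)."""
        for module, deadline in list(self.deadline.items()):
            if now_s >= deadline:
                del self.deadline[module]
                self.taken_over[module] = self.backup_of[module]

    def on_restart_complete(self, module, now_s):
        """Return True if the restart beat the timer (Case 1)."""
        self.expire(now_s)
        return self.deadline.pop(module, None) is not None
```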

[0036] As the figure shows, in Case 2 the service to be provided by the failed module is interrupted from the occurrence of the failure until the backup processing is carried out. Shortening the failure recovery monitoring period, that is, the monitoring timer value, can reduce the service interruption time to as little as the time required for backup startup processing.

[0037] If only the restart logic described in the "Prior art" section were applied to a function distribution module, the applicable phases would be PH2 and PH3. Taking the system initialization time as 30 seconds and the system program load time as 100 seconds, PH2 requires 30 seconds and PH3 requires 130 seconds.

[0038] As noted earlier, a function distribution module carries a database system, and the database system must be rebuilt when a failure occurs. Taking the time for this rebuild (which equals the backup processing time) as 30 seconds, PH2 effectively requires 60 seconds and PH3 requires 160 seconds.

[0039] Therefore, if for example the failure recovery monitoring timer is set to a value in the range of 0 to 30 seconds, service can start earlier than with a PH2 restart, and the service interruption time is shortened.
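The arithmetic in [0037] to [0039] can be checked with a short calculation. The sketch below uses the figures given in the text. The function `interruption_s` and its simplifications are assumptions: the backup takeover is taken to start exactly at timer expiry and to last the backup processing time, and a restart counts as beating the timer only if it finishes strictly before expiry.

```python
INIT_S = 30        # system initialization time
LOAD_S = 100       # system program load time
DB_REBUILD_S = 30  # database rebuild time, equal to the backup processing time

# Effective restart times for a function distribution module.
RESTART_S = {
    "PH2": INIT_S + DB_REBUILD_S,           # 60 seconds
    "PH3": LOAD_S + INIT_S + DB_REBUILD_S,  # 160 seconds
}


def interruption_s(restart_s, timer_s, backup_s=DB_REBUILD_S):
    """Service interruption measured from the moment of failure."""
    if restart_s < timer_s:
        return restart_s          # restart finished before the timer expired
    return timer_s + backup_s     # timer expired, backup took over
```

With a 10-second timer, for example, the interruption is 40 seconds whether the module would have needed PH2 or PH3, compared with 60 or 160 seconds without the backup takeover. At the upper end of the stated range, a 30-second timer gives 60 seconds, the same as a successful PH2 restart.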

[0040] In general, even a conventional switching system does not always complete its restart with a single PH2 restart. The advantage of this scheme, which starts the backup when the failure recovery timer expires, is that the service interruption time can be fixed.

[0041] By allowing this timer value to be set from the maintenance terminal 400, maintenance personnel can customize the service interruption time. This function is useful, for example, when the permissible service interruption time is made a contract condition for users of the function distribution module concerned.

[0042]

[Effects of the invention] With the restart processing scheme of the present invention, a communication control node that adopts a distributed module configuration quickly detects the failed module and controls the state of that module, so the service interruption time can be made shorter than before. In addition, maintenance personnel can customize the service interruption time at failure by setting it as the failure recovery monitoring timer value.

[Brief description of the drawings]

FIG. 1 is a diagram showing the connection configuration of a service control node to which the present invention is applied.

FIG. 2 is a diagram showing how the functions for failure recovery are deployed on each distributed module constituting the service control node.

FIG. 3 is a diagram showing the exchanges at failure occurrence and recovery among the functional elements deployed on each distributed module.

FIG. 4 is a diagram showing a specific failure recovery procedure when a failure occurs in a function distribution module.

[Explanation of symbols]

101 Transmission node
201 Load distribution module group
202 Load distribution module group
203 Function distribution module group
204 Centralized control module
300 Service control node
400 Maintenance terminal
500 Failure detection/restart control mechanism
501 Failure/restart progress notification mechanism
502 Other-module state management mechanism
503 All-module state monitoring/control mechanism
504 Failure recovery monitoring mechanism
600 Failed module
700 Other module

Continuation of front page

(72) Inventor: 前側理, c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo
(72) Inventor: 岡本泰雄 (Yasuo Okamoto), c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo
(56) References: JP-A-7-168778 (JP, A); JP-A-6-209367 (JP, A); JP-A-6-37783 (JP, A); JP-A-63-214842 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): H04L 29/14; H04L 12/24; H04L 12/26; G06F 13/00 353

Claims

(57) [Claims]

1. A failure recovery method for a distributed node system, in a communication control node configured as a distributed node system having a group of distributed modules for load distribution and function distribution, a centralized control module that supervises and manages these modules, and a maintenance terminal, wherein: when the centralized control module detects a failure that has occurred in a distributed module, communication service access to the failed module is stopped immediately in cooperation with the other modules, and communication service access is restored immediately when restart processing has been completed by the failed module; and, when a database system is installed on a function distribution module of the distributed node system to manage update data for a communication service, the centralized control module monitors the time of failure recovery when a failure occurs in the function distribution module and, if the failure is not recovered within a time set from the maintenance terminal, hands processing over to the backup side based on a backup relationship between modules specified in advance.