JP3232393B2

JP3232393B2 - Module operating state control method for distributed processing system

Info

Publication number: JP3232393B2
Application number: JP31633095A
Authority: JP
Inventors: 英一岡; 俊郎中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-12-05
Filing date: 1995-12-05
Publication date: 2001-11-26
Anticipated expiration: 2015-12-05
Also published as: JPH09162976A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a technique for monitoring failures of the modules (distributed modules) that make up a distributed processing system. In particular, it relates to a module operating state control method for a distributed processing system that is suited to monitoring failures without degrading the performance of a large-scale intelligent network in which each module is redundantly configured with backup.

[0002]

[Prior Art] Conventional techniques for monitoring the state of each module in a distributed processing system include a centralized management scheme, which uses a centralized control module that supervises all modules, and an autonomous distributed management scheme, in which the modules monitor one another's state. In the centralized management scheme, the centralized control module may run a health check to determine whether each subordinate module can operate, or each module may notify the centralized control module when it restarts, so that the centralized control module detects that a failure has occurred in a subordinate module. The centralized control module distributes the state changes of the constituent modules, collected by either or both of these methods, to the other modules that have not failed. Each module then selects where to send processing requests according to the failed-module information received from the centralized control module.

[0003] In this centralized management scheme, a failure of the centralized control module makes state control of the entire system impossible. This makes it difficult to raise the reliability of the whole system beyond a certain level. In addition, there is a delay before the centralized control module detects a failure that has occurred within the node. Calls that arrive between the occurrence of the failure and the point at which each module stops sending processing requests to the failed module terminate abnormally, which degrades service quality.

[0004] In the autonomous distributed management scheme, each module of the system autonomously and periodically runs health checks against the modules within a predetermined range, or against all other modules in the system, to learn whether other modules have failed, and selects where to send processing requests accordingly. This avoids both the loss of system-wide state control caused by a failure of the centralized control module and the degradation of service quality caused by delayed failure detection in the centralized management scheme.

[0005] However, in this autonomous distributed management scheme, the modules exchange health checks with one another and also communicate to guarantee that the state of each module is consistent across the whole group of distributed modules. This traffic squeezes the inter-module communication needed for normal call processing, which degrades service quality.

[0006]

[Problem to Be Solved by the Invention] The problem to be solved is that the prior art cannot manage module failures without degrading the service quality of a distributed module system. An object of the present invention is to solve these problems of the prior art and to provide a module operating state control method for a distributed processing system that can improve the reliability and performance of a large-scale intelligent network in which distributed modules are redundantly configured with backup.

[0007]

[Means for Solving the Problem] To achieve the above object, the module operating state control method of the present invention is a method in which each module of a distributed processing system monitors whether the other modules are able to operate. The own module (load distribution module A1) measures the time that a first module being accessed (load distribution module B2) takes to respond to a processing request from the own module, and monitors whether that time exceeds a time (T0) predetermined for each access signal. If the time (T0) is exceeded (a timeout), the own module registers the first module as being in a failed state in which it cannot operate (temporary failure); alternatively, after registering the first module as failed, it cancels the failed-state setting of the first module once a separately predetermined time (T1) has elapsed.

In such a method, the invention is characterized as follows. While the first module is registered as failed, subsequent accesses, including retries of the access to the first module, are made to a second module selected as a substitute. If the first module is a function distribution module that has a predefined mutual backup relationship with another module, the function distribution module in that backup relationship is selected as the second module. If the second module, when accessing the first module in the course of handling the retry, registers the first module as failed on the basis of the time (T0), it starts the backup and notifies all connected modules that the first module has failed, and each module receiving the notification registers the first module as failed. Furthermore, after the separately predetermined time (T1), the second module determines whether the first module can operate. If it can, the second module cancels its failed-state registration for the first module and notifies all of the modules to cancel their failed-state registrations for the first module.

[0008]

[Embodiments of the Invention] In the present invention, a failure occurring in a distributed module is detected by monitoring a timeout (T0) on the communication service accesses that each module makes. When a timeout occurs, communication service access to that module is stopped immediately. After a fixed time has elapsed, communication service access to the module that timed out is restored. If the access to that module can be retried, the retry is sent to a module of the same type as a substitute.

[0009] When the update data needed to implement a communication service is managed by function distribution modules that have a predefined mutual backup relationship, the function distribution modules in the backup relationship manage each other's operating state. If a communication service access to the partner times out, the module takes over the management rights for the update data that the partner function distribution module had been managing. It then begins backing up the operation of that module and notifies the other modules that the module has become unable to operate. After a fixed time (T1) has elapsed, it attempts to return the management rights for the update data taken over from that module. If the attempt succeeds, it notifies the other modules that the module is able to operate again; if it fails, it continues backup operation until the fixed time (T1) elapses once more.

[0010] Controlling module operating state in a distributed processing system in this way has the following advantages over the prior art.
(a) Because module state does not need to be controlled by a centralized control module, factors that limit improvement of overall system reliability, such as failure of the centralized control module, are eliminated.
(b) No centralized control module is needed, so the delay until a centralized control module detects a failure is avoided.
(c) No health-check signals are issued, so they do not consume communication capacity.
(d) No bidirectional communication among all modules is needed to guarantee the consistency of module states, so it does not consume communication capacity.

[0011]

[Examples] Embodiments of the present invention are described in detail below with reference to the drawings. Fig. 1 is a communication sequence diagram showing a first embodiment of the processing according to the module operating state control method of the present invention. Fig. 2 is a communication sequence diagram showing a second embodiment of that processing. Fig. 3 is a block diagram showing an example configuration of a distributed processing system that performs module operating state control according to the present invention.

[0012] Fig. 3 shows, as a distributed processing system according to the present invention, the configuration of a service control node that uses a distributed module configuration to provide intelligent network services. Reference numeral 100 denotes the service control node adopting the distributed module configuration, and 300 denotes a transmission node. 201 denotes a group of load distribution modules that control communication with the transmission node 300 (modules of this type are hereinafter called module A). 202 denotes a group of load distribution modules that hold the service control logic (hereinafter called module B). 203 denotes a group of function distribution modules that hold, in a real-time database, the update data updated by the service control logic (hereinafter called module C).

[0013] The service control node 100 is made up of these distributed modules (module A 201, module B 202, and module C 203). The transmission node 300 controls the speech paths for calls between terminals. That is, when the transmission node 300 detects an intelligent network service call from a terminal such as a telephone, it notifies the service control node 100. The service control node 100 controls the call through cooperation among the distributed module groups 201 to 203.

[0014] Fig. 4 is a block diagram showing a detailed configuration example, according to the present invention, of the service control node in Fig. 3. This example shows how the functions for module operating state control are deployed on each distributed module in the service control node 100 of Fig. 3. Modules A 201 and B 202 each have an other-module state management mechanism 400 and an inter-module communication mechanism 403. Module C 203 has a function-distribution-module other-module state management mechanism 401, a backup mechanism 402, and an inter-module communication mechanism 403. Each mechanism is described in detail below.

[0015] (A) Other-module state management mechanism 400: In modules A 201 and B 202, the other-module state management mechanism 400 manages the state of the modules other than the own module. In module C 203, the IDs of the function distribution modules in a backup relationship are managed along with the state of each module. On receiving a timeout notification from the inter-module communication mechanism 403 described later, the mechanism sets the state of the module concerned to failed. If that module is a load distribution module, its state is reset to operable after a fixed time (T1) has elapsed.

[0016] Furthermore, when a failure notification for a backup partner module is received from a module C 203, as described later, the state of the notified function distribution module is set to failed and is thereafter held fixed as failed until recovery of that function distribution module is notified. Whenever service processing in a module requires access to another module, the state of the other module is always consulted beforehand. If that state is not normal, the access to that module is abandoned and a substitute module is selected and accessed instead. The substitute is selected as follows. If the failed module is a module A or B, that is, a load distribution module, any module of the same type is selected. If the module to be accessed is a module C, that is, a function distribution module, the module in the mutual backup relationship with it can be selected.
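The substitute selection rule in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_target`, the `kind` strings, and the use of dictionaries for the state table and pair table are hypothetical, and picking the first healthy candidate is one arbitrary way to realize "any module of the same type".

```python
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    OPERATING = "operating"
    FAILED = "failed"  # stands in for both "temporary failure" and "failed"


def select_target(
    wanted: str,
    kind: str,
    states: Dict[str, State],
    same_type: List[str],
    pair_of: Dict[str, str],
) -> Optional[str]:
    """Return the module to access, or None if no usable module exists."""
    # The recorded state is consulted before every access.
    if states.get(wanted, State.OPERATING) is State.OPERATING:
        return wanted
    if kind == "load":
        # Load distribution module (A or B): any healthy module of the same type.
        for candidate in same_type:
            if candidate != wanted and states.get(candidate, State.OPERATING) is State.OPERATING:
                return candidate
        return None
    # Function distribution module (C): only its backup partner can stand in.
    partner = pair_of.get(wanted)
    if partner is not None and states.get(partner, State.OPERATING) is State.OPERATING:
        return partner
    return None


states = {"B1": State.OPERATING, "B2": State.FAILED,
          "C1": State.OPERATING, "C2": State.FAILED}
print(select_target("B2", "load", states, ["B1", "B2"], {}))        # B1
print(select_target("C2", "function", states, [], {"C2": "C1"}))    # C1
```

The difference between the two branches reflects the text: load distribution modules are interchangeable, whereas a function distribution module holds specific data, so only its pair can take over.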

[0017] (B) Function-distribution-module other-module state management mechanism 401: The mechanism 401 deployed in module C provides the functions of the other-module state management mechanism 400 described above and, in addition, behaves as follows. If the module that timed out is a function distribution module in a mutual backup relationship with the own module (such a module is hereinafter called the pair module), the mechanism immediately sets the state of that module to "failed", notifies the backup mechanism 402 to start backup processing, and broadcasts to all other modules that the pair module has failed. Then, after a fixed time (T1) has elapsed, it instructs the backup mechanism 402 to perform backup switchback processing (hereinafter simply called switchback processing). If the switchback processing completes normally, it returns the state of the pair module to normal and broadcasts to all modules that the module has recovered (recovery notification).

[0018] (C) Backup mechanism 402: The backup mechanism starts when it receives a backup start instruction from the function-distribution-module other-module state management mechanism 401 and takes over the service processing of the pair module in the backup relationship. When it receives a switchback instruction, it starts switchback processing. If switchback does not succeed because the pair module has not yet recovered from its failure, the backup mechanism aborts the switchback, notifies the mechanism 401 that issued the switchback instruction that switchback failed, and continues backup operation.

[0019] (D) Inter-module communication mechanism 403: Every distributed module has an inter-module communication function for communicating with other modules. If the inter-module communication mechanism 403 does not receive a response to an access to another module within the time (T0) predetermined for each signal, it immediately notifies the other-module state management mechanism 400 that the access to that module has timed out. In module C, it notifies the function-distribution-module other-module state management mechanism 401 instead.
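As a rough illustration of this timeout check, the sketch below compares a measured response time against a per-signal T0 and reports a timeout through a callback. The class name `InterModuleComm`, the signal names, the T0 values, and the callback interface are all assumptions made for the example; the patent does not specify them.

```python
from typing import Callable, Dict, Optional


class InterModuleComm:
    """Checks each access against the T0 defined for its signal type."""

    def __init__(self, t0_by_signal: Dict[str, float],
                 notify_timeout: Callable[[str], None]) -> None:
        self.t0_by_signal = t0_by_signal
        # Stands in for the state management mechanism (400 or 401).
        self.notify_timeout = notify_timeout

    def complete_access(self, module_id: str, signal: str,
                        response_time: Optional[float]) -> bool:
        """Return True if the reply arrived within T0; otherwise report a timeout.

        response_time is the measured delay in seconds, or None if no reply came.
        """
        t0 = self.t0_by_signal[signal]
        if response_time is not None and response_time <= t0:
            return True
        self.notify_timeout(module_id)  # reported immediately
        return False


timed_out = []
comm = InterModuleComm({"query": 0.5, "update": 2.0}, timed_out.append)
comm.complete_access("B1", "query", 0.1)    # within T0
comm.complete_access("B2", "query", None)   # no reply at all
comm.complete_access("C2", "update", 2.5)   # reply too late
print(timed_out)  # ['B2', 'C2']
```

Because the check rides on ordinary service accesses, no separate health-check traffic is generated, which is the property the description emphasizes.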

[0020] Fig. 5 is an explanatory diagram showing the state transition model for other modules that each distributed module of the service control node in Fig. 3 manages. In Fig. 5, 501 denotes the "operating" state and 502 denotes the "temporary failure" state. The communication response time 601 and the monitoring time 602 are monitored in the respective states. Transitions between the operating state 501 and the temporary failure state 502 pass through transition detection points. Transition detection point 701 is satisfied when the communication response time 601 for a module in the operating state reaches the predetermined timeout time (T0). Transition detection point 702 is satisfied when the monitoring time 602, measured from when the module entered the failed state, exceeds the separately defined monitoring expiration time (T1).
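The two-state model of Fig. 5 can be sketched as a small class. This is a minimal sketch, not the patented implementation: the class name `PeerState`, the explicit `now` arguments (used instead of a real clock so the example is deterministic), and the lazy evaluation of the T1 expiry are assumptions.

```python
from enum import Enum


class State(Enum):
    OPERATING = 501          # "operating"
    TEMPORARY_FAILURE = 502  # "temporary failure"


class PeerState:
    """State that one module keeps about one other load distribution module."""

    def __init__(self, t1: float) -> None:
        self.t1 = t1
        self.state = State.OPERATING
        self.failed_at = 0.0

    def on_timeout(self, now: float) -> None:
        # Transition detection point 701: the response time reached T0.
        if self.state is State.OPERATING:
            self.state = State.TEMPORARY_FAILURE
            self.failed_at = now

    def current(self, now: float) -> State:
        # Transition detection point 702: monitoring time since failure exceeded T1.
        if self.state is State.TEMPORARY_FAILURE and now - self.failed_at > self.t1:
            self.state = State.OPERATING
        return self.state


peer = PeerState(t1=30.0)
peer.on_timeout(now=100.0)
print(peer.current(now=110.0).name)  # TEMPORARY_FAILURE
print(peer.current(now=131.0).name)  # OPERATING
```

Note that the return to "operating" is optimistic: nothing probes the peer, so if it is still down, the next real access simply times out and the cycle repeats.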

[0021] Fig. 6 is an explanatory diagram showing the state transition model for a function distribution module as managed by the distributed modules other than its pair module in the service control node of Fig. 3. In Fig. 6, 501 and 502 denote the "operating" and "temporary failure" states, as in Fig. 5. 503 denotes that the module C concerned is in the "failed" state. The communication response time 601 (the data monitored in the "operating" state), the monitoring time 602 (the data monitored in the "temporary failure" state), and the transition detection points 701 and 702 are also as described for Fig. 5. Transition detection point 703 is passed on receipt of a failure notification from the pair module of the module C that this state transition diagram represents, and transition detection point 704 is passed on receipt of a notification of recovery from the failure.
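One way to picture the Fig. 6 model is as a transition table. The sketch below is an interpretation rather than a transcription of the figure: the state and event names are hypothetical, and the text does not say which states point 703 may be entered from, so allowing it from both "operating" and "temporary failure" is an assumption.

```python
TRANSITIONS = {
    # (current state, event): next state
    ("operating", "timeout_T0"): "temporary_failure",      # point 701
    ("temporary_failure", "expired_T1"): "operating",      # point 702
    ("operating", "failure_notice"): "failed",             # point 703
    ("temporary_failure", "failure_notice"): "failed",     # point 703 (assumed source state)
    ("failed", "recovery_notice"): "operating",            # point 704
}


def next_state(state: str, event: str) -> str:
    """Apply one event; events with no table entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = "operating"
for event in ["timeout_T0", "failure_notice", "expired_T1", "recovery_notice"]:
    state = next_state(state, event)
    print(event, "->", state)
```

In this sketch the "failed" state ignores the T1 expiry and leaves only on a recovery notice, which matches the statement that a notified failure is held fixed until recovery is notified.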

[0022] Fig. 7 is an explanatory diagram showing the state transition model for the pair module as managed by a function distribution module of the service control node in Fig. 3. The operating state 501, the communication response time 601, and the monitoring time 602 are as described for Figs. 5 and 6. State 504 indicates that, as with the "temporary failure" state 502 in Fig. 5, the module concerned is in "temporary failure", and additionally that backup operation for the failed module is in progress. Transition detection point 705 is passed when an access to a pair module that was in the operating state times out (time T0 elapses). When this point is passed, the transition of the pair module to "failed" is broadcast to all modules.

[0023] Transition detection point 706 indicates that the monitoring time 602, measured from when the pair module's state changed to the temporary failure state 502, has reached the predetermined monitoring expiration time (T1) and that switchback from backup operation has succeeded, so that the pair module can resume its normal operation. When transition detection point 706 is passed, the return of the pair module to the "operating" state is broadcast to all modules. Transition detection point 707 is passed when switchback processing could not complete normally and backup operation therefore continues.
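The pair-module model of Fig. 7 differs from the simpler models in that leaving the failure state depends on a switchback attempt succeeding. The following sketch shows that loop under stated assumptions: `PairMonitor`, `tick`, and the two callbacks are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Callable, List


class PairMonitor:
    """State a function distribution module keeps about its pair module."""

    def __init__(self, t1: float, try_switchback: Callable[[], bool],
                 broadcast: Callable[[str], None]) -> None:
        self.t1 = t1
        self.try_switchback = try_switchback
        self.broadcast = broadcast
        self.backing_up = False   # False: state 501, True: state 504
        self.timer_start = 0.0

    def on_timeout(self, now: float) -> None:
        # Point 705: access to the pair module timed out (T0 elapsed).
        if not self.backing_up:
            self.backing_up = True
            self.timer_start = now
            self.broadcast("failed")

    def tick(self, now: float) -> None:
        if not self.backing_up or now - self.timer_start <= self.t1:
            return
        if self.try_switchback():
            # Point 706: switchback succeeded, the pair module runs again.
            self.backing_up = False
            self.broadcast("operating")
        else:
            # Point 707: switchback failed, keep backing up for another T1.
            self.timer_start = now


notices: List[str] = []
results = iter([False, True])  # first switchback fails, second succeeds
monitor = PairMonitor(10.0, lambda: next(results), notices.append)
monitor.on_timeout(now=0.0)
monitor.tick(now=11.0)   # attempt fails, timer restarts at 11.0
monitor.tick(now=15.0)   # T1 not yet elapsed, nothing happens
monitor.tick(now=22.0)   # attempt succeeds
print(notices)  # ['failed', 'operating']
```

Only the two state changes are broadcast; a failed switchback stays local to the backing-up module, so other modules see no extra traffic while the pair remains down.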

[0024] The following describes, with reference to Fig. 1, an example of the module state control procedure when a failure occurs in a load distribution module of the system in Fig. 3. In the example of Fig. 1, module A1 is a module belonging to module A, and modules B1 and B2 are modules belonging to module B. It is assumed that access from module A to module B is needed to provide a certain communication service.

[0025] Module state control according to the present invention when a failure occurs follows steps (1) to (5) in Fig. 1.
(1) Module A1 accesses module B2, one of the modules of module B, to provide a communication service.
(2) If a failure has occurred in module B2 at this time, module B2 cannot issue a response to the access request from module A1. The access therefore results in a communication timeout at module A1.
(3) Having detected the communication timeout for module B2, module A1 immediately sets the state of module B2 to "temporary failure" and thereafter does not select module B2 as the access destination for module B.

[0026] (4) If the access can be retried, module A1 may access module B1, a module of the same type, as a substitute for module B2.
(5) When the time since module A1 set module B2 to "temporary failure" exceeds the monitoring time (T1), module A1 autonomously returns the state of module B2 to "operating" and thereafter includes module B2 again among the candidate access destinations for module B.
If module B2 has still not recovered when it is first accessed after steps (1) to (5), steps (1) to (5) are repeated.

[0027] Next, an example of the module state control procedure when a failure occurs in a function distribution module of the system in Fig. 3 is described with reference to Fig. 2. In this example, module A1 is one module belonging to module A, as in Fig. 1, and modules C1 and C2 are modules belonging to module C for which a mutual backup relationship is defined. It is assumed that module A needs to access update data managed by module C2 in order to provide the communication service.

[0028] Under these conditions, module state control according to the present invention when a failure occurs follows steps (1) to (13) in Fig. 2.
(1) Module A1 accesses module C2 to provide a communication service.
(2) If a failure has occurred in module C2 at this time, module C2 cannot issue a response to the access request from module A1. The access from module A1 therefore results in a communication timeout.
(3) Having detected the communication timeout for module C2, module A1 immediately sets the state of module C2 to "temporary failure" and thereafter directs accesses intended for module C2 to module C1, which is in a mutual backup relationship with module C2.

[0029] (4) Module C1, on receiving the substitute processing request from module A1, determines that the requested update data is managed by module C2, with which it has a mutual backup relationship, and first forwards the access to module C2.
(5) Because module C2 has failed, it cannot respond to the access relayed from module C1 either, so the access from module C1 also times out. During this period, module C1 may return an interim response to module A1 at a suitable time to prevent the substitute processing request from module A1 from timing out.

[0030] (6) On detecting the communication timeout for module C2, module C1 immediately sets the state of module C2 to "temporary failure (backup running)" and starts backup processing.
(7) Module C1 notifies all modules that module C2 has failed.
(8) Module C1 accesses the update data formerly managed by module C2, which the start of backup processing has made accessible, and returns a response to module A1, the source of the access request.
(9) Every module that receives the failure notification for module C2 from module C1 sets the state of module C2 to "failed".
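The backup partner's side of steps (4) to (9) might look like the sketch below. All names (`PairModule`, `forward`, `broadcast`, `backup_store`, `apply_notice`) are illustrative assumptions, the notice format is invented for the example, and how the backup copy of module C2's data is kept current is outside the scope of the sketch.

```python
class PeerTimeout(Exception):
    """Raised when the partner does not reply in time (assumed transport behaviour)."""


class PairModule:
    """Module C1's side of steps (4) to (8)."""

    def __init__(self, name, partner, forward, broadcast):
        self.name = name
        self.partner = partner
        self.forward = forward      # forward(partner, key, value) -> reply, may raise PeerTimeout
        self.broadcast = broadcast  # broadcast(notice) delivers a notice to all modules
        self.partner_state = "operating"
        self.backup_store = {}      # stand-in for the partner's update data

    def handle_substitute_request(self, key, value):
        if self.partner_state == "operating":
            try:
                return self.forward(self.partner, key, value)                # step (4)
            except PeerTimeout:                                              # step (5)
                self.partner_state = "temporary failure (backup running)"    # step (6)
                self.broadcast({"type": "failure", "module": self.partner})  # step (7)
        self.backup_store[key] = value                                       # step (8)
        return {"served_by": self.name, "key": key}


def apply_notice(peer_state, notice):
    """Step (9): a module that receives a failure notice marks that module as failed."""
    if notice["type"] == "failure":
        peer_state[notice["module"]] = "failed"
```

The intermediate response mentioned in step (5) is omitted here; a real relay would need some way to keep the original requester from timing out while the forwarded access is pending.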

[0031] (10) In module C1, when the monitoring expiration time (T1) has elapsed since the state of module C2 was set to "temporary failure", backup termination processing is started in an attempt to switch back from backup operation to module C2. If the failure in module C2 has not been cleared at this point, the switch-back processing fails in module C1. In that case, backup operation continues until the monitoring expiration time (T1) elapses again. (Case 1) in FIG. 2 shows the case where the first switch-back attempt fails.

[0032] (11) If the switch-back processing succeeds, module C1 sets the state of module C2 to "operating". (Case 2) in FIG. 2 shows the case where the switch-back processing succeeds.
(12) Once the state of module C2 has been set to "operating", module C1 issues a recovery notification for module C2 to all modules.
(13) Every module that receives the recovery notification for module C2 from module C1 sets the state of module C2 to "operating" and thereafter accesses module C2 normally.
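The T1-driven switch-back of steps (10) to (12) can be illustrated with a small timer-based sketch. The class name, the injected `now` clock, the `try_switchback` callable and the notice format are assumptions made for the example; the source does not say how the timer or the switch-back attempt is implemented.

```python
class SwitchbackMonitor:
    """Module C1's side of steps (10) to (12)."""

    def __init__(self, t1, try_switchback, broadcast, now):
        self.t1 = t1
        self.try_switchback = try_switchback  # returns True when the switch-back succeeds
        self.broadcast = broadcast            # broadcast(notice) delivers a notice to all modules
        self.now = now                        # clock function, injected so it can be simulated
        self.partner_state = "operating"
        self.deadline = None

    def mark_temporary_failure(self):
        self.partner_state = "temporary failure"
        self.deadline = self.now() + self.t1

    def poll(self):
        # Nothing to do unless the partner is marked failed and T1 has elapsed.
        if self.partner_state != "temporary failure" or self.now() < self.deadline:
            return
        if self.try_switchback():
            self.partner_state = "operating"      # step (11), Case 2
            self.deadline = None
            self.broadcast({"type": "recovery"})  # step (12)
        else:
            self.deadline = self.now() + self.t1  # Case 1: wait another T1
```

Calling `poll()` periodically is one possible design; an event-driven timer would serve equally well, since the only requirement in the text is that a new attempt follows each elapse of T1.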

[0033] As described above with reference to FIGS. 1 to 7, in the module operation state control method for a distributed processing system of this embodiment, the service control node 100, which adopts a distributed module configuration, detects the failure of another module from the timeout (elapse of time T0) of an access signal for call processing. A failure can therefore be detected quickly, without dedicated failure-detection signals squeezing call-processing communication. In addition, because each module autonomously sets the state of the failed module to one that suppresses access to it, call loss caused by selecting the failed module is reduced. Because no control from the centralized control module of the prior art is required, the reliability of a centralized control module can also be removed from the factors that determine overall reliability, particularly in a large-scale distributed system.

[0034] The present invention is not limited to the embodiment described with reference to FIGS. 1 to 7 and can be modified in various ways without departing from its gist. For example, FIG. 1 shows only module B1, and the accesses from the retry processing of step (4) onward, in which module B1 acts as the substitute module, are directed to module B1. However, the access destination from the retry processing onward is not limited to module B1 and may be any module of the load distribution module group that holds the service control logic, as module B1 does.
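One way to picture this variation is a selection function that picks any available member of the load distribution module group as the retry target. The function name, the list-based group and the string state values are hypothetical; the source does not specify a selection policy, so first-match order is used purely for illustration.

```python
def pick_retry_target(group, peer_state, failed):
    """Return the first module in the group, other than `failed`,
    that this module's local state table still shows as operating."""
    for module in group:
        if module != failed and peer_state.get(module, "operating") == "operating":
            return module
    return None  # no substitute is currently available
```

A round-robin or load-aware choice would fit the same interface; only the condition that the chosen module holds the same service control logic comes from the text.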

【００３５】[0035]

[Effects of the Invention] According to the present invention, failure management of each module can be performed without degrading the service quality of the distributed processing system. In particular, the reliability and performance of a large-scale intelligent network in which the distributed modules are redundantly configured for backup can be improved.

[Brief description of the drawings]

FIG. 1 is a communication sequence diagram showing a first embodiment of the processing of the module operation state control method for a distributed processing system according to the present invention.

FIG. 2 is a communication sequence diagram showing a second embodiment of the processing of the module operation state control method for a distributed processing system according to the present invention.

FIG. 3 is a block diagram showing a configuration example of a distributed processing system that performs module operation state control according to the present invention.

FIG. 4 is a block diagram showing a detailed configuration example, according to the present invention, of the service control node in FIG. 3.

FIG. 5 is an explanatory diagram showing the state transition model of other modules as managed by each distributed module of the service control node in FIG. 3.

FIG. 6 is an explanatory diagram showing the state transition model of a function distribution module as managed by distributed modules other than its pair module in the service control node in FIG. 3.

FIG. 7 is an explanatory diagram showing the state transition model of the pair module as managed by a function distribution module of the service control node in FIG. 3.

[Explanation of symbols]

100: service control node
201: load distribution module A
202: load distribution module B
203: function distribution module C
300: transmission node
400: other-module state management mechanism
401: other-module state management mechanism for function distribution modules
402: backup mechanism
403: inter-module communication mechanism
501: operating state
502: temporary failure state
503: failure state
504: temporary failure / backup operation state
601: communication response time
602: monitoring time
701 to 707: transition detection points

Continuation of the front page

(56) References
JP-A-55-8134 (JP, A)
JP-A-2-277336 (JP, A)
JP-A-6-187270 (JP, A)
JP-A-56-122595 (JP, A)
JP-A-55-97647 (JP, A)
JP-A-55-37643 (JP, A)
JP-A-54-115008 (JP, A)
JP-A-63-102434 (JP, A)
JP-A-60-89255 (JP, A)
JP-U-62-71755 (JP, U)

(58) Field surveyed (Int. Cl.⁷, DB name)
H04M 3/22

Claims

(57) [Claims]

1. A module operation state control method for a distributed processing system, in which each module constituting the distributed processing system monitors whether other modules are operable, the method comprising:
measuring the response time to a processing request to a first module and monitoring, for each access signal, whether the time being measured exceeds a predetermined time (T0);
when the time being measured exceeds the time (T0), setting and registering the first module as being in a failure state in which operation is impossible; and
after the first module has been set and registered as being in the failure state, cancelling the setting of the failure state of the first module once a separately preset time (T1) has elapsed;
wherein the module that set and registered the first module as being in the failure state performs retry processing directed to a second module that includes the functions of the first module;
when the first module is a function distribution module having a backup function with another predefined module, the function distribution module in a backup relationship with the first module is selected as the second module;
the second module performs the access corresponding to the retry processing on the first module and, when it sets and registers the failure state of the first module on the basis of the time (T0), starts backup and notifies all modules to that effect;
each module that has received the notification sets and registers the failure state for the first module; and
the second module, after the separately preset time (T1) has elapsed, determines whether the first module is operable and, when the first module is in an operable state, cancels the failure state setting registered for the first module and notifies all the modules to cancel the failure state setting registered for the first module.