JP2002251384A

JP2002251384A - Wide area cluster control system

Info

Publication number: JP2002251384A
Application number: JP2001048186A
Authority: JP
Inventors: Yohei Matsuura; 陽平松浦
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-02-23
Filing date: 2001-02-23
Publication date: 2002-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a wide area cluster control system that achieves high reliability, including disaster resistance, and high availability by controlling a plurality of cluster systems dispersed over a wide area via a network and keeping those cluster systems under correct control. SOLUTION: The cluster system 1 that is to actually operate is decided based on state information on the cluster systems 1 transmitted from the system monitoring daemons 11 (11a to 11n) of the plurality of cluster systems 1 (1a to 1n). The state information on the plurality of cluster systems 1 (1a to 1n) is reported to the system control means 12 (12a to 12n) of the plurality of cluster systems 1 (1a to 1n). The system control means 12 (12a to 12n) control the plurality of cluster systems 1 (1a to 1n) in accordance with the state information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] In the present invention, a client receives and accumulates the states of remotely distributed cluster systems and notifies each cluster system of those states. This makes it possible to control the cluster systems correctly even when a network failure occurs, and improves service availability.

[0002]

[Prior Art] FIG. 14 shows a network management method disclosed in, for example, Japanese Patent Laid-Open No. 5-250345. In the figure, 102 and 104 are control point processors, 112 and 114 are disks, and 106 and 108 are customer processors. The control point processors and the customer processors are interconnected by a communication network 116. Each disk is connected to a control point processor. The control point processors are classified into one main control processor and spare control processors, and the disk of each spare control processor is synchronized with the disk of the main control processor.

[0003] Next, the operation will be described. A customer processor sends a service request task to the main control processor, and the main control processor returns a service response to that request. The customer processor checks for the service response and, if no service response is received, sends a request to one of the spare control processors asking it to become the main control processor.

[0004]

[Problems to Be Solved by the Invention] As described above, in the conventional network management method, when the network 116 is split vertically, the customer processor 106 regards the control point processor 102 as the main control processor, and the customer processor 108 regards the control point processor 104 as the main control processor. A plurality of main control processors then exist in the network. If the control processors host an application that must run exclusively, the existence of a plurality of main control processors can make it difficult, for example, to control the cluster system correctly when a network failure occurs.

[0005] The present invention has been made to solve the above problem. Its object is to improve availability and reliability, taking disaster resistance into account, by controlling a plurality of cluster systems distributed over a wide area via a network and keeping the cluster systems under correct control.

[0006]

[Means for Solving the Problems] The first invention comprises the following elements.
(a) A plurality of cluster systems that are connected to a network, execute processing based on information transmitted from a client, and each have the following elements:
(a1) system monitoring means that monitors the state of the cluster system, transmits state information on the cluster system, and instructs the cluster system to stop when the state information could not be transmitted;
(a2) system control means that has a rule table for each of the plurality of cluster systems for controlling the plurality of cluster systems, controls the plurality of cluster systems based on the rule table and on the state information on the plurality of cluster systems transmitted from the client, and stops the cluster system based on an instruction from the system monitoring means.
(b) A client that is connected to the network, transmits the state information transmitted by the plurality of cluster systems to the plurality of cluster systems, and has the following element:
(b1) system information management means that accumulates the state information transmitted by the system monitoring means, decides the cluster system to be actually operated when the accumulated state information differs from newly transmitted state information, and notifies the system control means of the plurality of cluster systems of the state information on the plurality of cluster systems.

[0007] The second invention comprises a plurality of clients each having the system information management means, and the system control means controls the plurality of cluster systems based on the rule table and on the state information on the plurality of cluster systems transmitted from the plurality of clients.

[0008] The third invention comprises a system control server that has the rule table and transmits, to the system control means, operation information for the plurality of cluster systems composed from the rule table and the state information on the plurality of cluster systems transmitted from the client or the plurality of clients. The system control means controls the plurality of cluster systems based on the operation information.

[0009] The fourth invention comprises a rule table having client information indicating whether the client transmitting the state information is normal, and system control means that controls the plurality of cluster systems based on the state information on the plurality of cluster systems transmitted from the client and on the client information in the rule table.

[0010] The fifth invention comprises a plurality of clients that are weighted so that the processing of the plurality of cluster systems is executed preferentially, and system control means that controls the plurality of cluster systems based on the rule table and on the state information on the plurality of cluster systems transmitted from the weighted clients.

[0011] The sixth invention comprises a rule table into which the rule tables for the individual cluster systems are integrated, and system control means that controls the plurality of cluster systems based on the integrated rule table and on the state information on the plurality of cluster systems transmitted from the client.

[0012] The seventh invention comprises system information management means that transmits, to the system control means, the state information on the plurality of cluster systems together with system control information for making a cluster system recognize that it is the cluster system that is to execute processing, and system control means that controls the plurality of cluster systems based on the state information and the system control information transmitted from the system information management means.

[0013]

[Embodiments of the Invention] Embodiment 1. FIG. 1 is a configuration diagram of the wide area cluster control system of Embodiment 1. In the figure, 1 (1a to 1n) denotes a plurality of remotely distributed cluster systems, 2 denotes a client, and 3 denotes a wide area network connecting the plurality of cluster systems 1 and the client 2. The business application runs on exactly one of the cluster systems 1. The cluster system 1 on which the business application is running is called the primary cluster system, and the remaining cluster systems 1 are called secondary cluster systems. Each cluster system 1 has a system monitoring daemon 11 (11a to 11n), which collects state information indicating the system state of that cluster system 1 and transmits it to the client 2, and a system control daemon 12 (12a to 12n), which controls the cluster system 1. Each system control daemon 12 has a rule table 13 (13a to 13n) used to control the cluster system 1. The client 2 has a system information management daemon 21 that receives the state information on each cluster system 1 transmitted from the system monitoring daemon 11 of each cluster system 1.

[0014] Next, the operation will be described. FIG. 2 is a flowchart showing the processing of the system monitoring daemon 11. Normally, each system monitoring daemon 11 monitors, at fixed intervals, the state of the cluster system 1 to which it belongs and reports the state information on that cluster system 1 to the system information management daemon 21 of the client 2 (steps S1 to S3). If, for example, the system monitoring daemon 11a detects that a failure has occurred in the cluster system 1a, it immediately notifies the system information management daemon 21 that a failure has occurred in the cluster system 1a (step S4).

[0015] When the system information management daemon 21 has been successfully notified that a failure has occurred in the cluster system 1a (step S5), the system monitoring daemon 11a waits until the cluster system 1a recovers from the failure (step S7). Once the cluster system 1a has recovered (step S8), the process returns to step S1, and the system monitoring daemon 11a again monitors the state of the cluster system 1a at fixed intervals.

[0016] On the other hand, if in step S5 the system monitoring daemon 11a could not notify the system information management daemon 21 that a failure has occurred in the cluster system 1a, for example because the network 3 has been split, the system monitoring daemon 11a instructs the system control daemon 12a to stop the cluster system 1a (step S6). It then waits until the cluster system 1a recovers from the failure (step S7). Once the cluster system 1a has recovered (step S8), the process returns to step S1, and the system monitoring daemon 11a again monitors the state of the cluster system 1a at fixed intervals. The same operation is performed in the cluster systems other than the cluster system 1a.
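
The monitoring flow of FIG. 2 (steps S1 to S8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `monitor_once`, the three callbacks, and the string values for states and results are all hypothetical, and the wait-for-recovery loop (steps S7 and S8) is reduced to a return value.

```python
from typing import Callable


def monitor_once(
    check_cluster: Callable[[], str],
    notify_client: Callable[[str], bool],
    stop_cluster: Callable[[], None],
) -> str:
    """Run one monitoring pass and return a label for the action taken."""
    state = check_cluster()           # steps S1-S2: observe the local cluster
    delivered = notify_client(state)  # steps S3/S4: report the state to the client

    if state == "normal":
        # The description only walks through the failure branch; how a lost
        # report of a normal state is handled here is an assumption.
        return "reported" if delivered else "report_failed"

    if not delivered:                 # step S5: the failure notice did not get through
        stop_cluster()                # step S6: ask the control daemon to stop the cluster
        return "stopped"
    return "waiting_for_recovery"     # steps S7-S8: wait until the failure clears
```

A caller would invoke `monitor_once` at a fixed interval and skip further passes while the result is `"stopped"` or `"waiting_for_recovery"` until the cluster reports recovery.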

[0017] FIG. 3 is a flowchart showing the processing of the system information management daemon 21. The system information management daemon 21 receives the state information transmitted from each system monitoring daemon 11, which indicates the system state of the cluster system 1 to which that daemon belongs (step S11), and accumulates this state information internally (step S12). Next, it compares the state information accumulated so far for each cluster system 1 with the state information newly transmitted from each system monitoring daemon 11 and determines whether they differ (step S13). If they do not differ, that is, if they are identical, the process returns to step S11 and the daemon again receives the state information transmitted from the system monitoring daemons 11.

[0018] On the other hand, if it is determined in step S13 that the state information differs, the daemon decides which cluster system 1 is to actually operate and notifies each system control daemon 12 of the state information on each cluster system 1 (step S14).

[0019] The cluster system 1 that is to actually operate is decided based on a rule table such as that shown in FIG. 4. Consider an environment consisting of two cluster systems, a primary cluster system (the cluster system 1 on which the business application is running) and a secondary cluster system (the remaining cluster system 1), in which an abnormality occurs in the primary cluster system. In this case, whether the secondary cluster system is normal or abnormal, the system control daemon 12 is notified, based on the rule table for the primary cluster system in FIG. 4, to stop the primary cluster system.

[0020] Also, when an abnormality occurs in the primary cluster system, the system control daemon 12 is notified, based on the rule table for the primary cluster system in FIG. 4, to start the secondary cluster system if the secondary cluster system is normal, and to stop the secondary cluster system if it is abnormal.

[0021] Note that the rule table for the primary cluster system in FIG. 4 is the rule table of the cluster system 1 on which the business application is running, and the rule table for the secondary cluster system is the rule table of the cluster systems 1 other than the one on which the business application is running.
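
The two-cluster rules described in paragraphs [0019] and [0020] can be written as a small lookup table. The sketch below is an assumption-laden illustration: FIG. 4 is not reproduced in this text, so only the two combinations the description spells out are encoded, the single merged table and the names `RULES` and `decide` are hypothetical, and every other combination falls back to a placeholder `"keep"` action.

```python
from typing import Dict, Tuple

# Key: (primary state, secondary state).
# Value: the action notified for each cluster system.
RULES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("abnormal", "normal"): {"primary": "stop", "secondary": "start"},
    ("abnormal", "abnormal"): {"primary": "stop", "secondary": "stop"},
}


def decide(primary_state: str, secondary_state: str) -> Dict[str, str]:
    """Look up the actions for the given pair of cluster states."""
    # Combinations the description does not cover leave both clusters as they
    # are; this default is an assumption, not part of the source.
    default = {"primary": "keep", "secondary": "keep"}
    return RULES.get((primary_state, secondary_state), default)
```

Keeping the rules as data rather than branching code mirrors the description's use of a table that each system control daemon consults.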

[0022] Next, if the system information management daemon 21 was able to transmit the state information to the system control daemons 12 (step S15), it returns to step S11 and again receives the state information transmitted from the system monitoring daemons 11.

[0023] On the other hand, if in step S15 the system information management daemon 21 could not transmit the state information to a system control daemon 12, it judges that the cluster system 1 it could not reach is down (step S16), returns to step S12, and accumulates the fact that this cluster system 1 is down as new state information on that cluster system 1.
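
The receive, accumulate, compare, and notify loop of FIG. 3 (steps S11 to S16) might look like the following sketch. The class name `SystemInfoManager`, the `send` callback, and the state strings are hypothetical; the decision of which cluster system is to operate (made from the rule table in the description) is left out, and skipping clusters already marked `"down"` when notifying is an assumption.

```python
from typing import Callable, Dict

Snapshot = Dict[str, str]  # cluster id -> last known state


class SystemInfoManager:
    def __init__(self, send: Callable[[str, Snapshot], bool]) -> None:
        self.states: Snapshot = {}  # accumulated state information (step S12)
        self.send = send            # hypothetical transport; returns False on failure

    def on_report(self, cluster_id: str, state: str) -> None:
        """Handle one state report from a system monitoring daemon (step S11)."""
        changed = self.states.get(cluster_id) != state  # step S13: compare
        self.states[cluster_id] = state                 # step S12: accumulate
        if changed:
            self._notify_all()                          # step S14: notify

    def _notify_all(self) -> None:
        retry = True
        while retry:
            retry = False
            snapshot = dict(self.states)
            for cluster_id, state in snapshot.items():
                if state == "down":
                    continue  # assumption: do not contact clusters judged down
                if not self.send(cluster_id, snapshot):  # step S15: send failed
                    self.states[cluster_id] = "down"     # step S16: judge it down
                    retry = True  # the stored state changed, so notify again
```

The loop ends because each retry marks at least one more cluster as down and the number of clusters is finite.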

[0024] FIG. 5 is a flowchart showing the processing of the system control daemon 12. Each system control daemon 12 determines whether it has received the state information on each cluster system 1 from the system information management daemon 21 of the client 2 (step S21). If it has, it controls its cluster system 1 in accordance with the received state information and the rule table 13 (step S25), then returns to step S21 and again determines whether state information has been received from the system information management daemon 21 of the client 2.

[0025] On the other hand, if no state information has been received from the system information management daemon 21 in step S21, the daemon waits for a fixed interval (step S22) and determines whether a timeout has occurred (step S23). If a timeout has occurred, it judges that the cluster system 1 for which no state information could be received from the system information management daemon 21 cannot provide service, and the system control daemon 12 that failed to receive the information stops its cluster system 1 (step S24). The process then returns to step S21, and the daemon again determines whether the state information notified from the system information management daemon 21 of the client 2 has been received.

[0026] If it is determined in step S23 that no timeout has occurred, the process returns to step S21, and the daemon determines whether the state information notified from the system information management daemon 21 of the client 2 has been received. FIG. 6 shows the operation of the system control daemon 12, the system monitoring daemon 11, and the system information management daemon 21 when a failure occurs in a cluster system 1.
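
One pass of the system control daemon of FIG. 5 (steps S21 to S25) reduces to a receive with a timeout. In the sketch below, a `queue.Queue` stands in for the network channel from the client, and `control_step`, `apply_rules`, and `stop_cluster` are hypothetical names; the timeout value is likewise an assumption, since the description only speaks of a fixed interval.

```python
import queue
from typing import Callable, Dict


def control_step(
    inbox: "queue.Queue[Dict[str, str]]",
    timeout_s: float,
    apply_rules: Callable[[Dict[str, str]], None],
    stop_cluster: Callable[[], None],
) -> str:
    """Wait for state information; control the cluster or stop it on timeout."""
    try:
        # Steps S21-S23: wait for state information until the timeout expires.
        snapshot = inbox.get(timeout=timeout_s)
    except queue.Empty:
        # Step S24: nothing arrived in time, so the cluster is judged unable to
        # provide service and is stopped.
        stop_cluster()
        return "stopped"
    # Step S25: control the cluster from the received state and the rule table.
    apply_rules(snapshot)
    return "controlled"
```

Stopping on silence is what keeps a cluster that has been cut off from the client from running the application alongside the cluster that the client has started.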

[0027] As described above, according to this embodiment, when a failure occurs in a cluster system 1 because of a disaster or the like, or when a cluster system 1 is cut off from the wide area network 3, any cluster system 1 that cannot communicate with the system information management daemon 21 of the client 2 stops, and only one of the cluster systems 1 that can communicate with the system information management daemon 21 is started. As a result, each cluster system 1 can be controlled correctly when a cluster system 1 fails because of a disaster or the like or when a network failure occurs. This makes it possible to realize a wide area cluster control system with high availability and with reliability that takes disaster resistance into account.

[0028] Embodiment 2. Embodiment 1 described the case where a single client is used to accumulate the state information on the cluster systems. This embodiment describes the case where a plurality of clients are used to improve reliability.

[0029] FIG. 7 is a configuration diagram of the wide area cluster control system of Embodiment 2. In the figure, 1 (1a to 1n) denotes a plurality of remotely distributed cluster systems, 2 (2a to 2m) denotes a plurality of clients, and 3 denotes a wide area network connecting the plurality of cluster systems 1 (1a to 1n) and the plurality of clients 2 (2a to 2m). Each cluster system 1 has a system monitoring daemon 11 (11a to 11n), which collects state information indicating the system state of that cluster system 1 and transmits it to all the clients 2, and a system control daemon 12 (12a to 12n), which controls the cluster system 1. Each system control daemon 12 has a rule table 13 (13a to 13n) used to control the cluster system 1, which is similar to the rule table shown in FIG. 4. Each client 2 has a system information management daemon 21 (21a to 21m) that receives the state information on each cluster system 1 transmitted from the system monitoring daemon 11 of each cluster system 1. The business application runs on exactly one of the cluster systems 1.

[0030] Next, the operation will be described. FIG. 8 is a flowchart showing the processing of the system monitoring daemon 11. The flowchart of FIG. 8 is the same as that of FIG. 2 except that in step S34 each system monitoring daemon 11 transmits the state information on its cluster system 1 to all the clients 2. The operation of the system information management daemon 21 of each client 2 is the same as that of the system information management daemon 21 described in Embodiment 1.

[0031] FIG. 9 is a flowchart showing the processing of the system control daemon 12. Each system control daemon 12 determines whether it has received the state information on the cluster systems 1 from the system information management daemons 21 of all the clients 2 (step S41). If it has received state information from every system information management daemon 21, it controls the cluster system 1 in accordance with the received state information and the rule table 13 (step S47), then returns to step S41 and again determines whether state information has been received from the system information management daemons 21 of all the clients 2.

[0032] On the other hand, if state information could not be received from a system information management daemon 21, the system control daemon 12 waits for a fixed interval (step S42) and determines whether a timeout has occurred (step S43). If a timeout has occurred, it judges that the client 2 of the system information management daemon 21 from which nothing was received has stopped, and excludes that client from the set of clients it listens to (step S44). It then checks how many clients remain in that set (step S45). If any client remains, the process returns to step S41, and the daemon again determines whether information has been received from the system information management daemons 21 of all the clients 2. If no client remains, the daemon judges that the cluster system cannot provide service and stops the cluster system (step S46). The process then returns to step S41, and the daemon again determines whether the state information notified from the system information management daemons 21 of all the clients 2 has been received.

[0033] If it is determined in step S43 that no timeout has occurred, the process returns to step S41, and the daemon again determines whether information has been received from the system information management daemons 21 of all the clients 2. In this way, the system control daemon 12 records in advance the number of clients 2 connected to the wide area network 3, consolidates the information received from the plurality of clients 2 by majority vote, and controls the cluster system in accordance with the rule table.
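
The majority vote over client reports and the exclusion of silent clients (steps S44 to S46 of FIG. 9) can be sketched as below. The function names, the nested-dictionary report format, and the state strings are hypothetical. The description does not say how ties are resolved; this sketch simply takes whatever `Counter.most_common` returns first, which is an assumption.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Report = Dict[str, str]  # cluster id -> state, as seen by one client


def majority_state(reports: Dict[str, Report]) -> Report:
    """Merge per-client reports into one state per cluster by majority vote."""
    cluster_ids = sorted({cid for report in reports.values() for cid in report})
    merged: Report = {}
    for cid in cluster_ids:
        votes = Counter(r[cid] for r in reports.values() if cid in r)
        merged[cid] = votes.most_common(1)[0][0]  # tie handling is an assumption
    return merged


def drop_silent_clients(expected: Set[str], heard: Set[str]) -> Tuple[Set[str], bool]:
    """Return the clients still listened to and whether the cluster must stop."""
    remaining = expected & heard             # step S44: exclude timed-out clients
    return remaining, len(remaining) == 0    # steps S45-S46: stop if none remain
```

For example, with three clients of which two report cluster `a` as normal and one reports it as abnormal, `majority_state` yields `normal` for `a`, so a single client with a stale or wrong view does not trigger a failover.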

[0034] As described above, according to this embodiment, even if a failure occurs in a cluster system 1 because of a disaster or the like, or a cluster system 1 is cut off from the wide area network 3, each of the remotely distributed cluster systems 1 can be controlled correctly, and the reliability of the state information on each cluster system 1 can be improved. This in turn makes it possible to realize a wide area cluster control system with high availability and with reliability that takes disaster resistance into account.

[0035] Embodiment 3. In Embodiment 2, the state information from the system information management daemons was accumulated in the system control daemon of each cluster system. This embodiment describes the case where a separate computer is installed and the state information from the system information management daemons is accumulated on that computer.

[0036] FIG. 10 is a configuration diagram of the wide area cluster control system of Embodiment 3. In the figure, 1 (1a to 1n) denotes a plurality of remotely distributed cluster systems, 2 (2a to 2m) denotes a plurality of clients, and 3 denotes a wide area network connecting the plurality of cluster systems 1 (1a to 1n) and the plurality of clients 2 (2a to 2m). Each cluster system 1 has a system monitoring daemon 11 (11a to 11n), which collects state information indicating the system state of that cluster system 1 and transmits it to all the clients 2, and a system control daemon 12 (12a to 12n), which controls the cluster system 1.

[0037] A system control server 4 is also connected to the wide area network 3. Each client 2 has a system information management daemon 21 (21a to 21m) that receives the state information transmitted from the system monitoring daemon 11 of each cluster system 1 and transmits this state information to the system control server 4. The system control server 4 has a rule table 13 similar to the rule table 13 that each system control daemon 12 had, as shown in FIG. 4.

[0038] Next, the operation will be described. The processing of each system monitoring daemon 11 is the same as in the flowchart of FIG. 7. The processing of each system information management daemon 21 follows the same flow as described for FIG. 3, except that in step S14 of FIG. 3 "notifies the system control daemon 12 of the state information on the cluster system 1" is changed to "notifies the system control server of the state information on the cluster system 1".

[0039] FIG. 11 is a flowchart showing the processing of the system control server 4. The system control server 4 determines whether it has received the state information on the cluster systems 1 from the system information management daemons 21 of all the clients 2 (step S51). If it has received state information from every system information management daemon 21, it accumulates the state information received from each system information management daemon 21 and transmits, to the system control daemon 12 of each cluster system 1, operation information for the cluster system 1 composed from the accumulated state information and its own rule table 13 (step S55). The process then returns to step S51, and the server again determines whether state information has been received from the system information management daemons 21 of all the clients 2. When the system control server 4 transmits the operation information for the cluster system 1 to the system control daemon 12 in step S55, the system control daemon 12 controls the cluster system 1 based on this operation information.

【００４０】On the other hand, if status information is not received from a system information management daemon 21 in step S51, the server waits for a fixed interval (step S52) and then checks whether a timeout has occurred (step S53). If a timeout has occurred, the server judges that the client 2 whose system information management daemon 21 could not be heard from has stopped, and excludes that client from the reception targets (step S54). The process then returns to step S51, and the server again determines whether status information has been received from the system information management daemons 21 of all the clients 2.
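The flow of steps S51 to S55 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation disclosed in the specification: the polling callback `poll`, the injectable `clock` and `sleep` functions, the dictionary layout of the operation information, and all function names are assumptions introduced here.

```python
import time
from typing import Callable, Dict, Optional, Set


def collect_status(
    clients: Set[str],
    poll: Callable[[str], Optional[dict]],
    timeout_s: float,
    wait_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """One pass through steps S51 to S54; returns the reports gathered."""
    received: Dict[str, dict] = {}
    pending = set(clients)
    deadline = clock() + timeout_s
    while pending:                              # S51: has every client reported?
        for client_id in sorted(pending):
            report = poll(client_id)            # hypothetical transport call
            if report is not None:
                received[client_id] = report
        pending.difference_update(received)
        if not pending:
            break
        if clock() >= deadline:                 # S53: timed out?
            clients.difference_update(pending)  # S54: drop silent clients
            break
        sleep(wait_s)                           # S52: wait a fixed interval
    return received


def build_operation_info(received: Dict[str, dict], rule_table: dict) -> dict:
    """S55: bundle the accumulated status with the server's rule table."""
    return {"status": received, "rules": rule_table}
```

Passing `clock` and `sleep` as parameters is a design choice made only so that the timeout path can be exercised without real waiting; the specification does not say how the interval is measured.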

【００４１】That is, in this embodiment, the system control server 4 receives information from the system information management daemons 21 and transmits the operation information of the cluster systems 1, obtained from the received information and the rule table 13, to the system control daemon 12 of each cluster system 1. Each system control daemon 12 controls its cluster system according to the operation information obtained from the system control server 4.

【００４２】As described above, according to this embodiment, even if a disaster or the like causes a failure in a cluster system 1, or a cluster system 1 is cut off from the wide area network 3, each client 2 no longer needs to communicate with each cluster system 1, and the availability of the network can be improved. This in turn makes it possible to realize a wide area cluster control system with high reliability, including disaster resistance, and high availability.

【００４３】Embodiment 4. In Embodiment 2, the operation of the primary cluster system and the secondary cluster systems was determined by a majority vote of the notifications from the clients together with the rule table. This embodiment describes the case where the cluster system 1 is determined based on a weighted rule table.

【００４４】FIG. 12 is a diagram showing an example of a weighted rule table. The table in FIG. 12 specifies that the operation of the cluster system is switched when 60% or more of the total number of clients regard it as normal. The configuration of the wide area cluster control system of this embodiment and the processing operations of its components are the same as in Embodiment 2. This embodiment differs from Embodiment 2 in that the cluster system 1 that actually operates is determined based on the weighted rule table of FIG. 12.

【００４５】This makes it possible to adjust the operation of the cluster system to the network environment of the system to which it is applied. In other words, it becomes possible to realize a cluster system that keeps operating as long as at least one client regards it as normal, or a cluster system that stops as soon as even one client regards it as abnormal, and availability can be improved.
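The threshold behaviour described for the weighted rule table can be illustrated with a small sketch. The function name, the percentage parameter, and the boolean report format are assumptions made for illustration; the actual layout of the table in FIG. 12 is not reproduced here.

```python
from typing import Sequence


def cluster_should_run(reports: Sequence[bool], required_percent: int) -> bool:
    """Return True when enough clients regard the cluster system as normal.

    reports          -- one entry per client, True meaning "normal" (assumed format)
    required_percent -- share of clients that must report "normal";
                        60 matches the FIG. 12 example, 100 stops the cluster
                        as soon as one client reports "abnormal", and 0 keeps
                        it running while at least one client reports "normal"
    """
    total = len(reports)
    normal = sum(1 for ok in reports if ok)
    if total == 0 or normal == 0:
        return False
    # Integer arithmetic avoids rounding surprises at the exact threshold.
    return normal * 100 >= required_percent * total
```

With five clients and a 60% threshold, three "normal" reports are enough to keep the cluster system running, while two are not.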

【００４６】As described above, according to this embodiment, using a weighted rule table reduces unexpected system behavior caused by misrecognition in the system information management daemons 21 and improves the reliability of the system as a whole.

【００４７】Embodiment 5. In Embodiment 2, the status information sent from the system information management daemon of each client to the system control daemon of each cluster system carried equal standing for every client. This embodiment describes the case where each client is weighted, that is, weighted so that the processing of the plurality of cluster systems is executed preferentially.

【００４８】The configuration of the wide area cluster control system of this embodiment and the processing operations of its components are the same as in Embodiment 2. In this embodiment, when the system control daemon 12 determines the operating state of the cluster system 1 based on the rule table 13, the result from a specific client 2 is reflected preferentially. That is, the system control daemon 12 controls each cluster system 1 based on the status information of each cluster system 1 transmitted from that specific client 2 and on the rule table 13.

【００４９】According to this embodiment, the priority of unreliable clients, or of clients connected over unreliable network paths, can be lowered, so that erroneous judgments by the system control daemon caused by client-side failures are kept to a minimum.
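One way to picture client weighting is a weighted vote, sketched below. The specification only states that the result from a specific client is reflected preferentially; the numeric weights, the weighted-majority rule, and every name in this sketch are assumptions chosen for illustration.

```python
from typing import Dict


def weighted_verdict(
    reports: Dict[str, bool],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> bool:
    """Return True when the weighted "normal" votes outweigh the "abnormal" ones.

    reports -- client id -> True if that client regards the cluster as normal
    weights -- client id -> weight; clients not listed get default_weight
    """
    normal = 0.0
    abnormal = 0.0
    for client_id, ok in reports.items():
        weight = weights.get(client_id, default_weight)
        if ok:
            normal += weight
        else:
            abnormal += weight
    return normal > abnormal
```

Giving a trusted client a weight of 3 lets its report override two unweighted clients that disagree, whereas with equal weights the two would win the vote.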

【００５０】Embodiment 6. In Embodiments 1 to 5, as many rule tables as there are cluster systems had to be prepared. This embodiment describes the case where the multiple rule tables are combined into one. In this case, one cluster system becomes the primary cluster system and the remaining cluster systems become secondary cluster systems.

【００５１】FIG. 13 is a diagram showing an example of a rule table obtained by integrating multiple rule tables into one. The configuration of the wide area cluster control system of this embodiment and the processing operations of its components are the same as in Embodiments 1 to 5, except that the processing operation of the system control server 4 differs as described below.

【００５２】The processing operation of the system control daemon 12 is as follows. When each system control daemon 12 is notified by the system information management daemons 21 of all the clients 2 that cluster system 1a, the primary cluster system, has stopped, it demotes cluster system 1a to a secondary cluster system and selects the next primary cluster system from among the other cluster systems, that is, the secondary cluster systems that are operating normally. It then controls the cluster system 1 according to the rule table 13.
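The demotion and selection step can be sketched as follows. The specification does not state how the next primary is chosen among the healthy secondaries, so picking the first one in sorted name order is purely an assumption, as are the role strings and the function name. The `healthy` set stands for the outcome of the clients' notifications and is taken as given.

```python
from typing import Dict, Set


def fail_over(roles: Dict[str, str], healthy: Set[str]) -> Dict[str, str]:
    """Demote a stopped primary and promote one healthy secondary.

    roles   -- cluster name -> "primary" or "secondary" (assumed encoding)
    healthy -- clusters that the clients report as operating normally
    """
    new_roles = dict(roles)
    primary = next((c for c, r in roles.items() if r == "primary"), None)
    if primary is not None and primary in healthy:
        return new_roles                      # primary still running: no change
    if primary is not None:
        new_roles[primary] = "secondary"      # demote the stopped primary
    for cluster in sorted(new_roles):         # assumed selection order
        if cluster in healthy and cluster != primary:
            new_roles[cluster] = "primary"
            break
    return new_roles
```

If no secondary is healthy, the sketch leaves the system without a primary rather than promoting a stopped cluster.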

【００５３】The same processing as in the system control daemon 12 described above is also performed when the system control server 4 is used. In that case, once the system control server 4 has selected the next primary cluster system from among the other cluster systems, that is, the secondary cluster systems that are operating normally, it transmits operation information for the cluster system 1, composed of this information and its own rule table 13, to the system control daemon 12 of the cluster system 1. The system control daemon 12 then controls the cluster system according to this operation information.

【００５４】As described above, according to this embodiment, there is no longer any need to manage multiple rule tables, the availability of the system can be improved, and a highly available wide area cluster control system can be realized.

【００５５】Embodiment 7. In Embodiments 1 to 6, the client transmitted the status information of each cluster system to the system control daemons of all the cluster systems. This embodiment describes the case where, when a high-speed network is used, the status information and system control information are transmitted only to the cluster system that is the primary cluster system. In this case, the cluster system whose system control daemon receives the status information and system control information automatically becomes the primary cluster system.

【００５６】The configuration of the wide area cluster control system of this embodiment and the processing operations of its components are the same as in Embodiments 1 to 6, except for the operation described below.

【００５７】This operation is described next. At fixed intervals, the system information management daemon 21 of the client 2 transmits, to the system control daemon 12 of a specific cluster system 1, for example system control daemon 12a of cluster system 1a, the status information of each cluster system 1 together with system control information that lets the recipient recognize that it is the primary cluster system. System control daemon 12a of cluster system 1a, the primary cluster system, confirms that it is the primary cluster by receiving the status information and system control information sent at fixed intervals. If it fails to receive them, it demotes itself from primary cluster to secondary cluster. When the system information management daemon 21 can no longer send the status information and system control information to cluster system 1a, it sends them to the system control daemon 12 of another cluster system 1 in order to determine a new primary cluster based on the cluster control algorithm described above.
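The periodic message in this embodiment behaves much like a lease on the primary role, which the following sketch illustrates. The class and function names, the `{"primary": name}` message layout, the explicit `now` timestamps, and the fixed candidate order used for failover are all assumptions; the specification refers to the earlier cluster control algorithm for choosing the new primary.

```python
from typing import Dict, Iterable, Optional, Set


class SystemControlDaemon:
    """Holds the primary role only while periodic messages keep arriving."""

    def __init__(self, name: str, interval_s: float) -> None:
        self.name = name
        self.interval_s = interval_s
        self.role = "secondary"
        self.last_seen: Optional[float] = None

    def on_message(self, now: float, status: dict, control_info: dict) -> None:
        # Receiving status plus control information marks this cluster primary.
        self.last_seen = now
        if control_info.get("primary") == self.name:
            self.role = "primary"

    def tick(self, now: float) -> None:
        # Demote when the expected message has not arrived within the interval.
        if (self.role == "primary" and self.last_seen is not None
                and now - self.last_seen > self.interval_s):
            self.role = "secondary"


def send_to_primary(
    daemons: Dict[str, SystemControlDaemon],
    order: Iterable[str],
    now: float,
    status: dict,
    reachable: Set[str],
) -> Optional[str]:
    """Client side: deliver to the first reachable cluster in the given order."""
    for name in order:
        if name in reachable:
            daemons[name].on_message(now, status, {"primary": name})
            return name
    return None
```

In this sketch a cluster that stops hearing from the client steps down on its own, so the old and new primaries do not both hold the role once the interval has elapsed.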

【００５８】As described above, according to this embodiment, the system information management daemon 21 of each client 2 no longer needs to send status information to the system control daemons 12 of all the cluster systems 1, the availability of the system can be improved, and a highly available wide area cluster control system can be realized.

【００５９】[0059]

[Effects of the Invention] Since the present invention is configured as described above, it has the following effects.

【００６０】In the first invention, the cluster system to be actually operated is determined based on the status information of the cluster systems transmitted from the system monitoring means of the plurality of cluster systems, the status information of the plurality of cluster systems is reported to the system control means of the plurality of cluster systems, and the system control means control the plurality of cluster systems according to this status information. As a result, each cluster system can be controlled normally even when a disaster or the like causes a failure in a cluster system or in the network, and wide area cluster control with high reliability, including disaster resistance, and high availability can be realized.

【００６１】In the second invention, providing a plurality of clients, each having system information management means, enables cluster control with further improved reliability, and wide area cluster control with high reliability, including disaster resistance, and high availability can be realized.

【００６２】In the third invention, a system control server is provided that transmits, to the system control means, operation information for the plurality of cluster systems composed of the status information of the plurality of cluster systems transmitted from the client or the plurality of clients. This eliminates the need for each client to communicate with each cluster system and improves the availability of the network.

【００６３】In the fourth invention, the rule table has client information indicating whether or not the client transmitting the status information is normal. This makes it possible to adjust the operation of the cluster system to the network environment of the system to which it is applied, and highly available wide area cluster control can be realized.

【００６４】In the fifth invention, weighting the plurality of clients so that the processing of the plurality of cluster systems is executed preferentially keeps erroneous judgments by the system control daemon caused by client-side failures to a minimum.

【００６５】In the sixth invention, the rule tables for each of the plurality of cluster systems are integrated into the rule table. This eliminates the need to manage multiple rule tables, and highly available wide area cluster control can be realized.

【００６６】In the seventh invention, the system information management means transmits to the system control means the status information of the plurality of cluster systems and system control information that lets the recipient recognize that it is the cluster system that executes processing. The system information management daemon of the client therefore no longer needs to send status information to the system control daemons of all the cluster systems, and highly available wide area cluster control can be realized.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the wide area cluster control system according to Embodiment 1.

FIG. 2 is a flowchart showing the processing operation of the system monitoring daemon 11 in Embodiment 1.

FIG. 3 is a flowchart showing the processing operation of the system information management daemon 21 in Embodiment 1.

FIG. 4 is a diagram showing an example of the rule table in Embodiment 1.

FIG. 5 is a flowchart showing the processing operation of the system control daemon 12 in Embodiment 1.

FIG. 6 is a diagram showing the operation of each daemon when a failure occurs in the cluster system 1 in Embodiment 1.

FIG. 7 is a configuration diagram of the wide area cluster control system according to Embodiment 2.

FIG. 8 is a flowchart showing the processing operation of the system monitoring daemon 11 in Embodiment 2.

FIG. 9 is a flowchart showing the processing operation of the system control daemon 12 in Embodiment 2.

FIG. 10 is a configuration diagram of the wide area cluster control system according to Embodiment 3.

FIG. 11 is a flowchart showing the processing operation of the system control server 4 in Embodiment 3.

FIG. 12 is a diagram showing an example of the weighted rule table in Embodiment 4.

FIG. 13 is a diagram showing an example of a rule table obtained by integrating multiple rule tables into one in Embodiment 6.

FIG. 14 is a configuration diagram of a conventional network management method.

[Description of Reference Numerals] 1 (1a to 1n): cluster system; 2: client; 3: wide area network; 11 (11a to 11n): system monitoring daemon; 12 (12a to 12n): system control daemon; 13 (13a to 13n): rule table.

Claims

[Claims]

1. A wide-area cluster control method having the following elements. (A) a plurality of cluster systems that are connected to a network and execute processing based on information transmitted from a client, and have the following elements; (a1) monitor the status of the cluster system, and A system monitoring means for transmitting the status information and instructing to stop the cluster system when the status information cannot be transmitted; (a2) having a rule table for each of the plurality of cluster systems for controlling the plurality of cluster systems; System control for controlling the plurality of cluster systems based on the status information of the plurality of cluster systems transmitted from the client and the rule table, and for stopping the cluster systems based on an instruction from the system monitoring means; Means; (b) connect to the above network Transmitting the status information transmitted by the plurality of cluster systems to the plurality of cluster systems, and storing the status information transmitted by the system monitoring means; and (b1) storing the status information transmitted by the system monitoring means. System information management means for deciding a cluster system to be actually operated when status information that has been transmitted differs from newly transmitted status information, and notifying the system control means of the plurality of cluster systems of the status information of the plurality of cluster systems .

2. The wide area cluster control method according to claim 1, comprising a plurality of clients each having the system information management means, wherein the system control means controls the plurality of cluster systems based on the status information of the plurality of cluster systems transmitted from the plurality of clients and the rule table.

3. The system according to claim 2, further comprising: a rule table, wherein operation information of the plurality of cluster systems, comprising the rule table and the status information of the plurality of cluster systems transmitted from the client or the plurality of clients, is stored. 3. The wide area cluster according to claim 1, further comprising: a system control server for transmitting to the system control means, wherein the system control means controls the plurality of cluster systems based on the operation information. control method.

4. The rule table has client information indicating whether the client transmitting status information is normal or not, and the system control means transmits status information of the plurality of cluster systems transmitted from the client. And controlling the plurality of cluster systems based on the client information in the rule table.
The described wide area cluster control method.

5. The plurality of clients are weighted so as to give priority to the processing of the plurality of cluster systems, and the system control means transmits a status of the plurality of cluster systems transmitted from the weighted client. 5. The wide area cluster control method according to claim 4, wherein said plurality of cluster systems are controlled based on information and said rule table.

6. The rule table is integrated with a rule table for each of the plurality of cluster systems, and the system control means includes a state information of the plurality of cluster systems transmitted from a client, and the integrated rule table. 6. The wide area cluster control method according to claim 5, wherein the plurality of cluster systems are controlled based on the following.

7. The system information management means, wherein the system control means stores status information of a plurality of cluster systems,
Transmitting system control information for recognizing that the plurality of clusters is a cluster system that executes processing, wherein the system control unit transmits the plurality of clusters based on the state information and system control information transmitted from the system information management unit; 7. The wide area cluster control system according to claim 1, wherein the system is controlled.