JP2001043105A

JP2001043105A - High-availability computer system and data backup method of the system

Info

Publication number: JP2001043105A
Application number: JP11217147A
Authority: JP
Inventors: Koji Yamamoto; 浩司山本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-07-30
Filing date: 1999-07-30
Publication date: 2001-02-16
Anticipated expiration: 2019-07-30
Also published as: JP3887130B2

Abstract

PROBLEM TO BE SOLVED: To realize a system that is more tolerant of faults by securing multiple backup servers and copying data to those servers effectively. SOLUTION: Three or more servers connected by a network, e.g. servers S1 to S4, are prepared. The server S1, which functions as the master according to the priority, communicates with the slave servers S2 to S4 to search periodically for servers that have a fault and servers that do not. When a client modifies data of a file held by the master server S1, the server S1 copies the modified data (copies 81 and 82) to the faultless servers S3 and S4 found by the search. The servers S3 and S4 periodically search for the master in decreasing order of priority, starting from the server S1 with the top priority; when no master is found, the server that itself has the top priority among the faultless servers becomes the new master.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a high-availability system (high-availability computer system) in which a plurality of server computers cooperate to perform processing so that, even if a failure occurs in any one server computer, another server computer can take over the processing. In particular, it relates to a high-availability computer system, and to a data backup method in that system, in which a plurality of server computers that provide services to client computers are linked by a network and which has a data backup function that keeps the service interruption time of the system as a whole as short as possible by having another server computer take over the service even if a failure occurs in any one server computer.

[0002]

[Prior Art] A conventional high-availability computer system generally had two server computers, one providing the service and the other serving as its backup.

[0003] In this type of system, data is copied from the server computer providing the service to the backup server computer, so that if a failure occurs in the server computer providing the service, the service can be handed over to the other server computer and continued there.

[0004]

[Problem to Be Solved by the Invention] In the conventional high-availability computer system built from the two server computers described above, even if a failure occurs in the server computer providing the service, the remaining standby server computer (backup server computer) can take over the service. However, with only two server computers, both may fail, so depending on the application the fault tolerance is not necessarily sufficient.

[0005] Therefore, to build a system that is more resistant to failure, three or more server computers could be operated in cooperation. In that case, however, the more server computers there are, the more complex the cooperation among them becomes, and the heavier the load on the server computer in operation is expected to be. A mechanism for linking three or more server computers effectively is therefore required.

[0006] The present invention has been made in view of the above circumstances. Its object is to provide a high-availability computer system that is more resistant to failure, and a data backup method in that system, by using three or more server computers to secure a plurality of backup server computers and by copying data effectively to those backup server computers.

[0007] Another object of the present invention is to provide a high-availability computer system, and a data backup method in that system, in which a plurality of server computers are linked by a high-speed network and a low-speed network and a data backup method suited to each network is used in combination, so that effective data backup adapted flexibly to the network configuration can be performed.

[0008]

[Means for Solving the Problem] A high-availability computer system according to a first aspect of the present invention comprises at least three server computers connected via a network, one of which acts as the master server computer and provides a service to client computers; when a failure occurs in the master server computer, one of the remaining server computers becomes the new master server computer and takes over the processing in accordance with predetermined priority information (here, priority information that indicates the priority for becoming master for every computer in the system and whose priorities are used cyclically each time the master is switched). Each server computer is characterized by comprising the following means: master search means for periodically executing a master search operation that looks for the master server computer when the own computer is not the master server computer; master setting means for setting the own computer as the new master server computer when the master search means cannot find the master server computer and the own computer has the highest priority among the server computers in which no failure has occurred; server computer search means for periodically executing a server computer search operation that finds the failed and non-failed server computers when the own computer is the master server computer; and copy means for, when the own computer is the master server computer and data of a file held by the own computer is changed by a client computer, individually copying the changed data to every non-failed server computer found by the server computer search means, that is, for performing data copy by the 1-to-n communication method.

[0009] In such a configuration, when a client computer changes data of a file held by the master server computer, the master server computer copies that data to all other non-failed server computers, so that the file contents of the server computers are kept identical. Moreover, because there are a plurality of backup server computers (slave server computers) for the master server computer, when a failure occurs in the master server computer, one of those backup server computers becomes the new master server computer and, using the copied data, takes over the service that the failed server had been providing. A high-availability computer system that is more resistant to failure can thus be realized.

[0010] Here, if the master search means is given a function of searching for the master server computer by communicating, in accordance with the priority information, with the server computers in order of decreasing priority starting from the server computer that has the highest priority at that time, the master search can be performed efficiently.
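The priority-ordered master search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `find_master` and the `ask_is_master` callback (assumed to return `True`, `False`, or `None` when the peer does not answer) are hypothetical names introduced here.

```python
from typing import Callable, Optional, Sequence


def find_master(
    priority: Sequence[str],
    me: str,
    ask_is_master: Callable[[str], Optional[bool]],
) -> Optional[str]:
    """Probe peers from highest to lowest priority and return the first
    one that answers that it is the master, or None if none does."""
    for server in priority:
        if server == me:
            continue  # a server does not probe itself
        if ask_is_master(server) is True:
            return server
    return None


# Hypothetical replies: S1 does not answer (None), S2 claims to be master.
replies = {"S1": None, "S2": True, "S4": False}
master = find_master(["S1", "S2", "S3", "S4"], "S3", replies.get)
```

Because the probe order follows the priority list, the search can stop at the first positive reply instead of contacting every server.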

[0011] Likewise, if the server computer search means is given a function of finding the failed and non-failed server computers by communicating, in accordance with the priority information, with the server computers in order of decreasing priority starting from the one ranked immediately below the own computer, the presence or absence of failures in the server computers ranked below the own computer can be searched efficiently.
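As a rough sketch of this downward scan, the function below probes every other server starting one rank below the caller. The names `scan_lower_ranked` and `is_alive` are hypothetical, and treating the priority list as cyclic (wrapping from the last rank back to the first) is an assumption made for the illustration.

```python
from typing import Callable, List, Sequence, Tuple


def scan_lower_ranked(
    priority: Sequence[str],
    me: str,
    is_alive: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Probe every other server, starting one rank below `me` and moving
    toward lower priority; return (non_failed, failed) in probe order."""
    start = priority.index(me)
    non_failed: List[str] = []
    failed: List[str] = []
    for offset in range(1, len(priority)):
        # Assumption: the list is cyclic, so the scan wraps past the
        # last rank back to the first.
        server = priority[(start + offset) % len(priority)]
        (non_failed if is_alive(server) else failed).append(server)
    return non_failed, failed


up = {"S1", "S4"}  # hypothetical set of servers that answer the probe
non_failed, failed = scan_lower_ranked(
    ["S1", "S2", "S3", "S4"], "S2", up.__contains__
)
```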

[0012] If data destination setting means is provided that, when the own computer is the master server computer, sets the non-failed server computers found by the server computer search means as data destinations, the copy means can perform the 1-to-n data copy efficiently in accordance with the settings made by the data destination setting means.

[0013] Here, when the server computer search means finds a new non-failed server computer, that is, when a server computer that has recovered from a failure is detected, the data destination setting means additionally sets that server computer as a data destination, and the copy means of the master server computer copies the data of all files held by the master server computer to it. The recovered server computer can thereby be made one of the backup computers reliably and promptly.
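The handling of a recovered server can be illustrated with a small in-memory model. Everything named here (`update_destinations`, the `replicas` dictionary standing in for each server's disk) is hypothetical; the sketch only shows the rule that a newly detected non-failed server receives a full copy of the master's files when it is added as a destination.

```python
from typing import Dict, Set


def update_destinations(
    destinations: Set[str],
    non_failed: Set[str],
    master_files: Dict[str, bytes],
    replicas: Dict[str, Dict[str, bytes]],
) -> Set[str]:
    """Return the new destination set; any server that is newly
    non-failed first receives a copy of every file the master holds."""
    for server in sorted(non_failed - destinations):
        replicas[server] = dict(master_files)  # full copy on recovery
    return set(non_failed)


master_files = {"a.txt": b"v2", "b.txt": b"v1"}
# S3 is already a destination and up to date; S2 was down and is stale.
replicas = {"S3": dict(master_files), "S2": {"a.txt": b"v1"}}
destinations = update_destinations({"S3"}, {"S2", "S3"}, master_files, replicas)
```

Copying everything on recovery is simple rather than economical; it avoids having to track which changes the server missed while it was down.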

[0014] A high-availability computer system according to a second aspect of the present invention is characterized in that each server computer in the network-connected system is provided with the following means: first server computer search means for periodically executing, when the own computer is not the master server computer, a first search operation that looks for one non-failed server computer by communicating, in accordance with the priority information, with the servers in order of increasing priority starting from the server ranked immediately above the own computer; master setting means for setting the own computer as the new master server computer when the first server computer search means finds a failed server computer before finding a non-failed one and that failed computer is the master server computer; second server computer search means for periodically executing a second search operation that looks for one non-failed server computer by communicating, in accordance with the priority information, with the servers in order of decreasing priority starting from the server ranked immediately below the own computer; first copy means for copying, when the own computer is the master server computer and data of a file held by the own computer is changed by a client computer, the changed data to the non-failed server computer found by the second server computer search means; and second copy means for copying, when data has been copied from another server computer, that data to the non-failed server computer found by the second server computer search means.
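The first search operation and the master setting rule of this second aspect might look like the sketch below. The names `check_upstream` and `is_alive` are hypothetical, and the cyclic wrap-around of the priority list is an assumption of the illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def check_upstream(
    priority: Sequence[str],
    me: str,
    current_master: str,
    is_alive: Callable[[str], bool],
) -> Tuple[Optional[str], bool]:
    """Walk toward higher priority from the server just above `me`.
    Return (first non-failed server found, whether `me` becomes master)."""
    start = priority.index(me)
    for offset in range(1, len(priority)):
        # Assumption: ranks are cyclic, so the walk wraps around.
        server = priority[(start - offset) % len(priority)]
        if is_alive(server):
            return server, False  # a healthy server stands above us
        if server == current_master:
            return None, True  # the failed master was reached first
    return None, False


priority = ["S1", "S2", "S3", "S4"]
up = {"S3", "S4"}  # hypothetical: S1 (the master) and S2 have failed
result_s3 = check_upstream(priority, "S3", "S1", up.__contains__)
result_s4 = check_upstream(priority, "S4", "S1", up.__contains__)
```

In this example only S3 promotes itself; S4 finds the healthy S3 directly above it and stops there.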

[0015] In such a configuration, when a client computer changes data of a file held by the master server computer, that data is copied to all non-failed server computers other than the master server computer (slave server computers), passing through each server computer in priority order from the first to the last. That is, the data is copied from the master server computer to the server computer next in the priority order, from that server computer to the one after it, and so on in a daisy-chain (relay) fashion until the last server computer in the priority order is reached. Compared with the master server computer copying the data individually to each of the other server computers (slave server computers) by the 1-to-n communication method, this is slower but places a smaller load on the server computer, so that a high-availability computer system that is resistant to failure and more tolerant of load can be realized.
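To make the load comparison concrete, the sketch below lists the (sender, receiver) transfers for the daisy-chain method and for the 1-to-n method. The function names are hypothetical and the model ignores timing; it only counts who sends to whom.

```python
from typing import List, Sequence, Set, Tuple

Hop = Tuple[str, str]  # (sender, receiver)


def daisy_chain_hops(
    priority: Sequence[str], master: str, non_failed: Set[str]
) -> List[Hop]:
    """Relay from the master through the non-failed servers in priority
    order; each server sends to at most one successor."""
    start = priority.index(master)
    chain = [master]
    for offset in range(1, len(priority)):
        server = priority[(start + offset) % len(priority)]
        if server in non_failed:
            chain.append(server)
    return list(zip(chain, chain[1:]))


def one_to_n_hops(
    priority: Sequence[str], master: str, non_failed: Set[str]
) -> List[Hop]:
    """The master sends to every non-failed server individually."""
    return [(master, s) for s in priority if s != master and s in non_failed]


order = ["S1", "S2", "S3", "S4"]
alive = {"S2", "S4"}  # hypothetical: S3 has failed and is skipped
chain = daisy_chain_hops(order, "S1", alive)
fanout = one_to_n_hops(order, "S1", alive)
master_sends_chain = sum(1 for sender, _ in chain if sender == "S1")
master_sends_fanout = sum(1 for sender, _ in fanout if sender == "S1")
```

With two healthy slaves the master transmits once in the chain and twice in the fan-out, while the last server in the chain receives the data only after one extra relay step.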

[0016] Here, if data destination setting means is provided that sets the non-failed server computer found by the second server computer search means as the data destination, the first and second copy means can perform the daisy-chain data copy efficiently in accordance with the settings made by the data destination setting means.

[0017] Further, when the second server computer search means finds a new non-failed server computer, that is, when a server computer that has recovered from a failure is detected, the data destination setting means changes the data destination to that server computer, and the first or second copy means copies the data of all files held by the own computer to it. The recovered server computer can thereby be made one of the backup computers reliably and promptly.

[0018] A high-availability computer system according to a third aspect of the present invention comprises a plurality of first server computers connected via a first network, a plurality of second server computers connected via a second network that is slower than the first network, and a third server computer connected between the first network and the second network. One of these computers acts as the master server computer and provides a service to client computers; when a failure occurs in the master server computer, one of the remaining server computers becomes the new master server computer and takes over the processing in accordance with priority information that indicates the priority for becoming master for every computer in the system and whose priorities are used cyclically each time the master is switched. The system is characterized as follows. Each first server computer comprises first copy means for copying, when the own computer is the master server computer and data of a file held by the own computer is changed by a client computer, the changed data to the highest-ranked server computer among the non-failed server computers that are connected to the first network and have a lower priority than the own computer, and second copy means for copying, when data has been copied from another server computer, that data to the highest-ranked server computer among the non-failed server computers that are connected to the first network and have a lower priority than the own computer. Each second server computer comprises third copy means for individually copying, when the own computer is the master server computer and data of a file held by the own computer is changed by a client computer, the changed data to every non-failed server computer connected to the second network. The third server computer comprises fourth copy means for individually copying, when data has been copied from a first server computer, that data to every non-failed second server computer on the second network, and for copying, when data has been copied from a second server computer, that data to the highest-priority first server computer among the non-failed first server computers on the first network.

[0019] In such a configuration, data backup by the daisy-chain method, which places only a small load on the server computers, is applied on the first network, which is the high-speed network, and data backup by the 1-to-n communication method, which needs only a short time to make the data of the server computers consistent, is applied on the second network, which is the low-speed network. A system that adapts flexibly to the network configuration can thus be built.

[0020] The present invention relating to the above apparatus (high-availability computer system) also holds as an invention relating to a method (a data backup method in a high-availability computer system).

[0021]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings.

[0022] [First Embodiment] (Schematic configuration) FIG. 1 is a block diagram showing the configuration of a high-availability computer system according to a first embodiment of the present invention. The system of FIG. 1 comprises three or more server computers, for example four server computers (hereinafter simply called servers) S1, S2, S3 and S4; a plurality (here, m) of client computers (hereinafter simply called clients) C1 to Cm; and a network N that connects the servers S1 to S4 and the clients C1 to Cm.

[0023] The servers S1 to S4 are divided into one master server, which provides the service, and a plurality of slave servers, which serve as backups for the master server. In the state shown in FIG. 1, the server S1 is the master server and the other servers S2 to S4 are slave servers (backup servers).

[0024] The clients C1 to Cm use the service provided by the master server (S1) through the network N and write to the files held by the master server (S1).

[0025] The master server (S1) copies the data in the files it holds to the slave servers (S2 to S4), thereby making the contents of the files held by the master server (S1) and the slave servers (S2 to S4) identical. As a result, as shown in FIG. 2, when a failure occurs in the master server (S1), one of the slave servers (S2 to S4), for example the slave server S2, becomes the new master server, uses the copied data to take over the service that the previous master server (S1) had been providing, and continues to provide that service to the clients C1 to Cm. This completes the outline of the high-availability computer system of this embodiment.

[0026] (Internal configuration of the servers) Next, the internal configuration of the servers S1 to S4, which form the core of the system of FIG. 1, will be described with reference to the block diagram of FIG. 3.

[0027] First, each of the servers Si and Sj (i and j are 1 to 4, with i ≠ j) is configured to run three daemons (processing means that operate in the background): a status monitoring daemon 11, a data reception daemon 12, and a data transmission daemon 13. The functions of the status monitoring daemon 11, the data reception daemon 12, and the data transmission daemon 13 are described below, taking the server Si as an example.

[0028] First, the status monitoring daemon 11 on the server Si communicates periodically with the other status monitoring daemons 11 running on every server other than its own server Si, such as the status monitoring daemon 11 on the server Sj. If this periodic communication reveals a server with which communication is not possible, that server can be judged to have failed.

[0029] The status monitoring daemon 11 on the server Si stores, as internal state, which servers in the system are in a failed state. Based on the state of each server in the system, it instructs the data reception daemon 12 and the data transmission daemon 13 on its own server Si which server to receive data from and which server to send data to.
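One way to picture the instructions that the status monitoring daemon gives to the other two daemons is the sketch below. It assumes the 1-to-n scheme (the master sends to every non-failed peer and every other server only receives from the master); the `Instructions` type and the `instruct` function are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instructions:
    receive_from: Optional[str]  # passed to the data reception daemon
    send_to: List[str]           # passed to the data transmission daemon


def instruct(me: str, master: str, non_failed_peers: List[str]) -> Instructions:
    """Derive both instructions from the monitored state.
    Assumes 1-to-n backup: only the master transmits."""
    if me == master:
        return Instructions(receive_from=None, send_to=list(non_failed_peers))
    return Instructions(receive_from=master, send_to=[])


on_master = instruct("S1", "S1", ["S2", "S4"])
on_slave = instruct("S3", "S1", ["S1", "S2"])
```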

[0030] The data reception daemon 12 on the server Si receives data sent from the data transmission daemon 13 of another server, such as the server Sj, and records it in the file storage means of its own server Si, for example a disk storage device (not shown). As stated above, the source server from which data is to be received is specified by the status monitoring daemon 11 on the same server Si.

[0031] The data transmission daemon 13 on the server Si monitors the data on the disk storage device of its own server Si and sends any changed data to the data reception daemon 12 of another server, such as the server Sj. As stated above, the server to which data is to be sent (the data destination server) is specified by the status monitoring daemon 11 on the same server Si.

[0032] Next, the operation of the high-availability computer system comprising the servers S1 to S4 configured as above will be described, taking as an example the case where backup by the 1-to-n communication method is applied.

[0033] (Backup by the 1-to-n communication method) First, backup by the 1-to-n communication method will be described. Assume that, as shown in FIG. 4, the server S1 is the master server and the servers S2 to S4 are slave servers. In this state, when any of the clients C1 to Cm in FIG. 1 writes data to a file held by the master server S1, the master server S1 (through the data transmission daemon 13 of its own server) copies that data to the slave servers S2 to S4, that is, backs it up. The master server S1 applies the 1-to-n communication method to this backup from the master server S1 to the slave servers S2 to S4 and, as indicated by reference numerals 41 to 43 in FIG. 4, sends the same data individually to each of the slave servers S2 to S4 via the network N.
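The 1-to-n backup of a single client write can be modelled in memory as follows. The dictionaries stand in for each server's disk and the assignment inside the loop stands in for one transmission over the network N; `write_and_backup` and the file name are hypothetical.

```python
from typing import Dict, Iterable

Store = Dict[str, bytes]  # file name -> contents


def write_and_backup(
    master: Store, slaves: Iterable[Store], name: str, data: bytes
) -> int:
    """Apply a client write on the master, then send the same data to
    each slave individually; return the number of transmissions."""
    master[name] = data
    sent = 0
    for slave in slaves:
        slave[name] = data  # one separate transmission per slave
        sent += 1
    return sent


s1: Store = {}
s2: Store = {}
s3: Store = {}
s4: Store = {}
transmissions = write_and_backup(s1, [s2, s3, s4], "orders.db", b"row-1")
```

The count grows with the number of slaves, which is the load on the master that the daisy-chain variant is meant to reduce.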

[0034] (Basic operation of the master server and slave servers) Next, the basic operation of the master server and the slave servers will be described. In the system of FIG. 1, the priority for becoming master is determined in advance for all the servers S1 to S4, and this priority information is stored in the status monitoring daemon 11 of each of the servers S1 to S4. When the servers differ in performance, faster servers should be given higher priority. As shown for example in FIG. 5, the priority information gives a cyclic ranking of the servers S1 to S4; here, starting from rank 1 (the top rank), the order is S1 → S2 → S3 → S4 → S1 → S2 → .... The server that is currently the master in the system (its identification information) is also stored in the status monitoring daemon 11 of each of the servers S1 to S4. At initial startup, the rank-1 server (the server S1 in the example of FIG. 5) is stored as the master.

[0035] The priority information may instead consist of only the four entries S1, S2, S3 and S4, with the order itself rotated dynamically, for example from S1 → S2 → S3 → S4 to S2 → S3 → S4 → S1 and then to S3 → S4 → S1 → S2. In that case the priority information itself indicates which server is currently the master. Alternatively, the priority information may consist of only the four entries S1, S2, S3 and S4 in a fixed order, with a pointer that indicates the master's position moved cyclically over the priority information.
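The two alternative representations of the priority information can be compared in a few lines. This is only an illustration; the variable names are hypothetical, and `collections.deque.rotate` is used as one convenient way to rotate the order.

```python
from collections import deque

# Variant 1: the order itself rotates, so the head is always the master.
order = deque(["S1", "S2", "S3", "S4"])
order.rotate(-1)  # master S1 fails; the order becomes S2, S3, S4, S1
rotating_master = order[0]

# Variant 2: the order is fixed and a pointer marks the master's position.
fixed = ["S1", "S2", "S3", "S4"]
pointer = 0
pointer = (pointer + 1) % len(fixed)  # advance cyclically to S2
pointer_master = fixed[pointer]
```

Both variants name the same master after a switch; the pointer form leaves the list untouched, which may be simpler if the list is shared configuration.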

[0036] Now, the status monitoring daemon 11 of each of the servers S1 to S4 determines the partner servers for data reception and transmission as follows. At initial startup, the rank-1 server determined by the priority information shown in FIG. 5, namely the server S1, becomes the master server. Meanwhile, (the status monitoring daemon 11 of) each of the servers S2 to S4, which are not the master server, follows the procedure shown in the flowchart of FIG. 6: starting from the server that has the highest priority at that time (the server S1) and moving toward lower priority, it communicates in turn with (the status monitoring daemon 11 of) each server other than itself and looks for a server that replies that it is the master (steps 61 to 64). Here, (the status monitoring daemon 11 of) each of the servers S2 to S4 receives from the server S1 the reply that it is the master.

[0037] When the status monitoring daemon 11 of each of the servers S2 to S4 finds a server (S1) that replies that it is the master (step 62), it stores that server (S1) as the master server and notifies the data reception daemon 12 in its own server (step 65).

[0038] In contrast, when no master server is found even after all servers have been searched (steps 62 and 63), for example because a failure has occurred in the server S1, a server becomes the master server itself (step 67) if it has the highest priority among the servers in which no failure has occurred (step 66). Here, it is assumed that at initial startup no failure has occurred in the server S1, which has the highest priority, and that the server S1 becomes the master server as shown in FIG. 8.
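The master search in paragraphs [0036] to [0038] can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not the implementation described in the patent: the function name `find_master`, the reply strings, and the `ask` callback that stands in for communication between status monitoring daemons are all hypothetical. Returning `None` when a server neither finds a master nor has the highest priority is also an assumption, made on the basis that the monitoring operation is repeated periodically.

```python
from typing import Callable, Optional, Sequence

# Hypothetical reply values from a peer's status monitoring daemon.
MASTER, SLAVE, DOWN = "master", "slave", "down"


def find_master(me: str, priority: Sequence[str],
                ask: Callable[[str], str]) -> Optional[str]:
    """Return the master this server should record, itself if it takes over,
    or None if a higher-priority failure-free server is expected to take over."""
    healthy = {me}
    for peer in priority:              # highest priority first (steps 61 to 64)
        if peer == me:
            continue
        reply = ask(peer)
        if reply == MASTER:
            return peer                # steps 62 and 65: master found
        if reply != DOWN:
            healthy.add(peer)
    # No master found (steps 62, 63): take over only if this server has the
    # highest priority among the failure-free servers (steps 66, 67).
    best = next(s for s in priority if s in healthy)
    return me if best == me else None
```

With S1 and S2 down, a call for S3 returns `"S3"` (it takes over), while a call for S4 returns `None` because S3 outranks it.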

[0039] In this case, the status monitoring daemon 11 of the server S1 that has become the master (the master server) follows the procedure shown in the flowchart of FIG. 7: it communicates in order of decreasing priority with the server one rank below itself (S2), the server two ranks below (S3), and the server three ranks below (S4), and searches for all servers in which no failure has occurred (steps 71 to 75).

[0040] If no failure-free server is found even after searching as far as the rank (rank 4 in FIG. 5) immediately before the master's own rank one cycle later on the priority information (rank 5 in the example of FIG. 5) (steps 74 and 76), the status monitoring daemon 11 of the master server S1 instructs the data transmission daemon 13 of its own server to stop operating (step 77). On the other hand, if a failure-free server is found (step 72), the status monitoring daemon 11 of the master server S1 notifies its own data transmission daemon 13 of that server as a data transmission destination (step 73).

[0041] In this way, the master server S1 detects all servers in which no failure has occurred, and its data transmission daemon 13 can send (copy) data individually to every one of those servers using the one-to-n communication method described above.

[0042] Here, as shown in FIG. 8, suppose that among the servers other than the master server S1, that is, among the slave servers S2, S3, and S4, a failure has occurred in the server S2 (failed servers are marked with a cross). Then (the data transmission daemon 13 of) the master server S1 sends (copies) data individually only to the slave servers S3 and S4, as indicated by reference numerals 81 and 82 in FIG. 8. The operation of the status monitoring daemon 11 described above is performed periodically on every server, including the master server.
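The master-side search of paragraphs [0039] to [0042] can be illustrated with a short Python sketch. The name `transmission_targets` and the `is_alive` callback (standing in for a probe of the peer's status monitoring daemon) are assumptions made for illustration; the patent does not specify an interface.

```python
from typing import Callable, List, Sequence


def transmission_targets(master: str, priority: Sequence[str],
                         is_alive: Callable[[str], bool]) -> List[str]:
    """List every failure-free server below the master, in priority order."""
    n = len(priority)
    start = priority.index(master)
    targets = []
    # Walk one rank below, two ranks below, and so on, stopping just before
    # the master's own position one cycle later (steps 71 to 75).
    for offset in range(1, n):
        peer = priority[(start + offset) % n]
        if is_alive(peer):
            targets.append(peer)       # steps 72, 73: register a destination
    return targets                     # empty list: stop sending (step 77)
```

For the situation of FIG. 8 (master S1, S2 failed), the sketch yields S3 and S4 as the one-to-n destinations; an empty result corresponds to stopping the data transmission daemon.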

[0043] (Operation when a failure occurs) Next, the operation when a failure occurs in a server in this state will be described. First, suppose that a failure has occurred in the master server S1, as indicated by reference numeral 90 in FIG. 9. The failure of the master server S1 is detected by the other normal servers, that is, the slave servers S3 and S4, in accordance with the algorithm shown in the flowchart of FIG. 6. In this case, of the failure-free servers S3 and S4, the server S3 (rank 3), whose priority at that time is the next highest after the master server S1, that is, the one with the higher priority, newly becomes the master server in response to detecting the failure of the master server S1, as shown in FIG. 9. Then, as is clear from the priority information in FIG. 5, the priorities of the servers S1 to S4 become, from highest to lowest, S3 (rank 3) → S4 (rank 4) → S1 (rank 5) → S2 (rank 6).

[0044] The status monitoring daemon 11 of the server S3, which has newly become the master, communicates in order of decreasing priority with the server one rank below itself (S4), the server two ranks below (S1), and the server three ranks below (S2), as indicated by reference numerals 91, 92, and 93 in FIG. 9, and searches for all servers in which no failure has occurred. In the example of FIG. 9, only the server S4 is detected as a failure-free server, so data is copied from the new master server S3 to that server (slave server) S4.
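The rank shift after failover in paragraph [0043] can be reproduced with a small Python sketch. It assumes that the priority information of FIG. 5 is read cyclically, so that ranks continue past the last server (S1 reappears at rank 5); the function names `ranked_order` and `next_master` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def ranked_order(priority: Sequence[str], master: str) -> List[Tuple[str, int]]:
    """Return (server, rank) pairs starting at the master's rank, reading the
    priority information cyclically."""
    n = len(priority)
    start = priority.index(master)
    return [(priority[(start + k) % n], start + k + 1) for k in range(n)]


def next_master(priority: Sequence[str], failed_master: str,
                alive: Set[str]) -> str:
    """Pick the failure-free server ranked closest below the failed master."""
    for server, _rank in ranked_order(priority, failed_master)[1:]:
        if server in alive:
            return server
    raise RuntimeError("no failure-free server remains")
```

With S1 failed and only S3 and S4 alive, `next_master` selects S3, and `ranked_order(..., "S3")` gives S3 (3), S4 (4), S1 (5), S2 (6), matching the order stated in the text.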

[0045] In contrast, if a failure occurs in either of the slave servers S3 and S4, that is, the backup servers S3 and S4, in the state of FIG. 8, the status monitoring daemon 11 of the master server S1 stops transmission from the data transmission daemon 13 of its own server to the failed server.

[0046] (Operation at recovery) Next, the operation when a server that had stopped due to a failure recovers will be described. Suppose that the server S1, which had stopped due to a failure as shown in FIG. 9, has recovered as indicated by reference numeral 100 in FIG. 10.

[0047] The status monitoring daemon 11 of the master server S3 then detects, through its periodic monitoring operation, that the server S1 has recovered. In this case, the status monitoring daemon 11 of the master server S3 instructs the data transmission daemon 13 of its own server to add the server S1 as a data transmission destination.

[0048] Meanwhile, the status monitoring daemon 11 of the recovered server S1 designates the current master server S3 to the data reception daemon 12 of its own server as the server from which data is received (the data transmission source). Then all the data of the master server S3, from which the server S1 receives data, is copied to the server S1, as indicated by reference numeral 101 in FIG. 10, so that its data is brought into agreement with the data of the other servers in the system.

[0049] Thereafter, when any of the clients C1 to Cm in FIG. 1 writes data to a file held by the master server S3, the master server S3 copies that data to the servers (slave servers) S4 and S1 in turn, as indicated by reference numerals 102 and 103 in FIG. 10.

[0050] [Second Embodiment] In the first embodiment described above, backup by the one-to-n communication method is applied. However, the invention is not limited to this; for example, backup by the daisy-chain method can also be applied.

[0051] A high-availability computer system according to a second embodiment of the present invention, to which backup by the daisy-chain method is applied, will therefore be described with reference to the drawings. For convenience, FIGS. 1 and 3 are again referred to for the system configuration.

[0052] (Backup by the daisy-chain method) First, backup by the daisy-chain method will be described. Suppose that, as shown in FIG. 11, the server S1 is the master server and the servers S2 to S4 are slave servers. In this state, when any of the clients C1 to Cm in FIG. 1 writes data to a file held by the master server S1, that data is copied to, that is, backed up on, the slave servers S2 to S4. In this embodiment, the daisy-chain method is applied to this backup as follows.

[0053] Here, the data is first copied from (the data transmission daemon 13 of) the master server S1 to one of the slave servers S2 to S4, for example the server S2, as indicated by reference numeral 111 in FIG. 11. Next, the data is copied from (the data transmission daemon 13 of) that slave server S2 to another slave server, for example the slave server S3, as indicated by reference numeral 112 in FIG. 11. The data is then copied from (the data transmission daemon 13 of) that slave server S3 to the remaining slave server S4, as indicated by reference numeral 113 in FIG. 11.

[0054] Thus, in the daisy-chain backup method, data is copied from the master server to one of the slave servers, then from that slave server to another slave server, and so on. Starting from the master server, the copy is repeated in relay fashion until the data has been copied to all the slave servers.
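The difference between the two backup methods can be made concrete by listing the (sender, receiver) pairs each one produces. The following Python sketch is illustrative only; the function names and the `Hop` type are assumptions, not terms from the patent.

```python
from typing import List, Sequence, Tuple

Hop = Tuple[str, str]  # (sender, receiver)


def one_to_n_hops(master: str, slaves: Sequence[str]) -> List[Hop]:
    """One-to-n method: the master itself sends to every slave."""
    return [(master, s) for s in slaves]


def daisy_chain_hops(master: str, slaves: Sequence[str]) -> List[Hop]:
    """Daisy-chain method: each server relays to the next one in the chain."""
    chain = [master, *slaves]
    return list(zip(chain, chain[1:]))
```

For master S1 and slaves S2 to S4, the daisy-chain version yields S1→S2, S2→S3, S3→S4 (copies 111 to 113 in FIG. 11), so the master performs one transfer instead of three.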

[0055] (Basic operation of the master server and slave servers) Next, the basic operation of the master server and the slave servers will be described. As in the preceding embodiment, the priority with which each of the servers S1 to S4 becomes the master is determined in advance and stored in the status monitoring daemon 11 of each of the servers S1 to S4, and (the identification information of) the server that is currently the master in the system is also stored in the status monitoring daemon 11 of each of the servers S1 to S4.

[0056] The status monitoring daemon 11 of each of the servers S1 to S4 determines the partner servers with which it exchanges data as follows. At initial startup, the server of rank 1 in the priority information shown in FIG. 5, namely the server S1, becomes the master server. Meanwhile, (the status monitoring daemon 11 of) each of the non-master servers S2 to S4 follows the procedure shown in the flowchart of FIG. 12: starting from the server one rank above itself and proceeding to the server two ranks above, the server three ranks above, and so on in order of increasing priority up to the master server S1, it communicates with each server in turn and searches for one server in which no failure has occurred (steps 121 to 124).

[0057] When (the status monitoring daemon 11 of) each of the servers S2 to S4 finds one failure-free server (step 122), it ends the above communication and notifies the data reception daemon 12 of its own server of that server (step 125).

[0058] If (the status monitoring daemon 11 of) any of the servers S2 to S4 detects that a failure has occurred in the master server (S1) before finding a failure-free server (steps 122 and 123), the server that detected the failure immediately becomes the master server (step 126).

[0059] After the above communication, both the master server (S1) and the slave servers (S2 to S4) follow the procedure shown in the flowchart of FIG. 13 (through the status monitoring daemon 11 of each server): starting from the server one rank below itself and proceeding to the server two ranks below, the server three ranks below, and so on in order of decreasing priority down to the server that is currently lowest in rank, each server communicates with these servers in turn and searches for one server in which no failure has occurred (steps 131 to 134). On the priority information of FIG. 5, the server one rank below the rank-4 server S4, that is, the rank-5 server, is the server S1. However, because the server S1 is currently the rank-1 master server, rank 5 is the master server S1's own rank one cycle later, and the server S4 at the rank immediately before it (rank 4) is therefore the lowest-ranked server at present. For this reason, the server S4 does not perform this communication.

[0060] If a server (S1 to S3) finds no failure-free server even after searching from the server one rank below itself as far as the server (S4) immediately before the current master server (S1) on the priority information, that is, as far as the rank (rank 4) immediately before the current master server's own rank one cycle later (rank 5) (steps 133 and 135), it instructs the data transmission daemon 13 of its own server to stop operating (step 136).

[0061] On the other hand, when a server (S1 to S3) finds one failure-free server (step 132), it ends the above communication and notifies the data transmission daemon 13 of its own server of that server (step 137).

[0062] As a result, if a failure has occurred in the server S2 as shown in FIG. 14 (failed servers are marked with a cross), S3 is set in the data transmission daemon 13 of the master server S1; S1 is set in the data reception daemon 12 of the server S3 and S4 is set in its data transmission daemon 13; and S3 is set in the data reception daemon 12 of the server S4.
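The outcome of the upward and downward searches in paragraphs [0056] to [0062] can be summarized in a Python sketch. Note the simplification: in the patent each server probes its neighbours independently, whereas this sketch computes the result from a global view of which servers are alive. The name `chain_links` and the `Link` tuple are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

Link = Tuple[Optional[str], Optional[str]]  # (receive_from, send_to)


def chain_links(order: Sequence[str], alive: Set[str]) -> Dict[str, Link]:
    """Map each failure-free server to its nearest failure-free neighbour
    above (data source) and below (data destination) in priority order.
    `order` lists the servers with the current master first."""
    live = [s for s in order if s in alive]
    links: Dict[str, Link] = {}
    for i, server in enumerate(live):
        upstream = live[i - 1] if i > 0 else None              # master has none
        downstream = live[i + 1] if i + 1 < len(live) else None  # lowest has none
        links[server] = (upstream, downstream)
    return links
```

For the state of FIG. 14 (S2 failed), the sketch gives S1 sending to S3, S3 receiving from S1 and sending to S4, and S4 receiving from S3, which matches the settings listed in the text.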

[0063] Thereafter, when data is changed on the master server S1 in the state of FIG. 14, the data is first sent from (the data transmission daemon 13 of) the master server S1 to (the data reception daemon 12 of) the slave server S3 and copied, as indicated by reference numeral 141. The data is then sent from (the data transmission daemon 13 of) the slave server S3 to (the data reception daemon 12 of) the slave server S4 and copied, as indicated by reference numeral 142 in FIG. 14.

[0064] Thus, in this embodiment, data is sent from the master server to all failure-free slave servers by the daisy-chain (relay) method, passing through the slave servers in the order of their priority for becoming the master.

[0065] (Operation when a failure occurs) Next, the operation when a failure occurs in a server in this state will be described. First, suppose that a failure has occurred in the master server S1, as indicated by reference numeral 150 in FIG. 15. In accordance with the algorithm shown in the flowchart of FIG. 12, the failure of the master server S1 is detected first by the server S3 (rank 3), which, of the other normal slave servers S3 and S4, has the next highest priority after the master server S1 at that time, that is, the one with the higher priority. In this case, the server S3 newly becomes the master server, as shown in FIG. 15, and takes over the processing of the server S1, which had been the master until then. The server S1 had been set in (the data reception daemon 12 of) this new master server S3 as the server from which data is received (the data transmission source), but that setting is cleared. Here, data is copied only from the server S3 to the server S4, as indicated by reference numeral 151 in FIG. 15.

[0066] In contrast, when a failure occurs in a slave server (backup server), the following takes place. Let A be the lowest-priority server among the servers with higher priority than the failed slave server, and let B be the highest-priority server among the servers with lower priority than the failed slave server. The data transmission destination of A is changed to B, and conversely the data reception source (data transmission source) of B is changed to A. If B is the master server, the data transmission daemon 13 of A stops.

[0067] Accordingly, if a failure occurs in the slave server S3 in the state of FIG. 14, for example as indicated by reference numeral 160 in FIG. 16, the data transmission destination of the master server S1 is changed from S3 to S4, and the data reception source (data transmission source) of the slave server S4 is changed from S3 to S1. In this case, data is copied only from the master server S1 to the slave server S4, as indicated by reference numeral 161 in FIG. 16.
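The rule for servers A and B in paragraphs [0066] and [0067] can be sketched in Python as an in-place update of two tables. The tables `send_to` and `recv_from`, the function name, and the use of `None` to mean "daemon stopped or not set" are assumptions for illustration. The sketch assumes the failed server is a slave and the master is alive.

```python
from typing import Dict, Optional, Sequence, Set


def rewire_after_slave_failure(order: Sequence[str], alive: Set[str],
                               send_to: Dict[str, Optional[str]],
                               recv_from: Dict[str, Optional[str]],
                               failed: str) -> None:
    """Update the tables in place. `order` lists servers master first;
    `alive` holds the failure-free servers and must not include `failed`."""
    i = order.index(failed)
    above = [s for s in order[:i] if s in alive]
    below = [s for s in order[i + 1:] if s in alive]
    a = above[-1]                      # server A; the master is assumed alive
    send_to.pop(failed, None)
    recv_from.pop(failed, None)
    if below:
        b = below[0]                   # server B
        send_to[a] = b
        recv_from[b] = a
    else:
        # Going further down would wrap around to the master, so B would be
        # the master: A's data transmission daemon stops.
        send_to[a] = None
```

Applied to FIG. 14 with S3 failing, the master S1 is rewired to send to S4 and S4 to receive from S1, as in FIG. 16.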

[0068] (Operation at recovery) Next, the operation when a server that had stopped due to a failure recovers will be described. In this case, the data transmission destination of the lowest-priority server among the servers with higher priority than the recovered server is changed to the recovered server. In addition, if the highest-priority server among the servers with lower priority than the recovered server is not the master server, the data reception source (transmission source) of that server is changed to the recovered server.

[0069] Accordingly, if the server S1, which had stopped due to a failure as shown in FIG. 15, recovers (as a slave server) as indicated by reference numeral 170 in FIG. 17, the recovered server S1 is newly set as the data transmission destination of the server S4, which is the lowest-priority server among the servers with higher priority than the recovered server S1. Here, no normal server with lower priority than the recovered server S1 exists, so there is no server whose data reception source (transmission source) is changed. The data reception source of the recovered server S1 is set to the server S4. Then all the data of the server S4, from which the server S1 receives data, is copied to the server S1, as indicated by reference numeral 171 in FIG. 17, so that its data is brought into agreement with the data of the other servers in the system.

[0070] Thereafter, when data is changed on the master server S3 in the state of FIG. 17, the data backup proceeds as follows. First, the changed data is copied from the master server S3 to the slave server S4, as indicated by reference numeral 172 in FIG. 17. Next, the data is copied from the slave server S4 to the recovered slave server S1, as indicated by reference numeral 173 in FIG. 17. The above has described the operation at recovery of a server that had stopped due to a failure, taking as an example the case where no normal server with lower priority than the recovered server exists.

[0071] Next, a specific example of the operation at server recovery when a normal server with lower priority than the recovered server exists will be described.

[0072] Suppose that the server S2, which had stopped due to a failure in the state shown in FIG. 14, has recovered (as a slave server) as indicated by reference numeral 180 in FIG. 18. In this case, the lowest-priority server among the servers with higher priority than the recovered server S2 is the master server S1, so the data transmission destination of the server S1 is changed from the slave server S3 to the recovered slave server S2. In addition, the highest-priority server among the servers with lower priority than the recovered server S2, namely the server S3, is not the master server, so the data reception source (transmission source) of the server S3 is changed from the master server S1 to the recovered server S2. The data reception source of the recovered server S2 is set to the server S1, and its data transmission destination is set to the server S3. Then all the data of the server S1 (here, the master server S1), from which the server S2 receives data, is copied to the server S2, as indicated by reference numeral 181 in FIG. 18, so that its data is brought into agreement with the data of the other servers in the system.

[0073] Thereafter, when data is changed on the master server S1 in the state of FIG. 18, the data backup proceeds as follows. First, the changed data is copied from the master server S1 to the recovered slave server S2, as indicated by reference numeral 182 in FIG. 18. Next, the data is copied from the slave server S2 to the slave server S3, as indicated by reference numeral 183 in FIG. 18. The data is then copied from the slave server S3 to the slave server S4, as indicated by reference numeral 184 in FIG. 18.
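The recovery rule of paragraphs [0068] to [0073] can be sketched in Python in the same style. As before, the tables `send_to` and `recv_from`, the function name, and the return value (the server from which the full copy is taken) are hypothetical, and the sketch assumes the recovered server rejoins as a slave while the master is alive.

```python
from typing import Dict, Optional, Sequence, Set


def rewire_after_recovery(order: Sequence[str], alive: Set[str],
                          send_to: Dict[str, Optional[str]],
                          recv_from: Dict[str, Optional[str]],
                          recovered: str) -> str:
    """Splice `recovered` back into the chain and return the server whose
    full data set should be copied to it. `order` lists servers with the
    current master first; `alive` excludes `recovered`."""
    i = order.index(recovered)
    above = [s for s in order[:i] if s in alive]
    below = [s for s in order[i + 1:] if s in alive]
    a = above[-1]                      # lowest-priority server above
    b = below[0] if below else None    # None: next one down would be the master
    send_to[a] = recovered
    recv_from[recovered] = a
    send_to[recovered] = b
    if b is not None:
        recv_from[b] = recovered
    return a
```

For FIG. 18 (S2 rejoining under master S1) this produces the chain S1→S2→S3→S4 with the full copy taken from S1; for FIG. 17 (S1 rejoining under master S3) it appends S1 after S4 with the full copy taken from S4.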

[0074] [Third Embodiment] Next, a high-availability computer system according to a third embodiment of the present invention will be described with reference to the block diagram of FIG. 19. In FIG. 19, a plurality of servers S11 to S14 are connected to a LAN (local area network) 21, which is a high-speed network. Servers S14 to S17 are connected to a WAN (wide area network) 22, which is a low-speed network. In this embodiment, the server S14 is provided to interconnect the LAN 21 and the WAN 22 and, as described later, handles the copying (backup) of data from the LAN 21 side to the WAN 22 side and from the WAN 22 side to the LAN 21 side.

[0075] The system of FIG. 19 is characterized in that the daisy-chain method is applied to backup among the servers S11 to S14 connected by the LAN 21, the one-to-n communication method is applied to backup among the servers S14 to S17 connected by the WAN 22, and the server S14 serves as an interface that allows the two methods to coexist. Here, it is assumed that at least on the servers S11 to S13 on the LAN 21, a predetermined application program runs and performs its own processing even when the server is a slave server.

[0076] The advantage of the daisy-chain method over the one-to-n communication method is that the load on the master server can be kept low. Conversely, the advantage of the one-to-n communication method is that the information on the slave servers (backup servers) is brought into agreement quickly. Therefore, among the servers connected to the high-speed LAN 21 on which applications are actually running (the servers S11 to S13), the daisy-chain method is applied to backup so that the system operates without hindering the applications.

[0077] However, among the servers connected to the WAN 22, which is a low-speed network (the servers S15 to S17), using the daisy-chain method for backup would lengthen the time required to copy data, delay the point at which the data held by the servers comes into agreement, and increase the degree of data inconsistency among the servers.

[0078] Therefore, the dedicated server S14, which handles the copying of data from the LAN 21 side to the WAN 22 side and from the WAN 22 side to the LAN 21 side, is placed between the LAN 21 and the WAN 22. When data needs to be copied from the LAN 21 side to the servers on the WAN 22 side (S15 to S17), the server S14 receives the data and copies it to each server on the WAN 22 (S15 to S17) by the one-to-n communication method. When data needs to be copied from the WAN 22 side to the servers on the LAN 21 side (S11 to S13), the server S14 receives the data and copies it to the highest-priority server on the LAN 21 (among S11 to S13). The data is then copied in turn from that server to the other servers on the LAN 21 by the daisy-chain method in accordance with the priority.

[0079] Next, a specific example of data copying (data backup) in the system of FIG. 19 will be described. Here, the priority order is S11 → S12 → S13 → S14 → S15 → S16 → S17, and the server S11 is the master server.

[0080] First, data is copied from the master server S11 to the server (slave server) S12 via the LAN 21, as indicated by reference numeral 191 in FIG. 19. Next, the data is copied from the server S12 to the server S13 via the LAN 21, as indicated by reference numeral 192 in FIG. 19. The data is then copied from the server S13 to the server S14 via the LAN 21, as indicated by reference numeral 193 in FIG. 19. Using the one-to-n communication method, the server S14 copies the data from the server S13 in turn to the other servers S15, S16, and S17 on the WAN 22, as indicated by reference numerals 194, 195, and 196 in FIG. 19. If a server with lower priority than the server S14 exists on the LAN 21, the server S14 (acting as a server on the LAN 21, to which daisy-chain backup is applied) also copies the data to the highest-priority server among them.

[0081] Copying data from the WAN 22 side to the LAN 21 side is the same as above except that the direction of the data is reversed. A specific example of copying data from the WAN 22 side to the LAN 21 side will now be described for the case shown in FIG. 20, where the priority order is S15 → S16 → S17 → S11 → S12 → S13 → S14 and the server S15 is the master server.

[0082] First, the same data is copied in turn via the WAN 22 by the one-to-n communication method from the master server S15 on the WAN 22 to the other servers (slave servers) S16, S17, and S14 on the WAN 22, as indicated by reference numerals 201, 202, and 203 in FIG. 20. Next, the data is copied from the server S14 to the server S11, which has the highest priority on the LAN 21 (excluding the server S14), as indicated by reference numeral 204 in FIG. 20. The data is then copied from the server S11 to the server S12 (next in priority) via the LAN 21, as indicated by reference numeral 205 in FIG. 20, and from the server S12 to the server S13 (next in priority) via the LAN 21, as indicated by reference numeral 206 in FIG. 20.

[0083] The data could then be copied from server S13 to server S14 (the next in priority), but in this embodiment no copy to server S14 is made, because a server on the WAN 22 side is the master. The reason is that, when a server on the WAN 22 side is the master, the data reached the servers on LAN 21 by being copied from server S14 and therefore already exists on server S14. When server S14 itself is the master, server S14 copies the data to the highest-priority server on LAN 21 other than itself (here, server S11) and also copies the data in turn to the other servers S15 to S17 on WAN 22.
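The copy routes described for FIG. 19 and FIG. 20 can be traced with a short sketch. The Python below is illustrative only and is not part of the disclosed embodiment. The function name `plan_copies`, its parameters, and the list-of-pairs result are hypothetical. It assumes that every server is fault-free, that `priority` lists the servers in their current order with the master first, and that the order for the FIG. 19 case is S11 through S17, which is inferred from the copy sequence described above.

```python
def plan_copies(priority, lan, wan, gateway, master):
    """Return (source, destination) copy hops in the order they occur.

    priority: server names, highest priority first (assumption: master first,
              all servers fault-free)
    lan:      servers attached only to the high-speed LAN (daisy chain)
    wan:      servers attached only to the low-speed WAN (1-to-n)
    gateway:  the server attached to both networks (S14 in the text)
    """
    rank = {name: i for i, name in enumerate(priority)}
    by_rank = lambda s: rank[s]
    lan_sorted = sorted(lan, key=by_rank)
    wan_sorted = sorted(wan, key=by_rank)
    hops = []

    if master in wan:
        # 1-to-n on the WAN: the master sends to every other WAN server
        # and to the gateway, in priority order.
        targets = sorted([s for s in wan if s != master] + [gateway], key=by_rank)
        hops += [(master, t) for t in targets]
        # The gateway then starts the LAN daisy chain; nothing is copied
        # back to the gateway because it already holds the data.
        prev = gateway
        for nxt in lan_sorted:
            hops.append((prev, nxt))
            prev = nxt
    elif master == gateway:
        # The gateway as master feeds the LAN chain and the WAN servers.
        prev = gateway
        for nxt in lan_sorted:
            hops.append((prev, nxt))
            prev = nxt
        hops += [(gateway, t) for t in wan_sorted]
    else:
        # Master on the LAN: daisy chain through lower-priority LAN servers;
        # when the chain reaches the gateway, it fans out over the WAN.
        lower = [s for s in sorted(list(lan) + [gateway], key=by_rank)
                 if rank[s] > rank[master]]
        prev = master
        for nxt in lower:
            hops.append((prev, nxt))
            if nxt == gateway:
                hops += [(gateway, t) for t in wan_sorted]
            prev = nxt
    return hops


if __name__ == "__main__":
    # FIG. 20 case: S15 on the WAN is the master.
    order = ["S15", "S16", "S17", "S11", "S12", "S13", "S14"]
    for src, dst in plan_copies(order, ["S11", "S12", "S13"],
                                ["S15", "S16", "S17"], "S14", "S15"):
        print(src, "->", dst)
```

In this sketch the six hops produced for each case correspond one-to-one to reference numerals 191 to 196 and 201 to 206. Fault detection and the periodic search operations are deliberately left out.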

[0084] As described above, this embodiment addresses a system in which servers connected by a high-speed LAN coexist with servers connected by a low-speed WAN. A dedicated server is provided to handle data copying between the LAN and the WAN, the daisy-chain method is applied on the high-speed LAN, and the 1-to-n communication method is applied on the low-speed WAN. This makes it possible to build a system that adapts flexibly to the network configuration.

[0085]

[Effects of the Invention] As described in detail above, according to the present invention, a plurality of backup server computers are secured by using three or more server computers, and data is copied effectively to those backup server computers. This makes it possible to build a high-availability computer system that is more tolerant of failures and, furthermore, more tolerant of load.

[0086] Furthermore, according to the present invention, a plurality of server computers are linked by a high-speed network and a low-speed network, and a data backup method suited to each network is used in combination. This achieves effective data backup that adapts flexibly to the network configuration and makes it possible to build a more efficient high-availability computer system.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a high-availability computer system according to a first embodiment of the present invention.

FIG. 2 is a diagram for explaining service takeover when a failure occurs in the master server in the same embodiment.

FIG. 3 is a block diagram showing the internal configuration of each server in the same embodiment.

FIG. 4 is a diagram for explaining data backup by the 1-to-n communication method applied in the same embodiment.

FIG. 5 is a diagram for explaining the priority order for becoming the master that is applied in the same embodiment.

FIG. 6 is a flowchart for explaining the operation procedure of the slave server (specifically, its status monitoring daemon 11) in the same embodiment.

FIG. 7 is a flowchart for explaining the operation procedure of the master server (specifically, its status monitoring daemon 11) in the same embodiment.

FIG. 8 is a diagram for explaining the basic operation of the master server S1 in the system of FIG. 1, taking as an example the case where a failure has occurred in server S2.

FIG. 9 is a diagram for explaining the operation when a failure occurs in the master server S1 in the state of FIG. 8 and server S3 becomes the new master server.

FIG. 10 is a diagram for explaining the operation when server S1 recovers in the state of FIG. 9.

FIG. 11 is a diagram for explaining data backup by the daisy-chain method applied in a high-availability computer system according to a second embodiment of the present invention.

FIG. 12 is a flowchart for explaining the operation procedure of the slave server (specifically, its status monitoring daemon 11) in the second embodiment.

FIG. 13 is a flowchart for explaining the operation procedure of the master server and the slave server (specifically, their status monitoring daemons 11) in the second embodiment.

FIG. 14 is a diagram for explaining the basic operation of each server in the second embodiment, taking as an example the case where a failure has occurred in server S2.

FIG. 15 is a diagram for explaining the operation when a failure occurs in the master server S1 in the state of FIG. 14 and server S3 becomes the new master server.

FIG. 16 is a diagram for explaining the operation when a failure occurs in the slave server S3 in the state of FIG. 14.

FIG. 17 is a diagram for explaining the operation when server S1 recovers in the state of FIG. 15.

FIG. 18 is a diagram for explaining the operation when server S2 recovers in the state of FIG. 14.

FIG. 19 is a diagram for explaining data backup that combines the 1-to-n communication method and the daisy-chain method, as applied in a high-availability computer system according to a third embodiment of the present invention, taking as an example the case where server S11 on LAN 21 is the master.

FIG. 20 is a diagram for explaining data backup in the third embodiment when server S15 on WAN 22 is the master.

[Explanation of symbols]

S1 to S4: server (server computer)
S11 to S13: server (first server computer)
S14: server (third server computer)
S15 to S17: server (second server computer)
N: network
11: status monitoring daemon (master search means, master setting means, server computer search means, first server computer search means, second server computer search means, data transmission destination setting means)
12: data reception daemon
13: data transmission daemon (copy means, first copy means, second copy means, third copy means, fourth copy means)
21: LAN (local area network, first network)
22: WAN (wide area network, second network)

Claims

1. A high-availability computer system comprising at least three server computers connected via a network, in which one of the server computers serves as a master server computer and provides services to client computers and in which, when a failure occurs in the master server computer, one of the remaining server computers takes over processing as a new master server computer in accordance with priority information that indicates, for all the computers in the system, the order of priority for becoming the master, the priorities being applied each time the master is switched, wherein each of the server computers comprises:
master search means for periodically executing, when its own computer is not the master server computer, a master search operation of searching for the master server computer;
master setting means for newly setting its own computer as the master server computer when the master search means cannot find the master server computer and its own computer has the highest priority among the server computers in which no failure has occurred;
server computer search means for periodically executing, when its own computer is the master server computer, a server computer search operation of searching for server computers having a failure and server computers having no failure; and
copy means for individually copying, when its own computer is the master server computer and data of a file held by its own computer has been changed by a client computer, the changed data to all the failure-free server computers found by the server computer search means.

2. The high-availability computer system according to claim 1, wherein each of the server computers further comprises data transmission destination setting means for setting, when its own computer is the master server computer, the failure-free server computers found by the server computer search means as data transmission destinations, and wherein the copy means executes the copying of the data sequentially for all the server computers set as data transmission destinations by the data transmission destination setting means.

3. A high-availability computer system comprising at least three server computers connected via a network, in which one of the server computers serves as a master server computer and provides services to client computers and in which, when a failure occurs in the master server computer, one of the remaining server computers takes over processing as a new master server computer in accordance with priority information that indicates, for all the computers in the system, the order of priority for becoming the master, the priorities being applied each time the master is switched, wherein each of the server computers comprises:
first server computer search means for periodically executing, when its own computer is not the master server computer, a first search operation of searching for one failure-free server computer by communicating, in accordance with the priority information, with the server computers in ascending order of rank, starting from the server computer ranked one above its own computer;
master setting means for newly setting its own computer as the master server computer when the first server computer search means finds a server computer having a failure before finding a failure-free server computer and that server computer is the master server computer;
second server computer search means for periodically executing a second search operation of searching for one failure-free server computer by communicating, in accordance with the priority information, with the server computers in descending order of rank, starting from the server computer ranked one below its own computer;
first copy means for copying, when its own computer is the master server computer and data of a file held by its own computer has been changed by a client computer, the changed data to the failure-free server computer found by the second server computer search means; and
second copy means for copying, when data has been copied from another server computer, that data to the failure-free server computer found by the second server computer search means.

4. A high-availability computer system comprising a plurality of first server computers connected via a first network, a plurality of second server computers connected via a second network that is lower in speed than the first network, and a third server computer connected between the first network and the second network, in which one of the server computers serves as a master server computer and provides services to client computers and in which, when a failure occurs in the master server computer, one of the remaining server computers takes over processing as a new master server computer in accordance with priority information that indicates, for all the computers in the system, the order of priority for becoming the master, the priorities being applied cyclically each time the master is switched, wherein:
each first server computer comprises first copy means for copying, when its own computer is the master server computer and data of a file held by its own computer has been changed by a client computer, the changed data to the server computer having the highest priority among the failure-free server computers that are connected to the first network and have a lower priority than its own computer, and second copy means for copying, when data has been copied from another server computer, that data to the server computer having the highest priority among the failure-free server computers that are connected to the first network and have a lower priority than its own computer;
each second server computer comprises third copy means for individually copying, when its own computer is the master server computer and data of a file held by its own computer has been changed by a client computer, the changed data to all the failure-free server computers connected to the second network; and
the third server computer comprises fourth copy means for individually copying, when data has been copied from a first server computer, that data to all the failure-free second server computers on the second network, and for copying, when data has been copied from a second server computer, that data to the first server computer having the highest priority among the failure-free first server computers on the first network.

5. A data backup method in a high-availability computer system comprising at least three server computers connected via a network, in which one of the server computers serves as a master server computer and provides services to client computers and in which, when a failure occurs in the master server computer, one of the remaining server computers takes over processing as a new master server computer in accordance with priority information that indicates, for all the computers in the system, the order of priority for becoming the master, the priorities being applied each time the master is switched, the method comprising, in each server computer:
when its own computer is not the master server computer, periodically executing a master search operation of searching for the master server computer by communicating, in accordance with the priority information, with the server computers in descending order of priority, starting from the server computer having the highest priority at that time;
becoming the master server computer when the master server computer is not found in the master search operation and its own computer has the highest priority among the server computers in which no failure has occurred;
when its own computer is the master server computer, periodically executing a server computer search operation of searching for server computers having a failure and server computers having no failure by communicating, in accordance with the priority information, with the server computers in descending order of rank, starting from the server computer ranked one below its own computer; and
when data of a file held by its own computer has been changed by a client computer, individually copying the changed data to all the failure-free server computers found in the server computer search operation.