JPH11353292A

JPH11353292A - Cluster system and its fail over control method

Info

Publication number: JPH11353292A
Application number: JP10160479A
Authority: JP
Inventors: Hironobu Kobayashi; 弘伸小林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-06-09
Filing date: 1998-06-09
Publication date: 1999-12-24

Abstract

PROBLEM TO BE SOLVED: To provide a cluster system that executes an appropriate failover operation according to the operating state of the failover destination. SOLUTION: In this cluster system, two server computers 10a and 10b connected to a public LAN 1 are loosely coupled to each other by an interconnect LAN 2. The two computers 10a and 10b use the LAN 2 to communicate at prescribed intervals and confirm each other's normal operation. When one computer detects a failure of the other, it fails over the system resources that were running on the other computer to itself. In doing so, it does not mechanically fail over all the system resources of the failed computer. Instead, it changes the priorities of those system resources according to its own operating state, and it also changes the priorities of the system resources that were originally running on itself.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a cluster system composed of a plurality of loosely coupled computers and to a failover control method for that system.

[0002]

[Description of the related art] In recent years, the spread of computers has driven rapid computerization, and computer systems are being built in a wide range of industries. Demand for better fault tolerance in these computer systems grows stronger every year. The cluster system is one kind of system that provides this fault tolerance.

[0003] A cluster system consists of a plurality of loosely coupled computers that share, for example, a magnetic disk device. In addition to functioning as a distributed system that spreads the data processing load, it also functions as a fault-tolerant system: when any one of the computers fails, the system resources that were running on it (such as application programs, including utilities) are taken over and run on another computer. This takeover is called failover.

[0004] Because a cluster system achieves high performance and high reliability with a hardware configuration in which relatively low-performance computers are connected by a LAN (Local Area Network) or the like, it has recently attracted attention from a cost standpoint as well.

[0005] In a cluster system, the computers exchange communication at predetermined intervals to confirm each other's normal operation (this is called a heartbeat). When a computer cannot confirm the normal operation of its partner, that is, when it detects a failure of the partner, it takes over the system resources that were running on the partner computer and runs them itself, thereby achieving fault tolerance.

[0006]

[Problem to be solved by the invention] As described above, a cluster system in which a plurality of computers are loosely coupled and back each other up makes it possible to build a high-performance, highly reliable system at relatively low cost.

[0007] In a conventional cluster system, however, the only settings were the partner with which heartbeats are exchanged and which of that partner's system resources are to be failed over. Consequently, when the partner computer failed, all of the configured system resources were failed over regardless of the operating status of the computer taking them over, which sometimes adversely affected high-priority system resources that were originally running on that computer. Conversely, low-priority system resources originally running on that computer kept running unconditionally, so that in some cases a high-priority system resource that should have been failed over could not be started.

[0008] The present invention has been made in view of these circumstances, and its object is to provide a cluster system that performs appropriate failover according to the operating status of the failover destination, and a failover control method for that system.

[0009]

[Means for solving the problem] To achieve the above object, the present invention provides a cluster system in which a plurality of computers are connected via a network and, when any one of the computers fails, the system resources that were running on that computer are taken over and run on another computer. The cluster system comprises failover control means for controlling a change of priority, including stopping, of the system resources to be taken over according to the operating status of the other computer.

[0010] In the cluster system of the present invention, when one of the computers exchanging heartbeats fails, the system resources that were running on the failed computer and are configured to be failed over are not all failed over mechanically at their existing priority, as in conventional systems. Instead, their priority is changed according to the operating status of the failover destination computer (in some cases they are not started at all). As a result, high-priority system resources originally running on the failover destination computer are not adversely affected.

[0011] Further, in the cluster system of the present invention, when executing the takeover of the system resources, the failover control means controls a change of priority, including stopping, of the system resources originally running on the other computer according to the operating status of the other computer.

[0012] In the cluster system of the present invention, when system resources are failed over from a failed computer to the failover destination computer, the priority of the system resources originally running on the failover destination computer is changed according to that computer's operating status (in some cases they are stopped). This prevents a situation in which low-priority system resources originally running on the failover destination computer keep running unconditionally and thereby prevent a high-priority system resource that should be failed over from starting.

[0013]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of the cluster system according to this embodiment. As shown in FIG. 1, the cluster system has two server computers 10a and 10b that are connected to a public LAN 1 and loosely coupled by an interconnect LAN 2. The two server computers 10a and 10b share data through a shared disk 20, to which both are connected by SCSI/FC (SCSI/Fibre Channel) 3.

[0014] The cluster system of this embodiment is a distributed system in which data processing requested by client computers via the public LAN 1 is distributed between and executed by the server computers 10a and 10b. As system resources, during normal operation, resource group 1 runs on the server computer 10a (referred to here as node 1), while resource group 2 runs on the server computer 10b (referred to here as node 2).

[0015] The server computers 10a and 10b also use the interconnect LAN 2 to exchange, at predetermined intervals, the communication for confirming each other's normal operation (the heartbeat).
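The heartbeat check in this paragraph can be illustrated with a minimal Python sketch. The class name `HeartbeatMonitor`, the `missed_limit` tolerance, and the use of a monotonic clock are assumptions made for illustration. The patent states only that heartbeats are exchanged at predetermined intervals and that a failure is detected when normal operation cannot be confirmed.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks when the peer node was last heard from (hypothetical helper)."""

    interval_s: float        # assumed heartbeat period, in seconds
    missed_limit: int = 3    # assumed number of missed beats tolerated
    last_seen: float = field(default_factory=time.monotonic)

    def record_heartbeat(self, now=None):
        """Call whenever a heartbeat arrives from the peer."""
        self.last_seen = time.monotonic() if now is None else now

    def peer_failed(self, now=None):
        """Return True once the peer has been silent for too long."""
        now = time.monotonic() if now is None else now
        return (now - self.last_seen) > self.interval_s * self.missed_limit
```

In such a sketch, each node would call `record_heartbeat` on every message received over the interconnect LAN and poll `peer_failed` to decide when to begin failover.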

[0016] The execution of the heartbeat and of the failover described later is carried out by an operating system, or by a program running under that operating system, that is loaded from the shared disk 20 into the system memory of the server computers 10a and 10b and executed under the control of their CPUs.

[0017] Resource group 1 and resource group 2, which are executed on the server computers 10a and 10b, are likewise configured as programs (running under the operating system) that are loaded from files on the shared disk 20 into the system memory of the server computers 10a and 10b and executed under the control of their CPUs. The failover of every system resource belonging to either resource group 1 or resource group 2 is controlled by a resource table stored on the shared disk 20.

[0018] FIG. 2 shows an example of the resource table of this embodiment. As shown in FIG. 2, the resource table has two columns, "resource name" and "failover priority". The "failover priority" column specifies how each system resource listed in the "resource name" column is handled when a failover is executed.

[0019] For example, "network name" and "IP address" are set to "always execute" for failover handling. In this case, when the server computer on which "network name" or "IP address" is running fails, the failover is always executed at the existing priority, regardless of the operating status of the other server computer. "App 1" and "app 2" are set to "lower execution priority". In this case, when the server computer on which "app 1" or "app 2" is running fails, the failover is executed at the existing priority if the operating status of the other server computer does not exceed predetermined conditions, and at a lowered priority if it does. Finally, "app 3" is set to "do not fail over". In this case, when the server computer on which "app 3" is running fails, failover is never executed, regardless of the operating status of the other server computer.
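The resource table described here could be represented as a simple mapping. The sketch below is in Python, and the entries follow the example in the text. The names `FailoverPolicy`, `RESOURCE_TABLE`, and `resources_with_policy` are hypothetical, and the on-disk format of the real table is not specified in the source.

```python
from enum import Enum


class FailoverPolicy(Enum):
    """The three handling settings named in the resource table example."""

    ALWAYS = "always execute"
    LOWER_PRIORITY = "lower execution priority"
    NEVER = "do not fail over"


# Resource name -> failover handling, following the FIG. 2 example in the text.
RESOURCE_TABLE = {
    "network name": FailoverPolicy.ALWAYS,
    "IP address": FailoverPolicy.ALWAYS,
    "app 1": FailoverPolicy.LOWER_PRIORITY,
    "app 2": FailoverPolicy.LOWER_PRIORITY,
    "app 3": FailoverPolicy.NEVER,
}


def resources_with_policy(policy):
    """Return the sorted names of all resources configured with `policy`."""
    return sorted(name for name, p in RESOURCE_TABLE.items() if p is policy)
```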

[0020] The predetermined conditions used when failover handling is set to "lower execution priority" are managed by a condition table stored on the shared disk 20.

[0021] FIG. 3 shows an example of the condition table of this embodiment. As shown in FIG. 3, the condition table has two columns, "check item" and "set value". For each item listed in the "check item" column, the "set value" column gives the value that the failover destination server computer must satisfy.

[0022] Accordingly, "app 1" and "app 2" are failed over at their existing priority when the operating status of the failover destination server computer satisfies each of the swap file size, CPU usage rate, and memory usage items, and are failed over at a lowered priority when it does not.
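A check against the condition table might look like the following Python sketch. The three check items come from the text, but the numeric set values, the units, and the interpretation of each set value as an upper limit are assumptions, since the contents of FIG. 3 are not reproduced here.

```python
# Check item -> set value. All numbers and units below are hypothetical.
CONDITION_TABLE = {
    "swap_file_size_mb": 512,
    "cpu_usage_percent": 80,
    "memory_usage_mb": 1024,
}


def exceeds_conditions(status, limits=CONDITION_TABLE):
    """Return True if any measured item is above its set value.

    `status` maps each check item to the value currently measured on the
    failover destination. Treating the set values as upper limits is an
    assumption of this sketch.
    """
    return any(status[item] > limit for item, limit in limits.items())
```

With this reading, a resource set to "lower execution priority" keeps its priority only when `exceeds_conditions` returns `False` for the destination node.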

[0023] The operating procedure of the cluster system of this embodiment is now described with reference to FIGS. 4 and 5. FIG. 4 is a flowchart explaining the operating procedure for failing over the system resources that were running on a failed computer.

[0024] When any one of the plurality of computers fails, the cluster system of this embodiment executes the following processing once for each system resource that was running on the failed computer.

[0025] First, the cluster system refers to the resource table to obtain the priority of the system resource (step A1) and determines whether it is set to "always execute" (step A2). If it is (YES in step A2), the cluster system executes the failover of that system resource (step A3).

[0026] If it is not set to "always execute" (NO in step A2), the cluster system next determines whether it is set to "lower execution priority" (step A4). If it is (YES in step A4), the cluster system compares the operating status of the failover destination server computer with the condition table (step A5). If the conditions are not exceeded (NO in step A6), it executes the failover at the existing priority. If they are exceeded (YES in step A6), it lowers the priority and then executes the failover (step A7).

[0027] If it is not set to "lower execution priority" (NO in step A4), the cluster system determines that it is set to "do not fail over" and does not execute the failover.
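The decision flow of steps A1 to A7 can be summarized in a short Python sketch. The function name `plan_failover`, the string labels for the returned actions, and the boolean `destination_overloaded` (standing in for the condition table comparison of steps A5 and A6) are illustrative assumptions, not part of the patent.

```python
def plan_failover(policy, destination_overloaded):
    """Decide how one resource of the failed node is handled (FIG. 4 flow).

    `policy` is the setting read from the resource table in step A1.
    `destination_overloaded` is True when the failover destination exceeds
    the conditions in the condition table.
    """
    if policy == "always execute":                  # step A2
        return "fail over at existing priority"     # step A3
    if policy == "lower execution priority":        # step A4
        if destination_overloaded:                  # steps A5 and A6
            return "fail over at lowered priority"  # step A7
        return "fail over at existing priority"
    # Any other setting is treated as "do not fail over".
    return "do not fail over"
```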

[0028] FIG. 5 is a flowchart explaining the operating procedure for managing the system resources originally running on the failover destination computer. When a failover occurs, the cluster system of this embodiment executes the following processing once for each system resource originally running on the failover destination computer.

[0029] First, the cluster system refers to the resource table to obtain the priority of the system resource (step B1) and determines whether it is set to "always execute" (step B2). If it is (YES in step B2), execution of that system resource continues unchanged, with no processing applied to it.

[0030] If it is not set to "always execute" (NO in step B2), the cluster system compares the operating status of the failover destination server computer with the condition table (step B3). If the conditions are not exceeded (NO in step B4), execution of that system resource continues unchanged, with no processing applied to it. If the conditions are exceeded (YES in step B4), the cluster system further determines whether the resource is set to "lower execution priority" (step B5). If it is (YES in step B5), execution continues at a lowered priority (step B6). If it is not (NO in step B5), the cluster system determines that the resource is set to "do not fail over" and terminates execution of that system resource (step B7).
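Steps B1 to B7, which apply to resources already running on the failover destination, can be sketched in Python in the same way. The function name `plan_local_resource`, the action labels, and the boolean `destination_overloaded` are again illustrative assumptions.

```python
def plan_local_resource(policy, destination_overloaded):
    """Decide how one resource already on the destination is handled (FIG. 5).

    `policy` is the setting read from the resource table in step B1.
    `destination_overloaded` is True when the failover destination exceeds
    the conditions in the condition table.
    """
    if policy == "always execute":              # step B2
        return "continue unchanged"
    if not destination_overloaded:              # steps B3 and B4
        return "continue unchanged"
    if policy == "lower execution priority":    # step B5
        return "continue at lowered priority"   # step B6
    # Any other setting is treated as "do not fail over".
    return "stop"                               # step B7
```

Unlike the FIG. 4 flow, the condition check here comes before the policy check, so a "do not fail over" resource is stopped only when the destination is actually overloaded.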

[0031] Thus, according to the cluster system of this embodiment, both the system resources that were running on the failed computer and are configured to be failed over, and the system resources originally running on the failover destination computer, are controlled appropriately according to the operating status of the failover destination computer.

[0032]

[Effect of the invention] As described in detail above, according to the present invention, when one of the computers exchanging heartbeats fails, the system resources that were running on the failed computer and are configured to be failed over are not all failed over mechanically at their existing priority, as in conventional systems. Instead, their priority is changed according to the operating status of the failover destination computer (in some cases they are not started at all). As a result, high-priority system resources originally running on the failover destination computer are not adversely affected.

[0033] In addition, when system resources are failed over from a failed computer to the failover destination computer, the priority of the system resources originally running on the failover destination computer is changed according to that computer's operating status (in some cases they are stopped). This prevents a situation in which low-priority system resources originally running on the failover destination computer keep running unconditionally and thereby prevent a high-priority system resource that should be failed over from starting.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a cluster system according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the resource table of the embodiment.

FIG. 3 is a diagram showing an example of the condition table of the embodiment.

FIG. 4 is a flowchart explaining the operating procedure for failing over the system resources that were running on a failed computer in the embodiment.

FIG. 5 is a flowchart explaining the operating procedure for managing the system resources originally running on the failover destination computer in the embodiment.

[Explanation of symbols]

1 ... Public LAN
2 ... Interconnect LAN
3 ... SCSI/FC
10a, 10b ... Server computers
20 ... Shared disk

Claims

1. A cluster system in which a plurality of computers are connected via a network and, when any one of the plurality of computers fails, the system resources that were running on that computer are taken over and run on another computer, the cluster system comprising failover control means for controlling a change of priority, including stopping, of the system resources to be taken over according to the operating status of the other computer.

2. The cluster system according to claim 1, wherein the failover control means includes means for controlling, when executing the takeover of the system resources, a change of priority, including stopping, of the system resources originally running on the other computer according to the operating status of the other computer.

3. A failover control method for a cluster system in which a plurality of computers are connected via a network and, when any one of the plurality of computers fails, the system resources that were running on that computer are taken over and run on another computer, wherein a change of priority, including stopping, of the system resources to be taken over is controlled according to the operating status of the other computer.

4. The failover control method according to claim 3, wherein, when the system resources are taken over, a change of priority, including stopping, of the system resources originally running on the other computer is controlled.