JP4371942B2

JP4371942B2 - Cluster system node control program and server

Info

Publication number: JP4371942B2
Application number: JP2004230425A
Authority: JP
Inventors: 啓士堀内; 美穂竹内
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2004-08-06
Filing date: 2004-08-06
Publication date: 2009-11-25
Anticipated expiration: 2024-08-06
Also published as: JP2006048477A

Description

The present invention relates to a node control program and a server for a cluster system, and more particularly to a node control program for arbitrating which node provides a predetermined service in a cluster system composed of a plurality of independently operating nodes, and to a server constituting such a node.

Clustering is one of the techniques conventionally used to make servers that run important applications, such as core business systems, highly reliable. A cluster combines a plurality of independently operating computers (servers) so that they form a single system as a whole. A server that is a component of a cluster system is generally called a node.

FIG. 12 is a configuration diagram of a general cluster system. In the illustrated example, a cluster system 900 is composed of a node A 910 and a node B 920, and is recognized as a single system by clients 951, 952, and 953 that receive its services. The number of nodes is arbitrary. The node A 910 and the node B 920 are connected by a public LAN 930, to which the clients 951, 952, and 953 are also connected, and by a private LAN 940 that connects the nodes within the cluster system 900.

Between the nodes of the cluster system, arbitration, such as deciding which node provides a service, is performed over either the public LAN or the private LAN, which together form a duplicated path. When any node in the cluster system becomes unavailable because of a failure or maintenance, failover processing is performed in which another node immediately starts providing the service (see, for example, Non-Patent Document 1). Failover keeps the overall processing from being interrupted, so the service continues to be provided and high system reliability is achieved. Exchanging information over the LAN paths is essential for the cluster system to keep operating, so the LAN paths between the nodes are duplicated to keep a failure at a single location from affecting the system.

When mutual communication between nodes becomes impossible in such a cluster system, service continuity is pursued by forcibly operating a high-priority node according to a predetermined priority, or by performing arbitration over a path other than the LAN (see, for example, Non-Patent Document 2).
Non-Patent Document 1: Microsoft, "New features of cluster service", [online], September 26, 2002, [retrieved May 26, 2004], Internet <URL:http://www.microsoft.com/japan/windowsserver2003/evaluation/overview/technologies/clustering.mspx>
Non-Patent Document 2: Microsoft, "How the cluster service acquires ownership of a disk on a shared bus", [online], November 19, 2002, [retrieved May 26, 2004], Internet <URL:http://www.support.microsoft.com/default.aspx?scid=kb;ja;309186&sd=tech>

However, the conventional techniques have a problem: if the node selected to continue the service while inter-node communication is unavailable is not itself healthy, continuing the service becomes difficult.

Conventional cluster systems are built on the premise that it is rare for both of the multiplexed communication paths to fail. However, no matter how the paths are multiplexed, a state in which communication is impossible on every path can still occur. Hereinafter, this state in which mutual communication is impossible is called split brain.

When split brain occurs, the conventional technique of choosing the node to start according to a predetermined priority cannot continue the service if the high-priority node is not healthy.

In the technique that arbitrates which node continues the service over a path other than the LAN, for example, the SCSI (Small Computer System Interface) connection to a disk device shared between the nodes, one node is selected through arbitration via SCSI and continues the service. The nodes that the arbitration determines should not continue the service then stop processing. Even if the node that continues the service has some fault, for example, a LAN abnormality that is detected only when the service is started, no other node is started once arbitration has settled which node continues. As a result, a healthy node may be stopped while a faulty node survives, and the service ends up not being continued.

The present invention has been made in view of these points, and its object is to provide a node control program and a server for a cluster system that allow a healthy node to continue the service even when communication between nodes is lost, thereby improving the reliability of the cluster system.

To solve the above problems, the present invention provides a node control program for implementing the functions of a server (node) constituting a cluster system as shown in FIG. 1. The node control program of the cluster system according to the present invention controls the operation of each node so that a predetermined service continues when communication between nodes is lost. This node control program causes a computer to execute the following processing.

When the communication monitoring means 11 detects loss of communication over the communication path used to exchange information for arbitrating which node executes the predetermined service, the computer reads definition information from the definition information storage means 16, which stores definition information predefining the allocation time during which the server (node) 10a may execute the predetermined service and the priority order in which allocation times are obtained, and acquires the current time kept by the time measuring means 12. Based on the definition information, it then calculates the start time of the next allocation time closest to the current time (step S1). Next, it delays activation of the service processing until the calculated start time of the allocation time (step S2). When the start time is reached, it starts the service; if the service can be executed within the allocation time, the node continues operating, and if not, the node stops operating (step S3).

By having a computer execute such a node control program, when communication over the path used to exchange information for arbitrating which node executes the predetermined service is lost, each node calculates the start time of its next allocation time based on the definition information, which defines the allocation time and its priority order, and delays activation of the service processing until that start time is reached. Because each node is given a different priority, the allocation times do not overlap. When the start time of its allocation time arrives, a node activates the service processing and tests whether the service processing can be executed within that allocation time. If the service can be executed within the allocation time, the node continues the service processing. If it cannot, the node stops operating and waits for another node to execute the service.
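As a rough illustration of steps S2 and S3, the sketch below waits until a slot start and then makes one attempt to start the service. It is a minimal sketch in Python and not the implementation described in the patent: the function name `run_trial`, the callables `start_service`, `clock`, and `sleep`, and the return strings are hypothetical, and the slot start is assumed to have been computed already in step S1.

```python
import time
from typing import Callable


def run_trial(
    slot_start: float,
    slot_seconds: float,
    start_service: Callable[[], bool],
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Delay until slot_start (S2), then try the service once (S3).

    Returns "continue" if the service came up inside the slot,
    otherwise "stop". All names here are illustrative assumptions.
    """
    # S2: hold off service activation until this node's slot begins.
    wait = slot_start - clock()
    if wait > 0:
        sleep(wait)

    # S3: attempt to start the service; start_service() is a hypothetical
    # hook that returns True when the service is running normally.
    started = start_service()
    finished_in_slot = clock() <= slot_start + slot_seconds

    # Keep running only if the attempt succeeded within the slot.
    return "continue" if started and finished_in_slot else "stop"
```

Passing the clock and sleep functions as parameters is a design choice made here so the flow can be exercised with a simulated clock; a real node would presumably rely on its synchronized system clock.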

To solve the above problems, the present invention also provides a server that constitutes a cluster system as a node and arbitrates which node provides a predetermined service, the server comprising: definition information storage means for storing definition information defining an allocation time during which the service may be executed and a priority order in which the allocation time is obtained; time measuring means for keeping the current time; allocation time calculation means for calculating, when loss of communication over the communication path used for information exchange between the nodes is detected, the start time of the next allocation time closest to the current time acquired from the time measuring means, based on the definition information stored in the definition information storage means; delay means for delaying activation of the service until the start time; and processing means for starting the service when the start time is reached and continuing operation if the service can be executed within the allocation time.

In a server having this configuration, the definition information storage means stores definition information consisting of the allocation time and the priority order set for each server. When the allocation time calculation means detects that communication over the path used to exchange information for arbitrating which node provides the predetermined service has been lost, it reads the definition information from the definition information storage means and calculates the start time of the next allocation time closest to the current time. The delay means delays activation of the service from the current time until the start time of the allocation time. When the start time arrives, the processing means activates the service processing. If the service can be executed within the allocation time, the server continues operating and the service continues. If the service cannot be executed within the allocation time, the server stops processing.

In the present invention, when communication between the independently operating nodes of a cluster system is lost, the nodes attempt to start the service one after another based on the priority order and allocation time set in advance for each node. The service is then continued on the node that was able to execute it (a node without a fault). Thus, even when communication between nodes is lost, each node tries to bring up the service, so a fault-free node ends up performing the service processing, which improves service continuity.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. First, the concept of the invention applied to the embodiments will be described, and then the specific contents of the embodiments will be described.
FIG. 1 is a conceptual diagram of the invention applied to the embodiments of the present invention.

The cluster system 1 according to the present invention is composed of a plurality of nodes, in the illustrated example a server (node) 10a and a server (node) 10b, and includes a shared disk device 20 that is accessible from each node.

The server (node) 10a includes communication monitoring means 11 that monitors the communication state, time measuring means 12 that keeps time, communication control means 13 that controls communication over the public LAN 3, communication control means 14 that controls communication over the private LAN 4, a node control unit 15 that controls the operation of the node, and definition information storage means 16.

The communication monitoring means 11 monitors the state of communication through the communication control means 13 and the communication control means 14, and determines the state of communication between the server (node) 10a and the other nodes on both communication paths. In general, nodes in a cluster system exchange information over a LAN. Accordingly, when communication with the other nodes is lost on both the public LAN 3 and the private LAN 4, the node control unit 15 detects that split brain has occurred.
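The detection rule above (split brain only when every path is down) could be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not say how "loss of communication" on a path is decided, so the per-path timestamp of the last successful exchange and the `timeout` threshold are hypothetical, as are the function and path names.

```python
def is_split_brain(last_contact: dict[str, float], now: float, timeout: float) -> bool:
    """Return True when no path has reached the peer within `timeout` seconds.

    last_contact maps a path name (for example "public" or "private")
    to the time of the last successful exchange on that path.
    """
    if not last_contact:
        raise ValueError("at least one communication path is required")
    # Split brain requires every path to be silent; one live path is
    # enough to keep normal arbitration going.
    return all(now - seen > timeout for seen in last_contact.values())
```

For example, with both paths last heard at time 100 and a 3-second timeout, the function reports split brain at time 105, but not if the private path was heard at time 104.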

The time measuring means 12 keeps time within its own device and also acquires time information from a management device connected via, for example, the public LAN 3 to adjust its clock. The management device manages the entire group that includes the cluster system, and one of its management tasks is time adjustment management, which synchronizes the clocks of all devices in the group. The time measuring means 12 acquires the time information for time adjustment and adjusts the clock of its own device. As a result, the clocks of the nodes in the cluster system substantially agree. If the time information cannot be acquired, the time measuring means 12 continues keeping time as it is, and the priority in the definition information is lowered. For example, when integers starting from 1 are assigned in descending order of priority, the priority is set to the value following the largest priority value in use, or the number of nodes is added to the currently set priority value.
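The two demotion rules mentioned above can be written down as a small helper. This is a sketch only: the function name, the `rule` strings, and the assumption that every node holds exactly one priority value (so the list length equals the node count) are illustrative and not taken from the patent.

```python
def demote_priority(current: int, priorities_in_use: list[int], rule: str = "after_last") -> int:
    """Lower a node's priority after it fails to obtain time information.

    Priorities are integers starting at 1, where 1 is the highest.
    """
    if rule == "after_last":
        # Take the value following the largest priority number in use.
        return max(priorities_in_use) + 1
    if rule == "add_node_count":
        # Add the number of nodes to the current value (assumes one
        # priority per node, so the list length is the node count).
        return current + len(priorities_in_use)
    raise ValueError(f"unknown rule: {rule}")
```

With two nodes holding priorities 1 and 2, both rules move the node with priority 1 to priority 3; with three nodes, the second rule moves priority 2 to priority 5.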

The communication control means 13 controls communication performed via the public LAN 3 between the server (node) 10a and other devices, including the other nodes.
The communication control means 14 controls communication performed via the private LAN 4 between the server (node) 10a and the other nodes constituting the cluster system.

In the normal state, the node control unit 15 arbitrates which node provides the predetermined service based on information, such as failures of other nodes, received via the communication control means 13 for the public LAN 3 or the communication control means 14 for the private LAN 4. When the communication monitoring means 11 detects split brain, the node control unit 15 performs the allocation time calculation process (step S1), the delay process (step S2), and the service activation process (step S3) in sequence, thereby arbitrating the nodes without using either the public LAN 3 or the private LAN 4. In the allocation time calculation process (step S1), it reads the definition information stored in the definition information storage means 16 to obtain the allocation time and the priority order set for its own device, and acquires the current time from the time measuring means 12. The allocation time is the time allotted to a node for performing the service activation process, and the same length is set for all nodes. The priority order indicates the order in which allocation times are granted: the higher the priority, the earlier the start time of the allocation time. The allocation time calculation process assumes that allocation times have been assigned to the nodes in turn starting from a predetermined reference point (a specific date and time, 0:00 Greenwich Mean Time, or the like) and calculates the start time of the next allocation time, closest to the current time, that falls to its own device. In this calculation, an adjustment time that absorbs clock differences between the nodes is added between the end time of one allocation time and the start time of the next. Next, the delay process (step S2) delays the start of the service activation process (step S3) until the calculated start time of the next allocation time. When the start time is reached, the service activation process (step S3) begins and the service processing is activated. If the service processing is executed normally within the allocation time, the node continues the service processing. If the service processing cannot be executed within the allocation time, the node stops operating.
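One way to picture the allocation time calculation of step S1 is the sketch below. It assumes, for illustration only, that slots of equal length are laid out round-robin from the reference point in priority order, and that each slot length already includes the adjustment time. The function name and the parameter `slots_per_cycle` (the number of slots in one round, for example the number of nodes) are hypothetical.

```python
import math


def next_slot_start(now: float, priority: int, slots_per_cycle: int,
                    slot_seconds: float, reference: float = 0.0) -> float:
    """Return the earliest slot start at or after `now` for this priority.

    Slots are laid out from `reference` in priority order:
    priority 1 first, then 2, and so on, repeating every cycle.
    """
    if not 1 <= priority <= slots_per_cycle:
        raise ValueError("priority must be between 1 and slots_per_cycle")
    cycle = slots_per_cycle * slot_seconds      # length of one full round
    offset = (priority - 1) * slot_seconds      # this node's offset in a round
    elapsed = now - reference - offset          # time since its first slot
    cycles = math.ceil(elapsed / cycle)         # rounds needed to reach `now`
    return reference + offset + cycles * cycle
```

With two slots of 360 seconds and the reference point at 0, a node with priority 2 that detects split brain at time 1000 gets the slot starting at 1080, while a node with priority 1 waits until 1440. Because every node derives its slot from the same reference point and a synchronized clock, no messages need to be exchanged to agree on the order.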

The definition information storage means 16 stores definition information that includes the allocation time during which a node may execute the service processing and the priority order of the allocation time set for each node.
The configuration of the server (node) 10b is the same as that of the server (node) 10a.

The operation of the cluster system having this configuration will now be described.
In the normal state, the nodes exchange information via the communication control means 13 or the communication control means 14, and arbitration of which node executes the predetermined service is performed. The time measuring means 12 acquires time information from the management device and adjusts its clock. If, however, some failure cuts off communication via both the communication control means 13 and the communication control means 14, the communication monitoring means 11 detects this and the system enters the split-brain state.

When it detects split brain, the node control unit 15 reads the definition information from the definition information storage means 16 and acquires the current time from the time measuring means 12. It then calculates the start time of the next allocation time according to the definition information and delays the service activation process until that start time. When the start time arrives, it begins the service activation process. If the service processing can be executed within the allocation time, the node continues the service processing as is. If the service processing cannot be executed within the allocation time, the node stops operating.

As described above, according to the present invention, when communication for exchanging the information used to arbitrate which node executes the predetermined service is lost in a cluster system, each node calculates the timing for activating the service processing (the start time of its allocation time) according to preset definition information, and attempts to start the service within that allocation time. Thus, even when split brain occurs, each node can try to bring up the service during the time allotted to it. As a result of these trials, the first node that operates normally continues the service processing and the other nodes stop operating, so that business operations can continue.

[First Embodiment]
Next, a first embodiment of the present invention will be described in detail with reference to the drawings. In the following embodiments, a case in which the cluster service from Microsoft (hereinafter, MSCS) is operated is described as an example of a cluster system.

FIG. 2 is a configuration diagram of the system according to the first embodiment.
In the group, which is the management unit of the network that includes the cluster system according to the present invention, the cluster system 1, domain controllers (hereinafter, DCs) 201 and 202 serving as management devices, and clients 951, 952, 953, and 954 are connected via the public LAN 3.

In the cluster system 1, a server 100a serving as node A and a server 100b serving as node B are connected to each other via the public LAN 3 and the private LAN 4, and both are connected to the shared disk device 20.

The server (node A) 100a and the server (node B) 100b include LAN interfaces 110a and 110b connected to the public LAN 3, LAN interfaces 120a and 120b connected to the private LAN 4, and SCSI (Small Computer System Interface) 130a and 130b connected to the shared disk device 20, respectively. The DCs 201 and 202 include LAN interfaces 211 and 212, respectively, connected to the public LAN 3.

The DCs 201 and 202 manage the server (node A) 100a and the server (node B) 100b, and send them time information via the public LAN 3. The server (node A) 100a and the server (node B) 100b communicate periodically via the public LAN 3 or the private LAN 4 to exchange information. This communication is called a heartbeat. Arbitration between the server (node A) 100a and the server (node B) 100b is performed through the heartbeat, and one of the nodes runs the service (business operation). The clients 951, 952, 953, and 954 recognize the cluster system 1 as a single system and do not need to know on which node the business operation is running.

The hardware configuration of a server applied to the cluster system configured as above will now be described. FIG. 3 is a block diagram illustrating an example hardware configuration of the server according to the first embodiment.
The server 100 as a whole is controlled by a CPU (Central Processing Unit) 101. A RAM (Random Access Memory) 102, a hard disk drive (HDD) 103, a public LAN interface 110, a private LAN interface 120, and a SCSI 130 are connected to the CPU 101 via a bus 107.

The RAM 102 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the CPU 101. The RAM 102 also stores various data needed for processing by the CPU 101. The HDD 103 stores the OS and application programs. The public LAN interface 110 is connected to the public LAN and sends and receives data to and from other nodes, the DCs, and the clients via the public LAN. The private LAN interface 120 is connected to the private LAN and sends and receives data to and from the other nodes via the private LAN. The shared disk device 20 is connected to the SCSI 130, which acquires the access right to the shared disk device 20 and reads and writes data according to commands from the CPU 101.

The hardware configuration described above can implement the processing functions of the first embodiment.
Next, the internal configuration of the server according to the first embodiment will be described. FIG. 4 is a block diagram illustrating the internal configuration of the server according to the first embodiment. The server 100 includes the public LAN interface 110, the private LAN interface 120, the SCSI 130, an abnormality detection manager 140, a switching manager 150, a definition information storage device 160, a resource manager 170, and an application manager 180.

The public LAN interface 110 controls communication via the public LAN 3, and the private LAN interface 120 controls communication via the private LAN 4. During normal operation, heartbeat communication takes place between the nodes of the cluster system. The SCSI 130 issues control commands, such as those for acquiring control of the shared disk device 20. When heartbeat communication is lost and split brain occurs, arbitration between the nodes is performed using SCSI commands.

The abnormality detection manager 140 monitors the operating state of the network, HDD, applications, nodes, and so on, and detects abnormalities.
The switching manager 150 controls switching of node operation as needed, for example when the abnormality detection manager 140 detects an abnormality. The node arbitration 151 arbitrates the nodes based on information obtained through the heartbeat while heartbeat communication is normal. The SCSI arbitration 152 arbitrates the nodes when heartbeat communication is lost by attempting to acquire the quorum using SCSI commands. In MSCS, the quorum resides on the shared disk device 20 connected to all nodes constituting the cluster, and can be accessed from only one node while the cluster is in operation. It is used to keep the cluster configuration information consistent across the nodes, and also provides a means of arbitration when split brain occurs. When split brain occurs, the node that can hold the quorum survives. The autonomous arbitration 153 arbitrates the nodes when heartbeat communication is lost by having each node autonomously attempt to start the service. Within the autonomous arbitration 153, an allocation time calculation unit 153a calculates, based on the definition information stored in the definition information storage device 160, the start time of the allocation time given to its own node for the trial; a delay unit 153b keeps the node in a standby state until the calculated start time; and a service activation unit 153c attempts to start the service at the start time and controls the operation of the node according to the result of the trial.

The resource manager 170 manages the resources held by the server 100. The application manager 180 manages application execution on the server 100.
The following describes the node control operation performed by autonomous arbitration 153 when split brain occurs in a cluster system whose servers are configured as described above.

Autonomous arbitration 153 is performed based on the definition information stored in the definition information storage device 160. FIG. 5 shows an example of the definition information of the first embodiment.
The definition information file 300 is stored in the definition information storage device 160. Time information 301 and node information 302 are set in the definition information file 300.

The time information 301 defines a business start completion timeout value, which indicates the allocation time for starting the service process. Each node attempts to start the business (service) within the allocation time, that is, within the period from the start time until the business start completion timeout value elapses. In the example shown, 360 seconds is set. The actual end time of the attempt is set earlier than the value given by the business start completion timeout value. For example, the time at which 80% to 90% of the allocation time has been consumed is taken as the end time, and the remainder is left as adjustment time. Providing the adjustment time in this way absorbs differences between the clocks of the individual nodes. The proportion of adjustment time is tuned as appropriate.

The node information 302 defines the names of the nodes constituting the cluster system and their priorities. In the example shown, node A is assigned priority 1 and node B is assigned priority 2.
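As a rough illustration, the definition information described above could be held in a small in-memory structure. The sketch below is only an assumption about one possible representation: the class name DefinitionInfo, its field names, and the node names "NodeA" and "NodeB" are hypothetical, and the patent does not specify a file format. The values are those of the FIG. 5 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionInfo:
    """Hypothetical in-memory form of the definition information file 300."""

    # Time information 301: business start completion timeout value
    # (the allocation time Tw), in seconds.
    timeout_seconds: int
    # Node information 302: node name -> priority (1 is served first).
    priorities: dict[str, int]

    def priority_of(self, node_name: str) -> int:
        """Return the priority defined for the given node."""
        return self.priorities[node_name]

    def node_count(self) -> int:
        """Number of cluster nodes, derived from the node information."""
        return len(self.priorities)


# Values taken from the example of FIG. 5.
FIG5_EXAMPLE = DefinitionInfo(
    timeout_seconds=360,
    priorities={"NodeA": 1, "NodeB": 2},
)
```

Deriving the node count from the node information, rather than storing it separately, mirrors the fact that the first embodiment's file carries no explicit node count.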

The allocation time calculation unit 153a calculates the allocation time of the local node based on the definition information described above. FIG. 6 is a conceptual diagram showing how the allocation times elapse in the first embodiment. The allocation time calculation unit 153a calculates the time difference from a reference time, which in the example shown is Greenwich Mean Time (hereinafter, GMT) 401, to the current time 402 measured by the timekeeping means, and calculates the allocation time of the local node from the local node's priority and the configured arbitration time. If the current time 402 falls within the local node's own allocation time, the service start processing might not finish in the time remaining, so the next allocation time is calculated as the start time. Accordingly, with priority P, allocation time Tw, and reference time GMT, the start time Ts of the next allocation time closest to the current time of the local node can be expressed as
Ts = int[(current time − GMT + Tw) / Tw] × Tw + (P − 1) × Tw ... (1)
where int[] denotes the value in brackets converted to an integer.
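Equation (1) can be sketched directly in code. The function name next_start_time and its parameter names are hypothetical, and the sketch assumes that all times are given in seconds and that the result is measured from the reference time, as in the equation.

```python
def next_start_time(now: float, reference: float, tw: float, priority: int) -> float:
    """Start time Ts of the next allocation time, per equation (1).

    now       -- current time measured by the node's clock, in seconds
    reference -- reference time (GMT 401 in FIG. 6), in seconds
    tw        -- allocation time Tw, in seconds
    priority  -- priority P of the local node (1 is served first)

    The result is expressed in seconds elapsed since the reference time.
    """
    if tw <= 0:
        raise ValueError("allocation time must be positive")
    if priority < 1:
        raise ValueError("priority must be 1 or greater")
    # int[(now - reference + Tw) / Tw]: index of the next slot boundary.
    next_boundary_index = int((now - reference + tw) // tw)
    # Next boundary plus an offset of (P - 1) slots for lower priorities.
    return next_boundary_index * tw + (priority - 1) * tw
```

With Tw = 360 and 1000 seconds elapsed since the reference time, the next slot boundary is at 1080 seconds, so a node with priority 1 starts at 1080 and a node with priority 2 at 1440. When the elapsed time falls exactly on a boundary (for example 720), the following boundary (1080) is returned, which matches the rule that the next allocation time is always used.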

The end time Te is then calculated, with R denoting the ratio of adjustment time, by
Te = Ts + Tw × (1 − R) ... (2)

This yields the start time Ts1 and end time Te1 of the next allocation time for the node with priority 1. The start time Ts2 and end time Te2 of the allocation time for the node with priority 2 are calculated in the same way. An adjustment time 403 lies between the end time Te1 of the priority 1 node's allocation time and the start time Ts2 of the priority 2 node's allocation time.
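The relationship between the start times, end times, and the adjustment time can be checked with a short calculation. The sketch below assumes that R is the fraction of the allocation time left as adjustment time and that consecutive priorities receive consecutive slots; the function names end_time and windows are hypothetical.

```python
def end_time(ts: float, tw: float, r: float) -> float:
    """End time Te of an attempt, per equation (2): Te = Ts + Tw * (1 - R)."""
    return ts + tw * (1.0 - r)


def windows(first_start: float, tw: float, r: float, node_count: int):
    """Return (Ts, Te) pairs for priorities 1..node_count.

    first_start is the start time Ts1 of the priority 1 node; each lower
    priority starts one allocation time Tw later.
    """
    result = []
    for priority in range(1, node_count + 1):
        ts = first_start + (priority - 1) * tw
        result.append((ts, end_time(ts, tw, r)))
    return result


# Allocation time 300 s with 60 s of adjustment time, i.e. R = 0.2.
(ts1, te1), (ts2, te2) = windows(first_start=300, tw=300, r=0.2, node_count=2)
adjustment_gap = ts2 - te1  # time between Te1 and Ts2
```

For these values, the priority 1 node attempts to start between 300 and about 540 seconds, the priority 2 node starts at 600 seconds, and the gap between Te1 and Ts2 is about 60 seconds.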

Once the allocation time has been calculated, the delay unit 153b waits until the start time of the allocation time. If, when split brain occurs, the local node is continuing to run the business (and holds the quorum), it takes no action regardless of the allocation time and simply continues processing. It keeps holding the quorum and guards it against acquisition by other nodes. If the local node holds the quorum but is not running the business, it is judged that the business cannot run on that node, so the node immediately stops its MSCS service, releasing the quorum so that the MSCS service can operate on another node.

When the start time of the allocation time arrives, the service activation unit 153c attempts to start the MSCS service. If the business is not up and running by the end time of the allocation time, the local node is stopped. If necessary, the node is also powered off.

As a result of the above processing, if the business was running normally on some node when split brain occurred, the business keeps running on that node and all other nodes stop. In some cases, all other nodes are powered off.

If the business was stopped when split brain occurred but there is at least one node on which the business can run normally, the business continues on whichever of those nodes obtains its allocation time first, and the other nodes stop.

If no node can continue the business when split brain occurs, all nodes stop.
Thus, in the first embodiment, each node attempts to start the service during its own allocation time, so that if any node is healthy, the business can be continued on that node.

Next, the operation of two nodes, node A and node B, is described for various states at the time split brain occurs.
FIG. 7 shows the node states at the time of split brain and the resulting operation in the first embodiment. In the example shown, the allocation time is 300 seconds and the adjustment time is 60 seconds. For simplicity, the first allocation time is assumed to be given to node A and the next to node B.

(1) Node A is running the business when split brain occurs
During the first allocation time (up to 240 seconds), node A performs no processing and simply keeps running the business. Node B has stopped its MSCS service at this point. During the next allocation time (up to 540 seconds), node B attempts to start, but fails because node A is running, and stops processing at the end of the allocation time (up to 600 seconds). Node A, which was running the business normally, thus continues the business.

(2) Node B is running the business when split brain occurs
During the first allocation time (up to 240 seconds), node A attempts to start, but fails because node B is running, and stops processing at the end of the allocation time (up to 300 seconds). Node B is running the business and simply continues to do so. Node B, which was running the business normally, thus continues the business.

(3) Node A holds the quorum but has stopped the business when split brain occurs, and node B is able to run the business
During the first allocation time (up to 240 seconds), node A first stops processing and releases the quorum, then attempts to start the business but fails, and stops processing at the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start and succeeds, so it continues running the business. Node B, which is able to run the business, thus continues the business.

(4) Node B holds the quorum but has stopped the business when split brain occurs, and node A is able to run the business
During the first allocation time (up to 240 seconds), node B stops processing and releases the quorum. Node A attempts to start the business and succeeds, and continues processing past the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start, but fails and stops processing. Node A, which is able to run the business, thus continues the business.

(5) Both nodes have stopped the service when split brain occurs, and node A is able to run the business
During the first allocation time (up to 240 seconds), node A attempts to start the business and succeeds, and continues processing past the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start, but fails and stops processing. Node A, which is able to run the business, thus continues the business.

(6) Both nodes have stopped the service when split brain occurs, and node B is able to run the business
During the first allocation time (up to 240 seconds), node A attempts to start the business but fails, and stops processing at the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start and succeeds, so it continues running the business. Node B, which is able to run the business, thus continues the business.

(7) Both nodes have stopped the service when split brain occurs, and neither node is able to run the business
During the first allocation time (up to 240 seconds), node A attempts to start the business but fails, and stops processing at the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start, but fails and stops processing. In this case, neither node can continue the business.

(8) Node A has stopped operating when split brain occurs, and node B has its service stopped but is able to run the business
Because node A is not operating, nothing happens during the first allocation time (up to 240 seconds). During the next allocation time (up to 540 seconds), node B attempts to start and succeeds, so it continues running the business. Node B, which is able to run the business, thus continues the business.

(9) Node B has stopped operating when split brain occurs, and node A has its service stopped but is able to run the business
During the first allocation time (up to 240 seconds), node A attempts to start the business and succeeds, and continues processing past the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B does nothing. Node A, which is able to run the business, thus continues the business.

(10) Both nodes have stopped the service when split brain occurs, and node B is able to run the business once node A stops
During the first allocation time (up to 240 seconds), node A attempts to start the business but fails, and stops processing at the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start and succeeds, so it continues running the business. Node B, which is able to run the business, thus continues the business.

(11) Both nodes have stopped the service when split brain occurs, and node A is able to run the business once node B stops
During the first allocation time (up to 240 seconds), node B has not yet stopped, so node A attempts to start the business but fails, and stops processing at the end of the allocation time (up to 300 seconds). During the next allocation time (up to 540 seconds), node B attempts to start, but fails and stops processing. In this case neither node can continue the business; however, if node A were to retry its attempt, for example, the business could continue on node A. Also, if node B's allocation time came first, node A, whose allocation time comes later, would continue the business, in the same way as in case (10) with the roles of the nodes swapped.

As described above, according to the first embodiment, the business can be continued on a healthy node except when neither node is able to run the business. Although the description above covers the two-node case, node arbitration can be performed in the same way with more than two nodes.
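The outcomes of cases (1) to (11) can be reproduced with a very small simulation. This is a simplified model under stated assumptions, not the patented implementation: the Node class, its fields, and the three startable values ("yes", "no", "if_peers_stopped") are hypothetical, time is reduced to visiting the nodes in priority order, and a start attempt is assumed to fail whenever another node is already running the business.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int
    running: bool = False       # business already running (quorum held)
    holds_quorum: bool = False
    stopped: bool = False       # node is not operating at all
    startable: str = "no"       # "yes", "no", or "if_peers_stopped"


def arbitrate(nodes):
    """Return the name of the node that ends up running the business, or None."""
    # At split brain, a node that holds the quorum without running the
    # business releases the quorum by stopping its cluster service.
    for node in nodes:
        if node.holds_quorum and not node.running:
            node.holds_quorum = False

    # Each allocation time is visited in priority order.
    for node in sorted(nodes, key=lambda n: n.priority):
        if node.stopped or node.running:
            continue  # nothing to attempt in this slot
        peers = [p for p in nodes if p is not node]
        blocked = any(p.running for p in peers)
        peers_stopped = all(p.stopped for p in peers)
        able = node.startable == "yes" or (
            node.startable == "if_peers_stopped" and peers_stopped
        )
        if able and not blocked:
            node.running = True
            node.holds_quorum = True
        else:
            node.stopped = True  # attempt failed: the node stops itself

    winners = [n.name for n in nodes if n.running]
    return winners[0] if winners else None
```

For example, modelling case (10) as `arbitrate([Node("A", 1), Node("B", 2, startable="if_peers_stopped")])` lets node A fail and stop in the first slot, after which node B succeeds. Swapping the roles as in case (11) leaves no node running, which matches the description above.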

Next, the procedure of the node control processing of the first embodiment is described. FIG. 8 is a flowchart showing that procedure. In the node control of the first embodiment, when split brain occurs, node arbitration using SCSI is first performed by the MSCS service, and autonomous arbitration that does not use communication is performed afterward.

[Step S11] The communication monitoring process monitors heartbeat communication over the public LAN and the private LAN. It is determined whether the communication monitoring process has detected a loss of communication on the public LAN. If so, the process proceeds to step S12. If communication is normal, the process ends.

[Step S12] It is determined whether the communication monitoring process has detected a loss of communication on the private LAN. If communication is lost, the process proceeds to step S13. If communication is normal, the process ends.

[Step S13] Because communication has been lost on both the public LAN and the private LAN, the occurrence of split brain is detected.
[Step S14] In response to the split brain, node arbitration using SCSI is first performed by the MSCS cluster service.

[Step S15] It is determined whether, as a result of the arbitration by the cluster service, the local node holds the quorum and has locked the shared disk. If it has not, the process proceeds to step S18.

[Step S16] If the local node holds the quorum and has locked the shared disk, it is determined whether the business is continuing. If the business is not continuing, the process proceeds to step S18.

[Step S17] If the local node holds the quorum, has locked the shared disk, and is continuing the business, it is judged to be able to run the business and continues operating. This node thus continues the business after the split brain.

[Step S18] If the local node does not hold the quorum, or holds the quorum but has stopped the business, service activation processing by autonomous arbitration is performed.
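The decision flow of steps S11 to S18 can be summarized as a single function. The sketch below is illustrative only: the Action enum, the function decide, and its boolean parameters are hypothetical names, and the SCSI arbitration of step S14 is assumed to have already run, with its outcome passed in as holds_quorum.

```python
from enum import Enum


class Action(Enum):
    NO_ACTION = "no split brain detected"
    CONTINUE_BUSINESS = "keep operating and continue the business"
    AUTONOMOUS_ARBITRATION = "run service activation by autonomous arbitration"


def decide(public_lan_down: bool, private_lan_down: bool,
           holds_quorum: bool, business_running: bool) -> Action:
    """Map the checks of FIG. 8 to the action taken by the local node."""
    # S11: heartbeat over the public LAN still works.
    if not public_lan_down:
        return Action.NO_ACTION
    # S12: heartbeat over the private LAN still works.
    if not private_lan_down:
        return Action.NO_ACTION
    # S13: both paths are down, so split brain is detected.
    # S14: SCSI arbitration is assumed done; holds_quorum is its result.
    # S15 to S17: the quorum is held and the business is running.
    if holds_quorum and business_running:
        return Action.CONTINUE_BUSINESS
    # S18: no quorum, or quorum held but the business is stopped.
    return Action.AUTONOMOUS_ARBITRATION
```

Only the combination of a held quorum and a running business lets a node bypass the allocation-time mechanism; every other split-brain state falls through to autonomous arbitration.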
Next, the service activation processing is described. FIG. 9 is a flowchart showing the procedure of the service activation processing of the first embodiment. The processing is started after split brain when the local node does not hold the quorum, or holds it but is not continuing the business.

[Step S21] The node is stopped. If the node held the quorum, this releases the quorum and allows the MSCS service of another node to operate.
[Step S22] The definition information is acquired from the definition file stored in the definition information storage device 160. The definition file defines time information indicating the allocation time and node information indicating the priorities.

[Step S23] The system time is acquired from the timekeeping means. The time kept by the timekeeping means is adjusted using time information from the DC so that it is the same throughout the system.
[Step S24] The start time of the allocation time is calculated by equation (1), using the elapsed time from the reference time to the current time, the allocation time, and the priority. The node then sleeps and waits until the calculated start time of its own allocation time.

[Step S25] Node startup begins at the start time of the allocation time.
[Step S26] It is determined whether the business start processing has completed within the allocation time. If the local node is healthy and no other node is running the business, the business start processing completes normally. If the local node has a fault, or another node is running the business, the business start processing does not complete. If the business start completes within the allocation time, the process proceeds to step S27; if the business has not started, the process proceeds to step S29.

[Step S27] Because the business start processing of the local node has completed, the node continues operating and continues the business.
[Step S28] Because the business start processing of the local node did not complete, the node is stopped.

[Step S29] The local node is powered off to cut off any influence on the system.
By executing the above procedure, a healthy node can complete the business start processing and continue the business even when communication between nodes is lost because of split brain.
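The service activation procedure of FIG. 9 can be sketched as one function whose side effects are injected as callables. All names here (service_activation, stop_service, try_start, halt_node, power_off) are hypothetical stand-ins for platform operations that the description does not specify. The definition information of step S22 is assumed to be already loaded and passed in as priority, tw, and r, and on failure the sketch simply performs steps S28 and S29 in sequence.

```python
import time


def service_activation(priority, tw, r, stop_service, try_start,
                       halt_node, power_off,
                       now=time.time, sleep=time.sleep, reference=0.0):
    """Run steps S21 to S29 once; return True if the business was started.

    try_start(deadline) must return True only if the business start
    completed before the absolute time `deadline`.
    """
    stop_service()                                    # S21: release the quorum
    elapsed = now() - reference                       # S23: read system time
    # S24: equation (1), then wait for the node's own allocation time.
    ts = int((elapsed + tw) // tw) * tw + (priority - 1) * tw
    te = ts + tw * (1.0 - r)                          # equation (2)
    sleep(max(0.0, ts - elapsed))
    if try_start(reference + te):                     # S25 and S26
        return True                                   # S27: keep running
    halt_node()                                       # S28: stop the node
    power_off()                                       # S29: cut the power
    return False
```

Passing now and sleep as parameters keeps the sketch testable without real waiting; in a real node they would default to the synchronized system clock.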

[Second Embodiment]
In a cluster system, arbitration between nodes is also required at startup. In the first embodiment, arbitration between nodes is performed without communication when split brain occurs; in the second embodiment, arbitration is performed without communication between nodes when the cluster system starts up.

The configuration of the cluster system in the second embodiment is the same as the system configuration of the first embodiment shown in FIG. 2, and the hardware configuration of the server (node) is the same as that of the first embodiment shown in FIG. 3. The components of the processing functions of the server (node) are likewise the same as those of the first embodiment shown in FIG. 4. The functions of the second embodiment are therefore described using the reference numerals of the components shown in FIG. 4.

In the second embodiment, the definition information storage device 160 stores definition information for performing autonomous startup processing that does not use communication at system startup. FIG. 10 shows an example of the definition information of the second embodiment.

Time information 501, a cluster node count 502, and node information 503 are set in the definition information file 500. The first embodiment shown in FIG. 5 has no cluster node count; the count is unnecessary when, for example, the number of cluster nodes is fixed or can be calculated from the node information. In the definition information file 500 as well, the number of cluster nodes can be calculated from the node information 503.

The time information 501 defines a cluster service start completion timeout value, which indicates the allocation time for starting the cluster service at system startup. Each node attempts to start the cluster service within the allocation time, that is, within the period from the start time until the cluster service start completion timeout value elapses. In the example shown, 20 seconds is set. The value is smaller than the business start completion timeout value of the first embodiment (360 seconds in the example of FIG. 5) because there is no need to test whether the business can be started normally. The adjustment time is set as needed.

The cluster node count 502 defines the number of nodes constituting the cluster system. In the example shown, it is set to 3.
The node information 503 defines the names of the nodes constituting the cluster system and their priorities. In the example shown, node A is assigned priority 1, node B priority 2, and node C priority 3.

The overall flow of the node startup processing in the second embodiment is the same as that of the first embodiment shown in FIG. 9. However, the determination of whether the business start has completed in time (step S26 in FIG. 9) and the subsequent steps are not performed.

FIG. 11 is a flowchart showing the procedure of the service activation processing of the second embodiment. The processing starts when the server starts up, for example at power-on.
[Step S31] The definition information is acquired from the definition file stored in the definition information storage device 160. The definition information defines time information indicating the allocation time and node information indicating the priorities.

[Step S32] The system time is acquired from the timekeeping means. The time kept by the timekeeping means is adjusted using time information from the DC so that it is the same throughout the system.
[Step S33] The start time of the allocation time is calculated by equation (1), using the elapsed time from the reference time to the current time, the allocation time, and the priority. The node then sleeps and waits until the calculated start time of its own allocation time.

[Step S34] At the start time of the allocation time, the MSCS cluster service processing is started and the node is brought up.
By executing the above procedure, the start times of the MSCS cluster service on the individual nodes are staggered according to the definition information when the cluster system is brought up. This prevents a situation in which multiple nodes come up at once and the cluster system fails to start normally. Conventionally, a program that delays startup was built into each node to prevent multiple nodes from starting at the same time when bringing up a cluster system. According to the second embodiment of the present invention, the cluster system can be started smoothly simply by defining the allocation time and priorities in the definition information.
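The staggering effect can be illustrated with the values of the FIG. 10 example (an allocation time of 20 seconds and three nodes). The sketch below assumes that all nodes are powered on within the same allocation time and that their clocks are synchronized; the function name startup_schedule and the node names are hypothetical.

```python
def startup_schedule(elapsed: float, tw: float, priorities: dict) -> dict:
    """Cluster service start time of each node, per equation (1).

    elapsed    -- seconds from the reference time to power-on
    tw         -- allocation time Tw (cluster service start timeout), seconds
    priorities -- node name -> priority (1 starts first)

    Times are returned in seconds elapsed since the reference time.
    """
    next_boundary = int((elapsed + tw) // tw) * tw
    return {name: next_boundary + (p - 1) * tw for name, p in priorities.items()}


# Three nodes powered on 105 seconds after the reference time.
schedule = startup_schedule(105, 20, {"NodeA": 1, "NodeB": 2, "NodeC": 3})
```

Here the next slot boundary is at 120 seconds, so the cluster service starts at 120, 140, and 160 seconds on nodes A, B, and C respectively, and no two nodes start at the same moment.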

The above processing functions can be realized by a computer. In that case, a program describing the processing contents of the functions that the servers constituting the cluster system should have is provided. Executing the program on a computer realizes the above processing functions on that computer. The program describing the processing contents can be recorded on a computer-readable recording medium. Computer-readable recording media include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording devices include hard disk drives (HDD), flexible disks (FD), and magnetic tapes. Optical discs include DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the program, portable recording media such as DVDs and CD-ROMs on which the program is recorded are sold, for example. The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers over a network.

A computer that executes the program stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time a program is transferred from the server computer, the computer can execute processing according to the received program.

(Supplementary Note 1) A node control program for arbitrating which of a plurality of nodes constituting a cluster system provides a predetermined service, the program causing a computer to execute a procedure comprising:
when an interruption of communication over a communication path used for exchanging information between the nodes is detected, calculating the start time of the next allocation time closest to the measured current time, based on definition information that predefines an allocation time during which the service can be executed and a priority order for acquiring the allocation time;
delaying activation of the service until the start time; and
when the start time is reached, activating the service, continuing operation if the service can be executed within the allocation time, and stopping operation if the service cannot be executed within the allocation time.
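The delay, activate, and continue-or-stop steps of Supplementary Note 1 can be illustrated with a minimal Python sketch. It is not the implementation described in this document: the function name `run_in_slot`, the callback `start_service`, and the injectable `now` and `sleep` parameters are hypothetical and are introduced only so the flow can be exercised without a real cluster.

```python
import time
from typing import Callable


def run_in_slot(
    start: float,
    slot_len: float,
    start_service: Callable[[float], bool],
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Delay until `start`, try to start the service, and report the outcome.

    `start_service(deadline)` is a hypothetical callback that returns True
    when the service came up. The node keeps running only if the service
    started and the allocation time [start, start + slot_len] has not
    already elapsed; otherwise it stops.
    """
    delay = start - now()
    if delay > 0:
        sleep(delay)  # delay activation until the slot begins

    deadline = start + slot_len
    started = start_service(deadline)

    if started and now() <= deadline:
        return "running"  # service is up within the allocation time
    return "stopped"      # give up so that another node can try in its slot
```

Passing the clock and sleep functions as parameters is a design choice of this sketch only; it makes the timing behaviour testable with a simulated clock.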

(Supplementary Note 2) The node control program according to Supplementary Note 1, wherein, when calculating the start time of the next allocation time,
the start time of the allocation time allocated to the node is calculated on the assumption that allocation times are allocated in turn, from a predetermined reference time, to each node constituting the cluster system in accordance with the priority order.
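As a rough illustration of this calculation, the sketch below assumes that every node receives a slot of equal length, that slots repeat cyclically from the reference time, and that priorities are numbered from 0 (highest). These assumptions, the function name `next_slot_start`, and its parameters are hypothetical and are not taken from this document.

```python
import math


def next_slot_start(now, reference, slot_len, num_nodes, priority):
    """Return the earliest slot start at or after `now` for one node.

    Slots of `slot_len` seconds are handed out in priority order starting
    at `reference`, so the node with rank `priority` owns the slots that
    begin at reference + priority * slot_len + k * cycle for k = 0, 1, 2, ...
    """
    cycle = slot_len * num_nodes       # time for every node to get one slot
    offset = priority * slot_len       # this node's position inside a cycle
    # Smallest k for which the slot start is not in the past.
    k = max(0, math.ceil((now - reference - offset) / cycle))
    return reference + offset + k * cycle


if __name__ == "__main__":
    # Two nodes, 10-second slots, reference time 0, current time 5.
    print(next_slot_start(5, 0, 10, 2, priority=0))  # 20
    print(next_slot_start(5, 0, 10, 2, priority=1))  # 10
```

Because every node derives its slot from the same reference time and the same definition information, the nodes can reach non-overlapping start times without exchanging messages, which is the property needed when the communication path is down. In this sketch a node that is already inside its own slot waits for its next one; that behaviour is a simplification.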

(Supplementary Note 3) The node control program according to Supplementary Note 1, wherein, when calculating the start time of the next allocation time,
an adjustment time that absorbs differences between the times measured by the nodes of the cluster system is added between the end time of one allocation time and the start time of the next.
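The effect of the adjustment time can be sketched as a gap inserted after every slot. The function name `slot_window` and the parameter `guard` (the adjustment time, in the same unit as the slot length) are hypothetical names used for illustration only.

```python
def slot_window(index, reference, slot_len, guard):
    """Return (start, end) of the index-th slot counted from `reference`.

    Each slot is followed by `guard` seconds in which no node owns the
    service, so two nodes whose clocks differ by less than `guard` cannot
    believe they are inside adjacent slots at the same moment.
    """
    period = slot_len + guard
    start = reference + index * period
    return start, start + slot_len


if __name__ == "__main__":
    first = slot_window(0, 0, 10, 2)
    second = slot_window(1, 0, 10, 2)
    # The gap between the end of one slot and the start of the next
    # equals the adjustment time.
    print(first, second, second[0] - first[1])
```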

(Supplementary Note 4) The node control program according to Supplementary Note 1, wherein time information is acquired from a management device that manages the entire group including the cluster system and the time is adjusted accordingly, and
when the time information cannot be acquired from the management device, the priority order in the definition information is lowered.
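One way to picture the lowering of priority is the sketch below. How far the priority is lowered is not stated in this passage, so the sketch assumes that a node without time synchronization is moved behind all synchronized nodes. The function name `effective_priorities` and the dictionary layout are hypothetical.

```python
def effective_priorities(priorities, unsynced):
    """Return new ranks with unsynchronized nodes moved to the end.

    `priorities` maps node name -> rank (0 = highest priority).
    `unsynced` is the set of nodes that could not obtain time information.
    Relative order is preserved within each group.
    """
    ordered = sorted(priorities, key=lambda node: (node in unsynced, priorities[node]))
    return {node: rank for rank, node in enumerate(ordered)}


if __name__ == "__main__":
    ranks = effective_priorities({"A": 0, "B": 1, "C": 2}, unsynced={"A"})
    print(ranks)
```

The reasoning behind such a rule is that a node whose clock may have drifted is the one most likely to misjudge its slot, so letting it try last reduces the chance of two nodes starting the service at once.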

(Supplementary Note 5) The node control program according to Supplementary Note 1, wherein, if the service was being executed normally when the interruption of communication between the nodes was detected, operation is continued and the procedure from the start time of the next allocation time is not executed.

(Supplementary Note 6) The node control program according to Supplementary Note 1, wherein, if a second communication path other than the communication path used for exchanging information between the nodes is available, arbitration is performed using the second communication path, after which the procedure from the start time of the next allocation time is executed.

(Supplementary Note 7) The node control program according to Supplementary Note 1, wherein, when operation is stopped because the service cannot be executed, the computer is powered off.

(Supplementary Note 8) A server that constitutes a cluster system as a node and arbitrates which node provides a predetermined service, the server comprising:
definition information storage means for storing definition information that defines an allocation time during which the service can be executed and a priority order for acquiring the allocation time;
timekeeping means for measuring the current time;
allocation time calculation means for calculating, when an interruption of communication over a communication path used for exchanging information between the nodes is detected, the start time of the next allocation time closest to the current time acquired from the timekeeping means, based on the definition information stored in the definition information storage means;
delay means for delaying activation of the service until the start time; and
processing means for activating the service when the start time is reached and continuing operation if the service can be executed within the allocation time.

(Supplementary Note 9) A node control method for controlling the operation of nodes in a cluster system composed of a plurality of nodes, the method comprising:
when an interruption of communication between the nodes is detected, calculating, by allocation time calculation means, the start time of the next allocation time closest to the measured current time, based on definition information that predefines an allocation time during which a service can be executed and a priority order for acquiring the allocation time;
delaying, by delay means, activation of the service until the start time; and
when the start time is reached, activating the service by processing means, continuing operation if the service can be executed within the allocation time, and stopping operation if the service cannot be executed within the allocation time.

(Supplementary Note 10) A node control program for starting a plurality of nodes constituting a cluster system, the program causing a computer to execute a procedure comprising:
when startup processing begins, calculating the start time of the next allocation time, based on an allocation time during which a service can be executed and a priority order for acquiring the allocation time, both predefined in definition information;
delaying the service until the start time of the next allocation time; and
starting the service at the start time of the next allocation time.

(Supplementary Note 11) A server constituting a cluster system, the server comprising:
definition information storage means for storing definition information that defines an allocation time during which a predetermined service of the cluster system can be executed and a priority order for acquiring the allocation time;
timekeeping means for measuring the current time;
allocation time calculation means for calculating, when startup processing begins, the start time of the next allocation time closest to the current time acquired from the timekeeping means, based on the allocation time and the priority order stored in the definition information storage means; and
processing means for starting the service when the start time of the allocation time calculated by the allocation time calculation means is reached, and continuing operation if the service can be executed within the allocation time.

FIG. 1 is a conceptual diagram of the invention applied to the embodiments of the present invention.
FIG. 2 is a configuration diagram of the system of the first embodiment.
FIG. 3 is a block diagram showing an example hardware configuration of the server of the first embodiment.
FIG. 4 is a block diagram showing the internal configuration of the server of the first embodiment.
FIG. 5 is a diagram showing an example of the definition information of the first embodiment.
FIG. 6 is a conceptual illustration of how the allocation times elapse in the first embodiment.
FIG. 7 is a diagram showing the node states and their operation when a split brain occurs in the first embodiment.
FIG. 8 is a flowchart showing the procedure of the node control processing of the first embodiment.
FIG. 9 is a flowchart showing the procedure of the service activation processing of the first embodiment.
FIG. 10 is a diagram showing an example of the definition information of the second embodiment.
FIG. 11 is a flowchart showing the procedure of the service activation processing of the second embodiment.
FIG. 12 is a configuration diagram of a general cluster system.

Explanation of symbols

1 Cluster system
3 Public LAN
4 Private LAN
10a, 10b Server (node)
11 Communication monitoring means
12 Timekeeping means
13 Communication control means (public LAN)
14 Communication control means (private LAN)
15 Node control unit
16 Definition information storage means
20 Shared disk device

Claims

A node control program for controlling the operation of a computer that provides a predetermined service as a node constituting a cluster system, the program causing the computer to execute a procedure comprising:
storing, in definition information storage means, definition information that predefines an allocation time during which execution of the predetermined service can be attempted when an interruption of communication is detected on a communication path used for exchanging information with the other nodes constituting the cluster system, and a priority order in which each node constituting the cluster system acquires the allocation time;
when an interruption of communication on the communication path is detected, calculating the start time of the next allocation time allocated to the computer that is closest to the current time, based on the elapsed time from a reference time common to the nodes constituting the cluster system and on the definition information, on the assumption that allocation times are allocated in turn from the reference time to each node constituting the cluster system in accordance with the priority order; and
if the predetermined service is not being executed, delaying activation of the predetermined service until the start time, activating the predetermined service when the start time is reached, continuing operation of the predetermined service if it can be executed within the allocation time, and stopping operation of the predetermined service if it cannot be executed within the allocation time.

The node control program according to claim 1, wherein the interruption of communication is detected when both a private LAN that connects the nodes constituting the cluster system and a public LAN that connects clients and all of the nodes are disconnected.
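The detection condition of this claim reduces to a conjunction over the two networks. The sketch below only illustrates that logic; the function name `communication_interrupted` and its boolean inputs are hypothetical, and how each LAN's state is actually probed is outside its scope.

```python
def communication_interrupted(private_lan_up: bool, public_lan_up: bool) -> bool:
    """Report an interruption only when both LANs are unreachable.

    While either LAN still works, the nodes can keep exchanging
    information over it, so time-slot arbitration is not needed.
    """
    return not private_lan_up and not public_lan_up
```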

The node control program according to claim 1, wherein, when calculating the start time of the next allocation time,
an adjustment time that absorbs differences between the times measured by the nodes of the cluster system is added between the end time of one allocation time and the start time of the next.

The node control program according to claim 1, wherein time information is acquired from a management device that manages the entire group including the cluster system and the time is adjusted accordingly, and
when the time information cannot be acquired from the management device, the priority order in the definition information is lowered.

A server that constitutes a cluster system as a node and provides a predetermined service, the server comprising:
definition information storage means for storing definition information that predefines an allocation time during which execution of the predetermined service can be attempted when an interruption of communication is detected on a communication path used for exchanging information with the other nodes constituting the cluster system, and a priority order in which each node constituting the cluster system acquires the allocation time;
allocation time calculation means for calculating, when an interruption of communication on the communication path is detected, the start time of the next allocation time allocated to the server that is closest to the current time, based on the elapsed time from a reference time common to the nodes constituting the cluster system and on the definition information, on the assumption that allocation times are allocated in turn from the reference time to each node constituting the cluster system in accordance with the priority order;
delay means for delaying activation of the predetermined service until the start time if the predetermined service is not being executed; and
processing means for activating the predetermined service when the start time is reached, continuing operation of the predetermined service if it can be executed within the allocation time, and stopping operation of the predetermined service if it cannot be executed within the allocation time.