JP6838334B2

JP6838334B2 - Cluster system, server, server operation method, and program

Info

Publication number: JP6838334B2
Application number: JP2016187023A
Authority: JP
Inventors: 敏喜瀬戸
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-09-26
Filing date: 2016-09-26
Publication date: 2021-03-03
Anticipated expiration: 2036-09-26
Also published as: JP2018056633A

Description

The present invention relates to a cluster system, and more particularly to a cluster system including a plurality of servers.

The present invention also relates to a server used in a cluster system, an operation method of the server, and a program.

Cluster systems, in which a plurality of computers are bundled together and operated as a single system, are known. Cluster systems built to increase system availability are called high-availability cluster systems. A high-availability cluster system includes a plurality of nodes (servers) connected to each other via a network.

In a high-availability cluster system, the nodes monitor one another's liveness over the network. A given service runs on only one node, and whether that service is running normally is monitored. The high-availability cluster system has a function that hands the service over to another node when it detects that the node running the service has stopped, and a function that hands the service over to another node when service monitoring detects an abnormality. When a node in the cluster is determined to be unable to communicate, that node is judged to have stopped, and the services that were running on it are started on another node.

Regarding cluster systems, Patent Document 1 discloses a system in which a healthy node continues to provide the service. Patent Document 1 uses definition information that includes allotted times during which a node may execute service processing and a priority of the allotted time set for each node. In Patent Document 1, when communication between the two servers (nodes) constituting the cluster system is interrupted, each node calculates the start time of its next allotted time from the definition information and delays starting the predetermined service processing until that time. When its allotted time starts, each node starts the service processing; if the service can be executed within the allotted time, the node continues the processing, and otherwise it stops operating.

Also regarding cluster systems, Patent Document 2 discloses selecting a master node that provides a service to clients when a split brain occurs. In Patent Document 2, the cluster management unit of each node detects node failures through mutual heartbeat communication. The weighting processing unit of each node checks the node's state with respect to starting the service and, according to the checked state, updates the weight of its own node in weight information stored on a shared storage device. The tiebreaker mechanism of each node determines, from the weights in the updated weight information, whether its own node has the highest priority, and if so selects its own node as the master node.
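The tiebreaker in Patent Document 2 can be pictured roughly as follows. This is a minimal Python sketch, not that document's implementation: the shared storage device is modeled as a dictionary, and the weight values (10 when the service-start check passes, 0 otherwise) and tie-breaking by node ID are assumptions made only for illustration.

```python
from typing import Dict

# Hypothetical weight values; they are not taken from Patent Document 2.
WEIGHT_SERVICE_READY = 10
WEIGHT_SERVICE_NOT_READY = 0


def update_own_weight(shared_weights: Dict[str, int], node: str, service_ready: bool) -> None:
    """Weighting step: each node writes only its own entry in the shared weight table."""
    shared_weights[node] = WEIGHT_SERVICE_READY if service_ready else WEIGHT_SERVICE_NOT_READY


def is_master(shared_weights: Dict[str, int], node: str) -> bool:
    """Tiebreaker step: the node with the highest weight becomes master.

    Ties go to the lexicographically smallest node ID (an assumption).
    """
    best = min(shared_weights, key=lambda n: (-shared_weights[n], n))
    return best == node


if __name__ == "__main__":
    weights: Dict[str, int] = {}
    update_own_weight(weights, "node-a", service_ready=False)
    update_own_weight(weights, "node-b", service_ready=True)
    for n in sorted(weights):
        print(n, "master" if is_master(weights, n) else "standby")
```

Because every node evaluates the same shared table with the same rule, all nodes reach the same answer without talking to each other directly.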

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2006-48477
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2009-223519

In Patent Documents 1 and 2, each node monitors the liveness of the other nodes, so the volume of liveness-monitoring communication grows as nodes are added. For example, if the cluster system consists of N nodes, where N is an integer of 2 or more, the number of liveness-monitoring communications in the cluster system is N(N-1)/2. In Patent Documents 1 and 2, therefore, adding nodes increases the communication load and complicates communication management.
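The difference in scaling can be checked with a few lines of Python. This sketch simply evaluates the full-mesh count N(N-1)/2 against the count for a scheme in which one leader exchanges keep-alives with each of the other N-1 nodes; the function names are illustrative, and each monitored pair is counted as one communication.

```python
def full_mesh_links(n: int) -> int:
    """Monitoring pairs when every node monitors every other node: N(N-1)/2."""
    if n < 2:
        raise ValueError("a cluster needs at least 2 nodes")
    return n * (n - 1) // 2


def leader_star_links(n: int) -> int:
    """Monitoring pairs when one leader monitors each of the other nodes: N-1."""
    if n < 2:
        raise ValueError("a cluster needs at least 2 nodes")
    return n - 1


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(f"N={n:>2}  full mesh={full_mesh_links(n):>3}  leader-based={leader_star_links(n):>2}")
```

The two schemes coincide at N=2, but at N=32 the full mesh needs 496 monitoring pairs against 31 for the leader-based scheme: quadratic versus linear growth.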

In view of the above circumstances, an object of the present invention is to provide a cluster system that can suppress the growth in liveness-monitoring communication even as the number of nodes increases.

Another object of the present invention is to provide a server used in the above cluster system, an operation method of the server, and a program.

To achieve the above objects, the present invention provides a server used in a cluster system that includes a plurality of servers, the server comprising: a leader selection unit that selects one of the plurality of servers as a leader; a keep-alive communication control unit that, when the local server is the leader, sends keep-alives to the other servers included in the cluster system to monitor them and, when the local server is not the leader, receives keep-alives from the leader server and sends responses to them back to the leader server; and a node state recording unit that, when the local server is the leader, records the results of the monitoring performed by the keep-alive communication control unit.

The present invention also provides a cluster system including a plurality of servers, in which one of the plurality of servers is elected as a leader, the server elected as the leader sends keep-alives to the other servers to monitor them, and the other servers send responses to the keep-alives to the server elected as the leader.

The present invention further provides an operation method of a server that operates as the leader in a cluster system including a plurality of servers, the method comprising: sending keep-alives to the other servers included in the cluster system; receiving responses to the keep-alives from the other servers; and recording, in a node state recording unit, monitoring results indicating whether each response was received.

The present invention also provides a program that causes a server operating as the leader in a cluster system including a plurality of servers to execute: sending keep-alives to the other servers included in the cluster system; receiving responses to the keep-alives from the other servers; and recording, in a node state recording unit, monitoring results indicating whether each response was received.

The cluster system, server, server operation method, and program of the present invention can suppress the growth in liveness-monitoring communication even when the number of nodes in the cluster system increases.

FIG. 1 is a block diagram showing a server used in the cluster system of the present invention.
FIG. 2 is a block diagram showing a cluster system according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a server.
FIG. 4 is a flowchart showing the operation procedure for leader selection.
FIG. 5 is a flowchart showing the operation procedure of leader node processing.
FIG. 6 is a flowchart showing the operation procedure of non-leader node processing.

Before the embodiments of the present invention are described, an outline of the invention is given. FIG. 1 shows a server used in the cluster system of the present invention. The server 10 has a leader selection unit 11, a keep-alive communication control unit 12, and a node state recording unit 13. The cluster system includes a plurality of servers 10.

The leader selection unit 11 selects one of the plurality of servers 10 included in the cluster system as the leader. When the local server is the leader, the keep-alive communication control unit 12 sends keep-alives to the other servers 10 included in the cluster system and monitors them. When the local server is not the leader, the keep-alive communication control unit 12 receives keep-alives from the leader server 10 and sends responses to them back to the leader server 10. When the local server is the leader, the node state recording unit 13 records the monitoring results of the keep-alive communication control unit 12.
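The division of roles among units 11 to 13 can be sketched as a small Python class. This is an illustration under assumptions, not the patented implementation: the transport is abstracted as a callable that returns whether a peer answered, and the leader rule (smallest server name wins) is a placeholder for whatever method the leader selection unit actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class NodeStateRecord:
    """Stand-in for the node state recording unit 13."""
    leader: Optional[str] = None
    results: Dict[str, bool] = field(default_factory=dict)  # peer -> answered?

    def record(self, peer: str, responded: bool) -> None:
        self.results[peer] = responded


@dataclass
class Server:
    name: str
    peers: List[str]  # the other servers in the cluster
    state: NodeStateRecord = field(default_factory=NodeStateRecord)

    def select_leader(self, candidates: List[str]) -> str:
        # Leader selection unit 11 (placeholder rule: smallest name wins).
        leader = min(candidates)
        self.state.leader = leader
        return leader

    def is_leader(self) -> bool:
        return self.state.leader == self.name

    def monitor(self, send_keepalive: Callable[[str], bool]) -> None:
        # Keep-alive communication control unit 12, leader side:
        # send a keep-alive to every other server and record whether it answered.
        if not self.is_leader():
            return
        for peer in self.peers:
            self.state.record(peer, send_keepalive(peer))

    def on_keepalive(self, sender: str) -> Optional[str]:
        # Non-leader side: answer only keep-alives that come from the current leader.
        if sender == self.state.leader and not self.is_leader():
            return f"ack from {self.name}"
        return None
```

Note that only the leader ever fills `results`; a non-leader keeps nothing but the identity of the leader, which is what keeps the per-node state small.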

In the present invention, one of the servers constituting the cluster system is elected as the leader. The server elected as the leader sends keep-alives to the remaining servers, receives their responses, and monitors the operating state of each remaining server based on whether it responds. Because keep-alives and responses need only be exchanged between the leader and each of the remaining servers, the growth in liveness-monitoring communication can be suppressed even as the number of nodes in the cluster system increases.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 2 shows a cluster system according to an embodiment of the present invention. The cluster system 100 has servers 110-1 to 110-N, where N is an integer of 3 or more. The servers 110-1 to 110-N are typically computer devices each having a processor, memory, auxiliary storage, and the like, and are connected to one another via a network. The servers 110-1 to 110-N form, for example, a high-availability cluster system. In the following description, the servers are referred to simply as a server 110 when they need not be distinguished.

In this embodiment, one of the servers 110 in the cluster system 100 is dynamically elected as the leader. The server 110 elected as the leader periodically sends keep-alives to the remaining servers 110 and receives their responses, thereby monitoring the liveness of the remaining servers 110. The remaining, non-leader servers 110 use the keep-alives sent by the leader to monitor the liveness of the server 110 elected as the leader.

Servers that do not respond to keep-alives include servers that have stopped operating and servers that have, for some reason, been disconnected from the network and can no longer communicate with the leader. Hereinafter, a server that has stopped operating is also called a stopped node, and a server that can no longer communicate with the leader is also called a non-responding node. To distinguish non-responding nodes from stopped nodes, each server 110 sends a stop notification to the server 110 elected as the leader as part of its shutdown processing. A server that has not sent a stop notification and does not respond to keep-alives is treated as a non-responding node.
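The classification rule reduces to two membership tests. A minimal Python sketch, assuming the leader tracks, for the current round, the set of followers that answered and the set that previously sent a stop notification (the enum and function names are illustrative):

```python
from enum import Enum
from typing import Dict, Set


class NodeStatus(Enum):
    ALIVE = "alive"
    STOPPED = "stopped"                # silent, but sent a stop notification first
    NON_RESPONDING = "non-responding"  # silent without any stop notification


def classify(followers: Set[str], responded: Set[str], stop_notified: Set[str]) -> Dict[str, NodeStatus]:
    """Label each follower using the keep-alive result and the stop notifications."""
    status: Dict[str, NodeStatus] = {}
    for node in followers:
        if node in responded:
            status[node] = NodeStatus.ALIVE
        elif node in stop_notified:
            status[node] = NodeStatus.STOPPED
        else:
            status[node] = NodeStatus.NON_RESPONDING
    return status
```

Only the `NON_RESPONDING` nodes are candidates for forced stopping; a `STOPPED` node shut itself down cleanly and needs no further action.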

In this embodiment, each of the servers 110 can accept a forced-stop instruction from outside. When the server 110 elected as the leader detects a non-responding node, it can forcibly stop that node. For example, if the non-responding server 110 is a physical machine, the leader cuts its power externally through a BMC (Baseboard Management Controller) compliant with IPMI (Intelligent Platform Management Interface). If the non-responding server 110 is a virtual machine in a virtualized environment, the leader sends the host machine a request to forcibly stop that virtual machine.
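The description names the mechanisms (an IPMI-compliant BMC for physical machines, a forced-stop request to the host for virtual machines) but not specific tools. The Python sketch below assumes two common ones, `ipmitool` and libvirt's `virsh`, and only builds the command line. The `FenceTarget` type and its fields are hypothetical, and a caller could pass the returned list to `subprocess.run(..., check=True)`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FenceTarget:
    """Hypothetical description of how to reach a node from outside."""
    name: str
    is_virtual: bool
    bmc_host: Optional[str] = None        # physical machine: BMC address
    bmc_user: Optional[str] = None        # physical machine: BMC user name
    hypervisor_uri: Optional[str] = None  # virtual machine: libvirt URI of its host
    domain: Optional[str] = None          # virtual machine: libvirt domain name


def force_stop_command(t: FenceTarget) -> List[str]:
    """Build (but do not run) the command a leader could use to fence a node."""
    if t.is_virtual:
        if not (t.hypervisor_uri and t.domain):
            raise ValueError(f"{t.name}: missing hypervisor URI or domain")
        # 'virsh destroy' forcefully stops a running libvirt domain.
        return ["virsh", "-c", t.hypervisor_uri, "destroy", t.domain]
    if not (t.bmc_host and t.bmc_user):
        raise ValueError(f"{t.name}: missing BMC address or user")
    # IPMI-over-LAN power-off via the BMC; -E reads the password from IPMI_PASSWORD.
    return ["ipmitool", "-I", "lanplus", "-H", t.bmc_host, "-U", t.bmc_user,
            "-E", "chassis", "power", "off"]
```

Both paths act from outside the target, which is the point: a hung or isolated node cannot be relied on to shut itself down.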

When the server 110 elected as the leader is to be stopped and other servers 110 are still running, leader selection is executed again. The outgoing leader performs its stop processing only after the newly elected server 110 has begun operating as the leader. If no other server 110 is running, the leader performs its stop processing immediately.
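The ordering matters: the outgoing leader must not stop until a successor is actually leading. A minimal Python sketch of that ordering, with the election, the wait for the successor, and the local stop passed in as hypothetical callables:

```python
from typing import Callable, List


def leader_shutdown(others_running: List[str],
                    elect_new_leader: Callable[[List[str]], str],
                    wait_until_leading: Callable[[str], None],
                    stop_self: Callable[[], None]) -> List[str]:
    """Stop the current leader without leaving the cluster leaderless.

    Returns a log of the steps taken, for illustration.
    """
    log: List[str] = []
    if others_running:
        # Re-run leader selection among the servers that will keep running.
        successor = elect_new_leader(others_running)
        log.append(f"elected {successor}")
        # Do not stop until the successor has actually started acting as leader.
        wait_until_leading(successor)
        log.append(f"{successor} leading")
    stop_self()
    log.append("stopped")
    return log
```

For example, `leader_shutdown(["b", "c"], min, lambda n: None, lambda: None)` returns `["elected b", "b leading", "stopped"]`, while an empty `others_running` list goes straight to `["stopped"]`.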

In the cluster system 100, the server 110 elected as the leader may, for some reason, become unable to communicate with the other servers 110. To handle such a case, the server 110 elected as the leader retains leadership only while it can communicate with half or more of the servers 110 in the cluster.

FIG. 3 shows the configuration of the server 110. The server 110 has a leader selection unit 111, a keep-alive communication control unit 112, a node state recording unit 113, a communication unit 114, and a power supply control unit 115. At least some of the functions of these units are realized by the server executing processing according to a program.

The communication unit 114 communicates with at least one of the other servers 110 and a client terminal (not shown). The leader selection unit 111 selects one of the servers 110-1 to 110-N as the leader. When the local server is the leader, the keep-alive communication control unit 112 sends keep-alives to the other servers. When the local server is not the leader, it receives keep-alives from the leader server and sends responses to them.

The node state recording unit 113 includes a storage device and stores the monitoring results of the keep-alive communication control unit 112. For example, among the servers 110 that have not responded to a keep-alive, the node state recording unit 113 records those from which no stop notification has been received as non-responding nodes, and records those from which a stop notification has been received as stopped nodes. The node state recording unit 113 also stores information indicating which server is the leader.

The power supply control unit 115 controls the power of the server 110 and corresponds to a node stop control unit that controls the stopping of nodes. When the local server is the leader, the power supply control unit 115 powers off nodes that do not respond to keep-alives by sending a power-off request to the power supply control unit 115 of the server 110 to be stopped. When the power supply control unit 115 receives a power-off request from the leader server 110, it powers off its own server.

For example, the power supply control unit 115 of the server 110 elected as the leader forcibly powers off the nodes (servers) recorded as non-responding nodes in the node state recording unit 113. The power supply control unit 115 may power off those servers only when the number of servers recorded as non-responding nodes is less than half of all servers. When that number is half or more of all servers, the leader selection unit 111 may select a new leader.

Next, the operation procedure is described. FIG. 4 shows the operation procedure for leader selection. Each server 110 starts the leader selection processing, for example, when it starts up. In each server 110, the leader selection unit 111 selects one of the servers 110 constituting the cluster system as the leader (step A1), for example by using a distributed consensus algorithm to select one server to operate as the leader (the leader node).

The leader selection unit 111 records the leader node in the node state recording unit 113. Each server then determines whether it is itself the leader (step A3). A server 110 that is the leader executes leader node processing (step A4); a server that is not executes non-leader node processing (step A5).
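Steps A1 to A5 can be sketched in Python as below. The consensus step is passed in as a callable because the description only says that a distributed consensus algorithm may be used; in practice this might be something like Raft's leader election. The `min` used in the usage example is merely a deterministic stand-in, not a consensus protocol, and all function names are hypothetical.

```python
from typing import Callable, List


def start_server(self_id: str,
                 members: List[str],
                 agree_on_leader: Callable[[List[str]], str],
                 record_leader: Callable[[str], None],
                 run_leader: Callable[[], None],
                 run_follower: Callable[[str], None]) -> str:
    """Leader selection at startup (FIG. 4); returns the role taken."""
    leader = agree_on_leader(members)  # A1: every server must obtain the same answer
    record_leader(leader)              # record the leader node in the node state record
    if leader == self_id:              # A3: is this server the leader?
        run_leader()                   # A4: leader node processing
        return "leader"
    run_follower(leader)               # A5: non-leader node processing
    return "follower"
```

For example, `start_server("b", ["a", "b", "c"], min, print, lambda: None, lambda l: None)` records `a` as leader and returns `"follower"`.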

FIG. 5 shows the operation procedure of leader node processing. The keep-alive communication control unit 112 of the server 110 elected as the leader sends a keep-alive to each server 110 other than the leader (step B1), waits for the responses (step B2), and determines whether waiting for a response has timed out (step B3).

If a response is received before the timeout, the keep-alive communication control unit 112 records this in the node state recording unit 113 (step B4). In step B4, if a response is received from a server that was recorded as a non-responding node, the keep-alive communication control unit 112 deletes that non-responding record. The keep-alive communication control unit 112 then waits for a fixed time and returns to step B1, so that keep-alives are sent periodically, for example at a predetermined interval.

If step B3 determines that a timeout has occurred, the node state recording unit 113 records the node whose response timed out as a non-responding node (step B5). The keep-alive communication control unit 112 determines whether the number of non-responding nodes is half or more of all servers (step B6). If so, the leader selection unit 111 determines that leadership cannot be maintained and redoes the leader selection processing (step B7).

If step B6 determines that the number of non-responding nodes is less than half of all servers, the power supply control unit 115 forcibly stops the servers 110 recorded as non-responding nodes (step B8). In step B8, for example, the keep-alive communication control unit 112 of the leader server 110 sends a power-off request to the power supply control unit 115 of each server 110 recorded as a non-responding node, and that power supply control unit 115 receives the request and powers off its own server.
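One pass of the leader node processing (steps B1 to B8) is sketched below in Python. Assumptions: `ping` hides the send, wait, and timeout of steps B1 to B3 and returns whether the follower answered in time. Followers are pinged sequentially for clarity, whereas a real implementation would likely send keep-alives concurrently. Nodes that sent a stop notification are not recorded as non-responding, following the description of the node state recording unit 113. The caller is assumed to wait a fixed interval and call the function again.

```python
from typing import Callable, Iterable, Set


def leader_round(followers: Iterable[str],
                 ping: Callable[[str], bool],
                 non_responding: Set[str],
                 stop_notified: Set[str],
                 cluster_size: int,
                 force_stop: Callable[[str], None]) -> str:
    """Run steps B1-B8 once; return "continue" or "reelect"."""
    for node in followers:
        if ping(node):
            # B1-B4: answered in time, so clear any earlier non-responding record.
            non_responding.discard(node)
        elif node not in stop_notified:
            # B5: timed out without a stop notification -> non-responding node.
            non_responding.add(node)
    # B6: half or more of all servers unresponsive -> leadership cannot be kept.
    if 2 * len(non_responding) >= cluster_size:
        return "reelect"  # B7
    # B8: fence each non-responding node (e.g. power-off request to its power
    # supply control unit). The description does not say how the records of
    # fenced nodes change afterwards, so they are left as they are here.
    for node in sorted(non_responding):
        force_stop(node)
    return "continue"
```

Comparing `2 * len(...)` with `cluster_size` keeps the "half or more" test in integer arithmetic, so odd cluster sizes need no rounding.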

FIG. 6 shows the operation procedure of non-leader node processing. The keep-alive communication control unit 112 of each remaining server 110 not elected as the leader waits for a keep-alive from the server 110 elected as the leader (step C1) and determines whether the wait has timed out (step C2). If it determines in step C2 that no timeout has occurred, it sends a response to the keep-alive to the server 110 elected as the leader (step C3), then returns to step C1 and waits for the next keep-alive.

If the keep-alive communication control unit 112 determines in step C2 that the wait for a keep-alive has timed out, it concludes that the leader has been lost and instructs the leader selection unit 111 to redo leader selection. The leader selection unit 111 executes the leader selection processing accordingly (step C4).
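Steps C1 to C4 map naturally onto a blocking receive with a timeout. In this Python sketch the network is replaced by a `queue.Queue` of sender names (an assumption for illustration), and `reply` stands in for sending the response back to the leader.

```python
import queue
from typing import Callable


def follower_loop(inbox: "queue.Queue[str]", leader: str, timeout_s: float,
                  reply: Callable[[str], None]) -> str:
    """Non-leader node processing (FIG. 6); returns "reelect" when the leader is lost."""
    while True:
        try:
            sender = inbox.get(timeout=timeout_s)  # C1: wait for a keep-alive
        except queue.Empty:                        # C2: the wait timed out
            return "reelect"                       # C4: hand over to leader selection
        if sender == leader:
            reply(leader)                          # C3: respond, then loop back to C1
        # Messages from anyone else are ignored; note that in this sketch their
        # arrival still restarts the timeout.
```

Three queued keep-alives from the leader produce three replies, after which the empty queue times out and the function returns `"reelect"`.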

After a leader has been selected, each server 110 in the cluster system decides independently whether to start leader selection again. If the original leader starts leader selection for some reason, the other servers 110 stop receiving keep-alives and therefore also start leader selection, so leader selection ends up running on all servers 110. If, on the other hand, some server 110 starts leader selection because it can no longer receive keep-alives from the leader, a new leader is elected only if a majority of all servers 110 start leader selection. Otherwise, the original leader continues to operate as the leader, and the servers 110 that could not receive its keep-alives (that is, servers that ran leader selection but could not elect a leader) are stopped on the leader's instruction.
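The two cases can be condensed into a small Python model. It is a simplification under stated assumptions: the set of servers that start leader selection is given up front instead of emerging from timeouts, the new leader is picked with `min` as a stand-in for the real consensus result, and what happens to servers that did not start selection once a new leader emerges is not modeled, because the description does not say.

```python
from typing import Set, Tuple


def resolve_reelection(all_servers: Set[str], leader: str,
                       started_selection: Set[str]) -> Tuple[str, Set[str]]:
    """Return (leader afterwards, servers the surviving leader will stop)."""
    starters = set(started_selection)
    if leader in starters:
        # The leader itself restarted selection: keep-alives stop for everyone,
        # so every server ends up running leader selection.
        starters = set(all_servers)
    if 2 * len(starters) > len(all_servers):
        # A majority is running selection, so a new leader can be elected.
        return min(starters), set()  # min() stands in for the consensus result
    # No majority: the original leader stays and stops the servers that
    # ran selection without being able to elect anyone.
    return leader, starters
```

In a five-server cluster led by `a`, a lone server `d` that loses keep-alives is stopped by `a`, whereas three servers losing keep-alives together form a majority and elect a new leader among themselves.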

In this embodiment, one of the servers 110-1 to 110-N constituting the cluster system is elected as the leader. The server 110 elected as the leader monitors the liveness of the remaining servers 110 using keep-alive communication, and each server 110 not elected as the leader need only exchange keep-alives with the leader. For example, with N servers 110, the number of keep-alive communications is N-1. Because no keep-alive communication is needed between non-leader servers 110, the growth in communication load as servers are added to the cluster system can be suppressed, and communication management is simplified.

Also in this embodiment, each server 110 sends a node stop notification to the server 110 elected as the leader when it stops. When some server 110 does not respond to keep-alives, the leader can check whether a stop notification was received to determine whether that server 110 has stopped or is unable to respond for some other reason.

Consider, for example, a server that temporarily becomes inoperable and resumes operation some time later. In a cluster system, if the server providing a service temporarily becomes inoperable and is judged to have stopped, the service is started on another server. If the original server then resumes operation, the service may end up running on multiple servers at once (hereinafter also called a dual-active state).

In this embodiment, the server 110 elected as the leader checks whether it has received a stop notification from the temporarily inoperable server 110 and, on that basis, determines whether that server has stopped or is merely unresponsive. The leader forcibly stops an unresponsive server 110 using external power control, for example via IPMI. This prevents the unresponsive server 110 from resuming operation and creating a dual-active state.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to them, and embodiments obtained by changing or modifying them without departing from the spirit of the present invention are also included in the present invention.

In the above embodiment, the program can be stored using various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples include magnetic recording media such as flexible disks, magnetic tapes, and hard disks; magneto-optical recording media such as magneto-optical disks; optical disc media such as CDs (compact discs) and DVDs (digital versatile discs); and semiconductor memories such as mask ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), flash ROM, and RAM (random access memory). The program may also be supplied to the computer using various types of transitory computer-readable media, examples of which include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.

１０：サーバ
１１：リーダー選出部
１２：キープアライブ通信制御部
１３：ノード状態記録部
１１０：サーバ
１１１：リーダー選出部
１１２：キープアライブ通信制御部
１１３：ノード状態記録部
１１４：通信部
１１５：電源制御部

10: Server
11: Leader election unit
12: Keepalive communication control unit
13: Node state recording unit
110: Server
111: Leader election unit
112: Keepalive communication control unit
113: Node state recording unit
114: Communication unit
115: Power control unit

Claims

A server used in a cluster system including a plurality of servers, the server comprising:
a leader election unit that elects one of the plurality of servers as a leader;
a keepalive communication control unit that, when the local server is the leader, sends keepalives to the other servers included in the cluster system to monitor the other servers, and, when the local server is not the leader, receives a keepalive from the server that is the leader and sends a response to the keepalive to the server that is the leader; and
a node state recording unit that, when the local server is the leader, records the monitoring results obtained by the keepalive communication control unit,
wherein each of the other servers sends a node stop notification to the server that is the leader when its own node is stopped, and
the node state recording unit, when the local server is the leader, records a server that has not responded to the keepalive and from which no stop notification has been received as a non-response node.
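
The recording rule of the node state recording unit (this claim, and the stopped-node rule of the next claim) can be illustrated with a minimal Python sketch. It is not the patentee's implementation: the class and method names (`NodeStateRecorder`, `on_stop_notification`, `record_round`) are hypothetical, and keepalive transport and timeouts are assumed to be handled elsewhere, with the set of followers that answered in a round passed in directly.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    ALIVE = "alive"                # answered the latest keepalive
    STOPPED = "stopped"            # silent, but sent a node stop notification
    NON_RESPONSE = "non-response"  # silent and no stop notification


@dataclass
class NodeStateRecorder:
    """Hypothetical leader-side record of follower states."""

    followers: set
    stop_notified: set = field(default_factory=set)
    states: dict = field(default_factory=dict)

    def on_stop_notification(self, node: str) -> None:
        # A follower announces an orderly shutdown before it stops.
        self.stop_notified.add(node)

    def record_round(self, responded: set) -> dict:
        # Classify every follower after one keepalive round.
        for node in self.followers:
            if node in responded:
                self.states[node] = NodeState.ALIVE
                # A node that answers again has restarted; forget its old notice.
                self.stop_notified.discard(node)
            elif node in self.stop_notified:
                self.states[node] = NodeState.STOPPED
            else:
                self.states[node] = NodeState.NON_RESPONSE
        return dict(self.states)

    def non_response_nodes(self) -> set:
        return {n for n, s in self.states.items() if s is NodeState.NON_RESPONSE}
```

For example, with followers `s2`, `s3`, `s4`, where `s3` sent a stop notification and only `s2` answered, `record_round` marks `s3` as stopped and only `s4` as a non-response node. Separating the two cases means that a planned shutdown is never mistaken for a silent failure.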

The server according to claim 1, wherein, when the local server is the leader, the node state recording unit records the server that sent the stop notification as a stopped node.

The server according to claim 1 or 2, further comprising a node stop control unit that, when the local server is the leader, stops a server recorded as a non-response node.

The server according to claim 3, wherein the node stop control unit stops the servers recorded as non-response nodes when the number of servers recorded as non-response nodes is less than half the number of servers included in the cluster system.

The server according to claim 4, wherein, when the local server is the leader and the number of servers recorded as non-response nodes exceeds half the number of servers included in the cluster system, the leader election unit elects a new leader.
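
The threshold in the two preceding claims (stop the non-response nodes when they are fewer than half the cluster, elect a new leader when they are more than half) can be sketched as a small decision function. This is a hedged illustration: the names `Action` and `decide` are hypothetical, `cluster_size` is assumed to count every server including the leader, and the claims do not say what happens when exactly half the servers are unresponsive, so the sketch simply takes no action in that case.

```python
from enum import Enum


class Action(Enum):
    STOP_NON_RESPONSE_NODES = "stop"  # fewer than half are unresponsive
    ELECT_NEW_LEADER = "re-elect"     # more than half are unresponsive
    NONE = "none"                     # nothing to do, or the unspecified tie case


def decide(non_response: set, cluster_size: int) -> Action:
    """Hypothetical leader-side decision after a monitoring round."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    count = len(non_response)
    if count == 0:
        return Action.NONE
    # Compare 2 * count with cluster_size to avoid fractional "half" values.
    if 2 * count < cluster_size:
        return Action.STOP_NON_RESPONSE_NODES
    if 2 * count > cluster_size:
        return Action.ELECT_NEW_LEADER
    return Action.NONE  # exactly half: not specified by the claims (assumption)
```

In a five-server cluster, two silent servers lead to stopping them, while three lead to re-election. One plausible reading of this design is that when most of the cluster appears silent, the leader itself may be the isolated side of a network split, so forcibly stopping the others would be unsafe.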

A cluster system including a plurality of servers, wherein
one of the plurality of servers is elected as a leader,
the server elected as the leader sends keepalives to the other servers to monitor the other servers,
the other servers send responses to the keepalives to the server elected as the leader,
each of the other servers sends a node stop notification to the server that is the leader when its own node is stopped, and
the server elected as the leader records a server that has not responded to the keepalive and from which no stop notification has been received as a non-response node.

A method of operating a server that operates as a leader in a cluster system including a plurality of servers, the method comprising:
a step of sending keepalives to the other servers included in the cluster system;
a step of receiving responses to the keepalives from the other servers;
a step of recording, in a node state recording unit, a monitoring result indicating whether each response was received;
a step of receiving a node stop notification from another server; and
a step of recording a server that has not responded to the keepalive and from which no stop notification has been received as a non-response node.
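
The steps of this method can be pictured as one timed keepalive round. The sketch below uses Python's `asyncio` with simulated followers in place of real network I/O; `Follower`, `monitoring_round`, the sequence-number echo, and the per-round `timeout` are all illustrative assumptions, and stop notifications are assumed to arrive asynchronously and be passed in as a set.

```python
import asyncio
from typing import List, Optional, Set, Tuple


class Follower:
    """Simulated follower server (hypothetical stand-in for real network I/O)."""

    def __init__(self, name: str, delay: Optional[float]):
        self.name = name
        self.delay = delay  # None simulates a node that never answers

    async def keepalive(self, seq: int) -> Tuple[str, int]:
        if self.delay is None:
            await asyncio.Event().wait()  # blocks until cancelled by the timeout
        await asyncio.sleep(self.delay)
        return self.name, seq  # the response echoes the keepalive sequence number


async def monitoring_round(
    followers: List[Follower],
    seq: int,
    timeout: float,
    stop_notified: Set[str],
    log: List[Tuple[int, str, bool]],
) -> Set[str]:
    """One leader-side keepalive round; returns the non-response nodes."""

    async def probe(f: Follower) -> Optional[str]:
        # Send a keepalive and wait up to `timeout` seconds for the response.
        try:
            name, ack = await asyncio.wait_for(f.keepalive(seq), timeout)
        except asyncio.TimeoutError:
            return None
        return name if ack == seq else None  # ignore stale responses

    results = await asyncio.gather(*(probe(f) for f in followers))
    responded = {r for r in results if r is not None}

    # Record the monitoring result (responded or not) for every follower.
    for f in followers:
        log.append((seq, f.name, f.name in responded))

    # Silent followers that did not announce a stop are non-response nodes.
    return {
        f.name
        for f in followers
        if f.name not in responded and f.name not in stop_notified
    }


if __name__ == "__main__":
    cluster = [Follower("s2", 0.01), Follower("s3", None), Follower("s4", None)]
    history: List[Tuple[int, str, bool]] = []
    silent = asyncio.run(
        monitoring_round(cluster, seq=1, timeout=0.1, stop_notified={"s3"}, log=history)
    )
    print(sorted(silent))  # ['s4']
```

Here `s3` and `s4` are both silent, but only `s4` is reported, because `s3` announced its shutdown beforehand.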

A program that causes a server operating as a leader in a cluster system including a plurality of servers to execute:
a step of sending keepalives to the other servers included in the cluster system;
a step of receiving responses to the keepalives from the other servers;
a step of recording, in a node state recording unit, a monitoring result indicating whether each response was received;
a step of receiving a node stop notification from another server; and
a step of recording a server that has not responded to the keepalive and from which no stop notification has been received as a non-response node.