JP2015095200A

JP2015095200A - Cluster system

Info

Publication number: JP2015095200A
Application number: JP2013235597A
Authority: JP
Inventors: Hiroshi Noguchi (野口博史); Eriko Iwasa (岩佐絵里子)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2015-05-18
Anticipated expiration: 2033-11-14
Also published as: JP5848743B2

Abstract

PROBLEM TO BE SOLVED: To enable a new representative member to be selected at high speed in a cluster system even when the representative member, which updates and distributes the member management table, and another member fail at the same time. SOLUTION: The present invention is a cluster system in which a plurality of cluster members perform distributed processing. Each of the cluster members (representative member 100, non-representative member 200) includes: a storage unit 120 (220) that stores a monitored-member setting rule 123 (223) stipulating that, when the alive-monitoring targets among the cluster members are determined, the cluster member that alive-monitors the next representative member candidate is selected from members other than the current representative member; and a processing unit 110 (210) that selects the member's own monitoring target in accordance with the monitored-member setting rule 123 (223) when the alive-monitoring targets among the cluster members are determined.

Description

The present invention relates to a technique for alive monitoring between cluster members in a cluster system in which a plurality of cluster members (servers and the like; hereinafter sometimes simply called "members") perform distributed processing.

Recent Web systems, which must hold large volumes of data while offering high-speed access and high availability, often use cluster systems that raise the processing capacity of the whole system by making a plurality of servers cooperate. For distributed processing in a cluster system, the cluster members that make up the cluster must be associated with the data each of them is responsible for. One method of associating cluster members with data is the consistent hashing method (Non-patent Document 1).

The consistent hashing method determines the data each cluster member is responsible for by mapping the hash values of the data and the addresses assigned to the cluster members onto the same ID (identifier) space. Even when cluster members are added or removed, only 1/N (N: the number of cluster members) of the correspondence between cluster members and data has to change, so the load of data relocation is kept low.
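The mapping described above can be illustrated with a minimal Python sketch. It is not taken from the patent or from Non-patent Document 1: the class name `ConsistentHashRing`, the helper `ring_position`, the use of SHA-256, the 64-bit ID space, and the member and key names are all assumptions made for illustration, and the sketch places each member at a single point on the ring.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string onto a 64-bit ID space (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    """Members and data keys share one ID space; a key belongs to the
    first member found clockwise from the key's position."""

    def __init__(self, members):
        self._points = sorted((ring_position(m), m) for m in members)
        self._positions = [pos for pos, _ in self._points]

    def owner(self, data_key: str) -> str:
        idx = bisect.bisect_left(self._positions, ring_position(data_key))
        return self._points[idx % len(self._points)][1]  # wrap around the ring


# Hypothetical member and key names.
members = ["member-1", "member-2", "member-3", "member-4"]
keys = [f"session-{i}" for i in range(1000)]
before = ConsistentHashRing(members)
after = ConsistentHashRing([m for m in members if m != "member-3"])

# Keys whose owner changed when member-3 was removed.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

In this sketch every key in `moved` was owned by the removed member, which is the property that keeps relocation local. The share of keys that moves is 1/N only on average; with one ring point per member it can deviate noticeably for any particular hash placement.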

To improve the availability of the cluster system, this method is sometimes combined with a method that keeps replicas of the data on servers other than the responsible server. In that case, each cluster member must hold a list of all members (a member management table) and be able to identify the replication destination from the member management table under a specific rule. As shown in FIG. 1, the member management table contains a member ID, which serves as a key that uniquely designates each member, and member-specific additional information such as an IP (Internet Protocol) address. To guarantee the consistency of the member management table, it is desirable that the authority to update it be given only to a specific server inside or outside the cluster.

One scheme for achieving this consistency of the member management table designates one of the cluster members as the representative member, which updates the member management table and distributes it to the other members (Non-patent Document 2). In this scheme, the representative member is selected from among the cluster members in accordance with an arbitrarily defined representative member selection rule. Each cluster member knows (stores) the representative member selection rule and can identify the representative member, which may be itself, from the information in the member management table. Replacement of the representative member when cluster members are added or removed is likewise decided autonomously by the cluster members.

(Alive monitoring between cluster members and replacement of the representative member)
One method of detecting a failure (abnormality) of a cluster member is for the cluster members to monitor one another. A monitoring target is defined for each cluster member, and a failure is detected within a predetermined time by, for example, sending heartbeats periodically (at predetermined intervals) and checking whether a response arrives. When a failure is detected, the cluster member that performed the monitoring notifies all cluster members of the failure, and the failed member is removed from the cluster system. The representative member therefore updates the member management table and distributes the updated table to all members. If the failed member was the representative member, the next representative member emerges autonomously in accordance with the representative member selection rule.

The functions (configuration of the processing unit) and information held by the representative member and the other members are described with reference to FIG. 2.
As shown in FIG. 2(a), the representative member includes a processing unit, a storage unit, and an input/output unit.
The processing unit includes a member management table distribution unit, a member management table update unit, a representative member specifying unit, and an other-member alive monitoring unit.
The storage unit stores a member management table, a representative member selection rule, and an old monitored-member setting rule.

As shown in FIG. 2(b), the non-representative member includes a processing unit, a storage unit, and an input/output unit.
The processing unit includes a representative member specifying unit and an other-member alive monitoring unit.
The storage unit stores a member management table, a representative member selection rule, and an old monitored-member setting rule.

Although not shown, the representative member and the non-representative member also have the functions of an ordinary cluster member, such as a message processing function and functions for transmitting and managing replicated data.

(Implementation example and problems)
Consider an implementation example of member management using the member management table described above (see also FIG. 1) together with alive monitoring between cluster members. For example, suppose the representative member selection rule makes the member at the top of the member management table the first representative member, and the members are listed in the table in the order in which they joined the cluster. Suppose further that each cluster member alive-monitors the member listed immediately after (one row below) itself in the member management table, with the last member monitoring the first member. FIG. 3 shows an example with N = 4 cluster members. In this case, the cluster members behave as follows when a single member fails.
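The two rules of this implementation example can be sketched in Python as follows. This is only an illustration under assumptions: the names `MemberEntry`, `select_representative`, and `legacy_monitoring_targets` are hypothetical, and the IP addresses are documentation-range placeholders rather than values from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberEntry:
    member_id: int   # key that uniquely designates the member
    ip_address: str  # example of member-specific additional information


def select_representative(table):
    """Selection rule of this example: the first row of the table."""
    return table[0].member_id


def legacy_monitoring_targets(table):
    """Each member monitors the next row; the last member monitors the first."""
    ids = [entry.member_id for entry in table]
    return {ids[i]: ids[(i + 1) % len(ids)] for i in range(len(ids))}


# N = 4 members listed in the order they joined (placeholder addresses).
table = [MemberEntry(i, f"192.0.2.{i}") for i in range(1, 5)]
representative = select_representative(table)
targets = legacy_monitoring_targets(table)
```

With this table, `representative` is member 1 and `targets` maps 1 to 2, 2 to 3, 3 to 4, and 4 to 1, so the next representative member candidate (member 2) is monitored only by the representative member itself.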

<(a) When a member other than the representative member fails (see FIG. 4)>
As an example, consider the case where member 3 fails. Member 2, which alive-monitors member 3, detects the failure of member 3 (S41) and notifies all members of it (S42). The representative member, having received the notification, updates the member management table (S43) and distributes the updated table to all members (S44). The same applies when member 2 or member 4, the other non-representative members, fails.

<(b) When the representative member fails (see FIG. 5)>
Member 4, which alive-monitors member 1 (the representative member), detects the failure of member 1 (S51) and notifies all members of it (S52). Member 2 (the next representative member candidate), having received the notification, recognizes itself as the new representative member (S53), updates the member management table (S54), and distributes the updated table to all members (S55). From then on, member 2 operates as the representative member.

Next, consider the following cases, in which two members fail at the same time.

<(c) When the representative member fails together with a member that monitors neither the next representative member candidate (the second member from the top) nor the representative member (see FIG. 6)>
Consider the case where member 1 (the representative member) and member 3 fail. Member 4, which alive-monitors member 1, detects the failure of member 1 (S61) and notifies all members of it (S62). In parallel, member 2, which alive-monitors member 3, detects the failure of member 3 (S63) and notifies all members of it (S64). Member 2, the next representative member candidate, then receives the failure notifications for member 1 and member 3, recognizes itself as the new representative member (S65), updates the member management table (S66), and distributes the updated table to all members (S67). From then on, member 2 operates as the representative member.

<(d) When the representative member fails together with the next representative member candidate (the second member from the top) (see FIG. 7)>
Consider the case where member 1 (the representative member) and member 2 (the next representative member candidate) fail. Member 4, which alive-monitors member 1, detects the failure of member 1 (S71) and notifies all members of it (S72). Member 2 has also failed, but its failure is not detected, because member 1, which monitors it, is itself down. Since members 1 and 2 have failed simultaneously, member 3, the candidate after next, needs to become the representative member. Member 2 is the next representative member candidate, but because it has failed it does not update the member management table (first half of S73). Member 3, however, is never notified of the failure of member 2 and therefore recognizes member 2 as the new representative member (second half of S73). Member 3 then keeps waiting for member 2 to update and distribute the member management table as the new representative member, and the processing of the cluster system stops.

<(e) When the representative member fails together with the member that monitors the representative member (the last member) (see FIG. 8)>
Consider the case where member 1 (the representative member) and member 4 (the member that monitors the representative member) fail. Member 3, which alive-monitors member 4, detects the failure of member 4 (S81) and notifies all members of it (S82). Member 1 has also failed, but its failure is not detected, because member 4, which monitors it, is itself down. Since member 1 has failed, member 2, the next representative member candidate, needs to become the representative member. Member 1, being down, does not update the member management table (first half of S83). Member 2, however, is never notified of the failure of member 1 and therefore continues to recognize member 1 as the representative member (second half of S83). Member 2 then keeps waiting for member 1 to update and distribute the member management table, and the processing of the cluster system stops.

Thus, when the representative member and the next representative member candidate fail simultaneously, as in (d) (FIG. 7), or when the representative member and the member that monitors it fail simultaneously, as in (e) (FIG. 8), no new representative member emerges and the member management table is not updated, so the system stops.
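Cases (c), (d), and (e) can be reproduced with a small Python simulation of the first detection round. It is a sketch under assumptions, not the patent's implementation: the function names are hypothetical, members are reduced to their table positions 1 to 4, and every detected failure is assumed to reach every surviving member.

```python
def legacy_targets(table):
    """Old rule: monitor the next member in table order; last monitors first."""
    return {table[i]: table[(i + 1) % len(table)] for i in range(len(table))}


def detected_failures(targets, failed):
    """Failures observed by a monitoring member that is itself still alive."""
    return {target for monitor, target in targets.items()
            if monitor not in failed and target in failed}


def believed_representative(table, known_failed):
    """First member in table order that is not known to have failed."""
    return next(m for m in table if m not in known_failed)


def stalls(table, failed):
    """True if the survivors end up waiting for a representative that is down."""
    known = detected_failures(legacy_targets(table), failed)
    return believed_representative(table, known) in failed


table = [1, 2, 3, 4]
cases = {"c": {1, 3}, "d": {1, 2}, "e": {1, 4}}
results = {name: stalls(table, failed) for name, failed in cases.items()}
```

Under these assumptions `results` is `{"c": False, "d": True, "e": True}`: in (d) only the failure of member 1 is detected, so the survivors wait for member 2, and in (e) only the failure of member 4 is detected, so they wait for member 1.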

FIG. 9 shows one conventional solution to this problem. As in FIG. 7, consider the case where member 1 (the representative member) and member 2 (the next representative member candidate) fail. In this solution, if the representative member does not update the member management table within a predetermined time after a cluster member failure notification is issued, a member other than the original alive-monitoring member performs failure detection on the representative member or the next representative member candidate.

First, member 4, which alive-monitors member 1 (the representative member), detects the failure of member 1 (S91) and notifies all members of it (S92). Member 2 is the next representative member candidate, but because it has failed it does not update the member management table (first half of S93). Member 3, however, is not notified of the failure of member 2 and therefore recognizes member 2 as the new representative member (second half of S93).

If the member management table is not updated within a predetermined time after the failure notification, member 4 asks member 3 to perform failure detection on member 2 (the next representative member candidate) (S94). Member 3 detects the failure of member 2 (S95) and notifies all members of it (S96). Member 3 then recognizes itself as the new representative member (S97), updates the member management table (S98), and distributes the updated table to all members (S99).

Non-patent Document 1: Fujio Maruyama and Kazuyuki Shudo, "Technology of Scale-Out," in Cloud Technology: Grasping Beyond the World of Clouds, ASCII Media Works, Inc., November 6, 2009, ISBN 978-4-04-868064-6 C3004, pp. 88-99.
Non-patent Document 2: Eriko Iwasa and five others, "Dynamic configuration change load reduction method in scale-out type session control server," IEICE Technical Report, The Institute of Electronics, Information and Communication Engineers, 2012, NS2011-156 (2012-01), pp. 65-70.

As FIG. 9 shows, it is effective to have a member other than the original alive-monitoring member perform failure detection on the representative member or the next representative member candidate when the representative member has not updated the member management table within a predetermined time after a cluster member failure notification. However, this method takes time to decide that such failure detection should be performed, so a relatively long time passes before the member management table is finally updated. While the member management table is not correctly updated, data replication is not being managed properly. A highly available system in which service interruption is not acceptable therefore needs a method that reflects the state of the cluster members more quickly.

The present invention has been made in view of these circumstances, and its object is to select a new representative member at high speed in a cluster system even when the representative member, which updates and distributes the member management table, and another member fail at the same time.

To solve this problem, the present invention is a cluster system in which a plurality of cluster members perform distributed processing. Each of the cluster members holds a member management table containing the identifiers of the cluster members. A representative member among the cluster members updates the member management table and distributes the updated table to each of the cluster members. A next representative member candidate, which becomes the next representative member if the representative member fails, is selected in advance from among the cluster members. Each of the cluster members includes: a storage unit that stores a monitored-member setting rule stipulating that, when the alive-monitoring targets among the cluster members are determined, the cluster member that alive-monitors the next representative member candidate is selected from members other than the representative member at that time; and a processing unit that selects the member's own monitoring target in accordance with the monitored-member setting rule when the alive-monitoring targets among the cluster members are determined.

By using this monitored-member setting rule, the cluster system can select a new representative member at high speed even when the representative member, which updates and distributes the member management table, and another member fail at the same time.

In the present invention, the monitored-member setting rule is desirably set so that the alive-monitoring relationships among the cluster members have the following property: starting from any cluster member and following the chain to the member it monitors, then to the member that one monitors, and so on, the chain covers all cluster members before returning to the starting member.

Configuring the monitoring targets so that they form a single chain spanning all members in this way reduces the number of members whose failure cannot be detected when several members fail simultaneously.
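Whether a given assignment of monitoring targets has this spanning property can be checked mechanically. The following Python sketch is illustrative only: `is_single_cycle` is a hypothetical name, and both example assignments (in particular the two-loop one) are constructed here for the purpose and are not taken from the figures.

```python
def is_single_cycle(targets):
    """True if following monitor -> target visits every member exactly once
    before returning to the starting member."""
    start = next(iter(targets))
    seen = set()
    current = start
    while current not in seen:
        seen.add(current)
        current = targets[current]
    return current == start and len(seen) == len(targets)


spanning = {1: 3, 2: 4, 3: 2, 4: 1}  # 1 -> 3 -> 2 -> 4 -> 1
split = {1: 2, 2: 1, 3: 4, 4: 3}     # two separate loops (hypothetical)
```

Here `is_single_cycle(spanning)` is true and `is_single_cycle(split)` is false. In the split assignment, members 1 and 2 monitor only each other, so if both fail together neither failure is observed by any surviving member.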

In the present invention, when the number of cluster members (N) is odd, the monitored-member setting rule is desirably set as follows: each cluster member monitors the member two positions further down the order of the member management table (skipping one member), the (N-1)-th cluster member, second from last, monitors the first cluster member, and the N-th (last) cluster member monitors the second cluster member from the top.

With this rule, when the number of cluster members (N) is odd, the monitoring targets of the cluster members can easily be configured to form a single chain spanning all members.
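The odd-N rule can be written down as a short Python sketch. The function name `odd_rule_targets` and the restriction to N of at least 3 are assumptions of this illustration; members are identified by their 1-based position in the member management table.

```python
def odd_rule_targets(n):
    """Monitoring targets for an odd number n of members (1-based positions)."""
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch covers odd n >= 3 only")
    targets = {i: i + 2 for i in range(1, n - 1)}  # members 1 .. n-2 skip one
    targets[n - 1] = 1  # the (N-1)-th member monitors the first member
    targets[n] = 2      # the N-th member monitors the second member
    return targets


targets_5 = odd_rule_targets(5)
# Follow the chain from member 1 until it returns to member 1.
chain = [1]
while targets_5[chain[-1]] != 1:
    chain.append(targets_5[chain[-1]])
```

For N = 5 the chain is 1, 3, 5, 2, 4, which covers every member, and member 2 (the next representative member candidate in the example where the representative is the first member) is monitored by member 5 rather than by the representative.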

In the present invention, when the number of cluster members (N) is even, the monitored-member setting rule is desirably set as follows: each cluster member from the first through the (N-2)-th monitors the member two positions further down the order of the member management table (skipping one member), the (N-1)-th cluster member, second from last, monitors the second cluster member from the top, and the N-th (last) cluster member monitors the first cluster member.

With this rule, when the number of cluster members (N) is even, the monitoring targets of the cluster members can easily be configured to form a single chain spanning all members.
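The even-N rule, and its effect on the simultaneous failure of members 1 and 2, can be sketched in Python as follows. The names `even_rule_targets` and `detected_failures`, the restriction to N of at least 4, and the use of 1-based table positions as member identifiers are assumptions of this illustration.

```python
def even_rule_targets(n):
    """Monitoring targets for an even number n of members (1-based positions)."""
    if n < 4 or n % 2 == 1:
        raise ValueError("this sketch covers even n >= 4 only")
    targets = {i: i + 2 for i in range(1, n - 1)}  # members 1 .. n-2 skip one
    targets[n - 1] = 2  # the (N-1)-th member monitors the second member
    targets[n] = 1      # the N-th member monitors the first member
    return targets


def detected_failures(targets, failed):
    """Failures observed by a monitoring member that is itself still alive."""
    return {target for monitor, target in targets.items()
            if monitor not in failed and target in failed}


targets_4 = even_rule_targets(4)
detected = detected_failures(targets_4, failed={1, 2})
```

For N = 4 the targets are 1 to 3, 2 to 4, 3 to 2, and 4 to 1. When members 1 and 2 fail together, member 4 observes the failure of member 1 and member 3 observes the failure of member 2, so `detected` contains both and member 3 can recognize itself as the new representative without waiting for a timeout.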

According to the present invention, a new representative member can be selected at high speed in a cluster system even when the representative member, which updates and distributes the member management table, and another member fail at the same time.

FIG. 1 is a data configuration diagram of a member management table of the prior art.
FIG. 2(a) is a configuration diagram of a representative member of the prior art, and FIG. 2(b) is a configuration diagram of a non-representative member of the prior art.
FIG. 3 shows the alive-monitoring relationships between cluster members in the prior art when the number of cluster members N = 4.
FIG. 4 concerns the prior art in the case where member 3 fails in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 5 concerns the prior art in the case where member 1 fails in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 6 concerns the prior art in the case where members 1 and 3 fail simultaneously in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 7 concerns the prior art in the case where members 1 and 2 fail simultaneously in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 8 concerns the prior art in the case where members 1 and 4 fail simultaneously in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 9 concerns another prior art in the case where members 1 and 2 fail simultaneously in the example of FIG. 3; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 10(a) is a configuration diagram of the representative member of the present embodiment, and FIG. 10(b) is a configuration diagram of the non-representative member of the present embodiment.
FIG. 11(a) shows the monitoring relationships between members when the scheme of the present embodiment is applied to an odd number of members (N), and FIG. 11(b) shows the monitoring relationships when it is applied to an even number of members (N).
FIG. 12 concerns the present embodiment in the case where members 1 and 2 fail simultaneously, as in FIG. 7; (a) is a schematic diagram showing the state of each member, and (b) is a flowchart showing the processing by each member.
FIG. 13(a) shows the monitoring relationships between members in the present embodiment when the monitoring targets are configured to form a chain spanning all members, and FIG. 13(b) shows, as a comparative example, the monitoring relationships when the monitoring targets do not span all members.
FIG. 14 illustrates the procedure in the present embodiment for setting the monitoring targets of the cluster members so that they form a chain spanning all members; (a) shows how the members are arranged in the member management table, and (b) is a flowchart showing the setting procedure.

Modes for carrying out the present invention (hereinafter called embodiments) are described below with reference to the drawings (referring also, as appropriate, to figures other than the one mentioned).
The present embodiment describes a cluster system in which a plurality of cluster members perform distributed processing. The present embodiment uses the consistent hashing method to associate cluster members with data, but the present invention is not limited to the consistent hashing method.

First, the functions (configuration of the processing units) and the information held by the representative member and the other members of this embodiment will be described with reference to FIG. 10.
As shown in FIG. 10A, the representative member 100 includes a processing unit 110, a storage unit 120, and an input/output unit 130.
The processing unit 110 includes a member management table distribution unit 111, a member management table update unit 112, a representative member specifying unit 113, and an other-member alive monitoring unit 114.

The member management table distribution unit 111 distributes the member management table 121 to the other members.
The member management table update unit 112 updates the member management table 121 on a predetermined trigger (for example, when the number of members increases or decreases).
The representative member specifying unit 113 specifies the representative member by using the representative member selection rule 122 on a predetermined trigger (for example, when the representative member fails).
The other-member alive monitoring unit 114 performs alive monitoring of the other member determined by the monitoring destination member setting rule 123.

The storage unit 120 stores the member management table 121, the representative member selection rule 122, and the monitoring destination member setting rule 123.
The input/output unit 130 includes an information input interface, an output interface, and a communication interface.

Compared with the representative member shown in FIG. 2A, the representative member 100 is technically characterized in that the monitoring destination member setting rule 123 differs from the old monitoring destination member setting rule (details are given later). The rest of the configuration is the same as the corresponding configuration of the representative member shown in FIG. 2A, so a detailed description is omitted.

Although not shown in FIG. 10A, the representative member 100 also stores, in the storage unit 120, a selection rule for the next representative member candidate. By using this selection rule, the representative member 100 can newly select a next representative member candidate whenever necessary. Alternatively, the representative member 100 may always keep the identifier of the next representative member candidate in a predetermined storage area and, whenever the representative member changes, newly select a next representative member candidate based on the selection rule and update the identifier held in that storage area. The selection rule for the next representative member candidate is, for example, a rule that makes the member immediately after (one row below) the representative member in the member management table 121 the next representative member.
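As a minimal sketch of the example selection rule just described, the next representative member candidate can be looked up as the entry one row below the representative member. The function name and the representation of the member management table as an ordered Python list of identifiers are assumptions made only for illustration, and wrapping from the last row to the top is assumed to apply here in the same way as it does for monitoring destinations.

```python
def next_representative_candidate(member_table, representative):
    """Return the member listed one row below the representative.

    member_table: ordered list of member identifiers (hypothetical
    stand-in for the member management table 121).
    Assumption: the row after the last one wraps around to the top.
    """
    idx = member_table.index(representative)
    return member_table[(idx + 1) % len(member_table)]


# Example: with members 1..5 and member 1 as representative,
# the candidate is the member one row below it.
candidate = next_representative_candidate([1, 2, 3, 4, 5], 1)
```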

As shown in FIG. 10B, the non-representative member 200 includes a processing unit 210, a storage unit 220, and an input/output unit 230.
The processing unit 210 includes a representative member specifying unit 211 and an other-member alive monitoring unit 212.
The representative member specifying unit 211 specifies the representative member by using the representative member selection rule 222 on a predetermined trigger (for example, when the representative member fails).
The other-member alive monitoring unit 212 performs alive monitoring of the other member determined by the monitoring destination member setting rule 223.
The storage unit 220 stores the member management table 221, the representative member selection rule 222, and the monitoring destination member setting rule 223.
The input/output unit 230 includes an information input interface, an output interface, and a communication interface.

Compared with the non-representative member shown in FIG. 2B, the non-representative member 200 is technically characterized in that the monitoring destination member setting rule 223 differs from the old monitoring destination member setting rule (details are given later). The rest of the configuration is the same as the corresponding configuration of the non-representative member shown in FIG. 2B, so a detailed description is omitted.

Although not shown in FIG. 10B, the non-representative member 200 also stores, in the storage unit 220, the selection rule for the next representative member candidate (the same rule as that of the representative member 100).

Although not shown, the representative member 100 and the non-representative member 200 also have the functions of a cluster member, such as a message processing function and functions for transmitting and managing replicated data.

In FIG. 10B, the processing unit 210 of the non-representative member 200 is shown without components corresponding to the member management table distribution unit 111 and the member management table update unit 112 of the processing unit 110 of the representative member 100. Needless to say, when the non-representative member 200 becomes the representative member 100, it also has such components.

This embodiment proposes a monitoring method between cluster members that solves problems (d) (FIG. 7) and (e) (FIG. 8) described above. To that end, it is effective to prevent the next representative member candidate from being the monitoring target of the representative member. That is, the monitoring destination member setting rule 123 and the monitoring destination member setting rule 223 stipulate that, when the alive-monitoring targets among the plurality of cluster members are determined, the cluster member that performs alive monitoring of the next representative member candidate is selected from members other than the current representative member. Specifically, the rules are as follows.

For example, when the number of cluster members (N) is odd, each cluster member monitors the member reached by skipping one entry (that is, the member two rows below) in the member management table 121 (the same table as in FIG. 1). In the member management table 121, the row after the last (bottom) row is treated as wrapping around to the top.

That is, as shown in FIG. 11A for the case of N = 5, member 1 (the representative member) monitors member 3, member 2 monitors member 4, and member 3 monitors member 5. The (N-1)-th member (member 4), which is immediately before the last member (member 5), monitors the first member (member 1). The last, N-th member (member 5) monitors the second member from the top (member 2). In this way, the monitoring destinations of the cluster members can easily be configured to form a system spanning all members.

Here, a "system in which the monitoring destinations span all members" means that the alive-monitoring relationship among the plurality of cluster members is such that, when it is traced from an arbitrary cluster member to that member's monitoring destination, then to the monitoring destination of that member, and so on, the trace covers all cluster members before returning to the starting cluster member.
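The definition above can be checked mechanically. The following is a minimal sketch, assuming the monitoring relationship is given as a dictionary that maps each member to the single member it monitors; the function name and this data representation are hypothetical and are not taken from the embodiment.

```python
def spans_all_members(monitor_of):
    """Return True if tracing monitoring destinations from a member
    visits every member before returning to the starting member.

    monitor_of: dict mapping each member to the member it monitors
    (hypothetical representation; every target is assumed to be a key).
    """
    members = list(monitor_of)
    if not members:
        return False
    start = members[0]
    seen = set()
    current = start
    # Follow the chain until a member is visited for the second time.
    while current not in seen:
        seen.add(current)
        current = monitor_of[current]
    # The chain must close on the start and must have covered everyone.
    return current == start and len(seen) == len(members)


# The N = 5 relationship of FIG. 11A as described in the text.
ring = {1: 3, 2: 4, 3: 5, 4: 1, 5: 2}
# A relationship split into two separate loops, for contrast.
split = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
```

Because each member has exactly one monitoring destination, a trace that covers all members from one starting member does so from every starting member, so checking a single start is sufficient in this sketch.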

As shown in FIG. 11B, when the number of members (N) is even, the N-th member (member 4) monitors the first member (member 1), and the (N-1)-th member monitors the second member from the top. In this way, the monitoring destinations of the cluster members can easily be configured to form a system spanning all members. This also prevents the situation in which, for N = 4, members 2 and 4 monitor each other and their simultaneous failure goes undetected.
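The odd and even cases described above can be combined into one small function. This is a sketch under stated assumptions: the member management table is modeled as an ordered Python list, the function name is hypothetical, and the lower bound N >= 3 follows the necessary condition stated later in this description.

```python
def monitoring_targets(member_table):
    """Map each member to its monitoring destination.

    Odd N: every member monitors the member two rows below, wrapping
    from the bottom of the table to the top.
    Even N: same rule, except that the (N-1)-th member monitors the
    second member and the N-th member monitors the first member.
    """
    n = len(member_table)
    if n < 3:
        raise ValueError("at least three members are required")
    targets = {}
    for i, member in enumerate(member_table):
        if n % 2 == 0 and i == n - 2:
            # (N-1)-th member monitors the second member from the top.
            targets[member] = member_table[1]
        elif n % 2 == 0 and i == n - 1:
            # N-th member monitors the first member.
            targets[member] = member_table[0]
        else:
            # Skip one entry, wrapping around at the end of the table.
            targets[member] = member_table[(i + 2) % n]
    return targets


odd_case = monitoring_targets([1, 2, 3, 4, 5])
even_case = monitoring_targets([1, 2, 3, 4])
```

For the list [1, 2, 3, 4, 5] this reproduces the relationship of FIG. 11A as described in the text, and for [1, 2, 3, 4] it avoids the mutual monitoring of members 2 and 4 that a plain skip-one rule would produce.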

FIG. 12 shows the operation when member 1 (the representative member) and member 2 (the next representative member candidate) fail simultaneously under the above method (for the case of N = 4). The operations are actually performed by the units in the processing unit 110 and the units in the processing unit 210, but for brevity each member is described as the acting entity.

First, member 4, which performs alive monitoring of member 1 (the representative member), detects the failure of member 1 (S121) and notifies all members of that failure (S122). In parallel, member 3, which performs alive monitoring of member 2, detects the failure of member 2 (S123) and notifies all members of that failure (S124). Member 3, which is the candidate after the next representative member candidate, receives the failure notifications for member 1 (the representative member) and member 2 (the next representative member candidate), recognizes itself as the new representative member (S125), updates the member management table 121 (S126), and distributes the updated member management table 121 to all members (S127). Thereafter, member 3 operates as the representative member.

As described above, with this method the failures of the representative member and the next representative member are detected in parallel, so a new representative member can be selected quickly. That is, in the cluster system, a new representative member can be selected quickly even when the representative member, which updates and distributes the member management table 121, and another member fail at the same time.
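The N = 4 scenario of FIG. 12 can be traced with a small simulation. This is only a sketch: the function names are hypothetical, the monitoring relationship is written out by hand from the description of FIG. 11B, and the new representative is assumed to be the first surviving member found by walking down the member management table from the failed representative.

```python
def detected_failures(monitor_of, failed):
    """Return the failed members whose monitoring member is still alive."""
    return {target for watcher, target in monitor_of.items()
            if target in failed and watcher not in failed}


def elect_representative(member_table, old_representative, failed):
    """Return the first surviving member below the old representative,
    wrapping around the table (assumed selection rule)."""
    n = len(member_table)
    start = member_table.index(old_representative)
    for step in range(1, n):
        member = member_table[(start + step) % n]
        if member not in failed:
            return member
    return None


# Monitoring relationship for N = 4: 1->3, 2->4, 3->2, 4->1.
monitor_of = {1: 3, 2: 4, 3: 2, 4: 1}
failed = {1, 2}

detected = detected_failures(monitor_of, failed)
new_rep = elect_representative([1, 2, 3, 4], 1, failed)
```

In this sketch both failures are detected by surviving members (member 4 for member 1, member 3 for member 2), which corresponds to the parallel detection in S121 and S123.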

Here, the method is generalized. As a premise, in order to minimize the number of members whose failure becomes undetectable when multiple members fail simultaneously, the method uses monitoring destination member setting rules 123 and 223 that, in addition to not creating any pair of members that monitor each other, arrange the monitoring destinations of the cluster members into a system spanning all members, as shown in FIG. 13A.

As a comparative example, consider a configuration in which the monitoring destinations of the cluster members do not span all members. In the example of FIG. 13B, failure detection is impossible when members 1 to 3 fail simultaneously, and it is likewise impossible when members 4 to N fail simultaneously.
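The blind spot of the comparative example can be illustrated with a short check. This is a sketch under assumptions: the function name is hypothetical, N = 6 is chosen arbitrarily, and the monitoring direction inside each loop is assumed, since the text only states which members belong to each loop.

```python
def failure_goes_undetected(monitor_of, failed):
    """Return True if no surviving member monitors any failed member."""
    watchers = [w for w, t in monitor_of.items() if t in failed]
    return bool(failed) and all(w in failed for w in watchers)


# Comparative configuration: two separate loops (members 1-3, members 4-6).
split_loops = {1: 2, 2: 3, 3: 1, 4: 5, 5: 6, 6: 4}
# Configuration spanning all members, following the even-N rule for N = 6.
single_loop = {1: 3, 2: 4, 3: 5, 4: 6, 5: 2, 6: 1}

blind = failure_goes_undetected(split_loops, {1, 2, 3})
covered = failure_goes_undetected(single_loop, {1, 2, 3})
```

With separate loops, the simultaneous failure of one whole loop leaves no surviving monitor for any of its members. With a single loop spanning all members, any failed set short of the whole cluster has at least one member monitored by a survivor.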

In contrast, in this method, which adopts the "system in which the monitoring destinations span all members", the monitoring destination of each cluster member is determined by the procedure shown in FIG. 14B, where member k is the representative member and member j is the next representative member candidate. FIG. 14A shows that, among the N members, the k-th is the representative member and the j-th is the next representative member candidate.

First, the monitoring destination member x of member k (the representative member) is determined according to condition 1 (x ≠ k, x ≠ j) (S141). That is, x is selected from the (N-2) members that remain after excluding member j (the next representative member candidate) and member k itself.

Next, the monitoring destination member y of member x is determined according to condition 2 (y ≠ k, y ≠ x) (S142). That is, y is selected from the (N-2) members that remain after excluding the representative member k, which is the starting point of the system, and member x, whose monitoring source member has already been determined.

Next, it is determined whether the number of members whose monitoring destination has not yet been set is "2 or more" or "1". If it is "2 or more", the process proceeds to S145. If it is "1", the monitoring destination of the last member is set to member k (the representative member) (S144), and the process ends.

In S145, the monitoring destination member z of member y is determined according to condition 3 (z ≠ k, z ≠ x, z ≠ y). That is, z is selected from the (N-3) members that remain after excluding the representative member k, which is the starting point of the system, and members x and y, whose monitoring source members have already been determined.

Next, it is again determined whether the number of members whose monitoring destination has not yet been set is "2 or more" or "1". If it is "2 or more", processing similar to {S142, S143} and {S145, S146} is continued (the monitoring destination members are determined using conditions 4, 5, and so on). If it is "1", the monitoring destination of the last member is set to member k (the representative member) (S147), and the process ends.
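The procedure of S141 to S147 can be sketched as a loop that builds one chain starting from the representative member and closes it back on the representative member. The function name is hypothetical, and because the text does not specify how a member is picked from the allowed candidates, this sketch simply takes the first allowed member in table order, which is an assumption.

```python
def assign_monitoring_targets(member_table, representative, candidate):
    """Build a monitoring relationship that spans all members and in
    which the representative does not monitor the next candidate.

    Assumption: among the allowed members, the first one in table
    order is chosen; the source leaves this choice open.
    """
    if len(member_table) < 3:
        raise ValueError("at least three members are required")
    # Members that do not yet have a monitoring source member.
    remaining = [m for m in member_table if m != representative]
    targets = {}
    current = representative
    while remaining:
        # Condition 1 applies only to the representative: skip the candidate.
        allowed = [m for m in remaining
                   if not (current == representative and m == candidate)]
        chosen = allowed[0]
        targets[current] = chosen
        remaining.remove(chosen)
        current = chosen
    # The last member closes the loop by monitoring the representative.
    targets[current] = representative
    return targets


result = assign_monitoring_targets([1, 2, 3, 4, 5], representative=1, candidate=2)
```

Excluding the representative member and every member that already has a monitoring source from each later choice is what guarantees that the chain visits each member exactly once before it returns to the representative member.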

In this way, by preventing the representative member from monitoring the next representative member and by configuring the monitoring destinations of the cluster members to form a system spanning all members, the number of members whose failure becomes undetectable when multiple members fail simultaneously can be minimized. When N = 2, the representative member and the next representative member candidate necessarily monitor each other, so the necessary condition on N in this embodiment is N ≧ 3.

Because this method can be implemented without major changes to the software of the cluster system, it is well suited to general-purpose use.

This concludes the description of the embodiment, but aspects of the present invention are not limited to it. The specific configurations and processing can be changed as appropriate without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
100 Representative member
110 Processing unit
111 Member management table distribution unit
112 Member management table update unit
113 Representative member specifying unit
114 Other-member alive monitoring unit
120 Storage unit
121 Member management table
122 Representative member selection rule
123 Monitoring destination member setting rule
130 Input/output unit
200 Non-representative member
210 Processing unit
211 Representative member specifying unit
212 Other-member alive monitoring unit
220 Storage unit
222 Representative member selection rule
223 Monitoring destination member setting rule
230 Input/output unit

Claims

A cluster system in which a plurality of cluster members perform distributed processing, wherein
each of the plurality of cluster members has a member management table including identifiers of the plurality of cluster members,
a representative member among the plurality of cluster members updates the member management table and distributes the updated member management table to each of the plurality of cluster members,
a next representative member candidate, which becomes the next representative member when the representative member fails, is selected in advance from among the plurality of cluster members, and
each of the plurality of cluster members comprises:
a storage unit that stores a monitoring destination member setting rule stipulating that, when the alive-monitoring targets among the plurality of cluster members are determined, the cluster member that performs alive monitoring of the next representative member candidate is selected from members other than the representative member at that time; and
a processing unit that, when the alive-monitoring targets among the plurality of cluster members are determined, selects its own monitoring target in accordance with the monitoring destination member setting rule.

The cluster system according to claim 1, wherein the monitoring destination member setting rule is a rule set such that, when the alive-monitoring relationship among the plurality of cluster members is traced from an arbitrary cluster member to the monitoring destination cluster member of that cluster member, then to the monitoring destination cluster member of that cluster member, and so on, the trace covers all of the plurality of cluster members before returning to the starting cluster member.

The cluster system according to claim 1 or 2, wherein the monitoring destination member setting rule is a rule set such that, when the number of cluster members (N) is odd, the monitoring target of each cluster member is the cluster member reached by skipping one entry in descending order of the member management table, the (N-1)-th cluster member, which is immediately before the last, monitors the first cluster member, and the last, N-th cluster member monitors the second cluster member from the top.

The cluster system according to claim 1 or 2, wherein the monitoring destination member setting rule is a rule set such that, when the number of cluster members (N) is even,
the monitoring target of each cluster member up to the (N-2)-th from the top of the member management table is the cluster member reached by skipping one entry in descending order of the member management table,
the (N-1)-th cluster member, which is immediately before the last, monitors the second cluster member from the top, and
the last, N-th cluster member monitors the first cluster member.