JP2010212807A - Fail-over method, system thereof, node, and program

Info

Publication number: JP2010212807A
Application number: JP2009054187A
Authority: JP
Inventors: Takahisa Iwama; 隆寿岩間
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-03-06
Filing date: 2009-03-06
Publication date: 2010-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a fail-over method, a system thereof, a node, and a program such that the information used to decide whether to shift from a standby system node to an in-use system node is made consistent among the standby system nodes.

SOLUTION: In the fail-over method, when each standby system node detects a fault in an in-use system node, it transmits information on itself, taken at a certain point of time after the fault detection, to the other standby system nodes. Each standby system node acquires the information on itself and the information on the other standby system nodes, and decides whether to shift from a standby system node to an in-use system node by using the acquired information.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a failover method, a system thereof, a node, and a program.

In a distributed computer system composed of a plurality of nodes, a manager that manages the nodes to be managed exists on the system. For example, when a failure occurs in a node, the processing performed on that node can be taken over by another node, so that high availability can be achieved. In addition, by moving processing performed on one node to another node based on information about a plurality of nodes, the processing efficiency of the system as a whole can be improved.

However, if a failure occurs in the active manager that is in operation, the agents can no longer be managed, and processing stops for the system as a whole. In that case, some method is needed to keep the system running.

One way to achieve this is a failover method in which a standby manager is prepared in advance and, when a failure occurs in the active manager, the standby manager operates in place of the failed active manager.

For example, when there are a plurality of standby managers that can shift to the active manager, all the standby managers consult with one another to decide which standby manager is to operate as the active manager, and the selected standby manager starts operating as the active manager. As a result, when a failure occurs in the active manager, processing can continue for the system as a whole.

Patent Document 1 describes an invention in which a plurality of redundant servers monitor one another, and a standby server that detects a failure of the active server automatically performs server switching. In this invention, the servers monitor one another in a ring, with the first server monitoring the second server, the second server monitoring the third server, and so on, and the monitoring server becomes the active server when a failure is detected.

JP 2006-229512 A

However, in the technique of Patent Document 1, each standby server monitors only one server. Therefore, when a failure occurs in the active server, the standby server that takes over its processing is the standby server that was monitoring it. As a result, the standby server that takes over the active processing is not necessarily well suited in terms of capability, such as free resource capacity.

When a plurality of standby managers consult with one another to decide which of them takes over the active processing, the one that takes over is in many cases well suited in terms of capability, but such consultation among a plurality of standby managers has not been easy. This is because the information used to determine whether to shift from standby manager to active manager differs from one standby manager to another.

The present invention has been made in view of the above problems, and its object is to provide a failover method, a system thereof, a node, and a program that make the information used to determine whether to shift from a standby node to an active node consistent among the standby nodes.

The present invention for solving the above problems is a method for controlling a failover system including an active node and at least two standby nodes. In this failover method, each of the standby nodes, upon detecting that the active node is not functioning as an active node, transmits information on itself at a certain point in time after the detection to the other standby nodes, acquires the information on the other standby nodes, and determines whether to shift from a standby node to an active node using the information on itself and the acquired information on the other standby nodes.

The present invention for solving the above problems is a failover system having at least one active node and at least two standby nodes, wherein each of the standby nodes has: detection means for detecting that the at least one active node is not functioning as an active node; standby node information transmission means for transmitting information on the standby node itself, at a certain point in time after the detection by the detection means, to the other standby nodes; and determination means for acquiring the information on the standby node itself and the information on the other standby nodes and determining, using the acquired standby node information, whether to shift from a standby node to an active node.

The present invention for solving the above problems is a node in a failover system having at least one active node and at least two standby nodes, the node having: detection means for detecting that the at least one active node is not functioning as an active node; standby node information transmission means for transmitting information on the node itself as a standby node, at a certain point in time after the detection by the detection means, to the other standby nodes; and determination means for acquiring the information on the node itself and the information on the other standby nodes and determining, using the acquired standby node information, whether to shift from a standby node to an active node.

The present invention for solving the above problems is a program for a node in a failover system having at least one active node and at least two standby nodes, the program causing the node to execute: a detection process of detecting that the at least one active node is not functioning as an active node; a standby node information transmission process of transmitting information on the node itself as a standby node, at a certain point in time after the detection, to the other standby nodes; and a determination process of acquiring the information on the node itself and the information on the other standby nodes and determining, using the acquired standby node information, whether to shift from a standby node to an active node.

According to the present invention, the information used to determine whether to shift from a standby node to an active node can be made consistent among the standby nodes.

FIG. 1 is a diagram for explaining an outline of an embodiment of the present invention.
FIG. 2 is a block diagram of the first embodiment.
FIG. 3 is a sequence diagram of the first embodiment.
FIG. 4 is a block diagram of Example 1.
FIG. 5 is a block diagram of a system according to the second embodiment.
FIG. 6 is a block diagram of a system according to the third embodiment.
FIG. 7 is a sequence diagram of the third embodiment.
FIG. 8 is a block diagram of Example 2.
FIG. 9 is a sequence diagram of Example 2.

An outline of an embodiment of the present invention will be described with reference to FIG. 1.

FIG. 1 shows three standby nodes 1, 2, and 3, each of which has a determination algorithm 4. The determination algorithm 4 determines, based on the standby node information of the three standby nodes 1, 2, and 3, whether the node itself shifts from a standby node to an active node.

When each of the standby nodes 1, 2, and 3 detects a failure of the active node, it transmits information on itself, taken at a certain point in time after the failure detection, to the other standby nodes. Each of the standby nodes 1, 2, and 3 then determines whether it shifts from standby to active based on the information on itself that it transmitted to the other standby nodes, the information on the other standby nodes, and the determination algorithm 4. A failure of the active node here covers not only a failure of the active node itself but also cases where it is not functioning as the active node because of a failure elsewhere, such as in a line; in the following description these are collectively referred to as a failure of the active node.

The phrase "a certain point in time after the failure detection of the active node" is intended to exclude the case where, for a single active-node failure detection, the same standby node transmits several pieces of standby node information taken at different times. If information taken at different times were transmitted, the standby nodes could no longer all make the shift determination from the same information. For example, if standby node 1 transmits both its information at time t1 and its information at time t2, standby node 2 might use the information on standby node 1 at time t1 while standby node 3 uses the information at time t2. In that case, the standby nodes cannot all make the determination using the same information.
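The t1/t2 problem can be illustrated with a small sketch. The load figures and the `pick_active` rule (lowest load wins, ties go to the lowest node ID) are assumptions made for illustration; the patent does not prescribe a particular rule.

```python
# Hypothetical load figures (lower is better); not taken from the patent.
def pick_active(snapshots):
    """Return the ID of the node with the lowest load; ties go to the lowest ID."""
    return min(snapshots, key=lambda node_id: (snapshots[node_id], node_id))

node1_at_t1 = 0.20  # load reported by standby node 1 at time t1
node1_at_t2 = 0.90  # load reported by standby node 1 at time t2

# Standby node 2 happens to use node 1's t1 report, node 3 uses the t2 report.
view_of_node2 = {1: node1_at_t1, 2: 0.50, 3: 0.60}
view_of_node3 = {1: node1_at_t2, 2: 0.50, 3: 0.60}

winner_seen_by_node2 = pick_active(view_of_node2)
winner_seen_by_node3 = pick_active(view_of_node3)
print(winner_seen_by_node2, winner_seen_by_node3)  # prints: 1 2
```

With the same rule but different inputs, node 2 concludes that node 1 should become active while node 3 concludes that node 2 should, which is exactly the disagreement the single point in time is meant to rule out.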

A specific way to prevent this is for a standby node, upon detecting a failure of the active node, to transmit its own standby node information to the other standby nodes only once after the detection. The transmitted information then corresponds to a single point in time after the failure detection for each active-node failure detection, and all standby nodes can make the shift determination from the same information. Transmitting the same standby node information for a given point in time several times is acceptable, because whichever copy is used, the information is the same. In addition, if identification information identifying the active node whose failure was detected is added to the standby node information before transmission, then even when failures occur in several active nodes, it is possible to identify which active-node failure a given piece of standby node information relates to.
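One way to picture this is a sender that freezes its information the first time a given failure is detected and reuses the frozen copy for any retransmission. This is a minimal sketch; the names `NodeInfo`, `SnapshotSender`, and `report_for` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    failed_active_id: str   # identifies which active-node failure this report is for
    cpu_utilization: float

class SnapshotSender:
    """Freezes one report per detected failure (hypothetical helper)."""

    def __init__(self, node_id: str, read_cpu: Callable[[], float]):
        self.node_id = node_id
        self._read_cpu = read_cpu
        self._frozen: Dict[str, NodeInfo] = {}

    def report_for(self, failed_active_id: str) -> NodeInfo:
        # Measure only on the first call for this failure; later calls reuse it.
        if failed_active_id not in self._frozen:
            self._frozen[failed_active_id] = NodeInfo(
                self.node_id, failed_active_id, self._read_cpu()
            )
        return self._frozen[failed_active_id]

# Simulated CPU readings that change over time.
readings = iter([0.20, 0.90, 0.40])
sender = SnapshotSender("standby-1", lambda: next(readings))

first = sender.report_for("active-A")
resend = sender.report_for("active-A")   # same content as the first report
other = sender.report_for("active-B")    # a different failure gets its own report
```

Here a retransmission for `active-A` carries the same values as the first report, while the report for `active-B` is tagged with its own failure identifier.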

The standby node information is, for example, CPU utilization, free resource capacity, failure rate, or information on the bandwidth used for transmitting or receiving data, but is not limited to these.
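As a rough illustration, the items listed above could be carried in a single record and serialized for transmission. The field names, units, and the use of JSON are assumptions of this sketch, not something the patent specifies.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class StandbyNodeInfo:
    node_id: str
    cpu_utilization: float  # fraction of CPU in use, 0.0 to 1.0 (assumed unit)
    free_memory_mb: int     # one possible measure of free resource capacity
    failure_rate: float     # assumed unit: failures per 1000 hours
    bandwidth_mbps: float   # bandwidth used for sending or receiving data

def encode(info: StandbyNodeInfo) -> str:
    """Serialize the record to a JSON string for transmission."""
    return json.dumps(asdict(info), sort_keys=True)

def decode(payload: str) -> StandbyNodeInfo:
    """Rebuild the record on the receiving standby node."""
    return StandbyNodeInfo(**json.loads(payload))

info = StandbyNodeInfo("standby-1", 0.35, 2048, 0.01, 100.0)
payload = encode(info)
restored = decode(payload)
```

Every receiver that decodes the same payload ends up with an identical record, which is the property the determination step relies on.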

Furthermore, the determination algorithm 4 used for the determination is preferably the same in each of the standby nodes 1, 2, and 3. If the determination algorithm 4 is the same and the standby node information it uses is common, the determination results of the standby nodes 1, 2, and 3 are the same.

With this configuration, the information used to determine whether each standby node shifts to an active node can be made consistent among all the standby nodes. Because the determination result is then the same among all the standby nodes, no standby node needs to transmit its determination result to the other standby nodes. However, the present invention does not exclude a standby node transmitting its determination result to other standby nodes; it may do so.

Hereinafter, specific embodiments will be described in detail with reference to the drawings.

<First Embodiment>

Referring to FIG. 2, in the first embodiment of the present invention, the system in which the failover method is realized has two standby managers (nodes) 100 and 101.

The manager 100 includes a standby list storage unit 110, an information acquisition unit 120, a distribution unit 130, an aggregation unit 140, and a failure detection unit 150.

The manager 101 includes a standby list storage unit 111, an information acquisition unit 121, a distribution unit 131, an aggregation unit 141, and a failure detection unit 151.

Each of these units operates roughly as follows.

When the failure detection unit 150 detects a failure of the active manager, it notifies the distribution unit 130. On being notified of the failure, the distribution unit 130 acquires the list of all standby managers from the standby list storage unit 110, acquires the information on its own standby manager from the information acquisition unit 120, and transmits this information only once to the aggregation units 140 and 141 in the respective standby managers.

Similarly, when the failure detection unit 151 detects a failure of the active manager, it notifies the distribution unit 131. On being notified of the failure, the distribution unit 131 acquires the list of all standby managers from the standby list storage unit 111, acquires the information on its own standby manager from the information acquisition unit 121, and transmits this information only once to the aggregation units 140 and 141 in the respective standby managers.

Here, the standby node information is, for example, CPU utilization, free resource capacity, failure rate, or information on the bandwidth used for transmitting or receiving data, but is not limited to these.

The aggregation unit 140 has a determination algorithm 1400 that determines whether its own standby manager shifts from standby to active. The algorithm 1400 is the same determination algorithm in the standby managers 100 and 101, but the type of determination algorithm does not matter. The aggregation unit 140 aggregates the information on each standby manager received from the distribution units 130 and 131 in the respective standby managers and, using the aggregated information as input data to the determination algorithm 1400, determines whether its own manager shifts to the active manager.

Similarly, the aggregation unit 141 has a determination algorithm 1401 that determines whether its own standby manager shifts from standby to active. The algorithm 1401 is the same determination algorithm in the standby managers 100 and 101, but the type of determination algorithm does not matter. The aggregation unit 141 aggregates the information on each standby manager received from the distribution units 130 and 131 in the respective standby managers and, using the aggregated information as input data to the determination algorithm 1401, determines whether its own manager shifts to the active manager.

Next, the overall operation of the present embodiment will be described in detail with reference to FIG. 2 and the sequence diagram of FIG. 3.

First, when a failure occurs in the active manager, the failure detection unit 150 of the manager 100 detects the failure (S210) and notifies the distribution unit 130. Similarly, the failure detection unit 151 of the manager 101 detects the failure (S211) and notifies the distribution unit 131.

Next, the distribution unit 130 acquires the list of all standby managers (in this example, the managers 100 and 101) from the standby list storage unit 110 (S220) and acquires the information on the manager 100 from the information acquisition unit 120 (S230). The distribution unit 130 then transmits its own information to the aggregation units of all managers (in this example, the aggregation units 140 and 141) (S240). Similarly, the distribution unit 131 acquires the list of all standby managers (in this example, the managers 100 and 101) from the standby list storage unit 111 (S221) and acquires the information on the manager 101 from the information acquisition unit 121 (S231). The distribution unit 131 then transmits its own information to the aggregation units of all managers (in this example, the aggregation units 140 and 141) (S241).

Finally, the aggregation unit 140 aggregates the information from the distribution units of all managers (in this example, the distribution units 130 and 131) (S250) and determines whether its own manager shifts to the active system (S260). Similarly, the aggregation unit 141 aggregates the information from the distribution units of all managers (in this example, the distribution units 130 and 131) (S251) and determines whether its own manager shifts to the active system (S261).

The method of acquiring the list of all standby managers from the standby list storage units 110 and 111 (S220 and S221), the specific content of the information held by the information acquisition units 120 and 121 (S230 and S231), the method of communication from the distribution units 130 and 131 to the aggregation units 140 and 141 (S240 and S241), and the specific method by which the aggregation units 140 and 141 determine the shift to the active system (S260 and S261) can be changed as appropriate without departing from the spirit of the present invention.

As described above, in the present embodiment, the information used to determine the shift to the active system is the same in all aggregation units. Therefore, the manager that shifts to the active system can be selected autonomously from among the plurality of standby managers.
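The sequence S210 to S261 can be sketched as a small in-process simulation. The class below folds the four units of each manager into one object, and the selection rule (most free capacity wins, ties broken by name) is an assumption of this sketch; the patent leaves the determination algorithm open.

```python
from typing import Dict, List

class Manager:
    def __init__(self, name: str, free_capacity: int):
        self.name = name
        self.free_capacity = free_capacity        # stands in for the information acquisition unit
        self.standby_list: List["Manager"] = []   # stands in for the standby list storage unit
        self.received: Dict[str, int] = {}        # filled by the aggregation unit

    def on_active_failure(self) -> None:
        """Failure detected: the distribution unit sends this manager's info once to every aggregation unit."""
        info = self.free_capacity                 # S230/S231: read own information
        for peer in self.standby_list:            # S220/S221: the list includes this manager itself
            peer.receive(self.name, info)         # S240/S241: send to each aggregation unit

    def receive(self, sender: str, info: int) -> None:
        self.received[sender] = info              # S250/S251: aggregate

    def should_become_active(self) -> bool:
        """S260/S261 with an assumed rule: most free capacity wins, ties broken by name."""
        winner = max(self.received, key=lambda n: (self.received[n], n))
        return winner == self.name

m100 = Manager("manager100", free_capacity=30)
m101 = Manager("manager101", free_capacity=70)
for m in (m100, m101):
    m.standby_list = [m100, m101]

for m in (m100, m101):                            # both managers detect the failure (S210/S211)
    m.on_active_failure()

decisions = {m.name: m.should_become_active() for m in (m100, m101)}
print(decisions)
```

Both managers end up holding the same aggregated data, so each reaches the same conclusion locally (here, that `manager101` shifts to active) without exchanging results.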

Next, Example 1, which corresponds to the first embodiment, will be described.

As shown in FIG. 4, the system of Example 1 includes a computer network 400; nodes 410 to 412 and 419, which are its components; four managers 420 to 422 and 429 operating on the respective nodes; and, held by the respective managers, four standby list storage units 430 to 432 and 439, four information acquisition units 440 to 442 and 449, four distribution units 450 to 452 and 459, and four aggregation units 460 to 462 and 469.

In such a system, consider the case where the manager 429 on the node 419 is operating as the active manager, the managers 420 to 422 on the other nodes 410 to 412 are operating as standby managers, and the node 419 is disconnected from the computer network 400.

First, each of the distribution units 450 to 452 acquires the list of all standby managers, namely the managers 420 to 422, from its standby list storage unit 430 to 432, acquires its own information from its information acquisition unit 440 to 442, and transmits its own information to the aggregation units 460 to 462.

Next, each of the aggregation units 460 to 462 aggregates the information on the managers 420 to 422 received from all of the distribution units 450 to 452, and determines whether its own manager shifts to the active manager.

Suppose that, as a result of aggregating the information on the managers 420 to 422, the manager that should shift to the active manager is, for example, the manager 421. Each of the aggregation units 460 to 462 then determines that the manager 421 shifts to the active system; the manager 421 shifts to the active system, and the managers 420 and 422 do not shift and continue operating as standby managers.

In the conventional method, the information on the managers 420 to 422 that each manager holds for determining whether to shift to the active system could not be made consistent across all managers. In this example it can, so each manager can determine on its own whether to shift to the active system.

In Example 1 described above, the distribution units 450 to 452 acquire the list of all standby managers, namely the managers 420 to 422, from the standby list storage units 430 to 432 and transmit information to the aggregation units 460 to 462, but they may also transmit it to the aggregation unit 469 of the manager 429, which is the active manager. The manager 429, which is the active manager, may also perform the same operation as the standby managers 420 to 422.

Another example of Example 1 will be described.

In the example described above, suppose that, as a result of aggregating the information on the managers 420 to 422, the managers that should shift to the active manager are, for example, the managers 421 and 422. Each of the aggregation units 460 to 462 then determines that the managers 421 and 422 shift to the active system; the managers 421 and 422 shift to the active system, and the manager 420 does not shift and continues operating as a standby manager.
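A determination rule that can promote more than one manager might be threshold-based rather than winner-takes-all. The following sketch assumes a hypothetical rule in which every manager whose free capacity meets a minimum is promoted; the figures and the function name `managers_to_promote` are illustrative only.

```python
from typing import Dict, List

def managers_to_promote(infos: Dict[str, int], min_free_capacity: int) -> List[str]:
    """Return, in sorted order, every manager whose free capacity meets the threshold."""
    return sorted(name for name, free in infos.items() if free >= min_free_capacity)

# The same aggregated information is assumed to be held by every aggregation unit.
infos = {"manager420": 10, "manager421": 60, "manager422": 80}

promoted = managers_to_promote(infos, min_free_capacity=50)
none_promoted = managers_to_promote(infos, min_free_capacity=100)
print(promoted)        # prints: ['manager421', 'manager422']
print(none_promoted)   # prints: []
```

Because each aggregation unit evaluates the same rule on the same data, all of them arrive at the same set of managers, whether that set contains one manager, several, or none.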

Still another example of Example 1 will be described.

In the example described above, suppose that, as a result of aggregating the information on the managers 420 to 422, there is no manager that should shift to the active manager. Each of the aggregation units 460 to 462 then determines that no manager shifts to the active system, and the managers 420 to 422 continue operating as standby managers without shifting.

<Second Embodiment>

A second embodiment of the present invention will be described in detail with reference to the drawings. FIG. 5 is a block diagram of a system according to the second embodiment.

The system according to the second embodiment includes two standby managers 200 and 201 and a standby list storage unit 210.

The manager 200 includes an information acquisition unit 220, a distribution unit 230, an aggregation unit 240, and a failure detection unit 250.

The manager 201 includes an information acquisition unit 221, a distribution unit 231, an aggregation unit 241, and a failure detection unit 251.

The second embodiment differs from the first embodiment in that the managers do not each have a standby list storage unit storing the list of all standby managers; instead, a standby list storage unit 210 shared by the managers is provided. The other parts are the same as in the first embodiment, so a detailed description is omitted.

When the failure detection unit 250 of the manager 200 detects a failure of the active manager, it notifies the distribution unit 230. The distribution unit 230 acquires the list of all standby managers from the standby list storage unit 210, acquires its own information from the information acquisition unit 220, and transmits the acquired information to the aggregation units 240 and 241 in the respective standby managers.

Similarly, when the failure detection unit 251 of the manager 201 detects a failure of the active manager, it notifies the distribution unit 231. The distribution unit 231 acquires the list of all standby managers from the standby list storage unit 210, acquires its own information from the information acquisition unit 221, and transmits the acquired information to the aggregation units 240 and 241 in the respective standby managers.

次に、集約部２４０は、各待機系マネージャ内の配付部２３０及び２３１から受信した各待機系マネージャの情報を集約し、判定アルゴリズムにより、自身が現用系マネージャに移行するか否かを判定する。 Next, the aggregating unit 240 aggregates the information of each standby system manager received from the distribution units 230 and 231 in each standby system manager, and determines whether or not it shifts to the active system manager by a determination algorithm. .

同様に、集約部２４１は、各待機系マネージャ内の配付部２３０及び２３１から受信した各待機系マネージャの情報を集約し、判定アルゴリズムにより、自身が現用系マネージャに移行するか否かを判定する。 Similarly, the aggregating unit 241 aggregates the information of each standby system manager received from the distribution units 230 and 231 in each standby system manager, and determines whether or not it shifts to the active system manager by a determination algorithm. .
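The flow of the second embodiment can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names, the use of a CPU load value as each manager's "information", and the rule "lowest load wins, ties broken by lowest ID" are all assumptions made for illustration. The point it shows is that both managers read the same shared list object, and that each aggregation unit reaches its decision locally from identical inputs.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStandbyList:
    """Stands in for the shared standby list storage unit 210 (hypothetical class)."""
    manager_ids: list = field(default_factory=list)


@dataclass
class StandbyManager:
    manager_id: int
    shared_list: SharedStandbyList   # a reference to the shared list, not a private copy
    cpu_load: float                  # assumed form of the manager's "own information"
    inbox: dict = field(default_factory=dict)  # reports received by the aggregation unit

    def distribute(self, managers):
        """Distribution unit: send own information to every listed standby manager."""
        for target_id in self.shared_list.manager_ids:
            managers[target_id].inbox[self.manager_id] = self.cpu_load

    def should_become_active(self):
        """Aggregation unit: identical rule on every manager (assumed rule)."""
        winner = min(self.inbox, key=lambda mid: (self.inbox[mid], mid))
        return winner == self.manager_id


shared = SharedStandbyList([200, 201])
managers = {
    200: StandbyManager(200, shared, cpu_load=0.7),
    201: StandbyManager(201, shared, cpu_load=0.3),
}
for m in managers.values():      # both managers have detected the failure
    m.distribute(managers)
decisions = {mid: m.should_become_active() for mid, m in managers.items()}
```

Because every manager holds a reference to the same `SharedStandbyList`, adding or removing a standby manager requires a single update rather than one per manager.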

第２の実施の形態では、各マネージャが待機系リスト記憶部を持たず、各マネージャが共有する待機系リスト記憶部２１０から全ての待機系マネージャの一覧を取得するように構成されている。従って、マネージャの一覧等の更新を待機系リスト記憶部２１０についてだけ行うだけでよい。 In the second embodiment, no manager has its own standby list storage unit; instead, each manager obtains the list of all standby managers from the standby list storage unit 210 shared by the managers. Accordingly, updates to the manager list and the like need to be made only in the standby list storage unit 210.

＜第３の実施の形態＞ <Third Embodiment>

本発明の第３の実施の形態について図面を参照して詳細に説明する。図６は第３の実施の形態のシステムのブロック図である。 A third embodiment of the present invention will be described in detail with reference to the drawings. FIG. 6 is a block diagram of a system according to the third embodiment.

図５を参照すると、本発明の第３の実施の形態におけるフェイルオーバーシステムは、二つの待機系のマネージャ３００，３０１を有する。 Referring to FIG. 5, the failover system according to the third embodiment of the present invention has two standby managers 300 and 301.

マネージャ３００は、待機系リスト記憶部３１０と、情報取得部３２０と、配付部３３０と、集約部３４０と、障害検知部３５０とを含む。 The manager 300 includes a standby list storage unit 310, an information acquisition unit 320, a distribution unit 330, an aggregation unit 340, and a failure detection unit 350.

マネージャ３０１は、待機系リスト記憶部３１１と、情報取得部３２１と、配付部３３１と、集約部３４１と、障害検知部３５１とを含む。 The manager 301 includes a standby list storage unit 311, an information acquisition unit 321, a distribution unit 331, an aggregation unit 341, and a failure detection unit 351.

第３の実施の形態が第１の実施の形態と異なる所は、障害検知部３５０，３５１が現用系マネージャの障害を検出するだけではなく、どの現用系マネージャが障害したかを識別し、この情報を各待機系マネージャの配付部３３０，３３１に送信する。そして、配付部３３０，３３１は、障害した現用系マネージャを識別する識別情報を、自身の待機系マネージャの情報に付加して集約部３４０，３４１に送信する点である。更に、各待機系マネージャの集約部３４０，３４１は、障害した現用系マネージャの識別情報が同じ待機系の情報を用いて、移行の判定を行う点が異なる。 The third embodiment differs from the first embodiment in the following respects. The failure detection units 350 and 351 not only detect a failure of an active manager but also identify which active manager has failed, and pass this information to the distribution units 330 and 331 of their respective standby managers. The distribution units 330 and 331 then add identification information identifying the failed active manager to the information on their own standby manager and transmit it to the aggregation units 340 and 341. Furthermore, the aggregation units 340 and 341 of the standby managers make the transition determination using only standby manager information that carries the same failed-active-manager identification information.

このような構成にすることにより、複数の現用系マネージャがほぼ同時に障害した場合にでも、異なる現用系マネージャの障害時の待機系マネージャの情報を用いて判定することがなくなる。 With this configuration, even when a plurality of active managers fail almost simultaneously, a determination never mixes in standby manager information that was collected for the failure of a different active manager.

尚、他の部分は、第１の実施の形態と同様なものなので、詳細な説明を省略する。 Since other parts are the same as those in the first embodiment, detailed description thereof is omitted.

次に、図６及び図７のシーケンス図を参照して本実施の形態の全体の動作について詳細に説明する。 Next, the overall operation of the present embodiment will be described in detail with reference to the sequence diagrams of FIGS. 6 and 7.

まず、現用系マネージャに障害が発生すると、マネージャ３００の障害検知部３５０が障害を検知し（Ｓ３１０）、いずれの現用系マネージャの障害を検出したかを配付部３３０に通知する。同様に、現用系マネージャに障害が発生すると、マネージャ３０１の障害検知部３５１が障害を検知し（Ｓ３１１）、いずれの現用系マネージャの障害を検出したかを配付部３３１に通知する。 First, when a failure occurs in an active manager, the failure detection unit 350 of the manager 300 detects the failure (S310) and notifies the distribution unit 330 of which active manager's failure it has detected. Similarly, the failure detection unit 351 of the manager 301 detects the failure (S311) and notifies the distribution unit 331 of which active manager's failure it has detected.

配付手段３３０は、待機系リスト記憶部３１０から全ての待機系マネージャの一覧（マネージャ３００及び３０１）を取得し、情報取得部３２０からマネージャ３００の情報を取得し、マネージャ３００の情報に障害した現用系マネージャの識別情報を付加して全マネージャの集約部（本例では、集約部３４０及び３４１）に送信する（Ｓ３４０）。 The distribution unit 330 obtains the list of all standby managers (managers 300 and 301) from the standby list storage unit 310, obtains the information on the manager 300 from the information acquisition unit 320, adds the identification information of the failed active manager to the information on the manager 300, and transmits it to the aggregation units of all managers (in this example, the aggregation units 340 and 341) (S340).

同様に、配付手段３３１は、待機系リスト記憶部３１１から全ての待機系マネージャの一覧（マネージャ３００及び３０１）を取得し、情報取得部３２１からマネージャ３０１の情報を取得し、マネージャ３０１の情報に障害した現用系マネージャの識別情報を付加して全マネージャの集約部（本例では、集約部３４０及び３４１）に送信する（Ｓ３４１）。 Similarly, the distribution unit 331 obtains the list of all standby managers (managers 300 and 301) from the standby list storage unit 311, obtains the information on the manager 301 from the information acquisition unit 321, adds the identification information of the failed active manager to the information on the manager 301, and transmits it to the aggregation units of all managers (in this example, the aggregation units 340 and 341) (S341).

これらのステップは、現用系マネージャの障害の検出ごとに行う。 These steps are performed each time a failure of an active manager is detected.

最後に、集約部３４０は、全マネージャの配付部（本例では、配付部３３０及び３３１）からのマネージャの情報を集約し（Ｓ３５０）、障害が発生した現用系マネージャの識別情報が同じマネージャの情報を用いて、自身が現用系に移行するか否かを判定する（Ｓ３６０）。同様に、集約部３４１は、全マネージャの配付部（本例では、配付部３３０及び３３１）からのマネージャの情報を集約し（Ｓ３５０）、障害が発生した現用系マネージャの識別情報が同じマネージャの情報を用いて、自身が現用系に移行するか否かを判定する（Ｓ３６０）。 Finally, the aggregation unit 340 aggregates the manager information from the distribution units of all managers (in this example, the distribution units 330 and 331) (S350) and, using the manager information that carries the same identification information of the failed active manager, determines whether its own manager should shift to active operation (S360). Similarly, the aggregation unit 341 aggregates the manager information from the distribution units of all managers (in this example, the distribution units 330 and 331) (S350) and, using the manager information that carries the same identification information of the failed active manager, determines whether its own manager should shift to active operation (S360).

このように、複数の現用系マネージャがほぼ同時に障害した場合には、各待機系マネージャの情報取得部は、障害が発生した現用系マネージャの数だけ自身の待機系マネージャの情報を取得し、配付部はそれらの情報に障害が発生した現用系マネージャを識別する識別情報を付加して送信する。各待機系マネージャの集約部は、識別情報毎、すなわち、障害が発生した現用系マネージャの数だけ移行判定を行う。このような構成にすることにより、どの現用系マネージャが障害した時点での待機系マネージャの情報に関する情報であるかを判別することができ、全ての集約部において、複数回行われる各移行判定においても、判定に用いる情報は同一となる。 Thus, when a plurality of active managers fail almost simultaneously, the information acquisition unit of each standby manager obtains the information on its own standby manager once for each failed active manager, and the distribution unit adds to each piece of information the identification information identifying the corresponding failed active manager before transmitting it. The aggregation unit of each standby manager makes one transition determination per identification information, that is, as many determinations as there are failed active managers. With this configuration, it is possible to tell which active manager's failure a given piece of standby manager information corresponds to, so that in every aggregation unit the information used is identical for each of the multiple transition determinations.
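The tagging and per-identifier grouping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Report` type, its field names, the placeholder active-manager IDs 1 and 2, and the numeric payload are hypothetical and do not appear in the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    failed_active_id: int   # identification information added by the distribution unit
    sender_id: int          # standby manager that sent the report
    info: float             # assumed payload, e.g. CPU utilization at snapshot time


def group_by_failed_active(reports):
    """Aggregation unit: split received reports by the failed active manager they refer to."""
    groups = defaultdict(dict)
    for r in reports:
        groups[r.failed_active_id][r.sender_id] = r.info
    return dict(groups)


# Standby managers 300 and 301 each report once per failed active manager.
# The reports for the two failures arrive interleaved.
received = [
    Report(1, 300, 0.40),
    Report(2, 301, 0.55),
    Report(2, 300, 0.45),
    Report(1, 301, 0.50),
]
groups = group_by_failed_active(received)
```

Each entry of `groups` then feeds one transition determination, so reports collected for one failure are never mixed with those collected for another, regardless of arrival order.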

実施例２として、図８のブロック図、図９のシーケンス図を利用して説明する。本例は、複数の現用系のマネージャがほぼ同時に障害した場合の動作の例を示す。 The second embodiment will be described with reference to the block diagram of FIG. 8 and the sequence diagram of FIG. This example shows an example of the operation when a plurality of active managers fail almost simultaneously.

図５に示すような、コンピュータネットワーク５００と、その構成要素である５個のノード５１０〜５１２，５１８，５１９と、各前記ノード上で動作する５個のマネージャ５２０〜５２２，５２８，５２９と、各前記マネージャが保持している５個の待機系リスト記憶部５３０〜５３２，５３８，５３９と、５個の情報取得部５４０〜５４２，５４８，５４９と、５個の配付部５５０〜５５２，５５８，５５９と、５個の集約部５６０〜５６２，５６８，５６９と、５個の障害検知部５７０〜５７２，５７８，５７９を含むシステムについて、ノード５１８およびノード５１９上のマネージャ５２８および５２９が現用系として動作し、ノード５１０〜５１２上の各マネージャ５２０〜５２２が待機系として動作している際に、ノード５１８およびノード５１９がコンピュータネットワーク５００から切断された場合の動作の形態を説明する。 Consider a system such as that shown in FIG. 5, which includes a computer network 500; five nodes 510 to 512, 518, and 519 that are its components; five managers 520 to 522, 528, and 529 operating on those nodes; and, held by the respective managers, five standby list storage units 530 to 532, 538, and 539, five information acquisition units 540 to 542, 548, and 549, five distribution units 550 to 552, 558, and 559, five aggregation units 560 to 562, 568, and 569, and five failure detection units 570 to 572, 578, and 579. The managers 528 and 529 on the nodes 518 and 519 operate as active managers, and the managers 520 to 522 on the nodes 510 to 512 operate as standby managers. The following describes the operation when the nodes 518 and 519 are disconnected from the computer network 500.

まず、各々の障害検知部５７０〜５７２は、ノード５１８の障害を検知し、配付部５５０〜５５２に通知する（Ｓ６１０〜Ｓ６１２）。 First, each of the failure detection units 570 to 572 detects a failure of the node 518 and notifies the distribution units 550 to 552 (S610 to S612).

配付部５５０〜５５２は、障害検知部５７０〜５７２からの通知を受けて、各々の待機系リスト記憶部５３０〜５３２より全ての待機系マネージャの一覧であるマネージャ５２０〜５２２を取得し（Ｓ６２０〜Ｓ６２２）、各々の情報取得部５４０〜５４２より自身の情報を取得する（Ｓ６３０〜Ｓ６３２）。そして、ノード５１８の識別情報を付加して全ての集約部５６０〜５６２に自身の情報を送信する（Ｓ６４０〜Ｓ６４２）。 On receiving the notification from the failure detection units 570 to 572, the distribution units 550 to 552 obtain the list of all standby managers, namely the managers 520 to 522, from their respective standby list storage units 530 to 532 (S620 to S622) and obtain the information on their own managers from their respective information acquisition units 540 to 542 (S630 to S632). They then add the identification information of the node 518 and transmit their own information to all of the aggregation units 560 to 562 (S640 to S642).

また、各々の障害検知部５７０〜５７２は、ノード５１９の障害を検知し、配付部５５０〜５５２に通知する（Ｓ６１３〜Ｓ６１５）。 Each of the failure detection units 570 to 572 also detects the failure of the node 519 and notifies the distribution units 550 to 552 (S613 to S615).

配付部５５０〜５５２は、障害検知部５７０〜５７２からの通知を受けて、各々の待機系リスト記憶部５３０〜５３２より全ての待機系マネージャの一覧であるマネージャ５２０〜５２２を取得し（Ｓ６２３〜Ｓ６２５）、各々の情報取得部５４０〜５４２より自身の情報を取得する（Ｓ６３３〜Ｓ６３５）。そして、ノード５１９の識別情報を付加して全ての集約部５６０〜５６２に自身の情報を送信する（Ｓ６４３〜Ｓ６４５）。 On receiving the notification from the failure detection units 570 to 572, the distribution units 550 to 552 obtain the list of all standby managers, namely the managers 520 to 522, from their respective standby list storage units 530 to 532 (S623 to S625) and obtain the information on their own managers from their respective information acquisition units 540 to 542 (S633 to S635). They then add the identification information of the node 519 and transmit their own information to all of the aggregation units 560 to 562 (S643 to S645).

次に、各々の集約部５６０〜５６２は、配付部５５０〜５５２から受信した情報を集約する（Ｓ６５０〜Ｓ６５２）。 Next, each of the aggregation units 560 to 562 aggregates the information received from the distribution units 550 to 552 (S650 to S652).

続いて、集約部５６０〜５６２は、障害した現用系マネージャの識別情報が同じ情報を用いて判定を行う（Ｓ６６０〜Ｓ６６２）。例えば、集約部５６０は、各配付部５５０〜５５２からそれぞれ２回ずつ情報を受け取り、それら計６個の情報は、現用系マネージャ５２８の障害に関するものと現用系マネージャ５２９の障害とに関するものである。そこで、３個の現用系マネージャ５２８の障害に関する情報を用いて、現用系マネージャ５２８に関する移行の判定を行い、３個の現用系マネージャ５２９の障害に関する情報を用いて、現用系マネージャ５２９に関する移行の判定を行う。各集約部５６０〜５６２が、例えば、現用系マネージャ５２８，５２９それぞれの障害の結果として現用系に移行すべき待機系マネージャがそれぞれ５２１，５２２であると判断した場合、待機系マネージャ５２１および５２２は双方とも現用系に移行し、待機系マネージャ５２０は現用系に移行せず待機系のまま動作を続行する。 Subsequently, the aggregation units 560 to 562 each make the determination using the information that carries the same identification information of the failed active manager (S660 to S662). For example, the aggregation unit 560 receives information twice from each of the distribution units 550 to 552; of these six pieces of information, three relate to the failure of the active manager 528 and three to the failure of the active manager 529. It therefore uses the three pieces relating to the failure of the active manager 528 to make the transition determination for the active manager 528, and the three pieces relating to the failure of the active manager 529 to make the transition determination for the active manager 529. If, for example, each of the aggregation units 560 to 562 determines that the standby managers that should take over as a result of the failures of the active managers 528 and 529 are 521 and 522, respectively, then the standby managers 521 and 522 both shift to active operation, while the standby manager 520 does not and continues operating as a standby.
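The outcome in this example can be reproduced with a short Python sketch. The patent does not specify the determination algorithm or any concrete values, so the utilization figures below and the rule "lowest utilization wins, ties broken by lowest manager ID" are assumptions chosen only so that the result matches the example (521 takes over for 528, 522 takes over for 529).

```python
# Reports held by every aggregation unit after S650 to S652, already grouped by
# failed active manager. Values are hypothetical CPU utilizations captured by each
# standby manager at the corresponding detection.
snapshots = {
    528: {520: 0.60, 521: 0.20, 522: 0.40},
    529: {520: 0.60, 521: 0.50, 522: 0.30},
}


def choose_successor(group):
    """Assumed rule: lowest utilization wins, ties broken by lowest manager ID."""
    return min(group, key=lambda mid: (group[mid], mid))


def decide(own_id, groups):
    """One determination per failed active manager; return those this manager takes over."""
    return [failed for failed, group in sorted(groups.items())
            if choose_successor(group) == own_id]


# Each standby manager evaluates the same six reports with the same rule,
# so no manager needs to tell the others what it decided.
results = {own_id: decide(own_id, snapshots) for own_id in (520, 521, 522)}
```

Under these assumed values, manager 521 takes over for 528, manager 522 takes over for 529, and manager 520 remains a standby.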

複数のノードから構成される分散計算機システムであって、システム内に存在するマネージャによってそれらのノードが管理されているシステムに対して本発明を利用することができる。 The present invention can be applied to a distributed computer system composed of a plurality of nodes, in which those nodes are managed by a manager existing in the system.

１００，１０１マネージャ 100, 101 Manager
１１０，１１１待機系リスト記憶部 110, 111 Standby list storage unit
１２０，１２１情報取得部 120, 121 Information acquisition unit
１３０，１３１配付部 130, 131 Distribution unit
１４０，１４１集約部 140, 141 Aggregation unit
１５０，１５１障害検知部 150, 151 Failure detection unit

Claims

A failover method for a failover system comprising an active node and at least two standby nodes, wherein each of the standby nodes:
upon detecting that the active node is not functioning as the active node, transmits information on itself as of a certain time after the detection to the other standby nodes;
obtains the information on the other standby nodes; and
determines whether to move from standby node to active node using the information on itself and the obtained information on the other standby nodes.

The failover method according to claim 1, wherein each of the standby nodes does not transmit the determined result to another standby node.

The failover method according to claim 1 or 2, wherein each standby node has a determination algorithm for determining a transition from the standby node to the active node using the acquired standby node information, and the determination algorithm is the same in every standby node.

The failover method according to claim 1, wherein each of the standby nodes transmits the same information as information of its standby node to another standby node.

The failover method, wherein each standby node, after detecting that the active node is not functioning as an active node, transmits the information of its own standby node to the other standby nodes once.

The failover method according to claim 1, wherein each of the standby nodes adds, to the information of its own standby node, identification information identifying the active node detected as not functioning as an active node, and determines whether to move from the standby node to the active node using the information of standby nodes having the same identification information.

The failover method, wherein the information of the standby node is at least one of CPU operation rate, free resource capacity, failure rate, and information on the bandwidth used for data transmission or reception.

A failover system having at least one active node and at least two standby nodes, wherein each of the standby nodes comprises:
detecting means for detecting that the at least one active node is not functioning as an active node;
standby node information transmission means for transmitting, to the other standby nodes, information of its own standby node as of a certain time after the detection by the detecting means; and
determination means for acquiring the information of its own standby node and the information of the other standby nodes and determining whether to move from the standby node to the active node using the acquired standby node information.

The failover system according to claim 8, wherein each of the standby nodes does not transmit the determination result to another standby node.

The failover system according to claim 8 or 9, wherein the determination means has a determination algorithm for determining a transition from a standby node to an active node using the acquired standby node information, and the determination algorithm is the same in every standby node.

The failover system according to any one of claims 8 to 10, wherein the standby node information transmitting unit transmits the same information as information of its standby node to another standby node.

The failover system, wherein the standby node information transmitting unit transmits the information of its own standby node to the other standby nodes once after the detection by the detection unit.

The failover system according to any one of claims 8 to 12, wherein the standby node information transmission means adds, to the information of its own standby node, identification information identifying the active node detected as not functioning as the active node and transmits it to the other standby nodes, and
the determination unit determines whether to move from the standby node to the active node using the information of standby nodes having the same identification information.

The failover system, wherein the information of the standby node is at least one of CPU operation rate, free resource capacity, failure rate, and information on the bandwidth used for data transmission or reception.

The failover system according to claim 8, wherein each standby node has standby node information storage means for storing information of a standby node that transmits information of the standby node.

The failover system according to claim 8, further comprising standby node information storage means for storing information of the standby node that transmits the information of the standby node,
wherein the standby nodes share the standby node information storage means.

A node in a failover system having at least one active node and at least two standby nodes, the node comprising:
detecting means for detecting that at least one active node is not functioning as an active node;
standby node information transmission means for transmitting, to the other standby nodes, information of its own standby node as of a certain time after the detection by the detecting means; and
determination means for acquiring the information of its own standby node and the information of the other standby nodes and determining whether to move from the standby node to the active node using the acquired standby node information.

The node according to claim 17, wherein the node does not transmit the determination result to another standby node.

The node according to claim 17 or 18, wherein the determination unit has a determination algorithm for determining a transition from the standby node to the active node using the acquired standby node information, and the determination algorithm is the same in every standby node.

The node according to any one of claims 17 to 19, wherein the standby node information transmission unit transmits the same information as information of its standby node to another standby node.

The node according to claim 17, wherein, after the detection by the detection unit, the standby node information transmission unit transmits the information of its own standby node to the other standby nodes once.

The node according to any one of claims 17 to 21, wherein the standby node information transmitting means adds, to the information of its own standby node, identification information identifying the active node detected as not functioning as the active node and transmits it to the other standby nodes, and
the determination unit determines whether to move from the standby node to the active node using the information of standby nodes having the same identification information.

The node, wherein the information of the standby node is at least one of CPU operation rate, free resource capacity, failure rate, and information on the bandwidth used for data transmission or reception.

The node according to any one of claims 13 to 17, further comprising standby node information storage means for storing information of a standby node that transmits information of the standby node.

A program for a node in a failover system having at least one active node and at least two standby nodes, the program causing the node to execute:
a detection process for detecting that at least one active node is not functioning as an active node;
a standby node information transmission process for transmitting, to the other standby nodes, information of its own standby node as of a certain time after the detection; and
a determination process for acquiring the information of its own standby node and the information of the other standby nodes and determining whether to move from the standby node to the active node using the acquired standby node information.

The program according to claim 25, wherein, in the determination process, the determination result is not transmitted to another standby node.

The program according to claim 25 or claim 26, wherein the determination processing determines whether to move from the standby node to the active node using a determination algorithm that is the same in each standby node.

The program according to any one of claims 25 to 27, wherein the standby node information transmission process transmits the same information to another standby node as information of its own standby node.

The program according to any one of claims 25 to 28, wherein the standby node information transmission processing transmits the information of its own standby node once to another standby node after detecting the active node.

The program according to claim 25, wherein the standby node information transmission process adds, to the information of its own standby node, identification information identifying the active node detected as not functioning as an active node and transmits it to the other standby nodes, and
the determination processing determines whether to move from the standby node to the active node using the information of standby nodes having the same identification information.

The program, wherein the information of the standby node is at least one of CPU operation rate, free resource capacity, failure rate, and information on the bandwidth used for data transmission or reception.