JP2005258947A

JP2005258947A - Duplexing system and multiplexing control method

Info

Publication number: JP2005258947A
Application number: JP2004071495A
Authority: JP
Inventors: Kotaro Endo; 浩太郎遠藤
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2004-03-12
Filing date: 2004-03-12
Publication date: 2005-09-22
Anticipated expiration: 2024-03-12
Also published as: JP3910967B2

Abstract

PROBLEM TO BE SOLVED: To prevent the occurrence of split brain in a duplex system.

SOLUTION: Each of server computers 10-1 and 10-2 determines that the other has failed when it cannot recognize the other through heartbeat mechanisms 12-1 and 12-2. In this state, a server connection unit 32 of a client computer 30 connects to one of the server computers 10-1 and 10-2, for example the active server computer 10-1, and turns ON a client connection flag 130-1 of that server computer 10-1. When the flag 130-1 is ON, a multiplexing control unit 13-1 of the server computer 10-1 determines that the server computer 10-1 and the client computer 30 form a group constituting a majority in the system, and causes a server processing unit 11-1 of the server computer 10-1 to execute processing.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a duplex system that is composed of two mutually communicable server computers, one of which operates as the active system and the other as the standby system, and to a multiplexing control method.

In recent years, computer and network technologies have improved remarkably, and business operations have accordingly been widely computerized. Depending on the business, however, interruption due to a failure or the like often cannot be tolerated. It has therefore recently become common to build distributed systems in which a plurality of computers are connected by a network.

Multiplexing of computer processing is known as one method of operating a distributed system. In a distributed system, each computer may fail independently. If the whole system stops functioning when just one computer fails, the availability of the system is lower than that of a single computer. To prevent this, processing that concerns the whole system must be multiplexed. Conversely, multiplexing can make the availability of a distributed system higher than that of a single computer. For example, if a distributed system composed of two computers, each with 99% availability, is not multiplexed (duplexed) at all, its availability is about 98%. If it is duplexed, the availability is about 99.99%. Such a duplex system is described in, for example, Patent Document 1. The duplex system described in Patent Document 1 is a duplex server system composed of two server units. Patent Document 1 also describes that the slave server unit (standby server computer) switches over to become the master server unit only when all of the client terminals using the duplex system have notified the slave server unit that they cannot receive service from the master server unit (active server computer). In such a duplex system, the two server units can be prevented from becoming masters at the same time even if they cannot communicate with each other.
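The 98% and 99.99% figures quoted above follow from elementary probability if the two computers are assumed to fail independently, as the text states. The following minimal Python sketch reproduces the arithmetic; the function names are illustrative and do not come from the patent.

```python
def availability_without_multiplexing(single: float, count: int) -> float:
    """The system works only if every computer works (no multiplexing)."""
    return single ** count


def availability_with_multiplexing(single: float, count: int) -> float:
    """The system works unless every computer is down at once (multiplexed)."""
    return 1.0 - (1.0 - single) ** count


single = 0.99  # availability of one computer, as in the example in the text
print(f"not duplexed: {availability_without_multiplexing(single, 2):.4f}")  # 0.9801
print(f"duplexed:     {availability_with_multiplexing(single, 2):.4f}")     # 0.9999
```

The duplexed figure is an upper bound under these assumptions: it presumes that failover itself always succeeds.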

<Failover method>
As a method of multiplexing computer processing in a distributed system, a method in which another computer takes over the processing when a computer failure is detected has long been known. This method is called the failover method.

In the failover method, computer failures are generally detected by having the computers communicate with each other periodically to check each other's operating status. This communication is called a "heartbeat". A failure stop of a computer is detected by a heartbeat timeout. That is, a computer that has not sent a heartbeat for a certain period of time is regarded as having failed and stopped.
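Heartbeat-timeout detection as described here can be sketched in a few lines of Python. This is only an illustration: the class name, the 3-second timeout, and the explicit `now_s` timestamps are assumptions chosen to keep the example deterministic, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks when the peer's last heartbeat arrived (hypothetical helper)."""
    timeout_s: float
    last_seen_s: float = 0.0

    def record_heartbeat(self, now_s: float) -> None:
        # Called whenever a heartbeat from the peer is received.
        self.last_seen_s = now_s

    def peer_considered_failed(self, now_s: float) -> bool:
        # The peer is *regarded* as failed once no heartbeat has arrived
        # for longer than the timeout; it may in fact still be running.
        return (now_s - self.last_seen_s) > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record_heartbeat(now_s=10.0)
print(monitor.peer_considered_failed(now_s=12.0))  # False
print(monitor.peer_considered_failed(now_s=13.5))  # True
```

Note that a timeout cannot distinguish a stopped peer from a slow peer or a broken link, which is exactly why erroneous detection is possible.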

In a distributed system that uses the failover method, the occurrence of split brain is a problem. Split brain means that the execution context (state) splits into two or more. Split brain occurs when a failure is detected erroneously. For example, when the computers constituting a distributed system become unable to communicate between two computer groups (network partitioning), each group detects a failure of the other. Both groups then start operating independently, resulting in split brain. Alternatively, an abnormally high load may temporarily interrupt a computer's heartbeat transmission so that a failure is detected; split brain can then also occur after that computer resumes operation.

Multiplexed processing is generally processing that is important within the distributed system. When split brain occurs, that processing therefore loses consistency, with fatal consequences for the whole system.

<Majority method>
A method using majority voting (the majority method) is known as a way to fundamentally solve the split brain problem of the failover method. In the majority method, the same processing is executed on all of the multiplexed computers, and processing continues, regardless of the behavior of the other computers, as long as the computers that make up a majority of the whole can be made to agree in their operation. The majority method is one type of quorum method. In a quorum method, the same processing is executed on all of the multiplexed computers, and processing continues, regardless of the behavior of the other computers, as long as the computers that make up a quorum can be made to agree in their operation. The majority method is the quorum method in which the quorum is a majority of the whole, and it is described in, for example, Patent Document 2.

The majority method avoids split brain in principle. For example, suppose that processing is multiplexed (triplicated) across three computers X, Y, and Z, and that a network partition separates them into group A, consisting of the two computers X and Y, and group B, consisting of the single computer Z. Even in this case, processing in group A continues, whereas processing in group B is suspended. Suspension here means a state in which processing cannot proceed until the number of computers whose operation can be made to agree, including the computer itself, reaches a majority.
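The X, Y, Z example can be checked with a short Python sketch of the majority test. The helper name `has_majority` is hypothetical; the rule itself (strictly more than half of all multiplexed computers) is the ordinary definition of a majority.

```python
def has_majority(group_size: int, total: int) -> bool:
    """True if the group holds strictly more than half of all computers."""
    return 2 * group_size > total


all_computers = {"X", "Y", "Z"}
group_a = {"X", "Y"}  # one side of the network partition
group_b = {"Z"}       # the other side

for name, group in (("A", group_a), ("B", group_b)):
    state = "continues" if has_majority(len(group), len(all_computers)) else "is suspended"
    print(f"group {name} {state}")
```

With only two computers, a partition leaves one computer on each side and `has_majority(1, 2)` is false for both, so neither side could continue under a plain majority rule.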

Likewise, if computer Z hangs under an abnormally high load, processing in the computer group that still holds a majority (here, group A) continues regardless of computer Z. When computer Z recovers, its processing does not start running on its own, because Z alone cannot form a majority. That is, computer Z resumes operation only after resynchronizing with the processing of the computer group that holds the majority.
JP 2000-330814 (paragraph 0013, FIG. 1)
JP 2001-117895 (paragraph 0007, paragraphs 0018 to 0022, FIGS. 1 to 5)

The majority method described above requires at least three computers. It therefore cannot be used in the simplest multiplexed system, one composed of two computers, that is, a duplex system composed of duplexed computers.

In the duplex system described in Patent Document 1, on the other hand, the master and slave roles are switched between the duplexed server units (server computers) in the system when all of the client terminals using the system have notified the slave server unit that they cannot receive service from the master server unit. This duplex system can avoid a kind of split brain in which the two server units become masters at the same time, even if the two server units cannot communicate with each other.

In this duplex system, however, even if every client terminal has become unable to receive service from the master server unit, the master and slave are not switched if even one client terminal has failed, because the failed client terminal does not notify the slave server unit that it cannot receive service from the master server unit. In that case, the client terminals can receive service from neither the master nor the slave.

The present invention has been made in view of the above circumstances. Its object is to provide a duplex system and a multiplexing control method that prevent split brain from occurring between duplexed server computers while preventing, as far as possible, situations in which service is no longer provided to client computers.

According to one aspect of the present invention, there is provided a duplex system for providing a service to N client computers (N is an integer of 1 or more), the duplex system being composed of two mutually communicable server computers, one of which operates as the active system and the other as the standby system. Each of the two server computers of the duplex system comprises: server processing means for executing server processing for providing the service; client connection state management means for managing the state of connection with the N client computers; and multiplexing control means for controlling whether the server processing means executes the server processing, by majority decision based on the state of connection with the N client computers managed by the client connection state management means.

In the above configuration, for a client computer to receive service from the duplex system, it must connect to whichever of the two server computers in the system is able to provide the service at that time. Normally, the client computer connects to the active server computer. When the active server computer fails and the client computer can no longer communicate with it, that is, can no longer recognize it, the client computer switches its connection destination from the active server computer to the standby server computer. The fact that a client computer has connected to one of the two server computers therefore means that the client computer was able to recognize that server computer. If n denotes the number of client computers connected to a server computer, that server computer and the n client computers can be said to form one group. Accordingly, if each of the two server computers manages the state of its connections with the N client computers, it can determine from that connection state whether it belongs to a group that constitutes a majority of the total number of computers, client computers included.
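The group-membership test described in this paragraph can be sketched as follows. One interpretive assumption is made and should be read as such: the total number of computers is taken to be N + 2 (the N client computers plus the two server computers), and a server's group is itself plus the n clients currently connected to it. The function name is hypothetical.

```python
def server_in_majority_group(n_connected: int, n_clients: int) -> bool:
    """Does this server, together with its connected clients, form a majority?

    Assumption (not stated as a formula in the text): the total is the
    N clients plus the 2 server computers.
    """
    total = n_clients + 2       # N client computers + 2 server computers
    group = 1 + n_connected     # this server + the n clients connected to it
    return 2 * group > total


# N = 1, the case of the first embodiment: one client, two servers.
print(server_in_majority_group(n_connected=1, n_clients=1))  # True  (2 of 3)
print(server_in_majority_group(n_connected=0, n_clients=1))  # False (1 of 3)
```

Because each client connects to only one server at a time, the two servers' groups are disjoint, so both cannot hold a majority simultaneously.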

In the above configuration, each of the two server computers constituting the duplex system can therefore correctly decide, from the connection state, whether to execute server processing, by a kind of majority decision that also includes the client computers using the server computers. As a result, even if the two server computers become unable to recognize each other and each determines that the other has failed, that is, even if a network partition occurs, each of the two server computers can correctly decide from the connection state whether to execute server processing.

According to the present invention, even if a network partition occurs between the two server computers constituting the duplex system (the active server computer and the standby server computer), each of them can decide whether to execute server processing itself, by a majority decision that also includes the client computers, based on the state of its connections with the N client computers. This prevents split brain and also prevents, as far as possible, situations in which service is no longer provided to the client computers.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of a duplex system according to the first embodiment of the present invention. The duplex system of FIG. 1 is composed of two server computers 10-1 and 10-2, which are configured to communicate with each other via a network 20. The server computers 10-1 and 10-2 (the duplex system composed of them) can be used from a client computer 30 via the network 20. In this embodiment, the server computers 10-1 and 10-2 are database server computers on which a database server runs, and the client computer 30 is an application server computer that uses the database server computers. On this application server computer runs an application server that provides services to client terminals using the application server computer.

The server computers 10-1 and 10-2 include server processing units 11-1 and 11-2, heartbeat mechanisms 12-1 and 12-2, and multiplexing control units 13-1 and 13-2, respectively. The server processing units 11-1 and 11-2 function as servers that execute server processing (a server program). In this embodiment, the server computers 10-1 and 10-2 are database server computers, and the server processing units 11-1 and 11-2 function as database servers. The heartbeat mechanisms 12-1 and 12-2 exchange heartbeats with each other to confirm that each other is normal.

The multiplexing control units 13-1 and 13-2 control whether the corresponding server computers 10-1 and 10-2 become the active system and run the server processing units 11-1 and 11-2, or become the standby system and make the server processing units 11-1 and 11-2 wait. For this control, the multiplexing control units 13-1 and 13-2 hold client connection flags 130-1 and 130-2, which serve as client connection state management means for managing the state of connection between the corresponding server computers 10-1 and 10-2 and the client computer 30. The flags 130-1 and 130-2 indicate whether the client computer 30 is connected to the server computers 10-1 and 10-2, respectively. Here, "the client computer 30 is connected to the server computer 10-1 or 10-2" means that a session has been established between that server computer and the client computer 30.

When the heartbeat mechanisms 12-1 and 12-2 of the corresponding server computers 10-1 and 10-2 recognize each other as normal, the multiplexing control units 13-1 and 13-2 let the server processing units 11-1 and 11-2 continue executing processing if their own server computer is the active system, and keep the server processing units 11-1 and 11-2 stopped if it is the standby system. When the heartbeat mechanisms 12-1 and 12-2 do not recognize the other party, the multiplexing control units 13-1 and 13-2 cause the server processing units 11-1 and 11-2 to execute processing if the client connection flags 130-1 and 130-2 are ON, and stop the processing if the flags are OFF.
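The decision rule of the multiplexing control unit described in this paragraph reduces to a small function. The sketch below is an illustration only; `Role` and `should_run_server_processing` are hypothetical names, and the three inputs stand for the server's current role, the heartbeat mechanism's view of the peer, and the client connection flag.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


def should_run_server_processing(role: Role, peer_recognized: bool,
                                 client_flag_on: bool) -> bool:
    """Decide whether this server's processing unit should run."""
    if peer_recognized:
        # Heartbeats are being exchanged: each server keeps its current role.
        return role is Role.ACTIVE
    # The peer is not recognized: the client connection flag decides.
    return client_flag_on


# A standby server that loses the heartbeat and gains the client takes over.
print(should_run_server_processing(Role.STANDBY, peer_recognized=False,
                                   client_flag_on=True))   # True
# An active server that loses both the heartbeat and the client stops.
print(should_run_server_processing(Role.ACTIVE, peer_recognized=False,
                                   client_flag_on=False))  # False
```

While heartbeats are exchanged the flag plays no part in the decision; it matters only once the peer can no longer be recognized.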

The client computer 30, for its part, includes a client processing unit 31 and a server connection unit 32. The client processing unit 31 functions as a client that executes client processing (a client program). The server connection unit 32 manages connections with the server computers 10-1 and 10-2. The server connection unit 32 turns ON the client connection flag 130-i of the server computer 10-i (i is 1 or 2) to which the client computer 30 is connected. When the server connection unit 32 switches the connection destination of the client computer 30 from the server computer 10-j (j is 1 or 2, where j ≠ i) to the server computer 10-i, it turns OFF the client connection flag 130-j of the original connection destination, the server computer 10-j.
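The flag handling performed by the server connection unit can be sketched as below. This is a simplified model under stated assumptions: the dictionary `client_flags` stands in for the flags 130-1 and 130-2, which in the described system are held on the server computers, and the sketch does not model how a flag on an unreachable server gets cleared, which this passage does not specify.

```python
from typing import Dict, Optional


class ServerConnectionUnit:
    """Hypothetical model of the client-side server connection unit."""

    def __init__(self, client_flags: Dict[str, bool]) -> None:
        self.client_flags = client_flags          # stand-in for flags 130-1 / 130-2
        self.connected_to: Optional[str] = None   # current connection destination

    def connect(self, server: str) -> None:
        previous = self.connected_to
        if previous is not None and previous != server:
            # Switching destination: turn OFF the flag of the original server.
            self.client_flags[previous] = False
        # Turn ON the flag of the server now connected to.
        self.client_flags[server] = True
        self.connected_to = server


flags = {"10-1": False, "10-2": False}
unit = ServerConnectionUnit(flags)
unit.connect("10-1")
print(flags)  # {'10-1': True, '10-2': False}
unit.connect("10-2")
print(flags)  # {'10-1': False, '10-2': True}
```

The property that matters for the majority decision is that a single client never leaves both flags ON at once.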

Next, the operation of the system of FIG. 1 will be described.
<Normal operation>
First, the normal operation, in which both server computers 10-1 and 10-2 are normal, will be described with reference to the system state diagram of FIG. 2. Assume that the server computer 10-1 is operating as the active system and the server computer 10-2 as the standby system. The heartbeat mechanisms 12-1 and 12-2 of the server computers 10-1 and 10-2 exchange periodic communications called "heartbeats" with each other via the network 20, thereby checking the operating status of the other server computer, that is, whether the other is operating normally.

Assume here that the heartbeat mechanisms 12-1 and 12-2 can recognize each other's server computer by exchanging heartbeats, and have therefore confirmed, as shown in FIG. 2, that the other server computer is operating normally. In this case, the server computer 10-1 continues to operate as the active system, and the server computer 10-2 continues to operate as the standby system.

In this situation, even if the server connection unit 32 of the client computer 30 sends a connection request to the standby server computer 10-2, the request is rejected. The client computer 30 therefore cannot connect to the server computer 10-2. A connection request from the server connection unit 32 of the client computer 30 to the active server computer 10-1, on the other hand, is accepted. The client computer 30 is therefore connected to the active server computer 10-1, as indicated by arrow 201 in FIG. 2. In this state, the server connection unit 32 of the client computer 30 turns ON the client connection flag 130-1 of the server computer 10-1.

When the client connection flag 130-1 is ON, the multiplexing control unit 13-1 of the server computer 10-1 causes the server processing unit 11-1 of the server computer 10-1 to execute processing. That is, of the server computers 10-1 and 10-2, the active server computer 10-1 executes server processing and provides the service to the client computer 30. In the standby server computer 10-2, meanwhile, the server processing unit 11-2 is in a stopped state.

<Operation when the active system stops>
Next, the operation when the active server computer 10-1 stops will be described with reference to the system state diagram of FIG. 3. When the active server computer 10-1 stops because of a failure or the like, heartbeats from the heartbeat mechanism 12-1 of the server computer 10-1 cease for longer than a certain period of time. In this case, the heartbeat mechanism 12-2 of the standby server computer 10-2 cannot recognize the active server computer 10-1, and therefore provisionally determines that the server computer 10-1 has failed. In this situation, the client computer 30 cannot communicate with the active server computer 10-1 and cannot receive service from it. The server connection unit 32 of the client computer 30 therefore switches the connection destination from the active server computer 10-1 to the standby server computer 10-2, as indicated by arrow 301 in FIG. 3, and turns ON the client connection flag 130-2 of the server computer 10-2. This indicates that the client computer 30 has recognized the failure of the active server computer 10-1.

Accordingly, the fact that the server connection unit 32 of the client computer 30 has turned ON the client connection flag 130-2 of the server computer 10-2 indicates that, in a system composed of three computers in total (the two server computers 10-1 and 10-2 and the one client computer 30), both the server computer 10-2 and the client computer 30 recognize the failure of the remaining server computer 10-1. In other words, the system state of FIG. 3 indicates that the standby server computer 10-2 and the client computer 30 together form a group that constitutes a majority of the total number of computers, the client computer 30 included.

Therefore, when the client connection flag 130-2 turns ON, the multiplexing control unit 13-2 of the standby server computer 10-2 determines that the server computer 10-2 belongs to a group that constitutes a majority of the total number of computers in the system. In this case, because the server computer 10-2 is the standby system, the multiplexing control unit 13-2 causes the server processing unit 11-2 of the server computer 10-2 to take over (fail over) the processing that was being performed by the server processing unit 11-1 of the active server computer 10-1. The server computer 10-2 thereby switches from the standby system to the active system.

<Operation during a network partition>
Next, the operation when a network partition occurs between the active server computer 10-1 and the standby server computer 10-2, which were in the state of FIG. 2, will be described with reference to the system state diagram of FIG. 4.

Assume that, although the active server computer 10-1 and the standby server computer 10-2 are both operating normally, a communication fault or similar cause has produced a state in which the heartbeat mechanism 12-1 of the server computer 10-1 and the heartbeat mechanism 12-2 of the server computer 10-2 cannot exchange heartbeats, as shown in FIG. 4; that is, a network partition has occurred. In this state, the heartbeat mechanisms 12-1 and 12-2 of the server computers 10-1 and 10-2 cannot recognize each other, and each determines that the other server computer has failed. At this time, the server connection unit 32 of the client computer 30 can connect to either the active server computer 10-1 or the standby server computer 10-2.

Assume here that the server connection unit 32 of the client computer 30 connects to the active server computer 10-1, as shown in FIG. 4, and turns ON the client connection flag 130-1 of the server computer 10-1. This operation is not performed if the client computer 30 is already connected to the active server computer 10-1. When the multiplexing control unit 13-1 of the server computer 10-1 has determined that the other party (the server computer 10-2) has failed and the client connection flag 130-1 is ON, it determines that the server computer 10-1 belongs to a group that constitutes a majority of the total number of computers in the system. In this case, because the server computer 10-1 is the active system, the multiplexing control unit 13-1 lets the server processing unit 11-1 of the server computer 10-1 continue executing processing.

On the other hand, even when the multiplexing control unit 13-2 of the server computer 10-2 determines that the other party (the server computer 10-1) has failed, if the client connection flag 130-2 is OFF, as in this example, it judges that the server computer 10-2 does not belong to a group that holds a majority of the total number of computers in the system. In this case, because the server computer 10-2 is the standby system, the multiplexing control unit 13-2 keeps the server processing unit 11-2 of the server computer 10-2 stopped.

As described above, in the present embodiment, even if a network partition occurs between the server computers 10-1 and 10-2, each of the server computers 10-1 and 10-2 can correctly decide whether to execute server processing from the state of the client connection flags 130-1 and 130-2, that is, from its connection state with the client computer 30. This prevents split brain from occurring.
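
The decision each server makes once it has judged its partner to have failed can be sketched as follows. This is only an illustrative sketch, not the implementation described in the specification: the function names and the returned strings are hypothetical, and the total of three computers (two servers plus one client) is an assumption that matches the first embodiment.

```python
def in_majority_group(client_flag_on: bool, total_computers: int = 3) -> bool:
    """Return True if this server's group holds a majority of the system.

    Assumption: the system consists of two servers and one client, and the
    group contains this server plus the client when the flag is ON.
    """
    group_size = 1 + (1 if client_flag_on else 0)
    return 2 * group_size > total_computers


def decide_after_partner_failure(is_active: bool, client_flag_on: bool) -> str:
    """Hypothetical decision taken after the heartbeat mechanism reports
    that the partner server cannot be recognized."""
    if in_majority_group(client_flag_on):
        # Majority side: the active server continues, a standby takes over.
        return "continue" if is_active else "take_over"
    # Minority side: the active server stops, a standby stays stopped.
    return "stop" if is_active else "stay_stopped"


# Network partition of FIG. 4: the client is connected to server 10-1 only.
print(decide_after_partner_failure(is_active=True, client_flag_on=True))    # continue
print(decide_after_partner_failure(is_active=False, client_flag_on=False))  # stay_stopped
```

Because the single client can be connected to only one server at a time, at most one server sees its flag ON, so at most one side ever decides to run server processing.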

<Operation when heartbeat transmission is congested due to a heavy load on the active system>
Next, the operation when heartbeat transmission from the heartbeat mechanism 12-1 of the server computer 10-1 to the server computer 10-2 is congested will be described with reference to the system state diagram of FIG. 5.

Assume now that the active server computer 10-1, which was in the state shown in FIG. 2, becomes heavily loaded and heartbeat transmission from the heartbeat mechanism 12-1 of the server computer 10-1 to the server computer 10-2 is congested. As shown in FIG. 5, the heartbeat mechanism 12-2 of the standby server computer 10-2 cannot recognize the active server computer 10-1. In this state, even if the client computer 30 is connected to the server computer 10-1, the provision of service from the server computer 10-1 to the client computer 30 is also congested.

Therefore, the server connection unit 32 of the client computer 30 switches the connection destination from the active server computer 10-1 to the standby server computer 10-2, as indicated by arrow 501 in FIG. 5, and turns ON the client connection flag 130-2 of the server computer 10-2. At this time, the server connection unit 32 turns OFF the client connection flag 130-1 of the active server computer 10-1.

When the client connection flag 130-2 turns ON, the multiplexing control unit 13-2 of the standby server computer 10-2 judges that the server computer 10-2 belongs to a group that holds a majority of the total number of computers in the system. In this case, because the server computer 10-2 is the standby system, the multiplexing control unit 13-2 has the server processing unit 11-2 of the server computer 10-2 take over the processing that was being performed by the server processing unit 11-1 of the active server computer 10-1. As a result, the server computer 10-2 switches from the standby system to the active system.

On the other hand, when the client connection flag 130-1 turns OFF, the multiplexing control unit 13-1 of the active server computer 10-1 judges that the server computer 10-1 does not belong to a group that holds a majority of the total number of computers in the system. In this case, because the server computer 10-1 is the active system, the multiplexing control unit 13-1 stops the operation of the server processing unit 11-1 of the server computer 10-1. As a result, the server computer 10-1 switches from the active system to the standby system.
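
The switchover described above can be illustrated with a small simulation. This is a hedged sketch under stated assumptions: the class names `Server` and `ServerConnectionUnit` and the method `on_flag_changed` are hypothetical, and the order in which the two flags are updated is an assumption, since the text only says that both updates happen at the time of the switch.

```python
class Server:
    """Hypothetical model of one server computer in the first embodiment."""

    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.active = active        # True = active system, False = standby
        self.client_flag = False    # client connection flag

    def on_flag_changed(self) -> None:
        # Multiplexing control: flag ON means this server is in the majority
        # group and runs server processing; flag OFF means it must not.
        self.active = self.client_flag


class ServerConnectionUnit:
    """Hypothetical model of the client's server connection unit 32."""

    def __init__(self, initial: Server) -> None:
        self.current = initial
        initial.client_flag = True

    def switch_to(self, new: Server) -> None:
        old = self.current
        new.client_flag = True    # turn ON the flag of the new destination
        old.client_flag = False   # turn OFF the flag of the old destination
        self.current = new
        old.on_flag_changed()     # old server stops (active -> standby)
        new.on_flag_changed()     # new server takes over (standby -> active)


s1 = Server("10-1", active=True)
s2 = Server("10-2", active=False)
client = ServerConnectionUnit(s1)
client.switch_to(s2)              # arrow 501 in FIG. 5
print(s1.active, s2.active)       # False True
```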

In the first embodiment described above, the server connection unit 32 of the client computer 30 turns ON the client connection flag of the destination server computer only when it connects to that server computer. When it switches the connection destination, the server connection unit 32 turns OFF the client connection flag of the original destination server computer. However, the server connection unit 32 may instead periodically perform an update operation (ON operation) that turns ON the client connection flag of the destination server computer. In this case, the multiplexing control unit of the server computer is preferably provided with OFF operation means for turning the client connection flag OFF; the OFF operation means periodically monitors the client connection flag and turns it OFF if no ON operation has been performed for a certain period of time. To make it possible to detect that no ON operation has been performed for a certain period, a time stamp whose time information is updated at every ON operation may, for example, be attached to the client connection flag and compared with the current time.
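
The time-stamped flag variant can be sketched as below. This is a minimal sketch, assuming a monotonic clock and a fixed timeout; the class name `TimestampedClientFlag`, its methods, and the `timeout_s` parameter are hypothetical names introduced for illustration only.

```python
import time
from typing import Callable, Optional


class TimestampedClientFlag:
    """Client connection flag with a time stamp refreshed on every ON operation."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s            # assumed "certain period of time"
        self._clock = clock
        self._on = False
        self._last_on: Optional[float] = None

    def on_operation(self) -> None:
        """Periodic ON operation performed by the client's server connection unit."""
        self._on = True
        self._last_on = self._clock()         # update the time stamp

    def monitor(self) -> bool:
        """OFF operation means: called periodically on the server side.

        Turns the flag OFF when no ON operation arrived within timeout_s,
        then returns the current flag state.
        """
        if self._on and self._clock() - self._last_on >= self.timeout_s:
            self._on = False
        return self._on
```

With this arrangement the client does not need to turn the old flag OFF explicitly: a server that the client has stopped refreshing loses its flag once the timeout elapses.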

[Second Embodiment]
FIG. 6 is a block diagram showing the configuration of a duplex system according to a second embodiment of the present invention. In FIG. 6, elements similar to those in FIG. 1 are given the same reference numerals for convenience.

The duplex system of FIG. 6, like the duplex system of FIG. 1, is composed of two server computers 10-1 and 10-2 that can communicate with each other via the network 20. The duplex system of FIG. 6 differs from that of FIG. 1 in that it (that is, the server computers 10-1 and 10-2 constituting it) is used by N client computers 30-1 to 30-N (N is an integer of 2 or more). The duplex system of FIG. 6 is characterized in that, when the number of client computers connected to the server computer 10-i (i = 1, 2) is a majority of the total number of client computers (that is, exceeds N/2), the server computer 10-i executes server processing, and when it is not a majority (that is, is N/2 or less), the server computer 10-i does not execute server processing.

Accordingly, the multiplexing control units 13-1 and 13-2 of the server computers 10-1 and 10-2 have client connection counters 131-1 and 131-2 in place of the client connection flags 130-1 and 130-2 of FIG. 1. The client connection counters 131-1 and 131-2 are client connection state management means for managing the connection state between the server computers 10-1 and 10-2 and the client computers, and hold the numbers C1 and C2 of client computers connected to the server computers 10-1 and 10-2, respectively (the connection counts).

Each of the client computers 30-1 to 30-N has a client processing unit 31-1 to 31-N and a server connection unit 32-1 to 32-N, corresponding to the client processing unit 31 and the server connection unit 32 in FIG. 1. When the client processing unit 31-k of the client computer 30-k (k = 1 to N) is to receive a service from the server computer 10-i, the server connection unit 32-k connects to the server computer 10-i by sending it a connection request. At this time, the server connection unit 32-k increments the value of the client connection counter 131-i of the server computer 10-i by 1. When the server connection unit 32-k switches the connection destination of the client computer 30-k from the server computer 10-j (j is 1 or 2, with j ≠ i) to the server computer 10-i, it also decrements the value of the client connection counter 131-j of the original destination server computer 10-j by 1. As a result, the client connection counters 131-1 and 131-2 of the server computers 10-1 and 10-2 hold the numbers C1 and C2 of client computers connected to the server computers 10-1 and 10-2, respectively.

When the heartbeat mechanisms 12-1 and 12-2 of the server computers 10-1 and 10-2 recognize that both servers are normal, each of the multiplexing control units 13-1 and 13-2 has its server processing unit 11-1 or 11-2 continue executing processing if its own server is the active system, and keeps the server processing unit stopped if its own server is the standby system. When the heartbeat mechanism 12-1 or 12-2 does not recognize the other party, the multiplexing control unit determines whether the value of the client connection counter 131-1 or 131-2, that is, the number C1 or C2 of client computers connected to the server computer 10-1 or 10-2, exceeds N/2. If the connection count exceeds N/2, the multiplexing control unit has the server processing unit execute processing; if it is N/2 or less, it stops processing by the server processing unit. Thus, even if a network partition occurs between the duplexed server computers 10-1 and 10-2, or some of the N client computers 30-1 to 30-N fail, split brain between the server computers 10-1 and 10-2 is prevented while situations in which service is no longer provided to the client computers are avoided as far as possible.
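
The counter-based majority rule of the second embodiment can be sketched as follows. This is an illustrative sketch rather than the specification's implementation: the names `ClientConnectionCounter`, `has_majority`, and `should_run` are hypothetical, and the sketch assumes each client is connected to at most one server at a time, so that C1 + C2 ≤ N and at most one of C1 and C2 can exceed N/2.

```python
class ClientConnectionCounter:
    """Hypothetical model of client connection counter 131-i."""

    def __init__(self, total_clients: int) -> None:
        self.total_clients = total_clients   # N
        self.count = 0                       # C_i, number of connected clients

    def increment(self) -> None:
        """Called when a client connects to this server."""
        self.count += 1

    def decrement(self) -> None:
        """Called when a client switches away from this server."""
        if self.count > 0:
            self.count -= 1

    def has_majority(self) -> bool:
        # C_i > N/2, written with integers to avoid floating point.
        return 2 * self.count > self.total_clients


def should_run(is_active: bool, partner_recognized: bool,
               counter: ClientConnectionCounter) -> bool:
    """Decide whether the server processing unit should execute processing."""
    if partner_recognized:
        # Both servers normal: active keeps running, standby stays stopped.
        return is_active
    # Partner not recognized: run only with a majority of the clients.
    return counter.has_majority()


# Example with N = 5: three clients on server 10-1, two on server 10-2.
c1, c2 = ClientConnectionCounter(5), ClientConnectionCounter(5)
for _ in range(3):
    c1.increment()
for _ in range(2):
    c2.increment()
print(should_run(True, False, c1), should_run(False, False, c2))  # True False
```

Note that with an even N and an exact tie (for example two clients on each side when N = 4), neither server holds a majority, so neither runs server processing.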

Note that the present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing the configuration of a duplex system according to a first embodiment of the present invention.
FIG. 2 is a system state diagram in the normal state in the first embodiment.
FIG. 3 is a system state diagram when the active server computer 10-1 has stopped in the first embodiment.
FIG. 4 is a system state diagram when a network partition has occurred between the active server computer 10-1 and the standby server computer 10-2 in the first embodiment.
FIG. 5 is a system state diagram when the active server computer 10-1 is heavily loaded and heartbeat transmission is congested in the first embodiment.
FIG. 6 is a block diagram showing the configuration of a duplex system according to a second embodiment of the present invention.

Explanation of symbols

10-1, 10-2: server computer; 11-1, 11-2: server processing unit; 12-1, 12-2: heartbeat mechanism; 13-1, 13-2: multiplexing control unit; 20: network; 30, 30-1 to 30-N: client computer; 31, 31-1 to 31-N: client processing unit; 32, 32-1 to 32-N: server connection unit; 130-1, 130-2: client connection flag (client connection state management means); 131-1, 131-2: client connection counter (client connection state management means).

Claims

1. A duplex system composed of two server computers that can communicate with each other and that provide services to N client computers (N is an integer of 1 or more), one of the two server computers operating as an active system and the other operating as a standby system, wherein
each of the two server computers comprises:
server processing means for executing server processing for providing the service;
client connection state management means for managing connection states with the N client computers; and
multiplexing control means for controlling whether or not the server processing means executes server processing, by a majority decision based on the connection states with the N client computers managed by the client connection state management means.

2. The duplex system according to claim 1, wherein
each of the two server computers further comprises a heartbeat mechanism for recognizing the other by communicating with each other, and
when the heartbeat mechanism cannot recognize the other party, the multiplexing control means controls whether or not the server processing means executes server processing, by a majority decision based on the connection states with the N client computers managed by the client connection state management means.

3. The duplex system according to claim 1, wherein
the client connection state management means manages the connection states with the N client computers by a connection count indicating the number of client computers, among the N client computers, that are connected to the corresponding server computer, and
the multiplexing control means controls whether or not the server processing means executes server processing according to whether or not the connection count managed by the client connection state management means exceeds N/2.

4. The duplex system according to claim 3, wherein the connection count managed by the client connection state management means is updated when any of the N client computers switches its connection destination, being incremented by 1 when the corresponding server computer becomes the new connection destination after the switch and decremented by 1 when the corresponding server computer was the original connection destination before the switch.

5. The duplex system according to claim 1, wherein
N is 1,
the client connection state management means manages the connection state with the client computer by flag information indicating whether or not the client computer is connected to the corresponding server computer; while the client computer is connected to the corresponding server computer, the flag information is periodically updated by the client computer to a first state indicating that the client computer is connected to the corresponding server computer; and if the flag information is not updated to the first state for a certain period of time, the flag information is set to a second state indicating that the client computer is not connected to the corresponding server computer, and
the multiplexing control means controls whether or not the server processing means executes server processing according to whether the flag information managed by the client connection state management means is in the first state or the second state.

6. A multiplexing control method for controlling server processing that is duplicated between two server computers in a duplex system composed of the two server computers, which can communicate with each other and provide services to N client computers (N is an integer of 1 or more), one of the two server computers operating as an active system and the other operating as a standby system, the method comprising:
each of the two server computers managing connection states with the N client computers; and
each of the two server computers determining whether or not it executes server processing by a majority decision based on the connection states with the N client computers.

7. The multiplexing control method according to claim 6, wherein
the two server computers recognize each other by communicating with each other, and
when either of the two server computers cannot recognize the other party, it determines whether or not to execute server processing by a majority decision based on the connection states with the N client computers.