JP2010103695A

JP2010103695A - Cluster system, cluster server and cluster control method

Info

Publication number: JP2010103695A
Application number: JP2008272162A
Authority: JP
Inventors: Daisuke Sekiguchi; 大輔関口; Misaki Kakuno; みさき角野; Shinichi Watabe; 伸一渡部; Hirokazu Nagai; 浩和永井
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2008-10-22
Filing date: 2008-10-22
Publication date: 2010-05-06

Abstract

PROBLEM TO BE SOLVED: To provide a cluster system, a cluster server and a cluster control method that can detect a business application in which a failure has occurred, without using a system information management table, in a split-brain state in which the communication used for heartbeat processing is interrupted.

SOLUTION: The cluster system comprises a plurality of servers connected by a network. Each application is processed by a server of the active system, and when an application becomes abnormal, the abnormal application is processed by a server of the standby system. Each server has:
- a heartbeat monitoring part that detects whether another server is in a normal state;
- an application control part that starts the applications operating on the other server when the heartbeat with that server is abnormal;
- an application switching part that changes the MAC (Media Access Control) address associated with the application's IP address in the router to its own MAC address.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a cluster system and to a control method for that cluster system. The cluster system has a plurality of servers. When a business application running on an active server fails, the same business application is run on another server of the standby system, so that the business application is always kept in operation.

In recent years, the amount of data handled by cluster systems has grown as systems have become larger. Cluster systems therefore distribute load across a plurality of servers and provide an active system and a standby system for the business applications they run. This reduces the downtime during which a given business application is not running and improves reliability.
In such a cluster system, different predetermined business applications run on a plurality of servers. Each server communicates with the other servers so that they monitor one another (heartbeat processing). When a failure occurs in a business application on any of the computers, that business application is started on another server and continues to be executed (see, for example, cited document 1).
Cited document 1: JP 2005-69872 A

However, if the communication used for heartbeat processing is interrupted, the cluster system described above enters a split-brain state in which mutual monitoring is no longer possible. To cope with this, the system has a system information management table that every server can access in common.
The table has a column for each business application on each server, or for each business application group consisting of several applications. In that column, the monitoring software of each server writes the identifier of the server acting as the operating entity, and it periodically writes its access result there.

Each server therefore checks the column corresponding to a business application on another server. If no new time has been written for longer than a set interval, it detects that a failure has occurred in that business application. The server then performs switching processing so that it executes the failed business application itself.
However, providing this system information management table requires adding a new disk to the system, which raises the price of the entire system.
In addition, some services provided by the cluster system's applications do not need to store any data on an external disk. For such services, installing a disk solely to cope with split brain raises the price of the entire system.
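The timestamp check that the conventional table relies on can be sketched as follows. This is a minimal Python illustration under stated assumptions:
- the shared disk is modeled as an in-memory dictionary;
- each row holds the owner server's identifier and the time of its last write;
- the class and method names are hypothetical, not taken from cited document 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Row:
    owner_server: str   # identifier of the server acting as operating entity
    last_write: float   # time of the owner's most recent periodic write


@dataclass
class SystemInfoTable:
    """Hypothetical in-memory stand-in for the shared-disk system information management table."""
    rows: Dict[str, Row] = field(default_factory=dict)  # app ID (or group) -> row

    def write(self, app_id: str, server: str, now: float) -> None:
        """Periodic write performed by the monitoring software of the owning server."""
        self.rows[app_id] = Row(server, now)

    def stale_apps(self, me: str, interval: float, now: float) -> List[str]:
        """Apps owned by other servers whose row has not been refreshed within `interval`."""
        return sorted(
            app_id
            for app_id, row in self.rows.items()
            if row.owner_server != me and now - row.last_write > interval
        )
```

Every server must be able to read and write `rows`. That shared-storage requirement is exactly the extra disk the present invention aims to avoid.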

The present invention has been made in view of these circumstances. Its object is to provide a cluster system, a cluster server and a cluster control method that can detect a business application in which a failure has occurred, without using the system information management table described above, when a split-brain state arises in which the communication used for heartbeat processing is interrupted.

The cluster system of the present invention consists of a plurality of servers connected via a network. Each application is processed by a server of the active system, and when an application becomes abnormal, the abnormal application is processed by a server of the standby system. Each server has:
- a heartbeat monitoring unit that detects whether the heartbeat with another server is operating normally;
- an application control unit that, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, starts the applications running on that other server;
- an application switching unit that changes the MAC address associated with the IP address of the application in the router to its own MAC address.

In the cluster system of the present invention, when an abnormality occurs in the heartbeat, the application control unit transmits an echo packet to each of the other servers via the network. In addition to the applications it is currently executing, it then executes the applications corresponding to any server that does not reply to the echo packet.

In the cluster system of the present invention, when a reply to the echo packet is received, the application control unit logs in via the network, using virtual terminal software, to the other server that replied. It checks the operating state of each application running on that server and detects whether each application is normal.
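The decision made by the application control unit in the two preceding paragraphs can be sketched as below. This is a hypothetical Python illustration with two assumptions:
- the ICMP echo and the virtual-terminal login are passed in as callables (`echo_reply`, `remote_app_ok`) rather than implemented;
- the input mapping of peer servers to their application IDs is assumed.

```python
from typing import Callable, Dict, List


def select_takeover_apps(
    peer_apps: Dict[str, List[str]],
    echo_reply: Callable[[str], bool],
    remote_app_ok: Callable[[str, str], bool],
) -> List[str]:
    """Return the application IDs this server should start after a heartbeat failure.

    peer_apps     : peer server -> application IDs it runs (hypothetical input)
    echo_reply    : True if the peer answered an ICMP echo request (e.g. ping)
    remote_app_ok : True if, after a network login to the peer with virtual-terminal
                    software, the given application is found to be running normally
    """
    takeover: List[str] = []
    for server, apps in peer_apps.items():
        if not echo_reply(server):
            # No echo reply: treat the whole server as down and take over all its apps.
            takeover.extend(apps)
        else:
            # Echo reply received: the server is alive, so check each app individually.
            takeover.extend(app for app in apps if not remote_app_ok(server, app))
    return takeover
```

The echo packet only distinguishes a dead server from a live one. The per-application login check is what catches a live server whose individual application has failed.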

The cluster server of the present invention is used in a cluster system that consists of a plurality of servers connected via a network. In that system, each application is processed by a server of the active system, and when an application becomes abnormal, the abnormal application is processed by a server of the standby system. The cluster server has:
- a heartbeat monitoring unit that detects whether the heartbeat with another server is operating normally;
- an application control unit that, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, starts the applications running on that other server;
- an application switching unit that changes the MAC address associated with the IP address of the application in the router to its own MAC address.

The cluster control method of the present invention is used in a cluster system that consists of a plurality of servers connected via a network. In that system, each application is processed by a server of the active system, and when an application becomes abnormal, the abnormal application is processed by a server of the standby system. In each server, the method includes:
- a heartbeat monitoring step in which a heartbeat monitoring unit detects whether the heartbeat with another server is operating normally;
- an application control step in which, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, an application control unit starts the applications running on that other server;
- an application switching step in which an application switching unit changes the MAC address associated with the IP address of the application in the router to its own MAC address.

As described above, the present invention handles the case where communication heartbeat processing (hereinafter, heartbeat) can no longer be performed because the dedicated monitoring connection is broken or a running server has become abnormal, resulting in a split-brain state. Even then, a faulty application can easily be detected without the expensive shared disk for a system information management table used in the conventional system. Detection relies on two checks:
- sending an ICMP (Internet Control Message Protocol) echo packet (echo request packet) for each application, for example with PING (Packet INternet Groper);
- checking the applications by logging in to the other server over the network.

A cluster system can therefore be built at lower cost than the conventional example.

The configuration of a cluster system according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the cluster system according to the embodiment.
In this figure, the cluster system of this embodiment consists of a plurality of servers (server 1, server 2, ...) and a router 30.
Servers 1, 2, ... are connected to a network 100 and, via the router 30, to an external system such as client terminals. A switching hub may be used instead of the router 30 for the connection to the external system. In fact, any routing network device will do, provided it holds an ARP (Address Resolution Protocol) table indicating the correspondence between VIP (virtual IP) addresses and MAC (Media Access Control) addresses.

These VIP addresses are used when a client terminal connected via the network 100 or via the router 30 accesses a business application.
That is, a VIP address is set in the network segment corresponding to each business application according to the table of FIG. 3.
Each of servers 1, 2, ... provides services to the client terminals that access its business applications, through the business applications running on it. Some servers are used as the active system and process business applications; others serve as the standby system. When a business application whose processing has become abnormal is detected, it is migrated from the active server on which it was running to a standby server on which it is not running. For simplicity, this embodiment assumes two servers connected to the network 100: server 1 (active system) and server 2 (standby system).

Server 1 has a heartbeat monitoring unit 11, a business application control unit 12, a business application switching unit 13, an interface 14 and a storage unit 15.
Similarly, server 2 has a heartbeat monitoring unit 21, a business application control unit 22, a business application switching unit 23, an interface 24 and a storage unit 25.
The business application control unit 12 has a monitoring unit 121 and a control unit 122.
Similarly, the business application control unit 22 has a monitoring unit 221 and a control unit 222.
Storage units 15 and 25 store application IDs and, for each application ID, the corresponding business application (executable file).
The heartbeat monitoring unit 11 outputs a monitoring signal to the heartbeat monitoring unit 21 at each preset period. Based on whether a response is returned, it detects whether the heartbeat with the heartbeat monitoring unit 21 is operating normally.
Similarly, the heartbeat monitoring unit 21 outputs a monitoring signal to the heartbeat monitoring unit 11 at each preset period. Based on whether a response is returned, it detects whether the heartbeat with the heartbeat monitoring unit 11 is operating normally.
The heartbeat monitoring units 11 and 21 exchange heartbeat data over a network segment different from the network 100 connected to the router 30.
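A minimal sketch of one heartbeat monitoring unit follows, assuming the monitoring signal and its response are hidden behind a `probe` callable. The `max_misses` tolerance is an illustrative assumption. The description only states that the presence or absence of a response decides whether the heartbeat is normal, which corresponds to `max_misses=1`.

```python
from typing import Callable


class HeartbeatMonitor:
    """Sketch of a heartbeat monitoring unit (e.g. unit 11 watching unit 21).

    probe      : sends one monitoring signal over the dedicated heartbeat segment and
                 returns True if the peer responded within the period (hypothetical)
    max_misses : consecutive missed responses tolerated before declaring the
                 heartbeat abnormal (assumption; 1 means "first miss is a failure")
    """

    def __init__(self, probe: Callable[[], bool], max_misses: int = 1) -> None:
        self.probe = probe
        self.max_misses = max_misses
        self.misses = 0

    def tick(self) -> bool:
        """Run one monitoring period; return True while the heartbeat is normal."""
        if self.probe():
            self.misses = 0          # any response resets the miss counter
        else:
            self.misses += 1
        return self.misses < self.max_misses
```

A caller would invoke `tick()` once per preset period and start failover handling the first time it returns `False`.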

The business application control unit 12 has a monitoring unit 121 and a control unit 122.
The monitoring unit 121 monitors the business applications running on server 1, to which it belongs. At each preset period, it detects whether a failure has occurred in their operation. To do so, it sends an operation confirmation signal to each business application and judges from the presence or absence of a response signal whether that application has failed. It records each application's operating state in an internal storage unit, keyed by the application ID that identifies the business application.
The control unit 122 starts and stops business applications. When the monitoring unit 121 detects a failure, the control unit 122 stops the corresponding business application. It then transmits the application ID of the stopped business application via the heartbeat monitoring unit 11 to the heartbeat monitoring unit 21. The control unit 122 also executes the business application corresponding to any application ID notified by the heartbeat monitoring unit 11, reading it from the table in the storage unit 15.
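The monitor-stop-notify cycle of monitoring unit 121 and control unit 122 can be sketched as follows. All three callables are hypothetical stand-ins:
- `health_check` for the confirmation signal;
- `stop_app` for the stop operation;
- `notify_peer` for the transmission via heartbeat monitoring unit 11.

```python
from typing import Callable, Dict, List


class BusinessAppControl:
    """Sketch of business application control unit 12 (monitoring unit 121 + control unit 122)."""

    def __init__(
        self,
        health_check: Callable[[str], bool],
        stop_app: Callable[[str], None],
        notify_peer: Callable[[str], None],
    ) -> None:
        self.health_check = health_check  # True if the app answered the confirmation signal normally
        self.stop_app = stop_app          # stops a business application (control unit 122)
        self.notify_peer = notify_peer    # sends an app ID to the peer via heartbeat unit 11
        self.status: Dict[str, str] = {}  # app ID -> last recorded operating state

    def monitor_once(self, running: List[str]) -> List[str]:
        """One monitoring period: check each running app, then stop and report failed ones."""
        failed: List[str] = []
        for app_id in running:
            ok = self.health_check(app_id)
            self.status[app_id] = "normal" if ok else "failed"
            if not ok:
                self.stop_app(app_id)
                self.notify_peer(app_id)
                failed.append(app_id)
        return failed
```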

The business application control unit 22 has a monitoring unit 221 and a control unit 222.
The monitoring unit 221 monitors the business applications running on server 2, to which it belongs. At each preset period, it detects whether a failure has occurred in their operation. To do so, it sends an operation confirmation signal to each business application and judges from the presence or absence of a response signal whether that application has failed. It records each application's operating state in an internal storage unit, keyed by the application ID of each business application.
The control unit 222 starts and stops business applications. When the monitoring unit 221 detects a failure, the control unit 222 stops the corresponding business application. It then transmits the application ID of the stopped business application via the heartbeat monitoring unit 21 to the heartbeat monitoring unit 11. The control unit 222 also executes the business application corresponding to any application ID notified by the heartbeat monitoring unit 21, reading it from the table in the storage unit 25.

The business application switching unit 13 rewrites the ARP table stored in the router 30. To do so, it transmits to the router 30 an ARP packet containing two items: the VIP address corresponding to the application ID notified by the heartbeat monitoring unit 11, and a MAC address.
The interface 14 has a LAN adapter with MAC address #1 and is connected to the router 30 via the network 100. This LAN adapter associates a plurality of VIP addresses with the same MAC address #1 and uses a different VIP address for each application when communicating.

The business application switching unit 23 rewrites the ARP table stored in the router 30. To do so, it transmits to the router 30 an ARP packet containing the VIP address corresponding to the business application ID notified by the heartbeat monitoring unit 21, together with a MAC address.
The interface 24 has a LAN adapter with MAC address #2 and is connected to the router 30 via the network 100. This LAN adapter associates a plurality of VIP addresses with the same MAC address #2 and uses a different VIP address for each business application when communicating.
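The ARP packet sent by a switching unit is commonly realized as a gratuitous ARP, in which the sender and target protocol addresses are both the VIP. The sketch below builds such an Ethernet frame in Python using only the standard library. Several details are assumptions, since the description does not specify the packet layout:
- the reply opcode;
- the broadcast destination;
- the example addresses (a documentation address standing in for the placeholder X.X.X.X3).

```python
import socket
import struct


def build_arp_announcement(vip: str, mac: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP reply announcing vip -> mac.

    Sender and target protocol addresses are both the VIP, so a device that caches
    ARP entries (such as router 30) can rebind the VIP to the sender's MAC address.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(vip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    eth_header = struct.pack("!6s6sH", broadcast, mac_bytes, 0x0806)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                     # hardware type: Ethernet
        0x0800,                # protocol type: IPv4
        6, 4,                  # hardware / protocol address lengths
        2,                     # opcode: reply
        mac_bytes, ip_bytes,   # sender MAC / sender IP (the VIP)
        broadcast, ip_bytes,   # target MAC / target IP (the VIP again)
    )
    return eth_header + arp_body


# Hypothetical example: server 2 announces the VIP of AP3.
frame = build_arp_announcement("192.0.2.13", "02:00:00:00:00:02")
```

Transmitting the frame needs a raw link-layer socket, for example `socket.socket(socket.AF_PACKET, socket.SOCK_RAW)` on Linux with administrator privileges. That step is omitted here.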

The router 30 stores the ARP table shown in FIG. 2. This table associates the MAC addresses (Media Access Control addresses) of servers 1, 2, ... with the VIP addresses that each server holds for its business applications. Here, #1 is the MAC address of server 1 and #2 is the MAC address of server 2. The VIP addresses are provided as follows:
- X.X.X.X1 for application AP1;
- X.X.X.X2 for application AP2;
- X.X.X.X3 for application AP3;
- X.X.X.X4 for application AP4.

Each of storage units 15 and 25 stores the correspondence table shown in FIG. 3. This table associates each VIP address with the application ID identifying the business application that uses it. In FIG. 3:
- VIP address X.X.X.X1 is associated with application ID AP1;
- X.X.X.X2 is associated with AP2;
- X.X.X.X3 is associated with AP3;
- X.X.X.X4 is associated with AP4.

When the router 30 receives such an ARP packet containing a VIP address from a server, it changes the MAC address associated with that VIP address to the MAC address of the server that sent the packet.
As already mentioned, each VIP address corresponds, via the table, to the destination port number in the network segment header or UDP datagram header of packets for the corresponding business application. In other words, the VIP addresses are set up so that one IP address is assigned to each business application.
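The two tables and the router's update rule can be modeled with plain dictionaries. This sketch makes three assumptions:
- FIG. 2 is filled with the initial state used in the operation examples of this description, in which server 1 runs all four applications;
- the document's placeholder addresses are kept as-is;
- the function names are hypothetical.

```python
from typing import Dict

# ARP table of router 30 (FIG. 2): VIP address -> MAC address of the serving server.
# Filled with the initial state in which server 1 (MAC #1) runs AP1 to AP4.
arp_table: Dict[str, str] = {
    "X.X.X.X1": "#1", "X.X.X.X2": "#1", "X.X.X.X3": "#1", "X.X.X.X4": "#1",
}

# Correspondence table in storage units 15 and 25 (FIG. 3): VIP address -> application ID.
vip_to_app: Dict[str, str] = {
    "X.X.X.X1": "AP1", "X.X.X.X2": "AP2", "X.X.X.X3": "AP3", "X.X.X.X4": "AP4",
}


def vip_for_app(app_id: str) -> str:
    """Reverse lookup in the FIG. 3 table: which VIP belongs to this application?"""
    return next(vip for vip, app in vip_to_app.items() if app == app_id)


def router_receive_arp(vip: str, sender_mac: str) -> None:
    """Router 30 rebinds the VIP carried in an ARP packet to the sender's MAC address."""
    arp_table[vip] = sender_mac


# Example: server 2 (MAC #2) announces the VIP of AP3.
router_receive_arp(vip_for_app("AP3"), "#2")
```

Only the entry for the announced VIP changes. The other applications keep pointing at server 1, which is what allows failover per application rather than per server.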

Next, the operation of the cluster system according to this embodiment is described with reference to FIGS. 1 and 4. FIG. 4 is a flowchart showing an example of application switching in the cluster system of FIG. 1 when the heartbeat is normal, that is, when there is no split brain. For simplicity, the cluster system is assumed to consist of two servers, server 1 and server 2. The description focuses on server 1, and server 2 performs the same operations.
As shown in the tables of FIGS. 2 and 3, in the initial state of the cluster system of FIG. 1, applications AP1, AP2, AP3 and AP4 are running on the active server 1, and no business application is running on the standby server 2.

The monitoring unit 121 monitors the operating state of the business applications AP1, AP2, AP3 and AP4 running on server 1 at a preset period (step S1).
The monitoring unit 121 sends a confirmation signal to each business application and determines from the presence or absence of a response signal whether each application is normal (step S2).
- If a response indicating normal operation is returned, the processing is repeated.
- If there is no response, or a response indicating an abnormality is returned, the monitoring unit 121 sends the ID of the application in which the abnormality was detected (for example, AP3) to the control unit 122, and the processing proceeds to step S3.

When this application ID is input, the control unit 122 identifies the business application corresponding to AP3 from the table in the storage unit 15 (step S3) and stops it (step S4).
At this time, the business application switching unit 13 deletes from the storage unit 15 the entry describing the network segment for the VIP assigned to application AP3.

Next, the control unit 122 transmits the application ID of the stopped business application, together with an abnormality detection signal, to server 2 via the heartbeat monitoring unit 11 (step S5).
The heartbeat monitoring unit 21 of server 2 receives the application ID and the abnormality detection signal sent from the heartbeat monitoring unit 11 of server 1 (step S6).
The control unit 222 of server 2 then performs business application startup processing based on the received application ID and abnormality detection signal. It reads the business application corresponding to the received application ID AP3 from the table in the storage unit 25 and starts it.
The business application switching unit 23 then does the following (step S7):
- reads the VIP address corresponding to the started application AP3 from the table of FIG. 3;
- creates a network segment for that VIP address;
- associates the application ID of the started business application with the VIP address and writes the association to the storage unit 25.

After this, the control unit 222 of server 2 performs operation confirmation processing for application AP3.

The business application switching unit 23 also transmits to the router 30 an ARP packet containing X.X.X.X3 (the VIP address of application AP3) and MAC address #2 of server 2.
When this ARP packet is input, the router 30 rewrites, in its ARP table, the MAC address associated with VIP address X.X.X.X3 from MAC address #1 of server 1 to MAC address #2 of server 2.
As a result, subsequent accesses to application AP3 from client terminals of the external system go to application AP3 running on server 2, which has changed from the standby system to the active system.
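The standby-side part of this flow (steps S6 and S7 followed by the ARP update) might be sketched as below. The callables and the dictionary standing in for router 30 are hypothetical. The VIP lookup uses the FIG. 3 correspondence with the document's placeholder addresses.

```python
from typing import Callable, Dict

# FIG. 3 correspondence table (VIP address -> application ID), placeholder addresses.
VIP_TO_APP: Dict[str, str] = {
    "X.X.X.X1": "AP1", "X.X.X.X2": "AP2", "X.X.X.X3": "AP3", "X.X.X.X4": "AP4",
}


def take_over(
    app_id: str,
    own_mac: str,
    start_app: Callable[[str], None],
    local_vips: Dict[str, str],
    send_arp: Callable[[str, str], None],
) -> str:
    """Standby-side handling after an abnormality detection signal arrives (step S6)."""
    start_app(app_id)                 # control unit 222 starts the program from storage unit 25
    vip = next(v for v, a in VIP_TO_APP.items() if a == app_id)
    local_vips[vip] = app_id          # step S7: record the VIP <-> application ID association
    send_arp(vip, own_mac)            # switching unit 23 announces VIP -> own MAC to router 30
    return vip
```

For example, with a router table of `{"X.X.X.X3": "#1"}` and a `send_arp` that writes into it, `take_over("AP3", "#2", ...)` leaves the entry as `"#2"`. That corresponds to the rewrite from MAC address #1 to #2 described above.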

Two different switching processes are described below. Each moves business applications from a server judged abnormal to another, normal server when a split brain occurs.
<First business application switching process when split brain occurs>
This section describes the business application switching process when the heartbeat is abnormal, that is, when a split brain occurs between server 1 and server 2. FIG. 5 is a flowchart of the application switching process from the standpoint of server 2.
The heartbeat monitoring units 11 and 21 each monitor the operating state of the other's server by heartbeat (the exchange of notifications indicating server operating state, described above). Each also checks the communication status of the network serving as the dedicated monitoring connection. Here, the heartbeat monitoring unit 21 of server 2 monitors server 1 (step S11).
The heartbeat monitoring units 11 and 21 form this dedicated monitoring connection over a network segment different from the network 100 to which interfaces 14 and 24, connected to the router 30, are attached.

At a preset fixed interval, each of the heartbeat monitoring units 11 and 21 checks whether the connected peer replies. From this it determines whether the heartbeat is working, that is, whether a split brain has occurred. Here, the heartbeat monitoring unit 21 of server 2 determines whether the heartbeat with server 1 is working (step S12). In this case too, the description assumes the following initial state of the cluster system of FIG. 1: applications AP1, AP2, AP3, and AP4 are running on active server 1, and no business application is running on standby server 2.
If the heartbeat monitoring unit 21 receives a response to the monitoring signal it sent to the heartbeat monitoring unit 11, it judges that server 1 is operating normally. It then proceeds to step S13, the normal-heartbeat monitoring process already described. If no response is received, it judges that either server 1 itself or the dedicated monitoring connection has failed (a split-brain state) and proceeds to step S14.
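Steps S12 to S14 amount to a per-period reply check. The following is a minimal sketch that assumes each monitoring period yields one boolean (reply received or not). The function name and the `max_missed` parameter are hypothetical. The default of 1 mirrors the text, which moves to step S14 on the first missing reply. In practice the input would come from a loop that sends a monitoring signal and sleeps for the preset period between probes.

```python
from enum import Enum
from typing import Iterable


class Heartbeat(Enum):
    NORMAL = "normal"            # peer answered: continue normal monitoring (S13)
    SPLIT_BRAIN = "split_brain"  # no answer: peer or monitoring link is down (S14)


def evaluate_heartbeat(replies: Iterable[bool], max_missed: int = 1) -> Heartbeat:
    """Consume one reply flag per monitoring period.

    `replies` yields True when a response to the monitoring signal arrived
    in that period. `max_missed` (hypothetical) is the number of consecutive
    silent periods tolerated before declaring a split brain.
    """
    missed = 0
    for got_reply in replies:
        missed = 0 if got_reply else missed + 1
        if missed >= max_missed:
            return Heartbeat.SPLIT_BRAIN
    return Heartbeat.NORMAL
```

For example, `evaluate_heartbeat([True, True, False])` returns `Heartbeat.SPLIT_BRAIN`. Note that the heartbeat alone cannot tell a failed server 1 from a failed monitoring link. The second switching process addresses exactly this ambiguity.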

Next, the control unit 222 of server 2 extracts from the storage unit 25 the application IDs of the business applications running in the cluster system. Since no application is running on server 2 itself, it identifies every extracted application ID as belonging to a business application running on active server 1 (step S14).
The control unit 222 then starts the applications with the identified IDs, namely AP1, AP2, AP3, and AP4.
The business application switching unit 23 then reads from the table of FIG. 3 the VIP address of each started application (AP1, AP2, AP3, and AP4). It creates a network segment for each VIP address, associates each segment with the corresponding started business application, and stores these associations in the storage unit 25 (step S15).
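Steps S14 and S15 need no shared disk. Server 2 only compares the applications registered in its own correspondence table with those it is running. The following is a minimal sketch under that reading. The table contents, the function names and the `start_app` hook are hypothetical.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical contents of the correspondence table of FIG. 3 (application ID -> VIP).
APP_TO_VIP = {
    "AP1": "192.0.2.11",
    "AP2": "192.0.2.12",
    "AP3": "192.0.2.13",
    "AP4": "192.0.2.14",
}


def apps_on_peer(app_table: Mapping[str, str], running_locally: Iterable[str]) -> list[str]:
    """Step S14: every registered application not running here is assumed to run on the peer."""
    local = set(running_locally)
    return sorted(app_id for app_id in app_table if app_id not in local)


def take_over(
    app_table: Mapping[str, str],
    running_locally: Iterable[str],
    start_app: Callable[[str], None],
) -> dict[str, str]:
    """Start the peer's applications and bind each one to its VIP (steps S14 and S15)."""
    bound: dict[str, str] = {}
    for app_id in apps_on_peer(app_table, running_locally):
        start_app(app_id)                  # hypothetical hook that launches the application
        bound[app_id] = app_table[app_id]  # VIP association to be kept in storage unit 25
    return bound


started: list[str] = []
bound = take_over(APP_TO_VIP, running_locally=[], start_app=started.append)
```

With an empty local set, as in the initial state described above, all four applications are started and bound to their VIPs.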

The control unit 222 then confirms that applications AP1, AP2, AP3, and AP4 are running.
The business application switching unit 23 also transmits four ARP packets to the router 30. Each packet pairs MAC address #2 with one of the VIP addresses X.X.X.X1, X.X.X.X2, X.X.X.X3, and X.X.X.X4 (step S16).
When the ARP packets are received, the router 30 updates its ARP table. For each of VIP addresses X.X.X.X1, X.X.X.X2, X.X.X.X3, and X.X.X.X4, it rewrites the associated MAC address from MAC address #1 of server 1 to MAC address #2 of server 2.
As a result, subsequent accesses to applications AP1, AP2, AP3, and AP4 from client terminals of the external system are directed to applications AP1, AP2, AP3, and AP4 running on server 2.
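Step S16 sends one ARP packet per VIP. The patent does not specify the packet layout. The sketch below assumes the common gratuitous-ARP form: an ARP request broadcast on Ethernet whose sender and target protocol addresses are both the VIP, laid out as in RFC 826 and built with Python's standard `struct` module. The addresses are hypothetical. Actually sending the frames would need a raw socket (for example `socket.AF_PACKET` on Linux, with root privileges), which is left out here.

```python
import ipaddress
import struct

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    raw = bytes(int(octet, 16) for octet in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return raw


def gratuitous_arp(vip: str, mac: str) -> bytes:
    """Build a 42-byte Ethernet frame announcing that `vip` is now at `mac`."""
    src = mac_to_bytes(mac)
    ip = ipaddress.IPv4Address(vip).packed
    ethernet = ETH_BROADCAST + src + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6, 4,             # hardware / protocol address lengths
        1,                # operation: request (gratuitous form)
        src, ip,          # sender MAC = server's own MAC, sender IP = VIP
        b"\x00" * 6, ip,  # target MAC unknown, target IP = VIP
    )
    return ethernet + arp


# One frame per VIP of AP1..AP4, all pointing at MAC #2 of server 2 (hypothetical values).
MAC_SERVER2 = "02:00:00:00:00:02"
frames = [gratuitous_arp(f"192.0.2.1{n}", MAC_SERVER2) for n in range(1, 5)]
```

Because the frame is broadcast, any device that caches the VIP (not only the router 30) can update its entry at the same time.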

As described above, in this embodiment the business applications are forcibly switched over as soon as a split brain occurs, without checking their operating state.
Consequently, this embodiment does not need the expensive shared disk holding a system information management table that the prior art requires. When a split brain is detected, the business applications running on the heartbeat peer server can easily be switched over to the detecting server. A cluster system can therefore be built more cheaply than in the conventional example.

<Second business application switching process when split brain occurs>
Next, with reference to FIG. 6, a switching process different from that of the first embodiment is described. It is likewise performed when the heartbeat is abnormal, that is, when a split brain occurs between server 1 and server 2. FIG. 6 is a flowchart explaining the application switching process from the standpoint of server 2.
As before, the heartbeat monitoring units 11 and 21 each monitor the operating status of the other's server by heartbeat (the exchange of notifications indicating server operating status described above). They also check the communication status of the network that serves as the dedicated monitoring connection. Here, the heartbeat monitoring unit 21 of server 2 monitors server 1 (step S21).
The heartbeat monitoring units 11 and 21 form this dedicated monitoring connection between themselves over a segment different from network 100, the network through which interfaces 14 and 24 connect to the router 30.

At a preset fixed interval, each of the heartbeat monitoring units 11 and 21 checks whether the connected peer replies. From this it determines whether the heartbeat is working, that is, whether a split brain has occurred. Here, the heartbeat monitoring unit 21 of server 2 determines whether the heartbeat with server 1 is working (step S22). Again, the description assumes the following initial state of the cluster system of FIG. 1: applications AP1, AP2, AP3, and AP4 are running on active server 1, and no business application is running on standby server 2.
If the heartbeat monitoring unit 21 receives a response to the monitoring signal it sent to the heartbeat monitoring unit 11, it judges that server 1 is operating normally. It then proceeds to step S23, the normal-heartbeat monitoring process already described. If no response is received, it judges that either server 1 itself or the dedicated monitoring connection has failed and proceeds to step S24.

Next, the control unit 222 of server 2 sends an echo packet to server 1 from interface 24 over network 100 (step S24). It then checks whether a response to the echo packet is received (step S25).
If no response is received, the control unit 222 judges that server 1 itself is not operating and proceeds to step S26. There it performs steps S14 to S16 of the first embodiment and starts on server 2 the business applications that had been running on server 1.
If a response is received, the control unit 222 proceeds to step S27.
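The echo packet of step S24 is an ICMP echo request; the summary of this embodiment calls it an ICMP echo packet. The sketch below builds the request with its RFC 1071 Internet checksum and checks whether a received ICMP message is the matching echo reply (step S25). The identifier, sequence number and payload are arbitrary choices. Sending and receiving would need a raw ICMP socket, which normally requires elevated privileges, so a deployment might simply invoke the system `ping` utility instead.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def inet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes = b"cluster-probe") -> bytes:
    """Build an ICMP echo request (without IP header) with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def is_matching_reply(icmp: bytes, ident: int, seq: int) -> bool:
    """True if `icmp` (IP header already stripped) is an intact echo reply to our request."""
    if len(icmp) < 8 or inet_checksum(icmp) != 0:
        return False
    msg_type, code, _csum, r_ident, r_seq = struct.unpack("!BBHHH", icmp[:8])
    return msg_type == ICMP_ECHO_REPLY and code == 0 and (r_ident, r_seq) == (ident, seq)
```

A valid message re-checksums to zero, which is why `is_matching_reply` can verify integrity without first zeroing the checksum field.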

A response to the echo packet only shows that server 1 is operating up to L3 (layer 3). The control unit 222 therefore next detects the operating state of each business application.
To confirm operation at the higher network layers, the control unit 222 remotely logs in to the other server, server 1, through interface 24 and network 100. It uses Telnet or SSH (Secure Shell), which operating systems such as WINDOWS (registered trademark) and UNIX (registered trademark) provide as virtual terminal software. Operating remotely, it starts the business application monitoring process of the monitoring unit 121 of server 1 and checks the operating state of each business application (step S28). From the check result, it determines whether the applications on server 1 are operating normally (step S29).
If an application on server 1 is not operating normally, the control unit 222 of server 2 performs steps S14 to S16 of the first embodiment, as in step S26. It thereby starts the business application that is not operating normally on server 1 (step S30).
Suppose instead that step S29 finds all business applications operating normally, so that the monitoring unit 121 of server 1 detects no business application failure. The system is then in a split-brain state, but the active server itself can be judged free of faults. In that case the control unit 222, which is remotely logged in to server 1, ends the process without performing step S30, the step that forcibly moves business applications to standby server 2. It neither has the control unit 122 stop any business application nor sends ARP packets to the router 30.
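The branching of steps S25 to S30 can be condensed into one decision function. The following sketch is built on several assumptions, all hypothetical: a peer-side command `check-business-apps` stands in for the monitoring process of unit 121, and it prints one `APn OK` / `APn NG` line per application. The SSH invocation uses only standard OpenSSH and `subprocess` options.

```python
import subprocess
from typing import Mapping, Optional


def parse_status(report: str) -> dict[str, bool]:
    """Parse lines such as 'AP1 OK' / 'AP4 NG' (hypothetical report format)."""
    status: dict[str, bool] = {}
    for line in report.splitlines():
        parts = line.split()
        if len(parts) == 2:
            status[parts[0]] = parts[1].upper() == "OK"
    return status


def remote_check(host: str) -> dict[str, bool]:
    """Step S28: run the peer's monitoring process over SSH and collect its report.

    'check-business-apps' is a hypothetical command standing in for the
    monitoring process of monitoring unit 121.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "check-business-apps"],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return parse_status(result.stdout)


def apps_to_switch(
    registered: set[str],
    echo_replied: bool,
    remote_status: Optional[Mapping[str, bool]] = None,
) -> set[str]:
    """Decide which applications server 2 takes over (steps S25 to S30)."""
    if not echo_replied:
        # S25 -> S26: the peer is unreachable on network 100, so take over everything.
        return set(registered)
    status = remote_status or {}
    # S29: only applications explicitly reported as failed move (S30). An empty
    # result means the peer is healthy and the process ends without switching.
    # Applications missing from the report are left alone (an assumption).
    return {app for app in registered if status.get(app) is False}
```

Leaving unreported applications on server 1 is a deliberately conservative choice. It avoids running the same application on both servers when the remote check is incomplete.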

Next, suppose step S28 detects a failed business application on server 1. The control unit 122 of server 1 then reads that application's ID, for example AP4, and stops the business application corresponding to AP4. The business application switching unit 23 accordingly deletes the network segment for VIP address X.X.X.X4, which corresponds to application AP4.
On server 2, the control unit 222 then starts the application with that ID, namely AP4.
Next, the business application switching unit 23 reads the VIP address of the started application AP4 from the table of FIG. 3 and creates a network segment for that VIP address. It associates the segment with the started business application and stores the association in the storage unit 25 (step S30).
A fault may also occur in the monitoring unit 121, which monitors the business applications on active server 1, at the same time as the split brain. In that case, the remotely logged-in control unit 222 may stop the business application on server 1 itself, instead of having the control unit 122 stop it as described above.

The control unit 222 then confirms that application AP4 is running.
The business application switching unit 23 also transmits to the router 30 an ARP packet carrying VIP address X.X.X.X4 and MAC address #2.
When the ARP packet is received, the router 30 rewrites the MAC address associated with VIP address X.X.X.X4 in its ARP table from MAC address #1 of server 1 to MAC address #2 of server 2.
As a result, subsequent accesses to application AP4 from client terminals of the external system are directed to application AP4 running on server 2.
As described above, this embodiment handles the two cases differently. If active server 1 is not operating normally, the business applications are forcibly switched over once a split brain occurs, regardless of their operating state on server 1. If server 1 is operating normally, only the failed business applications on server 1 are switched over to standby server 2.
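The partial switchover for AP4 can be modeled as a state change across the two servers and the router's ARP table: stop on server 1, start and bind the VIP on server 2, then update the router via ARP. The following minimal Python sketch uses hypothetical names and addresses and only updates in-memory state. The patent does not say how a VIP's network segment is created or deleted. On Linux, one common mechanism is `ip addr add` / `ip addr del` on the service interface.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    mac: str
    vips: dict[str, str] = field(default_factory=dict)  # app ID -> VIP bound on this server


def fail_over_app(app_id: str, vip: str, active: Server, standby: Server,
                  arp_table: dict[str, str]) -> None:
    """Move one application and its VIP from the active to the standby server."""
    # 1. Stop the failed application on the active server and delete its VIP
    #    segment, so that two servers never answer for the same VIP at once.
    active.vips.pop(app_id, None)
    # 2. Start the application on the standby server and create its VIP segment.
    standby.vips[app_id] = vip
    # 3. Effect of the ARP packet: the router now maps the VIP to the standby MAC.
    arp_table[vip] = standby.mac


server1 = Server("server1", "02:00:00:00:00:01",
                 {f"AP{n}": f"192.0.2.1{n}" for n in range(1, 5)})
server2 = Server("server2", "02:00:00:00:00:02")
arp = {vip: server1.mac for vip in server1.vips.values()}
fail_over_app("AP4", "192.0.2.14", server1, server2, arp)
```

After the call, AP1 to AP3 remain on server 1 and only AP4's VIP resolves to server 2. This matches the outcome of the second switching process when server 1 itself is healthy.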

That is, this embodiment does not need the expensive shared disk holding a system information management table that the prior art requires. When a split brain occurs, a communication failure on the dedicated monitoring network can leave a server unable to exchange heartbeats with the other server. Even then, it can easily detect the failed business applications on that other server. It does so by sending ICMP echo packets addressed to each application and by logging in to the other server over the network to check its applications. A cluster system can therefore be built more cheaply than in the conventional example.

A program implementing the functions of each unit of server 1 (or server 2) in FIG. 1 may be recorded on a computer-readable recording medium. The detection of failed applications and the application switching may then be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system with a homepage providing environment (or display environment). A "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. It further includes media that hold a program for a certain period. An example is the volatile memory (RAM) inside a computer system acting as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system, either via a transmission medium or by transmission waves in a transmission medium. Here, the "transmission medium" that carries the program means a medium capable of transmitting information. Examples are a network (communication network) such as the Internet and a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded on the computer system.

FIG. 1 is a block diagram showing a configuration example of a cluster system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing a configuration example of the ARP table stored in the router 30 of FIG. 1.
FIG. 3 is a conceptual diagram showing a configuration example of the correspondence table stored in the storage unit 15 (or storage unit 25) of FIG. 1.
FIG. 4 is a flowchart explaining an example of application switching in the cluster system of FIG. 1.
FIG. 5 is a flowchart explaining an example of application switching in the cluster system of FIG. 1.
FIG. 6 is a flowchart explaining an example of application switching in the cluster system of FIG. 1.

Explanation of symbols

11, 21 ... Heartbeat monitoring unit
12, 22 ... Business application control unit
13, 23 ... Business application switching unit
14, 24 ... Interface
15, 25 ... Storage unit
30 ... Router
100 ... Network
121, 221 ... Monitoring unit
122, 222 ... Control unit

Claims

A cluster system comprising a plurality of servers connected via a network, in which each application is processed on an active server and, when the application becomes abnormal, the abnormal application is processed on a standby server, wherein
each of the servers comprises:
a heartbeat monitoring unit that detects whether the heartbeat with another server is operating normally;
an application control unit that, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, starts the application running on that other server; and
an application switching unit that changes the MAC address associated with the IP address of the application in a router to the server's own MAC address.

The cluster system according to claim 1, wherein the application control unit,
when the heartbeat becomes abnormal, transmits an echo packet to each other server via the network and, in addition to the applications it is currently executing, executes the applications corresponding to any server that does not reply to the echo packet.

The cluster system according to claim 2, wherein, when a server replies to the echo packet, the application control unit logs in to that server through the network using virtual terminal software and checks the operating state of each application running on that server to detect whether the application is normal.

A cluster server used in a cluster system that comprises a plurality of servers connected via a network, in which each application is processed on an active server and, when the application becomes abnormal, the abnormal application is processed on a standby server, the cluster server comprising:
a heartbeat monitoring unit that detects whether the heartbeat with another server is operating normally;
an application control unit that, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, starts the application running on that other server; and
an application switching unit that changes the MAC address associated with the IP address of the application in a router to the cluster server's own MAC address.

A cluster control method used in a cluster system that comprises a plurality of servers connected via a network, in which each application is processed on an active server and, when the application becomes abnormal, the abnormal application is processed on a standby server, the method comprising,
in each of the servers:
a heartbeat monitoring step in which a heartbeat monitoring unit detects whether the heartbeat with another server is operating normally;
an application control step in which, when the heartbeat monitoring unit detects that the heartbeat with another server is abnormal, an application control unit starts the application running on that other server; and
an application switching step in which an application switching unit changes the MAC address associated with the IP address of the application in a router to the server's own MAC address.