JP2010231257A

JP2010231257A - High availability system and method for handling failure of high availability system

Info

Publication number: JP2010231257A
Application number: JP2009074847A
Authority: JP
Inventors: Osami Kabashima; 良聡臣椛島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-03-25
Filing date: 2009-03-25
Publication date: 2010-10-14

Abstract

PROBLEM TO BE SOLVED: To provide a high availability system using a cluster that organically combines the two technologies of clustering and backup, so that backup and recovery with the latest data can be performed without stopping the cluster.

SOLUTION: The high availability system includes a cluster 103 comprising an active server 101 and a backup server 102 that always holds up-to-date data in synchronization with the active server, and a virtual server 122 that synchronizes data with the backup server 102. The synchronization between the backup server 102 and the virtual server 122 is interlocked with the synchronization between the active server 101 and the backup server 102. The virtual server 122 is backed up while the function by which the backup server 102 synchronizes with the active server 101 and the function by which the virtual server 122 synchronizes with the backup server 102 are stopped.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a high availability system using a cluster, and more particularly to a method for handling failures in a high availability system.

Clustering is a widely used technique for achieving high availability. Backup, in turn, is widely used as a technique for achieving quick recovery from failures.

Although clustering and backup take different approaches, both are failure countermeasures, and organically combining the two can increase availability further. However, no technology that organically combines clustering and backup has been established.

For example, the related art described in Patent Document 1 proposes constructing servers with the same function on virtual machines and operating them as a cluster, but gives no consideration to backup as a failure recovery measure.

Patent Document 1: JP 2005-173751 A

As described above, although clustering and backup are both aimed at countering failures, no technology has been established that organically combines them to further increase availability.

The related art described in Patent Document 1 and elsewhere has the problem that the cluster must be stopped in order to back up a cluster system safely. This is because a quiescent point must be established to preserve data consistency, and the backup must be taken in that state. Consequently, in a system whose operations cannot be stopped, a backup could not be taken without stopping the cluster.

(Object of the invention)
An object of the present invention is to provide a high availability system using a cluster in which the two technologies of clustering and backup are organically combined, so that backup and recovery with the latest data are possible without stopping the cluster.

The high availability system of the present invention includes a cluster comprising an active server and a standby server that always holds up-to-date data in synchronization with the active server, and a virtual server that synchronizes data with the standby server. Synchronization between the standby server and the virtual server is performed in conjunction with synchronization between the active server and the standby server, and the virtual server is backed up while the standby server's function of synchronizing with the active server and the virtual server's function of synchronizing with the standby server are stopped.

The failure handling method for a high availability system of the present invention includes a step of performing synchronization between a standby server and a virtual server in conjunction with synchronization between the active server and the standby server that constitute a cluster, and a step of backing up the virtual server while the standby server's function of synchronizing with the active server and the virtual server's function of synchronizing with the standby server are stopped.

According to the present invention, in a high availability system using a cluster, the two technologies of clustering and backup are organically combined, enabling backup and recovery with the latest data without stopping the cluster.

FIG. 1 is a block diagram showing the overall configuration of the high availability system according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the active server according to the first embodiment.
FIG. 3 is a block diagram showing the configuration of the standby server according to the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the virtual server according to the first embodiment.
FIG. 5 is a block diagram showing the functional configuration of the management server according to the first embodiment.
FIG. 6 is a time chart showing the backup process of the high availability system according to the first embodiment.
FIG. 7 is a time chart showing the recovery process of the high availability system according to the first embodiment.
FIG. 8 is a block diagram showing the configuration of the active server according to the second embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the standby server according to the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the management server according to the second embodiment of the present invention.
FIG. 11 is a flowchart showing the operation of the management server according to the second embodiment.
FIG. 12 is a block diagram showing an example hardware configuration of a server in the first and second embodiments.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The first embodiment describes the basic configuration and features of a high availability system using a virtual machine server and clusters, together with the operation of its backup and recovery processes. The second embodiment describes an automatic recovery method for the case where a failure occurs in the active server. The third embodiment describes an example in which the cluster is built on virtual machines on a virtual machine server in order to reduce cost.

(First embodiment)
FIG. 1 is a block diagram showing the overall configuration of the high availability system according to the first embodiment of the present invention. Referring to FIG. 1, the high availability system according to the first embodiment includes a cluster 103 built from an AP (application) server 101 and an AP server 102, a cluster 113 built from a DB (database) server 111 and a DB server 112, a virtual machine server 121, a virtual AP server 122 and a virtual DB server 123 built on the virtual machine server 121, and a management server 131.

As described above, the AP server 101 and the AP server 102 form the cluster 103, and the DB server 111 and the DB server 112 form the cluster 113. Within each cluster, one server operates as the active server that provides service to users, and the other operates as the standby server. The clusters 103 and 113 have a mirror-type clustering function. The AP server 102 and the DB server 112, which are the standby servers, always hold up-to-date data in synchronization with the AP server 101 and the DB server 111, which are the active servers.

The cluster 103 and the cluster 113 are each connected to a service network 141 used for business operations and to a management network 142 used for intra-cluster synchronization and backup.

As shown in FIGS. 2 and 3, the active AP server 101 and the standby AP server 102 include cluster synchronization means 202 and 302 and data synchronization means 203 and 303, respectively. The cluster synchronization means 202 and 302 can be implemented, for example, as agents that execute cluster synchronization. Similarly, the data synchronization means 203 and 303 can be implemented as agents that execute data synchronization.

The active AP server 101 and the standby AP server 102 use the cluster synchronization means 202 and 302 to synchronize data within the cluster 103. They use the data synchronization means 203 and 303 to synchronize data with the virtual AP server 122.

The active DB server 111 and the standby DB server 112 that constitute the cluster 113 have the same configuration as the active AP server 101 and the standby AP server 102. They use the cluster synchronization means 202 and 302 to synchronize data within the cluster 113, and the data synchronization means 203 and 303 to synchronize data with the virtual DB server 123.

On the management network 142 reside the virtual AP server 122, which synchronizes with the standby AP server 102 of the cluster 103, the virtual DB server 123, which synchronizes with the standby DB server 112 of the cluster 113, and the management server 131.

The virtual AP server 122 and the virtual DB server 123 are virtual servers built on the virtual machine server 121 and, as shown in FIG. 4, each includes data synchronization means 402. The data synchronization means 402 synchronizes data with the standby AP server 102 and the standby DB server 112, respectively. The virtual AP server 122 and the virtual DB server 123 are the servers from which backups are taken (the backup target servers).

As shown in FIG. 5, the management server 131 includes a cluster control program 502, a backup program 503, and a recovery program 504.

The cluster control program 502 provides a cluster control function that switches the active server and the standby server within the cluster 103 and the cluster 113.

The backup program 503 provides a function for backing up the virtual AP server 122 and the virtual DB server 123.

The recovery program 504 provides a recovery function that restores the system by migrating a virtual server, which is a backup target server, to an active server or standby server in the cluster 103 or the cluster 113.

The features of this embodiment are that the AP server 101 and the AP server 102 can be backed up without stopping the cluster, and that the load placed on the AP server 101 and the AP server 102 by backup can be reduced. To achieve this, the backup is taken from the virtual AP server 122. In addition, to reduce the load, only the differential data of the AP server 102 is sent to the virtual AP server 122 for synchronization.

(Description of the operation of the first embodiment)
Next, the processing operation of the high availability system according to this embodiment will be described with reference to FIGS. 6 and 7.

FIG. 6 is a sequence diagram showing an example of the backup process of the high availability system in this embodiment. The description takes as an example the case where the AP server 101 of the cluster 103 is the active server and the AP server 102 is the standby server.

The management server 131 instructs the active AP server 101, the standby AP server 102, and the virtual AP server 122 to perform synchronization (step 601).

When a data update occurs, the active AP server 101 starts the cluster synchronization means 202 and sends the differential data changed since the previous synchronization to the standby AP server 102 (step 602). The standby AP server 102 runs the cluster synchronization means 302, receives the differential data from the active AP server 101, writes it to disk, and thereby performs synchronization (step 603).
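The differential synchronization of steps 602 and 603 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the cluster synchronization means: the `Server` class, its block-level `disk` map, and the `dirty` set used to track changes since the previous synchronization are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of a server whose disk is a block-number -> bytes map."""
    name: str
    disk: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # blocks changed since the last sync

    def write(self, block: int, data: bytes) -> None:
        """Apply a local data update and remember which block changed."""
        self.disk[block] = data
        self.dirty.add(block)

    def take_delta(self) -> dict:
        """Collect the blocks changed since the previous sync (sender side, step 602)."""
        delta = {block: self.disk[block] for block in sorted(self.dirty)}
        self.dirty.clear()
        return delta

    def apply_delta(self, delta: dict) -> None:
        """Write received differential data to the local disk (receiver side, step 603)."""
        self.disk.update(delta)


active = Server("AP101")
standby = Server("AP102")

# First synchronization: both blocks are new, so both are sent.
active.write(0, b"config")
active.write(1, b"v1")
standby.apply_delta(active.take_delta())

# Second synchronization: only block 1 changed since the previous sync.
active.write(1, b"v2")
delta = active.take_delta()
standby.apply_delta(delta)
```

Tracking changed blocks on the sending side keeps each transfer proportional to the amount of change rather than to the size of the disk, which is the load reduction the description attributes to differential data.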

When the synchronization is complete, the standby AP server 102 stops the cluster synchronization means 302, changing its setting so that it no longer synchronizes with the active AP server 101 (step 604).

Next, the standby AP server 102 starts the data synchronization means 303 and sends the differential data received from the active AP server 101 to the virtual AP server 122 (step 605). The virtual AP server 122 starts the data synchronization means 402, receives the differential data, writes it to disk, and thereby performs synchronization (step 606).

When the synchronization is complete, the virtual AP server 122 stops the data synchronization means 402, changing its setting so that it no longer synchronizes with the standby AP server 102 (step 607). In other words, synchronization between the AP server 102 and the virtual AP server 122 is temporarily suspended.

The management server 131 then backs up the virtual AP server 122 using the backup program 503 (step 608). A backup that includes the system is performed offline, whereas a backup of data only can also be performed online. Because the AP server 102 cannot send differential data to the virtual AP server 122 until the backup completes, it holds the differential data internally in the meantime.

When the backup of the virtual AP server 122 is complete, the management server 131 restarts the stopped cluster synchronization means 302 of the standby AP server 102 and the data synchronization means 402 of the virtual AP server 122 (steps 609 and 610). Synchronization between the active AP server 101 and the standby AP server 102, and between the standby AP server 102 and the virtual AP server 122, then resumes.
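The ordering of steps 602 to 610 can be sketched as a small Python simulation. This is a simplified model under assumptions, not the patented implementation: `Node`, `push_to`, `run_backup`, and the `during_backup` callback are hypothetical names, and differential data is modeled as staying buffered on whichever node sits upstream of a paused link.

```python
class Node:
    """Hypothetical server model: a disk plus differential data awaiting forwarding."""

    def __init__(self, name, forwards=True):
        self.name = name
        self.disk = {}             # block number -> data
        self.pending = {}          # differential data not yet sent downstream
        self.forwards = forwards   # False for the last node in the chain
        self.sync_enabled = True   # models the receiving sync means (302 or 402)

    def write(self, block, data):
        self.disk[block] = data
        if self.forwards:
            self.pending[block] = data

    def push_to(self, downstream):
        """Send pending differential data; keep it buffered if the link is paused."""
        if not downstream.sync_enabled:
            return False
        for block, data in self.pending.items():
            downstream.write(block, data)
        self.pending.clear()
        return True


def run_backup(active, standby, virtual, during_backup=lambda: None):
    """Walk through the backup sequence and return the backup image."""
    active.push_to(standby)        # steps 602-603: active -> standby
    standby.sync_enabled = False   # step 604: stop cluster synchronization
    standby.push_to(virtual)       # steps 605-606: standby -> virtual
    virtual.sync_enabled = False   # step 607: stop data synchronization
    image = dict(virtual.disk)     # step 608: back up the quiesced virtual server
    during_backup()                # the active server keeps serving meanwhile
    standby.sync_enabled = True    # step 609: resume cluster synchronization
    virtual.sync_enabled = True    # step 610: resume data synchronization
    return image


active = Node("AP101")
standby = Node("AP102")
virtual = Node("vAP122", forwards=False)
active.write(0, "order-1")
refused = []


def business_as_usual():
    # An update arrives while the backup is running; the paused link refuses it.
    active.write(1, "order-2")
    refused.append(not active.push_to(standby))


image = run_backup(active, standby, virtual, during_backup=business_as_usual)

# After the backup, the buffered differential data flows down the chain again.
active.push_to(standby)
standby.push_to(virtual)
```

In this sketch the backup image reflects the state at the quiescent point, while the update made during the backup is neither lost nor applied early. It reaches the virtual server once both links are resumed.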

The above description covers the backup process for the case where the AP server 101 of the cluster 103 is the active server and the AP server 102 is the standby server. The backup process is executed by the same procedure when the DB server 111 of the cluster 113 is the active server and the DB server 112 is the standby server.

FIG. 7 is a sequence diagram showing an example of the recovery process of the high availability system in this embodiment. The description covers the case where the active server is the AP server 101, the standby server is the AP server 102, and recovery is executed after a system abnormality occurs in the active AP server 101.

When a failure occurs in the active AP server 101, the management server 131 uses the cluster control program 502 to switch the active server from the AP server 101 to the AP server 102 and to detach the AP server 101 from the cluster 103 (steps 701, 702, and 703). The AP server 102 thereby continues operations as the active server.

The management server 131 instructs the virtual AP server 122 to stop the data synchronization means 402 (step 706). The virtual AP server 122 stops synchronizing with the AP server 102, establishing a quiescent point as of the time the system abnormality occurred.

The management server 131 then executes the recovery program 504 and restores the system by migrating the virtual AP server 122 to the failed AP server 101 (steps 705 and 707).

Because virtual machine servers now provide functions for migrating between machines with different hardware configurations, problems such as driver incompatibility arising from hardware differences between the virtual AP server 122 and the AP server 101 do not occur.

When the migration has recovered the AP server 101 to the quiescent point, the management server 131 restarts the cluster synchronization means 202 of the AP server 101 and the data synchronization means 402 of the virtual AP server 122 (steps 709 and 710). Synchronization between the now-active AP server 102 and the AP server 101, and between the AP server 101 and the virtual AP server 122, then resumes.
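The recovery sequence above (failover, quiescent point, migration, resumption) can be sketched in Python as follows. The sketch is a rough model under assumptions: `Server`, `Cluster`, and `recover` are hypothetical names, migration is reduced to copying the virtual server's disk image onto the failed server, and the recovered server is assumed to rejoin the cluster as the standby.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Hypothetical server model used only for this illustration."""
    name: str
    disk: dict = field(default_factory=dict)
    healthy: bool = True
    sync_enabled: bool = True


@dataclass
class Cluster:
    active: Server
    standby: Optional[Server]


def recover(cluster: Cluster, virtual: Server) -> Server:
    """Fail over, quiesce the virtual server, migrate it, and resume synchronization."""
    failed = cluster.active
    # Steps 701-703: promote the standby and detach the failed server.
    cluster.active, cluster.standby = cluster.standby, None
    # Step 706: stop data synchronization to fix a quiescent point.
    virtual.sync_enabled = False
    # Steps 705 and 707: migrate the virtual server image onto the failed server.
    failed.disk = dict(virtual.disk)
    failed.healthy = True
    # Assumption: the recovered server rejoins the cluster as the standby.
    cluster.standby = failed
    # Steps 709 and 710: resume synchronization on both links.
    failed.sync_enabled = True
    virtual.sync_enabled = True
    return failed


ap101 = Server("AP101", {0: "corrupt"}, healthy=False)
ap102 = Server("AP102", {0: "a", 1: "b"})
v122 = Server("vAP122", {0: "a", 1: "b"})
cluster = Cluster(active=ap101, standby=ap102)

recovered = recover(cluster, v122)
```

Because the former standby is promoted before the migration starts, service continues on the AP server 102 for the whole time the AP server 101 is being rebuilt from the virtual server.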

The above description covers the recovery process for the case where the AP server 101 of the cluster 103 is the active server and the AP server 102 is the standby server. The recovery process is executed by the same procedure when the DB server 111 of the cluster 113 is the active server and the DB server 112 is the standby server.

(Effects of the first embodiment)
Next, the effects of the first embodiment described above will be described.
First, backup can be performed without stopping service while the cluster provides high availability. That is, a highly available system that organically combines the two technologies of clustering and backup can be provided.

Second, because the backup is taken not directly from the active server but from a virtual server holding the same operating system, applications, and data, the services of the active and standby servers need not be stopped and the load imposed by backup is reduced. Furthermore, because the backup is based on differential data, the load on the active and standby servers is reduced in this respect as well.

Third, the cluster synchronization function and differential backup can be treated in an integrated manner. That is, by also writing the differential data generated by clustering to the virtual server, the cluster and the backup are fused.

Fourth, because the disk state of the virtual machine server is continuously updated with differential data in conjunction with the cluster synchronization process, backup and recovery can always be provided from the latest state.

Fifth, by building virtual servers on the virtual machine server 121 and using them as the backup target servers, benefits such as lower installation cost for the backup target servers, energy savings, and space consolidation are obtained.

(Second embodiment)
Next, a high availability system according to the second embodiment of the present invention will be described. In the second embodiment, a failure monitoring function is used so that, when a failure occurs in a server, cluster switching and recovery of the failed server are executed automatically.

The overall configuration of the high availability system according to the second embodiment is the same as that of the first embodiment shown in FIG. 1, so its description is omitted here.

FIGS. 8 to 10 are block diagrams showing the functional configurations of the active AP server 101, the standby AP server 102, and the management server 131 according to the second embodiment of the present invention.

The active AP server 101 includes cluster synchronization means 202, data synchronization means 203, and failure monitoring means 204. Similarly, the standby AP server 102 includes cluster synchronization means 302, data synchronization means 303, and failure monitoring means 304.

The failure monitoring means 204 and 304 of the active AP server 101 and the standby AP server 102 are programs that monitor for and detect failures of their respective servers, and are implemented on the basis of a known protocol such as SNMP. When the failure monitoring means 204 or 304 detects a system failure or the like in its server, it sends a failure occurrence notification to the management server 131.

The management server 131 includes a cluster control program 502, a backup program 503, a recovery program 504, and a failure monitoring program 505.

The failure monitoring program 505 watches for failure occurrence notifications from the failure monitoring means 204 and 304. When such a notification is received, it serves as the trigger for the management server 131 to execute cluster switching and recovery of the failed server.

The functions of the cluster synchronization means 202 and 302 and the data synchronization means 203 and 303 of the active AP server 101 and the standby AP server 102, and the functions of the cluster control program 502, the backup program 503, and the recovery program 504 of the management server 131, are the same as in the first embodiment, so their description is omitted here.

Although FIGS. 8 and 9 describe the configurations of the active AP server 101 and the standby AP server 102, the active DB server 111 and the standby DB server 112 are likewise provided with failure monitoring means.

The operation of the recovery process in the high availability system according to this embodiment will now be described.

FIG. 11 is a flowchart showing the operation of the management server 131 in this embodiment.

On the management server 131, the failure monitoring program 505 is running and watches for failure occurrence notifications from the failure monitoring means 204 or 304 (step S901). Receipt of a notification triggers cluster switching and recovery of the failed server. The following description takes as an example the case where the active server is the AP server 101, the standby server is the AP server 102, and a failure occurs in the active AP server 101.

When the management server 131 receives a failure occurrence notification from the failure monitoring means 204 of the active AP server 101, the cluster control program 502 runs automatically, switches the active server from the AP server 101 to the AP server 102 (step S902), and detaches the failed AP server 101 from the cluster 103 (step S903).

The management server 131 then instructs the virtual AP server 122 to stop the data synchronization means 402, stopping synchronization (step S904). At this point the virtual AP server 122 establishes a quiescent point as of the time the system abnormality occurred.

The management server 131 then executes the recovery program 504 and restores the system by migrating the virtual AP server 122 to the failed AP server 101 (step S905). When the migration has recovered the AP server 101 to the quiescent point, the management server 131 restarts the cluster synchronization means 202 of the AP server 101 and the data synchronization means 402 of the virtual AP server 122 (step S906).
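The notification-driven flow of steps S901 to S906 can be sketched in Python as a handler that drains a queue of failure notifications. This is only an illustration under assumptions: the function name `handle_failures`, the use of a server name as the notification payload, the `roles` dictionary, and the action strings are all hypothetical, and the actual transport (for example SNMP) is not modeled.

```python
import queue


def handle_failures(notifications, roles):
    """Process queued failure notifications and return the ordered actions taken.

    notifications: queue.Queue of failed server names (hypothetical message format).
    roles: dict mapping server name -> "active" or "standby", updated in place.
    """
    actions = []
    while True:
        try:
            failed = notifications.get_nowait()   # S901: notification received
        except queue.Empty:
            return actions
        if roles[failed] == "active":
            # S902: promote the standby so that service continues.
            survivor = next(n for n, r in roles.items() if r == "standby")
            roles[survivor] = "active"
            actions.append(f"S902 promote {survivor}")
        roles[failed] = "detached"                # S903: detach the failed server
        actions.append(f"S903 detach {failed}")
        actions.append("S904 stop virtual sync")  # quiescent point on the virtual server
        actions.append(f"S905 migrate virtual -> {failed}")
        roles[failed] = "standby"                 # assumption: rejoins as the standby
        actions.append("S906 resume sync")


inbox = queue.Queue()
roles = {"AP101": "active", "AP102": "standby"}
inbox.put("AP101")  # the failure monitoring means on AP101 reports a failure
log = handle_failures(inbox, roles)
```

When the notification comes from the standby server, the promotion step is skipped in this sketch and only the detach, quiesce, migrate, and resume actions are recorded.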

The backup process in the second embodiment is executed in exactly the same way as in the first embodiment.

(Effects of the second embodiment)
According to this embodiment, when a failure occurs in either the active server or the standby server of the high availability system, recovery with the latest data can be performed automatically. Recovery with the latest data is always possible because the disk state of the virtual AP server is continuously updated with differential data in conjunction with the cluster synchronization process. That is, combining this with the failure monitoring function makes automatic switching and automatic recovery possible.

In the first and second embodiments, the clusters 103 and 113 shown in FIG. 1 can also be built from virtual machines on a virtual machine server.

Here, an example hardware configuration of each server in the first and second embodiments is briefly described. FIG. 12 is a block diagram showing an example hardware configuration of the AP server 101. The AP server 101 is used as the example, but the other servers have the same configuration.

Referring to FIG. 12, the AP server 101 can be realized by a hardware configuration similar to that of a general computer device. It includes a CPU (Central Processing Unit) 701; a main storage unit 702, composed of memory such as RAM (Random Access Memory), that is used as a data work area and a temporary data save area; a communication unit 703 that transmits and receives data via a network; an input/output interface unit 704 that is connected to an input device 705, an output device 706, and a storage device 707 to transmit and receive data; and a system bus 708 that interconnects the above components. The storage device 707 is composed of, for example, nonvolatile memory such as ROM (Read Only Memory), a magnetic disk, or semiconductor memory.

The operation of the AP server 101 can, of course, be realized in hardware by mounting circuit components, that is, hardware parts such as an LSI (Large Scale Integration) in which a program is incorporated. It can also be realized in software by storing a program that provides the functions of the cluster synchronization unit 202, the data synchronization unit 203, and the like in the storage device 707, loading that program into the main storage unit 702, and executing it on the CPU 701.

Although the present invention has been described above with reference to preferred embodiments, the present invention is not necessarily limited to those embodiments and can be implemented with various modifications within the scope of its technical idea.

101, 102: AP server
111, 112: DB server
103, 113, 1007: Cluster
121: Virtual machine server
122: Virtual AP server
123: Virtual DB server
131: Management server
141: Service network
142: Management network
202, 302: Cluster synchronization means
203, 303, 402: Data synchronization means
204, 304: Fault monitoring means
502: Cluster control program
503: Backup program
504: Recovery program
505: Fault monitoring program

Claims

1. A high availability system comprising:
a cluster including an active server and a standby server that always holds the latest data in synchronization with the active server; and
a virtual server that synchronizes data with the standby server,
wherein the standby server and the virtual server are synchronized in conjunction with the synchronization processing between the active server and the standby server, and the virtual server is backed up in a state where the synchronization function of the standby server with the active server and the synchronization function of the virtual server with the standby server are stopped.

2. The high availability system according to claim 1, wherein synchronization processing is performed by transmitting difference data from the active server to the standby server, and synchronization processing is performed by transmitting the difference data from the standby server to the virtual server, and
after the backup is completed, the synchronization function of the standby server with the active server and the synchronization function of the virtual server with the standby server are resumed.

3. The high availability system according to claim 1, wherein, when a system abnormality occurs in the active server, the active server is disconnected from the cluster, and
the system is recovered and the active server is returned to the cluster by executing migration between the disconnected active server and the virtual server.

4. The high availability system according to any one of claims 1 to 3, further comprising a management server,
wherein the management server comprises:
a control program for controlling synchronization of the standby server with the active server and synchronization of the virtual server with the standby server;
a backup program for performing backup of the virtual server; and
a recovery program that recovers the system by executing migration between the disconnected active server and the virtual server.

5. The high availability system according to claim 4, wherein the active server and the standby server each include failure monitoring means for monitoring a failure of its own server, and
when the management server receives a failure detection notification from the failure monitoring means of the active server or the standby server, disconnection of the failed active server or standby server and migration with the virtual server are executed automatically, triggered by the failure detection notification.

6. A method for handling failure of a high availability system, comprising:
a step of performing synchronization processing between a standby server and a virtual server in conjunction with synchronization processing between an active server and the standby server constituting a cluster; and
a step of backing up the virtual server in a state where the synchronization function of the standby server with the active server and the synchronization function of the virtual server with the standby server are stopped.

7. The method for handling failure of a high availability system according to claim 6, wherein synchronization processing is performed by transmitting difference data from the active server to the standby server, and synchronization processing is performed by transmitting the difference data from the standby server to the virtual server,
the method further comprising a step of resuming the synchronization function of the standby server with the active server and the synchronization function of the virtual server with the standby server after the backup is completed.

8. The method for handling failure of a high availability system according to claim 6, further comprising:
a step of disconnecting the active server from the cluster when a system abnormality occurs in the active server; and
a step of recovering the system and returning the active server to the cluster by executing migration between the disconnected active server and the virtual server.

9. The method for handling failure of a high availability system according to claim 6, wherein a management server is provided, and
the management server:
controls the synchronization of the standby server with the active server and the synchronization of the virtual server with the standby server;
performs a backup of the virtual server; and
executes migration between the disconnected active server and the virtual server.

10. The method for handling failure of a high availability system according to claim 9, wherein the active server and the standby server each monitor a failure of its own server, and
when the management server receives a failure detection notification from the active server or the standby server, disconnection of the failed active server or standby server and migration with the virtual server are executed automatically, triggered by the failure detection notification.