JP7206981B2

JP7206981B2 - Cluster system, its control method, server, and program

Info

Publication number: JP7206981B2
Application number: JP2019020633A
Authority: JP
Inventors: 洋介緒方
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2019-02-07
Filing date: 2019-02-07
Publication date: 2023-01-18
Anticipated expiration: 2039-02-07
Also published as: JP2020129184A

Description

The present disclosure relates to a cluster system, a control method therefor, and a program, and more particularly to a cluster system having a plurality of servers, a control method therefor, and a program.

The present disclosure also relates to a server used in a cluster system.

In recent years, high-availability cluster systems, which use multiple servers to increase availability, have increasingly adopted configurations in which the standby server runs as a virtual machine on a public cloud rather than on-premises. Running the standby server on a public cloud lets the operator of the high-availability cluster system reduce the burden of building and operating the environment.

As a related technique, Patent Literature 1 discloses a technique for managing a high-availability system that links a plurality of information systems to increase service availability. The high-availability system disclosed in Patent Literature 1 has an active server, a standby server, and a performance management device that manages their performance. In Patent Literature 1, the active server is built in an on-premises environment and the standby server is built in a cloud environment. In the high-availability cluster, when an event such as a fault or failure occurs in the active server, the standby server executes failover processing that takes over the processing the active server had been executing.

Here, the on-premises environment refers to, for example, an information processing system that is set up inside the specific company or organization that builds the high-availability system and is operated by that owner. The cloud environment refers to a cloud computing environment in which the various resources that make up computers and communication networks are virtualized and provided as services. The cloud environment may be, for example, a public cloud that provides cloud services broadly over the Internet, or a private cloud built for a specific company.

Within the on-premises environment, the performance management device is communicably connected to the active server through a LAN (Local Area Network) or the like. The performance management device is also communicably connected to the standby server through a WAN (Wide Area Network) such as the Internet. The performance management device executes various processes for managing the performance of the standby server and manages the performance-related state of the standby server.

In Patent Literature 1, the performance management device keeps the performance of the standby server low during normal times, when the active server is operating normally. When failover processing occurs, on the other hand, the performance management device raises the performance of the standby server to a level equivalent to that of the active server. This reduces the resources the standby server uses during normal operation and thereby reduces the operating cost of the high-availability cluster.

Japanese Unexamined Patent Application Publication No. 2015-69283 (JP 2015-69283 A)

Here, in a high-availability cluster system, running the standby server as a virtual machine on a public cloud makes the environment easier to build and operate, but it incurs ongoing costs. The operating cost of a virtual machine on a public cloud depends not only on the performance of the virtual machine but also on the pricing model. For example, Amazon EC2 lets the user choose between regular instances and Spot Instances. A regular instance is an ordinary instance available at a fixed price. A Spot Instance, by contrast, runs on surplus resources left over from regular instances. Although their price fluctuates, Spot Instances are available at a lower price than regular instances.

However, a Spot Instance may be forcibly terminated by the cloud operator (cloud service provider). For example, when demand for regular instances increases and surplus resources run short, the cloud operator forcibly terminates some randomly selected Spot Instances. Likewise, when the Spot price exceeds the bid price, the cloud operator forcibly terminates some randomly selected Spot Instances. Spot Instances therefore offer a price advantage but can be said to have low availability.

Other providers offer comparable services that use surplus cloud resources at a low price but with low availability. For example, Google Cloud Platform provides preemptible instances, and Azure provides Low-priority VMs.

In the high-availability cluster described in Patent Literature 1, running the standby server as a Spot Instance can reduce the operating cost of the standby system. However, because a Spot Instance may be forcibly terminated at unpredictable times, there is a problem in that availability cannot be ensured.

In view of the above circumstances, an object of the present disclosure is to provide a cluster system, a control method therefor, a server, and a program that can ensure availability while keeping operating costs down by using instances that may be forcibly terminated.

To achieve the above object, the present disclosure provides a cluster system comprising: a first server having service execution means for executing a service; and a second server to which execution of the service is failed over when a failure occurs in the execution of the service on the first server, wherein the second server is a virtual server created in a cloud environment as an instance of a first type that may be forcibly terminated by a cloud service provider, and the first server further has instance monitoring means for monitoring whether the second server is to be forcibly terminated, and instance operating means for, when it is detected that the second server is to be forcibly terminated, causing the cloud environment to create a third server as an instance of the first type and causing the third server to take over the function provided by the second server.

The present disclosure also provides a server comprising: service execution means for executing a service; instance monitoring means for monitoring whether a first virtual server is to be forcibly terminated, the first virtual server being created in a cloud environment as an instance of a first type that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in the execution of the service; and instance operating means for, when it is detected that the first virtual server is to be forcibly terminated, causing the cloud environment to create a second virtual server as an instance of the first type and causing the second virtual server to take over the function provided by the first virtual server.

The present disclosure provides a cluster system control method comprising: executing a service on a server; monitoring whether a first virtual server is to be forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in the execution of the service; and, when it is detected that the first virtual server is to be forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated and causing the second virtual server to take over the function provided by the first virtual server.

The present disclosure provides a program for causing a computer to execute processing comprising: executing a service; monitoring whether a first virtual server is to be forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in the execution of the service; and, when it is detected that the first virtual server is to be forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated and causing the second virtual server to take over the function provided by the first virtual server.

The cluster system, the control method therefor, the server, and the program according to the present disclosure can ensure availability while keeping operating costs down by using instances that may be forcibly terminated.

The drawings are, in order: a block diagram schematically showing a cluster system according to the present disclosure; a block diagram showing a high-availability cluster system according to an embodiment of the present disclosure; a flowchart showing the operation procedure for switching the standby system; a flowchart showing the operation procedure when failing over to a standby server; and a flowchart showing the operation procedure when failing back to the active server.

Before the embodiments of the present disclosure are described, an outline of the present disclosure is given. FIG. 1 schematically shows a cluster system according to the present disclosure. A cluster system 10 has a first server 20 and a second server 30. The first server 20 has service execution means 21, instance monitoring means 22, and instance operating means 23.

The service execution means 21 executes a service. The second server 30 is the server to which execution of the service is failed over when a failure occurs in the execution of the service on the first server 20. The second server 30 is located in a cloud environment 50. The cloud environment 50 provides instances that may be forcibly terminated by the cloud service provider (instances of a first type) and ordinary instances that are never forcibly terminated (instances of a second type). The second server 30 is a virtual server created in the cloud environment 50 as an instance of the first type.

The instance monitoring means 22 monitors whether the second server 30 is to be forcibly terminated by the cloud service provider in the cloud environment 50. When it is detected that the second server 30 is to be forcibly terminated, the instance operating means 23 creates a third server 40 in the cloud environment 50. Like the second server 30, the third server 40 is a virtual server created in the cloud environment 50 as an instance of the first type. The instance operating means 23 causes the third server 40 to take over the function provided by the second server 30.

In the present disclosure, the second server 30, which is a virtual server created as an instance of the first type, is used as the failover destination server for the first server 20. In the cloud environment 50, instances of the first type are offered at a lower price than ordinary instances. The cluster system 10 can therefore reduce operating costs compared with a case where an ordinary instance is used as the failover destination server.

In the present disclosure, when the second server 30 is to be forcibly terminated, the instance operating means 23 creates the third server 40 in the cloud environment 50. Like the second server 30, the third server 40 is a virtual server created as an instance of the first type. The instance operating means 23 causes the third server 40 to take over the function provided by the second server 30. In this way, even when the second server 30 is forcibly terminated, a failover destination server for the first server 20 is secured, and the availability of the cluster system 10 is ensured. The present disclosure thus reduces operating costs by using instances of the first type while still ensuring availability.
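The outline above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: `FakeCloud`, its `doomed` set, and `ensure_failover_target` are hypothetical names introduced here, and the cloud environment 50 is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass, field
from enum import Enum


class InstanceType(Enum):
    FIRST = "may be forcibly terminated"      # first type of instance
    SECOND = "never forcibly terminated"      # second type of instance


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for the cloud environment 50."""
    servers: dict = field(default_factory=dict)   # server name -> InstanceType
    doomed: set = field(default_factory=set)      # servers about to be terminated
    next_id: int = 0

    def create(self, kind: InstanceType) -> str:
        self.next_id += 1
        name = f"vm-{self.next_id}"
        self.servers[name] = kind
        return name


def ensure_failover_target(cloud: FakeCloud, target: str) -> str:
    """Return the server that should serve as the failover destination."""
    # Role of the instance monitoring means 22: is a forced termination pending?
    if target not in cloud.doomed:
        return target
    # Role of the instance operating means 23: create a replacement of the
    # same (first) type and let it take over from the terminated server.
    replacement = cloud.create(InstanceType.FIRST)
    cloud.servers.pop(target, None)
    cloud.doomed.discard(target)
    return replacement
```

In this sketch the failover destination changes only when a forced termination is pending, so the first server always has some first-type instance to fail over to.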

Embodiments of the present disclosure are described below with reference to the drawings. FIG. 2 shows a high-availability cluster system according to an embodiment of the present disclosure. A high-availability (HA) cluster system 100 has an active server 102 and standby servers 202, 209, and 216. In this embodiment, the active server 102 is located in an on-premises environment 101, and the standby servers 202, 209, and 216 are located in a public cloud environment 201. The HA cluster system 100 corresponds to the cluster system 10 in FIG. 1.

The on-premises environment 101 is, for example, an information processing system that is set up inside the specific company or organization that builds the HA cluster system 100 and is operated by it. The on-premises environment 101 may be configured using physical servers and a communication network that communicably connects them. Some of the computers, such as servers, that make up the on-premises environment 101 may be configured using virtual machines (VMs), which are virtualized computers.

The public cloud environment 201 is a cloud computing environment in which the various resources that make up computers and communication networks are virtualized and provided as services. The public cloud environment 201 has a block storage 223. In the public cloud environment 201, the connection destination of the block storage 223 can be switched among the standby servers 202, 209, and 216. The public cloud environment 201 corresponds to the cloud environment 50 in FIG. 1.

The on-premises environment 101 can communicate with a cloud service provider 301 through a WAN 302. The public cloud environment 201 can communicate with the cloud service provider 301 through an Internet gateway 303 and the WAN 302. The on-premises environment 101 and the public cloud environment 201 can also communicate with a cloud storage (cloud service) 304.

In this embodiment, the public cloud environment 201 provides two types of instances: regular instances and surplus instances. A regular instance is an instance that, under the contract, is never forcibly terminated by the cloud provider. A surplus instance, by contrast, is an instance that, under the contract, may be forcibly terminated by the cloud provider. A surplus instance corresponds to, for example, a Spot Instance in Amazon EC2. In general, surplus instances are cheaper to use than regular instances.

In this embodiment, the active server 102 belongs to the on-premises environment 101. The active server 102 may be a physical server configured using an information processing device such as a physical computer, or it may be a virtual server configured using virtualization technology. The active server 102 corresponds to the first server 20 in FIG. 1.

The standby servers 202, 209, and 216, on the other hand, are virtual servers belonging to the public cloud environment 201. The standby server (first virtual server) 202 and the standby server (second virtual server) 209 are created using surplus instances (instances of the first type) and may be forcibly terminated by the cloud provider. In contrast, the standby server (third virtual server) 216 is created using a regular instance (an instance of the second type) and is never forcibly terminated by the cloud provider. The standby server 202 corresponds to the second server 30 in FIG. 1, and the standby server 209 corresponds to the third server 40 in FIG. 1.

As described later, the standby server 209 is created when the standby server 202 is to be forcibly terminated. While the standby server 202 is operating in the public cloud environment 201, the standby server 209 does not exist. The standby servers 202 and 209 perform failover processing to take over the service that the active server 102 had been providing, for example when the active server 102 stops operating normally. As described later, the standby server 216 is created as a result of the standby server 202 or 209 performing failover processing.

The active server 102 has high-availability cluster software 103 and a block storage (storage) 109. When executed on the active server 102, the high-availability cluster software 103 provides the functions of a cluster control unit 104, a service 106, an instance monitoring unit 107, and an instance operation unit 108. The standby server 202 has high-availability cluster software 203. When executed on the standby server 202, the high-availability cluster software 203 provides the functions of a cluster control unit 204, a service 206, an instance monitoring unit 207, and an instance operation unit 208.

The standby server 209 has high-availability cluster software 210. When executed on the standby server 209, the high-availability cluster software 210 provides the functions of a cluster control unit 211, a service 213, an instance monitoring unit 214, and an instance operation unit 215. The standby server 216 has high-availability cluster software 217. When executed on the standby server 216, the high-availability cluster software 217 provides the functions of a cluster control unit 218, a service 220, an instance monitoring unit 221, and an instance operation unit 222.

Here, the high-availability cluster software 103 of the active server 102 and the high-availability cluster software 203, 210, and 217 of the standby servers 202, 209, and 216 may be the same software. For the high-availability cluster software 103, 203, 210, and 217, whether the host is a server in the on-premises environment 101 or a virtual machine in the public cloud environment 201 is selected, for example, at installation time. This selection determines whether the software behaves as on the active server 102 or as on the standby server 202. For the high-availability cluster software 203, 210, and 217, whether the host in the public cloud environment 201 is a surplus-instance virtual server or a regular-instance virtual server is likewise selected, for example, at installation time. This selection determines whether the software behaves as on the standby servers 202 and 209 or as on the standby server 216.
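The two install-time selections can be pictured as a small mapping from the selected options to a behavior profile. The following Python sketch is only an illustration under that reading; the enum names, the `select_behavior` function, and the profile strings are hypothetical and do not come from the disclosure.

```python
from enum import Enum
from typing import Optional


class Placement(Enum):
    ON_PREMISES = "server in the on-premises environment 101"
    PUBLIC_CLOUD = "virtual machine in the public cloud environment 201"


class InstanceKind(Enum):
    SURPLUS = "surplus instance"
    REGULAR = "regular instance"


def select_behavior(placement: Placement,
                    kind: Optional[InstanceKind] = None) -> str:
    """Map the install-time selections to a (hypothetical) behavior profile."""
    if placement is Placement.ON_PREMISES:
        return "active-server"            # behaves like the active server 102
    if kind is None:
        raise ValueError("a cloud virtual machine must state its instance kind")
    if kind is InstanceKind.SURPLUS:
        return "surplus-standby"          # behaves like standby servers 202 and 209
    return "regular-standby"              # behaves like the standby server 216
```

The second selection is only meaningful for cloud virtual machines, which is why the sketch rejects a cloud placement without an instance kind.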

In the active server 102, the cluster control unit 104 manages cluster configuration information 105. The cluster configuration information 105 includes the server type (on-premises or public cloud), the host name and IP (Internet Protocol) address of each server, and information on the block storages 109 and 223. The cluster configuration information 105 also includes various parameters of the public cloud environment 201 and account information registered with the cloud service provider 301. Similarly, in the standby servers 202, 209, and 216, the cluster control units 204, 211, and 218 manage cluster configuration information 205, 212, and 219. The cluster configuration information 105, 205, 212, and 219 is synchronized over the network every time it is changed, so that every server holds cluster configuration information with identical content.
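A minimal Python sketch of this "synchronize on every change" behavior follows. The field names, the version counter, and the direct method call that stands in for the network are all assumptions made for illustration; the disclosure does not specify a data format or a synchronization protocol, and the account information is omitted here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ClusterConfig:
    version: int = 0                                   # assumed change counter
    servers: dict = field(default_factory=dict)        # host name -> type and IP address
    block_storage: dict = field(default_factory=dict)  # information on storages 109 and 223
    cloud_params: dict = field(default_factory=dict)   # public cloud parameters


class ClusterControl:
    """Hypothetical cluster control unit holding one copy of the configuration."""

    def __init__(self, name):
        self.name = name
        self.config = ClusterConfig()
        self.peers = []

    def update(self, mutate):
        # Apply the change to a private copy, bump the version, then push the
        # result to every peer (a method call stands in for the network here).
        new_config = copy.deepcopy(self.config)
        mutate(new_config)
        new_config.version += 1
        self.config = new_config
        for peer in self.peers:
            peer.receive(new_config)

    def receive(self, config):
        # Accept only configurations newer than the one already held.
        if config.version > self.config.version:
            self.config = copy.deepcopy(config)


active = ClusterControl("active-102")
standby = ClusterControl("standby-202")
active.peers.append(standby)
active.update(lambda c: c.servers.update(
    {"standby-202": {"type": "public cloud", "ip": "192.0.2.10"}}))
```

After the update, both control units hold equal but independent copies, which matches the statement that each server keeps cluster configuration information with the same content.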

In the active server 102, the service 106 provides a predetermined service. The service 106 corresponds to the service executed by the service execution means 21 in FIG. 1. In the standby servers 202, 209, and 216, the services 206, 213, and 220 provide the same service as the service 106. However, while the service 106 is being provided on the active server 102, the standby server 202, for example, is in a standby state and does not provide the service 206. In the active server 102, the service 106 stores data in the block storage 109, or updates data in the block storage 109, as it provides the service.

The data stored in the block storage 109 is mirrored to the block storage 223 in the public cloud environment 201. As a result of this mirroring, the block storage 223 stores the same data as the block storage 109. If the active server 102 becomes unable to provide the service 106 for some reason, the standby server 202, for example, performs failover processing and becomes the active system. Having become the active system, the standby server 202 provides the service 206. In the standby server 202, the service 206 provides the service using the data mirrored to the block storage 223.

Here, because the standby server 202 is created as a surplus instance, it may be forcibly terminated. In the active server 102, the instance monitoring unit 107 monitors whether the standby server 202 is to be forcibly terminated. When the standby server 202 is to be forcibly terminated, the instance monitoring unit 107 receives an advance notice of the forced termination from the cloud service provider 301 before the termination takes place. On receiving the advance notice, the instance monitoring unit 107 notifies the instance operation unit 108 accordingly. The instance monitoring unit 107 corresponds to the instance monitoring means 22 in FIG. 1.
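One way to picture the monitoring unit is as a loop that asks the provider for a pending notice and forwards the first one it sees. The Python sketch below assumes polling, which the disclosure does not state (it says only that an advance notice is received). `fetch_notice` and `notify_operator` are hypothetical callables, not a real provider API.

```python
import time
from typing import Callable, Optional


def watch_for_termination(
    fetch_notice: Callable[[str], Optional[str]],
    notify_operator: Callable[[str, str], None],
    server_id: str,
    max_polls: int,
    interval_s: float = 0.0,
) -> bool:
    """Poll for an advance notice and forward the first one; return True if seen."""
    for _ in range(max_polls):
        notice = fetch_notice(server_id)          # None means no termination is pending
        if notice is not None:
            notify_operator(server_id, notice)    # hand over to the instance operation unit
            return True
        time.sleep(interval_s)
    return False


# Usage with a scripted provider: two quiet polls, then a notice.
responses = iter([None, None, "forced termination scheduled"])
received = []
seen = watch_for_termination(
    fetch_notice=lambda server_id: next(responses),
    notify_operator=lambda server_id, notice: received.append((server_id, notice)),
    server_id="standby-202",
    max_polls=5,
)
```

Passing the provider access in as a callable keeps the loop independent of any particular cloud service and makes it easy to exercise with scripted responses.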

When the instance operation unit 108 is notified by the instance monitoring unit 107 of the advance notice of forced termination, it stops the operation of the standby server 202. The instance operation unit 108 also refers to the cluster configuration information 105 and causes the cloud service provider 301 to create the standby server 209 as a surplus instance. At this time, the instance operation unit 108 causes the cloud service provider 301 to create the standby server 209 with the same specifications (performance) as the standby server 202. The image file needed to create the instance is stored, for example, in the cloud storage 304. The cloud service provider 301 obtains the image file from the cloud storage 304 and creates the standby server 209. The instance operation unit 108 corresponds to the instance operating means 23 in FIG. 1.

When notified by the instance monitoring unit 107 of the advance notice of forced termination, the instance operation unit 108 also switches the connection destination of the block storage 223 from the standby server 202 to the newly created standby server 209. From then on, the standby server 209 operates in place of the standby server 202. If the active server 102 becomes unable to provide the service 106 for some reason, the standby server 209 performs failover processing and becomes the active system. Having become the active system, the standby server 209 provides the service 213 using the data mirrored to the block storage 223.
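The standby replacement described above (stop the old standby, create a surplus instance with the same specifications, reattach the block storage) can be sketched as follows. `FakeProvider` and its `stop`, `create`, and `attach` methods are hypothetical stand-ins for requests to the cloud service provider 301; they do not correspond to any real cloud API, and the specification and image strings are placeholders.

```python
class FakeProvider:
    """Hypothetical in-memory stand-in for the cloud service provider 301."""

    def __init__(self):
        self.instances = {}     # name -> attributes
        self.attached_to = {}   # volume -> instance name
        self.calls = []         # ordered record of requests

    def stop(self, name):
        self.instances[name]["running"] = False
        self.calls.append(("stop", name))

    def create(self, name, kind, spec, image):
        self.instances[name] = {"kind": kind, "spec": spec,
                                "image": image, "running": True}
        self.calls.append(("create", name))

    def attach(self, volume, name):
        self.attached_to[volume] = name
        self.calls.append(("attach", name))


def replace_standby(provider, old, new, volume, image):
    """Replace a surplus standby that has received a termination notice."""
    spec = provider.instances[old]["spec"]   # the new standby gets the same specifications
    provider.stop(old)
    provider.create(new, kind="surplus", spec=spec, image=image)
    provider.attach(volume, new)             # switch the block storage to the new standby
    return new


provider = FakeProvider()
provider.create("standby-202", kind="surplus", spec="4vcpu-16gb", image="ha-image")
provider.attach("block-223", "standby-202")
provider.calls.clear()
replace_standby(provider, "standby-202", "standby-209", "block-223", "ha-image")
```

Because the mirrored data lives on the block storage rather than on the instance, reattaching the volume is what lets the new standby continue where the old one left off.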

After the standby server 202 or 209 performs failover processing and changes from the standby system to the active system, it creates the standby server 216, which is a regular instance. The case where the standby server 202 creates the standby server 216 is described here. The instance operation unit 208 refers to the cluster configuration information 205 and causes the cloud service provider 301 to create, as a regular instance, the standby server 216 with the same specifications as the standby server 202. The instance operation unit 208 also switches the connection destination of the block storage 223 from the standby server 202 to the newly created standby server 216. The instance operation unit 208 starts the service 220 on the standby server 216 and switches the standby server 216 to the active system. After the active system has been switched, the instance operation unit 208 terminates the standby server 202.
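The hand-off from the surplus instance to a regular instance after failover can be traced step by step in a small Python sketch. The `CloudState` record and the `promote_to_regular` function are hypothetical, the state is held in memory instead of in a real cloud, and the steps simply follow the order given in the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class CloudState:
    """Hypothetical snapshot of the public cloud side of the cluster."""
    instances: dict = field(default_factory=dict)   # name -> attributes
    storage_attached_to: str = ""                   # holder of the block storage 223
    active: str = ""                                # server currently providing the service


def promote_to_regular(state: CloudState, surplus: str, regular: str) -> None:
    """Move the active role from a surplus instance to a new regular instance."""
    spec = state.instances[surplus]["spec"]
    # 1. Create a regular instance with the same specifications.
    state.instances[regular] = {"kind": "regular", "spec": spec, "service": "stopped"}
    # 2. Switch the block storage to the new instance.
    state.storage_attached_to = regular
    # 3. Start the service there and make it the active system.
    state.instances[regular]["service"] = "running"
    state.active = regular
    # 4. Terminate the surplus instance only after the switch.
    del state.instances[surplus]


state = CloudState(
    instances={"standby-202": {"kind": "surplus", "spec": "4vcpu-16gb",
                               "service": "running"}},
    storage_attached_to="standby-202",
    active="standby-202",
)
promote_to_regular(state, "standby-202", "standby-216")
```

Terminating the surplus instance last means the service never depends solely on an instance that could be forcibly terminated while the hand-off is in progress.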

After the standby server 216 has become the active system, when the active server 102 becomes available again, failback processing is executed and the active server 102 becomes the active system. After the failback, the instance operation unit 108 of the active server 102 creates the standby server 202 using a surplus instance. The instance operation unit 108 also changes the connection destination of the block storage 223 to the standby server 202. The instance operation unit 108 puts the service 206 of the standby server 202 into a standby state and terminates the standby server 216.

Although FIG. 2 shows an example in which one service is executed on each of the active server 102 and the standby servers 202, 209, and 216, the present disclosure is not limited to this. For example, a plurality of services may be executed on each of the active server 102 and the standby server 202. In addition, although the above description covers an example in which the standby server 202 or 209 provides the service 206 or 213 when the service 106 of the active server 102 stops, the present disclosure is not limited to this. The HA cluster system 100 may be configured as a bidirectional high-availability cluster system in which services run on both the active server 102 and the standby server 202 or 209. In a bidirectional high-availability cluster system, when one server becomes unable to provide a service, the service is failed over to the other server.

The operation procedures are described below. FIG. 3 shows the operation procedure for switching the standby system (cluster system control method). Here, it is assumed that the service 106 is running normally on the active server 102 and that the standby server 202 is in a standby state. The instance monitoring unit 107 of the active server 102 monitors the standby server 202, which is a surplus instance (step A1). In step A1, the instance monitoring unit 107 queries the cloud service provider 301, for example periodically, about the state of the standby server 202 in the public cloud environment 201.

The instance monitoring unit 107 determines whether the standby server 202 is to be forcibly terminated after some period of time has elapsed (step A2). If no forced termination of the standby server 202 has been announced, the instance monitoring unit 107 returns to step A1 and continues monitoring the state of the standby server 202.
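The polling loop of steps A1 and A2 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the state strings, the `query_state` callback, and the parameter names are all hypothetical, and a real cloud service provider would expose its own way of announcing a forced termination.

```python
import time
from typing import Callable

# Hypothetical state strings; a real provider defines its own vocabulary.
RUNNING = "running"
TERMINATION_SCHEDULED = "termination-scheduled"


def wait_for_termination_notice(
    query_state: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the standby's state (step A1) until a termination notice appears (step A2).

    Returns True once a notice is seen, or False if max_polls is exhausted.
    """
    for _ in range(max_polls):
        if query_state() == TERMINATION_SCHEDULED:
            return True
        # No notice yet: wait, then go back to step A1.
        sleep(interval_s)
    return False
```

Passing `sleep` as a parameter is only a convenience so the loop can be exercised without real delays.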

When it is determined that the standby server will be forcibly terminated, the instance operation unit 108 causes a new surplus instance to be created in the public cloud environment 201 (step A3). In step A3, based on the information stored in the cluster configuration information 105, the instance operation unit 108 requests the cloud service provider 301 to create a surplus instance with the same specifications as the standby server 202.

In response to the request, the cloud service provider 301 creates the standby server 209 in the public cloud environment 201 (step A4). In step A4, the cloud service provider 301 acquires from the cloud storage 304 an image file in which the high-availability cluster software 210 is installed, and creates the standby server 209 as a surplus instance.

The instance operation unit 108 changes the connection destination of the block storage 223 in the public cloud environment 201 from the standby server 202 to the standby server 209 (step A5). In step A5, the instance operation unit 108 detaches the block storage 223 from the standby server 202 via the cloud service provider 301. The instance operation unit 108 then attaches the block storage 223, via the cloud service provider 301, to the standby server 209 created in step A4.

The instance operation unit 108 acquires information on the standby server 209 from the cloud service provider 301. The cluster control unit 104 updates the cluster configuration information 105 based on the acquired information on the standby server 209 (step A6). The cluster control unit 104 communicates with the standby server 209 and synchronizes the cluster configuration information 105 in the active server 102 with the cluster configuration information 212 in the standby server 209 (step A7). In step A7, the cluster control unit 104 communicates with the cluster control unit 211 of the standby server 209 over the WAN 302 and changes the cluster configuration information 212 based on the cluster configuration information 105.

The active server 102 writes the differential data of the block storage 109 to the block storage 223 in the public cloud environment 201 (step A8). In step A8, the active server 102 writes to the block storage 223 the data in the block storage 109 that has changed since the previous differential write. With these steps, the switching of the standby system is complete.
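The ordering of steps A3 to A7 can be illustrated with a small in-memory model. Everything named here is an assumption made for illustration: `FakeProvider` and its methods stand in for the cloud service provider 301 and do not correspond to any real provider API, and the cluster configuration is reduced to a plain dictionary. The differential write of step A8 is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeProvider:
    """In-memory stand-in for the cloud service provider 301 (hypothetical API)."""
    instances: Dict[str, dict] = field(default_factory=dict)
    volume_attached_to: Optional[str] = None
    _counter: int = 0

    def create_instance(self, spec: str, kind: str) -> str:
        self._counter += 1
        name = f"vm-{self._counter}"
        self.instances[name] = {"spec": spec, "kind": kind}
        return name

    def detach_volume(self) -> None:
        self.volume_attached_to = None

    def attach_volume(self, name: str) -> None:
        if self.volume_attached_to is not None:
            raise RuntimeError("volume is still attached elsewhere")
        self.volume_attached_to = name


def replace_standby(provider: FakeProvider, config: dict) -> str:
    """Replace a standby that is about to be terminated (steps A3 to A7).

    Returns the name of the standby that was replaced.
    """
    old = config["standby"]
    # A3/A4: request a surplus instance with the same spec as the old standby.
    new = provider.create_instance(config["standby_spec"], kind="surplus")
    # A5: detach the mirror volume from the old standby, then attach it to the new one.
    provider.detach_volume()
    provider.attach_volume(new)
    # A6: update the active server's copy of the cluster configuration.
    config["standby"] = new
    # A7: model the synchronization by storing a snapshot of the config on the new standby.
    provider.instances[new]["cluster_config"] = dict(config)
    return old
```

Detaching before attaching mirrors the order given in step A5; the guard in `attach_volume` makes that ordering explicit in the model.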

FIG. 4 shows the operation procedure when failing over to the standby server 202. When an abnormality occurs in the active server 102, the standby server 202 becomes the active system (step B1). If the standby server 202 was forcibly terminated before the failover, the standby server 209 becomes the active system instead. The instance operation unit 208 of the standby server 202, which is now the active system, creates a normal instance in the public cloud environment 201 (step B2). In step B2, based on the information stored in the cluster configuration information 205, the instance operation unit 208 requests the cloud service provider 301 to create a normal instance with the same specifications as the standby server 202.

In response to the request, the cloud service provider 301 creates the standby server 216 in the public cloud environment 201 (step B3). In step B3, the cloud service provider 301 acquires from the cloud storage 304 an image file in which the high-availability cluster software 217 is installed, and creates the standby server 216 as a normal instance.

The instance operation unit 208 changes the connection destination of the block storage 223 in the public cloud environment 201 from the standby server 202 to the standby server 216 (step B4). In step B4, the instance operation unit 208 detaches the block storage 223 from the standby server 202 via the cloud service provider 301. The instance operation unit 208 then attaches the block storage 223, via the cloud service provider 301, to the standby server 216 created in step B3.

The cluster control unit 204 communicates with the cluster control unit 218 of the standby server 216 and changes the cluster configuration information 219 of the standby server 216 based on the cluster configuration information 205 (step B5). The instance operation unit 208 stops the service 206 on the standby server 202 and starts the service 220 on the standby server 216 (step B6). At this point, the standby server 216 becomes the active system.

The instance operation unit 222 requests the cloud service provider 301 to terminate the standby server 202, which is a surplus instance (step B7). The cloud service provider 301 terminates the standby server 202 in accordance with the request.
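The hand-off from the surplus instance to a normal instance in steps B2 to B7 can be sketched as a sequence of state changes. The record layout (`kind`, `spec`, `volume`, `service`, `config`) and the function name are hypothetical and chosen only to make the order of operations visible; no real provider call is modeled.

```python
from typing import Dict, List


def promote_to_normal_instance(
    cloud: Dict[str, dict], surplus_name: str, new_name: str
) -> List[str]:
    """Move the service from a surplus instance to a new normal instance (steps B2 to B7).

    `cloud` maps instance names to records with the hypothetical keys
    "kind", "spec", "volume", "service", and "config".
    Returns the list of actions in the order they were performed.
    """
    log: List[str] = []
    surplus = cloud[surplus_name]
    # B2/B3: create a normal instance with the same spec as the surplus instance.
    cloud[new_name] = {
        "kind": "normal",
        "spec": surplus["spec"],
        "volume": False,
        "service": "stopped",
    }
    log.append("created")
    # B4: detach the mirror volume from the surplus instance, then attach it to the new one.
    surplus["volume"] = False
    cloud[new_name]["volume"] = True
    log.append("volume-moved")
    # B5: copy the cluster configuration to the new instance.
    cloud[new_name]["config"] = dict(surplus.get("config", {}))
    log.append("config-synced")
    # B6: stop the service on the surplus instance and start it on the normal instance.
    surplus["service"] = "stopped"
    cloud[new_name]["service"] = "running"
    log.append("service-switched")
    # B7: terminate the surplus instance only after the service has moved.
    del cloud[surplus_name]
    log.append("surplus-terminated")
    return log
```

Terminating the surplus instance last means that, in this model, there is no point at which neither instance holds the volume and the service.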

FIG. 5 shows the operation procedure when failing back to the active server 102. When the active server 102 becomes available, the cluster control unit 218 of the standby server 216, which is operating as the active system, fails back to the active server 102 based on the cluster configuration information 219 (step C1). This failback may use the same processing as an ordinary failback. The failback may be performed manually by a third party, or automatically by the active server 102 or the standby server 216.

After the failback, the instance operation unit 108 of the active server 102 creates a new surplus instance in the public cloud environment 201 (step C2). In step C2, based on the information stored in the cluster configuration information 105, the instance operation unit 108 requests the cloud service provider 301 to create a surplus instance with the same specifications as the standby server 216.

In response to the request, the cloud service provider 301 creates the standby server 202 in the public cloud environment 201 (step C3). In step C3, the cloud service provider 301 acquires from the cloud storage 304 an image file in which the high-availability cluster software 203 is installed, and creates the standby server 202 as a surplus instance.

The instance operation unit 108 changes the connection destination of the block storage 223 in the public cloud environment 201 from the standby server 216 to the standby server 202 (step C4). In step C4, the instance operation unit 108 detaches the block storage 223 from the standby server 216 via the cloud service provider 301. The instance operation unit 108 then attaches the block storage 223, via the cloud service provider 301, to the standby server 202 created in step C3.

The instance operation unit 108 acquires information on the standby server 202 from the cloud service provider 301. The cluster control unit 104 updates the cluster configuration information 105 based on the acquired information on the standby server 202 (step C5). The cluster control unit 104 communicates with the standby server 202 and synchronizes the cluster configuration information 105 in the active server 102 with the cluster configuration information 205 in the standby server 202 (step C6). In step C6, the cluster control unit 104 communicates with the cluster control unit 204 of the standby server 202 over the WAN 302 and changes the cluster configuration information 205 based on the cluster configuration information 105.

The instance operation unit 108 requests the cloud service provider 301 to terminate the standby server 216, which is a normal instance (step C7). The cloud service provider 301 terminates the standby server 216 in accordance with the request.

The active server 102 writes the differential data of the block storage 109 to the block storage 223 in the public cloud environment 201 (step C8). In step C8, the active server 102 writes to the block storage 223 the data in the block storage 109 that has changed since the previous differential write. With these steps, the switching of the standby system is complete.
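The differential write of steps A8 and C8 sends only the data changed since the previous write. The disclosure does not say how changes are tracked; the sketch below assumes one common approach, a set of dirty block indices, and models both storages as dictionaries. The class and method names are hypothetical.

```python
from typing import Dict, Set


class MirroredVolume:
    """Local block store that remembers which blocks changed since the last sync.

    Models the block storage 109; dirty-block tracking is an assumption,
    not something stated in the disclosure.
    """

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)

    def sync_to(self, remote: Dict[int, bytes]) -> int:
        """Copy only the dirty blocks to `remote` and return how many were sent."""
        sent = 0
        for index in sorted(self.dirty):
            remote[index] = self.blocks[index]
            sent += 1
        # After a successful sync, nothing is pending until the next write.
        self.dirty.clear()
        return sent
```

For example, after two blocks are written and synced, rewriting one of them causes the next `sync_to` call to send that single block rather than the whole volume.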

In this embodiment, the active server 102 has the instance monitoring unit 107, which monitors the standby server 202, a surplus instance. When the instance monitoring unit 107 detects that the standby server 202 is going to be forcibly terminated, the instance operation unit 108 creates a new surplus instance and operates it as the standby server 209. In this way, the HA cluster system 100 can dynamically change the cluster configuration before the standby server 202 is forcibly terminated.

In this embodiment, surplus instances are used for the standby servers 202 and 209. In general, a surplus instance can be used at lower cost than a normal instance. In this embodiment, when the standby server 202 is going to be forcibly terminated, the standby server 209 is created as a surplus instance and operates in place of the standby server 202. In this way, even when the standby server 202 is forcibly terminated, a server that operates as the standby system for the active server 102 can be secured. This embodiment can therefore suppress a drop in availability while using a low-cost surplus instance for the standby server 202. In other words, this embodiment can ensure the availability of the HA cluster system 100 while keeping operating costs down by using, for the standby server, an instance that may be forcibly terminated.

In this embodiment, when the standby server 202 or 209, which is a surplus instance, becomes the active system through failover, it creates a new normal instance and operates it as the standby server 216. The standby server 202 or 209 causes the standby server 216 to operate as the active system and then terminates itself. In this way, the standby server 216 can operate in the public cloud environment 201 as a substitute for the active server 102. Because the standby server 216, being a normal instance, is not forcibly terminated in the public cloud environment 201, it is possible to avoid a situation in which the standby server acting as the substitute is forcibly terminated unexpectedly and service provision stops.

In this embodiment, after a failback to the active server 102, the active server 102, now the active system again, creates a surplus instance in the public cloud environment 201 and operates it as the standby server 202. The active server 102 also terminates the standby server 216, which was operating as the active system before the failback. In this way, the HA cluster system 100 can use the standby server 202, a surplus instance, as the standby system. In addition, terminating the standby server 216, a normal instance, after the failback keeps the time during which the more expensive normal instance is in use to a minimum and so reduces cost.

Although the above embodiment describes an example in which the active server 102 is located in the on-premises environment 101, the present disclosure is not limited to this. The active server 102 may be any server that is not forcibly terminated by another party; for example, it may be created using a normal instance in the public cloud environment 201. When the active server 102 runs on a normal instance, a block storage is attached to that normal instance. In that case, the normal instance running as the active server 102 and the block storage belong to the same public cloud environment.

When the active server 102 runs on a normal instance, the cloud environment to which the active server 102 belongs and the cloud environment to which the standby servers 202, 209, and 216 belong need not be the same. The active server 102 may belong to a cloud environment different from the public cloud environment 201 to which the standby servers 202, 209, and 216 belong. In that case, it is sufficient that the active server 102 can communicate with the standby servers 202, 209, and 216.

Although the above embodiment mentions Amazon EC2 as an example of the public cloud environment 201, the present disclosure is not limited to this. In the present disclosure, the HA cluster system 100 is configured using virtual machines on a public cloud that have the property that the standby server may be forcibly terminated regardless of the user's intent. The public cloud environment 201 used in the HA cluster system 100 is not limited to a cloud environment provided by any particular vendor.

Although the above embodiment describes an example in which the HA cluster system 100 is configured using two servers, namely the active server 102 and the standby server 202 or 209, the present disclosure is not limited to this. There is no particular restriction on the number of active servers and standby servers in the HA cluster system 100. The HA cluster system 100 may include a given number of active servers and the same number, or a different number, of standby servers.

In each of the above embodiments, software (programs) such as the high-availability cluster software 103 can be stored on various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media such as flexible disks, magnetic tapes, and hard disks; magneto-optical recording media such as magneto-optical disks; optical disc media such as CDs (compact discs) and DVDs (digital versatile disks); and semiconductor memories such as mask ROM (read only memory), PROM (programmable ROM), EPROM (erasable PROM), flash ROM, and RAM (random access memory). The programs may also be supplied to a computer using various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply a program to a computer via a wired communication path, such as an electric wire or optical fiber, or via a wireless communication path.

Although embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to those embodiments. Changes and modifications made to the above embodiments without departing from the spirit of the present disclosure are also included in the present disclosure.

For example, some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.

[Appendix 1]
A cluster system comprising:
a first server having service execution means for executing a service; and
a second server to which execution of the service is failed over when a failure occurs in execution of the service on the first server,
wherein the second server is a virtual server created as an instance of a first type that may be forcibly terminated by a cloud service provider in a cloud environment, and
the first server further has:
instance monitoring means for monitoring whether the second server is to be forcibly terminated; and
instance operation means for, when it is detected that the second server is to be forcibly terminated, causing the cloud environment to create a third server as an instance of the first type and causing the third server to take over the functions provided by the second server.

[Appendix 2]
The cluster system according to appendix 1, wherein
the first server includes a first storage used in executing the service, and the cloud environment includes a second storage that stores data with the same content as the first storage, and
the instance operation means switches the connection destination of the second storage from the second server to the third server when it is detected that the second server is to be forcibly terminated.

[Appendix 3]
The cluster system according to appendix 2, wherein
the second server has instance operation means for, after execution of the service has been failed over, causing the cloud environment to create a fourth server as an instance of a second type that is not forcibly terminated by the cloud service provider, switching the connection destination of the second storage from the second server to the fourth server, and causing the fourth server to execute the service.

[Appendix 4]
The cluster system according to appendix 3, wherein the fourth server requests the cloud service provider to terminate the second server after starting execution of the service.

[Appendix 5]
The cluster system according to appendix 4, wherein
when the first server recovers from the failure, execution of the service is failed back from the fourth server to the first server, and
the instance operation means of the first server causes the cloud service provider to create the second server as an instance of the first type, switches the connection destination of the second storage from the fourth server to the second server, and requests the cloud service provider to terminate the fourth server.

[Appendix 6]
The cluster system according to any one of appendices 1 to 5, wherein the instance operation means of the first server causes the cloud service provider to create the third server with the same performance as the second server.

[Appendix 7]
The cluster system according to any one of appendices 1 to 6, wherein the first server operates as an active system, and the second server and the third server operate as standby systems.

[Appendix 8]
The cluster system according to appendix 3 or 4, wherein
before execution of the service is failed over to the second server, the first server operates as an active system and the second server operates as a standby system,
after execution of the service has been failed over and before execution of the service is started on the fourth server, the second server operates as the active system, and
after execution of the service is started on the fourth server, the fourth server operates as the active system.

[Appendix 9]
A server comprising:
service execution means for executing a service;
instance monitoring means for monitoring whether a first virtual server is to be forcibly terminated, the first virtual server being created as an instance of a first type that may be forcibly terminated by a cloud service provider in a cloud environment and being a server to which execution of the service is failed over when a failure occurs in execution of the service; and
instance operation means for, when it is detected that the first virtual server is to be forcibly terminated, causing the cloud environment to create a second virtual server as an instance of the first type and causing the second virtual server to take over the functions provided by the first virtual server.

[Appendix 10]
The server according to appendix 9, further comprising a first storage used in executing the service, wherein
the cloud environment includes a second storage that stores data with the same content as the first storage, and
the instance operation means switches the connection destination of the second storage from the first virtual server to the second virtual server when it is detected that the first virtual server is to be forcibly terminated.

[Appendix 11]
The server according to appendix 10, wherein
the first virtual server, after execution of the service has been failed over, causes the cloud environment to create a third virtual server as an instance of a second type that is not forcibly terminated by the cloud service provider, switches the connection destination of the second storage from the first virtual server to the third virtual server, causes the third virtual server to execute the service, and requests the cloud service provider to terminate the first virtual server, and
the instance operation means, when execution of the service is failed back from the third virtual server to the server, causes the cloud service provider to create the first virtual server as an instance of the first type, switches the connection destination of the second storage from the third virtual server to the first virtual server, and requests the cloud service provider to terminate the third virtual server.

[Appendix 12]
The server according to any one of appendices 9 to 11, wherein, before execution of the service is failed over to the first virtual server, the server operates as an active system and the first virtual server or the second virtual server operates as a standby system.

［付記１３］
サーバにおいてサービスを実行し、
クラウド環境においてクラウドサービスプロバイダから強制終了させられる可能性があるインスタンスとして作成され、かつ前記サービスの実行に障害が発生した場合に前記サービスの実行がフェイルオーバされる第１の仮想サーバが強制終了されるか否かを監視し、
前記第１の仮想サーバが強制終了されることが検出された場合、前記クラウド環境に第２の仮想サーバを前記強制終了させられる可能性があるインスタンスとして作成させ、前記第１の仮想サーバが提供する機能を前記第２の仮想サーバに引き継がせるクラスタシステム制御方法。
[Appendix 13]
A cluster system control method comprising:
executing a service on a server;
monitoring whether a first virtual server is forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in execution of the service; and
when it is detected that the first virtual server is forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated, and causing the second virtual server to take over the functions provided by the first virtual server.

［付記１４］
コンピュータに、
サービスを実行し、
クラウド環境においてクラウドサービスプロバイダから強制終了させられる可能性があるインスタンスとして作成され、かつ前記サービスの実行に障害が発生した場合に前記サービスの実行がフェイルオーバされる第１の仮想サーバが強制終了されるか否かを監視し、
前記第１の仮想サーバが強制終了されることが検出された場合、前記クラウド環境に第２の仮想サーバを前記強制終了させられる可能性があるインスタンスとして作成させ、前記第１の仮想サーバが提供する機能を前記第２の仮想サーバに引き継がせるための処理を実行させるためのプログラム。
[Appendix 14]
A program for causing a computer to execute processing for:
executing a service;
monitoring whether a first virtual server is forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in execution of the service; and
when it is detected that the first virtual server is forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated, and causing the second virtual server to take over the functions provided by the first virtual server.
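
The control method of Appendices 13 and 14, together with the storage switch of Appendix 10, can be sketched in Python as a single monitoring pass. This is only an illustrative sketch under stated assumptions, not the implementation disclosed here: `FakeCloud`, `Instance`, `check_standby`, and the `termination_scheduled` flag are hypothetical names, and the in-memory provider stands in for whatever interface an actual cloud service provider exposes.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

# Hypothetical label for the instance type that the provider may forcibly terminate.
PREEMPTIBLE = "first_type"


@dataclass
class Instance:
    instance_id: str
    kind: str
    termination_scheduled: bool = False  # assumed notice flag set by the provider


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud service provider."""
    instances: Dict[str, Instance] = field(default_factory=dict)
    storage_attached_to: Optional[str] = None
    _ids: count = field(default_factory=lambda: count(1))

    def create_instance(self, kind: str) -> Instance:
        inst = Instance(f"vm-{next(self._ids)}", kind)
        self.instances[inst.instance_id] = inst
        return inst

    def attach_storage(self, instance_id: str) -> None:
        # Models switching the connection destination of the second storage.
        self.storage_attached_to = instance_id

    def is_termination_scheduled(self, instance_id: str) -> bool:
        return self.instances[instance_id].termination_scheduled


def check_standby(cloud: FakeCloud, standby_id: str) -> str:
    """One monitoring pass run by the server that executes the service.

    Returns the id of the virtual server that should be monitored next.
    """
    if not cloud.is_termination_scheduled(standby_id):
        return standby_id  # nothing detected, keep the current standby
    # Forced termination detected: create a replacement of the same type
    # and let it take over, including the storage connection.
    replacement = cloud.create_instance(PREEMPTIBLE)
    cloud.attach_storage(replacement.instance_id)
    return replacement.instance_id
```

In this sketch the replacement is created with the same instance type as the virtual server it replaces, so the standby side stays on instances that may be forcibly terminated, as the method describes.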

１０：クラスタシステム
２０：第１のサーバ
２１：サービス実行手段
２２：インスタンス監視手段
２３：インスタンス操作手段
３０：第２のサーバ
４０：第３のサーバ
５０：クラウド環境
１００：高可用性クラスタシステム
１０１：オンプレミス環境
１０２：現用系サーバ
１０３、２０３、２１０、２１７：高可用性クラスタソフト
１０４、２０４、２１１、２１８：クラスタ制御部
１０５、２０５、２１２、２１９：クラスタ構成情報
１０６、２０６、２１３、２２０：サービス
１０７、２０７、２１４、２２１：インスタンス監視部
１０８、２０８、２１５、２２２：インスタンス操作部
１０９、２２３：ブロックストレージ
２０１：パブリッククラウド環境
２０２、２０９、２１６：待機系サーバ
３０１：クラウドサービスプロバイダ
３０３：インターネットゲートウェイ
３０４：クラウドストレージ

10: Cluster system
20: First server
21: Service execution means
22: Instance monitoring means
23: Instance operation means
30: Second server
40: Third server
50: Cloud environment
100: High-availability cluster system
101: On-premises environment
102: Active server
103, 203, 210, 217: High-availability cluster software
104, 204, 211, 218: Cluster control unit
105, 205, 212, 219: Cluster configuration information
106, 206, 213, 220: Service
107, 207, 214, 221: Instance monitoring unit
108, 208, 215, 222: Instance operation unit
109, 223: Block storage
201: Public cloud environment
202, 209, 216: Standby server
301: Cloud service provider
303: Internet gateway
304: Cloud storage

Claims

1. A cluster system comprising:
a first server having service execution means for executing a service; and
a second server to which execution of the service is failed over when a failure occurs in execution of the service in the first server,
wherein the second server is a virtual server created in a cloud environment as an instance of a first type that may be forcibly terminated by a cloud service provider, and
the first server further has:
instance monitoring means for monitoring whether the second server is forcibly terminated; and
instance operation means for, when it is detected that the second server is forcibly terminated, causing the cloud environment to create a third server as an instance of the first type and causing the third server to take over the functions provided by the second server.

2. The cluster system according to claim 1, wherein
the first server includes a first storage used in executing the service, and the cloud environment includes a second storage that stores the same data as the first storage, and
the instance operation means switches the connection destination of the second storage from the second server to the third server when it is detected that the second server is forcibly terminated.

3. The cluster system according to claim 2, wherein the second server has instance operation means for, after execution of the service is failed over, causing the cloud environment to create a fourth server as an instance of a second type that is not forcibly terminated by the cloud service provider, switching the connection destination of the second storage from the second server to the fourth server, and causing the fourth server to execute the service.

4. The cluster system according to claim 3, wherein said fourth server requests said cloud service provider to terminate said second server after starting execution of said service.

5. The cluster system according to claim 4, wherein
when the first server recovers from the failure, execution of the service is failed back from the fourth server to the first server, and
the instance operation means of the first server causes the cloud service provider to create the second server as an instance of the first type, switches the connection destination of the second storage from the fourth server to the second server, and requests the cloud service provider to terminate the fourth server.

6. The cluster system according to any one of claims 1 to 5, wherein said instance operation means of said first server causes said cloud service provider to create said third server with the same performance as said second server.

7. The cluster system according to claim 1, wherein said first server operates as an active system, and said second server and said third server operate as standby systems.

8. A server comprising:
service execution means for executing a service;
instance monitoring means for monitoring whether a first virtual server is forcibly terminated, the first virtual server being created in a cloud environment as an instance of a first type that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in execution of the service; and
instance operation means for, when it is detected that the first virtual server is forcibly terminated, causing the cloud environment to create a second virtual server as an instance of the first type and causing the second virtual server to take over the functions provided by the first virtual server.

9. A cluster system control method comprising:
executing a service on a server;
monitoring whether a first virtual server is forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in execution of the service; and
when it is detected that the first virtual server is forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated, and causing the second virtual server to take over the functions provided by the first virtual server.

10. A program for causing a computer to execute processing for:
executing a service;
monitoring whether a first virtual server is forcibly terminated, the first virtual server being created in a cloud environment as an instance that may be forcibly terminated by a cloud service provider and being the server to which execution of the service is failed over when a failure occurs in execution of the service; and
when it is detected that the first virtual server is forcibly terminated, causing the cloud environment to create a second virtual server as an instance that may be forcibly terminated, and causing the second virtual server to take over the functions provided by the first virtual server.
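
The sequence recited in the cluster system claims for the period after failover and at failback (a fourth server created as the second type, then the second server recreated as the first type) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the claimed implementation: the `Cloud` class, the functions `promote_after_failover` and `fail_back`, and the string labels are all hypothetical, and the provider is modelled as an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical labels for the two instance types named in the claims.
PREEMPTIBLE = "first_type"   # may be forcibly terminated by the provider
ON_DEMAND = "second_type"    # is not forcibly terminated by the provider


@dataclass
class Cloud:
    """Hypothetical in-memory model of the cloud environment."""
    instances: Dict[str, str] = field(default_factory=dict)  # name -> type
    storage_owner: str = ""  # server the second storage is connected to
    service_host: str = ""   # server currently executing the service

    def create(self, name: str, kind: str) -> None:
        self.instances[name] = kind

    def terminate(self, name: str) -> None:
        del self.instances[name]


def promote_after_failover(cloud: Cloud, standby: str, promoted: str) -> None:
    """Run by the standby after the service has been failed over to it."""
    cloud.create(promoted, ON_DEMAND)   # create the fourth server
    cloud.storage_owner = promoted      # switch the storage connection
    cloud.service_host = promoted       # execute the service there
    cloud.terminate(standby)            # request termination of the standby


def fail_back(cloud: Cloud, promoted: str, standby: str, on_prem: str) -> None:
    """Run by the first server once it has recovered from the failure."""
    cloud.service_host = on_prem        # service fails back to the first server
    cloud.create(standby, PREEMPTIBLE)  # recreate the standby as the first type
    cloud.storage_owner = standby       # switch the storage connection back
    cloud.terminate(promoted)           # request termination of the fourth server
```

In this sketch, the instance that cannot be forcibly terminated exists only while the service actually runs in the cloud environment, which reflects the ordering of the operations in the claims.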