JPWO2019171704A1 - Management server, cluster system, cluster system control method, and program

Info

Publication number: JPWO2019171704A1
Application number: JP2020504799A
Authority: JP
Inventors: チューエンファン
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-03-06
Filing date: 2018-12-18
Publication date: 2021-02-04
Also published as: WO2019171704A1

Abstract

To realize an HA cluster system that can avoid problems caused by the active server and the standby server having clusterware. A management server (1) includes: a transmission processing unit (2) that transmits, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server with respect to failover; a server monitoring processing unit (3) that requests the service providing server to execute the monitoring script and return the execution result, and monitors the operating state of the service providing server based on the execution result; and a cluster control processing unit (4) that requests the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

Description

The present invention relates to a management server, a cluster system, a control method for a cluster system, and a program.

Various techniques have been proposed for server monitoring. For example, Patent Document 1 discloses a job monitoring system that includes a monitored server, which is the computer to be monitored; a job monitoring server, which is a computer that monitors jobs; and a remote monitoring server connected to the job monitoring server via a communication network.

Server monitoring is also required in an HA (High Availability) cluster system. In general, in an HA cluster system, the active server and the standby server each have clusterware, and the two servers communicate with each other over a network to form the HA cluster system. In an HA cluster system configured in this way, the active server monitors whether it is providing the business service normally, and the standby server monitors whether it can take over the business service normally.

In an HA cluster system configured in this way, monitoring agent processes are resident on the active server and the standby server in order to monitor business services, application software, middleware, hardware, and the like. Agents for handling an OS (Operating System) panic and for implementing functions such as data mirroring must also be placed on these servers. That is, clusterware must be installed on the active server and the standby server.

Patent Document 1: Japanese Unexamined Patent Publication No. 2011-159011

As described above, constructing the HA cluster system described above requires installing clusterware on the managed servers (the active server and the standby server). Therefore, when an existing system that is not a cluster system is changed into a cluster system, installing the clusterware and constructing the cluster cause server restarts and system stoppage. In addition, in the HA cluster system described above, the load of the processing performed by the clusterware may become excessive.
The job monitoring system of Patent Document 1, on the other hand, is not an HA cluster system. Therefore, there is still a need for a technique for realizing an HA cluster system that can avoid problems caused by the active server and the standby server having clusterware.

Accordingly, one object of the embodiments disclosed in this specification is to provide a management server, a cluster system, a control method for a cluster system, and a program that can realize an HA cluster system capable of avoiding problems caused by the active server and the standby server having clusterware.

The management server according to a first aspect includes: transmission processing means for transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server with respect to failover; server monitoring processing means for requesting the service providing server to execute the monitoring script and return the execution result, and for monitoring the operating state of the service providing server based on the execution result; and cluster control processing means for requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

The cluster system according to a second aspect includes: an active server for providing a predetermined service; a standby server for providing the predetermined service; and a management server that controls failover between the active server and the standby server. The management server includes: transmission processing means for transmitting, to the active server, a first monitoring script for monitoring the operation of the active server and a first control script for controlling the operation of the active server with respect to failover, and for transmitting, to the standby server, a second monitoring script for monitoring the operation of the standby server and a second control script for controlling the operation of the standby server with respect to failover; server monitoring processing means for requesting the active server and the standby server to execute the monitoring scripts and return the execution results, and for monitoring the operating states of the active server and the standby server based on the execution results; and cluster control processing means for requesting the active server to execute the first control script and requesting the standby server to execute the second control script when the execution result of the monitoring script of the active server indicates an abnormality of the active server.

The control method for a cluster system according to a third aspect includes: transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server with respect to failover; requesting the service providing server to execute the monitoring script and return the execution result, and monitoring the operating state of the service providing server based on the execution result; and requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

The program according to a fourth aspect causes a computer to execute: a transmission processing step of transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server with respect to failover; a server monitoring processing step of requesting the service providing server to execute the monitoring script and return the execution result, and monitoring the operating state of the service providing server based on the execution result; and a cluster control processing step of requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

According to the aspects described above, it is possible to provide a management server, a cluster system, a control method for a cluster system, and a program that can realize an HA cluster system capable of avoiding problems caused by the active server and the standby server having clusterware.

A block diagram showing an example of the configuration of the management server according to the outline of the embodiment.
A block diagram showing an example of the HA cluster system according to Embodiment 1.
A block diagram showing an example of the hardware configuration of the operation management server according to the embodiment.
A flowchart showing the flow of the cluster construction process in the HA cluster system according to Embodiment 1.
A flowchart showing the flow of the cluster startup operation in the HA cluster system according to Embodiment 1.
A flowchart showing the flow of the monitoring operation in the HA cluster system according to Embodiment 1.
A flowchart showing the failover operation in the HA cluster system according to Embodiment 1.
A flowchart showing the operation related to script execution in the HA cluster system according to Embodiment 1.
A block diagram showing an example of the HA cluster system according to Embodiment 2.
A flowchart showing the server monitoring and server control operations using a BMC in the HA cluster system according to Embodiment 2.
A block diagram showing an example of the HA cluster system according to Embodiment 3.

<Outline of Embodiment>
Prior to the description of the embodiments, an outline of the embodiments according to the present invention is given.
FIG. 1 is a block diagram showing an example of the configuration of the management server 1 according to the outline of the embodiment. The management server 1 has a transmission processing unit 2, a server monitoring processing unit 3, and a cluster control processing unit 4. The management server 1 is connected, by wired or wireless communication, to a service providing server (not shown in FIG. 1) that constitutes the HA cluster system. The service providing server is, for example, a server that provides a predetermined service to a client device (not shown).

The transmission processing unit 2 transmits, to the service providing server, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server.

The server monitoring processing unit 3 requests the service providing server to execute the monitoring script and return the execution result. The service providing server therefore executes the monitoring script transmitted from the management server 1, checks its own operating state, and returns the result of the check to the management server 1. The server monitoring processing unit 3 then monitors the operating state of the service providing server based on the returned execution result.

When the execution result of the monitoring script indicates an abnormality of the service providing server, the cluster control processing unit 4 requests the service providing server to execute the control script for performing failover. The cluster control processing unit 4 is not limited to controlling failover, that is, the series of recovery operations including starting and stopping services when an abnormality occurs, and may perform other control. For example, during normal operation, the cluster control processing unit 4 may control the start of a service or the migration of a service (starting and stopping the service).

In an HA cluster system including the management server 1 and the service providing server described above, the management server 1 determines whether the service providing server is in an abnormal state, and the management server 1 controls the execution of failover. That is, the management server 1 monitors, via the network, the state in which the service providing server provides its OS functions and service functions, and when a failure is detected, the management server 1 can control the recovery operation of the system. In this way, the clusterware is provided on the management server 1, and the service providing server does not need to have clusterware. As a result, the HA cluster system can be provided in an agentless manner.
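The division of work among the transmission processing unit 2, the server monitoring processing unit 3, and the cluster control processing unit 4 can be sketched as follows. This is a minimal in-memory illustration, not the implementation described in the specification: the class names, the script names `"monitor"` and `"control"`, and the result strings `"OK"` and `"ABNORMAL"` are all assumptions made for the example, and the network transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceServer:
    """Hypothetical stand-in for a service providing server.

    It holds no cluster logic: it only stores the scripts it receives
    and runs one when the management server asks for it.
    """
    name: str
    healthy: bool = True
    scripts: Dict[str, Callable[["ServiceServer"], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def receive(self, script_name, script):
        self.scripts[script_name] = script

    def run(self, script_name):
        result = self.scripts[script_name](self)
        self.log.append(script_name)
        return result


def monitoring_script(server):
    # Assumed result format: "OK" when healthy, "ABNORMAL" otherwise.
    return "OK" if server.healthy else "ABNORMAL"


def control_script(server):
    # Placeholder for the failover-related control performed on the server.
    return "control-done"


class ManagementServer:
    def __init__(self, server):
        self.server = server

    def send_scripts(self):
        # Role of the transmission processing unit 2.
        self.server.receive("monitor", monitoring_script)
        self.server.receive("control", control_script)

    def monitor_and_control(self):
        # Role of the server monitoring processing unit 3:
        # request execution and read back the result.
        state = self.server.run("monitor")
        # Role of the cluster control processing unit 4:
        # request the control script only when an abnormality is reported.
        if state != "OK":
            self.server.run("control")
        return state
```

In this sketch the decision logic lives entirely in `ManagementServer`, which mirrors the agentless arrangement: the managed side only executes what it is asked to execute.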

In an HA cluster system that does not include the management server 1 and instead has clusterware on the service providing servers (hereinafter referred to as the cluster system according to the comparative example), installing the clusterware and constructing the cluster cause the service providing servers to restart and the system to stop. For this reason, there is a tendency to build a completely new cluster environment and then migrate to the new system rather than make use of the existing environment.

In contrast, with agentless clusterware, there is no need to install the clusterware on each managed server (that is, each service providing server); it only needs to be installed on the management server 1. The existing environment is therefore not affected by, for example, system stoppage. There is also no need to build a completely new environment, so the existing environment can be used effectively. Even when the clusterware needs to be upgraded or maintained, neither system stoppage nor installation work on each managed server is required. As a result, introduction and operation costs can be reduced, and maintenance and system expansion can be carried out easily.

In addition, in the cluster system according to the comparative example, liveness checks between the service providing servers and alive monitoring of the monitoring processes are performed periodically, which may place an excessive load on the service providing servers. If a service providing server has few resources, monitoring may therefore fail, and it may be erroneously determined that a failure has occurred.

In contrast, in the HA cluster system having the management server 1 described above, the main processing for achieving high availability is performed by the management server 1, so the load placed on the service providing servers (that is, the cluster servers) can be reduced.

Next, the embodiments are described in detail.
<Embodiment 1>
FIG. 2 is a block diagram showing an example of the HA cluster system 10 according to Embodiment 1. FIG. 3 is a block diagram showing an example of the hardware configuration of the operation management server 100.

As shown in FIG. 2, the HA cluster system 10 includes an operation management server 100, an active business server 200, and a standby business server 300. The active business server 200 and the standby business server 300 are servers for providing a predetermined business service, and the operation management server 100 is a server that controls, among other things, failover between the active business server 200 and the standby business server 300.

The operation management server 100, the active business server 200, and the standby business server 300 are connected to a network 400. The operation management server 100, the active business server 200, and the standby business server 300 are also connected to one another by a network 401.

The operation management server 100 corresponds to the management server 1 of FIG. 1, and the active business server 200 and the standby business server 300 correspond to the service providing server described above. In the HA cluster system 10, the active business server 200 and the standby business server 300 are the cluster servers, and failover switches the active system and the standby system. That is, as a result of failover, the active business server 200 switches to the standby system and the standby business server 300 switches to the active system.
In the following description, the active business server 200 and the standby business server 300 may be referred to as managed servers.
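The role switch performed by failover can be illustrated with a small sketch. The names below are hypothetical, and the assumption that the control script run on the failed active server stops the service while the one run on the standby server starts it is an illustration based on the start and stop control mentioned in the outline, not a statement of what the actual scripts contain.

```python
from dataclasses import dataclass


@dataclass
class ManagedServer:
    name: str
    role: str              # "active" or "standby"
    service_running: bool


def fail_over(active: ManagedServer, standby: ManagedServer):
    """Swap the roles of the two managed servers.

    Returns the pair (new_active, new_standby).
    """
    # Assumed effect of the control script requested on the failed active server.
    active.service_running = False
    # Assumed effect of the control script requested on the standby server.
    standby.service_running = True
    # After failover the former standby is the active system and vice versa.
    active.role, standby.role = "standby", "active"
    return standby, active
```

For example, calling `fail_over` with a server in the `"active"` role and one in the `"standby"` role returns the former standby as the new active server, with its service marked as running.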

The network 400 is a public LAN (Local Area Network) to which client devices (not shown) can connect. The active business server 200 provides the business service to the client devices via the network 400. The network 400 is also used by the operation management server 100 to monitor the state in which the active business server 200 provides the business service.
The network 401 is an interconnect LAN that connects the operation management server 100, the active business server 200, and the standby business server 300 to one another. The network 401 is used for internal communication between the operation management server 100 and the active business server 200 or the standby business server 300, that is, communication used for server monitoring, business service control, cluster control, and the like.

As shown in FIG. 2, in the HA cluster system 10, clusterware 110 is provided on the operation management server 100. The clusterware 110 includes a cluster control unit 111, a business service control unit 112, a server monitoring unit 113, an internal monitoring unit 114, a script execution unit 115, a server communication unit 116, and a script storage unit 117.

The cluster control unit 111 performs various kinds of processing for controlling the HA cluster system 10. For example, the cluster control unit 111 controls the execution of failover.
The business service control unit 112 controls the starting and stopping of the business service providing unit 201 of the active business server 200 and the business service providing unit 301 of the standby business server 300.

The server monitoring unit 113 monitors the operating states of the active business server 200 and the standby business server 300. For example, the server monitoring unit 113 monitors whether the business service is being provided normally on the active business server 200, and whether any hardware or software abnormality has occurred on the active business server 200 or the standby business server 300.

The internal monitoring unit 114 (also referred to as an internal monitoring processing unit) monitors the operating states of the cluster control unit 111, the business service control unit 112, and the server monitoring unit 113. The internal monitoring unit 114 preferably monitors all of the cluster control unit 111, the business service control unit 112, and the server monitoring unit 113, but it may monitor the operating states of only some of them, and it may also monitor the operating states of other units.
When the internal monitoring unit 114 detects that any of these operating states is abnormal, it may notify another device, an operation administrator, or the like that the abnormality has occurred. This makes it possible to execute predetermined processing, or for the operation administrator to take predetermined action, when an abnormality occurs in the HA cluster system 10.
Specifically, the internal monitoring unit 114 performs alive monitoring of the processes of the cluster control unit 111, the business service control unit 112, and the server monitoring unit 113.
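The alive monitoring performed by the internal monitoring unit 114 can be pictured as a periodic check over the monitored units followed by a notification for each unit found dead. The sketch below assumes, purely for illustration, that each unit exposes a zero-argument liveness check and that notification is a callback; the function name `check_units` and the message format are hypothetical.

```python
from typing import Callable, Dict, List


def check_units(
    units: Dict[str, Callable[[], bool]],
    notify: Callable[[str], None],
) -> List[str]:
    """Run one alive-monitoring pass.

    `units` maps a unit name to a callable that returns True while the
    unit's process is alive. Every unit that is not alive is reported
    through `notify`, and the list of dead unit names is returned.
    """
    dead = [name for name, is_alive in units.items() if not is_alive()]
    for name in dead:
        notify(f"unit '{name}' is not running")
    return dead
```

If the units run as `multiprocessing.Process` or `threading.Thread` objects, their `is_alive` methods could be passed directly as the liveness checks.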

The script execution unit 115 executes the scripts stored in the script storage unit 117. For example, at the request of the server monitoring unit 113, the script execution unit 115 executes a script for monitoring the active business server 200 and the standby business server 300. Also, for example, at the request of the business service control unit 112, the script execution unit 115 executes a script for starting or stopping the business service providing units 201 and 301 on the active business server 200 or the standby business server 300. The script execution unit 115 also performs processing such as generating a script from a setting file and encrypting the generated script with a predetermined key.
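Generating a script from a setting file can be sketched as simple template substitution. Everything concrete here is an assumption for illustration: the JSON setting format, the `service` key, and the shell template that checks a service with `systemctl is-active` are not taken from the specification. The encryption step is omitted because this section does not state the cipher or how the key is managed.

```python
import json
from string import Template

# Hypothetical template for a monitoring script. $service is filled in
# from the setting file; the check itself is only an example.
MONITOR_TEMPLATE = Template(
    "#!/bin/sh\n"
    "# generated monitoring script (illustrative)\n"
    "systemctl is-active --quiet $service || exit 1\n"
    "exit 0\n"
)


def generate_script(settings_json: str) -> str:
    """Build a monitoring script from the contents of a setting file."""
    settings = json.loads(settings_json)
    # substitute() raises KeyError if a placeholder is missing, so an
    # incomplete setting file fails loudly instead of producing a bad script.
    return MONITOR_TEMPLATE.substitute(service=settings["service"])
```

A call such as `generate_script('{"service": "httpd"}')` returns script text that could then be stored in the script storage unit and transmitted to a managed server.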

The server communication unit 116 performs processing such as establishing and disconnecting communication with other servers.
The script storage unit 117 is a storage area that stores scripts as well as the setting files and the like from which scripts are generated.

The active business server 200 has a business service providing unit 201, a script execution unit 202, and a script storage unit 203. Similarly, the standby business server 300 has a business service providing unit 301, a script execution unit 302, and a script storage unit 303.

The business service providing units 201 and 301 provide a predetermined business service, via the network 400, to client devices (not shown) connected to the network 400. That is, the business service providing units 201 and 301 are application programs that provide the predetermined business service.

The script execution unit 202 executes the scripts stored in the script storage unit 203 in response to requests from the operation management server 100. Similarly, the script execution unit 302 executes the scripts stored in the script storage unit 303 in response to requests from the operation management server 100. For example, the script execution unit 202 executes a script that checks the operating state of the active business server 200 and a script that starts or stops the business service providing unit 201. Similarly, the script execution unit 302 executes a script that checks the operating state of the standby business server 300 and a script that starts or stops the business service providing unit 301. The script execution units 202 and 302 are not resident processes; they are created temporarily in response to a request from the operation management server 100 and execute the scripts in the script storage units 203 and 303.
The script storage units 203 and 303 are storage areas that store the scripts received from the operation management server 100.
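The non-resident behavior of the script execution units 202 and 302 can be sketched with a short-lived child process that exists only for the duration of one request. In this illustration the script storage unit is a directory, the stored scripts are Python files run with the current interpreter so that the example is portable, and the function names are hypothetical; none of these choices comes from the specification.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def store_script(storage_dir: Path, name: str, body: str) -> Path:
    """Save a script received from the management server (storage unit role)."""
    path = storage_dir / name
    path.write_text(body)
    return path


def run_on_request(storage_dir: Path, name: str, timeout: float = 10.0):
    """Run a stored script in a temporary process and return its result.

    Nothing stays resident on the managed server: the child process is
    created for this request and ends when the script finishes.
    Returns (exit code, captured standard output).
    """
    completed = subprocess.run(
        [sys.executable, str(storage_dir / name)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        storage = Path(d)
        store_script(storage, "check.py", "print('OK')")
        print(run_on_request(storage, "check.py"))
```

The exit code and captured output together play the part of the execution result that is returned to the operation management server.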

As shown in FIG. 3, the operation management server 100 includes, for example, a network interface 150, a memory 151, and a processor 152. The operation management server 100 may additionally include a storage device such as a hard disk drive.

The network interface 150 is used for communication via the networks 400 and 401. The network interface 150 may include, for example, a network interface card (NIC).

The memory 151 is composed of a combination of volatile memory and non-volatile memory. The memory 151 may include storage located away from the processor 152. In this case, the processor 152 may access the memory 151 via an input/output interface (not shown).

The memory 151 is used, for example, to store software (a computer program) that includes one or more instructions to be executed by the processor 152.

This program can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), Compact Disc Read Only Memory (CD-ROM), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, Programmable ROM (PROM), Erasable PROM (EPROM), flash ROM, and Random Access Memory (RAM)). The program may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

プロセッサ１５２は、例えば、マイクロプロセッサ、ＭＰＵ(Micro Processor Unit)、又はＣＰＵ(Central Processing Unit)であってもよい。プロセッサ１５２は、複数のプロセッサを含んでもよい。プロセッサ１５２は、メモリ１５１からコンピュータプログラムを読み出して実行することで、クラスタ制御部１１１、業務サービス制御部１１２、サーバ監視部１１３、内部監視部１１４、スクリプト実行部１１５、及びサーバ通信部１１６の処理を行う。また、スクリプト記憶部１１７は、例えばメモリ１５１により実現される。 The processor 152 may be, for example, a microprocessor, an MPU (Micro Processor Unit), or a CPU (Central Processing Unit). The processor 152 may include a plurality of processors. The processor 152 reads a computer program from the memory 151 and executes it, thereby performing the processing of the cluster control unit 111, the business service control unit 112, the server monitoring unit 113, the internal monitoring unit 114, the script execution unit 115, and the server communication unit 116. The script storage unit 117 is realized by, for example, the memory 151.

なお、現用系業務サーバ２００及び待機系業務サーバ３００も、図３に示した構成と同様のハードウェア構成となっている。このため、現用系業務サーバ２００のプロセッサは、メモリからコンピュータプログラムを読み出して実行することで、業務サービス提供部２０１及びスクリプト実行部２０２の処理を行う。また、スクリプト記憶部２０３は、例えば現用系業務サーバ２００のメモリにより実現される。同様に、待機系業務サーバ３００のプロセッサは、メモリからコンピュータプログラムを読み出して実行することで、業務サービス提供部３０１及びスクリプト実行部３０２の処理を行う。また、スクリプト記憶部３０３は、例えば待機系業務サーバ３００のメモリにより実現される。 The active business server 200 and the standby business server 300 also have the same hardware configuration as that shown in FIG. 3. Therefore, the processor of the active business server 200 reads a computer program from its memory and executes it, thereby performing the processing of the business service providing unit 201 and the script execution unit 202. The script storage unit 203 is realized by, for example, the memory of the active business server 200. Similarly, the processor of the standby business server 300 reads a computer program from its memory and executes it, thereby performing the processing of the business service providing unit 301 and the script execution unit 302. The script storage unit 303 is realized by, for example, the memory of the standby business server 300.

以下、図４から図８フローチャートを参照しつつ、ＨＡクラスタシステム１０の各構成要素の動作について説明する。 Hereinafter, the operation of each component of the HA cluster system 10 will be described with reference to the flowcharts of FIGS. 4 to 8.

図４は、ＨＡクラスタシステム１０におけるクラスタの構築処理の動作の流れを示すフローチャートである。この構築処理により、各サーバに、スクリプトが配置される。以下、図４を参照しつつ動作の流れについて説明する。 FIG. 4 is a flowchart showing the flow of the cluster construction process in the HA cluster system 10. Through this construction process, scripts are placed on each server. The flow of the operation is described below with reference to FIG. 4.

ステップ１００（Ｓ１００）において、運用管理者は、運用管理サーバ１００のＧＵＩ（Graphical User Interface）を介して、クラスタ構築に必要とされる監視に関する設定情報などを入力する。具体的には、運用管理者は、監視対象（サーバ名、業務サービス、ディスク、プロセッサなど）、監視設定（インタバール、タイムアウト、リトライ回数など）、障害発生時の対処(プロセス再起動、フェールオーバー、サーバ再起動など)などを、指定する。 In step 100 (S100), the operation administrator inputs, via the GUI (Graphical User Interface) of the operation management server 100, the monitoring-related setting information required for cluster construction. Specifically, the operation administrator specifies the monitoring targets (server name, business service, disk, processor, etc.), the monitoring settings (interval, timeout, number of retries, etc.), and the actions to take when a failure occurs (process restart, failover, server restart, etc.).

次に、ステップ１０１（Ｓ１０１）において、クラスタ制御部１１１は、ＧＵＩからの入力を受け付けると、入力内容に従った設定ファイルの作成をスクリプト実行部１１５に要求する。そして、ステップ１０２（Ｓ１０２）において、スクリプト実行部１１５は、設定ファイル生成用の所定のスクリプトを実行することにより、入力内容を基づく設定ファイルを作成し、スクリプト記憶部１１７に格納する。 Next, in step 101 (S101), when the cluster control unit 111 receives the input from the GUI, the cluster control unit 111 requests the script execution unit 115 to create a setting file according to the input contents. Then, in step 102 (S102), the script execution unit 115 creates a setting file based on the input contents by executing a predetermined script for generating the setting file, and stores the setting file in the script storage unit 117.

次に、ステップ１０３（Ｓ１０３）において、クラスタ制御部１１１は、スクリプト実行部１１５に、設定ファイルに従った処理を実行するためのスクリプトの生成を要求する。そして、スクリプト実行部１１５は、スクリプト生成用の所定のスクリプトを実行する。これにより、ステップ１０４（Ｓ１０４）において、スクリプト実行部１１５は、スクリプト記憶部１１７から設定ファイルを取得し、ステップ１０５（Ｓ１０５）において、設定ファイルに従った処理を実行可能なスクリプトを生成する。スクリプト実行部１１５は、生成したスクリプトをスクリプト記憶部１１７に記憶する。 Next, in step 103 (S103), the cluster control unit 111 requests the script execution unit 115 to generate a script for executing the process according to the setting file. Then, the script execution unit 115 executes a predetermined script for script generation. As a result, in step 104 (S104), the script execution unit 115 acquires the setting file from the script storage unit 117, and in step 105 (S105), generates a script capable of executing the process according to the setting file. The script execution unit 115 stores the generated script in the script storage unit 117.
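The flow from setting file to generated script (steps S102 to S105) can be pictured with the following minimal Python sketch. It is only an illustration under stated assumptions: the JSON setting-file layout, the field names, the function names, and the use of `pgrep` in the generated shell script are all hypothetical and are not specified in this description.

```python
import json
import shlex


def create_setting_file(server, service, interval_s, timeout_s, retries, on_failure):
    """Serialize the administrator's input (S100) into setting-file text (S102).

    The JSON layout and field names are assumptions made for this sketch.
    """
    return json.dumps({
        "server": server,
        "service": service,
        "interval_s": interval_s,
        "timeout_s": timeout_s,
        "retries": retries,
        "on_failure": on_failure,
    })


def generate_monitor_script(setting_text):
    """Read the setting file (S104) and render a monitoring script (S105).

    The generated shell script prints 'alive' or 'dead' for the configured
    service process. shlex.quote keeps the service name safe for the shell.
    """
    settings = json.loads(setting_text)
    service = shlex.quote(settings["service"])
    lines = [
        "#!/bin/sh",
        "# Generated from the setting file; prints alive or dead.",
        f"if pgrep -x {service} >/dev/null; then echo alive; else echo dead; fi",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = create_setting_file("server-a", "httpd", 10, 5, 3, "failover")
    print(generate_monitor_script(text))
```

In this sketch the interval, timeout, and retry count stay in the setting file for use by the management side; only the monitoring target is baked into the generated script.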

次に、ステップ１０６（Ｓ１０６）において、クラスタ制御部１１１は、ステップ１０５で生成された管理対象サーバ向けのスクリプトの転送をスクリプト実行部１１５に要求する。スクリプト実行部１１５は、転送用の所定のスクリプトを実行することにより、以下のようにスクリプトを管理対象サーバ（現用系業務サーバ２００及び待機系業務サーバ３００）に転送する。
スクリプト実行部１１５は、ステップ１０７（Ｓ１０７）において、スクリプト記憶部１１７から管理対象サーバ向けのスクリプトを取得し、ステップ１０８（Ｓ１０８）において、所定の秘密鍵を使ってスクリプトを暗号化する。そして、スクリプト実行部１１５は、ステップ１０９（Ｓ１０９）において、暗号化したスクリプトを管理対象サーバに転送する。Next, in step 106 (S106), the cluster control unit 111 requests the script execution unit 115 to transfer the script for the managed server generated in step 105. The script execution unit 115 transfers the script to the management target server (active business server 200 and standby business server 300) as follows by executing a predetermined script for transfer.
In step 107 (S107), the script execution unit 115 acquires the script for the managed server from the script storage unit 117, and in step 108 (S108), encrypts the script using a predetermined private key. Then, in step 109 (S109), the script execution unit 115 transfers the encrypted script to the managed server.

現用系業務サーバ２００は、受信したスクリプトをスクリプト記憶部２０３に記憶し、待機系業務サーバ３００は、受信したスクリプトをスクリプト記憶部３０３に記憶する。したがって、本実施の形態では、管理対象サーバにおけるスクリプトの実行のたびに、スクリプトの転送を行う必要がなく、スクリプトの実行までに要する時間を抑制することができる。また、管理対象サーバでは暗号化されたスクリプトが記憶されるため、セキュリティを担保できる。 The active business server 200 stores the received script in the script storage unit 203, and the standby business server 300 stores the received script in the script storage unit 303. Therefore, in the present embodiment, the script does not need to be transferred each time it is executed on a managed server, which reduces the time required before the script can be executed. In addition, since the script is stored on the managed server in encrypted form, security can be ensured.
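The idea of keeping only encrypted scripts on the managed server, and decrypting them with a key supplied at execution time, can be sketched as follows. This is a hedged illustration only: the description does not name a cipher, so the sketch uses a toy XOR keystream built from SHA-256 purely to stay within the Python standard library. It is not secure and stands in for whatever vetted cipher a real system would use. The class and function names are hypothetical.

```python
import hashlib


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (symmetric: same call decrypts).

    Illustration only. This is NOT a secure cipher; it is a stand-in for the
    unspecified encryption of step S108.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ScriptStore:
    """Hypothetical stand-in for script storage unit 203 or 303."""

    def __init__(self):
        self._blobs = {}

    def put(self, name: str, blob: bytes) -> None:
        self._blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self._blobs[name]


def deploy(store: ScriptStore, name: str, script_text: str, key: bytes) -> None:
    """Management side: encrypt (S108) and transfer (S109); only ciphertext is stored."""
    store.put(name, toy_cipher(script_text.encode("utf-8"), key))


def load_for_execution(store: ScriptStore, name: str, key: bytes) -> str:
    """Managed side: decrypt the stored script with the key received at run time."""
    return toy_cipher(store.get(name), key).decode("utf-8")
```

Because the plaintext never rests in the store, a script copied off the managed server is unreadable without the key that the management server sends with each execution request.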

このように、クラスタ制御部１１１及びスクリプト実行部１１５は、図１の送信処理部２に対応している。すなわち、クラスタ制御部１１１及びスクリプト実行部１１５は、管理対象サーバの動作（管理対象サーバの状態）を監視するための監視スクリプトを、管理対象サーバに送信する処理を行う。また、クラスタ制御部１１１及びスクリプト実行部１１５は、フェールオーバーについての管理対象サーバの動作（例えば、業務サービス提供部２０１、３０１の起動、停止など）を制御するための制御スクリプトを、管理対象サーバに送信する処理を行う。 As described above, the cluster control unit 111 and the script execution unit 115 correspond to the transmission processing unit 2 of FIG. 1. That is, the cluster control unit 111 and the script execution unit 115 perform a process of transmitting, to the managed server, a monitoring script for monitoring the operation of the managed server (the state of the managed server). The cluster control unit 111 and the script execution unit 115 also perform a process of transmitting, to the managed server, a control script for controlling the operation of the managed server with respect to failover (for example, starting and stopping the business service providing units 201 and 301).

なお、クラスタの設定を変更する場合、クラスタ制御部１１１は、変更内容を基づいて設定ファイルを更新するようスクリプト実行部１１５に要求する。これに対し、スクリプト実行部１１５は、上記と同様の処理を行い、再度、暗号化されたスクリプトを管理対象サーバに転送する。 When changing the cluster settings, the cluster control unit 111 requests the script execution unit 115 to update the setting file based on the changed contents. On the other hand, the script execution unit 115 performs the same processing as described above, and transfers the encrypted script to the managed server again.

図５は、ＨＡクラスタシステム１０におけるクラスタの起動の動作の流れを示すフローチャートである。以下、図５を参照しつつ動作の流れについて説明する。 FIG. 5 is a flowchart showing the flow of the cluster startup operation in the HA cluster system 10. The flow of the operation is described below with reference to FIG. 5.

ステップ２００（Ｓ２００）において、内部監視部１１４は、クラスタ制御部１１１の動作状態の監視を開始する。すなわち、内部監視部１１４は、クラスタ制御部１１１のプロセスの死活監視を開始する。 In step 200 (S200), the internal monitoring unit 114 starts monitoring the operating state of the cluster control unit 111. That is, the internal monitoring unit 114 starts alive monitoring of the process of the cluster control unit 111.

次に、ステップ２０１（Ｓ２０１）において、クラスタ制御部１１１は、業務サービス制御部１１２を起動する。すなわち、クラスタ制御部１１１は、業務サービス制御部１１２のプロセスを開始させる。
業務サービス制御部１１２が起動すると、ステップ２０２（Ｓ２０２）において、内部監視部１１４は、業務サービス制御部１１２の動作状態の監視を開始する。すなわち、内部監視部１１４は、業務サービス制御部１１２のプロセスの死活監視を開始する。Next, in step 201 (S201), the cluster control unit 111 activates the business service control unit 112. That is, the cluster control unit 111 starts the process of the business service control unit 112.
When the business service control unit 112 is activated, in step 202 (S202), the internal monitoring unit 114 starts monitoring the operating state of the business service control unit 112. That is, the internal monitoring unit 114 starts alive monitoring of the process of the business service control unit 112.

次に、ステップ２０３（Ｓ２０３）において、クラスタ制御部１１１は、業務サービス制御部１１２に対し、現用系業務サーバ２００における業務サービス提供部２０１の起動を要求する。
ステップ２０４（Ｓ２０４）において、業務サービス制御部１１２は、スクリプト実行部１１５に対し、現用系業務サーバ２００における業務サービス提供部２０１の起動を要求する。
ステップ２０５（Ｓ２０５）において、スクリプト実行部１１５は、業務サービス提供部２０１の起動を現用系業務サーバ２００に要求するスクリプトを実行することにより、現用系業務サーバ２００にこれを要求する。Next, in step 203 (S203), the cluster control unit 111 requests the business service control unit 112 to start the business service providing unit 201 in the active business server 200.
In step 204 (S204), the business service control unit 112 requests the script execution unit 115 to start the business service providing unit 201 in the active business server 200.
In step 205 (S205), the script execution unit 115 executes a script that requests the active business server 200 to start the business service providing unit 201, thereby issuing this request to the active business server 200.

ステップ２０６（Ｓ２０６）において、運用管理サーバ１００から要求を受けた現用系業務サーバ２００のスクリプト実行部２０２は、業務サービス提供部２０１の起動用のスクリプトをスクリプト記憶部２０３から取得する。なお、業務サービス提供部２０１の起動用のスクリプトは、図４に示した処理により予めスクリプト記憶部２０３に記憶されている。
ステップ２０７（Ｓ２０７）において、現用系業務サーバ２００のスクリプト実行部２０２は、取得したスクリプトを実行することにより、業務サービス提供部２０１を起動する。これにより、現用系業務サーバ２００からクライアント装置（ユーザ）へのネットワーク４００を介した業務サービスの提供が開始される（ステップ２０８（Ｓ２０８））。In step 206 (S206), the script execution unit 202 of the active business server 200, having received the request from the operation management server 100, acquires the script for starting the business service providing unit 201 from the script storage unit 203. The script for starting the business service providing unit 201 has been stored in the script storage unit 203 in advance by the process shown in FIG. 4.
In step 207 (S207), the script execution unit 202 of the active business server 200 starts the business service providing unit 201 by executing the acquired script. As a result, the provision of the business service from the active business server 200 to the client device (user) via the network 400 is started (step 208 (S208)).

業務サービス提供部２０１の起動を要求した運用管理サーバ１００のクラスタ制御部１１１は、ステップ２０９（Ｓ２０９）において、サーバ監視部１１３を起動する。すなわち、クラスタ制御部１１１は、サーバ監視部１１３のプロセスを開始させる。
サーバ監視部１１３が起動すると、ステップ２１０（Ｓ２１０）において、内部監視部１１４は、サーバ監視部１１３の動作状態の監視を開始する。すなわち、内部監視部１１４は、サーバ監視部１１３のプロセスの死活監視を開始する。In step 209 (S209), the cluster control unit 111 of the operation management server 100 that has requested the start of the business service providing unit 201 starts the server monitoring unit 113. That is, the cluster control unit 111 starts the process of the server monitoring unit 113.
When the server monitoring unit 113 is activated, in step 210 (S210), the internal monitoring unit 114 starts monitoring the operating state of the server monitoring unit 113. That is, the internal monitoring unit 114 starts alive monitoring of the process of the server monitoring unit 113.
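The alive monitoring that the internal monitoring unit 114 starts for each component in steps S200, S202, and S210 can be sketched as below. This is a minimal illustration, assuming each component runs as a child process; the class name, the component names, and the use of Python's `subprocess.Popen.poll()` for the liveness check are assumptions of this sketch, not details given in the description.

```python
import subprocess
import sys


class InternalMonitor:
    """Hypothetical stand-in for internal monitoring unit 114."""

    def __init__(self):
        self._watched = {}

    def watch(self, name, process):
        """Begin alive monitoring of a started component process."""
        self._watched[name] = process

    def dead_components(self):
        """Return the names of watched processes that have exited.

        Popen.poll() returns None while the process is still running.
        """
        return sorted(
            name for name, proc in self._watched.items() if proc.poll() is not None
        )


def start_component(lifetime_s):
    """Start a placeholder child process that simply sleeps (assumption)."""
    code = f"import time; time.sleep({lifetime_s})"
    return subprocess.Popen([sys.executable, "-c", code])


if __name__ == "__main__":
    monitor = InternalMonitor()
    proc = start_component(30)
    monitor.watch("business_service_control", proc)
    print(monitor.dead_components())  # nothing has exited yet
    proc.terminate()
    proc.wait()
    print(monitor.dead_components())  # the terminated component is reported
```

Registering each component only after it has been started mirrors the order in FIG. 5, where monitoring of a unit begins once that unit is running.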

図６は、ＨＡクラスタシステム１０における監視動作の流れを示すフローチャートである。以下、図６を参照しつつ動作の流れについて説明する。なお、継続的に監視を行うため、図６に示される動作は繰り返し実行される。 FIG. 6 is a flowchart showing the flow of the monitoring operation in the HA cluster system 10. The flow of the operation is described below with reference to FIG. 6. Because monitoring is performed continuously, the operation shown in FIG. 6 is executed repeatedly.

ステップ３００（Ｓ３００）において、サーバ監視部１１３は、現用系業務サーバ２００による業務サービスの提供状態を確認するための処理をスクリプト実行部１１５に要求する。具体的には、現用系業務サーバ２００の業務サービス提供部２０１によって提供される業務サービスにネットワーク４００を介して運用管理サーバ１００からアクセスできるか否かを確認する処理をスクリプト実行部１１５に要求する。
ステップ３０１（Ｓ３０１）において、スクリプト実行部１１５は、業務サービスの提供状態の確認用の所定のスクリプトを実行することにより、業務サービスへのアクセスパスを確認する。これにより、ネットワーク４００経由で正常に現用系業務サーバ２００の業務サービスにアクセスできるか否かが確認される。In step 300 (S300), the server monitoring unit 113 requests the script execution unit 115 to perform a process for confirming the provision status of the business service by the active business server 200. Specifically, it requests the script execution unit 115 to perform a process of confirming whether the business service provided by the business service providing unit 201 of the active business server 200 can be accessed from the operation management server 100 via the network 400.
In step 301 (S301), the script execution unit 115 confirms the access path to the business service by executing a predetermined script for confirming the provision status of the business service. This confirms whether the business service of the active business server 200 can be accessed normally via the network 400.

また、ＨＡクラスタシステム１０においては、運用管理サーバ１００からのネットワーク４０１経由の要求に応じて、現用系業務サーバ２００及び待機系業務サーバ３００における監視用のスクリプトの実行が行われる。
具体的には、まず、ステップ３０２（Ｓ３０２）において、サーバ監視部１１３は、管理対象サーバ（現用系業務サーバ２００及び待機系業務サーバ３００）の動作状態の確認をスクリプト実行部１１５に要求する。
ステップ３０３（Ｓ３０３）において、スクリプト実行部１１５は、動作確認を要求するための所定のスクリプトを実行することにより、管理対象サーバ（現用系業務サーバ２００及び待機系業務サーバ３００）に対し、動作状態の確認を要求する。すなわち、スクリプト実行部１１５は、現用系業務サーバ２００に対し、現用系業務サーバ２００の監視用のスクリプトの実行を要求し、待機系業務サーバ３００に対し、待機系業務サーバ３００の監視用のスクリプトの実行を要求する。Further, in the HA cluster system 10, the monitoring script is executed in the active business server 200 and the standby business server 300 in response to the request from the operation management server 100 via the network 401.
Specifically, first, in step 302 (S302), the server monitoring unit 113 requests the script execution unit 115 to confirm the operating status of the managed server (active business server 200 and standby business server 300).
In step 303 (S303), the script execution unit 115 executes a predetermined script for requesting an operation check, thereby requesting the managed servers (the active business server 200 and the standby business server 300) to confirm their operating states. That is, the script execution unit 115 requests the active business server 200 to execute the script for monitoring the active business server 200, and requests the standby business server 300 to execute the script for monitoring the standby business server 300.

ステップ３０４（Ｓ３０４）において、管理対象サーバ（現用系業務サーバ２００及び待機系業務サーバ３００）は、監視用のスクリプトを実行することにより、自装置の状態を確認する。すなわち、運用管理サーバ１００から要求を受けた現用系業務サーバ２００のスクリプト実行部２０２は、監視用のスクリプトをスクリプト記憶部２０３から取得する。そして、現用系業務サーバ２００のスクリプト実行部２０２は、取得したスクリプトを実行することにより、現用系業務サーバ２００の動作状態を確認する。同様に、運用管理サーバ１００から要求を受けた待機系業務サーバ３００のスクリプト実行部３０２は、監視用のスクリプトをスクリプト記憶部３０３から取得する。そして、待機系業務サーバ３００のスクリプト実行部３０２は、取得したスクリプトを実行することにより、待機系業務サーバ３００の動作状態を確認する。なお、監視用のスクリプトは、図４に示した処理により予めスクリプト記憶部２０３、３０３に記憶されている。
監視用のスクリプトの実行により、管理対象サーバのハードウェア及びソフトウェアなどの状態情報が取得される。具体的には、例えば、アプリケーションのプロセスの死活状態、ディスクへのアクセスパスの動作状態、ＬＡＮケーブルのリンク状態、ＮＩＣの状態などの状態情報が取得される。スクリプト実行部２０２、３０２は、監視用のスクリプトの実行結果、すなわち管理対象サーバの状態情報を運用管理サーバ１００に送信し、運用管理サーバ１００のスクリプト実行部１１５によってこれが取得される（ステップ３０５（Ｓ３０５））。In step 304 (S304), each managed server (the active business server 200 and the standby business server 300) confirms its own state by executing a monitoring script. That is, the script execution unit 202 of the active business server 200, having received the request from the operation management server 100, acquires the monitoring script from the script storage unit 203. The script execution unit 202 of the active business server 200 then confirms the operating state of the active business server 200 by executing the acquired script. Similarly, the script execution unit 302 of the standby business server 300, having received the request from the operation management server 100, acquires the monitoring script from the script storage unit 303. The script execution unit 302 of the standby business server 300 then confirms the operating state of the standby business server 300 by executing the acquired script. The monitoring scripts have been stored in the script storage units 203 and 303 in advance by the process shown in FIG. 4.
Executing the monitoring script acquires state information on the hardware, software, and other aspects of the managed server. Specifically, state information such as whether the application process is alive, the operating state of the access path to the disk, the link state of the LAN cable, and the state of the NIC is acquired. The script execution units 202 and 302 transmit the execution result of the monitoring script, that is, the state information of the managed server, to the operation management server 100, where it is acquired by the script execution unit 115 (step 305 (S305)).

ステップ３０６（Ｓ３０６）において、スクリプト実行部１１５は、ステップ３０１において得られた結果及びステップ３０５において得られた結果を、監視情報として、サーバ監視部１１３に出力する。
次に、ステップ３０７（Ｓ３０７）においては、サーバ監視部１１３は、ステップ３０６で取得した監視情報に基づいて、管理対象サーバが正常であるか否か、すなわち障害が発生していないか否かを判定する。なお、ステップ３０７における判定は、監視情報の分析（例えば、管理対象サーバが正常である場合に期待される監視結果と、実際の監視結果との比較）によって行われてもよい。In step 306 (S306), the script execution unit 115 outputs the result obtained in step 301 and the result obtained in step 305 to the server monitoring unit 113 as monitoring information.
Next, in step 307 (S307), the server monitoring unit 113 determines, based on the monitoring information acquired in step 306, whether the managed server is normal, that is, whether a failure has occurred. The determination in step 307 may be made by analyzing the monitoring information (for example, by comparing the monitoring result expected when the managed server is normal with the actual monitoring result).
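The comparison mentioned for step S307, in which the monitoring result expected for a normal server is compared with the actual result, can be sketched in Python as follows. The item names in the expected result and the rule that a missing item counts as a fault are assumptions made for this illustration; the description does not fix a data format.

```python
# Expected monitoring result for a normal managed server.
# The keys are hypothetical names for the items listed in the description.
EXPECTED_WHEN_NORMAL = {
    "service_reachable": True,   # result of the access check in S301
    "process_alive": True,       # application process alive state
    "disk_path_ok": True,        # access path to the disk
    "nic_link_up": True,         # LAN cable link / NIC state
}


def find_faults(monitoring_info, expected=EXPECTED_WHEN_NORMAL):
    """Return the sorted names of items that differ from the expected result.

    An item missing from the monitoring information is treated as a fault
    (an assumption of this sketch).
    """
    return sorted(
        item for item, value in expected.items()
        if monitoring_info.get(item) != value
    )


def is_normal(monitoring_info):
    """A managed server is judged normal when no item deviates (S307)."""
    return not find_faults(monitoring_info)


if __name__ == "__main__":
    observed = dict(EXPECTED_WHEN_NORMAL, nic_link_up=False)
    print(find_faults(observed))
    print(is_normal(observed))
```

Returning the list of deviating items, rather than a bare boolean, lets the caller report which check failed when it notifies the cluster control side of an abnormality.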

このように、サーバ監視部１１３及びスクリプト実行部１１５は、図１のサーバ監視処理部３に相当する。すなわち、サーバ監視部１１３及びスクリプト実行部１１５は、管理対象サーバに対し、監視用のスクリプトの実行及び実行結果の返信を要求し、この実行結果に基づいて管理対象サーバの動作状態を監視する。 As described above, the server monitoring unit 113 and the script execution unit 115 correspond to the server monitoring processing unit 3 of FIG. 1. That is, the server monitoring unit 113 and the script execution unit 115 request the managed server to execute the monitoring script and return the execution result, and monitor the operating state of the managed server based on that result.

図７は、ＨＡクラスタシステム１０におけるフェールオーバー動作を示すフローチャートである。ＨＡクラスタシステム１０は、現用系業務サーバ２００に障害が発生した場合、以下のような処理を行う。すなわち、図６のステップ３０７において、現用系業務サーバ２００の障害が検出された場合、ＨＡクラスタシステム１０は以下のような処理を行う。以下、図７を参照しつつ動作の流れについて説明する。 FIG. 7 is a flowchart showing a failover operation in the HA cluster system 10. When a failure occurs in the active business server 200, the HA cluster system 10 performs the following processing. That is, when a failure of the active business server 200 is detected in step 307 of FIG. 6, the HA cluster system 10 performs the following processing. Hereinafter, the operation flow will be described with reference to FIG. 7.

ステップ４００（Ｓ４００）において、現用系業務サーバ２００に障害が発生したことを検知したサーバ監視部１１３は、クラスタ制御部１１１に異常の発生を通知する。
次にステップ４０１（Ｓ４０１）において、異常の発生の通知を受けたクラスタ制御部１１１は、待機系業務サーバ３００が正常に業務サービスを引き継げるかどうかの確認をサーバ監視部１１３に要求する。サーバ監視部１１３は、図６のステップ３０７で得られた待機系業務サーバ３００の状態に基づいて、待機系業務サーバ３００が正常に業務サービスを引き継げるかどうかを判定する。サーバ監視部１１３は、待機系業務サーバ３００が正常である場合、待機系業務サーバ３００が正常に業務サービスを引き継げると判定する。In step 400 (S400), the server monitoring unit 113 that detects that a failure has occurred in the active business server 200 notifies the cluster control unit 111 of the occurrence of an abnormality.
Next, in step 401 (S401), the cluster control unit 111, having been notified of the abnormality, requests the server monitoring unit 113 to confirm whether the standby business server 300 can take over the business service normally. The server monitoring unit 113 determines whether the standby business server 300 can take over the business service normally based on the state of the standby business server 300 obtained in step 307 of FIG. 6. When the standby business server 300 is normal, the server monitoring unit 113 determines that the standby business server 300 can take over the business service normally.

待機系業務サーバ３００が正常に業務サービスを引き継げる場合、クラスタ制御部１１１は、業務サービス制御部１１２にフェールオーバーを要求する。具体的には、以下のような処理が行われ、現用系業務サーバ２００による業務サービスの提供から、待機系業務サーバ３００による業務サービスの提供へとシステムの状態が切替えられる。 When the standby business server 300 can take over the business service normally, the cluster control unit 111 requests the business service control unit 112 to fail over. Specifically, the following processing is performed, and the state of the system is switched from the provision of the business service by the active business server 200 to the provision of the business service by the standby business server 300.

ステップ４０２（Ｓ４０２）において、クラスタ制御部１１１は、業務サービス制御部１１２に、現用系業務サーバ２００による業務サービスの提供を停止する制御を要求する。これに対し、業務サービス制御部１１２は、ステップ４０３（Ｓ４０３）において、スクリプト実行部１１５に対し、現用系業務サーバ２００における業務サービス提供部２０１の停止を要求する。
ステップ４０４（Ｓ４０４）において、スクリプト実行部１１５は、業務サービス提供部２０１の停止を現用系業務サーバ２００に要求するスクリプトを実行することにより、現用系業務サーバ２００にこれを要求する。すなわち、スクリプト実行部１１５は、現用系業務サーバ２００に対し、業務サービス提供部２０１の停止のためのスクリプトの実行を要求する。
そして、ステップ４０５（Ｓ４０５）において、要求を受けた現用系業務サーバ２００のスクリプト実行部２０２は、業務サービス提供部２０１を停止するためのスクリプトを実行する。すなわち、運用管理サーバ１００から要求を受けた現用系業務サーバ２００のスクリプト実行部２０２は、業務サービス提供部２０１の停止用のスクリプトをスクリプト記憶部２０３から取得する。そして、現用系業務サーバ２００のスクリプト実行部２０２は、取得したスクリプトを実行することにより、現用系業務サーバ２００による業務サービスの提供を停止する。なお、停止用のスクリプトは、図４に示した処理により予めスクリプト記憶部２０３に記憶されている。In step 402 (S402), the cluster control unit 111 requests the business service control unit 112 to control to stop the provision of the business service by the active business server 200. On the other hand, in step 403 (S403), the business service control unit 112 requests the script execution unit 115 to stop the business service providing unit 201 in the active business server 200.
In step 404 (S404), the script execution unit 115 executes a script that requests the active business server 200 to stop the business service providing unit 201, thereby issuing this request to the active business server 200. That is, the script execution unit 115 requests the active business server 200 to execute the script for stopping the business service providing unit 201.
Then, in step 405 (S405), the script execution unit 202 of the active business server 200, having received the request, executes the script for stopping the business service providing unit 201. That is, the script execution unit 202 of the active business server 200, having received the request from the operation management server 100, acquires the script for stopping the business service providing unit 201 from the script storage unit 203. The script execution unit 202 of the active business server 200 then stops the provision of the business service by the active business server 200 by executing the acquired script. The stop script has been stored in the script storage unit 203 in advance by the process shown in FIG. 4.

また、ステップ４０６（Ｓ４０６）において、クラスタ制御部１１１は、業務サービス制御部１１２に、待機系業務サーバ３００による業務サービスの提供を開始する制御を要求する。これに対し、業務サービス制御部１１２は、ステップ４０７（Ｓ４０７）において、スクリプト実行部１１５に対し、待機系業務サーバ３００における業務サービス提供部３０１の起動を要求する。
ステップ４０８（Ｓ４０８）において、スクリプト実行部１１５は、業務サービス提供部３０１の起動を待機系業務サーバ３００に要求するスクリプトを実行することにより、待機系業務サーバ３００にこれを要求する。すなわち、スクリプト実行部１１５は、待機系業務サーバ３００に対し、業務サービス提供部３０１の起動のためのスクリプトの実行を要求する。
そして、ステップ４０９（Ｓ４０９）において、要求を受けた待機系業務サーバ３００のスクリプト実行部３０２は、業務サービス提供部３０１を起動するためのスクリプトを実行する。すなわち、運用管理サーバ１００から要求を受けた待機系業務サーバ３００のスクリプト実行部３０２は、業務サービス提供部３０１の起動用のスクリプトをスクリプト記憶部３０３から取得する。そして、待機系業務サーバ３００のスクリプト実行部３０２は、取得したスクリプトを実行することにより、待機系業務サーバ３００による業務サービスの提供を開始する。なお、起動用のスクリプトは、図４に示した処理により予めスクリプト記憶部３０３に記憶されている。
このようにして、待機系業務サーバ３００からの業務サービスの提供を可能とし、フェールオーバーを完了する。Further, in step 406 (S406), the cluster control unit 111 requests the business service control unit 112 to control the standby system business server 300 to start providing the business service. On the other hand, in step 407 (S407), the business service control unit 112 requests the script execution unit 115 to start the business service providing unit 301 in the standby system business server 300.
In step 408 (S408), the script execution unit 115 executes a script that requests the standby business server 300 to start the business service providing unit 301, thereby issuing this request to the standby business server 300. That is, the script execution unit 115 requests the standby business server 300 to execute the script for starting the business service providing unit 301.
Then, in step 409 (S409), the script execution unit 302 of the standby business server 300, having received the request, executes the script for starting the business service providing unit 301. That is, the script execution unit 302 of the standby business server 300, having received the request from the operation management server 100, acquires the script for starting the business service providing unit 301 from the script storage unit 303. The script execution unit 302 of the standby business server 300 then starts the provision of the business service by the standby business server 300 by executing the acquired script. The start script has been stored in the script storage unit 303 in advance by the process shown in FIG. 4.
In this way, the business service can be provided from the standby business server 300, and the failover is completed.
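The failover sequence of FIG. 7 (confirm the standby server, stop the service on the active server, start it on the standby server) can be sketched as follows. This is a simplified illustration: `run_remote` is a hypothetical callback standing in for remote script execution, the script names are invented, and the return values, including what happens when the standby server is not normal or the start fails, are assumptions, since the description does not cover those cases.

```python
def fail_over(run_remote, standby_is_normal):
    """Switch service provision from the active to the standby server.

    run_remote(server, script_name) -> bool is a hypothetical callback that
    asks the named managed server to run one of its stored scripts.
    """
    # S401: fail over only if the standby server can take over normally.
    if not standby_is_normal:
        return "aborted"  # behavior in this case is an assumption

    # S402-S405: request the active server to stop its service.
    # The result is not checked here, because the active server is already
    # known to be faulty (an assumption of this sketch).
    run_remote("active", "stop_service")

    # S406-S409: request the standby server to start the service.
    started = run_remote("standby", "start_service")
    return "completed" if started else "failed"


if __name__ == "__main__":
    def demo_run(server, script_name):
        print(f"request {script_name} on {server}")
        return True

    print(fail_over(demo_run, standby_is_normal=True))
```

Stopping the active side before starting the standby side follows the step order of FIG. 7 and avoids both servers offering the service at once.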

このように、クラスタ制御部１１１、業務サービス制御部１１２、及びスクリプト実行部１１５は、図１のクラスタ制御処理部４に相当する。すなわち、クラスタ制御部１１１、業務サービス制御部１１２、及びスクリプト実行部１１５は、現用系業務サーバ２００が異常である場合、フェールオーバーを行うためのスクリプトの実行を管理対象サーバに要求する。 As described above, the cluster control unit 111, the business service control unit 112, and the script execution unit 115 correspond to the cluster control processing unit 4 of FIG. 1. That is, when the active business server 200 is abnormal, the cluster control unit 111, the business service control unit 112, and the script execution unit 115 request the managed server to execute a script for performing failover.

図８は、ＨＡクラスタシステム１０におけるスクリプトの実行に関する動作を示すフローチャートである。なお、図８において、上段は、運用管理サーバ１００が管理対象サーバにスクリプトを実行させる際の動作の流れを示し、下段は、運用管理サーバ１００においてスクリプトを実行する際の動作の流れを示す。以下、図８を参照しつつ動作の流れについて説明する。 FIG. 8 is a flowchart showing the operation related to script execution in the HA cluster system 10. In FIG. 8, the upper part shows the flow of operation when the operation management server 100 causes a managed server to execute a script, and the lower part shows the flow of operation when a script is executed on the operation management server 100 itself. The flow of the operation is described below with reference to FIG. 8.

まず、運用管理サーバ１００が管理対象サーバ（現用系業務サーバ２００及び待機系業務サーバ３００）にスクリプトを実行させる際の動作（リモート実行の動作）の流れを説明する。
ステップ５００（Ｓ５００）において、運用管理サーバ１００のスクリプト実行部１１５は、サーバ通信部１１６に、管理対象サーバへの通信セッションの確立を要求する。これに対し、ステップ５０１（Ｓ５０１）において、サーバ通信部１１６は、管理対象サーバ、すなわち、現用系業務サーバ２００又は待機系業務サーバ３００への通信セッションを確立する。なお、図６のステップ３０１による処理が行われる場合、サーバ通信部１１６は、ネットワーク４００（パブリックＬＡＮ)での通信セッションを確立する。それ以外の場合は、サーバ通信部１１６は、ネットワーク４０１（インタコネクトＬＡＮ）での通信セッションを確立する。First, the flow of the operation (remote execution operation) when the operation management server 100 causes the managed server (active business server 200 and standby business server 300) to execute the script will be described.
In step 500 (S500), the script execution unit 115 of the operation management server 100 requests the server communication unit 116 to establish a communication session with the managed server. In response, in step 501 (S501), the server communication unit 116 establishes a communication session with the managed server, that is, the active business server 200 or the standby business server 300. When the process of step 301 in FIG. 6 is performed, the server communication unit 116 establishes the communication session on the network 400 (public LAN). In all other cases, the server communication unit 116 establishes the communication session on the network 401 (interconnect LAN).

次に、ステップ５０２（Ｓ５０２）において、業務サービス制御部１１２又はサーバ監視部１１３からの要求に従い、スクリプト実行部１１５は、管理対象サーバのスクリプト実行部２０２又は３０２に対し、スクリプトの実行を要求する。このとき、スクリプト実行部１１５は、暗号化されてスクリプト記憶部２０３、３０３に記憶されているスクリプトの復号に必要な鍵を送信する。 Next, in step 502 (S502), in accordance with the request from the business service control unit 112 or the server monitoring unit 113, the script execution unit 115 requests the script execution unit 202 or 302 of the managed server to execute the script. At this time, the script execution unit 115 transmits the key required to decrypt the script that is stored in encrypted form in the script storage units 203 and 303.

ステップ５０３（Ｓ５０３）において、スクリプト実行部２０２、３０２は、スクリプト記憶部２０３、３０３からスクリプトを取得する。そして、ステップ５０４（Ｓ５０４）において、受信した鍵を用いて、スクリプトを復号し、ステップ５０５（Ｓ５０５）において、復号したスクリプトを実行する。その後、ステップ５０６（Ｓ５０６）において、スクリプト実行部２０２、３０２は、スクリプトの実行結果を運用管理サーバ１００のスクリプト実行部１１５に送信する。 In step 503 (S503), the script execution units 202 and 302 acquire the script from the script storage units 203 and 303. Then, in step 504 (S504), the script is decrypted using the received key, and in step 505 (S505), the decrypted script is executed. After that, in step 506 (S506), the script execution units 202 and 302 transmit the script execution result to the script execution unit 115 of the operation management server 100.

次に、ステップ５０７（Ｓ５０７）において、実行結果を管理対象サーバから取得したスクリプト実行部１１５は、サーバ通信部１１６に、管理対象サーバへの通信セッションの切断を要求する。これに対し、ステップ５０８（Ｓ５０８）において、サーバ通信部１１６は、管理対象サーバへの通信セッションを切断する。なお、実行結果を取得したスクリプト実行部１１５は、要求元である業務サービス制御部１１２又はサーバ監視部１１３に実行結果を出力する。
このように、本実施の形態では、運用管理サーバ１００から管理対象サーバへのリモートログインによるスクリプトの実行が行われる。なお、リモートログインでは、ＳＳＨ（Secure Shell）が用いられてもよい。Next, in step 507 (S507), the script execution unit 115, having acquired the execution result from the managed server, requests the server communication unit 116 to disconnect the communication session with the managed server. In response, in step 508 (S508), the server communication unit 116 disconnects the communication session with the managed server. The script execution unit 115 that has acquired the execution result outputs it to the requester, that is, the business service control unit 112 or the server monitoring unit 113.
As described above, in the present embodiment, the script is executed by remote login from the operation management server 100 to the managed server. In remote login, SSH (Secure Shell) may be used.
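The remote execution flow of steps S500 to S508 (establish a session, request execution together with the key, receive the result, disconnect) can be sketched as below. This is an in-memory illustration only: `FakeManagedServer`, the `purpose` argument, and the log format are hypothetical, and a real system would use an actual remote login mechanism such as SSH rather than a local object.

```python
from contextlib import contextmanager

PUBLIC_LAN = "network 400"        # used for the service access check (S301)
INTERCONNECT_LAN = "network 401"  # used for all other requests


def choose_network(purpose):
    """Pick the network for the session, following the rule given for S501."""
    return PUBLIC_LAN if purpose == "service_check" else INTERCONNECT_LAN


class FakeManagedServer:
    """Hypothetical stand-in for script execution unit 202 or 302."""

    def __init__(self, name):
        self.name = name

    def execute(self, script_name, key):
        # A real server would decrypt the stored script with the key (S504)
        # and run it (S505); here a canned result is returned instead.
        return f"{self.name}:{script_name}:ok"


@contextmanager
def open_session(server, network, log):
    """Establish a session (S500-S501) and always disconnect it (S507-S508)."""
    log.append(f"connect {server.name} via {network}")
    try:
        yield server
    finally:
        log.append(f"disconnect {server.name}")


def run_remote(server, script_name, key, purpose, log):
    """Request execution with the decryption key (S502) and return the result (S506)."""
    with open_session(server, choose_network(purpose), log) as session:
        return session.execute(script_name, key)


if __name__ == "__main__":
    log = []
    print(run_remote(FakeManagedServer("active"), "monitor", b"key", "state_check", log))
    print(log)
```

Wrapping the session in a context manager ensures the disconnect step runs even if the remote execution raises an error, so no session is left open.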

次に、運用管理サーバ１００においてスクリプトを実行する際の動作（ローカル実行の動作）の流れを説明する。この場合、図８の下段に示されるように、通信セッション確立、切断、スクリプトの復号などの処理は行われず、スクリプト記憶部１１７にあるスクリプトが実行される。すなわち、ステップ５５０（Ｓ５５０）において、スクリプト実行部１１５は、スクリプト記憶部１１７から実行対象のスクリプトを取得する。そして、ステップ５５１（Ｓ５５１）において、スクリプト実行部１１５は、取得したスクリプトを実行する。このような処理により、運用管理サーバ１００におけるクラスタウェア１１０のプロセス監視（内部監視部１１４による監視）などが実行される。この場合、スクリプト実行部１１５の実行結果は、要求元の内部監視部１１４に出力される。 Next, the flow of the operation (local execution operation) when the script is executed on the operation management server 100 will be described. In this case, as shown in the lower part of FIG. 8, processing such as communication session establishment, disconnection, and script decoding is not performed, and the script in the script storage unit 117 is executed. That is, in step 550 (S550), the script execution unit 115 acquires the script to be executed from the script storage unit 117. Then, in step 551 (S551), the script execution unit 115 executes the acquired script. By such processing, process monitoring (monitoring by the internal monitoring unit 114) of the clusterware 110 on the operation management server 100 is executed. In this case, the execution result of the script execution unit 115 is output to the internal monitoring unit 114 of the request source.

以上、実施の形態１について説明した。実施の形態１にかかるＨＡクラスタシステム１０では、運用管理サーバ１００に、上述したクラスタウェア１１０が設けられており、管理対象サーバは、クラスタウェア１１０からの要求によりスクリプトの実行及び実行結果の返信を行う。このため、管理対象サーバがクラスタウェアを有することに起因した問題（例えば、既存環境を有効活用できない、クラスタウェアの処理負荷がかかるなどの問題）の発生を回避することができる。 The first embodiment has been described above. In the HA cluster system 10 according to the first embodiment, the operation management server 100 is provided with the above-described clusterware 110, and the managed server executes a script and returns the execution result in response to a request from the clusterware 110. This makes it possible to avoid problems caused by the managed server itself having clusterware (for example, the existing environment cannot be used effectively, or the clusterware imposes a processing load).

Further, in the HA cluster system 10, as described with reference to FIG. 6, the operation management server 100 monitors the provision status of the business service by the managed server via the network 400, which is the network used to provide the business service to client devices. When this monitoring detects that the managed server is not providing the business service normally, the operation management server 100 requests the managed server to execute a script for controlling the operation of the managed server regarding failover.
If the server that provides the business service is itself the one that judges whether the service is being provided, it cannot reliably determine whether the service is actually being provided normally to external client devices via the network 400.
In contrast, in the HA cluster system 10 the service is accessed in the same way as by an actual external client device, so whether the business service is being provided normally can be determined more accurately.
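A minimal Python sketch of this client-side check follows. It assumes, purely for illustration, that the business service is reachable over HTTP; the specification does not name a protocol. The names `http_status`, `service_is_healthy`, `monitor_once`, and the `request_failover` callback are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch the URL the way a client would and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code


def service_is_healthy(url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """Return True only if the service answers with a 2xx status."""
    try:
        return 200 <= fetch(url) < 300
    except OSError:
        # Covers URLError, timeouts, and refused connections.
        return False


def monitor_once(url: str, request_failover: Callable[[], None],
                 fetch: Callable[[str], int] = http_status) -> bool:
    """Probe the service once and request failover if it is not healthy."""
    healthy = service_is_healthy(url, fetch)
    if not healthy:
        request_failover()
    return healthy
```

Because the probe travels over the same network path as real client traffic, a failure anywhere along that path shows up as an unhealthy result, which a check run on the providing server itself would miss.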

<Embodiment 2>
Next, the second embodiment will be described. In the following description, configurations and operations that overlap with the first embodiment are omitted. This embodiment differs from the embodiment described above in that the operation management server performs processing on a BMC (Baseboard Management Controller) included in each managed server.

FIG. 9 is a block diagram showing an example of the HA cluster system 20 according to the second embodiment. As shown in FIG. 9, the active business server 200 includes a BMC 204, and likewise the standby business server 300 includes a BMC 304. The BMCs 204 and 304 operate independently of the OS; they monitor the hardware that constitutes the server, such as the processor and memory, and control starting and stopping of the server. The BMCs 204 and 304 conform to, for example, the IPMI (Intelligent Platform Management Interface) standard, and are connected to the network 402, which is a LAN for communication with the BMCs (BMC LAN). The BMCs 204 and 304 provide the operation management server 100 with a failure reporting function and a remote control function even when the OS of the managed server is down due to a hardware failure or the like.
When the OS of a managed server goes down, the operation management server 100 can no longer monitor and control that server by having it execute scripts via the network 401. Even in such a case, in the present embodiment, in which each managed server is equipped with a BMC, the operation management server 100 can monitor and control the managed server (alive monitoring, forced stop, and so on) via the network 402, as shown in FIG. 10.

FIG. 10 is a flowchart showing the operation of server monitoring and server control using the BMCs in the HA cluster system 20. The flow of operation is described below with reference to FIG. 10.

In step 600 (S600), the server monitoring unit 113 requests the script execution unit 115 to check the operating state of the managed servers (the active business server 200 and the standby business server 300).
In step 601 (S601), the script execution unit 115 executes a predetermined script for requesting the state information of the managed servers. That is, the script execution unit 115 transmits, via the network 402, a command for acquiring the operating state of the managed servers from the BMCs 204 and 304 (an IPMI command for monitoring).

In step 602 (S602), the BMCs 204 and 304, having received the command, acquire monitoring information (also referred to as state information), which consists of various kinds of hardware information (CPU information, memory information, OS state, and so on).
Then, in step 603 (S603), the BMCs 204 and 304 transmit the monitoring information to the operation management server 100.
In step 604 (S604), the script execution unit 115 of the operation management server 100, having acquired the monitoring information from the BMCs 204 and 304, outputs it to the server monitoring unit 113.
In step 605 (S605), the server monitoring unit 113 analyzes the monitoring information acquired in step 604 and checks the state of the managed servers. That is, the server monitoring unit 113 determines whether each managed server is normal, in other words whether a failure has occurred.
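Steps S600 to S605 can be sketched in Python as follows. The specification only says that an IPMI command is sent; the use of the `ipmitool` command-line utility, the `lanplus` interface, and the restriction to chassis power status (rather than the CPU, memory, and OS information mentioned above) are assumptions made to keep the example short. The helper names are hypothetical.

```python
import subprocess
from typing import Callable, List


def ipmi_command(host: str, user: str, password: str, *args: str) -> List[str]:
    """Build an ipmitool invocation that addresses a BMC over the LAN."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]


def run_command(argv: List[str]) -> str:
    """Run the command and return its standard output (raises on failure)."""
    completed = subprocess.run(argv, capture_output=True, text=True,
                               timeout=10, check=True)
    return completed.stdout


def power_is_on(host: str, user: str, password: str,
                run: Callable[[List[str]], str] = run_command) -> bool:
    """Ask the BMC for the chassis power state and interpret the reply."""
    # S601: send the monitoring command; S602-S604: the BMC replies.
    output = run(ipmi_command(host, user, password, "chassis", "power", "status"))
    # S605: analyze the reply, e.g. "Chassis Power is on".
    return output.strip().lower().endswith("is on")
```

The `run` parameter lets the monitoring logic be exercised without a real BMC, which is how the sketch can be tried out on any machine.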

When the server monitoring unit 113 detects an abnormality in a managed server in step 605, the server monitoring unit 113 notifies the cluster control unit 111 of the abnormality in step 606 (S606).
In step 607 (S607), the cluster control unit 111, having received the notification, requests the script execution unit 115 to perform failure handling (restarting the managed server, for example).
In step 608 (S608), the script execution unit 115 executes a predetermined script for requesting the BMCs 204 and 304 to perform the failure handling. That is, the script execution unit 115 transmits, for example, a power-off command or a power-on command for the managed server (an IPMI command for power control) to the BMCs 204 and 304 via the network 402.
In step 609 (S609), the BMCs 204 and 304, having received the command, control the power supply of the managed servers. As a result, as failure handling, the managed server in which the failure occurred is restarted, for example. Alternatively, the failed active business server 200 is stopped and the standby business server 300 is started. If a failure of the active business server 200 is detected while the standby business server 300 is powered off, powering off the active business server 200 and then powering on the standby business server 300 ensures that no conflict of IP (Internet Protocol) addresses or the like occurs. That is, client devices do not need to switch their connection destination.

When the processing by the BMCs 204 and 304 in step 609 has been performed, the execution result of the processing is returned to the script execution unit 115 of the operation management server 100 in step 610 (S610). Then, in step 611 (S611), the cluster control unit 111 acquires this execution result from the script execution unit 115.
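The power-control failover in steps S606 to S611 can be sketched as below. The function `failover_by_power` and the `send_power_command` callback are hypothetical names; the callback stands in for whatever mechanism actually sends the IPMI power command and reports success. Powering on the standby server only after the power-off has succeeded is a design assumption of this sketch, chosen so that the two servers are never running at the same time.

```python
from typing import Callable, List, Tuple


def failover_by_power(active_bmc: str, standby_bmc: str,
                      send_power_command: Callable[[str, str], bool]
                      ) -> List[Tuple[str, str, bool]]:
    """Power off the failed active server, then power on the standby server.

    Returns the list of (bmc, action, succeeded) results, which corresponds
    to the execution results reported back in S610 and S611.
    """
    results: List[Tuple[str, str, bool]] = []
    # S608-S609: stop the failed active server first.
    off_ok = send_power_command(active_bmc, "off")
    results.append((active_bmc, "off", off_ok))
    if off_ok:
        # Start the standby server only once the active one is confirmed off.
        on_ok = send_power_command(standby_bmc, "on")
        results.append((standby_bmc, "on", on_ok))
    return results
```

With this ordering the standby server can come up with the same address as the failed server, which is why the text above notes that clients need not switch connections.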

As described above, in the present embodiment the operation management server 100 requests the BMC of each managed server to return the state information of that server, and monitors the state of the managed server based on this state information. When state information indicating an abnormality of the managed server is obtained from the BMC, the operation management server 100 requests the BMC to control the power supply of the managed server. With this configuration, the system can respond appropriately even when monitoring or control scripts cannot be executed on the managed server.

<Embodiment 3>
Next, the third embodiment will be described. This embodiment differs from the first embodiment in that one of the servers that provide the business service is also used as the operation management server. The points that differ from the first embodiment are described below; configurations and operations that are the same as in the first embodiment are omitted.

FIG. 11 is a block diagram showing an example of the HA cluster system 30 according to the third embodiment. As shown in FIG. 11, the HA cluster system 30 has an operation management / standby business server 500 and an active business server 200. The operation management / standby business server 500 differs from the operation management server 100 according to the first embodiment in that it has a business service providing unit 120 in addition to the clusterware 110. In other words, the operation management / standby business server 500 can be regarded as the operation management server 100 of the first embodiment with the business service providing unit 120 added. The business service providing unit 120 operates in the same way as the business service providing units 201 and 301 described above.

In the present embodiment, when the operation management / standby business server 500 monitors itself, the server monitoring unit 113 requests the script execution unit 115 to execute a script from the script storage unit 117 locally, in the same way as the operation shown in the lower part of FIG. 8. Likewise, when the business service providing unit 120 is started or stopped, the script execution unit 115 is requested to execute a script from the script storage unit 117 locally, in the same way as the operation shown in the lower part of FIG. 8.
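The choice between local and remote execution in this embodiment can be sketched as a small dispatcher. The name `execute_script`, the host-name comparison, and the two callbacks are hypothetical; the specification does not state how the script execution unit identifies its own server.

```python
from typing import Callable


def execute_script(target: str, script: str, self_host: str,
                   run_local: Callable[[str], str],
                   run_remote: Callable[[str, str], str]) -> str:
    """Run the script locally when the target is this server, else remotely."""
    if target == self_host:
        # Own server: no communication session is needed.
        return run_local(script)
    # Another managed server: delegate to the remote execution path.
    return run_remote(target, script)
```

A call such as `execute_script("server-500", "stop_service", "server-500", run_local, run_remote)` would take the local branch, while any other target would be handled by `run_remote`.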

In this way, because the server that performs operation management has the business service providing unit 120, that server can be used not only for operation management but also as a standby server. This configuration also reduces the number of servers that must be prepared as standby servers, so the introduction cost of the system can be kept down.

The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. For example, the control using the BMC described in the second embodiment may also be performed in the third embodiment.

Although the invention of the present application has been described above with reference to the embodiments, the invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the invention within its scope.

This application claims priority based on Japanese Patent Application No. 2018-39390 filed on March 6, 2018, the entire disclosure of which is incorporated herein.

1 Management server
2 Transmission processing unit
3 Server monitoring processing unit
4 Cluster control processing unit
10, 20, 30 HA cluster system
100 Operation management server
110 Clusterware
111 Cluster control unit
112 Business service control unit
113 Server monitoring unit
114 Internal monitoring unit
115, 202, 302 Script execution unit
116 Server communication unit
117, 203, 303 Script storage unit
120, 201, 301 Business service providing unit
150 Network interface
151 Memory
152 Processor
200 Active business server
300 Standby business server
400, 401, 402 Network
500 Operation management / standby business server

Claims

A management server comprising:
transmission processing means for transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server regarding failover;
server monitoring processing means for requesting the service providing server to execute the monitoring script and return the execution result, and for monitoring the operating state of the service providing server based on the execution result; and
cluster control processing means for requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

The management server according to claim 1, wherein
the server monitoring processing means further monitors the provision status of the predetermined service by the service providing server via the network used by the service providing server to provide the predetermined service to a client device, and
the cluster control processing means further requests the service providing server to execute the control script when the predetermined service is not being provided normally by the service providing server.

The management server according to claim 1 or 2, wherein
the service providing server is a server provided with a BMC (Baseboard Management Controller),
the server monitoring processing means further requests the BMC of the service providing server to return state information of the service providing server, and monitors the state of the service providing server based on the state information, and
the cluster control processing means requests the BMC to control the power supply of the service providing server when state information indicating an abnormality of the service providing server is obtained from the BMC.

The management server according to any one of claims 1 to 3, further comprising an internal monitoring processing means for monitoring the operating state of at least one of the server monitoring processing means or the cluster control processing means.

The management server according to any one of claims 1 to 4, further comprising a service providing means for providing the predetermined service.

A cluster system comprising:
an active server for providing a predetermined service;
a standby server for providing the predetermined service; and
a management server that controls failover between the active server and the standby server,
wherein the management server includes:
transmission processing means for transmitting, to the active server, a first monitoring script for monitoring the operation of the active server and a first control script for controlling the operation of the active server regarding failover, and for transmitting, to the standby server, a second monitoring script for monitoring the operation of the standby server and a second control script for controlling the operation of the standby server regarding failover;
server monitoring processing means for requesting the active server and the standby server to execute the monitoring scripts and return the execution results, and for monitoring the operating states of the active server and the standby server based on the execution results; and
cluster control processing means for, when the execution result of the monitoring script of the active server indicates an abnormality of the active server, requesting the active server to execute the first control script and requesting the standby server to execute the second control script.

The cluster system according to claim 6, wherein
the server monitoring processing means further monitors the provision status of the predetermined service by the active server via the network used by the active server to provide the predetermined service to a client device, and
the cluster control processing means further requests the active server to execute the first control script and requests the standby server to execute the second control script when the predetermined service is not being provided normally by the active server.

A method of controlling a cluster system, the method comprising:
transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server regarding failover;
requesting the service providing server to execute the monitoring script and return the execution result, and monitoring the operating state of the service providing server based on the execution result; and
requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.

A non-transitory computer-readable medium storing a program for causing a computer to execute:
a transmission processing step of transmitting, to a service providing server that provides a predetermined service, a monitoring script for monitoring the operation of the service providing server and a control script for controlling the operation of the service providing server regarding failover;
a server monitoring processing step of requesting the service providing server to execute the monitoring script and return the execution result, and monitoring the operating state of the service providing server based on the execution result; and
a cluster control processing step of requesting the service providing server to execute the control script when the execution result of the monitoring script indicates an abnormality of the service providing server.