JP6024148B2

JP6024148B2 - Program placement method

Info

Publication number: JP6024148B2
Application number: JP2012073352A
Authority: JP
Inventors: 洋介岩松
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-03-28
Filing date: 2012-03-28
Publication date: 2016-11-09
Anticipated expiration: 2032-03-28
Also published as: JP2013206051A

Description

The present invention relates to a program placement method in a computer system in which a plurality of computers are connected so as to communicate with each other through a network.

Virtual machine (VM) technology is conventionally known, in which software partitions the hardware resources of a physical machine into multiple independent portions so that multiple virtual machines can run simultaneously on one physical machine. Techniques have also been proposed that run a virtual machine redundantly on multiple physical machines to provide fault tolerance (FT) against hardware failures.

For example, Patent Document 1 describes an FT system based on a high-speed checkpoint method. In this FT system, a virtual machine (active side) runs on one physical machine, and at each periodic checkpoint a snapshot (memory state, disk state, and processor and I/O context) is taken, transferred to the other physical machine, and saved on the standby-side virtual machine, thereby duplexing the virtual machine. If the active virtual machine can no longer execute business processing because of a physical machine failure, failover (switching from standby to active) is performed, and the standby virtual machine takes over and continues the business processing that had been running on the active virtual machine.
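The checkpoint scheme described above can be modelled, very roughly, as follows. This is a toy sketch, not the method of Patent Document 1: the class names are hypothetical, a plain dict stands in for memory, disk, and CPU/I-O context, and network transfer is reduced to a deep copy.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """Toy stand-in for one VM: `state` models memory, disk, and CPU/I-O context."""
    state: dict = field(default_factory=dict)
    active: bool = False


class CheckpointPair:
    """Active/standby pair duplexed by periodic snapshots (simplified model)."""

    def __init__(self) -> None:
        self.active: Optional[Replica] = Replica(active=True)
        self.standby: Optional[Replica] = Replica()

    def process(self, key, value) -> None:
        # Business processing runs only on the active replica.
        self.active.state[key] = value

    def checkpoint(self) -> None:
        # Snapshot the active side and store a copy on the standby side.
        self.standby.state = copy.deepcopy(self.active.state)

    def fail_over(self) -> Replica:
        # The active side is lost; the standby is promoted and resumes
        # from the state captured at the last checkpoint.
        self.standby.active = True
        self.active, self.standby = self.standby, None
        return self.active
```

In this toy model an update made after the last checkpoint (for example `process("orders", 11)` after `checkpoint()`) is not visible to the promoted replica; a real implementation has to deal with that window, which this sketch ignores.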

In addition, the following methods have been proposed for optimally placing a plurality of virtual machines on a plurality of physical machines.

For example, Patent Document 2 describes a method of optimally placing a plurality of virtual machines on a plurality of physical machines based on the storage capacities of the physical and virtual machines. Specifically, using measured data indicating the performance of each virtual machine at every predetermined time interval, the storage capacity of each physical machine, and the storage capacity of each virtual machine, the method calculates the combination of virtual machines and physical machines (that is, the placement of the virtual machines on the physical machines) that maximizes the total, over all times, of the values indicating each virtual machine's performance when each virtual machine runs on one of the physical machines. Then, according to the calculated placement, the files of each virtual machine are stored in the storage area of the corresponding physical machine, and the virtual machines are relocated.
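The optimization attributed to Patent Document 2 can be sketched as an exhaustive search, under stated assumptions: `perf[vm][pm]` is a hypothetical list of per-interval performance values of a VM when run on a given PM, and feasibility is a simple sum-of-sizes storage check. The real document may model both differently, and brute force only works for tiny instances.

```python
from itertools import product


def max_performance_placement(perf, vm_size, pm_capacity):
    """Return (placement, score) maximizing total performance under storage limits.

    perf[vm][pm]    -- assumed list of per-interval performance values of vm on pm
    vm_size[vm]     -- storage needed by vm
    pm_capacity[pm] -- storage available on pm
    """
    vms, pms = list(vm_size), list(pm_capacity)
    best, best_score = None, float("-inf")
    for combo in product(pms, repeat=len(vms)):
        used = dict.fromkeys(pms, 0)
        for vm, pm in zip(vms, combo):
            used[pm] += vm_size[vm]
        if any(used[pm] > pm_capacity[pm] for pm in pms):
            continue  # violates a storage-capacity constraint
        score = sum(sum(perf[vm][pm]) for vm, pm in zip(vms, combo))
        if score > best_score:
            best, best_score = dict(zip(vms, combo)), score
    return best, best_score
```

With two VMs of size 10 and two PMs of capacity 15, the VMs cannot share a host, so the search simply picks whichever split yields the larger summed performance.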

Patent Document 3 describes a method of avoiding the effects of a disaster by relocating the virtual machines on a physical machine affected by the disaster to other physical machines that are not affected.

Patent Document 4 describes a method of placing a plurality of virtual machines on a plurality of physical machines based on network usage. Specifically, based on measured traffic between virtual machines, the expected network usage rate if a given virtual machine were moved to another physical machine is calculated to judge whether the move is appropriate, and if it is, the virtual machine is moved to that physical machine.
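The move evaluation described for Patent Document 4 can be illustrated with a small sketch. Assumptions (not from the source): all hosts share one network link of capacity `link_capacity`, only traffic between VMs on different hosts consumes it, and a move counts as "appropriate" when it lowers the usage rate.

```python
def cross_host_traffic(traffic, placement):
    """Sum of measured VM-to-VM traffic that has to cross the network.

    traffic[(src, dst)] -- measured traffic from VM src to VM dst
    placement[vm]       -- host currently running vm
    """
    return sum(rate for (src, dst), rate in traffic.items()
               if placement[src] != placement[dst])


def evaluate_move(traffic, placement, vm, target, link_capacity):
    """Return (usage before, expected usage after, move looks appropriate)."""
    moved = {**placement, vm: target}
    before = cross_host_traffic(traffic, placement) / link_capacity
    after = cross_host_traffic(traffic, moved) / link_capacity
    return before, after, after < before
```

For example, if VM `a` on host `p1` exchanges most of its traffic with VM `b` on `p2`, moving `b` to `p1` removes that traffic from the network and the evaluation reports the move as worthwhile.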

Patent Document 5 describes a method of placing a plurality of virtual machines on a plurality of physical machines based on the loads of the physical and virtual machines. Specifically, changes in those loads are captured or analyzed using approximate expressions such as higher-order equations, which makes it possible to detect resource shortages on physical machines in advance and to place and move virtual machines appropriately over the medium to long term.
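A minimal sketch of the "approximate expression" idea attributed to Patent Document 5, assuming a least-squares polynomial fit of one host's load history and a fixed capacity threshold. The function names, the polynomial degree, and the threshold test are illustrative assumptions; the fit is written in plain Python so the block needs no third-party packages.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via normal equations; returns c0..cd (lowest order first)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))) / a[i][i]
    return coeffs


def first_shortage(times, loads, capacity, horizon, degree=2):
    """Return the first future time step whose predicted load exceeds capacity, else None."""
    c = polyfit(times, loads, degree)
    last = times[-1]
    for t in range(last + 1, last + 1 + horizon):
        if sum(ci * t ** i for i, ci in enumerate(c)) > capacity:
            return t
    return None
```

Given a load history that grows like t² (0, 1, 4, 9, 16) and a capacity of 40, the quadratic fit predicts 25, 36, 49, ... and flags time step 7 as the first shortage.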

Patent Document 1: JP 2010-160660 A
Patent Document 2: JP 2005-115653 A
Patent Document 3: JP 2011-209811 A
Patent Document 4: JP 2011-180889 A
Patent Document 5: JP 2010-117760 A

In a set of virtual machines made fault tolerant by the high-speed checkpoint method (an active virtual machine and a standby virtual machine), the active virtual machine carries a higher load than the standby virtual machine, and the traffic from the active virtual machine to the standby virtual machine is larger than the traffic from the standby virtual machine to the active virtual machine.

For this reason, when placing a plurality of virtual machines on a plurality of physical machines, the placement must be determined with the above characteristics in mind. In an FT system, however, when a failure occurs in any physical machine, a failover occurs in which standby virtual machines on other physical machines take over the business processing that was being performed by the active virtual machines on the failed physical machine. Consequently, when a failover occurs, the balanced distribution of processing load across the physical machines and of network communication load between them breaks down, and the execution efficiency of the system as a whole decreases. One could recalculate an optimal placement and relocate the virtual machines when the failover occurs, but that would mean moving running active virtual machines to other physical machines, which raises concerns about the impact on running business operations.
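The imbalance described above can be reproduced with a small model. This sketch assumes each program's load depends only on its role (active or standby); `ProgramSet`, `host_loads`, and `fail_host` are hypothetical names, and the loads in the example are invented for illustration, not taken from the patent.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProgramSet:
    name: str
    active_host: str
    standby_host: Optional[str]  # None once the standby has been lost
    active_load: int             # assumed load of the program while active
    standby_load: int            # assumed (smaller) load while standby


def host_loads(sets):
    """Total processing load per host."""
    loads = {}
    for s in sets:
        loads[s.active_host] = loads.get(s.active_host, 0) + s.active_load
        if s.standby_host is not None:
            loads[s.standby_host] = loads.get(s.standby_host, 0) + s.standby_load
    return loads


def fail_host(sets, failed):
    """Program sets as they look after `failed` crashes and failover completes."""
    result = []
    for s in sets:
        if s.active_host == failed:
            # Failover: the standby program becomes active on its own host.
            result.append(replace(s, active_host=s.standby_host, standby_host=None))
        elif s.standby_host == failed:
            # The active program keeps running but is no longer duplexed.
            result.append(replace(s, standby_host=None))
        else:
            result.append(s)
    return result
```

With three hypothetical sets placed in a rotation (P1 on h1/h2, P2 on h2/h3, P3 on h3/h1, each 30 active and 5 standby), every host carries 35; after h1 fails, h2 jumps to 60 while h3 stays at 35, which is exactly the kind of imbalance the text describes.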

An object of the present invention is to provide a program placement method that solves the above problem, namely that when a failover occurs, the balanced distribution of processing load across the physical machines and of network communication load between them breaks down and the execution efficiency of the entire system decreases.

A program placement method according to one aspect of the present invention
is a program placement method executed by a computer system in which a first computer and a plurality of second computers are connected so as to communicate with each other through a network, wherein:
the first computer determines a placement pattern for placing a plurality of program sets, each being a set of two programs of which one operates as active and the other as standby, on the plurality of second computers, such that the two programs of the same program set are not placed on the same second computer, taking into account at least one of the processing load of each of the plurality of second computers and the communication load between the plurality of second computers;
the first computer places the plurality of program sets on the plurality of second computers according to the determined placement pattern;
when a failure occurs in any of the second computers and a failover is performed, the first computer determines a switching pattern for switching active and standby of the two programs of each remaining program set that has not been failed over, taking into account at least one of the processing load of each of the plurality of second computers after the failover and the communication load between the plurality of second computers; and
the first computer switches active and standby of the two programs of the program sets that have not been failed over, according to the determined switching pattern.

A computer according to another aspect of the present invention
is the first computer in a computer system in which a first computer and a plurality of second computers are connected so as to communicate with each other through a network, and has:
placement pattern determination means for determining a placement pattern for placing a plurality of program sets, each being a set of two programs of which one operates as active and the other as standby, on the plurality of second computers, such that the two programs of the same program set are not placed on the same second computer, taking into account at least one of the processing load of each of the plurality of second computers and the communication load between the plurality of second computers;
program placement means for placing the plurality of program sets on the plurality of second computers according to the determined placement pattern;
switching pattern determination means for determining, when a failure occurs in any of the second computers and a failover is performed, a switching pattern for switching active and standby of the two programs of each remaining program set that has not been failed over, taking into account at least one of the processing load of each of the plurality of second computers after the failover and the communication load between the plurality of second computers; and
switching means for switching active and standby of the two programs of the program sets that have not been failed over, according to the determined switching pattern.

Because the present invention has the above configuration, even when a failover occurs it prevents the distribution of processing load across the physical machines and of network communication load between them from breaking down, and thus prevents a drop in the execution efficiency of the entire system.

FIG. 1 is a block diagram of the first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the operation of the first embodiment.
FIG. 3 shows an example placement of program sets in the first embodiment.
FIG. 4 shows the state and processing load of each program immediately after a failover in the first embodiment.
FIG. 5 shows the state and processing load of each program after active and standby have been switched, for program sets that were not failed over, according to the switching pattern determined after the failover in the first embodiment.
FIG. 6 is a block diagram of the second embodiment.
FIG. 7 is a flowchart showing the procedure for measuring the processing capacity of each physical machine in the second embodiment.
FIG. 8 shows an example of a table holding the measured processing capacity of each physical machine in the second embodiment.
FIG. 9 is a flowchart showing the procedure for measuring the communication speed between physical machines in the second embodiment.
FIG. 10 shows an example of a table holding the measured communication speed between physical machines in the second embodiment.
FIG. 11 is a block diagram showing an example configuration for measuring the processing load and traffic of each program in the second embodiment.
FIG. 12 is a flowchart showing an example procedure for measuring the processing load and traffic of each program in the second embodiment.
FIG. 13 shows an example of a table holding the measured processing load of each program in the second embodiment.
FIG. 14 shows an example of a table holding the measured traffic of each program in the second embodiment.
FIG. 15 is a flowchart showing an example procedure for selecting a placement pattern in the second embodiment.
FIG. 16 shows an example of the weighting coefficients used in the placement pattern selection procedure of the second embodiment.
FIG. 17 is a flowchart showing an example procedure for selecting a switching pattern in the second embodiment.
FIG. 18 is a flowchart showing an example procedure for selecting a placement pattern in the third embodiment.
FIG. 19 shows an example of the weighting coefficients used in the placement pattern selection procedure of the third embodiment.

Next, embodiments of the present invention will be described in detail with reference to the drawings.
[First embodiment]
Referring to FIG. 1, in the computer system according to the first embodiment of the present invention, a first computer 1100 and a plurality of second computers 1200 are connected via a network 1300 so that they can communicate with each other.

The computer 1100 is an information processing apparatus (physical machine) such as a personal computer, and has the function of controlling the entire computer system.

The computer 1100 includes a communication interface unit (communication I/F unit) 1101, a storage unit 1102, and a processor 1103.

The communication I/F unit 1101 consists of a dedicated data communication circuit and has the function of performing data communication with various devices, such as the computers 1200, connected via the network 1300.

The storage unit 1102 consists of a storage device such as a hard disk or memory, and has the function of storing the program 1104 and the processing information required for the various processes performed by the processor 1103. The processing information includes a plurality of program sets 1109.

Each program set 1109 is a set of two programs, one operating as active and the other as standby. Each program set 1109 produces an FT system made up of an active virtual machine and a standby virtual machine.

The program 1104 implements the various processing units when it is read and executed by the processor 1103. It is read in advance from an external device (not shown) or a computer-readable storage medium (not shown) through a data input/output function such as the communication I/F unit 1101, and stored in the storage unit 1102.

The processor 1103 has a microprocessor such as a CPU and its peripheral circuits. By reading the program 1104 from the storage unit 1102 and executing it, the processor 1103 makes the hardware and the program 1104 cooperate to implement the various processing units. The main processing units implemented by the processor 1103 are a placement pattern determination unit 1105, a program placement unit 1106, a switching pattern determination unit 1107, and a switching unit 1108.

The placement pattern determination unit 1105 has the function of determining a placement pattern for placing the plurality of program sets 1109 on the plurality of computers 1200. When determining the placement pattern, it ensures that the two programs of the same program set 1109 are not placed on the same computer 1200, and it takes into account at least one of the processing load of each computer 1200 and the communication load between the computers 1200. Of course, the placement pattern may also be determined taking into account other information besides processing load and communication load.
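One way to realize the placement pattern determination described above is an exhaustive search over placements that satisfy the "different hosts" constraint, scored by processing-load balance. This is a hedged sketch: the objective (max minus min host load), the function names, and the `(active_load, standby_load)` input format are assumptions, and brute force is only practical for very small numbers of hosts and program sets.

```python
from itertools import permutations, product


def placements(set_names, hosts):
    """Yield every {set: (active_host, standby_host)} with the two programs on different hosts."""
    pairs = list(permutations(hosts, 2))  # ordered pairs, so active != standby by construction
    for combo in product(pairs, repeat=len(set_names)):
        yield dict(zip(set_names, combo))


def processing_loads(assign, load, hosts):
    """Total load per host; load[name] = (active_load, standby_load)."""
    totals = dict.fromkeys(hosts, 0)
    for name, (act, stb) in assign.items():
        totals[act] += load[name][0]
        totals[stb] += load[name][1]
    return totals


def best_placement(load, hosts):
    """Placement minimizing the spread (max - min) of per-host processing load."""
    best, best_spread = None, None
    for assign in placements(list(load), hosts):
        totals = processing_loads(assign, load, hosts)
        spread = max(totals.values()) - min(totals.values())
        if best_spread is None or spread < best_spread:
            best, best_spread = assign, spread
    return best, best_spread
```

Using `permutations` for the host pairs builds the anti-affinity rule into the search space instead of filtering afterwards. The number of candidates grows as (h(h-1))^n, so FIG. 3's four hosts and eight sets would already need a heuristic.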

The program placement unit 1106 has the function of placing the plurality of program sets 1109 on the plurality of computers 1200 according to the placement pattern determined by the placement pattern determination unit 1105. That is, the program placement unit 1106 transmits the active program of a given program set 1109 through the network 1300 to the computer 1200 where it is to be placed, thereby creating an active virtual machine on that computer 1200. Likewise, it transmits the standby program of that program set 1109 through the network 1300 to its destination computer 1200, thereby creating a standby virtual machine on that computer 1200.

When a failure occurs in any computer 1200 and a failover is performed, the switching pattern determination unit 1107 has the function of generating a switching pattern that switches active and standby of the two programs of each remaining program set 1109 that has not been failed over. When generating the switching pattern, it takes into account at least one of the post-failover processing load of each computer 1200 and the post-failover communication load between the computers 1200. Of course, the switching pattern may also be determined taking into account other information besides processing load and communication load.
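The switching pattern search described above can be sketched as trying every subset of the program sets that are still duplexed on surviving hosts and keeping the subset that best balances processing load. The input format, the spread objective, and the preference for fewer switches on ties are assumptions for illustration; the loads in the example are hypothetical.

```python
from itertools import combinations


def loads_after(sets, flipped):
    """Per-host load after swapping roles of the sets in `flipped`.

    sets[name] = (active_host, standby_host or None, active_load, standby_load)
    """
    totals = {}
    for name, (act, stb, a_load, s_load) in sets.items():
        if name in flipped:
            act, stb = stb, act  # roles swap; each load follows its role
        totals[act] = totals.get(act, 0) + a_load
        if stb is not None:
            totals[stb] = totals.get(stb, 0) + s_load
    return totals


def best_switching(sets):
    """Subset of switchable sets whose flip minimizes load spread (fewest flips on ties)."""
    eligible = [n for n, (_, stb, _, _) in sets.items() if stb is not None]
    best, best_spread = frozenset(), None
    for k in range(len(eligible) + 1):  # smaller subsets first, so ties keep fewer switches
        for subset in combinations(eligible, k):
            totals = loads_after(sets, set(subset))
            spread = max(totals.values()) - min(totals.values())
            if best_spread is None or spread < best_spread:
                best, best_spread = frozenset(subset), spread
    return best, best_spread
```

In a hypothetical post-failover state where h2 carries 110 and h3 and h4 carry 35 and 10, flipping the two sets whose active programs sit on h2 and whose standbys sit on h3 and h4 reduces the spread from 100 to 25, without moving any running program between hosts.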

The switching unit 1108 has the function of switching active and standby of the two programs of each program set 1109 that has not been failed over, according to the switching pattern determined by the switching pattern determination unit 1107. That is, the switching unit 1108 instructs, through the network 1300, the computer 1200 on which the active program of a given program set 1109 is placed to change that program to standby, and at the same time instructs the computer 1200 on which the standby program of that program set 1109 is placed to change it to active.

The computer 1200 is an information processing apparatus (physical machine) such as a personal computer and has the function of running a plurality of virtual machines. Like the computer 1100, the computer 1200 consists of a communication I/F unit, a storage unit, and a processor.

The computers 1100 and 1200 may also have operation input devices such as a keyboard and mouse, and screen display devices such as an LCD or PDP.

FIG. 2 is a flowchart showing the procedure of the program placement method executed by the computer system according to this embodiment. The operation of this embodiment is described below with reference to FIGS. 1 and 2.

First, the placement pattern determination unit 1105 of the computer 1100 determines a placement pattern for placing the plurality of program sets 1109 on the plurality of computers 1200 (step S1001). The placement pattern it determines places the active and standby programs of each program set on different computers 1200, and it takes into account at least one of the processing load of each computer 1200 and the communication load between the computers 1200. For example, when all the computers 1200 have the same performance, the placement pattern determination unit 1105 determines a placement pattern such that the processing load of each computer 1200 (the total load of the programs running on it) is roughly equal, and the communication load between any two computers 1200 (the traffic from the active and standby programs running on one computer to their paired standby and active programs running on the other, plus the traffic in the opposite direction) is roughly uniform. These processing and communication loads may be taken from data measured in advance and stored in the storage unit 1102, or from data actually measured using the plurality of computers 1200.
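The two balance criteria of step S1001 can be folded into one cost function, sketched below. Communication load between two hosts follows the definition above: both directions of traffic for every program set split across that host pair. The weights `w_proc` and `w_comm`, the spread-based scoring, and the input formats are assumptions; the figure list mentions weighting coefficients for later embodiments, but their definition is not in this section.

```python
from itertools import combinations


def pair_traffic(assign, traffic, hosts):
    """Communication load per unordered host pair.

    assign[name]  = (active_host, standby_host), assumed to be different hosts
    traffic[name] = (active_to_standby, standby_to_active)
    """
    comm = {frozenset(p): 0 for p in combinations(hosts, 2)}
    for name, (act, stb) in assign.items():
        a2s, s2a = traffic[name]
        comm[frozenset((act, stb))] += a2s + s2a
    return comm


def placement_cost(assign, load, traffic, hosts, w_proc=1.0, w_comm=1.0):
    """Weighted sum of processing-load spread and communication-load spread (lower is better)."""
    proc = dict.fromkeys(hosts, 0)
    for name, (act, stb) in assign.items():
        proc[act] += load[name][0]
        proc[stb] += load[name][1]
    comm = list(pair_traffic(assign, traffic, hosts).values())
    return (w_proc * (max(proc.values()) - min(proc.values()))
            + w_comm * (max(comm) - min(comm)))
```

A rotation that spreads three identical hypothetical sets over three hosts scores 0, while stacking all actives on one host and all standbys on another is penalized on both terms. Such a cost could replace the processing-only spread in an exhaustive or heuristic placement search.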

Next, the program placement unit 1106 of the computer 1100 places the plurality of program sets 1109 on the plurality of computers 1200 according to the determined placement pattern (step S1002). As a result, a plurality of FT systems, each a pair of an active and a standby virtual machine, are created on the computers 1200 and start operating.

Thereafter, when a failure occurs in one of the computers 1200 and the active virtual machines running on it stop, a failover is performed, and the standby virtual machines take over and continue the processing of the stopped virtual machines. Because virtual machines that had been operating as standby now operate as active, the processing load of each computer 1200 hosting a newly activated virtual machine increases compared with before the failover. In addition, the communication load of the other computers 1200 that had been communicating with programs on the failed computer 1200 decreases compared with before the failover. If nothing is done, the processing and communication loads remain unbalanced, and the execution efficiency of the entire system drops.

Therefore, in this embodiment, when a failover is performed, the following processing is executed promptly.

First, when a failover is performed, the switching pattern determination unit 1107 of the computer 1100 determines a switching pattern for switching active and standby of the two programs of each remaining program set 1109 that has not been failed over (step S1003). In doing so, it takes into account at least one of the post-failover processing load of each computer 1200 and the post-failover communication load between the computers 1200. For example, when all the computers 1200 have the same performance, the switching pattern determination unit 1107 determines a switching pattern such that the processing load of each computer 1200 (the total load of the programs running on it) is roughly equal, and the communication load between any two computers 1200 (the traffic from the active and standby programs running on one computer to their paired standby and active programs running on the other, plus the traffic in the opposite direction) is roughly uniform. These processing and communication loads may be taken from data measured in advance and stored in the storage unit 1102, or from data actually measured using the plurality of second computers 1200.

Next, the switching unit 1108 of the computer 1100 switches active and standby of the two programs of each program set 1109 that has not been failed over, according to the determined switching pattern (step S1004). Switching between active and standby can normally be done quickly, in one second or less. This makes it possible to improve the balance of processing and communication loads and raise the execution efficiency of the entire system without affecting business processing.

Next, the effect of this embodiment is described using a simplified example.

FIG. 3 shows an example in which the program placement unit 1106 has placed a total of eight program sets on four computers 1200 of equal performance.

Referring to FIG. 3, computer 1200-1 hosts the active programs A1 to A3 of the first to third program sets and the standby program S6 of the sixth program set. Computer 1200-2 hosts the standby programs S1 to S3 of the first to third program sets and the active programs A4, A7, and A8 of the fourth, seventh, and eighth program sets. Computer 1200-3 hosts the standby programs S4 and S7 of the fourth and seventh program sets and the active program A5 of the fifth program set. Computer 1200-4 hosts the standby programs S5 and S8 of the fifth and eighth program sets and the active program A6 of the sixth program set.
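The FIG. 3 placement can be written down as data to check it against the rules above. The sketch below records, for each program set, the hosts of its active and standby programs as listed in the text, confirms that no set has both programs on one host, and classifies the sets when computer 1200-1 fails. The helper names are hypothetical.

```python
# FIG. 3 placement: program set number -> (host of active program, host of standby program)
FIG3 = {
    1: ("1200-1", "1200-2"),
    2: ("1200-1", "1200-2"),
    3: ("1200-1", "1200-2"),
    4: ("1200-2", "1200-3"),
    5: ("1200-3", "1200-4"),
    6: ("1200-4", "1200-1"),
    7: ("1200-2", "1200-3"),
    8: ("1200-2", "1200-4"),
}


def programs_on(placement, host):
    """Programs on a host, named A<n> (active) or S<n> (standby) as in FIG. 3."""
    names = []
    for n, (act, stb) in placement.items():
        if act == host:
            names.append(f"A{n}")
        if stb == host:
            names.append(f"S{n}")
    return sorted(names)


def classify_after_failure(placement, failed):
    """Split sets into (failed over, lost their standby, still switchable)."""
    failed_over, unpaired, switchable = [], [], []
    for n, (act, stb) in sorted(placement.items()):
        if act == failed:
            failed_over.append(n)
        elif stb == failed:
            unpaired.append(n)
        else:
            switchable.append(n)
    return failed_over, unpaired, switchable
```

For a failure of 1200-1, sets 1 to 3 fail over to 1200-2, set 6 keeps its active program on 1200-4 but loses its standby, and only sets 4, 5, 7, and 8 remain candidates for the switching pattern; the FIG. 5 example picks sets 7 and 8 from that group.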

In FIG. 3, the number attached to each program indicates its processing load (in units such as MIPS), and the number attached to each computer 1200 indicates the total processing load of the programs placed on it. In the arrangement pattern of FIG. 3, the processing load is 91 on computer 1200-1, 93 on computer 1200-2, and 92 on each of computers 1200-3 and 1200-4, so the four computers 1200 carry nearly equal loads.

FIG. 4 shows the state and processing load of each program immediately after a failure occurs in computer 1200-1 and a failover is performed. The failover has switched the programs of the first to third program sets on computer 1200-2 from standby to active. As a result, the total processing load of computer 1200-2 rises from 93 to 180.

FIG. 5 shows the state and processing load of each program after the switching unit 1108 has switched active and standby for program sets that were not failed over, in accordance with the switching pattern determined by the switching pattern determining unit 1107. In the example of FIG. 5, active and standby are switched for the seventh and eighth program sets. As a result, the processing load of computer 1200-2 falls from 180 to 122, while that of computers 1200-3 and 1200-4 rises from 92 to 121 each, so the three remaining computers 1200 again carry nearly equal loads.
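As a quick arithmetic check of this example, the per-computer totals quoted for FIG. 4 and FIG. 5 can be compared by their variance. Using population variance as the measure of imbalance is an assumption made here for illustration, although the second embodiment below rates balance by a variance in the same spirit.

```python
from statistics import pvariance

# Per-computer processing-load totals quoted in the text.
# Computer 1200-1 has failed, so only three computers remain.
after_failover = {"1200-2": 180, "1200-3": 92, "1200-4": 92}     # FIG. 4
after_switching = {"1200-2": 122, "1200-3": 121, "1200-4": 121}  # FIG. 5

def total_and_variance(loads):
    """Return the total load and the population variance (assumed measure) of the loads."""
    values = list(loads.values())
    return sum(values), pvariance(values)

total_before, var_before = total_and_variance(after_failover)
total_after, var_after = total_and_variance(after_switching)

# Switching active and standby only moves load between computers,
# so the total is unchanged while the spread shrinks sharply.
print(total_before, total_after)                   # 364 364
print(round(var_before, 2), round(var_after, 2))   # 1720.89 0.22
```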

The example above determines the arrangement and switching patterns from the processing load. It will be apparent that the same effect is obtained if the patterns are instead determined from the communication load between the computers 1200, or from both the processing load and the communication load.

This embodiment can be modified or extended in the following ways.

When determining the arrangement pattern, the arrangement pattern determining unit 1105 may calculate, for each of a plurality of candidate arrangement patterns, an index value representing how evenly at least one of the processing load on each computer 1200 and the communication load between the computers 1200 is spread after arrangement, and then select the arrangement pattern from among the candidates on the basis of the calculated index values.

Alternatively, when determining the arrangement pattern, the arrangement pattern determining unit 1105 may calculate two index values for each candidate arrangement pattern: (1) an index value representing how evenly at least one of the processing load on each computer 1200 and the communication load between the computers 1200 is spread after arrangement, and (2) an index value representing how evenly that load is expected to be spread if, after a failure in any computer 1200 triggers a failover, active and standby are switched for the two programs of the remaining program sets 1109 that were not failed over. It then determines the arrangement pattern from among the candidates on the basis of these two index values.

Likewise, when determining the switching pattern, the switching pattern determining unit 1107 may calculate, for each of a plurality of candidate switching patterns, an index value representing how evenly at least one of the processing load on each computer 1200 and the communication load between the computers 1200 is spread after switching, and determine the switching pattern from among the candidates on the basis of that index value.
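The candidate-selection step shared by these variations can be sketched as below. The sketch assumes, for illustration only, that each candidate has already been evaluated into a list of per-computer loads and that population variance serves as the index value (lower meaning more even); the candidate names and numbers are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List

def index_value(loads: List[float]) -> float:
    """Assumed index of how unevenly load is spread: population variance (lower is more even)."""
    return pvariance(loads)

def choose_pattern(candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate pattern with the smallest index value."""
    return min(candidates, key=lambda name: index_value(candidates[name]))

# Hypothetical candidates, each mapped to the per-computer loads it would produce.
candidates = {
    "pattern_a": [60.0, 40.0, 20.0],
    "pattern_b": [45.0, 40.0, 35.0],
    "pattern_c": [80.0, 20.0, 20.0],
}
print(choose_pattern(candidates))  # pattern_b
```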

[Second Embodiment]
[Overview of this embodiment]
When a set of virtual machines made fault tolerant by the high-speed checkpoint method is regarded as a set of program objects, such a program set has the following characteristics.
(a) The two programs that form the duplex pair depend heavily on each other, yet they must be placed on separate physical machines. Fast communication between the programs of a set is therefore important.
(b) The load within a program set is asymmetric: the active program carries a heavier load than the standby program.
(c) Communication within a program set is asymmetric: the traffic from the active program to the standby program is large.
(d) When a physical machine fails, failover (switching from standby to active) can be completed quickly, in one second or less.
(e) Even when no physical machine has failed, active and standby can be switched on instruction just as quickly, in one second or less.

This embodiment is designed with these program-set characteristics in mind. When a plurality of program sets whose active and standby can be switched quickly are placed on a plurality of physical machines, it distributes the processing load of the physical machines and the network communication load between them, raising overall execution efficiency. Further, when one physical machine becomes unavailable and a failover occurs, it switches active and standby for several program sets to redistribute the load and raise overall execution efficiency again.

More specifically, the management unit instructs the processing capacity measuring unit and the communication speed measuring unit of each physical machine to measure the processing capacity of the physical machines and the communication speed between them. It also places each program set on physical machines and measures the processing load of each program and the traffic between the programs of each set. When placing a plurality of program sets on a plurality of physical machines, the management unit calculates a processing-load-balancing rating and a network-load-balancing rating for each arrangement pattern, then selects and executes the arrangement that best balances the load. When a physical machine becomes unavailable and a failover occurs, the management unit recalculates both ratings for each pattern of switching active and standby across the program sets, then selects and executes the switching that best balances the load.

[Configuration of this embodiment]
Referring to FIGS. 6 and 11, this embodiment comprises a plurality of physical machines 101 to 105, a network 115 connecting them, and program sets 401 to 402 running on them. One physical machine includes a management unit 106; the other physical machines include communication speed measuring units 107 to 110 and processing capacity measuring units 111 to 114.

The management unit 106 instructs the other physical machines to measure communication speed and processing capacity. It also calculates the optimal arrangement of the program sets and instructs the other physical machines to place the program sets accordingly. When a physical machine becomes unavailable, it instructs the program sets to switch between active and standby.

The communication speed measuring units 107 to 110 measure the network communication speed between physical machines. They also measure the network traffic between the programs of each individual program set.

The processing capacity measuring units 111 to 114 measure the processing capacity of a physical machine, such as that of its processor and I/O. They also measure the processing load of each individual program.

Each of the program sets 401 to 402 consists of an active program and a standby program, which run on separate physical machines. If the physical machine running the active program becomes unavailable, the standby program can be switched to active so that processing continues. The management unit 106 can also switch active and standby by issuing a switching instruction to a program set even when no physical machine has become unavailable.

The number of physical machines, communication speed measuring units, and processing capacity measuring units need not be four and can be extended to any number. Likewise, the number of program sets need not be two and can be extended to any number.

[Operation of this embodiment]
With reference to FIGS. 6 to 10, the operation of measuring in advance the processing capacity of the physical machines and the network communication speed between them is described.

The management unit 106 selects one physical machine and issues a measurement instruction to its processing capacity measuring unit (step 201). The processing capacity measuring unit measures the processing capacity of the physical machine (step 202) and reports the result (step 203), which the management unit 106 stores (step 204). The management unit 106 repeats the measurement for all physical machines (step 205) and records their processing capacities in a table such as the one illustrated in FIG. 8.

The management unit 106 also selects one physical machine and issues a measurement instruction to its communication speed measuring unit (step 301). The communication speed measuring unit measures the communication speed to every other physical machine (steps 302 and 303) and reports the results (step 304), which the management unit 106 stores (step 305). The management unit 106 repeats the measurement for all physical machines (step 306) and records the communication speeds between physical machines in a table such as the one illustrated in FIG. 10.
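The two measurement loops (steps 201 to 205 and 301 to 306) amount to filling a per-machine capacity table and a per-pair speed table. In this sketch the measuring units are stood in for by hypothetical callables (`measure_capacity`, `measure_speed`), and the speed table is assumed to be keyed by direction (sender, receiver), since the network rating later treats upstream and downstream traffic separately.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

def build_capacity_table(machines: List[str],
                         measure_capacity: Callable[[str], float]) -> Dict[str, float]:
    """Steps 201-205: ask each machine's measuring unit for its processing capacity."""
    return {m: measure_capacity(m) for m in machines}

def build_speed_table(machines: List[str],
                      measure_speed: Callable[[str, str], float]
                      ) -> Dict[Tuple[str, str], float]:
    """Steps 301-306: each machine measures its link to every other machine.
    Keys are (sender, receiver); the directional keying is an assumption."""
    return {(src, dst): measure_speed(src, dst)
            for src, dst in permutations(machines, 2)}

machines = ["pm1", "pm2", "pm3"]
capacity = build_capacity_table(machines, lambda m: 100.0)    # stand-in measurement
speed = build_speed_table(machines, lambda src, dst: 1000.0)  # stand-in measurement
print(len(speed))  # 6 directed links for 3 machines
```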

Next, with reference to FIGS. 11 to 14, the operation of measuring in advance the processing load of each program and the network traffic within each program set is described.

The management unit 106 selects one program set and places its active program and standby program on separate physical machines 102 and 103 (step 501). The processing capacity measuring units of those machines measure the processing load of each program (step 502), and the communication speed measuring units measure the traffic from active to standby and from standby to active (step 503). The measurement is repeated for all program sets 401 to 402 (step 504), and the results are recorded in tables such as those illustrated in FIGS. 13 and 14.
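The per-set measurements of steps 501 to 504 could be kept in a record like the one below. The field names and the `run_trial` callable, which stands in for placing a set on two machines and reading back the measuring units, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

@dataclass(frozen=True)
class SetProfile:
    active_load: float        # processing load of the active program (step 502)
    standby_load: float       # processing load of the standby program (step 502)
    active_to_standby: float  # traffic from active to standby (step 503)
    standby_to_active: float  # traffic from standby to active (step 503)

def profile_sets(set_ids: Iterable[str],
                 run_trial: Callable[[str], Tuple[float, float, float, float]]
                 ) -> Dict[str, SetProfile]:
    """Steps 501-504: run one trial placement per program set and record the results."""
    return {sid: SetProfile(*run_trial(sid)) for sid in set_ids}

# Stand-in trial with illustrative numbers (not measured data).
profiles = profile_sets(["401", "402"], lambda sid: (30.0, 5.0, 10.0, 1.0))
print(profiles["401"])
```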

Next, with reference to FIGS. 15 and 16, the operation of placing a plurality of program sets on a plurality of physical machines is described.

The management unit 106 selects one pattern for placing the program sets on the physical machines (step 601) and calculates the processing-load-balancing rating for that pattern (step 602). Specifically, it first obtains the usage rate of each physical machine from the following equation:
(Usage rate of a physical machine) = (Total processing load of the programs placed on it) / (Processing capacity of the physical machine)
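The usage-rate equation can be written directly. The machine names and figures in the example are hypothetical; they show how dividing by capacity puts machines of different performance on the same scale.

```python
from typing import Dict, List

def machine_utilization(loads_on: Dict[str, List[float]],
                        capacity: Dict[str, float]) -> Dict[str, float]:
    """(usage rate) = (total processing load of programs placed on the machine)
    / (processing capacity of the machine), for every machine."""
    return {m: sum(loads_on.get(m, [])) / cap for m, cap in capacity.items()}

# Hypothetical figures: a machine with twice the capacity carries twice the load
# at the same usage rate.
capacity = {"pm1": 100.0, "pm2": 200.0}
loads_on = {"pm1": [30.0, 20.0], "pm2": [50.0, 10.0, 40.0]}
usage = machine_utilization(loads_on, capacity)
print(usage)  # {'pm1': 0.5, 'pm2': 0.5}
```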

Next, from these usage rates, it obtains the variance of the processing usage rate across all physical machines:
(Variance of processing usage rate) = Variance of the (usage rates of the physical machines)

Then, from this variance, it obtains the processing-load-balancing rating:
(Processing-load-balancing rating) = 1 / (Variance of processing usage rate)
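The two remaining equations of step 602 reduce to a variance and a reciprocal. This sketch assumes population variance (the text does not say which variance is meant) and, also as an assumption, floors the variance at a tiny epsilon so that a perfectly even spread does not cause a division by zero.

```python
from statistics import pvariance
from typing import Dict

EPS = 1e-12  # assumed floor that avoids dividing by a zero variance

def processing_rating(usage: Dict[str, float]) -> float:
    """(processing-load-balancing rating) = 1 / (variance of the machines' usage rates).
    Population variance is an assumption of this sketch."""
    variance = pvariance(list(usage.values()))
    return 1.0 / max(variance, EPS)

even = processing_rating({"pm1": 0.5, "pm2": 0.5, "pm3": 0.5})
skewed = processing_rating({"pm1": 0.9, "pm2": 0.3, "pm3": 0.3})
print(even > skewed)  # True: the more even spread earns the higher rating
```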

Next, the management unit 106 calculates the network-load-balancing rating (step 603). Specifically, it first obtains the network usage rate for each direction (upstream and downstream) of communication between physical machines:
(Network usage rate) = (Total program traffic on the link) / (Network communication speed of the link)

Next, from these network usage rates, it obtains their variance:
(Variance of network usage rate) = Variance of the (network usage rates)

Then, from this variance, it obtains the network-load-balancing rating:
(Network-load-balancing rating) = 1 / (Variance of network usage rate)
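Step 603 can be sketched the same way. Here a placement is assumed to be a list of (active machine, standby machine) pairs, one per program set, with each set's measured traffic given as (active-to-standby, standby-to-active), and every directed link between two distinct machines is counted, including idle ones. These representations, population variance, and the epsilon floor are assumptions of the sketch.

```python
from itertools import permutations
from statistics import pvariance
from typing import Dict, List, Tuple

EPS = 1e-12  # assumed floor that avoids dividing by a zero variance
Link = Tuple[str, str]  # (sending machine, receiving machine)

def network_rating(machines: List[str],
                   placement: List[Tuple[str, str]],    # per set: (active, standby)
                   traffic: List[Tuple[float, float]],  # per set: (act->sby, sby->act)
                   speed: Dict[Link, float]) -> float:
    # Total program traffic carried by each direction of each machine pair.
    load: Dict[Link, float] = {link: 0.0 for link in permutations(machines, 2)}
    for (act, sby), (up, down) in zip(placement, traffic):
        load[(act, sby)] += up
        load[(sby, act)] += down
    # (network usage rate) = (total traffic on the link) / (speed of the link)
    usage = [load[link] / speed[link] for link in load]
    return 1.0 / max(pvariance(usage), EPS)  # population variance (assumed)

machines = ["pm1", "pm2", "pm3"]
speed = {link: 100.0 for link in permutations(machines, 2)}
traffic = [(30.0, 0.0), (30.0, 0.0)]
same_link = network_rating(machines, [("pm1", "pm2"), ("pm1", "pm2")], traffic, speed)
spread = network_rating(machines, [("pm1", "pm2"), ("pm2", "pm3")], traffic, speed)
print(spread > same_link)  # True: spreading the traffic earns a higher rating
```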

The management unit 106 then combines the two ratings with weighting factors to calculate the overall rating (step 604):
(Overall rating) = (Processing-load-balancing rating) × (Machine-processing weighting factor) + (Network-load-balancing rating) × (Network weighting factor)

The machine-processing weighting factor and the network weighting factor may be fixed in advance as constants, or may be variables that the user of the apparatus can specify. FIG. 16 shows an example of the weighting factors.

The management unit 106 repeats the rating calculation for all arrangement patterns (step 605). With m physical machines and n program sets, each program set can place its active program on any of the m machines and its standby program on any of the remaining m − 1, so the number of arrangement patterns is (m × (m − 1))ⁿ. The management unit 106 selects the arrangement pattern with the highest overall rating (step 606) and places the program sets on the physical machines accordingly.
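Putting steps 601 to 606 together gives an exhaustive search over all (m × (m − 1))ⁿ patterns. The sketch below is a minimal illustration under the same assumptions as the per-step formulas (population variance, an epsilon floor on the variance, every directed link counted); the `ProgramSet` record and the default weights of 1.0 are hypothetical. Because the pattern count grows exponentially in n, this brute force is only practical for small systems.

```python
from itertools import permutations, product
from statistics import pvariance
from typing import Dict, List, NamedTuple, Tuple

EPS = 1e-12  # assumed floor that avoids dividing by a zero variance

class ProgramSet(NamedTuple):
    active_load: float
    standby_load: float
    up: float    # traffic from active to standby
    down: float  # traffic from standby to active

def rating(values: List[float]) -> float:
    return 1.0 / max(pvariance(values), EPS)  # population variance (assumed)

def overall_rating(placement, sets, capacity, speed, w_proc, w_net) -> float:
    cpu = {m: 0.0 for m in capacity}
    net = {link: 0.0 for link in permutations(capacity, 2)}
    for (act, sby), ps in zip(placement, sets):
        cpu[act] += ps.active_load
        cpu[sby] += ps.standby_load
        net[(act, sby)] += ps.up
        net[(sby, act)] += ps.down
    r_proc = rating([cpu[m] / capacity[m] for m in capacity])  # step 602
    r_net = rating([net[l] / speed[l] for l in net])           # step 603
    return w_proc * r_proc + w_net * r_net                     # step 604

def best_placement(sets: List[ProgramSet], capacity: Dict[str, float],
                   speed: Dict[Tuple[str, str], float],
                   w_proc: float = 1.0, w_net: float = 1.0):
    # Each set picks an ordered (active, standby) pair of distinct machines:
    # m * (m - 1) choices per set, (m * (m - 1)) ** n patterns in total.
    pairs = list(permutations(capacity, 2))
    patterns = product(pairs, repeat=len(sets))                # steps 601 and 605
    return max(patterns,                                       # step 606
               key=lambda p: overall_rating(p, sets, capacity, speed, w_proc, w_net))

capacity = {"pm1": 100.0, "pm2": 100.0}
speed = {link: 100.0 for link in permutations(capacity, 2)}
sets = [ProgramSet(10.0, 2.0, 5.0, 1.0)] * 2
print(best_placement(sets, capacity, speed))  # (('pm1', 'pm2'), ('pm2', 'pm1'))
```

With two identical sets, the winning pattern puts one active on each machine, which balances both processing load and the traffic in each direction.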

Next, with reference to FIG. 17, the operation of switching active and standby for a plurality of program sets when a physical machine becomes unavailable is described.

When a physical machine becomes unavailable, failover is performed for every program set whose active program was running on that machine, and the standby program running on another physical machine becomes the active program. This skews the processing load and the network load. The management unit 106 corrects the imbalance as follows.

The management unit 106 selects one pattern for switching active and standby across the program sets (step 701). For that pattern, it calculates the processing-load-balancing rating (step 702), the network-load-balancing rating (step 703), and the overall rating (step 704), in the same way as in steps 602 to 604. The rating calculation is repeated for all switching patterns (step 705). With n switchable program sets, each either switched or left as is, there are 2ⁿ switching patterns. The management unit 106 selects the switching pattern with the highest overall rating (step 706) and switches the program sets accordingly.
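A minimal sketch of steps 701 to 706 follows, reusing the rating formulas of steps 602 to 604 under the same assumptions as before (population variance, an epsilon floor, all directed links counted). Because the text does not spell it out, it additionally assumes that a set which lost either program to the failed machine keeps a single program running as active, with no checkpoint traffic, and cannot be switched; only sets with both programs on surviving machines are candidates. The `ProgramSet` record and argument names are hypothetical.

```python
from itertools import permutations, product
from statistics import pvariance
from typing import Dict, FrozenSet, List, NamedTuple, Tuple

EPS = 1e-12  # assumed floor that avoids dividing by a zero variance

class ProgramSet(NamedTuple):
    active_load: float
    standby_load: float
    up: float    # traffic from active to standby
    down: float  # traffic from standby to active

def rating(values: List[float]) -> float:
    # No values (a single surviving machine has no links): treat as perfectly even.
    return 1.0 / max(pvariance(values), EPS) if values else 1.0 / EPS

def best_switching(placement: List[Tuple[str, str]], sets: List[ProgramSet],
                   capacity: Dict[str, float], speed: Dict[Tuple[str, str], float],
                   failed: str, w_proc: float = 1.0, w_net: float = 1.0) -> FrozenSet[int]:
    """Return the indices of the program sets whose active/standby should be switched."""
    alive = [m for m in capacity if m != failed]
    switchable = [i for i, pair in enumerate(placement) if failed not in pair]

    def score(flips: FrozenSet[int]) -> float:
        cpu = {m: 0.0 for m in alive}
        net = {link: 0.0 for link in permutations(alive, 2)}
        for i, ((act, sby), ps) in enumerate(zip(placement, sets)):
            if failed in (act, sby):
                survivor = sby if act == failed else act  # runs as active (assumed)
                cpu[survivor] += ps.active_load
                continue
            if i in flips:
                act, sby = sby, act                       # deliberate switch
            cpu[act] += ps.active_load
            cpu[sby] += ps.standby_load
            net[(act, sby)] += ps.up
            net[(sby, act)] += ps.down
        r_proc = rating([cpu[m] / capacity[m] for m in alive])  # step 702
        r_net = rating([net[l] / speed[l] for l in net])        # step 703
        return w_proc * r_proc + w_net * r_net                  # step 704

    # 2**k patterns: each switchable set is either switched or left as is (steps 701, 705).
    patterns = [frozenset(i for i, flip in zip(switchable, bits) if flip)
                for bits in product((False, True), repeat=len(switchable))]
    return max(patterns, key=score)                             # step 706

# Illustrative numbers only: set 0 fails over from pm1 to pm2, leaving pm2 overloaded.
capacity = {"pm1": 100.0, "pm2": 100.0, "pm3": 100.0}
speed = {link: 100.0 for link in permutations(capacity, 2)}
sets = [ProgramSet(30.0, 5.0, 10.0, 10.0)] * 3
placement = [("pm1", "pm2"), ("pm2", "pm3"), ("pm2", "pm3")]
print(best_switching(placement, sets, capacity, speed, failed="pm1"))  # frozenset({1, 2})
```

As in FIG. 5, the best pattern moves two active programs off the overloaded machine.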

[Effect of this embodiment]
The first effect is that, when a plurality of program sets are placed on a plurality of physical machines, both processing load and network communication load are distributed, raising overall execution efficiency.

This is because the processing capacity of the physical machines, the network communication speed between them, the processing load of the program sets, and the network traffic within the program sets are measured in advance; then, taking into account that the processing load and network load of a program set are asymmetric, a load-balancing rating is calculated for every arrangement pattern and the pattern with the highest rating is selected.

The second effect is that load is distributed, and overall execution efficiency raised, even when the physical machines hosting the program sets differ in performance. The reason is the same as for the first effect.

The third effect is that a drop in overall execution efficiency can be prevented even when, with program sets placed on several physical machines, one physical machine becomes unavailable and a failover occurs.

This is because a load-balancing rating is calculated for every switching pattern, the pattern with the highest rating is selected, and active and standby are deliberately switched for one or more program sets that were not failed over, redistributing the load.

[Third Embodiment]
This embodiment differs from the second embodiment in that, when the management unit 106 places the program sets on the physical machines, it simulates in advance the switching patterns that would follow a physical machine becoming unavailable, and selects the arrangement pattern taking the post-switching load distribution into account.

FIG. 18 is a flowchart showing an example of the procedure for selecting an arrangement pattern in this embodiment. The operation of this embodiment is described below with reference to FIG. 18.

The management unit 106 selects one pattern for placing the program sets on the physical machines (step 801). It calculates the processing-load-balancing rating (step 802), the network-load-balancing rating (step 803), and the pre-switching rating (step 804), in the same way as in steps 602 to 604.

Next, it selects one physical machine (step 805), assumes that this machine has become unavailable (step 806), and simulates the subsequent switching. The management unit 106 selects one pattern for switching active and standby across the program sets (step 807), and calculates the processing-load-balancing rating (step 808), the network-load-balancing rating (step 809), and the post-switching rating (step 810). The rating calculation is repeated for all switching patterns (step 811); this simulation follows steps 701 to 705. The maximum post-switching rating is then determined (step 812). The simulation is repeated with each physical machine in turn assumed to be unavailable (step 813), and the average of the maximum post-switching ratings is calculated (step 814).

Next, the overall rating is calculated from the following equation (step 815):
(Overall rating) = (Pre-switching rating) × (Pre-switching weighting factor) + (Average of the maximum post-switching ratings) × (Post-switching weighting factor)

The pre-switching and post-switching weighting factors may be fixed in advance as constants, or may be variables that the user of the apparatus can specify. FIG. 19 shows an example of the weighting factors.

The rating calculation is repeated for all arrangement patterns (step 816). The arrangement pattern with the highest overall rating is then selected (step 817), and the program sets are placed on the physical machines accordingly.
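Steps 801 to 817 can be sketched as a search that scores each arrangement by both its current balance and its best achievable balance after each possible single-machine failure. To keep the sketch short it uses only the processing-load rating (the network rating would be added as in steps 602 to 604), and it carries over the assumptions of the earlier sketches: population variance, an epsilon floor, and that only sets with both programs on surviving machines can be switched. Set loads are given as hypothetical (active_load, standby_load) pairs.

```python
from itertools import permutations, product
from statistics import mean, pvariance
from typing import Dict, FrozenSet, List, Optional, Tuple

EPS = 1e-12  # assumed floor that avoids dividing by a zero variance
Pair = Tuple[str, str]  # (active machine, standby machine) of one program set

def proc_rating(programs: List[Tuple[str, float]], capacity: Dict[str, float]) -> float:
    """1 / population variance of usage rates; programs are (machine, load) pairs."""
    cpu = dict.fromkeys(capacity, 0.0)
    for machine, load in programs:
        cpu[machine] += load
    return 1.0 / max(pvariance([cpu[m] / capacity[m] for m in capacity]), EPS)

def running_programs(placement: List[Pair], sets: List[Tuple[float, float]],
                     failed: Optional[str] = None,
                     flips: FrozenSet[int] = frozenset()) -> List[Tuple[str, float]]:
    out = []
    for i, ((act, sby), (load_a, load_s)) in enumerate(zip(placement, sets)):
        if failed in (act, sby):
            out.append((sby if act == failed else act, load_a))  # survivor runs as active
            continue
        if i in flips:
            act, sby = sby, act
        out += [(act, load_a), (sby, load_s)]
    return out

def total_rating(placement, sets, capacity, w_before, w_after) -> float:
    before = proc_rating(running_programs(placement, sets), capacity)  # steps 802-804
    best_after = []
    for failed in capacity:                                           # steps 805-806, 813
        alive = {m: c for m, c in capacity.items() if m != failed}
        switchable = [i for i, pair in enumerate(placement) if failed not in pair]
        candidates = [frozenset(i for i, f in zip(switchable, bits) if f)
                      for bits in product((False, True), repeat=len(switchable))]
        best_after.append(max(                                        # steps 807-812
            proc_rating(running_programs(placement, sets, failed, fl), alive)
            for fl in candidates))
    return w_before * before + w_after * mean(best_after)             # steps 814-815

def best_placement(sets, capacity, w_before=1.0, w_after=1.0):
    pairs = list(permutations(capacity, 2))
    return max(product(pairs, repeat=len(sets)),                      # steps 801, 816
               key=lambda p: total_rating(p, sets, capacity, w_before, w_after))  # 817

best = best_placement([(30.0, 5.0)] * 3, {"pm1": 100.0, "pm2": 100.0, "pm3": 100.0})
print(best)
```

For three identical sets on three equal machines, the chosen arrangement gives every machine exactly one active and one standby program.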

[Effect of this embodiment]
This embodiment provides the same effects as the second embodiment. In addition, when placing program sets on physical machines, it can select in advance an arrangement under which processing and network load remain easy to balance after one physical machine becomes unavailable, a failover occurs, and active and standby are switched across the program sets.

This is because the switching patterns that would follow a physical machine becoming unavailable are simulated in advance, an overall rating is calculated from the pre-switching and post-switching ratings, and the arrangement pattern with the highest rating is selected.

DESCRIPTION OF SYMBOLS
1100 … First computer
1101 … Communication I/F unit
1102 … Storage unit
1103 … Processor
1104 … Program
1105 … Arrangement pattern determining unit
1106 … Program arrangement unit
1107 … Switching pattern determining unit
1108 … Switching unit
1109 … Program set
1200 … Second computer
1300 … Network

Claims

A program arrangement method executed by a computer system in which a first computer and a plurality of second computers are communicably connected to one another through a network, the method comprising:
determining, by the first computer, an arrangement pattern for placing a plurality of program sets on the plurality of second computers, each program set being a pair of two programs one of which operates as active and the other as standby, wherein the two programs of the same program set are not placed on the same second computer and the arrangement pattern is determined in consideration of at least one of the processing load of each of the plurality of second computers and the communication load between the plurality of second computers;
placing, by the first computer, the plurality of program sets on the plurality of second computers in accordance with the determined arrangement pattern;
determining, by the first computer, when a failure occurs in any of the second computers and a failover is performed, a switching pattern for switching active and standby between the two programs of each remaining program set that has not been failed over, in consideration of at least one of the processing load of each of the plurality of second computers after the failover and the communication load between the plurality of second computers; and
switching, by the first computer in accordance with the determined switching pattern, the program operating as active among the two programs of a program set that has not been failed over to standby, and the program operating as standby to active.

The program arrangement method according to claim 1, wherein, in determining the arrangement pattern,
the first computer calculates, for each of a plurality of candidates for the arrangement pattern, a first index value representing how evenly at least one of the processing load and the communication load is distributed after arrangement, and
the first computer determines the arrangement pattern from among the plurality of candidates on the basis of the calculated first index value.

The program arrangement method according to claim 1 or 2, wherein, in determining the switching pattern,
the first computer calculates, for each of a plurality of candidates for the switching pattern, a second index value representing how evenly at least one of the processing load and the communication load is distributed after switching, and
the first computer determines the switching pattern from among the plurality of candidates on the basis of the calculated second index value.

The program arrangement method according to claim 1, wherein, in determining the arrangement pattern,
the first computer calculates, for each of a plurality of candidates for the arrangement pattern, a first index value representing how evenly at least one of the processing load and the communication load is distributed after arrangement, and a second index value representing how evenly at least one of the processing load and the communication load is expected to be distributed when, after a failure in any of the second computers causes a failover, active and standby are switched between the two programs of each remaining program set that has not been failed over, and
the first computer determines the arrangement pattern from among the plurality of candidates for the arrangement pattern on the basis of the first index value and the second index value.

The program arrangement method according to claim 4, wherein, in determining the arrangement pattern,
the first computer determines the arrangement pattern from among the plurality of candidates for the arrangement pattern on the basis of a weighted sum of the first index value and the second index value.

A computer serving as the first computer in a computer system in which a first computer and a plurality of second computers are communicably connected to each other through a network, the computer comprising:
arrangement pattern determining means for determining an arrangement pattern for arranging a plurality of program sets on the plurality of second computers, each program set being a set of two programs of which one operates as active and the other as standby, wherein in the arrangement pattern the two programs of the same program set are not arranged on the same second computer, and the arrangement pattern takes into account at least one of the processing load of each of the plurality of second computers and the communication load between the plurality of second computers;
program placement means for placing the plurality of program sets on the plurality of second computers according to the determined arrangement pattern;
switching pattern determining means for determining, when a failure occurs in any of the second computers and a failover is performed, a switching pattern for switching between active and standby of the two programs of each remaining program set that has not been failed over, in consideration of at least one of the processing load of each of the plurality of second computers after the failover and the communication load between the plurality of second computers; and
switching means for switching, according to the determined switching pattern, the program operating as active of the two programs of each program set that has not been failed over to standby, and the program operating as standby to active.
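The arrangement pattern determining means can be illustrated with a brute-force search. This hypothetical sketch enforces the constraint that the two programs of a set never share a second computer by drawing each (active, standby) pair from `itertools.permutations`, and scores candidates by the variance of processing load only. The search space grows as (h(h-1))^n for h computers and n program sets, so it only suits small examples.

```python
from itertools import permutations, product
from statistics import pvariance
from typing import Iterator, List, Tuple

Placement = Tuple[Tuple[str, str], ...]  # (active_host, standby_host) per program set


def candidate_arrangements(num_sets: int, hosts: List[str]) -> Iterator[Placement]:
    """Return an iterator over arrangements in which each set's two programs
    sit on different hosts (permutations never pair a host with itself)."""
    pairs = list(permutations(hosts, 2))
    return product(pairs, repeat=num_sets)


def load_variance(arrangement: Placement, loads: List[Tuple[float, float]],
                  hosts: List[str]) -> float:
    """Population variance of per-host processing load for one arrangement."""
    totals = {h: 0.0 for h in hosts}
    for (act, sby), (a_load, s_load) in zip(arrangement, loads):
        totals[act] += a_load
        totals[sby] += s_load
    return pvariance(list(totals.values()))


def determine_arrangement(loads: List[Tuple[float, float]], hosts: List[str]) -> Placement:
    """Brute-force search; only practical for a handful of sets and hosts."""
    return min(candidate_arrangements(len(loads), hosts),
               key=lambda a: load_variance(a, loads, hosts))


hosts = ["B1", "B2"]
loads = [(10.0, 2.0), (10.0, 2.0)]  # (active, standby) load of each set
print(determine_arrangement(loads, hosts))  # (('B1', 'B2'), ('B2', 'B1'))
```

The chosen arrangement puts one active program on each computer, which is the only way to balance the two sets here.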

A program for causing a first computer in a computer system, in which the first computer and a plurality of second computers are communicably connected to each other through a network, to execute:
determining an arrangement pattern for arranging a plurality of program sets on the plurality of second computers, each program set being a set of two programs of which one operates as active and the other as standby, wherein in the arrangement pattern the two programs of the same program set are not arranged on the same second computer, and the arrangement pattern takes into account at least one of the processing load of each of the plurality of second computers and the communication load between the plurality of second computers;
arranging the plurality of program sets on the plurality of second computers according to the determined arrangement pattern;
determining, when a failure occurs in any of the second computers and a failover is performed, a switching pattern for switching between active and standby of the two programs of each remaining program set that has not been failed over, in consideration of at least one of the processing load of each of the plurality of second computers after the failover and the communication load between the plurality of second computers; and
switching, according to the determined switching pattern, the program operating as active of the two programs of each program set that has not been failed over to standby, and the program operating as standby to active.
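The final failover and switching steps can be modelled as role changes on each program set. In this sketch the `ProgramSet` class, its fields, and the helpers `fail_over` and `apply_switching` are hypothetical; the switching pattern is assumed to be a list of program-set indices whose roles should be exchanged, as a switching pattern determining step might produce.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ProgramSet:
    active: str               # host running the active program
    standby: Optional[str]    # host running the standby program, None if lost


def fail_over(sets: List[ProgramSet], failed: str) -> Set[int]:
    """Promote standbys whose active ran on `failed`; return their indices."""
    failed_over = set()
    for i, ps in enumerate(sets):
        if ps.active == failed and ps.standby is not None:
            ps.active, ps.standby = ps.standby, None
            failed_over.add(i)
        elif ps.standby == failed:
            ps.standby = None   # the active keeps running without a standby
    return failed_over


def apply_switching(sets: List[ProgramSet], failed_over: Set[int],
                    pattern: Iterable[int]) -> None:
    """Swap active and standby for each set named in the switching pattern,
    skipping sets that were failed over or have no standby left."""
    for i in pattern:
        ps = sets[i]
        if i in failed_over or ps.standby is None:
            continue
        ps.active, ps.standby = ps.standby, ps.active


sets = [ProgramSet("B1", "B2"), ProgramSet("B2", "B3")]
done = fail_over(sets, "B1")          # set 0 now runs its active program on B2
apply_switching(sets, done, [1])      # set 1: active moves from B2 to B3
print(sets)
```

After these two steps B2 hosts set 0's active program and set 1's standby, while B3 hosts set 1's active program, so the active load is no longer concentrated on B2.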