JP6582785B2

JP6582785B2 - Virtual machine management system, virtual machine management method and program

Info

Publication number: JP6582785B2
Application number: JP2015183660A
Authority: JP
Inventors: 宮島 弘明
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2015-09-17
Filing date: 2015-09-17
Publication date: 2019-10-02
Anticipated expiration: 2035-09-17
Also published as: JP2017058997A

Description

The present invention relates to a virtual machine management system, a virtual machine management method, and a program.

Various techniques have been developed to increase the availability and reliability of systems. For virtual machines, a technique called the fault-tolerant (hereinafter sometimes referred to as "FT") function is increasingly being used, one purpose of which is to improve availability. With the fault-tolerant function, for example, when a failure occurs in a host that executes a virtual machine, another host prepared in advance takes over the processing of that virtual machine, so that the processing continues. An example of the fault-tolerant function is described in Non-Patent Document 1.

Patent Document 1 describes a low-power fault-tolerant computer system and the like that can switch between systems simply and quickly without using special hardware.

Patent Document 2 describes a method for determining the migration destination of a virtual machine. With the technique described in Patent Document 2, a user can define not only a migration policy for optimizing the entire system but also an application-dependent migration policy. Taking both policies into account, the technique controls the movement of a virtual machine on which an application is running to another server while that virtual machine remains in operation.

Patent Document 1: 特開２０１２−２２１３２１号公報 (JP 2012-213321 A)
Patent Document 2: 特開２００９−１１６８５２号公報 (JP 2009-116852 A)

Non-Patent Document 1: Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield. 2008. Remus: high availability via asynchronous virtual machine replication. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI'08), Jon Crowcroft and Mike Dahlin (Eds.). USENIX Association, Berkeley, CA, USA, 161-174.

In a virtual machine provided with a fault-tolerant function, heavy processing may be performed in connection with that function. The fault-tolerant function may therefore affect the operation of the virtual machine, for example the external provision of a service executed by the virtual machine. However, with the techniques described in the patent documents and the like, it is difficult to reduce the effect of the fault-tolerant function on the operation of the virtual machine.

The present invention has been made to solve the above problem, and its main object is to provide a virtual machine management system and the like in which the fault-tolerant function has little effect on the operation of the virtual machine.

A virtual machine management system according to one aspect of the present invention includes: a service management table that holds, for each of at least one service subject to the fault-tolerant function of a virtual machine, a condition regarding the state of the fault-tolerant function; and control means for controlling the state of the fault-tolerant function of the virtual machine based on the condition.

A virtual machine management method according to one aspect of the present invention determines, based on a condition under which the fault-tolerant function should be enabled for each of at least one service subject to the fault-tolerant function of a virtual machine, whether at least one of the services satisfies the condition, and performs control to change the state of the fault-tolerant function of the virtual machine based on the result of the determination.

A program according to one aspect of the present invention causes a computer to execute: a process of determining, based on a condition under which the fault-tolerant function should be enabled for each of at least one service subject to the fault-tolerant function of a virtual machine, whether at least one of the services satisfies the condition; and a process of performing control to change the state of the fault-tolerant function of the virtual machine based on the result of the determination.

According to the present invention, it is possible to provide a virtual machine management system and the like in which the fault-tolerant function has little effect on the operation of the virtual machine.

FIG. 1 is a diagram showing the configuration of a virtual machine and the like that is the premise of each embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the virtual machine management system in the first embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the virtual machine management system in the first embodiment of the present invention.
FIG. 4 is a flowchart showing a modified example of the operation of the virtual machine management system in the first embodiment of the present invention.
FIG. 5 is a diagram showing the configuration of the virtual machine management system in the second embodiment of the present invention.
FIG. 6 is a diagram showing a configuration example of an information processing apparatus that realizes the virtual machine management system and the like in each embodiment of the present invention.

Embodiments of the present invention will be described with reference to the accompanying drawings. In each embodiment, each component of each system represents a block in functional units. Each component of each system is realized by, for example, an arbitrary combination of an information processing apparatus 500 as shown in FIG. 6 and a program. As an example, the information processing apparatus 500 includes the following configuration.

- CPU (Central Processing Unit) 501
- ROM (Read Only Memory) 502
- RAM (Random Access Memory) 503
- A program 504 loaded into the RAM 503
- A storage device 505 that stores the program 504
- A drive device 507 that reads from and writes to a recording medium 506
- A communication interface 508 connected to a communication network 509
- An input/output interface 510 for inputting and outputting data
- A bus 511 connecting the components

Each component of each system in each embodiment is realized by the CPU 501 acquiring and executing the program 504 that implements these functions. The program 504 that implements the function of each component of each device is, for example, stored in advance in the storage device 505 or the RAM 503 and read by the CPU 501 as needed. The program 504 may instead be supplied to the CPU 501 via the communication network 509, or it may be stored in advance on the recording medium 506, in which case the drive device 507 reads the program and supplies it to the CPU 501. There are also various modifications to how each device is realized. For example, each system in each embodiment may be realized by an arbitrary combination of a program and various hardware, including circuits and the like different from the information processing apparatus 500.

Before describing the embodiments of the present invention, the fault-tolerant function of a virtual machine, which is the premise of each embodiment, and its operation will be described. FIG. 1 is a diagram showing the configuration of a virtual machine and the like that is the premise of each embodiment.

The fault-tolerant function is a function by which, when a failure occurs in the hardware or the like that executes a virtual machine, another virtual machine executed on other hardware or the like continues the processing that the virtual machine was performing. In each embodiment of the present invention, the following operation is assumed as one of the operations that realize the fault-tolerant function.

First, at timings called checkpoints, which occur for example at fixed intervals, the state of the running virtual machine is saved to a standby virtual machine executed on other hardware or the like. When a failure occurs in the hardware or the like that executes the running virtual machine, the standby virtual machine uses the saved information to take over and execute the processing that the running virtual machine was performing. In a general fault-tolerant function, execution of the virtual machine simply continues when the processing is taken over. Using such a fault-tolerant function makes it possible to increase the availability of the virtual machine.

Hereinafter, the state in which the state of the virtual machine is saved at checkpoints and another virtual machine can take over the processing when a failure occurs is referred to as the fault-tolerant function being enabled. The state in which the state of the virtual machine is not saved and the processing is not taken over by another virtual machine when a failure occurs is referred to as the fault-tolerant function being disabled.
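The enabled and disabled states defined above can be illustrated with a minimal Python sketch. The class names `ActiveVM` and `StandbyVM`, and the use of a plain dictionary as the virtual machine state, are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandbyVM:
    """Hypothetical standby virtual machine holding the last saved state."""
    saved_state: Optional[dict] = None

    def take_over(self) -> dict:
        # Resume from the most recently saved state. Anything the active
        # virtual machine did after that checkpoint is not included.
        if self.saved_state is None:
            raise RuntimeError("no checkpoint saved; takeover is not possible")
        return dict(self.saved_state)


@dataclass
class ActiveVM:
    """Hypothetical active virtual machine; its state is a plain dict here."""
    state: dict = field(default_factory=dict)
    ft_enabled: bool = False

    def checkpoint(self, standby: StandbyVM) -> bool:
        # The state is saved only while the fault-tolerant function is enabled.
        if not self.ft_enabled:
            return False
        standby.saved_state = dict(self.state)
        return True
```

With the function enabled, a takeover resumes from the state at the last checkpoint; with it disabled, no state is saved and `take_over` raises an error in this sketch.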

Specifically, an example of the operation that realizes the fault-tolerant function is as follows. In the example shown in FIG. 1, hosts 10-1 and 10-2 are connected via a communication network 50. The hosts 10-1 and 10-2 are, for example, the information processing apparatus 500 shown in FIG. 6 described above, but are not limited to this. The communication network 50 is a network of any configuration that enables data transfer between the hosts 10-1 and 10-2.

Virtual machine management systems 100-1 and 100-2 are executed on the hosts 10-1 and 10-2, respectively. The virtual machine management systems 100-1 and 100-2 are general systems that realize virtual machines on the hosts 10-1 and 10-2, respectively. The virtual machine management systems 100-1 and 100-2 are also called a VMM (Virtual Machine Management) or a hypervisor. The virtual machine management systems 100-1 and 100-2 have the configuration necessary to execute a general fault-tolerant function.

Each of the virtual machines 20 and 40 is a virtualized computer system. The virtual machine 20 is executed by the virtual machine management system 100-1, and the virtual machine 40 is executed by the virtual machine management system 100-2. The virtual machines 20 and 40 include guest physical memories 21 and 41, respectively. The guest physical memories 21 and 41 function as the physical memory of the virtual machines 20 and 40, respectively. Each of the guest physical memories 21 and 41 is realized using the physical memory or the like of the host 10-1 or 10-2.

In the example shown in FIG. 1, the virtual machine 20 is running, and a guest OS (Operating System) 30 operates on the virtual machine 20. The virtual machine 40 is on standby and can take over the operation of the guest OS 30. That is, while the guest OS 30 is operating normally on the virtual machine 20, the virtual machine 40 performs no particular operation. When a problem occurs in at least one of the host 10-1 and the virtual machine management system 100-1, the virtual machine 40 starts operating. In this case, the operation of the guest OS 30 is taken over by the virtual machine 40 through the fault-tolerant function of the virtual machine management system 100-2, and the guest OS 30 continues to operate on the virtual machine 40.

To enable the fault-tolerant function, the virtual machine management systems 100-1 and 100-2 save the state of the virtual machine 20. The virtual machine management system 100-1 saves the state of the virtual machine 20 to the virtual machine 40, via the virtual machine management system 100-2, at the timings called checkpoints described above. Checkpoints are set, for example, at fixed intervals. The checkpoint interval is determined as appropriate according to, for example, the type of service executed on the virtual machine 20. When saving the state of the virtual machine 20, the virtual machine management system 100-1 temporarily stops the operation of the virtual machine 20 and then saves its state.

As the state of the virtual machine 20, the virtual machine management system 100-1 saves, for example, the contents of the guest physical memory 21 of the virtual machine 20 to the guest physical memory 41 of the virtual machine 40. The state of the virtual CPU of the virtual machine 20, the contents of the virtual storage device of the virtual machine 20, and the like may also be saved as the state of the virtual machine 20. When the contents of the guest physical memory 21 are saved as the state of the virtual machine 20, the virtual machine management system 100-1 saves the contents of the guest physical memory 21 that were updated after the state of the virtual machine 20 was saved at the most recent checkpoint. That is, the regions of the memory 21 that were written to after the state of the virtual machine 20 was saved at the most recent checkpoint are the regions to be saved at the next checkpoint.

In general, in the virtual machine 20 or 40, the contents of the guest physical memory 21 or 41 are managed in fixed units called pages. Accordingly, whether a write has occurred in the guest physical memory 21 is determined page by page, and the virtual machine management system 100-1 saves the contents of the guest physical memory 21 page by page. A page that has been written to and is to be saved at the next checkpoint is called a dirty page.
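Page-granular tracking of writes can be sketched as follows. This is an illustrative model only: the class `GuestPhysicalMemory`, the 4096-byte page size, and the method names are assumptions, and a real hypervisor would track dirty pages with hardware or kernel support rather than in Python.

```python
from typing import Dict, Set

PAGE_SIZE = 4096  # assumed page size in bytes; the source does not specify one


class GuestPhysicalMemory:
    """Hypothetical model of guest physical memory with dirty-page tracking."""

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.dirty: Set[int] = set()

    def write(self, addr: int, payload: bytes) -> None:
        if not payload:
            return
        end = addr + len(payload)
        if addr < 0 or end > len(self.data):
            raise ValueError("write outside guest physical memory")
        self.data[addr:end] = payload
        # Mark every page touched by the write as dirty.
        first_page = addr // PAGE_SIZE
        last_page = (end - 1) // PAGE_SIZE
        self.dirty.update(range(first_page, last_page + 1))

    def collect_dirty_pages(self) -> Dict[int, bytes]:
        # Return the pages written since the previous checkpoint and
        # reset the tracking for the next checkpoint interval.
        pages = {
            n: bytes(self.data[n * PAGE_SIZE:(n + 1) * PAGE_SIZE])
            for n in sorted(self.dirty)
        }
        self.dirty.clear()
        return pages
```

A write that straddles a page boundary marks both pages dirty, which is why the determination is made per page rather than per byte.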

In the example shown in FIG. 1, the rectangles inside the region labeled as the guest physical memory 21 each represent a page in the guest physical memory 21. Among these rectangles, each one filled in black represents a dirty page. The rectangles inside the region labeled as the transfer buffer 121 represent the dirty pages to be saved at a checkpoint.

As described above, when the fault-tolerant function is enabled and the state of the virtual machine 20 is saved, the virtual machine 20 temporarily stops operating. The operation of the virtual machine 20 resumes after the saving of its state to the virtual machine 40 is complete. That is, the operation of the virtual machine 20 is suspended until its state has been saved to the virtual machine 40. When the virtual machine 40 saves the state of the virtual machine 20 to the guest physical memory 41, data on the state of the virtual machine 20 must be transferred from the host 10-1 to the host 10-2. In contrast, copying data within the memory of the host 10-1 is faster than transferring data from the host 10-1 to the host 10-2.

Therefore, to shorten the period during which the virtual machine 20 is temporarily stopped, the virtual machine management system 100-1 first saves the state of the virtual machine 20 to the transfer buffer 121. As described later, the transfer buffer 121 is, for example, memory provided in the virtual machine management system 100-1. Once the state of the virtual machine 20 has been saved to the transfer buffer 121, the virtual machine 20 resumes operation. Along with this resumption, the virtual machine management system 100-1 saves the state of the virtual machine 20 held in the transfer buffer 121 to the guest physical memory 41 of the virtual machine 40. As a result, the period during which the virtual machine 20 is temporarily stopped is shorter than when the state of the virtual machine 20 is transferred directly to the guest physical memory 41 and saved there. When the same page repeatedly becomes a dirty page, the virtual machine management system 100-1 may transfer to the virtual machine 40, for that dirty page, the difference between its content at that time and its content at the most recent checkpoint.
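The two-stage save described above can be sketched as follows. The names `TransferBuffer`, `stage`, `flush`, and `checkpoint` are hypothetical, and the sketch runs sequentially for clarity, whereas the text describes the transfer to the standby side proceeding along with the resumed virtual machine.

```python
from typing import Callable, Dict, List


class TransferBuffer:
    """Hypothetical host-local staging area (transfer buffer 121 in the text)."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}

    def stage(self, dirty_pages: Dict[int, bytes]) -> None:
        # Local memory copy, which the text notes is faster than a
        # host-to-host transfer.
        self.pages = {n: bytes(p) for n, p in dirty_pages.items()}

    def flush(self, send: Callable[[int, bytes], None]) -> int:
        # Send every staged page to the standby side, then empty the buffer.
        count = 0
        for page_no, content in sorted(self.pages.items()):
            send(page_no, content)
            count += 1
        self.pages.clear()
        return count


def checkpoint(
    dirty_pages: Dict[int, bytes],
    buffer: TransferBuffer,
    send: Callable[[int, bytes], None],
    log: List[str],
) -> None:
    log.append("pause")             # the virtual machine stops only for the local copy
    buffer.stage(dirty_pages)
    log.append("resume")            # it runs again before any network transfer
    sent = buffer.flush(send)
    log.append(f"transferred {sent}")
```

The point of the ordering is that the pause covers only the local copy; the slower transfer to the other host happens after the virtual machine has resumed.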

In addition, when the state of the virtual machine 20 is saved as described above, communication from the virtual machine 20 to the outside is held until the saving of the state of the virtual machine 20 to the virtual machine 40 is complete. That is, outbound communication that arises after the state of the virtual machine 20 was saved at the most recent checkpoint is held, without being sent, until the state of the virtual machine 20 is saved at the next checkpoint. The held communication is actually carried out after the saving of the state of the virtual machine 20 to the virtual machine 40 is complete.

This behavior is based on the following reason. When a failure or the like occurs in the host 10 or the like and the processing of the virtual machine 20 is taken over by the virtual machine 40, the virtual machine 40 takes over the state of the virtual machine 20 saved at the most recent checkpoint. Consequently, if the virtual machine 20 communicates after its state was saved at the most recent checkpoint and the virtual machine 40 subsequently takes over, the communication may be performed twice, and an inconsistency may arise between the communication performed by the virtual machine 20 and that performed by the virtual machine 40. For this reason, communication is held as described above.
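Holding outbound communication until the checkpoint is saved can be sketched as a simple queue. The class `HoldBuffer` and its method names are hypothetical, and `transmit` stands in for whatever actually puts a packet on the network.

```python
from typing import Callable, List


class HoldBuffer:
    """Hypothetical queue for outbound packets awaiting the next checkpoint."""

    def __init__(self, transmit: Callable[[bytes], None]) -> None:
        self._transmit = transmit
        self._pending: List[bytes] = []

    def send(self, packet: bytes) -> None:
        # Outbound traffic from the virtual machine is queued, not transmitted.
        self._pending.append(packet)

    def on_checkpoint_saved(self) -> int:
        # Once the state has been saved on the standby side, release the
        # held packets in their original order.
        released = len(self._pending)
        for packet in self._pending:
            self._transmit(packet)
        self._pending.clear()
        return released
```

Because nothing leaves the host before the corresponding state is saved, a virtual machine that takes over from the last checkpoint cannot contradict output that the outside world has already seen.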

When the state of the virtual machine 20 is saved, the period during which the virtual machine 20 is temporarily stopped may become longer depending on the number of dirty pages. In addition, a program that is executed on the virtual machine 20 and provides some service may begin providing the service only after storing a large amount of data in the guest physical memory 21 of the virtual machine 20. Services executed by such programs include services that create a data cache, such as databases and Web services.

In this case, many dirty pages must be saved when the state of the virtual machine 20 is saved. The period during which the virtual machine 20 is temporarily stopped for saving its state therefore becomes longer. As a result, the start of service provision by the program executed on the virtual machine 20 may be delayed. In other words, the fault-tolerant function may cause adverse effects such as a delay in the start of external service provision using the virtual machine 20. The virtual machine management system in each embodiment of the present invention makes it possible to reduce such adverse effects.

(First embodiment)
Next, a first embodiment of the present invention will be described. FIG. 2 is a diagram showing the virtual machine management system in the first embodiment of the present invention. FIG. 3 is a flowchart showing an example of the operation of the virtual machine management system in the first embodiment of the present invention. FIG. 4 is a flowchart showing a modified example of the operation of the virtual machine management system in the first embodiment of the present invention.

As shown in FIG. 2, the virtual machine management system 100 in the first embodiment of the present invention includes at least a control unit 110, a service management table 111, and a checkpoint control unit 112. The virtual machine management system 100 in the first embodiment also includes a memory transfer control unit 120, a transfer buffer 121, a network control unit 130, a communication analysis unit 131, and a hold buffer 132.

The virtual machine management system 100 in this embodiment is used as, for example, the virtual machine management system 100-1 or 100-2 shown in FIG. 1. That is, the virtual machine management system 100 is executed on the host 10-1 or 10-2 or the like shown in FIG. 1. The virtual machine management system 100 manages the execution of virtual machines such as the virtual machine 20 or 40.

In the following, it is assumed that a virtual machine similar to the virtual machine 20 is executed in the virtual machine management system 100. A virtual machine similar to the virtual machine 40 may also be executed in the virtual machine management system 100. In the following description of the components of the virtual machine management system 100, "virtual machine" refers to a virtual machine executed in the virtual machine management system 100.

In the following description, a program that provides any type of service being executed as an application program on the guest OS of a virtual machine may be referred to as the service being executed on the virtual machine. Furthermore, the service becoming available to external servers, external users, and the like may be referred to as the service being provided to the outside.

Next, each component of the virtual machine management system 100 in the first embodiment of the present invention will be described.

The control unit 110 determines whether at least one of the programs that provide services satisfies a condition held in the service management table 111, and controls the state of the virtual machine accordingly. When at least one of the services executed on the virtual machine satisfies a condition held in the service management table 111 described later, the control unit 110 controls the state of the virtual machine so as to enable the fault-tolerant function of the virtual machine. The control unit 110 also performs the control needed to realize the fault-tolerant function, such as saving the state of the virtual machine.

The control unit 110 also controls the state of the virtual machine so as to disable the fault-tolerant function, for example when all of the above services stop or when the above services do not satisfy the conditions held in the service management table 111.

The conditions under which the fault-tolerant function should be enabled include any condition related to the operation of a program that provides a service. Details of the conditions will be described later.

The control unit 110 determines, by any procedure, whether at least one of the services satisfies a condition held in the service management table 111. For example, the control unit 110 determines whether at least one of the services satisfies the above condition based on the result of monitoring by another component of the virtual machine management system 100 that monitors the operation of the programs that provide the services.

For example, the control unit 110 may determine that at least one of the services satisfies the above condition based on a request, from another component that monitors the operation of the programs that provide the services, to the effect that the fault-tolerant function should be enabled. Alternatively, the other component that monitors the operation of the programs that provide the services may write the monitoring result to the service management table 111, and the control unit 110 may refer to that result to determine whether at least one of the services satisfies the above condition.
The control unit 110 may also directly monitor the operation of a program that provides a service executed on the virtual machine and determine whether the service satisfies the above condition.

別の一例として、制御部１１０は、後述する通信解析部１３１にて解析されるサービスを提供するプログラムの通信に関する動作の状況に基づいて、サービスが上述の条件を満たすか否かを判断してもよい。 As another example, the control unit 110 may determine whether or not the service satisfies the above-described condition based on the communication-related behavior of the program that provides the service, as analyzed by the communication analysis unit 131 described later.

サービスが上述の条件を満たすと判定した場合に、制御部１１０は、メモリ転送制御部１２０やネットワーク制御部１３０等を適宜制御して、フォールトトレラント機能を有効にする。この制御が行われることで、チェックポイントとして予め指定された間隔で仮想マシンの状態が別のホストにて実行される待機中の仮想マシンに保存される。 When it is determined that the service satisfies the above-described condition, the control unit 110 appropriately controls the memory transfer control unit 120, the network control unit 130, and the like to enable the fault tolerant function. By performing this control, the state of the virtual machine is saved in a standby virtual machine executed by another host at intervals specified in advance as checkpoints.

また、フォールトトレラント機能が有効である場合に、サービスが上述の条件を満たさないと判定すると、制御部１１０は、フォールトトレラント機能を無効にするように制御する。例えば、制御部１１０は、メモリ転送制御部１２０やネットワーク制御部１３０等を適宜制御してフォールトトレラント機能を無効にする。すなわち、チェックポイントとして指定された間隔で行われていた仮想マシンの状態の保存に関する処理が停止される。 If the control unit 110 determines, while the fault tolerant function is enabled, that the service does not satisfy the above-described condition, the control unit 110 performs control so as to disable the fault tolerant function. For example, the control unit 110 disables the fault tolerant function by appropriately controlling the memory transfer control unit 120, the network control unit 130, and the like. In other words, the processing for saving the state of the virtual machine, which had been performed at the intervals designated as checkpoints, is stopped.

フォールトトレラント機能の実行には、大きな負荷が必要となる場合がある。したがって、仮想マシン管理システム１００においてフォールトトレラント機能が有効である場合には、上述のように、仮想マシンにて実行されるサービスの外部への提供の開始が遅延する等の悪影響が生じる可能性がある。 Executing the fault tolerant function may impose a large load. Therefore, when the fault tolerant function is enabled in the virtual machine management system 100, adverse effects may occur, as described above, such as a delay in the start of external provision of a service executed in the virtual machine.

一方で、フォールトトレラント機能の必要性は、外部へのサービスの提供が行われている場合には高いが、それ以外の場合には必要性が小さい場合もある。そこで、フォールトトレラント機能は、必要性と、サービスを提供するプログラムが動作する際の負荷の大きさとに基づいて有効とされることが好ましいと考えられる。 On the other hand, the necessity of the fault tolerant function is high when a service is provided to the outside, but the necessity is sometimes small in other cases. Therefore, it is considered that the fault tolerant function is preferably made effective based on the necessity and the magnitude of the load when the program providing the service operates.

そこで、本実施形態における仮想マシン管理システム１００では、制御部１１０は、サービスの少なくとも一つがサービス管理表１１１にて保持される条件を満たすか否かを判断して、仮想マシンの状態を制御する。条件が適切に設定されることで、制御部１１０は、フォールトトレラント機能の必要性が高い場合に限って有効にすることを可能にする。すなわち、制御部１１０による制御によって、負荷が大きいが、フォールトトレラント機能の必要性が小さい処理が仮想マシンにて実行されている場合には、フォールトトレラント機能を無効にすることが可能となる。 Therefore, in the virtual machine management system 100 according to the present embodiment, the control unit 110 determines whether or not at least one of the services satisfies the condition held in the service management table 111, and controls the state of the virtual machine accordingly. When the conditions are set appropriately, the control unit 110 can enable the fault tolerant function only when the need for it is high. That is, the control by the control unit 110 makes it possible to disable the fault tolerant function while the virtual machine is executing processing that imposes a large load but has little need for the fault tolerant function.

したがって、制御部１１０が上述の制御を行うことで、仮想マシンの可用性を高めつつ、フォールトトレラント機能に起因する仮想マシンの動作への影響を小さくすることが可能となる。 Therefore, when the control unit 110 performs the above-described control, it is possible to reduce the influence of the fault tolerant function on the operation of the virtual machine while increasing the availability of the virtual machine.

サービス管理表１１１は、仮想マシンのフォールトトレラント機能の対象となる少なくとも一つのサービスの各々について、前記フォールトトレラント機能の有効又は無効との状態に関する条件を保持する。 The service management table 111 holds a condition regarding the status of whether the fault tolerant function is valid or invalid for each of at least one service that is a target of the fault tolerant function of the virtual machine.

一例として、サービス管理表１１１は、仮想マシンのフォールトトレラント機能の対象となる少なくとも一つのサービスの各々について、フォールトトレラント機能を有効にすべき条件を保持する。 As an example, the service management table 111 holds conditions for enabling the fault tolerant function for each of at least one service that is a target of the fault tolerant function of the virtual machine.

上述のように、フォールトトレラント機能を有効にすべき条件としては、サービスを提供するプログラムの実行に関連する任意の条件が含まれる。この条件には、例えば、サービスを提供するプログラムの実行に関する条件や、外部へのサービスの提供の有無に関する条件等が含まれる。これらの条件は、具体的な例としては、サービスを提供するプログラムの実行に際して必要となるメモリの使用量や、サービスの外部への提供に関して行われる通信の種類等の形式で規定される。 As described above, the condition for enabling the fault tolerant function includes any condition related to the execution of the program that provides the service. This condition includes, for example, a condition related to execution of a program that provides a service, a condition related to whether or not a service is provided to the outside, and the like. As specific examples, these conditions are defined in the form of the amount of memory required for executing the program that provides the service, the type of communication performed for providing the service to the outside, and the like.

なお、サービス管理表１１１は、仮想マシンのフォールトトレラント機能の対象となる少なくとも一つのサービスの各々について、フォールトトレラント機能を無効にすべき条件を保持してもよい。 Note that the service management table 111 may hold conditions for disabling the fault tolerant function for each of at least one service that is a target of the fault tolerant function of the virtual machine.

本実施形態においては、サービス管理表１１１は、一例として、仮想マシンにて実行されるサービスの各々について、外部へのサービスの提供の有無に関する条件を保持する。サービスの外部への提供の有無に関する条件としては、外部へのサービスの提供に関して行われる通信の種類等が用いられる。 In the present embodiment, as an example, the service management table 111 holds a condition regarding whether or not a service is provided to the outside for each service executed in a virtual machine. As a condition regarding whether or not the service is provided to the outside, the type of communication performed regarding the provision of the service to the outside is used.

すなわち、サービス管理表１１１は、フォールトトレラント機能を有効にすべき条件として、外部へのサービスの提供を開始する際にサービスに関して送受信される通信に関する情報を保持する。例えば、この情報には、外部へのサービスの提供を開始する場合に行われる通信のプロトコルやパケットの種類、これらの通信に関するアドレスやポート番号等が含まれる。 That is, the service management table 111 holds information related to communication transmitted / received with respect to the service when the provision of the service to the outside is started as a condition for enabling the fault tolerant function. For example, this information includes a protocol for communication performed when starting provision of a service to the outside, a packet type, an address and a port number related to the communication, and the like.

通信のプロトコルとしてＴＣＰ／ＩＰ（ＴｒａｎｓｍｉｓｓｉｏｎＣｏｎｔｒｏｌＰｒｏｔｏｃｏｌ／ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）が用いられる場合には、サービス管理表１１１は、プロトコルの種類として、例えばＴＣＰ又はＵＤＰ（ＵｓｅｒＤａｔａｇｒａｍＰｒｏｔｏｃｏｌ）のいずれであるかを保持する。また、この場合には、サービス管理表１１１は、アドレスとして、サービスの要求元又は提供先等のＩＰ（ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）アドレスを保持する。 When TCP / IP (Transmission Control Protocol / Internet Protocol) is used as a communication protocol, the service management table 111 holds, for example, either TCP or UDP (User Datagram Protocol) as the protocol type. . In this case, the service management table 111 holds an IP (Internet Protocol) address such as a service request source or a service destination as an address.

また、サービス管理表１１１は、フォールトトレラント機能の有効または無効の状態を保持してもよい。サービス管理表１１１は、フォールトトレラント機能が有効とされる場合に、フォールトトレラント機能が有効となった要因となるサービスや条件に関する情報を保持してもよい。すなわち、この情報は、例えばどのサービスに起因してフォールトトレラント機能が有効となったかを示す情報である。 Further, the service management table 111 may hold a valid or invalid state of the fault tolerant function. The service management table 111 may hold information on services and conditions that cause the fault tolerant function to be effective when the fault tolerant function is enabled. That is, this information is information indicating, for example, which service caused the fault tolerant function to be effective.

更に、サービス管理表１１１は、後述するチェックポイント制御部１１２にてチェックポイントの間隔の変更が行われる場合には、チェックポイントに関する情報を保持してもよい。チェックポイントに関する情報としては、仮想マシンのフォールトトレラント機能の対象となる一つのサービスについて、各々のサービスに適したチェックポイントの間隔に関する情報が含まれる。 Furthermore, the service management table 111 may hold information on checkpoints when the checkpoint control unit 112 described later changes the checkpoint interval. The information on checkpoints includes, for each service that is a target of the fault tolerant function of the virtual machine, information on the checkpoint interval suited to that service.
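The items that the service management table 111 is described as holding can be pictured as a small record structure. The following is a minimal sketch only: the class names, field names, and example values are hypothetical and are not taken from the specification, which does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ServiceCondition:
    """One hypothetical row: traffic that indicates a service is provided externally."""
    service: str                     # service name (illustrative)
    protocol: str                    # "TCP" or "UDP"
    address: str                     # IP address of the requester or destination
    port: int                        # port number used by the service
    checkpoint_interval_ms: Optional[int] = None  # interval suited to this service, if any


@dataclass
class ServiceManagementTable:
    """Hypothetical container for the conditions and the current FT state."""
    conditions: List[ServiceCondition] = field(default_factory=list)
    ft_enabled: bool = False                             # enabled / disabled state
    enabled_by: Set[str] = field(default_factory=set)    # services that caused FT to be enabled


# Example values are made up for illustration (192.0.2.0/24 is a documentation range).
table = ServiceManagementTable(
    conditions=[ServiceCondition("web", "TCP", "192.0.2.10", 80, checkpoint_interval_ms=25)]
)
```

Keeping the enabled state and its cause next to the conditions mirrors the text, which allows the table to record which service caused the fault tolerant function to be enabled.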

適切なチェックポイントの間隔は、仮想マシンにおいて実行されるサービスに応じて異なる場合がある。仮想マシンの状態が保存される場合には、上述のように、外部への通信が保留される。通信が保留されている期間が長くなることで、外部への通信の頻度が高い場合等には外部への通信に支障が生じる可能性がある。したがって、例えば外部への通信の頻度が高い場合等には、一般には、短いチェックポイントの間隔が設定されることが好ましい。 The appropriate checkpoint interval may vary depending on the services running on the virtual machine. When the state of the virtual machine is saved, communication to the outside is suspended as described above. Since the period during which communication is suspended becomes longer, there is a possibility that trouble may occur in communication to the outside when the frequency of communication to the outside is high. Therefore, for example, when the frequency of external communication is high, it is generally preferable to set a short checkpoint interval.

これに対して、プログラムの実行時におけるページの更新には、一般に、時間局所性があることが知られている。すなわち、プログラムの実行時に、ある期間においては、同一のページに対して繰り返し更新が行われる場合がある。このような場合には、長いチェックポイントの間隔が設定されることで、仮想マシンの状態の保存において、ダーティページとして更新の対象となるページの総量を削減することが可能となる。すなわち、メモリへの書込みの頻度が高い場合等には、一般には、長いチェックポイントの間隔が設定されることが好ましい。 On the other hand, it is generally known that there is a time locality in updating a page when a program is executed. In other words, when the program is executed, the same page may be repeatedly updated during a certain period. In such a case, by setting a long checkpoint interval, it is possible to reduce the total amount of pages to be updated as dirty pages when saving the state of the virtual machine. That is, when the frequency of writing to the memory is high, it is generally preferable to set a long checkpoint interval.
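The effect of temporal locality on the amount of state to be saved can be illustrated with a small calculation. The sketch below assumes a simplified model that is not part of the specification: one page write per time unit, and each checkpoint copies every distinct page written since the previous checkpoint. The function name and the write trace are hypothetical.

```python
def pages_transferred(writes, interval):
    """Total dirty pages copied over all checkpoints.

    writes   -- page numbers written, one per time unit (illustrative trace)
    interval -- checkpoint interval in time units
    """
    total = 0
    for start in range(0, len(writes), interval):
        # Pages written repeatedly within one interval are copied only once.
        total += len(set(writes[start:start + interval]))
    return total


# A trace with temporal locality: the same pages are rewritten in bursts.
trace = [1, 1, 2, 1, 2, 2, 3, 3]
short_interval_total = pages_transferred(trace, 2)  # 1 + 2 + 1 + 1 = 5
long_interval_total = pages_transferred(trace, 4)   # 2 + 2 = 4
```

Under this model a longer interval copies fewer pages in total, at the cost of holding outbound communication for longer between checkpoints, which is the trade-off the text describes.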

チェックポイント制御部１１２は、フォールトトレラント機能が有効である場合に、必要に応じてチェックポイントの間隔やチェックポイントにおける処理を制御する。チェックポイント制御部１１２は、例えば、制御部１１０からの指示に応じて、サービス管理表１１１に規定されたサービスの各々について規定されたチェックポイントの間隔に応じて、チェックポイントの間隔を変更する制御を行う。 When the fault tolerant function is valid, the checkpoint control unit 112 controls checkpoint intervals and processing at checkpoints as necessary. For example, the checkpoint control unit 112 changes the checkpoint interval according to the checkpoint interval specified for each of the services specified in the service management table 111 in response to an instruction from the control unit 110. I do.

メモリ転送制御部１２０は、チェックポイントにおいて、稼働している仮想マシンから待機している仮想マシンへのゲスト物理メモリの内容の転送を制御する。また、メモリ転送制御部１２０は、仮想マシンにおけるゲスト物理メモリへのアクセスの制御など、仮想マシンの実現に必要となる一般的な制御を行ってもよい。 The memory transfer control unit 120 controls transfer of the contents of the guest physical memory from the running virtual machine to the standby virtual machine at the checkpoint. Further, the memory transfer control unit 120 may perform general control necessary for realizing the virtual machine, such as control of access to the guest physical memory in the virtual machine.

転送バッファ１２１は、仮想マシン管理システム１００において実行する仮想マシンが稼働している場合に、当該仮想マシンが有するゲスト物理メモリの内容を一時的に保持するバッファである。メモリ転送制御部１２０及び転送バッファ１２１は、一般的なフォールトトレラント機能を実現可能な仮想マシン管理システムが備える構成要素と同様の要素である。 The transfer buffer 121 is a buffer that temporarily holds the contents of the guest physical memory of the virtual machine when the virtual machine executed in the virtual machine management system 100 is operating. The memory transfer control unit 120 and the transfer buffer 121 are the same elements as those included in a virtual machine management system that can realize a general fault-tolerant function.

ネットワーク制御部１３０は、フォールトトレラント機能が有効である場合において、上述した通信の保留の動作等を制御する。また、ネットワーク制御部１３０は、仮想マシンにおいて実行されるゲストＯＳでの通信の制御等、仮想マシンを実現する際に必要となる一般的な制御を行ってもよい。 When the fault tolerant function is valid, the network control unit 130 controls the above-described communication suspension operation and the like. The network control unit 130 may perform general control necessary for realizing the virtual machine, such as control of communication with the guest OS executed in the virtual machine.

通信解析部１３１は、上述のサービス管理表１１１においてフォールトトレラント機能を有効にすべき条件として通信に関する条件が規定されている場合に、仮想マシンにて実行されるゲストＯＳにおける通信の内容を解析する。通信解析部１３１は、通信の内容の解析として、例えば、通信がゲストＯＳにおいて実行されるサービスの開始または終了に関連するか否かを解析する。ゲストＯＳにおける通信がＴＣＰ（ＴｒａｎｓｍｉｓｓｉｏｎＣｏｎｔｒｏｌＰｒｏｔｏｃｏｌ）による通信である場合には、通信解析部１３１は、パケットの種類を判別する等によって通信の内容を解析する。 The communication analysis unit 131 analyzes the content of communication in the guest OS executed in the virtual machine when a condition regarding communication is defined in the service management table 111 as a condition for enabling the fault tolerant function. As part of this analysis, the communication analysis unit 131 determines, for example, whether the communication is related to the start or end of a service executed in the guest OS. When the communication in the guest OS uses TCP (Transmission Control Protocol), the communication analysis unit 131 analyzes the content of the communication by, for example, determining the type of the packet.

保留バッファ１３２は、フォールトトレラント機能が有効である場合において通信が保留される場合に、保留された通信に関する情報を保持するバッファである。 The hold buffer 132 is a buffer that holds information regarding the held communication when the communication is put on hold when the fault tolerant function is enabled.

続いて、本発明の第１の実施形態における仮想マシン管理システム１００の動作を説明する。なお、この動作例においては、フォールトトレラント機能を有効にすべき条件の一例として、通信に関する条件がサービス管理表１１１に規定されていることを想定する。すなわち、制御部１１０は、通信解析部１３１における解析の結果に基づいてフォールトトレラント機能を有効にすべき条件を満たすか否かを判定する。 Next, the operation of the virtual machine management system 100 in the first embodiment of the present invention will be described. In this operation example, it is assumed that a condition regarding communication is defined in the service management table 111 as an example of a condition for enabling the fault tolerant function. That is, the control unit 110 determines whether or not a condition for enabling the fault-tolerant function is satisfied based on the analysis result in the communication analysis unit 131.

最初に、通信解析部１３１は、ゲストＯＳにて行われる通信を解析する（ステップＳ１０１）。例えば、通信解析部１３１は、ネットワーク制御部１３０を介してゲストＯＳにて行われる通信の内容を解析する。 First, the communication analysis unit 131 analyzes communication performed in the guest OS (step S101). For example, the communication analysis unit 131 analyzes the content of communication performed by the guest OS via the network control unit 130.

次に、通信解析部１３１は、ステップＳ１０１にて解析した通信が、仮想マシンのゲストＯＳにて行われる外部へのサービス提供の開始に関する通信であるか否かを判定する（ステップＳ１０２）。例えば、当該通信がＴＣＰによる通信である場合には、通信解析部１３１は、解析対象となるゲストＯＳからの通信のプロトコルタイプがＴＣＰ＿ＳＹＮ＿ＡＣＫである場合に、外部へのサービス提供の開始に関する通信であると判定する。また、当該通信がＵＤＰによる通信である場合には、通信解析部１３１は、判定が困難であることから、本ステップの段階では外部へのサービス提供の開始に関する通信であると判定する。 Next, the communication analysis unit 131 determines whether or not the communication analyzed in step S101 is communication related to the start of service provision to the outside performed by the guest OS of the virtual machine (step S102). For example, when the communication uses TCP, the communication analysis unit 131 determines that it is communication related to the start of service provision to the outside if the protocol type of the communication from the guest OS under analysis is TCP_SYN_ACK. When the communication uses UDP, the determination is difficult, so at this step the communication analysis unit 131 treats it as communication related to the start of service provision to the outside.

ステップＳ１０２において、解析対象となる通信が外部へのサービス提供の開始に関する通信であると判定された場合（ステップＳ１０２：Ｙｅｓ）には、制御部１１０及び通信解析部１３１は、ステップＳ１０３の処理を実行する。ステップＳ１０３では、制御部１１０は、フォールトトレラント機能を有効にすべき条件を満たすか否かを判定する。この動作例においては、制御部１１０は、通信解析部１３１における解析結果に基づいて判定を行う。 When it is determined in step S102 that the communication under analysis is communication related to the start of service provision to the outside (step S102: Yes), the control unit 110 and the communication analysis unit 131 execute the process of step S103. In step S103, the control unit 110 determines whether or not a condition for enabling the fault tolerant function is satisfied. In this operation example, the control unit 110 makes the determination based on the analysis result of the communication analysis unit 131.

フォールトトレラント機能を有効にすべき条件を満たしているか否かの判定は、ステップＳ１０１又はステップＳ１０２にて解析等が行われた通信が、サービス管理表１１１に保持される条件に該当するか否かを判定することで行われる。すなわち、当該通信に関するアドレスやポート番号が、サービス管理表１１１に保持されている、フォールトトレラント機能を有効にすべき条件に該当するか比較することで行われる。 Whether or not the condition for enabling the fault tolerant function is satisfied is determined by checking whether the communication analyzed in step S101 or step S102 matches a condition held in the service management table 111. That is, the determination is made by comparing the address and port number of the communication against the conditions for enabling the fault tolerant function held in the service management table 111.
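The comparison performed in step S103 can be sketched as a lookup of the observed communication against the stored conditions. This is an illustrative sketch under assumptions: the dictionary keys, the function name, and the sample addresses are hypothetical, and the specification does not state that the match must be exact on all three fields.

```python
def find_matching_service(packet, conditions):
    """Return the name of the first service whose condition the packet matches, else None.

    packet and each condition are plain dicts with hypothetical keys
    "protocol", "address", and "port".
    """
    for cond in conditions:
        if (packet["protocol"] == cond["protocol"]
                and packet["address"] == cond["address"]
                and packet["port"] == cond["port"]):
            return cond["service"]
    return None


# Illustrative conditions (192.0.2.0/24 is a documentation address range).
conditions = [
    {"service": "web", "protocol": "TCP", "address": "192.0.2.10", "port": 80},
    {"service": "dns", "protocol": "UDP", "address": "192.0.2.10", "port": 53},
]

hit = find_matching_service({"protocol": "TCP", "address": "192.0.2.10", "port": 80}, conditions)
miss = find_matching_service({"protocol": "TCP", "address": "192.0.2.10", "port": 22}, conditions)
```

Returning the service name rather than a boolean makes it possible to record which service caused the fault tolerant function to be enabled.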

ステップＳ１０３における一つの動作例として、通信解析部１３１は、上述の通信が、サービス管理表１１１に保持される条件に該当するか否かを比較する。条件に該当する場合には、通信解析部１３１は、制御部１１０にフォールトトレラント機能を有効にすべき旨の要求を行う。そして、制御部１１０は、要求を受けた場合に、フォールトトレラント機能を有効にすべき条件を満たすと判断する。 As an example of the operation in step S103, the communication analysis unit 131 compares whether or not the above-described communication satisfies a condition held in the service management table 111. When the condition is met, the communication analysis unit 131 requests the control unit 110 to enable the fault tolerant function. Then, when receiving the request, the control unit 110 determines that the condition for enabling the fault tolerant function is satisfied.

また、ステップＳ１０３における別の動作例として、通信解析部１３１は、上述の通信に関する情報を制御部１１０に送信してもよい。この場合には、制御部１１０が、当該送信がサービス管理表１１１に保持される条件に該当するか比較してもよい。 As another operation example in step S103, the communication analysis unit 131 may transmit information related to the above communication to the control unit 110. In this case, the control unit 110 may compare whether the transmission satisfies a condition held in the service management table 111.

ステップＳ１０３においてフォールトトレラント機能を有効にすべき条件を満たすと判定された場合（ステップＳ１０３：Ｙｅｓ）には、制御部１１０は、サービス管理表１１１を更新する（ステップＳ１０４）。例えば、制御部１１０は、ステップＳ１０３における比較の結果に基づいて、フォールトトレラント機能が有効となった要因となるサービスや条件に関する情報を保持するようにサービス管理表１１１を更新する。また、制御部１１０は、フォールトトレラント機能が有効である旨を示すようにサービス管理表１１１を更新してもよい。 When it is determined in step S103 that the condition for enabling the fault tolerant function is satisfied (step S103: Yes), the control unit 110 updates the service management table 111 (step S104). For example, based on the comparison result in step S103, the control unit 110 updates the service management table 111 so that it holds information on the service and condition that caused the fault tolerant function to be enabled. The control unit 110 may also update the service management table 111 to indicate that the fault tolerant function is enabled.

制御部１１０は、フォールトトレラント機能が既に有効とされているかを確認する（ステップＳ１０５）。フォールトトレラント機能が有効とされていない場合（ステップＳ１０５：Ｎｏ）には、制御部１１０は、フォールトトレラント機能を有効にするよう制御を行う（ステップＳ１０６）。 The control unit 110 confirms whether the fault tolerant function has already been enabled (step S105). When the fault tolerant function is not enabled (step S105: No), the control unit 110 performs control to enable the fault tolerant function (step S106).

フォールトトレラント機能が既に有効とされている場合（ステップＳ１０５：Ｙｅｓ）には、チェックポイント制御部１１２は、チェックポイントの間隔を変更するよう制御する（ステップＳ１０７）。この場合には、チェックポイント制御部１１２は、例えばサービス管理表１１１を参照し、ステップＳ１０３等で比較の対象となった通信に関連するサービスに適したチェックポイントの間隔に関する情報を取得する。そして、そのサービスに適したチェックポイントの間隔が、その時点におけるチェックポイントの間隔と比較して長い場合には、チェックポイント制御部１１２は、チェックポイントの間隔を上述のサービスに適した間隔に変更する。 When the fault tolerant function is already enabled (step S105: Yes), the checkpoint control unit 112 performs control to change the checkpoint interval (step S107). In this case, the checkpoint control unit 112 refers to, for example, the service management table 111 and acquires information on the checkpoint interval suited to the service related to the communication compared in step S103 and the like. If the checkpoint interval suited to that service is longer than the checkpoint interval in effect at that time, the checkpoint control unit 112 changes the checkpoint interval to the interval suited to the service.

なお、ステップＳ１０７においてチェックポイントの間隔が変更される場合には、チェックポイント制御部１１２は、上述の手順と異なる手順にてチェックポイントの間隔を変更するよう制御してもよい。例えば、チェックポイント制御部１１２は、仮想マシンにて実行されているサービスに対して定められた任意の優先度等に応じて、優先度が最も高いサービスに適した間隔となるようにチェックポイント間隔を制御してもよい。 When the checkpoint interval is changed in step S107, the checkpoint control unit 112 may perform control so as to change the checkpoint interval by a procedure different from the above-described procedure. For example, the checkpoint control unit 112 determines the checkpoint interval so as to be an interval suitable for the service with the highest priority according to an arbitrary priority set for the service executed in the virtual machine. May be controlled.
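The two ways of choosing the checkpoint interval in step S107 can be sketched as follows. Both functions are hypothetical illustrations: the names, the millisecond unit, and the convention that a larger number means a higher priority are assumptions not stated in the specification.

```python
def lengthen_only(current_ms, service_ms):
    """First procedure: adopt the service's interval only if it is longer than the current one."""
    return service_ms if service_ms > current_ms else current_ms


def by_priority(active_services):
    """Alternative procedure: use the interval of the highest-priority running service.

    active_services -- list of (priority, interval_ms) pairs; a larger priority
                       value is assumed to mean a more important service.
    """
    _, interval_ms = max(active_services, key=lambda s: s[0])
    return interval_ms


unchanged = lengthen_only(current_ms=100, service_ms=25)    # stays at 100
lengthened = lengthen_only(current_ms=25, service_ms=100)   # becomes 100
chosen = by_priority([(1, 100), (5, 20)])                   # priority 5 wins, so 20
```

The first procedure never shortens the interval, whereas the priority-based alternative can, which is the practical difference between the two.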

ステップＳ１０６又はＳ１０７における処理に引き続いて、ネットワーク制御部１３０は、必要に応じて通信に関する処理を実行する（ステップＳ１０８）。ネットワーク制御部１３０は、フォールトトレラント機能が有効である場合における通信の保留等、フォールトトレラント機能が有効である場合に必要となる処理を含む通信に関する処理を実行する。 Subsequent to the processing in step S106 or S107, the network control unit 130 executes processing related to communication as necessary (step S108). The network control unit 130 executes processing related to communication including processing necessary when the fault tolerant function is effective, such as suspension of communication when the fault tolerant function is effective.

また、ステップＳ１０３においてフォールトトレラント機能を有効にすべき条件を満たさないと判定された場合（ステップＳ１０３：Ｎｏ）にも、ネットワーク制御部１３０は、必要に応じてステップＳ１０８の処理を実行する。 Also when it is determined in step S103 that the condition for enabling the fault tolerant function is not satisfied (step S103: No), the network control unit 130 executes the process of step S108 as necessary.

ステップＳ１０２において、解析対象となる通信が外部へのサービス提供の開始に関する通信ではないと判定された場合（ステップＳ１０２：Ｎｏ）には、ステップＳ１０９の処理が行われる。ステップＳ１０９においては、通信解析部１３１は、ステップＳ１０１にて解析した通信が、仮想マシンのゲストＯＳにて行われる外部へのサービス提供の終了に関する通信であるか否かを判定する。当該通信がＴＣＰによる通信である場合には、通信解析部１３１は、例えばプロトコルタイプがＴＣＰ＿ＲＳＴ又はＴＣＰ＿ＦＩＮである場合に、外部へのサービス提供の終了に関する通信であると判定する。 If it is determined in step S102 that the communication under analysis is not communication related to the start of service provision to the outside (step S102: No), the process of step S109 is performed. In step S109, the communication analysis unit 131 determines whether or not the communication analyzed in step S101 is communication related to the end of service provision to the outside performed by the guest OS of the virtual machine. When the communication uses TCP, the communication analysis unit 131 determines that it is communication related to the end of service provision to the outside if, for example, the protocol type is TCP_RST or TCP_FIN.
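The start and end determinations of steps S102 and S109 can be sketched as a classifier over the TCP control flags. This is a minimal sketch under assumptions: the function name and return values are hypothetical, and the types TCP_SYN_ACK, TCP_RST, and TCP_FIN named in the text are interpreted here as segments with the corresponding TCP header flag bits set.

```python
# TCP control flag bits as defined for the TCP header.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


def classify(protocol, tcp_flags=0):
    """Classify an outbound packet as 'start', 'end', or 'other' (hypothetical labels)."""
    if protocol == "UDP":
        # UDP has no connection setup to inspect, so the text treats it as a start (S102).
        return "start"
    if protocol == "TCP":
        if (tcp_flags & SYN) and (tcp_flags & ACK):
            return "start"   # S102: SYN+ACK sent by the guest OS
        if tcp_flags & (RST | FIN):
            return "end"     # S109: RST or FIN
    return "other"


kind_synack = classify("TCP", SYN | ACK)
kind_fin = classify("TCP", FIN | ACK)
kind_data = classify("TCP", ACK)
```

A SYN+ACK sent by the guest indicates that it has accepted an incoming connection, which is why it serves as a signal that service provision to the outside is starting.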

ステップＳ１０９において、解析対象となる通信が外部へのサービス提供の終了に関する通信であると判定された場合（ステップＳ１０９：Ｙｅｓ）には、制御部１１０及び通信解析部１３１は、ステップＳ１１０の処理を実行する。ステップＳ１１０では、制御部１１０は、フォールトトレラント機能を無効にすべき条件を満たすか否かを判定する。 When it is determined in step S109 that the communication under analysis is communication related to the end of service provision to the outside (step S109: Yes), the control unit 110 and the communication analysis unit 131 execute the process of step S110. In step S110, the control unit 110 determines whether or not a condition for disabling the fault tolerant function is satisfied.

この動作例においては、制御部１１０は、ステップＳ１１０の処理として、ステップＳ１０３における処理と同様にして、通信解析部１３１における解析結果に基づいて判定を行う。すなわち、当該通信に関するアドレスやポート番号が、サービス管理表１１１に保持されるフォールトトレラント機能を無効にすべき条件に該当するか比較することで行われる。 In this operation example, the control unit 110 performs the determination based on the analysis result in the communication analysis unit 131 in the same manner as the processing in step S103 as the processing in step S110. That is, it is performed by comparing whether the address and the port number related to the communication correspond to the conditions for disabling the fault tolerant function held in the service management table 111.

ステップＳ１１０においてフォールトトレラント機能を無効にすべき条件を満たすと判定された場合（ステップＳ１１０：Ｙｅｓ）には、制御部１１０は、サービス管理表１１１を更新する（ステップＳ１１１）。例えば、制御部１１０は、ステップＳ１１０における比較の結果に基づいて、フォールトトレラント機能が無効とすべきと判断された要因となるサービスや条件に関する情報を保持するようにサービス管理表１１１を更新する。 When it is determined in step S110 that the condition for disabling the fault tolerant function is satisfied (step S110: Yes), the control unit 110 updates the service management table 111 (step S111). For example, based on the comparison result in step S110, the control unit 110 updates the service management table 111 so that it holds information on the service and condition that caused the determination that the fault tolerant function should be disabled.

続いて、制御部１１０は、サービス管理表１１１を参照する等によって、フォールトトレラント機能を有効とすべき他のサービスの外部への提供が行われているかを確認する（ステップＳ１１２）。他のサービスが提供されている場合（ステップＳ１１２：Ｙｅｓ）には、制御部１１０は、フォールトトレラント機能を有効としたままの状態とする。そして、チェックポイント制御部１１２は、ステップＳ１０７の処理として、必要に応じてチェックポイントの間隔を変更するよう制御する。 Subsequently, the control unit 110 confirms whether other services that should enable the fault tolerant function are provided to the outside by referring to the service management table 111 (step S112). If another service is provided (step S112: Yes), the control unit 110 keeps the fault tolerant function enabled. Then, the checkpoint control unit 112 performs control so as to change the checkpoint interval as necessary in step S107.

フォールトトレラント機能を有効とすべき他のサービスの外部への提供が行われていない場合には、制御部１１０は、フォールトトレラント機能を無効にするよう制御を行う（ステップＳ１１３）。ステップＳ１１３の処理と併せて、制御部１１０は、フォールトトレラント機能が無効である旨を示すようにサービス管理表１１１を更新してもよい。ステップＳ１１３の処理の後には、必要に応じて、ネットワーク制御部１３０は、ステップＳ１０８の通信に関する処理を実行する。 When no other service for which the fault tolerant function should be enabled is being provided to the outside, the control unit 110 performs control to disable the fault tolerant function (step S113). Together with the process of step S113, the control unit 110 may update the service management table 111 to indicate that the fault tolerant function is disabled. After the process of step S113, the network control unit 130 executes the communication-related process of step S108 as necessary.
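The enable and disable decisions of steps S104 to S106 and S111 to S113 can be summarized as a small state holder. The sketch below is an assumption-laden simplification: the class and method names are hypothetical, a plain set stands in for the per-service records of the service management table 111, and checkpoint adjustment (S107) and communication handling (S108) are omitted.

```python
class FtController:
    """Hypothetical sketch: tracks which services currently require the FT function."""

    def __init__(self):
        self.ft_enabled = False
        self.active_services = set()  # stands in for the records in the table

    def on_service_start(self, service):
        # S104: record the service that makes the FT function necessary.
        self.active_services.add(service)
        # S105/S106: enable the FT function only if it is not already enabled.
        if not self.ft_enabled:
            self.ft_enabled = True

    def on_service_end(self, service):
        # S111: record that this service no longer requires the FT function.
        self.active_services.discard(service)
        # S112/S113: disable only when no other service still needs it.
        if not self.active_services:
            self.ft_enabled = False


ctl = FtController()
ctl.on_service_start("web")
ctl.on_service_start("db")
ctl.on_service_end("web")
still_enabled = ctl.ft_enabled  # "db" is still being provided
ctl.on_service_end("db")
```

Tracking the set of active services is what allows the function to stay enabled when one of several externally provided services ends.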

以上のとおり、本発明の第１の実施形態における仮想マシン管理システム１００は、フォールトトレラント機能を有効にすべき条件を満たす場合にフォールトトレラント機能を有効にするように仮想マシンを制御する。すなわち、本実施形態においては仮想マシン管理システム１００の制御部１１０は、サービス管理表１１１に予め規定された条件を満たす場合にフォールトトレラント機能を有効とするように仮想マシンを制御する。サービス管理表１１１にて規定される条件は、例えば、仮想マシンにおいてサービスの外部への提供が行われることを示す条件である。 As described above, the virtual machine management system 100 according to the first embodiment of the present invention controls the virtual machine so as to enable the fault tolerant function when the condition for enabling the fault tolerant function is satisfied. In other words, in the present embodiment, the control unit 110 of the virtual machine management system 100 controls the virtual machine so as to enable the fault tolerant function when the conditions specified in the service management table 111 are satisfied. The conditions defined in the service management table 111 are conditions indicating that a service is provided to the outside in the virtual machine, for example.

つまり、本実施形態の仮想マシン管理システム１００においては、サービス管理表１１１に保持される条件に応じ、仮想マシンにおいてサービスの外部への提供が行われる場合にフォールトトレラント機能を有効とすることが可能となる。言い換えると、本実施形態の仮想マシン管理システム１００においては、負荷が大きく、かつ、外部に対して影響を及ぼす可能性の小さい処理が行われている場合にはフォールトトレラント機能を無効とすることが可能となる。 That is, in the virtual machine management system 100 of the present embodiment, the fault tolerant function can be enabled, according to the conditions held in the service management table 111, when the virtual machine provides a service to the outside. In other words, in the virtual machine management system 100 of the present embodiment, the fault tolerant function can be disabled while processing that imposes a large load and is unlikely to affect the outside is being performed.

したがって、仮想マシンの状態の保存等に伴う仮想マシンの一時的な停止等、フォールトトレラント機能に起因する仮想マシンへの負荷を軽減することが可能となる。これにより、仮想マシンの状態の保存が行われることに起因して、外部へのサービス提供が遅延すること等が回避される。 Therefore, it is possible to reduce the load on the virtual machine due to the fault tolerant function, such as temporary stop of the virtual machine accompanying the storage of the state of the virtual machine. As a result, it is possible to avoid delaying the provision of services to the outside due to the storage of the state of the virtual machine.

すなわち、本実施形態における仮想マシン管理システム１００は、フォールトトレラント機能に起因する仮想マシンの動作への影響が小さい仮想マシン管理システム等を提供することを可能とする。 That is, the virtual machine management system 100 according to this embodiment can provide a virtual machine management system or the like that has a small influence on the operation of the virtual machine due to the fault tolerant function.

（第１の実施形態の変形例） (Modification of the first embodiment)
本発明の第１の実施形態においては、種々の変形例が考えられる。 Various modifications are conceivable in the first embodiment of the present invention.

本実施形態における仮想マシン管理システム１００は、仮想マシンを実現する際に必要となる構成として、図２に示す構成とは異なる構成を備えてもよい。また、仮想マシン管理システム１００は、一般的なフォールトトレラント機能を実現する際に必要となる構成として、図２に示す構成とは異なる構成を備えてもよい。仮想マシン管理システム１００は、少なくとも制御部１１０及びサービス管理表１１１を備えることで、フォールトトレラント機能に起因する仮想マシンの動作への影響を軽減することが可能となる。 The virtual machine management system 100 according to the present embodiment may include a configuration different from the configuration illustrated in FIG. 2 as a configuration required when realizing a virtual machine. Further, the virtual machine management system 100 may include a configuration different from the configuration illustrated in FIG. 2 as a configuration necessary for realizing a general fault-tolerant function. By including at least the control unit 110 and the service management table 111, the virtual machine management system 100 can reduce the influence on the operation of the virtual machine due to the fault tolerant function.

また、別の一例として、本実施形態における仮想マシン管理システム１００は、チェックポイントの間隔を変更しなくてもよい。この場合には、仮想マシン管理システム１００は、フォールトトレラント機能が有効である場合には、予め定められたチェックポイントの間隔にて仮想マシンの状態を保存する。すなわち、この場合における仮想マシン管理システム１００は、図４に示すフローチャートのように動作してもよい。 As another example, the virtual machine management system 100 according to this embodiment need not change the checkpoint interval. In this case, when the fault tolerant function is enabled, the virtual machine management system 100 saves the state of the virtual machine at a predetermined checkpoint interval. That is, the virtual machine management system 100 in this case may operate as in the flowchart shown in FIG. 4.

図４に示すフローチャートにおいては、図３に示すフローチャートのＳ１０７にて規定されるチェックポイント間隔の調整に関する動作が省略される。そして、ステップＳ１０５又はステップＳ１１２にて規定される分岐が“Ｙｅｓ”である場合には、引き続いてステップＳ１０８にて規定される通信に関する処理が行われる。図４に示すフローチャートの動作は、上述の点を除き、図３に示すフローチャートと同様に動作する。 In the flowchart shown in FIG. 4, the operation relating to the adjustment of the checkpoint interval defined in S107 of the flowchart shown in FIG. 3 is omitted. If the branch defined in step S105 or step S112 is “Yes”, processing relating to communication defined in step S108 is subsequently performed. The operation of the flowchart shown in FIG. 4 is the same as that of the flowchart shown in FIG. 3 except for the points described above.

また、この場合には、仮想マシン管理システム１００のチェックポイント制御部は、チェックポイントにおける仮想マシンの情報の保存についての制御を行う。 In this case, the checkpoint control unit of the virtual machine management system 100 controls the storage of virtual machine information at the checkpoint.

このような場合であっても、仮想マシン管理システム１００は、制御部１１０がサービス管理表１１１に規定される条件を満たす場合にフォールトトレラント機能を有効にすることで、仮想マシンの動作への影響の軽減が可能となる。 Even in such a case, the virtual machine management system 100 can reduce the influence on the operation of the virtual machine because the control unit 110 enables the fault tolerant function when the conditions specified in the service management table 111 are satisfied.

（第２の実施形態） (Second Embodiment)
次に、本発明の第２の実施形態について説明する。図５は、本発明の第２の実施形態における仮想マシン管理システム２００の構成を示す図である。 Next, a second embodiment of the present invention will be described. FIG. 5 is a diagram showing a configuration of the virtual machine management system 200 in the second exemplary embodiment of the present invention.

図５に示すように、本発明の第２の実施形態における仮想マシン管理システム２００は、本発明の第１の実施形態における仮想マシン管理システム１００と同様の構成を備える。仮想マシン管理システム２００において、制御部１１０は、サービス監視部２４０における仮想マシンの監視結果に基づいて、仮想マシンのフォールトトレラント機能に関する状態を制御する。 As shown in FIG. 5, the virtual machine management system 200 according to the second embodiment of the present invention has a configuration similar to that of the virtual machine management system 100 according to the first embodiment of the present invention. In the virtual machine management system 200, the control unit 110 controls the state of the virtual machine related to the fault tolerant function based on the result of monitoring the virtual machine by the service monitoring unit 240.

In general, programs that support the virtual machine management system or the virtual machine may be installed in the guest OS of a virtual machine. In the present embodiment, the service monitoring unit 240 is an example of such a program. The virtual machine management system 200 controls the virtual machine using the monitoring result of the service monitoring unit 240.

In the present embodiment, as one example of operation, the service monitoring unit 240 monitors the state of processes related to a service executed on the guest OS. That is, the service monitoring unit 240 monitors events such as the start and end of these processes. The services associated with the processes to be monitored by the service monitoring unit 240 may be defined as appropriate in the service management table 111.

When a process to be monitored starts, the service monitoring unit 240 requests the control unit 110 to enable the fault tolerant function. On receiving the request, the control unit 110 determines that the condition for enabling the fault tolerant function is satisfied.

Alternatively, the service monitoring unit 240 may simply notify the control unit 110 whenever a process starts in the virtual machine. In this case, the control unit 110 may check whether the notified process is one related to a condition held in the service management table 111 and, if so, determine that the condition for enabling the fault tolerant function is satisfied.

Similarly, when a process to be monitored ends, the service monitoring unit 240 requests the control unit 110 to disable the fault tolerant function. On receiving the request, the control unit 110 refers to the other conditions under which the fault tolerant function should be enabled and, as necessary, determines that the condition for disabling the fault tolerant function is satisfied. Alternatively, the service monitoring unit 240 may notify the control unit 110 when a process ends in the virtual machine. Based on the notification, the control unit 110 may determine, as necessary, that the condition for enabling the fault tolerant function is satisfied.
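The start and end handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class `ControlUnit`, its methods, and the idea of representing the service management table 111 as a set of process names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    """Hypothetical stand-in for control unit 110."""

    # Assumption: the service management table is reduced to the set of
    # process names whose services require the fault tolerant (FT) function.
    ft_processes: set
    running: set = field(default_factory=set)
    ft_enabled: bool = False

    def on_process_started(self, name: str) -> None:
        # Only processes listed in the table count toward the enable condition.
        if name in self.ft_processes:
            self.running.add(name)
        self._update()

    def on_process_ended(self, name: str) -> None:
        self.running.discard(name)
        self._update()

    def _update(self) -> None:
        # FT is disabled only when no other listed process still requires it.
        self.ft_enabled = bool(self.running)
```

In this sketch the end of one monitored process does not by itself disable the function; the remaining enable conditions are consulted first, which mirrors the "as necessary" wording above.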

As another example, the service monitoring unit 240 monitors the memory usage of a process as the state of that process related to a service executed on the guest OS.
In this case, the service monitoring unit 240 monitors, for example, whether the memory usage has exceeded an amount predetermined in the service management table 111. Based on the monitoring result of the service monitoring unit 240, the control unit 110 then determines, by a procedure similar to that in the example above, that the condition for enabling (or disabling) the fault tolerant function is satisfied.
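The memory-based check can be sketched as below. The function name, the dictionary layout, and the choice to treat "usage above the limit" as the enable condition are assumptions made for illustration only; the text leaves open whether exceeding the limit enables or disables the function.

```python
def memory_condition_met(usage_bytes: dict, limit_bytes: dict) -> bool:
    """Return True when any monitored process exceeds its configured limit.

    usage_bytes: current memory usage per process name (hypothetical input
                 from the service monitoring unit).
    limit_bytes: per-process limits, standing in for entries of the
                 service management table.
    """
    return any(
        usage_bytes.get(name, 0) > limit
        for name, limit in limit_bytes.items()
    )
```

Processes that are absent from `usage_bytes` are treated as using no memory, so they never trigger the condition.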

In the present embodiment, the communication analysis unit 131 of the virtual machine management system 200 analyzes the content of communication in the guest OS executed in the virtual machine, as in the example described in the first embodiment. In this case, several modes of operation are conceivable for the control unit 110.

In one mode of operation, the control unit 110 controls the state of the fault tolerant function based on the monitoring result of the service monitoring unit 240. In this control, the control unit 110 does not use the result of the communication analysis by the communication analysis unit 131. Instead, when the fault tolerant function is disabled, the communication analysis unit 131 discards packets involved in providing a service to the outside. Through such control, the control unit 110 allows the virtual machine to provide services to the outside only while the fault tolerant function is enabled.
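The packet handling in this mode can be sketched as follows. The `Packet` type, the `port` field, and identifying service traffic by port number are assumptions for illustration; the text does not specify how packets involved in providing a service are recognized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    port: int       # hypothetical: port that identifies the service
    payload: bytes


def filter_outbound(packets, ft_enabled: bool, service_ports: set):
    """Return the packets allowed to leave the virtual machine.

    While the FT function is disabled, packets that belong to an externally
    provided service are discarded; all other packets pass unchanged.
    """
    if ft_enabled:
        return list(packets)
    return [p for p in packets if p.port not in service_ports]
```

The state of the fault tolerant function is an input here rather than an output, which reflects that the analysis result is not used to change that state in this mode.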

Alternatively, the control unit 110 may control the state of the fault tolerant function based on a notification or the like from either the communication analysis unit 131 or the service monitoring unit 240. For example, when a notification from either the communication analysis unit 131 or the service monitoring unit 240 concerns a condition for enabling the fault tolerant function, the control unit 110 may determine that the condition for enabling the fault tolerant function is satisfied. In this case, the control unit 110 performs control so as to enable the fault tolerant function.

As described above, in the virtual machine management system 200 in the second embodiment of the present invention, the control unit 110 uses the monitoring result of the service monitoring unit 240 when controlling the state of the fault tolerant function. The service monitoring unit 240 monitors the status of processes executed on the guest OS of the virtual machine. The control unit 110 can therefore control the state of the fault tolerant function according to the status of those processes. Accordingly, the virtual machine management system 200 in the present embodiment provides the same effects as the virtual machine management system 100 in the first embodiment.

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope. The configurations of the embodiments may also be combined with one another as long as the combination does not depart from the scope of the present invention.

Description of Symbols
10 Host
20, 40 Virtual machine
21, 41 Guest physical memory
30 Guest OS
50 Communication network
100, 200 Virtual machine management system
110 Control unit
111 Service management table
112 Checkpoint control unit
120 Memory transfer control unit
121 Transfer buffer
130 Network control unit
131 Communication analysis unit
132 Hold buffer
500 Information processing apparatus
501 CPU
502 ROM
503 RAM
504 Program
505 Storage device
506 Storage medium
507 Drive device
508 Communication interface
509 Communication network
510 Input/output interface
511 Bus

Claims

1. A virtual machine management system comprising:
a service management table that holds, for each of at least one service targeted by the fault tolerant function of a virtual machine, a condition relating to the state of the fault tolerant function;
control means for controlling the state of the fault tolerant function of the virtual machine based on the condition; and
communication analysis means for analyzing the content of communication in the virtual machine,
wherein the control means controls the state of the fault tolerant function of the virtual machine based on the result of the analysis of the communication content by the communication analysis means.

2. The virtual machine management system according to claim 1, wherein the control means controls the virtual machine so as to enable the fault tolerant function when at least one of the services satisfies the condition for enabling the fault tolerant function.

3. The virtual machine management system according to claim 1, wherein the control means controls the virtual machine so as to disable the fault tolerant function when any of the services satisfies the condition for disabling the fault tolerant function.

4. The virtual machine management system according to any one of the preceding claims, further comprising checkpoint control means for controlling, according to the service related to the condition, a checkpoint, which is the interval at which the state of the virtual machine is saved when the fault tolerant function is enabled.

5. The virtual machine management system according to claim 1, wherein the service management table holds a condition for enabling the fault-tolerant function related to communication of the service for each of the services.

6. The virtual machine management system according to any one of claims 1 to 5, wherein the control means controls the state of the fault tolerant function of the virtual machine based on the result of monitoring by a service monitoring unit that monitors the execution status of the service executed in the virtual machine.

7. A virtual machine management method comprising:
determining, based on a condition for enabling the fault tolerant function for each of at least one service targeted by the fault tolerant function of a virtual machine, whether at least one of the services satisfies the condition;
controlling the virtual machine, based on the result of the determination, so as to change the state of its fault tolerant function;
analyzing the content of communication in the virtual machine; and
controlling the state of the fault tolerant function of the virtual machine based on the result of the analysis of the communication content.

8. A program causing a computer to execute:
a process of determining, based on a condition for enabling the fault tolerant function for each of at least one service targeted by the fault tolerant function of a virtual machine, whether at least one of the services satisfies the condition;
a process of controlling the virtual machine, based on the result of the determination, so as to change the state of its fault tolerant function; and
a process of analyzing the content of communication in the virtual machine,
wherein the controlling process controls the state of the fault tolerant function of the virtual machine based on the result of the analysis of the communication content in the analyzing process.