JP2009086701A

JP2009086701A - Virtual computer system and virtual machine restoration method in same system

Info

Publication number: JP2009086701A
Application number: JP2007251582A
Authority: JP
Inventors: Takuya Kumagai; 卓也熊谷; Tetsuya Iinuma; 哲也飯沼
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-09-27
Filing date: 2007-09-27
Publication date: 2009-04-23
Anticipated expiration: 2027-09-27
Also published as: JP4510064B2

Abstract

PROBLEM TO BE SOLVED: To restore a virtual machine, in a short time, to the state immediately before a fault occurred.

SOLUTION: In a virtual computer system, when a fault occurs at a first time in a physical computer 10-1 on which a virtual machine 11-1 operates, a restoration mechanism 140-2 on a physical computer 10-2 restores the virtual machine 11-1, in its state at a second time, as a virtual machine 11-2 on the physical computer 10-2. The restoration is based on a snapshot 112 that was acquired at the second time, the snapshot time nearest to the first time, and saved in a disk device 100. The restoration mechanism 140-2 then restores the virtual machine to the state immediately before the first time by supplying to it, with a predetermined degree of parallelism, the input data from the second time to the first time that a communication recording device 40 recorded in a log table associated with the virtual machine 11-1 and the application operating on it, as data to be processed by that application.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a virtual computer system including a virtual machine that operates on a physical computer, and more particularly to a virtual computer system suitable for restoring the virtual machine when the physical computer on which it operates fails, and to a virtual machine restoration method in that system.

In recent years, virtual computer systems including virtual machines, such as the one described in Patent Document 1, have been developed. A virtual machine is known as a virtualized computer that operates on (the operating system of) a physical computer. The virtual machine includes a virtualized disk, memory, CPU (processor), and network interface (network interface card: NIC). Installing an operating system (OS) and applications (application programs) on the virtual machine allows them to be executed there; the applications run on the OS of the virtual machine. When some input (for example, a command) is sent from outside (a client machine) to an application, the virtual machine executes processing according to that input under the application and outputs the execution result (a response containing it) to the outside (the client machine).
Patent Document 1: JP 2006-202279 A

As described above, processing performed by an application moves the virtual machine from its pre-processing state to its post-processing state. In other words, a state transition of the virtual machine is caused by an input to an application operating on it. Suppose, for example, that the state before processing is state #1 and the state after processing is state #2. A virtual machine in state #1 can then be brought to state #2 as follows: set the virtual machine to state #1, then give the application running on it the same input that caused the original transition from state #1 to state #2. Applying this method makes it possible, for example, to restore a virtual machine to the state immediately before a failure even when the physical computer on which it was running fails.

One way to restore a virtual machine running on a physical computer to the state immediately before that computer failed is to acquire a snapshot of the virtual machine periodically (a snapshot being an information file that saves the operating state of the virtual machine at a certain time) and also to acquire a log of the input data that triggers the virtual machine's state transitions and of the corresponding output data. That is, when the physical computer fails at time tx, the snapshot acquired at the time ty closest to tx (ty < tx) is used to restore the state of the virtual machine at time ty, and the inputs made between time ty and time tx to the application running on the virtual machine are then reproduced according to the log. This returns the virtual machine to the state immediately before the failure time tx.
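The snapshot-plus-log idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described in the patent: the function names, the `(time, input_data)` tuple layout, and the choice to treat the replay window as the half-open interval from ty (inclusive) to tx (exclusive) are all hypothetical.

```python
from bisect import bisect_left

def latest_snapshot_before(snapshot_times, t_fail):
    """Return the latest snapshot time ty with ty < t_fail, or None if none exists.

    snapshot_times must be sorted in ascending order (hypothetical layout).
    """
    i = bisect_left(snapshot_times, t_fail)  # index of the first time >= t_fail
    return snapshot_times[i - 1] if i > 0 else None

def inputs_to_replay(log, t_snap, t_fail):
    """Select logged inputs with t_snap <= time < t_fail, keeping log order.

    log is a list of (time, input_data) tuples (hypothetical layout).
    """
    return [(t, data) for (t, data) in log if t_snap <= t < t_fail]

# Example: snapshots every 10 time units, failure at tx = 25.
snapshots = [0, 10, 20]
log = [(18, "a"), (21, "b"), (24, "c"), (26, "d")]
ty = latest_snapshot_before(snapshots, 25)
pending = inputs_to_replay(log, ty, 25)
```

In this example only the inputs logged after the snapshot at ty and before the failure at tx are selected for replay; whether an input arriving exactly at ty is already reflected in the snapshot is not specified in the text, so the boundary handling here is a guess.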

However, if many inputs to the virtual machine (to the application running on it) occur within one snapshot interval, reproducing those inputs in chronological order according to the log can be expected to take a long time.

The present invention has been made in view of the above circumstances. It focuses on the fact that, for some applications, the same input produces the same output and the same state transition from any state, regardless of preceding processing. Its object is to provide a virtual computer system, and a virtual machine restoration method in that system, that can restore a virtual machine running such a special application to the state immediately before a failure in a short time.

According to one aspect of the present invention, there is provided a virtual computer system comprising a plurality of physical computers, including first and second physical computers, on each of which a virtual machine can operate. The system comprises a disk device and a communication recording device. The disk device is shared by the plurality of physical computers and provides a data area that a virtual machine operating on any of the physical computers can use as a virtual disk. The communication recording device records, in a log table associated with the virtual machine and with an application operating on it, sets each consisting of input data processed by the application, output data resulting from the processing the application executes in response to that input data, and information indicating the input time of the input data. These sets are recorded in chronological order as a history of communication with a client machine that uses the service provided when the virtual machine executes the application.
The first physical computer includes snapshot management means that, while the virtual machine operates on the first physical computer, periodically acquires the operating state of the virtual machine and the state of the virtual disk it uses as a snapshot, and saves the snapshot in the disk device in association with the virtual machine. The second physical computer includes first restoration means and second restoration means. When a failure occurs at a first time in the first physical computer while the virtual machine is operating on it, the first restoration means restores the virtual machine, in its state at a second time, on the second physical computer, based on the snapshot that was acquired at the second time closest to the first time and saved in the disk device in association with the virtual machine. When the application operating on the virtual machine restored to the state of the second time is a specific type of application, for which the same input produces the same output and the same state transition from any state regardless of preceding processing, the second restoration means performs restoration processing of a specific restoration type: it supplies the input data from the second time to the first time, recorded in the log table associated with the virtual machine and the application, to the restored virtual machine with a predetermined degree of parallelism as data to be processed by the application, thereby restoring the virtual machine to the state immediately before the first time.

According to the present invention, when the application running on the virtual machine to be restored is a specific type of application (a special application), for which the same input produces the same output and the same state transition from any state regardless of preceding processing, the input data recorded in chronological order in the log table associated with the virtual machine and the application is supplied, as data to be processed by the application, to the virtual machine restored from the snapshot. The data is supplied with a predetermined degree of parallelism and without regard to its chronological order. With respect to that application, the virtual machine can therefore be restored to the state immediately before the failure in a short time.

Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a virtual computer system according to an embodiment of the present invention. The virtual computer system of FIG. 1 includes a plurality of physical computers (physical server computers), for example two physical computers 10-1 and 10-2, which are interconnected by a communication path 20. The communication path 20 is realized by, for example, a network.

The physical computers 10-1 and 10-2 include well-known hardware resources (not shown) such as a CPU, I/O devices, and memory. Both are connected to a disk device 100 that they share. That is, the disk device 100 is a hardware resource common to the physical computers 10-1 and 10-2.

The hardware resources of the physical computers 10-1 and 10-2 are virtualized to provide an environment in which a virtual machine (VM) operates (a virtual machine execution environment). FIG. 1 shows the virtual machine 11-1 operating in the virtual machine execution environment of the physical computer 10-1. This execution environment includes a virtual disk 110, a virtualized disk area of the disk device 100 that is allocated to (and usable by) the virtual machine 11-1. The contents of the virtual disk 110 are seen as a single file by the virtual machine managers (VMMs) 13-1 and 13-2 described later.

On (the OS of) the virtual machine 11-1, applications (APs) 12-1 and 12-2, for example, operate. Port numbers P1 and P2 are assumed to be assigned to the applications 12-1 and 12-2, respectively. The client machine 30 can access the applications 12-1 and 12-2 through the ports with these port numbers.

When a failure occurs in the physical computer 10-1 on which the virtual machine 11-1 operates, a virtual machine 11-2 corresponding to the virtual machine 11-1 is generated (regenerated) on, for example, the physical computer 10-2, so that the physical computer 10-2 can take over the service provided by the virtual machine 11-1. In FIG. 1 the virtual machine 11-2 is drawn as a broken-line block, indicating that it has not yet been generated on the physical computer 10-2 in the state shown. Once the virtual machine 11-2 operates, the applications (APs) 12-1 and 12-2 can run on (the OS of) it just as on the virtual machine 11-1. A configuration in which a plurality of virtual machines, including the virtual machine 11-1, operate on the physical computer 10-1 is also possible.

Virtual machine managers (VMMs) 13-1 and 13-2, which are hypervisors, operate on the physical computers 10-1 and 10-2, respectively. The VMMs 13-1 and 13-2, also called virtual machine monitors (VMMs), manage virtual machines by managing the use of the above-described hardware resources of the physical computers 10-1 and 10-2, respectively. For example, they virtualize those hardware resources to provide the virtual machine execution environment in which a virtual machine operates. In other words, the VMMs 13-1 and 13-2 construct virtual machines that have virtualized hardware resources.

Cluster managers 14-1 and 14-2 also operate on the physical computers 10-1 and 10-2, respectively. They control a cluster system composed of the virtual machines 11-1 and 11-2 operating on the physical computers 10-1 and 10-2. By communicating with each other (heartbeat communication), the cluster managers 14-1 and 14-2 provide a well-known failure detection function that detects failures of the virtual machines 11-2 and 11-1 operating on the physical computers 10-2 and 10-1, respectively. The client machine 30, described later, recognizes the virtual machines 11-1 and 11-2 as the same virtual machine, that is, as a virtual machine to which a single IP (Internet Protocol) address is assigned. This single IP address is assumed to be IPA.
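The text only states that heartbeat communication provides a well-known failure detection function. One common form of such a function is a timeout check, sketched below. The class name, the timeout policy, and the use of explicit timestamps are assumptions for illustration and are not taken from the patent.

```python
class HeartbeatMonitor:
    """Timeout-based failure detector (hypothetical sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated before a peer is suspected
        self.last_seen = {}      # peer name -> time of its most recent heartbeat

    def beat(self, peer, now):
        """Record a heartbeat received from `peer` at time `now`."""
        self.last_seen[peer] = now

    def failed_peers(self, now):
        """Return the peers whose last heartbeat is older than the timeout."""
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)

# Example: the peer on physical computer 10-1 stops sending heartbeats after t = 2.0.
monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("10-1", now=0.0)
monitor.beat("10-1", now=2.0)
still_alive = monitor.failed_peers(now=4.0)
suspected = monitor.failed_peers(now=5.5)
```

Passing `now` explicitly keeps the sketch deterministic; a real cluster manager would read a clock and would also need to handle network partitions, which this sketch ignores.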

The cluster managers 14-1 and 14-2 include restoration mechanisms 140-1 and 140-2, respectively. When the restoration mechanism 140-i (i = 1, 2) detects a failure of the virtual machine 11-j (j = 1, 2, where j ≠ i), it restores the state of the virtual machine 11-j immediately before the failure into the virtual machine 11-i. In other words, it uses the virtual machine 11-i to restore, on the physical computer 10-i, the virtual machine 11-j in its state immediately before the failure.

Snapshot managers 15-1 and 15-2 also operate on the physical computers 10-1 and 10-2, respectively. While a virtual machine operates on the physical computer 10-1 or 10-2, the corresponding snapshot manager periodically acquires the operating state of that virtual machine and the contents of the virtual disk it uses as a snapshot, and saves the snapshot in the disk device 100. The operating state of the virtual machine includes the state of the CPU assigned to it (the program counter and register states) and the state of its memory.

The physical computers 10-1 and 10-2 are connected to the client machine 30 via the communication path 20. When a virtual machine operates on the physical computer 10-1 or 10-2, the client machine 30 communicates with it via the communication path 20 in order to use the service it provides. In the example of FIG. 1, the client machine 30 communicates with the virtual machine 11-1 operating on the physical computer 10-1.

A communication recording device 40 is placed on the communication path 20. It records the history of the communication (input and output) performed via the communication path 20 between the client machine 30 and (the applications on) a virtual machine operating on the physical computer 10-1 or 10-2. The history is recorded per virtual machine (per application on it), in chronological order, together with time information indicating the time of each input. Here, communication from the client machine 30 to (an application on) the virtual machine is called "input", and communication in the opposite direction, from the virtual machine to the client machine 30, is called "output".

In this embodiment, the communication recording device 40 is provided in a switch that connects the client machine 30 to the virtual machine operating on the physical computer 10-1 or 10-2. However, it may instead be provided in a router or in a proxy server (one that proxies access from the client machine 30 to the virtual machine operating on the physical computer 10-1 or 10-2).

A snapshot area 111 is reserved in the disk device 100. It is used to store, for example, the operating state of the virtual machine 11-1 and the contents of the virtual disk 110 periodically as a snapshot 112. The snapshot area 111 also stores snapshot management information 113, which manages the sequence of snapshots 112 stored in the area.

The disk device 100 also stores a restoration management table 114. For each application operating on a virtual machine, the table holds in advance a restoration type, which indicates the method to use when restoring the virtual machine, in association with the IP address assigned to the virtual machine and the port number assigned to the application.

Two restoration types are used in this embodiment: RT1 and RT2. They indicate restoration methods suited to restoring virtual machines on which applications of a first type and a second type operate, respectively.

A first-type application is a specific application for which, when it operates on a virtual machine, the same input produces the same output and the same state transition from any state, regardless of preceding processing. An application dedicated to database reference, that is, one that only reads the database, is one example. In this embodiment, the application 12-1 is a first-type application. Restoration type RT1 denotes the restoration method in which the input data sequence contained in the communication history recorded by the communication recording device 40 is supplied, with a predetermined degree of parallelism and regardless of the input order shown in that history, to the application operating on the virtual machine to be restored.
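The RT1 procedure, supplying logged inputs with a fixed degree of parallelism and without regard to their original order, can be sketched with a thread pool. This is an illustration under assumptions, not the patented implementation: `replay_rt1`, `lookup`, and the in-memory `DATABASE` standing in for a read-only database application are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def replay_rt1(inputs, submit, parallelism):
    """Submit logged inputs with a fixed degree of parallelism, ignoring input order.

    `submit` stands in for sending one input to the restored application.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # The calls may run concurrently and complete in any order, but
        # pool.map yields the results in the order of `inputs`, so each
        # output can still be matched with its logged input.
        return list(pool.map(submit, inputs))

# Stand-in for a read-only database lookup application (hypothetical).
DATABASE = {"k1": "v1", "k2": "v2", "k3": "v3"}

def lookup(key):
    return DATABASE.get(key)

outputs = replay_rt1(["k2", "k1", "k3", "k1"], lookup, parallelism=2)
```

Because a first-type application returns the same output for the same input from any state, the order in which the worker threads actually issue the lookups does not affect the results.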

A second-type application is one for which, when it operates on a virtual machine, even the same input may produce a different output and a different state transition depending on the preceding state (that is, an application that depends on preceding processing). In this embodiment, the application 12-2 is a second-type application. Restoration type RT2 denotes the restoration method in which the input data sequence contained in the communication history recorded by the communication recording device 40 is supplied to the application operating on the virtual machine to be restored while preserving the input order and input timing shown in that history.
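For contrast, an RT2-style replay can be sketched as a strictly sequential loop. The sketch assumes that "preserving the input timing" means reproducing the gaps between consecutive logged inputs; that reading, along with the names `replay_rt2` and `Counter`, is an assumption for illustration.

```python
import time

def replay_rt2(entries, submit, sleep=time.sleep):
    """Replay (input_time, input_data) entries in order, keeping the logged gaps.

    `sleep` is injectable so the timing can be observed without really waiting.
    """
    outputs = []
    previous_time = None
    for input_time, data in entries:
        if previous_time is not None:
            sleep(input_time - previous_time)  # wait as long as the original gap
        outputs.append(submit(data))
        previous_time = input_time
    return outputs

class Counter:
    """Stand-in for a stateful (second-type) application (hypothetical)."""

    def __init__(self):
        self.value = 0

    def handle(self, command):
        if command == "inc":
            self.value += 1
        return self.value

waits = []  # records the requested delays instead of sleeping
app = Counter()
outputs = replay_rt2(
    [(0.0, "inc"), (0.5, "inc"), (2.0, "get")], app.handle, sleep=waits.append
)
```

Here the output of `"get"` depends on how many `"inc"` commands preceded it, which is why the inputs of such an application cannot be reordered or issued in parallel.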

FIG. 2 is a block diagram showing the configuration of the communication recording device 40. The communication recording device 40 includes a log acquisition unit 41, log tables 42-1 and 42-2, a log storage unit 43, a log transmission unit 44, and a filter unit 45.

The log acquisition unit 41 acquires the history (log) of the communication performed via the communication path 20 between the client machine 30 and the applications 12-1 and 12-2 operating on the virtual machine 11-1. The log tables 42-1 and 42-2 hold the history, acquired by the log acquisition unit 41, of the communication between the client machine 30 and the applications 12-1 and 12-2. The log storage unit 43 saves (records) the communication history acquired by the log acquisition unit 41 in the corresponding one of the log tables 42-1 and 42-2 in chronological order.

In response to a request from the restoration mechanism 140-i (i = 1, 2), the log transmission unit 44 transmits the communication history stored in the log tables 42-1 and 42-2 to that restoration mechanism. During restoration processing by the restoration mechanism 140-i, the filter unit 45 filters communication data sent from the client machine 30 and addressed to an application on the virtual machine being restored.
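The behavior of the filter unit 45 can be pictured as a per-destination gate. The text says only that client traffic for the application under restoration is filtered; treating "filtered" as "not forwarded" is an assumption, as are the class and method names below.

```python
class RestoreFilter:
    """Gate that holds back client traffic for applications under restoration
    (hypothetical sketch)."""

    def __init__(self):
        self._restoring = set()  # (ip, port) pairs currently being restored

    def begin_restore(self, ip, port):
        self._restoring.add((ip, port))

    def end_restore(self, ip, port):
        self._restoring.discard((ip, port))

    def allows(self, dst_ip, dst_port):
        """Return True if a client packet for this destination may be forwarded."""
        return (dst_ip, dst_port) not in self._restoring

# Example: application 12-1 (IPA, P1) is being restored; 12-2 (IPA, P2) is not.
gate = RestoreFilter()
gate.begin_restore("IPA", "P1")
blocked = not gate.allows("IPA", "P1")
passed = gate.allows("IPA", "P2")
gate.end_restore("IPA", "P1")
reopened = gate.allows("IPA", "P1")
```

`IPA`, `P1`, and `P2` are the symbolic address and port names used in the description, kept here as plain strings.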

FIG. 3 shows an example of the log table 42-1. As shown there, the log table 42-1 stores pairs of input data and the corresponding output data, in chronological order, together with the time of each input.
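A log table of this shape can be modeled as an append-only list of (input time, input data, output data) entries. The sketch below is an assumption-laden illustration: the field names and the check that rejects out-of-order entries are hypothetical and are not taken from FIG. 3.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class LogEntry:
    time: float        # time at which the input was received
    input_data: str    # data sent by the client to the application
    output_data: str   # data returned by the application for that input

@dataclass
class LogTable:
    entries: list = field(default_factory=list)

    def record(self, time, input_data, output_data):
        """Append one entry, keeping the table in chronological order."""
        if self.entries and time < self.entries[-1].time:
            raise ValueError("entries must be recorded in chronological order")
        self.entries.append(LogEntry(time, input_data, output_data))

# Example usage with made-up request and response strings.
table = LogTable()
table.record(1.0, "query A", "result A")
table.record(2.0, "query B", "result B")
times = [entry.time for entry in table.entries]
```

Keeping the output next to each input is what later allows a replayed output to be compared with the originally logged one.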

FIG. 4 is a block diagram showing the configuration of the restoration mechanism 140-2, which consists of a first restoration unit 141 and a second restoration unit 142. The restoration mechanism 140-1 has the same configuration.

When a failure occurs in a physical computer (the physical computer 10-1), the first restoration unit 141 uses the snapshot of the virtual machine on that computer (the virtual machine 11-1) acquired at the time closest to the failure time (the latest snapshot acquisition time) to restore the state of that virtual machine as of the latest snapshot acquisition time on the physical computer in which the first restoration unit 141 is provided (the physical computer 10-2). The second restoration unit 142 then takes the input data recorded between the latest snapshot acquisition time and the failure time in the log tables (log tables 42-1 and 42-2) associated with the restored virtual machine and with the applications operating on it (applications 12-1 and 12-2), and supplies that data to the virtual machine as data to be processed by those applications. This restores the virtual machine to the state immediately before the failure time.

The second restoration unit 142 includes a log request unit 143, a log storage unit 144, a re-input unit 145, a processing result acquisition unit 146, and a restoration determination unit 147. When a virtual machine is to be restored, the log request unit 143 requests from the communication recording device 40 the history (log) of communication between the client machine and the applications operating on that virtual machine. The log storage unit 144 stores the communication history that the communication recording device 40 transmits in response. The re-input unit 145 supplies the sequence of input data contained in the stored communication history to the application operating on the virtual machine to be restored, using the input procedure (restoration method) specific to the restoration type indicated by the restoration management table 114. The re-input unit 145 includes a restoration type determination unit 145a, which determines the restoration type, that is, the input procedure (restoration method) to apply when supplying the input data sequence to the application.

The processing result acquisition unit 146 acquires the output data that the destination returns as the result of processing each input supplied by the re-input unit 145. The restoration determination unit 147 compares each acquired processing result (output data) with the output data that is paired, in the communication history, with the corresponding input data, and thereby determines whether the virtual machine has been restored correctly.
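The comparison performed by the restoration determination unit 147 can be sketched as a pairwise check of replayed outputs against logged outputs. The function names and the list-of-pairs layout of the history are assumptions made for this example.

```python
def find_mismatches(history, replayed_outputs):
    """Return the indices at which a replayed output differs from the logged one.

    history is a list of (input_data, logged_output) pairs; replayed_outputs
    holds the output obtained for each input, in the same order.
    """
    if len(history) != len(replayed_outputs):
        raise ValueError("one replayed output is required per logged input")
    return [
        index
        for index, ((_, logged), actual) in enumerate(zip(history, replayed_outputs))
        if logged != actual
    ]

def is_restored(history, replayed_outputs):
    """True when every replayed output equals its logged counterpart."""
    return not find_mismatches(history, replayed_outputs)

# Example with made-up request and response strings.
history = [("q1", "r1"), ("q2", "r2"), ("q3", "r3")]
ok = is_restored(history, ["r1", "r2", "r3"])
bad_indices = find_mismatches(history, ["r1", "xx", "r3"])
```

Returning the mismatching indices rather than a bare boolean is a design choice of this sketch; the text itself only requires a yes-or-no determination.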

FIG. 5 shows an example of the restoration management table 114. In this example, an IP address, a port number, and a restoration type are registered for each of the two applications 12-1 and 12-2 operating on the virtual machine 11-1 or 11-2. For the application 12-1, the registered IP address, port number, and restoration type are IPA, P1, and RT1, respectively; for the application 12-2, they are IPA, P2, and RT2.
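The table of FIG. 5 maps an (IP address, port number) pair to a restoration type, which is naturally expressed as a dictionary lookup. The sketch below keeps the symbolic values IPA, P1, P2, RT1, and RT2 from the description as strings; the names `RESTORATION_TABLE` and `restoration_type` are hypothetical.

```python
# Contents of the example table: (IP address, port number) -> restoration type.
RESTORATION_TABLE = {
    ("IPA", "P1"): "RT1",  # application 12-1: order-independent, parallel replay
    ("IPA", "P2"): "RT2",  # application 12-2: ordered, timing-preserving replay
}

def restoration_type(ip, port):
    """Look up the restoration type registered for the application at ip:port."""
    try:
        return RESTORATION_TABLE[(ip, port)]
    except KeyError:
        raise LookupError(f"no restoration type registered for {ip}:{port}") from None

type_for_ap1 = restoration_type("IPA", "P1")
type_for_ap2 = restoration_type("IPA", "P2")
```

A lookup of this kind is roughly what the restoration type determination unit 145a needs before choosing between the two replay procedures.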

Next, the operation of the virtual computer system of FIG. 1 will be described.
Assume that the client machine 30, in order to use the service provided by the virtual machine 11-1 running on the physical computer 10-1, is communicating via the communication path 20 with the applications 12-1 and 12-2 running on the virtual machine 11-1. In this case, the log acquisition unit 41 in the communication recording device 40 acquires the history of all communications occurring in the communication sequences between the client machine 30 and the applications 12-1 and 12-2 running on the virtual machine 11-1. The log storage unit 43 in the communication recording device 40 saves the communication history acquired by the log acquisition unit 41 in the log tables 42-1 and 42-2, one for each of the applications 12-1 and 12-2.
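The per-application recording described here could be sketched as follows. This is an illustration only: the class `CommunicationRecorder`, the `LogEntry` fields, and the use of application identifiers as keys are assumptions made for the example, not details given in the original.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass
class LogEntry:
    time: float        # input time of the input data
    input_data: str    # data sent by the client machine
    output_data: str   # response returned by the application


class CommunicationRecorder:
    """Hypothetical stand-in for the communication recording device 40."""

    def __init__(self) -> None:
        # One log table per application (cf. log tables 42-1 and 42-2).
        self.tables: DefaultDict[str, List[LogEntry]] = defaultdict(list)

    def record(self, app_id: str, time: float,
               input_data: str, output_data: str) -> None:
        # Entries are appended as they occur, so each table stays in
        # time-series order as long as calls arrive in time order.
        self.tables[app_id].append(LogEntry(time, input_data, output_data))
```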

On the physical computer 10-1 on which the virtual machine 11-1 runs, the snapshot manager 15-1 periodically (for example, every Δt) acquires the operating state of the virtual machine 11-1 and the contents of the virtual disk 110 used by the virtual machine 11-1, and stores them as a snapshot 112 in the snapshot area 111 reserved for the virtual machine 11-1 in the disk device 100. Each time it acquires a snapshot 112, the snapshot manager 15-1 updates the snapshot management information 113 used for generation management of the acquired snapshots 112.

Assume that "snap1" is acquired in this way as a snapshot 112 at time t1, and that "snap2" is acquired as a snapshot 112 at time t2, Δt after time t1 (t2 = t1 + Δt). Assume also that a failure (for example, a hardware failure) occurs in the physical computer 10-1 at some time t between time t2 and time t3 (t3 = t2 + Δt), when the next snapshot 112 is due. FIG. 6 shows the snapshots 112 "snap1" and "snap2" acquired in the snapshot area 111 at times t1 and t2, respectively. These snapshots can be regarded as state information indicating the normal state of the virtual machine 11-1 at times t1 and t2. The snapshots 112 "snap1" and "snap2" are generation-managed by the snapshot management information 113.

Assume further that, at times t11, t12, t13, t14, t15, and t16 within the period from time t1 to time t2, input data IN11, IN12, IN13, IN14, IN15, and IN16 are given by the client machine 30 to the application 12-1 running on the virtual machine 11-1 through the port with port number P1, and that output data OUT11, OUT12, OUT13, OUT14, OUT15, and OUT16 are returned from the application 12-1 to the client machine 30 as the results of processing these input data. Likewise, assume that at times t21, t22, t23, and t24 within the period from time t2 to time t, input data IN21, IN22, IN23, and IN24 are given by the client machine 30 to the application 12-1 running on the virtual machine 11-1, and that output data OUT21, OUT22, OUT23, and OUT24 are returned from the application 12-1 to the client machine 30 in response.

The pairs of input data fed to the application 12-1 in the period from time t1 to time t (the failure occurrence time) and the corresponding output data are saved in the log table 42-1 in time-series order. FIG. 3 shows the state of the log table 42-1 at time t.

FIG. 7 shows the procedure of the processing described above, performed on the physical computer 10-1 in the period from time t1 to time t. Processing in the application 12-2 running on the virtual machine 11-1 is omitted from FIG. 7.

When a failure occurs in the physical computer (first physical computer) 10-1 at time t as described above, processing for restoring the virtual machine 11-1 that was running on the physical computer 10-1 (virtual machine restoration processing) is performed on, for example, the physical computer (second physical computer) 10-2. This virtual machine restoration processing is described below with reference to the flowchart of FIG. 8.

First, the cluster manager 14-2 running on the physical computer 10-2 detects that a failure occurred in the physical computer 10-1 at time t. The restoration mechanism 140-2 in the cluster manager 14-2 is then activated. The restoration mechanism 140-2 requests the VMM 13-2 to create (build), as a virtual machine 11-2 on the OS of the physical computer 10-2, the virtual machine 11-1 that was running on the failed physical computer 10-1. In response to this request, the VMM 13-2 creates the virtual machine 11-2 on the OS of the physical computer 10-2 (step S1).

The first restoration unit 141 in the restoration mechanism 140-2 restores the virtual machine 11-2 created by the VMM 13-2 to the same state as the virtual machine 11-1 at the latest snapshot acquisition time. To do so, it uses, from the series of snapshots 112 saved in the snapshot area 111, the snapshot 112 acquired at the snapshot acquisition time closest to time t (the failure occurrence time), that is, at the latest snapshot acquisition time (step S2). As is apparent from FIG. 6, the snapshot acquisition time (second time) closest to time t (first time) is t2, and the snapshot 112 acquired at time t2 is "snap2". In this embodiment, therefore, the restoration mechanism 140-2 uses the snapshot 112 "snap2" to restore the state of the virtual machine 11-1 at time t2 onto the virtual machine 11-2. Step S2 may instead be performed by the VMM 13-2; that is, the VMM 13-2 may be given the function of restoring a virtual machine from a snapshot.
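The selection made in step S2 amounts to picking the most recent snapshot taken at or before the failure time. A minimal Python sketch follows, assuming snapshots are represented as (acquisition time, name) tuples; this representation and the function name `latest_snapshot` are illustrative assumptions, and the numeric times stand in for t1, t2, and t.

```python
from typing import List, Optional, Tuple

Snapshot = Tuple[float, str]  # (acquisition time, snapshot name)


def latest_snapshot(snapshots: List[Snapshot],
                    failure_time: float) -> Optional[Snapshot]:
    """Return the snapshot acquired closest to, but not after, failure_time."""
    candidates = [s for s in snapshots if s[0] <= failure_time]
    if not candidates:
        return None  # no snapshot exists before the failure
    return max(candidates, key=lambda s: s[0])
```

For snapshots taken at t1 = 10.0 ("snap1") and t2 = 20.0 ("snap2") and a failure at t = 25.0, the function returns the "snap2" entry.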

Once a virtual machine in the same state as the virtual machine 11-1 at time t2 has been restored as the virtual machine 11-2 on the physical computer 10-2, the same applications 12-1 and 12-2 that the virtual machine 11-1 was executing can be executed on the OS of the virtual machine 11-2, in the same state as on the virtual machine 11-1 at time t2.

The log request unit 143 in the restoration mechanism 140-2 then requests from the communication recording device 40 the history (log) of communication between the client machine (the client machine 30 in this embodiment) and the applications 12-1 and 12-2 that were running on the virtual machine 11-1 (step S3). In response to this request, the log transmission unit 44 in the communication recording device 40 transmits the communication histories saved in the log tables 42-1 and 42-2 to the restoration mechanism 140-2. Hereinafter, the communication histories saved in the log tables 42-1 and 42-2, that is, the histories of communication between the client machine 30 and the applications 12-1 and 12-2, are referred to as the communication histories corresponding to the applications 12-1 and 12-2, respectively.

When transmitting the communication histories saved in the log tables 42-1 and 42-2 to the restoration mechanism 140-2, the log transmission unit 44 requests the filter unit 45 to filter communication data addressed to the applications 12-1 and 12-2. The filter unit 45 then filters (blocks) all communication data addressed to the applications 12-1 and 12-2 until the restoration mechanism 140-2 requests that the filtering be released.
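The blocking behavior of the filter unit 45 could be modeled roughly as below. The class `TrafficFilter` and its method names are hypothetical, and identifying a destination by an (IP address, port number) pair is an assumption made for the sketch.

```python
from typing import Set, Tuple

Destination = Tuple[str, str]  # (IP address, port number)


class TrafficFilter:
    """Hypothetical model of the filter unit 45."""

    def __init__(self) -> None:
        self._blocked: Set[Destination] = set()

    def block(self, ip: str, port: str) -> None:
        # Requested when the log is handed to the restoration mechanism.
        self._blocked.add((ip, port))

    def release(self, ip: str, port: str) -> None:
        # Requested by the restoration mechanism once restoration is done.
        self._blocked.discard((ip, port))

    def allows(self, ip: str, port: str) -> bool:
        """Return True if data addressed to (ip, port) may pass."""
        return (ip, port) not in self._blocked
```

Keeping the filter in place until an explicit release means no new client input can reach the application while the recorded inputs are still being replayed.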

The log request unit 143 stores the communication histories transmitted from the log transmission unit 44 in the log storage unit 144 (step S4). The re-input unit 145 in the restoration mechanism 140-2 feeds only the sequences of input data from time t2 to time t (the failure occurrence time) that are contained in the communication histories stored in the log storage unit 144, that is, in the communication histories corresponding to the applications 12-1 and 12-2, to the applications 12-1 and 12-2 running on the virtual machine 11-2 through the ports indicated by port numbers P1 and P2. The procedure for feeding input data to the applications 12-1 and 12-2 is indicated by the restoration types registered for those applications in the restoration management table 114. For simplicity, the following description covers only the feeding of the sequence of input data contained in the communication history corresponding to the application 12-1.
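Selecting only the inputs recorded between the snapshot time t2 and the failure time t can be sketched as a simple filter over the log. The tuple layout and the function name `inputs_to_replay` are assumptions for illustration. Treating the interval as open at t2 (inputs exactly at the snapshot time are taken to be already reflected in the snapshot) is also an assumption; the original does not specify the boundary.

```python
from typing import List, Tuple

LogEntry = Tuple[float, str, str]  # (input time, input data, output data)


def inputs_to_replay(log: List[LogEntry], snapshot_time: float,
                     failure_time: float) -> List[LogEntry]:
    """Keep entries with snapshot_time < input time <= failure_time."""
    return [e for e in log if snapshot_time < e[0] <= failure_time]
```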

In this embodiment, the IP address of the virtual machine 11-2 is IPA, the same as that of the virtual machine 11-1, and the port number assigned to the application 12-1 is P1. Before feeding (re-inputting) the input data contained in the communication history, the restoration type determination unit 145a in the re-input unit 145 refers to the restoration management table 114 stored in the disk device 100 and checks whether the restoration type associated with the pair of IP address IPA and port number P1 is RT1 (step S5).

As is apparent from FIG. 5, the restoration type associated with the pair of IP address IPA and port number P1 in the restoration management table 114 is RT1. In this case, the re-input unit 145 feeds the sequence of input data from time t2 to time t contained in the communication history corresponding to the application 12-1 to that application running on the virtual machine 11-2, through the port with port number P1, at a predetermined degree of parallelism. In this embodiment the degree of parallelism is 4, so four items of input data are fed to the application 12-1 simultaneously. These four items are denoted IN#a, IN#b, IN#c, and IN#d.

As is apparent from FIG. 3, the sequence of input data from time t2 to time t contained in the communication history corresponding to the application 12-1 consists of IN21, IN22, IN23, and IN24. With the degree of parallelism of 4 used in this embodiment, the re-input unit 145 therefore feeds the input data IN21, IN22, IN23, and IN24, as the input data IN#a, IN#b, IN#c, and IN#d, in parallel to the application 12-1 running on the virtual machine 11-2 (steps S6a, S6b, S6c, and S6d).

The virtual machine 11-2 processes the input data IN21, IN22, IN23, and IN24 fed in parallel in accordance with the application 12-1. In other words, on the virtual machine 11-2 the application 12-1 performs processing based on the input data IN21, IN22, IN23, and IN24 fed in parallel.

As described earlier, the application 12-1 is an application of the first type: when it runs on a virtual machine, the same input yields the same output and the same state transition from any state, regardless of preceding processing. Consequently, even if the input data IN21, IN22, IN23, and IN24 are not fed in the same order in which the client machine 30 fed them between time t2 and time t, the processing of these input data by the application 12-1 (that is, processing by the virtual machine 11-2 in accordance with the application 12-1) is expected to restore the virtual machine 11-2, as far as execution of the application 12-1 is concerned, to the state of the virtual machine 11-1 immediately before time t (immediately before the failure).

The source IP address of the input data IN21, IN22, IN23, and IN24 matches the IP address assigned to the client machine 30. The IP address assigned to the client machine 30 is therefore used as the destination IP address of the results (responses) of processing these input data in the application 12-1. For this reason, the processing result acquisition unit 146 in the restoration mechanism 140-2 acquires, in place of the client machine 30, the results (responses) output from the virtual machine 11-2 for the input data IN21, IN22, IN23, and IN24; that is, it intercepts them (steps S7a, S7b, S7c, and S7d) and thereby prevents the processing results from being sent to the client machine 30.

The restoration determination unit 147 in the restoration mechanism 140-2 compares the processing results O21, O22, O23, and O24 for the input data IN21, IN22, IN23, and IN24, acquired by the processing result acquisition unit 146, with the output data OUT21, OUT22, OUT23, and OUT24 that are paired with those input data in the communication history corresponding to the application 12-1 (steps S8a, S8b, S8c, and S8d). Based on whether they match, it determines whether the restoration processing is proceeding normally (steps S9a, S9b, S9c, and S9d).

If O21, O22, O23, and O24 are equal to OUT21, OUT22, OUT23, and OUT24, respectively, the restoration determination unit 147 determines that the restoration processing by feeding IN21, IN22, IN23, and IN24 has succeeded. Because the application 12-1 does not depend on preceding processing, feeding the inputs in parallel gives the same result as feeding them in the order recorded in the communication history. Therefore, when the restoration processing by feeding IN21, IN22, IN23, and IN24 succeeds, the restoration processing as a whole can also be said to have succeeded. In contrast, if a mismatch (restoration failure) is determined in any one of steps S9a, S9b, S9c, and S9d, the virtual machine restoration processing described above is retried, although this is omitted from FIG. 8.

When the restoration determination unit 147 determines that the restoration processing has succeeded, the re-input unit 145 determines whether input data to be fed next remains in the communication history corresponding to the application 12-1 (step S10). If such input data remains, the re-input unit 145 feeds it at a degree of parallelism of 4. If fewer than four items of input data remain, the re-input unit 145 feeds all of the remaining items.
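The loop formed by steps S6 through S10 (feed a batch in parallel, intercept the results, compare them with the recorded outputs, then continue with the next batch) might be sketched as follows. This is not the patent's implementation: the function `replay_parallel`, the use of a thread pool, and modeling the application as a plain Python callable are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def replay_parallel(pairs: List[Tuple[str, str]],
                    app: Callable[[str], str],
                    parallelism: int = 4) -> bool:
    """Replay recorded (input, expected output) pairs in parallel batches.

    Returns True if every result matches the recorded output, and False
    as soon as a batch contains a mismatch (the caller would then retry).
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for start in range(0, len(pairs), parallelism):
            # The last batch may hold fewer than `parallelism` items.
            batch = pairs[start:start + parallelism]
            # Executor.map yields results in the order of its inputs.
            results = list(pool.map(app, [inp for inp, _ in batch]))
            if any(r != out for r, (_, out) in zip(results, batch)):
                return False
    return True
```

Batching is only safe here because a first-type application gives the same output and state transition for a given input regardless of order, which is the property the text relies on.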

If no input data remains to be fed, the restoration mechanism 140-2 ends the virtual machine restoration processing (for the application 12-1). At this point the virtual machine 11-2 has been restored, with respect to the operation of the application 12-1, to the state of the virtual machine 11-1 immediately before the failure occurrence time t. The virtual machine 11-2 can therefore continue the service that the application 12-1 provides to the client machine 30, in the same state as the virtual machine 11-1 immediately before the failure occurrence time t. The restoration mechanism 140-2 then requests the filter unit 45 in the communication recording device 40 to release the filtering of communication data addressed to the application 12-1 on the virtual machine 11-2.

Thus, according to this embodiment, feeding the sequence of input data contained in the communication history corresponding to the application 12-1 in parallel allows the virtual machine 11-2 to be restored quickly, with respect to the application 12-1, to the state of the virtual machine 11-1 immediately before the failure. The degree of parallelism applied in the virtual machine restoration processing of restoration type RT1 may also be set in advance in the restoration management table 114 in association with the restoration type RT1.

Although omitted from the description above, virtual machine restoration processing of restoration type RT2, which uses the communication history saved in the log table 42-2 (that is, the communication history corresponding to the application 12-2), is executed in parallel with the virtual machine restoration processing of restoration type RT1, which uses the communication history saved in the log table 42-1 (that is, the communication history corresponding to the application 12-1). In the restoration processing of restoration type RT2, however, the input data contained in the communication history corresponding to the application 12-2 is fed in time-series order and at the same relative times as recorded in that history.

That is, the application 12-2 is an application of the second type, which depends on preceding processing. The virtual machine restoration processing for the application 12-2 therefore uses the restoration method of restoration type RT2, in which the input sequence contained in the communication history corresponding to the application 12-2 is fed one item at a time to the application 12-2 running on the virtual machine 11-2 while preserving the input order and input timing indicated by that history.
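Order-preserving replay with the recorded relative timing could look roughly like this in Python. The function `replay_in_order` and its parameters are hypothetical. As a simplification, the sketch waits for the recorded gap between consecutive input times and ignores the time the application itself spends processing; the `sleep` parameter is injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, List, Tuple


def replay_in_order(entries: List[Tuple[float, str]],
                    app: Callable[[str], str],
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Feed (input time, input data) entries one at a time, in time order,
    waiting between inputs for the gap recorded in the history."""
    results: List[str] = []
    previous_time = None
    for t, data in sorted(entries, key=lambda e: e[0]):
        if previous_time is not None:
            sleep(t - previous_time)  # reproduce the recorded interval
        results.append(app(data))
        previous_time = t
    return results
```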

When inputs are given to the application 12-2, a long wait may occur between the completion of processing for one input and the arrival of the next input. If the virtual machine restoration processing for the application 12-2 preserves both the input order and the input timing indicated by the corresponding communication history, as in the embodiment above, the restoration processing of restoration type RT2 becomes slower whenever there is a long wait between one input/output pair and the next.

In the virtual machine restoration processing for a second-type application, therefore, the input order indicated by the corresponding communication history must be preserved, but the input intervals may be made shorter than those indicated by the history. Specifically, when a wait exceeds a predetermined fixed time, the next input data may be fed once that fixed time has elapsed. Alternatively, all of the input data may be fed with the minimum wait that does not affect the state transitions, regardless of the waits recorded in the history. Such a wait can be set in advance for each second-type application. Applying this restoration technique speeds up the virtual machine restoration processing for second-type applications.
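The first variant, capping each recorded wait at a fixed maximum, reduces to a one-line computation over the input times. The function name `shortened_gaps` and the parameter `max_wait` are illustrative assumptions; the input times are assumed to be sorted in ascending order.

```python
from typing import List


def shortened_gaps(times: List[float], max_wait: float) -> List[float]:
    """Return the wait to apply before each input after the first,
    capping every recorded gap at max_wait."""
    return [min(later - earlier, max_wait)
            for earlier, later in zip(times, times[1:])]
```

For input times 0, 1, 11, and 11.5 with a cap of 2, the recorded gaps of 1, 10, and 0.5 become 1, 2, and 0.5, so the input order is unchanged while the total replay time drops from 11.5 to 3.5.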

In the embodiment above, all of the communication histories saved in the log tables 42-1 and 42-2 are transmitted to the restoration mechanism 140-2 by the log transmission unit 44 in the communication recording device 40 in response to the request from the log request unit 143 in the restoration mechanism 140-2. Alternatively, the request from the log request unit 143 may include time information indicating the start time and end time of the required history, so that the log transmission unit 44 transmits to the restoration mechanism 140-2 only the history for the period between that start time and end time.

In the embodiment above, the client machine 30 is the only client machine that uses the applications 12-1 and 12-2 running on the virtual machine 11-1 or 11-2. However, a plurality of client machines, including the client machine 30, may use the applications 12-1 and 12-2 running on the virtual machine 11-1 or 11-2.

In the embodiment above, when a failure occurs in the physical computer 10-1, the state of the virtual machine 11-1 immediately before the failure is restored as the virtual machine 11-2 on the physical computer 10-2. However, once the physical computer 10-1 has recovered from the failure, the state of the virtual machine 11-1 immediately before the failure may instead be restored on the physical computer 10-1.

The present invention is not limited to the embodiment and modifications described above as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment and its modifications. For example, some constituent elements may be deleted from all of the constituent elements shown in the embodiment or its modifications.

FIG. 1 is a block diagram showing the configuration of a virtual computer system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the communication recording device shown in FIG. 1.
FIG. 3 is a diagram showing an example of the log table shown in FIG. 2.
FIG. 4 is a block diagram showing the configuration of the restoration mechanism shown in FIG. 1.
FIG. 5 is a diagram showing an example of the restoration management table shown in FIG. 1.
FIG. 6 is a diagram for explaining periodic snapshot acquisition.
FIG. 7 is a flowchart showing an example of the processing performed on the physical computer in the period from a certain time until a physical computer failure occurs.
FIG. 8 is a flowchart showing the procedure of the virtual machine restoration processing in the embodiment.

Explanation of symbols

10-1, 10-2 ... physical computer, 11-1, 11-2 ... virtual machine, 12-1, 12-2 ... application (AP), 13-1, 13-2 ... virtual machine manager (VMM), 14-1, 14-2 ... cluster manager, 15-1, 15-2 ... snapshot manager, 20 ... communication path, 30 ... client machine, 40 ... communication recording device, 41 ... log acquisition unit, 42-1, 42-2 ... log table, 43 ... log storage unit, 44 ... log transmission unit, 45 ... filter unit, 100 ... disk device, 110 ... virtual disk, 111 ... snapshot area, 112 ... snapshot, 114 ... restoration management table, 140-1, 140-2 ... restoration mechanism, 141 ... first restoration unit, 142 ... second restoration unit, 143 ... log request unit, 144 ... log storage unit, 145 ... re-input unit, 145a ... restoration type determination unit, 146 ... processing result acquisition unit, 147 ... restoration determination unit.

Claims

In a virtual computer system including a plurality of physical computers including first and second physical computers, each of which can operate a virtual machine,
A disk device shared by the plurality of physical computers, the disk device providing a data area that can be used as a virtual disk by a virtual machine operating on an arbitrary physical computer of the plurality of physical computers;
In a log table associated with the virtual machine and an application running on the virtual machine, input data processed by the application, and output data as a result of processing executed by the application according to the input data And a communication recording apparatus that records a set of information indicating the input time of the input data as a history of communication with a client machine that uses a service provided by the virtual machine executing the application in time series order And
When the virtual machine operates on the first physical computer, the first physical computer periodically acquires the operation state of the virtual machine and the state of the virtual disk used by the virtual machine as a snapshot. Snapshot management means for storing in the disk device in association with the virtual machine
The second physical computer comprises:
First restoration means that, when a failure occurs in the first physical computer at a first time while the virtual machine is operating on the first physical computer, restores the virtual machine as of a second time onto the second physical computer, based on the snapshot that was acquired at the second time closest to the first time and stored in the disk device in association with the virtual machine; and
Second restoration means that, when the application running on the virtual machine restored to the state at the second time is an application of a specific type that, from any state and regardless of the preceding processing, produces the same output and undergoes the same state transition for the same input, performs a restoration process of a specific restoration type, in which the input data from the second time to the first time recorded in the log table associated with that virtual machine and the application is input, with a predetermined degree of parallelism, into the virtual machine restored to the state at the second time as data to be processed by the application, thereby restoring the virtual machine to the state immediately before the first time.
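The two restoration stages in the claim above can be illustrated with a minimal Python sketch. It is not the patent's implementation: `LogEntry`, `latest_snapshot_time`, `replay_in_parallel`, and the `send` callback are hypothetical names, the snapshot is assumed to be the most recent one taken at or before the failure, and the replay window is assumed to be the half-open range from the second time up to (but not including) the first time.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    input_time: float  # time at which the input reached the virtual machine
    input_data: str    # request the application processed
    output_data: str   # response recorded for that request


def latest_snapshot_time(snapshot_times, failure_time):
    """Return the snapshot time closest to the failure (assumed not later than it)."""
    candidates = [t for t in snapshot_times if t <= failure_time]
    if not candidates:
        raise ValueError("no snapshot precedes the failure")
    return max(candidates)


def replay_in_parallel(log_table, second_time, first_time, send, parallelism):
    """Feed the logged inputs in [second_time, first_time) to the restored VM.

    Only valid for applications whose output and state transition depend on
    the input alone, because overlapping calls may complete in any order.
    """
    pending = [e for e in log_table if second_time <= e.input_time < first_time]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() yields the responses in log order even though the calls overlap.
        return list(pool.map(lambda e: send(e.input_data), pending))
```

Here `send` stands in for whatever mechanism delivers one request to the restored virtual machine, and `parallelism` plays the role of the predetermined degree of parallelism.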

The virtual computer system according to claim 1, further comprising a restoration management table, stored in the disk device, in which restoration type information is registered for each combination of the port number of a port assigned to a virtual machine and an application operating on that virtual machine, the restoration type information indicating a restoration method that depends on whether or not the application is of the specific type,
wherein the second restoration means performs the restoration process of the specific restoration type when the restoration type information registered in the restoration management table in association with the combination of the port number assigned to the virtual machine restored to the state at the second time and the application operating on that virtual machine indicates the specific restoration type.
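As a rough illustration of the lookup this claim describes, the restoration management table can be modeled as a mapping keyed by (port number, application). The sketch below is an assumption-laden example, not the patent's data layout: the `RestoreType` names, the table entries, and the fallback to the order-preserving type for unregistered combinations are all hypothetical.

```python
from enum import Enum


class RestoreType(Enum):
    SPECIFIC = "specific"  # state-independent application, parallel re-input allowed
    OTHER = "other"        # any other application, re-input must preserve order


# Hypothetical table contents; real entries would be registered in the disk device.
restoration_management_table = {
    (8080, "catalog-search"): RestoreType.SPECIFIC,
    (5432, "order-db"): RestoreType.OTHER,
}


def restore_type_for(table, port, application):
    """Look up the restoration type for one (port number, application) pair."""
    # Assumption: an unregistered combination is treated conservatively.
    return table.get((port, application), RestoreType.OTHER)
```

Defaulting to the order-preserving type is a design choice made only for this sketch, since parallel re-input is safe only for applications known to be of the specific type.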

The virtual computer system according to claim 2, wherein, when the restoration type information registered in the restoration management table in association with the combination of the port number assigned to the virtual machine restored to the state at the second time and the application operating on that virtual machine does not indicate the specific restoration type, the second restoration means restores the virtual machine to the state immediately before the first time by inputting the input data from the second time to the first time recorded in the log table associated with the virtual machine and the application into the virtual machine, as data to be processed by the application, in time-series order, with at least those pairs of input data whose input time interval exceeds a predetermined input time interval being input at the predetermined input time interval.
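One possible reading of this ordered re-input is sketched below in Python: inputs are replayed strictly in time-series order, and any gap between consecutive inputs that is longer than the predetermined interval is shortened to that interval, while shorter gaps are kept as recorded. The function name, the `(input_time, input_data)` tuple layout, and the injectable `sleep` parameter are assumptions made for the example.

```python
import time


def replay_sequentially(timed_inputs, send, max_interval, sleep=time.sleep):
    """Replay (input_time, input_data) pairs in time order with capped gaps."""
    ordered = sorted(timed_inputs, key=lambda pair: pair[0])
    responses = []
    previous_time = None
    for input_time, input_data in ordered:
        if previous_time is not None:
            # Gaps longer than max_interval are compressed to max_interval.
            sleep(min(input_time - previous_time, max_interval))
        responses.append(send(input_data))
        previous_time = input_time
    return responses
```

Passing a recording function as `sleep` makes the schedule observable without actually waiting, which is convenient when checking how much idle time the cap removes.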

The virtual computer system according to claim 2, further comprising a restoration determination unit that determines whether the state of the virtual machine has been correctly restored by comparing the response returned by the virtual machine, as the result of the application's processing of the input data that was input, with the output data recorded in the log table as a pair with that input data.
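The comparison this claim describes can be sketched as follows, assuming the log is available as `(input_data, output_data)` pairs and that `send` returns the restored virtual machine's response. Both the function name and the mismatch tuple layout are hypothetical.

```python
def find_mismatches(logged_pairs, send):
    """Re-input each logged request and collect responses that differ from the log."""
    mismatches = []
    for input_data, expected_output in logged_pairs:
        actual_output = send(input_data)
        if actual_output != expected_output:
            mismatches.append((input_data, expected_output, actual_output))
    return mismatches


def restored_correctly(logged_pairs, send):
    """Treat the restoration as correct only when no response differs."""
    return not find_mismatches(logged_pairs, send)
```

Returning the mismatching entries rather than a bare boolean is a choice made for this sketch so that a failed restoration can be diagnosed.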

A virtual machine restoration method applied to a virtual computer system comprising: a plurality of physical computers, including first and second physical computers, on which a virtual machine can operate and each of which includes snapshot management means that, while the virtual machine is operating on that physical computer, periodically acquires the operating state of the virtual machine and the state of the virtual disk used by the virtual machine as a snapshot and stores the snapshot in a disk device in association with the virtual machine; and a communication recording device that records, in time-series order in a log table associated with the virtual machine and an application running on the virtual machine, sets each consisting of input data to be processed by the application, output data resulting from the processing that the application executes on that input data, and information indicating the input time of the input data, as a history of communication with a client machine that uses a service the virtual machine provides by executing the application, the method comprising:
A step in which, when a failure occurs in the first physical computer at a first time while the virtual machine is operating on the first physical computer, the second physical computer restores the virtual machine as of a second time onto the second physical computer, based on the snapshot that was acquired at the second time closest to the first time and stored in the disk device in association with the virtual machine; and
A step in which, when the application running on the virtual machine restored to the state at the second time is an application of a specific type that, from any state and regardless of the preceding processing, produces the same output and undergoes the same state transition for the same input, the second physical computer performs a restoration process of a specific restoration type, in which the input data from the second time to the first time recorded in the log table associated with that virtual machine and the application is input, with a predetermined degree of parallelism, into the virtual machine restored to the state at the second time as data to be processed by the application, thereby restoring the virtual machine to the state immediately before the first time.