JP4783392B2 - Information processing apparatus and failure recovery method

Info

Publication number: JP4783392B2
Application number: JP2008091727A
Authority: JP
Inventors: 治男冨田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2011-09-28
Anticipated expiration: 2028-03-31
Also published as: JP2009245216A

Description

The present invention relates to an information processing apparatus having a virtual memory function and to a failure recovery method applied to the apparatus.

With recent advances in semiconductor technology, systems that support virtualization of hardware resources such as CPUs and I/O devices have been developed. Such a system is controlled by virtualization software called a virtual machine monitor (hypervisor).

Well-known virtualization software includes the commercial product VMware (registered trademark) and the open-source software Xen.

With such virtualization software, a virtualized environment can easily be built on a computer such as a personal computer.

In this virtualized environment, multiple operating systems can use hardware resources such as the CPU, memory, and I/O devices at the same time, so the hardware resources are used effectively.

At present, however, there are almost no reports in the field of virtualization technology on how to implement a fault-tolerant function that increases system availability.

Conventionally, in computer systems such as servers that require high reliability, system stops caused by hardware failures (in memory, the CPU, and so on) or by software defects have been avoided by methods such as running the OS on a server with multiplexed hardware devices, or operating a system in which multiple computers are clustered using clustering software.

However, multiplexing hardware devices increases hardware cost and makes it difficult to handle system stops caused by software defects. It also requires special software, such as device drivers, to control the hardware devices.

Operating a system in which multiple computers are clustered using clustering software, on the other hand, requires both the multiple computers and the clustering software, which increases the time and cost of building the system.

Patent Document 1 discloses a data recovery method for a computer that uses virtual memory management. In this method, the writable bit in the page table is turned off so that a memory protection interrupt occurs when the processor attempts to update data in a page, and the interrupt handling routine saves the contents of the page to disk.

Patent Document 1: JP-A-10-78884

However, the method of Patent Document 1 does not consider improving the reliability of a computer system that has a virtualization function. Moreover, because it considers only the recovery of data for a particular program, there is a risk that a system stop cannot be avoided when a hardware failure or software defect actually occurs.

In a computer system with a virtualization function, hardware resources are managed by the virtualization software. Achieving high reliability therefore requires a new interface between the operating system running in the virtual environment and the virtualization software.

The present invention has been made in view of the above circumstances, and its object is to provide an information processing apparatus and a failure recovery method that can avoid a system stop caused by a hardware failure or a software defect without using a complicated configuration such as hardware multiplexing or clustering.

To solve the above problem, the present invention provides an information processing apparatus that has a virtualization function and a virtual memory function and that processes, in units of pages, the virtual memory of an operating system running in a virtual environment, the apparatus comprising: means for setting, by a virtual machine monitor that controls the virtual environment, all pages of a page table corresponding to a memory area allocated to the operating system to a write-protected state; pre-update data saving means for causing the virtual machine monitor to execute, in response to a page write violation exception that occurs when the operating system refers to the page table in order to access a write-protected page, a process of acquiring the pre-update data of the page on which the page write violation occurred from the memory area of the operating system and saving it in a memory area managed by the virtual machine monitor; means for causing the virtual machine monitor to execute a process of releasing, after the pre-update data has been saved, the write protection of the page on which the page write violation occurred, thereby allowing the operating system to continue writing to that page; checkpoint acquisition means for periodically executing a checkpoint acquisition process and saving a context, including the state of the processor of the information processing apparatus, in a memory area corresponding to a predetermined virtual page address of the operating system; and means for causing the virtual machine monitor to execute, when an exception due to a page write violation occurs, a process of determining whether the page on which the page write violation occurred is the page corresponding to the predetermined virtual page address and, when it is determined to be that page, a process of setting the pages whose write protection was released back to the write-protected state, thereby resetting all pages of the page table allocated to the operating system to the write-protected state.

The present invention also provides a failure recovery method for recovering an information processing apparatus having a virtualization function and a virtual memory function from a failure, the method comprising: a step of causing a virtual machine monitor that controls a virtual environment to execute a process of setting all pages of a page table corresponding to a memory area allocated to an operating system running in the virtual environment to a write-protected state; a step of causing the virtual machine monitor to execute, in response to a page write violation exception that occurs when the operating system refers to the page table in order to access a write-protected page, a process of acquiring the pre-update data of the page on which the page write violation occurred from the memory area of the operating system and saving it in a memory area managed by the virtual machine monitor; a step of causing the virtual machine monitor to execute a process of releasing, after the pre-update data has been saved, the write protection of the page on which the page write violation occurred, thereby allowing the operating system to continue writing to that page; a checkpoint acquisition step of periodically executing a checkpoint acquisition process and saving a context, including the state of the processor of the information processing apparatus, in a memory area corresponding to a predetermined virtual page address of the operating system; and a step of causing the virtual machine monitor to execute, when an exception due to a page write violation occurs, a process of determining whether the page on which the page write violation occurred is the page corresponding to the predetermined virtual page address and, when it is determined to be that page, a process of setting the pages whose write protection was released back to the write-protected state, thereby resetting all pages of the page table allocated to the operating system to the write-protected state.

The present invention further provides a program that functions as a virtual machine monitor virtualizing the hardware resources of a computer having a virtual memory function, the program causing the computer to execute: a procedure of setting all pages of a page table corresponding to a memory area allocated to an operating system running in a virtual environment controlled by the virtual machine monitor to a write-protected state; a pre-update data saving procedure of executing, in response to a page write violation exception that occurs when the operating system refers to the page table in order to access a write-protected page, a process of acquiring the pre-update data of the page on which the page write violation occurred from the memory area of the operating system and saving it in a memory area managed by the virtual machine monitor; a procedure of executing a process of releasing, after the pre-update data has been saved, the write protection of the page on which the page write violation occurred, thereby allowing the operating system to continue writing to that page; a determination procedure of determining, when an exception due to a page write violation occurs, whether the page on which the page write violation occurred is the page corresponding to a predetermined virtual page address of the operating system, wherein the operating system includes a task that periodically executes a checkpoint acquisition process and saves a context, including the state of the processor of the information processing apparatus, in a memory area corresponding to the predetermined virtual page address; a procedure of determining, when an exception due to a page write violation occurs, whether the page on which the page write violation occurred is the page corresponding to the predetermined virtual page address; and a procedure of setting, when the page on which the page write violation occurred is determined to be the page corresponding to the predetermined virtual page address, the pages whose write protection was released back to the write-protected state, thereby resetting all pages of the page table allocated to the operating system to the write-protected state.

According to the present invention, a system stop caused by a hardware failure or a software defect can be avoided without using a complicated configuration such as hardware multiplexing or clustering.

Embodiments of the present invention are described below with reference to the drawings.

First, the configuration of an information processing apparatus according to an embodiment of the present invention is described with reference to FIG. 1. This information processing apparatus is a computer that uses an existing PC architecture such as the IA architecture, and is implemented, for example, as a server computer that executes various kinds of business processing.

FIG. 1 schematically shows the configuration of this computer. As shown, the computer includes one or more CPUs (processors) 111, a memory management unit (MMU) 112, a memory 113 composed of one or more memory modules, a disk drive device 114 such as a hard disk drive (HDD), and various other I/O devices 115 such as a graphics controller and a LAN controller. The memory management unit (MMU) 112 is connected to the memory 113, the disk drive device 114, and the I/O devices 115 via a system bus 100.

The memory management unit (MMU) 112 is a mechanism that supports the virtual memory function. By paging, it translates virtual addresses of an operating system running in a virtual environment (virtual machine) into physical addresses. The MMU 112 may also be implemented inside the CPU 111. A multiprocessor configuration may use, for example, a single CPU 111 containing multiple cores, or multiple CPUs 111.

On this computer, a virtual machine monitor is executed. The virtual machine monitor is virtualization software that virtualizes hardware resources such as the CPU and I/O devices and runs existing operating systems in the resulting virtual environment. It processes the virtual memory of an operating system running in the virtual environment in units of pages and controls the real hardware resources.

FIG. 2 shows the relationship between the hardware and the software of this computer.

All control of hardware resources (real hardware) such as the CPU 111, the memory 113, the disk drive device 114, and the other I/O devices 115 is performed by the virtual machine monitor (hypervisor) 200. The virtual machine monitor 200 provides a virtual environment (virtual machine) to each operating system (guest OS). That is, the virtual machine monitor 200 virtualizes the hardware resources and allocates logical hardware resources to each guest OS. As a result, multiple guest OSs can each run independently in the virtual environment that the virtual machine monitor 200 provides for each guest OS.

FIG. 3 shows an example of the virtual memory function of this computer. The virtual memory function uses paging to translate addresses in a virtual address space (virtual memory space) into addresses in a physical address space (physical memory space) in units of pages of a predetermined size. A page table is used for this address translation. The MMU 112 is provided with a translation lookaside buffer (TLB), which caches the contents of the page table.

FIG. 3 shows an example of page-based memory management on an x86 CPU (with 32-bit addressing). When the CPU 111 accesses data in the virtual address space, the 32-bit virtual address of that data consists of, for example, a page directory index, a page table index, and a page offset. The page directory index in the virtual address selects a page directory entry (PDE) in the page directory (PD) designated by the CR3 register of the CPU. The value of the selected PDE designates, from among multiple page tables (PT), the page table that corresponds to the virtual address. The page table index in the virtual address selects a page table entry (PTE) in the designated page table. The value of this PTE gives the address of the physical page in physical memory that corresponds to the virtual address. Within this physical page, the address given by the page offset in the virtual address is the physical address corresponding to the virtual address. Whether each PTE in the page table is valid or invalid indicates whether a physical page corresponding to the virtual address accessed by the CPU exists in the physical memory 113.

In this way, the page table (PT) is used to translate a virtual page address into a physical page address.
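The two-level lookup described for FIG. 3 can be illustrated with a small sketch. It assumes the standard x86 32-bit layout with 4 KiB pages (10-bit page directory index, 10-bit page table index, 12-bit offset); the text itself does not state these field widths. The function names and the dictionary-based tables are hypothetical and stand in for the in-memory structures walked by the MMU.

```python
def split_virtual_address(vaddr):
    """Split a 32-bit virtual address into (PD index, PT index, page offset).

    Assumes 10/10/12-bit fields, as in x86 32-bit paging with 4 KiB pages.
    """
    pd_index = (vaddr >> 22) & 0x3FF   # top 10 bits select the PDE
    pt_index = (vaddr >> 12) & 0x3FF   # next 10 bits select the PTE
    offset = vaddr & 0xFFF             # low 12 bits: offset within the page
    return pd_index, pt_index, offset


def translate(page_directory, vaddr):
    """Walk a two-level table and return the physical address.

    page_directory maps a PD index to a page table (a dict), and each page
    table maps a PT index to the base address of a physical page. A missing
    entry models an invalid (not present) PDE or PTE.
    """
    pd_index, pt_index, offset = split_virtual_address(vaddr)
    page_table = page_directory.get(pd_index)
    if page_table is None:
        raise LookupError("page directory entry not present")
    frame_base = page_table.get(pt_index)
    if frame_base is None:
        raise LookupError("page table entry not present")
    return frame_base + offset
```

For instance, with a directory that maps PD index 1 and PT index 3 to the physical page at 0x7000, the virtual address 0x00403ABC resolves to that page plus the offset 0xABC.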

FIG. 4 shows an example of the memory allocation used in this embodiment.

In this embodiment, the following areas are allocated in the memory space of the physical memory 113: a memory area 301 for guest OS1, a memory area 302 for guest OS2, a memory area 303 for guest OS3, a memory area 304 for the virtual machine monitor (hypervisor), an update history memory area 305 for guest OS1, an update history memory area 306 for guest OS2, an update history memory area 307 for guest OS3, and a failure replacement memory area 308 for guest OS1 to OS3.

For example, guest OS1 can access the memory area 301 for guest OS1 using a virtual address space of 0 to 2 GB, and guest OS2 can access the memory area 302 for guest OS2 using a virtual address space of 0 to 1 GB.

The update history memory area 305 for guest OS1 stores the history of memory updates made by guest OS1. In this area, the virtual machine monitor 200 accumulates the pre-update data of the data in the memory area 301 that guest OS1 updates.

The update history memory area 306 for guest OS2 stores the history of memory updates made by guest OS2. In this area, the virtual machine monitor 200 accumulates the pre-update data of the data in the memory area 302 that guest OS2 updates.

The update history memory area 307 for guest OS3 stores the history of memory updates made by guest OS3. In this area, the virtual machine monitor 200 accumulates the pre-update data of the data in the memory area 303 that guest OS3 updates.

These update history memory areas 305 to 307 are memory areas managed by the virtual machine monitor 200.

The failure replacement memory area 308 is a storage area used as a substitute for a storage location when a hardware failure occurs at that location in the physical memory area allocated to a guest OS. The failure replacement memory area 308 is also managed by the virtual machine monitor 200.

An outline of the pre-update data accumulation process executed by the virtual machine monitor 200 is given next. To keep the description simple, the case of a single guest OS is used as an example.

The virtual machine monitor 200 sets all pages of the page table corresponding to the memory area allocated to the guest OS to a write-protected state. For example, the bit indicating the writable attribute of each page table entry may be turned off. When the guest OS refers to the page table in order to access a write-protected page, a page write violation exception (or interrupt) occurs in the CPU, and control passes to the virtual machine monitor 200. In response to the page write violation exception, the virtual machine monitor 200 acquires the pre-update data of the page on which the violation occurred from the memory area of the guest OS and saves it in the update history memory area for that guest OS.

After saving the pre-update data, the virtual machine monitor 200 releases the write protection of the page on which the page write violation occurred, so that the guest OS can continue its write to that page. The guest OS can then write to the page normally.

By accumulating pre-update data in the update history memory area in this way, the contents of the guest OS memory area can be restored using the update history memory area.
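The pre-update data accumulation described above can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the class name `HypervisorModel`, its methods, and the use of Python lists in place of real page tables and CPU exceptions are all assumptions made for the example.

```python
PAGE_SIZE = 4096


class HypervisorModel:
    """Toy model of the write-protect, log, and unprotect sequence."""

    def __init__(self, num_pages):
        # Memory area allocated to the guest OS, one bytearray per page.
        self.guest_memory = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        # Writable attribute per page; every page starts write-protected.
        self.writable = [False] * num_pages
        # Update history area: (page number, pre-update page contents).
        self.update_history = []

    def on_write_violation(self, page):
        # Save the pre-update data first, then release write protection.
        self.update_history.append((page, bytes(self.guest_memory[page])))
        self.writable[page] = True

    def guest_write(self, page, offset, value):
        # A write to a protected page traps into the monitor; afterwards
        # the guest's write continues as if nothing had happened.
        if not self.writable[page]:
            self.on_write_violation(page)
        self.guest_memory[page][offset] = value

    def restore(self):
        # Undo all logged updates, newest first, and protect pages again.
        while self.update_history:
            page, old_contents = self.update_history.pop()
            self.guest_memory[page][:] = old_contents
            self.writable[page] = False
```

Because protection is released after the first fault, later writes to the same page are not logged again, so the history holds one pre-update image per modified page.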

The pre-update data accumulation process is controlled so that it is synchronized with the timing of the checkpoint acquisition process. The checkpoint acquisition process periodically saves the information needed to re-execute a process from an intermediate state of its execution. The point in time at which this information is saved is called a checkpoint, and saving the information is called acquiring a checkpoint. In the checkpoint acquisition process, a context including the CPU state (program counter, register values, and so on) is saved as the information needed for re-execution. The checkpoint acquisition process can be executed by the kernel of the guest OS. In that case, the guest OS saves the acquired context in a memory area corresponding to a predetermined virtual page address of the guest OS. The virtual machine monitor 200 can control the pre-update data accumulation process using a write to the virtual page address used for checkpoint acquisition as the trigger.

That is, when an exception due to a page write violation occurs, the virtual machine monitor 200 first determines whether the page on which the violation occurred is the page corresponding to the virtual page address used for checkpoint acquisition. The page designated by this virtual page address serves as an interface page between the guest OS and the virtual machine monitor 200.

If the page on which the page write violation occurred is the page corresponding to the virtual page address used for checkpoint acquisition, the virtual machine monitor 200 sets the pages whose write protection was released back to the write-protected state, thereby resetting all pages of the page table allocated to the guest OS to the write-protected state.

In this way, all pages of the page table are reset to the write-protected state at every checkpoint, and during the period from one checkpoint to the next, only the pre-update data corresponding to the first update of each page is accumulated.
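The checkpoint-triggered reset can be sketched as follows. The class `CheckpointTracker` and its fields are hypothetical names. Two points are assumptions not stated in the text above: that the history collected in the previous interval is discarded at each checkpoint, and that the checkpoint page itself is protected again after each write so that the next checkpoint also traps.

```python
class CheckpointTracker:
    """Toy model of the fault handler that recognizes the checkpoint page."""

    def __init__(self, checkpoint_page):
        self.checkpoint_page = checkpoint_page
        self.unprotected = set()   # pages whose write protection was released
        self.logged_pages = []     # pages logged in the current interval
        self.checkpoints_seen = 0

    def on_write_violation(self, page):
        if page == self.checkpoint_page:
            # Checkpoint detected: protect every page again.
            self.unprotected.clear()
            # Assumption: a new interval starts with an empty history.
            self.logged_pages.clear()
            self.checkpoints_seen += 1
            return
        # Ordinary page: log it once, then allow further writes.
        self.logged_pages.append(page)
        self.unprotected.add(page)

    def guest_write(self, page):
        # Only writes to protected pages trap into the handler.
        if page not in self.unprotected:
            self.on_write_violation(page)
```

In this model a page written many times between two checkpoints appears in `logged_pages` only once, which mirrors the "first update only" behavior described above.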

The virtual machine monitor 200 also performs accesses to I/O devices in synchronization with the timing of the checkpoint acquisition process.

That is, the virtual machine monitor 200 accumulates the I/O requests issued by the guest OS in order to delay their execution until the next checkpoint acquisition process is executed. When the page on which a page write violation occurred is determined to be the page corresponding to the virtual page address used for checkpoint acquisition, that is, when the checkpoint acquisition process has been executed, the virtual machine monitor 200 executes the accumulated I/O requests.

By delaying the execution of I/O requests in this way, when a failure occurs, not only the state of the CPU and memory but also the state of the I/O devices can be restored to the state at the immediately preceding checkpoint.

Because the virtual machine monitor 200 directly controls all hardware resources, it can easily control when an I/O request from the guest OS is actually executed.
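The deferral of I/O requests until the next checkpoint can be sketched as a simple queue. The class `DeferredIO` and its method names are hypothetical, and the device is modeled as a plain callable. Dropping the queued requests on rollback is an assumption: the text only states that delaying execution allows the I/O device state to be restored to the preceding checkpoint.

```python
class DeferredIO:
    """Toy model of I/O requests held back until the next checkpoint."""

    def __init__(self, device):
        self.device = device   # callable that performs one I/O request
        self.pending = []      # requests issued since the last checkpoint

    def submit(self, request):
        # The guest's request is accumulated, not executed yet.
        self.pending.append(request)

    def on_checkpoint(self):
        # The checkpoint has been acquired: execute requests in issue order.
        for request in self.pending:
            self.device(request)
        self.pending.clear()

    def on_rollback(self):
        # Assumption: requests issued after the last checkpoint never
        # reached the device, so they are simply dropped; the re-executed
        # guest is expected to issue them again.
        self.pending.clear()
```

Since nothing issued after a checkpoint reaches the device until the following checkpoint, rolling back the CPU and memory never leaves the device ahead of the restored state.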

The virtual machine monitor 200 also executes the following processing when a hardware failure occurs. An ECC error in the memory 113 is used as an example.

When an ECC error occurs, an NMI (non-maskable interrupt) is raised and control passes to the virtual machine monitor 200.

The virtual machine monitor 200 determines whether the ECC error is a correctable error (1-bit error). If the ECC error is not correctable (2-bit error), the virtual machine monitor 200 uses the saved context (checkpoint information) to restore the CPU to the state corresponding to the immediately preceding checkpoint, and uses the pre-update data saved in the update history memory area to restore the guest OS memory area to its state at that checkpoint.

In addition, the virtual machine monitor 200 allocates a free physical page from the failure replacement memory area 308 to the virtual page address at which the uncorrectable error occurred, and sets the page address of that free physical page in the page table entry corresponding to that virtual page address.
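The ECC error handling above can be summarized in a short sketch. The function `handle_ecc_error`, its parameters, and the returned status strings are hypothetical. The page table is modeled as a dict from virtual page number to physical page address, the failure replacement area as a list of free page addresses, and the CPU and memory restoration as a caller-supplied `rollback` callable. Treating a correctable error as needing no further action is an assumption.

```python
def handle_ecc_error(correctable, faulty_vpage, page_table, spare_pages, rollback):
    """Toy model of the NMI handler for a memory ECC error."""
    if correctable:
        # 1-bit error: assumed to need neither rollback nor remapping.
        return "corrected"
    if not spare_pages:
        raise RuntimeError("no free page in the failure replacement area")
    # 2-bit error: restore CPU context and guest memory to the last checkpoint.
    rollback()
    # Remap the failed virtual page onto a free replacement page.
    page_table[faulty_vpage] = spare_pages.pop(0)
    return "rolled back and remapped"
```

The sketch leaves open how the restored page contents reach the replacement page; in this model that detail is hidden inside `rollback`.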

The virtual machine monitor 200 also has the following function for avoiding a system stop caused by a software defect.

The virtual machine monitor 200 first initializes the page table so that no page is present for the virtual page address through which the guest OS calls the panic function. When the guest OS refers to the page table in order to call the panic function, a page fault exception occurs and control passes to the virtual machine monitor 200. In response to the page fault exception, the virtual machine monitor 200 determines whether the faulting page is the page corresponding to the virtual page address through which the panic function code is called.

If the faulting page is the page corresponding to the virtual page address through which the panic function code is called, the virtual machine monitor 200 uses the saved context (checkpoint information) to restore the CPU state to the state at the immediately preceding checkpoint, and uses the pre-update data saved in the update history memory area to restore the guest OS memory area to its state at that checkpoint.

Next, the specific processing executed by the virtual machine monitor 200 will be described.

First, page fault processing by a virtual machine monitor is described. To make the configuration of the present embodiment easier to understand, Xen is first described as an example of a general virtual machine monitor.

The virtual machine monitor Xen generally uses an address translation scheme based on a shadow page table. The shadow page table translates guest OS virtual space addresses to physical memory, and it exists separately from the page table (guest page table), which is the data structure that manages paging of the guest OS virtual address space. The guest page table managed by the guest OS is virtual as far as address translation is concerned: each of its entries stores an address that the guest OS believes to be a physical page address. The shadow page table is the real page table that is actually set in the MMU 112, and it is managed by the virtual machine monitor Xen. Each entry of the shadow page table stores an actual physical page address.

FIG. 5 is a conceptual diagram of the shadow page table of the virtual machine monitor Xen.

To simplify the explanation, a single shadow page table is assumed here. To distinguish the guest OS side from the virtual machine monitor side, terms on the shadow side are prefixed with S (shadow). This prefix is used only for the purpose of explanation: in Xen, PT denotes the page table of the virtual machine monitor, which corresponds to the SPT described here. In the multiple shadow page table scheme, a shadow page table is held for each guest OS page table (guest page table), and each time the guest OS switches its page table, the corresponding shadow page table is switched as well.

When the guest OS runs, the CPU (the real CPU, not a virtual one) performs address translation by referring to the shadow page table (SPT) of the virtual machine monitor rather than to the page table of the guest OS. On IA-32, the invlpg instruction must always be executed in order to flush the TLB. On a CPU on which virtualization is implemented, this instruction causes a general protection exception, and control passes to the virtual machine monitor, which handles the exception.

The virtual machine monitor Xen uses page faults to detect operations on guest OS PTEs. For this purpose, the SPTE in the SPT for the physical memory to which the PTEs of the guest OS PT are mapped is kept invalid. When a page fault exception occurs on a guest OS PTE operation, Xen queues the page on a list called the out of sync list, validates the SPTE of the virtual machine monitor that maps the guest OS PTE (recovering from the exception), and permits the guest OS to write to the PTE. The guest OS then writes to the PTE and executes the invlpg instruction that accompanies the PTE update.

Executing the invlpg instruction (a privileged instruction) raises a general protection exception and passes control to the virtual machine monitor, which reflects the contents of the out of sync list in the SPT and again invalidates the SPTE of the virtual machine monitor Xen that maps the guest OS PTE.

When the guest OS tries to add a PTE to its PT in order to access a page, the CPU accesses the page via the SPD. Because no SPTE exists in the SPT, the address cannot be translated and a page fault occurs. The virtual machine monitor, which handles the page fault, saves the guest OS PT for the faulting virtual space address in the out of sync list and resolves the page fault. Once the fault is resolved, the guest OS updates the PTE in its PT and executes the invlpg instruction to flush the TLB. This raises a general protection exception and passes control to the virtual machine monitor, which compares the PTE state saved in the out of sync list with the current state of the PTE in the guest OS PT. If the guest OS PT currently has no PTE but the out of sync list holds one, the virtual machine monitor judges that the guest OS has invalidated the PTE and invalidates the corresponding PTE in the PT of the virtual machine monitor. If the guest OS PT currently has a PTE but the out of sync list holds none, the virtual machine monitor judges that the guest OS has set the PTE and copies the guest OS PTE to the SPTE (FIG. 5).
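The comparison performed at invlpg time can be illustrated with a minimal sketch. The dictionary-based page tables, the function name `sync_shadow`, and the `translate` map from guest frames to machine frames are assumptions made for illustration only and do not reflect the actual Xen data structures. Only the two cases described above are handled.

```python
def sync_shadow(snapshot, guest_pt, shadow_pt, translate):
    """Reconcile shadow entries with guest entries when invlpg traps.

    snapshot  : PTE state saved on the out of sync list
                (virtual page -> guest frame, or None if no PTE existed)
    guest_pt  : current guest page table (virtual page -> guest frame)
    shadow_pt : shadow page table, updated in place
                (virtual page -> machine frame)
    translate : hypothetical map from guest frame to machine frame
    """
    for vpage, old in snapshot.items():
        new = guest_pt.get(vpage)
        if new is None and old is not None:
            # The guest removed the PTE: invalidate the shadow entry.
            shadow_pt.pop(vpage, None)
        elif new is not None and old is None:
            # The guest installed a PTE: propagate it to the shadow table.
            shadow_pt[vpage] = translate[new]
        # Other combinations are not described in the text and are
        # left untouched in this sketch.
    snapshot.clear()
```

In this sketch the out of sync list is emptied after each synchronization, which is an assumption about its lifetime rather than something stated in the description.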

This concludes the example of how page faults are handled in the virtual machine monitor Xen.

In the present embodiment, in addition to using the shadow page table of the conventional virtual machine monitor, the SPTEs in the SPT for the physical memory to which all PTs of the guest OS are mapped are, for example, set to write-protected. Xen emulates PTE operations on the guest OS PT by using the shadow page table, the out of sync list, and the write-back triggered by the general protection exception on the invlpg instruction executed by the guest OS. The present embodiment further adds a before page image list to the memory management structure of the virtual machine monitor in order to save the state of each guest OS page before it is updated.

In the present embodiment, the SPTE is referred to when the guest OS updates a page. Because the pages corresponding to all PTEs are set to write-protected in the SPTEs, a page fault exception (an exception due to a write violation) occurs and control passes to the virtual machine monitor 200. If the SPTE is write-protected (that is, this is the first write to the page since the last checkpoint), the virtual machine monitor 200 stores the contents of the page that the guest OS is about to update (the pre-update data) in the update history memory area corresponding to that guest OS shown in FIG. 2. It also registers the virtual space address of the page (a guest OS virtual address) and the address within the update history memory area (a physical address of the virtual machine monitor) in the shadow page table entry list, and then releases the write protection of the SPTE corresponding to the page on which the write violation occurred (FIG. 6).

Data is saved in the update history memory area of each guest OS in FIG. 2 as follows. The before page image entries are searched using the guest OS virtual space address (virtual page address) at which the page fault occurred. If the entry for that address is valid, a before image has already been captured, so the corresponding entry in the update history is overwritten with the data at that guest OS virtual space address. If the entry is invalid, no before image has been captured yet, so the data at the guest OS virtual space address is written to the corresponding entry in the update history, an invalid before page image entry is taken, the guest OS virtual space address is set in it, and the entry is made valid (FIG. 8).
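The lookup-then-store behavior of the update history can be sketched as follows. The class name `BeforeImageList` and its dictionary layout are hypothetical; the embodiment describes fixed entries with a valid flag, which this sketch approximates by the presence or absence of a key.

```python
class BeforeImageList:
    """Illustrative update history keyed by guest virtual page address."""

    def __init__(self):
        # guest virtual page address -> saved pre-update page contents
        self.entries = {}

    def save(self, vpage, data):
        """Store a before image for vpage.

        Returns True when a new entry was made valid, and False when an
        already valid entry was overwritten.
        """
        is_new = vpage not in self.entries
        self.entries[vpage] = bytes(data)
        return is_new
```

For example, calling `save(0x1000, b"old")` on an empty list makes a new entry valid, and a second call for the same address overwrites that entry.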

In the present embodiment, an ECC error page list is also used, as shown in FIG. 6. The ECC error page list is used to link pages in which a 2-bit ECC error has occurred. In addition, as shown in FIG. 7, an I/O request list is added as an I/O request management structure for accumulating and managing I/O requests from the guest OS.

Next, the flow of the page fault processing executed by the virtual machine monitor 200 will be described with reference to the flowcharts of FIGS. 9 and 10.

First, the MMU of the CPU or the like determines whether a page write by the guest OS falls within the range of virtual addresses managed by the guest OS (step S11). If it does not, a page fault occurs and the virtual machine monitor 200 executes ordinary write violation exception processing (step S12).

If the page write falls within the range of virtual addresses managed by the guest OS and the target page is write-protected (YES in step S13), a page write violation exception occurs and control passes to the virtual machine monitor 200.

The virtual machine monitor 200 determines whether the page on which the page write violation occurred is something other than the above-described page used as the interface between the guest OS and the virtual machine monitor 200 (the checkpoint acquisition page). In other words, it determines whether the page is not the page corresponding to the guest OS virtual page address to which the context is to be written for checkpoint acquisition (step S14).

If the page on which the page write violation occurred is not the interface page (checkpoint acquisition page) (YES in step S14), the virtual machine monitor 200 reads the pre-update contents (data) of that page from the memory area of the guest OS and saves them in the update history memory area corresponding to that guest OS (step S15). After saving them, the virtual machine monitor 200 updates the page table and sets the page to writable (step S16). The guest OS can then write data to the page on which the page write violation occurred (step S17).

If the page on which the page write violation occurred is the interface page (checkpoint acquisition page) (NO in step S14), the virtual machine monitor 200 sets the interface page to writable and then proceeds to step S21, where it executes the I/O requests held in the I/O request management structure that have not yet been issued. The virtual machine monitor 200 then initializes the data in the memory management structure that manages the pages on which page faults occurred (step S22) and sets all pages to write-protected again (step S23). The guest OS then writes the checkpoint information (context) to the checkpoint acquisition page (step S17).
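The write violation path of steps S14 to S23 can be modeled with a small simulation. The class `Monitor`, the constant `CHECKPOINT_PAGE`, and the set-based representation of write permissions are assumptions for illustration. The sketch also assumes that the interface page itself stays writable after step S23 so that the guest OS can complete its write in step S17.

```python
CHECKPOINT_PAGE = 0  # hypothetical page number of the interface page


class Monitor:
    """Toy model of the write violation handling in steps S14 to S23."""

    def __init__(self, guest_memory):
        self.memory = guest_memory   # page number -> page contents
        self.writable = set()        # pages whose write protection is released
        self.before_images = {}      # page number -> pre-update contents
        self.pending_io = []         # I/O requests held until the next checkpoint
        self.issued_io = []          # I/O requests actually executed

    def on_write_violation(self, page):
        if page != CHECKPOINT_PAGE:
            # S15: save the pre-update contents; S16: allow the write.
            self.before_images[page] = self.memory[page]
            self.writable.add(page)
        else:
            # S21: execute the I/O requests accumulated so far.
            self.issued_io.extend(self.pending_io)
            self.pending_io.clear()
            # S22: reset the bookkeeping for faulted pages.
            self.before_images.clear()
            # S23: write-protect every page again; only the interface page
            # remains writable so the context write can proceed.
            self.writable = {CHECKPOINT_PAGE}

    def guest_write(self, page, data):
        # A write to a protected page traps to the monitor first (S13).
        if page not in self.writable:
            self.on_write_violation(page)
        self.memory[page] = data  # S17
```

Because a page becomes writable after its first violation, only the first write to each page between two checkpoints costs a trap and a copy, which matches the efficiency argument made for the embodiment.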

Next, the flow of the general protection exception processing for the invlpg instruction will be described with reference to the flowchart of FIG. 11.

When the invlpg instruction is executed, the virtual machine monitor 200 determines whether the page used as the interface between the guest OS and the virtual machine monitor 200 is writable (step S31). If the interface page is writable (YES in step S31), the virtual machine monitor 200 sets that page to write-protected (step S32). The virtual machine monitor 200 then uses the out of sync list to synchronize the page table with the shadow page table (step S33).
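A minimal sketch of this handler is shown below. The function name `on_invlpg` and its parameters are hypothetical, and the sketch assumes that the synchronization step runs whether or not the interface page was writable, which the description does not state explicitly.

```python
def on_invlpg(writable_pages, interface_page, sync_shadow_tables):
    """General protection handler sketch for a guest invlpg instruction."""
    # Steps S31 and S32: if the checkpoint path left the interface page
    # writable, protect it again so the next checkpoint write traps.
    if interface_page in writable_pages:
        writable_pages.discard(interface_page)
    # Step S33: reconcile the guest page table with the shadow page table.
    sync_shadow_tables()
```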

Next, the flow of the NMI processing performed on an ECC error will be described with reference to the flowcharts of FIGS. 12 and 13.

When a failure such as a 2-bit ECC error occurs, an NMI (Non Maskable Interrupt) is input to the CPU. Because the NMI handler of the CPU is registered in the virtual machine monitor 200, the NMI is processed by the virtual machine monitor 200. In addition to the conventional NMI handler, the virtual machine monitor 200 includes processing that executes a checkpoint restart. The NMI handler processes an ECC error as follows.

When an ECC error occurs during a memory access by the guest OS, the NMI passes control to the virtual machine monitor 200. The virtual machine monitor 200 determines whether the ECC error is a correctable ECC error (step S41).

If the ECC error is not correctable (NO in step S41), the virtual machine monitor 200 uses the saved pre-update data to restore the memory of the guest OS to its state at the immediately preceding checkpoint (step S42). In step S42, for example, the update history management structure (before page image list) is used to write the memory contents corresponding to each guest OS virtual address back to the physical address, except for the virtual address at which the uncorrectable error occurred. Next, the virtual machine monitor 200 prepares a free physical page from the failure replacement memory area 308 for the virtual page address at which the uncorrectable error occurred (step S43). The virtual machine monitor 200 then sets the page address of the prepared free physical page in the entry of the page table (shadow page table) corresponding to that virtual page address (steps S44 to S46). In this case, the pre-update data corresponding to the virtual page address at which the uncorrectable error occurred is written back to the prepared free physical page (step S44).

After that, the virtual machine monitor 200 displays a message indicating that an ECC error has occurred on a display screen such as a console (step S47).
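The memory side of steps S42 to S46 can be sketched as a single function. The name `restore_after_uncorrectable`, the dictionary representation of memory and of the shadow page table, and the list of spare frames standing in for the failure replacement memory area 308 are all assumptions for illustration.

```python
def restore_after_uncorrectable(memory, shadow_pt, before_images,
                                spare_frames, bad_vpage):
    """Roll guest memory back and remap the page hit by a 2-bit ECC error.

    memory        : machine frame -> page contents
    shadow_pt     : guest virtual page -> machine frame
    before_images : guest virtual page -> pre-update contents
    spare_frames  : free frames of the replacement area (hypothetical list)
    bad_vpage     : guest virtual page where the error occurred
    """
    # S42: write every saved before image back, skipping the faulty page.
    for vpage, data in before_images.items():
        if vpage != bad_vpage:
            memory[shadow_pt[vpage]] = data
    # S43: take a free frame from the replacement area.
    new_frame = spare_frames.pop()
    # S44: write the faulty page's pre-update data to the new frame.
    # The text does not say what happens when no before image exists,
    # so this sketch leaves the frame contents untouched in that case.
    if bad_vpage in before_images:
        memory[new_frame] = before_images[bad_vpage]
    # S44 to S46: point the shadow page table entry at the new frame.
    shadow_pt[bad_vpage] = new_frame
    return new_frame
```

The faulty machine frame is never written again in this sketch; it is simply no longer referenced by the shadow page table.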

If the ECC error is not correctable, the restart processing may be started automatically. In this case, the virtual machine monitor 200 executes the processing that uses the saved pre-update data to restore the memory of the guest OS to its state at the immediately preceding checkpoint. It also prepares a free physical page for the virtual page address at which the uncorrectable error occurred and sets the physical page address of the prepared page in the page table entry corresponding to that virtual page address. In addition, it uses the saved context to restore the CPU state to the state at the immediately preceding checkpoint.

The virtual machine monitor 200 also performs the same processing as for an ECC error when the guest OS calls its panic function.

The occurrence of a panic can be monitored easily, for example, by initializing the page table so that no page is present for the virtual page address through which the guest OS calls the panic function. The procedure is as follows.

1. The virtual machine monitor 200 first initializes the page table so that no page is present for the virtual page address through which the guest OS calls the panic function. Here, for example, the present bit of the page table entry corresponding to that virtual page address is set to OFF.

2. When a page fault exception occurs, the virtual machine monitor 200 determines whether the faulting page is the page corresponding to the virtual page address through which the panic function code is called.

3. If the faulting page is the page corresponding to the virtual page address through which the panic function code is called, the virtual machine monitor 200 uses the saved context (checkpoint information) to restore the CPU state to the state at the immediately preceding checkpoint, and uses the pre-update data saved in the update history memory area to restore the guest OS memory area to its state at that checkpoint.
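Steps 2 and 3 of the procedure above can be sketched as follows. The function name `on_page_fault` and the dictionary models of the CPU context and guest memory are hypothetical, and an ordinary page fault is simply reported back to the caller rather than handled.

```python
def on_page_fault(fault_page, panic_page, cpu, saved_context,
                  memory, before_images):
    """Restart from the last checkpoint if the fault hit the panic page.

    Returns True when a checkpoint restart was performed, and False when
    the fault is an ordinary one to be handled elsewhere.
    """
    # Step 2: is this the page through which the panic function is called?
    if fault_page != panic_page:
        return False
    # Step 3: restore the CPU state from the saved context ...
    cpu.clear()
    cpu.update(saved_context)
    # ... and restore guest memory from the saved pre-update data.
    memory.update(before_images)
    return True
```

Pages that were not modified since the checkpoint have no before image and are left as they are, since their contents already match the checkpoint.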

When a database or the like is used, it is also useful to allow recovery in response to a request from a user (application). In this case, setting up a "restart" interface page makes it possible to perform a restart at the user's request.

Next, the processing for acquiring a checkpoint will be described.

The checkpoint acquisition processing can be implemented inside the kernel of the guest OS. On Windows (registered trademark) it can be implemented as a kernel driver, and on Linux (registered trademark) as a kernel thread. The kernel threads are bound to the logical CPUs of the guest OS, one per CPU, and run simultaneously. Using inter-CPU communication, one of them is chosen as the representative and acquires the checkpoint.

The flowchart of FIG. 14 shows the flow of the checkpoint acquisition processing.

For example, a plurality of guest OSs communicate using inter-CPU communication to determine a representative from among the guest OSs (step S51). At each checkpoint, each guest OS determines whether it is the representative, and if it is not, it waits until the checkpoint acquisition processing is complete (steps S52 and S53). The representative guest OS executes the checkpoint acquisition processing by writing the checkpoint information (context) to the page used as the interface with the virtual machine monitor (steps S53 and S54).
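The representative selection can be illustrated with ordinary threads standing in for the kernel threads bound to each logical CPU. The function `run_checkpoint`, the list used as the interface page, and the use of `threading.Barrier` for both the election and the wait are assumptions of this sketch, not the mechanism of the embodiment.

```python
import threading


def run_checkpoint(n_cpus, interface_page, read_context):
    """Let one of n_cpus worker threads write the checkpoint context."""
    elect = threading.Barrier(n_cpus)
    done = threading.Barrier(n_cpus)

    def worker():
        # Barrier.wait() returns a distinct index to each thread; the
        # thread that receives 0 acts as the representative (S51, S52).
        if elect.wait() == 0:
            # S54: only the representative writes the context.
            interface_page.append(read_context())
        # S53: every thread waits until the checkpoint is complete.
        done.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `run_checkpoint(4, page, read_context)` with an empty list `page` leaves exactly one context in it, regardless of which thread is elected.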

As described above, according to the present embodiment, pre-update data is accumulated in the update history memory area, so the contents of the guest OS memory area can be restored from that area. The accumulation of pre-update data is controlled so as to be synchronized with the timing of the checkpoint acquisition processing by using the page that serves as the interface between the guest OS and the virtual machine monitor. As a result, all pages of the page table can be reset to the write-protected state each time a checkpoint is acquired, and during the period from one checkpoint to the next, only the pre-update data for the first update of each page is accumulated, which is efficient. Furthermore, the interface page between the guest OS and the virtual machine monitor makes it possible to efficiently delay the execution of I/O requests issued by the guest OS until the next checkpoint acquisition processing is executed. The state of the entire system, including the I/O devices, can therefore be correctly returned to the immediately preceding checkpoint and restarted.

Therefore, a system stop caused by a hardware failure or a software defect on a computer that uses the virtualization function can be avoided without resorting to a complicated configuration such as hardware multiplexing or clustering.

In the present embodiment, an example has been described in which, in addition to the guest page table, a page table that holds the physical page addresses managed by the virtual machine monitor is implemented as a shadow page table. This is only an example, and a configuration that uses only a page table that translates the virtual page addresses of the guest OS into the physical page addresses managed by the virtual machine monitor may also be used.

In addition, since the pre-update data saving processing, the failure recovery processing, and the like of the present embodiment are all executed by the virtual machine monitor, the same effects as those of the present embodiment can easily be obtained simply by installing the virtual machine monitor on an ordinary computer through a computer-readable storage medium and executing it.

The present invention is not limited to the above embodiment as it is, and in the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing the configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the relationship between hardware and software in the information processing apparatus of the embodiment.
FIG. 3 is a diagram for explaining an example of the virtual storage function used in the information processing apparatus of the embodiment.
FIG. 4 is a diagram showing an example of memory allocation in the information processing apparatus of the embodiment.
FIG. 5 is a diagram schematically showing the shadow page table of a conventional virtual machine monitor.
FIG. 6 is a diagram for explaining the memory management structure of the virtual machine monitor used in the information processing apparatus of the embodiment.
FIG. 7 is a diagram for explaining the I/O request management structure of the virtual machine monitor used in the information processing apparatus of the embodiment.
FIG. 8 is a diagram for explaining the update history management structure of the virtual machine monitor used in the information processing apparatus of the embodiment.
FIG. 9 is a flowchart for explaining part of the flow of the page fault processing executed by the information processing apparatus of the embodiment.
FIG. 10 is a flowchart for explaining the remaining part of the flow of the page fault processing executed by the information processing apparatus of the embodiment.
FIG. 11 is a flowchart for explaining the flow of the general protection exception processing executed by the information processing apparatus of the embodiment.
FIG. 12 is a flowchart for explaining part of the flow of the NMI processing executed on an ECC error by the information processing apparatus of the embodiment.
FIG. 13 is a flowchart for explaining the remaining part of the flow of the NMI processing executed on an ECC error by the information processing apparatus of the embodiment.
FIG. 14 is a flowchart for explaining the flow of the checkpoint acquisition processing executed by the information processing apparatus of the embodiment.

Explanation of symbols

111: CPU, 112: memory management unit, 113: memory, 114: disk drive device, 115: I/O device, 200: virtual machine monitor, 301: memory area for guest OS1, 302: memory area for guest OS2, 303: memory area for guest OS3, 304: memory area for the virtual machine monitor (hypervisor), 305: update history memory area of guest OS1, 306: update history memory area of guest OS2, 307: update history memory area of guest OS3, 308: failure replacement memory area for guest OS1 to OS3.

Claims

An information processing apparatus that has a virtualization function and a virtual storage function and that processes, in units of pages, the virtual storage of an operating system operating in a virtual environment, the apparatus comprising:
Means for causing a virtual machine monitor that controls the virtual environment to set all pages of a page table corresponding to a memory area allocated to the operating system to a write-inhibited state;
Pre-update data storage means for causing the virtual machine monitor to execute, in response to a page write violation exception that occurs when the operating system refers to the page table and accesses a write-inhibited page, a process of acquiring the pre-update data of the page in which the page write violation occurred from the memory area of the operating system and storing that data in a memory area managed by the virtual machine monitor;
Means for causing the virtual machine monitor to execute a process of releasing, after the pre-update data has been stored, the write inhibition of the page in which the page write violation occurred, so that the operating system continues the write to that page;
Checkpoint acquisition means for periodically executing a checkpoint acquisition process and storing a context including a state of the processor of the information processing apparatus in a memory area corresponding to a predetermined virtual page address of the operating system; and
Means for causing the virtual machine monitor to execute, when an exception due to the page write violation occurs, a process of determining whether or not the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address and, if it is so determined, a resetting process of returning the pages whose write inhibition was released to the write-inhibited state so that all pages of the page table allocated to the operating system are again in the write-inhibited state.
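
The write-inhibit, save, and release cycle recited in this claim can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions, not the claimed implementation: the class `GuestMemory`, its fields, and the 4-byte page size are hypothetical, the exception is simulated by a flag check, and clearing the saved pre-update data when a new checkpoint begins is an assumption that the claim itself does not recite.

```python
PAGE_SIZE = 4  # unrealistically small, to keep the example readable


class GuestMemory:
    """Toy model of one guest's memory as seen by the virtual machine monitor."""

    def __init__(self, num_pages, checkpoint_page):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.writable = [False] * num_pages  # all pages start write-inhibited
        self.undo_log = {}                   # page number -> pre-update copy
        self.checkpoint_page = checkpoint_page
        self.checkpoints_seen = 0

    def write(self, page, offset, value):
        """Guest store; a store to a write-inhibited page traps first."""
        if not self.writable[page]:
            self._on_write_violation(page)
        self.pages[page][offset] = value

    def _on_write_violation(self, page):
        """Monitor-side handler for the simulated page write violation."""
        if page == self.checkpoint_page:
            # The guest is saving its context: a new checkpoint interval begins.
            self.undo_log.clear()  # assumption: older pre-update data is dropped
            self.writable = [False] * len(self.pages)  # re-inhibit every page
            self.checkpoints_seen += 1
        else:
            self.undo_log[page] = bytes(self.pages[page])  # save pre-update data
            self.writable[page] = True                     # release inhibition
```

In this model only the first write to a page after a checkpoint reaches the handler, so the cost of saving pre-update data is paid once per modified page per checkpoint interval.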

The information processing apparatus according to claim 1, further comprising means for causing the virtual machine monitor to execute a process of accumulating I/O requests issued from the operating system, in order to delay execution of those I/O requests until the next checkpoint acquisition process is executed, and a process of executing the accumulated I/O requests when it is determined that the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address.
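
The deferral of I/O requests described in this claim can be sketched as a simple queue that is drained at the next checkpoint. The sketch is illustrative only: `DeferredIO`, `submit`, `on_checkpoint`, and `on_rollback` are hypothetical names, the device is modeled as a plain callable, and discarding queued requests on rollback is an assumption about why the delay is useful, not something the claim states.

```python
class DeferredIO:
    """Holds guest I/O requests until the next checkpoint is taken."""

    def __init__(self, device):
        self.device = device  # callable that actually performs one request
        self.pending = []     # requests issued since the last checkpoint

    def submit(self, request):
        """Called when the guest issues an I/O request: queue it, do not run it."""
        self.pending.append(request)

    def on_checkpoint(self):
        """Called when the checkpoint page is written: release queued requests."""
        released, self.pending = self.pending, []
        for request in released:
            self.device(request)  # issue order is preserved
        return len(released)

    def on_rollback(self):
        """Assumption: requests from a discarded interval are never issued."""
        self.pending.clear()
```

Because nothing reaches the device before the checkpoint, a rollback to the previous checkpoint in this model never has to undo an externally visible operation.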

The information processing apparatus according to claim 1, further comprising:
Means for causing the virtual machine monitor to execute, when an ECC error occurs due to a memory access by the operating system, a process of determining whether or not the ECC error is a correctable error; and
Means for causing the virtual machine monitor to execute, if the ECC error is not a correctable error, a process of restoring the state of the processor to the state corresponding to the immediately preceding checkpoint using the stored context and restoring the memory state of the operating system to the point corresponding to the immediately preceding checkpoint using the stored pre-update data, and a process of allocating a predetermined physical page to the virtual page address at which the uncorrectable error occurred and setting the physical page address of the predetermined physical page in the entry of the page table corresponding to that virtual page address.
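
A rough sketch of the recovery path for an uncorrectable ECC error follows. It is a simplification under stated assumptions: memory, the page table, and the saved pre-update data are plain dictionaries, the function name `recover_from_uncorrectable_ecc` and its parameters are hypothetical, and remapping the faulty page before restoring memory is a choice made for this sketch so that the restored data lands on the replacement page. The sketch does not cover a faulty page for which no pre-update copy was saved.

```python
def recover_from_uncorrectable_ecc(memory, undo_log, page_table,
                                   saved_context, faulty_vpage, spare_ppage):
    """Roll a guest back to its last checkpoint and retire a faulty page.

    memory:        dict, physical page number -> bytes
    undo_log:      dict, virtual page number -> pre-update bytes
    page_table:    dict, virtual page number -> physical page number
    saved_context: dict holding the processor state saved at the checkpoint
    """
    # Point the faulty virtual page at the predetermined replacement page.
    page_table[faulty_vpage] = spare_ppage

    # Restore every page modified since the checkpoint from its saved copy.
    for vpage, old_data in undo_log.items():
        memory[page_table[vpage]] = old_data
    undo_log.clear()

    # Hand back a copy of the processor state to resume from.
    return dict(saved_context)
```

After the call, the faulty physical page is no longer referenced by the page table, so execution resumed from the returned context does not touch it again.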

The information processing apparatus according to claim 2, further comprising:
Means for causing the virtual machine monitor to execute a process of setting the page table to a state in which the page corresponding to the virtual page address that calls the panic function of the operating system does not exist; and
Means for causing the virtual machine monitor to execute, when an exception due to a page fault occurs, a process of determining whether or not the page in which the page fault occurred is the page corresponding to the virtual page address that calls the code of the panic function and, if it is, a process of restoring the state of the processor to the state corresponding to the immediately preceding checkpoint using the stored context and restoring the memory state of the operating system to the point corresponding to the immediately preceding checkpoint using the stored pre-update data.
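
The panic detection in this claim can be pictured as a trap armed on a single page. The following sketch is hypothetical throughout: `PanicTrap`, the `present` dictionary standing in for page-table present bits, and the `rollback` callback are illustrative names, and the actual restore of processor and memory state is left to the caller.

```python
class PanicTrap:
    """Detects a guest panic by faulting on the page that calls the panic function."""

    def __init__(self, present, panic_vpage):
        self.present = present          # dict, virtual page number -> present bit
        self.panic_vpage = panic_vpage
        self.present[panic_vpage] = False  # arm the trap: mark the page not present

    def on_page_fault(self, vpage, rollback):
        """Monitor-side page fault handler; returns what was done."""
        if vpage == self.panic_vpage:
            rollback()                  # restore context and memory to the checkpoint
            return "rolled_back"
        return "ordinary_fault"         # any other fault is handled as usual
```

The design lets the monitor notice a software failure without modifying the guest: the guest's own attempt to reach its panic path raises the fault that triggers recovery.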

A failure recovery processing method for recovering an information processing apparatus having a virtualization function and a virtual storage function from a failure, the method comprising:
A step of causing a virtual machine monitor that controls a virtual environment to execute a setting process of setting, to a write-inhibited state, all pages of a page table corresponding to a memory area allocated to an operating system operating in the virtual environment;
A step of causing the virtual machine monitor to execute, in response to a page write violation exception that occurs when the operating system refers to the page table and accesses a write-inhibited page, a process of acquiring the pre-update data of the page in which the page write violation occurred from the memory area of the operating system and storing that data in a memory area managed by the virtual machine monitor;
A step of causing the virtual machine monitor to execute a process of releasing, after the pre-update data has been stored, the write inhibition of the page in which the page write violation occurred, so that the operating system continues the write to that page;
A checkpoint acquisition step of periodically executing a checkpoint acquisition process and storing a context including a state of the processor of the information processing apparatus in a memory area corresponding to a predetermined virtual page address of the operating system; and
A step of causing the virtual machine monitor to execute, when an exception due to the page write violation occurs, a process of determining whether or not the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address and, if it is so determined, a resetting process of returning the pages whose write inhibition was released to the write-inhibited state so that all pages of the page table allocated to the operating system are again in the write-inhibited state.

The failure recovery processing method according to claim 5, further comprising a step of causing the virtual machine monitor to execute a process of accumulating I/O requests issued from the operating system, in order to delay execution of those I/O requests until the next checkpoint acquisition process is executed, and a process of executing the accumulated I/O requests when it is determined that the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address.

A program that functions as a virtual machine monitor that virtualizes hardware resources of a computer having a virtual storage function, the program causing the computer to execute:
A procedure for setting all pages of a page table corresponding to a memory area allocated to an operating system operating in a virtual environment controlled by the virtual machine monitor to a write-inhibited state;
A pre-update data storage procedure for executing, in response to a page write violation exception that occurs when the operating system refers to the page table and accesses a write-inhibited page, a process of acquiring the pre-update data of the page in which the page write violation occurred from the memory area of the operating system and storing that data in a memory area managed by the virtual machine monitor;
A procedure for executing a process of releasing, after the pre-update data has been stored, the write inhibition of the page in which the page write violation occurred, so that the operating system continues the write to that page;
A determination procedure for determining, when an exception due to the page write violation occurs, whether or not the page in which the page write violation occurred is a page corresponding to a predetermined virtual page address of the operating system, the operating system including a task that periodically executes a checkpoint acquisition process and stores a context including a state of a processor of the computer in a memory area corresponding to the predetermined virtual page address; and
A procedure for, when it is determined that the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address, returning the pages whose write inhibition was released to the write-inhibited state so that all pages of the page table allocated to the operating system are again in the write-inhibited state.

The program according to claim 7, further causing the computer to execute:
A procedure for accumulating I/O requests issued from the operating system, in order to delay execution of those I/O requests until the next checkpoint acquisition process is executed; and
A procedure for executing the accumulated I/O requests when it is determined that the page in which the page write violation occurred is a page corresponding to the predetermined virtual page address.

The program according to claim 7, further causing the computer to execute:
A procedure for determining, when an ECC error occurs due to a memory access by the operating system, whether or not the ECC error is a correctable error;
A procedure for, if the ECC error is not a correctable error, restoring the state of the processor to the state corresponding to the immediately preceding checkpoint using the stored context and restoring the memory state of the operating system to the point corresponding to the immediately preceding checkpoint using the stored pre-update data; and
A procedure for allocating a predetermined physical page to the virtual page address at which the uncorrectable error occurred and setting the physical page address of the predetermined physical page in the entry of the page table corresponding to that virtual page address.

The program according to claim 7, further causing the computer to execute:
A procedure for setting the page table to a state in which the page corresponding to the virtual page address that calls the panic function of the operating system does not exist;
A procedure for determining, when an exception due to a page fault occurs, whether or not the page in which the page fault occurred is the page corresponding to the virtual page address that calls the code of the panic function; and
A procedure for, if the page is the page corresponding to the virtual page address that calls the code of the panic function, restoring the state of the processor to the state corresponding to the immediately preceding checkpoint using the stored context and restoring the memory state of the operating system to the point corresponding to the immediately preceding checkpoint using the stored pre-update data.