JP2007323142A

JP2007323142A - Information processing apparatus and its control method

Info

Publication number: JP2007323142A
Application number: JP2006149730A
Authority: JP
Inventors: Yuji Fujiwara; 勇治藤原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-05-30
Filing date: 2006-05-30
Publication date: 2007-12-13

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus that has a plurality of virtual machines and that, even when a serious error occurs in connection with the operation of a particular virtual machine, can continue the operation of the other virtual machines unaffected by the error.

SOLUTION: The information processing apparatus having a plurality of virtual machines has: an associating means for associating a particular virtual machine with the hardware used by that virtual machine; an error detecting means for detecting errors in the hardware used by the particular virtual machine; and an operation stopping means for stopping, upon detection of an error, the operation of the virtual machine associated with that hardware.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an information processing apparatus and a control method thereof, and more particularly to an information processing apparatus that includes a virtual machine environment and a control method thereof.

An information processing apparatus such as a personal computer or a server usually includes a CPU, a main memory, and various peripheral devices connected to them via appropriate buses. When an error occurs for some reason in the main memory or a peripheral device, or when a communication error such as a parity error occurs in communication with a peripheral device, the apparatus detects the error. If the detected error is serious, defensive measures such as stopping the system are often taken to prevent the damage from spreading.

It is difficult to eliminate errors entirely. Minimizing the impact of an error when it occurs, and making the location of the error easy to identify, are therefore very important technical issues for an information processing apparatus.

For example, Patent Document 1 discloses a technique for error handling when a communication error is detected on a PCI bus, which is widely used in recent information processing apparatuses.

According to the technique disclosed in Patent Document 1, when a parity error or the like is detected on the PCI bus, the apparatus not only detects the error and stops the system but also identifies the PCI device that caused the parity error and records it, which makes subsequent maintenance work such as repairing or replacing the PCI device easier.

Meanwhile, recent information processing apparatuses, servers in particular, are increasingly adopting the concept of the virtual machine.

In a virtual machine, hardware such as the main memory and peripheral devices connected to it is virtualized as software, and the virtualized hardware (in reality, software) is controlled by an OS called a guest OS. The virtualized hardware and the real hardware are associated with each other by software called a virtual machine monitor (hereinafter, VMM (Virtual Machine Monitor)).

A virtual machine that operates under a guest OS can also operate under the management of a host OS. In this case, the VMM operates under the management of the host OS.

Normally, one information processing apparatus has one host OS, and this host OS can handle a plurality of virtual machines (that is, a plurality of guest OSs).

For example, with Microsoft Windows XP (Windows is a registered trademark) as the host OS, a plurality of guest OSs of different types, such as Windows 98 (Windows is a registered trademark) and UNIX (registered trademark), can run under its management. This configuration makes it possible to use a new OS as the host OS while continuing to use, as a guest OS, an older OS that is no longer supported.

In addition, by using an older OS with relatively weak security functions as a guest OS and a newer OS with strong security functions as the host OS, the information processing apparatus as a whole can achieve high security.

Virtual machines thus have many advantages, and some are already available as products, such as “VMware” from VMware and “Virtual Server” from Microsoft. If the overhead processing time inherent in virtual machines is shortened as CPU performance improves, virtual machines are expected to become even more widespread.

Patent Document 1: Japanese Patent Laid-Open No. 2003-22222

Incidentally, stopping the entire system when a serious error such as an uncorrectable memory error is detected is also the practice in an information processing apparatus that has a plurality of virtual machines. Normally, the real physical memory is divided into a plurality of areas and the divided areas are assigned to the respective virtual machines. Conventionally, however, no correspondence between an area and its virtual machine has been available when a memory error occurs, so the only available measure has been to stop the entire system, including all of the virtual machines.

The present invention has been made in view of the above circumstances. Its object is to provide an information processing apparatus having a plurality of virtual machines, and a control method thereof, that can continue the operation of the other virtual machines unaffected by an error even when a serious error related to the operation of a specific virtual machine occurs.

To solve the above problem, an information processing apparatus according to the present invention, as set forth in claim 1, is an information processing apparatus having a plurality of virtual machines, comprising: associating means for associating a specific virtual machine with hardware used by the specific virtual machine; error detecting means for detecting an error in the hardware used by the specific virtual machine; and operation stopping means for stopping, when the error is detected, the operation of the virtual machine associated with that hardware.

Likewise, a control method of an information processing apparatus according to the present invention, as set forth in claim 8, is a control method of an information processing apparatus having a plurality of virtual machines, comprising: an associating step of associating a specific virtual machine with hardware used by the specific virtual machine; an error detecting step of detecting an error in the hardware used by the specific virtual machine; and an operation stopping step of stopping, when the error is detected, the operation of the virtual machine associated with that hardware.

According to the information processing apparatus and the control method thereof of the present invention, in an information processing apparatus having a plurality of virtual machines, the operation of the other virtual machines unaffected by an error can be continued even when a serious error related to the operation of a specific virtual machine occurs.

An information processing apparatus 1 and a control method thereof according to an embodiment of the present invention will be described with reference to the accompanying drawings.

(1) Configuration
FIG. 1 is a diagram showing an example of the system configuration of the information processing apparatus 1, with particular emphasis on the software configuration.

The information processing apparatus 1 according to the present embodiment has a plurality of virtual machines (in the example of FIG. 1, n + 1 virtual machines, from virtual machine (0) VM0 to virtual machine (n) VMn).

The information processing apparatus 1 includes hardware 3 (see FIG. 2 for a detailed example) and software 2. The software 2 includes a plurality of virtual machines (VM0 to VMn) and a virtual machine monitor VMM that associates these virtual machines with the real hardware.

Each virtual machine (hereinafter, the virtual machines VM0 to VMn are collectively referred to simply as virtual machines VM) has its own OS, called a guest OS (guest OS (0) GOS0 to guest OS (n) GOSn), and each virtual machine VM operates under its guest OS.

The guest OS of each virtual machine VM is, for example, Windows (registered trademark), Linux (registered trademark), or UNIX. The guest OSs used by the virtual machines VM may be of different types or of the same type.

In a virtual machine environment, the guest OS does not access the real hardware 3 directly; instead, it accesses virtualized hardware (hardware implemented as software). The virtual machine monitor VMM performs the hardware virtualization and also maps the virtualized hardware to the real hardware 3.

In each virtual machine VM, various application software (App0 to Appm) runs under the management of the guest OS. The application software (App0 to Appm) may differ from one virtual machine VM to another, or the same kind of application software may be used in more than one virtual machine VM.

The example of FIG. 1 further includes a host OS 4. The host OS 4 is an OS used in an ordinary environment (that is, an environment that is not a virtual machine environment) and is called a host OS to distinguish it from a guest OS. Like the guest OS, the host OS is, for example, Windows, Linux, or UNIX. Ordinary application software (App0 to Appm) also runs under the management of the host OS 4, as does the virtual machine monitor VMM.

Note that a configuration without a host OS, that is, one consisting only of the virtual machines VM, the virtual machine monitor VMM, and the hardware 3, is also possible.

FIG. 2 is a diagram showing the configuration of the information processing apparatus 1 with particular emphasis on an example configuration of the hardware 3.

The hardware 3 includes, for example, a CPU 31, a chipset (MCH) 32 (MCH: Memory Control Hub), and a chipset (ICH) 33 (ICH: I/O Control Hub). The main memory 34 is connected to the chipset (MCH) 32, and a video controller 35 is also connected to it via BUS1. A display 39 is in turn connected to the video controller 35.

The main memory 34 consists of, for example, DIMMs (Dual In-line Memory Modules) with ECC (Error Correction Code).

The chipset (ICH) 33, for its part, is connected to, for example, a LAN card 36 via BUS2, an IDE controller 37 via BUS3, and a SCSI controller 38 via BUS4. IDE devices such as an HDD (Hard Disk Drive) 41 and an optical disk (not shown) are connected to the IDE controller 37, and various SCSI devices 42 are connected to the SCSI controller 38.

BUS1 to BUS4 are PCI-family buses, for example a PCI bus, a PCI-X bus, or a PCI Express bus.

FIG. 3 is a block diagram showing an example configuration of the function implementing means of the information processing apparatus 1 according to the present embodiment, in particular those characteristic of the present invention.

The information processing apparatus 1 includes error detection means 51, association means 52, an association table 53, and operation stop means 54.

The error detection means 51 detects hardware errors such as memory errors and bus communication errors. Specifically, for a memory error, the chipset (MCH) 32, to which the main memory 34 is connected, serves as the error detection means 51. For a bus communication error, for example a PCI bus parity error called SERR or PERR, the chipset (ICH) 33 or the chipset (MCH) 32 connected to the PCI buses (BUS1 to BUS4) serves as the error detection means.

When a serious error such as a memory error or a parity error is detected by the chipsets 32 and 33, the chipsets are normally configured to generate an SMI (System Management Interrupt). When an SMI occurs, it is delivered directly to the BIOS without passing through the OS, and the SMI handler in the BIOS starts and stops the entire system of the information processing apparatus 1.

By changing the settings of the chipsets 32 and 33, however, an interrupt called an SCI (System Control Interrupt) can be generated instead of the SMI. The SCI can be delivered to the OS or to the virtual machine monitor VMM rather than to the BIOS.

In the information processing apparatus 1 according to the present embodiment, the chipsets 32 and 33 are configured to generate an SCI when a memory error, a parity error, or the like is detected, and the generated SCI is delivered to the virtual machine monitor VMM. In other words, in the present embodiment, the entire system of the information processing apparatus 1 is not stopped (as it would be under the normal SMI setting) even when a memory error, a parity error, or the like occurs.
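
The difference between the two interrupt settings can be summarized with a small model. The following Python sketch is illustrative only: the names `ErrorInterrupt` and `route_serious_error` are hypothetical, and the chipset-specific registers that actually select the interrupt are not modeled.

```python
from enum import Enum


class ErrorInterrupt(Enum):
    SMI = "SMI"  # delivered to the BIOS SMI handler
    SCI = "SCI"  # delivered to the OS or the VMM


def route_serious_error(mode: ErrorInterrupt) -> str:
    """Return which component handles a serious hardware error, and the outcome.

    Hypothetical model: real chipsets select the interrupt through
    chipset-specific settings that are not represented here.
    """
    if mode is ErrorInterrupt.SMI:
        # Normal setting: the BIOS handler stops the whole apparatus.
        return "BIOS SMI handler: stop entire system"
    # Setting used in the embodiment: the VMM receives the notification.
    return "VMM: identify and stop only the affected virtual machine"
```

Under this model, only the SCI setting gives the virtual machine monitor a chance to limit the stop to a single virtual machine.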

The SCI is delivered to the association means 52 of the virtual machine monitor VMM. On recognizing the SCI, the association means 52 reads the registers of the chipsets 32 and 33 and identifies where the error occurred. For a memory error, for example, it identifies the memory address at which the error occurred; for a parity error, it identifies the bus on which the parity error occurred.

The virtual machine monitor VMM, meanwhile, knows which hardware each virtual machine VM uses, because it has the function of associating the virtual machines VM with the real hardware. For the main memory 34, for example, as illustrated in FIG. 4, the area of the main memory 34 is divided in advance into an area for the virtual machine monitor VMM and areas for the virtual machines VM (VM0 to VMn), so that each virtual machine VM is assigned the area it may use. The virtual machine monitor VMM holds an association table (memory allocation table) 53 that associates the address areas in the main memory 34 with the virtual machines VM.

FIG. 5 is a diagram showing an example of the memory allocation table 53. In this example, the area from start address 4000_0000h to end address 7FFF_FFFFh is assigned to virtual machine VM0, and the area from start address 8000_0000h to end address BFFF_FFFFh is assigned to virtual machine VM1.

Once the association means 52 of the virtual machine monitor VMM has identified the address at which the memory error occurred, it refers to the memory allocation table 53 and identifies the virtual machine VM that uses that address.

For example, if the memory error occurred at address 8000_0FFFh, the virtual machine VM using that address is identified as virtual machine VM1.
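
The table lookup described above can be sketched as a simple range search. This is a minimal sketch, assuming a list-of-tuples layout for the table; the names `MEMORY_ALLOCATION_TABLE` and `find_owner` are hypothetical, and only the two address ranges quoted for FIG. 5 are included.

```python
from typing import Optional

# (start, end, owner) rows; the two ranges are the ones quoted from FIG. 5.
# Rows for the VMM itself and for VM2..VMn are omitted because the text
# does not give their addresses.
MEMORY_ALLOCATION_TABLE = [
    (0x4000_0000, 0x7FFF_FFFF, "VM0"),
    (0x8000_0000, 0xBFFF_FFFF, "VM1"),
]


def find_owner(error_address: int) -> Optional[str]:
    """Return the VM whose assigned area contains error_address, else None."""
    for start, end, owner in MEMORY_ALLOCATION_TABLE:
        if start <= error_address <= end:  # end addresses are inclusive
            return owner
    return None


print(find_owner(0x8000_0FFF))  # the example address from the text; prints VM1
```

A linear scan is enough for a handful of virtual machines; how the table is actually stored and searched is not specified in the description.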

The operation stop means 54 instructs the identified virtual machine VM, for example virtual machine VM1, to stop.

Specifically, the operation stop means 54 of the virtual machine monitor VMM generates an NMI (Non-Maskable Interrupt) only for the guest OS of virtual machine VM1, which stops the operation of virtual machine VM1. No NMI is generated for the other virtual machines VM (those to which healthy memory areas are assigned). As a result, only virtual machine VM1, which uses the area where the memory error occurred, stops, and the other virtual machines VM can continue operating.

(2) Operation
The operation (control method) of the information processing apparatus 1 configured as described above will be described with reference to flowcharts.

FIG. 6 is a flowchart describing, as a first operation example, the operation when a memory error occurs. FIG. 7 shows where each operation takes place on the configuration diagram; the circled numbers in FIG. 7 correspond to the step numbers in FIG. 6.

First, suppose that a memory error occurs in the memory assigned to virtual machine VM1 (step ST1).

The chipset (MCH) 32 detects that a memory error has occurred (step ST2). Here, when the main memory 34 is memory with an error correction function, “memory error” means an uncorrectable error.

Next, the chipset (MCH) 32 generates an SCI and notifies the virtual machine monitor VMM (step ST3).

On receiving the notification, the virtual machine monitor VMM reads the registers of the chipset (MCH) 32 and obtains the error address (step ST4).

The virtual machine monitor VMM then compares the obtained error address with the memory allocation table and determines that the error occurred in the memory assigned to virtual machine VM1 (step ST5).

After that, the virtual machine monitor VMM generates an NMI for the guest OS of virtual machine VM1 to stop the operation of virtual machine VM1 (step ST6). The virtual machines VM other than virtual machine VM1 continue operating.
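
Steps ST1 to ST6 can be traced end to end in a small simulation. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the classes `Chipset` and `Vmm`, the field `error_address_register`, and the function `report_uncorrectable_error` are hypothetical names, and the NMI is modeled as a simple flag change.

```python
from dataclasses import dataclass, field


@dataclass
class Chipset:
    """Stand-in for the chipset (MCH) 32; the register name is hypothetical."""
    error_address_register: int = 0


@dataclass
class Vmm:
    # (start, end, vm) rows; the two ranges are taken from the FIG. 5 example.
    table: list = field(default_factory=lambda: [
        (0x4000_0000, 0x7FFF_FFFF, "VM0"),
        (0x8000_0000, 0xBFFF_FFFF, "VM1"),
    ])
    running: dict = field(default_factory=lambda: {"VM0": True, "VM1": True})

    def on_sci(self, chipset: Chipset) -> None:
        address = chipset.error_address_register   # ST4: read the register
        for start, end, vm in self.table:          # ST5: find the owning VM
            if start <= address <= end:
                self.inject_nmi(vm)                # ST6: stop that VM only
                return
        # The description does not say what happens when no VM owns the
        # address; this sketch simply does nothing in that case.

    def inject_nmi(self, vm: str) -> None:
        # Models the NMI delivered to one guest OS; other VMs are untouched.
        self.running[vm] = False


def report_uncorrectable_error(chipset: Chipset, vmm: Vmm, address: int) -> None:
    chipset.error_address_register = address       # ST1/ST2: error detected
    vmm.on_sci(chipset)                            # ST3: SCI to the VMM


mch, vmm = Chipset(), Vmm()
report_uncorrectable_error(mch, vmm, 0x8000_0FFF)
print(vmm.running)  # prints {'VM0': True, 'VM1': False}
```

Only the entry for VM1 changes, which mirrors the statement that the other virtual machines continue operating.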

FIG. 8 is a flowchart describing, as a second operation example, the operation when a parity error occurs on a PCI bus. FIG. 9 shows where each operation takes place on the configuration diagram; the circled numbers in FIG. 9 correspond to the ones digit of the step numbers in FIG. 8.

First, suppose that virtual machine VM1 accesses the HDD 41 attached to the IDE controller 37 (step ST11).

In response, the virtual machine monitor VMM attempts to access the IDE controller 37 (step ST12).

Suppose that, at this point, SERR, which indicates a parity error, occurs on the PCI bus (BUS3) to which the IDE controller 37 is connected (step ST13).

When SERR occurs on the PCI bus, the chipset (ICH) 33 detects it (step ST14).

The chipset (ICH) 33 then generates an SCI for the virtual machine monitor VMM (step ST15).

On receiving the SCI, the virtual machine monitor VMM reads the registers of the chipset (ICH) 33 and determines that SERR occurred on BUS3 (step ST16).

The virtual machine monitor VMM stops the operation of virtual machine VM1, which accessed BUS3 (in practice, the HDD 41 connected via BUS3) (step ST17).

The virtual machines VM other than virtual machine VM1, which have not accessed BUS3 where SERR occurred, continue operating.

Besides SERR, the parity errors defined for the PCI bus include PERR; in that case too, the operation of virtual machine VM1 can be stopped by exactly the same processing as for SERR described above.

In the second operation example described above, when a parity error occurs on BUS3 accessed by virtual machine VM1, the operation of that virtual machine VM1 is stopped.

However, if another virtual machine VM, for example virtual machine VM2, tries to access BUS3 after a parity error has occurred on BUS3, a serious failure may occur in virtual machine VM2 as well. In such a case, the failure can be avoided by stopping the operation of virtual machine VM2.

To realize this operation, the virtual machine monitor VMM may hold a bus management table such as the one illustrated in FIG. 10.

FIG. 10(a) shows the initial state of the bus management table, in which all buses (for example, BUS1 to BUS4) are marked “normal”.

If, during operation, an abnormality of BUS3 is detected as a result of an access from, for example, virtual machine VM1, the operation of virtual machine VM1 is stopped and, as shown in FIG. 10(b), the entry for bus “3” in the bus management table is changed to “abnormal”.

When another virtual machine VM, for example virtual machine VM2, subsequently accesses BUS3, the virtual machine monitor VMM refers to the bus management table, confirms that BUS3 is abnormal, and then stops the operation of virtual machine VM2, which accessed BUS3. This prevents a serious failure from occurring in virtual machine VM2.
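
The bus management table and the two situations it covers (the first access that triggers the parity error, and a later access by another virtual machine) can be sketched as follows. This is a hypothetical model: the class `BusManager` and its method names are assumptions for illustration, and stopping a virtual machine is again represented by a flag.

```python
class BusManager:
    """Hypothetical VMM-side bookkeeping modeled on the FIG. 10 description."""

    def __init__(self, buses, vms):
        self.bus_state = {bus: "normal" for bus in buses}  # FIG. 10(a)
        self.running = {vm: True for vm in vms}

    def on_parity_error(self, bus, accessing_vm):
        """SERR or PERR reported on `bus` during an access by `accessing_vm`."""
        self.bus_state[bus] = "abnormal"                   # FIG. 10(b)
        self.running[accessing_vm] = False

    def access(self, vm, bus):
        """Return True if the access may proceed; otherwise stop the VM."""
        if self.bus_state[bus] == "abnormal":
            self.running[vm] = False
            return False
        return True


manager = BusManager(buses=[1, 2, 3, 4], vms=["VM0", "VM1", "VM2"])
manager.on_parity_error(bus=3, accessing_vm="VM1")  # VM1 hit the error on BUS3
manager.access("VM2", bus=3)                        # VM2 later touches BUS3
manager.access("VM0", bus=2)                        # BUS2 is still normal
print(manager.running)  # prints {'VM0': True, 'VM1': False, 'VM2': False}
```

In this sketch VM0 keeps running because it only uses a bus that is still marked “normal”.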

As described above, according to the information processing apparatus 1 and its control method of the present embodiment, in an information processing apparatus having a plurality of virtual machines, the operation of the other virtual machines unaffected by an error can be continued even when a serious error related to the operation of a specific virtual machine occurs.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a diagram showing an example of the system configuration of an information processing apparatus according to an embodiment of the present invention, with emphasis on the software configuration.
FIG. 2 is a diagram showing an example of the system configuration of the information processing apparatus according to an embodiment of the present invention, with emphasis on the hardware configuration.
FIG. 3 is a block diagram showing an example configuration of the function implementing means in the information processing apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the system configuration of the information processing apparatus according to an embodiment of the present invention, with emphasis on the area allocation of the main memory.
FIG. 5 is a diagram showing an example of the memory allocation table held by the virtual machine monitor.
FIG. 6 is a flowchart showing an example of processing when a memory error occurs in the main memory (first operation example).
FIG. 7 is a diagram explaining, on a system configuration diagram, the example of processing when a memory error occurs in the main memory (first operation example).
FIG. 8 is a flowchart showing an example of processing when a parity error occurs on a PCI bus (second operation example).
FIG. 9 is a diagram explaining, on a system configuration diagram, the example of processing when a parity error occurs on a PCI bus (second operation example).
FIG. 10 is a diagram showing an example of the bus management table held by the virtual machine monitor.

Explanation of symbols

1 Information processing apparatus
31 CPU
32 Chipset (MCH)
33 Chipset (ICH)
34 Main memory
37 IDE controller
41 HDD
51 Error detection means
52 Association means
53 Association table (memory allocation table)
54 Operation stop means
VM0 to VMn Virtual machine (0) to virtual machine (n)
VMM Virtual machine monitor
GOS0 to GOSn Guest OS (0) to guest OS (n)

Claims

An information processing apparatus including a plurality of virtual machines, the apparatus comprising:
association means for associating a specific virtual machine with hardware used by the specific virtual machine;
error detection means for detecting an error in the hardware used by the specific virtual machine; and
operation stopping means for stopping, when the error is detected, the operation of the virtual machine associated with the hardware.

The information processing apparatus according to claim 1, wherein
the hardware is a main memory,
the association means divides the area of the main memory, assigns the divided areas to the respective virtual machines, and associates each area with its virtual machine, and
the operation stopping means stops the operation of the virtual machine to which the area where the error was detected is assigned.

The information processing apparatus according to claim 2, wherein
the main memory is a memory with an error correction function, and
the operation stopping means stops the operation of the virtual machine when the detected error is uncorrectable.
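
The memory-related claims above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Region`, `MemoryAllocationTable`, `owner_of`, and `handle_memory_error` are hypothetical, the address ranges are invented, and the set of running virtual machines stands in for whatever mechanism the virtual machine monitor actually uses to stop a machine.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One divided area of main memory (hypothetical layout)."""
    start: int  # first physical address of the area (inclusive)
    end: int    # end of the area (exclusive)
    vm: str     # virtual machine the area is assigned to


class MemoryAllocationTable:
    """Hypothetical association table: memory area -> virtual machine."""

    def __init__(self, regions):
        self.regions = list(regions)

    def owner_of(self, address):
        """Return the VM assigned to the area containing address, or None."""
        for region in self.regions:
            if region.start <= address < region.end:
                return region.vm
        return None


def handle_memory_error(table, running, address, correctable):
    """Stop only the VM that owns the failing area.

    Correctable errors stop nothing. Returns the name of the VM that was
    stopped, or None if no VM was stopped.
    """
    if correctable:
        return None
    vm = table.owner_of(address)
    if vm in running:
        running.discard(vm)  # stand-in for "stop the operation of the VM"
        return vm
    return None
```

With two areas assigned to VM0 and VM1, an uncorrectable error reported at an address in the second area removes only VM1 from the running set, and VM0 continues.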

The information processing apparatus according to claim 1, wherein the hardware error is a bus communication error of a PCI device used by the specific virtual machine.

The information processing apparatus according to claim 4, wherein the communication error of the PCI device is SERR or PERR.

The information processing apparatus according to claim 1, wherein
the hardware error is a bus communication error of a PCI device used by the specific virtual machine,
the apparatus further comprising means for stopping, when another virtual machine attempts to access the bus of the PCI device in which the error was detected, the operation of that other virtual machine.
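
The PCI-related claims above can be sketched in the same hedged way. Everything named here is an assumption made for illustration: `BusManagementTable`, `on_bus_error`, and `access_bus` are hypothetical, the bus identifiers are invented, and removing a name from the `running` set stands in for actually stopping a virtual machine.

```python
class BusManagementTable:
    """Hypothetical table: PCI bus -> VM using it, plus the set of failed buses."""

    def __init__(self, owners):
        self.owners = dict(owners)  # bus id -> name of the VM using that bus
        self.failed = set()         # buses on which an error has been detected


def on_bus_error(table, running, bus, kind):
    """Record a SERR or PERR on a bus and stop the VM associated with it."""
    if kind not in ("SERR", "PERR"):
        raise ValueError("unsupported PCI error kind: " + kind)
    table.failed.add(bus)
    vm = table.owners.get(bus)
    running.discard(vm)  # no effect if vm is None or already stopped
    return vm


def access_bus(table, running, vm, bus):
    """Return True if the access may proceed.

    If the bus has already failed, the accessing VM is stopped as well
    and False is returned.
    """
    if bus in table.failed:
        running.discard(vm)
        return False
    return True
```

In this model a virtual machine that never touches the failed bus keeps running, which is the isolation property the claims aim at.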

The information processing apparatus according to claim 1, wherein the error detection means is implemented by a chipset, and the chipset, upon detecting an error, notifies the operation stopping means by an SCI interrupt rather than by an SMI interrupt.

A control method for an information processing apparatus including a plurality of virtual machines, the method comprising:
an associating step of associating a specific virtual machine with hardware used by the specific virtual machine;
an error detection step of detecting an error in the hardware used by the specific virtual machine; and
an operation stop step of stopping, when the error is detected, the operation of the virtual machine associated with the hardware.

The information processing apparatus control method according to claim 8, wherein
the hardware is a main memory,
the associating step divides the area of the main memory, assigns the divided areas to the respective virtual machines, and associates each area with its virtual machine, and
the operation stop step stops the operation of the virtual machine to which the area where the error was detected is assigned.

The information processing apparatus control method according to claim 9, wherein
the main memory is a memory with an error correction function, and
the operation stop step stops the operation of the virtual machine when the detected error is uncorrectable.

The information processing apparatus control method according to claim 8, wherein the hardware error is a bus communication error of a PCI device used by the specific virtual machine.

The information processing apparatus control method according to claim 11, wherein the communication error of the PCI device is SERR or PERR.

The information processing apparatus control method according to claim 8, wherein
the hardware error is a bus communication error of a PCI device used by the specific virtual machine,
the method further comprising a step of stopping, when another virtual machine attempts to access the bus of the PCI device in which the error was detected, the operation of that other virtual machine.

The information processing apparatus control method according to claim 8, wherein the error detection step is performed by a chipset, and the chipset, upon detecting an error, notifies the operation stop step by an SCI interrupt rather than by an SMI interrupt.