JP2009211517A - Virtual computer redundancy system

Info

Publication number: JP2009211517A
Application number: JP2008055056A
Authority: JP
Inventors: Fumihiro Makiyama; 文博牧山
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-03-05
Filing date: 2008-03-05
Publication date: 2009-09-17
Anticipated expiration: 2028-03-05
Also published as: JP5392594B2

Abstract

PROBLEM TO BE SOLVED: To identify the hardware-dependent cause of a failure by means of dedicated hardware, independently of the state of the host operating system, when a hardware failure occurs in an active virtual machine system. SOLUTION: The virtual machine redundancy system includes an active computer system and a standby computer system that stands by as a backup for the active computer system. The active computer system includes an active host operating system and an active guest operating system that runs on a virtual machine provided by the active host operating system. The standby computer system includes a standby host operating system and a standby guest operating system. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a virtual machine redundancy system. In particular, it relates to a virtual machine redundancy system that, when a failure occurs in the hardware of an active virtual machine system running a plurality of guest operating systems on a host operating system, collects an operating system memory dump in external storage and hands over the processing of the running guest operating systems to standby guest operating systems at high speed.

Virtual machine redundancy systems having a plurality of cluster-managed virtual machines are known. In such a system, the virtual machine control units running on the respective host operating systems cooperate with one another. If a failure occurs in one of the virtual machine systems, the processing of the guest operating systems running on the virtual machine control unit responsible for that system is migrated to a normally operating virtual machine system in order to shorten downtime. However, a virtual machine control unit implemented in software cannot, by itself, readily identify a hardware-dependent cause of failure.

One example of this type of technology is the invention described in Japanese Patent Laid-Open No. 2002-32244 (see Patent Document 1). In that invention, if some malfunction occurs in the virtual machine control unit while a virtual machine is in operation and the virtual machine can no longer run, the memory image of the guest operating system is saved to a storage medium. The memory image is then converted to dump-format data and output, which makes it easier to investigate and analyze the cause of the malfunction.

The first problem with that invention is as follows. If a hardware failure occurs while the virtual machine system is in operation, the host operating system and the virtual machine control unit running on it may become uncontrollable. When they do, each guest operating system that was running under the control of the virtual machine control unit also becomes uncontrollable, so the dump collection means provided in each guest operating system can no longer be used.

The reason is that the virtual machine control unit runs on the host operating system. When a hardware failure occurs, the host operating system itself can no longer operate normally, and the virtual machine control unit becomes uncontrollable as well. Consequently, each guest operating system also becomes uncontrollable, and the dump collection means cannot be started.

The second problem is that a scheme in which guest operating system processing is migrated from the active system to the standby system through cooperation between software virtual machine control units incurs a certain amount of downtime when a hardware failure occurs. The reason is that the virtual machine control units monitor each other's liveness by heartbeat, so a fixed timeout period must elapse before a lost heartbeat is detected.
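The detection delay described above can be illustrated with a minimal heartbeat monitor. This is only a sketch: the class name `HeartbeatMonitor`, the 3-second timeout, and the timestamps are assumptions chosen for illustration and do not appear in the patent.

```python
class HeartbeatMonitor:
    """Declares the peer dead once no heartbeat has arrived for timeout_s seconds.

    Hypothetical helper; the patent only states that heartbeat monitoring
    needs a fixed timeout before a failure is noticed.
    """

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_beat = now

    def beat(self, now: float) -> None:
        # Record the arrival time of the latest heartbeat.
        self.last_beat = now

    def peer_is_dead(self, now: float) -> bool:
        # The peer is only declared dead after the timeout has fully elapsed.
        return now - self.last_beat > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat(now=10.0)  # last heartbeat before the hardware fails
```

Even if the hardware fails immediately after the heartbeat at t = 10.0, `peer_is_dead` keeps returning `False` until t exceeds 13.0, so takeover cannot begin earlier than the timeout allows.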

Japanese Patent Laid-Open No. 7-219802 (see Patent Document 2) describes an invention of a duplex control method. In this method, so that processing can be switched to the standby system and continued when a failure is detected while the main system is processing, data written to the storage unit of the main system is also written to the storage unit of the standby system, keeping the contents of both storage units identical. Writes made to the main storage unit are stored sequentially in an internal buffer by a storage control unit that operates independently of the main CPU. On an instruction from the main CPU, the storage control unit starts storing subsequent changes to the main storage unit in a different buffer area. In parallel, it applies the changes recorded in the previous buffer area to the storage unit of the standby system.

Japanese Patent Laid-Open No. 8-287021 (see Patent Document 3) describes an invention of a plurality of computer systems coupled to a shared memory. It concerns an electronic computer system that couples at least one real computer (hereinafter, a real cluster) to a shared memory that is an external storage device. A real cluster having an operating system (hereinafter, OS) for controlling the real cluster and the individual guest clusters within a real cluster operated as virtual machines, or at least one virtual machine system having an OS for controlling the virtual machine system, is connected to the shared memory.

Japanese Patent Laid-Open No. 9-305424 (see Patent Document 4) describes an invention of a duplexed processor system. The invention includes an MPU and a main storage device connected by a main memory bus, and a shared data matching device (hereinafter, CME) having a transmission/reception circuit that exchanges shared information with the partner system. The main storage device has a shared area for storing shared data. The CME has memory access information acquisition means, memory access means, shared area setting means, and shared data monitoring means. The memory access information acquisition means snoops, from the main memory bus, access information that includes the address and data the MPU writes to the main storage device. The memory access means writes information received from the partner system to the shared area when that information is shared data. The shared area setting means specifies the range of the shared area. The shared data monitoring means judges access information or received information to be shared data when the address it contains lies within the range of the shared area.

Patent Document 1: Japanese Patent Laid-Open No. 2002-32244
Patent Document 2: Japanese Patent Laid-Open No. 7-219802
Patent Document 3: Japanese Patent Laid-Open No. 8-287021
Patent Document 4: Japanese Patent Laid-Open No. 9-305424

An object of the present invention is to identify, by means of dedicated hardware and without depending on the state of the host operating system, the hardware-dependent cause of a failure when a hardware failure occurs in an active virtual machine system.

Another object of the present invention is to greatly shorten downtime even when a hardware failure occurs in the active system, so that the processing of the active guest operating systems is taken over as the processing of the standby guest operating systems.

A virtual machine redundancy system according to one aspect of the present invention includes an active computer system and a standby computer system that stands by as a backup for the active computer system. The active computer system includes an active host operating system and an active guest operating system that runs on a virtual machine provided by the active host operating system. The standby computer system includes a standby host operating system and a standby guest operating system.

According to the present invention, when a hardware failure occurs in an active virtual machine system, the hardware-dependent cause of the failure can be identified by dedicated hardware without depending on the state of the host operating system.

In addition, even when a hardware failure occurs in the active system, downtime can be greatly shortened, and the processing of the active guest operating systems can be taken over as the processing of the standby guest operating systems.

One of the best modes for carrying out the present invention is described in detail below with reference to the drawings. Referring to FIG. 1, the virtual machine redundancy system of one embodiment has an active computer system SYS1, a standby computer system SYS2, and a shared storage 19. The active computer system SYS1 and the standby computer system SYS2 include physical hardware HW1 and HW2, respectively, and basic input/output control systems 7 and 8, respectively. In the active computer system SYS1, a host OS (OS1) runs, and guest OSes OS1-A to OS1-C run on the host OS OS1. In the standby computer system SYS2, a host OS (OS2) and guest OSes OS2-A to OS2-C stand by as backups for the active system.

The host OS OS1 has a virtual machine control unit 1, a failure detection unit 3, and a dump unit 5. The virtual machine control unit 1 implements part of the functionality of the host OS OS1: it provides virtual machine resources to the guest OSes OS1-A to OS1-C and performs memory management and control. The failure detection unit 3, also part of the host OS OS1, is started when a fatal hardware failure occurs; it starts the dump unit 5 and automatically restarts the system SYS1. The dump unit 5, also part of the host OS OS1, is started by the failure detection unit 3 and collects a memory dump. In the host OS OS2, a virtual machine control unit 2, a failure detection unit 4, and a dump unit 6 stand by as backups for the virtual machine control unit 1, the failure detection unit 3, and the dump unit 5, respectively.

The basic input/output control systems 7 and 8 are firmware. They provide the host OSes OS1 and OS2 with input/output control services between the host OSes and the physical hardware HW1 and HW2, respectively, and provide the central processing units 9 and 10 with a system management mode, which is a program for handling failures when they occur.

The physical hardware HW1 and HW2 include central processing units 9 and 10, physical memory control units 11 and 12, physical memories 13 and 14, physical hardware management units 15 and 16, and physical I/O control units 17 and 18, respectively. The active physical memory control unit 11 controls reads and writes to the physical memory 13 and, on a write to a physical address for which writing is permitted, also transfers the memory contents to the physical hardware management unit 15. The standby physical memory control unit 12 controls reads and writes to the physical memory 14 and writes to the physical memory 14 when the physical hardware management unit 16 requests such a write. The active physical hardware management unit 15 receives memory copies from the physical memory control unit 11 and transfers them to the standby system. It also detects a fatal hardware failure, raises an interrupt to the central processing unit 9, and sends the standby system a control transfer notification together with the register and context information of the central processing unit 9.

The standby physical hardware management unit 16 passes received memory copies to the physical memory control unit 12. On receiving a control transfer notification, it raises an interrupt to the central processing unit 10 and causes it to take over and continue operation from the register and context information received from the active system. The active physical I/O control unit 17 controls reads and writes to the shared storage 19 and to the input/output hardware. The standby physical I/O control unit 18 controls reads from the shared storage 19 and reads and writes to the input/output hardware.

The shared storage 19 is accessible from both the active system and the standby system: the active system can read and write it, whereas the standby system can only read it. The shared storage 19 stores the latest file system information of the active guest OSes OS1-A to OS1-C so that the standby guest OSes OS2-A to OS2-C can refer to the active file system after they are switched over to become the active system.

1. The process of synchronizing data between the memory areas of the active guest OSes OS1-A to OS1-C and the standby guest OSes OS2-A to OS2-C is described with reference to FIG. 1.

[Active and standby systems]
The physical memory control units 11 and 12 of the active and standby systems each hold an address management table (see FIG. 5, described later). When the physical memory address information that a virtual machine control unit 1 or 2 manages changes, that unit copies the physical memory information into the address management table of the corresponding physical memory control unit 11 or 12 ([1] in FIG. 1).

[Active system]
When the central processing unit 9 issues a write request ([2] in FIG. 1), the physical memory control unit 11 checks the requested physical address against the physical address information in the address management table. If a matching physical address exists, the unit writes to the physical memory 13 ([3] in FIG. 1) and, at the same time, passes the physical address and the memory data being written to the physical hardware management unit 15 ([4] in FIG. 1). The physical hardware management unit 15 sends the received physical address and memory data to the standby physical hardware management unit 16 ([5] in FIG. 1).
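The active-side write path ([2] to [5] in FIG. 1) can be sketched as follows. This is a simplified software model, not the hardware described in the patent: the class names, the use of a Python `set` for the address management table, and the behavior for addresses that are not in the table (a plain local write without mirroring) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementUnit:
    """Stand-in for physical hardware management unit 15 (hypothetical model)."""
    sent: list = field(default_factory=list)

    def forward(self, address: int, data: bytes) -> None:
        # [5]: send the address/data pair to the standby management unit.
        self.sent.append((address, data))


@dataclass
class ActiveMemoryController:
    """Stand-in for physical memory control unit 11 (hypothetical model)."""
    address_table: set            # address management table
    manager: ManagementUnit
    memory: dict = field(default_factory=dict)

    def write(self, address: int, data: bytes) -> None:
        # [3]: write to the local physical memory.
        self.memory[address] = data
        # [4]: mirror only addresses registered in the address management table.
        # Assumption: unregistered addresses are written locally and not mirrored.
        if address in self.address_table:
            self.manager.forward(address, data)


manager = ManagementUnit()
controller = ActiveMemoryController(address_table={0x1000}, manager=manager)
controller.write(0x1000, b"guest data")   # registered: mirrored
controller.write(0x2000, b"other data")   # not registered: local only
```

After these two writes, only the write to `0x1000` has been queued for the standby side, while both values are present in the local memory model.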

[Standby system]
To write the received physical address and memory data to the physical memory 14, the standby physical hardware management unit 16 requests the physical memory control unit 12 to write to the physical memory 14 ([6] in FIG. 1). On receiving the request, the physical memory control unit 12 checks the requested physical address against the physical address information in its own address management table. Only if a matching physical address exists does it request the central processing unit 10 to release the physical memory bus ([7] in FIG. 1). On receiving the release request, the central processing unit 10 notifies the physical memory control unit 12 that the physical memory bus has been released ([8] in FIG. 1). The physical memory control unit 12 notifies the physical hardware management unit 16 that writing to the specified physical address is permitted, completing preparation to accept the data transfer ([9] in FIG. 1). On receiving this notification, the physical hardware management unit 16 transfers the memory data to the physical memory control unit 12 ([10] in FIG. 1). The physical memory control unit 12 writes the transferred memory data to the specified address in the physical memory 14 ([11] in FIG. 1). When the write completes, it notifies the central processing unit 10 that its exclusive use of the physical memory bus has ended ([12] in FIG. 1).
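The standby-side handshake ([6] to [12] in FIG. 1) can be sketched in the same style. Again this is only an illustrative model: the class and method names are hypothetical, and the bus release is reduced to a boolean flag.

```python
class StandbyCpu:
    """Stand-in for central processing unit 10 (hypothetical model)."""

    def __init__(self) -> None:
        self.bus_released = False

    def release_bus(self) -> None:      # [7], [8]
        self.bus_released = True

    def bus_use_finished(self) -> None:  # [12]
        self.bus_released = False


class StandbyMemoryController:
    """Stand-in for physical memory control unit 12 (hypothetical model)."""

    def __init__(self, address_table: set, cpu: StandbyCpu) -> None:
        self.address_table = address_table
        self.cpu = cpu
        self.memory = {}

    def request_write(self, address: int) -> bool:
        # [6]: only addresses present in the address management table proceed.
        if address not in self.address_table:
            return False
        self.cpu.release_bus()           # [7], [8]
        return True                      # [9]: write permitted

    def transfer(self, address: int, data: bytes) -> None:
        # [10], [11]: accept the data and write it while the bus is released.
        if not self.cpu.bus_released:
            raise RuntimeError("memory bus has not been released")
        self.memory[address] = data
        self.cpu.bus_use_finished()      # [12]


def apply_mirrored_write(ctrl: StandbyMemoryController, address: int, data: bytes) -> bool:
    """What management unit 16 does with one received address/data pair."""
    if not ctrl.request_write(address):
        return False
    ctrl.transfer(address, data)
    return True
```

A write to an address that is missing from the standby table is rejected before the CPU is ever asked to give up the bus, which mirrors the "only if a matching physical address exists" condition in the text.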

2. The migration of the guest OSes OS1-A to OS1-C when a hardware failure occurs is described with reference to FIG. 2.

[Active system]
When the active physical hardware management unit 15 detects a fatal hardware failure, it raises an interrupt to the central processing unit 9 ([1] in FIG. 2). On receiving the interrupt, the central processing unit 9 transfers control to the system management mode of the basic input/output control system 7 loaded in the physical memory 13 ([2] in FIG. 2). When the transition to system management mode issues an instruction to save the current state of the central processing unit 9 ([3] in FIG. 2), the central processing unit 9 passes its current state (register information and context information) to the physical memory control unit 11 ([4] in FIG. 2). This state is saved in the register and context information save area within the system management area of the basic input/output control system 7 loaded in the physical memory 13 ([5] in FIG. 2) and, at the same time, is passed via the physical memory control unit 11 to the physical hardware management unit 15 ([6] in FIG. 2). The physical hardware management unit 15 sends the standby physical hardware management unit 16 a control transfer notification for the guest OSes OS1-A to OS1-C together with the received register and context information of the central processing unit 9 ([7] and [8] in FIG. 2).

Meanwhile, the central processing unit 9, now in system management mode, issues a command to the physical hardware management unit 15 to raise an interrupt to the central processing unit 9 itself in order to start the failure detection unit 3 of the host OS OS1 ([9] in FIG. 2). It then executes a restore instruction and leaves system management mode. When it subsequently receives the interrupt from the physical hardware management unit 15 ([10] in FIG. 2), the central processing unit 9 transfers control to the failure detection unit 3 ([11] in FIG. 2). The failure detection unit 3 starts the dump unit 5, which writes a memory dump to the shared storage 19 via the central processing unit 9, the physical memory control unit 11, and the physical I/O control unit 17. When the memory dump completes and the dump unit 5 reports completion, the failure detection unit 3 automatically restarts the active computer system SYS1.

[Standby system]
When the standby physical hardware management unit 16 receives the control transfer notification and the register and context information from the active physical hardware management unit 15, it changes its own status register to the value indicating the active system and raises an interrupt to the central processing unit 10 ([12] in FIG. 2). On receiving the interrupt, the central processing unit 10 jumps to a predetermined entry address in the instruction code of the basic input/output control system 8 loaded in the physical memory 14 and transfers control to the program that runs in system management mode ([13] in FIG. 2). When the transition to system management mode issues an instruction to save the current state of the central processing unit 10 ([14] in FIG. 2), the central processing unit 10 passes its current state (register information and context information) to the physical memory control unit 12 ([15] in FIG. 2). This state is saved in the context information save area within the system management area of the basic input/output control system 8 loaded in the physical memory 14 ([16] in FIG. 2).

Next, in system management mode, the program checks with the physical hardware management unit 16 whether a control transfer notification for the active guest OSes OS1-A to OS1-C has been received from the active system. Specifically, it checks whether the status register of the physical hardware management unit 16 holds the value indicating the active system ([17] in FIG. 2). If it does, the program retrieves the register and context information of the active central processing unit 9 held by the physical hardware management unit 16 ([18] in FIG. 2) and saves it, via the physical memory control unit 12, in a spare context save area within the system management area ([19] in FIG. 2).

Finally, in system management mode, a restore instruction is issued to the central processing unit 10, which restores the register and context information stored in the spare context save area into its own registers. System operation then resumes from the phase that the active guest OSes OS1-A to OS1-C were processing immediately before the failure ([20] and [21] in FIG. 2). After system operation has resumed, the failure detection unit 4, in cooperation with the virtual machine control unit 2, performs a polling check to diagnose whether the guest OSes OS2-A to OS2-C that have taken over are running normally. If they are not, it takes action such as restarting the guest OSes OS2-A to OS2-C.
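The standby-side takeover decision ([12] to [21] in FIG. 2) can be summarized in a short sketch. The names `Role`, `StandbyManagementUnit`, and `system_management_handler` are hypothetical, CPU state is modeled as a plain dictionary, and the branch taken when no control transfer notification has arrived (restoring the standby CPU's own saved state) is an assumption, since the text only describes the takeover case.

```python
from enum import Enum


class Role(Enum):
    STANDBY = 0
    ACTIVE = 1


class StandbyManagementUnit:
    """Stand-in for physical hardware management unit 16 (hypothetical model)."""

    def __init__(self) -> None:
        self.status_register = Role.STANDBY
        self.peer_context = None

    def on_control_transfer(self, peer_context: dict) -> None:
        # [12]: keep the active CPU's register/context information and set the
        # status register to "active" before interrupting the standby CPU.
        self.peer_context = dict(peer_context)
        self.status_register = Role.ACTIVE


def system_management_handler(own_context: dict, unit: StandbyManagementUnit) -> dict:
    """Return the context the CPU restores when it leaves system management mode."""
    saved_own = dict(own_context)              # [14]-[16]: save the current state
    if unit.status_register is Role.ACTIVE:    # [17]: was control handed over?
        spare_area = dict(unit.peer_context)   # [18], [19]: copy to the spare area
        return spare_area                      # [20], [21]: resume the active workload
    return saved_own                           # assumption: no takeover, resume as before
```

In this model the restore instruction is represented by the return value: after a control transfer notification, the handler returns the active CPU's context rather than the standby CPU's own.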

3. A configuration example of the physical hardware management unit 15 is described with reference to FIG. 3. Referring to FIG. 3, the physical hardware management unit 15 has an interrupt processing unit 20, a log collection unit 21, a status register 22, a data reception unit 23, a data transmission unit 24, an error register 25, an input buffer 35, and an output buffer 36. The interrupt processing unit 20 detects a fatal hardware failure and raises an interrupt to the central processing unit 9. The log collection unit 21 collects detailed log information when a hardware failure occurs. The status register 22 holds information indicating whether the system is the active system or the standby system. The data reception unit 23 handles reception of memory information and control transfer notifications. The data transmission unit 24 handles transmission of memory information and control transfer notifications. The error register 25 holds information (the failure level) indicating whether a detected hardware failure is fatal. The input buffer 35 buffers received memory information and control transfer notifications. The output buffer 36 buffers memory information and control transfer notifications to be transmitted.

An operation example of the physical hardware management unit 15 is as follows. When the physical memory control unit 11 or the physical I/O control unit 17 detects a hardware failure, it notifies the physical hardware management unit 15 and registers, in the error register 25, information indicating whether the failure is fatal. The interrupt processing unit 20 checks the value of the error register 25 and determines whether the reported hardware failure is fatal. If it is, the interrupt processing unit 20 raises an interrupt to the central processing unit 9, and the log collection unit 21 records the failure information. Interrupt requests are passed on to the central processing unit 9 as they are.
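The fatal-failure check described above can be sketched as follows. The class `FailureHandler`, its method names, and the log message format are hypothetical; the handling of a non-fatal failure (no interrupt and no log entry) is an assumption, because the text only specifies the fatal case.

```python
from typing import Callable


class FailureHandler:
    """Models error register 25, interrupt processing unit 20, and log collection unit 21."""

    def __init__(self, raise_cpu_interrupt: Callable[[], None]) -> None:
        self.error_register_fatal = False   # error register 25 (failure level)
        self.failure_log = []               # records kept by log collection unit 21
        self._raise_cpu_interrupt = raise_cpu_interrupt

    def notify_failure(self, source: str, fatal: bool) -> None:
        # The reporting unit registers the failure level in the error register.
        self.error_register_fatal = fatal
        # The interrupt processing unit acts only on fatal failures.
        # Assumption: non-fatal failures cause neither an interrupt nor a log entry.
        if self.error_register_fatal:
            self._raise_cpu_interrupt()
            self.failure_log.append(f"fatal failure reported by {source}")


interrupts = []
handler = FailureHandler(raise_cpu_interrupt=lambda: interrupts.append("interrupt"))
handler.notify_failure("physical I/O control unit 17", fatal=False)
handler.notify_failure("physical memory control unit 11", fatal=True)
```

Passing the interrupt line in as a callable keeps the sketch independent of any particular CPU model; only the second notification reaches it.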

Physical addresses and memory data input from the physical memory control unit 11 are stored temporarily in the output buffer 36 and then sent by the data transmission unit 24 to the peer physical hardware management unit 16. Physical addresses and memory data received by the data reception unit 23 are stored temporarily in the input buffer 35 and then output to the physical memory control unit 11. When a control transfer notification is received, the value of the status register 22 is changed from the value indicating the standby system to the value indicating the active system.

4. A configuration example of the virtual machine control unit 1 will be described with reference to FIG. 4. As shown in FIG. 4, the virtual machine control unit 1 includes a virtual hardware providing unit 26, a memory management unit 27, and a memory management table 28. The virtual hardware providing unit 26 virtually creates and provides the system resources (central processing unit, physical memory, basic input/output system, input/output devices, and so on) used by the guest operating systems OS1-A to OS1-C. The memory management unit 27 holds segment and page information and manages memory for the host operating system OS1 and the guest operating systems OS1-A to OS1-C. The memory management table 28 stores the correspondence between logical addresses and physical addresses for the areas that the system can write to. It holds all memory information for every process of the guest operating systems OS1-A to OS1-C and of the host operating system OS1 itself.

An operation example of the virtual machine control unit 1 will be described. When a write changes the memory contents, the virtual machine control unit 1 saves the address of the changed memory and its new contents and updates the memory management table 28. It also passes this memory information (physical address and memory contents) to the physical memory control unit 11. The physical memory control unit 11 updates the address management table (see FIG. 5, described later) and saves the memory addresses whose contents have changed. At the same time, the memory information is passed from the physical memory control unit 11, via the physical hardware management unit 15, to the standby virtual machine control unit 2. The standby memory management table is updated with this memory information as well.
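
The write-tracking step can be sketched as follows. This is a simplified model under stated assumptions: a flat page table keyed by owner and logical page, a 4096-byte page size, and a plain list standing in for the physical memory control unit 11. None of these details, nor the names `MemoryManagementTable` and `VirtualMachineController`, come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryManagementTable:
    """Toy stand-in for memory management table 28."""
    page_size: int = 4096                        # assumed page size
    pages: dict = field(default_factory=dict)    # (owner, logical page) -> physical page
    changed: dict = field(default_factory=dict)  # physical address -> last written contents

    def map_page(self, owner: str, logical_page: int, physical_page: int) -> None:
        self.pages[(owner, logical_page)] = physical_page

    def translate(self, owner: str, logical_address: int) -> int:
        """Convert a logical address to a physical one; KeyError if unmapped."""
        page, offset = divmod(logical_address, self.page_size)
        return self.pages[(owner, page)] * self.page_size + offset


@dataclass
class VirtualMachineController:
    """Toy model of virtual machine control unit 1."""
    table: MemoryManagementTable
    forwarded: list = field(default_factory=list)  # stands in for memory control unit 11

    def write(self, owner: str, logical_address: int, data: bytes) -> int:
        physical = self.table.translate(owner, logical_address)
        self.table.changed[physical] = data      # record the changed address and contents
        self.forwarded.append((physical, data))  # pass the memory information downstream
        return physical
```

Because the forwarded memory information carries a physical address rather than a logical one, the standby side can apply it without consulting the active side's tables.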

5. A configuration example of the physical memory control unit 11 will be described with reference to FIG. 5. As shown in FIG. 5, the physical memory control unit 11 includes a physical memory and I/O input/output switching circuit 29, an address management table 30, and an output buffer 37. The physical memory and I/O input/output switching circuit 29 controls memory input and output from the central processing unit 9, the physical I/O control unit 17, and other sources, and performs overall access control for the physical memory 13. The address information of the memory management table 28 managed by the virtual machine control unit 1 is copied into the address management table 30. When a write to a physical address in the physical memory 13 occurs, the physical memory and I/O input/output switching circuit 29 refers to the address management table 30 and checks whether that physical address lies in a write-permitted area. If a matching physical address is found, a copy of the memory contents being written is transferred to the physical hardware management unit 15.
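
The address check on each write can be illustrated with a short sketch. It assumes the address management table is a list of half-open `(start, end)` address ranges and that a write outside those ranges still reaches memory but is not mirrored; both points, and the name `PhysicalMemoryController`, are assumptions rather than details from the source.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalMemoryController:
    """Toy model of physical memory control unit 11."""
    writable_ranges: list = field(default_factory=list)  # address management table 30
    memory: dict = field(default_factory=dict)           # physical memory 13
    output_buffer: list = field(default_factory=list)    # output buffer 37

    def is_mirrored(self, address: int) -> bool:
        """True if the address falls inside a registered write-permitted range."""
        return any(start <= address < end for start, end in self.writable_ranges)

    def write(self, address: int, data: bytes) -> None:
        self.memory[address] = data
        if self.is_mirrored(address):
            # A copy goes toward the physical hardware management unit 15.
            self.output_buffer.append((address, data))
```

Filtering at this level means only writes to registered areas generate traffic to the standby system.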

6. The shared storage 19 will be described in detail. In the present embodiment, the shared storage 19 is installed in the virtual machine redundancy system. Suppose that the file system used by the active guest operating systems OS1-A to OS1-C were a disk image stored on a physical disk local to the active system. In that case, once the standby system has taken over as the active system, the standby guest operating systems OS2-A to OS2-C could not access that disk image. The virtual machine redundancy system would therefore be unable to operate after the takeover.

For this reason, the present embodiment provides the shared storage 19, which is equally accessible from both the active computer system SYS1 and the standby computer system SYS2. The file system information updated by the active guest operating systems OS1-A to OS1-C is stored there.

The effects of the present embodiment will be described. The first effect is that the cause of a hardware failure that the host operating system cannot detect can be identified. This is because a physical hardware management unit is provided as dedicated hardware, and this unit detects failure notifications from the various hardware control units, such as the physical memory control unit and the physical I/O control unit, and saves log information.

The second effect is that redundant operation of the guest operating systems can be provided with greatly reduced downtime. This is because the physical hardware management units of the active and standby virtual machine systems cooperate to detect a hardware failure when it occurs and immediately transfer the guest operating system processing.

FIG. 1 is a diagram showing the configuration of the virtual machine redundancy system and the flow of control and data during normal operation.
FIG. 2 is a diagram showing the flow of control and data when a fatal hardware failure occurs.
FIG. 3 is a diagram showing a configuration example of the physical hardware management unit.
FIG. 4 is a diagram showing a configuration example of the virtual machine control unit.
FIG. 5 is a diagram showing a configuration example of the physical memory control unit.

Explanation of symbols

1, 2 Virtual machine control unit
3, 4 Failure detection unit
5, 6 Dump unit
7, 8 Basic input/output control system
9, 10 Central processing unit
11, 12 Physical memory control unit
13, 14 Physical memory
15, 16 Physical hardware management unit
17, 18 Physical I/O control unit
19 Shared storage
20 Interrupt processing unit
21 Log collection unit
22 Status register
23 Data reception unit
24 Data transmission unit
25 Error register
26 Virtual hardware providing unit
27 Memory management unit
28 Memory management table
29 Physical memory and I/O input/output switching circuit
30 Address management table
35 Input buffer
36, 37 Output buffer

Claims

A virtual machine redundancy system comprising:
an operational computer system having an operational host operating system and an operational guest operating system that runs on a virtual machine provided by the operational host operating system; and
a standby computer system that has a standby host operating system and a standby guest operating system and that stands by as a backup of the operational computer system.

The virtual machine redundancy system according to claim 1, wherein
the operational host operating system has operational virtual machine control means that performs memory management for the operational guest operating system and the operational host operating system and provides virtual hardware resources to the operational guest operating system, and
the standby host operating system has standby virtual machine control means that stands by as a backup of the operational virtual machine control means.

The virtual machine redundancy system according to claim 2, wherein the virtual hardware resources comprise a virtual arithmetic processing device, a virtual file system, and a virtual memory.

The virtual machine redundancy system according to claim 3, wherein
the operational computer system further comprises operational physical hardware having an operational physical memory used by the operational host operating system and the operational guest operating system, and operational physical memory control means that, when the memory contents of the operational physical memory are changed by the operation of the operational guest operating system, transmits memory information to the standby computer system, and
the standby computer system further comprises standby physical hardware having a standby physical memory used by the standby host operating system and the standby guest operating system, and standby physical memory control means that writes the memory information to the standby physical memory and updates the memory area of the standby guest operating system.

The virtual machine redundancy system according to claim 4, wherein the memory information comprises a physical address of the operational physical memory and the memory contents stored at that physical address.

The virtual machine redundancy system according to claim 5, wherein
the operational physical hardware further comprises operational physical hardware management means that receives the memory information from the operational physical memory control means and transmits the memory information to the standby computer system, and
the standby physical hardware further comprises standby physical hardware management means that receives the memory information from the operational physical hardware management means and passes the memory information to the standby physical memory control means.

The virtual machine redundancy system according to claim 6, wherein
the operational physical hardware further has an operational central processing unit, and
the operational physical hardware management means detects a fatal hardware failure and raises a failure detection interrupt to the operational central processing unit.

The virtual machine redundancy system according to claim 7, wherein the fatal hardware failure includes an error in the operational physical memory and an error in a physical I/O device.

The virtual machine redundancy system according to claim 8, wherein the operational host operating system further comprises failure detection means that is activated when the operational physical hardware management means detects a fatal hardware failure and that instructs collection of a memory dump of the operational physical memory.

The virtual machine redundancy system according to claim 9, wherein
the operational host operating system further comprises dump means for collecting a memory dump of the operational physical memory, and
the failure detection means instructs collection of the memory dump by activating the dump means and then automatically restarts the operational computer system.

The virtual machine redundancy system according to claim 10, wherein
the operational computer system further comprises an operational basic input/output control system that performs basic input/output control of the operational physical hardware and that, when the operational central processing unit receives the failure detection interrupt, shifts to a system management mode and instructs failure handling,
the standby physical hardware further has a standby central processing unit and, when the operational central processing unit receives the failure detection interrupt, receives a control transfer notification instructing that the operational computer system be backed up, and
the standby computer system further comprises a standby basic input/output control system that is called when the standby physical hardware receives the control transfer notification and that causes the standby central processing unit to take over the processing of the operational central processing unit.

The virtual machine redundancy system according to claim 11, wherein
the operational physical hardware management means sends the control transfer notification to the standby computer system when a fatal hardware failure is detected, and
the standby physical hardware management means receives the control transfer notification from the operational physical hardware management means and raises a backup instruction interrupt to the standby central processing unit.

The virtual machine redundancy system according to claim 12, wherein
the operational physical hardware management means has an operational physical hardware management means status register that stores flag information indicating whether the system is the active system or the standby system, and, when a fatal hardware failure is detected, rewrites the contents of that status register from a value indicating the active system to a value indicating the standby system, and
the standby physical hardware management means has a standby physical hardware management means status register that stores flag information indicating whether the system is the active system or the standby system, and, when the control transfer notification is received, rewrites the contents of that status register from a value indicating the standby system to a value indicating the active system.

The virtual machine redundancy system according to claim 13, wherein
the operational virtual machine control means has a memory management table including information that specifies, within the memory area of the operational physical memory, the area writable from the operational guest operating system, and sends the memory information to the operational physical memory control means when a write to the writable area occurs, and
the operational physical memory control means writes the memory information to the operational physical memory and updates the writable area.

The virtual machine redundancy system according to claim 14, further comprising a shared storage that stores the disk image of the virtual file system used by the operational guest operating system and that, when the standby computer system backs up the operational computer system and the standby guest operating system starts operation, is accessible from the standby computer system as the virtual file system referred to by the standby guest operating system.