JP2013254354A

JP2013254354A - Computer device, software management method and program

Info

Publication number: JP2013254354A
Application number: JP2012129620A
Authority: JP
Inventors: Akira Kanashiro; 聖金城
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2012-06-07
Filing date: 2012-06-07
Publication date: 2013-12-19

Abstract

PROBLEM TO BE SOLVED: To avoid a situation in which the standby system's device, having been synchronized with a degeneration of the active system and therefore being in a degenerated state, keeps operating in that degenerated state after a system changeover from the active system to the standby system. SOLUTION: When the system is switched and a physical server device 421 of the standby system takes over the processing of a physical server device 401 of the active system, a VM state recovery section 427 determines whether there is a guest OS that a VM state management section 428 put into a degenerated state before the system changeover and, if there is, causes that guest OS to recover from the degenerated state.

Description

The present invention relates to a computer system in which computer devices are made redundant.
For example, the present invention relates to a computer system made redundant by using a virtual machine (VM) synchronization technique.

In a conventional virtual machine synchronization technique, a virtual machine runs on servers arranged in an active/standby dual configuration, as shown in FIG. 11.
In the configuration of FIG. 11, the contents of the CPU (Central Processing Unit) registers and memory of the virtual machine running on the active server are periodically copied to a virtual machine that is suspended on the standby server.
This keeps the states of the active and standby virtual machines synchronized, so that operation can continue on the standby system even after a failure of the active server.
The copying of CPU register and memory contents is triggered either by an input in the input/output processing inside the virtual machine (Non-Patent Document 1) or by periodic control at fixed intervals (Non-Patent Document 2).

As a conventional CPU abnormality detection technique, UNIX (registered trademark), for example, offers a technique that, after detecting a processor abnormality, uses dynamic processor deallocation (Non-Patent Document 3) to disconnect the failed processor and degenerate the system so that operation can continue.
This technique is expected to become available in future Linux (registered trademark) systems equipped with x86 CPUs as well.

Non-Patent Document 1: "Development of fault-tolerant clustering technology using KVM", IPSJ SIG Technical Report [System Software and Operating Systems], 2010-OS-115(20), pp. 1-8, 2010-07-27.
Non-Patent Document 2: "Remus: High Availability via Asynchronous Virtual Machine Replication", USENIX NSDI '08 (April 2010).
Non-Patent Document 3: Daniel Henderson, Jim Mitchell, and George Ahrens, "RAS of POWER7(R) Systems: Key Features of Power Systems(TM) Reliability, Availability, and Serviceability", November 1, 2010.

Combining the CPU abnormality detection technique with VM technology makes it possible to build a system such as that shown in FIG. 12.
In a system such as that of FIG. 12, a physical CPU (hereinafter "CPU") is assigned to each virtual CPU (hereinafter "vCPU") held by a VM. An abnormality in a CPU can therefore be notified to the VMs and vCPUs that use that CPU.
When a virtual machine holds multiple vCPUs, it can be put into a degenerated state when a CPU fails: the vCPU assigned to the failed CPU is detached from the VM.
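The degeneration described above can be sketched as follows. This is a minimal Python model with hypothetical names (`VM`, `notify_cpu_failure`), not the patent's implementation. It assumes each vCPU is pinned to exactly one physical CPU. It also treats a VM that would lose all of its vCPUs as not degradable, since degeneration is described only for VMs holding multiple vCPUs.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical VM model: each vCPU is pinned to exactly one physical CPU."""
    name: str
    vcpu_to_cpu: dict[str, str] = field(default_factory=dict)  # vCPU id -> physical CPU id


def notify_cpu_failure(vms: list[VM], failed_cpu: str) -> list[str]:
    """Notify VMs of a failed CPU and detach the vCPUs pinned to it.

    Returns the names of the VMs that entered the degenerated state. A VM whose
    vCPUs would all be detached is left unchanged, because degeneration only
    applies when the VM keeps at least one vCPU.
    """
    degenerated = []
    for vm in vms:
        hit = [v for v, cpu in vm.vcpu_to_cpu.items() if cpu == failed_cpu]
        if hit and len(hit) < len(vm.vcpu_to_cpu):
            for v in hit:
                del vm.vcpu_to_cpu[v]  # detach the vCPU that ran on the failed CPU
            degenerated.append(vm.name)
    return degenerated
```

In this model, a two-vCPU VM that loses one CPU keeps running on one vCPU. A single-vCPU VM on the same CPU is left for some other mechanism, such as a system switchover.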

Applying the VM synchronization technique to a system that combines the CPU abnormality detection technique with VM technology results in the operation shown in FIG. 13.
In FIG. 13, when an abnormality occurs in a CPU of the active system, the VM assigned to that CPU enters a degenerated state by detaching the vCPU assigned to the failed CPU.
VM synchronization then copies this VM state to the standby system. As a result, the VM suspended on the standby server also enters the degenerated state, even though no abnormality has occurred in the standby system's CPUs.
The problem is that, if a system switchover from the active system to the standby system occurs while the standby VM is degenerated, the standby VM starts operating in the degenerated state and remains degenerated.

One of the main objects of the present invention is to solve the above problem, that is, to avoid a situation in which a standby computer device continues operating in a degenerated state.

A computer device according to the present invention
is a computer device used as the standby system in a computer system in which computer devices are made redundant. Software common to the software installed on the active computer device, which serves as the active system of that computer system, is installed on it. The computer device comprises:
a degeneration execution unit that, when any software in the active computer device enters a degenerated state, puts the corresponding software in the computer device into a degenerated state; and
a state recovery unit that, when the system is switched and the computer device takes over the processing of the active computer device, determines whether there is degenerated-state software, that is, software that the degeneration execution unit put into a degenerated state before the switchover, and, if there is, recovers the degenerated-state software from the degenerated state.

According to the present invention, when the system is switched, it is determined whether degenerated-state software exists. If it does, that software is recovered from the degenerated state. A situation in which the standby computer device continues operating in a degenerated state can therefore be avoided.
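The two claimed units can be illustrated with a small sketch. It is a hypothetical Python model (`StandbyDevice`, `degenerate`, `on_switchover` are illustrative names). Degeneration is modeled purely as a reduced vCPU count, and physical CPU assignment is ignored.

```python
class StandbyDevice:
    """Hypothetical sketch of the standby computer device described in the claim."""

    def __init__(self, normal_vcpus: dict[str, int]):
        # vCPU count each piece of software (VM) has in the normal state.
        self.normal_vcpus = dict(normal_vcpus)
        self.current_vcpus = dict(normal_vcpus)
        self.degenerated: set[str] = set()  # software this device put into degeneration

    def degenerate(self, vm: str, removed: int) -> None:
        """Degeneration execution unit: mirror a degeneration reported by the active device."""
        self.current_vcpus[vm] -= removed
        self.degenerated.add(vm)

    def on_switchover(self) -> list[str]:
        """State recovery unit: after takeover, restore any software degenerated earlier."""
        recovered = sorted(self.degenerated)
        for vm in recovered:
            self.current_vcpus[vm] = self.normal_vcpus[vm]
        self.degenerated.clear()
        return recovered
```

The key point is that the standby side remembers which software it degenerated only because of synchronization. That record is what lets the recovery step run immediately after the switchover.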

FIG. 1 is a diagram showing a configuration example of a computer system according to Embodiment 1.
FIG. 2 is a flowchart showing the operation of the active system during normal operation according to Embodiment 1.
FIG. 3 is a flowchart showing the operation of the active system during normal operation according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the active system when a CPU abnormality occurs according to Embodiment 1.
FIG. 5 is a flowchart showing the operation of the standby system during normal operation according to Embodiment 1.
FIG. 6 is a flowchart showing the operation of the standby system during normal operation according to Embodiment 1.
FIG. 7 is a flowchart showing the operation of the standby system after system switching according to Embodiment 1.
FIG. 8 is a diagram showing an example of a CPU allocation pattern according to Embodiment 1.
FIG. 9 is a diagram showing an example of a priority-based CPU allocation pattern according to Embodiment 1.
FIG. 10 is a diagram showing a hardware configuration example of a physical server device according to Embodiment 1.
FIG. 11 is a diagram explaining the virtual machine synchronization technique.
FIG. 12 is a diagram explaining the degeneration processing performed when a CPU abnormality occurs.
FIG. 13 is a diagram explaining how the degenerated state persists in the standby system after system switching.
FIG. 14 is a diagram explaining a case in which no physical CPU is free and resources run short.

Embodiment 1.
This embodiment addresses the problem described with reference to FIG. 13. After the VM has been switched over to the standby server, vCPUs equal in number to those lost through degeneration are added to the switched-over VM. This gives the VM the same number of vCPUs it had when operating normally (not degenerated) on the active server. Unused CPUs (hardware resources) are then assigned to the added vCPUs, which recovers the VM from the degenerated state.

For example, assume an environment such as that in FIG. 14, in which there are three VMMs (virtual machine monitors), each VMM runs on six physical CPUs, and each VM runs with two vCPUs.
In this environment, suppose an abnormality occurs in a physical CPU of the active VM C, the active VM C and the standby VM C both enter the degenerated state, and a system switchover then occurs. In that case, the standby VM C can be recovered from the degenerated state to the normal state after the switchover by the method described above.
However, suppose an abnormality then also occurs in a physical CPU of the active VM D, the active VM D and the standby VM D enter the degenerated state, and a system switchover then occurs. Now no unused physical CPU remains, so none can be assigned to the vCPU of the standby VM D.

To address this problem, this embodiment introduces an availability guarantee priority, which indicates the priority with which each system's continued operation is guaranteed. The embodiment adds vCPUs to the VM and controls the allocation pattern and allocation time of the CPUs assigned to those vCPUs. Continued operation is thereby guaranteed in descending order of availability guarantee priority.

FIG. 1 shows a configuration example of the computer system according to this embodiment.

Two physical server devices 401 and 421 are arranged on a LAN (Local Area Network) 400.
The physical server device 401 is assumed to operate as the active server device.
The physical server device 401 therefore corresponds to an example of the active computer device.
The physical server device 421 is assumed to operate as the standby server device.
The physical server device 421 therefore corresponds to an example of the computer device.

The physical server device 401 is equipped with three CPUs 11, 12, and 13, and the physical server device 421 with three CPUs 21, 22, and 23.
Virtual machine monitors 402 and 422 run on the respective physical server devices.
On the virtual machine monitor 402, a guest OS (Operating System) 403 having virtual CPUs (vCPUs) 15 and 16 and a guest OS 404 having vCPU 14 are running. On the virtual machine monitor 422, a guest OS 423 having vCPUs 24 and 25 and a guest OS 424 having vCPU 26 are running.
The virtual machine monitors 402 and 422 include the following components:
- VM degeneration feasibility determination units 405 and 425
- CPU abnormality detection units 406 and 426
- VM state recovery units 407 and 427
- VM state management units 408 and 428
- VM schedulers 409 and 429
- VM system switching units 410 and 430
- VM synchronization units 411 and 431
- CPU usage status management units 412 and 432
- abnormal-location determination units 413 and 433
- VM setting files 414 and 434
The physical server device 401 is the active physical server device and the physical server device 421 is the standby physical server device. The guest OS 403 therefore operates as an active VM, while the guest OS 423 has been started as the standby VM corresponding to the guest OS 403 and is in a paused state.
Similarly, the guest OS 404 also operates as an active VM, and the guest OS 424 has been started as the corresponding standby VM and is in a paused state.

Among the components of the virtual machine monitors 402 and 422, those related to the characteristic operation of this embodiment are described below.

The VM setting files 414 and 434 are per-VM (per guest OS) setting files. Each describes the number of virtual CPUs (vCPUs) the VM holds during normal operation, whether degeneration is permitted, and the availability guarantee priority.
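The patent does not specify the file format, so the sketch below assumes a hypothetical JSON layout keyed by VM name. The field names `normal_vcpus`, `degradable`, and `priority` are illustrative, and it assumes a larger number means a higher availability guarantee priority.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSettings:
    normal_vcpus: int  # vCPUs held during normal operation
    degradable: bool   # whether degeneration is permitted for this VM
    priority: int      # availability guarantee priority (assumed: larger = higher)


def load_vm_settings(text: str) -> dict[str, VMSettings]:
    """Parse a hypothetical JSON VM setting file into per-VM settings."""
    raw = json.loads(text)
    return {
        name: VMSettings(int(v["normal_vcpus"]), bool(v["degradable"]), int(v["priority"]))
        for name, v in raw.items()
    }
```

For the FIG. 1 configuration, guest OS 403 would have `normal_vcpus` 2 and guest OS 404 would have 1. The `degradable` and `priority` values in any such file are deployment choices, not values given in the text.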

When a CPU abnormality occurs, the abnormal-location determination units 413 and 433 identify the CPU in which the abnormality occurred and the VM assigned to that CPU.

The VM system switching units 410 and 430 perform system switching when an abnormality occurs in the active physical server device 401.

The VM state management units 408 and 428 collect and manage each VM's vCPU usage status and availability guarantee priority, and share this information between the active and standby virtual machine monitors.
Specifically, when a CPU abnormality occurs and a VM enters a degenerated state, the VM state management unit 408 notifies the VM state management unit 428 of information such as which VM was degenerated and which vCPUs were detached.
Based on this information, the VM state management unit 428 puts the corresponding VM in the standby physical server device 421 into a degenerated state.
The VM state management unit 428 corresponds to an example of the degeneration execution unit.

After system switching from the active system to the standby system, the VM state recovery units 407 and 427 add the missing vCPUs. Based on each VM's vCPU usage status and availability guarantee priority, as managed by the VM state management units 408 and 428, they then control the CPU allocation time given to each vCPU after the addition.
That is, when the system is switched and the standby physical server device 421 takes over the processing of the active physical server device 401, the VM state recovery unit 427 determines whether there is a VM (degenerated-state software) that the VM state management unit 428 put into a degenerated state before the switchover. If there is, it recovers that VM from the degenerated state.
The VM state recovery unit 427 also determines whether there is a physical CPU that the degenerated VM can use exclusively. If there is none, it has the degenerated VM share a physical CPU with another VM that already uses it, thereby recovering the degenerated VM from the degenerated state.
Depending on the physical CPU usage status, the VM state recovery unit 427 may also restrict the physical CPU usage of VMs whose availability guarantee priority is lower than that of the degenerated VM, in order to secure physical CPU capacity for the degenerated VM.
The VM state recovery unit 427 corresponds to an example of the state recovery unit.

The operation is described next in three parts:
- the operation of the active and standby systems during normal operation;
- the operation of the active system when a CPU abnormality occurs;
- the operation of the standby system immediately after system switching.

First, the operation of the active and standby systems during normal operation is described.
FIGS. 2 and 3 are flowcharts showing the operation of the active physical server device 401 during normal operation.
FIGS. 5 and 6 are flowcharts showing the operation of the standby physical server device 421 during normal operation.

In the active physical server device 401, the VM synchronization unit 411 starts operating in S501. In S502, it transmits the VM's CPU and memory contents to the standby VM synchronization unit 431.
The operation of S502 is then repeated periodically.
In parallel with the VM synchronization unit 411, the VM state management unit 408 starts operating in S503. In S504, it reads the VM setting file 414 and shares each VM's availability guarantee priority with the standby VM state management unit 428. In S505, it collects the vCPU usage status of the running VMs (paused VMs are not counted as running) and shares it with the standby VM state management unit 428.
The operation of S505 is then repeated periodically.

In the standby system, the VM synchronization unit 431 starts operating in S701. In S702, it receives the VM's CPU and memory contents that the active VM synchronization unit 411 transmitted in S502, and applies them to the paused standby VM.
In parallel with the VM synchronization unit 431, the VM state management unit 428 starts operating in S703. In S704, it reads the VM setting file 434 and shares each VM's availability guarantee priority with the active VM state management unit 408. In S705, it collects the vCPU usage status of the running VMs (paused VMs are not counted as running) and shares it with the active VM state management unit 408.
The operation of S705 is then repeated periodically.
This completes the description of the operation of the active and standby systems during normal operation.
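The two periodic activities above can be sketched in a simplified form. `VMState`, `sync_once`, and `collect_usage` are hypothetical names. The sketch copies the full register and memory image each round for simplicity; practical replication systems usually send only the memory pages modified since the previous round.

```python
from dataclasses import dataclass, field


@dataclass
class VMState:
    registers: dict[str, int] = field(default_factory=dict)  # CPU register contents
    memory: bytearray = field(default_factory=bytearray)       # guest memory image
    paused: bool = False


def sync_once(active: VMState, standby: VMState) -> None:
    """One synchronization round (S502 on the active side, S702 on the standby side)."""
    if not standby.paused:
        raise ValueError("standby VM must be paused while it is being synchronized")
    standby.registers = dict(active.registers)  # independent copy, not an alias
    standby.memory = bytearray(active.memory)


def collect_usage(vms: dict[str, tuple[bool, float]]) -> dict[str, float]:
    """S505/S705: report vCPU usage for running VMs only; paused VMs are skipped.

    `vms` maps VM name -> (paused, vCPU usage ratio).
    """
    return {name: usage for name, (paused, usage) in vms.items() if not paused}
```

Excluding paused VMs in `collect_usage` mirrors the note that the paused state is not counted as running. The standby VMs' own usage is therefore not reported while they are only receiving state.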

Next, the operation of the active system when a CPU abnormality occurs is described (FIG. 4).

When a CPU abnormality occurs in the active physical server device 401 in S601, the CPU abnormality detection unit 406 detects it in S602. In S603, the abnormal-location determination unit 413 is called and identifies the CPU in which the abnormality occurred and the VM assigned to that CPU.
Next, in S604, the VM degeneration feasibility determination unit 405 accesses the VM setting file 414 to check whether degeneration is permitted for the VM identified in S603.
If degeneration is not permitted, the process proceeds to S609, where the VM system switching unit 410 performs system switching: the active VM is stopped and the paused standby VM resumes operation.
If degeneration is permitted, in S605 the VM degeneration feasibility determination unit 405 accesses the information managed by the VM state management unit 408 to check whether the VM can actually be degenerated.
If it cannot, the process proceeds to S609, where the VM system switching unit 410 performs system switching: the active VM is stopped and the paused standby VM resumes operation.
If it can, the CPU abnormality detection unit 406 notifies the corresponding VM of the CPU abnormality in S606.
Next, in S607, the VM that received the CPU abnormality notification detaches the vCPU assigned to the failed CPU and enters a degenerated state.
In S608, the VM state management unit 408 detects that the VM has become degenerated and that the number of vCPUs it holds has decreased, and shares this with the standby VM state management unit 428.
That is, the VM state management unit 408 notifies the standby VM state management unit 428 of the degenerated VM and the detached vCPUs.
The VM state management unit 428 detaches the vCPUs corresponding to those notified by the VM state management unit 408, which puts the corresponding VM in the standby physical server device 421 into a degenerated state.

This completes the description of the operation when a CPU abnormality occurs in the active system. If a further CPU abnormality occurs after the VM has entered the degenerated state through S605, S606, S607, and S608, the processing from S602 is performed again.
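The branching in S604 and S605 can be summarized as a small decision function. This is a hypothetical sketch (`Action`, `decide_on_cpu_failure`). The criterion used for S605, that the VM must keep at least one vCPU after detachment, is an assumption; the text only says the check uses information held by the VM state management unit.

```python
from enum import Enum


class Action(Enum):
    SWITCHOVER = "S609: switch over to the standby system"
    DEGENERATE = "S606-S608: degenerate the VM and notify the standby system"


def decide_on_cpu_failure(degeneration_permitted: bool, vcpus_held: int,
                          vcpus_on_failed_cpu: int) -> Action:
    """Decision of the VM degeneration feasibility determination unit (S604, S605)."""
    if not degeneration_permitted:  # S604: setting read from the VM setting file
        return Action.SWITCHOVER
    # S605 (assumed criterion): feasible only if at least one vCPU remains afterwards.
    if vcpus_held - vcpus_on_failed_cpu < 1:
        return Action.SWITCHOVER
    return Action.DEGENERATE
```

A repeated failure simply calls the function again with the reduced `vcpus_held`, matching the return to S602 described above.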

Finally, the operation of the standby system immediately after system switching is described (FIG. 7).

System switching is performed in operation S609 of the active system (FIG. 4). After the switchover completes in S801, in S802 the VM state recovery unit 427 accesses the information managed by the VM state management unit 428 to check whether any VM is in a degenerated state immediately after the switchover.
That is, the VM state recovery unit 427 determines whether there is a VM that the VM state management unit 428 put into a degenerated state before the switchover.
If there is no degenerated VM, the process moves to S809 and ends.
If there is a degenerated VM, the process moves to S803, where the VM state recovery unit 427 adds the vCPUs lost through degeneration.
Next, in S804, the VM state recovery unit 427 uses the information of the CPU usage status management unit 432 to check whether there are unused CPUs to assign to the added vCPUs.
If there are at least as many unused CPUs as added vCPUs, the VM state recovery unit 427 assigns the unused CPUs to the vCPUs in S805, and the process moves to S809 and ends.
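Steps S802 to S805 can be sketched as a pure function over a vCPU-to-CPU map. The function name and the naming of added vCPUs (`vcpu-added-N`) are hypothetical. Returning `None` stands for "continue at S806".

```python
from typing import Dict, List, Optional


def recover_with_free_cpus(normal_vcpus: int, vcpu_map: Dict[str, str],
                           free_cpus: List[str]) -> Optional[Dict[str, str]]:
    """S802-S805 sketch: restore the normal vCPU count using unused physical CPUs.

    Returns the new vCPU -> CPU map, or None when there are not enough unused
    CPUs (the flow then continues at S806).
    """
    missing = normal_vcpus - len(vcpu_map)
    if missing <= 0:
        return dict(vcpu_map)  # S802: not degenerated, nothing to do (S809)
    if len(free_cpus) < missing:
        return None  # S804: not enough unused CPUs
    recovered = dict(vcpu_map)
    for i, cpu in enumerate(free_cpus[:missing]):
        recovered[f"vcpu-added-{i}"] = cpu  # S803 + S805: add a vCPU and pin it to a free CPU
    return recovered
```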

If there are no unused physical CPUs, the process moves to S806. There, the VM state recovery unit 427 compares the vCPU usage status of the VMs running on the standby system with the active system's vCPU usage status, which the standby VM state management unit 428 shared with the active VM state management unit 408 in S705. It then checks whether recovery is possible by changing the CPU allocation pattern.
If recovery is possible by changing the CPU allocation pattern, the process moves to S807. The VM state recovery unit 427 obtains from the VM state management unit 428 the vCPU usage status of the other VMs that were running on the standby system until just before the switchover. It calculates the surplus CPU resources immediately after the switchover and changes the CPU allocation pattern for the vCPUs, including the added vCPUs. The process then moves to S809 and ends.
If recovery is not possible by changing the CPU allocation pattern, the process moves to S808. Based on the availability guarantee priority, the VM state recovery unit 427 reduces the CPU allocation time of VMs with a low availability guarantee priority and gives more CPU allocation time to VMs with a high availability guarantee priority. The process then moves to S809 and ends.
This completes the description of the operation of the standby system immediately after system switching.

In S807, the VM state recovery unit 427 calculates the surplus CPU amount as shown in FIG. 8. It then changes the allocation pattern so that a CPU able to provide the CPU amount the degenerated VM needs to operate normally is shared with other VMs.
In the example of FIG. 8, vCPU 3 is a vCPU of the degenerated VM and can run if it is allocated 30% of CPU time. Physical CPU 2 has 50% of its CPU time left over, so allocating 30% of it to vCPU 3 returns the degenerated VM to the normal state.
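The S807 allocation change can be sketched with CPU time expressed as integer percentages. The function name is hypothetical, and choosing the CPU with the largest surplus is an assumed policy; the text only requires a CPU whose surplus covers the demand. In the test, physical CPU 1 is assumed to be fully used because the text does not give its load.

```python
from typing import Dict, Optional


def share_surplus_cpu(used_pct: Dict[str, int], demand_pct: int) -> Optional[str]:
    """S807 sketch: place a vCPU needing `demand_pct` percent on a shared physical CPU.

    `used_pct` maps physical CPU -> percent of its time already allocated. The CPU
    with the largest surplus is chosen (an assumed policy). Returns its name after
    reserving the time, or None if no CPU has enough surplus (the flow goes to S808).
    """
    best = max(used_pct, key=lambda cpu: 100 - used_pct[cpu], default=None)
    if best is None or 100 - used_pct[best] < demand_pct:
        return None
    used_pct[best] += demand_pct
    return best
```

With FIG. 8's figures, physical CPU 2 at 50% used can absorb vCPU 3's 30% and ends at 80%. With both CPUs at 75% used (the FIG. 9 case), the function returns `None`.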

In S808, as shown in FIG. 9, the VM state recovery unit 427 increases the surplus CPU amount by reducing the CPU amount allocated to VMs with a low availability guarantee priority. It then allocates the freed amount to VMs with a high availability guarantee priority.
In the example of FIG. 9, vCPU 3 is a vCPU of the degenerated VM and can run if it is allocated 30% of CPU time.
Physical CPU 1 and physical CPU 2 each have only 25% surplus, so the VM cannot be recovered by the allocation pattern change of FIG. 8.
Therefore, the CPU amount allocated to a VM with a lower availability guarantee priority (using vCPU 1) than the degenerated VM (using vCPU 3) is reduced by 5%. This raises the surplus to 30%, which is allocated to vCPU 3 and returns the degenerated VM to the normal state.
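The S808 reallocation can be sketched as reclaiming only the shortfall from lower-priority vCPUs, lowest priority first. `Alloc` and `reclaim_for` are hypothetical names, and a larger number is assumed to mean a higher availability guarantee priority. The test places vCPU 1 at 75% of physical CPU 1; that split is an assumption consistent with the 25% surplus stated for FIG. 9.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alloc:
    vcpu: str
    priority: int  # availability guarantee priority of the owning VM (assumed: larger = higher)
    pct: int       # percent of this physical CPU's time


def reclaim_for(cpus: Dict[str, List[Alloc]], vcpu: str, priority: int,
                demand: int) -> Optional[str]:
    """S808 sketch: take CPU time from lower-priority vCPUs so `vcpu` gets `demand` percent."""
    for cpu, allocs in cpus.items():
        surplus = 100 - sum(a.pct for a in allocs)
        lower = sorted((a for a in allocs if a.priority < priority), key=lambda a: a.priority)
        if surplus + sum(a.pct for a in lower) < demand:
            continue  # even reclaiming all lower-priority time on this CPU is not enough
        shortfall = max(0, demand - surplus)
        for a in lower:  # the lowest-priority vCPU gives up time first
            take = min(a.pct, shortfall)
            a.pct -= take
            shortfall -= take
            if shortfall == 0:
                break
        allocs.append(Alloc(vcpu, priority, demand))
        return cpu
    return None
```

With these figures, vCPU 1 drops from 75% to 70% of physical CPU 1, and vCPU 3 receives the resulting 30%. Higher-priority vCPUs such as vCPU 2 are never touched.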

As described above, the VM state management unit collects the vCPU usage status of each VM and manages whether each VM is in a degenerated state, and the VM state recovery unit executes the processing of S807 and S808. These units change the CPU allocation pattern and the CPU allocation amount for each vCPU based on the availability guarantee priority and the CPU allocation status immediately after system switching. As a result, VMs recover from the degenerated state in descending order of availability guarantee priority, and operation after system switching is guaranteed.
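The effect of recovering VMs in descending order of availability guarantee priority can be shown with a small Python sketch. The sketch assumes that each degenerated VM needs a fixed percentage of a single physical CPU, and it uses only surplus sharing (the S807 behavior). The function name, VM names and figures are hypothetical.

```python
from typing import Dict, List


def recover_in_priority_order(degenerated: List[str],
                              priority: Dict[str, int],
                              demand: Dict[str, int],
                              surplus_by_cpu: Dict[str, int]) -> List[str]:
    """Recover degenerated VMs highest priority first; return those recovered.

    Each VM needs `demand[vm]` percent of one physical CPU. A VM that cannot
    be placed is skipped and stays degenerated.
    """
    recovered = []
    for vm in sorted(degenerated, key=lambda v: priority[v], reverse=True):
        for cpu, free in surplus_by_cpu.items():
            if free >= demand[vm]:
                surplus_by_cpu[cpu] = free - demand[vm]  # existing key, no resize
                recovered.append(vm)
                break
    return recovered


# Hypothetical scenario: only 40% is free, and two degenerated VMs need 30% each.
surplus = {"pCPU1": 40}
order = recover_in_priority_order(["vmA", "vmB"], {"vmA": 1, "vmB": 5},
                                  {"vmA": 30, "vmB": 30}, surplus)
print(order)  # ['vmB']
```

Because the higher-priority VM is placed first, it takes the only sufficient surplus, and the lower-priority VM stays degenerated.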

As described above, the present embodiment has described a virtual machine synchronization device and an abnormality detection / system switching method. The device comprises:
an active physical server device on which the synchronization source VM in VM synchronization and a virtual machine monitor that manages the synchronization source VM operate; and
a standby physical server device on which the synchronization destination VM in VM synchronization and a virtual machine monitor that manages the synchronization destination VM operate.
The virtual machine monitor internally includes:
a CPU abnormality detection unit that detects an abnormality when a CPU abnormality occurs;
an abnormal location determination unit that identifies the vCPUs and VMs assigned to the CPU in which the abnormality occurred;
a VM configuration file in which the availability guarantee priority and the degeneration permission setting of each VM are written;
a VM state management unit that manages the current VM degeneration status and vCPU usage status and shares them with the standby system; and
a VM state recovery unit that, after system switching, adds vCPUs to the switched-over VMs, assigns unused CPUs, and changes the CPU allocation pattern and the CPU allocation amount, based on the VM degeneration status and vCPU usage status shared with the VM state management unit of the active system.
Consider a VM that entered a degenerated state on the active physical server device because of a CPU abnormality, and whose degenerated state was carried over by VM synchronization. For such a VM, the vCPU usage status while the VM was operating normally on the active system and the availability guarantee priority information are acquired from the VM state management unit. The CPU allocation pattern and the CPU allocation amount for the vCPUs are then changed based on the availability guarantee priority and the CPU allocation status immediately after system switching. As a result, VMs recover from the degenerated state in descending order of availability guarantee priority, and operation after system switching is guaranteed.
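The abnormal location determination unit and the degeneration permission setting described above can be illustrated with a minimal Python sketch. The data layout (`vcpu_to_pcpu`, `vcpu_to_vm`, and the hypothetical `VmSetting` entry standing in for the VM configuration file) is an assumption, not the format used in the embodiment. How a VM whose degeneration is not permitted is handled is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class VmSetting:
    """One hypothetical entry of the VM configuration file."""
    priority: int         # availability guarantee priority (higher = more important)
    may_degenerate: bool  # degeneration permission setting


def locate_affected(failed_cpu: str, vcpu_to_pcpu: Dict[str, str],
                    vcpu_to_vm: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each VM to the vCPUs it had assigned to the failed physical CPU."""
    affected: Dict[str, Set[str]] = {}
    for vcpu, pcpu in vcpu_to_pcpu.items():
        if pcpu == failed_cpu:
            affected.setdefault(vcpu_to_vm[vcpu], set()).add(vcpu)
    return affected


def split_by_permission(affected: Dict[str, Set[str]],
                        settings: Dict[str, VmSetting]) -> Tuple[List[str], List[str]]:
    """Split affected VMs into (degeneration permitted, not permitted)."""
    permitted = sorted(vm for vm in affected if settings[vm].may_degenerate)
    refused = sorted(vm for vm in affected if not settings[vm].may_degenerate)
    return permitted, refused


# Hypothetical layout: physical CPU 1 fails while hosting vCPU1 (vm1) and vCPU3 (vm3).
vcpu_to_pcpu = {"vCPU1": "pCPU1", "vCPU2": "pCPU2", "vCPU3": "pCPU1"}
vcpu_to_vm = {"vCPU1": "vm1", "vCPU2": "vm2", "vCPU3": "vm3"}
settings = {"vm1": VmSetting(1, True), "vm2": VmSetting(3, True),
            "vm3": VmSetting(5, False)}
affected = locate_affected("pCPU1", vcpu_to_pcpu, vcpu_to_vm)
print(split_by_permission(affected, settings))  # (['vm1'], ['vm3'])
```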

In the above description, a computer system in which virtual machines are constructed is used as an example, but the application target of the present embodiment is not limited to this.
The method described in this embodiment can be applied to any computer system that works as follows: when some software in the active computer device enters a degenerated state, the corresponding software in the standby computer device is also put into a degenerated state.
In such a system, applying the method described in this embodiment makes it possible to determine, after system switching, whether degenerated software exists in the standby computer device, and, if it does, to recover that software from the degenerated state.
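The general method above can be sketched as a hypothetical Python `StandbyNode` class. The class records the degeneration that the active side reports and attempts recovery after switchover. How degeneration events are delivered and what "recovery" means are abstracted into a caller-supplied `recover` hook. That hook is an assumption of this sketch, not an interface defined in the text.

```python
from typing import Callable, List, Set


class StandbyNode:
    """Standby side of a redundant pair (hypothetical sketch).

    `degenerated` mirrors the software the active node has degenerated;
    `recover` is a caller-supplied hook that tries to restore one item and
    returns True on success.
    """

    def __init__(self, recover: Callable[[str], bool]) -> None:
        self.degenerated: Set[str] = set()
        self.recover = recover

    def on_active_degenerated(self, name: str) -> None:
        # Degeneration execution: follow the active node into degeneration.
        self.degenerated.add(name)

    def on_switchover(self) -> List[str]:
        # State recovery: after taking over, check for degenerated software
        # and try to recover each item; return what is still degenerated.
        for name in sorted(self.degenerated):  # iterate over a copy
            if self.recover(name):
                self.degenerated.discard(name)
        return sorted(self.degenerated)


# In this example, the hypothetical hook cannot recover "guestB".
node = StandbyNode(recover=lambda name: name != "guestB")
node.on_active_degenerated("guestA")
node.on_active_degenerated("guestB")
print(node.on_switchover())  # ['guestB']
```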

Finally, an example hardware configuration of the physical server devices 401 and 421 described in this embodiment will be described.
FIG. 10 is a diagram illustrating an example of the hardware resources of the physical server devices 401 and 421 described in this embodiment.
Note that the configuration in FIG. 10 is merely one example of the hardware configuration of the physical server devices 401 and 421. Their hardware configuration is not limited to that illustrated in FIG. 10, and other configurations may be used.

In FIG. 10, the physical server devices 401 and 421 include a CPU 911 that executes programs.
The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices.
Further, the CPU 911 may be connected to an FDD (Flexible Disk Drive) 904 and a compact disk device (CDD) 905.
An SSD (Solid State Drive) may also be used instead of the magnetic disk device 920.
The RAM 914 is an example of a volatile memory.
The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of storage devices.
The communication board 915, the keyboard 902, the mouse 903, the FDD 904, and the like are examples of input devices.
The communication board 915, the display device 901, and the like are examples of output devices.

The communication board 915 is connected to the LAN 400 as shown in FIG. 1.

The magnetic disk device 920 stores a virtual machine monitor 921, a guest OS 922, a program group 923, and a file group 924.
The programs in the program group 923 are executed by the CPU 911, the virtual machine monitor 921, and the guest OS 922.
The virtual machine monitor 921 includes each element of the virtual machine monitors 402 and 422 shown in FIG. 1.

The ROM 913 stores a BIOS (Basic Input Output System) program, and the magnetic disk device 920 stores a boot program.
When the physical server devices 401 and 421 start up, the BIOS program in the ROM 913 and the boot program in the magnetic disk device 920 are executed. These programs then start the virtual machine monitor 921 and the guest OS 922.

Further, the file group 924 stores information, data, signal values, variable values, and parameters as items of “... file” and “... database”. These items indicate the results of the processes described in this embodiment as “judging ...”, “determining ...”, “detecting ...”, “synchronizing ...”, “identifying ...”, “controlling ...”, “setting ...”, “selecting ...”, “confirming ...”, “allocating ...”, and so on.
The “... file” and “... database” are stored on a recording medium such as a disk or memory. The CPU 911 reads the stored information, data, signal values, variable values, and parameters into main memory or cache memory via a read/write circuit. They are then used for CPU operations such as extraction, search, reference, comparison, computation, calculation, processing, editing, output, printing, and display.
During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, registers, cache memory, buffer memory, and the like.
The arrows in the flowcharts described in this embodiment mainly indicate input and output of data and signals. Data and signal values are recorded on recording media such as:
the memory of the RAM 914;
the flexible disk of the FDD 904;
the compact disk of the CDD 905;
the magnetic disk of the magnetic disk device 920; and
other optical disks, Blu-ray (registered trademark) disks, DVDs, and the like.
Data and signals are transmitted online via the bus 912, signal lines, cables, and other transmission media.

What is described as a “... unit” in this embodiment may instead be a “... step”, “... procedure”, or “... process”.
That is, the “software management method” according to the present invention can be realized by the steps, procedures, and processes shown in the flowcharts described in this embodiment.
What is described as a “... unit” may also be realized by firmware stored in the ROM 913.
Alternatively, it may be implemented in any of the following ways:
by software alone;
by a combination of software and hardware such as elements, devices, boards, and wiring; or
by a combination that also includes firmware.
Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical disks, compact disks, Blu-ray (registered trademark) disks, and DVDs.

11 CPU, 12 CPU, 13 CPU, 14 vCPU, 15 vCPU, 16 vCPU, 21 CPU, 22 CPU, 23 CPU, 24 vCPU, 25 vCPU, 26 vCPU, 400 LAN, 401 physical server device, 402 virtual machine monitor, 403 guest OS, 404 guest OS, 405 VM degeneration permission determination unit, 406 CPU abnormality detection unit, 407 VM state recovery unit, 408 VM state management unit, 409 VM scheduler, 410 VM system switching unit, 411 VM synchronization unit, 412 CPU usage status management unit, 413 abnormal location determination unit, 414 VM configuration file, 421 physical server device, 422 virtual machine monitor, 423 guest OS, 424 guest OS, 425 VM degeneration permission determination unit, 426 CPU abnormality detection unit, 427 VM state recovery unit, 428 VM state management unit, 429 VM scheduler, 430 VM system switching unit, 431 VM synchronization unit, 432 CPU usage status management unit, 433 abnormal location determination unit, 434 VM configuration file.

Claims

A computer device used as a standby system in a computer system in which computer devices are made redundant, the computer device having installed therein software common to software installed in an active computer device used as the active system in the computer system, the computer device comprising:
a degeneration execution unit that, when any software in the active computer device enters a degenerated state, puts the software in the computer device that corresponds to the degenerated software in the active computer device into a degenerated state; and
a state recovery unit that, when the system is switched and the computer device takes over the processing of the active computer device, determines whether there is degenerated software that the degeneration execution unit put into a degenerated state before the system switching, and, when the degenerated software exists, recovers the degenerated software from the degenerated state.

The computer device according to claim 1, wherein the state recovery unit,
when the degenerated software exists, determines whether the computer device has a hardware resource that the degenerated software can use exclusively, and, when no such hardware resource exists, causes a hardware resource used by other software in the computer device to be shared between the degenerated software and the other software, thereby recovering the degenerated software from the degenerated state.

The computer device according to claim 2, wherein the state recovery unit,
when no hardware resource exists that the degenerated software can use exclusively and a hardware resource used by other software in the computer device has a surplus resource that can be allocated to the degenerated software, allocates the surplus resource of that hardware resource to the degenerated software, thereby recovering the degenerated software from the degenerated state.

The computer device according to claim 3, wherein the state recovery unit,
when the hardware resources used by other software in the computer device have no surplus resource that can be allocated to the degenerated software, restricts the use of hardware resources by some software in the computer device so as to generate a surplus resource in a hardware resource, allocates the generated surplus resource to the degenerated software, and thereby recovers the degenerated software from the degenerated state.

The computer device according to claim 4, wherein the state recovery unit,
when the hardware resources used by other software in the computer device have no surplus resource that can be allocated to the degenerated software, restricts the use of hardware resources by software having a lower priority than the degenerated software so as to generate the surplus resource in the hardware resource.

The computer device according to claim 1, wherein
the computer device includes a virtual machine monitor on which a guest OS (operating system) is implemented, the guest OS being common to a guest OS implemented on the virtual machine monitor of the active computer device,
the degeneration execution unit,
when any guest OS in the active computer device enters a degenerated state, puts the guest OS in the computer device that corresponds to the degenerated guest OS in the active computer device into a degenerated state, and
the state recovery unit,
when the system is switched and the computer device takes over the processing of the active computer device, determines whether there is a degenerated guest OS that the degeneration execution unit put into a degenerated state before the system switching, and, when the degenerated guest OS exists, recovers the degenerated guest OS from the degenerated state.

The computer device according to claim 6, wherein the degeneration execution unit and the state recovery unit are included in the virtual machine monitor.

A software management method performed by a computer device used as a standby system in a computer system in which computer devices are made redundant, the computer device having installed therein software common to software installed in an active computer device used as the active system in the computer system, the method comprising:
a degeneration execution step in which, when any software in the active computer device enters a degenerated state, the computer device puts the software in the computer device that corresponds to the degenerated software in the active computer device into a degenerated state; and
a state recovery step in which, when the system is switched and the computer device takes over the processing of the active computer device, the computer device determines whether there is degenerated software that was put into a degenerated state in the degeneration execution step before the system switching, and, when the degenerated software exists, recovers the degenerated software from the degenerated state.

A program that causes a computer device used as a standby system in a computer system in which computer devices are made redundant, the computer device having installed therein software common to software installed in an active computer device used as the active system in the computer system, to execute:
a degeneration execution step of, when any software in the active computer device enters a degenerated state, putting the software in the computer device that corresponds to the degenerated software in the active computer device into a degenerated state; and
a state recovery step of, when the system is switched and the computer device takes over the processing of the active computer device, determining whether there is degenerated software that was put into a degenerated state in the degeneration execution step before the system switching, and, when the degenerated software exists, recovering the degenerated software from the degenerated state.
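The escalation described in claims 2 to 5 can be illustrated with a minimal Python sketch. The order is: an exclusively usable resource first, then a shared surplus, then surplus created by restricting lower-priority software. The sketch models hardware resources as physical CPUs with integer percentage shares. The function name, the tier labels, and the priority scale (higher = more important) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def recover_with_escalation(idle_cpus: List[str], surplus: Dict[str, int],
                            shares: Dict[str, Dict[str, int]],
                            priority: Dict[str, int],
                            target: str, demand: int) -> Tuple[str, Optional[str]]:
    """Try the recovery options of claims 2 to 5 in order; return (tier, cpu).

    idle_cpus: physical CPUs no software uses (candidates for exclusive use)
    surplus:   busy physical CPU -> unused CPU time in percent
    shares:    busy physical CPU -> {software: percent of that CPU}
    """
    # Claim 2: prefer a resource the degenerated software can use exclusively.
    if idle_cpus:
        cpu = idle_cpus.pop(0)
        shares[cpu] = {target: 100}  # the whole CPU goes to the target
        return "exclusive", cpu
    # Claim 3: otherwise share a busy CPU that still has enough spare time.
    for cpu, free in surplus.items():
        if free >= demand:
            shares.setdefault(cpu, {})[target] = demand
            surplus[cpu] = free - demand
            return "shared-surplus", cpu
    # Claims 4 and 5: otherwise restrict lower-priority software to free time.
    for cpu, free in surplus.items():
        lower = sorted((s for s in shares.get(cpu, {})
                        if priority[s] < priority[target]),
                       key=lambda s: priority[s])
        shortfall = demand - free  # positive, since the claim 3 check failed
        if shortfall > sum(shares[cpu][s] for s in lower):
            continue
        for s in lower:  # restrict the lowest-priority software first
            cut = min(shares[cpu][s], shortfall)
            shares[cpu][s] -= cut
            shortfall -= cut
            if shortfall == 0:
                break
        shares[cpu][target] = demand
        surplus[cpu] = 0  # the freed time exactly covers the demand
        return "restrict-lower-priority", cpu
    return "none", None


# Hypothetical figures modeled on FIG. 9: no idle CPU, 25% free on each busy CPU.
surplus = {"pCPU1": 25, "pCPU2": 25}
shares = {"pCPU1": {"vm1": 75}, "pCPU2": {"vm2": 75}}
priority = {"vm1": 1, "vm2": 3, "vm3": 2}
tier, cpu = recover_with_escalation([], surplus, shares, priority, "vm3", 30)
print(tier, cpu)  # restrict-lower-priority pCPU1
```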