JP5913003B2

JP5913003B2 - Computer control apparatus, method and program

Info

Publication number: JP5913003B2
Application number: JP2012188148A
Authority: JP
Inventors: 雅昭小川; 崇博大平; 光司天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-08-29
Filing date: 2012-08-29
Publication date: 2016-04-27
Anticipated expiration: 2032-08-29
Also published as: JP2014044690A

Description

The present invention relates to a technique for controlling the physical computers and the virtualized environments in a system in which virtualized environments are constructed on physical computers.

In applications that demand high reliability, such as railway operation management, power system control, and plant control, multiplexed systems are used that include a standby computer which takes over the processing of the active computer when the active computer fails (see Patent Document 1).

In the multiplexed system of Patent Document 1, a method is proposed in which, when the active computer fails, the standby computer that detected the failure sends a restart/stop request message to the active computer so that switchover to the standby computer is performed quickly and reliably.

Today, computer virtualization technology makes it possible to construct a virtual computer (a virtualized environment) on a physical computer, and even to construct multiple virtualized environments on the same physical computer. Applications can be executed in such virtualized environments.

For systems in which virtualized environments are constructed on physical computers, a method has also been proposed that monitors the physical computers and the virtualized environments for failures and performs system switchover in order to achieve high availability (see Patent Document 2).

Patent Document 1: Japanese Patent No. 4487260
Patent Document 2: Japanese Patent No. 4809209

The multiplexed system of Patent Document 1 assumes physical computers. Consequently, when multiple virtualized environments are constructed on a physical computer, it provides no means of switching from a virtualized environment on the active computer to a virtualized environment on the standby computer in response to a failure in an individual virtualized environment. For the same reason, a restart/stop request message cannot be sent to a virtualized environment. As a result, when a failure occurs in the active computer, the active computer cannot be forced into the standby state by a restart/stop request message, and both computers of the duplexed pair may operate while each judges itself to be the active system.

Furthermore, when multiple virtualized environments are constructed on a physical computer, the messages exchanged at the time of a failure can raise the load on the communication path to the point where communication is temporarily impossible. In addition, a system with virtualized environments has more network layers than a system consisting only of physical computers. The load on the communication path therefore rises at the time of a failure, which makes intermittent failures on the communication path more likely.

When such an intermittent failure occurs on the communication path, in Patent Document 2 the standby virtualized environment on the standby computer judges that the active computer has failed and performs system switchover. When the communication path later returns to normal, both computers may continue to operate while each judges itself to be the active system. This is undesirable in applications that demand high reliability, such as railway operation management, power system control, and plant control.

An object of the present invention is to provide a technique that enables appropriate control of a system in which virtualized environments are constructed on physical computers.

A computer control apparatus according to one aspect of the present invention is a computer control apparatus for a system in which virtualized environments are constructed on physical computers and in which the physical computers and the virtualized environments are each duplexed. The apparatus comprises: liveness monitoring means, arranged in each of the physical computers and the virtualized environments, by which the two corresponding systems of each duplexed pair of physical computers and of virtualized environments send liveness monitoring messages to and receive them from each other; mismatch determination means for determining whether a mismatch has occurred in which both corresponding systems of a duplexed pair recognize themselves as the active system; fault location identifying means for judging as faulty the physical computer or virtualized environment of the system that corresponds to a system in which the liveness monitoring message was not received within a predetermined timeout period; mismatch location identifying means for judging as faulty, when the mismatch determination means finds a mismatch, the physical computer or virtualized environment of the system that started operating as the active system at the earlier time; and state control means which, when the system judged faulty by the fault location identifying means or the mismatch location identifying means is an active system, uses a predetermined protection process to put the physical computer or virtualized environment of that active system into a state in which it does not operate as the active system and transitions the corresponding standby physical computer or virtualized environment to the active system, and which, when the system judged faulty is a standby system, stops the physical computer or virtualized environment of that standby system.

According to the present invention, a system in which virtualized environments are constructed on physical computers can be controlled appropriately.

A block diagram of the computer system according to the present embodiment.
A block diagram showing the overall configuration of the computer system according to the present example.
A block diagram showing the overall configuration of the computer system according to the present example.
A diagram showing the physical computer / virtualized environment configuration control management tables 261A and 261B in the present example.
A flowchart showing processing flow S401, in which the configuration control applications on the physical computers and in the virtualized environments send a liveness monitoring message in the present example.
A flowchart showing processing flow S501, in which the configuration control applications 270A, 270B, 283AA, 283AB, 283BA, and 283BB on the physical computers and in the virtualized environments receive a liveness monitoring message in the present example.
A flowchart showing processing flow S601 executed by the physical computer / virtualized environment communication path monitoring units 250A and 250B in the present example.
A flowchart showing processing flow S701 of the physical computer / virtualized environment fault location identifying units 240A and 240B in the present example.
A flowchart showing processing flow S801, in which the physical computer / virtualized environment restart/stop units 230A and 230B command a restart or a stop in the present example.
A flowchart showing processing flow S901, executed when the physical computer / virtualized environment restart/stop units 230A and 230B receive a restart/stop notification in the present example.
A flowchart showing processing flow S1001, executed when the physical computer / virtualized environment configuration control management units 220A and 220B receive a notification from the physical computer / virtualized environment restart/stop units 230A and 230B in the present example.
A flowchart showing processing flow S1101, executed when the physical computer / virtualized environment configuration control management units 220A and 220B receive a notification from the physical computer / virtualized environment communication path monitoring units 250A and 250B in the present example.

Embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of a computer system according to one embodiment of the present invention. Referring to FIG. 1, the computer system of this embodiment has two duplexed physical computers 10A and 10B. The physical computers 10A and 10B have essentially the same configuration and form a pair in which one serves as the active system and the other as the standby system. In this duplexed configuration, when a failure occurs in the active system, the roles are switched and the standby system becomes the new active system.

Two virtualized environments 11AA and 11AB are constructed on the physical computer 10A. Similarly, two virtualized environments 11BA and 11BB are constructed on the physical computer 10B. The virtualized environment 11AA on the physical computer 10A and the virtualized environment 11BA on the physical computer 10B form one duplexed pair, and the virtualized environment 11AB on the physical computer 10A and the virtualized environment 11BB on the physical computer 10B form another.

To control the computers of a computer system in which both the physical computers and the virtualized environments are duplexed, the physical computers 10A and 10B have liveness monitoring units 12A, 12AA, 12AB, 12B, 12BA, and 12BB, mismatch determination units 13A and 13B, fault location identifying units 14A and 14B, mismatch location identifying units 15A and 15B, and state control units 16A and 16B.

The liveness monitoring unit 12A in the physical computer 10A and the liveness monitoring unit 12B in the physical computer 10B are identical, as are the mismatch determination units 13A and 13B, the fault location identifying units 14A and 14B, the mismatch location identifying units 15A and 15B, and the state control units 16A and 16B. The following description therefore focuses on the physical computer 10A.

Liveness monitoring units 12A, 12AA, and 12AB are arranged in the physical computer 10A and the virtualized environments 11AA and 11AB, respectively. Similarly, liveness monitoring units 12B, 12BA, and 12BB are arranged in the physical computer 10B and the virtualized environments 11BA and 11BB, respectively.

To confirm that the physical computers 10A and 10B, the two corresponding systems of a duplexed pair, are both alive, the liveness monitoring units 12A and 12B send liveness monitoring messages to and receive them from each other. Likewise, the liveness monitoring units 12AA and 12BA exchange liveness monitoring messages so that the virtualized environments 11AA and 11BA can confirm that each other is alive, and the liveness monitoring units 12AB and 12BB exchange liveness monitoring messages so that the virtualized environments 11AB and 11BB can do the same.

The mismatch determination unit 13A determines, for each duplexed pair (the physical computers 10A and 10B, the virtualized environments 11AA and 11BA, and the virtualized environments 11AB and 11BB), whether a mismatch has occurred in which both corresponding systems recognize themselves as the active system. For example, each system can learn how the other system recognizes itself if the messages the two systems exchange include information indicating whether the sender recognizes itself as the active system.
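The mismatch check can be pictured with a minimal Python sketch. It assumes that each system reports in its messages whether it regards itself as active; `RoleReport` and `has_role_mismatch` are hypothetical names introduced only for this illustration and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleReport:
    """Role that one system of a duplexed pair claims for itself (hypothetical structure)."""
    system_id: str
    believes_active: bool


def has_role_mismatch(own: RoleReport, peer: RoleReport) -> bool:
    """Return True when both systems of the pair claim to be the active system."""
    return own.believes_active and peer.believes_active
```

The same function would be applied once per duplexed pair, for example to the reports of 11AA and 11BA.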

When a liveness monitoring message is not received within the predetermined timeout period in the liveness monitoring unit 12A, 12AA, or 12AB, the fault location identifying unit 14A judges the corresponding counterpart, that is, the physical computer 10B or the virtualized environment 11BA or 11BB, to be faulty.

When the mismatch determination unit 13A finds a mismatch, the mismatch location identifying unit 15A judges as faulty the physical computer or virtualized environment of the system that started operating as the active system at the earlier time. For example, each system can learn the other system's active start time if the messages the two systems exchange include the time at which the sender started operating as the active system (the active start time).
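A minimal sketch of this rule in Python follows. The function name and the handling of equal start times are assumptions; the text only states that the system with the older active start time is judged faulty.

```python
from typing import Optional


def faulty_on_mismatch(id_a: str, active_since_a: float,
                       id_b: str, active_since_b: float) -> Optional[str]:
    """Pick the system to treat as faulty when both claim to be active.

    The system that became active earlier (older start time) is judged
    faulty. Equal start times are not covered by the text, so this sketch
    returns None in that case (an assumption).
    """
    if active_since_a == active_since_b:
        return None
    return id_a if active_since_a < active_since_b else id_b
```

With this rule the system that took over most recently keeps the active role, which matches the case where a standby took over during an intermittent communication failure.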

When the system judged faulty by the fault location identifying unit 14A or the mismatch location identifying unit 15A is an active system, the state control unit 16A uses a predetermined protection process to put the physical computer or virtualized environment of that active system into a state in which it does not operate as the active system, and transitions the corresponding standby physical computer or virtualized environment to the active system. When the system judged faulty is a standby system, the state control unit 16A stops the physical computer or virtualized environment of that standby system. Because the fault location identifying unit 14A and the mismatch location identifying unit 15A of the physical computer 10A identify faults in the other system's physical computer 10B or in the virtualized environments 11BA and 11BB constructed on it, the state control unit 16A instructs the physical computer 10B or the virtualized environment 11BA or 11BB, by message, to perform the protection process or to stop.
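The decision made by the state control unit can be sketched as a small function that maps the role of the faulty system to a list of actions. The `Role` enum, the action strings, and `plan_actions` are hypothetical; how the actions are delivered (by message to the other system) is left out.

```python
from enum import Enum
from typing import List


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


def plan_actions(faulty_id: str, faulty_role: Role, partner_id: str) -> List[str]:
    """Return the actions for a system that has been judged faulty.

    A faulty active system is put through the protection process and its
    partner is promoted; a faulty standby system is simply stopped.
    """
    if faulty_role is Role.ACTIVE:
        return [f"protect {faulty_id}", f"promote {partner_id}"]
    return [f"stop {faulty_id}"]
```

For example, `plan_actions("11BA", Role.ACTIVE, "11AA")` yields a protection action for 11BA followed by a promotion of 11AA.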

As described above, according to this embodiment, in a computer system in which virtualized environments are constructed on physical computers and both the physical computers and the virtualized environments are duplexed, control can be performed appropriately when a failure occurs in a physical computer or a virtualized environment.

The protection process described above may be, for example, a process that, when the active virtualized environment 11BA or 11BB is faulty, saves a predetermined memory area of that virtualized environment to a memory area outside the virtualized environment and then resets the virtualized environment. In this way, when system switchover occurs because of a failure, data that can be used to identify the cause of the failure is obtained.

Also, for example, the mismatch location identifying unit 15A may record, for each of the physical computers 10A and 10B and the virtualized environments 11AA, 11AB, 11BA, and 11BB, the time at which it started operating as the active system as its active start time and, when a mismatch occurs between two corresponding systems, decide which system to treat as faulty by referring to the active start times of both. This makes it easy to decide which system to treat as faulty when the states of the two systems are inconsistent.

Also, for example, the fault location identifying unit 14A may hold a predetermined timeout period for each of the physical computer 10A and the virtualized environments 11AA and 11AB, record the last reception time (the time at which a liveness monitoring message was most recently received), and judge that a timeout has occurred when the time elapsed from the last reception time to the current time exceeds the timeout period. This makes it easy to detect a timeout of the liveness monitoring message.
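The timeout rule above can be sketched in Python as follows. `LivenessRecord` and its method names are hypothetical, and times are plain seconds supplied by the caller so that the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class LivenessRecord:
    """Per-system timeout bookkeeping (hypothetical structure)."""
    timeout: float        # predetermined timeout period, in seconds
    last_received: float  # last reception time of a liveness (survival) monitoring message

    def on_message(self, received_at: float) -> None:
        """Record the arrival time of the latest monitoring message."""
        self.last_received = received_at

    def timed_out(self, now: float) -> bool:
        """True when the time since the last reception exceeds the timeout period."""
        return (now - self.last_received) > self.timeout
```

A record would be kept for each monitored system, and `timed_out` evaluated with the current time whenever the check runs.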

Each unit provided in the physical computers 10A and 10B in FIG. 1 can also be implemented by having a computer execute a software program that defines the corresponding processing procedure.

Next, an example that implements the above embodiment more concretely is described.

FIGS. 2 and 3 are block diagrams showing the overall configuration of the computer system according to this example. FIG. 2 shows the active physical computer 100A, and FIG. 3 shows the standby physical computer 100B.

Referring to FIG. 2, the active physical computer 100A has a memory 200A, a processor 110, a disk interface 120, and communication interfaces 130 and 131. The disk interface 120 is connected to a disk 140. The communication interface 130 is connected to a network 150, and the communication interface 131 is connected to a network 160.

On the memory 200A reside virtualized environment logical partitions 281AA and 281AB, a configuration control application 270A, a physical computer / virtualized environment shared memory partition 260A, a physical computer / virtualized environment restart/stop unit 230A, a physical computer / virtualized environment fault location identifying unit 240A, a physical computer / virtualized environment communication path monitoring unit 250A, a physical computer / virtualized environment configuration control management unit 220A, and a physical computer OS (operating system) 210A.

The configuration control application 270A, the physical computer / virtualized environment shared memory partition 260A, the physical computer / virtualized environment restart/stop unit 230A, the physical computer / virtualized environment fault location identifying unit 240A, the physical computer / virtualized environment communication path monitoring unit 250A, and the physical computer / virtualized environment configuration control management unit 220A operate on the physical computer OS 210A.

The virtualized environment logical partition 281AA contains a virtualized environment 282AA, which includes a configuration control application 283AA and a virtualized environment OS 284AA. Similarly, the virtualized environment logical partition 281AB contains a virtualized environment 282AB, which includes a configuration control application 283AB and a virtualized environment OS 284AB. The configuration control application 283AA runs on the virtualized environment OS 284AA, and the configuration control application 283AB runs on the virtualized environment OS 284AB.

The standby physical computer 100B shown in FIG. 3 has the same configuration as the active physical computer 100A shown in FIG. 2. In the physical computer 100B, however, the virtualized environments 282BA and 282BB reside in the virtualized environment logical partitions 281BA and 281BB, respectively. The virtualized environment 282BA contains a configuration control application 283BA and a virtualized environment OS 284BA, and the virtualized environment 282BB contains a configuration control application 283BB and a virtualized environment OS 284BB.

In the computer system configured as described above, when a failure occurs in the active physical computer 100A, the roles of the active physical computer 100A and the standby physical computer 100B are switched: the physical computer 100A becomes the standby system and the physical computer 100B becomes the active system.

The virtualized environment logical partitions 281AA and 281AB reside on the memory 200A of the physical computer 100A, and the virtualized environment logical partitions 281BA and 281BB reside on the memory 200B of the physical computer 100B. Virtualized environment A is duplexed by the active virtualized environment A 282AA in the logical partition 281AA and the standby virtualized environment A 282BA in the logical partition 281BA. Similarly, virtualized environment B is duplexed by the active virtualized environment B 282AB and the standby virtualized environment B 282BB.

When a failure occurs in the active virtualized environment A 282AA, the roles of the active virtualized environment A 282AA and the standby virtualized environment A 282BA are switched: the virtualized environment A 282AA becomes the standby system and the virtualized environment A 282BA becomes the active system.

When a failure occurs in the active virtualized environment B 282AB, the roles of the active virtualized environment B 282AB and the standby virtualized environment B 282BB are switched: the virtualized environment B 282AB becomes the standby system and the virtualized environment B 282BB becomes the active system.

The disk 140 connected to the disk interface 120 of each physical computer holds management information 141 for constructing the system and a memory dump area 142 in which data obtained by a memory dump is recorded.

In each of the physical computers 100A and 100B, the communication interface 130 is connected to the network 150 and the communication interface 131 is connected to the network 160, so that the communication path is duplexed by the networks 150 and 160.

The configuration control applications 270A and 270B on the physical computer OSs 210A and 210B exchange, via the duplexed communication interfaces 130 and 131, liveness monitoring messages that contain the state of the sender's physical computer and the time at which that physical computer transitioned to the active system. When the sender's physical computer is not in the active state, 0 is set in the message as the transition time. A nonzero transition time therefore indicates that the physical computer recognizes itself as the active system, and 0 indicates that it recognizes itself as the standby system.
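The message convention described here (a transition time of 0 means "not active") can be sketched in Python. The JSON encoding, the field names, and both function names are assumptions chosen for illustration; the patent does not specify a wire format.

```python
import json


def build_message(system_id: str, is_active: bool, transitioned_at: float) -> bytes:
    """Encode a liveness (survival) monitoring message.

    The active-transition time is sent as 0 when the sender is not active,
    following the convention described in the text.
    """
    active_since = transitioned_at if is_active else 0
    state = "active" if is_active else "standby"
    payload = {"id": system_id, "state": state, "active_since": active_since}
    return json.dumps(payload).encode("utf-8")


def sender_believes_active(raw: bytes) -> bool:
    """A nonzero active-transition time means the sender regards itself as active."""
    return json.loads(raw.decode("utf-8"))["active_since"] != 0
```

A receiver that is itself active and sees `sender_believes_active(...)` return True has observed the mismatch condition for that duplexed pair.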

Likewise, the configuration control applications 283AA, 283AB, 283BA, and 283BB on the virtualized environment OSs 284AA, 284AB, 284BA, and 284BB exchange, between the active and standby systems of each virtualized environment and via the duplexed communication interfaces 130 and 131, liveness monitoring messages that contain the state of the sender's virtualized environment and the time at which that virtualized environment transitioned to the active system. When the sender's virtualized environment is not in the active state, 0 is set in the message as the transition time. A nonzero transition time therefore indicates that the virtualized environment recognizes itself as the active system, and 0 indicates that it recognizes itself as the standby system.

The liveness monitoring messages exchanged between the two systems by the configuration control applications of the physical computers and the virtualized environments are captured by the physical computer / virtualized environment communication path monitoring units 250A and 250B on the physical computer OSs 210A and 210B. Triggered by this, the communication path monitoring units 250A and 250B use the send or receive time of each message to update the fields that record message times (last transmission time and last reception time) in the corresponding entry of the physical computer / virtualized environment configuration control management tables 261A and 261B, which reside in the physical computer / virtualized environment shared memory partitions 260A and 260B.
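The table update can be pictured with the following Python sketch. The entry layout, the `"sent"`/`"received"` direction labels, and the function name are hypothetical; the real layout of tables 261A and 261B is given by the patent's figure, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TableEntry:
    """One row of a management table (hypothetical layout)."""
    timeout: float
    last_sent: float = 0.0
    last_received: float = 0.0


def record_message(table: Dict[str, TableEntry], system_id: str,
                   direction: str, timestamp: float) -> None:
    """Update the last transmission or reception time for one system."""
    entry = table[system_id]
    if direction == "sent":
        entry.last_sent = timestamp
    elif direction == "received":
        entry.last_received = timestamp
    else:
        raise ValueError(f"unknown direction: {direction}")
```

Keeping both timestamps per entry lets a later check tell a silent peer (stale reception time) apart from a local sender that has stopped transmitting.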

また、生存監視電文に不整合が生じていた場合には、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂに相手系の物理計算機または仮想化環境を停止させるための通知をする。生存監視電文における自系が稼働系に遷移した時刻から、二重化の両系が共に自系を稼動系であると認識していたら、生存監視電文に不整合が生じていると判断すればよい。 When the survival monitoring messages reveal an inconsistency, the physical computer / virtualized environment restart / stop units 230A and 230B are notified so that the counterpart physical computer or virtual environment is stopped. An inconsistency can be judged to exist when the active-system transition times carried in the survival monitoring messages show that both systems of a duplexed pair regard themselves as the active system.

各物理計算機用ＯＳ２１０Ａ、２１０Ｂ上の物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂに存在するタイムアウト時間の最小値をもって周期起動される。周期起動された物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、生存監視電文が途絶えている箇所があればその障害発生箇所を特定し、その障害発生箇所が物理計算機または仮想化環境であるか特定し、特定した情報を障害発生の通知として、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂに送る。なお、ここでタイムアウト時間の最小値を用いているのは、どの物理計算機または仮想化環境でタイムアウトが発生しても、タイムアウト時間の１周期以内にタイムアウトの発生を検知することができるようにするためである。 The physical computer / virtualized environment fault location identifying units 240A and 240B on the physical computer OSs 210A and 210B are started periodically, with a period equal to the smallest timeout value held in the physical computer / virtualized environment configuration control management tables 261A and 261B. Once started, the fault location identifying units 240A and 240B identify any location where survival monitoring messages have stopped, determine whether that location is a physical computer or a virtual environment, and send the result to the physical computer / virtualized environment restart / stop units 230A and 230B as a failure notification. The smallest timeout value is used as the period so that, whichever physical computer or virtual environment times out, the timeout is detected within one timeout period.
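The choice of wake-up period can be illustrated with a minimal Python sketch. The function name `polling_period` and the example values are assumptions introduced here for illustration; the description only states that the smallest timeout value in the table is used as the period.

```python
def polling_period(timeouts):
    """Return the wake-up period: the smallest timeout held in the table.

    `timeouts` stands for the values of T006 and T013 (hypothetical input shape).
    """
    timeouts = list(timeouts)
    if not timeouts:
        raise ValueError("at least one timeout value is required")
    return min(timeouts)


# Illustrative values only: two physical computers and two virtual environments.
example_period = polling_period([5.0, 5.0, 2.0, 3.0])
```

With these made-up values the identifying unit would wake every 2.0 seconds, so an entry whose own timeout is 2.0 seconds is checked at least once per timeout period.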

各物理計算機用ＯＳ２１０Ａ、２１０Ｂ上の物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂから物理計算機と仮想化環境の状態を取得する。 The physical computer / virtualized environment restart / stop units 230A and 230B on the physical computer OSs 210A and 210B obtain the states of the physical computers and virtual environments from the physical computer / virtualized environment configuration control management tables 261A and 261B.

また、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂからの障害発生の通知に基づき、障害発生箇所が、物理計算機または仮想化環境のいずれかの稼動系であるか否かを判断する。 Based on the failure notification from the physical computer / virtualized environment fault location identifying units 240A and 240B, the restart / stop units 230A and 230B also determine whether the failure location is the active system of either a physical computer or a virtual environment.

障害発生箇所が物理計算機の稼働系である場合、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、障害が発生している相手系の通信インタフェース１３０、１３１へネットワークを介して、再起動を指示する再起動電文を送信して物理計算機上のＯＳへＮＭＩ（マスクが不可能な割り込み）を発生させる。その後、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、規定時間だけ待ち合わせを行い、障害が発生した物理計算機へリセットを通知し、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂへ待機状態への状態変更を通知する。 When the failure location is the active system of a physical computer, the physical computer / virtualized environment restart / stop units 230A and 230B send a restart message over the network to the communication interfaces 130 and 131 of the failed counterpart system, causing an NMI (non-maskable interrupt) to be raised to the OS on that physical computer. The restart / stop units 230A and 230B then wait for a specified time, send a reset to the failed physical computer, and notify the physical computer / virtualized environment configuration control management units 220A and 220B of the state change to the standby state.

また、障害発生箇所が仮想化環境である場合、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、障害が発生している相手系の物理計算機・仮想化環境再起動／停止部２３０Ａまたは２３０Ｂへ、通信インタフェース１３０、１３１からネットワークを介し、当該仮想化環境の再起動を指示する再起動電文を送信する。 When the failure location is a virtual environment, the physical computer / virtualized environment restart / stop units 230A and 230B send a restart message instructing a restart of that virtual environment, from the communication interfaces 130 and 131 over the network, to the physical computer / virtualized environment restart / stop unit 230A or 230B of the counterpart system in which the failure has occurred.

再起動電文を受信した系の物理計算機・仮想化環境再起動／停止部２３０Ａまたは２３０Ｂは、障害の発生している仮想化環境の仮想化環境用ＯＳ２８４ＡＡ、２８４ＡＢ、２８４ＢＡ、２８４ＢＢへＮＭＩを発生させる。その後、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、規定時間だけ待ち合わせを行い、障害が発生した仮想化環境へリセット電文を通知し、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂへ待機状態への状態変更を通知する。 The physical computer / virtualized environment restart / stop unit 230A or 230B of the system that receives the restart message raises an NMI to the virtual environment OS 284AA, 284AB, 284BA, or 284BB of the failed virtual environment. It then waits for a specified time, sends a reset message to the failed virtual environment, and notifies the physical computer / virtualized environment configuration control management units 220A and 220B of the state change to the standby state.

各物理計算機用ＯＳ２１０Ａ、２１０Ｂ上で物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂが動作し、物理計算機・仮想化環境共有用メモリ区画２６０Ａ、２６０Ｂに存在する物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂの情報に基づいて、各物理計算機および仮想化環境の構成制御を管理し、状態の変更通知があった場合にはテーブルに存在する該当の状態を変更する。 The physical computer / virtualized environment configuration control management units 220A and 220B run on the physical computer OSs 210A and 210B. Based on the information in the physical computer / virtualized environment configuration control management tables 261A and 261B, which reside in the physical computer / virtualized environment shared memory partitions 260A and 260B, they manage the configuration control of the physical computers and virtual environments, and when notified of a state change they update the corresponding state in the table.

図４は、本実施例における物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂを示す図である。 FIG. 4 is a diagram showing the physical computer / virtualized environment configuration control management tables 261A and 261B in this embodiment.

図４中の上段にある物理計算機についてのテーブルには、各物理計算機１００Ａ、１００Ｂの識別子（Ａ、Ｂ）が登録されている。 The identifiers (A, B) of the physical computers 100A, 100B are registered in the table for the physical computers in the upper part of FIG.

状態Ｔ００２には、識別子Ｔ００１が記載された物理計算機１００Ａ、１００Ｂが稼動系か待機系かを示す情報が登録されている。 In the state T002, information indicating whether the physical computers 100A and 100B with the identifier T001 are active or standby is registered.

稼動系開始時刻Ｔ００３には、各物理計算機１００Ａ、１００Ｂが稼動系として動作を開始した時刻である稼動系開始時刻が記録されている。 In the active system start time T003, an active system start time, which is a time when the physical computers 100A and 100B start to operate as active systems, is recorded.

最終送信時刻Ｔ００４には、各物理計算機１００Ａ、１００Ｂから生存監視電文が最後に送信された時刻が記録されている。 In the last transmission time T004, the time when the survival monitoring message was last transmitted from each of the physical computers 100A and 100B is recorded.

最終受信時刻Ｔ００５には、二重化における相手の物理計算機１００Ａ、１００Ｂから生存監視電文を最後に受信した時刻が記録されている。 In the last reception time T005, the time at which the survival monitoring message was last received from the partner physical computers 100A and 100B in the duplex is recorded.

タイムアウト時間Ｔ００６には、相手の物理計算機１００Ａ、１００Ｂからの生存監視電文が途絶えたと判断するためのタイムアウト時間が記録されている。 The timeout time T006 records the timeout used to determine that survival monitoring messages from the counterpart physical computer 100A or 100B have stopped.

下段にある仮想化環境についてのテーブルは、上段のテーブルの右側に連なるものである。この仮想化環境についてのテーブルには、各物理計算機１００Ａ、１００Ｂに存在する仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢを示す識別子Ｔ００７（Ａ、Ｂ、Ａ、Ｂ）が記録されている。例えば、最上段の仮想化環境は識別子Ａの物理計算機に構築された識別子Ａの仮想化環境であり、２段目の仮想化環境は識別子Ａの物理計算機に構築された識別子Ｂの仮想化環境である。 The table for the virtual environments, shown in the lower part, continues to the right of the upper table. It records the identifiers T007 (A, B, A, B) of the virtual environments 282AA, 282AB, 282BA, and 282BB on the physical computers 100A and 100B. For example, the virtual environment in the first row is the virtual environment with identifier A built on the physical computer with identifier A, and the one in the second row is the virtual environment with identifier B built on the physical computer with identifier A.

状態Ｔ００８には、各仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢが稼動系か待機系かを示す情報が登録されている。 In the state T008, information indicating whether each of the virtualization environments 282AA, 282AB, 282BA, and 282BB is an active system or a standby system is registered.

稼動系開始時刻Ｔ００９には、各仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢが稼動系へ遷移した時刻が記録されている。 In the active system start time T009, the time when each of the virtualization environments 282AA, 282AB, 282BA, and 282BB transitions to the active system is recorded.

相手Ｔ０１０には、各仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢが、二重化においてどの物理計算機のどの仮想化環境と対応しているかを示す識別子が記録されている。例えば、最上段の仮想化環境は、識別子Ｂの物理計算機に構築された識別子Ａの仮想化環境と対応している。 The partner T010 records an identifier indicating which virtual environment on which physical computer each of the virtual environments 282AA, 282AB, 282BA, and 282BB is paired with for duplexing. For example, the virtual environment in the first row is paired with the virtual environment with identifier A built on the physical computer with identifier B.

最終送信時刻Ｔ０１１には、相手Ｔ０１０にて示されている二重化にて対応する仮想化環境に生存監視電文を最後に送信した時刻が記録されている。 The last transmission time T011 records the time when the survival monitoring message was last transmitted to the virtual environment corresponding to the duplexing indicated by the partner T010.

最終受信時刻Ｔ０１２には、相手Ｔ０１０に示されている二重化にて対応する仮想化環境から最後に生存監視電文を受信した時刻が記録されている。 In the final reception time T012, the time when the last survival monitoring message is received from the virtual environment corresponding to the duplexing shown in the partner T010 is recorded.

タイムアウト時刻Ｔ０１３には、各仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢにおける生存監視電文のタイムアウト時間が記録されている。 In the timeout time T013, the timeout time of the survival monitoring message in each of the virtualization environments 282AA, 282AB, 282BA, and 282BB is recorded.
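The columns T001 to T013 described above can be pictured as two record types. The following Python sketch is only an illustration: the class names, field names, string values for the state, and all sample values are assumptions, and the description does not say how the table is laid out in the shared memory partition.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PhysicalEntry:
    identifier: str        # T001
    state: str             # T002: "active" or "standby" (assumed encoding)
    active_since: float    # T003
    last_sent: float       # T004
    last_received: float   # T005
    timeout: float         # T006


@dataclass
class VirtualEntry:
    host: str                  # identifier of the hosting physical computer
    identifier: str            # T007
    state: str                 # T008
    active_since: float        # T009
    partner: Tuple[str, str]   # T010: (physical identifier, virtual identifier)
    last_sent: float           # T011
    last_received: float       # T012
    timeout: float             # T013


@dataclass
class ManagementTable:
    physical: Dict[str, PhysicalEntry] = field(default_factory=dict)
    virtual: Dict[Tuple[str, str], VirtualEntry] = field(default_factory=dict)

    def add_virtual(self, entry: VirtualEntry) -> None:
        # Key each virtual environment by (host, identifier).
        self.virtual[(entry.host, entry.identifier)] = entry


# Made-up sample rows: environment A on computer A is paired with A on computer B.
table = ManagementTable()
table.add_virtual(VirtualEntry("A", "A", "active", 100.0, ("B", "A"), 0.0, 0.0, 2.0))
table.add_virtual(VirtualEntry("B", "A", "standby", 0.0, ("A", "A"), 0.0, 0.0, 2.0))
```

Keying virtual environments by the pair (host, identifier) mirrors the example in the text, where identifier A appears once on each physical computer.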

図５は、本実施例における物理計算機および仮想化環境上の構成制御アプリケーションが生存監視電文を送信する処理フローＳ４０１を示すフローチャートである。物理計算機１００Ａ、１００Ｂ上の構成制御アプリケーション２７０Ａ、２７０Ｂおよび仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢ上の構成制御アプリケーション２８３ＡＡ、２８３ＡＢ、２８３ＢＡ、２８３ＢＢは全て同様の動作を行う。 FIG. 5 is a flowchart showing a processing flow S401 in which the physical computer and the configuration control application on the virtual environment in this embodiment transmit a survival monitoring message. The configuration control applications 270A, 270B on the physical computers 100A, 100B and the configuration control applications 283AA, 283AB, 283BA, 283BB on the virtual environments 282AA, 282AB, 282BA, 282BB all perform the same operation.

図５を参照すると、構成制御アプリケーションは、図４に示された物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、仮想化環境の二重化にて対応する通信相手の識別子Ｔ００７およびＴ０１０と、物理計算機および仮想化環境の状態を示すＴ００２およびＴ００８の情報を取得する（ステップＳ４０２）。続いて、構成制御アプリケーションは、自身の通信相手に対して、自身が存在する環境が稼動系か待機系の情報および稼動系になった時刻を生存監視電文として送信する（ステップＳ４０３）。続いて、構成制御アプリケーションはタイムアウト時間の１／２の時間だけウェイトして（ステップＳ４０４）、ステップＳ４０２に戻る。 Referring to FIG. 5, the configuration control application obtains, from the physical computer / virtualized environment configuration control management tables 261A and 261B shown in FIG. 4, the identifiers T007 and T010 of the communication partner paired with it for duplexing of the virtual environment, and the states T002 and T008 of the physical computer and the virtual environment (step S402). It then sends its communication partner a survival monitoring message containing whether its own environment is the active system or the standby system and the time at which it became the active system (step S403). Finally, it waits for half of the timeout time (step S404) and returns to step S402.
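The transmit loop of steps S402 to S404 can be sketched as follows. This is a minimal illustration under stated assumptions: `read_state`, `send`, the dictionary message layout, and the `cycles` and `sleep` parameters (added so the loop can be exercised without real delays) are hypothetical and do not come from the description.

```python
import time


def send_heartbeats(read_state, send, timeout, cycles=None, sleep=time.sleep):
    """Send one survival monitoring message every timeout / 2 seconds.

    read_state() returns (state, active_since); send(message) delivers a message.
    cycles=None loops forever; a number bounds the loop for testing.
    """
    done = 0
    while cycles is None or done < cycles:
        state, active_since = read_state()                 # step S402
        message = {
            "state": state,
            # A system that is not active reports 0 as its transition time.
            "active_since": active_since if state == "active" else 0,
        }
        send(message)                                      # step S403
        sleep(timeout / 2)                                 # step S404
        done += 1
```

Sending at half the timeout means that a single delayed or lost message does not by itself let the timeout expire on the receiving side.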

図６は、本実施例における物理計算機および仮想化環境上の構成制御アプリケーション２７０Ａ、２７０Ｂ、２８３ＡＡ、２８３ＡＢ、２８３ＢＡ、２８３ＢＢが生存監視電文を受信する処理フローＳ５０１を示すフローチャートである。本実施例の構成制御アプリケーションは図１に示した上記実施形態の生存監視部１２Ａ、１２ＡＡ、１２ＡＢ、１２ＢＡ、１２ＢＢに対応する。物理計算機１００Ａ、１００Ｂ上の構成制御アプリケーション２７０Ａ、２７０Ｂおよび仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢ上の構成制御アプリケーション２８３ＡＡ、２８３ＡＢ、２８３ＢＡ、２８３ＢＢは全て同様の動作を行う。 FIG. 6 is a flowchart showing a processing flow S501 in which the physical computer and the configuration control applications 270A, 270B, 283AA, 283AB, 283BA, and 283BB in the virtual environment receive the survival monitoring message. The configuration control application of this example corresponds to the survival monitoring units 12A, 12AA, 12AB, 12BA, and 12BB of the above-described embodiment illustrated in FIG. The configuration control applications 270A, 270B on the physical computers 100A, 100B and the configuration control applications 283AA, 283AB, 283BA, 283BB on the virtual environments 282AA, 282AB, 282BA, 282BB all perform the same operation.

図６を参照すると、構成制御アプリケーションは、図４に示された物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、仮想化環境の二重化において、仮想化環境とそれに対応する通信相手の識別子Ｔ００７およびＴ０１０と、物理計算機および仮想化環境の状態Ｔ００２およびＴ００８の情報を取得する（ステップＳ５０２）。なお、ここでは物理計算機の二重化にて対応する通信相手は予め分かっているものとする。続いて、構成制御アプリケーションは、自身の通信相手から送信された生存監視電文を受信して（ステップＳ５０３）、ステップＳ５０１に戻る。 Referring to FIG. 6, the configuration control application obtains, from the physical computer / virtualized environment configuration control management tables 261A and 261B shown in FIG. 4, the identifiers T007 and T010 of the virtual environment and of its communication partner for duplexing, and the states T002 and T008 of the physical computer and the virtual environment (step S502). The communication partner for duplexing of the physical computers is assumed to be known in advance. The configuration control application then receives the survival monitoring message sent by its communication partner (step S503) and returns to step S501.

図７は、本実施例における物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂが実行する処理フローＳ６０１を示すフローチャートである。本実施例の物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、図１に示した上記実施形態の不整合判定部１３Ａ、１３Ｂに対応する。各物理計算機用ＯＳ２１０Ａ、２１０Ｂに存在する物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは同様の動作を行う。 FIG. 7 is a flowchart showing the processing flow S601 executed by the physical computer / virtualized environment communication path monitoring units 250A and 250B in this embodiment. The physical computer / virtualized environment communication path monitoring units 250A and 250B of the present example correspond to the inconsistency determination units 13A and 13B of the above-described embodiment illustrated in FIG. The physical computer / virtualized environment communication path monitoring units 250A and 250B existing in the physical computer OSs 210A and 210B perform the same operation.

図７を参照すると、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、図４に示された物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、Ｔ００７に示された各仮想化環境の通信相手を示すＴ０１０の情報を取得する（ステップＳ６０２）。 Referring to FIG. 7, the physical computer / virtualized environment communication path monitoring units 250A and 250B perform the virtualization shown in T007 from the physical computer / virtualized environment configuration control management tables 261A and 261B shown in FIG. Information of T010 indicating the communication partner of the environment is acquired (step S602).

次に、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、ステップＳ６０２に取得した情報を基にして、物理計算機１００Ａ、１００Ｂ上の構成制御アプリケーション２７０Ａ、２７０Ｂおよび仮想化環境２８２ＡＡ、２８２ＡＢ、２８２ＢＡ、２８２ＢＢ上の構成制御アプリケーション２８３ＡＡ、２８３ＡＢ、２８３ＢＡ、２８３ＢＢの生存監視電文の送受信を監視する（ステップＳ６０３）。そして、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、図４に示した該当物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂにおける物理計算機および各仮想化環境の最終送受信時刻Ｔ００４、Ｔ００５、Ｔ０１１、Ｔ０１２および稼動系開始時刻Ｔ００３、Ｔ００９を更新する（ステップＳ６０４）。 Next, the physical computer / virtualized environment communication path monitoring units 250A and 250B, based on the information acquired in step S602, the configuration control applications 270A and 270B and the virtual environments 282AA and 282AB on the physical computers 100A and 100B, The transmission / reception of the survival monitoring message of the configuration control applications 283AA, 283AB, 283BA, and 283BB on 282BA and 282BB is monitored (step S603). Then, the physical computer / virtualized environment communication path monitoring units 250A, 250B perform the final transmission / reception time T004 of the physical computer and each virtual environment in the corresponding physical computer / virtualized environment configuration control management table 261A, 261B shown in FIG. T005, T011, T012, and active system start times T003, T009 are updated (step S604).

続いて、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、送受信した生存監視電文のデータと、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂにて管理されているデータとを参照し、各物理計算機および仮想化環境の二重化における両系の状態が整合しているか、不整合になっているか判断する（ステップＳ６０５）。ここでは両系が稼働系として動作しているとき両系の状態が不整合と判断する。 Subsequently, the physical computer / virtualized environment communication path monitoring unit 250A, 250B refers to the data of the transmitted / received survival monitoring message and the data managed in the physical computer / virtualized environment configuration control management tables 261A, 261B. Then, it is determined whether the statuses of both systems in the duplexing of each physical computer and the virtual environment are consistent or inconsistent (step S605). Here, when both systems are operating as active systems, it is determined that the states of both systems are inconsistent.

両系の整合がとれていない場合、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂは、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂに、両系の状態に不整合がある旨の通知を送信する（ステップＳ６０６）。 When the two systems are not consistent, the physical computer / virtualized environment communication path monitoring units 250A and 250B send the physical computer / virtualized environment configuration control management units 220A and 220B a notification that the states of the two systems are inconsistent (step S606).
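Steps S603 to S606 can be sketched in Python as below. The dictionary-based table, the function names, and the convention that an `active_since` of 0 means standby in the table as well as in the message are assumptions made for this illustration.

```python
def record_heartbeat(table, sender, receiver, active_since, now):
    """Steps S603-S604: note one observed message in the table."""
    table[sender]["last_sent"] = now
    table[sender]["active_since"] = active_since
    table[receiver]["last_received"] = now


def check_pair(table, a, b, notify):
    """Steps S605-S606: notify when both systems of a pair claim to be active."""
    inconsistent = table[a]["active_since"] != 0 and table[b]["active_since"] != 0
    if inconsistent:
        notify((a, b))
    return inconsistent
```

In this sketch `notify` stands in for the notification to the configuration control management unit; how that notification is delivered is not specified here.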

図８は、本実施例における物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂの処理フローＳ７０１を示すフローチャートである。本実施例の物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、図１に示した障害箇所特定部１４Ａ、１４Ｂに対応する。 FIG. 8 is a flowchart showing the processing flow S701 of the physical computer / virtualization environment fault location identifying unit 240A, 240B in this embodiment. The physical computer / virtualization environment fault location specifying units 240A and 240B of this embodiment correspond to the fault location specifying units 14A and 14B shown in FIG.

図８を参照すると、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、図４に示した物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、識別子Ｔ００７で示された各仮想化環境の通信相手を示す相手Ｔ０１０の情報を取得する（ステップＳ７０２）。続いて、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、物理計算機および各仮想化環境のタイムアウト時間Ｔ００６およびＴ０１３の情報を取得する（ステップＳ７０３）。更に、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂに存在するタイムアウト時間Ｔ００６、Ｔ０１３の最小値に相当する時間だけウェイトする（ステップＳ７０４）。 Referring to FIG. 8, the physical computer / virtualized environment fault location identifying units 240A and 240B obtain, from the physical computer / virtualized environment configuration control management tables 261A and 261B shown in FIG. 4, the partner T010 indicating the communication partner of each virtual environment identified by the identifier T007 (step S702). They then obtain the timeout times T006 and T013 of the physical computers and the virtual environments from the same tables (step S703). Next, they wait for a time equal to the smallest of the timeout times T006 and T013 held in the tables (step S704).

次に、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂを参照し、物理計算機および各仮想化環境について生存監視電文の送受信の最終更新時刻Ｔ００４、Ｔ００５、Ｔ０１１、Ｔ０１２からタイムアウト時間が経過しているか否か判断する（ステップＳ７０５）。 Next, the physical computer / virtualized environment failure location specifying unit 240A, 240B refers to the physical computer / virtualized environment configuration control management tables 261A, 261B, and finally transmits and receives the life monitoring message for the physical computer and each virtualized environment. It is determined whether or not a timeout period has elapsed since the update times T004, T005, T011, and T012 (step S705).

そして、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、タイムアウトしている物理計算機または仮想化環境が存在しなければ、何もせずにステップＳ７０２に戻る。 Then, the physical computer / virtualized environment fault location identifying unit 240A, 240B returns to step S702 without doing anything if there is no physical computer or virtual environment that has timed out.

一方、タイムアウトしている物理計算機あるいは仮想化環境がある場合、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、その物理計算機あるいは仮想化環境に障害が発生していると判断する（ステップＳ７０６）。 On the other hand, if a physical computer or virtual environment has timed out, the physical computer / virtualized environment fault location identifying units 240A and 240B determine that a failure has occurred in that physical computer or virtual environment (step S706).

ステップＳ７０６の結果より、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、物理計算機で障害が発生していれば、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂに対して、物理計算機１００Ａまたは１００Ｂにて障害が発生していることを通知する（ステップＳ７０７）。 From the result of step S706, the physical computer / virtualized environment fault location identifying unit 240A, 240B determines that the physical computer / virtualized environment restart / stop unit 230A, 230B, if a fault has occurred in the physical computer, It is notified that a failure has occurred in the physical computer 100A or 100B (step S707).

またステップＳ７０６の結果より、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂは、仮想化環境で障害が発生していれば、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂに対して、仮想化環境にて障害が発生していることを通知する（ステップＳ７０８）。 Likewise, if the result of step S706 shows that a failure has occurred in a virtual environment, the physical computer / virtualized environment fault location identifying units 240A and 240B notify the physical computer / virtualized environment restart / stop units 230A and 230B that a failure has occurred in the virtual environment (step S708).
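The timeout check and routing of steps S705 to S708 can be sketched as follows. The entry layout and function names are hypothetical, and treating an entry as timed out when the older of its last transmission and last reception times exceeds the timeout is one possible reading of step S705, not a confirmed detail.

```python
def find_faults(entries, now):
    """Steps S705-S706: return (kind, identifier) for every timed-out entry.

    Each entry is a dict with "kind" ("physical" or "virtual"), "id",
    "last_sent", "last_received" and "timeout" (assumed layout).
    """
    faults = []
    for entry in entries:
        oldest = min(entry["last_sent"], entry["last_received"])
        if now - oldest > entry["timeout"]:
            faults.append((entry["kind"], entry["id"]))
    return faults


def notify_faults(faults, notify_physical, notify_virtual):
    """Steps S707-S708: route each fault to the matching notification."""
    for kind, ident in faults:
        if kind == "physical":
            notify_physical(ident)
        else:
            notify_virtual(ident)
```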

図９は、本実施例における物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂが再起動および停止を命令する処理フローＳ８０１を示すフローチャートである。本実施例の物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、図１に示した上記実施形態の状態制御部１６Ａ、１６Ｂに対応する。 FIG. 9 is a flowchart showing a processing flow S801 in which the physical computer / virtualized environment restart / stop units 230A and 230B instruct the restart and stop in this embodiment. The physical computer / virtualized environment restart / stop units 230A and 230B of the present example correspond to the state control units 16A and 16B of the above-described embodiment illustrated in FIG.

図９を参照すると、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、図４に示した物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂより、物理計算機および各仮想化環境の通信相手の情報Ｔ０１０を取得する（ステップＳ８０２）。 Referring to FIG. 9, the physical computer / virtualized environment restart / stop units 230A and 230B obtain the communication partner information T010 of the physical computers and the virtual environments from the physical computer / virtualized environment configuration control management tables 261A and 261B shown in FIG. 4 (step S802).

その後、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、物理計算機・仮想化環境障害箇所特定部２４０Ａ、２４０Ｂまたは物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂから、障害の発生および障害発生箇所の情報を受信すると（ステップＳ８０３）、障害発生箇所が稼動系であるか否か判断する（ステップＳ８０４）。 Thereafter, the physical computer / virtualized environment restart / stop unit 230A, 230B generates a failure from the physical computer / virtualized environment fault location identifying unit 240A, 240B or the physical computer / virtualized environment configuration control management unit 220A, 220B. When the failure location information is received (step S803), it is determined whether the failure location is an active system (step S804).

障害発生箇所が待機系であった場合、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、その障害発生箇所である物理計算機あるいは仮想化環境が停止の状態へ遷移することを物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂに通知し（ステップＳ８０５）、ステップＳ８０２に戻る。 If the failure location is a standby system, the physical computer / virtualized environment restart / stop units 230A and 230B notify the physical computer / virtualized environment configuration control management units 220A and 220B that the failed physical computer or virtual environment transitions to the stopped state (step S805), and return to step S802.

一方、障害発生箇所が稼働系であった場合、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、その障害発生箇所が物理計算機であるか否か判断する（ステップＳ８０６）。 On the other hand, if the failure location is an active system, the physical computer / virtualized environment restart / stop units 230A and 230B determine whether the failure occurrence location is a physical computer (step S806).

障害発生箇所が物理計算機であれば、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、物理計算機用ＯＳ２１０Ａまたは２１０ＢへのＮＭＩを指示するための再起動電文を、障害発生箇所となっている物理計算機の通信インタフェース１３０または１３１へ送信する（ステップＳ８０７）。この電文を受信した物理計算機の物理計算機用ＯＳ２１０Ａまたは２１０Ｂは、ＮＭＩを契機としてメモリダンプが実行され、所定のメモリ領域のデータがディスク１４０のメモリダンプ領域１４２に退避される。 If the failure location is a physical computer, the physical computer / virtualized environment restart / stop units 230A and 230B send a restart message, which instructs an NMI to the physical computer OS 210A or 210B, to the communication interface 130 or 131 of the failed physical computer (step S807). In the physical computer that receives this message, the NMI triggers a memory dump in the physical computer OS 210A or 210B, and the data in a predetermined memory area is saved to the memory dump area 142 of the disk 140.

続いて、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、規定の時間だけ待ち合わせをした後、障害が発生した物理計算機１００Ａ、１００Ｂへリセット要求の電文を送信する（ステップＳ８０８）。 Subsequently, the physical computer / virtualized environment restart / stop units 230A and 230B wait for a specified time, and then transmit a reset request message to the physical computers 100A and 100B in which the failure has occurred (step S808).

一方、ステップＳ８０６における障害発生箇所が仮想化環境であれば、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、通信インタフェース１３０または１３１およびネットワークを介して、障害発生箇所となっている仮想化環境のある物理計算機の物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂへ、障害発生箇所となっている仮想化環境の再起動を指示する再起動電文を送信する（ステップＳ８０９）。 On the other hand, if the failure location in step S806 is a virtual environment, the physical computer / virtualized environment restart / stop units 230A and 230B send, via the communication interface 130 or 131 and the network, a restart message instructing a restart of the failed virtual environment to the physical computer / virtualized environment restart / stop unit 230A or 230B of the physical computer that hosts it (step S809).

続いて、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、規定時間だけ待ち合わせをした後、障害発生箇所となっている仮想化環境のある物理計算機の物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂへ、その仮想化環境の停止を指示する停止電文を送信する（ステップＳ８１０）。 Subsequently, the physical computer / virtual environment restart / stop units 230A and 230B wait for a specified time, and then restart the physical computer / virtual environment of the physical computer with the virtual environment that is the failure location. / A stop message for instructing stop of the virtual environment is transmitted to the stop units 230A and 230B (step S810).

ステップＳ８０８またはステップＳ８１０の後、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、障害が発生した稼動系に対応する待機系の環境を稼動系へ状態遷移させる旨を物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂに通知する（ステップＳ８１１）。 After step S808 or step S810, the physical computer / virtualized environment restart / stop units 230A and 230B indicate that the standby system environment corresponding to the active system where the failure has occurred changes state to the active system. The management environment configuration control management unit 220A, 220B is notified (step S811).
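The branching of steps S804 to S811 can be summarized as a small decision function. In this Python sketch the function name `plan_recovery`, the action tuples, and their wording are illustrative assumptions; the function only lists the actions in order and does not send anything.

```python
def plan_recovery(kind, state, wait_seconds):
    """Return the ordered actions for one failure report (steps S804-S811).

    kind is "physical" or "virtual"; state is "active" or "standby".
    """
    if state != "active":                                            # S804
        return [("notify", "failed location -> stopped")]            # S805
    if kind == "physical":                                           # S806
        actions = [
            ("send", "restart message (NMI) to physical computer"),  # S807
            ("wait", wait_seconds),
            ("send", "reset request to physical computer"),          # S808
        ]
    else:
        actions = [
            ("send", "restart message for virtual environment"),     # S809
            ("wait", wait_seconds),
            ("send", "stop message for virtual environment"),        # S810
        ]
    actions.append(("notify", "standby partner -> active"))          # S811
    return actions
```

For example, `plan_recovery("virtual", "standby", 5)` yields only the stopped-state notification, since a failed standby system needs no failover.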

図１０は、本実施例における物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂが再起動／停止の通知を受信したときの処理フローＳ９０１を示すフローチャートである。 FIG. 10 is a flowchart showing the processing flow S901 when the physical computer / virtualized environment restart / stop units 230A and 230B receive the restart / stop notification in this embodiment.

図１０を参照すると、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、二重化における他系の物理計算機の物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂから通知電文を受信すると（ステップＳ９０２）、その電文よりどの仮想化環境に障害が発生しているか判断し、その仮想化環境の仮想化環境用ＯＳへＮＭＩを通知してメモリダンプを開始させる（ステップＳ９０３）。次に、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、規定の時間だけ待ち合わせをした後、障害が発生している仮想化環境へリセット通知を発行する（ステップＳ９０４）。 Referring to FIG. 10, when the physical computer / virtualized environment restart / stop units 230A, 230B receive a notification message from the physical computer / virtualized environment restart / stop units 230A, 230B of the other physical computer in the duplex mode. (Step S902) From the message, it is determined which virtualization environment has a failure, NMI is notified to the virtualization environment OS of the virtualization environment, and a memory dump is started (Step S903). Next, the physical computer / virtualized environment restart / stop units 230A and 230B wait for a specified time, and then issue a reset notification to the virtualized environment in which the failure has occurred (step S904).

更に、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂは、障害が発生している仮想化環境に対応する他系の環境を稼動系へ遷移させるため、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂに状態変更を通知する（ステップＳ９０５）。 Further, the physical computer / virtualized environment restart / stop units 230A and 230B control the physical computer / virtualized environment configuration in order to transition the other environment corresponding to the virtualized environment in which the failure has occurred to the active system. The management unit 220A, 220B is notified of the state change (step S905).
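The receiving side (steps S902 to S905) performs its actions in a fixed order, which the following Python sketch makes explicit. The message key `"virtual_environment"` and the callables `raise_nmi`, `reset`, and `notify_state_change` are hypothetical stand-ins for hypervisor and management interfaces that the description does not name.

```python
import time


def handle_restart_message(message, raise_nmi, reset, notify_state_change,
                           wait_seconds, sleep=time.sleep):
    """Steps S902-S905 on the computer hosting the failed virtual environment."""
    target = message["virtual_environment"]   # S902: which environment failed
    raise_nmi(target)                         # S903: the NMI starts the memory dump
    sleep(wait_seconds)                       # wait for the specified time
    reset(target)                             # S904: reset the failed environment
    notify_state_change(target)               # S905: request promotion of the partner
```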

図１１は、本実施例における物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂが物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂから通知を受信したときに実行する処理フローＳ１００１を示すフローチャートである。本実施例の物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂは、図１に示した上記実施形態の不整合箇所特定部１５Ａ、１５Ｂに対応する。 FIG. 11 shows a processing flow S1001 executed when the physical computer / virtualized environment configuration control management unit 220A, 220B receives a notification from the physical computer / virtualized environment restart / stop unit 230A, 230B in this embodiment. It is a flowchart. The physical computer / virtualized environment configuration control management units 220A and 220B of the present example correspond to the inconsistent location specifying units 15A and 15B of the above-described embodiment shown in FIG.

図１１を参照すると、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂは、物理計算機・仮想化環境再起動／停止部２３０Ａ、２３０Ｂから状態変更の通知を受信すると（Ｓ１００２）、図４に示した物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂにおける状態Ｔ００２またはＴ００８を更新し、更に待機系から稼動系への状態遷移の場合には時刻Ｔ００３またはＴ００９を更新する（ステップＳ１００３）。 Referring to FIG. 11, when the physical computer / virtualized environment configuration control management unit 220A, 220B receives a status change notification from the physical computer / virtualized environment restart / stop unit 230A, 230B (S1002), FIG. The state T002 or T008 in the physical computer / virtualized environment configuration control management table 261A, 261B shown is updated, and in the case of a state transition from the standby system to the active system, the time T003 or T009 is updated (step S1003).
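The table update of step S1003 can be sketched as below. The dictionary entry and the names `apply_state_change`, `"state"`, and `"active_since"` are assumptions for illustration; the point shown is that the start time changes only on a standby-to-active transition.

```python
def apply_state_change(entry, new_state, now):
    """Step S1003: update the state, and the start time only on promotion."""
    promoted = entry["state"] == "standby" and new_state == "active"
    entry["state"] = new_state          # T002 or T008
    if promoted:
        entry["active_since"] = now     # T003 or T009
    return entry
```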

図１２は、本実施例における物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂが物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂから通知を受信したときに実行する処理の処理フローＳ１１０１を示すフローチャートである。 FIG. 12 illustrates a processing flow S1101 of processing executed when the physical computer / virtualized environment configuration control management unit 220A, 220B receives a notification from the physical computer / virtualized environment communication path monitoring unit 250A, 250B in this embodiment. It is a flowchart to show.

図１２を参照すると、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂは、物理計算機・仮想化環境通信経路監視部２５０Ａ、２５０Ｂから、物理計算機または仮想化環境の二重化における両系の状態に不整合がある旨の通知を受信すると（ステップＳ１１０２）、他系の物理計算機または仮想化環境が稼動系へ遷移した時刻（稼働系開始時刻）の情報を取得する（ステップＳ１１０３）。 Referring to FIG. 12, the physical computer / virtualized environment configuration control management unit 220A, 220B changes from the physical computer / virtualized environment communication path monitoring unit 250A, 250B to the state of both systems in the duplication of the physical computer or virtualized environment. When a notification indicating that there is a mismatch is received (step S1102), information on the time when the physical computer or the virtual environment of the other system transitions to the active system (operating system start time) is acquired (step S1103).

続いて、物理計算機・仮想化環境構成制御管理部２２０Ａ、２２０Ｂは、自計算機内の物理計算機・仮想化環境構成制御管理テーブル２６１Ａ、２６１Ｂにおける自系の稼動系開始時刻と、ステップＳ１１０３にて取得した相手系の稼動系開始時刻を比較する（ステップＳ１１０４）。 Subsequently, the physical computer / virtualized environment configuration control management unit 220A, 220B acquires the active system start time of the own system in the physical computer / virtualized environment configuration control management table 261A, 261B in the own computer in step S1103. The operating system start times of the partner systems that have been compared are compared (step S1104).

If the environment of the partner system has the older active system start time, the physical computer / virtualized environment configuration control management units 220A and 220B determine that the environment of the partner system is faulty and notify the physical computer / virtualized environment restart/stop units 230A and 230B to that effect (step S1105).
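The comparison in steps S1104 and S1105 can be sketched as follows. This is only an illustration: the function name and the example timestamps are hypothetical, and the handling of equal start times is an assumption, since the text does not address that case.

```python
def partner_is_faulty(own_active_start: float, partner_active_start: float) -> bool:
    """Return True when the partner system became active earlier (S1104, S1105).

    Treating equal start times as "not faulty" is an assumption of this sketch.
    """
    return partner_active_start < own_active_start


# Both systems run the same check once the inconsistency is reported.
a_start, b_start = 100.0, 250.0   # hypothetical active system start times

a_blames_b = partner_is_faulty(a_start, b_start)  # evaluated on system A
b_blames_a = partner_is_faulty(b_start, a_start)  # evaluated on system B
```

Because each side applies the same rule to the same pair of timestamps, only one side (here system B) flags its partner, so the two systems do not try to stop each other at the same time.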

According to this embodiment, whether a failure occurs in a physical computer or in a virtualized environment, a restart/stop message can be sent to the location where the failure occurred, so the failed location can be isolated reliably and quickly. As a result, even if survival monitoring messages are temporarily interrupted, for example by an intermittent failure of the communication path, and the physical computers or virtualized environments of both systems attempt to operate as the active system once the communication path recovers, the inconsistency can be resolved at that point. In addition, when the failed location is isolated, a memory dump that saves the memory state at that moment to disk is started. Recording the in-memory information at the time of the failure also makes failure analysis easier.

To summarize this embodiment, the configuration for performing appropriate control when a failure occurs is as follows.

(1) Configuration control application (270A, 270B)

This application is placed on the physical computer and the virtualized environment, and exchanges survival monitoring messages between the configuration control applications of the active system and the standby system.

(2) Physical computer / virtualized environment configuration control management table (261A, 261B)

A management table for uniquely identifying the failure location information of the physical computer and the virtualized environment, the transmission/reception history of survival monitoring messages, and whether each physical computer and virtualized environment is in the active or standby state. It can be referenced from both the physical computer and the virtualized environment.

(3) Physical computer / virtualized environment communication path monitoring unit (250A, 250B)

Monitors the transmission and reception of the survival monitoring messages by which the active and standby systems of the physical computer and the virtualized environment confirm each other's survival, and determines whether the combination of active and standby states is inconsistent.
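The two checks performed by the communication path monitoring unit can be sketched in Python as below. The function names and the string state labels are hypothetical. The timeout rule follows the claims (elapsed time since the last received message exceeds the timeout time), and "inconsistent" is taken to mean that both systems regard themselves as the active system, which is the only inconsistency the claims define.

```python
def timed_out(last_received: float, now: float, timeout: float) -> bool:
    """True when no survival monitoring message arrived within the timeout time."""
    return now - last_received > timeout


def states_inconsistent(own_state: str, partner_state: str) -> bool:
    """True when both systems regard themselves as the active system."""
    return own_state == "active" and partner_state == "active"
```

For example, with a timeout time of 3 seconds, a last reception at t = 10 and a current time of t = 14 counts as a timeout, while a current time of t = 13 does not.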

(4) Physical computer / virtualized environment fault location identification unit (240A, 240B)

Identifies where in the physical computer and the virtualized environment a failure has occurred.

(5) Physical computer / virtualized environment restart/stop unit (230A, 230B)

When a failure occurs, sends a restart request message to the failed computer based on the information from the physical computer / virtualized environment fault location identification units 240A and 240B.

If the failure is in a physical computer, the restart request is sent to that physical computer and an NMI (non-maskable interrupt) is raised for the OS of the physical computer. The NMI starts a memory dump that saves the memory state at the time of the failure to disk, so that information from the time of the failure can be recorded.

If no memory dump is executed in response to the NMI, a stop request message is additionally sent to the physical computer of the failed system. On receiving the stop request message, the failed physical computer immediately enters the stopped state and is prevented from starting up again. If the memory dump is executed, the reboot or stop can be performed after the memory dump has finished.

If the failure is in a virtualized environment, a restart request message is sent to the virtualized environment restart/stop units 230A and 230B residing on the physical computer on which the failure occurred.

On receiving the restart request message, the physical computer / virtualized environment restart/stop units 230A and 230B on that physical computer raise an NMI (non-maskable interrupt) for the OS in the virtualized environment that the physical computer / virtualized environment fault location identification units 240A and 240B determined to be the failure location. Raising the NMI in the OS of the virtualized environment starts a memory dump that saves the memory state at the time of the failure to disk, so that information from the time of the failure can be recorded.

If no memory dump is executed in response to the NMI raised in the virtualized environment, a stop request message is sent to the failed virtualized environment itself. On receiving the stop request message, the virtualized environment is immediately put into the stopped state and its current state is frozen. If the memory dump is executed, the reboot or stop can be performed after the memory dump has finished.
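The escalation described for the restart/stop unit (raise an NMI, and fall back to a stop request if no memory dump is executed) can be sketched as follows. The three callables are hypothetical hooks supplied by the caller. In particular, how the unit detects that a memory dump was not executed is not specified in the text, so `dump_started` is purely an assumption of this sketch.

```python
from typing import Callable, List


def isolate_faulty_target(send_nmi: Callable[[], None],
                          dump_started: Callable[[], bool],
                          send_stop: Callable[[], None]) -> List[str]:
    """Raise an NMI, then send a stop request only if no memory dump is executed."""
    actions = []
    send_nmi()                      # ask the target OS to start a memory dump
    actions.append("nmi")
    if dump_started():
        # The dump is running: reboot or stop once it has finished.
        actions.append("dump_then_reboot_or_stop")
    else:
        send_stop()                 # fall back to an immediate stop request
        actions.append("stop")
    return actions
```

The same flow applies to both targets in this sketch: for a physical computer the hooks would address the physical computer's OS, and for a virtualized environment they would address the OS inside that environment.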

(6) Physical computer / virtualized environment configuration control management unit (220A, 220B)

When a failure occurs in the physical computer or virtualized environment of the other system and a restart/stop notification is sent from the physical computer / virtualized environment restart/stop unit 230A to the failure location, this unit updates the physical computer / virtualized environment configuration control management tables 261A and 261B based on the failure location information from the physical computer / virtualized environment fault location identification units 240A and 240B.

Likewise, when a failure occurs in the physical computer or virtualized environment of the own system and a restart/stop notification addressed to the failure location is received from the physical computer / virtualized environment restart/stop units 230A and 230B of the other system, this unit updates the physical computer / virtualized environment configuration control management tables 261A and 261B based on the failure location information from the physical computer / virtualized environment fault location identification units 240A and 240B.

In addition, when a restart/stop notification has been sent from the physical computer / virtualized environment restart/stop units 230A and 230B to a failure location because of, for example, an intermittent failure of the communication path, and communication is then restored, the physical computer / virtualized environment configuration control management tables 261A and 261B are updated based on the information from the physical computer / virtualized environment communication path monitoring units 250A and 250B. Based on the contents of the updated tables, the physical computer / virtualized environment configuration control management units 220A and 220B are instructed to issue a restart/stop command to the partner system.

In determining this condition, once communication has been restored, it is decided which physical computer or virtualized environment the physical computer / virtualized environment restart/stop units 230A and 230B should restart or stop, based on the active and standby system information in the physical computer / virtualized environment configuration control management table 261A and on the time at which the partner system became the active system.

The embodiments of the present invention described above are examples for explaining the present invention and are not intended to limit the scope of the present invention to those embodiments. Those skilled in the art can implement the present invention in various other modes without departing from its gist.

Although embodiments and examples of the present invention have been described above, the present invention is not limited to them. Within the scope of the technical idea of the present invention, these embodiments and examples may be used in combination, or part of their configuration may be changed.

100A, B ... physical computer, 10A, B ... physical computer, 110 ... processor, 11AA, AB, BA, BB ... virtualized environment, 120 ... disk interface, 12A, AA, AB, B, BA, BB ... survival monitoring unit, 130 ... communication interface, 131 ... communication interface, 13A, B ... inconsistency determination unit, 140 ... disk, 141 ... management information, 142 ... memory dump area, 14A, B ... fault location identification unit, 150 ... network, 15A, B ... inconsistent location identification unit, 160 ... network, 16A, B ... state control unit, 200A, B ... memory, 210A, B ... OS for physical computer, 220A, B ... physical computer / virtualized environment configuration control management unit, 230A, B ... physical computer / virtualized environment restart/stop unit, 240A, B ... physical computer / virtualized environment fault location identification unit, 250A, B ... physical computer / virtualized environment communication path monitoring unit, 260A, B ... memory partition shared by physical computer and virtualized environment, 261A, B ... physical computer / virtualized environment configuration control management table, 270A, B ... configuration control application, 281AA, AB, BA, BB ... logical partition for virtualized environment, 282AA, AB, BA, BB ... virtualized environment, 283AA, AB, BA, BB ... configuration control application, 284AA, AB, BA, BB ... OS for virtualized environment

Claims

A computer control device for a system in which a virtualized environment is constructed on a physical computer and the physical computer and the virtualized environment are each duplexed, the computer control device comprising:
survival monitoring means, arranged in each of the physical computer and the virtualized environment, by which the two corresponding systems in each duplexed pair of the physical computer and the virtualized environment transmit and receive survival monitoring messages to and from each other;
inconsistency determining means for determining, in the duplexing of the physical computer and the virtualized environment, whether an inconsistency has occurred in which both corresponding systems recognize their own system as the active system, the inconsistency arising when an active system that has been determined to be faulty remains the active system during system switching;
failure location specifying means for determining that the physical computer or virtualized environment of the system corresponding to a system that has timed out in the survival monitoring means, without receiving the survival monitoring message within a predetermined timeout time, is faulty;
inconsistent location specifying means for determining, when the inconsistency determining means finds that the inconsistency has occurred, that the physical computer or virtualized environment of the system whose operating time as the active system is older is faulty; and
state control means for, if the system determined to be faulty by the failure location specifying means or the inconsistent location specifying means is an active system, putting the physical computer or virtualized environment of that active system into a state in which it does not operate as the active system by a predetermined protection process and transitioning the corresponding standby physical computer or virtualized environment to the active system, and, if the system determined to be faulty is a standby system, stopping the physical computer or virtualized environment of that standby system,
wherein, if the system determined to be faulty is an active system and the location where the failure occurred is a virtualized environment, the state control means puts the virtualized environment into a state in which it does not operate as the active system by instructing the physical computer hosting the virtualized environment to stop the virtualized environment.

The computer control device according to claim 1, wherein the protection process includes, if the active virtualized environment is faulty, saving a predetermined memory area of the virtualized environment to a memory area outside the virtualized environment and resetting the virtualized environment.

The computer control device according to claim 1, wherein the inconsistent location specifying means records, as an active system start time, the time at which each of the physical computer and the virtualized environment started operating as the active system, and, if an inconsistency occurs between the two corresponding systems, determines the system to be treated as faulty by referring to the active system start times of the systems.

The computer control device according to any one of claims 1 to 3, wherein the failure location specifying means has the timeout time predetermined for each of the physical computer and the virtualized environment, records a last reception time that is the time at which the survival monitoring message was last received, and determines that a timeout has occurred when the time elapsed from the last reception time to the current time exceeds the timeout time.

A computer control method for controlling a system in which a virtualized environment is constructed on a physical computer and the physical computer and the virtualized environment are each duplexed, the method comprising:
a first step in which the two corresponding systems in the duplexing of the physical computer and the virtualized environment transmit and receive survival monitoring messages to and from each other, and the physical computer or virtualized environment of the system corresponding to a system that has timed out without receiving the survival monitoring message within a predetermined time is determined to be faulty;
a second step of determining, in the duplexing of the physical computer and the virtualized environment, whether an inconsistency has occurred in which both corresponding systems recognize their own system as the active system, the inconsistency arising when an active system that has been determined to be faulty remains the active system during system switching, and, if the inconsistency has occurred, determining that the physical computer or virtualized environment of the system whose operating time as the active system is older is faulty;
a third step of, if the system determined to be faulty is an active system, putting the physical computer or virtualized environment of that active system into a state in which it does not operate as the active system by a predetermined protection process, and transitioning the corresponding standby physical computer or virtualized environment to the active system; and
a fourth step of, if the system determined to be faulty is a standby system, stopping the physical computer or virtualized environment of that standby system,
wherein, in the third step, if the system determined to be faulty is an active system and the location where the failure occurred is a virtualized environment, the virtualized environment is put into a state in which it does not operate as the active system by instructing the physical computer hosting the virtualized environment to stop the virtualized environment.

A computer control program for controlling a system in which a virtualized environment is constructed on a physical computer and the physical computer and the virtualized environment are each duplexed, the program causing a computer to execute:
a first procedure in which the two corresponding systems in the duplexing of the physical computer and the virtualized environment transmit and receive survival monitoring messages to and from each other, and the physical computer or virtualized environment of the system corresponding to a system that has timed out without receiving the survival monitoring message within a predetermined time is determined to be faulty;
a second procedure of determining, in the duplexing of the physical computer and the virtualized environment, whether an inconsistency has occurred in which both corresponding systems recognize their own system as the active system, the inconsistency arising when an active system that has been determined to be faulty remains the active system during system switching, and, if the inconsistency has occurred, determining that the physical computer or virtualized environment of the system whose operating time as the active system is older is faulty;
a third procedure of, if the system determined to be faulty is an active system, putting the physical computer or virtualized environment of that active system into a state in which it does not operate as the active system by a predetermined protection process, and transitioning the corresponding standby physical computer or virtualized environment to the active system; and
a fourth procedure of, if the system determined to be faulty is a standby system, stopping the physical computer or virtualized environment of that standby system,
wherein, in the third procedure, if the system determined to be faulty is an active system and the location where the failure occurred is a virtualized environment, the virtualized environment is put into a state in which it does not operate as the active system by instructing the physical computer hosting the virtualized environment to stop the virtualized environment.