JP4999670B2 - Computer equipment

Info

Publication number: JP4999670B2
Application number: JP2007328215A
Authority: JP
Inventors: 毅樋口
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-12-20
Filing date: 2007-12-20
Publication date: 2012-08-15
Anticipated expiration: 2027-12-20
Also published as: JP2009151509A

Description

The present invention relates to a failure monitoring technique for computers, and more particularly to a failure monitoring technique for virtual machines.

In a failure monitoring system for computers, physical computers and OSs (Operating Systems) correspond one to one, so a failure monitoring agent deployed on each OS can notify the failure monitoring tool.
In a virtual machine environment, on the other hand, if each guest OS deployed on a virtual machine issues its own failure notifications, problems arise, such as a hardware failure of the physical computer being reported by every guest OS.
For this reason, Patent Document 1 discloses a technique in which a specific integrated management agent performs aggregation so that the notifications from a single physical computer can be consolidated.
In addition, as described in Patent Document 2, tests for hardware failures and unexpected behavior in computers have changed the actual hardware response by means of a device or software so as to generate hardware failures in a pseudo manner, notified the business program based on the result, and thereby made it possible to confirm the switchover to failure-time operation.
JP 2002-229806 A
JP 2002-351755 A

Because a virtual machine environment runs multiple OSs virtually and independently, physical computer environments that were previously deployed at multiple sites can be deployed on a single physical computer. As a result, each guest OS deployed in the virtual machine environment may be connected to a separate network.
Consequently, communication does not necessarily take place between guest OSs deployed on the same physical computer or between a guest OS and the host OS, so a communication path on the network may exist neither physically nor virtually.

Furthermore, guest OSs may be added, and each time one is added, the information for communicating with the host OS, which aggregates failure information, has to be configured.

Furthermore, when a conventional computer system test method is executed on virtual machines, failure injection instructions and the collection of failure detection results have had to be carried out separately on each network.
In a virtual machine environment, however, a host OS that receives a failure from the hardware may process the data on the host OS itself, or may emulate it and notify the guest OS so that the failure is detected on the guest OS. There is thus the problem that it is not known where and by what means a failure will actually be detected.
When a hardware failure occurs, depending on the type and location of the error, some failures are not detected on the host OS and can be recognized as failures only because they are reported as an error in response to a request from a guest OS, while others are detected on the host OS and recognized as failures there. It is therefore not known by what means a failure will be detected. For example, when a disk fails and stops completely, the guest OS also stops, so the failure is detected by the host. On the other hand, when a failure occurs in part of the disk area and an access to that area results in a recognizable error, the failure is detected by the guest OS. Thus hardware failures are detected by the host OS in some cases and by the guest OS in others.
This applies not only to the detection of pseudo failures in tests but also to the detection of actual failures.

In addition, which I/O (Input/Output) is processed on the host OS and which I/O is emulated depends on the virtualization implementation, so it is difficult to handle each case individually.
Moreover, because the information that identifies a guest OS is held internally by the virtual machine management mechanism (for example, a domain ID (Identification Data)), identification information such as a host name or an IP (Internet Protocol) address cannot be used to identify the guest OS.

One of the main objects of the present invention is to solve the above problems, and one object is to provide a failure management method in which the host OS aggregates information on failures occurring in the guest OSs.

Another object is to provide a test system that can generate a virtual failure on a specific guest OS based on a user instruction.

A computer apparatus according to the present invention is a computer apparatus that is equipped with a virtual machine management mechanism for realizing virtual machines, in which a host OS and one or more guest OSs operate on the virtual machine management mechanism, and that has a storage area allocated to each guest OS, the computer apparatus comprising:
a first failure monitoring control unit in which the communication address of the host OS is set, which transmits communication address information notifying the communication address of the host OS as the destination address of failure information that reports a failure detected in each guest OS, and which receives the failure information reporting the failure detected in each guest OS; and
one or more second failure monitoring control units, each of which receives the communication address information from the first failure monitoring control unit, stores the communication address of the host OS indicated in the received communication address information in the storage area allocated to the corresponding guest OS, generates, when a failure is detected in the corresponding guest OS, failure information reporting the detected failure, and transmits the generated failure information to the first failure monitoring control unit using the communication address of the host OS stored in the storage area as the destination address.

According to the present invention, the first failure monitoring control unit transmits information on the communication address of the host OS to the second failure monitoring control unit corresponding to each guest OS, and each second failure monitoring control unit generates failure information reporting a failure and transmits it to the first failure monitoring control unit based on the communication address of the host OS. The destinations of failure information can therefore be consolidated at the first failure monitoring control unit.

Embodiment 1.
This embodiment describes a failure management method in which the network configuration is checked and each guest OS is notified of information on the host OS, which is the notification destination, so that the host OS can aggregate information on failures from the guest OSs. By aggregating this information, the method makes it possible to determine where and by what means a failure was detected, whether the failure is one detected on the host OS or on a guest OS.

FIG. 1 is a configuration diagram illustrating the failure management method in a virtual machine environment according to this embodiment.
As shown in the figure, this embodiment includes a failure monitoring apparatus 1 and physical computers 2-1 to 2-n. The description assumes a virtual computer system in which the failure monitoring apparatus 1 and the host OSs of the physical computers 2-1 to 2-n are connected via a communication line 3, and the guest OSs are connected via a communication line 4.
The physical computers 2-1 to 2-n are examples of the computer apparatus.

The failure monitoring apparatus 1 is a terminal device on which a failure monitoring tool is installed, either a commercially available tool deployed in each system or one developed in-house for the system.

The physical computers 2-1 to 2-n are computers equipped with a virtual machine management mechanism 20, on which a host OS 22 and one or more guest OSs operate.
The host OS 22 performs tasks such as emulating the I/O of each guest OS.
The guest OS 23 runs application programs and the like.
The host OS 22 and the guest OS 23 are connected by a virtual network 21.
The virtual machine management mechanism 20 is also called a virtual machine monitor.

The communication lines 3 and 4 are networks such as an intranet or a LAN (Local Area Network). They are mutually independent networks and cannot communicate with each other.

The host OS 22 hosts a host failure monitoring control agent 221, which monitors events caused by the occurrence of a failure and acquires information about them.
The guest OS 23 hosts a guest failure monitoring control agent 231, which likewise monitors events caused by the occurrence of a failure and acquires information about them.

The IP address (communication address) of the host OS 22 is set in the host failure monitoring control agent 221.
The host failure monitoring control agent 221 transmits, to each guest OS 23 operating on the same virtual machine management mechanism 20, information on the IP address of the host OS 22 as the destination address of failure information reporting failures detected in that guest OS, and receives failure information from each guest OS 23 operating on the same virtual machine management mechanism 20.
The host failure monitoring control agent 221 is an example of the first failure monitoring control unit.

The guest failure monitoring control agent 231 receives the information on the IP address of the host OS 22 from the host failure monitoring control agent 221, generates, when it detects a failure, failure information reporting the detected failure, and transmits the generated failure information to the host failure monitoring control agent 221 using the IP address of the host OS 22 as the destination address.
The physical computer 2 also has a physical memory storage area (not shown) allocated to each guest OS. When the guest failure monitoring control agent 231 receives the information on the IP address of the host OS, it stores the IP address of the host OS 22 in the memory storage area allocated to itself (its guest OS). The guest failure monitoring control agent 231 recognizes the emulated virtual memory (a file) as its own physical memory. By performing the process of storing the IP address of the host OS in the virtual memory storage area allocated to itself, it can store the IP address of the host OS 22, via the host OS 22 and the virtual machine management mechanism 20, in the physical memory storage area allocated to itself.
Likewise, when transmitting failure information, it performs the process of reading the IP address of the host OS from the virtual memory storage area allocated to itself, and can thereby read the IP address of the host OS 22, via the host OS 22 and the virtual machine management mechanism 20, from its own physical memory storage area.
The guest failure monitoring control agent 231 is an example of the second failure monitoring control unit.

With this mechanism, each physical computer 2 consolidates the destinations of the failure information sent by the guest failure monitoring control agents 231 at the host failure monitoring control agent 221.
The host failure monitoring control agent 221 also adds the identification information of the host OS 22 to the failure information received from each guest failure monitoring control agent 231, and transmits the failure information with the identification information of the host OS 22 added to the failure monitoring apparatus 1.

In the host failure monitoring control agent 221, reference numeral 2211 denotes an IP notification unit that searches for the IP address of the guest OS 23 and notifies the guest OS 23 of the IP address information of the host OS 22.
Reference numeral 2212 denotes an information reception unit that receives failure occurrence information from the failure detection unit 2213 or the guest failure monitoring control agent 231.
Reference numeral 2213 denotes a failure detection unit that monitors behavior arising from failures that have occurred in the host OS 22 or the guest OS 23 and detects the occurrence of failures.
Reference numeral 2214 denotes a failure report unit that converts guest OS failure information from the information reception unit 2212 and host OS failure information from the failure detection unit 2213 into information that the failure monitoring apparatus 1 can analyze, and reports it.

In the guest failure monitoring control agent 231, reference numeral 2311 denotes an IP reception unit that receives the IP address information of the host OS 22 and, based on that information, enables failure information to be reported to the host OS 22.
Reference numeral 2312 denotes an information reception unit that receives failure occurrence information from the failure detection unit 2313.
Reference numeral 2313 denotes a failure detection unit that monitors behavior arising from failures that have occurred in the host OS 22 or the guest OS 23 and detects the occurrence of failures.
Reference numeral 2314 denotes a failure report unit that notifies the host failure monitoring control agent 221 of the failure information collected by the failure detection unit 2313.

FIGS. 2 and 3 are flowcharts showing the processing operations of the failure management method in the virtual machine environment according to Embodiment 1.
First, the operation of the host failure monitoring control agent 221 on the host OS 22 is described with reference to FIG. 2.

When the host failure monitoring control agent 221 starts, the IP notification unit 2211 acquires the MAC (Media Access Control) address of the guest OS 23 from the information on the guest OS 23 provided by the virtual machine management mechanism 20. It then generates a list of candidate addresses from the arp (Address Resolution Protocol) command or the like and from the network I/Fs (Interfaces) of the host OS 22, performs network access using an ICMP (Internet Control Message Protocol) command such as ping to confirm that the nodes are reachable, and acquires the correspondence between MAC addresses and IP addresses (ST101).
In other words, the host failure monitoring control agent 221 acquires the IP address of each guest OS 23 based on the association, provided by the virtual machine management mechanism 20, between the domain ID of each guest OS 23 and its MAC address, and uses the acquired IP address of each guest OS 23 to send it the information on the IP address of the host OS 22.
For example, when the virtual machine management mechanism 20 is built on Xen (registered trademark), the IP notification unit 2211 derives the address by matching the MAC address information obtained from the xm command provided by Xen against the correspondence between IP addresses and MAC addresses (the output of the arp command).
In the case of Xen, the virtual machine management mechanism 20 manages the host OS and the guest OSs using the concept of domains, and identification information such as a domain name or domain ID is needed to identify a specific guest OS. This information is managed inside the virtual machine management mechanism 20.
FIG. 16(a) shows an example of the output of the xm command, and FIG. 16(b) shows an example of the output of the arp command.
In the example of FIG. 16(a), "mac 00:16:3e:31:4c:2f" is shown as the MAC address of the guest OS with domain ID 1 (domid 1). In the example of FIG. 16(b), the third line shows the MAC address "00:16:3E:31:4C:2F" and the IP address "192.168.1.100". From these, the IP notification unit 2211 determines that the IP address of the guest OS with domain ID 1 is "192.168.1.100".
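The matching step described for FIG. 16 can be sketched as follows. This is only an illustrative sketch, not the implementation described in the specification: the function names, the domain-ID-to-MAC dictionary, and the sample arp text are assumptions, and the sample lines merely imitate a typical Linux `arp -an` layout rather than reproducing FIG. 16(b). The comparison is case-insensitive because the two outputs write the same MAC address in different cases.

```python
import re

# Patterns for an IPv4 address and a colon-separated MAC address.
_IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
_MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})\b")


def parse_arp_table(arp_text):
    """Return {mac (lower case): ip} for every line that holds both values."""
    table = {}
    for line in arp_text.splitlines():
        ip = _IP_RE.search(line)
        mac = _MAC_RE.search(line)
        if ip and mac:
            table[mac.group(1).lower()] = ip.group(1)
    return table


def resolve_guest_ips(domain_macs, arp_text):
    """Map each domain ID to its IP address, or None when no entry matches.

    domain_macs is a hypothetical {domain_id: mac} mapping standing in for
    the information obtained from the virtual machine management mechanism.
    """
    table = parse_arp_table(arp_text)
    return {domid: table.get(mac.lower()) for domid, mac in domain_macs.items()}


# Hypothetical arp output; only the third line uses values from the text.
SAMPLE_ARP = """\
? (192.168.1.1) at 00:16:3E:00:00:01 [ether] on eth0
? (192.168.1.50) at 00:16:3E:12:34:56 [ether] on eth0
? (192.168.1.100) at 00:16:3E:31:4C:2F [ether] on eth0
"""

resolved = resolve_guest_ips(
    {1: "00:16:3e:31:4c:2f", 2: "00:16:3e:aa:bb:cc"}, SAMPLE_ARP
)
```

A `None` entry corresponds to the case in which no IP address could be obtained for a guest OS, which the flow treats as the absence of a usable network between the host OS and that guest OS.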

Next, the IP notification unit 2211 determines whether the corresponding IP address could be obtained from the correspondence between MAC addresses and IP addresses and the MAC address information of the guest OS provided by the virtual machine management mechanism 20 (ST102).
If the corresponding IP address was obtained, the process proceeds to ST103. If it was not obtained, the process proceeds to ST107.

If the corresponding IP address was obtained (YES in ST102), the IP notification unit 2211 notifies the guest OS 23 of the IP address of the host OS 22 using the acquired IP address information (ST103).
That is, the IP notification unit 2211 sends the guest OS 23 a packet that carries the acquired IP address of the guest OS 23 as its destination and reports the IP address of the host OS 22.

Next, the information reception unit 2212 waits to receive failure information from the failure report unit 2314 of the guest failure monitoring control agent 231 or failure information from the failure detection unit 2213 (ST104).
When the information is received, the failure report unit 2214 converts it into information that the failure monitoring tool of the failure monitoring apparatus 1 can analyze (ST105).
When the conversion is complete, the failure report unit 2214 transmits the failure information to the failure monitoring tool of the failure monitoring apparatus 1, and the process returns to ST104 to wait for the next failure notification (ST106).
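The receive, convert, and forward loop of ST104 to ST106 might look like the sketch below. It is offered under stated assumptions only: the specification does not define the record layout or the format expected by the failure monitoring tool, so the `FailureInfo` fields, the JSON encoding, the `host_id` key, and the function names are all hypothetical. The `send` callable stands in for whatever transport reaches the failure monitoring apparatus 1.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FailureInfo:
    source: str  # hypothetical: "host" or a guest identifier such as a domain ID
    detail: str  # free-text description of the detected failure


def to_monitor_record(info, host_id):
    """ST105: attach the host OS identifier and serialise the record.

    JSON is an assumed stand-in for "information the monitoring tool can
    analyze"; the real format depends on the tool in use.
    """
    record = asdict(info)
    record["host_id"] = host_id
    return json.dumps(record, sort_keys=True)


def run_host_agent(incoming, host_id, send):
    """ST104 to ST106: take each received item, convert it, and forward it.

    incoming is any iterable of FailureInfo, whether it originated from a
    guest agent or from the host's own failure detection.
    """
    for info in incoming:
        send(to_monitor_record(info, host_id))


# Usage with an in-memory list in place of the real transport.
forwarded = []
run_host_agent(
    [FailureInfo("domid-1", "disk read error"), FailureInfo("host", "disk stopped")],
    "host-22",
    forwarded.append,
)
```

Because every forwarded record carries the same host identifier, the monitoring side can attribute guest-detected and host-detected failures to one physical computer.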

If the corresponding IP address was not obtained (NO in ST102), it is determined that no network exists over which the host OS 22 and the guest OS 23 can communicate. The failure report unit 2214 notifies the operator of the failure monitoring apparatus 1 (by outputting a message stating that no network has been configured between the guest OS 23 and the host OS 22), and the process ends (ST107).

Next, the operation of the guest failure monitoring control agent 231 on the guest OS 23 is described with reference to FIG. 3.

When the guest failure monitoring control agent 231 starts, the IP reception unit 2311 waits to receive the IP address information of the host OS 22 from the host failure monitoring control agent 221 and, on receiving it, passes the information on the IP address of the host OS 22 to the failure report unit 2314 (ST201).
The failure report unit 2314 performs the process of storing the address in the virtual memory storage area allocated to itself (the guest OS 23), and thereby, through the mediation of the virtual machine management mechanism 20 and the host OS 22, stores the IP address of the host OS 22 in the physical memory storage area allocated to itself.

The information reception unit 2312 receives the failure information acquired by the failure detection unit 2313 (ST202).
When the failure information is received, the failure report unit 2314 notifies the host failure monitoring control agent 221 of the failure information (ST203).
At this point, the IP address of the host OS 22 received from the host failure monitoring control agent 221 is added to the failure information as the destination address, and the failure report unit 2314 transmits the failure information to the host failure monitoring control agent 221.
The failure report unit 2314 performs the process of reading the IP address of the host OS 22 from the virtual memory storage area allocated to itself (the guest OS 23), and thereby, through the mediation of the virtual machine management mechanism 20 and the host OS 22, reads the IP address of the host OS 22 from the physical memory storage area allocated to itself and adds it to the failure information.
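The guest-side flow of ST201 to ST203 can be illustrated with a small sketch. The class and method names are hypothetical, the transport is an injected callable rather than a real socket, and raising an error when a failure is reported before the address arrives is an assumption of this sketch; the flowchart simply waits for the address first.

```python
class GuestAgent:
    """Minimal stand-in for a guest failure monitoring control agent."""

    def __init__(self, send):
        self._send = send          # callable(destination, payload)
        self._host_address = None  # filled in by the address notification

    def on_address_notification(self, host_address):
        """ST201: remember the host OS address received from the host agent."""
        self._host_address = host_address

    def report_failure(self, detail):
        """ST202 and ST203: send the failure detail to the stored host address."""
        if self._host_address is None:
            # Assumed behaviour; no destination is known yet.
            raise RuntimeError("host OS address has not been received yet")
        self._send(self._host_address, detail)


# Usage with an in-memory list in place of the network.
delivered = []
agent = GuestAgent(lambda dest, payload: delivered.append((dest, payload)))

try:
    agent.report_failure("reported too early")
    rejected_before_address = False
except RuntimeError:
    rejected_before_address = True

agent.on_address_notification("192.168.1.1")  # hypothetical host OS address
agent.report_failure("I/O error on virtual disk")
```

The guest agent needs no preconfigured destination: the only address it ever uses is the one pushed to it by the host agent, which is what lets a newly added guest OS start reporting without manual setup.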

For example, consider the case shown in FIG. 17, in which various servers originally located at different sites have been consolidated.
In FIG. 17, server A and server B originally existed at site 1, and server A and server B also existed at site 2. These have been rebuilt as virtual machines: a guest OS equivalent to server A of site 1 and a guest OS equivalent to server A of site 2 are placed on physical computer 1, and a guest OS equivalent to server B of site 1 and a guest OS equivalent to server B of site 2 are placed on physical computer 2.
The host OS on physical computer 1 and the guest OS (site 1 server A) are connected by a virtual network. The host OS on physical computer 1 and the guest OS (site 2 server A) are also connected by a virtual network. However, because the guest OS (site 1 server A) and the guest OS (site 2 server A) on physical computer 1 originally belonged to different sites, these guest OSs are not connected to each other even though they reside on the same physical computer.
The same applies to physical computer 2.
On the other hand, the guest OS (site 1 server A) on physical computer 1 and the guest OS (site 1 server B) on physical computer 2 originally belonged to the same site, so they are connected by a virtual network. Similarly, the guest OS (site 2 server A) on physical computer 1 and the guest OS (site 2 server B) on physical computer 2 originally belonged to the same site, so they are connected by a virtual network.

In such a configuration, in this embodiment the host OS of physical computer 1 notifies the guest OS (site 1 server A) and the guest OS (site 2 server A) on the same physical computer 1 of its IP address, receives failure information from the guest OS (site 1 server A) and the guest OS (site 2 server A), aggregates the failure information, and reports it to the failure monitoring apparatus 1.
Similarly, the host OS of physical computer 2 notifies the guest OS (site 1 server B) and the guest OS (site 2 server B) on the same physical computer 2 of its IP address, receives failure information from the guest OS (site 1 server B) and the guest OS (site 2 server B), aggregates the failure information, and reports it to the failure monitoring apparatus 1.
The failure monitoring apparatus 1 can therefore collect the failure information of all guest OSs regardless of the configuration of the physical computers and of the virtual networks between the OSs.

As described above, according to Embodiment 1, the IP address information for communication is passed automatically at startup in accordance with the guest OS configuration, so information can be sent and received automatically even when a guest OS is added.
In addition, when the IP address information of a guest OS cannot be acquired on the host OS, it can be determined in advance that no directly connectable network has been configured, so failure monitoring can be carried out after the operator has reconfigured the network.
In addition, because failure information can be aggregated on the host OS, failure injection instructions and the collection of failure information can be managed with a single failure monitoring tool even when the host OS and the guest OSs, or the guest OSs themselves, are connected to independent networks in separate segments.
In addition, because failures are generated and monitored on the host OS and the guest OSs, failures can be monitored independently of how the virtual machine management mechanism is implemented.

This embodiment has described a failure management method in a virtual machine environment in which, in a physical computer equipped with a virtual machine environment,
a host failure monitoring control agent is provided on the host OS and a guest failure monitoring control agent is provided on the guest OS.
The host failure monitoring control agent includes:
an IP notification unit that acquires an IP address based on MAC address information obtained from the domain ID information of the guest OS, communicates using the acquired IP address, and notifies the guest OS of the IP address of the host OS;
an information receiving unit that receives failure occurrence information from the guest failure monitoring control agent;
a failure detection unit that monitors the operation of the host OS or the hardware and detects the occurrence of a failure; and
a failure reporting unit that notifies a failure monitoring tool or the like of the failure information obtained by the failure detection unit or the information receiving unit.
The guest failure monitoring control agent includes:
an IP receiving unit that acquires the IP address information of the host OS and makes it available for failure notification;
a failure detection unit that monitors the operation of the guest OS and detects the occurrence of a failure; and
a failure reporting unit that notifies the host failure monitoring control agent of the failure information obtained by the failure detection unit.
In the virtual machine environment, the failure notification destination of the guest failure monitoring control agent is set automatically to the host OS, so that only the host failure monitoring control agent notifies the failure monitoring tool or the like, and failure information is reported per physical computer.

This embodiment has also described a failure test system in a virtual machine environment in which
the host failure monitoring control agent includes an IP notification unit that acquires an IP address based on MAC address information obtained from the domain ID information of the guest OS, communicates using the acquired IP address, and notifies the guest OS of the IP address of the host OS, and
the guest failure monitoring control agent includes an IP receiving unit that acquires the IP address information of the host OS and makes it available for failure notification,
thereby enabling automatic configuration when a guest OS is added and detection of deficiencies in the network settings.
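The IP notification step summarized above can be illustrated with a minimal sketch. It assumes two lookup tables that the text does not specify: `domain_to_mac` (what the virtual machine management mechanism might report per domain ID) and `mac_to_ip` (for example, the host's neighbor table). All function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def resolve_guest_ip(domain_id: int,
                     domain_to_mac: Dict[int, str],
                     mac_to_ip: Dict[str, str]) -> Optional[str]:
    """Resolve domain ID -> MAC -> IP; return None when any step fails."""
    mac = domain_to_mac.get(domain_id)
    if mac is None:
        return None
    return mac_to_ip.get(mac)


def plan_ip_notifications(host_ip: str,
                          domain_to_mac: Dict[int, str],
                          mac_to_ip: Dict[str, str]
                          ) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Return (guest_ip, host_ip) pairs to send, plus unreachable domain IDs.

    An unreachable domain ID corresponds to the case in the text where the
    guest IP cannot be obtained, hinting at a network configuration problem.
    """
    notifications: List[Tuple[str, str]] = []
    unreachable: List[int] = []
    for domain_id in sorted(domain_to_mac):
        guest_ip = resolve_guest_ip(domain_id, domain_to_mac, mac_to_ip)
        if guest_ip is None:
            unreachable.append(domain_id)
        else:
            notifications.append((guest_ip, host_ip))
    return notifications, unreachable
```

In this sketch the unreachable list is what an operator would review before rebuilding the network configuration; how the notification itself is transmitted is left open.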

Embodiment 2.
Embodiment 2 describes a mode in which host OS information is added to failure information.
With reference to FIGS. 4, 5, and 6, the following describes the operation of adding host OS information to failure information sent from the guest failure monitoring control agent 231.
FIG. 4 is an operation flow of the host OS according to this embodiment.
FIG. 5 is an example of the failure information generated when the guest OS 23 detects a failure.
FIG. 6 is an example of the failure information reported from the host OS 22 to the failure monitoring control manager 5.

In FIG. 4, ST101 to ST104 and ST107 are the same as described in Embodiment 1.
The guest failure monitoring control agent 231 sends the host OS 22 a message reporting a detected failure, as in the example of FIG. 5. At this point the failure information contains only information that can be obtained on the guest OS 23; no information about the host OS 22 is included.
On receiving information, the information receiving unit 2212 of the host OS 22 determines whether it is failure information from the failure detection unit 2213 or failure information from the guest OS 23 (ST301).
If the failure information is from the failure detection unit 2213, processing proceeds to ST105. If it is from the guest OS, processing proceeds to ST302.

For failure information that came from the guest OS, the information receiving unit 2212 adds host OS identification information (the host name in this example), as in the example of FIG. 6 (ST302).
The subsequent ST105 and ST106 are the same as described in Embodiment 1.
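The branch at ST301 and the stamping at ST302 can be sketched as follows. This is only an illustration under assumed names: the `FailureInfo` fields and the `from_guest` flag are hypothetical, and the actual message layout is the one shown in FIGS. 5 and 6.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FailureInfo:
    source_id: str                  # OS that detected the failure
    source_host_id: Optional[str]   # host OS identification; unset on guest reports
    message: str


def on_receive(info: FailureInfo, from_guest: bool, host_name: str) -> FailureInfo:
    """ST301: branch on the sender. ST302: add host identification to guest reports."""
    if from_guest:
        # The guest never needs to know the host name; the host fills it in.
        return replace(info, source_host_id=host_name)
    # Reports from the host's own failure detection unit pass through unchanged.
    return info
```

Keeping the record immutable and returning a stamped copy means the guest-side message is never modified in place, which mirrors the idea that the guest agent operates without host knowledge.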

As described above, according to Embodiment 2, both the identification information of the OS that detected the failure and the identification information of the host OS are set in the failure information. This makes it possible to tell which physical computer a test result belongs to and on which OS of that physical computer the failure was detected, so behavior at the time of a failure can be understood easily.
In addition, because the host OS identification information is set on the host OS, the failure information carries the host OS identification information without the guest OS needing to know anything about the host OS, so the guest failure monitoring control agent on the guest OS can operate independently.

Thus, this embodiment has described a failure management method in a virtual machine in which, when the information received by the information receiving unit on the host OS is failure information from a guest failure monitoring control agent, host OS identification information is added to the received failure information.

Embodiment 3.
Embodiment 3 describes how failures are detected.
FIGS. 7 and 8 are flowcharts showing the processing operations of the failure management method in the virtual machine environment according to Embodiment 3.
First, the operation of the failure detection unit 2213 on the host OS 22 is described with reference to FIG. 7.

The failure detection unit 2213 notifies the failure monitoring apparatus 1 that it is running (ST401).
That is, the failure detection unit 2213 of the host failure monitoring control agent 221 sends the failure monitoring apparatus 1, at regular intervals, an operation notification (heartbeat) indicating that the host OS 22 is running.

The failure detection unit 2213 also monitors logs: it examines the entries written to each monitored log file since the previous check and checks whether any entry contains a specified keyword (for example, the character string ERROR) (ST402).
If no entry with the specified keyword has occurred, processing proceeds to ST403. If one has occurred, processing proceeds to ST406.
Log monitoring on the host OS watches the logs output by the host OS and by the virtual machine management mechanism in order to detect failures of the host OS, failures of the virtual machine management mechanism, and hardware failures detected by the host OS.
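The "check only what was written since the previous check" behavior of ST402 can be sketched by remembering a byte offset per log file. This is a minimal sketch under assumptions: the function name and the offset-based bookkeeping are hypothetical, and log rotation and partially written last lines are not handled.

```python
from typing import List, Tuple


def scan_new_log_lines(path: str, offset: int,
                       keyword: str = "ERROR") -> Tuple[List[str], int]:
    """Read the log from `offset` onward; return matching lines and the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    text = data.decode("utf-8", errors="replace")
    hits = [line for line in text.splitlines() if keyword in line]
    return hits, new_offset
```

A caller would store the returned offset and pass it back on the next cycle; a non-empty list of hits corresponds to the branch toward ST406.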

Next, the failure detection unit 2213 monitors processes, checking with the ps command or the like whether each monitored process exists (ST403).
If all monitored processes are running, processing proceeds to ST404. If even one monitored process is not running, processing proceeds to ST407.
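The process check of ST403 can be sketched as a comparison between a list of monitored process names and the names currently reported by `ps`. This is an illustrative sketch: the function names are hypothetical, and `ps -e -o comm=` is used on the assumption of a POSIX-style `ps`; the text only says "the ps command or the like".

```python
import subprocess
from typing import Iterable, List


def list_running_commands() -> List[str]:
    """Return command names of all processes, using POSIX-style `ps -e -o comm=`."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_processes(monitored: Iterable[str], running: Iterable[str]) -> List[str]:
    """Return monitored process names that do not appear in `running`."""
    running_set = set(running)
    return [name for name in monitored if name not in running_set]
```

An empty result from `missing_processes` corresponds to moving on to ST404; any returned name corresponds to the branch toward ST407, with one failure record per missing process.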

Next, the failure detection unit 2213 checks the operating state of the monitored HW (physical hardware), either by accessing the HW directly or by accessing operating information that the HW itself monitors, such as through IPMI (Intelligent Platform Management Interface) (ST404).
If the HW is operating normally, processing proceeds to ST405. If it is not operating normally, processing proceeds to ST408.

Next, the failure detection unit 2213 receives operation notifications from the guest OSs 23 and checks whether operation notifications have arrived from all guest OSs 23 (ST405).
If all operation notifications have arrived, processing returns to ST401. If there is a guest OS 23 from which no operation notification has been sent, processing proceeds to ST409.
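The heartbeat check of ST405 can be sketched as a timeout comparison against the last time each guest was heard from. The names, the timestamp representation, and the `timeout` parameter are assumptions for illustration; the text states only that notifications are sent at regular intervals.

```python
from typing import Dict, Iterable, List


def guests_missing_heartbeat(expected: Iterable[str],
                             last_seen: Dict[str, float],
                             now: float,
                             timeout: float) -> List[str]:
    """Return guests with no heartbeat at all, or none within `timeout` seconds."""
    missing: List[str] = []
    for guest in expected:
        seen = last_seen.get(guest)
        if seen is None or now - seen > timeout:
            missing.append(guest)
    return missing
```

Choosing a timeout somewhat longer than the heartbeat period would avoid reporting a guest whose notification is merely delayed; that tuning is not addressed in the text.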

In ST402, when the log file contains an entry with the specified keyword (YES in ST402), the failure detection unit 2213 generates failure information in which the host OS information is set as the notification source identification information, the host OS information is also set as the notification source host identification information, the detection time is set as the date and time information, log monitoring is set as the monitoring target identification information, the name of the log file in which the entry was found is set as the monitoring target individual information, and the content of the log entry judged to be a problem is set as the message. It then notifies the failure receiving unit. When there are multiple such entries, failure information is generated for each one and the failure receiving unit is notified of each (ST406).

In ST403, when a monitored process is not running (YES in ST403), the failure detection unit 2213 generates failure information in which the host OS information is set as the notification source identification information, the host OS information is also set as the notification source host identification information, the detection time is set as the date and time information, process monitoring is set as the monitoring target identification information, nothing is set as the monitoring target individual information, and the name of the process that is not running is set as the message. It then notifies the failure receiving unit. When there are multiple such processes, failure information is generated for each one and the failure receiving unit is notified of each (ST407).

In ST404, when there is a problem with the HW (YES in ST404), the failure detection unit 2213 generates failure information in which the host OS information is set as the notification source identification information, the host OS information is also set as the notification source host identification information, the detection time is set as the date and time information, HW monitoring is set as the monitoring target identification information, the identification information of the problematic hardware is set as the monitoring target individual information, and the content of the problem is set as the message. It then notifies the failure receiving unit. When there are multiple problems, failure information is generated for each one and the failure receiving unit is notified of each (ST408).

In ST405, when the periodic operation notification from a guest OS has not been received (YES in ST405), the failure detection unit 2213 generates failure information in which the host OS information is set as the notification source identification information, the host OS information is also set as the notification source host identification information, the detection time is set as the date and time information, heartbeat monitoring is set as the monitoring target identification information, nothing is set as the monitoring target individual information, and the information of the guest OS whose notification did not arrive is set as the message. It then notifies the failure receiving unit. When there are multiple such guest OSs, failure information is generated for each one and the failure receiving unit is notified of each (ST409).
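The failure information generated in ST406 to ST409 shares one set of fields, which can be sketched as a single record type. The field names, the string values for the monitoring type, and the helper functions below are hypothetical; only the six items of information themselves come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FailureInfo:
    source_id: str                 # notification source identification
    source_host_id: Optional[str]  # notification source host identification
    detected_at: datetime          # date and time information
    monitor_type: str              # "log", "process", "hw" or "heartbeat"
    target_detail: Optional[str]   # monitoring target individual information
    message: str


def host_failure(host_id: str, monitor_type: str, message: str,
                 target_detail: Optional[str] = None,
                 detected_at: Optional[datetime] = None) -> FailureInfo:
    """Build a host-side record: the host is both the source and the source host."""
    return FailureInfo(host_id, host_id, detected_at or datetime.now(),
                       monitor_type, target_detail, message)


def host_log_failures(host_id: str, log_file: str, lines: List[str],
                      detected_at: Optional[datetime] = None) -> List[FailureInfo]:
    """ST406: one record per matching log entry, with the log file name as detail."""
    return [host_failure(host_id, "log", line, log_file, detected_at)
            for line in lines]
```

Process and heartbeat failures would leave `target_detail` unset and carry the process name or guest OS in `message`, while HW failures would put the hardware identification in `target_detail`.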

The failure detection unit 2213 repeats the above processing periodically. At regular intervals it monitors at least one of the following: the log output by the host OS 22, the log output by the virtual machine management mechanism 20, the operating status of processes on the host OS 22, the operating status of each guest OS 23, and the operating status of the physical hardware. When it detects a failure in any of these, it generates failure information reporting the detected failure, adds the identification information of the host OS to the generated failure information, and sends the failure information with the host OS identification information to the failure monitoring apparatus 1.

Next, the operation of the failure detection unit 2313 on the guest OS 23 is described with reference to FIG. 8.

The failure detection unit 2313 notifies the host failure monitoring control agent 221 that it is running (ST501).
Specifically, the failure detection unit 2313 sends the host failure monitoring control agent 221, at regular intervals, an operation notification (heartbeat) indicating that the guest OS 23 is running.

Next, the failure detection unit 2313 monitors logs: it examines the entries written to each monitored log file since the previous check and checks whether any entry contains a specified keyword (for example, the character string ERROR) (ST502).
If no entry with the specified keyword has occurred, processing proceeds to ST503. If one has occurred, processing proceeds to ST504.
Log monitoring on the guest OS 23 watches the logs output by application programs running on the guest OS 23 in order to detect application program failures, and watches the logs output by the guest OS 23 in order to detect failures of the guest OS 23 and hardware failures detected by the guest OS.

Next, the failure detection unit 2313 monitors the processes of the guest OS and the processes of the application programs running on the guest OS, checking with the ps command or the like whether each monitored process exists (ST503).
If all monitored processes are running, processing returns to ST501. If even one monitored process is not running, processing proceeds to ST505.

In ST502, when the log file contains an entry with the specified keyword (YES in ST502), the failure detection unit 2313 generates failure information in which the information of the guest OS itself is set as the notification source identification information, nothing is set as the notification source host identification information, the detection time is set as the date and time information, log monitoring is set as the monitoring target identification information, the name of the log file in which the entry was found is set as the monitoring target individual information, and the content of the log entry judged to be a problem is set as the message. It then notifies the failure receiving unit. When there are multiple such entries, failure information is generated for each one and the failure receiving unit is notified of each (ST504).

In ST503, when a monitored process is not running, the failure detection unit 2313 generates failure information in which the information of the guest OS itself is set as the notification source identification information, nothing is set as the notification source host identification information, the detection time is set as the date and time information, process monitoring is set as the monitoring target identification information, nothing is set as the monitoring target individual information, and the name of the process that is not running is set as the message. It then notifies the failure receiving unit. When there are multiple such processes, failure information is generated for each one and the failure receiving unit is notified of each (ST505).

The failure detection unit 2313 repeats the above processing periodically. At regular intervals it monitors at least one of the following: the log output by the corresponding guest OS, the log output by application programs running on that guest OS, the operating status of processes on that guest OS, and the operating status of the processes of application programs running on that guest OS. When it detects a failure in any of these, it generates failure information reporting the detected failure and sends it to the host failure monitoring control agent 221.

As described above, in this embodiment failure detection is performed by log monitoring, process monitoring, HW monitoring, and heartbeat monitoring. As a result, a failure of an application running on a guest OS can be detected by process monitoring or log monitoring; a failure of the guest OS itself can be detected by log monitoring or heartbeat monitoring; a failure of the guest failure monitoring control agent can be detected by heartbeat monitoring; a failure of the virtual machine management mechanism can be detected by heartbeat monitoring; and a HW failure can be detected by HW monitoring or heartbeat monitoring. Failures of every element constituting the physical computer can therefore be detected.

This embodiment has described a failure management method in a virtual machine in which the failure detection unit on the host OS periodically notifies a failure monitoring tool or the like that it is running, making it possible to know that the host OS is running.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit on the guest OS periodically notifies the failure detection unit on the host OS that it is running, making it possible to know that the guest OS is running.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit on the host OS periodically monitors the logs output by the OS and the virtual machine management mechanism, making it possible to know the operating status of the host OS and the virtual machine management mechanism.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit on the guest OS periodically monitors the logs output by the OS and applications, making it possible to know the operating status of the guest OS and the applications.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit on the host OS monitors whether processes are running, making it possible to know the operating status of the host OS.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit on the guest OS monitors whether processes are running, making it possible to know the operating status of the guest OS.

This embodiment has also described a failure management method in a virtual machine in which the failure detection unit periodically accesses the hardware, making it possible to know the operating status of the hardware.

Embodiment 4.
This embodiment describes a test system that can generate a virtual failure for a specific guest OS by automatically converting the host name or IP address that the user recognizes into identification information, such as a domain ID, by which the virtual machine management mechanism recognizes the guest OS.
It also describes a test system in which failure generation instructions and results are aggregated on the host OS, and a failure caused by a simulated failure is detected whether it is one that surfaces on the host OS or on a guest OS, with a record of where and by what means it was detected. This allows failure generation and confirmation to be managed independently of the guest OS network environment and of where the failure appears under a given virtualization implementation.

FIG. 9 is a configuration diagram of the test system in the virtual machine environment according to this embodiment.
As shown in the figure, the test system for the virtual machine environment includes a failure monitoring apparatus 1 containing a failure monitoring control manager 5, and physical computers 2-1 to 2-n. The failure monitoring control manager 5 and the host OSs of the physical computers 2-1 to 2-n are connected via a communication line 3, and the guest OSs are connected via a communication line 4.

The failure monitoring control manager 5 comprises an information receiving unit 51 that receives screen display requests from a browser or the like as well as failure information and operation information from the physical computers 2-1 to 2-n; a display unit 52 that generates display information in response to a screen display request; a simulated failure generation control unit 53 that instructs the physical computers 2-1 to 2-n to generate failures; and an operation information storage unit 54 that accumulates the failure information and operation information collected from the physical computers 2-1 to 2-n.
That is, the failure monitoring apparatus 1 sends the physical computers 2-1 to 2-n a simulated failure generation request, which requests that a simulated failure be generated, and monitors how the simulated failure is detected on the physical computers 2-1 to 2-n.

The physical computers 2-1 to 2-n are computers equipped with the virtual machine management mechanism 20. Each carries a host OS 22, which performs I/O emulation and the like for each guest OS, and guest OSs 23, together with a virtual network 21 to which the host OS 22 and the guest OSs 23 are connected.
The communication lines 3 and 4 are networks such as an intranet or a LAN (Local Area Network). They are independent of each other and cannot communicate with each other.

The host OS 22 carries a host failure monitoring control agent 221, which issues failure generation instructions to modules and devices that generate simulated failures and monitors and collects information on the events that result from generating a failure, and a failure generation module 222, which generates the simulated failures that can be generated on the host OS.

The guest OS 23 carries a guest failure monitoring control agent 231, which issues failure generation instructions to modules and devices that generate simulated failures and monitors and collects information on the events that result from generating a failure, and a failure generation module 232, which generates the simulated failures that can be generated on the guest OS.

The host failure monitoring control agent 221 receives the simulated failure generation request sent from the failure monitoring apparatus 1, analyzes its content, and, based on the analysis result, instructs the failure generation module of the appropriate OS to generate the simulated failure requested.
In other words, in this embodiment, reception of simulated failure generation requests from the failure monitoring apparatus 1 is centralized in the host failure monitoring control agent 221.

Further, when the host failure monitoring control agent 221 receives a simulated failure generation request that contains the communication address of the guest OS targeted by the simulated failure (the simulated failure target guest OS) and that requests a simulated failure generated jointly by the failure generation module of the host OS and the virtual machine management mechanism 20, it obtains the domain ID of the simulated failure target guest OS from that communication address and notifies the virtual machine management mechanism 20 of the obtained domain ID.

Furthermore, when the host failure monitoring control agent 221 receives a simulated failure generation request that contains the IP address of the simulated failure target guest OS and that requests a simulated failure of the physical hardware allocated to that guest OS, it obtains the domain ID of the simulated failure target guest OS from that IP address and notifies the virtual machine management mechanism 20 of the obtained domain ID.
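The address-to-domain-ID conversion described above can be sketched as a lookup followed by a rewritten request. This is a hypothetical illustration: the request keys (`target_ip`, `fault`), the `ip_to_domain` table, and the returned dictionary are assumptions, and the text does not specify how the virtual machine management mechanism is actually notified.

```python
from typing import Dict


def build_vmm_request(request: Dict[str, str],
                      ip_to_domain: Dict[str, int]) -> Dict[str, object]:
    """Translate a request addressed by guest IP into one addressed by domain ID."""
    target_ip = request["target_ip"]
    domain_id = ip_to_domain.get(target_ip)
    if domain_id is None:
        # No guest on this physical computer matches the requested address.
        raise LookupError(f"no domain ID known for guest IP {target_ip}")
    return {"domain_id": domain_id, "fault": request["fault"]}
```

The `ip_to_domain` table could be derived from the same domain ID, MAC address, and IP address information that the IP notification unit gathers at start-up, although the text does not state this.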

The failure generation module 222 of the host OS 22 generates at least one of the following: a simulated failure of the host OS, a simulated failure of the virtual machine management mechanism 20, and a simulated failure of physical hardware not allocated to any guest OS 23.

When the host failure monitoring control agent 221 detects a simulated failure generated by the failure generation module 222 of the host OS 22, it generates failure information reporting the detected simulated failure, adds the identification information of the host OS 22 to it, and transmits the resulting failure information to the failure monitoring device 1.

The failure generation module 232 of each guest OS 23 generates at least one of a simulated failure of the corresponding guest OS 23 and a simulated failure of an application program running on that guest OS 23.

When a guest failure monitoring control agent 231 detects a simulated failure generated by the failure generation module 232 of its guest OS 23, it generates failure information reporting the detected simulated failure and transmits it to the host failure monitoring control agent 221. The host failure monitoring control agent 221 adds the identification information of the host OS 22 to the failure information received from each guest failure monitoring control agent 231 and transmits the resulting failure information to the failure monitoring device 1.

In the host failure monitoring control agent 221, reference numeral 2211 denotes an IP notification unit that looks up the IP address of the guest OS 23 and notifies the guest OS 23 of the IP address information of the host OS 22.
Reference numeral 2212 denotes an information receiving unit that receives failure occurrence information from the failure detection unit 2213 or the guest failure monitoring control agent 231, or a simulated failure generation request from the failure monitoring control manager 5.
Reference numeral 2213 denotes a failure detection unit that monitors behavior caused by failures occurring in the host OS 22 or the guest OS 23 and detects the occurrence of a failure.
Reference numeral 2214 denotes a failure reporting unit that reports to the failure monitoring control manager 5 the guest OS failure information from the information receiving unit 2212 and the host OS failure information from the failure detection unit 2213.
Reference numeral 2281 denotes a domain ID conversion unit that, when a simulated failure generation instruction targets the guest OS 23, converts the host name or IP address by which the failure monitoring control manager 5 identifies the guest OS 23 into a domain ID that the virtual machine management mechanism 20 can identify.
Reference numeral 2282 denotes a simulated failure control unit. When the failure generation instruction from the failure monitoring control manager 5 concerns a failure to be generated by the failure generation module 222 on the host OS 22, it issues the instruction to the failure generation module 222. When the instruction concerns a failure to be generated by the failure generation module 232 on the guest OS 23, it issues the instruction to the guest failure monitoring control agent 231.

In the guest failure monitoring control agent 231, reference numeral 2311 denotes an IP receiving unit that receives the IP address information of the host OS and, based on that information, makes it possible to report failure information to the host OS 22.
Reference numeral 2312 denotes an information receiving unit that receives failure occurrence information from the failure detection unit 2313 or a simulated failure generation request from the host failure monitoring control agent 221.
Reference numeral 2313 denotes a failure detection unit that monitors behavior caused by failures occurring in the host OS 22 or the guest OS 23 and detects the occurrence of a failure.
Reference numeral 2314 denotes a failure reporting unit that reports the failure information collected by the failure detection unit 2313 to the host failure monitoring control agent 221.
Reference numeral 2381 denotes a simulated failure control unit that issues failure generation instructions to the failure generation module 232.

FIGS. 10 and 11 are flowcharts showing the processing of the test system in the virtual machine environment according to Embodiment 4.
First, the operation of the host failure monitoring control agent 221 on the host OS 22 is described with reference to FIG. 10.

When the host failure monitoring control agent 221 starts, the IP notification unit 2211 obtains the MAC address of each guest OS from the guest OS information provided by the virtual machine management mechanism 20. It then obtains the correspondence between MAC addresses and IP addresses, using the arp command or the like, or by generating a list of the addresses possible on the network interface of the host OS 22 and accessing them with ping or the like (ST101).
Next, the IP notification unit 2211 determines, from the MAC-to-IP correspondence information and the guest OS MAC address information provided by the virtual machine management mechanism 20, whether the corresponding IP address could be obtained (ST102). If it could be obtained, processing proceeds to ST103. If not, processing proceeds to ST107.
If the corresponding IP address was obtained (YES in ST102), the IP notification unit 2211 uses the obtained IP address information to notify the guest OS 23 of the IP address of the host OS 22 (ST103).
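The matching step in ST101 and ST102 can be sketched as a join between two tables. The sketch below is only an illustration under stated assumptions: the function name `resolve_guest_ips`, the dictionary shapes, and the sample addresses are hypothetical, and the tables stand in for whatever the virtual machine management mechanism and the arp/ping probing actually return.

```python
from typing import Dict, Optional


def resolve_guest_ips(guest_macs: Dict[int, str],
                      mac_to_ip: Dict[str, str]) -> Dict[int, Optional[str]]:
    """Map each guest domain ID to its IP address, or None when no entry matches.

    guest_macs: domain ID -> MAC address, as reported by the VM manager (assumed shape).
    mac_to_ip:  MAC address -> IP address, as gathered via arp/ping (assumed shape).
    """
    # Compare MAC addresses case-insensitively, since tools differ in letter case.
    normalized = {mac.lower(): ip for mac, ip in mac_to_ip.items()}
    return {dom: normalized.get(mac.lower()) for dom, mac in guest_macs.items()}


# Hypothetical sample data: domain 2 has no ARP entry.
guest_macs = {1: "00:16:3E:00:00:01", 2: "00:16:3E:00:00:02"}
mac_to_ip = {"00:16:3e:00:00:01": "192.168.0.11"}
resolved = resolve_guest_ips(guest_macs, mac_to_ip)
```

A `None` entry corresponds to the NO branch of ST102, where the agent concludes that no network connects the host OS and that guest OS.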

Next, the information receiving unit 2212 waits to receive failure information from the failure reporting unit 2314 of the guest failure monitoring control agent 231, failure information from the failure detection unit 2213, or a simulated failure generation request from the failure monitoring control manager 5 (ST104).
On receiving information, the information receiving unit 2212 determines whether it is failure information or a simulated failure generation request from the failure monitoring control manager (ST601).
If it is failure information, processing proceeds to ST301. If it is a simulated failure generation request, processing proceeds to ST602.

If the received information is a simulated failure generation request (YES in ST601), the simulated failure control unit 2282 determines whether the request is to be set on the failure generation module 222 on the host OS 22 or on the failure generation module 232 on the guest OS 23 (ST602).
If it is to be set on the failure generation module 222 on the host OS 22 (YES in ST602), processing proceeds to ST603. If it is to be set on the failure generation module 232 on the guest OS 23 (NO in ST602), processing proceeds to ST606.

Next, for a simulated failure generation request to be set on the failure generation module 222 on the host OS 22 (YES in ST602), the simulated failure control unit 2282 determines whether the request concerns a failure of hardware allocated to a specific guest OS 23 (the target guest OS), such as a disk, or a failure of hardware that is not allocated as a resource, such as a fan or a power supply (ST603). A simulated failure of hardware such as a disk allocated to a specific guest OS 23 (the target guest OS) is generated jointly by the failure generation module 222 of the host OS 22 and the virtual machine management mechanism 20.
If the request concerns a failure of hardware allocated to a specific guest OS 23, processing proceeds to ST604. If it concerns a failure of hardware that is not allocated as a resource, processing proceeds to ST605.

The domain ID conversion unit 2281 obtains MAC address information, using the arp command or the like, from the host name or IP address information that designates the guest OS 23 (the target guest OS) in the simulated failure generation request. It then obtains the domain ID corresponding to that MAC address from the guest OS 23 information provided by the virtual machine management mechanism 20 (ST604).
For example, the domain ID conversion unit 2281 uses the xm command output and the arp command output shown in Embodiment 1 and obtains the domain ID of the target guest OS by following the procedure of Embodiment 1 in reverse.
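The reverse lookup of ST604 (IP address to MAC address to domain ID) can be sketched as two dictionary lookups. This is a minimal illustration, not the patent's implementation: `ip_to_domain_id`, the table shapes, and the sample values are assumptions, and the tables are taken to be already parsed from the arp output and the virtual machine manager's guest information.

```python
from typing import Dict, Optional


def ip_to_domain_id(guest_ip: str,
                    ip_to_mac: Dict[str, str],
                    domain_macs: Dict[int, str]) -> Optional[int]:
    """Return the domain ID of the guest that owns guest_ip, or None if unknown."""
    mac = ip_to_mac.get(guest_ip)          # step 1: IP address -> MAC address
    if mac is None:
        return None
    for domain_id, domain_mac in domain_macs.items():
        if domain_mac.lower() == mac.lower():   # step 2: MAC address -> domain ID
            return domain_id
    return None


# Hypothetical sample tables.
ip_to_mac = {"192.168.0.11": "00:16:3e:00:00:01"}
domain_macs = {1: "00:16:3E:00:00:01", 2: "00:16:3E:00:00:02"}
domain_id = ip_to_domain_id("192.168.0.11", ip_to_mac, domain_macs)
```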

The simulated failure control unit 2282 then instructs the failure generation module 222 to generate the simulated failure (ST605).

If the request is to be set on the failure generation module 232 on the guest OS 23 (NO in ST602), the simulated failure control unit 2282 sends a failure generation instruction to the guest failure monitoring control agent 231 of the target guest OS 23 (ST606).
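The branching in ST602 through ST606 can be summarized as a small routing function. The sketch below is illustrative only: the request fields (`module`, `hardware_assigned_to_guest`, `guest_ip`, `failure`) and the action names are hypothetical, since the description does not specify a message format.

```python
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, object]


def route_request(request: dict,
                  lookup_domain_id: Callable[[str], Optional[int]]) -> List[Action]:
    """Return the actions the host agent takes for one simulated failure request."""
    if request["module"] == "guest":                        # ST602: NO
        # ST606: forward the instruction to the target guest's agent.
        return [("send_to_guest_agent", request["guest_ip"])]
    actions: List[Action] = []
    if request["hardware_assigned_to_guest"]:               # ST603
        domain_id = lookup_domain_id(request["guest_ip"])   # ST604
        actions.append(("notify_vmm_domain_id", domain_id))
    # ST605: instruct the host-side failure generation module.
    actions.append(("instruct_host_module", request["failure"]))
    return actions


# Hypothetical request: a disk failure on hardware allocated to one guest.
disk_request = {"module": "host", "hardware_assigned_to_guest": True,
                "guest_ip": "192.168.0.11", "failure": "disk"}
disk_actions = route_request(disk_request, lambda ip: 1)
```

Returning a list of actions rather than performing them keeps the decision logic separate from the transport, which makes the three paths of the flowchart easy to check in isolation.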

If the information received in ST601 is failure information (NO in ST601), the information receiving unit 2212 determines whether it came from the failure detection unit 2213 or from the guest OS 23 (ST301).
If it came from the failure detection unit 2213, processing proceeds to ST607. If it came from the guest OS, processing proceeds to ST302.

If the failure information came from the guest OS (NO in ST301), the information receiving unit 2212 attaches host OS identification information (the host name, in this example), as in the example of FIG. 6 (ST302).
Whether the received information is failure information from the failure detection unit 2213 or failure information from the guest failure monitoring control agent 231, the failure reporting unit 2214 transmits it to the failure monitoring control manager 5 (ST607).
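The tagging step of ST301 and ST302 can be sketched as follows. This follows the FIG. 10 flow as described here, in which only guest-originated failure information is tagged in ST302; the function name, the `host` key, and the sample record are hypothetical.

```python
def prepare_report(failure_info: dict, from_guest: bool, host_name: str) -> dict:
    """Build the record that the failure reporting unit sends to the manager (ST607)."""
    report = dict(failure_info)        # copy so the caller's record is left unchanged
    if from_guest:                     # ST301: NO (information came from a guest OS)
        report["host"] = host_name     # ST302: attach host OS identification
    return report


# Hypothetical guest-originated record.
guest_info = {"source": "guest-a", "event": "disk error"}
report = prepare_report(guest_info, from_guest=True, host_name="host01")
```

With the host name attached, the manager can tell which physical computer a guest-reported failure belongs to, even when several hosts run guests with similar names.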

If the corresponding IP address could not be obtained in ST102 (NO in ST102), the agent determines that no network exists over which the host OS 22 and the guest OS 23 can communicate. The failure reporting unit 2214 notifies the operator of the failure monitoring control manager 5 by outputting a message stating that no network is configured between that guest OS 23 and the host OS 22, and processing ends (ST107).

Next, the operation of the guest failure monitoring control agent 231 on the guest OS 23 is described with reference to FIG. 11.

When the guest failure monitoring control agent 231 starts, the IP receiving unit 2311 waits to receive the IP address information of the host OS 22 from the host failure monitoring control agent 221. On receiving it, the IP receiving unit 2311 passes the IP address information of the host OS 22 to the failure reporting unit 2314 (ST201).

The information receiving unit 2312 receives either failure occurrence information obtained by the failure detection unit 2313 or a simulated failure generation instruction from the host failure monitoring control agent (ST202).
On receiving information, the information receiving unit 2312 determines whether it is a simulated failure generation request or failure information (ST601).
If it is a simulated failure generation request, processing proceeds to ST602. If it is failure information, processing proceeds to ST203.

If the received information is a simulated failure generation request (YES in ST601), the simulated failure control unit 2381 applies the specified failure generation setting to the failure generation module 232 (ST602).
If failure information is received instead (NO in ST601), the failure reporting unit 2314 reports the failure information to the host failure monitoring control agent 221 (ST203).
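The guest-side handling in FIG. 11 reduces to a two-way dispatch, sketched below. The message format (`type` and `failure` keys) and the callback names are assumptions made for illustration; the description does not define them.

```python
from typing import Callable


def guest_agent_step(message: dict,
                     set_failure: Callable[[str], None],
                     report_to_host: Callable[[dict], None]) -> None:
    """Handle one received message on the guest agent."""
    if message["type"] == "simulated_failure_request":   # ST601: YES
        set_failure(message["failure"])                  # ST602: set on the guest's module
    else:                                                # ST601: NO
        report_to_host(message)                          # ST203: forward to the host agent


# Record what each callback receives, for demonstration.
configured, reported = [], []
guest_agent_step({"type": "simulated_failure_request", "failure": "app crash"},
                 configured.append, reported.append)
guest_agent_step({"type": "failure_info", "event": "app crash detected"},
                 configured.append, reported.append)
```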

When the failure generation module 222 or the failure generation module 232 generates a simulated failure, the failure detection unit 2213 or the failure detection unit 2313 detects it by the procedure described in Embodiment 3 (the procedure shown in FIG. 7 or FIG. 8), and the failure reporting unit 2214 or the failure reporting unit 2314 generates failure information reporting the detected simulated failure.

As described above, according to Embodiment 4, simulated failure generation instructions and failure information can be consolidated on the host OS. As a result, even when the host OS and the guest OSs, or the guest OSs themselves, are connected to independent, separate network segments, a single failure monitoring control manager can manage both the issuing of failure generation instructions and the collection of failure information.
In addition, because failures are generated and monitored on the host OS and the guest OSs, tests can be carried out and verified independently of how the virtual machine management mechanism is implemented.
In addition, because the host name or IP address of a guest OS specified on the host OS is converted into a domain ID that the virtual machine management mechanism can recognize, simulated failures targeting a guest OS can be generated on the host OS.
In addition, because hardware failure tests can be performed by generating simulated failures, hardware failure tests that were difficult to verify on a physical computer can be carried out by deploying a system that runs on a physical computer into the virtual machine environment.

As described above, the present embodiment has described a failure test system in a virtual machine environment that generates simulated failures on a physical computer hosting the virtual machine environment and verifies the behavior when a failure occurs, in which:
the system comprises a failure monitoring control manager, a host failure monitoring control agent on the host OS, and a guest failure monitoring control agent on the guest OS;
the host failure monitoring control agent comprises
an information receiving unit that receives failure generation instructions from the failure monitoring control manager and failure occurrence information from the guest failure monitoring control agent,
a domain ID conversion unit that, when the information obtained by the information receiving unit is a simulated hardware failure generation instruction for a guest OS, converts the host name or IP address information identifying the guest OS into guest OS identification information that the virtual machine management mechanism can recognize,
a simulated failure control unit that, based on the domain ID obtained by the domain ID conversion unit, instructs a failure generation module that generates simulated hardware failures to generate a failure,
a failure detection unit that monitors behavior caused by the failure set by the simulated failure control unit and detects the occurrence of the failure, and
a failure reporting unit that reports the failure information obtained by the failure detection unit or the information receiving unit to the failure monitoring control manager;
the guest failure monitoring control agent comprises
an information receiving unit that receives failure generation instructions from the host failure monitoring control agent,
a simulated failure control unit that, based on the information obtained by the information receiving unit, instructs a failure generation module that generates simulated failures to generate a failure,
a failure detection unit that monitors behavior caused by the simulated failure set by this simulated failure control unit or by the simulated failure control unit of the host failure monitoring control agent, and detects the occurrence of the failure, and
a failure reporting unit that reports the failure information obtained by the failure detection unit to the host failure monitoring control agent; and
the IP address identifying the guest OS is converted into the domain ID information identified by the virtualization management function so that a simulated failure can be set, which makes it possible to generate a simulated failure when hardware is accessed from the guest OS.

The present embodiment has also described a failure test system in a virtual machine environment in which the simulated failure control unit on the host OS can set a simulated host OS failure.

The present embodiment has also described a failure test system in a virtual machine environment in which the simulated failure control unit on the host OS can set a simulated failure of the virtual machine management mechanism.

The present embodiment has also described a failure test system in a virtual machine environment in which the simulated failure control unit on the guest OS can set a simulated guest OS failure.

The present embodiment has also described a failure test system in a virtual machine environment in which the simulated failure control unit on the guest OS can set a simulated application failure.

Embodiment 5.
Embodiment 5 describes the operation of the failure monitoring control manager 1.
FIG. 12 is an example of a screen displaying the host list.
FIG. 13 is an example of a screen for issuing a simulated failure generation request.
FIG. 14 is an example of a screen displaying the failure information occurring on each host.
FIG. 15 is a diagram showing the operation flow of the failure monitoring control manager 1.

The information receiving unit 51 waits to receive a display request or a simulated failure generation request from a user via a browser or the like, or failure information or operation information from the physical computers 2-1 to 2-n (ST801).
When information is received, the unit determines whether it is a display request from the user, a simulated failure generation request, or failure information or operation information from the physical computers 2-1 to 2-n (ST802).
If it is a display request, processing proceeds to ST803. Otherwise, processing proceeds to ST805.

If the information is a display request from the user (YES in ST802), the display unit 52 generates display data based on the configuration information and failure information stored in the DB (database).
For a host list display request, the display unit 52 generates the data for displaying the screen shown in FIG. 12.
For a request to display the screen for issuing a simulated failure generation request, the display unit 52 generates the data for displaying the screen shown in FIG. 13.
For a request to display the failure occurrence status, the display unit 52 generates the data for displaying the screen shown in FIG. 14 (ST803).
The display unit 12 then returns the generated display data to the requesting browser or the like (ST804).

If the information is not a display request in ST802 (NO in ST802), the information receiving unit 11 determines whether the received information is a simulated failure generation request, or failure information or operation information (ST805). If it is a simulated failure generation request, processing proceeds to ST806. Otherwise, processing proceeds to ST807.

If a simulated failure generation request is received from the operator (YES in ST805), the simulated failure generation control unit 13 sends a failure generation instruction to the host failure monitoring control agent 221 of the designated host OS 22 (ST806).
If failure information or operation information is received from the host OS 22 of a physical computer 2 (NO in ST805), the operation information storage unit 14 stores the received data in the DB so that its contents can be displayed when a request to display the failure occurrence status is accepted (ST807).
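The manager's dispatch in ST802 through ST807 can be sketched as a single handler. Everything named here is an assumption for illustration: the message kinds, the `render` and `send_to_host_agent` callbacks, and the use of a plain list in place of the DB.

```python
from typing import Callable, List, Optional


def handle_message(kind: str, payload: dict, db: List[dict],
                   send_to_host_agent: Callable[[str, dict], None],
                   render: Callable[[dict, List[dict]], str]) -> Optional[str]:
    """Process one message received by the failure monitoring control manager."""
    if kind == "display":                             # ST802: YES
        return render(payload, db)                    # ST803-ST804: build and return display data
    if kind == "simulated_failure":                   # ST805: YES
        send_to_host_agent(payload["host"], payload)  # ST806: instruct the designated host agent
        return None
    db.append(payload)                                # ST807: store failure/operation information
    return None


# Hypothetical stand-ins for the DB, the transport, and the renderer.
db: List[dict] = []
sent: List[tuple] = []
handle_message("failure_info", {"host": "host01", "event": "disk error"}, db,
               lambda host, req: sent.append((host, req)),
               lambda req, rows: "")
page = handle_message("display", {"screen": "failures"}, db,
                      lambda host, req: sent.append((host, req)),
                      lambda req, rows: f"{req['screen']}: {len(rows)} record(s)")
```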

As described above, because the failure monitoring control manager centrally manages simulated failure generation instructions and the checking of failure occurrence status, the behavior of the host OS and guest OSs on each physical computer when a failure occurs can be confirmed, and testing can proceed smoothly.

The failure management method in the virtual machine environment has been described in terms of a host OS and guest OSs. When a virtual machine management mechanism that does not explicitly use a host OS is employed, such as Xen, the failure management method described in this specification can still be realized by treating Domain-0, which manages the domains, as the host OS and the other Domain-U domains as guest OSs.
Even in an environment with no special OS such as Domain-0 or a host OS, the failure management method described in this specification can be realized by defining a representative virtual OS.

As described above, the present embodiment has described that, in a failure test system that generates simulated failures on a physical computer hosting a virtual machine environment and verifies the behavior when a failure occurs,
the failure monitoring control manager comprises
a simulated failure generation control unit that instructs the host failure monitoring control agent to generate a simulated failure, and
an operation information collection unit that collects operation information from the host failure monitoring control agent and displays the result.

Finally, an example hardware configuration of the failure monitoring device 1 and the physical computer 2 described in Embodiments 1 to 5 is explained.
FIG. 18 is a diagram showing an example of the hardware resources of the failure monitoring device 1 and the physical computer 2 described in Embodiments 1 to 5.
The configuration in FIG. 18 is only one example of the hardware configuration of the failure monitoring device 1 and the physical computer 2. Their hardware configuration is not limited to that shown in FIG. 18, and other configurations may be used.

In FIG. 18, the failure monitoring device 1 and the physical computer 2 each include a CPU 911 (Central Processing Unit, also called a central processor, processing device, arithmetic device, microprocessor, microcomputer, or processor) that executes programs.
The CPU 911 is connected via a bus 912 to, for example, a ROM (Read Only Memory) 913, a RAM (Random Access Memory) 914, a communication board 915, a display device 901, a keyboard 902, a mouse 903, and a magnetic disk device 920, and controls these hardware devices.
The CPU 911 may also be connected to an FDD 904 (Flexible Disk Drive), a compact disk device 905 (CDD), a printer device 906, and a scanner device 907. A storage device such as an optical disk device or a memory card (registered trademark) read/write device may be used in place of the magnetic disk device 920.
The RAM 914 is an example of volatile memory. The storage media of the ROM 913, the FDD 904, the CDD 905, and the magnetic disk device 920 are examples of nonvolatile memory. These are examples of storage devices.
The communication board 915, the keyboard 902, the mouse 903, the scanner device 907, the FDD 904, and the like are examples of input devices.
The communication board 915, the display device 901, the printer device 906, and the like are examples of output devices.

The communication board 915 is connected to a network, as shown in FIG. 1 and other figures. For example, the communication board 915 may be connected to a LAN (local area network), the Internet, a WAN (wide area network), or the like.

FIG. 18 shows, as the contents of the magnetic disk device 920, an example of the programs that realize the physical computer 2.
The magnetic disk device 920 in FIG. 18 stores a virtual machine management mechanism 921 (virtual machine monitor), a host OS 922, a program group 923, and a file group 924.
The programs in the program group 923 are executed by the CPU 911, the virtual machine management mechanism 921, and the host OS 922.
In some configurations, the virtual machine management mechanism 921 itself includes the functions of the host OS 922, or the virtual machine management mechanism 921 resides within the host OS 922.
In the magnetic disk device 920 of the failure monitoring apparatus 1, an ordinary OS and a window system, for example, are stored in place of the virtual machine management mechanism 921 and the host OS 922.

The ROM 913 stores a BIOS (Basic Input Output System) program, and the magnetic disk device 920 stores a boot program.
When the physical computer 2 starts up, the BIOS program in the ROM 913 and the boot program on the magnetic disk device 920 are executed, and together they start the virtual machine management mechanism 921 and the host OS 922 (in the failure monitoring apparatus 1, the OS).

In the case of the physical computer 2, the program group 923 includes the guest OSs described in Embodiments 1 to 5 and the programs that realize their internal elements. Specifically, the program group 923 stores the programs that execute the functions described as "... unit" in the description of Embodiments 1 to 5.
Application programs executed on the guest OSs are also stored.
In the failure monitoring apparatus 1, the program group 923 stores application programs such as the failure monitoring control manager.
The programs are read and executed by the CPU 911.

In the case of the physical computer 2, the file group 924 includes, for example, various files for emulating hardware.
In addition, the file group 924 stores, as items of "... file" or "... database", the information, data, signal values, variable values, and parameters that represent the results of the processing described in Embodiments 1 to 5 as "determination of ...", "calculation of ...", "comparison of ...", "conversion of ...", "acquisition of ...", "setting of ...", "registration of ...", "selection of ...", and the like.
The "... file" and "... database" are stored in a recording medium such as a disk or memory. The information, data, signal values, variable values, and parameters stored in a storage medium such as a disk or memory are read into main memory or cache memory by the CPU 911 via a read/write circuit and are used in CPU operations such as extraction, search, reference, comparison, arithmetic, calculation, processing, editing, output, printing, and display.
During these CPU operations, the information, data, signal values, variable values, and parameters are temporarily stored in main memory, registers, cache memory, buffer memory, or the like.
The arrows in the flowcharts described in Embodiments 1 to 5 mainly indicate the input and output of data and signals. The data and signal values are recorded in recording media such as the memory of the RAM 914, the flexible disk of the FDD 904, the compact disk of the CDD 905, the magnetic disk of the magnetic disk device 920, and other optical disks, mini disks, and DVDs. Data and signals are also transmitted online via the bus 912, signal lines, cables, and other transmission media.

What is described as a "... unit" in Embodiments 1 to 5 may be a "... circuit", "... device", or "... equipment", or may be a "... step", "... procedure", or "... process". In other words, what is described as a "... unit" may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software alone; by hardware alone, such as elements, devices, boards, and wiring; by a combination of software and hardware; or by a further combination with firmware. Firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical disks, compact disks, mini disks, and DVDs. The programs are read and executed by the CPU 911. That is, the programs cause a computer to function as the "... unit" of Embodiments 1 to 5, or cause a computer to execute the procedures and methods of the "... unit" of Embodiments 1 to 5.

As described above, the failure monitoring apparatus 1 and the physical computer 2 described in Embodiments 1 to 5 are computers that include a CPU as a processing device; memory, magnetic disks, and the like as storage devices; a keyboard, mouse, communication board, and the like as input devices; and a display device, communication board, and the like as output devices. They realize the functions described as "... unit" by using these processing, storage, input, and output devices.

FIG. 1 is a diagram showing a system configuration example according to Embodiment 1.
FIG. 2 is a flowchart showing an operation example of the host OS according to Embodiment 1.
FIG. 3 is a flowchart showing an operation example of the guest OS according to Embodiment 1.
FIG. 4 is a flowchart showing an operation example of the host OS according to Embodiment 2.
FIG. 5 is a diagram showing an example of failure information transmitted from the guest OS to the host OS according to Embodiment 2.
FIG. 6 is a diagram showing an example of failure information transmitted from the host OS to the failure monitoring apparatus according to Embodiment 2.
FIG. 7 is a flowchart showing an operation example of the host OS according to Embodiment 3.
FIG. 8 is a flowchart showing an operation example of the guest OS according to Embodiment 3.
FIG. 9 is a diagram showing a system configuration example according to Embodiment 4.
FIG. 10 is a flowchart showing an operation example of the host OS according to Embodiment 4.
FIG. 11 is a flowchart showing an operation example of the guest OS according to Embodiment 4.
FIG. 12 is a diagram showing an example of a screen displaying a host list according to Embodiment 5.
FIG. 13 is a diagram showing an example of a screen for issuing a pseudo-failure occurrence request according to Embodiment 5.
FIG. 14 is a diagram showing an example of a screen displaying failure information according to Embodiment 5.
FIG. 15 is a flowchart showing an operation example of the failure monitoring control manager according to Embodiment 5.
FIG. 16 is a diagram showing an output example of the xm command and an output example of the arp command according to Embodiment 1.
FIG. 17 is a diagram showing an example of server consolidation using virtual machines according to Embodiment 1.
FIG. 18 is a diagram showing a hardware configuration example of the failure monitoring apparatus and the physical computer according to Embodiments 1 to 5.

Explanation of symbols

1: failure monitoring apparatus; 2: physical computer; 3: communication line; 4: communication line; 5: failure monitoring control manager; 20: virtual machine management mechanism; 21: virtual network; 22: host OS; 23: guest OS; 51: information receiving unit; 52: display unit; 53: pseudo-failure occurrence control unit; 54: operation information storage unit; 221: host failure monitoring control agent; 222: failure occurrence module; 231: guest failure monitoring control agent; 232: failure occurrence module; 2211: IP notification unit; 2212: information receiving unit; 2213: failure detection unit; 2214: failure reporting unit; 2281: domain ID conversion unit; 2282: pseudo-failure control unit; 2311: IP receiving unit; 2312: information receiving unit; 2313: failure detection unit; 2314: failure reporting unit; 2381: pseudo-failure control unit.

Claims

1. A computer apparatus in which a virtual machine management mechanism for realizing virtual machines is installed, in which a host OS (Operating System) and one or more guest OSs operate on the virtual machine management mechanism, and which has a storage area allocated to each guest OS, the computer apparatus comprising:
a first failure monitoring control unit that transmits communication address information notifying the communication address of the host OS as the destination address of failure information for notifying a failure detected in each guest OS, and that receives failure information in which the communication address of the host OS is set as the destination address; and
one or more second failure monitoring control units, each of which receives the communication address information from the first failure monitoring control unit, stores the communication address of the host OS indicated in the received communication address information in the storage area allocated to the corresponding guest OS, and, when a failure is detected in the corresponding guest OS, generates failure information that notifies the detected failure and has the communication address of the host OS stored in the storage area as its destination address, and transmits the generated failure information to the first failure monitoring control unit,
wherein the first failure monitoring control unit acquires the IP (Internet Protocol) address of each guest OS based on association information, provided by the virtual machine management mechanism, between the domain ID (Identification Data) of each guest OS and the MAC (Media Access Control) address of each guest OS, and, using the acquired IP address of each guest OS, transmits communication address information notifying the IP address of the host OS to the second failure monitoring control unit corresponding to each guest OS.

2. The computer apparatus according to claim 1, wherein the first failure monitoring control unit, when the IP address of any guest OS cannot be acquired, outputs a message notifying that the network between that guest OS and the host OS has not been set up.
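
As a rough illustration of the address resolution recited in claims 1 and 2, the following minimal sketch joins a domain ID to MAC address table with a MAC address to IP address table and reports any guest OS whose IP address cannot be found. The function name `resolve_guest_ips`, the dictionary inputs, the sample addresses, and the message wording are all assumptions made for this example; the patent does not specify how the tables are obtained or represented.

```python
from typing import Dict, List, Tuple


def resolve_guest_ips(
    domain_to_mac: Dict[int, str],
    mac_to_ip: Dict[str, str],
) -> Tuple[Dict[int, str], List[str]]:
    """Join (domain ID -> MAC) with (MAC -> IP); collect a message per unresolved guest."""
    # Normalize MAC addresses so that upper- and lower-case spellings match.
    arp_table = {mac.lower(): ip for mac, ip in mac_to_ip.items()}
    resolved: Dict[int, str] = {}
    messages: List[str] = []
    for domain_id in sorted(domain_to_mac):
        ip = arp_table.get(domain_to_mac[domain_id].lower())
        if ip is None:
            # Corresponds to the message of claim 2 (wording is hypothetical).
            messages.append(
                f"domain {domain_id}: network between guest OS and host OS is not set"
            )
        else:
            resolved[domain_id] = ip
    return resolved, messages


# Hypothetical sample tables: domain 2 has no entry in the MAC -> IP table.
domain_to_mac = {1: "00:16:3E:00:00:01", 2: "00:16:3E:00:00:02"}
mac_to_ip = {"00:16:3e:00:00:01": "192.168.0.11"}
resolved, messages = resolve_guest_ips(domain_to_mac, mac_to_ip)
print(resolved)
print(messages)
```

In this sketch the host OS would then send its own IP address to each address in `resolved`, and output each entry of `messages` for guests that could not be reached.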

3. A computer apparatus in which a virtual machine management mechanism for realizing virtual machines is installed, in which a host OS (Operating System) and one or more guest OSs operate on the virtual machine management mechanism, and which is connected to a failure monitoring apparatus that transmits a pseudo-failure occurrence request requesting the occurrence of a pseudo-failure, the computer apparatus comprising:
a host OS failure occurrence module that generates a pseudo-failure in the host OS;
a guest OS failure occurrence module, provided for each guest OS, that generates a pseudo-failure in that guest OS; and
a first failure monitoring control unit that receives the pseudo-failure occurrence request transmitted from the failure monitoring apparatus, analyzes the content of the received pseudo-failure occurrence request, and, based on the analysis result, notifies the failure occurrence module of one of the OSs of the request to generate the pseudo-failure.
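
The routing described in claim 3 can be sketched as follows, purely for illustration. The request fields (`target`, `failure_type`), the convention that the string "host" selects the host OS while any other value is treated as a guest OS address, and the class and callback names are all hypothetical; the claim only states that the request is analyzed and passed to the failure occurrence module of one of the OSs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PseudoFailureRequest:
    target: str        # "host" or a guest OS address (hypothetical encoding)
    failure_type: str  # kind of pseudo-failure to generate (hypothetical)


class FirstFailureMonitoringControlUnit:
    """Single entry point that routes requests to host or guest failure modules."""

    def __init__(
        self,
        host_module: Callable[[str], None],
        guest_modules: Dict[str, Callable[[str], None]],
    ) -> None:
        self.host_module = host_module
        self.guest_modules = guest_modules

    def handle(self, request: PseudoFailureRequest) -> str:
        # "Analysis" here is just inspecting the target field.
        if request.target == "host":
            self.host_module(request.failure_type)
            return "host"
        guest_module = self.guest_modules.get(request.target)
        if guest_module is None:
            raise LookupError(f"no guest OS registered for {request.target}")
        guest_module(request.failure_type)
        return request.target


# Stand-in failure occurrence modules that only record what they were asked to do.
events: List[str] = []
unit = FirstFailureMonitoringControlUnit(
    host_module=lambda kind: events.append(f"host:{kind}"),
    guest_modules={"192.168.0.11": lambda kind: events.append(f"guest-11:{kind}")},
)
unit.handle(PseudoFailureRequest("host", "disk"))
unit.handle(PseudoFailureRequest("192.168.0.11", "process-stop"))
print(events)
```

Routing every request through one object mirrors the idea that the failure monitoring apparatus talks only to the host OS side, which then decides which module to invoke.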

4. The computer apparatus according to claim 3, further having a storage area allocated to each guest OS,
wherein the first failure monitoring control unit transmits communication address information notifying the communication address of the host OS as the destination address of failure information for notifying a failure detected in each guest OS, and receives failure information in which the communication address of the host OS is set as the destination address, and
wherein the computer apparatus further comprises one or more second failure monitoring control units, each of which receives the communication address information from the first failure monitoring control unit, stores the communication address of the host OS indicated in the received communication address information in the storage area allocated to the corresponding guest OS, and, when a failure is detected in the corresponding guest OS, generates failure information that notifies the detected failure and has the communication address of the host OS stored in the storage area as its destination address, and transmits the generated failure information to the first failure monitoring control unit.

5. The computer apparatus according to claim 3, wherein reception of pseudo-failure occurrence requests from the failure monitoring apparatus is consolidated in the first failure monitoring control unit.

6. The computer apparatus according to claim 3, wherein the first failure monitoring control unit, when it receives a pseudo-failure occurrence request that includes the communication address of a pseudo-failure target guest OS and that requests the occurrence of a pseudo-failure generated through cooperation between the host OS failure occurrence module and the virtual machine management mechanism, acquires the domain ID of the pseudo-failure target guest OS from the communication address of the pseudo-failure target guest OS, and notifies the virtual machine management mechanism of the acquired domain ID.

7. The computer apparatus according to claim 6, wherein the first failure monitoring control unit, when it receives a pseudo-failure occurrence request that includes the IP address of the pseudo-failure target guest OS and that requests the occurrence of a pseudo-failure of physical hardware assigned to the pseudo-failure target guest OS, acquires the domain ID of the pseudo-failure target guest OS from the IP address of the pseudo-failure target guest OS, and notifies the virtual machine management mechanism of the acquired domain ID.

8. The computer apparatus according to claim 3, wherein the host OS failure occurrence module generates at least one of a pseudo-failure of the host OS, a pseudo-failure of the virtual machine management mechanism, and a pseudo-failure of physical hardware not assigned to any guest OS.

9. The computer apparatus according to claim 3, wherein the first failure monitoring control unit, when it detects a pseudo-failure generated by the host OS failure occurrence module, generates failure information notifying the detected pseudo-failure, adds identification information of the host OS to the generated failure information, and transmits the failure information with the identification information of the host OS added to the failure monitoring apparatus.

10. The computer apparatus according to claim 3, wherein the failure occurrence module of each guest OS generates at least one of a pseudo-failure of the corresponding guest OS and a pseudo-failure of an application program operating on the corresponding guest OS.

11. The computer apparatus according to claim 4,
wherein each second failure monitoring control unit, when it detects a pseudo-failure generated by the failure occurrence module of the corresponding guest OS, generates failure information notifying the detected pseudo-failure and transmits the generated failure information to the first failure monitoring control unit, and
wherein the first failure monitoring control unit adds identification information of the host OS to the failure information received from the second failure monitoring control unit and transmits the failure information with the identification information of the host OS added to the failure monitoring apparatus.
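
The forwarding step recited here, in which the host side adds its own identification to failure information received from a guest before sending it on, might look like the sketch below. The class name `HostAggregator`, the dictionary layout of the failure information, the key `host_id`, and the in-memory `outbox` that stands in for transmission to the failure monitoring apparatus are assumptions made for this example only.

```python
from typing import Dict, List


class HostAggregator:
    """Stand-in for the first failure monitoring control unit's forwarding step."""

    def __init__(self, host_id: str) -> None:
        self.host_id = host_id
        # Hypothetical queue of messages bound for the failure monitoring apparatus.
        self.outbox: List[Dict[str, str]] = []

    def on_guest_failure(self, failure_info: Dict[str, str]) -> Dict[str, str]:
        # Copy first so the guest's original record is left untouched.
        tagged = dict(failure_info)
        tagged["host_id"] = self.host_id
        self.outbox.append(tagged)
        return tagged


aggregator = HostAggregator(host_id="host-A")
original = {"guest": "192.168.0.11", "detail": "process stopped"}
forwarded = aggregator.on_guest_failure(original)
print(forwarded)
```

Because every forwarded record carries the host identification, the receiving side can tell which physical computer a guest failure belongs to even when several guests report through the same host.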

12. The computer apparatus according to claim 1 or 4, wherein the destinations of failure information from the second failure monitoring control units are consolidated in the first failure monitoring control unit.

13. The computer apparatus according to claim 1 or 4, wherein the computer apparatus is connected to a failure monitoring apparatus that monitors failures, and
the first failure monitoring control unit adds identification information of the host OS to the failure information received from the second failure monitoring control unit and transmits the failure information with the identification information of the host OS added to the failure monitoring apparatus.

14. The computer apparatus according to claim 13, wherein the first failure monitoring control unit monitors, at regular intervals, at least one of the log output by the host OS, the log output by the virtual machine management mechanism, the process operating status of the host OS, the operating status of each guest OS, and the operating status of physical hardware, generates failure information notifying a failure in any of them, adds identification information of the host OS to the generated failure information, and transmits the failure information with the identification information of the host OS added to the failure monitoring apparatus.
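
One polling pass of the log monitoring described in this claim could be sketched as below. This is only an illustration under stated assumptions: the log is modeled as an in-memory list of lines, a failure is assumed to be any line containing the keyword "ERROR", and the function name `poll_log` and the record layout are hypothetical. The claim itself does not say how failures are recognized in a log.

```python
from typing import Dict, List, Tuple


def poll_log(
    lines: List[str],
    offset: int,
    host_id: str,
    keyword: str = "ERROR",
) -> Tuple[List[Dict[str, str]], int]:
    """Scan log lines added since `offset`; return failure records and the new offset."""
    failures = [
        {"host_id": host_id, "detail": line}
        for line in lines[offset:]
        if keyword in line
    ]
    return failures, len(lines)


# Two polling passes over a growing, hypothetical log.
log = ["INFO boot complete", "ERROR disk timeout"]
first, offset = poll_log(log, 0, "host-A")
log.append("INFO service started")
log.append("ERROR fan stopped")
second, offset = poll_log(log, offset, "host-A")
print(first)
print(second)
```

Keeping the offset between passes means each interval reports only failures that appeared since the previous check, so the same log line is not reported twice.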

15. The computer apparatus according to claim 13 or 14, wherein the first failure monitoring control unit transmits, at regular intervals, an operation notification notifying that the host OS is operating to the failure monitoring apparatus.

16. The computer apparatus according to claim 1 or 4, wherein each second failure monitoring control unit monitors, at regular intervals, at least one of the log output by the corresponding guest OS, the log output by an application program operating on the corresponding guest OS, the process operating status of the corresponding guest OS, and the process operating status of an application program operating on the corresponding guest OS, generates failure information notifying a failure in any of them, and transmits the generated failure information to the first failure monitoring control unit.

17. The computer apparatus according to claim 1 or 4, wherein each second failure monitoring control unit transmits, at regular intervals, an operation notification notifying that the corresponding guest OS is operating to the first failure monitoring control unit.