JP2019212046A - Control program, control method, and information processing device

Info

Publication number: JP2019212046A
Application number: JP2018108001A
Authority: JP
Inventors: 浩之小室; Hiroyuki Komuro
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-06-05
Filing date: 2018-06-05
Publication date: 2019-12-12

Abstract

To generate information useful for investigating the cause of an abnormality more efficiently.

SOLUTION: A control program causes a computer to execute processing for: upon detecting that log information transmitted by a monitoring target machine at specific transmission intervals cannot be received at reception intervals corresponding to the transmission intervals, identifying an event that has occurred in a system related to the monitoring target machine on the basis of information acquired from that system; and, in a case where any piece of log information has been received from the monitoring target machine, generating information that associates information indicating the identified event with the received log information.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a control program, a control method, and an information processing apparatus.

Distributed systems in which a plurality of virtual machines operate in cooperation are becoming increasingly common. In such a system, in order to maintain availability during operation, a function for monitoring the state of each virtual machine (hereinafter referred to as the “monitoring function”) is deployed to guard against an operation stop caused by unexpected trouble.

One method of monitoring the state of a virtual machine is for the virtual machine to transmit a presence notification to the monitoring function at regular intervals. In this method, when no presence notification is received for longer than a certain period, the monitoring function determines that an abnormality has occurred in the virtual machine (for example, that the virtual machine has gone down).

JP 2007-265215 A
JP 2015-503811 A
JP 2014-191491 A

When a virtual machine goes down, a VM (Virtual Machine) manager, a hypervisor, or the like immediately restarts it. In the restart, the original virtual machine is initialized (deleted) and a new virtual machine is generated. Consequently, the log information and other data output by the original virtual machine, which could be useful material for investigating why the presence notifications stopped arriving, are deleted as well. As a result, little information useful for investigating the cause of the abnormality remains, and the investigation becomes difficult.

Thus, in one aspect, an object of the present invention is to make the generation of information useful for investigating the cause of an abnormality more efficient.

In one aspect, a control program causes a computer to execute processing of: upon detecting that log information transmitted from a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, identifying an event that has occurred in a system related to the monitoring target machine on the basis of information acquired from the system; and, when any log information has been received from the monitoring target machine, generating information in which information indicating the identified event is associated with the received log information.

In one aspect, the generation of information useful for investigating the cause of an abnormality can be made more efficient.

FIG. 1 is a diagram illustrating a system configuration example according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a hardware configuration example of the management apparatus 10 according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating a functional configuration example of the management apparatus 10 according to the embodiment of the present invention.
FIG. 4 is a flowchart for explaining an example of a processing procedure executed by the management apparatus 10.
FIG. 5 is a diagram illustrating a state in which presence notifications are transmitted repeatedly.
FIG. 6 is a flowchart for explaining an example of the processing procedure of the event identification process.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a diagram illustrating a system configuration example according to the embodiment of the present invention. In FIG. 1, a management apparatus 10 and one or more server apparatuses 20 are connected via a network system N1. The network system N1 is a set of networks (links) connected by a plurality of routers R.

The server apparatus 20 is a physical computer on which one or more virtual machines (VMs) 30 run. The virtual machines 30 may, for example, constitute a distributed system for providing a predetermined service via a network such as the Internet. Because the server apparatus 20 is connected to the network system N1, each virtual machine 30 can also connect to the network system N1.

The management apparatus 10 is an information processing apparatus that manages the virtual machines 30. Management here refers to, for example, starting, stopping, and monitoring the state of the virtual machines 30.

FIG. 2 is a diagram illustrating a hardware configuration example of the management apparatus 10 according to the embodiment of the present invention. The management apparatus 10 in FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, and the like, which are connected to one another by a bus B.

A program that implements the processing in the management apparatus 10 is provided on a recording medium 101. When the recording medium 101 on which the program is recorded is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. However, the program does not necessarily have to be installed from the recording medium 101 and may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 executes the functions of the management apparatus 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.

Examples of the recording medium 101 include portable recording media such as a CD-ROM, a DVD, and a USB memory. Examples of the auxiliary storage device 102 include an HDD (Hard Disk Drive) and a flash memory. Both the recording medium 101 and the auxiliary storage device 102 correspond to computer-readable recording media.

FIG. 3 is a diagram illustrating a functional configuration example of the management apparatus 10 according to the embodiment of the present invention. In FIG. 3, the management apparatus 10 includes a parent system 11, a monitoring unit 12, and the like. The parent system 11 and the monitoring unit 12 are implemented by processing that one or more programs (control programs) installed in the management apparatus 10 cause the CPU 104 to execute. The management apparatus 10 also uses an information storage unit 13. The information storage unit 13 can be implemented using, for example, the auxiliary storage device 102 or a storage device that can be connected to the management apparatus 10 via a network.

The parent system 11 is software that manages the life cycle of the virtual machines 30. A VM manager and a hypervisor are examples of the parent system 11. For example, the parent system 11 executes control related to starting (generating), deleting, and otherwise handling the virtual machines 30.

The monitoring unit 12 monitors the state of the virtual machines 30, among other things. In FIG. 3, the monitoring unit 12 includes a presence notification acquisition unit 121, an event identification unit 122, and the like.

The presence notification acquisition unit 121 acquires (receives) the presence notifications that a monitored virtual machine 30 transmits at a plurality of timings (a specific transmission interval). The interval between the timings may be, for example, a fixed interval, but it does not have to be strictly constant. A presence notification is information for notifying the outside that the virtual machine 30 is operating normally.

In the present embodiment, the virtual machine 30 includes the log information it has output in each presence notification that it transmits. The log information contains, in chronological order, messages indicating events that have occurred in the virtual machine 30. These events include errors (abnormalities) and the like. For example, the system log of the OS (Operating System) of the virtual machine 30 may be used as the log information. The presence notification acquisition unit 121 stores the log information included in each received presence notification in the information storage unit 13. The log information included in a single presence notification does not have to be all of the log information the virtual machine 30 has output so far. For example, a single presence notification may include only a portion limited to a range that is useful for investigating a stoppage of presence notifications (for example, the most recent 50 lines or the most recent 1 Kbyte).
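As an illustration of how a virtual machine might attach only the tail of its log to each presence notification, the following is a minimal sketch and not the implementation described in this publication. The JSON payload, the field names `vm_id`, `sent_at`, and `log`, and the function names are hypothetical. The description offers the 50-line and 1-Kbyte limits as alternatives; applying both at once is an assumption made here for illustration.

```python
import json
import time

MAX_LINES = 50      # "most recent 50 lines" from the description
MAX_BYTES = 1024    # "most recent 1 Kbyte" from the description


def tail_log(lines, max_lines=MAX_LINES, max_bytes=MAX_BYTES):
    """Return the newest log lines that fit both limits, oldest first."""
    selected = []
    used = 0
    # Walk backwards from the newest line and stop once the byte budget is hit.
    for line in reversed(lines[-max_lines:]):
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > max_bytes:
            break
        selected.append(line)
        used += size
    selected.reverse()
    return selected


def build_presence_notification(vm_id, log_lines, now=None):
    """Serialize one presence notification that carries the log tail.

    The payload layout is hypothetical; the publication does not define one.
    """
    payload = {
        "vm_id": vm_id,
        "sent_at": time.time() if now is None else now,
        "log": tail_log(log_lines),
    }
    return json.dumps(payload)
```

Because every notification carries the most recent log lines, the receiver always holds a recent copy of the log even if the sender disappears before the next interval.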

When presence notifications can no longer be acquired (that is, when it is detected that a presence notification cannot be received at the reception interval corresponding to the transmission interval of the presence notifications), the event identification unit 122 acquires information from the systems related to the virtual machine 30 and, on the basis of the acquired information, identifies an event that has occurred in those systems. The event identification unit 122 generates information in which information indicating the event is associated with log information already received before the presence notifications (log information) stopped (for example, the log information last acquired by the presence notification acquisition unit 121), and stores the generated information in the information storage unit 13. In the present embodiment, the systems related to the virtual machine 30 are the network system N1 and the parent system 11.

The processing procedure executed by the management apparatus 10 is described below. FIG. 4 is a flowchart for explaining an example of the processing procedure executed by the management apparatus 10.

In step S101, the presence notification acquisition unit 121 waits to receive a presence notification from a virtual machine 30 to be monitored (hereinafter referred to as the “target machine”). When a presence notification is received (Yes in S101) before the waiting time exceeds (Δt + α) seconds (α ≥ 0) (No in S103), the presence notification acquisition unit 121 stores the log information included in the presence notification in the information storage unit 13 in association with identification information of the target machine that transmitted the presence notification (hereinafter referred to as the “virtual machine ID”) (S102). At this time, the previously received log information may be overwritten with the newly received log information. The IP address of the virtual machine 30 may be used as the virtual machine ID. The presence notification acquisition unit 121 then repeats step S101 and the subsequent steps.

Accordingly, steps S101 and S102 are repeated for as long as presence notifications are transmitted repeatedly from the target machine at intervals of (Δt + α) seconds or less (for example, at a fixed interval).

FIG. 5 is a diagram illustrating a state in which presence notifications are transmitted repeatedly. FIG. 5 shows an example in which presence notifications are transmitted from the target machine to the presence notification acquisition unit 121 at a fixed interval of Δt seconds.

On the other hand, when at some point the waiting time for a presence notification exceeds (Δt + α) seconds (that is, when no presence notification is received even though more than (Δt + α) seconds have elapsed since the previous presence notification was received) (Yes in S103), the event identification unit 122 executes the event identification process (S104). In the event identification process, an event that has occurred in the network system N1 or the parent system 11 is identified. Note that the parent system 11 going down can itself be a reason that presence notifications can no longer be acquired.
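The waiting logic of steps S101 to S103 can be sketched as follows. This is a simplified illustration under stated assumptions, not the publication's implementation: the class name `PresenceWatcher`, its methods, and the injectable `clock` parameter are hypothetical, and a polling style is used in place of a blocking wait.

```python
import time


class PresenceWatcher:
    """Tracks the last presence notification per machine (S101 to S103)."""

    def __init__(self, interval_s, margin_s=0.0, clock=time.monotonic):
        if margin_s < 0:
            raise ValueError("margin (alpha) must be >= 0")
        self.limit_s = interval_s + margin_s  # (delta-t + alpha) seconds
        self.clock = clock
        self.last_seen = {}  # virtual machine ID -> time of last notification
        self.last_log = {}   # virtual machine ID -> most recent log information

    def on_notification(self, vm_id, log):
        # S102: store the log under the virtual machine ID, overwriting the
        # previously received log information.
        self.last_seen[vm_id] = self.clock()
        self.last_log[vm_id] = log

    def overdue(self):
        # S103: machines whose waiting time has exceeded (delta-t + alpha).
        now = self.clock()
        return [vm for vm, seen in self.last_seen.items()
                if now - seen > self.limit_s]
```

A caller would run the event identification process (S104) for each machine returned by `overdue()`. In this sketch a machine that has never sent a notification is not tracked at all, which is a simplification.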

When the event identification process is completed, the event identification unit 122 notifies the parent system 11 that identification of the event has been completed (S105). In response to the notification, the parent system 11 deletes (initializes) the target machine and restarts (re-creates) it. In other words, the parent system 11 holds off restarting the target machine until it receives the notification from the event identification unit 122.
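The ordering constraint described above, in which evidence is collected first and the machine is re-created only afterwards, can be sketched as follows. The function and parameter names are hypothetical, and the two callables stand in for the event identification unit 122 and the parent system 11.

```python
def handle_silent_machine(vm_id, identify_event, recreate_vm):
    """Run event identification before the target machine is re-created.

    identify_event: callable standing in for the event identification
        process (S104); returns the event information.
    recreate_vm: callable standing in for the parent system's delete and
        restart, which is triggered by the completion notice (S105).
    """
    event_info = identify_event(vm_id)  # S104: collect evidence first
    recreate_vm(vm_id)                  # S105: only now allow the restart
    return event_info
```

How the restart should behave if identification itself fails is not covered by this sketch; a real deployment would need to decide whether to restart anyway in order to preserve availability.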

Next, step S104 is described in detail. FIG. 6 is a flowchart for explaining an example of the processing procedure of the event identification process.

In step S201, the event identification unit 122 acquires information indicating the communication state (hereinafter referred to as “communication information”) from each router R constituting the network system N1. For example, the event identification unit 122 issues a ping command addressed to each router R and to the target machine, and generates, as the communication information, information indicating whether each router R and the target machine responded. Such communication information can also be regarded as information indicating whether communication is possible on each network (link) constituting the communication path between the target machine and the management apparatus 10.
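One way to produce such communication information is sketched below. It is an illustrative example only: it assumes a Linux-style `ping` command (iputils), where `-c` is the packet count and `-W` is the reply timeout in seconds, and the function names and the dictionary layout are hypothetical.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Return True if `host` answers one ICMP echo request.

    Assumes a Linux-style ping (iputils): -c count, -W timeout in seconds.
    Raises OSError if the ping command cannot be started at all.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def collect_communication_info(router_addrs, target_addr, probe=ping_once):
    """S201: map each router and the target machine to responded or not.

    Returns None when the information cannot be obtained at all, which
    corresponds to "No" in S202.
    """
    try:
        return {addr: probe(addr) for addr in [*router_addrs, target_addr]}
    except OSError:
        return None
```

Keeping "no response" (a `False` entry) separate from "could not probe" (`None`) mirrors the distinction the flowchart draws between S205 and S203.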

If the communication information could not be acquired (or generated) for some reason (No in S202), the event identification unit 122 generates, as event information, information indicating that acquisition of the communication information failed (S203), and proceeds to step S212. That is, in this case, the event that the communication information cannot be acquired is identified.

On the other hand, if the communication information was acquired (Yes in S202), the event identification unit 122 determines, on the basis of the communication information, whether the communication state between the virtual machine 30 and the management apparatus 10 is normal (S204). For example, if the communication information indicates that any router R or the target machine did not respond, the event identification unit 122 determines that the communication state is abnormal (No in S204), uses the communication information as the event information (S205), and proceeds to step S212. That is, in this case, the event that the communication state is abnormal is identified.

On the other hand, if the communication information indicates that all routers R and the target machine responded (Yes in S204), the event identification unit 122 acquires operation information of the target machine from the parent system 11 (S206). The operation information is information indicating the operating state of the target machine. It indicates whether the target machine is operating (or is down) and, if the target machine is operating, includes, for example, the load (usage status) of the various resources of the target machine. The information indicating the load of the various resources is, for example, information indicating the usage rates of the CPU and the memory of the target machine.

If the operation information could not be acquired for some reason (No in S207), the event identification unit 122 generates, as event information, information indicating that acquisition of the operation information failed (S208), and proceeds to step S212. That is, in this case, the event that the operation information cannot be acquired is identified.

On the other hand, if the operation information was acquired (Yes in S207), the event identification unit 122 determines, on the basis of the operation information, whether the operating state of the target machine is normal (S209). If the operation information indicates that the target machine is not operating (is down), or that the target machine is operating but the load of one of its resources is abnormal, the event identification unit 122 determines that the operating state is abnormal. In this case (No in S209), the event identification unit 122 uses the operation information as the event information (S210) and proceeds to step S212. That is, in this case, the event that the operating state of the target machine is abnormal is identified. Whether the load of a resource is normal or abnormal may be determined, for example, by comparing the load of each resource with a threshold set in advance for that load.
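The threshold comparison mentioned for S209 can be sketched as follows. The threshold values, the dictionary keys `running` and `loads`, and the function name are hypothetical; the publication only states that each load may be compared with a preset threshold.

```python
# Hypothetical thresholds: usage rates above these values count as abnormal.
DEFAULT_THRESHOLDS = {"cpu": 0.90, "memory": 0.90}


def operating_state_is_normal(operation_info, thresholds=DEFAULT_THRESHOLDS):
    """S209: False if the machine is down or any load exceeds its threshold."""
    if not operation_info.get("running", False):
        return False  # the target machine is down
    loads = operation_info.get("loads", {})
    # Assumption: a load that is not reported is treated as 0.0 (normal).
    return all(loads.get(name, 0.0) <= limit
               for name, limit in thresholds.items())
```

For example, `operating_state_is_normal({"running": True, "loads": {"cpu": 0.95, "memory": 0.5}})` evaluates to `False` because the CPU usage rate exceeds its threshold.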

On the other hand, if the operation information indicates that the target machine is operating and that the load of every resource is normal, the event identification unit 122 determines that the operating state is normal. In this case (Yes in S209), a contradictory situation has arisen: no presence notification is received even though there is no abnormality in the network system N1 and no abnormality in the operating state of the target machine. The event identification unit 122 therefore generates, as event information, information indicating that both the communication with the target machine and the operating state of the target machine are normal (S211), and proceeds to step S212.
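Taken together, steps S201 to S211 form a short decision cascade, which the following sketch illustrates. It is not the publication's implementation: the function name, the `kind` labels, and the convention that a callable returns `None` when information cannot be obtained are all assumptions made for illustration.

```python
def identify_event(get_communication_info, get_operation_info, load_is_normal):
    """Walk the decision flow of FIG. 6 and return event information.

    get_communication_info: returns {address: responded} or None (S201).
    get_operation_info: returns a dict with a "running" flag or None (S206).
    load_is_normal: predicate over the operation information (part of S209).
    """
    comm = get_communication_info()                       # S201
    if comm is None:                                      # S202: No
        return {"kind": "communication_info_unavailable"}  # S203
    if not all(comm.values()):                            # S204: No
        return {"kind": "communication_abnormal", "detail": comm}  # S205

    op = get_operation_info()                             # S206
    if op is None:                                        # S207: No
        return {"kind": "operation_info_unavailable"}      # S208
    if not op.get("running", False) or not load_is_normal(op):  # S209: No
        return {"kind": "operation_abnormal", "detail": op}      # S210

    # S211: everything looks normal, yet notifications stopped arriving.
    return {"kind": "all_normal",
            "detail": {"communication": comm, "operation": op}}
```

Each branch returns exactly one piece of event information, which is what step S212 then associates with the stored log information.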

In step S212, the event identification unit 122 generates information in which the event information generated or otherwise obtained in step S203, S205, S208, S210, or S211 is associated with the log information stored in the information storage unit 13 in association with the virtual machine ID of the target machine, and stores the generated information in the information storage unit 13 (S212). If there are multiple pieces of corresponding log information, the event information may be associated with the latest one. The method of associating the log information with the event information is not limited to any specific method. For example, the two may be associated by assigning the same identifier to both the log information and the event information, or by storing the file containing the log information and the file containing the event information in the same folder (or directory).
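Both association methods mentioned above, a shared identifier and a shared folder, can be combined in a small sketch such as the following. The directory layout, the file names `last.log` and `event.json`, and the function name are hypothetical. Using the virtual machine ID directly as a directory name is also an assumption, and it may need escaping for IDs such as IPv6 addresses.

```python
import json
import uuid
from pathlib import Path


def store_association(base_dir, vm_id, log_text, event_info):
    """S212: store the last log and the event information side by side."""
    record_id = uuid.uuid4().hex  # shared identifier for both files
    folder = Path(base_dir) / vm_id / record_id
    folder.mkdir(parents=True, exist_ok=True)
    # The log file and the event file live in the same folder.
    (folder / "last.log").write_text(log_text, encoding="utf-8")
    (folder / "event.json").write_text(
        json.dumps({"record_id": record_id, "vm_id": vm_id,
                    "event": event_info}),
        encoding="utf-8",
    )
    return folder
```

An investigator can then open a single folder and find both the final log output of the machine and the state of the surrounding systems at the time the notifications stopped.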

As described above, according to the present embodiment, when presence notifications from a virtual machine 30 are no longer received, information is generated in which the log information last acquired from the virtual machine 30 is associated with the event information acquired from the related systems immediately after the presence notifications stopped, and this information is stored in the information storage unit 13. The generation of information useful for investigating the cause of an abnormality can therefore be made more efficient. As a result, a person investigating why the presence notifications stopped can determine the cause efficiently on the basis of the log information and the event information. Specifically, the investigator can use the event information to distinguish between an abnormality in the network and an abnormality in the virtual machine 30. In the case of an abnormality in the virtual machine 30, the log information last acquired from the virtual machine 30 may contain information indicating a sign that the abnormality was about to occur. The investigator can therefore refer to the log information to investigate why the abnormality occurred in the virtual machine 30, which reduces the workload of identifying the cause.

Further, the log information for the most recent fixed period or fixed amount is transmitted repeatedly from the virtual machine 30 together with the presence notifications. Therefore, even if the virtual machine 30 goes down (stops), a situation in which no log information at all can be obtained is avoided.

Further, by overwriting the log information stored in the information storage unit 13 with the latest log information, the last log information can be identified easily and the consumption of storage capacity in the information storage unit 13 can be reduced.

The present embodiment may also be applied to a case where a physical machine is the monitoring target. In other words, the monitored machine does not have to be limited to a virtual machine 30.

In the present embodiment, the target machine is an example of a monitoring target machine. The management apparatus 10 is an example of an information processing apparatus. The presence notification acquisition unit 121 is an example of an acquisition unit. The event identification unit 122 is an example of an identification unit and a generation unit.

Although an embodiment of the present invention has been described in detail above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

Regarding the above description, the following items are further disclosed.
(Appendix 1)
When it is detected that log information transmitted from a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, the system is based on information acquired from a system related to the monitoring target machine. Identify the events that occurred in
If any log information has been received from the monitored machine, information indicating the identified event is generated in association with the received log information.
A control program for causing a computer to execute processing.
(Appendix 2)
Generating information corresponding to the log information received last from the monitored machine, among the received log information, the information indicating the identified event;
The control program according to supplementary note 1, wherein
(Appendix 3)
The information indicating the event includes information indicating a communication state of a network to which the monitored machine is connected, or information indicating an operating state of the monitored machine.
The control program according to appendix 1 or 2, characterized in that:
(Appendix 4)
When it is detected that log information transmitted from a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, the system is based on information acquired from a system related to the monitoring target machine. Identify the events that occurred in
If any log information has been received from the monitored machine, information indicating the identified event is generated in association with the received log information.
A control method characterized in that a computer executes a process.
(Appendix 5)
The control method according to appendix 4, wherein the generating associates the information indicating the identified event with the log information received last from the monitoring target machine among the received log information.
(Appendix 6)
The control method according to appendix 4 or 5, wherein the information indicating the event includes information indicating a communication state of a network to which the monitoring target machine is connected, or information indicating an operating state of the monitoring target machine.
(Appendix 7)
An information processing apparatus comprising:
an identification unit that, upon detecting that log information transmitted by a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, identifies an event that has occurred in a system related to the monitoring target machine, based on information acquired from that system; and
a generation unit that, when any log information has been received from the monitoring target machine, generates information in which information indicating the identified event is associated with the received log information.
(Appendix 8)
The information processing apparatus according to appendix 7, wherein the generation unit associates the information indicating the identified event with the log information received last from the monitoring target machine among the received log information.
(Appendix 9)
The information processing apparatus according to appendix 7 or 8, wherein the information indicating the event includes information indicating a communication state of a network to which the monitoring target machine is connected, or information indicating an operating state of the monitoring target machine.
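
The process recited in the appendices above can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication. The names `LogEntry`, `AnnotatedLog`, `is_overdue`, `identify_event`, and `annotate_on_timeout`, the event labels, and the two boolean status inputs are all hypothetical and chosen only for illustration. The sketch assumes that the network communication state and the machine operating state have already been acquired from the related system, and that the reception interval equals the transmission interval plus an optional tolerance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    machine_id: str
    received_at: float  # reception time in seconds (receiver clock)
    body: str


@dataclass
class AnnotatedLog:
    log: LogEntry  # the log information the event is associated with
    event: str     # label of the identified event (hypothetical values)


def is_overdue(last_received_at: float, now: float,
               interval: float, tolerance: float = 0.0) -> bool:
    """Return True when no log arrived within the expected reception interval."""
    return now - last_received_at > interval + tolerance


def identify_event(network_reachable: bool, machine_running: bool) -> str:
    """Map status acquired from the related system to an event label.

    The labels are assumptions; the source only says the event information
    includes a network communication state or a machine operating state.
    """
    if not network_reachable:
        return "network_unreachable"
    if not machine_running:
        return "machine_stopped"
    return "unknown"


def annotate_on_timeout(logs: List[LogEntry], now: float, interval: float,
                        network_reachable: bool,
                        machine_running: bool) -> Optional[AnnotatedLog]:
    """Associate the identified event with the last received log, if any."""
    if not logs:
        # Nothing has been received yet, so there is no log to annotate.
        return None
    last = max(logs, key=lambda entry: entry.received_at)
    if not is_overdue(last.received_at, now, interval):
        # Logs are still arriving on schedule; no event needs identifying.
        return None
    event = identify_event(network_reachable, machine_running)
    return AnnotatedLog(log=last, event=event)
```

Attaching the event to the last received log, as in appendices 2, 5, and 8, places the cause next to the final record an investigator would normally inspect when a machine falls silent.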

Description of Symbols

10 Management apparatus
11 Parent system
12 Monitoring unit
13 Information storage unit
20 Server apparatus
30 Virtual machine
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
121 Presence notification acquisition unit
122 Event identification unit
B Bus
N1 Network system
R Router

Claims

A control program that causes a computer to execute a process comprising:
upon detecting that log information transmitted by a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, identifying an event that has occurred in a system related to the monitoring target machine, based on information acquired from that system; and
when any log information has been received from the monitoring target machine, generating information in which information indicating the identified event is associated with the received log information.

The control program according to claim 1, wherein the generating associates the information indicating the identified event with the log information received last from the monitoring target machine among the received log information.

The control program according to claim 1 or 2, wherein the information indicating the event includes information indicating a communication state of a network to which the monitoring target machine is connected, or information indicating an operating state of the monitoring target machine.

A control method in which a computer executes a process comprising:
upon detecting that log information transmitted by a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, identifying an event that has occurred in a system related to the monitoring target machine, based on information acquired from that system; and
when any log information has been received from the monitoring target machine, generating information in which information indicating the identified event is associated with the received log information.

An information processing apparatus comprising:
an identification unit that, upon detecting that log information transmitted by a monitoring target machine at a specific transmission interval cannot be received at a reception interval corresponding to the transmission interval, identifies an event that has occurred in a system related to the monitoring target machine, based on information acquired from that system; and
a generation unit that, when any log information has been received from the monitoring target machine, generates information in which information indicating the identified event is associated with the received log information.