JP5042323B2 - Distributed resource monitoring system, method and apparatus

Info

Publication number: JP5042323B2
Application number: JP2010031012A
Authority: JP
Inventors: 啓生宮本; 雄次對馬; 秀貴青木; 輝米川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-02-16
Filing date: 2010-02-16
Publication date: 2012-10-03
Anticipated expiration: 2030-02-16
Also published as: JP2011170411A

Description

The present invention relates to technology for monitoring the performance and faults of information systems applied to social infrastructure, and in particular of systems that use wide-area distributed resources.

Services that use information and communication technology (ICT), such as electronic money, railway control, and power control, have become widespread and are now part of the social infrastructure. ICT that supports social infrastructure must respond with high reliability and low latency to ever-increasing data volumes and processing requests. In addition, recent social and global demands require ICT to improve its energy efficiency. Furthermore, because these systems support social infrastructure, monitoring their operating status (including faults) and performance is indispensable.

There are three conventional techniques for monitoring the operating status (including faults) and performance of conventional ICT systems, as described below. These are monitoring technologies aimed mainly at data centers and enterprise information systems.

In the first conventional technique, the information system to be monitored is used virtually through the same access means as a general user. This enables fault monitoring, which checks whether the information system is operating normally, and performance monitoring based on response time.

In the second conventional technique, resource usage is monitored for the individual servers, routers, and storage devices that make up the information system to be monitored. This method enables performance and fault monitoring of the information system by statistically collecting performance-related data from each device.

In the third conventional technique, a stress-test program that simulates a large number of users is used to perform the virtual use. This makes it possible to monitor the load tolerance of the information system.

Prior art documents of this kind include, for example, Patent Documents 1, 2, and 3.

Patent Document 1: JP 2005-506605 A
Patent Document 2: JP 2003-283565 A
Patent Document 3: JP 2000-315198 A

In the first conventional technique, monitoring is based on virtual use from the outside, so detailed performance information about the internal operation of the information system cannot be obtained. Moreover, when a failure occurs, it is impossible to determine in which part of the information system it occurred. In a wide-area distributed, hierarchical information system, requests that depend on the location of the request source would have to be generated, which is difficult to realize. Finally, because independence from the actual service is not maintained, monitoring that involves failures would force the actual service to be stopped, given the impact on society.

In the second conventional technique, detailed statistical performance information can be acquired for each device, which allows the failure location to be identified and the behavior to be estimated to some extent. However, the association between a specific processing request and the behavior of the individual devices is unknown, so the result never goes beyond estimation.

The third conventional technique only determines the load tolerance of the information system and cannot be used for fault monitoring. Moreover, in an information processing system that supports social infrastructure, a load-tolerance check could paralyze that infrastructure, so it is very likely that the check cannot be carried out.

An object of the present invention is to provide a distributed resource monitoring system that measures the operating status and performance of distributed information and communication devices without affecting the actual business operations running on them.

Another object of the present invention is to provide a performance and fault monitoring method and apparatus that, without modifying the applications or data that implement the service, obtain detailed performance information on a wide-area distributed information system and make it possible to identify the exact location when a failure occurs.

To achieve the above object, the present invention provides a distributed resource monitoring system that comprises a management server and distributed resources made up of a plurality of processing nodes, the distributed resources performing hierarchical information processing via a network to provide a desired service. Each of the plurality of processing nodes includes a production application that performs the information processing and a pseudo application that simulates the operation of the production application. When a processing node receives a pseudo packet corresponding to the service, the pseudo application adds order information to the pseudo packet and transmits it to another processing node. The management server acquires the operation information recorded when each pseudo application of the plurality of processing nodes transmitted or received the pseudo packet, and estimates the fault location on the network by evaluating the order information of the acquired pseudo packet together with the operation information.

To achieve the above object, the present invention also provides a distributed resource monitoring method that estimates the fault location in a system in which distributed resources comprising a plurality of processing nodes perform hierarchical information processing via a network to provide a service. In this method, a production application that performs the information processing and a pseudo application that simulates the operation of the production application are set up on the plurality of processing nodes. A processing node receives a pseudo packet corresponding to the service provided via the network; after the pseudo application performs its simulated operation, it attaches order information to the pseudo packet and transmits it on to the other processing nodes in sequence. A management manager acquires the operation information recorded when each pseudo application of the plurality of processing nodes transmitted or received the pseudo packet, and estimates the fault location on the network by evaluating the operation information on the basis of the order information of the acquired pseudo packet.

To achieve the above object, the present invention further provides a distributed resource monitoring apparatus for a system in which a plurality of processing nodes perform hierarchical information processing via a network to provide a desired service. Each of the plurality of processing nodes includes a production application that performs the information processing and a pseudo application that simulates the operation of the production application. The apparatus collects the order information and operation information produced when the processing nodes receive a pseudo packet corresponding to the service and the pseudo application adds order information to the pseudo packet and transmits it to another processing node, and it estimates the fault location on the basis of the order information and operation information of the acquired pseudo packet.

That is, to achieve the above object, the present invention places a pseudo application on the same resources as the production application within wide-area distributed processing nodes that perform hierarchical information processing, and uses this pseudo application to simulate the flow of processing. This makes it possible to identify a problematic processing path and to identify the fault location from detailed performance information.

According to the present invention, it is possible to provide a distributed resource monitoring system that measures the operating status and performance of distributed information and communication devices without affecting the actual business operations running on them. It is also possible to provide a performance and fault monitoring method and apparatus that obtain detailed performance information on a wide-area distributed information system and identify the failure location, without modifying the applications or data that implement the service.

FIG. 1 is a diagram showing a configuration example of the performance and fault monitoring system according to the first embodiment.
FIG. 2 is a diagram showing an example of an information processing system to which the monitoring system according to the first embodiment is applied.
FIG. 3 is a diagram schematically showing the functional operation of the performance and fault monitoring system according to the first embodiment.
FIG. 4 is a flowchart showing the processing flow of the performance and fault monitoring system according to the first embodiment.
FIG. 5 is a diagram showing a configuration example of the pseudo AP of a processing server according to the first embodiment.
FIG. 6 is a diagram showing a configuration example of the management manager of the management server according to the first embodiment.
FIG. 7 is a diagram showing a configuration example of a packet used in the performance and fault monitoring system according to the first embodiment.
FIG. 8 is a diagram showing the detailed flow of processing when a pseudo application (AP) receives a pseudo packet according to the first embodiment.
FIG. 9 is a diagram explaining the flow of pseudo packets according to the first embodiment.
FIG. 10 is a diagram showing the processing flow of the management manager 141 according to the first embodiment.
FIG. 11 is a diagram showing another configuration example of an actual information processing system to which the performance and fault monitoring system according to the first embodiment is applied.
The remaining drawings, all according to the first embodiment, are as follows.
A diagram showing a table of variations of pseudo AP operation.
A diagram showing a table of variations of pseudo AP residence time.
A sequence diagram explaining the overall processing flow of the information processing system.
A further sequence diagram explaining the overall processing flow of the information processing system.
A diagram showing an example of a specific configuration of the information processing system.
A diagram showing example tables of the various kinds of operation information stored in the operation information storage unit.

Embodiments of the present invention are described below with reference to the drawings. In this specification, hierarchical information processing refers to processing in which a plurality of applications (APs) together realize a desired service in the real world. Preferably, these APs run on a plurality of processing nodes connected to a network, and in this specification they are referred to specifically as production APs. In each service, the production APs in the processing nodes cooperate to process information acquired in the real world, and the processing results are fed back to the real world.

A pseudo AP on the same resources (chassis, blade, central processing unit, memory, and so on) simulates the operation of the production AP and identifies the path that the service uses. The pseudo AP also adds the time of data transmission and reception to the packet sent to the next processing node, and stores the operation information from around that time in the processing node. The management manager in the management server collects the operation information stored in each processing node and correlates it, which makes it possible to trace the flow of a service's series of information processing and to identify the problem location when a failure occurs.

The first embodiment is a performance and fault monitoring system that places a pseudo AP on the same resources as the production AP within a plurality of wide-area distributed processing nodes that perform hierarchical information processing. By simulating the flow of processing, it identifies the problematic processing path and identifies the fault location from detailed performance information.

FIG. 1 shows an overview of the performance and fault monitoring system of the first embodiment. In the figure, reference numerals 11, 12, and 13 denote processing nodes distributed over a wide area, 14 denotes a management server, and 15, 16, and 17 denote networks such as a public network or an intranet. Various services are provided by production APs, which are a plurality of information processing APs, cooperating across the processing nodes 11, 12, and 13. At each of the processing nodes 11, 12, and 13, the destination of the processing result is determined according to the result of that node's information processing.

The processing node 11 consists of a production AP 111, a pseudo AP 112, an operating system (OS) or AP execution platform 113, and hardware 114. The hardware 114 consists of a memory (MM) 115 serving as a storage unit, an input/output unit (I/O) 116, a central processing unit (CPU) 117 serving as a processing unit, a hard disk drive (HDD) 118 serving as a storage unit, and a network interface (I/F) 119. The other processing nodes 12 and 13 and the management server 14 have the same hardware configuration, an example of which is shown later. A management manager 141 runs on the management server 14.

In the performance and fault monitoring system configuration of FIG. 1, the production APs in the processing nodes 11, 12, and 13 cooperate in the actual service to process information acquired in the real world, and feed the processing results back to the real world. A pseudo AP that simulates the operation of the production AP also runs on each of the processing nodes 11, 12, and 13. Taking the processing node 11 as an example, the pseudo AP 112, which runs on the same resources (the hardware 114), simulates the operation of the production AP 111 and identifies the path that the service uses. In other words, the pseudo AP 112 in this embodiment uses the same resources as the production AP 111 that performs the information processing, and provides a function that aggregates operation information while simulating the operation of the production AP 111.

The pseudo AP 112 also adds the time of data transmission and reception to the packet sent to the next processing node, and stores the operation information from around that time in the processing node 11. The management manager 141 in the management server 14 collects all the operation information stored in the processing nodes 11, 12, and 13 and correlates it, which makes it possible to trace the flow of a service's series of information processing and to identify the problem location when a failure occurs.
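The idea of keeping operation information "around" a packet timestamp can be illustrated with a small sketch. The text does not say how wide the time window is or what form the operation information takes, so the function name `samples_around`, the `window` parameter, and the `(timestamp, value)` sample layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# One operation-information sample: (timestamp, metric value).
# The layout is hypothetical; the source only says operation information
# around the send/receive time is stored in the processing node.
Sample = Tuple[float, float]


def samples_around(samples: List[Sample], t: float, window: float) -> List[Sample]:
    """Return the samples whose timestamp lies in [t - window, t + window].

    `samples` must be sorted by timestamp. `window` is an assumed tuning
    parameter, not a value given in the source.
    """
    times = [ts for ts, _ in samples]
    lo = bisect_left(times, t - window)
    hi = bisect_right(times, t + window)
    return samples[lo:hi]
```

A pseudo AP could call such a helper with the send or receive time it stamped into the packet, so that the management manager later receives only the samples relevant to that packet.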

A concrete example of a service provided by the production APs in this embodiment is described here. Examples of such services include public video surveillance and security services operated by companies. In such a surveillance service, the central monitoring room watches video from each site, but a human observer can watch video from only a few locations at most. Hierarchical monitoring processing is therefore used to detect automatically a monitored subject who exhibits problem behavior and to display an alert on the display in the central monitoring room. The role of the management server 14 in this case is to detect failures of the processing nodes 11, 12, and 13 and of the networks 15, 16, and 17 in advance and to take measures so that the service does not stop.

In this system configuration, the hierarchical processing of surveillance video means that video from each site does not always have to be sent to the central monitoring room. When a subject exhibiting problem behavior is detected, however, the relevant video must be sent to the central monitoring room with priority and shown on the display there. In that case, when the video data acquired at the processing node 11 is sent to the central monitoring room via several processing nodes such as 12 and 13, the system needs to secure network bandwidth, switch the priority of the transmitted data, and arbitrate network paths with video data sent from other sites.

The hierarchical processing function of this embodiment is now described in more detail using the surveillance service above as an example. In a conventional system, all camera video from the processing node 11 was sent to the central monitoring room. In the performance and fault monitoring system configuration of this embodiment, processing is carried out at intermediate, distributed processing nodes, so only the video data that is actually needed is sent to the central monitoring room. As a result, network bandwidth is no longer wasted, and more sites can be monitored with the same network bandwidth as before.

An example of the processing flow of this system when it provides the surveillance service is as follows.

(1) The processing node 11 is equipped with, for example, a web camera and a human presence sensor, and with an AP that transmits the presence sensor information and the camera video data as packets.

(2) The processing node 12 is equipped with a detection AP that detects problem behavior of a subject from the camera video data.

(3) The processing node 11 detects that a person has approached the sensor and starts transmitting camera video data to the processing node 12.

(4) The problem behavior detection AP of the processing node 12 processes the received video and transmits video judged to show problem behavior to the central monitoring room. If no problem behavior is found, the video is terminated at the processing node 12 and is not sent to the central monitoring room. This hierarchical processing avoids wasting network bandwidth and makes it possible to monitor more sites with the same network bandwidth as before. An actual information processing system to which this embodiment is applied can take various configurations.
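Steps (1) to (4) amount to a two-stage filter, which the following sketch tries to make concrete. It is not the patent's implementation: the `Frame` type, the function names, and the `is_problem` predicate standing in for the detection AP are hypothetical, and real video would be streamed rather than held in lists.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    camera_id: str
    person_present: bool  # reading of the human presence sensor at node 11
    payload: bytes        # stand-in for camera video data


def node11_send(frames: List[Frame]) -> List[Frame]:
    """Node 11: transmit video only while the presence sensor detects a person."""
    return [f for f in frames if f.person_present]


def node12_forward(frames: List[Frame],
                   is_problem: Callable[[Frame], bool]) -> List[Frame]:
    """Node 12: forward frames judged to show problem behavior.

    Frames that are not forwarded are terminated here and never reach
    the central monitoring room.
    """
    return [f for f in frames if is_problem(f)]
```

In this sketch the central monitoring room receives `node12_forward(node11_send(frames), detector)`, so the traffic on each later network segment can only shrink, which is the bandwidth saving the text describes.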

FIG. 2 shows a configuration example of an actual information processing system to which the performance and fault monitoring system of this embodiment is applied. In the figure, reference numerals 21 to 28 denote processing nodes, which correspond to the processing nodes 11, 12, and 13 of FIG. 1. Reference numeral 29 denotes a network such as a public network or an intranet. Some of the processing nodes 21 to 28, such as the processing nodes 21, 22, 23, and 24, receive sensing information and video data from, and transmit control information to, the sensors 31 and 32, the actuators 33 and 34, web cameras, and other devices that function as communication terminals in the real world 30, as described above.

FIG. 11 shows another configuration example of an actual information processing system to which the performance and fault monitoring system of this embodiment is applied. In the figure, of the processing nodes 1101, 1102, 1103, and 1104, the processing nodes 1101 and 1102 are intelligent nodes capable of information processing, and the processing nodes 1103 and 1104 are servers capable of information processing, such as those in a data center, connected to a wide area network 1105. Various communication terminals 1109, such as sensors and cameras, are connected to the local networks 1107 and 1108 to which the processing nodes 1101 and 1102 are connected. An edge node 1110 with an information filtering function is also connected to the local network 1107; through this edge node 1110, sensing information is collected from, and control information is delivered to, the communication terminals, namely the sensors 1111 and 1112 and the actuator 1113. In such an information processing system, the edge node 1110 and the processing nodes 1101 to 1104 correspond to the processing nodes 11, 12, and 13 in FIG. 1, and the management server 1106 corresponds to the management server 14.

Next, with reference to FIG. 3, which schematically shows the functional operation of the performance and fault monitoring system of this embodiment, the performance and fault monitoring method for an information processing system that provides a service through cooperative information processing by the processing nodes 11, 12, and 13 is described in detail.

FIG. 3 shows the system configuration, described with FIG. 1, in which a pseudo AP is placed on the same resources as the production AP within the distributed processing nodes 11, 12, and 13 to simulate the flow of processing. Each of the processing nodes 11, 12, and 13 has the same hardware configuration as the processing node 11 shown in FIG. 1, but FIG. 3 shows only the main elements in simplified form. The management server 14 is also omitted from the figure.

In FIG. 3, a production packet is input from a real-world terminal at address D (Addr = D), and as a result of the information processing by the production APs 111, 121, and 131 at the processing nodes 11, 12, and 13, a production packet is sent back to the real-world terminal (Addr = D). Similarly, a pseudo packet is input from the terminal (Addr = D), and based on the result of the simulated processing by the pseudo APs 112, 122, and 132 at the processing nodes 11, 12, and 13, a pseudo packet is sent to the terminal (Addr = D).

As shown in FIG. 3, the destination address B of the production AP 111 is obtained, together with its residence time (0.4), from the user or administrator who owns the production AP 111 and is implemented in the pseudo AP 112 in the processing node 11. Similarly, the destination addresses C and D of the production APs 121 and 131, which perform the information processing of the processing nodes 12 and 13, are set in the pseudo APs 122 and 132 together with their respective residence times (0.1) and (0.2). Instead of being obtained from the user, data such as the destination address and residence time may be estimated at each processing server by monitoring the production AP for a certain period, as described later.
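The per-node settings of FIG. 3 can be written down as a small configuration table, as in the sketch below. The `PseudoApConfig` type and the node keys are hypothetical names, and the residence times are copied as given, without assuming a unit. The estimator is only one plausible way to derive settings from monitoring: the source says estimation is possible but does not specify the method, so taking the most frequent destination and the mean residence time is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PseudoApConfig:
    destination: str       # address the pseudo AP forwards to
    residence_time: float  # time the pseudo packet stays in the node


# Values as stated for FIG. 3; the dictionary keys are illustrative labels.
CONFIGS: Dict[str, PseudoApConfig] = {
    "node11": PseudoApConfig("B", 0.4),
    "node12": PseudoApConfig("C", 0.1),
    "node13": PseudoApConfig("D", 0.2),
}


def estimate_config(observations: List[Tuple[str, float]]) -> PseudoApConfig:
    """Estimate settings from monitored (destination, residence_time) pairs.

    Assumed rule: pick the most frequent destination and average the
    residence times observed for it.
    """
    if not observations:
        raise ValueError("no observations to estimate from")
    destination = Counter(d for d, _ in observations).most_common(1)[0][0]
    residence = mean(t for d, t in observations if d == destination)
    return PseudoApConfig(destination, residence)
```

Keeping the configuration separate from the pseudo AP logic means the same pseudo AP code could be deployed on every node, with only the table entry differing.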

FIG. 4 is a schematic flowchart of the overall processing sequence of this embodiment. When the performance and fault monitoring flow starts, the management manager 141 first sets up a pseudo AP on each of the processing nodes 11, 12, and 13 (step 41; below, the word "step" is omitted inside parentheses). Next, the relevant client terminal in the real world transmits a pseudo packet, described later (42). When sending or receiving the pseudo packet, each processing node adds to it information-processing order information that identifies the hierarchical information processing performed by the pseudo APs (43).

The pseudo AP acquires the operation information of the processing node at the time the pseudo packet is sent or received (44). The processing node that received the pseudo packet stores the pseudo packet (45). The management manager 141 acquires the operation information recorded when each pseudo AP sent or received the pseudo packet (46). The management manager 141 acquires the operation information of the processing nodes on the basis of the information-processing order information of the pseudo packet received by the client terminal (47). The management manager 141 then evaluates the acquired operation information and estimates the fault location on the network (48).
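The evaluation in step 48 is not spelled out at this point in the text, so the following is only one possible reading, offered as a sketch. It assumes the management manager knows an expected residence time per node and flags a node either when its measured residence time exceeds the expectation by a chosen factor or when no report arrived at all. The function name, the `factor` threshold, and the reason strings are all hypothetical.

```python
from typing import Dict, List, Tuple


def suspect_nodes(expected: Dict[str, float],
                  measured: Dict[str, float],
                  factor: float = 2.0) -> List[Tuple[str, str]]:
    """Return (node, reason) pairs for nodes that look faulty.

    expected: configured residence time per node.
    measured: residence time reported per node; a node that is missing
              never reported, for example because the pseudo packet
              did not reach it.
    factor:   assumed tolerance; a node is "slow" when its measured time
              is more than factor times the expected time.
    """
    suspects = []
    for node, expected_time in expected.items():
        if node not in measured:
            suspects.append((node, "no report"))
        elif measured[node] > expected_time * factor:
            suspects.append((node, "slow"))
    return suspects
```

With the FIG. 3 values as expectations, a report of 0.9 at the second node and no report from the third would flag those two nodes and leave the first unflagged.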

本実施例の性能・障害監視処理シーケンスにおいて、ステップ４２でクライアント端末が擬似パケットの送信を行うが、この擬似パケットは、クライアント端末に提供するサービス品質を評価し、品質に問題となる箇所を特定するための情報を提供するものである。この擬似パケットにより通信時間、情報処理時間の内訳を明確にし、遅延が発生している処理ノードやネットワークを調査することが可能になる。よって、サービスそのものの問題点を調査することよりも、サービスが動作するプラットフォームに問題がないかを調査するためのツールとして機能する。 In the performance / failure monitoring processing sequence of this embodiment, the client terminal transmits a pseudo packet in step 42. This pseudo packet evaluates the quality of service provided to the client terminal and identifies the location where the quality is a problem. It provides information to do. This pseudo packet makes it possible to clarify the breakdown of communication time and information processing time, and to investigate processing nodes and networks in which a delay occurs. Therefore, it functions as a tool for investigating whether there is a problem in the platform on which the service operates rather than investigating the problem of the service itself.

As the pseudo packet passes through the information processing on multiple processing nodes, each node appends its node identifier, the arrival time, and the sending time to the payload. Management node 14 aggregates the pseudo packets that have traveled end to end, and management manager 141 checks them against its network topology of the system to examine the route taken and the points where packets dwell and cause delay.

本実施例の性能・障害監視システムにおいて、End-to-Endは実世界に存在し、サービスを利用するクライアント端末、通信端末がこのEndの部分である。すなわち、End-to-Endとはクライアント端末がサービスに対してリクエストを送信し、応答が帰ってくるまでの区間を想定している。そのため擬似パケットの送出/受信元は実世界に擬似パケットを送出/受信する情報処理装置である端末を配置する。あるいは、次に近い処理ノードの管理ミドルウェアから処理ノード上のアプリケーションに対して擬似パケットを入力する方法もある。 In the performance / failure monitoring system of this embodiment, End-to-End exists in the real world, and the client terminal and communication terminal that use the service are the End part. In other words, End-to-End assumes a section from when a client terminal transmits a request to a service until a response returns. For this reason, the sending / receiving source of the pseudo packet arranges a terminal which is an information processing apparatus that sends / receives the pseudo packet in the real world. Alternatively, there is a method of inputting a pseudo packet to the application on the processing node from the management middleware of the next processing node.

The former allows verification under the same conditions as actual use of the service and is therefore more accurate, but someone must go to the location of processing node 11 and inject the pseudo packet there. The latter requires placing a mechanism for sending and receiving pseudo packets on the processing node under test, but verification is easy because every operation can be completed from the management node. In this case, the pseudo packet to be sent is deployed at the same time as the pseudo AP (41): in addition to applying the various settings and installing the pseudo application, the pseudo packet is registered in the management middleware of processing node 11. The management middleware is software on each processing node that provides the interface to management node 14. On a service start instruction from management node 14, the pseudo AP enters an input-waiting state, and the pseudo packet registered in the management middleware of processing node 11 is fed into the pseudo AP as input data.

FIG. 5 shows an example of the pseudo AP in the performance/failure monitoring system of Embodiment 1. In the figure, 51, 52, and 53 denote the operation information storage unit, the pseudo AP, and the pseudo packet storage unit, respectively. Operation information storage unit 51 and pseudo packet storage unit 53 are formed in the hardware storage unit described earlier. Pseudo AP 112 consists of the following functional blocks: packet arrival time collection unit 54, operation information collection unit 55, production AP operation simulation unit 56, packet transmission time collection unit 57, collected data holding unit 58, management manager IF unit 59, and profile information adding unit 60. These functional blocks are implemented as programs executed by the hardware processing unit.

FIG. 6 likewise shows an example of the management manager in the performance/failure monitoring system of Embodiment 1. In the figure, 61 and 62 denote the collected data storage unit and the management manager, respectively. Management manager 62 consists of the following functional blocks: collected data collection unit 63, pseudo AP I/F unit 64, pseudo AP control unit 65, and collected data association processing unit 66. These functional blocks are described in detail later.

FIG. 7 shows an example of the structure of a packet used in the information processing system of this embodiment. In FIG. 7, 71 is the communication header portion and 72 is the payload portion. Information processing order information #1 through #n is recorded sequentially in payload portion 72. Each item of information processing order information 73 consists of node identification information 74, service identification information 75, application identification information 76, reception time information 77, and transmission time information 78.
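The packet layout of FIG. 7 could be modeled roughly as follows. This is only a sketch under assumptions: the class names, field names, and the use of floating-point seconds for timestamps are illustrative choices and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderInfo:
    """One item of information processing order information (73 in FIG. 7)."""
    node_id: str        # node identification information 74
    service_id: str     # service identification information 75
    app_id: str         # application identification information 76
    received_at: float  # reception time information 77 (assumed: seconds)
    sent_at: float      # transmission time information 78 (assumed: seconds)


@dataclass
class PseudoPacket:
    destination: str                                           # communication header portion 71
    order_info: List[OrderInfo] = field(default_factory=list)  # payload portion 72

    def append_hop(self, entry: OrderInfo, next_destination: str) -> None:
        """Record this node's entry and readdress the packet to the next hop."""
        self.order_info.append(entry)
        self.destination = next_destination


# Hypothetical identifiers, used for illustration only.
packet = PseudoPacket(destination="A")
packet.append_hop(OrderInfo("node11", "svc", "ap112", 0.0, 0.4), "B")
packet.append_hop(OrderInfo("node12", "svc", "ap122", 0.5, 0.6), "C")
route = [entry.node_id for entry in packet.order_info]
```

Because every node only appends to the list, the order of the entries is the order in which the packet was processed, which is what lets the management side reconstruct the route later.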

Returning to FIG. 5, production AP operation simulation unit 56 of pseudo AP 52 simulates production AP 111 by consuming the typical processing time of its information processing (for example, the time under no load), such as by sleeping, in a way that does not affect production AP 111. This time is specified as a simulated load via management manager IF unit 59. That is, management manager 141 gives production AP operation simulation unit 56 the destination specification and the simulated load for pseudo AP 52 through management manager IF unit 59. The destination specification can be a fixed destination, distribution among destinations with fixed probabilities, conditional distribution among multiple destinations (for example, when memory usage exceeds a threshold), transmission to multiple fixed destinations, or transmission to multiple destinations that change with fixed probabilities. The simulated load can be a specified sleep time, a sleep time that varies with a fixed probability, or a value that reflects the operating status of the production AP (in real time or not).
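The sleep-based simulated load could look something like the sketch below. The function name, the `jitter` parameter, and the uniform variation are assumptions made for illustration; the specification only says that the sleep time may be fixed or may vary with some probability.

```python
import random
import time
from typing import Callable, Optional


def simulate_production_ap(dwell_seconds: float,
                           jitter: float = 0.0,
                           rng: Optional[random.Random] = None,
                           sleep: Callable[[float], None] = time.sleep) -> float:
    """Consume roughly dwell_seconds without doing real work; return the time slept."""
    rng = rng or random.Random()
    # jitter == 0 gives a fixed dwell time. jitter > 0 scales the dwell time
    # uniformly within +/- jitter (an assumed distribution, not from the source).
    factor = 1.0 + rng.uniform(-jitter, jitter) if jitter else 1.0
    duration = max(0.0, dwell_seconds * factor)
    sleep(duration)
    return duration


# A stand-in for time.sleep so the example finishes instantly.
slept = []
fixed = simulate_production_ap(0.4, sleep=slept.append)
varied = simulate_production_ap(0.4, jitter=0.25, rng=random.Random(1), sleep=slept.append)
```

Sleeping rather than computing keeps the pseudo AP from competing with the production AP for CPU, which matches the stated goal of not affecting production AP 111.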

Packet arrival time collection unit 54 stores in collected data holding unit 58 the time at which a packet arrived at the server, that is, the processing node that performs the information processing. Packet transmission time collection unit 57 stores in collected data holding unit 58 the time at which the processing request (packet) is sent to the destination specified via management manager IF unit 59; this happens after the operation information at arrival (CPU, memory, and I/O utilization, including failure information) has been collected and production AP operation simulation unit 56 has consumed its time. When several destinations are specified, the destination is chosen on a probability basis derived from actual operation.

Collected data holding unit 58 of pseudo AP 52 keeps recording the data collected by collection units 54 and 57 in a wrap-around manner, and stops the wrap-around recording when an event such as a failure occurs. The events that stop collection are specified via management manager IF unit 59. As described later, profile information adding unit 60 adds the processing node identifier and the temporarily stored reception time information and transmission time information to payload portion 72 of the received pseudo packet.
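Wrap-around recording that freezes on a specified event can be sketched with a bounded queue. The class and method names below are hypothetical, and the event is represented by a plain string for simplicity.

```python
from collections import deque


class WrapAroundLog:
    """Fixed-capacity log that overwrites the oldest entry until it is frozen."""

    def __init__(self, capacity, stop_events):
        self._entries = deque(maxlen=capacity)  # oldest entries drop off automatically
        self._stop_events = set(stop_events)
        self.frozen = False

    def record(self, entry):
        """Store an entry; return False if recording has already been stopped."""
        if self.frozen:
            return False
        self._entries.append(entry)
        return True

    def notify(self, event):
        """Stop recording when one of the configured events occurs."""
        if event in self._stop_events:
            self.frozen = True

    def snapshot(self):
        return list(self._entries)


log = WrapAroundLog(capacity=3, stop_events={"failure"})
for sample in (1, 2, 3, 4):
    log.record(sample)
log.notify("failure")
accepted_after_failure = log.record(5)
```

Freezing the buffer preserves the samples taken just before the failure, which would otherwise be overwritten by later data.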

擬似パケット蓄積部５３では、受信した擬似パケットをその受信時刻情報、送信時刻情報とともに保持する。送信時刻情報は、パケット送信時刻採取部５７から直接記憶するように構成できる。それと共に、管理マネージャＩＦ部５９を介して、管理マネージャ１４に擬似パケット情報を送信する。擬似パケット情報は、管理マネージャ１４で、どのパスを通って、すなわちどの処理ノードを経由してサービスが提供されているかを調査する情報として利用する。また、管理マネージャ１４と各処理ノードの時刻差を補正することで、ネットワークの遅延箇所を調査する情報として利用する。 The pseudo packet storage unit 53 holds the received pseudo packet together with its reception time information and transmission time information. The transmission time information can be configured to be stored directly from the packet transmission time collection unit 57. At the same time, pseudo packet information is transmitted to the management manager 14 via the management manager IF unit 59. The pseudo packet information is used by the management manager 14 as information for investigating which path, that is, via which processing node the service is provided. In addition, the time difference between the management manager 14 and each processing node is corrected, and this is used as information for investigating a delay portion of the network.

稼動情報蓄積部５１では、パケット到着時刻採取部５４が擬似パケットを受信したタイミングから任意時間までのハードウェア稼働情報やネットワークの統計情報を蓄積する。収集については稼働情報採取部５５が行う。ハードウェア稼働情報はたとえば、ＣＰＵやメモリ、ＩＯの利用率、障害情報などを想定している。また、ネットワークの統計情報は、ＲＦＣ１２１３に規定されるＭＩＢ（Management Information Base）などの統計情報を用いることができる。 The operation information accumulation unit 51 accumulates hardware operation information and network statistical information from the timing when the packet arrival time collection unit 54 receives the pseudo packet to an arbitrary time. The collection is performed by the operation information collection unit 55. The hardware operation information assumes, for example, CPU, memory, IO usage rate, failure information, and the like. Further, statistical information such as MIB (Management Information Base) defined in RFC1213 can be used as the network statistical information.

図１６の１６０１〜１６０４は、それぞれ処理ノード１１のＨＤＤ１１８中の稼働情報蓄積部５１に蓄積される、サーバ稼働情報、プロセス稼働情報、ネットワーク稼働情報、ストレージ稼働情報のテーブルの一例を示した。サーバ稼働情報１６０１は図示の通り、サーバ稼働についての種々の情報を、プロセス稼働情報１６０２は、プロセス稼働についての種々の情報を、ネットワーク稼働情報１６０３は、ネットワーク稼働についての種々の情報を、ストレージ稼働情報１６０４は、記憶部であるストレージ稼働を示す種々の情報を蓄積する。 Reference numerals 1601 to 1604 in FIG. 16 indicate examples of tables of server operation information, process operation information, network operation information, and storage operation information that are stored in the operation information storage unit 51 in the HDD 118 of the processing node 11, respectively. As shown in the figure, the server operation information 1601 shows various information about the server operation, the process operation information 1602 shows various information about the process operation, the network operation information 1603 shows various information about the network operation, and the storage operation. The information 1604 accumulates various types of information indicating storage operation as a storage unit.

図８は、擬似ＡＰ５２の擬似パケット受信時の処理の詳細フローを示している。本詳細フローは、図４に示した本実施例の全体フローのステップ４３〜４５に対応している。 FIG. 8 shows a detailed flow of processing when the pseudo AP 52 receives a pseudo packet. This detailed flow corresponds to steps 43 to 45 of the overall flow of this embodiment shown in FIG.

When pseudo AP 52 of a processing node receives the pseudo packet sent by the client terminal (801), packet arrival time collection unit 54 obtains the current time from time information acquisition device 127 and temporarily stores it as the packet reception time information (802). Operation information collection unit 55 then starts collecting the processing node's operation information, such as CPU load, network throughput, memory usage, and HDD usage (803).

Production AP operation simulation unit 56 executes the simulated operation instruction that management manager 141 sent through management manager IF unit 59 before the pseudo packet was transmitted (804). The simulated operation is thereby executed on processing node 11 (805). After staying in pseudo AP 52 for the dwell time that management manager 141 set when the application was deployed, this information is sent as a pseudo packet to the destination IP address specified in advance by management manager 141.

パケット送信時刻採取部５７は、時刻情報取得装置１２７から現在時刻を取得し、擬似ＡＰ５２内部に送信時刻情報として一時記憶（８０６）する。擬似パケット送信時において、パケット送信時刻情報をログに出力し、プロファイル情報付与部６０は、受け取った擬似パケットのペイロード部に、処理ノードの識別子と一時記憶した受信時刻情報と送信時刻情報を付与（８０７）し、情報を付与した擬似パケットを送信する（８０８）。 The packet transmission time collection unit 57 acquires the current time from the time information acquisition device 127 and temporarily stores (806) the transmission time information in the pseudo AP 52. At the time of pseudo packet transmission, the packet transmission time information is output to a log, and the profile information adding unit 60 adds the identifier of the processing node, the temporarily stored reception time information, and transmission time information to the payload portion of the received pseudo packet ( 807) and transmit the pseudo packet to which the information is added (808).

The following describes how production AP operation simulation unit 56 of pseudo AP 52, described above, obtains the behavior of the production AP. Either of two methods can be used.

(1) The service administrator creates and specifies the operation definition data for the pseudo AP.

(2) The production AP is monitored to obtain the probability distribution of destinations for each source. The management middleware that watches the production AP performs the monitoring, and management manager 141 collects the results at regular intervals. From the collected results, management manager 141 creates operation definition data for each production AP (source IP address, dwell time, destination address 1, probability 1, destination address 2, probability 2, ...) and sends it to production AP operation simulation unit 56 of pseudo AP 52.

Each pseudo AP receives this operation definition data through its management manager IF unit 59 at the start of processing and registers it in production AP operation simulation unit 56. When the service administrator specifies the data, the administrator creates the behavior data and sends it to each pseudo AP.

An example of the operation definition data format is as follows.
<time=dwell time&number of source IP addresses=2&source IP address 1=xxx.xxx.xxx.xxx&source IP address 2=yyy.yyy.yyy.yyy&number of destination IP addresses=2&destination address 1=zzz.zzz.zzz.zzz&probability 1=10&destination address 2=qqq.qqq.qqq.qqq&probability 2=90>
Table 1201 in FIG. 12 shows the variations of operation that the operation definition data can specify; entries 1 to 4 list the variations, and their contents are as shown in the figure. Likewise, table 1301 in FIG. 13 shows the variations of dwell time in entries 1 to 4, with contents as shown in the figure.
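As a hedged illustration of how a pseudo AP might consume such definition data, the sketch below parses a string of the same shape and picks a destination according to the listed probabilities. The ASCII key names (`time`, `src_count`, `src1`, `dst1`, `prob1`, and so on) and the sample addresses are assumptions standing in for the field names of the format above.

```python
import random
from urllib.parse import parse_qsl


def parse_definition(text):
    """Parse '<key=value&key=value...>' into dwell time, sources, and weighted destinations."""
    fields = dict(parse_qsl(text.strip("<>")))
    n_src = int(fields["src_count"])
    n_dst = int(fields["dst_count"])
    return {
        "dwell": float(fields["time"]),
        "sources": [fields[f"src{i}"] for i in range(1, n_src + 1)],
        "destinations": [(fields[f"dst{i}"], float(fields[f"prob{i}"]))
                         for i in range(1, n_dst + 1)],
    }


def choose_destination(definition, rng=random):
    """Pick one destination address, weighted by its probability value."""
    addresses, weights = zip(*definition["destinations"])
    return rng.choices(addresses, weights=weights, k=1)[0]


definition = parse_definition(
    "<time=0.4&src_count=2&src1=10.0.0.1&src2=10.0.0.2"
    "&dst_count=2&dst1=10.0.1.1&prob1=10&dst2=10.0.1.2&prob2=90>"
)
rng = random.Random(0)
picks = [choose_destination(definition, rng) for _ in range(1000)]
```

With probabilities of 10 and 90, the second destination should be chosen far more often than the first over many packets, mirroring the measured behavior of the production AP.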

図９は、本実施例において、各処理ノード１１、１２、１３での処理の結果、プローブパケットとして機能する擬似パケットがどの様に転送されていくかを示している。同図において、９４、９５、９６、９７はそれぞれ順次転送されるパケットを示している。パケット９４は実世界から最初に転送される擬似パケットを示している。図７に示した通信ヘッダ部７１に、宛先が記述され、ペイロード部７２には処理ノード１１、１２、１３を経由する度に、情報処理順序情報が追加されていく。例えば、実世界の端末（アドレス＝Ｄ）に送られる擬似パケット９７には、処理ノード１１、１２、１３で記録された情報処理順序情報が全て記述されていることになる。 FIG. 9 shows how a pseudo packet that functions as a probe packet is transferred as a result of processing in each processing node 11, 12, 13 in this embodiment. In the figure, reference numerals 94, 95, 96, and 97 denote packets that are sequentially transferred. A packet 94 indicates a pseudo packet that is first transferred from the real world. The destination is described in the communication header portion 71 shown in FIG. 7, and information processing order information is added to the payload portion 72 every time it passes through the processing nodes 11, 12, and 13. For example, the pseudo packet 97 sent to the real-world terminal (address = D) describes all the information processing order information recorded by the processing nodes 11, 12, and 13.

続いて、図１０に示した、管理マネージャ１４１の処理フローに基づき、本実施例の管理マネージャ１４１の動作を説明する。 Next, the operation of the management manager 141 of this embodiment will be described based on the processing flow of the management manager 141 shown in FIG.

In the figure, collected data collection unit 63 of management manager 62 collects, via pseudo AP I/F unit 64, the data that the pseudo APs gathered on each processing node. Pseudo AP control unit 65 sends each pseudo AP its next destination address and the operation simulation information for the production AP, for example the dwell time and the probabilities for distributing packets among multiple pseudo APs.

First, collected data collection unit 63 collects, via pseudo AP I/F unit 64, the pseudo packet information and other data stored in the pseudo packet storage unit of pseudo AP 52 on each processing node (1001). The collected pseudo packet information is stored in collected data storage unit 61 (1002), and the time information of management node 14 and processing nodes 11, 12, and 13 is corrected (1003). Using the time information attached to the pseudo packets and the time correction for each processing node, collected data association processing unit 66 calculates the round-trip time of the service and the communication time between pseudo APs, and associates them with the network operation information and hardware operation information from the same points in time. The pseudo packet information obtained by collected data association processing unit 66 is then sorted by time (1004), and the sorted result is presented on a display unit, not shown (1005).

以上説明した管理マネージャ構成における採取データ関連付け処理部６６の処理内容を整理すると下記の通りである。 The processing contents of the collection data association processing unit 66 in the management manager configuration described above are organized as follows.

(1) First, the collected data collection unit uses pseudo AP I/F unit 64 to obtain the pseudo packet logs from the pseudo APs. The log of a pseudo packet is obtained from the processing node at which the packet finally arrived.

(2) Collected data collection unit 63 uses pseudo AP I/F unit 64 to obtain the various operation information, registers it in collected data storage unit 61, and notifies collected data association processing unit 66 that data acquisition is complete.

(3) Collected data association processing unit 66 corrects the time of each processing node. The correction can be made by using GPS (Global Positioning System) or similar, as described later.

(4) Next, the round-trip time of the service is calculated. This is the time from when the pseudo packet is input to the first pseudo AP until, after passing through several pseudo APs, it arrives at the final processing node and the log is output.

(5) The breakdown of processing time is calculated, and the communication time between pseudo APs is calculated.

(6) The network operation information around the time the pseudo packet passed, and the hardware operation information around the times the pseudo packet arrived and was sent, are obtained from collected data storage unit 61. At the same time, statistical information for the same data at the same acquisition points is obtained from collected data storage unit 61.

(7) The network operation information and hardware operation information obtained in (6) are mapped onto the processing time breakdown and shown on the display unit through a graphical user interface (GUI).

(8) The operation information around the time the pseudo packet passed is compared with the statistical information. For example, when the distribution of dwell times is derived from the statistical information and the observed value falls outside a certain range, the location is shown on the GUI as an error location.
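Steps (4) and (5) amount to simple arithmetic on the timestamps carried in the pseudo packet. The sketch below shows one way to compute them, assuming the times have already been corrected to a common clock; the function name, the tuple layout, and the use of integer milliseconds are illustrative assumptions.

```python
def breakdown(hops):
    """Compute round-trip time, per-node processing time, and per-link communication time.

    hops: list of (node_id, received_at, sent_at) in transit order,
    with timestamps in milliseconds on a common clock.
    """
    # Time spent inside each pseudo AP: departure minus arrival.
    processing = [(node, sent - recv) for node, recv, sent in hops]
    # Time spent on each link: arrival at the next node minus departure from the previous one.
    links = [(a[0], b[0], b[1] - a[2]) for a, b in zip(hops, hops[1:])]
    # From input to the first pseudo AP until output at the final node.
    round_trip = hops[-1][2] - hops[0][1]
    return round_trip, processing, links


hops = [("node11", 0, 400), ("node12", 450, 550), ("node13", 630, 830)]
round_trip, processing, links = breakdown(hops)
```

In this example the processing times (400, 100, 200) and link times (50, 80) add up to the round-trip time of 830, so any unexpectedly large term points directly at a node or a network segment.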

Time correction can be achieved by equipping the management server and processing nodes with GPS, or with an I/F for obtaining time information from GPS, and updating each node's time information periodically. In this way, even when the processing nodes are distributed over a wide area, the time differences between the nodes' clocks can be ignored.
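Where a node's clock is not already synchronized, the collected timestamps could instead be shifted onto the management node's clock before analysis. The sketch below assumes that a per-node offset is known (for example, measured against a GPS-derived reference); the function name and the offset table are hypothetical and are not described in the specification.

```python
def correct_times(records, offsets_ms):
    """Shift each node's timestamps onto the reference clock.

    records: list of (node_id, received_at, sent_at) in milliseconds.
    offsets_ms: node_id -> (node clock minus reference clock); assumed known.
    Nodes without an entry are treated as already synchronized.
    """
    corrected = []
    for node, recv, sent in records:
        offset = offsets_ms.get(node, 0)
        corrected.append((node, recv - offset, sent - offset))
    return corrected


raw = [("node11", 1000, 1400), ("node12", 1475, 1575)]
corrected = correct_times(raw, {"node12": 25})
```

Here node12's clock runs 25 ms ahead, so its timestamps are moved back before link times are computed.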

図１４Ａ、図１４Ｂは、以上詳述した第１の実施例における情報処理システムにおける処理の流れの全体を説明するシーケンス図である。図１４Ａの処理に続いて、図１４Ｂの処理が行われる。同図において、管理マネージャ１４０１、処理ノード１４０２、１４０３、１４０４はそれぞれ、管理マネージャ１４１、処理ノード１１、１２、１３に対応する。 14A and 14B are sequence diagrams for explaining the entire processing flow in the information processing system in the first embodiment described in detail above. Following the process of FIG. 14A, the process of FIG. 14B is performed. In the figure, management manager 1401 and processing nodes 1402, 1403 and 1404 correspond to management manager 141 and processing nodes 11, 12 and 13, respectively.

図１４Ａにおいて、上述の図４を用いた説明のとおり、管理マネージャ１４０１は、まず処理ノード１４０２に擬似パケットを登録する（１４０５〜１４０７）。続いて、管理マネージャ１４０１は、それぞれの処理ノード１４０２、１４０３、１４０４に対して、サービス開始指示（１４０８、１４１１、１４１４）を行い、擬似ＡＰを起動（１４０９、１４１２、１４１５）する。その後、処理ノード１４０２の擬似ＡＰが擬似パケットを取得（１４１７）、送信時刻を付与（１４１８）、擬似パケット蓄積部に保持（１４１９）、稼働情報取得を開始する（１４２０）。そして、処理ノード１４０２の擬似ＡＰが擬似パケットを処理ノード１４０３に送信する（１４２１）。 In FIG. 14A, as described above with reference to FIG. 4, the management manager 1401 first registers a pseudo packet in the processing node 1402 (1405 to 1407). Subsequently, the management manager 1401 issues a service start instruction (1408, 1411, 1414) to each of the processing nodes 1402, 1403, 1404, and activates the pseudo AP (1409, 1412, 1415). Thereafter, the pseudo AP of the processing node 1402 acquires a pseudo packet (1417), assigns a transmission time (1418), holds it in the pseudo packet storage unit (1419), and starts acquiring operation information (1420). The pseudo AP of the processing node 1402 transmits a pseudo packet to the processing node 1403 (1421).

Next, the pseudo AP of processing node 1403 receives the pseudo packet, starts acquiring operation information, adds the reception and transmission times, holds the packet in the pseudo packet storage unit, and sends the pseudo packet to processing node 1404 (1422 to 1427). Processing node 1404 performs the same processing (1428 to 1432).

更に、図１４Ｂに移り、処理ノード１４０４が擬似パケットを処理ノード１４０３に送信（１４３３）すると、処理１４３４〜１４４２が実行される。 14B, when the processing node 1404 transmits a pseudo packet to the processing node 1403 (1433), processing 1434 to 1442 are executed.

以上の処理を受けて、管理マネージャ１４０１は各処理ノードに対してサービス停止指示を行い、稼働情報取得が停止される（１４４３〜１４５１）。その後、管理マネージャ１４０１は、各処理ノード１４０２、１４０３、１４０４から稼働情報と擬似パケット情報を取得し（１４５２〜１４５４）、採取データ蓄積部に保存（１４５５）する。そして、各処理ノード間の時刻情報の補正（１４５６）を行い、図示を省略した管理サーバ１４のディスプレイ等の表示部に、擬似パケットが通過した経路及び処理／通過にかかった時間を重ねて表示する（１４５７）。管理サーバ１４の管理マネージャ１４１を稼働するサービス管理者は、この表示データと統計情報と比較し、大きく異なっている箇所を特定したり、評価（１４５８）を行ったりする。これにより、上述したネットワーク上の障害部位を推定することができる。 In response to the above processing, the management manager 1401 issues a service stop instruction to each processing node, and operation information acquisition is stopped (1443 to 1451). Thereafter, the management manager 1401 acquires operation information and pseudo packet information from the processing nodes 1402, 1403, and 1404 (1452-1454) and stores them in the collected data storage unit (1455). Then, the time information between the processing nodes is corrected (1456), and the route through which the pseudo packet passes and the time taken for processing / passing are displayed on the display unit such as the display of the management server 14 (not shown). (1457). The service manager who operates the management manager 141 of the management server 14 compares the display data with the statistical information, identifies a greatly different portion, and performs evaluation (1458). As a result, it is possible to estimate the faulty part on the network described above.
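The comparison against statistical information could be automated along the following lines. This is a sketch under a stated assumption: the "range" is taken to be the mean plus or minus k standard deviations of past measurements, which is one common choice and is not specified in the text. The function names and sample values are illustrative.

```python
from statistics import mean, stdev


def outside_range(history, observed, k=3.0):
    """Return True if observed lies outside mean +/- k * stdev of history."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two past measurements
    return abs(observed - mu) > k * sigma


def suspect_nodes(history_by_node, observed_by_node, k=3.0):
    """List the nodes whose observed value falls outside their usual range."""
    return [node for node, value in observed_by_node.items()
            if outside_range(history_by_node[node], value, k)]


# Dwell times in milliseconds (made-up values for illustration).
history = {
    "node11": [400, 410, 390, 405, 395],
    "node12": [100, 102, 98, 101, 99],
}
observed = {"node11": 402, "node12": 180}
suspects = suspect_nodes(history, observed)
```

In this example node12's observed dwell time is far outside its usual spread, so it would be flagged as the likely location of the fault.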

図１５に、上述してきた本実施例の情報処理システムの具体的な構成の一例を示す。同図に見るように、ＬＡＮ等のネットワーク１５０に処理ノード１５１と管理サーバ１５２が接続されている。このネットワーク１５０、処理ノード１５１、管理サーバ１５２は、上述した公衆網・イントラネット１７、処理ノード１１、１２、１３、及び管理サーバ１４に対応している。管理サーバ１５２は、内部バス１４６に接続されたメモリ１４２、ＨＤＤ等のデータ蓄積部１４３、ネットワークＩ／ＦであるＬＡＮアダプタ１４４、及びＣＰＵ１４５で構成されるコンピュータである。管理マネージャ６２の内部は図６に示したとおりの機能構成を備えている。 FIG. 15 shows an example of a specific configuration of the information processing system of the present embodiment described above. As shown in the figure, a processing node 151 and a management server 152 are connected to a network 150 such as a LAN. The network 150, processing node 151, and management server 152 correspond to the public network / intranet 17, processing nodes 11, 12, and 13, and the management server 14 described above. The management server 152 is a computer including a memory 142 connected to the internal bus 146, a data storage unit 143 such as an HDD, a LAN adapter 144 that is a network I / F, and a CPU 145. The inside of the management manager 62 has a functional configuration as shown in FIG.

一方、処理ノード１５１は、図１に示したとおり、メモリ１１５、ＣＰＵ１１７，データ蓄積部であるＨＤＤ１１８、ＬＡＮアダプタ等のＩ／Ｆ１１９を有し、更にバス１２５にはＩ／Ｆ１２６を介して時刻情報取得装置１２７が接続されている。ＨＤＤ１１８中には、図５で説明した稼働情報蓄積部５１、擬似パケット蓄積部５３が形成される。また、メモリ（ＭＭ）１１５中には、本番ＡＰ１１１と図５にその詳細を示した擬似ＡＰ１１２が記憶されている。 On the other hand, as shown in FIG. 1, the processing node 151 has a memory 115, a CPU 117, an HDD 118 as a data storage unit, an I / F 119 such as a LAN adapter, and the bus 125 also receives time information via the I / F 126. An acquisition device 127 is connected. In the HDD 118, the operation information storage unit 51 and the pseudo packet storage unit 53 described with reference to FIG. The memory (MM) 115 stores a real AP 111 and a pseudo AP 112 whose details are shown in FIG.

なお、先に図１１を用いて説明した情報処理システムにおいても、同様な具体的なシステム構成で構築されることは言うまでもない。 Needless to say, the information processing system described above with reference to FIG. 11 is also constructed with the same specific system configuration.

以上本発明の実施例を説明したが、本発明は、以上説明した実施例に限定されるものでなく、階層的情報処理を行う他のシステムにも広く適用できる。 Although the embodiments of the present invention have been described above, the present invention is not limited to the embodiments described above, and can be widely applied to other systems that perform hierarchical information processing.

本発明は、社会基盤に適用される情報システム、特に広域分散資源利用を行うシステムにおける、性能情報の取得、及び障害発生部位の監視を行う技術として有用である。 INDUSTRIAL APPLICABILITY The present invention is useful as a technique for acquiring performance information and monitoring a fault occurrence site in an information system applied to a social infrastructure, particularly a system that uses a wide area distributed resource.

11, 12, 13, 21 to 28, 1101 to 1104 ... processing node
14, 1106 ... management server
15, 16, 17, 1107, 1108, 150 ... network
30 ... real world
31, 32, 1111, 1112 ... sensor
33, 34, 1113 ... actuator
62, 141 ... management manager
71 ... communication header portion
72 ... payload portion
111, 121, 131 ... production AP
52, 112, 122, 132 ... pseudo AP
113, 123, 133 ... OS or AP execution platform
114, 124, 134 ... hardware
115, 142 ... memory (MM)
116 ... I/O
117, 145 ... CPU
118, 143 ... HDD
119, 144 ... I/F
1109 ... terminal
1110 ... edge node

Claims

A distributed resource monitoring system in which distributed resources including a management server and a plurality of processing nodes perform hierarchical information processing via a network to provide a service, wherein
each of the plurality of processing nodes has
a production application that performs the information processing, and a pseudo application that simulates the operation of the production application,
upon receiving a pseudo packet corresponding to the service, the pseudo application adds, to the pseudo packet, order information including node identification information for identifying the processing node and transmission/reception time information indicating the time at which the pseudo packet was transmitted/received, and transmits the pseudo packet to another of the processing nodes, and
the management server
acquires operation information from the time when each of the pseudo applications of the plurality of processing nodes transmitted/received the pseudo packet and, based on the order information of the acquired pseudo packet, determines that a failure has occurred in a processing node when the operation information corresponding to the time at which the pseudo application of that processing node transmitted/received the pseudo packet is not within a predetermined statistical value range.
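The way a pseudo application adds order information to a pseudo packet before passing it on can be sketched as follows. This is a minimal illustration only, assuming an in-memory packet representation; the names `OrderEntry`, `PseudoPacket`, and `relay`, and the use of floating-point timestamps, are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, List
import time


@dataclass(frozen=True)
class OrderEntry:
    """One hop of order information (hypothetical layout)."""
    node_id: str        # node identification information
    received_at: float  # time the pseudo packet was received
    sent_at: float      # time the pseudo packet was transmitted


@dataclass(frozen=True)
class PseudoPacket:
    """Pseudo packet for a given service, carrying its hop history."""
    service: str
    order_info: List[OrderEntry] = field(default_factory=list)


def relay(packet: PseudoPacket, node_id: str, received_at: float,
          clock: Callable[[], float] = time.time) -> PseudoPacket:
    """Return the packet to forward, with this node's entry appended.

    `clock` stands in for the time information acquisition device
    (an assumption made so the sketch can be tested deterministically).
    """
    entry = OrderEntry(node_id=node_id, received_at=received_at, sent_at=clock())
    # Build a new list so the incoming packet is left unmodified.
    return PseudoPacket(packet.service, packet.order_info + [entry])
```

Each processing node on the path would call `relay` once, so the packet that reaches the last node lists every node it passed through, in order, together with the reception and transmission times at each node.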

The distributed resource monitoring system according to claim 1, wherein
the management server sets the pseudo application in the plurality of processing nodes via the network.

The distributed resource monitoring system according to claim 1, wherein
the management server issues a service stop instruction to the plurality of processing nodes via the network and, after the service stop instruction, acquires the operation information and the pseudo packet from each of the processing nodes.

The distributed resource monitoring system according to claim 1, wherein
each of the plurality of processing nodes includes a storage unit that stores the pseudo packet and the operation information, and transmits the pseudo packet and the operation information stored in the storage unit to the management server via the network.

The distributed resource monitoring system according to claim 1, wherein
the operation information includes server operation information, network operation information, and storage operation information.

A distributed resource monitoring method for estimating a failure occurrence site in a system in which distributed resources including a management server and a plurality of processing nodes perform hierarchical information processing via a network to provide a service, wherein
a production application that performs the information processing and a pseudo application that simulates the operation of the production application are set in each of the plurality of processing nodes,
the processing node receives a pseudo packet corresponding to the service,
the pseudo application adds, to the pseudo packet, order information including node identification information for identifying the processing node and transmission/reception time information indicating the time at which the pseudo packet was transmitted/received, and transmits the pseudo packet to another of the processing nodes, and
a management manager of the management server
acquires operation information from the time when each of the pseudo applications of the plurality of processing nodes transmitted/received the pseudo packet and, based on the order information of the acquired pseudo packet, determines that a failure has occurred in a processing node when the operation information corresponding to the time at which the pseudo application of that processing node transmitted/received the pseudo packet is not within a predetermined statistical value range.
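The determination made by the management manager can be sketched as follows. The claim does not say how the predetermined statistical value range is obtained; this sketch assumes, purely for illustration, a range of the mean plus or minus `k` standard deviations of past samples. The function names and the tuple layouts are likewise hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

OrderInfo = List[Tuple[str, float, float]]      # (node_id, received_at, sent_at)
Samples = Dict[str, List[Tuple[float, float]]]  # node_id -> [(timestamp, value)]
Ranges = Dict[str, Tuple[float, float]]         # node_id -> (low, high)


def statistical_range(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Assumed definition of the range: mean +/- k standard deviations."""
    m, s = mean(history), stdev(history)
    return (m - k * s, m + k * s)


def find_failed_nodes(order_info: OrderInfo, samples: Samples,
                      ranges: Ranges) -> List[str]:
    """Return nodes whose operation information fell outside the range
    during the interval in which they handled the pseudo packet."""
    failed = []
    for node_id, received_at, sent_at in order_info:
        low, high = ranges[node_id]
        # Keep only samples taken while this node held the pseudo packet.
        window = [value for ts, value in samples.get(node_id, [])
                  if received_at <= ts <= sent_at]
        if any(value < low or value > high for value in window):
            failed.append(node_id)
    return failed
```

Because the order information gives the reception and transmission time at each node, only samples taken inside that interval are compared against the range, which is what lets the check point to a specific node rather than to the service as a whole.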

The distributed resource monitoring method according to claim 6, wherein
the management manager sets the pseudo application in the plurality of processing nodes via the network.

The distributed resource monitoring method according to claim 7, wherein
the management manager can specify a transmission destination and a simulated load for the pseudo application set in the processing node.

The distributed resource monitoring method according to claim 6, wherein
the pseudo application running on the processing node collects the operation information of the processing node at the time of reception and transmission of the pseudo packet.

The distributed resource monitoring method according to claim 6, wherein
the management manager issues a service stop instruction to the plurality of processing nodes via the network and, after the service stop instruction, acquires the operation information and the pseudo packet from each of the processing nodes.

A distributed resource monitoring apparatus for a system in which a plurality of processing nodes perform hierarchical information processing via a network to provide a desired service, the apparatus comprising:
a collection unit that, where each of the plurality of processing nodes has a production application that performs the information processing and a pseudo application that simulates the operation of the production application, and where the pseudo application of a processing node that has received a pseudo packet corresponding to the service adds, to the pseudo packet, order information including node identification information for identifying the processing node and transmission/reception time information indicating the time at which the pseudo packet was transmitted/received, and transmits the pseudo packet to another of the processing nodes, collects the order information together with operation information from the time of the transmission/reception; and
a processing unit that, based on the order information of the collected pseudo packet, determines that a failure has occurred in a processing node when the operation information corresponding to the time at which the pseudo application of that processing node transmitted/received the pseudo packet is not within a predetermined statistical value range.

The distributed resource monitoring apparatus according to claim 11, further comprising
an interface unit that receives the order information and the operation information from the plurality of processing nodes via the network.

The distributed resource monitoring apparatus according to claim 12, further comprising
a pseudo application control unit that sets the pseudo application in the plurality of processing nodes via the interface unit and the network.

The distributed resource monitoring apparatus according to claim 11, comprising
a management manager that functions as the collection unit and the processing unit, and a storage unit that stores the collected order information and operation information.

The distributed resource monitoring apparatus according to claim 11, wherein
the operation information includes server operation information, network operation information, and storage operation information.