JP5779548B2

Information processing system operation management apparatus, operation management method, and operation management program

Info

Publication number: JP5779548B2
Application number: JP2012130375A
Authority: JP
Inventors: 健太郎角井; 昭博伊藤; 敦行乾
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-07-21
Filing date: 2012-06-08
Publication date: 2015-09-16
Anticipated expiration: 2032-06-08
Also published as: JP2013041574A

Description

The present invention relates to an operation management apparatus, an operation management method, and an operation management program for an information processing system, and more particularly to ones that monitor the operating status of the system and detect the occurrence of failures in it.

In recent years, as information processing systems have come to occupy an increasingly important position as the foundation of corporate activities and social infrastructure, the demand for information processing systems that combine high processing capacity with high reliability has never been greater.

As one way of realizing such advanced information processing systems, parallel distributed processing systems are becoming widespread, in which a large number of information processing apparatuses are installed in a data center or the like and operate cooperatively to achieve the purpose of the system.

A challenge in operating and managing such a parallel distributed processing system is detecting and responding to failures that occur in the system. Because the system operates through the cooperation of a large number of information processing apparatuses, a failure in one apparatus affects the operation of the entire system. Many systems are designed to tolerate such failures so that the system as a whole does not stop, but degraded performance and reduced resource utilization efficiency are still unavoidable. Moreover, as the system grows and the number of apparatuses increases, the frequency of apparatus failures becomes too high to ignore.

As background art related to detecting such failures, Patent Document 1, for example, discloses "a performance operation management apparatus that monitors the performance of an information processing system in which a plurality of information processing apparatuses operate in cooperation, comprising monitoring means for monitoring the operating status of the plurality of information processing apparatuses and the data communication status of each communication line connecting the plurality of information processing apparatuses, and detecting, based on monitoring data from the monitoring means, a failure currently occurring in the information processing system."

Non-Patent Document 1 discloses a method that, in a virtualized system, extracts correlations in measurement data among multiple instances on which a given application runs, and detects the occurrence of a failure from a decrease in those correlations.

Patent Document 1: JP 2005-327261 A

Non-Patent Document 1: Hui Kang, Haifeng Chen, and Guofei Jiang. 2010. PeerWatch: a fault detection and diagnosis tool for virtualized consolidation systems. In Proceedings of the 7th International Conference on Autonomic Computing (ICAC '10). ACM, New York, NY, USA, 119-128.

One reason such parallel distributed processing systems have become widespread is that falling prices have made it possible to install and operate information processing apparatuses in large numbers, and that the performance of each individual apparatus has improved remarkably. Another is the development of programming techniques that make it easy to write the software needed to have a group of such apparatuses operate cooperatively as a unit (such a group is called a cluster) and to use it as a system for an arbitrary purpose.

One such technique is the MapReduce method. In MapReduce, a job is divided into many tasks, and the tasks are distributed across the many information processing apparatuses that make up a cluster, so a substantial reduction in execution time can be expected from parallelism. In addition, tasks are divided broadly into two types, Map tasks and Reduce tasks, and the data processed by different Map tasks, or by different Reduce tasks, has no mutual dependencies. This removes the need to program explicit synchronization between tasks and simplifies task scheduling. MapReduce makes a wide variety of application systems based on parallel distributed processing feasible. Possible applications include analysis of the flow of people using data obtained from the usage history of electronic tickets in public transportation, and power demand analysis using power consumption data obtained from sensors installed on power transmission and distribution networks.
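The division into independent Map tasks and Reduce tasks can be illustrated with a minimal single-process sketch. It is not taken from the patent: the function names, the partitioning rule, and the station-count example are assumptions chosen for illustration, and a real cluster would run each task on a separate apparatus.

```python
from collections import defaultdict


def map_task(split):
    """Map task: emit (station, 1) for each ride record in one input split.

    A Map task reads only its own split, so Map tasks need no synchronization.
    """
    return [(station, 1) for station in split.split()]


def shuffle(mapped_outputs, num_reducers):
    """Group intermediate pairs by key and route each key to one partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            # Hypothetical deterministic partitioner: byte sum modulo partitions.
            index = sum(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_task(partition):
    """Reduce task: sum the values of each key in one partition.

    Each key lives in exactly one partition, so Reduce tasks are independent.
    """
    return {key: sum(values) for key, values in partition.items()}


def run_job(splits, num_reducers=2):
    """Run all Map tasks, shuffle, then run all Reduce tasks (sequentially here)."""
    mapped = [map_task(split) for split in splits]
    result = {}
    for partition in shuffle(mapped, num_reducers):
        result.update(reduce_task(partition))
    return result
```

For instance, `run_job(["Tokyo Ueno Tokyo", "Ueno Shibuya"])` counts rides per station across two input splits.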

As noted above, in a parallel distributed processing system a failure in an information processing apparatus affects the operation of the entire system. MapReduce is also effective as a countermeasure here. Suppose one of the information processing apparatuses in a cluster fails and a task running on it does not terminate normally. Even then, the job as a whole can be completed without stopping by re-executing the same task on another apparatus. This is possible because MapReduce keeps dependencies between tasks to a minimum, so re-executing one task has very little effect on the others.

However, even in a MapReduce cluster with these properties, failures cannot be ignored entirely. Although dependencies between tasks are minimized, re-executing a task still delays the job as a whole.

Furthermore, some kinds of failure do not cause a task to terminate abnormally but do make it run more slowly than expected. Thrashing on an information processing apparatus is one such cause. In this case, MapReduce recognizes that the task is delayed and starts executing the same task on another apparatus as well. This is called speculative execution, and the task started in this way is called a backup task. The output of whichever task finishes normally first is adopted as the result, and the slower task is forcibly terminated.
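The speculative execution described here can be sketched as follows. This is only an illustration under stated assumptions: the `Attempt` record, the `slowdown_factor` of 1.5 used to decide that a task is delayed, and the function names are hypothetical and do not come from the patent or from any particular MapReduce implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    attempt_id: str
    host: str
    finish_time: Optional[float]  # seconds since job start; None if never finished


def needs_backup(elapsed: float, typical_duration: float,
                 slowdown_factor: float = 1.5) -> bool:
    """Decide whether a running task is delayed enough to start a backup task.

    The threshold rule (elapsed > typical * factor) is an assumed heuristic.
    """
    return elapsed > typical_duration * slowdown_factor


def resolve_attempts(attempts: List[Attempt]) -> Tuple[Optional[Attempt], List[str]]:
    """Adopt the attempt that finished normally first and terminate the rest."""
    finished = [a for a in attempts if a.finish_time is not None]
    if not finished:
        return None, []
    winner = min(finished, key=lambda a: a.finish_time)
    killed = [a.attempt_id for a in attempts if a is not winner]
    return winner, killed
```

Even when the backup task wins, the job has already lost the time spent waiting before the delay was recognized, which is why the delay is reduced rather than eliminated.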

This backup task approach is effective to a degree, but the job as a whole is still delayed.

In addition, whether the failure is an abnormal task termination or an execution delay, an information processing apparatus on which such failures occur frequently wastes computing resources and should be repaired or replaced promptly.

A failure detection method for parallel distributed processing systems that adopt MapReduce is therefore needed, but known methods are not fully adequate.

For example, the invention disclosed in Patent Document 1 acquires multiple types of monitoring data from each of a plurality of information processing apparatuses and detects the occurrence of a failure by calculating correlations among them. However, it gives no guidance on how to select the monitoring data from which correlations should be extracted. The monitoring data given as examples in that document, such as transaction throughput and disk I/O volume on a DB server, are pairs in which anyone with some knowledge of the configuration and dynamics of the monitored system could expect to find a correlation. In other words, performing failure monitoring requires a priori knowledge of the system.

Applying that invention to a MapReduce parallel distributed processing system as described above would therefore be difficult. In MapReduce, after a job is divided into tasks, which task runs on which information processing apparatus is not decided until run time. This property gives MapReduce flexible task scheduling and better utilization of computing resources, but when one tries to apply failure detection based on monitoring data as described above, it is unclear between which apparatuses the correlation should be calculated.

Also, given that the advantage of MapReduce as a programming technique is that a wide variety of parallel distributed software can be built easily, a MapReduce parallel distributed processing system may well serve not just a few specific application systems but many different ones. The jobs executed by the cluster will then differ in character, some making heavy use of processor resources and others of disk I/O, and the loads they place on the apparatuses will vary accordingly. As a result, when applying failure detection by the above approach, it becomes difficult to pick out, from the many kinds of monitoring data that can be acquired from an apparatus, those that should be subject to correlation calculation.

Moreover, the more effectively a parallel distributed processing system is used, the more likely it is that new information processing apparatuses will be added to expand it. In an era of rapid progress in apparatus performance such as the present, the computing resources of the information processing apparatuses that make up a cluster will consequently become non-uniform. This too makes failure detection by the above approach difficult to apply.

In short, an approach that detects failures by analyzing correlations in monitoring data must answer the question of which monitoring data, from which information processing apparatuses, should be selected and analyzed as a pair.

For example, the technique disclosed in Non-Patent Document 1 uses a statistical method called canonical correlation analysis to treat diverse monitoring data together as a single object of analysis. This is one way of solving the problem of which of an apparatus's monitoring data to select. However, it does not solve the problem described above of determining which apparatuses should be regarded as correlated with each other.

Thus, known techniques fall short even though failure detection is important in parallel distributed processing systems. In particular, they cannot handle a system, such as a MapReduce cluster, in which failure detection is to be based on monitoring data acquired from multiple information processing apparatuses but it is not determined in advance which apparatuses will be correlated, and in which the correlated combinations change while the system is running.

Accordingly, a method, program, apparatus, and system for detecting failures in a parallel distributed processing system are provided. They detect failures despite, for example, the diversity of jobs, the non-determinism of job execution scheduling, and changes to the system configuration during operation.

To solve the above problems, the configurations described in the claims, for example, are adopted.

The present application includes multiple means for solving the above problems. One example has the following configuration.

An operation management apparatus for an information processing system in which a plurality of information processing apparatuses execute a job cooperatively, the operation management apparatus comprising a data collection unit that acquires information from each of the plurality of information processing apparatuses, a storage unit that stores data on the plurality of information processing apparatuses, and an evaluation unit that evaluates the states of the plurality of information processing apparatuses using the data stored in the storage unit. Each of the plurality of information processing apparatuses has one of a predetermined plurality of characteristics.

The data collection unit acquires performance information from each of the plurality of information processing apparatuses and stores it in the storage unit. The storage unit further stores, for each combination of characteristics that two information processing apparatuses can have, a threshold for the correlation between the performance information of those two apparatuses. When evaluating the state of one information processing apparatus selected as the evaluation target, the evaluation unit, for each of the information processing apparatuses other than the evaluation target, calculates a correlation value between its performance information and that of the evaluation target and identifies the combination of its characteristic with that of the evaluation target. The evaluation unit then compares the calculated correlation value with the threshold stored in the storage unit for the identified combination of characteristics, and evaluates the state of the evaluation target based on the result of the comparison.
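The evaluation step in this configuration can be sketched in a few lines. The sketch rests on several assumptions that the text above does not fix: the correlation value is taken to be the Pearson coefficient, the characteristic labels are placeholders, and the final rule (flag the target when more than `min_fraction` of its peers fall below their thresholds) is a hypothetical way of turning the comparison results into a state.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def evaluate_host(target, performance, characteristics, thresholds, min_fraction=0.5):
    """Evaluate one apparatus against every other apparatus.

    performance:     host -> list of performance samples (as collected and stored)
    characteristics: host -> characteristic label of that host
    thresholds:      frozenset of the two labels -> correlation threshold
    """
    peers = [host for host in performance if host != target]
    below = 0
    for peer in peers:
        value = pearson(performance[target], performance[peer])
        # Identify the combination of characteristics for this pair of hosts.
        combination = frozenset((characteristics[target], characteristics[peer]))
        if value < thresholds[combination]:
            below += 1
    # Assumed decision rule: suspect if most peers disagree with the target.
    if peers and below / len(peers) > min_fraction:
        return "suspect"
    return "normal"
```

Keying the thresholds by an unordered pair of labels means that a pair of apparatuses with different characteristics can be held to a looser threshold than a pair with the same characteristic.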

This makes failure detection possible in a parallel distributed processing system, including, for example, one in which a wide variety of jobs are executed.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

A diagram showing an example of the configuration of an information processing apparatus.
A diagram showing an example of the overall configuration of a parallel distributed processing system.
A diagram showing an example of the configuration of a MapReduce cluster.
A diagram showing an example of the execution flow of MapReduce processing.
A diagram showing an example of the concept of the number of slots of a task tracker.
A diagram showing an example of the configuration of a monitoring agent.
A diagram showing an example of the processing procedure of a monitoring agent.
A diagram showing an example of the configuration of a monitoring manager.
A diagram showing an example of the processing procedure of a monitoring agent.
A diagram showing an example of the group of tables stored in the database of the monitoring manager.
A diagram showing an example of a managed host list table.
A diagram showing an example of a processor performance information table, which is one of the tables that store OS performance information.
A diagram showing an example of a memory performance information table, which is one of the tables that store OS performance information.
A diagram showing an example of a disk performance information table, which is one of the tables that store OS performance information.
A diagram showing an example of a job list, which is one kind of MapReduce scheduling information.
A diagram showing an example of a task list, which is one kind of MapReduce scheduling information.
A diagram showing an example of an attempt list, which is one kind of MapReduce scheduling information.
A diagram showing an example of a data transfer trace, which is one kind of MapReduce scheduling information.
A diagram showing an example of the operating status evaluation procedure.
A diagram showing an example of the virtual group generation procedure.
A diagram showing an example of a virtual group table.
A diagram showing an example of a virtual group node list table.
A diagram showing an example of the concept of a virtual group.
A diagram showing an example of the node characteristic determination procedure.
A diagram showing an example of a table used for node characteristic determination.
A diagram showing another example of a table used for node characteristic determination.
A diagram showing an example of the screen display used to configure the table used for node characteristic determination.
A diagram showing an example of the cluster map generation procedure.
A diagram showing an example of a cluster map table.
A diagram showing an example of the concept of a cluster map.
A diagram showing an example of the node performance matrix generation procedure.
A diagram showing an example of a node performance matrix table.
A diagram showing an example of the correlation calculation procedure.
A diagram showing an example of a job profile table.
A diagram showing an example of the event notification procedure.
A diagram showing an example of the event notification screen display.
A diagram showing an example of the procedure for displaying the operating status of managed hosts.
A diagram showing an example of the monitoring console screen display.
A diagram showing another example of the monitoring console screen display.
A diagram showing an example of the monitoring console screen display.
A diagram showing another example of the monitoring console screen display.
A diagram showing an example of the overall configuration of a parallel distributed processing system according to the second embodiment.
A diagram showing an example of the configuration of the monitoring manager and the remote monitor according to the second embodiment.
A diagram showing an example of the node performance matrix generation procedure according to the third embodiment.
A diagram showing an example of a managed host list table according to the third embodiment.
A diagram showing an example of a job profile table according to the fourth embodiment.
A diagram showing an example of an analysis algorithm table according to the fourth embodiment.
A diagram showing an example of the correlation calculation procedure according to the fourth embodiment.
A diagram showing an example of the screen display used for analysis algorithm setting according to the fourth embodiment.
A diagram showing an example of the concept of the automatic analysis algorithm determination method according to the fourth embodiment.
A diagram showing another example of the concept of the automatic analysis algorithm determination method according to the fourth embodiment.
A diagram showing an example of the automatic analysis algorithm determination procedure according to the fourth embodiment.
A diagram showing an example of the concept of the failure detection method according to the fourth embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings. In the drawings, identical parts are given identical reference numerals, and repeated descriptions of them are omitted or simplified.

First, as a first embodiment, an example of a system operation management apparatus with a failure detection function is described.

FIG. 1 is a diagram showing an example of the configuration of an information processing apparatus.

The information processing apparatus 100 comprises a processor 101, a memory 102, a storage 103, a network I/F 104, and a console 105. The processor 101 is connected to the memory 102, the storage 103, the network I/F 104, and the console 105. The network I/F 104 is connected to a network 106.

The information processing apparatus 100 is, for example, a rack-mount server, a blade server, or a personal computer. The information processing apparatus 100 may have more than one of any of the processor 101, the memory 102, the storage 103, the network I/F 104, and the console 105. The storage 103 is, for example, a hard disk drive (HDD) or a solid state drive (SSD), or a combination of several such drives. The network 106 is, for example, Ethernet (registered trademark) or a wireless network based on the IEEE 802.11 standard.

The storage 103 can record data in a nonvolatile manner and read it back. The network I/F 104 can communicate, via the network 106 to which it is connected, with the network I/F 104 of another information processing apparatus 100. The console 105 can display text information, graphical information, and the like on a display device, and can receive information from a connected human interface device.

The information processing apparatus 100 holds a user process 200 and an operating system (OS) 210 in the memory 102. The user process 200 and the operating system 210 are both programs executed by the processor 101 of the information processing apparatus 100. Through them, the information processing apparatus 100 can read and write data in the memory 102 and the storage 103, communicate via the network I/F 104 and the network 106 with the user process 200 and the operating system 210 held in the memory 102 of another information processing apparatus 100, and display and receive information through the console 105.

The system operation management apparatus, or the parallel distributed processing system, described in this embodiment has the same configuration as the information processing apparatus 100 shown in FIG. 1.

FIG. 2 is a diagram showing an example of the overall configuration of the parallel distributed processing system.

The monitoring server 110, the client 120, the master node 130, and the worker node 140 are each an information processing apparatus 100 on which a distinctive set of user processes 200 is installed. For example, the monitoring server 110 runs a monitoring manager 201 as a user process. The client 120 runs a monitoring console 202. The master node 130 runs a job tracker 203, a name node 204, and a monitoring agent 205. The worker node 140 runs a task tracker 206, a data node 207, and a monitoring agent 205. These information processing apparatuses can communicate with one another via the network 106.

The MapReduce cluster 300 includes one master node 130 and one or more worker nodes 140.

The parallel distributed processing system may include multiple monitoring servers 110 and multiple clients 120. It may also include multiple MapReduce clusters 300.

The system operation management apparatus in this embodiment consists of the monitoring server 110 and the monitoring manager 201, together with either or both of (a) the client 120 with its monitoring console 202 and (b) the monitoring agent 205. The monitoring server 110 may also serve as the client 120. Note that there is some freedom in how user processes are mapped to information processing apparatuses, and this embodiment is only one of many possible combinations.

The person in charge of operation management of the parallel distributed processing system monitors the system based on the information that the monitoring console 202, implemented on the client 120, displays via the console 105. The monitoring console 202 also receives information that the operation manager enters via the console 105 and sends it to the monitoring manager 201, which changes its behavior accordingly. Examples of this interaction between the operation manager and the system operation management apparatus through the monitoring console are described later.

FIG. 3 is a diagram illustrating an example of the relationship between the master node 130 and the worker nodes 140 that constitute the MapReduce cluster 300.

The job tracker 203 of the master node 130 executes a job 301. A job 301 is a set of one or more tasks 302. A task 302 is one form of user process 200, and a worker node 140 can execute one or more tasks 302. The job tracker 203 communicates with the task tracker 206 of each worker node 140; that is, it instructs the task tracker 206 to execute tasks 302. The central idea of the MapReduce method is to improve processing efficiency by distributing the tasks 302 that make up one job 301 across a plurality of worker nodes 140 and executing them in parallel.

A user of the MapReduce cluster instructs the job tracker 203 which job 301 to execute. The instruction may be given from the console 105 of the master node 130, or from another information processing apparatus 100 by communicating over the network 106. The job tracker 203 distributes the tasks 302 that make up the job 301 to the task trackers 206 under its management, and each task tracker 206 executes the tasks 302 distributed to it.

The MapReduce cluster 300 also provides a distributed file system that stores data in the storage 103 of the worker nodes 140. It holds the data that tasks 302 need for their processing. The name node 204 of the master node 130 holds information (metadata) indicating which worker node 140 stores a given piece of data. A task 302 that needs data communicates with the name node 204: it obtains the metadata for the required data from the name node 204, then communicates with the data node 207 running on the worker node 140 where the data is stored and requests the data. The data node 207 stores data in the storage 103 divided into data blocks 303, and transfers the requested data to the task 302.
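The two-step lookup described above (metadata from the name node, then data from the data node) can be illustrated with a minimal in-memory sketch. The class and method names (`NameNode.locate`, `DataNode.read`, `fetch`), the one-worker-per-file metadata, and the 4-byte block size are assumptions made for illustration only; they are not taken from the specification or from any real file system API.

```python
class DataNode:
    """Holds data as fixed-size blocks (cf. data blocks 303). Hypothetical sketch."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self._blocks = {}  # file name -> list of byte blocks

    def store(self, name, data):
        # Divide the data into blocks before storing it.
        self._blocks[name] = [
            data[i:i + self.block_size]
            for i in range(0, len(data), self.block_size)
        ]

    def read(self, name):
        # Reassemble the blocks and return the requested data.
        return b"".join(self._blocks[name])


class NameNode:
    """Holds metadata: which worker node stores which data. Hypothetical sketch."""

    def __init__(self):
        self._metadata = {}  # file name -> worker host name

    def register(self, name, worker):
        self._metadata[name] = worker

    def locate(self, name):
        return self._metadata[name]


def fetch(name, name_node, data_nodes):
    """What a task does: ask the name node first, then the data node."""
    worker = name_node.locate(name)       # step 1: obtain metadata
    return data_nodes[worker].read(name)  # step 2: request the data
```

A real distributed file system would replicate blocks across several workers; the sketch keeps a single location per file so that the two communication steps stay visible.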

FIG. 4 is a diagram illustrating the execution flow of a MapReduce job 301.

As described above, a job 301 is a set of one or more tasks 302. Tasks 302 are of two types: Map tasks 305 and Reduce tasks 307. A Map task 305 reads a split 304, an input file placed in the distributed file system, performs some processing, generates an intermediate file 306, and writes it to the storage 103 of the worker node 140. The intermediate file 306 is in key-value format, and the key determines which Reduce task 307 receives the data as input. That is, the value holds the result of the processing by the Map task 305, and the key holds a value designating the Reduce task 307 to which that result should be input. The task tracker 206 transfers the data having a particular key from the intermediate files 306 to the worker node 140 that executes the corresponding Reduce task 307. The task tracker 206 sorts the transferred data and inputs it to the Reduce task 307. The Reduce task 307 performs some processing on the data and produces the result as an output file 308 on the distributed file system.
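The flow above (split, Map task, key-value intermediate file, transfer by key, sort, Reduce task, output) can be sketched in a single process as a word count. This is an illustrative toy, not the specification's implementation: the function names, the word-count workload, and the rule that maps a key to a Reduce task (a byte-sum modulo the number of reducers) are all assumptions.

```python
from collections import defaultdict


def map_task(split):
    """Map task: read one split and emit key-value pairs (the intermediate file)."""
    return [(word, 1) for word in split.split()]


def shuffle(intermediate_files, num_reducers):
    """Route each pair to a Reduce task by its key, then sort each reducer's input."""
    partitions = defaultdict(list)
    for pairs in intermediate_files:
        for key, value in pairs:
            # Hypothetical routing rule: a deterministic function of the key.
            reducer = sum(key.encode()) % num_reducers
            partitions[reducer].append((key, value))
    return {r: sorted(pairs) for r, pairs in partitions.items()}


def reduce_task(sorted_pairs):
    """Reduce task: aggregate the values for each key."""
    totals = defaultdict(int)
    for key, value in sorted_pairs:
        totals[key] += value
    return dict(totals)


def run_job(splits, num_reducers=2):
    """Run all Map tasks, shuffle, then run one Reduce task per partition."""
    intermediate = [map_task(s) for s in splits]
    output = {}
    for pairs in shuffle(intermediate, num_reducers).values():
        output.update(reduce_task(pairs))
    return output
```

For example, `run_job(["a b a", "b c"])` counts the words across both splits. In a cluster, each `map_task` and `reduce_task` call would run on a different worker node under the control of a task tracker.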

FIG. 5 is a diagram illustrating the concept of the slot count of the task tracker 206.

The task tracker 206 has a "slot count" setting. This is the upper limit on the number of tasks that one task tracker 206 executes simultaneously on a worker node 140, and it can be set separately for Map tasks 305 and Reduce tasks 307. The figure shows a state in which the slot count is set to 4 for each. These values are upper limits; the task tracker is not always executing that many tasks. The task tracker 206 sends its slot count settings to the job tracker 203, and the job tracker 203 uses that information to designate the tasks each task tracker 206 should execute.

FIG. 6 is a diagram illustrating an example of the configuration of the monitoring agent 205 and an example of its processing procedure.

FIG. 6A shows an example of the configuration of the monitoring agent 205. The monitoring agent 205 consists of a monitoring data acquisition unit 2051 and a monitoring data transmission unit 2054. The monitoring data acquisition unit 2051 has an OS performance information acquisition unit 2052 and a MapReduce scheduling information acquisition unit 2053. So that the monitoring agent 205 can acquire monitoring data from a variety of monitoring targets, the monitoring data acquisition unit 2051 is configured to use target-specific acquisition functions as plug-ins. In this embodiment, it uses two plug-ins: the OS performance information acquisition unit 2052, which acquires OS performance information from the operating system (OS) 210, and the MapReduce scheduling information acquisition unit 2053, which acquires MapReduce scheduling information from the job tracker 203 and the data node 207.

The monitoring data transmission unit 2054 transmits the monitoring data acquired by the monitoring data acquisition unit 2051 and its plug-ins to the monitoring manager 201. Transmission may be unicast or multicast.

FIG. 6B shows an example of the processing procedure of the monitoring agent 205. The monitoring agent 205 acquires OS performance information (S601), acquires MapReduce scheduling information (S602), transmits the acquired monitoring data to the monitoring manager 201 (S603), waits for a fixed time (S604), and then starts again from step S601. The procedure is thus a single loop: while the agent is running, it keeps sending monitoring data to the monitoring manager 201 at regular intervals.
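The agent loop S601 to S604 can be sketched as follows. The acquisition plug-ins and the transport are passed in as callables because the specification does not fix them; the parameter names, the message layout, and the `max_iterations` stop condition (added so the sketch can terminate) are assumptions.

```python
import time


def run_agent(get_os_perf, get_mr_sched, send,
              interval_sec=60.0, max_iterations=None, sleep=time.sleep):
    """Single monitoring loop: acquire, acquire, send, wait, repeat."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        os_perf = get_os_perf()      # S601: acquire OS performance information
        mr_sched = get_mr_sched()    # S602: acquire MapReduce scheduling information
        send({                       # S603: transmit to the monitoring manager
            "os_performance": os_perf,
            "mapreduce_scheduling": mr_sched,
        })
        sleep(interval_sec)          # S604: wait for a fixed time
        iterations += 1
    return iterations
```

Passing the plug-ins as callables mirrors the plug-in structure of the monitoring data acquisition unit 2051: supporting a new monitoring target means supplying another callable rather than changing the loop.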

FIG. 7 is a diagram illustrating an example of the configuration of the monitoring manager 201 and an example of its processing procedure. FIG. 7A shows an example of the configuration. The monitoring manager 201 consists of a monitoring data collection unit 2011, a monitoring data storage unit 2012, a database 2013, an operating status evaluation unit 2014, and an event notification unit 2015. The monitoring data collection unit 2011 collects the monitoring data transmitted by the monitoring agents 205. The monitoring data storage unit 2012 stores the collected monitoring data in the database 2013; in this embodiment, it stores the MapReduce scheduling information and OS performance information transmitted by the monitoring agents 205. The operating status evaluation unit 2014 evaluates the operating status of the parallel distributed processing system based on the monitoring data stored in the database 2013. When it detects the occurrence of a failure, the event notification unit 2015 sends an event notification to the monitoring console 202.

FIG. 7B shows an example of the processing procedure of the monitoring manager 201. The processing consists of two loops. The first loop receives the monitoring data transmitted by the monitoring agents 205 (S701), stores the MapReduce scheduling information in the database (S702), and stores the OS performance information in the same database (S703). The second loop evaluates the operating status based on information obtained from the database 2013 (S704), sends an event notification to the monitoring console 202 (S706) if the occurrence of a failure is detected (S705), and then waits for a fixed time (S707). Steps S704 and S706 are described in more detail later.

As described above, the monitoring agent 205 and the monitoring manager 201 communicate periodically to exchange monitoring data. In this embodiment, the communication is initiated by the monitoring agent 205 executing step S603, in which it transmits monitoring data. The mode of communication is not limited to this, however; for example, the monitoring manager 201 may periodically request the monitoring agent 205 to transmit monitoring data.

FIG. 8 is a diagram illustrating an example of the tables stored in the database 2013 of the monitoring manager 201. The database 2013 stores a managed host list table 401. It also stores a group of OS performance information tables 402 for each host and a group of MapReduce scheduling information tables 403 for each cluster.

FIG. 9 is a diagram illustrating an example of the managed host list table 401. A managed host is an information processing apparatus 100 whose operating status the monitoring manager 201 evaluates. In this embodiment, the managed hosts are typically the group of worker nodes 140, but other information processing apparatuses 100 can also be managed hosts.

One record of the table corresponds to one managed host. A managed host may be added to the table by an operation of the operation manager, by notification from a monitoring agent 205, by discovery processing of the monitoring manager 201, and so on.

FIG. 9 lists only those fields of the table that are needed for the description. The host name field 4011 records the host name of the managed host. The representative IP address field 4012 records the IP address assigned to one of the network I/Fs 104 of the managed host. The cluster name field 4013 records the MapReduce cluster 300 to which the managed host belongs. The failure detection flag field 4014 stores a flag indicating whether the occurrence of a failure has been detected on the managed host. Note that the table may also contain other fields for recording information needed by the monitoring manager.

FIG. 10 is a diagram illustrating an example of the group of OS performance information tables 402. OS performance information is monitoring data that the monitoring agent 205 acquires from the operating system 210 and transmits to the monitoring manager 201. The monitoring manager 201 stores the received OS performance information in the table group 402 in the database 2013.

In this embodiment, a processor performance information table 4021, a memory performance information table 4022, and a disk performance information table 4023 are shown as individual examples of OS performance information tables. Common to all of them is that each record contains the time at which the information was acquired and the acquisition interval. For processors and disks, the interval indicates the number of seconds over which the values were accumulated to compute the record.

For processor performance information, a usage rate is further calculated from the accumulated value and recorded. For disk performance information, the I/O volume per unit time is calculated from the accumulated value and recorded. For memory performance information, on the other hand, the acquired value is a snapshot at the time of acquisition, and the interval is literally the interval between acquisitions. Each record further contains multiple items of detailed performance information. Each of these items is called a metric.
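A minimal sketch of how such per-interval values might be derived from cumulative counters follows. It assumes the operating system exposes a cumulative busy time in seconds for a processor and a cumulative byte count for a disk; the function names and counter semantics are hypothetical, since the specification does not state how the accumulated values are obtained.

```python
def processor_usage_rate(busy_sec_prev, busy_sec_now, interval_sec):
    """Fraction of the interval the processor was busy (0.0 to 1.0)."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return (busy_sec_now - busy_sec_prev) / interval_sec


def disk_io_rate(bytes_prev, bytes_now, interval_sec):
    """I/O volume per second over the interval."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return (bytes_now - bytes_prev) / interval_sec
```

Memory metrics need no such differencing because each acquired value is already a snapshot.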

An information processing apparatus 100 may be equipped with multiple processors 101 and with multiple disks constituting the storage 103. The monitoring agent 205 acquires these from the operating system 210 as separate monitoring data, and the monitoring manager 201 records them as separate records in the OS performance information table group 402, each distinguished by its processor ID or device name.

This embodiment takes the above three as representative examples of OS performance information, but it is not limited to them. Any other statistical information that can be acquired from the operating system 210 can likewise be one of the tables in the OS performance information table group 402.

FIG. 11 is a diagram illustrating an example of the group of MapReduce scheduling information tables 403. MapReduce scheduling information is monitoring data that the monitoring agent 205 acquires from the job tracker 203 and the data node 207 and transmits to the monitoring manager 201. The monitoring manager 201 stores the received MapReduce scheduling information in the table group 403 in the database 2013.

In this embodiment, the MapReduce scheduling information tables are a job list 404, a task list 405, an attempt list 406, and a data transfer trace 407.

The relationship between jobs 301 and tasks 302 in the MapReduce method is as described with reference to FIG. 3. The job list 404 holds one record per job. A job is uniquely identified by the job ID recorded in the job ID field. The task list 405 holds one record per task. A task is uniquely identified by the task ID recorded in the task ID field, and the job to which the task belongs is identified by the job ID recorded in its job ID field.

An execution of a task 302 in the MapReduce method is called an attempt. The attempt list 406 holds one record per attempt. An attempt is uniquely identified by the attempt ID recorded in the attempt ID field, and the task from which the attempt originates is identified by the task ID recorded in its task ID field. Normally only one attempt is recorded per task, but a task may be re-executed, for example when its execution fails, so the same task may be executed more than once. In that case, the attempt list 406 records multiple attempts for the same task ID. The execution node field of the attempt list 406 records the host name of the worker node 140 that executed the attempt.

As described above, the MapReduce cluster 300 includes a distributed file system. The data transfer trace 407 records the data transferred by the data nodes 207, with one record per data transfer.

FIG. 12 is a diagram illustrating an example of the operating status evaluation procedure (step S704) within the processing of the monitoring manager. Based on this figure, the evaluation procedure is first outlined, and the individual steps are then described in more detail.

The operating status evaluation unit 2014 first acquires, from the MapReduce scheduling information 403 in the database 2013, the job list 404, the task list 405, and the attempt list 406. Of these three tables, the job list 404 and the task list 405 both contain a job ID field, and the task list 405 and the attempt list 406 both contain a task ID field. The attempt list 406 also contains an execution node field. Joining the three tables on these fields therefore identifies the group of worker nodes 140 used to execute a given job. The process of extracting this group of worker nodes is virtual group generation (S1201).

Next, the operating status evaluation unit 2014 acquires the managed host list table 401 from the database 2013. For each managed host, it uses the task list 405 and the attempt list 406 to find the task type of the tasks that were being executed on that host. It then acquires the data transfer trace 407, adds the information obtained from it about the data transferred from the managed host, and determines the node characteristic of each managed host (S1202). The operating status evaluation unit 2014 then merges the previously generated virtual groups with the node characteristics of the managed hosts to generate a cluster map (S1203).

Once the cluster map has been generated, a node performance matrix is generated for each managed host from its OS performance information 402 (S1204), and correlation analysis is performed on the node performance matrices by calculating canonical correlation coefficients (S1205). The operating status evaluation unit 2014 detects the occurrence of a failure from the result of the correlation analysis. If a failure is detected, processing moves to the event notification procedure of the event notification unit 2015 (step S706).

The detailed procedure of each of the above steps is described below.

FIG. 13 is a diagram illustrating an example of the virtual group generation procedure (step S1201) within the operating status evaluation procedure. The operating status evaluation unit first acquires the job list 404 (S1301) and extracts from it the jobs whose status is RUNNING or whose end time has not been recorded (S1302). This yields the jobs currently being executed, which are called the current jobs.

Next, it acquires the task list 405 (S1303) and extracts the records that contain the job ID of a current job (S1304). From these it further extracts the tasks whose status is RUNNING or whose end time has not been recorded (S1305). This yields the tasks currently being executed.

Next, it acquires the attempt list 406 (S1306), extracts the records that contain the task ID of a task being executed (S1307), and from these further extracts the attempts whose status is RUNNING or whose end time has not been recorded (S1308).

Among the records extracted so far, those extracted from the task list have a split field, and those extracted from the attempt list have an execution node field. For each current job, these correspondences are entered in a virtual group table (S1310, S1311).

FIG. 14A is a diagram illustrating an example of this virtual group table 501.

All nodes that hold a split or are execution nodes for a job are taken as the virtual group of that job (S1312). That is, one virtual group is formed for each current job. FIG. 14B is a diagram illustrating an example of the virtual group node list table 502, which is obtained by extracting the job ID, job name, and node name from the virtual group table 501. This table lists the nodes that belong to a given virtual group.
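The extraction and join in steps S1301 to S1312 can be sketched over plain lists of dictionaries. The field names (`job_id`, `task_id`, `status`, `end_time`, `exec_node`) and in particular `split_nodes`, a hypothetical field listing the hosts that hold a task's split, are assumptions; the specification names the fields only informally and does not say how a split is resolved to host names.

```python
def is_running(record):
    """A record is current if its status is RUNNING or no end time is recorded."""
    return record.get("status") == "RUNNING" or record.get("end_time") is None


def build_virtual_groups(job_list, task_list, attempt_list):
    """Return {job_id: set of node names}, one virtual group per current job."""
    # S1301-S1302: extract the current jobs.
    current_jobs = {j["job_id"] for j in job_list if is_running(j)}

    # S1303-S1305: running tasks that belong to a current job.
    running_tasks = [
        t for t in task_list
        if t["job_id"] in current_jobs and is_running(t)
    ]
    task_to_job = {t["task_id"]: t["job_id"] for t in running_tasks}

    groups = {job_id: set() for job_id in current_jobs}

    # Nodes that hold a split of a running task.
    for task in running_tasks:
        groups[task["job_id"]].update(task.get("split_nodes", []))

    # S1306-S1308: running attempts of those tasks give the execution nodes.
    for attempt in attempt_list:
        job_id = task_to_job.get(attempt["task_id"])
        if job_id is not None and is_running(attempt):
            groups[job_id].add(attempt["exec_node"])

    # S1312: every node holding a split or executing a task is in the group.
    return groups
```

In a relational database the same result would come from joining the three tables on job ID and task ID; the dictionary lookups above play the role of those joins.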

The operating status evaluation unit 2014 saves the virtual group table 501 and the virtual group node list table 502 in the memory 102 of the monitoring server 110 for use in later processing. Alternatively, they may be stored in the database 2013.

FIG. 15 is an example of a diagram illustrating the concept of the virtual group described above. For a given job, the virtual group 503 contains the execution nodes of its tasks 302 and the nodes that hold its splits (Map task input files) 304.

FIG. 16 is a diagram illustrating an example of the node characteristic determination procedure (step S1202). Node characteristic determination establishes what role a node plays in its virtual group. This role is called the node characteristic. The inputs to the determination are the table of correspondences created during virtual group generation and the data transfer trace. By referring to the transfer destination field of the data transfer trace, it is possible to determine where the splits held by the node are transferred.

Node characteristics are determined for each managed host. First, whether the managed host belongs to a virtual group is determined from the virtual group node list table 502 (S1601). If it does, processing continues; if not, the host is excluded from operating status evaluation and processing ends (S1607). For a host that belongs to a virtual group, the virtual group table 501 is used to determine whether it is an execution node (S1602). If it is, its task type is determined (S1603). Next, the virtual group table 501 is used to determine whether the host holds a split (S1604). If it does, the data transfer trace is used to determine the transfer destination of the split (S1605). The node characteristic is then determined (S1606) using the information obtained by these steps, in particular the task type determination S1603 and the transfer destination determination S1605, together with the table described next.

FIG. 17 is a diagram illustrating examples of the tables used for determining node characteristics and an example of the screen used to configure them.

FIG. 17A is an example of the simplest classification of node characteristics. In this example, each node is configured with one slot (it can execute only one task at a time); the task type is Map, Reduce, or None (not an execution node), and the transfer destination of the split is local, remote, or no transfer.

From the information obtained in steps S1603 and S1605, it can be determined which cell of this table 504 the managed host falls into. The symbol written in that cell (ML, MR, and so on) is the node characteristic of the managed host. The symbols in table 504 were chosen for convenience to distinguish the node characteristics from one another, and any system of symbols that serves this purpose may be used.

FIG. 17B is an example of a somewhat more complex classification of node characteristics. A managed host may hold multiple splits and transfer them to various destinations. From the data transfer trace, the destinations to which the managed host transferred splits and the number of bytes transferred can be obtained. The transfers are divided into local transfers (L) and remote transfers (R) and classified into six levels according to the ratio of their transfer volumes. In addition, when multiple slots are configured on a node, the tasks being executed are classified into five levels according to the ratio of the number of Map tasks (M) to Reduce tasks (R), or into six levels when the state in which no task is being executed (None) is added.

Many different tables could be used for determining node characteristics, and which one is appropriate will differ from one parallel distributed processing system to another. The operation manager of the parallel distributed processing system is therefore allowed to instruct the monitoring manager which table to use.

FIG. 17C shows an example of a preference screen for configuring the table used to determine node characteristics. The preference screen 601 is displayed by the monitoring console 202 on the monitoring console screen 600. It includes a node characteristic use check box 6011, a node characteristic automatic determination check box 6012, a preset method use check box 6013, a method selection drop-down box 6014, a custom method creation check box 6015, and a custom method table 6016. The custom method table 6016 consists of multiple entries, each of which has a metrics use check box 6017 and a metric name 6018. The preference screen 601 also includes OK/Cancel buttons 6019.

The operations manager specifies the table used to determine node characteristics by using the monitoring console screen 600, which is displayed on the console 105 of the client 120 on which the monitoring console 202 is installed, together with the human interface devices of that console 105. Checking the node characteristic use check box 6011 instructs the system to apply node characteristics to the operation status evaluation. Checking the node characteristic automatic determination check box 6012 instructs the monitoring manager to determine node characteristics automatically. Checking this box 6012 also enables the operations on the node characteristic determination method described below.

Checking the preset method use check box 6013 instructs the system to use a node characteristic determination method registered in advance in the monitoring manager. Once the preset method use check box 6013 is checked, the method selection drop-down box 6014 becomes available. By operating the drop-down box 6014, the operations manager selects which of the pre-registered node characteristic determination methods to use. For example, if the managed hosts are nodes of a MapReduce cluster and a determination method named "MapReduce" is registered as a suitable method for that case, that method is selected.

If the operations manager judges that none of the node characteristic determination methods registered in advance in the monitoring manager is appropriate, the manager can check the custom method creation check box 6015 to instruct the system to use a customized node characteristic determination method. Checking the custom method creation check box 6015 enables operation of the custom method table 6016.

The custom method table 6016 lists the various monitoring data that the monitoring manager collects from the monitoring agents and is used to indicate which of them are to be used for node characteristic determination. The monitoring data are listed as entries of the custom method table 6016; when there are many, only part of the list is shown and a scroll bar is provided. The name of the monitoring data corresponding to each entry is displayed as the metric name 6018. Checking the metrics use check box 6017 of an entry instructs the system to use the monitoring data corresponding to that entry for node characteristic determination.

When the operations manager checks the metrics use check boxes in the custom method table 6016 that he or she judges appropriate and presses the OK button of the OK/Cancel buttons 6019, the monitoring console sends the information on the selected metrics to the monitoring manager. Based on that information, the monitoring manager builds the table used for node characteristic determination and saves it in the memory 102 of the monitoring server 110. Alternatively, the table may be stored in the database 2013.

FIG. 18 shows an example of the processing procedure for cluster map generation (step S1203).

A cluster map classifies the nodes that belong to a virtual group by node characteristic. First, the node characteristic determination results are acquired (S1801). Because there is one virtual group per current job, the results are first sorted by job ID (S1802). They are then sorted by node characteristic (S1803), which groups the nodes by node characteristic within each job.
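The two sorting steps can be sketched as follows. This is a minimal illustration, assuming the determination results arrive as dictionaries with hypothetical keys `job_id`, `node`, and `characteristic`; the two sorts (S1802, S1803) are expressed as one sort on a composite key, which yields the same grouping.

```python
from itertools import groupby
from operator import itemgetter


def build_cluster_map(records):
    """Group nodes by job ID, then by node characteristic.

    records: iterable of dicts with the (assumed) keys
             "job_id", "node" and "characteristic".
    Returns {job_id: {characteristic: [node, ...]}}.
    """
    key = itemgetter("job_id", "characteristic")
    # One stable sort on (job ID, node characteristic) stands in for
    # steps S1802 and S1803.
    ordered = sorted(records, key=key)
    cluster_map = {}
    for (job_id, characteristic), group in groupby(ordered, key=key):
        nodes = [record["node"] for record in group]
        cluster_map.setdefault(job_id, {})[characteristic] = nodes
    return cluster_map
```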

FIG. 19 shows an example of the cluster map table.

The cluster map table 506 can be regarded as the virtual group node list table 502 shown in FIG. 14B with the node characteristic determination results appended and the rows sorted by node characteristic.

FIG. 20 is an example of a diagram illustrating the concept of the cluster map.

The cluster map 503 takes a job, identified by its job ID and job name, as the unit and classifies the nodes involved in executing that job by node type. A node 140 having a given node characteristic belongs to a node characteristic group 507 together with the other nodes that have the same node characteristic.

FIG. 21 shows an example of the processing procedure for node performance matrix generation (step S1204).

For each managed host, it is first determined whether the database contains OS performance information for that host (S2101). If not, the host is excluded from the operation status evaluation (S2105). If it does, the data for a fixed time frame are acquired from the OS performance information (S2102), all metrics are concatenated (S2103), and the node performance matrix is generated (S2104).

As this procedure shows, a node performance matrix is the concatenation of the OS performance information acquired from a node. For a host whose OS performance information is recorded in the database, it characterizes the host by its resource usage within a given time frame. Here the metrics are simply concatenated to form the node performance matrix, but other approaches are conceivable. For example, because past OS performance information is also recorded in the database, an exponentially weighted moving average could be calculated before the metrics are concatenated.
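A minimal sketch of steps S2102 to S2104 follows, including the optional exponentially weighted moving average mentioned as an alternative. The data layout (a dictionary of metric name to time-stamped samples), the function names, and the smoothing factor `alpha` are assumptions made for illustration; for simplicity the smoothing is applied only to samples inside the time frame.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha."""
    smoothed = []
    previous = None
    for value in values:
        previous = value if previous is None else alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


def build_node_performance_matrix(metrics, start, end, alpha=None):
    """Concatenate per-metric series into one matrix for a time frame.

    metrics: {metric_name: [(timestamp, value), ...]} sorted by timestamp.
    Returns (metric_names, rows); each row holds one sample time and each
    column one metric, mirroring the layout described for FIG. 22.
    """
    names = sorted(metrics)
    columns = []
    for name in names:
        # S2102: keep only samples inside the time frame [start, end).
        column = [value for timestamp, value in metrics[name] if start <= timestamp < end]
        if alpha is not None:
            column = ewma(column, alpha)
        columns.append(column)
    if not columns or len({len(column) for column in columns}) != 1:
        raise ValueError("every metric needs the same number of samples")
    # S2103/S2104: place the metrics side by side.
    rows = [list(row) for row in zip(*columns)]
    return names, rows
```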

The time frame is chosen with this purpose of characterizing the host in mind. The OS performance information records the interval at which the information was acquired. Whatever time frame is used, it must be common to all hosts within a single run of the operation status evaluation, and it must be chosen so that it contains multiple data points for each metric.

FIG. 22 shows an example of the node performance matrix table. The node performance matrix table stores a node performance matrix and is generated for each node subject to operation status evaluation. As concatenated in step S2103, the metrics of the OS performance information are laid out across the columns of the table, and the data for each metric within the time frame are arranged vertically in order of acquisition time. In this example the first row of the table contains the name of each metric and the leftmost column contains the data acquisition time, but these are shown only to make the contents of the table easier to understand and need not be included in the actual table.

FIG. 23 shows an example of the processing procedure for correlation analysis (step S1205).

Correlation analysis is performed for each managed host. First, it is determined whether the managed host belongs to a virtual group (S2301). If it does not, the managed host is not involved in executing any current job and is excluded from the operation status evaluation (S2314). Next, it is determined whether a node performance matrix exists for the managed host (S2302). If no node performance matrix was generated, for example because OS performance information was not acquired, and the node performance matrix table for the managed host therefore does not exist, the host is excluded from the operation status evaluation (S2314).

The subsequent processing focuses on the virtual group to which the managed host belongs. First, the node characteristics present in the virtual group are extracted (S2303). For each of those node characteristics, the nodes having that characteristic are extracted (S2304), and the correlation coefficient calculation is then performed for each of those nodes (S2305). The node and node characteristic information needed for this processing can be extracted from the cluster map table 506.

In this way, from the viewpoint of a given managed host, the correlation coefficient calculation is performed against all node attributes present in the virtual group to which the host belongs, regardless of whether a node's characteristic is the same as or different from the host's own. This makes it possible to handle any composition of node characteristics within the virtual group.

After one node is selected in step S2305 from the group of nodes having a given node characteristic, the node performance matrix Vn of that node is acquired (S2306). The canonical correlation coefficients between the node performance matrix Vp of the managed host and the node performance matrix Vn of that node are then calculated (S2307). The canonical correlation is expressed as one or more correlation coefficients ρ1 to ρn.
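For reference, the standard textbook definition of canonical correlation, which the source does not spell out, can be written as follows. The symbols are introduced here for illustration only: Σpp and Σnn denote the sample covariance matrices of the metric columns of Vp and Vn over the time frame, Σpn their cross-covariance (with Σnp its transpose), and both covariance matrices are assumed to be invertible.

```latex
\rho_1 \;=\; \max_{a,\,b}\;
  \frac{a^{\top}\,\Sigma_{pn}\,b}
       {\sqrt{a^{\top}\,\Sigma_{pp}\,a}\;\sqrt{b^{\top}\,\Sigma_{nn}\,b}},
\qquad
\rho_i^{2} \;\in\; \operatorname{eig}\!\left(
  \Sigma_{pp}^{-1}\,\Sigma_{pn}\,\Sigma_{nn}^{-1}\,\Sigma_{np}\right)
```

Under this definition the number of coefficients is at most the smaller of the two matrices' metric counts, and the covariance estimates require several samples per metric within the time frame.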

Among these canonical correlation coefficients, the number that exceed a certain threshold is the information that indicates the relationship between the two nodes for the current job. If this number is smaller than the corresponding number for the same pair of node characteristics when the job runs normally, the correlation between the two nodes has weakened. If this phenomenon is observed with respect to all other nodes, it is regarded as detection of a failure on the managed host. Performing this processing requires information on the canonical correlation coefficients between node characteristics when a given job runs normally; this information is called the default canonical correlation data. The table that stores it is called the job profile table and is described later.

Returning to the correlation analysis procedure: after the canonical correlation coefficients ρ1 to ρn are calculated, the default canonical correlation data for the relevant node characteristics are acquired from the job profile table, using as the key the pair consisting of the node characteristic of the managed host and that of the selected node (S2308). Next, the coefficients among ρ1 to ρn that exceed the threshold are selected (S2309), and if their number is determined to be smaller than the number of default canonical correlation data (S2310), a counter is incremented (S2311). The calculation of canonical correlation coefficients and the comparison with the default canonical correlation data are carried out in this way for every node characteristic in the virtual group and every node having that characteristic, that is, for all nodes in the virtual group. If the resulting counter value equals the number of nodes in the virtual group (excluding the managed host currently under examination) (S2312), it is determined that a failure has occurred on that managed host, and the failure detection flag of the corresponding record in the managed host list table is set to 1 (S2313).

The threshold required for this processing can be set to any value, provided it is applied consistently within one cluster or one system operation management apparatus. Likewise, the criterion applied to the counter used for failure determination can be set freely: a failure need not be declared only when the counter equals the number of nodes in the virtual group but may, for example, be declared when the counter exceeds half the number of nodes in the virtual group. This flexibility is needed to adapt to the complexity of parallel distributed processing systems and to the variety of failures that occur in information processing apparatuses.
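The counting logic of steps S2309 to S2312, together with the relaxed criterion just mentioned, can be sketched as follows. The canonical correlation coefficients themselves are taken as inputs; the function names and the `required_peers` parameter are illustrative assumptions, not terms from the source.

```python
def count_strong(correlations, threshold):
    """S2309: number of canonical correlation coefficients above the threshold."""
    return sum(1 for rho in correlations if rho > threshold)


def host_is_faulty(peer_results, threshold, required_peers=None):
    """Decide whether the managed host under examination looks faulty.

    peer_results: one (correlations, baseline_count) pair per other node in
        the virtual group, where `correlations` are the coefficients computed
        against that node and `baseline_count` is the number of default
        canonical correlation data for the matching node characteristic pair.
    required_peers: how many degraded peers trigger detection; None means
        all peers, which is the criterion of step S2312.
    """
    if not peer_results:
        return False
    # S2310/S2311: count peers whose correlation has weakened.
    degraded = sum(
        1
        for correlations, baseline_count in peer_results
        if count_strong(correlations, threshold) < baseline_count
    )
    needed = len(peer_results) if required_peers is None else required_peers
    return degraded >= needed
```

Passing a smaller `required_peers`, such as a majority of the peers, gives the more permissive criterion the text allows.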

FIG. 24 shows an example of the job profile table.

The job profile table 509 is built as follows for a job identified by its job name. The nodes involved in executing the job are classified by node characteristic, the canonical correlation coefficients of the node performance matrices are calculated between those nodes during normal operation, and the values greater than the predetermined threshold, that is, the default canonical correlation data, are recorded so that they can be looked up using the job name and the pair of node characteristics as the key. The operation status evaluation unit 2014 records the default canonical correlation data in the job profile table. In doing so it may record only the values greater than the threshold, as described above, or it may record all calculated canonical correlation coefficients and retrieve only the values greater than the threshold when the correlation calculation is performed. Furthermore, there can be multiple combinations of nodes with the same pair of node characteristics; among the canonical correlation coefficients calculated from each combination, the smallest value may be adopted as the default canonical correlation data, or the mean or median may be calculated and adopted. The operation status evaluation unit 2014 saves the job profile table 509 in the memory 102 of the monitoring server 110 for use in later correlation analysis. Alternatively, the table may be stored in the database 2013.
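One possible reading of how the job profile table is filled is sketched below. The source leaves the exact aggregation open, so this sketch assumes that coefficients from different node pairs are combined position by position (first with first, second with second) using the chosen aggregate and then filtered by the threshold; the function name and data layout are hypothetical.

```python
from statistics import mean, median  # alternative aggregates mentioned in the text


def build_job_profile(job_name, samples, threshold, aggregate=min):
    """Build default canonical correlation data for one job.

    samples: {(characteristic_a, characteristic_b): [coefficients, ...]}
        with one coefficient list per node pair observed while the job
        ran normally.
    aggregate: min (the default here), statistics.mean or statistics.median.
    Returns {(job_name, characteristic_a, characteristic_b): [rho, ...]}
    holding only the aggregated coefficients above the threshold.
    """
    profile = {}
    for (char_a, char_b), runs in samples.items():
        # zip stops at the shortest list, so only positions present in
        # every node pair are aggregated.
        combined = [aggregate(values) for values in zip(*runs)]
        profile[(job_name, char_a, char_b)] = [rho for rho in combined if rho > threshold]
    return profile
```

The length of each stored list then serves as the baseline count that the correlation analysis compares against.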

FIG. 25 shows an example of the event notification procedure (S706), which is part of the processing procedure of the monitoring manager.

Event notification is the process of extracting, from the managed hosts, those whose failure detection flag field 4014 was set to 1 as a result of the correlation calculation and notifying the monitoring console of information on those hosts. The event notification unit first acquires the managed host list table 401 (S2501) and examines the failure detection flag field 4014 of each record in the table (S2502). If the field is 1, the unit extracts the host name field 4011 from that record and notifies the monitoring console (S2503). Other fields, such as the cluster name field 4013, can also be sent together with the host name field.
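Steps S2501 to S2503 amount to a filter over the managed host list. A minimal sketch, assuming each record is a dictionary with the hypothetical keys `host`, `cluster`, and `fault_flag` standing in for fields 4011, 4013, and 4014:

```python
def collect_fault_notifications(host_table, include_cluster=False):
    """Return one notification per host whose failure detection flag is 1.

    host_table: list of records with the (assumed) keys "host",
        "cluster" and "fault_flag".
    include_cluster: also attach the cluster name to each notification.
    """
    notifications = []
    for record in host_table:  # S2502: inspect every record
        if record.get("fault_flag") == 1:
            event = {"host": record["host"]}  # S2503: report the host name
            if include_cluster:
                event["cluster"] = record.get("cluster")
            notifications.append(event)
    return notifications
```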

FIG. 26 shows an example of the event notification screen displayed on the monitoring console.

On receiving a notification from the event notification unit 2015 of the monitoring manager 201, the monitoring console 202 notifies the operations manager that a failure has occurred by displaying the monitoring console screen 600 and the event notification screen 602 on the console 105 of the client 120.

The event notification screen 602 consists of a cluster name display 6021 and, for each node belonging to that cluster, a pair of a node name display 6022 and a node status display 6023. When the event notification sent by the monitoring manager includes the cluster name field, the nodes are displayed on the monitoring console screen grouped by cluster, so that the operations manager can easily grasp the scope affected by the failure. The cluster name display 6021 is provided for this purpose. The host name contained in the event notification is shown in the node name display 6022. The node status display 6023 indicates, in a way the operations manager can recognize, that the monitoring manager has detected a failure on that node. Possible methods include display by text, display by a change in color, or a combination of these. Information useful to the operations manager in responding to the failure can also be displayed alongside.

As this event notification screen shows, the system operation management apparatus of this embodiment notifies the operations manager of a failure on a managed host by displaying information via the console 105. Various other methods are also possible, such as notification by e-mail, by sounding a buzzer, or by lighting a rotating warning light.

FIG. 27 shows an example of the procedure by which the monitoring console displays the operation status of managed hosts on screen. The monitoring console 202 can display the operation status of managed hosts on the console 105 of the client 120 without relying on event notification from the monitoring manager 201. This allows the operations manager to monitor the operation status of managed hosts regardless of whether a failure has occurred on them.

First, the monitoring console 202 acquires the managed host list table 401 (S2701). The table is stored in the database 2013 of the monitoring manager 201 and is obtained by communicating with the monitoring manager. Next, the hosts to be displayed are extracted from the acquired managed host list (S2702). The hosts to be displayed are a subset of the managed host list. Various criteria can be applied to extract them; one possibility is to use information that the monitoring manager keeps as tables in the memory of the monitoring server. Examples are described later. The list of hosts to be displayed is then formatted into a tabular form suited to screen display (S2703) and displayed via the console 105 of the client 120 (S2704).

As described above, the monitoring console 202 obtains the managed host list table and the tables that the monitoring manager keeps in memory by communicating with the monitoring manager, but this need not be done every time the operation status of managed hosts is displayed. The monitoring console can save the acquired tables in the memory 102 of the client 120 and reuse them across multiple screen display operations, which reduces the amount of data transferred from the monitoring manager. In that case, care must be taken that the tables saved in the client's memory do not become inconsistent with the tables saved in the monitoring server's memory or stored in the database. The processing required for this is generally called cache control and is well known to those skilled in the art.

FIG. 28 shows, as one example of how the monitoring console displays the operation status of managed hosts, screen displays in which the hosts to be displayed are extracted based on the information in the virtual group node list table.

When the monitoring console displays virtual groups on screen, it could imitate a diagram such as the one shown in FIG. 15, but presentations that are easier to scan at a glance are also conceivable.

FIG. 28A is an example in which the cluster display screen 603 for a MapReduce cluster is displayed on the monitoring console screen 600 based on the information in the virtual group node list table 502. The virtual group node list table is generated by the operation status evaluation unit 2014 of the monitoring manager 201, and the monitoring manager keeps it in the memory of the monitoring server 110 as the table shown in FIG. 14B. When the monitoring console uses the managed host list table and this table to display the operation status of managed hosts by virtual group on the monitoring console screen, it displays the cluster name display 6031, the virtual groups 6032, and the nodes 6033.

FIG. 28B is another example in which the cluster display screen 603 for a MapReduce cluster is displayed on the monitoring console screen 600 based on the information in the virtual group node list table 502. In this example, the cluster name display 6031, the nodes 6033, and the job name 6034 of the job that each node is involved in executing are displayed. By having the monitoring console display such a monitoring console screen 600 on the console 105 of the client 120, the operations manager can learn the operation status of the managed hosts.

FIG. 29 shows, as another example of how the monitoring console displays the operation status of managed hosts, screen displays in which the hosts to be displayed are extracted based on the information in the cluster map table.

When the monitoring console displays the cluster map on screen, it could imitate a diagram such as the one shown in FIG. 20, but presentations that are easier to scan at a glance are also conceivable.

FIG. 29A is an example in which the cluster display screen 603 for a MapReduce cluster is displayed on the monitoring console screen 600 based on the information in the cluster map table 506. The cluster map table is generated by the operation status evaluation unit 2014 of the monitoring manager 201, and the monitoring manager keeps it in the memory of the monitoring server 110 as the table shown in FIG. 19. When the monitoring console uses the managed host list table and this table to display the operation status of managed hosts by virtual group on the monitoring console screen, it displays the cluster name display 6031, the virtual groups 6032, the nodes 6033, and the node attributes 6035.

FIG. 29B is another example in which the cluster display screen 603 for a MapReduce cluster is displayed on the monitoring console screen 600 based on the information in the cluster map table 506. In this example, the cluster name display 6031, the nodes 6033, the job name 6034 of the job that each node is involved in executing, and the node attribute label 6036 of each node are displayed.

The description above illustrated the case in which the job 301 has two types of tasks, Map tasks 305 and Reduce tasks 307. However, the present invention can also be carried out as described above when a job has multiple Map tasks and no Reduce task. In that case, the node characteristic is determined by the transfer destination of the splits, that is, by the characteristics of the data file (FIG. 16).

Next, a second embodiment to which the present invention is applied is described. In the system operation management apparatus with the failure detection function shown in the first embodiment, a monitoring agent was installed on each managed host in order to collect OS performance information and MapReduce scheduling information from the managed hosts. In many cases such monitoring agents are installed on the hosts by the operations manager, so the work becomes more burdensome as the number of managed hosts grows. Some may also be concerned that the monitoring agent consumes part of the host's memory.

This embodiment therefore describes an example in which failure detection is performed without using monitoring agents. Because the basic configuration is the same as in the first embodiment, only the differences are described.

図３０は、第二の実施例による並列分散処理システムの一例を示す図である。図２に示す、第一の実施例による並列分散処理システムとの差異は、マスタノード１３０やワーカノード１４０が監視エージェントを実装せず、代わりに監視サーバ１１０がリモートモニタ２０８を実装する点である。 FIG. 30 is a diagram illustrating an example of a parallel distributed processing system according to the second embodiment. The difference from the parallel distributed processing system according to the first embodiment shown in FIG. 2 is that the master node 130 and the worker node 140 do not implement the monitoring agent, and the monitoring server 110 implements the remote monitor 208 instead.

図３１は、第二の実施例における監視マネージャとリモートモニタのブロック構成を示す図の一例である。 FIG. 31 is an example of a block diagram of the monitoring manager and the remote monitor in the second embodiment.

リモートモニタ２０８は、リモート監視データ取得部２０８１、監視データ送信部２０８４から構成される。 The remote monitor 208 includes a remote monitoring data acquisition unit 2081 and a monitoring data transmission unit 2084.

リモート監視データ取得部２０８１は、ＯＳ性能情報取得部２０８２、ＭａｐＲｅｄｕｃｅスケジューリング情報取得部２０８３を有する。リモートモニタ２０８は、多様な監視対象から監視データを取得できるよう、リモート監視データ取得部２０８１が、監視対象に応じた監視データ取得のための機能をプラグインとして使用するように構成されている。本実施例では、リモート監視データ取得部２０８１は、オペレーティングシステム（ＯＳ）２１０からＯＳ性能情報を取得するためのＯＳ性能情報取得部２０８２、ジョブトラッカ２０３とデータノード２０７からＭａｐＲｅｄｕｃｅスケジューリング情報を取得するためのＭａｐＲｅｄｕｃｅスケジューリング情報取得部２０８３を、それぞれプラグインとして使用する。 The remote monitoring data acquisition unit 2081 includes an OS performance information acquisition unit 2082 and a MapReduce scheduling information acquisition unit 2083. So that the remote monitor 208 can acquire monitoring data from a variety of monitoring targets, the remote monitoring data acquisition unit 2081 is configured to use, as plug-ins, functions that acquire monitoring data appropriate to each target. In this embodiment, the remote monitoring data acquisition unit 2081 uses two plug-ins: the OS performance information acquisition unit 2082, which acquires OS performance information from the operating system (OS) 210, and the MapReduce scheduling information acquisition unit 2083, which acquires MapReduce scheduling information from the job tracker 203 and the data node 207.

監視データ送信部２０８４は、リモート監視データ取得部２０８１とそのプラグインが取得した監視データを、監視マネージャ２０１に送信する。監視マネージャ２０１の監視データ収集部２０１１は、リモートモニタ２０８が送信する監視データを収集する。 The monitoring data transmission unit 2084 transmits the monitoring data acquired by the remote monitoring data acquisition unit 2081 and its plug-in to the monitoring manager 201. A monitoring data collection unit 2011 of the monitoring manager 201 collects monitoring data transmitted by the remote monitor 208.

プラグインは、それぞれの方法で情報を取得する。その例として、ＯＳ性能情報の場合は、ＳＳＨ（登録商標）とＯＳコマンド、あるいはＳＮＭＰを使うといった方法がある。ＭａｐＲｅｄｕｃｅスケジューリング情報の場合は、ＳＳＨでジョブトラッカやデータノードのログファイルを収集するといった方法がある。いずれにしても、取得する情報については第一の実施例における監視エージェントの実装するプラグインと変わりはない。 Each plug-in acquires information by its own method. For OS performance information, examples include using SSH (registered trademark) with OS commands, or using SNMP. For MapReduce scheduling information, one method is to collect the log files of the job tracker and the data nodes over SSH. In either case, the information acquired is the same as that acquired by the plug-ins of the monitoring agent in the first embodiment.
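
The plug-in arrangement described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class `RemoteMonitor`, the `register` and `collect` methods, the plug-in `fake_os_performance`, and the host name `worker-01` are all hypothetical names introduced here, and the plug-in returns made-up values instead of issuing a real SSH or SNMP query.

```python
from typing import Callable, Dict

# Hypothetical plug-in type: takes a host name and returns named readings.
Plugin = Callable[[str], Dict[str, float]]


class RemoteMonitor:
    """Collects monitoring data through per-target plug-ins (illustrative only)."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Each kind of monitoring target gets its own acquisition function.
        self._plugins[name] = plugin

    def collect(self, host: str) -> Dict[str, Dict[str, float]]:
        # Every plug-in obtains its data in its own way (SSH, SNMP, log files).
        return {name: plugin(host) for name, plugin in self._plugins.items()}


def fake_os_performance(host: str) -> Dict[str, float]:
    # Stand-in for an SSH or SNMP query; the values are invented for the sketch.
    return {"cpu_user": 12.5, "mem_used_mb": 2048.0}


monitor = RemoteMonitor()
monitor.register("os_performance", fake_os_performance)
sample = monitor.collect("worker-01")
```

Because the collector depends only on the plug-in signature, a plug-in for MapReduce scheduling information could be registered in the same way without changing `RemoteMonitor`.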

監視データ送信部２０８４が、監視マネージャ２０１に監視データを送信する方法としては、例えばソケット、ＲＰＣ、ＨＴＴＰといったプロセス間通信の方法によるものがある。 As a method for the monitoring data transmission unit 2084 to transmit the monitoring data to the monitoring manager 201, there is an inter-process communication method such as socket, RPC, or HTTP.

以上説明したような方法で、第二の実施例は監視エージェントを実装せずに、本発明を並列分散処理システムに適用する。 In the manner described above, the second embodiment applies the present invention to a parallel distributed processing system without installing monitoring agents.

次に、本発明を適用した第三の実施例を説明する。第一の実施例では、ノード性能行列生成において管理対象ホストから収集されるＯＳ性能情報の監視データを使用した。第三の実施例では、これに加えて、管理対象ホストのノード性能指標を使用する。ノード性能指標とは、情報処理装置の備えるプロセッサ、メモリといった計算資源の個別の性能を数値によって表現したものである。例えばプロセッサについては、ある管理対象ホストが備えるプロセッサの個数、動作周波数といったものがノード性能指標である。本実施例は、この情報を使用することで、その具備する計算資源において多様性のある情報処理装置により構成される並列分散処理システムを対象にした障害検知をより効果的に行うことを狙いとするものである。以下、基本的な構成は第一の実施例と同一であるため、差異となる部分のみを説明する。 Next, a third embodiment to which the present invention is applied will be described. In the first embodiment, node performance matrix generation used the monitoring data of OS performance information collected from the managed hosts. The third embodiment additionally uses the node performance index of each managed host. A node performance index is a numerical expression of the individual performance of computing resources, such as processors and memory, with which an information processing apparatus is equipped. For a processor, for example, the number of processors in a managed host and their operating frequency are node performance indices. By using this information, this embodiment aims to detect failures more effectively in a parallel distributed processing system composed of information processing apparatuses whose computing resources vary. Since the basic configuration is the same as that of the first embodiment, only the differences are described below.

図３２は、第三の実施例におけるノード性能行列生成の処理手順の一例を示す図である。図２１に示す処理手順に加えて、ステップＳ３２０３が追加される。監視マネージャ２０１の稼働状況評価部２０１４は、ステップＳ３２０３において、管理対象ホストのノード性能指標を取得する。典型的には、ノード性能指標は管理対象ホスト一覧テーブル４０１に記録されており、これを取得する。そしてステップＳ３２０４において、ＯＳ性能情報から取得した一定のタイムフレームのデータに含まれるメトリックに加えて、ノード性能指標の数値を列挙したものを連結し、ノード性能行列を生成する（Ｓ３２０５）。 FIG. 32 is a diagram illustrating an example of a processing procedure for generating a node performance matrix in the third embodiment. Step S3203 is added to the processing procedure shown in FIG. 21. In step S3203, the operation status evaluation unit 2014 of the monitoring manager 201 acquires the node performance index of the managed host. Typically, the node performance index is recorded in the managed host list table 401 and is read from it. Then, in step S3204, the enumerated numerical values of the node performance index are concatenated with the metrics contained in the data of a given time frame acquired from the OS performance information, and the node performance matrix is generated (S3205).
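
One possible reading of the concatenation in step S3204 is sketched below. The text does not specify the exact layout of the matrix, so it is an assumption of this sketch that the static index values are appended as constant columns to every time-frame row; the function name `build_node_performance_matrix` and the sample values are hypothetical.

```python
from typing import List, Sequence


def build_node_performance_matrix(
    os_metrics: Sequence[Sequence[float]],
    node_performance_index: Sequence[float],
) -> List[List[float]]:
    """Append the static index values to each row of time-frame metrics."""
    return [list(row) + list(node_performance_index) for row in os_metrics]


# Rows are sampling times; columns are OS metrics (for example CPU % and I/O wait %).
os_metrics = [[35.0, 2.0], [40.0, 3.5], [38.0, 1.0]]
# Assumed index: number of processors and operating frequency in GHz.
index = [8, 2.4]
matrix = build_node_performance_matrix(os_metrics, index)
```

With this layout, two hosts running the same workload on different hardware produce matrices that differ in the appended columns, which gives the analysis a way to take the hardware difference into account.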

図３３は、第三の実施例における管理対象ホスト一覧のテーブル４０１の一例を示す図である。図９に示す管理対象ホスト一覧テーブル４０１の内容に加えて、プロセッサ数を記録するフィールド４０１５、プロセッサの動作周波数を記録するフィールド４０１６が追加される。これらのフィールドは、ノード性能指標として典型的なものとして例示されているのであって、他にも搭載するメモリの量といった計算資源に関わる情報も同様にノード性能指標として活用しうることは留意されたい。 FIG. 33 is a diagram illustrating an example of the managed host list table 401 in the third embodiment. In addition to the contents of the managed host list table 401 shown in FIG. 9, a field 4015 for recording the number of processors and a field 4016 for recording the operating frequency of the processors are added. Note that these fields are shown as typical examples of node performance indices; other information about computing resources, such as the amount of installed memory, can likewise be used as a node performance index.

次に、本発明を適用した第四の実施例を説明する。第一の実施例では、正準相関分析、すなわち二つのノード性能行列からその正準相関係数を算出することにより、障害を検知した。第四の実施例では、正準相関分析に限定せず、様々な統計手法を用いて障害検知を行う。 Next, a fourth embodiment to which the present invention is applied will be described. In the first embodiment, a failure is detected by canonical correlation analysis, that is, by calculating the canonical correlation coefficient from two node performance matrices. In the fourth embodiment, failure detection is performed using various statistical methods without being limited to canonical correlation analysis.

そもそも本発明の要諦は、情報処理装置より取得した監視データからノード性能行列を生成し、統計手法を用いてそれらの相関を分析するところにある。そして、このような目的に供することのできる統計手法は正準相関分析に限定されるものではない。一般に統計手法の中でも多変量解析として知られる分野では、複数の変数からなるデータ群を対象として、データの分類、次元圧縮、特徴抽出を行う統計手法が研究されてきた。例えば、主成分分析、ユークリッド距離を距離関数とするクラスタ分析、といった手法が知られており、正準相関分析もまたその一例である。 In the first place, the gist of the present invention is that a node performance matrix is generated from monitoring data acquired from an information processing apparatus, and their correlation is analyzed using a statistical method. The statistical method that can be used for such purposes is not limited to canonical correlation analysis. In a field generally known as multivariate analysis among statistical methods, statistical methods for classifying data, compressing dimensions, and extracting features for a data group composed of a plurality of variables have been studied. For example, methods such as principal component analysis and cluster analysis using Euclidean distance as a distance function are known, and canonical correlation analysis is also an example.

こうした様々な手法を、監視データを基にした障害検知に適用するにあたっては、ある種の適性が存在する。例えばあるジョブの実行において同一のノード特性を持つノード群について、それらの監視データを時系列データとして捉えてみると、大局的には変動が少ない一方で、局所的にはノード間で互いに同期しない微細な変動を呈する場合がある。このような場合には、例えばノード性能行列についてペアワイズでユークリッド距離を求め、群平均法によってノード間の距離を判定することで、様々に変動する監視データ群から異常なものを検知することができる。 When these various methods are applied to failure detection based on monitoring data, each has its own suitability. For example, for a group of nodes that share the same node characteristic in the execution of a job, viewing their monitoring data as time-series data may show little variation overall but fine fluctuations locally that are not synchronized between nodes. In such a case, abnormal data can be detected among variously fluctuating monitoring data by, for example, computing pairwise Euclidean distances between node performance matrices and judging the distance between nodes by the group average method.
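
The distance-based approach mentioned above can be illustrated with a short sketch. It is a simplification under stated assumptions: each node performance matrix is treated as a flat vector, and the group average is taken between one node and the group of all other nodes rather than through a full hierarchical clustering. The function names and the sample data are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def euclidean_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean distance between two equally shaped matrices, element by element."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def average_distance_to_others(matrices: Dict[str, Matrix]) -> Dict[str, float]:
    """Group-average distance from each node to the group of all other nodes."""
    scores = {}
    for name, matrix in matrices.items():
        distances = [euclidean_distance(matrix, other)
                     for other_name, other in matrices.items()
                     if other_name != name]
        scores[name] = sum(distances) / len(distances)
    return scores


# Three nodes fluctuate slightly and out of phase; the fourth behaves differently.
matrices = {
    "node-a": [[10.0, 1.0], [11.0, 1.0]],
    "node-b": [[10.5, 1.0], [10.5, 1.0]],
    "node-c": [[11.0, 1.0], [10.0, 1.0]],
    "node-d": [[90.0, 1.0], [95.0, 1.0]],
}
scores = average_distance_to_others(matrices)
outlier = max(scores, key=scores.get)
```

The small unsynchronized fluctuations among the first three nodes yield distances close to zero, so they do not mask the node whose data differs on the whole.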

こうした統計手法は、情報処理装置のメモリ上ではアルゴリズムを実装するプログラムとして実現される。そして、それら様々なアルゴリズム群からジョブの性質に応じて適切なものを選択する方法として、例えば監視データを時系列データのグラフとして監視コンソールスクリーンに図示し、運用管理担当者がその振る舞いを観察し、適切なアルゴリズムを判断、設定するといった方法が考えられる。また、こうしたプロセスをプログラムで自動化することも考えられる。 Such a statistical method is realized as a program for implementing an algorithm on the memory of the information processing apparatus. Then, as a method of selecting an appropriate algorithm from these various algorithm groups according to the nature of the job, for example, monitoring data is displayed on the monitoring console screen as a graph of time-series data, and the operation manager observes the behavior. A method of determining and setting an appropriate algorithm can be considered. It is also conceivable to automate this process with a program.

他にも、ジョブの性質に応じてアルゴリズムの適性を判定する方法は様々なものが考えられるが、本発明で注目するのは、監視データの分析に基づく障害検知に適用するアルゴリズムについて、様々なものを適宜使い分けることで、より効果的な障害検知を実現し得るという点である。そこで本実施例では、ジョブ、あるいはジョブの中でのノード特性の組によって、それぞれ適用する分析アルゴリズムを選択することで、より効果的に障害検知を行う構成と処理手順を示す。以下、基本的な構成、処理手順は第一の実施例と同一であるため、差異となる部分のみを説明する。 Various other methods of judging the suitability of an algorithm for the nature of a job are conceivable, but the point the present invention focuses on is that more effective failure detection can be achieved by choosing, as appropriate, among various algorithms for failure detection based on the analysis of monitoring data. This embodiment therefore shows a configuration and processing procedure that detect failures more effectively by selecting the analysis algorithm to apply for each job, or for each set of node characteristics within a job. Since the basic configuration and processing procedure are the same as in the first embodiment, only the differences are described below.

図３４は、第四の実施例において稼働状況評価部２０１４が使用するテーブルの一例を示す図である。 FIG. 34 is a diagram illustrating an example of a table used by the operation status evaluation unit 2014 in the fourth embodiment.

図３４Ａは、第四の実施例における分析アルゴリズム付きジョブプロファイルテーブルの一例を示す図である。図２４に示すジョブプロファイルテーブル５０９では、あるノード特性とその比較対象に適用する分析アルゴリズムは暗黙のうちに仮定されていた。一方、図３４Ａに示すジョブプロファイルテーブル５１０は、分析アルゴリズムフィールド５１０１と、閾値データフィールド５１０２を含む。すなわちジョブプロファイルテーブル５１０は、分析アルゴリズムと閾値データを、ジョブ名およびノード特性の組をキーとして検索できるよう記録したものである。 FIG. 34A is a diagram illustrating an example of a job profile table with an analysis algorithm in the fourth embodiment. In the job profile table 509 shown in FIG. 24, an analysis algorithm applied to a certain node characteristic and its comparison target is implicitly assumed. On the other hand, the job profile table 510 shown in FIG. 34A includes an analysis algorithm field 5101 and a threshold data field 5102. That is, the job profile table 510 records the analysis algorithm and threshold data so that a combination of job name and node characteristic can be searched as a key.

分析アルゴリズムを記録する分析アルゴリズムフィールド５１０１は、特定のアルゴリズムと一意に対応するＩＤを含む。このアルゴリズムＩＤを記録するテーブルは後述される。 An analysis algorithm field 5101 for recording an analysis algorithm includes an ID uniquely corresponding to a specific algorithm. A table for recording the algorithm ID will be described later.

閾値データを含む閾値データフィールド５１０２は、分析アルゴリズムが障害の発生を判定するために使用するデータを含む。第一の実施例におけるジョブプロファイルテーブル５０９は既定正準相関データを記録していたが、これはその名前が示す通り、正準相関分析に基づく障害検知の処理において必要なデータであった。一方、本実施例にて閾値データを記録する閾値データフィールド５１０２は、分析アルゴリズムフィールド５１０１が含みうる様々な分析アルゴリズムに対応する閾値データを含む。なお、該閾値データは、第一の実施例で相関分析の処理（図２３のステップＳ２３０９）に用いた閾値とは異なる構成要素であることには留意されたい。 A threshold data field 5102 containing threshold data contains data that the analysis algorithm uses to determine the occurrence of a fault. The job profile table 509 in the first embodiment records predetermined canonical correlation data, which, as the name indicates, is necessary data in the failure detection process based on the canonical correlation analysis. On the other hand, the threshold data field 5102 for recording threshold data in this embodiment includes threshold data corresponding to various analysis algorithms that the analysis algorithm field 5101 can include. It should be noted that the threshold data is a component different from the threshold used in the correlation analysis process (step S2309 in FIG. 23) in the first embodiment.

図３４Ｂは、分析アルゴリズムテーブルの一例を示す図である。分析アルゴリズムテーブル５１１は、アルゴリズムＩＤフィールド５１１１、アルゴリズムＩＤに対応する分析アルゴリズムの名称を記録するアルゴリズム名フィールド５１１２、該分析アルゴリズムを実装する関数へのポインタを記録する分析関数ポインタフィールド５１１３、該分析関数の出力である相関値と閾値データを比較する関数へのポインタを記録する閾値判定関数ポインタフィールド５１１４を含む。関数へのポインタとは、監視マネージャ２０１と同様に監視サーバ１１０のメモリ１０２に実装されるプログラムを指示するアドレスであり、例えば分析関数プログラム５１２のメモリ空間上のアドレスである。すなわち分析アルゴリズムテーブル５１１は、アルゴリズムＩＤをキーとして、該アルゴリズムＩＤと一意に対応するある分析アルゴリズムを実装するプログラム、および該分析アルゴリズムが算出する相関値と閾値データを比較するプログラムを検索できるよう記録したものである。アルゴリズム名フィールド５１１２に記録された分析アルゴリズムの名称は、監視コンソール２０２を介した運用管理担当者への情報の提示において使用する。 FIG. 34B is a diagram illustrating an example of the analysis algorithm table. The analysis algorithm table 511 includes an algorithm ID field 5111; an algorithm name field 5112 that records the name of the analysis algorithm corresponding to the algorithm ID; an analysis function pointer field 5113 that records a pointer to the function implementing the analysis algorithm; and a threshold determination function pointer field 5114 that records a pointer to a function that compares the correlation value output by the analysis function with the threshold data. A pointer to a function is an address that designates a program loaded, like the monitoring manager 201, in the memory 102 of the monitoring server 110; for example, it is the address of the analysis function program 512 in the memory space. In other words, the analysis algorithm table 511 is recorded so that, with an algorithm ID as the key, one can look up the program implementing the analysis algorithm uniquely associated with that ID and the program that compares the correlation value calculated by that algorithm with the threshold data. The analysis algorithm name recorded in the algorithm name field 5112 is used when presenting information to the operation manager via the monitoring console 202.

前記の関数ポインタは必ずしもメモリ空間上のアドレスである必要はなく、例えば分析関数プログラム５１２は、監視マネージャとはまた異なるユーザプロセス２００として実装され、該プログラムと監視マネージャがプロセス間通信を行うためのエンドポイントをもって関数ポインタと見做してもよい。こうしたプログラムの相互呼び出しに関する多様な技術の中から当業者にとって好適なものを選択してよい。 The function pointer need not be an address in the memory space. For example, the analysis function program 512 may be implemented as a user process 200 separate from the monitoring manager, and the endpoint through which that program and the monitoring manager perform inter-process communication may be regarded as the function pointer. A person skilled in the art may select a suitable technique from the many available for calls between programs.

ジョブプロファイルテーブル５１０の分析アルゴリズムフィールド５１０１は、システム運用管理装置の動作中に任意のタイミングで書き換えることができる。また、分析アルゴリズムテーブル５１１の関数ポインタを記録するフィールド５１１３および５１１４と、該ポインタが指示するメモリ空間内のアドレスに格納されるプログラムは、同じく任意のタイミングで書き換えることができる。もちろん、監視サーバ１１０のメモリに複数のプログラムを実装しておき、分析アルゴリズムテーブル５１１のフィールドに記録された関数ポインタを、あるプログラムのアドレスから別のプログラムのアドレスへと切り替えることもできる。つまり、並列分散処理システムの稼働中に、適用する分析アルゴリズムを様々に変更することができる。こうした自由度は、並列分散処理システムの複雑さ、あるいは情報処理装置で発生する障害の多様さに適応するために必要なものである。こうしたフィールドの書き換えを行うタイミングの例は後述される。 The analysis algorithm field 5101 of the job profile table 510 can be rewritten at an arbitrary timing during the operation of the system operation management apparatus. Similarly, the fields 5113 and 5114 in which the function pointers of the analysis algorithm table 511 are recorded and the program stored at the address in the memory space indicated by the pointers can be rewritten at an arbitrary timing. Of course, a plurality of programs can be mounted in the memory of the monitoring server 110, and the function pointer recorded in the field of the analysis algorithm table 511 can be switched from one program address to another program address. That is, the analysis algorithm to be applied can be variously changed during the operation of the parallel distributed processing system. Such a degree of freedom is necessary to adapt to the complexity of the parallel distributed processing system or the variety of failures that occur in the information processing apparatus. An example of timing for rewriting such a field will be described later.
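
The two tables and the run-time switching described above can be sketched as follows. This is an illustration only: Python callables stand in for the function pointers, dictionaries stand in for the tables, and the two distance measures, the job name `wordcount`, the key `map-map`, and the threshold 8.0 are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


@dataclass
class AlgorithmEntry:
    """One row of a table in the style of the analysis algorithm table 511."""
    name: str
    analyze: Callable[[Matrix, Matrix], float]  # role of pointer field 5113
    exceeds: Callable[[float, float], bool]     # role of pointer field 5114


def _abs_diffs(vp: Matrix, vn: Matrix) -> List[float]:
    return [abs(x - y) for rp, rn in zip(vp, vn) for x, y in zip(rp, rn)]


def mean_abs_difference(vp: Matrix, vn: Matrix) -> float:
    diffs = _abs_diffs(vp, vn)
    return sum(diffs) / len(diffs)


def max_abs_difference(vp: Matrix, vn: Matrix) -> float:
    return max(_abs_diffs(vp, vn))


def greater_than(r: float, t: float) -> bool:
    return r > t


# Algorithm ID -> entry (assumed stand-in for table 511).
algorithm_table: Dict[int, AlgorithmEntry] = {
    1: AlgorithmEntry("mean absolute difference", mean_abs_difference, greater_than),
    2: AlgorithmEntry("maximum absolute difference", max_abs_difference, greater_than),
}
# (job name, node characteristic set) -> algorithm ID and threshold data
# (assumed stand-in for table 510).
job_profile_table: Dict[Tuple[str, str], Dict[str, float]] = {
    ("wordcount", "map-map"): {"algorithm_id": 1, "threshold": 8.0},
}


def evaluate(job: str, characteristics: str, vp: Matrix, vn: Matrix) -> bool:
    profile = job_profile_table[(job, characteristics)]
    entry = algorithm_table[int(profile["algorithm_id"])]
    return entry.exceeds(entry.analyze(vp, vn), profile["threshold"])


vp = [[10.0, 20.0]]
vn = [[12.0, 30.0]]
before = evaluate("wordcount", "map-map", vp, vn)
# Rewrite the analysis algorithm field while the system keeps running.
job_profile_table[("wordcount", "map-map")]["algorithm_id"] = 2
after = evaluate("wordcount", "map-map", vp, vn)
```

Because `evaluate` resolves the algorithm through the tables on every call, changing one table entry is enough to switch the analysis without touching the calling code.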

稼働状況評価部２０１４は、ジョブプロファイルテーブル５１０および分析アルゴリズムテーブル５１１を、相関分析の処理に供するため監視サーバ１１０のメモリ１０２に保存する。または、データベース２０１３に格納してもよい。 The operating status evaluation unit 2014 stores the job profile table 510 and the analysis algorithm table 511 in the memory 102 of the monitoring server 110 for use in correlation analysis processing. Alternatively, it may be stored in the database 2013.

図３５は、相関分析の処理手順（ステップＳ１２０５）の別の一例を示す図である。図２３で示した相関分析の処理手順では、分析アルゴリズムとして正準相関分析を用いることを前提とした処理であったが、ここでは、複数の分析アルゴリズムを使い分ける処理を示す。なお、便宜上「相関」という呼称を用いて説明するが、統計学におけるその語義は本実施例で適用する統計手法を限定するものではなく、「相関係数」の上位概念としての「類似度」、あるいは任意の距離空間における「距離」といった概念を含む、より広義のものとして捉えられるべきものである。 FIG. 35 is a diagram illustrating another example of the correlation analysis processing procedure (step S1205). The correlation analysis processing procedure shown in FIG. 23 is based on the assumption that canonical correlation analysis is used as the analysis algorithm, but here, a process for selectively using a plurality of analysis algorithms is shown. For convenience, the term “correlation” is used for explanation, but the meaning in statistics does not limit the statistical method applied in this embodiment, and “similarity” as a superordinate concept of “correlation coefficient”. Or it should be understood as a broader one that includes the concept of “distance” in an arbitrary metric space.

相関分析の処理手順において、仮想グループへの所属の判定から、ノード性能行列の取得まで（ステップＳ３５０１〜ステップＳ３５０６）は、第一の実施例と共通である。すなわち相関分析は管理対象ホスト毎に行い、まず管理対象ホストが仮想グループに属するかを判定し（Ｓ３５０１）、もし仮想グループに属していないとすれば、該管理対象ホストはどのカレントジョブの実行にも関与していないということであり、稼働状況評価の対象外とする（Ｓ３５１３）。次に管理対象ホストのノード性能行列が存在するかを判定し（Ｓ３５０２）、もしＯＳ性能情報が取得されていない等の理由でノード性能行列が生成されず、該管理対象ホストのノード性能行列テーブルが存在しない場合は、稼働状況評価の対象外とする（Ｓ３５１３）。 In the correlation analysis procedure, the steps from determining virtual group membership through acquiring the node performance matrix (steps S3501 to S3506) are common to the first embodiment. That is, correlation analysis is performed for each managed host. First, it is determined whether the managed host belongs to a virtual group (S3501); if it does not, the managed host is not involved in executing any current job and is excluded from the operation status evaluation (S3513). Next, it is determined whether a node performance matrix exists for the managed host (S3502); if no node performance matrix was generated, for example because OS performance information was not acquired, and no node performance matrix table exists for the managed host, the host is excluded from the operation status evaluation (S3513).

次からの処理は、管理対象ホストが属する仮想グループに注目して行う。まず該仮想グループに存在するノード特性を抽出し（Ｓ３５０３）、それらノード特性毎に、該ノード特性を備えるノード群を抽出し（Ｓ３５０４）、そしてそれらノード群から順に１ノードを選択して相関の算出の処理を行う（Ｓ３５０５）。これらの処理に必要な、ノードとノード特性の情報はクラスタマップテーブル５０６から抽出することができる。次いで、ステップＳ３５０５のループにて選択した１ノードについて、該ノードのノード性能行列Ｖnを取得する（Ｓ３５０６）。これ以降の処理が、第一の実施例の差異となる。 The subsequent processing focuses on the virtual group to which the managed host belongs. First, the node characteristics present in the virtual group are extracted (S3503); for each node characteristic, the group of nodes having that characteristic is extracted (S3504); and nodes are selected from that group one at a time for the correlation calculation (S3505). The node and node characteristic information needed for these steps can be extracted from the cluster map table 506. Next, for the node selected in the loop of step S3505, the node performance matrix Vn of that node is acquired (S3506). The processing from this point on differs from the first embodiment.

まず、管理対象ホストのノード性能行列Ｖｐと、ステップＳ３５０６にて取得したノード性能行列Ｖｎとを引数として関数ｆ１を実行し、その解ｒを得る（Ｓ３５０７）。関数ｆ１は、ジョブプロファイルテーブル５１０を管理対象ホストと選択したノードのノード特性の組をキーとして検索してアルゴリズムＩＤを取得し、さらに該アルゴリズムＩＤをキーとして分析アルゴリズムテーブル５１１を検索することで取得できる、分析関数ポインタの指示する分析関数プログラムである。典型的には、該分析関数ｆ１はＶｎおよびＶｐを引数に取り、その戻り値をｒとする。このｒは、先に説明した「相関」の値であり、二つのノード性能行列間の類似度あるいは距離を意味する。 First, the function f1 is executed with the node performance matrix Vp of the managed host and the node performance matrix Vn acquired in step S3506 as arguments, and its result r is obtained (S3507). The function f1 is the analysis function program designated by the analysis function pointer, which is obtained by searching the job profile table 510, keyed on the set of node characteristics of the managed host and the selected node, to get an algorithm ID, and then searching the analysis algorithm table 511 with that algorithm ID as the key. Typically, the analysis function f1 takes Vn and Vp as arguments and returns r. This r is the “correlation” value described above and represents the similarity or distance between the two node performance matrices.

次にジョブプロファイルテーブル５１０を、管理対象ホストと選択したノードのノード特性の組をキーとして検索することで閾値データｔを取得する（Ｓ３５０８）。この閾値ｔは、相関値ｒと比較することを目的としたデータである。 Next, the threshold value t is obtained by searching the job profile table 510 using a set of node characteristics of the managed host and the selected node as a key (S3508). This threshold value t is data intended to be compared with the correlation value r.

次にｒおよびｔを引数として関数ｆ２を実行し、その解として真偽値を得（Ｓ３５０９）、もし真であれば閾値を超過していると見做し、カウンタをインクリメントする（Ｓ３５１０）。もし偽であれば閾値を超過していないと見做す。関数ｆ２は、関数ｆ１と同様、ジョブプロファイルテーブル５１０を管理対象ホストと選択したノードのノード特性の組をキーとして検索してアルゴリズムＩＤを取得し、さらに該アルゴリズムＩＤをキーとして分析アルゴリズムテーブル５１１を検索することで取得できる、閾値判定関数ポインタの指示する閾値判定関数プログラムである。典型的には、該閾値判定関数ｆ２はｒおよびｔを引数に取り、真偽値を戻り値とする。 Next, the function f2 is executed with r and t as arguments, and a true / false value is obtained as a solution (S3509). If true, it is considered that the threshold has been exceeded, and the counter is incremented (S3510). If false, it is assumed that the threshold has not been exceeded. Similar to the function f1, the function f2 searches the job profile table 510 using a combination of the node characteristics of the managed host and the selected node as a key to obtain an algorithm ID, and further uses the algorithm ID as a key to obtain an analysis algorithm table 511. This is a threshold value determination function program indicated by a threshold value determination function pointer, which can be acquired by searching. Typically, the threshold value determination function f2 takes r and t as arguments, and uses a true / false value as a return value.

こうして、相関値の算出と閾値データとの比較を、仮想グループ内の全ノード特性とそれに属するノード群、すなわち仮想グループ内の全ノードに対して行い、結果カウンタの値が仮想グループの(現在注目している管理対象ホストを除いた)ノード数に等しくなった場合（Ｓ３５１１）、当該管理対象ホストでは障害が発生していると判定し、管理対象ホスト一覧のテーブルの該レコードについて障害検知フラグを１に設定する（Ｓ３５１２）。このカウンタの値について、仮想グループのノード数と等しくなったときに限らず、例えば仮想グループのノード数の半数を超えた場合に障害発生とみなす等、その判定の基準として任意のものを設定できるのは第一の実施例と同様である。 In this way, the calculation of the correlation value and its comparison with the threshold data are performed for every node characteristic in the virtual group and the nodes belonging to each, that is, for all nodes in the virtual group. If the resulting counter value equals the number of nodes in the virtual group (excluding the managed host currently under consideration) (S3511), it is determined that a failure has occurred in the managed host, and the failure detection flag of the corresponding record in the managed host list table is set to 1 (S3512). As in the first embodiment, any criterion can be set for this determination; it need not be equality with the number of nodes in the virtual group. For example, a failure may be deemed to have occurred when the counter exceeds half the number of nodes in the virtual group.
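
The loop from step S3505 to step S3512 can be sketched as follows. The sketch makes several simplifying assumptions: the virtual group has a single node characteristic, so one analysis function f1, one threshold determination function f2, and one threshold t apply to every comparison; the table lookups are omitted; and the names `is_faulty`, `total_abs_difference`, and `greater_than`, as well as the sample matrices, are hypothetical.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def total_abs_difference(vp: Matrix, vn: Matrix) -> float:
    # Stand-in for f1: a simple distance between two node performance matrices.
    return sum(abs(x - y) for rp, rn in zip(vp, vn) for x, y in zip(rp, rn))


def greater_than(r: float, t: float) -> bool:
    # Stand-in for f2: true when the threshold is exceeded.
    return r > t


def is_faulty(
    host: str,
    matrices: Dict[str, Matrix],
    f1: Callable[[Matrix, Matrix], float],
    f2: Callable[[float, float], bool],
    t: float,
) -> bool:
    """Compare the host with every other node and count threshold excesses."""
    vp = matrices[host]
    others = [name for name in matrices if name != host]
    counter = 0
    for name in others:                 # loop corresponding to S3505
        r = f1(vp, matrices[name])      # S3506 and S3507
        if f2(r, t):                    # S3509
            counter += 1                # S3510
    # S3511: failure only if the host deviates from every other node.
    return bool(others) and counter == len(others)


matrices = {
    "node-a": [[10.0, 20.0]],
    "node-b": [[11.0, 21.0]],
    "node-c": [[50.0, 80.0]],
}
faulty_c = is_faulty("node-c", matrices, total_abs_difference, greater_than, 10.0)
faulty_a = is_faulty("node-a", matrices, total_abs_difference, greater_than, 10.0)
```

Replacing the final comparison with `counter > len(others) / 2` would give the majority criterion that the text mentions as an alternative.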

閾値判定関数ｆ２は、ｒおよびｔについて、そのスカラ値としての大小を比較するものとは限らない。例えば、関数ｆ１として正準相関分析を採用する場合であれば、rは正準相関係数ρ１〜ρｎの配列であり、ｔは並列分散処理システムの正常時において一定以上の値である正準相関係数の配列であり、関数ｆ２は配列ｒの要素のうち一定以上の値であるものの要素数ｎ１と配列ｔの要素数ｎ２を比較し、ｎ１＜ｎ２である場合に真を、それ以外の場合に偽を返却するものとなろう。また同様に、ｔをシステム正常時の正準相関係数の配列と、前記の「一定以上の値」を判定する閾値とを格納する構造体としてもよい。これらの例は第一の実施例における正準相関分析に基づいた処理と実質的に同一のものであるが、このように二つの関数ｆ１、ｆ２、そして閾値データｔによって抽象化することで、様々な分析アルゴリズムを適用することができる。 The threshold determination function f2 does not necessarily compare r and t as scalar values. For example, if canonical correlation analysis is adopted as the function f1, r is an array of canonical correlation coefficients ρ1 to ρn, t is an array of the canonical correlation coefficients whose values are at or above a certain value when the parallel distributed processing system is operating normally, and f2 compares the number n1 of elements of r at or above that value with the number n2 of elements of t, returning true if n1 < n2 and false otherwise. Similarly, t may be a structure that stores both the array of canonical correlation coefficients from normal operation and the threshold used to judge “a certain value or more”. These examples are substantially the same as the processing based on canonical correlation analysis in the first embodiment, but abstracting it in this way into the two functions f1 and f2 and the threshold data t allows various analysis algorithms to be applied.
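
The non-scalar comparison described above can be sketched as follows, using the structure form of t. The class name `CanonicalThreshold`, its field names, and the cut-off value 0.9 are assumptions made for illustration; the canonical correlation coefficients themselves are taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanonicalThreshold:
    """Threshold data t: coefficients seen in normal operation plus the cut-off."""
    normal_coefficients: List[float]  # those at or above the cut-off when normal
    minimum: float                    # the "certain value" (0.9 is an assumed example)


def f2_canonical(r: List[float], t: CanonicalThreshold) -> bool:
    """Return True when fewer strong correlations remain than in normal operation."""
    n1 = sum(1 for rho in r if rho >= t.minimum)
    n2 = len(t.normal_coefficients)
    return n1 < n2


t = CanonicalThreshold(normal_coefficients=[0.97, 0.95], minimum=0.9)
degraded = f2_canonical([0.98, 0.40, 0.10], t)  # only one strong coefficient left
healthy = f2_canonical([0.98, 0.96, 0.20], t)   # two strong coefficients, as normal
```

Keeping the cut-off inside t means the same f2 can serve different jobs whose normal correlation structure differs, with only the threshold data changing.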

もしジョブプロファイルテーブル５１０の分析アルゴリズムフィールド５１０１にアルゴリズムＩＤが記録されていない場合は、デフォルトの分析アルゴリズムを使用するように構成してもよい。このようなデフォルトの分析アルゴリズムは、固定されていてもよいし、運用管理担当者が指定してもよい。 If an algorithm ID is not recorded in the analysis algorithm field 5101 of the job profile table 510, a default analysis algorithm may be used. Such a default analysis algorithm may be fixed, or may be designated by an operation manager.

さて、並列分散処理システムの動作中に、任意のタイミングで適用する分析アルゴリズムを変更できることは前述した。どの分析アルゴリズムを適用するかの判断について、これを運用管理担当者の裁量によって行ってもよいし、システム運用管理装置がプログラムによって行ってもよい。本実施例では、まず運用管理担当者が、監視コンソールを介して障害検知に使用する分析アルゴリズムをシステム運用管理装置に指示できるようにする方法の一例を示す。続いて、システム運用管理装置がプログラムによって、障害検知に使用する分析アルゴリズムを判定する方法の一例を示す。 As described above, the analysis algorithm to be applied can be changed at any time while the parallel distributed processing system is operating. The decision as to which analysis algorithm to apply may be made at the discretion of the operation manager, or may be made programmatically by the system operation management apparatus. This embodiment first shows an example of a method that lets the operation manager instruct the system operation management apparatus, via the monitoring console, which analysis algorithm to use for failure detection. It then shows an example of a method in which the system operation management apparatus determines programmatically which analysis algorithm to use for failure detection.

図３６は、障害検知に使用する分析アルゴリズムを設定する画面の例を示す図である。分析アルゴリズム設定画面６０４は、監視コンソール２０２が監視コンソールスクリーン６００に表示する画面であり、デフォルト分析アルゴリズム選択ドロップダウンボックス６０４１、マルチ分析アルゴリズム使用チェックボックス６０４２、分析アルゴリズム自動判定ラジオボタン６０４３、分析アルゴリズム手動設定ラジオボタン６０４４、カスタム分析アルゴリズム設定テーブル６０４５、カスタム分析アルゴリズム設定ボタン６０４８を備える。カスタム分析アルゴリズム設定テーブル６０４５は、複数のエントリにより構成され、エントリそれぞれは関連付け使用チェックボックス６０４５１、ジョブ名６０４５２、ノード特性の組名６０４５３、分析アルゴリズム名６０４５４の各フィールドを備える。また、分析アルゴリズム設定画面６０４は、分析アルゴリズムリスト表示リンク６０４６、ＯＫ／Ｃａｎｃｅｌボタン６０４７を備える。 FIG. 36 is a diagram illustrating an example of a screen for setting an analysis algorithm used for failure detection. The analysis algorithm setting screen 604 is a screen that the monitoring console 202 displays on the monitoring console screen 600, and includes a default analysis algorithm selection drop-down box 6041, a multi-analysis algorithm use check box 6042, an analysis algorithm automatic determination radio button 6043, an analysis algorithm manual. A setting radio button 6044, a custom analysis algorithm setting table 6045, and a custom analysis algorithm setting button 6048 are provided. The custom analysis algorithm setting table 6045 includes a plurality of entries, and each entry includes fields of an association use check box 60451, a job name 60452, a node characteristic set name 60453, and an analysis algorithm name 60454. The analysis algorithm setting screen 604 includes an analysis algorithm list display link 6046 and an OK / Cancel button 6047.

運用管理担当者は、監視コンソール２０２が実装されているクライアント１２０のコンソール１０５に表示される監視コンソールスクリーン６００と、同じくコンソール１０５のヒューマンインタフェースデバイスを用いて、障害検知に使用する分析アルゴリズムを監視マネージャ２０１に対して指定することができる。 Using the monitoring console screen 600 displayed on the console 105 of the client 120 on which the monitoring console 202 is installed, together with the human interface devices of the console 105, the operation manager can specify to the monitoring manager 201 the analysis algorithm to be used for failure detection.

デフォルト分析アルゴリズム選択ドロップダウンボックス６０４１は、システム運用管理装置にプログラムとしてインストールされ、相関分析の処理に適用可能となっている分析アルゴリズムの名称を選択肢として表示し、これを操作することでデフォルトの分析アルゴリズムを選択し監視マネージャに対して指示することができる。デフォルトの分析アルゴリズムは、他に分析アルゴリズムを選択する契機が存在しない場合に適用する。これは例えば、複数種類の分析アルゴリズムを使用するよう指示されていない場合や、初めて実行するジョブであったり、分析アルゴリズムの自動判定を実行する前提となる情報が未だ十分に蓄積されていなかったりといった理由により適用する分析アルゴリズムの自動判定が行われなかった場合や、使用する分析アルゴリズムが手動で設定されていない場合等に、デフォルトの分析アルゴリズムを適用する。また、システム運用管理装置にプログラムとしてインストールされた分析アルゴリズムが１つのみ存在する場合には、デフォルト分析アルゴリズム選択ドロップダウンボックス６０４１は選択肢としてその分析アルゴリズムの名称のみを表示し、デフォルトの分析アルゴリズムとして適用する。 The default analysis algorithm selection drop-down box 6041 displays the names of analysis algorithms that are installed as programs in the system operation management apparatus and can be applied to the correlation analysis processing as options. An algorithm can be selected and directed to the monitoring manager. The default analysis algorithm is applied when there is no other opportunity to select an analysis algorithm. This is the case, for example, when there is no instruction to use multiple types of analysis algorithms, or when a job is executed for the first time, or information that is a prerequisite for executing automatic determination of analysis algorithms is not yet accumulated. The default analysis algorithm is applied when the analysis algorithm to be applied is not automatically determined for a reason or when the analysis algorithm to be used is not manually set. If there is only one analysis algorithm installed as a program in the system operation management apparatus, the default analysis algorithm selection drop-down box 6041 displays only the name of the analysis algorithm as an option, and as a default analysis algorithm Apply.

マルチ分析アルゴリズム使用チェックボックス６０４２をチェックすることで、複数種類の分析アルゴリズムを障害検知に適用するよう指示することができる。マルチ分析アルゴリズム使用チェックボックス６０４２をチェックすることで、以下の分析アルゴリズム設定に関する操作が可能になる。 By checking the multi-analysis algorithm use check box 6042, it is possible to instruct to apply a plurality of types of analysis algorithms to failure detection. Checking the use of multi-analysis algorithm check box 6042 enables the following operations related to analysis algorithm setting.

分析アルゴリズム自動判定ラジオボタン６０４３を選択すると、監視マネージャに対して、適用する分析アルゴリズムを運用管理担当者による指定に依らずとも判定するよう指示することができる。一方、分析アルゴリズム手動設定ラジオボタン６０４４を選択すると、監視マネージャに対して、運用管理担当者による設定に基づき適用する分析アルゴリズムを変更するよう指示することができる。この二つのラジオボタンは排他関係にあり、同時には選択できないよう構成してある。以降、監視マネージャが前者の指定に従った処理を行うモードを自動判定モード、後者の指定に従った処理を行うモードを手動設定モードと呼称する。 When the analysis algorithm automatic determination radio button 6043 is selected, it is possible to instruct the monitoring manager to determine the analysis algorithm to be applied without depending on the designation by the person in charge of operation management. On the other hand, when the analysis algorithm manual setting radio button 6044 is selected, the monitoring manager can be instructed to change the analysis algorithm to be applied based on the setting by the person in charge of operation management. These two radio buttons are in an exclusive relationship and cannot be selected at the same time. Hereinafter, a mode in which the monitoring manager performs processing according to the former designation is referred to as an automatic determination mode, and a mode in which processing according to the latter designation is referred to as a manual setting mode.

When set to the automatic determination mode, the monitoring manager determines the analysis algorithm to apply based on the collected monitoring data and the information in the cluster map table. An example of this processing is described later.

When set to the manual setting mode, the monitoring manager applies analysis algorithms as instructed by the operation administrator. Specifically, selecting the analysis algorithm manual setting radio button 6044 enables operation of the custom analysis algorithm setting table 6045. This table lists the job names and node characteristic sets recorded in the job profile table 510 and specifies which analysis algorithm is associated with each. The associations are listed as entries of the custom analysis algorithm setting table; when there are many, only part of the list is shown and the rest is reached with a scroll bar.

Checking the association use check box 60451 at the head of an entry enables the association defined by the fields of that entry. This check box makes it possible to suspend an association temporarily and to enable it again.

For the combination shown in the job name field 60452 and the node characteristic set name field 60453, the analysis algorithm to apply is selected in the analysis algorithm name field 60454. One job name can have one or more node characteristic sets; where no setting exists, the default analysis algorithm is applied. This fallback association is set and displayed even without an explicit instruction from the operation administrator: for example, the node characteristic set name is shown as "Default" and the analysis algorithm name as the name of the default analysis algorithm described above, both in italics.
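The lookup implied by the custom analysis algorithm setting table can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Association` class, the `resolve_algorithm` function, and the sample names such as `"Map-Map"` and `"canonical_correlation"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """One entry of the custom analysis algorithm setting table (hypothetical model)."""
    job_name: str          # job name field
    node_set: str          # node characteristic set name field
    algorithm: str         # analysis algorithm name field
    enabled: bool = True   # association use check box


def resolve_algorithm(entries, job_name, node_set, default_algorithm):
    """Return the algorithm of the first enabled matching entry, else the default."""
    for entry in entries:
        if entry.enabled and entry.job_name == job_name and entry.node_set == node_set:
            return entry.algorithm
    # No usable association: fall back to the default analysis algorithm.
    return default_algorithm


entries = [
    Association("job-1", "Map-Map", "canonical_correlation"),
    Association("job-1", "Map-Reduce", "mean_range", enabled=False),
]
```

An entry whose check box is cleared is skipped, so the job falls back to the default algorithm until the association is enabled again.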

The analysis algorithm name field doubles as a drop-down box from which the analysis algorithm to associate can be selected. When the displayed name is that of the default analysis algorithm, it is displayed so that this can be recognized, for example in italics.

Selecting the algorithm list display link 6046 displays, on the monitoring console screen 600, a list of the names of the analysis algorithms that are installed as programs in the system operation management apparatus and can be applied to correlation analysis. Typically, this list is displayed on a screen separate from the analysis algorithm setting screen 604 and shows information such as each algorithm's name, features, and past usage record, for reference when the operation administrator operates the custom analysis algorithm setting table. The same information may instead be presented by whatever means is better suited to the operability of the monitoring console 202, such as a help window or a tooltip.

Pressing the custom analysis algorithm setting button 6048 displays a screen for adding an association between an analysis algorithm and a job name and node characteristic set name. Like the custom analysis algorithm setting table, this screen displays job names and node characteristic sets, but it lists as entries all the jobs known to the monitoring manager, as recorded in the cluster map table 506, together with their node characteristic sets. When the entry to be associated with an analysis algorithm is selected from this list, it is added to the custom analysis algorithm setting table and its association can then be set. For an entry added in this way, the node characteristic set name field shows the selected node characteristic set rather than "Default", and the analysis algorithm name field doubles as a drop-down box for selecting the analysis algorithm to associate.

The monitoring manager may display information that helps the operation administrator set associations. Examples include displaying the result produced by the automatic determination method described later, and displaying representative monitoring data for each node characteristic as a time-series graph.

When the operation administrator presses the OK button of the OK/Cancel buttons 6047, the monitoring console sends the information on analysis algorithm associations to the monitoring manager. This typically comprises the name of the default analysis algorithm, whether multiple types of analysis algorithms are to be used, whether the automatic determination mode or the manual setting mode is selected, and the job names, their node characteristic set names, and the names of the associated analysis algorithms. A method suited to processing efficiency may be chosen, such as sending only the differences between the information before and after the operation of the analysis algorithm setting screen.
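Sending only the differences could look like the sketch below. The flat dictionary layout, the key names, and the `settings_diff` function are assumptions made for illustration; the source does not specify a message format.

```python
_MISSING = object()  # sentinel so that a stored value of None is not mistaken for "absent"


def settings_diff(before, after):
    """Return only what changed between two snapshots of the settings screen."""
    changed = {
        key: value
        for key, value in after.items()
        if before.get(key, _MISSING) != value
    }
    removed = [key for key in before if key not in after]
    return {"changed": changed, "removed": removed}


# Hypothetical snapshots taken before and after the administrator edits the screen.
before = {"default_algorithm": "A", "mode": "auto", "assoc:job-1/Map-Map": "A"}
after = {"default_algorithm": "A", "mode": "manual", "assoc:job-1/Map-Reduce": "B"}
diff = settings_diff(before, after)
```

Here the unchanged default algorithm is omitted from `diff`, so only the mode change, the new association, and the removed association would be transmitted.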

Based on the information received from the monitoring console, the monitoring manager records the analysis algorithm ID in the analysis algorithm field 5101 of the job profile table 510 and records the threshold data corresponding to that analysis algorithm in the threshold data field 5102. Each analysis algorithm requires its own threshold data, but threshold data used in the past may be reused. The threshold data may therefore be kept in memory or stored in a database and copied into the threshold data field of the job profile table as needed; alternatively, the threshold data field may hold the address of the threshold data in memory or its search key in the database.
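The variant in which the threshold data field holds a search key rather than a copy can be sketched as follows. The in-memory `threshold_store` dictionary stands in for the memory or database mentioned above, and every name here is hypothetical.

```python
# Hypothetical stand-in for the memory or database that keeps reusable threshold data.
threshold_store = {}


def store_thresholds(key, data):
    """Keep threshold data under a search key so that later records can reuse it."""
    threshold_store[key] = data


def record_algorithm(job_profile, job_name, node_set, algorithm_id, threshold_key):
    """Write the algorithm ID and the threshold search key into one job profile record."""
    job_profile[(job_name, node_set)] = {
        "analysis_algorithm": algorithm_id,  # corresponds to field 5101
        "threshold_data": threshold_key,     # corresponds to field 5102, held as a key
    }


def thresholds_for(job_profile, job_name, node_set):
    """Resolve the search key in a record back to the stored threshold data."""
    return threshold_store[job_profile[(job_name, node_set)]["threshold_data"]]
```

Two records that point at the same key share one copy of the threshold data, which is one way previously used thresholds can be reused.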

As stated above, when set to the automatic determination mode, the monitoring manager determines the analysis algorithm to apply based on the collected monitoring data and the information in the cluster map table. For one example of this automatic determination, the concept is presented first, followed by the processing procedure.

FIG. 37 shows an example in which OS performance information collected from a managed host is plotted as time-series data. The horizontal axis represents time, and the vertical axis represents the value of one of the metrics included in the OS performance information, for example the processor usage rate.

FIG. 37A is an example graph of the variation of a metric on a node having a node characteristic A. Observed over an interval 701, for example, the metric has a maximum value 702 and a minimum value 703. FIG. 37B is another example, showing the variation of the metric on a node having a different node characteristic B. Observed over the same interval 701, the metric has a maximum value 704 and a minimum value 705. Letting α be the difference between 702 and 703 and β the difference between 704 and 705, α ≫ β. In both graphs the metric fluctuates at a fine scale, but the large-scale variation differs markedly.

This means that different analysis algorithms should be applied to node characteristic A, characterized by metric variation like that in FIG. 37A, and node characteristic B, characterized by metric variation like that in FIG. 37B. Between nodes whose metrics vary as in node characteristic A, the large-scale variation can be properly detected as a correlation. Between nodes whose metrics vary as in node characteristic B, however, a typical correlation analysis algorithm either detects no correlation or, even if it does detect one, carries a non-negligible risk of failing to detect a failure from the change in that correlation, typically a drop in correlation, when the failure occurs.

For this reason, automatic determination of the analysis algorithm using the characteristics of the variation in OS performance information is needed. For node characteristics like the former, an analysis algorithm such as canonical correlation analysis is effective; for node characteristics like the latter, an analysis algorithm based on a relative comparison of metrics between nodes is effective.

One conceivable method of automatic determination that focuses on such characteristics of metric variation is autocorrelation analysis; here, as a simpler example, a method using the maximum and minimum values is shown.

FIG. 38 shows an example of the processing procedure for automatic determination of the analysis algorithm. This processing is executed at an arbitrary time while the monitoring manager is set to the automatic determination mode.

Automatic determination of the analysis algorithm is typically performed for each virtual group. The monitoring manager in the automatic determination mode first selects a virtual group recorded in the cluster map table 506 and extracts from it the group of nodes having a node characteristic C (S3801). It then picks one node at random from that group (S3802) and acquires the OS performance information of that node (S3803). For each of the metrics included in the OS performance information (S3804), it acquires the data of a fixed time frame and obtains the maximum and minimum values within that interval (S3805). It then compares the difference between the maximum and minimum, that is, the width of variation, with a fixed threshold (S3806). If the width exceeds the threshold, counter A is incremented (S3807); if the width is within the threshold, counter B is incremented (S3808).

After this comparison has been made for all metrics, the values of counter A and counter B are compared (S3809). If counter A is larger, analysis algorithm A is selected (S3810); if counter B is larger, analysis algorithm B is selected (S3811). Typically, analysis algorithm A is, for example, canonical correlation analysis, and analysis algorithm B is, for example, an algorithm that focuses on the average value of a metric, as described later.
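Steps S3802 to S3811 can be sketched as below, assuming the node group for node characteristic C (S3801) has already been extracted into a dictionary. The data layout, the single threshold shared by all metrics, and the `None` result for a tie are assumptions; the source does not say what happens when the two counters are equal.

```python
import random


def choose_algorithm(node_metrics, width_threshold, rng=random):
    """Pick analysis algorithm "A" or "B" from the variation width of each metric.

    node_metrics maps node name -> {metric name -> samples within the time frame}.
    """
    node = rng.choice(sorted(node_metrics))     # S3802: pick one node at random
    metrics = node_metrics[node]                # S3803: its OS performance information
    counter_a = counter_b = 0
    for samples in metrics.values():            # S3804: for each metric
        width = max(samples) - min(samples)     # S3805: max and min in the interval
        if width > width_threshold:             # S3806: compare width with threshold
            counter_a += 1                      # S3807
        else:
            counter_b += 1                      # S3808
    if counter_a > counter_b:                   # S3809
        return "A"                              # S3810: e.g. canonical correlation analysis
    if counter_b > counter_a:
        return "B"                              # S3811: e.g. average-value algorithm
    return None                                 # tie: not specified in the source
```

For example, a node whose CPU and disk metrics swing widely while its network metric stays flat yields counter A = 2 and counter B = 1, so algorithm A is chosen. In practice metrics with different units would probably need per-metric thresholds or normalization, which this sketch omits.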

Once the analysis algorithm has been determined, the monitoring manager searches the job profile table 510 for the record that combines the name of the job the virtual group takes part in executing with the node characteristic set pairing node characteristic C with itself, records the determined analysis algorithm in the analysis algorithm field 5101 of that record, and applies it to subsequent correlation analysis.

Furthermore, after the analysis algorithm has been determined for one node characteristic of a job, if automatic determination has not yet been performed for the other node characteristics, the same analysis algorithm may also be recorded as the one to apply to the other node characteristic sets. In this way, once the analysis algorithm has been determined for a single node characteristic, it can be applied as the analysis algorithm of the job in place of the default analysis algorithm.
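This propagation to node characteristic sets that have not yet been determined can be sketched as follows. The dictionary-based job profile and the use of `None` to mean "not yet determined" are assumptions for illustration.

```python
def propagate_algorithm(job_profile, job_name, algorithm):
    """Record `algorithm` for every record of the job that has no algorithm yet.

    job_profile maps (job name, node characteristic set) -> record dict.
    Returns the node characteristic sets that were filled in.
    """
    updated = []
    for (job, node_set), record in job_profile.items():
        if job == job_name and record.get("analysis_algorithm") is None:
            record["analysis_algorithm"] = algorithm
            updated.append(node_set)
    return updated
```

Records that already carry a determined algorithm are left untouched, so a later automatic determination for another node characteristic can still overwrite the propagated value.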

The automatic determination method above focuses on the maximum and minimum values of a metric. These two values can also be used in the analysis for failure detection. The following shows an example of an analysis algorithm to apply when only metrics with little variation are available.

FIG. 39 shows an example in which OS performance information collected from three managed hosts having a certain node characteristic is plotted as time-series data. The horizontal axis represents time, and the vertical axis represents the value of one of the metrics included in the OS performance information, for example the processor usage rate.

In FIG. 39, 706 and 707 denote the maximum and minimum values of the processor usage rate on a managed host with node characteristic C, obtained by executing the automatic determination processing described above. The average processor usage rates of two managed hosts having node characteristic C are denoted by 708 and 709. Both averages 708 and 709 fall within the range bracketed by 706 and 707.

Now suppose that the average processor usage rate calculated for another managed host, also having node characteristic C, is 710. This average 710 lies outside the range between 706 and 707. On this basis, it is determined that a failure has occurred on the managed host exhibiting the average 710.
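The average-value check illustrated by FIG. 39 can be sketched as below. The function name, the host names, and the sample values are hypothetical; `lower` and `upper` play the roles of the minimum 707 and maximum 706.

```python
from statistics import mean


def detect_failures(host_samples, lower, upper):
    """Return the hosts whose average metric value lies outside [lower, upper]."""
    return sorted(
        host
        for host, samples in host_samples.items()
        if not lower <= mean(samples) <= upper
    )


# Hypothetical processor usage samples for three hosts sharing node characteristic C.
hosts = {
    "host-a": [40, 42, 44],   # average 42, inside the range
    "host-b": [45, 47, 46],   # average 46, inside the range
    "host-c": [70, 75, 80],   # average 75, outside the range
}
suspects = detect_failures(hosts, lower=35, upper=55)
```

With these values only `host-c` is reported, which corresponds to the host whose average 710 falls outside the range in the figure.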

This analysis algorithm only calculates an average value for each node and checks it against the thresholds, so it is simpler than the correlation analysis procedure described above; nevertheless, for some jobs it can detect failures at a practical level.

In this way, while the parallel distributed processing system is running, the analysis algorithm that the operation status evaluation unit 201 of the monitoring manager applies to failure detection can be changed in various ways, either by manual setting by the operation administrator or by automatic determination by the monitoring manager.

The monitoring manager can also switch between the automatic determination mode and the manual setting mode while it is running. In that case it is desirable to retain the analysis algorithms and threshold data recorded in the job profile table 510 before the mode transition, but this does not preclude changing these data after the transition according to the processing defined for each mode.

By the method described above, the fourth embodiment achieves more effective failure detection by selecting one of a plurality of analysis algorithms and applying it on various occasions.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the elements described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as integrated circuits. They may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be placed in memory, in a storage device such as an HDD or SSD, or on a storage medium such as an SD card or DVD-ROM.

The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.

DESCRIPTION OF SYMBOLS
100 Information processing apparatus
110 Monitoring server
120 Client
130 Master node
140 Worker node
200 User process
201 Monitoring manager
202 Monitoring console
203 Job tracker
204 Name node
205 Monitoring agent
206 Task tracker
207 Data node
208 Remote monitor
300 MapReduce cluster
301 Job
302 Task
303 Data block
304 Split
305 Map task
306 Intermediate file
307 Reduce task
308 Output file
309 Map slot
310 Reduce slot
401 Managed host list table
402 OS performance information
403 MapReduce scheduling information
404 Job list
405 Task list
406 Attempt list
407 Data transfer trace
501 Virtual group table
502 Virtual group node list table
503 Virtual group
504 Node characteristic table
505 Multiphase node characteristic table
506 Cluster map table
507 Node characteristic group
508 Node performance matrix table
509 Job profile table
510 Job profile table with analysis algorithm
511 Analysis algorithm table
600 Monitoring console screen
601 Preference setting screen
602 Event notification screen
603 Cluster display screen
604 Analysis algorithm setting screen

Claims

An operation management device of an information processing system that executes a job in cooperation with a plurality of information processing devices,
A data collection unit for obtaining information from each of the plurality of information processing devices;
A storage unit for storing data relating to the plurality of information processing devices;
An evaluation unit that evaluates the state of the plurality of information processing devices using data stored in the storage unit;
Each of the plurality of information processing devices has any one of predetermined characteristics,
The data collection unit acquires performance information from each of the plurality of information processing devices and stores the performance information in the storage unit,
The storage unit further stores, for each combination of characteristics that can be taken by the two information processing devices, a threshold value regarding the correlation between the performance information of the two information processing devices,
Wherein the evaluation unit, when evaluating the state of one information processing apparatus to be evaluated among the plurality of information processing apparatuses,
For each of the plurality of information processing apparatuses other than the information processing apparatus to be evaluated, calculates a correlation value of performance information with the information processing apparatus to be evaluated, specifies the combination of characteristics with the information processing apparatus to be evaluated, and compares the threshold value stored in the storage unit for the specified combination of characteristics with the calculated correlation value, and
Evaluates the state of the information processing apparatus to be evaluated based on a result of the comparison.

The operation management apparatus according to claim 1, wherein the job includes a plurality of tasks and a plurality of data files.
The operation management apparatus, wherein the plurality of predetermined characteristics are characteristics determined based on a type of a task executed by the information processing apparatus and characteristics of a data file input / output by the information processing apparatus.

The operation management apparatus according to claim 2, wherein the job is a MapReduce-type job, and the job includes a Map task and a Reduce task,
The operation management apparatus, wherein the plurality of predetermined characteristics are characteristics determined based on whether the task executed by the information processing apparatus is a Map task or a Reduce task and on the type of data file transfer accompanying the execution of the task.

The operation management device according to claim 2,
For each job executed in the information processing system, the data collection unit acquires, for each of the plurality of information processing apparatuses that execute the job, the type of task executed by the information processing apparatus and the characteristics of the data file input and output by the information processing apparatus,
The operation management apparatus, wherein the evaluation unit specifies characteristics of each of the plurality of information processing apparatuses based on a task type and data file characteristics collected by the data collection unit.

The operation management device according to claim 1,
The threshold value stored in the storage unit is a value calculated by the evaluation unit using performance information collected from the plurality of information processing devices by the data collection unit.

The operation management device according to claim 5,
The operation management apparatus characterized in that the performance information includes a performance index of a computing resource constituting the information processing apparatus.

The operation management device according to claim 1,
The operation management apparatus, wherein the evaluation unit determines occurrence of an abnormality in the information processing apparatus based on a comparison between a threshold value stored in the storage unit and the calculated correlation value.

The operation management apparatus according to claim 1, wherein the job includes a plurality of tasks and a plurality of data files.
The operation management apparatus, wherein the plurality of predetermined characteristics are characteristics determined based on characteristics of a data file input / output by the information processing apparatus.

The operation management device according to claim 1,
The storage unit stores, for each combination of characteristics that the two information processing devices can take, a correlation value calculation means that calculates a correlation value of the performance information of the two information processing devices, and a threshold determination means that compares the correlation value calculated using the correlation value calculation means with the threshold value,
The operation management apparatus, wherein the evaluation unit calculates the correlation value of the performance information using the correlation value calculation means and compares the correlation value with the threshold value using the threshold determination means.

The operation management device according to claim 9,
The storage unit stores a plurality of the correlation value calculation means and a plurality of the threshold determination means,
In the case where the evaluation unit evaluates the state of the evaluation target information processing apparatus for one evaluation target information processing apparatus among the plurality of information processing apparatuses,
Calculate the maximum and minimum values of the performance information collected from the information processing device to be evaluated,
An operation management apparatus that switches between the correlation value calculation means and the threshold value determination means based on the difference between the maximum value and the minimum value.

The operation management apparatus according to claim 10,
The storage unit stores a maximum value and a minimum value of the performance information,
The evaluation unit, when evaluating the state of one information processing apparatus to be evaluated among the plurality of information processing apparatuses, calculates an average value of the performance information collected from the information processing apparatus to be evaluated,
And determines the occurrence of an abnormality based on a comparison of the average value with the maximum value and the minimum value.

An operation management method for an information processing system for executing a job in cooperation with a plurality of information processing apparatuses having any one of a plurality of predetermined characteristics,
An operation management device communicably connected to the plurality of information processing devices,
For each combination of characteristics that can be taken by two information processing devices among the plurality of information processing devices, store a threshold value regarding the correlation of the performance information of the two information processing devices,
Obtaining performance information from each of the plurality of information processing devices;
When evaluating the state of the information processing apparatus as the evaluation target for one information processing apparatus as the evaluation target among the plurality of information processing apparatuses,
For each of the plurality of information processing devices other than the information processing device to be evaluated, a correlation value of performance information with the information processing device to be evaluated is calculated,
Identify a combination of characteristics with the information processing apparatus to be evaluated, compare the threshold value with the calculated correlation value for the identified combination of characteristics,
An operation management method comprising evaluating a state of the information processing apparatus to be evaluated based on a result of the comparison.

An operation management program that causes an operation management apparatus, communicably connected to the information processing apparatuses of an information processing system that executes a job in cooperation with a plurality of information processing apparatuses having any one of a plurality of predetermined characteristics, to execute:
A procedure for storing, for each combination of characteristics that can be taken by two information processing devices among the plurality of information processing devices, a threshold value regarding the correlation of the performance information of the two information processing devices;
A procedure for acquiring performance information from each of the plurality of information processing devices;
When evaluating the state of one information processing apparatus to be evaluated among the plurality of information processing apparatuses,
A procedure for calculating, for each of the plurality of information processing devices other than the information processing device to be evaluated, a correlation value of performance information with the information processing device to be evaluated;
A procedure for specifying the combination of characteristics with the information processing apparatus to be evaluated, and comparing the threshold value with the calculated correlation value for the specified combination of characteristics; and
A procedure for evaluating the state of the information processing apparatus to be evaluated based on the result of the comparison.