JP2008234351A

JP2008234351A - Integrated operation monitoring system and program

Info

Publication number: JP2008234351A
Application number: JP2007073492A
Authority: JP
Inventors: Tadashi Adachi; 忠史安達
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-03-20
Filing date: 2007-03-20
Publication date: 2008-10-02

Abstract

PROBLEM TO BE SOLVED: To provide a hierarchical integrated operation monitoring system that monitors the failure information and performance information of sub-systems, each monitored by different existing operation monitoring software, against a preset, uniform threshold level.

SOLUTION: Output signals containing failure information and performance information from a sub-system monitoring layer 3, set up for each sub-system, are compared in an HUB layer 4 against failure linkage information between sub-systems (based on past failure histories accumulated in advance in a knowledge database) and against performance information of the existing operation monitoring software, and are redefined accordingly. They are then delivered as standard system management messages to an integrated operation monitoring layer 2 and an operator monitoring layer 1, so that the whole system can be monitored at a standard monitoring level. Dividing the operation monitoring function between the integrated operation monitoring layer 2 and the operator monitoring layer 1 makes the integrated operation monitoring efficient.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an integrated operation monitoring system and program: specifically, to an operation management server of a computer system built on a local area network (hereinafter, LAN), a computer system including that server, a method of extracting messages for operation management, and a program executed by the operation management server. In particular, it relates to providing an integrated operation monitoring system that manages a plurality of computer systems in a unified, integrated, and efficient manner using off-the-shelf operation management tools.

Prior art relating to integrated operation monitoring systems includes, for example, the techniques disclosed in Patent Documents 1 to 3.

The prior art described in Patent Document 1 provides the specification of, and means for realizing, an interface to an integrated operation monitoring console that carves out display areas on the console and realizes the integrated operation management desired by the operator.

The prior art described in Patent Document 2 provides, on the operation management server, a plurality of filter functions for selecting the signals to be output to the operation management terminal. By controlling in advance the order in which the filters are applied to error messages from the monitored system, frequently occurring messages are identified by filters early in the order, which speeds up monitoring.

The prior art described in Patent Document 3 is an error log collection agent system that stores, separately from the per-product severity of the error/warning/information messages supplied by each product's vendor, a severity for the system as a whole (see in particular paragraphs 0018 and 0019). In addition, because a Company B product that cooperates with a Company A product should also fail when the Company A product terminates abnormally, the corresponding log information is grouped together as related information (see in particular paragraphs 0029 and 0030).

Patent Document 1: JP H07-253866 A
Patent Document 2: JP 2006-260056 A
Patent Document 3: JP 2003-216457 A

However, the conventional integrated operation monitoring systems described above have the following problems.

The first problem is that, when an information system is composed of a plurality of subsystems and operated with a plurality of off-the-shelf operation management tools, failure information and performance information for the whole system cannot be managed centrally, accurately, and quickly at a uniform quality. The reason is that each off-the-shelf operation management product filters the failure-related and performance-related messages output by the monitored servers and storage using its own thresholds. When several such products each independently monitor the subsystems that make up the system, the level at which the performance, failures, and so on of the monitored devices are managed varies from product to product.

The second problem is the cost of monitoring the whole system in an integrated way. In today's enterprises, multiple subsystems are built per function and per business task, several subsystems together make up one business system, and each subsystem is built independently and operated with different off-the-shelf operation management software. With conventional operation monitoring, one must either prepare a monitoring console for each subsystem and watch and manage multiple consoles, or, when introducing a new integrated monitoring system, retire the existing off-the-shelf monitoring software and introduce new common monitoring software solely for that purpose, which incurs the cost of the new software as an additional expense.

The present invention has been made in view of the problems described above. Its object is to provide an integrated operation monitoring system that uniformly and easily brings the monitoring of all monitored servers and storage, across a plurality of subsystems managed by a plurality of off-the-shelf monitoring tools, to a preset standard monitoring level.

To achieve the above object, the invention according to claim 1 is an integrated operation monitoring system for a computer system, comprising: an operation monitoring tool that collects performance information of monitored devices in a subsystem; a knowledge database that collectively stores the performance information collected by the operation monitoring tools of a plurality of subsystems and that also stores a template for standardizing the performance information; and arithmetic processing means that standardizes the performance information based on the template and outputs it.

The invention according to claim 2 is the integrated operation monitoring system according to claim 1, further comprising integrated operation monitoring means that receives the standardized performance information output by the arithmetic processing means and presents that output to a management system engineer.

The invention according to claim 3 is the integrated operation monitoring system according to claim 1 or 2, wherein the operation monitoring tool also collects failure information of monitored devices in the subsystem; the knowledge database accumulates the failure information collected by the operation monitoring tools of the plurality of subsystems; and, when a failure occurs in a monitored device in a subsystem, the arithmetic processing means analyzes, based on the failure information accumulated in the knowledge database, which devices may fail in connection with that failure, and outputs the analysis result.

The invention according to claim 4 is an integrated operation monitoring program that causes a computer system to function as an integrated operation monitoring system. The program causes an operation monitoring server in a subsystem of the computer system to function as an operation monitoring tool that collects performance information of monitored devices, and causes an integrated operation monitoring server of the computer system to function as: a knowledge database that collectively stores the performance information collected by the operation monitoring tools of a plurality of subsystems and that also stores a template for standardizing the performance information; and arithmetic processing means that standardizes the performance information based on the template and outputs it.

The invention according to claim 5 is the integrated operation monitoring program according to claim 4, which causes the integrated operation monitoring server of the computer system to function as integrated operation monitoring means that receives the standardized performance information output by the arithmetic processing means and presents that output to an engineer of the monitoring system.

The invention according to claim 6 is the integrated operation monitoring program according to claim 4 or 5, which causes: the operation monitoring tool to also collect failure information of monitored devices in the subsystem; the knowledge database to accumulate the failure information collected by the operation monitoring tools of the plurality of subsystems; and the arithmetic processing means, when a failure occurs in a monitored device in a subsystem, to analyze, based on the failure information accumulated in the knowledge database, which devices may fail in connection with that failure, and to output the analysis result.

According to the present invention, it is possible to provide an integrated operation monitoring system that uniformly and easily brings the monitoring of all monitored servers and storage, across a plurality of subsystems managed by a plurality of off-the-shelf monitoring tools, to a preset standard monitoring level.

Next, an embodiment of the present invention will be described in detail with reference to the drawings.

First, the configuration of this embodiment will be described. FIG. 1 shows the configuration of a hierarchical integrated operation management system for a computer system with a four-layer structure according to the embodiment of the present invention, configured with a plurality of subsystem groups.

Referring to FIG. 1, the hierarchical integrated operation monitoring system of this embodiment has a four-layer structure comprising an operator monitoring layer 1, an integrated operation monitoring layer 2, a subsystem monitoring layer 3, and a HUB layer 4.

The operator monitoring layer 1 has a failure notification function that reports system failure information to the operation monitoring operator (hereinafter, the Patlite function).

The integrated operation monitoring layer 2 is the layer in which operation management system engineers monitor the overall performance of the integrated system under the subsystem monitoring layer 3 and analyze failures that have occurred in the integrated system.

The subsystem monitoring layer 3 comprises subsystems 3a, 3b, ..., each of which is part of the integrated system (hereinafter, any one of the subsystems is referred to as subsystem 3x). Each subsystem 3x performs failure analysis and performance analysis of the integrated system to which it belongs, and monitors and manages performance information and failure information. The devices monitored by a subsystem 3x include servers, storage, network routers, and the like; in the example shown in FIG. 1, these are the server 3aa, storage 3ab, and network 3ac of subsystem 3a. Off-the-shelf operation management software can be used as the failure and performance monitoring, management, and analysis software running in subsystem 3x, but software that can send the performance information and failure information of the monitored devices to the HUB layer 4 described below is preferable.

The HUB layer 4 comprises a knowledge database 41; an operation information collection unit 42 that receives the information output from the subsystem monitoring layer 3; an arithmetic processing unit 43 that processes the output of the operation information collection unit 42 using the information in the knowledge database 41; and a management information output unit 44 that outputs the results from the arithmetic processing unit 43 to the upper layers, namely the operator monitoring layer 1 and the integrated operation monitoring layer 2.

The knowledge database 41 stores in advance, as templates, information for standardizing the messages of the off-the-shelf operation monitoring software used in each subsystem, and failure linkage information between subsystems based on past failure histories.

The operation information collection unit 42 has a function of receiving failure information or performance information sent in various formats and protocols such as SNTP and mail (hereinafter, messages).

The HUB layer 4 may be implemented on a single computer. In that case, a secondary storage device of the computer, such as a hard disk, serves as the knowledge database 41, and a processing device such as a CPU serves as the operation information collection unit 42, the arithmetic processing unit 43, and the management information output unit 44. The implementation is not limited to this, however: the layer may be implemented virtually by a computer system consisting of a plurality of computers connected so that they can communicate with one another. Virtualizing across multiple computers has the effect of improving processing capacity. In this case, the HUB layer 4 is specifically called the "virtual HUB layer 4".

The hierarchical integrated operation management system of this embodiment, configured as above, operates as follows.

First, in the subsystem monitoring layer 3, each subsystem 3x uses the off-the-shelf operation monitoring tool (or off-the-shelf operation management software) installed in it to collect performance information and failure information for a plurality of monitored devices (for example, the server 3aa and the storage 3ab). Each subsystem 3x then outputs the failure information and performance information it has independently detected and collected to the HUB layer 4 above it.

The operation of the HUB layer 4 is described with reference to FIG. 2. As noted above, the operation information collection unit 42 of the virtual HUB layer 4 can receive failure information or performance information sent in formats such as SNTP and mail, and it temporarily stores the messages sent individually from each subsystem 3x (message reception, step S1). After a message has been temporarily stored, a filter function separates it into failure information or performance information and sends it to the arithmetic processing unit 43 (steps S2 and S3).

The filter used by the operation information collection unit 42 is provided with a table in which failure information and performance information are recorded as filter conditions. The operation information collection unit 42 compares each input message against this filter condition table, classifies it as failure information or performance information, and then outputs it to the arithmetic processing unit 43.
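The table-driven split in steps S2 and S3 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the contents of the filter condition table or the message layout, so the keyword table `FILTER_TABLE`, the `Message` fields, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    FAILURE = "failure"
    PERFORMANCE = "performance"


@dataclass(frozen=True)
class Message:
    subsystem: str  # e.g. "3a" (hypothetical field)
    device: str     # e.g. "server-3aa" (hypothetical field)
    tool: str       # monitoring tool that produced the message
    text: str       # raw message body


# Hypothetical filter condition table: keyword -> message kind.
FILTER_TABLE = {
    "error": Kind.FAILURE,
    "down": Kind.FAILURE,
    "timeout": Kind.FAILURE,
    "iops": Kind.PERFORMANCE,
    "latency": Kind.PERFORMANCE,
    "cpu": Kind.PERFORMANCE,
}


def classify(message: Message) -> Optional[Kind]:
    """Return the kind of the first matching keyword, or None if none match."""
    lowered = message.text.lower()
    for keyword, kind in FILTER_TABLE.items():
        if keyword in lowered:
            return kind
    return None


def split_messages(messages):
    """Separate buffered messages into failure and performance lists."""
    failures, performance = [], []
    for m in messages:
        kind = classify(m)
        if kind is Kind.FAILURE:
            failures.append(m)
        elif kind is Kind.PERFORMANCE:
            performance.append(m)
    return failures, performance
```

Messages that match no table entry are dropped in this sketch; the patent does not say how such messages are handled.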

When failure information is input, the arithmetic processing unit 43 refers to the knowledge database 41 (step S4) and associates the input with the past failure data stored there. Next, from the input failure information, the performance information of the failed device, and the past failure data for that device, it automatically extracts the devices and subsystem groups that may suffer a related secondary failure (analysis of the impact on other systems, step S5). It then creates a new message file that combines this result with the failure information (step S6). This message file is the data used to build the display screens that serve as the user interface for monitoring operators and operation managers in the operator monitoring layer 1 and the integrated operation monitoring layer 2. Finally, the message file created by the arithmetic processing unit 43 is output to the management information output unit 44 (step S7).

When performance information is input, the arithmetic processing unit 43 refers to the knowledge database 41 (step S8) and, using a template recorded there in advance that holds a performance comparison of the off-the-shelf operation monitoring tools, converts the data collected by the operation information collection unit 42 into standard performance information (step S9). Then, as in step S6, it creates a message file that serves as the data for building the display screens used by monitoring operators and operation managers in the operator monitoring layer 1 and the integrated operation monitoring layer 2 (step S10). Finally, the message file created by the arithmetic processing unit 43 is output to the management information output unit 44 (step S11).

Next, the knowledge database 41 of this embodiment will be described, beginning with the performance information it stores. FIG. 3 shows an example of a storage performance template in the knowledge database 41.

Referring to FIG. 3, the operation monitoring tools (a) and (b) used in the respective subsystems report different performance information when monitoring similar storage, owing to the characteristics of each off-the-shelf operation monitoring product.

The storage performance template records in advance the standard performance values obtained when the storage in question is used in the system concerned. When storage performance information from the operation information collection unit is input to the arithmetic processing unit 43, it can be replaced with standard performance information by looking it up against the storage performance template table.

That is, in view of the fact that measurements of, for example, the same disk differ between monitoring tools, the knowledge database 41 of this embodiment stores monitoring results (performance information) obtained in advance by monitoring the same storage with multiple monitoring tools and recording the variation. By referring to this performance information stored in the knowledge database 41, a measured value is converted into a value that is standard across all subsystems. Unlike, for example, an index indicating importance for the system as a whole, this standard value is derived from the relative values of each monitored device and operation monitoring tool accumulated in the knowledge database.
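A minimal sketch of the conversion in step S9 follows. It assumes, purely for illustration, that the template holds one multiplicative factor per monitoring tool and metric; the patent states only that relative values between tools are stored, not how they are applied. The template contents, metric name, and function name are hypothetical.

```python
# Hypothetical storage performance template: for each (tool, metric), the
# factor that maps that tool's reading onto the common reference scale.
STORAGE_TEMPLATE = {
    ("tool_a", "throughput_mb_s"): 1.00,  # tool (a) taken as the reference
    ("tool_b", "throughput_mb_s"): 1.25,  # tool (b) reads 20% low on the same disk
}


def to_standard(tool: str, metric: str, measured: float) -> float:
    """Convert a tool-specific reading into a standard value via the template."""
    try:
        factor = STORAGE_TEMPLATE[(tool, metric)]
    except KeyError:
        raise LookupError(f"no template entry for {tool!r} / {metric!r}") from None
    return measured * factor
```

With these example factors, a reading of 80 from tool (b) and a reading of 100 from tool (a) both map to the same standard value, which is the property the integrated operation monitoring layer relies on.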

The knowledge database 41 stores not only the performance information described above but also past failure information. The following describes how the arithmetic processing unit 43 uses the failure information stored in the knowledge database 41 to predict, when one failure occurs, that another failure may follow.

The knowledge database 41 stores past failure information and also stores, as knowledge, relationships between systems derived from that past failure information. If, for example, the knowledge database 41 holds the knowledge that system A and system B are related, then when a failure occurs in system A, even if no failure has yet occurred in system B, the arithmetic processing unit 43 searches the past failure information and predicts that a failure may occur in system B after a certain time. The number of monitored devices that the arithmetic processing unit 43 identifies as likely to fail is not limited to one; there may be several.
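One way to derive such relationships from a failure history is sketched below. The patent does not describe how linkage is computed, so the rule used here (system Y is linked to system X if Y has failed within a fixed time window after X) is an assumption, as are the history format, the window length, and the function names.

```python
from collections import defaultdict

# Hypothetical failure history: (timestamp in seconds, system name).
HISTORY = [
    (0, "A"), (120, "B"),
    (5000, "A"), (5200, "B"),
    (9000, "C"),
]


def build_linkage(history, window_s=600):
    """Map each system to the systems that failed within window_s after it."""
    linkage = defaultdict(set)
    events = sorted(history)
    for i, (t_first, first) in enumerate(events):
        for t_next, nxt in events[i + 1:]:
            if t_next - t_first > window_s:
                break  # events are sorted, so later ones are outside the window
            if nxt != first:
                linkage[first].add(nxt)
    return linkage


def predict_secondary(failed_system, linkage):
    """Systems that may fail as a consequence of failed_system."""
    return sorted(linkage.get(failed_system, set()))
```

A production system would presumably also weigh how often the pairing has occurred; this sketch treats a single past co-occurrence as sufficient.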

Next, the management information output unit 44 will be described. The management information output unit 44 has a filter function for identifying whether an input signal is failure information or performance information.

When an information signal input to the management information output unit 44 is performance information, the unit outputs it only to the integrated operation monitoring layer 2; when it is failure information, the unit outputs it to the integrated operation monitoring layer 2 and also to the operator monitoring layer 1.
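The routing rule can be stated compactly in code. The sketch below is illustrative only; the string labels for message kinds and the function name are assumptions, not part of the patent.

```python
INTEGRATED_LAYER = "integrated operation monitoring layer 2"
OPERATOR_LAYER = "operator monitoring layer 1"


def destinations(kind):
    """Return the layers that should receive a message of the given kind."""
    if kind == "performance":
        # Performance information goes to the engineers' layer only.
        return [INTEGRATED_LAYER]
    if kind == "failure":
        # Failure information goes to both layers.
        return [INTEGRATED_LAYER, OPERATOR_LAYER]
    raise ValueError(f"unknown message kind: {kind!r}")
```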

Because the information signals passed to the operator monitoring layer 1 are restricted by the filter function of the management information output unit 44, the monitoring display (not shown) of the operator monitoring layer shows only information issued when the system is abnormal, so that this information can be selectively managed.

In addition, because system failure prediction information related to the failure location is attached to this information as predictive information, the operator can carry out predictive management of the whole system before the effects of the failure spread.

Furthermore, the performance information supplied to the integrated operation monitoring layer 2 is standard operation monitoring performance information that does not depend on the characteristics of any particular off-the-shelf monitoring product. Even when subsystems are added or removed, or when several off-the-shelf monitoring products are in use, integrated operation monitoring with consistent failure monitoring and performance monitoring against standard thresholds can therefore be achieved.

The embodiment above also achieves efficient integrated operation monitoring by dividing the operation monitoring function between the integrated operation monitoring layer 2 and the operator monitoring layer 1.

According to this embodiment, in a computer system that consists of two or more subsystems and is managed by a plurality of off-the-shelf operation management products, all monitored servers and storage can be managed in an integrated manner at a preset, uniform operational quality. It also becomes possible to realize system operation management that can predict in advance the possibility that a failed subsystem or device will induce failures in other systems or devices. This is because a four-layer integrated operation monitoring system is adopted in which a HUB layer with a knowledge database sits between the subsystem management layers and the operation layer.

The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of its technical idea. The number of HUB layers and subsystem monitoring layers is not limited to that of the embodiment; they may be provided in whatever number and at whatever locations suit the implementation of the invention.

For example, a plurality of subsystems may be distributed across different locations. In that case, by keeping common information in the knowledge database, a HUB system can be built for each area to produce standardized data, after which a system administrator at a remote site can monitor and manage the systems remotely.

Also, for example, the operator monitoring layer may be installed at a different location (see FIG. 4). In that case, installing a plurality of HUB layers close to the monitored devices also makes it possible to reduce the communication traffic sent to the integrated operation monitoring layer.

FIG. 1 is a block diagram showing the configuration of a hierarchical integrated operation management system for a computer system with a four-layer structure according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operating procedure of the embodiment of the present invention.
FIG. 3 is a diagram showing an example of a storage performance template in the knowledge database of the embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of a hierarchical integrated operation management system according to another embodiment of the present invention.

Explanation of symbols

1 Operator monitoring layer
2 Integrated operation monitoring layer
3 Subsystem monitoring layer
3a, 3b, 3x Subsystem
4 HUB layer
41 Knowledge database
42 Operation information collection unit
43 Arithmetic processing unit
44 Management information output unit

Claims

1. An integrated operation monitoring system for a computer system, comprising:
an operation monitoring tool that collects performance information of monitored devices in a subsystem;
a knowledge database that collectively stores the performance information collected by the operation monitoring tools of a plurality of subsystems, and that stores a template for standardizing the performance information; and
arithmetic processing means for standardizing the performance information based on the template and outputting the standardized performance information.

2. The integrated operation monitoring system according to claim 1, further comprising integrated operation monitoring means that receives the standardized performance information output by the arithmetic processing means and presents it to a management system engineer.

3. The integrated operation monitoring system according to claim 1, wherein
the operation monitoring tool also collects failure information of monitored devices in the subsystem,
the knowledge database accumulates the failure information collected by the operation monitoring tools of a plurality of subsystems, and
the arithmetic processing means, when a failure occurs in a monitored device in a subsystem, analyzes, based on the failure information accumulated in the knowledge database, which devices may fail in connection with that failure, and outputs the analysis result.

4. An integrated operation monitoring program that causes a computer system to function as an integrated operation monitoring system, the program causing:
an operation monitoring server in a subsystem of the computer system to function as an operation monitoring tool that collects performance information of monitored devices; and
an integrated operation monitoring server of the computer system to function as
a knowledge database that collectively stores the performance information collected by the operation monitoring tools of a plurality of subsystems, and that stores a template for standardizing the performance information, and
arithmetic processing means for standardizing the performance information based on the template and outputting the standardized performance information.

5. The integrated operation monitoring program according to claim 4, wherein the program causes the integrated operation monitoring server of the computer system to function as integrated operation monitoring means that receives the standardized performance information output by the arithmetic processing means and presents it to a technician of the monitoring system.

6. The integrated operation monitoring program according to claim 4 or 5, wherein the program causes execution of:
processing in which the operation monitoring tool collects failure information of monitored devices in the subsystem;
processing in which the failure information collected by the operation monitoring tools of a plurality of subsystems is accumulated in the knowledge database; and
processing in which, when a failure occurs in a monitored device in a subsystem, the arithmetic processing means analyzes, based on the failure information accumulated in the knowledge database, which devices may fail in connection with that failure, and outputs the analysis result.
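
The failure analysis recited above, in which accumulated failure information is used to identify devices that may fail in connection with a new failure, can be illustrated with the following minimal sketch. It uses simple co-occurrence counting over past incidents, which is a stand-in technique chosen for illustration and not the method disclosed in the publication; the history format, the device names, and the `related_devices` function are hypothetical.

```python
from collections import Counter


def related_devices(history, failed_device):
    """Rank devices by how often they failed together with failed_device.

    history: iterable of sets, each set holding the devices that failed
    in one past incident (hypothetical knowledge database contents).
    """
    counts = Counter()
    for incident in history:
        if failed_device in incident:
            # Count every other device involved in the same incident.
            counts.update(d for d in incident if d != failed_device)
    return counts.most_common()


# Hypothetical accumulated failure history.
history = [
    {"switch-1", "server-a", "server-b"},
    {"switch-1", "server-a"},
    {"storage-1", "server-b"},
]
result = related_devices(history, "switch-1")
```

Here `result` ranks `server-a` ahead of `server-b`, because `server-a` appeared in two past incidents involving `switch-1` and `server-b` in only one.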