JP5267749B2 - Operation management apparatus, operation management method, and program - Google Patents

Publication number
JP5267749B2
Authority
JP
Japan
Prior art keywords
correlation
monitored
common
destruction
operation management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2012549903A
Other languages
Japanese (ja)
Other versions
JPWO2012086824A1 (en)
Inventor
謙太郎 矢吹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP2012549903A
Application granted
Publication of JP5267749B2
Publication of JPWO2012086824A1
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems

Description

本発明は、運用管理装置、運用管理方法、及びプログラムに関し、特に、システムの障害検出を行う運用管理装置、運用管理方法、及びプログラムに関する。
The present invention relates to an operation management apparatus, an operation management method, and a program, and more particularly to an operation management apparatus, an operation management method, and a program that detect system failures.

システム性能の時系列情報を用いて、システムのモデル化を行い、生成されたモデルを用いてそのシステムの障害を検出する運用管理システムの一例が特許文献1に記載されている。
特許文献1記載の運用管理システムは、システムの複数種別の性能値の計測値をもとに、複数の種別間の組み合わせのそれぞれに対して相関関数を決定することにより、複数の相関関数を含む相関モデルを生成する。そして、この運用管理システムは、生成された相関モデルを用いて、新たに入力された性能値の計測値に対して相関関係の破壊(相関破壊)が発生しているかどうかを判定し、相関破壊が集中している性能種別を検出することにより、障害の原因を特定する。
Patent Document 1 describes an example of an operation management system that models a system using time-series information on system performance and detects failures of that system using the generated model.
The operation management system described in Patent Document 1 generates a correlation model containing a plurality of correlation functions by determining, from measured values of a plurality of types of system performance values, a correlation function for each combination of those types. The system then uses the generated correlation model to determine whether a breakdown of a correlation (correlation destruction) has occurred for newly input measured values, and identifies the cause of a failure by detecting the performance type on which correlation destruction is concentrated.

特開2009−199533号公報
JP 2009-199533 A

上述の特許文献1に記載された運用管理システムにおいては、監視の対象となるシステム内のある処理装置等の障害が周辺に波及したことにより、システム内のいくつかの処理装置等で相関破壊が発生した場合、相関破壊を基にした障害原因の特定が困難になるという問題があった。
本発明の目的は、上述の課題を解決し、システム内の障害の波及によりいくつかの処理装置等で相関破壊が検出された場合でも、障害原因の候補を特定可能な運用管理装置、運用管理方法、及びプログラムを提供することである。
In the operation management system described in Patent Document 1 above, when a failure of a processing device or the like in the monitored system spreads to its surroundings and correlation destruction occurs in several processing devices or the like in the system, it becomes difficult to identify the cause of the failure on the basis of correlation destruction.
An object of the present invention is to solve the above problem and to provide an operation management apparatus, an operation management method, and a program capable of identifying failure cause candidates even when correlation destruction is detected in several processing devices or the like because a failure has spread within the system.

本発明の一態様における第1の運用管理装置は、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデルを記憶する相関モデル記憶手段と、前記複数の被監視対象の各々について、入力された前記被監視対象の前記性能値を前記相関モデル記憶手段に記憶された当該被監視対象の前記相関モデルに適用し、当該相関モデルに含まれる前記相関関係の相関破壊を検出する相関破壊検出手段と、共通な装置または共通な前記被監視対象に直接または間接的に接続され、かつ、共通な前記相関関係を含む前記相関モデルを有する前記複数の被監視対象の相互間で、前記共通な前記相関関係毎の前記相関破壊の検出有無を比較することにより、障害要因の候補となる前記被監視対象を決定し、出力する障害分析手段とを含む。
本発明の一態様における第1の運用管理方法は、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデルを記憶し、前記複数の被監視対象の各々について、入力された前記被監視対象の前記性能値を当該被監視対象の前記相関モデルに適用し、当該相関モデルに含まれる前記相関関係の相関破壊を検出し、共通な装置または共通な前記被監視対象に直接または間接的に接続され、かつ、共通な前記相関関係を含む前記相関モデルを有する前記複数の被監視対象の相互間で、前記共通な前記相関関係毎の前記相関破壊の検出有無を比較することにより、障害要因の候補となる前記被監視対象を決定し、出力する。
本発明の一態様におけるコンピュータ読み取り可能な記録媒体は、コンピュータに、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデルを記憶し、前記複数の被監視対象の各々について、入力された前記被監視対象の前記性能値を当該被監視対象の前記相関モデルに適用し、当該相関モデルに含まれる前記相関関係の相関破壊を検出し、共通な装置または共通な前記被監視対象に直接または間接的に接続され、かつ、共通な前記相関関係を含む前記相関モデルを有する前記複数の被監視対象の相互間で、前記共通な前記相関関係毎の前記相関破壊の検出有無を比較することにより、障害要因の候補となる前記被監視対象を決定し、出力する処理を実行させるプログラムを格納する。
本発明の一態様における第2の運用管理装置は、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデルを記憶する相関モデル記憶手段と、前記複数の被監視対象の各々について、入力された前記被監視対象の前記性能値を前記相関モデル記憶手段に記憶された当該被監視対象の前記相関モデルに適用し、当該相関モデルに含まれる前記相関関係の相関破壊を検出する相関破壊検出手段と、共通な装置または共通な前記被監視対象からいずれかが選択されて処理要求を受け付ける、あるいは、共通な装置または共通な前記被監視対象の処理結果を利用する、同一の機能を提供する前記複数の被監視対象の相互間で、共通な前記相関関係毎の前記相関破壊の検出有無を比較することにより、障害要因の候補となる前記被監視対象を決定し、出力する障害分析手段とを含む。
本発明の一態様における第2の運用管理方法は、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデルを記憶し、前記複数の被監視対象の各々について、入力された前記被監視対象の前記性能値を当該被監視対象の前記相関モデルに適用し、当該相関モデルに含まれる前記相関関係の相関破壊を検出し、共通な装置または共通な前記被監視対象からいずれかが選択されて処理要求を受け付ける、あるいは、共通な装置または共通な前記被監視対象の処理結果を利用する、同一の機能を提供する前記複数の被監視対象の相互間で、共通な前記相関関係毎の前記相関破壊の検出有無を比較することにより、障害要因の候補となる前記被監視対象を決定し、出力する。
A first operation management apparatus according to one aspect of the present invention includes: correlation model storage means for storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values; correlation destruction detection means for applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target stored in the correlation model storage means, and detecting correlation destruction of the correlations included in that correlation model; and failure analysis means for determining and outputting the monitored target that is a failure factor candidate by comparing, among a plurality of monitored targets that are directly or indirectly connected to a common device or a common monitored target and that have correlation models including common correlations, whether correlation destruction is detected for each of the common correlations.
A first operation management method according to one aspect of the present invention includes: storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values; applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target, and detecting correlation destruction of the correlations included in that correlation model; and determining and outputting the monitored target that is a failure factor candidate by comparing, among a plurality of monitored targets that are directly or indirectly connected to a common device or a common monitored target and that have correlation models including common correlations, whether correlation destruction is detected for each of the common correlations.
A computer-readable recording medium according to one aspect of the present invention stores a program that causes a computer to execute processing of: storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values; applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target, and detecting correlation destruction of the correlations included in that correlation model; and determining and outputting the monitored target that is a failure factor candidate by comparing, among a plurality of monitored targets that are directly or indirectly connected to a common device or a common monitored target and that have correlation models including common correlations, whether correlation destruction is detected for each of the common correlations.
A second operation management apparatus according to one aspect of the present invention includes: correlation model storage means for storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values; correlation destruction detection means for applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target stored in the correlation model storage means, and detecting correlation destruction of the correlations included in that correlation model; and failure analysis means for determining and outputting the monitored target that is a failure factor candidate by comparing, among a plurality of monitored targets that provide the same function and that either accept processing requests when one of them is selected by a common device or a common monitored target, or use the processing results of a common device or a common monitored target, whether correlation destruction is detected for each common correlation.
A second operation management method according to one aspect of the present invention includes: storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values; applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target, and detecting correlation destruction of the correlations included in that correlation model; and determining and outputting the monitored target that is a failure factor candidate by comparing, among a plurality of monitored targets that provide the same function and that either accept processing requests when one of them is selected by a common device or a common monitored target, or use the processing results of a common device or a common monitored target, whether correlation destruction is detected for each common correlation.

本発明の効果は、システム内の障害の波及によりいくつかの処理装置等で相関破壊が検出された場合でも障害原因の特定ができることである。
The effect of the present invention is that the cause of a failure can be identified even when correlation destruction is detected in several processing devices or the like because a failure has spread within the system.

【図1】本発明の第一の実施の形態の特徴的な構成を示すブロック図である。
【図2】本発明の第一の実施の形態における運用管理装置100を適用した運用管理システムの構成を示すブロック図である。
【図3】本発明の第一の実施の形態における、被監視装置200の接続関係の例を示す図である。
【図4】本発明の第一の実施の形態における性能系列情報121の例を示す図である。
【図5】本発明の第一の実施の形態における相関モデル122の例を示す図である。
【図6】本発明の第一の実施の形態における、相関破壊が検出された相関関係の例を示す図である。
【図7】本発明の第一の実施の形態における、障害の波及の例を示す図である。
【図8】本発明の第一の実施の形態におけるグループ情報123の例を示す図である。
【図9】本発明の第一の実施の形態における運用管理装置100の全体的な処理を示すフローチャートである。
【図10】本発明の第一の実施の形態における、相関破壊の検出有無の類似度の算出結果を示す図である。
【図11】本発明の第一の実施の形態における、障害解析結果130の例を示す図である。
FIG. 1 is a block diagram showing the characteristic configuration of the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an operation management system to which the operation management apparatus 100 of the first embodiment of the present invention is applied.
FIG. 3 is a diagram showing an example of the connection relationships among the monitored devices 200 in the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of the performance series information 121 in the first embodiment of the present invention.
FIG. 5 is a diagram showing an example of the correlation model 122 in the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of correlations for which correlation destruction has been detected in the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of the propagation of a failure in the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of the group information 123 in the first embodiment of the present invention.
FIG. 9 is a flowchart showing the overall processing of the operation management apparatus 100 in the first embodiment of the present invention.
FIG. 10 is a diagram showing the calculated similarity of correlation destruction detection in the first embodiment of the present invention.
FIG. 11 is a diagram showing an example of the failure analysis result 130 in the first embodiment of the present invention.

(第一の実施の形態)
次に、本発明の第一の実施の形態について説明する。
はじめに、本発明の第一の実施の形態の構成について説明する。図2は、本発明の第一の実施の形態における運用管理装置100を適用した運用管理システムの構成を示すブロック図である。
図2を参照すると、本発明の第一の実施の形態における運用管理システムは、運用管理装置(監視制御装置)100、及び、複数の被監視装置200を含む。
運用管理装置100は、被監視対象である被監視装置200から収集した性能情報をもとに、被監視対象(被監視装置200)毎に相関モデル122を生成し、生成した相関モデル122を用いて、被監視対象(被監視装置200)についての障害分析を行う。
被監視装置200は、例えば、Webサーバ、アプリケーションサーバ(APサーバ)、データベースサーバ(DBサーバ)等、ユーザに対してサービスを提供するシステムを構成する装置である。
図3は、本発明の第一の実施の形態における、被監視装置200の接続関係の例を示す図である。ここで、被監視装置200は、Webサーバ層、APサーバ層、及び、DBサーバ層からなる階層システムを構成している。装置識別子SV1〜4の被監視装置200はWebサーバ、装置識別子SV5〜8の被監視装置200はAPサーバ、装置識別子SV9、10の被監視装置200はDBサーバである。
Webサーバ層に含まれる被監視装置200の各々は、APサーバ層に含まれる被監視装置200の各々と接続される。また、APサーバ層に含まれる被監視装置200の各々は、DBサーバ層に含まれる被監視装置200の各々と接続される。ネットワークを介したユーザからシステムへのリクエストは、ロードバランサ300によりWebサーバ層に含まれる被監視装置200の各々に転送される。そして、Webサーバ層に含まれる被監視装置200の各々は、例えば、ランダムにAPサーバ層に含まれる被監視装置200の各々にリクエストを転送する。
また、被監視装置200の各々は、複数種目の性能値の実測データを一定間隔毎に計測し、計測された各実測データ(計測値)を運用管理装置100へ送信する。ここで、性能値の種目として、例えば、CPU(Central Processing Unit)使用率(CPU_U)、メモリ使用量(MEM_U)、ディスク使用量(Disk_U)、ディスク入出力レート(Disk_IO)、受信パケット数(Packet_R)、送信パケット数(Packet_S)等が計測される。
ここで、被監視装置200と性能値の種目の組を性能値の種別(性能種別、または、単に種別)とし、同一時刻に計測された複数種別の性能値の組を性能情報とする。
運用管理装置100は、性能情報収集部101、相関モデル生成部102、相関破壊検出部104、障害分析部105、表示部106、性能情報記憶部111、相関モデル記憶部112、グループ情報記憶部113、相関破壊記憶部114を含む。
ここで、情報収集部101は、被監視装置200から性能情報を収集し、その時系列変化を性能系列情報121として性能情報記憶部111に保存する。
図4は、本発明の第一の実施の形態における性能系列情報121の例を示す図である。図4の例では、性能系列情報121は、装置識別子SV1の被監視装置200のCPU使用率(SV1.CPU_U)、メモリ使用量(SV1.MEM_U)、ディスク使用量(SV1.Disk_U)、ディスク入出力レート(SV1.Disk_IO)、装置識別子SV2の被監視装置200のCPU使用率(SV2.CPU_U)等を性能種別として含む。
相関モデル生成部102は、被監視装置200の各々について、性能系列情報121をもとに相関モデル122を生成する。ここで、相関モデル生成部102は、所定期間の性能系列情報121に基づいて、被監視装置200の各々について、複数の性能種別の内の異なる2つの性能種別毎に、当該2つの性能種別間の相関関係を示す相関関数(変換関数)を決定し、決定した相関関数を含む相関モデル122を生成する。相関関数は、1つの性能種別の計測値の時系列から他の性能種別の性能値の時系列を予測する関数であり、特許文献1に示されるように、上述の任意の2つの性能種別の計測値の時系列に対するシステム同定処理によって決定される。相関モデル生成部102は、さらに、相関関数による変換誤差をもとに、相関関数ごとに、例えば、変換誤差の平均値の大きさに応じて小さくなる重みを算出し、重みの大きい相関関数のみを相関モデル122に含めてもよい。
相関モデル記憶部112は、相関モデル生成部102が生成した相関モデル122を記憶する。
図5は、本発明の第一の実施の形態における相関モデル122の例を示す図である。図5において、各ノードは性能種別、ノード間の実線の矢印は2つの性能種別の一方から他方への相関関係を示す。図5の例では、装置識別子SV1〜4の被監視装置200の各々の相関モデル122は、CPU_UからMEM_U、CPU_UからDisk_U、MEM_UからPacket_S、及び、MEM_UからPacket_Rへの相関関係を含む。装置識別子SV5〜8の被監視装置200の各々の相関モデル122は、CPU_UからMEM_U、CPU_UからDisk_IO、CPU_UからPacket_S、MEM_UからDisk_U、及びPacket_SからPacket_Rへの相関関係を含む。また、これらの相関関係のそれぞれについて、相関関数(図示せず)が決定されている。
相関破壊検出部104は、新たに入力された性能情報と相関モデル記憶部112に記憶された相関モデル122とを用いて、被監視装置200の各々について、相関モデル122に含まれる相関関係の相関破壊を検出する。相関破壊検出部104は、特許文献1と同様に、複数の性能種別の内の2つの性能種別の一方の計測値を、当該2つの性能種別に対応する相関関数に入力して得られた値と、他方の計測値との差分が所定値以上の場合、当該2つの性能種別間の相関破壊として検出する。
相関破壊記憶部114は、相関破壊検出部104により相関破壊が検出された相関関係を示す相関破壊情報124を記憶する。
図6は、本発明の第一の実施の形態における、相関破壊が検出された相関関係の例を示す図である。図6において、破線の矢印は、相関破壊が検出された相関関係を示す。
ここで、本発明の第一の実施の形態における、障害の波及に伴い発生する相関破壊について説明する。図7は、本発明の第一の実施の形態における、障害の波及の例を示す図である。
ここで、Web層に属する装置識別子SV1〜4の被監視装置200間、AP層に属する装置識別子SV5〜8の被監視装置200間、及び、DB層に属する装置識別子SV9、10の被監視装置200間では、各被監視装置200は互いに同様の処理を行うため、これらの各階層の被監視装置200に直接、または、間接的に接続された共通な他の被監視装置200の障害の影響は各階層の被監視装置200に共通に及ぶ。従って、各階層の被監視装置200の相関モデル122間では、共通な相関関係毎の相関破壊の検出有無(相関破壊の発生箇所)は、互いに類似すると考えられる。
また、ある階層のある被監視装置200で障害が発生した場合、その階層の被監視装置200の相関モデル122間では、共通な相関関係毎の相関破壊の検出有無は、障害が発生した被監視装置200と、その他の被監視装置200とで異なると考えられる。
図7の例では、図3に示した階層システムにおいて、AP層の装置識別子SV7の被監視装置200(APサーバ)に障害が発生している。この場合、装置識別子SV7の被監視装置200の相関モデル122では、その装置の障害に起因した相関破壊が発生する。
そして、装置識別子SV7の被監視装置200の障害の影響は、その被監視装置200に直接的に接続されたWebサーバ層に含まれる装置識別子SV1〜4の被監視装置200に波及する。例えば、装置識別子SV7の被監視装置200の障害により、装置識別子SV1〜4の被監視装置200から装置識別子SV7の被監視装置200へ送信したリクエストに対するリプライが遅延し、装置識別子SV1〜4の被監視装置200においてリクエストの輻輳が発生する。この場合、装置識別子SV1〜4の被監視装置200間では、共通な相関関係毎の相関破壊の検出有無は、互いに類似する。
さらに、装置識別子SV7の被監視装置200の障害の影響は、装置識別子SV1〜4の被監視装置200を通して、間接的に接続された装置識別子SV5、6、8の被監視装置200に波及する。例えば、装置識別子SV1〜4の被監視装置200において発生したリクエストの輻輳により、装置識別子SV5、6、8の被監視装置200と装置識別子SV1〜4の被監視装置200との間の通信に遅延が生じる。この場合、障害が発生した装置識別子SV7の被監視装置200と、その障害が波及した、装置識別子SV5、6、8の被監視装置200とでは、共通な相関関係毎の相関破壊の検出有無が異なる。
従って、各階層に含まれる被監視装置200間で、共通な相関関係毎の相関破壊の検出有無を比較し、その階層内の他の被監視装置200と相関破壊の検出有無が異なる被監視装置200を抽出することにより、障害要因の候補となる被監視装置200を特定することができる。
グループ情報記憶部113は、グループ情報123を記憶する。図8は、本発明の第一の実施の形態におけるグループ情報123の例を示す図である。グループ情報123は、グループを識別するグループ識別子と、そのグループに分類される被監視装置200の装置識別子とを含む。
グループ情報123の各グループは、共通な他の被監視装置200に直接的、または、間接的に接続され、共通な相関関係を持つ(相関モデル122が類似した)被監視装置200が、同じグループに含まれるように設定される。
本発明の第一の実施の形態では、図3の階層システムにおける各階層がグループとして設定される。図8の例では、図3の階層システムにおけるWebサーバ層、APサーバ層、及びDBサーバ層が、それぞれ、グループ識別子GP1、GP2、GP3のグループとして、管理者等により予め設定されている。
障害分析部105は、グループ情報123と相関破壊情報124とをもとに、各グループに含まれる被監視装置200の相互間で、共通な相関関係毎の相関破壊の検出有無を比較することにより、障害要因の候補(障害要因候補)となる被監視装置200を特定し、出力する。
なお、運用管理装置100は、CPU(Central Processing Unit)とプログラムを記憶した記憶媒体を含み、プログラムに基づく制御によって動作するコンピュータであってもよい。また、性能情報記憶部111、相関モデル記憶部112、グループ情報記憶部113、及び、相関破壊記憶部114は、それぞれ個別の記憶媒体でも、1つの記憶媒体によって構成されてもよい。
次に、本発明の第一の実施の形態における運用管理装置100の動作について説明する。
図9は、本発明の第一の実施の形態における運用管理装置100の全体的な処理を示すフローチャートである。
はじめに、運用管理装置100の相関モデル生成部102は、性能情報記憶部111の性能系列情報121をもとに、各被監視装置200について相関モデル122を生成する。相関モデル生成部102は、生成した相関モデル122を相関モデル記憶部112に保存する(ステップS101)。
例えば、相関モデル生成部102は、図4の性能系列情報121を用いて、装置識別子SV1〜8の被監視装置200について図5に示すような相関モデル122を生成する。
次に、相関破壊検出部104は、情報収集部101により新たに入力された性能情報と相関モデル記憶部112に記憶された相関モデル122とを用いて、被監視装置200の各々について、相関モデル122に含まれる相関関係の相関破壊を検出する。相関破壊記憶部114は、検出された相関関係を示す相関破壊情報124を生成し、相関破壊記憶部114に保存する(ステップS102)。
例えば、相関破壊検出部104は、図5に示した装置識別子SV1〜8の被監視装置200の相関モデル122について、図6に示すように相関破壊を検出する。
次に、障害分析部105は、相関破壊記憶部114に記憶された相関破壊情報124をもとに、グループ情報123で示される各グループの被監視装置200の相互間で、共通な相関関係毎の相関破壊の検出有無(相関破壊の検出箇所)を比較することにより、相関関係毎の相関破壊の検出有無がグループ内の他の被監視装置200とは異なる被監視装置200を障害要因候補と決定する(ステップS103)。
ここで、障害分析部105は、各グループの被監視装置200の相互間で、共通な相関関係毎の相関破壊の検出有無の類似度を算出し、類似度が所定値以下、あるいは、類似度が低い方から所定数の被監視装置200を障害要因候補と決定する。類似度の算出方法は、複数の相関モデル122間で、共通な相関関係毎の相関破壊の検出有無を比較できれば、どのような方法を用いてもよい。
例えば、類似度として、共通な相関関係毎の相関破壊の検出有無を要素としたベクトルと、そのベクトルのグループ内の被監視装置200についての平均ベクトルとの間のコサイン類似度を用いる場合、障害分析部105は、グループ内の各被監視装置iについての類似度Siを、次の数1により算出する。

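\[ S_i = \frac{B_i \cdot \bar{B}}{\lVert B_i \rVert \, \lVert \bar{B} \rVert}, \qquad \bar{B} = \frac{1}{N} \sum_{j=1}^{N} B_j \]
(B_i: vector whose elements indicate, for each common correlation, whether correlation destruction was detected for monitored device i; \bar{B}: mean of these vectors over the N monitored devices in the group)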
図10は、本発明の第一の実施の形態における、相関破壊の検出有無の類似度の算出結果を示す図である。
例えば、障害分析部105は、図6の相関破壊に対して、図10のように類似度を算出する。ここで、類似度が低い方から、4つの被監視装置200を障害要因候補と決定する場合、障害分析部105は、装置識別子SV5〜8の被監視装置200を障害要因候補と決定する。
なお、障害分析部105は、類似度の代わりに、相関破壊の検出有無を要素としたベクトルを所定の方法により算出した基準ベクトルと比較することにより一致度を算出し、一致度が所定値以下、あるいは、一致度が低い所定数の被監視装置200を障害要因候補としてもよい。この場合、障害分析部105は、例えば、数1における相関破壊の検出有無を要素としたベクトルBiの全てのiについての論理和を計算することにより、基準ベクトルを生成し、各ベクトルBiと基準ベクトルとの間の要素の一致数をもとに、一致度を算出する。
次に、障害分析部105は、障害要因候補の被監視装置200に関する情報を含む障害解析結果130を表示部106へ出力する(ステップS104)。
図11は、本発明の第一の実施の形態における、障害解析結果130の例を示す図である。例えば、障害分析部105は、図11のような障害解析結果130を表示部106へ出力する。図11において、障害解析結果130は、障害要因候補リスト131、相関破壊検出結果132、及び、異常スコアリスト133を含む。
障害要因候補リスト131は、障害要因候補の被監視装置200の装置識別子とその類似度を示す。図11の例では、障害要因候補である、装置識別子SV5〜8の被監視装置200の装置識別子が、類似度の低い順で示されている。
相関破壊検出結果132は、障害要因候補の被監視装置200における、相関破壊が検出された相関関係を示す。図11の例では、障害要因候補の内、管理者等により選択された、類似度が最も小さい装置識別子SV7の被監視装置200について、相関破壊が検出された相関関係が性能種別とともに相関モデル122上で示されている。
異常スコアリスト133は、障害要因候補の被監視装置200における、相関破壊が検出された相関関係に関係する性能種別とその異常スコアを示す。ここで、異常スコアは、その性能種別について、相関破壊が集中している度合いを示し、例えば、特許文献1と同様の方法により算出される。図11の例では、装置識別子SV7の被監視装置200について、相関破壊が検出された相関関係に関係する性能種別が、異常スコアが高い順に示されている。
管理者は、表示部106に表示された障害解析結果130を参照することにより、障害要因候補の被監視装置200、及び、その被監視装置200において相関破壊が集中している性能種別を障害要因の調査対象として把握できる。
例えば、管理者は、図11の障害解析結果130を参照し、装置識別子SV7の被監視装置200を調査対象として把握し、その装置で異常スコアの高いCPU使用率を優先的に調査できる。
以上により、本発明の第一の実施の形態の動作が完了する。
次に、本発明の第一の実施の形態の特徴的な構成を説明する。図1は、本発明の第一の実施の形態の特徴的な構成を示すブロック図である。
図1を参照すると、運用管理装置100は、相関モデル記憶部112、相関破壊検出部104、及び、障害分析部105を含む。
ここで、相関モデル記憶部112は、複数の被監視対象の各々について、複数種別の性能値の内の異なる2つの種別の性能値間の相関関係を示す相関関数を1以上含む相関モデル122を記憶する。相関破壊検出部104は、複数の被監視対象の各々について、入力された被監視対象の性能値を相関モデル記憶部112に記憶された当該被監視対象の相関モデル122に適用し、当該相関モデル122に含まれる相関関係の相関破壊を検出する。障害分析部105は、共通な装置または共通な被監視対象に直接または間接的に接続され、かつ、共通な相関関係を含む相関モデル122を有する複数の被監視対象の相互間で、共通な相関関係毎の相関破壊の検出有無を比較することにより、障害要因の候補となる被監視対象を決定し、出力する。
本発明の第一の実施の形態によれば、相関モデル122上の相関破壊をもとにシステムの障害を検出する運用管理装置100において、システム内の障害の波及によりいくつかの処理装置等で相関破壊が検出された場合でも障害原因の候補を特定できる。その理由は、障害分析部105が、共通な装置または共通な他の被監視装置200に直接または間接的に接続され、かつ、共通な相関関係を含む相関モデル122を有する複数の被監視装置200の相互間で、共通な相関関係毎の相関破壊の検出有無を比較することにより、障害要因の候補となる被監視装置200を決定するためである。
また、本発明の第一の実施の形態によれば、障害要因の候補となる被監視装置200において優先的に調査すべき性能種別を、管理者等が容易に把握できる。その理由は、障害分析部105が、障害要因の候補である被監視装置200の相関モデル122に含まれる相関破壊が検出された相関関係を、当該相関関係に関係する性能値の種別と関連づけて出力するためである。
以上、実施形態を参照して本願発明を説明したが、本願発明は上記実施形態に限定されない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。
例えば、本発明の第一の実施の形態では、1つの被監視装置200を1つの被監視対象とし、被監視対象毎に相関モデル122を生成し、障害要因候補の被監視対象を決定したが、これに限らず、複数の被監視装置200を1つの被監視対象としてもよい。また、被監視装置200上で動作する仮想マシン等、被監視装置200上である機能を提供する論理的な構成単位を1つの被監視対象としてもよい。
また、本発明の第一の実施の形態では、管理者等が、階層システムにおける各階層をグループとしてグループ情報123に設定したが、運用管理装置100のグループ情報生成部(図示せず)が、共通な装置または共通な他の被監視装置200に直接または間接的に接続された複数の被監視装置200間で、相関モデル122の比較を行う(例えば、クラスタリングアルゴリズムに従ってクラスタリングを行う)ことにより、共通な相関関係を持つ(相関モデル122が類似した)被監視装置200が各グループに含まれるように、グループ情報123を設定してもよい。
この出願は、2010年12月20日に出願された日本出願特願2010−282727を基礎とする優先権を主張し、その開示の全てをここに取り込む。
(First embodiment)
Next, a first embodiment of the present invention will be described.
First, the configuration of the first embodiment of the present invention will be described. FIG. 2 is a block diagram showing a configuration of an operation management system to which the operation management apparatus 100 according to the first embodiment of the present invention is applied.
Referring to FIG. 2, the operation management system according to the first embodiment of the present invention includes an operation management apparatus (monitoring control apparatus) 100 and a plurality of monitored devices 200.
The operation management apparatus 100 generates a correlation model 122 for each monitored target (monitored device 200) based on performance information collected from the monitored devices 200, and uses the generated correlation models 122 to perform failure analysis on the monitored targets (monitored devices 200).
The monitored devices 200 are devices that make up a system providing services to users, such as Web servers, application servers (AP servers), and database servers (DB servers).
FIG. 3 is a diagram showing an example of the connection relationships among the monitored devices 200 in the first embodiment of the present invention. Here, the monitored devices 200 constitute a hierarchical system consisting of a Web server layer, an AP server layer, and a DB server layer. The monitored devices 200 with device identifiers SV1 to SV4 are Web servers, those with device identifiers SV5 to SV8 are AP servers, and those with device identifiers SV9 and SV10 are DB servers.
Each monitored device 200 in the Web server layer is connected to each monitored device 200 in the AP server layer. Likewise, each monitored device 200 in the AP server layer is connected to each monitored device 200 in the DB server layer. Requests sent from users to the system over the network are forwarded by the load balancer 300 to the monitored devices 200 in the Web server layer. Each monitored device 200 in the Web server layer then forwards requests, for example at random, to the monitored devices 200 in the AP server layer.
In addition, each monitored device 200 measures actual data for a plurality of performance value items at regular intervals and transmits each measured value to the operation management apparatus 100. The measured performance value items include, for example, CPU (Central Processing Unit) usage rate (CPU_U), memory usage (MEM_U), disk usage (Disk_U), disk input/output rate (Disk_IO), number of received packets (Packet_R), and number of transmitted packets (Packet_S).
Here, a pair consisting of a monitored device 200 and a performance value item is called a performance value type (performance type, or simply type), and a set of performance values of a plurality of types measured at the same time is called performance information.
The operation management apparatus 100 includes a performance information collection unit 101, a correlation model generation unit 102, a correlation destruction detection unit 104, a failure analysis unit 105, a display unit 106, a performance information storage unit 111, a correlation model storage unit 112, a group information storage unit 113, and a correlation destruction storage unit 114.
The performance information collection unit 101 collects performance information from the monitored devices 200 and stores its change over time in the performance information storage unit 111 as performance series information 121.
FIG. 4 is a diagram showing an example of the performance series information 121 in the first embodiment of the present invention. In the example of FIG. 4, the performance series information 121 includes, as performance types, the CPU usage rate (SV1.CPU_U), memory usage (SV1.MEM_U), disk usage (SV1.Disk_U), and disk input/output rate (SV1.Disk_IO) of the monitored device 200 with device identifier SV1, the CPU usage rate (SV2.CPU_U) of the monitored device 200 with device identifier SV2, and so on.
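As an illustration only, the performance types and performance series information described above could be held in a structure like the following. The class name `PerformanceSeries`, the helper `performance_type`, and the sample readings are hypothetical and do not come from the patent; the sketch also assumes that every snapshot contains a value for every performance type.

```python
from collections import defaultdict


def performance_type(device_id, item):
    """Name a performance type as '<device>.<item>', e.g. 'SV1.CPU_U'."""
    return f"{device_id}.{item}"


class PerformanceSeries:
    """Hypothetical container for performance series information.

    Each call to add() stores one piece of performance information:
    the values of several performance types measured at the same time.
    """

    def __init__(self):
        self.times = []
        self.values = defaultdict(list)  # performance type -> time series

    def add(self, time, readings):
        """readings maps (device_id, item) -> value measured at `time`."""
        self.times.append(time)
        for (device_id, item), value in readings.items():
            self.values[performance_type(device_id, item)].append(value)


# Made-up sample values, two measurement times.
series = PerformanceSeries()
series.add(0, {("SV1", "CPU_U"): 12.0, ("SV1", "MEM_U"): 80.0, ("SV2", "CPU_U"): 15.0})
series.add(1, {("SV1", "CPU_U"): 14.0, ("SV1", "MEM_U"): 84.0, ("SV2", "CPU_U"): 16.0})
```

Keeping one list per performance type makes it straightforward to hand any two series to a model-fitting step.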
The correlation model generation unit 102 generates a correlation model 122 for each monitored device 200 based on the performance series information 121. Specifically, based on the performance series information 121 for a predetermined period, the correlation model generation unit 102 determines, for each monitored device 200 and for each pair of two different performance types among the plurality of performance types, a correlation function (conversion function) indicating the correlation between the two performance types, and generates a correlation model 122 containing the determined correlation functions. A correlation function predicts the time series of performance values of one performance type from the time series of measured values of another performance type and, as shown in Patent Document 1, is determined by system identification processing applied to the time series of measured values of the two performance types. The correlation model generation unit 102 may further calculate a weight for each correlation function from its conversion error, for example a weight that becomes smaller as the average conversion error becomes larger, and include only correlation functions with large weights in the correlation model 122.
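A minimal sketch of this step is shown below. It assumes the simplest possible correlation function, a straight line y ≈ a·x + b fitted by least squares; the patent itself only refers to the system identification processing of Patent Document 1, so the linear form, the function name `fit_correlation`, and the weight formula 1 / (1 + mean error) are all assumptions made for illustration.

```python
def fit_correlation(x, y):
    """Fit y ≈ a*x + b by ordinary least squares.

    Returns (a, b, weight). The weight is an assumed formula that
    shrinks as the mean absolute conversion error grows.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant; no line can be fitted")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    mean_error = sum(abs(yi - (a * xi + b)) for xi, yi in zip(x, y)) / n
    weight = 1.0 / (1.0 + mean_error)
    return a, b, weight


# Made-up series: the first pair is perfectly linear, the second is not.
a, b, w_clean = fit_correlation([1, 2, 3, 4], [3, 5, 7, 9])
_, _, w_noisy = fit_correlation([1, 2, 3, 4], [3, 5, 7, 20])
```

A model builder could then keep only the pairs whose weight exceeds a chosen cutoff, which corresponds to including only correlation functions with large weights.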
The correlation model storage unit 112 stores the correlation model 122 generated by the correlation model generation unit 102.
FIG. 5 is a diagram showing an example of the correlation model 122 in the first embodiment of the present invention. In FIG. 5, each node indicates a performance type, and a solid line arrow between nodes indicates a correlation from one of the two performance types to the other. In the example of FIG. 5, each correlation model 122 of the monitored devices 200 with the device identifiers SV1 to SV4 includes correlations from CPU_U to MEM_U, CPU_U to Disk_U, MEM_U to Packet_S, and MEM_U to Packet_R. Each correlation model 122 of the monitored devices 200 with the device identifiers SV5 to 8 includes correlations from CPU_U to MEM_U, CPU_U to Disk_IO, CPU_U to Packet_S, MEM_U to Disk_U, and Packet_S to Packet_R. A correlation function (not shown) is determined for each of these correlations.
The correlation destruction detection unit 104 uses newly input performance information and the correlation model 122 stored in the correlation model storage unit 112 to detect, for each monitored device 200, correlation destruction of the correlations included in the correlation model 122. As in Patent Document 1, the correlation destruction detection unit 104 inputs the measured value of one of two performance types into the correlation function for those two performance types, and detects correlation destruction between the two performance types when the difference between the resulting value and the measured value of the other performance type is greater than or equal to a predetermined value.
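The check described above can be sketched as follows, again assuming linear correlation functions stored as (a, b) pairs. The function name `detect_destruction`, the model layout, the threshold value, and the sample measurements are hypothetical.

```python
def detect_destruction(model, measurement, threshold):
    """Return the set of correlations whose prediction error is too large.

    model:       {(source_item, target_item): (a, b)} with target ≈ a*source + b
    measurement: {item: newly measured value} for one monitored device
    threshold:   the 'predetermined value' for the allowed difference
    """
    broken = set()
    for (source, target), (a, b) in model.items():
        predicted = a * measurement[source] + b
        if abs(predicted - measurement[target]) >= threshold:
            broken.add((source, target))
    return broken


# Made-up model and measurement for a single device.
model = {("CPU_U", "MEM_U"): (2.0, 1.0), ("CPU_U", "Disk_U"): (1.0, 0.0)}
measurement = {"CPU_U": 10.0, "MEM_U": 21.0, "Disk_U": 30.0}
broken = detect_destruction(model, measurement, threshold=5.0)
```

Here CPU_U predicts MEM_U exactly, while the Disk_U prediction is off by 20, so only the CPU_U to Disk_U correlation is reported as destroyed.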
The correlation destruction storage unit 114 stores correlation destruction information 124 indicating the correlation in which the correlation destruction is detected by the correlation destruction detection unit 104.
FIG. 6 is a diagram showing an example of correlations for which correlation destruction has been detected in the first embodiment of the present invention. In FIG. 6, broken-line arrows indicate correlations for which correlation destruction has been detected.
Here, the correlation destruction that occurs with the spread of the failure in the first embodiment of the present invention will be described. FIG. 7 is a diagram showing an example of failure propagation in the first embodiment of the present invention.
Here, the monitored devices 200 with the device identifiers SV1 to SV4 belonging to the Web layer perform the same processing as one another, as do the monitored devices 200 with the device identifiers SV5 to SV8 belonging to the AP layer and the monitored devices 200 with the device identifiers SV9 and SV10 belonging to the DB layer. Consequently, the influence of a failure of another common monitored device 200 connected directly or indirectly to the monitored devices 200 of a layer is common to the monitored devices 200 in that layer. Therefore, among the correlation models 122 of the monitored devices 200 of each layer, the presence or absence of detection of correlation destruction for each common correlation (that is, the locations where correlation destruction occurs) is considered to be similar.
In contrast, when a failure occurs in a monitored device 200 in a certain layer, the presence or absence of detection of correlation destruction for each common correlation, among the correlation models 122 of the monitored devices 200 in that layer, is considered to differ between the monitored device 200 in which the failure has occurred and the other monitored devices 200.
In the example of FIG. 7, in the hierarchical system shown in FIG. 3, a failure has occurred in the monitored device 200 (AP server) with the device identifier SV7 of the AP layer. In this case, in the correlation model 122 of the monitored device 200 with the device identifier SV7, correlation destruction due to the failure of the device occurs.
Then, the influence of the failure of the monitored device 200 with the device identifier SV7 spreads to the monitored devices 200 with the device identifiers SV1 to SV4 in the Web server layer, which are directly connected to that monitored device 200. For example, due to the failure of the monitored device 200 with the device identifier SV7, replies to requests transmitted from the monitored devices 200 with the device identifiers SV1 to SV4 to the monitored device 200 with the device identifier SV7 are delayed, and request congestion occurs in the monitored devices 200 with the device identifiers SV1 to SV4. In this case, among the monitored devices 200 with the device identifiers SV1 to SV4, the presence or absence of detection of correlation destruction for each common correlation is similar.
Further, the influence of the failure of the monitored device 200 with the device identifier SV7 spreads to the monitored devices 200 with the device identifiers SV5, SV6, and SV8, which are indirectly connected via the monitored devices 200 with the device identifiers SV1 to SV4. For example, due to the congestion of requests in the monitored devices 200 with the device identifiers SV1 to SV4, communication between the monitored devices 200 with the device identifiers SV5, SV6, and SV8 and the monitored devices 200 with the device identifiers SV1 to SV4 is delayed. In this case, the presence or absence of detection of correlation destruction for each common correlation differs between the monitored device 200 with the device identifier SV7, in which the failure has occurred, and the monitored devices 200 with the device identifiers SV5, SV6, and SV8, to which the failure has spread.
Accordingly, by comparing the presence or absence of detection of correlation destruction for each correlation among the monitored devices 200 included in each layer, and extracting a monitored device 200 whose presence or absence of detection differs from that of the other monitored devices 200 in the layer, it is possible to identify the monitored device 200 that is a candidate for the failure factor.
The group information storage unit 113 stores group information 123. FIG. 8 is a diagram illustrating an example of the group information 123 according to the first embodiment of this invention. The group information 123 includes a group identifier for identifying the group and a device identifier of the monitored device 200 classified into the group.
The groups of the group information 123 are set so that monitored devices 200 that are directly or indirectly connected to another common monitored device 200 and that have common correlations (that is, similar correlation models 122) are included in the same group.
In the first embodiment of the present invention, each hierarchy in the hierarchical system of FIG. 3 is set as a group. In the example of FIG. 8, the Web server layer, the AP server layer, and the DB server layer in the hierarchical system of FIG. 3 are preset by the administrator or the like as groups of group identifiers GP1, GP2, and GP3, respectively.
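As an illustration of the group information 123, the following sketch holds the groups as a mapping from group identifier to device identifiers. The layer membership follows the description of the Web, AP, and DB layers given earlier; the variable name `group_information` and the helper `group_of` are hypothetical and do not appear in the original.

```python
from typing import Dict, List, Optional

# Group identifier -> device identifiers of the monitored devices in that group.
group_information: Dict[str, List[str]] = {
    "GP1": ["SV1", "SV2", "SV3", "SV4"],  # Web server layer
    "GP2": ["SV5", "SV6", "SV7", "SV8"],  # AP server layer
    "GP3": ["SV9", "SV10"],               # DB server layer
}


def group_of(device_identifier: str, groups: Dict[str, List[str]]) -> Optional[str]:
    """Return the group identifier containing the device, or None if absent."""
    for group_identifier, devices in groups.items():
        if device_identifier in devices:
            return group_identifier
    return None


print(group_of("SV7", group_information))
```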
Based on the group information 123 and the correlation destruction information 124, the failure analysis unit 105 compares the presence or absence of detection of correlation destruction for each common correlation among the monitored devices 200 included in each group, and thereby identifies and outputs the monitored device 200 that is a candidate for the failure factor (failure factor candidate).
Note that the operation management apparatus 100 may be a computer that includes a CPU (Central Processing Unit) and a storage medium that stores a program, and operates by control based on the program. Further, the performance information storage unit 111, the correlation model storage unit 112, the group information storage unit 113, and the correlation destruction storage unit 114 may be configured as individual storage media or as a single storage medium.
Next, the operation of the operation management apparatus 100 in the first embodiment of the present invention will be described.
FIG. 9 is a flowchart showing overall processing of the operation management apparatus 100 in the first embodiment of the present invention.
First, the correlation model generation unit 102 of the operation management apparatus 100 generates a correlation model 122 for each monitored apparatus 200 based on the performance series information 121 of the performance information storage unit 111. The correlation model generation unit 102 stores the generated correlation model 122 in the correlation model storage unit 112 (step S101).
For example, the correlation model generation unit 102 generates a correlation model 122 as shown in FIG. 5 for the monitored devices 200 with the device identifiers SV1 to 8 using the performance series information 121 of FIG.
Next, the correlation destruction detection unit 104 uses the performance information newly input by the information collection unit 101 and the correlation model 122 stored in the correlation model storage unit 112 to detect, for each monitored device 200, correlation destruction of the correlations included in the correlation model 122. The correlation destruction detection unit 104 generates correlation destruction information 124 indicating the correlations in which correlation destruction was detected and stores it in the correlation destruction storage unit 114 (step S102).
For example, the correlation destruction detection unit 104 detects the correlation destruction as shown in FIG. 6 for the correlation model 122 of the monitored device 200 with the device identifiers SV1 to SV8 shown in FIG.
Next, based on the correlation destruction information 124 stored in the correlation destruction storage unit 114, the failure analysis unit 105 compares the presence or absence of detection of correlation destruction for each common correlation (the detection locations of correlation destruction) among the monitored devices 200 of each group indicated by the group information 123. It thereby determines, as a failure factor candidate, a monitored device 200 whose presence or absence of detection of correlation destruction for each correlation differs from that of the other monitored devices 200 in the group (step S103).
Here, the failure analysis unit 105 calculates, among the monitored devices 200 of each group, the similarity of the presence or absence of detection of correlation destruction for each common correlation, and determines as failure factor candidates either the monitored devices 200 whose similarity is equal to or less than a predetermined value, or a predetermined number of monitored devices 200 taken in ascending order of similarity. Any method may be used to calculate the similarity as long as it allows the presence or absence of detection of correlation destruction for each common correlation to be compared among a plurality of correlation models 122.
For example, when the similarity is the cosine similarity between a vector whose elements are the presence or absence of detection of correlation destruction for each common correlation and the average of those vectors over the monitored devices 200 in the group, the failure analysis unit 105 calculates the similarity Si for each monitored device i in the group by the following equation (1).
Figure 0005267749
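Equation (1) is available only as an image in this text. Based on the verbal definition in the preceding paragraph (the cosine similarity between a device's detection vector and the group average vector), one consistent reading is the following. The symbols $S_i$ and $B_i$ follow the description; $\bar{B}$, $N$ (number of monitored devices in the group), and $K$ (number of common correlations) are assumptions introduced here for illustration.

```latex
S_i = \frac{B_i \cdot \bar{B}}{\lVert B_i \rVert \, \lVert \bar{B} \rVert},
\qquad
\bar{B} = \frac{1}{N} \sum_{j=1}^{N} B_j,
\qquad
B_i = (b_{i1}, \dots, b_{iK}), \quad b_{ik} \in \{0, 1\}
```

Here $b_{ik} = 1$ would mean that correlation destruction was detected for the $k$-th common correlation of monitored device $i$, and $b_{ik} = 0$ that it was not.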
FIG. 10 is a diagram illustrating a calculation result of the similarity of whether or not correlation destruction is detected in the first embodiment of the present invention.
For example, the failure analysis unit 105 calculates the similarity as shown in FIG. 10 for the correlation destruction shown in FIG. Here, when four monitored devices 200 are selected as failure factor candidates in ascending order of similarity, the failure analysis unit 105 determines the monitored devices 200 with the device identifiers SV5 to SV8 as failure factor candidates.
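The similarity calculation and candidate selection can be sketched as follows. This is an illustration under stated assumptions, not the values of FIG. 10: the detection vectors are made up, the function names are hypothetical, and treating an all-zero vector as similarity 0 is a choice of this sketch that the description does not specify.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as similarity 0
    return dot / (norm_a * norm_b)


def similarity_to_group_average(vectors: Dict[str, List[int]]) -> Dict[str, float]:
    """Similarity of each device's detection vector to the group average vector."""
    devices = list(vectors)
    length = len(vectors[devices[0]])
    average = [
        sum(vectors[d][k] for d in devices) / len(devices) for k in range(length)
    ]
    return {d: cosine_similarity(vectors[d], average) for d in devices}


def lowest_similarity_devices(similarities: Dict[str, float], count: int) -> List[str]:
    """Devices in ascending order of similarity, limited to `count` entries."""
    return sorted(similarities, key=similarities.get)[:count]


# Hypothetical detection vectors for one group (1 = correlation destruction detected).
detection_vectors = {
    "SV5": [1, 0, 0],
    "SV6": [1, 0, 0],
    "SV7": [0, 1, 1],
    "SV8": [1, 0, 0],
}
similarities = similarity_to_group_average(detection_vectors)
candidates = lowest_similarity_devices(similarities, 1)
print(candidates)
```

With these made-up vectors, the device whose detection pattern differs from the rest of the group receives the lowest similarity and is selected first.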
Alternatively, instead of the similarity, the failure analysis unit 105 may calculate a degree of coincidence by comparing the vector whose elements are the presence or absence of detection of correlation destruction with a reference vector calculated by a predetermined method, and may determine as failure factor candidates either the monitored devices 200 whose degree of coincidence is equal to or less than a predetermined value, or a predetermined number of monitored devices 200 taken in ascending order of the degree of coincidence. In this case, for example, the failure analysis unit 105 generates the reference vector by taking the logical sum (OR), over all i, of the vectors Bi of Equation 1, whose elements are the presence or absence of detection of correlation destruction, and calculates the degree of coincidence based on the number of elements that match between each vector Bi and the reference vector.
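The reference-vector variant can be sketched in the same style. The description only says that the degree of coincidence is "based on" the number of matching elements, so using the raw count directly is an assumption of this sketch; the function names and vectors are hypothetical.

```python
from typing import Dict, List


def reference_vector(vectors: List[List[int]]) -> List[int]:
    """Element-wise logical sum (OR) of all detection vectors."""
    return [int(any(column)) for column in zip(*vectors)]


def coincidence_count(vector: List[int], reference: List[int]) -> int:
    """Number of positions where the vector matches the reference vector."""
    return sum(1 for v, r in zip(vector, reference) if v == r)


# Hypothetical detection vectors (1 = correlation destruction detected).
group_vectors: Dict[str, List[int]] = {
    "SV5": [1, 1, 0, 0],
    "SV6": [1, 1, 0, 0],
    "SV7": [0, 0, 1, 0],
}
reference = reference_vector(list(group_vectors.values()))
counts = {
    device: coincidence_count(vector, reference)
    for device, vector in group_vectors.items()
}
print(reference, counts)
```

A device with a low count would then be treated as a failure factor candidate, either by thresholding or by taking a fixed number of devices from the lowest count upward.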
Next, the failure analysis unit 105 outputs a failure analysis result 130 including information related to the monitored device 200 as a failure factor candidate to the display unit 106 (step S104).
FIG. 11 is a diagram showing an example of the failure analysis result 130 in the first embodiment of the present invention. For example, the failure analysis unit 105 outputs a failure analysis result 130 as illustrated in FIG. 11 to the display unit 106. In FIG. 11, the failure analysis result 130 includes a failure factor candidate list 131, a correlation destruction detection result 132, and an abnormality score list 133.
The failure factor candidate list 131 indicates the device identifier of the monitored device 200 of the failure factor candidate and its similarity. In the example of FIG. 11, the device identifiers of the monitored devices 200 with the device identifiers SV5 to 8 that are failure factor candidates are shown in order of decreasing similarity.
The correlation destruction detection result 132 indicates the correlations in which correlation destruction was detected in the monitored device 200 that is a failure factor candidate. In the example of FIG. 11, for the monitored device 200 with the device identifier SV7, which has the smallest similarity among the failure factor candidates and has been selected by the administrator or the like, the correlations in which correlation destruction was detected are shown on the correlation model 122 together with the performance types.
The abnormality score list 133 indicates, for the monitored device 200 that is a failure factor candidate, the performance types related to the correlations in which correlation destruction was detected and their abnormality scores. Here, the abnormality score indicates the degree to which correlation destruction is concentrated on a performance type, and is calculated, for example, by the same method as in Patent Document 1. In the example of FIG. 11, for the monitored device 200 with the device identifier SV7, the performance types related to the correlations in which correlation destruction was detected are shown in descending order of abnormality score.
By referring to the failure analysis result 130 displayed on the display unit 106, the administrator can identify, as targets of investigation, the monitored device 200 that is a failure factor candidate and the performance types on which correlation destruction is concentrated in that monitored device 200.
For example, the administrator can refer to the failure analysis result 130 of FIG. 11 to grasp the monitored device 200 with the device identifier SV7 as the investigation target, and preferentially investigate the CPU usage rate with a high abnormality score in the device.
Thus, the operation of the first embodiment of the present invention is completed.
Next, a characteristic configuration of the first embodiment of the present invention will be described. FIG. 1 is a block diagram showing a characteristic configuration of the first embodiment of the present invention.
Referring to FIG. 1, the operation management apparatus 100 includes a correlation model storage unit 112, a correlation destruction detection unit 104, and a failure analysis unit 105.
Here, the correlation model storage unit 112 stores, for each of a plurality of monitored objects, a correlation model 122 including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values. The correlation destruction detection unit 104 applies, for each of the plurality of monitored objects, the input performance values of the monitored object to the correlation model 122 of that monitored object stored in the correlation model storage unit 112, and detects correlation destruction of the correlations included in the correlation model 122. The failure analysis unit 105 compares the presence or absence of detection of correlation destruction for each common correlation among a plurality of monitored objects that are connected directly or indirectly to a common device or a common monitored object and that have correlation models 122 including common correlations, and thereby determines and outputs a monitored object that is a candidate for the failure factor.
According to the first embodiment of the present invention, in the operation management apparatus 100 that detects a system failure based on correlation destruction on the correlation model 122, a candidate for the failure factor can be identified even when correlation destruction is detected in a plurality of processing devices or the like because a failure has spread through the system. The reason is that the failure analysis unit 105 determines the monitored device 200 that is a candidate for the failure factor by comparing the presence or absence of detection of correlation destruction for each common correlation among a plurality of monitored devices 200 that are directly or indirectly connected to a common device or another common monitored device 200 and that have correlation models 122 including common correlations.
Further, according to the first embodiment of the present invention, the administrator or the like can easily identify the performance types to be investigated preferentially in the monitored device 200 that is a candidate for the failure factor. The reason is that the failure analysis unit 105 outputs the correlations in which correlation destruction was detected, included in the correlation model 122 of the monitored device 200 that is a candidate for the failure factor, in association with the types of performance values related to those correlations.
While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
For example, in the first embodiment of the present invention, one monitored device 200 is treated as one monitored object, the correlation model 122 is generated for each monitored object, and the monitored object that is a failure factor candidate is determined. However, the invention is not limited to this, and a plurality of monitored devices 200 may be treated as one monitored object. Alternatively, a logical structural unit that provides a function on the monitored device 200, such as a virtual machine operating on the monitored device 200, may be treated as one monitored object.
Further, in the first embodiment of the present invention, the administrator or the like sets each layer of the hierarchical system as a group in the group information 123. Alternatively, a group information generation unit (not shown) of the operation management apparatus 100 may compare the correlation models 122 among a plurality of monitored devices 200 connected directly or indirectly to a common device or another common monitored device 200 (for example, by clustering them with a clustering algorithm) and set the group information 123 so that monitored devices 200 having common correlations (similar correlation models 122) are included in the same group.
This application claims priority based on Japanese Patent Application No. 2010-282727, filed on December 20, 2010, the entire disclosure of which is incorporated herein.
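The automatic grouping mentioned above could be sketched as follows. The description does not name a clustering algorithm, so this example swaps in a simple greedy grouping by Jaccard similarity of correlation sets as a stand-in. Representing a correlation model as a set of performance-type pairs, the performance type names, the threshold, and all function names are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

Correlation = Tuple[str, str]  # pair of performance types (hypothetical representation)


def jaccard(a: Set[Correlation], b: Set[Correlation]) -> float:
    """Share of correlations that two models have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_by_model_similarity(
    models: Dict[str, Set[Correlation]], threshold: float
) -> List[List[str]]:
    """Greedily place each device in the first group whose representative is similar."""
    groups: List[List[str]] = []
    for device, correlations in models.items():
        for group in groups:
            representative = models[group[0]]
            if jaccard(correlations, representative) >= threshold:
                group.append(device)
                break
        else:
            # No sufficiently similar group exists yet, so start a new one.
            groups.append([device])
    return groups


# Hypothetical correlation models for four monitored devices.
models = {
    "SV1": {("cpu", "mem"), ("cpu", "net")},
    "SV2": {("cpu", "mem"), ("cpu", "net")},
    "SV5": {("cpu", "disk"), ("mem", "disk")},
    "SV6": {("cpu", "disk"), ("mem", "disk")},
}
groups = group_by_model_similarity(models, threshold=0.5)
print(groups)
```

A production grouping would also need to restrict the comparison to devices connected to a common device or common monitored device, which this sketch leaves out.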

DESCRIPTION OF SYMBOLS
100 Operation management apparatus
101 Performance information collection unit
102 Correlation model generation unit
104 Correlation destruction detection unit
105 Failure analysis unit
106 Display unit
111 Performance information storage unit
112 Correlation model storage unit
113 Group information storage unit
114 Correlation destruction storage unit
121 Performance series information
122 Correlation model
123 Group information
124 Correlation destruction information
130 Failure analysis result
131 Failure factor candidate list
132 Correlation destruction detection result
133 Abnormality score list
200 Monitored device
300 Load balancer

Claims (23)

An operation management apparatus comprising:
correlation model storage means for storing, for each of a plurality of monitored objects, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values;
correlation destruction detection means for applying, for each of the plurality of monitored objects, input performance values of the monitored object to the correlation model of that monitored object stored in the correlation model storage means, and detecting correlation destruction of the correlations included in the correlation model; and
failure analysis means for determining and outputting a monitored object that is a candidate for a failure factor by comparing, among the plurality of monitored objects that are directly or indirectly connected to a common device or a common monitored object and that have correlation models including common correlations, the presence or absence of detection of the correlation destruction for each of the common correlations.
The operation management apparatus according to claim 1, wherein the failure analysis means determines, as the candidate for the failure factor, a monitored object whose presence or absence of detection of the correlation destruction for each of the common correlations has a low similarity to that of the other monitored objects.
The operation management apparatus according to claim 1, wherein
the plurality of monitored objects are divided into a plurality of layers, each of the monitored objects belonging to one of two adjacent layers is connected to each of the monitored objects belonging to the other layer, and the correlation models of the plurality of monitored objects in each of the plurality of layers include the common correlations, and
the failure analysis means compares, among the plurality of monitored objects in each of the plurality of layers, the presence or absence of detection of the correlation destruction for each of the common correlations.
The operation management apparatus according to claim 1, wherein the failure analysis means further outputs the correlations in which the correlation destruction has been detected, included in the correlation model of the monitored object that is the candidate for the failure factor, in association with the types of performance values related to those correlations.
The operation management apparatus according to claim 1, further comprising group information generation means for extracting a group including the plurality of monitored objects having the correlation models including the common correlations, by comparing the correlation models among the plurality of monitored objects directly or indirectly connected to the common device or the common monitored object.
An operation management method comprising:
storing, for each of a plurality of monitored objects, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values;
applying, for each of the plurality of monitored objects, input performance values of the monitored object to the correlation model of that monitored object, and detecting correlation destruction of the correlations included in the correlation model; and
determining and outputting a monitored object that is a candidate for a failure factor by comparing, among the plurality of monitored objects that are directly or indirectly connected to a common device or a common monitored object and that have correlation models including common correlations, the presence or absence of detection of the correlation destruction for each of the common correlations.
The operation management method according to claim 6, wherein a monitored object whose presence or absence of detection of the correlation destruction for each of the common correlations has a low similarity to that of the other monitored objects is determined as the candidate for the failure factor.
The operation management method according to claim 6, wherein
the plurality of monitored objects are divided into a plurality of layers, each of the monitored objects belonging to one of two adjacent layers is connected to each of the monitored objects belonging to the other layer, and the correlation models of the plurality of monitored objects in each of the plurality of layers include the common correlations, and
when the candidate for the failure factor is determined, the presence or absence of detection of the correlation destruction for each of the common correlations is compared among the plurality of monitored objects in each of the plurality of layers.
The operation management method according to claim 6, further comprising outputting the correlations in which the correlation destruction has been detected, included in the correlation model of the monitored object that is the candidate for the failure factor, in association with the types of performance values related to those correlations.
The operation management method according to claim 6, further comprising extracting a group including the plurality of monitored objects having the correlation models including the common correlations, by comparing the correlation models among the plurality of monitored objects directly or indirectly connected to the common device or the common monitored object.
A computer-readable recording medium storing a program that causes a computer to execute processing comprising:
storing, for each of a plurality of monitored objects, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values;
applying, for each of the plurality of monitored objects, input performance values of the monitored object to the correlation model of that monitored object, and detecting correlation destruction of the correlations included in the correlation model; and
determining and outputting a monitored object that is a candidate for a failure factor by comparing, among the plurality of monitored objects that are directly or indirectly connected to a common device or a common monitored object and that have correlation models including common correlations, the presence or absence of detection of the correlation destruction for each of the common correlations.
The recording medium storing the program according to claim 11, wherein a monitored object whose presence or absence of detection of the correlation destruction for each of the common correlations has a low similarity to that of the other monitored objects is determined as the candidate for the failure factor.
The recording medium storing the program according to claim 11, wherein
the plurality of monitored objects are divided into a plurality of layers, each of the monitored objects belonging to one of two adjacent layers is connected to each of the monitored objects belonging to the other layer, and the correlation models of the plurality of monitored objects in each of the plurality of layers include the common correlations, and
when the candidate for the failure factor is determined, the presence or absence of detection of the correlation destruction for each of the common correlations is compared among the plurality of monitored objects in each of the plurality of layers.
The recording medium storing the program according to claim 11, wherein the processing further comprises outputting the correlations in which the correlation destruction has been detected, included in the correlation model of the monitored object that is the candidate for the failure factor, in association with the types of performance values related to those correlations.
The recording medium storing the program according to claim 11, wherein the processing further comprises extracting a group including the plurality of monitored objects having the correlation models including the common correlations, by comparing the correlation models among the plurality of monitored objects directly or indirectly connected to the common device or the common monitored object.
An operation management apparatus comprising:
correlation model storage means for storing, for each of a plurality of monitored objects, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values;
correlation destruction detection means for applying, for each of the plurality of monitored objects, input performance values of the monitored object to the correlation model of that monitored object stored in the correlation model storage means, and detecting correlation destruction of the correlations included in the correlation model; and
failure analysis means for determining and outputting a monitored object that is a candidate for a failure factor by comparing, among the plurality of monitored objects that provide the same function and that either accept a processing request when one of them is selected by a common device or a common monitored object, or use a processing result of a common device or a common monitored object, the presence or absence of detection of the correlation destruction for each of common correlations.
The operation management apparatus according to claim 16, wherein the failure analysis means determines, as the candidate for the failure factor, a monitored object whose presence or absence of detection of the correlation destruction for each of the common correlations has a low similarity to that of the other monitored objects.
The operation management apparatus according to claim 16, wherein the failure analysis means further outputs the correlations in which the correlation destruction has been detected, included in the correlation model of the monitored object that is the candidate for the failure factor, in association with the types of performance values related to those correlations.
The operation management apparatus according to claim 16, further comprising group information generation means for extracting a group including the plurality of monitored objects having the correlation models including the common correlations, by comparing the correlation models among the plurality of monitored objects that provide the same function and that either accept a processing request when one of them is selected by the common device or the common monitored object, or use a processing result of the common device or the common monitored object.
An operation management method comprising:
storing, for each of a plurality of monitored targets, a correlation model including one or more correlation functions, each indicating a correlation between performance values of two different types among a plurality of types of performance values;
applying, for each of the plurality of monitored targets, the input performance values of the monitored target to the correlation model of that monitored target, and detecting correlation destruction of the correlations included in that correlation model; and
determining and outputting the monitored target that is a candidate failure factor by comparing, among the plurality of monitored targets that provide the same function, whether the correlation destruction was detected for each common correlation, the plurality of monitored targets either receiving a processing request when one of them is selected by a common device or a common monitored target, or using a processing result of a common device or a common monitored target.
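The detection and comparison steps of the method can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not fix the form of the correlation function, so the sketch assumes a linear function y = a * x + b and treats a prediction error above an assumed tolerance as correlation destruction. The names `Correlation`, `detect_destruction`, and `compare_common`, and the sample servers and metric types, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correlation:
    """Assumed linear correlation function y = a * x + b between two metric types."""
    x_type: str
    y_type: str
    a: float
    b: float
    tolerance: float  # assumed limit on the absolute prediction error


def detect_destruction(model, perf):
    """Apply one target's performance values to its correlation model.

    Returns {(x_type, y_type): flag}; flag is True when the measured value
    deviates from the predicted value by more than the tolerance.
    """
    flags = {}
    for c in model:
        predicted = c.a * perf[c.x_type] + c.b
        flags[(c.x_type, c.y_type)] = abs(perf[c.y_type] - predicted) > c.tolerance
    return flags


def compare_common(destruction_by_target):
    """For each correlation common to all targets, collect the per-target flags."""
    targets = list(destruction_by_target)
    common = set.intersection(*(set(destruction_by_target[t]) for t in targets))
    return {
        key: {t: destruction_by_target[t][key] for t in targets}
        for key in sorted(common)
    }


# Hypothetical example: three web servers that share one correlation.
models = {name: [Correlation("cpu", "disk", 2.0, 0.0, 5.0)]
          for name in ("web1", "web2", "web3")}
perf = {
    "web1": {"cpu": 10.0, "disk": 20.0},
    "web2": {"cpu": 10.0, "disk": 21.0},
    "web3": {"cpu": 10.0, "disk": 60.0},
}
destruction = {t: detect_destruction(models[t], perf[t]) for t in models}
table = compare_common(destruction)
```

In this example only `web3` deviates from the shared correlation, so it stands out when the flags are compared across the servers.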
The operation management method according to claim 20, wherein the monitored target whose detection results for the correlation destruction, taken for each common correlation, have a low similarity to those of the other monitored targets is determined as the failure factor candidate.
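One way to picture the similarity test is the sketch below. The claim does not define the similarity measure or what counts as "low", so this sketch assumes the fraction of common correlations with matching detection flags, averaged over the other targets, and an assumed threshold of 0.5. The function names and sample data are hypothetical.

```python
def similarity(flags_a, flags_b):
    """Fraction of common correlations whose detection flags match (assumed measure)."""
    keys = set(flags_a) & set(flags_b)
    if not keys:
        return 1.0  # nothing in common to disagree on
    matches = sum(flags_a[k] == flags_b[k] for k in keys)
    return matches / len(keys)


def failure_candidates(destruction_by_target, threshold=0.5):
    """Return targets whose mean similarity to the other targets is below the threshold."""
    candidates = []
    for name, flags in destruction_by_target.items():
        others = [f for n, f in destruction_by_target.items() if n != name]
        if not others:
            continue
        mean_sim = sum(similarity(flags, o) for o in others) / len(others)
        if mean_sim < threshold:
            candidates.append(name)
    return candidates


# Hypothetical detection results for two common correlations, c1 and c2.
destruction = {
    "web1": {"c1": False, "c2": False},
    "web2": {"c1": False, "c2": False},
    "web3": {"c1": True, "c2": True},
}
result = failure_candidates(destruction)
```

Here `web3` disagrees with both peers on every common correlation, so its mean similarity is 0 and it is the only candidate, while `web1` and `web2` each score 0.5 and stay at the threshold.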
The operation management method according to claim 20, further comprising outputting each correlation for which the correlation destruction was detected in the correlation model of the monitored target that is the failure factor candidate, in association with the types of the performance values related to that correlation.
The operation management method according to claim 20, further comprising extracting a group including the plurality of monitored targets whose correlation models include the common correlation, by comparing the correlation models among the plurality of monitored targets that provide the same function, the plurality of monitored targets either receiving a processing request when one of them is selected by the common device or the common monitored target, or using a processing result of the common device or the common monitored target.
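The group extraction step can be sketched as follows. This is only an illustration under stated assumptions: each correlation model is reduced to the set of metric-type pairs it relates, the caller passes only targets that already provide the same function behind a common device, and a target joins a group when it shares at least `min_common` correlations with it. The function `extract_groups`, the parameter `min_common`, and the sample data are hypothetical.

```python
def extract_groups(model_keys, min_common=1):
    """Group targets whose correlation models share common correlations.

    model_keys maps a target name to the set of (x_type, y_type) pairs in its
    correlation model. Returns groups with at least two members, each with the
    correlations common to all of its members.
    """
    groups = []
    for name, keys in model_keys.items():
        for group in groups:
            shared = group["common"] & keys
            if len(shared) >= min_common:
                group["members"].append(name)
                group["common"] = shared  # keep only what every member shares
                break
        else:
            groups.append({"members": [name], "common": set(keys)})
    return [g for g in groups if len(g["members"]) > 1]


# Hypothetical models: two web servers with matching models and one database server.
model_keys = {
    "web1": {("cpu", "disk"), ("cpu", "net")},
    "web2": {("cpu", "disk"), ("cpu", "net")},
    "db1": {("io", "mem")},
}
groups = extract_groups(model_keys)
```

The greedy first-fit grouping is a simplification chosen for brevity; the result can depend on the order in which targets are visited.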
JP2012549903A 2010-12-20 2011-12-16 Operation management apparatus, operation management method, and program Active JP5267749B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012549903A JP5267749B2 (en) 2010-12-20 2011-12-16 Operation management apparatus, operation management method, and program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2010282727 2010-12-20
JP2010282727 2010-12-20
JP2012549903A JP5267749B2 (en) 2010-12-20 2011-12-16 Operation management apparatus, operation management method, and program
PCT/JP2011/079963 WO2012086824A1 (en) 2010-12-20 2011-12-16 Operation management device, operation management method, and program

Publications (2)

Publication Number Publication Date
JP5267749B2 (en) 2013-08-21
JPWO2012086824A1 (en) 2014-06-05

Family

ID=46314089

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2012549903A Active JP5267749B2 (en) 2010-12-20 2011-12-16 Operation management apparatus, operation management method, and program

Country Status (5)

Country Link
US (2) US8874963B2 (en)
EP (1) EP2657843B1 (en)
JP (1) JP5267749B2 (en)
CN (1) CN103262048B (en)
WO (1) WO2012086824A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5418610B2 (en) 2010-02-15 2014-02-19 日本電気株式会社 Failure cause extraction apparatus, failure cause extraction method, and program storage medium
WO2011155621A1 (en) * 2010-06-07 2011-12-15 日本電気株式会社 Malfunction detection device, obstacle detection method, and program recording medium
CN103262048B (en) * 2010-12-20 2016-01-06 日本电气株式会社 operation management device, operation management method and program thereof
JP5831558B2 (en) * 2012-01-23 2015-12-09 日本電気株式会社 Operation management apparatus, operation management method, and program
JP5937209B2 (en) * 2012-07-03 2016-06-22 株式会社日立製作所 Failure effect evaluation system and evaluation method
WO2014097598A1 (en) * 2012-12-17 2014-06-26 日本電気株式会社 Information processing device which carries out risk analysis and risk analysis method
CN105027088B (en) * 2013-02-18 2018-07-24 日本电气株式会社 Network analysis equipment and systematic analytic method
JPWO2014188638A1 (en) * 2013-05-22 2017-02-23 日本電気株式会社 Shared risk group management system, shared risk group management method, and shared risk group management program
EP3015989A4 (en) 2013-06-25 2017-06-07 Nec Corporation System analysis device, system analysis method and system analysis program
US10592328B1 (en) * 2015-03-26 2020-03-17 Amazon Technologies, Inc. Using cluster processing to identify sets of similarly failing hosts
CN108306747B (en) * 2017-01-11 2021-07-23 阿里巴巴集团控股有限公司 Cloud security detection method and device and electronic equipment
JP6823265B2 (en) 2017-03-28 2021-02-03 富士通株式会社 Analytical instruments, analytical systems, analytical methods and analytical programs
JP7415714B2 (en) 2020-03-23 2024-01-17 富士通株式会社 Failure cause identification system, failure cause identification method, and failure cause identification program
US11288115B1 (en) * 2020-11-05 2022-03-29 International Business Machines Corporation Error analysis of a predictive model

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1415307A2 (en) * 2001-08-06 2004-05-06 Mercury Interactive Corporation System and method for automated analysis of load testing results
GB0125713D0 (en) * 2001-10-26 2001-12-19 Statoil Asa Method of combining spatial models
US7441008B2 (en) * 2002-12-18 2008-10-21 International Business Machines Corporation Method for correlating transactions and messages
US7552447B2 (en) * 2004-05-26 2009-06-23 International Business Machines Corporation System and method for using root cause analysis to generate a representation of resource dependencies
US7735141B1 (en) * 2005-03-10 2010-06-08 Noel Steven E Intrusion event correlator
US20060294220A1 (en) * 2005-06-22 2006-12-28 Microsoft Corporation Diagnostics and resolution mining architecture
JP4758259B2 (en) * 2006-01-31 2011-08-24 株式会社クラウド・スコープ・テクノロジーズ Network monitoring apparatus and method
JP4573179B2 (en) * 2006-05-30 2010-11-04 日本電気株式会社 Performance load abnormality detection system, performance load abnormality detection method, and program
JPWO2007148562A1 (en) * 2006-06-22 2009-11-19 日本電気株式会社 Share management system, share management method and program
JP4859558B2 (en) * 2006-06-30 2012-01-25 株式会社日立製作所 Computer system control method and computer system
US7661031B2 (en) * 2006-12-28 2010-02-09 International Business Machines Corporation Correlating macro and error data for debugging program error event
US7925369B2 (en) * 2007-12-18 2011-04-12 Globalfoundries Inc. Method and apparatus for optimizing models for extracting dose and focus from critical dimension
US20090187412A1 (en) * 2008-01-18 2009-07-23 If Analytics Llc Correlation/relationship and forecasting generator
JP4872945B2 (en) * 2008-02-25 2012-02-08 日本電気株式会社 Operation management apparatus, operation management system, information processing method, and operation management program
JP4872944B2 (en) 2008-02-25 2012-02-08 日本電気株式会社 Operation management apparatus, operation management system, information processing method, and operation management program
US8700953B2 (en) 2008-09-18 2014-04-15 Nec Corporation Operation management device, operation management method, and operation management program
JP5428372B2 (en) * 2009-02-12 2014-02-26 日本電気株式会社 Operation management apparatus, operation management method and program thereof
US7992040B2 (en) * 2009-02-20 2011-08-02 International Business Machines Corporation Root cause analysis by correlating symptoms with asynchronous changes
US8381038B2 (en) * 2009-05-26 2013-02-19 Hitachi, Ltd. Management server and management system
JP5297272B2 (en) * 2009-06-11 2013-09-25 株式会社日立製作所 Device abnormality monitoring method and system
JPWO2011046228A1 (en) * 2009-10-15 2013-03-07 日本電気株式会社 System operation management apparatus, system operation management method, and program storage medium
JP5418610B2 (en) * 2010-02-15 2014-02-19 日本電気株式会社 Failure cause extraction apparatus, failure cause extraction method, and program storage medium
US8060782B2 (en) * 2010-03-01 2011-11-15 Microsoft Corporation Root cause problem identification through event correlation
WO2011155621A1 (en) * 2010-06-07 2011-12-15 日本電気株式会社 Malfunction detection device, obstacle detection method, and program recording medium
US8774023B2 (en) * 2010-09-22 2014-07-08 At&T Intellectual Property I, Lp Method and system for detecting changes in network performance
US8751436B2 (en) * 2010-11-17 2014-06-10 Bank Of America Corporation Analyzing data quality
CN103262048B (en) * 2010-12-20 2016-01-06 日本电气株式会社 operation management device, operation management method and program thereof
US8959313B2 (en) * 2011-07-26 2015-02-17 International Business Machines Corporation Using predictive determinism within a streaming environment
US9148495B2 (en) * 2011-07-26 2015-09-29 International Business Machines Corporation Dynamic runtime choosing of processing communication methods
US20140058798A1 (en) * 2012-08-24 2014-02-27 o9 Solutions, Inc. Distributed and synchronized network of plan models
CN104798049B (en) * 2012-11-20 2017-08-04 日本电气株式会社 operation management device and operation management method

Also Published As

Publication number Publication date
EP2657843A1 (en) 2013-10-30
EP2657843A4 (en) 2015-08-12
JPWO2012086824A1 (en) 2014-06-05
US8874963B2 (en) 2014-10-28
EP2657843B1 (en) 2020-04-08
CN103262048A (en) 2013-08-21
US20150006960A1 (en) 2015-01-01
US20130159778A1 (en) 2013-06-20
WO2012086824A1 (en) 2012-06-28
CN103262048B (en) 2016-01-06


Legal Events

Date Code Title Description
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20130409

R150 Certificate of patent or registration of utility model

Ref document number: 5267749

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150