JP2009211658A - Failure detection device, failure detection method and program therefor

Info

Publication number: JP2009211658A
Application number: JP2008056746A
Authority: JP
Inventors: Yoshimasa Hattori; 佳正服部
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-03-06
Filing date: 2008-03-06
Publication date: 2009-09-17
Anticipated expiration: 2028-03-06
Also published as: JP4826831B2

Abstract

PROBLEM TO BE SOLVED: To address the problem that a failure caused by a complex combination of monitoring items cannot be detected.

SOLUTION: A failure detection device 10 for detecting the occurrence of a failure of an object to be monitored, based on monitoring information about predetermined monitoring items acquired from the object to be monitored, includes: a monitoring information recording processing section 12 for giving a scoring point to each piece of information about a monitoring item according to a predetermined rule condition; a scoring point accumulation processing section 13 for accumulating the scoring points for every group of a plurality of monitoring items; and a failure determination processing section 14 for determining the presence of a failure of the object to be monitored by comparing the accumulated scoring points with a threshold set for every group.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a failure detection device, a failure detection method, and a program for detecting and reporting a failure of a monitored device based on monitoring information from the monitored device.

In system operation management, failures of a monitored device, such as a computer constituting the system, are commonly detected from monitoring information collected from that device.

In such failure detection, monitoring information is generally collected for individual monitoring items relating to the software or hardware of the monitored device, and whether a failure (problem) has occurred is determined separately for each monitoring item.

For example, Patent Document 1 discloses related art for this type of failure monitoring device. In Patent Document 1, the failure states of a monitoring target (item) are weighted in advance, and even when a failure occurs in the monitoring target, notification is suppressed if a failure with a higher weight has already been reported, which prevents unnecessary failure reports.
Patent Document 1: JP 2002-171304 A

In system operation management, even when a failure detected from the monitoring information of a single monitoring item is not a major problem by itself, a serious and highly urgent problem may arise when failures in multiple monitoring items occur in combination.

The related-art failure detection method described in Patent Document 1 and elsewhere detects failures item by item, based on the monitoring information for each individual monitoring item. Consequently, when a serious problem arises from a combination of multiple monitoring items as described above, the cause is difficult to identify. In particular, the more monitoring targets (monitoring items) a system has, the longer it takes to investigate the cause.

(Object of the invention)
An object of the present invention is to provide a failure detection device, a failure detection method, and a program therefor that can detect a failure arising from a combination of monitoring items, based not only on the monitoring information for individual monitoring items but also on composite information about combinations of multiple monitoring items.

A failure detection device according to the present invention detects the occurrence of a failure of a monitored target based on monitoring information for predetermined monitoring items acquired from the monitored target, and includes: means for assigning a score to each piece of monitoring information for a monitoring item according to a predetermined rule condition; score accumulation means for accumulating the scores for each group of a plurality of monitoring items; and determination means for determining the presence or absence of a failure of the monitored target by comparing the accumulated score with a threshold set for each group.

A failure detection method according to the present invention detects the occurrence of a failure of a monitored target based on monitoring information for predetermined monitoring items acquired from the monitored target, and includes: a step of assigning a score to each piece of monitoring information for a monitoring item according to a predetermined rule condition; a step of accumulating the scores for each group of a plurality of monitoring items; and a determination step of determining the presence or absence of a failure of the monitored target by comparing the accumulated score with a threshold set for each group.

A program according to the present invention is executed on a computer and detects the occurrence of a failure of a monitored target based on monitoring information for predetermined monitoring items acquired from the monitored target. The program causes the computer to execute: a process of assigning a score to each piece of monitoring information for a monitoring item according to a predetermined rule condition; a process of accumulating the scores for each group of a plurality of monitoring items; and a determination process of determining the presence or absence of a failure of the monitored target by comparing the accumulated score with a threshold set for each group.

According to the present invention, it is possible to detect a failure that arises from a combination of monitoring items.

Next, embodiments of the present invention will be described in detail with reference to the drawings.

(First embodiment)
Referring to FIG. 1, the failure detection notification system according to the first embodiment of the present invention comprises a monitored device 100, which is the target of failure monitoring, and a failure detection notification device 10, which detects and reports failures of the monitored device 100.

The monitored device 100 is a device such as a computer, router, hub, or firewall. The failure detection notification device 10 performs the processing from failure detection through notification, based on the monitoring information collected from the monitored device 100. The monitored device 100 and the failure detection notification device 10 are connected to each other via a local area network (LAN) or the Internet.

The monitored device 100 has a monitoring information collection unit 101 that collects monitoring information about the hardware and software on the monitored device 100. The monitoring information collection unit 101 is realized by a program and, for example, collects monitoring information about hardware and software by using functions of the OS (operating system).

The monitoring information collected by the monitoring information collection unit 101 includes information on the temperature, voltage, fans, CPU, memory, disks (DB tables), network, processes, message logs (application log, event log, system log), alive monitoring, and so on of the monitored device 100.

The failure detection notification device 10 includes a monitoring information acquisition unit 11, a monitoring information recording processing unit 12, a score accumulation processing unit 13, a failure determination processing unit 14, a notification execution unit 15, a score assignment rule table 21, a data storage unit 22, a problem determination table 23, a score update information table 24, and a cumulative score storage unit 25.

The monitoring information acquisition unit 11 acquires, via the LAN, the Internet, or the like, the monitoring information of the monitored device 100 collected by the monitoring information collection unit 101.

The monitoring information recording processing unit 12 receives the monitoring information acquired by the monitoring information acquisition unit 11, refers to the score assignment rule table 21 to add an accumulation target flag and a score to the monitoring information, and stores the resulting data in the data storage unit 22.

The score accumulation processing unit 13 adds the scores attached to the monitoring information by the monitoring information recording processing unit 12 to the cumulative score storage unit 25, which serves as a temporary storage area, for a predetermined set time.

The failure determination processing unit 14 detects whether a failure has occurred by comparing the score or cumulative score sent from the monitoring information recording processing unit 12 or the score accumulation processing unit 13 with the threshold score in the problem determination table 23, and decides the reporting method when it determines that a failure exists. The failure determination processing unit 14 also refers to the score update information table 24 to correct the threshold score in the problem determination table 23 when an incorrect determination has been made.

The notification execution unit 15 executes the report according to the reporting method decided by the failure determination processing unit 14.

The score assignment rule table 21 has the fields “server (system name)”, “monitoring type”, “rule condition”, “accumulation target flag”, “cumulative group name”, and “assigned score”.

The server (system name) field holds the server name or system name of the monitored device 100.

The monitoring type field holds the type of monitoring item, such as temperature, voltage, fan (rotation speed), CPU, memory, disk (DB table), network, process, message log (application log, event log, syslog), or alive monitoring.

The rule condition field holds a threshold condition or a character string condition on the state of the monitoring item specified by the monitoring type.

A threshold condition can specify an upper limit, a lower limit, or both. For the CPU load factor, for example, it can be specified as “upper limit: 80% or more”, “lower limit: 20% or less”, or “upper limit: 80% or more, lower limit: 20% or less”. A character string condition specifies a predetermined character string contained in a message log or the like.
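The two kinds of rule condition can be sketched roughly as follows. This is only an illustration: the class names `ThresholdCondition` and `StringCondition` are hypothetical, and reading “upper limit: 80% or more, lower limit: 20% or less” as “match when the value is at or above the upper limit or at or below the lower limit” is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdCondition:
    """Hypothetical threshold rule: either limit may be omitted."""
    upper: Optional[float] = None  # matches when value >= upper
    lower: Optional[float] = None  # matches when value <= lower

    def matches(self, value: float) -> bool:
        hit_upper = self.upper is not None and value >= self.upper
        hit_lower = self.lower is not None and value <= self.lower
        return hit_upper or hit_lower


@dataclass(frozen=True)
class StringCondition:
    """Hypothetical string rule: matches when the text occurs in a log message."""
    needle: str

    def matches(self, message: str) -> bool:
        return self.needle in message


# CPU load factor example from the text: upper 80% or more, lower 20% or less.
cpu_rule = ThresholdCondition(upper=80.0, lower=20.0)
log_rule = StringCondition("ERROR")
```

With these definitions, `cpu_rule.matches(85.0)` and `cpu_rule.matches(10.0)` are true, while a mid-range load such as 50.0 does not match.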

The accumulation target flag field holds YES or NO, indicating whether the monitoring item specified by the monitoring type is a target for score accumulation.

The cumulative group name field holds the name of the group that serves as the unit of score accumulation. For example, when a monitoring item does not cause a serious situation on its own but can be expected to do so in combination with other monitoring items, the monitoring items in that combination are treated as one group, and the group is identified by its cumulative group name.

For example, the monitoring items “temperature”, “voltage”, and “fan” are grouped and registered under the cumulative group name “AA1”.

When a monitoring item is not an accumulation target (does not belong to a cumulative group), the character string “default” is registered as its cumulative group name.

The assigned score field specifies the score (1, 2, 3, ...) to be assigned for each monitoring item or cumulative group.
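As an illustration of how rows of the score assignment rule table 21 might look, the sketch below lists a few rows and derives which monitoring items belong to each cumulative group. The server name `srv1`, the condition strings, and all score values are made-up examples; only the group name “AA1”, the “default” group, and the field set come from the description.

```python
from collections import defaultdict

# Hypothetical rows of the score assignment rule table 21.
RULE_TABLE = [
    {"server": "srv1", "kind": "temperature", "condition": "upper: 70 or more",
     "accumulate": "YES", "group": "AA1", "score": 2},
    {"server": "srv1", "kind": "voltage", "condition": "lower: 11 or less",
     "accumulate": "YES", "group": "AA1", "score": 2},
    {"server": "srv1", "kind": "fan", "condition": "lower: 1000 or less",
     "accumulate": "YES", "group": "AA1", "score": 3},
    {"server": "srv1", "kind": "CPU", "condition": "upper: 80% or more",
     "accumulate": "NO", "group": "default", "score": 3},
]


def items_by_group(table):
    """Map each cumulative group name to the monitoring types it contains."""
    groups = defaultdict(list)
    for row in table:
        if row["accumulate"] == "YES":  # non-accumulated items stay in "default"
            groups[row["group"]].append(row["kind"])
    return dict(groups)


GROUPS = items_by_group(RULE_TABLE)
```

Here `GROUPS` contains a single entry, mapping "AA1" to temperature, voltage, and fan; the CPU row is excluded because its accumulation target flag is NO.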

A record in the data storage unit 22 has the fields “ID”, “server (system name)”, “monitoring type”, “occurrence time”, “monitoring information details”, “accumulation target flag”, and “score”.

The ID field holds an identification number (for example, ID1, ID2, ID3, ...) assigned by the monitoring information recording processing unit 12 that uniquely identifies the monitoring information.

The occurrence time field holds the time at which the monitoring information was generated, and the monitoring information details field holds the detailed contents of the monitoring information.

The score field holds the score assigned on the basis of the score assignment rule table 21.

The server (system name), monitoring type, and accumulation target flag fields are the same as the corresponding fields of the score assignment rule table 21.

The problem determination table 23 has the fields “cumulative group name”, “threshold score”, “problem rank”, and “reporting method”. The cumulative group name is as described above.

The threshold score field holds the threshold for the score (cumulative score) of each monitoring item or cumulative group. Thresholds can be set in several stages according to differences in importance.
The failure determination processing unit 14 determines that a failure has occurred in the monitored device 100 (a report is necessary) when the score of a monitoring item, or the cumulative score of a cumulative group, exceeds the value registered in the threshold score field.

The problem rank field holds the importance of the problem corresponding to the score threshold (for example, A, B, C, ... in descending order of importance).

The reporting method field holds the method of reporting the problem, such as lighting a patrol lamp, notification by e-mail, notification to another application, notification by telephone, or a combination of these.
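A staged threshold lookup against the problem determination table 23 might look like the sketch below. The table contents (thresholds, ranks, and methods) are invented for illustration, and `judge` is a hypothetical helper. The description only says that a failure is reported when the score exceeds the threshold, so the comparison is strict; choosing the highest exceeded stage is an assumption.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    threshold: int  # "threshold score"
    rank: str       # "problem rank" (A is the most important)
    method: str     # "reporting method"


# Hypothetical contents of the problem determination table 23.
PROBLEM_TABLE = {
    "AA1": [Stage(5, "B", "patrol lamp"),
            Stage(10, "A", "patrol lamp + e-mail")],
    "default": [Stage(3, "C", "patrol lamp")],
}


def judge(group: str, score: int) -> Optional[Stage]:
    """Return the highest stage whose threshold is exceeded, or None if no report is needed."""
    exceeded = [s for s in PROBLEM_TABLE[group] if score > s.threshold]
    return max(exceeded, key=lambda s: s.threshold) if exceeded else None


low = judge("AA1", 4)    # below every threshold
mid = judge("AA1", 7)    # exceeds only the first stage
high = judge("AA1", 12)  # exceeds both stages
```

In this sketch a cumulative score of 7 for group AA1 yields rank B with the patrol lamp only, while 12 escalates to rank A and adds e-mail.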

The score update information table 24 has the fields “cumulative group name”, “waiting time”, “server (system name)”, “monitoring type”, “rule condition”, and “correction coefficient”. The cumulative group name, server (system name), monitoring type, and rule condition fields hold the same data as the corresponding fields of the score assignment rule table 21; the waiting time and correction coefficient fields described below are added to them.

The waiting time field specifies how long to wait for update information.

The correction coefficient field holds the coefficient used to correct the score in the “threshold score” field of the problem determination table 23.

The cumulative score storage unit 25 has the fields “cumulative group name”, “cumulative score”, and “ID”. The cumulative group name is as described above.

The cumulative score field holds the score added up during a fixed set time.

The ID field holds the IDs of the monitoring information included in the addition.

Next, an example hardware configuration of the failure detection notification device 10 will be described with reference to FIG. 10.

Referring to FIG. 10, the failure detection notification device 10 can be realized with the same hardware configuration as a general computer. It comprises: a CPU (Central Processing Unit) 401; a main storage unit 402, which is main memory such as RAM (Random Access Memory) used as a data work area and temporary save area; a communication unit 403 that transmits and receives data via the network 600; an input/output interface unit 404 that connects to external devices to transmit and receive data; an auxiliary storage unit 405, which is a hard disk device composed of non-volatile memory such as ROM (Read Only Memory), a magnetic disk, or semiconductor memory (for example, the score assignment rule table 21, data storage unit 22, problem determination table 23, score update information table 24, and cumulative score storage unit 25 are built on this auxiliary storage unit 405); a system bus 406 that interconnects these components of the information processing device; an output device 407 such as a display; and an input device 408 such as a keyboard.

The failure detection notification device 10 according to this embodiment can of course be realized in hardware, by mounting a circuit component such as an LSI (Large Scale Integration) that incorporates a program for detecting and reporting failures. It can also be realized in software, by storing a program that provides the functions of the monitoring information acquisition unit 11, monitoring information recording processing unit 12, score accumulation processing unit 13, failure determination processing unit 14, and notification execution unit 15 in the auxiliary storage unit 405, loading it into the main storage unit 402, and executing it on the CPU 401.

(Operation of the embodiment)
Next, the operation of the failure detection notification device 10 configured as described above will be described with reference to FIG. 1, FIGS. 2 to 6, and FIG. 7. FIGS. 7 and 8 are flowcharts explaining the operation of the failure detection notification device 10.

Referring to FIG. 7, the monitoring information acquisition unit 11 acquires the monitoring information of the monitored device 100 from the monitoring information collection unit 101 via the LAN or the Internet, checks the acquired information for irregularities, and then passes it to the monitoring information recording processing unit 12 (step S101).

The monitoring information recording processing unit 12 refers to the score assignment rule table 21 and checks each piece of monitoring information passed from the monitoring information acquisition unit 11, one by one, against the “server (system name)”, “monitoring type”, and “rule condition” fields (step S102).

If a rule matches, the monitoring information recording processing unit 12 adds to the monitoring information the “accumulation target flag” (YES or NO), “cumulative group name”, and “assigned score” set in the score assignment rule table 21.

If no rule matches, the “accumulation target flag” (= NO), “cumulative group name” (= default), and “assigned score” are added to the monitoring information.
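The matching of step S102 and the two outcomes above can be sketched as follows. The `Rule` class and `annotate` function are hypothetical, rule conditions are modeled as plain predicates on a numeric value, and `DEFAULT_SCORE` is an assumption, since the description does not state which score an unmatched record receives.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    server: str
    kind: str
    condition: Callable[[float], bool]  # stands in for the "rule condition" field
    accumulate: bool                    # accumulation target flag (YES/NO)
    group: str                          # cumulative group name
    score: int                          # assigned score


DEFAULT_SCORE = 0  # assumption: not specified in the description


def annotate(server: str, kind: str, value: float, rules: list) -> dict:
    """Return the flag, group, and score to attach to one piece of monitoring information."""
    for rule in rules:
        if rule.server == server and rule.kind == kind and rule.condition(value):
            return {"accumulate": rule.accumulate, "group": rule.group, "score": rule.score}
    # No rule matched: flag = NO, group = "default".
    return {"accumulate": False, "group": "default", "score": DEFAULT_SCORE}


rules = [Rule("srv1", "temperature", lambda v: v >= 70.0, True, "AA1", 2)]
matched = annotate("srv1", "temperature", 75.0, rules)
unmatched = annotate("srv1", "CPU", 99.0, rules)
```

The first call matches the temperature rule and is tagged for accumulation in group AA1; the second finds no rule and falls back to the default group.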

When the check against the score assignment rule table 21 is complete, the monitoring information recording processing unit 12 registers the ID, server (system name), monitoring type, occurrence time, monitoring information details, accumulation target flag, and score of the monitoring information as a record in the data storage unit 22, as shown in FIG. 3 (step S103).

The monitoring information recording processing unit 12 then refers to the added “accumulation target flag” (step S104). If the accumulation target flag is YES, it sends the record stored in the data storage unit 22 to the score accumulation processing unit 13 (step S105); if the accumulation target flag is NO, it sends the record to the failure determination processing unit 14 (step S106).

On receiving a record, the score accumulation processing unit 13 adds the score attached to the record to the cumulative score in the cumulative score storage unit 25 that corresponds to the record's cumulative group name (steps S107 to S109).

The score accumulation processing unit 13 adds scores in units of a fixed set time configured in advance. Any duration may be set for this time, according to the type and operating status of the monitored device 100.

The score accumulation processing unit 13 starts counting the set time for each cumulative group name when it first receives a record (step S107), waits for the set time, and, until the count ends (step S109), accumulates the scores attached to incoming records in the “cumulative score” of the cumulative score storage unit 25 (step S108).

When the count ends, the cumulative score in the cumulative score storage unit 25 and the records that arrived between the start and end of the count are sent to the failure determination processing unit 14 (step S110).

If new information is received after the count of the set time has ended, the cumulative score storage unit 25 starts counting the set time again upon receipt and accumulates the scores.
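The per-group time window of steps S107 to S110 can be sketched as below. This is a simplified model, not the patented implementation: `ScoreAccumulator` and `GroupWindow` are hypothetical names, time is passed in explicitly as seconds rather than read from a clock, and the caller is assumed to call `flush_expired` before adding a record that arrives after a window has ended.

```python
from dataclasses import dataclass, field


@dataclass
class GroupWindow:
    started_at: float                                 # when the count for this group began
    total: int = 0                                    # "cumulative score"
    record_ids: list = field(default_factory=list)    # IDs included in the addition


class ScoreAccumulator:
    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds          # the preset "set time"
        self.windows = {}                             # cumulative group name -> GroupWindow

    def add(self, group: str, record_id: str, score: int, now: float) -> None:
        """S107/S108: start the count on the first record, then keep adding scores."""
        win = self.windows.get(group)
        if win is None:
            win = GroupWindow(started_at=now)
            self.windows[group] = win
        win.total += score
        win.record_ids.append(record_id)

    def flush_expired(self, now: float) -> list:
        """S109/S110: hand over (group, cumulative score, IDs) for every finished count."""
        done = []
        for group in list(self.windows):
            win = self.windows[group]
            if now - win.started_at >= self.window_seconds:
                done.append((group, win.total, win.record_ids))
                del self.windows[group]
        return done


acc = ScoreAccumulator(window_seconds=60.0)
acc.add("AA1", "ID1", 2, now=0.0)
acc.add("AA1", "ID2", 3, now=30.0)
early = acc.flush_expired(now=59.0)     # count still running
finished = acc.flush_expired(now=60.0)  # count ended: result goes to the determination step
acc.add("AA1", "ID3", 1, now=100.0)     # a later record starts a new count
```

Because a finished window is deleted on flush, the record arriving at 100 seconds opens a fresh count for the group, which mirrors the restart behavior described above.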

On receiving the cumulative score, the failure determination processing unit 14 determines the necessity of a report, the importance of the problem, and the reporting method, based on the problem determination table 23 (step S111).

Details of the determination processing by the failure determination processing unit 14 are described below.
(1) When the record has accumulation target flag = NO (a record sent from the monitoring information recording processing unit 12)
The failure determination processing unit 14 refers to the row of the problem determination table 23 whose “cumulative group name” is default and compares its “threshold score” with the “score” in the record. If the “score” exceeds the “threshold score”, it determines that a failure has occurred in the monitored device 100 and decides that a report is necessary.
(2) When the record has accumulation target flag = YES (a record sent from the score accumulation processing unit 13)
The failure determination processing unit 14 refers to the row of the problem determination table 23 with the matching “cumulative group name” and compares its “threshold score” with the “cumulative score” sent from the score accumulation processing unit 13. If the “cumulative score” exceeds the “threshold score”, it determines that a failure has occurred in the monitored device 100 and decides that a report is necessary.
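Cases (1) and (2) differ only in which row of the problem determination table 23 is consulted. A minimal sketch, with invented threshold values and a hypothetical `needs_report` helper:

```python
# Hypothetical "threshold score" per cumulative group name.
THRESHOLDS = {"default": 3, "AA1": 8}


def needs_report(accumulate: bool, group: str, score: int) -> bool:
    """Case (1): flag = NO uses the "default" row; case (2): flag = YES uses the group's row."""
    lookup = group if accumulate else "default"
    # The description reports only when the score exceeds the threshold,
    # so equality is treated as "no report" here (an assumption).
    return score > THRESHOLDS[lookup]


single_item = needs_report(accumulate=False, group="default", score=4)  # case (1)
grouped = needs_report(accumulate=True, group="AA1", score=7)           # case (2)
```

In this example the single item with score 4 exceeds the default threshold of 3, whereas the cumulative score of 7 for AA1 stays under its threshold of 8.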

When it decides that a report is necessary, the failure determination processing unit 14 refers to the problem determination table 23, specifies the importance given in “problem rank” and the method given in “reporting method”, and instructs the notification execution unit 15 to issue the report.

For example, when the importance is low, lighting the patrol lamp may be specified as the reporting method; when the importance is high, an e-mail report may be specified in addition to lighting the patrol lamp.

In either case, if the score or cumulative score falls below the “threshold score” in the problem determination table 23, the failure determination processing unit 14 determines that no failure has occurred, decides that no report is necessary, and issues no report.

On receiving the instruction from the failure determination processing unit 14, the notification execution unit 15 issues the report by the specified reporting method (step S112).

For e-mail, notification to other applications, and telephone reports, the contents of the record stored in the data storage unit 22 for the monitoring information are attached to the notification. For monitoring information subject to score accumulation, the contents of all monitoring items in the cumulative group are attached.

A failure may also be reported by lighting the patrol lamp, and the patrol lamp may be lit together with an e-mail or a notification to another application.

次に、障害判定処理部１４による問題判定テーブル２３の「閾値点数」の補正処理について、図８及び図９を参照して説明する。 Next, correction processing of the “threshold score” of the problem determination table 23 by the failure determination processing unit 14 will be described with reference to FIGS. 8 and 9.

まず、障害有り（通報の必要有り）の判定が誤判定であった場合の補正処理について、図８のフローチャートを参照して説明する。 First, a correction process in the case where the determination that there is a failure (need to report) is an erroneous determination will be described with reference to the flowchart of FIG.

After determining that a failure exists (notification required) and issuing the notification, the failure determination processing unit 14 refers to the score update information table 24 and determines whether the expected failure occurs within the time specified as the "waiting time". That is, it determines whether recorded information with the same cumulative group name as the reported cumulative group name arrives from the score accumulation processing unit 13 during the waiting time (steps S201 to S203).

If recorded information with the same cumulative group name as the reported cumulative group name arrives during the waiting time, the determination that a failure exists (notification required) can be judged appropriate, so the threshold score is not corrected.

If recorded information with the same cumulative group name as the reported cumulative group name does not arrive during the waiting time, the determination that a failure exists (notification required) can be judged erroneous. In that case, the "threshold score" is corrected by adding the correction score α, calculated using the "correction coefficient" for the same cumulative group name in the score update information table 24, to the "threshold score" field of the problem determination table 23 (step S204).

The correction score α is calculated, for example, by the following formula.
Correction score α = (X − Y) * F
Here, X is the value of the "cumulative score" in the cumulative score storage unit 25, Y is the value of the "threshold score" in the problem determination table 23, and F is the value of the "correction coefficient" in the score update information table 24.
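As a worked illustration of the false-alarm correction of FIG. 8, the sketch below raises the threshold by α = (X − Y) * F only when no confirming record arrived during the waiting time. The function and parameter names and the numeric values are hypothetical; only the formula comes from the text.

```python
def threshold_after_alarm(
    confirmed_within_wait: bool,
    cumulative: float,   # X: "cumulative score"
    threshold: float,    # Y: "threshold score"
    coefficient: float,  # F: "correction coefficient"
) -> float:
    """Return the threshold to use after a failure notification."""
    # The expected failure was observed during the waiting time:
    # the alarm was appropriate, so the threshold is left unchanged.
    if confirmed_within_wait:
        return threshold
    # Otherwise the alarm is treated as erroneous and the threshold is
    # raised by alpha = (X - Y) * F (step S204).
    alpha = (cumulative - threshold) * coefficient
    return threshold + alpha
```

For example, with X = 120, Y = 100, and F = 0.5, an unconfirmed alarm gives α = 10 and a new threshold of 110, so the same cumulative score of 120 would still exceed it, but by a smaller margin.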

Next, the correction performed when a determination of no failure (no notification required) was erroneous will be described with reference to the flowchart of FIG. 9.

After determining that there is no failure (no notification required), the failure determination processing unit 14 determines whether a failure registered in the score update information table 24 has occurred. That is, when monitoring information arrives from the score accumulation processing unit 13, the failure determination processing unit 14 determines whether the arrived monitoring information matches the rule condition for the same cumulative group name in the score update information table 24 (step S301).

If the arrived monitoring information does not match the rule condition in the score update information table 24, it is judged that the failure registered in the score update information table 24 has not occurred (that is, the determination of no failure was appropriate), and the processing ends.

If the arrived monitoring information matches the rule condition for the same cumulative group name in the score update information table 24, it is determined whether the score accumulation processing unit 13 is currently accumulating scores (steps S107 to S109 in FIG. 7) (step S302).

If scores are being accumulated, the processing waits until the count of the set time ends.

If scores are not being accumulated, it is further determined whether the "waiting time" of the score update information table 24 is currently being waited out (step S303). If it is, this is a case in which a failure was already determined to exist (notification required), so the processing ends.

If the waiting time is not being waited out, it can be judged that a failure registered in the score update information table 24 has occurred even though it was determined that there was no failure (no notification required). That is, the determination of no failure can be judged erroneous, so the "threshold score" is corrected by subtracting the correction score β, calculated with reference to the "correction coefficient" in the score update information table 24, from the "threshold score" field of the problem determination table 23 (step S304).

The correction score β is calculated, for example, by the following formula.
Correction score β = Y * F
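The decision sequence of FIG. 9 (steps S301 to S304) together with the formula for β can be sketched as below. The boolean inputs and the returned action labels are hypothetical simplifications; Y and F are the "threshold score" and "correction coefficient" as defined for the correction score α.

```python
def on_monitoring_info(
    matches_rule: bool,      # S301: matches the rule condition of table 24
    accumulating: bool,      # S302: score accumulation is in progress
    in_waiting_time: bool,   # S303: the "waiting time" is being waited out
    threshold: float,        # Y: "threshold score"
    coefficient: float,      # F: "correction coefficient"
) -> tuple[str, float]:
    """Return (action, threshold) after a determination of no failure."""
    if not matches_rule:
        # The registered failure did not occur: the determination was appropriate.
        return "end", threshold
    if accumulating:
        # Wait until the count of the set time ends.
        return "wait_for_set_time", threshold
    if in_waiting_time:
        # A failure was already determined and reported.
        return "end", threshold
    # Missed failure: lower the threshold by beta = Y * F (step S304).
    beta = threshold * coefficient
    return "corrected", threshold - beta
```

For example, with Y = 100 and F = 0.25, a missed failure gives β = 25 and a new threshold of 75.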

(Effects of the first embodiment)
According to this embodiment, a score is assigned to each piece of monitoring information for a monitoring item according to the rule conditions of the score assignment rule table 21, the scores are accumulated for each group combining a plurality of monitoring items, and the presence or absence of a failure is detected based on the accumulated score. This makes it possible to detect and report a failure that arises from a complex combination of monitoring items.
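The overall idea of per-item scoring followed by per-group accumulation can be illustrated with the following sketch. The rule entries, group names, point values, and thresholds are all hypothetical, and the set-time window used for accumulation in the embodiment is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical rule table: item -> (rule condition, accumulation flag, group, points)
RULES = {
    "cpu_usage": (lambda v: v >= 90, True, "overload", 30),
    "mem_usage": (lambda v: v >= 80, True, "overload", 30),
    "disk_usage": (lambda v: v >= 95, True, "storage", 50),
}
# Hypothetical per-group thresholds.
THRESHOLDS = {"overload": 50, "storage": 50}


def detect(samples: list[tuple[str, float]]) -> set[str]:
    """Return the groups whose accumulated score reaches the group threshold."""
    totals: dict[str, int] = defaultdict(int)
    for item, value in samples:
        rule = RULES.get(item)
        if rule is None:
            continue  # no rule registered for this monitoring item
        condition, accumulate, group, points = rule
        # Assign points only when the rule condition holds, and accumulate
        # them only when the item is flagged as an accumulation target.
        if condition(value) and accumulate:
            totals[group] += points
    return {g for g, total in totals.items() if total >= THRESHOLDS[g]}
```

In this example neither high CPU usage nor high memory usage alone reaches the "overload" threshold, but the two together do, which is the kind of combined failure that per-item checks would miss.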

In addition, by setting a notification method according to the importance of the failure for each group of monitoring items, an appropriate notification method can be selected according to the importance of the failure.

The present invention has been described above with reference to a preferred embodiment and examples, but it is not necessarily limited to that embodiment and those examples and can be modified in various ways within the scope of its technical idea.

The above embodiment describes a failure detection notification device that both detects and reports failures. However, a configuration is also possible in which only failure detection is performed by the method described above and, for notification, an external device functioning as the notification execution unit 15 is instructed to issue the notification.

FIG. 1 is a block diagram showing the configuration of a failure detection notification device according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the score assignment rule table in the embodiment of the present invention.
FIG. 3 is a diagram showing a configuration example of the recorded information in the data storage unit in the embodiment of the present invention.
FIG. 4 is a diagram showing a configuration example of the problem determination table in the embodiment of the present invention.
FIG. 5 is a diagram showing a configuration example of the registered contents of the cumulative score storage unit in the embodiment of the present invention.
FIG. 6 is a diagram showing a configuration example of the score update information table in the embodiment of the present invention.
FIG. 7 is a flowchart explaining the operation of the failure detection notification device according to the embodiment of the present invention.
FIG. 8 is a flowchart explaining the threshold correction processing in the failure detection notification device according to the embodiment of the present invention.
FIG. 9 is a flowchart explaining the threshold correction processing in the failure detection notification device according to the embodiment of the present invention.
FIG. 10 is a block diagram showing a hardware configuration example of the failure detection notification device according to the embodiment of the present invention.

Explanation of symbols

10: Failure detection notification device
11: Monitoring information acquisition unit
12: Monitoring information recording processing unit
13: Score accumulation processing unit
14: Failure determination processing unit
15: Notification execution unit
21: Score assignment rule table
22: Data storage unit
23: Problem determination table
24: Score update information table
25: Cumulative score storage unit
100: Monitored device
101: Monitoring information collection unit

Claims

A failure detection device that detects the occurrence of a failure in the monitored target based on monitoring information for a predetermined monitoring item acquired from the monitored target,
Means for assigning points according to a predetermined rule condition for each monitoring information for the monitoring item;
Point accumulation means for accumulating the points for each group of a plurality of the monitoring items;
A failure detection apparatus comprising: a determination unit that determines whether there is a failure of the monitored target by comparing the accumulated score with a threshold set for each group.

The failure detection apparatus according to claim 1, further comprising notification execution means for executing a notification when the failure is determined to be present,
wherein, when the determination unit determines that the failure is present, the determination unit selects a notification method set for each type of the group and instructs the notification execution means to use it.

The failure detection apparatus according to claim 2, wherein different notification methods are set for each of a plurality of thresholds set in stages, and
the determination unit selects a notification method according to the threshold that the accumulated score exceeds and instructs the notification execution unit to use it.

The failure detection device according to any one of claims 1 to 3, wherein the determination unit increases or decreases the threshold based on a predetermined correction coefficient when the determination of the presence or absence of a failure is an erroneous determination.

The failure detection device according to claim 1, wherein the determination means corrects the threshold by adding a value calculated using a predetermined correction coefficient to the threshold when a determination that there is a failure is an erroneous determination.

The failure detection apparatus according to claim 4, wherein the determination means corrects the threshold by subtracting a value calculated using a predetermined correction coefficient from the threshold when a determination that there is no failure is an erroneous determination.

The failure detection apparatus according to any one of claims 1 to 6, further comprising a rule table in which, for each monitoring item, a rule condition indicating whether a score is to be assigned, an accumulation target flag indicating whether the score is to be accumulated, a group name, and the score to be assigned are registered,
wherein, when the monitoring information is acquired, the means for assigning the score refers to the rule table and, when the value of the monitoring item indicated in the monitoring information satisfies the rule condition and the accumulation target flag indicates an accumulation target, assigns the score registered in the rule table to the monitoring information.

Cumulative score storage means for recording the cumulative score for each group, the threshold value, the problem rank indicating the importance of the failure, and at least one notification method according to the problem rank for each group Provide a problem determination table,
The score accumulating means accumulates the score given to the monitoring information to the accumulated score of the accumulated score storage means,
The determination means determines the presence or absence of a failure by comparing the cumulative score of the cumulative score storage means with the threshold value of the problem determination table. The failure detection device according to claim 2, wherein the notification method is selected according to the method.

The failure detection device according to any one of claims 1 to 8, wherein the determination means determines the presence or absence of the failure by comparing the score accumulated during a predetermined set time with the threshold.

The failure detection apparatus according to claim 4, further comprising an update table in which a correction coefficient for correcting the threshold is registered for each group,
wherein the determination means:
when it has determined that there is a failure, judges that determination to be erroneous if monitoring information for a monitoring item included in the group determined to have the failure is not acquired during a predetermined waiting time, and adds a value calculated using the correction coefficient of the update table to the threshold; and
when it has determined that there is no failure, judges that determination to be erroneous if monitoring information for a monitoring item included in the group determined to have no failure is acquired, and subtracts a value calculated using the correction coefficient of the update table from the threshold.

A failure detection method for detecting the occurrence of a failure of the monitored object based on monitoring information for a predetermined monitoring item acquired from the monitored object,
For each piece of monitoring information for the monitoring item, assigning a score according to a predetermined rule condition;
Accumulating the score for each group of a plurality of the monitoring items;
A failure detection method comprising: a determination step of determining the presence or absence of a failure of the monitored target by comparing the accumulated score and a threshold set for each group.

The failure detection method according to claim 11, further comprising a notification step of executing a notification by a notification method set for each type of the group when it is determined that the failure is present in the determination step.

The failure detection method according to claim 12, wherein different notification methods are set for each of a plurality of thresholds set in stages,
a notification method is selected in the determination step according to the threshold that the accumulated score exceeds, and the notification is executed in the notification step by the selected notification method.

The failure detection method according to claim 11, wherein the determination step includes a step of increasing or decreasing the threshold based on a predetermined correction coefficient when the determination of the presence or absence of a failure is an erroneous determination.

The failure detection method according to claim 14, wherein the determination step includes a step of correcting the threshold by adding a value calculated using a predetermined correction coefficient to the threshold when a determination that there is a failure is an erroneous determination.

The failure detection method according to claim 14, wherein the determination step includes a step of correcting the threshold by subtracting a value calculated using a predetermined correction coefficient from the threshold when a determination that there is no failure is an erroneous determination.

The failure detection method according to any one of claims 11 to 16, wherein a rule table is provided in which, for each monitoring item, a rule condition indicating whether a score is to be assigned, an accumulation target flag indicating whether the score is to be accumulated, a group name, and the score to be assigned are registered, and
in the step of assigning the score, when the monitoring information is acquired, the rule table is referred to and, when the value of the monitoring item indicated in the monitoring information satisfies the rule condition and the accumulation target flag indicates an accumulation target, the score registered in the rule table is assigned to the monitoring information.

Cumulative score storage means for recording the cumulative score for each group, the threshold value, the problem rank indicating the importance of the failure, and at least one notification method according to the problem rank for each group Provide a problem determination table,
In the step of accumulating the score, the score given to the monitoring information is accumulated in the accumulated score of the accumulated score storage means,
In the determination step, the cumulative score of the cumulative score storage means is compared with the threshold value of the problem determination table to determine whether there is a failure. The failure detection method according to any one of claims 12 to 17, wherein the notification method is selected in accordance with the method.

The failure detection method, wherein, in the determination step, the presence or absence of the failure is determined by comparing the score accumulated during a predetermined set time with the threshold.

The failure detection method according to claim 14 or 15, wherein an update table is provided in which a correction coefficient for correcting the threshold is registered for each group, and
in the determination step:
when it has been determined that there is a failure, that determination is judged to be erroneous if monitoring information for a monitoring item included in the group determined to have the failure is not acquired during a predetermined waiting time, and a value calculated using the correction coefficient of the update table is added to the threshold; and
when it has been determined that there is no failure, that determination is judged to be erroneous if monitoring information for a monitoring item included in the group determined to have no failure is acquired, and a value calculated using the correction coefficient of the update table is subtracted from the threshold.

A program that is executed on a computer and detects occurrence of a failure of the monitored object based on monitoring information for a predetermined monitoring item acquired from the monitored object,
In the computer,
A process of assigning points according to a predetermined rule condition for each monitoring information for the monitoring item;
A process of accumulating the score for each group of a plurality of the monitoring items;
A program for executing a determination process for determining the presence or absence of a failure of the monitored object by comparing the accumulated score and a threshold set for each group.

The program according to claim 21, wherein when the determination process determines that the failure is present, a notification process for executing a notification by a notification method set for each type of the group is executed.

The program according to claim 22, wherein different notification methods are set for each of a plurality of thresholds set in stages,
a notification method is selected in the determination process according to the threshold that the accumulated score exceeds, and the notification is executed in the notification process by the selected notification method.

The program according to claim 21, wherein the determination process includes a process of increasing or decreasing the threshold based on a predetermined correction coefficient when the determination of the presence or absence of a failure is an erroneous determination.

The program according to claim 24, wherein the determination process includes a process of correcting the threshold by adding a value calculated using a predetermined correction coefficient to the threshold when a determination that there is a failure is an erroneous determination.

The program according to claim 24, wherein the determination process includes a process of correcting the threshold by subtracting a value calculated using a predetermined correction coefficient from the threshold when a determination that there is no failure is an erroneous determination.

The program according to any one of claims 21 to 26, wherein a rule table is provided in which, for each monitoring item, a rule condition indicating whether a score is to be assigned, an accumulation target flag indicating whether the score is to be accumulated, a group name, and the score to be assigned are registered, and
in the process of assigning the score, when the monitoring information is acquired, the rule table is referred to and, when the value of the monitoring item indicated in the monitoring information satisfies the rule condition and the accumulation target flag indicates an accumulation target, the score registered in the rule table is assigned to the monitoring information.

Cumulative score storage means for recording the cumulative score for each group, the threshold value, the problem rank indicating the importance of the failure, and at least one notification method according to the problem rank for each group Provide a problem determination table,
In the process of accumulating the score, the score given to the monitoring information is accumulated in the accumulated score of the accumulated score storage means,
In the determination process, the cumulative score of the cumulative score storage means is compared with the threshold value of the problem determination table to determine whether or not there is a failure. The program according to any one of claims 22 to 27, wherein the notification method is selected according to the method.

The program, wherein, in the determination process, the presence or absence of the failure is determined by comparing the score accumulated during a predetermined set time with the threshold.

The program according to claim 24 or 25, wherein an update table is provided in which a correction coefficient for correcting the threshold is registered for each group, and
in the determination process:
when it has been determined that there is a failure, that determination is judged to be erroneous if monitoring information for a monitoring item included in the group determined to have the failure is not acquired during a predetermined waiting time, and a value calculated using the correction coefficient of the update table is added to the threshold; and
when it has been determined that there is no failure, that determination is judged to be erroneous if monitoring information for a monitoring item included in the group determined to have no failure is acquired, and a value calculated using the correction coefficient of the update table is subtracted from the threshold.