JP2008171104A

JP2008171104A - Monitoring apparatus, monitoring system, monitoring method and monitoring program for monitoring business service and system performance

Info

Publication number: JP2008171104A
Application number: JP2007002089A
Authority: JP
Inventors: Masahiro Ono; 允裕大野; Kiyoshi Kato; 清志加藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-01-10
Filing date: 2007-01-10
Publication date: 2008-07-24

Abstract

PROBLEM TO BE SOLVED: To always report alerts appropriately in a monitoring device that monitors metrics ranging from business service metrics to system performance, by eliminating discrepancies between the alert determination result for a business service metric and that for system performance.

SOLUTION: The physical influence relationship between a business service metric and system performance items is set as a metric map in a metric map storage unit 304. Based on an alert determination result for the business service metric that an alert determination means 102 has determined to be normal, and on the system performance values of the system performance items associated with that business service metric in the metric map, a threshold value derivation means 105 mechanically calculates, by regression analysis, the system performance value that holds when the alert determination result for the business service metric is normal, and stores the calculated result in a threshold value storage unit 303.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to monitoring technology for determining whether a business service problem or a system performance problem has occurred in a monitoring target device such as an information processing device. In particular, it relates to a monitoring device, a monitoring system, a monitoring method, and a monitoring program that eliminate discrepancies between the alert determination result for a business service metric and the alert determination result for system performance, so that alerts are reported appropriately.

A general monitoring device that determines whether business service and system performance problems have occurred in a monitoring target device, such as an information processing device, determines either whether a business service problem has occurred or whether a system performance problem has occurred.

For example, a general monitoring device that determines whether a system performance problem has occurred acquires from the monitoring target device the system performance values of a plurality of predetermined system performance items (CPU usage rate, memory usage rate, network usage rate, disk usage rate, continuous operation time, number of active processes, number of running processes, number of connected users, number of connections, and so on). It then compares each acquired system performance value with the threshold for that system performance item set by the system administrator to determine whether a system performance problem has occurred.

Similarly, a general monitoring device that determines whether a business service problem has occurred acquires from the monitoring target device the values of a plurality of predetermined business service metrics (response time at the user's terminal, frequency of application errors, number of timeouts, mean time to recovery, mean time between failures, business service availability, business service provision period, and so on). It then compares each acquired business service metric value with the threshold for that business service metric set by the business manager to determine whether a business service problem has occurred.

With the general monitoring devices described above, the system administrator and the business manager must set each threshold based on experience. This places a burden on both of them, and appropriate thresholds may not be set.

Monitoring devices for solving these problems have been proposed (see, for example, Patent Documents 1 and 2).

The conventional monitoring device described in Patent Document 1 determines whether a system performance problem has occurred. It calculates the threshold of a system performance item from the mean and standard deviation of that single item collected during a sampling period specified by the system administrator. For example, the CPU usage rate is measured for a certain period, the mean of the measurements is calculated, and the mean is used as the expected threshold for the CPU usage rate. The standard deviation is also calculated from the CPU usage measurements and is used as the error of the expected threshold.
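The calculation attributed to Patent Document 1 can be sketched as follows. This is a minimal illustration, not the implementation of that document: the function name `expected_threshold`, the sample values, and the choice of the population standard deviation are assumptions made here.

```python
from statistics import mean, pstdev


def expected_threshold(samples):
    """Return (threshold, error) for one system performance item.

    The threshold is the mean of the samples and the error is their
    standard deviation. Using the population standard deviation is an
    assumption; the text does not say which variant is meant.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return mean(samples), pstdev(samples)


# Hypothetical CPU usage samples (percent) from one sampling period.
cpu_usage = [40.0, 50.0, 60.0]
threshold, error = expected_threshold(cpu_usage)
print(f"expected threshold: {threshold:.1f}%, error: {error:.2f}")
```

Because only one item is examined at a time, a threshold derived this way carries no information about any business service metric, which is the limitation the description goes on to discuss.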

The conventional monitoring device described in Patent Document 2 determines whether a business service problem has occurred. Based on the execution times recorded so far for a single business service, it calculates the expected execution time of that business service as a threshold. For example, the response time at the user's terminal is measured for a certain period, the mean of the measurements is calculated, and the mean is used as the threshold for the response time.

Patent Document 1: JP 2003-263342 A
Patent Document 2: JP 2004-362140 A

With the conventional monitoring devices described in Patent Documents 1 and 2, or a device combining them, the thresholds of system performance items and the thresholds of business service metrics can each be set automatically. However, the system performance thresholds are set automatically from the system administrator's viewpoint, and the business service metric thresholds are set automatically from the business manager's viewpoint, so the two sets of thresholds cannot be made consistent with each other. This is because the former are derived only from measurements of the system performance item in question, and the latter only from measurements of the business service metric.

When the system performance items and the business service metrics are not consistent, the alert determination result for a business service metric may differ from the alert determination result for system performance. Specifically, an alert may be raised for a system performance item but not for the business service metric, or an alert may be raised for the business service metric but not for the system performance item.

As a result, it is unclear which alert determination result is correct, which means that alerts are not being reported appropriately. If alerts are not reported appropriately, the system administrator cannot grasp system performance problems, and the business manager has difficulty identifying business service problems.

Accordingly, an object of the present invention is to provide a monitoring device, a monitoring system, a monitoring method, and a monitoring program that monitor metrics ranging from business service metrics to system performance, and that eliminate discrepancies between the alert determination result for a business service metric and the alert determination result for system performance so that alerts can be reported appropriately.

A monitoring device according to the present invention comprises alert determination means that stores, in a metric value storage unit, the system performance values or business service metric values sent from metric value acquisition means, which acquires one or more system performance values and one or more business service metric values of a monitoring target device in accordance with metric definition information stored in a metric definition information storage unit, and that determines whether system performance and business service problems have occurred in the monitoring target device in accordance with the business service metric thresholds and system performance item thresholds stored in a threshold value storage unit. The monitoring device is characterized by further comprising: a metric map storage unit that stores a metric map indicating the relationship between one business service metric and one or more system performance items; and threshold value derivation means that, based on an alert determination result for the business service metric determined to be normal by the alert determination means, the system performance items associated with that business service metric in the metric map, and the system performance values of those items, calculates by regression analysis the system performance value that holds when the alert determination result for the business service metric is normal, and stores the calculated result in the threshold value storage unit.

The monitoring device may further comprise: a definition information storage unit that stores a positive prediction ratio, which indicates how often the alert determination result for system performance is also normal when the alert determination result for the business service metric is normal, and a negative prediction ratio, which indicates how often the alert determination result for system performance is also abnormal when the alert determination result for the business service metric is abnormal; and threshold value change determination means that determines, in accordance with the positive prediction ratio and the negative prediction ratio stored in the definition information storage unit, whether to change the thresholds of the system performance items stored in the threshold value storage unit.

The monitoring device may be configured so that the threshold value derivation means displays the calculated threshold on the screen of a management terminal and, when setting of the new threshold is instructed on the screen, stores the calculated threshold in the threshold value storage unit as the new threshold.

A monitoring system according to the present invention comprises a plurality of monitoring target devices, a monitoring device communicably connected to the plurality of monitoring target devices, a management terminal communicably connected to the monitoring device, and a storage device communicably connected to the monitoring device and the management terminal. The storage device includes: a metric definition information storage unit that stores metric definition information; a metric value storage unit that stores one or more system performance values and one or more business service metric values of the monitoring target devices; a threshold value storage unit that stores the thresholds of system performance items and the thresholds of business service metrics; and a metric map storage unit that stores a metric map indicating the relationship between one business service metric and one or more system performance items. The monitoring device includes: metric value acquisition means that acquires one or more system performance values and one or more business service metric values of the monitoring target devices in accordance with the metric definition information stored in the metric definition information storage unit; alert determination means that stores the system performance values or business service metric values sent from the metric value acquisition means in the metric value storage unit and determines whether system performance and business service problems have occurred in the monitoring target devices; a definition information storage unit that stores a positive prediction ratio, which indicates how often the alert determination result for system performance is also normal when the alert determination result for the business service metric is normal, and a negative prediction ratio, which indicates how often the alert determination result for system performance is also abnormal when the alert determination result for the business service metric is abnormal; threshold value change determination means that determines whether to change the thresholds of the system performance items stored in the threshold value storage unit; and threshold value derivation means that, based on an alert determination result for the business service metric determined to be normal by the alert determination means, the system performance items associated with that business service metric in the metric map, and the system performance values of those items, calculates by regression analysis the system performance value that holds when the alert determination result for the business service metric is normal, and stores the calculated result in the threshold value storage unit.

In the monitoring system, the threshold value derivation means may be configured to send the calculated threshold to the management terminal for display on its screen and, when setting of the threshold is instructed on the screen, to store that threshold in the threshold value storage unit as the new threshold.

A monitoring method according to the present invention is a method of monitoring monitoring target devices by a computer communicably connected to a plurality of monitoring target devices. The computer determines, in accordance with business service metric thresholds and system performance item thresholds, whether system performance and business service problems have occurred in a monitoring target device, and stores the alert determination result in a storage unit. Based on an alert determination result for a business service metric stored in the storage unit as normal, the system performance items associated with that business service metric, and the system performance values of those items, the computer calculates by regression analysis the system performance value that holds when the alert determination result for the business service metric is normal, stores the calculated result, and outputs it to a management terminal. In accordance with an instruction from the management terminal, the computer sets the new threshold output to the management terminal as the threshold of the system performance item stored in the storage unit. The computer also receives and stores the system performance values and business service metric values sent from the monitoring target devices.

When the system performance value is calculated by regression analysis, the method may determine whether to change the threshold of the system performance item in accordance with a positive prediction ratio and a negative prediction ratio stored in the storage unit. The positive prediction ratio indicates how often the alert determination result for system performance is also normal when the alert determination result for the business service metric is normal, and the negative prediction ratio indicates how often the alert determination result for system performance is also abnormal when the alert determination result for the business service metric is abnormal.

In a monitoring target device whose metrics, from business service metrics to system performance, are monitored, a given business service metric and a given system performance item are subject to the same physical influences if they are measurement items on the physically same monitoring device. Focusing on this point, the present invention sets the physical influence relationship between a business service metric and system performance items as a metric map in advance. Based on this metric map, an alert determination result for the business service metric that was determined to be normal, and the system performance values of the system performance items associated with the business service metric in the metric map, the system performance value that holds when the alert determination result for the business service metric is normal is mechanically calculated by regression analysis, and the calculated result is stored in the threshold value storage unit.

According to the present invention, when a monitoring target device is monitored using metrics ranging from business service metrics to system performance, discrepancies between the alert determination result for a business service metric and the alert determination result for system performance are eliminated, and alerts can be reported appropriately. The reason is that, when the two alert determination results differ, the system performance value that holds when the alert determination result for the business service metric is normal is calculated by regression analysis, based on the alert determination result for the business service metric that was determined to be normal and on the system performance values of the system performance items associated with that business service metric by the metric map. The calculated result is stored in the threshold value storage unit and becomes the new threshold for the system performance item.

Embodiment 1.
FIG. 1 is a block diagram showing a configuration example of a first embodiment (Embodiment 1) of a monitoring system according to the present invention. The monitoring system shown in FIG. 1 monitors a monitoring target device 2 on which a business application is running. It includes a monitoring device 1 having a CPU, memory, network connection equipment, and the like; a storage device 3 such as a disk device; and a management terminal 4 having an input device such as a console and an output device such as a display. The management terminal 4 is communicably connected to the monitoring device 1, and the storage device 3 is communicably connected to the monitoring device 1 and the management terminal 4.

The storage device 3 includes a metric definition information storage unit 301, a metric value storage unit 302, a threshold value storage unit 303, and a metric map storage unit 304. The metric definition information storage unit 301 stores information on the metric definitions used to collect system performance values and business service metric values from the monitoring target device 2. Each piece of metric definition information stored in the metric definition information storage unit 301 includes information identifying the monitoring target device 2 to be monitored and information identifying the system performance item or business service metric. The stored information may also include other information, such as information indicating the acquisition interval.

FIG. 2 shows an example of the metric definition information stored in the metric definition information storage unit 301. For example, the metric definition information in the first row, identified by Monitor_001, indicates that the CPU usage rate is acquired at 30-second intervals from the monitoring target device 2 identified by Dev_001.
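One way to hold a row like the FIG. 2 example in memory is sketched below. The class name `MetricDefinition` and its field names are hypothetical; only the values Monitor_001, Dev_001, CPU usage rate, and 30 seconds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One row of metric definition information (field names are assumed)."""
    definition_id: str      # e.g. "Monitor_001"
    device_id: str          # identifier of the monitoring target device
    metric_item: str        # system performance item or business service metric
    interval_seconds: int   # acquisition interval


# The first row described for FIG. 2.
rows = [MetricDefinition("Monitor_001", "Dev_001", "CPU usage rate", 30)]

# Index the rows by identifier so that other tables can refer to them.
definitions = {row.definition_id: row for row in rows}
print(definitions["Monitor_001"])
```

Keying the definitions by identifier matches how the threshold information and the metric map refer to a definition by its identifier.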

The metric value storage unit 302 stores metric value information, such as the system performance values or business service metric values that the monitoring device 1 has acquired from the monitoring target device 2. Each piece of metric value information stored in the metric value storage unit 302 includes information identifying the monitoring target device 2 being monitored, information identifying the system performance item or business service metric, information indicating the metric value that the monitoring device 1 acquired from the monitoring target device 2, and information indicating the acquisition time.

FIG. 3 shows an example of the metric value information stored in the metric value storage unit 302. In this example, each piece of metric value information is stored as the acquired metric value together with the identifier of the monitoring target device 2 it was acquired from, the metric item acquired, and the acquisition time.

The threshold value storage unit 303 stores threshold information for system performance items and threshold information for business service metrics. Each piece of threshold information stored in the threshold value storage unit 303 includes information identifying a piece of metric definition information stored in the metric definition information storage unit 301 and information indicating its threshold.

FIG. 4 shows an example of the threshold information stored in the threshold value storage unit 303. For example, the threshold information in the first row, identified by threshold_001, indicates that the threshold for Monitor_001, which is an identifier of metric definition information, is 70%. The metric map storage unit 304 stores metric maps, each indicating the relationship between one business service metric and one or more system performance items.

Each metric map stored in the metric map storage unit 304 includes information identifying one business service metric and information identifying one or more system performance items related to that business service metric.

FIG. 5 shows an example of the metric maps stored in the metric map storage unit 304. For example, the metric map in the first row, identified by Map_001, indicates that the business service metric identified by Monitor_002 is associated with the system performance item identified by Monitor_001 and the system performance item identified by Monitor_003.
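The metric map of the FIG. 5 example can be pictured as a small lookup table. The sketch below is only an illustration: the dictionary layout, the key names, and the helper `system_items_for` are assumptions, while the identifiers Map_001, Monitor_001, Monitor_002, and Monitor_003 come from the text.

```python
# One metric map entry: one business service metric and the system
# performance items associated with it (layout and key names are assumed).
metric_map = {
    "Map_001": {
        "business_metric": "Monitor_002",
        "system_items": ["Monitor_001", "Monitor_003"],
    },
}


def system_items_for(business_metric_id, metric_map):
    """Return the system performance items mapped to a business service metric."""
    items = []
    for entry in metric_map.values():
        if entry["business_metric"] == business_metric_id:
            items.extend(entry["system_items"])
    return items


print(system_items_for("Monitor_002", metric_map))
```

A lookup in this direction is what the threshold value change determination and threshold value derivation steps need: given a business service metric, find the system performance items whose thresholds may have to be revised.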

The monitoring device 1 includes metric value acquisition means 101, alert determination means 102, a definition information storage unit 103, threshold value change determination means 104, and threshold value derivation means 105. The metric value acquisition means 101, the alert determination means 102, the threshold value change determination means 104, and the threshold value derivation means 105 can be implemented as hardware circuits, but they can also be implemented by peripheral circuits and a CPU that executes processing according to programs for realizing these means. In other words, the monitoring device 1 can be implemented by a computer.

The metric value acquisition means 101 acquires system performance values and business service metric values in accordance with the metric definition information for the monitoring target device 2 stored in the metric definition information storage unit 301. The metric value acquisition means 101 also passes the acquired system performance values and business service metric values, together with information indicating the acquisition time, to the alert determination means 102.

The alert determination means 102 determines whether system performance and business service problems have occurred in the monitoring target device 2, based on the system performance values and business service metric values acquired by the metric value acquisition means 101 and the thresholds stored in the threshold value storage unit 303, and passes the alert determination result to the threshold value change determination means 104. The alert determination means 102 also stores, in the metric value storage unit 302, the system performance values and business service metric values acquired by the metric value acquisition means 101, the information indicating the acquisition time, and the alert determination results.
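The behaviour described for the alert determination means 102 can be sketched as below. The function names, the record layout, the sample values, and the timestamp are hypothetical. The comparison direction (a value above the threshold is abnormal) is also an assumption that suits items such as a CPU usage rate; the text only says that values are compared with thresholds.

```python
from datetime import datetime, timezone

# Threshold for Monitor_001 as in the FIG. 4 example (70%).
thresholds = {"Monitor_001": 70.0}


def judge_alert(definition_id, value, thresholds):
    """Return 'abnormal' if the value exceeds its threshold, else 'normal'.

    Assumption: larger values are worse, as for a CPU usage rate.
    """
    return "abnormal" if value > thresholds[definition_id] else "normal"


def record_measurement(store, definition_id, value, acquired_at, thresholds):
    """Judge one measurement and append it, with its result, to the store."""
    result = judge_alert(definition_id, value, thresholds)
    store.append({
        "definition_id": definition_id,
        "value": value,
        "acquired_at": acquired_at,
        "result": result,
    })
    return result


metric_value_store = []  # stands in for the metric value storage unit 302
now = datetime(2007, 1, 10, 9, 0, tzinfo=timezone.utc)  # hypothetical time
record_measurement(metric_value_store, "Monitor_001", 65.0, now, thresholds)
record_measurement(metric_value_store, "Monitor_001", 82.5, now, thresholds)
print([r["result"] for r in metric_value_store])
```

Storing the determination result next to each value is what later allows the measurements taken while the business service metric was normal to be selected.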

The definition information storage unit 103 stores a positive prediction ratio and a negative prediction ratio. The positive prediction ratio indicates how often the alert determination result for system performance is also normal when the alert determination result for the business service metric is normal. The negative prediction ratio indicates how often the alert determination result for system performance is also abnormal when the alert determination result for the business service metric is abnormal. For example, a positive prediction ratio of 90% and a negative prediction ratio of 85% are stored.

The threshold value change determination means 104 refers to the metric map stored in the metric map storage unit 304 and calculates a positive prediction ratio and a negative prediction ratio based on the alert determination results for the business service metric determined by the alert determination means 102 and the alert determination results for the system performance items associated with that business service metric. It then compares the calculated positive prediction ratio with the positive prediction ratio stored in the definition information storage unit 103 and activates the threshold value derivation means 105 if the calculated ratio is smaller than the stored ratio. It likewise compares the calculated negative prediction ratio with the negative prediction ratio stored in the definition information storage unit 103 and activates the threshold value derivation means 105 if the calculated ratio is smaller than the stored ratio. For example, if the calculated positive prediction ratio is 80% and the positive prediction ratio stored in the definition information storage unit 103 is 90%, the calculated ratio is smaller, so the threshold value derivation means 105 is activated.
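The check made by the threshold value change determination means 104 can be sketched as follows. The text does not spell out the denominators of the two ratios. This sketch assumes that the positive prediction ratio is taken over the observations where the business service metric was normal and the negative prediction ratio over those where it was abnormal. The function names and sample data are hypothetical; the stored targets of 90% and 85% and the 80% example come from the text.

```python
def prediction_ratios(pairs):
    """Compute (positive, negative) prediction ratios.

    pairs: list of (business_result, system_result), each 'normal' or
    'abnormal'. A ratio is None when it has no observations.
    """
    when_normal = [s for b, s in pairs if b == "normal"]
    when_abnormal = [s for b, s in pairs if b == "abnormal"]
    positive = (when_normal.count("normal") / len(when_normal)
                if when_normal else None)
    negative = (when_abnormal.count("abnormal") / len(when_abnormal)
                if when_abnormal else None)
    return positive, negative


def should_derive_threshold(pairs, stored_positive=0.90, stored_negative=0.85):
    """Return True if either calculated ratio is below its stored ratio."""
    positive, negative = prediction_ratios(pairs)
    return ((positive is not None and positive < stored_positive)
            or (negative is not None and negative < stored_negative))


# Hypothetical history: 8 of 10 business-normal observations agree (80%).
history = ([("normal", "normal")] * 8 + [("normal", "abnormal")] * 2
           + [("abnormal", "abnormal")] * 5)
print(prediction_ratios(history), should_derive_threshold(history))
```

With this history the calculated positive prediction ratio is 80%, below the stored 90%, so the derivation step would be triggered, as in the example given in the text.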

Referring to the metric map stored in the metric map storage unit 304, the threshold value derivation means 105 calculates a new threshold value for a system performance item based on the alert determination result of the business service metric stored in the metric value storage unit 302 and on the system performance value, acquired by the metric value acquisition means 101, of the system performance item associated with that business service metric. It stores the calculated threshold value in the threshold value storage unit 303.

Specifically, the threshold value derivation means 105 refers to the business service metric in the metric map stored in the metric map storage unit 304 and to the system performance items associated with that business service metric. It then performs a multivariate regression analysis with the alert determination result of the business service metric as the explanatory variable and the system performance value of the system performance item as the objective variable, and thereby calculates the system performance value of the system performance item for the case where the alert determination result of the business service metric is normal.
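
As a rough illustration of this derivation, the sketch below fits an ordinary least-squares line with a single explanatory variable and evaluates it at the "normal" value. It is a simplification of the multivariate regression named in the text, and the encoding (0 = normal, 1 = abnormal), the function names, and the sample data are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("explanatory variable has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def derive_threshold(alert_results, performance_values):
    """Predict the system performance value when the business metric is normal.

    alert_results: 0 for normal, 1 for abnormal (assumed encoding).
    """
    intercept, slope = fit_line(alert_results, performance_values)
    return intercept + slope * 0  # evaluate the fitted line at "normal"


# Hypothetical history: business alert result and CPU usage (%) per sample.
alerts = [0, 0, 0, 1, 1]
cpu_usage = [40.0, 50.0, 60.0, 90.0, 100.0]
predicted_threshold = derive_threshold(alerts, cpu_usage)
```

With a single binary explanatory variable, the fitted value at "normal" equals the mean of the performance values observed while the business service metric was normal; a real deployment would likely use more explanatory variables and a statistics library.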

The threshold value derivation means 105 also transmits the metric definition information, metric value information, threshold information, and metric map stored in the storage device 3 to the management terminal 4. The management terminal 4 displays the metric definition information, metric value information, threshold information, and metric map received from the threshold value derivation means 105 on a screen, for example as shown in FIG. 6.

In this example, area 41 displays the metric definition information whose identifier is Monitor_001. Area 42 lists, in time series and as one set per entry, the acquisition time for the metric definition information whose identifier is Monitor_001, the metric value at that time, the threshold value stored in the threshold value storage unit 303, and the threshold value calculated by the threshold value derivation means 105 (the predicted threshold value).

When the button 43 displayed on the management terminal 4 is pressed, the management terminal 4 transmits information to that effect to the monitoring device 1. On receiving that information, the threshold value derivation means 105 updates the threshold value stored in the threshold value storage unit 303 with the threshold value it calculated (the predicted threshold value). In other words, the threshold value derivation means 105 stores the calculated threshold value in the threshold value storage unit 303 when the setting of a new threshold value is instructed on the screen.
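
The confirm-then-update behaviour can be pictured with the small sketch below. The class `ThresholdStore` and its method names are hypothetical; they only illustrate that a predicted threshold value replaces the stored one after an explicit instruction from the screen.

```python
class ThresholdStore:
    """Hypothetical stand-in for the threshold value storage unit 303."""

    def __init__(self, current):
        self.current = dict(current)   # thresholds in effect
        self.predicted = {}            # predicted thresholds awaiting approval

    def propose(self, item_id, value):
        """Record a predicted threshold without changing the one in effect."""
        self.predicted[item_id] = value

    def confirm(self, item_id):
        """Called when the operator presses the button on the screen."""
        self.current[item_id] = self.predicted.pop(item_id)


store = ThresholdStore({"cpu_usage": 80.0})
store.propose("cpu_usage", 72.5)   # shown on screen as the predicted threshold
store.confirm("cpu_usage")         # button pressed: stored threshold updated
```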

Next, the overall operation of the present embodiment will be described.
To eliminate the difference between the alert determination result of the business service metric and the alert determination result of the system performance, the system administrator or the business manager may set a new threshold value for the relevant system performance item or review the threshold value already set for a system performance item. In either case, before starting the monitoring device 1, the system administrator or the business manager stores the positive prediction ratio and the negative prediction ratio in the definition information storage unit 103 through the management terminal 4.

The system administrator or the business manager also uses the management terminal 4 to check the metric definition information stored in the metric definition information storage unit 301, the threshold information stored in the threshold value storage unit 303, and the metric map stored in the metric map storage unit 304, and adds, changes, or deletes metric definition information, threshold information, and metric maps as necessary.

When started by the system administrator or the business manager, the monitoring device 1 begins the processing shown in FIG. 7. The initial values of set A, set B, set C, and the variables S, T, U, V, X, and Y are assumed to be zero. A separate set of the variables S, T, U, and V is prepared for each system performance item ID.

First, the metric value acquisition means 101 of the monitoring device 1 acquires system performance values and business service metric values from the monitoring target device 2, and the alert determination means 102 determines whether a problem has occurred in the system performance and the business service of the monitoring target device 2. The alert determination means 102 compares each acquired value with the threshold value stored in the threshold value storage unit 303 and regards a problem as having occurred when the acquired value exceeds the threshold value (step S101).
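
Step S101 amounts to a per-metric comparison against a stored threshold. The following minimal sketch assumes hypothetical metric names and values; only the "exceeds the threshold" rule comes from the text.

```python
def is_problem(value, threshold):
    """A problem is flagged when the acquired value exceeds the threshold."""
    return value > threshold


# Hypothetical thresholds and acquired values for one monitoring cycle.
thresholds = {"cpu_usage": 80.0, "response_time_ms": 500.0}
acquired = {"cpu_usage": 92.5, "response_time_ms": 310.0}

alerts = {name: is_problem(value, thresholds[name])
          for name, value in acquired.items()}
```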

Next, the threshold value change determination means 104 stores the metric maps held in the metric map storage unit 304 in set A (step S102) and takes one metric map out of set A (step S103). If a metric map is successfully taken out of set A, the process proceeds to step S105; if not, the process returns to step S101 (step S104).

In step S105, the threshold value change determination means 104 stores in set B the system performance item IDs associated with the one business service metric in the metric map that was taken out. It also treats S, T, U, and V as one variable set per system performance item ID and stores in set C the variable sets for the system performance item IDs stored in set B. If set B contains a system performance item ID, the process proceeds to step S107; otherwise it returns to step S103 (step S106).
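
The traversal of sets A, B, and C in steps S102 to S107 can be sketched as a pair of nested loops. The dictionary layout of a metric map, the metric names, and the generator `walk_metric_maps` are assumptions chosen for illustration.

```python
def new_variable_set():
    """One S/T/U/V variable set, initialised to zero."""
    return {"S": 0, "T": 0, "U": 0, "V": 0}


def walk_metric_maps(metric_map_store, counters):
    """Yield (business_metric, item_id, variable_set) in flowchart order.

    counters keeps one variable set per system performance item ID so that
    counts persist across monitoring cycles.
    """
    set_a = list(metric_map_store)                      # step S102
    while set_a:                                        # steps S103 and S104
        metric_map = set_a.pop(0)
        set_b = list(metric_map["system_items"])        # step S105
        set_c = {item_id: counters.setdefault(item_id, new_variable_set())
                 for item_id in set_b}
        while set_b:                                    # step S106
            item_id = set_b.pop(0)                      # step S107
            yield metric_map["business_metric"], item_id, set_c.pop(item_id)


# Hypothetical metric maps: one business metric, one or more system items.
metric_map_store = [
    {"business_metric": "order_response_time",
     "system_items": ["cpu_usage", "disk_io_wait"]},
    {"business_metric": "login_success_rate",
     "system_items": ["memory_usage"]},
]
counters = {}
visited = [(b, i) for b, i, _ in walk_metric_maps(metric_map_store, counters)]
```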

In step S107, the threshold value change determination means 104 takes one system performance item ID out of set B. It also takes out of set C the variable set of S, T, U, and V corresponding to that system performance item ID. If a problem has occurred in the business service metric associated with the system performance item ID taken out in step S107, the process proceeds to step S109; otherwise it proceeds to step S110 (step S108).

In step S109, the threshold value change determination means 104 proceeds to step S111 if a problem has occurred in the system performance item taken out in step S107 and to step S112 if not. In step S111, the threshold value change determination means 104 adds 1 to S of the variable set taken out in step S107 and proceeds to step S115. In step S112, it adds 1 to T of the variable set taken out in step S107 and proceeds to step S115.

In step S110, the threshold value change determination means 104 proceeds to step S113 if a problem has occurred in the system performance item taken out in step S107 and to step S114 if not. In step S113, it adds 1 to U of the variable set taken out in step S107 and proceeds to step S115. In step S114, it adds 1 to V of the variable set taken out in step S107 and proceeds to step S115.
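
Steps S108 to S114 together increment exactly one of four counters per observation. A compact sketch, with the hypothetical helper `update_counters` and boolean flags standing in for the two alert determination results:

```python
def update_counters(variable_set, business_problem, system_problem):
    """Increment S, T, U, or V according to steps S108 to S114."""
    if business_problem:                       # step S108 -> S109
        key = "S" if system_problem else "T"   # S111 or S112
    else:                                      # step S108 -> S110
        key = "U" if system_problem else "V"   # S113 or S114
    variable_set[key] += 1
    return variable_set


variable_set = {"S": 0, "T": 0, "U": 0, "V": 0}
update_counters(variable_set, business_problem=True, system_problem=True)    # S
update_counters(variable_set, business_problem=False, system_problem=False)  # V
```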

In step S115, the threshold value change determination means 104 calculates the positive prediction ratio and the negative prediction ratio using the variable set taken out in step S107, as updated by the additions in steps S111 to S114. The positive prediction ratio is denoted X and is set to S divided by the sum of S and U. The negative prediction ratio is denoted Y and is set to V divided by the sum of T and V.
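
The two ratios of step S115 are X = S / (S + U) and Y = V / (T + V). The sketch below computes them; returning `None` when a denominator is zero is an assumption of this example, since the text does not say how that case is handled, and the counts are hypothetical.

```python
def prediction_ratios(s, t, u, v):
    """Return (X, Y) as defined in step S115.

    X = S / (S + U), Y = V / (T + V). A ratio is None when its denominator
    is zero (assumed handling, not specified in the description).
    """
    x = s / (s + u) if (s + u) else None
    y = v / (t + v) if (t + v) else None
    return x, y


# Hypothetical counts after 20 observations.
x, y = prediction_ratios(s=8, t=1, u=2, v=9)
```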

Next, the threshold value change determination means 104 stores the positive prediction ratio held in the definition information storage unit 103 in X' and the negative prediction ratio held in the definition information storage unit 103 in Y' (step S116). It then compares X with X'; if X is larger than X', the process proceeds to step S120, and if X is smaller, the process proceeds to step S118. Alternatively, it compares Y with Y'; if Y is larger than Y', the process proceeds to step S120, and if Y is smaller, the process proceeds to step S118 (step S117).

In step S118, the threshold value derivation means 105 newly calculates the threshold value for the system performance item ID taken out in step S107 and stores the calculated threshold value in the threshold value storage unit 303.

Next, the threshold value derivation means 105 sets all of S, T, U, and V in the variable set corresponding to the system performance item ID for which the threshold value was calculated to zero and proceeds to step S120 (step S119). In step S120, the variable set of S, T, U, and V corresponding to the system performance item ID taken out in step S107 is stored in set C, and the process returns to step S106.
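
Steps S117 to S120 for a single system performance item can be combined into one routine, sketched below. The function `finish_item`, the `derive` callback standing in for the threshold value derivation means 105, and the sample numbers are hypothetical; the case where a calculated ratio equals the stored ratio is not specified in the text and is treated here as "no change".

```python
def finish_item(item_id, x, y, stored_x, stored_y,
                variable_set, set_c, thresholds, derive):
    """Apply steps S117 to S120 to one system performance item."""
    if x < stored_x or y < stored_y:              # step S117
        thresholds[item_id] = derive(item_id)     # step S118
        for name in ("S", "T", "U", "V"):         # step S119: reset counts
            variable_set[name] = 0
    set_c[item_id] = variable_set                 # step S120
    return thresholds


thresholds = {"cpu_usage": 80.0}
set_c = {}
variable_set = {"S": 8, "T": 1, "U": 2, "V": 9}
finish_item("cpu_usage", 0.8, 0.9, 0.9, 0.85,
            variable_set, set_c, thresholds,
            derive=lambda item_id: 70.0)          # placeholder derivation
```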

In the present embodiment, the difference between the alert determination result of the business service metric and the alert determination result of the system performance is eliminated, so that alerts can always be reported appropriately. This is because a new threshold value can be set for any system performance item whose alert determination result differs from that of the business service metric.

Furthermore, in the present embodiment, when a new threshold value is to be set or the threshold value already set for a system performance item is to be reviewed, the positive prediction ratio or the negative prediction ratio stored in the definition information storage unit 103 can be set to a very large value. Doing so forces the process to advance from step S117 to step S118, so that the threshold value already set for the system performance item is reviewed. This is effective when a new business service metric or system performance item has been set and it is clear that the already-set system performance items must be reviewed.

Embodiment 2.
Next, a second embodiment (Embodiment 2), which is a modification of the first embodiment, will be described. FIG. 8 is an explanatory diagram showing how operation and maintenance consulting is provided using the second embodiment.

The form shown in FIG. 8(A) consists of a customer system 10, an operation and maintenance system 20, and an operation and maintenance consulting system 30. The customer system 10 and the operation and maintenance system 20 are connected via a communication network 40. The operation and maintenance system 20 and the operation and maintenance consulting system 30 are also connected via the communication network 40. Data transmitted and received between the systems may be encrypted.

The customer 200 owns the customer system 10. The party that provides the operation and maintenance service (hereinafter, the operation and maintenance service provider) 100 owns the operation and maintenance system 20. The party that provides consulting on operation and maintenance (hereinafter, the operation and maintenance consulting provider) 300 owns the operation and maintenance consulting system 30. As shown in FIG. 8(B), the customer receives an operation and maintenance service for the customer system through the operation and maintenance system 20. In the operation and maintenance service, when a system performance or business service problem has occurred, the customer system 10 is notified that the problem has occurred. The customer pays a fee for the operation and maintenance service.

As shown in FIG. 8(B), the operation and maintenance service provider provides the operation and maintenance service to the customer. The operation and maintenance service provider also receives, through the operation and maintenance consulting system 30, an operation and maintenance consulting service for improving the operation and maintenance service. In the operation and maintenance consulting service, new threshold information is provided for system performance items whose alert determination result differs from that of the business service metric. The operation and maintenance service provider pays a fee for the operation and maintenance consulting service.

The operation and maintenance consulting provider provides the operation and maintenance consulting service to the operation and maintenance service provider.

FIG. 9 is a block diagram showing a configuration example of the second embodiment of the monitoring system. Components identical to those of the first embodiment are given the same reference numerals as in FIG. 1, and their description is omitted.

The customer system 10 includes the monitoring target device 2 and a management terminal 4a. The management terminal 4a displays on its screen the metric definition information stored in the metric definition information storage unit 301 of the storage device 3a, the threshold information stored in the threshold value storage unit 303, and the alert determination result transmitted from the alert determination means 102.

The operation and maintenance system 20 includes a monitoring device 1a, a storage device 3a, and a management terminal 4b. The monitoring device 1a includes the metric value acquisition means 101 and the alert determination means 102. The alert determination means 102 performs a further operation in addition to the operations of the first embodiment. The storage device 3a includes the metric definition information storage unit 301 and the threshold value storage unit 303.

The management terminal 4b displays on its screen the metric definition information stored in the metric definition information storage unit 301 of the storage device 3a and the threshold information stored in its threshold value storage unit 303, together with the metric value information stored in the metric value storage unit 302 of the storage device 3b, the threshold information stored in the threshold value storage unit 303 of the storage device 3b, and the metric map stored in the metric map storage unit 304 of the storage device 3b.

The operation and maintenance consulting system 30 includes a monitoring device 1b, a storage device 3b, and a management terminal 4c. The monitoring device 1b includes the definition information storage unit 103, the threshold value change determination means 104, and the threshold value derivation means 105. The storage device 3b includes the metric value storage unit 302, the threshold value storage unit 303, and the metric map storage unit 304.

The management terminal 4c displays on its screen the metric value information stored in the metric value storage unit 302 of the storage device 3b, the threshold information stored in the threshold value storage unit 303 of the storage device 3b, and the metric map stored in the metric map storage unit 304 of the storage device 3b.

The metric value acquisition means 101, the alert determination means 102, the definition information storage unit 103, the threshold value change determination means 104, and the threshold value derivation means 105 each operate in the same way as in the first embodiment.

As in the first embodiment, the alert determination means 102 determines whether a system performance or business service problem has occurred in the monitoring target device 2 based on the system performance values and business service metric values acquired by the metric value acquisition means 101 and the threshold values stored in the threshold value storage unit 303 of the storage device 3a. In the present embodiment, it passes the alert determination result to the threshold value change determination means 104 and additionally to the management terminal 4a in the customer system 10.

In addition to the effects of the first embodiment, the present embodiment has the effect that a single operation and maintenance consulting system can provide the operation and maintenance consulting service to a plurality of operation and maintenance systems. This is because, in the present embodiment, the monitoring system is divided into the monitoring device 1a, which provides the operation and maintenance service, and the monitoring device 1b, which provides the operation and maintenance consulting service.

The present invention is not limited to the embodiments described above, and various other additions and changes are possible. For example, the number of monitoring target devices 2 is not limited to one; there may be a plurality of them. In addition, the monitoring device 1 need not be a computer physically separate from the monitoring target device 2; a computer constituting any one of the monitoring target devices 2 can also be used as the monitoring device 1.

The functions of the monitoring device 1 of the present invention can be implemented in hardware, and they can also be implemented by a computer and a program. The program is provided recorded on a computer-readable storage medium such as a magnetic disk or a semiconductor memory, is read by the computer at startup or a similar time, and controls the operation of the computer so that the computer functions as the monitoring device 1 of the embodiments described above and executes the processing described above.

The present invention is useful as a device and method for monitoring information processing systems that can exchange and share data via the Internet. It is particularly well suited to operation and maintenance consulting that supports an operation and maintenance service for monitoring target devices whose business services and system performance are monitored.

FIG. 1 is a block diagram showing a configuration example of the first embodiment of the monitoring device according to the present invention.
FIG. 2 is an explanatory diagram showing an example of the metric definition information stored in the metric definition information storage unit in the first embodiment.
FIG. 3 is an explanatory diagram showing an example of the metric value information stored in the metric value storage unit in the first embodiment.
FIG. 4 is an explanatory diagram showing an example of the threshold information stored in the threshold value storage unit in the first embodiment.
FIG. 5 is an explanatory diagram showing an example of the metric map stored in the metric map storage unit in the first embodiment.
FIG. 6 is an explanatory diagram showing an example of the display screen of the management terminal in the first embodiment.
FIG. 7 is a flowchart showing a processing example of the monitoring device of the first embodiment.
FIG. 8 is an explanatory diagram showing how operation and maintenance consulting is provided using the second embodiment.
FIG. 9 is a block diagram showing a configuration example of the second embodiment of the monitoring device according to the present invention.

Explanation of symbols

1, 1a, 1b Monitoring device
101 Metric value acquisition means
102 Alert determination means
103 Definition information storage unit
104 Threshold value change determination means
105 Threshold value derivation means
2 Monitoring target device
3, 3a, 3b Storage device
301 Metric definition information storage unit
302 Metric value storage unit
303 Threshold value storage unit
304 Metric map storage unit
4, 4a, 4b, 4c Management terminal
10 Customer system
20 Operation and maintenance system
30 Operation and maintenance consulting system
40 Communication network

Claims

A monitoring device provided with metric value acquisition means for acquiring one or more system performance values and one or more business service metric values of a monitoring target device according to metric definition information stored in a metric definition information storage unit, and alert determination means for storing the system performance value or business service metric value transmitted from the metric value acquisition means in a metric value storage unit and for determining whether a system performance and business service problem has occurred in the monitoring target device according to the business service metric threshold value and the system performance item threshold value stored in a threshold value storage unit, the monitoring device comprising:
a metric map storage unit for storing a metric map indicating a relationship between one business service metric and one or more system performance items; and
threshold value derivation means for calculating, by regression analysis, the system performance value for the case where the alert determination result of the business service metric is normal, based on the alert determination result of the business service metric determined to be normal by the alert determination means, the system performance item associated with the business service metric in the metric map, and the system performance value of that system performance item, and for storing the calculation result in the threshold value storage unit.

The monitoring device according to claim 1, further comprising:
a definition information storage unit that stores a positive prediction ratio indicating how often the alert determination result of the business service metric is normal and the alert determination result of the system performance is also normal, and a negative prediction ratio indicating how often the alert determination result of the business service metric is abnormal and the alert determination result of the system performance is also abnormal; and
threshold value change determination means for determining whether to change the threshold value of the system performance item stored in the threshold value storage unit according to the positive prediction ratio and the negative prediction ratio stored in the definition information storage unit.

The monitoring device according to claim 2, wherein the monitoring device is connected to a management terminal including display means, and
the threshold value derivation means displays the calculated threshold value on the screen of the management terminal and, when setting of the threshold value is instructed on the screen, stores the threshold value in the threshold value storage unit as a new threshold value.

A monitoring system comprising a plurality of monitoring target devices, a monitoring device communicably connected to the plurality of monitoring target devices, a management terminal communicably connected to the monitoring device, and a storage device communicably connected to the monitoring device and the management terminal, wherein
the storage device includes a metric definition information storage unit that stores metric definition information, a metric value storage unit that stores one or more system performance values and one or more business service metric values of the monitoring target device, a threshold value storage unit that stores a threshold value for a system performance item and a threshold value for a business service metric, and a metric map storage unit that stores a metric map indicating a relationship between one business service metric and one or more system performance items, and
the monitoring device includes: metric value acquisition means that acquires one or more system performance values and one or more business service metric values of the monitoring target device according to the metric definition information stored in the metric definition information storage unit; alert determination means that stores the system performance value or business service metric value transmitted from the metric value acquisition means in the metric value storage unit and determines whether a system performance and business service problem has occurred in the monitoring target device; a definition information storage unit that stores a positive prediction ratio indicating how often the alert determination result of the business service metric is normal and the alert determination result of the system performance is also normal, and a negative prediction ratio indicating how often the alert determination result of the business service metric is abnormal and the alert determination result of the system performance is also abnormal; threshold value change determination means that determines whether to change the threshold value of a system performance item stored in the threshold value storage unit; and threshold value derivation means that calculates, by regression analysis, the system performance value for the case where the alert determination result of the business service metric is normal, based on the alert determination result of the business service metric determined to be normal by the alert determination means, the system performance item associated with the business service metric in the metric map, and the system performance value of that system performance item, and stores the calculation result in the threshold value storage unit.

The monitoring system according to claim 4, wherein the threshold value derivation means transmits the calculated threshold value to the management terminal for display on the screen of the management terminal and, when setting of the threshold value is instructed on the screen, stores the threshold value in the threshold value storage unit as a new threshold value.

A method of monitoring monitoring target devices by a computer that is communicably connected to a plurality of monitoring target devices, wherein the computer:
determines, according to a threshold value of a business service metric and a threshold value of a system performance item, whether system performance and business service problems have occurred in a monitoring target device, and stores the alert determination result in a storage unit;
calculates, by regression analysis, the system performance value at which the alert determination result of the business service metric is normal, based on the alert determination results of the business service metric that are stored in the storage unit and determined to be normal, the system performance item associated with the business service metric, and the system performance values of that system performance item, and stores the calculation result;
outputs the calculation result to a management terminal;
sets, in accordance with an instruction from the management terminal, the new threshold value output to the management terminal as the threshold value of the system performance item stored in the storage unit; and
receives and stores system performance values and business service metric values transmitted from the monitoring target devices.
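
As an illustration of the regression step recited above, the following is a minimal sketch and not the claimed implementation. It assumes a simple one-variable linear regression of the business service metric on a single associated system performance item, fitted only on samples whose business service alert result is normal, and then solved for the system performance value at which the fitted line reaches the business service metric threshold. The function names, the sample layout, and the linear model itself are assumptions made for this example; the claim only specifies "regression analysis".

```python
from typing import List, Tuple

# Hypothetical sample layout: (system performance value,
# business service metric value, business service alert result is normal?)
Sample = Tuple[float, float, bool]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("system performance values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def derive_threshold(samples: List[Sample], metric_threshold: float) -> float:
    """Candidate threshold for the system performance item (assumed model).

    Uses only samples judged normal for the business service metric, fits a
    line, and returns the system performance value at which the fitted
    business service metric equals its own threshold.
    """
    normal = [(s, m) for s, m, is_normal in samples if is_normal]
    slope, intercept = fit_line([s for s, _ in normal], [m for _, m in normal])
    if slope == 0:
        raise ValueError("metric does not vary with the system performance item")
    return (metric_threshold - intercept) / slope


if __name__ == "__main__":
    # Illustrative data only: CPU usage (%) versus response time (ms).
    history = [(10.0, 30.0, True), (20.0, 50.0, True),
               (30.0, 70.0, True), (90.0, 190.0, False)]
    print(derive_threshold(history, metric_threshold=100.0))
```

In the claimed method the derived value is not applied automatically: it is first output to the management terminal and is set as the new threshold only on instruction from that terminal.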

The monitoring method according to claim 6, wherein, when the system performance value is calculated by regression analysis, whether to change the threshold value of the system performance item is determined according to a positive prediction ratio, indicating the proportion of cases in which the alert determination result of the business service metric stored in the storage unit is normal and the alert determination result of the system performance is also normal, and a negative prediction ratio, indicating the proportion of cases in which the alert determination result of the business service metric is abnormal and the alert determination result of the system performance is also abnormal.
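
The two ratios used for the threshold change decision can be sketched as follows. This is a hedged illustration under stated assumptions: the denominators (all observations whose business service result is normal, respectively abnormal), the minimum ratios of 0.9, and the rule "change the threshold if either ratio falls below its minimum" are hypothetical choices for this example and are not specified in the claims.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (business service result is normal?,
#                              system performance result is normal?)
Result = Tuple[bool, bool]


def prediction_ratios(results: List[Result]) -> Tuple[Optional[float], Optional[float]]:
    """Return (positive ratio, negative ratio); None when undefined.

    Assumed definitions for this sketch:
      positive = both normal   / business service normal
      negative = both abnormal / business service abnormal
    """
    service_normal = sum(1 for svc, _ in results if svc)
    service_abnormal = len(results) - service_normal
    both_normal = sum(1 for svc, perf in results if svc and perf)
    both_abnormal = sum(1 for svc, perf in results if not svc and not perf)
    positive = both_normal / service_normal if service_normal else None
    negative = both_abnormal / service_abnormal if service_abnormal else None
    return positive, negative


def should_change_threshold(results: List[Result],
                            min_positive: float = 0.9,
                            min_negative: float = 0.9) -> bool:
    """Assumed rule: re-derive the threshold when the two alert results
    agree less often than the configured minimum ratios."""
    positive, negative = prediction_ratios(results)
    low_positive = positive is not None and positive < min_positive
    low_negative = negative is not None and negative < min_negative
    return low_positive or low_negative


if __name__ == "__main__":
    # Illustrative data only.
    observed = [(True, True)] * 3 + [(True, False), (False, False), (False, True)]
    print(prediction_ratios(observed), should_change_threshold(observed))
```

Low ratios mean that the system performance alerts disagree with the business service alerts, which is the condition under which re-deriving the system performance threshold is worthwhile.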

A monitoring program that causes a computer communicably connected to a plurality of monitoring target devices to execute:
a process of determining, according to a threshold value of a business service metric and a threshold value of a system performance item, whether system performance and business service problems have occurred in a monitoring target device, and storing the alert determination result in a storage unit;
a process of calculating, by regression analysis, the system performance value at which the alert determination result of the business service metric is normal, based on the alert determination results of the business service metric that are stored in the storage unit and determined to be normal, the system performance item associated with the business service metric, and the system performance values of that system performance item, and storing the calculation result;
a process of outputting the calculation result to a management terminal;
a process of setting, in accordance with an instruction from the management terminal, the new threshold value output to the management terminal as the threshold value of the system performance item stored in the storage unit; and
a process of receiving system performance values and business service metric values transmitted from the monitoring target devices and storing them in the storage unit.

The monitoring program according to claim 8, wherein, when the system performance value is calculated by regression analysis, the program causes the computer to determine whether to change the threshold value of the system performance item according to a positive prediction ratio, indicating the proportion of cases in which the alert determination result of the business service metric stored in the storage unit is normal and the alert determination result of the system performance is also normal, and a negative prediction ratio, indicating the proportion of cases in which the alert determination result of the business service metric is abnormal and the alert determination result of the system performance is also abnormal.