JP2001265623A - Monitoring device

Info

Publication number: JP2001265623A
Application number: JP2000075707A
Authority: JP
Inventors: Yoshihide Shoji; 好英庄司
Original assignee: Hitachi Ltd; Hitachi Information Technology Co Ltd
Current assignee: Hitachi Ltd; Hitachi Information Technology Co Ltd
Priority date: 2000-03-17
Filing date: 2000-03-17
Publication date: 2001-09-28

Abstract

PROBLEM TO BE SOLVED: To provide a monitoring device that automatically optimizes the monitoring of multiple system devices and detects failures that a monitored system device cannot detect by itself. SOLUTION: A state detection unit 2 refers to a schedule table 3, requests a failure report from a monitored system device 12, obtains the report, and additionally samples state information of the system device 12. An analysis unit 1 determines the operating state of the system device from the information in a collection information table 4, which records past information about the system device 12, together with the obtained failure report and the sampled state information, and, based on the result, updates the contents of the schedule table 3, in which the monitoring parameters are set. The analysis unit 1 also compares the hardware configuration record held in the collection information table 4 with the hardware configuration reported by the system device 12 and detects a failure based on the result of the comparison.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention]

The present invention relates to a monitoring device that monitors the operating state of the system devices constituting a computer system, and in particular to a monitoring device that monitors for failures by obtaining reports from the monitored system devices according to a schedule set by a user.

[0002]

[Prior art]

In general, a monitoring device is a device that, on behalf of a human operator, continuously checks that the multiple system devices installed in a computer center are always operating normally. Some monitoring devices can be configured with multiple parameters that specify when and what kind of monitoring to perform on each system device, and monitor accordingly. The monitored system devices are, for example, the host computers and auxiliary storage devices that constitute a computer system. Some of these system devices have a function that, when requested from outside, uses part of their processing capacity to check their own internal configuration and operating state and to report any failure they detect.

[0003]

[Problems to be solved by the invention]

Conventional monitoring devices have the following problems.

(1) Load on the monitored devices. The system devices monitored by the monitoring device are engaged in time-critical processing within the computer system. Because a monitored system device checks its own internal configuration and operating state and reports the failures it detects, monitoring must not increase the load on the system device to the point of degrading its performance. However, even among system devices of the same type, the load varies greatly with factors such as hardware configuration, operating schedule, and the information being processed. Preset parameters alone therefore could not provide detailed monitoring matched to the load state of a running system device. For example, monitoring should differ between a system device at 80% load and one at 40% load, but such load-dependent monitoring was not possible.

(2) Failures that a system device cannot detect by itself. Even a system device that can report its own operating state has failures it cannot detect. For example, consider a system device that holds the hardware configuration detected at power-on as its basic configuration and, on receiving a monitoring request, examines the operating state of the hardware included in that basic configuration. Such a device cannot detect a failure in hardware that it already failed to detect at power-on.

An object of the present invention is to provide a monitoring device that automatically optimizes the monitoring of multiple system devices that, even if of the same type, are in different operating states, and that can detect failures which a monitored system device cannot detect by itself.

[0004]

[Means for solving the problems]

To achieve the above object, the present invention provides a monitoring device that requests a failure report from each of multiple system devices according to the monitoring settings in a schedule table and performs monitoring using the results received from the system devices. The monitoring device comprises: a state detection unit that requests and obtains a failure report from a system device and additionally collects state information of that system device from the system device; a collection information table that records past information about the system devices; and an analysis unit having means for determining the operating state of the system device by combining the obtained failure report, the collected state information, and the past information about the system device, and for updating the monitoring settings in the schedule table. In addition, a hardware configuration record is held for each system device, and the analysis unit has means for detecting a failure by comparing the hardware configuration reported by the system device with the hardware configuration in the hardware configuration record.

[0005]

[Embodiments of the invention]

When the monitoring device queries a monitored system device, that system device returns a failure report and its hardware configuration. In addition, by evaluating against past records both the response time from issuing the query to receiving the response and the hardware configuration obtained, the monitoring device can obtain more information than the system device reports directly. The analysis unit uses the obtained information for failure monitoring and, at the same time, compares the hardware configuration and busy rate of the system device with the stored past states and reference values, and automatically optimizes the monitoring schedule based on the result of that comparison.
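
The response-time measurement described above can be illustrated with a minimal Python sketch. The function name `timed_query` and the idea of passing the query as a callable are assumptions made for this illustration; the patent does not specify how the query is issued or how the elapsed time is converted into a busy rate, so the sketch stops at measuring the elapsed time.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_query(query: Callable[[], T]) -> Tuple[T, float]:
    """Run a query and return (response, elapsed seconds).

    `query` stands in for whatever call asks a system device for its
    failure report and hardware configuration (hypothetical here).
    """
    start = time.perf_counter()
    response = query()
    elapsed = time.perf_counter() - start
    return response, elapsed


# Usage with a stand-in query that returns a canned report.
report, seconds = timed_query(lambda: {"failures": [], "hardware": ["cpu0"]})
```

The elapsed time is the only quantity the monitoring device measures by itself here; everything else comes from the device's own report.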

[0006] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the monitoring device of the embodiment. The monitoring device consists of six software components running on a control program, namely an analysis unit 1, a state detection unit 2, a schedule table 3, a collection information database 4, an input/output processing unit 5, and an alarm unit 9, and four hardware components, namely a display unit 6, an operation unit 7, an auxiliary storage device 8, and an alarm device 10. For each of the multiple system devices 12 to be monitored, the schedule table 3 holds in advance monitoring settings 1 to n that specify, for each date and time at which monitoring is performed, the monitoring priority, the monitoring interval, the items to be monitored, and so on. The state detection unit 2 refers to the settings in the schedule table 3 for the current time and picks out the system devices whose monitoring start time has arrived. If multiple system devices qualify, it selects the one with the highest priority. The state detection unit 2 collects the requested information from the selected system device 12 and at the same time measures for itself values such as the busy rate of the system device. The collected information held by the state detection unit 2 is then sent to the analysis unit 1.

[0007] If the analysis unit 1 detects a failure in the information obtained from the state detection unit 2, it reports the occurrence of the failure, via the alarm unit 9, the alarm device 10, and the communication line 11, to the management center where the operators are stationed. The collection information database 4 holds operating history information for each of the monitored system devices 12 as records 1 to n of the system devices; within these records, the busy rate is kept as statistics for each date and time. The analysis unit 1 compares the collected information sent from the state detection unit 2 with the history information held in the collection information database 4 and, if it determines that the monitoring settings need to be optimized, optimizes the schedule table 3. After the schedule table 3 has been optimized, the analysis unit 1 updates the collection information database 4 with the information obtained by the monitoring, for use in subsequent monitoring.

[0008] The display unit 6 and the operation unit 7 consist of, for example, a CRT display and a keyboard, and are used to enter the initial settings of the schedule table 3 through the input/output processing unit 5. The display unit 6 and the operation unit 7 can also be used, through the input/output processing unit 5, to view the contents of the schedule table 3 and the monitoring record 4. The auxiliary storage device 8 can input and output the contents of the schedule table 3 and the collection information database 4 held in the monitoring device.

[0009] Next, the detailed operation of the software components 1, 2, 3, 4, 5, and 9 is described with reference to FIGS. 2A through 8. FIGS. 2A and 2B together form a single figure, a detailed diagram that extracts from the schematic of FIG. 1 the parts involved in the flow of information generated during monitoring. FIG. 3 is a schematic flowchart of the monitoring process. FIG. 4 shows the parameters entered by the user at initial setup. FIG. 5 is a detailed flowchart of selecting the system device to be monitored. FIGS. 6A and 6B together form a single figure, a flowchart showing the detailed processing from information collection to failure detection. FIGS. 7A and 7B together form a single figure, a detailed flowchart of optimizing the schedule table based on the analysis result. FIG. 8 is a detailed flowchart of updating the collection information database based on the analysis result. The computer system of this embodiment is composed of information processing devices such as computers.

[0010] In the flowchart of FIG. 3, at S (step) 1, which is performed only once when the monitoring device is installed, the user enters the parameters that will guide subsequent monitoring through the operation unit 7 or the auxiliary storage device 8. FIG. 4 shows the parameters entered by the user and how they are reflected in the schedule table 3 and the collection information table 4. The parameters entered by the user are not exact monitoring settings for each individual system device; they are values that serve as guidelines when the monitoring device creates the schedule.

[0011] In FIG. 4, there are two default settings determined by the type of system device 12: “monitoring items” and “general monitoring interval of the device”. The “monitoring items” are items and default values, programmed into the analysis unit 1 in advance, that indicate which collected information to consult and which values to judge as a failure when detecting failures in a system device. They are reflected in the “monitoring items of the system device” in the collection information database 4. The “general monitoring interval” indicates how often a system device is monitored; as with the “monitoring items”, a value derived from the function of the system device is programmed into the analysis unit 1 in advance. It is reflected in the “basic monitoring interval” in the collection information database 4.

Also in FIG. 4, there are three settings entered by the user: “priority of the system device”, “priority of each monitoring item”, and “monitoring setting divisions”. The “priority of the system device” indicates which system device matters most to the computer system when the computer system contains multiple devices of the same type. It is reflected in the “basic priority” in the collection information database 4. The “priority of each monitoring item” indicates how important a failure found by the specified monitoring item is for monitoring the running computer system. It is reflected in the “monitoring frequency” of the “monitoring items of the system device” in the collection information database 4. The “monitoring setting divisions” correspond to the operating schedule of the computer system under continuous 24-hour monitoring and are used to subdivide the monitoring settings. For example, in a computer system scheduled to be powered off from 00:00 to 5:59, to run accounting processing from 6:00 to 11:59, and to run batch processing from 12:00 to 23:59, scheduling is performed with these three periods as the divisions. The “monitoring setting divisions” are reflected in the “monitoring setting divisions” of the “busy-rate history by date and time” in the collection information database 4. Examples of the contents of the collection information database 4 and of the schedule table 3 are shown in FIGS. 2A and 2B.
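
As an illustration of the three-period example above, the following Python sketch maps a time of day to the division that contains it. The table layout, the labels, and the function name `division_for` are assumptions for this example only; the patent gives the three periods but not a data format.

```python
from datetime import time

# The three divisions from the example in the text: (start, end inclusive, label).
# The tuple layout and the labels are illustrative assumptions.
DIVISIONS = [
    (time(0, 0), time(5, 59), "power off"),
    (time(6, 0), time(11, 59), "accounting processing"),
    (time(12, 0), time(23, 59), "batch processing"),
]


def division_for(t: time) -> str:
    """Return the label of the division that contains time-of-day t."""
    # The divisions are given to the minute, so drop seconds before comparing.
    t = t.replace(second=0, microsecond=0)
    for start, end, label in DIVISIONS:
        if start <= t <= end:
            return label
    raise ValueError(f"no division covers {t}")


print(division_for(time(6, 0)))  # prints: accounting processing
```

A lookup of this kind would let the monitoring settings and the busy-rate history be kept per division rather than per individual timestamp.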

[0012] At S2, the state detection unit 2 refers to the schedule table 3 at fixed intervals and selects, from the monitored system devices 12, the system device that should be monitored now. FIG. 5 shows the details of this selection procedure. At S21, the state detection unit 2 obtains the current time. At S22, it consults the “next scheduled monitoring” entry of each “system device monitoring schedule” in the schedule table 3 and searches for system devices that are set to be monitored at the current time. If multiple system devices are due for monitoring, at S23 it compares the “priority” entries of their “system device monitoring schedules”, finds the system device with the highest priority, and selects it as the device to monitor.
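
The selection in S21 to S23 can be sketched in Python as follows. The `ScheduleEntry` fields, the convention that a smaller number means a higher priority, and the treatment of an overdue entry as due are all assumptions made for illustration; the patent only names the “next scheduled monitoring” and “priority” entries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ScheduleEntry:
    device_id: str
    next_monitoring: datetime  # the "next scheduled monitoring" entry
    priority: int              # assumption: 1 is the highest priority


def select_device(schedule: List[ScheduleEntry], now: datetime) -> Optional[ScheduleEntry]:
    """Pick the device to monitor at time `now`, or None if nothing is due."""
    # S22: find every device whose scheduled monitoring time has arrived.
    due = [entry for entry in schedule if entry.next_monitoring <= now]
    if not due:
        return None
    # S23: among the due devices, take the one with the highest priority.
    return min(due, key=lambda entry: entry.priority)


now = datetime(2000, 3, 17, 9, 0)  # S21: the current time
schedule = [
    ScheduleEntry("host-A", datetime(2000, 3, 17, 8, 55), priority=2),
    ScheduleEntry("disk-B", datetime(2000, 3, 17, 9, 0), priority=1),
    ScheduleEntry("host-C", datetime(2000, 3, 17, 9, 30), priority=1),
]
print(select_device(schedule, now).device_id)  # prints: disk-B
```

Only one device is returned per pass, which matches the text's description of monitoring a single selected device per cycle.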

[0013] At S3, the state detection unit 2 performs failure detection on the selected system device. FIGS. 6A and 6B show the details of the failure detection procedure. At S31 in FIGS. 6A and 6B, the state detection unit 2 refers to the “monitoring items” portion of the “monitoring settings of the monitored system device” in the schedule table 3. Through the “monitoring setting divisions”, these monitoring settings indicate the monitoring items to be performed at a given time and their monitoring frequencies. At S32, the state detection unit 2 determines, from the “item name” and “monitoring frequency” in the schedule table 3, the monitoring items for which to request reports, and issues report requests for those items to the system device. This is also shown in FIGS. 2A and 2B. At S33, the state detection unit 2 receives from the system device the “hardware configuration information recognized by the system device” and the “failure report” for the requested monitoring items. At the same time, it measures the time from issuing the request to receiving the report and calculates the busy rate of the system device from this elapsed time. This is also shown in FIGS. 2A and 2B. The information obtained by the monitoring is then sent from the state detection unit 2 to the analysis unit 1. At S34, the analysis unit 1 determines whether the report received from the state detection unit 2 contains a “failure report”. If it does, the analysis unit 1 notifies the operator stationed at the management center via the alarm unit 9, the alarm device 10, and the communication line 11. The operator who receives the alarm then collects detailed failure information from the display unit 6, the operation unit 7, the auxiliary storage device 8, and so on, through the input/output processing unit 5. At S35, the analysis unit 1 refers to the “record of the hardware configuration held by the monitoring device” in the history information of the monitored system device in the collection information database 4. This record holds the hardware configuration of the system device obtained from the initial settings or from past monitoring. At S36, the analysis unit 1 compares the “hardware configuration information recognized by the system device” collected in the current monitoring with the “record of the hardware configuration held by the monitoring device” read at S35. If it finds hardware that appears in the “record of the hardware configuration held by the monitoring device” but is neither reported in the “hardware configuration information recognized by the system device” nor covered by a failure report, that hardware is considered to be faulty hardware that the system device failed to detect, and the analysis unit 1 notifies the operator in the same way as when it receives a “failure report” from the system device. At S37, if a failure was detected and an alarm was issued in the current monitoring, information for the operator's analysis is collected from the system device. In this case, report requests are issued to the system device for all monitoring items, regardless of their monitoring frequency.
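
The comparison in S36 amounts to a set difference, as the following Python sketch shows. Representing each hardware configuration as a set of component names is an assumption made for this example, as are the function name and the sample component names.

```python
from typing import Set


def undetected_faulty_hardware(
    recorded: Set[str], reported: Set[str], fault_reported: Set[str]
) -> Set[str]:
    """Hardware held in the monitoring device's record that the system device
    neither lists in its reported configuration nor names in a failure report."""
    return recorded - reported - fault_reported


recorded = {"cpu0", "disk0", "disk1"}  # record held by the monitoring device
reported = {"cpu0", "disk0"}           # configuration the system device reports
print(undetected_faulty_hardware(recorded, reported, set()))  # prints: {'disk1'}
```

A non-empty result would be handled like a “failure report” from the device, that is, by notifying the operator.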

[0014] At S4, the schedule table 3 is optimized. FIGS. 7A and 7B show the details of this optimization. At S41 in FIGS. 7A and 7B, the analysis unit 1 compares the busy rate of the monitored system in the current monitoring with the “average busy rate” for the same time of day in the past, held in the collection information database 4. At S42, if the current busy rate exceeds the system's “average busy rate” by at least a default value predetermined by the device type, the analysis unit 1 judges the system device to be overloaded. For example, if the current busy rate is 35%, the average busy rate is 30%, and the default value for the device type is 2%, then 35% is greater than (30% + 2%). In that case, the analysis unit 1 reconfigures the schedule table 3 to reduce the load caused by monitoring, by increasing the “monitoring interval” and “next scheduled monitoring” and by lowering the “priority” and the “monitoring frequency” of the monitoring items. At S43, the analysis unit 1 reads from the collection information database 4 the values set at user initial setup for the “priority”, the “monitoring interval”, and the “monitoring frequency” of each monitoring item, according to the type of system device. At S44, if no cause for rescheduling was detected in the current monitoring and the busy rate does not greatly exceed the “average busy rate”, the system device is considered to be in a steady state. If the “priority”, the “monitoring interval”, or the “monitoring frequency” of the monitoring items in the schedule table 3 then differ from the values set at user initial setup, the analysis unit 1 resets the schedule table 3 to those initial values.
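
The decision in S42 to S44 can be sketched in Python as below, using the 35% / 30% / 2% figures from the text. The `MonitorSetting` fields, the priority convention (1 is highest), and the adjustment amounts (doubling the interval, stepping priority and frequency by one) are assumptions for illustration; the patent states only the direction of each change. The sketch also simplifies S44 by treating every non-overloaded result as steady state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MonitorSetting:
    priority: int      # assumption: 1 is the highest priority
    interval_min: int  # monitoring interval in minutes
    frequency: int     # monitoring frequency of an item (larger = more often)


def is_overloaded(current: float, average: float, margin: float) -> bool:
    """S42: the busy rate exceeds the average by at least the device-type margin."""
    return current - average >= margin


def optimize(setting: MonitorSetting, initial: MonitorSetting,
             current: float, average: float, margin: float) -> MonitorSetting:
    if is_overloaded(current, average, margin):
        # Lighten the monitoring load; the step sizes are illustrative only.
        return replace(
            setting,
            priority=setting.priority + 1,          # lower the priority
            interval_min=setting.interval_min * 2,  # monitor less often
            frequency=max(1, setting.frequency - 1),
        )
    # S43/S44: steady state, so fall back to the values from initial setup.
    return initial


initial = MonitorSetting(priority=1, interval_min=10, frequency=4)
relaxed = optimize(initial, initial, current=35.0, average=30.0, margin=2.0)
print(relaxed.interval_min)  # prints: 20
```

Returning a new setting rather than mutating the old one keeps the initial values available for the reset in S44.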

[0015] At S5, the collection information database is updated. FIG. 8 shows the details of updating the collection information database 4. At S51 in FIG. 8, the analysis unit 1 updates the “average busy rate” using the busy rate obtained in the current monitoring. At S52, if a failure was detected in the current monitoring, 1 is added to the “failure count”. At S53, if the “failure count” exceeds a value defined by the type of system device, the “basic priority” and the “basic monitoring interval” are raised so that system devices with frequent failures are monitored more intensively. At S54, if a change in the hardware configuration of the system device was detected in the current monitoring, the “record of the hardware configuration of the system device held by the monitoring device” in the collection information database 4 is updated to match the current configuration.
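
Steps S51 to S54 can be sketched in Python as follows. Several points are assumptions, because the patent does not specify them: the average is kept as a running mean over a sample count, “raising” the basic priority and basic monitoring interval is read as moving the priority toward 1 and halving the interval so that monitoring becomes more frequent, and the hardware configuration is a set of component names.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class DeviceRecord:
    average_busy_rate: float
    samples: int             # number of busy-rate samples behind the average
    failure_count: int
    basic_priority: int      # assumption: 1 is the highest priority
    basic_interval_min: int
    hardware: Set[str]


def update_record(rec: DeviceRecord, busy_rate: float, failure_detected: bool,
                  reported_hw: Set[str], failure_limit: int) -> None:
    # S51: fold the new busy rate into the running mean (assumed method).
    rec.samples += 1
    rec.average_busy_rate += (busy_rate - rec.average_busy_rate) / rec.samples
    # S52: count the failure.
    if failure_detected:
        rec.failure_count += 1
    # S53: monitor failure-prone devices more intensively (assumed step sizes).
    if rec.failure_count > failure_limit:
        rec.basic_priority = max(1, rec.basic_priority - 1)
        rec.basic_interval_min = max(1, rec.basic_interval_min // 2)
    # S54: keep the stored hardware configuration in line with the current one.
    if reported_hw != rec.hardware:
        rec.hardware = set(reported_hw)


rec = DeviceRecord(30.0, 1, 0, 3, 10, {"cpu0"})
update_record(rec, 40.0, True, {"cpu0", "disk0"}, failure_limit=0)
print(rec.average_busy_rate, rec.failure_count)  # prints: 35.0 1
```

The function mutates the record in place because the updated record is what the next monitoring cycle reads.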

[0016] This completes the monitoring of one monitored system device. Thereafter, the schedule table 3 optimized at S4 and the collection information database updated at S5 take effect immediately, and monitoring is repeated, so that monitoring optimized for the computer system is performed. As described above, according to this embodiment, the monitoring best suited to the moment can be achieved for multiple monitored system devices, without operator intervention, in response to conditions such as the operating state of the computer system and the occurrence of failures. Because optimized monitoring can be scheduled for system devices that are under heavy load from concentrated processing, performance degradation caused by overload can be prevented while still monitoring sufficiently. In addition, system devices that have failed in the past and are expected to fail again can be monitored intensively. Furthermore, hardware failures that the monitored system device cannot detect can be detected.

[0017]

[Effects of the invention]

According to the present invention, the monitoring best suited to the moment can be achieved for multiple monitored system devices, without operator intervention, in response to conditions such as the operating state of the computer system and the occurrence of failures.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a monitoring device according to an embodiment of the present invention.

FIG. 2A is one part of a detailed block diagram that extracts, from the schematic of FIG. 1, the parts involved in the flow of information generated during monitoring.

FIG. 2B is the other part of the detailed block diagram that extracts, from the schematic of FIG. 1, the parts involved in the flow of information generated during monitoring.

FIG. 3 is a schematic flowchart showing the flow of the monitoring process.

FIG. 4 is a diagram showing how the parameters set by the user at initial setup are expanded into the collection information database.

FIG. 5 is a detailed flowchart of selecting the system device to be monitored.

FIG. 6A is one part of a flowchart showing the detailed processing from information collection to failure detection.

FIG. 6B is the other part of the flowchart showing the detailed processing from information collection to failure detection.

FIG. 7A is one part of a detailed flowchart of optimizing the schedule table based on the analysis result.

FIG. 7B is the other part of the detailed flowchart of optimizing the schedule table based on the analysis result.

FIG. 8 is a detailed flowchart of updating the collection information database based on the analysis result.

[Explanation of symbols]

1 Analysis unit
2 State detection unit
3 Schedule table
4 Collection information database
5 Input/output processing unit
6 Display unit
7 Operation unit
8 Auxiliary storage device
9 Alarm unit
10 Alarm device
11 Communication line
12 Monitored system device

Claims

1. A monitoring device that requests a failure report from each of a plurality of system devices according to monitoring settings in a schedule table and performs monitoring using the results received from the system devices, the monitoring device comprising: a state detection unit that requests and obtains a failure report from a system device and further collects state information of the system device from the system device; a collection information table that records past information about the system device; and an analysis unit comprising means for determining the operating state of the system device by combining the obtained failure report, the collected state information, and the past information about the system device, and for updating the monitoring settings of the schedule table based on the result of that determination.

2. The monitoring device according to claim 1, further comprising a hardware configuration record for each system device, wherein the analysis unit comprises means for detecting a failure by comparing the hardware configuration reported by the system device with the hardware configuration in the hardware configuration record.