JP2003296140A

JP2003296140A - Object monitoring system, object monitoring method and object monitoring program

Info

Publication number: JP2003296140A
Application number: JP2002100172A
Authority: JP
Inventors: Satoru Shinada; 悟品田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-04-02
Filing date: 2002-04-02
Publication date: 2003-10-17

Abstract

PROBLEM TO BE SOLVED: To provide a structure for improving fault tolerance in a system for remotely monitoring the states of a plurality of objects to be monitored present on a computer.

SOLUTION: This object monitoring system, in which a monitoring side computer monitors objects to be monitored in a computer to be monitored, is characterized in that the computer to be monitored is provided with object monitoring program parts, equal in number to the objects to be monitored, each of which collects at fixed time intervals the state of one predefined, non-overlapping object to be monitored and reports state information including the collected state and the time of collection; and an agent program part which acquires and stores the state information from each of the object monitoring program parts and, in response to an inquiry from the monitoring side computer, collectively reports the stored state information of the objects to be monitored.

COPYRIGHT: (C)2004, JPO

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to an object monitoring system, an object monitoring method, and an object monitoring program, and more particularly to an object monitoring system, method, and program that can improve fault tolerance.

[0002]

[Prior art] An example of a conventional computer monitoring system is described in Japanese Patent Laid-Open No. 2001-005691. This computer monitoring system consists of a manager program unit, a communication path, an agent program unit, and monitoring targets. In a conventional system with this configuration, the manager program unit asks the agent program unit for the state of a monitoring target running on the computer, and the agent program unit receives the request, collects the information, and responds.

[0003]

[Problems to be solved by the invention] However, this conventional technique has the following problems. The first problem is that when the load on the monitoring target computer increases, the agent program unit is also affected, so that even when it receives a request from the manager program unit it cannot collect the information on the monitoring target, or, even if it does collect the information, it cannot respond to the manager program unit. This is because the agent program unit itself requires many resources to operate. The second problem is that when a failure occurs in one of the monitoring targets, the agent program unit that tries to collect information on the failed target cannot continue operating, so it is impossible to determine what caused the agent program unit to stop. This is because the process of collecting information on the monitoring targets and the process of returning that information to the manager program unit are integrated. An object of the present invention is to provide a configuration that improves fault tolerance in a system for remotely monitoring the states of a plurality of monitoring target objects present on a computer.

[0004]

[Means for solving the problems] The object monitoring system of the present invention is an object monitoring system in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer by a communication path. The monitoring target computer comprises: object monitoring program units, which exist in the same number as the monitoring target objects and each of which collects, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object and reports state information including the collected state and the time of collection; and an agent program unit that acquires and stores the state information from each of the object monitoring program units and, on receiving an inquiry from the monitoring source computer, reports the stored state information of the monitoring target objects collectively to the monitoring source computer. Each object monitoring program unit operates without depending on any monitoring target object other than the one it monitors, and the agent program unit operates so as not to be affected by the operation of the object monitoring program units or by failures of the monitoring target objects. The monitoring source computer comprises a manager program unit that queries the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is an object whose state information has not been collected, determines that a failure has occurred in that monitoring target object.
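
The state information and the agent program unit's "acquire, store, and report collectively" role can be illustrated with a small sketch. This is not the patent's implementation: the names `StateInfo`, `AgentStore`, `report`, and `answer_inquiry` are hypothetical, and keeping only the most recent report per object is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StateInfo:
    """State information reported by one object monitoring program unit."""
    object_id: str       # which monitoring target object, e.g. "230-1"
    state: str           # the collected state (free-form text in this sketch)
    collected_at: float  # time of collection, in seconds


class AgentStore:
    """Hypothetical agent-side store: latest StateInfo per monitored object."""

    def __init__(self) -> None:
        self._latest: Dict[str, StateInfo] = {}

    def report(self, info: StateInfo) -> None:
        # Called by an object monitoring program unit.
        # Assumption: a newer report simply overwrites the older one.
        self._latest[info.object_id] = info

    def answer_inquiry(self) -> List[StateInfo]:
        # Called on an inquiry from the manager: return all stored state at once.
        return sorted(self._latest.values(), key=lambda i: i.object_id)


store = AgentStore()
store.report(StateInfo("230-1", "file readable", 100.0))
store.report(StateInfo("230-2", "free memory 512 MB", 101.0))
store.report(StateInfo("230-1", "file readable", 110.0))  # newer report for 230-1
snapshot = store.answer_inquiry()
```

Because the store only accepts reports and answers inquiries, it never touches a monitoring target object itself, which is the separation the paragraph above relies on.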

[0005] The object monitoring method of the present invention is an object monitoring method in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer by a communication path. The method includes: a step in which the monitoring target computer performs collection that exists in the same number as the monitoring target objects, each instance collecting, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object and reporting state information including the collected state and the time of collection; a step in which the monitoring target computer acquires and stores the state information from each of the object monitoring program units and, on receiving an inquiry from the monitoring source computer, reports the stored state information of the monitoring target objects collectively to the monitoring source computer; and a step in which the monitoring source computer queries the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is an object whose state information has not been collected, determines that a failure has occurred in that monitoring target object.
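
The final step, in which the monitoring source computer decides that an object has failed because its state information was not collected, can be sketched as a timestamp check. The function name `find_failed_objects`, the `tolerance` parameter, and the rule "older than one monitoring interval means not collected" are assumptions for illustration; the source only says the manager checks the collection time against the predetermined monitoring interval.

```python
from typing import Dict, List


def find_failed_objects(
    expected_ids: List[str],
    collected_at: Dict[str, float],
    now: float,
    monitoring_interval: float,
    tolerance: float = 0.0,
) -> List[str]:
    """Return the objects whose state information is missing or too old.

    collected_at maps an object id to the collection time last reported
    for it. An object is treated as failed when it has no entry, or when
    its entry is older than monitoring_interval + tolerance (assumed rule).
    """
    failed = []
    for object_id in expected_ids:
        ts = collected_at.get(object_id)
        if ts is None or now - ts > monitoring_interval + tolerance:
            failed.append(object_id)
    return failed


expected = ["230-1", "230-2", "230-3"]
latest = {"230-1": 100.0, "230-2": 158.0, "230-3": 159.0}
failed = find_failed_objects(expected, latest, now=160.0, monitoring_interval=10.0)
```

In this run only `230-1` is flagged, since its last collection is 60 seconds old while the other two were collected within the 10-second interval.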

[0006] The object monitoring program of the present invention is a program for object monitoring processing in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer by a communication path. The program causes the monitoring target computer to execute: a process that exists in the same number as the monitoring target objects, each instance collecting, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object and reporting state information including the collected state and the time of collection; and a process of acquiring and storing the state information from each of the object monitoring program units and, on receiving an inquiry from the monitoring source computer, reporting the stored state information of the monitoring target objects collectively to the monitoring source computer. The program further causes the monitoring source computer to execute a process of querying the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is an object whose state information has not been collected, determining that a failure has occurred in that monitoring target object.
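
The per-object process of collecting a state at fixed intervals and reporting it with its collection time can be sketched as a simple loop. Everything named here (`monitor_loop`, the `collect` and `report` callables, the `cycles` limit, the injectable `sleep` and `clock`) is hypothetical, added so that the loop can run to completion in a test. Skipping the report when collection fails follows the embodiment, where a unit that cannot collect the state does not report state information.

```python
import time
from typing import Callable, Optional, Tuple

Report = Tuple[str, str, float]  # (object id, state, collection time)


def monitor_loop(
    object_id: str,
    collect: Callable[[], Optional[str]],
    report: Callable[[Report], None],
    interval_s: float,
    cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> None:
    """Collect one object's state, report it, wait, and repeat."""
    done = 0
    while cycles is None or done < cycles:
        try:
            state = collect()          # collect the state of this one object
        except Exception:
            state = None               # collection failed in this cycle
        if state is not None:
            report((object_id, state, clock()))  # report state plus collection time
        sleep(interval_s)              # wait for the monitoring interval
        done += 1


reports = []
readings = iter(["ok", None, "ok"])    # the second collection fails
monitor_loop(
    "230-1",
    collect=lambda: next(readings),
    report=reports.append,
    interval_s=10.0,
    cycles=3,
    sleep=lambda s: None,              # no real waiting in this example
    clock=lambda: 42.0,                # fixed clock for reproducibility
)
```

Each loop handles exactly one object, so a failed collection in one loop leaves a gap in that object's timestamps without affecting loops that watch other objects.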

[0007]

[Embodiments of the invention] Next, the present invention will be described in detail with reference to the drawings.

[0008] First, the first embodiment of the present invention will be described in detail. Referring to FIG. 1, the first embodiment comprises a monitoring source computer 100 and a monitoring target computer 200. These computers operate under program control and are connected to each other via a communication path 300. The monitoring source computer 100 includes a manager program unit 110. The monitoring target computer 200 includes an agent program unit 210, object monitoring program units 220-1, 220-2, and 220-3, and monitoring target objects 230-1, 230-2, and 230-3. The manager program unit 110 communicates with the agent program unit 210 via the communication path 300 and acquires the information collected by the agent program unit 210. The agent program unit 210 communicates with the object monitoring program units 220-1, 220-2, and 220-3 via a computer internal communication path 400 and acquires the information on the states of the monitoring target objects 230-1, 230-2, and 230-3 collected by those units.

The agent program unit 210 is built from the minimum necessary computer resources and uses the memory lock function and the task priority control function provided by the operating system of the monitoring target computer 200, so that it operates without being affected by the operation of other programs on the monitoring target computer 200 (including the object monitoring program units 220) or by failures of other resources (including the monitoring target objects). The memory lock function is an operating system function that reserves in advance the memory a task needs and pins it, preventing the paging mechanism and the like from taking that memory away from the task. As a result, even when the memory load on the monitoring target computer 200 is high, the agent program unit 210 can continue to run in the memory reserved in advance. Because the amount of memory that can be locked is generally limited, not every module can be memory-locked; the agent program unit 210 therefore restricts its functions and minimizes the resources it requires. The task priority control function is a function for controlling the execution priority of each task running on the system. By setting the priority of the agent program unit 210's task higher than that of other tasks, the agent program unit 210 can operate even when the load on the monitoring target computer 200 is high.

The object monitoring program units 220-1, 220-2, and 220-3 respectively monitor the monitoring target objects 230-1, 230-2, and 230-3 running on the monitoring target computer and report their states, information collection times, and the like to the agent program unit 210 as state information. Each of the object monitoring program units 220-1, 220-2, and 220-3 operates without depending on any monitoring target object other than the one it monitors.

Next, the overall operation of this embodiment will be described in detail with reference to FIGS. 1 to 4. Here, as an example, the monitoring target object 230-1 is a file on a disk, the monitoring target object 230-2 is free memory, and the monitoring target object 230-3 is the tasks waiting to be executed.

FIG. 2 shows the processing flow of the object monitoring program units 220-1, 220-2, and 220-3. Each object monitoring program unit collects the state of its monitoring target object 230-1, 230-2, or 230-3 (step A1) and reports it, together with the information collection time, to the agent program unit 210 as state information (step A2); the agent program unit 210 acquires and stores that state information. The object monitoring program unit then waits for a predetermined monitoring interval (step A3) and returns to step A1.

FIG. 3 shows the processing flow of the agent program unit 210. The agent program unit 210 waits for an inquiry from the manager program unit 110 (step B1) and, on receiving an inquiry (step B2), gathers the stored state information (step B3) and reports it to the manager program unit 110 (step B4).

FIG. 4 shows the processing flow of the manager program unit 110. The manager program unit 110 acquires state information by querying the agent program unit 210 for the states of the monitoring target objects 230-1, 230-2, and 230-3 (step C1), and checks the information collection time for each of the monitoring target objects 230-1, 230-2, and 230-3 (step C2). If it detects that state information has not been collected at the predetermined monitoring interval, it detects that an abnormality has occurred in the corresponding object monitoring program 230-1, 230-2, or 230-3 (step C3). If no abnormality is detected, it waits according to the monitoring interval (step C4) and returns to step C1.

Suppose now that a failure occurs in the monitoring target object 230-1, that is, the file on the disk. In this case the object monitoring program unit 220-1 cannot collect the state in step A1 and therefore does not report state information in step A2. The other object monitoring program units 220-2 and 220-3, on the other hand, monitor different monitoring target objects, 230-2 and 230-3 respectively, so they are unaffected by the failure of the monitoring target object 230-1 (the file on the disk) and can report state information to the agent program unit 210 in step A2. In this state, when the agent program unit 210 receives a request from the manager program unit 110 (step B2), it reports the state information of the monitoring target objects 230 to the manager program unit 110 (steps B3 and B4). When the manager program unit 110 acquires the state information from the agent program unit 210 (step C1), it detects that the state information of the monitoring target object 230-1 has not been collected (steps C2 and C3) and detects that a failure has occurred in the monitoring target object 230-1 (step C5).

Ideally, as in this embodiment, one object monitoring program unit 220 is assigned to each monitoring target object 230, but the monitoring can also be performed without such a one-to-one correspondence. For example, to monitor two files on the same disk together with the controller (an interface card or the like) connected to that disk, an object monitoring program 220 is assigned to each of the two files. In this case each of the two object monitoring programs 220 must access and monitor its file through the controller, which presupposes that the controller (interface card or the like) connected to the disk is operating normally. If the two object monitoring programs 220 become unable to collect the state information of their monitoring target objects at the same time, it can be judged that a failure has occurred in the controller or the disk, which is the common part that both object monitoring programs 220 need in order to monitor the files. In this way, running a plurality of object monitoring programs 220 that use a common part can in some cases yield more information.

Next, a second embodiment of the present invention will be described in detail with reference to the drawings. In FIG. 1, the monitoring target objects 230 are defined as the various functions provided by the system of the monitoring target computer 200, and each object monitoring program unit 220 is configured to use its respective monitoring target object 230. An object monitoring program unit 220 may also be configured to use a plurality of functions. Each object monitoring program unit 220 periodically confirms that its function operates normally and reports the result to the agent program unit 210. The subsequent operation is the same as in the first embodiment, so its description is omitted. In this way, it can be confirmed that the various functions provided by the system operate normally and, conversely, a failure of the monitoring target computer can be detected by detecting that a function does not operate normally.

Next, a third embodiment of the present invention will be described in detail with reference to the drawings. In the configuration of the first embodiment, the agent program unit 210 in particular is included in the core of the system of the monitoring target computer 200 (for example, the kernel in the case of UNIX (registered trademark)). With this configuration, the agent program unit is separated from other ordinary tasks, and fault tolerance can be further increased.
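
The inference from the two-files example in the first embodiment, where simultaneous loss of state information from monitors that share a controller or disk points at that shared part, can be sketched as set logic. The function `suspect_shared_parts`, the `depends_on` mapping, and the identifiers in it are hypothetical. The source gives no algorithm, only the reasoning that a common part is suspected when every monitor that uses it fails together.

```python
from typing import Dict, List, Set


def suspect_shared_parts(
    failed_objects: Set[str],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Return shared parts whose dependent objects have all failed.

    depends_on maps each monitored object to the parts (disk, controller,
    and so on) that its monitor must go through. A part is suspected only
    if at least two objects use it and every one of them has failed.
    """
    users: Dict[str, Set[str]] = {}
    for obj, parts in depends_on.items():
        for part in parts:
            users.setdefault(part, set()).add(obj)
    return sorted(
        part
        for part, objs in users.items()
        if len(objs) >= 2 and objs <= failed_objects
    )


depends_on = {
    "file-A": {"disk-0", "controller-0"},
    "file-B": {"disk-0", "controller-0"},
    "free-memory": set(),
}
both = suspect_shared_parts({"file-A", "file-B"}, depends_on)
one = suspect_shared_parts({"file-A"}, depends_on)
```

When only one of the two files stops reporting, no shared part is suspected, which matches the text's point that a single failure is attributed to the object itself.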

[0009]

[Effects of the invention] As described above, the present invention has the following effects. The first effect is improved fault tolerance of the monitoring system. This is because the resources needed by the agent program unit on the monitoring target computer, which is the core of the monitoring system, are minimized so that the operation of this part can be guaranteed. In addition, because the object monitoring programs are divided per monitoring target, a failure in one object monitoring program does not affect the operation of the agent program unit or of the other object monitoring programs. The second effect is that, because there is an object monitoring program for each monitoring target object, it is possible to detect where on the monitoring target computer a failure has occurred. This is because the object monitoring programs are subdivided per monitoring target.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 2 is a flowchart of the operation of the first embodiment of the present invention.

FIG. 3 is a flowchart of the operation of the first embodiment of the present invention.

FIG. 4 is a flowchart of the operation of the first embodiment of the present invention.

[Explanation of symbols]

100 Monitoring source computer
110 Manager program unit
200 Monitoring target computer
210 Agent program unit
220-1 Object monitoring program unit
220-2 Object monitoring program unit
220-3 Object monitoring program unit
230-1 Monitoring target object
230-2 Monitoring target object
230-3 Monitoring target object
300 Communication path
400 Computer internal communication path

Claims

[Claims]

1. An object monitoring system in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer via a communication path, wherein the monitoring target computer comprises: object monitoring program units that exist in the same number as the monitoring target objects, each of which collects, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object among the monitoring target objects and reports state information including the collected state and the time of collection; and an agent program unit that acquires and stores the state information from each of the object monitoring program units and, when an inquiry is received from the monitoring source computer, reports the stored state information of the monitoring target objects collectively to the monitoring source computer; wherein each of the object monitoring program units operates without depending on any monitoring target object other than the monitoring target object that it monitors, and the agent program unit operates so as not to be affected by the operation of the object monitoring program units or by failures of the monitoring target objects; and wherein the monitoring source computer comprises a manager program unit that queries the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is a monitoring target object whose state information has not been collected, determines that a failure has occurred in that monitoring target object.

2. An object monitoring method in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer via a communication path, the method comprising: a step in which the monitoring target computer performs collection that exists in the same number as the monitoring target objects, each instance collecting, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object among the monitoring target objects and reporting state information including the collected state and the time of collection; a step in which the monitoring target computer acquires and stores the state information from each of the object monitoring program units and, when an inquiry is received from the monitoring source computer, reports the stored state information of the monitoring target objects collectively to the monitoring source computer; and a step in which the monitoring source computer queries the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is a monitoring target object whose state information has not been collected, determines that a failure has occurred in that monitoring target object.

3. An object monitoring program for object monitoring processing in which a monitoring source computer monitors monitoring target objects in a monitoring target computer connected to the monitoring source computer via a communication path, the program causing the monitoring target computer to execute: a process that exists in the same number as the monitoring target objects, each instance collecting, at fixed time intervals, the state of one predetermined, non-overlapping monitoring target object among the monitoring target objects and reporting state information including the collected state and the time of collection; and a process of acquiring and storing the state information from each of the object monitoring program units and, when an inquiry is received from the monitoring source computer, reporting the stored state information of the monitoring target objects collectively to the monitoring source computer; and causing the monitoring source computer to execute a process of querying the agent program unit at fixed time intervals to acquire the state information of the monitoring target objects and, if there is a monitoring target object whose state information has not been collected, determining that a failure has occurred in that monitoring target object.