JPH06231009A

JPH06231009A - Monitoring device

Info

Publication number: JPH06231009A
Application number: JP5014576A
Authority: JP
Inventors: Tetsuya Toi; 哲也戸井
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1993-02-01
Filing date: 1993-02-01
Publication date: 1994-08-19

Abstract

PURPOSE: To provide a monitoring device capable of correctly recognizing a CPU malfunction due to a cause other than a transient one. CONSTITUTION: A CPU monitoring device 11 is equipped with a bus request counter 31 that counts the bus requests of each CPU being monitored. It interrupts any CPU whose count, within the time set by an interval timer 32, is below the value set in a lower limit setting register 33. The result of the interrupt is registered in a response register 37, and a control circuit 34 judges the CPU to be abnormal if no normal response is received within a fixed time. The judgment result is output on an error display signal line 21 and an error notification signal line 22.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a monitoring device used in a computer system having a plurality of central processing units, for monitoring the operation of those central processing units.

[0002]

[Prior Art] In a computer system using a central processing unit (hereinafter simply called a CPU), the CPU may run away for one reason or another. Watchdog timers have conventionally been used to monitor the CPU for runaway and to return the system to a normal state when runaway is detected.

[0003] The watchdog timer is disclosed in detail in, for example, Japanese Examined Patent Publication No. 4-19577. In this monitoring technique, the CPU counts clock pulses so that a signal is generated periodically. If this signal is interrupted for a predetermined time longer than its period, the CPU is judged to have malfunctioned (for example, by running away), and appropriate measures are taken. For example, at the moment this judgment is made, a reset signal is output to reset the CPU and restore normal operation.
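The prior-art scheme of [0003] can be illustrated with a small Python simulation. The function `watchdog_resets`, the integer time units, the start at t = 0, and the rule that each reset restarts the timeout are assumptions made for illustration. None of these details is taken from the cited publication.

```python
from typing import Iterable, List


def watchdog_resets(heartbeat_times: Iterable[int], timeout: int, end_time: int) -> List[int]:
    """Return the times at which a watchdog timer would reset the CPU.

    heartbeat_times: sorted times at which the CPU's periodic signal arrived.
    timeout: the predetermined time, longer than the heartbeat period.
    end_time: time at which the simulation stops.
    """
    resets: List[int] = []
    last = 0  # assumed: the CPU starts (and the timer is cleared) at t = 0
    for t in list(heartbeat_times) + [end_time]:
        # While the signal has been missing for longer than the timeout,
        # the watchdog fires; each reset restarts the timeout interval.
        while t - last > timeout:
            last += timeout
            resets.append(last)
        last = t  # a heartbeat (or the end of the run) observed at time t
    return resets
```

Consider heartbeats at 10, 20 and 30, a timeout of 25, and a simulation end of 100. The function returns [55, 80]: once the heartbeat stops, the watchdog keeps resetting the CPU. This is the repeated-reset situation criticized in [0005].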

[0004] CPU malfunctions have various causes. A typical transient cause is momentary external noise. Other causes are hardware faults, and software or other faults. Examples of the former include:

- a faulty DC power supply to the CPU;
- a stopped operating clock;
- thermal runaway due to inadequate cooling of semiconductor devices.

Examples of the latter include a defect (bug) in the software. Another example is input data containing an error, so that the program itself detects the abnormality and enters a halted state.

[0005]

[Problems to Be Solved by the Invention] When a CPU malfunctions for a transient reason, it can be returned to normal by resetting it with the watchdog timer. However, when a halt occurs because of a hardware failure or software, the system cannot be restored to normal. In this case, even if the watchdog timer detects the abnormality and repeatedly resets the CPU, the CPU malfunctions again under the same conditions. A watchdog timer therefore cannot remedy the system in such a case.

[0006] Moreover, the watchdog timer itself operates normally and keeps resetting the CPU. The system therefore had no way of recognizing that these resets were not correcting the problem. Consequently, even a system with a mechanism for notifying a higher-level system when a fault occurs could not report the fault at this stage.

[0007] Furthermore, in such a conventional monitoring device, the watchdog timer periodically sends an interrupt signal to the CPU. It then checks whether the CPU responds within a predetermined time, and if it does, normal operation is assumed. The CPU must therefore always respond to these periodic interrupts, even when it is operating normally without any problem. This substantially reduces its processing capacity.

[0008] It is therefore an object of the present invention to provide a monitoring device capable of correctly recognizing a CPU malfunction caused by something other than a transient cause.

[0009] Another object of the present invention is to provide a monitoring device capable of correctly recognizing a CPU malfunction and notifying a higher-level device of it.

[0010] Still another object of the present invention is to provide a monitoring device capable of monitoring a CPU without imposing an excessive burden on it during normal operation.

[0011]

[Means for Solving the Problems] In the invention according to claim 1, the monitoring device is provided with:

(a) frequency measuring means for measuring how often a CPU uses a bus within a fixed time;
(b) comparing means for comparing the value measured by the frequency measuring means with a predetermined allowable range;
(c) abnormality detecting means for detecting an abnormality of the CPU when the comparing means determines that the frequency is outside the predetermined allowable range.

[0012] In other words, in the invention according to claim 1, the CPU's bus-use frequency within a fixed time is measured. If it lies outside the predetermined allowable range, some fault is deemed to have occurred in the CPU, and an abnormality of that CPU is detected.
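The claim 1 check reduces to a range test on a rate. The following Python sketch is a hypothetical illustration. The function name, the use of a rate (uses per unit time), and the inclusive bounds are assumptions. Claim 1 speaks of an allowable range, so the sketch shows both a lower and an upper bound, whereas the embodiment described later uses only a lower limit.

```python
def bus_frequency_abnormal(bus_uses: int, window: float, low: float, high: float) -> bool:
    """Return True if the CPU's bus-use frequency lies outside [low, high].

    bus_uses: number of bus uses counted during one measuring window (means (a)).
    window:   length of the measuring window, in any consistent time unit.
    low/high: hypothetical bounds of the allowable range (means (b)).
    """
    frequency = bus_uses / window  # uses per unit time
    # Means (c): anything outside the inclusive range counts as an abnormality.
    return not (low <= frequency <= high)
```

For example, 5 uses in a 10-unit window gives 0.5 uses per unit. That is below a lower bound of 2.0, so the CPU is flagged.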

[0013] In the invention according to claim 2, the monitoring device is provided with:

(a) frequency measuring means for measuring how often a CPU uses a bus within a fixed time;
(b) comparing means for comparing the value measured by the frequency measuring means with a predetermined allowable range;
(c) interrupt signal sending means that, when the comparing means determines that the frequency is outside the predetermined allowable range, sends the CPU an interrupt signal causing it to output a predetermined response;
(d) abnormality detecting means for detecting an abnormality of the CPU unless the CPU responds normally within a predetermined time after the interrupt signal is sent.

[0014] In other words, in the invention according to claim 2, the CPU's bus-use frequency within a fixed time is measured. If it lies outside the predetermined allowable range, a fault in the CPU is provisionally suspected and the CPU is interrupted. If the CPU then returns an abnormal response, or no response within a predetermined time, an abnormality of that CPU is detected.

[0015] In the invention according to claim 3, the monitoring device is provided with:

(a) frequency measuring means for individually measuring how often each of a plurality of CPUs sharing a bus uses the bus within a fixed time;
(b) comparing means for comparing the value measured for each CPU with a predetermined allowable range;
(c) interrupt signal sending means that sends an interrupt signal to any CPU whose frequency the comparing means has determined to lie outside the predetermined allowable range, causing that CPU to output a predetermined response;
(d) abnormality detecting means for detecting an abnormality of that CPU unless the CPU responds normally within a predetermined time after the interrupt signal is sent;
(e) detection result output means for outputting the detection result of the abnormality detecting means to the outside.

[0016] In other words, the invention according to claim 3 addresses the case in which a plurality of CPUs sharing a bus are to be monitored, and frequency measuring means is provided for each of these CPUs. Each measured frequency is compared with the allowable range, and any CPU suspected of an abnormality is interrupted and checked. If any CPU is determined to be abnormal, the detection result is output to the outside. This makes it possible, for example, to show the status of each CPU on a display unit or to report the result to a higher-level device.

[0017] In the invention according to claim 4, the monitoring device is provided with:

(a) frequency measuring means for individually measuring how often each of a plurality of CPUs sharing a bus uses the bus within a fixed time;
(b) comparing means for comparing the value measured for each CPU with a predetermined allowable range;
(c) interrupt signal sending means that sends an interrupt signal to any CPU whose frequency the comparing means has determined to lie outside the predetermined allowable range, causing that CPU to output a predetermined response;
(d) abnormality detecting means for detecting an abnormality of that CPU unless the CPU responds normally within a predetermined time after the interrupt signal is sent;
(e) bus arbitration means for arbitrating the bus use requests of the plurality of CPUs;
(f) bus request masking means that masks any bus use request issued by a CPU in which the abnormality detecting means has detected an abnormality, so that the request does not reach the bus arbitration means.

[0018] In other words, the invention according to claim 4, like that of claim 3, addresses the case in which a plurality of CPUs sharing a bus are to be monitored. It additionally includes bus arbitration means for resolving contention when these CPUs request use of the bus. The frequency is measured for each CPU, and any CPU suspected of an abnormality is checked by means of an interrupt. For any CPU finally determined to be abnormal, its bus use requests to the bus arbitration means are masked, which prevents malfunctions.

[0019]

[Embodiments] The present invention is described in detail below with reference to an embodiment.

[0020] FIG. 1 shows the configuration of a computer system that uses a monitoring device according to an embodiment of the present invention. In this system, a CPU monitoring device 11 is connected via a system bus 12 to:

- the 0th to 3rd CPUs 13₀ to 13₃;
- a shared memory 14 used by these CPUs;
- an input/output control circuit 15.

Input/output devices (not shown) are connected to the input/output control circuit 15. Three sets of lines are arranged between the CPU monitoring device 11 and the CPUs 13₀ to 13₃:

- interrupt signal lines (INT) 17₀ to 17₃, through which the CPU monitoring device 11 interrupts each CPU individually;
- bus request lines (BR) 18₀ to 18₃, which the CPUs 13₀ to 13₃ use to request use of the system bus 12;
- bus grant lines 19₀ to 19₃, which the CPU monitoring device 11 uses to grant one of the CPUs 13₀ to 13₃ use of the system bus 12.

[0021] One end of an error display signal line 21 and one end of an error notification signal line 22 are also connected to the CPU monitoring device 11. The error display signal line 21 carries a signal indicating the error status of the 0th to 3rd CPUs 13₀ to 13₃, and its other end is connected to an error display device 23. The error notification signal line 22 notifies a higher-level device when an error occurs in any of the CPUs 13₀ to 13₃. In this embodiment, its other end is connected to a system management processor (not shown) as the notification destination.

[0022] FIG. 2 shows the functional configuration of the CPU monitoring device. The CPU monitoring device 11 includes a bus request counter 31 that counts the bus requests of the 0th to 3rd CPUs 13₀ to 13₃ shown in FIG. 1. These bus requests are counted within a time set by an interval timer 32. The counts are then compared with a lower limit set in a lower limit setting register 33. If this comparison finds a CPU 13 whose count is below the lower limit, a control circuit 34 sends an interrupt signal on the interrupt signal line 17 connected to that CPU 13. A response timer 35 measures the time that CPU 13 takes to respond.

[0023] In this embodiment, a common lower limit is set in the lower limit setting register 33 for all of the CPUs 13₀ to 13₃. Alternatively, a different lower limit may be set for each of the CPUs 13₀ to 13₃, reflecting how often each accesses the bus. In that case, the lower limit for each CPU 13 can be registered in the lower limit setting register 33 based on the output of a decoder 36 connected to an address bus 12A.

[0024] When the relevant CPU 13 responds to the interrupt, the CPU monitoring device 11 uses the decoder 36 to identify which CPU 13 responded. It registers the data appearing on the data bus in the response register 37, and the control circuit 34 determines whether the response was normal. If an error is determined to have occurred, an error display signal is output on the error display signal line 21, and an error notification signal is output on the error notification signal line 22.

[0025] The CPU monitoring device 11 also includes a bus arbitration circuit 38, which arbitrates use of the system bus 12 (FIG. 1). On the input side of the bus arbitration circuit 38, two-input AND gates 39₀ to 39₃ are arranged, one for each of the bus request lines 18₀ to 18₃. One input of each AND gate 39₀ to 39₃ receives the respective bus request signal. The other input receives a mask signal 41₀ to 41₃ from the control circuit 34. The mask signals 41₀ to 41₃ serve to invalidate any bus request signal generated by a CPU 13 in which an error has occurred.

[0026] The reason for this masking is as follows. Even if a hardware failure is purely transient and causes no problem in subsequent circuit operation, it interrupts program execution, which may leave the CPU out of processing synchronization with the other CPUs 13. Some computer systems have no such dependency: a CPU that recovers and resumes program execution causes no inconvenience to the other CPUs. In such systems, this masking of bus request signals is of course not always necessary.

[0027] Next, the circuit operation of the CPU monitoring device shown in FIG. 2 is described in detail. At the initialization stage of the device, lower limit setting data is supplied over the data bus D of the system bus 12 (FIG. 1) and set in the lower limit setting register 33. A clock signal common to the whole system is supplied to the interval timer 32 and the response timer 35 over a clock signal line 51. When the CPUs 13₀ to 13₃ of the computer system start operating, the interval timer 32 receives a reset signal 52 from the control circuit 34. From then on, it counts up the clock pulses received over the clock signal line 51. Each time it reaches a predetermined count value, it outputs a count-up signal 53 and starts counting again from zero. In other words, the interval timer 32 supplies the count-up signal 53 to the control circuit 34 at a fixed period.
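The interval timer counts common clock pulses, emits a count-up every fixed number of clocks, and then restarts from zero. This behaviour can be sketched as a small Python class. `IntervalTimer`, the `period` parameter, and the convention that `clock()` returns True on the clock at which count-up signal 53 fires are assumptions for illustration.

```python
class IntervalTimer:
    """Hypothetical model of interval timer 32: fires every `period` clocks."""

    def __init__(self, period: int):
        self.period = period
        self.count = 0

    def reset(self) -> None:
        # Corresponds to reset signal 52 from the control circuit 34.
        self.count = 0

    def clock(self) -> bool:
        """Advance one clock; return True when count-up signal 53 is emitted."""
        self.count += 1
        if self.count == self.period:
            self.count = 0  # start counting again from zero
            return True
        return False
```

For example, with a period of 4, `clock()` returns True on the 4th, 8th and 12th clocks.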

[0028] During each period of the count-up signal 53, the bus request counter 31 counts the bus requests from each of the CPUs 13₀ to 13₃. For this purpose, the bus request counter 31 contains counters 55₀ to 55₃, one for each of the CPUs 13₀ to 13₃. Each counter is connected to the corresponding bus request line 18₀ to 18₃. Each time the interval timer 32 outputs the count-up signal 53, the counted numbers of bus requests are supplied to the control circuit 34 as count value information 56₀ to 56₃. Immediately afterwards, a reset signal 57 from the control circuit 34 resets the counters 55₀ to 55₃ to zero. In this way, the bus request counter 31 outputs, as count values, how often each CPU 13₀ to 13₃ issued bus requests per unit time.
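A hypothetical Python model of the bus request counter 31 follows. Each call to `bus_request` stands for one request seen on a bus request line. Real hardware would presumably count request edges, which this sketch does not model. `latch_and_reset` stands for handing count value information 56₀ to 56₃ to the control circuit and then applying reset signal 57. The class and method names are illustrative.

```python
from typing import List


class BusRequestCounter:
    """Hypothetical model of counters 55_0..55_3 inside bus request counter 31."""

    def __init__(self, n_cpus: int = 4):
        self.counts: List[int] = [0] * n_cpus

    def bus_request(self, cpu: int) -> None:
        # One request observed on bus request line 18_cpu.
        self.counts[cpu] += 1

    def latch_and_reset(self) -> List[int]:
        """On count-up signal 53: report the counts, then clear them (reset signal 57)."""
        snapshot = list(self.counts)
        self.counts = [0] * len(self.counts)
        return snapshot
```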

[0029] In the initial state, before any abnormality has been detected in the CPUs 13₀ to 13₃, the mask signals 41₀ to 41₃ output from the control circuit 34 are at the H (high) level. Under this condition, a CPU 13₀ to 13₃ may send an H-level signal indicating a bus request on its bus request line 18₀ to 18₃. That signal passes through the corresponding AND gate 39₀ to 39₃ and reaches the bus arbitration circuit 38. As shown in FIG. 1, the CPUs 13₀ to 13₃ in this computer system share the system bus 12. When several of them request the system bus 12 at the same time, the bus arbitration circuit 38 therefore arbitrates the right to use the bus so that only one of them uses it. The arbitration result is sent to the corresponding CPU 13₀ to 13₃ over the bus grant lines 19₀ to 19₃ connected to the bus arbitration circuit 38.
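The gating by AND gates 39₀ to 39₃, followed by arbitration, can be sketched in Python. The patent does not state which arbitration policy the bus arbitration circuit 38 uses. This sketch assumes fixed priority (lowest CPU number wins) purely for illustration, and `arbitrate` is a hypothetical name. A mask value of True corresponds to the H level, so a CPU whose mask has been pulled to the L level can never win the bus.

```python
from typing import List, Optional


def arbitrate(requests: List[bool], masks: List[bool]) -> Optional[int]:
    """Return the index of the CPU granted the bus, or None if nobody may use it.

    requests[i]: True while CPU i drives bus request line 18_i high.
    masks[i]:    True while mask signal 41_i is at H level (CPU not judged faulty).
    """
    # AND gates 39_0..39_3: a request passes only if its mask is high.
    gated = [r and m for r, m in zip(requests, masks)]
    # Assumed policy: fixed priority, the lowest-numbered requesting CPU wins.
    for cpu, req in enumerate(gated):
        if req:
            return cpu
    return None
```

A fixed-priority arbiter is the simplest choice to show. A real design might prefer round-robin to avoid starving high-numbered CPUs, and the masking logic is independent of that choice.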

[0030] The control circuit 34 compares each CPU's bus requests per unit time, obtained from the bus request counter 31 for the CPUs 13₀ to 13₃, with the lower limit set in the lower limit setting register 33. This lower limit is set slightly below the statistically minimum number of bus requests that a normally operating CPU 13₀ to 13₃ makes within one period of the interval timer 32. Thus, if the CPUs 13₀ to 13₃ are operating normally, each will in most cases access the system bus 12 more often than this lower limit.
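One way to arrive at a lower limit "slightly below the statistical minimum" is sketched below in Python. It assumes the designer has recorded bus request counts per interval during normal operation. The sketch takes their minimum and scales it by a safety margin. Both the method and the 0.9 margin are assumptions, since the patent does not specify how the statistic is computed.

```python
from typing import Sequence


def choose_lower_limit(normal_counts: Sequence[int], margin: float = 0.9) -> int:
    """Hypothetical helper: a lower limit slightly below the smallest normal count.

    normal_counts: bus requests per interval observed while the CPUs were healthy.
    margin: assumed safety factor (< 1) that yields the "slightly lower" value.
    """
    return int(min(normal_counts) * margin)
```

For example, with normal counts of 120, 95, 140 and 101 per interval, this gives a lower limit of 85.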

[0031] Conversely, suppose one of the CPUs 13₀ to 13₃ develops a fault and stops executing its program. Its bus requests to the system bus 12 then stop increasing, and the count of the corresponding counter 55 falls below the lower limit. Of course, a count below the lower limit does not by itself justify concluding that the CPU 13 has failed. It is quite possible that the CPU 13 simply has little need to access the system bus 12 during a particular period.

[0032] The control circuit 34 therefore performs a confirmation on each of the CPUs 13₀ to 13₃ whose count has fallen below the lower limit. It does so by sending an interrupt signal on the interrupt signal line 17₀ to 17₃ that corresponds to such a CPU 13. The interrupt signal is input directly to the interrupt terminal INT of that CPU 13 and causes it to perform interrupt processing. If the CPU 13 is normal, this interrupt processing sends a normal response signal onto the system bus 12 within the response time set by the response timer 35.

[0033] To set this response time, the response timer 35 receives a reset signal 61 from the control circuit 34 at the moment the interrupt signal is sent on the interrupt signal line 17. This resets its count to zero. From that point, it counts up the clock pulses received over the clock signal line 51. When it has counted the predetermined number of clocks corresponding to the response time, it sends a count-up signal 62 to the control circuit 34.

[0034] Meanwhile, the response register 37 receives data from the data bus 12D. It also learns from the decoder 36 which CPU 13 has responded. For this purpose, the decoder 36 is supplied with address information from the address bus 12A, which forms part of the system bus 12.

[0035] The response register 37 contains four registers 64₀ to 64₃, one for each of the CPUs 13₀ to 13₃. These registers hold, for each CPU, its response to the interrupt signal. Each response includes an error bit (E), which indicates that an error has occurred, and a response bit (R), which indicates whether a response has been received.

[0036] When the interrupt signal is sent to a CPU 13 that may have failed, the control circuit 34 uses a response bit set signal 66 to set the response bit (R) of the corresponding register 64 to "1". If that CPU writes response data ("0" in the normal case) before the response timer 35 outputs the count-up signal 62, the response bit changes from "1" to "0". If no response arrives in time, the response bit remains "1". Even when the response arrives before the count-up signal 62, the error bit (E) in the response data may have changed from "0" to "1" because the CPU 13 has detected an error.

[0037] When the response timer 35 outputs the count-up signal 62, the control circuit 34 reads the error bit (E) and the response bit (R) in the corresponding register 64. If the error bit (E) has changed to "1", or the response bit (R) has remained "1", the control circuit determines that some fault exists in that CPU 13. In this case, the control circuit 34 outputs an error display signal on the error display signal line 21 indicating which of the CPUs 13₀ to 13₃ has failed. It also outputs an error notification signal on the error notification signal line 22.
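The handling of the E and R bits in one of the registers 64₀ to 64₃ can be modelled as a small state machine. In this hypothetical Python sketch, the class and method names are illustrative. Clearing E when a new interrupt is sent is an assumption the patent does not state. The sketch models only writes that arrive before count-up signal 62.

```python
class ResponseRegister:
    """Hypothetical model of one register 64_i: error bit E and response bit R."""

    def __init__(self) -> None:
        self.e = 0
        self.r = 0

    def on_interrupt_sent(self) -> None:
        # Response bit set signal 66: R <- 1 when the interrupt goes out.
        self.r = 1
        self.e = 0  # assumption: E starts clear for each new check

    def on_cpu_write(self, error_bit: int) -> None:
        # Response data written by the CPU: R returns to 0;
        # E becomes 1 if the CPU itself detected an error.
        self.r = 0
        self.e = error_bit

    def faulty_at_timeout(self) -> bool:
        # Evaluated when response timer 35 emits count-up signal 62.
        return self.e == 1 or self.r == 1
```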

[0038] At the same time, the control circuit 34 switches the mask signal 41 of the failed CPU 13 from the H level to the L (low) level. The corresponding AND gate 39 then no longer passes the bus request signal on the bus request line 18. A bus request from the failed CPU 13 therefore never reaches the bus arbitration circuit 38. In other words, any request for the system bus 12 from the failed CPU 13 is rejected.

[0039] When the error display signal is output on the error display signal line 21, information identifying the failed CPU 13 is displayed on the error display device 23, as illustrated. In this example, a lamp corresponding to the CPU 13 diagnosed as abnormal blinks so that the operator can take appropriate action. The error signal on the error notification signal line 22 is reported to the system management processor, as described above. The necessary measures are then taken, such as having another CPU 13 take over the work assigned to the failed CPU 13.

[0040] FIG. 3 shows how the control circuit monitors and controls each CPU. The control circuit 34 contains a CPU (not shown) and performs the control shown in FIG. 3 by executing a program stored in a ROM (read-only memory), also not shown. The control circuit 34 also contains a random access memory (not shown) that serves as working memory for temporarily storing various data while the program runs.

[0041] The control circuit 34 monitors whether the interval timer 32 has output the count-up signal 53 (step S101). When the count-up signal 53 is output (Y), a value N stored in a predetermined area of the working memory is initialized to "0" (step S102). First, it is determined whether the count value of the 0th counter 55₀ is smaller than the lower limit set in the lower limit setting register 33 (step S103). If it is not smaller (N), it is checked whether N has reached "3" (step S104). Here N is "0" (N), so N is incremented by 1 (step S105). The process then returns to step S103, and the same comparison is performed for the next counter, the first counter 55₁.
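The counters 55₀ to 55₃ and the interval timer 32 together measure how often each CPU used the bus within one interval. A minimal Python sketch of that measurement; the class `BusRequestCounter` and its methods are hypothetical names, and resetting the counts each time the interval timer fires is an assumption not stated in this section:

```python
class BusRequestCounter:
    """Hypothetical model of counters 55_0..55_n (bus request counter 31):
    one count per CPU, read out when interval timer 32 fires."""

    def __init__(self, num_cpus):
        self.counts = [0] * num_cpus

    def on_bus_request(self, cpu):
        # A bus request signal was seen on this CPU's bus request line 18.
        self.counts[cpu] += 1

    def on_interval_elapsed(self):
        # Count-up signal 53: hand the counts to the comparison step.
        # Assumption: the counters restart from zero for the next interval.
        snapshot, self.counts = self.counts, [0] * len(self.counts)
        return snapshot


counter = BusRequestCounter(4)
for cpu in [0, 0, 2, 3, 0, 2]:
    counter.on_bus_request(cpu)
print(counter.on_interval_elapsed())  # [3, 0, 2, 1]
```

The returned snapshot is what steps S103 to S105 then compare, one entry at a time, against the lower limit.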

[0042] In this way, the count values of the counters 55₀ to 55₃ are compared in turn to check whether the corresponding CPUs 13₀ to 13₃ are accessing the system bus 12 as often as they normally do. Suppose that at this stage the count value of the first counter 55₁ is found to be smaller than the lower limit. In this case (step S103; Y), an interrupt signal is output to the first CPU 13₁ (step S106).

[0043] At the same time, the response timer 35 starts counting clock pulses on the clock signal line 51 to measure the response time limit (step S107). If the first CPU 13₁ responds before the response timer 35 outputs the count-up signal 62 (step S109; N, step S108; Y), the control circuit checks whether the content of the response register 37 is a normal response (step S110). If it is (Y), the first CPU 13₁ is judged to be normal, and the process proceeds to step S104 and on to the check of the next CPU 13₂.
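Steps S107 to S110 can be modelled as a bounded wait on the response register 37. The sketch below counts clock ticks in place of the response timer 35. The reply code `NORMAL`, the polling callback, and treating a timeout as abnormal (following the description of the count-up signal 62 above) are assumptions for illustration:

```python
NORMAL = "NORMAL"  # hypothetical reply code written to response register 37


def check_response(poll_response_register, limit_ticks):
    """Sketch of steps S107-S110, counting clock ticks like response timer 35.

    poll_response_register(tick) returns None until the interrupted CPU has
    replied, then the reply. Returns True only for a normal reply that
    arrives before the timer runs out.
    """
    for tick in range(limit_ticks):             # S107/S109: timer still running
        reply = poll_response_register(tick)    # S108: has the CPU replied?
        if reply is not None:
            return reply == NORMAL              # S110: is the reply normal?
    return False  # count-up signal 62: no reply in time, treated as abnormal


print(check_response(lambda tick: NORMAL if tick >= 3 else None, 10))  # True
print(check_response(lambda tick: None, 10))                            # False
```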

[0044] If, on the other hand, the first CPU 13₁ does not respond normally in step S110, that is, if the error bit (E) in the first register 64₁ has changed to "1" (step S110; N), the first mask signal 41₁ is switched from H level to L level, which blocks (masks) any bus request signal appearing on the bus request line 18₁ of the first CPU 13₁ from reaching the bus arbitration circuit 38. In addition, since the first CPU 13₁ has been found to be faulty, an error display signal is output on the error display signal line 21 so that the fault is displayed, and an error signal is output on the error notification signal line 22 (step S111). The process then returns to step S104 and starts the check of the next CPU 13₂.

[0045] In this way, the CPUs 13₀ to 13₃ are checked in sequence, and any CPU 13 whose count value falls below the lower limit is additionally checked, by means of the interrupt processing, to determine whether it is operating normally. When the check of the third CPU 13₃ is complete, N is "3" at step S104, so the entire check ends without incrementing N (END). Naturally, if more CPUs 13 are connected to the system bus 12, the check is repeated up to the value of N corresponding to that number.
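Putting the steps of FIG. 3 together, one pass over all monitored CPUs might look like the following Python sketch. It is a simplification under stated assumptions: `interrupt_and_check(n)` is a hypothetical callback standing in for steps S106 to S110 (True means a normal response arrived in time), the mask signals 41 are a list of booleans (True = H level), and the returned list stands in for the signals on lines 21 and 22:

```python
def monitoring_pass(counts, lower_limit, interrupt_and_check, masks):
    """One pass of the FIG. 3 flow over CPUs 0..len(counts)-1.

    counts: per-CPU bus request counts for the last interval.
    interrupt_and_check(n): hypothetical stand-in for steps S106-S110;
        returns True if CPU n gave a normal response in time.
    masks: mask signals 41 as booleans (True = H level), updated in place.
    Returns the CPUs newly judged faulty (to be reported on lines 21 and 22).
    """
    faulty = []
    for n, count in enumerate(counts):   # S102, S104, S105: N = 0, 1, 2, ...
        if count >= lower_limit:         # S103; N: bus use looks normal
            continue
        if interrupt_and_check(n):       # S106-S110: interrupt and verify
            continue                     # normal response: CPU is healthy
        masks[n] = False                 # S111: drive mask signal 41_n low
        faulty.append(n)                 # S111: error display / notification
    return faulty


masks = [True, True, True, True]
replies = {1: True, 3: False}            # CPU 1 answers normally, CPU 3 does not
print(monitoring_pass([120, 4, 95, 0], 10, lambda n: replies[n], masks))  # [3]
print(masks)                             # [True, True, True, False]
```

Iterating with `enumerate` instead of the explicit N = 3 test of step S104 lets the same loop cover any number of CPUs on the bus, and only CPUs that are already suspect are ever interrupted.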

[0046] Modification

[0047] FIG. 4 shows the contents of the condition setting register used in a modification of the monitoring device of the present invention. In this modification, instead of the lower limit setting register 33 of the preceding embodiment, a condition setting register 71 is provided for each CPU to be monitored. The condition setting register 71 has a 16-bit configuration: the most significant bit "C" forms a condition bit 72, and the remaining 15 bits represent a condition value 73.

[0048] When the condition bit 72 is "0", the CPU is judged to be abnormal if its count is less than the value indicated by the condition value 73, and the interrupt processing described in the preceding embodiment is performed. When the condition bit 72 is "1", the CPU is judged to be abnormal if its count exceeds the value indicated by the condition value 73, and the same interrupt processing is performed.
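The 16-bit condition setting register 71 described above can be decoded with two bit masks. A Python sketch; the helper names are hypothetical, and bit 15 is taken as the condition bit "C" per the description of FIG. 4:

```python
COND_BIT = 1 << 15          # condition bit 72 ("C"), the most significant bit
VALUE_MASK = COND_BIT - 1   # condition value 73, the lower 15 bits


def make_condition_register(threshold, abnormal_if_above):
    """Pack a threshold and a direction into one 16-bit register value."""
    if not 0 <= threshold <= VALUE_MASK:
        raise ValueError("threshold must fit in 15 bits")
    return (COND_BIT if abnormal_if_above else 0) | threshold


def is_suspect(count, register):
    """Apply the rule of FIG. 4 to one CPU's bus request count."""
    threshold = register & VALUE_MASK
    if register & COND_BIT:       # C = 1: suspect when count exceeds the value
        return count > threshold
    return count < threshold      # C = 0: suspect when count is below the value


low = make_condition_register(10, abnormal_if_above=False)   # 0x000A
high = make_condition_register(500, abnormal_if_above=True)  # 0x81F4
print(is_suspect(4, low), is_suspect(501, high), is_suspect(500, high))
# True True False
```

A CPU stuck in a tight loop that hammers the bus is caught by a C = 1 register, while a CPU that has stalled is caught by a C = 0 register.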

[0049] In other words, in this modification the reference value used to infer an abnormality from the number of bus accesses within a fixed time can be set freely for each monitored CPU, and the condition bit 72 selects whether an abnormality is inferred when the count is larger or smaller than that reference value. This allows finer-grained monitoring of each CPU.

[0050] Although the embodiment above has been described using a computer system in which a plurality of CPUs are connected to a common bus, the present invention can of course also be applied to a monitoring device that monitors a single CPU. Furthermore, in the embodiment and the modification, the bus use requests made within a fixed time are counted and compared with a predetermined value; however, the frequency of bus use requests may instead be obtained by another method, such as a ratio, and compared with a predetermined reference value or reference range.
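As one reading of the ratio-based alternative, the sketch below expresses the frequency as the fraction of observed bus cycles in which the CPU requested the bus and checks it against an inclusive reference range. Both that definition of the ratio and the function name are assumptions for illustration:

```python
def within_reference_range(requests, bus_cycles, low, high):
    """Sketch: frequency as the fraction of bus cycles in which the CPU
    requested the bus (an assumed definition of "ratio"), checked against
    an inclusive reference range [low, high]."""
    if bus_cycles <= 0:
        raise ValueError("bus_cycles must be positive")
    ratio = requests / bus_cycles
    return low <= ratio <= high


print(within_reference_range(30, 1000, 0.01, 0.20))   # True  (ratio 0.03)
print(within_reference_range(400, 1000, 0.01, 0.20))  # False (ratio 0.40)
```

Normalising by the observation window means the same range can be reused if the interval length changes, which a raw count threshold does not allow.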

[0051] Also, in the embodiment and the modification, when such a comparison suggests that a CPU is abnormal, that CPU is additionally interrupted and made to respond, in order to determine whether it is operating normally. It goes without saying, however, that this reconfirmation step can be omitted and the CPU judged to be abnormal directly.

[0052] In particular, when it can be determined how often a CPU needs to access the bus in a given time period, rewriting the value of the lower limit setting register 33 (FIG. 2) or the condition setting register 71 frequently to suit each time period makes it possible to judge with sufficient accuracy whether the CPU is abnormal.
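A sketch of the time-period idea: a hypothetical schedule table from which the control software would pick the value to write into the lower limit setting register 33. The time slots and limits are invented placeholders, not values from the source:

```python
from datetime import time

# Hypothetical schedule: (start of time slot, lower limit for that slot),
# sorted by start time. The numbers are placeholders, not source values.
SCHEDULE = [
    (time(0, 0), 5),     # night: little bus traffic expected
    (time(8, 0), 50),    # daytime: heavy traffic expected
    (time(20, 0), 20),   # evening
]


def lower_limit_for(now, schedule=SCHEDULE):
    """Return the value to write into lower limit setting register 33
    for the time slot that contains `now`."""
    limit = schedule[0][1]
    for start, slot_limit in schedule:
        if now >= start:
            limit = slot_limit
    return limit


print(lower_limit_for(time(9, 30)), lower_limit_for(time(21, 0)))  # 50 20
```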

[0053]

[Effects of the Invention] As described above, in the invention according to claim 1, the frequency with which the CPU uses the bus is measured, and if that frequency is abnormal, the CPU is judged to be abnormal. Therefore, whether the CPU is normal can be determined without imposing any load on the CPU.

[0054] According to the invention of claim 2, the frequency with which the CPU uses the bus is measured, and if it is abnormal, the CPU is further interrupted to determine whether it responds normally; the final decision on whether the CPU has failed is made on that basis, so an accurate judgment can be made. Moreover, since the CPU is interrupted only when an abnormality is suspected, the monitoring does not place an excessive burden on the CPU.

[0055] Further, according to the invention of claim 3, a plurality of CPUs sharing a bus can be monitored and the presence or absence of an abnormality determined without placing an excessive load on these CPUs. Moreover, since the result is output externally when an abnormality is detected, outputting it to a display, for example, tells the operator which CPU is abnormal. Notifying a host device allows measures to be taken so that the cooperative work of the plurality of CPUs is not disrupted.

[0056] According to the invention of claim 4, the bus use frequency is measured for each of the plurality of CPUs, any CPU suspected to be abnormal is checked by means of an interrupt, and the bus use requests of any CPU finally judged to be abnormal are masked from the bus arbitration means, so that a malfunction of the system as a whole can be prevented before it occurs.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a computer system using the monitoring device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the CPU monitoring device in this embodiment.

FIG. 3 is a flowchart showing how the control circuit monitors and controls each CPU.

FIG. 4 is an explanatory diagram showing the contents of the condition setting register in a modification of the present invention.

[Explanation of symbols]

11 ... CPU monitoring device (monitoring device)
12 ... System bus
13₀ to 13₃ ... CPU
14 ... Shared memory
17 ... Interrupt signal line
18 ... Bus request line
19 ... Bus use permission line
21 ... Error display signal line
22 ... Error notification signal line
23 ... Error display device
31 ... Bus request counter
32 ... Interval timer
33 ... Lower limit setting register
34 ... Control circuit
35 ... Response timer
37 ... Response register
38 ... Bus arbitration circuit
39 ... AND gate
55 ... Counter
64 ... Register
71 ... Condition setting register
72 ... Condition bit
73 ... Condition value

Claims

1. A monitoring device comprising: frequency measuring means for measuring the frequency with which a central processing unit uses a bus within a fixed time; comparing means for comparing the value measured by the frequency measuring means with a predetermined allowable range; and abnormality detecting means for detecting an abnormality of the central processing unit when the comparing means determines that the frequency does not fall within the predetermined allowable range.

2. A monitoring device comprising: frequency measuring means for measuring the frequency with which a central processing unit uses a bus within a fixed time; comparing means for comparing the value measured by the frequency measuring means with a predetermined allowable range; interrupt signal transmitting means for transmitting to the central processing unit, when the comparing means determines that the frequency does not fall within the predetermined allowable range, an interrupt signal that causes the central processing unit to output a predetermined response; and abnormality detecting means for detecting an abnormality of the central processing unit unless the central processing unit makes a normal response within a predetermined time after the interrupt signal is transmitted.

3. A monitoring device comprising: frequency measuring means for individually measuring, for each of a plurality of central processing units sharing a bus, the frequency with which that central processing unit uses the bus within a fixed time; comparing means for comparing the value measured for each central processing unit by the frequency measuring means with a predetermined allowable range; interrupt signal transmitting means for transmitting, to any central processing unit whose frequency the comparing means determines does not fall within the predetermined allowable range, an interrupt signal that causes it to output a predetermined response; abnormality detecting means for detecting an abnormality of that central processing unit unless it makes a normal response within a predetermined time after the interrupt signal is transmitted; and detection result output means for outputting the detection result of the abnormality detecting means to the outside.

4. A monitoring device comprising: frequency measuring means for individually measuring the frequency with which each of a plurality of central processing units sharing a bus uses the bus; comparing means for comparing the value measured for each central processing unit by the frequency measuring means with a predetermined allowable range; interrupt signal transmitting means for transmitting, to any central processing unit whose frequency the comparing means determines does not fall within the predetermined allowable range, an interrupt signal that causes it to output a predetermined response; abnormality detecting means for detecting an abnormality of that central processing unit unless it makes a normal response within a predetermined time after the interrupt signal is transmitted; bus arbitration means for arbitrating the bus use requests of the plurality of central processing units; and bus request masking means for masking a bus use request made by a central processing unit in which the abnormality detecting means has detected an abnormality, so that the request does not reach the bus arbitration means.