JP2001325242A

JP2001325242A - Monitoring system for multiple cpu system

Info

Publication number: JP2001325242A
Application number: JP2000141208A
Authority: JP
Inventors: Susumu Moriya; 進森谷; Mitsuhiro Watanabe; 充洋渡邉
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 2000-05-15
Filing date: 2000-05-15
Publication date: 2001-11-22
Anticipated expiration: 2020-05-15
Also published as: JP4126849B2

Abstract

PROBLEM TO BE SOLVED: To address the lack of any countermeasure when a CPU unit is stopped by a temporary fault in a multiple CPU system in which the individual CPU units monitor one another by the presence/absence of transmission signals from the other CPU units. SOLUTION: The individual CPU units #0 and #N are each provided with a timer TIM that forcibly resets and reactivates the CPU of the respective CPU unit when a transmission signal TX is not generated within a set time limit, and the reset prevents stoppage caused by the temporary fault. Also, the CPU units are each provided with a timer that forcibly resets and reactivates the CPU of the present CPU unit when the transmission signal is not generated within the set time limit and also when a specific code is not received from the other CPU unit.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a monitoring method for a multi-CPU system in which CPU units are connected to one another by a serial bus.

[0002]

2. Description of the Related Art

FIG. 3 shows the configuration of the main part of a multi-CPU system, in which CPU units #0 to #3 are connected to one another via a serial bus to build, for example, a distributed processing system across the CPU units. In this system configuration, the failure of even one of the CPU units impairs the function of the entire system, so the operation of each of the CPU units #0 to #3 must be monitored.

[0003] In this monitoring method, each CPU unit monitors whether the other CPU units are normal or abnormal according to the presence or absence of information exchanged between the CPU units over the serial bus.

[0004] For example, CPU unit #0 judges the soundness of CPU unit #1 from the information that CPU unit #1 sends periodically. If the information from CPU unit #1 is interrupted for any reason, CPU unit #0 recognizes CPU unit #1 as abnormal and generates an abnormality monitoring output.

[0005] In this way, each CPU unit monitors the other CPU units by the presence or absence of information exchanged between the CPU units.
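The conventional monitoring described in [0003] to [0005] can be illustrated with a minimal sketch. The class name `PeerMonitor`, the use of seconds as the time unit, and the method names are assumptions made for this illustration; the patent does not prescribe any software structure.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Tracks when each peer CPU unit was last heard on the serial bus."""
    monitoring_time: float                      # assumed unit: seconds
    last_seen: dict = field(default_factory=dict)

    def on_receive(self, peer: str, now: float) -> None:
        # Any information received from a peer counts as a sign of life.
        self.last_seen[peer] = now

    def abnormal_peers(self, now: float) -> list:
        # A peer is judged abnormal once it has been silent
        # for longer than the monitoring time.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.monitoring_time)


monitor = PeerMonitor(monitoring_time=1.0)
monitor.on_receive("#1", now=0.0)
monitor.on_receive("#2", now=0.0)
monitor.on_receive("#2", now=0.9)
print(monitor.abnormal_peers(now=1.5))   # "#1" has been silent for 1.5 s
```

In this sketch a unit that merely pauses longer than `monitoring_time` is treated the same as a unit that has failed permanently, which is the weakness the invention addresses.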

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

A CPU unit may be halted by a transient fault in hardware or software, including one caused by external noise. In this case, no information is sent within the set time to another CPU unit #Y that is monitoring the CPU unit #X concerned, so CPU unit #Y recognizes CPU unit #X as abnormal, which may bring the system down.

[0007] An object of the present invention is to provide a monitoring method for a multi-CPU system that can prevent the system from going down when a CPU unit is stopped by a transient fault.

[0008]

SUMMARY OF THE INVENTION

The present invention focuses on the fact that when a CPU unit stops due to a transient fault, it can in many cases be restored to normal operation by restarting it. When a CPU unit stops due to a transient fault, the faulty CPU unit automatically either forcibly resets its own CPU, or forcibly resets its CPU on a logical-AND condition with a reception from another CPU unit. This reset restarts the faulty CPU unit itself and thereby prevents the multi-CPU system from going down. The invention is characterized by the following methods.

[0009] In a monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit judges that another CPU unit has failed when it does not receive a transmission signal from that CPU unit within a monitoring time, each CPU unit is provided with a timer that forcibly resets and restarts the CPU of its own CPU unit when its transmission signal is not generated within a set time limit.

[0010] Further, in a monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit judges that another CPU unit has failed when it does not receive a transmission signal from that CPU unit within a monitoring time, each CPU unit is provided with a timer that forcibly resets and restarts the CPU of its own CPU unit when its transmission signal is not generated within a set time limit and when no special code is received from another CPU unit.

[0011]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a configuration diagram of the main part of a CPU unit according to an embodiment of the present invention. Each of the CPU units #0 and #N outputs the transmission signal TX to the serial bus through the transmission buffer BUFT, and receives the transmission signal from another CPU unit, as the signal RX, through the reception buffer BUFR.

[0012] Here, each of the CPU units #0 and #1 is provided with a timer TIM that is retriggered, restarting its count, by the transmission signal TX that is input to the transmission buffer BUFT.

[0013] The timer TIM is reset when the transmission signal TX is generated within the set time limit and starts counting again from the moment of that reset, so it repeats the reset and the count for as long as the transmission signal TX is generated within the set time limit. When the transmission signal TX is not generated within the set time limit, the timer produces a time-up output.

[0014] The time-up output of the timer TIM serves as a signal that forcibly resets the CPU in the own CPU unit and restarts the own CPU unit.
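The behavior of the timer TIM in [0012] to [0014] can be modeled in a few lines. This is only a software sketch under stated assumptions: the patent describes TIM as hardware, and the class name `RetriggerableTimer`, the tick-based time unit, and the `tick` method are hypothetical names introduced for illustration.

```python
class RetriggerableTimer:
    """Software model of the timer TIM (the patent specifies hardware)."""

    def __init__(self, time_limit: int) -> None:
        self.time_limit = time_limit   # set time limit, in ticks (assumed unit)
        self.elapsed = 0

    def tick(self, tx_generated: bool) -> bool:
        """Advance one tick; return True when the time-up output fires."""
        if tx_generated:
            self.elapsed = 0           # TX resets the timer; counting restarts
            return False
        self.elapsed += 1
        return self.elapsed >= self.time_limit


tim = RetriggerableTimer(time_limit=3)
# TX appears on ticks 0 and 2, then stops (the CPU is assumed to have hung).
tx_pattern = [True, False, True, False, False, False]
time_up = [tim.tick(tx) for tx in tx_pattern]
print(time_up)   # only the last tick reaches the time limit
```

In a real unit the `True` result would drive the CPU reset line rather than being returned to software, since the timer must keep working while the CPU is stopped.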

[0015] The time limit of the timer TIM is set shorter than the monitoring time set in the other CPU units. The timer TIM has a hardware configuration that keeps functioning even when the CPU or other parts of the CPU unit stop operating.

[0016] In a system in which such a timer TIM is provided in each CPU unit, each of the CPU units #0 and #N initializes its internal state at startup, also initializes the timer TIM, and starts processing. When processing starts, each CPU unit begins monitoring whether the other CPU units are normal or abnormal according to the presence or absence of information exchanged between the CPU units over the serial bus, and the timer TIM in the own CPU unit also starts counting.

[0017] In this processing state, suppose a transient fault occurs in a CPU unit and generation of its transmission signal TX stops. When the stoppage reaches the time limit of the timer TIM while still within the monitoring time of the other CPU units, the timer TIM forcibly resets the CPU and restarts the own CPU unit. If a transient fault was the cause of the stoppage, this restart returns the unit to normal operation.

[0018] If the fault is one from which the CPU unit cannot return to normal operation even after restarting, the other CPU units detect it as a failure when their monitoring time elapses.
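The two outcomes in [0017] and [0018] can be compared with a small timing sketch. The function name and the `restart_time` parameter are assumptions added for illustration; the patent does not state how long a restart takes, only that the time limit of TIM is shorter than the monitoring time.

```python
def peer_declares_failure(timer_limit: int, monitoring_time: int,
                          restart_time: int, recovers_on_restart: bool) -> bool:
    """Return True if a peer unit sees TX silence longer than monitoring_time.

    restart_time is a hypothetical parameter: the ticks between the forced
    reset and the first TX after the restart.
    """
    if not recovers_on_restart:
        return True                    # permanent fault: TX never resumes
    # TX stays silent until TIM times up and the unit finishes restarting.
    silence = timer_limit + restart_time
    return silence > monitoring_time


# Transient fault, TIM limit well inside the monitoring time: no failure seen.
print(peer_declares_failure(3, 10, 2, recovers_on_restart=True))
# Fault that survives the restart: the peers detect it as before.
print(peer_declares_failure(3, 10, 2, recovers_on_restart=False))
```

Under these assumptions, the margin between the TIM limit and the monitoring time has to cover the restart itself, which is one reason for setting the limit shorter than the monitoring time.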

[0019] The timer TIM is not limited to generating a single forced reset signal. By setting its time limit to a fraction of the monitoring time of the other CPU units, it can generate the forced reset signal multiple times while the transmission signal is stopped. In this case, the timer TIM is configured to reset itself when it generates a forced reset signal.
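As a rough illustration of [0019], the following sketch counts how many forced reset signals a self-resetting TIM would issue while TX stays silent for one full monitoring time. The function name and the tick-based units are assumptions, and the model deliberately ignores the time each restart takes.

```python
def forced_resets_before_detection(timer_limit: int, monitoring_time: int) -> int:
    """Count time-up outputs of a self-resetting TIM while TX stays silent."""
    resets = 0
    elapsed = 0
    for _ in range(monitoring_time):   # ticks of silence until the peers give up
        elapsed += 1
        if elapsed >= timer_limit:
            resets += 1
            elapsed = 0                # TIM resets itself on issuing the reset
    return resets


# A limit of 3 ticks against a monitoring time of 10 ticks
# gives reset attempts at ticks 3, 6 and 9.
print(forced_resets_before_detection(timer_limit=3, monitoring_time=10))
```

With the limit equal to the monitoring time the same function yields a single attempt, which corresponds to the single forced reset signal mentioned at the start of the paragraph.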

[0020] The time limit of the timer TIM can also be set longer than the monitoring time of the other CPU units. In this case, the other CPU units recognize the failure first, so they are configured to generate failure information only after recognizing the failure multiple times.
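The variation in [0020] can be sketched as a detector on the monitoring side that issues failure information only after several recognitions. The class name, the `required_detections` threshold, and the choice to count consecutive silent monitoring periods are all assumptions for this example; the patent says only that failure information is generated after multiple recognitions.

```python
class DebouncedDetector:
    """Issues failure information only after repeated failure recognitions."""

    def __init__(self, required_detections: int) -> None:
        self.required = required_detections   # hypothetical threshold
        self.count = 0

    def on_monitoring_period(self, tx_received: bool) -> bool:
        """Call once per monitoring time; True means failure info is issued."""
        if tx_received:
            self.count = 0             # the peer is alive again: start over
            return False
        self.count += 1
        return self.count >= self.required


detector = DebouncedDetector(required_detections=3)
# Two silent periods, a recovery after a forced reset, then three silent periods.
periods = [False, False, True, False, False, False]
issued = [detector.on_monitoring_period(rx) for rx in periods]
print(issued)   # failure information appears only at the final period
```

The first two silent periods do not raise failure information, which leaves room for a TIM with a longer time limit to restart the stopped unit.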

[0021] FIG. 2 is a configuration diagram of the main part of a CPU unit according to another embodiment of the present invention. It differs from FIG. 1 in that the reception signal RX carrying a special code from another CPU unit is added to the reset-signal generation condition of the timer TIM.

[0022] In this configuration, the CPU is not forcibly reset merely because the stoppage of the transmission signal TX exceeds the time limit of the timer TIM. The forced reset is generated on the AND (logical product) condition that a special code has also been received from another CPU unit.

[0023] With this configuration, transmission of the special code by another CPU unit becomes a condition, which enables a restart in cooperation with the other CPU unit and prevents unnecessary forced resets caused by malfunction of the timer TIM.
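The gating described in [0022] and [0023] reduces to a two-input AND. The sketch below follows the wording of [0022], where the reset requires that the special code has been received; the function and variable names are assumptions for illustration.

```python
from itertools import product


def forced_reset(timer_time_up: bool, special_code_received: bool) -> bool:
    """AND condition of [0022]: both inputs must be true to reset the CPU."""
    return timer_time_up and special_code_received


# Enumerate all four input combinations.
truth_table = {(t, c): forced_reset(t, c)
               for t, c in product([False, True], repeat=2)}
for (t, c), reset in truth_table.items():
    print(f"time_up={t!s:5} special_code={c!s:5} -> reset={reset}")
```

A spurious time-up output alone leaves the CPU running, which matches the stated aim of preventing unnecessary forced resets caused by a timer malfunction.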

[0024]

EFFECT OF THE INVENTION

As described above, according to the present invention, a faulty CPU unit whose transmission signal has stopped automatically either forcibly resets its own CPU, or forcibly resets its CPU on a logical-AND condition with a reception from another CPU unit, and this reset restarts the faulty CPU unit itself. The system can therefore be prevented from going down when a CPU unit is stopped by a transient fault.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the main part of a CPU unit according to an embodiment of the present invention.

FIG. 2 is a configuration diagram of the main part of a CPU unit according to another embodiment of the present invention.

FIG. 3 is a configuration example of a multi-CPU system.

[Explanation of symbols]

#0 to #3, #N: CPU unit
BUFT: transmission buffer
BUFR: reception buffer
TIM: timer

Continued from the front page. F-terms (reference): 5B042 GA11 JJ04 JJ15 JJ19 KK02; 5B045 BB12 BB28 HH04 JJ05 JJ45; 5B083 AA05 BB01 CC09 CD07 CE01 DD11 EE02 EE11 EF01 GG04

Claims

1. A monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit judges that another CPU unit has failed when it does not receive a transmission signal from that CPU unit within a monitoring time, characterized in that each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when its transmission signal is not generated within a set time limit.

2. A monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit judges that another CPU unit has failed when it does not receive a transmission signal from that CPU unit within a monitoring time, characterized in that each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when its transmission signal is not generated within a set time limit and when no special code is received from another CPU unit.