JP2001325242A - Monitoring system for multiple cpu system - Google Patents

Monitoring system for multiple cpu system

Info

Publication number
JP2001325242A
JP2001325242A JP2000141208A JP2000141208A JP2001325242A JP 2001325242 A JP2001325242 A JP 2001325242A JP 2000141208 A JP2000141208 A JP 2000141208A JP 2000141208 A JP2000141208 A JP 2000141208A JP 2001325242 A JP2001325242 A JP 2001325242A
Authority
JP
Japan
Prior art keywords
cpu
unit
cpu unit
transmission signal
units
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2000141208A
Other languages
Japanese (ja)
Other versions
JP4126849B2 (en)
Inventor
Susumu Moriya
進 森谷
Mitsuhiro Watanabe
充洋 渡邉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd
Priority to JP2000141208A (patent JP4126849B2)
Publication of JP2001325242A
Application granted
Publication of JP4126849B2
Anticipated expiration
Expired - Lifetime (current legal status)

Landscapes

  • Retry When Errors Occur (AREA)
  • Debugging And Monitoring (AREA)

Abstract

PROBLEM TO BE SOLVED: In a multi-CPU system in which each CPU unit monitors the others by the presence or absence of their transmission signals, there is no countermeasure when a CPU unit stops because of a transient fault. SOLUTION: Each of CPU units #0 and #N is provided with a timer TIM that forcibly resets and restarts the CPU of its own CPU unit when a transmission signal TX is not generated within a set time limit, so that the reset prevents a stoppage caused by a transient fault. Alternatively, each CPU unit is provided with a timer that forcibly resets and restarts the CPU of its own CPU unit when the transmission signal is not generated within the set time limit and a specific code is not received from another CPU unit.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a monitoring method for a multi-CPU system in which CPU units are connected by a serial bus.

[0002]

[Prior Art] FIG. 3 shows the main configuration of a multi-CPU system, in which CPU units #0 to #3 are connected to one another by a serial bus to build, for example, a distributed processing system across the CPU units. In this configuration, the failure of even one CPU unit impairs the function of the whole system, so the operation of each of CPU units #0 to #3 must be monitored.

[0003] In this monitoring method, each CPU unit monitors whether the other CPU units are normal or abnormal based on the presence or absence of information exchanged between the CPU units over the serial bus.

[0004] For example, CPU unit #0 judges the health of CPU unit #1 from the information that CPU unit #1 sends periodically. If the information from CPU unit #1 is interrupted for any reason, CPU unit #0 recognizes CPU unit #1 as abnormal and generates an abnormality monitoring output.
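
The conventional monitoring described in [0003] and [0004] can be sketched as follows. This is an illustrative Python sketch only and is not part of the patent: the class name PeerMonitor, the integer tick time base, and the parameter monitor_time are assumptions introduced for the example.

```python
class PeerMonitor:
    """Hypothetical model of one CPU unit watching one peer over the bus."""

    def __init__(self, monitor_time: int, now: int = 0) -> None:
        self.monitor_time = monitor_time  # monitoring time, in ticks (assumed unit)
        self.last_rx = now                # tick of the most recent message from the peer

    def on_receive(self, now: int) -> None:
        # Any periodic message from the peer counts as a sign of life.
        self.last_rx = now

    def is_abnormal(self, now: int) -> bool:
        # The peer is judged abnormal once nothing has arrived
        # for longer than the monitoring time.
        return now - self.last_rx > self.monitor_time


monitor = PeerMonitor(monitor_time=10)
monitor.on_receive(now=5)
print(monitor.is_abnormal(now=12))  # still within the monitoring time
print(monitor.is_abnormal(now=16))  # silence has exceeded the monitoring time
```

Note that this scheme alone cannot distinguish a transient stoppage from a permanent failure, which is the problem the invention addresses.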

[0005] In this way, each CPU unit monitors the other CPU units by the presence or absence of information exchanged between the CPU units.

[0006]

[Problem to Be Solved by the Invention] A CPU unit may stop because of a transient hardware or software fault, including one caused by external noise. In that case, no information is sent within the set time to the other CPU unit #Y that monitors the stopped CPU unit #X, so CPU unit #Y recognizes CPU unit #X as abnormal, which may bring the system down.

[0007] An object of the present invention is to provide a monitoring method for a multi-CPU system that can prevent the system from going down when a CPU unit stops because of a transient fault.

[0008]

[Means for Solving the Problem] The present invention focuses on the fact that, when a CPU unit stops because of a transient fault, restarting it restores normal operation in many cases. When a CPU unit stops because of a transient fault, the faulty CPU unit automatically forces a reset of its own CPU, either unconditionally or with reception from another CPU unit as an AND condition, and this reset restarts the faulty CPU unit itself, so that the multi-CPU system is prevented from going down. The invention is characterized by the following methods.

[0009] In a monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit treats another CPU unit as faulty when it does not receive a transmission signal from that CPU unit within a monitoring time, each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when a transmission signal is not generated within a set time limit.

[0010] Further, in a monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit treats another CPU unit as faulty when it does not receive a transmission signal from that CPU unit within a monitoring time, each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when a transmission signal is not generated within a set time limit and a special code is not received from another CPU unit.

[0011]

[Embodiments of the Invention] FIG. 1 is a configuration diagram of the main part of a CPU unit showing an embodiment of the present invention. Each of CPU units #0 and #N outputs a transmission signal TX to the serial bus through a transmission buffer BUFT, and receives a transmission signal RX from another CPU unit through a reception buffer BUFR.

[0012] Here, each of CPU units #0 and #1 is provided with a timer TIM that restarts its count each time the transmission signal TX, which is the input to the transmission buffer BUFT, occurs.

[0013] The timer TIM is reset when the transmission signal TX occurs within the set time limit, and it starts counting again from that reset. As long as TX occurs within the set time limit, the timer repeats this cycle of reset and counting. If TX does not occur within the set time limit, the timer produces a time-up output.

[0014] The time-up output of the timer TIM serves as a signal that forcibly resets the CPU in the own CPU unit and restarts the own CPU unit.

[0015] The time limit of the timer TIM is set shorter than the monitoring time set in the other CPU units. The timer TIM has a hardware configuration that keeps functioning even when the CPU or other parts of the CPU unit stop operating.

[0016] In a system in which each CPU unit has such a timer TIM, each of CPU units #0 and #N initializes its internal state at startup, initializes the timer TIM as well, and starts processing. When processing starts, each CPU unit begins monitoring whether the other CPU units are normal or abnormal based on the presence or absence of information exchanged between the CPU units over the serial bus, and the timer TIM in the own CPU unit also starts counting.

[0017] In this processing state, suppose a transient fault occurs in a CPU unit and its transmission signal TX stops. When the stoppage reaches the time limit of the timer TIM while still within the monitoring time of the other CPU units, the timer TIM forcibly resets the CPU and restarts the own CPU unit. If a transient fault caused the stoppage, this restart returns the unit to normal operation.

[0018] If the fault is one from which the CPU unit cannot return to normal operation even after the restart, the other CPU units detect it as a fault when their monitoring time elapses.
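
The two outcomes in [0017] and [0018] can be illustrated with a small tick-based simulation. It is a sketch under stated assumptions, not the patented circuit: the function simulate and all of its parameters are hypothetical, a healthy unit is assumed to transmit on every tick, a restart is assumed to take restart_ticks ticks, and the timer is assumed to restart its count after firing.

```python
def simulate(stop_at, timer_limit, restart_ticks, monitor_time, horizon,
             restart_fixes=True):
    """Return (ticks at which TIM forced a reset, tick of the peer alarm or None)."""
    healthy = True
    tim = 0          # ticks since the unit's last TX (timer TIM)
    since_rx = 0     # ticks since the peer last received a TX
    restarting = 0   # remaining restart ticks, 0 when no restart is in progress
    resets, alarm = [], None
    for t in range(1, horizon + 1):
        if t == stop_at:
            healthy = False              # the fault halts the CPU, TX stops
        if restarting > 0:
            restarting -= 1
            if restarting == 0 and restart_fixes:
                healthy = True           # transient fault cleared by the restart
        if healthy:
            tim = 0                      # TX occurs: TIM is reset
            since_rx = 0                 # and the peer sees the message
        else:
            tim += 1
            since_rx += 1
            if tim >= timer_limit and restarting == 0:
                resets.append(t)         # time-up: forced reset of the own CPU
                tim = 0
                restarting = restart_ticks
        if alarm is None and since_rx > monitor_time:
            alarm = t                    # the peer declares the unit faulty
    return resets, alarm


# Transient fault: one reset, the unit recovers, the peer never raises an alarm.
print(simulate(stop_at=5, timer_limit=3, restart_ticks=2, monitor_time=10, horizon=20))
# Permanent fault: resets do not help, so the peer eventually raises an alarm.
print(simulate(stop_at=5, timer_limit=3, restart_ticks=2, monitor_time=10, horizon=20,
               restart_fixes=False))
```

Under these assumptions, recovery stays invisible to the peer only if the timer limit plus the restart time fits inside the monitoring time, which is consistent with the requirement in [0015] that the timer limit be the shorter of the two.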

[0019] The timer TIM is not limited to generating a single forced reset signal. By setting its time limit to a fraction of the monitoring time of the other CPU units, it can generate forced reset signals several times while the transmission signal is stopped. In this case, the timer TIM is configured to reset itself when it generates a forced reset signal.
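
As a rough worked example of [0019], the instants at which a self-resetting timer would fire before the peer's monitoring time expires can be listed directly. The function reset_times is hypothetical; the sketch assumes that times are counted in ticks from the last TX, that TX stays stopped, and that the restart itself takes negligible time.

```python
def reset_times(timer_limit: int, monitor_time: int) -> list[int]:
    """Ticks (measured from the last TX) at which a self-resetting TIM
    fires strictly before the peer's monitoring time is reached."""
    return list(range(timer_limit, monitor_time, timer_limit))


# With the time limit set to one quarter of the monitoring time,
# three forced resets fit in before the peer would declare a fault.
print(reset_times(timer_limit=3, monitor_time=12))
# With the time limit set to one third, two forced resets fit in.
print(reset_times(timer_limit=4, monitor_time=12))
```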

[0020] The time limit of the timer TIM can also be set longer than the monitoring time of the other CPU units. In this case the other CPU units recognize the fault first, but they are configured to issue fault information only after recognizing the fault several times.
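
The peer-side behavior in [0020] amounts to counting recognitions before issuing fault information. The sketch below is illustrative only: the class RepeatedRecognitionMonitor and its parameter required are hypothetical, and it assumes that the recognitions must be consecutive, which the description does not state.

```python
class RepeatedRecognitionMonitor:
    """Hypothetical peer-side monitor that issues fault information only
    after the fault has been recognized `required` times in a row."""

    def __init__(self, required: int) -> None:
        self.required = required
        self.count = 0  # consecutive monitoring periods without a message

    def on_monitor_period(self, received: bool) -> bool:
        # Call once per monitoring period. A received message clears the
        # count (assumption); a missed period counts as one recognition.
        self.count = 0 if received else self.count + 1
        return self.count >= self.required


mon = RepeatedRecognitionMonitor(required=3)
history = [False, False, True, False, False, False]
print([mon.on_monitor_period(r) for r in history])
```

With this arrangement, the monitored unit has several monitoring periods in which its own timer can fire and restart it before the peer reports a fault.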

[0021] FIG. 2 is a configuration diagram of the main part of a CPU unit showing another embodiment of the present invention. It differs from FIG. 1 in that the condition for the timer TIM to generate the reset signal includes the reception signal RX carrying a special code from another CPU unit.

[0022] In this configuration, the CPU is not forcibly reset merely because the stoppage of the transmission signal TX exceeds the time limit of the timer TIM. The forced reset is generated with reception of the special code from another CPU unit as an AND (logical product) condition.
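
The gating described in [0022] reduces to a logical AND of two conditions. The following sketch follows that paragraph; the function name forced_reset and its boolean parameters are assumptions made for illustration.

```python
def forced_reset(time_up: bool, special_code_received: bool) -> bool:
    """Forced reset of the own CPU, following [0022]: the timer TIM has
    timed out AND a special code has been received from another CPU unit."""
    return time_up and special_code_received


# Truth table of the AND condition.
for time_up in (False, True):
    for code in (False, True):
        print(time_up, code, forced_reset(time_up, code))
```

A time-up output alone therefore does not reset the CPU, which is how a spurious time-up caused by a timer malfunction is filtered out.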

[0023] With this configuration, transmission of the special code by another CPU unit becomes a condition, which enables a restart in cooperation with the other CPU unit and prevents unnecessary forced resets caused by malfunction of the timer TIM.

[0024]

[Effects of the Invention] As described above, according to the present invention, a faulty CPU unit whose transmission signal has stopped automatically forces a reset of its own CPU, either unconditionally or with reception from another CPU unit as an AND condition, and this reset restarts the faulty CPU unit itself. This prevents the system from going down when a CPU unit stops because of a transient fault.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of the main part of a CPU unit showing an embodiment of the present invention.

FIG. 2 is a configuration diagram of the main part of a CPU unit showing another embodiment of the present invention.

FIG. 3 is a configuration example of a multi-CPU system.

[Explanation of Symbols]

#0 to #3, #N: CPU units; BUFT: transmission buffer; BUFR: reception buffer; TIM: timer

Continued from front page: F-terms (reference) 5B042 GA11 JJ04 JJ15 JJ19 KK02; 5B045 BB12 BB28 HH04 JJ05 JJ45; 5B083 AA05 BB01 CC09 CD07 CE01 DD11 EE02 EE11 EF01 GG04

Claims (2)

[Claims]
1. A monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit treats another CPU unit as faulty when it does not receive a transmission signal from that CPU unit within a monitoring time, characterized in that each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when a transmission signal is not generated within a set time limit.
2. A monitoring method for a multi-CPU system in which a plurality of CPU units are connected by a serial bus and each CPU unit treats another CPU unit as faulty when it does not receive a transmission signal from that CPU unit within a monitoring time, characterized in that each CPU unit comprises a timer that forcibly resets and restarts the CPU of its own CPU unit when a transmission signal is not generated within a set time limit and a special code is not received from another CPU unit.
JP2000141208A 2000-05-15 2000-05-15 Multi-CPU system monitoring method Expired - Lifetime JP4126849B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2000141208A JP4126849B2 (en) 2000-05-15 2000-05-15 Multi-CPU system monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2000141208A JP4126849B2 (en) 2000-05-15 2000-05-15 Multi-CPU system monitoring method

Publications (2)

Publication Number Publication Date
JP2001325242A true JP2001325242A (en) 2001-11-22
JP4126849B2 JP4126849B2 (en) 2008-07-30

Family

ID=18648355

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2000141208A Expired - Lifetime JP4126849B2 (en) 2000-05-15 2000-05-15 Multi-CPU system monitoring method

Country Status (1)

Country Link
JP (1) JP4126849B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010140361A (en) * 2008-12-12 2010-06-24 Fujitsu Microelectronics Ltd Computer system and abnormality detection circuit
JP2019110410A (en) * 2017-12-18 2019-07-04 株式会社明電舎 Network device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063356B (en) * 2009-11-18 2014-05-21 杭州华三通信技术有限公司 Multi-central processing unit (CPU) heartbeat detection system and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010140361A (en) * 2008-12-12 2010-06-24 Fujitsu Microelectronics Ltd Computer system and abnormality detection circuit
US8700835B2 (en) 2008-12-12 2014-04-15 Fujitsu Semiconductor Limited Computer system and abnormality detection circuit
JP2019110410A (en) * 2017-12-18 2019-07-04 株式会社明電舎 Network device

Also Published As

Publication number Publication date
JP4126849B2 (en) 2008-07-30

Similar Documents

Publication Publication Date Title
US7434085B2 (en) Architecture for high availability using system management mode driven monitoring and communications
US20120131384A1 (en) Computer system
US20040177242A1 (en) Dynamic computer system reset architecture
JP2011128795A (en) Information processor, and recovery method for information processor
CN113630281B (en) BYPASS control method, device, terminal and storage medium
JP2007034479A (en) Operation system device, standby system device, operation/standby system, operation system control method, standby system control method, and operation system/standby system control method
US6526527B1 (en) Single-processor system
CN111371642B (en) Network card fault detection method, device, equipment and storage medium
JP4126849B2 (en) Multi-CPU system monitoring method
JPH06119303A (en) Loose coupling multiprocessor system
JP2004086520A (en) Monitoring control device and its method
JP6654662B2 (en) Server device and server system
CN114942687B (en) Reset safety mechanism based on monitoring, implementation method and reset circuit
JP3266841B2 (en) Communication control device
JP2954040B2 (en) Interrupt monitoring device
JP3652910B2 (en) Device status monitoring method
JP3637510B2 (en) Fault monitoring method and circuit
JPH1078896A (en) Industrial electronic computer
KR20000059718A (en) Nonstop operation method and circuit for plc duplication system
JP4957068B2 (en) Redundant system switching method
JP2005018710A (en) Uninterruptible power supply system and information processing system corresponding to information processor with two or more power input parts
JP2834062B2 (en) Information processing system
JPS58225738A (en) Dispersion type transmission system
JPH09212201A (en) Control circuit for production facility
JPH10143393A (en) Diagnosis and processing device

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060414

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20080117

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20080129

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080328

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20080422

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20080505

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110523

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 4126849

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120523

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130523

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140523

Year of fee payment: 6

EXPY Cancellation because of completion of term