JP2578985B2 - Redundant controller - Google Patents

Redundant controller

Info

Publication number
JP2578985B2
Authority
JP
Japan
Prior art keywords
monitoring
partner
watchdog timer
failure
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP1177028A
Other languages
Japanese (ja)
Other versions
JPH0342943A (en)
Inventor
圭一 大山
聡生 首藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NIPPON DENKI SOFUTOEA KK
NEC Corp
Original Assignee
NIPPON DENKI SOFUTOEA KK
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NIPPON DENKI SOFUTOEA KK, Nippon Electric Co Ltd
Priority to JP1177028A
Publication of JPH0342943A
Application granted
Publication of JP2578985B2
Anticipated expiration
Expired - Lifetime

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a fault monitoring scheme for communication control devices operating in a duplex configuration, and in particular to an improvement of a redundant control device that, owing to the nature of the system, is required to detect faults and recover reliably even when the processing load fluctuates greatly.

[Prior Art] Conventionally, fault monitoring of communication control devices in this kind of duplex configuration has been performed mainly by a watchdog timer, which notifies the own system of a partner-system fault when the check signal from the partner system is interrupted for a fixed period.

[Problems to be Solved by the Invention] With fault monitoring by the watchdog timer or the like described above, there is no opportunity to confirm a fault: when a partner-system fault is signaled, it must be accepted as a real fault. Consequently, if a false trigger occurs because of a hardware fault in the watchdog timer or the like, or because the check signal from the partner system is delayed when the partner system's processing load peaks, the system shifts to fault-handling operation even though it is operating normally.

In addition, because there is only one method of fault detection, a situation can arise in which the main unit has become inoperable as a system but the sub unit is not notified (for example, a hardware fault in the watchdog timer or the like, or a dynamic loop state). In that case the system as a whole falls into a state in which it cannot perform its function.

[Means for Solving the Problems] The redundant control device of the present invention comprises: watchdog timer means for notifying the partner system of a fault in the own system; health check means for notifying the partner system that the own system is operating; loop monitoring means by which the own system monitors itself for faults; inter-system communication state monitoring means for monitoring the state of the communication function between the own system and the partner system; and line switching device monitoring means for monitoring the line switching device that switches lines between the own system and the partner system. The device is characterized in that it judges a fault by combining the abnormality notifications issued by these means with the current state of the own system (whether it is currently the main unit, and whether it has already received notification of another abnormality), and promptly performs recovery processing.

[Embodiment] Next, the present invention will be described with reference to the drawings. FIG. 1 is a configuration diagram of the software and hardware of one embodiment of the present invention, in which one of the communication control devices 10 and 20 in a duplex configuration operates as the main unit and the other as the sub unit.

Reference numeral 1 denotes a watchdog timer, 2 a task that sends a signal to the watchdog timer, and 3 hardware (PIO) that receives interrupts from the partner system's watchdog timer.

If the signal from task 2 of the own system, which sends a signal to the watchdog timer, is interrupted for 1 second or more, watchdog timer 1 sends an interrupt notification to PIO 3 of the partner system. PIO 3 receives not only interrupts from the partner system's watchdog timer 1 but also abnormality interrupts from the line switching device 30. These interrupt notifications are passed to the fault monitoring main task 5 through task 4, which receives interrupt notifications. The fault monitoring main task 5 includes: health check means, which sends health check data to the partner system every 6 seconds; health check monitoring means, which raises a health check timeout when no health check data has arrived from the partner system for 8 seconds; inter-system communication state monitoring means, which monitors the state of communication with the other system; and guard timer means, which waits for a watchdog timer interrupt from the partner system when a health check timeout or an inter-system communication abnormality has been detected. When any of these events occurs, the fault is judged according to the matrix in FIG. 2. Blank cells in FIG. 2 mean that no processing is performed.
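The health check timing described above can be illustrated with a minimal Python sketch. Only the 6-second send interval and the 8-second timeout come from the text; the class `HealthCheckMonitor`, its method names, and the use of plain numeric timestamps are hypothetical choices made for illustration and are not the patent's implementation.

```python
from dataclasses import dataclass

HEALTH_CHECK_SEND_INTERVAL_S = 6.0  # from the text: data is sent every 6 seconds
HEALTH_CHECK_TIMEOUT_S = 8.0        # from the text: timeout after 8 seconds of silence


@dataclass
class HealthCheckMonitor:
    """Hypothetical helper: remembers when partner health check data last arrived."""

    last_received_s: float = 0.0

    def on_health_check(self, now_s: float) -> None:
        # Record the arrival time of health check data from the partner system.
        self.last_received_s = now_s

    def timed_out(self, now_s: float) -> bool:
        # True once the partner has been silent for the full timeout period.
        return now_s - self.last_received_s >= HEALTH_CHECK_TIMEOUT_S


# Slack available to a partner that sends late because of a load peak.
margin_s = HEALTH_CHECK_TIMEOUT_S - HEALTH_CHECK_SEND_INTERVAL_S

monitor = HealthCheckMonitor()
monitor.on_health_check(now_s=6.0)
on_time = monitor.timed_out(now_s=12.0)  # 6 s of silence: next data is only just due
late = monitor.timed_out(now_s=14.0)     # 8 s of silence: health check timeout
```

With these figures the partner has 2 seconds of slack before a timeout is raised, and, as the text explains, a timeout on its own only starts the guard timer rather than declaring a fault.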

Reference numeral 6 denotes a first loop monitoring task, which runs at the highest priority level among all application tasks in the system. Reference numeral 7 denotes a second loop monitoring task, which runs at the lowest priority level among all application tasks. Reference numeral 8 denotes a loop monitoring flag. The second loop monitoring task 7 sets the loop monitoring flag 8 every 0.5 seconds, and the first loop monitoring task 6 resets the loop monitoring flag 8 every 8 seconds. If the loop monitoring flag 8 is already reset at that point, meaning that the lowest-priority task has not run since the previous reset, the system judges that it has fallen into a dynamic loop state and halts itself.
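The loop monitoring mechanism can be sketched as a small tick-based simulation in Python. The 0.5-second set period and the 8-second reset period come from the text. Everything else is an assumption made for illustration: the names `LoopMonitor` and `simulate`, the initial flag value, the modeling of a dynamic loop as the low-priority task no longer running, and letting the high-priority task run first within a tick.

```python
from typing import Optional

TICK_S = 0.5             # the second (lowest-priority) task runs every 0.5 s
CHECK_PERIOD_TICKS = 16  # the first (highest-priority) task runs every 8 s


class LoopMonitor:
    """Hypothetical model of loop monitoring flag 8 and the two tasks that use it."""

    def __init__(self) -> None:
        self.flag = True     # assumption: the flag starts in the set state
        self.halted = False

    def low_priority_tick(self) -> None:
        # Second loop monitoring task: sets the flag whenever it gets to run.
        self.flag = True

    def high_priority_check(self) -> bool:
        # First loop monitoring task: a flag that is still reset means the
        # low-priority task has not run since the previous check.
        if not self.flag:
            self.halted = True
            return True
        self.flag = False
        return False


def simulate(starve_from_tick: int, total_ticks: int) -> Optional[int]:
    """Return the tick at which a loop is detected, or None if none is detected."""
    monitor = LoopMonitor()
    for tick in range(1, total_ticks + 1):
        if tick % CHECK_PERIOD_TICKS == 0 and monitor.high_priority_check():
            return tick
        if tick < starve_from_tick:  # low-priority task is starved from this tick on
            monitor.low_priority_tick()
    return None


healthy = simulate(starve_from_tick=10**9, total_ticks=64)
detected_tick = simulate(starve_from_tick=20, total_ticks=64)
```

In this model, starvation that begins at tick 20 (10 s) is detected at tick 48 (24 s): the check at tick 32 still finds the flag set from before the starvation began, so detection takes between one and two check periods.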

Next, a description is given with reference to FIG. 2, which is a state transition diagram of the device of the present invention.

When the own system is the main unit, it responds to an event by selecting either "sub unit fault" or "own system stop", regardless of the state. A blank cell means that nothing is done, and a hatched cell indicates that the corresponding event cannot occur; that is, the guard timer does not run in the main unit.

When the own system is the sub unit, on the other hand, the cells have the following meanings.

"Move to watchdog timer interrupt pending": Changes the state to "watchdog timer interrupt pending".

"Guard timer start": Sets the guard timer and changes the state to "guard timer running".

"Release watchdog timer interrupt pending": Returns the state to "normal".

"Guard timer cancel": Cancels the guard timer and returns the state to "normal".

The matrix in FIG. 2 applies different partner-fault criteria to the main unit and the sub unit, so that a fault in a shared part of the duplex system, such as the inter-system communication path, does not cause both systems to detect a partner-system fault. Specifically, the criterion by which the sub unit judges a main unit fault is made stricter: two abnormalities must have been detected.

For example, consider the case where the own system is the sub unit and the event is "guard timer timeout". As described above, a health check timeout or an inter-system communication abnormality alone is not enough to conclude that the main unit has failed, so the sub unit sets the guard timer and waits for a watchdog timer interrupt. If no interrupt arrives, the sub unit stops itself, because a sub unit that cannot communicate between systems cannot serve as the sub unit of a duplex system.
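The sub unit behavior described above can be sketched as a small transition table in Python. FIG. 2 itself is not reproduced in this text, so the cell assignments below are assumptions inferred from the prose (two abnormalities are needed to judge a main unit fault, a guard timer timeout leads to own system stop, and health check data clears a pending watchdog interrupt). The state names, the event names, and the rule that received health check data cancels the guard timer are hypothetical.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    NORMAL = auto()
    WDT_INTERRUPT = auto()  # "watchdog timer interrupt pending"
    GUARD_TIMER = auto()    # "guard timer running"
    MAIN_FAULT = auto()     # sub unit concludes that the main unit has failed
    SELF_STOP = auto()      # sub unit stops itself


class Event(Enum):
    WDT_INTERRUPT = auto()
    HEALTH_CHECK_TIMEOUT = auto()
    COMM_ABNORMAL = auto()
    HEALTH_CHECK_RECEIVED = auto()
    GUARD_TIMEOUT = auto()


# Assumed sub unit cells; FIG. 2 may differ in detail.
TRANSITIONS = {
    (State.NORMAL, Event.WDT_INTERRUPT): State.WDT_INTERRUPT,
    (State.NORMAL, Event.HEALTH_CHECK_TIMEOUT): State.GUARD_TIMER,
    (State.NORMAL, Event.COMM_ABNORMAL): State.GUARD_TIMER,
    (State.WDT_INTERRUPT, Event.HEALTH_CHECK_RECEIVED): State.NORMAL,
    (State.WDT_INTERRUPT, Event.HEALTH_CHECK_TIMEOUT): State.MAIN_FAULT,
    (State.WDT_INTERRUPT, Event.COMM_ABNORMAL): State.MAIN_FAULT,
    (State.GUARD_TIMER, Event.WDT_INTERRUPT): State.MAIN_FAULT,
    (State.GUARD_TIMER, Event.HEALTH_CHECK_RECEIVED): State.NORMAL,
    (State.GUARD_TIMER, Event.GUARD_TIMEOUT): State.SELF_STOP,
}


def step(state: State, event: Event) -> State:
    # A missing entry plays the role of a blank cell: nothing is done.
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[Event]) -> State:
    state = State.NORMAL
    for event in events:
        state = step(state, event)
    return state


false_alarm = run([Event.WDT_INTERRUPT, Event.HEALTH_CHECK_RECEIVED])
main_failed = run([Event.HEALTH_CHECK_TIMEOUT, Event.WDT_INTERRUPT])
isolated_sub = run([Event.HEALTH_CHECK_TIMEOUT, Event.GUARD_TIMEOUT])
```

Under these assumptions a single spurious watchdog interrupt returns to the normal state, a health check timeout followed by a watchdog interrupt yields a main unit fault, and a health check timeout followed by a guard timer timeout makes the sub unit stop itself.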

[Effects of the Invention] As described above, the present invention achieves more accurate and faster fault monitoring by combining five fault monitoring means: the watchdog timer, the health check means, the loop monitoring means, the inter-system communication state monitoring means, and the line switching device monitoring means.

Some examples in which the present invention makes fault handling more reliable are given below.

1. If the watchdog timer falsely notifies the sub unit, for example at a peak in the main unit's processing load or because the watchdog timer malfunctions, the next health check data sent to the sub unit clears the "watchdog timer interrupt pending" state, so switching between main unit and sub unit does not occur needlessly.

2. If the main unit or the sub unit falls into a loop state, the loop monitoring task detects this and stops the unit, so the fault is reliably reported to the partner system and a device that can no longer operate as part of the system is promptly removed.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a configuration diagram of the software and hardware of the present invention, and FIG. 2 is a matrix showing the processing performed when an event occurs in the main unit and in the sub unit. 1: watchdog timer, 2: task, 3: PIO, 4: task, 5: fault monitoring main task, 6: first loop monitoring task, 7: second loop monitoring task, 8: loop monitoring flag.

Claims (1)

(57) [Claims] 1. A redundant control device for communication control devices operating in a duplex configuration, characterized in that each communication control device comprises: watchdog timer means for notifying the partner system of a fault in the own system; health check means for notifying the partner system that the own system is operating; loop monitoring means by which the own system monitors itself for faults; inter-system communication state monitoring means for monitoring the state of the communication function between the own system and the partner system; and line switching device monitoring means for monitoring the line switching device that switches lines between the own system and the partner system; and in that a fault is judged by combining the abnormality notifications issued by these means with the current state of the own system, and recovery processing is performed.
JP1177028A 1989-07-11 1989-07-11 Redundant controller Expired - Lifetime JP2578985B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1177028A JP2578985B2 (en) 1989-07-11 1989-07-11 Redundant controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1177028A JP2578985B2 (en) 1989-07-11 1989-07-11 Redundant controller

Publications (2)

Publication Number Publication Date
JPH0342943A (en) 1991-02-25
JP2578985B2 (en) 1997-02-05

Family

ID=16023889

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1177028A Expired - Lifetime JP2578985B2 (en) 1989-07-11 1989-07-11 Redundant controller

Country Status (1)

Country Link
JP (1) JP2578985B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05177164A (en) * 1991-12-26 1993-07-20 Kinshiyou Kagaku Kk Installation of natural stone touch/polished stone appearance finish surface
JP5161808B2 (en) * 2009-02-16 2013-03-13 三菱電機株式会社 Dual system controller
JP5951520B2 (en) * 2013-02-19 2016-07-13 株式会社日立製作所 Multiple processing system
JP7304731B2 (en) * 2019-04-16 2023-07-07 ローム株式会社 watchdog timer

Also Published As

Publication number Publication date
JPH0342943A (en) 1991-02-25

Similar Documents

Publication Publication Date Title
EP0993633B1 (en) Active failure detection
JPH0666783B2 (en) How to interconnect network modules
JP2578985B2 (en) Redundant controller
JP3420919B2 (en) Information processing device
CN112034774A (en) Hot redundancy control method
JP3341712B2 (en) Switching unit failure handling method
JP2977705B2 (en) Control system of networked multiplexed computer system
JP3107104B2 (en) Standby redundancy method
JPH06290126A (en) Fault monitoring system for computer system
JP2536115B2 (en) Abnormality monitoring method in exchange
JPH1049450A (en) Recovery system for abnormal time of remote monitor system
JPH01279301A (en) Computer decentralizing system
JPH1196088A (en) Duplex constitution device
JP2000295259A (en) Device for detecting abnormality in lan
JPH02310755A (en) Health check system
JPH01234966A (en) Fault detecting system for multiplexed computer system
JPS5850372B2 (en) Data collection and distribution processing system
JPH05165798A (en) System controlling system for two-series system
JPH06175869A (en) Duplex computer system
JPS60256848A (en) Computer mutual monitor method
JPH0721106A (en) Network managing method
JPH11331194A (en) Device and system for monitor
JPH01112851A (en) Data communication system
JPS61290834A (en) Supervisory equipment for condition of transmission circuit
JP2001325117A (en) Stand-by duplex system information processor and its system state checking method

Legal Events

Date Code Title Description
S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071107

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081107

Year of fee payment: 12

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091107

Year of fee payment: 13

EXPY Cancellation because of completion of term