JPS6224330A - Fault detecting system for multi-processor - Google Patents

Fault detecting system for multi-processor

Info

Publication number
JPS6224330A
Authority
JP
Japan
Prior art keywords
processor
processors
area
fault
common memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP60161832A
Other languages
Japanese (ja)
Inventor
Noboru Mizuhara
水原 登
Tadashi Koshiba
小柴 忠司
Toru Hoshi
徹 星
Kenji Kawakita
謙二 川北
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1985-07-24
Filing date
1985-07-24
Publication date
1987-02-02
Application filed by Hitachi Ltd
Priority to JP60161832A
Publication of JPS6224330A

Landscapes

  • Hardware Redundancy (AREA)
  • Multi Processors (AREA)

Abstract

PURPOSE: To detect a fault in the partner processor by providing an operation display area for the processors in a common memory and having each processor refer to that area, so that the processors reset each other's display patterns.

CONSTITUTION: A processor working state display area ST and a fault code display area FC are provided at specific addresses in a common memory CM shared by processors 1 and 2. Both processors periodically refer to area ST and write their own numbers into it, which tells the partner processor that the writing processor is working normally. If a processor has a fault of its own, it writes a fault code to area FC, which conveys the nature of the fault to the partner processor. If area ST is not rewritten by the partner processor for two periods, the partner processor is judged to have stopped working.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention]

The present invention relates to a tightly coupled multiprocessor system having a common memory, and more particularly to a mutual processor monitoring method suitable for detecting processor faults.

[Background of the Invention]

Known means of detecting processor faults include detecting program runaway with a watchdog timer and detecting a processor halt by detecting loss of the clock. These methods are also used in multiprocessors, but implementing them requires dedicated circuitry. (For details, see Iso et al., "A Study on Microprocessor System Redundancy Configuration", Institute of Electronics and Communication Engineers Technical Research Report, Vol. 84, No. 212, SE84-78, November 1984.)

[Object of the Invention]

An object of the present invention is to provide, in a multiprocessor system having a common memory as described above, a function by which the processors can detect abnormalities in one another without any additional mechanism other than the common memory.

[Summary of the Invention]

The invention is characterized in that, in the multiprocessor system described above, a processor operation display area is provided at a specific address in the common memory. Each processor periodically refers to this area, and the processors reset the pattern that the other processor has displayed there. This makes it possible for each processor to detect an abnormality in its partner.

[Embodiments of the Invention]

An embodiment of the present invention is described below with reference to FIGS. 1 and 2.

FIG. 1 shows the configuration of a multiprocessor system that implements the present processor fault detection method. In FIG. 1, P1 and P2 denote processors, CM denotes the common memory between P1 and P2, ST denotes the processor operating status display area in CM, and FC denotes the fault code display area. LM1 and LM2 denote the local memories of P1 and P2, respectively, and CNT1 and CNT2 denote the counters in which P1 and P2, respectively, count detected faults of the partner processor.
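The arrangement in FIG. 1 can be pictured with a minimal sketch. The class and field names below, and the use of 0 in FC to mean "no fault", are illustrative assumptions and do not come from the patent; the sketch only shows which items are shared (ST, FC) and which are private to each processor (CNTi).

```python
from dataclasses import dataclass, field

NO_FAULT = 0  # assumed FC value meaning "no fault reported"


@dataclass
class CommonMemory:
    """CM: the memory shared by P1 and P2."""
    st: int = 0         # ST: number of the processor that wrote most recently
    fc: int = NO_FAULT  # FC: fault code display area


@dataclass
class LocalMemory:
    """LMi: memory private to processor Pi."""
    cnt: int = 0        # CNTi: partner fault detection counter


@dataclass
class Processor:
    number: int
    cm: CommonMemory                                     # shared with the partner
    lm: LocalMemory = field(default_factory=LocalMemory) # one per processor


cm = CommonMemory()
p1 = Processor(1, cm)
p2 = Processor(2, cm)
```

Both processors hold a reference to the same `CommonMemory` object, while each gets its own `LocalMemory`, which mirrors the split between CM and LM1/LM2.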

FIG. 2 shows the operation flow by which processor P1 in FIG. 1 displays its own operation and detects a fault in the partner processor. The processing shown in FIG. 2 is started periodically in each processor. First, ST determination step 21 examines the processor operating status display; if ST shows the partner processor's number, the partner processor is regarded as operating. In the subsequent FC determination step 22, if FC indicates that a fault has occurred, processing moves to fault processing step 29; if FC indicates normal operation, processing moves to operating state determination step 23. Step 23 determines the operating state of the own processor. If no fault has occurred, processing moves immediately to own-processor operating display step 25. If a fault has occurred, fault code setting step 24 sets in FC the fault code corresponding to the fault, and processing then moves to step 25. In step 25, the own processor number is written to ST to notify the partner processor that the own processor is operating. CNTi clear step 26 then clears the fault detection counter to 0, and the processing ends.

On the other hand, if ST determination step 21 finds that ST still shows the own processor number, CNTi determination step 27 examines the value of CNTi. If CNTi is 0, the partner processor is considered to have possibly stopped, and CNTi increment step 28 adds 1 to CNTi in preparation for the next periodic activation. If CNTi is 1 at step 27, the partner processor has not operated for two periods. In this case the partner processor is regarded as having stopped, and fault processing step 29 performs fault processing: for example, in a simplex system, collecting fault information and performing the initial settings for a restart, and in a duplex system, collecting fault information and switching systems.

The partner processor is regarded as stopped only when CNTi = 1 at step 27, that is, only when it has failed to operate for two consecutive periods. This prevents excessive fault detection caused by differences between the processors in the execution period of this program.
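The flow of FIG. 2 (steps 21 to 29) can be sketched as follows. This is a minimal single-threaded simulation under stated assumptions, not the patented implementation: the class names, the `own_fault` and `partner_down` attributes, and the encoding of "no fault" as 0 in FC are all hypothetical. Any ST value other than the own processor number is treated as having been written by the partner, which also covers the assumed initial value 0.

```python
NO_FAULT = 0  # assumed encoding: FC == 0 means no fault reported


class CommonMemory:
    def __init__(self):
        self.st = 0         # ST: last writer's processor number
        self.fc = NO_FAULT  # FC: fault code display area


class Processor:
    def __init__(self, number, cm):
        self.number = number
        self.cm = cm
        self.cnt = 0               # CNTi, kept in local memory
        self.own_fault = NO_FAULT  # set non-zero to simulate a local fault
        self.partner_down = False  # set by the (stubbed) fault processing

    def fault_processing(self):
        """Step 29: stub standing in for restart or system switching."""
        self.partner_down = True

    def monitor_cycle(self):
        """One periodic activation of the FIG. 2 flow."""
        cm = self.cm
        if cm.st != self.number:             # step 21: partner rewrote ST
            if cm.fc != NO_FAULT:            # step 22: partner reported a fault
                self.fault_processing()      # step 29
                return
            if self.own_fault != NO_FAULT:   # step 23: check own state
                cm.fc = self.own_fault       # step 24: publish fault code
            cm.st = self.number              # step 25: show own activity
            self.cnt = 0                     # step 26: clear the counter
        elif self.cnt == 0:                  # step 27: first period without change
            self.cnt += 1                    # step 28: wait one more period
        else:                                # step 27: second period without change
            self.fault_processing()          # step 29
```

If both processors call `monitor_cycle()` alternately, ST flips between 1 and 2 and neither counter leaves 0. If P2 stops, P1 first overwrites P2's last mark, then sees its own number on the next two activations: the first sets CNT1 to 1, the second triggers fault processing.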

Furthermore, if fault codes can be assigned exclusively among the processors, the fault code can also be displayed in ST and FC can be eliminated.
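One way to read this variant is that every value written to ST identifies its writer, whether it is a plain "operating" mark or a fault code. The sketch below assumes a hypothetical encoding in which each processor owns a disjoint range of 16 fault codes; the ranges, the constant names, and both helper functions are illustrative and are not specified in the patent.

```python
PROCESSOR_NUMBERS = {1, 2}       # plain "operating" marks
FAULT_BASE = {1: 0x10, 2: 0x20}  # hypothetical disjoint fault code ranges
FAULT_RANGE = 16                 # assumed number of fault codes per processor


def encode_fault(processor, fault_index):
    """Return the ST value with which `processor` reports fault `fault_index`."""
    if not 0 <= fault_index < FAULT_RANGE:
        raise ValueError("fault_index out of range")
    return FAULT_BASE[processor] + fault_index


def decode_st(value):
    """Return (writer, fault_index); fault_index is None for an operating mark."""
    if value in PROCESSOR_NUMBERS:
        return value, None
    for processor, base in FAULT_BASE.items():
        if base <= value < base + FAULT_RANGE:
            return processor, value - base
    return None, None            # value not written by any known processor
```

Because the ranges do not overlap, a reader of ST can always tell which processor wrote the value, so the "has my partner rewritten ST?" check still works without a separate FC area.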

The embodiment above shows the case of two processors. The same approach can be used with three or more processors by providing nC2 (that is, n(n-1)/2, where n is the number of processors) ST and FC areas in the common memory CM, providing n-1 processor fault detection counters in the local memory LMi of each processor Pi, and having each processor Pi monitor the operating state of the other n-1 processors.
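With one ST/FC pair per pair of processors, each processor needs a way to find the area it shares with a given partner. The patent does not say how the areas are laid out; the sketch below assumes processors numbered from 0 and a triangular indexing of the n(n-1)/2 slots, and the function names are hypothetical.

```python
def pair_count(n):
    """Number of ST/FC areas needed for n processors: n choose 2."""
    return n * (n - 1) // 2


def pair_index(i, j, n):
    """Slot of the ST/FC area shared by processors i and j (0-based, i != j)."""
    if i == j or not (0 <= i < n and 0 <= j < n):
        raise ValueError("need two distinct processor numbers below n")
    if i > j:
        i, j = j, i  # the area is shared, so order does not matter
    # Slots are laid out row by row: (0,1), (0,2), ..., (1,2), (1,3), ...
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def partners(i, n):
    """The n-1 processors that processor i monitors."""
    return [j for j in range(n) if j != i]
```

For n = 4 this gives six areas, and processor 2 would use slots `pair_index(2, 0, 4)`, `pair_index(2, 1, 4)` and `pair_index(2, 3, 4)` together with three local counters.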

However, as the number of processors grows, the number of ST and FC areas becomes too large to monitor. In that case, another conceivable method is to provide one ST per processor and have each processor periodically count up the value of its own ST (returning it to 0 on overflow), thereby displaying its operation to the other processors.
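The per-processor counter variant can be sketched as follows. The patent only states that each processor counts up its own ST and wraps to 0 on overflow; the 8-bit width, the class name, and the choice to reuse the two-period rule from the two-processor embodiment for declaring a partner stopped are assumptions made for this illustration.

```python
ST_MODULUS = 256  # assumed 8-bit ST; the counter returns to 0 on overflow


class CounterMonitor:
    """Processor `number` with one shared ST counter per processor."""

    def __init__(self, number, st, limit=2):
        self.number = number
        self.st = st                     # shared list, one ST per processor
        self.last_seen = list(st)        # local copy of the values last observed
        self.stalled = [0] * len(st)     # local counters of unchanged periods
        self.limit = limit               # assumed: two periods, as in FIG. 2

    def cycle(self):
        """One periodic activation; returns the processors judged stopped."""
        # Show own activity by counting up, wrapping to 0 on overflow.
        self.st[self.number] = (self.st[self.number] + 1) % ST_MODULUS
        stopped = []
        for other, value in enumerate(self.st):
            if other == self.number:
                continue
            if value != self.last_seen[other]:
                self.last_seen[other] = value  # partner has counted up
                self.stalled[other] = 0
            else:
                self.stalled[other] += 1       # no change this period
                if self.stalled[other] >= self.limit:
                    stopped.append(other)
        return stopped
```

Here the shared memory holds only n areas instead of n(n-1)/2, and each processor writes only its own area, so the processors no longer overwrite one another's marks.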

[Effect of the Invention]

According to the present invention, in a multiprocessor system having a common memory, the processors can detect each other's faults simply by periodically accessing an operating status display area provided in the common memory. This has the effect of reducing the additional circuitry needed for fault detection.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a tightly coupled multiprocessor system that implements the present multiprocessor fault detection method, and FIG. 2 is a flowchart of the fault detection procedure.

Agent: Patent Attorney Katsuo Ogawa (小川勝男)

Claims (1)

[Claims]

A fault detection method for a multiprocessor, characterized in that, in a multiprocessor system having a common memory, an area for displaying processor operating status is provided in the common memory and a unique identification code is assigned to each processor; the processors refer to the area at the same period, and each processor writes its own identification code there, thereby notifying the partner processor that it is operating normally, and writes a code assigned to a fault type, thereby notifying the partner processor of the nature of a fault; and, when the area has not been rewritten by the partner processor over a plurality of consecutive periods, the partner processor can be judged to have stopped operating.
Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60161832A JPS6224330A (en) 1985-07-24 1985-07-24 Fault detecting system for multi-processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60161832A JPS6224330A (en) 1985-07-24 1985-07-24 Fault detecting system for multi-processor

Publications (1)

Publication Number Publication Date
JPS6224330A true JPS6224330A (en) 1987-02-02

Family

ID=15742768

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60161832A Pending JPS6224330A (en) 1985-07-24 1985-07-24 Fault detecting system for multi-processor

Country Status (1)

Country Link
JP (1) JPS6224330A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02205966A (en) * 1989-02-03 1990-08-15 Pfu Ltd Mutual supervisory processing system for multiplex system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS55127652A (en) * 1979-03-26 1980-10-02 Hitachi Ltd Mutual supervision system between computers
JPS57164345A (en) * 1981-04-01 1982-10-08 Nec Corp Failure detecting system for composite microcomputer
