JP2908430B1

JP2908430B1 - Host processor monitoring apparatus and monitoring method for multiprocessor system

Info

Publication number: JP2908430B1
Application number: JP10132388A
Authority: JP
Inventors: 修司山本
Original assignee: NEC Software Kyushu Ltd
Current assignee: NEC Software Kyushu Ltd
Priority date: 1998-05-14
Filing date: 1998-05-14
Publication date: 1999-06-21
Anticipated expiration: 2018-05-14
Also published as: JPH11328131A

Abstract

Problem to be solved: To provide a host processor monitoring apparatus that reliably detects when a host processor in a multiprocessor system has stopped operating, for example because of a failure.

Solution: Disclosed is a host processor monitoring apparatus for a multiprocessor system having a plurality of processors that share a shared device, and a communication mechanism for communication among those processors. Each processor has an exclusive lock mechanism.

When a processor starts operating, it sets its status data in the shared device to "in operation". It then broadcasts a message to the other processors at fixed intervals via the communication mechanism, indicating that it is operating.

When a processor does not receive messages from all of the other operating processors, it attempts to lock the exclusive control mechanism of each processor whose message was not received. If the lock succeeds, that processor is deemed to have terminated or failed, and recovery processing is executed. If the lock cannot be obtained, monitoring of that processor continues.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention

The present invention relates to a host processor monitoring apparatus and monitoring method for a loosely coupled multiprocessor system. More particularly, it relates to an apparatus and method that use an exclusive control mechanism to confirm that a host processor has stopped operating.

[0002]

Description of the Related Art

In recent years, improving the reliability of information processing systems has become an important issue. A multiprocessor system consists of a plurality of host processors, as shown in FIG. 5, and can address this issue. A multiprocessor system with this configuration can achieve high reliability, particularly in online transaction processing and virtual destination control.

[0003] Consider virtual destination control, which stores messages destined for terminals. When a registration request for a virtual destination message is received, the message is output to a terminal connected to the multiprocessor system. Should a failure occur in the host processor that received the request, another host processor can take over the output processing and continue it without manual intervention.

Japanese Patent Application Laid-Open No. 63-316251 discloses a multiprocessor system that achieves high-speed access by replacing the normally shared magnetic disk device with an extended storage device built from semiconductor devices.

[0004] A search of earlier patent applications for schemes that detect a failure in any host processor of a multiprocessor system first turns up Japanese Patent Application Laid-Open No. 63-193260. It discloses a host processor monitoring scheme in which each host processor updates a specific area of an auxiliary storage device shared among the host processors at fixed intervals. A host processor is deemed to have failed if it does not update that area for a certain time or longer.
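As an illustration of this prior-art scheme, the following Python sketch models the shared area as a dictionary of timestamps. The class `SharedHeartbeatArea`, its methods, and the explicit `now` argument are hypothetical choices that keep the example deterministic. They are not taken from JP-A-63-193260.

The example also shows the weakness discussed in [0007]: a host that is alive but too busy to refresh its slot looks the same as a crashed one.

```python
from typing import Dict, Optional


class SharedHeartbeatArea:
    """Hypothetical model of a shared storage area holding one timestamp per host."""

    def __init__(self) -> None:
        self.last_update: Dict[str, float] = {}

    def update(self, host_id: str, now: float) -> None:
        # Each host overwrites its own slot at fixed intervals.
        self.last_update[host_id] = now

    def is_deemed_failed(self, host_id: str, now: float, timeout: float) -> bool:
        # A slot that is missing or older than `timeout` is taken as a failure.
        last: Optional[float] = self.last_update.get(host_id)
        return last is None or now - last > timeout


area = SharedHeartbeatArea()
area.update("host1", now=0.0)
area.update("host2", now=0.0)
area.update("host1", now=5.0)  # host2 is alive but too busy to write
print(area.is_deemed_failed("host1", now=6.0, timeout=3.0))  # False
print(area.is_deemed_failed("host2", now=6.0, timeout=3.0))  # True: a false positive
```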

[0005] Japanese Patent Application Laid-Open No. 4-219860 concerns a multiprocessor system that performs exclusive control of shared resources by placing, in a shared file, information indicating that a shared resource is being updated or referenced. To cope with path failures between the host processors and a multi-system control processor, it discloses the following host processor monitoring scheme:

- A monitoring timer is provided for each host processor.
- Whether a host processor's processing finishes within a predetermined time is detected from the interval between the start and the reset of its monitoring timer.
- When a failure occurs, the failed host is reported to the other host processors and recovery processing is performed.
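A minimal sketch of the per-host monitoring timer idea follows. It assumes that a timer is started when a host begins a unit of processing and reset when that processing completes. `WatchdogTimers` and its methods are hypothetical names, and the subsequent notification and recovery steps are not modelled.

As with the scheme in [0004], a slow but healthy host trips its timer just like a failed one.

```python
from typing import Dict, List, Optional


class WatchdogTimers:
    """Hypothetical per-host monitoring timers (one per host processor)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.started_at: Dict[str, Optional[float]] = {}

    def start(self, host_id: str, now: float) -> None:
        # Started when the host begins a unit of processing.
        self.started_at[host_id] = now

    def reset(self, host_id: str) -> None:
        # Reset when the host finishes that processing in time.
        self.started_at[host_id] = None

    def failed_hosts(self, now: float) -> List[str]:
        # Hosts whose timer has run longer than the limit without a reset.
        return sorted(
            h for h, t in self.started_at.items()
            if t is not None and now - t > self.limit
        )


timers = WatchdogTimers(limit=2.0)
timers.start("host1", now=0.0)
timers.start("host2", now=0.0)
timers.reset("host1")
print(timers.failed_hosts(now=3.0))  # ['host2']
```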

[0006] Further, Japanese Patent Application Laid-Open No. 4-340649 discloses a host processor monitoring scheme in which each host processor detects its own failures. It reports them, via down-notification means, to down-monitoring means located on one of the host processors. The down-notification means then notifies the other host processors in turn.

[0007]

Problems to Be Solved by the Invention

However, the prior art disclosed in JP-A-63-193260 and JP-A-4-219860 has the following problem. When a host processor in the multiprocessor system is under heavy load, it cannot update the specific area of the auxiliary storage device shared among the host processors. The heavily loaded host processor is then wrongly recognized as having failed.

[0008] As described above, in a multiprocessor system, when the host processor that received a request fails, another host processor takes over and continues the processing without manual intervention. If a heavily loaded host processor is mistakenly recognized as having failed, another host processor performs unnecessary recovery processing in an attempt to take over its work. Such unnecessary recovery processing has conventionally increased the load on the host processor performing it, and on the multiprocessor system as a whole.

Furthermore, the prior art disclosed in JP-A-4-340649 provides no countermeasure for cases in which a host processor's self-diagnosis fails, so that a failure goes unnoticed.

[0009] The present invention has been made in view of the above problems in conventional host processor monitoring schemes. Its object is to provide a host processor monitoring scheme in which the host processors of a multiprocessor system monitor one another. This makes it possible to reliably detect when a host processor has stopped operating, including stoppage due to a failure.

[0010]

Means for Solving the Problems

To solve the above problems, an embodiment of the present invention provides a processor monitoring apparatus in a multiprocessor system. The system has a plurality of processors that share a shared device, and a communication mechanism for communication among the processors. Each processor comprises:

- an exclusive lock mechanism;
- means for, when the processor starts operating, setting its status data in the shared device to "in operation", and broadcasting to the other processors, at fixed intervals via the communication mechanism, a message indicating that the processor is operating; and
- means for, when messages are not received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message was not received. If the lock succeeds, recovery processing is executed on the assumption that the processor has terminated or failed. If the lock cannot be obtained, monitoring of that processor continues.
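The core decision rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation:

- a `threading.Lock` stands in for each processor's exclusive control mechanism;
- the function name `hosts_to_recover` is hypothetical;
- a processor that has terminated or failed no longer holds its own lock, as the scheme requires.

The key point is the non-blocking acquire: the monitor never waits on a processor that is merely slow.

```python
import threading
from typing import Dict, Iterable, List


def hosts_to_recover(silent_peers: Iterable[str],
                     exclusion: Dict[str, threading.Lock]) -> List[str]:
    """Apply the lock test to every operating peer whose message is overdue.

    `exclusion[p]` stands in for peer p's exclusive control mechanism, which
    p keeps locked for as long as it is running.
    """
    recover: List[str] = []
    for peer in sorted(silent_peers):
        if exclusion[peer].acquire(blocking=False):
            # The lock was free, so the peer has terminated or failed.
            # The monitor now holds the lock until recovery is finished.
            recover.append(peer)
        # Otherwise the peer still holds its lock: it is alive but slow
        # (for example, heavily loaded), so monitoring simply continues.
    return recover


exclusion = {"host2": threading.Lock(), "host3": threading.Lock()}
exclusion["host2"].acquire()  # host2 is running and holds its lock
# host3 has stopped, so its lock is free
print(hosts_to_recover({"host2", "host3"}, exclusion))  # ['host3']
```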

[0011] In this processor monitoring apparatus, each processor may, when it starts operating, attempt to receive messages via the communication mechanism. When a message is received, the processor identifies its sender as being in operation.
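A possible sketch of this start-up check follows. It rests on two assumptions: each message carries nothing but the sender's identifier, and a `queue.Queue` models the host's receive side of the communication mechanism. The function name `operating_peers_at_startup` is also hypothetical.

```python
import queue
from typing import Set


def operating_peers_at_startup(inbox: "queue.Queue[str]", self_id: str) -> Set[str]:
    """Drain the messages already waiting for this host and return the
    identifiers of the processors that sent them (i.e. that are operating)."""
    peers: Set[str] = set()
    while True:
        try:
            sender_id = inbox.get_nowait()
        except queue.Empty:
            break
        if sender_id != self_id:
            peers.add(sender_id)
    return peers


inbox: "queue.Queue[str]" = queue.Queue()
for sender in ("host1", "host3", "host1"):
    inbox.put(sender)  # each "in operation" message carries only the sender ID
print(sorted(operating_peers_at_startup(inbox, self_id="host2")))  # ['host1', 'host3']
```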

[0012] Further, in this processor monitoring apparatus, each processor may, when terminating its operation, refer to the status data in the shared device for all the other processors. If all the other processors are in the initial state, it performs termination processing for the entire multiprocessor system. If one or more of them are in operation, it performs only its own termination processing.
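The termination decision reduces to a single test over the other processors' status data. The sketch below is illustrative only. `HostState`, `termination_scope`, and the return values `"system"` and `"self"` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class HostState(Enum):
    INITIAL = "initial"
    OPERATING = "operating"


def termination_scope(states: Dict[str, HostState], self_id: str) -> str:
    """Decide what a terminating processor should shut down.

    Returns "system" if every other processor is in the initial state,
    otherwise "self" (at least one other processor is still operating).
    """
    others = [state for host, state in states.items() if host != self_id]
    if all(state is HostState.INITIAL for state in others):
        return "system"
    return "self"


states = {"host1": HostState.OPERATING, "host2": HostState.OPERATING}
print(termination_scope(states, "host1"))  # self
states["host1"] = HostState.INITIAL  # host1 has since terminated
print(termination_scope(states, "host2"))  # system
```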

[0013] There is also provided a processor monitoring method in a multiprocessor system. The system has a plurality of processors that share a shared device, and a communication mechanism for communication among them; each processor has an exclusive lock mechanism. The method comprises the steps of:

- when a processor starts operating, setting its status data in the shared device to "in operation";
- broadcasting a message indicating that the processor is operating to the other processors at fixed intervals via the communication mechanism;
- when messages are not received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message was not received; and
- executing recovery processing on the assumption that the processor has terminated or failed if its exclusive control mechanism can be locked, and continuing to monitor the processor if it cannot be locked.

[0014] The processor monitoring method may further comprise the steps of:

- when each processor starts operating, attempting to receive messages from other processors via the communication mechanism; and
- when a message is received, identifying its sender as being in operation.

[0015] The processor monitoring method may further comprise the steps of:

- when each processor terminates its operation, referring to the status data in the shared device for all processors other than itself; and
- performing termination processing for the entire multiprocessor system if all the other processors are in the initial state, or only the processor's own termination processing if one or more of the other processors are in operation.

[0016] There is also provided a storage medium on which a program is recorded. The program runs in a multiprocessor system having a plurality of processors that share a shared device, and a communication mechanism for communication among them; each processor has an exclusive lock mechanism. The program executes the steps of:

- when a processor starts operating, setting its status data in the shared device to "in operation";
- broadcasting a message indicating that the processor is operating to the other processors at fixed intervals via the communication mechanism;
- when messages are not received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message was not received; and
- executing recovery processing on the assumption that the processor has terminated or failed if its exclusive control mechanism can be locked, and continuing to monitor the processor if it cannot be locked.

[0017] The program recorded on the storage medium may further execute the steps of:

- when each processor starts operating, attempting to receive messages from other processors via the communication mechanism; and
- when a message is received, identifying its sender as being in operation.

[0018] The program recorded on the storage medium may further execute the steps of:

- when each processor terminates its operation, referring to the status data in the shared device for all the other processors; and
- performing termination processing for the entire multiprocessor system if all the other processors are in the initial state, or only the processor's own termination processing if one or more of them are in operation.

[0019]

Embodiments of the Invention

The present invention provides a configuration in which the state of each host processor in a multiprocessor system is monitored by locking a shared exclusion mechanism between host processors. This allows the operating state of the entire multiprocessor system to be managed.

[0020] Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 shows the configuration of a multiprocessor system employing the host processor monitoring scheme according to an embodiment of the present invention. In FIG. 1, the multiprocessor system comprises:

- host processors 1 and 2, which operate under program control;
- shared exclusion control mechanisms 31 and 32, one for each host processor. These are system-program control mechanisms for exclusive control of the resources available to the host processors of the multiprocessor system;
- an inter-host-processor communication mechanism 4, a system-program control mechanism for sending and receiving messages among those host processors; and
- a shared auxiliary storage device 5, a resource shared among those host processors, which the host processors reference and update for state management.

[0021] Host processor 1 includes a host start unit 11, an other-host monitoring unit 12, and a host termination unit 13, and can access a storage medium 15 via a drive 14. Likewise, host processor 2 includes a host start unit 21, an other-host monitoring unit 22, and a host termination unit 23, and can access a storage medium 25 via a drive 24.

The shared auxiliary storage device 5 includes, as a first control area, an area 51 for recording the state of host processor 1 and an area 52 for recording the state of host processor 2. The information recorded in areas 51 and 52 is held in a host processor monitoring record in the shared auxiliary storage device 5.

[0022] For convenience of explanation, the multiprocessor system of FIG. 1 shows two host processors, 1 and 2. In general, any plural number of host processors may be used, and one state-recording area is provided for each host processor in the multiprocessor system. A host processor's state is either "initial state" or "operating state".
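The FIG. 1 components can be pictured as plain data. The mapping below is an assumed in-memory stand-in, not an interface from the patent:

- a dictionary plays the shared auxiliary storage device 5;
- one `threading.Lock` per host plays the shared exclusion mechanisms 31 and 32;
- one `queue.Queue` per host plays the inter-host-processor communication mechanism 4.

`MultiprocessorSystem` and `HostState` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class HostState(Enum):
    INITIAL = "initial"      # not running, or terminated normally
    OPERATING = "operating"  # running


@dataclass
class MultiprocessorSystem:
    """Hypothetical in-memory stand-in for the FIG. 1 components."""

    host_ids: List[str]
    # Shared auxiliary storage device 5: one state area per host (51, 52, ...).
    shared_state: Dict[str, HostState] = field(init=False)
    # Shared exclusion mechanisms 31, 32, ...: one lock per host.
    exclusion: Dict[str, threading.Lock] = field(init=False)
    # Inter-host-processor communication mechanism 4: one inbox per host.
    inbox: Dict[str, "queue.Queue[str]"] = field(init=False)

    def __post_init__(self) -> None:
        self.shared_state = {h: HostState.INITIAL for h in self.host_ids}
        self.exclusion = {h: threading.Lock() for h in self.host_ids}
        self.inbox = {h: queue.Queue() for h in self.host_ids}


system = MultiprocessorSystem(["host1", "host2"])
print(system.shared_state["host2"])  # HostState.INITIAL
```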

[0023] The host start unit 11 performs the processing needed to start monitoring among the host processors of the multiprocessor system. It runs when its host processor is started, and it:

- locks the shared exclusion mechanism 31 for its own host processor;
- changes its own host processor's state 51 in the shared auxiliary storage device 5 from the initial state to the operating state;
- receives, through the inter-host-processor communication mechanism 4, "in operation" messages sent by other host processors, recognizes from each received message that the sending host processor has been started, and begins failure monitoring of that host processor; and
- sends to the communication mechanism 4, at fixed intervals, an "in operation" message that notifies all host processors of the multiprocessor system that its own host processor has been started.

The "in operation" message contains the identifier of the sending host processor.
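The periodic part of the host start unit's work is resending the "in operation" message at fixed intervals. It might be sketched as below. The function names, the per-host `queue.Queue` inboxes, and the `threading.Event` used to stop the loop are all assumptions for illustration. In particular, the sketch does not model the send error that occurs when a peer has not yet started.

```python
import queue
import threading
from typing import Dict


def broadcast_in_operation(self_id: str, inboxes: Dict[str, "queue.Queue[str]"]) -> int:
    """Send one 'in operation' message (carrying self_id) to every other host.

    Returns how many hosts it was delivered to."""
    sent = 0
    for host_id, inbox in inboxes.items():
        if host_id != self_id:
            inbox.put(self_id)
            sent += 1
    return sent


def run_broadcaster(self_id: str, inboxes: Dict[str, "queue.Queue[str]"],
                    interval: float, stop: threading.Event) -> None:
    """Repeat the broadcast every `interval` seconds until `stop` is set."""
    while True:
        broadcast_in_operation(self_id, inboxes)
        if stop.wait(interval):  # returns True as soon as stop is set
            break


inboxes = {h: queue.Queue() for h in ("host1", "host2", "host3")}
stop = threading.Event()
worker = threading.Thread(target=run_broadcaster,
                          args=("host1", inboxes, 0.01, stop), daemon=True)
worker.start()
stop.set()
worker.join()
print(inboxes["host2"].get_nowait())  # host1
```

Because the loop broadcasts before it checks the stop event, at least one message is always sent, even if the host is stopped immediately.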

[0024] The other-host monitoring unit 12 enables host processor 1 to detect failures of other host processors. It runs when an "in operation" message has not been received for a certain time. It then attempts to lock the shared exclusion mechanism (here, 32) of the host processor whose identifier corresponds to the missing message.

If the lock succeeds, that host processor is deemed to have terminated or failed, and recovery is performed. If the lock cannot be obtained, the host processor is assumed to be running under high load and therefore unable to send its "in operation" message, and monitoring simply continues.
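The trigger for the other-host monitoring unit is that a message has not been received for a certain time. Detecting this needs some record of when each peer was last heard from. One hypothetical way to keep that record is sketched below; `PeerTracker` and the explicit `now` timestamps are assumptions. The returned identifiers are the peers to which the lock test described above would then be applied.

```python
from typing import Dict, List


class PeerTracker:
    """Hypothetical record of when each known operating peer was last heard from."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def on_message(self, sender_id: str, now: float) -> None:
        # Called for every 'in operation' message; the sender is (re)registered.
        self.last_seen[sender_id] = now

    def silent_peers(self, now: float, timeout: float) -> List[str]:
        # Peers whose message has not arrived for longer than `timeout`:
        # these are the candidates for the exclusive-lock test.
        return sorted(p for p, t in self.last_seen.items() if now - t > timeout)

    def forget(self, peer_id: str) -> None:
        # Drop a peer once it has been recovered.
        self.last_seen.pop(peer_id, None)


tracker = PeerTracker()
tracker.on_message("host2", now=0.0)
tracker.on_message("host3", now=0.0)
tracker.on_message("host3", now=4.0)
print(tracker.silent_peers(now=5.0, timeout=3.0))  # ['host2']
```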

[0025] The host termination unit 13 stops the other host processors from monitoring its host processor when that host processor is terminated. It runs when the host processor terminates, and it:

- changes its own host processor's state 51 in the shared auxiliary storage device 5 from "operating state" to "initial state";
- unlocks the shared exclusion mechanism 31 for its own host processor; and
- thereafter sends no further "in operation" messages, which ends failure monitoring by the other host processors.

The host start unit 21, other-host monitoring unit 22, and host termination unit 23 perform the same control as the host start unit 11, other-host monitoring unit 12, and host termination unit 13, respectively.
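The three actions of the host termination unit can be sketched as follows. The sketch uses assumed stand-ins: a dictionary for the shared state areas, a `threading.Lock` per host, and a `threading.Event` that stops the periodic broadcast. `terminate_host` is a hypothetical name.

In this sketch the state is set to "initial" before the lock is released. A monitor that later obtains the lock therefore reads "initial" and treats the host as terminated rather than failed.

```python
import threading
from enum import Enum
from typing import Dict


class HostState(Enum):
    INITIAL = "initial"
    OPERATING = "operating"


def terminate_host(self_id: str,
                   shared_state: Dict[str, HostState],
                   exclusion: Dict[str, threading.Lock],
                   broadcaster_stop: threading.Event) -> None:
    """Hypothetical sketch of the host termination unit for one host."""
    # 1. Record a normal termination in the shared auxiliary storage.
    shared_state[self_id] = HostState.INITIAL
    # 2. Release this host's own shared exclusion mechanism.
    exclusion[self_id].release()
    # 3. Stop sending 'in operation' messages.
    broadcaster_stop.set()


shared_state = {"host1": HostState.OPERATING}
exclusion = {"host1": threading.Lock()}
exclusion["host1"].acquire()  # taken by the host start unit at startup
stop = threading.Event()
terminate_host("host1", shared_state, exclusion, stop)
print(shared_state["host1"], exclusion["host1"].locked(), stop.is_set())
# HostState.INITIAL False True
```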

[0026] FIGS. 2, 3, and 4 are flowcharts showing the operation of the host processor monitoring scheme according to the embodiment. The operation is described below with reference to FIG. 1 and these flowcharts.

[0027] First, the operation of the host start units 11 and 21 is described using the flow of FIG. 2. In this embodiment, host processors are started by entering a command. In the example:

1. Host processor 1 is started by a command.
2. After host processor 1 has finished starting, host processor 2 is started by a command.
3. The multiprocessor system then operates with the two host processors monitoring each other for failures.

[0028] Host start unit 11 of host processor 1 proceeds as follows:

- Step 111: locks the shared exclusion mechanism 31 for host processor 1.
- Step 112: changes state 51 of host processor 1 in the shared auxiliary storage device 5 from "initial state" to "operating state".
- Step 113: attempts to receive, via the inter-host-processor communication mechanism 4, any "in operation" message addressed to its own host processor by another host processor. Such a message contains the identifier of the sending host processor.

[0029] In step 114, if an "in operation" message addressed to its own host processor has been received, step 115 is executed; otherwise step 115 is skipped. Here, because host processor 2 has not yet been started, host processor 1 receives nothing from other host processors, and step 115 is skipped.

Then, in step 116, host processor 1 uses the communication mechanism 4 to send its "in operation" message, which contains its own identifier, to all other host processors. Because host processor 2 has not been started, however, this send to the communication mechanism 4 results in an error. The send is repeated at fixed intervals.
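Steps 111 to 116 for a single start-up can be sketched as one function. This is a hypothetical, single-threaded model:

- the names are invented;
- the message queues stand in for communication mechanism 4;
- unlike the real system, a message sent to a host that has not yet started simply waits in its queue instead of causing an error.

The usage lines replay the two-host start-up described in [0027] to [0031].

```python
import queue
import threading
from enum import Enum
from typing import Dict, Set


class HostState(Enum):
    INITIAL = "initial"
    OPERATING = "operating"


def start_host(self_id: str,
               shared_state: Dict[str, HostState],
               exclusion: Dict[str, threading.Lock],
               inboxes: Dict[str, "queue.Queue[str]"],
               known_peers: Set[str]) -> None:
    """Hypothetical one-shot version of steps 111-116 for one host."""
    # Step 111: lock this host's own shared exclusion mechanism.
    exclusion[self_id].acquire()
    # Step 112: change its state area from "initial" to "operating".
    shared_state[self_id] = HostState.OPERATING
    # Steps 113-115: receive waiting 'in operation' messages and remember
    # each sender's identifier in local memory.
    while True:
        try:
            known_peers.add(inboxes[self_id].get_nowait())
        except queue.Empty:
            break  # step 114: nothing (more) received, skip step 115
    # Step 116: send this host's own 'in operation' message to all other hosts
    # (the real scheme repeats this at fixed intervals).
    for host_id, inbox in inboxes.items():
        if host_id != self_id:
            inbox.put(self_id)


hosts = ("host1", "host2")
shared_state = {h: HostState.INITIAL for h in hosts}
exclusion = {h: threading.Lock() for h in hosts}
inboxes = {h: queue.Queue() for h in hosts}
peers1: Set[str] = set()
peers2: Set[str] = set()
start_host("host1", shared_state, exclusion, inboxes, peers1)  # host2 not up yet
start_host("host2", shared_state, exclusion, inboxes, peers2)  # learns of host1
# Later, host1 receives host2's message and stores its identifier (step 115).
peers1.add(inboxes["host1"].get_nowait())
print(sorted(peers1), sorted(peers2))  # ['host2'] ['host1']
```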

[0030] Meanwhile, host start unit 21 of host processor 2 proceeds as follows:

- Step 111: locks the shared exclusion mechanism 32 for host processor 2.
- Step 112: changes state 52 of host processor 2 in the shared auxiliary storage device 5 from "initial state" to "operating state".
- Step 113: attempts to receive, from the system's inter-host-processor communication mechanism 4, the "in operation" message that host processor 1 addressed to it. This message contains the identifier of the sending host processor.

[0031] In step 114, if an "in operation" message addressed to its own host processor has been received, step 115 is executed; otherwise step 115 is skipped. Here host processor 2 receives the message, recognizes that host processor 1 is running, and executes step 115, storing host processor 1's identifier in host processor 2's memory.

In step 116, host processor 2 uses the communication mechanism 4 to send its "in operation" message to all other host processors. Host processor 1 then receives this message from the communication mechanism 4 and recognizes that host processor 2 has been started. In step 115, it stores host processor 2's identifier in host processor 1's memory.

[0032] Next, the operation of the other-host monitoring units 12 and 22 is described using the flow of FIG. 3. The example covers the case in which host processor 1 cannot receive host processor 2's "in operation" message. This happens either because host processor 2 has failed, or because host processor 2 is running under high load and its processing is delayed.

As described above, host processor 1, through its host start unit 11, has been receiving the "in operation" messages that host processor 2 sends at fixed intervals via the system's inter-host-processor communication mechanism 4. Host processor 1 keeps host processor 2's identifier in its memory as an indication that host processor 2 has been started.

[0033] First, in step 121, the other-host monitoring unit 12 of host processor 1 may fail to receive an "in operation" message from the communication mechanism 4 for a certain time. In that case it attempts to lock the shared exclusion mechanism 32 for host processor 2, which corresponds to the host processor identifier stored in host processor 1's memory.

In step 122, if the shared exclusion mechanism 32 for host processor 2 can be locked, steps 123 and 124 are executed. If it cannot be locked, the unit concludes that sending of the "in operation" message is merely delayed, for example by high load on that host processor. It skips steps 123 and 124, and the other-host monitoring unit 12 simply continues monitoring.

[0034] In step 123, the unit refers to state 52 of host processor 2 in the shared auxiliary storage device 5, that is, the state belonging to the host processor identifier in memory whose operating message could not be received. If state 52 records "initial state", host processor 1 recognizes that host processor 2 has terminated; if it records "operating state", host processor 1 recognizes that a failure has occurred in host processor 2. Host processor 1 then continues, or performs recovery processing for, the processing interrupted by the termination or failure of host processor 2. In step 124, the unit unlocks the shared exclusion mechanism 32 for host processor 2 that corresponds to that identifier, and clears the host processor identifier stored in memory. The other-host monitoring unit 22 operates in the same way as the other-host monitoring unit 12 in the flow above.
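Steps 123 and 124 can be sketched as follows, assuming the caller already holds the peer's lock from step 122. The state strings, the `recover` callback, and the function name are hypothetical stand-ins for state area 52, the continuation or recovery processing, and the monitoring unit.

```python
import threading
from typing import Callable

INITIAL = "initial"
OPERATING = "operating"


def handle_stopped_peer(
    peer_id: int,
    states: dict[int, str],
    peer_lock: threading.Lock,
    known_operating: set[int],
    recover: Callable[[int, str], None],
) -> str:
    """Steps 123-124; assumes the caller already holds peer_lock (step 122)."""
    # Step 123: read the peer's state area in shared storage.
    # "initial" means an orderly termination; "operating" means a failure.
    cause = "terminated" if states[peer_id] == INITIAL else "failed"
    try:
        # Continue or recover the processing the peer left unfinished.
        recover(peer_id, cause)
    finally:
        # Step 124: unlock the peer's mechanism and clear its identifier.
        peer_lock.release()
        known_operating.discard(peer_id)
    return cause


if __name__ == "__main__":
    lock_32 = threading.Lock()
    lock_32.acquire()  # acquired by the monitor in step 122
    print(handle_stopped_peer(2, {2: OPERATING}, lock_32, {2}, lambda p, c: None))  # failed
```

Releasing in a `finally` clause is a choice made for this sketch, so that a failed recovery does not leave the peer's mechanism locked; the text does not say what should happen if recovery itself fails.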

[0035] Finally, the operation of the host termination units 13 and 23 is described with reference to the flow of FIG. 4. The example in this embodiment is one in which termination commands are entered: a termination command is entered to host processor 1, and after host processor 1 has terminated as a result, a termination command is entered to host processor 2 and the multiprocessor system terminates. As described above, host processor 1 has locked the shared exclusion mechanism 31 for host processor 1 through its host start unit 11, and has set state 51 of host processor 1 in the shared auxiliary storage device 5 to "operating state". Likewise, host processor 2 has locked the shared exclusion mechanism 32 for host processor 2 through its host start unit 21, and has set state 52 of host processor 2 in the shared auxiliary storage device 5 to "operating state".
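The starting condition assumed here, with each host holding its own shared exclusion mechanism and its state area set to "operating", can be modeled as below. `SharedStore` and `start_host` are hypothetical names: a `threading.Lock` stands in for each mechanism and a dictionary stands in for state areas 51 and 52. The order (lock first, then write the state) follows the text.

```python
import threading
from dataclasses import dataclass, field

OPERATING = "operating"


@dataclass
class SharedStore:
    """Hypothetical model of shared auxiliary storage 5 (states 51, 52, ...)
    together with the per-host shared exclusion mechanisms (31, 32, ...)."""

    states: dict[int, str] = field(default_factory=dict)
    locks: dict[int, threading.Lock] = field(default_factory=dict)


def start_host(store: SharedStore, host_id: int) -> None:
    # Lock this host's own shared exclusion mechanism and keep it locked
    # for as long as the host runs...
    lock = store.locks.setdefault(host_id, threading.Lock())
    lock.acquire()
    # ...then record "operating state" in this host's state area.
    store.states[host_id] = OPERATING


if __name__ == "__main__":
    store = SharedStore()
    start_host(store, 1)
    start_host(store, 2)
    print(store.states)  # {1: 'operating', 2: 'operating'}
```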

[0036] When a termination command is entered to host processor 1, the host termination unit 13 of host processor 1 first refers to the shared auxiliary storage device 5 in step 131. In step 132, it checks whether any host processor other than its own has "operating state" recorded. If one does, it executes steps 133 and 134 and terminates the operation of its own host processor only. If none does, it executes steps 136 and 137 and terminates the entire multiprocessor system.
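The step 132 check is a scan of the state areas that skips the host's own entry. Here is a minimal sketch, assuming the state areas are exposed as a dictionary from host identifier to state string (a hypothetical representation) and using the hypothetical function name `choose_termination`:

```python
OPERATING = "operating"


def choose_termination(states: dict[int, str], own_id: int) -> str:
    """Steps 131-132: decide how far a termination command should reach."""
    # Step 131: read every host's state area; step 132: skip our own entry.
    others_running = any(
        state == OPERATING for host, state in states.items() if host != own_id
    )
    # Another host still operating: stop only this host (steps 133-134).
    # Otherwise this is the last host: stop the whole system (steps 136-137).
    return "own" if others_running else "system"


if __name__ == "__main__":
    print(choose_termination({1: "operating", 2: "operating"}, own_id=1))  # own
```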

[0037] In this example, state 52 of host processor 2 in the shared auxiliary storage device 5 is "operating state", so in step 133 host processor 1 sets its own state 51 in the shared auxiliary storage device 5 to "initial state". Then, in step 134, it unlocks the shared exclusion mechanism 31 for host processor 1. As a result of this sequence, host processor 2 can detect the termination of host processor 1 through the mechanism described above. Its other-host monitoring unit 22 then carries on the processing interrupted by the termination of host processor 1, so the multiprocessor system continues to operate.
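The order of steps 133 and 134 matters in the simple model below: the host rewrites its state to "initial" before releasing its lock, so a peer that then obtains the lock reads "initial" and classifies the stop as a termination. The names are hypothetical, `threading.Lock` stands in for the shared exclusion mechanisms, and `peer_view` collapses the monitor's steps 121 to 124 into one call for brevity.

```python
import threading

INITIAL, OPERATING = "initial", "operating"


def stop_own_host(
    states: dict[int, str], locks: dict[int, threading.Lock], own_id: int
) -> None:
    """Steps 133-134 for a host that is not the last one running."""
    states[own_id] = INITIAL   # step 133: record an orderly stop first...
    locks[own_id].release()    # step 134: ...then release the host's own lock


def peer_view(
    states: dict[int, str], locks: dict[int, threading.Lock], peer_id: int
) -> str:
    """What another host's monitor would conclude about peer_id."""
    if not locks[peer_id].acquire(blocking=False):
        return "busy"  # lock still held: the peer is operating
    try:
        return "terminated" if states[peer_id] == INITIAL else "failed"
    finally:
        locks[peer_id].release()
```

If the two steps were swapped, a monitor that acquired the lock between them could still read "operating" and treat an orderly shutdown as a failure.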

[0038] When a termination command is then entered to host processor 2 after host processor 1 has terminated, the host termination unit 23 of host processor 2 finds no operating host processor other than its own. That is, in step 132 it determines that no host processor other than its own has "operating state" recorded. It therefore recognizes that the entire multiprocessor system is to be terminated, waits for the processing in progress to complete, and in step 136 performs termination processing for the entire multiprocessor system. Finally, in step 137, it unlocks the shared exclusion mechanism 32 for host processor 2.
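For the last operating host, steps 136 and 137 amount to draining the work in progress, running system-wide termination, and only then releasing its own lock. The sketch assumes the host's in-progress processing runs on a `concurrent.futures.ThreadPoolExecutor`; `shut_down_system` and the log entry are hypothetical placeholders for the real termination processing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def shut_down_system(
    work: ThreadPoolExecutor, own_lock: threading.Lock, log: list[str]
) -> None:
    """Steps 136-137 for the last operating host (hypothetical sketch)."""
    # Wait for the processing currently in progress to complete.
    work.shutdown(wait=True)
    # Step 136: termination processing for the whole system (placeholder).
    log.append("system terminated")
    # Step 137: release this host's own shared exclusion mechanism last.
    own_lock.release()
```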

[0039] In this embodiment, messages sent through the inter-host-processor communication mechanism, which handles communication between host processors, serve as the information for detecting a possible failure of another host processor. Alternatively, as in the prior art, a second control area (not shown) may be provided in the shared auxiliary storage device, and information equivalent to the operating message described above may be exchanged between the host processors through that second control area. To adopt this approach and still realize the embodiment of the present invention, however, the exclusive control described above must be applicable at a fine granularity, to individual areas within the shared auxiliary storage device, rather than only to the shared auxiliary storage device as a whole.
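A toy model shows why the storage-based alternative needs per-area locking. Suppose each host holds a lock on its own area for as long as it runs, which is the liveness signal used throughout this embodiment. With a device-wide lock, that would block every other host from writing its heartbeat to the second control area. `SharedDevice` and the area names are hypothetical.

```python
import threading


class SharedDevice:
    """Hypothetical shared storage whose exclusive control is either per
    device (one lock for everything) or per area (one lock per named area)."""

    def __init__(self, per_area: bool) -> None:
        self.per_area = per_area
        self.data: dict[str, int] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def lock_for(self, area: str) -> threading.Lock:
        # Device-level control maps every area to the same single lock.
        key = area if self.per_area else "<whole device>"
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def try_write(self, area: str, value: int) -> bool:
        # Write without waiting; return False if the area is locked.
        lock = self.lock_for(area)
        if not lock.acquire(blocking=False):
            return False
        try:
            self.data[area] = value
            return True
        finally:
            lock.release()
```

With `per_area=True`, host 2's heartbeat write succeeds while host 1 holds the lock on its own area. With a single device-wide lock, the same write is refused.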

[0040] The exclusive control described above need not apply only to resources shared between host processors. It also covers inter-task exclusive control, where a resource dedicated and allocated to a single host processor is used by multiple tasks competing within that host processor. Furthermore, the exclusive control is assumed to have the property that, when a failure occurs in the host processor holding a lock, that lock is released automatically.
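The key assumption here, that a lock held by a failed host is released automatically, has a familiar process-level analogue in POSIX `flock(2)` locks. The kernel drops such a lock when the last descriptor referring to it is closed, and that includes the holder process dying. The sketch below uses this only as an analogy: a crashed child process stands in for a failed host processor. It is POSIX-only, uses a temporary file as the lock object, and does not claim to show how the patent's shared exclusion mechanism is implemented.

```python
import fcntl
import multiprocessing as mp
import os
import tempfile


def _holder(path: str, locked, may_die) -> None:
    """Child 'host': takes an exclusive lock, then crashes without unlocking."""
    fd = os.open(path, os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_EX)
    locked.set()
    may_die.wait(timeout=10)
    os._exit(1)  # abrupt exit: no unlock and no close


def _try_lock(fd: int) -> bool:
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def lock_released_on_crash() -> tuple[bool, bool]:
    """Return (lockable while holder alive, lockable after holder crashed)."""
    ctx = mp.get_context("fork")  # POSIX only; avoids re-importing __main__
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    locked, may_die = ctx.Event(), ctx.Event()
    child = ctx.Process(target=_holder, args=(path, locked, may_die))
    child.start()
    locked.wait(timeout=10)
    fd = os.open(path, os.O_RDWR)  # the monitor's own open file description
    try:
        before = _try_lock(fd)
        may_die.set()
        child.join(timeout=10)
        after = _try_lock(fd)
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(lock_released_on_crash())  # (False, True)
```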

[0041]

[Effects of the Invention] As described above, the host processor monitoring scheme according to the present invention can reliably detect abnormalities in the host processors that make up a multiprocessor system without relying on manual intervention. In particular, a host processor under high load was sometimes misrecognized as abnormal in the past, and the present scheme can completely prevent this.

[0042] In addition, when a host processor has stopped operating, the scheme can reliably determine whether the cause is a failure or a termination of that host processor. As a result, another host processor can reliably carry on the processing of the stopped host processor or perform its recovery processing.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a multiprocessor system having a host processor monitoring device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the host start unit in the host processor monitoring device according to the embodiment of the present invention.

FIG. 3 is a flowchart showing the operation of the other-host monitoring unit in the host processor monitoring device according to the embodiment of the present invention.

FIG. 4 is a flowchart showing the operation of the host termination unit in the host processor monitoring device according to the embodiment of the present invention.

FIG. 5 is a diagram showing the configuration of a conventional, typical multiprocessor system.

[Explanation of symbols]

1, 2: host processors
31, 32: shared exclusive control mechanisms
4: inter-host-processor communication mechanism
5: shared auxiliary storage device
11, 21: host start units
12, 22: other-host monitoring units
13, 23: host termination units
14, 24: drives
15, 25: storage media
51, 52: areas recording the state of each host processor

Claims


1. A processor monitoring device in a multiprocessor system having a plurality of processors that share a shared device and a communication mechanism for communicating between the processors, wherein each processor comprises: an exclusive control mechanism; means for broadcasting, when its operation is started, a message indicating that the own processor is in operation to the other processors at regular time intervals via the communication mechanism; and means for, when messages cannot be received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message is not received, executing recovery processing on the assumption that termination or a failure has occurred in that processor if the exclusive control mechanism can be locked, and continuing to monitor that processor if it cannot be locked.

2. The processor monitoring device in a multiprocessor system according to claim 1, further comprising means by which each processor, when its operation is started, attempts to receive a message via the communication mechanism and, when a message can be received, identifies the processor that sent the message as being in operation.

3. The processor monitoring device in a multiprocessor system according to claim 1, further comprising means by which each processor, when its operation is started, sets its status data in the shared device to a status indicating that it is in operation, and, when its operation is terminated, refers to the status data in the shared device for all processors other than its own, performs termination processing of the entire multiprocessor system if all of the other processors are in the initial state, and performs termination processing of its own processor if at least one of the other processors is in operation.

4. A processor monitoring method in a multiprocessor system having an exclusive control mechanism, a plurality of processors that share a shared device, and a communication mechanism for communicating between the processors, the method comprising: a step in which each processor, when its operation is started, broadcasts a message indicating that the own processor is in operation to the other processors at regular intervals via the communication mechanism; and a step of, when messages cannot be received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message is not received, performing recovery processing on the assumption that termination or a failure has occurred in that processor if the exclusive control mechanism can be locked, and continuing to monitor that processor if it cannot be locked.

5. The processor monitoring method in a multiprocessor system according to claim 4, further comprising a step in which each processor, when its operation is started, attempts to receive a signal from another processor via the communication mechanism and, when the signal can be received, identifies the processor that sent the signal as being in operation.

6. The processor monitoring method in a multiprocessor system according to claim 4, further comprising: a step in which each processor, when its operation is started, sets its status data in the shared device to a status indicating that it is in operation; and a step in which, when its operation is terminated, the processor refers to the status data in the shared device for all processors other than its own, performs termination processing of the entire multiprocessor system if all of the other processors are in the initial state, and performs termination processing of its own processor if at least one of the other processors is in operation.

7. A storage medium storing a program for a multiprocessor system having an exclusive control mechanism, a plurality of processors that share a shared device, and a communication mechanism for communicating between the processors, the program causing the following steps to be executed: a step in which each processor, when its operation is started, broadcasts a message indicating that it is in operation to the other processors at regular intervals via the communication mechanism; and a step of, when messages cannot be received from all of the operating processors among the other processors, attempting to lock the exclusive control mechanism of each processor whose message is not received, performing recovery processing on the assumption that termination or a failure has occurred in that processor if the exclusive control mechanism can be locked, and continuing to monitor that processor if it cannot be locked.

8. The storage medium storing the program according to claim 7, the program further causing execution of a step in which each processor, when its operation is started, attempts to receive a signal from another processor via the communication mechanism and, when the signal can be received, identifies the processor that sent the signal as being in operation.

9. The storage medium storing the program according to claim 7, the program further causing execution of: a step in which each processor, when its operation is started, sets its status data in the shared device to a status indicating that it is in operation; and a step in which, when ending its operation, the processor refers to the status data in the shared device for all other processors, performs termination processing of the entire multiprocessor system if all of the other processors are in the initial state, and performs termination processing of its own processor if at least one of the other processors is in operation.