JPS63251840A - Control method for detection of multi-processor abnormality

Info

Publication number: JPS63251840A
Application number: JP62086329A
Authority: JP
Inventors: Jinichi Nakamura; 仁一中村
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1987-04-08
Filing date: 1987-04-08
Publication date: 1988-10-19

Abstract

PURPOSE: To prevent a fault from progressing and to improve the reliability of a multiprocessor system by holding the abnormal cycle in a wait state when a fault occurs in the system. CONSTITUTION: When a memory fault occurs, the fault occurrence signal 3C causes NAND 8 to set flip-flop 2, provided the abnormal cycle is a CPU-2 cycle. The output of flip-flop 2 then supplies an interrupt request 1F to CPU-1. At the same time, NAND 7 inhibits the output of counter 3, so the wait state 2E of CPU-2 is not released. The wait state 2E is released by a control signal from CPU-1, such as a reset. Because the processor is held in the wait state 2E when a fault occurs, the error state cannot progress, and the reliability of the multiprocessor system improves.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for detecting and controlling failures that occur in a multiprocessor system having a shared memory, and to a method for detecting and controlling failures that occur in a slave processing unit in a multiprocessor system composed of a main processing unit and a slave processing unit.

[Prior Art]

Conventionally, in a multiprocessor system having a shared memory, the occurrence of a failure was reflected in the memory contents, and each processor detected the abnormality by reading those contents using a semaphore (Japanese Unexamined Patent Publication No. 60-254303). In a multiprocessor system composed of a main processing unit and a slave processing unit, a response-wait timer was provided in the main processing unit, and failures were detected by monitoring the status of the slave processing unit. More recently, to shorten the failure detection time, a status notification request command for the slave processing unit was provided at the firmware level, and a failure was judged by whether a response from the slave processing unit arrived within a predetermined time (Japanese Unexamined Patent Publication No. 00-254338).

[Problems to be Solved by the Invention]

With the conventional techniques, a failure cannot be detected instantly, either in a multiprocessor system having a shared memory or in a multiprocessor system composed of a main processing unit and a slave processing unit. That is, from the time a failure occurs until it is dealt with by some means, the processor continues operating in an abnormal state, which makes the situation worse. In the worst case, the system may go down before the failure is detected.

An object of the present invention is to eliminate the above drawbacks and to provide a highly reliable multiprocessor control method that prevents an error state from progressing when a failure occurs.

[Means for solving problems]

The present invention is characterized in that, in a multiprocessor in which two or more CPUs share a memory, when at least one of the processors accesses the shared memory, a wait is applied to that CPU, and when a memory abnormality detection circuit that checks the shared memory for errors detects an error, the wait state is continued.

[Operation]

In this method, the processor cycle is put into a wait state at the moment a failure occurs. To this end, a wait is first applied at the start of a cycle to each processor of a multiprocessor system having a shared memory, and to the processor in each processing unit of a multiprocessor system composed of a main processing unit and a slave processing unit. If no failure occurs, the wait is released immediately, so no unnecessary wait is inserted. If a failure occurs, the wait is not released and the processor or processing unit remains in the wait state, so it does not proceed to the next operation. The failure is reported by a hardware signal: an interrupt generation circuit interrupts another processor or another processing unit.
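
The behaviour described above can be sketched as a small behavioural model. This is only an illustrative simulation, not the patented hardware: the names `Processor` and `run_cycle` and the interrupt message format are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical software stand-in for one CPU of the system."""
    name: str
    waiting: bool = False
    completed_cycles: int = 0
    interrupts: list = field(default_factory=list)


def run_cycle(cpu: Processor, supervisor: Processor, fault_detected: bool) -> bool:
    """Run one access cycle; return True if the cycle completed."""
    if cpu.waiting:
        # A processor frozen by an earlier fault cannot start a new cycle.
        return False
    cpu.waiting = True  # the wait is applied at the start of every cycle
    if fault_detected:
        # The wait is left asserted and the other processor is interrupted.
        supervisor.interrupts.append(f"fault on {cpu.name}")
        return False
    cpu.waiting = False  # normal case: the wait is released immediately
    cpu.completed_cycles += 1
    return True
```

In this sketch a faulting processor never reaches the next cycle, which is the property the method relies on; releasing it again is left to the interrupted processor.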

[Example]

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows a system configuration that implements the present invention in a multiprocessor system having a shared memory. In FIG. 1, 1 is the main processor, CPU-1, and 2 is the sub-processor, CPU-2. Each processor issues an access request for the shared memory 5: 1D from CPU-1 and 2C from CPU-2. The contention avoidance circuit 4 arbitrates these requests and outputs the grant signal 1G for 1D and the grant signal 2D for 2C, mutually exclusively. From the memory access grant signals 1G and 2D, the memory access control circuit 7 generates the control signal 3D for the memory. Using the signal 3D, the address 3A, and the data 3B, the shared memory 5 inputs and outputs data. In synchronization with this timing, the memory abnormality detection circuit 6 determines whether the access to the shared memory 5 is normal. If an abnormality is detected, it outputs the abnormality detection signal 3C. Abnormality detection is normally performed by a parity check or a CRC check.
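
As an illustration of the parity check mentioned above, the following minimal sketch shows how a stored even-parity bit can raise a fault flag comparable to the signal 3C. The function names and the choice of even parity are assumptions made for this example; the source does not specify the parity scheme or word width.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 when the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple:
    """Model a memory write: keep the word together with its parity bit."""
    return word, parity_bit(word)


def fault_signal(read_word: int, stored_parity: int) -> bool:
    """Model the detector output: True when the read word fails the check."""
    return parity_bit(read_word) != stored_parity


word, parity = store(0b10110010)
single_flip = fault_signal(word ^ 0b100, parity)  # one corrupted bit
double_flip = fault_signal(word ^ 0b011, parity)  # two corrupted bits
```

A single parity bit catches any odd number of flipped bits but misses an even number, which is one reason a CRC check may be preferred where stronger detection is needed.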

FIG. 2 is a timing chart of accesses by CPU-1 (1) and CPU-2 (2) to the shared memory 5. CPU-1 outputs the access request 1D for the shared memory 5; the contention avoidance circuit 4 arbitrates and outputs the access grant signal 1G for CPU-1. The access request 2C from CPU-2 is not granted by the contention avoidance circuit 4 until the request 1D is released, so it remains pending. On the CPU-1 side, the access grant signal 1G opens address buffer 78 and data buffer 78, and access to the shared memory 5 begins. The memory access control circuit 7 outputs the access control signal 3D to the shared memory 5, data is input or output, and the CPU-1 access ends. The memory abnormality detection circuit 6 uses the data from this access to check for abnormalities.
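
The request and grant sequence of FIG. 2 can be sketched as a small arbitration function. This is a behavioural sketch under assumptions: the function name `arbitrate`, the `owner` state variable, and the rule that CPU-1 wins a simultaneous request are not taken from the source, which only states that the grants are mutually exclusive and that 2C waits until 1D is released.

```python
def arbitrate(req_1d: bool, req_2c: bool, owner):
    """Return (grant_1g, grant_2d, owner) for the current request levels.

    owner is None, 1 or 2 and records which CPU currently holds the memory.
    """
    # The current owner keeps the memory until it drops its request.
    if owner == 1 and not req_1d:
        owner = None
    if owner == 2 and not req_2c:
        owner = None
    # A free memory goes to a pending requester (CPU-1 first, by assumption).
    if owner is None:
        if req_1d:
            owner = 1
        elif req_2c:
            owner = 2
    return owner == 1, owner == 2, owner


step1 = arbitrate(True, False, None)      # CPU-1 requests alone
step2 = arbitrate(True, True, step1[2])   # CPU-2 requests while CPU-1 holds
step3 = arbitrate(False, True, step2[2])  # CPU-1 releases its request
```

At no step are both grants true, matching the exclusive output of 1G and 2D described in the text.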

If an abnormality is detected, it is reported to CPU-1 by the detection signal 3C. A failure during a CPU-1 access is input from 1C as an interrupt, and exception processing is performed.

When the CPU-1 cycle ends, the contention avoidance circuit 4 outputs the access grant signal 2D for CPU-2. This signal opens address buffer 113 and data buffer 112, and access to the shared memory 5 begins. The memory access control circuit 7 outputs the access control signal 3D to the shared memory 5, and data is input or output.

At the moment CPU-2 outputs the access request signal 2C, the wait control circuit 3 applies the wait 2E to CPU-2. The wait 2E is released after the access grant signal 2D is output. However, if the abnormality detection circuit 6 detects an abnormality in the CPU-2 access, the wait 2E is not released and remains asserted, so CPU-2 stays waiting in that abnormal cycle. For an abnormality detected during a CPU-2 access, the wait control circuit 3 also outputs the interrupt signal 1F to CPU-1, so CPU-1 can handle the failure and, through the control signal 1E, reset or stop CPU-2.

FIG. 3 shows the wait control circuit 3. When CPU-2 outputs the memory access request signal 2C, NAND 6 inverts and clears flip-flop 1, and the wait signal 2E is output to CPU-2. When the contention avoidance circuit outputs the access grant signal 2D for CPU-2, the clear on flip-flop 1 is released, and the clear on counter 3 is released as well. Counter 3 and oscillator 4 generate the access-ready timing for the memory: when the preset time is reached, they clock flip-flop 1 and release the wait 2E for CPU-2. If a memory failure occurs, however, and the failing cycle is a CPU-2 cycle, the failure occurrence signal 3C causes NAND 8 to set flip-flop 2, whose output supplies the interrupt request 1F to CPU-1. In addition, NAND 7 inhibits the output of counter 3, so the wait 2E of CPU-2 is not released. This wait is released by a control signal from CPU-1, for example a reset.
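
The sequencing of the wait control circuit can be approximated by a clocked behavioural model. This is a sketch under assumptions, not a gate-level reproduction of FIG. 3: the class name `WaitControl`, the `ready_count` value, and the detail that a reset from CPU-1 also clears the interrupt latch are introduced here for illustration.

```python
class WaitControl:
    """Behavioural model of the wait logic for CPU-2 (hypothetical names)."""

    def __init__(self, ready_count: int = 3):
        self.ready_count = ready_count  # assumed access-ready time in ticks
        self.wait_2e = False            # wait output to CPU-2 (flip-flop 1)
        self.irq_1f = False             # interrupt to CPU-1 (flip-flop 2)
        self.counter = 0
        self.counting = False

    def request(self) -> None:
        """CPU-2 raises 2C: the wait is asserted and the counter held clear."""
        self.wait_2e = True
        self.counter = 0
        self.counting = False

    def grant(self) -> None:
        """Grant 2D arrives: the clears are released and counting starts."""
        self.counting = True

    def tick(self, fault_3c: bool = False) -> None:
        """Advance one oscillator tick, optionally with 3C asserted."""
        if fault_3c and self.counting:
            self.irq_1f = True          # fault in a CPU-2 cycle is latched
        if self.counting and self.wait_2e:
            self.counter += 1
            # The counter output is inhibited while the fault latch is set.
            if self.counter >= self.ready_count and not self.irq_1f:
                self.wait_2e = False
                self.counting = False

    def reset_from_cpu1(self) -> None:
        """Control signal from CPU-1 releases the wait (assumed to clear all)."""
        self.wait_2e = False
        self.irq_1f = False
        self.counting = False
        self.counter = 0
```

In a fault-free cycle the wait drops after `ready_count` ticks; once `fault_3c` has been seen during a granted cycle, the wait stays asserted until `reset_from_cpu1` is called.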

FIG. 4 shows another system configuration that implements the present invention, in a multiprocessor system composed of a main processing unit and a slave processing unit. 1 is the main processor, CPU-1, and 2 is the processor of the slave processing unit, CPU-2. 43 is the memory of the slave processing unit, and 44 is its I/O. When CPU-2 accesses the memory 43, it does so with the memory access control signal 42A, and at the same time the wait control circuit 47 outputs the wait signal 42H to CPU-2. The memory abnormality detection circuit 45 determines whether the memory access is abnormal. If it is not, the wait control circuit 47 releases the wait on CPU-2 and CPU-2 completes the cycle. If an abnormality is detected, the wait signal 42H is not released, and the interrupt signal 41B, which reports the abnormality, is input to CPU-1. When CPU-2 accesses the I/O 44, it does so with the I/O access control signal 42D, and at the same time the wait control circuit 47 outputs the wait signal 42H to CPU-2. The I/O abnormality detection circuit 46 determines whether the I/O access is abnormal. If it is not, the wait control circuit 47 releases the wait on CPU-2 and CPU-2 completes the I/O access cycle. If an abnormality is detected, the wait signal 42H is not released, and the interrupt signal 41B, which reports the abnormality, is input to CPU-1. The wait control circuit 47 is the same as the circuit shown in FIG. 3.

[Effect of the invention]

As described in detail above, according to the abnormality detection control method of the present invention, when a failure occurs in a multiprocessor system having a shared memory, or in a multiprocessor system composed of a main processing unit and a slave processing unit, the abnormal cycle is held in a wait state. Because the next cycle is not executed, the failure is prevented from progressing, and the reliability of the system is improved.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 2 is a timing chart explaining the shared memory access operation of that embodiment. FIG. 3 is a diagram showing an example of the wait control circuit of the present invention. FIG. 4 is a block diagram showing another embodiment of the present invention.

1 ... CPU
2 ... CPU
3 ... Wait control circuit

Claims

1. An abnormality detection control method for a multiprocessor in which two or more CPUs share a memory, characterized in that, when at least one of the processors accesses the shared memory, a wait is applied to that CPU, and when a memory abnormality detection circuit that checks the shared memory for errors detects an error, the wait state is continued.