JPS63231537A

JPS63231537A - Fault processing system

Info

Publication number: JPS63231537A
Application number: JP62063734A
Authority: JP
Inventors: Kazuhiko Ninomiya; 和彦二宮; Tetsuji Ogawa; 小川　哲二
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-03-20
Filing date: 1987-03-20
Publication date: 1988-09-27

Abstract

PURPOSE: To recover cheaply and quickly from bit-inversion faults in a RAM used as an RCS (reloadable control storage), by providing a means that stores the same data (microinstruction sequence) as was loaded into the RCS together with a restart address paired with that data. CONSTITUTION: Along with the RAM recovery data, a restart address RADRi is prepared in advance for each fault address, so that the channel control device CHC 300 can have the channel devices 310-1 to 310-n perform restart processing appropriate to the fault address. That is, following the recovery processing of the RCS 311, the CHC 300 places on data line 530 the previously fetched RADRi value together with instruction information directing the channel device to set that value in the address register ADR 312 and then restart, and gives notification via signal line 510-1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a data processing system, and in particular to a fault handling method suitable for recovering from microinstruction read errors in a microprogram-controlled processing device.

[Prior art]

With the advent of high-performance microcomputers and low-cost, high-speed random access memory (RAM), microprogram control using a reloadable control storage (RCS) has been adopted in the control sections of various electronic devices.

The advantage of microprogramming the control section of this type of processing device with an RCS is that complex control can be implemented easily and with little development effort. The problem lies in the reliability of the RAM used for the RCS. A fault in the RCS makes it impossible to execute the microprogram, and because the contents of the RCS are not updated during normal operation, even a transient bit inversion in the RAM leads to an unrecoverable fault and reduces the availability of the device. Conventional countermeasures include correcting the erroneous bit with a Hamming code, correcting it with horizontal and vertical parity bits, and reloading a block of data from external storage (a floppy disk or the like). The Hamming-code method is effective against permanent RAM faults as well as transient ones, but the redundant bits it requires raise production cost. The horizontal and vertical parity method and the method of reloading from external storage both have the drawback that fault recovery takes a long time.
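The horizontal and vertical parity approach mentioned above can be sketched as follows. This is a minimal illustration, not taken from the patent: the block layout, the use of even parity and the function names are all assumptions. It shows why locating the bad bit requires reading the whole block, which is consistent with the remark that recovery is slow.

```python
def parity(bits):
    """Even parity (XOR) of an iterable of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def make_parity(block):
    """Return (row parities, column parities) for a block of words.

    `block` is a list of equal-length lists of bits (hypothetical layout).
    """
    row_par = [parity(row) for row in block]
    col_par = [parity(col) for col in zip(*block)]
    return row_par, col_par


def correct_single_bit(block, row_par, col_par):
    """Locate and flip a single inverted bit; return its (row, column).

    Every row and every column must be re-read to find the mismatch.
    Returns None when no mismatch is found.
    """
    bad_rows = [r for r, row in enumerate(block) if parity(row) != row_par[r]]
    bad_cols = [c for c, col in enumerate(zip(*block)) if parity(col) != col_par[c]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        block[r][c] ^= 1  # flip the bit back
        return (r, c)
    raise ValueError("error pattern is not a single bit inversion")
```

A single inverted bit shows up as exactly one failing row parity and one failing column parity, and their intersection is the bit to flip.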

[Problem that the invention seeks to solve]

To overcome these drawbacks, Japanese Unexamined Patent Publication No. 49-90466 discloses a method that provides storage means holding the same data as the RCS. When an error is detected in data (a microinstruction) read from the RCS, execution of the microinstruction is temporarily inhibited, the correct data is read from the storage means and reloaded into the RCS, and the microprogram is re-executed using the correct data in place of the erroneous data. With this method, however, a microinstruction cannot be executed until the RCS read has completed and a check circuit has then confirmed that the read data is correct, so the microinstruction execution cycle lengthens and performance falls.

In general, performance is maximized when the microinstruction execution cycle consists only of the RAM read access time (from the moment the RCS address is settled until the read data is available) plus the time from the moment the read data is settled until the next address to be executed is set. With such a cycle, however, as soon as the RCS read data is settled it is used to set the address of the next microinstruction and to carry out the operation the microinstruction specifies, while the read data is checked in parallel. Consequently, by the time an error in the read data is detected, that data has already been executed as a microinstruction. It is therefore impossible to recover from the RAM fault by inhibiting execution of the individual microinstruction, correcting it and rewriting it to the RCS, and then to re-execute the microinstruction after that recovery.
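The trade-off between the two cycle structures can be written out as a small calculation. The sketch below is only an illustration of the argument in the text: the function names are hypothetical, and any numbers passed in are arbitrary example values, not figures from the patent.

```python
def cycle_with_serial_check(t_access, t_check, t_next_addr):
    """Cycle time when the read data must pass the check before it is used."""
    return t_access + t_check + t_next_addr


def cycle_with_parallel_check(t_access, t_check, t_next_addr):
    """Cycle time when the check runs alongside execution.

    The check no longer lengthens the cycle, but its result arrives only
    after the microinstruction has already been used.
    """
    return t_access + t_next_addr
```

For example, with an access time of 50, a check time of 20 and a next-address time of 30 (arbitrary units), the serial cycle is 100 and the parallel cycle is 80.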

Another known technique applies to a system with several microprogram-controlled subprocessors. When an RCS read error is detected in one subprocessor, the main processor obtains the correct data from another subprocessor and reloads it into the RCS in which the error was detected, thereby restoring the RCS contents. This technique, however, gives no consideration to how the operation sequence that the affected subprocessor had executed before the error was detected should be re-executed.

An object of the present invention is to provide a low-cost, high-speed recovery method for bit-inversion faults in a RAM used as an RCS.

A further object of the present invention is to provide an effective microinstruction re-execution method for a processing device in which the erroneous microinstruction has already been executed by the time an error in the RCS read data is detected.

[Means for solving problems]

The above objects are achieved by providing storage means that stores the same data as was loaded into the RCS (the microinstruction sequence) together with a restart address paired with that data.

[Example]

An embodiment of the present invention is described below, using as an example its application to a channel device, one of the components of a data processing system.

FIG. 1 shows the overall configuration of the data processing system.

100 is a central processing unit (CPU), 200 is a main storage control unit (SCU), 210 is a main storage unit (MSU), 300 is a channel control device (CHC), 310 is a channel device (CH), of which there are n units, and 400 is a service device (SVP).

FIG. 2 shows in more detail the relationship between the channel control device 300 and a channel device 310. 311 is the reloadable control storage (RCS), 312 is the address register (ADR) that addresses 311, and 313 is a delay register (DADR) for 312. When an error is detected in data read from 311, 312 has already been updated to the address of the next microinstruction, whereas 313 still holds the microinstruction address that was used to read the data in which the error was detected.
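The relationship between ADR 312 and DADR 313 can be modelled in a few lines. This is a simplified sketch under stated assumptions: the `Word` and `Sequencer` classes, the `valid` flag standing in for the check circuit, and the field names are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Word:
    op: str             # operation the microinstruction performs
    next_addr: int      # address of the next microinstruction
    valid: bool = True  # stands in for the result of the read-data check


class Sequencer:
    def __init__(self, rcs):
        self.rcs = rcs      # list of Word, modelling the RCS contents
        self.adr = 0        # ADR: address of the word to read next
        self.dadr = 0       # DADR: address used for the most recent read
        self.executed = []  # operations that have already run

    def step(self):
        """Run one microinstruction cycle; return False if the check failed."""
        self.dadr = self.adr           # latch the address used for this read
        word = self.rcs[self.adr]
        self.executed.append(word.op)  # the operation runs immediately
        self.adr = word.next_addr      # ADR already points at the next word
        return word.valid              # the check result arrives afterwards
```

When `step()` returns False, `adr` has moved on while `dadr` still identifies the faulty word, which is the information the recovery procedure needs.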

500-1 to 500-n are the interrupt signal lines from channel devices 310-1 to 310-n, respectively, to 300, and 510-1 to 510-n are the lines from 300 to 310-1 to 310-n that carry the response signals to interrupts on 500-1 to 500-n. Bus 520 is the data line from 310-1 to 310-n to 300, and bus 530 is the data line from 300 to 310-1 to 310-n.

FIG. 3 shows the format of the information stored in the storage means and the way it is addressed. In this embodiment the storage means is assumed to be provided in part of the main storage unit MSU (210).

During initial setup, for example at power-on, the service device 400 loads microinstructions into the RCS 311 of each channel device 310-1 to 310-n. It also writes into the MSU 210, starting at a predetermined address (the offset address), the paired RCS and RADR (restart address) information shown in FIG. 3. Each channel device 310-1 to 310-n then proceeds by reading and executing the contents of its RCS 311 one microinstruction at a time, and sends an interrupt signal to the CHC 300 when a given process completes. The CHC 300 scans the interrupt signal lines 500-1 to 500-n in turn. When it accepts an interrupt from, say, channel device 310-1, it returns a response to channel device 310-1 over the corresponding response signal line 510-1. Channel device 310-1 then places its status information on data line 520, and the CHC 300 takes in this information and reports it to the program.

When channel device 310-1 detects an error in data read from the RCS 311, it raises an interrupt to the CHC 300 on interrupt signal line 500-1. On accepting the interrupt, the CHC 300 responds on signal line 510-1. In response, channel device 310-1 uses data line 520 to send the contents of DADR 313 together with status information indicating that an error was detected in the RCS 311 read data. The CHC 300 takes in the information on data line 520 and analyzes the status information. On finding that an RCS 311 read-data error has occurred, it issues a fetch request to the SCU 200, using as the main storage address the offset address plus the DADR value, as shown in FIG. 3. The SCU 200 fetches RCS(i) and RADR(i) from the MSU 210 and transfers them to the CHC 300.

Because ADR 312 of channel device 310-1 has already been updated to the next microinstruction address, the CHC 300 places on data line 530 the previously received DADR 313 value and instruction information directing that this value be set in ADR 312, and notifies channel device 310-1 via signal line 510-1. On receiving the notification, channel device 310-1 resets ADR 312. Next, the CHC 300 likewise sends the RCS(i) value fetched from the MSU 210 together with instruction information directing that it be reloaded into the RCS 311, and notifies channel device 310-1 via signal line 510-1. On receiving this notification, the channel device reloads RCS(i) into the RCS 311.
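The recovery sequence just described can be sketched in software terms. This is a minimal model, not the patent's hardware: main storage is represented as a dictionary, the `OFFSET` value, class name and function names are assumptions, and the signal lines are reduced to direct assignments.

```python
OFFSET = 0x1000  # hypothetical offset address of the table in main storage


def build_shadow_table(main_storage, microprogram, restart_addrs):
    """Initial setup: store (RCS(i), RADR(i)) pairs starting at OFFSET."""
    for i, (word, radr) in enumerate(zip(microprogram, restart_addrs)):
        main_storage[OFFSET + i] = (word, radr)


class ChannelDevice:
    def __init__(self, microprogram):
        self.rcs = list(microprogram)  # RCS contents
        self.adr = 0                   # ADR: next microinstruction address
        self.dadr = 0                  # DADR: address of the last read


def recover_rcs(main_storage, device):
    """Channel-control side of the recovery; returns RADR(i) for later use."""
    dadr = device.dadr                        # reported by the channel device
    word, radr = main_storage[OFFSET + dadr]  # fetch RCS(i) and RADR(i)
    device.adr = dadr                         # first notification: reset ADR
    device.rcs[device.adr] = word             # second notification: reload RCS(i)
    return radr
```

Only the single faulty word is rewritten, so the cost of recovery is one table lookup and one RCS write rather than a reload of the whole microprogram.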

These operations remove the fault that was latent in the RCS 311. However, the read-data error has disrupted the operation sequence of channel device 310-1. Either the processing that channel device 310-1 performed up to the point of error detection must be re-executed or, where re-execution is impossible, a fault handling routine must be started. Whether re-execution is possible depends on the address of the microinstruction that contained the faulty data.

In the present invention, therefore, as shown in FIG. 3, a restart address (RADR(i)) is prepared in advance for each fault address alongside the RAM recovery data, so that the CHC 300 can have the channel devices 310-1 to 310-n perform restart processing appropriate to the fault address.

That is, following the recovery processing of the RCS 311 described above, the CHC 300 places on data line 530 the previously fetched RADR(i) value together with instruction information directing the channel device to restart after setting this value in ADR 312, and gives notification via signal line 510-1.
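The restart step can be illustrated with a small lookup. The table contents below are invented for illustration, and the idea that entries which cannot be retried point at the entry of a fault handling routine is an assumption about how such a table might be filled in; the patent states only that a restart address is prepared for each fault address.

```python
FAULT_ROUTINE = 7  # hypothetical entry address of a fault handling routine

# Hypothetical RADR table: index = fault address, value = restart address.
# Addresses 0-2 restart a sequence at 0, addresses 3-4 restart at 3, and
# the remaining addresses are assumed not to be retryable.
RADR = [0, 0, 0, 3, 3, FAULT_ROUTINE, FAULT_ROUTINE, FAULT_ROUTINE]


def restart(registers, fault_addr):
    """Set ADR to the restart address for `fault_addr` and report the outcome."""
    radr = RADR[fault_addr]
    registers["ADR"] = radr
    return "fault_routine" if radr == FAULT_ROUTINE else "retry"
```

Because the decision is encoded in the table, the channel control device needs no knowledge of the microprogram's structure beyond the fault address.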

The above description used part of main storage as the storage means for RCS(i) and RADR(i), but the storage means may instead be provided inside the CHC 300, installed as an independent storage device, or built into the processing device itself.

As is clear from the above description, a data fault in the RCS of a processing device can be recovered quickly and easily at the processing speed of the processing device, and recovery is possible without reducing the microinstruction execution speed of the processing device. This has the advantage of providing an inexpensive, highly available processing system.

When the invention is applied to a channel device, as in the embodiment, there is a further advantage. Ordinarily an internal fault in a channel device, including an RCS read-data error, is reported to the program (usually the operating system), which then carries out fault recovery for the data processing system as a whole. With the present invention, an RCS read-data error can, depending on the address at which it occurs, be remedied by a retry inside the channel device. This reduces the number of fault recovery operations the program must perform and improves the average performance of the data processing system.

[Brief explanation of the drawing]

FIG. 1 is a configuration diagram of a data processing system according to an embodiment of the present invention, FIG. 2 is a connection diagram of a channel device and a channel control device to which the present invention is applied, and FIG. 3 is a diagram of the format of the information stored in the storage means according to the present invention.

300: channel control device (CHC)
310-1 to 310-n: channel devices (CH)
312: RCS address register
500-1 to 500-n: interrupt signal lines from CH to CHC
510-1 to 510-n: response signal lines from CHC to CH
520: data line from CH to CHC
530: data line from CHC to CH

Claims

[Claims]

1. A fault handling method for a processing device having an RCS (reloadable control storage), characterized in that storage means is provided for storing information in which data identical to that in the RCS is paired with a restart address, and in that, when an RCS read error is detected, the data identical to that in the RCS and the restart address are read from the storage means on the basis of the RCS address at that time, the data is reloaded into the RCS, and the restart address is then set in the RCS address register and the device is restarted.