JPH0374422B2

Info

Publication number: JPH0374422B2
Application number: JP59253566A
Authority: JP
Priority date: 1984-11-29
Filing date: 1984-11-29
Publication date: 1991-11-26
Also published as: JPS61131050A

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a method for processing correctable errors that occur in the memory section of an information processing device, and in particular to a processing method that classifies and counts information on errors occurring in the memory section.

An error occurring in a memory section of an information processing device, such as the main storage MS or the control storage CS, has a large effect on the system, and the system sometimes runs out of control and becomes unusable.

Therefore, when an error occurs in the memory section, the error information must be obtained so that an accurate handling decision can be made promptly. Among errors, a 1-bit error in the memory section is a precursor to a system failure, so detecting and handling it is important for ensuring the reliability of the device.

[Conventional technology]

Various techniques are in use to detect and handle errors occurring in the memory section before they interfere with use of the device.

One of these is the patrol diagnosis method. At regular intervals, it reads and checks the words stored in memory, corrects any error, and stores the word back at its original address; this operation repeats automatically so that the memory is continuously monitored for abnormalities. In the commonly used scheme, if a correctable (for example, 1-bit) error is detected, the word is sent to the 1-bit error correction circuit, the error is recorded, and a counter counts the number of occurrences.
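The conventional patrol pass described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `PatrolledMemory`, its `good` list (a stand-in for what real ECC check bits would recover), and the function `patrol_pass` are hypothetical names that do not appear in the patent, and only the 1-bit case is modelled.

```python
class PatrolledMemory:
    """Toy memory; `good` stands in for what real ECC check bits would recover."""

    def __init__(self, words):
        self.good = list(words)    # intended contents (assumption, replaces ECC)
        self.cells = list(words)   # what is physically stored

    def read(self, addr):
        """Return (corrected_word, one_bit_error_detected)."""
        diff = self.cells[addr] ^ self.good[addr]
        one_bit = diff != 0 and diff & (diff - 1) == 0  # exactly one bit differs
        return self.good[addr], one_bit

    def write(self, addr, word):
        self.cells[addr] = word


def patrol_pass(mem, error_count=0):
    """One conventional patrol pass: read, correct, write back, count."""
    for addr in range(len(mem.cells)):
        word, one_bit = mem.read(addr)
        if one_bit:
            mem.write(addr, word)  # store the corrected word at the same address
            error_count += 1       # single counter: fixed and soft errors look alike
    return error_count
```

With a single counter, a returned count of 1 says nothing about whether the error will come back on the next pass.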

[Problem that the invention seeks to solve]

Consider how 1-bit errors occur. If, for example, the cause is damage to a semiconductor memory element, errors recur in the same word. Errors also tend to occur in words whose structure is related to that element, that is, in words contiguous with the word in which the error occurred. In this case the damaged memory element must naturally be replaced promptly.

On the other hand, an error may also arise from a different cause during a write to or read from the memory section, for example electrical noise entering the signal circuit. In this case the error is normally not detected again.

Conventionally, the error situation was judged from the count of the error counter and, for example, parts were replaced immediately. In other words, the conventional method had the problem that a temporary error could not be distinguished from an unrecoverable fixed error without dumping and checking the error data.

[Means for solving problems]

To solve the above problem, the present invention counts detected 1-bit errors using a plurality of counters. For example, it is equipped with two counters that distinguish fixed errors from soft errors and count the occurrences of each; the result is reflected in a flag code and shown on a display.

That is, when a 1-bit error is detected in a read/write operation (hereinafter, read/write) on one word of a memory section subject to patrol diagnosis, such as the main storage MS or the control storage CS, n consecutive words starting from that error word are read/written and checked.

If a 1-bit error is detected in the above read/write, the same word is read/written and checked again; if the error still occurs, it is counted as a fixed error.

If a 1-bit error is detected in the read/write check but no error occurs when the same word is read/written and checked again, it is counted as a soft error.
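The recheck rule in the two paragraphs above can be illustrated with a small sketch. The fault model here is an assumption made only for the example: `FaultyMemory` forces "stuck-at-1" bits on every write to imitate a damaged element, and its `good` list stands in for what real ECC would recover. Neither the class nor `classify_word` comes from the patent.

```python
class FaultyMemory:
    """Toy memory with optional stuck-at-1 bits per address (hypothetical fault model)."""

    def __init__(self, words, stuck=None):
        self.good = list(words)         # stands in for what ECC would recover
        self.cells = list(words)
        self.stuck = dict(stuck or {})  # addr -> bit mask forced to 1 on every write
        for addr in self.stuck:
            self.write(addr, self.cells[addr])

    def write(self, addr, word):
        self.cells[addr] = word | self.stuck.get(addr, 0)

    def read(self, addr):
        """Return (corrected_word, one_bit_error_detected)."""
        diff = self.cells[addr] ^ self.good[addr]
        return self.good[addr], diff != 0 and diff & (diff - 1) == 0


def classify_word(mem, addr):
    """Return 'ok', 'soft', or 'fixed' for one word, using a read/write recheck."""
    word, err = mem.read(addr)
    if not err:
        return "ok"
    mem.write(addr, word)          # write the corrected word back
    _, err_again = mem.read(addr)  # second check of the same word
    return "fixed" if err_again else "soft"
```

A transient bit flip disappears after the write-back and is reported as `'soft'`, while a stuck bit survives the write-back and is reported as `'fixed'`.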

[Effect]

With the above 1-bit error detection method, an error detected in a read/write is confirmed by a second read/write, classified as a fixed error or a soft error, and counted separately in the corresponding counter. Moreover, because n consecutive words are checked, the circumstances of the error become clear, and error handling can therefore be carried out accurately.

[Example]

Embodiments of the present invention will be described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of the present invention.

In the figure, 1 is a read/write control section that includes an address counter 3 and a loop counter 4, and 2 is a patrol address register that stores the patrol address supplied to it as input.

5 is the memory section monitored by patrol diagnosis, and 6 is a word register into which a word read from the memory section is set.

7 is a 1-bit error detection/correction section, and 8 is an error control section that includes a fixed error counter 9 and a soft error counter 10. A display section 11 and a log recording section 12 display and record the counts of the fixed error counter and the soft error counter.

FIG. 2 is a flowchart of the method of this invention, and the operation of the method is explained in detail with reference to it. Note that FIG. 2 shows the case in which patrol diagnosis is performed on the main memory MS.

First, when the patrol address (hereinafter, address A) is fetched into the patrol address register 2 (step 1), the read/write control section 1 reads the word at the specified address A of the memory section 5 into the word register 6 via the address counter 3 (step 2).

Hereinafter, the steps in FIG. 2 are identified by numbers in parentheses.

The 1-bit error detection section 7 checks the word that was read (3). If no error is detected, 1 is added to the address counter 3 (4) and processing moves on to the next step. If a 1-bit error is detected in this word, n consecutive words starting from address A of this word are checked.

The loop counter 4 is set to n, the fixed error counter 9 to 0, and the soft error counter 10 to 0 (5).

The word in which the error occurred, having been corrected and stored back at its original address, is read into the word register 6 again (6) and checked by the 1-bit error detection/correction section 7 (7).

If no error is detected, 1 is added to the soft error counter 10 (8); if an error is detected, 1 is added to the fixed error counter 9 (9).

In either case, 1 is added to the address counter 3, making it A+1, and 1 is subtracted from the loop counter 4, making it n-1 (10).

If the loop counter 4 is not 0 (11), the word at address A+1 is read by the read/write control section 1 (12) and checked by the 1-bit error detection/correction section 7 (13).

If an error is detected, the word is read again (6) and checked (7). If no error is detected in step (13), the counters are updated again (10), the value of the loop counter is checked (11), and the word at address A+2 is read (12) and checked (13).

As described above, steps (10), (11), (12), and (13) are repeated. When the loop counter reaches 0, that is, when n words starting from the word in which the error first occurred have been checked, the error control section 8 performs logging and creates the flag code (14). The flag code, that is, the classified error indication, is shown on the display section 11 (15) and recorded by the log recording section 12, and processing moves on to the next step.
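The control flow of FIG. 2 as described in the steps above can be sketched as follows. This is a minimal sketch, not the patented circuit: `patrol_step`, `PatrolResult`, and the `check` callback are hypothetical names. `check(addr)` is assumed to read the word at `addr`, write back the corrected word if needed, and return `True` when a 1-bit error was detected on that read. The sketch also assumes that addresses A through A+n-1 are all valid, and it leaves the flag code format unspecified, as the text does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PatrolResult:
    next_address: int            # value left in the address counter
    fixed_errors: int = 0        # fixed error counter 9
    soft_errors: int = 0         # soft error counter 10
    checked_burst: bool = False  # True if the n-word check was triggered


def patrol_step(address: int, n: int, check: Callable[[int], bool]) -> PatrolResult:
    """Follow the FIG. 2 flow for one patrol address (step numbers in comments)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not check(address):                             # (2), (3)
        return PatrolResult(next_address=address + 1)  # (4)

    loop, fixed, soft = n, 0, 0                        # (5)
    error_pending = True                               # the first word already failed
    while True:
        if error_pending:
            if check(address):                         # (6), (7): recheck same word
                fixed += 1                             # (9)
            else:
                soft += 1                              # (8)
        address += 1                                   # (10): address counter + 1
        loop -= 1                                      # (10): loop counter - 1
        if loop == 0:                                  # (11)
            break
        error_pending = check(address)                 # (12), (13)
    return PatrolResult(address, fixed, soft, True)    # handed on to (14), (15)
```

The returned counts correspond to what the error control section would log and display; a nonzero `fixed_errors` suggests a damaged element, while only `soft_errors` suggests a transient cause.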

The above description concerned the memory 1-bit error processing method for the main memory MS, but it goes without saying that the method can be applied to the control memory CS without any problem.

Furthermore, each block in FIG. 1 functions in a system composed of local storage LS, control storage CS, main storage MS, registers, and other components.

[Effect of the invention]

As explained above, according to the present invention, 1-bit errors occurring in the memory section are classified by error state, recorded, and displayed, so that a decision on how to handle the error can be made quickly and appropriate action can be taken. For example, a maintenance engineer who is notified of the error information can immediately decide on countermeasures and take the necessary action, such as replacing a printed circuit board.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the memory 1-bit error processing method of the present invention, and FIG. 2 is a flowchart for FIG. 1. In the figures, 1 is the read/write control section, 5 is the memory section, 7 is the 1-bit error detection section, 8 is the error control section, 9 is the fixed error counter, 10 is the soft error counter, 11 is the display section, and 12 is the log recording section.

Claims

[Claims]

1. In an information processing device that performs patrol diagnosis of a memory section and, when a correctable error is detected, corrects the word and writes it back, a memory error processing method characterized in that the word in which a correctable error has been detected is rechecked by the error detection section; discrimination control is performed such that the error is counted in a first error counter when a correctable error is detected again and in a second error counter when it is not; and the values of both counters are displayed.