JP6555095B2

JP6555095B2 - Memory diagnostic repair device

Info

Publication number: JP6555095B2
Application number: JP2015224300A
Authority: JP
Inventors: 田島 宏一; 西田 廣治
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 2015-11-16
Filing date: 2015-11-16
Publication date: 2019-08-07
Anticipated expiration: 2035-11-16
Also published as: JP2017091418A

Description

The present invention relates to a technique for diagnosing whether a soft error has occurred in a memory and for repairing the soft error.

A soft error is the inversion of some of the bit values of data stored in a memory. Soft errors are known to be caused by particle radiation, such as alpha rays, neutron rays, proton rays, and heavy ion rays, striking a semiconductor chip such as a memory element. It is also known that soft errors become more frequent as semiconductor chips are miniaturized. Unlike a hard error, in which part of a semiconductor chip such as a memory element constituting the memory is physically damaged, a soft error is a temporary failure. It is therefore generally known that a soft error can be repaired by resetting the memory element or by rewriting the data.

It is generally known that bit inversion can be detected, and data in which bit inversion has occurred can be restored, by using an error correction code (hereinafter, "ECC"). More specifically, ECC makes it possible to determine whether a single bit or two or more bits have changed; when only a single bit has changed, the data can be restored by inverting that bit. When two or more bits have changed, ECC alone cannot restore the data, but it is generally known that errors of two or more bits become repairable when ECC is combined with an existing interleaving scheme. In addition to ECC-based techniques, the techniques disclosed in Patent Documents 1 to 3 and Non-Patent Document 1 have been proposed for diagnosing soft errors and the like.

Patent Document 1 describes storing positive data (the data exactly as written to the memory for later processing) and inverted data (the positive data with every bit inverted) in a memory in advance, determining whether there is an abnormality by a parity check or the like when the positive data is accessed, and using the inverted data when an abnormality is found.

Patent Document 2 discloses a technique for diagnosing and repairing soft errors in an EPROM (Erasable Programmable Read Only Memory) or EEPROM (Electrically Erasable Programmable Read Only Memory) of, for example, a motor drive device mounted on an electric vehicle. More specifically, Patent Document 2 describes storing inverted data and a check code in the EPROM or the like in addition to the positive data, diagnosing whether a soft error has occurred at a predetermined timing, such as when the master key of the vehicle is removed, and repairing the soft error if one is found.

Patent Document 3 discloses a technique for diagnosing and repairing memory soft errors in a dual system with a duplicated CPU (Central Processing Unit). In Patent Document 3, data is stored at three locations in a memory; when one of the two CPUs executes an application program or the like, the presence of a data error is determined and the error is repaired by majority vote, and the other CPU is used to verify whether the repair was made.

Non-Patent Document 1 describes, in IEC 61508-7 A.5.7, a memory diagnosis method based on "Double RAM with hardware or software comparison and read/write test."

Patent Document 1: Japanese Patent Application Laid-Open No. H05-216771
Patent Document 2: Japanese Patent Application Laid-Open No. 2002-55885
Patent Document 3: Japanese Patent Application Laid-Open No. 2013-109532

Non-Patent Document 1: IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems

Some memories are only read during actual operation of the device in which they are mounted, while others are written and read at arbitrary timings. A typical example of the former is an EEPROM, and a typical example of the latter is a RAM (Random Access Memory). Hereinafter, the latter kind of memory is referred to as a "memory that is read and written in real time."

For a memory that is read and written in real time, if soft errors are diagnosed and repaired only at a specific timing or periodically, the timing at which data is read from the memory to actually execute processing does not necessarily coincide with the diagnosis timing, so processing may be executed using erroneous data. For this reason, the technique disclosed in Patent Document 2 is not suitable for diagnosing and repairing soft errors in a memory that is read and written in real time. In the first place, the technique disclosed in Patent Document 2 targets an EPROM or EEPROM and gives no consideration to detecting or repairing soft errors in data stored in a RAM.

The techniques disclosed in Patent Document 1, Patent Document 3, and Non-Patent Document 1, and the techniques using ECC, can be applied to diagnosing and repairing soft errors in a memory that is read and written in real time, but each has the following problems. First, although Patent Document 1 describes detecting an error at the timing of reading data from the memory, the specific detection method is unclear. Patent Document 1 suggests that a parity check is used, but a parity check has a low error detection capability. The technique of Patent Document 1 also has the problem that the contents stored in the memory are not repaired and remain erroneous.

The technique described in Patent Document 3 presupposes multiplexed CPUs: in addition to the hardware that executes processing using the data read from the memory (one of the duplicated CPUs), separate hardware for verification (the other of the duplicated CPUs) is required. The technique described in Non-Patent Document 1 can diagnose whether a soft error has occurred, but cannot repair the data read from the memory or the contents stored in the memory.

Techniques using ECC have the following problems. First, diagnosing and repairing soft errors with ECC increases hardware cost, because ECC-capable hardware is needed. In addition, a soft error of two or more bits cannot be repaired unless ECC is combined with a special scheme such as interleaving. Furthermore, although errors in the data read from the memory are corrected, the contents stored in the memory are not repaired and remain erroneous.

As described above, for a memory that is read and written in real time, there has been no technique that diagnoses whether a soft error has occurred at the timing of data reading and, when a soft error is found, repairs the contents stored in the memory, without increasing hardware cost.

The present invention has been made in view of the problems described above, and an object thereof is to provide a technique that diagnoses whether a soft error has occurred at the timing of data reading and, when a soft error is found, repairs the contents stored in the memory, without increasing hardware cost.

To solve the above problems, the present invention provides a diagnostic repair device having the following memory access control means, specifying means, and diagnostic repair means. The memory access control means reads, from a memory, data designated as a read target, and also reads from the memory at least two pieces of data stored in the memory in association with that data. When the at least three pieces of data read by the memory access control means are not all identical, the specifying means specifies the data that forms the majority as repair data and specifies the storage area in the memory of the data that is in the minority as a diagnosis target area. The diagnostic repair means writes test data to the diagnosis target area and then reads data back from that area. If the data read back matches the test data as it was before writing, the diagnostic repair means repairs the diagnosis target area using the repair data; if it does not match, the diagnostic repair means issues an error notification of a hard error.

In the present invention, when the at least three pieces of data read by the memory access control means are not all identical, the data in the minority is regarded as abnormal data. If, after test data is written to the diagnosis target area in which the abnormal data was stored, the data read back from that area matches the test data as it was before writing, the abnormal data is considered to be due to a soft error. This is because, if a hard error had occurred, the data read back from the diagnosis target area would not match the test data.
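The read, vote, test, and repair sequence described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three copies are stored as identical bytes, the memory is simulated by a small struct whose `stuck_zero_mask` field models a hard error, and every name (`sim_mem_t`, `read_with_repair`, the `DIAG_*` codes) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the names are illustrative only. */
typedef enum {
    DIAG_OK,          /* all three copies agreed                 */
    DIAG_REPAIRED,    /* minority copy was a soft error, fixed   */
    DIAG_HARD_ERROR,  /* test data did not read back correctly   */
    DIAG_NO_MAJORITY  /* all three copies differ                 */
} diag_result_t;

/* Simulated memory: bits set in stuck_zero_mask always store 0,
 * which stands in for a hard error in this demonstration. */
typedef struct {
    uint8_t cell[3];
    uint8_t stuck_zero_mask[3];
} sim_mem_t;

static void mem_write(sim_mem_t *m, size_t i, uint8_t v)
{
    m->cell[i] = (uint8_t)(v & (uint8_t)~m->stuck_zero_mask[i]);
}

static uint8_t mem_read(const sim_mem_t *m, size_t i)
{
    return m->cell[i];
}

diag_result_t read_with_repair(sim_mem_t *m, uint8_t *out)
{
    assert(m != NULL && out != NULL);
    const uint8_t a = mem_read(m, 0), b = mem_read(m, 1), c = mem_read(m, 2);
    uint8_t good;   /* repair data (majority value)        */
    size_t odd;     /* index of the diagnosis target area  */

    if (a == b && b == c) { *out = a; return DIAG_OK; }
    if (a == b)      { good = a; odd = 2; }
    else if (a == c) { good = a; odd = 1; }
    else if (b == c) { good = b; odd = 0; }
    else             { return DIAG_NO_MAJORITY; }
    *out = good;

    /* Write test data to the suspect area and read it back. */
    const uint8_t test = 0x55;  /* assumed test pattern */
    mem_write(m, odd, test);
    if (mem_read(m, odd) != test) {
        return DIAG_HARD_ERROR;  /* area is left holding the test data */
    }
    mem_write(m, odd, good);     /* soft error: restore from repair data */
    return DIAG_REPAIRED;
}
```

In this sketch the caller still receives the majority value through `out` even when a hard error is reported, so processing can continue with correct data while the error is escalated.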

According to the present invention, even if a soft error occurs in any storage area in the memory, the soft error is repaired when the data is read. Therefore, even if the memory is one that is read and written in real time, processing is not executed with erroneous data, and reliability can be ensured. In addition, unlike the technique disclosed in Patent Document 3, the present invention does not require separate hardware for verification. That is, the present invention makes it possible to diagnose whether a soft error has occurred at the timing of data reading and, when a soft error is found, to repair the contents stored in the memory, without increasing hardware cost.

In a more preferred aspect, when the number of pieces of data forming the majority among the at least three pieces of data read by the memory access control means is less than a predetermined threshold, the specifying means issues a second error notification that differs from the hard error notification. If the second error notification is issued but the hard error notification is not, this means that soft errors have occurred in at least the threshold number of the storage areas that store the read target data and the at least two corresponding pieces of data. In other words, the user of the diagnostic repair device can tell from the presence or absence of the second error notification whether soft errors are occurring frequently. The data in which the error was found (the data that was in the minority in the majority vote, or the test data that did not match its value before writing) may itself be reported together with the first and second error notifications, in which case the user of the diagnostic repair device can see which bits are suspect.
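The threshold check on the size of the majority can be sketched in C as follows. This is an illustration only: the copies are assumed to be directly comparable 16-bit values, and the function names `majority_count` and `needs_second_notification` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the size of the largest group of equal values among the
 * n copies and stores that group's value in *value. */
size_t majority_count(const uint16_t *copies, size_t n, uint16_t *value)
{
    assert(copies != NULL && value != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++) {
            if (copies[j] == copies[i]) {
                count++;
            }
        }
        if (count > best) {
            best = count;
            *value = copies[i];
        }
    }
    return best;
}

/* True when the majority is smaller than the configured threshold,
 * i.e. when the second error notification should be issued. */
bool needs_second_notification(const uint16_t *copies, size_t n,
                               size_t threshold)
{
    uint16_t value;
    return majority_count(copies, n, &value) < threshold;
}
```

With five copies and a threshold of four, for example, a read in which only three copies agree would still yield a usable majority value but would also raise the second notification.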

A single kind of test data may be used, but it is more preferable to perform the write to the diagnosis target area, the read from the diagnosis target area, and the comparison with the value before writing using each of a plurality of mutually different test data, and still more preferable to do so using two test data whose bits are inverted with respect to each other. When the test data is 1 byte in size, two such test data are, for example, 0xff and 0x00, or 0xaa and 0x55. If hard errors in the diagnosis target area are checked using only test data in which all bits are 0, such as 0x00, a hard error in which a bit value remains stuck at 0 may go undetected. According to this aspect, such missed detections of hard errors can be reliably avoided.
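The advantage of complementary test data can be illustrated with a small C sketch. It assumes a simulated one-byte cell in which the bits set in `stuck_zero_mask` always store 0; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated one-byte cell; bits in stuck_zero_mask model a hard
 * error in which the bit value remains 0. */
typedef struct {
    uint8_t value;
    uint8_t stuck_zero_mask;
} sim_cell_t;

static void cell_write(sim_cell_t *c, uint8_t v)
{
    c->value = (uint8_t)(v & (uint8_t)~c->stuck_zero_mask);
}

static uint8_t cell_read(const sim_cell_t *c)
{
    return c->value;
}

/* Writes each test pattern, reads it back, and compares it with the
 * value before writing. Returns false on the first mismatch. */
bool cell_passes(sim_cell_t *c, const uint8_t *patterns, size_t n)
{
    assert(c != NULL && patterns != NULL);
    for (size_t i = 0; i < n; i++) {
        cell_write(c, patterns[i]);
        if (cell_read(c) != patterns[i]) {
            return false;
        }
    }
    return true;
}
```

Because every bit position is 1 in exactly one of two complementary patterns, a bit stuck at 0 fails one of the two comparisons, whereas a test with 0x00 alone would pass.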

To solve the above problems, it is of course also conceivable to provide a program that causes a general computer such as a CPU (or CPU core) to function as the memory access control means, the specifying means, and the diagnostic repair means. Specific ways of distributing such a program include writing it to a computer-readable recording medium such as a flash ROM (Read Only Memory) and distributing the medium, and distributing it by download via a telecommunication line such as the Internet.

As described above, the present invention makes it possible to diagnose and repair soft errors in a memory that is read and written in real time, at the timing of reading data from the memory, without increasing hardware cost.

FIG. 1 is a diagram showing an example of the block configuration of a diagnostic repair device 10 according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the data management table 12.
FIG. 3 is a diagram showing an example of the structure definition file 13 and the structure definition data 21.
FIG. 4 is a diagram for explaining an outline of the processing executed by the diagnostic repair device 10.
FIG. 5 is a diagram showing an example of a hardware configuration that can implement the diagnostic repair device 10.
FIG. 6 is a flowchart showing the flow of processing that the CPU 36 executes according to an application program.
FIG. 7 is a flowchart showing the flow of the data write process.
FIG. 8 is a flowchart showing the flow of the data read process.
FIG. 9 is a flowchart showing the flow of the diagnostic repair process.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
<Example of Functional Block Configuration of Diagnostic Repair Device 10>
FIG. 1 is a diagram showing an example of the functional block configuration of the diagnostic repair device 10 in the present embodiment. The diagnostic repair device 10 in the present embodiment is, for example, a CPU. The diagnostic repair device 10 has a memory 11, a data management table 12, a structure definition file 13, and a program execution unit 14 that executes various application programs. The diagnostic repair device 10 diagnoses whether a soft error has occurred when data is read from the memory 11 and, if a soft error is found, repairs it.

The memory 11 is a memory that can be read and written in real time, such as a RAM. The memory 11 has a plurality of storage areas. In the example shown in FIG. 1, three of these storage areas are illustrated: "data area 1," "data area 2," and "data area 3." In the present embodiment, each of "data area 1," "data area 2," and "data area 3" is a target of soft error diagnosis and repair. The diagnostic repair device 10 of the present embodiment may also have registers or the like as areas for storing data that is referenced or updated while an application program is executed, and may have a cache memory or the like that temporarily stores data.

The program execution unit 14 is, for example, a CPU core. As shown in FIG. 1, the program execution unit 14 has memory access control means 14a, specifying means 14b, and diagnostic repair means 14c. These three means are software modules realized by operating the CPU core according to programs organized as subroutines, such as functions in the C language. For example, the memory access control means 14a is realized by operating the CPU core according to the program of a data write function for writing data to the memory 11 or the program of a data read function for reading data from the memory 11. The specifying means 14b and the diagnostic repair means 14c are realized by operating the CPU core according to the data read function. The programs of the data write function and the data read function may be implemented as a software library that can be called as needed while an application program is executed.

The memory access control means 14a executes a data write process and a data read process. The data write process writes data that an application program, in the course of its execution, has instructed to be written to the memory 11 (hereinafter, "write target data") to a data area in the memory 11. The data read process reads data that an application program, in the course of its execution, has instructed to be read from the memory 11 (hereinafter, "read target data") from the corresponding data area in the memory 11. Details of the data write process and the data read process are described later, as are details of the processing executed by the specifying means 14b and the diagnostic repair means 14c.

The data management table 12 is a table that manages the start addresses of the data areas of the memory 11. It is placed in a common area accessible from both the memory access control means 14a and the diagnostic repair means 14c. FIG. 2 shows an example of the data management table 12. The data management table 12 shown in FIG. 2 has two fields, "number" and "name," but the fields of the data management table 12 are not limited to these two. The "number" field stores an identifier that uniquely identifies the data stored in the data management table (in the present embodiment, the data stored in the "name" field). In the present embodiment, a serial number is used as this identifier. The "name" field stores the start address of each of the plurality of data areas set in the memory 11.
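One possible in-memory shape for such a table is sketched below in C. The layout is an assumption made for illustration (FIG. 2 is not reproduced here), and the names `area_entry_t` and `area_start` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical data management table. */
typedef struct {
    unsigned number;  /* serial number identifying the entry          */
    uint8_t *start;   /* start address of a data area ("name" field)  */
} area_entry_t;

/* Looks up the start address of the data area with the given serial
 * number. Returns NULL when no such entry exists. */
uint8_t *area_start(const area_entry_t *table, size_t n, unsigned number)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].number == number) {
            return table[i].start;
        }
    }
    return NULL;
}
```

Both the write path and the repair path could resolve a data area through the same lookup, which matches the requirement that the table be reachable from both means.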

The structure definition file 13 is a file that manages the data structure of structure data among the data referenced or updated by application programs and the like. Structure data is data composed of a set of sub-data called structure members. FIG. 3(a) shows structure definition data 21, FIG. 3(b) shows the structure definition file 13 obtained from the structure definition data 21 of FIG. 3(a), and FIG. 3(c) shows an example of memory allocation of structure data. Structure definition data is data that defines structure data.
FIG. 3(a) shows structure definition data 21 in a program written in the C language. As is well known, the C language defines the four data types "char," "int," "long," and "double." "char" is a 1-byte data type used for character codes such as ASCII codes. "int" is an integer type with a data size of 2 bytes. "long" is a long integer type with a data size of 4 bytes. "double" is a double-precision real number type with a data size of 8 bytes. A structure member may be an array, and [n] denotes a one-dimensional array of n elements.

The structure definition file 13 shown in FIG. 3(b) is placed in a common area accessible from both the memory access control means 14a and the diagnostic repair means 14c. For each piece of structure data, the structure definition file 13 records, for every structure member constituting that structure data, the "data name," the "number of bytes of the structure member," and the "relative start address of the structure member." The relative start address of a structure member is the start address of that member when the start address of the structure data is taken as the origin. The items constituting the structure definition file 13 are not limited to "data name," "number of bytes," and "relative start address." In the present embodiment, when the structure definition data 21 shown in FIG. 3(a) is input to the diagnostic repair device 10, the diagnostic repair device 10 converts the data into a table and generates the structure definition file 13 shown in FIG. 3(b). FIG. 3(c) shows an example in which the structure data described by the structure definition file 13 is allocated in the memory 11. Since there are three data areas in the memory 11 in the example of FIG. 1 described above, the same area allocation is set at three locations in the memory 11 in a predetermined format (for example, data types such as character, integer, long integer, and double-precision real number, and the number of bytes corresponding to each data type).
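A table of member names, byte counts, and relative start addresses can be derived from a C structure with `sizeof` and `offsetof`, as sketched below. The structure `sample_t` and its members are invented for illustration and are not the structure of FIG. 3(a). Fixed-width types from `<stdint.h>` are used because the sizes of `int` and `long` are implementation-defined in standard C, so the 2-byte and 4-byte sizes given in the text hold only on some targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure data; not the structure shown in FIG. 3(a). */
typedef struct {
    char    id[8];
    int16_t speed;
    int16_t torque;
    int32_t count;
    double  gain;
} sample_t;

/* One row of a structure definition table. */
typedef struct {
    const char *name;   /* data name                      */
    size_t      bytes;  /* number of bytes of the member  */
    size_t      offset; /* relative start address         */
} member_def_t;

#define MEMBER(type, field) \
    { #field, sizeof(((type *)0)->field), offsetof(type, field) }

static const member_def_t sample_def[] = {
    MEMBER(sample_t, id),
    MEMBER(sample_t, speed),
    MEMBER(sample_t, torque),
    MEMBER(sample_t, count),
    MEMBER(sample_t, gain),
};

/* Finds a member row by data name. Returns NULL if it is not listed. */
const member_def_t *find_member(const member_def_t *defs, size_t n,
                                const char *name)
{
    assert(defs != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            return &defs[i];
        }
    }
    return NULL;
}
```

Adding a row's offset to the start address of each of the three data areas gives the location of the same member in every copy.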

Each of the memory access control means 14a, the specifying means 14b, and the diagnostic repair means 14c in the present embodiment is realized by operating the CPU core according to a program written in the C language. In this case, the structure definition file 13 may be generated from preset structure definition data 21 before the diagnostic repair device 10 (CPU) executes actual processing, for example when the program is compiled. The structure definition data 21 is supplied to the diagnostic repair device 10 from outside together with various application programs. The diagnostic repair device 10 of the present embodiment generates the structure definition file 13 from the structure definition data 21, but the present invention is not limited to this. For example, the structure definition file 13 itself may be acquired from outside via a telecommunication line such as the Internet or a LAN (Local Area Network), or may be acquired through input from a user or the like.

<Outline of Processing Executed by Diagnostic Repair Device 10>
Next, an outline of the processing that the diagnostic repair device 10 executes to diagnose and repair soft errors in the memory 11 will be described with reference to FIG. 4. Processes (1) and (2) in FIG. 4 are executed by the memory access control means 14a. Specifically, process (1) in FIG. 4 is the data write process that the CPU core executes according to the program of the data write function, and process (2) in FIG. 4 is the data read process that the CPU core executes according to the data read function. Processes (3) and (4) in FIG. 4 are both executed by the CPU core according to the program of the data read function. Process (3) in FIG. 4 is the processing of the specifying means 14b, and process (4) in FIG. 4 is the processing of the diagnostic repair means 14c.

In FIG. 4, data areas 1 to 3 are denoted as memories 11-1 to 11-3, respectively. The data write process of the present embodiment, that is, process (1) in FIG. 4, writes to the memory 11 the data that an application program, in the course of its execution, has instructed to be written to the memory 11. As shown in (1) of FIG. 4, in the data write process of the present embodiment, the write target data is written to data areas 1 to 3 in three different forms. Specifically, the memory access control means 14a writes the write target data as is (hereinafter, "positive data") to data area 1, and writes two kinds of data corresponding to the positive data to data area 2 and data area 3. More specifically, the memory access control means 14a writes to data area 2 the data obtained by an EOR (exclusive OR) operation between the positive data and hexadecimal ffff (high value, all f) (inverted data), and writes to data area 3 the data obtained by an EOR operation between the positive data and one of one or more predetermined patterns set in advance (hexadecimal aaaa in the present embodiment) (pattern data). This is so that errors other than soft errors, such as bit errors, can also be detected.

The predetermined pattern is not limited to one in which the same digit repeats, such as hexadecimal aaaa; it may be in ascending or descending order, such as hexadecimal 1234 or 4321. A pattern in which two or more predetermined values alternate, such as hexadecimal 0a0a, may also be used. The pattern used to generate the pattern data may be changed for each process, or a fixed preset pattern may always be used. Furthermore, the operation used to generate the pattern data is not limited to the EOR operation; for example, a logical product (AND) operation may be used, and the operation may differ for each process or according to the type of data being processed. In the above description, the hexadecimal value used to generate the inverted data and the pattern used to generate the pattern data have four digits, but the number of digits should correspond to the number of bytes of the positive data.
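One way to match the key width to the data width is sketched below. Repeating a two-byte key to cover wider data is an assumption made here for illustration; the text only requires that the number of digits correspond to the number of bytes. The helper names are hypothetical.

```python
def widen_key(key16, nbytes):
    """Repeat a two-byte key until it covers nbytes bytes (nbytes must be even)."""
    if nbytes <= 0 or nbytes % 2 != 0:
        raise ValueError("nbytes must be a positive even number")
    return int.from_bytes(key16.to_bytes(2, "big") * (nbytes // 2), "big")


def make_pattern_data(value, key16, nbytes):
    """XOR a value that is nbytes wide with the widened key."""
    mask = (1 << (8 * nbytes)) - 1
    return (value & mask) ^ widen_key(key16, nbytes)


print(hex(widen_key(0x1234, 4)))              # prints 0x12341234
print(hex(make_pattern_data(20, 0xAAAA, 4)))  # prints 0xaaaaaabe
```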

The data read process of the present embodiment, that is, process (2) in FIG. 4, reads from the memory 11 the data that the application program, in the course of its execution, has instructed to be read. As shown in (2) of FIG. 4, the data read process reads the read target data (positive data) together with the two kinds of data corresponding to it (that is, the inverted data and the pattern data). In the present embodiment, these three pieces of data are used to diagnose whether a soft error has occurred in the storage area in which each was stored, and a detected soft error is repaired.

More specifically, the memory access control means 14a collates the three pieces of data (positive data, inverted data, and pattern data) to check whether they match one another. The memory access control means 14a first restores the inverted data obtained from the memory 11-2 to the data before the EOR operation with hexadecimal ffff, and then collates it with the positive data. The determination of whether the two match is not limited to an exact match; it may be based on, for example, whether they fall within a predetermined error range. If the collation result is a mismatch, the memory access control means 14a determines that a memory error has occurred. Next, the memory access control means 14a collates the pattern data with the positive data. In this case too, it restores the pattern data to the original data before collating it with the positive data.

If both collations (positive data against inverted data, and positive data against pattern data) confirm a match, processing according to the application program is executed using the positive data. If a memory error is determined, processes (3) and (4) in FIG. 4 are executed in the present embodiment. Hereinafter, processes (3) and (4) in FIG. 4 are collectively referred to as the “diagnostic repair process”. Process (3) in FIG. 4 is the process of the specifying means 14b. The specifying means 14b takes a majority vote among the three pieces of data and determines the majority value to be the normal data. More specifically, the specifying means 14b treats the two pieces of data that match each other as normal data, regards the data area that stored the non-matching data as having a memory error, and specifies that data area as the diagnosis target area. If all three pieces of data differ from one another, the specifying means 14b issues an error notification indicating the occurrence of a hard error and ends the diagnostic repair process.
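The majority vote of the specifying means 14b can be sketched as follows. The inputs are assumed to be the three values after the XOR keys have been undone; the function name and the tuple return convention are hypothetical.

```python
def majority_vote(data1, data_a, data_b):
    """Pick the normal data and the suspect data area (1, 2 or 3).

    data1 is the positive data, data_a the restored inverted data and
    data_b the restored pattern data. Returns (normal_data, suspect_area);
    suspect_area is None when all three agree, and both items are None
    when all three differ (treated as a hard error in the text).
    """
    if data1 == data_a == data_b:
        return data1, None
    if data_a == data_b:
        return data_a, 1   # data area 1 disagrees with the other two
    if data1 == data_b:
        return data1, 2    # data area 2 disagrees
    if data1 == data_a:
        return data1, 3    # data area 3 disagrees
    return None, None      # no two values agree


print(majority_vote(21, 20, 20))  # prints (20, 1)
```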

FIG. 4 illustrates the case where the memory 11-1 is specified as the diagnosis target area. Once the diagnosis target area is specified, the diagnostic repair means 14c diagnoses whether the memory error is due to a soft error and, if it is, repairs the soft error using the normal data as repair data (process (4) in FIG. 4). In process (4), the diagnostic repair means 14c writes predetermined test data to the diagnosis target area and then reads the test data back from it. If the data read back matches the test data before writing, the memory error in the diagnosis target area is regarded as a soft error, and the repair data is written to the diagnosis target area to repair the data stored there. This is because, as described above, a soft error can be repaired by resetting the memory element or rewriting the data.
The above is the outline of the diagnostic repair process in the present embodiment.

<Hardware Configuration Example of Diagnostic Repair Device 10>
As described above, the functional blocks that characterize the diagnostic repair device 10 (that is, the memory access control means 14a, the specifying means 14b, and the diagnostic repair means 14c) are all software modules. Therefore, if a program that causes a CPU to realize these means (specifically, the programs of the write function and read function described above; hereinafter, the diagnostic repair program) is installed as a software library on a general computer device such as a general-purpose personal computer or a server, the control unit (CPU) of that computer device can read and execute the software library as needed, so that the computer device functions as the diagnostic repair device 10.

FIG. 5 is a diagram showing an example of the hardware configuration of a computer device that can operate as the diagnostic repair device 10. The computer device shown in FIG. 5 has an input device 31, an output device 32, a drive device 33, an auxiliary storage device 34, a memory device 35, a CPU 36 that functions as the control center of the computer device, a network connection device 37, and a system bus B that connects these components.

The input device 31 is, for example, a keyboard or a pointing device such as a mouse. The input device 31 supplies various operation signals to the CPU 36 according to the user's operations, thereby conveying the content of those operations to the CPU 36. A specific example of such an operation signal is a signal instructing the execution of a program.

The output device 32 has, for example, a display and its drive circuit. Under the control of the CPU 36, the display shows screens prompting use of the computer device and data representing the progress and results of program execution.

The diagnostic repair program installed on the computer device shown in FIG. 5 is provided on, for example, a portable recording medium 38 such as a USB (Universal Serial Bus) memory or a CD-ROM. The recording medium 38 on which the diagnostic repair program is recorded can be set in the drive device 33, and the program is installed from the recording medium 38 into the auxiliary storage device 34 via the drive device 33.

The auxiliary storage device 34 is, for example, a hard disk. It stores the OS program that causes the CPU 36 to realize an OS (Operating System), the diagnostic repair program described above, and various data used when executing processes (1) to (4) in FIG. 4 (for example, the data management table 12 and the structure definition file 13), and inputs and outputs them as needed.

The memory device 35 corresponds to the memory 11 described above. Programs and other data read from the auxiliary storage device 34 by the CPU 36 are loaded into the memory device 35. The memory device 35 may include a ROM in addition to a RAM.

The network connection device 37 is, for example, a NIC (Network Interface Card) and is connected to a telecommunication line such as the Internet or a LAN. The network connection device 37 receives data sent over the telecommunication line and passes it to the CPU 36, and sends data passed from the CPU 36 out to the telecommunication line. This allows the computer device shown in FIG. 5 to exchange data with other computer devices connected to the telecommunication line. The network connection device 37 is used to acquire (download), from other computer devices connected to the telecommunication line, the software library and the various data needed to execute processes (1) to (4) in FIG. 4.

<Operation of the Diagnostic Repair Device of the Present Embodiment>
Next, the operation of the diagnostic repair device of the present embodiment will be described with reference to flowcharts. FIG. 6 is a flowchart showing the flow of processing that the CPU 36 executes in accordance with the application program. That flow is as follows. First, the CPU 36 calls the data read function to read, from the memory device 35, the data used in the course of executing the application program, and executes the data read process in accordance with the data read function program (step S01).

When the data read process is complete, the CPU 36 determines whether it ended abnormally (step S02). If the determination result in step S02 is Yes, the CPU 36 executes abnormality handling (step S05) and ends execution of the application program. If the determination result in step S02 is No, the CPU 36 executes application processing in accordance with the code of the application program (step S03), then calls the data write function to write the resulting data to the memory device 35 and executes the data write process (step S04).
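The flow of FIG. 6 can be sketched as a small driver. This is only an illustration under stated assumptions: the read function is assumed to return a (status, value) pair, the status codes follow the example values given for the data read function, and all callables are hypothetical stand-ins supplied by the caller.

```python
NORMAL, REPAIRED, ABNORMAL = 0, 1, -1  # example return values from the text


def run_application(read_fn, process_fn, write_fn, on_error):
    """One pass of the FIG. 6 flow; returns False if the read ended abnormally."""
    status, value = read_fn()      # S01: data read process
    if status == ABNORMAL:         # S02: abnormal termination?
        on_error()                 # S05: abnormality handling
        return False
    result = process_fn(value)     # S03: application processing
    write_fn(result)               # S04: data write (its return value is not used)
    return True


written = []
run_application(lambda: (NORMAL, 5), lambda v: v * 2, written.append, lambda: None)
print(written)  # prints [10]
```

A status of "data repair" is treated like "normal" here, on the assumption that the repaired value is usable by the application.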

The above is the flow of processing that the CPU 36 executes in accordance with the application program. The content of the application processing executed in step S03 depends on the type of application program. The data write process, the data read process, and the diagnostic repair process are described below. For convenience, the data write process is described first, because the data read by the data read process is data that the data write process has written to the memory device 35.

<Data Write Process Procedure>
FIG. 7 is a flowchart showing the flow of the data write process. In the present embodiment, the parameters specified when calling the data write function (function name: wsdat), that is, the arguments of wsdat, are a write data name and write data. The following describes the case where the code that calls the data write function is written as “rinf = wsdat(a[2] = 20)”. This means that the integer value 20 is written as the write data to the third element (relative value) of the integer array whose write data name is a. In this code, rinf is the return value of the data write function wsdat (for example, a function value indicating whether the write succeeded), but as shown in FIG. 6, this return value is not used in the present embodiment.

The present embodiment describes an example in which a write data name and write data are specified as parameters when calling the data write function wsdat, but the parameters are not limited to these. For example, when writing data to consecutive write addresses, a relative write start address, the number of bytes of write data, and the write data may be specified as parameters so that the data items are associated with one another and written in a batch.

The CPU 36, operating in accordance with the data write function wsdat and having been passed the write data name and the write data, functions as the memory access control means 14a that executes the data write process. As shown in FIG. 7, the CPU 36 first reads the data management table 12 and the structure definition file 13 and calculates the write start address (step SA01). The write start address can be calculated using, for example, the formula “write start address = data area 1 start address + write data name start address + (relative value - 1) × number of bytes of the data name”, but the calculation is not limited to this.

Next, the CPU 36 writes the parameter data for the number of bytes that the structure definition file 13 associates with the data name (step SA02). For example, for an integer-type data name, the CPU 36 writes 2 bytes of data. Next, the CPU 36 reads the start address of “data area 2” from the data management table 12 and calculates the write start address with reference to the structure definition file 13 (step SA03). The calculation in step SA03 differs from that in step SA01 only in the data area start address. Specifically, the write start address can be calculated using a formula such as “write start address = data area 2 start address + write data name start address + (relative value - 1) × number of bytes of the data name”.

Next, the CPU 36 performs an EOR operation between the write data and, for example, hexadecimal ffff to generate the inverted data (step SA04), and writes the inverted data with the number of bytes of the corresponding data type (step SA05). The write destination is the address calculated in step SA03.

Next, the CPU 36 reads the start address of “data area 3” from the data management table 12 and calculates the write start address with reference to the structure definition file 13 (step SA06). As in step SA03, only the data area start address changes in this calculation. Specifically, the write start address can be calculated using a formula such as “write start address = data area 3 start address + write data name start address + (relative value - 1) × number of bytes of the data name”.
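The address formula used for the three data areas can be checked with a short sketch. The base addresses and the offset of the data name are invented values for illustration only; in the embodiment they would come from the data management table 12 and the structure definition file 13.

```python
def start_address(area_base, name_offset, relative_value, element_bytes):
    """start address = area start + data name start + (relative value - 1) x bytes."""
    return area_base + name_offset + (relative_value - 1) * element_bytes


# Hypothetical table: start addresses of data areas 1, 2 and 3.
AREA_BASES = {1: 0x1000, 2: 0x2000, 3: 0x3000}

# Third element (relative value 3) of a 2-byte integer array located at offset 0x20.
addresses = {area: start_address(base, 0x20, 3, 2) for area, base in AREA_BASES.items()}
print({area: hex(addr) for area, addr in addresses.items()})
# prints {1: '0x1024', 2: '0x2024', 3: '0x3024'}
```

Only the area start address differs between the three results, which is why the same offset locates the positive, inverted, and pattern copies of one datum.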

Next, the CPU 36 performs an EOR operation between the write data and the predetermined pattern (hexadecimal aaaa in the present embodiment) to generate the pattern data (step SA07), and writes the pattern data, with the number of bytes of the corresponding data type, starting at the address calculated in step SA06 (step SA08).
The above is the flow of the data write process.
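Steps SA01 to SA08 can be put together in a sketch that writes into a simulated memory. The memory is a Python bytearray, the table contents are invented, and big-endian byte order is an assumption; the name wsdat_sketch is hypothetical and is not the actual wsdat function.

```python
AREA_BASES = (0x000, 0x100, 0x200)  # hypothetical start addresses of data areas 1, 2, 3
LAYOUT = {"a": (0x10, 2)}           # hypothetical: data name -> (offset, bytes per element)
KEY_BYTES = (0x00, 0xFF, 0xAA)      # per-byte XOR key for data areas 1, 2, 3


def wsdat_sketch(memory, name, relative_value, value):
    """Write value in positive, inverted and pattern form to the three data areas."""
    offset, size = LAYOUT[name]
    mask = (1 << (8 * size)) - 1
    for base, key_byte in zip(AREA_BASES, KEY_BYTES):
        # SA01, SA03, SA06: same formula, different data area start address
        address = base + offset + (relative_value - 1) * size
        # SA04, SA07: XOR with the key (a key of 0 leaves the positive data as is)
        key = int.from_bytes(bytes([key_byte]) * size, "big")
        encoded = (value & mask) ^ key
        # SA02, SA05, SA08: write with the byte count of the data type
        memory[address:address + size] = encoded.to_bytes(size, "big")


memory = bytearray(0x300)
wsdat_sketch(memory, "a", 3, 20)  # corresponds to wsdat(a[2] = 20) in the text
print(memory[0x014:0x016].hex(), memory[0x114:0x116].hex(), memory[0x214:0x216].hex())
# prints 0014 ffeb aabe
```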

<Data Read Process Procedure>
Next, the data read process will be described. FIG. 8 is a flowchart showing the flow of the data read process. In the present embodiment, the parameters specified when calling the data read function (function name: rsdat), that is, the arguments of rsdat, are a read data name and a read register name. The following describes the case where the code that calls the data read function is written as “rinf = rsdat(b[4], x)”. This means that the fifth element (relative value) of the double-length integer array whose read data name is b is read into the register whose read register name is x. In this code, rinf is the return value of the data read function rsdat. The return value is one of three values: “normal” (for example, 0), “abnormal” (for example, -1), or “data repair” (for example, 1). In the present embodiment, the determination in step S02 of FIG. 6 is made on the basis of this return value.

The present embodiment describes an example in which a read data name and a read register name are specified as parameters when calling the data read function rsdat, but the parameters are not limited to these. For example, data at consecutive read addresses may be read in a batch, with the data items associated with one another, by specifying a relative read start address, the number of bytes of read data, and a read data area as parameters.

The CPU 36, operating in accordance with the data read function rsdat and having been passed the read data name and the read register name, functions as the memory access control means 14a that executes the data read process. As shown in FIG. 8, the CPU 36 first reads the data management table 12 and the structure definition file 13 and calculates the read start addresses (step SB01). The read start addresses can be calculated using, for example, the formulas “read start address 1 = data area 1 start address + read data name start address + (relative value - 1) × number of bytes of the data name”, “read start address 2 = data area 2 start address + read data name start address + (relative value - 1) × number of bytes of the data name”, and “read start address 3 = data area 3 start address + read data name start address + (relative value - 1) × number of bytes of the data name”, but the calculation is not limited to these.

The CPU 36 reads the data written in each data area, starting at read start addresses 1, 2, and 3, for the predetermined number of read data bytes (step SB02). The data read in this way are hereinafter called data 1, data 2, and data 3 for convenience. In the present embodiment, data 1 is the positive data read from data area 1, data 2 is the inverted data read from data area 2, and data 3 is the pattern data read from data area 3.

Next, the CPU 36 performs an EOR operation between data 2 and hexadecimal ffff, and an EOR operation between data 3 and the predetermined pattern (hexadecimal aaaa) (step SB03). The former result is hereinafter called data a, and the latter data b. Step SB03 restores the inverted data and the pattern data to the positive data from which each was calculated. If an operation other than the EOR operation is used to generate the inverted data and the pattern data, step SB03 should perform the inverse of that operation.

Next, the CPU 36 collates data 1 with data a and data b to check that they match (step SB04). If the three pieces of data match (step SB05: Yes), the CPU 36 sets data 1 in the register indicated by the read register name, sets the function value to the value indicating “normal” (step SB06), and completes the data read process. If any of the three pieces of data does not match the others (step SB05: No), the CPU 36 executes the diagnostic repair process (that is, processes (3) and (4) in FIG. 4) (step SB07).
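Steps SB01 to SB07 can be sketched against a simulated memory as follows. The area start addresses, the 2-byte big-endian layout, and the (status, value) return convention are assumptions; rsdat_sketch is a hypothetical name, and the diagnostic repair process is passed in as a callback rather than implemented here.

```python
NORMAL = 0
AREA_BASES = (0x000, 0x100, 0x200)  # hypothetical start addresses of data areas 1, 2, 3
KEYS = (0x0000, 0xFFFF, 0xAAAA)     # XOR keys: none, inversion, pattern
SIZE = 2                            # bytes per datum (assumption)


def read_decoded(memory, offset):
    """SB01 to SB03: read the three stored words and undo the XOR on each."""
    decoded = []
    for base, key in zip(AREA_BASES, KEYS):
        address = base + offset
        raw = int.from_bytes(memory[address:address + SIZE], "big")
        decoded.append(raw ^ key)
    return tuple(decoded)  # (data 1, data a, data b)


def rsdat_sketch(memory, offset, diagnose_and_repair):
    """Return (status, value); defer to the callback when the copies disagree."""
    data1, data_a, data_b = read_decoded(memory, offset)
    if data1 == data_a == data_b:                                        # SB04, SB05
        return NORMAL, data1                                             # SB06
    return diagnose_and_repair(memory, offset, (data1, data_a, data_b))  # SB07
```

With a memory holding 0014, ffeb, and aabe at the same offset in the three areas, the call returns the "normal" status with the value 20; flipping any stored bit routes the call to the callback instead.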

<Diagnostic Repair Process Procedure>
FIG. 9 is a flowchart showing the flow of the diagnostic repair process. As shown in FIG. 9, the diagnostic repair process of the present embodiment consists of steps SC01 to SC11. Steps SC01, SC02, and SC11 in FIG. 9 correspond to process (3) in FIG. 4, that is, the process of the specifying means 14b, and steps SC03 to SC10 correspond to process (4) in FIG. 4, that is, the process of the diagnostic repair means 14c. As shown in FIG. 9, the CPU 36 determines whether any two of data 1, data a, and data b match (step SC01). If the determination result in step SC01 is No, that is, if data 1 and data a do not match, data 1 and data b do not match, and data a and data b do not match either, the CPU 36 sets the function value to the value indicating “abnormal” (a value indicating the occurrence of a hard error) (step SC11) and completes the diagnostic repair process. If the determination result in step SC01 is Yes, that is, if any two of data 1, data a, and data b match, the CPU 36 executes step SC02.

In step SC02, executed when the determination result in step SC01 is Yes, the CPU 36 specifies as the diagnosis target area the data area that stored whichever of data 1, data a, and data b did not match the others, and stores the address of that diagnosis target area. It also specifies one of the two matching pieces of data as the repair data and stores it. In other words, in the present embodiment, the value held by the majority of data 1, data a, and data b becomes the repair data, and the storage location of the minority data becomes the diagnosis target area.

When the diagnosis target area has been specified in this way, the CPU 36 determines whether the error in the diagnosis target area is a soft error and, if so, repairs it. More specifically, the CPU 36 first writes first test data (hexadecimal 5555 in the present embodiment) to the diagnosis target area (step SC03). When writing data to the diagnosis target area in step SC03, the CPU 36 bypasses the cache. Next, the CPU 36 reads data from the diagnosis target area (step SC04) and determines whether the data read matches the first test data as it was before writing (step SC05). If the determination result in step SC05 is No, that is, if the data read from the diagnosis target area does not match the first test data, the CPU 36 executes step SC11 described above. If the determination result in step SC05 is Yes, the CPU 36 writes second test data (hexadecimal aaaa in the present embodiment) to the diagnosis target area (step SC06). The CPU 36 also bypasses the cache for the write in step SC06.

Next, the CPU 36 reads data from the diagnosis target area (step SC07) and determines whether the data read matches the second test data as it was before writing (step SC08). If the determination result in step SC08 is No, the CPU 36 executes step SC11 described above. If the determination result in step SC08 is Yes, the CPU 36 determines that the data error in the diagnosis target area is a soft error, repairs it by writing the repair data to the diagnosis target area (step SC09), sets the function value to the value indicating “data repair” (step SC10), and ends the diagnostic repair process.
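Steps SC01 to SC11 can be sketched end to end with a small memory model. The Cell class, which simulates a location whose bits may be stuck at 0, is an invention for this illustration, as are the function name and the (status, value) return convention. The sketch also assumes that the repair data is written back in the stored form of the target area (re-applying that area's XOR key), which the text does not state explicitly.

```python
REPAIRED, ABNORMAL = 1, -1
KEYS = (0x0000, 0xFFFF, 0xAAAA)   # XOR keys of data areas 1, 2, 3
TEST_PATTERNS = (0x5555, 0xAAAA)  # first and second test data


class Cell:
    """One 16-bit location; bits set in stuck_low always read 0 (simulated hard fault)."""

    def __init__(self, raw, stuck_low=0):
        self.stuck_low = stuck_low
        self.write(raw)

    def write(self, raw):
        self.raw = raw & ~self.stuck_low & 0xFFFF

    def read(self):
        return self.raw


def diagnose_and_repair(cells):
    """cells holds the stored words of data areas 1, 2, 3 for one datum."""
    data1, data_a, data_b = (cell.read() ^ key for cell, key in zip(cells, KEYS))
    # SC01, SC02: find two matching values; the odd one out marks the target area
    if data_a == data_b:
        target, repair = 0, data_a
    elif data1 == data_b:
        target, repair = 1, data1
    elif data1 == data_a:
        target, repair = 2, data1
    else:
        return ABNORMAL, None                   # SC11: no two values agree
    for pattern in TEST_PATTERNS:               # SC03 to SC08
        cells[target].write(pattern)
        if cells[target].read() != pattern:
            return ABNORMAL, None               # SC11: read-back failed, hard error
    cells[target].write(repair ^ KEYS[target])  # SC09: rewrite in the area's stored form
    return REPAIRED, repair                     # SC10


cells = [Cell(0x0014), Cell(0xFFEB ^ 0x0100), Cell(0xAABE)]  # one flipped bit in area 2
print(diagnose_and_repair(cells), hex(cells[1].read()))      # prints (1, 20) 0xffeb
```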

The second test data is the first test data with its constituent bits inverted. The present embodiment diagnoses the diagnosis target area with two kinds of test data, the first and the second, because diagnosing it with only the first test data or only the second test data could fail to detect a hard error. For example, if a hard error that leaves the most significant bit stuck at 0 has occurred in the diagnosis target area and the area is diagnosed using only the first test data (hexadecimal 5555, whose most significant bit is 0), this hard error goes undetected. Diagnosing the area with two kinds of test data whose constituent bits are inverses of each other avoids such missed detections. Although the present embodiment uses hexadecimal 5555 and hexadecimal aaaa as the first and second test data, hexadecimal 0000 and hexadecimal ffff may be used instead. What matters is that the diagnosis target area is diagnosed with two kinds of test data whose constituent bits are inverses of each other. The same effect can also be obtained with three mutually different kinds of test data, such as hexadecimal 00aa, hexadecimal ffaa, and hexadecimal bb55.
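The reasoning about test data can be checked with a small simulation of stuck bits. The fault model (individual bits stuck at 0 or stuck at 1 in a 16-bit word) and the function names are assumptions made for illustration.

```python
def read_back(written, stuck_low=0, stuck_high=0):
    """Value read from a 16-bit location whose marked bits are stuck at 0 or at 1."""
    return ((written & ~stuck_low) | stuck_high) & 0xFFFF


def fault_detected(patterns, stuck_low=0, stuck_high=0):
    """True if at least one test pattern reads back differently from what was written."""
    return any(read_back(p, stuck_low, stuck_high) != p for p in patterns)


MSB = 0x8000
print(fault_detected([0x5555], stuck_low=MSB))          # prints False
print(fault_detected([0x5555, 0xAAAA], stuck_low=MSB))  # prints True
```

A bit stuck at 0 is invisible to a pattern that already has 0 in that position, so every bit must be written as both 0 and 1 across the set of patterns. The three-pattern set 00aa, ffaa, bb55 also satisfies this: the upper byte takes 00 and ff, and the lower byte takes aa and 55.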

The point to note here is that, according to this embodiment, no separate verification hardware is required in addition to the hardware (that is, the CPU 36) that reads data from the memory device 35, performs application processing, and writes data to the memory device 35. Because no separate verification hardware is needed, this embodiment does not increase hardware cost, unlike the technique disclosed in Patent Document 3. In other words, this embodiment makes it possible to diagnose and repair soft errors in a memory that is read and written in real time, at the time data is read from that memory, without increasing hardware cost.

This embodiment also has the advantage that the processing load on the CPU 36 is smaller than when ECC is used. Thus, the diagnostic repair device 10 of this embodiment places only a small processing load on the CPU 36 and can diagnose and repair soft errors in a memory that is read and written in real time without increasing hardware cost. The diagnostic repair device 10 of this embodiment is therefore suitable for electronic equipment that is required to diagnose and repair soft errors in real time with few hardware resources (that is, with a small load on small-scale, low-cost hardware). A specific example of such electronic equipment is a VCU (Vehicle Control Unit), which is an automobile controller.

An embodiment of the present invention has been described above; the following modifications may be made to this embodiment.
(1) In the above embodiment, soft errors are repaired using one each of positive data, inverted data, and pattern data, that is, three data in total, with the data that forms the majority among them serving as the repair data. However, when the number of data forming the majority is less than a predetermined threshold, the majority data may itself be regarded as unreliable, and an error notification different from the hard-error notification may be issued instead of repairing the data. The threshold need only be no greater than the total number of positive data, inverted data, and exclusive-OR data and greater than half the number of positive data, inverted data, and pattern data; it need not be an integer.

(2) In the above embodiment, the presence or absence of a soft error is diagnosed using one each of positive data, inverted data, and pattern data, that is, three data in total. However, the diagnosis may instead use five data in total (two positive data, two inverted data, and one pattern data), or an odd number of data of seven or more. Furthermore, if the threshold-based determination described above is also used, the total number of positive data, inverted data, and pattern data may be even. For example, when six data are used, the threshold may be set to 3.1, 4, or 5.
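
The threshold-based determination of modifications (1) and (2) can be sketched as follows. This is an illustrative sketch only: the function name `vote`, the status strings, and the assumption that all data have already been converted to a directly comparable form are not taken from the embodiment.

```python
from collections import Counter


def vote(values, threshold):
    """Majority vote with a reliability threshold.

    Returns (status, value). `threshold` is assumed to lie above half of
    len(values) and at most len(values); it does not have to be an integer.
    """
    winner, count = Counter(values).most_common(1)[0]
    if count == len(values):
        return ("no_error", winner)   # all data agree, nothing to repair
    if count < threshold:
        return ("unreliable", None)   # second error notification, no repair
    return ("repair", winner)         # the majority value becomes the repair data


# Three data, threshold 2: a 2-to-1 split is trusted.
three = vote([0x1234, 0x1234, 0x1230], 2)

# Six data, threshold 3.1: a 3-to-3 tie falls below the threshold.
tie = vote([5, 5, 5, 1, 1, 1], 3.1)
```

With an even number of data, any threshold above half the total rejects a tie, which is why the even case depends on the threshold check.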

(3) In the above embodiment, the memory access control means 14a, the specifying means 14b, and the diagnostic repair means 14c are each implemented as software modules, but any one or more of these means may be implemented in hardware such as an electronic circuit. Also, in the above embodiment the memory 11 subject to soft-error diagnosis and repair is included in the diagnostic repair device 10, but the memory 11 may be provided outside the diagnostic repair device 10. In addition, the memory access control means 14a of the above embodiment executes both a data write process that writes data to the memory under diagnosis and a data read process that reads data from that memory, but the data write process may be performed by another device.

In short, the diagnostic repair device of the present invention need only have: memory access control means that reads data designated as a read target from a memory and also reads from the memory at least two data stored in the memory in association with that data; specifying means that, when the at least three data read by the memory access control means are not all identical, specifies the data forming the majority as repair data and specifies the storage area in the memory of the data in the minority as a diagnosis target area; and diagnostic repair means that writes test data to the diagnosis target area, then reads data from the diagnosis target area, and repairs the diagnosis target area using the repair data when the read data matches the test data as it was before writing, or issues a hard-error notification when they do not match.
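
The cooperation of the three means summarized above can be sketched end to end in Python. Everything named here is hypothetical: the `Memory` class with its stuck-at-0 fault map, the function `read_with_repair`, and the event strings are assumptions for illustration. The sketch also assumes that the associated data are stored as directly comparable copies, leaving out the positive, inverted, and pattern encodings of the embodiment.

```python
from collections import Counter


class Memory:
    """Hypothetical word-addressed memory; `stuck0` maps address -> stuck-at-0 mask."""

    def __init__(self, size, stuck0=None):
        self.cells = [0] * size
        self.stuck0 = stuck0 or {}

    def write(self, addr, data):
        self.cells[addr] = data & 0xFFFF

    def read(self, addr):
        return self.cells[addr] & ~self.stuck0.get(addr, 0) & 0xFFFF


def passes_test(mem, addr, patterns):
    """Write each pattern to `addr` and check that it reads back unchanged."""
    for pattern in patterns:
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            return False
    return True


def read_with_repair(mem, addrs, patterns=(0x5555, 0xAAAA)):
    """Return (value, events) for the data stored redundantly at `addrs`."""
    values = [mem.read(a) for a in addrs]              # memory access control
    winner, count = Counter(values).most_common(1)[0]  # specifying: majority
    if count * 2 <= len(values):
        return None, [(None, "no_majority")]           # assumed handling
    events = []
    for addr, value in zip(addrs, values):
        if value == winner:
            continue
        # This address is a diagnosis target area: test it, then repair or report.
        if passes_test(mem, addr, patterns):
            mem.write(addr, winner)
            events.append((addr, "repaired"))
        else:
            events.append((addr, "hard_error"))
    return winner, events
```

A flipped bit in one copy yields a `repaired` event and restores that copy, whereas a copy sitting on a stuck bit yields a `hard_error` event and is left unrepaired.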

DESCRIPTION OF SYMBOLS: 10 ... diagnostic repair device, 11 ... memory, 12 ... data management table, 13 ... structure definition file, 14 ... program execution unit, 14a ... memory access control means, 14b ... specifying means, 14c ... diagnostic repair means, 31 ... input device, 32 ... output device, 33 ... drive device, 34 ... auxiliary storage device, 35 ... memory device, 36 ... CPU, 37 ... network connection device, 38 ... recording medium, B ... system bus.

Claims

1. A diagnostic repair device comprising:
memory access control means for reading data designated as a read target from a memory and for reading from the memory at least two data stored in the memory in association with said data;
specifying means for, when the at least three data read by the memory access control means are not all identical, specifying the data forming the majority as repair data and specifying the storage area in the memory of the data in the minority as a diagnosis target area; and
diagnostic repair means for writing test data to the diagnosis target area, then reading data from the diagnosis target area, and repairing the diagnosis target area using the repair data when the read data matches the test data before writing, or issuing a hard-error notification when they do not match,
wherein the specifying means issues a second error notification different from said error notification when the number of data forming the majority among the at least three data read by the memory access control means is less than a predetermined threshold.

2. The diagnostic repair device according to claim 1, wherein the diagnostic repair means performs, for each of a plurality of mutually different test data, the writing to the diagnosis target area, the reading from the diagnosis target area, and the comparison with the test data before writing.

3. The diagnostic repair device according to claim 2, wherein the diagnostic repair means uses two test data whose constituent bits are inverted relative to each other.