JP4131928B2

JP4131928B2 - Data storage control method and apparatus

Info

Publication number: JP4131928B2
Application number: JP2002333479A
Authority: JP
Inventors: 健二久保; 英樹遲野井; 輝幸安永
Original assignee: Hitachi Ltd; Hitachi Information and Control Solutions Ltd
Current assignee: Hitachi Ltd; Hitachi Information and Control Solutions Ltd
Priority date: 2002-11-18
Filing date: 2002-11-18
Publication date: 2008-08-13
Anticipated expiration: 2022-11-18
Also published as: JP2004171106A

Description

【０００１】
【発明の属する技術分野】
本発明は、メモリ内に記憶されたデータを読み出すデータ記憶制御方法および装置に係り、特に、メモリに対するメモリ１ビットエラー訂正内容の書き戻しに好適にデータ記憶制御方法および装置に関する。
【０００２】
【従来の技術】
従来、データ記憶制御装置としては、例えば、特開平１０−８３３５７号公報に記載されているように、メモリ内で発生した一過性のメモリ１ビットエラーに対してメモリコントローラが該メモリ領域からデータの読み出し動作を行った際に、ＥＣＣ（Error Checking&Correction）によりエラーを検出し、その検出したエラーに対して訂正可能であれば訂正を行ったうえでデータを出力するものが知られている。
【特許文献１】
特開平１０−８３３５７号公報
【０００３】
【発明が解決しようとする課題】
しかしながら、特開平１０−８３３５７号公報に記載されたものは、メモリの信頼性向上のためのものであり、一般的なメモリコントローラではメモリ１ビットエラー検出時はエラー訂正したデータが読み出されるが、エラーが検出された部位に対しエラー訂正されたデータを書き戻すようにはなっていないため、さらに１ビットエラーが発生し、複数ビットのエラーとなると、訂正不可能になる可能性がある。
【０００４】
それに対して、エラーを検出し訂正されたデータを出力するだけなく、エラーが検出された部位に対しエラー訂正されたデータを書き戻す機能を持ったメモリコントローラも知られているが、かかる機能を有するメモリコントローラは、内部にその処理回路を内蔵しているために高価であるとともに、一般的ではないために種類が限定されてしまい装置全体として捕らえたときの選択性が乏しくなる。
【０００５】
一方、メモリコントローラではなく、ＣＰＵによりエラー訂正されたデータを書き戻す処理を行う方法も考えられる。しかし、ＤＭＡ装置を用いている場合には、ＣＰＵがエラー発生を検出してエラー訂正されたデータをメモリに書き戻そうとしたメモリ領域がＤＭＡ使用中であると、ＣＰＵが書き戻そうとするアドレスにはＤＭＡ転送によりエラーが検出された時と違う値が入っているために、この状態にてＣＰＵがエラー訂正内容を書き戻すとデータ不一致となる。従ってＤＭＡ転送を使用するＤＭＡ装置がメモリコントローラに接続されているような装置においては、ＣＰＵによりエラー訂正されたデータを書き戻す処理を行うことはできないという問題があった。
本発明の目的は、ＤＭＡ使用メモリ領域においても、ＣＰＵによるメモリ１ビットエラー訂正内容のメモリへの書き戻しが可能なデータ記憶制御方法および装置を提供することにある。
【０００６】
【課題を解決するための手段】
（１）上記目的を達成するために、本発明は、ＣＰＵと、メモリと、このメモリを制御しＥＣＣレジスタ部を持つメモリコントローラと、ＤＭＡ装置とからなる計算機システムのデータ記憶制御方法において、上記メモリは、複数のエラー情報部からなるエラーテーブルを備え、上記複数のエラー情報部の内、個々のエラー情報部には、１ビットエラーが発生したアドレスを示す１ビットエラー発生アドレスと、１ビットエラーが発生した場合、読み出されたデータに対して、ＥＣＣによって訂正されたデータである１ビットエラー発生データリード値とが記憶され、上記ＣＰＵは、ＤＭＡ使用中でない場合に上記エラー情報部の上記１ビットエラー発生アドレスに記憶された上記メモリのエラーアドレスに対しエラー訂正されたデータを書き戻すようにしたものである。
かかる方法により、ＤＭＡ使用メモリ領域に対しても、ＣＰＵによるメモリ１ビットエラー訂正内容のメモリへの書き戻しが可能となる。
【０００７】
【課題を解決するための手段】
（２）上記（１）において、好ましくは、上記ＣＰＵは、上記エラーアドレスから読み出されたデータと前記エラーテーブルから読み出された１ビットエラー発生データリード値が異なる場合には書き戻しを行わないようにしたものである。
【０００８】
（３）上記（１）において、好ましくは、上記ＣＰＵは、ＤＭＡ使用中である場合には上記メモリのエラーアドレスに対するデータ書き戻し処理を次回に繰り越すようにしたものである。
【０００９】
（４）上記目的を達成するために、本発明は、ＣＰＵと、メモリと、このメモリを制御しＥＣＣレジスタ部を持つメモリコントローラと、ＤＭＡ装置とからなる計算機システムのデータ記憶制御装置において、上記メモリコントローラは、ＤＭＡ使用情報を保持するＤＭＡ情報部を備え、上記メモリは、複数のエラー情報部からなるエラーテーブルを備え、上記複数のエラー情報部の内、個々のエラー情報部には、１ビットエラーが発生したアドレスを示す１ビットエラー発生アドレスと、１ビットエラーが発生した場合、読み出されたデータに対して、ＥＣＣによって訂正されたデータである１ビットエラー発生データリード値とが記憶され、上記ＣＰＵは、このＤＭＡ情報部に保持されたＤＭＡ使用情報を参照して、ＤＭＡ使用中でない場合に上記エラー情報部の上記１ビットエラー発生アドレスに記憶された上記メモリのエラーアドレスに対しエラー訂正されたデータを書き戻す書き戻し制御手段を備えるようにしたものである。
かかる方法により、ＤＭＡ使用メモリ領域に対しても、ＣＰＵによるメモリ１ビットエラー訂正内容のメモリへの書き戻しが可能となる。
【００１０】
【発明の実施の形態】
以下、図１〜図３を用いて、本発明の一実施形態によるデータ記憶制御装置の構成及び動作について説明する。
図１は、本発明の一実施形態によるデータ記憶制御装置の全体構成を示すブロック図である。図２は、本発明の一実施形態によるデータ記憶制御装置に用いるエラーテーブルの構成を示すブロック図である。図３は、本発明の一実施形態によるデータ記憶制御装置による１ビットエラー書き戻し処理の内容を示すフローチャートである。
【００１１】
本実施形態によるデータ記憶制御装置は、ＣＰＵ１０と、ＤＭＡ装置２０と、メモリ３０と、メモリコントローラ４０とから構成されている。ＣＰＵ１０とメモリコントローラ４０とは、ＣＰＵバスＢｕ１により接続されている。ＤＭＡ装置２０とメモリコントローラ４０とは、Ｉ／ＯバスＢｕ２により接続されている。ＤＭＡ装置２０とメモリ３０とは、メモリバスＢｕ３により接続されている。
【００１２】
ＣＰＵ１０は、メモリ１ビットエラー訂正内容の書き戻しを制御する書き戻し制御手段１２を備えている。書き戻し制御手段１２の動作については、図３を用いて後述する。
【００１３】
メモリコントローラ４０は、ＥＣＣレジスタ部４２と、ＤＭＡ情報部４４を備えている。メモリコントローラ４０がメモリ３０からデータを読み出した際、ＥＣＣ（Error Checking&Correction）によりエラーを検出し、その検出したエラーに対して訂正可能であれば訂正を行ったうえでデータを出力する。このとき、メモリ１ビットエラーが発生すると、エラーが発生したこと及びエラーが発生したメモリのアドレスがＥＣＣレジスタ部４２に設定される。ＥＣＣレジスタ部４２は、エラー発生情報部４２Ａと、エラーアドレス部４２Ｂとを備えている。メモリ１ビットエラーが発生すると、メモリコントローラ４０は、ＥＣＣレジスタ部４２のエラー発生情報部４２Ａに、エラーが発生したことを示すフラグをセットし、また、エラーアドレス部４２Ｂに、エラーが発生したメモリアドレスをセットする。
【００１４】
ＤＭＡ情報部４４には、ＤＭＡ装置２０によるメモリ３０に対するＤＭＡ使用の情報がセットされる。ＤＭＡ情報部４４は、ＤＭＡ対象アドレス部４４Ａと、ＤＭＡ使用情報部４４Ｂとを備えている。ＤＭＡ使用情報部４４Ｂには、ＤＭＡ装置２０がＤＭＡを使用中であることを示すフラグがセットされ、また、ＤＭＡ使用中には、そのＤＭＡの対象となっているアドレスが、ＤＭＡ対象アドレス部４４Ａにセットされる。なお、ＤＭＡ情報部４４は、メモリコントローラ４０内にある必然性はなく、ハードウェアとして実現してもソフトウェアとして実現してもよいものである。
【００１５】
メモリ３０は、エラーテーブル３２を備えている。図２に示すように、エラーテーブル３２は、ｎ個のエラー情報部３２-1，３２-2，…，３２-nと、インデックス部３２ｘを備えている。エラー情報部３２-1には、図示するように、１ビットエラーが発生したアドレスを示す１ビットエラー発生アドレスと、１ビットエラーが発生した場合、読み出されたデータに対して、ＥＣＣによって訂正されたデータである１ビットエラー発生データリード値とが記憶される。他のエラー情報部３２-2，…，３２-nも同様に構成されている。インデックス部３２ｘには、エラー情報部３２-1〜３２-nの内、エラー情報が登録されている数に関連する数が記憶される。例えば、２つのエラー情報が記憶されている場合、２個のエラー情報部３２-1，３２-2にはエラー情報が記憶されており、３番目のエラー情報部３２-3は空である。この場合、インデックス部３２ｘには、数値「３」がセットされている。すなわち、インデックス部３２ｘにセットされている数値は、「発生しているエラーの数＋１」であり、また、空のエラー情報部の先頭の数値である。
【００１６】
ＣＰＵ１０およびＤＭＡ装置２０からメモリ３０への読み書きは、いずれの場合もメモリコントローラ４０を介して行われる。また、ＣＰＵ１０およびＤＭＡ装置２０は、メモリバスＢｕ３が未使用状態であれば、お互いの状態に依存することなく、メモリ３０とのデータの読み書きが可能である。
【００１７】
本実施形態では、書き戻し制御手段１２及びエラーテーブル３２を設けたことに特徴がある。ＣＰＵ１０は、定周期にて書き戻し制御手段１２を起動することにより、書き戻し制御手段１２は、メモリコントローラ４０に対しエラーパトロール処理、すなわちＥＣＣレジスタ部４２のエラー発生情報４２Ａを監視しており、ここでメモリ１ビットエラー発生を検出するとエラー発生箇所に対してエラー訂正内容のメモリへの書き戻し処理を行うことになる。
【００１８】
ここで、図３を用いて、書き戻し制御手段１２による書き戻し処理の内容について説明する。
ステップＳ１において、ＣＰＵ１０は、定周期にて書き戻し制御手段１２を起動することにより、書き戻し制御手段１２は、メモリコントローラ４０に対しエラーパトロールを開始する。
【００１９】
次に、ステップＳ２において、書き戻し制御手段１２は、ＥＣＣレジスタ部４２のエラー発生情報部４２Ａをチェックして、エラーフラグがセットされているか否かにより、メモリ１ビットエラーが発生したか否かを検出する。エラーが検出されると、ステップＳ３に進み、エラーアドレス部４２Ｂを参照して得たメモリ内のエラー発生アドレスを、図２に示したエラーテーブル３２のエラー情報部３２-1の１ビットエラー発生アドレスに記憶し、また、そのアドレスをデータリードした値を１ビットエラー発生データリード値に記憶する。ここで、データリードした値とは、ＥＣＣによりエラー訂正されたデータであり、このエラー訂正されたデータ値が記憶される。なお、エラーテーブル３２のエラー情報部の記憶する場所は、インデックス部３２ｘを参照して、インデックス値が「１」であれば、エラー情報部３２-1に記憶し、インデックス値が「２」であれば、エラー情報部３２-2に記憶する。その後、ステップＳ４に進む。また、ステップＳ２において、エラーが検出されない場合には、ステップＳ３をパスして、ステップＳ４に進む。
【００２０】
次に、ステップＳ４において、書き戻し制御手段１２は、インデックス部３２ｘを参照して、エラーテーブル３２が空であるか否かを判断する。インデックス値が「１」であれば、エラーテーブル３２は空であると判断できる。エラーテーブル３２が空の場合は、書き戻すべき１ビットエラーデータがないため、ステップＳ１２において処理を終了する。
【００２１】
エラーテーブル３２が空でない場合には、ステップＳ５において、書き戻し制御手段１２は、エラーテーブルインデックスが最後か否かを判定する。エラーテーブルインデックスは、インデックス部３２ｘにセットされたインデックス値とは異なり、図２に示す処理の開始時には、「１」に初期化されている。エラーテーブルインデックスは、後述するステップＳ１１の処理により順次「１」づつインクリメントされる値である。エラーテーブルインデックスとインデックス部３２ｘにセットされたインデックス値が等しくなると、ステップＳ１２において処理を終了する。
【００２２】
エラーテーブルインデックスが最後でない場合には、ステップＳ６において、書き戻し制御手段１２は、エラーテーブル３２より、過去に発生したアドレスをリードし、リード値を比較する。例えば、書き戻し制御手段１２は、エラーテーブル３２のエラー情報部３２-1から１ビットエラー発生アドレスを読み出す。そして、メモリ３０から、この読み出されたアドレスのデータを読み出す。なお、データ読み出し時には、ＥＣＣによりエラー訂正されたリード値となる。また、書き戻し制御手段１２は、エラーテーブル３２のエラー情報部３２-1から１ビットエラー発生データリード値を読み出す。そして、メモリ３０から読み出されてＥＣＣによりエラー訂正されたリード値と、エラー情報部３２-1から読み出された１ビットエラー発生データリード値を比較する。
【００２３】
ここで、２つのリードデータの値が異なる場合は、メモリ１ビットエラー検出後にエラー発生アドレスに対してメモリ書き込み(上書き)が実施されたこととなるので、書き戻しは行う必要がなくなり、ステップＳ１３において、エラーテーブル３２からエントリー，すなわち、エラー情報部３２-1の内容を削除する。
一方、ステップＳ６の判定において、エラーテーブル１１に記録されているデータリード値とリードされたデータが同じであれば、メモリエラー状態が継続していると判定できるので、ステップＳ７以降において、訂正内容の書き戻しを実施することになる。
ステップＳ７において、書き戻し制御手段１２は、ＤＭＡ対象アドレス４４Ａを参照して、メモリエラーが検出されたアドレスがＤＭＡ対象領域となっているかどうかを判定する。ＤＭＡ対象領域となっていなければＣＰＵ１０により訂正内容を書き戻しても問題はないため、ステップＳ９に移行して、エラー発生アドレスに対してリード／ライトを実行し、エラー訂正内容を書き戻す。
一方、ステップＳ７の判定において、エラー発生アドレスがＤＭＡ対象領域となっていた場合は、さらにステップＳ８において、書き戻し制御手段１２は、ＤＭＡ使用情報４４Ｂを参照して、該当ＤＭＡ転送が使用中であるかどうかを判定する。ＤＭＡ転送が未使用中(ＤＭＡ転送終了済み)であれば、ステップＳ９において、エラー発生アドレスに対してリード／ライトを実行しエラー訂正内容を書き戻す。一方、ＤＭＡ転送が使用中であれば、書き戻しを実施せずに、ステップＳ１１に進み、次回の定周期パトロール時に書き戻し処理を繰越す。
【００２４】
ステップＳ８において、リード／ライトを実行すると、次に、ステップＳ１０において、エラーテーブル３２からエントリー，すなわち、エラー情報部３２-1の内容を削除する。
【００２５】
次に、ステップＳ１１において、エラーテーブルインデックスを１だけインクリメントする。そして、ステップＳ５に戻り、インクリメントされたエラーテーブルインデックスについて、ステップＳ５〜Ｓ１１の処理を実行する。
【００２６】
以上説明したように、メモリ１ビットエラー検出時にそのエラー訂正内容をメモリに書き戻す機能を有しない一般的なメモリコントローラにおいて、該メモリコントローラにＤＭＡ転送を行うＤＭＡ装置が接続されていたとしても、ＣＰＵによりメモリ内のＤＭＡ使用領域を意識することなく、メモリ１ビットエラー検出箇所にエラー訂正内容を書き戻すことにより、訂正不可能なエラーの発生を抑えることができるため、メモリに関する信頼性が向上する。またソフトウェアによって処理するので、コストを最小限に抑えることが可能となる。
【００２７】
ＤＭＡ使用中には、書き戻し処理を次回のパトロール時に繰り越すことにより、ＤＭＡ装置を用いた場合でも誤った書き戻しをすることなく、エラーの発生を抑えることができる。
【００２８】
さらに、次回のパトロール時に再度実行することにより、確実にエラーの発生を抑えることができる。
【００２９】
なお、以上の説明では、ステップＳ１〜Ｓ１３の処理は、全て、ＣＰＵ１０によって定周期パトロールとして実行するものとしているが、例えば、ＥＣＣ時にエラーが発生したことによってＣＰＵ１０に割り込みをかけ、ＣＰＵ１０がこのエラー発生割り込みがあった場合に、ステップＳ３〜Ｓ１３の処理を実行するようにしてもよいものである。なお、ステップＳ８の処理のようにＤＭＡ転送中は次回のパトロール時に繰り越すようにしているので、ステップＳ４〜Ｓ１３の処理自体は、定周期パトロールとして実行する。
【００３０】
【発明の効果】
本発明によれば、ＤＭＡ使用メモリ領域においても、ＣＰＵによるメモリ１ビットエラー訂正内容のメモリへの書き戻しが可能となる。
【図面の簡単な説明】
【図１】本発明の一実施形態によるデータ記憶制御装置の全体構成を示すブロック図である。
【図２】本発明の一実施形態によるデータ記憶制御装置に用いるエラーテーブルの構成を示すブロック図である。
【図３】本発明の一実施形態によるデータ記憶制御装置による１ビットエラー書き戻し処理の内容を示すフローチャートである。
【符号の説明】
１０…ＣＰＵ
１２…書き戻し制御手段
２０…ＤＭＡ装置
３０…メモリ
３２…エラーテーブル
４０…メモリコントローラ
４２…ＥＣＣレジスタ部
４４…ＤＭＡ情報部
Ｂｕ１…ＣＰＵバス
Ｂｕ２…Ｉ／Ｏバス
Ｂｕ３…メモリバス
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a data storage control method and apparatus for reading data stored in a memory, and more particularly to a data storage control method and apparatus suitable for writing the corrected contents of a 1-bit memory error back to the memory.
[0002]
[Prior art]
Conventionally, as described for example in Japanese Patent Laid-Open No. 10-83357, a data storage control device is known in which, when the memory controller reads data from a memory area where a transient 1-bit memory error has occurred, the error is detected by ECC (Error Checking & Correction) and, if the detected error is correctable, the data is corrected before being output.
[Patent Document 1]
Japanese Patent Laid-Open No. 10-83357
[0003]
[Problems to be Solved by the Invention]
However, the technique described in Japanese Patent Laid-Open No. 10-83357 is intended to improve memory reliability. In a general memory controller, error-corrected data is read out when a 1-bit memory error is detected, but the corrected data is not written back to the location where the error was detected. If a further 1-bit error then occurs at that location, the result is a multi-bit error that may be uncorrectable.
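The risk of an uncorrected location accumulating a second error can be illustrated with a toy code. The following is a minimal Python sketch that uses a Hamming(7,4) single-error-correcting code as a stand-in; the source does not say which ECC the memory controller uses, and all function names here are hypothetical. One flipped bit is corrected on read, but if the stored word is never rewritten and a second bit flips, the decoder returns wrong data.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based codeword positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based codeword positions that carry parity bits


def syndrome(code):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s


def encode(nibble):
    """Encode 4 data bits as a Hamming(7,4) codeword (index 0 is unused)."""
    code = [0] * 8
    for shift, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> shift) & 1
    s = syndrome(code)
    for pos in PARITY_POSITIONS:
        if s & pos:
            code[pos] = 1
    return code


def decode(code):
    """Correct at most one flipped bit, then extract the 4 data bits."""
    code = list(code)
    s = syndrome(code)
    if s:
        code[s] ^= 1  # the syndrome names the position assumed to be faulty
    nibble = 0
    for shift, pos in enumerate(DATA_POSITIONS):
        nibble |= code[pos] << shift
    return nibble


def flip(code, pos):
    """Return a copy of the codeword with one bit inverted (a simulated fault)."""
    out = list(code)
    out[pos] ^= 1
    return out


stored = encode(0b1011)
one_error = flip(stored, 3)       # first transient fault
two_errors = flip(one_error, 6)   # second fault before any write-back

print(decode(one_error) == 0b1011)    # the single error is corrected on read
print(decode(two_errors) == 0b1011)   # the double error is not
```

Writing the corrected value back after the first fault would restore `stored`, so the second fault would again be a correctable single-bit error.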
[0004]
On the other hand, memory controllers are also known that not only detect errors and output corrected data but also write the error-corrected data back to the location where the error was detected. However, a memory controller with this function is expensive because it incorporates the necessary processing circuitry, and because such controllers are not common, the available types are limited, which narrows the choice of components when the apparatus is considered as a whole.
[0005]
Alternatively, the write-back of error-corrected data could be performed by the CPU instead of the memory controller. However, when a DMA device is used, the memory area to which the CPU attempts to write back the corrected data after detecting an error may be in use by DMA. In that case, the DMA transfer has placed a value at that address that differs from the value present when the error was detected, so if the CPU writes back the error-corrected contents in this state, a data mismatch occurs. Consequently, in an apparatus in which a DMA device performing DMA transfers is connected to the memory controller, there has been the problem that write-back of error-corrected data by the CPU cannot be performed.
An object of the present invention is to provide a data storage control method and apparatus that allow the CPU to write the corrected contents of a 1-bit memory error back to the memory even in a memory area used by DMA.
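The data mismatch described in paragraph [0005] can be reproduced in a few lines. This is a minimal Python sketch in which memory is modelled as a dictionary; the address, the values, and the function name are hypothetical and chosen only for illustration.

```python
def naive_cpu_write_back(memory, address, corrected_value):
    """Write back the value captured at error-detection time, with no DMA check."""
    memory[address] = corrected_value


memory = {0x1000: 0xAB}      # hypothetical address and contents
captured = memory[0x1000]    # ECC-corrected value seen when the error was detected

memory[0x1000] = 0xCD        # a DMA transfer then stores new data at the same address
naive_cpu_write_back(memory, 0x1000, captured)

# The stale corrected value has replaced the data delivered by DMA.
dma_data_lost = memory[0x1000] != 0xCD
print(dma_data_lost)
```

The stale value overwrites the newer DMA data, which is the mismatch the invention is designed to avoid.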
[0006]
[Means for Solving the Problems]
(1) To achieve the above object, the present invention provides a data storage control method for a computer system comprising a CPU, a memory, a memory controller that controls the memory and has an ECC register unit, and a DMA device, wherein the memory includes an error table consisting of a plurality of error information sections; each of the error information sections stores a 1-bit error occurrence address, indicating the address at which a 1-bit error occurred, and a 1-bit error occurrence data read value, which is the data read when the 1-bit error occurred as corrected by ECC; and the CPU, when DMA is not in use, writes the error-corrected data back to the error address of the memory stored as the 1-bit error occurrence address in the error information section.
With this method, the CPU can write the corrected contents of a 1-bit memory error back to the memory even for a memory area used by DMA.
[0007]
[Means for Solving the Problems]
(2) In (1) above, preferably, the CPU does not perform the write-back when the data read from the error address differs from the 1-bit error occurrence data read value read from the error table.
[0008]
(3) In (1) above, preferably, when DMA is in use, the CPU carries the data write-back processing for the error address of the memory over to the next time.
[0009]
(4) To achieve the above object, the present invention provides a data storage control apparatus for a computer system comprising a CPU, a memory, a memory controller that controls the memory and has an ECC register unit, and a DMA device, wherein the memory controller includes a DMA information section that holds DMA usage information; the memory includes an error table consisting of a plurality of error information sections; each of the error information sections stores a 1-bit error occurrence address, indicating the address at which a 1-bit error occurred, and a 1-bit error occurrence data read value, which is the data read when the 1-bit error occurred as corrected by ECC; and the CPU includes write-back control means that refers to the DMA usage information held in the DMA information section and, when DMA is not in use, writes the error-corrected data back to the error address of the memory stored as the 1-bit error occurrence address in the error information section.
With this configuration, the CPU can write the corrected contents of a 1-bit memory error back to the memory even for a memory area used by DMA.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
The configuration and operation of the data storage control device according to an embodiment of the present invention will be described below with reference to FIGS. 1 to 3.
FIG. 1 is a block diagram showing the overall configuration of a data storage control device according to an embodiment of the present invention. FIG. 2 is a block diagram showing a configuration of an error table used in the data storage control device according to the embodiment of the present invention. FIG. 3 is a flowchart showing the contents of 1-bit error write-back processing by the data storage control device according to the embodiment of the present invention.
[0011]
The data storage control device according to the present embodiment includes a CPU 10, a DMA device 20, a memory 30, and a memory controller 40. The CPU 10 and the memory controller 40 are connected by a CPU bus Bu1. The DMA device 20 and the memory controller 40 are connected by an I/O bus Bu2. The DMA device 20 and the memory 30 are connected by a memory bus Bu3.
[0012]
The CPU 10 includes write-back control means 12 that controls the write-back of the corrected contents of a 1-bit memory error. The operation of the write-back control means 12 will be described later with reference to FIG. 3.
[0013]
The memory controller 40 includes an ECC register unit 42 and a DMA information unit 44. When the memory controller 40 reads data from the memory 30, it detects errors by ECC (Error Checking & Correction) and, if a detected error is correctable, corrects it before outputting the data. When a 1-bit memory error occurs, the fact that an error occurred and the memory address at which it occurred are set in the ECC register unit 42. The ECC register unit 42 includes an error occurrence information section 42A and an error address section 42B. When a 1-bit memory error occurs, the memory controller 40 sets a flag indicating that an error has occurred in the error occurrence information section 42A and sets the memory address at which the error occurred in the error address section 42B.
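The latching behaviour of the ECC register unit 42 can be sketched as follows. This is a minimal Python model under stated assumptions: the class names, the `inject_fault` helper, and the dictionary-backed memory are hypothetical, and the ECC itself is not simulated; a faulty address is simply marked so that a read of it sets the flag (42A) and the address (42B) while still returning corrected data.

```python
from dataclasses import dataclass


@dataclass
class EccRegisters:
    error_flag: bool = False   # models the error occurrence information section 42A
    error_address: int = 0     # models the error address section 42B


class MemoryController:
    def __init__(self, cells):
        self.cells = dict(cells)   # address -> correct data value
        self.faulty = set()        # addresses holding a simulated 1-bit fault
        self.ecc = EccRegisters()

    def inject_fault(self, address):
        """Hypothetical test hook: mark an address as containing a 1-bit error."""
        self.faulty.add(address)

    def read(self, address):
        """Return ECC-corrected data and latch error information if a fault was hit."""
        if address in self.faulty:
            self.ecc.error_flag = True
            self.ecc.error_address = address
        return self.cells[address]


mc = MemoryController({0x20: 7})
mc.inject_fault(0x20)
value = mc.read(0x20)
print(value, mc.ecc.error_flag, hex(mc.ecc.error_address))
```

Note that the read alone leaves the fault in place, which is why a separate write-back step is needed.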
[0014]
Information on the use of DMA by the DMA device 20 on the memory 30 is set in the DMA information unit 44. The DMA information unit 44 includes a DMA target address section 44A and a DMA usage information section 44B. A flag indicating that the DMA device 20 is using DMA is set in the DMA usage information section 44B, and while DMA is in use, the address targeted by that DMA is set in the DMA target address section 44A. The DMA information unit 44 need not be located in the memory controller 40 and may be implemented in either hardware or software.
[0015]
The memory 30 includes an error table 32. As shown in FIG. 2, the error table 32 includes n error information sections 32-1, 32-2, ..., 32-n and an index section 32x. As illustrated, the error information section 32-1 stores a 1-bit error occurrence address, indicating the address at which a 1-bit error occurred, and a 1-bit error occurrence data read value, which is the data read when the 1-bit error occurred as corrected by ECC. The other error information sections 32-2, ..., 32-n are configured in the same way. The index section 32x stores a number related to how many of the error information sections 32-1 to 32-n hold registered error information. For example, when two pieces of error information are stored, the two error information sections 32-1 and 32-2 hold error information and the third error information section 32-3 is empty. In this case, the value “3” is set in the index section 32x. That is, the value set in the index section 32x is “the number of errors that have occurred + 1”, which is also the number of the first empty error information section.
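The layout of the error table 32 and the "count + 1" convention of the index section 32x can be sketched as a small data structure. This is a minimal Python sketch; the class and method names are hypothetical, and the behaviour when the table is full is an assumption, since the source does not describe it.

```python
class ErrorTable:
    """Models error table 32: n slots plus a 1-based index of the first free slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # error information sections 32-1 .. 32-n
        self.index = 1                   # index section 32x: registered entries + 1

    def is_empty(self):
        return self.index == 1

    def is_full(self):
        return self.index > len(self.slots)

    def append(self, address, read_value):
        """Store (error address, ECC-corrected read value) in the first free slot."""
        if self.is_full():
            return False                 # assumption: silently refuse when full
        self.slots[self.index - 1] = (address, read_value)
        self.index += 1
        return True


table = ErrorTable(capacity=4)
table.append(0x100, 0xAA)
table.append(0x200, 0xBB)
print(table.index)   # two entries registered, so the index section holds 3
```

Storing the index of the first free slot means the same value serves both as the insertion point and as the end marker when the table is scanned.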
[0016]
All reads and writes from the CPU 10 and the DMA device 20 to the memory 30 are performed via the memory controller 40. In addition, as long as the memory bus Bu3 is not in use, the CPU 10 and the DMA device 20 can each read and write data in the memory 30 independently of the other's state.
[0017]
The present embodiment is characterized by the provision of the write-back control means 12 and the error table 32. The CPU 10 activates the write-back control means 12 at a fixed period, and the write-back control means 12 performs error patrol processing on the memory controller 40, that is, it monitors the error occurrence information section 42A of the ECC register unit 42. When it detects that a 1-bit memory error has occurred, it writes the error-corrected contents back to the memory location where the error occurred.
[0018]
Here, the contents of the write-back process by the write-back control means 12 will be described with reference to FIG.
In step S1, the CPU 10 activates the write-back control means 12 at a fixed period, and the write-back control means 12 starts an error patrol of the memory controller 40.
[0019]
Next, in step S2, the write-back control means 12 checks the error occurrence information section 42A of the ECC register unit 42 and determines, from whether the error flag is set, whether a 1-bit memory error has occurred. If an error is detected, the process proceeds to step S3, where the error occurrence address in the memory, obtained by referring to the error address section 42B, is stored as the 1-bit error occurrence address of the error information section 32-1 of the error table 32 shown in FIG. 2, and the value obtained by reading data from that address is stored as the 1-bit error occurrence data read value. The value read here is data that has been error-corrected by ECC, and it is this corrected value that is stored. The error information section in which to store the entry is chosen by referring to the index section 32x: if the index value is “1”, the entry is stored in the error information section 32-1, and if the index value is “2”, it is stored in the error information section 32-2. The process then proceeds to step S4. If no error is detected in step S2, step S3 is skipped and the process proceeds to step S4.
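Steps S2 and S3 can be sketched as a single function. This is a minimal Python sketch in which the ECC registers and the error table are plain dictionaries; the key names and the function name are hypothetical. Two behaviours are assumptions not stated in the source: the error flag is cleared once the error has been recorded, and a full table causes the error to be skipped.

```python
def record_error(ecc, read_corrected, table):
    """Steps S2-S3: if the ECC flag is set, log (address, corrected value)."""
    if not ecc["error_flag"]:                # S2: no 1-bit error latched
        return False
    slot = table["index"]                    # 1-based number of the first free slot
    if slot > len(table["slots"]):
        return False                         # assumption: table full, nothing recorded
    address = ecc["error_address"]           # taken from error address section 42B
    table["slots"][slot - 1] = (address, read_corrected(address))
    table["index"] = slot + 1
    ecc["error_flag"] = False                # assumption: flag cleared after recording
    return True


memory = {0x40: 0x5A}                        # hypothetical memory contents
ecc = {"error_flag": True, "error_address": 0x40}
table = {"index": 1, "slots": [None] * 4}

recorded = record_error(ecc, memory.__getitem__, table)
print(recorded, table["index"], table["slots"][0])
```

The stored read value is what later lets the patrol tell whether the location has been overwritten since the error was detected.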
[0020]
Next, in step S4, the write-back control means 12 refers to the index section 32x and determines whether the error table 32 is empty. If the index value is “1”, the error table 32 can be judged to be empty. If the error table 32 is empty, there is no 1-bit error data to write back, so the process ends in step S12.
[0021]
If the error table 32 is not empty, in step S5 the write-back control means 12 determines whether the error table index has reached the end. The error table index is distinct from the index value set in the index section 32x and is initialized to “1” at the start of the processing shown in FIG. 3. It is incremented by 1 each time step S11, described later, is executed. When the error table index becomes equal to the index value set in the index section 32x, the process ends in step S12.
[0022]
If the error table index has not reached the end, in step S6 the write-back control means 12 reads from the error table 32 the address of a previously recorded error and compares the read values. For example, the write-back control means 12 reads the 1-bit error occurrence address from the error information section 32-1 of the error table 32 and then reads the data at that address from the memory 30. The value obtained by this read has been error-corrected by ECC. The write-back control means 12 also reads the 1-bit error occurrence data read value from the error information section 32-1 of the error table 32. It then compares the ECC-corrected value read from the memory 30 with the 1-bit error occurrence data read value read from the error information section 32-1.
[0023]
If the two read values differ, a memory write (overwrite) has been performed on the error occurrence address after the 1-bit memory error was detected, so write-back is no longer necessary. In step S13, the entry, that is, the contents of the error information section 32-1, is deleted from the error table 32.
On the other hand, if it is determined in step S6 that the data read value recorded in the error table 32 is the same as the data just read, the memory error state can be judged to be continuing, so the corrected contents are written back in step S7 and the subsequent steps.
In step S7, the write-back control means 12 refers to the DMA target address section 44A and determines whether the address at which the memory error was detected lies in a DMA target area. If it does not, there is no problem in having the CPU 10 write back the corrected contents, so the process proceeds to step S9, where a read/write is executed on the error occurrence address and the error-corrected contents are written back.
On the other hand, if it is determined in step S7 that the error occurrence address lies in a DMA target area, then in step S8 the write-back control means 12 refers to the DMA usage information section 44B and determines whether the corresponding DMA transfer is in use. If the DMA transfer is not in use (the DMA transfer has completed), a read/write is executed on the error occurrence address in step S9 and the error-corrected contents are written back. If the DMA transfer is in use, the write-back is not performed; the process proceeds to step S11, and the write-back processing is carried over to the next periodic patrol.
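The two-stage decision of steps S7 and S8 reduces to a small predicate. This is a minimal Python sketch; it assumes that the DMA target address section 44A describes the target area as a start address and a length, which the source does not specify, and the function and parameter names are hypothetical.

```python
def may_write_back(address, dma_start, dma_length, dma_in_use):
    """Return True when the CPU may safely write back to `address` now."""
    in_dma_area = dma_start <= address < dma_start + dma_length   # S7
    if not in_dma_area:
        return True            # outside the DMA target area: proceed to S9
    return not dma_in_use      # S8: inside the area, allowed only once DMA is idle


# Hypothetical DMA target area: 0x100 .. 0x13F
print(may_write_back(0x050, 0x100, 0x40, dma_in_use=True))    # outside the area
print(may_write_back(0x110, 0x100, 0x40, dma_in_use=True))    # inside, DMA active
print(may_write_back(0x110, 0x100, 0x40, dma_in_use=False))   # inside, DMA finished
```

Checking the address first means an active DMA transfer elsewhere in memory does not delay the write-back.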
[0024]
After the read/write is executed in step S9, the entry, that is, the contents of the error information section 32-1, is deleted from the error table 32 in step S10.
[0025]
Next, in step S11, the error table index is incremented by one. Then, the process returns to step S5, and the processes of steps S5 to S11 are executed for the incremented error table index.
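The loop formed by steps S4 to S13 can be sketched end to end. This is a minimal Python sketch, not the patented implementation: the error table is simplified to a Python list of (address, recorded value) pairs instead of fixed slots with an index section, the DMA target area is passed as a `range`, and all names are hypothetical.

```python
def patrol(table, read_corrected, write, dma_area, dma_in_use):
    """One periodic patrol pass over the error table (steps S4-S13).

    Returns the list of addresses that were written back in this pass.
    Entries deferred because of active DMA stay in `table` for the next pass.
    """
    deferred = []
    written = []
    for address, recorded in table:              # S5/S11: walk the entries in order
        current = read_corrected(address)        # S6: ECC-corrected read of the address
        if current != recorded:
            continue                             # S13: overwritten since detection, drop
        if address in dma_area and dma_in_use:   # S7 and S8
            deferred.append((address, recorded)) # carry over to the next patrol
            continue
        write(address, current)                  # S9: rewrite the corrected data
        written.append(address)                  # S10: entry is dropped after write-back
    table[:] = deferred
    return written


memory = {0x10: 1, 0x20: 2, 0x30: 9}             # 0x30 was overwritten after detection
table = [(0x10, 1), (0x20, 2), (0x30, 3)]
done = patrol(table, memory.__getitem__, memory.__setitem__,
              dma_area=range(0x20, 0x28), dma_in_use=True)
print(done, table)
```

Keeping deferred entries in the table is what allows a later patrol to complete the write-back once the DMA transfer has finished.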
[0026]
As described above, even when a DMA device that performs DMA transfers is connected to a general memory controller that has no function for writing error-corrected contents back to the memory upon detecting a 1-bit memory error, the CPU can write the error-corrected contents back to the location where the 1-bit memory error was detected without having to be concerned about the areas of memory used by DMA. This suppresses the occurrence of uncorrectable errors and thereby improves memory reliability. Furthermore, since the processing is performed in software, cost can be kept to a minimum.
[0027]
While DMA is in use, the write-back processing is carried over to the next patrol, so that even when a DMA device is used, errors can be suppressed without performing an erroneous write-back.
[0028]
Furthermore, because the write-back is attempted again at the next patrol, the occurrence of errors can be reliably suppressed.
[0029]
In the above description, all of steps S1 to S13 are executed by the CPU 10 as a periodic patrol. Alternatively, for example, the occurrence of an error during ECC may raise an interrupt to the CPU 10, and the CPU 10 may execute the processing of steps S3 to S13 when this error interrupt occurs. Note that because write-back is carried over to the next patrol while a DMA transfer is in progress, as in step S8, the processing of steps S4 to S13 itself is still executed as a periodic patrol.
[0030]
[Effect of the Invention]
According to the present invention, the CPU can write the corrected contents of a 1-bit memory error back to the memory even in a memory area used by DMA.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a data storage control device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of an error table used in the data storage control device according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the contents of a 1-bit error write-back process by the data storage control device according to the embodiment of the present invention.
[Explanation of symbols]
10 ... CPU
12 ... Write-back control means
20 ... DMA device
30 ... Memory
32 ... Error table
40 ... Memory controller
42 ... ECC register unit
44 ... DMA information unit
Bu1 ... CPU bus
Bu2 ... I/O bus
Bu3 ... Memory bus

Claims

In a data storage control method for a computer system comprising a CPU, a memory, a memory controller that controls the memory and has an ECC register unit, and a DMA device,
The memory includes an error table including a plurality of error information sections,
Each of the plurality of error information sections stores a 1-bit error occurrence address indicating an address at which a 1-bit error has occurred, and a 1-bit error occurrence data read value that is the data read when the 1-bit error occurred, as corrected by ECC,
The data storage control method, wherein the CPU, when DMA is not in use, writes the error-corrected data back to the error address of the memory stored as the 1-bit error occurrence address of the error information section.

The data storage control method according to claim 1, wherein
The data storage control method, wherein the CPU does not perform write-back when the data read from the error address is different from the 1-bit error occurrence data read value read from the error table.

The data storage control method according to claim 1, wherein
The data storage control method, wherein, when DMA is in use, the CPU carries the data write-back processing for the error address of the memory over to the next time.

In a data storage control device of a computer system comprising a CPU, a memory, a memory controller that controls the memory and has an ECC register unit, and a DMA device,
The memory controller includes a DMA information section for holding DMA usage information,
The memory includes an error table including a plurality of error information sections,
Each of the plurality of error information sections stores a 1-bit error occurrence address indicating an address at which a 1-bit error has occurred, and a 1-bit error occurrence data read value that is the data read when the 1-bit error occurred, as corrected by ECC,
The data storage control device, wherein the CPU comprises write-back control means that refers to the DMA usage information held in the DMA information section and, when DMA is not in use, writes the error-corrected data back to the error address of the memory stored as the 1-bit error occurrence address of the error information section.