JPH08147112A

JPH08147112A - Error recovery device for disk array device

Info

Publication number: JPH08147112A
Application number: JP6286189A
Authority: JP
Inventors: Shigeo Konno; 茂生金野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-11-21
Filing date: 1994-11-21
Publication date: 1996-06-07

Abstract

PURPOSE: To make recovery work more efficient by automatically recovering a failed disk device through medium initialization, without manual work or replacement of the disk device. CONSTITUTION: When the error count of any of the data storage disk devices 50 to 57 or the redundant information storage disk device 58 in a disk array 5 exceeds a prescribed value, a first data restoration part 46 restores the data of the disk device in which the errors occurred onto a spare disk device 59. When this restoration is complete, a re-initializing part 47 initializes (formats) the medium of the error disk device. After the initialization is complete, a medium check part 48 checks the medium of the error disk device. When the medium check part 48 judges the medium to be normal, a second data restoration part 49 restores the data of the spare disk device 59 to the error disk device.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an error recovery device for a disk array device that responds to a failure in a data storage or parity storage disk device by restoring data to a spare disk device, and more particularly to an error recovery device for a disk array device that attempts to recover the error disk device after switching to the spare disk device.

[0002]

[Prior Art] In recent computer systems, which continue to grow faster and more capable, the performance of central processing units has improved remarkably on the back of advances in semiconductor technology, and similarly high performance is therefore demanded of externally connected storage devices. Because there is a limit to how far magnetic disk devices, which involve mechanical motion, can be sped up, disk array devices have been provided in which a disk array is formed from a plurality of magnetic disk devices connected in parallel to a disk array control device, and the plurality of magnetic disk devices are accessed in parallel to perform read and write operations.

[0003] In such a disk array device, a spare disk device is provided in addition to the disk devices in operation, and operation is switched to the spare disk device when an operating disk device fails. FIG. 6 shows a conventional disk array device. The disk array control device 2 comprises a host device interface control unit 3 connected to the host device 1 and a device control unit 4 connected to the plurality of magnetic disk devices 50 to 59 of the disk array 5. The disk array 5 has data storage disk devices 50 to 57 and a redundant information storage disk device (hereinafter "redundant disk device") 58, and is further provided with a spare disk device 59.

[0004] In response to a data transfer request from the host device 1, the disk array control device 2 accesses the magnetic disk devices 50 to 58 in parallel via the device control unit 4 and performs read or write processing on them simultaneously. That is, when data is written to the plurality of data storage disk devices 50 to 57, parity data or the like is generated and written to the redundant disk device 58. Even if some failure occurs in one of the data storage disk devices during a read, the parity makes it possible to restore the data from the data of the other normal disk devices and the parity data of the redundant disk device.
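The parity-based restoration described here can be illustrated with a minimal Python sketch. It assumes byte-wise XOR parity over one stripe; the text says only "parity data or the like", so XOR is an assumption, and the names `xor_blocks`, `make_parity`, and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe (assumed to be plain XOR parity)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


# One stripe across eight data disks (standing in for devices 50 to 57).
stripe = [bytes([i, i + 1, i + 2, i + 3]) for i in range(8)]
parity = make_parity(stripe)

# Pretend the fourth data disk failed and rebuild its block.
lost = 3
survivors = stripe[:lost] + stripe[lost + 1:]
rebuilt = reconstruct(survivors, parity)
```

With XOR parity only one missing block per stripe can be rebuilt, which matches the single-disk failure case the paragraph describes.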

[0005] When failures occur repeatedly in one of the disk devices 50 to 58, the error disk device that caused the failures is detached from the logical disk assignment on instruction from the device control unit 4, the spare disk device is assigned in its place, and all of the data of the error disk device is restored onto the newly assigned spare disk device.

[0006] The data restoration onto the spare disk device can be started on instruction from an operator, but normally the disk array control device 2 monitors error occurrences and starts the data restoration automatically when the number of errors exceeds a certain value.

[0007]

[Problems to Be Solved by the Invention] Failures in a disk device are often caused by deterioration or defects in the parts that make up the disk device, but uncorrectable data checks can also arise from the properties of the materials that make up those parts, from track misalignment, and the like. In general, such data check failures can be cleared by initializing (formatting) the medium.

[0008] Conventional devices, however, presuppose that a failed disk device will be replaced with a new one. Even when the failure could be cleared by initializing the medium, the failed disk device always has to be removed from the system enclosure and set in a separate test device or the like to try initializing it. Because the disk device has to be swapped and the initialization has to be done by hand, restoring the disk array device takes a long time.

[0009] An object of the present invention is to provide an error recovery device for a disk array device that automatically performs recovery of a failed disk device by medium initialization, without manual work or replacement of the disk device, so that recovery from a failure is more efficient and completes in a short time.

[0010]

[Means for Solving the Problems] FIG. 1 is a diagram explaining the principle of the present invention. The present invention is directed to a disk array device that has a disk array control device 2, to which a disk array 5 comprising data storage disk devices 50 to 57 and a redundant information storage disk device 58 is connected and which accesses the plurality of magnetic disk devices 50 to 58 in parallel in response to accesses from a host device 1, and in which the disk array 5 further comprises at least one spare disk device 59.

[0011] As an error recovery device for such a disk array device, the present invention provides the disk array control device 2 with a first data restoration unit 46, a re-initialization unit 47, a medium inspection unit 48, and a second data restoration unit 49. When the error count of any of the data storage disk devices 50 to 57 or the redundant information storage disk device 58 of the disk array 5 exceeds a specified value, the first data restoration unit 46 restores the data of the disk device in which the errors occurred onto the spare disk device 59. After the restoration by the first data restoration unit 46 is complete, the re-initialization unit 47 initializes (formats) the medium of the error disk device. After the initialization by the re-initialization unit 47 is complete, the medium inspection unit 48 inspects the medium of the error disk device. When the medium inspection unit 48 judges the medium to be normal, the second data restoration unit 49 restores the data of the spare disk device 59 to the error disk device.
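The order in which the four units run can be sketched as a small sequencer. This is only an illustration in Python: the `Phase` names and the `run_recovery` function are hypothetical, each unit is reduced to a callable that returns `True` on success, and stopping at the first failed phase is an assumption, since this passage describes only the normal path.

```python
from enum import Enum, auto


class Phase(Enum):
    RESTORE_TO_SPARE = auto()    # first data restoration unit 46
    REINITIALIZE = auto()        # re-initialization unit 47
    INSPECT_MEDIUM = auto()      # medium inspection unit 48
    RESTORE_FROM_SPARE = auto()  # second data restoration unit 49


def run_recovery(actions):
    """Run each phase in order; stop at the first one that reports failure.

    `actions` maps each Phase to a zero-argument callable returning True on
    success. Returns (completed phases, overall success flag).
    """
    completed = []
    for phase in Phase:  # Enum members iterate in definition order
        if not actions[phase]():
            return completed, False
        completed.append(phase)
    return completed, True


# Example: every phase succeeds.
all_ok = {phase: (lambda: True) for phase in Phase}
completed, recovered = run_recovery(all_ok)
```

If the medium inspection fails in this sketch, the data simply stays on the spare disk device, because the final copy back is never attempted.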

[0012] The disk array control device 2 is further provided with a host reporting unit 38, which notifies the host device 1 of each of the following: the start and end of data restoration by the first data restoration unit 46, the start of re-initialization by the re-initialization unit 47, the end of re-initialization based on the medium inspection unit 48 judging the medium to be normal, and the start and end of data restoration by the second data restoration unit 49.

[0013] The host reporting unit 38 also monitors, for each of the data restoration by the first data restoration unit 46, the re-initialization by the re-initialization unit 47, and the data restoration by the second data restoration unit 49, the time elapsed since the completion report to the host device 1, and forces a transition to the next process if no instruction has arrived from the host device 1 or the operator once a fixed time has passed.
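The forced transition can be sketched as a wait with a timeout. This is an illustrative Python model, not the patent's counter hardware: the function name `wait_for_go_ahead`, the use of `queue.Queue` as the instruction channel, and the timeout values are all assumptions.

```python
import queue


def wait_for_go_ahead(instructions, timeout_s):
    """Wait for an instruction from the host or operator.

    Returns the instruction if one arrives within `timeout_s` seconds,
    otherwise the string "timeout". In both cases the caller moves on to
    the next recovery step, as the paragraph describes.
    """
    try:
        return instructions.get(timeout=timeout_s)
    except queue.Empty:
        return "timeout"


# An instruction is already waiting: it is returned immediately.
channel = queue.Queue()
channel.put("host")
first = wait_for_go_ahead(channel, timeout_s=0.05)

# Nothing arrives: the wait expires and recovery proceeds anyway.
second = wait_for_go_ahead(queue.Queue(), timeout_s=0.05)
```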

[0014] The disk array control device 2 is further provided with a logging processing unit 39, which stores and retains in a non-volatile storage unit the contents reported for each of the data restoration by the first data restoration unit 46, the re-initialization by the re-initialization unit 47, and the data restoration by the second data restoration unit 49.

[0015]

[Operation] The error recovery device for a disk array device according to the present invention operates as follows. When errors occur frequently in any of the data or redundancy recording disk devices of the disk array and the error count exceeds the specified value, data restoration onto the spare disk device starts automatically. At this time the host device is notified that data restoration has started. When the data restoration onto the spare disk device is complete, this is reported to the host device, and the device waits, under time monitoring, for an instruction in response to the completion report.

[0016] When an instruction arrives from the host device or the operator, or when the monitoring timer overflows, the error disk device is re-initialized. Once re-initialization is done, an inspection process (diagnostic process) is performed on the initialized medium. If the medium diagnosis is normal, completion of re-initialization is reported to the host device at this point, and the device again waits, under time monitoring, for an instruction in response to the completion report.

[0017] When an instruction arrives from the host device or the operator, or when the monitoring timer overflows, the data is restored from the spare disk device to the disk device whose errors were cleared by re-initialization, and the host device is notified that data restoration is complete. In this way, a disk device failure that can be cleared by initializing the medium is recovered to a normal disk device without replacing the disk device and without manual work.

[0018] Even if the disk array control device is disconnected from the host device, the logging processing unit stores the recovery status and results for the failure in the non-volatile storage unit of the disk array control device, so they can be provided to the host device as logging information.

[0019]

[Embodiment] FIG. 2 is a block diagram showing an embodiment of the present invention. In FIG. 2, the disk array device of the present invention comprises a disk array control device 2 connected to a host computer 1 serving as the host device, and a disk array 5 in which a plurality of disk devices 50 to 59 are connected in parallel as logical devices. In this embodiment, the disk array 5 consists of eight storage disk devices 50 to 57 for storing data, one redundant disk device 58 for storing parity information, and one spare disk device 59.

[0020] The disk array control device 2 comprises a host interface control unit 3 connected to the host computer 1 and a device control unit 4 connected to the disk array 5. The host interface control unit 3 is provided with an interface control unit 31, an MPU 32, a data transfer control unit 33, a flag register 35, a counter 34, and a non-volatile storage unit 36.

[0021] The MPU 32 uses a microprogram 37 to perform the various processes for data transfer requests from the host computer 1. Its processing functions include a host reporting unit 38, which reports to the host computer 1 the state of the disk array 5 as seen by the device control unit 4, in particular the various states and results that accompany error recovery processing, and a logging processing unit 39, which stores and retains in the non-volatile storage unit 36, as logging information, the error recovery status and results reported by the host reporting unit 38. An operator control unit 6 is also connected to the MPU 32, so that an operator can issue to the MPU 32, from the operator control unit 6, the information needed for various maintenance tasks such as error recovery.

[0022] The device control unit 4 is provided with a disk array control unit 41, an MPU 42, a data transfer control unit 43, and a data check counter 44. The MPU 42 executes a microprogram 45 and performs read or write operations on the disk array 5 in response to data transfer requests from the host computer 1 relayed by the MPU 32 of the host interface control unit 3, as well as the processing operations for the error recovery of the present invention.

[0023] For this error recovery, the microprogram 45 provides the functions of the first data restoration unit 46, the re-initialization unit 47, the medium inspection unit 48, and the second data restoration unit 49. The data check counter 44 has a counter area for each of the disk devices 50 to 59 in the disk array 5. When an error that cannot be corrected by ECC is detected in data read during a disk array access for a data transfer request from the host computer 1, a failure is judged to have occurred, and the value of the data check counter 44 for the disk device that caused the error is incremented by one.

[0024] The first data restoration unit 46 monitors the count of the data check counter 44. When the error count reaches a predetermined specified value, it judges the disk device whose error count reached that value to be an error disk device, designates it as the target of error recovery processing, and executes the data restoration process that restores the data of the error disk device onto the spare disk device 59.
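The per-disk counting and threshold test can be modeled in a few lines of Python. This is a sketch under stated assumptions: the class name `DataCheckCounter`, the method name, and the threshold of 3 are hypothetical, and the comparison uses "reaches" (`>=`) as in paragraph [0024], although other passages say "exceeds".

```python
class DataCheckCounter:
    """Per-disk counts of uncorrectable read errors (a model of counter 44)."""

    def __init__(self, disk_ids, threshold):
        self.threshold = threshold  # the "specified value"; actual value not given
        self.counts = {disk_id: 0 for disk_id in disk_ids}

    def record_uncorrectable_error(self, disk_id):
        """Increment the disk's count; return True once it reaches the threshold."""
        self.counts[disk_id] += 1
        return self.counts[disk_id] >= self.threshold


# Ten counter areas, one per disk device 50 to 59.
counter = DataCheckCounter(disk_ids=range(50, 60), threshold=3)

# Three uncorrectable errors on disk 50: the third one marks it an error disk.
results = [counter.record_uncorrectable_error(50) for _ in range(3)]
```

Keeping one count per disk means errors on one device never push another device over the threshold.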

[0025] The data to be restored onto the spare disk device 59 can be generated from the data of the normal storage disk devices, excluding the error disk device, and of the redundant disk device 58. The re-initialization unit 47 is started, after the first data restoration unit 46 has normally completed restoring the data of the error disk device onto the spare disk device 59, either by an instruction from the host computer 1 or the operator control unit 6 or, if neither gives an instruction, when the time monitoring by the counter 34 in the host interface control unit 3 overflows. It then re-initializes the medium of the error disk device, that is, formats it as an initialization process.

[0026] The medium inspection unit 48 is started when the re-initialization unit 47 finishes initializing the medium of the error disk device. It writes predetermined dummy data over the entire data surface of the initialized medium, then reads the entire surface and inspects the medium by checking whether it could be read normally. If the inspection by the medium inspection unit 48 ends normally, re-initialization is complete. Completion of re-initialization is reported to the host computer 1 and the operator control unit 6.
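The write-then-read inspection can be sketched against any seekable file-like object. This Python example is illustrative only: the function name `inspect_medium`, the 512-byte block size, and the `0xA5` fill pattern are assumptions, and comparing the data read back against the pattern is one possible reading of "checking whether it could be read normally".

```python
import io


def inspect_medium(device, size, block_size=512, pattern=0xA5):
    """Fill the whole medium with a dummy pattern, then read it all back.

    `device` is any seekable binary file-like object; `size` is assumed to
    be a multiple of `block_size`. Returns True only if every block reads
    back as written.
    """
    block = bytes([pattern]) * block_size
    block_count = size // block_size

    device.seek(0)
    for _ in range(block_count):  # write pass over the entire surface
        device.write(block)

    device.seek(0)
    for _ in range(block_count):  # read pass over the entire surface
        if device.read(block_size) != block:
            return False
    return True


# An in-memory buffer stands in for the re-initialized disk medium.
medium_ok = inspect_medium(io.BytesIO(bytes(4096)), size=4096)
```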

[0027] The second data restoration unit 49 is started after re-initialization is complete, either by an instruction from the host computer 1 or the operator control unit 6 or when the time monitoring by the counter 34 in the host interface control unit 3 overflows. It restores the data of the spare disk device 59 to the disk device that caused the errors, which has been re-initialized and can now operate normally. Because the spare disk device 59 is operating normally, this amounts to copying the data of the spare disk device 59 to the disk device whose error recovery is complete.

[0028] The host reporting unit 38, provided as a function of the MPU 32 of the host interface control unit 3, reports to the host computer 1 the start and end, and the results, of the error recovery processing carried out by the MPU 42 of the device control unit 4 through the first data restoration unit 46, the re-initialization unit 47, the medium inspection unit 48, and the second data restoration unit 49. For re-initialization, the start is reported when the re-initialization unit 47 begins operating, and completion is reported when the medium inspection unit 48 ends normally.

[0029] In addition to the host computer 1, the host reporting unit 38 can report the status and results of the error recovery processing to the operator control unit 6 as needed. For example, when maintenance personnel are attending the disk array control device 2, the status can be reported to the operator control unit 6, the state and results can be displayed on an operation panel or the like as predetermined code numbers, and the device can wait for the operator's instruction on error recovery.

[0030] Furthermore, when the host reporting unit 38 reports to the host computer 1 the start of one of the operations for error recovery, it starts the counter 34 to monitor the time. If the counter 34 overflows after the fixed time, it instructs the MPU 42 to move on to the next error recovery process without waiting for an instruction from the host computer 1 or the operator control unit 6.

[0031] How the host reporting unit 38 reports to the host computer 1 depends on the state of the flag register 35. When the flag register 35 is set to 1, the host reporting unit 38 reports to the host computer 1 by interrupt processing. When the flag register 35 is reset to 0, it reports to the host device as part of the response status to an access from the host computer 1.

[0032] That is, while the disk array control device 2 is disconnected from the host computer 1, the flag register 35 is set to 1, and in this state reports to the host computer 1 are made by interrupt. While the host computer 1 and the disk array control device 2 are connected and a data transfer is in progress, the report to the host device is instead included in, for example, the status information at the end of the transfer.
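The choice between the two reporting paths can be sketched as a small dispatch on the flag value. This Python example is hypothetical throughout: `deliver_report`, the `raise_interrupt` callback, and the `pending_status` list are illustrative stand-ins for the interrupt mechanism and the end-of-transfer status information.

```python
def deliver_report(flag_register, report, raise_interrupt, pending_status):
    """Route a recovery report according to the flag register value.

    flag_register == 1: report immediately through an interrupt.
    flag_register == 0: queue the report so it rides along with the status
    returned at the end of the host's current access.
    """
    if flag_register == 1:
        raise_interrupt(report)
    else:
        pending_status.append(report)


interrupts = []  # reports delivered by interrupt
pending = []     # reports waiting to be attached to a response status

deliver_report(1, "data restoration started", interrupts.append, pending)
deliver_report(0, "data restoration complete", interrupts.append, pending)
```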

[0033] FIG. 3 outlines the data transfer processing performed by the disk array control device 2 of FIG. 2. First, in step S1, the MPU 32 of the host interface control unit 3 checks whether there is an input/output request for data transfer from the host computer 1. If there is, the process proceeds to step S2, where a read or write command is issued to the MPU 42 of the device control unit 4, and the read or write operation of step S2 is performed through the disk array control unit 41 by parallel access to the storage disk devices 50 to 57 and the redundant disk device 58 of the disk array 5.

[0034] For example, for a write data transfer request from the host computer 1, data is written to the storage disk devices 50 to 57 and parity data is written to the redundant disk device 58 via the channel interface control unit 31, the data transfer control unit 33, the data transfer control unit 43, and the disk array control unit 41.

[0035] For a read data transfer request from the host computer 1, data is read from the storage disk devices 50 to 57 of the disk array 5, and the requested data is transferred to the host computer via the disk array control unit 41, the data transfer control unit 43, the data transfer control unit 33, and the channel interface control unit 31.

[0036] Next, in step S3, it is checked whether an uncorrectable error has occurred on any of the disk devices in operation in the disk array 5. If a disk device has produced an uncorrectable error, the process proceeds to step S4, where the MPU 42 increments by one the error count in the corresponding counter area of the data check counter 44.

[0037] Next, in step S5, it is checked whether any value in the data check counter 44 shows a disk device whose error count exceeds the predetermined specified value. If there is such a disk device, it is judged to be an error disk and the process proceeds to the error processing of step S6. FIGS. 4 and 5 show the details of the error processing according to the present invention in step S6 of FIG. 3. This error processing is described below for the case where the error count of the storage disk device 50 in the disk array 5 of FIG. 2 reaches the specified value and the device is judged to be an error disk.

[0038] In the MPU 42, when the value of the data check counter 44 for the storage disk device 50 reaches the specified value, the device is judged to be an error disk and a failure notification is sent to the MPU 32. On receiving this failure notification, the MPU 32 instructs the MPU 42, as shown in step S1 of FIG. 4, to start the data restoration process that restores the data of the error disk device 50 onto the spare disk device 59.

[0039] At the same time, the MPU 32 uses the function of the host reporting unit 38 to report to the host computer 1 that the data restoration process has started, as in step S2. The MPU 32 checks the state of the flag register 35. If the flag register 35 is set to 1, it reports the start of the data restoration process to the host computer 1 by interrupt. If the flag is reset to 0, it reports the start of the data restoration process in the status information that accompanies the end of the access currently in progress from the host computer 1.

[0040] On receiving the instruction from the MPU 32 to start data restoration, the MPU 42 uses the function of the first data restoration unit 46 to start, through the disk array control unit 41, the restoration process that restores the data of the error disk device 50 onto the spare disk device 59. The restored data can be generated from the data of the normal storage disk devices 51 to 57, excluding the error disk device 50, and the parity data of the redundant disk device 58.

[0041] When step S3 determines that all of the data of the error disk device 50 has been restored onto the spare disk device 59 and the restoration has ended normally, the process proceeds to step S4, where the MPU 42 reports completion of the data restoration to the MPU 32. The MPU 32 then reports completion of the data restoration to the host computer 1, referring to the state of the flag register 35 at that time.

[0042] After issuing the data restoration completion report to the host computer 1, the MPU 32 waits in step S5 for an acknowledgment from the host computer 1. Only when the acknowledgment is received does it regard the report as complete and proceed to the next step S6. While waiting for the acknowledgment from the host computer 1, it starts the logging processing unit 39 in step S6 and performs internal logging, recording in the non-volatile storage unit 36 that data restoration onto the spare disk device 59 is complete.

【００４３】When an acknowledgment is normally obtained from the host computer 1 in step S5 and the report is complete, the process proceeds to step S6, where the MPU 32 starts the counter 34 and begins time monitoring. The counter 34 overflows once a predetermined time has elapsed, indicating that the monitoring period has ended. If a re-initialization instruction arrives from the host computer 1 or the operator control unit 6 within the monitoring period, that is, before the counter 34 overflows, the process proceeds to the next step S7. Even without such an instruction, the MPU 42 is instructed to re-initialize at the moment the counter 34 overflows.
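
The time monitoring described above can be sketched as a bounded wait: an instruction that arrives within the monitoring period is acted on, and a timeout triggers the same next step. This is only an illustration; the queue standing in for the host computer 1 or the operator control unit 6, the timeout value, and the name `wait_for_instruction` are assumptions, and the counter 34 is modeled by the queue timeout rather than a hardware counter.

```python
import queue


def wait_for_instruction(instructions, timeout_s):
    """Wait for an instruction for at most timeout_s seconds.

    Returns ("instructed", source) if an instruction arrived in time, or
    ("timeout", None) when the monitoring period ran out. In both cases the
    caller moves on to the next recovery step.
    """
    try:
        source = instructions.get(timeout=timeout_s)
        return ("instructed", source)
    except queue.Empty:
        return ("timeout", None)


pending = queue.Queue()
# No instruction arrives: the wait ends by timeout (counter overflow).
first = wait_for_instruction(pending, timeout_s=0.01)
# An instruction from the host is already queued: the wait ends at once.
pending.put("host")
second = wait_for_instruction(pending, timeout_s=0.01)
```

Returning the reason for leaving the wait lets the caller log whether the step was instructed or forced, while the control flow after the wait stays the same.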

【００４４】The MPU 42 receives a re-initialization instruction from the MPU 32, issued either in response to an instruction from the host computer 1 or the operator control unit 6 or, when no such instruction is given, in response to the overflow of the counter 34. In step S7, the MPU 42 then instructs re-initialization of the medium of the error disk device 50. On receiving this instruction via the disk array control unit 41, the error disk device 50 starts an initialization operation that redoes the formatting of the medium, as at factory shipment.

【００４５】When step S8 determines that the re-initialization of the error disk device 50 has ended normally, the MPU 42 then instructs the error disk device 50, in step S9, to start the medium inspection process, which writes dummy data over the entire data area of the re-initialized medium, reads the entire area back, and checks the read results.
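
The medium inspection of step S9 can be sketched as a full write pass followed by a full read pass. The `MemoryDisk` class below is a hypothetical in-memory stand-in for the medium, not a real device interface, and the dummy pattern `0xA5` is an arbitrary choice made for this illustration.

```python
class MemoryDisk:
    """Hypothetical in-memory stand-in for a disk medium."""

    def __init__(self, num_blocks, block_size, bad_blocks=()):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._bad = set(bad_blocks)
        self._blocks = [bytes(block_size)] * num_blocks

    def write_block(self, index, payload):
        self._blocks[index] = bytes(payload)

    def read_block(self, index):
        data = self._blocks[index]
        if index in self._bad:
            # Simulate a media defect by inverting the first byte on read.
            return bytes([data[0] ^ 0xFF]) + data[1:]
        return data


def media_check(disk, pattern=0xA5):
    """Write dummy data to every block, then read every block back.

    Returns the indices of blocks whose read-back data differs from the
    dummy data; an empty list means the medium passed the inspection.
    """
    dummy = bytes([pattern]) * disk.block_size
    for index in range(disk.num_blocks):  # full-surface write pass
        disk.write_block(index, dummy)
    # Full-surface read pass, comparing each block with what was written.
    return [index for index in range(disk.num_blocks)
            if disk.read_block(index) != dummy]


good = media_check(MemoryDisk(num_blocks=8, block_size=16))
bad = media_check(MemoryDisk(num_blocks=8, block_size=16, bad_blocks={3}))
```

An empty result corresponds to the normal end judged in step S10; any reported block would lead to the replacement path.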

【００４６】Subsequently, when the MPU 42 determines in step S10 that the medium inspection process in the error disk device 50 has ended normally, the MPU 42 reports the completion of the re-initialization process to the MPU 32. In response, the MPU 32 reports the completion of the re-initialization process to the host computer 1 in step S11, in accordance with the state of the flag register 35 at that time.

【００４７】After the re-initialization completion report, the next step S12 monitors whether an acknowledgment arrives from the host computer 1. Meanwhile, in step S22, the completion of the re-initialization process is stored in the nonvolatile storage unit 36 as an internal logging process. When an acknowledgment is received from the host computer 1 and the report is judged complete in step S12, the MPU 32 resets and restarts the counter 34 in step S13 and begins time monitoring for an instruction from the host computer 1 or the operator control unit 6. If an instruction arrives before the counter 34 overflows, the process proceeds to step S14 in FIG. 5. Even without an instruction, if the counter 34 overflows after the fixed time in step S23, the process proceeds to step S14 in FIG. 5.

【００４８】In step S14 of FIG. 5, on the basis of an instruction from the host computer 1 or the operator control unit 6, or of the overflow of the counter 34 when no such instruction is given, the MPU 42 is instructed to restore data from the spare disk device 59 to the disk device 50, which caused the error and whose re-initialization has ended normally, and the data restoration process is started.

【００４９】Subsequently, when the MPU 42 determines in step S15 that the restoration of the data of the spare disk device 59 to the disk device 50 has ended normally, it notifies the MPU 32 of the normal end of this data recovery process. In step S16, the MPU 32 reports to the host computer 1, in accordance with the state of the flag register 35 at that time, that the recovery process for the disk device 50 that caused the error is complete.

【００５０】Subsequently, step S17 waits for an acknowledgment from the host computer 1. Meanwhile, in step S25, an internal logging process records in the nonvolatile storage unit 36 that the disk device 50 that caused the error has recovered and that data restoration is complete. When step S17 determines that the host computer 1 has acknowledged the recovery completion report, the series of recovery processes triggered by the error ends, and the process returns to the main routine of FIG. 3.

【００５１】On the other hand, if the restoration of the data of the error disk device 50 to the spare disk device 59 cannot be completed normally in step S3, the spare disk device 59 has a failure, so the process proceeds to the error processing of step S18. In this case, the spare disk device 59 is replaced in addition to the error disk device 50, and the necessary data restoration processing is performed.

【００５２】If the re-initialization does not end normally in step S8, or the medium inspection process does not end normally in step S10, it is judged in step S21 that the error disk device 50 has a failure that leaves it unusable even after re-initialization, and error processing is performed by replacing the error disk device 50. Furthermore, if in step S15 the data restoration from the spare disk device 59 to the disk device that caused the error cannot be completed normally after re-initialization, it is judged in step S24 that another failure that re-initialization cannot repair has occurred in the disk device 50, and error processing that replaces the disk device 50 is performed.
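
The normal path and the three failure branches described in the preceding paragraphs can be summarized in a short sketch. Each stage is passed in as a callable that returns `True` on normal end; the function name, the callables, and the returned strings are hypothetical labels chosen for this illustration, and the host reports, acknowledgments, and time monitoring between stages are omitted.

```python
def run_recovery(restore_to_spare, reinitialize, media_check, restore_from_spare):
    """Run the recovery stages in order and stop at the first failure.

    Each argument is a zero-argument callable returning True on normal end.
    The returned string names the outcome, keyed to the flowchart steps.
    """
    if not restore_to_spare():       # checked in step S3
        return "S18: replace error disk and spare disk"
    if not reinitialize():           # checked in step S8
        return "S21: replace error disk"
    if not media_check():            # checked in step S10
        return "S21: replace error disk"
    if not restore_from_spare():     # checked in step S15
        return "S24: replace error disk"
    return "recovered"


def ok():
    return True


def fail():
    return False


all_good = run_recovery(ok, ok, ok, ok)
spare_failed = run_recovery(fail, ok, ok, ok)
check_failed = run_recovery(ok, ok, fail, ok)
write_back_failed = run_recovery(ok, ok, ok, fail)
```

Stopping at the first failed stage mirrors the flowchart: a later stage is never attempted once an earlier one has ended abnormally.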

【００５３】Although the above embodiment takes a disk array using magnetic disk devices as an example, the invention can be applied to array devices using any suitable physical devices, such as optical disk devices and semiconductor memory devices. The number of storage disk devices provided in the disk array 5 can be chosen as needed. Also, although the disk array 5 of the embodiment takes a one-rank configuration as an example, a multi-rank configuration in which parallel configurations are provided in multiple stages may be used.

【００５４】Furthermore, in the above embodiment, the following are reported to the host computer 1: the start of data restoration from the error disk device to the spare disk device, the completion of data restoration from the error disk device to the spare disk device, the completion of the re-initialization process of the error disk device, and the completion of data restoration from the spare disk device to the error disk device. However, it suffices to report at least the first of these, the start of data restoration, and the last, the completion of the recovery process; the reports in between can be chosen as needed.
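
One way to picture this choice of reports is as a set of flags in which the first and last reports are always enabled and the intermediate ones are optional. The sketch below is an assumption made for illustration only: the text does not describe the bit layout of the flag register 35, and the names `Report`, `MANDATORY`, and `should_report` are hypothetical.

```python
from enum import Flag, auto


class Report(Flag):
    """Hypothetical flags, one per report sent to the host computer."""
    RESTORE_TO_SPARE_START = auto()
    RESTORE_TO_SPARE_DONE = auto()
    REINIT_DONE = auto()
    RESTORE_FROM_SPARE_DONE = auto()


# The first and last reports are treated as always required.
MANDATORY = Report.RESTORE_TO_SPARE_START | Report.RESTORE_FROM_SPARE_DONE


def should_report(event, optional=Report(0)):
    """Return True if the event is mandatory or was enabled as optional."""
    return bool(event & (MANDATORY | optional))


start_reported = should_report(Report.RESTORE_TO_SPARE_START)
reinit_reported_by_default = should_report(Report.REINIT_DONE)
reinit_reported_when_enabled = should_report(Report.REINIT_DONE,
                                             optional=Report.REINIT_DONE)
```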

【００５５】In particular, in the present invention a report is made to the host device and an instruction is awaited, but even without an instruction the process can move automatically to the next error recovery step through time monitoring based on the counter overflow, so reporting the status to the host device is basically unnecessary. However, to keep the state of the disk array control device 2 from becoming invisible to the host device, at least the error recovery logging information must be stored in the nonvolatile storage unit 36.
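
A minimal sketch of such logging follows, with an append-only file standing in for the nonvolatile storage unit 36. The record fields, event names, and file layout are assumptions made for this illustration and are not taken from the embodiment.

```python
import json
import os
import tempfile


def log_event(path, event, disk):
    """Append one recovery event to the log and force it to stable storage."""
    record = {"event": event, "disk": disk}
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
        log_file.flush()
        os.fsync(log_file.fileno())  # make sure the record survives power loss


def read_log(path):
    """Return all logged records in the order they were written."""
    with open(path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "recovery.log")
    log_event(log_path, "restore_to_spare_done", 50)
    log_event(log_path, "reinitialize_done", 50)
    log_event(log_path, "restore_from_spare_done", 50)
    records = read_log(log_path)
```

Because every stage is recorded even when no host report is sent, the host device can later read the log to learn how far the recovery progressed.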

【００５６】[0056]

[Effects of the Invention] As described above, according to the present invention, once data restoration of a disk device in the disk array is started because of uncorrectable data checks, the internal processing of the disk array control device automatically carries out recovery work, including re-initialization that returns the disk device that caused the error to a usable state wherever possible, without requiring work instructions from the host device or an operator. Failure recovery therefore proceeds without delays caused by, for example, the absence of an operator, and manual operating errors are prevented.

【００５７】For the disk in which the error occurred, re-initialization and, after it completes, a medium inspection by a full-surface read are performed automatically. On normal completion the error is regarded as recovered, the data of the spare disk device is restored, and the array automatically returns to its original operating state. Error recovery can thus be performed efficiently for data checks of the kind that medium initialization cures.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a block diagram showing an embodiment of the present invention.

FIG. 3 is a schematic flowchart of the access processing of the present invention.

FIG. 4 is a detailed flowchart of the error processing in FIG. 3.

FIG. 5 is a detailed flowchart of the error processing in FIG. 3 (continued).

FIG. 6 is a block diagram of a conventional device.

[Explanation of symbols]

1: Host device (host computer)
2: Disk array control device
3: Host interface control unit
4: Device control unit
5: Disk array
6: Operator control unit
31: Interface control unit
32, 42: MPU
33, 43: Data transfer control unit
34: Counter
35: Flag register
36: Nonvolatile storage unit
37, 45: Microprogram
38: Host reporting unit
39: Logging processing unit
41: Disk array control unit
44: Data check counter
46: First data restoration unit
47: Re-initialization unit
48: Medium inspection unit
49: Second data restoration unit

Claims

[Claims]

1. An error recovery device for a disk array device, comprising a disk array control device to which a disk array having a plurality of disk devices for data storage and for redundant information storage is connected, and which accesses the plurality of magnetic disk devices in parallel in response to access from a host device, wherein the disk array further comprises at least one spare disk device, and the disk array control device comprises: a first data restoration unit that, when the error occurrence count of any one of the plurality of disk devices for data storage and for redundant information storage of the disk array exceeds a prescribed value, restores the data of the disk device in which the errors occurred to the spare disk device; a re-initialization unit that initializes the medium of the error disk device after the restoration operation by the first data restoration unit is completed; a medium inspection unit that inspects the medium of the error disk device after the initialization by the re-initialization unit is completed; and a second data restoration unit that restores the data of the spare disk device to the error disk device when the medium inspection unit determines that the medium is normal.

2. The error recovery device for a disk array device according to claim 1, further comprising a host reporting unit that notifies the host device of each of: the start and end of data restoration by the first data restoration unit; the start of re-initialization by the re-initialization unit and the end of re-initialization based on the determination by the medium inspection unit that the medium is normal; and the start and end of data restoration by the second data restoration unit.

3. The error recovery device for a disk array device according to claim 2, wherein, for each of the data restoration by the first data restoration unit, the re-initialization by the re-initialization unit, and the data restoration by the second data restoration unit, the host reporting unit monitors the time elapsed since the completion report to the host device and, if no instruction is received from the host device or an operator after a fixed time has elapsed, forcibly moves to the next process.

4. The error recovery device for a disk array device according to claim 1, further comprising a logging processing unit that stores and holds, in a nonvolatile storage unit, the report contents of each of the data restoration by the first data restoration unit, the re-initialization by the re-initialization unit, and the data restoration by the second data restoration unit.