JPH10222315A

JPH10222315A - Method and device for error recovery of doubled hard disk drives

Info

Publication number: JPH10222315A
Application number: JP9027784A
Authority: JP
Inventors: Tsukasa Kimura; 司木村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-02-12
Filing date: 1997-02-12
Publication date: 1998-08-21

Abstract

PROBLEM TO BE SOLVED: To facilitate maintenance work when a defective storage area is replaced, by substituting a normal storage area on a 1st hard disk drive for a defective storage area on that drive and copying data from a 2nd hard disk drive to restore the data of the defective storage area. SOLUTION: An error restoration start part 9 outputs an error restoration signal on receiving an error report signal. A substituting process indication part 10, on receiving this error restoration signal, requests a DK (hard disk drive) common control part 2 to perform a substituting process for the error occurrence sector of DK#0. After the substituting process for DK#0 is completed, a copy instruction part 11 requests the DK common control part 2 to rewrite the data. Namely, the copy instruction part 11 writes the data read from DK#1 to DK#0, where the error occurred, through the DK common control part 2 and a DK#0 control part 3 to restore the data. Consequently, the maintenance work can be made efficient.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an error recovery method and apparatus for duplicated hard disk drives that automatically perform replacement processing of a defective storage area in which an error occurred during a data read or data write, together with recovery of the data in that area.

[0002]

[Prior Art] A conventional duplication control device for hard disk drives is described with reference to the drawings. FIG. 6 is a block diagram showing the conventional example, whose configuration is described in detail below.

[0003] The CPU 1, which is a processing unit, issues data read accesses and data write accesses to the data file control unit 16 as needed. On receiving a data read access or data write access, the data file control unit 16 reads data from or writes data to the hard disk drives 5 and 8 (hereinafter DK#0 and DK#1) provided in the data file units 17 and 18 (hereinafter data file unit #0 and data file unit #1).

[0004] In addition to DK#0 and DK#1, the data file units #0 and #1 are provided with data end indication units 19 and 20, respectively. These data end indication units 19 and 20 output signals indicating the data write state of the hard disk drives.

[0005] The data end analysis unit 21 detects the data write state from the signals received from the data end indication units 19 and 20 and, when it judges that a data write has not completed normally, activates the copy instruction unit 11. The copy instruction unit 11 reads the data from the hard disk drive on which the write completed normally and transfers it to the hard disk drive on which the write failed, thereby recovering the data.

[0006] Thus, the conventional duplication control device for hard disk drives checks whether a data write completed normally and, when it did not, transfers the data from the other hard disk drive and writes it again.

[0007]

[Problems to Be Solved by the Invention] However, when a physical fault occurs in a storage area (that is, a sector, a track, or the like), the same data error recurs even after a rewrite unless the faulty storage area is repaired. In the conventional duplication control device described above, therefore, when a data error still occurred after a rewrite, the hard disk drive in which the data error occurred was temporarily disconnected from the system and the defective sector was replaced manually (hereinafter, sector replacement). As a result, maintenance personnel had to be kept on standby at all times, which made maintenance work a burden, and the system ran on a single drive during sector replacement, which lowered system reliability. The present invention is intended to solve these problems, and its object is to provide an error recovery method and apparatus for duplicated hard disk drives that both simplify maintenance work when a defective storage area is replaced and improve system reliability.

[0008]

[Means for Solving the Problems] To achieve this object, in the error recovery method for duplicated hard disk drives according to the present invention, when a defective storage area is detected in a first hard disk drive, the defective storage area is replaced with a normal storage area within the first hard disk drive, and the data in the defective storage area is recovered by copying data from a second hard disk drive. The error recovery apparatus for duplicated hard disk drives according to the present invention comprises: a first DK control unit that controls the driving of the first hard disk drive, outputs an error notification signal when a defective storage area is detected in the first hard disk drive, and performs replacement processing of the defective storage area and recovery processing of the data in the defective storage area; a second DK control unit that controls the driving of the second hard disk drive, outputs an error notification signal when a defective storage area is detected in the second hard disk drive, and performs replacement processing of the defective storage area and recovery processing of the data in the defective storage area; and an error recovery start unit that, on receiving the error notification signal, starts the replacement processing of the defective storage area and the data recovery processing by the DK control units. The present invention can therefore perform replacement processing of a defective storage area and data recovery without disconnecting a hard disk drive from the system.

[0009]

[Embodiments of the Invention] An embodiment of the present invention is described next with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention, and its configuration is described below. In FIG. 1, reference numerals identical to those in FIG. 6 denote identical or equivalent parts.

[0010] The DK control units 3 and 6 (hereinafter DK#0 control unit and DK#1 control unit) are connected to DK#0 and DK#1, respectively, via general-purpose interfaces such as the SCSI interfaces 4 and 7 (hereinafter SCSI#0 and SCSI#1), and read data from and write data to DK#0 and DK#1 on instruction from the DK common control unit 2. When reading or writing data, the DK#0 and DK#1 control units also read predetermined recorded information from each sector and check the parity, CRC, and the like of the data read, thereby determining whether a defective storage area such as a physically damaged sector or track is present. When a defective sector or the like is detected, they output an error notification signal.
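
The per-sector check described above might look like the following minimal sketch. It assumes a hypothetical sector layout in which a 4-byte CRC-32 trailer follows the payload, and it uses Python's `zlib.crc32` as a stand-in for whatever check code the drive actually records; the names `seal_sector` and `sector_is_defective` and the 512-byte payload size are illustrative and do not come from the patent.

```python
import zlib

SECTOR_DATA_SIZE = 512  # assumed payload size; the patent does not specify one


def seal_sector(payload: bytes) -> bytes:
    """Append a CRC-32 trailer to the payload (hypothetical on-disk layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def sector_is_defective(raw: bytes) -> bool:
    """Return True when the stored CRC does not match the recomputed one."""
    payload, stored = raw[:-4], int.from_bytes(raw[-4:], "big")
    return zlib.crc32(payload) != stored


good = seal_sector(b"\x00" * SECTOR_DATA_SIZE)
damaged = bytes([good[0] ^ 0xFF]) + good[1:]  # corrupt the first payload byte
```

In this sketch a mismatch is what would trigger the error notification signal; a real controller would rely on the error status reported by the drive rather than recomputing the check itself.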

[0011] Besides controlling the DK#0 and DK#1 control units as described above, the DK common control unit 2 has means for storing the system information set for DK#0 and DK#1. That is, at any given time one of DK#0 and DK#1 is set as the master system and the other as the slave system. This system information is consulted, for example when data is read, to decide which hard disk drive to use.

[0012] On receiving an error notification signal from the DK#0 control unit or the DK#1 control unit, the error recovery start unit 9 outputs an error recovery signal so that the error recovery operation (that is, replacement processing of the defective sector, recovery of the data in the defective sector, and so on) is carried out.

[0013] On receiving the error recovery signal, the replacement processing instruction unit 10 outputs a replacement processing instruction signal to the DK common control unit 2 so that replacement processing of the defective sector is carried out. The copy instruction unit 11 recovers data by copying data from one hard disk drive to the other. That is, once the replacement processing instruction unit 10 has had the defective sector replaced, the copy instruction unit 11 reads the data that should have been stored in the defective sector from the other hard disk drive, transfers it to the hard disk drive on which the sector replacement was performed, and writes it there to recover the data.
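
The two-stage recovery described above (sector replacement, then copy from the other drive) can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Drive` class, its remap table, and the spare-sector pool are hypothetical constructs for illustration, and the patent does not say how the drive chooses the replacement sector.

```python
class Drive:
    """Toy drive: logical sectors map to physical ones; spares are kept in reserve."""

    def __init__(self, sectors: int, spares: int):
        self.remap = {}  # logical sector -> spare physical sector
        self.spare_pool = list(range(sectors, sectors + spares))
        self.media = {}  # physical sector -> data
        self.bad = set()  # physically damaged sectors

    def _physical(self, logical: int) -> int:
        return self.remap.get(logical, logical)

    def read(self, logical: int) -> bytes:
        phys = self._physical(logical)
        if phys in self.bad:
            raise IOError(f"read error at sector {logical}")
        return self.media[phys]

    def write(self, logical: int, data: bytes) -> None:
        phys = self._physical(logical)
        if phys in self.bad:
            raise IOError(f"write error at sector {logical}")
        self.media[phys] = data

    def replace_sector(self, logical: int) -> None:
        """Sector replacement: point the logical sector at a healthy spare."""
        self.remap[logical] = self.spare_pool.pop(0)


def recover(failed: Drive, healthy: Drive, logical: int) -> None:
    failed.replace_sector(logical)                # role of replacement processing instruction unit 10
    failed.write(logical, healthy.read(logical))  # role of copy instruction unit 11


dk0, dk1 = Drive(8, 2), Drive(8, 2)
for d in (dk0, dk1):
    d.write(3, b"payload")
dk0.bad.add(3)  # sector 3 of DK#0 becomes physically defective
recover(dk0, dk1, 3)
```

After `recover` runs, logical sector 3 of `dk0` is served from a spare and holds the data copied from `dk1`, so neither drive had to be taken out of service.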

[0014] The operation of the present invention configured as above is described in detail below, separately for data reads and data writes.

[0015] This embodiment is described as a control device with a mirroring function. That is, data is read by reading one hard disk drive, the master system, sector by sector (hereinafter, single-system read). Data is written by writing the same data, sector by sector, to both the master-system and slave-system hard disk drives (hereinafter, dual-system write).
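
As a rough illustration of the mirroring policy just described, the sketch below models single-system read and dual-system write over two in-memory sector stores. The class `MirroredPair` and its method names are hypothetical; only the policy itself (read from the master, write to both) is taken from the text.

```python
class MirroredPair:
    """Two sector stores; one is designated master, the other slave."""

    def __init__(self):
        self.drives = {"DK#0": {}, "DK#1": {}}
        self.master = "DK#0"  # system information held by the common control unit

    def write_both(self, sector: int, data: bytes) -> None:
        """Dual-system write: the same data goes to master and slave."""
        for store in self.drives.values():
            store[sector] = data

    def read_single(self, sector: int) -> bytes:
        """Single-system read: only the master is accessed."""
        return self.drives[self.master][sector]


pair = MirroredPair()
pair.write_both(5, b"abc")
from_dk0 = pair.read_single(5)
pair.master = "DK#1"  # switching the master changes which drive serves reads
from_dk1 = pair.read_single(5)
```

Because every write reaches both drives, a read returns the same data whichever drive is currently the master, which is what makes the master switch during recovery safe.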

[0016] First, the operating procedure for a data read is described. FIG. 2 is a flowchart showing the operating procedure when the CPU 1 requests a data read. In step 101, the CPU 1 issues a single-system read request to the DK common control unit 2.

[0017] In step 102, the DK common control unit 2 issues a read request to the DK#0 control unit, which has been set in advance as the master system. The DK#0 control unit performs a read operation on its own drive, DK#0.

[0018] In step 103, if a read error occurs in DK#0, the current master system, the read error is reported to the DK#0 control unit via SCSI#0 in accordance with the SCSI bus protocol. The DK#0 control unit then outputs an error notification signal to the error recovery start unit 9 to request that the error data recovery operation be started, and processing moves to step 104. If no read error is detected, processing moves to step 109, the data read from DK#0 is sent to the CPU 1, and the read operation ends.

[0019] In step 104, on receiving the error notification signal, the error recovery start unit 9 supplies an error recovery signal to the DK common control unit 2 and the replacement processing instruction unit 10 so that the error data is recovered. On receiving the error recovery signal, the DK common control unit 2 switches the master system from DK#0 to DK#1 and the slave system from DK#1 to DK#0.

[0020] In step 105, the DK#1 control unit, which has become the master system on instruction from the DK common control unit 2, reads the same sector from its own drive, DK#1. In step 106, if a read error also occurs in DK#1, now the master system, the error is reported to the CPU 1 as a double fault. If no error occurs, processing moves to step 108. In step 107, in parallel with the operation of step 105, the replacement processing instruction unit 10 outputs a replacement processing instruction signal. Based on this signal, the DK common control unit 2 requests the DK#0 control unit to perform replacement processing for the sector of DK#0 in which the error occurred.

[0021] In step 108, if no error was detected in DK#1, the copy instruction unit 11 writes the data read from DK#1 to DK#0, where the error occurred, via the DK common control unit 2 and the DK#0 control unit, thereby recovering the data. In step 109, the DK common control unit 2 sends the data read from DK#1 to the CPU 1, and the read operation ends.
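
Steps 101 to 109 can be traced with the following sketch. It is a simplified model, not the patented implementation: `Drive`, `CommonControl`, and `DoubleFault` are hypothetical names, sector replacement is reduced to clearing a "bad" flag, and step 107 runs sequentially before step 105 rather than in parallel with it.

```python
class DoubleFault(Exception):
    """Both drives failed on the same sector (reported to the CPU 1)."""


class Drive:
    def __init__(self, name):
        self.name, self.data, self.bad, self.remapped = name, {}, set(), set()

    def read(self, sector):
        if sector in self.bad:
            raise IOError(f"{self.name}: read error at sector {sector}")
        return self.data[sector]

    def write(self, sector, payload):
        if sector in self.bad:
            raise IOError(f"{self.name}: write error at sector {sector}")
        self.data[sector] = payload

    def replace_sector(self, sector):
        # Stand-in for sector replacement: the logical sector becomes usable again.
        self.bad.discard(sector)
        self.remapped.add(sector)


class CommonControl:
    def __init__(self, dk0, dk1):
        self.master, self.slave = dk0, dk1  # system information

    def read(self, sector):
        try:
            return self.master.read(sector)                    # steps 102, 103, 109
        except IOError:
            failed = self.master
            self.master, self.slave = self.slave, self.master  # step 104
        failed.replace_sector(sector)                          # step 107
        try:
            data = self.master.read(sector)                    # step 105
        except IOError as exc:
            raise DoubleFault(sector) from exc                 # step 106
        failed.write(sector, data)                             # step 108
        return data                                            # step 109


dk0, dk1 = Drive("DK#0"), Drive("DK#1")
for d in (dk0, dk1):
    d.write(7, b"record")
dk0.bad.add(7)  # simulate a physically damaged sector on the master
ctl = CommonControl(dk0, dk1)
result = ctl.read(7)
```

The caller still receives its data even though the master failed, and by the time `read` returns, DK#0 again holds a valid copy of the sector.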

[0022] Next, the procedure for a data write is described. FIG. 3 is a flowchart showing the operating procedure when the CPU 1 requests a data write. In step 201, the CPU 1 issues a dual-system write request to the DK common control unit 2.

[0023] In step 202, to write the same data to DK#0 and DK#1 of both systems, the DK common control unit 2 requests a write operation from both the DK#0 control unit and the DK#1 control unit.

[0024] In step 203, the DK#0 and DK#1 control units each monitor for write errors in DK#0 and DK#1, respectively. If an error is detected, for example in DK#0 of the master system, the error recovery start unit 9 is notified; if no error is detected, the write operation ends.

[0025] In step 204, on receiving the error notification signal, the error recovery start unit 9 outputs an error recovery signal to start error recovery processing. On receiving this error recovery signal, the replacement processing instruction unit 10 requests the DK common control unit 2 to perform replacement processing for the sector of DK#0 in which the error occurred.

[0026] In step 205, when the replacement processing for DK#0 is complete, the copy instruction unit 11 requests the DK common control unit 2 to rewrite the data. That is, the copy instruction unit 11 writes the data read from DK#1 to DK#0, where the error occurred, via the DK common control unit 2 and the DK#0 control unit, thereby recovering the data. In step 206, if a write error is detected again in DK#0, the error is reported to the CPU 1 as a double fault. If no error is detected, the write operation ends.
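
The write path of steps 201 to 206 might be sketched as below. The model is hypothetical: `Drive`, `write_both`, and `DoubleFault` are illustrative names, the spare count is an assumed stand-in for whether sector replacement can succeed, and treating a simultaneous failure of both drives as a double fault is an assumption, since the text only covers an error on one drive.

```python
class DoubleFault(Exception):
    """Recovery failed; in the patent this is reported to the CPU 1."""


class Drive:
    def __init__(self, name, spares=1):
        self.name, self.data, self.bad = name, {}, set()
        self.spares = spares  # assumed count of spare sectors

    def read(self, sector):
        return self.data[sector]

    def write(self, sector, payload):
        if sector in self.bad:
            raise IOError(f"{self.name}: write error at sector {sector}")
        self.data[sector] = payload

    def replace_sector(self, sector):
        # Stand-in for sector replacement: succeeds only while spares remain.
        if self.spares > 0:
            self.spares -= 1
            self.bad.discard(sector)


def write_both(dk0, dk1, sector, payload):
    failed = []
    for drive in (dk0, dk1):                           # step 202
        try:
            drive.write(sector, payload)
        except IOError:
            failed.append(drive)                       # step 203
    if len(failed) == 2:
        raise DoubleFault(sector)                      # assumption: not covered in the text
    for drive in failed:
        healthy = dk1 if drive is dk0 else dk0
        drive.replace_sector(sector)                   # step 204
        try:
            drive.write(sector, healthy.read(sector))  # step 205
        except IOError as exc:
            raise DoubleFault(sector) from exc         # step 206
    return [drive.name for drive in failed]


dk0, dk1 = Drive("DK#0"), Drive("DK#1")
dk0.bad.add(2)
recovered = write_both(dk0, dk1, 2, b"new")

worn0, worn1 = Drive("DK#0", spares=0), Drive("DK#1")
worn0.bad.add(4)
try:
    write_both(worn0, worn1, 4, b"x")
    double_fault = False
except DoubleFault:
    double_fault = True
```

The second scenario shows the step 206 outcome: when replacement cannot clear the fault, the rewrite fails again and the error is escalated instead of being retried indefinitely.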

[0027] FIG. 4 is a block diagram showing another embodiment of the present invention. This embodiment differs from that of FIG. 1 in that a disk controller is provided for each hard disk drive, and control signals and data are exchanged between the controllers via a cross-connect bus.

[0028] The configuration of the duplication control device for hard disk drives shown in FIG. 4 is described next. In FIG. 4, reference numerals identical or similar to those in FIG. 1 denote identical or equivalent parts. Disk controller #0 comprises the DK#0 control unit, SCSI#0, the error recovery start unit 9, the replacement processing instruction unit 10, the copy instruction unit 11, and so on. Disk controller #1 is configured in the same way as disk controller #0, and the two disk controllers are connected to each other via the cross-connect bus.

[0029] The host device 14 consists of a processing unit such as a CPU and is the source of operation requests such as read and write operations. The host device 14 is connected to disk controllers #0 and #1 via the IO control bus 15, controls the driving of disk controllers #0 and #1, and also manages system information such as which is the master system and which is the slave system.

[0030] Next, the operation of FIG. 4 is described. The basic operation is the same as in FIG. 1, except that no DK common control unit is provided. In the configuration of FIG. 4, the disk controllers are therefore controlled via the cross-connect bus provided between disk controllers #0 and #1.

[0031] FIG. 5 is a flowchart showing the operating procedure when the host device 14 issues a read request. In step 301, a CPU (not shown) in the host device 14 issues a single-system read request, via the IO control bus 15, to disk controller #0, which has been set in advance as the master system.

[0032] In step 302, the master-system disk controller #0 carries out a series of read operations in response to the request from the host device 14. In step 303, the DK#0 control unit performs a read operation on its own drive, DK#0.

[0033] In step 304, if a read error occurs in DK#0, the read error is reported to the DK#0 control unit via SCSI#0 in accordance with the SCSI bus protocol. If an error occurs, processing moves to step 305; if no error occurs, processing moves to step 312, the read data is sent to the host device 14, and the read operation ends.

[0034] In step 305, when a read error occurs, the DK#0 control unit outputs an error notification signal to the error recovery start unit 9 to request that the error data recovery operation be started. On receiving the error notification signal, the error recovery start unit 9 supplies an error recovery signal to the replacement processing instruction unit 10 of its own system and to the copy instruction unit 11a and the DK#1 control unit of the other system so that the error data is recovered. In step 306, the host device 14 switches the master system from DK#0 to DK#1 and the slave system from DK#1 to DK#0.

[0035] In step 307, the DK#1 control unit, which has become the master system on instruction from the host device 14, reads the same sector from its own drive, DK#1. In step 308, in parallel with step 307, the replacement processing instruction unit 10 outputs a replacement processing instruction signal. Based on this signal, the DK#0 control unit requests execution of replacement processing for the sector of DK#0 in which the error occurred. In step 309, if a read error also occurs in DK#1, now the master system, the error is reported to the host device 14 as a double fault.

[0036] In step 310, the copy instruction unit 11a transfers the read data obtained in step 307 to disk controller #0 via the cross-connect bus 22. In step 311, the copy instruction unit 11 requests the DK#0 control unit to copy the read data transferred over the cross-connect bus 22 to DK#0. The DK#0 control unit writes this read data to DK#0 to recover the data. In step 312, the DK#0 control unit sends the read data read from DK#1 to the CPU 1, and the read operation ends.
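
For the second embodiment, steps 301 to 312 can be sketched with two controller objects that exchange recovered data over a shared queue standing in for the bus between them. Everything here is an illustrative assumption: `DiskController`, `Host`, the use of `collections.deque` as the bus, and the sequential ordering of steps 307 and 308. Unlike step 312 in the text, the sketch returns the data directly from the new master for brevity.

```python
from collections import deque


class DiskController:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus  # bus: queue shared by both controllers
        self.sectors, self.bad = {}, set()

    def read(self, sector):              # steps 302 to 304, or step 307
        if sector in self.bad:
            raise IOError(f"{self.name}: read error at sector {sector}")
        return self.sectors[sector]

    def replace_sector(self, sector):    # step 308 (stand-in for sector replacement)
        self.bad.discard(sector)

    def send_copy(self, sector):         # step 310: healthy side puts data on the bus
        self.bus.append((sector, self.read(sector)))

    def receive_copy(self):              # step 311: failed side writes what arrives
        sector, data = self.bus.popleft()
        self.sectors[sector] = data


class Host:
    def __init__(self, master, slave):
        self.master, self.slave = master, slave  # system information kept by the host

    def read(self, sector):
        try:
            return self.master.read(sector)                    # steps 301 to 304, 312
        except IOError:
            self.master, self.slave = self.slave, self.master  # steps 305, 306
        self.slave.replace_sector(sector)                      # step 308
        data = self.master.read(sector)  # step 307; an IOError here is the double fault of step 309
        self.master.send_copy(sector)                          # step 310
        self.slave.receive_copy()                              # step 311
        return data                                            # step 312 (simplified)


bus = deque()
ctl0, ctl1 = DiskController("#0", bus), DiskController("#1", bus)
for c in (ctl0, ctl1):
    c.sectors[9] = b"block"
ctl0.bad.add(9)  # simulate a damaged sector on the master drive
host = Host(ctl0, ctl1)
value = host.read(9)
```

The design point the sketch tries to show is that no common control unit sits between the drives: the host only tracks which controller is master, and the recovery data travels controller to controller.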

[0037] Error recovery processing for DK#1 is carried out in the same way as described above. The write operation is performed in the same way as for the hard disk drives of FIG. 1. That is, when a DK control unit detects a write error during a dual-system write, the replacement processing instruction unit requests sector replacement, and once the sector replacement is complete, the write is carried out by copying the data from the other hard disk drive.

[0038]

[Effects of the Invention] As described above, when a defective storage area is detected in duplicated hard disk drives, the present invention can perform replacement processing for that area and recovery processing for the data in it. The replacement processing can therefore be performed without disconnecting the affected hard disk drive from the system, that is, while the drives remain duplicated, so system reliability is not reduced. In addition, because the replacement processing is performed without manual intervention, maintenance work can be made more efficient.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention.

FIG. 2 is a flowchart showing the operating procedure when the CPU 1 requests a data read.

FIG. 3 is a flowchart showing the operating procedure when the CPU 1 requests a data write.

FIG. 4 is a block diagram showing another embodiment of the present invention.

FIG. 5 is a flowchart showing the operating procedure when the host device 14 in FIG. 4 requests a data read.

FIG. 6 is a block diagram showing a conventional example.

[Explanation of symbols]

1: CPU, 2: DK common control unit, 3: DK#0 control unit, 4: SCSI#0, 5: DK#0, 6: DK#1 control unit, 7: SCSI#1, 8: DK#1, 9: error recovery start unit, 10: replacement processing instruction unit, 11: copy instruction unit.

Claims

[Claims]

1. An error recovery method for duplicated hard disk drives comprising a first hard disk drive and a second hard disk drive, wherein, when a defective storage area is detected in the first hard disk drive, the defective storage area is replaced with a normal storage area in the first hard disk drive, and the data in the defective storage area is recovered by copying data from the second hard disk drive.

2. The error recovery method for duplicated hard disk drives according to claim 1, wherein, when a defective storage area is detected in the first hard disk drive of the master system at the time of writing data, the first hard disk drive is switched to the slave system and the second hard disk drive is switched to the master system, replacement processing of the defective storage area is performed, and data is read from the second hard disk drive of the master system and written to the first hard disk drive of the slave system to recover the data in the defective storage area.

3. The error recovery method for duplicated hard disk drives according to claim 1, wherein, when a defective storage area is detected in the first or second hard disk drive at the time of reading data, replacement processing of the defective storage area is performed, and data is read from the hard disk drive to which data could be written normally and written to the hard disk drive in which the fault occurred to recover the data in the defective storage area.

4. An error recovery apparatus for duplicated hard disk drives comprising a first hard disk drive and a second hard disk drive, the apparatus comprising: a first DK control unit that controls the driving of the first hard disk drive, outputs an error notification signal when a defective storage area is detected in the first hard disk drive, and performs replacement processing of the defective storage area and recovery processing of the data in the defective storage area; a second DK control unit that controls the driving of the second hard disk drive, outputs an error notification signal when a defective storage area is detected in the second hard disk drive, and performs replacement processing of the defective storage area and recovery processing of the data in the defective storage area; and an error recovery start unit that, on receiving the error notification signal, starts the replacement processing of the defective storage area and the data recovery processing by the DK control units.

5. The error recovery apparatus for duplicated hard disk drives according to claim 4, further comprising a DK common control unit that controls the driving of the first and second DK control units and manages the system information of each of the hard disk drives.

6. The error recovery apparatus for duplicated hard disk drives according to claim 4, further comprising a cross-connect bus connecting the first and second DK control units, wherein data read from the normal hard disk drive is transferred via the cross-connect bus to the hard disk drive on which replacement processing of the defective storage area was performed, thereby recovering the data.