JP2006120042A - Disk array device

Info

Publication number: JP2006120042A
Application number: JP2004309218A
Authority: JP
Inventors: Ryuichi Aoki; 隆一青木
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-10-25
Filing date: 2004-10-25
Publication date: 2006-05-11
Anticipated expiration: 2024-10-25
Also published as: JP4609034B2

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of automatic restoration after a read error in a RAID 1 disk array. SOLUTION: A block, for example the read/write unit of the hard disk itself, is used as the unit of automatic restoration in RAID 1. When an error occurs in reading from one hard disk A of the disk array (S14), a RAID 1 control part reads the data of the target block from the other hard disk B (S16) and writes the read data back to hard disk A (S20). When the read error is detected, the block reallocation mechanism of hard disk A itself reallocates a block in a spare area in place of the block where the read error occurred, so the rewrite of S20 is performed on the reallocated block. Using the block as the unit of automatic restoration greatly reduces the rate of errors during restoration compared with the conventional practice in which the entire disk or a partition is used as the unit. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to disk arrays, and more particularly to recovery control for a RAID 1 disk array.

A disk array device connects a plurality of hard disks in parallel and operates them as a single disk device, thereby achieving faster data reads and writes, improved fault tolerance, or both. RAID (redundant array of independent disks) levels 0 to 5, and combinations of them, are known as control levels for such disk array devices.

Among these, RAID 1 uses two hard disks and records the same data on both, thereby providing the host system that uses the data recording mechanism with a mechanism far more reliable than a single hard disk. If the failure rate of one hard disk is λ, the failure rate of RAID 1 is the probability that both hard disks fail at the same time, which is 2λ^2. This is because the probability that either hard disk fails is 2λ, the probability that the remaining disk fails once one has failed is λ, and the probability that both fail at the same time is the product of the two. For example, when λ is 1% (0.01), the failure rate of RAID 1 is 0.02%.
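As a quick numerical check of the 2λ^2 estimate above, the following Python sketch multiplies the two probabilities named in the text. The function name raid1_failure_rate is illustrative and does not appear in the original.

```python
def raid1_failure_rate(lam: float) -> float:
    """Estimate from the text: both mirrored disks fail, 2 * lam * lam."""
    either_fails = 2 * lam   # probability that one of the two disks fails
    other_fails = lam        # probability that the remaining disk then fails
    return either_fails * other_fails


# Example from the text: lam = 1% (0.01)
rate = raid1_failure_rate(0.01)
print(f"{rate:.2%}")
```

With λ = 0.01 the product is 0.0002, which matches the 0.02% quoted in the text.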

However, for RAID 1 to achieve this reliability, the redundant state in which both disks operate normally must always be maintained. When one hard disk has failed and stopped (this state is called the degraded, or degenerate, state), the failure rate of RAID 1 worsens to λ, which in terms of reliability is no different from a single hard disk. To maintain the high reliability of RAID 1, the redundant state must therefore be restored as quickly as possible even when an error occurs on one of the hard disks.

Hard disk failures are detected both when writing and when reading. Errors are either (a) caused by a failure affecting the hard disk as a whole, or (b) caused by a failure in part of the recording medium. When an error of type (a) occurs, whether on a write or a read, the failure cannot be recovered, so the hard disk on which the error was detected is disconnected. With RAID 1 this results in the degraded state, but that is unavoidable because the whole disk has failed. In this case the only way to restore redundancy is to replace the failed hard disk with a new one and copy the data from the healthy disk.

An error of type (b), if it occurs on a write, is repaired automatically by the hard disk's own reassignment mechanism in the case of a hard disk that performs verify processing. However, when verify processing is not performed, as with IDE (Integrated Drive Electronics) hard disk drives, no repair is made at write time, and the error is detected at read time.

For a read error of type (b), even if an error occurs when reading from one hard disk, the read request from the host system can be satisfied by reading from the other hard disk. The hard disk itself is in error, however, so some countermeasure is required. In conventional RAID 1 disk array control, when such a partial failure occurs while reading from one hard disk, redundancy is restored by copying the entire data of the other hard disk to the hard disk on which the error was detected. Some software-RAID RAID 1 disk array systems are also known that restore redundancy by copying data from the healthy hard disk in units of the partition in which the error was detected, rather than the entire hard disk.

Patent Document 1 describes recovery copying between disks in a redundant hard disk system. In the scheme of that document, when data is copied during recovery from the healthy disk to the disk whose fault has been repaired, copying is forcibly continued even if an error occurs in reading from the copy source. At least the data for which no error occurs is thereby recovered, making it possible to attempt to keep the overall system running.

Patent Document 2 describes a control scheme for data reconstruction (rebuild) after a failed disk has been replaced in a disk array device. In this scheme, when a read error occurs during data reconstruction, the error is ignored if it lies in an unused area, which reduces the effective error rate of the recovery process.

Patent Document 3 describes a scheme for supporting recovery from a secondary failure during data recovery processing after disk replacement in a disk array. This scheme assists the failure recovery engineer in recovering from a situation in which a read error occurred during recovery processing.

Patent Document 4 discloses a scheme in which, when a failure occurs during a data read, the data is read from the other, healthy hard disk and written again to the hard disk on which the failure occurred, thereby restoring redundancy, and in which the rewrite avoids areas where errors occurred in the past and writes to a new area. The document discloses that, to recover a disk on which an error has occurred in the case of RAID 3 to 5, the data in the block (sector) areas of the other disks and the parity disk that correspond to the block (sector) area in error is read and rewritten to the disk in error. For RAID 1 it states: "In the case of RAID 1, the data read at this time is the same data, read from another disk device 14 on which exactly the same data as on the disk device 14 in error is recorded."

Patent Document 1: JP 05-242593 A
Patent Document 2: JP 08-185274 A
Patent Document 3: JP 09-305326 A
Patent Document 4: JP 2001-100948 A

The scheme of Patent Document 1 merely ignores the error deliberately and attempts to keep the overall system running. For a portion where a read error occurred on the healthy disk, recovery finishes without correct data having been written to the disk being recovered. If that portion is later read from the recovered disk, the read is made from a location holding no correct data, so invalid data is read and returned to the host system, with entirely unpredictable results. This is a very dangerous method.

The scheme of Patent Document 2 reduces the error rate by relying on low hard disk utilization, but merely ignoring errors in unused areas reduces the rate to a fraction of its original value at best.

The scheme of Patent Document 3 does not attempt automatic recovery, so it cannot avoid degraded operation, and the document does not show how far it can shorten the period of degraded operation, which remains unknown.

The scheme of Patent Document 4 is applicable to RAID levels that control small recording units, such as RAID 3 to 5. However, in a scheme such as RAID 1 that manages data recovery only in large units such as the entire disk or a partition, securing a new area that avoids areas where errors occurred in the past would mean providing, within the same disk, an area as large as the entire disk or partition. That is not cost-effective, so the scheme is practically unusable.

Furthermore, in conventional RAID 1 data recovery, data for the entire disk, or for the partition corresponding to the partition in error, is read from the healthy hard disk and copied to the disk in error. If a read error occurs during this process, both disks are in error and recovery by automatic processing becomes difficult. A hard disk is unlikely to produce a read error on a file-sized read, but for a read as large as an entire disk or partition the possibility of a read error cannot be ignored.

To explain this in detail: if the unrecoverable error rate when reading one bit of a hard disk is μ and the recovery unit is S [bit], the probability that an error occurs during this recovery process is S·μ. For example, if μ is 10^-13 and the recovery unit S is an entire hard disk of, say, 100 [GByte] (= 100 × 10^9 × 8 [bit]), the error rate when reading the whole of the healthy hard disk is 0.08, that is, 8%. This probability is far too high to ignore.
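The S·μ estimate above can be reproduced in a few lines of Python. This is a minimal sketch using the decimal units of the text (1 GByte = 10^9 bytes); the function and parameter names are illustrative, not taken from the original.

```python
def recovery_read_error_probability(unit_bytes: float, mu: float = 1e-13) -> float:
    """S * mu estimate from the text, with S expressed in bits."""
    unit_bits = unit_bytes * 8   # S [bit]
    return unit_bits * mu


# Whole-disk recovery of a 100 GByte drive, as in the example
whole_disk = recovery_read_error_probability(100e9)
print(f"{whole_disk:.2f}")
```

For the 100 GByte example this gives 8 × 10^11 bits times 10^-13, which is the 0.08 (8%) stated in the text.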

Hard disk capacities have grown steadily in recent years, and several hundred GByte has become commonplace. Hard disk partition sizes have grown accordingly. In schemes that recover in units of the entire disk or a partition, the recovery unit is therefore typically about 10 GByte at the small end and over 300 GByte at the large end. When such a large unit is read continuously for recovery, a read error occurs with a non-negligible probability, as described above, even when reading from a healthy hard disk. If such a read error occurs, both hard disks become unusable. Manual recovery, logically comparing the contents of the hard disks, is then required. Complete recovery is not guaranteed in that case, and the time the recovery work will take is difficult to estimate.

RAID 1 has the simplest configuration of the various RAID levels and can be implemented in software, which makes it a promising way to provide a low-cost, highly reliable hard disk system. As described above, however, conventional RAID 1 systems have provided no effective recovery method for a partial read error on a hard disk.

The present invention provides an effective recovery method for the case where a partial read error occurs on one of the hard disks in a RAID 1 disk array device.

The present invention provides a RAID 1 disk array device comprising a plurality of hard disks each having a reassignment mechanism, and a RAID 1 control unit that controls read and write operations on those hard disks, wherein the RAID 1 control unit manages each hard disk by dividing it into areas, one per recovery unit, the recovery unit being set to a size smaller than a partition of the hard disk, and, when a read error occurs in reading from one of the plurality of hard disks, reads the data of the recovery-unit area in which the read error occurred from another one of the plurality of hard disks and writes the read data to the hard disk on which the read error occurred, thereby restoring the redundant configuration.

Here, the recovery-unit area is preferably a block, which is the read/write unit of the hard disk.

Preferably, the RAID 1 control unit also comprises a correspondence management table in which the correspondence between each recovery-unit area and a group of blocks, the block being the read/write unit of the hard disk, is registered. The control unit obtains from the correspondence management table the block group of the recovery-unit area to which the block with the read error belongs, reads the data of that block group as the data of the recovery-unit area, and writes it to the hard disk on which the read error occurred.

The best mode for carrying out the present invention (hereinafter the "embodiment") is described below with reference to the drawings.

The embodiment provides a mechanism that makes redundancy recovery more efficient when a read error (a failure of type (b) above) occurs in a RAID 1 disk array device.

In RAID 1, the unit over which all recorded data is regarded as valid is set smaller than the entire disk or a partition, which reduces the unit that must be recovered when an error is detected. For example, if the unrecoverable error rate when reading one bit of a hard disk is 10^-13, as in the conventional example above, and the recovery unit is 1 megabyte [MByte], the error rate when reading that recovery unit falls to 0.0000008, that is, 0.00008%. Compared with reading a partition or an entire disk of ten to several hundred gigabytes for recovery, the error rate is four to six orders of magnitude lower. If the recovery unit is a block, with a size of 4 to 16 [KByte], the error rate can be lowered by roughly a further two orders of magnitude.
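To make the comparison between recovery-unit sizes concrete, the sketch below evaluates the same S·μ estimate for three unit sizes mentioned in the text and reports how many orders of magnitude each lies below the whole-disk case. The dictionary of sizes and the helper name are illustrative assumptions; decimal units (1 MByte = 10^6 bytes) are used, as in the text.

```python
import math

MU = 1e-13  # unrecoverable error rate per bit read, as assumed in the text

# Recovery-unit sizes in bytes (illustrative selection)
UNITS = {
    "whole disk, 100 GByte": 100e9,
    "1 MByte unit": 1e6,
    "4 KByte block": 4e3,
}


def error_probability(unit_bytes: float, mu: float = MU) -> float:
    """S * mu estimate, with S in bits."""
    return unit_bytes * 8 * mu


baseline = error_probability(UNITS["whole disk, 100 GByte"])
for name, size in UNITS.items():
    p = error_probability(size)
    orders = math.log10(baseline / p)
    print(f"{name:24s} p = {p:.1e}  ({orders:.1f} orders below whole disk)")
```

The 1 MByte case evaluates to 8 × 10^-7, the 0.0000008 given in the text, five orders of magnitude below the 100 GByte whole-disk case.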

The recovery unit may of course be made freely configurable by the user (the operator of the RAID 1 system), but control is simplified further if the recovery unit is the read/write unit of the hard disk itself, such as a block or sector.

The embodiment also makes use of several functions of current hard disk devices.

For example, current hard disk devices have a function that checks, during a write, that the rotational speed of the disk, its derivative, and the head position are within the range in which the write can be performed properly. Under the IDE standard, for example, a write is judged to have succeeded if the rotational speed and so on are within a fixed range during the write. This function ensures that data is written correctly to the intended area and does not spill over into areas that must not be written. The SCSI standard additionally specifies verify processing, in which written data is read back immediately and the write is confirmed by the absence of a read error. Verify processing is costly, however, so in view of how rarely such errors occur, it is not implemented in IDE hard disks.

Data written to a hard disk has an error correction code called a CRC (cyclic redundancy code) appended. Even if the data was not written properly, minor errors can be corrected, and errors beyond the correctable range can be identified as errors at read time.

In current hard disk devices, when a block produces a read error, the block is judged to have a media error (an unrepairable error caused by a fault in a specific area of the disk surface), and another block in a spare area provided on the disk is reassigned. The reassigned block is mapped to the block number of the original block. From the host system's point of view, writes to that block are therefore still permitted, but inside the hard disk the write goes to the block reassigned in its place. This function is called the reassign mechanism or the replacement mechanism.
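The reassign mechanism described above can be pictured with a toy in-memory model. The following Python sketch is only an illustration under simplifying assumptions: the class ReassigningDisk, its fields, and the way a media error is injected are all hypothetical and do not correspond to any real drive firmware or interface.

```python
class ReadError(Exception):
    """Raised when a block cannot be read."""


class ReassigningDisk:
    """Toy model of a drive with a spare area and a remapping table."""

    def __init__(self, spare_slots: int):
        self.normal = {}                      # block number -> data in the normal area
        self.spare = {}                       # spare slot -> data
        self.remap = {}                       # block number -> spare slot
        self.free_slots = list(range(spare_slots))
        self.media_errors = set()             # block numbers with a damaged medium

    def write(self, block: int, data: bytes) -> None:
        # The host always addresses the original block number; a remapped
        # block is silently redirected to its spare slot.
        if block in self.remap:
            self.spare[self.remap[block]] = data
        else:
            self.normal[block] = data

    def read(self, block: int) -> bytes:
        if block in self.remap:
            slot = self.remap[block]
            if slot not in self.spare:        # reassigned but not yet rewritten
                raise ReadError(block)
            return self.spare[slot]
        if block in self.media_errors:
            # Read error: reassign a spare block to this block number.
            # (Running out of spare slots is not handled in this sketch.)
            self.remap[block] = self.free_slots.pop(0)
            raise ReadError(block)
        return self.normal[block]


disk = ReassigningDisk(spare_slots=4)
disk.write(10, b"original")
disk.media_errors.add(10)                     # simulate damage to block 10
try:
    disk.read(10)
except ReadError:
    disk.write(10, b"restored")               # lands in the reassigned spare block
print(disk.read(10), disk.remap)
```

The host-side code never sees the spare slot number, which is the property the recovery scheme relies on: rewriting the same block number is enough.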

The embodiment uses these functions, which current hard disk devices generally have, to achieve efficient data recovery after a read error in RAID 1. The recovery process is described below.

First, the RAID 1 system configuration is described with reference to FIG. 1. In FIG. 1, the RAID 1 disk array device (a storage subsystem from the point of view of the overall computer system including the host system 10) consists of a RAID 1 control unit 20, a hard disk 30A, and a hard disk 30B (hereinafter "hard disk A" and "hard disk B"). The RAID 1 control unit 20 controls data reads and writes on hard disks A and B. The host system 10, such as a personal computer or a program at a level above the storage device driver running under an operating system, is connected to the RAID 1 control unit 20 and sends it read and write requests just as it would to a single hard disk device. On receiving a request from the host system 10, the RAID 1 control unit 20 writes the requested data to both hard disks A and B for a write request, and reads the requested data from one of hard disks A and B for a read request (which one may be set in the RAID 1 control unit 20 in advance by the system administrator or determined automatically by the RAID 1 control unit 20).

The control operation of the RAID 1 control unit 20 is described below. In the following example, the recovery unit on a read error is the block (also called a sector), which is the read/write unit of the hard disk itself. Because the block is the recovery unit, hard disks A and B are assumed to have the same capacity and the same block size.

In write processing, when the RAID 1 control unit 20 receives a write request from the host system 10, it issues a write request for the data to both hard disks A and B. Each of hard disks A and B responds with either write success or an error. If both hard disks A and B respond with an error, the RAID 1 control unit 20 notifies the host system 10 that the storage subsystem is stopping and then stops the storage subsystem. If only one of hard disks A and B responds with an error and the other reports write success, the control unit disconnects the hard disk that returned the error from the storage subsystem and shifts to degraded operation. It then notifies the host system 10 that it has shifted to degraded operation.
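The write-time decisions described above can be sketched as follows. This is a hedged illustration, not the patented implementation: ToyDisk, WriteOutcome, handle_write, and the notify callback are hypothetical names, and a disk's response is reduced to a boolean.

```python
from enum import Enum


class WriteOutcome(Enum):
    OK = "both writes succeeded"
    DEGRADED = "one disk detached, degraded operation"
    STOPPED = "storage subsystem stopped"


class ToyDisk:
    """Hypothetical stand-in for a drive; write() returns True on success."""

    def __init__(self, name: str, fail_writes: bool = False):
        self.name = name
        self.fail_writes = fail_writes
        self.blocks = {}

    def write(self, block: int, data: bytes) -> bool:
        if self.fail_writes:
            return False
        self.blocks[block] = data
        return True


def handle_write(disks: list, block: int, data: bytes, notify) -> WriteOutcome:
    """Issue the write to every attached disk and react to the responses."""
    results = [(disk, disk.write(block, data)) for disk in disks]
    failed = [disk for disk, ok in results if not ok]
    if not failed:
        return WriteOutcome.OK
    if len(failed) == len(disks):
        notify("storage subsystem stopping")   # notify the host first, then stop
        disks.clear()
        return WriteOutcome.STOPPED
    for disk in failed:
        disks.remove(disk)                     # detach the disk that reported an error
    notify("shifted to degraded operation")
    return WriteOutcome.DEGRADED


messages = []
disk_a, disk_b = ToyDisk("A"), ToyDisk("B", fail_writes=True)
array = [disk_a, disk_b]
outcome = handle_write(array, 7, b"data", messages.append)
print(outcome, [d.name for d in array], messages)
```

In this run disk B rejects the write, so it is detached and the array continues on disk A alone.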

This is the control performed on writes, which is basically the same as conventional RAID 1 control. Next, the control operation of the RAID 1 control unit 20 on reads is described with reference to FIG. 2.

In this procedure, the RAID 1 control unit 20 waits for a read request from the host system 10 (S10). A read request from the host system 10 specifies the number of the block to be read. On receiving such a read request, the RAID 1 control unit 20 reads the data of the specified block from the hard disk selected as the read target (hard disk A in the illustrated example) (S12). If no error occurs in this read (the result of S14 is negative (N)), the RAID 1 control unit 20 returns the read data to the host system 10 (S24).

If, on the other hand, hard disk A reports a read error during the read of S12 (the result of S14 is affirmative (Y)), the RAID 1 control unit 20 requests hard disk B to read the same block it tried to read from hard disk A (S16). If the data is read correctly from hard disk B in response (the result of S18 is negative (N)), the control unit rewrites that data to the same block of hard disk A (S20). The controller inside hard disk A reassigns the block in which the read error occurred at the time it detects the read error. In the rewrite of S20, therefore, the control unit only has to issue to hard disk A a write request for the block in which the read error was detected in S12 and S14, and hard disk A automatically writes the data to the reassigned block mapped to that block. If hard disk A reports no write error for this rewrite (the result of S22 is negative (N)), the RAID 1 control unit 20 returns the data read in S16 to the host system 10 (S24).

If a read error occurs in reading from hard disk B in S16 (the result of S18 is Y), the RAID 1 control unit 20 notifies the host system 10 that the storage subsystem is stopping (S26) and stops the storage subsystem (S28).

If a write error occurs during the redundancy recovery (rewrite) of hard disk A in S20, the RAID 1 control unit 20 disconnects hard disk A from the storage subsystem, shifts to degraded operation on hard disk B alone (S30), and notifies the host system 10 that it has shifted to degraded operation (S32).
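The read procedure of steps S12 to S32 can be summarized as a short Python sketch. It rests on several assumptions that are not in the original: ToyDisk, DiskError, handle_read, and notify are hypothetical names; the toy disk clears a read fault on rewrite instead of modelling the spare area; and returning the data to the host after a failed rewrite is assumed, since the text does not state what is returned in that branch.

```python
class DiskError(Exception):
    """Read or write error reported by a disk."""


class ToyDisk:
    """Hypothetical drive model; a rewrite makes a faulty block readable again."""

    def __init__(self, blocks, unreadable=(), unwritable=False):
        self.blocks = dict(blocks)
        self.unreadable = set(unreadable)
        self.unwritable = unwritable

    def read(self, block):
        if block in self.unreadable:
            raise DiskError(f"read error at block {block}")
        return self.blocks[block]

    def write(self, block, data):
        if self.unwritable:
            raise DiskError(f"write error at block {block}")
        self.blocks[block] = data
        self.unreadable.discard(block)   # stands in for the drive's reassignment


def handle_read(primary, mirror, block, notify):
    """Return (data, primary_detached) following the flow S12 to S32."""
    try:
        return primary.read(block), False             # S12, S14 = N, S24
    except DiskError:
        pass                                          # S14 = Y
    try:
        data = mirror.read(block)                     # S16
    except DiskError:
        notify("storage subsystem stopping")          # S18 = Y, S26
        raise                                         # S28: give up
    try:
        primary.write(block, data)                    # S20: rewrite the same block
    except DiskError:
        notify("shifted to degraded operation")       # S30, S32
        return data, True                             # assumption: data still returned
    return data, False                                # S22 = N, S24


messages = []
disk_a = ToyDisk({5: b"stale"}, unreadable={5})
disk_b = ToyDisk({5: b"good"})
data, detached = handle_read(disk_a, disk_b, 5, messages.append)
print(data, detached, disk_a.read(5))
```

After the call, block 5 is readable on disk A again, so the mirror is back in its redundant state without copying anything beyond the one block.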

An example of the control operation of the RAID 1 control unit 20 on reads has been described above. In this example, the data read from hard disk B in S16 is not returned to the host system 10 immediately. It is first rewritten to hard disk A in S20 to restore redundancy, and only then returned to the host system 10 in S24. This order is only an example. Conversely, the system would also work if the data were returned to the host system 10 as soon as it was read from hard disk B in S16 and rewritten to hard disk A afterwards. In that case, however, depending on timing, the next request from the host system 10 could arrive before redundancy recovery of hard disk A is complete and cause another read error. The procedure of FIG. 2 is preferable because it avoids this problem.

In the above, the block, which is the read/write unit of the hard disk, was used as the recovery unit on a read error, but a larger size can of course be used as the recovery unit. Making the recovery unit smaller than the hard disk or a typical partition lowers the read error rate during recovery, as explained above. A smaller recovery unit also shortens the read and rewrite at recovery, reducing the overall time recovery takes. The storage subsystem has no redundancy until recovery is complete, so shortening this time greatly improves reliability as a data recording mechanism as seen from the host system. Setting the recovery unit to a size several orders of magnitude smaller than the hard disk or partition, such as 1 megabyte, greatly reduces both the read error rate during recovery and the time recovery takes.

In the preceding example, in which the block (the read/write unit of the hard disk) is the recovery unit, the RAID1 control unit 20 only has to recover a block when a read error occurs on a block that the host system 10 has asked to read, so no special hard disk area management is needed for recovery. To recover in units larger than a block, on the other hand, the RAID1 control unit 20 is given a correspondence management table, such as the one shown in FIG. 3, that manages the association between recovery units and blocks. As illustrated, the table registers, for each recovery unit number, the numbers of the blocks belonging to that recovery unit. In the illustrated example each recovery unit is a group of consecutive blocks, so the corresponding-block entry gives the block numbers of the first and last blocks. Even if an error occurs in a block within a recovery unit, that block is reallocated by the hard disk's own internal control, so the correspondence management table itself need not be modified.
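
One possible in-memory shape for such a table is a list indexed by recovery unit number, each entry holding the first and last block numbers. The Python sketch below is an assumption for illustration only: the block numbers are made up rather than taken from FIG. 3, and `recovery_unit_of` is a hypothetical helper. Because the units are consecutive and sorted, a binary search over the first-block numbers finds the unit containing a given block.

```python
from bisect import bisect_right

# Hypothetical table: index = recovery unit number,
# value = (first block number, last block number), both inclusive.
TABLE = [(0, 2047), (2048, 4095), (4096, 6143)]
FIRST_BLOCKS = [first for first, _ in TABLE]


def recovery_unit_of(block_no):
    """Return the number of the recovery unit that contains block_no."""
    unit = bisect_right(FIRST_BLOCKS, block_no) - 1
    if unit < 0 or block_no > TABLE[unit][1]:
        raise ValueError(f"block {block_no} is not covered by the table")
    return unit
```

The table stores only logical block numbers, so a reallocation performed inside the drive does not change any entry.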

In a configuration with such a correspondence management table, when the RAID1 control unit 20 receives notification of a read error from a hard disk, it looks up, in a correspondence management table like the one illustrated in FIG. 3, the number of the recovery unit to which the block being read belongs. It then obtains from the same table the group of blocks belonging to that recovery unit, reads the data of those blocks from the other hard disk, and writes it to the hard disk on which the read error occurred. This control makes it possible to restore redundancy when a read error occurs.
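
The recovery procedure just described can be sketched as below. This is a simplified illustration under stated assumptions: each disk is modeled as a Python dict mapping block numbers to data, an unreadable block is represented by `None`, and `recover_unit` is a hypothetical name. A real controller would issue reads and writes to the drives instead.

```python
def recover_unit(table, failed_block, good_disk, failed_disk):
    """Copy the recovery unit containing failed_block from the healthy
    disk to the disk that reported the read error.

    table is a list of (first_block, last_block) pairs, inclusive,
    indexed by recovery unit number. Returns the recovered unit number.
    """
    # Find the recovery unit to which the failed block belongs.
    for unit_no, (first, last) in enumerate(table):
        if first <= failed_block <= last:
            break
    else:
        raise ValueError(f"block {failed_block} is not in the table")

    # Read every block of that unit from the healthy disk and rewrite it.
    for block_no in range(first, last + 1):
        failed_disk[block_no] = good_disk[block_no]
    return unit_no


table = [(0, 3), (4, 7)]
good_disk = {n: bytes([n]) * 4 for n in range(8)}
failed_disk = dict(good_disk)
failed_disk[5] = None                      # block 5 is unreadable
recovered_unit = recover_unit(table, 5, good_disk, failed_disk)
```

Only the blocks of the affected unit (blocks 4 to 7 here) are copied; the rest of the disk is left untouched.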

The size of the recovery unit can also be made configurable by the system administrator, for example through a parameter-setting program for the RAID1 control unit 20. In this case, when the administrator specifies the size of the recovery unit, for example in bytes or in number of blocks, the program calculates the first and last block numbers of each recovery unit from that size and registers them in the correspondence management table.
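
A parameter-setting program of this kind might compute the table entries roughly as follows. The sketch is hypothetical: `build_table` and its parameters are invented for illustration, the 512-byte default block size is an assumption, and the choices to reject byte sizes that are not a multiple of the block size and to let the last unit be shorter than the others are design assumptions the source does not specify.

```python
def build_table(total_blocks, unit_size, block_size=512, in_bytes=False):
    """Return [(first_block, last_block), ...] covering blocks
    0 .. total_blocks - 1, one inclusive pair per recovery unit.

    unit_size is a block count, or a byte count when in_bytes is True.
    """
    if in_bytes:
        if unit_size % block_size:
            raise ValueError("unit size must be a multiple of the block size")
        unit_blocks = unit_size // block_size
    else:
        unit_blocks = unit_size
    if unit_blocks <= 0:
        raise ValueError("recovery unit must contain at least one block")
    return [
        (first, min(first + unit_blocks, total_blocks) - 1)
        for first in range(0, total_blocks, unit_blocks)
    ]
```

For example, `build_table(10, 4)` yields three units, the last of which covers only the two remaining blocks.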

The RAID1 control unit 20 described above can be implemented as a hardware RAID controller, as a RAID control program, or as a hybrid of hardware and software.

As described above, according to this embodiment, a RAID 1 disk array device can automatically recover to a redundant state when a read error occurs in part of a hard disk. Moreover, because the unit of automatic recovery is much smaller than in conventional RAID 1 systems, the probability of a read error from the healthy hard disk during automatic recovery is greatly reduced, and the time required for automatic recovery is greatly shortened as well. The method of this embodiment is most effective in disk array devices that use IDE disk drives, because IDE disk drives do not verify written data and are therefore more likely to encounter errors when reading.

A diagram showing the system configuration of RAID 1.
A flowchart showing an example of the control operation of the RAID1 control unit during reading.
A diagram showing an example of the correspondence management table between recovery units and hard disk blocks.

Explanation of symbols

10 host system; 20 RAID1 control unit; 30A, 30B hard disks.

Claims

1. A RAID 1 disk array device comprising a plurality of hard disks each having a reallocation mechanism, and a RAID1 control unit that controls read/write operations on those hard disks,
wherein the RAID1 control unit manages each hard disk by dividing it into areas of a recovery unit set to a size smaller than a partition of the hard disk and, when a read error occurs in reading from one of the plurality of hard disks, restores the redundant configuration by reading the data of the recovery-unit area in which the read error occurred from another one of the plurality of hard disks and writing the read data to the hard disk in which the read error occurred.

2. The disk array device according to claim 1, wherein the recovery-unit area is a block, which is the read/write unit of the hard disk.

3. The disk array device according to claim 1, wherein the RAID1 control unit includes a correspondence management table in which the correspondence between each recovery-unit area and a group of blocks, a block being the read/write unit of the hard disk, is registered, obtains from the correspondence management table the block group of the recovery-unit area to which the block in which the read error occurred belongs, reads the data of that block group as the data of the recovery-unit area, and writes it to the hard disk in which the read error occurred.

4. A disk array control method for controlling read/write operations in RAID 1 on a plurality of hard disks each having a reallocation mechanism, the method comprising:
managing each hard disk by dividing it into areas of a recovery unit set to a size smaller than a partition of the hard disk; and
when a read error occurs in reading from one of the plurality of hard disks, restoring the redundant configuration by reading the data of the recovery-unit area in which the read error occurred from another one of the plurality of hard disks and writing the read data to the hard disk in which the read error occurred.

5. A program for causing a computer system to function as a disk array control device that controls read/write operations in RAID 1 on a plurality of hard disks each having a reallocation mechanism, the program causing the computer system to execute the steps of:
managing each hard disk by dividing it into areas of a recovery unit set to a size smaller than a partition of the hard disk;
when a read error occurs in reading from one of the plurality of hard disks, reading the data of the recovery-unit area in which the read error occurred from another one of the plurality of hard disks; and
restoring the redundant configuration by writing the read data to the hard disk in which the read error occurred.

6. A program for causing a computer system to function as a disk array control device that controls read/write operations in RAID 1 on a plurality of hard disks each having a reallocation mechanism, the program causing the computer system to execute the steps of:
when a read error occurs in reading data from one of the plurality of hard disks, reading the data of the block in which the read error occurred from another one of the plurality of hard disks; and
restoring the redundant configuration by writing the read block data to the corresponding block of the hard disk in which the read error occurred.