JP2009104369A

JP2009104369A - Disk sub-system

Info

Publication number: JP2009104369A
Application number: JP2007274963A
Authority: JP
Inventors: Masaru Sugawara; 賢菅原; Gosuke Sato; 剛介佐藤; Takashi Takeuchi; 崇竹内; Katsuaki Muramatsu; 克昭村松
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2007-10-23
Filing date: 2007-10-23
Publication date: 2009-05-14

Abstract

PROBLEM TO BE SOLVED: To eliminate the need for a spare disk in a conventional RAID configuration, and to prevent data loss by restoring the original data even when failures occur in a plurality of disk drives.
SOLUTION: In a plurality of disk drives in a RAID configuration, a data area for writing and reading normal data and a spare area for writing and reading data as an alternative to the data area are defined. When the data area of a certain disk drive (a first disk drive) needs to be copied, the data in the data area of the first disk drive is automatically copied to the spare area of another disk drive, according to the failure status of the plurality of disk drives.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a disk subsystem, and more particularly to reducing data loss when a disk drive fails in a disk subsystem comprising a disk array.

A disk subsystem with a RAID configuration includes a parity disk, so when one disk drive in the RAID configuration fails, its data can be restored using the parity disk. However, if a plurality of disk drives fail, the error cannot be repaired and data loss occurs.

Patent Document 1 (Japanese Patent Laid-Open No. 2005-122338) discloses a technique for a RAID disk array having a spare disk, in which the likelihood of failure is estimated for each disk drive and data is automatically copied from a disk drive with a high likelihood of failure to the spare disk, thereby reducing data loss caused by multiple disk failures.
In legacy systems, it is not uncommon for the area of a disk drive that the system uses to be limited, so the unused area tends to grow as the storage capacity of disk drives increases.

Patent Document 1: JP 2005-122338 A

In a disk subsystem having a spare disk, as in the prior art described above, continued operation even when a plurality of disk drives fail requires spare disks, which raises the hardware cost. In addition, saving data to a spare disk may affect normal user operations.

An object of the present invention is to eliminate the need for a spare disk in a conventional RAID configuration by copying data to a spare area defined in the disk drives of the RAID configuration, so that when a disk drive fails, the original data can be restored from the copied data and data loss is reduced.

The disk subsystem according to the present invention has a plurality of disk drives in a RAID configuration and processing means for controlling the writing of data to and the reading of data from the disk drive group.
In the plurality of disk drives, a data area for writing and reading normal data and a spare area for writing and reading data as an alternative to the data area are defined.
The disk subsystem further has management means for managing the states of the plurality of disk drives, and copy execution means that, when the states managed by the management means indicate that the data area of a certain disk drive (the first disk drive) needs to be copied, automatically copies the data in the data area of the first disk drive to the spare area of another disk drive (the second disk drive).

In a preferred example, the disk drive group consists of a plurality of primary and secondary disk drives in a mirror configuration, and the data area and the spare area are formed in each disk drive. The management means holds in a memory, and manages, a management table containing a first flag, which indicates that one of the disk drives in a pair is defective (a blocked disk), and a second flag, which indicates that data in the paired disk drives needs to be updated. The copy execution means refers to the management table and copies data from the data area of the first disk drive to the spare area of the second disk drive.

Preferably, according to the value of the first flag in the management table, the copy execution means carries out either a first data copy mode, in which the data is copied in a batch when the power is turned off, or a second data copy mode, in which, not at power-off but according to the second flag, the data in the data area of the first disk drive is copied block by block to the spare area of the second disk drive.

Preferably, the capacity of the spare area is equal to or larger than that of the data area.

According to the present invention, the spare disk of a conventional RAID configuration is unnecessary, because part of the storage area of the disk drives in the RAID configuration is used as a spare area to which data is copied in advance. When a disk drive fails, the original data can be restored from the data copied to the spare area, which reduces data loss.

A preferred embodiment will be described below with reference to the drawings.
FIG. 1 shows an example of a disk subsystem having a RAID configuration.
The disk subsystem 1 comprises a CPU 3 that executes various related programs to perform the data processing that accompanies reading and writing data on the disks; disk control controlware (hereinafter C/W) 4 for disk control, an automatic data copy program 5 that controls the copying of data to the spare areas according to the present invention, and a data cache 6, all held in a memory 21; a disk operation status management table 7 held in a nonvolatile memory 22; a SCSI controller 9; and a disk drive group 8 connected via a SCSI bus 91. The disk subsystem 1 is connected to a host system (hereinafter simply "host") 10 via a main bus 11.

The disk drive group 8 forms a RAID configuration comprising a plurality of disk drives 100 to 105, and each disk drive is connected to the SCSI controller 9 via the SCSI bus 91. The disk drives 100 and 103, 101 and 104, and 102 and 105 are each paired in a mirror configuration.

The storage area of each of the disk drives 100 to 105 is divided into a data area A (for example, 100A) and a spare area B (for example, 100B). The storage capacities satisfy the relation spare area ≥ data area. The data area A stores the data processed by the host 10, and the spare area B stores that data in place of a data area A that cannot be used because of a failure.
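The division described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the names `DriveLayout` and `split_drive` are hypothetical, and placing the spare area directly after the data area is an assumption, since the text only states that each drive is divided into the two areas with spare area ≥ data area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveLayout:
    """Block counts of one drive: data area A and spare area B."""
    data_blocks: int   # size of data area A, in blocks
    spare_blocks: int  # size of spare area B, in blocks

    @property
    def spare_start(self) -> int:
        # Assumption: spare area B begins right after data area A.
        return self.data_blocks


def split_drive(total_blocks: int, data_blocks: int) -> DriveLayout:
    """Split a drive into areas A and B, enforcing spare area >= data area."""
    spare_blocks = total_blocks - data_blocks
    if data_blocks <= 0 or spare_blocks < data_blocks:
        raise ValueError("spare area must be at least as large as the data area")
    return DriveLayout(data_blocks, spare_blocks)
```

For example, `split_drive(1000, 400)` gives a 400-block data area and a 600-block spare area, while `split_drive(1000, 600)` is rejected because the spare area would be smaller than the data area.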

In response to instructions from the host 10, the disk control C/W 4 reads data recorded in the data area or spare area of each disk drive in the disk drive group 8, or writes data to those areas, as required. The data cache 6 temporarily stores data read from the disk drive group 8. When the host 10 requests data that is already stored in the data cache 6, the data is read from the cache and sent to the host 10.
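The read path through the data cache can be sketched as below. This is a simplified illustration under stated assumptions: `CachedReader` and the `read_from_disk` callable are hypothetical names, the cache is an unbounded dictionary keyed by block address, and eviction and invalidation on write, which the text does not describe, are left out.

```python
class CachedReader:
    """Serve reads from a cache when possible, otherwise from the drives."""

    def __init__(self, read_from_disk):
        # read_from_disk: hypothetical callable taking an LBA and returning bytes.
        self._read_from_disk = read_from_disk
        self._cache = {}

    def read(self, lba):
        # Cache hit: return the cached data without touching the drives.
        if lba in self._cache:
            return self._cache[lba]
        # Cache miss: read from the drives and keep a copy for later requests.
        data = self._read_from_disk(lba)
        self._cache[lba] = data
        return data
```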

FIG. 2 outlines the operations initiated from the host 10 toward the disk drive group 8.
Step S1 is a data write process, in which data is written to both the data area 100A of the disk drive 100 and the data area 103A of the disk drive 103. Step S2 is a data read process, in which data is read from either the data area 100A of the disk drive 100 or the data area 103A of the disk drive 103.

Step S3, like step S1, is a write process to the data areas of the disk drives 101 and 104. Step S4, like step S2, is a read process from the data area of the disk drive 101 or the disk drive 104.
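Steps S1 to S4 amount to ordinary mirrored access, which the following Python sketch illustrates. The class `MirrorPair` and its `blocked` attribute are hypothetical; skipping a blocked drive on write and reading from the healthy drive are assumptions made for illustration, and reading from the primary drive by default is only one of the choices that the "either drive" wording allows.

```python
class MirrorPair:
    """Data areas of a primary/secondary drive pair, one list slot per block."""

    def __init__(self, num_blocks):
        self.primary = [None] * num_blocks
        self.secondary = [None] * num_blocks
        self.blocked = None  # "primary", "secondary", or None if both are healthy

    def write(self, lba, data):
        # As in steps S1 and S3: write to both data areas (skipping a blocked drive).
        if self.blocked != "primary":
            self.primary[lba] = data
        if self.blocked != "secondary":
            self.secondary[lba] = data

    def read(self, lba):
        # As in steps S2 and S4: read from either data area; here the primary
        # is preferred unless it is blocked.
        if self.blocked == "primary":
            return self.secondary[lba]
        return self.primary[lba]
```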

Step S5 copies the data in the data area 100A of the disk drive 100 to the spare area 101B of the disk drive 101.
Step S6, like step S5, copies the data in the data area 101A of the disk drive 101 to the spare area 102B of the disk drive 102. Step S7, like step S5, copies the data in the data area 102A of the disk drive 102 to the spare area 100B of the disk drive 100.
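In steps S5 to S7 each drive's data area is copied to the spare area of the next drive, and the last drive wraps around to the first. The sketch below generalizes this to a ring over any list of drives. That generalization is an assumption (the text gives only the three-drive example), and `ring_copy_destinations` is a hypothetical helper name.

```python
def ring_copy_destinations(drives):
    """Map each drive to the drive whose spare area receives its data.

    Each drive copies to the next one in the list; the last wraps to the first.
    """
    if len(drives) < 2:
        raise ValueError("at least two drives are needed so a drive never copies to itself")
    return {src: drives[(i + 1) % len(drives)] for i, src in enumerate(drives)}
```

Applied to the drives of the example, `ring_copy_destinations([100, 101, 102])` yields the mapping 100 to 101, 101 to 102, and 102 to 100, matching steps S5, S6, and S7. Because a drive's copy never resides on the drive itself, the copy survives the failure of that drive.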

FIG. 3 shows an example of the disk operation status management table 7.
The disk operation status management table 7 stores and manages a blocked disk management flag 300 for each of pairs 1 to N, a data copy destination disk drive number management table 301 for each pair, and an update necessary flag 302 for every LBA (Logical Block Address) in the data area. Here there are N disk pairs in total, and each disk has M blocks in total, each addressed by an LBA. Each block holds, for example, 512 bytes.

A "blocked disk" refers to the state in which one of the disk drives in a pair is defective. A blocked disk management flag 300 of “1” means that the pair has a blocked disk, and “0” means that it does not. In the illustrated example, the first pair and the N-th pair are “0”, so neither has a blocked disk and both drives of each pair are healthy. The second and third pairs are “1”, so each has a blocked disk. A characteristic of this embodiment is that, when a pair contains a blocked disk, the pair is judged likely to suffer a failure eventually, and its data is copied to a spare area.

An update necessary flag 302 of “1” means that the data of the corresponding block needs to be updated, and “0” means that no update is needed.
In the illustrated example, the management table entry for the first pair (disk drives 100 and 103) holds the data update flags for the second pair (disk drives 101 and 104) and indicates that blocks 1, 2, and 4 of that second pair need to be updated. Similarly, the data update flags for the third pair (disk drives 102 and 105), associated with the second pair (disk drives 101 and 104), indicate that blocks 1 and k need to be updated.
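One possible in-memory shape for a row of the disk operation status management table is sketched below. The names `PairEntry` and `blocks_to_update` are hypothetical, and storing the flags as a Python list whose index 0 corresponds to block 1 is an assumption; the text specifies only which items the table holds (flag 300, table 301, flags 302), not how they are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairEntry:
    """One row of the management table, describing one mirrored pair."""
    blocked: bool              # blocked disk management flag 300
    copy_dest_drive: int       # copy destination drive number (table 301)
    update_needed: List[bool]  # update necessary flags 302; index 0 is block 1


def blocks_to_update(entry: PairEntry) -> List[int]:
    """Return the 1-based block numbers whose update necessary flag is set."""
    return [i + 1 for i, needed in enumerate(entry.update_needed) if needed]
```

With flags set for the first, second, and fourth blocks, as in the illustrated example, `blocks_to_update` returns `[1, 2, 4]`.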

FIG. 4 shows the processing performed by the automatic data copy program 5 during normal operation.
The program determines whether the host 10 is idle and, if so, starts automatic data copying (step S10).
When automatic data copying starts, the blocked disk management flag 300 in the disk operation status management table 7 is consulted to determine whether there is a blocked disk (step S11). If there is a pair with a blocked disk (flag 300 is “1”), one drive of the pair is defective, so a failure is judged likely to occur eventually and the data is to be copied to a spare area; however, the automatic data copy is not carried out immediately. If there is no pair with a blocked disk (flag 300 is “0”), the automatic data copy is carried out.

Automatic data copying is normally carried out while the host 10 is not requesting access to the target disk drive. A pair with a blocked disk, however, is running on a single disk drive, and if that drive fails before the automatic data copy has fully completed, data loss occurs. For a pair with a blocked disk, the automatic data copy is therefore executed in a batch, triggered by power-off (step S12).
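The choice between the two copy modes can be summarized in a few lines of Python. This is a sketch only: `CopyMode` and `select_copy_mode` are hypothetical names, and deciding the mode separately for each pair is an assumption based on the statement that batch copying applies to pairs with a blocked disk.

```python
from enum import Enum


class CopyMode(Enum):
    BATCH_AT_POWER_OFF = "batch"           # first data copy mode (step S12)
    INCREMENTAL_WHEN_IDLE = "incremental"  # second data copy mode (steps S13 onward)


def select_copy_mode(blocked: bool) -> CopyMode:
    """Pick the copy mode for a pair from its blocked disk management flag 300."""
    if blocked:
        # The pair runs on one drive, so copy everything in one batch at power-off.
        return CopyMode.BATCH_AT_POWER_OFF
    # Both drives are healthy: copy block by block while the host is idle.
    return CopyMode.INCREMENTAL_WHEN_IDLE
```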

If step S11 finds no pair with a blocked disk, the data copy destination disk drive number management table 301 for pair #L (1 to N) recorded in the disk operation status management table 7 is consulted, and it is determined whether the update necessary flag 302 of LBA #P (1 to M) is “1” (step S13). If the update necessary flag 302 is “1”, an update is needed, so the data is read from the primary or secondary side of the corresponding disk drive pair #L (step S14).
The data that was read is then written to the disk drive that needs updating for pair #L, namely the primary-side disk drive indicated by the data copy destination disk drive number management table 301 for that pair in the disk operation status management table 7 (step S15).

Here, data is read and written one LBA at a time. Using the LBA as the unit, which is the smallest unit written to a disk drive, allows the automatic data copy to be suspended smoothly when the host 10 issues an access request to the disk subsystem 1.

Next, the update necessary flag 302 in the disk operation status management table 7 for the LBA #P whose update completed normally is changed from “1” to “0” (step S16). This ends the automatic data copy.
If the update necessary flag 302 in step S13 is “0”, no update is needed, so LBA #P is incremented to the next LBA (step S17). It is then determined whether the incremented LBA #P is the final LBA #(M+1) (step S18). If it is not, the automatic data copy ends. If it is, it is determined whether pair #L in the disk operation status management table 7 is the final pair #(N+1) (step S19). If it is the final pair #(N+1), the automatic data copy ends; otherwise pair #L is incremented and then the automatic data copy ends (step S20).

The above processing is repeated up to the final pair #(N+1) and the final LBA #(M+1); when the final pair #(N+1) is reached, execution restarts from the first pair #L. If the host 10 issues an access request, the automatic data copy is suspended until there are again no access requests from the host 10.
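The idle-time copy of FIG. 4 can be approximated by a function that copies at most one flagged block per call, so that the caller can stop between calls whenever the host issues a request. This is a simplified sketch, not the exact flowchart: `idle_copy_step` and its callable parameters are hypothetical, it handles a single pair with 0-based block indices, and it scans forward to the next flagged block within one call, whereas the flowchart advances the LBA counter by one per pass.

```python
def idle_copy_step(update_needed, start, read_block, write_block):
    """Copy at most one flagged block, scanning circularly from index `start`.

    update_needed: list of bools (update necessary flags), modified in place.
    read_block:    hypothetical callable, index -> data from the source data area.
    write_block:   hypothetical callable, (index, data) -> None, into the spare area.
    Returns the index from which the next idle period should resume.
    """
    n = len(update_needed)
    for offset in range(n):
        i = (start + offset) % n
        if update_needed[i]:
            write_block(i, read_block(i))  # one block per step keeps it interruptible
            update_needed[i] = False       # clear the flag only after the copy is done
            return (i + 1) % n
    return start  # nothing to copy in this pass
```

A caller would invoke this function repeatedly while the host is idle, passing the returned index back in as `start`, and would simply stop calling it while host requests are pending.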

FIG. 5 shows the processing performed by the automatic data copy program 5 when the power is turned off.
This processing is executed as step S12 of FIG. 4. In this case, the automatic data copy is carried out, and the system power is turned off only after the copy has fully completed.
First, it is determined whether the update necessary flag 302 of pair #L (1 to N), LBA #P (1 to M), recorded in the disk operation status management table 7 is “1” (step S21). If the update necessary flag 302 is “1”, an update is needed, so data is read from one block on the primary side of the corresponding disk drive pair #L (step S22). If the pair has a blocked disk drive, the block is read from the disk drive that is not blocked.

The data that was read is then written to one block on the primary side of the disk drive pair that needs updating, as indicated by the data copy destination disk drive number management table 301 for that pair in the disk operation status management table 7 (step S23). If the pair has a blocked disk drive, the data is written to the disk drive that is not blocked.

Next, the update necessary flag 302 in the disk operation status management table 7 for the LBA #P whose update completed normally is changed from “1” to “0” (step S24). If the update necessary flag 302 is “0”, no update is needed, so LBA #P is incremented to the next LBA (step S25).
It is then determined whether the incremented LBA #P is the final LBA #(M+1) (step S26). If it is not, the processing is repeated until the final LBA #(M+1) is reached.

It is then determined whether pair #L in the disk operation status management table 7 is the final pair #(N+1) (step S27). If it is, the automatic data copy ends. If it is not, the automatic data copy is repeated until the final pair #(N+1) is reached, and then ends.
Finally, once the automatic data copy has ended, the system power is turned off (step S29). A disk drive for which the automatic data copy has completed takes the place of the blocked disk drive, and the RAID configuration can continue to be used.
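The power-off processing of FIG. 5 differs from the idle-time copy in that it runs to completion over every pair and every block before the power is cut. A minimal sketch follows, assuming hypothetical names throughout: `batch_copy_before_power_off`, the per-pair tuples of flags and read/write callables, and the `power_off` callback are illustrative and are not interfaces described in the patent.

```python
def batch_copy_before_power_off(pairs, power_off):
    """Copy every flagged block of every pair, then turn the power off.

    pairs:     list of (update_needed, read_block, write_block) tuples, one per pair;
               update_needed is a list of bools that is modified in place.
    power_off: hypothetical callable that cuts system power.
    Returns the number of blocks copied.
    """
    copied = 0
    for update_needed, read_block, write_block in pairs:  # pairs 1..N
        for lba, needed in enumerate(update_needed):       # blocks 1..M
            if needed:
                write_block(lba, read_block(lba))  # copy one block to the spare area
                update_needed[lba] = False          # mark the block as up to date
                copied += 1
    # Power is cut only after all copies have finished (step S29).
    power_off()
    return copied
```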

According to this embodiment, part of the recording area of the disk drives in a RAID configuration can be defined as a spare area and put to effective use. In addition, because data is copied automatically while the CPU is not using the disk drives, backup data can be saved without affecting the user's normal operations.

FIG. 1 is a diagram showing the configuration of the disk subsystem in one embodiment.
FIG. 2 is a diagram showing the operations initiated from the host 2 toward the disk array 11.
FIG. 3 is a diagram showing an example of the disk operation status management table 7.
FIG. 4 is a flowchart showing the operation during normal operation in one embodiment.
FIG. 5 is a flowchart showing the operation at power-off in one embodiment.

Explanation of symbols

1: Disk subsystem; 10: Host system; 11: Main bus;
3: CPU; 4: Disk control C/W; 5: Automatic data copy program; 6: Data cache; 7: Disk operation status management table; 8: Disk drive group; 9: SCSI controller; 91: SCSI bus.

Claims

1. A disk subsystem having a plurality of disk drives in a RAID configuration and processing means for controlling the writing of data to and the reading of data from the disk drive group, wherein
a data area for writing and reading normal data and a spare area for writing and reading data as an alternative to the data area are defined in the plurality of disk drives, the disk subsystem comprising:
management means for managing the states of the plurality of disk drives; and
copy execution means for, when the states of the disk drives managed by the management means indicate that the data area of a certain disk drive (first disk drive) needs to be copied, automatically copying the data in the data area of the first disk drive to the spare area of another disk drive (second disk drive).

2. The disk subsystem according to claim 1, wherein
the disk drive group is composed of a plurality of primary and secondary disk drives in a mirror configuration, and the data area and the spare area are formed in each disk drive,
the management means manages a management table containing a first flag indicating that one of the disk drives in a pair is defective (blocked disk) and a second flag indicating that data in the paired disk drives needs to be updated, and
the copy execution means refers to the management table and copies data from the data area of the first disk drive to the spare area of the second disk drive.

3. The disk subsystem, wherein the copy execution means carries out, according to the value of the first flag in the management table, either a first data copy mode in which data is copied in a batch when the power is turned off, or a second data copy mode in which, not at power-off but according to the second flag, data in the data area of the first disk drive is copied block by block to the spare area of the second disk drive.

4. The disk subsystem according to claim 1, wherein the spare area has a capacity equal to or larger than that of the data area.