JP3177997B2

JP3177997B2 - Storage system

Info

Publication number: JP3177997B2
Application number: JP08654091A
Authority: JP
Inventors: 弘治荒井; 孝夫佐藤; 弘行北嶋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-04-18
Filing date: 1991-04-18
Publication date: 2001-06-18
Anticipated expiration: 2016-06-18
Also published as: JPH04318640A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an efficient method of recovering data when a storage device fails in a system comprising a plurality of storage devices, and in particular to a failure recovery method that can be applied effectively to systems such as disk arrays, in which data is divided and stored across a plurality of storage devices.

[0002]

[Prior Art] In a system comprising a plurality of storage devices, dividing data and storing it across the storage devices lowers the reliability of the data.

[0003] To prevent this loss of reliability, the conventional approach adds redundant data for failure recovery (ECC data) that covers all of the storage devices; when any one of the storage devices fails, the data of that storage device is recovered using the ECC data.

[0004]

[Problems to Be Solved by the Invention] In the prior art described above, data reliability falls as the number of storage devices grows with, for example, an increase in system storage capacity. To counter this and strengthen the failure recovery function, the redundancy of the ECC data, which indicates the number of failed storage devices that can be recovered simultaneously, was increased. As a result, as the system expands, the method of generating ECC data becomes more complex and the load of ECC data generation rises; in addition, the number of storage devices covered by the ECC data grows, so the amount of data accessed during failure recovery increases and the load of failure recovery rises. In short, the conventional method scales poorly as a system. In particular, when a plurality of storage devices are connected to each path, multiple failures make path performance a bottleneck and system performance degrades.

[0005] An object of the present invention is to solve the above problems by improving both the performance of failure recovery control and the expandability of a system that divides data and stores it across a plurality of storage devices.

[0006] Another object of the present invention is to distribute the access load that failure recovery processing places on the storage devices and paths when a storage device of such a system fails.

[0007]

[Means for Solving the Problems] The system comprises a plurality of storage devices, a control device that controls them, and paths connecting the control device to each storage device, and stores data across the plurality of storage devices. In such a system, the performance of failure recovery processing can be improved by providing: means for creating ECC data for failure recovery from a data set stored across a plurality of storage devices; means for storing the ECC data in a separate storage device and managing the ECC data together with its corresponding data set as an ECC group; means for accessing an ECC group as a unit; means for setting a plurality of ECC groups, each corresponding to a mutually exclusive subset of the storage devices or paths; means for reading, when data in some storage device becomes temporarily or permanently inaccessible because of a failure or the like, the remaining data of the ECC group to which that data belongs; and means for recovering the inaccessible data from the data read.
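The ECC creation and recovery means can be illustrated with a minimal Python sketch. It assumes that the ECC data is a bytewise XOR parity of the data blocks; the description does not specify the code used (the claim speaks only of parity data), and the names `make_ecc` and `recover` are hypothetical.

```python
from functools import reduce

def make_ecc(blocks):
    """XOR equal-length blocks together, byte by byte (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover(surviving_blocks):
    """Rebuild the single missing member of an ECC group.

    With XOR parity, the missing block equals the XOR of every surviving
    member of the group (the remaining data blocks plus the ECC block).
    """
    return make_ecc(surviving_blocks)

# One ECC group: three data blocks on three drives, ECC on a fourth drive.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
ecc = make_ecc(data)

# The drive holding data[2] fails; read the rest of the group and rebuild.
lost = data[2]
rebuilt = recover([data[0], data[1], ecc])
```

Because only the members of one ECC group are read, the amount of data needed to rebuild a block depends on the group size, not on the total number of drives.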

[0008] Further, in the above system, means for changing the ECC group setting for each reference data amount makes it possible to distribute the access load that failure recovery processing places on the storage devices and paths.

[0009]

[Operation] In the failure recovery processing that follows a failure in any storage device, the ECC group containing the data to be recovered is used to recover the data of the failed storage device from the data of the remaining storage devices. Because ECC groups are set on mutually exclusive subsets of the storage devices or paths, the amount of data that each ECC group covers is limited and the amount of ECC data can be kept small. Even when the number of storage devices grows as the storage capacity of the system is expanded, a new ECC group can be set on the added storage devices as a further subset, so the amount of data accessed during failure recovery stays constant. Failure recovery performance is therefore improved without increasing the load of failure recovery processing.

[0010] Further, when the ECC group setting is changed in advance for each reference data amount, the total amount of data accessed during failure recovery is the same as with ECC groups fixed to specific storage devices or paths, but the number of storage devices or paths accessed is larger. The amount of data accessed per storage device or per path during failure recovery therefore decreases, and the access load on the storage devices and paths is distributed.

[0011]

[Embodiments] Embodiments of the present invention will now be described in detail with reference to the drawings.

[0012] FIG. 3 shows the system configuration of this embodiment. Magnetic disk devices (hereinafter, drives) are used as the storage devices. The system consists of eight drives 101 to 108 that can operate in parallel, one spare drive 109, a disk control device 301 that controls the drive group 101 to 109, and paths 302 that connect each of the drives 101 to 109 to the disk control device 301 and allow all of the drives 101 to 109 to operate simultaneously. In this configuration drives and paths correspond one to one, so selecting a drive is equivalent to selecting a path. Data is divided and recorded on six of the eight drives 101 to 108, and ECC data is recorded on the remaining two. FIG. 3 shows the case in which ECC group 310 records data set 311 on drives B102, C103, and D104 and ECC data 312 on drive A101, and ECC group 320 records data set 313 on drives F106, G107, and H108 and ECC data 314 on drive E105. To avoid inessential complexity, FIG. 3 shows a single spare drive, but the number of spare drives can easily be increased to meet the reliability required of the system.

[0013] FIG. 1 is a schematic representation of the storage area of each drive: the vertical columns 101 to 109 represent the drives and the horizontal rows 131 to 135 represent the tracks of each drive. Drive A101 and drive H108 are defined as adjacent in the drive sequence numbering, each ECC group is set on four drives with consecutive drive sequence numbers, and, with the track as the reference data amount, the assignment to drives is changed by sliding it one drive per track. For example, ECC group 111 of track 131 is set on drives A101, B102, C103, and D104; on the next track 132 the assignment is slid by one drive, so that ECC group 113 is set on drives H108, A101, B102, and C103.
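The sliding assignment of FIG. 1 can be sketched as a small Python function. This is only an illustration: the function name `sliding_groups`, the zero-based track index (0 for track 131, 1 for track 132, and so on), and the use of the letters A to H for drives 101 to 108 are assumptions made for the example.

```python
DRIVES = "ABCDEFGH"   # drives A101 ... H108, treated as a ring (H is next to A)
GROUP_SIZE = 4        # three data tracks plus one ECC track per group

def sliding_groups(track_index):
    """Return the two ECC groups for a track, slid one drive per track."""
    n = len(DRIVES)
    groups = []
    for start in range(0, n, GROUP_SIZE):
        first = (start - track_index) % n   # slide toward lower sequence numbers
        groups.append([DRIVES[(first + i) % n] for i in range(GROUP_SIZE)])
    return groups

track_131 = sliding_groups(0)   # [[A, B, C, D], [E, F, G, H]]
track_132 = sliding_groups(1)   # [[H, A, B, C], [D, E, F, G]]
```

With eight drives the layout repeats every eight tracks, and any two groups on the same track remain mutually exclusive.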

[0014] FIG. 2 is a schematic representation of the storage area of each drive in the same format as FIG. 1. It shows the case in which ECC groups are fixed to specific sets of four drives: ECC group 211 is set on drives A101, B102, C103, and D104, and ECC group 212 on drives E105, F106, G107, and H108.

[0015] FIG. 4 is a schematic representation of the storage area of each drive in the same format as FIG. 1. The ECC group setting is changed track by track, and the four drives needed for each ECC group are chosen, for example at random, so that every combination of drives occurs as evenly as possible. For example, ECC group 411 of track 131 is set on drives A101, B102, C103, and D104, and ECC group 413 of track 132 on drives A101, D104, E105, and F106.
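One possible way to use every combination of drives evenly is sketched below in Python. It is an assumption for illustration, not the selection procedure of the embodiment: it enumerates every split of the eight drives into two groups of four and cycles through the splits track by track. The names `all_partitions` and `groups_for_track` are hypothetical, and the resulting order differs from the example assignments shown in FIG. 4.

```python
from itertools import combinations

DRIVES = "ABCDEFGH"   # drives A101 ... H108

def all_partitions():
    """Every split of the eight drives into two ECC groups of four.

    Fixing drive A in the first group avoids counting each split twice,
    which leaves C(7, 3) = 35 distinct splits.
    """
    first, rest = DRIVES[0], DRIVES[1:]
    partitions = []
    for others in combinations(rest, 3):
        group1 = (first,) + others
        group2 = tuple(d for d in DRIVES if d not in group1)
        partitions.append((group1, group2))
    return partitions

PARTITIONS = all_partitions()

def groups_for_track(track_index):
    """Cycle through the splits so that each one is used equally often."""
    return PARTITIONS[track_index % len(PARTITIONS)]
```

Over one full cycle of 35 tracks, every pair of drives shares an ECC group on exactly 15 tracks, so the recovery reads for any failed drive are spread evenly over the other seven drives.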

[0016] FIG. 5 shows how data and ECC data move between the drives and the disk control device during failure recovery processing that uses an ECC group. Each item of data and ECC data moves as indicated by the arrows.

[0017] FIG. 6 is a flowchart of the failure recovery method that uses ECC groups.

[0018] FIG. 7 shows the system configuration when two drives are connected to each drive path. Drives 801 to 809, which correspond to drives 101 to 109, share the paths 302 with them; each drive can operate independently, but two drives cannot use the same path at the same time.

[0019] Next, failure recovery processing using an ECC group when drive D104 fails is described with reference to FIGS. 5 and 6.

[0020] Drive failure detection 720 is performed by the disk control device 301, for example by analyzing response information from the drives or by a status monitoring function. In the following, a drive failure is assumed to have been detected in drive D104. On detecting the failure, the disk control device 301 starts failure recovery processing 700 to recover the data of drive D104. In failure recovery processing 700, spare drive check 701 first examines whether spare drive J109 is available. Depending on the result, condition determination 702 ends failure recovery processing if the spare drive is not available (707) and continues it if the spare drive is available (708). Next, ECC group input 703 reads into the disk control device 301 the data 312, 601, and 602 of drives A101, B102, and C103, that is, of the members of ECC group 310 other than drive D104. Data recovery 704 then recovers data 603 of drive D104 from the data set read into the disk control device 301, and recovered data output 705 writes data 603 to spare drive J109. The following condition determination 706 checks whether all data of drive D104 has been recovered. If data remains to be recovered (709), ECC group input 703, data recovery 704, and recovered data output 705 are repeated; if none remains (710), failure recovery processing ends. This processing recovers the data of drive D104 onto spare drive J109. By accessing drive J109 in place of drive D104, processing can then continue as it did before drive D104 failed.
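The flow of FIG. 6 can be sketched in Python as follows. The sketch assumes XOR parity as the ECC (the description does not name the code), models each drive as a list of track-sized byte strings, and uses hypothetical names (`xor_blocks`, `recover_drive`); the step numbers in the comments refer to the flowchart.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed ECC scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_drive(drives, group, failed, spare, num_tracks, spare_available=True):
    """Rebuild the failed drive onto the spare, one track at a time."""
    if not spare_available:              # 701/702: spare check fails -> 707, end
        return False
    drives[spare] = []                   # 708: continue with recovery
    for track in range(num_tracks):      # 706: repeat (709) until nothing is left (710)
        survivors = [drives[d][track] for d in group if d != failed]   # 703
        rebuilt = xor_blocks(survivors)                                # 704
        drives[spare].append(rebuilt)                                  # 705
    return True

# ECC group on drives A (ECC), B, C, D (data), with two tracks per drive.
original_d = [b"\x0f\xf0", b"\xaa\x55"]
drives = {
    "B": [b"\x01\x02", b"\x03\x04"],
    "C": [b"\x10\x20", b"\x30\x40"],
    "D": original_d,
}
drives["A"] = [xor_blocks([drives[d][t] for d in "BCD"]) for t in range(2)]

del drives["D"]                          # drive D fails
ok = recover_drive(drives, group="ABCD", failed="D", spare="J", num_tracks=2)
```

After the call, `drives["J"]` holds the contents that drive D had before the failure, so later accesses can be redirected from D to J.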

[0021] When a failure occurs, data can be recovered by the failure recovery processing described above regardless of how the ECC groups are set.

[0022] Next, an embodiment of the failure recovery method for the case in which ECC groups are fixed to specific drives is described in detail with reference to FIG. 2.

[0023] An ECC group is fixed to the tracks at the same address on four drives, and stores three tracks of data and the corresponding one track of ECC data. Since there are eight drives 101 to 108, two ECC groups 211 and 212 can be set on the tracks that share an address.

[0024] Failure recovery processing is described assuming a failure in drive D104. When some or all of the tracks of drive D104 become inaccessible, the data of drive D104 is recovered using ECC group 211. First, to recover track 131, track 131 of drives A101, B102, and C103 is read, track 131 of drive D104 is recovered, and the result is stored on track 131 of drive J109, the spare drive. Likewise, track 132 of drive D104 is recovered by reading track 132 of drives A101, B102, and C103. Thereafter, for each failed track the rest of the ECC group is read from the same drives A101, B102, and C103 and the same processing is performed, whereby the data of drive D104 is recovered.

[0025] According to this embodiment, failure recovery processing accesses a total of three drives, A101, B102, and C103, and does not affect the other drives E105, F106, G107, and H108. Fixing the ECC groups in this way has the advantage that the range of drives accessed during failure recovery is limited and the drives outside that range can operate normally. On the other hand, because accesses are concentrated on those drives, failure recovery takes longer when multiple failures occur at the same time.

[0026] Next, an embodiment of the failure recovery method for the case of FIG. 1 is described in detail.

[0027] In FIG. 1, the tracks at the same address on four drives with consecutive drive sequence numbers are set as one ECC group, which stores three tracks of data and the corresponding one track of ECC data. Since there are eight drives 101 to 108, two ECC groups can be set on the tracks that share an address. In addition, the ECC group setting is slid by one drive for each track.

[0028] Failure recovery processing is described assuming a failure in drive D104. When some or all of the tracks of drive D104 become inaccessible, the data of drive D104 is recovered using the ECC groups. For example, track 131 of drive D104 is recovered using ECC group 111: track 131 is read from drives A101, B102, and C103, track 131 of drive D104 is recovered, and the result is stored on track 131 of drive J109, the spare drive. Likewise, track 132 of drive D104 is recovered using ECC group 114: track 132 is read from drives E105, F106, and G107 and the same processing is performed. Track 133 is then recovered using ECC group 116 by accessing drives C103, E105, and F106; track 134 using ECC group 118 by accessing drives B102, C103, and E105; and track 135 using ECC group 109 [sic; the ECC groups of FIG. 1 are numbered 111 to 120, and 109 otherwise denotes the spare drive] by accessing drives A101, B102, and C103. Performing the same processing in each case recovers all of the data of drive D104.

[0029] According to this embodiment, with the ECC group setting of FIG. 1 failure processing accesses a total of six drives, twice the three drives accessed when ECC groups are fixed as in FIG. 2. Moreover, since the total amount of data accessed is the same, the load on drives A101, B102, and C103 and on the paths 302 to each drive is distributed.
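The comparison between the fixed setting of FIG. 2 and the sliding setting of FIG. 1 can be checked with a short Python tally. This is an illustrative sketch: the helper names and the zero-based track index (0 to 4 for tracks 131 to 135) are assumptions, and each read is counted as one track.

```python
from collections import Counter

DRIVES = "ABCDEFGH"   # drives A101 ... H108
GROUP_SIZE = 4

def fixed_groups(track_index):
    """FIG. 2: the same two groups on every track."""
    return [list("ABCD"), list("EFGH")]

def sliding_groups(track_index):
    """FIG. 1: groups slid by one drive per track on a ring of drives."""
    n = len(DRIVES)
    return [[DRIVES[(start - track_index + i) % n] for i in range(GROUP_SIZE)]
            for start in range(0, n, GROUP_SIZE)]

def recovery_reads(layout, failed, tracks):
    """Count the tracks read from each surviving drive while rebuilding `failed`."""
    reads = Counter()
    for t in range(tracks):
        group = next(g for g in layout(t) if failed in g)
        reads.update(d for d in group if d != failed)
    return reads

fixed = recovery_reads(fixed_groups, "D", 5)      # tracks 131 to 135
sliding = recovery_reads(sliding_groups, "D", 5)
```

Both layouts read 15 tracks in total, but the fixed layout takes all of them from drives A, B, and C (five tracks each), whereas the sliding layout spreads them over drives A, B, C, E, F, and G with at most four tracks from any one drive.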

[0030] Next, an embodiment of the failure recovery method for the case of FIG. 4 is described in detail.

[0031] In FIG. 4, for the eight drives 101 to 108, every combination of four tracks chosen from the tracks at the same address is selected equally often and set as an ECC group. For example, ECC group 411 is the set of tracks 131 of drives A101, B102, C103, and D104, and ECC group 413 is the set of tracks 132 of drives A101, D104, E105, and F106.

[0032] Failure recovery processing is described assuming a failure in drive D104. When some or all of the tracks of drive D104 become inaccessible, the data of drive D104 is recovered using the ECC groups. Track 131 is recovered using ECC group 411: track 131 is read from drives A101, B102, and C103, track 131 of drive D104 is recovered, and the result is stored on track 131 of drive J109, the spare drive. Thereafter, track 132 is recovered using ECC group 413 by accessing drives A101, E105, and F106; track 133 using ECC group 416 by accessing drives B102, G107, and H108; track 134 using ECC group 418 by accessing drives B102, E105, and F106; and track 135 using ECC group 419 by accessing drives A101, G107, and H108. Performing the same processing in each case recovers the data of drive D104.

[0033] According to this embodiment, failure processing accesses every drive except drive D104, a total of seven drives. This is more than twice the three drives accessed with the fixed ECC groups of FIG. 2, and more than with the method of FIG. 1. Since the total amount of data accessed is the same, the load on drives A101, B102, and C103 and on the paths 302 to each drive is distributed. On the other hand, the storage area of each ECC group becomes fragmented, which complicates area management.
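The drive count for FIG. 4 follows directly from the reads listed for tracks 131 to 135. The Python tally below simply counts them; the dictionary name `FIG4_READS` and the letters standing for drives A101 to H108 are assumptions made for the example.

```python
from collections import Counter

# Drives read to rebuild drive D104, per track, as listed in the description.
FIG4_READS = {
    131: "ABC",   # ECC group 411
    132: "AEF",   # ECC group 413
    133: "BGH",   # ECC group 416
    134: "BEF",   # ECC group 418
    135: "AGH",   # ECC group 419
}

reads = Counter(drive for drives in FIG4_READS.values() for drive in drives)
drives_accessed = sorted(reads)
```

The 15 track reads land on seven drives, and no drive supplies more than three of them, compared with five per drive under the fixed setting of FIG. 2.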

[0034] In the three embodiments above, the track is the reference data amount at which the ECC group setting changes, but a sector or a cylinder can easily be used instead. Using a bit or a byte as the reference data amount is inefficient when magnetic disk devices serve as the storage devices, as in the embodiments above; in that case it is preferable to build the system from storage devices that are accessed in units of bytes or bits.

[0035] The embodiments above are systems with one drive connected to each path. Next, an embodiment of a system with a plurality of drives connected to each path is described with reference to FIG. 7. In FIG. 7, two drives are connected to each path, and the ECC group setting for drives 801 to 808 is shifted by one path relative to that for drives 101 to 108. Consequently, when drive D104 and drive P804 fail at the same time, the accesses for recovering drive D104 use paths 831, 832, and 833, and the accesses for recovering drive P804 use paths 835, 836, and 837.

[0036] According to this embodiment, in a system with a plurality of drives connected to the same path, shifting the ECC group setting by a fixed amount with respect to the paths distributes the load on the paths when a plurality of drives fail at the same time.
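The path shift of FIG. 7 can be sketched in Python as follows. The sketch rests on assumptions that the description does not state outright: data drives sit on paths 831 to 838 (with path 839 serving the spares), row 0 stands for drives 101 to 108 and row 1 for drives 801 to 808, and the shift runs in the same direction as the slide of FIG. 1. The function names are hypothetical.

```python
PATHS = list(range(831, 839))   # assumed data paths 831 ... 838
GROUP_SIZE = 4

def group_paths(row, path):
    """Paths used by the ECC group that contains the drive at (row, path).

    Row 0 uses groups on paths 831-834 and 835-838; each further row
    shifts the grouping by one path around the ring of paths.
    """
    n = len(PATHS)
    position = (PATHS.index(path) + row) % n        # undo the per-row shift
    start = position - position % GROUP_SIZE        # first slot of that group
    return [PATHS[(start + i - row) % n] for i in range(GROUP_SIZE)]

def recovery_paths(row, path):
    """Paths read while rebuilding the drive at (row, path)."""
    return [p for p in group_paths(row, path) if p != path]

d104 = recovery_paths(0, 834)   # drive D104 on path 834
p804 = recovery_paths(1, 834)   # drive P804 shares path 834
```

Under these assumptions the two simultaneous recoveries use disjoint sets of paths (831 to 833 and 835 to 837), which matches the paths given for drives D104 and P804.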

[0037]

[Effects of the Invention] As described in detail above, the present invention applies to a system that comprises a plurality of storage devices such as magnetic disk devices, stores data across the storage devices, and, when a storage device fails, performs failure recovery from the data of the remaining storage devices. By setting ECC groups on mutually exclusive subsets of the storage devices, it prevents the load of failure recovery processing from increasing even when the number of storage devices grows, for example through expansion of the system's storage capacity.

[0038] Further, according to the present invention, setting the ECC groups so that they change, with respect to storage devices or paths, for each reference data amount allows the accesses needed for failure recovery processing to be spread over more storage devices and paths, and thus distributes the load.

[Brief description of the drawings]

FIG. 1 shows a configuration in which the ECC groups are shifted by one drive per track.

FIG. 2 shows a configuration in which the ECC groups are fixed to specific sets of four drives.

FIG. 3 shows the system configuration of the embodiment.

FIG. 4 shows a configuration in which each ECC group is set by selecting four of the eight drives evenly.

FIG. 5 shows the movement of data during recovery from a drive failure in a system configuration similar to that of FIG. 3.

FIG. 6 is a flowchart of the failure recovery processing.

FIG. 7 is a system configuration diagram for the case in which a plurality of storage devices are connected to the same path.

[Explanation of symbols]

101 to 108: magnetic disk drives; 109: spare drive; 111 to 120: ECC groups; 131 to 135: tracks; 211, 212: ECC groups; 301: disk control device; 302: path; 311, 313: data sets; 312, 314: ECC data; 310, 320: ECC groups; 411 to 420: ECC groups; 601, 602, 603: data; 700: failure recovery processing; 701: spare drive check step; 702: spare drive check determination step; 703: ECC group input step; 704: data recovery step; 705: recovered data output step; 706: recovery completion determination step; 707: processing route when no spare drive is available; 708: processing route when a spare drive is available; 709: processing route when unrecovered data remains; 710: processing route when no unrecovered data remains; 720: drive failure detection step; 801 to 809: drives; 811, 813: data sets; 812, 814: ECC data; 810, 820: ECC groups; 831 to 839: paths.

Continuation of the front page

(72) Inventor: Hiroyuki Kitajima, c/o Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-2-157953 (JP, A); JP-A-62-293355 (JP, A); JP-A-2-194457 (JP, A); JP-A-2-291011 (JP, A); JP-A-2-176822 (JP, A); JP-A-3-505935 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G06F 12/16; G06F 11/10; G06F 3/06

Claims

(57) [Claims]

1. A storage system in which a group is formed from a plurality of storage areas residing in different storage devices, and at least one storage area in each group stores parity data for reconstructing data stored in another storage area of that group when that data is lost, wherein at least one of the plurality of storage areas in the group to which a first storage area of a first storage device belongs is provided in a second storage device that contains no storage area of the group to which a second storage area of the first storage device belongs.