JP2702278B2 - Disk drive device

Info

Publication number: JP2702278B2
Application number: JP2318446A
Authority: JP
Inventors: 佳子松本; 道生宮崎; 守彦四谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-11-22
Filing date: 1990-11-22
Publication date: 1998-01-21
Anticipated expiration: 2013-01-21
Also published as: JPH04188463A

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to a disk drive device that uses a disk array.

[Related Art] In recent years, disk arrays in which many small disk drives are arranged have been devised. By switching among the individual disk drives in use, they aim to improve data transfer speed and reliability.

A technique that uses such a disk array is described with reference to the drawings.

FIG. 13 shows an example of a drive group with an array structure.

This apparatus consists of hard disk controllers 301 to 304 and disk drives 310 to 312, 320 to 322, 330 to 332, and 340 to 342, that is, four hard disk controllers and 16 disk drives in total.

Here, the logical groups in the array are taken to be three groups of four drives each: disk drives 310, 320, 330, and 340; disk drives 311, 321, 331, and 341; and disk drives 312, 322, 332, and 342.

Of the disk drives in each logical group, the drives under hard disk controllers 301 and 302 are assigned as data storage drives, the drives under hard disk controller 303 as redundant data drives, and the drives under hard disk controller 304 as spare drives.

The drives enclosed by dotted line a form one group.

When disk drive 310 fails, disk drive 340, which is assigned as a spare, is selected as the switching target and takes the place of disk drive 310.

After the switch, the logical group (hereinafter simply "group") consists of disk drives 320, 330, and 340 (the drives enclosed by dotted line b). If disk drive 320 then also fails, it is likewise switched to disk drive 341, which is also assigned as a spare.

As a result, the group consists of disk drives 330, 340, and 341 (the drives enclosed by dotted line c).

Another example is described next.

FIG. 14 shows only two groups out of a drive group with an array structure. Group A consists of disk drives 350a to 355a, and group B consists of disk drives 350b to 355b. Disk drives 350a to 353a and 350b to 353b are assigned as data storage drives, and disk drives 354a, 355a, 354b, and 355b are assigned as redundant data drives.

In this configuration, no data is lost even if disk drives fail, provided that no more than two drives fail within one group.

[Problems to be Solved by the Invention] In the prior art shown in FIG. 13, however, when a disk drive fails, the disk drives initially assigned as spares are used one after another as switching targets. As a result, after switching, several of the drives that make up a group could end up under the same hard disk controller. This reduced the number of drives that could be accessed simultaneously and degraded the performance of the group.

Specifically, in the case above, two disk drives, 340 and 341, are assigned to the single hard disk controller 304. The group can therefore access only two disk drives at a time: disk drive 330 and one of disk drives 340 and 341.

One of the drives can therefore be accessed only sequentially, which degraded performance.
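The arithmetic behind this example can be sketched as follows. This is only an illustration: it assumes that a controller serves one of its drives at a time, and the `CONTROLLER_OF` mapping and the function name are hypothetical names introduced here, filled in from the FIG. 13 example.

```python
# Controller that each drive is attached to, following the FIG. 13 example.
CONTROLLER_OF = {310: 301, 320: 302, 330: 303, 340: 304, 341: 304}

def concurrent_drives(group):
    """Drives reachable at once, assuming one access per controller at a time."""
    return len({CONTROLLER_OF[d] for d in group})

# Before any failure the working drives sit under three different controllers.
print(concurrent_drives([310, 320, 330]))
# After 310 and 320 have both been replaced, 340 and 341 share controller 304.
print(concurrent_drives([330, 340, 341]))
```

The second call counts controller 304 only once, which is the loss of parallelism the text describes.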

In the example shown in FIG. 14, one group may reach a state in which the failure of even one more disk drive in that group makes data restoration impossible, while another group still has all of its drives working and could restore its data even if two more disk drives failed. In other words, groups with a high risk of data loss coexist with groups with a low risk, which lowers the reliability of the system as a whole.

Specifically, if disk drives 350a and 351a fail, group A cannot restore its data if even one more drive fails. Group B, by contrast, can still restore its data even if two more drives fail, so reliability is unevenly distributed among the groups.
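The imbalance can be expressed as a per-group margin. The sketch below assumes, as in FIG. 14, that each group has two redundant data drives; here "margin" means the number of further failures a group can absorb while its data remains restorable, and the names are illustrative rather than taken from the patent.

```python
REDUNDANT_PER_GROUP = 2  # FIG. 14: drives 354 and 355 of each group

def margin(failed_drives):
    """Further failures a group can absorb while data is still restorable."""
    return REDUNDANT_PER_GROUP - failed_drives

print(margin(2))  # group A after 350a and 351a have failed
print(margin(0))  # group B with every drive working
```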

Furthermore, even when the groups differ in the performance and reliability required of them, switching did not take those requirements into account. A group requiring high performance could end up composed of several disk drives under the same hard disk controller, and a group requiring high reliability could end up in a state with a high risk of data loss.

An object of the present invention is to provide a disk drive device that, when a disk drive fails, selects a replacement disk drive located so that the number of drives in the group that can be accessed at one time does not decrease after the switch.

Another object is to provide a disk drive device that evens out the imbalance in data-loss risk among groups caused by disk drive failures, by lending a drive from a low-risk group to a high-risk group and thereby distributing the risk.

A further object is to provide a disk drive device that, when a disk drive fails, decides whether to switch disk drives according to user-specified information such as the performance and reliability required of each group and, when it does switch, selects the switching target that best meets those requirements.

Yet another object is to provide a disk drive device in which the timing of the disk drive switching operation can be set for each group.

[Means for Solving the Problems] The present invention has been made to achieve the above objects. One aspect provides a disk drive device comprising a disk drive group having a plurality of logical drive groups, each composed of a plurality of drives, and control means for controlling the drives of each group, wherein the logical drive groups include spare drives and the control means has two modes. In the first mode, a failed drive is switched to a spare drive in the same group as the failed drive or to a spare drive controlled by the same control means as the failed drive. In the second mode, among the drives controlled by the same control means as the failed drive, a drive belonging to a group that has an unused spare drive is taken as the switching target drive; the switching target drive is first switched to the spare drive of its own group, and the failed drive is then switched to the switching target drive, which is controlled by the same control means as the failed drive. When any drive fails, the device enters the first mode and switches drives if there is a spare drive in the same group as the failed drive or a spare drive controlled by the same control means as the failed drive; if no such spare drive exists, it enters the second mode and switches drives.

Another aspect provides a disk drive device comprising a disk drive group having a plurality of logical drive groups, each composed of a plurality of drives, and control means for controlling the drives of each group, wherein the control means checks, for each logical group, a margin, namely the number of drive failures remaining before data is lost, and switches a drive belonging to a group with a large margin so that it belongs to a group with a small margin.

A further aspect provides a disk drive device comprising a plurality of drives and control means for controlling each drive, wherein the control means has a function of switching the drive in use and a function of performing that switching only during a preset time period.

In each of the above aspects, the drives that make up the groups are preferably arranged logically in a matrix in which drives belonging to the same logical group lie in the same row and drives controlled by the same control means lie in the same column. The drives may include data storage drives, on which data is recorded, and redundant data drives.

The control means may also hold one or more of the following parameters, each of which is a table.

A group table, consisting of the physical addresses of the data storage drives and redundant data drives that make up each group, together with a failure flag indicating the state of each drive.

A spare drive management table, consisting of the current number of spare drives, the physical address of each spare drive, and an invalid flag indicating whether that spare drive is unused.

A physical drive management table, which gives the attributes of the drive at each physical address. It consists of a failure flag indicating whether the drive is normal or failed, the number of the group to which the drive belongs, and a drive identifier indicating whether the drive is a data storage drive, a redundant data drive, or a spare drive.

A required level management table, which gives the performance and reliability levels required of each group. It consists of the performance level required of the group, the number of failed drives in the group, and the reliability level required of the group.

A switching scheduling table, which specifies, for each group, when the switching operation is to be performed after a drive failure.

A switching reservation table, which reserves a switching operation when switching is not performed immediately after a drive failure. It is set up for each group and consists of a failed drive address and a registration flag.
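One possible in-memory shape for the table entries listed above is sketched below. The class and field names are hypothetical; the patent names only the tables and the items they contain, and the choice of Python dataclasses is an assumption made for illustration. The spare entry models the flag as `valid` (true while the spare is unused), which is the opposite polarity of the invalid flag named above.

```python
from dataclasses import dataclass
from typing import Tuple

Address = Tuple[int, int]  # (row, column) position of a drive

@dataclass
class GroupEntry:            # one drive slot in the group table
    address: Address
    failed: bool = False     # failure flag

@dataclass
class SpareEntry:            # one entry of the spare drive management table
    address: Address
    valid: bool = True       # True while the spare is still unused

@dataclass
class PhysicalEntry:         # physical drive management table, one per address
    failed: bool             # failure flag
    group: int               # group number
    kind: str                # "data", "redundant", or "spare"

@dataclass
class RequirementEntry:      # required level management table, one per group
    performance_level: int   # required performance level
    failed_drives: int       # failed drives currently in the group
    reliability_level: int   # required reliability level

@dataclass
class ReservationEntry:      # switching reservation table, one per group
    address: Address = (0, 0)
    registered: bool = False # registration flag
```

The switching scheduling table is omitted here because its fields are only described later, in the embodiment.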

[Operation] When the control means detects a failed drive, it decides whether that drive should be switched now. To do so, it obtains the group number from the physical drive management table using the address of the failed drive, obtains the switching timing for that group from the switching scheduling table, and compares it with the current time. At the same time, it turns on the failure flag in the physical drive management table. If switching is permitted now, it is performed immediately; otherwise, the address is registered in the switching reservation table for the group and the registration flag is turned on. Registered failed drives are checked at a fixed interval, and when the switching time arrives, the registration flag is turned off and the switch is performed. Switching can thus be carried out according to the timing set for each group.
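The decision flow just described can be sketched as two small functions. This is a minimal illustration under stated assumptions: the tables are modelled as plain dictionaries, `may_switch_now` is a hypothetical callable standing in for the lookup in the switching scheduling table, and none of the names come from the patent.

```python
def on_failure(addr, physical, may_switch_now, reservations):
    """Handle a newly detected failed drive at `addr` (row, column)."""
    physical[addr]["failed"] = True           # turn the failure flag on
    group = physical[addr]["group"]           # group number from the table
    if may_switch_now(group):
        return "switch"                       # switch immediately
    # Otherwise reserve the switch for later.
    reservations[group] = {"address": addr, "registered": True}
    return "reserved"

def poll(reservations, may_switch_now):
    """Periodic check: return addresses whose switching time has arrived."""
    due = []
    for group, entry in reservations.items():
        if entry["registered"] and may_switch_now(group):
            entry["registered"] = False       # registration flag off
            due.append(entry["address"])
    return due
```

A caller would invoke `poll` at the fixed monitoring interval and perform the actual drive switch for every address it returns.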

When switching is performed, one of the following three states applies, depending on conditions such as where the spare drives are located.

The first and second states correspond to the first mode, and the third state corresponds to the second mode.

In the first state, a spare drive exists and lies in the same row or the same column as the address of the failed drive. In the second state, spare drives exist, but none lies in the same row or the same column as the failed drive. In the third state, no spare drive exists.

The current state is determined as follows. First, the number of spare drives is read from the spare drive management table. If it is 0, the state is the third state; otherwise it is the first or second state. To distinguish these two, let the address of the failed drive be (l, m). If the spare drive management table contains a spare drive address whose invalid flag is off and whose row is l or whose column is m, the state is the first state; if not, it is the second state.
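The state test can be written compactly. The sketch below is an illustration only: the spare drive management table is modelled as a list of `(row, column, available)` tuples, where `available` stands in for the flag marking a spare as unused, and the spare count is assumed to equal the number of available entries.

```python
def classify(failed_addr, spares):
    """Return 1, 2 or 3 for the state of the array when a drive fails.

    failed_addr: (l, m) address of the failed drive.
    spares: list of (row, column, available) tuples.
    """
    l, m = failed_addr
    free = [(r, c) for r, c, available in spares if available]
    if not free:
        return 3                      # no spare drive left
    if any(r == l or c == m for r, c in free):
        return 1                      # a spare shares the row or the column
    return 2                          # spares exist, but none is aligned
```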

The switching operation in the first state is as follows. From the group number of the failed drive, the data storage drives and redundant data drives of the group are looked up in the group table, the data of the failed drive is restored from the data on those drives, and the restored data is copied to the spare drive. The failed drive address in the group table is then changed to the spare drive address, the valid flag of the spare drive is turned off, and the number of spare drives is decremented by 1. Finally, in the physical drive management table, the group number and data identifier of the failed drive are copied to the spare drive that replaced it.
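The table updates of the first state can be sketched as below. Data restoration itself is left out, since the text does not specify the redundancy scheme. The dictionary layout and all names are assumptions made for illustration.

```python
def switch_to_spare(failed, spare, group_table, spare_table, physical):
    """Record that `spare` has taken over from `failed` (both (row, column))."""
    group = physical[failed]["group"]
    members = group_table[group]
    members[members.index(failed)] = spare     # failed address -> spare address
    spare_table["valid"][spare] = False        # the spare is now in use
    spare_table["count"] -= 1                  # one fewer spare available
    physical[spare]["group"] = group           # copy the group number
    physical[spare]["kind"] = physical[failed]["kind"]  # copy the identifier
```

In use, this would be called only after the restored data has been written to the spare drive.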

In the second state, simply switching to a spare drive would lower the performance of the group after the switch. The performance level required of the group is therefore checked in the required level management table, and if the performance requirement is low, the failed drive is switched directly to a spare drive; the switching operation is the same as in the first state.

If the performance requirement is high, the process begins by searching for a switching target drive, that is, a normal drive that lies in the same column as the failed drive and has a spare drive in its own row. To do this, a spare drive address whose valid flag is on is obtained from the spare drive management table. Let this address be (i, j) and the address of the failed drive be (l, m). The physical drive management table is then consulted to determine whether the drive at address (i, m) is normal; if it is, that drive becomes the switching target. If it has failed, the next spare drive is obtained, and this is repeated until a normal drive satisfying the condition is found. If no drive satisfies the condition, a spare drive becomes the switching target, and the switching operation is the same as in the first state.

When a drive satisfying the condition exists, the switching operation is as follows. The switching target drive (hereinafter C.DRV) is first copied to the spare drive (hereinafter S.DRV). Then, as in the first state, the data of the failed drive (hereinafter B.DRV) is restored and copied to C.DRV. Next, in the group table, the S.DRV address is updated to the C.DRV address, and the B.DRV address is updated to the C.DRV address. In the spare drive management table, the valid flag of S.DRV is turned off and the number of spare drives is decremented by 1. In the physical drive management table, the group number and drive identifier of C.DRV are copied to S.DRV, and then the group number and drive identifier of B.DRV are copied to C.DRV. In this way, each group retains the same performance after a switch to a spare drive as it had before the drive failure, and drive switching suited to the performance requirement of each group can be provided.
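The search for the switching target in the second state can be sketched as follows. This is an illustration under assumptions: `spares` is taken to be the list of spare addresses whose valid flag is on, `is_normal` is a hypothetical callable standing in for the lookup in the physical drive management table, and the fallback simply returns the first available spare.

```python
def find_switch_target(failed_addr, spares, is_normal):
    """Return (c_drv, s_drv); c_drv is None when no drive qualifies.

    failed_addr: (l, m) address of the failed drive (B.DRV).
    spares: addresses (i, j) of spare drives that are still unused.
    is_normal: callable telling whether the drive at an address is healthy.
    """
    l, m = failed_addr
    for i, j in spares:
        candidate = (i, m)   # same column as B.DRV, same row as this spare
        if is_normal(candidate):
            return candidate, (i, j)
    # No aligned healthy drive: fall back to switching straight to a spare.
    return None, (spares[0] if spares else None)
```

When a candidate is found, the caller copies C.DRV to S.DRV and then restores the data of B.DRV onto C.DRV, as described in the text.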

Because the failed drive cannot be recovered in the third state, the risk of data loss is distributed in accordance with the reliability level required of the group. As the distribution method, the number of failed drives of the group in the required level management table is incremented by 1 and compared with the reliability level. When the reliability requirement is high and the margin before data loss is small, a drive is lent from a group whose margin before data loss is large and whose reliability requirement is low. To do this, the number of the lending group is determined from the reliability levels and the numbers of failed drives, and a drive in the same column as the failed drive is sought within that group and taken as the drive to lend. This prevents performance from degrading after the loan. If there is no drive in the same column, a drive in any column is used.

In the switching operation for the loan, the data of the failed drive is restored, as in the first state, and copied to the switching target drive. Next, in the group table, the address of the failed drive is updated to the address of the switching target drive, and the failure flag of the switching target drive is turned on. Then, in the physical drive management table, the group number and drive identifier of the failed drive are copied to the switching target drive. In the required level management table, the number of failed drives of the group containing the failed drive is decremented by 1, and the number of failed drives of the group containing the switching target drive is incremented by 1. Risk is thus distributed in a way that matches the reliability required of each group.
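The lending step can be sketched as below. The text does not spell out how the lending group is chosen beyond "low reliability requirement, large margin", so the selection rule here (a group with reliability level 0 and fewer failed drives than the borrowing group, preferring the fewest) is an assumption, as are all names.

```python
def pick_donor(groups, needy):
    """Choose a group that lends a drive to group `needy`, or None.

    groups: {group_no: {"reliability": 0 or 1, "failed": int}}
    """
    candidates = [
        g for g, info in groups.items()
        if g != needy
        and info["reliability"] == 0                  # low reliability requirement
        and info["failed"] < groups[needy]["failed"]  # larger remaining margin
    ]
    # Prefer the donor with the fewest failed drives (largest margin).
    return min(candidates, key=lambda g: groups[g]["failed"], default=None)

def lend(groups, needy, donor):
    """Counter updates once the lent drive holds the failed drive's data."""
    groups[needy]["failed"] -= 1
    groups[donor]["failed"] += 1
```

Within the donor group, the drive in the same column as the failed drive would be preferred, as the text explains, so that the borrowing group keeps its parallelism.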

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows a computer system that includes a disk drive device to which the present invention is applied.

This computer system consists of a CPU 60, which is the central processing unit, and a disk drive device composed mainly of a disk array control device 70 and a disk array device 80.

The disk array control device 70 controls the disk array device 80 in accordance with instructions from the CPU 60.

FIG. 2 shows the internal configuration of the disk array control device 70.

The microprocessor unit 74 executes the program stored in the random access memory 75, decoding it sequentially, and controls the whole of the disk array control device 70. The channel control circuit 71 controls data transfer with the CPU 60. The control circuit 72 controls data transfer with the drives. The data buffer 73 is used when data is transferred between the channel control circuit 71 and the control circuit 72. The redundant data creation circuit 76 adds redundant data to the data sent from the CPU 60. The clock 77 is the clock provided in this subsystem.

On a read, the data read from each drive of the disk array device 80 is held temporarily in the data buffer 73, synchronized, and then sent to the CPU 60 by the channel control circuit 71.

FIG. 3 shows the configuration of the drives that make up the disk array device 80.

The data transfer control circuits 81 to 87 transfer data to and from the disk array control device 70. Four drives are connected to each of them: drives 81a to 81d, 82a to 82d, ..., 87a to 87d.

The drives under data transfer control circuits 81, 82, 83, and 84 are assigned as data storage drives, the drives under data transfer control circuits 85 and 86 as redundant data drives, and the drives under data transfer control circuit 87 as spare drives.

The drives are divided into four data recovery groups (hereinafter, "groups"): drives 81a to 87a, drives 81b to 87b, drives 81c to 87c, and drives 81d to 87d. From here on, these groups are called groups Ga, Gb, Gc, and Gd, in order starting from the one nearest the data transfer control circuits 81 to 87.

Each drive is addressed by treating the drive layout as a matrix: positions run from 0 to 6 along the row direction and from 0 to 3 along the column direction, giving addresses (0,0) through (3,6).
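As an illustration of this addressing, the sketch below maps a drive label to a (row, column) pair. It assumes that drive 81a sits at (0,0) and drive 87d at (3,6), so that the row index selects the group (a to d) and the column index selects the data transfer control circuit (81 to 87); the text gives only the address range, so this mapping and the function names are assumptions.

```python
def address_of(name):
    """Map a drive label such as "83b" to its (row, column) address."""
    circuit, letter = int(name[:-1]), name[-1]
    return ("abcd".index(letter), circuit - 81)

def role_of(address):
    """Initial role implied by the column: 81-84 data, 85-86 redundant, 87 spare."""
    column = address[1]
    if column <= 3:
        return "data"
    return "redundant" if column <= 5 else "spare"

print(address_of("83b"))           # row 1 (group Gb), column 2 (circuit 83)
print(role_of(address_of("87d")))  # a drive under circuit 87
```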

The microprocessor unit 74 of this embodiment takes into account factors such as the reliability required of each group and drive, selects the drive best suited to satisfying those requirements, and switches drives accordingly. Specifically, the various tables described below, into which the requirements and conditions have been entered in advance, are held in the random access memory 75 or similar storage, and the drive to switch to is selected by referring to these tables.

These tables are described with reference to FIGS. 4 to 9.

The group table 90 shown in FIG. 4 holds the address information of the drives that make up each group.

In this table, row 100 corresponds to group Ga, row 101 to group Gb, row 102 to group Gc, and row 103 to group Gd. For each group, the table holds drive address information 91 to 94 for the data storage drives and drive address information 95 and 96 for the redundant data drives.

Each item of drive address information 91 to 94 consists of a failure flag 104, which indicates that the data is inaccessible because of a drive failure, the row address 105 of the drive, and the column address 106 of the drive.

FIG. 5 shows the spare drive management table 110.

The spare drive management table 110 holds the address information of the current spare drives. Each entry consists of a valid flag 111, which indicates that the spare drive has not yet been used, and the row address 112 and column address 113 of the spare drive. The table also holds a spare drive count 114, which gives the current number of spare drives.

FIG. 6 shows the physical drive management table 120.

The physical drive management table 120 holds the management information for the drive at the address (I, J), where I and J are the numbers shown along the top and the left side of the table in the figure, and is used to indicate that the drive is a failed drive.

For each drive, the table holds a failure flag 121, an identifier 122 indicating whether the drive is a data storage drive, a redundant data drive, or a spare drive, and the number 123 of the group to which the drive belongs.

FIG. 7 shows the required level management table 130.

The required level management table 130 gives the performance and reliability levels required of each group. In this table, row 131 corresponds to group Ga, row 132 to group Gb, row 133 to group Gc, and row 134 to group Gd.

The entry for each group consists of the performance level 136 required of the group, the number of failed drives 137 in the group, and the reliability level 138 required of the group.

For the performance level 136, "0" is defined as "low" and "1" as "high". When the performance level 136 is high, a switch caused by a disk failure goes to a drive that does not degrade performance after the switch; when it is low, the switch simply goes to a spare drive.

For the reliability level 138, likewise, "0" is defined as "low" and "1" as "high". For a group whose reliability level 138 is high, when the number of failed drives in the group reaches two, a drive is switched in from a group whose reliability level 138 is low. For a group whose reliability level 138 is low, switching is performed under the same condition, and the switching target drive is again selected from a group whose reliability level 138 is low.

FIG. 8 shows the switching scheduling table 140. This table specifies, for each group, when drive switching is to be performed.

In the switching scheduling table 140, row 141 corresponds to group Ga, row 142 to group Gb, row 143 to group Gc, and row 144 to group Gd. The data for each group consists of a parameter 145, a start time 146, and an end time 147.

When the value of the parameter 145 is 0, drive switching is executed immediately. When the value is 1, drive switching is executed only within the time window specified by the start time 146 and the end time 147. When the value is 2, switching is executed only on the days of the week indicated by the start time 146 through the end time 147. When the value is 3, switching is executed only on the dates specified by the start time 146 and the end time 147.
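
The four parameter values can be sketched as a single check. This is a minimal sketch under assumptions: the description does not say how the start time 146 and end time 147 are encoded, so the sketch assumes times of day for value 1, weekday numbers (Monday = 0) for value 2, and calendar dates for value 3. The names `ScheduleEntry` and `may_switch_now` are hypothetical, and windows that wrap past midnight or past the end of the week are not handled.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Any


@dataclass
class ScheduleEntry:
    parameter: int     # field 145: 0, 1, 2 or 3
    start: Any = None  # field 146 (encoding is an assumption, see above)
    end: Any = None    # field 147 (encoding is an assumption, see above)


def may_switch_now(entry: ScheduleEntry, now: datetime) -> bool:
    """Return True if drive switching is allowed at the instant `now`."""
    if entry.parameter == 0:
        return True                # switch immediately
    if entry.parameter == 1:
        value = now.time()         # time-of-day window
    elif entry.parameter == 2:
        value = now.weekday()      # day-of-week range
    elif entry.parameter == 3:
        value = now.date()         # calendar-date range
    else:
        raise ValueError(f"unknown parameter value: {entry.parameter}")
    return entry.start <= value <= entry.end
```

For example, `ScheduleEntry(1, time(22, 0), time(23, 30))` would allow switching only late in the evening.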

FIG. 9 shows the switching reservation table 150.

The switching reservation table 150 is used when the value of the parameter 145 is not 0, that is, for a failed drive whose switching is not executed immediately. It reserves the switch so that it is performed within the time window specified by the start time 146 and the end time 147 of the switching scheduling table 140 described above.

A reservation is made by registering the row address 152 and the column address 153 of the failed drive and turning on the registration flag 151.

When switching of the drive has been completed, the drive's registration is deleted by turning off the registration flag 151.
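
The reservation mechanism amounts to a fixed set of slots guarded by a flag. The following is a rough sketch; `ReservationSlot`, `reserve`, and `release` are hypothetical names, and the table size is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReservationSlot:
    registered: bool = False  # registration flag 151
    row: int = 0              # row address 152
    col: int = 0              # column address 153


def reserve(table: List[ReservationSlot], row: int, col: int) -> Optional[int]:
    """Register a failed drive in the first free slot; return its index or None."""
    for index, slot in enumerate(table):
        if not slot.registered:
            slot.row, slot.col = row, col
            slot.registered = True  # turning the flag on makes the reservation
            return index
    return None  # table is full


def release(table: List[ReservationSlot], index: int) -> None:
    """Delete a reservation by turning its flag off once switching is done."""
    table[index].registered = False
```

Because deletion only clears the flag, stale addresses may remain in a slot; they are overwritten the next time the slot is reserved.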

Next, the operation of searching for the drive to be switched when drive (x, y) fails is described with reference to the flowcharts shown in FIGS. 10(a), (b), (c), and (d).

First, the group to which the failed drive (x, y) belongs is determined using the physical drive management table 120 (step 200). Next, the switching execution time for that group is obtained from the scheduling table 140 (step 201).

It is then determined whether the parameter 145 is 0, that is, whether switching is to be executed immediately (step 202). When immediate switching is not required, the switch is reserved by the processing of steps 220 and 221 shown in FIG. 10(d).

In step 220, the microprocessor unit 74 searches the switching reservation table 150 for an area whose registration flag 151 is off. It then turns on the registration flag 151 of that area and registers the address of the failed drive (step 221).

For a failed drive that has been reserved, it is thereafter determined, for example at a fixed interval, whether the current time given by the clock 77 lies between the start time 146 and the end time 147. When the switching time has been reached, switching is performed by the logic from step 203 onward.

When it is determined in step 202 that switching is to be executed immediately, the spare drive management table 110 is searched to determine whether any entry has its valid flag 111 on, that is, whether a usable spare drive exists (step 203). If at least one usable spare drive exists, it is then determined in steps 204 and 205 whether any of the usable spare drives is located at a position that causes no performance degradation.

In step 204, whether a spare drive exists in the same row (x) as the failed drive is determined by searching the row address 112 of the spare drive management table 110. If one exists, the process proceeds to step 213, where the drive found is taken as the drive to be switched and the switch is made.

If there is no spare drive in the same row, whether a spare drive exists in the same column (y) is determined by searching the column address 113 of the spare drive management table 110 (step 205). If one exists, the process proceeds to step 212, where the drive found is taken as the drive to be switched and the switch is made.

If there is no spare drive in either row x or column y, simply switching to a spare drive cannot prevent performance degradation.

Therefore, to learn the performance required of the group, the performance level is obtained from the performance level 136 of the request level management table 130 (step 206), and its value is checked (step 207).

If the performance level 136 is 0, that is, the required performance is low, any one of the existing spare drives is taken as the drive to be switched and the switch is made (step 211). If the performance level 136 is 1, that is, the required performance is high, the process proceeds to step 208.
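
The decision sequence of steps 203 to 207 can be summarized as a small function. This is only a sketch: addresses are assumed to be (row, column) tuples, the list of usable spares stands in for the spare drive management table 110, and the function name and the returned labels are hypothetical.

```python
from typing import List, Optional, Tuple

Address = Tuple[int, int]  # (row x, column y); the tuple layout is an assumption


def choose_spare(failed: Address, spares: List[Address],
                 performance_level: int) -> Tuple[str, Optional[Address]]:
    """Return a (decision, spare) pair following steps 203 to 207."""
    if not spares:
        return ("no_spare", None)        # step 203: continue with FIG. 10(c)
    x, y = failed
    for spare in spares:
        if spare[0] == x:
            return ("same_row", spare)   # step 204, then step 213
    for spare in spares:
        if spare[1] == y:
            return ("same_column", spare)  # step 205, then step 212
    if performance_level == 0:
        return ("any_spare", spares[0])  # steps 206-207, then step 211
    return ("two_step", None)            # high performance: go to step 208
```

The row check runs before the column check, mirroring the order of steps 204 and 205.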

To prevent performance degradation, the number of drives operating in parallel must not be reduced. It is therefore preferable to select the drive to be switched from the same column (y) as the failed drive. In doing so, however, care must be taken that the group containing the selected drive does not itself suffer performance degradation. Step 208 therefore searches for a drive that satisfies these conditions.

Specifically, for each drive in column y of the physical drive management table 120, in other words each drive in the same column as the failed drive, whether a spare drive exists in that drive's row is determined using the identifier 122 of the physical drive management table 120. For example, if the failed drive is (2,1) and a spare drive exists at (6,3), then (2,3) is the drive that satisfies the above condition.

When a drive has been selected in step 208, the process proceeds to step 209.

In step 209, the drive found in step 208 is first switched with the spare drive in the same row as that drive. After that, the failed drive is switched with the drive found in step 208 (step 210).

In this way, by first switching another drive in the same column as the failed drive with the spare drive in that other drive's row, and then switching the failed drive with that other drive, the failed drive can be switched to a drive in the same column.
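
The two-step switch of steps 208 to 210 can be sketched as a search for an intermediate drive followed by an ordered pair of swaps. The sketch uses its own convention of (row, column) tuples, and the set of working drive addresses and the list of spare addresses are simplified stand-ins for the physical drive management table 120. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Address = Tuple[int, int]  # (row, column); this sketch's own convention


def find_relay(failed: Address, normal_drives: Set[Address],
               spares: List[Address]) -> Optional[Tuple[Address, Address]]:
    """Step 208: find a working drive in the failed drive's column whose row holds a spare."""
    spare_by_row: Dict[int, Address] = {}
    for spare in spares:
        spare_by_row.setdefault(spare[0], spare)
    for drive in sorted(normal_drives):
        if drive != failed and drive[1] == failed[1] and drive[0] in spare_by_row:
            return drive, spare_by_row[drive[0]]
    return None


def two_step_plan(failed: Address, normal_drives: Set[Address],
                  spares: List[Address]) -> List[Tuple[Address, Address]]:
    """Return the swaps in execution order, or an empty list if no relay exists."""
    found = find_relay(failed, normal_drives, spares)
    if found is None:
        return []
    relay, spare = found
    return [(relay, spare),   # step 209: the relay drive moves onto its row's spare
            (failed, relay)]  # step 210: the failed drive is replaced by the relay
```

The order matters: the relay drive must be moved onto the spare before it can take the failed drive's place.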

With this method, a group that demands high performance can maintain its performance when a drive fails.

The actual switching operation is described from step 230 onward.

Next, the case in which it is determined in step 203 that no spare drive exists is described with reference to FIG. 10(c).

In step 214, 1 is added to the value of the number of failed drives 137 in the request level management table 130. The number of failed drives 137 thus indicates the group's risk of data loss after the failure has occurred.

In this embodiment, each group has two redundant data drives, so the margin before data loss is two drive failures. Therefore, in step 215, the number of failed drives is checked; if it is two or fewer, the risk of data loss is judged to be low and the operation ends.

On the other hand, when two drives have failed, that is, when no margin remains, the risk of data loss is very high, so the drive to be switched is selected so as to spread the risk across the entire disk array device 80 (step 216).

The optimization logic for drive selection in step 216 is described later, from step 240 of FIG. 12 onward.

The failed drive is then switched with the drive selected in step 216 (step 217).

As a result, the group determined in step 200, that is, the group to which the failed drive belongs, has been lent one drive by the group to which the drive selected in step 216 belongs. Accordingly, in step 218, the value of the number of failed drives 137 for the group to which the failed drive belongs is decreased by 1, while that for the lending group is increased by 1 (step 219). The operation then ends.
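
The bookkeeping of steps 214 to 219 can be sketched as follows. The sketch assumes that borrowing is triggered once the failed-drive count reaches two, which is the no-margin case the description singles out. The `pick_donor` callback is a hypothetical stand-in for the selection logic of step 216, and the physical switch of step 217 is omitted.

```python
from typing import Callable, Dict, Optional


def handle_failure_without_spare(
    failed_counts: Dict[str, int],
    group: str,
    pick_donor: Callable[[str], Optional[str]],
    threshold: int = 2,  # assumption: borrow when the count reaches two
) -> Optional[str]:
    """Update the failed-drive counts 137 and return the lending group, if any."""
    failed_counts[group] += 1             # step 214
    if failed_counts[group] < threshold:  # step 215: margin remains
        return None
    donor = pick_donor(group)             # step 216
    if donor is None:
        return None                       # no group can lend a drive
    # Step 217 (the actual drive switch) would happen here.
    failed_counts[group] -= 1             # step 218: borrower regains a drive
    failed_counts[donor] += 1             # step 219: lender is one drive short
    return donor
```

After a successful loan the total number of failed drives is unchanged; it has only been redistributed between the two groups.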

Next, the drive switching operation is described with reference to the flowchart of FIG. 11.

This operation applies to steps 209, 210, 211, 212, 213, and 217.

The following description covers the operation of switching drive A at address (x, y) with drive B at address (x′, y′).

There are three combinations of drives on which this operation is performed: a normal drive and a spare drive, a failed drive and a spare drive, and a failed drive and a normal drive.

After the operation starts, the microprocessor unit 74 first determines whether either drive A or drive B, the drives being switched, is a failed drive (step 230). If either is a failed drive, the failure flag 121 of that failed drive in the physical drive management table 120 is turned on, and the failure flag 104 corresponding to that failed drive in the group table 90 is also turned on (step 231). The process then proceeds to step 232.

If no failed drive is involved in step 230, the process proceeds directly to step 232.

In step 232, the identifiers 122 and the group numbers 123 of drive A and drive B in the physical drive management table 120 are exchanged. This completes the switch in terms of physical addresses.
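
Steps 230 to 232 can be sketched against a simplified physical drive management table keyed by address. The identifier strings ("data", "spare") and all names are assumptions made for illustration, and the group table 90 update of step 231 is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[int, int]


@dataclass
class PhysicalEntry:
    failed: bool          # failure flag 121
    identifier: str       # identifier 122 (the string values are assumed)
    group: Optional[int]  # group number 123 (None for an unassigned spare)


def swap_physical(table: Dict[Address, PhysicalEntry], a: Address, b: Address,
                  failed_addr: Optional[Address] = None) -> None:
    """Mark the failed drive, if any, then exchange roles of drives A and B."""
    if failed_addr is not None:          # steps 230-231
        table[failed_addr].failed = True
    entry_a, entry_b = table[a], table[b]
    # Step 232: exchange identifier 122 and group number 123.
    entry_a.identifier, entry_b.identifier = entry_b.identifier, entry_a.identifier
    entry_a.group, entry_b.group = entry_b.group, entry_a.group
```

Only the role information moves; the failure flag stays with the physical address of the drive that actually failed.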

Next, it is determined whether a spare drive is involved (step 233). If so, the process proceeds to step 234; if not, it proceeds to step 238.

In step 234, the contents of the row address 112 and the column address 113 in the spare drive management table 110 are changed to the address of the normal or failed drive being switched, and the contents of the failure flag 104 in the group table 90 are copied to the valid flag 111 of the spare drive management table 110.

Next, it is determined whether the valid flag 111 is 1 (step 235). If the valid flag 111 is 1, that is, a spare drive and a failed drive have been switched, the value of the number of spare drives 114 is decreased by 1 (step 236) and the process proceeds to step 237. If the valid flag 111 is 0, that is, a spare drive and a normal drive have been switched, the process proceeds directly to step 237.

In step 237, the row address 112 and the column address 113 of the spare drive are copied to the row address 105 and the column address 106 of the drive being switched in the group table 90, and the switching operation ends.

If it is determined in step 233 that no spare drive is involved, the address information of the two drives being switched, that is, the failure flag 104, the row address 105, and the column address 106, is exchanged between them in the group table 90 (step 238), and the switching operation ends.

Steps 234 to 238 above complete the switch of the addresses of the drives that make up a group and of the spare drive address information.

Next, the method of selecting the optimal drive in step 216 described above is explained with reference to the flowchart shown in FIG. 12.

When the number of failed drives in the group reaches two, the risk of data loss for the group is very high, because this system holds redundant data sufficient to correct only two drives.

A group that will lend one drive must therefore be selected first. To that end, step 240 searches for groups whose reliability level 138 in the request level management table 130 is 0, that is, groups whose required reliability is low.

This is so that the drive is supplied by a group whose reliability requirement is comparatively low.

Next, it is determined whether a group with reliability level 0 exists (step 241). If one exists, the process proceeds to step 242; if not, it proceeds to step 244.

Step 242 searches for a group whose reliability level 138 is 0 and whose number of failed drives is 0. This is determined by whether the value of the number of failed drives 137 for the group in the request level management table 130 is 0. The group with the lowest risk can thereby be found.

Next, it is determined whether a group satisfying the condition of step 242 exists (step 243). If one exists, that group is chosen as the group that supplies the drive (step 247). If none exists, the process proceeds to step 244.

If no group satisfying the required condition is found in step 241 or step 243, that is, if there is no low-reliability group, or if there is one but it already contains a failed drive, the process proceeds to step 244 as described above.

In step 244, the reliability level of the group containing the failed drive is first determined. If the reliability level is low, that is, the value of the reliability level 138 is 0, it is determined that there is no group to switch with (step 248). Conversely, if the reliability level is high, that is, the value of the reliability level 138 is 1, a group whose number of failed drives is 0 is then sought (step 245).

It is then determined whether a group satisfying the above conditions, that is, a group with a high reliability level and zero failed drives, exists (step 246). If one exists, the process proceeds to step 247 and that group becomes the group to switch with. Conversely, if none exists in step 246, there is no group to switch with (step 248).
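
The group selection of steps 240 to 248 can be condensed into one function. This is a sketch under assumptions: the request level management table 130 is reduced to a mapping from group name to a (reliability level, failed drives) pair, the first matching group in iteration order is returned, and the failing group is explicitly skipped as a candidate. The function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# group name -> (reliability level 138, number of failed drives 137)
Levels = Dict[str, Tuple[int, int]]


def select_donor_group(levels: Levels, failing_group: str) -> Optional[str]:
    """Return the group that should lend a drive, or None (step 248)."""
    # Steps 240-243: prefer a low-reliability group with no failed drives.
    for group, (reliability, failed) in levels.items():
        if group != failing_group and reliability == 0 and failed == 0:
            return group  # step 247
    # Step 244: a low-reliability failing group may not borrow from others.
    if levels[failing_group][0] == 0:
        return None       # step 248
    # Steps 245-246: otherwise accept a high-reliability group with no failures.
    for group, (reliability, failed) in levels.items():
        if group != failing_group and reliability == 1 and failed == 0:
            return group  # step 247
    return None           # step 248
```

Low-reliability donors are tried first so that groups with strict reliability requirements give up a drive only as a last resort.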

Steps 240 to 248 of FIG. 12 determine whether a group to switch with exists and, if so, at least one such group number.

Which drive in that group is best to switch with can be determined by applying the same processing as steps 204 to 208 of FIG. 10(a) to the drives in that group.

As described in the above embodiment, when a drive fails, a replacement drive suited to the performance and reliability required of each group can be selected whether or not a spare drive exists. The performance balance and reliability balance of the disk drive device as a whole can therefore be maintained.

[Effects of the Invention] As described above, according to the present invention, when a drive fails, performance can be maintained after switching provided that at least one spare drive exists. Even when no spare drive exists, drives are lent between groups so as to spread the risk of data loss across all drives, which improves reliability. Furthermore, the time window and the like in which the above processing is performed can be specified, so user requirements can be accommodated flexibly. In addition, the present invention can be implemented with a microprogram alone, without any hardware changes.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a computer system using a disk drive device according to one embodiment of the present invention; FIG. 2 is a block diagram showing the configuration of the disk array control device of this embodiment; FIG. 3 is an explanatory diagram showing the configuration of a disk array device according to one embodiment of the present invention; FIG. 4 is a configuration diagram of the group table used in this embodiment; FIG. 5 is a configuration diagram of the spare drive management table used in this embodiment; FIG. 6 is a configuration diagram of the physical drive management table used in this embodiment; FIG. 7 is a configuration diagram of the request level management table used in this embodiment; FIG. 8 is a configuration diagram of the switching scheduling table used in this embodiment; FIG. 9 is a configuration diagram of the switching reservation table used in the present invention; FIG. 10 is a flowchart showing the procedure for searching for the drive to be switched; FIG. 11 is a flowchart showing the switching operation procedure; FIG. 12 is a flowchart showing the switching drive search operation for distributing the risk; FIG. 13 is an explanatory diagram showing a conventional disk array device; and FIG. 14 is an explanatory diagram showing the concept of groups in a conventional disk array.

60: central processing unit, 70: disk array control device, 71: channel control circuit, 72: control circuit, 73: data buffer, 74: microprocessor unit, 75: random access memory, 76: redundant data creation circuit, 77: clock, 80: disk array device, 81 to 87: data transfer control circuits, 81a,b,c,d to 87a,b,c,d: drives, 90: group table, 91 to 94: drive address information for data storage, 95: drive address information for redundant data, 96: drive address information for redundant data, 100: row, 101: row, 102: row, 103: row, 104: failure flag, 105: row address, 106: column address, 110: spare drive management table, 111: valid flag, 112: row address, 113: column address, 114: number of spare drives, 120: physical drive management table, 121: failure flag, 122: identifier, 123: group number, 130: request level management table, 131: row, 132: row, 133: row, 134: row, 136: performance level, 137: number of failed drives, 138: reliability level, 140: switching scheduling table, 141: row, 142: row, 143: row, 144: row, 145: parameter, 146: start time, 147: end time, 150: switching reservation table, 151: registration flag, 152: row address, 153: column address, 301 to 304: hard disk controllers, 310 to 312, 320 to 322, 330 to 332, 340 to 342: disk drives, 350a,b to 355a,b: disk drives.

Continuation of the front page (56) References: JP-A-2-236714 (JP, A); JP-A-2-214061 (JP, A); JP-A-58-22465 (JP, A); JP-A-4-153727 (JP, A)

Claims

(57) [Claims]

1. A disk drive device comprising: a disk drive group having a plurality of logical drive groups each composed of a plurality of drives; and control means for controlling the drives of each group, wherein the logical drive groups have spare drives, the device having: a first mode for switching a failed drive to a spare drive in the same group as the failed drive or to a spare drive controlled by the same control means as the control means that controls the failed drive; and a second mode in which, among the drives controlled by the same control means as the control means that controls the failed drive, a drive belonging to a group having an unused spare drive is set as a drive to be switched, the drive to be switched is switched to the spare drive of the same group as the drive to be switched, and then the failed drive is switched with the drive to be switched, which is controlled by the same control means as the failed drive; wherein, when a failure occurs in any of the drives, the device enters the first mode and switches drives if a spare drive exists in the same group as the failed drive or a spare drive controlled by the same control means as the control means that controls the failed drive exists, and enters the second mode and switches drives if no such spare drive exists.

2. A disk drive device comprising: a disk drive group having a plurality of logical drive groups each composed of a plurality of drives; and control means for controlling the drives of each group, wherein the control means has a function of checking, when a drive failure occurs, a margin, which is the number of drives that can still fail before data loss occurs, and of switching a drive belonging to a group with a large margin so that it belongs to a group with a small margin.

3. A disk drive device comprising a plurality of drives and control means for controlling each drive, wherein said control means has a function of switching the drive to be used and performs the switching only within a predetermined time period.