JP3063666B2

JP3063666B2 - Array disk controller

Info

Publication number: JP3063666B2
Application number: JP9080998A
Authority: JP
Inventors: 善昭森
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-03-31
Filing date: 1997-03-31
Publication date: 2000-07-12
Anticipated expiration: 2017-03-31
Also published as: JPH10275060A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an array disk controller that controls an array disk in which data is distributed and written across a plurality of disks, and more particularly to an array disk controller that disconnects a disk when a failure occurs in a disk or a disk interface.

[0002]

[Prior Art] As disclosed in Japanese Patent Application Laid-Open No. 4-67476, an array disk controller has been proposed that monitors the occurrence of read errors or seek errors in magnetic disk devices and disconnects a magnetic disk that has a medium abnormality. It detects the state of the medium from the number of errors and forcibly disconnects the disk when the number of errors exceeds a predetermined value.

[0003]

[Problems to Be Solved by the Invention] However, the conventional technique disclosed in Japanese Patent Application Laid-Open No. 4-67476 must store information on the position of each medium defect in order to know how frequently medium abnormalities occur, and this requires a very large storage capacity.

[0004] Furthermore, in the conventional technique, even when the abnormality lies in the disk interface control unit, the disk connected to that disk interface control unit is treated as faulty and disconnected.

[0005] Specifically, the medium quality of recent magnetic disks has improved, so failures of the recording medium are relatively rare, and even when one occurs, alternate-sector processing and similar measures keep the damage small. Conversely, as recording density rises, failures that make the entire disk inaccessible, such as the head sticking to the recording surface, a spindle motor failure, or a failure of the interface between the array disk controller and the disk, are more likely than a failure of part of the recording medium. The damage they cause is also greater, because the entire disk becomes inaccessible.

[0006] Failures of the disk itself, such as the head sticking to the recording surface or a spindle motor failure, can be handled by disconnecting the disk as an extension of the conventional technique, which improves reliability. For a disk interface failure, however, it is difficult to tell whether the fault lies in the disk itself or in the interface section of the array disk controller, and disconnecting the disk by counting the error history, as the conventional technique does, can have adverse effects.

[0007] For example, in an array disk device configured so that each of the disks 71 to 74 can be accessed from two array disk controllers, as shown in FIG. 6, if the disk interface control circuit 42A in array disk controller 10A fails, processing can continue by disconnecting disk 52A, but this disconnects a healthy disk.

[0008] Furthermore, in the conventional example, when one of the plurality of disks has a failure in part of its area and another disk is disconnected because a failure has occurred on it, the data in the other, fault-free portions of that one disk can no longer be read. That is, if disconnection is performed simply on the basis of the number of failures, even recoverable data is lost.

[0009] First, in a redundant array disk, when a write cannot be performed because of a medium failure, the failed disk must be disconnected before processing continues, in order to keep the data disks that store data consistent with the redundant disk that stores parity data.

[0010] For reads, however, the data can be recovered using the redundant disk and transferred to the host computer, so processing may be continued with only a temporary disconnection rather than disconnecting the entire disk, which would have a larger impact. Alternatively, when a relatively minor failure has occurred on the medium but can be remedied by retrying, the disk may not be disconnected.

[0011] A state can therefore arise in which such a partial medium failure exists but the disk has not been disconnected. If a severe failure then occurs on another disk, a method that disconnects disks solely by counting failure frequency, as in the conventional technique, disconnects that disk and continues processing.

[0012] When the disk is disconnected in this state, processing can continue, but redundancy is lost, so the portion affected by the partial medium failure can no longer be read. Data that has already been written cannot be read; that is, data is lost, which is a more serious problem than being unable to write data and having to suspend processing.

[0013]

[Object of the Invention] The object of the present invention is to remedy the disadvantages of the conventional example and, in particular, to provide an array disk controller that can disconnect disks accurately using a small amount of failure history information and thereby improve reliability.

[0014]

[Means for Solving the Problems] The present invention therefore comprises a host interface control unit that receives data transmitted from a host device, and array control means that stores the data received by the host interface across a plurality of disks in a redundant configuration. The array control means is further provided with a disk failure management table in which a predetermined value is stored according to disk failures. The array control means has a disk failure value addition function that increases the value in the disk failure management table when a disk failure occurs, a disk failure value subtraction function that decreases the value in the disk failure management table when data is stored to or read from the disk normally, and a function that disconnects the interface concerned when the disk failure value in the disk failure management table exceeds a predetermined threshold.

[0015] The array control means disconnects a disk according to the disk's failure state. For this purpose, the present invention uses a disk failure management table in which a predetermined value is stored according to disk failures. Using the disk failure value addition function, the array control means increases the value in the disk failure management table when a disk failure occurs. To prevent a disk from being disconnected because of intermittent failures, the disk failure value subtraction function decreases the value in the table when data is stored to or read from the disk normally. Using the disk disconnection function, the array control means then disconnects the interface concerned when the disk failure value in the table exceeds a predetermined threshold. Because of the subtraction function, the disk failure management table accurately reflects the difference between unrecoverable media failures and intermittent read/write errors. As a result, no disconnection is performed when a minor failure does not call for one.
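The add, subtract, and threshold behaviour described in this paragraph can be sketched in a few lines of Python. This is only an illustration: the class name `FailureCounter`, the numeric values, and the choice to clamp the value at zero are assumptions, not details taken from the specification.

```python
class FailureCounter:
    """Hypothetical model of one entry in the disk failure management table."""

    def __init__(self, add_value, sub_value, threshold):
        self.add_value = add_value    # added on each failure
        self.sub_value = sub_value    # subtracted on each normal access
        self.threshold = threshold    # disconnect when the value exceeds this
        self.value = 0

    def on_failure(self):
        self.value += self.add_value

    def on_success(self):
        # Assumption: the value never drops below zero.
        self.value = max(0, self.value - self.sub_value)

    def should_disconnect(self):
        return self.value > self.threshold


# Illustrative values only.
intermittent = FailureCounter(add_value=4, sub_value=1, threshold=10)
intermittent.on_failure()
for _ in range(4):
    intermittent.on_success()

persistent = FailureCounter(add_value=4, sub_value=1, threshold=10)
for _ in range(3):
    persistent.on_failure()
```

In this sketch the single failure on `intermittent` is erased by later normal accesses, while the repeated failures on `persistent` push its value past the threshold.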

[0016] In the present invention in particular, the array control means further divides the plurality of disks into a plurality of areas, and the disk failure management table has a table entry that manages the failure history for each divided area. The disk failure value addition function adds a constant value a when a disk failure occurs. The threshold of the disk failure management table is set to n times the constant value a, where n is set to the number of times the host device automatically retries the same data. The disk failure value subtraction function subtracts a value smaller than the constant value a, chosen according to the number of divided areas of the disk. The invention seeks to achieve the object described above with this configuration.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described below. The array disk controller according to this embodiment comprises a host interface control unit that receives data transmitted from a host device, and array control means that stores the data received by the host interface across a plurality of disks in a redundant configuration. The array control means is provided with a disk (media) failure management table in which a predetermined value is stored according to disk failures.

[0018] In addition, the array control means has a disk failure value addition function that increases the value in the disk failure management table when a disk failure occurs, a disk failure value subtraction function that decreases the value in the table when data is stored to or read from the disk normally, and a disk disconnection function that disconnects the interface concerned when the disk failure value in the table exceeds a predetermined threshold.

[0019] In this configuration, when data is stored to or read from a disk normally, the disk failure value subtraction function subtracts from the disk failure value that the addition function has accumulated. For an intermittent disk failure, therefore, even if the addition function increases the disk failure value, continued normal operation keeps the value from reaching the threshold, so no disconnection is performed when none is needed.

[0020] In this embodiment, the array control means also divides the plurality of disk devices into a plurality of areas, and the disk failure management table has a table entry that manages the failure history for each divided area. The disk failure value addition function adds a constant value a when a disk failure occurs. The threshold of the disk failure management table is set to n times the constant value a, and n is set to the number of times the host device automatically retries the same data. The disk failure value subtraction function subtracts a value smaller than the constant value a, chosen according to the number of divided areas of the disk device.

[0021] This embodiment, which divides the disks into areas, takes into account the difference in size between a divided area and the location on the disk where a failure occurs, and the number of times the host device automatically retries the same data. Suppose the host device repeats a write of the same data three times. If the write fails all three times, the disk position being written or read is the same each time, so it can be determined that a media failure has occurred in the sector to which the data was to be written. Accordingly, if the threshold is n times a and n is set to the number of times the host device retries the same data, a failed retry sequence by the host device always results in a failure determination and makes the disk a target for disconnection.

[0022] On the other hand, when the host device issues several different access commands and different commands access a divided area that is partly faulty, the locations actually accessed differ. Because healthy locations in the divided area may be accessed in this case, the disk failure value is decremented by accesses to healthy locations. The subtracted value, however, is made sufficiently smaller than the added value a, so that the history of the disk failure remains in the disk failure table. With small divided areas, a normal access might justify judging that a location once deemed faulty has returned to normal; but when the divided areas are large and contain both faulty and healthy portions, data can often be read and written normally even in a divided area where a failure has occurred. For this reason, the subtracted value is set smaller than the constant value a according to the number of divided areas of the disk device. If subtraction by this small value continues, that is, if normal reads and writes are repeated, the value in the disk failure table returns to 0. Conversely, when retry commands from the host device are repeated, the large value a is added each time and the threshold is reached. Setting the relationship among the divided areas, the threshold, and the added and subtracted values in this way allows disk disconnection to be performed accurately.
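The interplay of a, n, and the smaller decrement can be traced with a short simulation. The sketch below is hypothetical: the values a = 16, n = 3, and a decrement of 1, the function name `run_zone`, the floor at zero, and the use of "reached" (`>=`) as the trip condition, following the wording of this paragraph, are all assumptions.

```python
def run_zone(events, a, n, decrement):
    """Replay 'F' (failed access) and 'S' (normal access) events for one zone.

    Returns (final_value, tripped); tripped is True once the value has
    reached the threshold n * a. Hypothetical helper for illustration.
    """
    threshold = n * a
    value = 0
    tripped = False
    for event in events:
        if event == "F":
            value += a
        else:
            value = max(0, value - decrement)  # assumed floor at zero
        if value >= threshold:
            tripped = True
    return value, tripped


A, N, DEC = 16, 3, 1  # illustrative values, with DEC much smaller than A

# The host retries the same bad sector N times: every attempt fails.
retry_result = run_zone("FFF", A, N, DEC)

# One failure, then other commands hit healthy sectors of the same zone.
scattered_result = run_zone("F" + "S" * 5, A, N, DEC)

# Enough normal accesses eventually erase the history of a single failure.
cleared_result = run_zone("F" + "S" * A, A, N, DEC)
```

With these values, three failed retries bring the zone to n × a, a handful of normal accesses leave most of the history in place, and it takes a / decrement normal accesses to return the value to 0.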

[0023] In another embodiment, the array control means has a disconnection cancel function: when it is about to disconnect a disk using the disk disconnection function and the disk failure management table indicates that another disk also has a failure, it cancels the disconnection. With this function, automatic disconnection is limited to cases in which only one of the plurality of disks has failed, so data can still be recovered through the redundant configuration after the disconnection. If, on the other hand, one disk is judged faulty and is about to be disconnected but another disk also has a failure, the data in the failed portion of that other disk could no longer be recovered; the disconnection is therefore not performed, and the condition is reported to the host device. This reliably prevents the disconnection from making data unreadable.
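The cancel decision can be expressed as a small function. This is a sketch under stated assumptions: the function name `decide_disconnect`, the return labels, the table layout (a dict of per-area values), and the reading of "has a failure" as "any nonzero failure value" are illustrative choices, not taken from the specification.

```python
def decide_disconnect(failure_values, target, threshold):
    """Return 'none', 'disconnect', or 'report' for the target disk.

    failure_values maps a disk id to its list of per-area failure values.
    Hypothetical helper; the return labels are not from the specification.
    """
    if max(failure_values[target]) <= threshold:
        return "none"  # target is not a disconnection candidate
    others_have_history = any(
        value > 0
        for disk, values in failure_values.items()
        if disk != target
        for value in values
    )
    # Disconnecting would remove redundancy, so a fault recorded on any
    # other disk could no longer be repaired: cancel and report instead.
    return "report" if others_have_history else "disconnect"
```

For example, with `{71: [0x70, 0], 72: [0, 0]}` and a threshold of `0x60`, disk 71 would be disconnected; if disk 72 also carried a nonzero value, the function would return `"report"` instead.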

[0024] An array disk controller that monitors failures of the device interface connecting the array control means and the disks performs disconnection according to the failure state of this device interface. In this embodiment, a device interface failure management table, in which a predetermined value is stored according to device interface failures, is provided alongside the disk failure management table. The array control means further has an interface failure value addition function that increases the value in the device interface failure management table when a device interface failure occurs, an interface failure value subtraction function that decreases the value in the table when data is stored or read normally through the device interface, and a device interface disconnection function that disconnects the interface when the interface failure value in the table exceeds a predetermined threshold.

[0025] Even when disk disconnection is performed according to the state of the device interface, the failure value is decremented if data can be transferred normally after a failure has occurred. Disconnection is thus performed only when it is necessary.

[0026] In the embodiment that manages the device interface, when there are two or more array control means, each connected to the same disks, the disk is accessed through the other array control means after the disconnection. That is, in this embodiment the device interface disconnection function is accompanied by a retry control function which, when the device interface has been disconnected, directs retries from the host device to the array control means selected in turn. The disconnection of the disk interface and the recovery by connecting through another array control means can thus both be performed automatically.

[0027] FIG. 1 is a block diagram showing a configuration example of this embodiment. As shown in FIG. 1, this embodiment has a media failure management table (disk failure management table) 1, which stores for each connected disk a history of detected media failures, and a device interface failure management table 2, which stores for each of the connected disk interfaces 41 to 44 a history of interface failures detected and reported to the host computer.

[0028] In the example shown in FIG. 1, the microprogram processing unit 5 adds failure values to and subtracts them from the failure management tables 1 and 2. The microprogram processing unit 5 includes a CPU, RAM, and the like, and implements the functions of the array control means by executing a program that describes the procedure for adding and subtracting failure values.

[0029] When a disk or device interface failure is detected or reported, the microprogram processing unit 5 adds a preset value to the corresponding entry of media failure management table 1 or device interface failure management table 2.

[0030] Conversely, when the disk is used without a failure occurring, or when processing is performed with the disk temporarily disconnected to avoid a failure, a preset value is subtracted from the contents of media failure management table 1 and device interface failure management table 2 for that disk.

[0031] Furthermore, before accessing a disk, the microprogram processing unit 5 checks the contents of media failure management table 1 and device interface failure management table 2 and, if they exceed the specified value, regards the disk as failed and makes it a target for disconnection. The array control means described above consists of this microprogram processing unit 5 and the array control unit 3.

[0032] The microprogram processing unit 5 also has a function that cancels the disconnection when, after checking media failure management table 1 and device interface failure management table 2 and finding a disk to be disconnected, it finds a media failure history for another disk.

[0033] In the embodiment shown in FIG. 1, the information needed for failure history management per disk is one entry for the disk interface failure history, plus as many entries for the media failure history as the number of areas into which the recording area of one disk is divided. For example, if the recording area of one disk is divided into 32, then 32 storage locations suffice. When a disk interface or media failure is detected, a preset value is added to the corresponding failure history management table.

[0034] Making a disk a target for disconnection when these failure history values reach a threshold has been done conventionally. In the present invention, when a disk access completes without an abnormality, a preset value is subtracted from the contents of media failure management table 1 and device interface failure management table 2 corresponding to that storage area. This prevents minor failures that can occur intermittently from accumulating and causing the disk to be disconnected.

[0035] When a disk interface failure occurs, it is always reported to the host computer. Because the value in device interface failure management table 2 is incremented when the failure is reported to the host computer, it can be treated as the number of times the host computer has recognized a failure. In a system in which several array disk controllers can access the same disk, the host computer's recognition of the failure allows it to retry via another array disk controller.

[0036] If the failure is in the disk interface control circuit inside the array disk controller, this retry allows processing to continue using another, healthy array disk controller without disconnecting the disk. If the cause of the disk interface failure is on the disk side, the retry via the other array disk controller does not recover either, and the host computer repeats the retry, again using the array disk controller that first reported the failure. The value in device interface failure management table 2 is thereby incremented, and when it exceeds the threshold the disk becomes a target for disconnection.
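The fault isolation described here can be followed in a simplified model. The sketch below is hypothetical: it assumes each access through a given path either always succeeds or always fails, counts only the interface failure value of the controller that first reported the failure, and uses an added value of 1 and a threshold of 3 purely for illustration. The function name and return labels are not from the specification.

```python
def isolate_fault(first_ok, alternate_ok, add=1, threshold=3, max_retries=8):
    """Model host retries for one disk reachable through two controllers.

    first_ok / alternate_ok say whether an access through the first or the
    alternate controller succeeds. Returns (outcome, counter), where counter
    is the interface failure value held by the first controller.
    """
    counter = 0
    if first_ok:
        return "ok", counter
    counter += add  # first failure, reported to the host
    if alternate_ok:
        # Fault was inside the first controller; the disk stays connected.
        return "continue via alternate controller", counter
    # Fault is on the disk side: the host falls back to the first
    # controller and keeps retrying, and each failure is counted.
    for _ in range(max_retries):
        counter += add
        if counter > threshold:
            return "disconnect disk", counter
    return "undecided", counter
```

In this model a controller-side fault ends with the counter at 1 and the disk still in service, while a disk-side fault keeps incrementing the counter until it exceeds the threshold.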

[0037] Furthermore, before a disk targeted for disconnection is actually disconnected, the contents of media failure management table 1 for the other disks are checked; if a failure history remains, the disconnection is canceled because disconnecting the disk could produce data that cannot be recovered. This prevents the disconnection from producing unreadable data.

[0038]

[Examples] Next, the present invention is described with reference to the drawings.

[0039] FIG. 1 is a block diagram showing one configuration example of the array disk controller of the present invention. The array disk controller comprises: a host interface control unit 6 that controls data transfer between the host computer and the array disk controller; an array control unit 3 that distributes data sent from the host computer across multiple disks, generates redundant data for RAID, reassembles the original data (in the form in which it was sent from the host computer) from data read from multiple disks (direct access storage devices, DASD), checks consistency with the redundant data, and repairs data using the redundant data; disks 71, 72, 73, and 74 that store data; and disk interface control units 41, 42, 43, and 44 that control the interface between each of disks 71, 72, 73, and 74 and the array control unit 3.

[0040] The example shown in FIG. 1 further includes media failure management table 1. In this example, media failure management table 1 is a set of 1-byte counters that count the error history for each of the eight areas into which the recording area of each of disks 71, 72, 73, and 74 is divided. The array disk controller also includes device interface failure management table 2, a set of 1-byte counters that count the error history for each of the disk interface control units 41, 42, 43, and 44.
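The two tables in this example are small enough to lay out directly. The following sketch models them with Python `bytearray` objects; the variable names and the helper `add_saturating` are hypothetical, and clamping a counter at 0xFF is an assumption, since the text does not say what happens when a 1-byte counter would overflow.

```python
DISK_IDS = (71, 72, 73, 74)          # disks in FIG. 1
INTERFACE_IDS = (41, 42, 43, 44)     # disk interface control units in FIG. 1
ZONES_PER_DISK = 8

# One 1-byte counter per (disk, zone) and one per interface.
media_table = {disk: bytearray(ZONES_PER_DISK) for disk in DISK_IDS}
interface_table = bytearray(len(INTERFACE_IDS))


def add_saturating(counters, index, amount):
    """Add to a 1-byte counter, clamping at 0xFF (clamping is an assumption)."""
    counters[index] = min(0xFF, counters[index] + amount)


total_bytes = sum(len(zones) for zones in media_table.values()) + len(interface_table)
```

Under this layout the whole failure history for four disks occupies 4 × 8 + 4 = 36 bytes.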

[0041] The microprogram processing unit 5 controls the operation of each unit, such as the host interface control unit 6 and the array control unit 3. In this example, the updating of media failure management table 1 and device interface failure management table 2 is also implemented by a microprogram running on the microprogram processing unit 5.

【００４２】次にフローチャートに基づいて動作を説明
する。この例では、ディスク障害の加算値を（２
０）₁₆，しきい値を（６０）₁₆，減算値を（０１）₁₆と
する。また、ディスクを８領域に分割する。そして、デ
バイスインタフェースの障害の加算値を（０１）₁₆，し
きい値を（０３），減算は初期値（００）へクリアする
処理により行う。上位装置のオペレーティングシステム
のリトライ回数は３回である。この設定により、切り離
し処理の適否の判断が良好に行われる。Next, the operation is described with reference to the flowcharts. In this example, for disk failures the added value is (20)₁₆, the threshold is (60)₁₆, and the subtracted value is (01)₁₆, and each disk is divided into eight areas. For device interface failures the added value is (01)₁₆ and the threshold is (03); instead of a subtraction, the counter is cleared to its initial value (00). The operating system of the host device retries three times. With these settings, whether disconnection is appropriate is judged well.
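The relationship among these settings can be checked with a little arithmetic. The sketch below is illustrative only; the constant names and the helper `errors_to_reach` are hypothetical, and it assumes a counter starts at (00) and that reaching the threshold is what matters.

```python
# Parameter values quoted in the text above.
DISK_ADD, DISK_THRESHOLD, DISK_SUB = 0x20, 0x60, 0x01
IF_ADD, IF_THRESHOLD = 0x01, 0x03
HOST_RETRIES = 3


def errors_to_reach(add: int, threshold: int) -> int:
    """Consecutive errors needed for a zeroed counter to reach the threshold."""
    return -(-threshold // add)  # ceiling division


# Clean accesses needed to cancel the contribution of one disk error.
clean_accesses_per_error = DISK_ADD // DISK_SUB
```

With these values, three consecutive errors bring either counter to its threshold, matching the three retries of the host operating system, while a single media error is forgotten only after 32 error-free accesses to the same zone.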

【００４３】図２はホストコンピュータからコマンドを
受け付け、切り離し対象ディスクの有無をチェックする
処理のフローチャートである。FIG. 2 is a flowchart of a process for receiving a command from the host computer and checking for the presence or absence of a disk to be disconnected.

【００４４】アレイディスク制御装置はホストインター
フェース制御部６を経由してホストコンピュータからコ
マンドを受け付けると（ステップＳ２０１）、このコマ
ンドで指定された読み出し、または書き込みアドレスか
らアクセスするディスク７１〜７４と、記録領域がディ
スクの全記録域を８分割したどの領域（以下ゾーンと記
す）に含まれるかを決定する（ステップＳ２０２）。When the array disk control device receives a command from the host computer via the host interface control unit 6 (step S201), it determines, from the read or write address specified by the command, which of the disks 71 to 74 will be accessed and which of the eight areas into which the entire recording area of a disk is divided (hereinafter, zones) contains the target recording area (step S202).
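The zone lookup of step S202 might look like the sketch below. The text does not say how a host address is mapped to a disk or how zone boundaries are chosen, so this example assumes equal-sized zones and a per-disk block address; `zones_for_range` and its parameters are hypothetical names.

```python
ZONES_PER_DISK = 8


def zones_for_range(first_block: int, block_count: int, blocks_per_disk: int) -> list:
    """Return the zones on one disk touched by a contiguous block range.

    Assumes eight equal-sized zones and block addresses local to the disk.
    """
    if block_count <= 0 or first_block < 0:
        raise ValueError("invalid block range")
    if first_block + block_count > blocks_per_disk:
        raise ValueError("range extends past the end of the disk")
    last_block = first_block + block_count - 1
    first_zone = first_block * ZONES_PER_DISK // blocks_per_disk
    last_zone = last_block * ZONES_PER_DISK // blocks_per_disk
    return list(range(first_zone, last_zone + 1))
```

A request that crosses a zone boundary returns more than one zone, which corresponds to the case in which the access spans several zones and every corresponding counter must be checked.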

【００４５】次にメディア障害管理テーブル１からステ
ップＳ２０２で決定されたディスク、ゾーンに対応する
カウンタの内容を読み出す（ステップＳ２０３）。複数
のディスク、またはゾーンにまたがる場合は、そのカウ
ンタ値の全てをあらかじめ設定されたしきい値と比較す
る（ステップＳ２０４）。Next, the contents of the counter corresponding to the disk and zone determined in step S202 are read from the media failure management table 1 (step S203). If the data spans a plurality of disks or zones, all the counter values are compared with a preset threshold value (step S204).

【００４６】ここではしきい値として（６０）₁₆が設定
されているものとする。しきい値を超えるカウンタ値が
ない場合には、続いてデバイスインターフェース障害管
埋テーブル２の中からステップＳ２０２で決定された使
用するディスクに対応するディスクインターフェース制
御部４１〜４４の障害履歴カウントを読み出す（ステッ
プＳ２０５）。Here, the threshold is assumed to be set to (60)₁₆. If no counter value exceeds the threshold, the failure history counts of the disk interface control units 41 to 44 corresponding to the disks to be used, as determined in step S202, are then read from the device interface failure management table 2 (step S205).

【００４７】複数のディスクにまたがる場合は、そのカ
ウンタ値の全てをあらかじめ設定されたしきい値と比較
する（ステップＳ２０６）。ここではしきい値として
（０３）₁₆が設定されている。このため、デバイスイン
タフェースの障害が３回報告された場合に当該インタフ
ェースの切り離しを行う。ステップＳ２０４、２０６何
れの判断でもしきい値を越えていない場合には、必要な
ディスクを選択してディスクをアクセスする（ステップ
Ｓ２０８）。If the access spans a plurality of disks, all of the corresponding counter values are compared with a preset threshold (step S206). Here the threshold is set to (03)₁₆, so an interface is disconnected when a device interface failure has been reported three times. If neither step S204 nor step S206 finds a counter exceeding its threshold, the required disks are selected and accessed (step S208).

【００４８】なお、ステップＳ２０８ではすでに必要な
ディスクが切り離されている場合にはデータ復旧できる
様に冗長ディスクを使用するかどうかの判断も行う。メ
ディア障害管埋テーブル１の内容がしきい値を越えてい
ないかどうかの判断（ステップＳ２０４）、および、デ
バイスインターフェース障害管埋テーブル２の内容がし
きい値を越えていないかどうかの判断（ステップＳ２０
６）でしきい値を越えているカウンタが検出された場合
には、このディスクを切り離し対象として切り離しの可
否の判断を行う（ステップＳ２０７）。In step S208, if a required disk has already been disconnected, it is also decided whether to use the redundant disk so that the data can be recovered. If a counter exceeding its threshold is detected either in the check of the media failure management table 1 (step S204) or in the check of the device interface failure management table 2 (step S206), the disk concerned is treated as a candidate for disconnection and it is determined whether it can be disconnected (step S207).

【００４９】メディア障害管理テーブル１、または、デ
バイスインターフェース障害管埋テーブル２の内容がし
きい値を越えていて、なおかつ切り離し可能な状態であ
れば、ステップＳ２０７で切り離され、必要なディスク
を選択して読み出し、書き込みなどのアクセスの実行を
行う（ステップＳ２０８）。If the contents of the media failure management table 1 or the device interface failure management table 2 exceed the threshold and disconnection is possible, the disk is disconnected in step S207; the required disks are then selected and the read or write access is executed (step S208).
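The pre-access check of FIG. 2 (steps S203 to S206) can be sketched as follows. This is an illustration under stated assumptions rather than the patented microprogram: the tables are plain Python lists, the function name `find_disconnect_candidates` is hypothetical, and the comparison uses "greater than or equal" because the text says three reported interface failures trigger disconnection with a threshold of (03), even though the wording elsewhere is "exceeds".

```python
MEDIA_THRESHOLD = 0x60
IF_THRESHOLD = 0x03


def find_disconnect_candidates(media, interface, targets):
    """Return the disks that should be considered for disconnection (S207).

    media[disk][zone] and interface[disk] hold the failure counters;
    targets is the list of (disk, zone) pairs the command will touch (S202).
    """
    candidates = set()
    # S203/S204: check every media counter the access will touch.
    for disk, zone in targets:
        if media[disk][zone] >= MEDIA_THRESHOLD:
            candidates.add(disk)
    # S205/S206: the interface counters are read only if no media
    # counter reached its threshold.
    if not candidates:
        for disk in {d for d, _ in targets}:
            if interface[disk] >= IF_THRESHOLD:
                candidates.add(disk)
    return sorted(candidates)
```

An empty result corresponds to going straight to the disk access of step S208; a non-empty result leads to the disconnection decision of step S207.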

【００５０】図３はディスクへのアクセス後のメディア
障害管埋テーブル１、および、デバイスインターフェー
ス障害管理テーブル２の操作に関するフローチャートで
ある。ディスクへのアクセス後、最初にディスクインタ
ーフェースに関する障害の有無を判断する（ステップＳ
３０１）。FIG. 3 is a flowchart of the operations performed on the media failure management table 1 and the device interface failure management table 2 after a disk access. After the access, it is first determined whether a failure relating to the disk interface has occurred (step S301).

【００５１】ディスクインターフェースで異常が検出さ
れた場合には、エラーの検出されたディスクに対応する
デバイスインターフェース障害管埋テーブル２のカウン
タの内容に（０１）を加算し（ステップＳ３０２）、ホ
ストコンピュータにエラーの発生を知らせるステータス
を報告する（ステップＳ３０３）。If an abnormality is detected in the disk interface, (01) is added to the counter in the device interface failure management table 2 that corresponds to the disk on which the error was detected (step S302), and a status indicating that an error occurred is reported to the host computer (step S303).

【００５２】ディスクインターフェースに異常が検出さ
れなかった場合には、使用した全てのディスクに対応す
るデバイスインターフェース障害管理テーブル２の内容
を（００）にクリアする（ステップＳ３０４）。If no abnormality is detected in the disk interface, the contents of the device interface fault management table 2 corresponding to all used disks are cleared to (00) (step S304).

【００５３】ディスクインターフェースに異常が検出さ
れなかった場合には、続いてディスクのエラーの有無を
調べる（ステップＳ３０５）。ディスクでのエラーも発
生していなければ、ホストコンピュータに正常終了を示
すステータスを報告し（ステップＳ３０９）、さらにア
クセス対象になったディスク、およびゾーンの対応する
メディア障害管埋テーブル１の内容から（０１）を減じ
る（ステップＳ３１０）。If no abnormality is detected in the disk interface, it is next checked whether a disk error occurred (step S305). If no disk error occurred either, a status indicating normal completion is reported to the host computer (step S309), and (01) is subtracted from the counters in the media failure management table 1 that correspond to the accessed disks and zones (step S310).

【００５４】なお、ステップＳ３１０の処理はメディア
障害管理テーブル１の該当するカウンタの値が（００）
の場合、すなわち、メディア障害の発生履歴がない場合
にはスキップされ、何もしない。Step S310 is skipped, and nothing is done, when the corresponding counter in the media failure management table 1 is (00), that is, when there is no history of media failures.

【００５５】ステップＳ３０５の判断でディスクにエラ
ーが発生した場合にはエラーの発生したディスク、およ
び発生位置に対応するゾーンを確定する（ステップＳ３
０６）。ステップＳ３０６で決定されたゾーンに対応す
るカウンタをメディア障害管理テーブル１から読み出
し、カウンタ値に（２０）₁₆を加算し（ステップＳ３０
７）、ホストコンピュータにエラーの発生を示すステー
タスを報告する（ステップＳ３０３）。If step S305 finds that a disk error occurred, the disk on which the error occurred and the zone corresponding to the error position are identified (step S306). The counter corresponding to the zone determined in step S306 is read from the media failure management table 1, (20)₁₆ is added to it (step S307), and a status indicating that an error occurred is reported to the host computer (step S303).

【００５６】ディスクでエラーが発生しなかった場合と
同様に、アクセス対象になり、なおかつエラーの発生し
なかったディスク、ゾーンに対応するメディア障害管埋
テーブル１のカウンタは（０１）を減じる（ステップＳ
３１０）。As in the case where no disk error occurred, (01) is subtracted from the counters in the media failure management table 1 that correspond to disks and zones that were accessed but on which no error occurred (step S310).

【００５７】この場合も、ステップＳ３１０の処理はメ
ディア障害管理テーブル１の該当するカウンタの値が
（００）の場合、すなわち、メディア障害の発生履歴が
ない場合にはスキップされ、何もしない。Also in this case, if the value of the corresponding counter in the media failure management table 1 is (00), that is, if there is no history of occurrence of the media failure, the process of step S310 is skipped and nothing is performed.
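The post-access bookkeeping of FIG. 3 can be summarized in one function. This is a minimal sketch under assumptions: the tables are plain lists, at most one interface error or one media error is reported per access, counters saturate at (FF)₁₆ (the text does not say what happens on overflow), and the name `update_after_access` is hypothetical.

```python
MEDIA_ADD, MEDIA_SUB = 0x20, 0x01
IF_ADD = 0x01


def update_after_access(media, interface, targets,
                        if_error_disk=None, media_error=None):
    """Update both tables after one access; return True on normal completion.

    targets is the list of (disk, zone) pairs that were accessed.
    if_error_disk is the disk whose interface failed, if any (S301).
    media_error is the (disk, zone) where a disk error occurred, if any (S305).
    """
    if if_error_disk is not None:
        # S302: count the interface failure, then report the error (S303).
        interface[if_error_disk] = min(interface[if_error_disk] + IF_ADD, 0xFF)
        return False
    # S304: no interface failure, so clear the counters of all used disks.
    for disk in {d for d, _ in targets}:
        interface[disk] = 0
    for disk, zone in targets:
        if (disk, zone) == media_error:
            # S306/S307: add (20) hex to the zone where the error occurred.
            media[disk][zone] = min(media[disk][zone] + MEDIA_ADD, 0xFF)
        elif media[disk][zone] > 0:
            # S310: decay by (01); skipped when the counter is already (00).
            media[disk][zone] -= MEDIA_SUB
    # S309 (normal completion) or S303 (error status).
    return media_error is None
```

Note that no disk is disconnected here; the function only adjusts counters, and the decision is deferred to the check made before the next access.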

【００５８】図３に示されるように、ディスクアクセス
により障害が発生した場合でもメディア障害管理テーブ
ル１、およびデバイスインターフェース障害管理テーブ
ル２のカウンタ値の操作だけで、この時点ではディスク
の切り離しは行わない。As shown in FIG. 3, even when a disk access results in a failure, only the counter values in the media failure management table 1 and the device interface failure management table 2 are updated; the disk is not disconnected at this point.

【００５９】ディスクの切り離しはエラーが発生して、
ホストコンピュータにエラーが報告され、ホストコンピ
ュータ側の判断で有限回の再試行を行うことにより、メ
ディア障害管埋テーブル１、または、デバイスインター
フェース障害管理テーブル２のカウント値が増加してデ
ィスクの切り離しに至る。A disk comes to be disconnected as follows: an error occurs and is reported to the host computer, the host computer retries a finite number of times at its own discretion, and the count in the media failure management table 1 or the device interface failure management table 2 rises as a result until the disk is disconnected.

【００６０】図３のステップＳ２０７に示されるメディ
ア障害管埋テーブル１、または、デバイスインターフェ
ース障害管理テーブル２の内容がしきい値を越えてして
いる場合のディスクの切り離しの可否は図４のフローチ
ャートに示されるしきい値を越えている場合には、まず
最初に現在のアレイディスクの状態をチェックする（ス
テップＳ４０１）。When the contents of the media failure management table 1 or the device interface failure management table 2 exceed the threshold in step S207, whether the disk can be disconnected is determined according to the flowchart of FIG. 4. First, the current state of the array disk is checked (step S401).

【００６１】本実施例のアレイディスクは３台のディス
クと１台の冗長ディスクによって構成されているものを
想定している。したがって、すでに１台のディスクが切
り離されている場合には冗長性を失っており、これ以上
のディスクの切り離しはできないため、すでに切り離し
済みのディスクが存在しないかどうかを調べる（ステッ
プＳ４０１）。It is assumed that the array disk of this embodiment is composed of three disks and one redundant disk. Therefore, if one disk has already been disconnected, the redundancy has been lost, and it is not possible to disconnect any more disks. Therefore, it is checked whether there is any disk that has already been disconnected (step S401).

【００６２】すでに切り離し済みのディスクが存在する
場合には、何もせずにディスク切り離しの可否の判断を
終了する。切り離し済みのディスクが存在せず冗長性を
維持できている時には、切り離し対象とされたディスク
以外のディスクに関するメディアの障害履歴をメディア
障害管埋テーブル１から読み出す。読み出すカウンタは
全てのゾーンを対象に読み出す（ステップＳ４０２）。If a disk has already been disconnected, the determination ends without doing anything. If no disk has been disconnected and redundancy is still maintained, the media failure history of every disk other than the candidate disk is read from the media failure management table 1. The counters of all zones are read (step S402).

【００６３】切り離し対象ディスク以外のディスクに障
害履歴の有無を判断するために、読み出したカウンタ値
を調べる（ステップＳ４０３）。（００）以外のものが
存在する場合には切り離すことによって後に読み出せな
くなるデータが発生する可能性があるため切り離しを中
止する。The counter values that were read are examined to determine whether any disk other than the candidate disk has a failure history (step S403). If any counter is other than (00), disconnection is cancelled, because disconnecting the disk could leave data that can no longer be read.

【００６４】切り離し対象ディスク以外のディスクに障
害履歴がない場合には、さらに切り離し対象のディスク
以外のディスクに対応するディスクインターフェースの
障害履歴をデバイスインターフェース障害管理テーブル
２から読み出す（ステップＳ４０４）。If there is no failure history in the disks other than the disk to be separated, the failure history of the disk interface corresponding to the disk other than the disk to be separated is read from the device interface failure management table 2 (step S404).

【００６５】切り離し対象ディスク以外のディスクに対
応するディスクインターフェースに障害履歴を調ベ（ス
テップＳ４０３）、（００）以外のものが存在する場合
には切り離すことによって後に読み出せなくなるデータ
が発生する可能性があるため切り離しを中止する。他の
ディスクインターフェース制御部に障害履歴がない場合
には、切り離し対象のディスクを切り離す（ステップＳ
４０６）。The failure history of the disk interfaces corresponding to the disks other than the candidate disk is examined (step S403); if any counter is other than (00), disconnection is cancelled, because disconnecting the disk could leave data that can no longer be read. If no other disk interface control unit has a failure history, the candidate disk is disconnected (step S406).
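The decision of FIG. 4 reduces to three checks. The sketch below assumes the three-data-disk plus one-redundant-disk layout of this embodiment, represents the tables as plain lists, and uses the hypothetical name `may_disconnect`; it is an illustration, not the patented microprogram.

```python
def may_disconnect(candidate, media, interface, already_disconnected):
    """Return True if the candidate disk may be disconnected (FIG. 4).

    media[disk][zone] and interface[disk] hold the failure counters;
    already_disconnected is the set of disks disconnected so far.
    """
    # S401: with one redundant disk, a second disconnection is impossible.
    if already_disconnected:
        return False
    # S402/S403: any media failure history on another disk, in any zone,
    # cancels the disconnection.
    for disk, zones in enumerate(media):
        if disk != candidate and any(zones):
            return False
    # S404 and the following check: any interface failure history on
    # another disk also cancels the disconnection.
    for disk, count in enumerate(interface):
        if disk != candidate and count:
            return False
    return True  # S406: the candidate disk is disconnected
```

The check is deliberately strict: a single non-zero counter on any other disk is enough to keep the candidate online, since losing it could make some data unrecoverable.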

【００６６】次に第２の実施例として図５のような２つ
のアレイディスク制御装置１０Ａ，１０Ｂから同一ディ
スク７１〜７４をアクセス可能な構成を採った場合につ
いて説明する。この場合もアレイディスク制御装置の動
作は前述の処理と全く同じである。Next, a description will be given of a second embodiment in which the same disks 71 to 74 can be accessed from two array disk controllers 10A and 10B as shown in FIG. In this case, the operation of the array disk control device is exactly the same as the above-described processing.

【００６７】ホストコンピュータが再試行を行う際に２
つのディレクタに交互に再試行を行うことにより、障害
の原因がアレイディスク制御装置１０Ａ内のディスクイ
ンターフェース制御部４１Ａ〜４４Ａにある場合にはア
レイディスク制御装置１０Ｂ側から再試行をすることに
よって、入出力命令は成功し、ディスクを切り離すこと
なく処理を継続することができる。When the host computer retries, it alternates its retries between the two directors. If the cause of the failure lies in the disk interface control units 41A to 44A of the array disk control device 10A, a retry through the array disk control device 10B succeeds, so the input/output command completes and processing continues without any disk being disconnected.

【００６８】障害の原因がディスク７１〜７４内にある
場合には、２つのアレイディスク制御装置１０Ａ、１０
Ｂ何れを通してもエラーが発生し、ホストコンピュータ
にエラーが報告される度にメディア障害管理テーブル１
の内容、または、デバイスインターフェース障害管埋テ
ーブル２の内容が増加して、ホストコンピュータが無条
件に再試行を繰り返すことによってディスクの切り離し
に至る。If the cause of the failure lies in the disks 71 to 74, an error occurs through either of the two array disk control devices 10A and 10B. Each time an error is reported to the host computer, the contents of the media failure management table 1 or the device interface failure management table 2 increase, and as the host computer unconditionally repeats its retries the disk is eventually disconnected.
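The host-side behavior in this second embodiment amounts to switching controllers on every retry. The sketch below is a hedged illustration of that idea only: `retry_alternating`, the `try_io` callback, and the choice of starting with controller 10A are assumptions, and the controllers' own table updates are not modeled.

```python
from itertools import cycle


def retry_alternating(try_io, controllers=("10A", "10B"), retries=3):
    """Issue an I/O once, then retry, switching controller each time.

    try_io(controller) must return True on success. Returns the name of
    the controller through which the I/O succeeded, or None if the first
    attempt and all retries failed.
    """
    order = cycle(controllers)
    for _ in range(1 + retries):
        controller = next(order)
        if try_io(controller):
            return controller
    return None
```

If only controller 10A's interface is faulty, the first retry through 10B succeeds and no disk is disconnected; if the disk itself is faulty, every attempt fails through both controllers and the failure counters keep growing until the disk is disconnected.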

【００６９】上述したように本実施例によると、以下の
効果を奏する。As described above, according to this embodiment, the following effects can be obtained.

【００７０】第１の効果は、ディスクインターフェース
制御部の数に相当するカウンタと各ディスクの記録領域
をメディアエラー管埋のために分割した数だけのカウン
タ用意すればよく、少ない制御情報で障害ディスクの切
り離し管理を行うことができる。The first effect is that only a small amount of control information is needed to manage the disconnection of failed disks: it suffices to provide one counter per disk interface control unit plus, for each disk, as many counters as the number of areas into which its recording area is divided for media error management.

【００７１】第２の効果は、ディスクインターフェース
の障害のようにアレイディスク制御装置側に原因がある
場合と、ディスク側に原因がある場合との切り分けが容
易でない障害でも、的確に障害ディスクの切り離し制御
を行うことができる。The second effect is that a failed disk can be disconnected accurately even for failures, such as disk interface failures, for which it is not easy to tell whether the cause lies on the array disk control device side or on the disk side.

【００７２】特に２台以上のアレイディスク制御装置か
らなるアレイディスク装置ではホストコンピュータから
の再試行を各アレイディスク制御装置を通して交互に実
行することにより、アレイディスク制御装置側に原因が
ある場合と、ディスク側に原因がある場合とを的確に切
り分け、アレイディスク制御装置側に障害がある場合に
は正常なディスクを切り離すことなく処理を継続可能に
し、ディスク側に原因がある場合にはディスクを切り離
して処理の継続を可能にすることができる。In particular, in an array disk device made up of two or more array disk control devices, executing the retries from the host computer alternately through each array disk control device distinguishes accurately between a cause on the array disk control device side and a cause on the disk side. If the failure is on the array disk control device side, processing continues without a healthy disk being disconnected; if the cause is on the disk side, the disk is disconnected and processing continues.

【００７３】また、これはホストコンピュータからの再
試行を前提にしているが、ホストコンピュータからの再
試行は単に再試行を行うアレイディスク制御装置を交互
に切り換えるだけでよく、ソフトウェアによる複雑なス
テータス解析等は必要ないため、アレイディスク装置を
使用するためのソフトウェアの負担も増えることはな
い。This presupposes retries from the host computer, but the host computer only has to alternate the array disk control device through which it retries. No complex status analysis in software is required, so the burden on the software that uses the array disk device does not increase.

【００７４】第３の効果は、先に本発明が解決しようと
する課題の項でも記載したように冗長性を持ったアレイ
ディスクでは、媒体の障害により書き込みができない場
合にはデータを格納するデータディスクとパリティデー
タを格納する冗長ディスクとの整合性をとるため、障害
ディスクを切り離して処理を行う必要があるが、読み出
しの場合には冗長ディスクでデータを復旧してホストコ
ンピュータに転送することが可能なため、より影響が大
きくなるディスク全体の切り離しは行わず一時的な切り
離しだけで処理を継続しようとすることがある。The third effect concerns the following situation, also described in the section on the problems to be solved by the invention. In a redundant array disk, when a write fails because of a media failure, the failed disk must be disconnected so that the data disks and the redundant disk that stores the parity data remain consistent. For a read, however, the data can be restored from the redundant disk and transferred to the host computer, so the controller may continue with only a temporary disconnection rather than disconnecting the whole disk, which would have a greater impact.

【００７５】あるいは、媒体に比較的軽微な障害が発生
しているが、再試行により救済することが可能な状況で
はディスクの切り離しは行われない場合がある。Alternatively, in a situation where a relatively small failure has occurred in the medium but the medium can be remedied by retrying, the disk may not be disconnected in some cases.

【００７６】このような部分的な媒体障害が発生してい
るが、そのディスクの切り離しが行われていない場合で
も、ディスクの切り離しを行うための判断の際に、メデ
ィア障害管理テーブル、あるいはデバイスインターフェ
ース障害管埋テーブルをチェックし、他のディスクでの
障害の有無を容易に判断することができるため、安易な
ディスクの切り離しで後に読めないデータが生ずること
を避けることができる。Even when such a partial media failure has occurred and the disk concerned has not been disconnected, the media failure management table or the device interface failure management table is checked when deciding whether to disconnect a disk, so failures on other disks are easily detected. This avoids a careless disconnection that would leave data unreadable later.

【００７７】[0077]

【発明の効果】本発明は以上の陽に構成され機能するの
で、これによると、アレイ制御手段が、ディスクの障害
が発生したときにディスク障害管理テーブルの値を増加
させ、一方、ディスクに対して正常にデータの格納又は
読み出しが行われたときにはディスク障害管理テーブル
の値を減少させ、そして、ディスク障害管理テーブルの
ディスク障害値が予め定められたしきい値よりも上回っ
たときに当該インタフェースを切り離すため、回復不能
なメディアの障害の発生と、間欠的なリード／ライトエ
ラーの発生とを的確にディスク障害管理テーブルに反映
することができ、このため、軽度の障害で切り離す必要
がないときに切り離し処理を行うことがなく、さらに、
加算値と減算値及びしきい値を上位装置のリトライ回数
及びディスクの管理上の分割領域数に応じて設定するこ
とで、ディスクの一部に障害が発生したときであっても
良好にその状態をディスク障害管理テーブルに反映する
ことができ、さらに、デバイスインタフェースの状態を
重ねて管理することで、切り離し処理を必要な場合にの
み行うことができ、また、切り離し処理を行うときに、
ディスク障害管理テーブルを用いて他のディスクの状態
を確認して他のディスクにも障害がある場合には切り離
し処理を中止するため、冗長構成に基づいて復旧ができ
なくなる事態を良好に回避することができる。このよう
に、少ない障害履歴情報で、的確にディスクの切り離し
を行い、信頼性を向上させることができ、さらに、分割
領域としきい値及び加算、減算値の関係を設定すること
で、より的確にディスクの切り離し処理を行うことがで
きる、という従来にない優れたアレイディスク制御装置
を提供することができる。The present invention is configured and functions as described above. The array control means increases the value of the disk failure management table when a disk failure occurs, decreases it when data is stored on or read from the disk normally, and disconnects the interface concerned when the disk failure value in the table exceeds a predetermined threshold. Unrecoverable media failures and intermittent read/write errors are therefore both reflected accurately in the disk failure management table, and no disconnection is performed for a minor failure that does not require one. Furthermore, by setting the added value, the subtracted value, and the threshold according to the number of retries of the higher-level device and the number of areas into which each disk is divided for management, the state is reflected well in the disk failure management table even when only part of a disk has failed. By additionally managing the state of the device interfaces, disconnection is performed only when necessary. When disconnection is to be performed, the state of the other disks is checked using the disk failure management table, and disconnection is cancelled if another disk also has a failure, which avoids a situation in which recovery based on the redundant configuration is no longer possible. In this way, disks are disconnected accurately with a small amount of failure history information, improving reliability, and setting the relationship among the divided areas, the threshold, and the added and subtracted values makes the disconnection still more accurate. The invention thus provides an excellent array disk control device not available in the prior art.

[Brief description of the drawings]

【図１】本発明のアレイディスク制御装置の構成例を示
すブロック図である。FIG. 1 is a block diagram illustrating a configuration example of an array disk control device according to the present invention.

【図２】本発明のアレイディスク制御装置のディスクア
クセス開始前の切り離し判断の処理を示すフローチャー
トである。FIG. 2 is a flowchart showing a disconnection determination process performed before a disk access is started by the array disk control device of the present invention.

【図３】本発明のアレイディスク制御装置のディスクア
クセス後のエラー履歴管埋の処理を示すフローチャート
である。FIG. 3 is a flowchart showing processing of error history management after disk access of the array disk control device of the present invention.

【図４】本発明のアレイディスク制御装置におけるディ
スク切り離しの判断処理を示すフローチャートである。FIG. 4 is a flowchart showing a disk disconnection determination process in the array disk control device of the present invention.

【図５】本発明のアレイディスク制御装置を２台使用し
たアレイディスクの１構成例を示すブロック図である。FIG. 5 is a block diagram showing one configuration example of an array disk using two array disk control devices of the present invention.

【図６】従来技術での２台の制御装置からなるアレイデ
ィスクの１構成例を示すブロック図である。FIG. 6 is a block diagram showing a configuration example of an array disk including two control devices according to the related art.

[Explanation of symbols]

１，１Ａ，１Ｂメディア障害管埋テーブル２，２Ａ，２Ｂデバイスインターフェース障害管埋テ
ーブル３，３Ａ，３Ｂアレイ制御部５，５Ａ，５Ｂマイクロプログラム処理部６，６Ａ，６Ｂホストインターフェース制御部１０，１０Ａ，１０Ｂアレイディスク制御装置４１，４２，４３，４４ディスクインターフェース制
御部４１Ａ，４２Ａ，４３Ａ，４４Ａディスクインターフ
ェース制御部４１Ｂ，４２Ｂ，４３Ｂ，４４Ｂディスクインターフ
ェース制御部７１，７２，７３，７４ディスク
1, 1A, 1B: Media failure management table
2, 2A, 2B: Device interface failure management table
3, 3A, 3B: Array control unit
5, 5A, 5B: Microprogram processing unit
6, 6A, 6B: Host interface control unit
10, 10A, 10B: Array disk control device
41, 42, 43, 44: Disk interface control unit
41A, 42A, 43A, 44A: Disk interface control unit
41B, 42B, 43B, 44B: Disk interface control unit
71, 72, 73, 74: Disk

Claims

(57) [Claims]

1. An array disk control device comprising: a host interface control unit that receives data transmitted from a higher-level device; and array control means that stores the data received by the host interface control unit on a plurality of disks, wherein the array control means is provided with a disk failure management table in which a predetermined value is stored according to failures of the disks, and the array control means has: a disk failure value addition function that increases the value of the disk failure management table when a failure of a disk occurs; a disk failure value subtraction function that decreases the value of the disk failure management table when data is stored on or read from the disk normally; and a disk disconnection function that disconnects the interface concerned when the disk failure value of the disk failure management table exceeds a predetermined threshold; wherein the array control means further divides each of the plurality of disks into a plurality of areas, and the disk failure management table has a table item for managing a failure history for each of the divided areas; the disk failure value addition function adds a constant value a when a disk failure occurs; and the threshold of the disk failure management table is set to n times the constant value a, where n is set to the number of times the higher-level device automatically retries the same data.

2. The array disk control device according to claim 1, wherein a device interface failure management table, in which a predetermined value is stored according to failures of the device interface, is provided in addition to the disk failure management table, and the array control means has: an interface failure value addition function that increases the value of the device interface failure management table when a failure of the device interface occurs; an interface failure value subtraction function that decreases the value of the device interface failure management table when data is stored or read normally through the device interface; and a device interface disconnection function that disconnects the interface when the interface failure value of the device interface failure management table exceeds a predetermined threshold.

3. An array disk control device comprising: a host interface control unit that receives data transmitted from a higher-level device; and array control means that puts the data received by the host interface control unit into a redundant configuration and stores it on a plurality of disks via a device interface, wherein the array control means is provided with a device interface failure management table in which a predetermined value is stored according to failures of the device interface, and the array control means has: an interface failure value addition function that increases the value of the device interface failure management table when a failure of the device interface occurs; an interface failure value subtraction function that decreases the value of the device interface failure management table when data is stored or read normally through the device interface; and a device interface disconnection function that disconnects the interface when the interface failure value of the device interface failure management table exceeds a predetermined threshold.

4. An array disk control device comprising: a host interface control unit that receives data transmitted from a higher-level device; and a plurality of array control means, each connected to a plurality of disks via a device interface, that store the data received by the host interface control unit on the plurality of disks, wherein each array control means is provided with a device interface failure management table in which a predetermined value is stored according to failures of the device interface, and each array control means has: a table value addition function that increases the value of the device interface failure management table when a failure of the device interface occurs; a table value subtraction function that decreases the value of the device interface failure management table when data is stored or read normally through the device interface; a device interface disconnection function that disconnects the interface when the value of the device interface failure management table exceeds a predetermined threshold; and a retry control function that, when a device interface is disconnected by the device interface disconnection function, controls the retries from the higher-level device so that the plurality of array control means are selected in turn.