JPH0651914A

JPH0651914A - Magnetic disk subsystem

Info

Publication number: JPH0651914A
Application number: JP4099110A
Authority: JP
Inventors: Hiroyuki Chigami; 裕幸地紙
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-04-20
Filing date: 1992-04-20
Publication date: 1994-02-25

Abstract

PURPOSE: To reduce the retry workload and improve the throughput of the entire system by providing, for each of a plurality of storage directors, a counter that counts the number of failures occurring on its access path. CONSTITUTION: When storage director 2 ends abnormally, counter 5 counts the abnormal ends. When the count reaches a preset value, storage director 2 confirms that the other storage director 3 is available, edits the failure details, reports those details and its own blocking to host computer 1, and then blocks itself. When a new start is issued after storage director 2 has blocked itself, storage director 3 starts I/O 6 without contending with any other director, since no available storage director other than storage director 3 remains in cluster 4. If storage director 3 has already blocked itself, storage director 2 does not block itself, because doing so would block cluster 4.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a magnetic disk subsystem, and more particularly to a magnetic disk subsystem comprising a plurality of magnetic disk devices and a plurality of storage directors provided on a plurality of access paths to those magnetic disk devices.

[0002]

[Prior Art] In a conventional magnetic disk subsystem comprising a plurality of magnetic disk devices and a plurality of storage directors provided on a plurality of access paths to those magnetic disk devices, when the host computer accesses a magnetic disk device, the magnetic disk device (I/O) is started over one of the plural access paths by an available storage director in the same cluster, and the host computer itself has no knowledge of which storage director executed the I/O start. If a failure occurs on an access path or in a storage director at this time, the host computer may retry. Because the host computer likewise has no knowledge of which storage director performs the retry, the retry is likely to end normally when it is performed by a storage director other than the one on the access path where the failure occurred. As a result, the storage director with a failure on its access path takes part in the next start as well, and the cycle repeats: each time it ends abnormally, a retry follows, and the retry succeeds.

[0003]

[Problem to Be Solved by the Invention] As described above, the conventional magnetic disk subsystem has a plurality of access paths. Even if a failure on one access path causes an abnormal end, once the retry is completed normally by an available storage director on another access path, the subsystem continues to operate the storage director with the faulty access path unchanged. At the next new start, all storage directors, including the one with the faulty access path, again compete to perform the processing. This has the drawback that the number of attempts increases, the host computer is burdened with recovery processing, and the throughput of the entire system is reduced.

[0004]

[Means for Solving the Problem] The magnetic disk subsystem of the present invention comprises a plurality of magnetic disk devices and a plurality of storage directors provided on a plurality of access paths to the plurality of magnetic disk devices, in which a counter is provided for each of the plurality of storage directors to count the number of failures that have occurred on the corresponding access path, and in which, when the count value of the counter exceeds a preset value, the storage director reports this to the host computer and blocks itself.

[0005]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings.

[0006] FIG. 1 is a block diagram showing an embodiment of the present invention.

[0007] The magnetic disk subsystem of FIG. 1 comprises two storage directors 2 and 3, a plurality of clusters 4 holding information common to storage directors 2 and 3, and a plurality of magnetic disk devices (I/O) 6. Host computer 1 can perform start processing on the plurality of magnetic disk devices 6 through storage directors 2 and 3, which contend with each other. Storage directors 2 and 3 are each provided with a counter 5. Each counter 5 counts the number of failures that occur on the storage path 7 between the storage director 2 or 3 to which it belongs and the I/O 6.

[0008] When host computer 1 issues an instruction to start I/O 6 to cluster 4, which holds the information common to storage directors 2 and 3, contention between storage directors 2 and 3 determines which of them starts I/O 6, and host computer 1 has no knowledge of which storage director executed the I/O start. Suppose storage director 2 wins the contention and starts I/O 6, but some failure on the storage path 7 from storage director 2 to I/O 6 causes an abnormal end. Host computer 1 may then retry. On this retry, if storage director 3 wins the contention and starts I/O 6, the retry is likely to end normally, whereas if storage director 2 wins the contention and starts I/O 6, it is likely to end abnormally.
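The contention and retry behavior described in this paragraph can be illustrated with a small Python sketch. It is not taken from the patent: the names `Director`, `start_io`, `host_access`, and `scripted_contention` are hypothetical, the retry limit is arbitrary, and the sketch assumes that a faulty storage path makes every start over it end abnormally, whereas the text only says this is likely.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Director:
    """Hypothetical model of one storage director and its storage path."""
    number: int
    path_faulty: bool = False  # assumption: a faulty path always fails


def start_io(directors: List[Director],
             pick_winner: Callable[[List[Director]], Director]) -> Tuple[int, bool]:
    """One start attempt: contention picks a director.

    Returns (winner number, True if the start ended normally).
    """
    winner = pick_winner(directors)
    return winner.number, not winner.path_faulty


def host_access(directors, pick_winner, max_retries=3):
    """Host-side view: retry on abnormal end, unaware of which director ran."""
    log = []
    for _ in range(1 + max_retries):
        number, ok = start_io(directors, pick_winner)
        log.append((number, ok))
        if ok:
            break
    return log


def scripted_contention(order):
    """Return a pick_winner function that replays a fixed sequence of winners."""
    winners = iter(order)

    def pick(directors):
        wanted = next(winners)
        return next(d for d in directors if d.number == wanted)

    return pick


directors = [Director(2, path_faulty=True), Director(3)]
# Director 2 wins the first contention, director 3 wins the retry.
log = host_access(directors, scripted_contention([2, 3]))
print(log)  # [(2, False), (3, True)]
```

The log is kept only for illustration. In the subsystem described here, the host computer sees just the abnormal or normal end, not which director produced it.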

[0009] When storage director 2 ends abnormally, counter 5 counts the abnormal ends. When the count reaches a preset value, storage director 2 confirms that the other storage director 3 is available, edits the failure details, reports the failure details and the fact that it is blocking itself to host computer 1, and then blocks itself. After storage director 2 has blocked itself, when host computer 1 issues a new start, no storage director other than storage director 3 is available in cluster 4, so storage director 3 starts I/O 6 without contending with any other storage director.

[0010] If storage director 3 is not available when storage director 2 ends abnormally, that is, if storage director 3 has already blocked itself, storage director 2 does not block itself, because doing so would block cluster 4.
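The self-blocking rule of paragraphs [0009] and [0010] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `StorageDirector`, its fields, the threshold of 3, and the `reports` list standing in for the report to the host computer are all hypothetical. The sketch blocks when the count reaches the preset value, following paragraph [0009]. The claim and paragraph [0004] say "exceeds", so the exact comparison is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageDirector:
    number: int
    threshold: int = 3        # hypothetical preset value
    failure_count: int = 0    # plays the role of counter 5
    blocked: bool = False
    reports: List[str] = field(default_factory=list)  # stand-in for host reports

    def on_abnormal_end(self, peers: List["StorageDirector"], detail: str) -> bool:
        """Count one abnormal end and self-block if the rule allows it.

        Returns True when this call caused the director to block itself.
        """
        self.failure_count += 1
        if self.blocked or self.failure_count < self.threshold:
            return False
        others = [p for p in peers if p is not self]
        if not any(not p.blocked for p in others):
            # No other director is available: blocking would block the cluster.
            return False
        # Report the failure details and the self-blocking, then block.
        self.reports.append(f"director {self.number} self-blocking: {detail}")
        self.blocked = True
        return True


d2, d3 = StorageDirector(2), StorageDirector(3)
results = [d2.on_abnormal_end([d3], "storage path 7 fault") for _ in range(3)]
print(results, d2.blocked)   # [False, False, True] True

# Director 3 now fails as well, but it is the last one available, so it stays up.
results3 = [d3.on_abnormal_end([d2], "storage path 7 fault") for _ in range(4)]
print(results3, d3.blocked)  # [False, False, False, False] False
```

Checking peer availability before blocking is what keeps the cluster reachable: the last available director keeps serving starts even after its own count passes the threshold.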

[0011] In this way, each storage director is provided with a counter, and when failures occur on the storage path from a storage director to the magnetic disk devices, the counter counts them. A storage director whose failure count exceeds the specified value blocks itself, and the magnetic disk devices are started by a storage director whose failure count is below the specified value. This reduces the host computer's retry workload and improves the throughput of the entire system.
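The effect on retries can be made concrete with a toy count in Python. The model is an assumption chosen for simplicity and does not come from the patent: directors 2 and 3 win contention alternately, every start over director 2's path fails, every retry is served by director 3, and director 2 blocks itself once its count reaches a hypothetical `threshold`.

```python
def count_retries(starts: int, threshold=None) -> int:
    """Count host retries over `starts` new start requests.

    threshold=None models the conventional subsystem, in which
    director 2 never blocks itself.
    """
    failures = 0      # counter for director 2
    blocked = False   # whether director 2 has blocked itself
    retries = 0
    for i in range(starts):
        director_2_wins = (i % 2 == 0) and not blocked
        if director_2_wins:
            failures += 1
            retries += 1  # host retries; director 3 completes the access
            if threshold is not None and failures >= threshold:
                blocked = True
    return retries


print(count_retries(10))               # 5
print(count_retries(10, threshold=3))  # 3
```

Under these assumptions the retry count without self-blocking grows with the number of starts, while with self-blocking it is capped at the threshold.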

[0012] In addition, because a storage director whose failure count exceeds the specified value blocks itself, the host computer needs no processing of its own to avoid accessing that storage director. Moreover, because the faulty storage director does not take part in the next start, the number of recovery operations, in which an abnormal end is followed by a retry over another path that ends normally, decreases, and the control needed for them is reduced as well. The load on the host computer is therefore lightened and the recovery processing time is shortened.

[0013] Furthermore, because each storage director reports to the host computer that it is blocking itself, it becomes clear that the blocking is not due to a hardware failure of the storage director, and the details of the failure can be reported to the host computer by an attention event report.

[0014]

[Effects of the Invention] As described above, the magnetic disk subsystem of the present invention provides each storage director with a counter, counts with that counter the failures that occur on the storage path from the storage director to the magnetic disk devices, and, when the failure count exceeds a specified value, has the storage director report this to the host computer and block itself. This has the effect of reducing the host computer's retry workload and improving the throughput of the entire system. It also makes processing by the host computer to avoid accessing that storage director unnecessary, and it reduces both the number of recovery operations, in which an abnormal end is followed by a retry over another path that ends normally, and the control needed for them, so that the load on the host computer is lightened and the recovery processing time is shortened. Furthermore, because each storage director reports to the host computer that it is blocking itself, it becomes clear that the blocking is not due to a hardware failure of the storage director, and the details of the failure can be reported to the host computer by an attention event report.

[Brief description of drawings]

[FIG. 1] A block diagram showing an embodiment of the present invention.

[Explanation of symbols]

1: Host computer
2: Storage director
3: Storage director
4: Cluster
5: Counter
6: Magnetic disk device (I/O)
7: Storage path

Claims

[Claims]

1. A magnetic disk subsystem comprising a plurality of magnetic disk devices and a plurality of storage directors provided on a plurality of access paths to the plurality of magnetic disk devices, wherein a counter is provided for each of the plurality of storage directors to count the number of failures that have occurred on the corresponding access path, and wherein, when the count value of the counter exceeds a preset value, the storage director reports this to a host computer and blocks itself.