JP2973425B2

JP2973425B2 - Fault handling method and device therefor

Info

Publication number: JP2973425B2
Application number: JP1085245A
Authority: JP
Inventors: 智洋村田; 雅晴赤津; 謙三栗原; 繁雄本間
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-04-04
Filing date: 1989-04-04
Publication date: 1999-11-08
Anticipated expiration: 2014-11-08
Also published as: JPH02264335A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a failure handling method for a disk control device and to an apparatus therefor. In particular, it relates to a failure handling method, and an apparatus therefor, suited to improving the reliability of write-pending data against failures of a disk control device that has a built-in cache, temporarily holds write data from a host computer to a disk in the cache, and performs a collective write operation from the cache to the disk asynchronously with the host computer's write operation to the disk.

[Conventional technology]

In a conventional disk control device with a cache, frequently accessed data is held in the cache, and when that data is accessed it is read from the cache without going to the disk, which improves data access efficiency. In such devices, as disclosed for example in Japanese Unexamined Patent Publication No. Sho 60-79447, when a cache failure is detected the entire cache is disconnected and all data in the cache is discarded.

[Problems to be solved by the invention]

The prior art described above assumes only the case where the cache stores the same data as the disk drive and is used solely for reads. However, the present invention presupposes a disk control device that temporarily holds write data from the host computer in the cache and performs a collective write operation from the cache to the disk asynchronously with the host computer's write operation to the disk. For a failure of such a device, simply discarding the data in the cache is not sufficient.

That is, when a disk control device performs collective write operations, a backup memory for the cache is provided and the write-pending data in the cache is duplicated. On a cache failure, the write-pending data, which is then held in only a single copy, is written to the disk drive from the backup memory; conversely, on a backup memory failure, it is written from the cache. However, if a processor in the disk control device fails while it is updating the management information of the cache or the backup memory, that management information becomes invalid and the write-pending data is lost.

In this case the cache or backup memory itself is normal as hardware. Detecting hardware failures of these memories alone therefore cannot determine whether the write-pending data should be written to the disk drive from the cache or from the backup memory, and the prior art cannot cope with this situation.

The present invention has been made in view of the above circumstances. Its object is to eliminate the above problems of the prior art and to provide a failure handling method that can also cope with failures of a disk control device that temporarily holds write data from a host computer in a cache and performs a collective write operation from the cache to the disk asynchronously with the host computer's write operation to the disk.

[Means for solving the problem]

The above object of the present invention is achieved by a failure handling method, and an apparatus therefor, for a disk control device that incorporates a plurality of processors, a cache memory, and a backup memory for the cache; that reports write completion to the host computer at the point when a write from the host computer to a disk drive has been temporarily held in the cache memory and the backup memory; and that writes the pending data in the cache to the disk drive asynchronously with the host computer's data write operation (the collective write operation).

In this method, updates to the cache directory, which holds the management information of the pending data in the cache, and to the backup memory directory, which holds the management information of the pending data in the backup memory, are performed sequentially by any one of the plurality of processors while excluding the other processors. When it is detected that any processor has stopped operating due to a failure, it is identified whether that processor was updating the cache directory information or the backup memory directory information, and the write-pending data is written to the disk drive from the memory corresponding to whichever management information had completed its update.

[Action]

In the failure handling method according to the present invention, the management information of the cache or backup memory holding write-pending data is updated exclusively and sequentially by any one of the plurality of processors in the disk control device, so a processor failure never destroys the management information of both the cache and the backup memory at the same time. On this basis, when a processor failure is detected, a normal processor identifies whether the failed processor was updating the management information of the cache or that of the backup memory, and writes the write-pending data to the disk drive from whichever of the two has management information whose update is complete. This makes it possible to reliably write to the disk drive the write-pending data that has fallen into a single-copy state inside the disk control device.

The failure handling apparatus according to the present invention is provided with the means necessary for carrying out this method.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 is a configuration diagram of a disk subsystem with a cache according to one embodiment of the present invention. The disk subsystem of this embodiment consists of a channel control device 101 connected to a host computer 100, a disk control device 102, and disk drives 1100 and 1200. The disk control device 102 incorporates four processors 103, 104, 105, and 106. Each processor is connected to a lock mechanism 107 for inter-processor exclusion, a control memory 187 that reflects processor state, a cache memory 1060, a cache directory 1050 that stores the management information of the cache, a backup memory 109 for the cache, and a backup memory directory 108 that stores the management information of the backup memory. Each processor is also connected to the channel control device 101 and to the disk drives 1100 and 1200.

First, the collective write operation of the disk control device 102 when every processor is operating normally is described. Suppose, for example, that processor 103 in the disk control device 102 receives an access request from the host computer 100 via a channel. Processor 103 decodes the request and determines which disk drive it targets. Here, as an example, the request is assumed to be a write request to disk drive 1100.

Processor 103 issues a lock request for disk drive 1100 to the lock mechanism 107 via signal line 126. The internal operation of the lock mechanism 107 is described later. When processor 103 receives from the lock mechanism 107, via signal line 125, a response indicating that the lock on disk drive 1100 has been acquired, it requests a command from the channel control device 101. When the channel control device 101 sends the command to processor 103, processor 103 decodes it and recognizes whether it is a read or a write (here, a write command), together with the target track number (TR1) and record number (R1).

Next, processor 103 searches the track table 3801 for disk drive 1100 (see FIG. 5) in the cache directory 1050, which stores the management information of the cache memory 1060, and determines whether a cache slot has already been allocated in the cache for the target track number TR1. Here, slot control block 354 is allocated to TR1 in track table 3801, and the data of TR1 is already stored in data storage area 306 of the cache memory 1060, to which slot control block 354 points. This situation is called a track hit. FIG. 5 shows the logical configuration of the cache directory 1050 and the cache memory 1060.

There are two kinds of track-hit data in the cache memory 1060: slots in which the data stored in the cache slot matches the data on the corresponding track of the disk drive (called "non-pending slots"), and slots in which they do not match (called "pending slots"). In FIG. 5 the former are shown as unhatched slots in the cache memory 1060 and the latter as hatched slots. Pending slots arise when write-pending data is stored in the cache by the collective write operation of the disk control device 102. TR1 in the example above is a non-pending slot.

Here, the queues that link the slot control blocks in the cache directory 1050 are described with reference to FIG. 5. Queue 322 collects into a single queue the slot control blocks that point to the non-pending slots of all disk drives. Queue 320 collects the slot control blocks that point to all pending slots for disk drive 1100, and queue 321 collects the slot control blocks that point to all pending slots for disk drive 1200. In this way, the queues of slot control blocks that point to pending slots are kept separate for each disk drive. Queue 323 collects the slot control blocks that point to the free slots in the cache.
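
The directory layout described above can be pictured with a small data-structure sketch. The Python class and field names below are illustrative assumptions; the patent specifies only the track table, the slot control blocks, and queues 320 to 323, not any particular encoding.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotControlBlock:
    scb_id: int                   # e.g. 354 in the text
    slot: int                     # cache data storage area the block points to
    drive: Optional[int] = None
    track: Optional[str] = None
    pending: bool = False         # True when the slot differs from the disk copy


class CacheDirectory:
    def __init__(self, drives, free_blocks):
        self.track_table = {d: {} for d in drives}      # drive -> {track: SCB}
        self.pending_q = {d: deque() for d in drives}   # queues 320 and 321
        self.non_pending_q = deque()                    # queue 322
        self.free_q = deque(free_blocks)                # queue 323

    def load_track(self, drive, track):
        """Allocate a free slot for a track staged from disk (non-pending)."""
        scb = self.free_q.popleft()
        scb.drive, scb.track, scb.pending = drive, track, False
        self.track_table[drive][track] = scb
        self.non_pending_q.append(scb)
        return scb

    def lookup(self, drive, track):
        """Classify an access as a miss, a non-pending hit, or a pending hit."""
        scb = self.track_table[drive].get(track)
        if scb is None:
            return "miss"
        return "pending hit" if scb.pending else "non-pending hit"


directory = CacheDirectory(
    [1100, 1200],
    [SlotControlBlock(354, 306), SlotControlBlock(355, 307)],
)
directory.load_track(1100, "TR1")
```

Keeping one pending queue per drive, as the text does, lets a processor write out or recover the pending slots of a single drive without scanning the slots of the others.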

Processor 103 examines the slot status of slot control block 354 (see FIG. 6) and determines whether another processor is accessing the slot, that is, whether the slot status is busy. Here it is assumed that no other processor is accessing the slot. Processor 103 then rewrites the slot status of slot control block 354 to busy and writes its own processor number, 103, into the locking processor No. field.

Next, to make the slot a target of the collective write, slot control block 354 is moved from the non-pending slot queue 322 to the pending slot queue 320. Furthermore, to duplicate the write-pending data in the cache, the directory 108 of the backup memory 109 is searched and a free slot in the backup memory 109 is secured to store the same data as the pending slot in the cache.

The directory 108 of the backup memory 109 contains queues 4010 and 4011 (see FIG. 7), which link, separately for disk drives 1100 and 1200, the slot control blocks indicating the positions of pending slots in the backup memory 109, and queue 4020, which links into one queue the slot control blocks indicating the positions of free slots in the backup memory 109.

For example, to duplicate the slot corresponding to TR1 in the cache, slot control block 654, which points to free slot 605, is removed from queue 4020 and added to pending slot queue 4010 for disk drive 1100. A pointer to slot control block 654 in the backup memory directory 108 is then stored as the backup memory slot pointer of slot control block 354, which corresponds to TR1 in the cache directory.
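
The queue moves for this example can be sketched as follows. This is a minimal illustration in which slot control blocks are reduced to their reference numerals; the function name and the `backup_pointer` dictionary are hypothetical.

```python
from collections import deque

# Cache directory 1050 (queue numerals follow the text).
cache_non_pending = deque([354])                    # queue 322
cache_pending = {1100: deque(), 1200: deque()}      # queues 320 and 321

# Backup memory directory 108.
backup_free = deque([654, 655])                     # queue 4020
backup_pending = {1100: deque(), 1200: deque()}     # queues 4010 and 4011

# Cache slot control block -> backup slot control block.
backup_pointer = {}


def mark_pending_and_mirror(cache_scb, drive):
    """Make a cache slot pending and secure a mirror slot in backup memory."""
    # Cache directory update: non-pending queue -> per-drive pending queue.
    cache_non_pending.remove(cache_scb)
    cache_pending[drive].append(cache_scb)
    # Backup memory directory update: free queue -> per-drive pending queue.
    backup_scb = backup_free.popleft()
    backup_pending[drive].append(backup_scb)
    # Record the backup memory slot pointer in the cache slot control block.
    backup_pointer[cache_scb] = backup_scb
    return backup_scb


mirror = mark_pending_and_mirror(354, 1100)
```

The two directory updates are written one after the other on purpose: the failure handling described later relies on a processor never being in the middle of both updates at once.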

Processor 103 then writes the contents of job slip 602 in the control memory 187 shown in FIG. 8 (the read/write distinction, the target drive No., the target track and record No., the slot numbers the processor has secured, and so on). After that, the write record data is transferred, via the channel control device 101 and processor 103, simultaneously to the field of record R1 in cache slot 306 (see FIG. 5) and to the field of record R1 in backup memory slot 605 (see FIG. 7).

When the transfer of the write record data to the cache and the backup memory is finished, completion of the data write to the disk drive is reported to the channel control device 101. At this point the write record data is held in duplicate, as pending slots, in the cache and the backup memory. When a certain number of such pending slots has accumulated, any processor that is not handling an access request from the channel control device 101 writes multiple tracks to the disk drive in one batch. It then releases from the pending slot queues the control blocks of the cache slots that have been written to the disk drive and of the backup memory slots secured for them, and returns them to the free slot queues.
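
A rough sketch of this threshold-triggered write-out is shown below. The threshold value, the `Memory` class, and the dictionary standing in for the disk are all assumptions made for illustration; the text says only that the batch write happens once "a certain number" of pending slots has accumulated.

```python
DESTAGE_THRESHOLD = 3   # assumption: the text does not give a number


class Memory:
    """A slot pool with per-drive pending entries and a free-slot count."""

    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.pending = {}                  # drive -> {track: data}

    def hold(self, drive, track, data):
        tracks = self.pending.setdefault(drive, {})
        if track not in tracks:
            self.free_slots -= 1           # take a slot from the free queue
        tracks[track] = data

    def release_all(self, drive):
        tracks = self.pending.pop(drive, {})
        self.free_slots += len(tracks)     # return the slots to the free queue


def destage(drive, cache, backup, disk):
    """Write every pending track of `drive` in one batch, then free the slots."""
    tracks = cache.pending.get(drive, {})
    if len(tracks) < DESTAGE_THRESHOLD:
        return 0
    for track, data in tracks.items():
        disk[(drive, track)] = data
    written = len(tracks)
    cache.release_all(drive)
    backup.release_all(drive)
    return written


cache, backup, disk = Memory(8), Memory(8), {}
results = []
for n, track in enumerate(["TR1", "TR2", "TR3"], start=1):
    cache.hold(1100, track, f"data{n}")
    backup.hold(1100, track, f"data{n}")
    results.append(destage(1100, cache, backup, disk))
```

In this run the first two calls write nothing and the third writes all three tracks, after which both memories have their full set of free slots again.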

The above is the normal collective write operation of the disk control device 102.

Next, the configuration and operation of the lock mechanism 107 are described with reference to FIGS. 3 and 4.

FIG. 3 shows the internal configuration of the lock mechanism 107, which consists of three registers 3000, 3100, and 3200, a monitoring timer 208, a lock control unit 206, and an arbiter 290. Register 3000 stores lock information 2000 for each drive. When bit 2001 in lock information 2000 is set to "1", it indicates that some processor holds the lock on that drive and that other processors must not access the drive.

When update flag A (2002) is set to "1", it indicates that the processor holding the lock on the drive may update the cache directory information for that drive. Likewise, when update flag B (2003) is set to "1", it indicates that the processor holding the lock on the drive may update the backup memory directory information for that drive. Processor No. (2004) is set to the number of the processor holding the lock on the drive. Register 3100 receives new lock requests from processors for each drive. Register 3200 receives, for each drive, lock release requests and requests to forcibly acquire a lock that another processor already holds.

The per-drive information set in these registers has the same format as lock information 2000. How the registers are used is explained together with the operation of the lock control unit 206, described next.
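
One way to picture the per-drive lock information is as a small record with four fields. The field names and, in particular, the bit positions used by `encode` are assumptions for illustration; the text names the fields (2001 to 2004) but does not give their layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockInfo:
    locked: bool = False      # lock bit 2001
    update_a: bool = False    # update flag A (2002): cache directory
    update_b: bool = False    # update flag B (2003): backup memory directory
    processor: int = 0        # processor No. (2004)

    def encode(self) -> int:
        """Pack the record into one word (bit positions are hypothetical)."""
        return (
            int(self.locked)
            | (int(self.update_a) << 1)
            | (int(self.update_b) << 2)
            | (self.processor << 3)
        )

    @staticmethod
    def decode(word: int) -> "LockInfo":
        """Unpack a word produced by encode()."""
        return LockInfo(
            locked=bool(word & 1),
            update_a=bool(word & 2),
            update_b=bool(word & 4),
            processor=word >> 3,
        )


# Processor 103 holds the lock and intends to update the cache directory.
word = LockInfo(locked=True, update_a=True, processor=103).encode()
```

Because registers 3000, 3100, and 3200 share this format, a request can be granted by copying the requested record into register 3000 unchanged.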

The operation of the lock control unit 206 is shown in the flowchart of FIG. 4. The lock control unit 206 starts operating when any processor sets a lock request in either register 3100 or register 3200 (step 4000). Contention when multiple processors set lock requests is resolved by the arbiter 290.

Suppose a lock request for some drive has been set in register 3100 (the lock bit and processor No. are set, and the corresponding update flag is also set if a directory update is to be performed). The lock control unit 206 first starts the monitoring timer 208 (step 4190) and then examines the lock bit in the lock information for that drive in register 3000 (step 4300). If the lock bit is not set, it stops the monitoring timer 208, transfers the contents of register 3100 to register 3000, and thereby secures the lock on the drive for the requesting processor (steps 4600, 4800). If the lock bit is set, it waits to acquire the lock, and if the lock bit is not reset before the monitoring timer 208 times out, it reports a lock request timeout error (steps 4400, 4500, 4700).

If instead a request for some drive has been set in register 3200 (the lock bit and processor No. are set, and the corresponding update flag is also set if a directory update is to be performed), the lock control unit 206 examines the lock bit in the lock information for that drive in register 3200. If it is "1", the request is treated as a forced lock acquisition request; if it is "0", as a lock release request.

For a forced lock acquisition request, the contents of register 3200 are transferred unconditionally to register 3000, and if a request is waiting for the lock in register 3100, the monitoring timer 208 is reset and restarted (steps 4140, 4180, 4170). This forced lock acquisition mode is also used when a processor changes only the update flags while continuing to hold the lock on the same drive. For a lock release request, the processor No. in the lock information for the drive in register 3000 is compared with that in register 3100. If they are equal, registers 3000 and 3100 are cleared; if they are not equal, only register 3100 is cleared (steps 4120, 4130).
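
The three request types handled by the lock control unit can be modeled in a few lines. This is a simplified, single-threaded sketch under stated assumptions: the monitoring timer 208 is replaced by a fixed number of polls, and a release is honored only when the requester is the current holder, which condenses the register comparison described above. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class LockTimeout(Exception):
    """The lock bit stayed set until the monitoring timer expired."""


@dataclass
class LockInfo:
    processor: int
    update_flag: Optional[str] = None   # "A" = cache dir, "B" = backup dir


class LockMechanism:
    def __init__(self, timeout_polls: int = 3):
        self.held: Dict[int, LockInfo] = {}    # plays the role of register 3000
        self.timeout_polls = timeout_polls     # stands in for timer 208

    def request(self, drive, processor, update_flag=None):
        """Normal request (register 3100): wait for the lock bit to clear."""
        for _ in range(self.timeout_polls):
            # In hardware the holder could release the lock during this wait.
            if drive not in self.held:
                self.held[drive] = LockInfo(processor, update_flag)
                return
        holder = self.held[drive].processor
        raise LockTimeout(f"drive {drive} is held by processor {holder}")

    def force(self, drive, processor, update_flag=None):
        """Forced request (register 3200, lock bit 1): overwrite the holder."""
        self.held[drive] = LockInfo(processor, update_flag)

    def release(self, drive, processor):
        """Release request (register 3200, lock bit 0)."""
        info = self.held.get(drive)
        if info is not None and info.processor == processor:
            del self.held[drive]
            return True
        return False


locks = LockMechanism()
locks.request(1100, 103, "A")          # processor 103 takes the lock
try:
    locks.request(1100, 104)           # 103 never releases it
    outcome = "acquired"
except LockTimeout:
    outcome = "timeout"
locks.force(1100, 104)                 # 104 takes over for failure handling
```

The timeout path matters more than it may seem: it is one of the two events that start the failure handling described below.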

Now consider, in a disk control device operating as above, a processor failure that occurs while the queues linking the slot control blocks in the cache directory 1050 or the backup memory directory 108 are being updated. In this case the management information for pending slots in that directory becomes invalid, and write-pending data can no longer be written to the disk drive from the memory whose directory is invalid, so the write-pending data in the disk control device is effectively in a single-copy state. This state must be detected, and the write-pending data that has fallen into the single-copy state must be written to the disk drive as failure handling.

The failure handling operation of this embodiment is described below, taking the above disk control device as an example and following the flowchart of FIG. 1.

There are two triggers for starting this failure handling operation (step 700). The first is a reset input from the channel control device 101. For example, when processor 103 in the disk control device 102 stops due to a failure, it raises an abnormality notification interrupt to the channel control device 101 via signal line 112. On receiving the interrupt, the channel control device 101 outputs a reset signal to processor 103 via signal line 111. On receiving the reset signal, processor 103 sets its own processor number as the failed processor number (step 702).

The other trigger is a lock acquisition timeout error in the lock mechanism 107. In this case, some processor may have suffered a permanent failure while holding the lock on a disk drive. The failed processor is then isolated as follows. The processor that received the lock timeout report from the lock mechanism 107 reads the processor No. from the lock information in register 3000 of the lock mechanism 107 for the disk drive it was trying to lock. It then checks the processor status of that processor in the control memory 187, and if the status is permanent failure, it sets that processor No. as the failed processor number (steps 701, 703).
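
The isolation step for the timeout trigger amounts to two lookups, sketched below. The dictionaries standing in for register 3000 and for the processor status area of control memory 187, and the status strings, are assumptions for illustration.

```python
from typing import Dict, Optional

PERMANENT_FAILURE = "permanent_failure"   # hypothetical status value


def find_failed_processor(
    drive: int,
    lock_holder: Dict[int, int],          # drive -> holder, from register 3000
    processor_status: Dict[int, str],     # processor -> status, control memory
) -> Optional[int]:
    """Return the lock holder if it is permanently failed, otherwise None."""
    holder = lock_holder.get(drive)
    if holder is None:
        return None                       # nobody holds the lock
    if processor_status.get(holder) == PERMANENT_FAILURE:
        return holder
    return None                           # holder is alive, just slow


lock_holder = {1100: 103, 1200: 105}
status = {
    103: PERMANENT_FAILURE,
    104: "normal",
    105: "normal",
    106: "normal",
}
```

Here a timeout on drive 1100 points at processor 103, while a timeout on drive 1200 does not identify a failed processor because its holder, processor 105, is still in the normal state.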

After the above, the following processing is performed. It is common to both triggers.

First, the job slip of the failed processor in the control memory 187 is examined, and the No. of the disk drive the failed processor was accessing is read (step 713). If there is such a drive (step 783), the information for that disk drive in register 3000 of the lock mechanism 107 is examined to check that the lock bit is set and that the lock-holding processor No. (2004) matches the failed processor No. If they do not match, another processor is assumed to have already started failure handling, and this processor does not perform it.

If they match, the processor sets its own forced lock request for the disk drive in register 3200 of the lock mechanism 107, acquires the lock on the disk drive, and performs the following processing.

First, the update flags in the lock information for the disk drive are examined to determine which is set: update flag A, which indicates a cache directory update, or update flag B, which indicates a backup memory directory update (step 704). If update flag A is set, the cache directory information is assumed to have been destroyed by a processor failure during a cache directory update. The cache is blocked (step 705), and access from the channel control device 101 to the disk drive the failed processor was accessing is suspended (step 707).

The write-pending data for the disk drive held in the backup memory is then written out. That is, the write-pending data in the backup memory pointed to by the slot control blocks linked to the pending slot queue for the disk drive in the backup memory directory is written to the disk drive (step 709). After the write completes, the cache directory is initialized, access from the channel control device 101 to the disk drive is permitted, the job slip of the failed processor in the control memory 187 is cleared, a lock release request for the disk drive is set in register 3200 of the lock mechanism 107, and the collective write operation for the disk drive is resumed (steps 711, 714).

If, on the other hand, update flag B is set, the backup memory directory information is assumed to have been destroyed by a processor failure during a backup memory directory update. The backup memory is blocked (step 706), and access from the channel control device 101 to the disk drive the failed processor was accessing is suspended (step 708). The write-pending data for the disk drive in the cache memory, pointed to by the slot control blocks linked to the pending slot queue for the disk drive in the cache directory, is written to the disk drive (step 710). After the write completes, the backup memory directory is initialized, access from the channel control device 101 to the disk drive is permitted, the job slip of the failed processor in the control memory 187 is cleared, a lock release request for the disk drive is set in register 3200 of the lock mechanism 107, and the collective write operation for the disk drive is resumed (steps 712, 714).
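
The recovery steps for both update flags can be condensed into one function. The sketch below is illustrative only: plain dictionaries stand in for the lock information, the job slips, the two memories, and the disk, and the handling of the case where neither flag is set is an assumption that the text does not cover.

```python
def recover(drive, failed, me, lock_info, job_slips,
            cache, backup, disk, suspended):
    """Flush the pending data of `drive` from whichever copy is intact."""
    slip = job_slips.get(failed)
    if slip is None or slip["drive"] != drive:
        return "nothing to do"         # failed processor was not on this drive
    info = lock_info.get(drive)
    if info is None or info["processor"] != failed:
        return "handled elsewhere"     # another processor already took over
    flag = info["update_flag"]
    lock_info[drive] = {"processor": me, "update_flag": None}   # forced lock

    # Flag A: the cache directory is suspect, so read from backup memory.
    # Flag B: the backup directory is suspect, so read from the cache.
    # Assumption: with no flag set, both are intact and the cache is used.
    if flag == "A":
        source, suspect, name = backup, cache, "backup"
    else:
        source, suspect, name = cache, backup, "cache"

    suspended.add(drive)               # steps 707 / 708: hold channel access
    for track, data in source.get(drive, {}).items():
        disk[(drive, track)] = data    # steps 709 / 710
    suspect[drive] = {}                # initialize the suspect directory
    suspended.discard(drive)           # permit channel access again
    del job_slips[failed]              # clear the failed processor's job slip
    del lock_info[drive]               # lock release request
    return name


lock_info = {1100: {"processor": 103, "update_flag": "A"}}
job_slips = {103: {"drive": 1100, "op": "write", "track": "TR1"}}
cache = {1100: {"TR1": "<directory half-updated>"}}
backup = {1100: {"TR1": "new data"}}
disk, suspended = {}, set()
used = recover(1100, 103, 104, lock_info, job_slips,
               cache, backup, disk, suspended)
```

Releasing the already-flushed slots in the intact memory is left out of the sketch; in the normal path that happens as part of the collective write described earlier.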

Because the lock mechanism 107 performs inter-processor exclusion as described above, so that different processors never update the cache directory and the backup memory directory at the same time, the cache directory and the backup memory directory never both become invalid simultaneously, even if multiple processors fail.

Next, consider the case where the processor itself is normal but a failure occurs on the path connecting the directory and the processor.

In this case, because the path connecting the directory and the processor has failed, the processor that was updating the directory cannot itself initialize that directory (step 714 in FIG. 1), and so cannot complete the processing.

To handle this, the processor that was updating the directory places itself in a permanent failure state. When another processor with a normal path to that directory then tries to acquire the lock on the disk drive that the former processor still holds, the lock mechanism 107 detects a lock timeout error. This triggers the fault handling shown in FIG. 1: the write-pending data for the disk drive is written out to the disk drive, and the collective writing operation for that disk drive can then be restarted.
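The way a lock timeout can act as the trigger for fault handling may be sketched as follows. This is a hedged Python illustration, not the circuit of the lock mechanism 107: the class `LockMechanism`, its methods, the return strings, and the use of caller-supplied timestamps in place of a monitoring timer are all assumptions made for the example.

```python
from typing import Dict, Tuple


class LockMechanism:
    """Hypothetical model of per-drive locks with a timeout check."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        # drive number -> (processor holding the lock, time the lock was taken)
        self.holders: Dict[int, Tuple[int, int]] = {}

    def acquire(self, drive: int, processor: int, now: int) -> str:
        held = self.holders.get(drive)
        if held is None:
            self.holders[drive] = (processor, now)
            return "granted"
        _owner, since = held
        if now - since > self.timeout:
            # The holder has kept the lock too long, for example because it
            # put itself into a permanent failure state. The requester should
            # start fault handling for this drive.
            return "timeout"
        return "busy"

    def force_release(self, drive: int) -> None:
        """Forcibly drop the lock so that a surviving processor can take it."""
        self.holders.pop(drive, None)
```

In this sketch a surviving processor that receives `"timeout"` would run the fault handling for that drive, call `force_release`, and then acquire the lock normally.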

According to the above embodiment, in a disk controller that performs the collective writing operation, even if any processor in the disk controller stops operating due to a failure and cannot complete the update of the management information for the write-pending data in the cache memory, or of the management information for the write-pending data in the backup memory, the lock mechanism 107 and the job ticket information in the control memory 187 ensure that at least one of the two sets of write-pending data management information remains in a correct state, and make it possible to determine which one is correct. The write-pending data can therefore be reliably written out to the disk drive from the memory whose write-pending data management information is correct.

In addition, because the lock information of the lock mechanism 107 for the cache directory and the backup memory directory is stored per disk drive, it is possible to determine, for each disk drive that a failed processor was accessing, which of the cache directory and the backup memory directory holds completely updated write-pending data management information. Even when multiple processors in the disk controller fail simultaneously, the surviving processors can completely write out to the disk drives the write-pending data of every disk drive that the failed processors were accessing.
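The benefit of keeping lock information per disk drive can be illustrated with a small Python sketch. It assumes a hypothetical table that records, for each drive, the processor holding the lock and the update flag that was set; `plan_destage` and the table layout are illustrative names, drive number 1300 is invented for the example, and the mapping of flag A to "destage from the backup memory" is inferred from the symmetry with the flag B case rather than quoted from the text.

```python
from typing import Dict, Optional, Set, Tuple

# drive number -> (processor holding the lock, update flag "A", "B", or None)
LockInfo = Dict[int, Tuple[int, Optional[str]]]


def plan_destage(lock_info: LockInfo, failed: Set[int]) -> Dict[int, str]:
    """Choose, per drive, the memory that a surviving processor destages from."""
    plan: Dict[int, str] = {}
    for drive, (holder, flag) in lock_info.items():
        if holder not in failed:
            continue  # this drive's lock holder is still alive
        if flag == "A":
            plan[drive] = "backup"  # cache directory update was interrupted
        elif flag == "B":
            plan[drive] = "cache"   # backup memory directory update was interrupted
        # Drives with no flag set are outside the scope of this sketch.
    return plan
```

Because the decision is made independently for each drive, two processors failing at once while updating different directories for different drives still yields a well-defined source memory for each drive.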

As a result, this embodiment has the effect of greatly improving the reliability of the collective writing operation of the disk controller described above.

In the above embodiment, the number of processors in the disk controller is four, but in general the disk controller may be configured with n processors. Also, although the update flags A and B, which distinguish updates of the cache directory from updates of the backup memory directory, are provided in registers inside the lock mechanism 107, these flags may instead be provided in the cache directory and in the backup memory directory. That is, an update flag A is provided for each per-disk-drive pending slot queue in the cache directory, and an update flag B is provided for each per-disk-drive pending slot queue in the backup memory directory; a processor sets the relevant flag immediately before updating the corresponding queue and resets it after the update finishes.
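The set-before, reset-after discipline for a flag stored alongside each queue can be sketched in Python as follows. This is an illustration under assumptions: `PendingSlotQueue` and `flagged_update` are hypothetical names, and a Python exception stands in for a processor that stops partway through an update. The reset is deliberately not placed in a `finally` clause, so an interrupted update leaves the flag set for a surviving processor to find.

```python
from contextlib import contextmanager
from typing import Iterator, List


class PendingSlotQueue:
    """Hypothetical per-drive pending slot queue carrying its own update flag."""

    def __init__(self) -> None:
        self.slots: List[str] = []
        self.updating = False  # plays the role of update flag A or B


@contextmanager
def flagged_update(queue: PendingSlotQueue) -> Iterator[PendingSlotQueue]:
    queue.updating = True   # set immediately before the update starts
    yield queue
    queue.updating = False  # reached only if the update ran to completion
```

A normal update such as `with flagged_update(q): q.slots.append("s1")` leaves `q.updating` false, whereas an update that raises inside the `with` block leaves it true.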

[Effect of the invention]

As described in detail above, the present invention applies to a disk controller that incorporates a plurality of processors, a cache memory, and a backup memory for the cache; that reports write completion to the host computer once a write from the host computer to a disk drive has been temporarily held in the cache memory and the backup memory; and that performs a collective writing operation, writing the pending data in the cache to the disk drive asynchronously with the host computer's data write operation. In such a disk controller, any one of the plurality of processors updates the information in the cache directory, which holds management information for the pending data in the cache, and in the backup memory directory, which holds management information for the pending data in the backup memory, sequentially while excluding the other processors. When it is detected that any processor has stopped operating due to a failure, it is identified which of the cache directory information and the backup memory directory information that processor was updating, and the write-pending data is written to the disk drive from the memory corresponding to whichever management information has been completely updated. This prevents the write-pending data from becoming impossible to write out from the disk controller to the disk drive, and prevents the write-pending data from remaining in a single-copy (non-duplicated) state within the disk controller. The invention thus has the remarkable effect of realizing a fault handling method, and a device therefor, that can greatly improve the reliability of the collective writing operation performed by the disk controller.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the fault handling operation in a disk controller according to an embodiment of the present invention. FIG. 2 is a configuration diagram of the cached disk subsystem of the embodiment. FIG. 3 shows the internal configuration of the lock mechanism, which is its main part. FIG. 4 is a flowchart showing the operation of the lock control unit. FIG. 5 shows the logical configuration of the cache directory and the cache memory. FIG. 6 is a configuration diagram of a slot control block in the cache directory. FIG. 7 is a logical configuration diagram of the backup memory directory and the backup memory. FIG. 8 is a logical configuration diagram of the control memory.

100: host computer; 101: channel controller; 102: disk controller; 103, 104, 105, 106: processors; 107: lock mechanism; 108: backup memory directory; 1060: cache memory; 187: control memory; 109: backup memory; 1050: cache directory; 1100 and 1200: disk drives; 3000, 3100, 3300: registers in the lock mechanism; 208: monitoring timer; 206: lock control unit; 290: arbiter; 2000: lock information.

Continuation of the front page (72) Inventor: Kenzo Kurihara, c/o Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa (72) Inventor: Shigeo Honma, c/o Odawara Works, Hitachi, Ltd., 2880 Kozu, Odawara-shi, Kanagawa (56) References: JP-A-63-269244 (JP, A) (58) Fields searched (Int. Cl.⁶, DB name): G06F 12/08, G06F 12/16, G06F 11/00

Claims

(57) [Claims]

1. A fault handling method in a disk controller that incorporates a plurality of processors, a cache memory, and a backup memory for the cache, that reports write completion to a host computer when a write from the host computer to a disk drive has been temporarily held in the cache memory and the backup memory, and that performs an operation of writing the pending data in the cache to the disk drive (a collective writing operation) asynchronously with the data write operation of the host computer, wherein: updating of the information in a cache directory having management information on the pending data in the cache, and in a backup memory directory having management information on the pending data in the backup memory, is performed by any one of the plurality of processors while excluding the other processors; and, when it is detected that any processor has stopped operating due to a failure, it is identified which of the cache directory information and the backup memory directory information that processor was updating, and the write-pending data is written to the disk drive from the memory corresponding to whichever management information has been completely updated.

2. The fault handling method according to claim 1, wherein, in addition to the above operations, the two sets of directory information are initialized and the collective writing operation is restarted.

3. The fault handling method according to claim 1, wherein, in addition to the above operations, when the failed processor does not restart after a reset and becomes a permanent failure, and the permanently failed processor is detected, it is identified during the update of which of the cache directory information and the backup memory directory information the permanent failure occurred, the write-pending data is written to the disk drive from the memory corresponding to whichever management information has been completely updated, the two sets of directory information are initialized into a reusable form, and the collective writing operation is restarted.

4. A fault handling method in a disk controller that incorporates a plurality of processors, a cache memory, and a backup memory for the cache, that reports write completion to a host computer when a write from the host computer to a disk drive has been temporarily held in the cache memory and the backup memory, and that performs an operation of writing the pending data in the cache to the disk drive (a collective writing operation) asynchronously with the data write operation of the host computer, wherein: updating of the information in a cache directory having management information on the pending data in the cache, and in a backup memory directory having management information on the pending data in the backup memory, is performed by any one of the plurality of processors while excluding the other processors; and, when a failure of the write path connecting a processor and a directory is detected during the update, the processor that was using the failed path is placed in a permanent failure state, it is identified during the update of which of the cache directory information and the backup memory directory information the permanent failure occurred, the write-pending data is written to the disk drive from the memory corresponding to whichever management information has been completely updated, the two sets of directory information are initialized into a reusable form, and the collective writing operation is restarted.

5. The fault handling method according to claim 1, wherein the cache directory information and the backup memory directory information each have an independent data structure for every disk drive that is a write target of the host computer, and each of the operations is performed for each disk drive.

6. A fault handling device in a disk controller that incorporates a plurality of processors, a cache memory, and a backup memory for the cache, that reports write completion to a host computer when a write from the host computer to a disk drive has been temporarily held in the cache memory and the backup memory, and that performs an operation of writing the pending data in the cache to the disk drive (a collective writing operation) asynchronously with the data write operation of the host computer, the device comprising: means for controlling the updating of the information in a cache directory having management information on the pending data in the cache, and in a backup memory directory having management information on the pending data in the backup memory, so that it is performed by any one of the plurality of processors while excluding the other processors; means for detecting that any processor has stopped operating due to a failure; and means for identifying which of the cache directory information and the backup memory directory information the processor that stopped operating due to the failure was updating.

7. The fault handling device according to claim 6, further comprising, in addition to the above means: means for detecting that the failed processor does not restart after a reset and has become a permanent failure; and means for, when the presence of the permanently failed processor is detected, forcibly releasing the lock on the cache directory or the backup memory directory held by that processor so that another processor can acquire it.