JP2000513129A

JP2000513129A - Method of tracking incomplete writes in a disk array, and disk storage system performing such a method

Info

Publication number: JP2000513129A
Application number: JP11500947A
Authority: JP
Inventors: Christopher B. Legg
Original assignee: Unisys Corporation
Priority date: 1997-05-30
Filing date: 1998-05-29
Publication date: 2000-10-03
Anticipated expiration: 2018-05-29
Also published as: DE69822819T2; AU7703798A; EP0985173A1; DE69822819D1; US5893164A; JP3288724B2; WO1998054647A1; EP0985173B1

Abstract

(57) [Abstract] A method of tracking incomplete writes in a disk array includes the steps of: sequentially receiving a plurality of write commands (the writes at t1, t2, t3, t4, t5) that identify respective blocks in the array to be written; generating a list (16) of extended write areas (e.g., PB8,289 in the list at time t1) for only the most recently received write commands, each extended write area including the blocks to be written plus additional blocks that may be written by later write commands (e.g., PB8,289+8 includes the write to PB8,290 at t1); modifying the list, each time a write command is subsequently received that writes a particular block not in any extended write area in the list (e.g., the write at t4), by replacing one extended write area (e.g., the last entry in list 16 at t3) with a new extended write area that includes the particular block (e.g., the first entry in the list at t4); and storing a copy of the list (17 at t4) on a magnetic medium each time the modifying step is performed.

Description

Title: Method of tracking incomplete writes in a disk array, and disk storage system performing such a method
Background of the Invention: The present invention relates to a method of tracking incomplete writes in a disk array, and to a disk storage system that performs such a method.
In the prior art, the term "RAID" disk array has been defined to mean a redundant array of inexpensive disks, and several different RAID disk arrays have been defined. These include the level 1 RAID disk array, the level 3 RAID disk array, and the level 5 RAID disk array. See "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by Patterson et al., Report No. UCB/CSD 87/391, December 1987, Computer Science Division of the University of California at Berkeley. In a level 5 RAID disk array, both parity and data are striped across a set of several disks. FIG. 1 shows an example of a level 5 RAID disk array in which the array is formed by a set of five disks labeled disk 0, disk 1, ... disk 4. Each column of the array contains the data and parity stored on a single disk of the set. Each row of the array contains data and parity striped across all five disks of the set. In FIG. 1, each row of the array consists of one parity chunk on one disk and four data chunks on the other four disks. Each data chunk and each parity chunk is in turn divided into several physical blocks. A single block is the smallest portion of a chunk that a user program can address separately with a read or write command. In FIG. 1, there are eight blocks per chunk. Each block consists of a predetermined number of bytes (e.g., 512 bytes) plus one cyclic redundancy check byte, called the "CRC" byte. In the array of FIG. 1, block 0 of row 0 is addressed by a read/write command with logical address 0. As the logical address is incremented by one, the data blocks are addressed in the following order: blocks 1-7 of data chunk 0, blocks 0-7 of data chunk 1, blocks 0-7 of data chunk 2, blocks 0-7 of data chunk 3, blocks 8-15 of data chunk 4, blocks 8-15 of data chunk 5, and so on. For example, block 8 of data chunk 5 has logical address 40. When a data block is written, the CRC byte within that block is also generated and written.
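The addressing order described for FIG. 1 can be illustrated with a short sketch. It assumes only what the text states (eight blocks per chunk, four data chunks per row); the function name `locate` is hypothetical, and the disk that holds each chunk is left out because it depends on the parity placement drawn in FIG. 1, which is not reproduced here.

```python
BLOCKS_PER_CHUNK = 8      # FIG. 1: eight blocks per chunk
DATA_CHUNKS_PER_ROW = 4   # FIG. 1: four data chunks plus one parity chunk per row


def locate(logical_address):
    """Map a logical address to (data chunk number, block number).

    Hypothetical helper: the disk holding the chunk is not computed, since it
    depends on how FIG. 1 rotates the parity chunk from row to row.
    """
    chunk, offset = divmod(logical_address, BLOCKS_PER_CHUNK)
    row = chunk // DATA_CHUNKS_PER_ROW
    block = row * BLOCKS_PER_CHUNK + offset
    return chunk, block


if __name__ == "__main__":
    # Walk a few addresses, including the example from the text (address 40).
    for la in (0, 7, 8, 31, 32, 40):
        print(la, locate(la))
```

Under these assumptions, logical address 40 falls in data chunk 5 at block number 8, which agrees with the example given in the text.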
Whenever a data block is written, a parity block having the same block number as that data block is also generated and written. This parity block is written using either odd parity or even parity. With even parity, the exclusive-OR of the parity block and all data blocks having the same block number is a block of all 0s. Conversely, with odd parity, the exclusive-OR of the parity block and all data blocks having the same block number is a block of all 1s. One way to generate a new parity block for a new data block that is to be written is as follows. First, the existing data block and its parity block are read from their respective disks. Next, the new parity block is calculated by taking the exclusive-OR of the read parity block and the read data block, and then taking the exclusive-OR of that result and the new data block. The new parity block and the new data block are then written to their respective disks. During the execution of a read command, the CRC byte is regenerated from the block of data that is read. If the regenerated CRC byte differs from the stored CRC byte, the block of data that was read contains an error. To correct this error, the erroneous data block is regenerated by a) reading all of the other blocks (data and parity) on the disks that have the same block number as the erroneous data block, and b) taking the exclusive-OR of those blocks. Now consider the case where execution of a particular write command, which attempts to write data to a block having a particular block number "i", is started but is interrupted before it completes. Such an interruption can occur, for example, because of a power failure. In this case, the interruption can occur after the writing of the new data block has completed but before the writing of the new parity block has begun. Similarly, the interruption can occur after the writing of the new parity block has completed but before the writing of the new data block has begun.
In either case, the exclusive-OR of all blocks having block number "i" will not equal a block of all 0s or a block of all 1s. At the same time, the ECC bytes of all blocks having block number "i" will be correct. After the cause of the interruption has been removed, the array will continue to be read and written by the user programs. If any data block having block number "i" is then read and its ECC byte detects an error, an attempt will be made to regenerate the erroneous data block by taking the exclusive-OR of the remaining data blocks and the parity block having the same block number. However, because of the earlier incomplete write, that regeneration process will not work. In the prior art, this problem has been addressed by providing a flag in the disk array that is set to "1" when the array starts operating and is reset to "0" when the array stops operating in the normal manner. Thus, if the flag is found to be "1" before it is set when the array starts operating, the normal operation of the array must previously have been interrupted. However, a drawback of this prior art flag is that, after the flag is found to be in the wrong state, it takes too long to identify the particular blocks that were incompletely written. To find the incompletely written blocks, every data block and every parity block in the entire array must be read; the parity blocks must be recalculated from the data blocks that were read; and the recalculated parity blocks must be compared with the parity blocks that were read. For a large array, this process can take an entire day to complete. Accordingly, a primary object of the present invention is to provide a method of tracking incomplete writes in a disk array, and a disk storage system that performs such a method, which eliminate the need to check the parity of every block in the entire array after normal operation of the array has been interrupted.
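The parity arithmetic and the failure mode described in this background can be made concrete with a minimal sketch. It assumes even parity, four-byte blocks, and made-up byte values; the helper names (`xor_blocks`, `new_parity`, `stripe_consistent`) are hypothetical and are not taken from the patent.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise exclusive-OR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def new_parity(old_parity, old_data, new_data):
    """Read-modify-write update: (old parity XOR old data) XOR new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


def stripe_consistent(data_blocks, parity):
    """Even parity: XOR of the parity block and all data blocks is all zeros."""
    return not any(xor_blocks(parity, *data_blocks))


# Four data blocks with the same block number "i", plus their even-parity block.
# The byte values are arbitrary illustration data.
data = [bytes([v] * 4) for v in (0x11, 0x22, 0x33, 0x44)]
parity = xor_blocks(*data)

# A complete write updates both the data block and the parity block.
update = bytes([0x5A] * 4)
updated_parity = new_parity(parity, data[1], update)
complete = data[:1] + [update] + data[2:]

# An interrupted write: the new data block reached its disk, the new parity did not.
torn_parity = parity

# Attempt to regenerate data block 0 from the other blocks after the torn write.
regenerated = xor_blocks(torn_parity, *complete[1:])

if __name__ == "__main__":
    print("complete write consistent:", stripe_consistent(complete, updated_parity))
    print("torn write consistent:", stripe_consistent(complete, torn_parity))
    print("regeneration correct after torn write:", regenerated == data[0])
```

In this sketch the stripe stays consistent after a complete write, while the torn write leaves a parity mismatch and causes the regeneration of block 0 to return wrong data, which is the situation the invention sets out to detect.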
BRIEF SUMMARY OF THE INVENTION: In the present invention, incomplete writes in a disk array are tracked by a control program in a digital computer. This control program sequentially receives a plurality of write commands that identify respective blocks in the array to be written. In response, the control program generates a list of extended write areas for only the most recently received write commands. Each such extended write area includes the blocks to be written plus additional related blocks that may be written by subsequent write commands. The list is modified by the control program each time a write command is later received that writes a particular block not in any extended write area in the list. In this modification, one extended write area in the list is replaced by a new extended write area that includes the particular block. In addition, each time the modifying step occurs, the control program stores a copy of the list on a magnetic medium. Because of an unexpected interruption, such as a power failure, the write operations called for by any of the received write commands may be started but not completed. In the present invention, the presence of an incomplete write is detected by checking the parity of only the blocks identified in the copy of the list. As a result, a considerable amount of time is saved compared with the time it would take to check the parity of every block in the entire disk array. As a practical numerical example of this saving, consider the case where the entire array stores 900 gigabytes and the extended write areas in the copy cover a total of 500 megabytes. Checking the parity of a 500 megabyte portion of the disk array takes about 10 minutes. However, 900 gigabytes is 1800 times larger than 500 megabytes, so checking the parity of the entire array would take about (1800) × (10 minutes), or about 30 hours.
Increasing the total number of extended write areas in the list increases the likelihood that all of the blocks to be written by a write command are already covered by the list. This is desirable because it reduces the number of times that the copy of the list must be updated on the magnetic medium. Preferably, less than 1% of the write commands require a new extended write area to be added to the list.
BRIEF DESCRIPTION OF THE DRAWINGS: FIG. 1 shows the structure of a level 5 RAID disk array in which incomplete writes can occur. FIG. 2 shows one preferred embodiment of a disk storage system which, in accordance with the present invention, tracks incomplete writes in the array of FIG. 1. FIG. 3 shows a preferred structure of the extended write areas that the disk storage system of FIG. 2 uses to track incomplete writes. FIG. 4 shows how the extended write areas of FIG. 3 are stored in a list, and in a copy of that list, within the disk storage system of FIG. 2. FIG. 5 shows a second preferred structure of the extended write areas that the disk storage system of FIG. 2 uses to track incomplete writes.
DETAILED DESCRIPTION: One preferred embodiment of a disk storage system that tracks incomplete writes in a disk array in accordance with the present invention is shown in FIG. 2. The system of FIG. 2 includes a set of five disks (disk 0 through disk 4), which store data blocks and parity blocks in a level 5 RAID disk array as described in the Background in conjunction with FIG. 1. The system of FIG. 2 also includes a digital computer 11 coupled to the disk array, a semiconductor memory 12 coupled to the computer 11, and an operator console 13 coupled to the computer. Stored in the memory 12 are a control program 14 and a plurality of user programs, one of which is shown as user program 15j. Each user program includes read/write commands, each containing a logical address that identifies a particular data block in the disk array to be read or written.
These read/write commands are sent sequentially to a queue 14a in the control program 14 for execution. Each write command in the queue 14a is subsequently received by the control program 14 and executed as follows. First, the control program analyzes the logical address in the write command to determine the number of each data block to be written, the number of the disk that contains that data block, and the number of the disk that contains the corresponding parity block. Next, the control program reads the current contents of that data block and its corresponding parity block from their respective disks. Next, the control program calculates the new parity block by taking the exclusive-OR of the read parity block and the read data block, and then taking the exclusive-OR of that result and the new data block. Next, the control program writes the new parity block and the new data block to their respective disks. Thus, executing each write command requires two read operations on two different disks, followed by two write operations on the same two disks. These two write operations can be performed in any order; consequently, if an unexpected interruption such as a power failure occurs, the write operation on one disk can complete before the write operation on the other disk has begun. To track such incomplete write operations in the disk array, the disk storage system of FIG. 2 further includes a list 16 in the semiconductor memory 12. This list 16 identifies extended write areas for those write commands that the control program 14 has most recently received from the queue 14a. Each extended write area in the list includes the blocks to be written by a write command plus additional related blocks. FIG. 3, described below, shows two specific examples of these extended write areas.
Each time a write command is received from the queue 14a for execution, the control program 14 compares the identity of the blocks to be written with the identity of the blocks in the extended write areas in the list 16. If a block to be written is not in any extended write area in the list 16, the control program 14 replaces one extended write area in the list 16 with a new extended write area that includes the particular block to be written. Preferably, all of the extended write areas in the list 16 are arranged in least-recently-used order. In that case, when a new extended write area needs to be added to the list, the one extended write area that is replaced is the least recently used one. In any case, an extended write area is not removed from the list 16 until every received write command that writes a block in that extended write area has been completely executed. Also in the disk storage system of FIG. 2, a copy 17 of the list is stored on a magnetic medium. In one embodiment, the copy 17 is stored on one of the five disks that hold the level 5 RAID disk array. Alternatively, the disk storage system of FIG. 2 may include an auxiliary magnetic medium 18 on which the copy 17 of the list is stored. The copy 17 differs from the list 16 in that the extended write areas it identifies are not arranged in any particular order. However, the list 16 and the copy 17 both identify the same extended write areas. One particular structure for the extended write areas stored in the list 16 and its copy 17 is shown in FIG. 3. There, a large arrow 21 represents a write command, received by the control program 14 from the queue 14a, that calls for writing the two data blocks on which the arrow is drawn. Those data blocks are stored on disk 2 in physical blocks PBY+5 and PBY+6. The extended write area for those two blocks is shown by hatch lines 22 in FIG. 3, and it includes physical blocks PBY+5, PBY+6, PBY+7, PBY, and PBY+1 on all of the disks.
Similarly, another large arrow 25 is drawn in FIG. 3 on the one data block that is written in response to a second write command from the queue 14a. That data block is stored on disk 4 as physical block PBY+6. The extended write area for that block is shown by dots 26 in FIG. 3, and it includes physical blocks PBY+6, PBY+7, PBY+8, and PBY+9 on all of the disks. In FIG. 3, the extent of the extended write area shown by hatch lines 22 is determined as follows. First, the control program generates three extended logical addresses by adding 1, 2, and 3 to the logical address of the last block that write command 21 writes. Next, the control program identifies the physical blocks that correspond to those extended logical addresses. Reference numerals 23a, 23b, and 23c indicate the result of this step. The extended write area is then defined to consist of those physical blocks, plus the blocks to be written, on all of the disks. The extended write area shown by dots 26 is determined in the same way. In FIG. 3, reference numerals 27a, 27b, and 27c indicate the result of a) adding 1, 2, and 3 to the logical address of the block that write command 25 writes, and b) identifying the corresponding physical blocks. In the list 16, the extended write area shown by hatch lines 22 is identified in a compact form as (PBY+5, +2) and (PBY, +1). This compact form reduces the size of the list. In this form, the first item in the parentheses is a physical block, on all of the disks, in the extended write area, and the second item is a count of the number of physical blocks, on all of the disks, that follow that block. Similarly, the extended write area shown by dots 26 is represented in compact form as (PBY+6, +3). Now suppose that write command 25 is received from the queue 14a while the extended write area shown by hatch lines 22 is in the list 16.
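The range rule and the compact form described for FIG. 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it takes Y = 0, eight blocks per chunk and four data chunks per row as in FIG. 1, and it assumes that write command 21 falls in data chunk 1 and write command 25 in data chunk 3 (the last data chunk of the row), a placement chosen only because it reproduces the block numbers given in the text. All function names are hypothetical.

```python
BLOCKS_PER_CHUNK = 8      # FIG. 1: eight blocks per chunk
DATA_CHUNKS_PER_ROW = 4   # FIG. 1: four data chunks per row


def physical_block(logical_address):
    """Physical block number (the same on every disk) for a logical address."""
    chunk, offset = divmod(logical_address, BLOCKS_PER_CHUNK)
    return (chunk // DATA_CHUNKS_PER_ROW) * BLOCKS_PER_CHUNK + offset


def extended_write_area(written, extend_by=3):
    """Block numbers covered: the written blocks plus `extend_by` addresses
    after the last written logical address."""
    last = max(written)
    addresses = list(written) + [last + k for k in range(1, extend_by + 1)]
    return {physical_block(la) for la in addresses}


def compact(blocks):
    """Collapse block numbers into (first block, count of blocks that follow) runs."""
    runs = []
    for block in sorted(blocks):
        if runs and block == runs[-1][0] + runs[-1][1] + 1:
            runs[-1][1] += 1
        else:
            runs.append([block, 0])
    return [tuple(run) for run in runs]


# Assumed placement with Y = 0: command 21 writes offsets 5 and 6 of data chunk 1
# (logical addresses 13 and 14); command 25 writes offset 6 of data chunk 3
# (logical address 30).
area_22 = extended_write_area([13, 14])
area_26 = extended_write_area([30])

if __name__ == "__main__":
    print("area 22:", compact(area_22))
    print("area 26:", compact(area_26))
    print("block of command 25 already in area 22:", physical_block(30) in area_22)
```

With these assumptions the sketch yields the runs (PBY, +1) and (PBY+5, +2) for the hatched area and (PBY+6, +3) for the dotted area, and it shows that the block written by command 25 already lies inside the hatched area.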
In that case, with the area shown by hatch lines 22 still in the list 16, the extended write area shown by dots 26 is not added to the list 16. This is because, as described above, the control program 14 compares the identity of the blocks to be written with the identity of the blocks in the extended write areas of the list 16. If all of the blocks that a particular write command writes are in the extended write areas of the list 16, no new extended write area is added to the list for that command. A specific example of how the list 16 and its copy 17 operate is shown in FIG. 4. In this example, the initial contents of the list 16 and its copy 17 occur at time t0. At that time, the most recently used extended write area in the list 16 is (PB35,191+6), the next most recently used extended write area is (PB765+6), and the least recently used extended write area is (PB136,353+6). Then, at time t1 in FIG. 4, a write command that calls for writing physical block PB8,290 is received by the control program 14 from the queue 14a. This physical block is included in the extended write area (PB8,289+8), which is in the list 16. Therefore, no new extended write area is added to the list, and so the copy 17 of the list remains unchanged. However, the extended write areas in the list 16 are rearranged to maintain their order of use. Next, at time t2 in FIG. 4, a write command that calls for writing physical block PB136,354 is received by the control program 14 from the queue 14a. This physical block is included in the extended write area (PB136,353+10), which is in the list 16. Therefore, no new extended write area is added to the list, and the copy 17 of the list remains unchanged. However, the extended write areas in the list 16 are again rearranged to maintain their order of use. Next, at time t3 in FIG. 4, a write command that calls for writing physical block PB767 is received by the control program 14 from the queue 14a. This physical block is included in the extended write area (PB765+6), which is in the list 16.
Therefore, no new extended write area is added to the list, and the copy 17 of the list is again unchanged. However, the extended write areas in the list 16 are rearranged to maintain their order of use. Next, at time t4 in FIG. 4, a write command that calls for writing physical block PB91,532 is received by the control program 14 from the queue 14a. This physical block is not included in any of the extended write areas in the list 16. Therefore, a new extended write area (PB91,532+3) is added to the list 16, the least recently used extended write area (PB524,911+6) is deleted from the list 16, and all of the extended write areas in the list are arranged in their order of use. In addition, the copy 17 of the list is written so that its contents are the same as the new list 16. Finally, at time t5 in FIG. 4, a write command that calls for writing physical block PB136,356 is received by the control program 14 from the queue 14a. This physical block is included in the extended write area (PB136,350+10), which is in the list 16. Therefore, no new extended write area is added to the list, and the copy 17 of the list remains unchanged. However, the extended write areas in the list 16 are again rearranged to maintain their order of use. Note that, to simplify FIG. 4, only a total of five extended write areas are shown in the list 16 (and its copy 17). In an actual disk storage system, however, the list is much larger. Preferably, the extended write areas in the list 16 cover a total storage area in the array of 100 megabytes to 1 gigabyte. Increasing the total number of extended write areas in the list 16 increases the likelihood that all of the blocks that a write command from the queue 14a writes are already covered by the list. This is desirable because it reduces the number of times that the copy 17 of the list must be updated on the magnetic medium. Preferably, less than 1% of the write commands from the queue 14a require a new extended write area to be added to the list 16.
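The behavior walked through for FIG. 4 can be approximated by a small simulation. This is a sketch under several assumptions: the class name `WriteAreaList` and its members are hypothetical; areas are held as (first block, count of following blocks) pairs; the relative order of the two middle entries at t0 is a guess, since the text names only the first two entries and the last; the PB136,353 entry is taken with a count of 10; and the rule that an area may not be removed while writes into it are still in progress is left out for brevity.

```python
class WriteAreaList:
    """In-memory list 16, most recently used first, plus a stand-in for copy 17."""

    def __init__(self, areas, extend_by=3):
        self.areas = list(areas)      # (first block, count of following blocks)
        self.extend_by = extend_by
        self.copy = set(self.areas)   # unordered copy, as kept on magnetic media
        self.copy_writes = 0          # number of times the copy was rewritten

    def record_write(self, block):
        """Process one single-block write; return the evicted area, if any."""
        for area in self.areas:
            first, count = area
            if first <= block <= first + count:
                # Covered: only reorder the in-memory list; the copy is untouched.
                self.areas.remove(area)
                self.areas.insert(0, area)
                return None
        # Not covered: replace the least recently used area and rewrite the copy.
        evicted = self.areas.pop()
        self.areas.insert(0, (block, self.extend_by))
        self.copy = set(self.areas)
        self.copy_writes += 1
        return evicted


# Assumed contents at t0, most recently used first.
lst = WriteAreaList([(35191, 6), (765, 6), (8289, 8), (524911, 6), (136353, 10)])

# The writes received at t1 through t5.
evictions = [lst.record_write(b) for b in (8290, 136354, 767, 91532, 136356)]

if __name__ == "__main__":
    print("evictions:", evictions)
    print("copy rewritten", lst.copy_writes, "time(s)")
    print("list, most recently used first:", lst.areas)
```

In this run only the write at t4 misses the list, so the copy is rewritten once and the area starting at PB524,911 is the one replaced; the other four writes only reorder the in-memory list.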
However, increasing the total number of extended write areas in list 16 also increases the number of blocks whose parity must be checked after an unexpected interruption. For example, suppose that an unexpected interruption occurs while the system of FIG. 2 is executing a write command from queue 14a. Because of the interruption, the write of a data block to one disk may complete before the write of the corresponding parity block to another disk, or vice versa. As a result, the array must be checked for such incomplete writes after the cause of the interruption has been resolved. In the present invention, incomplete writes are detected by checking parity only on the blocks identified in copy 17 of list 16. This saves a significant amount of time compared with checking parity on every block of the entire level 5 RAID disk array.

As a practical numerical example of this saving, consider the case where the entire array stores 900 gigabytes and the extended write areas in copy 17 total 500 megabytes. Checking parity for a 500 megabyte portion of the disk array takes about 10 minutes. However, 900 gigabytes is 1800 times larger than 500 megabytes, so checking parity for the entire array takes about (1800) * (10 minutes) = 18,000 minutes, or about 300 hours.

One preferred method of tracking incomplete writes in a disk array, and a preferred disk storage system that performs such a method, have now been described in detail. However, various changes may be made to those details without departing from the scope of the invention. For example, in FIG. 3, the extent of each extended write area was determined by adding 1, 2, and 3 to the logical address of the last block that the write command from queue 14a writes. However, as a modification, the number of integers added to the logical address may be increased.
Increasing it in this way raises the likelihood that the extended write area includes all of the blocks written by later write commands from the queue. Preferably, the number added to the logical address of the last block written ranges from 1 to 200.

As another variation, the extent of each extended write area may be determined by subtracting 1, 2, ... from the logical address of the first block that the write command from queue 14a writes. In this variation, the extended write area includes blocks in the array that precede the block to be written. This also increases the likelihood that the extended write area will contain all of the blocks that future write commands from the queue write. Preferably, the number subtracted from the logical address of the first block to be written is in the range of 1 to 200.

As another variation, each extended write area in list 16 may consist of only the blocks that the write command writes, together with additional blocks identified by adding 1, 2, ... to the logical address of the last block to be written and/or by subtracting 1, 2, ... from the logical address of the first block to be written. An example of this variation is shown in FIG. 5. In FIG. 5, arrow 35 represents a write command from queue 14a that seeks to write two blocks having logical addresses LA_w and LA_w + 1. These two blocks reside on disk 4 as physical blocks PB Y+5 and PB Y+6. For those two blocks, the extended write area in FIG. 5 is defined as the blocks to be written, the additional blocks identified by adding 1, 2, ... 15 to the logical address of the last block to be written, and the additional blocks identified by subtracting 1, 2, ... 8 from the logical address of the first block to be written. These additional blocks are designated in FIG. 5 by reference numbers 35(+1) to 35(+15) and 35(-1) to 35(-8). In list 16 and its copy 17, the extended write area of FIG. 5 is identified as (LA_w - 8, LA_w + 16).
In this form, the first item in parentheses is the logical address of the first block in the extended write area, and the second item is the logical address of the last block in the extended write area.

If the extended write areas are defined as shown in FIG. 5, incomplete writes are detected after an unexpected interruption as follows. First, control program 14 converts all of the logical addresses included in the extended write areas in copy 17 into the corresponding physical blocks. During this conversion, the particular disk on which each physical block resides is ignored. Next, those physical blocks are read from all of the disks in the array. Parity is then regenerated from the data blocks that were read and compared with the parity block that was read.

As another modification, the size of the extended write area in FIG. 5 may be made smaller or larger. Preferably, the number added to the logical address of the last block to be written and the number subtracted from the logical address of the first block to be written are each in the range of 1 to 200.

As another modification, the invention is not limited to removing extended write areas from list 16 in least-recently-used order. Instead, any predetermined algorithm may be used to remove one extended write area from list 16 to make room for a new one. As one example, the extended write area removed from list 16 may be the one that has been in the list the longest. As another example, the extended write area removed from list 16 may be selected at random.

As a further modification, list 16 need not be stored in the same semiconductor memory 12 that holds control program 14, as shown in FIG. Instead, list 16 may be stored in a memory included in computer 11 or in a set of registers. As yet another modification, the RAID disk array that holds the data blocks and the parity blocks need not be a level 5 array.
Instead, the data blocks and parity blocks may be arranged on the disks in any manner that allows an erroneous data block to be regenerated from the parity blocks. For example, the array holding the data blocks and parity blocks may be a level 3 RAID disk array as described in the "Background" section. Therefore, it is to be understood that the scope of the invention is not limited to the details of any of the specific embodiments described, but is defined by the appended claims.
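The FIG. 5 variation and the recovery check described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: even parity is assumed, the saved areas are treated directly as inclusive ranges of block numbers (the logical-to-physical conversion is omitted), and `read_stripe` is a hypothetical callback that returns the data blocks and the parity block sharing a given block number. The function names are likewise hypothetical.

```python
from functools import reduce


def extended_area(first_la, last_la, before=8, after=15):
    """Return the (first, last) addresses of an extended write area, as in FIG. 5."""
    return (first_la - before, last_la + after)


def even_parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def find_incomplete_writes(areas, read_stripe):
    """Check parity only on block numbers covered by the saved extended write areas."""
    # Overlapping areas are merged so that each block number is checked once.
    to_check = sorted({n for first, last in areas for n in range(first, last + 1)})
    suspect = []
    for block_no in to_check:
        data_blocks, parity_block = read_stripe(block_no)
        # With even parity, the stored parity equals the XOR of the data blocks.
        if even_parity(data_blocks) != parity_block:
            suspect.append(block_no)
    return suspect


# A write to addresses 100 and 101 gives the area (92, 116), i.e. (LA_w - 8, LA_w + 16).
print(extended_area(100, 101))
```

Blocks outside the saved areas are never read, which is the source of the time saving in the numerical example above; the cost is that every block inside a saved area is checked even if it was never actually written.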

[Procedural amendment] [Date of submission] December 1, 1999 [Content of amendment]

(1) The following is inserted between page 3, line 29 and page 4, line 1 of the specification: "WO-A-94/29795 addresses the same problem as the present invention. However, the solution of WO-A-94/29795 differs essentially from the present invention. That is, the present invention foresees creating extended write areas that contain respective blocks in the normal disk storage space, so as to form part of an integrated, striped data/parity structure. This differs from the solution of WO-A-94/29795, which is based on providing an additional NVRAM area, separate from the normal disk array structure, to store a list of possibly inconsistent blocks. The proposed solution has the advantage of reducing the size of the list and effectively identifying all blocks associated with inconsistencies in the disk array."

(2) "The claimed invention is described below." is inserted between lines 5 and 6 of page 4 of the specification.

(3) The claims are amended as shown in the attached sheet.

[Claims]

1. A method, performed by a computer program, of tracking incomplete writes in a disk array (see FIG. 4), comprising the steps of: sequentially receiving a plurality of write commands (the writes at t1, t2, t3, t4, and t5 in FIG. 4) that identify respective blocks in the array to be written; generating a list (16 in FIG. 4) of extended write areas (e.g., PB8,289 in the list at time t1) for only a plurality of the most recently received write commands, each extended write area including a block to be written and additional blocks associated with it (e.g., PB8,289 + 8 includes the write of PB8,290 at t1); modifying the list, each time a write command is subsequently received that writes a particular block not within any extended write area in the list (e.g., the write at t4), by replacing one extended write area (e.g., the last entry in list 16 at t3) with a new extended write area that includes the particular block (e.g., the first entry in the list at t4); and storing a copy of the list (17 at t4) on a magnetic medium each time the modifying step is performed.

2. The method of claim 1, wherein the extended write areas in the list cover a total storage area in the array of 100 megabytes to 1 gigabyte.

3. The method of claim 1, wherein each extended write area includes between 1 and 500 additional blocks.

4. The method of claim 1, wherein each extended write area includes the same physical blocks on all of the disks.

5. The method of claim 1, wherein each extended write area includes selected physical blocks on a selected disk.

6. The method of claim 1, wherein the modifying step replaces the least recently written extended write area in the list.

7. The method of claim 1, wherein the program generates the list in a volatile semiconductor memory.

8. The method of claim 1, wherein the program generates the list in a set of volatile registers.

9. The method of claim 1, wherein the magnetic medium on which the copy of the list is stored is a disk in the array.

10. The method of claim 1, wherein the magnetic medium on which the copy of the list is stored is an auxiliary medium external to the disk array.

11. The method of claim 1, wherein the array is a level 5 RAID disk array.

12. The method of claim 1, further comprising the steps of: a) starting to write a particular block in the array that is identified by a received write command; and b) interrupting the write of the particular block before it has been completely written.

13. The method of claim 1, further comprising the steps of: a) reading the copy from the magnetic medium after the write of a particular block in the array has been interrupted; and b) checking parity only on the blocks included in the extended write areas in the copy.

14. A disk storage system comprising: an array of disks coupled to a digital computer; a program in the computer that sequentially receives a plurality of write commands identifying respective blocks in the array to be written; and a list, in the computer, of extended write areas for only a plurality of the most recently received write commands, each extended write area including a block to be written and additional blocks associated with it; wherein the program is adapted to modify the list, each time a write command is subsequently received that writes a particular block not within any extended write area in the list, by replacing one extended write area with a new extended write area that includes the particular block; the system further comprising a magnetic medium that stores a copy of the list each time the list is modified by the program.

Claims

[Claims]

1. A method, performed by a computer program, of tracking incomplete writes in a disk array, comprising the steps of: sequentially receiving a plurality of write commands that identify respective blocks in the array to be written; generating a list of extended write areas for only the most recently received write commands, each extended write area including a block to be written and additional blocks associated with it; modifying the list, each time a write command is subsequently received that writes a particular block not within any extended write area in the list, by replacing one extended write area with a new extended write area that includes the particular block; and storing a copy of the list on a magnetic medium each time the modifying step is performed.

2. The method of claim 1, wherein the extended write areas in the list cover a total storage area in the array of 100 megabytes to 1 gigabyte.

3. The method of claim 1, wherein each extended write area includes between 1 and 500 additional blocks.

4. The method of claim 1, wherein each extended write area includes the same physical blocks on all of the disks.

5. The method of claim 1, wherein each extended write area includes selected physical blocks on a selected disk.

6. The method of claim 1, wherein the modifying step replaces the least recently written extended write area in the list.

7. The method of claim 1, wherein the program generates the list in a volatile semiconductor memory.

8. The method of claim 1, wherein the program generates the list in a set of volatile registers.

9. The method of claim 1, wherein the magnetic medium on which the copy of the list is stored is a disk in the array.

10. The method of claim 1, wherein the magnetic medium on which the copy of the list is stored is an auxiliary medium external to the disk array.

11. The method of claim 1, wherein the array is a level 5 RAID disk array.

12. The method of claim 1, further comprising the steps of: a) starting to write a particular block in the array that is identified by a received write command; and b) interrupting the write of the particular block before it has been completely written.

13. The method of claim 1, further comprising the steps of: a) reading the copy from the magnetic medium after the write of a particular block in the array has been interrupted; and b) checking parity only on the blocks included in the extended write areas in the copy.

14. A disk storage system comprising: an array of disks coupled to a digital computer; a program in the computer that sequentially receives a plurality of write commands identifying respective blocks in the array to be written; and a list, in the computer, of extended write areas for only the most recently received write commands, each extended write area including a block to be written and additional blocks associated with it; wherein the program is adapted to modify the list, each time a write command is subsequently received that writes a particular block not within any extended write area in the list, by replacing one extended write area with a new extended write area that includes the particular block; the system further comprising a magnetic medium that stores a copy of the list each time the list is modified by the program.