JP3832223B2 - Disk array disk failure recovery method

Info

Publication number: JP3832223B2
Application number: JP2000297066A
Authority: JP
Inventors: 育哉八木沢; 康行味松; 直人松並; 暁弘萬年; 賢一 ▲高▼本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-09-26
Filing date: 2000-09-26
Publication date: 2006-10-11
Anticipated expiration: 2020-09-26
Also published as: JP2002108571A

Description

【０００１】
【発明の属する技術分野】
本発明は主として、コンピュータの外部記憶装置システムにおけるディスク障害復旧方法に関するものである。
【０００２】
【従来の技術】
（１）ディスクアレイのディスク障害復旧方法
ディスクアレイシステムは、ＲＡＩＤ(Redundant Arrays of Inexpensive Disks)とも呼ばれ、複数のディスク装置をアレイ状に配置した構成をとり、ホスト装置（以下ホストと略する）からのリード要求（データの読み出し要求）およびライト要求（データの書き込み要求）をディスク装置の並列動作によって高速に処理するとともに、データに冗長データを付加することによって信頼性を向上させた記憶装置である。ディスクアレイシステムは、冗長データの種類と構成により５つのレベルに分類されている（論文："A Case for Redundant Arrays of Inexpensive Disks (RAID)", David A.Patterson, Garth Gibson, and Randy H.Katz, Computer Science Division Department of Electrical Engineering and Computer Sciences, University of California Berkeley）。
【０００３】
上記のようなディスクアレイを実現するためには、ホストからのリード／ライト要求を各ディスク装置へのリード／ライト要求に変換し、ライト時にはデータを各ディスク装置へ分散し、リード時には各ディスク装置からデータを集合するデータ分散・集合制御を行う必要がある。このような制御をディスクアレイ制御と呼ぶこととする。
【０００４】
ディスクアレイのうちパリティを付加している例えばＲＡＩＤ５レベルでは、１台のディスク障害が発生しても他のディスクとのパリティ保証により、ディスクの内容を復旧することができる。ディスクを復旧する場合、障害があったディスクを交換し、パリティグループを構成する他のディスクからデータを読み出し、ＸＯＲ演算（排他的論理和）を施した後、演算結果を交換したディスクに書き込む。
【０００５】
（２）スナップショットのための二重化
一般にハードディスクなどのコンピュータの外部記憶装置に記録されたデータは、装置の障害、ソフトウェアの欠陥、誤操作などによりデータを喪失した場合に、喪失したデータを回復できるように定期的にテープなどにコピーして保存しておくバックアップが必要である。その際、コピー作業中にデータが更新され、データに不整合が生じるとバックアップとして意味をなさないため、コピー作業中はデータの整合性を保証する必要がある。
【０００６】
バックアップされるデータの整合性を保証するためには、データにアクセスするバックアッププログラム以外のプログラムを停止すればよいが、高可用性が要求されるシステムではプログラムを長時間停止させることができない。そのため、バックアップ中にプログラムがデータを更新することを妨げず、なおかつバックアップ開始時点でのデータの記憶イメージを作成する仕組みを提供する必要がある。ここで、ある時点でのデータの記憶イメージをスナップショットと呼び、指定された時点のスナップショットを作成しつつデータの更新が可能な状態を提供する仕組みをスナップショット管理方法と呼ぶ。また、スナップショット管理方法によりスナップショットを作成することをスナップショットの取得と呼び、スナップショット取得の対象となったデータをオリジナルデータと呼ぶ。また、スナップショットを作成した状態をやめることをスナップショットの削除と呼ぶ。
【０００７】
従来のスナップショット管理方法の一つとして、データの二重化による方法が挙げられる。
【０００８】
この方法では、スナップショットを取得していない通常の状態において、コンピュータ上のプログラムがすべてのデータを２つの記憶領域に二重化（ミラー）する。スナップショットを取得する時は二重化を停止して２つの記憶領域を独立な領域に分離し、１つの領域をオリジナルデータ、もう１つの領域をスナップショットとして提供する。
【０００９】
スナップショットを取得し二重化を停止している間は、オリジナルデータの記憶領域に対するデータの更新を許可するとともに、データ更新が発生した場合は更新した位置を記録しておく。スナップショット削除時には、データの二重化を再開するとともに、２つの記憶領域の間で内容が一致していない更新データを更新位置の記録をもとにオリジナルデータの記憶領域からスナップショットとして提供していた記憶領域にコピーする（ミラー再同期化）。コンピュータ上のプログラムでデータを二重化する方法は、例えば米国特許５，０５１，８８７に示されている。
【００１０】
【発明が解決しようとする課題】
ディスク交換後のパリティ再構築時には、交換したディスク以外の同パリティグループの全ディスクからデータを読み出す必要があるため通常アクセス性能が低下するという課題がある。また、パリティグループを形成するＲＡＩＤ５などの構成を表すｎＤ＋１Ｐにおいて、データドライブ数ｎが増加すると性能がさらに悪化するという課題がある。
【００１１】
データの二重化によるスナップショット管理方法では、ミラー再同期化時において、更新／参照をする通常アクセスと更新データのコピーアクセスがオリジナルデータの記憶領域に集中し、通常アクセスの性能が低下する。ミラー再同期化にかかる時間は、スナップショットを取得して二重化を停止している間に更新されたデータ量に比例するので、更新アクセスが単位時間あたり同じ回数発生すると仮定した場合、ミラー再同期化にかかる時間は二重化の停止時間に比例して大きくなる。ディスク交換後のパリティ再構築と並行してバックアップをとる場合、スナップショットを取得して二重化を停止している時間が長く、通常アクセスの性能が低下するミラー再同期化時間が長くなるという課題がある。
【００１２】
本発明の第１の目的は、スナップショット取得のために二重化運用しているディスクアレイにおいて、ディスク交換後のパリティ再構築時の通常アクセス性能低下を抑止するディスク障害復旧方法を提供することである。
【００１３】
本発明の第２の目的は、スナップショット取得のために二重化運用しているディスクアレイにおいて、二重化を停止している期間におけるディスク交換後のパリティ再構築時間を短縮することで、ミラー再同期化中の更新データのコピー量を削減し、性能が低下するミラー再同期化時間を短縮するディスク障害復旧方法を提供することである。
【００１４】
本発明の第３の目的は、第１、第２の目的に加えて、パリティグループを形成するＲＡＩＤ５などのディスク構成を表すｎＤ＋１Ｐにおいて、データドライブ数ｎの増加による性能低下を抑止するディスク障害復旧方法を提供することである。
【００１５】
【課題を解決するための手段】
前記第１の目的を達成するために本発明は、スナップショット取得のために二重化運用しているディスクアレイにおいて、障害ディスクの復旧時に行うパリティ再構築を、パリティ冗長性を用いたパリティ生成に代えて、スナップショット用ミラー構成の同じ位置にあるディスクからのデータコピーによって実施するディスク障害復旧サブプログラムを設ける。ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、通常アクセス性能の低下を抑止することができる。
【００１６】
また、前記第２の目的を達成するために本発明は、スナップショット取得のために二重化運用しているディスクアレイにおいて、前記同様のディスク障害復旧サブプログラムを設ける。二重化を停止している期間におけるディスク交換後のパリティ再構築時間を短縮することで、ミラー再同期化中の更新データのコピー量を削減し、性能が低下するミラー再同期化時間を短縮することができる。
【００１７】
また、前記第３の目的を達成するために本発明は、スナップショット取得のために二重化運用しているディスクアレイにおいて、前記同様のディスク障害復旧サブプログラムを設ける。パリティグループを形成するＲＡＩＤ５などのディスク構成を表すｎＤ＋１Ｐにおいて、ｎの数によらず１台のディスクからのコピーとなるため、データドライブ数ｎの増加による性能低下を抑止することができる。
【００１８】
【発明の実施の形態】
本発明の第１の実施形態は、スナップショット取得のために二重化運用しているディスクアレイにおいて、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、通常アクセス性能の低下を抑止するためのものである。
【００１９】
また、二重化を停止している期間におけるディスク交換後のパリティ再構築時間を短縮することで、ミラー再同期化中の更新データのコピー量を削減し、性能が低下するミラー再同期化時間を短縮するためのものである。
【００２０】
また、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、パリティグループを形成するＲＡＩＤ５などのディスク構成を表すｎＤ＋１Ｐでのデータドライブ数ｎの増加による性能低下を抑止するためのものである。
【００２１】
なお、本発明ではスナップショットの利用例として、バックアップをとりあげるが、ＯＬＡＰ（ＯｎＬｉｎｅＡｎａｌｙｔｉｃａｌＰｒｏｃｅｓｓｉｎｇ）やシステムテスト等の他の目的においても利用が可能である。
【００２２】
（１）構成の説明
本発明の第１の実施形態のシステム構成を図１を用いて説明する。図１において、コンピュータ１００とディスクアレイ２００が、ＳＣＳＩインタフェース１４０、２４０を介してＳＣＳＩバス３００で接続されている。コンピュータ１００のメモリ１２０には、データベースプログラム１２６、バックアッププログラム１２７があり、コンピュータ１００を制御するＣＰＵ１１０によって実行される。ディスクアレイ２００には、ディスクコントローラ２５０によって制御されるディスク群２５１〜２５２があり、またメモリ２２０内にはスナップショット管理プログラム２２１があり、ＣＰＵ２１０によって実行される。ディスク群２５１はディスク２７１〜２７５を有し、ディスク群２５２はディスク２８１〜２８５を有し、各ディスク群は各々がパリティ付きのストライピングされたアレイ構成をとる。本実施形態では、ディスク群２５１〜２５２内のディスク数を５としているが、それぞれ３つ以上のディスクから構成されていればよい。
【００２３】
各ディスク群２５１〜２５２内の記憶領域は、ＳＣＳＩの論理ユニットであるＬＵ（ＬｏｇｉｃａｌＵｎｉｔ）としてコンピュータ１００からアクセスされる。各ディスク群２５１〜２５２に対応する各記憶領域をそれぞれＬＵ２６１〜２６２とする。各ディスク群２５１〜２５２内のＬＵはそれぞれ複数であってもよい。本実施形態では、ディスクアレイ２００内のスナップショット管理プログラム２２１が、ＬＵ２６１とＬＵ２６２を二重化して管理し、ＬＵ２６１をオリジナルデータを持ったミラー元ＬＵとし、ＬＵ２６２をオリジナルデータのミラーであるミラー先ＬＵとする。ＬＵ２６２が、スナップショットとして使用するＬＵである。
【００２４】
次に、コンピュータ１００内のプログラムについて説明する。
【００２５】
データベースプログラム１２６は、実行中にミラー元ＬＵであるＬＵ２６１にアクセスし、また、データ更新を制御してＬＵ２６１内のデータの整合性を保証するバックアップモードに切替える機能を持つ。バックアップモードへはユーザまたはバックアッププログラム１２７からの指示により遷移する。バックアッププログラム１２７は、ユーザからの指示によってスナップショットを保存したＬＵ２６２からテープ等にバックアップするためのデータを読み出す機能と、ディスクアレイ２００にＳＣＳＩのＭｏｄｅＳｅｌｅｃｔコマンドを発行する機能と、データベースプログラム１２６にバックアップモードの有効化、無効化を指示する機能を持つ。
【００２６】
次に、ディスクアレイ２００内のプログラム、および、管理表について説明する。
【００２７】
ディスクアレイ２００のスナップショット管理プログラム２２１は、コンピュータ１００からの要求に応じてディスクコントローラ２５０にディスクアクセスを指示するディスクアクセスサブプログラム２３０と、１つのＬＵに対する更新を二重化してあらかじめ指定された別のＬＵにも適用し、２つのＬＵに同じ内容を書き込むＬＵミラーサブプログラム２３１を持つ。ディスクアクセスサブプログラム２３０は、コンピュータ１００からのリード／ライト要求を各ディスク２７１〜２７５、２８１〜２８５へのリード／ライト要求に変換するディスクアレイ制御を行う。ＬＵミラーサブプログラム２３１は、ＬＵ２６１に対するアクセスをＬＵ２６２に二重化する。
【００２８】
また、スナップショット管理プログラム２２１は、二重化を停止しているとき（非ミラー時）にミラー元ＬＵに対する更新を検出する非ミラー時更新監視サブプログラム２３４と、その更新位置を後述する更新／不整合位置管理表２２２に記録する非ミラー時更新位置管理サブプログラム２３５と、ミラー再同期化を行う際にミラー元ＬＵの更新部分をミラー先ＬＵにコピーするミラー再同期サブプログラム２３２と、障害ディスクを交換した後にパリティ付きのストライピングされたアレイ構成に復旧するディスク障害復旧サブプログラム２３３とを持つ。更新／不整合位置管理表２２２は、ミラー元ＬＵとミラー先ＬＵのデータ内容の管理に用い、非ミラー時に更新されたミラー元ＬＵの更新位置と、ディスク障害後の交換したディスクにおいて同一ＬＵ内でのパリティが不整合となっている位置を記録するものである。
【００２９】
更新／不整合位置管理表２２２は、図２に示すようなビットマップとし、ミラー元ＬＵ内のすべてのＬＢＡセット番号、および、ＬＢＡセット番号に付随する更新ビット、ミラー元不整合ビット、ミラー先不整合ビットから成る。ＬＢＡセットは、ＬＵの全領域に対して、１個以上の同数のＬＢＡ（ＬｏｇｉｃａｌＢｌｏｃｋＡｄｄｒｅｓｓ）を単位として先頭から分割していったときの個々の集合であり、ＬＢＡセット番号は、ＬＢＡの先頭側から各ＬＢＡセットに通し番号をつけたものである。
【００３０】
ディスクアレイのデータ分散により、ＬＢＡセットは各ディスクに対し先頭側から１個ずつ分配されていくものとし、各ディスク内の同じ位置におけるＬＢＡセットの組に対してパリティグループ番号を付記する。図２の例では、パリティグループ番号０は、ＬＢＡセット番号０〜４により構成されている。また、本実施形態ではディスク５台に対してデータ分散を行うものとし、各ディスクのＬＢＡセットの組を表すディスクセット番号を付記する。図２の例では、ディスクセット０は、ＬＢＡセット番号０、５、１０、１５により構成されている。
【００３１】
更新ビットは、非ミラー時にミラー元ＬＵのＬＢＡセットが更新されたかどうかを示し、「更新」、「非更新」に応じてそれぞれ１、０を指定する。更新ビットの初期設定値は０である。図２の例では、ＬＢＡセット番号１の領域のみが非ミラー時に更新されている状態を示す。
【００３２】
ミラー元不整合ビットは、ミラー元ＬＵが交換されたときに同一ＬＵ内の他のディスクとパリティの整合性がとれているかどうかを示し、「不整合」「整合」に応じてそれぞれ１、０を指定する。ミラー元不整合ビットの初期設定値は０である。
【００３３】
ミラー先不整合ビットは、ミラー先ＬＵが交換されたときに同一ＬＵ内の他のディスクとパリティの整合性がとれているかどうかを示し、「不整合」「整合」に応じてそれぞれ１、０を指定する。ミラー先不整合ビットの初期設定値は０である。図２の例では、ミラー先ＬＵにおいて、パリティグループ番号１〜３のディスクセット０の領域が他のディスクとのパリティ不整合となっている状態を示す。パリティグループ番号０のディスクセット０の領域は、後述するディスク障害復旧方法により他のディスクとのパリティ整合性がとれていることを示す。
【００３４】
（２）スナップショットの取得／削除
ミラー元ＬＵであるＬＵ２６１のスナップショットをミラー先ＬＵであるＬＵ２６２として提供する場合を例にとり、スナップショット取得／削除時におけるバックアッププログラム１２７とスナップショット管理プログラム２２１の動作を図３のフローチャートを用いて説明する。
【００３５】
まず、コンピュータ１００のバックアッププログラム１２７がデータベースプログラム１２６に指示を与え、バックアップモードを有効化してスナップショットを取得するデータの整合性を保証する（ステップ２０００）。次に、バックアッププログラム１２７は、ディスクアレイ２００にスナップショットを取得するためのＭｏｄｅＳｅｌｅｃｔコマンドを発行する（ステップ２００１）。ディスクアレイ２００のスナップショット管理プログラム２２１は、ＭｏｄｅＳｅｌｅｃｔコマンドを受信すると（ステップ３０００）、非ミラー時更新監視サブプログラム２３４と非ミラー時更新位置管理サブプログラム２３５を有効化し、ＬＵ２６１の更新位置記録を開始する（ステップ３００１）。以降、ＬＵ２６１が更新されると、更新／不整合位置管理表２２２における更新されたＬＢＡを含むＬＢＡセットの更新ビットに１を設定し、更新があったことを記録する。次に、スナップショット管理プログラム２２１は、ＬＵミラーサブプログラム２３１を無効化し、ＬＵ２６１とＬＵ２６２の二重化を停止する（ステップ３００２）。これにより、ミラー元ＬＵであるＬＵ２６１に対する更新がミラー先ＬＵであるＬＵ２６２に反映されなくなる。次に、スナップショット管理プログラム２２１は、ＭｏｄｅＳｅｌｅｃｔコマンドの終了ステータスをコンピュータ１００のバックアッププログラム１２７に送信する（ステップ３００３）。バックアッププログラム１２７は、ＭｏｄｅＳｅｌｅｃｔコマンドの終了ステータスを受信すると（ステップ２００２）、データベースプログラム１２６に指示を与え、バックアップモードを無効化する（ステップ２００３）。
【００３６】
次に、バックアッププログラム１２７は、ディスクアレイ２００に対し、ＬＵ２６３にスナップショット削除を指示するＭｏｄｅＳｅｌｅｃｔコマンドを発行する（ステップ２００４）。ディスクアレイ２００のスナップショット管理プログラム２２１は、ＭｏｄｅＳｅｌｅｃｔコマンドを受信すると（ステップ３００５）、ＬＵミラーサブプログラム２３１を有効化し、ＬＵ２６１とＬＵ２６２の二重化を再開する（ステップ３００６）。これにより、ＬＵ２６１に対する更新がＬＵ２６２にも反映される。次に、スナップショット管理プログラム２２１は、非ミラー時更新監視サブプログラム２３４と非ミラー時更新位置管理サブプログラム２３５を無効化し、ＬＵ２６１の更新位置記録を停止する（ステップ３００７）。以降、更新／不整合位置管理表２２２の更新ビットを非ミラー時更新位置管理サブプログラム２３５が変更しなくなる。次に、スナップショット管理プログラム２２１は、ミラー再同期サブプログラム２３２を有効化し、更新／不整合位置管理表２２２を参照して、ＬＵ２６１とＬＵ２６２で内容が一致しない部分をＬＵ２６１からＬＵ２６２にコピーする（ステップ３００８）。次に、スナップショット管理プログラム２２１は、ミラー再同期サブプログラム２３２を無効化し（３００９）、ＭｏｄｅＳｅｌｅｃｔコマンドの終了ステータスをコンピュータ１００のバックアッププログラム１２７に送信する（ステップ３０１０）。バックアッププログラム１２７は、ＭｏｄｅＳｅｌｅｃｔコマンドの終了ステータスを受信し動作を終了する（ステップ２００５）。
【００３７】
ここで、ステップ３００８でＬＵ２６１からＬＵ２６２へのデータコピーを行うミラー再同期サブプログラム２３２の動作を図４のフローチャートを用いて説明する。まず、ミラー再同期サブプログラム２３２が、更新／不整合位置管理表２２２の更新ビットに更新記録として１があるかどうかを調べる（ステップ１００１）。もし、更新記録である１がなければミラー再同期化が完了したので処理を終了する（ステップ１００２）。更新記録があれば、該当記録位置の更新を抑止し（ステップ１００３）、該当記録位置のデータをミラー元ＬＵであるＬＵ２６１からミラー先ＬＵであるＬＵ２６２にコピーする（ステップ１００４）。データのコピーは、ＬＵ２６１を含むディスク群２５１のディスク２７１〜２７５のいずれかにＲＥＡＤコマンドを発行して指定したＬＢＡのデータを読み出し、ＬＵ２６２を含むディスク群２５２のうちディスク群２５１のＲＥＡＤコマンドを発行したディスクに対応するディスク２８１〜２８５のいずれかにＷＲＩＴＥコマンドを発行してＬＵ２６１に指定したＬＢＡと同じＬＢＡに読み出したデータを書きこむことで実施する。データのコピーは、ＣＯＰＹコマンドを用いてもよい。次に、ミラー再同期サブプログラム２３２は、該当記録位置の更新抑止を解除し（ステップ１００５）、更新／不整合位置管理表２２２の該当する更新ビットに０を設定して更新記録を削除し（ステップ１００６）、ステップ１００１に戻る。
【００３８】
以上が、スナップショット取得／削除時におけるバックアッププログラム１２７とスナップショット管理プログラム２２１の動作である。コンピュータ１００のバックアッププログラム１２７は、ステップ２００３〜ステップ２００４の間でスナップショットを取得したＬＵ２６２の読み出しを行うことができる。
【００３９】
（３）ディスク障害復旧方法
本発明では、障害ディスクを交換して新しくセットしたディスクに対するパリティ再構築を、パリティの冗長性を利用して行うことに代えて、ミラー化された対となるディスクからコピーすることで行う。対となるディスクとは、例えば、図１のディスク２７１〜２７５とディスク２８１〜２８５がこの順番でディスク群を形成していると想定するとディスク２７１とディスク２８１の関係が該当する。
【００４０】
本実施形態では、障害ディスクを交換した後のパリティ再構築について、スナップショット取得／削除動作のどの段階で行ったかにより、ミラーフェーズ、非ミラーフェーズ、再同期化フェーズの３フェーズに分けて説明する。非ミラーフェーズは、ミラー化がされていない図３のステップ３００２開始からステップ３００７終了までとする。再同期化フェーズは、更新されたデータをミラー元ＬＵからミラー先ＬＵにコピーしている図３のステップ３００８開始からステップ３００９終了までとする。ミラーフェーズは、ミラー化されている段階とし、非ミラーフェーズと再同期化フェーズ以外の全範囲とする。
【００４１】
また、各フェーズの間に、ミラー元ＬＵとミラー先ＬＵのどちらのディスクを交換したかについても場合分けする。
【００４２】
なお、障害ディスクを交換した後のパリティ再構築は、ディスクアレイ２００のスナップショット管理プログラム２２１がディスク障害復旧サブプログラム２３３を有効化することによって行い、コンピュータ１００からの指示とは独立して実施することができる。
【００４３】
（３−１）ミラーフェーズ
ミラーフェーズにおけるディスク障害復旧サブプログラム２３３の動作を図５のフローチャートを用いて説明する。ミラーフェーズにおいては、ミラー元ＬＵとミラー先ＬＵの内容が同じであり、ミラー元ＬＵとミラー先ＬＵのどちらのディスクを交換した場合でもディスク障害復旧サブプログラム２３３の動作が同じであるので、ミラー先ＬＵのディスクを交換した場合を例にとる。ここで、障害で交換したディスクを図１のミラー先ＬＵであるディスク群２５２のディスク２８１と想定する。ミラー化されている場合は、ミラー元ＬＵであるディスク群２５１のディスク２７１と同じデータがディスク２８１に格納されることになる。まず、ディスクが交換されると、ディスク障害復旧サブプログラム２３３が、交換したディスク２８１に対応する更新／不整合位置管理表２２２のミラー先不整合ビットをすべて１に設定する（ステップ４００１）。次に、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のミラー先不整合ビットに不整合記録として１があるかどうかを調べる（ステップ４００２）。もし、不整合記録である１がなければディスク２８１の障害復旧が完了しディスク群２５２のパリティは整合性がとれたので処理を終了する（ステップ４００３）。不整合記録があれば、該当記録位置の更新を抑止し（ステップ４００４）、更新／不整合位置管理表２２２のミラー元不整合ビットに不整合記録として１があるかどうかを調べる（ステップ４００５）。もし、不整合記録である１がなければミラー元ＬＵのパリティ整合性はとれているので、該当位置のデータをディスク２７１からディスク２８１にコピーする（ステップ４００７）。不整合記録である１があればミラー元ＬＵも障害復旧中でありパリティの整合性はとれていないので、ディスク２８１と同一ディスク群２５２の他のディスク２８２〜２８５からデータを読み出しパリティ演算によってデータを復元し、ディスク２８１の該当位置に書き込む（ステップ４００６）。ステップ４００７、または、ステップ４００６終了後、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のディスク２８１に該当する不整合ビットを０に設定して不整合記録を削除し（ステップ４００８）、該当記録位置の更新抑止を解除し（ステップ４００９）、ステップ４００２に戻る。
【００４４】
以上が、ミラーフェーズにおけるディスク障害復旧サブプログラム２３３の動作である。
【００４５】
なお、ステップ４００７におけるデータのコピーは、ディスク２７１にＲＥＡＤコマンドを発行して指定したＬＢＡのデータを読み出し、ディスク２８１にＷＲＩＴＥコマンドを発行してディスク２７１に指定したＬＢＡと同じＬＢＡに読み出したデータを書きこむことで実施する。データのコピーは、ＣＯＰＹコマンドを用いてもよい。
【００４６】
スナップショット管理プログラム２２１は、ディスク障害復旧サブプログラム２３３動作中にコンピュータ１００からデータ更新要求が来た場合、ミラー元ＬＵとミラー先ＬＵの両方に反映させるものとし、交換したディスクの不整合ビットを０にする。また、コンピュータ１００からデータ読み出し要求が来た場合は、不整合ビットが０であるＬＵからデータを読み出す。両方のＬＵの不整合ビットが１である場合は、交換したディスク以外の全ディスクからデータを読み出しパリティ演算によって要求されたデータを復元し、コンピュータ１００に送信する。
【００４７】
（３−２）非ミラーフェーズ
非ミラーフェーズにおけるディスク障害復旧サブプログラム２３３の動作を図６のフローチャートを用いて説明する。非ミラーフェーズにおいては、ミラー元ＬＵとミラー先ＬＵの内容は一致していないが、ミラー元ＬＵとミラー先ＬＵのどちらのディスクを交換した場合でもディスク障害復旧サブプログラム２３３の動作が同じであるので、ミラー元ＬＵのディスクを交換した場合を例にとる。ここで、障害で交換したディスクを図１のミラー元ＬＵであるディスク群２５１のディスク２７１と想定する。
【００４８】
まず、ディスクが交換されると、ディスク障害復旧サブプログラム２３３が、交換したディスク２７１に対応する更新／不整合位置管理表２２２のミラー元不整合ビットをすべて１に設定する（ステップ５００１）。次に、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のミラー元不整合ビットに不整合記録として１があるかどうかを調べる（ステップ５００２）。もし、不整合記録である１がなければディスク２７１の障害復旧が完了しディスク群２５１のパリティは整合性がとれたので処理を終了する（ステップ５００３）。不整合記録があれば、該当記録位置の更新を抑止し（ステップ５００４）、更新／不整合位置管理表２２２のミラー先不整合ビットに不整合記録として１があるかどうかを調べる（ステップ５００５）。もし、不整合記録である１があればミラー先ＬＵも障害復旧中でありパリティの整合性はとれていないので、ディスク２７１と同一ディスク群２５１の他のディスク２７２〜２７５からデータを読み出しパリティ演算によってデータを復元し、ディスク２７１の該当位置に書き込む（ステップ５００６）。もし、不整合記録である１がなければ、更新／不整合位置管理表２２２の該当する更新ビットに更新記録として１があるかどうかを調べる（ステップ５００７）。もし、更新記録である１があれば、該当位置のディスク２７１とディスク２８１のデータは異なるので、ディスク障害復旧サブプログラム２３３はディスク２７１と同一ディスク群２５１の他のディスク２７２〜２７５からデータを読み出しパリティ演算によってデータを復元し、ディスク２７１の該当位置に書き込む（ステップ５００６）。もし、更新記録である１がなければ、該当位置のディスク２７１とディスク２８１のデータは同一となるべきなので、該当位置のデータをディスク２８１からディスク２７１にコピーする（ステップ５００８）。
【００４９】
ステップ５００６、または、ステップ５００８終了後、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のディスク２７１に該当する不整合ビットを０に設定して不整合記録を削除し（ステップ５００９）、該当記録位置の更新抑止を解除し（ステップ５０１０）、ステップ５００２に戻る。以上が、非ミラーフェーズにおけるディスク障害復旧サブプログラム２３３の動作である。
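The branch structure of steps 5005 to 5008 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the function names and the tuple-based representation of a table row are hypothetical, and only the decision between parity reconstruction and copying from the paired disk is modeled.

```python
def choose_recovery_method(peer_inconsistent_bit, update_bit):
    """Pick how one LBA set on the replaced disk is restored (non-mirror phase).

    peer_inconsistent_bit: inconsistency bit of the paired LU for this LBA set
    update_bit: 1 if the mirror source was updated while duplication was stopped
    """
    if peer_inconsistent_bit == 1:
        # The paired LU is itself being recovered, so fall back to parity (step 5006).
        return "parity"
    if update_bit == 1:
        # The two disks legitimately differ here, so the pair cannot be copied (step 5006).
        return "parity"
    # Contents are expected to match, so copy from the paired disk (step 5008).
    return "copy"


def plan_recovery(rows):
    """rows: iterable of (update_bit, peer_inconsistent_bit), one per LBA set."""
    return [choose_recovery_method(peer, upd) for upd, peer in rows]


if __name__ == "__main__":
    print(plan_recovery([(0, 0), (1, 0), (0, 1)]))
```

Only LBA sets that were neither updated during the split nor affected by a concurrent recovery of the paired LU take the cheaper copy path.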
【００５０】
なお、ステップ５００８におけるデータのコピーは、ディスク２８１にＲＥＡＤコマンドを発行して指定したＬＢＡのデータを読み出し、ディスク２７１にＷＲＩＴＥコマンドを発行してディスク２８１に指定したＬＢＡと同じＬＢＡに読み出したデータを書きこむことで実施する。データのコピーは、ＣＯＰＹコマンドを用いてもよい。
【００５１】
スナップショット管理プログラム２２１は、ディスク障害復旧サブプログラム２３３動作中にコンピュータ１００からデータ読み出し要求が来た場合、通常はミラー元ＬＵからデータを読み出すが、更新ビットが０、ミラー先不整合ビットが０であるＬＢＡセットに関してはミラー先ＬＵから読み出してもよい。
【００５２】
（３−３）再同期化フェーズ（ミラー先ＬＵ復旧）
再同期化フェーズにおけるミラー先ＬＵに関わるディスク障害復旧サブプログラム２３３の動作を図７のフローチャートを用いて説明する。ここで、障害で交換したディスクを図１のミラー先ＬＵであるディスク群２５２のディスク２８１と想定する。
【００５３】
まず、ディスクが交換されると、ディスク障害復旧サブプログラム２３３が、交換したディスク２８１に対応する更新／不整合位置管理表２２２のミラー先不整合ビットをすべて１に設定する（ステップ６００１）。次に、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のミラー先不整合ビットに不整合記録として１があるかどうかを調べる（ステップ６００２）。もし、不整合記録である１がなければディスク２８１の障害復旧が完了しディスク群２５２のパリティは整合性がとれたので処理を終了する（ステップ６００３）。不整合記録があれば、該当記録位置の更新を抑止し（ステップ６００４）、更新／不整合位置管理表２２２のミラー元不整合ビットに不整合記録として１があるかどうかを調べる（ステップ６００５）。もし、不整合記録である１がなければミラー元ＬＵのパリティ整合性はとれているので、該当位置のデータをディスク２７１からディスク２８１にコピーし（ステップ６００７）、更新／不整合位置管理表２２２の該当する更新ビットに０を設定して更新の有無にかかわらず更新記録を削除する（ステップ６００８）。もし、不整合記録である１があればミラー元ＬＵも障害復旧中でありパリティの整合性はとれていないので、ディスク２８１と同一ディスク群２５２の他のディスク２８２〜２８５からデータを読み出しパリティ演算によってデータを復元し、ディスク２８１の該当位置に書き込む（ステップ６００６）。ステップ６００８、または、ステップ６００６終了後、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のディスク２８１に該当する不整合ビットを０に設定して不整合記録を削除し（ステップ６００９）、該当記録位置の更新抑止を解除し（ステップ６０１０）、ステップ６００２に戻る。
【００５４】
以上が、再同期化フェーズにおけるミラー先ＬＵに関わるディスク障害復旧サブプログラム２３３の動作である。
【００５５】
なお、ステップ６００７におけるデータのコピーは、ディスク２７１にＲＥＡＤコマンドを発行して指定したＬＢＡのデータを読み出し、ディスク２８１にＷＲＩＴＥコマンドを発行してディスク２７１に指定したＬＢＡと同じＬＢＡに読み出したデータを書きこむことで実施する。データのコピーは、ＣＯＰＹコマンドを用いてもよい。
【００５６】
（３−４）再同期化フェーズ（ミラー元ＬＵ復旧）
再同期化フェーズにおけるミラー元ＬＵに関わるディスク障害復旧サブプログラム２３３の動作を図８のフローチャートを用いて説明する。ここで、障害で交換したディスクを図１のミラー元ＬＵであるディスク群２５１のディスク２７１と想定する。
【００５７】
まず、ディスクが交換されると、ディスク障害復旧サブプログラム２３３が、交換したディスク２７１に対応する更新／不整合位置管理表２２２のミラー元不整合ビットをすべて１に設定する（ステップ７００１）。次に、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のミラー元不整合ビットに不整合記録として１があるかどうかを調べる（ステップ７００２）。もし、不整合記録である１がなければディスク２７１の障害復旧が完了しディスク群２５１のパリティは整合性がとれたので処理を終了する（ステップ７００３）。不整合記録があれば、該当記録位置の更新を抑止し（ステップ７００４）、更新／不整合位置管理表２２２のミラー先不整合ビットに不整合記録として１があるかどうかを調べる（ステップ７００５）。もし、不整合記録である１があればミラー先ＬＵも障害復旧中でありパリティの整合性はとれていないので、ディスク２７１と同一ディスク群２５１の他のディスク２７２〜２７５からデータを読み出しパリティ演算によってデータを復元し、ディスク２７１の該当位置に書き込む（ステップ７００６）。もし、不整合記録である１がなければ、更新／不整合位置管理表２２２の該当する更新ビットに更新記録として１があるかどうかを調べる（ステップ７００７）。もし、更新記録である１があれば、該当位置のディスク２７１とディスク２８１のデータは異なるので、ディスク障害復旧サブプログラム２３３はディスク２７１と同一ディスク群２５１の他のディスク２７２〜２７５からデータを読み出しパリティ演算によってデータを復元し、ディスク２７１の該当位置に書き込む（ステップ７００６）。もし、更新記録である１がなければ、該当位置のディスク２７１とディスク２８１のデータは同一でよいので、該当位置のデータをディスク２８１からディスク２７１にコピーする（ステップ７００８）。
【００５８】
ステップ７００６、または、ステップ７００８終了後、ディスク障害復旧サブプログラム２３３は、更新／不整合位置管理表２２２のディスク２７１に該当する不整合ビットを０に設定して不整合記録を削除し（ステップ７００９）、該当記録位置の更新抑止を解除し（ステップ７０１０）、ステップ７００２に戻る。以上が、再同期化フェーズにおけるミラー元ＬＵに関わるディスク障害復旧サブプログラム２３３の動作である。
【００５９】
なお、ステップ７００８におけるデータのコピーは、ディスク２８１にＲＥＡＤコマンドを発行して指定したＬＢＡのデータを読み出し、ディスク２７１にＷＲＩＴＥコマンドを発行してディスク２８１に指定したＬＢＡと同じＬＢＡに読み出したデータを書きこむことで実施する。データのコピーは、ＣＯＰＹコマンドを用いてもよい。
【００６０】
スナップショット管理プログラム２２１は、ディスク障害復旧サブプログラム２３３動作中にコンピュータ１００からデータ読み出し要求が来た場合、通常はミラー元ＬＵからデータを読み出すが、更新ビットが０、ミラー先不整合ビットが０であるＬＢＡセットに関してはミラー先ＬＵから読み出してもよい。
【００６１】
（４）データ読み出しと書きこみ、および、スナップショット読み出しの際のコンピュータ動作
まず、コンピュータ１００が、ディスクアレイ２００にあるＬＵ２６１のデータにアクセスする場合のデータベースプログラム１２６の動作を説明する。データベースプログラム１２６は、スナップショット取得の有無に関係なく、同じ動作を行う。
【００６２】
データベースプログラム１２６がＬＵ２６１のデータを読み出す場合、データベースプログラム１２６はディスクアレイ２００に対し、ＬＵ２６１のデータを読み出すＲＥＡＤコマンドを発行する。最後に、データベースプログラム１２６は、ディスクアレイ２００からデータとステータスを受信し動作を終了する。また、データベースプログラム１２６がＬＵ２６１にデータを書きこむ場合、データベースプログラム１２６はディスクアレイ２００に対し、ＬＵ２６１にデータを書きこむＷＲＩＴＥコマンドを発行し、データを送信する。最後に、データベースプログラム１２６は、ディスクアレイ２００からステータスを受信し動作を終了する。
【００６３】
次に、コンピュータ１００が、ディスクアレイ２００にあるＬＵ２６１のスナップショットを読み出す場合のバックアッププログラム１２７の動作を説明する。
【００６４】
バックアッププログラム１２７がＬＵ２６１のスナップショットを読み出す場合、バックアッププログラム１２７はディスクアレイ２００に対し、ＬＵ２６１のミラー先ＬＵであるＬＵ２６２のデータを読み出すＲＥＡＤコマンドを発行する。最後に、バックアッププログラム１２７は、ディスクアレイ２００からデータとステータスを受信し動作を終了する。
【００６５】
（５）データ読み出しと書きこみ、および、スナップショット読み出しの際のディスクアレイ動作
まず、コンピュータ１００が、ディスクアレイ２００にあるＬＵ２６１のデータにアクセスする場合のスナップショット管理プログラム２２１の動作を説明する。
【００６６】
コンピュータ１００がＬＵ２６１のデータを読み出す場合、スナップショット管理プログラム２２１がＬＵ２６１に対するＲＥＡＤコマンドを受信する。次に、ＬＵミラーサブプログラム２３１が有効で、かつ、ミラー再同期サブプログラム２３２による更新部分のコピーが終了していれば、ＬＵ２６１、もしくは、ミラー先ＬＵであるＬＵ２６２からデータを読み出す。そうでなければ、ＬＵ２６１からデータを読み出す。最後に、読み出したデータとステータスをコンピュータ１００に送信する。ＬＵ２６１とミラー先ＬＵであるＬＵ２６２の内容が一致している場合は、両者のいずれかからデータを読み出すことにより負荷を分散させることができる。
【００６７】
コンピュータ１００がＬＵ２６１にデータを書きこみ記憶内容を更新する場合、スナップショット管理プログラム２２１がＬＵ２６１に対するＷＲＩＴＥコマンドとデータを受信する。次に、ＬＵミラーサブプログラム２３１が有効であればＬＵ２６１とミラー先ＬＵであるＬＵ２６２にデータを書きこみ、無効であればＬＵ２６１にデータを書きこむ。次に、非ミラー時更新監視サブプログラム２３４と非ミラー時更新位置管理サブプログラム２３５が有効であれば、ＬＵ２６１の更新／不整合位置管理表２２２に対して更新したＬＢＡを含むＬＢＡセットの更新ビットを１に設定し、無効であれば何もしない。最後に、ステータスをコンピュータ１００に送信する。
【００６８】
次に、コンピュータ１００が、ディスクアレイ２００にあるＬＵ２６１のスナップショットを読み出す場合のスナップショット管理プログラム２２１の動作を説明する。
【００６９】
コンピュータ１００がＬＵ２６１のスナップショットを読み出す場合、スナップショット管理プログラム２２１がＬＵ２６１のミラー先ＬＵであるＬＵ２６２に対するＲＥＡＤコマンドを受信する。次に、スナップショット管理プログラム２２１は、ミラー先ＬＵであるＬＵ２６２からデータを読み出す。最後に、読み出したデータとステータスをコンピュータ１００に送信する。
【００７０】
なお、ミラー再同期サブプログラム２３２による更新部分のコピー中は、コピー処理とコンピュータ１００によるＬＵ２６１へのデータアクセス処理が同じＬＵ２６１に集中するため、データアクセス性能が低下する。
【００７１】
（６）効果
本実施形態によれば、スナップショット取得のために二重化運用しているディスクアレイにおいて、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、通常アクセス性能の低下を抑止することができるという効果がある。
【００７２】
また、本実施形態によれば、二重化を停止している期間におけるディスク交換後のパリティ再構築時間を短縮することで、ミラー再同期化中の更新データのコピー量を削減し、性能が低下するミラー再同期化時間を短縮できるという効果がある。
【００７３】
たとえば、ＲＡＩＤ５の４Ｄ＋１Ｐの構成を想定し、ミラーフェーズにおいてディスク障害を復旧する場合、パリティの冗長性によるディスク復旧では、４回のディスク読み出しと１回のディスク書き込みが発生する。本発明を適用することで、１回のディスク読み出しと１回のディスク書き込みにすることができ、通常のアクセス性能の低下を抑止できる。非ミラーフェーズにおいても同様のディスクアクセス回数の削減効果が期待でき、スナップショットを取得してバックアップ等をとる際の読み出し性能低下を抑止でき、読み出し時間の増加も抑止できることから二重化停止時間、および、性能が低下するミラー再同期化時間を短縮できる。
【００７４】
また、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、パリティグループを形成するＲＡＩＤ５などのディスク構成を表すｎＤ＋１Ｐでのデータドライブ数ｎの増加による性能低下を抑止できるという効果がある。ｎＤ＋１Ｐの場合、パリティの冗長性によるディスク復旧をした場合、ｎ回のディスク読み出しと１回のディスク書き込みが発生する。本発明を適用することで、１回のディスク読み出しと１回のディスク書き込みにすることができる。
【００７５】
なお、本発明ではコンピュータ１００とディスクアレイ２００を接続するインターフェースをＳＣＳＩバス３００としたが、ＦｉｂｒｅＣｈａｎｎｅｌ等の他のインターフェースであってもよい。
【００７６】
また、本実施形態では、スナップショット取得のためミラー元ＬＵとミラー先ＬＵで二重化しているが、ミラー先ＬＵを複数設けた多重ミラーにおいても本発明は適用可能である。この場合、更新／不整合位置管理表２２２のミラー先不整合ビットをミラー先ＬＵ分だけ設け、ディスク障害復旧サブプログラム２３３がそれぞれのミラー先ＬＵとミラー元ＬＵに対し二重化の場合と同様の動作をし、ＬＵミラーサブプログラム２３１がミラー元ＬＵに対するアクセスを複数のミラー先ＬＵに多重化する動作をすればよい。
【００７７】
【発明の効果】
以上述べたように、本発明によれば、スナップショット取得のために二重化運用しているディスクアレイにおいて、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、通常アクセス性能の低下を抑止することができるという効果がある。
【００７８】
また、本実施形態によれば、二重化を停止している期間におけるディスク交換後のパリティ再構築時間を短縮することで、ミラー再同期化中の更新データのコピー量を削減し、性能が低下するミラー再同期化時間を短縮できるという効果がある。
【００７９】
また、ディスク交換後のパリティ再構築に関わるディスクアクセス回数を削減することで、パリティグループを形成するＲＡＩＤ５などのディスク構成を表すｎＤ＋１Ｐでのデータドライブ数ｎの増加による性能低下を抑止できるという効果がある。
【図面の簡単な説明】
【図１】第１の実施形態におけるシステム構成図である。
【図２】第１の実施形態における更新／不整合位置管理表の説明図である。
【図３】第１の実施形態におけるスナップショット取得／削除フローである。
【図４】第１の実施形態におけるミラー再同期サブプログラムの動作フローである。
【図５】第１の実施形態におけるミラーフェーズのディスク障害復旧サブプログラム動作フローである。
【図６】第１の実施形態における非ミラーフェーズのディスク障害復旧サブプログラム動作フローである。
【図７】第１の実施形態における再同期化フェーズのミラー先ＬＵ復旧に関わるディスク障害復旧サブプログラム動作フローである。
【図８】第１の実施形態における再同期化フェーズのミラー元ＬＵ復旧に関わるディスク障害復旧サブプログラム動作フローである。
【符号の説明】
１００…コンピュータ、２００…ディスクアレイ、２２１…スナップショット管理プログラム、２２２…更新／不整合位置管理表、２３３…ディスク障害復旧サブプログラム、２７１〜２７５、２８１〜２８５…ディスク。

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention mainly relates to a disk failure recovery method in an external storage system of a computer.
[0002]
[Prior art]
(1) Disk failure recovery method for disk array
A disk array system, also called RAID (Redundant Arrays of Inexpensive Disks), is a storage device in which a plurality of disk devices are arranged in an array. It processes read requests (data read requests) and write requests (data write requests) from a host device (hereinafter abbreviated as host) at high speed through parallel operation of the disk devices, and it improves reliability by adding redundant data to the data. Disk array systems are classified into five levels according to the type and configuration of the redundant data (paper: "A Case for Redundant Arrays of Inexpensive Disks (RAID)", David A. Patterson, Garth Gibson, and Randy H. Katz, Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley).
[0003]
To realize such a disk array, it is necessary to perform data distribution and collection control: read/write requests from the host are converted into read/write requests to the individual disk devices, data is distributed across the disk devices on a write, and data is collected from the disk devices on a read. Such control is referred to as disk array control.
[0004]
In disk arrays that add parity, for example at RAID level 5, the contents of a disk can be recovered even if one disk fails, because parity is maintained together with the other disks. To recover a disk, the failed disk is replaced, data is read from the other disks that make up the parity group, an XOR (exclusive OR) operation is applied to that data, and the result is written to the replacement disk.
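The XOR-based recovery described above can be illustrated with a short Python sketch. It is a simplified model rather than any particular product's implementation: blocks are plain `bytes` values, and the helper names `xor_blocks` and `rebuild_block` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving_blocks):
    """Recover the missing block of a parity group from all surviving blocks."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
    parity = xor_blocks(data)                     # parity block of the group
    survivors = data[:2] + data[3:] + [parity]    # the disk holding data[2] failed
    print(rebuild_block(survivors) == data[2])
```

Because XOR is its own inverse, the same routine recovers either a data block or the parity block, but it always has to read every surviving member of the parity group.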
[0005]
(2) Duplication for snapshots
In general, data recorded on an external storage device of a computer, such as a hard disk, needs to be backed up by periodically copying it to tape or the like, so that the data can be recovered if it is lost due to a device failure, a software defect, an operating error, or the like. If the data is updated during the copy operation and becomes inconsistent, the copy is useless as a backup, so the consistency of the data must be guaranteed during the copy operation.
[0006]
To guarantee the consistency of the data being backed up, all programs that access the data other than the backup program could be stopped. However, in a system that requires high availability, programs cannot be stopped for a long time. It is therefore necessary to provide a mechanism that creates a storage image of the data as of the start of the backup without preventing programs from updating the data during the backup. Here, a storage image of data at a certain point in time is called a snapshot, and a mechanism that provides a state in which data can be updated while a snapshot of a specified point in time is created is called a snapshot management method. Creating a snapshot by the snapshot management method is called snapshot acquisition, and the data for which the snapshot is acquired is called the original data. Ending the state in which a snapshot has been created is called snapshot deletion.
[0007]
One conventional snapshot management method is a data duplication method.
[0008]
In this method, in a normal state where a snapshot is not acquired, a program on a computer duplexes (mirrors) all data in two storage areas. When a snapshot is acquired, duplication is stopped, the two storage areas are separated into independent areas, one area is provided as original data, and the other area is provided as a snapshot.
[0009]
While a snapshot is held and duplication is stopped, updates to the storage area of the original data are permitted, and the position of each update is recorded. When the snapshot is deleted, data duplication is resumed, and the updated data whose contents differ between the two storage areas is copied, based on the recorded update positions, from the storage area of the original data to the storage area that was provided as the snapshot (mirror resynchronization). A method of duplicating data by a program on a computer is shown, for example, in US Pat. No. 5,051,887.
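As a rough illustration of this split-and-resynchronize cycle, the following Python sketch keeps two in-memory block lists and a set of updated positions. The class `SplitMirror` and its method names are hypothetical and are not taken from the cited patent; real implementations track positions in a bitmap and copy data between disks.

```python
class SplitMirror:
    """Toy model of a mirrored pair that can be split to provide a snapshot."""

    def __init__(self, num_blocks):
        self.original = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirrored = True      # duplication is active in the normal state
        self.updated = set()      # positions written while duplication is stopped

    def write(self, pos, data):
        self.original[pos] = data
        if self.mirrored:
            self.mirror[pos] = data
        else:
            self.updated.add(pos)  # remember what must be copied later

    def acquire_snapshot(self):
        """Stop duplication; self.mirror now holds the snapshot image."""
        self.mirrored = False

    def delete_snapshot(self):
        """Resume duplication and copy only the recorded positions."""
        self.mirrored = True
        copied = 0
        for pos in sorted(self.updated):
            self.mirror[pos] = self.original[pos]
            copied += 1
        self.updated.clear()
        return copied


if __name__ == "__main__":
    pair = SplitMirror(4)
    pair.write(0, b"a")
    pair.acquire_snapshot()
    pair.write(1, b"b")            # the snapshot image is not affected
    print(pair.delete_snapshot())  # number of positions copied during resync
```

The amount of work done in `delete_snapshot` depends only on how many distinct positions were written during the split, which is why the length of the split matters for resynchronization cost.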
[0010]
[Problems to be solved by the invention]
During parity reconstruction after disk replacement, data must be read from all disks in the same parity group other than the replaced disk, so there is a problem that normal access performance deteriorates. Moreover, in nD+1P, the notation for a configuration such as RAID 5 that forms a parity group, there is a problem that performance deteriorates further as the number of data drives n increases.
[0011]
In the snapshot management method based on data duplication, during mirror resynchronization, normal accesses (updates and references) and copy accesses for the updated data are both concentrated on the storage area of the original data, and normal access performance deteriorates. The time required for mirror resynchronization is proportional to the amount of data updated while the snapshot is held and duplication is stopped. Assuming that update accesses occur at the same rate per unit time, the time required for mirror resynchronization therefore grows in proportion to the time for which duplication is stopped. When a backup is taken in parallel with parity reconstruction after disk replacement, duplication remains stopped for a long time after the snapshot is acquired, and there is a problem that the mirror resynchronization time, during which normal access performance is degraded, becomes long.
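The proportionality argument can be written out as a small worked relation. The symbols are introduced here only for illustration and do not appear in the original: $r$ is the assumed constant rate of update accesses per unit time, $T_{\mathrm{split}}$ is the time for which duplication is stopped, $D$ is the amount of updated data that must be copied, and $c$ is the copy time per unit of data.

```latex
\[
  D = r \, T_{\mathrm{split}}, \qquad
  T_{\mathrm{resync}} = c \, D = c \, r \, T_{\mathrm{split}}
\]
```

Under these assumptions, anything that lengthens $T_{\mathrm{split}}$, such as a slow backup read competing with parity reconstruction, lengthens $T_{\mathrm{resync}}$ by the same factor. If repeated updates hit the same position, $D$ is smaller than $r \, T_{\mathrm{split}}$, so the relation is best read as an upper bound in that case.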
[0012]
A first object of the present invention is to provide a disk failure recovery method that, in a disk array operated with duplication for snapshot acquisition, suppresses the decrease in normal access performance during parity reconstruction after disk replacement.
[0013]
A second object of the present invention is to provide a disk failure recovery method that, in a disk array operated with duplication for snapshot acquisition, shortens the parity reconstruction time after disk replacement during the period in which duplication is stopped, thereby reducing the amount of updated data copied during mirror resynchronization and shortening the mirror resynchronization time, during which performance is degraded.
[0014]
A third object of the present invention is, in addition to the first and second objects, to provide a disk failure recovery method that suppresses the performance degradation caused by an increase in the number of data drives n in nD+1P, the notation for a disk configuration such as RAID 5 that forms a parity group.
[0015]
[Means for Solving the Problems]
To achieve the first object, the present invention provides, in a disk array operated with duplication for snapshot acquisition, a disk failure recovery subprogram that performs the parity reconstruction carried out when a failed disk is recovered by copying data from the disk at the same position in the snapshot mirror configuration, instead of by parity generation using parity redundancy. By reducing the number of disk accesses involved in parity reconstruction after disk replacement, the decrease in normal access performance can be suppressed.
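The idea can be sketched in Python as follows. This is an illustrative model only, loosely following the mirror-phase flow that the description gives for FIG. 5: each disk is a list of `bytes` blocks, the paired disk is the one at the same index in the other group, and the names `recover_disk` and `peer_inconsistent` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def recover_disk(group, peer_group, failed, peer_inconsistent):
    """Rebuild group[failed] position by position.

    group, peer_group: lists of disks, each disk a list of blocks (bytes)
    failed: index of the replaced disk within its group
    peer_inconsistent: positions where the paired disk cannot be trusted,
                       for example because it is being recovered as well
    Returns the method used for each position.
    """
    methods = []
    for pos in range(len(group[failed])):
        if pos in peer_inconsistent:
            # Fall back to parity: read every other disk in the same group.
            others = [disk[pos] for i, disk in enumerate(group) if i != failed]
            group[failed][pos] = xor_blocks(others)
            methods.append("parity")
        else:
            # Normal case: one read from the paired disk, one write.
            group[failed][pos] = peer_group[failed][pos]
            methods.append("copy")
    return methods


if __name__ == "__main__":
    d0, d1 = [b"\x01", b"\x02"], [b"\x10", b"\x20"]
    parity = [xor_blocks([d0[i], d1[i]]) for i in range(2)]
    source = [list(d0), list(d1), list(parity)]
    dest = [[b"\x00", b"\x00"], list(d1), list(parity)]   # disk 0 was replaced
    print(recover_disk(dest, source, 0, peer_inconsistent={1}))
    print(dest[0] == d0)
```

The copy branch touches two disks regardless of group size, while the parity branch reads every surviving disk in the group, which is the access-count difference the paragraph refers to.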
[0016]
To achieve the second object, the present invention provides the same disk failure recovery subprogram in a disk array operated with duplication for snapshot acquisition. By shortening the parity reconstruction time after disk replacement during the period in which duplication is stopped, the amount of updated data copied during mirror resynchronization is reduced, and the mirror resynchronization time, during which performance is degraded, can be shortened.
[0017]
To achieve the third object, the present invention provides the same disk failure recovery subprogram in a disk array operated with duplication for snapshot acquisition. In nD+1P, the notation for a disk configuration such as RAID 5 that forms a parity group, recovery is a copy from a single disk regardless of the value of n, so the performance degradation caused by an increase in the number of data drives n can be suppressed.
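The effect of n on the number of disk accesses per recovered position can be tabulated with a few lines of Python. The counts follow the description's own accounting (n reads and one write for parity-based recovery, one read and one write for a mirror copy); the function names are hypothetical.

```python
def accesses_parity_rebuild(n_data_drives):
    """(reads, writes) per position when rebuilding from parity in nD+1P."""
    if n_data_drives < 2:
        raise ValueError("a parity group needs at least two data drives")
    # n surviving disks are read, then the result is written to the new disk.
    return (n_data_drives, 1)


def accesses_mirror_copy(n_data_drives):
    """(reads, writes) per position when copying from the paired disk."""
    if n_data_drives < 2:
        raise ValueError("a parity group needs at least two data drives")
    # One read from the paired disk and one write, independent of n.
    return (1, 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, accesses_parity_rebuild(n), accesses_mirror_copy(n))
```

The read count of the parity path grows linearly with n, while the copy path stays constant.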
[0018]
DETAILED DESCRIPTION OF THE INVENTION
The first embodiment of the present invention is intended to suppress the decrease in normal access performance, in a disk array operated with duplication for snapshot acquisition, by reducing the number of disk accesses involved in parity reconstruction after disk replacement.
[0019]
It is also intended to shorten the parity reconstruction time after disk replacement during the period in which duplication is stopped, thereby reducing the amount of updated data copied during mirror resynchronization and shortening the mirror resynchronization time, during which performance is degraded.
[0020]
It is further intended to suppress the performance degradation caused by an increase in the number of data drives n in nD+1P, the notation for a disk configuration such as RAID 5 that forms a parity group, by reducing the number of disk accesses involved in parity reconstruction after disk replacement.
[0021]
In the present invention, a backup is taken as an example of using a snapshot, but it can also be used for other purposes such as OLAP (OnLine Analytical Processing) and system testing.
[0022]
(1) Description of configuration
The system configuration of the first embodiment of the present invention will be described with reference to FIG. 1. In FIG. 1, a computer 100 and a disk array 200 are connected by a SCSI bus 300 via SCSI interfaces 140 and 240. The memory 120 of the computer 100 holds a database program 126 and a backup program 127, which are executed by the CPU 110 that controls the computer 100. The disk array 200 has disk groups 251 and 252 controlled by a disk controller 250, and its memory 220 holds a snapshot management program 221, which is executed by the CPU 210. The disk group 251 has disks 271 to 275, the disk group 252 has disks 281 to 285, and each disk group has a striped array configuration with parity. In this embodiment, each of the disk groups 251 and 252 contains five disks, but each group only needs to consist of three or more disks.
[0023]
The storage areas in the disk groups 251 and 252 are accessed from the computer 100 as LUs (Logical Units), the logical units of SCSI. The storage areas corresponding to the disk groups 251 and 252 are referred to as LU 261 and LU 262, respectively. Each of the disk groups 251 and 252 may contain a plurality of LUs. In this embodiment, the snapshot management program 221 in the disk array 200 manages LU 261 and LU 262 as a duplicated pair: LU 261 is the mirror source LU, which holds the original data, and LU 262 is the mirror destination LU, which is a mirror of the original data. LU 262 is the LU used as the snapshot.
[0024]
Next, a program in the computer 100 will be described.
[0025]
The database program 126 accesses LU 261, the mirror source LU, during execution, and has a function of switching to a backup mode in which it controls data updates so as to guarantee the consistency of the data in LU 261. The transition to the backup mode is made in response to an instruction from the user or from the backup program 127. The backup program 127 has a function of reading, in response to an instruction from the user, the data to be backed up to tape or the like from LU 262, which stores the snapshot; a function of issuing the SCSI ModeSelect command to the disk array 200; and a function of instructing the database program 126 to enable or disable the backup mode.
[0026]
Next, programs and management tables in the disk array 200 will be described.
[0027]
The snapshot management program 221 of the disk array 200 has a disk access subprogram 230, which instructs the disk controller 250 to access the disks in response to requests from the computer 100, and an LU mirror subprogram 231, which duplicates each update to one LU by also applying it to another LU designated in advance, so that the same contents are written to the two LUs. The disk access subprogram 230 performs disk array control, converting read/write requests from the computer 100 into read/write requests to the disks 271 to 275 and 281 to 285. The LU mirror subprogram 231 duplicates accesses to LU 261 onto LU 262.
[0028]
The snapshot management program 221 also includes a non-mirror time update monitoring subprogram 234 that detects updates to the mirror source LU while duplexing is stopped (non-mirror time), a non-mirror time update position management subprogram 235 that records the update positions in the update / inconsistency position management table 222 described later, a mirror resynchronization subprogram 232 that copies the updated portions of the mirror source LU to the mirror destination LU when mirror resynchronization is performed, and a disk failure recovery subprogram 233 that restores the striped array configuration with parity after a failed disk is replaced. The update / inconsistency position management table 222 is used to manage the data contents of the mirror source LU and the mirror destination LU. It records the positions of the mirror source LU updated at non-mirror time and, after a disk failure, the positions on the replaced disk at which parity is inconsistent with the other disks in the same LU.
[0029]
The update / inconsistency position management table 222 is a bitmap as shown in FIG. 2. It consists of all LBA set numbers in the mirror source LU and, associated with each LBA set number, an update bit, a mirror source inconsistency bit, and a mirror destination inconsistency bit. An LBA set is one of the sets obtained by dividing the entire area of an LU from the head into units of an equal number (one or more) of LBAs (Logical Block Addresses), and the LBA set number is a serial number assigned to each LBA set from the head side.
[0030]
It is assumed that data distribution in the disk array places one LBA set on each disk in turn from the head side, and that a parity group number is assigned to the set of LBA sets located at the same position on each disk. In the example of FIG. 2, parity group number 0 consists of LBA set numbers 0 to 4. In this embodiment, data is distributed across five disks, and a disk set number is assigned to represent the set of LBA sets on each disk. In the example of FIG. 2, disk set 0 consists of LBA set numbers 0, 5, 10, and 15.
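The numbering described above can be illustrated with a short sketch. It assumes, as in the FIG. 2 example, five disks per group, so that the parity group number is the LBA set number divided by five and the disk set number is the remainder. The names `TableEntry`, `parity_group`, `disk_set`, and `lba_sets_on_disk` are hypothetical and do not appear in the original.

```python
from dataclasses import dataclass

DISKS_PER_GROUP = 5  # assumption taken from the FIG. 2 example


@dataclass
class TableEntry:
    """One row of the update / inconsistency position management table."""
    update: int = 0            # 1 = updated at non-mirror time
    src_inconsistent: int = 0  # 1 = parity inconsistent in the mirror source LU
    dst_inconsistent: int = 0  # 1 = parity inconsistent in the mirror destination LU


def parity_group(lba_set: int) -> int:
    """Parity group number of an LBA set (LBA sets 0 to 4 form group 0)."""
    return lba_set // DISKS_PER_GROUP


def disk_set(lba_set: int) -> int:
    """Disk set number of an LBA set (LBA sets 0, 5, 10, 15 form disk set 0)."""
    return lba_set % DISKS_PER_GROUP


def lba_sets_on_disk(disk: int, total_lba_sets: int) -> list[int]:
    """All LBA set numbers that belong to the given disk set."""
    return [n for n in range(total_lba_sets) if disk_set(n) == disk]
```

With 20 LBA sets, `lba_sets_on_disk(0, 20)` lists the rows whose inconsistency bits would be set to 1 when the disk holding disk set 0 is replaced.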
[0031]
The update bit indicates whether the LBA set of the mirror source LU has been updated at non-mirror time; 1 means “updated” and 0 means “not updated”. The initial value of the update bit is 0. The example of FIG. 2 shows a state in which only the area of LBA set number 1 has been updated at non-mirror time.
[0032]
The mirror source inconsistency bit indicates whether parity is consistent with the other disks in the same LU when a disk of the mirror source LU has been replaced; 1 means “inconsistent” and 0 means “consistent”. The initial value of the mirror source inconsistency bit is 0.
[0033]
The mirror destination inconsistency bit indicates whether parity is consistent with the other disks in the same LU when a disk of the mirror destination LU has been replaced; 1 means “inconsistent” and 0 means “consistent”. The initial value of the mirror destination inconsistency bit is 0. The example of FIG. 2 shows a state in which the areas of disk set 0 in parity groups 1 to 3 are parity-inconsistent with the other disks in the mirror destination LU. The area of disk set 0 in parity group 0 has already been made parity-consistent with the other disks by the disk failure recovery method described later.
[0034]
(2) Snapshot acquisition / deletion
Taking as an example the case where a snapshot of the LU 261, the mirror source LU, is provided as the LU 262, the mirror destination LU, the operations of the backup program 127 and the snapshot management program 221 at the time of snapshot acquisition / deletion will be described with reference to the flowchart of FIG. 3.
[0035]
First, the backup program 127 of the computer 100 instructs the database program 126 to enable the backup mode, which guarantees the consistency of the data from which the snapshot is acquired (step 2000). Next, the backup program 127 issues a ModeSelect command for acquiring a snapshot to the disk array 200 (step 2001). Upon receipt of the ModeSelect command (step 3000), the snapshot management program 221 of the disk array 200 enables the non-mirror time update monitoring subprogram 234 and the non-mirror time update position management subprogram 235, and starts recording the update positions of the LU 261 (step 3001). Thereafter, whenever the LU 261 is updated, the update bit of the LBA set containing the updated LBA is set to 1 in the update / inconsistency position management table 222 to record that an update has occurred. Next, the snapshot management program 221 disables the LU mirror subprogram 231 and stops duplication of the LU 261 and the LU 262 (step 3002). As a result, updates to the LU 261, the mirror source LU, are no longer reflected in the LU 262, the mirror destination LU. Next, the snapshot management program 221 transmits the completion status of the ModeSelect command to the backup program 127 of the computer 100 (step 3003). When the backup program 127 receives the completion status of the ModeSelect command (step 2002), it instructs the database program 126 to disable the backup mode (step 2003).
[0036]
Next, the backup program 127 issues a ModeSelect command that instructs the LU 263 to delete the snapshot to the disk array 200 (step 2004). When receiving the ModeSelect command (step 3005), the snapshot management program 221 of the disk array 200 validates the LU mirror subprogram 231 and resumes duplication of the LU 261 and LU 262 (step 3006). Thereby, the update to the LU 261 is reflected in the LU 262. Next, the snapshot management program 221 invalidates the non-mirror time update monitoring subprogram 234 and the non-mirror time update position management subprogram 235, and stops the LU 261 update position recording (step 3007). Thereafter, the update bit in the update / inconsistent position management table 222 is not changed by the non-mirrored update position management subprogram 235. Next, the snapshot management program 221 activates the mirror resynchronization subprogram 232, refers to the update / inconsistency position management table 222, and copies the portion where the contents do not match between the LU 261 and LU 262 from the LU 261 to the LU 262 ( Step 3008). Next, the snapshot management program 221 invalidates the mirror resynchronization subprogram 232 (3009), and transmits the completion status of the ModeSelect command to the backup program 127 of the computer 100 (step 3010). The backup program 127 receives the end status of the ModeSelect command and ends the operation (step 2005).
[0037]
Here, the operation of the mirror resynchronization subprogram 232 for copying data from the LU 261 to the LU 262 in step 3008 will be described with reference to the flowchart of FIG. 4. First, the mirror resynchronization subprogram 232 checks whether any update bit in the update / inconsistency position management table 222 is set to 1 as an update record (step 1001). If no update bit is 1, mirror resynchronization is complete and the process ends (step 1002). If there is an update record, updates to the corresponding recording position are suppressed (step 1003), and the data at that position is copied from the LU 261, the mirror source LU, to the LU 262, the mirror destination LU (step 1004). The data is copied by issuing a READ command to the relevant one of the disks 271 to 275 of the disk group 251 containing the LU 261 to read the data at the specified LBA, and then issuing a WRITE command to the one of the disks 281 to 285 of the disk group 252 containing the LU 262 that corresponds to the disk to which the READ command was issued, so that the read data is written to the same LBA as the LBA specified in the LU 261. A COPY command may be used instead to copy the data. Next, the mirror resynchronization subprogram 232 releases the update suppression for the corresponding recording position (step 1005), sets the corresponding update bit in the update / inconsistency position management table 222 to 0 to delete the update record (step 1006), and returns to step 1001.
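The loop of steps 1001 to 1006 can be sketched as follows. This is a minimal single-threaded illustration, not the actual subprogram: the LUs are modeled as Python lists indexed by LBA set number, the function name `resynchronize` is hypothetical, and the update suppression of steps 1003 and 1005 is only marked by comments because no concurrent writer exists in the sketch.

```python
def resynchronize(update_bits, source_lu, dest_lu):
    """Copy every LBA set flagged in update_bits from source_lu to dest_lu.

    Returns the list of copied positions, in the order they were processed.
    """
    copied = []
    while 1 in update_bits:            # step 1001; leaving the loop is step 1002
        pos = update_bits.index(1)     # first position with an update record
        # Step 1003: updates to pos would be suppressed here (e.g. by a lock).
        dest_lu[pos] = source_lu[pos]  # step 1004: copy source -> destination
        # Step 1005: the update suppression would be released here.
        update_bits[pos] = 0           # step 1006: delete the update record
        copied.append(pos)
    return copied
```

Only the flagged positions are touched, which is why the amount of data updated at non-mirror time determines how long resynchronization takes.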
[0038]
The above is the operation of the backup program 127 and the snapshot management program 221 when acquiring / deleting a snapshot. The backup program 127 of the computer 100 can read the LU 262 that acquired the snapshot between step 2003 and step 2004.
[0039]
(3) Disk failure recovery method
In the present invention, parity reconstruction for a new disk installed in place of a failed disk is performed by copying from the paired mirrored disk instead of using parity redundancy. Paired disks are, for example, the disk 271 and the disk 281, assuming that the disks 271 to 275 and the disks 281 to 285 in FIG. 1 make up their respective disk groups in this order.
[0040]
In the present embodiment, parity reconstruction after replacement of a failed disk is described separately for three phases, a mirror phase, a non-mirror phase, and a resynchronization phase, according to the stage of the snapshot acquisition / deletion operation at which it is performed. The non-mirror phase lasts from the start of step 3002 to the end of step 3007 in FIG. 3. The resynchronization phase lasts from the start of step 3008 to the end of step 3009 in FIG. 3, during which the updated data is copied from the mirror source LU to the mirror destination LU. The mirror phase is the stage in which the LUs are mirrored, and covers everything other than the non-mirror phase and the resynchronization phase.
[0041]
The description is further divided according to whether the disk replaced in each phase belongs to the mirror source LU or to the mirror destination LU.
[0042]
Parity reconstruction after replacement of a failed disk is carried out by the snapshot management program 221 of the disk array 200 enabling the disk failure recovery subprogram 233, and can be performed independently of instructions from the computer 100.
[0043]
(3-1) Mirror phase
The operation of the disk failure recovery subprogram 233 in the mirror phase will be described with reference to the flowchart of FIG. 5. In the mirror phase, the contents of the mirror source LU and the mirror destination LU are the same, and the operation of the disk failure recovery subprogram 233 is the same whether a disk of the mirror source LU or of the mirror destination LU is replaced. The case where a disk of the mirror destination LU is replaced is therefore taken as an example. Here, it is assumed that the disk replaced due to the failure is the disk 281 of the disk group 252 containing the mirror destination LU in FIG. 1. While mirrored, the disk 281 stores the same data as the disk 271 of the disk group 251 containing the mirror source LU. First, when the disk is replaced, the disk failure recovery subprogram 233 sets all the mirror destination inconsistency bits in the update / inconsistency position management table 222 corresponding to the replaced disk 281 to 1 (step 4001). Next, the disk failure recovery subprogram 233 checks whether any mirror destination inconsistency bit in the update / inconsistency position management table 222 is 1 as an inconsistency record (step 4002). If none is 1, failure recovery of the disk 281 is complete and the parity of the disk group 252 is consistent, so the process ends (step 4003). If there is an inconsistency record, updates to the corresponding recording position are suppressed (step 4004), and the corresponding mirror source inconsistency bit in the update / inconsistency position management table 222 is checked (step 4005). If it is not 1, the mirror source LU is parity-consistent, and the data at the corresponding position is copied from the disk 271 to the disk 281 (step 4007). If it is 1, the mirror source LU is also recovering from a failure and its parity is not consistent, so data is read from the other disks 282 to 285 in the same disk group 252 as the disk 281, and the data is restored by parity calculation and written to the corresponding position on the disk 281 (step 4006). After step 4007 or step 4006, the disk failure recovery subprogram 233 sets the mirror destination inconsistency bit for that position on the disk 281 in the update / inconsistency position management table 222 to 0 to delete the inconsistency record (step 4008), releases the update suppression for the corresponding recording position (step 4009), and returns to step 4002.
[0044]
The above is the operation of the disk failure recovery subprogram 233 in the mirror phase.
[0045]
In step 4007, the data is copied by issuing a READ command to the disk 271 to read the data at the specified LBA, and then issuing a WRITE command to the disk 281 to write the read data to the same LBA as the LBA specified on the disk 271. A COPY command may be used instead to copy the data.
[0046]
When a data update request is received from the computer 100 during the operation of the disk failure recovery subprogram 233, the snapshot management program 221 applies the update to both the mirror source LU and the mirror destination LU and sets the corresponding inconsistency bit to 0. When a data read request is received from the computer 100, data is read from the LU whose inconsistency bit is 0. When the inconsistency bits of both LUs are 1, data is read from all the disks other than the replaced disk, and the requested data is restored by parity calculation and transmitted to the computer 100.
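The read rule in this paragraph can be sketched as a small selection function. The function name `choose_read_path` and the returned labels are hypothetical, and preferring the mirror source LU when both inconsistency bits are 0 is an assumption of this sketch; the text only requires reading from an LU whose inconsistency bit is 0.

```python
def choose_read_path(src_inconsistent: int, dst_inconsistent: int) -> str:
    """Pick where to serve a read from during mirror-phase recovery."""
    if src_inconsistent == 0:
        # Assumption: prefer the mirror source LU when it is consistent.
        return "mirror_source"
    if dst_inconsistent == 0:
        return "mirror_destination"
    # Both LUs are inconsistent at this position: rebuild the requested
    # data from the disks other than the replaced disk by parity calculation.
    return "parity_reconstruction"
```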
[0047]
(3-2) Non-mirror phase
The operation of the disk failure recovery subprogram 233 in the non-mirror phase will be described with reference to the flowchart of FIG. 6. In the non-mirror phase, the contents of the mirror source LU and the mirror destination LU do not necessarily match, but the operation of the disk failure recovery subprogram 233 is the same whether a disk of the mirror source LU or of the mirror destination LU is replaced. The case where a disk of the mirror source LU is replaced is therefore taken as an example. Here, it is assumed that the disk replaced due to the failure is the disk 271 of the disk group 251 containing the mirror source LU in FIG. 1.
[0048]
First, when the disk is replaced, the disk failure recovery subprogram 233 sets all the mirror source inconsistency bits in the update / inconsistency position management table 222 corresponding to the replaced disk 271 to 1 (step 5001). Next, the disk failure recovery subprogram 233 checks whether any mirror source inconsistency bit in the update / inconsistency position management table 222 is 1 as an inconsistency record (step 5002). If none is 1, failure recovery of the disk 271 is complete and the parity of the disk group 251 is consistent, so the process ends (step 5003). If there is an inconsistency record, updates to the corresponding recording position are suppressed (step 5004), and the corresponding mirror destination inconsistency bit in the update / inconsistency position management table 222 is checked (step 5005). If it is 1, the mirror destination LU is also recovering from a failure and its parity is not consistent, so data is read from the other disks 272 to 275 in the same disk group 251 as the disk 271, and the data is restored by parity calculation and written to the corresponding position on the disk 271 (step 5006). If it is not 1, the corresponding update bit in the update / inconsistency position management table 222 is checked (step 5007). If the update bit is 1, the data at the corresponding positions on the disk 271 and the disk 281 differ, so the disk failure recovery subprogram 233 reads data from the other disks 272 to 275 in the same disk group 251 as the disk 271, restores the data by parity calculation, and writes it to the corresponding position on the disk 271 (step 5006). If the update bit is not 1, the data at the corresponding positions on the disk 271 and the disk 281 should be the same, so the data at the corresponding position is copied from the disk 281 to the disk 271 (step 5008).
[0049]
After step 5006 or step 5008, the disk failure recovery subprogram 233 sets the mirror source inconsistency bit for that position on the disk 271 in the update / inconsistency position management table 222 to 0 to delete the inconsistency record (step 5009), releases the update suppression for the corresponding recording position (step 5010), and returns to step 5002. The above is the operation of the disk failure recovery subprogram 233 in the non-mirror phase.
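The choice made in steps 5005 to 5008 between parity reconstruction and copying from the paired disk can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are modeled as Python `bytes`, parity is the bytewise XOR described for RAID 5 in the background section, and the names `xor_blocks` and `recover_block` are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def recover_block(update_bit, peer_inconsistent, peer_block, surviving_blocks):
    """Return the data to write to the replaced disk for one position.

    update_bit        -- update bit of the position (1 = mirror pair differs)
    peer_inconsistent -- inconsistency bit of the paired LU at the position
    peer_block        -- data at the same position on the paired disk
    surviving_blocks  -- blocks at the position on the other disks of the
                         same disk group (data and parity)
    """
    if peer_inconsistent or update_bit:
        # Step 5006: the paired disk cannot be trusted, so rebuild by parity.
        return xor_blocks(surviving_blocks)
    # Step 5008: the paired disk holds identical data, so one read suffices.
    return peer_block
```

The copy branch reads a single block, whereas the parity branch reads every surviving disk of the group, which is the saving the embodiment aims at.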
[0050]
In step 5008, the data is copied by issuing a READ command to the disk 281 to read the data at the specified LBA, and then issuing a WRITE command to the disk 271 to write the read data to the same LBA as the LBA specified on the disk 281. A COPY command may be used instead to copy the data.
[0051]
When a data read request is received from the computer 100 during the operation of the disk failure recovery subprogram 233, the snapshot management program 221 normally reads the data from the mirror source LU. However, where the update bit is 0 and the mirror destination inconsistency bit is 0, the data may instead be read from the mirror destination LU.
[0052]
(3-3) Resynchronization phase (mirror destination LU recovery)
The operation of the disk failure recovery subprogram 233 for the mirror destination LU in the resynchronization phase will be described with reference to the flowchart of FIG. 7. Here, it is assumed that the disk replaced due to the failure is the disk 281 of the disk group 252 containing the mirror destination LU in FIG. 1.
[0053]
First, when the disk is replaced, the disk failure recovery subprogram 233 sets all the mirror destination inconsistency bits in the update / inconsistency position management table 222 corresponding to the replaced disk 281 to 1 (step 6001). Next, the disk failure recovery subprogram 233 checks whether any mirror destination inconsistency bit in the update / inconsistency position management table 222 is 1 as an inconsistency record (step 6002). If none is 1, failure recovery of the disk 281 is complete and the parity of the disk group 252 is consistent, so the process ends (step 6003). If there is an inconsistency record, updates to the corresponding recording position are suppressed (step 6004), and the corresponding mirror source inconsistency bit in the update / inconsistency position management table 222 is checked (step 6005). If it is not 1, the mirror source LU is parity-consistent, so the data at the corresponding position is copied from the disk 271 to the disk 281 (step 6007), and the corresponding update bit in the update / inconsistency position management table 222 is set to 0 to delete the update record, regardless of whether an update was recorded (step 6008). If it is 1, the mirror source LU is also recovering from a failure and its parity is not consistent, so data is read from the other disks 282 to 285 in the same disk group 252 as the disk 281, and the data is restored by parity calculation and written to the corresponding position on the disk 281 (step 6006). After step 6008 or step 6006, the disk failure recovery subprogram 233 sets the mirror destination inconsistency bit for that position on the disk 281 in the update / inconsistency position management table 222 to 0 to delete the inconsistency record (step 6009), releases the update suppression for the corresponding recording position (step 6010), and returns to step 6002.
[0054]
The above is the operation of the disk failure recovery subprogram 233 related to the mirror destination LU in the resynchronization phase.
[0055]
In step 6007, the data is copied by issuing a READ command to the disk 271 to read the data at the specified LBA, and then issuing a WRITE command to the disk 281 to write the read data to the same LBA as the LBA specified on the disk 271. A COPY command may be used instead to copy the data.
[0056]
(3-4) Resynchronization phase (mirror source LU recovery)
The operation of the disk failure recovery subprogram 233 for the mirror source LU in the resynchronization phase will be described with reference to the flowchart of FIG. 8. Here, it is assumed that the disk replaced due to the failure is the disk 271 of the disk group 251 containing the mirror source LU in FIG. 1.
[0057]
First, when the disk is replaced, the disk failure recovery subprogram 233 sets all the mirror source inconsistency bits in the update / inconsistency position management table 222 corresponding to the replaced disk 271 to 1 (step 7001). Next, the disk failure recovery subprogram 233 checks whether any mirror source inconsistency bit in the update / inconsistency position management table 222 is 1 as an inconsistency record (step 7002). If none is 1, failure recovery of the disk 271 is complete and the parity of the disk group 251 is consistent, so the process ends (step 7003). If there is an inconsistency record, updates to the corresponding recording position are suppressed (step 7004), and the corresponding mirror destination inconsistency bit in the update / inconsistency position management table 222 is checked (step 7005). If it is 1, the mirror destination LU is also recovering from a failure and its parity is not consistent, so data is read from the other disks 272 to 275 in the same disk group 251 as the disk 271, and the data is restored by parity calculation and written to the corresponding position on the disk 271 (step 7006). If it is not 1, the corresponding update bit in the update / inconsistency position management table 222 is checked (step 7007). If the update bit is 1, the data at the corresponding positions on the disk 271 and the disk 281 differ, so the disk failure recovery subprogram 233 reads data from the other disks 272 to 275 in the same disk group 251 as the disk 271, restores the data by parity calculation, and writes it to the corresponding position on the disk 271 (step 7006). If the update bit is not 1, the data at the corresponding positions on the disk 271 and the disk 281 should be the same, so the data at the corresponding position is copied from the disk 281 to the disk 271 (step 7008).
[0058]
After step 7006 or step 7008, the disk failure recovery subprogram 233 sets the mirror source inconsistency bit for that position on the disk 271 in the update / inconsistency position management table 222 to 0 to delete the inconsistency record (step 7009), releases the update suppression for the corresponding recording position (step 7010), and returns to step 7002. The above is the operation of the disk failure recovery subprogram 233 for the mirror source LU in the resynchronization phase.
[0059]
In step 7008, the data is copied by issuing a READ command to the disk 281 to read the data at the specified LBA, and then issuing a WRITE command to the disk 271 to write the read data to the same LBA as the LBA specified on the disk 281. A COPY command may be used instead to copy the data.
[0060]
When a data read request is received from the computer 100 during the operation of the disk failure recovery subprogram 233, the snapshot management program 221 normally reads the data from the mirror source LU. However, where the update bit is 0 and the mirror destination inconsistency bit is 0, the data may instead be read from the mirror destination LU.
[0061]
(4) Computer operation when reading and writing data and reading snapshots
First, the operation of the database program 126 when the computer 100 accesses the data of the LU 261 in the disk array 200 will be described. The database program 126 performs the same operation regardless of whether or not a snapshot is acquired.
[0062]
When the database program 126 reads LU 261 data, the database program 126 issues a READ command for reading the LU 261 data to the disk array 200. Finally, the database program 126 receives data and status from the disk array 200 and ends the operation. When the database program 126 writes data to the LU 261, the database program 126 issues a WRITE command for writing data to the LU 261 to the disk array 200, and transmits the data. Finally, the database program 126 receives the status from the disk array 200 and ends the operation.
[0063]
Next, the operation of the backup program 127 when the computer 100 reads a snapshot of the LU 261 in the disk array 200 will be described.
[0064]
When the backup program 127 reads the snapshot of the LU 261, the backup program 127 issues a READ command for reading the data of the LU 262, which is the mirror destination LU of the LU 261, to the disk array 200. Finally, the backup program 127 receives data and status from the disk array 200 and ends the operation.
[0065]
(5) Disk array operation when reading and writing data and reading snapshots
First, the operation of the snapshot management program 221 when the computer 100 accesses the data of the LU 261 in the disk array 200 will be described.
[0066]
When the computer 100 reads data from the LU 261, the snapshot management program 221 receives a READ command for the LU 261. Next, if the LU mirror subprogram 231 is valid and the copy of the updated portion by the mirror resynchronization subprogram 232 has been completed, data is read from the LU 261 or the LU 262 that is the mirror destination LU. Otherwise, data is read from the LU 261. Finally, the read data and status are transmitted to the computer 100. If the contents of the LU 261 and the LU 262 that is the mirror destination LU match, the load can be distributed by reading data from either of them.
[0067]
When the computer 100 writes data to the LU 261 and updates the stored contents, the snapshot management program 221 receives a WRITE command and data for the LU 261. Next, if the LU mirror subprogram 231 is enabled, the data is written to both the LU 261 and the LU 262, the mirror destination LU; if it is disabled, the data is written only to the LU 261. Next, if the non-mirror time update monitoring subprogram 234 and the non-mirror time update position management subprogram 235 are enabled, the update bit of the LBA set containing the updated LBA is set to 1 in the update / inconsistency position management table 222 of the LU 261; if they are disabled, nothing is done. Finally, the status is transmitted to the computer 100.
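The write handling just described can be sketched as follows. The class `ArrayState`, the function `handle_write`, and the returned status string are hypothetical illustrations; the LUs and the update bits are modeled as dictionaries keyed by LBA set number.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayState:
    source_lu: dict = field(default_factory=dict)    # models LU 261
    dest_lu: dict = field(default_factory=dict)      # models LU 262
    update_bits: dict = field(default_factory=dict)  # update bits of table 222
    mirror_enabled: bool = True                      # LU mirror subprogram 231
    update_tracking_enabled: bool = False            # subprograms 234 and 235


def handle_write(state: ArrayState, lba_set: int, data: bytes) -> str:
    """Apply one WRITE to the mirror source LU and return a status."""
    state.source_lu[lba_set] = data
    if state.mirror_enabled:
        state.dest_lu[lba_set] = data   # duplicate the write to the mirror
    if state.update_tracking_enabled:
        state.update_bits[lba_set] = 1  # record the update position
    return "GOOD"                       # placeholder status for the sketch
```

At non-mirror time (`mirror_enabled=False`, `update_tracking_enabled=True`), a write reaches only the mirror source LU and leaves an update record for later resynchronization.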
[0068]
Next, the operation of the snapshot management program 221 when the computer 100 reads a snapshot of the LU 261 in the disk array 200 will be described.
[0069]
When the computer 100 reads the snapshot of the LU 261, the snapshot management program 221 receives a READ command for the LU 262 that is the mirror destination LU of the LU 261. Next, the snapshot management program 221 reads data from the LU 262 that is the mirror destination LU. Finally, the read data and status are transmitted to the computer 100.
[0070]
Note that during the copying of the updated portion by the mirror resynchronization subprogram 232, the copy processing and the data access processing to the LU 261 by the computer 100 are concentrated on the same LU 261, so that the data access performance is degraded.
[0071]
(6) Effect
According to this embodiment, in a disk array that is operated in duplicate for snapshot acquisition, a decrease in normal access performance can be suppressed by reducing the number of disk accesses required for parity reconstruction after disk replacement.
[0072]
In addition, according to the present embodiment, shortening the parity reconstruction time after disk replacement reduces the period during which duplexing is stopped. This reduces the amount of update data to be copied during mirror resynchronization, so that the mirror resynchronization time, during which performance is degraded, can be shortened.
[0073]
For example, assuming a RAID 5 4D + 1P configuration, recovering a disk by parity redundancy after a disk failure in the mirror phase requires four disk reads and one disk write. By applying the present invention, recovery requires only one disk read and one disk write, so that a decrease in normal access performance can be suppressed. The same reduction in the number of disk accesses can be expected in the non-mirror phase, which suppresses degradation of read performance while a snapshot is being read for backup or the like and suppresses an increase in the read time. As a result, the mirror resynchronization time, during which performance is degraded, can be shortened.
[0074]
In addition, by reducing the number of disk accesses required for parity reconstruction after disk replacement, it is possible to suppress the performance degradation that accompanies an increase in the number n of data drives in nD + 1P, which denotes the disk configuration of a parity group such as RAID 5. In the case of nD + 1P, disk recovery by parity redundancy requires n disk reads and one disk write. By applying the present invention, recovery requires only one disk read and one disk write.
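The access counts stated above can be written out as a small calculation. The function name `rebuild_accesses` is hypothetical; the counts are per recovered position and follow directly from the text (n reads and one write by parity, one read and one write by copying from the paired disk).

```python
def rebuild_accesses(n_data_disks: int, copy_from_mirror: bool) -> tuple[int, int]:
    """Return (disk reads, disk writes) needed to recover one position
    of a replaced disk in an nD + 1P parity group."""
    reads = 1 if copy_from_mirror else n_data_disks
    writes = 1
    return reads, writes
```

For a group of five disks (n = 4), parity recovery costs four reads and one write per position, while copying from the paired disk costs one read and one write regardless of n.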
[0075]
In the present invention, the SCSI bus 300 is used as an interface for connecting the computer 100 and the disk array 200, but another interface such as Fiber Channel may be used.
[0076]
In this embodiment, the mirror source LU and the mirror destination LU are duplicated to obtain a snapshot. However, the present invention can also be applied to a multiple mirror provided with a plurality of mirror destination LUs. In this case, mirror destination inconsistency bits are provided in the update / inconsistency position management table 222 for each mirror destination LU, and the disk failure recovery subprogram 233 performs the same operation as in the case where one mirror destination LU and the mirror source LU are duplicated. The LU mirror subprogram 231 may perform an operation of multiplexing access to the mirror source LU to the plurality of mirror destination LUs.
[0077]
[Effect of the invention]
As described above, according to the present invention, in a disk array that is duplicated for snapshot acquisition, a decrease in normal access performance can be suppressed by reducing the number of disk accesses required for parity reconstruction after disk replacement.
[0078]
In addition, according to the present embodiment, shortening the parity reconstruction time after disk replacement reduces the period during which duplexing is stopped. This reduces the amount of update data to be copied during mirror resynchronization, so that the mirror resynchronization time, during which performance is degraded, can be shortened.
[0079]
In addition, by reducing the number of disk accesses required for parity reconstruction after disk replacement, it is possible to suppress the performance degradation that accompanies an increase in the number n of data drives in nD + 1P, which denotes the disk configuration of a parity group such as RAID 5.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram according to a first embodiment.
FIG. 2 is an explanatory diagram of an update / inconsistency position management table in the first embodiment.
FIG. 3 is a snapshot acquisition / deletion flow according to the first embodiment.
FIG. 4 is an operation flow of a mirror resynchronization subprogram in the first embodiment.
FIG. 5 is an operational flow of a disk failure recovery subprogram in the mirror phase in the first embodiment.
FIG. 6 is a non-mirror phase disk failure recovery subprogram operation flow in the first embodiment.
FIG. 7 is a disk failure recovery subprogram operation flow related to mirror destination LU recovery in the resynchronization phase in the first embodiment.
FIG. 8 is a disk failure recovery subprogram operation flow related to the mirror source LU recovery in the resynchronization phase in the first embodiment.
[Explanation of symbols]
100 ... Computer, 200 ... Disk array, 221 ... Snapshot management program, 222 ... Update / inconsistency position management table, 233 ... Disk failure recovery subprogram, 271-275, 281-285 ... Disk.

Claims

A disk failure recovery method for an external storage device that is connected to a computer and has a plurality of disks, the method comprising:
a first step of forming a parity group using a first plurality of disks and arranging a first logical unit on the first plurality of disks;
a second step of forming a parity group using a second plurality of disks, equal in number to the first plurality of disks and each paired with a disk of the first plurality of disks, and arranging a second logical unit on the second plurality of disks;
a third step of writing the data of a data write request to both the first logical unit and the second logical unit when the computer makes the data write request;
a fourth step of controlling data writes to the first logical unit and the second logical unit so that, when there is a data write request from the computer, the data is written to the first logical unit and not to the second logical unit;
a fifth step of storing, when the data is written to the first logical unit in the fourth step, the position on the disk at which the data was written together with an update record;
a sixth step of identifying, when one of the second plurality of disks is replaced with a new disk, the disk in the first plurality of disks that is paired with the replaced disk;
a seventh step of copying, if there is no update record for a data position on the identified disk, the data at that position to the corresponding position on the new disk; and
an eighth step of obtaining, if the update record exists for a data position on the identified disk, the data for the corresponding position on the new disk by parity calculation from the disks of the second plurality of disks other than the replaced disk, and recording the data at that position on the new disk.

A disk failure recovery method for an external storage device that is connected to a computer and has a plurality of disks, the method comprising:
a first step of forming a parity group using a first plurality of disks and arranging a first logical unit on the first plurality of disks;
a second step of forming a parity group using a second plurality of disks, equal in number to the first plurality of disks and each paired with a disk of the first plurality of disks, and arranging a second logical unit on the second plurality of disks;
a third step of writing the data of a data write request to both the first logical unit and the second logical unit when the computer makes the data write request;
a fourth step of controlling data writes to the first logical unit and the second logical unit so that, when there is a data write request from the computer, the data is written to the first logical unit and not to the second logical unit;
a fifth step of storing, when the data is written to the first logical unit in the fourth step, the position on the disk at which the data was written together with an update record;
a sixth step of identifying, when one of the first plurality of disks is replaced with a new disk, the disk in the second plurality of disks that is paired with the replaced disk;
a seventh step of copying, if there is no update record for a data position on the replaced disk, the data at the paired position on the identified disk to the corresponding position on the new disk; and
an eighth step of obtaining, if the update record exists for a data position on the replaced disk, the data for the corresponding position on the new disk by parity calculation from the disks of the first plurality of disks other than the replaced disk, and recording the data at that position on the new disk.
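
The choice between copying and parity calculation in the seventh and eighth steps of the two claims above can be sketched as follows. This is a minimal model under stated assumptions, not the patented implementation: blocks are small integers, parity is the XOR of all blocks at the same position, the parity disk is fixed rather than rotated, and the update record is a set of (disk, position) pairs that also marks the parity block changed by each write. All function and variable names are hypothetical.

```python
from functools import reduce
from operator import xor


def make_group(data_disks):
    """Build a parity group: the given data disks plus one XOR parity disk."""
    parity = [reduce(xor, blocks, 0) for blocks in zip(*data_disks)]
    return [list(d) for d in data_disks] + [parity]


def write_source_only(source, updated, disk, pos, value, parity_disk):
    """Fourth and fifth steps: write to the first LU only and record the update."""
    old = source[disk][pos]
    source[disk][pos] = value
    source[parity_disk][pos] ^= old ^ value  # keep parity consistent
    updated.add((disk, pos))
    updated.add((parity_disk, pos))          # the parity block changed as well


def parity_rebuild(group, failed, pos):
    """XOR the blocks at pos on every disk of the group except the failed one."""
    return reduce(xor, (d[pos] for i, d in enumerate(group) if i != failed), 0)


def recover_disk(replaced_group, partner_group, failed, updated):
    """Sixth to eighth steps for a replaced disk in either group.

    Disk `failed` of partner_group is the disk paired with the replaced one.
    """
    paired = partner_group[failed]
    new_disk = []
    for pos in range(len(paired)):
        if (failed, pos) not in updated:
            new_disk.append(paired[pos])  # seventh step: copy from the pair
        else:
            # eighth step: rebuild from the rest of the replaced disk's group
            new_disk.append(parity_rebuild(replaced_group, failed, pos))
    replaced_group[failed] = new_disk
    return new_disk


if __name__ == "__main__":
    source = make_group([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    destination = [list(d) for d in source]  # state when duplexing stops
    updated = set()
    write_source_only(source, updated, disk=1, pos=2, value=42, parity_disk=3)
    expected = list(destination[1])
    destination[1] = [None] * 3              # disk 1 of the second LU is replaced
    print(recover_disk(destination, source, 1, updated) == expected)
```

Passing the second group as `replaced_group` corresponds to the first claim, and passing the first group corresponds to the second claim. Only blocks with an update record cost a parity calculation; every other block is a single copy from the paired disk.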