JP3597349B2

JP3597349B2 - Storage subsystem and fault recovery method thereof

Info

Publication number: JP3597349B2
Application number: JP24152197A
Authority: JP
Inventors: 昇古海; 正敦野崎; 茂木城
Original assignee: Hitachi Software Engineering Co Ltd; Hitachi Ltd
Current assignee: Hitachi Software Engineering Co Ltd; Hitachi Ltd
Priority date: 1997-09-05
Filing date: 1997-09-05
Publication date: 2004-12-08
Anticipated expiration: 2017-09-05
Also published as: JPH1185410A

Description

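[0001]
[Technical Field of the Invention]
The present invention relates to a storage subsystem and a failure recovery technique for it. It is particularly effective when applied to, for example, a data duplication storage subsystem made up of a plurality of storage subsystems that are installed at remote sites and each use a redundant storage configuration.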
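[0002]
[Prior Art]
In information processing systems consisting of a central processing unit and peripheral storage devices, the volume of information keeps growing, and so does the demand for reliability of the data handled. To improve reliability against physical failures of storage media and storage devices, data duplication storage subsystems have been put to practical use. They hold data in duplicate on a plurality of storage media so that data lost through a failure can be recovered from the backup copy.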
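[0003]
For individual storage subsystems, RAID storage devices have also been put to practical use. They divide data across a plurality of storage media and treat several pieces of data as one unit, for which they generate and distribute redundant data, typified by parity data. When the storage medium holding some piece of data fails, that data is recovered from the redundant data and the other data in the unit.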
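[0004]
However, in systems where information processing operates over a wide area and many information processing systems are interlinked, as in the online systems of banks, these reliability techniques only multiplex or add redundancy to data within a single storage subsystem. The storage subsystem as a whole may fail. Or the entire information processing system, including the central processing unit, may stop because of, for example, a building-wide power outage or fire. In either case the damage spreads to the whole wide-area system, and the loss caused by the lost data becomes enormous. To address this concern, data duplication management systems that keep a second copy of data at a remote site have been put to practical use. In such remote data duplication, however, data communication between the remote information processing systems is handled by the inter-CPU communication function. This places a heavy load on the central processing units, which also perform data processing and computation. Reducing that load has been an open issue for remote data duplication systems.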
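[0005]
As a countermeasure, systems have been put to practical use in which the storage control devices of the remote information processing systems can communicate and transfer data with each other directly, over communication/data transfer paths. The storage control devices then bear the load of data duplication, which relieves the central processing units.

In such a remote data duplication storage subsystem, the information processing system that performs the main business is called the primary system. Its components are called the first central processing unit, the first storage subsystem, and the first storage control device. The backup information processing system is called the secondary system, with the second central processing unit, the second storage subsystem, and the second storage control device. Each storage control device typically includes a large-capacity cache memory (disk cache) with a non-volatile mechanism. The two storage control devices are connected by one or more data transfer paths, and a primary/secondary pair-volume relationship is defined for each unit of data (for example, each volume). The primary-side data is called master data (master volume), and the secondary-side data is called remote data (remote volume).

For a write request (I/O) to the storage subsystem in the primary system, the first storage control device writes the data from the first central processing unit to the storage devices under its control. It also acts as a host to the second storage control device and issues a data write I/O to the second storage subsystem, which duplicates the data. If a failure occurs on the primary system during this duplicated operation and business cannot continue, business is immediately switched to the secondary system. It then continues using the duplicated data in the second storage subsystem. An example of such a technique is disclosed in U.S. Pat. No. 5,155,845.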
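[0006]
A data volume that already exists on the first storage subsystem may be newly defined as a remote duplex volume, creating a new duplex pair; this is called the initial copy. The first storage control device then reads the data of that volume sequentially from the storage devices into cache memory. It copies the volume by issuing write I/Os to the second storage subsystem. The unit of copying may be a single data storage unit (a track) or a group of such units (for example, a cylinder). The first storage subsystem can accept update I/Os from the first central processing unit while it executes the initial-copy I/Os. Consider an update to a volume whose initial copy is in progress. If the updated range lies in an area that the initial copy has already covered (copying to the second storage subsystem is complete), the first storage control device reflects the update data to the second storage subsystem, synchronously or asynchronously.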
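The concurrent-update rule for the initial copy can be illustrated with a minimal Python sketch. It assumes a volume modelled as a list of tracks and a hypothetical copy pointer `copied_upto` that marks the first track not yet copied. The class and method names are illustrative and are not taken from the patent.

```python
class InitialCopy:
    """Toy model of an initial copy that accepts host updates concurrently.

    A volume is a list of tracks. copied_upto is the index of the first
    track not yet copied to the secondary (a hypothetical copy pointer).
    """

    def __init__(self, primary):
        self.primary = list(primary)
        self.secondary = [None] * len(self.primary)  # None = not yet copied
        self.copied_upto = 0

    def copy_step(self, n_tracks=1):
        """Copy the next n_tracks tracks, modelling one write I/O per track."""
        end = min(self.copied_upto + n_tracks, len(self.primary))
        for t in range(self.copied_upto, end):
            self.secondary[t] = self.primary[t]
        self.copied_upto = end

    def host_update(self, track, data):
        """Apply a host update I/O while the initial copy is running."""
        self.primary[track] = data
        if track < self.copied_upto:
            # Area already copied: reflect the update to the secondary
            # (done synchronously here; the text also allows asynchronous).
            self.secondary[track] = data
        # Area not yet copied: the sequential copy will carry the new data.

    def done(self):
        return self.copied_upto == len(self.primary)


ic = InitialCopy(["a", "b", "c", "d"])
ic.copy_step(2)
ic.host_update(0, "A")   # already copied: forwarded at once
ic.host_update(3, "D")   # not yet copied: picked up by the copy later
while not ic.done():
    ic.copy_step()
```

After two tracks are copied, the update to track 0 is forwarded immediately. The update to track 3 simply travels with the sequential copy when it reaches that track, so both sides end up identical.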
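[0007]
The first and second storage subsystems may use a RAID data storage scheme as described below. This technique is described in D. A. Patterson et al., "Introduction to Redundant Arrays of Inexpensive Disks (RAID)," Spring COMPCON '89, pp. 112-117, Feb. 1989. In a RAID scheme, the storage subsystem treats n+m storage devices as one data storage unit. Each unit of data (for example, one track on a storage medium) is divided across n storage devices and stored. Taking n data units as one group, redundant data called parity data is created. The amount of redundant data depends on the degree of redundancy: with redundancy m, m pieces of redundant data are created. The redundant data is itself stored on storage media other than those holding the data group it was created from. The set consisting of the n data units and their m pieces of redundant data is called a redundancy group. As a result, one storage device may become unreadable because of a failure, and its data can still be regenerated from the other n-1 data units and the m pieces of redundant data in that redundancy group. Likewise, if a device becomes unwritable, the data is still stored logically by updating the m pieces of redundant data. In this way, data reliability is improved against failures of storage media or disk media.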
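For the common case m = 1 (single parity, as in RAID 5), the parity is the byte-wise XOR of the n data units. Any one lost unit is then the XOR of the surviving units and the parity. The following Python sketch illustrates only that case. Redundancy m of 2 or more requires a different code (for example Reed-Solomon), which is not shown; the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_units):
    """Parity unit for one redundancy group with m = 1."""
    return xor_blocks(data_units)


def regenerate(survivors, parity):
    """Rebuild the single missing data unit from the n-1 survivors and the parity."""
    return xor_blocks(list(survivors) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]   # n = 3 data units
p = make_parity(data)
rebuilt = regenerate([data[0], data[2]], p)      # unit 1 is lost
```

Because XOR is its own inverse, the same function both creates the parity and regenerates a lost unit.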
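[0008]
Consider a conventional remote data duplication storage subsystem using RAID, in which redundancy is lost in the first storage subsystem because a storage medium fails. The data on the failed medium is restored inside the first storage subsystem from the n-1 data units and m pieces of redundant data on its other storage media. It is then recorded again on a storage medium in the first storage subsystem. The system can operate again as a duplex system with the second storage subsystem only after two things happen: redundancy and all n data units are fully restored, and data copying from the second storage subsystem is complete. Japanese Patent Application Laid-Open No. 06-266508 is an example of this kind of technique.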
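[0009]
With this technique, however, the functions of the RAID remote data duplication storage subsystem remain restricted until complete redundancy and the data in the first storage subsystem have been recovered.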
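[0010]
[Problems to Be Solved by the Invention]
In the RAID schemes described above, logical volumes and physical volumes do not always correspond one-to-one (especially in RAID 5). The unit in which redundant data is created (hereinafter, an ECC group) may therefore contain a plurality of logical volumes. If a plurality of physical storage media in an ECC group fail, the ECC group loses its redundancy and its data can no longer be accessed; the ECC group is blocked. The technical problem is that blocking the ECC group makes the data of every logical volume in that ECC group inaccessible.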
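[0011]
When a remote data duplication storage subsystem uses RAID, however, one ECC group in the first storage subsystem may be blocked in this way and its data can still be read and written. This is done by accessing the physical storage media in the second storage subsystem that hold the duplicate of that data.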
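[0012]
In this state, however, read/write requests from the central processing unit are processed using only the data in the second storage subsystem, so the data is no longer duplicated. The longer this state lasts, the higher the risk that a failure in the second storage subsystem brings the secondary system down and causes data loss. The blocked ECC group therefore needs to be recovered immediately.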
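[0013]
Recovering the blocked ECC group takes two steps. First, the data of the failed physical storage media is restored by copying the duplicated data from the corresponding physical storage media in the second storage subsystem to spare physical storage media in the first storage subsystem. Second, the redundant data of the blocked ECC group must be recovered.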
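[0014]
However, copying every logical volume contained in the blocked ECC group takes a great deal of processing time. This delays recovery of the ECC group and lengthens the time during which subsystem resources are occupied. It may also degrade the performance of read/write requests from the central processing unit. Moreover, conventional remote data duplication works strictly per logical volume, with no notion of duplicating physical volumes. Redundant data is therefore not copied, and copying alone cannot restore the redundancy of the blocked ECC group. To recover the redundant data as well, all non-redundant data in the blocked ECC group would first have to be restored to the first storage subsystem, and the redundant data regenerated from it. This too delays recovery of the blocked ECC group.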
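[0015]
What is needed, therefore, is a means for the case where a plurality of storage media in an ECC group of the first storage subsystem fail and redundancy is lost. It should immediately reconfigure the ECC group, restore data redundancy, and return the system to the remote data duplication state.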
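[0016]
An object of the present invention is to provide a storage subsystem, and a failure recovery technique for it, that quickly recovers redundancy lost through a failure in a storage subsystem with a redundant storage configuration, without stopping the system.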
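[0017]
Another object of the present invention is to provide a storage subsystem, and a failure recovery technique for it, that quickly recovers both the redundancy and the data duplication state lost through a failure in a storage subsystem with a redundant storage configuration, without stopping the system.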
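[0018]
[Means for Solving the Problems]
A storage subsystem failure recovery method of the present invention comprises:
a step of distributing and storing a plurality of data, and redundant data generated from the plurality of data, across a plurality of storage media of a primary storage subsystem;
a step of copying the plurality of data and the redundant data to a plurality of storage media of a secondary storage subsystem; and
a step of, when a data failure exceeding the redundancy of the redundant data occurs in the primary storage subsystem, restoring the failed data from the normal data stored on the plurality of storage media of the primary storage subsystem and the data stored on the plurality of storage media of the secondary storage subsystem.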
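A further storage subsystem failure recovery method of the present invention comprises:
a step of creating a redundancy group from a plurality of data and redundant data generated from the plurality of data;
a step of distributing and storing the plurality of data and the redundant data across a plurality of storage media of a primary storage subsystem;
a step of copying the plurality of data and the redundant data to a plurality of storage media of a secondary storage subsystem; and
a step of, when a failure exceeding the redundancy of the redundant data occurs in the storage media of the primary storage subsystem, creating another redundancy group from the data stored on the normal storage media of the primary storage subsystem and the data copied to the storage media of the secondary storage subsystem that correspond to the failed storage media.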
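A further storage subsystem failure recovery method of the present invention comprises:
a step of distributing and storing a plurality of data, and redundant data generated from the plurality of data, across a plurality of storage media of a primary storage subsystem;
a step of copying the plurality of data and the redundant data to a plurality of storage media of a secondary storage subsystem;
a step of, when a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 consisting only of primary storage means of the primary storage subsystem, configuring a redundancy group G2 from the non-failed primary storage means belonging to G1 and the secondary storage means that hold the data corresponding to the failed primary storage means; and
a step of copying, to primary spare storage means of the primary storage subsystem, the contents of the secondary storage means that hold the data corresponding to the failed primary storage means, and a step of configuring a redundancy group G3 from the non-failed primary storage means belonging to G1 and the primary spare storage means.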
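A further storage subsystem failure recovery method of the present invention comprises:
a step of distributing and storing a plurality of data, and redundant data generated from the plurality of data, across a plurality of storage media of a primary storage subsystem;
a step of copying the plurality of data and the redundant data to a plurality of storage media of a secondary storage subsystem;
a step of, when a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 consisting only of primary storage means of the primary storage subsystem, configuring a redundancy group G2 from the non-failed primary storage means belonging to G1 and the secondary storage means that hold the data corresponding to the failed primary storage means; and
a step of copying, to primary spare storage means of the primary storage subsystem, the contents of the secondary storage means that hold the data corresponding to the failed primary storage means, and a step of, when the number of failed primary storage means exceeds the number of primary spare storage means, configuring a redundancy group G4 from the non-failed primary storage means belonging to G1, the primary spare storage means, and the secondary storage means that hold the data corresponding to the failed primary storage means.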
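A storage subsystem of the present invention comprises storage means that hold update data in multiple copies together with storage means provided in another storage subsystem;
the storage means distribute and store, across a plurality of storage media, a plurality of data constituting a redundancy group and redundant data generated from that data; and
when a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 made up of the storage means of the storage subsystem, a redundancy group G2 is configured from the non-failed storage means belonging to G1 and the storage means, provided in the other storage subsystem, that hold the data corresponding to the failed storage means.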
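[0019]
More specifically, consider a data duplication storage subsystem in which m (m > 1) storage media fail in a first redundant data group G1 of the primary storage subsystem. A redundant data group that guarantees redundancy can then no longer be formed from the primary storage media alone. In that case, a second redundant data group G2 is formed from two sets of media: the storage media in the secondary storage subsystem that hold the duplicated data of the m failed media, and the normal (non-failed) storage media of G1. This guarantees the redundancy of the data in the failed group G1 and allows read/write requests from the host device to be processed.
In addition, the data of (m-1) of the failed storage media is copied from the secondary storage subsystem to spare storage media in the primary storage subsystem. This reconfigures G1 into a third redundant data group G3 and restores data redundancy. The data of the one remaining failed medium is restored onto a spare storage medium in the primary storage subsystem using the data and redundant data in G1. The data duplication state is then re-established.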
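[0020]
Alternatively, the data of the m failed storage media is restored from the secondary storage subsystem to spare storage media in the primary storage subsystem, one redundant-data-group unit at a time. This reconfigures G1, which contains the failed media, into G3. Read/write requests from the host device to ranges where reconfiguration into G3 has finished are processed using the data in the reconfigured G3. Requests to ranges not yet copied, and therefore not yet reconfigured into G3, are processed using the data in G2 described above. Once copying has completed the reconfiguration of G1 into G3, the data duplication state is re-established.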
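[0021]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
[0022]
FIG. 1 is a conceptual diagram showing an example configuration of an information processing system. The system includes a data duplication storage subsystem composed of storage subsystems that implement a failure recovery method according to an embodiment of the present invention. FIGS. 2, 3, and 4 are explanatory diagrams showing examples of control information used in this data duplication storage subsystem. FIG. 5 is a conceptual diagram and FIG. 6 a flowchart, each showing an example of its operation. FIG. 7 is a conceptual diagram showing an example of the operation of a modification of the subsystem.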
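[0023]
The information processing system of this embodiment consists of a central processing unit 1 and a data duplication storage subsystem. The storage subsystem is made up of a storage control device 2, a disk device 3, a storage control device 4, and a disk device 5. The storage control device 2 is connected to the central processing unit 1 by a data transfer path 61. The storage control device 4 is not connected to the central processing unit 1; it is connected to the storage control device 2 by a data transfer path 62.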
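[0024]
Disk device 3 contains a plurality of physical drives 3a (primary storage means), and disk device 5 contains a plurality of physical drives 5a (secondary storage means). Both store data using a RAID 5 redundant storage configuration. Data received from outside is divided into data units, and redundant data such as parity is generated from each group of data units. The data units and the corresponding redundant data are then distributed across different physical drives 3a (physical drives 5a).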
[0025]
Disk device 3 also contains a spare physical drive 3a (SPARE-VOL), and disk device 5 a spare physical drive 5a (SPARE-VOL). As described later, these are used as needed to replace a failed physical drive.
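[0026]
In the following description, the storage subsystem consisting of the storage control device 2 and the disk device 3, which is connected to the central processing unit 1, is called the primary storage subsystem. The storage subsystem consisting of the storage control device 4 and the disk device 5 is called the secondary storage subsystem.
[0027]
First, an example configuration of the primary storage subsystem is described.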
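[0028]
The channel control unit 22 provides the interface between the central processing unit 1 and the storage control device 2. It accepts input/output requests issued by the central processing unit 1 for data in the disk device 3, and the host command processing unit 24 executes them. The drive control unit 27 provides the interface between the storage control device 2 and the disk device 3, which are connected via a drive path 63, and controls data transfer between them.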
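[0029]
The control memory 25 stores three tables:
- a logical/physical VOL conversion table 201 for converting between logical volumes and physical volumes;
- a pair state management table 202 showing the duplication state between volumes in the primary and secondary storage subsystems;
- a physical VOL state management table 203.
These are described in detail later as needed.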
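[0030]
The cache memory 26 is a buffer memory that temporarily holds data during transfers between the central processing unit 1 and the disk device 3. The physical drive failure recovery unit 28 controls recovery processing when a physical drive 3a in the disk device 3 fails; details are given later as needed. The data transfer control unit 21 provides the interface between the storage control device 2 and the storage control device 4, which are connected via the data transfer path 62, and controls data transfer. The remote I/O control unit 23 issues input/output requests for data in the disk device 5.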
[0031]
Next, an example configuration of the secondary storage subsystem is described.
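[0032]
The channel control unit 41 provides the interface between the storage control device 2 and the storage control device 4, which are connected via the data transfer path 62. It accepts input/output requests issued by the storage control device 2 for data in the disk device 5, and the host command processing unit 42 executes them. The drive control unit 45 provides the interface between the storage control device 4 and the disk device 5, which are connected via a drive path 64, and controls data transfer between them. The control memory 43 stores a logical/physical VOL conversion table 401 for converting between logical and physical volumes; details are given later as needed. The cache memory 44 is a buffer memory that temporarily holds data during transfers between the storage control device 2 and the disk device 5. The physical drive read/write processing unit 46 reads and writes data in the disk device 5 in units of physical drives 5a.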
[0033]
Next, data duplication between the primary and secondary storage subsystems is described.
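[0034]
First, each logical VOL and physical VOL in the primary storage subsystem forms a duplex pair with a logical VOL and physical VOL in the secondary storage subsystem. The pair states are stored in the pair state management table 202. The copy that establishes such a pair is called the formation copy. In the formation copy, the formation copy control unit 29 reads data in the disk device 3. The remote I/O control unit 23 then issues write requests for that data to the paired VOL in the disk device 5, duplicating the data.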
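[0035]
Before describing the flow of data from the central processing unit 1, an outline of data update processing in RAID 5 is given. In RAID 5, data is divided and stored across a plurality of drives, and redundant data (hereinafter, parity data) is held for each data unit. A stripe row is the set of data over which parity is computed, together with its parity data. The unit over which parity data is created is called an Error Correction Code group (hereinafter, ECC group). When an update request arrives for some data, the new parity data is computed from three inputs: the old data before the update, the old parity data of that stripe row, and the update data. The update data and new parity data are then written to the drives. This processing is called read-modify-write processing.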
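The read-modify-write parity update works because XOR is its own inverse. The old data's contribution can be cancelled and the new data's added without reading the rest of the stripe row. This minimal Python sketch assumes single (XOR) parity as in RAID 5 and compares the incremental result against a from-scratch recomputation. The function names are illustrative.

```python
def full_parity(units):
    """Parity of a stripe row computed from scratch (byte-wise XOR of all units)."""
    out = bytearray(len(units[0]))
    for unit in units:
        for i, b in enumerate(unit):
            out[i] ^= b
    return bytes(out)


def rmw_parity(old_data, old_parity, new_data):
    """Read-modify-write: new parity = old parity XOR old data XOR new data."""
    if not (len(old_data) == len(old_parity) == len(new_data)):
        raise ValueError("blocks must have equal length")
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


row = [b"\x01\x10", b"\x02\x20", b"\x04\x40"]   # three data units of one stripe row
old_p = full_parity(row)
new_unit = b"\x08\x80"                          # host rewrites unit 1
new_p = rmw_parity(row[1], old_p, new_unit)     # reads only unit 1 and the parity
```

Only the old data, the old parity, and the new data are touched, which is why the drive control unit reads just those two blocks before writing.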
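[0036]
Next, the data flow for a write request from the central processing unit 1 to data in the disk device 3 is described. The channel control unit 22 accepts a write request issued by the central processing unit 1. The host command processing unit 24 stores the write data in the cache memory 26. Meanwhile, the drive control unit 27 reads, into the cache memory 26, the pre-update old data of the update range in the disk device 3 and the old parity data of its stripe row. New parity data is then created from the old data, old parity data, and update data in the cache memory 26. The write data and new parity data are written to the disk device 3.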
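[0037]
Meanwhile, the remote I/O control unit 23 issues a write request to the storage control device 4 of the secondary storage subsystem and transfers the write data from the central processing unit 1. The request targets the data on the paired physical drive 5a in the disk device 5 that duplicates the written data in the disk device 3. In the storage control device 4, the channel control unit 41 accepts the write request from the storage control device 2. The host command processing unit 42 stores the write data in the cache memory 44.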
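[0038]
The drive control unit 45 then computes new parity data from this write data, the pre-update data, and the old parity data of its stripe row. It writes the new parity to the disk device 5 together with the write data. When all of these write operations have completed, the storage control device 2 reports completion of the write to the central processing unit 1.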
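The ordering described in [0036] to [0038] is: update the primary, forward the same write to the paired secondary, and only then report completion. The sketch below shows it in Python. `StripeStore` is a hypothetical one-stripe RAID 5 model with integer units, used only to show that both sides end with identical data and parity. It is not the controller's actual implementation.

```python
class StripeStore:
    """Hypothetical one-stripe RAID 5 model: n data units plus one parity unit."""

    def __init__(self, units):
        self.units = list(units)
        self.parity = 0
        for u in self.units:
            self.parity ^= u

    def write(self, idx, new):
        # Read-modify-write: read old data and old parity, derive new parity.
        old = self.units[idx]
        self.parity ^= old ^ new
        self.units[idx] = new


def duplex_write(primary, secondary, idx, new):
    """Synchronous remote-duplex write in the order of [0036]-[0038]."""
    primary.write(idx, new)     # primary side: RMW on disk device 3
    secondary.write(idx, new)   # remote write I/O; the secondary does its own RMW
    return "complete"           # completion report to the central processing unit


p = StripeStore([1, 2, 4])
s = StripeStore([1, 2, 4])
status = duplex_write(p, s, 1, 8)
```

Reporting completion only after both writes finish is what keeps the pair consistent if the primary fails immediately afterwards.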
[0039]
Next, the tables in the control memory 25 are described.
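[0040]
The logical/physical VOL conversion table 201 converts between logical VOLs and physical VOLs. When the host issues a read/write request for a given logical VOL, the table shows which physical VOL that request corresponds to. FIG. 2 is an explanatory diagram showing an example configuration of this table. Logical VOL numbers 201a are stored in association with physical VOL numbers 201b. When a plurality of logical VOLs are set in one physical VOL, the in-physical-VOL cylinder range 201c is also set as needed. Although not illustrated, the logical/physical VOL conversion table 401 in the control memory 43 of the secondary storage subsystem has the same configuration.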
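A lookup through table 201 might look like the Python sketch below. The field names mirror reference numerals 201a to 201c. Treating a logical cylinder as an offset from the start of cylinder range 201c is an assumption, because the text does not specify the mapping rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LogPhysEntry:
    logical_vol: int                             # logical VOL number (201a)
    physical_vol: int                            # physical VOL number (201b)
    cyl_range: Optional[Tuple[int, int]] = None  # used cylinder range (201c), inclusive


def to_physical(table, logical_vol, cylinder):
    """Translate (logical VOL, logical cylinder) into (physical VOL, physical cylinder)."""
    for e in table:
        if e.logical_vol != logical_vol:
            continue
        if e.cyl_range is None:          # logical VOL occupies the whole physical VOL
            return e.physical_vol, cylinder
        start, end = e.cyl_range
        phys = start + cylinder          # assumed: offset from the start of the range
        if not start <= phys <= end:
            raise ValueError("cylinder outside the logical VOL")
        return e.physical_vol, phys
    raise KeyError(logical_vol)


table = [
    LogPhysEntry(0, 10),              # logical VOL 0 fills physical VOL 10
    LogPhysEntry(1, 11, (0, 99)),     # logical VOLs 1 and 2 share physical VOL 11
    LogPhysEntry(2, 11, (100, 199)),
]
```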
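[0041]
FIG. 3 is an explanatory diagram showing an example of the contents of the pair state management table 202. This table consists of a logical VOL pair state management table 2001 and a physical VOL pair state management table 2002.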
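[0042]
The logical VOL pair state management table 2001 stores the duplication correspondence between logical VOLs in the primary storage subsystem and logical VOLs in the secondary storage subsystem. For example, it holds the following fields:
- the primary logical VOL number 2001a;
- the secondary logical VOL number 2001b;
- a pair state flag 2001c;
- a valid range pointer 2001d indicating the range of data in the VOL for which duplication is complete.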
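[0043]
The physical VOL pair state management table 2002 stores the duplication correspondence between physical VOLs in the primary storage subsystem and physical VOLs in the secondary storage subsystem. For example, it holds the following fields:
- the primary physical VOL number 2002a;
- the secondary physical VOL number 2002b;
- a pair state flag 2002c;
- a valid range pointer 2002d indicating the range of data in the VOL for which duplication is complete.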
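[0044]
The pair state flags 2001c and 2002c are set as needed to values such as "0": not duplicated, "1": duplicated, "2": duplication being established, "3": primary-side failure, and "4": secondary-side failure.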
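The pair state flags and valid range pointers can be modelled as below. The enum member names are hypothetical labels for the example values "0" to "4". Two further choices are assumptions: the valid range pointer means "units below this index are duplicated", consistent with its use in [0059], and only the duplicated and being-established states count as duplicated.

```python
from dataclasses import dataclass
from enum import IntEnum


class PairState(IntEnum):
    """Example values of pair state flags 2001c / 2002c (labels are hypothetical)."""
    SIMPLEX = 0          # not duplicated
    DUPLEX = 1           # duplicated
    ESTABLISHING = 2     # duplication being established
    PRIMARY_FAULT = 3    # primary-side failure
    SECONDARY_FAULT = 4  # secondary-side failure


@dataclass
class PairEntry:
    primary_vol: int     # 2001a / 2002a
    secondary_vol: int   # 2001b / 2002b
    state: PairState     # 2001c / 2002c
    valid_upto: int = 0  # 2001d / 2002d: units [0, valid_upto) are duplicated

    def is_duplicated(self, unit):
        """True if the given unit (e.g. a track or stripe row) is already duplicated."""
        if self.state == PairState.DUPLEX:
            return True
        return self.state == PairState.ESTABLISHING and unit < self.valid_upto


entry = PairEntry(3, 7, PairState.ESTABLISHING, valid_upto=10)
```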
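[0045]
FIG. 4 is an explanatory diagram showing an example configuration of the physical VOL state management table 203. It consists of the following fields:
- an ECC group number 203a identifying each ECC group;
- an ECC group state flag 203b;
- constituent physical VOL numbers 203c indicating the physical VOLs that make up the ECC group;
- physical VOL state flags 203d indicating the state of each of those physical VOLs.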
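[0046]
The ECC group state flag 203b is set as needed to values such as "0": normal, "1": correction disabled, "2": correction in progress, "3": correction in progress using secondary volumes, "4": SPARE-VOL recovery in progress, and "5": correction in progress using SPARE-VOL.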
[0047]
The physical VOL state flag 203d is set as needed to values such as "0": normal, "1": failed, "2": using secondary physical VOL, and "3": using SPARE-VOL.
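A row of the physical VOL state management table 203 could be modelled as follows. The enum names are hypothetical labels for the example flag values. The rule in `mark_failed` follows [0050] for RAID 5: more failed members than the redundancy disables correction. Redundancy 1 is the default.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class EccGroupState(IntEnum):          # example values of flag 203b
    NORMAL = 0
    CORRECTION_DISABLED = 1
    CORRECTION = 2
    CORRECTION_WITH_SECONDARY = 3
    SPARE_RECOVERY = 4
    CORRECTION_WITH_SPARE = 5


class PhysVolState(IntEnum):           # example values of flag 203d
    NORMAL = 0
    FAILED = 1
    USE_SECONDARY = 2
    USE_SPARE = 3


@dataclass
class EccGroupRow:
    group_no: int                      # 203a
    state: EccGroupState               # 203b
    members: List[int]                 # 203c: constituent physical VOL numbers
    member_states: List[PhysVolState]  # 203d: one flag per member

    def failed_members(self):
        return [v for v, s in zip(self.members, self.member_states)
                if s == PhysVolState.FAILED]

    def mark_failed(self, vol, redundancy=1):
        """Record a drive failure and update the group flag (RAID 5: redundancy 1)."""
        self.member_states[self.members.index(vol)] = PhysVolState.FAILED
        if len(self.failed_members()) > redundancy:
            self.state = EccGroupState.CORRECTION_DISABLED   # "1": group blocked
        else:
            self.state = EccGroupState.CORRECTION            # "2": correction read/write


row = EccGroupRow(0, EccGroupState.NORMAL, [0, 1, 2, 3], [PhysVolState.NORMAL] * 4)
row.mark_failed(1)
after_first = row.state
row.mark_failed(2)
```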
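[0048]
Next, the operation when one physical drive in an ECC group fails is described. Suppose one physical drive in an ECC group has failed and the central processing unit 1 issues a read/write request for data on that drive. The data in the range that can no longer be read or written is restored from the data and parity data of the stripe rows containing the requested range, and the read/write is then executed. This is called correction read/write processing. Correction copy processing restores all the data of the failed drive by correction reads and writes it to a spare drive, thereby reconfiguring the ECC group.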
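Correction read and correction copy can be illustrated with single parity. Any missing member of a stripe row, whether data or parity, is the XOR of all the other members. The position of the parity within the row, which rotates in RAID 5, therefore does not matter. In this minimal Python sketch each stripe row is a list of integers, one per drive, including the parity unit; the function names are illustrative.

```python
def correction_read(row, failed):
    """Rebuild the unit at position `failed` of one stripe row (data units plus
    parity, one integer per drive) as the XOR of all the other units."""
    acc = 0
    for i, unit in enumerate(row):
        if i != failed:
            acc ^= unit
    return acc


def correction_copy(stripe_rows, failed):
    """Rebuild the failed drive's unit in every stripe row; the result is what
    would be written to the spare drive to reconfigure the ECC group."""
    return [correction_read(row, failed) for row in stripe_rows]


# Two stripe rows over four drives; each row XORs to zero (parity included).
rows = [[1, 2, 4, 7], [3, 5, 6, 0]]
spare_contents = correction_copy(rows, 1)   # drive 1 has failed
```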
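[0049]
This embodiment describes an example method for the case where two or more physical drives 3a fail in one ECC group of the primary storage subsystem and the correction processing described above becomes impossible. The method reconfigures the ECC group using the corresponding data in the secondary storage subsystem, as described below.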
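[0050]
As described above, in RAID 5, when two or more physical drives 3a fail in one ECC group, correction processing becomes impossible and the data in that ECC group loses its redundancy. Information indicating that the ECC group is unusable ("1": correction disabled) is then recorded in the ECC group state flag 203b of the physical VOL state management table 203. In this case, the data duplication storage subsystem of this embodiment can use the secondary storage subsystem to reconfigure, and thereby recover, the failed ECC group in the primary storage subsystem.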
[0051]
One such method is described with reference to FIG. 5.
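[0052]
Suppose that M (M > 1) physical drives 3a fail in an ECC group in the primary storage subsystem; call this group the first ECC group G1. First, information indicating that the ECC group is unusable ("1": correction disabled) is recorded in the ECC group state flag 203b of the physical VOL state management table 203. The physical drive failure recovery unit 28 then replaces each failed primary drive with the physical drive 5a in the secondary storage subsystem that is paired with it. This forms the second ECC group G2 shown in FIG. 5. When the central processing unit 1 writes data in G2, the redundant data created after the update is written to both the parity drive (P) of G2 and the parity drive (P) of G1. Read/write requests from the central processing unit 1 can thus be processed while the redundancy of both G1 and G2 is guaranteed.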
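Forming G2 amounts to replacing each failed primary drive, position by position, with its paired secondary drive. This presumes that pairing is maintained per physical drive, as in table 2002. The Python sketch below uses hypothetical drive labels and a dict standing in for table 2002. It does not model the dual parity write, whose exact target drives depend on FIG. 5.

```python
def form_g2(g1_members, failed, pair_of):
    """Member list of G2: each failed primary drive of G1 is replaced by its
    paired secondary drive; healthy primary drives are kept in place."""
    missing = sorted(d for d in failed if d not in pair_of)
    if missing:
        raise LookupError(f"no secondary pair recorded for {missing}")
    return [pair_of[d] if d in failed else d for d in g1_members]


def stripe_row(members, contents, row):
    """Read one stripe row through the given member list."""
    return [contents[d][row] for d in members]


def row_is_consistent(units):
    """With single parity, the XOR of all units in a stripe row is zero."""
    acc = 0
    for u in units:
        acc ^= u
    return acc == 0


g1 = ["P0", "P1", "P2", "P3"]                      # P1 and P2 have failed
pairs = {"P0": "S0", "P1": "S1", "P2": "S2", "P3": "S3"}
g2 = form_g2(g1, {"P1", "P2"}, pairs)
# Readable contents: surviving primary drives plus the secondary copies.
contents = {"P0": [1], "P3": [7], "S1": [2], "S2": [4]}
```

With G2 in place the stripe row is complete and consistent again, so a further single-drive failure could once more be handled by correction read.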
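[0053]
Once G2 is formed in this way, the redundancy of the data in the ECC group is guaranteed, and read/write processing from the central processing unit 1 proceeds normally. In this state, the data (D) on the failed physical drives 3a may be restored. Alternatively, as shown in FIG. 5, a third ECC group G3 using spare physical drives 3a (S) may eventually be formed within the primary storage subsystem to return to the duplicated state.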
[0054]
Next, the method of restoring the data on the failed physical drives is described.
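[0055]
Suppose M physical drives 3a fail in one ECC group in the primary storage subsystem. For (M-1) of the failed drives, the physical drive failure recovery unit 28 refers to the physical VOL pair state management table 2002. It determines each physical VOL # (physical drive 5a) in the secondary storage subsystem that is paired with a failed physical drive 3a. The remote I/O control unit 23 then issues read requests for the data in each of those physical VOLs, one physical VOL # at a time. It copies the data from the secondary storage subsystem to spare volumes (S) in the primary storage subsystem, restoring the data of the (M-1) failed physical drives 3a. Correction then becomes possible again for the ECC group containing the failed drives. Information indicating that the ECC group is usable ("5": correction in progress using SPARE-VOL) is written to the ECC group state flag 203b of the physical VOL state management table 203.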
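[0056]
While the copy is in progress, the central processing unit 1 may issue a read/write request to an ECC group in the primary storage subsystem on which correction is not possible. In that case an ECC group is reconfigured, like the second ECC group G2 in FIG. 5, with the secondary physical drives 5a paired with the failed physical drives 3a as members. The request is processed using the data in that reconfigured group. After the copy completes, correction becomes possible in the failed ECC group. The data of the one remaining failed physical drive is then restored to a spare drive (S) in the primary storage subsystem by correction copy processing. The spare volume whose data has been restored is newly registered in the physical VOL pair state management table 2002 as a primary physical VOL number 2002a. This maintains the duplex pair between the physical VOL in the primary storage subsystem and the physical VOL in the secondary storage subsystem.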
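The recovery order of [0055] and [0056] can be summarised as a plan with three parts:
- M-1 remote copies from the secondary pairs to spares;
- one correction copy for the last failed drive;
- registration of each spare as the new primary partner of the corresponding secondary drive.

This Python sketch assumes one spare per failed drive and uses hypothetical drive labels. The case of fewer spares than failed drives corresponds to group G4 in the claims and is not handled.

```python
def plan_recovery(failed, spares, pair_of):
    """Ordered recovery steps for M > 1 failed drives in a RAID 5 ECC group.

    Returns (steps, new_pairs): steps are (action, source, spare) tuples, and
    new_pairs maps each spare to the secondary drive it is now paired with
    (the re-registration in table 2002).
    """
    failed = list(failed)
    if len(failed) < 2:
        raise ValueError("this procedure expects M > 1 failed drives")
    if len(spares) < len(failed):
        raise ValueError("sketch assumes one spare per failed drive")
    steps, new_pairs = [], {}
    # M-1 drives: copy the paired secondary drive's contents to a spare.
    for drive, spare in zip(failed[:-1], spares):
        steps.append(("remote_copy", pair_of[drive], spare))
        new_pairs[spare] = pair_of[drive]
    # Last drive: correction is possible again, so rebuild it locally.
    last, last_spare = failed[-1], spares[len(failed) - 1]
    steps.append(("correction_copy", last, last_spare))
    new_pairs[last_spare] = pair_of[last]
    return steps, new_pairs


steps, new_pairs = plan_recovery(["P1", "P2"], ["S0", "S1"], {"P1": "R1", "P2": "R2"})
```

Rebuilding the last drive locally avoids one more transfer over the remote path once the group can correct again.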
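[0057]
Suppose M (M > 1) physical drives 3a fail in one ECC group, and the pair state of a logical VOL containing a failed drive has been released in the logical VOL pair state management table 2001. A means may be provided to automatically return that pair state to the duplex pair state after the data restoration described above. FIG. 6 is a flowchart showing this processing sequence.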
[0058]
When M (M > 1) physical drives fail in one ECC group, the data may also be recovered by the method shown in FIG. 7.
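[0059]
When M physical drives 3a fail in one ECC group, information indicating that the ECC group is unusable ("1": correction disabled) is recorded in the ECC group state flag 203b of the physical VOL state management table 203. To recover, the physical drive failure recovery unit 28 first refers to the physical VOL pair state management table 2002. It determines each physical VOL # (physical drive 5a) in the secondary subsystem that is paired with one of the M failed physical drives 3a. The remote I/O control unit 23 then issues read requests for the data in these physical VOLs, one stripe row at a time. It copies the data to spare drives (S) in the primary storage subsystem. The range whose copy has finished is recorded by the valid range pointers 2001d and 2002d in tables 2001 and 2002. Within the copied range, the ECC group is reconfigured (ECC group G5 in FIG. 7). For read/write requests from the central processing unit 1 to that range, the primary storage subsystem performs read-modify-write processing. For writes, it also writes the data to the secondary storage subsystem, maintaining data duplication.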
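[0060]
Read/write requests from the central processing unit 1 to a range whose stripe-row copy has not yet finished are processed using the data in ECC group G4 shown in FIG. 7. G4 uses data on some physical drives 5a of the secondary storage subsystem in place of the failed physical drives 3a. As in [0057], suppose M (M > 1) physical drives 3a fail in an ECC group and the pair state of a logical VOL containing a failed drive has been released in table 2001. A means may be provided to automatically return that pair state to the duplex pair state after the data restoration.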
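The stripe-by-stripe variant of FIG. 7 hinges on the valid range pointer. Rows below it are served by G5 inside the primary subsystem, and rows above it by G4 using the secondary drives. This minimal Python sketch models only the data of the failed drives; parity updates on the surviving primary drives are omitted. A single integer is a hypothetical stand-in for pointers 2001d and 2002d, and the class name is illustrative.

```python
class StripewiseRecovery:
    """Toy model of the FIG. 7 recovery for the failed drives' data only."""

    def __init__(self, secondary_rows):
        # Contents of the failed drives' secondary pairs, one entry per stripe row.
        self.secondary = list(secondary_rows)
        # Spare drive contents in the primary subsystem (None = not yet copied).
        self.spare = [None] * len(self.secondary)
        self.valid_upto = 0  # valid range pointer: rows [0, valid_upto) are copied

    def copy_next_row(self):
        """Copy one stripe row from the secondary pair to the spare."""
        if self.valid_upto < len(self.secondary):
            self.spare[self.valid_upto] = self.secondary[self.valid_upto]
            self.valid_upto += 1

    def group_for(self, row):
        """ECC group that serves a host request for this stripe row."""
        return "G5" if row < self.valid_upto else "G4"

    def read(self, row):
        # G5 reads the spare in the primary; G4 reads the secondary drive.
        return self.spare[row] if row < self.valid_upto else self.secondary[row]

    def write(self, row, data):
        if row < self.valid_upto:
            self.spare[row] = data   # G5: primary-side write (RMW parity omitted)
        self.secondary[row] = data   # keep the secondary copy current in both cases

    def done(self):
        return self.valid_upto == len(self.secondary)


r = StripewiseRecovery(["a", "b", "c"])
r.copy_next_row()
groups = (r.group_for(0), r.group_for(1))
r.write(0, "A")   # copied row: written to the spare and the secondary
r.write(2, "C")   # uncopied row: written to the secondary, copied later
while not r.done():
    r.copy_next_row()
```

A write to an uncopied row only needs to reach the secondary, because the sequential copy will later carry it to the spare.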
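[0061]
As described above, the data duplication storage subsystem of this embodiment handles the case where physical drives 3a in one ECC group of the primary storage subsystem fail beyond the redundancy (two or more drives here), so that correction within that ECC group becomes impossible. The ECC group is reconfigured using the normal data and physical drives 5a in the secondary storage subsystem that correspond to the data made unusable by the failure. Data redundancy can thus be recovered immediately, and operation continues with redundancy maintained, without stopping the information processing system. The reliability of storage subsystems that duplicate data remotely is improved.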
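[0062]
In addition, the data on failed physical drives 3a can be restored during online processing with the central processing unit 1. This improves the operating efficiency and availability of the information processing system as a whole. After the data has been restored, remote data duplication can be re-established.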
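[0063]
The invention made by the present inventors has been described specifically based on an embodiment. The present invention is, however, not limited to that embodiment and can be modified in various ways without departing from its gist.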
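[0064]
[Effects of the Invention]
With the storage subsystem and failure recovery method of the present invention, redundancy lost through a failure in a storage subsystem with a redundant storage configuration can be recovered quickly without stopping the system.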
[0065]
In addition, both the redundancy and the data duplication state lost through such a failure can be recovered quickly without stopping the system.
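[Brief Description of the Drawings]
In each figure below, the subsystem shown is a data duplication storage subsystem composed of storage subsystems implementing a failure recovery method according to an embodiment of the present invention.
[FIG. 1] A conceptual diagram showing an example configuration of an information processing system that includes the data duplication storage subsystem.
[FIG. 2] An explanatory diagram showing an example of control information used in the data duplication storage subsystem.
[FIG. 3] An explanatory diagram showing an example of control information used in the data duplication storage subsystem.
[FIG. 4] An explanatory diagram showing an example of control information used in the data duplication storage subsystem.
[FIG. 5] A conceptual diagram showing an example of the operation of the data duplication storage subsystem.
[FIG. 6] A flowchart showing an example of the operation of the data duplication storage subsystem.
[FIG. 7] A conceptual diagram showing an example of the operation of a modification of the data duplication storage subsystem.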
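[Description of Reference Numerals]
1…central processing unit, 2…storage control device, 21…data transfer control unit, 22…channel control unit, 23…remote I/O control unit, 24…host command processing unit, 25…control memory, 26…cache memory, 27…drive control unit, 28…physical drive failure recovery unit, 29…formation copy control unit, 3…disk device, 3a…physical drive (primary storage means), 4…storage control device, 41…channel control unit, 42…host command processing unit, 43…control memory, 44…cache memory, 45…drive control unit, 46…physical drive read/write processing unit, 401…logical/physical VOL conversion table, 5…disk device, 5a…physical drive (secondary storage means), 61…data transfer path, 62…data transfer path, 63…drive path, 64…drive path, 201…logical/physical VOL conversion table, 201a…logical VOL number, 201b…physical VOL number, 201c…used cylinder range in physical VOL, 202…pair state management table, 2001…logical VOL pair state management table, 2001a…primary logical VOL number, 2001b…secondary logical VOL number, 2001c…pair state flag, 2001d…valid range pointer, 2002…physical VOL pair state management table, 2002a…primary physical VOL number, 2002b…secondary physical VOL number, 2002c…pair state flag, 2002d…valid range pointer, 203…physical VOL state management table, 203a…ECC group number, 203b…ECC group state flag, 203c…constituent physical VOL number, 203d…physical VOL state flag, G1 to G3…ECC groups, G4 to G5…ECC groups.
[0001]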
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a storage subsystem and a failure recovery technique thereof. It is particularly effective when applied to, for example, a data duplication storage subsystem installed at remote locations and made up of a plurality of storage subsystems, each having a redundant storage configuration.
[0002]
[Prior art]
In an information processing system consisting of a central processing unit and peripheral storage devices, the amount of information has grown and the demand for reliability of the data handled has increased. As a countermeasure against physical failures of storage media and storage devices, data duplication storage subsystems have been put to practical use. They hold data in duplicate on a plurality of storage media, so that data lost in a failure can be recovered from the backup copy.
[0003]
On the other hand, for individual storage subsystems, RAID storage devices have been put into practical use. They divide data across a plurality of storage media and generate and distribute redundant data, typified by parity data, for each unit of several pieces of data. When the storage medium holding some data fails, the data is recovered from the redundant data and the other data in that unit.
[0004]
However, in systems where information processing operates over a wide area and many information processing systems are linked, as typified by the online systems of banks, these reliability techniques only multiplex or add redundancy to data within a single storage subsystem. The entire storage subsystem may fail, or the entire information processing system, including the central processing unit, may stop because of, for example, a power outage or fire affecting the whole building. The damage then not only spreads to the entire wide-area system, but the loss caused by the lost data also becomes enormous. In response to this concern, data duplication management systems that keep a second copy of data at a remote site have been put to practical use. In this remote data duplication, however, data communication between the remote information processing systems is handled by the communication function between the central processing units. The load on the central processing units, which also perform data processing and computation, is therefore large. Reducing this load has been an issue for remote data duplication systems.
[0005]
As a countermeasure, systems have been put to practical use in which the storage control devices of the remote information processing systems can communicate and transfer data between control devices, over communication/data transfer paths. The load of data duplication is thus borne by the storage control devices, and the load on the central processing units is reduced.

In this remote data duplication storage subsystem, the information processing system that performs the main business is called the primary system. Its components are called the first central processing unit, the first storage subsystem, and the first storage control device. The backup information processing system is called the secondary system, with the second central processing unit, the second storage subsystem, and the second storage control device. In general, each storage control device includes a large-capacity cache memory (disk cache) with a non-volatile mechanism. The two storage control devices are connected by one or more data transfer paths, and a primary/secondary pair-volume relationship is defined for each unit of data (for example, for each volume). The primary data is called master data (master volume), and the secondary data is called remote data (remote volume).

For a write request (I/O) to the storage subsystem in the primary system, the first storage control device writes the data from the first central processing unit to the storage devices under its control. It also acts as a host to the second storage control device and issues a data write I/O to the second storage subsystem to duplicate the data. If a failure occurs on the primary system during this duplicated operation and business cannot continue, business is immediately switched to the secondary system. It continues based on the duplicated data in the second storage subsystem. An example of such a technique is disclosed in U.S. Pat. No. 5,155,845.
[0006]
A data volume that already exists on the first storage subsystem may be newly defined as a remote duplex volume, creating a new duplex pair; this is called an initial copy. The first storage control device then sequentially reads the data of that volume from the storage devices of the first storage subsystem into the cache memory. It copies the volume data by issuing write I/Os to the second storage subsystem. The unit of copying may be one data storage unit (a track) or a group of data units (e.g., a cylinder). Further, the first storage subsystem can accept update I/Os from the first central processing unit while it executes the initial-copy I/Os. Consider an update to a volume whose initial copy is in progress. If the update range lies in an area the initial copy has already covered (copying to the second storage subsystem is complete), the first storage control device reflects the update data to the second storage subsystem, synchronously or asynchronously.
[0007]
The first and second storage subsystems may use a RAID data storage method as described below. This technique is described in D. A. Patterson et al., "Introduction to Redundant Arrays of Inexpensive Disks (RAID)," Spring COMPCON '89, pp. 112-117, Feb. 1989. In the RAID system, the storage subsystem treats n + m storage devices as one data storage unit. Each unit of data (for example, one track on a storage medium) is divided across n storage devices and stored. Redundant data called parity data is created with n data units as one group. The number of pieces of redundant data is determined by the degree of redundancy: when it is m, m pieces of redundant data are created. The redundant data itself is stored on storage media other than those storing the data group it was created from. A data group consisting of the n data units and the m pieces of redundant data is called a redundancy group. As a result, one storage device may become unreadable due to a failure, and its data can still be regenerated from the other n-1 data units and the m pieces of redundant data in the redundancy group. Similarly, if writing becomes impossible due to a failure, the data is logically stored by updating the m pieces of redundant data. In this way, data reliability is improved against failures of storage media or disk media.
[0008]
In a conventional remote data duplication storage subsystem using a RAID system, redundancy may be lost because a storage medium in the first storage subsystem fails. The data of the failed medium is restored within the first storage subsystem from the n-1 data units and m pieces of redundant data stored on its other storage media. It is then recorded again on a storage medium in the first storage subsystem. The system can operate again as a duplex system with the second storage subsystem only after two things happen: the redundancy and the n data units are completely restored, and data copying from the second storage subsystem is complete. Japanese Patent Application Laid-Open No. 06-266508, for example, discloses this type of technique.
[0009]
However, with this technique, the functions of the system as a RAID remote data duplication storage subsystem are restricted until complete redundancy and the data in the first storage subsystem have been recovered.
[0010]
[Problems to be solved by the invention]
In the RAID system described above, a logical volume and a physical volume do not always correspond one-to-one (particularly in RAID 5). A single unit in which redundant data is created (hereinafter, an ECC group) may therefore contain a plurality of logical volumes. If a plurality of physical storage media in the ECC group then fail, the ECC group loses its redundancy and its data can no longer be accessed; the ECC group is blocked. The technical problem is that blocking the ECC group makes the data in every logical volume of that ECC group inaccessible.
[0011]
On the other hand, when the RAID system is used in the remote data duplication storage subsystem described above, one ECC group in the first storage subsystem may be blocked in this way and its data can still be read and written. This is done by accessing the physical storage media in the second storage subsystem that hold the duplicate of that data.
[0012]
However, in this state, read/write requests from the central processing unit are processed using only the data in the second storage subsystem, so the data is not duplicated. The longer this state lasts, the higher the risk that a failure in the second storage subsystem brings down the secondary system and causes data loss. A process for immediately recovering the blocked ECC group is therefore required.
[0013]
Recovering the blocked ECC group takes two steps. First, the data of the failed physical storage media is restored by copying the duplicated data on the corresponding physical storage media of the second storage subsystem to spare physical storage media in the first storage subsystem. Second, the redundant data of the blocked ECC group must be recovered.
[0014]
However, copying all the logical volumes included in the blocked ECC group requires a large amount of processing time. It delays recovery of the ECC group and increases the time for which subsystem resources are occupied. It may also degrade the processing performance of read/write requests from the central processing unit. Further, conventional remote data duplication technology duplicates in units of logical volumes, with no notion of duplicating physical volumes. The redundant data is therefore not copied, and the redundancy of the blocked ECC group cannot be restored. To recover the redundant data as well, all non-redundant data in the blocked ECC group must first be recovered to the first storage subsystem, and the redundant data regenerated from it. This, too, makes recovery of the blocked ECC group slow.
[0015]
Therefore, a means is needed for the case where a plurality of storage media in an ECC group of the first storage subsystem fail and redundancy is lost. It should immediately reconfigure the ECC group, restore data redundancy, and return the system to the remote data duplication state.
[0016]
An object of the present invention is to provide a storage subsystem, and a failure recovery technique thereof, that quickly recovers redundancy lost due to a failure in a storage subsystem having a redundant storage configuration, without stopping the system.
[0017]
Another object of the present invention is to provide a storage subsystem, and a failure recovery technique thereof, that quickly recovers both the redundancy and the data duplication state lost due to a failure in a storage subsystem having a redundant storage configuration, without stopping the system.
[0018]
[Means for Solving the Problems]
The storage subsystem failure recovery method of the present invention comprises:
Storing a plurality of data and redundant data generated from the plurality of data in a plurality of storage media of the primary storage subsystem in a distributed manner;
Copying the plurality of data and the redundant data to the plurality of storage media of the secondary storage subsystem;
When a data failure exceeding the redundancy of the redundant data occurs in the primary storage subsystem, restoring the failed data from the normal data stored in the plurality of storage media of the primary storage subsystem and the data stored in the plurality of storage media of the secondary storage subsystem.
Further, the storage subsystem failure recovery method of the present invention includes:
Creating a redundancy group with the plurality of data and the redundancy data generated from the plurality of data;
Distributing and storing a plurality of data and redundant data in a plurality of storage media of the primary storage subsystem;
Copying the plurality of data and the redundant data to the plurality of storage media of the secondary storage subsystem;
When a failure exceeding the redundancy of the redundant data occurs in a storage medium of the primary storage subsystem, creating another redundancy group from the data stored in the normal storage media of the primary storage subsystem and the data copied to the storage medium of the secondary storage subsystem that corresponds to the failed storage medium.
Further, the storage subsystem failure recovery method of the present invention includes:
Storing a plurality of data and redundant data generated from the plurality of data in a plurality of storage media of the primary storage subsystem in a distributed manner;
Copying the plurality of data and the redundant data to the plurality of storage media of the secondary storage subsystem;
When a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 consisting only of primary storage means of the primary storage subsystem, configuring a redundancy group G2 from the non-failed primary storage means belonging to G1 and the secondary storage means that hold data corresponding to the failed primary storage means;
Copying the contents of the secondary storage means that hold data corresponding to the failed primary storage means to primary spare storage means of the primary storage subsystem; and configuring a redundancy group G3 from the non-failed primary storage means and the primary spare storage means.
Further, the storage subsystem failure recovery method of the present invention includes:
Storing a plurality of data and redundant data generated from the plurality of data in a plurality of storage media of the primary storage subsystem in a distributed manner;
Copying the plurality of data and the redundant data to the plurality of storage media of the secondary storage subsystem;
When a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 consisting only of primary storage means of the primary storage subsystem, configuring a redundancy group G2 from the non-failed primary storage means belonging to G1 and the secondary storage means that hold data corresponding to the failed primary storage means;
Copying the contents of the secondary storage means that hold data corresponding to the failed primary storage means to primary spare storage means of the primary storage subsystem; and, when the number of failed primary storage means is greater than the number of primary spare storage means, configuring a redundancy group G4 from the non-failed primary storage means belonging to G1, the primary spare storage means, and the secondary storage means that hold data corresponding to the failed primary storage means.
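The choice between the G3 and G4 configurations in the last two methods comes down to whether every failed primary storage means can be given a primary spare. The sketch below illustrates only that membership decision; drive identifiers, the `reconfigure_group` function, and the `secondary_of` mapping are hypothetical, and the copy from the secondary side into each spare is assumed to have already completed.

```python
def reconfigure_group(g1, failed, spares, secondary_of):
    """Return (label, members) for the group that replaces a failed group G1.

    Hypothetical model, called only after a failure exceeding G1's redundancy:
    g1           -- ordered list of primary drive ids in G1
    failed       -- set of failed primary drive ids in G1
    spares       -- list of available primary spare drive ids
    secondary_of -- mapping: failed primary drive id -> paired secondary drive id
    """
    spare_iter = iter(spares)
    members = []
    uses_secondary = False
    for drive in g1:
        if drive not in failed:
            members.append(drive)                 # healthy primary drive stays in place
            continue
        spare = next(spare_iter, None)
        if spare is not None:
            members.append(spare)                 # failed drive's data was copied to this spare
        else:
            members.append(secondary_of[drive])   # spares exhausted: keep using the secondary copy
            uses_secondary = True
    # G3: primary drives and spares only; G4: spares plus remaining secondary drives
    return ("G4" if uses_secondary else "G3"), members
```

With two failed drives and two spares the result is a G3 made only of primary-side drives; with one spare, the second failed drive is still served from its secondary pair, giving G4.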
Further, the storage subsystem of the present invention has storage means that hold update data in multiplex with storage means provided in another storage subsystem.
The storage means store, in a distributed manner across a plurality of storage media, a plurality of data constituting a redundancy group and redundant data generated from those data.
When a failure exceeding the redundancy of the redundant data occurs in a redundancy group G1 configured from the storage means of this storage subsystem, a redundancy group G2 is configured from the non-failed storage means belonging to G1 and the storage means of the other storage subsystem that hold data corresponding to the failed storage means.
[0019]
More specifically, as an example, consider the data duplication storage subsystem when failures occur in m (m > 1) storage media of a first redundant data group G1 in the primary storage subsystem, so that a redundant data group guaranteeing redundancy can no longer be formed from the storage media of the primary storage subsystem alone. The subsystem has means for configuring a second redundant data group G2 from the storage media of the secondary storage subsystem that hold the duplicated data of the m failed storage media and the normal storage media, other than the failed ones, of the first redundant data group G1 in the primary storage subsystem, thereby guaranteeing the redundancy of the data in the failed first redundant data group G1 and enabling read/write requests from a higher-level device to be processed.
It also has means for copying the data of (m-1) of the failed storage media from the secondary storage subsystem to spare storage media in the primary storage subsystem, reconfiguring the first redundant data group G1 into a third redundant data group G3 to restore data redundancy, restoring the data of the remaining failed storage medium into a spare storage medium in the primary storage subsystem from the data and redundant data of the failed first redundant data group G1, and returning to the data duplication state.
[0020]
Alternatively, the subsystem may have means for restoring the data of the m failed storage media from the secondary storage subsystem to spare storage media in the primary storage subsystem in the units from which redundant data is created (that is, stripe by stripe), reconfiguring the first redundant data group G1 into a third redundant data group G3 as the copy progresses. Read/write requests from a higher-level device for the range whose reconfiguration into G3 has been completed are processed using the data in G3, while requests for the range whose data has not yet been copied, and which therefore has not yet been reconfigured into G3, are processed using the data in the second redundant data group G2 described above. After the copy has completed the reconfiguration of G1 into G3, the subsystem returns to the data duplication state.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0022]
FIG. 1 is a conceptual diagram showing an example of the configuration of an information processing system including a data duplication storage subsystem made up of storage subsystems that execute a failure recovery method according to an embodiment of the present invention. FIGS. 2, 3 and 4 are explanatory diagrams showing examples of control information used in the data duplication storage subsystem of the present embodiment. FIG. 5 is a conceptual diagram showing an example of the operation of the data duplication storage subsystem of the present embodiment, FIG. 6 is a flowchart showing an example of its operation, and FIG. 7 is a conceptual diagram showing an example of the operation of a modification of the data duplication storage subsystem of the present embodiment.
[0023]
The information processing system of the present embodiment includes a central processing unit 1 and a data duplication storage subsystem composed of a storage control device 2, a disk device 3, a storage control device 4, and a disk device 5. The storage control device 2 is connected to the central processing unit 1 by a data transfer path 61. The storage control device 4 is not connected to the central processing unit 1; instead, it is connected to the storage control device 2 by a data transfer path 62.
[0024]
The disk device 3 contains a plurality of physical drives 3a (primary storage means) and the disk device 5 contains a plurality of physical drives 5a (secondary storage means), and each uses a redundant storage configuration. That is, data received from outside is divided into a plurality of data units, redundant data such as parity is generated from each group of data units, and each data unit group and its corresponding redundant data are distributed across different physical drives 3a (physical drives 5a).
[0025]
The disk device 3 and the disk device 5 also contain a spare physical drive 3a (SPARE-VOL) and a spare physical drive 5a (SPARE-VOL), respectively, which, as described later, are used as needed in place of a physical drive in which a failure has occurred.
[0026]
In the following description, the storage subsystem consisting of the storage control device 2 and the disk device 3, which is connected to the central processing unit 1, is referred to as the primary storage subsystem, and the storage subsystem consisting of the storage control device 4 and the disk device 5 is referred to as the secondary storage subsystem.
[0027]
First, an example of the configuration of the primary storage subsystem will be described.
[0028]
The channel control unit 22 provides the interface between the central processing unit 1 and the storage control device 2 and accepts input/output requests, issued by the central processing unit 1, for data in the disk device 3. The host command processing unit 24 executes the requests accepted by the channel control unit 22. The drive control unit 27 provides the interface between the storage control device 2 and the disk device 3, which are connected via a drive path 63, and controls data transfer between them.
[0029]
The control memory 25 stores a logical/physical VOL conversion table 201 for converting between logical volumes and physical volumes, a pair status management table 202 indicating the duplex state between volumes in the primary storage subsystem and volumes in the secondary storage subsystem, and a physical VOL status management table 203. These tables are described in detail later.
[0030]
The cache memory 26 is a buffer memory that temporarily stores data being transferred between the central processing unit 1 and the disk device 3. The physical drive failure recovery unit 28 controls recovery processing when a failure occurs in a physical drive 3a in the disk device 3; its details are described later. The data transfer control unit 21 provides the interface between the storage control device 2 and the storage control device 4, which are connected via the data transfer path 62, and controls data transfer between them. The remote I/O control unit 23 issues input/output requests for data in the disk device 5.
[0031]
Next, an example of the configuration of the secondary storage subsystem will be described.
[0032]
The channel control unit 41 provides the interface between the storage control device 2 and the storage control device 4, which are connected via the data transfer path 62, and accepts input/output requests, issued by the storage control device 2, for data in the disk device 5. The host command processing unit 42 executes the requests accepted by the channel control unit 41. The drive control unit 45 provides the interface between the storage control device 4 and the disk device 5, which are connected via a drive path 64, and controls data transfer between them. The control memory 43 stores a logical/physical VOL conversion table 401 for converting between logical volumes and physical volumes; its details are described later. The cache memory 44 is a buffer memory that temporarily stores data being transferred between the storage control device 2 and the disk device 5. The physical drive read/write processing unit 46 reads and writes data in the disk device 5 for each physical drive 5a.
[0033]
Next, data duplication of the primary storage subsystem and the secondary storage subsystem will be described.
[0034]
First, each logical VOL and physical VOL in the primary storage subsystem forms a duplex pair with a logical VOL and a physical VOL, respectively, in the secondary storage subsystem. The pair status is stored in the pair status management table 202. The copy performed to form a pair is called a formation copy. In the formation copy, the formation copy control unit 29 reads the data in the disk device 3, and the remote I/O control unit 23 issues write requests to the paired VOLs in the disk device 5 and writes the data, thereby duplicating it.
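A formation copy can be pictured as a sequential walk over the primary VOL that writes each unit to the paired secondary VOL. The minimal sketch below assumes the pair entry is updated with the pair status flag values listed later for flags 2001c/2002c (“2”: duplex configuration in progress, “1”: duplex) and that the valid range pointer simply counts leading units already duplicated; both the pointer usage and the `write_remote` callback are hypothetical stand-ins, not the patent's interfaces.

```python
from dataclasses import dataclass

# Pair status flag values listed for flags 2001c / 2002c.
NON_DUPLEX, DUPLEX, DUPLEXING = 0, 1, 2

@dataclass
class PairEntry:
    primary_vol: int
    secondary_vol: int
    status: int = NON_DUPLEX
    valid_range: int = 0  # assumed meaning: count of leading units already duplicated

def formation_copy(entry, primary_units, write_remote):
    """Copy every unit of the primary VOL to its paired secondary VOL, in order."""
    entry.status = DUPLEXING
    for index, unit in enumerate(primary_units):
        # write_remote is a hypothetical callback standing in for the remote write I/O
        write_remote(entry.secondary_vol, index, unit)
        entry.valid_range = index + 1
    entry.status = DUPLEX
    return entry
```

Because the pointer only advances after a unit has been written remotely, every unit below it is known to exist on both sides.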
[0035]
Next, before describing the flow of data from the central processing unit 1, an outline of data update processing in RAID 5 is given. In RAID 5, data is divided and stored across a plurality of drives, and each fixed unit of data has redundant data (hereinafter, parity data). A row consisting of the data group from which parity data is created and the corresponding parity data is called a stripe row, and the unit (group of drives) over which parity data is created is called an Error Correction Code group (hereinafter, ECC group). When an update request is issued for certain data, new parity data is computed from the old (pre-update) data, the old parity data of the stripe row, and the update data, and the update data and the new parity data are then written to the drives. This process is called read-modify-write processing.
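RAID 5 parity is conventionally the byte-wise XOR of the data blocks in a stripe row, so the read-modify-write step reduces to new parity = old parity XOR old data XOR new data. The sketch below assumes XOR parity (the text only says "parity data") and checks that the incremental update matches recomputing parity over the whole stripe row.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_parity(blocks):
    """Parity of a stripe row computed from all of its data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity

def read_modify_write_parity(old_data, old_parity, new_data):
    """Small-write update: only the old data and old parity need to be read."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)
```

The design point is that a single-block update touches two drives (data and parity) instead of every drive in the stripe row.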
[0036]
Next, the data flow when the central processing unit 1 issues a write request for data in the disk device 3 is described. The write request is accepted by the channel control unit 22, and the host command processing unit 24 stores the write data in the cache memory 26. The drive control unit 27 reads the old (pre-update) data in the update range of the disk device 3 and the old parity data of the stripe row and stores them in the cache memory 26. New parity data is then created from the old data, the old parity data, and the update data held in the cache memory 26, and the write data and the new parity data are written to the disk device 3.
[0037]
In parallel, the remote I/O control unit 23 issues a write request to the storage control device 4 of the secondary storage subsystem for the physical drive 5a in the disk device 5 that is paired with (duplicates) the physical drive 3a being written in the disk device 3, and transfers the write data received from the central processing unit 1. In the storage control device 4, the write request from the storage control device 2 is accepted by the channel control unit 41, and the host command processing unit 42 stores the write data in the cache memory 44.
[0038]
The drive control unit 45 of the storage control device 4 then obtains new parity data from the write data, the pre-update data, and the old parity data of the stripe row, and writes the new parity data to the disk device 5 together with the write data. When all of these write processes have completed, the storage control device 2 reports completion of the write to the central processing unit 1.
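The duplicated write path can be sketched as two independent RAID 5 read-modify-write updates, with completion reported to the host only after both sides finish. This is a toy model, assuming a single stripe row per side, XOR parity, and in-process method calls in place of the channel and remote I/O requests; the `Subsystem` class and `host_write` function are hypothetical names.

```python
from typing import List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class Subsystem:
    """Toy RAID 5 side: one stripe row of data blocks plus one parity block."""

    def __init__(self, blocks: List[bytes]):
        self.blocks = list(blocks)
        self.parity = bytes(len(blocks[0]))
        for block in self.blocks:
            self.parity = xor_bytes(self.parity, block)

    def write(self, index: int, new: bytes) -> bool:
        old = self.blocks[index]                                   # read old data
        self.parity = xor_bytes(xor_bytes(self.parity, old), new)  # create new parity
        self.blocks[index] = new                                   # write data and parity
        return True

def host_write(primary: Subsystem, secondary: Subsystem, index: int, new: bytes) -> bool:
    """Return True (completion to the host) only after both copies are updated."""
    done_primary = primary.write(index, new)
    done_secondary = secondary.write(index, new)  # stands in for the remote write request
    return done_primary and done_secondary
```

Each side recomputes its own parity from its own old data, so the secondary subsystem stays internally redundant as well as identical to the primary.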
[0039]
Next, each table in the control memory 25 will be described.
[0040]
The logical/physical VOL conversion table 201 is used to convert between logical VOLs and physical VOLs; it stores information indicating which physical VOL corresponds to a logical VOL when a higher-level device issues a read/write request to that logical VOL. FIG. 2 is an explanatory diagram showing an example of the configuration of the logical/physical VOL conversion table 201. A logical VOL number 201a and a physical VOL number 201b are stored in association with each other, and when a plurality of logical VOLs are set in one physical VOL, the range of cylinders 201c used in the physical VOL is also set as necessary. Although not shown, the logical/physical VOL conversion table 401 in the control memory 43 of the secondary storage subsystem has the same configuration.
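A lookup in such a table might look like the following sketch, in which several logical VOLs share one physical VOL and are distinguished by their cylinder ranges (201c). The assumption that cylinder numbers inside a logical VOL are relative to the start of its range, and the `VolMapping`/`to_physical` names, are illustrative only.

```python
from typing import NamedTuple

class VolMapping(NamedTuple):
    logical_vol: int   # logical VOL number (201a)
    physical_vol: int  # physical VOL number (201b)
    first_cyl: int     # start of cylinder range used in the physical VOL (201c)
    last_cyl: int      # end of that range, inclusive (201c)

def to_physical(table, logical_vol, cylinder):
    """Map (logical VOL, cylinder relative to that VOL) to (physical VOL, physical cylinder)."""
    for entry in table:
        if entry.logical_vol == logical_vol:
            physical_cyl = entry.first_cyl + cylinder
            if cylinder < 0 or physical_cyl > entry.last_cyl:
                raise ValueError("cylinder outside the logical VOL's range")
            return entry.physical_vol, physical_cyl
    raise KeyError(logical_vol)
```

For example, with logical VOLs 0 and 1 occupying cylinders 0-99 and 100-199 of physical VOL 5, cylinder 10 of logical VOL 1 resolves to cylinder 110 of physical VOL 5.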
[0041]
FIG. 3 is an explanatory diagram showing an example of the contents of the pair status management table 202. This table includes a logical VOL pair status management table 2001 and a physical VOL pair status management table 2002.
[0042]
The logical VOL pair status management table 2001 stores information indicating the correspondence between logical VOLs in the primary storage subsystem and logical VOLs in the secondary storage subsystem. As an example, it holds a primary logical VOL number 2001a, a secondary logical VOL number 2001b, a pair status flag 2001c, and a valid range pointer 2001d indicating the range of data in the VOL for which duplication has been completed.
[0043]
The physical VOL pair status management table 2002 stores information indicating the duplication correspondence between physical VOLs in the primary storage subsystem and physical VOLs in the secondary storage subsystem. As an example, it holds a primary physical VOL number 2002a, a secondary physical VOL number 2002b, a pair status flag 2002c, and a valid range pointer 2002d indicating the range of data in the VOL for which duplication has been completed.
[0044]
The pair status flags 2001c and 2002c hold, as needed, values such as “0”: non-duplex, “1”: duplex, “2”: duplex configuration in progress, “3”: primary-side failure, and “4”: secondary-side failure.
[0045]
FIG. 4 is an explanatory diagram showing an example of the configuration of the physical VOL status management table 203. It holds an ECC group number 203a identifying each of a plurality of ECC groups, an ECC group status flag 203b, constituent physical VOL numbers 203c indicating the physical VOLs that make up the ECC group, and physical VOL status flags 203d indicating the state of each physical VOL in the ECC group.
[0046]
The ECC group status flag 203b holds, as needed, values such as “0”: normal, “1”: correction operation disabled, “2”: correction operation in progress, “3”: correction operation using the secondary volume, “4”: SPARE-VOL recovery in progress, and “5”: correction operation using the SPARE-VOL.
[0047]
The physical VOL status flag 203d holds, as needed, values such as “0”: normal, “1”: failure, “2”: secondary physical VOL in use, and “3”: SPARE-VOL in use.
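The three flag fields can be written down as integer enumerations whose values follow the lists above. The member names are hypothetical translations of the listed states, and the helper `ecc_status_after_failures` assumes (for illustration only) that a RAID 5 group with exactly one failed drive is running in correction mode while two or more failures disable correction.

```python
from enum import IntEnum

class PairStatus(IntEnum):          # pair status flags 2001c / 2002c
    NON_DUPLEX = 0
    DUPLEX = 1
    DUPLEXING = 2                    # duplex configuration in progress
    PRIMARY_FAILURE = 3
    SECONDARY_FAILURE = 4

class EccGroupStatus(IntEnum):      # ECC group status flag 203b
    NORMAL = 0
    CORRECTION_DISABLED = 1
    CORRECTION_IN_PROGRESS = 2
    CORRECTION_USING_SECONDARY = 3
    SPARE_RECOVERY_IN_PROGRESS = 4
    CORRECTION_USING_SPARE = 5

class PhysVolStatus(IntEnum):       # physical VOL status flag 203d
    NORMAL = 0
    FAILURE = 1
    USING_SECONDARY = 2
    USING_SPARE = 3

def ecc_status_after_failures(failed_count: int, tolerated: int = 1) -> EccGroupStatus:
    """Assumed mapping: RAID 5 tolerates one failed drive per ECC group."""
    if failed_count == 0:
        return EccGroupStatus.NORMAL
    if failed_count <= tolerated:
        return EccGroupStatus.CORRECTION_IN_PROGRESS
    return EccGroupStatus.CORRECTION_DISABLED
```

Using `IntEnum` keeps the values comparable with the raw integers stored in the control memory tables.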
[0048]
Next, the operation when one physical drive in an ECC group fails is described. When one physical drive in an ECC group has failed and the central processing unit 1 issues a read/write request for data on that drive, the data in the range that can no longer be read or written because of the failure is restored from the data and parity data of the stripe rows containing the requested range, and the read/write is then executed; this is called correction read/write processing. Restoring all the data of the failed drive by correction read processing and writing the restored data to a spare drive to reconfigure the ECC group is called correction copy processing.
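With XOR parity (assumed here), correction read is the XOR of every surviving block in the stripe row, parity included, and correction copy repeats that for every stripe row to produce the spare drive's contents. The function names below are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of two or more equal-length blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def correction_read(stripe, failed_index):
    """Rebuild the block on the failed drive from the other blocks of one stripe row.

    stripe is a list of blocks (data blocks and the parity block), one per drive.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

def correction_copy(stripes, failed_index):
    """Rebuild every block of the failed drive, i.e. the contents written to the spare."""
    return [correction_read(stripe, failed_index) for stripe in stripes]
```

This only works while at most one block per stripe row is missing, which is exactly why a second failure in the same ECC group disables correction.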
[0049]
In the present embodiment, when two or more physical drives 3a in one ECC group of the primary storage subsystem fail and the correction processing described above therefore cannot be performed, the ECC group is reconstructed using the corresponding data in the secondary storage subsystem. An example of this method is described below.
[0050]
As noted above, in RAID 5, if two or more physical drives 3a in one ECC group fail, correction processing becomes impossible and the data in the ECC group loses its redundancy. At this point, a value indicating that the ECC group is unusable (“1”: correction operation disabled) is stored in the ECC group status flag 203b of the physical VOL status management table 203. Even in this case, the data duplication storage subsystem of the present embodiment can reconfigure the failed ECC group of the primary storage subsystem using the secondary storage subsystem and thereby recover the ECC group.
[0051]
One such method is described with reference to FIG. 5.
[0052]
First, assume that failures have occurred in M (M > 1) physical drives 3a in an ECC group of the primary storage subsystem; this ECC group is called the first ECC group G1. A value indicating that the ECC group is unusable (“1”: correction operation disabled) is first stored in the ECC group status flag 203b of the physical VOL status management table 203. The physical drive failure recovery unit 28 then substitutes, for each failed physical drive 3a of the primary storage subsystem, the physical drive 5a in the secondary storage subsystem that is paired with it, and configures a second ECC group G2 as shown in FIG. 5. When the central processing unit 1 issues a write request for data in the second ECC group G2, the redundant data created after the update is written to both the parity drive (P) of the second ECC group G2 and the parity drive (P) of the first ECC group G1. Read/write requests from the central processing unit 1 can therefore be processed while the redundancy of the first and second ECC groups G1 and G2 is guaranteed.
[0053]
Configuring the second ECC group G2 in this way guarantees the redundancy of the data in the ECC group and allows read/write processing from the central processing unit 1 to proceed normally. In this state, the data (D) of the failed physical drives 3a may be restored and, as shown in FIG. 5, a third ECC group G3 using the spare physical drives 3a (S) in the primary storage subsystem may eventually be formed to return to the duplex state.
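Why G2 restores redundancy can be checked on a single stripe row: each secondary drive holds a duplicate of its failed primary counterpart, so substituting it leaves the stripe row parity-consistent, and the group can again survive one further loss. The sketch assumes XOR parity and hypothetical drive labels; it models only group membership and correction inside G2, not the dual parity write described above.

```python
def xor_all(blocks):
    acc = bytes(len(blocks[0]))
    for block in blocks:
        acc = bytes(x ^ y for x, y in zip(acc, block))
    return acc

def build_g2(g1_members, failed, secondary_of):
    """G2: keep each healthy primary drive; substitute each failed drive by its secondary pair."""
    return [secondary_of[d] if d in failed else d for d in g1_members]

def stripe_is_redundant(group, blocks_by_drive):
    """True if the group's blocks for one stripe row XOR to zero (parity consistent)."""
    return not any(xor_all([blocks_by_drive[d] for d in group]))

def rebuild_in_group(group, blocks_by_drive, lost_drive):
    """Correction read inside the group: rebuild one further lost member from the others."""
    return xor_all([blocks_by_drive[d] for d in group if d != lost_drive])
```

In the usage below, drives d0 and d1 have failed and are replaced by their secondary pairs r0 and r1; G2 is parity-consistent, and a later loss of d2 could still be corrected.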
[0054]
Next, a method of restoring data in a failed physical drive will be described.
[0055]
When failures occur in M physical drives 3a in one ECC group of the primary storage subsystem, the physical drive failure recovery unit 28 first handles the data of (M-1) of the failed physical drives: referring to the physical VOL pair status management table 2002, it determines the physical VOL # (physical drive 5a) in the secondary storage subsystem that is in a duplex pair with each failed physical drive 3a. The remote I/O control unit 23 then issues read requests for the data in each such physical VOL, one physical VOL # at a time, and copies the data from the secondary storage subsystem to spare volumes (S) in the primary storage subsystem, thereby restoring the data of the (M-1) failed physical drives 3a. The ECC group containing the failed physical drives 3a can then perform correction operation, and a value indicating that the ECC group is usable (“5”: correction operation using the SPARE-VOL) is written to the ECC group status flag 203b of the physical VOL status management table 203.
[0056]
If the central processing unit 1 issues a read/write request to the ECC group in the primary storage subsystem while the data is still being copied and correction operation is not yet possible, an ECC group whose components include the physical drives 5a of the secondary storage subsystem paired with the failed physical drives 3a is configured, as with the second ECC group G2 shown in FIG. 5, and the request is processed using the data in that reconfigured ECC group. After the data copy is completed and correction operation becomes possible in the failed ECC group, the data of the remaining failed physical drive is restored to a spare drive (S) in the primary storage subsystem by correction copy processing. Each spare volume whose data has been restored is newly registered in the physical VOL pair status management table 2002 (as the primary physical VOL number 2002a), so that the duplex pair state between physical VOLs in the primary storage subsystem and physical VOLs in the secondary storage subsystem is maintained.
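The two-phase recovery above (remote copy of M-1 drives, then a local correction copy of the last one) can be sketched as follows. This is an illustration under stated assumptions: XOR parity, each stripe row modelled as a dict of surviving blocks, and a hypothetical `read_secondary` callback standing in for the remote read requests.

```python
def xor_blocks(blocks):
    out = bytes(len(blocks[0]))
    for block in blocks:
        out = bytes(x ^ y for x, y in zip(out, block))
    return out

def recover_group(stripes, failed, read_secondary):
    """Restore all M > 1 failed drives of one ECC group into spares.

    stripes        -- list of stripe rows; each row maps surviving drive id -> block
    failed         -- ordered list of the M failed drive ids
    read_secondary -- callback (drive id, row index) -> block from the paired secondary drive
    Returns a dict: failed drive id -> list of restored blocks (the spare's contents).
    """
    copied, rebuilt = failed[:-1], failed[-1]
    spares = {d: [] for d in failed}
    # Phase 1: copy (M-1) drives from the secondary storage subsystem.
    for d in copied:
        for row in range(len(stripes)):
            spares[d].append(read_secondary(d, row))
    # Phase 2: only one block per row is now missing, so a correction copy rebuilds it.
    for row, stripe in enumerate(stripes):
        others = list(stripe.values()) + [spares[d][row] for d in copied]
        spares[rebuilt].append(xor_blocks(others))
    return spares
```

In this model only M-1 drives' worth of data crosses the remote link; the last failed drive is rebuilt entirely inside the primary storage subsystem.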
[0057]
Further, when failures of M (M > 1) physical drives 3a occur in one ECC group, the pair status of each logical VOL containing a failed physical drive 3a may be changed to the released state in the logical VOL pair status management table 2001, and a means may be provided for automatically returning that pair status to the duplex pair state after the data restoration described above. FIG. 6 is a flowchart showing this processing sequence.
[0058]
Alternatively, when M (M > 1) physical drives in one ECC group have failed, the data may be recovered by the method shown in FIG. 7.
[0059]
That is, when M physical drives 3a in one ECC group fail, a value indicating that the ECC group is unusable (“1”: correction operation disabled) is stored in the ECC group status flag 203b of the physical VOL status management table 203. In this recovery, the physical drive failure recovery unit 28 first refers to the physical VOL pair status management table 2002 and determines the physical VOL # (physical drive 5a) in the secondary storage subsystem that is in a duplex pair with each of the M failed physical drives 3a. The remote I/O control unit 23 then issues read requests for the data in each of these physical VOLs in stripe units and copies the data to spare drives (S) in the primary storage subsystem. The range for which data copying has been completed is recorded in the logical VOL pair status management table 2001 and the physical VOL pair status management table 2002 as pointers indicating the valid range (valid range pointers 2001d and 2002d). The ECC group is reconfigured over the range for which copying has been completed (ECC group G5 in FIG. 7). For a read/write request from the central processing unit 1 to the copied range, the primary storage subsystem performs read-modify-write processing and, for a write request, also writes the write data to the secondary storage subsystem to maintain data duplication.
[0060]
On the other hand, when the central processing unit 1 issues a read/write request for a range whose stripe rows have not yet been copied, the request is processed using the ECC group G4 shown in FIG. 7, which uses the physical drives 5a of the secondary storage subsystem in place of the failed physical drives 3a. In this method as well, when failures of M (M > 1) physical drives 3a occur in the ECC group, the pair status of each logical VOL containing a failed physical drive 3a may be changed to the released state in the logical VOL pair status management table 2001, and a means may be provided for automatically returning that pair status to the duplex pair state after the data restoration described above.
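The routing between G5 (copied range) and G4 (not yet copied) hinges on the valid range pointer. The minimal sketch below assumes stripe rows are copied in ascending order so that the pointer can be a single count of completed rows; the class and method names are hypothetical, and the actual data transfer is omitted.

```python
class StripeCopyRecovery:
    """Track the valid range pointer while failed drives are restored stripe by stripe."""

    def __init__(self, stripe_count: int):
        self.stripe_count = stripe_count
        self.valid_range = 0  # stripe rows [0, valid_range) are already copied to spares

    def copy_next_stripe(self) -> bool:
        """Advance the pointer after one stripe row is copied; True once all rows are done."""
        if self.valid_range < self.stripe_count:
            self.valid_range += 1  # data for this row copied from the secondary side
        return self.valid_range == self.stripe_count

    def route(self, stripe_index: int) -> str:
        """Choose the ECC group that serves a host read/write for this stripe row."""
        if not 0 <= stripe_index < self.stripe_count:
            raise IndexError(stripe_index)
        return "G5" if stripe_index < self.valid_range else "G4"
```

Because the pointer is checked on every request, host I/O can continue throughout the copy, and each request is served by whichever group currently holds a redundant copy of its stripe row.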
[0061]
As described above, with the data duplication storage subsystem illustrated in the present embodiment, even if failures exceeding the redundancy (two or more drives, in this embodiment) occur in the physical drives 3a of one ECC group in the primary storage subsystem and correction operation in that ECC group becomes impossible, the ECC group can be reconfigured from the normal data and the physical drives 5a in the secondary storage subsystem that hold the data made unusable by the failures. Data redundancy can thus be restored immediately, operation with data redundancy maintained can continue without stopping the information processing system, and the reliability of the storage subsystem performing data duplication is improved.
[0062]
Further, the data in the failed physical drives 3a can be recovered while online processing with the central processing unit 1 continues, which improves the operating efficiency and availability of the entire information processing system. After the data is restored, remote data duplication can be re-established.
[0063]
Although the invention made by the present inventors has been specifically described based on the embodiment, the present invention is, needless to say, not limited to the above embodiment, and various changes can be made without departing from its gist.
[0064]
【Effects of the Invention】
According to the storage subsystem and the failure recovery method of the present invention, redundancy lost due to a failure in a storage subsystem having a redundant storage configuration can be quickly recovered without stopping the system.
[0065]
In addition, both the redundancy and the data duplication state lost due to a failure in a storage subsystem having a redundant storage configuration can be quickly restored without stopping the system.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram illustrating an example of the configuration of an information processing system including a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of control information used in a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of control information used in a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an example of control information used in a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram illustrating an example of the operation of a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an example of the operation of a data duplication storage subsystem configured by storage subsystems that execute a failure recovery method according to an embodiment of the present invention.
FIG. 7 is a conceptual diagram showing an example of the operation of a modification of the data duplication storage subsystem configured by storage subsystems that execute the failure recovery method according to the embodiment of the present invention.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... central processing unit, 2 ... storage control device, 21 ... data transfer control unit, 22 ... channel control unit, 23 ... remote I/O control unit, 24 ... host command processing unit, 25 ... control memory, 26 ... cache memory, 27 ... drive control unit, 28 ... physical drive failure recovery unit, 29 ... formation copy control unit, 3 ... disk device, 3a ... physical drive (primary storage means), 4 ... storage control device, 41 ... channel control unit, 42 ... host command processing unit, 43 ... control memory, 44 ... cache memory, 45 ... drive control unit, 46 ... physical drive read/write processing unit, 401 ... logical/physical VOL conversion table, 5 ... disk device, 5a ... physical drive (secondary storage means), 61 ... data transfer path, 62 ... data transfer path, 63 ... drive path, 64 ... drive path, 201 ... logical/physical VOL conversion table, 201a ... logical VOL number, 201b ... physical VOL number, 201c ... range of cylinders used in physical VOL, 202 ... pair status management table, 2001 ... logical VOL pair status management table, 2001a ... primary logical VOL number, 2001b ... secondary logical VOL number, 2001c ... pair status flag, 2001d ... valid range pointer, 2002 ... physical VOL pair status management table, 2002a ... primary physical VOL number, 2002b ... secondary physical VOL number, 2002c ... pair status flag, 2002d ... valid range pointer, 203 ... physical VOL status management table, 203a ... ECC group number, 203b ... ECC group status flag, 203c ... constituent physical VOL number, 203d ... physical VOL status flag, G1 to G3 ... ECC groups, G4 to G5 ... ECC groups.

Claims

Distributing data sent from a host device to a logical volume targeted by a read/write request across a plurality of physical volumes of a primary storage subsystem, and storing it there as a plurality of data items together with redundant data generated from the plurality of data items;
Copying the plurality of data items and the redundant data to a plurality of physical volumes of a secondary storage subsystem located remotely from the primary storage subsystem;
Accumulating and managing, in the primary storage subsystem, physical volume pair status management information of the secondary storage subsystem; and
When a data failure exceeding the redundancy provided by the redundant data occurs in the primary storage subsystem, reading data stored in the plurality of physical volumes of the secondary storage subsystem on the basis of the physical volume pair status management information, creating another redundancy group from the normal data stored in the plurality of physical volumes of the primary storage subsystem and the data stored in the plurality of physical volumes of the secondary storage subsystem, and restoring the failed data.
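Why the remote copy is needed can be seen with a single XOR parity stripe, assumed here as a typical instance of the redundant data. When two members are lost, the survivors only yield the XOR of the two missing blocks, not the blocks themselves. The sketch below fills each failed slot with the remotely copied block located through a pair table, then checks that the new group still satisfies the parity relation. The function names and the dictionary layout of the pair table are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def restore_failed(
    primary: Dict[int, Optional[bytes]],
    secondary: Dict[int, bytes],
    pair_table: Dict[int, int],
) -> Dict[int, bytes]:
    """Rebuild a stripe whose failures exceed single parity (hypothetical helper).

    primary:    primary physical VOL number -> block, or None if that volume failed
    secondary:  secondary physical VOL number -> remotely copied block
    pair_table: primary VOL number -> secondary VOL number (cf. table 2002)
    """
    restored: Dict[int, bytes] = {}
    for pvol, block in primary.items():
        if block is None:
            restored[pvol] = secondary[pair_table[pvol]]  # read the remote copy
        else:
            restored[pvol] = block  # keep the normal primary data
    # Data blocks plus parity XOR to zero; a nonzero byte means a stale or wrong copy.
    if any(xor_blocks(list(restored.values()))):
        raise ValueError("restored stripe is inconsistent with its parity")
    return restored
```

For example, with data blocks `d0`, `d1`, `d2` and `parity = xor_blocks([d0, d1, d2])`, losing `d1` and `d2` leaves `xor_blocks([d0, parity])`, which equals `xor_blocks([d1, d2])` and cannot be split further without the secondary copies.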

Creating a redundancy group from data sent from a host device to a logical volume targeted by a read/write request, as a plurality of data items and redundant data generated from the plurality of data items;
Distributing and storing the plurality of data items and the redundant data in a plurality of physical volumes of a primary storage subsystem;
Copying the plurality of data items and the redundant data to a plurality of physical volumes of a secondary storage subsystem located remotely from the primary storage subsystem;
Accumulating and managing, in the primary storage subsystem, physical volume pair status management information of the secondary storage subsystem; and
When a failure exceeding the redundancy provided by the redundant data occurs in a physical volume of the primary storage subsystem, reading, on the basis of the physical volume pair status management information, the data copied to the physical volume of the secondary storage subsystem that corresponds to the failed physical volume, and creating another redundancy group from the data stored in the normal physical volumes of the primary storage subsystem and the data duplicated in the physical volume of the secondary storage subsystem that corresponds to the failed physical volume.
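The step above changes group membership rather than data: each normal primary physical volume keeps its slot, and each failed one is replaced by the secondary physical volume that the pair status information maps it to. A minimal sketch, assuming a group is an ordered list of members and a member is identified by a hypothetical `(subsystem, physical VOL number)` tuple:

```python
from typing import Dict, List, Set, Tuple

# (subsystem, physical VOL number); the "primary"/"secondary" labels are hypothetical.
Location = Tuple[str, int]


def form_new_group(
    members: List[int],
    failed: Set[int],
    pair_table: Dict[int, int],
) -> List[Location]:
    """Return the membership of the replacement redundancy group.

    Surviving primary volumes keep their slots; each failed one is replaced, in the
    same slot (so stripe and parity positions are unchanged), by the secondary
    physical volume that pair_table says holds its duplicate.
    """
    group: List[Location] = []
    for pvol in members:
        if pvol in failed:
            if pvol not in pair_table:
                raise LookupError(f"no secondary copy registered for primary VOL {pvol}")
            group.append(("secondary", pair_table[pvol]))
        else:
            group.append(("primary", pvol))
    return group
```

Keeping the slot order fixed is a design choice made in this sketch: the new group can then be read with the same stripe layout as the original group.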

Distributing data sent from a host device to a logical volume targeted by a read/write request across a plurality of physical volumes of a primary storage subsystem, and storing it there as a plurality of data items together with redundant data generated from the plurality of data items;
Copying the plurality of data items and the redundant data to a plurality of physical volumes of a secondary storage subsystem located remotely from the primary storage subsystem;
Accumulating and managing, in the primary storage subsystem, physical volume pair status management information of the secondary storage subsystem;
When a failure exceeding the redundancy provided by the redundant data occurs in a redundancy group G1 composed of primary physical volumes of the primary storage subsystem, reading, on the basis of the physical volume pair status management information, the data in the secondary physical volume corresponding to the failed primary physical volume, and configuring a redundancy group G2 from the primary physical volumes belonging to the redundancy group G1 in which no failure has occurred and the secondary physical volume that holds the data corresponding to the failed primary physical volume;
Copying the contents of the secondary physical volume that holds the data corresponding to the failed primary physical volume to a primary spare physical volume of the primary storage subsystem; and
Configuring a redundancy group G3 from the primary physical volumes belonging to the redundancy group G1 in which no failure has occurred and the primary spare physical volume; the foregoing steps together constituting a failure recovery method for a storage subsystem.
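The last two steps move the group back onto local disks: the member of G2 that lives on the secondary subsystem is copied to a primary spare physical volume, and that spare takes its slot in G3. The sketch below models volumes as an in-memory dictionary keyed by a hypothetical `(subsystem, physical VOL number)` tuple. Supporting more than one secondary member, with one spare consumed per member, is a generalization assumed here.

```python
from typing import Dict, List, Tuple

# ("primary" | "secondary", physical VOL number); labels are hypothetical.
Location = Tuple[str, int]


def migrate_to_spare(
    g2: List[Location],
    volumes: Dict[Location, bytes],
    spares: List[int],
) -> List[Location]:
    """Turn redundancy group G2 into G3 (hypothetical helper).

    Each G2 member on the secondary subsystem is copied to a free primary spare
    physical volume, which replaces it in the same slot. Primary members are kept.
    """
    free = list(spares)  # work on a copy so the caller's spare list is untouched
    g3: List[Location] = []
    for loc in g2:
        if loc[0] == "secondary":
            if not free:
                raise RuntimeError("no primary spare physical volume available")
            spare: Location = ("primary", free.pop(0))
            volumes[spare] = volumes[loc]  # copy the remote contents onto the spare
            g3.append(spare)
        else:
            g3.append(loc)
    return g3
```

Because G2 remains usable while the copy runs, host I/O can continue against G2 and switch to G3 once the spare is filled. The sketch only shows the final membership change, not that concurrency.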

A storage subsystem having physical volumes that hold update data in multiplexed form with physical volumes included in another, remotely located storage subsystem, and a control memory holding a management table that stores physical volume pair status management information of the other storage subsystem,
wherein the physical volumes distribute and store data sent from a host device to a logical volume targeted by a read/write request as a plurality of data items constituting a redundancy group and redundant data generated from those data items;
the storage subsystem accumulates and manages the physical volume pair status management information of the other storage subsystem in the management table; and
when a failure exceeding the redundancy provided by the redundant data occurs in a redundancy group G1 composed of the physical volumes of the storage subsystem, the storage subsystem reads, on the basis of the physical volume pair status management information stored in the management table, the data of the other storage subsystem that corresponds to the failed physical volume, and forms a redundancy group G2 from the physical volumes belonging to the redundancy group G1 in which no failure has occurred and the physical volume, provided in the other storage subsystem, that holds the data corresponding to the failed physical volume.

Distributing data sent from a host device to a logical volume targeted by a read/write request across a plurality of physical volumes of a primary storage subsystem, and storing it there as a plurality of data items together with redundant data generated from the plurality of data items;
Copying the plurality of data items and the redundant data to a plurality of physical volumes of a secondary storage subsystem located remotely from the primary storage subsystem;
Accumulating and managing, in the primary storage subsystem, physical volume pair status management information of the secondary storage subsystem; and
When a data failure exceeding the redundancy provided by the redundant data occurs in the primary storage subsystem, restoring the failed data from the normal data stored in the plurality of physical volumes of the primary storage subsystem and the data stored in the plurality of physical volumes of the secondary storage subsystem.
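This step only requires that the failed data be restored from normal primary data together with secondary data. The sketch below shows one possible way to combine the two, an assumption rather than the procedure stated in the claim. With a single XOR parity stripe, only all but one of the failed members are fetched from the secondary subsystem, and the remaining one is rebuilt locally from parity, so fewer blocks cross the inter-site link. `fetch_remote` is a hypothetical callable standing in for a read from the secondary subsystem, and it is assumed to return current copies.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def hybrid_restore(
    primary: Dict[int, Optional[bytes]],
    fetch_remote: Callable[[int], bytes],
) -> Dict[int, bytes]:
    """Restore a single-parity stripe with any number of failed members (hypothetical).

    All but one failed member are read from the secondary subsystem; the last is
    rebuilt from parity, so only (failures - 1) blocks are transferred remotely.
    """
    failed = [v for v, b in primary.items() if b is None]
    restored = {v: b for v, b in primary.items() if b is not None}
    for pvol in failed[:-1]:
        restored[pvol] = fetch_remote(pvol)  # remote read via the pair information
    if failed:
        # XOR of every other member (data and parity) equals the missing block.
        restored[failed[-1]] = xor_blocks(list(restored.values()))
    return dict(sorted(restored.items()))
```

For a group with one parity member and k failed members, this reads k - 1 blocks remotely instead of k. Whether that saving matters depends on the link bandwidth relative to local parity computation.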