JP3736134B2 - Distributed storage method, distributed storage system, and recording medium recording distributed storage program

Info

Publication number: JP3736134B2
Application number: JP24334298A
Authority: JP
Inventors: 雅浩上野; 重親木下; 眞人久力; 節子村田; 茂太郎岩津
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-08-28
Filing date: 1998-08-28
Publication date: 2006-01-18
Anticipated expiration: 2018-08-28
Also published as: JP2000076207A

Description

【０００１】
【発明の属する技術分野】
本発明は、分散記憶方法及び分散記憶システム及び分散記憶プログラムを記録した記録媒体に係り、特に、地震、洪水等の地域災害に対するデータ保護機能を持ち、該データ保護構成をノード毎に選択できる分散記憶方法及び分散記憶システム及び分散記憶プログラムを記録した記録媒体に関する。
【０００２】
【従来の技術】
従来の技術における耐地域災害性を持つ分散記憶装置の一例が、例えば特開平５−３２４５７９号公報に開示されている。特開平５−３２４５７９号公報に開示されている分散記憶装置においては、
(１) データを格納するファイル装置とパリティデータとしての冗長データを格納する冗長データ格納装置を合わせて３個以上持ち、
（２）冗長データ格納装置を１個以上持つ、
ことを特徴としている。
【０００３】
このような特徴を持つため、自然災害等で、あるノードにあるファイル装置が破壊されても他のノードにある冗長データ格納装置に格納されている冗長データと、正常動作しているファイル装置に格納されているデータとから、破壊されたファイル装置に格納されているデータを復元できる。
パリティデータとしては、例えばファイル装置が４台ある場合、各ファイル装置のストライプａ０、ａ１、ａ２、ａ３からパリティデータｐ０は以下の式で算出される。
【０００４】
ｐ０＝ａ０＋ａ１＋ａ２＋ａ３（ｍｏｄ２）
ここで’＋’はmodulo２の加法を表す。
また、従来の技術における耐地域災害性を持つ分散記憶装置の別の一例として、一般的なＲＡＩＤ（Redundant Array of Independent Disks）装置がある。この装置は、ディスク装置を統括する１個のＲＡＩＤコントローラとデータまたは冗長データを格納する２個以上のディスク装置を使用し、ＲＡＩＤコントローラインターフェースとして例えばFibre Channel SCSI等を使用することにより各装置を広域に分散でき、データ保持についての耐災害性を持つことができる。
【０００５】
図７は従来の技術による分散記憶装置の第１の例であり、複数のコンピュータからアクセスするために複数のＲＡＩＤを独立に構築した例を示している。同図に示すように、第１の例はノードＡ１０、ノードＢ２０、及びノードＣ３０から構成され、各ノードは通信路５３０及び通信路５３１によりバス型に接続されている。ノードＡ１０において、ホストコンピュータ５００はホスト接続手段５０１、ＲＡＩＤ制御手段５０２、データ転送手段５０３を介してバス接続され、記憶媒体５０４は記憶媒体接続手段５０５、データ転送手段５０６を介してバス接続され、記憶媒体５０７は記憶媒体接続手段５０８、データ転送手段５０９を介してバス接続されている。ノードＢ２０及びノードＣ３０における構成もノードＡ１０における構成と同様である。記憶媒体に接続されている記憶媒体接続手段及びデータ転送手段は、１台のコンピュータにより機能が実現されることが多く、ホストコンピュータに接続したホスト接続手段、ＲＡＩＤ制御手段、及びデータ転送手段はホストコンピュータ内に機能が設けられる場合や単独のコンピュータにより機能が実現される場合がある。
【０００６】
本構成において、例えば、ホストコンピュータ５００は記憶媒体５２４と記憶媒体５１７をＲＡＩＤ２を構成する分散記憶媒体として使用し、ホストコンピュータ５１０は記憶媒体５０４と記憶媒体５２７をＲＡＩＤ２を構成する分散記憶媒体として使用し、ホストコンピュータ５２０は記憶媒体５０７と記憶媒体５１４をＲＡＩＤ２を構成する分散記憶媒体として使用する。
【０００７】
図８は従来の技術による分散記憶装置の第２の例であり、これも複数のコンピュータからアクセスするために複数のＲＡＩＤを独立に構築した例である。
同図に示すように、第２の例はノードＡ１０、ノードＢ２０、及びノードＣ３０から構成されている。ノードＡ１０において、ホストコンピュータ５００はホスト接続手段５０１、ＲＡＩＤ制御手段５０２、データ転送手段５０３を介して通信路５４１に接続され、ノードＣ３０においてデータ転送手段５２６、記憶媒体接続手段５２５を介して記憶媒体５２４に接続される。また、ホストコンピュータ５００はホスト接続手段５０１、ＲＡＩＤ制御手段５０２、データ転送手段５０３を介して通信路５４０に接続され、ノードＢ２０においてデータ転送手段５１９、記憶媒体接続手段５１８を介して記憶媒体５１７に接続される。ホストコンピュータ５１０及びホストコンピュータ５２０も同様の構成により記憶媒体と接続される。記憶媒体に接続されている記憶媒体接続手段及びデータ転送手段は、１台のコンピュータにより機能が実現されることが多く、ホストコンピュータに接続したホスト接続手段、ＲＡＩＤ制御手段、及びデータ転送手段はホストコンピュータ内に機能が設けられる場合や単独のコンピュータにより機能が実現される場合がある。
【０００８】
本構成において、例えば、ホストコンピュータ５００は記憶媒体５２４と記憶媒体５１７をＲＡＩＤ２を構成する分散記憶媒体として使用し、ホストコンピュータ５１０は記憶媒体５０４と記憶媒体５２７をＲＡＩＤ２を構成する分散記憶媒体として使用し、ホストコンピュータ５２０は記憶媒体５０７と記憶媒体５１４をＲＡＩＤ２を構成する分散記憶媒体として使用する。
【０００９】
【発明が解決しようとする課題】
しかしながら、特開平５−３２４５７９号公報に開示されている装置は、パリティデータを持つ構成を示しており、パリティデータを持たない最も信頼性と可用性の高いＲＡＩＤレベル１（ミラーリング) を構成できない欠点を持つ。
また、高速性と信頼性を高めるために、一般に、ＲＡＩＤレベル０のディスクアレイ装置を複数用意し、それぞれを１つのディスク装置と見立ててそれらをＲＡＩＤレベル１の技術で組み合わせる等、ＲＡＩＤ技術を組み合わせることが行われているが、特開平５- ３２４５７９号公報に開示されている装置は、パリティデータを持つ構成を示しており、高速性の高さで有用なＲＡＩＤレベル０（ストライピング）を構成できない欠点を持つ。また、特開平５- ３２４５７９号公報に開示されている装置は、パリティデータをmodulo2 の加法を用いた演算により算出しているため、Hamming コードを使用しているＲＡＩＤレベル２を構成できない欠点を持つ。
【００１０】
また、１次元冗長度を持つＲＡＩＤレベル６では２種類の冗長データ生成アルゴリズムが必要となるが、１種類目としてmodulo2 の加法を選ぶとしたら２種類目はmodulo2 の加法以外の例えばReed-Solomon誤り訂正符号化方法を選ぶ必要がある。特開平５−３２４５７９号公報に開示されている装置は、modulo2 の加法のみに対応しているため、１次元冗長度を持つＲＡＩＤレベル６を構成できない欠点を持つ。
【００１１】
また、２次元冗長度を持つＲＡＩＤレベル６では１回のデータ書き込みに対して２回の冗長データ生成（例えばmodulo2 の加法）が必要となるが、特開平５- ３２４５７９号公報に開示されている装置は、１回の冗長データの生成しか行わないため、２次元冗長度を持つＲＡＩＤレベル６を構成できない欠点を持つ。
更に、従来の技術によると、一般的なＲＡＩＤ装置は、ＲＡＩＤコントローラを１個しか持たないためＲＡＩＤコントローラのある地域に自然災害等が起こり、ＲＡＩＤコントローラが破壊された場合、システム全体が停止するため、地域災害に対する可用性、すなわち装置が破壊されてもデータアクセス可能である性質を持つことができない欠点を持つ。また、一般的なＲＡＩＤ装置は、ＲＡＩＤコントローラを１個しか持たないため、複数のコンピュータからアクセスできない欠点を持つ。また、一般的なＲＡＩＤ装置は、複数のコンピュータからアクセスするために複数のＲＡＩＤ装置を使用しなければならず、各装置が全く独立して構築されるため、コストが高くなるという欠点を持つ。すなわち、図７及び図８に示したように、ノード毎に異なるファイルシステムを持ち、各ノードのデータを冗長度を持たせながらそれぞれ他のノードに分散蓄積する場合、従来技術ではデータ転送手段と記憶媒体接続手段のコストがノード数の２乗のオーダに比例してかかるため、構築するために多くの投資が必要という問題点がある。
【００１２】
本発明は上記の点に鑑みなされたもので、上記に説明した問題点を解決し、ＲＡＩＤレベル０及びＲＡＩＤレベル１〜６を含むデータ保護機能を各ノード毎で選択して構成でき、地域災害に対して可用性を持ち、複数のコンピュータからのアクセスが可能であり、重複した装置を持たず低コストな分散記憶方法及び分散記憶システム及び分散記憶プログラムを記録した記録媒体を提供することを目的とする。
【００１３】
【課題を解決するための手段】
請求項１に記載の発明は、互いに離れたノードにある分散記憶媒体に、別のノードにあるホストコンピュータのデータを通信路を介して分散して蓄積する分散記憶方法であって、各ノードにおいて、ホストコンピュータから分散記憶媒体へのデータ書き込み時にはデータを分割し、データ読み込み時にはデータを統合するデータ分割統合手段により、複数種類のデータ保護構成から１種類のデータ保護構成が指定され、一のノードのホストコンピュータ及び分散記憶媒体のデータを該一のノードから他のノードへ伸びる通信路に転送するデータ転送手段は、該他のノードから到着したデータが該一のノードのホストコンピュータへのデータかあるいは該一のノードに設置された分散記憶媒体へのデータかを判断し、該データを該ホストコンピュータあるいは該分散記憶媒体に振り分け、該分散記憶媒体と該データ転送手段とを接続する記憶媒体接続手段は、該分散記憶媒体に振り分けられたデータを、データ送出元の該他のノードが複数である場合に該他のノード毎に複数記憶媒体又は単一記憶媒体の複数記憶領域に書き込み、該他のノード毎に読み出すことを特徴とする分散記憶方法である。
【００１４】
請求項２に記載の分散記憶方法は、前記データ分割統合手段が、前記データ分割時に、前記データ保護構成により必要があれば冗長データを生成し、分割された該データと該冗長データの記憶場所を指定する情報、及び前記データ保護構成を指定する情報を保持するデータ分割蓄積情報を生成し、該冗長データが生成されている場合、該データ統合時にデータ欠損があった際には、該データ分割蓄積情報を参照して残りのデータと該冗長データにより欠損したデータを復元する。
【００１５】
請求項３に記載の発明は、互いに離れたノードにある分散記憶媒体に、別のノードにあるホストコンピュータのデータを通信路を介して分散して蓄積する分散記憶システムであって、各ノードにおいて、ホストコンピュータから分散記憶媒体へのデータ書き込み時にはデータを分割し、データ読み込み時にはデータを統合し、複数種類のデータ保護構成から１種類のデータ保護構成を指定するデータ分割統合手段と、一のノードのホストコンピュータ及び分散記憶媒体のデータを該一のノードから他のノードへ伸びる通信路に転送し、該他のノードから到着したデータが該一のノードのホストコンピュータへのデータかあるいは該一のノードに設置された分散記憶媒体へのデータかを判断し、該データを該ホストコンピュータあるいは該分散記憶媒体に振り分けるデータ転送手段と、該分散記憶媒体と該データ転送手段とを接続し、該分散記憶媒体に振り分けられたデータを、データ送出元の該他のノードが複数である場合に該他のノード毎に複数記憶媒体又は単一記憶媒体の複数記憶領域に書き込み、該他のノード毎に読み出す記憶媒体接続手段とを有する分散記憶システムである。
【００１６】
請求項４に記載の分散記憶システムは、前記データ分割統合手段が、前記データ分割時に、前記データ保護構成により必要があれば冗長データを生成する手段と、分割された該データと該冗長データの記憶場所を指定する情報、及び前記データ保護構成を指定する情報を保持するデータ分割蓄積情報を生成する手段と、該冗長データが生成されている場合、該データ統合時にデータ欠損があった際には、該データ分割蓄積情報を参照して残りのデータと該冗長データにより欠損したデータを復元する手段と、を有する。
【００１７】
請求項５に記載の発明は、互いに離れたノードにある分散記憶媒体に、別のノードにあるホストコンピュータのデータを通信路を介して分散して蓄積する分散記憶方法を各ノードのコンピュータに実行させる分散記憶プログラムを記録した記録媒体であって、前記コンピュータを、ホストコンピュータから分散記憶媒体へのデータ書き込み時にはデータを分割し、データ読み込み時にはデータを統合し、複数種類のデータ保護構成から１種類のデータ保護構成を指定するデータ分割統合手段、一のノードのホストコンピュータ及び分散記憶媒体のデータを該一のノードから他のノードへ伸びる通信路に転送し、該他のノードから到着したデータが該一のノードのホストコンピュータへのデータかあるいは該一のノードに設置された分散記憶媒体へのデータかを判断して該データを該ホストコンピュータあるいは該分散記憶媒体に振り分けるデータ転送手段、該分散記憶媒体とデータ転送手段とを接続し、該分散記憶媒体に振り分けられたデータを、データ送出元の該他のノードが複数である場合に該他のノード毎に複数記憶媒体又は単一記憶媒体の複数記憶領域に書き込み、該他のノード毎に読み出す記憶媒体接続手段、として機能させるための分散記憶プログラムを記録した記録媒体である。
【００１８】
請求項６に記載の分散記憶プログラムを記録した記録媒体は、前記データ分割統合手段が、前記データ分割時に、前記データ保護構成により必要があれば冗長データを生成する手段と、分割された該データと該冗長データの記憶場所を指定する情報、及び前記データ保護構成を指定する情報を保持するデータ分割蓄積情報を生成する手段と、該冗長データが生成されている場合、該データ統合時にデータ欠損があった際には、該データ分割蓄積情報を参照して残りのデータと該冗長データにより欠損したデータを復元する手段とを有する。
【００１９】
上記のように、本発明によれば、データ分割統合手段がデータ保護構成を指定し、指定したデータ保護構成に適合した冗長データを生成し、指定したデータ保護構成に適合した場所に該冗長データを格納することができるので、各ノード毎にデータ保護構成機能を選択して実現することができる。また、データ保護構成をＲＡＩＤ０〜６のいずれかとすることにより、各ノード毎にＲＡＩＤ０〜６を選択して実現することができる。
【００２０】
また、本発明によれば、各ノードにデータ分割統合手段があるので、地域災害によりあるノードが破壊されても別のノードのデータアクセス性が保持されるため、可用性が実現できる。また、データ転送手段は、他ノードから到着したデータが自ノードのファイルシステムのものか他ノードのファイルシステムのものかを判断し振り分ける機能を持つため、従来の技術では自ノードと他ノード分用意しなければならなかったデータ転送手段を共有でき、記憶媒体接続手段は、単一または複数媒体に接続でき、ノード別に単一または複数媒体の記憶領域を分割し、ノード別に分割データの書き込みと読み出しを行う機能を持つため、従来の技術では他ノード数分あった記憶媒体接続手段を共有できる。したがって、従来の技術においては重複して必要であった装置を削減することができ、コストの削減ができる。更に、各ノードは１つのデータ転送手段に接続するので、従来技術におけるチャネル型構成ではホストコンピュータと記憶媒体間毎に少なくとも一本ずつ必要であった通信路が、ノードとノードの間に少なくとも１本あればよい構成とすることができ、更なるコスト削減が可能となる。
【００２１】
【発明の実施の形態】
図１は本発明における第１の実施例の構成を示す図である。同図に示すように、第１の実施例はノードＡ１０、ノードＢ２０、及びノードＣ３０から構成され、各ノードは通信路１３０及び通信路１３１によりバス型ネットワーク接続されている。ノードＡ１０はホストコンピュータ１００、ホスト接続手段１０１、データ分割統合手段１０２、データ接続手段１０３、記憶媒体接続手段１０４、記憶媒体１０５、記憶媒体１０６から構成され、ノードＢはホストコンピュータ１１０、ホスト接続手段１１１、データ分割統合手段１１２、データ転送手段１１３、記憶媒体接続手段１１４、記憶媒体１１５、記憶媒体１１６から構成され、ノードＣはホストコンピュータ１２０、ホスト接続手段１２１、データ分割統合手段１２２、データ転送手段１２３、記憶媒体接続手段１２４、記憶装置１２５、記憶装置１２６から構成される。各手段の機能と、本実施例の動作については後述するとして、次に各手段の実現例について説明する。
【００２２】
ホスト接続手段に、例えば、ＳＣＳＩ（Small Computer System Interface ）ターゲット機能を持たせ、ホストコンピュータのインターフェースをＳＣＳＩイニシエータとして働かせることにより、ホストコンピュータとホスト接続手段との接続を行い、記憶媒体接続手段に、例えば、ＳＣＳＩイニシエータ機能を持たせ、ＳＣＳＩターゲット機能を持つ複数の記憶媒体をディジーチェーン方式で記憶媒体接続手段に接続する。また、通信路１３０、１３１としては、例えば、Fibre Channel を用いて、長距離大容量通信を行う。
【００２３】
ホスト接続手段とデータ分割統合手段とデータ転送手段と記憶媒体接続手段は、例えば、１つのＣＰＵ（Central Processing Unit）を持つハードウェアにＳＣＳＩイニシエータ機能とＳＣＳＩターゲット機能とFibre Channel 接続機能を付加して実現する。また、データ分割統合手段には、例えば、ＲＡＩＤ（Redundant Array of Independent Disks）機能を持たせ、記憶媒体には、例えば、ハードディスク装置や光磁気ディスク装置等の記憶装置を使用する。
【００２４】
図２は図１の構成のうち、ノードＡ１０のホストコンピュータ１００からのデータ書き込みと読み出しに関係する構成要素のみを記述した図である。図１に示す構成では、それぞれのノードにホストコンピュータを持ち、それぞれのホストコンピュータは他のノードとは独立にデータへのアクセスができるが、ここからは説明を簡単にするために、図２を用いる。なお、ノードＢ２０、ノードＣ３０におけるそれぞれのホストコンピュータのデータアクセス動作はノードＡ１０のホストコンピュータ１００のデータアクセス動作と同様である。
【００２５】
また、図２においては、ホストコンピュータ１００から見た仮想記憶装置１４１、ノードＡ１０のデータ分割統合手段１０２から見たノードＢ２０の仮想記憶装置１４２、ノードＡ１０のデータ分割統合手段１０２から見たノードＣ３０の仮想記憶装置１４３を記載している。
次に前述した図１における各手段の機能を図２を用いて説明する。なお、以下図２を用いて説明する機能は、図１に示す各手段の機能と同様である。
【００２６】
図２中、ホストコンピュータ１００においてホスト接続手段１０１は、仮想記憶装置１４１の入出力を行うものとみなされるため、ホストコンピュータ１００は、データの書き込みと読み出しを行う際には、仮想記憶装置１４１の論理的なアドレスを指定する。このアドレスは、例えばＳＣＳＩのＬＢＡ（Logical Block Address ）を使用する。ホスト接続手段１０１はこの論理的なアドレスをホストコンピュータ１００から受け、データ分割統合手段１０２に送る機能を有する。
【００２７】
また、ノードＡ１０のデータ分割統合手段１０２においてはノードＢ２０に仮想記憶装置１４２が接続され、ノードＣ３０に仮想記憶装置１４３が接続されているものとみなされるため、ノードＡ１０のデータ分割統合手段１０２は、各仮想記憶装置１４２、１４３へのデータの書き込みと読み出しを行う際には、１４２と１４３の論理的なアドレスを指定する。
【００２８】
したがって、ノードＡ１０のデータ分割統合手段１０２は、データアクセスの際に、ノードＡ１０のホストコンピュータ１００から見た仮想記憶装置１４１の論理的なアドレスをホスト接続手段１０１から受け、そのアドレスをデータ分割統合手段１０２から見たノードＢ２０とノードＣ３０の仮想記憶装置１４２、１４３の論理的なアドレスに変換を行い、変換したアドレスをデータ転送手段に送る機能を有する。また、データ分割統合手段１０２は、仮想記憶装置１４１の論理的なアドレスと仮想記憶１４２、１４３の論理的なアドレスの対応付けや、冗長データを蓄積しておくアドレスのための表または算出式を、データ分割蓄積情報として保持する機能を有し、更に、データ分割蓄積情報のバックアップ情報を蓄積しておくための仮想記憶装置１４２、１４３の論理的なアドレスを保持する機能を有している。
【００２９】
データ転送手段１０３は各ノード宛のパケットのヘッダを作成し、パケットを通信路に送る機能を有する。また、データ転送手段は自ノード宛のパケットを受け取り、アドレスとデータを記憶媒体に送る機能も有する。
ノードＢ２０の記憶媒体接続手段１１４は、データ転送手段１１３からノードＢ２０の仮想記憶装置１４２の論理的なアドレスを受け、そのアドレスを、ノードＢ２０の記憶媒体１１５、１１６のＩＤと論理的なアドレスへの変換を行う機能を有する。ノードＣ３０についてもノードＢ２０と同様に、ノードＣ３０の記憶媒体接続手段１２４では、ノードＣ３０の仮想記憶装置１４３の論理的なアドレスからノードＣ３０の記憶媒体１２５、１２６へのＩＤと論理的なアドレスへの変換を行う。記憶媒体１１５、１１６、１２５、１２６の論理的なＩＤとアドレスとしては、例えば、ＳＣＳＩのＩＤとＬＢＡを使用する。
【００３０】
続いて、図２に示す構成を用いて第１の実施例におけるデータ書き込みの動作を、図３に示すフローチャートを用いて説明する。
ステップ１）ノードＡ１０のホストコンピュータ１００は、ノードＡ１０のホスト接続手段１０１へ、仮想記憶装置１４１の論理的なアドレスとデータを送る。
【００３１】
ステップ２）ノードＡ１０のホスト接続手段１０１は、データ分割統合手段１０２へ仮想記憶装置１４１の論理的なアドレスとデータを送る。
ステップ３）ノードＡのデータ分割統合手段１０２は、データ分割蓄積情報に従って、仮想記憶装置１４１の論理的なアドレスを仮想記憶装置１４２、１４３の論理的なアドレスへ変換する。また、冗長データが必要な場合は冗長なデータを生成し、データ分割蓄積情報に従って、仮想記憶装置１４２、１４３の論理的なアドレスを付与する。その後、仮想記憶装置のＩＤと変換後のアドレスと分割データと冗長データをノードＡ１０のデータ転送手段１０３へ送る。
【００３２】
ステップ４）ノードＡ１０のデータ転送手段１０３は、仮想記憶装置のＩＤを基に各ノード宛のパケットのヘッダを作成し、アドレスと分割データまたは冗長データをつけたパケットを通信路１３０、１３１へ送信する。
ステップ５）ノードＢ２０ではデータ転送手段１１３で、ノードＣ３０ではデータ転送手段１２３で、それぞれ自ノード宛のパケットを受け取る。そして、受け取ったアドレスとデータをそれぞれ記憶媒体接続手段１１４、１２４に送る。
【００３３】
ステップ６）ノードＢ２０の記憶媒体接続手段１１４は仮想記憶装置１４２のアドレスを記憶媒体１１５、１１６のＩＤと論理的なアドレスに変換し、ノードＣ３０の記憶媒体接続手段１２４は仮想記憶装置１４３のアドレスを記憶媒体１２５、１２６のＩＤと論理的なアドレスに変換し、それぞれのＩＤとアドレスに従ってデータをそれぞれの記憶媒体に送り、書き込みを行う。
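The two address translations in steps 3 and 6 can be illustrated with a minimal Python sketch. The device and medium numbers (141, 142, 143, 115, 116, 125, 126) come from the description; the interleaving rule, the `blocks_per_medium` capacity, and both function names are hypothetical choices made only for illustration, since the description says only that a table or calculation formula is used.

```python
# Hypothetical sketch: the patent does not fix a concrete mapping rule.

def split_address(lba_141):
    """Step 3 (assumed rule): map an LBA of virtual device 141 to
    (virtual device ID, LBA) by interleaving blocks over 142 and 143."""
    device = 142 if lba_141 % 2 == 0 else 143
    return device, lba_141 // 2


def to_medium(device, lba, blocks_per_medium=1000):
    """Step 6 (assumed rule): map an LBA of virtual device 142 or 143 to
    (storage medium ID, LBA) by concatenating the node's two media."""
    media = {142: (115, 116), 143: (125, 126)}[device]
    index, offset = divmod(lba, blocks_per_medium)
    return media[index], offset  # raises IndexError beyond total capacity


device, remote_lba = split_address(5)       # block 5 of virtual device 141
medium, medium_lba = to_medium(device, remote_lba)
```

Because only the storage medium connection means knows the second mapping, media can be added at a node without changing the host-side translation.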
【００３４】
次に、データを読み出す場合の動作を、図４に示すフローチャートを用いて説明する。
ステップ１１）ノードＡ１０のホストコンピュータ１００は、ノードＡ１０のホスト接続手段１０１へ、読み出したい仮想記憶装置１４１の論理的なアドレスを送る。
【００３５】
ステップ１２）ノードＡ１０のホスト接続手段１０１は、データ分割統合手段１０２へ、読み出したい仮想記憶装置１４１の論理的なアドレスを送る。
ステップ１３）ノードＡ１０のデータ分割統合手段１０２は、データ分割蓄積情報に従って、仮想記憶装置１４１の論理的なアドレスを仮想記憶装置１４２、１４３の論理的なアドレスへ変換する。その後、仮想記憶装置のＩＤと変換後のアドレスをノードＡのデータ転送手段１０３へ送る。
【００３６】
ステップ１４）ノードＡ１０のデータ転送手段１０３は、仮想記憶装置のＩＤを基に各ノード宛のパケットのヘッダを作成し、アドレスをつけたパケットを通信路１３０、１３１へ送信する。
ステップ１５）ノードＢ２０ではデータ転送手段１１３で、ノードＣ３０ではデータ転送手段１２３で、自ノード宛のパケットを受け取る。そして、アドレスをそれぞれ記憶媒体接続手段１１４、１２４に送る。
【００３７】
ステップ１６）ノードＢ２０の記憶媒体接続手段１１４は仮想記憶装置１４２のアドレスを記憶媒体１１５、１１６のＩＤと論理的なアドレスに変換し、ノードＣ３０の記憶媒体接続手段１２４は仮想記憶装置１４３のアドレスを記憶媒体１２５、１２６のＩＤと論理的なアドレスに変換し、それぞれのＩＤとアドレスに従ってデータをそれぞれの記憶媒体から読み出す。記憶媒体１１５、１１６から読み出したデータと仮想記憶装置１４２のアドレスをデータ転送手段１１３へ返し、記憶媒体１２５、１２６から読み出したデータと仮想記憶装置１４３のアドレスをデータ転送手段１２３へ返す。
【００３８】
ステップ１７）ノードＢ２０のデータ転送手段１１３は、仮想記憶装置１４２のＩＤとアドレスとデータから、ノードＡ１０宛に返すパケットを生成し、ノードＣ３０のデータ転送手段１２３は、仮想記憶装置１４３のＩＤとアドレスとデータから、ノードＡ１０宛に返すパケットを生成し、それぞれ通信路１３０、１３１へ返信する。
【００３９】
ステップ１８）ノードＡ１０のデータ転送手段１０３は、ノードＢ２０とノードＣ３０からのパケットを受け取り、ＩＤとアドレスとデータをノードＡ１０のデータ分割統合手段１０２へ返す。
ステップ１９）ノードＡ１０のデータ分割統合手段１０２は、データ分割蓄積情報を使用して仮想記憶装置１４２、１４３のＩＤとアドレスを仮想記憶装置１４１のアドレスに変換し、ノードＢ２０、Ｃ３０から届いた分割データを統合する。統合したデータをノードＡ１０のホスト接続手段１０１へ返す。
【００４０】
ステップ２０）ノードＡ１０のホスト接続手段１０１は受け取ったデータをノードＡ１０のホストコンピュータ１００へ返す。
なお、図２では、各ノードに記憶媒体は２つづつ存在する構成となっているが、各ノードの記憶媒体接続手段１１４、１２４は各ノードの仮想記憶装置の論理的なアドレスを記憶媒体のＩＤとアドレスに変換するだけであるので、記憶媒体の数は２つに限られず、必要な容量にあわせて自由に選ぶことができる。前述したように、上記で説明したホストコンピュータからのデータ書き込みと読み出しの動作は、図１におけるノードＢ２０のホストコンピュータ１１０とノードＣ３０のホストコンピュータ１２０においても同様である。
【００４１】
また、図１に示す構成では、ノード数が３であるが、ノード数が２以上であればノード数が３の場合と同様な動作と効果を得ることができる。
上記で説明したような構成となっているので、ノードごとに異なるファイルシステムを有する分散記憶装置において、各ノードはホスト接続手段とデータ分割統合手段と記憶媒体接続手段とデータ転送手段を１つにでき、また、チャネル型の構成であっても通信路を共有することが可能となる。その効果として、以下に示すようなノード数にほぼ比例したコストで装置を提供することが可能となる。
【００４２】
Ｃ_Total
= ｛( Ｃ_HC+ Ｃ_DI+ Ｃ_T+ Ｃ_T+ Ｃ_MC) ×Ｎ｝+ ｛Ｃ_C×( Ｎ-1) ｝
= ＮＣ_HC+ ＮＣ_DI+ ＮＣ_T+ ＮＣ_T+ ＮＣ_MC+(Ｎ-1) Ｃ_C
= ＮＣ_HC+ ＮＣ_DI+ 2 ＮＣ_T+ ＮＣ_MC+(Ｎ-1) Ｃ_C
ただし、Ｃ_Totalはトータルコスト、Ｃ_HCはホスト接続手段のコスト、Ｃ_DIはデータ分割統合手段のコスト、Ｃ_Tはデータ転送手段のコスト、Ｃ_MCは記憶媒体接続手段のコスト、Ｃ_Cは通信路のコスト、Ｎはノード数を表す。
【００４３】
比較対象として従来の技術における図７において説明した一般的なＲＡＩＤ装置を使ってバス型ネットワーク接続を使用した同一の機能を実現する構成のコストを以下の式に示す。

ただし、Ｃ_Totalはトータルコスト、Ｃ_HCはホスト接続手段のコスト、Ｃ_DIはデータ分割統合手段のコスト、Ｃ_Tはデータ転送手段のコスト、Ｃ_MCは記憶媒体接続手段のコスト、Ｃ_Cは通信路のコスト、Ｎはノード数を表す。
【００４４】
後者から前者を引いた差をとって、Ｎ（Ｎ−１）Ｃ_T＋Ｎ（Ｎ−２）Ｃ_MCのコストダウンが図れることがわかる。この結果が示すように、従来の技術にくらべて、コストを低く押さえることができる。
また、データ分割統合手段はＲＡＩＤ０〜６に対応しているため、各ノードはＲＡＩＤ０〜６の機能を実現でき、各ノードにデータ分割統合手段があるので、地域災害によりあるノードが破壊されても別のノードのデータアクセス性は保持され、可用性が実現できる。更に、各ノードにホスト接続手段を持つので、複数コンピュータからのアクセスが可能となる。
【００４５】
次に、可用性を示すためにノード障害による復旧の動作について、図２を用いて、図１の構成においてノードＡ１０のホストコンピュータ１００が分散記憶媒体に蓄積しているデータに対して障害が起こった場合について説明する。
障害の発生場所としては、ノードＢ２０又はノードＣ３０での発生、及びノードＡ１０での発生の２通りある。まず、ノードＢ２０で発生した場合を説明する。なお、ノードＣ３０で発生しても同様である。
【００４６】
データ分割統合手段１０２がＲＡＩＤレベル１を選択している場合、データ分割統合手段１０２がノードＢ２０にアクセスできなくなったことを検知すると、データ分割統合手段１０２はノードＣ３０にのみアクセスをする。これにより
ノードＢ２０障害時においてもデータアクセス可能である。ノードＢ２０が復旧した際には、ノードＢ２０が復旧した旨の情報をノードＡ１０が検知し、データ分割蓄積情報に従ってノードＣ３０のデータを基にノードＢ２０のデータを復元し、ノードＢ２０の記憶媒体に格納することでデータ復旧ができる。復旧中は、ホストコンピュータ１００からのデータ書き込みによるデータの追加や変更もデータ分割統合手段１０２にて復元動作に組み入れるため、復帰中もデータのアクセスが可能である。ＲＡＩＤレベル２〜６についての動作は上記に説明した動作と同様である。
【００４７】
次に、ノードＡ１０で障害が発生し、その障害が復旧した後の動作について説明する。
データ分割統合手段１０２は、分割されたデータの位置情報やＲＡＩＤレベル等が書き込まれたデータ分割蓄積情報が消失しているため、ノードＢ２０又はノードＣ３０から記憶媒体の所定の位置にあるバックアップされたデータ分割蓄積情報を取得する。これによりノードＡ１０から各ノードへのアクセスが可能となる。ノードＡ１０の障害では、ホストコンピュータ１００に対するデータの消失は無いので、ＲＡＩＤレベル０〜６を使用した復旧が可能である。
【００４８】
図５は本発明の第２の実施例の構成を示す図である。同図に示すように、第２の実施例は、第１の実施例と同様にノードＡ１０、ノードＢ２０、及びノードＣ３０から構成され、各ノード内の構成も接続方法にかかわる部分を除いて第１の実施例と同一である。第２の実施例では、第１の実施例と異なり、各ノードがチャネル型ネットワーク接続により接続されている。すなわち、ノードＡ１０とノードＢ２０は通信路１５０により接続され、ノードＡ１０とノードＣ３０は通信路１５１により接続され、ノードＢ２０とノードＣ３０は通信路１５２により接続されている。
【００４９】
図６は図５の構成のうち、ノードＡ１０のホストコンピュータ１００からのデータ書き込みと読み出しに関係する構成要素のみを記述した図である。すなわち、図５に示す構成では、それぞれのノードにホストコンピュータを持ち、それぞれのホストコンピュータは他のノードとは独立にデータへのアクセスができるが、図６に示す構成ではホストコンピュータ１００からのデータ書き込みと読み出しに関係する構成要素のみ抜き出して記述している。なお、ノードＢ２０、ノードＣ３０におけるそれぞれのホストコンピュータのデータアクセス動作はノードＡ１０のホストコンピュータ１００のデータアクセス動作と同様である。
【００５０】
また、図６においては、ホストコンピュータ１００から見た仮想記憶装置１４１、ノードＡ１０のデータ分割統合手段１０２から見たノードＢ２０の仮想記憶装置１４２、ノードＡ１０のデータ分割統合手段１０２から見たノードＣ３０の仮想記憶装置１４３を記載している。
図５に示す各手段の実現例、機能は第１の実施例と同様である。また、図６に示す構成の動作は、図２で説明した第１の実施例における動作と同様である。すなわち、バス型ネットワークあるいはチャネル型ネットワークノードのどちらを接続手段として利用しても、本発明による分散記憶システムは同様の動作を行う。
【００５１】
第２の実施例に示す構成においても第１の実施例と同様に、ノードごとに異なるファイルシステムを有する分散記憶装置においては、各ノードはホスト接続手段とデータ分割統合手段と記憶媒体接続手段とデータ転送手段を１つにでき、また、通信路を共有することが可能となる。ここで、第２の実施例のコスト面の効果を示すと、以下のように、通信路以外はノード数にほぼ比例した低いコストで装置を提供することが可能となる。
【００５２】
Ｃ_Total
= ｛( Ｃ_HC+ Ｃ_DI+ Ｃ_T+ Ｃ_T+ Ｃ_MC) ×Ｎ｝+ ｛Ｃ_C×N(Ｎ-1) ｝
= ＮＣ_HC+ ＮＣ_DI+ ＮＣ_T+ ＮＣ_T+ ＮＣ_MC+N( Ｎ-1) Ｃ_C
= ＮＣ_HC+ ＮＣ_DI+ 2 ＮＣ_T+ ＮＣ_MC+N( Ｎ-1) Ｃ_C
ただし、Ｃ_Totalはトータルコスト、Ｃ_HCはホスト接続手段のコスト、Ｃ_DIはデータ分割統合手段のコスト、Ｃ_Tはデータ転送手段のコスト、Ｃ_MCは記憶媒体接続手段のコスト、Ｃ_Cは通信路のコスト、Ｎはノード数を表す。
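As a worked illustration, the two total-cost formulas given for the first (bus-type) and second (channel-type) embodiments can be evaluated with a small Python function. This is a sketch that follows the formulas exactly as printed, including the 2·C_T term per node and the N(N−1) communication-path term for the channel type; the function name, parameter names, and the unit costs in the example are assumptions for illustration only.

```python
def total_cost(n, c_hc, c_di, c_t, c_mc, c_c, topology="bus"):
    """Evaluate C_Total as printed in the description.

    n     : number of nodes (N)
    c_hc  : cost of host connection means (C_HC)
    c_di  : cost of data division and integration means (C_DI)
    c_t   : cost of data transfer means (C_T)
    c_mc  : cost of storage medium connection means (C_MC)
    c_c   : cost of one communication path (C_C)
    """
    per_node = c_hc + c_di + 2 * c_t + c_mc
    if topology == "bus":
        paths = n - 1            # first embodiment: (N-1) C_C
    elif topology == "channel":
        paths = n * (n - 1)      # second embodiment: N(N-1) C_C, as printed
    else:
        raise ValueError("topology must be 'bus' or 'channel'")
    return n * per_node + paths * c_c


# Example with three nodes and every unit cost set to 1 (made-up values).
bus_cost = total_cost(3, 1, 1, 1, 1, 1, topology="bus")
channel_cost = total_cost(3, 1, 1, 1, 1, 1, topology="channel")
```

In both cases every term except the communication paths grows linearly with the number of nodes.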
【００５３】
比較対象として従来の技術において図８で説明した一般的なＲＡＩＤ装置を使ってチャネル型ネットワーク接続を使用した同一の機能を実現する構成についてのコストを以下の式に示す。

ただし、Ｃ_Totalはトータルコスト、Ｃ_HCはホスト接続手段のコスト、Ｃ_DIはデータ分割統合手段のコスト、Ｃ_Tはデータ転送手段のコスト、Ｃ_MCは記憶媒体接続手段のコスト、Ｃ_Cは通信路のコスト、Ｎはノード数を表す。
【００５４】
後者から前者を引いた差をとると、Ｎ（Ｎ−１）Ｃ_T＋Ｎ（Ｎ−２）Ｃ_MCのコストダウンが図れることがわかる。この結果が示すように、従来の技術にくらべて、コストが低く押さえられる。通信路については前者は後者に比較して１／２のコストとなる。
また、第１の実施例と同様に、第２の実施例はデータ分割統合手段はＲＡＩＤ０〜６に対応しているため、各ノードはＲＡＩＤ０〜６の機能を実現でき、各ノードにデータ分割統合手段があるので、地域災害によりあるノードが破壊されても別のノードのデータアクセス性は保持され、可用性が実現できる。更に、各ノードにホスト接続手段を持つので、複数コンピュータからのアクセスが可能となる。
【００５５】
次に、本発明における分散記憶プログラムを記録した記録媒体の実施例について説明する。図９は、ＣＰＵ６００、メモリ６０１、外部記憶装置６０２、ディスプレイ６０３、キーボード６０４、通信処理装置６０５を備えたコンピュータシステムの構成図であり、本発明における分散記憶プログラムを記録した記録媒体は図９に示すメモリ６０１又は外部記憶装置６０２のいずれか又は両方に相当する。また、光磁気ディスク、磁気ディスク、磁気テープ等の可搬媒体、又は電子メモリ、ハードディスク等も本発明の記録媒体に相当し、これらの記録媒体に格納された本発明の手段を有する分散記憶プログラムを、図９に示すコンピュータシステムにローディングし、該コンピュータシステムをホストコンピュータ又は記憶媒体のいずれか又は両方に接続させ、他ノードの本発明による方法を使用したシステムと通信路を介して接続することにより、該コンピュータシステムにおいて上記の分散記憶方法の使用が可能となる。
【００５６】
なお、本発明は上記の実施例に限定されることなく、特許請求の範囲内で種々変更・応用が可能である。
【００５７】
【発明の効果】
上記のように、本発明によれば、データ分割統合手段がデータ保護構成を指定し、指定したデータ保護構成に適合した冗長データを生成し、指定したデータ保護構成に適合した場所に該冗長データを格納することができるので、各ノード毎にデータ保護構成機能を選択して実現することができる。また、データ保護構成をＲＡＩＤ０〜６のいずれかとすることにより、各ノード毎にＲＡＩＤ０〜６を選択して実現することができる。
【００５８】
また、本発明によれば、各ノードにデータ分割統合手段があるので、地域災害によりあるノードが破壊されても別のノードのデータアクセス性が保持されるため、可用性が実現できる。更に、各ノードにおけるデータ転送手段はデータ振り分け機能を持ち、データ転送手段に接続された記憶媒体接続手段はノード毎に割り当てられた記憶媒体又は記憶領域にデータアクセスすることができるため、単一のデータ転送手段及び記憶媒体接続手段により自ノードのホストコンピュータ及び他ノードの複数ホストコンピュータからのアクセスを受ける複数の記憶媒体を自ノードに設けることが可能となり、コスト削減が可能である。また、チャネル型の構成でも通信路を共有させることができ、更なるコスト削減が可能である。
【図面の簡単な説明】
【図１】本発明における第１の実施例の構成を示す図である。
【図２】本発明における第１の実施例の動作説明のための図である。
【図３】本発明における第１の実施例のホストコンピュータの書き込み動作を示すフローチャートである。
【図４】本発明における第１の実施例のホストコンピュータの読み込み動作を示すフローチャートである。
【図５】本発明における第２の実施例の構成を示す図である。
【図６】本発明における第２の実施例の動作をわかりやすく示すための図である。
【図７】従来の技術における分散記憶装置の第１の例である。
【図８】従来の技術における分散記憶装置の第２の例である。
【図９】本発明の記録媒体の実施例におけるコンピュータシステムの構成図である。
【符号の説明】
１０ノードＡ
２０ノードＢ
３０ノードＣ
１００、１１０、１２０、５００、５１０、５２０ホストコンピュータ
１０１、１１１、１２１、５０１、５１１、５２１ホスト接続手段
１０２、１１２、１２２データ分割統合手段
５０２、５１２、５２２ＲＡＩＤ制御手段
１０３、１１３、１２３、５０３、５１３、５２３データ転送手段
５０６、５１６、５２６、５０９、５１９、５２９データ転送手段
１０４、１１４、１２４、５０５、５１５、５２５記憶媒体接続手段
５０８、５１８、５２８記憶媒体接続手段
５０４、５１４、５２４、５０７、５１７、５２７記憶媒体
１３０、１３１、５３０、５３１通信路
５４０、５４１、５４２、５４３、５４４通信路
６００ＣＰＵ
６０１メモリ
６０２外部記憶装置
６０３ディスプレイ
６０４キーボード
６０５通信処理装置
[0001]
[Technical field of the invention]
The present invention relates to a distributed storage method, a distributed storage system, and a recording medium on which a distributed storage program is recorded, and in particular to ones that have a data protection function against regional disasters such as earthquakes and floods and in which the data protection configuration can be selected for each node.
[0002]
[Prior art]
An example of a prior-art distributed storage device that is resistant to regional disasters is disclosed in, for example, Japanese Patent Laid-Open No. 5-324579. The distributed storage device disclosed there is characterized in that
(1) it has three or more devices in total, counting the file devices that store data and the redundant data storage devices that store redundant data as parity data, and
(2) it has one or more redundant data storage devices.
[0003]
Because of these characteristics, even if a file device at one node is destroyed by a natural disaster or the like, the data stored in the destroyed file device can be restored from the redundant data stored in a redundant data storage device at another node and the data stored in the file devices that are still operating normally.
As for the parity data, when there are four file devices, for example, the parity data p0 is calculated from the stripes a0, a1, a2, and a3 of the file devices by the following formula.
[0004]
p0 = a0 + a1 + a2 + a3 (mod2)
Here, “+” represents modulo 2 addition.
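The parity formula above amounts to a bitwise XOR across the stripes. The following Python sketch is offered only as an illustration and is not taken from the cited publication: it computes p0 from four stripes and rebuilds one lost stripe from the survivors and the parity. The function names and the sample byte values are assumptions.

```python
from functools import reduce


def xor_parity(stripes):
    """Bytewise modulo-2 sum (XOR) of equal-length stripes."""
    length = len(stripes[0])
    if any(len(s) != length for s in stripes):
        raise ValueError("stripes must have equal length")
    # zip(*stripes) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*stripes))


def recover_stripe(surviving, parity):
    """Rebuild the single missing stripe from the others plus the parity."""
    return xor_parity(list(surviving) + [parity])


# Sample stripes (made-up values).
a0, a1, a2, a3 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa", b"\x00\xff"
p0 = xor_parity([a0, a1, a2, a3])

# Suppose the file device holding a2 is destroyed.
rebuilt_a2 = recover_stripe([a0, a1, a3], p0)
```

Recovery works because adding the same value twice under modulo-2 addition cancels it, so summing the surviving stripes with p0 leaves only the missing stripe.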
Another example of a prior-art distributed storage device with regional disaster tolerance is a general RAID (Redundant Array of Independent Disks) device. Such a device uses one RAID controller that supervises the disk devices and two or more disk devices that store data or redundant data. By using, for example, Fibre Channel SCSI as the RAID controller interface, the devices can be distributed over a wide area, which gives disaster tolerance for data retention.
[0005]
FIG. 7 shows a first example of a prior-art distributed storage device, in which a plurality of RAIDs are constructed independently so that a plurality of computers can access them. As shown in the figure, the first example consists of node A 10, node B 20, and node C 30, and the nodes are connected in a bus configuration by communication paths 530 and 531. At node A 10, the host computer 500 is connected to the bus via host connection means 501, RAID control means 502, and data transfer means 503; the storage medium 504 is connected to the bus via storage medium connection means 505 and data transfer means 506; and the storage medium 507 is connected to the bus via storage medium connection means 508 and data transfer means 509. Node B 20 and node C 30 are configured in the same way as node A 10. The storage medium connection means and data transfer means attached to a storage medium are often implemented by a single computer, while the host connection means, RAID control means, and data transfer means attached to a host computer are either built into the host computer or implemented by a separate computer.
[0006]
In this configuration, for example, the host computer 500 uses the storage media 524 and 517 as the distributed storage media constituting RAID2, the host computer 510 uses the storage media 504 and 527 as the distributed storage media constituting RAID2, and the host computer 520 uses the storage media 507 and 514 as the distributed storage media constituting RAID2.
[0007]
FIG. 8 shows a second example of a prior-art distributed storage device, which is likewise an example in which a plurality of RAIDs are constructed independently so that a plurality of computers can access them.
As shown in the figure, the second example consists of node A 10, node B 20, and node C 30. At node A 10, the host computer 500 is connected to communication path 541 via host connection means 501, RAID control means 502, and data transfer means 503, and is thereby connected to storage medium 524 at node C 30 via data transfer means 526 and storage medium connection means 525. The host computer 500 is likewise connected to communication path 540 via host connection means 501, RAID control means 502, and data transfer means 503, and is thereby connected to storage medium 517 at node B 20 via data transfer means 519 and storage medium connection means 518. Host computers 510 and 520 are connected to storage media in the same manner. The storage medium connection means and data transfer means attached to a storage medium are often implemented by a single computer, while the host connection means, RAID control means, and data transfer means attached to a host computer are either built into the host computer or implemented by a separate computer.
[0008]
In this configuration, for example, the host computer 500 uses the storage media 524 and 517 as the distributed storage media constituting RAID2, the host computer 510 uses the storage media 504 and 527 as the distributed storage media constituting RAID2, and the host computer 520 uses the storage media 507 and 514 as the distributed storage media constituting RAID2.
[0009]
[Problems to be solved by the invention]
However, the device disclosed in Japanese Patent Laid-Open No. 5-324579 presents only configurations that have parity data, and therefore has the drawback that it cannot be configured as RAID level 1 (mirroring), which has no parity data and offers the highest reliability and availability.
Moreover, to raise both speed and reliability, RAID techniques are commonly combined, for example by preparing a plurality of RAID level 0 disk array devices, treating each as a single disk device, and combining them using RAID level 1. Because the device disclosed in Japanese Patent Laid-Open No. 5-324579 presents only configurations that have parity data, it has the drawback that it cannot be configured as RAID level 0 (striping), which is valued for its high speed. In addition, because that device calculates parity data by modulo-2 addition, it has the drawback that it cannot be configured as RAID level 2, which uses a Hamming code.
[0010]
RAID level 6 with one-dimensional redundancy requires two kinds of redundant data generation algorithm. If modulo-2 addition is chosen as the first, the second must be something other than modulo-2 addition, for example a Reed-Solomon error-correcting coding method. Because the device disclosed in Japanese Patent Laid-Open No. 5-324579 supports only modulo-2 addition, it has the drawback that it cannot be configured as RAID level 6 with one-dimensional redundancy.
[0011]
Further, RAID level 6 with two-dimensional redundancy requires redundant data to be generated twice (for example, by modulo-2 addition) for each data write. Because the device disclosed in Japanese Patent Laid-Open No. 5-324579 generates redundant data only once, it has the drawback that it cannot be configured as RAID level 6 with two-dimensional redundancy.
Furthermore, in the prior art, a general RAID device has only one RAID controller, so if a natural disaster or the like strikes the area where the RAID controller is located and destroys it, the entire system stops. Such a device therefore cannot have availability against regional disasters, that is, the property that data remains accessible even when a device is destroyed. Because a general RAID device has only one RAID controller, it also cannot be accessed from a plurality of computers. To allow access from a plurality of computers, a plurality of RAID devices must be used, and because each device is built completely independently, the cost becomes high. That is, as shown in FIGS. 7 and 8, when each node has its own file system and the data of each node is stored in a distributed manner on the other nodes with redundancy, the cost of the data transfer means and storage medium connection means in the prior art grows in proportion to the square of the number of nodes, so a large investment is needed to build the system.
[0012]
The present invention has been made in view of the above points. Its object is to solve the problems described above and to provide a distributed storage method, a distributed storage system, and a recording medium on which a distributed storage program is recorded, in which a data protection function including RAID level 0 and RAID levels 1 to 6 can be selected and configured for each node, which have availability against regional disasters, which allow access from a plurality of computers, and which are low-cost because they contain no duplicated devices.
[0013]
[Means for Solving the Problems]
The invention according to claim 1 is a distributed storage method for storing data of a host computer at a different node, in a distributed manner via communication paths, on distributed storage media at nodes located apart from one another, wherein, at each node: one data protection configuration is designated from among a plurality of types of data protection configuration by data division and integration means, which divides data when data is written from the host computer to the distributed storage media and integrates data when data is read; data transfer means, which transfers data of the host computer and the distributed storage medium of one node onto the communication path extending from that node to another node, determines whether data arriving from the other node is destined for the host computer of the one node or for the distributed storage medium installed at the one node, and routes the data to the host computer or the distributed storage medium accordingly; and storage medium connection means, which connects the distributed storage medium and the data transfer means, writes the data routed to the distributed storage medium, when there are a plurality of other nodes sending data, to a plurality of storage media or to a plurality of storage areas of a single storage medium separately for each such other node, and reads it out separately for each such other node.
[0014]
In the distributed storage method according to claim 2, the data division and integration means, when dividing the data, generates redundant data if the data protection configuration requires it, and generates data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data and information specifying the data protection configuration; and, when the redundant data has been generated and data is found to be missing at the time of data integration, restores the missing data from the remaining data and the redundant data by referring to the data division accumulation information.
[0015]
The invention according to claim 3 is a distributed storage system for storing data of a host computer at a different node, in a distributed manner via communication paths, on distributed storage media at nodes located apart from one another, the system having, at each node: data division and integration means that divides data when data is written from the host computer to the distributed storage media, integrates data when data is read, and designates one data protection configuration from among a plurality of types of data protection configuration; data transfer means that transfers data of the host computer and the distributed storage medium of one node onto the communication path extending from that node to another node, determines whether data arriving from the other node is destined for the host computer of the one node or for the distributed storage medium installed at the one node, and routes the data to the host computer or the distributed storage medium accordingly; and storage medium connection means that connects the distributed storage medium and the data transfer means, writes the data routed to the distributed storage medium, when there are a plurality of other nodes sending data, to a plurality of storage media or to a plurality of storage areas of a single storage medium separately for each such other node, and reads it out separately for each such other node.
[0016]
In the distributed storage system according to claim 4, the data division and integration means has: means for generating redundant data, when dividing the data, if the data protection configuration requires it; means for generating data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data and information specifying the data protection configuration; and means for restoring, when the redundant data has been generated and data is found to be missing at the time of data integration, the missing data from the remaining data and the redundant data by referring to the data division accumulation information.
[0017]
The invention according to claim 5 is a recording medium on which is recorded a distributed storage program that causes the computer at each node to execute a distributed storage method for storing data of a host computer at a different node, in a distributed manner via communication paths, on distributed storage media at nodes located apart from one another, the program causing the computer to function as: data division and integration means that divides data when data is written from the host computer to the distributed storage media, integrates data when data is read, and designates one data protection configuration from among a plurality of types of data protection configuration; data transfer means that transfers data of the host computer and the distributed storage medium of one node onto the communication path extending from that node to another node, determines whether data arriving from the other node is destined for the host computer of the one node or for the distributed storage medium installed at the one node, and routes the data to the host computer or the distributed storage medium accordingly; and storage medium connection means that connects the distributed storage medium and the data transfer means, writes the data routed to the distributed storage medium, when there are a plurality of other nodes sending data, to a plurality of storage media or to a plurality of storage areas of a single storage medium separately for each such other node, and reads it out separately for each such other node.
[0018]
7. The recording medium on which the distributed storage program according to claim 6 is recorded, wherein the data division and integration means includes: means for generating redundant data, if the data protection configuration requires it, at the time of data division; means for generating data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data and information specifying the data protection configuration; and means for restoring, when redundant data has been generated and data is found to be lost at the time of data integration, the lost data from the remaining data and the redundant data by referring to the data division accumulation information.
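The generation of redundant data and the restoration of lost data can be illustrated with a minimal sketch. It assumes single-parity redundancy computed as the modulo-2 (XOR) sum that the prior-art discussion gives for parity data; the function names `xor_blocks` and `restore_lost` are hypothetical and are not taken from the patent, which leaves the concrete redundancy scheme to the selected data protection configuration.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Return the byte-wise modulo-2 sum (XOR) of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def restore_lost(stripes: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost stripe (marked None) from survivors and parity."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can restore only one lost stripe")
    result = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        # XOR of the surviving stripes and the parity yields the lost stripe.
        result[missing[0]] = xor_blocks(survivors + [parity])
    return result
```

For example, with three stripes and their parity, dropping the middle stripe and calling `restore_lost` returns the original three stripes.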
[0019]
As described above, according to the present invention, the data division and integration means designates a data protection configuration, generates redundant data suitable for the designated data protection configuration, and places the redundant data in a location suitable for the designated data protection configuration. Therefore, the data protection configuration function can be selected and realized for each node. Further, by setting the data protection configuration to any one of RAID0 to RAID6, RAID0 to RAID6 can be selected and realized for each node.
[0020]
Further, according to the present invention, since each node has a data division and integration means, even if a node is destroyed by a regional disaster, the other nodes retain access to their data, so availability can be achieved. In addition, the data transfer means determines whether data arriving from another node belongs to the file system of its own node or to the file system of another node, so the data transfer means that the conventional technology had to provide separately for the own node and for the other nodes can be shared. The storage medium connection means can be connected to a single medium or to multiple media, divides the storage area of those media by node, and writes and reads the divided data node by node, so the storage medium connection means that the conventional technology required in a number equal to the number of other nodes can also be shared. The number of devices that the conventional technology required redundantly can therefore be reduced, which lowers the cost. Further, since each node is connected through one data transfer means, the channel-type configuration, which in the prior art required at least one communication path between each host computer and each storage medium, can also share communication paths, enabling a further cost reduction.
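The distribution decision made by the data transfer means can be sketched as follows. This is only an illustration under stated assumptions: the header fields `owner_node` and `dest_node` and the function `dispatch` are hypothetical, since the patent does not specify how a packet indicates which node's file system its data belongs to.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    owner_node: str  # hypothetical field: node whose host computer owns the data
    dest_node: str   # hypothetical field: node the packet is addressed to
    payload: bytes


def dispatch(packet: Packet, own_node: str,
             to_host: Callable[[Packet], None],
             to_storage: Callable[[Packet], None]) -> str:
    """Route an arriving packet to the host side or to local storage."""
    if packet.dest_node != own_node:
        return "ignored"          # not addressed to this node
    if packet.owner_node == own_node:
        to_host(packet)           # data belonging to the own node's host computer
        return "host"
    to_storage(packet)            # another node's data, kept on local media
    return "storage"
```

Because one dispatcher serves both directions, a node in this sketch needs a single transfer component regardless of how many other nodes send it data.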
[0021]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a diagram showing the configuration of the first embodiment of the present invention. As shown in the figure, the first embodiment is composed of a node A10, a node B20, and a node C30, and the nodes are connected in a bus-type network by a communication path 130 and a communication path 131. The node A10 comprises a host computer 100, host connection means 101, data division and integration means 102, data transfer means 103, storage medium connection means 104, a storage medium 105, and a storage medium 106. The node B20 comprises a host computer 110, host connection means 111, data division and integration means 112, data transfer means 113, storage medium connection means 114, a storage medium 115, and a storage medium 116. The node C30 comprises a host computer 120, host connection means 121, data division and integration means 122, data transfer means 123, storage medium connection means 124, a storage medium 125, and a storage medium 126. The function of each means and the operation of this embodiment are described later. Next, an implementation example of each means is described.
[0022]
For example, the host connection means is provided with a SCSI (Small Computer System Interface) target function, and the host computer interface operates as a SCSI initiator to connect the host computer and the host connection means. The storage medium connection means is provided, for example, with a SCSI initiator function, and a plurality of storage media having a SCSI target function are connected to it in a daisy chain. Further, the communication paths 130 and 131 carry long-distance, large-capacity communication using, for example, Fibre Channel.
[0023]
The host connection means, data division and integration means, data transfer means, and storage medium connection means are realized, for example, by adding a SCSI initiator function, a SCSI target function, and a Fibre Channel connection function to hardware having one CPU (Central Processing Unit). The data division and integration means has, for example, a RAID (Redundant Array of Independent Disks) function, and a storage device such as a hard disk device or a magneto-optical disk device is used as the storage medium.
[0024]
FIG. 2 is a diagram showing only the components of the configuration of FIG. 1 that are involved in data writing and reading by the host computer 100 of the node A10. In the configuration shown in FIG. 1, each node has a host computer, and each host computer can access data independently of the other nodes. For simplicity, however, the description below uses FIG. 2. Note that the data access operation of the host computers at the node B20 and the node C30 is the same as that of the host computer 100 at the node A10.
[0025]
FIG. 2 shows the virtual storage device 141 as seen from the host computer 100, the virtual storage device 142 of the node B20 as seen from the data division and integration means 102 of the node A10, and the virtual storage device 143 of the node C30 as seen from the data division and integration means 102 of the node A10.
Next, the function of each means in FIG. 1 will be described with reference to FIG. 2. The functions described below using FIG. 2 are the same as the functions of the corresponding means shown in FIG. 1.
[0026]
In FIG. 2, the host computer 100 regards the host connection means 101 as performing the input and output of the virtual storage device 141. Therefore, when the host computer 100 writes or reads data, it specifies a logical address of the virtual storage device 141. This address is, for example, a SCSI LBA (Logical Block Address). The host connection means 101 has a function of receiving this logical address from the host computer 100 and sending it to the data division and integration means 102.
[0027]
Further, the data division and integration means 102 of the node A10 regards the virtual storage device 142 as being connected at the node B20 and the virtual storage device 143 as being connected at the node C30. Therefore, when it writes data to or reads data from the virtual storage devices 142 and 143, it specifies logical addresses of the virtual storage devices 142 and 143.
[0028]
Therefore, when data is accessed, the data division and integration means 102 of the node A10 has a function of receiving from the host connection means 101 the logical address of the virtual storage device 141 as seen from the host computer 100 of the node A10, converting it into logical addresses of the virtual storage devices 142 and 143 of the node B20 and the node C30 as seen from the data division and integration means 102, and sending the converted addresses to the data transfer means 103. In addition, the data division and integration means 102 has a function of holding, as data division accumulation information, a table or calculation formula that associates the logical addresses of the virtual storage device 141 with the logical addresses of the virtual storage devices 142 and 143 and with the addresses at which redundant data is stored. It further has a function of holding the logical addresses of the virtual storage devices 142 and 143 at which backup information of the data division accumulation information is stored.
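The address conversion held as a calculation formula can be sketched as below. The patent states only that a table or formula is kept as data division accumulation information; the block-interleaved layout, the constant `REMOTE_DEVICES`, and the function names are assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical layout: consecutive blocks of virtual storage device 141 are
# interleaved across virtual storage devices 142 (node B) and 143 (node C).
REMOTE_DEVICES = (142, 143)


def convert_address(lba_141: int) -> Tuple[int, int]:
    """Map a logical block address of device 141 to (device ID, remote LBA)."""
    if lba_141 < 0:
        raise ValueError("logical block address must be non-negative")
    device = REMOTE_DEVICES[lba_141 % len(REMOTE_DEVICES)]
    remote_lba = lba_141 // len(REMOTE_DEVICES)
    return device, remote_lba


def inverse_address(device: int, remote_lba: int) -> int:
    """Inverse mapping, as needed when read data is integrated."""
    return remote_lba * len(REMOTE_DEVICES) + REMOTE_DEVICES.index(device)
```

A formula like this keeps the stored metadata small; a table would instead allow arbitrary placement at the cost of more state to back up.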
[0029]
The data transfer means 103 has a function of creating a header of a packet addressed to each node and sending the packet to the communication path. The data transfer means also has a function of receiving a packet addressed to its own node and sending an address and data to a storage medium.
The storage medium connection means 114 of the node B20 has a function of receiving the logical address of the virtual storage device 142 of the node B20 from the data transfer means 113 and converting it into the IDs and logical addresses of the storage media 115 and 116 of the node B20. Similarly, the storage medium connection means 124 of the node C30 converts the logical address of the virtual storage device 143 of the node C30 into the IDs and logical addresses of the storage media 125 and 126 of the node C30. As the IDs and logical addresses of the storage media 115, 116, 125, and 126, for example, SCSI IDs and LBAs are used.
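The conversion performed by the storage medium connection means can be sketched in the same spirit. The sketch assumes that a node's storage media are simply concatenated to form its virtual storage device; the capacities in `NODE_B_MEDIA` and the function name `to_physical` are hypothetical.

```python
from typing import List, Tuple


def to_physical(virtual_lba: int, media: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Convert a virtual-device LBA to (storage medium ID, LBA on that medium).

    `media` lists (medium ID, capacity in blocks) in concatenation order.
    """
    if virtual_lba < 0:
        raise ValueError("logical block address must be non-negative")
    offset = virtual_lba
    for medium_id, capacity in media:
        if offset < capacity:
            return medium_id, offset
        offset -= capacity
    raise ValueError("address beyond the capacity of the virtual storage device")


# Hypothetical capacities for storage media 115 and 116 of node B20.
NODE_B_MEDIA = [(115, 1000), (116, 1000)]
```

Under this assumption, adding a third medium only means appending one entry to the list, which matches the statement that the number of media can be chosen freely.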
[0030]
Next, the data write operation in the first embodiment will be described with reference to the flowchart shown in FIG. 3 using the configuration shown in FIG.
Step 1) The host computer 100 of the node A10 sends the logical address and data of the virtual storage device 141 to the host connection means 101 of the node A10.
[0031]
Step 2) The host connection unit 101 of the node A10 sends the logical address and data of the virtual storage device 141 to the data division and integration unit 102.
Step 3) The data division and integration means 102 of the node A10 converts the logical address of the virtual storage device 141 into logical addresses of the virtual storage devices 142 and 143 according to the data division accumulation information. If redundant data is required, it generates the redundant data and assigns it logical addresses of the virtual storage devices 142 and 143 according to the data division accumulation information. It then sends the virtual storage device IDs, the converted addresses, the divided data, and the redundant data to the data transfer means 103 of the node A10.
[0032]
Step 4) The data transfer means 103 of the node A10 creates a header of the packet addressed to each node based on the virtual storage device ID, and transmits the packet, carrying the address and the divided data or redundant data, to the communication paths 130 and 131.
Step 5) The packet addressed to the own node is received by the data transfer means 113 at the node B20 and by the data transfer means 123 at the node C30. The received address and data are then sent to the storage medium connection means 114 and 124, respectively.
[0033]
Step 6) The storage medium connection means 114 of the node B20 converts the address of the virtual storage device 142 into the IDs and logical addresses of the storage media 115 and 116, and the storage medium connection means 124 of the node C30 converts the address of the virtual storage device 143 into the IDs and logical addresses of the storage media 125 and 126. The data is then sent to the respective storage media according to those IDs and addresses and written.
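Steps 3 to 6 of the write operation can be sketched as follows. This is a simplified illustration, not the patented implementation: it supports only two hypothetical data protection configurations (splitting in the manner of RAID 0 and mirroring in the manner of RAID 1), and the `Packet` fields, `DEVICE_TO_NODE`, and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    dest_node: str   # header: destination node derived from the device ID
    device_id: int   # virtual storage device ID (142 or 143)
    lba: int         # converted logical address
    payload: bytes   # divided data or redundant (mirror) data


DEVICE_TO_NODE = {142: "B", 143: "C"}


def build_write_packets(lba: int, data: bytes, raid_level: int) -> List[Packet]:
    """Steps 3 and 4: divide the data (or mirror it) and build one packet per node."""
    if raid_level == 1:
        parts = {142: data, 143: data}              # mirror copy as redundant data
    elif raid_level == 0:
        half = (len(data) + 1) // 2
        parts = {142: data[:half], 143: data[half:]}
    else:
        raise ValueError("only RAID levels 0 and 1 are sketched here")
    return [Packet(DEVICE_TO_NODE[d], d, lba, p) for d, p in parts.items()]


def deliver(packets: List[Packet],
            stores: Dict[str, Dict[Tuple[int, int], bytes]]) -> None:
    """Steps 5 and 6: each node receives its packet and writes the payload."""
    for p in packets:
        stores[p.dest_node][(p.device_id, p.lba)] = p.payload
```

Selecting `raid_level` per call mirrors the idea that the data protection configuration is designated by the data division and integration means of each node.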
[0034]
Next, the operation for reading data will be described with reference to the flowchart shown in FIG.
Step 11) The host computer 100 of the node A10 sends the logical address of the virtual storage device 141 to be read to the host connection means 101 of the node A10.
[0035]
Step 12) The host connection unit 101 of the node A10 sends the logical address of the virtual storage device 141 to be read to the data division and integration unit 102.
Step 13) The data division and integration means 102 of the node A10 converts the logical address of the virtual storage device 141 into logical addresses of the virtual storage devices 142 and 143 according to the data division accumulation information. It then sends the IDs of the virtual storage devices and the converted addresses to the data transfer means 103 of the node A10.
[0036]
Step 14) The data transfer means 103 of the node A10 creates a header of the packet addressed to each node based on the ID of the virtual storage device, and transmits the packet carrying the address to the communication paths 130 and 131.
Step 15) The packet addressed to the own node is received by the data transfer means 113 at the node B20 and by the data transfer means 123 at the node C30. The address is then sent to the storage medium connection means 114 and 124, respectively.
[0037]
Step 16) The storage medium connection means 114 of the node B20 converts the address of the virtual storage device 142 into the IDs and logical addresses of the storage media 115 and 116, and the storage medium connection means 124 of the node C30 converts the address of the virtual storage device 143 into the IDs and logical addresses of the storage media 125 and 126. Data is then read from the respective storage media according to those IDs and addresses. The data read from the storage media 115 and 116 and the address of the virtual storage device 142 are returned to the data transfer means 113, and the data read from the storage media 125 and 126 and the address of the virtual storage device 143 are returned to the data transfer means 123.
[0038]
Step 17) The data transfer means 113 of the node B20 generates a packet to be returned to the node A10 from the ID, address, and data of the virtual storage device 142, and the data transfer means 123 of the node C30 generates a packet to be returned to the node A10 from the ID, address, and data of the virtual storage device 143. Each packet is sent to the communication paths 130 and 131.
[0039]
Step 18) The data transfer means 103 of the node A10 receives the packets from the nodes B20 and C30, and returns the ID, address and data to the data division and integration means 102 of the node A10.
Step 19) The data division and integration means 102 of the node A10 converts the IDs and addresses of the virtual storage devices 142 and 143 into the address of the virtual storage device 141 using the data division accumulation information, and integrates the divided data received from the nodes B20 and C30. The integrated data is returned to the host connection means 101 of the node A10.
[0040]
Step 20) The host connection means 101 of the node A10 returns the received data to the host computer 100 of the node A10.
In FIG. 2, each node has two storage media. However, since the storage medium connection means 114 and 124 of each node merely convert the logical address of that node's virtual storage device into the ID and address of a storage medium, the number of storage media is not limited to two and can be chosen freely according to the required capacity. The data write and read operations from the host computer described above are the same for the host computer 110 of the node B20 and the host computer 120 of the node C30 in FIG. 1.
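The read operation of Steps 15 to 19 can be sketched as below. The sketch assumes the divided data was split so that virtual storage device 142 holds the first part and device 143 the second; the `Store` type and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Store = Dict[Tuple[int, int], bytes]
Reply = Tuple[int, int, bytes]  # (virtual storage device ID, address, data)


def read_request(node_store: Store, device_id: int, lba: int) -> Reply:
    """Steps 15 to 17: a remote node reads the data and returns ID, address, data."""
    return device_id, lba, node_store[(device_id, lba)]


def integrate(replies: List[Reply]) -> bytes:
    """Step 19: order the replies by device ID and join the divided data."""
    ordered = sorted(replies, key=lambda reply: reply[0])
    return b"".join(data for _, _, data in ordered)
```

Sorting by device ID makes the result independent of the order in which the packets from the node B20 and the node C30 arrive.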
[0041]
In the configuration shown in FIG. 1, the number of nodes is three. However, if the number of nodes is two or more, the same operation and effect as in the case where the number of nodes is three can be obtained.
With the configuration described above, in a distributed storage device having a different file system for each node, each node needs only one each of the host connection means, data division and integration means, storage medium connection means, and data transfer means. In addition, a communication path can be shared even in a channel-type configuration. As a result, the device can be provided at a cost substantially proportional to the number of nodes, as shown below.
[0042]
$$
\begin{aligned}
C_{Total} &= \{(C_{HC} + C_{DI} + C_{T} + C_{T} + C_{MC}) \times N\} + \{C_{C} \times (N-1)\} \\
&= N C_{HC} + N C_{DI} + N C_{T} + N C_{T} + N C_{MC} + (N-1) C_{C} \\
&= N C_{HC} + N C_{DI} + 2N C_{T} + N C_{MC} + (N-1) C_{C}
\end{aligned}
$$
Here, $C_{Total}$ is the total cost, $C_{HC}$ is the cost of the host connection means, $C_{DI}$ is the cost of the data division and integration means, $C_{T}$ is the cost of the data transfer means, $C_{MC}$ is the cost of the storage medium connection means, $C_{C}$ is the cost of a communication path, and $N$ is the number of nodes.
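As a small worked aid, the cost expression can be evaluated with a function such as the following. The function name, parameter names, and the `topology` switch are hypothetical; the bus-type path count N-1 comes from the expression above, and the channel-type path count N(N-1) is taken from the corresponding expression given for the second embodiment.

```python
def total_cost(n: int, c_hc: float, c_di: float, c_t: float,
               c_mc: float, c_c: float, topology: str = "bus") -> float:
    """Evaluate C_Total for n nodes using the document's cost expressions."""
    if n < 2:
        raise ValueError("at least two nodes are required")
    # Per-node term exactly as written in the source: C_HC + C_DI + 2*C_T + C_MC.
    per_node = c_hc + c_di + 2 * c_t + c_mc
    if topology == "bus":
        paths = n - 1
    elif topology == "channel":
        paths = n * (n - 1)
    else:
        raise ValueError("topology must be 'bus' or 'channel'")
    return per_node * n + c_c * paths
```

With every unit cost set to 1 and three nodes, the per-node term contributes 15 in both topologies, and only the communication-path term differs.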
[0043]
As a comparison object, the cost of a configuration for realizing the same function using a bus type network connection using the general RAID device described in FIG.

Here, $C_{Total}$ is the total cost, $C_{HC}$ is the cost of the host connection means, $C_{DI}$ is the cost of the data division and integration means, $C_{T}$ is the cost of the data transfer means, $C_{MC}$ is the cost of the storage medium connection means, $C_{C}$ is the cost of a communication path, and $N$ is the number of nodes.
[0044]
Subtracting the former from the latter shows that the cost is reduced by $N(N-1)C_{T} + N(N-2)C_{MC}$. As this result shows, the cost can be reduced compared with the conventional technique.
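As an illustrative substitution (not a figure stated in the patent), taking $N = 3$, the number of nodes in the first embodiment, in the stated cost reduction gives:

$$
N(N-1)C_{T} + N(N-2)C_{MC} \,\Big|_{N=3} = (3 \cdot 2)\,C_{T} + (3 \cdot 1)\,C_{MC} = 6\,C_{T} + 3\,C_{MC}
$$

The quadratic growth of both terms indicates that the saving becomes more pronounced as nodes are added.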
Further, since the data division and integration means corresponds to RAID 0 to 6, each node can realize the functions of RAID 0 to 6, and each node has the data division and integration means, so even if a node is destroyed due to a regional disaster The data accessibility of another node is maintained and availability can be realized. Further, since each node has a host connection means, access from a plurality of computers is possible.
[0045]
Next, to demonstrate availability, the recovery operation after a node failure is described with reference to FIG. 2, for the case where a failure affects the data that the host computer 100 of the node A10 has stored in the distributed storage media in the configuration of FIG. 1.
A failure can occur in two places: at the node B20 or the node C30, or at the node A10. First, the case where it occurs at the node B20 is described. The same applies when it occurs at the node C30.
[0046]
Suppose the data division and integration means 102 has selected RAID level 1. When it detects that the node B20 cannot be accessed, it accesses only the node C30. This makes data access possible even while the node B20 has failed. When the node B20 is restored, the node A10 detects information indicating the restoration, reconstructs the data of the node B20 from the data of the node C30 according to the data division accumulation information, and stores it on the storage media of the node B20, thereby recovering the data. During restoration, data additions and changes caused by writes from the host computer 100 are incorporated into the restoration operation by the data division and integration means 102, so data can be accessed even during restoration. The operations for RAID levels 2 to 6 are the same as the operations described above.
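The RAID level 1 behaviour described above can be sketched as follows. The class `MirrorSketch` and its methods are hypothetical, and the restoration is modelled as a single atomic copy, which simplifies the patent's description of writes being incorporated while restoration is in progress.

```python
from typing import Dict


class MirrorSketch:
    """Toy model of node A mirroring data onto nodes B and C."""

    def __init__(self) -> None:
        self.replicas: Dict[str, Dict[int, bytes]] = {"B": {}, "C": {}}
        self.online: Dict[str, bool] = {"B": True, "C": True}

    def write(self, lba: int, data: bytes) -> None:
        for node, up in self.online.items():
            if up:
                self.replicas[node][lba] = data

    def read(self, lba: int) -> bytes:
        # Access whichever node is reachable; with B down, only C is used.
        for node in ("B", "C"):
            if self.online[node] and lba in self.replicas[node]:
                return self.replicas[node][lba]
        raise OSError("no reachable replica holds the requested block")

    def fail(self, node: str) -> None:
        self.online[node] = False
        self.replicas[node].clear()   # simulate destroyed storage media

    def restore(self, node: str) -> None:
        survivor = "C" if node == "B" else "B"
        if not self.online[survivor]:
            raise RuntimeError("no surviving replica to restore from")
        # Rebuild the restored node from the surviving node's data.
        self.replicas[node] = dict(self.replicas[survivor])
        self.online[node] = True
```

Writes issued while a node is down reach only the survivor and are picked up by the later copy, so the restored node ends up with the current data.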
[0047]
Next, an operation after a failure has occurred in the node A10 and the failure has been recovered will be described.
Because the data division accumulation information, in which the position information of the divided data, the RAID level, and so on are recorded, has been lost, the data division and integration means 102 acquires the data division accumulation information that was backed up at a predetermined position on a storage medium of the node B20 or the node C30. As a result, the node A10 can again access each node. Since the failure of the node A10 causes no loss of the data of the host computer 100, recovery is possible with any of RAID levels 0 to 6.
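Recovering the data division accumulation information from its backup can be sketched as below. The JSON encoding, the constant `BACKUP_LBA` standing for the predetermined position, and the function names are assumptions for illustration; the patent does not specify the format or location of the backup.

```python
import json
from typing import Dict

BACKUP_LBA = 0  # hypothetical "predetermined position" on each remote node


def backup_info(info: dict, nodes: Dict[str, Dict[int, bytes]]) -> None:
    """Store a copy of the accumulation information on every remote node."""
    blob = json.dumps(info, sort_keys=True).encode("utf-8")
    for store in nodes.values():
        store[BACKUP_LBA] = blob


def recover_info(nodes: Dict[str, Dict[int, bytes]]) -> dict:
    """After node A is repaired, fetch the backup from any node that has it."""
    for name in sorted(nodes):
        blob = nodes[name].get(BACKUP_LBA)
        if blob is not None:
            return json.loads(blob.decode("utf-8"))
    raise LookupError("no backup of the data division accumulation information found")
```

Keeping the backup on more than one node means the information survives even if one of the remote nodes is also unavailable at recovery time.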
[0048]
FIG. 5 is a diagram showing the configuration of the second embodiment of the present invention. As shown in the figure, the second embodiment is composed of a node A10, a node B20, and a node C30 as in the first embodiment, and the configuration in each node is the same except for the portion related to the connection method. This is the same as the first embodiment. In the second embodiment, unlike the first embodiment, each node is connected by channel type network connection. That is, the node A 10 and the node B 20 are connected by the communication path 150, the node A 10 and the node C 30 are connected by the communication path 151, and the node B 20 and the node C 30 are connected by the communication path 152.
[0049]
FIG. 6 is a diagram showing only the components of the configuration of FIG. 5 that are involved in data writing and reading by the host computer 100 of the node A10. That is, in the configuration shown in FIG. 5, each node has a host computer, and each host computer can access data independently of the other nodes, whereas FIG. 6 extracts and shows only the components related to writing and reading. Note that the data access operation of the host computers at the node B20 and the node C30 is the same as that of the host computer 100 at the node A10.
[0050]
FIG. 6 shows the virtual storage device 141 as seen from the host computer 100, the virtual storage device 142 of the node B20 as seen from the data division and integration means 102 of the node A10, and the virtual storage device 143 of the node C30 as seen from the data division and integration means 102 of the node A10.
The implementation example and function of each means shown in FIG. 5 are the same as those in the first embodiment. The operation of the configuration shown in FIG. 6 is the same as the operation of the first embodiment described with reference to FIG. 2. That is, the distributed storage system according to the present invention performs the same operation whether the nodes are connected by a bus-type network or by a channel-type network.
[0051]
Also in the configuration of the second embodiment, as in the first embodiment, in a distributed storage device having a different file system for each node, each node needs only one each of the host connection means, data division and integration means, storage medium connection means, and data transfer means, and a communication path can be shared. As for the cost effect of the second embodiment, the device can be provided at a low cost that, apart from the communication paths, is substantially proportional to the number of nodes, as follows.
[0052]
$$
\begin{aligned}
C_{Total} &= \{(C_{HC} + C_{DI} + C_{T} + C_{T} + C_{MC}) \times N\} + \{C_{C} \times N(N-1)\} \\
&= N C_{HC} + N C_{DI} + N C_{T} + N C_{T} + N C_{MC} + N(N-1) C_{C} \\
&= N C_{HC} + N C_{DI} + 2N C_{T} + N C_{MC} + N(N-1) C_{C}
\end{aligned}
$$
Here, $C_{Total}$ is the total cost, $C_{HC}$ is the cost of the host connection means, $C_{DI}$ is the cost of the data division and integration means, $C_{T}$ is the cost of the data transfer means, $C_{MC}$ is the cost of the storage medium connection means, $C_{C}$ is the cost of a communication path, and $N$ is the number of nodes.
[0053]
As a comparison target, the cost of a configuration that realizes the same function using the channel type network connection using the general RAID device described in FIG.

Here, $C_{Total}$ is the total cost, $C_{HC}$ is the cost of the host connection means, $C_{DI}$ is the cost of the data division and integration means, $C_{T}$ is the cost of the data transfer means, $C_{MC}$ is the cost of the storage medium connection means, $C_{C}$ is the cost of a communication path, and $N$ is the number of nodes.
[0054]
Taking the difference between the latter and the former shows that the cost is reduced by $N(N-1)C_{T} + N(N-2)C_{MC}$. As this result shows, the cost can be kept low compared with the prior art. Regarding the communication path, the former costs half as much as the latter.
Similarly to the first embodiment, in the second embodiment, since the data division and integration means corresponds to RAID 0 to 6, each node can realize the functions of RAID 0 to 6, and the data division and integration to each node. Since there is a means, even if a node is destroyed due to a regional disaster, the data accessibility of another node is maintained and availability can be realized. Further, since each node has a host connection means, access from a plurality of computers is possible.
[0055]
Next, an embodiment of a recording medium on which the distributed storage program according to the present invention is recorded will be described. FIG. 9 is a configuration diagram of a computer system including a CPU 600, a memory 601, an external storage device 602, a display 603, a keyboard 604, and a communication processing device 605. The recording medium on which the distributed storage program according to the present invention is recorded corresponds to either or both of the memory 601 and the external storage device 602 shown in FIG. 9. A portable storage medium such as a magneto-optical disk, a magnetic disk, or a magnetic tape, as well as an electronic memory, a hard disk, or the like, also corresponds to the recording medium of the present invention. The distributed storage program having the means of the present invention, stored on such a recording medium, is loaded into the computer system shown in FIG. 9. The computer system is connected to either or both of a host computer and a storage medium, and is connected via a communication path to the systems of other nodes that use the method according to the present invention. In this way, the distributed storage method described above can be used in the computer system.
[0056]
The present invention is not limited to the above-described embodiments, and various modifications and applications can be made within the scope of the claims.
[0057]
[Effects of the invention]
As described above, according to the present invention, the data division and integration means designates a data protection configuration, generates redundant data suitable for the designated data protection configuration, and places the redundant data in a location suitable for the designated data protection configuration. Therefore, the data protection configuration function can be selected and realized for each node. Further, by setting the data protection configuration to any one of RAID0 to RAID6, RAID0 to RAID6 can be selected and realized for each node.
[0058]
Further, according to the present invention, since each node has a data division and integration means, even if a node is destroyed by a regional disaster, the other nodes retain access to their data, so availability can be achieved. Further, the data transfer means in each node has a data distribution function, and the storage medium connection means connected to it can access the storage medium or storage area assigned to each node. As a result, the storage media accessed from the host computer of the own node and from the host computers of the other nodes can be provided within the own node by means of the data transfer means and the storage medium connection means, which reduces the cost. In addition, a channel-type configuration can also share a communication path, enabling a further cost reduction.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a first exemplary embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation of the first exemplary embodiment of the present invention.
FIG. 3 is a flowchart showing a write operation of the host computer according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing a read operation of the host computer according to the first embodiment of the present invention.
FIG. 5 is a diagram showing a configuration of a second exemplary embodiment of the present invention.
FIG. 6 is a diagram for clearly showing the operation of the second exemplary embodiment of the present invention.
FIG. 7 is a first example of a distributed storage device in the prior art.
FIG. 8 is a second example of a distributed storage device in the prior art.
FIG. 9 is a configuration diagram of a computer system in an embodiment of a recording medium of the present invention.
[Explanation of symbols]
10 Node A
20 Node B
30 Node C
100, 110, 120, 500, 510, 520 Host computer
101, 111, 121, 501, 511, 521 Host connection means
102, 112, 122 Data division and integration means
502, 512, 522 RAID control means
103, 113, 123, 503, 513, 523 Data transfer means
506, 516, 526, 509, 519, 529 Data transfer means
104, 114, 124, 505, 515, 525 Storage medium connection means
508, 518, 528 Storage medium connection means
504, 514, 524, 507, 517, 527 Storage medium
130, 131, 530, 531 Communication path
540, 541, 542, 543, 544 Communication path
600 CPU
601 Memory
602 External storage device
603 Display
604 Keyboard
605 Communication processing device

Claims

A distributed storage method in which data of a host computer in another node is distributed and accumulated via a communication path in a distributed storage medium in nodes separated from each other,
wherein, at each node:
data division and integration means, which divides data when data is written from a host computer to a distributed storage medium and integrates data when data is read, designates one data protection configuration from among a plurality of types of data protection configurations;
data transfer means, which transfers the data of the host computer and the distributed storage medium of the one node to a communication path extending from the one node to another node, determines whether data arriving from another node is addressed to the host computer of the one node or to the distributed storage medium installed at the one node, and distributes the data to the host computer or the distributed storage medium accordingly; and
storage medium connection means, which connects the distributed storage medium and the data transfer means, writes the data distributed to the distributed storage medium, when a plurality of other nodes are data transmission sources, to a plurality of storage media or to a plurality of storage areas of a single storage medium for each of the other nodes, and reads the data for each of the other nodes.

The distributed storage method according to claim 1, wherein the data division and integration means:
generates redundant data at the time of the data division if the data protection configuration requires it;
generates data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data and information specifying the data protection configuration; and
when the redundant data has been generated and data is lost at the time of the data integration, restores the lost data from the remaining data and the redundant data by referring to the data division accumulation information.

A distributed storage system in which data of a host computer at another node is distributed and accumulated via a communication path in a distributed storage medium at nodes distant from each other,
wherein each node comprises:
data division and integration means for dividing data when data is written from a host computer to a distributed storage medium, integrating data when data is read, and designating one data protection configuration from among a plurality of types of data protection configurations;
data transfer means for transferring the data of the host computer and the distributed storage medium of the one node to a communication path extending from the one node to another node, determining whether data arriving from another node is addressed to the host computer of the one node or to the distributed storage medium installed at the one node, and distributing the data to the host computer or the distributed storage medium accordingly; and
storage medium connection means for connecting the distributed storage medium and the data transfer means and, when a plurality of other nodes are data transmission sources, writing the data distributed to the distributed storage medium, for each of the other nodes, to a plurality of storage media or to a plurality of storage areas of a single storage medium, and reading the data for each of the other nodes.

The data division and integration means includes
Means for generating redundant data if required by the data protection configuration during the data division;
Means for generating data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data and information specifying the data protection configuration;
Means for restoring, when the redundant data has been generated and data is lost at the time of the data integration, the lost data from the remaining data and the redundant data by referring to the data division accumulation information,
4. The distributed storage system according to claim 3, further comprising:
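The first two means of this claim, conditional generation of redundant data and generation of the data division accumulation information, might look like the sketch below. The two configurations `NONE` and `PARITY` are assumptions chosen for illustration (the claim only requires a plurality of types), as are the function name `divide`, the dictionary layout, zero-byte padding, and the node labels.

```python
from enum import Enum
from functools import reduce
from typing import Any, Dict, List, Tuple


class Protection(Enum):
    NONE = "none"      # divide only, no redundant data
    PARITY = "parity"  # divide and add one XOR parity fragment


def divide(data: bytes, data_nodes: List[str], spare_node: str,
           protection: Protection) -> Tuple[Dict[str, bytes], Dict[str, Any]]:
    """Split `data` across `data_nodes`; add parity only if required.

    Returns (placement, info): `placement` maps node name to the fragment
    to store there; `info` is a stand-in for the data division
    accumulation information.
    """
    n = len(data_nodes)
    size = -(-len(data) // n)                # ceiling division
    padded = data.ljust(size * n, b"\0")     # pad to equal-length fragments
    fragments = [padded[i * size:(i + 1) * size] for i in range(n)]
    placement = dict(zip(data_nodes, fragments))
    info: Dict[str, Any] = {
        "protection": protection.value,      # which configuration was designated
        "data_nodes": list(data_nodes),      # where the divided data is stored
        "parity_node": None,                 # where the redundant data is stored
        "length": len(data),                 # original length, to strip padding
    }
    if protection is Protection.PARITY:
        placement[spare_node] = reduce(
            lambda a, b: bytes(x ^ y for x, y in zip(a, b)), fragments)
        info["parity_node"] = spare_node
    return placement, info


plain, plain_info = divide(b"hello world", ["B", "C"], "D", Protection.NONE)
safe, safe_info = divide(b"hello world", ["B", "C"], "D", Protection.PARITY)
```

Recording the designated configuration in the information record is what lets the integration side know whether redundant data exists at all before it attempts a restoration.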

A recording medium recording a distributed storage program that causes a computer at each node to execute a distributed storage method in which data of a host computer at another node is distributed and stored, via a communication path, in distributed storage media located at nodes remote from one another, the distributed storage program causing the computer to function as:
data division and integration means for dividing data when data is written from the host computer to the distributed storage media, integrating data when data is read, and designating one data protection configuration from among a plurality of types of data protection configurations;
data transfer means for transferring data of the host computer and of the distributed storage medium of one node onto a communication path extending from the one node to another node, determining whether data arriving from another node is destined for the host computer of the one node or for the distributed storage medium installed at the one node, and distributing the data to the host computer or to the distributed storage medium accordingly; and
storage medium connection means that connects the distributed storage medium to the data transfer means and, when the data distributed to the distributed storage medium is data for a plurality of other nodes, writes the data, separately for each of the other nodes, to a plurality of storage media or to a plurality of storage areas of one storage medium, and reads the data separately for each of the other nodes.

6. The recording medium recording the distributed storage program according to claim 5, wherein the data division and integration means includes:
means for generating redundant data during the data division if the data protection configuration requires it;
means for generating data division accumulation information that holds information specifying the storage locations of the divided data and the redundant data, and information specifying the data protection configuration; and
means for restoring, when the redundant data has been generated and data is lost at the time of the data integration, the lost data from the remaining data and the redundant data with reference to the data division accumulation information.