JP2003223285A

JP2003223285A - Data storage device, data management method, and program

Info

Publication number: JP2003223285A
Application number: JP2002024260A
Authority: JP
Inventors: Yosuke Kaneko; 洋介金子; Hidehiro Shimizu; 英弘清水; Mitsunori Kori; 光則郡
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2002-01-31
Filing date: 2002-01-31
Publication date: 2003-08-08
Anticipated expiration: 2022-01-31
Also published as: JP4259800B2

Abstract

PROBLEM TO BE SOLVED: In a mirroring configuration, the reliability of the entire system cannot be maximized unless due consideration is given to which disks are paired to hold the original data and the duplicate data.

SOLUTION: The device comprises a data storage means and a readout means. The data storage means divides data into a plurality of data blocks of equal data amount, distributes them block by block across a plurality of storage devices so that each device receives the same number of blocks, and stores data blocks of identical content redundantly across each prescribed number of storage devices. The readout means reads the data distributed across the storage devices in units of one or more data blocks for each prescribed number of storage devices.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a data storage device, a data management method, and a program that create groups of secondary storage devices according to the redundancy multiplicity of the data and write the data to each group.

[0002]

[Description of the Related Art] In recent years, the capacity of secondary storage devices (hereinafter referred to as disks) has grown, creating demand for techniques that transfer data from disks at high speed and for highly reliable disks. The disk array device is one approach to meeting these demands, and several data management methods have been proposed for it.

[0003] Mirroring is one data management method for disk array devices. It is also called RAID 1 (RAID: Redundant Arrays of Inexpensive Disks). In outline, it improves disk reliability by writing the same data to two disks.

[0004] FIG. 27 shows an example of data storage using mirroring. In the figure, #D1 to #D4 denote physical disks, and B1 to B24 denote the data stored on physical disks #D1 to #D4. Physical disks #D1 and #D2 form one pair, and physical disks #D3 and #D4 form another. Each pair of physical disks constitutes one logical disk, and the same data is written to both disks of a pair. That is, data B1 to B12 are stored on physical disks #D1 and #D2, and data B13 to B24 are stored on physical disks #D3 and #D4.

[0005] With this configuration, physical disk #D2 is accessed when physical disk #D1 fails, and conversely physical disk #D1 is accessed when physical disk #D2 fails. Data therefore remains accessible as long as the two physical disks of a pair do not fail at the same time.
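The failover behaviour described for FIG. 27 can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: the names `PAIRS`, `pair_for_block`, and `read_disk` are hypothetical, and the block-to-pair mapping simply follows the B1 to B12 / B13 to B24 split given for the figure.

```python
# Hypothetical model of the FIG. 27 layout: two mirrored pairs of physical disks.
PAIRS = [("D1", "D2"), ("D3", "D4")]
BLOCKS_PER_PAIR = 12  # B1-B12 on the first pair, B13-B24 on the second


def pair_for_block(block_no):
    """Return the mirrored pair that holds block B<block_no> (1-based)."""
    if not 1 <= block_no <= BLOCKS_PER_PAIR * len(PAIRS):
        raise ValueError("block number out of range")
    return PAIRS[(block_no - 1) // BLOCKS_PER_PAIR]


def read_disk(block_no, failed):
    """Pick a surviving disk of the pair, or None if both have failed."""
    for disk in pair_for_block(block_no):
        if disk not in failed:
            return disk
    return None
```

A block becomes unreadable only when `read_disk` returns `None`, that is, when both disks of its pair are in the `failed` set.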

[0006] Some implementations of the mirroring described above have a load distribution function, which improves the overall data transfer performance of the disks by selecting the disk to transfer from so that the load on each physical disk is equalized during data transfer. Specifically, when a data transfer request is issued to a logical disk in a mirroring configuration, the next data transfer request is issued to the other disk of the physical disk pair during the wait for the requested data to arrive from the first physical disk. Because the data transfers then proceed concurrently, data transfer efficiency improves.

[0007] As described above, mirroring stores the same data on two disks and thereby achieves both high reliability and high-speed transfer.

[0008] Methods other than ordinary mirroring, in which disks are paired to store the original data and the duplicate data, have also been proposed. For example, Japanese Patent Laid-Open No. 6-187101 discloses a conventional data storage device (disk array) that stores copies of the original data on a plurality of disks.

[0009] FIG. 28 shows examples of data storage by the conventional disk array disclosed in the above Japanese Patent Laid-Open No. 6-187101. Part (a) shows an example in which the copy of each disk is placed across all the other disks, and part (b) shows an example in which the copy of the original data is divided and stored on two or more copy disks.

[0010] An outline follows. In the data management methods of FIG. 28(a) and (b), data stored on the disks is transferred from the disks in parallel, so high-speed transfer can be achieved. Even when data transfer requests concentrate on a particular disk, the load can be distributed to other disks. Furthermore, when a disk fails and its data is backed up, the data can be transferred from a plurality of disks in parallel and written to a spare disk. Compared with an ordinary mirroring system, this has the further advantage of faster failure recovery and backup.

[0011]

[Problems to be Solved by the Invention] Because conventional data storage devices are configured as described above, the reliability of the system as a whole cannot be fully realized in a mirroring configuration unless sufficient consideration is given to which disks are paired for writing the original data and the duplicate data.

[0012] To explain this problem concretely, consider the mirroring configuration of FIG. 27 with physical disks #D1 and #D2 located in the same machine. If a failure occurs at the machine level, physical disks #D1 and #D2 both become inaccessible at the same time. The same is true of the data management method shown in FIG. 28(b). In other words, such a mirroring configuration cannot withstand failures other than disk failures.

[0013] In ordinary mirroring, all data remains accessible unless the disk storing the original data and the disk storing its duplicate fail at the same time. That is, a mirroring system made up of N disks can normally keep operating with up to N/2 failed disks. In the example shown in FIG. 28(a), however, a failure of any two disks always leaves some data inaccessible.
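The contrast between the two layouts under a double failure can be checked by simple enumeration. The sketch below is illustrative only. It assumes an even number of disks for the paired layout, and it models the FIG. 28(a) layout as "for every ordered pair of disks, some block has its original on the first and its copy on the second", which is an interpretation of the figure description rather than a reproduction of it. All function names are hypothetical.

```python
from itertools import combinations


def paired_layout(n):
    """Conventional mirroring: disks (0,1), (2,3), ... each hold the same blocks."""
    return [frozenset((d, d + 1)) for d in range(0, n, 2)]


def spread_layout(n):
    """FIG. 28(a)-style layout: the copy of each disk is split over all other disks."""
    return [frozenset((src, dst)) for src in range(n) for dst in range(n) if src != dst]


def loses_data(layout, failed):
    """True if some block has every one of its holders in the failed set."""
    return any(holders <= failed for holders in layout)


def fatal_pairs(layout, n):
    """Count the two-disk failures that make at least one block inaccessible."""
    return sum(loses_data(layout, frozenset(p)) for p in combinations(range(n), 2))
```

With four disks, `fatal_pairs(paired_layout(4), 4)` counts only the two failures that hit both disks of one pair, whereas `fatal_pairs(spread_layout(4), 4)` counts all six possible two-disk failures.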

[0014] Further, in the configuration of FIG. 28(b), the data placement is not necessarily even for some numbers of disks, so the load on each disk during data transfer is not equalized.

[0015] Thus, in a conventional mirroring configuration that distributes load during data transfer, the disk to transfer from is selected so that the amount of data transferred from each disk is equalized, provided no disk has failed. When a disk fails, however, the amount of data transferred from the physical disk paired with the failed disk increases. The data transfer time of that disk becomes a bottleneck, which lowers the throughput of the entire system.

[0016] The present invention has been made to solve the problems described above. One object is to obtain a data storage device, a data management method, and a program that provide a plurality of disks and equalize both the amount of data stored on each disk and the amount of data transferred from each disk, thereby minimizing the drop in overall system throughput.

[0017] Another object of the invention is to obtain a data storage device, a data management method, and a program that improve the reliability of the data held in the system by grouping disks that are unlikely to fail at the same time, based on distance information between the disks, and performing mirroring by writing the same data to each group.

[0018] A further object of the invention is to obtain a data storage device, a data management method, and a program that construct a multiply redundant configuration of disks.

[0019]

[Means for Solving the Problems] A data storage device according to the present invention comprises: data storage means that divides data into a plurality of data blocks of equal data amount, distributes the data blocks across a plurality of storage devices so that each receives the same number of blocks, and stores data blocks of identical content redundantly across each predetermined number of storage devices; and reading means that reads the data distributed across the plurality of storage devices in units of one or more data blocks for each predetermined number of storage devices.

[0020] A data storage device according to the present invention comprises: data storage means that uses distance information, an index of the interaction between the individual storage devices of a plurality of storage devices, to form redundancy groups each collecting storage devices with little mutual interaction, and that, when distributing data across the plurality of storage devices, divides the data into a plurality of data blocks of equal data amount and stores data blocks of identical content redundantly on each storage device in a redundancy group; and reading means that reads the data distributed across the plurality of storage devices in units of one or more data blocks for each redundancy group.

[0021] A data storage device according to the present invention uses distance information, an index of the interaction between the individual storage devices of a plurality of storage devices, to form redundancy groups each collecting storage devices with little mutual interaction and, when distributing data across the plurality of storage devices, divides the data into a plurality of data blocks of equal data amount and stores data blocks of identical content redundantly on each storage device in a redundancy group.

[0022] In a data storage device according to the present invention, the data storage means stores the data blocks in the plurality of storage devices in round-robin fashion.
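One way to picture equal-size block division combined with round-robin placement is the sketch below. It is an illustration under stated assumptions, not the claimed implementation: the last block is zero-padded to keep all blocks the same size (the text does not say how a short final block is handled), a redundancy group is modelled as a tuple of disk names that all receive a copy of the same block, and the names `split_blocks` and `place_round_robin` are hypothetical.

```python
def split_blocks(data, block_size):
    """Divide data into blocks of block_size bytes, zero-padding the last block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def place_round_robin(blocks, groups):
    """Assign block i to redundancy group i % len(groups); every disk in
    that group stores a copy. Returns {disk: [block indices]}."""
    placement = {disk: [] for group in groups for disk in group}
    for index in range(len(blocks)):
        for disk in groups[index % len(groups)]:
            placement[disk].append(index)
    return placement
```

When the number of blocks is a multiple of the number of groups, every disk ends up holding the same number of blocks, which matches the even distribution the text calls for.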

[0023] In a data storage device according to the present invention, the data storage means uses hash distribution to determine the storage device in which each data block is to be stored.
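Hash distribution can be sketched as below. The text does not specify the hash function or what is hashed, so both are assumptions here: the sketch hashes a hypothetical string key for each block with SHA-256 from the Python standard library, chosen only because it gives the same result on every run, and maps the digest to a redundancy-group index. The name `group_for_key` is hypothetical.

```python
import hashlib


def group_for_key(key, group_count):
    """Map a block key to a redundancy-group index with a stable hash."""
    if group_count <= 0:
        raise ValueError("group_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % group_count
```

The same key always maps to the same group, so a reader can locate a block without consulting a placement table.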

[0024] In a data storage device according to the present invention, the data storage means places the plurality of data blocks that are read from each storage device at one time adjacent to one another within that device's storage area.

[0025] In a data storage device according to the present invention, when any storage device becomes unreadable, the group of data blocks that should have been read from it is read evenly from the other storage devices that hold, in distributed form, data blocks of the same content.
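The idea of spreading a failed device's reads evenly over the surviving copies can be sketched as follows. This is a hypothetical greedy scheme, not the method of the invention: the data structures `read_plan` and `holders` and the function `reassign_reads` are assumptions introduced for illustration, and the sketch simply sends each orphaned block to whichever surviving holder currently has the fewest blocks to read.

```python
def reassign_reads(read_plan, holders, failed):
    """Return a new plan in which blocks planned for the failed disk are
    read from surviving holders, always picking the least-loaded one.

    read_plan: {disk: [block, ...]}  blocks each disk is currently asked to read
    holders:   {block: [disk, ...]}  every disk that stores a copy of the block
    """
    plan = {disk: list(blocks) for disk, blocks in read_plan.items() if disk != failed}
    for block in read_plan.get(failed, []):
        candidates = [d for d in holders[block] if d != failed and d in plan]
        if not candidates:
            raise LookupError(f"no surviving copy of block {block}")
        target = min(candidates, key=lambda d: len(plan[d]))
        plan[target].append(block)
    return plan
```

How even the result is depends on how the copies were placed in the first place: if all copies of the failed disk's blocks sit on one partner, as in a simple mirrored pair, that partner still absorbs the whole load.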

[0026] A data storage device according to the present invention obtains the distance information based on the hierarchical structure formed by the plurality of storage devices.

[0027] A data storage device according to the present invention obtains the distance information based on position information indicating where each storage device is located.

[0028] A data storage device according to the present invention obtains the distance information based on a failure rate determined according to the connection path between the storage devices.

[0029] A data storage device according to the present invention sets, for each redundancy group, the multiplicity, that is, the degree to which data blocks of identical content are duplicated.

[0030] A data storage device according to the present invention sets a reliability level for each redundancy group based on the distance information.

[0031] A data management method according to the present invention comprises: a data storing step of dividing data into a plurality of data blocks of equal data amount, distributing the data blocks across a plurality of storage devices so that each receives the same number of blocks, and storing data blocks of identical content redundantly across each predetermined number of storage devices; and a reading step of reading the data distributed across the plurality of storage devices in units of one or more data blocks for each predetermined number of storage devices.

[0032] In a data management method according to the present invention, the data storing step uses distance information, an index of the interaction between the individual storage devices of a plurality of storage devices, to form redundancy groups each collecting storage devices with little mutual interaction and, when distributing data across the plurality of storage devices, divides the data into a plurality of data blocks of equal data amount and stores data blocks of identical content redundantly on each storage device in a redundancy group.

[0033] A program according to the present invention causes a computer to function as: data storage means that divides data into a plurality of data blocks of equal data amount, distributes the data blocks across a plurality of storage devices so that each receives the same number of blocks, and stores data blocks of identical content redundantly across each predetermined number of storage devices; and reading means that reads the data distributed across the plurality of storage devices in units of one or more data blocks for each predetermined number of storage devices.

[0034] A program according to the present invention causes a computer to function as: data storage means that uses distance information, an index of the interaction between the individual storage devices of a plurality of storage devices, to form redundancy groups each collecting storage devices with little mutual interaction, divides data into a plurality of data blocks of equal data amount, distributes the data blocks across the plurality of storage devices so that each receives the same number of blocks, and stores data blocks of identical content redundantly on each storage device in a redundancy group; and reading means that reads the data distributed across the plurality of storage devices in units of one or more data blocks for each redundancy group.

[0035] A program according to the present invention causes a computer to function as data storage means that uses distance information, an index of the interaction between the individual storage devices of a plurality of storage devices, to form redundancy groups each collecting storage devices with little mutual interaction and, when distributing data across the plurality of storage devices, divides the data into a plurality of data blocks of equal data amount and stores data blocks of identical content redundantly on each storage device in a redundancy group.

[0036]

[Embodiments of the Invention] An embodiment of the present invention is described below.

Embodiment 1. FIG. 1 shows an example configuration of a data storage device according to Embodiment 1 of the present invention. In the figure, reference numeral 1 denotes a host machine (data storage device) connected to n node machines 3-1 to 3-n via a network 2; it is a computer device that executes a program implementing the data storage of Embodiment 1. Reference numeral 2 denotes the network connecting the host machine 1 and the node machines, which may be a communication infrastructure such as the Internet or an intranet. Reference numerals 3-1 to 3-n denote node machines, which are computer devices having disks as secondary storage devices. The host machine 1, the network 2, and the node machines 3-1 to 3-n together form a single system. The medium connecting the host machine 1 and the node machines 3-1 to 3-n is not limited to the network 2; they may instead be connected by a bus to form a parallel system. The node machine 3 may also be an I/O control device, SCSI, or the like.

[0037] FIG. 2 shows the configuration of the host machine in FIG. 1. In the figure, reference numeral 4 denotes a main storage device, which stores software implementing the functions of an operating system 5, a data transfer control unit (reading means) 6, a file block management unit (data storage means) 7, a communication control unit 8, a node machine management unit (data storage means) 9, and so on. The operating system 5 stored in the main storage device 4 controls all the hardware of the host machine 1. Reference numeral 10 denotes the central processing unit (data storage means, reading means) of the computer device constituting the host machine 1, which executes each piece of software in the main storage device 4. Reference numeral 11 denotes a network adapter, which controls the connection of the host machine 1 to the network 2. Reference numeral 12a denotes a system bus connecting the main storage device 4 and the central processing unit 10, and 12b denotes a PCI bus connecting the network adapter 11, a disk controller 14, and other devices. Reference numeral 13 denotes a bus coupling device, which couples the system bus 12a and the PCI bus 12b. Reference numeral 14 denotes the disk controller, which controls disk devices (storage devices) 15 and 16.

[0038] Although the data transfer control unit 6, the file block management unit 7, the communication control unit 8, and the node machine management unit 9 are described above as implemented in software, they may instead be implemented as firmware or as hardware that realizes the same functions. A combination of software, firmware, and hardware may also be used.

[0039] FIG. 3 shows the relationships among the software components of the data storage device according to Embodiment 1 of the present invention. In the figure, 7a denotes a file block management table stored in the main storage device 4; it stores information identifying the read blocks, that is, the file blocks to be transferred from each disk during data transfer. Reference numeral 9a denotes a node machine management table, which stores information for managing the node machines. Each of the node machines 3-1 to 3-n executes software implementing the functions of an operating system 17, a communication control unit 18, and a disk access control unit 19, and has a plurality of disk devices (storage devices) 20 as secondary storage devices.

[0040] FIG. 4 shows the structure of the node machine management table in FIG. 3. In the figure, reference numeral 21 denotes node machine management information, one of the pieces of information making up the node machine management table 9a; a node machine number 21a, an IP address 21b, and a port number 21c are set in it as information identifying each node machine. Based on this node machine management information 21, the node machine management unit 9 acquires disk information 22, which identifies the disks connected to each node machine, and sets the disk number 22a and the node machine number 22b in the disk information 22. The disk information 22 consists of the disk number 22a identifying a disk, the node machine number 22b identifying the node machine, and a redundancy group number 22c identifying the disk's redundancy group. Reference numeral 23 denotes hierarchy information, which consists of lower layer information 23a and upper layer information 23b specifying the hierarchical relationships of the disks connected to each node machine. Reference numeral 24 denotes distance information, an index of the likelihood that two node machines become inaccessible at the same time. The distance information 24 can thus be regarded as an index of the interaction between disks.
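The contents of the node machine management table 9a can be pictured as a small set of records. The sketch below is only an illustration of the fields named in the text: the class and attribute names are hypothetical, the representation of the hierarchy information and distance information as dictionaries is an assumption, and leaving the redundancy group number empty until groups are formed is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeMachineInfo:          # node machine management information 21
    node_no: int                # node machine number 21a
    ip_address: str             # IP address 21b
    port: int                   # port number 21c


@dataclass
class DiskInfo:                 # disk information 22
    disk_no: int                # disk number 22a
    node_no: int                # node machine number 22b
    redundancy_group: Optional[int] = None  # 22c (assumed unset at first)


@dataclass
class NodeMachineTable:         # node machine management table 9a
    nodes: list = field(default_factory=list)
    disks: list = field(default_factory=list)
    hierarchy: dict = field(default_factory=dict)  # 23: lower item -> upper frames
    distance: dict = field(default_factory=dict)   # 24: pair of items -> index value

    def disks_on(self, node_no):
        """Return the disk numbers attached to the given node machine."""
        return [d.disk_no for d in self.disks if d.node_no == node_no]
```

For example, after appending `DiskInfo(1, 1)`, `DiskInfo(2, 1)`, and `DiskInfo(3, 2)` to `disks`, calling `disks_on(1)` returns the two disks attached to node machine 1.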

[0041] Next, an example of setting the hierarchy information 23 at initial setup is described. FIG. 5 shows an example of the hierarchical relationships of the disks constituting the data storage device according to Embodiment 1. In the figure, #D1 to #D8 are the disks constituting the data storage device. The disks enclosed by the frames labeled 25-1, 25-2, 26-1, and 26-2 are the disks connected to the respective node machines. The disks enclosed by the frames labeled 25 and 26 are disks connected to node machines located on the same floor. Reference numeral 27 denotes the entire system constituting the data storage device according to Embodiment 1. Something that represents the entire system in this way is called the root. A category used to classify disks, such as a node machine or a floor, is called a frame. Accordingly, terms such as node machine frame 25-1, node machine frame 25-2, and floor frame 25 are used to refer to node machines and floors in the disk hierarchy.

[0042] Here, the disks belonging to node machine frame 25-1 are #D1 and #D2, and the disks belonging to floor frame 25 are #D1 to #D4. The disks belonging to root 27 are all the disks in the system, #D1 to #D8. A hierarchical relationship thus exists among the frames. In the example of FIG. 5, the disks belonging to node machine frame 25-1 are #D1 and #D2, and the disks belonging to floor frame 25 are those belonging to node machine frames 25-1 and 25-2. Likewise, the disks belonging to root 27 are the disks belonging to floor frames 25 and 26.

[0043] Specifically, the layer above disk #D1 is node machine frame 25-1, to which disk #D1 belongs; the layer above node machine frame 25-1 is floor frame 25, to which node machine frame 25-1 belongs; and the layer above floor frame 25 is root 27.

[0044] In the example of FIG. 4, when the lower layer information 23a is disk #D1, the corresponding upper layer information 23b is node machine frame 25-1. Similarly, when disk #D2 is the lower layer information 23a, its upper layer information 23b is node machine frame 25-1. The upper layer information 23b is set in the same way with every other disk taken as the lower layer information 23a.

[0045] When node machine frame 25-1 is the lower layer information 23a, its upper layer information 23b is floor frame 25. Similarly, when node machine frame 25-2 is the lower layer information 23a, its upper layer information 23b is floor frame 25. The upper layer information 23b is set in the same way with every other node machine frame taken as the lower layer information.

[0046] Further, when floor frames 25 and 26 are the lower layer information 23a, their upper layer information 23b is root 27.

[0047] The node machine management unit 9 sets the hierarchical relationships described above in the node machine management table 9a as the hierarchy information 23. Using this hierarchy information 23, the hierarchical relationship from each disk up to the root is determined.
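The lower-to-upper pairs of the hierarchy information 23 are enough to recover the chain from any disk to the root. The sketch below illustrates this for the FIG. 5 example. The mapping `UPPER` and the function `path_to_root` are hypothetical names, the assignment of #D5 to #D8 to node machine frames 26-1 and 26-2 is assumed because the text only gives the membership of frames 25-1 and 25, and the sketch handles only the case where each item has a single upper frame.

```python
# lower layer information 23a -> upper layer information 23b (FIG. 5 example)
UPPER = {
    "D1": "node 25-1", "D2": "node 25-1",
    "D3": "node 25-2", "D4": "node 25-2",
    "D5": "node 26-1", "D6": "node 26-1",   # assumed split of D5-D8
    "D7": "node 26-2", "D8": "node 26-2",   # assumed split of D5-D8
    "node 25-1": "floor 25", "node 25-2": "floor 25",
    "node 26-1": "floor 26", "node 26-2": "floor 26",
    "floor 25": "root 27", "floor 26": "root 27",
}


def path_to_root(item):
    """Follow upper-layer links from a disk or frame up to the root."""
    path = [item]
    while path[-1] in UPPER:
        path.append(UPPER[path[-1]])
    return path
```

For disk #D1 this yields the chain given in the text: the disk, node machine frame 25-1, floor frame 25, and root 27.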

[0048] In the data storage device of the present invention, the frames used to classify disks may be set freely. Node machine and floor frames are set in FIG. 5, but a larger frame such as a region, or a smaller frame such as a bus, may be added as desired. That is, the hierarchy information may have any number of layers, from one to N.

【００４９】FIG. 6 is a diagram showing another example of the hierarchical relationship of the disks constituting the data storage device according to the first embodiment. As shown in FIG. 6, a disk or node machine may belong to a plurality of frames in the same layer. Here, the node machine frame 26-1 belongs to both floor frames 25 and 26, so when the node machine frame 26-1 is the lower-layer information 23a, both floor frames 25 and 26 are set as the upper-layer information 23b.

【００５０】In general, disks that belong to the same frame at a low layer are likely to become inaccessible at the same time when a system failure occurs. For example, as shown in FIG. 5, if a disaster occurs on the floor corresponding to the floor frame 25, disks #D1, #D2, #D3, and #D4 all become inaccessible. Likewise, if the node machine corresponding to the node machine frame 25-1 fails, disks #D1 and #D2 become inaccessible.

【００５１】In other words, disks #D1 and #D2, which are classified into a low-layer frame such as the node machine frame 25-1, also become inaccessible at the same time when a failure occurs in a higher-layer frame such as the floor frame 25 to which the node machine frame 25-1 belongs. Conversely, with a combination such as disk #D1 and disk #D8, the two disks are unlikely to become inaccessible at the same time.

【００５２】By placing the host machine 1 at the root of the hierarchical structure, a hierarchical structure as seen from the host machine 1 can be constructed. For example, consider the case where, in the system having the hierarchical structure shown in FIG. 5, the host machine 1 is installed on the floor corresponding to the floor frame 25.

【００５３】FIG. 7 is a diagram showing the hierarchical structure of the system as seen from the host machine arranged as described above. In the figure, #HD1 and #HD2 are disks connected to the host machine 1 and belong to the host machine frame 29. Reference numeral 28 denotes the host machine layer, which corresponds to the highest layer of the hierarchical structure, and 29 denotes the host machine frame, which indicates disks connected to the host machine 1.

【００５４】As shown in FIG. 7, when the hierarchical structure is built around the host machine 1 installed on the floor corresponding to the floor frame 25, the node machine frames 25-1 and 25-2 and the floor frame 26 are classified as frames in the same layer, with the host machine layer 28 as their upper layer. By taking the hierarchical structure of the system into account in this way, a hierarchical structure centered on whatever is placed at the root can be built.

【００５５】Next, an example of setting the distance information 24 at initial setup is described. The distance information is set so that its value is small for pairs of disks that are likely to become inaccessible at the same time and large for pairs of disks that are unlikely to become inaccessible at the same time.

【００５６】The procedure for setting the distance information 24 using the hierarchy information 23 is described below. When the hierarchy information 23 is used, the distance information 24 is small if the two disks belong to the same frame at a low layer. Conversely, the distance information 24 is large if the two disks do not share a frame at a low layer and share a frame only at a high layer.

【００５７】In other words, from the relationship between the hierarchy information 23 and the distance information 24, it can be determined that two disks with small distance information 24 between them are likely to become inaccessible at the same time, and that two disks with large distance information 24 between them are unlikely to become inaccessible at the same time.

【００５８】Consider the case where the host machine 1 is placed at the root, as shown in FIG. 7, when the hierarchy information 23 is set. From the relationship between the hierarchy information 23 and the distance information 24, two disks with small distance information 24 between them can be judged likely to become inaccessible at the same time as seen from the host machine 1. Conversely, two disks with large distance information 24 between them are judged unlikely to become inaccessible at the same time as seen from the host machine 1. The distance information 24 between a disk connected to the host machine 1 and a disk installed anywhere other than the host machine takes the largest value among all the distance information. Such a pair is therefore the least likely to become inaccessible at the same time as seen from the host machine 1.

【００５９】Next, the operation is described. FIG. 8 is a flowchart showing the operation of the data storage device according to the first embodiment, and the operation of setting the distance information 24 using the hierarchy information 23 is described with reference to this figure. As a concrete example, the case of setting the distance information 24 between disk #D1 and disk #D3 in FIG. 5 is described alongside.

【００６０】First, the node machine management unit 9 examines the hierarchy information 23 of the entire system and obtains the maximum number of layers in the system (step ST1). Here, the maximum number of layers is the number of layers to the root for the disk that has the most layers between itself and the root. This value is used as the maximum value of the distance information 24. In the example of FIG. 5, the number of layers from each of disks #D1 to #D8 to the root is the same, namely four.

【００６１】Next, the node machine management unit 9 determines whether the distance information 24 has been set for every combination of disks (step ST2). If it has not, the node machine management unit 9 selects two disks for which the distance information 24 has not yet been set and sets the value of their distance information 24 to the maximum value (the number of layers) (step ST3). If the setting is complete for every combination of disks, the node machine management unit 9 ends the process. In the example of FIG. 5, the distance information 24 between disk #D1 and disk #D3 has not yet been set, so the maximum number of layers, 4, is set as its value.

【００６２】For the two disks whose distance information 24 was set to the maximum value (the number of layers) in step ST3, the node machine management unit 9 examines the hierarchy information 23 from each disk to the root (step ST4). In the example of FIG. 5, the node machine management unit 9 obtains the hierarchy information from disk #D1 to the root and from disk #D3 to the root. The hierarchy information for disk #D1 is thus disk #D1 → node machine frame 25-1 → floor frame 25 → root, and the hierarchy information for disk #D3 is disk #D3 → node machine frame 25-2 → floor frame 25 → root.

【００６３】Next, using the hierarchy information 23 examined in step ST4, the node machine management unit 9 follows the hierarchy information 23 from the root toward the disks and determines whether the hierarchy information 23 of the two selected disks has a common part (step ST5). Specifically, the node machine management unit 9 first compares the highest-layer entries of the hierarchy information 23 of the two selected disks. If these entries differ, the setting of the distance information 24 for the two disks ends and the process returns to step ST2. In the example of FIG. 5, the highest-layer entries of the hierarchy information 23 of disks #D1 and #D3 are compared first.

【００６４】If the entries are equal, the node machine management unit 9 decreases the value of the distance information 24 of the two disks by 1 (step ST6). In the example of FIG. 5, the highest-layer entry of the hierarchy information 23 is the root for both disk #D1 and disk #D3, so the entries are judged equal. The value of the distance information 24 is therefore reduced by 1 from 4 to 3.

【００６５】Subsequently, the node machine management unit 9 follows the hierarchy information 23 one step further from the root toward the disks, obtains the next entry for each of the two disks (step ST7), and returns to step ST5 to determine whether the entries are common. In the example of FIG. 5, when the hierarchy information 23 of disks #D1 and #D3 is followed one step toward the disks and compared again, both belong to the floor frame 25 and are therefore common. The value of the distance information 24 for disks #D1 and #D3 thus becomes 2.

【００６６】The node machine management unit 9 repeats this procedure, moving down the hierarchy information 23 toward the disks, until the hierarchy information 23 of the two disks no longer has a common part, and thereby sets the distance information 24 between the two disks. In the example of FIG. 5, when the hierarchy information is followed one more step toward the disks and compared again, the entries for disks #D1 and #D3 differ: node machine frame 25-1 versus node machine frame 25-2. The value of the distance information 24 between disk #D1 and disk #D3 is therefore set to 2.

【００６７】By applying the above procedure to every pair of disks, the distance information 24 can be set on the basis of the hierarchy information 23.
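Steps ST1 to ST7 can be sketched as follows. This is an illustrative reading of the flow rather than the patented implementation: the `PARENT` table, the function names, and the floor 26 entries are assumptions, and each disk is assumed to have exactly one path to the root.

```python
from itertools import combinations

# Hierarchy information 23 (lower -> upper); floor26 entries are assumed.
PARENT = {
    "D1": "node25-1", "D2": "node25-1", "D3": "node25-2", "D4": "node25-2",
    "D5": "node26-1", "D6": "node26-1", "D7": "node26-2", "D8": "node26-2",
    "node25-1": "floor25", "node25-2": "floor25",
    "node26-1": "floor26", "node26-2": "floor26",
    "floor25": "root27", "floor26": "root27",
}


def path_from_root(disk):
    """ST4: hierarchy entries for one disk, ordered root first."""
    path = [disk]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def distance(a, b, max_layers):
    d = max_layers                      # ST3: start from the maximum value
    for fa, fb in zip(path_from_root(a), path_from_root(b)):
        if fa != fb:                    # ST5: no longer common, stop
            break
        d -= 1                          # ST6: common entry, subtract 1
    return d                            # (ST7 is the step to the next entry)


disks = sorted(k for k in PARENT if k.startswith("D"))
max_layers = max(len(path_from_root(d)) for d in disks)        # ST1
table = {pair: distance(*pair, max_layers)                     # ST2 loop
         for pair in combinations(disks, 2)}

print(table[("D1", "D3")])  # 2, as in the worked example
```

With this table, disks on the same node machine get distance 1, disks on the same floor but different node machines get 2, and disks on different floors get 3.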

【００６８】Next, consider setting the distance information 24 for disks that span a plurality of frames, such as disks #D5 and #D6 shown in FIG. 6. In this case, when the hierarchy information 23 from disks #D5 and #D6 to the root is obtained in step ST4 of FIG. 8, the hierarchy information 23 for every possible path is obtained. That is, the hierarchy information 23 from disk #D5 to the root in FIG. 6 consists of two paths: disk #D5 → node machine frame 26-1 → floor frame 25 → root, and disk #D5 → node machine frame 26-1 → floor frame 26 → root.

【００６９】When the distance information 24 between disk #D5 and another disk is set, both of these paths from disk #D5 to the root are evaluated, and the one that gives the larger value of the distance information 24 is used. For example, when the distance information 24 between disk #D1 and disk #D5 is obtained, using the path through the floor frame 25 as the hierarchy information 23 of disk #D5 gives a value of 2, whereas using the path through the floor frame 26 gives a value of 3. The value of the distance information 24 between disk #D1 and disk #D5 is therefore set to 3.
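The multi-frame case can be sketched by letting each entry have several upper layers and taking the largest distance over all path combinations. The table `PARENTS`, the function names, and the default of four layers are assumptions made for illustration; only the part of FIG. 6 needed for the #D1/#D5 example is modeled.

```python
from itertools import product

# Each lower-layer entry may have several upper-layer entries (FIG. 6 style).
PARENTS = {
    "D1": ["node25-1"],
    "D5": ["node26-1"],
    "node25-1": ["floor25"],
    "node26-1": ["floor25", "floor26"],   # belongs to both floor frames
    "floor25": ["root27"],
    "floor26": ["root27"],
}


def paths_from_root(item):
    """Return every root-first path that ends at the given item."""
    if item not in PARENTS:
        return [[item]]
    return [path + [item]
            for parent in PARENTS[item]
            for path in paths_from_root(parent)]


def distance(a, b, max_layers=4):
    """Evaluate every pair of paths and keep the largest distance."""
    best = 0
    for pa, pb in product(paths_from_root(a), paths_from_root(b)):
        d = max_layers
        for fa, fb in zip(pa, pb):
            if fa != fb:
                break
            d -= 1
        best = max(best, d)
    return best


print(distance("D1", "D5"))  # 3: the path through floor frame 26 wins
```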

【００７０】Next, another method of setting the distance information 24 is described. Here, instead of using the hierarchy information 23, an operating rate is calculated from failure rates and set as the distance information 24. FIG. 9 is a diagram showing the configuration of the data storage device according to the first embodiment together with the failure rates in the hierarchical relationship of the disks. In the figure, 30-1 and 30-2 are node machines connected to the host machine 32 via the router 31-1, and the node machines 30-3 and 30-4 are connected to the host machine 32 via the router 31-2. The host machine 32 is assumed to have the same configuration as the host machine 1 in FIG. 1. Disks #D1 and #D2 are connected to the node machine 30-1. B1 denotes the failure rate between disk #D1 and the node machine 30-1, and B2 denotes the failure rate between disk #D2 and the node machine 30-1. Similarly, failure rates B3 to B8 are set between disks #D3 to #D8 and the node machines to which they are connected.

【００７１】B9 is the failure rate between the node machine 30-1 and the router 31-1. Failure rates are set in the same way between the node machines 30-2, 30-3, and 30-4 and the routers to which they are connected, giving the failure rates B9 to B12 in total. Further, B13 is the failure rate between the router 31-1 and the host machine 32, and B14 is the failure rate between the router 31-2 and the host machine 32.

【００７２】The node machine management unit 9 calculates an operating rate, taking as the operating state the state in which the access path from the host machine 32 to at least one of two disks is working, and sets this operating rate as the distance information 24 of the two disks. The operating rate is defined as operating rate = (1 - failure rate). Accordingly, the operating rate between disk #D1 and the node machine 30-1 is (1-B1). When there is only one path, as from the router 31-1 to disk #D1, the operating rate is (1-B1)(1-B9). Further, the probability that disks #D1 and #D2 fail at the same time is B1*B2. The operating rate for the state in which the access path from the node machine 30-1 to at least one of disks #D1 and #D2 is working is therefore (1-B1*B2).

【００７３】Next, the case of setting the distance information 24 for disks #D1 and #D5 shown in FIG. 9 is described. The operating rate from the host machine 32 to disk #D1 is (1-B1)(1-B9)(1-B13), and the operating rate from the host machine 32 to disk #D5 is (1-B5)(1-B11)(1-B14).

【００７４】The operating rate for the state in which an access path from the host machine 32 to at least one of disks #D1 and #D5 is established is then {1-{1-(1-B1)(1-B9)(1-B13)}{1-(1-B5)(1-B11)(1-B14)}}. This value is set as the distance information 24 between disks #D1 and #D5.

【００７５】As a further example, the case of setting the distance information 24 for disks #D1 and #D3 is described. Disk #D1 and disk #D3 take different paths up to the router 31-1 and share a common path from the router 31-1 to the host machine 32. In this case, the operating rate for the state in which an access path to at least one of disks #D1 and #D3 is established is first calculated as seen from the router 31-1. Referring to FIG. 9, this operating rate is {1-{1-(1-B1)(1-B9)}{1-(1-B3)(1-B10)}}.

【００７６】Using this value and the operating rate from the router 31-1 to the host machine 32, the operating rate for the state in which an access path from the host machine 32 to at least one of disks #D1 and #D3 is established is {1-{1-(1-B1)(1-B9)}{1-(1-B3)(1-B10)}}(1-B13). This value is set as the distance information 24 for disks #D1 and #D3.
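The two worked formulas above can be checked numerically. The sketch below is only illustrative: the helper names `series` and `either` are made up for this example, and the failure rates (0.1 for every link) are hypothetical values, since the text gives none.

```python
from math import prod


def series(*failure_rates):
    """Operating rate of a single path whose links must all be working."""
    return prod(1.0 - b for b in failure_rates)


def either(rate_a, rate_b):
    """Operating rate when at least one of two independent paths works."""
    return 1.0 - (1.0 - rate_a) * (1.0 - rate_b)


# Hypothetical failure rates B1..B14; the source does not give numbers.
B = {i: 0.1 for i in range(1, 15)}

# Disks #D1 and #D5: fully separate paths to the host machine 32.
d15 = either(series(B[1], B[9], B[13]), series(B[5], B[11], B[14]))

# Disks #D1 and #D3: separate up to router 31-1, then a shared link (B13).
d13 = either(series(B[1], B[9]), series(B[3], B[10])) * (1.0 - B[13])

print(d15, d13)
```

With equal link failure rates, the pair that shares the router-to-host link gets the smaller value, which matches the intent that disks likely to fail together should have a smaller distance.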

【００７７】By performing the same calculation from the failure rates for every pair of disks, the distance information 24 can be set.

【００７８】FIG. 10 is a diagram showing another configuration of the data storage device according to the first embodiment together with the failure rates in the hierarchical relationship of the disks. As shown in FIG. 10, some systems have a plurality of paths from a disk to the host machine. In FIG. 10, a path between the node machine 30-3 and the router 31-1, with failure rate B15, is added to the configuration shown in FIG. 9. A path between the node machine 30-2 and the router 31-2, with failure rate B16, is also added. Even in such a case, the operating rate from the host machine 32 to two disks can be calculated and set as the distance information 24 of those two disks.

【００７９】Next, the creation of redundancy groups at initial setup is described. A redundancy group is a set of disks on which the same data is stored, formed by combining disks that are unlikely to become inaccessible at the same time. First, as one example of creating redundancy groups, a method of combining disks on the basis of the distance information 24 is described.

【００８０】In this case, the node machine management unit 9 determines the number of disks in each redundancy group according to the data multiplicity. That is, in a system with an M-fold data redundancy configuration (M being an integer of 2 or more), M disks are combined into one redundancy group.

【００８１】Specifically, when the node machine management unit 9 has created one candidate combination of redundancy groups, it obtains, within a redundancy group, the largest value of the distance information 24 between disks. It obtains this value for every redundancy group in the candidate combination and then takes the smallest of these values. The node machine management unit 9 obtains this smallest value for every possible combination of redundancy groups and selects the combination whose smallest value is the largest.

【００８２】For example, consider creating a combination of redundancy groups with multiplicity 2 in the configuration shown in FIG. 5. If the node machine management unit 9 selects a combination of redundancy groups such as {#D1, #D5}, {#D2, #D6}, {#D3, #D7}, {#D4, #D8}, the maximum value of the distance information 24 within each redundancy group is 3, so the minimum of these values is also 3. In the system having the structure shown in FIG. 5, this minimum is the largest value obtainable over all combinations of redundancy groups. The above combination is therefore selected as the combination of redundancy groups for multiplicity 2 in this system.
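The selection rule (take the maximum distance inside each group, then the minimum over groups, then choose the combination that maximizes that minimum) can be sketched by brute force for multiplicity 2. The exhaustive search, the function names, and the floor 26 entries of `PARENT` are assumptions for illustration; the text does not say how the combinations are enumerated.

```python
from itertools import combinations

# Hierarchy information (lower -> upper); floor26 entries are assumed.
PARENT = {
    "D1": "node25-1", "D2": "node25-1", "D3": "node25-2", "D4": "node25-2",
    "D5": "node26-1", "D6": "node26-1", "D7": "node26-2", "D8": "node26-2",
    "node25-1": "floor25", "node25-2": "floor25",
    "node26-1": "floor26", "node26-2": "floor26",
    "floor25": "root27", "floor26": "root27",
}


def path_from_root(disk):
    path = [disk]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def distance(a, b, max_layers=4):
    d = max_layers
    for fa, fb in zip(path_from_root(a), path_from_root(b)):
        if fa != fb:
            break
        d -= 1
    return d


def pairings(disks):
    """Yield every way to split the disks into groups of two."""
    if not disks:
        yield []
        return
    first, rest = disks[0], disks[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def score(grouping):
    """Minimum over groups of the largest in-group distance."""
    return min(max(distance(a, b) for a, b in combinations(group, 2))
               for group in grouping)


disks = sorted(k for k in PARENT if k.startswith("D"))
best = max(pairings(disks), key=score)
print(best, score(best))
```

For eight disks there are only 105 pairings, so exhaustive search is cheap here; a larger system or a higher multiplicity would likely need a heuristic instead.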

【００８３】Similarly, by increasing the number of disks in each redundancy group to M, an M-fold redundant configuration can be built.

【００８４】Further, by ensuring that every redundancy group contains at least one pair of disks whose distance information 24 is N or more, a system can be built that continues to operate even when a failure occurs at layer N-1 or below.

【００８５】The data multiplicity may also be varied from one redundancy group to another. That is, a redundant configuration is possible in which the data in one redundancy group is M-fold multiplexed and the data in another redundancy group is N-fold multiplexed.

【００８６】The above describes an example in which the distance information 24 obtained from the hierarchy information 23 is used to create the redundancy groups. Alternatively, the combination of redundancy groups may be determined using inter-disk distances derived from the physical distances between the disks.

【００８７】The procedure for creating redundancy groups using inter-disk distances derived from physical distances is as follows. When the node machine management unit 9 has created one candidate combination of redundancy groups, it obtains, within a redundancy group, the largest physical distance between disks. It obtains this value for every redundancy group in the candidate combination and takes the smallest of these values. The node machine management unit 9 then obtains this smallest value for every possible combination of redundancy groups and selects the combination whose smallest value is the largest.

【００８８】FIG. 11 is a diagram showing the physical positional relationship of disks installed on the same floor. For example, it shows that disk #D1 and disk #D2 are installed close to each other on the floor, while disk #D3 and disk #D7 are located far apart. For a set of disks arranged on a plane as in FIG. 11, the node machine management unit 9 creates redundancy groups using inter-disk distances derived from the physical position information of the disks.

【００８９】Consider creating redundancy groups with multiplicity 2 under the conditions shown in FIG. 11. When the node machine management unit 9 selects a combination of redundancy groups such as {#D1, #D8}, {#D2, #D7}, {#D3, #D5}, {#D4, #D6}, it obtains the maximum distance within each redundancy group. It then obtains the minimum of these values, which is the distance between disk #D4 and disk #D6. This value is also the largest among the minimum values obtained for all combinations of redundancy groups. The combination above is therefore selected as the redundancy groups for multiplicity 2 in the system having the configuration shown in FIG. 11.

【００９０】Similarly, by increasing the number of disks in each redundancy group to M, an M-fold data redundancy configuration can be built.

【００９１】In the example of FIG. 11 as well, the data multiplicity may be varied from one redundancy group to another. That is, a redundant configuration is possible in which the data in one redundancy group is M-fold multiplexed and the data in another redundancy group is N-fold multiplexed.

【００９２】Next, the operation of writing data to each disk in the data storage device according to the first embodiment is described. FIG. 12 is a flowchart showing the data write operation of the data storage device according to the first embodiment, and an example of the write operation is described with reference to this figure. A data write command from the central processing unit 10 is issued by the upper-level application to the file block management unit 7 (step ST1a). In response, the file block management unit 7 queries the node machine management unit 9 and obtains the redundancy group number 22c (step ST2a).

【００９３】Next, the file block management unit 7 divides the write data into blocks (step ST3a). The file block management unit 7 then determines whether all data blocks have been written to the redundancy groups (step ST4a). If all data blocks have been written, the data write operation ends. If not, the process proceeds to step ST5a.

【００９４】In step ST5a, the file block management unit 7 refers to the redundancy group number 22c and selects the redundancy group to write to so that blocks are distributed evenly across the redundancy groups. The file block management unit 7 then writes the same block to each disk in the selected redundancy group, sequentially from the head of the disk, and does this for all blocks (step ST6a).

【００９５】ここで、ノードマシン３−１〜３−ｎへの
実際のデータ書き込み処理は、ファイルブロック管理部
７が通信制御部８、オペレーティングシステム５、ネッ
トワーク２、オペレーティングシステム１７、通信制御
部１８を介して、ディスクアクセス制御部１９にディス
クへの書き込み命令を発行することで実行される。Here, the actual data writing to the node machines 3-1 to 3-n is executed by the file block management unit 7 issuing a disk write command to the disk access control unit 19 via the communication control unit 8, the operating system 5, the network 2, the operating system 17, and the communication control unit 18.

【００９６】図１３は図１２に示す書き込みフローで各
ディスクに書き込まれたブロックの格納図を示す図であ
る。図において、符号３３−１，３３−２，３３−３，
３３−４を付した枠で囲まれたディスクは同一の冗長グ
ループを表している。図１３に示すように、各冗長グル
ープ内のディスクには、均等なデータ量でデータが格納
されている。FIG. 13 is a diagram showing how the blocks written by the write flow of FIG. 12 are stored on each disk. In the figure, the disks enclosed by each of the frames labeled 33-1, 33-2, 33-3, and 33-4 belong to the same redundancy group. As shown in FIG. 13, the disks in each redundancy group store equal amounts of data.

【００９７】また、図１３ではデータの多重度が２であ
る例を示したが、冗長グループのディスクの数をＭにす
ることで、Ｍ多重の冗長構成を構築することもできる。Further, although FIG. 13 shows an example in which the data multiplicity is 2, by setting the number of disks in the redundant group to M, it is possible to construct an M-multiplexed redundant configuration.

【００９８】なお、データブロックを書き込むグループ
の決定法としては、ハッシュ分散を用いてもよく、ラウ
ンドロビンに行われてもよい。Note that hash distribution may be used as the method of determining the group in which the data block is written, or round robin may be used.
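The two group-selection policies mentioned above can be illustrated as follows. This is a minimal sketch, not the embodiment's method: the function names, the block key format, and the use of CRC-32 as the hash are assumptions chosen for illustration.

```python
import zlib


def pick_group_round_robin(block_index: int, group_count: int) -> int:
    """Cycle through the redundancy groups in order."""
    return block_index % group_count


def pick_group_hash(block_key: bytes, group_count: int) -> int:
    """Map a block key to a group with a deterministic hash (CRC-32 here)."""
    return zlib.crc32(block_key) % group_count


round_robin = [pick_group_round_robin(i, 4) for i in range(8)]
hashed = [pick_group_hash(f"file.dat:{i}".encode(), 4) for i in range(8)]
```

Round robin gives exactly even block counts per group, whereas a hash is only even on average; a hash has the advantage that the group of a block can be recomputed from its key alone.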

【００９９】次に、各ディスクのリードブロック番号７
ａ−２を設定する動作について説明する。図１４は図３
中のファイルブロック管理テーブルの内部構成を示す図
である。図において、７ａ−１は各ディスクを特定する
ディスク番号である。７ａ−２はリードブロック番号で
あって、データ転送時に各ディスクから転送するファイ
ルブロックであるリードブロックを特定する。Next, the operation of setting the read block number 7a-2 of each disk will be described. FIG. 14 is a diagram showing the internal structure of the file block management table in FIG. 3. In the figure, 7a-1 is a disk number identifying each disk. 7a-2 is a read block number, which identifies the read block, that is, the file block to be transferred from each disk at the time of data transfer.

【０１００】ファイルブロック管理部７は、データ書き
込み時にファイルブロック管理テーブル７ａの更新を行
う。ここで、ファイルブロック管理部７は、データ転送
時に各ディスクから転送するファイルブロックをリード
ブロックとして予め決定しておき、これを特定する情報
をファイルブロック管理テーブル７ａにリードブロック
番号７ａ−２として保持する。The file block management unit 7 updates the file block management table 7a when data is written. At that time, the file block management unit 7 determines in advance, as the read block, the file block to be transferred from each disk at the time of data transfer, and holds information identifying it in the file block management table 7a as the read block number 7a-2.

【０１０１】図１５は実施の形態１のデータ格納装置に
よるデータ書き込み時に設定するリードブロック番号を
示す図である。図１５に示すように、ファイルブロック
管理部７は、冗長グループ３３−１，３３−２，３３−
３，３３−４からブロックの転送を行うにあたり、同一
の冗長グループ３３−１，３３−２，３３−３，３３−
４に属する各ディスクにおいて重複したブロックにアク
セスしないように、リードブロック番号７ａ−２を決定
する。例えば、冗長グループ３３−１に属するディスク
＃Ｄ１に対して、ファイルブロック管理部７は、符号３
４−１を付した枠で囲まれたデータブロックを特定する
情報をリードブロック番号７ａ−２として設定する。こ
のとき、ディスク＃Ｄ５に対してはディスク＃Ｄ１と重
複しない、符号３４−２を付した枠で囲まれたデータブ
ロックを特定する情報が、リードブロック番号７ａ−２
として指定される。以下、他の冗長グループ３３−２〜
３３−４についても同様である。FIG. 15 is a diagram showing the read block numbers set at the time of data writing by the data storage device of the first embodiment. As shown in FIG. 15, when blocks are transferred from the redundancy groups 33-1, 33-2, 33-3, and 33-4, the file block management unit 7 determines the read block numbers 7a-2 so that the disks belonging to the same redundancy group do not access duplicate blocks. For example, for the disk #D1 belonging to the redundancy group 33-1, the file block management unit 7 sets, as the read block number 7a-2, information identifying the data blocks enclosed by the frame labeled 34-1. For the disk #D5, information identifying the data blocks enclosed by the frame labeled 34-2, which do not overlap those of the disk #D1, is specified as the read block number 7a-2. The same applies to the other redundancy groups 33-2 to 33-4.

【０１０２】また、Ｍ多重構成を持つ冗長グループに対
しては、冗長グループ内のあるディスクの全ブロック数
をＮとすると、各ディスクはＮ／Ｍ個の連続したブロッ
クをリードブロックとして設定する。For a redundant group having an M multiplex structure, assuming that the total number of blocks of a disk in the redundant group is N, each disk sets N / M consecutive blocks as a read block.
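The N/M rule above can be sketched as a small function that assigns each disk of a redundancy group one contiguous run of blocks. This is an illustration only; the function name `read_ranges` and the handling of a remainder when N is not divisible by M are assumptions, since the text does not cover that case.

```python
def read_ranges(total_blocks: int, disks: list[str]) -> dict[str, range]:
    """Give each disk of a redundancy group one contiguous run of blocks.

    With N blocks and M disks, each disk gets about N/M blocks. When N is not
    divisible by M, the first N % M disks get one extra block (an assumption;
    the source does not say how a remainder is handled).
    """
    base, extra = divmod(total_blocks, len(disks))
    ranges, start = {}, 0
    for position, disk in enumerate(disks):
        length = base + (1 if position < extra else 0)
        ranges[disk] = range(start, start + length)
        start += length
    return ranges


# Multiplicity 2, eight blocks: D1 reads blocks 0-3 and D5 reads blocks 4-7.
two_way = read_ranges(8, ["D1", "D5"])
```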

【０１０３】次にデータ転送時の動作について説明す
る。図１６は実施の形態１によるデータ格納装置のデー
タ転送動作を示すフロー図である。先ず、上位アプリケ
ーションよりデータ転送制御部６にデータの転送命令を
発行する（ステップＳＴ１ｂ）。これにより、データ転
送制御部６は、転送データ情報をファイルブロック管理
部７へ渡す。Next, the operation during data transfer will be described. FIG. 16 is a flowchart showing the data transfer operation of the data storage device according to the first embodiment. First, the upper application issues a data transfer instruction to the data transfer control unit 6 (step ST1b). As a result, the data transfer control unit 6 passes the transfer data information to the file block management unit 7.

【０１０４】ファイルブロック管理部７では、ファイル
ブロック管理テーブル７ａを参照して、転送ブロックが
リードブロック番号７ａ−２となっているディスクを探
索する（ステップＳＴ２ｂ）。このあと、ファイルブロ
ック管理部７は、該当するディスクのディスク番号７ａ
−１及びリードブロック番号７ａ−２をデータ転送制御
部６に返信する。The file block management unit 7 refers to the file block management table 7a and searches for the disk on which the block to be transferred is registered as the read block number 7a-2 (step ST2b). The file block management unit 7 then returns the disk number 7a-1 and the read block number 7a-2 of that disk to the data transfer control unit 6.

【０１０５】次に、データ転送制御部６は、ファイルブ
ロック管理部７から返信されたディスク番号７ａ−１及
びリードブロック番号７ａ−２を元にして、ディスクア
クセス制御部１９にブロック転送命令を発行する（ステ
ップＳＴ３ｂ）。Next, the data transfer control unit 6 issues a block transfer command to the disk access control unit 19 based on the disk number 7a-1 and the read block number 7a-2 returned from the file block management unit 7 (step ST3b).

【０１０６】ここで、実施の形態１のデータ格納装置に
よるディスク故障時の動作を説明する。ディスクの故障
が検出されると、故障したディスクのディスク番号がノ
ードマシン管理部９に渡される。続いて、ノードマシン
管理部９は、故障ディスクのディスク番号２２ａ、及
び、故障ディスクが属する冗長グループ番号２２ｃをフ
ァイルブロック管理部７に送出する。これによって、フ
ァイルブロック管理部７は、ノードマシン管理部９から
受け取った冗長グループ番号２２ｃを元にして、その冗
長グループに属するディスクのリードブロック番号７ａ
−２を書き換える。The operation of the data storage device according to the first embodiment when a disk fails will now be described. When a disk failure is detected, the disk number of the failed disk is passed to the node machine management unit 9. The node machine management unit 9 then sends the disk number 22a of the failed disk and the redundancy group number 22c of the group to which it belongs to the file block management unit 7. Based on the redundancy group number 22c received from the node machine management unit 9, the file block management unit 7 rewrites the read block numbers 7a-2 of the disks belonging to that redundancy group.

【０１０７】以下に、ファイルブロック管理部７による
リードブロック番号の書き換え動作について詳細に説明
する。図１７は実施の形態１のデータ格納装置によって
ディスク故障時に設定したリードブロック番号を示す図
である。図１７の例では、ディスク＃Ｄ７、＃Ｄ８が故
障したことを表している。先ず、ファイルブロック管理
部７は、ディスク＃Ｄ７が故障したため、ディスク＃Ｄ
７と同一の冗長グループに属するディスク＃Ｄ２とにお
けるリードブロック番号７ａ−２の書き換えを行う。こ
の書き換えを行うにあたり、ファイルブロック管理部７
は、冗長グループ内で動作可能なディスクに負荷が均等
にかかるようにリードブロック番号７ａ−２を設定す
る。つまり、冗長グループ３３−２では、動作可能なデ
ィスクがディスク＃Ｄ２の１台である。このため、ディ
スク＃Ｄ２では、冗長グループの全データがリードブロ
ック番号７ａ−２として設定される。次に、ディスク＃
Ｄ７は動作不可能なディスクであるため、リードブロッ
ク番号７ａ−２の設定を空にする。同様に、冗長グルー
プ３３−３に属するディスク＃Ｄ３，＃Ｄ８に対して、
リードブロック番号７ａ−２の書き換えを行う。The rewriting of the read block numbers by the file block management unit 7 is described in detail below. FIG. 17 is a diagram showing the read block numbers set by the data storage device according to the first embodiment when disks fail. The example of FIG. 17 shows a case in which disks #D7 and #D8 have failed. First, because disk #D7 has failed, the file block management unit 7 rewrites the read block numbers 7a-2 of disk #D7 and of disk #D2, which belongs to the same redundancy group. In doing so, the file block management unit 7 sets the read block numbers 7a-2 so that the load is spread evenly over the operable disks in the redundancy group. In the redundancy group 33-2, the only operable disk is disk #D2, so all the data of the redundancy group is set as the read block number 7a-2 of disk #D2. Because disk #D7 is inoperable, its read block number 7a-2 is set to empty. The read block numbers 7a-2 of disks #D3 and #D8, which belong to the redundancy group 33-3, are rewritten in the same way.
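The rewriting on failure can be sketched as recomputing the read ranges over the operable disks only. This is a hedged illustration: the function name `rebalance_read_ranges`, the use of Python `range` objects as read block numbers, and the remainder handling are assumptions, not details from the source.

```python
def rebalance_read_ranges(total_blocks, disks, failed):
    """Reassign read ranges after failures.

    Operable disks split the group's blocks evenly; failed disks get an
    empty range, mirroring the "set to empty" behaviour described above.
    """
    alive = [d for d in disks if d not in failed]
    if not alive:
        raise RuntimeError("no operable disk left in this redundancy group")
    base, extra = divmod(total_blocks, len(alive))
    ranges = {d: range(0) for d in disks}
    start = 0
    for position, disk in enumerate(alive):
        # Remainder blocks go to the first disks (assumption).
        length = base + (1 if position < extra else 0)
        ranges[disk] = range(start, start + length)
        start += length
    return ranges


# Redundancy group 33-2 after disk D7 fails: D2 must serve every block.
after_failure = rebalance_read_ranges(8, ["D2", "D7"], failed={"D7"})
```

The same function also covers adding a disk to a group: calling it with the enlarged disk list and an empty `failed` set spreads the read load over the new member as well.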

【０１０８】次にディスク増設時における動作の一例を
説明する。ディスクの増設時には、ノードマシン管理部
９が増設ディスクの情報登録作業を行う。このとき、ノ
ードマシン管理部９は、ノードマシン管理テーブル９ａ
のディスク情報２２に対して増設ディスクに関する設定
欄を追加し、ディスク番号２２ａの割り当てやノードマ
シン番号２２ｂの設定を行う。さらに、増設ディスクを
いずれかの冗長グループに分類し、冗長グループ番号２
２ｃを設定する。続いて、ノードマシン管理部９は、増
設ディスクと他のディスクとの距離情報２４を設定す
る。Next, an example of the operation when a disk is added will be described. When a disk is added, the node machine management unit 9 registers information on the added disk. The node machine management unit 9 adds an entry for the added disk to the disk information 22 of the node machine management table 9a, assigns a disk number 22a, and sets the node machine number 22b. It also assigns the added disk to one of the redundancy groups and sets the redundancy group number 22c. The node machine management unit 9 then sets the distance information 24 between the added disk and the other disks.

【０１０９】増設ディスクの情報登録作業が完了する
と、ノードマシン管理部９は、増設ディスクの冗長グル
ープ番号２２ｃをファイルブロック管理部９に渡す。フ
ァイルブロック管理部９では、追加ディスクが属する冗
長グループのディスク負荷が均等になるように、リード
ブロック番号７ａ−２の書き換えを行う。When registration of the added disk is complete, the node machine management unit 9 passes the redundancy group number 22c of the added disk to the file block management unit 7. The file block management unit 7 rewrites the read block numbers 7a-2 so that the load on the disks of the redundancy group to which the added disk belongs is equalized.

【０１１０】以上のように、この実施の形態１によれ
ば、データを同一データ量ごとの複数のデータブロック
に分割して、該データブロック単位で複数のディスク装
置に同一ブロック数ずつ分散配置させると共に、所定数
のディスク装置（冗長グループ）ごとに同一内容のデー
タブロックを重複して格納し、複数のディスク装置に分
散して格納されたデータを冗長グループごとに１又は複
数のデータブロック単位で読み出すので、各ディスク装
置のデータ量が均等になるようにデータが格納されると
共に、データ転送時の各ディスク装置からのデータ転送
量が均等になることから、システム全体のスループット
の低下を最小限に抑えることができる。また、各データ
ブロックを冗長グループごとにシーケンシャルに格納す
ることで、データ転送処理を行う際、冗長グループ内の
どのディスクからデータを転送するかを決定するにあた
り、特別な計算が不要であることから全てのデータをデ
ィスクの先頭から順次転送するような処理において高速
に動作させることができる。As described above, according to the first embodiment, data is divided into a plurality of data blocks of equal size; the data blocks are distributed across the plurality of disk devices so that each holds the same number of blocks; data blocks with identical contents are stored redundantly on each predetermined number of disk devices (redundancy group); and the data distributed across the disk devices is read out in units of one or more data blocks for each redundancy group. As a result, data is stored so that the amount held by each disk device is equal, and the amount transferred from each disk device at the time of data transfer is also equal, so the drop in throughput of the entire system can be minimized. In addition, because the data blocks are stored sequentially within each redundancy group, no special calculation is needed to decide which disk in the redundancy group to transfer data from, so processing that transfers all data sequentially from the head of the disks can run at high speed.

【０１１１】また、この実施の形態１によれば、システ
ムの階層関係からディスク間の距離情報を定義し、当該
距離情報を元にして各ディスクをグループ化するので、
システムの特定階層における障害に対応することができ
る。Further, according to the first embodiment, distance information between disks is defined from the hierarchical relationships of the system, and the disks are grouped based on that distance information, so a failure in a specific layer of the system can be handled.

【０１１２】さらに、この実施の形態１によれば、シス
テムの特定の部分を中心とした冗長構成を構築すること
ができる。Furthermore, according to the first embodiment, it is possible to construct a redundant configuration centering on a specific part of the system.

【０１１３】さらに、この実施の形態１によれば、ディ
スクの冗長グループを作成し、冗長グループ単位でデー
タの多重度を設定するので、特定のディスク、特定の冗
長グループに対するデータの多重度の変更を適宜行うこ
とができる。Further, according to the first embodiment, redundancy groups of disks are created and the data multiplicity is set per redundancy group, so the data multiplicity can be changed as needed for a specific disk or a specific redundancy group.

【０１１４】さらに、この実施の形態１によれば、冗長
グループ内にディスクを追加し、データの多重度を上げ
ることによって、システムの信頼性を向上させることが
できると共に、データ転送を高速化することができる。Furthermore, according to the first embodiment, adding a disk to a redundancy group and raising the data multiplicity improves system reliability and also speeds up data transfer.

【０１１５】さらに、この実施の形態１によれば、並列
システムにおいて、データの転送性能の悪いディスクを
持つ冗長グループに新たなディスクを追加することによ
って、並列システム全体のスループットを向上させるこ
とができる。Further, according to the first embodiment, in a parallel system, the throughput of the entire parallel system can be improved by adding a new disk to a redundancy group that contains a disk with poor data transfer performance.

【０１１６】実施の形態２．この実施の形態２では、冗
長グループを作成する手順、及び、各冗長グループへデ
ータブロックを書き込む手順が上記実施の形態１と異な
る。例えば、データの多重度がＭの時、１つのブロック
を書き込むディスク数はＭであり、あるディスクの複製
ブロックを全て同一のＭ本のディスクが保持するのでは
なく、２＊（Ｍ−１）本のディスクに分散して格納す
る。Embodiment 2. The second embodiment differs from the first embodiment in the procedure for creating redundancy groups and the procedure for writing data blocks to each redundancy group. For example, when the data multiplicity is M, each block is written to M disks; however, the duplicate blocks of a given disk are not all held by the same M disks but are distributed across 2*(M-1) disks.
【０１１７】この実施の形態２によるデータ格納装置の
基本的なハードウェア構成は、上記実施の形態１と同様
である。また、冗長グループも上記実施の形態１と同様
にして距離情報が大きいディスクの組み合わせて作成さ
れる。ここで、上記実施の形態１と異なる点としては、
ディスクごとに冗長グループが存在することにある。つ
まり、この実施の形態２では、Ｎ個のディスクが存在す
る場合、Ｎ個の冗長グループが存在することになる。な
お、冗長グループの構成において、オリジナルデータを
書き込むディスクを元ディスク、元ディスクの複製を書
き込むディスクを複製ディスクと呼ぶこととする。The basic hardware configuration of the data storage device according to the second embodiment is the same as that of the first embodiment. As in the first embodiment, redundancy groups are created by combining disks whose distance information is large. The difference from the first embodiment is that a redundancy group exists for each disk. That is, in the second embodiment, when there are N disks, there are N redundancy groups. In a redundancy group, the disk to which original data is written is called the original disk, and a disk to which a copy of the original disk's data is written is called a duplicate disk.

【０１１８】次に動作について説明する。図１８は実施
の形態２によるデータ格納装置の動作を示すフロー図で
あり、この図及び図５に示すディスクの階層関係の一例
を用いてデータの多重度２の冗長グループ作成動作を説
明する。先ず、ノードマシン管理部９は、全てのディス
クに対して冗長グループの割り当てが完了しているか否
かを判定する（ステップＳＴ１ｃ）。このとき、全ての
ディスクに対する冗長グループの割り当てが完了してい
れば、処理を終了する。一方、全てのディスクに対する
冗長グループの割り当てが完了していなければ、ノード
マシン管理部９は、冗長グループの割り当てが完了して
いない任意のディスクを１つの元ディスクとして選択す
る（ステップＳＴ２ｃ）。ここで、図５の例では、ディ
スク＃Ｄ１が選択されたものとする。Next, the operation will be described. FIG. 18 is a flowchart showing the operation of the data storage device according to the second embodiment. The operation of creating redundancy groups for a data multiplicity of 2 is described using this figure and the example disk hierarchy shown in FIG. 5. First, the node machine management unit 9 determines whether redundancy groups have been assigned to all disks (step ST1c). If so, the process ends. If not, the node machine management unit 9 selects, as an original disk, any one disk whose redundancy group assignment is not yet complete (step ST2c). In the example of FIG. 5, disk #D1 is assumed to be selected.

【０１１９】次に、ノードマシン管理部９は、データの
多重度Ｍの場合、元ディスクと複製ディスクの数の和が
２＊（Ｍ−１）＋１個になるまで、元ディスクの冗長グ
ループを作成する。具体的には、ノードマシン管理部９
が元ディスクの冗長グループを作成するにあたり、複製
ディスク数が２＊（Ｍ−１）個になっているか否かを常
に判定する（ステップＳＴ３ｃ）。このとき、複製ディ
スクの数が２＊（Ｍ−１）個になっていれば、ステップ
ＳＴ１ｃの処理に戻って上述した動作を繰り返す。一
方、複製ディスクの数が２＊（Ｍ−１）個になっていな
ければ、ステップＳＴ４ｃの処理に移行する。図５の例
では、データの多重度が２であるため、ノードマシン管
理部９は、元ディスクと複製ディスクの和が３個になる
まで、ディスク＃Ｄ１の冗長グループを作成する。Next, for a data multiplicity of M, the node machine management unit 9 builds the redundancy group of the original disk until the total number of the original disk and its duplicate disks reaches 2*(M-1)+1. Specifically, while building the redundancy group of the original disk, the node machine management unit 9 continually checks whether the number of duplicate disks has reached 2*(M-1) (step ST3c). If it has, the process returns to step ST1c and the above operation is repeated. If it has not, the process proceeds to step ST4c. In the example of FIG. 5, the data multiplicity is 2, so the node machine management unit 9 builds the redundancy group of disk #D1 until the original disk and duplicate disks total three.

【０１２０】ステップＳＴ４ｃにおいて、ノードマシン
管理部９は、元ディスクと最も距離情報の大きいディス
クを選択する。図５の例では、ディスク＃Ｄ１と最も距
離情報の大きいディスクはディスク＃Ｄ５〜＃Ｄ８とな
る。At step ST4c, the node machine management section 9 selects the disk having the largest distance information from the original disk. In the example of FIG. 5, the disks having the largest distance information with the disk # D1 are the disks # D5 to # D8.

【０１２１】ここで、ノードマシン管理部９は、元ディ
スクと最も距離情報の大きいディスクとして選択された
ディスクが複数存在するか否かを判定する（ステップＳ
Ｔ５ｃ）。このとき、選択されたディスクが１つであれ
ば、ノードマシン管理部９は、ステップＳＴ１５ｃに進
んで当該ディスクを元ディスクの冗長グループとすると
共に、選択されたディスクの冗長グループにも元ディス
クを追加する。Here, the node machine management unit 9 determines whether more than one disk was selected as having the largest distance information from the original disk (step ST5c). If only one disk was selected, the node machine management unit 9 proceeds to step ST15c, adds that disk to the redundancy group of the original disk, and also adds the original disk to the redundancy group of the selected disk.

【０１２２】一方、選択されたディスクが複数存在する
場合、ノードマシン管理部９は、元ディスクの複製ディ
スクと最も距離情報の大きいディスクを再選択する（ス
テップＳＴ６ｃ）。図５の例では、現在ディスク＃Ｄ１
の複製ディスクが存在しないため、ステップＳＴ６ｃに
おいて再選択された後でも変わらず、ディスク＃Ｄ５〜
＃Ｄ８が選択される。On the other hand, when more than one disk was selected, the node machine management unit 9 narrows the selection to the disks with the largest distance information from the duplicate disks of the original disk (step ST6c). In the example of FIG. 5, disk #D1 currently has no duplicate disks, so the selection is unchanged after step ST6c and disks #D5 to #D8 remain selected.

【０１２３】続いて、ノードマシン管理部９は、ステッ
プＳＴ６ｃにて再度選択されたディスクが複数存在する
か否かを判定する（ステップＳＴ７ｃ）。このとき、選
択されたディスクが１つであれば、ノードマシン管理部
９は、ステップＳＴ１５ｃの処理に進む。一方、選択さ
れたディスクが複数存在する場合、ノードマシン管理部
９は、選択されたディスクの中で割り当てられている複
製ディスクが最も少ないディスクを選択する（ステップ
ＳＴ８ｃ）。図５の例では、ディスク＃Ｄ５〜＃Ｄ８に
は複製ディスクが割り当てられていないため、いずれも
割り当てられた複製ディスクの数が０となり、再びディ
スク＃Ｄ５〜＃Ｄ８が選択される。Subsequently, the node machine management section 9 determines whether or not there are a plurality of disks selected again in step ST6c (step ST7c). At this time, if the number of selected disks is one, the node machine management unit 9 proceeds to the process of step ST15c. On the other hand, when there are a plurality of selected disks, the node machine management unit 9 selects the disk with the smallest number of allocated duplicate disks among the selected disks (step ST8c). In the example of FIG. 5, since the duplicate disks are not assigned to the disks # D5 to # D8, the number of assigned duplicate disks is 0, and the disks # D5 to # D8 are selected again.

【０１２４】このあと、ノードマシン管理部９は、ステ
ップＳＴ８ｃにて選択されたディスクが複数存在するか
否かを判定する（ステップＳＴ９ｃ）。このとき、選択
されたディスクが１つであれば、ノードマシン管理部９
は、ステップＳＴ１５ｃの処理に進む。一方、選択され
たディスクが複数存在する場合、ノードマシン管理部９
は、選択されたディスクの上位階層で複製ディスクの割
り当てが最も少ない上位階層を選択し、その上位階層に
含まれるディスクを選択する（ステップＳＴ１０ｃ）。
図５の例では、ディスク＃Ｄ５〜＃Ｄ８の上位階層であ
るノードマシンフレーム２６−１とノードマシンフレー
ム２６−２に割り当てられた複製ディスクの数が比較さ
れる。The node machine management unit 9 then determines whether more than one disk remains selected after step ST8c (step ST9c). If only one disk is selected, the node machine management unit 9 proceeds to step ST15c. If more than one disk is selected, the node machine management unit 9 picks, among the upper layers of the selected disks, the upper layer with the fewest assigned duplicate disks, and selects the disks contained in that upper layer (step ST10c). In the example of FIG. 5, the numbers of duplicate disks assigned to the node machine frame 26-1 and the node machine frame 26-2, which are the upper layers of disks #D5 to #D8, are compared.

【０１２５】ここで、ノードマシンフレーム２６−１の
複製ディスクの数は、ディスク＃Ｄ５，＃Ｄ６に割り当
てられた複製ディスクの数となる。また、ノードマシン
フレーム２６−２の複製ディスクの数は、ディスク＃Ｄ
７，＃Ｄ８に割り当てられた複製ディスクの数となる。
図５の例では、ディスク＃Ｄ５〜＃Ｄ８には複製ディス
クが割り当てられていない。このため、ノードマシンフ
レーム２６−１，２６−２に割り当てられた複製ディス
クの数は、いずれも０になる。従って、ステップＳＴ１
０ｃにおいても、ディスク＃Ｄ５〜＃Ｄ８が選択され
る。Here, the number of duplicate disks of the node machine frame 26-1 is the number of duplicate disks assigned to disks #D5 and #D6, and the number of duplicate disks of the node machine frame 26-2 is the number assigned to disks #D7 and #D8. In the example of FIG. 5, no duplicate disks have been assigned to disks #D5 to #D8, so the number of duplicate disks assigned to each of the node machine frames 26-1 and 26-2 is 0. Therefore, disks #D5 to #D8 remain selected after step ST10c as well.

【０１２６】続いて、ノードマシン管理部９は、ステッ
プＳＴ１０ｃにて選択されたディスクが複数存在するか
否かを判定する（ステップＳＴ１１ｃ）。このとき、選
択されたディスクが１つであれば、ノードマシン管理部
９は、ステップＳＴ１５ｃの処理に進む。Subsequently, the node machine management section 9 determines whether or not there are a plurality of disks selected in step ST10c (step ST11c). At this time, if the number of selected disks is one, the node machine management unit 9 proceeds to the process of step ST15c.

【０１２７】一方、選択されたディスクが複数存在する
場合、ノードマシン管理部９は、ディスクを選択すべき
階層を１つ上げて、ディスクを選択すべき上位階層があ
るか否かを判定する（ステップＳＴ１２ｃ）。このと
き、ディスクを選択すべき上位階層がある場合は、ステ
ップＳＴ１０ｃの処理に戻って、ノードマシン管理部９
は、当該上位階層において複製ディスクの割り当てられ
た個数の比較を行う。図５の例では、ノードマシンフレ
ーム２６−１，２６−２の上位階層はフロアフレーム２
６で等しい。このため、当該上位階層においても割り当
てられた複製ディスク数は等しくなり、ディスク＃Ｄ５
〜＃Ｄ８が選択される。On the other hand, when more than one disk is selected, the node machine management unit 9 moves up one layer in the hierarchy and determines whether there is an upper layer in which disks can be compared (step ST12c). If there is, the process returns to step ST10c, and the node machine management unit 9 compares the numbers of assigned duplicate disks in that upper layer. In the example of FIG. 5, the node machine frames 26-1 and 26-2 share the same upper layer, the floor frame 26. The number of assigned duplicate disks is therefore equal in that upper layer as well, and disks #D5 to #D8 remain selected.

【０１２８】このあと、ステップＳＴ１１ｃにおいてさ
らに複数のディスクが存在すると判定されると、ノード
マシン管理部９は、ステップＳＴ１２ｃに進んでさらに
上位階層を１つ上げて、当該上位階層において複製ディ
スクの割り当てられた個数の比較を行う。この動作をス
テップＳＴ１２ｃにて上位階層がないと判定されるまで
繰り返す。If it is then determined in step ST11c that more than one disk is still selected, the node machine management unit 9 proceeds to step ST12c, moves up one more layer, and compares the numbers of assigned duplicate disks in that layer. This is repeated until it is determined in step ST12c that there is no higher layer.

【０１２９】また、ステップＳＴ１２ｃにおいて、ディ
スクを選択すべき上位階層がないと判定されると、ノー
ドマシン管理部９は、ステップＳＴ１３ｃに進んで、ス
テップＳＴ１０ｃにて選択されたディスクが複数存在す
るか否かを判定する。このとき、選択されたディスクが
１つであれば、ノードマシン管理部９は、ステップＳＴ
１５ｃの処理に進む。一方、選択されたディスクが複数
存在する場合、ノードマシン管理部９は、選択された複
数のディスクから任意のディスクを選択する（ステップ
ＳＴ１４ｃ）。このあと、ステップＳＴ１５ｃの処理に
移行する。If it is determined in step ST12c that there is no higher layer in which to select disks, the node machine management unit 9 proceeds to step ST13c and determines whether more than one disk was selected in step ST10c. If only one disk is selected, the node machine management unit 9 proceeds to step ST15c. If more than one disk is selected, the node machine management unit 9 selects any one of them (step ST14c) and then proceeds to step ST15c.

【０１３０】図５の例では、最上位階層まで複製ディス
クの個数が等しくなる。このため、ステップＳＴ１４ｃ
にてディスク＃Ｄ５〜＃Ｄ８のうちから任意のディスク
が選択される。ここで、ディスク＃Ｄ５が選択される
と、ステップＳＴ１５ｃにおいて、ディスク＃Ｄ１の冗
長グループとしてディスク＃Ｄ５が設定され、これに応
じてディスク＃Ｄ５の冗長グループとしてディスク＃Ｄ
１が設定される。In the example of FIG. 5, the number of duplicate disks is equal all the way up to the top layer. Therefore, in step ST14c, an arbitrary disk is selected from disks #D5 to #D8. If disk #D5 is selected, then in step ST15c disk #D5 is set as a member of the redundancy group of disk #D1, and correspondingly disk #D1 is set as a member of the redundancy group of disk #D5.
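The tie-break cascade of steps ST4c to ST14c for picking one duplicate disk can be sketched as below. This is an illustrative sketch under several assumptions: the hierarchy labels (`F1`, `F1-1`, and so on) are hypothetical stand-ins for the FIG. 5 hierarchy, distance is modelled as the number of levels below the deepest shared ancestor, the distance to several existing duplicate disks is summed, and "arbitrary" selection is implemented as taking the first candidate.

```python
def keep_best(candidates, score):
    """Return the candidates that share the highest score."""
    best = max(score(c) for c in candidates)
    return [c for c in candidates if score(c) == best]


def distance(path_a, path_b):
    """Levels separating two disks: path depth minus shared prefix length."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return len(path_a) - shared


def choose_replica_disk(original, candidates, paths, groups):
    """Pick one duplicate disk for `original` (ST4c to ST14c).

    paths:  disk -> tuple of hierarchy labels from the top layer to the disk
    groups: disk -> set of disks already in its redundancy group
    """
    # ST4c: farthest from the original disk.
    pool = keep_best(candidates, lambda c: distance(paths[original], paths[c]))
    # ST6c: farthest from the duplicate disks the original already has.
    if len(pool) > 1 and groups[original]:
        pool = keep_best(pool, lambda c: sum(
            distance(paths[r], paths[c]) for r in groups[original]))
    # ST8c: fewest duplicate disks assigned so far.
    if len(pool) > 1:
        pool = keep_best(pool, lambda c: -len(groups[c]))
    # ST10c to ST12c: climb the hierarchy, preferring the least-loaded layer.
    depth = len(paths[original]) - 1
    while len(pool) > 1 and depth > 0:
        def load(c, depth=depth):
            prefix = paths[c][:depth]
            return -sum(len(groups[d]) for d in paths if paths[d][:depth] == prefix)
        pool = keep_best(pool, load)
        depth -= 1
    # ST14c: still tied, so any disk will do; take the first.
    return pool[0]


# Hypothetical hierarchy: two floors, two frames per floor, two disks per frame.
paths = {f"D{n}": (f"F{(n - 1) // 4 + 1}", f"F{(n - 1) // 4 + 1}-{(n - 1) % 4 // 2 + 1}", f"D{n}")
         for n in range(1, 9)}
groups = {d: set() for d in paths}
first = choose_replica_disk("D1", [d for d in paths if d != "D1"], paths, groups)
groups["D1"].add(first)   # ST15c: register the pair in both directions
groups[first].add("D1")
second = choose_replica_disk("D1", [d for d in paths if d not in ("D1", first)], paths, groups)
```

Under these assumptions the sketch picks D5 and then D7 for disk D1, which agrees with the redundancy group of disk #D1 given in the text for a multiplicity of 2.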

【０１３１】図１９は図５中の階層関係に対して作成し
た冗長グループの対応表を示す図である。ここで、デー
タ多重度２に対応する冗長グループは、上述した図１８
のフローによって作成したものである。FIG. 19 is a diagram showing a correspondence table of the redundancy groups created for the hierarchy in FIG. 5. The redundancy groups for a data multiplicity of 2 were created by the flow of FIG. 18 described above.

【０１３２】次に、データの多重度がＭ（Ｍ；２以上の
整数）のディスク冗長構成をとるシステムにおいて、冗
長グループを設定する動作について説明する。先ず、上
述した図１８のフローによって、全ての元ディスクに対
して多重度２のディスク冗長構成をとる冗長グループの
設定を行う。次に、データの多重度を増加させると共
に、全ての元ディスクに対して、冗長グループ数を増加
させていく。Next, the operation of setting a redundancy group in a system having a disk redundancy configuration in which the data multiplicity is M (M; an integer of 2 or more) will be described. First, according to the flow of FIG. 18 described above, a redundant group having a disk redundancy configuration with a multiplicity of 2 is set for all the original disks. Next, while increasing the data multiplicity, the number of redundant groups is increased for all the original disks.

【０１３３】例えば、データ多重度３のディスク冗長構
成をとる場合、ディスク＃Ｄ１の冗長グループは、デー
タ多重度２におけるディスク＃Ｄ１の冗長グループのメ
ンバであるディスク＃Ｄ５とディスク＃Ｄ７の冗長グル
ープのメンバを新たにディスク＃Ｄ１の冗長グループと
して加える。For example, for a disk redundancy configuration with a data multiplicity of 3, the members of the redundancy groups of disks #D5 and #D7, which are the members of the redundancy group of disk #D1 at a data multiplicity of 2, are newly added to the redundancy group of disk #D1.

【０１３４】つまり、データ多重度２において、ディス
ク＃Ｄ５の冗長グループは、ディスク＃Ｄ１，＃Ｄ４で
ある。また、ディスク＃Ｄ７の冗長グループは、ディス
ク＃Ｄ１，＃Ｄ３である。このため、データ多重度３の
システムにおいては、ディスク＃Ｄ１の冗長グループは
｛＃Ｄ３，＃Ｄ４，＃Ｄ５，＃Ｄ７｝となる。That is, in the data multiplicity 2, the redundant group of the disk # D5 is the disks # D1 and # D4. The redundant group of the disk # D7 is the disks # D1 and # D3. Therefore, in the system having the data multiplicity of 3, the redundancy group of the disk # D1 is {# D3, # D4, # D5, # D7}.
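The step from multiplicity 2 to multiplicity 3 can be sketched as a set union. This is an illustration only: the function name `raise_multiplicity` is hypothetical, and the multiplicity 2 table below is partly an assumption. The groups of D1, D5, and D7 come from the text; the remaining entries are inferred from the write order given later in this description, since FIG. 19 itself is not reproduced here.

```python
def raise_multiplicity(groups: dict[str, set[str]]) -> dict[str, set[str]]:
    """Grow every redundancy group by one multiplicity step.

    Each disk's new group is its current group plus the groups of its
    current members, excluding the disk itself.
    """
    expanded = {}
    for disk, members in groups.items():
        new_members = set(members)
        for member in members:
            new_members |= groups[member]
        new_members.discard(disk)
        expanded[disk] = new_members
    return expanded


# Multiplicity 2 groups (D1, D5, D7 from the text; the rest inferred).
groups2 = {
    "D1": {"D5", "D7"}, "D5": {"D1", "D4"}, "D4": {"D5", "D8"}, "D8": {"D4", "D2"},
    "D2": {"D8", "D6"}, "D6": {"D2", "D3"}, "D3": {"D6", "D7"}, "D7": {"D3", "D1"},
}
groups3 = raise_multiplicity(groups2)
```

With this table every group grows from 2 to 4 members, matching the 2*(M-1) group size for M = 3.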

【０１３５】同様に他の元ディスクに対して、冗長グル
ープを決定すると、図１９に示す多重度３の冗長グルー
プが設定される。データの多重度を１上げるごとに同様
の手順で全ての元ディスクの冗長グループ数を増加させ
ていくことで、多重度Ｍのディスク冗長構成をとるシス
テムにおいて冗長グループを設定することができる。Similarly, when a redundant group is determined for another source disk, a redundant group with a multiplicity of 3 shown in FIG. 19 is set. A redundant group can be set in a system having a disk redundancy configuration with a multiplicity of M by increasing the number of redundant groups of all the original disks by the same procedure each time the data multiplicity is increased.

【０１３６】また、この実施の形態２においても、デー
タは冗長グループ単位にデータ量が均等になるように書
込まれる。Also in the second embodiment, the data is written so that the data amount becomes even in the redundant group unit.

【０１３７】次に実施の形態２におけるデータ書き込み
動作について説明する。図２０は実施の形態２のデータ
格納装置によるデータ書き込み動作を示すフロー図であ
り、この図に沿って図１９に示す冗長グループ構成を有
するディスクに対するデータ書き込み動作について説明
する。先ず、ファイルブロック管理部７は、データ書き
込みを行う冗長グループの順序を決定する（ステップＳ
Ｔ１ｄ）。具体的に説明すると、ファイルブロック管理
部７は、最初に任意の元ディスクを選択する。例えば、
図５の構成で図１９の多重度２の場合の冗長グループ設
定がなされているシステムにおいて、ディスク＃Ｄ１が
選択されたものとする。この次に、ファイルブロック管
理部７は、ディスク＃Ｄ１の冗長グループの中で距離情
報の最も大きいものを１つ選択する。上記システムで
は、ディスク＃Ｄ１の冗長グループはディスク＃Ｄ５，
＃Ｄ７であり、どちらも距離情報が等しい。このため、
どちらを選択しても良い。ここでは、例としてディスク
＃Ｄ５を選択する。Next, the data write operation in the second embodiment will be described. FIG. 20 is a flowchart showing the data write operation of the data storage device of the second embodiment; the data write operation for disks having the redundancy group configuration shown in FIG. 19 is described with reference to this figure. First, the file block management unit 7 determines the order of the redundancy groups to which data is written (step ST1d). Specifically, the file block management unit 7 first selects an arbitrary original disk. For example, assume that disk #D1 is selected in a system with the configuration of FIG. 5 and the redundancy group settings of FIG. 19 for a multiplicity of 2. Next, the file block management unit 7 selects, from the redundancy group of disk #D1, the one disk with the largest distance information. In this system, the redundancy group of disk #D1 consists of disks #D5 and #D7, whose distance information is equal, so either may be selected. Here, disk #D5 is selected as an example.

【０１３８】続いて、ファイルブロック管理部７は、デ
ィスク＃Ｄ５の冗長グループの中で距離情報の最も大き
いものを選択する。つまり、ディスク＃Ｄ５の冗長グル
ープに属するディスク＃Ｄ１，＃Ｄ４のいずれかが選択
される。ここでは、ディスク＃Ｄ１がすでに選択されて
いるため、ディスク＃Ｄ４が選択される。以下、全ての
ディスクに対して同様の操作を繰り返す。これにより、
ファイルブロック管理部７は、データブロックの書き込
みをディスク＃Ｄ１，ディスク＃Ｄ５，ディスク＃Ｄ
４，ディスク＃Ｄ８，ディスク＃Ｄ２，ディスク＃Ｄ
６，ディスク＃Ｄ３，ディスク＃Ｄ７の順序で行うこと
を決定する。Subsequently, the file block management unit 7 selects the disk with the largest distance information from the redundancy group of disk #D5, that is, one of disks #D1 and #D4. Because disk #D1 has already been selected, disk #D4 is selected. The same operation is repeated for all disks. As a result, the file block management unit 7 decides to write data blocks in the order disk #D1, disk #D5, disk #D4, disk #D8, disk #D2, disk #D6, disk #D3, disk #D7.
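The ordering of step ST1d can be sketched as a walk that hops from each disk to a not-yet-visited member of its redundancy group. This is a simplified illustration: the function name `write_order` is hypothetical, the distance comparison is omitted because all members are equidistant in this example, ties are broken by list order, and the group table is partly inferred (only the groups of D1, D5, and D7 are stated in the text).

```python
def write_order(groups: dict[str, list[str]], start: str) -> list[str]:
    """Chain through redundancy groups, visiting each disk once."""
    order = [start]
    current = start
    while len(order) < len(groups):
        unvisited = [d for d in groups[current] if d not in order]
        if not unvisited:
            # Fallback (assumption): restart from any disk not yet ordered.
            unvisited = [d for d in groups if d not in order]
        current = unvisited[0]
        order.append(current)
    return order


# Multiplicity 2 groups (D1, D5, D7 from the text; the rest inferred).
groups2 = {
    "D1": ["D5", "D7"], "D5": ["D1", "D4"], "D4": ["D5", "D8"], "D8": ["D4", "D2"],
    "D2": ["D8", "D6"], "D6": ["D2", "D3"], "D3": ["D6", "D7"], "D7": ["D3", "D1"],
}
order = write_order(groups2, "D1")
```

Starting from D1, the walk reproduces the order stated above: D1, D5, D4, D8, D2, D6, D3, D7.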

【０１３９】次に、ファイルブロック管理部７は、書き
込みデータをブロック単位に分割する（ステップＳＴ２
ｄ）。続いて、ファイルブロック管理部７は、冗長グル
ープに全てのデータブロックの書き込みが完了したか否
かを判定する（ステップＳＴ３ｄ）。このとき、全ての
データブロックの書き込みが完了していれば、データ書
き込み動作を終了する。一方、全てのデータブロックの
書き込みが完了していない場合は、ステップＳＴ４ｄの
処理に移行する。Next, the file block management unit 7 divides the write data into blocks (step ST2d). Subsequently, the file block management unit 7 determines whether all data blocks have been written to the redundancy groups (step ST3d). If all data blocks have been written, the data write operation ends. If not, the process proceeds to step ST4d.

【０１４０】ステップＳＴ４ｄにおいて、ファイルブロ
ック管理部７は、上述のようにして決定した書き込み順
序に従って、先ず元ディスクに対してデータブロックの
書き込みを行う。このあと、ファイルブロック管理部７
は、データ多重度がＭ（Ｍ；２以上の整数）であれば、
元ディスクに書込んだデータブロックと同一のデータブ
ロックを、当該元ディスクが属する冗長グループの（Ｍ
−１）個の複製ディスクに対して書き込む。具体的に説
明すると、ファイルブロック管理部７は、（Ｍ−１）個
の複製ディスクにデータブロックを書き込んだか否かを
判定しながら、当該冗長グループにおいて元ディスクを
除く２＊（Ｍ−１）個のディスクから複製ディスクを選
択してデータブロックを書き込む（ステップＳＴ５
ｄ）。このとき、（Ｍ−１）個の複製ディスクにデータ
ブロックを書き込んでいれば、ステップＳＴ２ｄの処理
に戻って、上述した操作を繰り返す。In step ST4d, the file block management unit 7 first writes a data block to the original disk according to the write order determined as described above. Then, if the data multiplicity is M (M being an integer of 2 or more), the file block management unit 7 writes the same data block to (M-1) duplicate disks of the redundancy group to which the original disk belongs. Specifically, while checking whether the data block has been written to (M-1) duplicate disks, the file block management unit 7 selects duplicate disks from the 2*(M-1) disks of the redundancy group other than the original disk and writes the data block to them (step ST5d). When the data block has been written to (M-1) duplicate disks, the process returns to step ST2d and the above operation is repeated.

【０１４１】一方、（Ｍ−１）個の複製ディスクにデー
タブロックを書き込んでいないと、ファイルブロック管
理部７は、既にブロックを書き込んだディスクを除い
て、元ディスクから最も距離情報の大きいディスクを選
択する（ステップＳＴ６ｄ）。On the other hand, if the data block is not written to the (M-1) duplicated disks, the file block management unit 7 selects the disk having the largest distance information from the original disk, excluding the disk to which the block has already been written. Select (step ST6d).

【０１４２】Next, the file block management unit 7 determines whether more than one disk was selected in step ST6d (step ST7d). If only one disk was selected, the file block management unit 7 proceeds to step ST11d, writes the data block to the selected disk, and returns to step ST5d.

【０１４３】If more than one disk was selected, the file block management unit 7 selects, from among those disks, the disk with the smallest number of written data blocks (write block count) (step ST8d). In other words, disks are selected so that blocks are written evenly across the disks.

【０１４４】The file block management unit 7 then determines whether more than one disk was selected in step ST8d (step ST9d). If only one disk was selected, the file block management unit 7 proceeds to step ST11d. If more than one disk was selected, the file block management unit 7 selects an arbitrary one of the disks selected in step ST8d (step ST10d).

【０１４５】The file block management unit 7 then proceeds to step ST11d and writes the data block to the disk selected in step ST10d. By repeating this operation, the duplicate data blocks are written to (M-1) duplicate disks.
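The selection in steps ST5d to ST11d can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation described in the patent: the function name `choose_duplicate_disks`, the `distance` mapping, and the `written` counters are hypothetical, and the "arbitrary" choice of step ST10d is taken to be the lowest disk identifier only to make the result deterministic.

```python
def choose_duplicate_disks(original, group, distance, written, m):
    """Pick (m - 1) duplicate disks for one data block.

    original: id of the original disk
    group:    ids of all disks in the redundancy group (including original)
    distance: dict mapping (original, disk) -> distance information
    written:  dict mapping disk id -> number of blocks written so far
    m:        data multiplicity (integer >= 2)
    """
    chosen = []
    candidates = [d for d in group if d != original]
    while len(chosen) < m - 1:                       # ST5d
        remaining = [d for d in candidates if d not in chosen]
        if not remaining:
            raise ValueError("not enough disks in the redundancy group")
        # ST6d: keep the disks farthest from the original disk.
        far = max(distance[(original, d)] for d in remaining)
        farthest = [d for d in remaining if distance[(original, d)] == far]
        # ST7d/ST8d: on a tie, prefer the fewest written blocks.
        if len(farthest) > 1:
            least = min(written[d] for d in farthest)
            farthest = [d for d in farthest if written[d] == least]
        # ST9d/ST10d: any remaining disk will do; min() is an assumption.
        target = min(farthest)
        written[target] += 1                         # ST11d: write the block
        chosen.append(target)
    return chosen
```

Updating `written` inside the loop is what lets the tie-break of step ST8d even out the block counts over successive calls.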

【０１４６】FIG. 21 is a diagram showing an example of data block storage in a system configured with the redundancy groups for multiplicity 2 shown in FIG. 19. The data blocks in FIG. 21 were stored according to the flow of FIG. 20 described above.

【０１４７】When writing data, the file block management unit 7 updates the file block management table 7a and sets the read block numbers 7a-2. At this time, the blocks that each disk wrote as an original disk are set as that disk's read blocks.

【０１４８】FIG. 22 is a diagram showing the configuration of FIG. 21 with read blocks set. In the figure, circled data blocks are those set as read blocks. For example, on disk #D1, {B1, B9, B17} are set as read blocks. As shown in FIG. 22, each of the data blocks B1 to B24 that make up the original data is assigned as a read block of one of the disks.

【０１４９】The data transfer operation of the data storage device of Embodiment 2 follows the flow of FIG. 16, as in Embodiment 1, so a duplicate description is omitted.

【０１５０】Next, the operation at the time of a disk failure in Embodiment 2 is described. First, when a disk failure is detected, the node machine management unit 9 corrects the node machine management table 9a, as in Embodiment 1. The file block management unit 7 then corrects the file block management table 7a. This correction of the file block management table 7a is described in detail below.

【０１５１】FIG. 23 is a diagram showing the read block numbers set when a disk fails in the configuration of FIG. 21. In the figure, the vertical lines on disk #D2 indicate that disk #D2 has failed. As in FIG. 22, circled data blocks are read blocks. FIG. 23 circles the blocks that become read blocks after the file block management table 7a has been corrected following the failure of disk #D2. For example, before the failure of disk #D2, the read blocks of disk #D8 were {B6, B14, B22}, as shown in FIG. 22. After the failure of disk #D2, however, the read blocks of disk #D8 are {B5, B6, B13, B21}, as shown in FIG. 23.

【０１５２】FIG. 24 is a flowchart showing the operation of the data storage device of Embodiment 2 at the time of a disk failure; the operation is described in detail with reference to this figure and FIGS. 22 and 23. First, to correct the file block management table 7a, the file block management unit 7 acquires the total number of target disks and the disk number of the failed disk from the node machine management unit 9 (step ST1e). In the example of FIG. 23, the disk number of the failed disk is #D2 and the total number of disks is 7.

【０１５３】Next, the file block management unit 7 refers to the file block management table 7a and acquires the read block numbers 7a-2 of the failed disk (step ST2e). In the example of FIG. 23, the file block management unit 7 acquires the read block numbers {B5, B13, B21} of the failed disk #D2.

【０１５４】The file block management unit 7 then determines whether every data block of the original data is assigned as a read block of some disk (step ST3e). If all blocks are assigned to some disk, the file block management unit 7 ends the process.

【０１５５】If, on the other hand, there is an unassigned block, the file block management unit 7 selects a disk that holds a duplicate of a read block not assigned to any disk (step ST4e). Specifically, the file block management unit 7 selects a disk that stores duplicates of the blocks identified by the read block numbers 7a-2 of the failed disk. In the example of FIG. 23, the disk that stores duplicates of the read blocks of disk #D2 is disk #D8. The file block management unit 7 therefore selects disk #D8 and reassigns its read block numbers 7a-2.

【０１５６】Next, the file block management unit 7 obtains the number of read blocks to be newly assigned to the disk selected in step ST4e (step ST5e). Here, the number of blocks to be newly assigned is obtained by equation (1):

New_B = ceil(Rem_B / (Rem_D + 1))   ... (1)

where New_B is the number of blocks to be newly assigned, Rem_B is the number of blocks whose assignment has not yet been decided, and Rem_D is the number of disks whose read blocks have not yet been reassigned. ceil() denotes rounding up to the next integer.

【０１５７】To distribute the blocks of the original data that are not currently assigned to any disk as read blocks evenly among the other disks, it is necessary to determine the minimum number of blocks that must be newly assigned to the selected disk. Equation (1) is the formula for determining this number of newly assigned blocks.

【０１５８】In the example of FIG. 23, for disk #D8, New_B = ceil(3 / (7 - 1 + 1)) = 1, so one new read block must be assigned. That is, one read block is added to the three read blocks currently assigned, giving a total of four read blocks after reassignment.
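The arithmetic of equation (1) can be checked with a short Python sketch. The function name `new_block_count` is hypothetical; the value Rem_D = 6 is read off the expression (7 - 1 + 1) in the example above, which matches (Rem_D + 1) only when Rem_D = 7 - 1.

```python
import math


def new_block_count(rem_b, rem_d):
    """Equation (1): New_B = ceil(Rem_B / (Rem_D + 1))."""
    return math.ceil(rem_b / (rem_d + 1))


# FIG. 23 example: Rem_B = 3 unassigned blocks, Rem_D = 7 - 1 = 6.
print(new_block_count(3, 6))  # -> 1
```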

【０１５９】Next, the file block management unit 7 gathers all read blocks that are not currently assigned to any disk and, from these together with the read blocks currently assigned to the disk selected in step ST4e, selects as many blocks as the number of newly assigned read blocks obtained in step ST5e (step ST6e). In the example of FIG. 23, the read blocks to be newly set for disk #D8 are chosen from {B5, B13, B21}, which are not currently assigned to any disk as read blocks, and {B6, B14, B22}, which are currently assigned to disk #D8 as read blocks; the number of newly assigned blocks, that is, one read block, is selected.

【０１６０】Accordingly, the read blocks newly set for disk #D8 are {B5, B6, B13, B21}. Based on this information, the file block management unit 7 rewrites the read block numbers 7a-2 in the file block management table 7a (step ST7e).
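One way to read steps ST5e and ST6e so that they reproduce the FIG. 23 result for disk #D8 is sketched below in Python. This is an interpretation, not the patent's stated procedure: the function name `reassign_read_blocks` is hypothetical, and the rule that unassigned blocks are taken before the disk's current read blocks is an assumption chosen because it yields {B5, B6, B13, B21}.

```python
import math


def reassign_read_blocks(current, unassigned, rem_d):
    """Return (new read blocks, released blocks) for one selected disk.

    current:    read blocks currently assigned to the selected disk
    unassigned: blocks with no read-block assignment whose duplicates
                are stored on the selected disk
    rem_d:      Rem_D as used in equation (1)
    """
    new_b = math.ceil(len(unassigned) / (rem_d + 1))    # ST5e, equation (1)
    target = len(current) + new_b                       # block count after ST6e
    # Assumption: unassigned blocks are taken first, then current ones.
    pool = list(unassigned) + list(current)
    selected = pool[:target]
    released = pool[target:]
    return sorted(selected), sorted(released)


# Disk #D8 in FIG. 23 (block Bn written as the integer n).
print(reassign_read_blocks([6, 14, 22], [5, 13, 21], 6))
# -> ([5, 6, 13, 21], [14, 22])
```

Under this reading, the released blocks (here B14 and B22) no longer have a read-block assignment and would be handled when the flow returns to step ST3e.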

【０１６１】The file block management unit 7 then returns to step ST3e and, in the same way, obtains new read block numbers for disks #D4 and #D5 and rewrites the file block management table 7a.

【０１６２】Through the above operation, the read blocks of each disk are reset as shown in FIG. 23 after the failure of disk #D2.

【０１６３】Although the data multiplicity is 2 in the example of FIG. 23, the same procedure can be applied to a data multiplicity of M.

【０１６４】Next, the operation of the data storage device of Embodiment 2 when a failure occurs in an upper layer is described. FIG. 25 is a diagram showing an example of disk storage in which data is distributed across the system of FIG. 5 according to the procedure of Embodiment 2. It shows a situation in which a failure in node machine frame 25-1 has made the data on disks #D1 and #D2 inaccessible.

【０１６５】As shown in FIG. 25, even when failures occur in multiple disks at the same time, reassigning the read block numbers 7a-2 according to the flow of FIG. 24 distributes the load evenly across the disks.

【０１６６】Although the data multiplicity is 2 in FIG. 25, the same procedure can be applied to a data multiplicity of M.

【０１６７】As described above, according to Embodiment 2, when data can no longer be read from one of the disks, the group of data blocks that should have been read from that disk is read evenly from the other disks that hold, in distributed form, data blocks of identical content. Because the read blocks of the failed disk are thus assigned evenly to the non-failed disks, the problem of a particular disk becoming a bottleneck and lowering the throughput of the whole parallel system does not arise when a disk fails. The same effect is obtained when failures occur in multiple disks.

【０１６８】Furthermore, according to Embodiment 2, distance information is defined between disks, and each redundancy group is formed from disks that are unlikely to fail at the same time, with duplicates created within the group. This makes it possible to provide a parallel system in which data remains accessible even when a failure occurs in an upper layer.

【０１６９】In Embodiment 2, the disks of the host machine may also be included in the grouping, and data may be placed on the host machine.

【０１７０】Embodiment 3. Embodiment 3 differs from Embodiment 2 in how data is stored: in Embodiment 3, each disk is divided into two partitions to which data is written.

【０１７１】FIG. 26 is a diagram showing an example of block storage in the data storage device according to Embodiment 3 of the present invention. In the figure, the portions separated by the boundary line drawn inside each cylinder representing a disk are the partitions of that disk. In Embodiment 3, the blocks set as read blocks (circled data in the figure) and the duplicate data of other disks are thus stored in different partitions.

【０１７２】When writing data blocks to each disk, the file block management unit 7 of Embodiment 3 writes the blocks set as read blocks and the duplicate data of other disks alternately, each to its own partition. The remaining operations are the same as in the embodiments described above.
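The two-partition layout of Embodiment 3 can be illustrated with a small Python sketch. The function name `layout_partitions` and the sample block numbers are hypothetical; only the rule that read blocks and duplicates of other disks' data go to different partitions comes from the text.

```python
def layout_partitions(stored_blocks, read_blocks):
    """Split one disk's blocks into a read partition and a duplicate partition."""
    read_set = set(read_blocks)
    # Blocks this disk serves as read blocks stay together in one partition.
    read_partition = [b for b in stored_blocks if b in read_set]
    # Duplicates of other disks' data go to the second partition.
    duplicate_partition = [b for b in stored_blocks if b not in read_set]
    return {"read": read_partition, "duplicate": duplicate_partition}


# Hypothetical disk holding six blocks, three of which are its read blocks.
print(layout_partitions([1, 2, 9, 10, 17, 18], [1, 9, 17]))
# -> {'read': [1, 9, 17], 'duplicate': [2, 10, 18]}
```

Keeping the read blocks contiguous is what allows a sequential transfer to stay within one partition.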

【０１７３】As described above, according to Embodiment 3, when blocks are accessed sequentially for data transfer, the data to be transferred from each disk is grouped in a specific partition. Blocks can therefore be transferred from each disk at high speed, with little degradation of block data transfer performance. In addition, the number of seeks for sequential access to large amounts of data can be reduced.

【０１７４】Although the data multiplicity is 2 in the example of FIG. 26, the same approach can be applied to a data multiplicity of M.

【０１７５】In Embodiment 3 as well, the disks of the host machine may be included in the grouping, and data may be placed on the host machine.

【０１７６】In Embodiments 1 to 3, a reliability level may be set for each redundancy group on the basis of the distance information. That is, when redundancy groups are set, the disks that make up each group are selected so that the groups are classified as bus level, machine level, regional level, and so on, each with a different reliability level. In this way, each redundancy group can handle failures at a different reliability level.

【０１７７】[0177]

[Effects of the Invention] As described above, according to the present invention, data is divided into a plurality of data blocks of equal data amount and distributed in units of data blocks across a plurality of storage devices, the same number of blocks to each; data blocks of identical content are stored redundantly for each predetermined number of storage devices; and the data distributed across the storage devices is read in units of one or more data blocks for each predetermined number of storage devices. As a result, data is stored so that each storage device holds the same amount, and the amount of data transferred from each storage device during data transfer is equal, so the drop in throughput of the whole device can be kept to a minimum. In addition, because the data blocks are stored sequentially for each redundancy group, no special calculation is needed to decide from which storage device in a redundancy group data should be transferred, so processing that transfers all data sequentially from the beginning of the storage devices can run at high speed.

【０１７８】According to the present invention, redundancy groups are formed by grouping storage devices with small mutual interaction, using distance information that is an index of the interaction between storage devices; when data is distributed across the storage devices, it is divided into a plurality of data blocks of equal data amount, and data blocks of identical content are stored redundantly in the storage devices within a redundancy group; and the data distributed across the storage devices is read in units of one or more data blocks for each redundancy group. As a result, data is stored so that each storage device holds the same amount, and the amount of data transferred from each storage device during data transfer is equal, so the drop in throughput of the whole device can be kept to a minimum. It is also possible to provide a highly reliable data storage device that responds flexibly to failures caused by specific interactions between storage devices.

【０１７９】According to the present invention, data blocks are stored in the plurality of storage devices in round-robin fashion, so data can be stored such that each storage device holds the same amount.
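Round-robin placement can be sketched in Python as follows. The function name `round_robin_placement` and the disk ordering D1 to D8 are assumptions for illustration; the actual order in which disks are visited is determined elsewhere in the description.

```python
def round_robin_placement(num_blocks, disks):
    """Assign block B(i+1) to disks[i % len(disks)] for i = 0, 1, 2, ..."""
    placement = {d: [] for d in disks}
    for i in range(num_blocks):
        placement[disks[i % len(disks)]].append(i + 1)
    return placement


disks = ["D%d" % n for n in range(1, 9)]
placement = round_robin_placement(24, disks)
print(placement["D1"])  # -> [1, 9, 17]
```

With 24 blocks and 8 disks every disk receives exactly three blocks, and the first disk receives B1, B9, and B17, which agrees with the read blocks listed for disk #D1 in the description of FIG. 22.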

【０１８０】According to the present invention, the storage device in which each data block is to be stored is determined by hash distribution, so data can be stored such that each storage device holds the same amount.
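The text does not specify the hash function, so the following Python sketch is only one possible form of hash distribution. The function name `hash_placement` and the choice of SHA-256 from the standard `hashlib` module are assumptions.

```python
import hashlib


def hash_placement(block_id, disks):
    """Pick a disk for a block from a hash of its identifier."""
    digest = hashlib.sha256(str(block_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(disks)
    return disks[index]


disks = ["D%d" % n for n in range(1, 9)]
print(hash_placement(1, disks) in disks)  # -> True
```

A generic hash such as this one balances block counts only approximately; an exactly equal distribution would need a hash designed for the block numbering in use.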

【０１８１】According to the present invention, the data blocks that are read at one time from each storage device are placed adjacent to one another within the respective storage area, so data blocks can be transferred from each storage device at high speed with little degradation of transfer performance. In addition, the number of seeks for sequential access to large amounts of data can be reduced.

【０１８２】According to the present invention, when data can no longer be read from one of the storage devices, the group of data blocks that should have been read from that storage device is read evenly from the other storage devices that hold, in distributed form, data blocks of identical content. Because the data blocks that should have been read from the failed storage device are thus read evenly from the non-failed storage devices, the problem of a particular storage device becoming a bottleneck at the time of a failure and lowering the throughput of the whole device can be suppressed.

【０１８３】According to the present invention, the distance information is obtained from the hierarchical structure formed by the plurality of storage devices, so failures in a specific layer of the storage devices can be handled flexibly. It is also possible to build a redundant configuration centered on a specific part of the device.

【０１８４】According to the present invention, the distance information is obtained from information on the position at which each storage device is located, so failures caused by the positional relationship of the storage devices can be handled flexibly.

【０１８５】According to the present invention, the distance information is obtained from a failure rate determined by the connection path between storage devices, so failures caused by the connection relationship of the storage devices can be handled flexibly.

【０１８６】According to the present invention, the multiplicity, which is the degree to which data blocks of identical content are duplicated, is set for each redundancy group, so the reliability of the device can be improved and data transfer can be made faster.

【０１８７】According to the present invention, the reliability level is set for each redundancy group on the basis of the distance information, so each redundancy group can handle failures at a different reliability level.

[Brief description of drawings]

FIG. 1 is a diagram showing a configuration example of a data storage device according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing the configuration of the host machine in FIG. 1.

FIG. 3 is a diagram showing the relationship between the software components of the data storage device according to Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the configuration of the node machine management table in FIG. 3.

FIG. 5 is a diagram showing an example of the hierarchical relationship of the disks constituting the data storage device according to Embodiment 1.

FIG. 6 is a diagram showing another example of the hierarchical relationship of the disks constituting the data storage device according to Embodiment 1.

FIG. 7 is a diagram showing the hierarchical structure of the system as seen from the host machine.

FIG. 8 is a flowchart showing the operation of the data storage device according to Embodiment 1.

FIG. 9 is a diagram showing a configuration of the data storage device according to Embodiment 1 and the failure rates in the hierarchical relationship of the disks.

FIG. 10 is a diagram showing another configuration of the data storage device according to Embodiment 1 and the failure rates in the hierarchical relationship of the disks.

FIG. 11 is a diagram showing the physical positional relationship of disks installed on the same floor.

FIG. 12 is a flowchart showing the data write operation of the data storage device according to Embodiment 1.

FIG. 13 is a diagram showing how blocks written by the write flow of FIG. 12 are stored on each disk.

FIG. 14 is a diagram showing the internal structure of the file block management table in FIG. 3.

FIG. 15 is a diagram showing the read block numbers set when data is written by the data storage device of Embodiment 1.

FIG. 16 is a flowchart showing the data transfer operation of the data storage device according to Embodiment 1.

FIG. 17 is a diagram showing the read block numbers set by the data storage device of Embodiment 1 when a disk fails.

FIG. 18 is a flowchart showing the operation of the data storage device according to Embodiment 2.

FIG. 19 is a diagram showing a correspondence table of the redundancy groups created for the hierarchical relationship in FIG. 5.

FIG. 20 is a flowchart showing the data write operation of the data storage device of Embodiment 2.

FIG. 21 is a diagram showing an example of data block storage in a system configured with the redundancy groups for multiplicity 2 shown in FIG. 19.

FIG. 22 is a diagram showing the configuration of FIG. 21 with read blocks set.

FIG. 23 is a diagram showing the read block numbers set when a disk fails in the configuration of FIG. 21.

FIG. 24 is a flowchart showing the operation of the data storage device of Embodiment 2 when a disk fails.

FIG. 25 is a diagram showing an example of disk storage in which data is distributed across the system of FIG. 5 according to the procedure of Embodiment 2.

FIG. 26 is a diagram showing an example of block storage in the data storage device according to Embodiment 3 of the present invention.

FIG. 27 is a diagram showing an example of data storage using mirroring.

FIG. 28 is a diagram showing an example of data storage in a conventional disk array.

[Explanation of symbols]

1: host machine (data storage device); 2: network; 3-1 to 3-n: node machines; 4: main storage device; 5: operating system; 6: data transfer control unit (reading means); 7: file block management unit (data storage means); 7a: file block management table; 7a-1: disk number; 7a-2: read block number; 8: communication control unit; 9: node machine management unit (data storage means); 9a: node machine management table; 10: central processing unit (data storage means, reading means); 11: network adapter; 12a: system bus; 12b: PCI bus; 13: bus coupling device; 14: disk controller; 15, 16: disk devices (storage devices); 17: operating system; 18: communication control unit; 19: disk access control unit; 20: disk device (storage device); 21: node machine management information; 21a: node machine number; 21b: IP address; 21c: port number; 22: disk information; 22a: disk number; 22b: node machine number; 23: hierarchy information; 23a: lower-layer information; 23b: upper-layer information; 24: distance information; 25, 26: floor frames; 25-1, 25-2, 26, 26-1, 26-2: node machine frames; 27: entire system; 28: host machine layer; 29: host machine frame; 30-1, 30-2, 30-3, 30-4: node machines; 31-1, 31-2: routers; 32: host machine; 33-1, 33-2, 33-3, 33-4: redundancy groups.

Continued from front page. (72) Inventor: Mitsunori Kori, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-terms (reference): 5B018 GA04 HA04 MA14; 5B065 BA01 CA12 CC08 CH13 CH18 EA31

Claims

[Claims]

1. A data storage device comprising: data storage means for dividing data into a plurality of data blocks of equal data amount, distributing the data blocks across a plurality of storage devices so that each storage device receives the same number of blocks, and storing data blocks of identical content redundantly for each predetermined number of the storage devices; and reading means for reading the data distributed and stored in the plurality of storage devices in units of one or more data blocks for each predetermined number of the storage devices.

2. A data storage device comprising: data storage means for forming redundancy groups, each of which groups together storage devices having a small interaction with one another, by using distance information that is an index of the interaction between the storage devices of a plurality of storage devices, and for, when data is stored in a distributed manner in the plurality of storage devices, dividing the data into a plurality of data blocks of equal data amount and storing data blocks of identical content in duplicate in each storage device in the redundancy group; and reading means for reading the data stored in a distributed manner in the plurality of storage devices in units of one or a plurality of data blocks for each of the redundancy groups.

3. A data storage device that forms redundancy groups, each of which groups together storage devices having a small interaction with one another, by using distance information that is an index of the interaction between the storage devices of a plurality of storage devices, and that, when data is stored in a distributed manner in the plurality of storage devices, divides the data into a plurality of data blocks of equal data amount and stores data blocks of identical content in duplicate in each storage device in the redundancy group.

4. The data storage device according to claim 1, wherein the data storage means stores the data blocks in the plurality of storage devices in a round-robin manner.

5. The data storage device according to claim 1, wherein the data storage means determines the storage device in which the data block should be stored by using hash distribution.

6. The data storage device according to claim 1, wherein the data storage means arranges the plurality of data blocks that are read at one time from each storage device adjacently in the storage area of that storage device.

7. The data storage device according to claim 1, wherein, when data cannot be read from any one of the storage devices, the reading means reads the data block group that was to be read from that storage device evenly from the plurality of storage devices in which that data block group is stored in a distributed manner.

8. The data storage device according to claim 3, wherein the distance information is obtained based on a hierarchical structure formed by a plurality of storage devices.

9. The data storage device according to claim 3, wherein the distance information is obtained based on position information indicating where each storage device is located.

10. The data storage device according to claim 3, wherein the distance information is obtained based on a failure rate determined according to a connection path between the storage devices.

11. The data storage device according to claim 3, wherein a multiplicity, which is the degree to which data blocks of identical content are duplicated, is set for each redundancy group.

12. The described data storage device, wherein a reliability level is set for each redundancy group based on the distance information.

13. A data management method comprising: a data storing step of dividing data into a plurality of data blocks of equal data amount, distributing the data blocks, in equal numbers and in data-block units, across a plurality of storage devices, and storing data blocks of identical content in duplicate for every predetermined number of the storage devices; and a reading step of reading the data stored in a distributed manner in the plurality of storage devices in units of one or a plurality of data blocks for each of the predetermined number of storage devices.

14. The data management method according to claim 13, wherein, in the data storing step, redundancy groups, each of which groups together storage devices having a small interaction with one another, are formed by using distance information that is an index of the interaction between the storage devices of the plurality of storage devices, and, when the data is stored in a distributed manner in the plurality of storage devices, the data is divided into a plurality of data blocks of equal data amount and data blocks of identical content are stored in duplicate in the storage devices in the redundancy group.

15. A program for causing a computer to function as: data storage means for dividing data into a plurality of data blocks of equal data amount, distributing the data blocks, in equal numbers and in data-block units, across a plurality of storage devices, and storing data blocks of identical content in duplicate for every predetermined number of the storage devices; and reading means for reading the data stored in a distributed manner in the plurality of storage devices in units of one or a plurality of data blocks for each of the predetermined number of storage devices.

16. A program for causing a computer to function as: data storage means for forming redundancy groups, each of which groups together storage devices having a small interaction with one another, by using distance information that is an index of the interaction between the storage devices of a plurality of storage devices, dividing data into a plurality of data blocks of equal data amount, distributing the data blocks, in equal numbers and in data-block units, across the plurality of storage devices, and storing data blocks of identical content in duplicate in the storage devices in the redundancy group; and reading means for reading the data stored in a distributed manner in the plurality of storage devices in units of one or a plurality of data blocks for each of the redundancy groups.

17. A program for causing a computer to function as data storage means for forming redundancy groups, each of which groups together storage devices having a small interaction with one another, by using distance information that is an index of the interaction between the storage devices of a plurality of storage devices, and for, when data is stored in a distributed manner in the plurality of storage devices, dividing the data into a plurality of data blocks of equal data amount and storing data blocks of identical content in duplicate in each storage device in the redundancy group.
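
The arrangement recited in the claims (equal-size blocks, redundancy groups formed from distance information, round-robin placement, duplicate storage within a group, and reading spread over the surviving copies) can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the specification. The function names, the in-memory dictionaries standing in for disks, the greedy grouping rule, and the reading that a larger distance value means a smaller interaction are all assumptions made for illustration.

```python
from itertools import combinations


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide data into blocks of equal size (the last block is zero-padded)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def form_redundancy_groups(disks, distance, multiplicity):
    """Greedy grouping (assumed rule): each group collects disks that are
    far apart, taking a larger distance to mean a smaller interaction."""
    remaining = list(disks)
    groups = []
    while len(remaining) >= multiplicity:
        group = [remaining.pop(0)]
        while len(group) < multiplicity:
            # Pick the disk whose nearest group member is farthest away.
            best = max(
                remaining,
                key=lambda d: min(distance[frozenset((d, g))] for g in group),
            )
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    return groups


def store(blocks, groups):
    """Place block i in group i mod len(groups) (round robin) and write the
    same block to every disk of that group (duplicate storage)."""
    disks = {d: {} for g in groups for d in g}
    for i, block in enumerate(blocks):
        for d in groups[i % len(groups)]:
            disks[d][i] = block
    return disks


def read(n_blocks, groups, disks, failed=frozenset()):
    """Read each block from the least-loaded surviving disk of its group."""
    load = {d: 0 for d in disks}
    out = []
    for i in range(n_blocks):
        alive = [d for d in groups[i % len(groups)] if d not in failed]
        if not alive:
            raise IOError(f"block {i}: every copy is on a failed disk")
        d = min(alive, key=lambda x: load[x])
        load[d] += 1
        out.append(disks[d][i])
    return b"".join(out), load


# Hypothetical layout: disks A and B sit on node 1, disks C and D on node 2.
node_of = {"A": 1, "B": 1, "C": 2, "D": 2}
distance = {
    frozenset(p): (1 if node_of[p[0]] == node_of[p[1]] else 3)
    for p in combinations(node_of, 2)
}
groups = form_redundancy_groups(list(node_of), distance, multiplicity=2)
blocks = split_blocks(b"abcdefgh", block_size=2)
disks = store(blocks, groups)
data, load = read(len(blocks), groups, disks, failed={"A"})
```

In this example each redundancy group pairs one disk from each node, so losing disk A (or all of node 1) still leaves one copy of every block readable. With a multiplicity of 3 or more, the least-loaded selection in `read` would spread the failed disk's share across the remaining copies.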