JP2010134788A

JP2010134788A - Cluster storage device, and method of controlling same

Info

Publication number: JP2010134788A
Application number: JP2008311475A
Authority: JP
Inventors: Tomotaka Shionoya (塩野谷 友隆); Tetsuya Kamimura (上村 哲也)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-12-05
Filing date: 2008-12-05
Publication date: 2010-06-17

Abstract

PROBLEM TO BE SOLVED: To provide a cluster storage system capable of allowing for different disk capacities and maintaining appropriate redundancy for the number of active disks.

SOLUTION: A cluster storage comprising an information recording controller and three or more storage devices is provided. When storing data, the information recording controller calculates data differences sequentially, and distributes and stores data to a plurality of storage devices. When outputting data, the information recording controller collects data differences from a plurality of storage devices to restore and output the data. Thus, different disk capacities are allowed, and redundancy in accordance with the number of active disks is secured.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a cluster storage apparatus and a method for controlling the same, and, for example, to a technique for ensuring redundancy across a plurality of storage apparatuses.

In recent years, both the volume of digital data and the capacity of HDDs have grown. The growth of digital data makes data protection more important, while larger HDDs, although they make it possible to hold the growing data, increase the risk of loss when a failure occurs.

In addition, terabyte-scale storage is now kept even in homes, and to simplify its maintenance it has become common to apply digital data protection that was previously used for large-scale servers. A representative example of such a protection method is RAID (Redundant Arrays of Inexpensive Disks), which is disclosed in the non-patent literature cited below.

RAID has variations numbered 0 to 6, which differ in redundancy and disk capacity utilization rate. Of these, systems are most commonly built with level 0, 1, or 5, or a combination of them. Levels 0, 1, and 5 are outlined below.

RAID 0, also called striping, has no redundancy, but its disk capacity utilization rate is 100%. RAID 0 works by splitting data into small pieces and distributing them across two or more disks of the same capacity; on a read, the controller joins the pieces and restores the data. As a result, the read/write bandwidth increases in proportion to the number of disks.

RAID 1, also called mirroring, tolerates the failure of all but one of the disks connected to the system (number of disks − 1). On the other hand, its disk capacity utilization rate is 1 / (number of disks in the system) × 100 [%], which is inversely proportional to the number of disks and is the lowest among the RAID variations. RAID 1 works by storing the same data on two or more disks of the same capacity. Depending on the implementation, read bandwidth can also be increased in proportion to the number of disks, as in RAID 0, by reading fragments of the data from multiple disks and reassembling them in the controller.

RAID 5 is block-level distributed parity recording. It tolerates the failure of one disk, and its capacity utilization rate is (number of disks in the system − 1) / (number of disks in the system) × 100 [%]. RAID 5 works by dividing data into blocks, calculating an error correction code (parity) for the blocks, and distributing the result across three or more disks of the same capacity. When one disk fails, the original data can be restored using the parity. Because the data is stored in a distributed manner, reads are fast as in RAID 0, whereas writes incur latency for the parity calculation.
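The utilization rates and failure tolerances quoted above for RAID 0, 1, and 5 can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function names `raid_utilization` and `raid_fault_tolerance` are hypothetical and do not appear in the patent.

```python
def raid_utilization(level: int, disks: int) -> float:
    """Usable fraction of total capacity, assuming equal-size disks."""
    if level == 0 and disks >= 2:
        return 1.0                    # striping: no redundancy
    if level == 1 and disks >= 2:
        return 1.0 / disks            # mirroring: one copy per disk
    if level == 5 and disks >= 3:
        return (disks - 1) / disks    # one disk's worth of parity
    raise ValueError("unsupported level or too few disks")


def raid_fault_tolerance(level: int, disks: int) -> int:
    """Number of simultaneous disk failures the array survives."""
    if level == 0 and disks >= 2:
        return 0
    if level == 1 and disks >= 2:
        return disks - 1
    if level == 5 and disks >= 3:
        return 1
    raise ValueError("unsupported level or too few disks")


if __name__ == "__main__":
    for level, disks in [(0, 2), (1, 2), (5, 4)]:
        print(level, disks, raid_utilization(level, disks),
              raid_fault_tolerance(level, disks))
```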

A case for redundant arrays of inexpensive disks (RAID). (SIGMOD '88.)

In every variation, RAID assumes that all disks have the same capacity. That is, when a disk in a RAID fails, it is desirable to obtain a replacement with the same capacity as the disks used when the system was configured. Because files are managed by LBA (Logical Block Addressing), even if a larger disk is used, the usable area is limited by the smaller disks, and the unused area is wasted.
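As a rough illustration of the waste described above, the sketch below assumes the simple model in which every disk in the array contributes only as much space as the smallest disk. The function name `raid_used_and_wasted` and the gigabyte figures are illustrative assumptions, not values from the patent.

```python
def raid_used_and_wasted(capacities_gb: list[int]) -> tuple[int, int]:
    """Return (raw space the array can use, raw space left unused).

    Assumes each disk contributes only min(capacities_gb), which is the
    constraint the text attributes to LBA-based management.
    """
    smallest = min(capacities_gb)
    used = smallest * len(capacities_gb)
    wasted = sum(capacities_gb) - used
    return used, wasted


if __name__ == "__main__":
    # A failed 500 GB disk replaced by a 1000 GB disk: half of the new
    # disk cannot be used by the array.
    print(raid_used_and_wasted([500, 500, 1000]))
```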

However, because disk capacities have been increasing so rapidly, it is increasingly difficult, or disadvantageous in terms of cost per bit, to obtain a disk of the original capacity when a disk fails. As a result, after a disk failure the entire system is often rebuilt, or part of a large-capacity disk is left unusable so that it can stand in for a small-capacity disk, which costs time or money. In other words, if a best-selling or inexpensive disk (a disk whose capacity differs from the original ones) is added later, the disk capacities in the system become uneven. Even in such a case, it is desirable to operate the system as a RAID while making full use of the capacity of each disk.

Furthermore, by its nature RAID cannot ensure redundancy once a disk has failed and the number of disks has decreased. This drawback is fatal in a home environment, where the disk failure rate is higher than in a data center and recovery takes longer. In other words, RAID is not well suited to home systems.

The present invention has been made in view of this situation, and provides a technique for a storage system that allows disks of different capacities while ensuring data redundancy even when a disk fails.

To solve the above problem, the present invention provides a cluster storage apparatus comprising three or more storage devices and an information recording/reproduction control device that stores data provided from an external device in any of the three or more storage devices and controls access to the data stored in any of them, in which access interpretation means performs data storage and access in units of objects. More specifically, the access interpretation means assigns an ID to each item of object-unit data and manages it, receives an access request for data stored in the three or more storage devices, and obtains the ID corresponding to the data to be accessed in order to carry out the access.

In the information recording/reproduction control device, when difference information generation means receives update data produced by updating original data stored in any of the three or more storage devices, it takes the difference between the update data and the original data and generates difference information. The access interpretation means then assigns the difference information the same ID as the original data and stores it in any of the three or more storage devices.

In the information recording/reproduction control device, original data search means further searches for the original data on the basis of the ID corresponding to the data to be accessed, and difference information search means searches for the difference information on the basis of the same ID. Restoration means then restores the object-unit update data using the original data and the difference information obtained by these searches.

The access interpretation means also stores the object-unit data to which an ID has been assigned in at least two of the three or more storage devices, thereby achieving a redundancy of 2 or more.

Furthermore, the access interpretation means stores one item of original data together with all difference information related to it in at least one of the three or more storage devices. Specifically, original flag assignment means assigns an original flag, indicating original data, to the original data that is to be stored together in one storage device, and the access interpretation means stores the original data bearing the original flag and all related difference information together in that one storage device. As for operation at access time, when the access interpretation means receives an access request, it obtains the ID of the corresponding object-unit data. The original data search means, on the basis of that ID, searches preferentially in the storage device that stores the original data bearing the original flag. The difference information search means, on the basis of the same ID, retrieves all related difference information from that storage device. The restoration means then restores the object-unit data using the original data bearing the original flag and all related difference information.

According to another aspect of the present invention, in a cluster storage apparatus comprising three or more storage devices and an information recording/reproduction control device that stores data provided from an external device in any of the three or more storage devices and controls access to the data stored in any of them, failure detection means detects a failure of a storage device; inspection means searches the data stored in the operable storage devices other than the failed one and checks whether a predetermined redundancy is secured; and, if the predetermined redundancy is not secured, data storage means stores the missing data in the operable storage devices so that the predetermined redundancy is secured.
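The failure-handling aspect above (detect a failure, inspect the surviving devices, store the missing copies) can be sketched as follows. This is a minimal illustration under stated assumptions: the placement map, the unit-sized objects, the "most free space first" choice, and the name `restore_redundancy` are all hypothetical and are not specified by the patent.

```python
def restore_redundancy(placement: dict[int, set[str]],
                       alive: set[str],
                       free_units: dict[str, int],
                       copies: int = 2) -> dict[int, set[str]]:
    """Re-replicate objects that lost a copy when a device failed.

    placement  : object ID -> names of devices holding a copy
    alive      : names of devices still operable
    free_units : device name -> free space, in object-sized units
                 (updated in place as copies are added)
    copies     : required number of copies per object
    """
    repaired = {}
    for obj_id, holders in placement.items():
        live = set(holders) & alive          # copies that survived
        if not live:
            raise RuntimeError(f"object {obj_id} has no surviving copy")
        while len(live) < copies:
            candidates = [d for d in sorted(alive - live)
                          if free_units[d] > 0]
            if not candidates:
                break                        # cannot reach the target
            target = max(candidates, key=lambda d: free_units[d])
            free_units[target] -= 1          # "store the missing data"
            live.add(target)
        repaired[obj_id] = live
    return repaired


if __name__ == "__main__":
    placement = {1: {"A", "B"}, 2: {"B", "C"}}
    # Device B has failed; A and C remain operable.
    print(restore_redundancy(placement, {"A", "C"}, {"A": 5, "C": 5}))
```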

According to yet another aspect of the present invention, in a cluster storage apparatus comprising three or more storage devices and an information recording/reproduction control device that stores data provided from an external device in any of the three or more storage devices and controls access to the data stored in any of them, failure detection means detects a failure of a storage device; inspection means searches the data stored in the operable storage devices other than the failed one and checks whether a predetermined redundancy is secured; and, if the predetermined redundancy is not secured, data organizing means deletes part of the data stored in the operable storage devices so as to lower the predetermined redundancy.

Further features of the present invention will become apparent from the best mode for carrying out the invention described below and from the accompanying drawings.

According to the cluster storage apparatus of the present invention, different disk capacities can be allowed, and redundancy according to the number of operating disks can be ensured. Because disks of different capacities may be used, the system can be repaired with whichever disk has the lowest cost per bit, and disk repair itself is needed as rarely as possible, so an inexpensive system can be provided.

Embodiments of the present invention are described below with reference to the accompanying drawings. Note, however, that these embodiments are merely examples for realizing the present invention and do not limit its technical scope. Components common to the drawings are given the same reference numerals.

1) First Embodiment
The first embodiment concerns the basic system outline and redundancy. One important feature of the present invention is that, by recognizing data as a single "meaningful unit (object)", it allows disks of different capacities to be mixed, which conventional RAID schemes could not do. That is, whereas RAID divides data in the LBA space into unit capacities (blocks/bits) and adds redundancy to them, the first embodiment uses means such as a database to recognize each unit of data as an object, recognizes difference information as objects as well, and secures redundancy by storing multiple copies of each.

<Configuration of cluster storage system>
FIG. 1 is a diagram showing a schematic configuration of a cluster storage system 1 according to the first embodiment of the present invention. The cluster storage system 1 comprises a client device 10 that creates, changes, or plays back objects; three or more storage devices 14 that store and hold objects; and a virtualizer device 12 that virtualizes the storage devices 14 as a single virtual storage. Here, the client device 10 and the virtualizer device 12 are connected via a network 2.

The storage means that the virtualizer device 12 forms logically by virtualizing a plurality of storage devices 14 is called a cluster 16. Taking RAID 5 as the baseline, three or more disk devices (which may differ in capacity) are required in order to maintain the same redundancy before and after a disk fails.

The client device 10 is a device that includes a user interface 20, a CPU 22, a memory 24, and a communication interface 26.

The user interface 20 consists of an input device 202 such as a keyboard and an output device 204 such as a display. It is the device through which the user enters commands and through which objects are output to the user. An object here means data digitized in an appropriate format.

The CPU 22 is a device that reads and executes the means stored in the memory 24 (each of which is realized as a program). Hereinafter, the operation of loading a means from the memory 24 into the CPU 22 and having the CPU 22 process it is expressed simply as "the CPU 22 executes". The memory 24 is a device that stores the processing means used by the device and temporarily holds information.

Next, the means held in the memory 24 of the client device 10 are described. The object creation means 500 generates an object by digitizing, in an appropriate format, the information that the user enters with the input device 202.

The object update means 502 is the means by which the user, using the input device 202 and the output device 204, changes or updates an existing object and thereby alters its state.

Depending on its intended use, the client device 10 may have object playback means for outputting objects to the user via the output device 204, and it may lack object creation, object update, or both.

The object management means 504 is the means for storing objects in, or accessing objects in, the cluster 16 via the communication interface 26 (it keeps track of the fact that the range from one address to another address constitutes an object). When the virtualizer device 12 virtualizes the storage devices 14 as block storage accessible by logical block addressing, the object management means 504 is a general file system such as NTFS or ext3. When the storage is virtualized as network-attached storage operating with a protocol that allows objects to be accessed over the network 2, the object management means 504 is a protocol interpretation means such as NFS. When the storage is virtualized as an object-based storage device built on search or the like, the object management means 504 may be a means that dynamically generates search queries according to the protocol.

The communication interface 26 is a device for transmitting commands and information to other devices connected to the network 2 and for receiving commands and information from them.

The storage device 14 includes a communication interface 26 and storage means 28. The storage means 28 stores information received via the communication interface 26 in accordance with commands. Here, the storage device 14 is block storage that stores information in response to commands from other devices connected to the network 2. The storage device 14 may also be implemented so that it can manage objects itself, by incorporating a CPU 22, a memory 24, and the object management means 504.

The virtualizer device 12 is a device consisting of a communication interface 26, a CPU 22, and a memory 24. The means held in the memory 24 of the virtualizer device 12 are described below.

The access interpretation means 520 is a kind of file system and has the function of interpreting, according to the communication protocol, which part of the data sent from the client device (host) 10 is an object. In other words, the access interpretation means 520 converts a request from the client device 10, received via the network 2, to store an object or to access an object into information that the object management means 504 can understand. Put another way, the access interpretation means 520 is the means by which the virtualizer device 12 determines how to access the cluster 16, and it is one of the most important means in this embodiment.

An object is managed as two separate parts: a base part created by the object creation means 500, and difference information, which is the set of changes made to the object after the base part was created. The base part and the difference information are managed under a base ID that is unique within the cluster 16. That is, a base part and its related difference information share the same base ID. Because objects are managed by base ID in this way, they are not bound to the LBA space, and disk devices of different capacities can be used.

The access interpretation means 520 is implemented as a means that maps the pointers to objects used by each supported access method into the base ID space, for example by using database means. The base ID management means 522 ensures that base IDs do not collide.

In this embodiment, objects within the cluster 16 are managed by the base ID management means 522, the base part search means 524, the difference information search means 526, the difference information generation means 528, the difference information confirmation means 5210, the object restoration means 5212, and the storage determination means 5214. Each of these functions is described below.

The base ID management means 522 guarantees the uniqueness of base IDs within the cluster 16.
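One way to picture the mapping from access-method pointers into the base ID space, with uniqueness guaranteed by a single allocator, is the following minimal sketch. The class `BaseIdAllocator`, the use of path strings as pointers, and the in-memory dictionary standing in for the database means are assumptions made for illustration only.

```python
import itertools


class BaseIdAllocator:
    """Hands out base IDs that are unique within one cluster (sketch)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)      # monotonically increasing
        self._by_pointer: dict[str, int] = {}   # stand-in for a database

    def id_for(self, pointer: str) -> int:
        """Return the base ID for an object pointer, allocating if new."""
        if pointer not in self._by_pointer:
            self._by_pointer[pointer] = next(self._next_id)
        return self._by_pointer[pointer]


if __name__ == "__main__":
    ids = BaseIdAllocator()
    print(ids.id_for("/photos/a.jpg"))   # new object: fresh ID
    print(ids.id_for("/photos/b.jpg"))   # another object: different ID
    print(ids.id_for("/photos/a.jpg"))   # same object: same ID again
```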

The base part search means 524 and the difference information search means 526 search the storage devices 14 for the base part and the difference information, respectively, on the basis of the base ID into which the access interpretation means 520 has converted the access from the client device 10.

The difference information generation means 528 generates the difference between two objects, and the object restoration means 5212 restores an object from the original object and the difference information generated by the difference information generation means 528.

When the client device 10 changes an object using the object update means 502 and saves it to the cluster 16, the difference information generation means 528 generates difference information, which is stored in a storage device 14 separately from the object. This operation is performed every time the object is updated. Consequently, the latest object can be restored by gathering the base part and all the difference information and then having the object restoration means 5212 apply the differences one after another. In addition, the entire history of changes to the object can be traced, so earlier states can be restored as well.

The difference information confirmation means 5210 confirms that the base part and all of its difference information have been gathered (that is, it checks that no difference information is missing).

The storage determination means 5214 decides which storage device 14 will store a base part or difference information, in accordance with a policy set by the user or the system administrator. The policy specifies the criterion for selecting a storage device 14, such as the capacity load of the storage means 28 or the access frequency of the storage device 14, and the number of storage device failures that the cluster 16 must tolerate, that is, the redundancy. The storage determination means 5214 places an object's difference information in a storage device 14 that has room for it. Thus, even when storage means 28 of different capacities are mixed, the storage determination means 5214 can allocate data in proportion to capacity, and as a result a cluster storage system 1 can be provided that secures redundancy while allowing storage devices 14 of different capacities.

In this embodiment, the base part and the difference information are each stored in two storage devices, so that the cluster 16 tolerates the failure of one storage device 14. To increase the redundancy, the number of copies of the base part and the difference information is simply increased further. The number of storage devices 14 required is the redundancy + 1. With three storage devices, the base part and the difference information are each stored in two storage devices as described above; when more storage devices are available and higher redundancy is wanted, the base part and the difference information may each be stored in three or more storage devices.
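The placement decision can be sketched as a small policy function: pick "redundancy + 1" devices, preferring those with the most free capacity. Only the capacity-based criterion is shown; the function name `choose_devices`, the gigabyte units, and the tie-breaking by device name are illustrative assumptions, not details from the patent.

```python
def choose_devices(free_gb: dict[str, float],
                   size_gb: float,
                   fault_tolerance: int = 1) -> list[str]:
    """Pick the devices that will each hold one copy of an item.

    free_gb         : device name -> free capacity
    size_gb         : size of the base part or difference information
    fault_tolerance : number of device failures to survive;
                      the item is stored fault_tolerance + 1 times
    """
    copies = fault_tolerance + 1
    fits = [name for name, free in free_gb.items() if free >= size_gb]
    if len(fits) < copies:
        raise ValueError("not enough devices with sufficient free space")
    # Most free space first; device name breaks ties deterministically.
    fits.sort(key=lambda name: (-free_gb[name], name))
    return fits[:copies]


if __name__ == "__main__":
    free = {"A": 100.0, "B": 500.0, "C": 250.0}
    print(choose_devices(free, 50.0))                     # two copies
    print(choose_devices(free, 50.0, fault_tolerance=2))  # three copies
```

Because selection depends on free space rather than on a fixed stripe layout, larger devices naturally receive more data, which is how mixed capacities are accommodated in this sketch.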

The above is an example of the system configuration of the first embodiment of the present invention. The present invention is not limited to the configuration shown in FIG. 1. The client device 10 may be connected directly to the virtualizer device 12, with the virtualizer device 12 connected to each storage device 14 through a separate communication interface 26 of the virtualizer device 12. Alternatively, the devices and means of the virtualizer device 12 may be incorporated into a storage device 14, so that a storage device 14 takes the place of the virtualizer device 12. Furthermore, the devices and means of the virtualizer device 12 may be held by the client device 10, or shared between the client device 10 and the storage devices 14, so that the client device 10 connects directly to the storage devices 14 and performs the virtualization.

<Information storage format>
The storage format of the base part and the difference information in the storage device 14 is described below.

FIG. 2 is a diagram showing an example of the storage format of the base part 62. The base part 62 is stored in the storage device 14 as an initial object 72 with a base ID 70 attached. The base ID 70 is an integer value assigned to each object, and the base ID management means 522 guarantees that it is unique within the cluster. The storage device 14 uses this base ID to recognize a data sequence as a single object. The initial object 72 is the object that the client device 10 generated with the object creation means 500.

The method of storing the base part 62 is not limited to the above. Instead of attaching the base ID 70 directly to the initial object 72, the base ID may be managed separately, for example by database means. Also, instead of being associated directly with the initial object 72, the base ID may be stored together with, for example, the first and last LBAs of the region in which the object is stored.

FIG. 3 shows an example of the storage format of the difference information 64. The difference information 64 is stored in the storage device 14 with the base ID 70 and a version number 74 attached to an object difference 76. The base ID 70 is attached so that the object difference 76, which the difference information generation means 528 detects from the changes made by the object update means 502, is associated with the base unit 62 of the same object. The version number 74 is an integer that records how many times the object has been updated: the base unit 62 counts as 0, and the number increases to 1, 2, ... in the order of the updates.

The object difference 76 is information that the difference information generation means 528 generates from the original object and the updated object. It may be the difference between the initial object 72 and the latest object, or the difference between the immediately preceding version and the latest object. The former saves the storage space needed for the object. The latter makes it possible to retrieve the update history of the object step by step. This embodiment uses the latter scheme, in which an object difference 76 is recorded for each update, and the description below assumes it.

また、ベース部６２及び差分情報６４に付与されたベースＩＤ７０またはバージョン番号７４は、新規オブジェクト７２及びオブジェクト差分７６に付与する形ではなく、データベース手段などによって別途管理してもよい。 Further, the base ID 70 or the version number 74 given to the base unit 62 and the difference information 64 may be separately managed by a database unit or the like instead of being given to the new object 72 and the object difference 76.
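The two storage formats described above can be pictured as simple records. The following is a minimal sketch, not the format defined by the patent: the class names, field names, and the use of `bytes` for the object and the object difference are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePart:
    """Base unit 62: the base ID 70 attached to the initial object 72."""
    base_id: int            # base ID 70, unique within the cluster
    initial_object: bytes   # initial object 72 (payload type is an assumption)


@dataclass(frozen=True)
class DiffInfo:
    """Difference information 64: base ID 70 and version number 74 attached to an object difference 76."""
    base_id: int        # ties the difference to the base unit of the same object
    version: int        # version number 74; the base unit counts as version 0
    object_diff: bytes  # object difference 76 (encoding is an assumption)


# One object with a single recorded update.
base = BasePart(base_id=7, initial_object=b"hello")
first_update = DiffInfo(base_id=base.base_id, version=1, object_diff=b"...")
```

Keeping the base ID in both records is what lets a storage device answer a query by base ID alone, without knowing which other devices hold the remaining pieces.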

When an object is restored, all version numbers 74 present in the cluster 16 for the base ID 70 obtained by the access interpretation means 520 are collected, and the object is restored only after confirming that every piece of difference information 64 up to the latest one is available.
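The completeness check described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the newest version number reported by any responding device is the latest version of the object; versions held only on devices that do not respond cannot be seen by such a check.

```python
def missing_versions(found_versions):
    """Return the version numbers absent between 1 and the newest version found."""
    found = set(found_versions)
    if not found:
        return []  # only the base unit (version 0) exists
    latest = max(found)
    return [v for v in range(1, latest + 1) if v not in found]


def can_restore_latest(found_versions):
    """True when every difference from version 1 up to the newest one is available."""
    return not missing_versions(found_versions)
```

For example, version numbers 1 and 3 collected from the cluster leave version 2 missing, so the latest state cannot be rebuilt from sequential differences.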

<Operation sequence>
An example of the operating procedure of the client device 10 and the cluster 16 is described below. FIG. 4 shows an example of the operation sequence of the client device 10 and the cluster 16 in the first embodiment.

図４において、Ｃはクライアント装置１０を示し、Ｖはバーチャライザ装置１２を示す。また、Ｓ１、Ｓ２、及びＳ３はそれぞれストレージ装置１４を表している。なお、図４は、上から順に時系列（８０、８２、８４、８６）にそれぞれの処理を表している。また、９０はストレージ装置１４の障害発生を表している。本実施形態における障害９０は一定期間で復帰可能な状態、たとえば通信の切断や電源断と定義する。 In FIG. 4, C indicates the client device 10, and V indicates the virtualizer device 12. S1, S2, and S3 represent the storage apparatus 14, respectively. FIG. 4 shows the respective processes in time series (80, 82, 84, 86) in order from the top. Reference numeral 90 denotes a failure occurrence of the storage apparatus 14. The failure 90 in the present embodiment is defined as a state that can be recovered within a certain period of time, for example, communication disconnection or power interruption.

さらに、図４において、点線矩形で囲われた手順はオブジェクトの新規作成手順８０とストレージ装置１４の障害９０発生時におけるオブジェクトへのアクセス手順８２とストレージ装置１４の障害９０発生時におけるオブジェクト更新手順８４とストレージ装置１４の障害９０発生時におけるオブジェクト復元を伴うオブジェクトへのアクセス手順８６とをそれぞれ表している。以下、オブジェクトの新規作成手順８０より順に説明する。 Further, in FIG. 4, the procedure surrounded by a dotted rectangle is a new object creation procedure 80, an object access procedure 82 when a failure 90 occurs in the storage device 14, and an object update procedure 84 when a failure 90 occurs in the storage device 14. And an object access procedure 86 accompanied by object restoration when a failure 90 occurs in the storage apparatus 14. Hereinafter, the new object creation procedure 80 will be described in order.

ｉ）オブジェクトの新規作成手順８０において、クライアント装置１０は、オブジェクト作成手段５００を用いて新規オブジェクト６０１を新規作成する。そして、オブジェクト管理手段５０４により、生成した新規オブジェクト６０１の保存要求を発行し、通信インターフェース２６を介してクラスタ１６に送信する。この保存要求は、バーチャライザ装置１２によって受信される。そして、新規オブジェクトに対して、アクセス解釈手段５２０によってベースＩＤ７０が割り当てられ、新規オブジェクト６０１とベースＩＤ７０を合わせてベース部６２が生成される。 i) In the new object creation procedure 80, the client device 10 creates a new object 601 using the object creation means 500. Then, the object management unit 504 issues a save request for the generated new object 601 and transmits it to the cluster 16 via the communication interface 26. This storage request is received by the virtualizer device 12. Then, the base ID 70 is assigned to the new object by the access interpretation means 520, and the base unit 62 is generated by combining the new object 601 and the base ID 70.

The virtualizer device 12 then uses the save-destination storage determination means 5214 to select two storage devices 14 (here, S1 and S2), and sends the base unit 62 to each of them for storage.

ii）次に、ベース部６２を保存しているストレージ装置１４に障害が発生した時におけるオブジェクトへのアクセス手順８２を説明する。 ii) Next, an object access procedure 82 when a failure occurs in the storage apparatus 14 storing the base unit 62 will be described.

クライアント装置１０は、オブジェクト管理手段５０４を用いて、ユーザが要求するオブジェクト６０１へのアクセス要求を発行（例えば、ユーザによるファイル名のクリックより発行される）し、通信インターフェース２６を介してクラスタ１６に送信する。バーチャライザ装置１２は、このアクセス要求を受信し、オブジェクトアクセス応答動作８２ｖを開始する。 The client device 10 uses the object management unit 504 to issue an access request to the object 601 requested by the user (for example, issued when the user clicks a file name), and sends it to the cluster 16 via the communication interface 26. Send. The virtualizer device 12 receives this access request and starts an object access response operation 82v.

FIG. 5 is a flowchart that explains in detail the object access response operation 82v performed by the virtualizer device 12 in response to an object access request from the client device 10. The step labels given in parentheses at the end of the sentences below refer to the corresponding steps in the figure.

First, the virtualizer device 12 identifies the base ID 70 using the access interpretation means 520. Based on the identified base ID 70, it then uses the base unit search means 524 to search each storage device 14 under its management for the base unit that was requested (S822).

上記操作によりベース部６２が発見できなかった場合は、アクセス解釈手段５２０は、オブジェクト６０１が存在しないことをクライアント１０に通知し、オブジェクトアクセス動作８２ｖを終了する（Ｓ８２２１でＮｏ→Ｓ８２６→Ｓ８２１４）。 If the base unit 62 cannot be found by the above operation, the access interpretation unit 520 notifies the client 10 that the object 601 does not exist, and ends the object access operation 82v (No in step S8221 → S826 → S8214).

If the base unit 62 is found, the difference information search means 526 searches every storage device 14 that can respond (S1 and S3) for difference information 64 carrying the same base ID 70 as the base unit 62 that was found (Yes in S8221).

The virtualizer device 12 then uses the difference information confirmation means 5210 to check, from the difference information 64 returned by the storage devices 14, whether all the difference information 64 needed to restore the object is present (S824).

確認の結果、全ての差分情報６４が存在しており、オブジェクト６０１が復元可能と判断されたならば、オブジェクト復元手段５２１２は、オブジェクト６０１を復元する（Ｓ８２４１でＹｅｓ）。 If it is determined that all the difference information 64 exists and the object 601 can be restored, the object restoration unit 5212 restores the object 601 (Yes in S8241).

In this case there is no difference information 64, only the base unit 62. The object restoration means 5212 therefore removes the base ID 70 from the base unit 62, the result is sent to the client device 10 as the object 601, and the object access operation 82v ends (S828 → S8212 → S8214).

If some of the difference information needed for restoration is missing and the object cannot be restored to its latest state, the client is notified of the object that can be restored, and the object access operation ends (No in S8241 → S8210).

The operation 8210 taken when restoration is not possible is determined by the user policy set for the cluster 16. For example, instead of presenting the restorable object, the virtualizer device 12 may report an object access failure (826) and end the object access operation.
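The access response flow of FIG. 5 can be sketched as a single function. This is an illustrative sketch only: the in-memory dictionaries standing in for storage devices, the function name, and the return values are assumptions. It also assumes sequential differences, so that when a gap exists the newest restorable version is the one just before the first missing difference.

```python
def respond_to_access(base_id, reachable_devices):
    """Each device is a dict: {"bases": {base_id: object}, "diffs": {(base_id, version): diff}}."""
    base = None
    for device in reachable_devices:                 # base unit search (S822)
        if base_id in device["bases"]:
            base = device["bases"][base_id]
            break
    if base is None:
        return ("not_found", None)                   # No in S8221

    diffs = {}
    for device in reachable_devices:                 # difference information search
        for (found_id, version), diff in device["diffs"].items():
            if found_id == base_id:
                diffs[version] = diff

    latest = max(diffs, default=0)                   # completeness check (S824)
    missing = [v for v in range(1, latest + 1) if v not in diffs]
    if missing:
        return ("partial", min(missing) - 1)         # No in S8241: newest restorable version
    return ("complete", (base, [diffs[v] for v in range(1, latest + 1)]))
```

Whether a `"partial"` result is offered to the user or turned into an access failure is the policy decision the text describes, so the sketch leaves that choice to the caller.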

iii）続いて、ベース部６２を保存しているストレージ装置１４（Ｓ２）障害発生時におけるオブジェクト６０１の更新手順８４について説明する。クライアント装置１０は、オブジェクト更新手段５０２を用いて、クラスタ１６より受信したオブジェクト６０１を更新する。そして、オブジェクト管理手段５０４により、更新したオブジェクト６０２の保存要求をクラスタ１６に発行する。バーチャライザ装置１２は、クライアント装置１０からの保存要求を受信し、オブジェクト保存動作８４ｖを開始する。 iii) Next, the update procedure 84 of the object 601 when a failure occurs in the storage device 14 (S2) storing the base unit 62 will be described. The client device 10 uses the object update unit 502 to update the object 601 received from the cluster 16. Then, the object management unit 504 issues a save request for the updated object 602 to the cluster 16. The virtualizer device 12 receives the save request from the client device 10 and starts an object save operation 84v.

図６は、クライアント装置１０のオブジェクト保存要求に対するバーチャライザ装置１２によるオブジェクト保存動作８４ｖを詳細に説明するためのフローチャートである。 FIG. 6 is a flowchart for explaining in detail the object saving operation 84v by the virtualizer device 12 in response to the object saving request of the client device 10.

バーチャライザ装置１２は、アクセス解釈手段５２０を用いて更新されたオブジェクト６０２の基となるオブジェクト６０１のベースＩＤ７０を特定し、特定したベースＩＤ７０を用いてストレージ装置１４よりオブジェクト６０１を復元する。十分なメモリ２４を有している様態においては、クライアント装置１０からのオブジェクトアクセス８２時に復元したオブジェクト６０１を記憶しておくなどして既存オブジェクト６０１を得てもよい（Ｓ８４１）。 The virtualizer device 12 specifies the base ID 70 of the object 601 that is the basis of the updated object 602 by using the access interpretation unit 520, and restores the object 601 from the storage device 14 using the specified base ID 70. In an aspect having sufficient memory 24, the existing object 601 may be obtained by storing the object 601 restored at the time of object access 82 from the client device 10 (S841).

既存オブジェクト６０１を獲得できなかった場合、オブジェクト新規作成手順８０が実行され、オブジェクト６０２が新規オブジェクト７２として新規保存され、オブジェクト保存動作８４ｖが終了する（Ｓ８４１１でＮｏ→Ｓ８０→Ｓ８４６）。 If the existing object 601 could not be acquired, the new object creation procedure 80 is executed, the object 602 is newly saved as the new object 72, and the object saving operation 84v ends (No in S8411 → S80 → S846).

If the existing object 601 is obtained, the virtualizer device 12 uses the object difference information generation means 528 to generate the object difference (not shown) between the existing object 601 and the updated object 602 received from the client device 10. It then generates the difference information 64 by attaching to the object difference the base ID 70 identified by the access interpretation means 520 and the latest version number 74 (1 in this case) (Yes in S8411 → S842).

The virtualizer device 12 then uses the save-destination storage determination means 5214 to select two storage devices 14 (S8440).

In this embodiment, the available devices S1 and S3 are selected, the difference information 64 is stored on each of them, and the object saving operation 84v ends (S8441 → S8442 → S846).
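The save operation of FIG. 6 can be sketched as follows. The patent does not specify how the object difference is encoded; the prefix/suffix trimming used by `make_diff` is an assumption chosen for brevity, as are the function names, the use of plain lists as storage devices, and the error raised when fewer than two devices respond.

```python
def make_diff(old, new):
    """Return (start, removed_len, inserted) describing how to turn old into new.

    Assumed encoding: the common prefix and suffix are trimmed and only the
    differing middle section is recorded.
    """
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    end = 0
    while end < limit - start and old[-1 - end] == new[-1 - end]:
        end += 1
    return (start, len(old) - start - end, new[start:len(new) - end])


def save_update(base_id, old, new, latest_version, available_devices):
    """Build difference information and store it on two reachable devices."""
    if len(available_devices) < 2:
        raise RuntimeError("two reachable storage devices are needed for redundancy")
    record = {
        "base_id": base_id,                # base ID 70
        "version": latest_version + 1,     # version number 74
        "diff": make_diff(old, new),       # object difference 76
    }
    for device in available_devices[:2]:   # each device is modelled as a list
        device.append(record)
    return record
```

With S2 unavailable, passing the lists for S1 and S3 as `available_devices` mirrors the situation in the text: the first update is stored as version 1 on both.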

iv) Finally, the object access procedure 86 is described. This procedure involves restoration and takes place when a failure occurs in the storage device 14 (S1) that stores the base unit 62 and the difference information 64.

The client device 10 uses the object management means 504 to issue an access request for the object 602 requested by the user and sends it to the cluster 16 via the communication interface 26. On receiving this access request, the virtualizer device 12 identifies the base ID 70 using the access interpretation means 520. Based on this base ID 70, it then uses the base unit search means 524 and the difference information search means 526 to issue access requests for the base unit 62 and the difference information 64. These requests are sent to every storage device 14 in the cluster 16 that can respond (S2 and S3).

ベース部６２及び差分情報６４へのアクセス要求を受信した各ストレージ装置１４は、バーチャライザ装置１２の要求する情報を送信する。バーチャライザ装置１２は、ストレージ装置１４から受信した各情報を、差分情報確認手段５２１０を用いて解析する。本実施形態においては、最新の差分情報６４はバージョン１であり、またバージョン１までのすべての差分情報６４が集約できている。差分情報確認手段５２１０は、最新差分情報６４までの全ての差分情報６４が揃っていることを確認した後、オブジェクト復元手段５２１２は、取得した差分情報６４から最新状態のオブジェクト６０２を復元する。そして、復元したオブジェクト６０２はクライアント装置１０へ送信され、復元を伴うオブジェクトへのアクセス手順８６が完了する。 Each storage device 14 that has received an access request to the base unit 62 and the difference information 64 transmits information requested by the virtualizer device 12. The virtualizer device 12 analyzes each piece of information received from the storage device 14 using the difference information confirmation unit 5210. In the present embodiment, the latest difference information 64 is version 1, and all the difference information 64 up to version 1 can be aggregated. After the difference information confirmation unit 5210 confirms that all the difference information 64 up to the latest difference information 64 is available, the object restoration unit 5212 restores the object 602 in the latest state from the acquired difference information 64. Then, the restored object 602 is transmitted to the client device 10, and the access procedure 86 to the object accompanying the restoration is completed.
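Restoring the latest object from the base unit and sequential differences can be sketched as follows. The difference encoding `(start, removed_len, inserted)` is an assumption made for illustration, not the patent's format, and the function names are hypothetical.

```python
def apply_diff(old, diff):
    """Apply one (start, removed_len, inserted) difference to a byte string."""
    start, removed_len, inserted = diff
    return old[:start] + inserted + old[start + removed_len:]


def restore_latest(initial_object, diffs_by_version):
    """Rebuild the newest version from the initial object and a {version: diff} mapping."""
    latest = max(diffs_by_version, default=0)
    missing = [v for v in range(1, latest + 1) if v not in diffs_by_version]
    if missing:
        raise ValueError(f"difference information missing for versions {missing}")
    current = initial_object
    for version in range(1, latest + 1):   # apply differences in update order
        current = apply_diff(current, diffs_by_version[version])
    return current
```

Because each difference is relative to the immediately preceding version, the differences must be applied in version order, which is why a single gap prevents restoration of every later version.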

A feature of the operation described above is that redundancy can be maintained even if the storage means 28 of the storage devices 14 in the cluster differ in capacity, because the save-destination storage determination means 5214 selects suitable destinations for the base unit 62 and the difference information 64.
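The text does not state the rule by which save destinations are chosen. One possible policy, shown here purely as an assumption, is to pick the reachable devices that have room for the data, preferring those with the most free space. The function name and the tuple layout are hypothetical.

```python
def choose_destinations(devices, size, copies=2):
    """Pick `copies` reachable devices that can hold `size` bytes, most free space first.

    devices: list of (name, free_bytes, reachable) tuples.
    """
    candidates = [d for d in devices if d[2] and d[1] >= size]
    candidates.sort(key=lambda d: d[1], reverse=True)
    if len(candidates) < copies:
        raise RuntimeError("not enough reachable devices with free space")
    return [name for name, _free, _reachable in candidates[:copies]]


devices = [("S1", 500, True), ("S2", 2000, False), ("S3", 120, True), ("S4", 80, True)]
destinations = choose_destinations(devices, size=100)
```

Since each piece of data is placed independently, devices of unequal capacity can take part in the same cluster; a small device is simply chosen less often.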

The cluster storage 1 according to the present invention can therefore provide redundant management of objects with a built-in history management function, while allowing storage devices 14 with different disk capacities.

2) Second Embodiment
The second embodiment concentrates the data of each object on a single storage device as far as possible. A user is likely to assume intuitively that "this object is held on one particular storage device". In other words, the user may believe that only one storage device needs to be running in order to retrieve a given object, and may start and stop the devices one at a time, as with conventional storage.

In such a situation, the method of the first embodiment may, in the worst case, be unable to provide the data at all.

The second embodiment therefore designates one storage device as the parent device of each object, and provides a mechanism by which another storage device (a child device) temporarily accepts a difference when the parent device cannot receive it. With this behavior the user can still obtain the information needed in the situation above and can use the system intuitively.

分散型のクラスタ・ストレージ装置の利点は分散配置によるデータアクセスの高速化とストレージ故障に対する復帰容易性すなわちメンテナンス性の２点である。データアクセス速度は一般にデータの転送がパフォーマンスのボトルネックとなる。なぜなら、データの検索もしくはデータの復元はデータの転送に比べ高速であるためである。 The advantages of the distributed cluster storage device are two points: high-speed data access by distributed arrangement and easy recovery from storage failure, that is, maintainability. As for data access speed, data transfer is generally a performance bottleneck. This is because data retrieval or data restoration is faster than data transfer.

本発明では、ストレージ装置１４を用いたデータアクセスの高速化を実現するために、クライアント装置１０が差分情報確認手段５２１０及びオブジェクト復元手段５２１２を備えるようにしている。ストレージ装置１４がそれぞれベース部６２あるいは差分情報６４を検索、クライアント装置１０に対して送信する場合、データの転送速度はネットワーク２によって律速するようになる。一般にこれは最大転送速度となる。 In the present invention, the client apparatus 10 includes a difference information confirmation unit 5210 and an object restoration unit 5212 in order to realize high-speed data access using the storage apparatus 14. When the storage device 14 searches the base unit 62 or the difference information 64 and transmits it to the client device 10, the data transfer rate is determined by the network 2. Generally this is the maximum transfer rate.

As for maintainability, one approach is to keep each object 60 restorable from a single storage device 14 rather than requiring several. Copying or moving the object 60 then involves only the storage device 14 that holds it and one other storage device 14. The same approach also means that physically carrying the object 60 only requires moving the one storage device 14 on which it is concentrated, which improves convenience as well.

第２の実施形態では、一つのストレージ装置１４にベース部６２及び差分情報６４を集約することでメンテナンス性を高める方法を説明する。 In the second embodiment, a method for improving maintenance by aggregating the base unit 62 and the difference information 64 in one storage device 14 will be described.

<Configuration of cluster storage system>
FIG. 7 shows the schematic configuration of a cluster storage system according to the second embodiment of the present invention. It differs from the first embodiment in that the virtualizer device 12 additionally has an original management means 5216.

本実施形態において、「オリジナル」とは複数保存されるベース部６２のうち、優先度を他より上位とした一つの特別なベース部６２と定義する。また、オリジナルを保有するストレージ装置１４を「そのオブジェクトのネイティブ」であると表現する。 In the present embodiment, “original” is defined as one special base unit 62 having a higher priority than others among the plurality of stored base units 62. Further, the storage apparatus 14 that owns the original is expressed as “the native of the object”.

The aim of this embodiment is to make it possible, as far as possible, to restore the object 60 from a single storage device 14 by concentrating the difference information 64 on the native.

<Example of data structure of base unit>
FIG. 8 shows an example of the structure of the base unit 62 used in the second embodiment.

The structure of the base unit differs from that of the first embodiment in that an original flag 78 is attached in addition to the base ID 70. The original flag 78 indicates whether the base unit 62 is the original, and is either true or false. Like the base ID 70, the original flag 78 need not be attached directly to the base unit 62; it may instead be managed by associating it with the base unit 62 through a database means or the like.

<Operation sequence>
FIG. 9 shows an example of the operation sequence of the client device 10 and the cluster 16 in the second embodiment. In FIG. 9, C denotes the client device 10 and V denotes the virtualizer device 12. S1, S2, and S3 each denote a storage device 14. As in FIG. 4, the processes are shown in time order from the top (810, 812, 814), and 90 denotes the occurrence of a failure in a storage device 14. In this embodiment too, a failure is defined as a state that can be recovered from within a certain period, such as a communication disconnection or a power loss. The procedures enclosed in dotted rectangles are the new object creation procedure 810 with the original flag 78, the object update procedure 812 while the native is operating, and the object access procedure 814, which involves object restoration and replication of difference information 64 to the native when a storage device 14 has failed.

ｉ）まず、オリジナルフラグ７８を伴うオブジェクトの新規作成手順８１０を説明する。 i) First, a new object creation procedure 810 with the original flag 78 will be described.

クライアント装置１０は、オブジェクト作成手段５００を用いて、新規オブジェクト７２を新規作成する。そして、オブジェクト管理手段５０４は、新規作成した新規オブジェクト７２の保存要求を発行し、通信インターフェース２６を介してクラスタ１６に送信する。 The client device 10 creates a new object 72 using the object creation means 500. Then, the object management unit 504 issues a request to save the newly created new object 72 and transmits it to the cluster 16 via the communication interface 26.

バーチャライザ装置１２は、この保存要求を受信し、新規オブジェクト保存動作を開始する。 The virtualizer device 12 receives this storage request and starts a new object storage operation.

図１０は、クライアント装置１０の新規オブジェクト保存要求に対するバーチャライザ装置１２の新規オブジェクト保存動作８１０ｖを詳細に説明するためのフローチャートである。 FIG. 10 is a flowchart for explaining in detail the new object saving operation 810v of the virtualizer device 12 in response to the new object saving request of the client device 10.

アクセス解釈手段５２０は、クライアント装置１０から受信した新規オブジェクト６０１にベースＩＤ７０を割り当て、新規オブジェクト７２とベースＩＤ７０を合わせてベース部６２を生成する（Ｓ８１００）。 The access interpretation unit 520 assigns the base ID 70 to the new object 601 received from the client device 10, and generates the base unit 62 by combining the new object 72 and the base ID 70 (S8100).

The virtualizer device 12 then uses the save-destination storage determination means 5214 to select two storage devices 14 (S1 and S2 in this example) (S8240). At this point the original management means 5216 sets the original flag 78 to true in the base unit 62 for the storage device 14 selected first (S8421), and to false in the base unit 62 for the storage device 14 selected second (S8422).

また、オリジナル管理手段５２１６は、冗長性を高める目的で３台以上のストレージ装置１４を決定する様態においては第１のストレージ装置１４以外のオリジナルフラグ７８を偽として設定する。 In addition, the original management unit 5216 sets the original flag 78 other than the first storage device 14 as false in the mode of determining three or more storage devices 14 for the purpose of increasing redundancy.

その後、それぞれに対してベース部６２を送信し、保存する。これにより、新規オブジェクト保存動作８１０ｖが終了する（Ｓ８２４１→Ｓ８２４２→Ｓ８４４）。 Thereafter, the base unit 62 is transmitted to each of them and stored. As a result, the new object saving operation 810v ends (S8241 → S8242 → S844).
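The flag assignment described above can be sketched as follows: whichever device is selected first receives the base unit marked as the original, and every other copy is marked false, including when three or more devices are selected. The function name and the dictionary layout are assumptions made for illustration.

```python
def build_base_parts(base_id, new_object, destinations):
    """Return {device_name: base unit}; only the first destination holds the original."""
    return {
        name: {
            "base_id": base_id,          # base ID 70
            "original": index == 0,      # original flag 78
            "object": new_object,        # new object 72
        }
        for index, name in enumerate(destinations)
    }


parts = build_base_parts(7, b"payload", ["S1", "S2", "S3"])
```

Here `S1` becomes the native of the object, while `S2` and `S3` hold ordinary redundant copies.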

ii）次に、ネイティブ動作時におけるオブジェクト更新手順８１２について説明する。第１の実施形態におけるオブジェクト更新手順８４と同様に、クライアント装置１０は、オブジェクト更新手段５０２を用いてオブジェクト６０１を更新する。また、オブジェクト管理手段５０４は、更新したオブジェクトの保存要求を発行し、通信インターフェース２６を介してクラスタ１６にその保存要求を送信する。 ii) Next, an object update procedure 812 during native operation will be described. Similar to the object update procedure 84 in the first embodiment, the client apparatus 10 updates the object 601 using the object update unit 502. In addition, the object management unit 504 issues a storage request for the updated object, and transmits the storage request to the cluster 16 via the communication interface 26.

図１１は、保存要求を受信したバーチャライザ装置１２のオブジェクト保存動作８１２ｖを詳細に説明する説明するためのフローチャートである。 FIG. 11 is a flowchart for explaining in detail the object storing operation 812v of the virtualizer device 12 that has received the storing request.

On receiving the save request, the virtualizer device 12 obtains the existing object 601 (S821). It does so either by using the base ID management means 522 to restore the object from the storage devices 14 based on the identified base ID 70 or, where it has sufficient memory 24, by other means such as keeping the object that was restored when the client device 10 last accessed it.

既存オブジェクト６０１が獲得できた場合、オブジェクト差分情報生成手段５２８は、獲得した既存オブジェクト６０１とクライアント装置１０から受信した更新オブジェクト６０２とのオブジェクト差分（図示せず）を生成し、アクセス解釈手段５２０によって特定されたベースＩＤ７０と最新バージョン番号７４（この場合は１）をそれぞれオブジェクト差分に付与することで差分情報６４を生成する（Ｓ８２１１でＹｅｓ→Ｓ８２２）。 When the existing object 601 can be acquired, the object difference information generation unit 528 generates an object difference (not shown) between the acquired existing object 601 and the update object 602 received from the client device 10, and the access interpretation unit 520 The difference information 64 is generated by assigning the identified base ID 70 and the latest version number 74 (in this case, 1) to the object difference (Yes in S8211 → S822).

一方、既存オブジェクト６０１を獲得できなかった場合、オリジナルフラグ７８を伴うオブジェクト新規作成手順８１０（図１０参照）が実行され、更新オブジェクト６０２が新規オブジェクトとして新規保存され、オブジェクト保存動作８１２ｖが終了する（Ｓ８２１１でＮｏ→Ｓ８４）。 On the other hand, if the existing object 601 could not be acquired, a new object creation procedure 810 (see FIG. 10) with the original flag 78 is executed, the updated object 602 is newly saved as a new object, and the object saving operation 812v ends ( No in S8211 → S84).

その後、バーチャライザ装置１２は、保存ストレージ決定手段５２１４を用いて、２台のストレージ装置１４を決定（選択）する（Ｓ８２４０）。 Thereafter, the virtualizer device 12 determines (selects) two storage devices 14 using the storage storage determination unit 5214 (S8240).

In addition, when the storage devices are selected (S8240), the object restoration means 5212 checks for the original flag 78. If the flag is detected, that is, if the virtualizer device 12 can access the native (S1), the save-destination storage determination means 5214 preferentially selects the native as the first storage device (S8250 → S8251 → S8252).

その後、選択されたストレージ装置１４のそれぞれに対して差分情報６４が送信され、保存される。これにより、オブジェクト保存動作が終了する（Ｓ８２４１→Ｓ８２４２→Ｓ８２６）。 Thereafter, the difference information 64 is transmitted to each of the selected storage devices 14 and stored. As a result, the object saving operation ends (S8241 → S8242 → S826).
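The native-first selection can be sketched as follows. The function name is hypothetical, and the rule used for ordering the remaining devices (the order in which they are listed) is an assumption, since the text only states that the native is preferred as the first storage device.

```python
def choose_with_native_first(reachable, native, copies=2):
    """Select save destinations, placing the native first whenever it is reachable.

    reachable: ordered list of device names that currently respond.
    native: name of the device that holds the original base unit.
    """
    ordered = ([native] if native in reachable else []) + [d for d in reachable if d != native]
    return ordered[:copies]
```

When the native does not respond, the difference goes to other devices only, which is the situation the access procedure 814 later repairs.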

iii) Next, the object access procedure 814 is described. This procedure involves object restoration and replication of difference information 64 when a storage device 14 has failed. This example assumes that, because of a failure of a storage device 14, the native storage device does not hold all the difference information 64 needed for restoration (that is, at least part of the data needed for restoration is missing).

In this state, when the object 603 is accessed, the virtualizer device 12 replicates the difference information 64 to the native, returning to a state in which the object 603 can be restored from the native storage device alone.

図１２は、クライアント装置１０からのオブジェクトアクセス要求を受信したバーチャライザ装置１２のオブジェクトアクセス応答動作８１４ｖを詳細に説明するためのフローチャートである。 FIG. 12 is a flowchart for explaining in detail the object access response operation 814v of the virtualizer device 12 that has received the object access request from the client device 10.

クライアント装置１０からのオブジェクトアクセス要求を受信したバーチャライザ装置１２は、アクセス解釈手段５２０を用いて、目的オブジェクトのベースＩＤ７０を特定する。また、バーチャライザ装置１２は、このベースＩＤ７０を基に、ベース部検索手段５２４を用いてベースＩＤ７０に合致するベース部６２をクラスタ１６に属するストレージ装置１４から検索する（Ｓ８２２）。 The virtualizer device 12 that has received the object access request from the client device 10 specifies the base ID 70 of the target object using the access interpreter 520. Further, the virtualizer device 12 searches the storage unit 14 belonging to the cluster 16 for the base unit 62 that matches the base ID 70 using the base unit search means 524 based on the base ID 70 (S822).

合致するベース部６２が存在しない場合、バーチャライザ装置１２はオブジェクトアクセスの失敗をクライアント装置１０に通知する。これにより、オブジェクトアクセス応答動作８１４ｖは終了する（Ｓ８２２１でＮｏ→Ｓ８２６→Ｓ８２１４）。 If there is no matching base unit 62, the virtualizer device 12 notifies the client device 10 of the object access failure. As a result, the object access response operation 814v ends (No in S8221 → S826 → S8214).

ベース部６２を検出した後、バーチャライザ装置１２は、差分情報検索手段５２６を用いてオブジェクト６０３の復元に必要な差分情報６４をネットワーク２に接続しているストレージ装置１４より検索する（Ｓ８２４）。 After detecting the base unit 62, the virtualizer device 12 uses the difference information search unit 526 to search the storage device 14 connected to the network 2 for the difference information 64 necessary for restoring the object 603 (S824).

If the difference information 64 needed for restoration is present, and the base unit search means 524 has confirmed that the original exists, that is, that a native storage device is present, the difference information confirmation means 5210 checks whether the native storage device holds all the differences needed to restore the object (Yes in S8241 → S8250 → Yes in S8251 → S8140).

If some of the difference information 64 needed for restoration is missing, the restorable object is presented and that object is restored (No in S8241 → S8210 → S828).

If the native storage device does not hold all the differences needed to restore the object, the virtualizer device 12 replicates the missing difference information 64 to the native (No in S8141 → S8142). If the native storage device already holds all of them, processing proceeds to S828.

その後、バーチャライザ装置１２は、オブジェクト復元手段５２１２を用いて、オブジェクトを復元し、クライアント装置１０に送信する。これにより、オブジェクトアクセス動作が終了する（Ｓ８２８→Ｓ８２１２→Ｓ８２１４） Thereafter, the virtualizer device 12 restores the object using the object restoration unit 5212 and transmits it to the client device 10. As a result, the object access operation ends (S828 → S8212 → S8214).

In this way, operations S810, S812, and S814 maintain a state in which an object can be restored from the native storage device alone. As a result, an object 60 can be copied or moved using only the native storage device and one other storage device 14, which improves maintainability. In addition, physically carrying an object requires moving only the native storage device, which improves convenience.
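The access flow of this embodiment (S822, S824, S8141/S8142, S828) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the `Device` class, the boolean "original" marker, and the integer difference sequence numbers are hypothetical stand-ins for the storage devices 14, the original flag, and the difference information 64.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a storage device 14."""
    name: str
    # base_id -> True if this copy of the base part is the original (native)
    bases: dict = field(default_factory=dict)
    # base_id -> set of difference sequence numbers held on this device
    diffs: dict = field(default_factory=dict)


def handle_access(devices, base_id):
    """Return the difference numbers available for restoration, or None on failure."""
    holders = [d for d in devices if base_id in d.bases]          # S822: find the base part
    if not holders:
        return None                                               # access failure
    found = set()
    for d in devices:                                             # S824: find all differences
        found |= d.diffs.get(base_id, set())
    native = next((d for d in holders if d.bases[base_id]), None)
    if native is not None:
        missing = found - native.diffs.get(base_id, set())        # S8141: what the native lacks
        native.diffs.setdefault(base_id, set()).update(missing)   # S8142: copy to the native
    return sorted(found)                                          # handed to restoration (S828)
```

After a call such as `handle_access([s1, s2, s3], "a")`, the device holding the original copy of `"a"` also holds every difference found elsewhere in the cluster, which is the property the paragraph above relies on.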

3) Third Embodiment
The third embodiment maintains redundancy by adding a new storage device and concerns the process of recovering from a storage breakdown. The first and second embodiments assumed that a storage device 14 that experienced a failure 90 could return to service after a certain period.

In practice, however, some failures 90 are unrecoverable: the storage device 14 breaks down and the information stored on it becomes unusable.

This embodiment therefore provides a method of securing redundancy against such breakdowns in the cluster storage system 1.

The configuration shown in FIG. 1 or FIG. 7 can be used for the cluster storage system.

<Operation sequence for securing redundancy>
FIG. 13 shows an example of the operation sequence of a redundancy restoration operation, in which redundancy is restored by newly adding a storage device 141 to a cluster 16 that can no longer secure redundancy because a storage device 14 has broken down.

The elements drawn under each storage device 14 identifier (S1, S2, and S3) represent the base parts 62 and difference information 64 of objects stored by the client device 10 (not shown). Reference numeral 92 indicates the breakdown of S3.

The following describes an example of the redundancy restoration operation, which restores the redundancy lost through the breakdown of S3 by adding a new storage device 141.

The virtualizer device 12 starts the redundancy restoration operation when it detects that a new storage device 141 has been added. The addition can be detected, for example, by having the storage destination determination means 5214 periodically collect information on the storage devices 14 connected to the network 2, which it already does in order to speed up its own storage selection.
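One simple way to turn such periodic collection into a detection signal is to compare successive snapshots of the device list. The sketch below assumes that each poll yields a collection of device identifiers; the function name and the identifier strings are hypothetical and do not appear in the embodiment.

```python
def diff_membership(previous, current):
    """Compare two polls of device identifiers and return (added, removed)."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


# Example: S3 has disappeared and a new device S4 has appeared since the last poll.
added, removed = diff_membership(["S1", "S2", "S3"], ["S1", "S2", "S4"])
```

A non-empty `added` list would trigger the redundancy restoration operation; the same comparison also reports devices that have dropped out of the cluster.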

For every base ID 70 managed by the base ID management means 522, the virtualizer device 12 searches for the base part 62 and the difference information 64 using the base part search means 524 and the difference information search means 526.

Redundancy is restored by storing, on the new storage device 141, those base parts 62 and pieces of difference information 64 that the search shows to exist in fewer copies than the redundancy determined by the storage destination determination means 5214 requires.

This operation enables the cluster storage system 1 to restore lost redundancy.
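The restoration step can be sketched as a count-and-copy pass. The data model here is an assumption made for illustration: `holdings` maps each live device name to the set of item keys (base parts or differences) it holds, and `target_copies` stands in for the redundancy chosen by the storage destination determination means 5214.

```python
def restore_to_new_device(holdings, new_device, target_copies):
    """Copy every under-replicated item to new_device and return the copied items.

    holdings: dict mapping live device name -> set of item keys (hypothetical model).
    """
    counts = {}
    for items in holdings.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    # Items with fewer copies than required go to the newly added device.
    to_copy = {item for item, n in counts.items() if n < target_copies}
    holdings.setdefault(new_device, set()).update(to_copy)
    return to_copy
```

For example, with `{"S1": {"a", "b"}, "S2": {"a"}}` and a target of two copies, only `"b"` is written to the new device.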

4) Fourth Embodiment
The fourth embodiment concerns maintaining redundancy in a cluster storage apparatus made up of (redundancy + 2) or more devices.

The redundancy that the cluster storage apparatus 14 according to the present invention can provide covers failures of up to (number of devices in the cluster 16 − 2) devices.

However, a cluster 16 may be built from (redundancy + 2) or more devices in order to prioritize capacity over redundancy. For example, if one disk fails in a cluster 16 operating with (redundancy + 3) devices, the disk may be replaced and redundancy restored as in the third embodiment. Even so, the cluster 16 still consists of (redundancy + 2) cluster storage devices 14, so it can also continue operating as it is, provided the redundancy is adjusted appropriately.

In this embodiment as well, the configuration shown in FIG. 1 or FIG. 7 can be used for the cluster storage system.

This embodiment describes a redundancy adjustment operation that, when a breakdown occurs in a cluster 16 originally operated with more devices than the redundancy requires, adjusts the redundant information appropriately so that operation can continue.

FIG. 14 shows an example of the sequence of the redundancy adjustment operation when one cluster storage device 14 (S3) suffers a breakdown 92 in a cluster 16 that consists of one virtualizer device 12 and four cluster storage devices 14 and is operated with redundancy against the failure of one disk.

The virtualizer device 12 starts the redundancy adjustment operation when it detects, using the storage destination determination means 5214, the breakdown 92 of the cluster storage device 14 (S3).

First, the virtualizer device 12 uses the base part search means 524 or the difference information search means 526 to search for the base parts 62 and difference information 64 of all data held by the accessible cluster storage devices 14, and then checks their redundancy using the difference information confirmation means 5210.

If this check finds data 66 whose redundancy is no longer maintained, the storage destination determination means 5214 selects cluster storage devices 14 that do not hold that data and stores the data on each of them.

Through this redundancy adjustment operation, the cluster storage apparatus 14 can maintain as much redundancy as the storage devices making up the cluster 16 allow. This feature is particularly effective when the mean time between failures is long. The cluster storage apparatus 14 is therefore well suited to environments such as the home, where a spare disk is not always on hand.
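A rough sketch of the adjustment among the surviving devices follows. The data model is hypothetical: `holdings` maps device names to sets of item keys, and `target_copies` is an assumed parameter for the required number of copies. The choice of the least-filled device as the copy destination is also an assumption; the embodiment leaves the selection to the storage destination determination means 5214.

```python
def adjust_redundancy(holdings, failed, target_copies):
    """Drop the failed device, then copy under-replicated items to survivors.

    Returns (live_holdings, copies), where copies is a list of (item, device) pairs.
    """
    live = {name: set(items) for name, items in holdings.items() if name != failed}
    copies = []
    for item in sorted(set().union(*live.values())):
        holders = [n for n in live if item in live[n]]
        # Assumed policy: prefer the devices that currently hold the fewest items.
        candidates = sorted((n for n in live if item not in live[n]),
                            key=lambda n: len(live[n]))
        for name in candidates[: max(0, target_copies - len(holders))]:
            live[name].add(item)
            copies.append((item, name))
    return live, copies
```

With four devices, one of which (S3) breaks down, every item that was mirrored on S3 is copied to a surviving device that did not already hold it, so operation continues without a spare disk.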

5) Fifth Embodiment
The fifth embodiment concerns maintaining an appropriate object redundancy by deleting redundant information. In this embodiment as well, the configuration shown in FIG. 1 or FIG. 7 can be used for the cluster storage system.

Repairing storage at home is tedious work that users would rather avoid. A system that simply keeps running even when the number of devices decreases is therefore valuable in its own right.

In the cluster storage system 1 of the present invention, the redundancy of an object can be set anywhere from 1 to (number of cluster storage devices 14 in the cluster 16 − 2). Because of this, when a breakdown 92 of a storage device 14 reduces the number of devices so that the redundancy the user requested can no longer be secured, the cluster is left holding excess redundant information. Such excess redundant information consumes capacity in the storage means 28 of the storage devices 14 and lowers the utilization efficiency of the cluster storage system 1.
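The permitted range can be expressed as a small helper. This is a sketch only; the function name and the decision to clamp the requested value (rather than reject it) are assumptions made for illustration.

```python
def effective_redundancy(requested, device_count):
    """Clamp the requested redundancy to the range 1 .. device_count - 2."""
    upper = device_count - 2
    if upper < 1:
        raise ValueError("at least three storage devices are needed")
    return max(1, min(requested, upper))
```

For instance, a user who requested a redundancy of 2 on a four-device cluster can only be given a redundancy of 1 once one device has broken down and three remain; any copies stored beyond that level are the excess redundant information discussed above.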

This embodiment therefore provides a procedure for returning the excess redundancy caused by the breakdown 92 of a storage device 14 to an appropriate level.

FIG. 15 shows an example of the redundancy restoration sequence for a cluster 16 that holds excess redundant information because of the breakdown 92 of a storage device 14.

First, the virtualizer device 12 starts the redundancy restoration sequence when it detects the breakdown 92 of a storage device 14. The breakdown 92 can be detected, for example, by having the storage destination determination means 5214 periodically collect information on the storage devices 14 connected to the network 2, which it already does in order to speed up its own storage selection.

The virtualizer device 12 then searches for the base part 62 and the difference information 64 of every base ID 70 managed by the base ID management means 522, using the base part search means 524 and the difference information search means 526.

Redundancy is returned to an appropriate value by deleting the base parts 62 and difference information 64 that the search shows to exceed the redundancy determined by the storage destination determination means 5214. The storage device 14 from which information is deleted is chosen, according to the policy of the storage destination determination means 5214, so that the load on all storage devices 14 belonging to the cluster 16 becomes uniform.

Through this operation, the redundancy of the objects 60 in the cluster 16 is constantly kept at an appropriate value. The cluster storage system 1 can therefore maintain redundant information appropriate to the number of storage devices 14 in the cluster 16 while using an appropriate amount of disk capacity.
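The deletion pass can be sketched as follows. As before, `holdings` and `target_copies` are a hypothetical data model, and "load" is approximated here by the number of items a device holds; the actual balancing policy belongs to the storage destination determination means 5214 and is not specified at this level of detail.

```python
def trim_excess(holdings, target_copies):
    """Delete copies beyond target_copies, always from the fullest holder first.

    Returns (live_holdings, deletions), where deletions is a list of (item, device).
    """
    live = {name: set(items) for name, items in holdings.items()}
    deletions = []
    for item in sorted(set().union(*live.values())):
        holders = [n for n in live if item in live[n]]
        while len(holders) > target_copies:
            # Assumed load measure: the device holding the most items gives one up.
            victim = max(holders, key=lambda n: len(live[n]))
            live[victim].discard(item)
            holders.remove(victim)
            deletions.append((item, victim))
    return live, deletions
```

Removing surplus copies from the fullest device tends to even out usage across the cluster, which is the intent of the uniform-load policy described above.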

6) Summary
Because the embodiments were conceived with RAID 5 in mind, they use a cluster storage system with three or more storage devices. The information recording control device (virtualizer device) includes access method conversion means (access interpretation means) that converts the method by which an external device accesses data stored on the storage devices into the access method used within the cluster storage. At data storage time, this conversion consists of assigning an identifier (ID) to each object-unit piece of data and managing the data by that ID. At data access time (when an external device issues an access request), it consists of obtaining the identifier (ID) corresponding to the object-unit data and then obtaining the object data corresponding to that ID. Because data is managed in object units and stored and accessed through IDs assigned to the stored data, the three or more storage devices may have different capacities. A conventional RAID system manages data by LBA and therefore cannot accommodate disk devices of different capacities, whereas the present invention manages data by object ID and can therefore use disk devices of different capacities.

When data that has already been stored (the original data, or base part) is updated, difference information between the update data and the original data is generated, given the same ID as the original data, and stored on a storage device. As a result, all related data can be retrieved even when the data (original data and difference information) is distributed across multiple storage devices. In addition, because the sizes of the original data and the difference data vary, data can be placed with the capacity of each storage device taken into account. Storage devices of various capacities can therefore be accommodated.
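The idea of tagging the original data and its differences with one shared ID, while placing each record wherever capacity allows, can be illustrated with the sketch below. The `Disk` class, the "most free space" placement rule, and the byte-string payloads are all assumptions for illustration and are not taken from the embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical storage device with a byte capacity."""
    name: str
    capacity: int
    records: list = field(default_factory=list)  # (object_id, kind, payload)

    @property
    def free(self):
        return self.capacity - sum(len(p) for _, _, p in self.records)


def store(disks, object_id, kind, payload):
    """Store a record on the fitting disk with the most free space (assumed policy)."""
    fitting = [d for d in disks if d.free >= len(payload)]
    if not fitting:
        raise RuntimeError("no device has enough free capacity")
    target = max(fitting, key=lambda d: d.free)
    target.records.append((object_id, kind, payload))
    return target.name


def collect(disks, object_id):
    """Gather every record that shares the object ID, wherever it is stored."""
    return [(d.name, kind, p)
            for d in disks for oid, kind, p in d.records if oid == object_id]
```

Because lookup goes through the shared ID rather than a block address, a 10-byte disk and a 12-byte disk can sit in the same cluster: the base part lands on one, a later difference on the other, and `collect` still returns both.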

In another aspect, a single storage device stores and manages the original data for a given object-unit piece of data (the original, marked with a flag saying so) together with all of its difference information. This lets a user obtain the desired data quickly using only that one storage device. More specifically, storage device selection means identifies one storage device. Data difference confirmation means then checks whether the identified storage device holds all the data (difference information) needed to restore the requested data. If it does not, data replication means obtains the missing difference information from other storage devices and stores copies on the identified storage device. In this way, all the data (original data and difference information) is consolidated on a specific storage device.

In another aspect, when a failed storage device is replaced with a new storage device, the new storage device is detected, data is obtained from the other storage devices, and the data that the failed storage device had held is transferred to the new storage device. Here too, data is handled in object units, and when data is stored on each storage device, the original data and the difference information (the difference between the update data and the original data) should preferably be managed under the same ID. This allows a new storage device of any capacity to be used and makes it easy to secure redundancy.

In yet another aspect, when a failure of a storage device is detected, the data that the failed storage device had held is transferred to other storage devices. Here too, data is handled in object units, and when data is stored on each storage device, the original data and the difference information (the difference between the update data and the original data) should preferably be managed under the same ID. This secures redundancy, and it does not matter if the capacity of those other storage devices differs from that of the failed one (the system is not constrained by the smallest storage device).

In another aspect, when a failure of a storage device is detected, the data held by the operable storage devices other than the failed one is examined, and if the pre-failure redundancy can be lowered, surplus data is deleted from the operable storage devices to lower the redundancy. Here too, data is handled in object units, and when data is stored on each storage device, the original data and the difference information (the difference between the update data and the original data) should preferably be managed under the same ID. In this way, when there are storage devices to spare, an emergency can be handled without forcing the system to maintain a high redundancy. In a storage system used in the home in particular, continuing to operate after a storage device fails takes priority, and some time passes before a replacement storage device is supplied. The storage system can operate without problems during that period as well.

The present invention can also be realized by software program code that implements the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a system or apparatus, and the computer (or CPU or MPU) of that system or apparatus reads the program code stored on the storage medium. The program code read from the storage medium then itself implements the functions of the embodiments described above, so the program code and the storage medium storing it constitute the present invention. Storage media that can be used to supply such program code include, for example, flexible disks, CD-ROMs, DVD-ROMs, hard disks, optical disks, magneto-optical disks, CD-Rs, magnetic tape, nonvolatile memory cards, and ROMs.

Alternatively, an OS (operating system) or the like running on the computer may perform part or all of the actual processing based on the instructions of the program code, with the functions of the embodiments described above being realized by that processing. Furthermore, after the program code read from the storage medium has been written into memory on the computer, the CPU or the like of the computer may perform part or all of the actual processing based on the instructions of the program code, with the functions of the embodiments described above being realized by that processing.

The software program code that implements the functions of the embodiments may also be distributed over a network and stored in storage means such as a hard disk or memory of a system or apparatus, or on a storage medium such as a CD-RW or CD-R, so that the computer (or CPU or MPU) of the system or apparatus reads and executes the program code stored in that storage means or on that storage medium at the time of use.

FIG. 1 is a diagram showing a schematic configuration example of the cluster storage system according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the data structure of a base part.
FIG. 3 is a diagram showing an example of the data structure of difference information.
FIG. 4 is a diagram showing an operation sequence of the cluster storage system according to the first embodiment.
FIG. 5 is a flowchart for explaining the object access response operation of the virtualizer device.
FIG. 6 is a flowchart for explaining the object save operation of the virtualizer device.
FIG. 7 is a diagram showing a schematic configuration example of the cluster storage system according to the second embodiment of the present invention.
FIG. 8 is a diagram showing an example of the data structure of a base part according to the second embodiment.
FIG. 9 is a diagram showing an operation sequence of the cluster storage system according to the second embodiment.
FIG. 10 is a flowchart for explaining the new object save operation of the virtualizer device.
FIG. 11 is a flowchart for explaining the object save operation of the virtualizer device.
FIG. 12 is a flowchart for explaining the object access response operation of the virtualizer device.
FIG. 13 is a diagram showing an operation sequence of the cluster storage system according to the third embodiment.
FIG. 14 is a diagram showing an operation sequence of the cluster storage system according to the fourth embodiment.
FIG. 15 is a diagram showing an operation sequence of the cluster storage system according to the fifth embodiment.

Explanation of symbols

1: Cluster storage system
2: Network
10: Client device
12: Virtualizer device
14: Storage device
20: User interface
22: CPU
24: Memory
26: Communication interface
28: Storage means
500: Object creation means
502: Object update means
520: Access interpretation means
522: Base ID management means
524: Base part search means
526: Difference information search means
528: Difference information generation means
5210: Difference information confirmation means
5212: Object restoration means
5214: Storage destination determination means

Claims

A cluster storage apparatus comprising: three or more storage devices; and an information recording/reproducing control device that stores data provided from an external device in any of the three or more storage devices and controls access to data stored in any of the three or more storage devices,
wherein the information recording/reproducing control device comprises access interpreting means for executing storage of and access to the data in units of objects.

2. The cluster storage apparatus according to claim 1, wherein the access interpreting means manages the data by assigning an ID to the object unit data.

The access interpreting means receives an access request for data stored in the three or more storage devices, acquires an ID corresponding to the data to be accessed, and realizes access to the data. The cluster storage device described in 1.

4. The cluster storage apparatus according to claim 3, wherein the information recording/reproducing control device further comprises difference information generating means that receives update data generated by updating original data stored in any of the three or more storage devices and generates difference information by taking the difference between the update data and the original data, and
the access interpreting means assigns the same ID as the original data to the difference information and stores the difference information in any of the three or more storage devices.

5. The cluster storage apparatus according to claim 4, wherein the information recording/reproducing control device further comprises:
original data search means for searching for the original data based on the ID corresponding to the data to be accessed;
difference information search means for searching for the difference information based on the ID; and
restoration means for restoring the update data in units of objects using the original data and the difference information.

3. The cluster storage apparatus according to claim 2, wherein the access interpreting unit stores data in units of objects to which the ID is assigned in at least two of the three or more storage apparatuses.

The cluster storage apparatus according to claim 6, wherein the information recording/reproducing control device further comprises difference information generating means that receives update data generated by updating original data stored in any of the three or more storage devices and generates difference information by taking the difference between the update data and the original data, and
the access interpreting means assigns the same ID as the original data to the difference information and stores the difference information in at least two of the three or more storage devices.

5. The access interpreting unit aggregates and stores one original data and all the difference information related to the original data in at least one of the three or more storage devices. The cluster storage device described in 1.

The information recording / reproducing control apparatus further includes original flag adding means for adding an original flag indicating that the original data is original data to the original data stored in a centralized manner in the one storage device,
The said access interpretation means aggregates and stores the said original data to which the said original flag was given, and all the said difference information relevant to it in the said one storage apparatus. The cluster storage device described.

When the access interpreting unit receives the access request, the access interpreting unit obtains the ID of the corresponding object unit data,
The information recording / reproducing control device further includes:
Original data search means for preferentially searching the original data from the storage device storing the original data having the original flag based on the ID corresponding to the data of the access request;
Based on the ID, difference information search means for searching all the related difference information from the storage device that stores the original data having the original flag;
10. The cluster storage apparatus according to claim 9, further comprising a restoration unit that restores the data in units of objects using the original data having the original flag and all the related difference information.

A cluster storage apparatus comprising: three or more storage devices; and an information recording/reproducing control device that stores data provided from an external device in any of the three or more storage devices and controls access to data stored in any of the three or more storage devices,
wherein the information recording/reproducing control device comprises:
failure detection means for detecting a failure of a storage device;
search means for searching the data stored in the operable storage devices other than the storage device in which the failure was detected and checking whether a predetermined redundancy is secured; and
data storage means for storing, when the predetermined redundancy is not secured, the data that is lacking in the operable storage devices so that the predetermined redundancy is secured.

A cluster storage apparatus comprising: three or more storage devices; and an information recording/reproducing control device that stores data provided from an external device in any of the three or more storage devices and controls access to data stored in any of the three or more storage devices,
wherein the information recording/reproducing control device comprises:
failure detection means for detecting a failure of a storage device;
search means for searching the data stored in the operable storage devices other than the storage device in which the failure was detected and checking whether a predetermined redundancy is secured; and
data organizing means for deleting, when the predetermined redundancy is not secured, part of the data stored in the operable storage devices and lowering the predetermined redundancy.

A method of controlling a cluster storage apparatus that comprises three or more storage devices and an information recording/reproducing control device that stores data provided from an external device in any of the three or more storage devices and controls access to data stored in any of the three or more storage devices,
wherein, in the information recording/reproducing control device, access interpreting means executes storage of and access to the data in units of objects.

14. The method according to claim 13, wherein the access interpreting unit manages the data by assigning an ID to the object unit data.

15. The method described in 1, wherein the access interpreting unit receives an access request for data stored in the three or more storage devices, acquires the ID corresponding to the data to be accessed, and performs the data access.
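For illustration only, the object-unit storage and ID-based access performed by the access interpreting unit may be sketched as follows. The class name `AccessInterpreter`, the use of object names as request keys, and the sequential integer IDs are hypothetical choices; the claims do not prescribe an ID format.

```python
import itertools


class AccessInterpreter:
    """Hypothetical object-level front end that assigns IDs and resolves requests."""

    def __init__(self, device_names):
        self._next_id = itertools.count(1)
        self._ids = {}                                        # object name -> ID
        self._devices = {name: {} for name in device_names}   # device -> {ID: data}

    def store(self, name, data, device):
        """Store one object on the given device and return its ID."""
        if name not in self._ids:
            self._ids[name] = next(self._next_id)
        object_id = self._ids[name]
        self._devices[device][object_id] = data
        return object_id

    def access(self, name):
        """Resolve the request to an ID, then read the object from any device holding it."""
        object_id = self._ids[name]
        for contents in self._devices.values():
            if object_id in contents:
                return contents[object_id]
        raise KeyError(name)
```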

16. The method according to claim 15, wherein, in the information recording/reproducing control device, when the difference information generating means receives update data generated by updating original data stored in any of the three or more storage devices, the difference information generating means takes the difference between the update data and the original data and generates difference information, and
the access interpreting unit assigns the same ID as the original data to the difference information and stores the difference information in any of the three or more storage devices.
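A minimal sketch of difference generation follows, assuming byte-string objects and a difference encoded as a list of `(start, end, replacement)` edits. The encoding, the helper names, and the use of Python's `difflib.SequenceMatcher` are illustrative assumptions; the claims do not specify how the difference is computed.

```python
from difflib import SequenceMatcher


def make_difference(original: bytes, updated: bytes):
    """Return edits (start, end, replacement) that turn `original` into `updated`."""
    ops = []
    matcher = SequenceMatcher(None, original, updated, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace original[i1:i2] with updated[j1:j2]; unchanged runs are omitted.
            ops.append((i1, i2, updated[j1:j2]))
    return ops


def apply_difference(original: bytes, ops):
    """Rebuild the updated data from the original data and its difference."""
    out = bytearray()
    cursor = 0
    for start, end, replacement in ops:
        out += original[cursor:start]
        out += replacement
        cursor = end
    out += original[cursor:]
    return bytes(out)


def store_update(device: dict, object_id: int, original: bytes, updated: bytes):
    """Store the difference on `device` under the same ID as the original data."""
    device.setdefault(object_id, []).append(make_difference(original, updated))
```

Only the changed byte ranges are stored, which is what lets the difference share the ID of the original without duplicating the whole object.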

17. The method described in 1, wherein the access interpreting unit stores one piece of the original data and all the difference information related thereto collectively in at least one of the three or more storage devices.

18. The method described, wherein, in the information recording/reproducing control device,
an original flag assigning unit assigns, to the original data aggregated and stored in the one storage device, an original flag indicating that the data is original data, and
the access interpreting unit aggregates and stores, in the one storage device, the original data to which the original flag is assigned and all the difference information related to the original data.
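The aggregation and original-flag assignment described above might look like the following sketch. The `Record` type, its `seq` field (0 for original data, 1 and up for difference information in update order), and the function `aggregate` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    object_id: int
    seq: int                      # assumed ordering: 0 = original, 1.. = differences
    payload: bytes
    original_flag: bool = False


def aggregate(devices, object_id, home):
    """Gather the original and all differences for one object onto device `home`.

    devices : dict mapping device name -> list of Record
    Returns the records now held together on `home`, original first.
    """
    found = {}
    for records in devices.values():
        for rec in records:
            if rec.object_id == object_id:
                found.setdefault(rec.seq, rec)   # keep one copy per sequence number
    if 0 not in found:
        raise LookupError(f"no original data for object {object_id}")
    others = [r for r in devices[home] if r.object_id != object_id]
    # Only the original (seq 0) receives the original flag.
    gathered = [replace(found[s], original_flag=(s == 0)) for s in sorted(found)]
    devices[home] = others + gathered
    return gathered
```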

19. The method according to claim 18, wherein, when the access interpreting unit receives the access request, the access interpreting unit acquires the ID of the corresponding object-unit data, and
in the information recording/reproducing control device,
the original data search means, based on the ID corresponding to the requested data, preferentially searches for the original data in the storage device storing the original data having the original flag,
the difference information search means, based on the ID, searches for all the related difference information in the storage device storing the original data having the original flag, and
the restoration means restores the data in units of objects using the original data having the original flag and all the related difference information.
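As a rough sketch of the search and restoration steps above, the following code locates the device holding the flagged original for a given ID, collects the difference information with the same ID from that device, and applies the differences in order. The record layout (dictionaries with `id`, `kind`, `seq`, `ops`, `payload`, and `original_flag` keys), the edit format, and the assumption that each difference applies to the result of the previous one are all hypothetical.

```python
def apply_difference(data: bytes, ops):
    """Apply (start, end, replacement) edits to `data`."""
    out, cursor = bytearray(), 0
    for start, end, replacement in ops:
        out += data[cursor:start] + replacement
        cursor = end
    return bytes(out + data[cursor:])


def restore(devices, object_id):
    """Restore one object from its flagged original and all related differences.

    devices : dict mapping device name -> list of record dicts
    """
    # Original data search: find the device whose copy carries the original flag.
    for records in devices.values():
        originals = [r for r in records
                     if r["id"] == object_id and r.get("original_flag")]
        if originals:
            break
    else:
        raise LookupError(f"no flagged original for object {object_id}")
    # Difference search: same device, same ID, ordered by update sequence.
    diffs = sorted((r for r in records
                    if r["id"] == object_id and r["kind"] == "diff"),
                   key=lambda r: r["seq"])
    # Restoration: apply each difference to the result of the previous one.
    data = originals[0]["payload"]
    for diff in diffs:
        data = apply_difference(data, diff["ops"])
    return data
```

Because the original and its differences sit on one device, the restoration in this sketch reads from a single device once the flagged copy is found.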