JP2009289161A - Clustered storage system, node device thereof, and method and program for controlling data read/write

Info

Publication number: JP2009289161A
Application number: JP2008143046A
Authority: JP
Inventors: Shin Kobayashi; 心小林; Shunichi Ichikawa; 俊一市川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-05-30
Filing date: 2008-05-30
Publication date: 2009-12-10

Abstract

PROBLEM TO BE SOLVED: To allow the devices that store content to be combined freely in a clustered storage system that stores content redundantly in a plurality of devices.

SOLUTION: In a node device 100 of a clustered storage system that stores content redundantly in a plurality of devices 110, layout information 131 is created, which is referred to when determining the devices to which data are written, and a basket ID is created from a hash value calculated from the content key (K) of the desired content. In the layout information 131, the device IDs of the plurality of devices 110 storing the content can be set as the device IDs corresponding to the basket ID.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a clustered storage system.

Conventionally, there are clustered storage systems that store data in a plurality of distributed devices. Such a system is scalable: adding devices flexibly increases throughput, the number of I/O operations per unit time, and data storage capacity.

One such clustered storage system is Dynamo, developed by Amazon (see Non-Patent Document 1). Dynamo is a storage system that uses a distributed hash table (DHT) and manages the write destinations of data (content) as key-value pairs. In the key-value scheme, a content key is assigned to content when it is stored, and the system reads the content by specifying that content key. Dynamo balances load between node devices (between devices) by dividing the DHT space equally, introducing virtual nodes, and so on. In Dynamo, the device that stores a piece of content is determined by the hash value H(K) calculated by applying the hash function H to the content key K. When content is stored redundantly, it is replicated to a group of devices that are adjacent in the DHT space.

Non-Patent Document 1: Dynamo, [online], [retrieved May 9, 2008], Internet, <URL: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html>

However, according to Non-Patent Document 1, when the average load of each node device was measured over 30 minutes in Dynamo, node devices whose load was 15% of the average value accounted for 10% to 20% of all nodes. Various causes of such load imbalance are conceivable; one example is concentrated access by an application to content with specific keys. As a concrete example, consider a system in which content C1 is stored in three devices D1, D2, and D3 that are adjacent in the DHT space, and other content C2 is stored in three devices D2, D3, and D4 that are adjacent in the DHT space. In this system, if an application appears that frequently accesses content C1 and C2, the load on devices D2 and D3 becomes particularly high, and the load becomes unbalanced between the devices (between the node devices) in the system. In Dynamo, the combination of devices used for redundancy is fixed as devices adjacent in the DHT space. Therefore, the combination of devices that store content redundantly cannot be changed to an arbitrary combination in order to eliminate the load imbalance.
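The C1/C2 example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the ring order, the access counts, and the assumption that every access touches all three replicas are hypothetical simplifications, not measurements from Dynamo.

```python
from collections import Counter

# Devices in the order they appear in the DHT space (names from the example above).
RING = ["D1", "D2", "D3", "D4"]

def adjacent_replicas(start_index, n_replicas=3):
    """Return the n_replicas devices starting at start_index, wrapping around the ring."""
    return [RING[(start_index + i) % len(RING)] for i in range(n_replicas)]

def device_load(access_counts, placement):
    """Sum accesses per device, assuming each access touches every replica."""
    load = Counter()
    for content, count in access_counts.items():
        for device in placement[content]:
            load[device] += count
    return load

# C1 lands on D1..D3 and C2 on D2..D4 because replicas must be adjacent.
placement = {"C1": adjacent_replicas(0), "C2": adjacent_replicas(1)}
# Hypothetical workload: both contents are accessed equally often.
load = device_load({"C1": 100, "C2": 100}, placement)
print(dict(load))
```

Under these assumptions D2 and D3 carry twice the load of D1 and D4, and with adjacency-only placement there is no way to move one replica elsewhere.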

An object of the present invention is to solve the above problem and, in a clustered storage system that stores content redundantly in a plurality of devices, to allow any combination of devices to be used as the storage destinations of the content.

To solve the above problem, the invention according to claim 1 is a node device in a clustered storage system that includes a plurality of node devices that read and write data and that stores the data redundantly in a plurality of devices accommodated in the node devices, the node device comprising: a storage unit that stores arrangement information indicating, for each basket ID, which is identification information obtained from the hash value of a content key that is identification information of the data, the identification information of the plurality of devices that store the data, and address information indicating, for each piece of device identification information, the identification information of the node device that accommodates the device; an input unit that accepts input of a write request or read request for the data, the request including the content key of the data to be read or written; a hash calculation unit that calculates the basket ID corresponding to the content key from the hash value of the content key; a routing unit that, using the calculated basket ID as a key, searches the arrangement information for the identification information of the devices corresponding to the basket ID, refers to the retrieved device identification information and the address information to identify the identification information of the node device that accommodates each corresponding device, and transmits, to the node device having the identified identification information, a read request or write request to that device for the data corresponding to the content key; and a data receiving unit that receives the result of the read request or write request from the node device.

The invention according to claim 8 is a data read and write control method in a clustered storage system that includes a plurality of node devices that read and write data and that stores the data redundantly in a plurality of devices accommodated in the node devices, wherein the node device, which includes a storage unit that stores arrangement information indicating, for each basket ID, which is identification information obtained from the hash value of a content key that is identification information of the data, the identification information of the plurality of devices that store the data, and address information indicating, for each piece of device identification information, the identification information of the node device that accommodates the device, executes the steps of: accepting input of a write request or read request for the data, the request including the content key of the data to be read or written; calculating the basket ID corresponding to the content key from the hash value of the content key; searching the arrangement information, using the calculated basket ID as a key, for the identification information of the devices corresponding to the basket ID; identifying, by referring to the retrieved device identification information and the address information, the identification information of the node device that accommodates each corresponding device; transmitting, to the node device having the identified identification information, a read request or write request to that device for the data corresponding to the content key; and receiving the result of the read request or write request from the node device.

In this way, a node device of the clustered storage system (hereinafter abbreviated as the system) determines the device to which data is written (or from which it is read) using arrangement information that explicitly states the identification information of the devices used for redundancy. Therefore, by setting this arrangement information, a system administrator or the like can freely set the devices that each node device selects as the data write destination (or read destination).

The invention according to claim 2 is characterized in that, when the node device according to claim 1 accepts, via the input unit, input of setting information for the arrangement information that includes the content key of the data and the identification information of the plurality of devices that are to store the data, the hash calculation unit calculates the basket ID corresponding to the content key from the hash value of the content key included in the setting information, and the node device includes an arrangement information management unit that creates the arrangement information in which the calculated basket ID is associated with the identification information of the plurality of devices included in the setting information.

In this way, a system administrator or the like can freely set the devices that each node device selects as the data write destination (or read destination).

The invention according to claim 3 is characterized in that, in the node device according to claim 2, when input of change information that includes the content key of the data to be changed and the identification information of the device that is to be the new storage destination of the data is accepted via the input unit in order to change the device that stores the data, the arrangement information management unit searches the arrangement information, using as a key the basket ID that the hash calculation unit calculated for the content key of the data to be changed, for the identification information of the devices corresponding to that basket ID, and changes the retrieved device identification information in the arrangement information to the identification information of the device that is to be the new storage destination, and the routing unit refers to the changed arrangement information when searching for the identification information of the devices.

The invention according to claim 9 is characterized in that, in the data read and write control method according to claim 8, when the node device accepts input of change information that includes the content key of the data to be changed and the identification information of the device that is to be the new storage destination of the data in order to change the device that stores the data, the node device executes the steps of: calculating the basket ID corresponding to the content key of the data to be changed; searching the arrangement information, using the calculated basket ID as a key, for the identification information of the devices corresponding to that basket ID; and changing the retrieved device identification information in the arrangement information to the identification information of the device that is to be the new storage destination, and the node device refers to the changed arrangement information when searching for the identification information of the devices.

In this way, for example, when an administrator or the like migrates data stored in a device of a node with a high processing load to a device of a node with a relatively low processing load in order to eliminate a processing load imbalance in the system, setting the migration destination device in the arrangement information as the storage destination of the data allows the node devices to execute read and write processing on the migration destination device after the data migration.

The invention according to claim 4 is characterized in that the node device according to any one of claims 1 to 3 includes a data transmission unit that, when a read request or write request for the data is received from another node device, executes the read or write processing on the data stored in a device of its own node device and transmits the execution result to the node device that sent the read request or write request.

In this way, even when a read request or write request for data is received from another node device, the node device can execute the processing for that request and return the execution result to the requesting node device.

The invention according to claim 5 is characterized in that the arrangement information management unit of the node device according to any one of claims 2 to 4 transmits the changed arrangement information to the other node devices in the clustered storage system when the arrangement information in the storage unit is changed, and stores changed arrangement information in the storage unit when it is received from another node device, and the routing unit refers to the changed arrangement information when searching for the identification information of the devices.

In this way, when the arrangement information is changed in any node device of the system, every node device can determine the data read destination or write destination based on the changed arrangement information.
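The exchange of changed arrangement information between node devices might be sketched as follows. This is a hedged illustration only: the `Node` class, its method names, and the use of direct in-process calls in place of network messages are all assumptions, and the source does not say whether the whole table or only the changed entry is sent.

```python
class Node:
    """Hypothetical stand-in for a node device 100 holding arrangement information."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.arrangement = {}   # basket ID -> list of device IDs
        self.peers = []         # other node devices in the same system

    def change_arrangement(self, basket_id, device_ids):
        """Change the local table, then send the changed table to every peer."""
        self.arrangement[basket_id] = list(device_ids)
        for peer in self.peers:
            peer.receive_arrangement(self.arrangement)

    def receive_arrangement(self, arrangement):
        """Store a copy of the table received from another node device."""
        self.arrangement = {b: list(d) for b, d in arrangement.items()}

node_a, node_b, node_c = Node("100A"), Node("100B"), Node("100C")
node_a.peers = [node_b, node_c]
node_a.change_arrangement(1, ["110C", "110E", "110G"])
print(node_b.arrangement)
```

After the call, all three nodes resolve basket ID 1 to the same device IDs, which is the property the paragraph above relies on.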

The invention according to claim 6 is a clustered storage system comprising a plurality of the node devices according to any one of claims 1 to 5.

In this way, a clustered storage system can be realized with the node devices according to any one of claims 1 to 5.

The invention according to claim 7 is characterized in that the clustered storage system according to claim 6 further includes a monitoring device that monitors the processing load of each of the node devices in the clustered storage system, and, when the monitoring device detects a processing load imbalance between the node devices, the node device changes, in the arrangement information, for a basket ID corresponding to the identification information of a device accommodated in a node device with a relatively high processing load, the device identification information corresponding to that basket ID to the identification information of a device accommodated in a node device with a relatively low processing load.

In this way, when a processing load imbalance arises between the node devices of the system, the node device can change the arrangement information automatically. This reduces the effort of changing the arrangement information when such an imbalance arises.
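One way such an automatic change could look is sketched below. The imbalance test (a fixed ratio between the busiest and the least busy node), the function name `rebalance`, and the choice to move only one replica per call are assumptions made for illustration; the source does not specify how the imbalance is detected, and migrating the stored data itself is outside this sketch.

```python
def rebalance(arrangement, address_info, node_load, threshold=1.5):
    """Repoint one replica from the busiest node to the least busy node.

    arrangement:  basket ID -> list of device IDs (modified in place)
    address_info: device ID -> node ID
    node_load:    node ID -> measured load (as reported by a monitoring device)
    Returns (basket_id, old_device, new_device), or None if nothing was changed.
    """
    busiest = max(node_load, key=node_load.get)
    idlest = min(node_load, key=node_load.get)
    # Assumed imbalance criterion: the busiest node exceeds threshold x the idlest node.
    if node_load[busiest] < threshold * node_load[idlest]:
        return None
    idle_devices = [d for d, n in address_info.items() if n == idlest]
    for basket_id, devices in arrangement.items():
        for i, device in enumerate(devices):
            if address_info[device] != busiest:
                continue
            # Do not place two replicas of the same basket on one device.
            candidates = [d for d in idle_devices if d not in devices]
            if candidates:
                devices[i] = candidates[0]
                return basket_id, device, candidates[0]
    return None

arrangement = {1: ["110C", "110E", "110G"], 2: ["110A", "110C", "110E"]}
address_info = {"110A": "100A", "110B": "100A", "110C": "100B",
                "110E": "100C", "110G": "100D"}
change = rebalance(arrangement, address_info,
                   {"100A": 10, "100B": 90, "100C": 50, "100D": 40})
print(change, arrangement[1])
```

In this hypothetical run, node 100B is the busiest and 100A the least busy, so the replica of basket 1 on device 110C is repointed to device 110A.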

The invention according to claim 10 is a program for causing the node device, which is a computer, to execute the data read and write control method according to claim 8 or claim 9.

With such a program, a general-purpose computer can execute the data read and write control method according to claim 8 or claim 9.

According to the present invention, in a clustered storage system (system) that stores content redundantly in a plurality of devices, an administrator or the like can arbitrarily set the combination of devices that store the content. Therefore, when a load imbalance occurs between devices, it becomes easier to change the devices that store the content in order to reduce that imbalance. In addition, because the system administrator or the like can set the combination of devices that store the content to suit the operating environment of the system, the availability of the system can be improved.

The best mode for carrying out the present invention (hereinafter referred to as the embodiment) is described below.

≪Overview≫
First, an overview of the clustered storage system of this embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing a configuration example of the clustered storage system of this embodiment.

As shown in FIG. 1, the clustered storage system (hereinafter abbreviated as the system) includes a plurality of node devices 100 (100A, 100B, 100C, 100D). The following description uses an example in which content corresponding to the same content key (content identification information) is stored redundantly in three devices 110 (110C, 110E, 110G). These three devices 110 (110C, 110E, 110G) are assumed to be devices 110 in node devices 100B, 100C, and 100D, respectively.

The client device 20 is a terminal device equipped with an application for accessing the system. When a node device 100 accepts access from the client device 20, it reads content from, or writes content to, the relevant node devices 100 based on the data (content) read request or write request from the client device 20.

In such a clustered storage system, each node device 100 holds arrangement information 131, which indicates the storage destinations of content, and address information 132, which indicates identification information, such as the IP address, of the node device 100 that accommodates each device 110. When the arrangement information 131 is updated, the node device 100 transmits the updated arrangement information 131 to each of the other node devices 100. Each node device 100 then determines the device 110 to read the content from or write it to based on the updated arrangement information 131.

For example, when node device 100A receives from the client device 20 a read/write request (a content read request or write request) specifying a content key (K), it applies the hash function H to the content key (K) to calculate a basket ID (for example, "1"). Node device 100A then searches the arrangement information 131, using the calculated basket ID (for example, "1") as a key, for the device IDs corresponding to that basket ID (for example, "110C, 110E, 110G"). Node device 100A also refers to the address information 132 to identify the node ID (such as the IP address of the node device 100) of each node device 100 that accommodates a device 110 corresponding to a retrieved device ID. It then transmits the read/write request for the content to each of the devices 110 of the identified node devices 100. When node device 100A receives the execution results of the read/write request from node devices 100B, 100C, and 100D, it returns the result to the client device 20.
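The lookup sequence just described (content key, then basket ID, then device IDs, then node IDs) can be sketched in Python. Everything concrete here is an assumption for illustration: the source does not name the hash function, so SHA-256 reduced modulo a hypothetical `NUM_BASKETS` is used; only the mapping of devices 110C, 110E, and 110G to nodes 100B, 100C, and 100D comes from the text, and the other table entries are invented.

```python
import hashlib

NUM_BASKETS = 4  # hypothetical number of baskets

def basket_id(content_key, num_baskets=NUM_BASKETS):
    """Map a content key to a basket ID (assumed: SHA-256 modulo the basket count)."""
    digest = hashlib.sha256(content_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_baskets

# Arrangement information 131: basket ID -> device IDs (any combination is allowed).
arrangement_info = {
    0: ["110A", "110D", "110F"],
    1: ["110C", "110E", "110G"],
    2: ["110B", "110E", "110H"],
    3: ["110A", "110C", "110H"],
}

# Address information 132: device ID -> ID of the node device that accommodates it.
address_info = {
    "110A": "100A", "110B": "100A", "110C": "100B", "110D": "100B",
    "110E": "100C", "110F": "100C", "110G": "100D", "110H": "100D",
}

def route(content_key, arrangement, addresses):
    """Return the (node ID, device ID) pairs a read/write request should be sent to."""
    devices = arrangement[basket_id(content_key)]
    return [(addresses[device], device) for device in devices]

targets = route("video-0001", arrangement_info, address_info)
for node, device in targets:
    print(f"send request for 'video-0001' to node {node}, device {device}")
```

Because the device lists are plain table entries rather than positions on a ring, changing where a basket is stored only requires editing `arrangement_info`.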

Here, the arrangement information 131 held by each node device 100 indicates, for each basket ID, one or more device IDs (identification information of devices 110) corresponding to that basket ID, as illustrated in FIG. 1. The basket ID is the value H(K) obtained by applying the hash function H to the content key (K). A hash function is used to determine the basket ID so that the load is not unevenly distributed among the baskets in the system.

The group of device IDs corresponding to a basket ID in the arrangement information 131 of this embodiment can be set by an instruction input via the input/output unit 11. In a conventional clustered storage system, when content is stored redundantly in a plurality of devices, the only possible combination is devices adjacent in the DHT space. With the node device 100 of this system, however, an administrator or the like can arbitrarily set the combination of devices in which content is stored redundantly. This makes it easier to migrate data between devices in the system.

≪Configuration≫
Next, the configuration of the node device 100 is described, again with reference to FIG. 1. As shown in FIG. 1, the node device 100 includes one or more devices 110 and a content management unit 10. A device 110 stores content for each content key (K) and basket ID described above and is realized by, for example, an HDD (Hard Disk Drive). When the content management unit 10 receives a content read request or write request (hereinafter abbreviated as a "read/write request" where appropriate) from the client device 20, it refers to the arrangement information 131 and searches for the devices 110 in which the content is stored. It then executes read/write processing (read processing or write processing) of the content on the retrieved devices 110. The content management unit 10 also creates or updates the arrangement information 131 based on setting instructions from the client device 20 or the like, and exchanges the arrangement information 131 with other node devices 100.

The content management unit 10 includes an input/output unit 11, a processing unit 12, and a storage unit 13.

The input/output unit 11 consists of a communication interface for communicating with other node devices 100 and the client device 20. The processing unit 12 is realized by program execution on the CPU (Central Processing Unit) of the node device 100. The storage unit 13 consists of storage media such as RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), and flash memory. The storage unit 13 stores the program that realizes the functions of the node device 100.

The input/output unit (input unit and output unit) 11 accepts input of content read/write requests that include a content key and input of setting instructions for the arrangement information 131. It also exchanges content and the results of read/write processing with other node devices 100.

The processing unit 12 executes read/write processing on the devices 110 of the node device 100 and creates the arrangement information 131. The processing unit 12 includes a hash calculation unit 121, an arrangement information management unit 122, an address information management unit 123, a routing unit 124, a data reception unit 125, and a data transmission unit 126.

The hash calculation unit 121 applies the hash function H to the content key (K) to calculate the hash value H(K).

The arrangement information management unit 122 creates new arrangement information 131, or changes the contents of existing arrangement information 131, based on setting input from the input/output unit 11. To create new arrangement information 131, the arrangement information management unit 122 accepts a setting input containing the content key of a content and the device IDs of the plurality of devices 110 that are to store that content redundantly. The hash calculation unit 121 then calculates a basket ID from the hash value of the content key, and the arrangement information management unit 122 creates the arrangement information 131 by associating this basket ID with the device IDs contained in the setting input. To change existing arrangement information 131, the arrangement information management unit 122 first accepts change information containing the content key of the content to be changed and the device ID of the device 110 that is to become the new storage destination of that content, and the hash calculation unit 121 calculates the basket ID corresponding to the content key. Using this basket ID as a key, the arrangement information management unit 122 searches the arrangement information 131 for the corresponding device ID and rewrites it to the device ID of the device 110 that is the new storage destination. The arrangement information management unit 122 also transmits the arrangement information 131 created on its own node device 100 to other node devices 100 and receives arrangement information 131 from other node devices 100. That is, the node devices 100 exchange the arrangement information 131 with one another. Arrangement information 131 that has been created or updated, or received from another node device 100, is stored in the storage unit 13.
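The create and change operations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the arrangement information management unit 122: the class name `ArrangementInfo`, its method names, and the sample content key are hypothetical, `zlib.crc32` merely stands in for the hash function H, the hash value is used as the basket ID as is, and a change is assumed to replace the whole device list of a basket.

```python
import zlib
from typing import Callable, Dict, List


class ArrangementInfo:
    """Hypothetical in-memory table: basket ID -> device IDs of the replicas."""

    def __init__(self, hash_fn: Callable[[bytes], int] = zlib.crc32) -> None:
        # Assumption: CRC32 stands in for the hash function H.
        self._hash_fn = hash_fn
        self.table: Dict[int, List[str]] = {}

    def basket_id(self, content_key: str) -> int:
        # Assumption: the hash value H(K) is used as the basket ID as is.
        return self._hash_fn(content_key.encode("utf-8"))

    def create(self, content_key: str, device_ids: List[str]) -> int:
        """Associate the basket ID of content_key with the given devices."""
        bid = self.basket_id(content_key)
        self.table[bid] = list(device_ids)
        return bid

    def change(self, content_key: str, new_device_ids: List[str]) -> int:
        """Rewrite the devices registered for the basket of content_key."""
        bid = self.basket_id(content_key)
        if bid not in self.table:
            raise KeyError(f"no entry for basket ID {bid}")
        # Assumption: the whole device list of the basket is replaced.
        self.table[bid] = list(new_device_ids)
        return bid


info = ArrangementInfo()
bid = info.create("movie-001", ["110C", "110E", "110G"])
info.change("movie-001", ["110D", "110F", "110H"])
```

The sketch covers only the table itself. Exchanging the table with other node devices 100 and persisting it in the storage unit 13 are left out.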

The address information management unit 123 updates the contents of the address information 132 based on input from the input/output unit 11.

When the routing unit 124 receives, from the client device 20 via the input/output unit 11, a content read or write request containing the content key (K) of the content to be processed, it has the hash calculation unit 121 calculate the hash value H(K) of that content key. The basket ID corresponding to the content key (K) is then obtained from the hash value H(K). The description here takes as an example the case where the hash value H(K) is used as the basket ID as is.

The routing unit 124 searches the arrangement information 131 for the device IDs corresponding to this basket ID. It then searches the address information 132 for the node ID of the node device 100 that accommodates the device 110 corresponding to each device ID, and transmits, to the node device 100 with that node ID, a read or write request for the content stored in the device 110 corresponding to the device ID. When the arrangement information 131 in the storage unit 13 has been changed, the routing unit 124 searches for the device IDs by referring to the changed arrangement information 131.
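The two lookups performed by the routing unit 124 amount to chaining two tables, as the following Python sketch suggests. It is illustrative only: the function name `route` and the dictionary layout are hypothetical, the basket entries reuse the device IDs named for FIG. 1, and the pairing of devices 110C, 110E, 110G with nodes 100B, 100C, 100D is assumed from the order in which the description of FIG. 2 lists them.

```python
from typing import Dict, List, Tuple

# Arrangement information: basket ID -> device IDs (values from the FIG. 1 example).
ARRANGEMENT: Dict[int, List[str]] = {
    1: ["110C", "110E", "110G"],
    2: ["110D", "110F", "110H"],
}

# Address information: device ID -> node ID.
# Assumption: devices and nodes are paired in the order they are listed.
ADDRESS: Dict[str, str] = {"110C": "100B", "110E": "100C", "110G": "100D"}


def route(basket_id: int,
          arrangement: Dict[int, List[str]],
          address: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (node ID, device ID) pairs that should receive the request."""
    return [(address[dev], dev) for dev in arrangement[basket_id]]


targets = route(1, ARRANGEMENT, ADDRESS)
```

Because the function reads the tables on every call, a rewritten arrangement table takes effect on the next request without any change to the routing logic.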

The read request transmitted by the routing unit 124 contains the device ID of the device 110 from which the content is to be read, the basket ID, the content key (K), and so on. The write request contains, in addition to the device ID of the device 110 to which the content is to be written, the basket ID, and the content key (K), the data to be written and a timestamp giving the time at which the write request was received from the client device 20. The timestamp is, for example, "20070501100015" (10:00:15 on May 1, 2007).
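As a rough illustration of these two message types, the following Python sketch models them as data classes and formats the timestamp in the 14-digit form of the example. The class and field names are hypothetical. The description lists the fields but does not define a wire format.

```python
from dataclasses import dataclass
from datetime import datetime


def format_timestamp(received_at: datetime) -> str:
    """Format a reception time as YYYYMMDDhhmmss, as in the example."""
    return received_at.strftime("%Y%m%d%H%M%S")


@dataclass(frozen=True)
class ReadRequest:
    device_id: str    # device 110 to read from
    basket_id: int
    content_key: str


@dataclass(frozen=True)
class WriteRequest:
    device_id: str    # device 110 to write to
    basket_id: int
    content_key: str
    data: bytes       # content to be written
    timestamp: str    # time the write request arrived from the client


req = WriteRequest(
    device_id="110C",
    basket_id=1,
    content_key="movie-001",  # hypothetical content key
    data=b"payload",
    timestamp=format_timestamp(datetime(2007, 5, 1, 10, 0, 15)),
)
```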

The data receiving unit 125 receives responses to content read or write requests from other node devices 100. Each received response is held in the RAM or the like of the storage unit 13.

When the data transmission unit 126 receives a content read or write request from another node device 100, it executes the read or write processing on that content in the device 110 and returns the processing result to the node device 100 that sent the request.

The protocol used for transmitting these content read and write requests and their results (including response results, content, and the like) is, for example, HTTP (HyperText Transfer Protocol) or a data communication protocol equivalent to HTTP.

The storage unit 13 stores the arrangement information 131 and the address information 132.

As described above, the arrangement information 131 indicates, for each basket ID, the device IDs of the devices 110 that store the content corresponding to that basket ID. For example, in the system illustrated in FIG. 1, if the content corresponding to basket ID "1" is to be stored redundantly in the devices 110C, 110E, and 110G, and the content corresponding to basket ID "2" is to be stored redundantly in the devices 110D, 110F, and 110H, then "110C, 110E, 110G" is set as the device IDs for basket ID "1" in the arrangement information 131, and "110D, 110F, 110H" is set as the device IDs for basket ID "2".

The address information 132 indicates, for each device ID, the node ID of the node device 100 that accommodates the device 110 corresponding to that device ID. The node ID may be written as a combination of an IP address and a port number.
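One possible shape for address information whose node IDs are IP address and port pairs is sketched below in Python. Everything concrete here is an assumption made for illustration: the `ip:port` string notation, the `NodeId` type, the helper `parse_node_id`, and the sample addresses (taken from the 192.0.2.0/24 documentation range).

```python
from typing import Dict, NamedTuple


class NodeId(NamedTuple):
    ip: str
    port: int


def parse_node_id(text: str) -> NodeId:
    """Split a hypothetical 'ip:port' string into its two parts."""
    ip, _, port = text.rpartition(":")
    return NodeId(ip, int(port))


# Address information: device ID -> node ID (sample values only).
ADDRESS_INFO: Dict[str, NodeId] = {
    "110C": parse_node_id("192.0.2.11:8080"),
    "110E": parse_node_id("192.0.2.12:8080"),
    "110G": parse_node_id("192.0.2.13:8080"),
}
```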

Although the content management unit 10 and the device 110 have been described as being contained in the same node device 100 in FIG. 1, they may instead be implemented as separate devices.

≪Processing procedure≫
Next, the processing procedure of the node device 100 is described using FIG. 2, with reference to FIG. 1. FIG. 2 shows the procedure by which the node device of FIG. 1 transmits a content read or write request to other node devices. The example described here is one in which the node device 100A, having accepted a content read or write request from the client device 20, transmits the read or write request to the devices 110C, 110E, and 110G that store the content.

First, the routing unit 124 (see FIG. 1) of the node device 100A receives a content read request or write request from the client device 20 (S101). A read request contains the content key (K) of the content to be read. A write request contains the content key (K) of the content to be written and the data to be written.

The hash calculation unit 121 (see FIG. 1) of the node device 100A calculates the basket ID from the received content key (K) (S102). The routing unit 124 then looks up this basket ID in the arrangement information 131 to find the device IDs of the devices 110C, 110E, and 110G that are the write destinations of the content (S103). Next, using the retrieved device IDs as keys, the routing unit 124 searches the address information 132 for the addresses (node IDs) of the node devices 100 (node devices 100B, 100C, and 100D) that accommodate the corresponding devices 110 (devices 110C, 110E, and 110G) (S104).

Based on the retrieved addresses, the routing unit 124 then transmits the content read request or write request to the node devices 100B, 100C, and 100D that accommodate the devices 110C, 110E, and 110G, which are the write destinations of the content (S105). As described above, the read request transmitted here contains the device ID, the basket ID, and the content key (K). The write request contains the device ID, the basket ID, the content key (K), and the data to be written, together with a timestamp indicating the date and time at which the write request was accepted.

Next, using FIG. 3 with reference to FIG. 1, the processing procedure is described by which the node device 100A, having transmitted a content read or write request by the procedure of FIG. 2, returns a response to that request to the client device 20. FIG. 3(a) shows the procedure by which the node device of FIG. 1 transmits a response to a content read or write request to the client device, and FIG. 3(b) conceptually shows the processing of S202 in FIG. 3(a).

The node devices 100B, 100C, and 100D that received the content read request or write request from the node device 100A in FIG. 2 each use their data transmission unit 126 (see FIG. 1) to transmit a response to that request to the node device 100A (S201).

The data receiving unit 125 of the node device 100A receives the responses to the read request or write request from the node devices 100B, 100C, and 100D. When the routing unit 124 determines that the number of responses received by the data receiving unit 125 satisfies a predetermined quorum (S202), it returns the received responses to the client device 20. As shown in FIG. 3(a), the response to a read request and the response to a write request each indicate the result (success or failure) of the processing performed for that request. When read processing succeeds, the response contains the content that was read.

In this way, even when a large number of node devices 100 store the content, for example, a response can be returned to the client device 20 without waiting for responses from all of those node devices 100.

The quorum determination in S202 is now described with a concrete example using FIG. 3(b). The example assumes that the read or write request was sent to four node devices 100 and that the quorum is 2.

For a write request, for example, the node device 100 transmits the write request to the other node devices 100 and, as soon as the number of OK (success) results received for the request reaches 2, returns OK to the client device 20 (see Examples 1 and 2). In other words, NG results among the received responses are not counted, and the node device 100 returns OK to the client device 20 at the point when the number of OK responses received reaches 2 (see Example 2). It then transmits to the client device 20 the responses to the write request received up to that point. Any response that the node device 100 receives after the number of OK responses has reached 2 is ignored. The quorum for write requests may equal the number of redundant devices 110 or may be smaller than that number.
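The write-side counting rule can be sketched in Python as follows. This is only an illustration of the rule as described: the function name, the string encoding of results as "OK" and "NG", and the sample arrival orders are hypothetical and are not taken from FIG. 3(b).

```python
from typing import Iterable, Optional


def write_quorum_reached(results: Iterable[str], quorum: int) -> Optional[int]:
    """Scan results in arrival order and count only OK results.

    Returns how many responses had arrived when the quorum was met,
    or None if the quorum was never met. Later responses are ignored.
    """
    ok_count = 0
    for arrived, result in enumerate(results, start=1):
        if result != "OK":
            continue  # NG results are not counted
        ok_count += 1
        if ok_count == quorum:
            return arrived
    return None


# Four replicas, quorum of 2 (hypothetical arrival orders).
all_ok = write_quorum_reached(["OK", "OK", "OK", "OK"], quorum=2)
one_ng = write_quorum_reached(["NG", "OK", "OK", "OK"], quorum=2)
too_few = write_quorum_reached(["NG", "NG", "NG", "OK"], quorum=2)
```

With all results OK the client can be answered after the second response. A single NG delays the answer by one response, and three NG results out of four mean the quorum of 2 is never met.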

For a read request, the node device 100 transmits the read request to the other node devices 100 and returns OK to the client device 20 at the point when the number of received contents carrying the latest timestamp (for example, "Time=3") reaches 2 (see Examples 3 and 4). That is, a response is not counted if (1) its result is NG, or (2) it contains content whose timestamp is not the latest (for example, "Time=2"; see Example 4). The node device 100 then transmits the content received up to that point to the client device 20. Responses that the node device 100 receives after the number of OK responses received has reached 2 are not returned to the client device 20. If content reception ends in the node device 100 before the number of latest contents received reaches 2, the content that was received may still be returned to the client device 20.
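A Python sketch of the read-side rule follows. The description does not say how the node device decides which timestamp is "the latest", so this sketch assumes it is the newest timestamp seen so far among OK responses and restarts the count whenever a newer one arrives. The function name, the tuple layout, and the sample responses are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Response = Tuple[str, int, Optional[bytes]]  # (result, timestamp, content)


def read_with_quorum(responses: Iterable[Response], quorum: int) -> Optional[bytes]:
    """Return the content once `quorum` OK responses share the newest timestamp."""
    latest_ts: Optional[int] = None
    latest_content: Optional[bytes] = None
    count = 0
    for result, ts, content in responses:
        if result != "OK":
            continue  # (1) NG responses are not counted
        if latest_ts is None or ts > latest_ts:
            # Assumption: a newer timestamp restarts the count.
            latest_ts, latest_content, count = ts, content, 1
        elif ts == latest_ts:
            count += 1
        # (2) responses with an older timestamp fall through uncounted
        if count >= quorum:
            return latest_content
    return None  # quorum not met


stale_mixed = read_with_quorum(
    [("OK", 3, b"v3"), ("OK", 2, b"v2"), ("OK", 3, b"v3")], quorum=2)
with_failure = read_with_quorum(
    [("NG", 0, None), ("OK", 3, b"v3"), ("OK", 3, b"v3")], quorum=2)
not_enough = read_with_quorum([("OK", 2, b"v2"), ("OK", 3, b"v3")], quorum=2)
```

The sketch returns `None` when reception ends before the quorum is met. Returning whatever content was received instead, as the description permits, would be a small variation.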

≪Specific example≫
Next, rewriting of the arrangement information 131 when load becomes concentrated on some node devices 100 of the cluster type storage system is described with a specific example using FIG. 4. FIG. 4(a) illustrates the devices accommodated by each node device and the relative load on each node device in the cluster type storage system of the present embodiment. FIG. 4(b) illustrates the arrangement information before and after rewriting.

The example assumes that, in the cluster type storage system, data centers A, B, and C each have node devices 100 (100A to 100L) that accommodate devices as shown in FIG. 4. The contents of the arrangement information 131 of each node device 100 at this time are as shown in the arrangement information 131A of FIG. 4(b). The number written to the left of each node device 100 indicates the relative load on that node device 100 in the cluster type storage system. Suppose that in this system the load on the node devices 100C, 100G, and 100K has become comparatively high (relative load "2") and the load on the node devices 100D, 100H, and 100L has become comparatively low (relative load "0.67").

In this state, the load imbalance between the node devices 100 can be eliminated by reducing the load on the node devices 100C, 100G, and 100K in FIG. 4(a) to 1/2 of its previous value and raising the load on the node devices 100D, 100H, and 100L to 3/2 of its previous value. To do so, for example, the write destinations of the content corresponding to basket IDs "12" and "13" are changed to the devices 041, 141, and 241 and to the devices 042, 142, and 242, respectively. Following this change, the arrangement information management unit 122 creates arrangement information 131B based on an instruction input by an administrator or the like. The arrangement information 131B is the arrangement information 131A with the write destinations of the content for basket IDs "12" and "13" rewritten from the node devices 100C, 100G, and 100K to the node devices 100D, 100H, and 100L. The rewritten arrangement information 131B then propagates to every node device 100, and read and write processing for content corresponding to those basket IDs is thereafter performed on the node devices 100 indicated in the arrangement information 131B.
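The arithmetic behind the 1/2 and 3/2 factors can be checked with a few lines of Python. The relative loads and the new device IDs for basket IDs 12 and 13 come from the example. The dictionary names are hypothetical, and the remaining entries of the arrangement information 131A are omitted because they appear only in FIG. 4(b).

```python
# Relative loads before the rewrite (from the example).
load_before = {
    "100C": 2.0, "100G": 2.0, "100K": 2.0,
    "100D": 0.67, "100H": 0.67, "100L": 0.67,
}

# Target scaling: heavily loaded nodes to 1/2, lightly loaded nodes to 3/2.
scale = {
    "100C": 0.5, "100G": 0.5, "100K": 0.5,
    "100D": 1.5, "100H": 1.5, "100L": 1.5,
}

load_after = {node: load_before[node] * scale[node] for node in load_before}

# The corresponding rewrite of the arrangement information (131A -> 131B),
# limited to the two baskets named in the example.
arrangement = {}
arrangement[12] = ["041", "141", "241"]
arrangement[13] = ["042", "142", "242"]
```

Both groups end up at a relative load of about 1 (2 x 1/2 = 1 and 0.67 x 3/2 = 1.005), which is the sense in which the imbalance is eliminated.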

In this way, when load becomes concentrated on a particular node device 100 (or device 110) in a system that stores content redundantly across the node devices 100, the arrangement information 131 can be rewritten so as to relieve that concentration.

Apart from such cases, an administrator or the like can also rewrite the device IDs corresponding to each basket (basket ID) in the arrangement information 131. The combination of devices 110 that store content redundantly can therefore be chosen freely on the DHT space to suit the operating environment of the system.

A monitoring device that monitors the processing load on each node device 100 may also be provided in the system. When the monitoring device detects an imbalance in processing load between the node devices 100, it transmits the processing load values to the node devices 100. Referring to these values, a node device 100 may then reduce, in the arrangement information 131, the number of baskets assigned to node devices 100 whose processing load is comparatively high and increase the number of baskets assigned to node devices 100 whose processing load is comparatively low. For example, for a basket ID in the arrangement information 131 that corresponds to a device 110 of a node device 100 with a comparatively high processing load, the node device 100 may change the device 110 corresponding to that basket ID to a device 110 of a node device 100 with a comparatively low processing load.
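One way such a load-driven rewrite could look is sketched below in Python. The description leaves the selection policy open, so this sketch makes a simple assumption: it moves a single replica of one basket from the most loaded node to a device on the least loaded node. The function name, the device and node names, and the load values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rebalance_one(arrangement: Dict[int, List[str]],
                  address: Dict[str, str],
                  load: Dict[str, float]) -> Optional[Tuple[int, str, str]]:
    """Move one basket replica from the hottest node to the coldest node.

    Edits `arrangement` in place and returns (basket ID, old device,
    new device), or None if no suitable move exists.
    """
    hot = max(load, key=load.get)
    cold = min(load, key=load.get)
    if hot == cold:
        return None
    cold_devices = [dev for dev, node in address.items() if node == cold]
    for bid, devices in arrangement.items():
        for i, dev in enumerate(devices):
            if address[dev] != hot:
                continue
            for candidate in cold_devices:
                if candidate not in devices:  # keep replicas on distinct devices
                    devices[i] = candidate
                    return bid, dev, candidate
    return None


arrangement = {12: ["d-hot-1", "d-mid-1"], 13: ["d-hot-2", "d-mid-1"]}
address = {"d-hot-1": "node-hot", "d-hot-2": "node-hot",
           "d-mid-1": "node-mid", "d-cold-1": "node-cold"}
load = {"node-hot": 2.0, "node-mid": 1.0, "node-cold": 0.67}
moved = rebalance_one(arrangement, address, load)
```

The sketch only edits the table. Copying the affected content to the new device and propagating the updated table to the other node devices are outside its scope.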

The basket ID used in the arrangement information 131 has been described as the hash value H(K) of the content key (K), but it is not limited to this. For example, a hash value H'(K) may be obtained with a hash function H' that yields far more distinct hash values than the number of baskets the administrator or the like wishes to prepare, and a masked value of that hash may be used as the basket ID. Thus, if 2^16 baskets are needed, the hash value of the content key (K) may be obtained with the hash function CRC32, which yields 2^32 distinct hash values, and the value of its lower 16 bits obtained by masking may be used as the basket ID.
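The CRC32 variant can be written in a few lines of Python using the standard `zlib.crc32` function. The function name `masked_basket_id`, the `bits` parameter, and the sample key are hypothetical. Only the choice of CRC32 and the lower 16 bit mask come from the text.

```python
import zlib


def masked_basket_id(content_key: bytes, bits: int = 16) -> int:
    """Basket ID = lower `bits` bits of the CRC32 of the content key."""
    mask = (1 << bits) - 1  # 0xFFFF for 16 bits
    return zlib.crc32(content_key) & mask


bid = masked_basket_id(b"movie-001")  # hypothetical content key
```

Every key then falls into one of 2^16 baskets, so the arrangement information needs at most 65,536 entries regardless of how many content keys exist.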

Various methods are conceivable for storing content in a device 110. For example, (1) a single database may be placed in the device 110, and the basket ID, the content key (K), and the content may be stored in it in association with one another, or (2) a database may be placed in the device 110 for each basket ID, and the content key (K) and the content may be stored in association with each other in each of those databases. Alternatively, (3) the content may be stored in a file system, with the basket ID and the content key (K) used as directory names and file names.
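Option (1) can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions, not a prescribed schema: the table name `contents`, its column names, and the helper functions are hypothetical, and SQLite is simply a convenient stand-in for "a database in the device".

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a single per-device database holding all baskets."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS contents ("
        " basket_id INTEGER NOT NULL,"
        " content_key TEXT NOT NULL,"
        " content BLOB NOT NULL,"
        " PRIMARY KEY (basket_id, content_key))"
    )
    return db


def put(db: sqlite3.Connection, basket_id: int, content_key: str, content: bytes) -> None:
    """Store the content, overwriting any earlier value for the same key."""
    with db:  # commits on success
        db.execute(
            "INSERT OR REPLACE INTO contents VALUES (?, ?, ?)",
            (basket_id, content_key, content),
        )


def get(db: sqlite3.Connection, basket_id: int, content_key: str) -> Optional[bytes]:
    """Fetch the content, or None if the key is not stored."""
    row = db.execute(
        "SELECT content FROM contents WHERE basket_id = ? AND content_key = ?",
        (basket_id, content_key),
    ).fetchone()
    return None if row is None else row[0]


store = open_store()
put(store, 1, "movie-001", b"payload")
```

Keeping the basket ID as part of the primary key makes it straightforward to select all content of one basket, which is the unit that moves when the arrangement information is rewritten.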

FIG. 1 is a diagram showing a configuration example of the cluster type storage system of the present embodiment.
FIG. 2 is a diagram showing the procedure by which the node device of FIG. 1 transmits a content read or write request to other node devices.
FIG. 3(a) is a diagram showing the procedure by which the node device of FIG. 1 transmits a response to a content read or write request to the client device, and FIG. 3(b) is a diagram conceptually showing the processing of S202 in FIG. 3(a).
FIG. 4(a) is a diagram illustrating the devices accommodated by each node device and the relative load on each node device in the cluster type storage system of the present embodiment, and FIG. 4(b) is a diagram illustrating the arrangement information before and after rewriting.

Explanation of symbols

10 Content management unit
11 Input/output unit
12 Processing unit
13 Storage unit
20 Client device
100 (100A, 100B, 100C, 100D) Node device
110 (110C, 110D, 110E, 110F, 110G, 110H) Device
121 Hash calculation unit
122 Arrangement information management unit
123 Address information management unit
124 Routing unit
125 Data receiving unit
126 Data transmission unit
131 (131A, 131B) Arrangement information
132 Address information

Claims

A node device in a cluster type storage system that comprises a plurality of node devices for reading and writing data and that stores the data redundantly in a plurality of devices accommodated in the node devices, the node device comprising:
a storage unit that stores arrangement information indicating, for each basket ID, which is identification information obtained based on a hash value of a content key that is identification information of the data, identification information of the plurality of devices that store the data, and address information indicating, for each piece of identification information of a device, identification information of the node device that accommodates the device;
an input unit that receives a read request or write request for the data, the request including the content key of the data to be read or written;
a hash calculation unit that calculates the basket ID corresponding to the content key based on the hash value of the content key;
a routing unit that, using the calculated basket ID as a key, retrieves the identification information of the devices corresponding to the basket ID from the arrangement information, identifies, by referring to the retrieved identification information of the devices and the address information, the identification information of the node device that accommodates each device corresponding to the retrieved identification information, and transmits, to the node device having the identified identification information, a read request or write request directed to the device for the data corresponding to the content key; and
a data receiving unit that receives a result of the data read request or write request from the node device.

The node device according to claim 1, wherein,
when setting information for the arrangement information, including the content key of the data and the identification information of the plurality of devices serving as storage destinations of the data, is received via the input unit,
the hash calculation unit calculates the basket ID corresponding to the content key based on a hash value of the content key included in the setting information, and
the node device further comprises an arrangement information management unit that creates the arrangement information in which the calculated basket ID is associated with the identification information of the plurality of devices included in the setting information.

The node device according to claim 2, wherein,
when change information including the content key of the data to be changed and the identification information of the device to be the new storage destination of the data is received via the input unit in order to change the device serving as the storage destination of the data,
the arrangement information management unit, using as a key the basket ID calculated by the hash calculation unit for the content key of the data to be changed, retrieves the identification information of the device corresponding to the basket ID from the arrangement information and changes the retrieved identification information of the device in the arrangement information to the identification information of the device serving as the new storage destination, and
the routing unit, when retrieving the identification information of the device, retrieves it by referring to the changed arrangement information.

The node device according to claim 1, further comprising a data transmission unit that, when a read request or write request for the data is received from another node device, executes the read or write processing on the data stored in the device accommodated in its own node device and transmits the execution result to the node device that is the transmission source of the read request or write request.

The node device according to any one of claims 2 to 4, wherein
the arrangement information management unit, when the arrangement information in the storage unit has been changed, transmits the changed arrangement information to the other node devices in the cluster type storage system and, when changed arrangement information is received from another node device, stores the changed arrangement information in the storage unit, and
the routing unit, when retrieving the identification information of the device, retrieves it by referring to the arrangement information stored in the storage unit, including the changed arrangement information.

A cluster type storage system comprising a plurality of node devices according to any one of claims 1 to 5.

The cluster type storage system according to claim 6, further comprising a monitoring device that monitors a processing load on each of the node devices in the cluster type storage system, wherein,
when the monitoring device detects an imbalance in processing load between the node devices,
the node device changes, for a basket ID in the arrangement information that corresponds to the identification information of a device accommodated in a node device whose processing load is relatively high, the identification information of the device corresponding to that basket ID to the identification information of a device accommodated in a node device whose processing load is relatively low.

A data read and write control method in a cluster type storage system that comprises a plurality of node devices for reading and writing data and that stores the data redundantly in a plurality of devices accommodated in the node devices, wherein the node device, which includes a storage unit that stores arrangement information indicating, for each basket ID, which is identification information obtained based on a hash value of a content key that is identification information of the data, identification information of the plurality of devices that store the data, and address information indicating, for each piece of identification information of a device, identification information of the node device that accommodates the device, executes:
a step of receiving a read request or write request for the data, the request including the content key of the data to be read or written;
a step of calculating the basket ID corresponding to the content key based on the hash value of the content key;
a step of retrieving the identification information of the devices corresponding to the basket ID from the arrangement information, using the calculated basket ID as a key;
a step of identifying, by referring to the retrieved identification information of the devices and the address information, the identification information of the node device that accommodates each device corresponding to the retrieved identification information;
a step of transmitting, to the node device having the identified identification information, a read request or write request directed to the device for the data corresponding to the content key; and
a step of receiving a result of the data read request or write request from the node device.

The data read and write control method according to claim 8, wherein the node device,
when change information including the content key of the data to be changed and the identification information of the device to be the new storage destination of the data is received in order to change the device serving as the storage destination of the data, executes:
a step of calculating the basket ID corresponding to the content key of the data to be changed;
a step of retrieving the identification information of the device corresponding to the basket ID from the arrangement information, using the calculated basket ID as a key; and
a step of changing the retrieved identification information of the device in the arrangement information to the identification information of the device serving as the new storage destination,
and, when retrieving the identification information of the device, retrieves it by referring to the changed arrangement information.

A program for causing the node device, which is a computer, to execute the data read and write control method according to claim 8 or claim 9.