JP5758449B2

JP5758449B2 - Data rearrangement apparatus, method and program

Info

Publication number: JP5758449B2
Application number: JP2013147463A
Authority: JP
Inventors: 佐藤　孝治; 孝治佐藤; 淑美一柳; 一樹及川; 公洋山本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-07-16
Filing date: 2013-07-16
Publication date: 2015-08-05
Anticipated expiration: 2033-07-16
Also published as: JP2015022327A

Description

The present invention relates to a data relocation (rearrangement) apparatus, method, and program.

In recent years, distributed file systems have come into wide use. A distributed file system consists of many servers connected over a network and together provides a very large storage device. Typically, a distributed file system is built by connecting tens to hundreds of servers, or more, spread across multiple racks.

Known examples of such distributed file systems include the Google File System (hereinafter GFS; see Non-Patent Document 1) and the Hadoop Distributed File System (hereinafter HDFS; see Non-Patent Document 2). These systems divide a file into fixed-length blocks and store the blocks on the servers that make up the distributed file system. To improve availability and fault tolerance, each block is replicated to a predetermined number of servers (the redundancy). Thus, even if one server fails, processing can continue using the copies of the block stored on other servers. In addition, when a block is stored, the destination servers are chosen so as to maximize availability, fault tolerance, and read/write performance.

In a conventional distributed file system, when an external application reads data, each block is read from the server closest, in terms of network topology, to the server on which the external application is running (also called the operation server). This reduces the time and processing needed to transfer blocks over the network.

However, when a distributed file system is operated for a long time, storage usage can become uneven across servers, and the read/write load can become skewed. Data relocation techniques have therefore been proposed that move data from servers storing a large amount of data to servers storing little, leveling storage usage across the whole distributed file system.

Non-Patent Document 1: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System," Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 29-43, October 2003.
Non-Patent Document 2: Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System," Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies, pages 1-10, May 2010.

With conventional data relocation techniques, the relocation process may move a block that was stored on the operation server to another server. As a result, when the external application later reads that block, it must be transferred from a server other than the operation server, which degrades read performance.

The disclosed technique has been made in view of the above, and its object is to provide a data relocation apparatus, method, and program for a distributed file system that can prevent degradation of the read performance of an external application.

To solve the above problem and achieve the object, the data relocation apparatus, method, and program according to the embodiments operate as follows. When file data written by an external application is divided into fixed-length blocks and each block is stored on a predetermined redundancy number of servers, a relocation priority, indicating the priority of the block in data relocation processing, is assigned to each block stored on each server. Data relocation of the blocks is then executed based on these relocation priorities.

The disclosed data relocation apparatus, method, and program have the effect of preventing degradation of the read performance of an external application.

FIG. 1 is a diagram illustrating a configuration example of a data relocation system according to the first embodiment.
FIG. 2 is a diagram illustrating the relationship between file data and fixed-length blocks in the first embodiment.
FIG. 3 is a diagram illustrating a configuration example of the master server in the first embodiment.
FIG. 4 is a diagram illustrating a configuration example of the block location management table in the first embodiment.
FIG. 5 is a diagram illustrating a configuration example of the data server state management table in the first embodiment.
FIG. 6 is a diagram illustrating a configuration example of the relocation source block information table in the first embodiment.
FIG. 7 is a diagram illustrating a configuration example of the relocation destination data server identifier list in the first embodiment.
FIG. 8 is a diagram illustrating a configuration example of a data server in the first embodiment.
FIG. 9 is a diagram illustrating a configuration example of the block information table in the first embodiment.
FIG. 10 is a flowchart illustrating an example of the operation performed when the block location management unit of the master server receives a block generation request from an external application in the block generation process according to the first embodiment.
FIG. 11 is a flowchart illustrating an example of the operation performed when the data access unit of a data server receives a block generation request in the block generation process according to the first embodiment.
FIG. 12 is a flowchart illustrating an example of the operation of the data relocation control unit of the master server in the data relocation execution process according to the first embodiment.
FIG. 13 is a flowchart illustrating an example of the operation performed when the data relocation execution unit of the relocation source data server receives a block information table acquisition request in the data relocation execution process according to the first embodiment.
FIG. 14 is a flowchart illustrating an example of the operation performed when the data relocation execution unit of the relocation destination data server receives a data relocation request in the data relocation execution process according to the first embodiment.
FIG. 15 is a flowchart illustrating an example of the operation performed when the data access unit of the block copy source data server receives a block copy request in the data relocation execution process according to the first embodiment.
FIG. 16 is a flowchart illustrating an example of the operation performed when the block location management unit of the master server receives a block registration request in the data relocation execution process according to the first embodiment.
FIG. 17 is a flowchart illustrating an example of the operation performed when the data access unit of the relocation source data server receives a block deletion request in the data relocation execution process according to the first embodiment.
FIG. 18 is a diagram illustrating how information processing by the data relocation program, a program that executes the series of processes of the data relocation apparatus, is concretely implemented using a computer.

Embodiments of the disclosed data relocation apparatus, method, and program are described in detail below with reference to the drawings. The embodiments do not limit the disclosed technique, and the embodiments may be combined as appropriate.

[Block Storage Processing in GFS and HDFS]
First, as background for the embodiments, an example of block storage processing in GFS and HDFS is described.

In GFS, the servers on which a block is stored are decided as follows.
(1) A block is created on servers with a low storage usage rate.
(2) To keep load from concentrating on particular servers, an upper limit is placed on the number of recent block creations on each server.
(3) A block is created on multiple servers spread across racks.
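The three GFS rules can be illustrated with a small selection routine. This is a minimal sketch, not GFS's actual implementation: the `Server` record, the `max_recent` cap, and the "at most one server per rack first" strategy used for rule (3) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Server:
    server_id: str
    rack: str
    usage_rate: float      # fraction of storage in use (0.0 to 1.0)
    recent_creations: int  # blocks created on this server in the recent window

def choose_servers_gfs_like(servers, replicas, max_recent):
    """Return the IDs of `replicas` servers chosen by the three rules (sketch)."""
    # Rule (2): exclude servers that have reached the recent-creation cap.
    eligible = [s for s in servers if s.recent_creations < max_recent]
    # Rule (1): consider the least-used servers first.
    eligible.sort(key=lambda s: s.usage_rate)
    chosen, racks_used = [], set()
    # Rule (3): first pass takes at most one server per rack.
    for s in eligible:
        if len(chosen) < replicas and s.rack not in racks_used:
            chosen.append(s.server_id)
            racks_used.add(s.rack)
    # Fallback: if there are fewer racks than replicas, fill by usage alone.
    for s in eligible:
        if len(chosen) < replicas and s.server_id not in chosen:
            chosen.append(s.server_id)
    return chosen
```

With servers `a` (rack r1, 90% used), `b` (r1, 20%), `c` (r2, 30%), `d` (r2, 10% but over the creation cap), and `e` (r3, 50%), three replicas go to `b`, `c`, and `e`: `d` is skipped by rule (2), and `a` loses to `b` on both usage and rack.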

In HDFS, for example, when the redundancy is 3, the servers on which a block is created are selected as follows. The first replica is created on the same server as the one running the external application that requested the write. The second replica is created on an arbitrary server in a rack different from that of the first server. The third replica is created on an arbitrary server in the same rack as the second server, but different from the second server.
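A minimal sketch of the three-replica rule above follows. The function name, the rack map, and the choice to consider only racks with at least two servers (so that the third replica has somewhere to go) are illustrative assumptions, not HDFS's actual placement code.

```python
import random

def hdfs_like_placement(writer, servers_by_rack, rng=random):
    """Choose three servers for one block following the rule described above.

    writer: (server_id, rack) of the server running the writing application.
    servers_by_rack: dict mapping rack name -> list of server IDs in that rack.
    """
    writer_id, writer_rack = writer
    # Replica 1: the writer's own server.
    first = writer_id
    # Replica 2: any server in a rack other than the writer's rack
    # (restricted here to racks that can also host replica 3).
    other_racks = [r for r, ids in servers_by_rack.items()
                   if r != writer_rack and len(ids) >= 2]
    second_rack = rng.choice(other_racks)
    second = rng.choice(servers_by_rack[second_rack])
    # Replica 3: a different server in the same rack as replica 2.
    third = rng.choice([s for s in servers_by_rack[second_rack] if s != second])
    return [first, second, third]
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible in tests; with the default module-level `random`, each call may pick different servers within the same constraints.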

[Data Read Processing by an External Application]
In GFS and HDFS, when an external application reads data, each block is read from the server closest, in terms of network topology, to the server on which the external application is running, so as to reduce network transfer of blocks.

For example, in HDFS, when an external application writes data and later reads it back, the first replica of each block of that data is stored on the same server that runs the external application. No network transfer of blocks therefore occurs when the data is read, so reading is faster than when blocks must be fetched over the network from a server other than the one running the external application.

Although blocks are stored and read efficiently in this way, long-term operation of a distributed file system also involves creating and deleting files and adding and removing servers. As a result, storage usage becomes uneven across servers, and the read/write load may become skewed toward some of them. Distributed file systems therefore relocate data to level the load: data is moved from servers storing a large amount of data to servers storing only a small amount, so that storage usage is leveled across the whole distributed file system. GFS performs this data relocation automatically; HDFS provides a command for running it.
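The conventional leveling described above can be sketched as a greedy loop that repeatedly moves one block from the fullest server to the emptiest one. This is a simplified illustration, not the GFS or HDFS balancer algorithm; the tolerance, the block-selection order (by block ID), and the stop-before-overshoot rule are assumptions. Note that it pays no attention to where applications read from, which is how a block stored on the operation server can be moved away.

```python
def plan_leveling(used, capacity, blocks_on, block_size,
                  tolerance=0.10, max_moves=1000):
    """Plan (block_id, source, destination) moves that level usage rates.

    used: server -> bytes used; capacity: server -> bytes capacity
    blocks_on: server -> set of block IDs; block_size: block ID -> bytes
    """
    used = dict(used)  # work on copies; the inputs are left untouched
    blocks_on = {s: set(ids) for s, ids in blocks_on.items()}
    moves = []

    def rate(s):
        return used[s] / capacity[s]

    for _ in range(max_moves):
        src = max(used, key=rate)  # fullest server
        dst = min(used, key=rate)  # emptiest server
        if rate(src) - rate(dst) <= tolerance:
            break  # usage is already level enough
        # Never move a block to a server that already holds a replica of it.
        candidates = sorted(blocks_on[src] - blocks_on[dst])
        if not candidates:
            break
        b = candidates[0]
        size = block_size[b]
        # Stop rather than overshoot: dst must not end up fuller than src.
        if (used[dst] + size) / capacity[dst] > (used[src] - size) / capacity[src]:
            break
        blocks_on[src].remove(b)
        blocks_on[dst].add(b)
        used[src] -= size
        used[dst] += size
        moves.append((b, src, dst))
    return moves
```

Starting from servers at 80% and 20% usage with 20-unit blocks, one move brings them to 60% and 40%; a second move would invert the imbalance, so the loop stops.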

(First Embodiment)
FIG. 1 is a diagram illustrating a configuration example of a data relocation system according to the first embodiment. In the first embodiment, the data relocation apparatus is described as implemented by the master server and the data servers of a distributed file system.

[Example Configuration of the Data Relocation System of the First Embodiment]
As shown in FIG. 1, the data relocation apparatus of the first embodiment consists of a master server 100 and a plurality of data servers 200, 300, 400, 500, and 600, which together constitute a distributed file system, that is, the data relocation system. FIG. 1 shows five data servers as an example, but the number of data servers is not limited to five. An external application 700 issues requests to read and write files.

Each of the data servers 200 to 600 is assigned a data server identifier (ID) that uniquely identifies it. For example, the IP address assigned to the server on which a data server runs may be used as its data server identifier. In the example of FIG. 1, the data servers 200, 300, 400, 500, and 600 are assigned the data server identifiers ds2, ds3, ds4, ds5, and ds6, respectively.

File data that the external application 700 writes to the data servers is divided into fixed-length blocks, and each block is assigned a block identifier that uniquely identifies it. Each block is stored on as many data servers as the predetermined redundancy.

FIG. 2 shows an example of file data, illustrating the relationship between file data and fixed-length blocks in the first embodiment. In the example of FIG. 2, file data 710 is divided into fixed-length blocks 711, 712, and 713, which are assigned the block identifiers b1, b2, and b3, respectively. If the file size is not a multiple of the fixed length, the last block 713 is smaller than the fixed length. A block may also be smaller than the fixed length if the file contains a region in which no data exists.
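The division shown in FIG. 2 can be sketched as follows. Sequential IDs such as `b1`, `b2`, `b3` mirror the figure but are only illustrative, since a real system needs identifiers that are unique across all files. The sparse-region case mentioned above is not modeled.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split file data into fixed-length blocks; the last one may be shorter.

    Returns a list of (block_id, block_bytes) pairs with hypothetical
    sequential IDs b1, b2, ...
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for i, offset in enumerate(range(0, len(data), block_size), start=1):
        blocks.append((f"b{i}", data[offset:offset + block_size]))
    return blocks
```

A 10-byte file with a 4-byte block length yields two full blocks and a 2-byte tail block, matching the short last block 713 in FIG. 2.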

[Example Configuration of the Master Server 100 of the First Embodiment]
FIG. 3 is a diagram illustrating a configuration example of the master server 100 in the first embodiment. The master server 100 includes a block location management unit 110, a data server state management unit 120, a relocation priority assignment unit 130, and a data relocation control unit 140. The block location management unit 110 holds a block location management table 111, and the data server state management unit 120 holds a data server state management table 121. The data relocation control unit 140 holds a relocation source block information table 141 and a relocation destination data server identifier list 142. The relocation source block information table 141 and the relocation destination data server identifier list 142 are created during the data relocation execution process and are not retained at other times.

The processing performed by the master server 100 and the information it stores are described in detail below.

The block location management unit 110 manages the allocation of blocks to the data servers 200 to 600. To do so, it manages the block location management table 111, which stores the information used to track that allocation.

FIG. 4 is a diagram illustrating a configuration example of the block location management table 111 in the first embodiment. For each block, the block location management table 111 holds the block identifier that uniquely identifies the block, associated with a data server identifier list, that is, the list of identifiers of the data servers that store the block. In the example of FIG. 4, the block identified by block identifier “b1” is stored on the three data servers identified by the data server identifiers “ds2”, “ds4”, and “ds5”; block “b2” is stored on “ds2”, “ds4”, and “ds6”; block “b3” is stored on “ds2”, “ds3”, and “ds6”; and block “b4” is stored on “ds2”, “ds4”, and “ds5”.
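The block location management table of FIG. 4 maps naturally onto a dictionary from block ID to a list of data server IDs. The helper names below are hypothetical; the reverse lookup shows, for example, that ds2 holds a replica of every block in the figure.

```python
# Block location management table 111, with the contents of FIG. 4.
block_location_table = {
    "b1": ["ds2", "ds4", "ds5"],
    "b2": ["ds2", "ds4", "ds6"],
    "b3": ["ds2", "ds3", "ds6"],
    "b4": ["ds2", "ds4", "ds5"],
}

def servers_holding(block_id):
    """Data server identifier list for one block."""
    return block_location_table[block_id]

def blocks_on_server(server_id):
    """Reverse lookup: every block that has a replica on `server_id`."""
    return sorted(b for b, servers in block_location_table.items()
                  if server_id in servers)
```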

The data server state management unit 120 manages the states of the data servers 200 to 600. To do so, it manages the data server state management table 121, which stores information about those states.

FIG. 5 is a diagram illustrating a configuration example of the data server state management table 121 in the first embodiment. For each data server, the data server state management table 121 holds, associated with its data server identifier, the maximum amount of data that can be stored in the server's data storage unit (described later) and the free capacity of that data storage unit. In the example of FIG. 5, the data storage unit of the data server identified by “ds2” can store at most 2.0 terabytes (TB) of data and currently has 1.0 TB free.
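A row of the data server state management table can be modeled as below. The usage rate (1 minus free/maximum) is a derived quantity assumed here for convenience. Only the ds2 row (2.0 TB maximum, 1.0 TB free, so 50% used) comes from FIG. 5, and `update_state` is a hypothetical handler for the periodic state reports sent by the data servers.

```python
from dataclasses import dataclass

@dataclass
class DataServerState:
    max_capacity_tb: float   # maximum data the data storage unit can hold
    free_capacity_tb: float  # capacity currently unused

    @property
    def usage_rate(self) -> float:
        """Fraction of the data storage unit currently in use."""
        return 1.0 - self.free_capacity_tb / self.max_capacity_tb

# Data server state management table 121: data server ID -> state (FIG. 5 row).
state_table = {"ds2": DataServerState(max_capacity_tb=2.0, free_capacity_tb=1.0)}

def update_state(table, server_id, max_capacity_tb, free_capacity_tb):
    """Overwrite a server's row when it reports its current state."""
    table[server_id] = DataServerState(max_capacity_tb, free_capacity_tb)
```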

The relocation priority assignment unit 130 assigns a relocation priority to each block. The relocation priority is stored in the block information table of each data server, described later (FIG. 9). The relocation priority assignment process performed by the relocation priority assignment unit 130 is described in detail later.

The data relocation control unit 140 selects a relocation source data server and a relocation destination data server and, based on the relocation priorities assigned to the blocks, controls the relocation of blocks from the relocation source data server to the relocation destination data server.

While the data relocation execution process is running, the data relocation control unit 140 also manages the relocation source block information table 141, which stores information about the blocks targeted by the process, and the relocation destination data server identifier list 142, which lists the relocation destination data servers that are candidate destinations for those blocks. FIG. 6 is a diagram illustrating a configuration example of the relocation source block information table 141 in the first embodiment, and FIG. 7 is a diagram illustrating a configuration example of the relocation destination data server identifier list 142 in the first embodiment. The relocation source block information table 141 has the same structure as the block information table (FIG. 9) described later. The data relocation execution process performed by the data relocation control unit 140, and the processing related to the relocation source block information table 141 and the relocation destination data server identifier list 142, are described in detail later.

[Example Configuration of the Data Servers of the First Embodiment]
FIG. 8 is a diagram illustrating a configuration example of a data server in the first embodiment. The data servers 200 to 600 all have the same configuration, so the data server 200 is described below as a representative example. Where useful, the reference numerals of the corresponding components of the other data servers 300 to 600 are given in parentheses.

The data server 200 (300, 400, 500, 600) includes a data storage unit 210 (310, 410, 510, 610), a data access unit 220 (320, 420, 520, 620), a data relocation execution unit 230 (330, 430, 530, 630), and a status notification unit 240 (340, 440, 540, 640).

The data storage unit 210 (310, 410, 510, 610) holds all the blocks stored by the data server 200 (300, 400, 500, 600), together with the block information table 211 (311, 411, 511, 611) that describes those blocks.

FIG. 9 is a diagram illustrating a configuration example of the block information table 211 (311, 411, 511, 611) in the first embodiment. For each stored block, the block information table 211 (311, 411, 511, 611) according to the first embodiment associates the block identifier that uniquely identifies the block with the block's storage location in the data storage unit 210 (310, 410, 510, 610). It also holds, for each block, the block size and the relocation priority assigned to the block. In the example of FIG. 9, the block identified by block identifier “b1” is stored at the location identified by “p1” in the data storage unit 210, its size is 64 megabytes (MB), and its assigned relocation priority is “low”.
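The per-server block information table of FIG. 9 can be sketched as a small keyed collection. The class and method names are hypothetical, and priorities are plain strings; FIG. 9 only shows “low”, so any other value (such as "high") is an assumption.

```python
from dataclasses import dataclass

@dataclass
class BlockInfo:
    block_id: str
    location: str             # storage location inside the data storage unit
    size_mb: int
    relocation_priority: str  # e.g. "low", as in FIG. 9

class BlockInfoTable:
    """Block information table held by one data server (sketch)."""

    def __init__(self):
        self._rows = {}

    def put(self, info: BlockInfo) -> None:
        self._rows[info.block_id] = info

    def get(self, block_id: str) -> BlockInfo:
        return self._rows[block_id]

    def total_size_mb(self) -> int:
        return sum(r.size_mb for r in self._rows.values())

table_211 = BlockInfoTable()
table_211.put(BlockInfo("b1", "p1", 64, "low"))  # the row shown in FIG. 9
```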

The data access unit 220 (320, 420, 520, 620) reads the blocks stored in the data storage unit 210 (310, 410, 510, 610) and reads information from the block information table 211 (311, 411, 511, 611). It also writes blocks to the data storage unit 210 (310, 410, 510, 610) and writes information to the block information table 211 (311, 411, 511, 611).

The data relocation execution unit 230 (330, 430, 530, 630) executes the data relocation execution process for its own data server 200 (300, 400, 500, 600). The data relocation execution process performed by the data relocation execution unit 230 (330, 430, 530, 630) is described in detail later.

The status notification unit 240 (340, 440, 540, 640) monitors the state of the data storage unit 210 (310, 410, 510, 610) and reports it to the master server 100. Specifically, the status notification unit 240 (340, 440, 540, 640) periodically detects the maximum capacity of the data storage unit 210 (310, 410, 510, 610) and its free capacity, and periodically notifies the data server state management unit 120 of the master server 100 of the detected values. Although the state is detected and reported periodically here, detection and notification may instead be triggered by the occurrence of some event, by operator input, or the like.
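The periodic reporting can be sketched with Python's standard `shutil.disk_usage`, assuming the data storage unit is a directory on a local filesystem whose total size is taken as the maximum capacity. `notify` is a hypothetical stand-in for the message sent to the data server state management unit 120.

```python
import shutil
import time

def read_storage_state(path):
    """Return (max_capacity_bytes, free_bytes) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.free

def report_periodically(server_id, path, notify, interval_s=30.0, rounds=None):
    """Call notify(server_id, max_capacity, free) every interval_s seconds.

    rounds=None reports forever; a finite value is useful for testing.
    """
    n = 0
    while rounds is None or n < rounds:
        total, free = read_storage_state(path)
        notify(server_id, total, free)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval_s)  # wait for the next reporting period
```

An event-driven variant, as the text allows, would call `read_storage_state` and `notify` from the event handler instead of the timed loop.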

[Example Flow of Data Relocation Processing in the First Embodiment]
Data relocation processing by the data relocation apparatus according to the first embodiment is described next. The data relocation processing in the first embodiment comprises a relocation priority assignment process and a data relocation execution process.

The relocation priority assignment process assigns to each block stored on a data server a relocation priority, which indicates the priority of that block in data relocation processing. In the example described below, the relocation priority assignment process is executed as part of the block generation process. The block generation process comprises registration of the block by the block location management unit 110 of the master server 100, assignment of a relocation priority to the block by the relocation priority assignment unit 130, and generation of the block by the data servers.
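As an illustration only: the actual rule for choosing a priority is described elsewhere in the patent, so the sketch below hypothetically assumes that the replica on the server running the writing application receives "low" and every other replica "high", in keeping with the stated goal of keeping blocks on the operation server in place. It also performs the registration step in a dictionary-based block location management table; all names are hypothetical.

```python
def register_block(block_id, target_servers, writer_server,
                   location_table, priority_tables):
    """Register a new block and assign a relocation priority per replica.

    location_table: block ID -> list of data server IDs (master side).
    priority_tables: data server ID -> {block ID: priority} (data server side).
    """
    # Master side: record which data servers hold the block.
    location_table[block_id] = list(target_servers)
    for server in target_servers:
        # HYPOTHETICAL rule: keep the writer-local replica where it is.
        priority = "low" if server == writer_server else "high"
        priority_tables.setdefault(server, {})[block_id] = priority
```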

The data relocation execution process relocates blocks based on the relocation priority assigned to each block. In the example described below, the data relocation execution process comprises control of data relocation by the data relocation control unit 140 of the master server 100 and execution of data relocation by the data relocation execution unit 230 (330, 430, 530, 630) of each data server 200 (300, 400, 500, 600).
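Priority-based selection of blocks to relocate might look like the following. The ordering (assumed: "high" moves before "low"), the stopping rule based on an amount of data to move, and the skipping of blocks the destination already holds are assumptions for illustration, not the procedure of FIGS. 12 to 17.

```python
# Assumption: blocks with "high" relocation priority are moved first.
PRIORITY_ORDER = {"high": 0, "low": 1}

def select_blocks_to_relocate(source_blocks, needed_mb, dest_holds=frozenset()):
    """Pick blocks to move off a relocation source data server.

    source_blocks: list of (block_id, size_mb, priority) rows taken from the
        source server's block information table.
    needed_mb: amount of data to move off the source server.
    dest_holds: block IDs already stored on the relocation destination.
    """
    # sorted() is stable, so table order is kept within each priority level.
    ordered = sorted(source_blocks, key=lambda row: PRIORITY_ORDER[row[2]])
    chosen, moved = [], 0
    for block_id, size_mb, _priority in ordered:
        if moved >= needed_mb:
            break
        if block_id in dest_holds:
            continue  # the destination already has a replica of this block
        chosen.append(block_id)
        moved += size_mb
    return chosen
```

With one "low" and two "high" 64 MB blocks and 100 MB to move, the two "high" blocks are chosen and the "low" one stays; it is moved only when the "high" candidates are exhausted or excluded.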

[Flow of the Block Generation Process]
The block generation process is described in more detail below.

[Processing in the Master Server 100]
FIG. 10 is a flowchart illustrating an example of the operation performed when the block location management unit 110 of the master server 100 receives a block generation request from the external application 700 in the block generation process according to the first embodiment. The processing executed by the master server 100 in the block generation process is described first, with reference to FIG. 10.

In the block generation process, the external application 700 first sends a block generation request to the block location management unit 110 of the master server 100 when it writes file data. On receiving the request, the block location management unit 110 does two things:
- it has the block generated on as many data servers as the redundancy;
- it registers the block's identifier in the block location management table 111, together with the data server identifier list of the data servers that store it.

The operation of the block location management unit 110 upon receiving a block generation request from the external application 700 is described below step by step, following FIG. 10.

First, the external application 700 sends a block generation request to the master server 100. When the master server 100 receives the request (step S1201), it forwards it to the block location management unit 110.

Next, the block location management unit 110 obtains the free capacity of each data server's data storage unit from the data server state management table 121 of the data server state management unit 120. It then selects data servers up to the predetermined redundancy (step S1202). It chooses only from data servers whose data storage unit has free capacity equal to or greater than a predetermined threshold.

The block location management unit 110 chooses the servers that will store the block based on each data server's free capacity and the rack that holds it. Two rules apply:
- it selects only data servers whose data storage unit has free capacity equal to or greater than the predetermined threshold;
- it selects them so that the data servers storing the block span several racks.

For example, suppose the predetermined redundancy is 3 and the three copies of a block are to be stored on data servers running on servers in different racks. The block location management unit 110 then selects the data servers as follows:
- First: the data server running on the same server as the external application 700 that sent the block generation request.
- Second: a data server running on a server in a different rack from the first data server.
- Third: a data server running on a server in a rack different from both the first and the second rack.

The block location management unit 110 then generates the data server identifier list of the data servers that will store the block. It may be unable to select data servers up to the predetermined redundancy, for example because of insufficient free capacity. In that case it sends a block generation failure to the external application 700.
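
Step S1202 can be sketched in Python as shown below. This is a minimal illustration, not the patented implementation, and it rests on several hypothetical choices that the text leaves open:
- the `DataServer` fields `host` and `rack`;
- skipping the local data server when it is below the threshold;
- preferring data servers with more free capacity as a tie-break.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataServer:
    ident: str  # data server identifier (an IP address in the first embodiment)
    host: str   # physical server the data server runs on (hypothetical field)
    rack: str   # rack containing that server (hypothetical field)
    free: int   # free capacity of its data storage unit


def select_data_servers(servers: list[DataServer], client_host: str,
                        redundancy: int, threshold: int) -> Optional[list[str]]:
    """Return a data server identifier list, or None for a block generation failure."""
    # Only data servers whose free capacity is at or above the threshold qualify.
    eligible = [s for s in servers if s.free >= threshold]
    chosen: list[DataServer] = []
    used_racks: set[str] = set()

    # First copy: the data server on the same server as the requesting application.
    for s in eligible:
        if s.host == client_host:
            chosen.append(s)
            used_racks.add(s.rack)
            break

    # Remaining copies: one per unused rack (assumed tie-break: most free space first).
    for s in sorted(eligible, key=lambda s: s.free, reverse=True):
        if len(chosen) == redundancy:
            break
        if s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)

    return [s.ident for s in chosen] if len(chosen) == redundancy else None
```

With a redundancy of 3, this returns the local data server followed by data servers in two other racks. It returns None when fewer than three distinct racks have eligible data servers.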

Next, the block location management unit 110 generates a block identifier for the block to be stored (step S1203). It then instructs the relocation priority assignment unit 130 to calculate relocation priorities.

The relocation priority assignment unit 130 calculates the relocation priorities as instructed (step S1204). For example:
- the copy held by the data server running on the same server as the requesting external application receives a low relocation priority;
- the copies held by data servers running on other servers receive a high relocation priority.
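
Here is a minimal sketch of step S1204. It assumes a two-level priority encoded as an integer enumeration, mirroring the "high" and "low" values used in the text, so that sorting in descending order of priority is a plain numeric sort. The names `RelocationPriority` and `assign_relocation_priorities` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class RelocationPriority(IntEnum):
    # Integer values make "descending order of priority" a plain numeric sort.
    LOW = 0
    HIGH = 1


def assign_relocation_priorities(
        server_ids: list[str],
        local_server_id: Optional[str]) -> dict[str, RelocationPriority]:
    """Map each data server that will hold a copy of the block to its relocation priority."""
    return {
        ds: RelocationPriority.LOW if ds == local_server_id else RelocationPriority.HIGH
        for ds in server_ids
    }
```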

Once the relocation priorities have been calculated, the master server 100 sends a block generation request to the data access unit of each data server in the data server identifier list generated by the block location management unit 110 (step S1205). With the request it sends the block identifier generated in step S1203 and the relocation priority calculated in step S1204.

As described above, the first embodiment uses a data server's IP address as its data server identifier. The master server 100 can therefore reach a data server's data access unit directly from its identifier. If information other than the IP address is used as the data server identifier, a separate mechanism for mapping data server identifiers to IP addresses must be provided.

After sending block generation requests to the relevant data servers, the master server 100 receives a block generation response from each of them (step S1206). It then registers two items in the block location management table 111: the block identifier of the generated block, and the data server identifier list of the data servers that generated and stored it (step S1207). Finally, it sends the same block identifier and data server identifier list to the external application 700 (step S1208).

On receiving the block identifier and the data server identifier list, the external application 700 writes the data to that block on the data servers in the list. This completes the processing in the master server 100.

[Processing in Data Server 200]
Next, the data server side of the block generation process is described. FIG. 11 is a flowchart illustrating an example of the operation performed when a data server's data access unit receives a block generation request during the block generation process according to the first embodiment. The description assumes that the data server 200 has received the block generation request.

First, the data server 200 receives a block generation request, a block identifier, and a relocation priority from the block location management unit 110 of the master server 100 (step S1301). The data access unit 220 then does the following (step S1302):
- it generates an empty block and stores it in the data storage unit 210;
- it registers in the block information table 211 the block identifier and relocation priority received from the master server 100, together with the block size, which is 0 at this point.

The data server 200 then sends a block generation response to the block location management unit 110 of the master server 100 (step S1303) and ends the processing.

[Flow of data relocation execution processing]
Next, the data relocation execution process is described. In the data relocation apparatus of the first embodiment, this process runs periodically.

[Processing in Master Server 100]
FIG. 12 is a flowchart illustrating an example of the operation of the data relocation control unit 140 of the master server 100 in the data relocation execution process according to the first embodiment. The processing in the master server 100 is described with reference to FIG. 12.

First, the data relocation control unit 140 of the master server 100 determines whether data relocation is necessary (step S1401). If it is (Yes in step S1401), processing proceeds to step S1402. If it is not (No in step S1401), the processing ends.

For example, the data relocation control unit 140 obtains the data server state management table 121 from the data server state management unit 120. It decides that data relocation is necessary when both of the following conditions (1) and (2) hold.

(1) The free capacity of the data storage units, summed over all data servers, exceeds a predetermined fixed capacity multiplied by the number of data servers.
(2) Using the values in the data server state management table 121, the usage rate of each data server's data storage unit is calculated. At least one data server has a usage rate equal to or greater than a predetermined threshold. In addition, at least one data server has free capacity greater than that data server's free capacity plus a predetermined free capacity margin.

If data relocation is necessary (Yes in step S1401), the data relocation control unit 140 next selects a relocation source data server (step S1402). For example, it proceeds as follows:
- it obtains the data server state management table 121 from the data server state management unit 120;
- it calculates the usage rate of each data server's data storage unit from the table's values;
- it randomly selects one data server from those whose usage rate is equal to or greater than a predetermined threshold.

The selected data server becomes the relocation source data server. In the following example, the data server 200 is selected as the relocation source data server.
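
Steps S1401 and S1402 can be sketched together in Python, under stated assumptions:
- Table 121 is assumed to expose a total `capacity` and a `free` value per data server. Usage rate is taken as (capacity − free) / capacity.
- Read literally, condition (2) pairs a high-usage data server with another data server whose free capacity exceeds the first server's free capacity plus the margin. The sketch checks every such pair.
- All class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerState:
    ident: str
    capacity: int  # total size of the data storage unit (assumed column of table 121)
    free: int      # free capacity of the data storage unit

    @property
    def usage(self) -> float:
        return (self.capacity - self.free) / self.capacity


def relocation_needed(states: list[ServerState], fixed_capacity: int,
                      usage_threshold: float, margin: int) -> bool:
    # Condition (1): total free capacity exceeds fixed_capacity x number of data servers.
    if sum(s.free for s in states) <= fixed_capacity * len(states):
        return False
    # Condition (2): some server is at or above the usage threshold, and some server
    # has more free capacity than that server's free capacity plus the margin.
    hot = [s for s in states if s.usage >= usage_threshold]
    return any(o.free > h.free + margin for h in hot for o in states)


def pick_relocation_source(states: list[ServerState], usage_threshold: float,
                           rng: random.Random) -> Optional[ServerState]:
    # Step S1402: choose one server at random among those at or above the threshold.
    hot = [s for s in states if s.usage >= usage_threshold]
    return rng.choice(hot) if hot else None
```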

Next, the data relocation control unit 140 sets the maximum relocation amount and initializes the cumulative relocation amount to zero (step S1403). The two quantities are defined as follows:
- The maximum relocation amount is the upper limit on the amount of data relocated in one run of the data relocation execution process. It may be fixed in advance or determined each time the process runs.
- The cumulative relocation amount is the amount of data already relocated within the current run.

The data relocation control unit 140 then sends a block information table acquisition request to the data relocation execution unit 230 of the relocation source data server 200 (step S1404). It receives the block information table 211 in return (step S1405).

The data relocation control unit 140 sorts the entries of the relocation source data server 200's block information table 211 in descending order of relocation priority. The sorted table is the relocation source block information table 141 (FIG. 6) (step S1406).

Next, the data relocation control unit 140 checks whether the last entry of the relocation source block information table 141 has been processed (step S1407). If it has (Yes in step S1407), the processing ends. If not (No in step S1407), processing proceeds to step S1408. The data relocation control unit 140 processes the blocks in table order, starting with the block in the first entry of the relocation source block information table 141.

For the block in the current entry of the relocation source block information table 141, the data relocation control unit 140 creates the relocation destination data server identifier list 142 (FIG. 7). It then checks whether the list is empty (step S1408).

If the relocation destination data server identifier list 142 is empty (Yes in step S1408), processing proceeds to step S1414.

If the list is not empty (No in step S1408), processing proceeds to step S1409.

The relocation destination data server identifier list 142 lists the data servers that can serve as relocation destinations for a given block. The data relocation control unit 140 creates it according to, for example, the following conditions.

For example, assume that the copies of each block are stored on data servers running on servers in separate racks. Then every data server that satisfies all of the following conditions for the block being relocated is registered in the relocation destination data server identifier list 142.

(1) The data server does not hold the block being relocated.
(2) The free capacity of the data server's data storage unit exceeds the free capacity of the relocation source data server's data storage unit plus a predetermined free capacity margin.
(3) The data server runs in a rack that contains none of the servers hosting the block's other holders, that is, the data servers holding the block other than the relocation source data server.
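
The three conditions can be written as a single filter. The following is a minimal Python sketch, assuming a hypothetical mapping from data server identifier to (rack, free capacity) and a set of the block's current holders. It is one possible reading of the conditions, not the patent's own code.

```python
def relocation_candidates(servers: dict[str, tuple[str, int]], holders: set[str],
                          source_id: str, margin: int) -> list[str]:
    """Build a relocation destination data server identifier list for one block.

    servers maps data server identifier -> (rack, free capacity); holders is the
    set of data servers currently holding the block, including the source.
    """
    source_free = servers[source_id][1]
    # Racks occupied by the block's other copies (the source's copy is the one moving).
    other_racks = {servers[h][0] for h in holders if h != source_id}
    return [
        ds for ds, (rack, free) in servers.items()
        if ds not in holders                  # condition (1)
        and free > source_free + margin       # condition (2)
        and rack not in other_racks           # condition (3)
    ]
```

Condition (3) allows the destination to share a rack with the relocation source itself. This follows from the wording, because the source's copy is the one being removed.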

In step S1409, the data relocation control unit 140 selects the relocation destination data server from the data servers registered in the relocation destination data server identifier list 142.

For example, the data relocation control unit 140 selects the data server in the relocation destination data server identifier list 142 with the lowest relocation cost. The relocation cost is a parameter representing the processing load of relocating the block to that data server.

For example, the relocation cost is calculated as follows. Assume that the relocation destination data server identifier list 142 contains two or more data servers. The relocation cost Cds of a data server ds is then
Cds = Fds + W × Dds
where Fds is the free capacity cost, Dds is the distance cost, and W is a predetermined weighting factor.

The free capacity cost Fds is calculated as
Fds = (Fmax − fds + 1) / (Fmax − Fmin + 1)
where Fmax and Fmin are the maximum and minimum free capacities over all data servers, and fds is the free capacity of the data server ds.

The distance cost Dds is calculated as
Dds = (1.0 − cos(π × i / (n − 1))) / 2
where:
- n is the number of data servers in the relocation destination data server identifier list 142;
- i is the zero-based rank of ds when the data servers in the list are sorted in ascending order of distance from the relocation source data server.

The distance between two data servers is the natural logarithm of the bitwise exclusive OR of their IP addresses, plus 1.
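
Because i runs from 0 to n − 1, Dds rises smoothly from 0 for the candidate nearest the relocation source to 1 for the farthest. Fds is smallest for the data server with the most free space. The formulas can be checked with the Python sketch below, under these assumptions:
- IP address strings are parsed with the standard `ipaddress` module.
- W is a caller-supplied float.
- The function names are hypothetical.
- The `free` mapping must cover all data servers, because Fmax and Fmin are taken over all of them.

```python
import ipaddress
import math


def distance(ip_a: str, ip_b: str) -> float:
    """ln(IP_a XOR IP_b) + 1; undefined for identical addresses (XOR = 0)."""
    x = int(ipaddress.ip_address(ip_a)) ^ int(ipaddress.ip_address(ip_b))
    if x == 0:
        raise ValueError("distance is undefined for identical addresses")
    return math.log(x) + 1.0


def relocation_costs(candidates: list[str], free: dict[str, int], source_ip: str,
                     weight: float) -> dict[str, float]:
    """Cds = Fds + W * Dds for every candidate; requires at least two candidates."""
    n = len(candidates)
    if n < 2:
        raise ValueError("the formulas assume two or more candidates")
    # Fmax and Fmin range over every data server in `free`, not only the candidates.
    f_max, f_min = max(free.values()), min(free.values())
    # Rank candidates by ascending distance from the relocation source (i = 0, 1, ...).
    ranked = sorted(candidates, key=lambda ds: distance(source_ip, ds))
    costs = {}
    for i, ds in enumerate(ranked):
        f_cost = (f_max - free[ds] + 1) / (f_max - f_min + 1)
        d_cost = (1.0 - math.cos(math.pi * i / (n - 1))) / 2
        costs[ds] = f_cost + weight * d_cost
    return costs
```

The destination chosen in step S1409 is then `min(costs, key=costs.get)`.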

In the following example, the data server 300 is selected as the relocation destination data server.

Next, the data relocation control unit 140 checks whether the cumulative relocation amount plus the block size of the current entry is equal to or less than the maximum relocation amount (step S1410). If the sum exceeds the maximum relocation amount (No in step S1410), the processing ends. Otherwise (Yes in step S1410), processing proceeds to step S1411.

In step S1411, the data relocation control unit 140 adds the block size of the current entry to the cumulative relocation amount. Next, it obtains from the block location management table 111 of the block location management unit 110 the data server identifier list for the current entry's block, that is, the data servers storing that block. This list becomes the block copy source data server identifier list (step S1412).

The data relocation control unit 140 then sends a data relocation request to the data relocation execution unit 330 of the relocation destination data server 300 (step S1413). With the request it sends the block identifier of the current entry, the relocation source data server identifier, and the block copy source data server identifier list. Next, it advances to the next entry of the relocation source block information table 141 (step S1414), and processing returns to step S1407. Steps S1408 to S1414 repeat until the last entry has been processed (Yes in step S1407), at which point the processing ends.
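
The loop formed by step S1403 and steps S1406 to S1414 can be summarized by the following Python sketch. It is illustrative only:
- Destination selection (steps S1408 and S1409) is passed in as a hypothetical callback `pick_destination`.
- Sending the data relocation request is reduced to appending to a plan.
- Priorities are assumed to be integers where larger means higher.

```python
from typing import Callable, Optional

# One entry of a block information table: (block identifier, relocation priority, size).
Entry = tuple[str, int, int]


def plan_relocation(block_table: list[Entry],
                    pick_destination: Callable[[str], Optional[str]],
                    max_amount: int) -> tuple[list[tuple[str, str]], int]:
    """Return (block identifier, destination) pairs and the cumulative relocation amount."""
    # S1406: sort by relocation priority, highest first (sorted() is stable).
    entries = sorted(block_table, key=lambda e: e[1], reverse=True)
    cumulative = 0  # S1403: cumulative relocation amount starts at zero
    plan: list[tuple[str, str]] = []
    for block_id, _priority, size in entries:   # S1407 / S1414: walk the entries
        dest = pick_destination(block_id)        # S1408-S1409: None if no candidate
        if dest is None:
            continue                             # empty list: go to the next entry
        if cumulative + size > max_amount:       # S1410: would exceed the budget
            break                                # the flowchart ends the run here
        cumulative += size                       # S1411
        plan.append((block_id, dest))            # S1412-S1413: request sent here
    return plan, cumulative
```

Following the flowchart, the run stops at the first block that would exceed the budget. It does not skip ahead to a smaller block. This keeps relocation strictly in priority order.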

[Processing in Relocation Source Data Server 200]
FIG. 13 is a flowchart illustrating an example of the operation performed when the data relocation execution unit 230 of the relocation source data server 200 receives a block information table acquisition request in the data relocation execution process according to the first embodiment. This operation is described with reference to FIG. 13.

First, the data relocation execution unit 230 of the relocation source data server 200 receives a block information table acquisition request from the data relocation control unit 140 of the master server 100 (step S1501; see steps S1404 and S1405 in FIG. 12).

In response, the data relocation execution unit 230 instructs the data access unit 220 to obtain the block information table 211 from the data storage unit 210 (step S1502). It then sends the block information table 211 to the data relocation control unit 140 of the master server 100 (step S1503) and ends the processing.

[Processing in Relocation Destination Data Server 300]
FIG. 14 is a flowchart illustrating an example of the operation performed when the data relocation execution unit 330 of the relocation destination data server 300 receives a data relocation request in the data relocation execution process according to the first embodiment. This operation is described with reference to FIG. 14.

First, the data relocation execution unit 330 of the relocation destination data server 300 receives the following from the data relocation control unit 140 of the master server 100 (step S1601; see step S1413 in FIG. 12):
- a data relocation request;
- the block identifier of the block to be relocated;
- the relocation source data server identifier;
- the block copy source data server identifier list.

From the data servers in the received block copy source data server identifier list, the data relocation execution unit 330 selects a block copy source data server, that is, the data server the block will be copied from (step S1602). For example, the data server closest to the relocation destination data server 300 becomes the block copy source data server.

Here, the distance between two data servers is, for example, the natural logarithm of the bitwise exclusive OR of their IP addresses, plus 1. The block copy source data server need not be the relocation source data server. In the following example, the data server 400 is selected as the block copy source data server. Minimizing the distance between the block copy source data server and the relocation destination data server makes block copying efficient and speeds up processing. However, the block copy source data server may also be selected by other criteria.
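
The natural logarithm is strictly increasing, so the nearest data server under this distance is simply the one whose IP address has the smallest XOR with the destination's address. A minimal Python sketch of step S1602 makes use of this. It assumes IP address strings parsed with the standard `ipaddress` module, and `pick_copy_source` is a hypothetical name.

```python
import ipaddress


def pick_copy_source(copy_source_ids: list[str], destination_ip: str) -> str:
    """Pick the data server nearest to the relocation destination.

    The text defines distance as ln(IP_a XOR IP_b) + 1. Because ln is strictly
    increasing, minimizing the distance is the same as minimizing the XOR.
    """
    dest = int(ipaddress.ip_address(destination_ip))
    return min(copy_source_ids, key=lambda ds: int(ipaddress.ip_address(ds)) ^ dest)
```

The destination never holds the block (condition (1) of the candidate list), so it never appears among the copy sources. Every XOR is therefore non-zero and the logarithm would be defined.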

The data relocation execution unit 330 sends a block copy request and the identifier of the block to be relocated to the data access unit 420 of the block copy source data server 400 (step S1603). It then receives the block from the data access unit 420 (step S1604). Next, it instructs the data access unit 320 to store the received block in the data storage unit 310 and to add an entry for that block to the block information table 311 (step S1605). In the new entry, the block size is the size of the received block and the relocation priority is "high".

The data relocation execution unit 330 sends three items to the block location management unit 110 of the master server 100: a block registration request, the block identifier of the stored block, and the data server identifier of the relocation source data server 200 (step S1606). This completes the processing in the relocation destination data server 300.

[Processing in Block Copy Source Data Server 400]
FIG. 15 is a flowchart illustrating an example of the operation performed when the data access unit 420 of the block copy source data server 400 receives a block copy request in the data relocation execution process according to the first embodiment. This operation is described with reference to FIG. 15.

First, the data access unit 420 of the block copy source data server 400 receives a block copy request and a block identifier from the data relocation execution unit 330 of the relocation destination data server 300 (step S1701; see step S1603 in FIG. 14). The data access unit 420 then reads the block with that identifier from the data storage unit 410 (step S1702). It sends the block to the data relocation execution unit 330 of the relocation destination data server 300 (step S1703). This completes the processing in the block copy source data server 400.

[Processing of Block Registration Request in Master Server 100]
FIG. 16 is a flowchart illustrating an example of the operation performed when the block location management unit 110 of the master server 100 receives a block registration request during the data relocation execution process in the first embodiment. This operation is described below with reference to FIG. 16.

First, the block location management unit 110 of the master server 100 receives a block registration request, a block identifier, and a relocation source data server identifier from the data relocation execution unit 330 of the relocation destination data server 300 (step S1801; see step S1606 in FIG. 14). In the block location management table 111, the block location management unit 110 adds the data server identifier of the relocation destination data server 300 to the data server identifier list associated with the received block identifier, and deletes the data server identifier of the relocation source data server 200 from that list (step S1802).

The block location management unit 110 then transmits a block deletion request, together with the block identifier received from the relocation destination data server 300, to the data access unit 220 of the relocation source data server 200 (step S1803), and receives a block deletion response from the data access unit 220 (step S1804). This completes the processing in the master server 100 for the block registration request.
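Steps S1801 to S1804 can be sketched as follows, under stated assumptions: the block location management table 111 is modeled as a dict from block identifier to a list of data server identifiers, the destination identifier is passed explicitly (in the text it is implied by the sender of the request), and `send_delete` is a hypothetical callback standing in for the exchange with the relocation source data server. The class and method names are illustrative, not taken from the source.

```python
from typing import Callable


class MasterServer:
    """Hypothetical model of block location management unit 110 on master server 100."""

    def __init__(self, send_delete: Callable[[str, str], bool]) -> None:
        # Stand-in for block location management table 111:
        # block identifier -> identifiers of the data servers holding a replica.
        self.block_locations: dict[str, list[str]] = {}
        # Hypothetical transport: (source_server_id, block_id) -> deletion acknowledged?
        self.send_delete = send_delete

    def handle_block_registration(self, block_id: str, dest_id: str, source_id: str) -> bool:
        # S1801: registration request with block id and relocation source id arrives.
        servers = self.block_locations.setdefault(block_id, [])
        # S1802: add the destination to the list, then remove the source from it.
        if dest_id not in servers:
            servers.append(dest_id)
        if source_id in servers:
            servers.remove(source_id)
        # S1803/S1804: ask the source to delete its copy and return its response.
        return self.send_delete(source_id, block_id)
```

Adding the destination before removing the source means the list never drops below the redundancy count while the update is in progress, which matches the order given in step S1802.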

[Processing of Block Deletion Request in the Relocation Source Data Server]
FIG. 17 is a flowchart illustrating an example of the operation performed when the data access unit 220 of the relocation source data server 200 receives a block deletion request during the data relocation execution process in the first embodiment. This operation is described below with reference to FIG. 17.

First, the data access unit 220 of the relocation source data server 200 receives a block deletion request and a block identifier from the data relocation control unit 140 of the master server 100 (step S1901; see step S1803 in FIG. 16). The data access unit 220 then deletes the entry corresponding to the received block identifier from the block information table 211, and deletes the block bearing that identifier from the data storage unit 210 (step S1902). Finally, the data access unit 220 transmits a block deletion response to the data relocation control unit 140 of the master server 100 (step S1903). This completes the processing for the block deletion request in the relocation source data server 200.
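A minimal sketch of the relocation source's deletion handler, assuming in-memory dicts stand in for the block information table 211 and the data storage unit 210. The class name `RelocationSourceServer` and the dict-shaped response are hypothetical; the text only says that a block deletion response is returned.

```python
from dataclasses import dataclass, field


@dataclass
class RelocationSourceServer:
    """Hypothetical model of the relocation source data server 200."""

    # Stand-in for block information table 211: block id -> metadata (e.g. priority).
    block_info: dict[str, dict] = field(default_factory=dict)
    # Stand-in for data storage unit 210: block id -> block bytes.
    storage: dict[str, bytes] = field(default_factory=dict)

    def handle_block_deletion(self, block_id: str) -> dict:
        # S1901: a deletion request with a block identifier arrives from the master.
        # S1902: drop the table entry and the stored block (pop tolerates a missing key).
        self.block_info.pop(block_id, None)
        removed = self.storage.pop(block_id, None) is not None
        # S1903: reply to the master with a deletion response.
        return {"type": "block_deletion_response", "block_id": block_id, "deleted": removed}
```

Making deletion idempotent (a repeated request simply reports `deleted: False`) is an assumption of this sketch, chosen so that a retried request from the master does no harm.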

[Effect of the First Embodiment]
As described above, the data rearrangement device according to the first embodiment includes a relocation priority assigning unit and a data relocation execution unit. When file data written by an external application is divided into fixed-length blocks and stored on a predetermined redundancy number of servers, the relocation priority assigning unit assigns to each block stored on each server a relocation priority indicating the priority of data relocation processing. The data relocation execution unit executes data relocation of the blocks based on the relocation priority. The relocation priority therefore makes it possible to suppress the movement of blocks that should preferably not be relocated, preventing degradation of the external application's read performance.

Further, in the data rearrangement device 1 according to the first embodiment, among the blocks obtained by dividing the file data written by the external application into fixed-length pieces, the relocation priority assigning unit assigns a low relocation priority to blocks held by the same server on which the external application runs, and a high relocation priority to blocks held by any other server. The data relocation execution unit then executes data relocation starting from the blocks with the highest relocation priority.
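The priority rule and the resulting relocation order can be sketched as follows. This is an illustration under assumptions: the two-level numeric scale (0 for low, 1 for high), the `Replica` record, and its `writer` field (the server on which the writing external application runs) are hypothetical; the text specifies only that co-located blocks receive a lower priority and that higher-priority blocks are relocated first.

```python
from dataclasses import dataclass

# Hypothetical two-level scale; the text only distinguishes "low" and "high".
LOW_PRIORITY = 0
HIGH_PRIORITY = 1


@dataclass
class Replica:
    block_id: str
    holder: str   # data server storing this replica
    writer: str   # server on which the writing external application runs
    priority: int = HIGH_PRIORITY


def assign_relocation_priority(replicas: list[Replica]) -> None:
    """Give replicas held on the writer's own server a low relocation priority."""
    for r in replicas:
        r.priority = LOW_PRIORITY if r.holder == r.writer else HIGH_PRIORITY


def relocation_order(replicas: list[Replica]) -> list[Replica]:
    """Relocation candidates, highest priority first.

    sorted() stays stable with reverse=True, so equal-priority replicas
    keep their original order.
    """
    return sorted(replicas, key=lambda r: r.priority, reverse=True)
```

For three replicas on server `ds200` where only `blk-2` was written by an application on another server, `relocation_order` puts `blk-2` first and leaves the two locally written blocks at the end of the queue.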

As a result, a block stored on the same server as the one running the external application tends to remain on that server even after data relocation is performed. When the external application reads data, it is therefore more likely to read a block from its own server than from another server, so its reads are faster than when data relocation is executed without regard to relocation priority.
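This effect depends on relocation stopping before the low-priority blocks are reached. According to the claims, the decision to relocate is based on at least the free capacity and usage rate of each server's data storage unit, and relocation proceeds in descending priority until a predetermined amount of data has been moved. The sketch below assumes a simple threshold rule for that decision (the thresholds and their "or" combination are hypothetical) and measures block sizes and the budget in bytes.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    size: int      # bytes
    priority: int  # higher value = relocate earlier


def needs_relocation(free_bytes: int, usage_rate: float,
                     min_free_bytes: int, max_usage_rate: float) -> bool:
    # Hypothetical rule: relocate when storage is short on space or over-utilized.
    return free_bytes < min_free_bytes or usage_rate > max_usage_rate


def select_blocks_to_move(blocks: list[Block], budget_bytes: int) -> list[Block]:
    """Pick blocks in descending priority until the moved amount reaches the budget."""
    moved: list[Block] = []
    total = 0
    for b in sorted(blocks, key=lambda b: b.priority, reverse=True):
        if total >= budget_bytes:
            break  # predetermined amount reached: end the relocation
        moved.append(b)
        total += b.size
    return moved
```

With four 64-byte blocks, two written remotely (priority 1) and two written locally (priority 0), a 128-byte budget selects only the two remote blocks, so the locally written blocks stay where the application can read them.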

(Second Embodiment)
Embodiments of the present invention have been described above, but the present invention may also be implemented in embodiments other than those described. Other embodiments are described below.

[Application to Other Distributed File Systems]
The above embodiment assumes processing in a distributed file system, such as GFS or HDFS, composed of a master server that manages block locations and data servers that hold the blocks. However, the present invention is not limited to this and can be applied to any storage system or distributed file system other than one in which the storage locations of data are restricted in advance.

[System Configuration]
Of the processes described in this embodiment, all or part of those described as performed automatically may instead be performed manually, and all or part of those described as performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

Each component of each illustrated device is functionally conceptual and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to the one shown; all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, in the example shown in FIG. 9 the relocation priority is stored in the block information tables 211 (311, 411, 511, 611), but the relocation priority may instead be managed on the master server 100 side.

[Program]
FIG. 18 is a diagram illustrating how information processing by the data rearrangement program, a program that executes the series of processes performed by the data rearrangement device, is concretely implemented on a computer. As illustrated in FIG. 18, the computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive 1080, and a network interface 1070. These parts of the computer 1000 are connected by a bus 1100.

As illustrated in FIG. 18, the memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).

As illustrated in FIG. 18, the hard disk drive 1080 stores, for example, an OS 1081, an application program 1082, a program module 1083, and program data 1084. That is, the data rearrangement program according to the disclosed technique is stored, for example, on the hard disk drive 1080 as a program module 1083 describing instructions to be executed by the computer. For example, a program module 1083 describing procedures that perform the same information processing as each unit of the master server 100 and the data servers 200 to 600 is stored on the hard disk drive 1080.

Data used for information processing by the data rearrangement program, such as the data stored in the master server 100 and the data servers 200 to 600, is stored as program data 1084, for example on the hard disk drive 1080. The CPU 1020 reads the program module 1083 and program data 1084 stored on the hard disk drive 1080 into the RAM 1012 as needed and executes the various procedures.

The program module 1083 and program data 1084 for the data rearrangement program are not limited to being stored on the hard disk drive 1080. For example, they may be stored on a removable storage medium, in which case the CPU 1020 reads the data from that medium via a disk drive or similar device. Likewise, the program module 1083 and program data 1084 may be stored on another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like), in which case the CPU 1020 reads the data by accessing that computer via the network interface 1070.

[Others]
The data rearrangement program described in this embodiment can be distributed via a network such as the Internet. It can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by being read from the recording medium by a computer.

DESCRIPTION OF SYMBOLS
100 Master server
110 Block location management unit
111 Block location management table
120 Data server state management unit
121 Data server state management table
130 Relocation priority assigning unit
140 Data relocation control unit
141 Relocation source block information table
142 Relocation destination data server identifier list
200, 300, 400, 500, 600 Data server
210, 310, 410, 510, 610 Data storage unit
211, 311, 411, 511, 611 Block information table
220, 320, 420, 520, 620 Data access unit
230, 330, 430, 530, 630 Data relocation execution unit
240, 340, 440, 540, 640 Status notification unit
700 External application
710 File data
711, 712, 713 Block
1000 Computer
1010 Memory
1011 ROM
1012 RAM
1020 CPU
1070 Network interface
1080 Hard disk drive
1081 OS
1082 Application program
1083 Program module
1084 Program data
1100 Bus

Claims

A data rearrangement device comprising:
a relocation priority assigning unit that, when file data written by an external application is divided into fixed-length blocks and stored on a predetermined redundancy number of servers, assigns to each block stored on each server a relocation priority indicating the priority of data relocation processing, the relocation priority, which indicates the priority with which the block is moved off its server, being set lower when the server storing the block written by the external application is the same as the server on which the external application runs than when the two servers differ; and
a data relocation control unit that determines, based on at least the free capacity and the usage rate of the data storage unit of each server, whether to execute data relocation processing that moves blocks stored on each server, and, when it determines to execute the data relocation processing, relocates the blocks stored on the relocation source server of the data relocation processing in descending order of relocation priority and ends the data relocation processing when the amount of data whose relocation has been completed reaches a predetermined amount.

A data rearrangement method comprising:
a relocation priority assigning step of, when file data written by an external application is divided into fixed-length blocks and stored on a predetermined redundancy number of servers, assigning to each block stored on each server a relocation priority indicating the priority of data relocation processing, the relocation priority, which indicates the priority with which the block is moved off its server, being set lower when the server storing the block written by the external application is the same as the server on which the external application runs than when the two servers differ; and
a data relocation control step of determining, based on at least the free capacity and the usage rate of the data storage unit of each server, whether to execute data relocation processing that moves blocks stored on each server, and, when it is determined to execute the data relocation processing, relocating the blocks stored on the relocation source server of the data relocation processing in descending order of relocation priority and ending the data relocation processing when the amount of data whose relocation has been completed reaches a predetermined amount.

A data rearrangement program for causing a computer to function as the data rearrangement device according to claim 1.